Apache Airflow DAGs Vocabulary

Practice the vocabulary of orchestrating a multi-step data pipeline as a directed acyclic graph of tasks.

1 / 5

At standup, a dev mentions defining a data pipeline as a set of tasks with explicit dependencies between them, forming a graph with no cycles, so a scheduler can determine the correct execution order. What is this graph called?
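As a study aid, the idea of deriving a run order from explicit dependencies can be sketched with Python's standard-library `graphlib`, without Airflow itself. The task names (`extract`, `transform`, `validate`, `load`) and the helper `execution_order` are hypothetical, chosen only for illustration; this is not how Airflow's scheduler is implemented.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each key maps to the set of tasks it depends on.
pipeline = {
    "extract": set(),
    "transform": {"extract"},
    "validate": {"transform"},
    "load": {"transform", "validate"},
}


def execution_order(dependencies):
    """Return one valid run order; raises CycleError if the graph has a cycle."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(pipeline)

# A dependency loop makes ordering impossible, which is why cycles are rejected.
cyclic = {"a": {"b"}, "b": {"a"}}
try:
    execution_order(cyclic)
    cycle_detected = False
except CycleError:
    cycle_detected = True
```

Because every task here depends on the one before it, only one order is valid. With independent branches, several orders would be valid and a scheduler could run those branches in parallel.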

2 / 5

During a design review, the team wants a failed task to automatically retry a configured number of times with a delay between attempts, rather than immediately marking the entire DAG run as failed. Which capability supports this?
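In Airflow this behavior is typically configured through the `retries` and `retry_delay` task arguments, often supplied via `default_args`. The sketch below shows such a settings dictionary and a plain-Python emulation of the semantics. The helper `run_with_retries` and its `sleep` parameter are hypothetical, not part of Airflow, and the values 3 and 5 minutes are arbitrary.

```python
import time
from datetime import timedelta

# Settings in the shape commonly passed to a DAG as default_args (values are illustrative).
default_args = {"retries": 3, "retry_delay": timedelta(minutes=5)}


def run_with_retries(task, retries, retry_delay, sleep=time.sleep):
    """Hypothetical helper: call task(), retrying up to `retries` times after a failure.

    Returns (result, attempts). Re-raises the last error once retries are exhausted.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return task(), attempts
        except Exception:
            if attempts > retries:
                raise  # no retries left: the task is considered failed
            sleep(retry_delay.total_seconds())  # wait before the next attempt
```

Note that `retries=3` allows up to four attempts in total: the first try plus three retries.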

3 / 5

In a code review, a dev notices that a DAG is scheduled at a defined interval and that Airflow automatically creates a run for every past interval since the DAG's start date that has not yet been executed. What is this behavior called?
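To make the scenario concrete, the following sketch enumerates which past intervals would still need a run, assuming a fixed-length interval and assuming a run covers an interval only once that interval has fully elapsed. The function `missed_intervals` and its parameters are hypothetical; in Airflow the behavior is controlled by the DAG's `catchup` argument rather than by code like this.

```python
from datetime import datetime, timedelta


def missed_intervals(start_date, interval, now, already_run=()):
    """Hypothetical helper: list (start, end) intervals that have ended but not run."""
    done = set(already_run)
    pending = []
    start = start_date
    # Only intervals that have completely elapsed are eligible for a run.
    while start + interval <= now:
        if start not in done:
            pending.append((start, start + interval))
        start += interval
    return pending


pending = missed_intervals(
    start_date=datetime(2024, 1, 1),
    interval=timedelta(days=1),
    now=datetime(2024, 1, 4, 12, 0),
)
```

With a daily interval, a start date of 1 January, and a current time of midday on 4 January, three intervals have fully elapsed, so three runs would be pending. The interval that began on 4 January is still open and is not included.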

4 / 5

An incident report shows a transient network error caused a single task partway through a large DAG to fail, and because that task had no retry configured, the entire DAG run was marked failed and had to be manually restarted from the beginning. What practice would prevent this?

5 / 5

During a PR review, a teammate asks why the team defines its pipeline as an Airflow DAG instead of a single monolithic script running every step sequentially in a fixed order. What is the reasoning?