English for Airflow Orchestration
Learn the English vocabulary for Apache Airflow: DAGs, tasks, operators, sensors, and backfills, explained for discussing data pipelines clearly.
Airflow’s whole design revolves around DAGs — directed acyclic graphs of tasks — and precise vocabulary about how those tasks relate, retry, and get backfilled is what makes a pipeline discussion useful instead of a vague description of “the job that runs every night.”
Key Vocabulary
DAG (Directed Acyclic Graph) — the top-level definition of a pipeline: a collection of tasks and the dependencies between them, with no cycles allowed, scheduled to run on a defined interval. “The nightly ETL DAG has twelve tasks, with the extraction tasks running in parallel and the transformation task waiting on all of them to finish.”
Task — a single unit of work within a DAG, an instance of an operator configured with specific parameters, representing one node in the DAG’s graph.
“The extract_orders task pulls yesterday’s orders from the source database — it’s one task among several that make up the full DAG.”
Operator — a template defining what kind of work a task performs (running a Python function, executing a SQL query, triggering another DAG), instantiated with specific arguments to become a task.
“We’re using the PythonOperator for the transformation step and the PostgresOperator for the final load — the operator determines what mechanism actually executes the work.”
Sensor — a special kind of operator that waits for a condition to become true (a file to appear, another DAG to finish, a database row to exist) before allowing downstream tasks to proceed. “The DAG starts with a sensor that polls for the source file’s arrival — nothing else runs until that file actually shows up in the bucket.”
Backfill — the process of running a DAG for past scheduled intervals that were missed or need reprocessing, distinct from its regular forward-looking scheduled runs. “After fixing the transformation bug, we ran a backfill for the last two weeks of intervals to correct the historical data, rather than leaving it wrong until new data caught up.”
Common Phrases
- “Is this a DAG-level failure, or did just one task fail?”
- “What operator is this task using, and does that explain the timeout?”
- “Is this a sensor waiting on an external condition, or an actual processing task?”
- “Do we need to backfill this, or is it fine to just let new runs be correct going forward?”
- “Are these tasks running in parallel, or is there a dependency forcing them sequential?”
Example Sentences
Reporting a pipeline failure:
“The load_to_warehouse task failed on today’s DAG run — upstream tasks succeeded, so this looks isolated to a connection timeout on the load step specifically, not a broader pipeline issue.”
Explaining a design decision in a review: “We used a sensor to wait for the upstream team’s export to land before starting our own processing — polling for the file directly avoids us needing a fragile fixed schedule offset.”
Requesting a data fix: “Since the enrichment task was silently dropping a field for the last five days, we’ll need to backfill those five daily intervals once the fix is deployed, not just let it self-correct going forward.”
Professional Tips
- Say DAG when referring to the whole pipeline and task when referring to one step — conflating them (“the DAG failed” when only one task failed) makes triage slower for whoever picks up the incident.
- Name the operator in play when a task behaves unexpectedly — a timeout on a
PythonOperatortask has different likely causes than one on aPostgresOperatortask. - Distinguish a sensor from a processing task explicitly — a sensor stuck waiting looks identical to a stuck task in a dashboard, but the fix is completely different.
- Always state whether a fix requires a backfill — a bug fix without a backfill leaves historical data silently wrong, which is easy to forget once the pipeline is green again.
Practice Exercise
- Write a sentence distinguishing a DAG from a task.
- Describe what a sensor does differently from a regular processing task.
- Explain when you’d need to backfill a DAG after a bug fix.