English for Airflow Orchestration

Learn the English vocabulary for Apache Airflow: DAGs, tasks, operators, sensors, and backfills, explained for discussing data pipelines clearly.

Airflow’s whole design revolves around DAGs — directed acyclic graphs of tasks — and precise vocabulary about how those tasks relate, retry, and get backfilled is what makes a pipeline discussion useful instead of a vague description of “the job that runs every night.”

Key Vocabulary

DAG (Directed Acyclic Graph) — the top-level definition of a pipeline: a collection of tasks and the dependencies between them, with no cycles allowed, scheduled to run on a defined interval. “The nightly ETL DAG has twelve tasks, with the extraction tasks running in parallel and the transformation task waiting on all of them to finish.”
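To make "dependencies" and "no cycles" concrete, here is a minimal sketch in plain Python using the standard library's graphlib module. It is not Airflow code; the task names and the helper functions run_order and is_acyclic are hypothetical, chosen only to mirror the example sentence above.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it waits on.
nightly_etl = {
    "extract_orders": set(),
    "extract_customers": set(),
    "transform": {"extract_orders", "extract_customers"},
    "load_to_warehouse": {"transform"},
}


def run_order(dependencies):
    """Return batches of tasks; tasks in one batch have no dependency on each other."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises CycleError if the graph contains a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose upstream tasks are done
        batches.append(ready)
        sorter.done(*ready)
    return batches


def is_acyclic(dependencies):
    """True if the dependency graph is a valid DAG (no task waits on itself, even indirectly)."""
    try:
        TopologicalSorter(dependencies).prepare()
        return True
    except CycleError:
        return False


for batch in run_order(nightly_etl):
    print(batch)
```

The two extraction tasks land in the same batch, which is what "running in parallel" means in the example sentence, while the transformation task only appears once both have finished.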

Task — a single unit of work within a DAG, an instance of an operator configured with specific parameters, representing one node in the DAG’s graph. “The extract_orders task pulls yesterday’s orders from the source database — it’s one task among several that make up the full DAG.”

Operator — a template defining what kind of work a task performs (running a Python function, executing a SQL query, triggering another DAG), instantiated with specific arguments to become a task. “We’re using the PythonOperator for the transformation step and the PostgresOperator for the final load — the operator determines what mechanism actually executes the work.”
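The operator/task distinction is close to the class/instance distinction in ordinary programming. The sketch below illustrates that idea in plain Python; PythonCallableOperator and count_rows are hypothetical stand-ins invented for this example, not Airflow's real classes.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class PythonCallableOperator:
    """Hypothetical operator: a template for 'run a Python function'."""

    task_id: str
    python_callable: Callable[..., Any]
    op_kwargs: dict

    def execute(self):
        # The operator decides *how* work runs; the arguments decide *what* runs.
        return self.python_callable(**self.op_kwargs)


def count_rows(table, rows):
    """Hypothetical unit of work used by both tasks below."""
    return f"{table}: {rows} rows"


# One operator (the template), two tasks (configured instances of it).
extract_orders = PythonCallableOperator(
    "extract_orders", count_rows, {"table": "orders", "rows": 120}
)
extract_customers = PythonCallableOperator(
    "extract_customers", count_rows, {"table": "customers", "rows": 45}
)

print(extract_orders.execute())
print(extract_customers.execute())
```

Both tasks share one operator, so a question like "what operator is this task using?" is really asking which template, and therefore which execution mechanism, sits behind the task.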

Sensor — a special kind of operator that waits for a condition to become true (a file to appear, another DAG to finish, a database row to exist) before allowing downstream tasks to proceed. “The DAG starts with a sensor that polls for the source file’s arrival — nothing else runs until that file actually shows up in the bucket.”
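The "waits for a condition" behavior amounts to a polling loop with a timeout. The following is a minimal sketch of that loop in plain Python, not Airflow's implementation; wait_for and FakeBucket are hypothetical names, and the parameter names poke_interval and timeout are borrowed here only because they echo common sensor terminology.

```python
import time


def wait_for(condition, poke_interval=0.01, timeout=1.0):
    """Poll condition() until it returns True or the timeout elapses.

    Returns True if the condition was met, False if the wait timed out.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poke_interval)


class FakeBucket:
    """Hypothetical storage bucket whose file 'arrives' on the Nth check."""

    def __init__(self, appears_after_checks):
        self.checks = 0
        self.appears_after_checks = appears_after_checks

    def file_exists(self):
        self.checks += 1
        return self.checks >= self.appears_after_checks


bucket = FakeBucket(appears_after_checks=3)
if wait_for(bucket.file_exists, poke_interval=0.001, timeout=1.0):
    print("file arrived, downstream tasks may start")
else:
    print("sensor timed out")
```

Note that the sensor does no processing of its own: it only reports whether the condition was met, which is why a timed-out sensor usually points to a problem outside the pipeline.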

Backfill — the process of running a DAG for past scheduled intervals that were missed or need reprocessing, distinct from its regular forward-looking scheduled runs. “After fixing the transformation bug, we ran a backfill for the last two weeks of intervals to correct the historical data, rather than leaving it wrong until new data caught up.”
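A backfill can be pictured as re-running the same pipeline once per past interval. The sketch below shows that idea in plain Python for daily intervals; it is an illustration under assumed names (daily_intervals, backfill, and the run_dag callback are all hypothetical), not how Airflow schedules backfills internally.

```python
from datetime import date, timedelta


def daily_intervals(start, end):
    """Yield (interval_start, interval_end) for each day in the half-open range [start, end)."""
    day = start
    while day < end:
        yield day, day + timedelta(days=1)
        day += timedelta(days=1)


def backfill(start, end, run_dag):
    """Call run_dag once per past daily interval and collect the results."""
    return [run_dag(s, e) for s, e in daily_intervals(start, end)]


def describe_run(interval_start, interval_end):
    """Hypothetical stand-in for 'run the whole DAG for this interval'."""
    return f"reprocessed {interval_start} to {interval_end}"


# Two weeks of daily intervals, as in the example sentence above.
results = backfill(date(2024, 1, 1), date(2024, 1, 15), describe_run)
print(len(results), "runs;", results[0])
```

Each run receives its own interval, so the pipeline reprocesses the data that belonged to that day rather than whatever data is newest at the time of the rerun.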

Common Phrases

  • “Is this a DAG-level failure, or did just one task fail?”
  • “What operator is this task using, and does that explain the timeout?”
  • “Is this a sensor waiting on an external condition, or an actual processing task?”
  • “Do we need to backfill this, or is it fine to just let new runs be correct going forward?”
  • “Are these tasks running in parallel, or is there a dependency forcing them to run sequentially?”

Example Sentences

Reporting a pipeline failure: “The load_to_warehouse task failed on today’s DAG run — upstream tasks succeeded, so this looks isolated to a connection timeout on the load step specifically, not a broader pipeline issue.”

Explaining a design decision in a review: “We used a sensor to wait for the upstream team’s export to land before starting our own processing. Polling for the file directly means we don’t have to rely on a fragile fixed schedule offset.”

Requesting a data fix: “Since the enrichment task was silently dropping a field for the last five days, we’ll need to backfill those five daily intervals once the fix is deployed, not just let it self-correct going forward.”

Professional Tips

  • Say DAG when referring to the whole pipeline and task when referring to one step — conflating them (“the DAG failed” when only one task failed) makes triage slower for whoever picks up the incident.
  • Name the operator in play when a task behaves unexpectedly — a timeout on a PythonOperator task has different likely causes than one on a PostgresOperator task.
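  • Distinguish a sensor from a processing task explicitly: a sensor stuck waiting can look just like a stuck processing task on a dashboard, but the fix is different. A waiting sensor usually means the external condition has not been met, while a stuck processing task points to a problem in the task’s own work.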
  • Always state whether a fix requires a backfill — a bug fix without a backfill leaves historical data silently wrong, which is easy to forget once the pipeline is green again.

Practice Exercise

  1. Write a sentence distinguishing a DAG from a task.
  2. Describe what a sensor does differently from a regular processing task.
  3. Explain when you’d need to backfill a DAG after a bug fix.