English for OpenTelemetry

Learn the English vocabulary for OpenTelemetry, including traces, spans, and instrumentation, so you can discuss distributed observability clearly.

In a distributed system, “Where’s the latency coming from?” is unanswerable without the right vocabulary, because the delay could be in any one of a dozen services. Traces, spans, and instrumentation are the words that let a team point at the exact hop that’s slow.

Key Vocabulary

Trace — the complete record of a single request’s journey through a distributed system, made up of all the spans generated as it passes through each service. “Pull up the trace for that slow request — it’ll show us exactly which service in the chain added the extra 400 milliseconds.”

Span — a single unit of work within a trace, representing one operation (like a database query or an API call) with a start time, duration, and metadata, nested to show the call hierarchy. “The span for the database query is taking 300 of the trace’s 400 milliseconds — that’s where we should focus, not the API gateway.”
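
To make the trace and span relationship concrete, here is a minimal sketch in plain Python. It is not the OpenTelemetry SDK: the `Span` class, its field names, and the `bottleneck` helper are hypothetical, and the numbers are illustrative. It assumes child spans run one after another, so a parent's own time is its duration minus the durations of its direct children.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    trace_id: str             # shared by every span in the same trace
    span_id: str
    parent_id: Optional[str]  # None marks the root span
    name: str
    start_ms: float
    duration_ms: float


def self_time_ms(span: Span, spans: List[Span]) -> float:
    """Duration of a span minus the time spent in its direct children.

    Assumes children run sequentially; parallel children would need
    interval merging instead of a simple sum.
    """
    child_total = sum(s.duration_ms for s in spans if s.parent_id == span.span_id)
    return span.duration_ms - child_total


def bottleneck(spans: List[Span]) -> Span:
    """Return the span with the largest self time."""
    return max(spans, key=lambda s: self_time_ms(s, spans))


# One trace: a 400 ms request whose database query takes 300 ms.
trace = [
    Span("t1", "a", None, "GET /orders", 0.0, 400.0),
    Span("t1", "b", "a", "api-gateway routing", 5.0, 40.0),
    Span("t1", "c", "a", "db query", 50.0, 300.0),
]

print(bottleneck(trace).name)
```

Here the root span covers the whole request, but most of its time belongs to the nested "db query" span, which is the span a teammate would name when pointing at the slow hop.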

Instrumentation — the code (manual or automatic) that generates spans and traces as a service processes requests, the actual mechanism by which observability data gets collected in the first place. “We added instrumentation to the payment service last sprint — before that, it was a black box in every trace, just a gap with no visibility into what it was doing.”
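
As a rough illustration of what manual instrumentation does, the sketch below wraps operations in a context manager that records a span for each one. It uses only the Python standard library, and the names `span`, `finished_spans`, and `charge_card` are hypothetical. A real SDK would track the active span per thread or task; this version keeps a single global stack for brevity.

```python
import time
import uuid
from contextlib import contextmanager

finished_spans = []  # stand-in for an exporter: completed spans land here
_active = []         # stack of open spans (not thread-safe; sketch only)


@contextmanager
def span(name):
    """Record one span around the wrapped block, nested under the open span."""
    record = {
        "name": name,
        "span_id": uuid.uuid4().hex[:16],
        "parent_id": _active[-1]["span_id"] if _active else None,
    }
    _active.append(record)
    start = time.perf_counter()
    try:
        yield record
    finally:
        record["duration_ms"] = (time.perf_counter() - start) * 1000
        _active.pop()
        finished_spans.append(record)


def charge_card(amount):
    # Instrumented code path: each `with span(...)` produces one span.
    with span("charge_card"):
        with span("db.insert_payment"):
            time.sleep(0.01)  # placeholder for real work
        return "ok"


charge_card(25)
for s in finished_spans:
    print(s["name"], round(s["duration_ms"], 1), "ms")
```

Without the `with span(...)` lines, `charge_card` would still run, but nothing would be recorded, which is what "a black box in the trace" means in practice.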

Context propagation — passing trace identifiers between services as a request flows through them, so spans generated by different services can be correctly linked into a single trace instead of appearing as unrelated fragments. “The trace breaks at the queue boundary because context propagation isn’t wired up for async jobs — spans before and after the queue show up as two disconnected traces instead of one.”
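
The sketch below shows the idea behind context propagation using the W3C Trace Context `traceparent` header, which carries a version, a 32-hex-digit trace ID, a 16-hex-digit parent span ID, and trace flags. The helper names `inject`, `extract`, and `start_child` are hypothetical, and the code is a simplified standard-library illustration rather than the OpenTelemetry propagator API (for example, it does not reject all-zero IDs).

```python
import re
import secrets

# version "00" - trace-id (32 hex) - parent-id (16 hex) - flags (2 hex)
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def inject(headers, trace_id, span_id, sampled=True):
    """Caller side: write the current trace context into outgoing headers."""
    flags = "01" if sampled else "00"
    headers["traceparent"] = f"00-{trace_id}-{span_id}-{flags}"


def extract(headers):
    """Callee side: return (trace_id, parent_span_id), or None if absent/invalid."""
    match = TRACEPARENT_RE.match(headers.get("traceparent", ""))
    if match is None:
        return None
    trace_id, parent_id, _flags = match.groups()
    return trace_id, parent_id


def start_child(headers):
    """Start a span that joins the incoming trace when context was propagated."""
    ctx = extract(headers)
    if ctx is None:
        # No context arrived: this span starts a new, disconnected trace.
        trace_id, parent_id = secrets.token_hex(16), None
    else:
        trace_id, parent_id = ctx
    return {"trace_id": trace_id, "span_id": secrets.token_hex(8), "parent_id": parent_id}


outgoing = {}
inject(outgoing, "4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7")
linked = start_child(outgoing)   # same trace_id as the caller
orphan = start_child({})         # header missing: a separate trace
print(linked["trace_id"] == "4bf92f3577b34da6a3ce929d0e0e4736", orphan["parent_id"])
```

The `orphan` case is what a broken queue boundary looks like: the consumer never receives the header, so its spans start a fresh trace instead of joining the producer's.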

Collector — the OpenTelemetry component that receives, processes, and exports telemetry data from instrumented services to a backend (like Jaeger or Datadog) for storage and visualization. “Traces stopped showing up in the dashboard because the collector’s been down for an hour — services are still generating spans, they’re just not reaching anywhere to be stored.”
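
The receive, process, and export stages can be pictured with a toy pipeline. This is only a conceptual sketch in plain Python: the real Collector is a separate process configured with receivers, processors, and exporters, and the `MiniCollector` class, its batch size, and the `user.email` attribute here are all hypothetical.

```python
class MiniCollector:
    """Toy receive -> process -> export pipeline for span records (dicts)."""

    def __init__(self, exporter, batch_size=2):
        self._exporter = exporter      # callable that ships a batch to a backend
        self._batch_size = batch_size
        self._buffer = []

    def receive(self, span):
        """Accept one span from an instrumented service."""
        self._buffer.append(span)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self):
        """Process buffered spans and hand them to the exporter as one batch."""
        if self._buffer:
            batch = [self._process(s) for s in self._buffer]
            self._buffer = []
            self._exporter(batch)

    @staticmethod
    def _process(span):
        # Example processing step: drop a sensitive attribute before export.
        return {k: v for k, v in span.items() if k != "user.email"}


exported = []  # stand-in for a tracing backend
collector = MiniCollector(exported.append, batch_size=2)
collector.receive({"name": "GET /orders", "user.email": "a@example.com"})
collector.receive({"name": "db query"})
print(exported)
```

If this component stops running, services keep producing spans, but nothing calls the exporter, which matches the "collector's been down" scenario in the example sentence.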

Common Phrases

  • “Can we pull up the trace for that request and see which span is slow?”
  • “Is this service instrumented yet, or is it still a black box in the trace?”
  • “Is context propagation actually wired up across this service boundary?”
  • “Is the collector healthy, or is that why traces are missing?”
  • “Which span is actually the bottleneck here?”

Example Sentences

Diagnosing latency with a trace: “Looking at the trace, the request spends 50ms in the API gateway and 20ms in auth, but then 800ms in a single span for the recommendation service — that’s clearly where the latency budget is going.”

Explaining a visibility gap: “We can’t see what’s happening in the legacy billing service during incidents because it’s not instrumented — every trace just shows a gap where that service should be, so we’re debugging blind for that hop.”

Describing a broken trace during a retro: “The trace was fragmenting into pieces because context propagation wasn’t implemented for the message queue — spans on either side of the queue looked like two unrelated requests instead of one continuous trace.”

Professional Tips

  • Say span, not “step” or “part,” when pointing at a specific slow operation within a trace — it’s the exact unit observability tools measure, and using it precisely gets you a faster answer from whoever’s reading the trace.
  • Flag missing instrumentation directly as a gap, not just “we don’t have visibility” — naming the specific uninstrumented service tells the team exactly what to fix.
  • Check context propagation first when a trace looks fragmented across an async boundary (queues, background jobs); it’s a common reason traces break apart instead of linking correctly.
  • Confirm the collector is healthy before assuming instrumentation itself is broken — a missing trace can mean nothing was ever generated, or that it was generated and lost in transit.
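
The last tip can be read as a short decision procedure. The sketch below assumes you can obtain three counts for the same time window: spans created by the service, spans received by the collector, and spans the collector exported. The function and parameter names are hypothetical, and how you collect these counts depends on your setup.

```python
def diagnose_missing_traces(spans_created, spans_received, spans_exported):
    """Point at the first stage where spans go missing."""
    if spans_created == 0:
        # Nothing was ever generated: look at the service's instrumentation.
        return "instrumentation"
    if spans_received < spans_created:
        # Spans were generated but did not all reach the collector.
        return "transit to collector"
    if spans_exported < spans_received:
        # The collector got the spans but did not pass them all on.
        return "collector export"
    # Everything was exported: check the backend or the dashboard query.
    return "backend"


print(diagnose_missing_traces(0, 0, 0))
print(diagnose_missing_traces(120, 0, 0))
print(diagnose_missing_traces(120, 120, 0))
```

Naming the stage ("instrumentation", "transit", "export") gives the team a more precise sentence than "traces are missing".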

Practice Exercise

  1. Write a sentence explaining the relationship between a trace and a span.
  2. Explain what context propagation does and why it matters across service boundaries.
  3. Describe how you’d diagnose whether missing trace data is an instrumentation problem or a collector problem.