English for Datadog Observability

Learn the English vocabulary for working with Datadog: monitors, dashboards, tags, SLOs, and the terms for discussing observability with your team.

Datadog unifies metrics, logs, traces, and synthetic checks in one platform. The terms teams use to talk about it (monitors, tags, SLOs) carry specific meanings that are worth getting right, especially when an alert fires at 3 a.m. and precision saves time.

Key Vocabulary

Monitor — a configured rule that evaluates a metric, log query, or trace condition against a threshold and triggers a notification when the threshold is breached. It is Datadog’s term for what other tools call an alert rule. “We set up a monitor on p99 latency that pages on-call if it stays above 500ms for five consecutive minutes.”
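To make the condition in that example sentence concrete, here is a minimal Python sketch of the rule "above 500ms for five consecutive minutes." The function name should_page and the one-sample-per-minute input are assumptions made for illustration. It is a toy model of a threshold check, not a description of how Datadog evaluates monitors internally.

```python
from typing import Sequence


def should_page(p99_ms: Sequence[float], threshold_ms: float = 500.0,
                consecutive: int = 5) -> bool:
    """Return True when the most recent `consecutive` samples all exceed the threshold.

    Assumes one p99 latency sample per minute, oldest first (hypothetical input shape).
    """
    if len(p99_ms) < consecutive:
        return False  # not enough data to cover the evaluation window
    recent = p99_ms[-consecutive:]
    return all(sample > threshold_ms for sample in recent)


# A single dip below the threshold inside the window prevents the page.
print(should_page([480, 510, 520, 530, 540, 550]))  # True
print(should_page([510, 520, 490, 530, 540, 550]))  # False
```

The second call shows why a longer evaluation window reduces noise: one sample at 490ms resets the "consecutive" condition, so a brief spike does not page anyone.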

Tag — a key-value label (like env:production or service:checkout) attached to metrics, logs, and traces that lets you filter, group, and correlate data across all three signal types consistently. “Every service ships with a team tag now, so we can filter the entire dashboard down to just our team’s services instead of scrolling through everyone else’s.”
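As a rough illustration of filtering by tag, the Python sketch below treats tags as plain "key:value" strings and selects services by one tag. The service inventory, the helper names parse_tags and services_with_tag, and the team values are all hypothetical; real filtering happens inside Datadog, not in code like this.

```python
from typing import Dict, Iterable, List


def parse_tags(tags: Iterable[str]) -> Dict[str, str]:
    """Split "key:value" strings into a dict; a tag with no colon maps to an empty value."""
    parsed: Dict[str, str] = {}
    for tag in tags:
        key, _, value = tag.partition(":")
        parsed[key] = value
    return parsed


def services_with_tag(services: List[dict], key: str, value: str) -> List[str]:
    """Return the names of services that carry the tag key:value."""
    return [s["name"] for s in services if parse_tags(s["tags"]).get(key) == value]


# Hypothetical inventory, invented for this example.
SERVICES = [
    {"name": "checkout", "tags": ["env:production", "service:checkout", "team:payments"]},
    {"name": "search", "tags": ["env:production", "service:search", "team:discovery"]},
    {"name": "cart", "tags": ["env:production", "service:cart"]},  # no team tag
]

print(services_with_tag(SERVICES, "team", "payments"))  # ['checkout']
```

Note that cart has no team tag, so it never appears in any team-filtered result. That is the practical reason teams insist on consistent tagging.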

SLO (Service Level Objective) — a target reliability threshold (like 99.9% of requests succeeding over 30 days) tracked against a defined SLI, with an error budget that shows how much unreliability is left before the target is breached. “Our checkout SLO is 99.9% success over a rolling 30 days — we’re currently burning error budget faster than the month’s pace allows, which is why we froze non-critical deploys.”
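The arithmetic behind an error budget is simple enough to sketch in Python. The request counts below are made-up numbers, and the function name error_budget is an assumption for this example; the sketch only shows how a 99.9% target translates into allowed failures and a consumed fraction.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Compute the error budget for one SLO window from request counts."""
    allowed_failures = (1.0 - slo_target) * total_requests
    if allowed_failures <= 0:
        raise ValueError("a 100% target leaves no error budget to measure against")
    consumed = failed_requests / allowed_failures
    return {
        "allowed_failures": allowed_failures,
        "consumed_fraction": consumed,
        "remaining_fraction": 1.0 - consumed,
    }


# Illustrative numbers: 1,000,000 requests in the window, 600 of them failed.
budget = error_budget(slo_target=0.999, total_requests=1_000_000, failed_requests=600)
print(f"{budget['allowed_failures']:.0f} failures allowed")     # 1000 failures allowed
print(f"{budget['consumed_fraction']:.0%} of budget consumed")  # 60% of budget consumed
```

With a 99.9% target, one request in a thousand may fail. Here 600 of the 1,000 allowed failures have occurred, so a speaker would say "we have burned sixty percent of the error budget."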

Dashboard — a curated collection of widgets (timeseries graphs, query values, heatmaps) built to give an at-a-glance view of a service’s or team’s health, distinct from ad-hoc exploration in the metrics explorer. “The on-call dashboard shows error rate, latency, and saturation for every service in one view — it’s the first thing anyone opens when a page comes in.”

Faceted search (log facets) — filtering and pivoting log data on facets, the structured attributes extracted from logs (status code, endpoint, user ID), without writing a full-text search query. “Instead of grepping the raw log message, we filtered by the http.status_code facet directly — it’s indexed and much faster than a text search across millions of log lines.”

Common Phrases

  • “Is this a monitor threshold breach, or just noisy data that needs a longer evaluation window?”
  • “Are these services tagged consistently, or is that why the dashboard is missing some of them?”
  • “How much error budget is left on this SLO before we need to freeze deploys?”
  • “Is this dashboard curated for on-call, or is it more of an exploration view?”
  • “Can we filter this by a log facet, or do we need a full-text search here?”

Example Sentences

Reporting an incident trigger: “The page came from a monitor on error rate exceeding 5% over a five-minute window — it correctly caught the regression about ninety seconds after the bad deploy went out.”

Explaining a tagging convention in onboarding: “Every service needs env, team, and service tags at minimum — without them, a service won’t show up correctly on the shared dashboards or in cross-team queries.”

Discussing SLO status in a review: “We’ve burned sixty percent of this month’s error budget already, mostly from the incident on the 15th — if we don’t slow down on risky deploys, we’ll breach the SLO before month end.”

Professional Tips

  • Name the specific monitor that triggered a page in incident reports — “an alert fired” without the monitor name forces the reader to go hunting for context that should already be in the report.
  • Enforce tag consistency early and mention it in onboarding — inconsistent tagging is the single most common reason a dashboard silently excludes a service.
  • Reference error budget remaining, not just the SLO target, when discussing release risk — a team with budget left can take more risk than a team that has already spent its budget.
  • Distinguish a curated dashboard from ad-hoc exploration when pointing someone to a view — “check the dashboard” is more useful when it’s clear which one and why it’s the right one for the question.

Practice Exercise

  1. Write a sentence describing a monitor and the condition that triggers it.
  2. Explain what an SLO and an error budget mean in your own words.
  3. Describe how tags help correlate data across metrics, logs, and traces.