English for Grafana Alerting
Learn the English vocabulary for Grafana's alerting system: alert rules, contact points, notification policies, and silences.
Most confusion around Grafana alerting comes from mixing up two separate questions — “did this condition actually fire?” and “did the notification actually reach anyone?” — and the vocabulary below exists mainly to keep those two questions apart.
Key Vocabulary
Alert rule — a query paired with a condition and an evaluation interval, which Grafana checks on a schedule to decide whether an alert should fire. “The alert rule itself was correct — it just had a five-minute evaluation interval, so we waited longer than we expected to see it fire.”
Contact point — the destination a firing alert is sent to, such as an email address, Slack channel, or PagerDuty integration, configured independently of the alert rule that triggers it. “The alert fired exactly as expected — the notification just never arrived because the contact point had an outdated Slack webhook URL.”
Notification policy — the routing logic that decides which contact point an alert goes to, based on matching labels, and whether to group or mute certain notifications.
“We weren’t missing alerts — the notification policy was routing everything with the team: payments label to a channel nobody on the current team was watching.”
Silence — a temporary, explicit suppression of notifications matching certain labels, typically used during planned maintenance so real alerts don’t get lost in expected noise. “We created a silence for the database labels during the migration window, so the on-call engineer wouldn’t get paged for expected, self-resolving alerts.”
Flapping — when an alert repeatedly transitions between firing and resolved in quick succession, usually because a threshold sits too close to the metric’s normal variance. “This alert was flapping every few minutes, not because the system was actually unstable, but because the threshold was set right on the edge of normal noise.”
Common Phrases
- “Did the alert rule actually fire, or is this a contact point delivery problem?”
- “Is the notification policy routing this to the right team, or is it matching the wrong label?”
- “Should we put a silence on this during the maintenance window, or leave it live?”
- “Is this alert flapping because the threshold is too tight, or because something is genuinely unstable?”
- “What’s the evaluation interval on this rule — is it actually checking often enough to catch this?”
Example Sentences
Diagnosing a missed alert in a postmortem: “The alert rule fired on time — the real gap was that the contact point pointed at a Slack channel that had been archived.”
Explaining a routing fix:
“We updated the notification policy so anything labeled severity: critical always reaches PagerDuty, regardless of which team owns it.”
Describing maintenance handling: “We scheduled a silence for the exact maintenance window instead of disabling the alert rule entirely, so it re-arms automatically afterward.”
Professional Tips
- Separate an alert rule firing from a contact point delivering when debugging a missed alert — they fail independently and need different fixes.
- Audit notification policy label matching periodically — a routing rule that made sense under an old team structure silently misroutes alerts after a reorg.
- Use a silence instead of disabling an alert rule for planned maintenance — it self-expires and avoids someone forgetting to re-enable the rule afterward.
- Treat a flapping alert as a threshold-tuning problem first, not a system-stability problem, unless other evidence says otherwise.
Practice Exercise
- Explain the difference between an alert rule and a contact point.
- Describe what a notification policy does and why label matching matters.
- Write a sentence explaining why a silence is preferable to disabling an alert rule during maintenance.