DevOps Engineer English Essentials

55 terms and 20 phrases DevOps and SRE engineers use under pressure — across CI/CD, containers, infrastructure, and the high-stakes English of incident updates and on-call escalation.

CI/CD

pipeline
The automated sequence of steps that builds, tests and ships your code.
CI
Continuous Integration — merging and testing changes frequently and automatically.
CD
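Continuous Delivery (every change kept ready to release, usually with a manual approval step) or Continuous Deployment (every passing change shipped to production automatically).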
build
Compiling and packaging the code into a runnable form.
artifact
The output of a build (a binary, an image, a zip) stored for later deploy.
deploy
Releasing a new version of the software to an environment.
rollback
Reverting to the previous known-good version after a bad deploy.
canary
Releasing to a small slice of traffic first to catch problems early.
blue-green
Running two identical environments and switching traffic between them.
feature flag
A toggle that turns a feature on or off without redeploying.
staging
A production-like environment used to test before going live.
smoke test
A quick check after deploy that the basics still work.
gate
A required check (tests, approval) that must pass before the pipeline proceeds.
runner / agent
The machine that executes pipeline jobs.
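
As a rough illustration of how a feature flag can double as a canary, the sketch below hashes a user ID into a bucket from 0 to 99 and enables the feature only for buckets below a rollout percentage. The function name `is_enabled`, the flag name `new-checkout` and the user IDs are hypothetical; real flag services add targeting rules and remote configuration that are not shown here.

```python
import hashlib


def is_enabled(flag_name: str, user_id: str, rollout_percent: int) -> bool:
    """Return True if this user falls inside the rollout slice for the flag.

    Hypothetical helper: hashing the flag name plus the user ID gives each
    user a stable bucket (0-99), so the same user always gets the same answer.
    """
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < rollout_percent


if __name__ == "__main__":
    # Illustrative user IDs only; not taken from any real system.
    users = [f"user-{n}" for n in range(1000)]
    for percent in (0, 10, 100):
        enabled = sum(is_enabled("new-checkout", u, percent) for u in users)
        print(f"{percent}% rollout -> {enabled} of {len(users)} users enabled")
```

Raising `rollout_percent` from 10 to 100 promotes the canary, and setting it to 0 acts as an instant rollback without a redeploy.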

Containers & orchestration

container
A lightweight, isolated running instance of an app and its dependencies, started from an image.
image
The immutable template a container is started from.
registry
A store for container images (Docker Hub, ECR, GHCR).
pod
The smallest deployable unit in Kubernetes — one or more containers together.
node
A worker machine in a cluster that runs pods.
cluster
A set of nodes managed together by an orchestrator like Kubernetes.
orchestration
Automatically scheduling, scaling and healing containers.
autoscaling
Adding or removing instances automatically based on load.
horizontal scaling
Adding more instances; vertical scaling means making each one bigger.
service mesh
A layer that manages traffic, security and observability between services.
ingress
The rules that route external traffic into a cluster.
sidecar
A helper container running alongside the main one in the same pod.
liveness probe
A health check that restarts a container if it stops responding.
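
To make autoscaling concrete, here is a minimal sketch of the proportional rule described in the Kubernetes Horizontal Pod Autoscaler documentation: desired replicas = ceil(current replicas × current metric / target metric). The function name and the `min_replicas` / `max_replicas` defaults are illustrative assumptions, and the real controller also applies a tolerance band and stabilization behaviour that this sketch ignores.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,   # assumed lower bound, for illustration
    max_replicas: int = 10,  # assumed upper bound, for illustration
) -> int:
    """Scale the replica count in proportion to how far the metric is from target."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    # Clamp to the configured bounds so a spike cannot scale without limit.
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # 4 pods averaging 90% CPU against a 60% target: scale out.
    print(desired_replicas(4, 90, 60))
    # 4 pods averaging 30% CPU against a 60% target: scale in.
    print(desired_replicas(4, 30, 60))
```

This is horizontal scaling: the answer is always a number of instances, never a bigger instance.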

Infrastructure

provision
To create and configure infrastructure (servers, networks, databases).
IaC
Infrastructure as Code — defining infra in version-controlled files.
state
The record an IaC tool keeps of the infrastructure it manages.
drift
When real infrastructure no longer matches what the code declares.
idempotent
Describes an operation that gives the same end result whether you run it once or many times.
immutable infrastructure
Replacing servers instead of changing them in place.
secret
A sensitive value (password, token) stored and injected securely.
environment
A named deployment target — dev, staging, production.
load balancer
A component that distributes traffic across instances.
reverse proxy
A server that forwards client requests to backend services.
DNS
The system that maps domain names to IP addresses.
TLS certificate
The credential that proves a server’s identity and enables encrypted HTTPS connections.
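
Drift and idempotency are easier to see side by side. The sketch below compares a declared configuration with the actual one and then reconciles them; running the reconcile step a second time makes no further changes, which is what idempotent means in practice. The names `detect_drift`, `reconcile` and `ABSENT`, and the sample settings, are hypothetical, and this is not how Terraform or any specific IaC tool is implemented.

```python
ABSENT = "<absent>"  # placeholder for a setting that exists on only one side


def detect_drift(declared: dict, actual: dict) -> dict:
    """Return {setting: (declared_value, actual_value)} for every mismatch."""
    drift = {}
    for key in sorted(declared.keys() | actual.keys()):
        want = declared.get(key, ABSENT)
        have = actual.get(key, ABSENT)
        if want != have:
            drift[key] = (want, have)
    return drift


def reconcile(declared: dict, actual: dict) -> tuple:
    """Return (new_actual_state, number_of_changes_made).

    Idempotent: once the actual state matches the declaration,
    calling this again changes nothing.
    """
    changes = len(detect_drift(declared, actual))
    return dict(declared), changes


if __name__ == "__main__":
    declared = {"instance_type": "small", "replicas": 3}
    actual = {"instance_type": "small", "replicas": 5, "debug_port": 9000}

    print("drift:", detect_drift(declared, actual))
    state, first_run = reconcile(declared, actual)
    state, second_run = reconcile(declared, state)
    print("changes on first run:", first_run)
    print("changes on second run:", second_run)
```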

Reliability & on-call

SLO
Service Level Objective — the reliability target you commit to internally.
SLA
Service Level Agreement — the reliability promise made to customers.
SLI
Service Level Indicator — the actual metric you measure (e.g. % of fast requests).
error budget
The allowed amount of unreliability before you must stop shipping risky changes.
incident
An unplanned disruption to a service that needs a response.
severity (SEV)
How serious an incident is; in most schemes SEV1 is critical and SEV3 is minor.
on-call
Being responsible for responding to alerts during a shift.
alert
An automated notification that something is wrong.
pager / paging
The device or app that delivers urgent alerts; paging someone means notifying them urgently, often outside working hours, to handle an incident.
runbook
A step-by-step guide for handling a known operational task or incident.
postmortem
A blameless write-up after an incident explaining cause and fixes.
MTTR
Mean Time To Recovery — average time to restore service after failure.
observability
The ability to understand a system’s internal state from its outputs: logs, metrics and traces.
mitigation
A quick action that reduces impact before the root cause is fixed.
root cause
The underlying reason an incident happened, not just the symptom.
toil
Repetitive manual work that should be automated away.
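
The arithmetic behind an error budget and MTTR is simple enough to sketch. The example below turns an availability SLO into allowed minutes of downtime per window and averages a list of recovery times. The function names, the 30-day window and the sample numbers are assumptions for illustration; teams often define budgets over requests instead of minutes.

```python
from statistics import mean


def error_budget_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Minutes of downtime the SLO allows over the window."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (100.0 - slo_percent) / 100.0


def budget_remaining_minutes(
    slo_percent: float, downtime_minutes: float, window_days: int = 30
) -> float:
    """Budget left after the downtime already spent (negative means overspent)."""
    return error_budget_minutes(slo_percent, window_days) - downtime_minutes


def mttr_minutes(recovery_times_minutes: list) -> float:
    """Mean Time To Recovery: the average of the individual recovery times."""
    return mean(recovery_times_minutes)


if __name__ == "__main__":
    # A 99.9% SLO over 30 days allows roughly 43 minutes of downtime.
    budget = error_budget_minutes(99.9)
    print(f"budget: {budget:.1f} min")
    print(f"remaining after 30 min of downtime: {budget_remaining_minutes(99.9, 30):.1f} min")
    # Hypothetical recovery times for three incidents, in minutes.
    print(f"MTTR: {mttr_minutes([12, 18, 24])} min")
```

When the remaining budget approaches zero, that is the signal to hold risky deploys.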

Key phrases DevOps engineers use at work

  • We’re seeing elevated error rates in production — opening a SEV2 now.
  • Update: we’ve mitigated by rolling back to the previous release; investigating root cause.
  • Current status: customer-facing impact is contained, monitoring for the next 30 minutes.
  • I’m paging the database team — this is beyond what on-call can resolve alone.
  • We’ve burned through most of our error budget this month, so let’s hold risky deploys.
  • The canary has been healthy at 10% of traffic for an hour, so I’m promoting it to 100%.
  • Heads up: the deploy to staging is blocked by a failing smoke test. No action needed yet.
  • There’s config drift on the prod cluster: the Terraform plan shows three changes made outside Terraform.
  • Let’s gate this behind a feature flag so we can roll it back instantly if needed.
  • The pipeline is red — the build step is failing on a missing dependency.
  • I’ll write up the postmortem; it’s blameless, so let’s focus on the systemic fix.
  • The pods are getting OOM-killed — we need to bump the memory limit.
  • Autoscaling kicked in during the spike and held latency within the SLO.
  • This alert is noisy and not actionable — let’s tune the threshold to cut the toil.
  • Escalating to the platform team; I’ve added the dashboard link and the relevant logs.
  • Confirmed recovery — MTTR was about 18 minutes. I’ll send the incident summary.
  • We should make this idempotent so re-running the deploy script is always safe.
  • The certificate expires Friday — I’ve scheduled the renewal and added an alert.
  • Yesterday: migrated CI to the new runners. Today: writing the rollback runbook.
  • Quick question before I proceed: do we want blue-green or a rolling deploy for this service?

How to use this cheatsheet

Incident communication is where DevOps English matters most — clear, calm, factual updates build trust. Memorise the status-update and escalation phrases first; they follow a predictable structure (what’s happening, impact, what you’re doing, next update). Skim the terms to fill gaps, then practise in the linked exercises so the words are ready when the pager goes off.
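
The four-part structure above (what's happening, impact, what you're doing, next update) can be captured in a small template so no part is forgotten under pressure. This is only a sketch: the `StatusUpdate` class, its field names and the sample wording are hypothetical, not a standard format.

```python
from dataclasses import dataclass


@dataclass
class StatusUpdate:
    """One incident update following the four-part structure."""

    severity: str              # e.g. "SEV2"
    what_is_happening: str     # the observed problem, stated factually
    impact: str                # who or what is affected
    current_action: str        # what responders are doing right now
    next_update_minutes: int   # when the audience will hear from you again

    def render(self) -> str:
        """Return the update as a single message, always in the same order."""
        return (
            f"[{self.severity}] {self.what_is_happening} "
            f"Impact: {self.impact} "
            f"Action: {self.current_action} "
            f"Next update in {self.next_update_minutes} minutes."
        )


if __name__ == "__main__":
    update = StatusUpdate(
        severity="SEV2",
        what_is_happening="We're seeing elevated error rates in production.",
        impact="Some customers cannot complete checkout.",
        current_action="Rolling back to the previous release.",
        next_update_minutes=30,
    )
    print(update.render())
```

Keeping the order fixed means readers always know where to look for the impact and the next update time.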