DevOps & Cloud
Vocabulary for CI/CD pipelines, deployment strategies, SRE practices, and cloud infrastructure.
- Pipeline /ˈpaɪplaɪn/
An automated sequence of stages (build → test → deploy) that takes code from commit to production.
"Every pull request triggers the CI pipeline — it must pass all stages before the branch can be merged."
- Artefact /ˈɑːtɪfækt/
A packaged output of the build stage (Docker image, .jar, .zip) that is versioned and promoted through environments.
"The build artefact is a Docker image tagged with the commit SHA — the same image is deployed to staging and production."
- Container /kənˈteɪnər/
A lightweight, isolated runtime environment that packages an application with its dependencies using OS-level virtualisation.
"Running the app in a container eliminates the 'works on my machine' problem — the same image runs locally and in production."
- Orchestration /ˌɔːkɪˈstreɪʃən/
Automated management of containerised workloads: scheduling, scaling, self-healing, and service discovery (e.g. Kubernetes).
"Kubernetes handles orchestration — when a pod crashes, the controller detects it and schedules a replacement automatically."
- Infrastructure as Code /ˈɪnfrəstrʌktʃər æz kəʊd/
Managing and provisioning infrastructure through machine-readable configuration files (Terraform, Pulumi, CloudFormation) rather than manual processes.
"All our AWS resources are defined in Terraform — we can spin up an identical staging environment with a single apply command."
- Blue-Green Deployment /bluː ɡriːn dɪˈplɔɪmənt/
A release strategy that runs two identical environments (blue = live, green = new). Traffic is switched atomically, enabling instant rollback.
"We use blue-green deployments to achieve zero downtime — if smoke tests fail on green, we flip the load balancer back to blue."
- Canary Release /kəˈneəri rɪˈliːs/
A deployment strategy that routes a small percentage of traffic to a new version before a full rollout, limiting blast radius.
"We deployed the new recommendation engine as a canary to 5% of users — after 24 hours with no error rate increase, we ramped to 100%."
- Feature Flag /ˈfiːtʃər flæɡ/
A runtime toggle that enables or disables features without deploying new code, allowing dark launches and A/B tests.
"The new checkout flow is behind a feature flag — we can enable it for beta users without a separate release."
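- SLI / SLO / SLA /ˌɛs ɛl ˈaɪ/, /ˌɛs ɛl ˈəʊ/, /ˌɛs ɛl ˈeɪ/
SLI = the measurement (e.g. request latency). SLO = the internal target for that measurement (e.g. p99 < 200 ms). SLA = the external contract with customers, usually with consequences such as service credits if breached (e.g. 99.9% uptime).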
"Our SLO is 99.9% availability — the SLI (current measurement) shows 99.95%, so we have error budget to spend on experiments."
- Error Budget /ˈerər ˈbʌdʒɪt/
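The amount of unreliability an SLO permits over its period (100% minus the SLO target): a 99.9% SLO allows 0.1% of requests to fail, or about 43 minutes of downtime in a 30-day window. When the budget is exhausted, new deployments are typically paused until reliability recovers.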
"We burned through 80% of our monthly error budget in one incident — feature releases are frozen until we address the root cause."
- Observability /əbˌzɜːvəˈbɪlɪti/
The ability to infer the internal state of a system from its external outputs: logs, metrics, and distributed traces.
"We improved observability by adding structured logging, Prometheus metrics, and Jaeger tracing to every service."
- Distributed Trace /dɪˈstrɪbjuːtɪd treɪs/
A record of a request's path across multiple services, showing latency at each hop to identify performance bottlenecks.
"The distributed trace showed that 90% of the latency came from a synchronous call to the inventory service — we made it async."
- On-Call /ɒn kɔːl/
A rotation in which engineers take turns being responsible for responding to production incidents, including outside business hours.
"I'm on-call this week, so PagerDuty is set to page my phone if a P1 alert fires."
- Runbook /ˈrʌnbʊk/
A documented set of procedures for handling a specific operational task or incident type.
"The runbook for database failover has 12 steps — anyone on-call can follow it without needing the original author."
- Postmortem /ˈpəʊstmɔːtəm/
A blameless written analysis of an incident covering timeline, root cause, impact, and action items to prevent recurrence.
"The postmortem identified three contributing factors — we opened tickets for all three with owners and due dates."