Vocabulary for Kubernetes Troubleshooting: Debugging Terms Explained

Master the English vocabulary of Kubernetes troubleshooting — pods, evictions, CrashLoopBackOff, OOMKilled and more — with example sentences and pronunciation tips.

When a Kubernetes cluster misbehaves, you need to describe what’s happening fast and precisely — in Slack, on a call, or in an incident channel. The vocabulary is dense and full of compound terms that don’t appear in any dictionary. This guide explains the core troubleshooting vocabulary with example sentences you can reuse.


The building blocks

Before troubleshooting, make sure these are second nature:

  • pod — the smallest deployable unit; one or more containers. “The pod is stuck in Pending.”
  • node — a worker machine. “That node is under memory pressure.”
  • deployment — manages a set of identical pods. “I rolled back the deployment.”
  • replica — one copy of a pod. “We’re running three replicas.”
  • namespace — a logical partition of the cluster.
  • control plane — the brain (API server, scheduler, etc.).
  • kubelet — the agent on each node. “The kubelet stopped reporting.”

Pronunciation: kubectl is commonly said “cube-control”, “cube-cuttle”, or “cube-C-T-L”; kubelet is “cube-let”. All three kubectl pronunciations are accepted, so pick one and be consistent.


Pod states and what they mean

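Half of troubleshooting is reading and naming pod states. These are the words in the STATUS column of kubectl get pods: some are pod phases (Pending, Running), and others are container reasons that kubectl surfaces (CrashLoopBackOff, ImagePullBackOff).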

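  • Pending — accepted by the cluster but not running yet: either waiting to be scheduled onto a node or still starting up (for example, pulling images). “It’s stuck in Pending — no node has capacity.”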
  • Running — at least one container is up.
  • CrashLoopBackOff — the container keeps crashing and restarting, with growing delays. “The pod is in CrashLoopBackOff — it’s failing on startup.”
  • ImagePullBackOff — Kubernetes can’t pull the container image. “ImagePullBackOff — probably a bad tag or a registry auth issue.”
  • Terminating — being shut down. “The pod is stuck Terminating.”
  • Completed — the container exited successfully, as when a Job finishes its work.
  • Error / Failed — terminated unsuccessfully.

The phrase to use is stuck in + state: “It’s stuck in CrashLoopBackOff.”
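To connect these words to the raw API, here is a minimal Python sketch that picks a STATUS-style word from a pod object shaped like the output of kubectl get pod -o json. It reads real fields (metadata.deletionTimestamp, status.phase, status.reason, status.containerStatuses[].state), but the precedence order is a simplifying assumption, not kubectl's actual printing logic, and pod_status_word is a hypothetical name.

```python
from typing import Any


def pod_status_word(pod: dict[str, Any]) -> str:
    """Roughly mimic the STATUS word kubectl shows for a pod (simplified)."""
    # A deletion timestamp means the pod is shutting down.
    if pod.get("metadata", {}).get("deletionTimestamp"):
        return "Terminating"
    status = pod.get("status", {})
    # Container-level reasons (CrashLoopBackOff, ImagePullBackOff, OOMKilled...)
    # say more than the pod phase, so this sketch checks them first.
    for cs in status.get("containerStatuses", []):
        state = cs.get("state", {})
        for key in ("waiting", "terminated"):
            reason = (state.get(key) or {}).get("reason")
            if reason:
                return reason
    # Pod-level reason, e.g. "Evicted" on a Failed pod.
    if status.get("reason"):
        return status["reason"]
    # Fall back to the phase: Pending, Running, Succeeded, Failed, Unknown.
    return status.get("phase", "Unknown")


crashing = {
    "status": {
        "phase": "Running",
        "containerStatuses": [{"state": {"waiting": {"reason": "CrashLoopBackOff"}}}],
    }
}
print(pod_status_word(crashing))  # CrashLoopBackOff
```

Note that the phase here is still Running: the pod is scheduled and its container keeps restarting, which is exactly why "stuck in CrashLoopBackOff" is more useful to say than "it's running".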


Failure reasons you’ll say out loud

Most of these appear in kubectl describe output and in events; CPU throttling is the exception, since it shows up in metrics rather than events:

  • OOMKilled — “out-of-memory killed”; the kernel killed the container for exceeding its memory limit. “The container got OOMKilled — we need to bump the memory limit.”
  • Evicted — the kubelet removed the pod to reclaim resources. “Three pods were evicted under disk pressure.”
  • Throttled — CPU usage was capped at its limit. “The container is being throttled — it’s hitting its CPU limit.”
  • Backoff — an increasing delay between retries. “It’s backing off, so restarts are getting slower.”
  • Liveness probe failed — the health check failed, so Kubernetes restarted the container. “The liveness probe is failing, so it keeps getting killed.”
  • Readiness probe failed — the pod isn’t ready to serve traffic. “It’s running but not ready — the readiness probe is red.”

“The pod was OOMKilled twice, then went into CrashLoopBackOff because the liveness probe timed out on a cold start.”
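The growing delay behind CrashLoopBackOff can be sketched in a few lines of Python. This assumes the default behavior described in the Kubernetes documentation (a 10-second first delay that doubles after each crash, capped at five minutes); the exact numbers may differ between Kubernetes versions, and crashloop_delays is a hypothetical helper, not a Kubernetes API.

```python
def crashloop_delays(restarts: int, base: float = 10.0, cap: float = 300.0) -> list[float]:
    """Seconds waited before each restart: doubling each time, capped at `cap`."""
    delays = []
    delay = base
    for _ in range(restarts):
        delays.append(min(delay, cap))
        delay *= 2
    return delays


print(crashloop_delays(7))
# [10.0, 20.0, 40.0, 80.0, 160.0, 300.0, 300.0]
```

This is why "restarts are getting slower": by the sixth restart the pod waits the full five minutes. According to the same documentation, the kubelet resets the timer once a container runs for 10 minutes without problems.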


Resources and scheduling

  • request — the amount of CPU or memory the scheduler reserves for a container. “We under-requested memory.”
  • limit — the maximum a container is allowed to use. “It hit its CPU limit and got throttled.”
  • resource pressure — a node running low on memory or disk. “The node is under memory pressure.”
  • to schedule — to place a pod on a node. “The scheduler can’t place it anywhere.”
  • taint / toleration — rules that keep pods off (or allow them onto) certain nodes. “It won’t schedule because the node has a taint.”
  • affinity — rules about where pods should run.
  • to drain a node — evict its pods so you can do maintenance (drain also cordons the node). “I’m draining node-3 before the upgrade.”
  • to cordon — mark a node unschedulable. “Cordon it first, then drain it.”
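Why a pod stays Pending usually comes down to the terms in this list. The Python sketch below is a heavily simplified, hypothetical model of the scheduler's filtering step: it checks cordoning, NoSchedule/NoExecute taints, and whether the pod's requests (not its limits) fit the node's unreserved capacity. It ignores affinity and everything else the real scheduler does, and all class and function names are illustrative.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Taint:
    key: str
    value: str = ""
    effect: str = "NoSchedule"


@dataclass
class Toleration:
    key: str
    operator: str = "Equal"  # "Equal" compares values; "Exists" ignores them
    value: str = ""
    effect: str = ""         # empty matches any effect

    def tolerates(self, taint: Taint) -> bool:
        # Simplified: real Kubernetes also lets an empty key with "Exists" match every taint.
        if self.key != taint.key:
            return False
        if self.effect and self.effect != taint.effect:
            return False
        return self.operator == "Exists" or self.value == taint.value


@dataclass
class Node:
    name: str
    free_cpu_m: int              # unreserved CPU, in millicores
    free_mem_mi: int             # unreserved memory, in MiB
    unschedulable: bool = False  # what cordoning sets
    taints: list[Taint] = field(default_factory=list)


@dataclass
class Pod:
    cpu_request_m: int
    mem_request_mi: int
    tolerations: list[Toleration] = field(default_factory=list)


def why_not(pod: Pod, node: Node) -> Optional[str]:
    """Return why the pod can't be placed on the node, or None if it fits."""
    if node.unschedulable:
        return "node is cordoned"
    for taint in node.taints:
        # PreferNoSchedule is only a soft preference, so it never blocks here.
        if taint.effect in ("NoSchedule", "NoExecute") and not any(
            t.tolerates(taint) for t in pod.tolerations
        ):
            return f"untolerated taint {taint.key}"
    # Requests, not limits, are compared with what's left on the node.
    if pod.cpu_request_m > node.free_cpu_m or pod.mem_request_mi > node.free_mem_mi:
        return "insufficient capacity"
    return None


nodes = [
    Node("node-1", free_cpu_m=200, free_mem_mi=4096),
    Node("node-2", free_cpu_m=2000, free_mem_mi=8192, unschedulable=True),
    Node("node-3", free_cpu_m=4000, free_mem_mi=8192, taints=[Taint("gpu", "true")]),
]
pod = Pod(cpu_request_m=500, mem_request_mi=512)
for n in nodes:
    print(n.name, why_not(pod, n))
```

Every node has a reason, so the pod stays Pending. Giving the pod Toleration("gpu", value="true") would let it land on node-3.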

Networking and storage terms

  • Service — a stable endpoint in front of pods. “The Service has no endpoints — no healthy pods.”
  • Ingress — routes external traffic in. “The Ingress is returning 502s.”
  • endpoint — the actual pod IPs behind a Service. “Endpoints are empty.”
  • DNS resolution — turning a Service name into an IP address. “Cross-namespace DNS isn’t resolving.”
  • PersistentVolumeClaim (PVC) — a request for storage. “The PVC is stuck Pending — no volume to bind.”
  • to bind — match a PersistentVolume (PV) to a claim. “The PV won’t bind.”
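The chain "no healthy pods, so no endpoints" can be modeled in a short Python sketch. It assumes a simplified rule: a pod's IP backs the Service only if the pod carries every label in the Service's selector and is Ready. Real Kubernetes tracks not-ready addresses separately rather than simply dropping them, and PodInfo and endpoints are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PodInfo:
    name: str
    ip: str
    labels: dict[str, str]
    ready: bool  # True once the readiness probe passes


def endpoints(selector: dict[str, str], pods: list[PodInfo]) -> list[str]:
    """IPs that would receive Service traffic: matching labels and Ready."""
    return [
        p.ip
        for p in pods
        if p.ready and all(p.labels.get(k) == v for k, v in selector.items())
    ]


pods = [
    PodInfo("web-1", "10.0.0.5", {"app": "web"}, ready=False),
    PodInfo("web-2", "10.0.0.6", {"app": "web"}, ready=False),
    PodInfo("db-1", "10.0.0.7", {"app": "db"}, ready=True),
]
print(endpoints({"app": "web"}, pods))  # [] -> "The Service has no endpoints"
```

A ready pod with the wrong labels (db-1 here) never counts, which is why an empty endpoint list can mean either "nothing is Ready" or "the selector doesn't match".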

The verbs of troubleshooting

These action verbs make your sentences sound native:

  • to roll back — revert to a previous version. “Let’s roll back the last deploy.”
  • to roll out — push a new version out gradually; a rolling restart replaces the pods in batches without changing the version. “I’ll do a rolling restart.”
  • to scale up / down — change replica count. “Scale it down to zero and back up.”
  • to exec into — open a shell in a running container. “Let me exec into the pod and check the config file.”
  • to tail the logs — watch logs live. “I’m tailing the logs now.”
  • to describe — inspect an object’s details and recent events. “Describe the pod — the reason’s in the events.”
  • to reproduce — make the bug happen again. “I can’t reproduce it locally.”
  • to bump — increase a value. “Bump the memory limit to 1Gi.”

“I exec’d into the pod, tailed the logs, saw it OOMing, bumped the limit, and did a rolling restart.”
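"Bump the memory limit to 1Gi" hides a common trap: Kubernetes quantities use binary suffixes (Ki, Mi, Gi: powers of 1024) and decimal ones (k, M, G: powers of 1000), plus m for thousandths, which you mostly see on CPU (500m is half a core). The Python sketch below parses a subset of these suffixes; parse_quantity is a hypothetical helper, not part of any Kubernetes client library, and it skips the larger suffixes (P, E, Pi, Ei).

```python
from decimal import Decimal

# Binary suffixes are powers of 1024; decimal suffixes are powers of 1000.
_SUFFIXES = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4,
    "k": 1000, "M": 1000**2, "G": 1000**3, "T": 1000**4,
    "m": Decimal("0.001"),  # milli, mostly used for CPU ("500m")
}


def parse_quantity(q: str) -> Decimal:
    """Convert a quantity string like "1Gi" or "500m" to a plain number."""
    # Try two-letter suffixes first so "1Mi" is never read as "1M" + "i".
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if q.endswith(suffix):
            return Decimal(q[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(q)  # no suffix: bytes for memory, cores for CPU


print(parse_quantity("1Gi"))   # 1073741824
print(parse_quantity("1G"))    # 1000000000
print(parse_quantity("500m"))  # 0.500
```

So 1Gi is about 7% larger than 1G, which matters when a container is a few megabytes away from being OOMKilled.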


Phrases for the incident channel

  • “The pod’s in CrashLoopBackOff — looking at the events now.”
  • “Confirmed OOMKilled. Bumping the limit and redeploying.”
  • “The Service has zero endpoints, so traffic’s black-holing.”
  • “Node-2 is under memory pressure and evicting pods.”
  • “Rolling back to the last known-good image.”
  • “Probe’s flapping — it passes, then fails, then passes.”

The verb flapping (rapidly alternating between states) is very useful: “The readiness probe is flapping.”
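To make "flapping" concrete, the sketch below counts how often a series of probe results flips between pass and fail. Kubernetes itself has no flapping setting; the min_flips threshold and both function names are illustrative assumptions.

```python
def flips(results: list[bool]) -> int:
    """Count pass/fail transitions in a series of probe results (True = pass)."""
    return sum(1 for prev, cur in zip(results, results[1:]) if prev != cur)


def is_flapping(results: list[bool], min_flips: int = 3) -> bool:
    # min_flips is an illustrative threshold, not a Kubernetes setting.
    return flips(results) >= min_flips


print(is_flapping([True, False, True, False, True]))   # True: 4 flips
print(is_flapping([True, True, False, False, False]))  # False: 1 flip, just failing
```

The second series is a probe that has simply started failing, so you would say "it's failing", not "it's flapping".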


Common mistakes

  • “The pod is down.” Be specific: is it Pending, CrashLooping, or Terminating? Each has a different fix.
  • Confusing liveness and readiness. Liveness failing → restart. Readiness failing → no traffic, but no restart.
  • “It killed itself.” Say what killed it: OOMKilled, evicted, or probe-killed.
  • Mixing up request and limit. A request reserves; a limit caps. Throttling and OOM relate to limits.
  • Saying “container” when you mean “pod”. A pod can hold several containers.
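The liveness/readiness distinction can be summarized as a small decision function. This Python sketch assumes the default failureThreshold of 3 consecutive failures; real probes also have timing settings (periodSeconds, timeoutSeconds, initialDelaySeconds) that it leaves out, and probe_outcome is a hypothetical name.

```python
def probe_outcome(probe: str, consecutive_failures: int, failure_threshold: int = 3) -> str:
    """What happens to a container whose probe keeps failing (simplified)."""
    if probe not in ("liveness", "readiness"):
        raise ValueError(f"unknown probe type: {probe!r}")
    if consecutive_failures < failure_threshold:
        return "no action yet"
    if probe == "liveness":
        # The container is killed and restarted; repeated restarts back off
        # and show up as CrashLoopBackOff.
        return "restart container"
    # Readiness: the pod keeps running but is marked not Ready,
    # so Services stop routing traffic to it.
    return "stop sending traffic"


for probe in ("liveness", "readiness"):
    print(probe, probe_outcome(probe, consecutive_failures=3))
```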

Key takeaways

  • Learn pod states cold: Pending, CrashLoopBackOff, ImagePullBackOff, Terminating.
  • Name failure reasons precisely: OOMKilled, Evicted, throttled, probe failed.
  • Distinguish request vs limit and liveness vs readiness.
  • Use native verbs: roll back, exec into, tail the logs, drain, bump.
  • In incidents, state the state, the reason, and your next action.

Nail this vocabulary and your Kubernetes debugging — in any channel — will be faster and clearer for everyone on the call.