What do Kubernetes resource requests and limits control?
Requests declare the resources a container is guaranteed: the scheduler uses them to find a node with enough unreserved capacity. Limits cap usage at runtime: a container exceeding its CPU limit is throttled, and one exceeding its memory limit is OOM-killed. How requests and limits are set determines the pod's Quality of Service class: Guaranteed when every container has CPU and memory requests equal to its limits, BestEffort when no container sets any request or limit, and Burstable otherwise. Setting these correctly is critical: too-low requests let the scheduler overpack nodes, causing contention and evictions; too-high requests waste capacity and money.
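The QoS rules can be sketched as a small classifier. This is an illustrative approximation, not the kubelet's code: the function name `qos_class` and the dict layout are assumptions, and quantities are compared as literal strings (so "500m" and "0.5" would count as different).

```python
from typing import Dict, List

# Hypothetical container shape: {"requests": {...}, "limits": {...}}
Container = Dict[str, Dict[str, str]]


def qos_class(containers: List[Container]) -> str:
    """Approximate a pod's QoS class from per-container requests and limits."""
    any_set = False
    all_guaranteed = True
    for c in containers:
        requests = c.get("requests", {})
        limits = c.get("limits", {})
        if requests or limits:
            any_set = True
        for res in ("cpu", "memory"):
            limit = limits.get(res)
            # A missing request defaults to the limit when a limit is set.
            request = requests.get(res, limit)
            if limit is None or request != limit:
                all_guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if all_guaranteed else "Burstable"
```

For example, a pod whose only container sets just a CPU request is classified as Burstable, while one with no requests or limits at all is BestEffort.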
What is a taint and a toleration in Kubernetes scheduling?
Taints and tolerations work together to repel pods from nodes unless explicitly permitted. A taint on a node (e.g. gpu=true:NoSchedule) keeps new pods that lack a matching toleration from being scheduled there; pods already running are evicted only by the NoExecute effect. This is used to reserve specialized nodes (GPU machines, nodes for a particular team) so only pods that opt in via a toleration land there. Note the asymmetry: a toleration permits scheduling onto a tainted node but does not attract the pod there; for attraction you use node affinity. Together they give fine-grained placement control.
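The matching rule between a taint and a toleration can be sketched as follows. The class and function names (`Taint`, `Toleration`, `tolerates`, `can_schedule`) are hypothetical, and the sketch covers only the taint check, not the rest of the scheduler's filtering.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str  # "NoSchedule", "PreferNoSchedule", or "NoExecute"


@dataclass(frozen=True)
class Toleration:
    key: str = ""            # empty key is only valid with operator "Exists"
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""         # empty effect matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Return True if this toleration matches this taint."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if not tol.key:
        # An empty key with "Exists" tolerates every taint key.
        return tol.operator == "Exists"
    if tol.key != taint.key:
        return False
    if tol.operator == "Exists":
        return True
    return tol.operator == "Equal" and tol.value == taint.value


def can_schedule(node_taints: List[Taint], pod_tolerations: List[Toleration]) -> bool:
    """A pod passes if every hard taint on the node is tolerated."""
    for taint in node_taints:
        if taint.effect == "PreferNoSchedule":
            continue  # soft taint: the scheduler avoids the node but may still use it
        if not any(tolerates(t, taint) for t in pod_tolerations):
            return False
    return True
```

With the gpu=true:NoSchedule taint from the text, a pod with no tolerations is rejected, while one carrying `Toleration("gpu", "Equal", "true", "NoSchedule")` is allowed. Being allowed does not pull the pod toward that node.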
What is the difference between node affinity and pod anti-affinity?
Node affinity expresses a pod's preference or requirement to run on nodes with particular labels (e.g. disktype=ssd or a specific zone); it is about pod-to-node relationships. Pod anti-affinity expresses pod-to-pod relationships: "do not schedule this pod in the same topology domain as pods matching these labels," where the domain (node, zone, and so on) is named by the rule's topologyKey. A classic use is spreading the replicas of a service across different nodes or zones so a single node/zone failure does not take down all replicas. Both come in required (hard) and preferred (soft) forms for strict vs best-effort placement.
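As a rough illustration, the two mechanisms can sit side by side in one pod spec. The sketch below builds the `affinity` section as a Python dict and prints it as JSON, which kubectl accepts alongside YAML. The helper name `build_affinity` and the label `app=web` are assumptions; disktype=ssd comes from the example above.

```python
import json


def build_affinity(app_label: str) -> dict:
    """Build a pod-spec `affinity` section (hypothetical helper for illustration)."""
    return {
        # Pod-to-node: hard requirement for nodes labeled disktype=ssd.
        "nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [
                    {
                        "matchExpressions": [
                            {"key": "disktype", "operator": "In", "values": ["ssd"]}
                        ]
                    }
                ]
            }
        },
        # Pod-to-pod: never share a node with another replica (hard),
        # and prefer not to share a zone (soft).
        "podAntiAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": [
                {
                    "labelSelector": {"matchLabels": {"app": app_label}},
                    "topologyKey": "kubernetes.io/hostname",
                }
            ],
            "preferredDuringSchedulingIgnoredDuringExecution": [
                {
                    "weight": 100,
                    "podAffinityTerm": {
                        "labelSelector": {"matchLabels": {"app": app_label}},
                        "topologyKey": "topology.kubernetes.io/zone",
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_affinity("web"), indent=2))
```

The topologyKey is what turns "near" into something concrete: the hostname key spreads replicas across nodes, and the zone key spreads them across zones.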
What does the Horizontal Pod Autoscaler (HPA) do?
The HPA scales a workload horizontally, changing the replica count, based on metrics. It periodically compares an observed metric (CPU/memory utilization, or custom/external metrics like queue depth or requests-per-second) against a target and adjusts replicas to converge on it. Contrast with the Vertical Pod Autoscaler (which resizes a pod's requests/limits) and the Cluster Autoscaler (which adds/removes nodes). For CPU and memory the HPA needs the Metrics Server (custom and external metrics need an adapter) and well-set resource requests, since utilization is measured as a percentage of requests.
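The core of that comparison is a ratio: desired replicas = ceil(current replicas × current metric / target metric). A minimal sketch follows, assuming a 10% tolerance band (the documented default) and hypothetical parameter names; the real controller also handles missing metrics, not-yet-ready pods, and stabilization windows.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Sketch of the HPA scaling formula with a tolerance band and clamping."""
    ratio = current_metric / target_metric
    # Skip scaling when the metric is already close enough to the target.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Keep the result within the configured replica bounds.
    return max(min_replicas, min(max_replicas, desired))
```

For instance, 4 replicas averaging 90% CPU utilization against a 60% target give ceil(4 × 1.5) = 6 replicas. Because utilization is a percentage of the request, a badly chosen request skews this ratio directly.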
What is the difference between a liveness probe and a readiness probe?
These probes serve different purposes. A liveness probe answers "is this container alive, or stuck/deadlocked?" — if it fails, Kubernetes restarts the container. A readiness probe answers "is this container ready to serve requests?" — if it fails, the pod is removed from the Service's endpoints so it receives no traffic, but it is not restarted. This distinction matters at startup (an app may be alive but still warming up) and during temporary overload (mark not-ready to shed traffic without killing the process). Misusing liveness as readiness causes needless restart loops.
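The different consequences of the two probes can be sketched as a toy simulation. This is not kubelet code: `ProbeState` and `step` are hypothetical names, and the model tracks only consecutive failures against a failure threshold, ignoring timeouts, success thresholds, and startup probes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProbeState:
    failure_threshold: int = 3  # consecutive failures before the probe trips
    failures: int = 0

    def observe(self, ok: bool) -> bool:
        """Record one probe result; return True once the threshold is reached."""
        self.failures = 0 if ok else self.failures + 1
        return self.failures >= self.failure_threshold


def step(liveness: ProbeState, readiness: ProbeState,
         live_ok: bool, ready_ok: bool) -> List[str]:
    """Return the actions taken after one round of probing."""
    actions = []
    if liveness.observe(live_ok):
        actions.append("restart container")
        liveness.failures = 0  # the counter starts fresh after a restart
    if readiness.observe(ready_ok):
        # The process keeps running; it just stops receiving traffic.
        actions.append("remove pod from Service endpoints")
    return actions
```

A failing readiness probe only yields "remove pod from Service endpoints", so an overloaded or still-warming app is left running. Pointing the liveness probe at the same check would instead restart it each time the threshold is hit.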