Build fluency in the vocabulary of automatically adjusting replica count based on an observed metric.
0 / 5 completed
1 / 5
At standup, a dev mentions a Kubernetes controller that automatically adjusts the number of running pod replicas based on an observed metric like CPU utilization. What is this controller called?
A HorizontalPodAutoscaler, or HPA, automatically adjusts the number of running pod replicas based on an observed metric, like CPU utilization, rather than requiring a person to notice a change in load and react to it. A fixed replica count set once doesn't adapt as real traffic actually fluctuates over time. This automatic adjustment is what lets a service scale with demand without constant manual intervention.
2 / 5
During a design review, the team wants the HPA configured with a specific target, like 70 percent average CPU utilization, that it tries to maintain by adding or removing replicas. Which capability supports this?
Target metric and threshold configuration gives the HPA a specific goal, like 70 percent average CPU utilization, that it tries to maintain by adding or removing replicas as actual usage moves above or below that target. Configuring an HPA with no target at all leaves it with nothing concrete to scale toward. This configured target is what turns the HPA from an inert controller into one that actively keeps a service's replica count matched to real demand.
3 / 5
In a code review, a dev notices the HPA is configured with a stabilization window that prevents it from scaling down again too soon after a recent scale-up, avoiding rapid oscillation. What does this represent?
A scaling stabilization window prevents the HPA from scaling down again too soon after a recent scale-up, avoiding rapid oscillation between adding and removing replicas in response to a noisy or briefly fluctuating metric. Allowing the HPA to react immediately to every fluctuation risks exactly that kind of thrashing. This stabilization window is what keeps an HPA's scaling decisions smooth rather than jittery.
4 / 5
An incident report shows a service's replica count oscillated rapidly up and down for hours, adding churn and occasional latency spikes during each scale-up, because no stabilization window had been configured on its HPA. What practice would prevent this?
Configuring a scaling stabilization window on the HPA prevents it from scaling down again too soon after a recent scale-up, smoothing out its reaction to a noisy metric. Leaving the HPA with no such window configured risks exactly the rapid oscillation this incident describes. This stabilization configuration is a standard safeguard for any HPA reacting to a metric that fluctuates quickly in normal operation.
5 / 5
During a PR review, a teammate asks why the team relies on an HPA instead of just setting a fixed replica count sized for the service's expected peak traffic. What is the reasoning?
A fixed replica count sized for expected peak traffic wastes resources during a quieter period, since that full peak capacity runs continuously whether or not it's actually needed. An HPA adjusts the replica count automatically to match real, fluctuating demand instead. The tradeoff is the added care needed in choosing the right target metric and stabilization settings to avoid a scaling delay or an oscillation problem.