Practice the vocabulary of adding and removing a cluster's own node capacity to match real demand.
1 / 5
At standup, a dev mentions a controller that adds a new node to the cluster when a pod can't be scheduled due to insufficient capacity, and removes an underutilized node when it's no longer needed. What is this controller called?
The Cluster Autoscaler adds a new node to the cluster when a pod can't be scheduled due to insufficient capacity, and removes an underutilized node once it's no longer needed, operating at the level of the cluster's own infrastructure rather than an individual deployment's replica count. A HorizontalPodAutoscaler adjusts a workload's replica count but cannot add nodes, so its new replicas stay unschedulable if the cluster doesn't have enough node capacity for them. This node-level scaling is what lets a cluster grow and shrink its own underlying capacity to match real demand.
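The add-or-remove decision described above can be sketched roughly as follows. This is a toy model, not the Cluster Autoscaler's actual code: the `Node` class, the `decide` function, the CPU-only view of capacity, and the 0.5 utilization threshold are all assumptions made for illustration, and the scale-down safety checks are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cpu_capacity: float   # allocatable cores (hypothetical, CPU only)
    cpu_requested: float  # sum of pod CPU requests already on the node


def decide(nodes, pending_pod_requests, utilization_threshold=0.5):
    """Return ("scale_up", None), ("scale_down", node_name) or ("no_op", None)."""
    # Scale up: at least one pending pod fits on no existing node.
    for request in pending_pod_requests:
        fits_somewhere = any(
            n.cpu_capacity - n.cpu_requested >= request for n in nodes
        )
        if not fits_somewhere:
            return ("scale_up", None)

    # Scale down: a node whose requested share is below the threshold
    # becomes a removal candidate (safety checks not modeled here).
    for n in nodes:
        if n.cpu_requested / n.cpu_capacity < utilization_threshold:
            return ("scale_down", n.name)

    return ("no_op", None)


nodes = [Node("node-a", 4.0, 3.5), Node("node-b", 4.0, 1.0)]
print(decide(nodes, [3.5]))  # a 3.5-core pod fits on neither node
print(decide(nodes, []))     # nothing pending, node-b is underutilized
```

Note that the trigger for scale-up here is an unschedulable pod, not high average load, which matches the wording of the question.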
2 / 5
During a design review, the team wants the Cluster Autoscaler to check whether every pod on a candidate node could actually be rescheduled elsewhere before removing that node, avoiding a disruption to a pod with nowhere else to go. Which capability supports this?
Safe node removal checks whether every pod on a candidate node could actually be rescheduled elsewhere before the Cluster Autoscaler removes that node, avoiding a disruption to a pod that would otherwise have nowhere else to go. Removing an underutilized node immediately with no such check risks evicting a pod into a state where it can't actually be rescheduled, like one with a strict local-storage constraint. This safety check is what keeps the Cluster Autoscaler's scale-down decisions from causing an unnecessary outage.
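A minimal sketch of that rescheduling check, under simplifying assumptions: pods are described only by a CPU request and a `movable` flag (where `False` stands in for something like a strict local-storage constraint), and placement is a simple first-fit simulation. The `Pod` class and `can_drain` function are hypothetical names, not part of any real API.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    cpu: float
    movable: bool = True  # False models a pod that is tied to its node


def can_drain(pods_on_candidate, free_cpu_elsewhere):
    """True only if every pod on the candidate node has somewhere else to go."""
    free = dict(free_cpu_elsewhere)  # copy so the caller's view is not mutated
    # Place the largest pods first so small ones don't fragment the space.
    for pod in sorted(pods_on_candidate, key=lambda p: p.cpu, reverse=True):
        if not pod.movable:
            return False
        target = next((name for name, cpu in free.items() if cpu >= pod.cpu), None)
        if target is None:
            return False
        free[target] -= pod.cpu  # reserve the space for this pod
    return True


free = {"node-b": 2.0, "node-c": 1.0}
print(can_drain([Pod("web", 1.0), Pod("api", 2.0)], free))      # both pods fit
print(can_drain([Pod("cache", 1.0, movable=False)], free))      # pinned pod blocks removal
```

The simulation reserves capacity as it goes, so two pods cannot both be counted as fitting into the same free slot.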
3 / 5
In a code review, a dev notices that a service's PodDisruptionBudget is respected during a scale-down decision, so the Cluster Autoscaler won't remove a node if doing so would violate that budget's minimum-availability floor. What does this represent?
Respecting a PodDisruptionBudget as a constraint on the Cluster Autoscaler's scale-down decisions means it won't remove a node if doing so would drop a service below that budget's minimum-availability floor. Removing a node regardless of any configured PDB risks violating exactly the availability guarantee that budget was set up to protect. This respect for existing disruption budgets is what keeps the Cluster Autoscaler's own scaling decisions consistent with a service's other availability safeguards.
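The minimum-availability floor comes down to a small piece of arithmetic, sketched here with a hypothetical `eviction_allowed` helper. It assumes the budget is expressed as an absolute minimum count of healthy replicas, and it treats a missing budget (`None`) as "no floor to check".

```python
from typing import Optional


def eviction_allowed(
    healthy_replicas: int,
    replicas_on_node: int,
    min_available: Optional[int],
) -> bool:
    """Would removing the node keep the service at or above its floor?"""
    if min_available is None:
        # No budget configured: nothing constrains the scale-down.
        return True
    return healthy_replicas - replicas_on_node >= min_available


print(eviction_allowed(3, 1, 2))     # 3 - 1 = 2, still meets a floor of 2
print(eviction_allowed(2, 1, 2))     # 2 - 1 = 1, would drop below the floor
print(eviction_allowed(1, 1, None))  # no budget, so nothing blocks removal
```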
4 / 5
An incident report shows a scale-down event removed a node that was hosting the sole replica of a service with no PodDisruptionBudget configured, causing a brief outage right as the Cluster Autoscaler consolidated underutilized capacity. What practice would prevent this?
Configuring a PodDisruptionBudget for the service gives the Cluster Autoscaler's scale-down logic a minimum-availability floor to respect before removing a node, preventing exactly the brief outage this incident describes. Leaving the service with no PDB configured gives the autoscaler nothing to check against before consolidating capacity onto fewer nodes. This configuration is a standard safeguard for any service whose availability requirement needs to survive a routine cluster-level scale-down.
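As an illustration, a budget like the one described could be built and serialized as below. The `policy/v1` `PodDisruptionBudget` kind and its `minAvailable` and `selector` fields are part of the Kubernetes API; the service name `checkout` and its `app` label are made up for this sketch.

```python
import json

# Hypothetical service name and label; only the API fields are real.
pdb = {
    "apiVersion": "policy/v1",
    "kind": "PodDisruptionBudget",
    "metadata": {"name": "checkout-pdb"},
    "spec": {
        "minAvailable": 1,  # the minimum-availability floor
        "selector": {"matchLabels": {"app": "checkout"}},
    },
}

manifest_json = json.dumps(pdb, indent=2)
print(manifest_json)
```

The printed JSON could be saved to a file and applied with `kubectl apply -f`, which accepts JSON as well as YAML. With a single replica, a floor of 1 blocks voluntary eviction of that pod outright, so in practice such a budget is usually paired with at least two replicas.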
5 / 5
During a PR review, a teammate asks why the team relies on the Cluster Autoscaler instead of just provisioning a large, fixed number of nodes sized for peak load at all times. What is the reasoning?
A fixed, peak-sized node count wastes infrastructure cost during quieter periods, since the full peak capacity runs continuously whether or not it's needed. The Cluster Autoscaler adds and removes node capacity automatically to match real, fluctuating demand instead. The tradeoff is the added care needed in respecting a PodDisruptionBudget and verifying safe rescheduling before a scale-down, so consolidating capacity doesn't itself cause a disruption.
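The cost argument can be made concrete with a little arithmetic. The demand figures and the ten-pods-per-node capacity below are invented purely for illustration, and the sketch ignores node startup delay and any minimum node count.

```python
import math


def node_hours(hourly_demand_pods, pods_per_node, fixed=False):
    """Total node-hours for a demand profile (one entry per hour)."""
    needed = [math.ceil(d / pods_per_node) for d in hourly_demand_pods]
    if fixed:
        # Fixed sizing: the peak node count runs for every hour.
        return max(needed) * len(needed)
    # Autoscaled sizing: each hour runs only the nodes it needs.
    return sum(needed)


demand = [10, 10, 40, 80, 40, 10]  # made-up pods per hour
print(node_hours(demand, pods_per_node=10, fixed=True))  # 8 nodes * 6 hours = 48
print(node_hours(demand, pods_per_node=10))              # 1+1+4+8+4+1 = 19
```

In this toy profile the fixed cluster pays for 48 node-hours where the autoscaled one pays for 19; the gap narrows as demand becomes flatter.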