Reconciliation loop: observe desired state → compare actual state → act to close the gap → requeue on error
CRD design: use status subresource; define conditions following the Kubernetes API convention
Controller idempotency: every reconcile must be safe to run multiple times with the same input
Webhook types: mutating (set defaults, modify the object) vs validating (accept or reject only); mutating webhooks run before validating ones
1 / 5
The interviewer asks: "Explain the reconciliation loop in a Kubernetes controller. What happens when a reconcile returns an error?" Which answer is most technically accurate?
Option B is the strongest. It describes the informer-backed cache (not direct API calls), the work queue with exponential backoff on error, the RequeueAfter pattern for polling, and the critical design requirement of idempotency. It also notes that the controller must assume other writers exist — a common oversight for engineers new to operator development. Option A says "the loop stops" on error — wrong; the work queue retries. Option C describes a polling loop — incorrect; controllers are event-driven. Option D marks the resource as Failed and stops — this would leave the resource in a broken state indefinitely.
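The retry behaviour described above can be sketched with a toy in-memory work queue. This is a minimal illustration in plain Python, not the controller-runtime implementation: `RateLimitedQueue`, `process_next`, and the delay constants are hypothetical names and assumed values chosen for the example.

```python
from collections import deque

BASE_DELAY = 0.005   # seconds; assumed base delay for illustration
MAX_DELAY = 1000.0   # seconds; assumed upper bound on the retry delay


class RateLimitedQueue:
    """Toy work queue: deduplicates keys and tracks per-key failure counts."""

    def __init__(self):
        self._items = deque()
        self._queued = set()
        self.failures = {}

    def add(self, key):
        # Adding a key that is already queued is a no-op (deduplication).
        if key not in self._queued:
            self._queued.add(key)
            self._items.append(key)

    def get(self):
        key = self._items.popleft()
        self._queued.discard(key)
        return key

    def __len__(self):
        return len(self._items)

    def backoff(self, key):
        """Record a failure and return the delay before the next retry."""
        n = self.failures.get(key, 0)
        self.failures[key] = n + 1
        return min(BASE_DELAY * (2 ** n), MAX_DELAY)

    def forget(self, key):
        # Called after a successful reconcile so the next failure starts fresh.
        self.failures.pop(key, None)


def reconcile(key, desired, actual):
    """Idempotent: converges actual[key] to desired[key]; a rerun changes nothing."""
    if key not in desired:
        actual.pop(key, None)  # object was deleted: clean up
        return
    if actual.get(key) != desired[key]:
        actual[key] = desired[key]


def process_next(queue, desired, actual, reconcile_fn=reconcile):
    """Process one key. Returns the retry delay on error, otherwise None."""
    key = queue.get()
    try:
        reconcile_fn(key, desired, actual)
    except Exception:
        delay = queue.backoff(key)
        queue.add(key)  # a real queue would re-add only after `delay` elapses
        return delay
    queue.forget(key)
    return None
```

The point to take into an interview answer: an error does not stop the loop. The key goes back on the queue with a delay that doubles on each consecutive failure, and a success resets the counter.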
2 / 5
The interviewer asks: "How do you design a CRD (Custom Resource Definition) for a new operator? What conventions should you follow?" Which answer is most complete?
Option B is the strongest. It covers the key CRD conventions: spec/status separation, the status subresource (and why: preventing race conditions), the metav1.Condition convention for status conditions, printer columns, API versioning with conversion webhooks, and structural schema validation. This is the complete set of CRD design decisions a senior operator developer should know. Option A describes the minimum structure but none of the conventions. Option C copies an existing CRD, which is fast but misses any convention the chosen example does not follow. Option D designs status after implementation, an order that leads to status fields that do not reflect what users actually need.
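The condition convention is easier to remember with a concrete shape in mind. The sketch below models a condition as a plain dict with the same field names as metav1.Condition. The helper `set_condition` is a hypothetical stand-in, loosely modelled on what the apimachinery helper `meta.SetStatusCondition` does, and is not that library's code.

```python
from datetime import datetime, timezone


def set_condition(conditions, ctype, status, reason, message, generation, now=None):
    """Insert or update a condition in place; returns True if anything changed.

    lastTransitionTime moves only when `status` flips, not on every update.
    """
    now = now or datetime.now(timezone.utc).isoformat()
    new = {
        "type": ctype,
        "status": status,                  # "True", "False", or "Unknown"
        "reason": reason,                  # CamelCase, machine-readable
        "message": message,                # human-readable detail
        "observedGeneration": generation,  # which spec generation was observed
        "lastTransitionTime": now,
    }
    for existing in conditions:
        if existing["type"] != ctype:
            continue
        if existing["status"] == status:
            # Same status: keep the original transition time.
            new["lastTransitionTime"] = existing["lastTransitionTime"]
        if existing == new:
            return False  # nothing to write; skip the status update
        existing.update(new)
        return True
    conditions.append(new)
    return True
```

Returning whether anything changed lets a reconciler skip a status write when the condition is already correct, which keeps repeated reconciles cheap.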
3 / 5
The interviewer asks: "What is the difference between a validating admission webhook and a mutating admission webhook, and when do you use each?" Which answer is most precise?
Option B is the strongest. It correctly states the ordering (mutating before validating), describes what each can do (mutating: modify; validating: accept/reject only), gives concrete use cases for each, explains the dependency benefit (validating can assume defaults are set), and addresses the critical operational question of fail-open vs fail-closed (failurePolicy: Ignore vs Fail). Option A is a minimal description without ordering, use cases, or operational concerns. Option C is wrong about triggering conditions: both webhook types are triggered by write operations (CREATE, UPDATE, DELETE). Option D treats validating webhooks as redundant, which is wrong: a mutating webhook may see an object that later mutating webhooks still change, so only a validating webhook can check the final object.
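The ordering argument can be made concrete with a small pipeline. This is a sketch of the idea only, assuming a hypothetical resource with a `spec.replicas` field. The function names are invented for the example, and it does not model the real AdmissionReview request and response types.

```python
import copy


class Rejected(Exception):
    """Raised when an admission step denies the request."""


def default_replicas(obj):
    # Mutating step: fill in a default when the field is absent.
    obj.setdefault("spec", {}).setdefault("replicas", 1)
    return obj


def validate_replicas(obj):
    # Validating step: accept or reject only; it can rely on defaults being set.
    replicas = obj["spec"]["replicas"]
    if not isinstance(replicas, int) or replicas < 1:
        raise Rejected("spec.replicas must be a positive integer")


def admit(obj, mutators, validators):
    """Run every mutator, then every validator against the final object."""
    obj = copy.deepcopy(obj)  # never modify the caller's request in place
    for mutate in mutators:
        obj = mutate(obj)
    for validate in validators:
        validate(obj)
    return obj
```

Because validation runs last, `validate_replicas` can index `obj["spec"]["replicas"]` directly: the defaulting step has already guaranteed the field exists.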
4 / 5
The interviewer asks: "How do you test a Kubernetes operator? What testing strategy do you use?" Which answer demonstrates the most layered approach?
Option B is the strongest. It defines three testing layers (unit with fake client, integration with envtest, E2E with kind), names specific tools (controller-runtime fake client, envtest, kind), tests idempotency explicitly, covers leader election and external modification scenarios, and tests the negative path (missing references → meaningful status condition). Option A is manual testing only, which is neither scalable nor reproducible. Option C skips controller testing, which is the most complex and error-prone part. Option D uses Helm chart tests, which validate installation but not operator behaviour.
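The unit-test layer can be illustrated without a cluster. The sketch below uses a hypothetical in-memory `FakeClient` in place of the controller-runtime fake client, and a made-up `reconcile_app` function with a `configRef` field. It shows the two assertions named above: a second reconcile performs no writes, and a missing reference produces a status reason rather than a crash.

```python
class FakeClient:
    """In-memory stand-in for an API client that counts writes."""

    def __init__(self, objects=None):
        self.objects = dict(objects or {})
        self.writes = 0

    def get(self, name):
        return self.objects.get(name)

    def apply(self, name, obj):
        # Only count a write when the stored object actually changes.
        if self.objects.get(name) != obj:
            self.objects[name] = obj
            self.writes += 1


def reconcile_app(client, name):
    """Hypothetical reconciler: creates a child object from a referenced config."""
    parent = client.get(name)
    if parent is None:
        return  # parent deleted; nothing to do
    config = client.get(parent["configRef"])
    if config is None:
        # Negative path: surface the problem in status instead of crashing.
        status = {"ready": "False", "reason": "ConfigNotFound"}
    else:
        client.apply(name + "-child", {"image": config["image"]})
        status = {"ready": "True", "reason": "Reconciled"}
    client.apply(name, {**parent, "status": status})


def check_idempotent(client, name):
    """Reconcile twice; return True if the second pass wrote nothing."""
    reconcile_app(client, name)
    after_first = client.writes
    reconcile_app(client, name)
    return client.writes == after_first
```

A test would call `check_idempotent(FakeClient({...}), "app")` and assert the result. The same pattern carries over to envtest, where the write count is replaced by comparing resourceVersion before and after the second reconcile.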
5 / 5
The interviewer asks: "How do you handle operator upgrades without downtime for managed resources?" Which answer shows the most mature operational thinking?
Option C is the strongest. It addresses five specific upgrade concerns: leader election continuity, CRD backward compatibility (never remove fields, use conversion webhooks), reconcile safety for legacy resource states, staged rollout (canary per namespace), and status preservation. This is the complete operational picture. Option A deletes and recreates the operator — resource state and continuity are at risk. Option B uses OLM — valid for OLM-managed operators but does not explain the underlying mechanisms. Option D relies on RollingUpdate — handles pod replacement but does not address CRD compatibility or legacy resource state.
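The backward-compatibility concern comes down to lossless conversion between API versions. The sketch below assumes a hypothetical CRD whose v1alpha1 schema used `spec.size` and whose v1 schema renamed it to `spec.replicas`. The group name `example.com` and both functions are invented for illustration; a real conversion webhook would wrap logic like this in a ConversionReview handler.

```python
def v1alpha1_to_v1(obj):
    """Convert the hypothetical legacy shape (spec.size) to v1 (spec.replicas)."""
    spec = dict(obj["spec"])  # copy so the stored object is not modified
    # Legacy objects may predate the field entirely, so default it here.
    replicas = spec.pop("size", 1)
    return {
        "apiVersion": "example.com/v1",
        "spec": {**spec, "replicas": replicas},
        "status": obj.get("status", {}),  # status is carried over untouched
    }


def v1_to_v1alpha1(obj):
    """Convert back so clients still using the old version keep working."""
    spec = dict(obj["spec"])
    size = spec.pop("replicas")
    return {
        "apiVersion": "example.com/v1alpha1",
        "spec": {**spec, "size": size},
        "status": obj.get("status", {}),
    }
```

A useful upgrade test is the round trip: converting a stored v1alpha1 object to v1 and back should return the original object, with status intact.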