5 exercises — CRDs and the Custom Resource model, reconciliation loops, desired vs actual state and self-healing, mutating vs validating admission webhooks, and the Kubebuilder/Operator SDK ecosystem.
1 / 5
A platform engineer says: "We extended the Kubernetes API with a CRD to manage database clusters." What does a Custom Resource Definition do?
CRD — Custom Resource Definition vocabulary:
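CRDs are the foundation of the Kubernetes extension model. A CRD adds a new resource type to the Kubernetes API, so the API server can store and serve domain-specific objects (for example, a Database) through the same REST endpoints, kubectl commands, and RBAC rules as built-in resources.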
How CRDs work:
1. A developer writes a CRD manifest defining the new resource type: its group, names (kind, plural), versions, and an OpenAPI v3 schema for validation.
2. Apply it: kubectl apply -f database-crd.yaml
3. The API server now serves objects of the new type: kubectl get databases
4. Custom Resources (instances) are stored in etcd like native resources.
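CRD anatomy:
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
Key fields:
• metadata.name: must be <plural>.<group> (e.g., databases.example.com)
• group: the API group (e.g., example.com)
• names: kind, plural, singular, and optional shortNames
• scope: Namespaced or Cluster
• versions: list of supported API versions (v1alpha1, v1beta1, v1); each sets served and storage, and exactly one version is the storage version
• versions[].schema.openAPIV3Schema: the OpenAPI v3 schema used for validation (required in apiextensions.k8s.io/v1)
• versions[].subresources: optional status and scale subresources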
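For concreteness, here is a minimal apiextensions.k8s.io/v1 manifest for the hypothetical Database type, written as a Python dict and printed as JSON (kubectl apply accepts JSON as well as YAML). The group example.com, the v1alpha1 version, and the replicas/version/readyReplicas fields are illustrative assumptions, not a real operator's API.

```python
import json

# Hypothetical CRD for a "Database" kind in the illustrative group example.com.
# In apiextensions.k8s.io/v1 the schema and subresources live under each version.
crd = {
    "apiVersion": "apiextensions.k8s.io/v1",
    "kind": "CustomResourceDefinition",
    # metadata.name must be <plural>.<group>
    "metadata": {"name": "databases.example.com"},
    "spec": {
        "group": "example.com",
        "scope": "Namespaced",
        "names": {
            "kind": "Database",
            "plural": "databases",
            "singular": "database",
            "shortNames": ["db"],
        },
        "versions": [
            {
                "name": "v1alpha1",
                "served": True,   # exposed by the API server
                "storage": True,  # the one version persisted in etcd
                "schema": {
                    "openAPIV3Schema": {
                        "type": "object",
                        "properties": {
                            "spec": {
                                "type": "object",
                                "properties": {
                                    "replicas": {"type": "integer", "minimum": 1},
                                    "version": {"type": "string"},
                                },
                                "required": ["replicas"],
                            },
                            "status": {
                                "type": "object",
                                "properties": {"readyReplicas": {"type": "integer"}},
                            },
                        },
                    }
                },
                # Enables the /status endpoint for this version
                "subresources": {"status": {}},
            }
        ],
    },
}

# e.g. python crd.py | kubectl apply -f -
print(json.dumps(crd, indent=2))
```

Once applied, kubectl get databases would list instances. Note that schema and subresources sit inside each versions entry, which is where apiextensions.k8s.io/v1 expects them.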
CRD alone vs Operator: A CRD just stores objects. Without a controller watching and acting on those objects, nothing happens. The operator pattern combines CRDs with a controller.
Vocabulary:
• Custom Resource (CR): an instance of a CRD, just as a Pod is an instance of the Pod resource type
• apiextensions.k8s.io: the API group where CRDs live
• schema validation: the API server rejects create and update requests for Custom Resources that do not match the schema
• status subresource: a separate /status endpoint; writes there change only .status, writes to the main endpoint ignore .status, and status changes do not increment metadata.generation, so controllers can filter them out
• Aggregated API Server: an alternative to CRDs for more complex custom APIs; a separate API server registered with the main one through an APIService object
2 / 5
A team builds a Kubernetes operator for their database cluster. The senior engineer explains that the controller uses a reconciliation loop. What does the reconciliation loop do?
The reconciliation loop (also called the "control loop" or "reconcile loop") is the fundamental pattern of all Kubernetes controllers — both built-in (Deployment controller, ReplicaSet controller) and custom operators.
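The loop in pseudocode (watch events are reduced to the key of the Custom Resource they concern):
  for key in work_queue:   # fed by watches on CustomResource, Pod, Service, ...
      cr = get_custom_resource(key)
      desired = get_desired_state(cr.spec)
      actual = observe_cluster_state(cr)
      if actual != desired:
          apply_changes(actual, desired)   # move actual → desired
      update_status(cr, actual)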
Key properties:
• Level-triggered (not edge-triggered): the controller acts on the current state, not on the contents of individual events. If several updates arrive before the reconciler runs, they collapse into one reconciliation that sees the latest state.
• Idempotent: running reconciliation when the cluster is already in the desired state is a no-op.
• Eventually consistent: the controller keeps retrying until the actual state converges on the desired state.
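The loop and its properties can be simulated in plain Python. This is a toy model, not controller-runtime: an in-memory dict stands in for etcd, a de-duplicating queue of keys stands in for the work queue, and "creating a Pod" just adds a name to a dict. It shows two rapid updates collapsing into one level-triggered reconcile, and a repeat reconcile being a no-op.

```python
from collections import deque

# Hypothetical in-memory cluster: one custom resource and the pods it owns.
custom_resources = {"default/db": {"replicas": 3}}
pods = {"db-0": "default/db", "db-1": "default/db", "db-2": "default/db"}  # pod -> owner key


class WorkQueue:
    """Simplified de-duplicating FIFO of object keys (ignores client-go's 'processing' set)."""

    def __init__(self):
        self._order = deque()
        self._pending = set()

    def add(self, key):
        # A key that is already waiting is not queued a second time.
        if key not in self._pending:
            self._pending.add(key)
            self._order.append(key)

    def get(self):
        key = self._order.popleft()
        self._pending.discard(key)
        return key

    def __len__(self):
        return len(self._order)


reconciled = []  # records every reconcile call


def reconcile(key):
    """Level-triggered: reads the current state for `key`, never the event payload."""
    reconciled.append(key)
    cr = custom_resources.get(key)
    desired = cr["replicas"] if cr else 0
    prefix = key.split("/")[1]
    wanted = {f"{prefix}-{i}" for i in range(desired)}
    owned = {name for name, owner in pods.items() if owner == key}
    for name in wanted - owned:  # create what is missing
        pods[name] = key
    for name in owned - wanted:  # delete what should not exist
        del pods[name]


queue = WorkQueue()
# Two rapid spec updates (3 -> 4 -> 5) arrive before the worker runs.
custom_resources["default/db"]["replicas"] = 4
queue.add("default/db")
custom_resources["default/db"]["replicas"] = 5
queue.add("default/db")

while len(queue):
    reconcile(queue.get())
print(len(reconciled), sorted(pods))  # 1 ['db-0', 'db-1', 'db-2', 'db-3', 'db-4']

# Reconciling again with nothing changed leaves the pods untouched (idempotent).
reconcile("default/db")
print(sorted(pods))
```

Because reconcile only looks at the latest spec, it never needs to know that the replica count passed through 4 on the way to 5.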
What triggers reconciliation:
• A Custom Resource is created, updated, or deleted
• A resource the operator manages (Pod, Service, etc.) changes
• Periodic re-queuing (a safety net for missed events or external changes)
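Reconciliation result: a reconciler in controller-runtime returns (ctrl.Result, error):
• ctrl.Result{}, nil: success, no requeue; wait for the next event
• ctrl.Result{RequeueAfter: 30 * time.Second}, nil: reconcile again after a fixed delay
• ctrl.Result{Requeue: true}, nil: requeue with rate-limited back-off
• a non-nil error: failure, requeue with exponential back-off (the Result is ignored)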
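The outcomes can be modelled as a small decision function. This is an illustrative Python sketch of the behaviour described above, not controller-runtime's implementation; the Result dataclass only mirrors the Go struct's two fields (with seconds instead of time.Duration), and the back-off numbers (5 ms base, doubling per failure, 1000 s cap) are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    """Mirrors the two fields of controller-runtime's Result (seconds, not time.Duration)."""
    requeue: bool = False
    requeue_after: float = 0.0


BASE_DELAY = 0.005  # assumed back-off base (5 ms)
MAX_DELAY = 1000.0  # assumed back-off cap (seconds)


def backoff(failures: int) -> float:
    """Exponential back-off: the delay doubles with each consecutive failure, up to a cap."""
    return min(BASE_DELAY * 2 ** failures, MAX_DELAY)


def next_requeue(result: Result, err: Optional[Exception], failures: int) -> Optional[float]:
    """Return the delay before the key is reconciled again, or None for no requeue."""
    if err is not None:
        # Error: the Result is ignored and the key is requeued with back-off.
        return backoff(failures)
    if result.requeue_after > 0:
        # Fixed delay; does not grow between attempts.
        return result.requeue_after
    if result.requeue:
        return backoff(failures)
    return None  # success: wait for the next watch event


print(next_requeue(Result(), None, 0))                     # None
print(next_requeue(Result(requeue_after=30.0), None, 3))   # 30.0
print(next_requeue(Result(), RuntimeError("timeout"), 2))  # back-off after 2 failures
```

A fixed RequeueAfter suits polling an external system on a schedule, while back-off suits transient errors that may clear on their own.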
Vocabulary:
• watch: subscribing to Kubernetes API events for resource changes
• informer: a watch backed by a local cache, so controllers read from memory instead of querying the API server each time
• work queue: a rate-limited, de-duplicating queue of object keys (namespace/name) waiting to be reconciled
• owner reference: links a managed resource (e.g., a Pod) to its owner (e.g., the Custom Resource), enabling garbage collection when the owner is deleted
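Owner references are also how events on managed objects find their way back to the right Custom Resource. A minimal sketch, assuming a hypothetical Database owner in group example.com and a placeholder uid; real controller-runtime handlers also compare the owner's API group, which this sketch skips.

```python
def owner_key(obj, owner_kind="Database"):
    """Return "namespace/name" of the controlling owner of `obj`, or None.

    Roughly what "enqueue the owner" does when a managed object changes.
    """
    meta = obj["metadata"]
    for ref in meta.get("ownerReferences", []):
        # Only the reference marked controller=True identifies the managing owner.
        if ref["kind"] == owner_kind and ref.get("controller"):
            # A namespaced owner lives in the same namespace as its dependent.
            return f'{meta["namespace"]}/{ref["name"]}'
    return None


statefulset = {
    "metadata": {
        "name": "db",
        "namespace": "prod",
        "ownerReferences": [
            {
                "apiVersion": "example.com/v1alpha1",  # hypothetical group/version
                "kind": "Database",
                "name": "db",
                "uid": "hypothetical-uid",
                "controller": True,
                "blockOwnerDeletion": True,
            }
        ],
    }
}

# A change to the StatefulSet re-triggers reconciliation of its owning Database.
print(owner_key(statefulset))  # prod/db
```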
3 / 5
During code review, a teammate asks: "What happens when the desired state in the CR spec differs from the actual state in the cluster?" How does the controller respond?
Desired state vs actual state — vocabulary:
Desired state: what you declare in the Custom Resource spec (or a built-in resource spec). Example: "I want a PostgreSQL cluster with 3 replicas, version 15, 50Gi storage."
Actual state: what exists in the cluster right now, i.e. the running Pods, Services, PVCs, and ConfigMaps that the operator manages. The operator discovers this by querying the Kubernetes API.
The drift → reconcile cycle:
1. A user updates the CR spec, changing replicas from 3 to 5.
2. The watch event triggers the reconciler.
3. It reads the desired state (5 replicas) and observes the actual state (3 running Pods).
4. It creates 2 new Pods or, more commonly, scales its StatefulSet to 5 replicas.
5. As the new Pods become ready, it updates the CR status (status.readyReplicas: 5).
6. On the next reconciliation desired == actual, so it is a no-op.
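The cycle can be traced with a toy reconciler. Assumptions: the operator manages a StatefulSet (modelled as a dict), the StatefulSet controller's work is simulated by setting readyReplicas by hand, and reconcile returns a list of the changes it made so convergence is visible.

```python
def reconcile(cr, statefulset):
    """Hypothetical reconciler: drive the StatefulSet toward cr.spec and report status."""
    changes = []
    desired = cr["spec"]["replicas"]
    if statefulset["spec"]["replicas"] != desired:
        statefulset["spec"]["replicas"] = desired
        changes.append(f"scaled StatefulSet to {desired}")
    ready = statefulset["status"]["readyReplicas"]
    if cr["status"].get("readyReplicas") != ready:
        cr["status"]["readyReplicas"] = ready
        changes.append(f"status.readyReplicas = {ready}")
    return changes


cr = {"spec": {"replicas": 3}, "status": {"readyReplicas": 3}}
sts = {"spec": {"replicas": 3}, "status": {"readyReplicas": 3}}

cr["spec"]["replicas"] = 5          # step 1: the user edits the CR
print(reconcile(cr, sts))           # ['scaled StatefulSet to 5']
sts["status"]["readyReplicas"] = 5  # simulated: two new Pods become ready
print(reconcile(cr, sts))           # ['status.readyReplicas = 5']
print(reconcile(cr, sts))           # [] (converged, no-op)
```

In a real cluster the StatefulSet's status change is itself a watch event on an owned object, which is what triggers the second pass.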
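Self-healing examples:
• A managed Pod is deleted or evicted (e.g., its node fails) → reconciler detects the missing Pod → creates a replacement (a container that merely crashes is restarted in place by the kubelet)
• Someone manually deletes a Service the operator manages → reconciler detects it is missing → recreates it
• Someone manually edits a managed ConfigMap → reconciler detects the drift → reverts it to the desired content (provided the operator watches the ConfigMap or resyncs periodically)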
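Self-healing falls out of comparing desired and actual objects on every pass. A minimal sketch, assuming managed objects are keyed by hypothetical "Kind/name" strings and compared as whole dicts:

```python
import copy


def plan(desired, actual):
    """Compare managed objects (name -> content) and list the fixes needed."""
    actions = []
    for name, want in sorted(desired.items()):
        have = actual.get(name)
        if have is None:
            actions.append(("create", name))  # deleted out from under the operator
        elif have != want:
            actions.append(("update", name))  # drifted: revert to desired
    return actions


def apply(actions, desired, actual):
    for _verb, name in actions:
        # Both create and update write the full desired content back.
        actual[name] = copy.deepcopy(desired[name])


desired = {
    "Service/db": {"port": 5432},
    "ConfigMap/db-config": {"max_connections": "100"},
}
actual = copy.deepcopy(desired)
del actual["Service/db"]                                  # someone deletes the Service
actual["ConfigMap/db-config"]["max_connections"] = "500"  # someone edits the ConfigMap

actions = plan(desired, actual)
print(actions)  # [('update', 'ConfigMap/db-config'), ('create', 'Service/db')]
apply(actions, desired, actual)
print(plan(desired, actual))  # []
```

Real operators usually compare only the fields they set, because the API server adds defaults and metadata to stored objects; server-side apply is one common way to handle that.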
Status conditions: operators typically update the CR's status field with conditions:
  conditions:
  - type: Ready
    status: "True"
  - type: Degraded
    status: "False"
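A condition should record when its status last changed, not when the controller last looked. A small helper, loosely modelled on the semantics of apimachinery's meta.SetStatusCondition (simplified: observedGeneration is omitted), with hypothetical reason strings:

```python
from datetime import datetime, timezone


def set_condition(conditions, ctype, status, reason, message, now=None):
    """Insert or update a condition; lastTransitionTime changes only when status flips."""
    now = now or datetime.now(timezone.utc).isoformat()
    for cond in conditions:
        if cond["type"] == ctype:
            if cond["status"] != status:
                cond["status"] = status
                cond["lastTransitionTime"] = now
            # Reason and message always reflect the latest observation.
            cond["reason"] = reason
            cond["message"] = message
            return conditions
    conditions.append({"type": ctype, "status": status, "reason": reason,
                       "message": message, "lastTransitionTime": now})
    return conditions


conds = []
set_condition(conds, "Ready", "False", "Provisioning", "0/3 replicas ready", now="t0")
set_condition(conds, "Ready", "False", "Provisioning", "2/3 replicas ready", now="t1")
print(conds[0]["lastTransitionTime"])  # t0 (status did not change)
set_condition(conds, "Ready", "True", "AllReplicasReady", "3/3 replicas ready", now="t2")
print(conds[0]["status"], conds[0]["lastTransitionTime"])  # True t2
```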
Vocabulary:
• spec: the desired state declared by the user in a Kubernetes resource
• status: the observed state reported by the controller
• drift: a discrepancy between desired and actual state
• self-healing: the system automatically corrects drift without human intervention
• convergence: the process of actual state approaching desired state over time
4 / 5
A security review requires: "All Pod specs must go through a mutating webhook to inject the sidecar, and a validating webhook to enforce policy." What is the key difference between these two admission webhook types?
Admission webhooks vocabulary:
Admission webhooks are HTTP callbacks that the Kubernetes API server calls as part of the admission control pipeline — after the request is authenticated and authorised, but before it is persisted to etcd.
Mutating Admission Webhook
• Can modify the incoming object by returning a base64-encoded JSON Patch in its response
• Runs before validating webhooks
• Multiple mutating webhooks are called serially, and you should not rely on their order; set reinvocationPolicy: IfNeeded if a webhook must run again after later webhooks change the object
• Use cases: sidecar injection (Istio and Linkerd inject proxies), setting default values, adding labels/annotations, injecting containers that fetch secrets from a vault
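The response side of a sidecar-injecting webhook can be sketched without an HTTP server. Assumptions: the sidecar name and image are hypothetical, the AdmissionReview has already been decoded from JSON, and HTTPS serving and TLS setup are omitted.

```python
import base64
import json

SIDECAR = {"name": "proxy", "image": "example.com/proxy:1.0"}  # hypothetical sidecar


def mutate(review):
    """Return an AdmissionReview response that injects SIDECAR if it is missing."""
    request = review["request"]
    pod = request["object"]
    containers = pod["spec"].get("containers", [])
    patch = []
    if not any(c["name"] == SIDECAR["name"] for c in containers):
        # "-" appends to the end of the containers array.
        patch.append({"op": "add", "path": "/spec/containers/-", "value": SIDECAR})
    response = {"uid": request["uid"], "allowed": True}
    if patch:
        response["patchType"] = "JSONPatch"
        # The patch travels base64-encoded inside the JSON response.
        response["patch"] = base64.b64encode(json.dumps(patch).encode()).decode()
    return {"apiVersion": "admission.k8s.io/v1", "kind": "AdmissionReview", "response": response}


review = {
    "apiVersion": "admission.k8s.io/v1",
    "kind": "AdmissionReview",
    "request": {
        "uid": "example-uid-1",  # the API server supplies a real uid
        "object": {"metadata": {"name": "web"},
                   "spec": {"containers": [{"name": "app", "image": "nginx"}]}},
    },
}
out = mutate(review)
print(json.loads(base64.b64decode(out["response"]["patch"])))
```

Checking for an existing sidecar keeps the webhook idempotent, which matters if it is reinvoked after other mutating webhooks.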
Validating Admission Webhook
• Can only approve or deny the request; it cannot modify the object
• Runs after all mutating webhooks (and after the object passes schema validation)
• Multiple validating webhooks are called in parallel; if any one denies, the request is rejected
• Use cases: enforcing policy (image registry allowlists, required labels, resource limits, replacements for the removed PodSecurityPolicy), compliance checks
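A validating webhook returns only a verdict. A minimal sketch of an image-registry allowlist, with a hypothetical registry name and a small helper for building test requests:

```python
ALLOWED_REGISTRIES = ("registry.example.com/",)  # hypothetical allowlist (trailing "/" matters)


def validate(review):
    """Allow a Pod only if every container image comes from an allowed registry."""
    request = review["request"]
    spec = request["object"]["spec"]
    containers = spec.get("initContainers", []) + spec.get("containers", [])
    denied = [c["image"] for c in containers if not c["image"].startswith(ALLOWED_REGISTRIES)]
    response = {"uid": request["uid"], "allowed": not denied}
    if denied:
        # No patch here: a validating webhook can only say yes or no, with a reason.
        response["status"] = {"code": 403, "message": "disallowed images: " + ", ".join(denied)}
    return {"apiVersion": "admission.k8s.io/v1", "kind": "AdmissionReview", "response": response}


def pod_review(uid, *images):
    """Build a minimal AdmissionReview request for a Pod (hypothetical test helper)."""
    containers = [{"name": f"c{i}", "image": image} for i, image in enumerate(images)]
    return {"request": {"uid": uid, "object": {"spec": {"containers": containers}}}}


print(validate(pod_review("a", "registry.example.com/app:1.0"))["response"]["allowed"])  # True
print(validate(pod_review("b", "docker.io/library/nginx"))["response"])
```

The trailing slash in the allowlist entry stops a lookalike host such as registry.example.com.evil.io from passing the prefix check.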
Webhook configuration:
• MutatingWebhookConfiguration: registers mutating webhooks
• ValidatingWebhookConfiguration: registers validating webhooks
• failurePolicy: Fail (the default; reject the request if the webhook call fails, e.g., it is unreachable or times out) or Ignore (let the request through)
• namespaceSelector: limits which namespaces the webhook applies to
• rules: which API operations (CREATE, UPDATE, DELETE, CONNECT) and resource types trigger the webhook
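Putting these fields together, a ValidatingWebhookConfiguration for an image-allowlist policy might look like this, written as a Python dict and printed as JSON. The webhook name, Service, namespace, and path are hypothetical, and the caBundle is left out.

```python
import json

# Hypothetical registration for a validating webhook served by an in-cluster Service.
config = {
    "apiVersion": "admissionregistration.k8s.io/v1",
    "kind": "ValidatingWebhookConfiguration",
    "metadata": {"name": "image-policy"},
    "webhooks": [
        {
            "name": "images.policy.example.com",  # must be a fully qualified name
            "admissionReviewVersions": ["v1"],
            "sideEffects": "None",
            "failurePolicy": "Fail",  # reject requests if the webhook call fails
            "timeoutSeconds": 5,
            "clientConfig": {
                "service": {"name": "image-policy", "namespace": "policy-system", "path": "/validate"},
                # caBundle (base64 CA certificate) omitted in this sketch
            },
            "rules": [
                {
                    "apiGroups": [""],  # "" is the core API group
                    "apiVersions": ["v1"],
                    "operations": ["CREATE", "UPDATE"],
                    "resources": ["pods"],
                    "scope": "Namespaced",
                }
            ],
            # Skip kube-system so a broken webhook cannot block system Pods.
            "namespaceSelector": {
                "matchExpressions": [
                    {"key": "kubernetes.io/metadata.name", "operator": "NotIn", "values": ["kube-system"]}
                ]
            },
        }
    ],
}

print(json.dumps(config, indent=2))
```

In admissionregistration.k8s.io/v1, sideEffects and admissionReviewVersions are required fields, so they appear even in a minimal configuration.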
Vocabulary:
• admission controller: a plugin in the API server that intercepts requests; webhooks are dynamic admission controllers configured at runtime
• JSON Patch: the RFC 6902 format for describing mutations as a list of operations (add, remove, replace, move, copy, test)
• OPA (Open Policy Agent) / Gatekeeper: a common policy engine, used mainly through validating webhooks
• Kyverno: a Kubernetes-native policy engine that uses both webhook types
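To see what a JSON Patch does to an object, here is a minimal applier for add, replace, and remove. It is a sketch, not a complete RFC 6902 implementation: move, copy, and test are rejected, and patching the whole document (the "" path) is not supported. The Pod and label names are hypothetical.

```python
import copy


def _unescape(token):
    # JSON Pointer (RFC 6901): decode "~1" to "/" first, then "~0" to "~".
    return token.replace("~1", "/").replace("~0", "~")


def apply_patch(doc, patch):
    """Apply JSON Patch add/replace/remove ops to a copy of `doc`."""
    doc = copy.deepcopy(doc)
    for op in patch:
        tokens = [_unescape(t) for t in op["path"].split("/")[1:]]
        parent = doc
        for token in tokens[:-1]:  # walk to the container of the target
            parent = parent[int(token)] if isinstance(parent, list) else parent[token]
        last, kind = tokens[-1], op["op"]
        if isinstance(parent, list):
            if kind == "add" and last == "-":
                parent.append(op["value"])
            elif kind == "add":
                parent.insert(int(last), op["value"])
            elif kind == "replace":
                parent[int(last)] = op["value"]
            elif kind == "remove":
                del parent[int(last)]
            else:
                raise ValueError(f"unsupported op: {kind}")
        else:
            if kind == "replace" and last not in parent:
                raise KeyError(f"replace target does not exist: {op['path']}")
            if kind in ("add", "replace"):
                parent[last] = op["value"]
            elif kind == "remove":
                del parent[last]
            else:
                raise ValueError(f"unsupported op: {kind}")
    return doc


pod = {"metadata": {"labels": {"app": "web"}},
       "spec": {"containers": [{"name": "app", "image": "nginx:1.25"}]}}
patch = [
    {"op": "add", "path": "/spec/containers/-", "value": {"name": "proxy", "image": "example.com/proxy:1.0"}},
    {"op": "replace", "path": "/spec/containers/0/image", "value": "nginx:1.27"},
    {"op": "add", "path": "/metadata/labels/example.com~1injected", "value": "true"},  # "~1" encodes "/"
    {"op": "remove", "path": "/metadata/labels/app"},
]
patched = apply_patch(pod, patch)
print(patched["metadata"]["labels"])  # {'example.com/injected': 'true'}
```

Label keys containing "/" are common in Kubernetes, which is why the ~1 escape shows up in real webhook patches.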
5 / 5
A new team member asks: "The job description mentions Kubebuilder and Operator SDK. What do these tools help build?"
Kubebuilder and Operator SDK vocabulary:
Kubebuilder
An official Kubernetes project (kubernetes-sigs, sponsored by SIG API Machinery) for building operators. It scaffolds a complete operator project with:
• Go project structure with a Makefile
• CRD API type definitions (Go structs with markers that generate the CRD YAML)
• Controller stubs with the reconcile loop
• RBAC manifests for the controller's ServiceAccount
• Webhook scaffolding for mutating/validating webhooks
• Integration test setup with envtest (runs a real etcd and kube-apiserver for testing)
Operator SDK
Part of the Operator Framework and maintained largely by Red Hat / OpenShift engineers. Its Go operator support is built on Kubebuilder's scaffolding, so both share controller-runtime. It extends Kubebuilder with:
• Go operators: same project layout as Kubebuilder
• Helm operators: wrap an existing Helm chart in an operator (no Go code needed)
• Ansible operators: use Ansible playbooks/roles as the reconciliation logic
• OLM (Operator Lifecycle Manager) integration: packaging operators as bundles for distribution, e.g., on OperatorHub
controller-runtime
The underlying Go library both tools use. It provides:
• Manager: starts controllers, webhooks, and shared caches; handles leader election
• Reconciler interface: implement Reconcile(ctx, req) (ctrl.Result, error)
• Client: Kubernetes API client; by default reads are served from the informer cache and writes go to the API server
• Builder: fluent API to set up watches and event filters (For, Owns, Watches, WithEventFilter)
Vocabulary:
• scaffold: generate boilerplate code structure from templates
• markers: Go comments like //+kubebuilder:rbac:groups=... that controller-gen reads; make manifests turns them into CRD, RBAC, and webhook YAML, while make generate produces DeepCopy methods
• envtest: a test environment that starts etcd and kube-apiserver (no kubelet or controller-manager, so Pods never actually run) for integration tests
• OLM: Operator Lifecycle Manager; manages operator installation, upgrades, and dependencies
• OperatorHub: a catalogue of community and certified operators