Reading a Root Cause Analysis Section — Comprehension Exercises
Read the root cause analysis excerpt below, then answer comprehension questions about cascading failures, contributing factors, and failure analysis terminology.
📄 PASSAGE — Read carefully before answering
Root Cause Analysis — INC-2026-0314
Root cause: The primary queue-svc instance rejected all incoming connections after its TLS certificate expired. Because queue-svc uses mutual TLS (mTLS), deployment workers could not authenticate and failed immediately rather than falling back to a secondary queue.
Contributing factors:
1. The certificate renewal automation was introduced 14 months ago but was never configured to cover internal services — only public-facing endpoints.
2. No alert existed for internal-service certificates with fewer than 30 days remaining before expiry.
3. The fallback queue path was documented but never tested in production conditions.
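The alerting gap in contributing factor 2 can be illustrated with a minimal sketch. It assumes a hypothetical inventory that maps service names to certificate notAfter strings in the format Python's ssl module reports; the function names, the second service name, and all dates are illustrative and are not taken from the incident.

```python
import ssl
from datetime import datetime, timezone

ALERT_THRESHOLD_DAYS = 30  # the 30-day window named in contributing factor 2


def days_remaining(not_after: str, now: datetime) -> float:
    """Days until expiry for a notAfter string such as 'Mar 14 00:00:00 2026 GMT'."""
    expires_at = ssl.cert_time_to_seconds(not_after)  # seconds since the epoch (UTC)
    return (expires_at - now.timestamp()) / 86400


def certs_needing_alert(inventory: dict[str, str], now: datetime) -> list[str]:
    """Return the services whose certificate expires within the alert threshold."""
    return sorted(
        name
        for name, not_after in inventory.items()
        if days_remaining(not_after, now) < ALERT_THRESHOLD_DAYS
    )


# Hypothetical inventory: the expiry dates are invented for illustration only.
inventory = {
    "queue-svc": "Mar 14 00:00:00 2026 GMT",
    "public-api": "Dec 31 00:00:00 2026 GMT",
}
now = datetime(2026, 3, 1, tzinfo=timezone.utc)
print(certs_needing_alert(inventory, now))  # ['queue-svc']
```

The check only helps if the inventory includes internal services, which is the coverage gap described in contributing factor 1.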
Applying the 5 Whys method:
1. Why did deployments fail? Because queue-svc rejected connections.
2. Why? Because its certificate had expired.
3. Why was it not renewed? Because the renewal automation did not cover it.
4. Why? Because internal services were excluded from the initial automation scope.
5. Why was this not corrected? Because no audit process verified automation coverage after new services were added.
This was a latent issue: the certificate had been approaching expiry for 90 days with no observable effect until it passed its expiry date. The system had no single point of failure in its design, yet the failure propagated because the fallback path was untested and silently broken.
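To make the fallback point concrete, here is a minimal sketch of a worker that tries a primary queue and then a secondary one. The function submit_with_fallback and the sender callables are hypothetical; the passage does not describe how the real deployment workers or queues are implemented.

```python
import ssl
from typing import Callable, Sequence


def submit_with_fallback(job: str, queues: Sequence[Callable[[str], str]]) -> str:
    """Try each queue sender in order and return the first successful result."""
    last_error = None
    for send in queues:
        try:
            return send(job)
        except OSError as exc:  # ssl.SSLError and ConnectionError both derive from OSError
            last_error = exc
    raise RuntimeError("all queues rejected the job") from last_error


# Hypothetical senders standing in for the primary and secondary queues.
def primary(job: str) -> str:
    raise ssl.SSLError("certificate has expired")


def secondary(job: str) -> str:
    return "secondary:" + job


print(submit_with_fallback("deploy-42", [primary, secondary]))  # secondary:deploy-42
```

Forcing the primary sender to fail, as above, is one way to exercise a fallback path before an incident rather than during one.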
Question 1 of 4