5 exercises — traceId and distributed tracing, connection pool exhaustion, degraded health checks, circuit breakers, and rate limiting. Read log evidence and communicate findings clearly.
JSON log field reference
timestamp — when the event happened (ISO 8601 / UTC)
A log entry reads: {"timestamp":"2026-04-07T03:14:22.441Z","level":"ERROR","service":"payment-api","traceId":"9f2c1d8e","userId":"usr_8821","msg":"charge failed","error":"card_declined","duration_ms":312}
A colleague asks: "Which field should I use to find all other log entries from the same request across multiple services?" What is your answer?
traceId (also called correlation ID or request ID) is the key field for distributed tracing.
What traceId means: When a single user action (e.g., a payment) touches multiple services (payment-api → fraud-service → bank-gateway → notification-service), a single traceId is injected at the first service and passed to every downstream service. This allows you to reconstruct the complete request path across all services.
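To make the propagation concrete, here is a minimal sketch of how a service reuses an incoming traceId or starts a new one, then attaches it to every downstream call. The header name X-Trace-Id and the helper functions are illustrative assumptions, not a specific framework's API; real services usually do this in HTTP middleware or via an OpenTelemetry SDK (which uses the W3C "traceparent" header).

```python
import uuid

# Hypothetical header name; real tracing stacks standardize this for you.
TRACE_HEADER = "X-Trace-Id"

def ensure_trace_id(incoming_headers: dict) -> str:
    """Reuse the caller's traceId if present, otherwise start a new trace."""
    return incoming_headers.get(TRACE_HEADER) or uuid.uuid4().hex[:8]

def call_downstream(url: str, trace_id: str) -> dict:
    """Every outbound call carries the same traceId so its logs can be correlated."""
    return {"url": url, "headers": {TRACE_HEADER: trace_id}}

# payment-api receives an external request with no trace header, so it starts
# the trace; fraud-service, bank-gateway, etc. would reuse the incoming value.
trace_id = ensure_trace_id({})
print(call_downstream("https://fraud-service.internal/check", trace_id))
```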
Common names for this field: • traceId, trace_id • requestId, request_id • correlationId, X-Correlation-ID • Related but narrower: spanId identifies one sub-operation within a trace (OpenTelemetry); search by traceId, not spanId, to see the whole request
In Kibana/Splunk/Grafana Loki: traceId: "9f2c1d8e" → shows all log lines across all services for this request
Other fields explained: • service — identifies which service emitted the log • userId — identifies the user (may span many requests) • duration_ms — how long the operation took • error — machine-readable error code (vs msg which is human-readable)
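The search side, sketched in Python: filter newline-delimited JSON logs by traceId and order the matches by timestamp. The file name app.log is a hypothetical local export; Kibana, Splunk, and Loki run the same filter server-side.

```python
import json

def lines_for_trace(path: str, trace_id: str) -> list[dict]:
    """Collect every log entry in a newline-delimited JSON file for one trace."""
    matches = []
    with open(path, encoding="utf-8") as f:
        for raw in f:
            try:
                entry = json.loads(raw)
            except json.JSONDecodeError:
                continue  # skip non-JSON lines (stack traces, startup banners)
            if entry.get("traceId") == trace_id:
                matches.append(entry)
    # ISO 8601 UTC timestamps sort correctly as strings, so this orders the request's steps.
    return sorted(matches, key=lambda e: e.get("timestamp", ""))

for entry in lines_for_trace("app.log", "9f2c1d8e"):
    print(entry["timestamp"], entry.get("service"), entry.get("msg"))
```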
Exercise 2 / 5
You see these two log entries from the same service within 2 seconds: {"level":"WARN","msg":"database connection pool exhausted, waiting for connection","pool_size":10,"waiting":8} {"level":"ERROR","msg":"database query timeout after 30000ms","query":"SELECT * FROM orders WHERE...","timeout_ms":30000}
What is the correct interpretation of what these two log lines are telling you together?
These two log lines together tell a causal story — read them in sequence:
Line 1 (WARN): pool exhausted, waiting: 8 → All 10 database connections are occupied. 8 requests are queued waiting for a free connection. → This is a warning, not yet an error — the system is degraded but still functioning.
Line 2 (ERROR): query timeout after 30000ms → A request waited so long for a connection (or the query itself ran long) that it hit the 30-second timeout. → This is now an error — requests are failing.
Possible root causes to investigate: 1. Slow queries — queries holding connections too long, blocking the pool 2. Connection leak — connections opened but never returned to the pool 3. Traffic spike — more concurrent requests than the pool supports 4. Pool size too small for current load
Key log reading vocabulary: • pool_exhausted / waiting — resource saturation signal • timeout after Nms — request did not complete within the allowed time • connection leak — connections not returned to pool • pool_size — maximum concurrent database connections configured
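A minimal sketch of how this failure mode arises, assuming a thread-per-request service with a bounded pool: when every connection is busy, new requests queue (the WARN), and after 30 seconds of waiting they give up (the ERROR). The pool and query here are simulated; only the two numbers come from the logs above.

```python
import threading
import time

POOL_SIZE = 10             # pool_size: 10 from the WARN line
ACQUIRE_TIMEOUT_S = 30.0   # timeout_ms: 30000 from the ERROR line

pool = threading.BoundedSemaphore(POOL_SIZE)

def run_query(query: str) -> None:
    # If all 10 connections are busy, this call blocks: the queued state
    # that the WARN line ("pool exhausted, waiting") is reporting.
    if not pool.acquire(timeout=ACQUIRE_TIMEOUT_S):
        # After 30s of waiting the request gives up: the ERROR line.
        raise TimeoutError("database query timeout after 30000ms")
    try:
        time.sleep(0.5)  # stand-in for the query; a slow query here starves everyone else
    finally:
        pool.release()   # skipping this release is the classic "connection leak"
```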
Exercise 3 / 5
A log entry shows: {"level":"INFO","msg":"health check","status":"ok","checks":{"db":"ok","redis":"ok","queue":"degraded"},"duration_ms":45}
What action, if any, does this log entry require?
Log levels are not the whole picture — always read the content too.
This is an important trap: the level is INFO, which may seem benign, but the content shows a degraded dependency. The required action is to investigate the queue component now, even though nothing here is logged at WARN or ERROR.
Understanding status: degraded in health checks: • ok — fully healthy • degraded — functioning but with reduced capacity or elevated error rate • failure / error — not functioning
A "degraded" queue might mean: • Consumer lag is increasing (messages not being processed fast enough) • The queue is approaching its size limit • Some queue workers are down • Connection retries are occurring silently
Why this was logged at INFO, not WARN or ERROR: The health check itself completed successfully (it checked and returned a result). The health check service often logs at INFO regardless of component status, relying on downstream alerting to trigger on degraded values.
Standard health check log vocabulary: • status: ok — all components healthy • status: degraded — service is up but one or more components are impaired • status: unavailable / down — service is not serving traffic • checks: { component: "degraded" } — per-dependency health detail
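A minimal sketch of an alert rule that keys on that per-dependency detail rather than the log level. The parsing is plain JSON handling; where the alert actually goes (page, ticket, Slack) is an assumption, stubbed here as a print.

```python
import json

line = ('{"level":"INFO","msg":"health check","status":"ok",'
        '"checks":{"db":"ok","redis":"ok","queue":"degraded"},"duration_ms":45}')

entry = json.loads(line)

# Key on the content, not the level: collect every component that is not "ok".
unhealthy = {name: state for name, state in entry.get("checks", {}).items() if state != "ok"}

if unhealthy:
    # The level was INFO, but this is where a page or ticket would be raised.
    print(f"ALERT: degraded or failing components: {unhealthy}")  # {'queue': 'degraded'}
```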
Exercise 4 / 5
During an incident, you find this log line: {"level":"ERROR","msg":"upstream service unavailable","upstream":"inventory-service","attempts":3,"last_error":"connection refused (ECONNREFUSED)","circuit_breaker":"open"}
What does "circuit_breaker: open" mean in this context?
A circuit breaker is a resiliency pattern that stops calling a failing upstream service to prevent the failure from spreading.
Circuit breaker states: • Closed (normal) — requests pass through; failures are counted • Open — threshold of failures exceeded; ALL requests to this upstream are immediately rejected (no attempt made) for a timeout period • Half-open — timeout expired; allows a few test requests through to see if the upstream has recovered
Why "open" is actually a protective measure, not a problem: Without a circuit breaker: 1. inventory-service returns ECONNREFUSED 2. Every request tries to call it and waits for a timeout 3. Thread pool fills up waiting for timeouts 4. Your service becomes slow or unresponsive (cascade failure)
With an open circuit breaker: 1. Requests that need inventory-service fail fast with a clear error 2. Other functionality continues working 3. Your service logs clearly what is unavailable
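A minimal sketch of the three states in Python. The threshold, the open period, and the class itself are illustrative assumptions, not any particular library's API; in production this logic usually comes from a resilience library or a service mesh.

```python
import time

class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, open_seconds: float = 30.0):
        self.failure_threshold = failure_threshold  # failures before tripping (illustrative)
        self.open_seconds = open_seconds            # how long to stay open (illustrative)
        self.failures = 0
        self.opened_at = None                       # None means the circuit is closed

    def call(self, upstream_call):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.open_seconds:
                # Open: fail fast, no network attempt, no thread stuck waiting on a timeout.
                raise RuntimeError("circuit_breaker: open, upstream unavailable")
            # Half-open: the open period expired, so let one test request through.
        try:
            result = upstream_call()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()   # trip the breaker: closed -> open
            raise
        # Success: reset everything, back to closed.
        self.failures = 0
        self.opened_at = None
        return result
```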
ECONNREFUSED meaning: The operating system returned "Connection Refused" — the target host is reachable, but nothing is listening on that port. This usually means the upstream service process has crashed or is restarting.
Key vocabulary: • upstream — a service that this service calls • attempts: 3 — tried 3 times before failing • circuit_breaker: open — requests blocked until upstream recovers • ECONNREFUSED — OS-level: host reachable, but nothing listening on the port
Exercise 5 / 5
You are investigating a spike in errors. You find this log entry: {"level":"WARN","msg":"rate limit applied","client_ip":"203.0.113.42","endpoint":"/api/search","requests_last_minute":847,"limit":100,"action":"throttled","retry_after_ms":24000}
Write a one-sentence Slack incident update based only on the information in this log line. Which of the following is the best update?
A good incident update translates technical log data into a clear, factual, actionable statement.
Why option B is best: 1. States the specific fact: one client IP, specific endpoint, specific rate (847 vs 100 limit) 2. Describes the system response: throttled, retry in 24s (the system is working as designed) 3. Acknowledges uncertainty: "Investigating whether..." — correctly notes this is being analyzed, not yet resolved 4. Lists possible root causes: three plausible explanations without guessing
What makes the other options poor: • A — vague: "something is wrong" provides no actionable information • C — misinterpretation: the rate limiter is working correctly; calling it "broken" is wrong • D — wrong action: no evidence that a restart would help
Key log fields to extract for incident updates: • client_ip → WHO is causing the issue • endpoint → WHAT is being affected • requests_last_minute: 847 vs limit: 100 → HOW SEVERE • action: throttled → SYSTEM RESPONSE (is it being handled?) • retry_after_ms: 24000 → RECOVERY TIME
Incident update vocabulary: "A client is exceeding rate limits on [endpoint] — throttling has been applied." "Investigating whether this is [X], [Y], or [Z]." "The system is handling this via [mechanism]; no service disruption at this time."
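As a worked example, a small sketch that pulls those fields out of the log line and assembles an update in the shape above. The [X], [Y], [Z] placeholders stay as placeholders because the log line alone does not tell you the root cause; only the message construction is shown, not the Slack posting.

```python
import json

# The log line from the exercise, as newline-delimited JSON would deliver it.
line = ('{"level":"WARN","msg":"rate limit applied","client_ip":"203.0.113.42",'
        '"endpoint":"/api/search","requests_last_minute":847,"limit":100,'
        '"action":"throttled","retry_after_ms":24000}')

e = json.loads(line)

# WHO / WHAT / HOW SEVERE / SYSTEM RESPONSE / RECOVERY TIME, in one sentence.
update = (
    f"Client {e['client_ip']} is sending {e['requests_last_minute']} requests/min "
    f"to {e['endpoint']} (limit: {e['limit']}) and is being {e['action']}, "
    f"retry after {e['retry_after_ms'] // 1000}s; investigating whether this is "
    "[X], [Y], or [Z]; no service disruption at this time."
)
print(update)
```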