Practice English vocabulary for LLM observability: traces, token counts, TTFT latency, spans for retrieval and generation, and LLM-specific telemetry.
1 / 5
What does 'the LLM trace shows the full prompt and response' mean?
LLM traces (in tools like LangSmith, Weights & Biases Prompts, or OpenTelemetry with LLM extensions) capture the full prompt and response for each call. This is essential for debugging: you can see whether hallucinations came from missing context or poor instructions.
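To make "the trace shows the full prompt and response" concrete, here is a minimal sketch using only the Python standard library. It does not use the API of LangSmith or any other tool named above; `LLMTraceRecord`, `fake_llm`, and `traced_call` are hypothetical names chosen for illustration.

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass, field


@dataclass
class LLMTraceRecord:
    """One traced LLM call: the full prompt and the full response."""
    prompt: str
    response: str
    model: str
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    timestamp: float = field(default_factory=time.time)


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model call.
    return "I don't have enough context to answer."


def traced_call(prompt: str, model: str = "example-model") -> LLMTraceRecord:
    response = fake_llm(prompt)
    record = LLMTraceRecord(prompt=prompt, response=response, model=model)
    # Emit the record as one JSON line so it can be searched later.
    print(json.dumps(asdict(record)))
    return record


record = traced_call("Context: (none)\nQuestion: What is our refund policy?")
```

Reading the stored prompt here shows that the context was empty, which would point to missing context rather than poor instructions as the cause of a bad answer.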
2 / 5
Why does 'token count per call' matter for LLM observability?
Monitoring token counts reveals cost anomalies (a single call consuming 50K tokens), exposes prompt engineering opportunities (prompts that can be shortened without quality loss), and supports capacity planning for high-volume LLM applications.
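As a rough illustration of spotting a cost anomaly, the sketch below flags calls whose total token count is far above the median. The usage records, the field names, and the 10x threshold are all assumptions made for this example, not values from any particular provider.

```python
from statistics import median

# Hypothetical per-call token usage; field names are illustrative.
calls = [
    {"id": "a", "prompt_tokens": 900, "completion_tokens": 150},
    {"id": "b", "prompt_tokens": 1100, "completion_tokens": 200},
    {"id": "c", "prompt_tokens": 49000, "completion_tokens": 1000},
    {"id": "d", "prompt_tokens": 1000, "completion_tokens": 180},
]


def total_tokens(call: dict) -> int:
    return call["prompt_tokens"] + call["completion_tokens"]


def find_anomalies(calls: list, factor: float = 10) -> list:
    """Return ids of calls using more than `factor` times the median total."""
    typical = median(total_tokens(c) for c in calls)
    return [c["id"] for c in calls if total_tokens(c) > factor * typical]


anomalies = find_anomalies(calls)
print(anomalies)
```

The median is used instead of the mean so that a single 50K-token call does not inflate the baseline it is compared against.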
3 / 5
What is 'TTFT (Time to First Token)' and why is it important?
TTFT is the time between sending a request and receiving the first output token. In streaming LLM applications, it determines how quickly the user starts seeing output, so a long TTFT feels slow even if total generation time is reasonable. TTFT is affected by queue time, context length, and model warm-up, so it should be tracked separately from total latency.
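One way to measure TTFT separately from total latency is to timestamp the first chunk that arrives from a stream. The sketch below assumes a generic token iterator; `fake_stream` is a hypothetical stand-in for a real streaming client, and its sleep durations are arbitrary.

```python
import time
from typing import Iterable, Iterator, Tuple


def fake_stream() -> Iterator[str]:
    # Hypothetical stream: a delay before the first token, then steady output.
    time.sleep(0.05)  # simulated queue time and prompt processing
    for token in ["Hello", ",", " world"]:
        yield token
        time.sleep(0.01)


def measure_stream(stream: Iterable[str]) -> Tuple[float, float, str]:
    """Return (ttft_seconds, total_seconds, full_text) for a token stream."""
    start = time.perf_counter()
    first_token_at = None
    parts = []
    for token in stream:
        if first_token_at is None:
            first_token_at = time.perf_counter()
        parts.append(token)
    end = time.perf_counter()
    ttft = first_token_at - start if first_token_at is not None else float("nan")
    return ttft, end - start, "".join(parts)


ttft, total, text = measure_stream(fake_stream())
print(f"TTFT: {ttft:.3f}s, total: {total:.3f}s")
```

Recording both numbers per call makes it possible to tell a slow start apart from slow generation.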
4 / 5
What does 'the span captures the retrieval and generation separately' mean?
In a RAG pipeline, separate spans for retrieval (the vector DB query) and generation (the LLM call) allow each step to be profiled independently. If p99 latency is 5s, the spans show where the time goes: for example, 0.2s in vector search and 4.5s in LLM generation means generation is the bottleneck, which directs optimization effort precisely.
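The idea of separate spans can be sketched with a small timing context manager. This is a simplified stand-in for a real tracing library, not its API; `span`, `retrieve`, and `generate` are hypothetical names, and the sleeps only simulate work.

```python
import time
from contextlib import contextmanager

spans = {}  # span name -> duration in seconds


@contextmanager
def span(name: str):
    """Record how long the wrapped block takes under the given name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        spans[name] = time.perf_counter() - start


def retrieve(query: str) -> list:
    time.sleep(0.005)  # simulated vector DB query
    return ["doc about refunds"]


def generate(query: str, docs: list) -> str:
    time.sleep(0.05)  # simulated LLM call
    return f"Answer based on {len(docs)} document(s)."


def answer(query: str) -> str:
    with span("retrieval"):
        docs = retrieve(query)
    with span("generation"):
        return generate(query, docs)


result = answer("What is the refund policy?")
slowest = max(spans, key=spans.get)
print(spans, "slowest:", slowest)
```

Because each step has its own duration, the slowest one can be identified directly instead of guessed from the end-to-end latency.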
5 / 5
What is 'LLM-specific telemetry' beyond standard application metrics?
Standard application telemetry (requests/sec, error rate, p99 latency) is necessary but not sufficient for LLM applications. LLM-specific telemetry adds insight into why a call was expensive (long prompt), why it stopped (reached max_tokens), and whether prompt caching is working (cache hit rate).
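A small sketch of aggregating LLM-specific telemetry follows. The record fields (`cached_tokens`, `finish_reason`) and the definition of cache hit rate as the share of prompt tokens served from cache are assumptions for illustration; real providers name and report these differently.

```python
from collections import Counter

# Hypothetical per-call records; field names and values are illustrative.
calls = [
    {"prompt_tokens": 1200, "cached_tokens": 1000, "finish_reason": "stop"},
    {"prompt_tokens": 8000, "cached_tokens": 0, "finish_reason": "max_tokens"},
    {"prompt_tokens": 1200, "cached_tokens": 1000, "finish_reason": "stop"},
    {"prompt_tokens": 1600, "cached_tokens": 0, "finish_reason": "stop"},
]


def summarize(calls: list) -> dict:
    total_prompt = sum(c["prompt_tokens"] for c in calls)
    total_cached = sum(c["cached_tokens"] for c in calls)
    return {
        # Why calls stopped: natural end versus hitting the token limit.
        "finish_reasons": Counter(c["finish_reason"] for c in calls),
        # Assumed definition: fraction of prompt tokens read from cache.
        "cache_hit_rate": total_cached / total_prompt if total_prompt else 0.0,
        # Calls cut off by the limit, which may have incomplete output.
        "truncated_calls": sum(c["finish_reason"] == "max_tokens" for c in calls),
        # The most expensive prompt, a starting point for cost debugging.
        "largest_prompt": max(c["prompt_tokens"] for c in calls),
    }


summary = summarize(calls)
print(summary)
```

None of these values appear in requests/sec, error rate, or p99 latency, which is why they are tracked as their own signals.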