Advanced Vocabulary  #llmops #rag #ai #mlplatform

LLMOps & ML Platform Vocabulary

5 exercises — Practice LLMOps vocabulary in English: RAG, embeddings, vector stores, hallucination, evaluation pipelines, context windows, and inference cost.

Core LLMOps vocabulary clusters
  • RAG: retrieval-augmented generation, embedding, vector store, chunking, semantic search, retrieval quality, reranking
  • Evaluation: faithfulness, relevance, groundedness, LLM-as-judge, eval pipeline, hallucination detection
  • Context: context window, token, prompt, system prompt, few-shot, context stuffing, lost-in-the-middle
  • Deployment: inference, latency, throughput, batching, quantization, KV cache, model serving, cold start
  • Fine-tuning: PEFT, LoRA, instruction tuning, base model vs. fine-tuned, RLHF, DPO
Exercise 1 of 5
An ML engineer explains RAG architecture to the product team:
"RAG (Retrieval-Augmented Generation) addresses the knowledge cutoff problem. Instead of relying only on what the LLM learned during training, we retrieve relevant documents from a knowledge base at inference time and inject them into the prompt. Ahead of time, we split the knowledge base into chunks, embed each chunk, and store the vectors in a vector store. At query time, the pipeline is: user query → embed the query → vector search → retrieve top-K chunks → inject into prompt context → LLM generates a grounded answer. Because the LLM can only ground its answer in what was retrieved, answer quality depends on retrieval quality, not just on the LLM."
What problem does RAG solve, and what is the role of embeddings in the pipeline?
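To make the vocabulary concrete, here is a minimal sketch of the RAG pipeline the engineer describes, under loudly stated assumptions. The embedding is a hypothetical toy (`embed`, a bag-of-words count vector) standing in for a learned embedding model. The vector store is a hypothetical in-memory list (`InMemoryVectorStore`). The final LLM generation call is omitted rather than tied to any real client API. What it does show is the role of embeddings: chunks and the query are mapped into the same vector space, so cosine similarity can rank chunks for top-K retrieval before they are injected into the prompt.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase word tokens; a real system would use the embedding model's own tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text: str, vocab: list[str]) -> list[float]:
    # Hypothetical toy embedding: word counts over a fixed vocabulary.
    # Stands in for a learned embedding model; it captures word overlap, not meaning.
    counts = Counter(tokenize(text))
    return [float(counts[w]) for w in vocab]


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity; returns 0.0 when either vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemoryVectorStore:
    """Hypothetical minimal vector store: a list of (chunk, vector) pairs."""

    def __init__(self, vocab: list[str]) -> None:
        self.vocab = vocab
        self.items: list[tuple[str, list[float]]] = []

    def add(self, chunk: str) -> None:
        # Indexing step: embed each chunk once, ahead of query time.
        self.items.append((chunk, embed(chunk, self.vocab)))

    def search(self, query: str, k: int) -> list[str]:
        # Query step: embed the query into the same space, rank chunks by similarity.
        q = embed(query, self.vocab)
        scored = [(cosine(q, vec), chunk) for chunk, vec in self.items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [chunk for _, chunk in scored[:k]]


def build_prompt(query: str, chunks: list[str]) -> str:
    # Inject the retrieved top-K chunks into the prompt context.
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using only the context below.\nContext:\n{context}\n\nQuestion: {query}"


# Hypothetical example knowledge base, already split into chunks.
chunks = [
    "Our refund policy allows returns within 30 days of purchase.",
    "The API rate limit is 100 requests per minute per key.",
    "Support is available by email on weekdays.",
]
vocab = sorted({w for c in chunks for w in tokenize(c)})
store = InMemoryVectorStore(vocab)
for c in chunks:
    store.add(c)

query = "What is the API rate limit?"
top = store.search(query, k=1)
prompt = build_prompt(query, top)
# The final step (not shown) would send `prompt` to an LLM to generate a grounded answer.
print(prompt)
```

With a learned embedding model in place of `embed`, differently worded but related text (for example "throttling" versus "rate limit") would also score high. The toy version only rewards shared words, which is exactly the gap that semantic search closes.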