4 exercises — master the essential terms every developer needs to discuss AI: LLM, RAG, tokens, context window, fine-tuning, and more.
0 / 4 completed
1 / 4
What is a Large Language Model (LLM)?
An LLM (Large Language Model) is a type of deep learning model trained on very large text datasets. It learns statistical patterns in language and can:
• Generate text — write code, emails, articles, summaries • Answer questions — based on patterns learned during training • Translate languages • Reason over context — analyse documents, compare options, explain concepts
Examples: GPT-4 (OpenAI), Claude (Anthropic), Gemini (Google), Llama (Meta), Mistral.
Key vocabulary: • Parameters — the numbers inside the model that were trained. GPT-4 has an estimated ~1 trillion parameters. • Pre-training — the initial training phase on massive text data • Fine-tuning — additional training on a smaller dataset to specialise behaviour • Inference — using the trained model to generate a response (the "running" phase, as opposed to training)
2 / 4
Your team is building a feature where an LLM answers questions about your company's internal documentation. Which technique best describes this approach?
RAG (Retrieval-Augmented Generation) is the technique of combining an LLM with a search/retrieval system so the model can answer questions based on external or private documents.
How RAG works: 1. User asks a question 2. A retrieval system searches a document store (vector database, Elasticsearch, etc.) for relevant chunks 3. The relevant chunks are injected into the LLM's prompt as context 4. The LLM generates an answer grounded in the retrieved documents
Why RAG over fine-tuning for this use case? • Fine-tuning embeds knowledge into model weights — expensive, slow to update, requires training data • RAG retrieves fresh documents at query time — cheap to update (just update the document store), no retraining needed
Key RAG vocabulary: • Embeddings — vector representations of text used for similarity search • Vector database — a database optimised to store and search embeddings (e.g. Pinecone, Weaviate, pgvector) • Chunk — a piece of the document that is retrieved and injected • Context window — the maximum amount of text the LLM can process at once (determines how many chunks fit)
3 / 4
An LLM has a context window of 128,000 tokens. Roughly how many words is that?
Option C is correct. The common rule of thumb is: 1 token ≈ 0.75 words (in English). So 128,000 tokens ≈ 96,000 words.
For reference: the average novel is ~80,000–100,000 words. A 128K context window can hold roughly one entire novel.
Why tokens, not words? LLMs don't process words — they process tokens, which are sub-word units. The word "tokenisation" might be split into ["token", "isation"] = 2 tokens. Code is typically more token-dense than prose (braces, semicolons, and operators each cost tokens).
Practical implications: • Long documents — if a document exceeds the context window, you must truncate, chunk (RAG), or summarise • Cost — most LLM APIs charge per token (input + output tokens combined) • Temperature (not related to context, but often asked together) — a setting (0 to 1) controlling output randomness. Temperature 0 = deterministic/consistent; Temperature 1 = creative/varied
4 / 4
What is the difference between fine-tuning and prompting an LLM?
Option B is the correct and important distinction:
Prompting (zero-shot, few-shot, chain-of-thought) • Instructions given in the conversation at query time • No model weights are modified • Available to everyone, no ML expertise needed • Effect lasts only for that conversation • Cost: only the token cost of the request
Fine-tuning • Continues the training process with new labelled data • Updates the model's weights — the knowledge or style is baked in permanently • Requires a training dataset, ML infrastructure, and GPU time • Permanently changes how the model behaves (for that version) • Cost: training compute (significant) + serving the fine-tuned model
When to use each: • Use prompting for: context injection, format control, task steering • Use fine-tuning for: consistent style/tone that prompting can't achieve, domain-specific knowledge that must be in weights, faster inference by reducing prompt length
Most production AI features start with prompting and only fine-tune when the use case demands it.