5 exercises — choose the best-structured answer to common AI/ML interview questions. Focus on technical precision, correct ML vocabulary, and clear explanation structure.
Structure for ML concept questions
Define: state what the concept is and what property makes it distinct
Contrast: compare with the alternative or predecessor approach
Problem solved: explain what issue this technique addresses
Example: name a specific algorithm, model, or paper to show depth
1 / 5
The interviewer asks: "What is the difference between supervised and unsupervised learning?" Which answer is most precise and well-structured?
Option C is the strongest: it provides precise definitions of both paradigms, explains the mechanism for supervised learning (minimising prediction error), explains what unsupervised learning finds (intrinsic structure — clusters, density regions, latent factors), and names specific algorithms. Naming actual algorithms (K-means, PCA) signals hands-on experience. Option A is good and includes concrete examples, but lacks algorithm names and the term "mapping function." Option B is accurate but extremely brief. Option D is vague and adds no depth. Structure to copy: define A → mechanism → define B → mechanism → examples from each.
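The contrast can be made concrete in a toy sketch (pure Python, illustrative data invented here): the unsupervised learner gets only raw points and recovers cluster structure, while the supervised learner uses labels to fit a mapping that minimises prediction error.

```python
# Unsupervised: 1-D K-means with k=2 — no labels, only intrinsic structure.
def kmeans_1d(xs, iters=10):
    c1, c2 = min(xs), max(xs)          # crude initialisation at the extremes
    for _ in range(iters):
        a = [x for x in xs if abs(x - c1) <= abs(x - c2)]
        b = [x for x in xs if abs(x - c1) > abs(x - c2)]
        c1, c2 = sum(a) / len(a), sum(b) / len(b)
    return c1, c2

# Supervised: learn a decision threshold from labelled examples
# by minimising prediction error (0/1 loss) over candidate splits.
def fit_threshold(xs, ys):
    best_t, best_err = None, len(xs) + 1
    for t in xs:
        err = sum((x >= t) != y for x, y in zip(xs, ys))
        if err < best_err:
            best_t, best_err = t, err
    return best_t

data = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
print(kmeans_1d(data))                       # two cluster centres, no labels used
print(fit_threshold(data, [0, 0, 0, 1, 1, 1]))  # threshold learned from labels
```

Same data, two paradigms: K-means sees only `data`; the threshold fitter additionally sees the labels and optimises against them.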
2 / 5
The interviewer asks: "Can you explain overfitting and how you would address it?" Choose the best answer.
Option B is the best: it defines overfitting precisely (memorises training data instead of learning general patterns), gives the key symptom (high training accuracy but poor generalisation), and lists four specific remedies with brief technical descriptions. The depth of the remedy list (L1/L2, dropout, early stopping, data augmentation) signals real experience. Option D is technically correct and uses good vocabulary (bias-variance, variance, noise), but mentions only one remedy. Option C is clear and mentions the validation loss symptom — a good specific detail — but its remedies are imprecise. Option A is correct but shallow. Tip: always give at least two distinct solutions with brief explanations of each.
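One of the listed remedies, early stopping, fits in a few lines. A minimal sketch (loss values invented for illustration): watch validation loss and stop once it has failed to improve for `patience` epochs, keeping the best checkpoint.

```python
# Early stopping: halt training when validation loss stops improving
# for `patience` consecutive epochs, and keep the best epoch's weights.
def early_stop_epoch(val_losses, patience=2):
    best, best_epoch, waited = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return best_epoch      # roll back to the best checkpoint
    return best_epoch

# Validation loss falls, then rises as the model starts to memorise
# the training set — the overfitting symptom described above.
losses = [0.90, 0.70, 0.55, 0.50, 0.53, 0.58, 0.64]
print(early_stop_epoch(losses))        # → 3
```

In a real framework the same logic wraps the training loop (e.g. a callback), but the decision rule is exactly this comparison of validation losses.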
3 / 5
The interviewer asks: "What is a transformer architecture?" Which answer best demonstrates deep understanding?
Option B is the strongest: it places transformers in context (foundation of most modern language models), contrasts with the prior paradigm (unlike RNNs), defines self-attention precisely (weighted relationship between every token pair), names the key capability (long-range dependencies), and mentions the practical engineering benefit (training parallelisable on GPUs). Option D is also strong — naming "Attention is All You Need" and the encoder/decoder split for BERT vs GPT is impressive. Option A is accurate but superficial. Option C is conversational and correct but lacks technical depth. In senior ML interviews, contrast with the previous dominant approach (RNNs) to show historical understanding.
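The "weighted relationship between every token pair" can be shown directly. A simplified sketch in pure Python: scaled dot-product self-attention where queries, keys, and values are all the raw token vectors (the learned W_q/W_k/W_v projections of a real transformer are omitted for brevity).

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

# Scaled dot-product self-attention on toy 2-D token vectors:
# every token attends to every token, including itself.
def self_attention(tokens):
    d = len(tokens[0])
    out = []
    for q in tokens:                                    # each token as query
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in tokens]                      # score vs every key
        weights = softmax(scores)                       # one weight per token pair
        out.append([sum(w * v[j] for w, v in zip(weights, tokens))
                    for j in range(d)])                 # weighted sum of values
    return out

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(self_attention(tokens))
```

Note that the loop over queries has no sequential dependency between tokens — this is exactly why training parallelises on GPUs, in contrast to an RNN's step-by-step recurrence.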
4 / 5
The interviewer asks: "What is RAG — Retrieval-Augmented Generation — and when would you use it?" Choose the most technically complete answer.
Option A is the best: it defines RAG clearly (retrieves relevant documents from an external knowledge base at inference time), contrasts it with the alternative (knowledge baked into model weights), names the two core problems it solves (staleness and hallucination), and gives three concrete use cases. Option C is technically accurate and mentions the implementation stack (embeddings, vector DB) — a useful technical detail — but misses the "why" and use cases. Option D uses correct terms but is still relatively shallow. Option B is too informal ("make things up"). For product/engineering interviews, always state what problem the technique solves and give at least one concrete use case.
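The retrieval step Option C alludes to can be sketched without any real embedding model or vector DB. A toy version (all vectors and document names invented here): rank documents by cosine similarity to the query embedding and take the top k.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Hand-made toy embeddings standing in for a real embedding model + vector DB.
docs = {
    "refund policy":  [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.9, 0.1],
    "api reference":  [0.0, 0.1, 0.9],
}

def retrieve(query_vec, k=2):
    ranked = sorted(docs, key=lambda d: cosine(query_vec, docs[d]), reverse=True)
    return ranked[:k]

# The retrieved snippets would then be prepended to the prompt at inference
# time, grounding the answer in external text rather than model weights.
query = [0.8, 0.2, 0.1]   # pretend embedding of "how do refunds work?"
print(retrieve(query))
```

Everything downstream of `retrieve` is ordinary prompting; the point is that fresh documents flow in at inference time, which is what mitigates staleness and hallucination.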
5 / 5
The interviewer asks: "Can you explain what an embedding is in the context of ML?" Which answer is the most complete and accurate?
Option B is the strongest: it defines embeddings precisely (dense, low-dimensional vector representation), names the key property (similar items map to nearby points), lists multiple data types they apply to (text, images, graph nodes), explains how they're created (learned during training), and situates them in current ML systems (NLP, recommendation, RAG). Option D is also solid — naming Word2Vec and BERT and mentioning three applications shows good breadth. Option C is accurate and gives a helpful example (king/queen) but lacks precision ("words into numbers" is too vague). Option A is technically true but extremely superficial. Key tip: name the property (semantic similarity → geometric proximity), the learning mechanism, and at least two downstream applications.
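The key property — semantic similarity becomes geometric proximity — can be demonstrated with hand-written toy vectors (real embeddings from Word2Vec, BERT, etc. are learned during training, not set by hand as here):

```python
import math

# Invented 3-D vectors: "cat" and "dog" placed near each other,
# "car" placed far away, to mimic what a learned embedding produces.
emb = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

print(cosine(emb["cat"], emb["dog"]))  # high: similar meanings, nearby points
print(cosine(emb["cat"], emb["car"]))  # low: different meanings, distant points
```

The same nearest-neighbour comparison is what powers the downstream applications listed above: recommendation ranks items by proximity to a user vector, and RAG retrieval ranks documents by proximity to a query vector.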