5 exercises — choose the best-structured answer to common recommendation system interview questions. Focus on algorithm design, cold start, evaluation, and real-time serving.
Structure for recommendation system interview answers
Name the filtering paradigm: collaborative, content-based, hybrid — and when each applies
Address cold start explicitly: new users and new items need separate strategies
Separate retrieval from ranking: two-stage pipelines are standard at scale
Name offline and online metrics: NDCG/MAP and CTR/conversion are different signal types
1 / 5
The interviewer asks: "Explain collaborative filtering — how it works, its assumptions, and where it breaks down." Choose the most complete and accurate answer.
Option C is strongest: it covers both memory-based and model-based CF with specific algorithms (ALS, two-tower neural network), articulates the key assumptions (rarely stated, so naming them shows deep understanding), and enumerates four distinct failure modes: sparsity, popularity bias, cold start, and data poisoning. Option D makes a valid practical observation but doesn't explain the mechanism or the failure modes.
Collaborative filtering: memory-based vs. model-based → matrix factorisation mechanics → key assumptions → failure modes (sparsity, popularity bias, cold start, data poisoning).
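To make the memory-based side concrete, here is a minimal sketch of item-based collaborative filtering with cosine similarity. The ratings, user names, and function names are hypothetical, and the similarity is deliberately unadjusted (no mean-centring), so treat it as an illustration of the mechanism and of where it breaks down rather than a production recipe.

```python
from math import sqrt

# Hypothetical toy ratings: user -> {item: rating}. Missing entries are unobserved.
ratings = {
    "ann": {"a": 5, "b": 4},
    "bob": {"a": 4, "b": 5, "c": 2},
    "cat": {"b": 1, "c": 5},
    "dan": {"d": 4},  # shares no items with anyone else
}

def item_vectors(ratings):
    """Invert user -> item ratings into item -> {user: rating} vectors."""
    vectors = {}
    for user, items in ratings.items():
        for item, rating in items.items():
            vectors.setdefault(item, {})[user] = rating
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0  # sparsity: no co-ratings means no similarity signal
    dot = sum(u[key] * v[key] for key in shared)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

def predict(user, item, ratings):
    """Similarity-weighted average of the user's ratings on other items."""
    vectors = item_vectors(ratings)
    if item not in vectors:
        return None  # item cold start: nobody has rated it yet
    numerator = denominator = 0.0
    for other, rating in ratings.get(user, {}).items():
        if other == item:
            continue
        sim = cosine(vectors[item], vectors[other])
        numerator += sim * rating
        denominator += abs(sim)
    # No overlapping neighbours: collaborative filtering has nothing to say.
    return numerator / denominator if denominator else None
```

Calling `predict("ann", "c", ratings)` returns a weighted average of Ann's own ratings, while `predict("dan", "a", ratings)` returns `None` because Dan's only item has no co-ratings with anything else, which is the sparsity and cold start failure in miniature.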
2 / 5
The interviewer asks: "How would you handle the cold start problem for a new user signing up for a music recommendation service?" Which answer covers the most complete cold start strategy?
Option A is strongest: it provides six complementary strategies rather than one. It motivates explicit elicitation with psychology (the step should feel like personalisation, not a survey), specifies content-based signals particular to music (audio features), uses contextual signals (time of day, geography), restricts popularity-based recommendations to the user's stated preferences, specifies the signal threshold for switching modes (10-20 interactions), and adds an A/B test for the elicitation-length trade-off. Options B and D describe the right approach, but each covers only one or two of the six dimensions.
Cold start strategy: explicit elicitation → content-based bootstrap → contextual signals → popular-within-segment → quick warm-up threshold → A/B test elicitation length.
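One way to picture the warm-up threshold is as a blend that shifts weight from segment popularity to personalised scores as interactions accumulate. The sketch below is an assumption-laden illustration: the function names, the linear ramp, and the default threshold of 15 (picked from inside the 10-20 range mentioned above) are all hypothetical.

```python
def blend_weight(n_interactions, warm_threshold=15):
    """Weight on the personalised score; ramps linearly from 0 to 1."""
    return min(n_interactions / warm_threshold, 1.0)

def cold_start_ranking(n_interactions, personalised, segment_popularity,
                       item_genre, stated_genres, warm_threshold=15):
    """Rank items by a popularity/personalised blend.

    While the user is still cold, popular items are restricted to the
    genres the user picked during onboarding (hypothetical elicitation step).
    """
    w = blend_weight(n_interactions, warm_threshold)
    still_cold = n_interactions < warm_threshold
    scores = {}
    for item, popularity in segment_popularity.items():
        if still_cold and stated_genres and item_genre[item] not in stated_genres:
            continue  # keep popularity fallback inside stated preferences
        scores[item] = w * personalised.get(item, 0.0) + (1 - w) * popularity
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical toy data for a brand-new listener who picked "jazz".
segment_popularity = {"j1": 0.6, "p1": 0.9, "j2": 0.3}
item_genre = {"j1": "jazz", "p1": "pop", "j2": "jazz"}
personalised = {"j2": 1.0, "p1": 0.2}

day_one = cold_start_ranking(0, personalised, segment_popularity,
                             item_genre, {"jazz"})
warmed_up = cold_start_ranking(15, personalised, segment_popularity,
                               item_genre, {"jazz"})
```

With zero interactions the ranking is purely popular jazz; once the threshold is reached the genre filter is dropped and the personalised scores take over entirely. A real system would likely tune both the ramp shape and the threshold through experiments.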
3 / 5
The interviewer asks: "How would you evaluate a recommendation system offline — and what are the limitations of offline evaluation?" Choose the answer that best demonstrates evaluation methodology depth.
Option D is strongest: it specifies a temporal split (a random split leaks future data, a well-known evaluation mistake), names five specific metrics with definitions, identifies the selection bias limitation at a fundamental level (false negatives from items never shown), names counterfactual evaluation with inverse propensity scoring (IPS) as the solution, and explicitly acknowledges the offline/online gap, which shows real-world experience. Options A and B list the right metrics but miss the temporal split requirement and don't address selection bias.
Offline evaluation: temporal split (random = data leakage) → NDCG/MRR/HitRate/Coverage/Diversity → selection bias limitation → IPS for counterfactual debiasing → offline/online gap warning.
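As a rough illustration of the first two steps, the sketch below pairs a temporal split with binary-relevance NDCG@k. The `(user, item, timestamp)` event layout and the function names are assumptions made for the example; graded relevance and the other metrics are left out for brevity.

```python
from math import log2

def temporal_split(events, cutoff):
    """Train on events strictly before `cutoff`, test on the rest.

    Each event is a hypothetical (user, item, timestamp) tuple. Splitting
    on time means the model never trains on interactions that happened
    after the ones it is evaluated on.
    """
    train = [e for e in events if e[2] < cutoff]
    test = [e for e in events if e[2] >= cutoff]
    return train, test

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG@k: discounted gain divided by the ideal gain."""
    dcg = sum(1.0 / log2(position + 2)
              for position, item in enumerate(ranked[:k])
              if item in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / log2(position + 2) for position in range(ideal_hits))
    return dcg / idcg if idcg else 0.0

events = [("u1", "a", 1), ("u1", "b", 5), ("u2", "c", 9)]
train, test = temporal_split(events, cutoff=5)
```

Moving the single relevant item from rank 1 to rank 2 drops NDCG@3 from 1.0 to roughly 0.63, which is the position sensitivity that distinguishes it from plain hit rate. Note that this metric still treats every unshown item as irrelevant, so it inherits the selection bias described above.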
4 / 5
The interviewer asks: "Design a recommendation system to serve personalised product recommendations at 20,000 requests per second with <100ms latency." Which answer best covers the system design requirements?
Option B is strongest: it explicitly separates retrieval (candidate generation) from ranking, the two-stage architecture that is standard in production recommendation systems. It assigns concrete latency budgets to each stage that sum to the 100ms requirement, specifies retrieval methods per type (ANN, rules-based, popularity within segment), names ranking model types (XGBoost, two-tower), describes the caching strategy with invalidation conditions, and explains the pre-computation pattern for user embeddings. Option C names the right components but doesn't specify the two-stage architecture or break down the latency budget.
Recommendation system design: two-stage (retrieval → ranking) → latency budget per stage → retrieval methods → ranking features → cache with invalidation → pre-computation of embeddings.
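The two-stage shape can be sketched in a few lines. Everything here is a stand-in: the exact dot-product scan takes the place of an ANN index, the linear scorer takes the place of a model such as XGBoost, and the per-stage budget numbers are hypothetical values chosen only to show the arithmetic.

```python
import heapq

# Hypothetical per-stage latency budget in milliseconds.
BUDGET_MS = {"retrieval": 20, "ranking": 50, "network_and_overhead": 30}

def retrieve(user_vec, item_vecs, n):
    """Stage 1: cheap candidate generation, top-n items by dot product.

    An exact scan over all items; at scale an approximate nearest
    neighbour index would replace this loop.
    """
    scored = ((sum(u * v for u, v in zip(user_vec, vec)), item)
              for item, vec in item_vecs.items())
    return [item for _, item in heapq.nlargest(n, scored)]

def rank(candidates, features, weights, k):
    """Stage 2: a heavier scorer applied only to the retrieved candidates."""
    def score(item):
        return sum(weights[name] * value
                   for name, value in features[item].items())
    return sorted(candidates, key=score, reverse=True)[:k]

# Hypothetical embeddings (pre-computed offline) and ranking features.
item_vecs = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [-1.0, 0.0]}
features = {
    "a": {"ctr": 0.02, "in_stock": 0.0},
    "b": {"ctr": 0.05, "in_stock": 1.0},
}
weights = {"ctr": 10.0, "in_stock": 1.0}

candidates = retrieve([1.0, 0.0], item_vecs, n=2)
top = rank(candidates, features, weights, k=1)
```

The design point is that the expensive scorer only ever sees the handful of candidates that survive retrieval, which is what makes a fixed per-stage budget plausible at high request rates.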
5 / 5
The interviewer asks: "How would you mitigate filter bubbles and popularity bias in a recommendation system?" Which answer demonstrates the most nuanced understanding of these problems?
Option A is strongest: it defines both problems precisely and provides five mitigation techniques: exploration injection (with a real Netflix example), diversity constraints with the MMR algorithm, popularity debiasing in training via inverse propensity scoring (IPS), serendipity as a metric, and hidden gem surfacing. Crucially, it also explains the mechanisms: IPS corrects the training signal itself rather than only adjusting the ranking post hoc. Option D correctly identifies the exploration vs. exploitation trade-off, but only for new users and only at a surface level.
Filter bubble and popularity bias mitigation: exploration injection → diversity constraints (MMR) → popularity-debiasing in training (IPS) → serendipity metric → hidden gem surfacing.
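For the diversity-constraint step, a minimal sketch of greedy MMR (maximal marginal relevance) re-ranking is shown below. The relevance scores, the genre-based similarity function, and the default trade-off `lam=0.7` are hypothetical choices for the example.

```python
def mmr(candidates, relevance, similarity, k, lam=0.7):
    """Greedy MMR: repeatedly pick the item that best trades relevance
    against similarity to what has already been selected.

    lam=1.0 reduces to plain relevance ranking; lower values favour diversity.
    """
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def mmr_score(item):
            # Penalise by the closest already-selected item.
            max_sim = max((similarity(item, s) for s in selected), default=0.0)
            return lam * relevance[item] - (1 - lam) * max_sim
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical toy data: two near-duplicate rock tracks and one jazz track.
relevance = {"rock1": 0.9, "rock2": 0.85, "jazz1": 0.6}
genre = {"rock1": "rock", "rock2": "rock", "jazz1": "jazz"}

def same_genre(a, b):
    """Crude similarity: 1.0 for the same genre, otherwise 0.0."""
    return 1.0 if genre[a] == genre[b] else 0.0

diverse = mmr(relevance, relevance, same_genre, k=2, lam=0.7)
greedy = mmr(relevance, relevance, same_genre, k=2, lam=1.0)
```

With `lam=0.7` the second slot goes to the jazz track even though the second rock track is more relevant, while `lam=1.0` returns the two rock tracks. Because this is a post-hoc adjustment, it complements rather than replaces debiasing the training signal.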