5 exercises: choose the best-structured answer to common RecSys Platform Engineer interview questions. The focus is on retrieval architecture, ANN indexes, feature stores, evaluation gaps, and A/B testing.
Structure for RecSys interview answers
Separate retrieval from ranking: two-tower retrieval → ANN index → re-ranking is the standard pipeline
Name loss functions: sampled softmax, BPR, InfoNCE — and their in-batch negative trade-offs
Distinguish offline from online stores: point-in-time correctness, freshness SLA, and recency feature computation
Quantify evaluation limits: offline/online metric correlation of roughly Pearson r = 0.3-0.5, SUTVA violations in A/B tests
1 / 5
The interviewer asks: "Explain two-tower retrieval architecture for recommendation systems — how are the towers trained, what loss functions are used, and how is the model served at scale?" Which answer best covers two-tower architecture?
Option B provides the complete architecture: embedding dimensionality; sampled softmax mechanics with in-batch negatives, the popularity bias they introduce, and hard negative mining as the remedy; the BPR and InfoNCE loss variants; the offline/online serving split (item embeddings are batch-indexed offline, while the user tower runs online at query time against latency targets); specific ANN tools; the negative sampling pitfall and the mixed-negative fix; and recall@K, rather than precision, as the right evaluation metric for retrieval. Options A, C, and D each describe the architecture correctly but do not cover loss function details, the popularity bias from in-batch negatives, or the serving latency targets.
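The in-batch negative mechanics and the mixed-negative fix mentioned above can be sketched in a few lines. This is a minimal illustration in plain Python, not a training-ready implementation: the function name, the `extra_neg_vecs` parameter, and the toy embeddings are all assumptions made for the example, and a real system would compute this on batched tensors in a deep learning framework.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mixed_negative_softmax_loss(user_vecs, pos_item_vecs, extra_neg_vecs=()):
    """Mean softmax cross-entropy for a batch of (user, positive item) pairs.

    Row i's positive is pos_item_vecs[i]; every other positive in the batch
    serves as an in-batch negative. extra_neg_vecs (hypothetical name) holds
    item embeddings assumed to be drawn uniformly from the catalogue and
    appended as additional negatives for every row.
    """
    candidates = list(pos_item_vecs) + list(extra_neg_vecs)
    total = 0.0
    for i, user in enumerate(user_vecs):
        logits = [dot(user, item) for item in candidates]
        # Log-sum-exp with max subtraction for numerical stability.
        peak = max(logits)
        log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_z - logits[i]  # -log softmax probability of the positive
    return total / len(user_vecs)


# Toy 2-dimensional embeddings, purely illustrative.
users = [[1.0, 0.0], [0.0, 1.0]]
items = [[1.0, 0.0], [0.0, 1.0]]
in_batch_only = mixed_negative_softmax_loss(users, items)
with_uniform = mixed_negative_softmax_loss(users, items, extra_neg_vecs=[[0.5, 0.5]])
```

Because in-batch negatives are drawn in proportion to how often items appear in the logs, popular items are penalised more often than rare ones; appending uniformly sampled items is one way to expose the model to negatives it would otherwise rarely see.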
2 / 5
The interviewer asks: "Compare HNSW and IVF-PQ for approximate nearest neighbour search — when would you choose each for a recommendation retrieval system?" Which answer best covers ANN index trade-offs?
Option B provides the complete comparison: HNSW graph search mechanics (multi-layer graph, greedy navigation), its tuning parameters (ef, M, ef_construction), and its memory footprint (roughly N × dim × 4 bytes for float32 vectors, plus graph overhead); IVF-PQ's two-stage mechanics (Voronoi partitioning followed by product quantisation into 8-bit codes), its memory compression (about 32× versus float32 when, for example, a 128-dimensional vector is stored as 16 one-byte codes), and the training step that IVF requires; practical scale thresholds (under 10M vectors for HNSW, billion-scale for IVF-PQ); ScaNN as an alternative; and the pattern of IVF-PQ retrieval followed by exact re-ranking to recover recall. Options A, C, and D each identify the right use cases but provide no memory formulas, compression factors, tuning parameters, or the re-rank pattern.
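The memory arithmetic behind that comparison can be checked with a small back-of-envelope calculator. This is a rough sketch under stated assumptions: the helper names are hypothetical, the HNSW graph overhead assumes about 2 × M four-byte neighbour ids per vector on the base layer and ignores upper layers, and the PQ figure counts codes only (no ids, centroids, or codebooks).

```python
def float32_vector_bytes(n, dim):
    """Bytes needed to store n raw float32 vectors of the given dimension."""
    return n * dim * 4


def hnsw_bytes(n, dim, m=16, link_bytes=4):
    """Raw vectors plus an approximate base-layer graph.

    Assumption: each vector keeps about 2 * m neighbour ids on layer 0;
    upper layers and allocator overhead are ignored.
    """
    return float32_vector_bytes(n, dim) + n * 2 * m * link_bytes


def pq_code_bytes(n, num_subquantizers, bits_per_code=8):
    """Bytes for PQ codes only (ids, centroids, and codebooks excluded)."""
    return n * num_subquantizers * bits_per_code // 8


def compression_factor(dim, num_subquantizers, bits_per_code=8):
    """Ratio of float32 vector size to PQ code size for one vector."""
    return (dim * 4) / (num_subquantizers * bits_per_code / 8)


# Illustrative numbers: 10M vectors, 128 dimensions, 16 sub-quantizers.
n_vectors, dim = 10_000_000, 128
hnsw_total = hnsw_bytes(n_vectors, dim)
pq_total = pq_code_bytes(n_vectors, 16)
ratio = compression_factor(dim, 16)
```

With these illustrative settings the compression factor works out to 32, which is where a "32× versus float32" figure comes from; a different number of sub-quantizers gives a different ratio.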
3 / 5
The interviewer asks: "How does a feature store handle real-time user features for a recommendation system — explain the online/offline store duality and how recency features are computed and served?" Which answer best covers feature store architecture?
Option B covers all five dimensions: the online/offline store duality with specific technologies; point-in-time correctness, including the data leakage problem and feature platform names (Feast, Tecton, Hopsworks); three ways to compute real-time recency features (Flink streaming, a session store with TTL, lambda architecture); per-feature freshness SLAs and how to monitor them; and serving performance guidance (bulk get, a P99 latency target, warm-up for top-K users). Options A, C, and D each mention the dual store model and point-in-time joins but do not cover recency feature computation approaches, freshness SLA monitoring, or serving performance patterns.
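Point-in-time correctness is easiest to see with a tiny lookup: for each training event, take the latest feature value recorded at or before the event timestamp, never a later one. The sketch below is a stdlib-only illustration with a hypothetical function name and toy data; platforms such as Feast implement the same idea as a join over full tables.

```python
from bisect import bisect_right


def point_in_time_value(feature_log, event_ts):
    """Return the latest feature value with timestamp <= event_ts.

    feature_log: list of (timestamp, value) tuples sorted by timestamp.
    Returns None when no value existed yet, which avoids leaking a value
    that was only written after the event.
    Assumption: a value written at exactly event_ts counts as available.
    """
    timestamps = [ts for ts, _ in feature_log]
    idx = bisect_right(timestamps, event_ts)
    return feature_log[idx - 1][1] if idx > 0 else None


# Toy log of a user's "clicks in last hour" feature at three write times.
clicks_last_hour = [(10, 1), (20, 5), (30, 9)]
value_at_25 = point_in_time_value(clicks_last_hour, 25)
value_at_5 = point_in_time_value(clicks_last_hour, 5)
```

Joining on the most recent value regardless of time would return 9 for the event at timestamp 25, which is exactly the leakage the point-in-time join prevents.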
4 / 5
The interviewer asks: "Why do offline recommendation metrics often fail to predict online performance — explain the offline/online gap and the role of popularity bias." Which answer best covers evaluation methodology?
Option B covers all six dimensions: the gap magnitude with a correlation range (Pearson r of 0.3-0.5); root cause 1 (selection bias and false negatives for novel items); root cause 2 (the power-law distribution in interaction logs, with the 90/10 example); root cause 3 (distribution shift); three mitigation techniques (inverse propensity scoring, or IPS, with its mechanism; interleaving, with its statistical efficiency advantage; and bandit-based replay); and a practical recommendation for how to use offline metrics. Options C and D identify popularity bias and selection bias but do not give the correlation range, the false-negative mechanism, distribution shift as a third cause, or the specific mitigation techniques.
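The IPS mechanism can be illustrated with a minimal estimator: each logged reward is reweighted by the ratio of the new policy's probability of showing the item to the logging policy's probability. The function name, the tuple layout, and the optional `clip` parameter are assumptions made for this sketch; logged propensities are assumed to be known and non-zero.

```python
def ips_estimate(logged, target_prob, clip=None):
    """Inverse propensity scoring estimate of a target policy's mean reward.

    logged: list of (context, action, reward, logging_prob) tuples, where
        logging_prob is the probability the production policy showed action.
    target_prob: callable (context, action) -> probability under the new policy.
    clip: optional cap on importance weights to limit variance (adds bias).
    """
    if not logged:
        raise ValueError("need at least one logged interaction")
    total = 0.0
    for context, action, reward, logging_prob in logged:
        weight = target_prob(context, action) / logging_prob
        if clip is not None:
            weight = min(weight, clip)
        total += weight * reward
    return total / len(logged)


# Toy log: a popular item shown often (p=0.8), a niche item shown rarely (p=0.2).
log = [("u1", "popular", 0.0, 0.8), ("u2", "niche", 1.0, 0.2)]


def uniform_policy(context, action):
    """Hypothetical target policy that shows either item with probability 0.5."""
    return 0.5


unclipped = ips_estimate(log, uniform_policy)
clipped = ips_estimate(log, uniform_policy, clip=2.0)
```

Rarely shown items receive large weights, which corrects for the logging policy's popularity bias but also inflates variance; clipping the weights trades some of that variance for bias.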
5 / 5
The interviewer asks: "What are the unique challenges of A/B testing a recommendation system change — explain hold-out set contamination, network effects, and how you design a clean experiment?" Which answer best covers RecSys A/B testing?
Option B covers all six challenges: SUTVA (stable unit treatment value assumption) violation and hold-out contamination, with a concrete viral sharing example; network effects (social proof carry-over); the novelty effect, with a minimum duration recommendation; user-level vs item-level vs geo-based randomisation strategies for contamination prevention; the CUPED (Controlled-experiment Using Pre-Experiment Data) variance reduction technique; and pre-calculation of the minimum detectable effect (MDE). It also states the experimental hygiene rules (one primary metric, 10% holdback). Options A, C, and D each mention novelty effects but do not cover SUTVA, network effects with carry-over, CUPED, or the item/geo randomisation alternatives.
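CUPED's variance reduction can be shown with a short worked example: each user's experiment-period metric is adjusted by their pre-experiment value of the same metric, scaled by theta = cov(pre, metric) / var(pre). The sketch below uses only the standard library; the function name and the toy numbers are illustrative assumptions, and in practice theta is typically estimated on data pooled across both arms.

```python
from statistics import mean, variance


def cuped_adjust(metric, pre_metric):
    """Return CUPED-adjusted metric values.

    metric: per-user metric measured during the experiment.
    pre_metric: the same users' metric from a pre-experiment period.
    The adjustment keeps the mean unchanged and removes the part of the
    variance that the pre-experiment covariate explains.
    """
    pre_mean = mean(pre_metric)
    metric_mean = mean(metric)
    # Sample covariance (n - 1 denominator), matching statistics.variance.
    cov = sum(
        (x - pre_mean) * (y - metric_mean) for x, y in zip(pre_metric, metric)
    ) / (len(metric) - 1)
    theta = cov / variance(pre_metric)
    return [y - theta * (x - pre_mean) for x, y in zip(pre_metric, metric)]


# Toy data: five users whose in-experiment metric tracks their past behaviour.
pre = [1.0, 2.0, 3.0, 4.0, 5.0]
during = [2.1, 3.9, 6.2, 7.8, 10.1]
adjusted = cuped_adjust(during, pre)
```

Lower variance in the adjusted metric means a smaller minimum detectable effect for the same sample size, which is why CUPED and MDE pre-calculation are usually discussed together.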