Practice vocabulary for evaluating models in production: online evaluation, A/B testing models, shadow mode, comparing model versions, and interpreting live metrics.
1 / 5
The MLOps team uses ___ evaluation to measure model performance on live traffic.
Online evaluation measures a model's performance using real production traffic — tracking metrics like click-through rate, conversion, or user satisfaction as users actually interact with the system.
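As a rough illustration of the idea above, click-through rate can be computed directly from logged production events. This is a minimal sketch: the event names ("impression", "click") and the tuple layout are assumptions made for the example, not a real logging schema.

```python
from collections import Counter

def click_through_rate(events):
    """Return clicks / impressions for an iterable of (event_type, request_id) tuples."""
    counts = Counter(event_type for event_type, _ in events)
    impressions = counts["impression"]
    if impressions == 0:
        return 0.0  # no traffic yet, so there is nothing to measure
    return counts["click"] / impressions

# Hypothetical events collected from live traffic.
live_events = [
    ("impression", "r1"), ("click", "r1"),
    ("impression", "r2"),
    ("impression", "r3"), ("click", "r3"),
    ("impression", "r4"),
]
ctr = click_through_rate(live_events)  # 2 clicks over 4 impressions
```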
2 / 5
To compare two recommendation models, the team runs an A/___ test in production.
A/B testing in ML means randomly splitting live traffic between two model versions (A and B) and measuring a business or quality metric for each group. The version with the better outcome is selected for full rollout, provided the difference is large enough that it is unlikely to be due to chance.
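One common way to run such a test is sketched below: users are assigned to a variant by hashing their ID, and the two conversion rates are compared with a two-proportion z-test. The function names, the salt string, and the example counts are all hypothetical; a real experiment platform would typically handle assignment and statistics for you.

```python
import hashlib
import math

def assign_variant(user_id: str, salt: str = "rec-model-test") -> str:
    """Deterministically map a user to 'A' or 'B' (roughly 50/50)."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"

def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-sided p-value) for the difference in conversion rates B - A."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # both groups converted at 0% or 100%; no evidence of a difference
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p_value

# Made-up counts: 500 of 10,000 users converted on A, 560 of 10,000 on B.
z, p_value = two_proportion_z(500, 10_000, 560, 10_000)
```

Hashing the user ID keeps each user in the same group across requests, which avoids one person seeing both models during the experiment.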
3 / 5
Before full rollout, the new model runs in ___ mode: it receives live requests and makes predictions, but its results are not served to users.
Shadow mode evaluation runs a new model on real production requests in parallel with the live model, without serving its responses. This lets you compare behaviour and catch issues before the new model affects users.
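The pattern can be sketched as a request handler that always returns the live model's output and only logs the shadow model's output. Everything here (`handle_request`, the model callables, the log format) is a hypothetical illustration. In a real service the shadow call would usually run asynchronously so it cannot add latency to the user-facing response.

```python
from typing import Callable, List

Model = Callable[[dict], str]  # assumed interface: request dict in, prediction out

def handle_request(request: dict, live_model: Model, shadow_model: Model,
                   shadow_log: List[dict]) -> str:
    """Serve the live prediction; record the shadow prediction without serving it."""
    served = live_model(request)
    try:
        shadow_prediction = shadow_model(request)
        error = None
    except Exception as exc:  # a shadow failure must never affect the user
        shadow_prediction = None
        error = repr(exc)
    shadow_log.append({
        "request": request,
        "live": served,
        "shadow": shadow_prediction,
        "error": error,
    })
    return served

def live(request: dict) -> str:
    return "item-live"

def shadow(request: dict) -> str:
    return "item-shadow"

log: List[dict] = []
response = handle_request({"user": "u1"}, live, shadow, log)
```

The log can later be analysed offline to see how often the two models disagree and whether the shadow model raises errors on real inputs.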
4 / 5
The engineering report reads: 'We're comparing ___ and ___ in production.' What are v1 and v2?
'Comparing v1 and v2 in production' means running controlled traffic experiments between the current model (v1) and a new candidate (v2) to determine whether the new version improves the target metric.
5 / 5
The weekly report says: 'The new model shows 3% better ___ in shadow mode.' What metric is being measured?
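The metric is CTR (click-through rate). Because shadow predictions are never served, users cannot click on them, so a shadow-mode CTR is an estimate rather than a direct measurement: it is derived by comparing the shadow model's predictions against logged user behaviour on live traffic. An estimated CTR 3% higher than the live model's is a positive signal before promotion to production.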
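One simple way to estimate a CTR for a model that was never served is a replay estimate: keep only the logged impressions where the shadow model would have shown the same item the user actually saw, and compute the click rate on those. The sketch below assumes a hypothetical log format and a `shadow_choice` function. The estimate is only trustworthy when the logged items were chosen in a way that does not bias the matched subset, for example by randomised exploration.

```python
def replay_ctr(logged, shadow_choice):
    """Estimate CTR on impressions where the shadow model agrees with what was shown."""
    matched = [row for row in logged
               if shadow_choice(row["request"]) == row["shown_item"]]
    if not matched:
        return None  # no overlap, so no estimate is possible
    return sum(row["clicked"] for row in matched) / len(matched)

def relative_lift(candidate: float, baseline: float) -> float:
    """Relative improvement of candidate over baseline, e.g. 0.03 means 3% better."""
    return (candidate - baseline) / baseline

# Hypothetical logged impressions from the live model.
logged = [
    {"request": "q1", "shown_item": "a", "clicked": True},
    {"request": "q2", "shown_item": "b", "clicked": False},
    {"request": "q3", "shown_item": "a", "clicked": False},
    {"request": "q4", "shown_item": "c", "clicked": True},
]

def shadow_choice(request):
    return "a"  # toy shadow model that always recommends item "a"

estimate = replay_ctr(logged, shadow_choice)  # uses only q1 and q3

# A 3% relative improvement: a CTR of 5.15% against a baseline of 5.00%.
lift = relative_lift(0.0515, 0.05)
```

Note that "3% better" in a report like this usually means a relative lift, not three percentage points.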