Once trained, the team will ___ the model behind an API for real-time predictions.
To serve a model means to deploy it so it can respond to inference requests. Serve is the precise term, behind "model serving" and tools like TensorFlow Serving and TorchServe. Feed, wait on, and help do not carry this meaning. Engineers "serve the model at scale," so serve the model is the correct collocation for making a trained model available to receive inputs and return predictions in production.
2 / 5
For large offline jobs, the system uses ___ inference to process many records at once.
To run batch inference means to process many inputs together in scheduled jobs rather than one at a time. Batch is the precise term, contrasted with "real-time" or "online" inference. Group, bulk up, and pile are not the fixed ML phrase. Teams run "nightly batch inference," so batch inference is the correct collocation for efficiently scoring large volumes of data when low latency on each item is not required.
3 / 5
To avoid recomputing identical results, the service will ___ predictions for common inputs.
To cache predictions means to save model outputs so repeated identical requests are served instantly. Cache is the precise term, behind "prediction caching" for performance and cost savings. Store away, hold, and keep lack the temporary fast-copy sense of cache. Engineers "cache embeddings and predictions," so cache predictions is the correct collocation for reusing prior inference results instead of rerunning an expensive model.
4 / 5
In production the team must ___ drift to catch when the model's accuracy degrades.
To monitor drift means to continuously watch for shifts in data or model performance over time. Monitor is the standard term, behind "drift monitoring" and "model monitoring." See, spot, and eye are passive or informal. Teams set up "monitoring for data and concept drift," so monitor drift is the precise collocation for the ongoing observation needed to detect when a deployed model stops performing as expected.
5 / 5
When accuracy drops too far, the pipeline will ___ the model on fresh data.
To retrain a model means to train it again on newer data to restore performance. Retrain is the standard term, behind "automated retraining" and "retraining triggers." Redo and remake are vague, and while refit exists in statistics, retrain is the dominant ML term. Teams "retrain when drift is detected," so retrain the model is the precise collocation for refreshing a model so it keeps up with changing data.