IntermediateVocabulary#data-science-ml#backend#developer-tools

Overfitting Vocabulary

Build fluency in the vocabulary of a model memorizing training-data noise instead of learning generalizable patterns.

0 / 5 completed

1 / 5

At standup, a dev mentions a model that fits the training data extremely closely, including its noise and quirks, but performs noticeably worse on new, unseen data than it did during training. What is this phenomenon called?

2 / 5

During a design review, the team sets aside a held-out validation split before training a model, specifically because comparing training accuracy against validation accuracy reveals whether the model is starting to memorize training-data noise instead of learning generalizable patterns. Which capability does this provide?

3 / 5

In a code review, a dev notices a model-evaluation feature reports only training accuracy as proof the model is ready, with no held-out validation split used to check whether that accuracy reflects genuine generalization or memorized training-data noise. What does this represent?

4 / 5

An incident report shows a deployed model performed far worse on real user data than its reported training accuracy suggested, because it was evaluated only on training accuracy with no held-out validation split to catch memorized noise. What practice would prevent this?

5 / 5

During a PR review, a teammate asks why the team checks for overfitting using a held-out validation split instead of simply trusting a high training accuracy as proof the model is good. What is the reasoning?