Learn the vocabulary of a model losing prior general knowledge while fine-tuning on a new, narrow task.
0 / 5 completed
1 / 5
At standup, a dev mentions fine-tuning a language model on a new, narrow dataset, only to find it has abruptly lost much of the broad general knowledge it had before, because the new training overwrote the earlier learned weights. What is this phenomenon called?
Catastrophic forgetting is exactly this: it occurs when fine-tuning a model on a new, narrow dataset overwrites the weights that encoded its earlier, broader knowledge, causing the model to abruptly lose much of what it knew before, even though the new fine-tuning task itself may go well. A hash collision is an unrelated hash-table concept about two keys sharing a bucket. This overwrite-earlier-knowledge behavior is exactly why fine-tuning must be approached carefully whenever a model's prior general capability still needs to be preserved.
2 / 5
During a design review, the team uses a low learning rate and a replay buffer of earlier training examples while fine-tuning a model on a new task, specifically because mixing in a sample of earlier data helps preserve the model's prior general knowledge instead of letting new training overwrite it entirely. Which capability does this provide?
A replay buffer and low learning rate here provide preservation of prior knowledge alongside new-task learning, since replaying a sample of earlier data during fine-tuning keeps the model's weights anchored to what it already knew instead of letting the new task overwrite it entirely. Fine-tuning on the new task alone with no replay of earlier data risks catastrophic forgetting, since nothing during training reinforces the earlier knowledge. This anchor-to-earlier-data behavior is exactly why replay-based fine-tuning is a standard defense against catastrophic forgetting.
3 / 5
In a code review, a dev notices a fine-tuning pipeline trains a general-purpose model on a narrow new dataset with a high learning rate and no replay of any earlier training examples, instead of mixing in a sample of earlier data to anchor the model's prior knowledge. What does this represent?
This is a missed opportunity to prevent catastrophic forgetting, since replaying a sample of earlier training examples during fine-tuning would anchor the model's prior knowledge instead of letting the new narrow dataset overwrite it entirely. A cache eviction policy is an unrelated concept about discarded cache entries. This no-replay-high-learning-rate pattern is exactly the kind of risk a reviewer flags once the model's prior general capability still needs to be preserved.
4 / 5
An incident report shows a fine-tuned model lost most of its general-purpose capability after training on a narrow customer-support dataset, because the fine-tuning ran with a high learning rate and no replay of any earlier training examples. What practice would prevent this?
Fine-tuning with a lower learning rate and a replay buffer of earlier training examples keeps the model's prior general knowledge anchored instead of letting it be overwritten. Continuing to fine-tune with a high learning rate and no replay of earlier examples regardless of how much general capability the model ends up losing is exactly what caused the loss described in this incident. This replay-and-lower-learning-rate approach is the standard fix once a model's prior general capability is confirmed to matter after fine-tuning.
5 / 5
During a PR review, a teammate asks why the team fine-tunes with a replay buffer and a low learning rate instead of simply fine-tuning aggressively on the new dataset alone to maximize new-task accuracy as fast as possible. What is the reasoning?
A replay buffer with a low learning rate trades a bit of new-task learning speed for preserving the model's prior general knowledge, while fine-tuning aggressively on the new dataset alone maximizes new-task accuracy quickly but risks catastrophic forgetting of everything the model knew before. This is exactly why replay-based, lower-learning-rate fine-tuning is preferred whenever prior general capability must be preserved alongside the new task.