Practice the vocabulary of adapting a pre-trained language model to a specific task.
0 / 5 completed
1 / 5
At standup, a dev mentions further training a pre-trained language model on a smaller, task-specific dataset to improve its performance on that particular task. What is this process called?
Fine-tuning continues training an already pre-trained model on a smaller, task-specific dataset, adjusting its weights to improve performance on that particular task rather than starting from randomly initialized weights and training from scratch. This is far more efficient than full pre-training, since it builds on the general knowledge the base model already learned. It's commonly used to adapt a general-purpose model to a narrower domain, tone, or output format.
2 / 5
During a design review, the team wants to adapt a large model efficiently by training only a small number of additional parameters rather than updating the entire model's weights. Which capability supports this?
Parameter-efficient fine-tuning, such as LoRA, trains only a small number of additional parameters while keeping the base model's original weights frozen, rather than updating the entire model. This dramatically reduces the compute and memory needed for fine-tuning compared to updating every weight. It also makes it practical to maintain several lightweight, task-specific adaptations of the same underlying base model.
3 / 5
In a code review, a dev notices the fine-tuning dataset was manually reviewed to remove duplicate and low-quality examples before training began. What does this represent?
Fine-tuning dataset curation removes duplicate and low-quality examples before training, since a fine-tuned model tends to reflect the quality and patterns of its training examples closely. Training on raw, unreviewed data risks the model learning from noisy or repetitive examples that degrade its output quality. This curation step is widely considered one of the highest-leverage parts of a successful fine-tuning effort.
4 / 5
An incident report shows a fine-tuned model became noticeably worse at general tasks it previously handled well, after being fine-tuned narrowly on a specialized dataset. What practice would reduce this risk?
Evaluating a fine-tuned model against a broader benchmark catches a regression in general capability, known as catastrophic forgetting, that can occur when narrow fine-tuning overwrites more general patterns the base model previously learned. Assuming fine-tuning never affects general capability ignores a well-documented risk of this training approach. This evaluation step is a standard safeguard before deploying a fine-tuned model to replace a more general one.
5 / 5
During a PR review, a teammate asks why the team fine-tunes a pre-trained model instead of training an entirely new model from scratch for this task. What is the reasoning?
Training a new model entirely from scratch requires an enormous amount of data and compute to learn even basic language patterns before it can specialize on the task at hand. Fine-tuning starts from a model that already has that general knowledge, needing only a smaller task-specific dataset to adapt it. The tradeoff is that fine-tuning inherits any limitations or biases already present in the base model.