Practise the English used to discuss learning rate, batch size, epochs, and hyperparameter search strategies.
0 / 6 completed
1 / 6
A colleague says 'training diverged after we increased the learning rate'. What happened?
Divergence means the loss grows instead of shrinking. A learning rate that is too high causes update steps that overshoot the minimum, making training unstable.
2 / 6
What does 'the model is overfitting' mean in plain English?
Overfitting: excellent training performance, poor generalisation. The model has learned noise and specific training examples rather than the underlying pattern.
3 / 6
A team says 'we ran a grid search over learning rate and batch size'. This means they:
Grid search exhaustively evaluates every combination in a predefined parameter grid — e.g. learning rates [0.001, 0.01] × batch sizes [32, 64] gives 4 experiments.
4 / 6
Bayesian optimisation for hyperparameter tuning is described as:
Bayesian optimisation builds a probabilistic model of the objective function and uses it to select the next hyperparameter configuration — it learns from previous runs to search more efficiently than random or grid search.
5 / 6
An 'ablation study' in ML research means:
Ablation studies isolate the contribution of individual components. 'We ablated the attention mechanism and saw a 4% drop in accuracy' means removing that component hurt performance by 4%.
6 / 6
What does 'early stopping' mean in model training?
Early stopping monitors validation loss and stops training when it stops improving (or starts degrading), acting as a regularisation technique that prevents the model from overfitting.