Practice English vocabulary for ML model training: convergence, learning rate schedules, vanishing gradients, batch size, and training vs. validation loss.
0 / 5 completed
1 / 5
What does 'the training run converges after 50 epochs' mean?
Convergence means the optimization process has found a (local) minimum and loss improvement per epoch becomes negligible. Training is stopped at convergence — running more epochs wastes compute and may cause overfitting. 50 epochs is an example; actual convergence depends on the problem.
2 / 5
What is a 'learning rate schedule that reduces after plateau'?
ReduceLROnPlateau (PyTorch) and similar schedulers detect when the model has stopped improving and reduce the learning rate. Smaller learning rates allow the optimizer to fine-tune weights without overshooting the loss minimum, often leading to better final performance.
3 / 5
What is the 'vanishing gradient' problem in deep networks?
Vanishing gradients plague deep networks: as gradients flow backward through many layers, they shrink toward zero (with sigmoid/tanh activations), preventing the early layers from updating meaningfully. Solutions include ReLU activations, residual connections (ResNets), batch normalization, and careful weight initialization.
4 / 5
How does 'batch size affect memory and convergence'?
Batch size involves trade-offs: large batches (512, 1024) saturate GPU memory efficiently but may require learning rate scaling and can overfit; small batches (32, 64) have noisier gradients that act as implicit regularization. The 'linear scaling rule' adjusts LR proportionally to batch size.
5 / 5
What does a diverging gap between 'training loss vs. validation loss' indicate?
Plotting training loss vs. validation loss is the primary diagnostic for overfitting. A growing gap (training keeps decreasing, validation plateaus or rises) signals overfitting. Solutions: more data, data augmentation, regularization (L2, dropout), early stopping, or a simpler model.