Vocabulary for Machine Learning Engineers
The essential English vocabulary for machine learning engineers — model training, evaluation metrics, MLOps, and deployment terms explained with examples.
Machine learning has a dense vocabulary of its own, drawing on statistics, software engineering, and data science. Whether you are discussing a model’s performance in a meeting, writing a technical report, or preparing for a machine learning interview, knowing the right terms in English is essential.
Model Training Vocabulary
| Term | Definition | Example sentence |
|---|---|---|
| training data | Data used to teach the model | “We trained the model on 500,000 labelled examples.” |
| validation data | Data used to tune hyperparameters during training | “We monitor loss on the validation set to detect overfitting.” |
| test data | Held-out data used only for final evaluation | “We evaluate on the test set only once, after training is complete.” |
| epoch | One full pass through the training data | “We trained for 50 epochs with early stopping.” |
| batch size | Number of samples processed in one training step | “We use a batch size of 32; larger batches tend to generalise worse on this dataset.” |
| learning rate | Step size that controls how much the weights change at each update | “We started with a learning rate of 0.001 and applied decay after 20 epochs.” |
| overfitting | Model performs well on training data but poorly on new data | “Training accuracy was 98% but validation accuracy was 74%, a clear sign of overfitting.” |
| underfitting | Model is too simple to capture the patterns in the data, so it performs poorly even on the training set | “The model underfits; we need a more expressive architecture.” |
| regularisation | Techniques to prevent overfitting | “We applied L2 regularisation and dropout to reduce overfitting.” |
| fine-tuning | Adapting a pre-trained model to a specific task | “We fine-tuned BERT on our domain-specific corpus.” |
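
To see how epoch, batch size, and learning rate fit together, here is a minimal sketch of mini-batch gradient descent on a toy linear model. The function name `train_linear_model`, the synthetic data, and the values used (learning rate 0.1, batch size 32, 50 epochs) are assumptions chosen for illustration, not a recommended configuration.

```python
import random


def train_linear_model(xs, ys, epochs=50, batch_size=32, learning_rate=0.1, seed=0):
    """Fit y = w * x + b by mini-batch gradient descent on mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0                      # parameters: learned during training
    indices = list(range(len(xs)))
    history = []                         # training loss after each epoch

    for _ in range(epochs):              # one epoch = one full pass over the data
        rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]   # one training step per batch
            grad_w = grad_b = 0.0
            for i in batch:
                error = (w * xs[i] + b) - ys[i]
                grad_w += 2 * error * xs[i]
                grad_b += 2 * error
            # The learning rate scales how far each update moves the weights.
            w -= learning_rate * grad_w / len(batch)
            b -= learning_rate * grad_b / len(batch)

        loss = sum(((w * x + b) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
        history.append(loss)

    return w, b, history


# Synthetic data generated from y = 2x + 1 (illustrative only).
xs = [i / 100 for i in range(200)]
ys = [2 * x + 1 for x in xs]

w, b, history = train_linear_model(xs, ys)
print(f"w={w:.3f}, b={b:.3f}, final loss={history[-1]:.6f}")
```

In this sketch `epochs`, `batch_size`, and `learning_rate` are hyperparameters set before training, while `w` and `b` are the values the model learns.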
Evaluation Metrics Vocabulary
“Accuracy alone is misleading here because the dataset is highly imbalanced. We should prioritise precision and recall.”
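The point made in this sentence can be checked with a small calculation. The sketch below uses a made-up dataset of 990 negatives and 10 positives; the helper name `classification_metrics` and both sets of predictions are hypothetical. Precision and recall are reported as 0 when their denominator is zero, which is a convention adopted here rather than a universal rule.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical imbalanced dataset: 990 negatives followed by 10 positives.
y_true = [0] * 990 + [1] * 10

# A "model" that never predicts the positive class.
always_negative = [0] * 1000
baseline = classification_metrics(y_true, always_negative)

# A model that finds 8 of the 10 positives at the cost of 4 false positives.
useful_pred = [0] * 986 + [1] * 4 + [1] * 8 + [0] * 2
useful = classification_metrics(y_true, useful_pred)

print(baseline)
print(useful)
```

The always-negative baseline reaches 99% accuracy while finding none of the positives, so its recall is 0. The second model has almost the same accuracy, and only its precision and recall show that it is actually useful.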
| Metric | Definition | When to use it |
|---|---|---|
| accuracy | Fraction of correct predictions | When classes are balanced |
| precision | Of all positive predictions, how many were correct | When false positives are costly |
| recall | Of all actual positives, how many were found | When false negatives are costly |
| F1 score | Harmonic mean of precision and recall | When you need a balance of both |
| AUC-ROC | Area Under the ROC Curve — model’s ability to distinguish classes | Binary classification problems |
| loss | The error signal used during training | All supervised learning tasks |
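| RMSE | Root Mean Square Error: the square root of the average squared error | Regression, when large errors should count more heavily, e.g. “The model achieved an RMSE of 4.2 on the test set.” |
| MAE | Mean Absolute Error: the average absolute difference between predictions and targets | Regression; easier to interpret than RMSE |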
| perplexity | Measure of how well a language model predicts text; lower is better | Language modelling and other NLP tasks |
| BLEU score | N-gram overlap between generated and reference text | Machine translation, summarisation |
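
For the regression and language-model rows, a short worked example may help. This is a sketch using the standard textbook formulas; the function names and the sample numbers are invented for illustration. Perplexity is computed here as the exponential of the average negative log-probability that the model assigned to each actual token.

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Square Error: large errors are squared, so they dominate."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean Absolute Error: every error counts in proportion to its size."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def perplexity(token_probs):
    """exp(mean negative log-probability) over the tokens that actually occurred."""
    return math.exp(-sum(math.log(p) for p in token_probs) / len(token_probs))


# Made-up regression targets and predictions; the last prediction is far off.
actual = [10.0, 12.0, 15.0, 20.0]
predicted = [11.0, 11.0, 14.0, 26.0]

print("MAE :", mae(actual, predicted))
print("RMSE:", rmse(actual, predicted))

# A model that gives every correct token probability 0.25 is "as uncertain as"
# a uniform choice among four options.
print("Perplexity:", perplexity([0.25, 0.25, 0.25, 0.25]))
```

Three of the four errors have size 1 and one has size 6, so the MAE is 2.25 while the RMSE is about 3.12. The gap between the two is a quick signal that a few large errors are present.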
Model Architecture Vocabulary
| Term | Definition |
|---|---|
| neural network | A model inspired by the structure of the brain, made of layers of connected units |
| layer | A building block of a neural network (e.g., convolutional layer, attention layer) |
| parameter | A learned weight inside the model |
| hyperparameter | A setting configured before training (learning rate, depth, batch size) |
| transformer | The architecture underlying modern LLMs; introduced in “Attention Is All You Need” (2017) |
| attention mechanism | A component that allows the model to focus on relevant parts of the input |
| embedding | A dense vector representation of a piece of data (word, image patch, etc.) |
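| gradient descent | An optimisation algorithm that repeatedly adjusts the parameters in the direction that reduces the loss; its variants are used to train most neural networks |
| backpropagation | The algorithm that computes the gradient of the loss with respect to every weight by applying the chain rule backward through the network |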
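
The attention mechanism in the table can be made concrete with a small sketch of scaled dot-product attention. This is a simplified, single-head version in plain Python, with no learned projections, masking, or batching; the function names and the toy vectors are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    highest = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity between this query and every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # The output is a weighted average of the value vectors.
        output = [sum(w * v[j] for w, v in zip(weights, values))
                  for j in range(len(values[0]))]
        outputs.append(output)
        all_weights.append(weights)
    return outputs, all_weights


# Toy input: one query that points in the same direction as the first key.
queries = [[4.0, 0.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]

outputs, weights = attention(queries, keys, values)
print("weights:", [round(w, 3) for w in weights[0]])
print("output :", [round(x, 3) for x in outputs[0]])
```

The second key is orthogonal to the query, so it receives the smallest weight and contributes least to the output. This is what "focusing on relevant parts of the input" means in practice.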
MLOps Vocabulary
“We use an MLflow tracking server to log experiments and compare runs. Successful models are promoted to the model registry and deployed via a blue-green deployment strategy.”
| Term | Definition |
|---|---|
| MLOps | DevOps practices applied to machine learning — training, deployment, monitoring |
| experiment tracking | Recording parameters, metrics, and artefacts from training runs |
| model registry | A central store for versioned, validated models |
| feature store | A centralised repository of engineered features for training and inference |
| data drift | When the distribution of production data diverges from training data |
| model degradation | Decline in model performance over time |
| pipeline | A sequence of automated steps from raw data to deployed model |
| inference | Using a trained model to make predictions on new data |
| latency | Time taken to return a prediction |
| throughput | Number of predictions the model can make per second |
| shadow mode | Running a new model on live traffic alongside the current one, without serving its predictions to users, to compare the two before switching |
| A/B testing | Splitting traffic between two models to compare performance in production |
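
Latency and throughput are often reported together, and latency is usually quoted as a percentile such as P50 or P99 rather than an average. The sketch below measures both for a stand-in prediction function. `dummy_predict`, `benchmark`, and the nearest-rank percentile method are assumptions made for illustration; monitoring tools may compute percentiles differently.

```python
import math
import time


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def benchmark(predict, inputs):
    """Call predict on every input and report latency percentiles and throughput."""
    latencies_ms = []
    start = time.perf_counter()
    for x in inputs:
        t0 = time.perf_counter()
        predict(x)
        latencies_ms.append((time.perf_counter() - t0) * 1000)
    elapsed = time.perf_counter() - start
    return {
        "p50_ms": percentile(latencies_ms, 50),
        "p99_ms": percentile(latencies_ms, 99),   # 99% of requests were at least this fast
        "throughput_per_s": len(inputs) / elapsed if elapsed > 0 else float("inf"),
    }


def dummy_predict(x):
    """Hypothetical stand-in for a real model's inference call."""
    return 2 * x + 1


report = benchmark(dummy_predict, list(range(1000)))
print(report)
```

A model can have good throughput and still have a poor P99 if a small share of requests is very slow, which is why the two numbers are tracked separately.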
Phrases for ML Discussions
Describing model performance:
“The model achieves 89% accuracy on the test set, which is a 12-point improvement over our baseline.”
“The precision is high, but recall is low — the model is conservative in making positive predictions.”
Describing training decisions:
“We experimented with three architectures. The transformer-based model outperformed the others on validation, so we proceeded with that.”
“We stopped training at epoch 40 based on early stopping — validation loss had plateaued for five epochs.”
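The early stopping rule in the second sentence can be sketched in a few lines. This version assumes a "patience" of five epochs and a made-up list of validation losses; the function name `early_stopping_epoch` and the `min_delta` parameter are illustrative choices, not part of any particular library.

```python
def early_stopping_epoch(val_losses, patience=5, min_delta=0.0):
    """Return the 1-based epoch at which training stops, or None if it never does."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss                        # new best validation loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1    # loss has plateaued or got worse
            if epochs_without_improvement >= patience:
                return epoch
    return None


# Made-up validation losses: steady improvement for 35 epochs, then a plateau.
val_losses = [round(1.0 - 0.02 * i, 2) for i in range(35)] + [0.33] * 10

stop_epoch = early_stopping_epoch(val_losses, patience=5)
print("Stopped at epoch", stop_epoch)
```

With these numbers the best loss is reached at epoch 35, the next five epochs bring no improvement, and training stops at epoch 40.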
Describing deployment:
“The model is served via a REST API with a P99 latency of 45 ms.”
“We’re monitoring for data drift on a weekly cadence. If the drift score exceeds the threshold, we trigger a retraining pipeline.”
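A "drift score" can be computed in many ways. One common choice is the Population Stability Index (PSI), sketched below for a single numeric feature. The source does not say which score or threshold the team uses, so the PSI itself, the 0.2 threshold (a frequently quoted rule of thumb), the bin count, and the synthetic data are all assumptions here.

```python
import math
import random


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a reference sample and a new sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins               # assumes the reference values are not all equal

    def fractions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)   # clamp out-of-range values into the edge bins
            counts[idx] += 1
        # Floor at eps so empty bins do not cause log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


DRIFT_THRESHOLD = 0.2                      # hypothetical threshold for this sketch

rng = random.Random(42)
training = [rng.gauss(0.0, 1.0) for _ in range(5000)]     # feature values at training time
stable_week = [rng.gauss(0.0, 1.0) for _ in range(5000)]  # production, same distribution
drifted_week = [rng.gauss(1.0, 1.0) for _ in range(5000)] # production, mean has shifted

for name, sample in [("stable", stable_week), ("drifted", drifted_week)]:
    score = psi(training, sample)
    should_retrain = score > DRIFT_THRESHOLD
    print(f"{name}: PSI={score:.3f}, trigger retraining: {should_retrain}")
```

In a real pipeline the `should_retrain` flag would start the retraining job; here it is only printed.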
This vocabulary will help you participate confidently in ML team meetings, write clear model documentation, and communicate results to stakeholders who need to understand what the numbers actually mean.