Vocabulary for Machine Learning Engineers

Machine learning has its own dense vocabulary, drawing on statistics, software engineering, and data science. Whether you are discussing a model’s performance in a meeting, writing a technical report, or preparing for a machine learning interview, knowing the right terms in English is essential.

Model Training Vocabulary

Term	Definition	Example sentence
training data	Data used to teach the model	“We trained the model on 500,000 labelled examples.”
validation data	Held-out data used to tune hyperparameters and monitor overfitting during training	“We monitor loss on the validation set to detect overfitting.”
test data	Held-out data used only for final evaluation	“We evaluate on the test set only once, after training is complete.”
epoch	One full pass through the training data	“We trained for 50 epochs with early stopping.”
batch size	Number of samples processed in one training step	“We use a batch size of 32; larger batches tend to generalise worse on this dataset.”
learning rate	Step size that controls how much the weights change on each update	“We started with a learning rate of 0.001 and used decay after 20 epochs.”
overfitting	Model performs well on training data but poorly on new data	“The training accuracy was 98% but validation accuracy was 74%, a clear sign of overfitting.”
underfitting	Model is too simple to capture the patterns	“The model underfits; we need a more expressive architecture.”
regularisation	Techniques to prevent overfitting	“We applied L2 regularisation and dropout to reduce overfitting.”
fine-tuning	Adapting a pre-trained model to a specific task	“We fine-tuned BERT on our domain-specific corpus.”
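
To make the difference between training, validation, and test data concrete, here is a minimal sketch of a three-way split in plain Python. The function name `split_dataset` and the 80/10/10 proportions are assumptions chosen for illustration; real projects often use a library helper and different ratios.

```python
import random

def split_dataset(examples, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle once, then cut into training, validation, and test subsets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is reproducible

    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)

    test = shuffled[:n_test]                 # touched only for final evaluation
    val = shuffled[n_test:n_test + n_val]    # used to tune hyperparameters
    train = shuffled[n_test + n_val:]        # used to fit the model
    return train, val, test

# Hypothetical dataset: 1,000 integer IDs standing in for labelled examples.
train, val, test = split_dataset(range(1000))
print(len(train), len(val), len(test))
```

The three subsets never overlap, which is what lets the test set act as genuinely unseen data.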

Evaluation Metrics Vocabulary

“Accuracy alone is misleading here because the dataset is highly imbalanced. We should prioritise precision and recall.”

Metric	Definition	When to use it
accuracy	Fraction of correct predictions	When classes are balanced
precision	Of all positive predictions, how many were correct	When false positives are costly
recall	Of all actual positives, how many were found	When false negatives are costly
F1 score	Harmonic mean of precision and recall	When you need a balance of both
AUC-ROC	Area Under the ROC Curve — model’s ability to distinguish classes	Binary classification problems
loss	The error signal used during training	All supervised learning tasks
RMSE	Root Mean Square Error	Regression; penalises large errors more heavily than MAE (e.g. “The model achieved an RMSE of 4.2 on the test set.”)
MAE	Mean Absolute Error	Regression; easier to interpret than RMSE
perplexity	Measure of how well a language model predicts text	NLP tasks
BLEU score	N-gram overlap between generated and reference text	Machine translation, summarisation
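
The definitions in the table can be checked with a few lines of code. The sketch below computes the classification metrics from raw labels and compares RMSE with MAE; the toy labels and the helper names (`classification_metrics`, `rmse`, `mae`) are invented for illustration and do not come from any particular library.

```python
import math

def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

def rmse(y_true, y_pred):
    """Root mean square error: squares each error before averaging."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error: averages the absolute errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Imbalanced toy data: 5 positives, 95 negatives.
# The model predicts "positive" only once, and that one prediction is correct.
y_true = [1] * 5 + [0] * 95
y_pred = [1] + [0] * 4 + [0] * 95
metrics = classification_metrics(y_true, y_pred)
print(metrics)

# Regression toy data: three small errors and one large one.
targets = [10.0, 20.0, 30.0, 40.0]
predictions = [11.0, 19.0, 31.0, 35.0]
print(mae(targets, predictions), rmse(targets, predictions))
```

On this data accuracy is 0.96 while recall is only 0.2, which is the situation the quoted sentence above describes. The single large regression error also pushes RMSE above MAE.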

Model Architecture Vocabulary

Term	Definition
neural network	A model inspired by the structure of the brain, made of layers of connected units
layer	A building block of a neural network (e.g., convolutional layer, attention layer)
parameter	A learned weight inside the model
hyperparameter	A setting configured before training (learning rate, depth, batch size)
transformer	The architecture underlying modern LLMs; introduced in “Attention Is All You Need” (2017)
attention mechanism	A component that allows the model to focus on relevant parts of the input
embedding	A dense vector representation of a piece of data (word, image patch, etc.)
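gradient descent	The optimisation algorithm used to train most neural networks; it repeatedly adjusts the weights in the direction that reduces the loss
backpropagation	The algorithm that computes the gradient of the loss with respect to each weight by applying the chain rule backward through the network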
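
As a rough illustration of parameter, hyperparameter, and gradient descent working together, the sketch below fits a one-parameter model `y = w * x`. The function name `fit_slope` and the toy data are assumptions made for this example. With a single weight the gradient can be written out by hand; backpropagation is what automates that calculation for networks with many layers.

```python
def fit_slope(xs, ys, learning_rate=0.01, epochs=200):
    """Fit y = w * x by full-batch gradient descent on mean squared error."""
    w = 0.0  # the single learned parameter
    n = len(xs)
    for _ in range(epochs):
        # Gradient of L = mean((w*x - y)^2) with respect to w, via the chain rule.
        grad = (2 / n) * sum((w * x - y) * x for x, y in zip(xs, ys))
        # Step against the gradient; learning_rate is a hyperparameter.
        w -= learning_rate * grad
    return w

# Toy data generated from y = 2x, so w should converge towards 2.0.
w = fit_slope([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
print(round(w, 4))
```

Here `w` is a parameter (learned from data), while `learning_rate` and `epochs` are hyperparameters (chosen before training).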

MLOps Vocabulary

“We use an MLflow tracking server to log experiments and compare runs. Successful models are promoted to the model registry and deployed via a blue-green deployment strategy.”

Term	Definition
MLOps	DevOps practices applied to machine learning — training, deployment, monitoring
experiment tracking	Recording parameters, metrics, and artefacts from training runs
model registry	A central store for versioned, validated models
feature store	A centralised repository of engineered features for training and inference
data drift	When the distribution of production data diverges from training data
model degradation	Decline in model performance over time
pipeline	A sequence of automated steps from raw data to deployed model
inference	Using a trained model to make predictions on new data
latency	Time taken to return a prediction
throughput	Number of predictions the model can make per second
shadow mode	Running a new model alongside the current one on live traffic, logging its predictions without serving them, for comparison before switching
A/B testing	Splitting traffic between two models to compare performance in production
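
Latency and throughput are easy to confuse, so here is a small sketch that measures both for a stand-in model. `fake_predict` is a hypothetical placeholder rather than a real model, and the nearest-rank method is only one of several ways to define a percentile.

```python
import math
import time

def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

def fake_predict(x):
    """Hypothetical stand-in for a real model's predict call."""
    return x * 2

n_requests = 1000
latencies_ms = []
start = time.perf_counter()
for i in range(n_requests):
    t0 = time.perf_counter()
    fake_predict(i)
    latencies_ms.append((time.perf_counter() - t0) * 1000)  # per-request latency in ms
elapsed_s = time.perf_counter() - start

p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)           # 99% of requests were at least this fast
throughput_per_s = n_requests / elapsed_s    # predictions per second

print(f"P50 {p50:.4f} ms, P99 {p99:.4f} ms, throughput {throughput_per_s:.0f}/s")
```

Latency describes a single request, which is why it is usually reported as a percentile such as P99; throughput describes the system as a whole.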

Phrases for ML Discussions

Describing model performance:

“The model achieves 89% accuracy on the test set, which is a 12-point improvement over our baseline.”

“The precision is high, but recall is low — the model is conservative in making positive predictions.”

Describing training decisions:

“We experimented with three architectures. The transformer-based model outperformed the others on validation, so we proceeded with that.”

“We stopped training at epoch 40 based on early stopping — validation loss had plateaued for five epochs.”
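
The early-stopping rule in the last sentence can be sketched as follows. The function name `early_stopping_epoch`, the `patience` and `min_delta` parameters, and the loss values are all invented for illustration; deep learning frameworks offer their own callbacks with similar behaviour.

```python
def early_stopping_epoch(val_losses, patience=5, min_delta=0.0):
    """Return the 1-based epoch at which training would stop, or None if it never stops."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best = loss                      # new best validation loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1  # no improvement this epoch
            if epochs_without_improvement >= patience:
                return epoch
    return None

# Hypothetical run: validation loss improves for 35 epochs, then plateaus.
val_losses = [1.0 - 0.02 * i for i in range(35)] + [0.35] * 5
print(early_stopping_epoch(val_losses, patience=5))
```

With a patience of five, this run stops at epoch 40, after five epochs in which validation loss failed to beat its best value.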

Describing deployment:

“The model is served via a REST API with a P99 latency of 45ms.”

“We’re monitoring for data drift on a weekly cadence. If the drift score exceeds the threshold, we trigger a retraining pipeline.”
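
The phrase “drift score exceeds the threshold” can mean many things in practice. The sketch below uses the Population Stability Index (PSI) as one possible drift score; that choice, the 0.2 threshold (a common rule of thumb), and the names `psi` and `should_retrain` are assumptions for illustration, not something the sentence above specifies.

```python
import math

def psi(reference, production, n_bins=10, eps=1e-6):
    """Population Stability Index between two samples of one numeric feature."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all reference values are equal

    def proportions(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)  # clamp out-of-range values into the edge bins
            counts[idx] += 1
        # eps avoids log(0) and division by zero for empty bins
        return [max(c / len(values), eps) for c in counts]

    ref_p = proportions(reference)
    prod_p = proportions(production)
    return sum((p - r) * math.log(p / r) for r, p in zip(ref_p, prod_p))

def should_retrain(drift_score, threshold=0.2):
    """Trigger retraining when the drift score exceeds the (assumed) threshold."""
    return drift_score > threshold

# Hypothetical feature values: training data versus two weeks of production data.
training_values = [i / 100 for i in range(1000)]
stable_week = list(training_values)
shifted_week = [v + 3.0 for v in training_values]

print(should_retrain(psi(training_values, stable_week)))
print(should_retrain(psi(training_values, shifted_week)))
```

Identical distributions give a score of zero, while the shifted week produces a score well above the threshold and would trigger the retraining pipeline.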

This vocabulary will help you participate confidently in ML team meetings, write clear model documentation, and communicate results to stakeholders who need to understand what the numbers actually mean.
