5 exercises covering essential vocabulary for data scientists, ML engineers, and analysts: model evaluation metrics, pipeline terminology, and the concepts you need to discuss AI systems in English.
Core ML vocabulary clusters
Model quality: overfitting, underfitting, generalisation, bias-variance trade-off
Evaluation: accuracy, precision, recall, F1 score, AUC-ROC, confusion matrix
1 / 5
A data scientist explains their work to a colleague. Which sentence correctly describes overfitting?
Overfitting means the model has learned the training data too well — it has memorised the noise and specific examples rather than learning the underlying patterns, so it fails to generalise. Result: excellent training accuracy, poor performance on test/production data. The opposite is underfitting: the model is too simple to capture the patterns in the training data (option A describes this). Option C confuses overfitting with a data volume problem — overfitting is about model complexity relative to the available data, not data size per se. Option D is partially related (too many epochs can cause overfitting) but isn't the definition. Key vocabulary cluster: overfitting / underfitting → generalisation → train/validation/test split → cross-validation → regularisation (L1/L2) → dropout → early stopping. In practice: "The model achieved 98% accuracy on training data but only 72% on the test set — a clear sign of overfitting."
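In code (a minimal sketch; scikit-learn is assumed, as the exercise names no library): an unconstrained decision tree memorises the training set and produces exactly the train/test gap described above.

```python
# Minimal sketch (scikit-learn assumed): an unconstrained decision tree
# memorises the training data, so train accuracy far exceeds test accuracy.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = DecisionTreeClassifier(random_state=0)   # no depth limit: free to memorise noise
model.fit(X_train, y_train)

print(f"train accuracy: {model.score(X_train, y_train):.2f}")  # typically ~1.00
print(f"test accuracy:  {model.score(X_test, y_test):.2f}")    # noticeably lower
```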
2 / 5
A team is evaluating a binary classifier for detecting fraudulent transactions. They note: "Our model flags very few legitimate transactions as fraud." Which metric does this statement describe?
Precision measures how many of the model's positive predictions are actually correct. "Few legitimate transactions flagged as fraud" = few false positives = high precision. Formula: Precision = TP / (TP + FP). Recall (option A) measures how many of the actual positives the model caught — "how many real fraud cases did we find?" Formula: Recall = TP / (TP + FN). The precision/recall trade-off is crucial in fraud detection: high precision means a low false-alarm rate; high recall means catching more real fraud, usually at the cost of more false alarms. F1 score (option C) is the harmonic mean of precision and recall — useful when both matter. Accuracy (option D) = (TP + TN) / total — misleading when classes are imbalanced (e.g., if 99% of transactions are legitimate, a model that always says "not fraud" gets 99% accuracy but 0% recall). Practical example: "We want high precision for the fraud alert system — false alarms cause customer frustration."
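In code (a minimal sketch, again assuming scikit-learn): all four metrics computed from a toy fraud example, where 1 = fraud and 0 = legitimate.

```python
# Minimal sketch (scikit-learn assumed): the metrics discussed above,
# computed from a small imbalanced example (3 fraud cases out of 10).
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             f1_score, precision_score, recall_score)

y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]  # actual labels: 3 fraud cases
y_pred = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]  # model output: 1 false positive, 1 false negative

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"precision: {precision_score(y_true, y_pred):.2f}")  # TP / (TP + FP) = 2/3
print(f"recall:    {recall_score(y_true, y_pred):.2f}")     # TP / (TP + FN) = 2/3
print(f"F1:        {f1_score(y_true, y_pred):.2f}")         # harmonic mean of the two
print(f"accuracy:  {accuracy_score(y_true, y_pred):.2f}")   # 8/10: flattering under imbalance
```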
3 / 5
An ML engineer describes a system component: "This is the sequence of automated steps that takes raw data, transforms it, and produces model predictions at scale." What is the correct technical term?
An ML pipeline is an automated, end-to-end workflow that processes data and produces predictions. A typical ML pipeline includes: data ingestion → preprocessing → feature engineering → model training → evaluation → deployment → monitoring. "Pipeline" is used across many IT contexts, but in ML/data engineering it refers specifically to this sequential, automated processing chain. The other terms: Data warehouse (option A) — a structured, query-optimised storage system for historical business data (Snowflake, BigQuery, Redshift). Feature store (option B) — a centralised repository for ML features, enabling reuse and consistency across models (Feast, Tecton, Hopsworks). Data lake (option D) — raw, unstructured or semi-structured data storage at scale (S3, ADLS, GCS). Key related terms: batch pipeline (processes data in chunks), streaming pipeline (processes data in real time), ETL (Extract, Transform, Load), feature engineering (creating input variables from raw data).
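In code (a minimal sketch; scikit-learn is assumed): the preprocessing and training stages of a pipeline expressed as a single object, so the same transformations run automatically at prediction time. Ingestion, deployment, and monitoring sit outside this sketch.

```python
# Minimal sketch (scikit-learn assumed): preprocessing + model as one Pipeline,
# so fit() and predict() run every step in sequence with no manual glue code.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipeline = Pipeline([
    ("preprocess", StandardScaler()),      # transform the raw features
    ("model", LogisticRegression()),       # train on the transformed features
])
pipeline.fit(X_train, y_train)
predictions = pipeline.predict(X_test)     # the same scaling is applied automatically
```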
4 / 5
A data scientist says: "We need to tune the learning rate, batch size, and regularisation strength before training." What are these parameters called?
Hyperparameters are settings configured before training begins — they control the learning process itself, not the learned patterns. Examples: learning rate, batch size, number of epochs, regularisation strength (λ), number of layers, dropout rate, kernel size. Model parameters (option A) are learned during training — the weights and biases in a neural network, the coefficients in linear regression. You don't set model parameters manually; the training algorithm finds them. Feature weights (option C) — a common term for model coefficients in linear models, but not the general term for pre-training settings. Training labels (option D) — the ground truth output values (y) in supervised learning. Hyperparameter tuning methods: grid search (try all combinations), random search (sample randomly), Bayesian optimisation, AutoML. Example: "After hyperparameter tuning with a learning rate of 0.001 and batch size 64, validation accuracy improved by 4%."
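In code (a minimal sketch; scikit-learn and SGDClassifier are assumptions, chosen because they expose a learning rate and a regularisation strength; batch size is a deep-learning-framework setting with no direct equivalent here):

```python
# Minimal sketch (scikit-learn assumed): grid search over two of the
# hyperparameters from the question. Values are illustrative only.
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, random_state=0)

param_grid = {
    "eta0": [0.001, 0.01, 0.1],   # learning rate
    "alpha": [1e-4, 1e-2],        # regularisation strength
}
search = GridSearchCV(
    SGDClassifier(learning_rate="constant", random_state=0),
    param_grid,
    cv=5,                          # cross-validate each combination
)
search.fit(X, y)

print(search.best_params_)                  # hyperparameters: set before training
print(search.best_estimator_.coef_[0, :3])  # model parameters: learned during training
```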
5 / 5
A data analyst reads this in a project requirement: "The model must explain which features contributed most to each prediction." Which concept does this requirement describe?
Model explainability (or interpretability) is the ability to understand and communicate why a model made a specific prediction. This is critical in high-stakes domains: healthcare, finance, hiring, and security — where decisions must be auditable and justifiable. Key explainability tools and methods: SHAP (SHapley Additive exPlanations) — assigns each feature a contribution to the prediction. LIME (Local Interpretable Model-agnostic Explanations) — approximates the model locally with a simpler interpretable model. Feature importance — which features most influence predictions globally (XGBoost, Random Forest built-in). Attention weights — in transformer models, which input tokens the model attended to. Feature scaling (option C): normalising/standardising input features (min-max, z-score) — not the same as explaining predictions. Cross-validation (option D): a technique to reliably estimate model performance using multiple train/test splits. In conversation: "The client needs explainability — they won't accept a black box for loan decisions."
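In code (a minimal sketch; scikit-learn is assumed): SHAP and LIME require their own libraries, so this uses permutation importance, a model-agnostic way to measure the global feature importance listed above.

```python
# Minimal sketch (scikit-learn assumed): permutation importance as a simple,
# model-agnostic explainability method. Shuffle one feature at a time; the
# bigger the drop in score, the more the model relied on that feature.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = make_classification(n_samples=300, n_features=8, n_informative=3, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for i, score in enumerate(result.importances_mean):
    print(f"feature {i}: importance {score:.3f}")
```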