Data Scientist & ML Engineer English Essentials
50 terms and 20 phrases for data scientists and ML engineers — covering models, evaluation metrics, data pipelines, and the precise, careful English of presenting findings to stakeholders.
On this page
- ML basics (15)
- Evaluation (13)
- Data & pipelines (12)
- Presenting findings (10)
- Key phrases (20)
ML basics
- model
- A function learned from data that makes predictions on new inputs.
- feature
- An input variable the model uses to make a prediction.
- label / target
- The correct answer the model is trying to predict.
- training
- The process of fitting a model to data by adjusting its parameters.
- inference
- Using a trained model to make predictions on new inputs, typically in production.
- supervised learning
- Learning from labelled examples; unsupervised learning, by contrast, finds structure in data that has no labels.
- overfitting
- When a model memorises the training data and fails on new data.
- underfitting
- When a model is too simple to capture the patterns in the data.
- hyperparameter
- A setting you choose before training (learning rate, tree depth), not learned from data.
- epoch
- One full pass over the training dataset.
- gradient descent
- An optimisation method that repeatedly nudges parameters in the direction that reduces the loss.
- loss function
- A function that returns a single number measuring how wrong the model’s predictions are.
- regularization
- Techniques that penalise complexity to reduce overfitting.
- embedding
- A dense numeric vector representing something like a word or user.
- fine-tuning
- Adapting a pre-trained model to a specific task with extra training.
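To see how several of the terms above fit together, here is a minimal sketch of training a one-parameter model with gradient descent. The data, the learning rate and the number of epochs are made up for illustration; real training code would normally use a library rather than hand-written gradients.

```python
# Toy data, invented for illustration: each label is exactly 2 * feature.
xs = [1.0, 2.0, 3.0, 4.0]   # feature values
ys = [2.0, 4.0, 6.0, 8.0]   # labels / targets


def predict(w, x):
    """The model: a function with one learned parameter, w."""
    return w * x


def mse_loss(w):
    """Loss function: mean squared error over the training set."""
    return sum((predict(w, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def gradient(w):
    """Derivative of the mean squared error with respect to w."""
    return sum(2 * (predict(w, x) - y) * x for x, y in zip(xs, ys)) / len(xs)


learning_rate = 0.05   # hyperparameter: chosen before training
epochs = 50            # hyperparameter: number of full passes over the data

w = 0.0                # parameter: learned during training
for _ in range(epochs):
    # One full-batch gradient descent step per epoch.
    w -= learning_rate * gradient(w)

print(f"learned w = {w:.4f}, final loss = {mse_loss(w):.6f}")
```

In this sketch `w` is the only parameter the model learns, while `learning_rate` and `epochs` are hyperparameters you pick yourself. Calling `predict(w, 5.0)` after the loop is inference.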
Evaluation
- accuracy
- The fraction of predictions that are correct; misleading on imbalanced data, where always predicting the majority class already scores well.
- precision
- Of the items flagged positive, how many really were positive.
- recall
- Of the truly positive items, how many the model caught.
- F1 score
- The harmonic mean of precision and recall, balancing the two.
- AUC / ROC
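- The ROC curve plots true positive rate against false positive rate across all thresholds; the area under it (AUC) is a threshold-independent measure of how well a classifier ranks positives above negatives.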
- confusion matrix
- A table counting true positives, false positives, true negatives and false negatives.
- baseline
- A simple reference model your model must beat to be worth deploying.
- cross-validation
- Splitting data several ways to estimate performance more reliably.
- train/test split
- Holding out data to test on examples the model never saw.
- bias–variance trade-off
- Balancing systematic error against sensitivity to the training data.
- A/B test
- Comparing two versions on live traffic to measure real impact.
- RMSE / MAE
- Root mean squared error and mean absolute error: common regression metrics for how far predictions are from actual values. RMSE penalises large errors more heavily than MAE.
- leakage
- When information from the future or the target sneaks into training, inflating scores.
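The metric definitions above can be sketched in a few lines of plain Python. The function names and the example labels below are hypothetical, chosen only to illustrate the arithmetic; in practice you would probably use a library such as scikit-learn.

```python
import math


def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def regression_errors(actual, predicted):
    """Return (RMSE, MAE) for two equal-length lists of numbers."""
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return rmse, mae


# Imbalanced toy example: only 2 of 10 items are truly positive.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)

rmse, mae = regression_errors([3.0, 5.0, 7.0], [2.0, 5.0, 10.0])
print(f"RMSE = {rmse:.3f}, MAE = {mae:.3f}")
```

On this toy data accuracy comes out at 0.8 while precision, recall and F1 are all 0.5, which is the kind of gap that makes accuracy misleading on imbalanced data. RMSE is also larger than MAE here because the single large error is squared before averaging.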
Data & pipelines
- pipeline
- An automated sequence that ingests, transforms and serves data or predictions.
- ETL
- Extract, Transform, Load — moving and reshaping data into a warehouse.
- feature store
- A central system for storing and serving features consistently to models.
- data drift
- When the input data distribution shifts away from what the model trained on.
- concept drift
- When the relationship between inputs and the target changes over time.
- data warehouse
- A central store optimised for analytical queries over large datasets.
- feature engineering
- Creating useful input variables from raw data.
- imputation
- Filling in missing values in the data.
- normalization / scaling
- Putting features on a comparable numeric range.
- labeling
- Annotating data with the correct answers for supervised training.
- batch vs streaming
- Processing data in scheduled chunks vs continuously as it arrives.
- reproducibility
- Being able to rerun an experiment and get the same result.
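As a small illustration of imputation and scaling, the sketch below fills missing values with the training mean and then applies min-max scaling. The helper names and the age values are invented for this example. Mean imputation and min-max scaling are only one choice among several, and the statistics are computed on the training data only, which is one common way to avoid leakage into the test set.

```python
def fit_mean(values):
    """Mean of the values that are present (None marks a missing value)."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present)


def impute(values, fill):
    """Replace missing values with a fill value."""
    return [fill if v is None else v for v in values]


def scale(values, lo, hi):
    """Min-max scaling: maps lo to 0.0 and hi to 1.0."""
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


# Hypothetical feature column with gaps.
train_ages = [20.0, None, 40.0, 60.0]
test_ages = [None, 50.0]

# Fit on training data only, then reuse the same statistics for test data.
mean_age = fit_mean(train_ages)
train_filled = impute(train_ages, mean_age)
test_filled = impute(test_ages, mean_age)

lo, hi = min(train_filled), max(train_filled)
train_scaled = scale(train_filled, lo, hi)
test_scaled = scale(test_filled, lo, hi)

print(train_scaled)
print(test_scaled)
```

Because no random numbers are involved, rerunning this gives the same output every time, which is the simplest form of reproducibility.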
Presenting findings
- statistical significance
- A result is statistically significant when it would be unlikely to arise by chance alone if there were no real effect.
- p-value
- The probability of seeing a result at least as extreme as the one observed if there were truly no effect.
- confidence interval
- A range of plausible values for the true quantity, reported with a stated confidence level (for example 95%).
- correlation vs causation
- Two things moving together doesn’t mean one causes the other.
- effect size
- How big a difference is, not just whether it’s statistically detectable.
- sample size
- How many observations the conclusion is based on.
- distribution
- How values are spread across a range.
- outlier
- A data point far from the rest that can distort results.
- hypothesis
- A testable statement about what the data will show.
- interpretability
- How easily a human can understand why a model made a prediction.
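To make effect size, confidence interval and p-value concrete, here is a rough sketch for an A/B test comparing two conversion rates. It assumes a two-sided z-test with a normal approximation, which is reasonable only for fairly large samples. The function name and the visitor counts are hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_summary(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Return (difference, confidence interval, p-value) for rate B minus rate A."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    diff = p_b - p_a  # effect size, as an absolute difference in rates

    # Confidence interval: unpooled standard error, normal approximation.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    ci = (diff - z_crit * se, diff + z_crit * se)

    # p-value: pooled standard error under the hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_null = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = diff / se_null
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, ci, p_value


# Hypothetical experiment: 100/1000 conversions for A, 130/1000 for B.
diff, ci, p_value = two_proportion_summary(100, 1000, 130, 1000)
print(f"difference = {diff:.3f}")
print(f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"p-value = {p_value:.3f}")
```

With these made-up numbers the difference is three percentage points and the p-value falls below 0.05, but the interval is wide, running from well under one point to nearly six. That is exactly the situation where you would report the result as significant and still caveat the size of the effect.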
Key phrases for presenting results & explaining models
- The data suggests a clear correlation, but I’d caution against reading causation into it.
- The model beats the baseline on F1, though precision dipped slightly.
- This result is statistically significant at p < 0.05, with a 95% confidence interval of 2–4%.
- We’re seeing data drift — the input distribution has shifted since we trained.
- Recall matters more than precision here, because missing a fraud case is costly.
- I suspect leakage — the score looks too good, so let me audit the features.
- In plain terms, the model is most confident when the customer has a long history.
- Let me caveat this: the sample size is small, so treat the numbers as directional.
- We optimised for recall at the cost of more false positives — that’s the trade-off the business asked for.
- The headline number is encouraging, but the confidence interval is wide.
- To translate that into impact: this should reduce churn by roughly two percentage points.
- The model is overfitting — training accuracy is high but it falls apart on the holdout set.
- I’d recommend we A/B test this before rolling it out to everyone.
- These are early results from a single experiment, not a final recommendation.
- The feature importance tells us location is the strongest signal by far.
- Happy to go deeper on the methodology, but the takeaway is the lift is real and repeatable.
- We retrained after detecting drift, and performance recovered to the previous level.
- Let me walk you through the assumptions before we get to the conclusion.
- Yesterday: finished the feature engineering. Today: tuning hyperparameters. No blockers.
- Quick question before I proceed: do we care more about accuracy or interpretability for this use case?
How to use this cheatsheet
The hardest English in this field isn’t the jargon; it’s translating technical results into clear, honest language for non-technical stakeholders. Notice how the phrases hedge appropriately (“the data suggests”, “I’d caution against”, “treat the numbers as directional”). Learn the metric vocabulary so you can defend your numbers, then practise framing findings in the linked exercises.