Data Science & ML
Data scientists connect technical research to business impact. This path builds the English you need for reading papers, presenting model results, discussing data quality issues, and communicating uncertainty to stakeholders.
Topics covered
- ML paper English
- Model training vocabulary
- Data pipeline language
- Statistical terms
- Presenting results
Vocabulary spotlight
4 terms every Data Science & ML professional should know in English:
overfitting n.
When a model fits the training data too closely, including its noise, and fails to generalise to new data
"Validation loss rising while training loss falls is a clear sign of overfitting."
baseline n.
A simple reference model used to benchmark more complex approaches
"Our model outperforms the baseline by 8% on the held-out test set."
data leakage n.
When information that would not be available at prediction time, such as test-set data or the target itself, is used to build the model
"The suspiciously high accuracy was caused by data leakage in the feature pipeline."
ablation study n.
An experiment that removes individual components of a model to measure each one's contribution
"The ablation study shows that attention is not contributing to accuracy."
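To make the overfitting example sentence concrete, here is a minimal Python sketch of the pattern it describes: validation loss rising while training loss falls. The function name `first_overfitting_epoch`, the `patience` parameter, and the loss values are all hypothetical and chosen only for illustration; real projects usually rely on the early-stopping tools in their training framework.

```python
def first_overfitting_epoch(train_loss, val_loss, patience=2):
    """Return the epoch index where a run of `patience` consecutive epochs
    begins in which training loss falls while validation loss rises.
    Returns None if no such run exists. (Illustrative heuristic only.)"""
    run = 0
    for epoch in range(1, len(train_loss)):
        train_falling = train_loss[epoch] < train_loss[epoch - 1]
        val_rising = val_loss[epoch] > val_loss[epoch - 1]
        # Extend the run if both conditions hold, otherwise reset it.
        run = run + 1 if (train_falling and val_rising) else 0
        if run >= patience:
            return epoch - patience + 1
    return None


# Made-up loss curves: training loss keeps falling,
# validation loss bottoms out at epoch 2 and then climbs.
train = [0.90, 0.70, 0.55, 0.45, 0.38, 0.33]
val = [0.95, 0.80, 0.72, 0.74, 0.79, 0.85]

print(first_overfitting_epoch(train, val))  # prints 3
```

In a report you might describe this result as: "Validation loss starts rising at epoch 3 while training loss continues to fall, which suggests the model is overfitting."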
📚 Vocabulary Reference
Key terms organised by category for Data Science & ML professionals:
Data Fundamentals
ML Concepts
Model Types
Evaluation
Tools & Ecosystem
Recommended exercises
Real-world scenarios you'll practise
- Explaining model performance metrics to a non-technical PM
- Writing the findings section of an ML experiment report
- Presenting a data pipeline architecture to engineering
- Discussing dataset bias with stakeholders