🏭 Synthetic Data Generation Vocabulary
A colleague says: "We trained the model on synthetic data because we didn't have access to real patient records."
What makes data "synthetic" by definition?
Synthetic data = artificially generated data that was never directly observed or collected from real people or systems:
| Data type | Definition | Contains real records? |
|---|---|---|
| Synthetic data | Algorithmically generated to mirror real-data statistics | No |
| Anonymised data | Real data with identifiers removed or generalised | Yes |
| Pseudonymised data | Real data with identifiers replaced by tokens | Yes |
Key vocabulary: fully synthetic, partially synthetic, hybrid synthetic, statistical properties, data distribution.
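To make "mirrors real-data statistics" concrete, here is a minimal sketch that fits a normal distribution to a small column and samples new values from it. It assumes a single numeric column that is roughly normal. The `real_ages` values are invented for illustration and are not actual patient data. Production generators model many columns and their correlations, which this sketch does not attempt.

```python
import statistics

# Hypothetical "real" column (illustrative values only, not actual patient data).
real_ages = [34, 45, 29, 61, 52, 38, 47, 55, 41, 36]

# Fit a simple model: estimate the mean and standard deviation of the real column.
model = statistics.NormalDist.from_samples(real_ages)

# Draw new values from the fitted distribution instead of copying real records.
synthetic_ages = model.samples(1000, seed=42)

# Compare statistical properties of the real and synthetic columns.
print(f"real:      mean={statistics.mean(real_ages):.1f}, stdev={statistics.stdev(real_ages):.1f}")
print(f"synthetic: mean={statistics.mean(synthetic_ages):.1f}, stdev={statistics.stdev(synthetic_ages):.1f}")
```

The synthetic column approximates the data distribution of the real one (similar mean and spread), yet each value comes from the model rather than from an observed person.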