AdvancedVocabulary#data-science-ml#backend#developer-tools

Knowledge Distillation Vocabulary

Build fluency in the vocabulary of training a small student model to mimic a larger teacher model's outputs.

0 / 5 completed
1 / 5
At standup, a dev mentions training a small, fast model to mimic the output probabilities of a much larger, more accurate model, so the small model captures most of the larger model's behavior at a fraction of its inference cost. What is this technique called?