AdvancedVocabulary#ai-llm#data-science-ml#developer-tools

Model Quantization Vocabulary

Learn the vocabulary of reducing a model's numeric precision to shrink memory and compute cost.

0 / 5 completed
1 / 5
At standup, a dev mentions reducing a model's weights from 16-bit floating point down to 8-bit or 4-bit integers to shrink its memory footprint and speed up inference. What is this technique called?