Quantization: represents weights/activations with fewer bits to shrink memory footprint and speed up inference, with some potential loss of accuracy.
2 / 5
What is the primary benefit of quantizing a model?
Benefit: lower precision reduces memory and bandwidth needs and leverages faster integer hardware, enabling deployment on smaller GPUs or edge devices.
3 / 5
How does post-training quantization (PTQ) differ from quantization-aware training (QAT)?
PTQ vs QAT: PTQ is quick and applied after training, while QAT incorporates quantization effects into training, usually recovering more accuracy at lower bit widths.
4 / 5
What is a typical risk of aggressive (e.g. 4-bit) quantization?
Accuracy risk: very low bit widths can lose information; techniques like per-channel scales, outlier handling, and mixed precision mitigate the quality drop.
5 / 5
What does INT8 quantization use to map floats to integers?
Scale and zero-point: quantization computes a scale factor (and zero-point for asymmetric schemes) so floating-point values map onto the limited integer range and back during dequantization.