GPU & Inference Scaling Vocabulary

Practise vocabulary for scaling ML inference: GPU utilisation, dynamic batching, autoscaling, cold starts, and throughput vs latency trade-offs.

Question 1 of 5
Grouping several incoming requests so they run together on the GPU in a single forward pass is called ___.
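The grouping described in the question can be sketched as a small offline simulation. This is a minimal illustration under stated assumptions, not how any particular serving system implements it. The names `Request`, `form_batches`, `toy_forward`, `serve`, `max_batch_size` and `max_wait_ms` are all hypothetical. The closing policy is also an assumption: a batch is closed when it is full, or when the next request arrives after the wait window opened by the batch's first request has expired. Each closed batch then goes through one model call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass
class Request:
    req_id: int        # hypothetical request identifier
    arrival_ms: float  # time the request reached the server
    payload: int       # stand-in for the real model input


def form_batches(
    requests: Sequence[Request], max_batch_size: int, max_wait_ms: float
) -> List[List[Request]]:
    """Group requests in arrival order (assumes max_batch_size >= 1).

    The open batch is closed when it already holds max_batch_size requests,
    or when the next request arrives more than max_wait_ms after the
    batch's first request (the wait window has expired).
    """
    batches: List[List[Request]] = []
    current: List[Request] = []
    for req in sorted(requests, key=lambda r: r.arrival_ms):
        if current and (
            len(current) >= max_batch_size
            or req.arrival_ms - current[0].arrival_ms > max_wait_ms
        ):
            batches.append(current)
            current = []
        current.append(req)
    if current:
        batches.append(current)
    return batches


def toy_forward(inputs: List[int]) -> List[int]:
    """Hypothetical model: one call processes the whole batch (here it doubles each input)."""
    return [x * 2 for x in inputs]


def serve(
    requests: Sequence[Request],
    max_batch_size: int,
    max_wait_ms: float,
    forward: Callable[[List[int]], List[int]] = toy_forward,
) -> Tuple[Dict[int, int], int]:
    """Run each batch through `forward` once; return per-request outputs and the call count."""
    results: Dict[int, int] = {}
    forward_calls = 0
    for batch in form_batches(requests, max_batch_size, max_wait_ms):
        outputs = forward([r.payload for r in batch])  # one forward pass per batch
        forward_calls += 1
        for req, out in zip(batch, outputs):
            results[req.req_id] = out  # scatter results back to their requests
    return results, forward_calls


if __name__ == "__main__":
    # Six requests: four arrive close together, then two more much later.
    reqs = [Request(i, t, i + 10) for i, t in enumerate([0, 1, 2, 3, 20, 21])]
    results, calls = serve(reqs, max_batch_size=3, max_wait_ms=5)
    print(calls)  # 3 forward passes for 6 requests
```

With these settings the six requests form the batches [0, 1, 2], [3] and [4, 5], so the model runs three times instead of six. Raising `max_wait_ms` lets more requests share a pass, which improves GPU utilisation and throughput. The cost is extra queueing latency for the requests that arrive first, which is the throughput vs latency trade-off named above.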