Activation Checkpointing Vocabulary

Learn the vocabulary of trading extra compute for reduced memory usage during model training.

1 / 5

At standup, a dev mentions a training technique that discards a layer's intermediate activations during the forward pass and recomputes them during the backward pass, trading extra compute for reduced memory usage. What is this technique called?
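For readers who want to see the mechanism in this scenario, here is a minimal pure-Python sketch. It assumes a toy model: a chain of scalar tanh "layers", with hypothetical helper names (`layer`, `layer_grad`, `backward_full`, `backward_checkpointed`) that are not taken from any real library. The forward pass keeps only the input of each segment, and the backward pass recomputes the rest from those saved inputs.

```python
import math

def layer(w, x):
    """One toy 'layer': a scalar tanh unit (illustrative assumption)."""
    return math.tanh(w * x)

def layer_grad(w, x):
    """Derivative of layer(w, x) with respect to x, at the layer's input x."""
    return w * (1.0 - math.tanh(w * x) ** 2)

def backward_full(weights, x0):
    """Baseline: store every layer input during the forward pass."""
    inputs = []
    x = x0
    for w in weights:
        inputs.append(x)              # activation kept for backward
        x = layer(w, x)
    grad = 1.0
    for w, x_in in zip(reversed(weights), reversed(inputs)):
        grad *= layer_grad(w, x_in)   # chain rule, last layer first
    return grad, len(inputs)

def backward_checkpointed(weights, x0, segment):
    """Keep only each segment's input; recompute the rest during backward."""
    checkpoints = {}
    x = x0
    for i, w in enumerate(weights):
        if i % segment == 0:
            checkpoints[i] = x        # the only activations kept after forward
        x = layer(w, x)
    grad = 1.0
    recomputed = 0
    for start in sorted(checkpoints, reverse=True):
        seg = weights[start:start + segment]
        inputs = []
        x = checkpoints[start]
        for w in seg:                 # re-run this segment's forward pass
            inputs.append(x)
            x = layer(w, x)
            recomputed += 1
        for w, x_in in zip(reversed(seg), reversed(inputs)):
            grad *= layer_grad(w, x_in)
    return grad, len(checkpoints), recomputed

weights = [0.9, -1.1, 0.7, 1.3, -0.8, 1.05, 0.6, -1.2]
g_full, kept_full = backward_full(weights, 0.5)
g_ckpt, kept_ckpt, recomputed = backward_checkpointed(weights, 0.5, segment=4)
print(kept_full, kept_ckpt, recomputed)
```

In this toy setup both functions return the same gradient, but the checkpointed version holds 2 activations after the forward pass instead of 8, at the cost of 8 extra layer evaluations during the backward pass.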

2 / 5

During a design review, the team wants to decide specifically which layers act as checkpoint boundaries, balancing how much recomputation happens during the backward pass against how much memory is actually saved. Which capability supports this?
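The balance described in this scenario can be sketched with a small cost model. Everything here is an assumption for illustration: the `LayerCost` fields, the `plan_cost` helper, the layer names, and all of the numbers are made up, not measurements from a real model. A checkpointed layer keeps only its input and pays its forward time again during the backward pass; any other layer keeps its full activations.

```python
from dataclasses import dataclass

@dataclass
class LayerCost:
    name: str
    activation_mb: float  # hypothetical memory kept if NOT checkpointed
    input_mb: float       # hypothetical memory kept if checkpointed
    forward_ms: float     # hypothetical forward time, paid again if checkpointed

def plan_cost(layers, checkpointed):
    """Return (stored_mb, extra_forward_ms) for a set of checkpointed layer names."""
    stored_mb = 0.0
    extra_ms = 0.0
    for lyr in layers:
        if lyr.name in checkpointed:
            stored_mb += lyr.input_mb     # only the boundary input survives
            extra_ms += lyr.forward_ms    # recomputed during backward
        else:
            stored_mb += lyr.activation_mb
    return stored_mb, extra_ms

# Made-up numbers for a tiny two-block model.
layers = [
    LayerCost("embed", 40, 2, 1.0),
    LayerCost("attn_0", 400, 20, 6.0),
    LayerCost("mlp_0", 160, 20, 4.0),
    LayerCost("attn_1", 400, 20, 6.0),
    LayerCost("mlp_1", 160, 20, 4.0),
    LayerCost("head", 60, 20, 2.0),
]

no_ckpt = plan_cost(layers, set())
all_ckpt = plan_cost(layers, {lyr.name for lyr in layers})
attn_only = plan_cost(layers, {"attn_0", "attn_1"})
print(no_ckpt, all_ckpt, attn_only)
```

Under these assumed numbers, checkpointing every layer stores the least but re-runs the whole forward pass, while checkpointing only the two largest layers recovers most of the memory for about half of that extra compute.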

3 / 5

In a code review, a dev notices that the activations recomputed during the backward pass use the exact same numeric precision as the original forward pass, rather than a different precision that could silently introduce a mismatch. What property does this represent?
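To make the mismatch risk in this scenario concrete, here is a small sketch that simulates half precision with the standard library's `struct` module (format code `"e"`). The helpers `to_fp16`, `identity`, and `run_segment` are hypothetical names for illustration, and the three-layer tanh segment is a toy assumption rather than a real training setup.

```python
import math
import struct

def to_fp16(x):
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("e", struct.pack("e", x))[0]

def identity(x):
    """Leave the value in Python's native double precision."""
    return x

def run_segment(x, weights, cast):
    """Run a toy segment of tanh layers, casting each activation with `cast`."""
    for w in weights:
        x = cast(math.tanh(w * x))
    return x

weights = [0.9, -1.1, 0.7]
x0 = to_fp16(0.5)

# Original forward pass: activations rounded to half precision.
original = run_segment(x0, weights, cast=to_fp16)

# Recomputation with the same precision reproduces the value exactly.
recomputed_same = run_segment(x0, weights, cast=to_fp16)

# Recomputation in a different precision drifts away from the original.
recomputed_other = run_segment(x0, weights, cast=identity)
mismatch = abs(recomputed_other - original)

print(recomputed_same == original, mismatch)
```

The recomputation that matches the forward pass's precision is bit-identical to the original, while the one in a different precision differs by a small nonzero amount that nothing in the code would flag.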

4 / 5

An incident report shows training slowed down dramatically after activation checkpointing was enabled, because checkpoints had been placed at nearly every layer, forcing the backward pass to recompute almost the entire forward pass. What practice would prevent this?

5 / 5

During a PR review, a teammate asks why the team uses activation checkpointing instead of just buying more memory or using more GPUs to fit the same model and batch size. What is the reasoning?