Advanced | Vocabulary | #ai-llm #data-science-ml #developer-tools

Activation Checkpointing Vocabulary

Learn the vocabulary of trading extra compute for reduced memory usage during model training.

1 / 5
At standup, a dev mentions a training technique that discards a layer's intermediate activations during the forward pass and recomputes them during the backward pass, trading extra compute for reduced memory usage. What is this technique called?
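
To make the trade-off in the question concrete, here is a minimal sketch in plain Python. It is not taken from any framework: the scalar tanh "layers", the function names (`grad_store_all`, `grad_checkpointed`), and the `segment` parameter are all assumptions made for illustration. The baseline keeps every layer input for the backward pass, while the second version keeps only one input per segment and recomputes the rest when it needs them.

```python
import math


def layer(w, x):
    # Hypothetical scalar layer: y = tanh(w * x)
    return math.tanh(w * x)


def layer_grad(w, x):
    # dy/dx for the layer above: w * (1 - tanh(w * x)^2)
    return w * (1.0 - math.tanh(w * x) ** 2)


def grad_store_all(weights, x):
    """Baseline: keep every layer input during the forward pass."""
    inputs = []
    for w in weights:
        inputs.append(x)
        x = layer(w, x)
    grad = 1.0
    for w, x_in in zip(reversed(weights), reversed(inputs)):
        grad *= layer_grad(w, x_in)
    # Return d(output)/d(input) and how many activations were kept.
    return grad, len(inputs)


def grad_checkpointed(weights, x, segment=2):
    """Keep only each segment's input; recompute the rest going backward."""
    checkpoints = []
    for i, w in enumerate(weights):
        if i % segment == 0:
            checkpoints.append(x)  # the only activations kept after forward
        x = layer(w, x)
    grad = 1.0
    for seg in reversed(range(len(checkpoints))):
        start = seg * segment
        seg_weights = weights[start:start + segment]
        # Extra compute: rerun this segment's forward pass from its checkpoint.
        x_in = checkpoints[seg]
        seg_inputs = []
        for w in seg_weights:
            seg_inputs.append(x_in)
            x_in = layer(w, x_in)
        for w, xi in zip(reversed(seg_weights), reversed(seg_inputs)):
            grad *= layer_grad(w, xi)
    return grad, len(checkpoints)
```

Both functions return the same gradient. With six layers and `segment=3`, the second one holds two activations after the forward pass instead of six, at the cost of running each segment's forward computation a second time.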