Moonwalk: Inverse-Forward Differentiation
A new paper on arXiv proposes Moonwalk, a method that removes the need to store intermediate activations during neural network training. Backpropagation's memory bottleneck comes from the residuals it saves in the forward pass so that the backward pass can evaluate each layer's vector-Jacobian product. The authors define submersive networks, whose layer Jacobians have trivial cokernels. In such networks no cotangent information is lost when a gradient passes back through a layer, so gradients can be reconstructed exactly without stored activations. For non-submersive layers, fragmental gradient checkpointing records only the minimal residuals needed to restore the lost cotangents. The key innovation is the vector-inverse operator. The stated aim is to train deeper networks without the memory overhead of stored activations.
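The following is a minimal NumPy sketch of why a trivial cokernel makes exact reconstruction possible. It is an illustration under stated assumptions, not the paper's implementation. The toy network, the loss, and the helper names `vjp` and `vijp` are hypothetical, and reading the vector-inverse operator as a right inverse of the Jacobian applied to a cotangent is an assumption. The layers are linear so that each Jacobian is simply the weight matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy network: three linear layers x -> W @ x, each mapping to a
# smaller dimension. A random wide Gaussian matrix has full row rank almost
# surely, so each layer Jacobian (W itself) has a trivial cokernel.
dims = [6, 5, 4, 3]
weights = [rng.standard_normal((dims[i + 1], dims[i])) for i in range(len(dims) - 1)]
x0 = rng.standard_normal(dims[0])


def vjp(h_out, W):
    """Backprop step: pull the output cotangent back to the layer input."""
    return W.T @ h_out


def vijp(h_in, W):
    """Assumed inverse step: recover the output cotangent from the input one.

    Applies the right inverse (W W^T)^{-1} W, valid only for full row rank W.
    """
    return np.linalg.solve(W @ W.T, W @ h_in)


# Reference: ordinary backprop, storing every activation.
acts = [x0]
for W in weights:
    acts.append(W @ acts[-1])
h = acts[-1]  # dL/dy for the assumed loss L = 0.5 * ||y||^2
grads_ref = [None] * len(weights)
for i in reversed(range(len(weights))):
    grads_ref[i] = np.outer(h, acts[i])  # dL/dW_i
    h = vjp(h, weights[i])

# Sweep without stored activations.
# 1) Forward pass keeping only the current activation.
x = x0
for W in weights:
    x = W @ x
# 2) Input gradient dL/dx0. Linear layers need no activations for this step;
#    that is a simplification of this toy example.
h = x
for W in reversed(weights):
    h = vjp(h, W)
# 3) Forward sweep: rebuild each cotangent with vijp and recompute activations.
x = x0
grads_sweep = []
for W in weights:
    h = vijp(h, W)                      # dL/d(layer output)
    grads_sweep.append(np.outer(h, x))  # dL/dW for this layer
    x = W @ x

match = all(np.allclose(a, b) for a, b in zip(grads_sweep, grads_ref))
```

In this sketch the forward sweep holds one activation and one cotangent at a time, whereas the reference pass keeps the whole `acts` list. The recovery in `vijp` relies on `W @ W.T` being invertible, which is exactly the full-row-rank (trivial cokernel) condition.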
Key facts
- Paper title: Moonwalk: Inverse-Forward Differentiation
- arXiv ID: 2402.14212
- Announcement type: replace-cross (a revised version of a cross-listed paper)
- Addresses backpropagation's need to store intermediate activations
- Defines submersive networks: networks whose layer Jacobians have trivial cokernels
- Introduces fragmental gradient checkpointing for non-submersive layers
- Novel operator: vector-inverse
- Aims to enable training of deeper networks without the memory overhead of stored activations
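To picture what a non-submersive layer loses, the sketch below uses a hypothetical linear layer with more outputs than inputs. It is not the paper's fragmental gradient checkpointing procedure. It only illustrates, under the assumption that the lost information is the cokernel component of the cotangent, how small the extra record can be. All names (`fragment`, `partial`, `restored`) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical non-submersive linear layer: 3 inputs -> 5 outputs.
# Its Jacobian W (5x3) cannot be surjective, so the cokernel has
# dimension 5 - rank(W), which is 2 for a full-column-rank W.
W = rng.standard_normal((5, 3))
h_out = rng.standard_normal(5)  # cotangent at the layer output
h_in = W.T @ h_out              # backprop step (vector-Jacobian product)

# Orthonormal basis of ker(W^T): the last m - rank columns of U in the full SVD.
U, s, _ = np.linalg.svd(W, full_matrices=True)
rank = int(np.sum(s > 1e-10))
N = U[:, rank:]                 # shape (5, 2)

# The part of h_out erased by the Jacobian, kept as 2 numbers instead of 5.
fragment = N.T @ h_out

# Without the fragment, only the component inside the range of W is recoverable.
partial = np.linalg.pinv(W.T) @ h_in
restored = partial + N @ fragment
```

Here `partial` alone differs from `h_out`, while `restored` matches it to numerical precision, so the extra record has the dimension of the cokernel rather than of the full cotangent.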
Entities
Institutions
- arXiv