Moonwalk: Inverse-Forward Differentiation

ai-technology · 2026-05-25

A paper on arXiv proposes Moonwalk, a method that removes the need to store intermediate activations during neural network training. Backpropagation's memory cost comes from saving those activations (residuals) during the forward pass so that the backward pass can reuse them. The authors define submersive networks, in which the Jacobians have trivial cokernels, so gradients can be reconstructed exactly without stored activations. For non-submersive layers, fragmental gradient checkpointing records only the minimal residuals needed to restore the cotangents that would otherwise be lost. The paper's novel operator is the vector-inverse. The stated aim is to train deeper networks without the memory overhead of stored activations.
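To make the cokernel condition concrete, here is a minimal sketch in plain Python. It is not the paper's implementation. It assumes that "trivial cokernel" means the transposed Jacobian is injective, so the backward step h_in = J^T h_out can be undone. The helper `vector_inverse_2` and both matrices are hypothetical and chosen only for illustration.

```python
def vjp(J, h_out):
    """Backward step: input cotangent h_in = J^T h_out (J is a list of rows)."""
    return [sum(J[r][c] * h_out[r] for r in range(len(J)))
            for c in range(len(J[0]))]


def matvec(J, v):
    """Plain matrix-vector product J v."""
    return [sum(a * b for a, b in zip(row, v)) for row in J]


def vector_inverse_2(J, h_in):
    """Hypothetical helper: recover a 2-entry h_out from h_in = J^T h_out.

    Assumes J is 2 x n with full row rank (trivial cokernel) and solves the
    normal equations (J J^T) h_out = J h_in with Cramer's rule.
    """
    gram = [matvec(J, row) for row in J]  # J J^T, a 2 x 2 matrix
    (a, b), (c, d) = gram
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("J J^T is singular: the cokernel is not trivial")
    u, v = matvec(J, h_in)
    return [(d * u - b * v) / det, (a * v - c * u) / det]


# Submersive case: 3 inputs -> 2 outputs, rows linearly independent.
J_sub = [[1.0, 2.0, 0.0],
         [0.0, 1.0, 3.0]]
h_out = [0.5, -2.0]
h_in = vjp(J_sub, h_out)                    # what backpropagation passes down
recovered = vector_inverse_2(J_sub, h_in)   # the same cotangent, rebuilt

# Non-submersive case: 2 inputs -> 3 outputs, so J^T has a nonzero kernel.
J_exp = [[1.0, 0.0],
         [0.0, 1.0],
         [1.0, 1.0]]
h_lost = [1.0, 1.0, -1.0]      # lies in the cokernel of J_exp
erased = vjp(J_exp, h_lost)    # maps to zero: this cotangent cannot be rebuilt
```

In the first case `recovered` matches `h_out`, so nothing has to be saved to undo the backward step. In the second, a nonzero output cotangent is mapped to zero, which is the kind of lost information that extra recorded residuals would have to restore.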

Key facts

Paper title: Moonwalk: Inverse-Forward Differentiation
arXiv ID: 2402.14212
Announcement type: replace-cross
Addresses backpropagation's need to store intermediate activations
Defines submersive networks with trivial Jacobian cokernels
Introduces fragmental gradient checkpointing for non-submersive layers
Novel operator: vector-inverse
Aims to enable training of deeper networks without memory overhead
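
The facts above can be illustrated end to end on a toy chain of scalar layers. This is a sketch under stated assumptions, not the algorithm from the paper: the layer `tanh(w * x)`, the loss `0.5 * x_n**2`, and the use of a single forward-mode tangent to obtain the input gradient are all choices made for this example. Each scalar Jacobian is nonzero, so every layer is trivially submersive and the cotangent can be pushed forward by division.

```python
import math


def layer(w, x):
    return math.tanh(w * x)


def d_layer_dx(w, x):
    y = math.tanh(w * x)
    return w * (1.0 - y * y)


def d_layer_dw(w, x):
    y = math.tanh(w * x)
    return x * (1.0 - y * y)


def backprop_grads(ws, x0):
    """Reference: store every activation, then sweep backward."""
    xs = [x0]
    for w in ws:
        xs.append(layer(w, xs[-1]))
    h = xs[-1]  # dL/dx_n for L = 0.5 * x_n**2
    grads = [0.0] * len(ws)
    for i in reversed(range(len(ws))):
        grads[i] = h * d_layer_dw(ws[i], xs[i])
        h = h * d_layer_dx(ws[i], xs[i])
    return grads


def forward_sweep_grads(ws, x0):
    """Same gradients while holding only the current activation."""
    # Pass 1: carry the activation and the tangent dx_i/dx_0 together.
    x, t = x0, 1.0
    for w in ws:
        t = d_layer_dx(w, x) * t
        x = layer(w, x)
    h = x * t  # dL/dx_0 by the chain rule

    # Pass 2: recompute activations and invert each backward step,
    # h_{i+1} = h_i / J_i, which is valid because J_i is nonzero.
    x = x0
    grads = []
    for w in ws:
        h = h / d_layer_dx(w, x)
        grads.append(h * d_layer_dw(w, x))
        x = layer(w, x)
    return grads
```

For example, `backprop_grads([0.9, -1.1, 0.7, 1.3], 0.5)` and `forward_sweep_grads([0.9, -1.1, 0.7, 1.3], 0.5)` should agree up to floating-point error. The first holds a list of activations that grows with depth, while the second holds a constant number of scalars at the cost of a second forward pass.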
