Forward-Forward Networks Suffer Layer Free-Riding, Remedies Found
A new paper on arXiv (2605.06240) identifies and addresses a flaw in cumulative-goodness variants of Forward-Forward (FF) neural networks. The authors formalize 'layer free-riding': later layers inherit partially separated tasks from earlier layers, so the class-discrimination gradient they receive decays exponentially with the positive margin already accumulated. They propose three local remedies (per-block, hardness-gated, and depth-scaled) that recover separation without backpropagated gradients. On CIFAR-10 and CIFAR-100, these fixes improve layer-separation statistics by 4× to 45× in deeper layers, while accuracy changes by less than one percentage point for non-degenerate training. Tiny ImageNet provides a cross-dataset check. The work suggests layer free-riding is real and repairable, but that fixing it mainly improves internal separation rather than final accuracy.
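To make the decay mechanism concrete, here is a minimal illustration assuming a softplus-style cumulative-goodness objective; the notation (per-block goodness g_ℓ, threshold θ, pass label y) is chosen for exposition and may not match the paper's exact formulation.

```latex
% Assumed cumulative-goodness loss for block k on one sample:
%   y = +1 for a positive pass, y = -1 for a negative pass.
\[
  \mathcal{L}_k
  = \log\!\Bigl(1 + \exp\bigl(-y\,(\textstyle\sum_{\ell \le k} g_\ell - \theta)\bigr)\Bigr),
  \qquad
  \frac{\partial \mathcal{L}_k}{\partial g_k}
  = -\,y\,\sigma\!\bigl(-y\,(\textstyle\sum_{\ell \le k} g_\ell - \theta)\bigr).
\]
% With accumulated margin m = y(sum_{l<=k} g_l - theta), the gradient magnitude
% is sigma(-m), which behaves like exp(-m) once m is large and positive:
\[
  \Bigl|\frac{\partial \mathcal{L}_k}{\partial g_k}\Bigr| = \sigma(-m) \approx e^{-m}
  \quad \text{for } m = y\Bigl(\sum_{\ell \le k} g_\ell - \theta\Bigr) \gg 0 .
\]
```

In words: once earlier blocks have pushed the accumulated margin well past the threshold, later blocks see a nearly saturated loss and receive almost no pressure to separate the classes themselves.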
Key facts
- Forward-Forward (FF) training uses a local goodness criterion per layer.
- Cumulative-goodness variants cause layer free-riding.
- The class-discrimination gradient decays exponentially with the positive margin accumulated by preceding blocks.
- Three local remedies: per-block, hardness-gated, depth-scaled (see the sketch after this list).
- On CIFAR-10 and CIFAR-100, the fixes improve layer-separation statistics by 4× to 45× in deeper layers.
- Accuracy changes by less than one percentage point.
- Tiny ImageNet used as cross-dataset check.
- Paper available at arXiv:2605.06240.
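The list above names the three remedies without spelling out their form. Below is a hypothetical PyTorch-style sketch of how such per-block goodness losses might look; the function name, the softplus objective, the threshold theta, and the exact gating and depth-scaling rules are assumptions for illustration, not the paper's definitions.

```python
# Hypothetical sketch of the three local remedies as per-block Forward-Forward
# goodness losses. All names and exact forms below are assumptions.
import torch
import torch.nn.functional as F

def ff_block_losses(goodness, sign, theta=2.0, mode="per_block"):
    """Return one local loss per block (no backpropagation across blocks).

    goodness: list of per-block goodness tensors, each shaped (batch,),
              e.g. the mean squared activation of that block.
    sign:     (batch,) tensor, +1 for positive-pass and -1 for negative-pass
              samples, as in standard Forward-Forward training.
    """
    losses, cumulative = [], torch.zeros_like(goodness[0])
    for depth, g in enumerate(goodness, start=1):
        if mode == "per_block":
            # Each block is scored on its own margin only, so an earlier
            # block's accumulated margin cannot saturate this block's loss.
            loss = F.softplus(-sign * (g - theta))
        elif mode == "hardness_gated":
            # Keep the cumulative objective but re-weight it: the gate is
            # large only for samples that earlier blocks have not yet
            # separated, restoring gradient where it is still needed.
            gate = torch.sigmoid(-sign * (cumulative - theta)).detach()
            loss = gate * F.softplus(-sign * (cumulative + g - theta))
        elif mode == "depth_scaled":
            # Raise the threshold with depth so each block must add fresh
            # margin instead of free-riding on accumulated goodness.
            loss = F.softplus(-sign * (cumulative + g - depth * theta))
        else:
            raise ValueError(f"unknown mode: {mode}")
        losses.append(loss.mean())
        cumulative = cumulative + g.detach()  # no gradient across blocks
    return losses
```

In use, each block's loss would be minimized by its own optimizer with no gradient flowing between blocks, preserving the local-training character of Forward-Forward.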