Two Speeds of Learning: Grokking and Double Descent Decomposed
A recent arXiv paper (2605.27078) presents a task-agnostic framework for explaining grokking and epoch-wise double descent in deep neural networks. The authors decompose the learning dynamics into two processes: representation learning in the encoder and readout calibration in the classifier's final stage. Using representational geometry, neural tangent kernels, and linear probing, they show that both processes are active throughout training and that differences in their relative speeds produce the observed generalization effects. The framework is intended to fill the gap left by the lack of a unified analytical tool that applies to realistic tasks and architectures.
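To make the linear-probing idea concrete, here is a minimal sketch and not the paper's implementation. It fits a fresh linear readout on frozen encoder features and compares its accuracy with that of the model's own classifier head; a large gap would suggest the representation is ahead of the readout. The names `fit_linear_probe` and `probe_accuracy`, the ridge-regression probe, the synthetic features, and the random stand-in head are all assumptions made for illustration.

```python
import numpy as np

def fit_linear_probe(features, labels, num_classes, ridge=1e-3):
    """Fit a ridge-regression readout on frozen features (one-hot targets)."""
    X = np.hstack([features, np.ones((features.shape[0], 1))])  # append bias column
    Y = np.eye(num_classes)[labels]                             # one-hot targets
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ Y)                          # shape (dim + 1, num_classes)

def probe_accuracy(W, features, labels):
    """Accuracy of a linear readout W applied to the given features."""
    X = np.hstack([features, np.ones((features.shape[0], 1))])
    return float(np.mean(np.argmax(X @ W, axis=1) == labels))

# Synthetic stand-in for encoder outputs at one checkpoint (assumption):
# well-separated class clusters, i.e. a "good" representation.
rng = np.random.default_rng(0)
num_classes, dim, n = 3, 8, 300
means = rng.normal(scale=3.0, size=(num_classes, dim))
labels = rng.integers(0, num_classes, size=n)
features = means[labels] + rng.normal(size=(n, dim))

train, test = slice(0, 200), slice(200, n)
W_probe = fit_linear_probe(features[train], labels[train], num_classes)
probe_acc = probe_accuracy(W_probe, features[test], labels[test])

# Hypothetical uncalibrated head: random weights standing in for the
# model's own final classifier early in training.
W_head = rng.normal(size=W_probe.shape)
head_acc = probe_accuracy(W_head, features[test], labels[test])

print(f"probe accuracy: {probe_acc:.2f}, head accuracy: {head_acc:.2f}")
```

Repeating this measurement at each saved checkpoint would give two curves, one for the probe and one for the model's own head, whose relative timing is the kind of signal the summary describes.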
Key facts
- arXiv paper 2605.27078
- Announce type: cross (cross-listed on arXiv)
- Analyzes grokking and epoch-wise double descent
- Decomposes learning into representation learning and readout calibration
- Uses representational geometry, neural tangent kernels, linear probing
- Both processes are active throughout training
- Differences in the relative speeds of the two processes produce both phenomena
- Task-agnostic framework for realistic tasks and architectures
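One way to see how a neural tangent kernel can be split along the encoder/readout line is the sketch below. It is an illustration under stated assumptions, not the paper's method: the toy model `f(x) = w2 · tanh(W1 x)`, its sizes, and all variable names are hypothetical. It relies only on the standard fact that the empirical NTK is a sum of per-parameter-group Gram matrices of the output Jacobian.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, h = 5, 4, 6                           # samples, input dim, hidden width (toy sizes)
X = rng.normal(size=(n, d))
W1 = rng.normal(size=(h, d)) / np.sqrt(d)   # "encoder" weights (assumption)
w2 = rng.normal(size=h) / np.sqrt(h)        # "readout" weights (assumption)

# Toy scalar-output model: f(x) = w2 . tanh(W1 x)
H = np.tanh(X @ W1.T)                       # hidden representation, shape (n, h)

# Jacobian of f with respect to the readout weights: df/dw2[j] = H[:, j]
J_readout = H

# Jacobian with respect to the encoder weights:
# df/dW1[j, k] = w2[j] * (1 - H[:, j]**2) * X[:, k]
D = (1.0 - H**2) * w2                       # shape (n, h)
J_encoder = (D[:, :, None] * X[:, None, :]).reshape(n, h * d)

# Empirical NTK, decomposed by parameter group
K_readout = J_readout @ J_readout.T         # contribution from the readout
K_encoder = J_encoder @ J_encoder.T         # contribution from the encoder
K_total = K_encoder + K_readout

# Share of the kernel's trace attributable to the encoder
encoder_share = np.trace(K_encoder) / np.trace(K_total)
print(f"encoder share of NTK trace: {encoder_share:.2f}")
```

Tracking a quantity like `encoder_share` across checkpoints is one possible way to compare how strongly each parameter group drives the function's updates; whether the paper uses this particular statistic is not stated in the summary.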
Entities
Institutions
- arXiv