Two Speeds of Learning: Grokking and Double Descent Decomposed

other · 2026-05-27

A recent arXiv paper (2605.27078) presents a task-agnostic framework for explaining grokking and epoch-wise double descent in deep neural networks. The authors decompose the learning dynamics into two opposing processes: representation learning in the encoder and readout calibration in the final classifier stage. Using representational geometry, neural tangent kernels, and linear probing, they show that both processes are active throughout training and that shifts in their relative speeds produce the observed generalization effects. The framework aims to supply a unified analytical tool that applies to realistic tasks and architectures.
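Linear probing, one of the tools named above, can be illustrated with a minimal sketch. This is not the paper's code: the function names, the ridge-regression probe, and the synthetic features are all assumptions made for illustration. The idea is to freeze the encoder's features, fit a fresh linear readout on them, and compare its accuracy with the accuracy of the model's own readout. A large gap suggests the representation is already usable while the readout lags behind.

```python
import numpy as np

def linear_probe_accuracy(train_feats, train_labels, test_feats, test_labels,
                          num_classes, ridge=1e-3):
    """Fit a ridge-regression readout on frozen features; return test accuracy."""
    # Append a constant column so the probe can learn a bias term.
    x_train = np.hstack([train_feats, np.ones((len(train_feats), 1))])
    x_test = np.hstack([test_feats, np.ones((len(test_feats), 1))])
    targets = np.eye(num_classes)[train_labels]  # one-hot targets
    # Closed-form ridge solution: W = (X^T X + ridge * I)^{-1} X^T Y
    gram = x_train.T @ x_train + ridge * np.eye(x_train.shape[1])
    weights = np.linalg.solve(gram, x_train.T @ targets)
    preds = (x_test @ weights).argmax(axis=1)
    return float((preds == test_labels).mean())

def head_accuracy(feats, labels, head_w, head_b):
    """Accuracy of the model's own (possibly miscalibrated) linear readout."""
    preds = (feats @ head_w + head_b).argmax(axis=1)
    return float((preds == labels).mean())

# Synthetic stand-in for encoder features (hypothetical data, not from the paper):
# two well-separated clusters, so the representation is linearly separable.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=400)
feats = rng.normal(size=(400, 4)) + np.where(labels[:, None] == 1, 2.0, -2.0)

# A deliberately bad readout that points the wrong way along the class axis.
bad_head_w = np.tile(np.array([[1.0, -1.0]]), (4, 1))
bad_head_b = np.zeros(2)

probe_acc = linear_probe_accuracy(feats[:300], labels[:300],
                                  feats[300:], labels[300:], num_classes=2)
head_acc = head_accuracy(feats[300:], labels[300:], bad_head_w, bad_head_b)
print(f"probe accuracy: {probe_acc:.3f}, head accuracy: {head_acc:.3f}")
```

Tracking both numbers at each checkpoint would give one rough way to see which of the two processes is ahead at a given point in training, under the assumptions stated above.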

Key facts

arXiv paper 2605.27078
arXiv announce type: cross (a cross-listing)
Analyzes grokking and epoch-wise double descent
Decomposes learning into representation learning and readout calibration
Uses representational geometry, neural tangent kernels, linear probing
Both processes are active throughout training
Shifts in the relative speed of the two processes produce both phenomena
Task-agnostic framework for realistic tasks and architectures
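
As a rough illustration of how a neural tangent kernel can be split along the encoder/readout line listed above: the empirical NTK is a sum over parameters of gradient outer products, so it decomposes additively by parameter group. The sketch below assumes a toy two-layer network f(x) = v · tanh(W x), with W standing in for the encoder and v for the readout. The model, the function name, and the split itself are illustrative assumptions; the summary does not say the paper computes the kernel this way.

```python
import numpy as np

def empirical_ntk_parts(inputs, enc_w, read_v):
    """Return (K_encoder, K_readout) for the toy model f(x) = read_v . tanh(enc_w x).

    inputs: (n, d) array, enc_w: (h, d) array, read_v: (h,) array.
    The full empirical NTK is K_encoder + K_readout.
    """
    hidden = np.tanh(inputs @ enc_w.T)            # (n, h) hidden activations
    # df/d read_v[j] is the j-th hidden activation.
    jac_read = hidden                             # (n, h)
    # df/d enc_w[j, k] = read_v[j] * (1 - hidden[j]^2) * x[k]
    scale = read_v * (1.0 - hidden ** 2)          # (n, h)
    jac_enc = (scale[:, :, None] * inputs[:, None, :]).reshape(len(inputs), -1)
    return jac_enc @ jac_enc.T, jac_read @ jac_read.T

# Hypothetical random inputs and weights, for demonstration only.
rng = np.random.default_rng(0)
inputs = rng.normal(size=(5, 3))
enc_w = rng.normal(size=(4, 3))
read_v = rng.normal(size=4)

k_enc, k_read = empirical_ntk_parts(inputs, enc_w, read_v)
# Share of the kernel's trace contributed by the encoder parameters.
encoder_share = np.trace(k_enc) / np.trace(k_enc + k_read)
print(f"encoder share of NTK trace: {encoder_share:.3f}")
```

Under these assumptions, the ratio of the two traces gives a crude measure of how much of the instantaneous function-space update comes from the encoder versus the readout.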