Sparse Autoencoders Reveal Concept-Level Forgetting in Continual Learning
A recent arXiv preprint (2605.16374) presents a diagnostic framework that uses Sparse Autoencoders (SAEs) to examine catastrophic forgetting in supervised continual learning at a finer granularity than task-level performance. The researchers treat each SAE latent as a proxy for a concept, meaning a recurring visual pattern, which lets them trace how task-specific information evolves inside the model. The approach goes beyond conventional performance-based assessments and coarse measures of representational drift, giving a clearer picture of what forgetting means in the representation space of a vision model.
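To make the idea of concept-level forgetting concrete, here is a minimal sketch of how per-latent drift could be measured. It is not the preprint's method. It assumes a ReLU SAE encoder that is fit once on activations from an earlier task and then held fixed (one possible reading of "task-anchored"), and every name in it (`sae_encode`, `mean_latents`, `concept_drift`, the toy weights and features) is hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def sae_encode(x: Vector, W_enc: Matrix, b_enc: Vector) -> Vector:
    """Encode one feature vector as ReLU(W_enc @ x + b_enc)."""
    return [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(W_enc, b_enc)
    ]


def mean_latents(features: List[Vector], W_enc: Matrix, b_enc: Vector) -> Vector:
    """Average activation of each SAE latent over a batch of feature vectors."""
    codes = [sae_encode(x, W_enc, b_enc) for x in features]
    return [sum(c[j] for c in codes) / len(codes) for j in range(len(b_enc))]


def concept_drift(before: List[Vector], after: List[Vector],
                  W_enc: Matrix, b_enc: Vector) -> Vector:
    """Per-latent change in mean activation on the same inputs.

    `before` and `after` hold the vision model's features for identical
    earlier-task inputs, extracted before and after training on a later task.
    """
    m_before = mean_latents(before, W_enc, b_enc)
    m_after = mean_latents(after, W_enc, b_enc)
    return [a - b for a, b in zip(m_after, m_before)]


# Toy, hand-picked values (assumptions for illustration only):
# 3 SAE latents over 2-dimensional model features.
W_enc = [[1, 0], [0, 1], [1, 1]]
b_enc = [0, 0, -1]
feats_before = [[1.0, 0.0], [1.0, 0.0]]  # earlier-task inputs, old model
feats_after = [[0.5, 0.0], [0.5, 0.0]]   # same inputs, model after a new task

drift = concept_drift(feats_before, feats_after, W_enc, b_enc)
print(drift)  # [-0.5, 0.0, 0.0]
```

In this toy case only the first latent weakens, which is the kind of signal an aggregate accuracy score or a single drift statistic would not localize to a specific concept.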
Key facts
- arXiv preprint 2605.16374
- Proposes diagnostic framework using Sparse Autoencoders (SAEs)
- Defines task-anchored latent feature space
- Treats SAE latents as concept proxies
- Analyzes forgetting at finer granularity than task-level performance
- Focuses on internal representation space of vision models
- Cross type: new research
- Addresses catastrophic forgetting in continual learning
Entities
Institutions
- arXiv