AEN-SAEs: Solving Feature Starvation in Sparse Autoencoders
A new paper on arXiv (2605.05341) argues that feature starvation in sparse autoencoders (SAEs) is a fundamental geometric pathology, not merely a data diversity issue. Standard ℓ1-regularized SAEs suffer from dead neurons and shrinkage bias, requiring costly heuristic fixes such as resampling. The authors propose adaptive elastic net SAEs (AEN-SAEs), a fully differentiable architecture that combines the ℓ1 sparsity penalty with an ℓ2 term that restores strong convexity, addressing the instability of ℓ1-induced sparse coding maps in overcomplete dictionaries.
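The elastic-net combination described above can be sketched as a simple objective: reconstruction error plus an ℓ1 term for sparsity and an ℓ2 term on the codes for strong convexity. The sketch below is illustrative only; the encoder/decoder shapes, the ReLU encoder, and the `l1`/`l2` weights are assumptions, not the paper's exact formulation.

```python
import numpy as np

def aen_sae_loss(x, W_enc, b_enc, W_dec, l1=1e-3, l2=1e-4):
    """Illustrative elastic-net SAE objective (not the paper's exact loss).

    Mean squared reconstruction error, plus an l1 penalty that induces
    sparse codes and an l2 penalty that makes the regularizer strongly
    convex in the codes.
    """
    z = np.maximum(x @ W_enc + b_enc, 0.0)   # ReLU encoder -> sparse codes
    x_hat = z @ W_dec                        # linear decoder
    recon = np.mean((x - x_hat) ** 2)
    penalty = l1 * np.abs(z).mean() + l2 * (z ** 2).mean()
    return recon + penalty

# Toy overcomplete setting: 8-dim inputs, 32 dictionary features.
rng = np.random.default_rng(0)
d, m = 8, 32
x = rng.normal(size=(16, d))
W_enc = rng.normal(scale=0.1, size=(d, m))
b_enc = np.zeros(m)
W_dec = rng.normal(scale=0.1, size=(m, d))
loss = aen_sae_loss(x, W_enc, b_enc, W_dec)
```

Because the ℓ2 term makes the penalty strictly convex in each code, small perturbations of the input no longer flip which dictionary features activate, which is the instability the paper attributes ℓ1-only training.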
Key facts
- Paper on arXiv: 2605.05341
- Title: Feature Starvation as Geometric Instability in Sparse Autoencoders
- SAEs are used to disentangle LLM representations into monosemantic concepts
- Standard ℓ1-regularized SAEs suffer from feature starvation and shrinkage bias
- Feature starvation is argued to be a fundamental optimization-geometric pathology
- AEN-SAEs combine ℓ1 and ℓ2 regularization for strong convexity
- AEN-SAEs are fully differentiable and grounded in classical sparse regression
Entities
Institutions
- arXiv