AEN-SAEs: Solving Feature Starvation in Sparse Autoencoders
A new paper on arXiv (2605.05341) argues that feature starvation in sparse autoencoders (SAEs) is a fundamental geometric pathology, not merely a data diversity issue. Standard ℓ1-regularized SAEs suffer from dead neurons and shrinkage bias, requiring costly heuristic fixes such as resampling. The authors propose adaptive elastic net SAEs (AEN-SAEs), a fully differentiable architecture that combines the ℓ1 sparsity penalty with an ℓ2 term that restores strong convexity, addressing the instability of ℓ1-induced sparse coding maps in overcomplete dictionaries.
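The elastic-net combination described above can be sketched as a simple objective: reconstruction error plus an ℓ1 term for sparsity and an ℓ2 term on the codes for strong convexity. The sketch below is illustrative only; the encoder/decoder shapes, the ReLU encoder, and the `l1`/`l2` weights are assumptions, not the paper's exact formulation.

```python
import numpy as np

def aen_sae_loss(x, W_enc, b_enc, W_dec, l1=1e-3, l2=1e-4):
    """Illustrative elastic-net SAE objective (not the paper's exact loss).

    Mean squared reconstruction error, plus an l1 penalty that induces
    sparse codes and an l2 penalty that makes the regularizer strongly
    convex in the codes.
    """
    z = np.maximum(x @ W_enc + b_enc, 0.0)   # ReLU encoder -> sparse codes
    x_hat = z @ W_dec                        # linear decoder
    recon = np.mean((x - x_hat) ** 2)
    penalty = l1 * np.abs(z).mean() + l2 * (z ** 2).mean()
    return recon + penalty

# Toy overcomplete setting: 8-dim inputs, 32 dictionary features.
rng = np.random.default_rng(0)
d, m = 8, 32
x = rng.normal(size=(16, d))
W_enc = rng.normal(scale=0.1, size=(d, m))
b_enc = np.zeros(m)
W_dec = rng.normal(scale=0.1, size=(m, d))
loss = aen_sae_loss(x, W_enc, b_enc, W_dec)
```

Because the ℓ2 term makes the penalty strictly convex in each code, small perturbations of the input no longer flip which dictionary features activate, which is the instability the paper attributes ℓ1-only training.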
Key facts
- Paper on arXiv: 2605.05341
- Title: Feature Starvation as Geometric Instability in Sparse Autoencoders
- SAEs are used to disentangle LLM representations into monosemantic concepts
- Standard ℓ1-regularized SAEs suffer from feature starvation and shrinkage bias
- Feature starvation is argued to be a fundamental optimization-geometric pathology
- AEN-SAEs combine ℓ1 and ℓ2 regularization for strong convexity
- AEN-SAEs are fully differentiable and grounded in classical sparse regression
Entities
Institutions
- arXiv