AMOR: Adaptive Entropy Gate for Hybrid Recurrent-Attention Models
AMOR (Adaptive Metacognitive Output Router) is a post-hoc hybrid architecture that invokes attention based on predictive uncertainty. It augments a recurrent backbone with entropy-gated attention blocks that activate only when the model's output entropy exceeds a dynamic threshold, computed from the running batch median and a scaled standard deviation. The result is a simple, gradient-free routing mechanism inspired by System 1 / System 2 uncertainty processing. Evaluated on Mamba2 and Gated DeltaNet backbones ranging from 180M to 1.5B parameters, AMOR matches or outperforms both pure recurrent models and fixed-schedule hybrid baselines while invoking attention on only a fraction of tokens.
Key facts
- AMOR stands for Adaptive Metacognitive Output Router
- It is a post-hoc hybrid architecture for recurrent-attention models
- Attention is invoked based on predictive uncertainty via entropy gating
- Dynamic threshold uses running batch median and scaled standard deviation
- Gradient-free routing mechanism inspired by System 1 / System 2
- Tested on Mamba2 and Gated DeltaNet backbones from 180M to 1.5B parameters
- Matches or outperforms pure recurrent and fixed-schedule hybrid baselines
- Invokes attention on only a fraction of tokens
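The gating rule described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function name, the exact threshold form (median plus k·std over a history of recent entropies), and the scale factor `k` are all assumptions based on the summary.

```python
import numpy as np

def entropy_gate(logits, entropy_history, k=1.0):
    """Hypothetical sketch of entropy-gated routing (names and threshold form assumed).

    logits: (batch, seq, vocab) output logits from the recurrent backbone.
    entropy_history: 1-D array of recently observed token entropies.
    Returns a boolean mask marking tokens routed to attention, plus the entropies.
    """
    # Softmax over the vocabulary (numerically stabilized)
    z = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    # Predictive entropy per token
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=-1)
    # Dynamic threshold: batch median plus scaled standard deviation
    threshold = np.median(entropy_history) + k * np.std(entropy_history)
    # Gate: invoke the attention block only where uncertainty exceeds the threshold
    return entropy > threshold, entropy
```

Because the threshold adapts to the entropy distribution of recent batches, the fraction of tokens routed to attention shifts with how uncertain the model currently is, rather than following a fixed schedule.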