AdaMerge: Salience-Aware Token Merging for Faster ViTs
AdaMerge is a new method that accelerates Vision Transformers (ViTs) by adaptively merging tokens according to their salience, reducing the quadratic cost of self-attention. Existing token merging (ToMe) treats all tokens as equally important. Self-attention, however, is non-uniform, so aggressive compression loses information carried by high-salience tokens. AdaMerge introduces two mechanisms. Salience-weighted similarity uses column-wise feature-affinity centrality as a proxy for token importance and incorporates the resulting salience scores into bipartite matching, so that pivotal tokens contribute more. Adaptive merging intensity uses pre-computed layer-wise statistics to adjust merging rates. The framework is training-free and designed for practical deployment. The paper is available on arXiv under ID 2605.27465.
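The salience-weighted matching described above can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the summary gives no formulas, so the softmax affinity matrix, the alternating A/B token split (borrowed from ToMe-style bipartite matching), the `alpha` penalty, and the function names `salience_scores` and `salience_weighted_merge` are all assumptions made for illustration.

```python
import numpy as np


def salience_scores(x: np.ndarray) -> np.ndarray:
    """Column-wise centrality of a row-softmaxed feature-affinity matrix.

    x has shape (n, d). Returns n scores that sum to 1.
    Assumption: affinity is a scaled dot product followed by a row softmax.
    """
    d = x.shape[1]
    logits = x @ x.T / np.sqrt(d)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    affinity = np.exp(logits)
    affinity /= affinity.sum(axis=1, keepdims=True)  # each row sums to 1
    # Column mean: how much affinity a token receives from all tokens.
    return affinity.mean(axis=0)


def salience_weighted_merge(x: np.ndarray, r: int, alpha: float = 1.0) -> np.ndarray:
    """Merge r tokens from set A into set B. Returns shape (n - r, d).

    Assumes no token is the all-zero vector (needed for cosine similarity).
    """
    n = x.shape[0]
    s = salience_scores(x)
    a_idx, b_idx = np.arange(0, n, 2), np.arange(1, n, 2)  # alternating split
    r = min(r, len(a_idx))

    xn = x / np.linalg.norm(x, axis=1, keepdims=True)
    sim = xn[a_idx] @ xn[b_idx].T  # cosine similarity, shape (|A|, |B|)

    # Assumed weighting: discount edges that leave high-salience A tokens,
    # so pivotal tokens are less likely to be merged away.
    sim = sim - alpha * (s[a_idx] / s.max())[:, None]

    best_b = sim.argmax(axis=1)      # best partner in B for each A token
    order = np.argsort(-sim.max(axis=1))  # most mergeable A tokens first
    merged_a, kept_a = order[:r], np.sort(order[r:])

    # Salience-weighted average: pivotal tokens contribute more to the result.
    num = x[b_idx] * s[b_idx, None]
    den = s[b_idx].copy()
    for i in merged_a:
        j = best_b[i]
        num[j] += x[a_idx[i]] * s[a_idx[i]]
        den[j] += s[a_idx[i]]

    return np.concatenate([x[a_idx[kept_a]], num / den[:, None]], axis=0)
```

Calling `salience_weighted_merge(tokens, r=2)` on an 8-token input returns 6 tokens, with the unmerged A tokens listed first and the B tokens (including merged ones) after them. A real ViT integration would also need to handle the class token and batch dimension, which this sketch omits.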
Key facts
- AdaMerge is a token-merging framework for Vision Transformers.
- It addresses the quadratic cost of self-attention.
- Existing token merging (ToMe) assumes all tokens are equally important.
- Self-attention is non-uniform, so aggressive compression causes information loss in high-salience tokens.
- Salience-weighted similarity uses column-wise feature-affinity centrality.
- Adaptive merging intensity uses pre-computed layer-wise statistics.
- AdaMerge is training-free.
- Paper ID: arXiv:2605.27465.
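To illustrate how pre-computed layer-wise statistics could drive the merging rate, the sketch below splits a fixed merge budget across layers in proportion to a per-layer score. The summary does not say which statistic AdaMerge uses or how it maps to a rate, so the proportional rule, the meaning of `layer_stats` (for example, a redundancy score measured offline), and the function name `allocate_merge_rates` are hypothetical.

```python
def allocate_merge_rates(layer_stats: list[float], total_merges: int) -> list[int]:
    """Split total_merges across layers in proportion to layer_stats.

    layer_stats: non-negative per-layer scores computed offline (assumed).
    Uses largest-remainder rounding so the integer rates sum to total_merges.
    """
    if total_merges < 0 or any(v < 0 for v in layer_stats):
        raise ValueError("layer_stats and total_merges must be non-negative")
    total = sum(layer_stats)
    if total <= 0:
        raise ValueError("layer_stats must contain a positive value")

    ideal = [total_merges * v / total for v in layer_stats]
    rates = [int(q) for q in ideal]  # floor, since every value is >= 0
    leftover = total_merges - sum(rates)

    # Hand the remaining merges to the layers with the largest fractional part.
    by_remainder = sorted(
        range(len(ideal)), key=lambda i: ideal[i] - rates[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        rates[i] += 1
    return rates
```

For example, `allocate_merge_rates([1.0, 3.0], 8)` returns `[2, 6]`, so the layer with the higher score merges three times as many tokens. The sketch does not cap a layer's rate at the number of tokens actually available there, which a practical schedule would need to do.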
Entities
Institutions
- arXiv