ARTFEED — Contemporary Art Intelligence

AdaMerge: Salience-Aware Token Merging for Faster ViTs

ai-technology · 2026-05-28

A new method called AdaMerge accelerates Vision Transformers (ViTs) by merging tokens adaptively according to their salience, reducing the cost of self-attention, which grows quadratically with the number of tokens. Existing token merging (ToMe) treats all tokens as equally important, but attention is distributed non-uniformly across tokens, so aggressive compression loses information carried by high-salience tokens. AdaMerge introduces two mechanisms. The first, salience-weighted similarity, uses column-wise feature-affinity centrality as a proxy for token importance and incorporates the resulting salience scores into bipartite matching so that pivotal tokens contribute more. The second, adaptive merging intensity, uses pre-computed layer-wise statistics to adjust merging rates. The framework is training-free and designed for practical deployment. The paper is available on arXiv under ID 2605.27465.
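To make the first mechanism concrete, here is a minimal Python sketch of salience-aware merging on top of ToMe-style bipartite matching. It is an illustration under stated assumptions, not the paper's implementation: the summary does not give the exact formulas, so the salience definition (column mean of a row-softmaxed cosine-affinity matrix), the subtractive `penalty` term, the salience-weighted average used for merging, and all function names are hypothetical choices.

```python
import math

def affinity(tokens):
    """Pairwise cosine similarity between token feature vectors."""
    norms = [math.sqrt(sum(x * x for x in t)) + 1e-12 for t in tokens]
    return [[sum(a * b for a, b in zip(u, v)) / (nu * nv)
             for v, nv in zip(tokens, norms)]
            for u, nu in zip(tokens, norms)]

def salience_scores(sim):
    """Assumed salience proxy: column mean of the row-softmaxed affinity matrix.

    A token that many other tokens are strongly similar to receives more mass.
    The scores are positive and sum to 1 over all tokens.
    """
    n = len(sim)
    col_mass = [0.0] * n
    for row in sim:
        peak = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(x - peak) for x in row]
        z = sum(exps)
        for j, e in enumerate(exps):
            col_mass[j] += e / z
    return [c / n for c in col_mass]

def merge_tokens(tokens, r, penalty=0.5):
    """Merge up to r tokens from set A (even indices) into set B (odd indices)."""
    src = list(range(0, len(tokens), 2))
    dst = list(range(1, len(tokens), 2))
    if not dst:
        return [list(t) for t in tokens]
    sim = affinity(tokens)
    s = salience_scores(sim)
    s_max = max(s)
    edges = []
    for i in src:
        j = max(dst, key=lambda d: sim[i][d])  # most similar partner in B
        # Hypothetical scoring: salient source tokens are less likely to be merged away.
        edges.append((sim[i][j] - penalty * s[i] / s_max, i, j))
    edges.sort(reverse=True)
    sums = {j: [s[j] * x for x in tokens[j]] for j in dst}
    weights = {j: s[j] for j in dst}
    merged_away = set()
    for _, i, j in edges[:min(r, len(edges))]:
        merged_away.add(i)
        # Salience-weighted accumulation: more salient tokens contribute more.
        sums[j] = [a + s[i] * x for a, x in zip(sums[j], tokens[i])]
        weights[j] += s[i]
    out = []
    for k, t in enumerate(tokens):
        if k in merged_away:
            continue
        out.append([a / weights[k] for a in sums[k]] if k in sums else list(t))
    return out
```

Calling `merge_tokens(tokens, r)` on a list of `N` feature vectors returns `N - r` vectors whenever `r` does not exceed the size of set A. Plain ToMe corresponds roughly to `penalty=0` with unweighted averaging.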

Key facts

  • AdaMerge is a token-merging framework for Vision Transformers.
  • It addresses the quadratic cost of self-attention.
  • Existing token merging (ToMe) treats all tokens as equally important.
  • Attention is non-uniform across tokens, so aggressive compression loses information in high-salience tokens.
  • Salience-weighted similarity uses column-wise feature-affinity centrality.
  • Adaptive merging intensity uses pre-computed layer-wise statistics.
  • AdaMerge is training-free.
  • Paper ID: arXiv:2605.27465.
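
The adaptive merging intensity listed above can be pictured as splitting a fixed merge budget across layers. The Python sketch below is an assumption-laden illustration, not the paper's rule: the summary does not say which layer-wise statistics are used or how they map to merging rates, so the proportional allocation, the `layer_stats` values, and the function name `allocate_merge_rates` are all hypothetical.

```python
def allocate_merge_rates(layer_stats, total_merges):
    """Split a total merge budget across layers in proportion to a per-layer statistic.

    layer_stats: one non-negative number per layer, assumed to be computed
    offline (for example, an average token-redundancy measure).
    total_merges: total number of tokens to merge away over the whole network.
    """
    total = sum(layer_stats)
    if total <= 0 or any(v < 0 for v in layer_stats):
        raise ValueError("layer_stats must be non-negative with a positive sum")
    raw = [total_merges * v / total for v in layer_stats]
    rates = [int(x) for x in raw]  # floor, since every entry is non-negative
    leftover = total_merges - sum(rates)
    # Largest-remainder rounding so the per-layer rates sum exactly to the budget.
    order = sorted(range(len(raw)), key=lambda k: raw[k] - rates[k], reverse=True)
    for k in order[:leftover]:
        rates[k] += 1
    return rates

# Hypothetical statistics for a 4-layer model: earlier layers look more redundant.
rates = allocate_merge_rates([0.9, 0.6, 0.3, 0.2], total_merges=96)
```

Because the statistics are computed ahead of time, a schedule like this adds no work at inference, which fits the training-free, deployment-oriented framing. A real schedule would also need to cap each layer's rate at half of its current token count, since bipartite matching can merge at most one side of the split.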

Entities

Institutions

  • arXiv
