ARTFEED — Contemporary Art Intelligence

Budgeted Attention Allocation Boosts Transformer Efficiency

other · 2026-05-09

Budgeted Attention Allocation is a technique that lets a single trained transformer operate at multiple cost-quality levels. It uses a monotone head-gating mechanism conditioned on a requested attention budget, and dense warm-starting is highlighted as important for training stability. On a synthetic sequence task, the budgeted model reached 99.7% accuracy at an estimated attention cost of 0.303 and 100.0% at a cost of 0.504. On AG News, a custom word-level transformer with hard-gate adaptation attained 82.1% accuracy with a 1.28x speedup at a budget of 0.50. Budgeted structural pruning of a pretrained BERT-Mini reached 87.6% accuracy with a 1.20x speedup at the same budget, surpassing a zero-shot dense post-hoc baseline (86.1%) and closely approaching a per-budget specialist after one recovery epoch (87.9%). The approach was also evaluated on DBpedia14 with BERT-Mini.
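The core idea, per the summary above, is a per-head gate that is a monotone function of the requested budget. The PyTorch sketch below shows one way such a gate could be parameterized; the learned per-head thresholds, the sigmoid form, the temperature, and the mean-gate cost proxy are illustrative assumptions, not necessarily the paper's exact formulation.

import torch
import torch.nn as nn

class BudgetedHeadGate(nn.Module):
    """Monotone per-head gates conditioned on a scalar budget b in [0, 1].

    Illustration only: the learned-threshold sigmoid below is an assumed
    parameterization. The property it preserves is monotonicity: raising
    the budget can only open gates wider, never close them.
    """
    def __init__(self, num_heads, temperature=0.1):
        super().__init__()
        # Learned per-head "opening point": heads with low thresholds switch
        # on at small budgets, heads with high thresholds only at large ones.
        self.thresholds = nn.Parameter(torch.linspace(0.1, 0.9, num_heads))
        self.temperature = temperature

    def forward(self, budget):
        # budget: scalar (float or 0-dim tensor) in [0, 1].
        # Returns (num_heads,) gates in (0, 1), non-decreasing in budget.
        return torch.sigmoid((budget - self.thresholds) / self.temperature)

    def expected_cost(self, budget):
        # Simple proxy for attention cost: mean gate value across heads.
        return self.forward(budget).mean()

# Usage sketch: scale each head's attention output by its gate before the
# output projection; a penalty such as |expected_cost - budget| could be
# added to the training loss to tie realized cost to the requested budget.
gate = BudgetedHeadGate(num_heads=8)
g = gate(torch.tensor(0.5))                # per-head gates at budget 0.5
attn_heads = torch.randn(2, 8, 16, 32)     # (batch, heads, seq, head_dim)
gated = attn_heads * g.view(1, -1, 1, 1)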

Key facts

  • Budgeted Attention Allocation is a monotone head-gating mechanism conditioned on a requested attention budget.
  • Dense warm-starting is important for stability.
  • On a synthetic sequence task, the model reached 99.7% accuracy at 0.303 cost and 100.0% at 0.504 cost.
  • On AG News with a custom word-level transformer, hard-gate adaptation achieved 82.1% accuracy with 1.28x speedup at budget 0.50 (a minimal hard-gating sketch follows this list).
  • On pretrained BERT-Mini AG News, budgeted structural pruning reached 87.6% accuracy with 1.20x speedup at budget 0.50.
  • A zero-shot dense post-hoc structural baseline reached 86.1% accuracy.
  • One recovery epoch raised the per-budget specialist to 87.9% accuracy.
  • The method was also tested on DBpedia14 with BERT-Mini.
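
For the hard-gate adaptation and structural pruning results above, a plausible reading is that the soft gates are converted to a binary keep/drop decision per head so that the kept fraction matches the budget; skipping dropped heads entirely is what would yield real wall-clock speedups such as the reported 1.28x and 1.20x at budget 0.50. The sketch below (the function name hard_gate_mask and the top-k selection rule are hypothetical) illustrates that conversion step under those assumptions, not the paper's exact procedure.

import torch

def hard_gate_mask(gates: torch.Tensor, budget: float) -> torch.Tensor:
    """Turn soft per-head gates into a binary keep/drop mask at a budget.

    Assumed rule: keep the round(budget * num_heads) heads with the
    largest gate values and drop the rest. The paper's rule may differ.
    """
    num_heads = gates.numel()
    k = max(1, round(budget * num_heads))
    keep = torch.zeros(num_heads, dtype=torch.bool)
    keep[gates.topk(k).indices] = True
    return keep

# Example: at budget 0.50, half of the 8 heads are kept. For structural
# pruning of a pretrained model (e.g. BERT-Mini), the dropped heads' weight
# slices could then be removed from the attention projections entirely.
gates = torch.tensor([0.9, 0.1, 0.7, 0.2, 0.8, 0.05, 0.6, 0.3])
mask = hard_gate_mask(gates, budget=0.50)
print(mask)  # tensor([ True, False,  True, False,  True, False,  True, False])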

Entities

Institutions

  • arXiv

Sources