ARTFEED — Contemporary Art Intelligence

FAR Framework Optimizes Transformer Attention for ReRAM Accelerators

other · 2026-05-18

Researchers have introduced FAR (Function-preserving Attention Replacement), a framework that replaces the attention mechanisms in pretrained DeiT vision transformers with sequential modules suited to in-memory computing (IMC) devices. FAR substitutes a multi-head bidirectional LSTM architecture for self-attention via block-wise distillation, which enables linear-time computation and localized weight reuse. The design addresses the latency and bandwidth overhead that activation-to-activation multiplications and non-local memory access incur on ReRAM-based accelerators. Structured pruning then adapts the models to resource-constrained IMC arrays while preserving their function. The framework is evaluated on the DeiT family of vision transformers.
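The scaling argument can be made concrete by counting multiply-accumulate operations (MACs) for the two designs. This is a back-of-the-envelope sketch, not FAR's own cost model: the function names, the hidden size of 96, and the single-head LSTM layout are assumptions chosen for illustration, while the 197 tokens and 192-dimensional embedding correspond to DeiT-Tiny at 224x224 resolution.

```python
def attention_act_act_macs(n_tokens: int, dim: int) -> int:
    """MACs in self-attention where BOTH operands are activations.

    Covers Q @ K^T and softmax(Q K^T) @ V, summed over all heads.
    Each of the two products costs n_tokens * n_tokens * dim MACs,
    independent of how dim is split across heads.
    """
    return 2 * n_tokens * n_tokens * dim


def bilstm_weight_act_macs(n_tokens: int, dim: int, hidden: int) -> int:
    """MACs in a bidirectional LSTM; every product has a fixed weight operand.

    Per time step and direction there are 4 gates, each with an input
    matrix (hidden x dim) and a recurrent matrix (hidden x hidden).
    """
    per_step = 4 * hidden * (dim + hidden)
    return 2 * n_tokens * per_step  # factor 2: forward and backward passes


if __name__ == "__main__":
    dim, hidden = 192, 96  # hidden size is an illustrative assumption
    for n in (197, 394, 788):
        print(n, attention_act_act_macs(n, dim), bilstm_weight_act_macs(n, dim, hidden))
```

Doubling the token count quadruples the attention term but only doubles the LSTM term. At DeiT's short sequence length the LSTM may still perform more MACs in total under these assumptions; the relevance to IMC hardware is that its weights are fixed operands that can stay resident in the memory arrays, whereas both operands of the attention products change with every input.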

Key facts

  • FAR replaces attention in pretrained DeiTs with sequential modules for IMC compatibility
  • Self-attention is replaced by multi-head bidirectional LSTM via block-wise distillation
  • Enables linear-time computation and localized weight reuse
  • Structured pruning allows adaptation to resource-constrained IMC arrays
  • Evaluated on the DeiT family of vision transformers
  • Addresses latency and bandwidth overhead on ReRAM accelerators
  • Published on arXiv with ID 2505.21535
  • Announce type: replace-cross
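The structured-pruning item above can be illustrated with a minimal sketch. Structured pruning removes whole units rather than individual weights, so the remaining matrices stay dense and simply shrink, which is what lets them fit a smaller array. The summary does not say which importance criterion FAR uses; the L2-norm ranking, the two-layer linear block, and the name `prune_hidden_units` are all assumptions for illustration.

```python
import math


def prune_hidden_units(w_in, w_out, keep):
    """Drop whole hidden units from a two-layer linear block.

    w_in:  hidden x d_in  (row j produces hidden unit j)
    w_out: d_out x hidden (column j consumes hidden unit j)
    Returns the shrunken matrices and the indices of the kept units.
    """
    # Score each hidden unit by the L2 norm of its input row (assumed criterion).
    scores = [math.sqrt(sum(v * v for v in row)) for row in w_in]
    # Select the `keep` highest-scoring units, then restore original order.
    top = sorted(range(len(w_in)), key=lambda j: scores[j], reverse=True)[:keep]
    kept = sorted(top)
    # Remove the matching rows of w_in and columns of w_out together,
    # so the two layers stay shape-compatible.
    pruned_in = [w_in[j] for j in kept]
    pruned_out = [[row[j] for j in kept] for row in w_out]
    return pruned_in, pruned_out, kept


# Toy weights: hidden units 1 and 3 have near-zero input rows.
w_in = [[0.9, -0.4], [0.01, 0.02], [0.5, 0.5], [0.0, 0.03]]
w_out = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
pruned_in, pruned_out, kept = prune_hidden_units(w_in, w_out, keep=2)
```

In this toy case the block shrinks from four hidden units to two, halving the rows and columns that would need to be mapped onto an array. For an LSTM, the same idea would have to remove a hidden unit consistently from all four gate matrices and from the recurrent weights.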

Entities

Institutions

  • arXiv
