ARTFEED — Contemporary Art Intelligence

Mixture of Activations: Token-Adaptive FFN Design for LLMs

ai-technology · 2026-05-27

Researchers have introduced Mixture of Activations (MoA), a token-adaptive feedforward network (FFN) design. Most FFN designs apply a single fixed activation function uniformly to every token. MoA instead mixes a dictionary of activation functions through lightweight, input-dependent gates, while the linear projections are shared across all activations. The authors also propose an input-independent counterpart, learnable activations (LA), which forms a linear combination of activation functions whose coefficients are learned but do not depend on the input; LA applies to both ReLU-type and SwiGLU-type FFNs. The study establishes strict finite-width expressive separations: LA strictly contains fixed-activation FFNs, and MoA strictly contains LA. The paper is available on arXiv under ID 2605.26647.
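As a minimal sketch, assuming a ReLU-type FFN of the form y = W2 σ(W1 x) and a hypothetical three-function dictionary (ReLU, SiLU, tanh), the difference between LA and MoA can be written in plain Python: LA uses one coefficient vector for every token, while MoA computes gates per token. The affine gate g(x) = Wg x + bg and all function names here are illustrative assumptions, not the paper's parameterization, which may differ (for example, by normalizing the gates).

```python
import math

# Hypothetical activation dictionary (assumption; the paper's dictionary is not stated here).
ACTIVATIONS = [
    lambda z: max(z, 0.0),                            # ReLU
    lambda z: z * 0.5 * (1.0 + math.tanh(0.5 * z)),   # SiLU, via sigmoid(z) = (1 + tanh(z/2)) / 2
    math.tanh,                                        # tanh
]


def matvec(W, x):
    """Dense matrix-vector product, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def la_ffn(x, W1, W2, coeffs):
    """Learnable activations (LA), ReLU-type FFN: y = W2 @ sum_k a_k * sigma_k(W1 @ x).
    The coefficients a_k are learned but identical for every token."""
    h = matvec(W1, x)
    act = [sum(a * f(z) for a, f in zip(coeffs, ACTIVATIONS)) for z in h]
    return matvec(W2, act)


def moa_ffn(x, W1, W2, Wg, bg):
    """Mixture of Activations (MoA) sketch: y = W2 @ sum_k g_k(x) * sigma_k(W1 @ x).
    Hypothetical affine gate g(x) = Wg @ x + bg, one gate per activation."""
    gates = [s + b for s, b in zip(matvec(Wg, x), bg)]  # token-dependent mixing weights
    h = matvec(W1, x)  # one shared projection feeds every activation
    act = [sum(g * f(z) for g, f in zip(gates, ACTIVATIONS)) for z in h]
    return matvec(W2, act)
```

Under this assumed gate form, setting Wg to zero makes every gate constant, so `moa_ffn` reduces to `la_ffn` with coefficients equal to bg, and a one-hot coefficient vector reduces `la_ffn` to a single fixed activation. This only illustrates the containments; the strictness of the separations is the paper's result and is not demonstrated by the sketch.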

Key facts

  • Mixture of Activations (MoA) is a token-adaptive FFN design
  • MoA mixes a dictionary of activation functions using lightweight input-dependent gates
  • Linear projections are shared across activations in MoA
  • Learnable activations (LA) are an input-independent counterpart to MoA
  • LA forms linear combinations of activation functions for ReLU-type and SwiGLU-type FFNs
  • Strict finite-width expressive separations are established: LA strictly contains fixed-activation FFNs, and MoA strictly contains LA
  • Most FFN designs use a single fixed activation function applied uniformly to all tokens
  • Paper available on arXiv with ID 2605.26647
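For the SwiGLU-type case, one plausible reading, assumed here rather than taken from the paper, is that LA replaces the SiLU on the gate branch of y = W_down (SiLU(W_gate x) ⊙ W_up x) with a learned linear combination of dictionary functions. The dictionary (SiLU, ReLU, tanh) and the names `DICTIONARY`, `swiglu_ffn`, and `la_swiglu_ffn` are hypothetical.

```python
import math


def silu(z):
    """SiLU(z) = z * sigmoid(z), using sigmoid(z) = (1 + tanh(z/2)) / 2 to avoid overflow."""
    return z * 0.5 * (1.0 + math.tanh(0.5 * z))


# Hypothetical dictionary; SiLU comes first so a one-hot [1, 0, 0] recovers standard SwiGLU.
DICTIONARY = [silu, lambda z: max(z, 0.0), math.tanh]


def matvec(W, x):
    """Dense matrix-vector product, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def swiglu_ffn(x, W_gate, W_up, W_down):
    """Standard SwiGLU FFN: W_down @ (SiLU(W_gate @ x) * (W_up @ x)), elementwise product."""
    gate = matvec(W_gate, x)
    up = matvec(W_up, x)
    return matvec(W_down, [silu(g) * u for g, u in zip(gate, up)])


def la_swiglu_ffn(x, W_gate, W_up, W_down, coeffs):
    """Assumed LA variant: the fixed SiLU on the gate branch becomes sum_k coeffs[k] * sigma_k.
    The coefficients are shared by all tokens (input-independent)."""
    gate = matvec(W_gate, x)
    up = matvec(W_up, x)
    mixed = [sum(c * f(g) for c, f in zip(coeffs, DICTIONARY)) for g in gate]
    return matvec(W_down, [m * u for m, u in zip(mixed, up)])
```

With `coeffs = [1.0, 0.0, 0.0]` the LA variant reproduces the standard SwiGLU output, which mirrors, under these assumptions, the statement that LA contains fixed-activation FFNs.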

Entities

Institutions

  • arXiv

Sources