Mixture of Activations: Token-Adaptive FFN Design for LLMs

ai-technology · 2026-05-27

Researchers have introduced Mixture of Activations (MoA), a token-adaptive feedforward network (FFN) design that mixes a dictionary of activation functions through lightweight, input-dependent gates while sharing the linear projections across activations. It targets a limitation of most FFN designs, which apply a single fixed activation function uniformly to all tokens. The paper also studies an input-independent counterpart, learnable activations (LA), which forms linear combinations of activation functions for both ReLU-type and SwiGLU-type FFNs. The authors establish strict finite-width expressive separations: LA strictly contains fixed-activation FFNs, and MoA strictly contains LA. The paper is available on arXiv under ID 2605.26647.
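One way to read the three designs side by side, for the ReLU-type case, is sketched below. The notation is an assumption made for illustration and is not taken from the paper: W_1 and W_2 are the shared up- and down-projections, \sigma_1, ..., \sigma_K is the activation dictionary, \alpha_k are learned input-independent coefficients, and g_k(x) are gate outputs computed from the token x. The paper's exact parameterization (gate normalization, per-layer versus per-neuron coefficients, the SwiGLU-type variant) is not given in this summary.

```latex
\begin{align}
\text{Fixed:}\quad & \mathrm{FFN}(x) = W_2\,\sigma(W_1 x) \\
\text{LA:}\quad & \mathrm{FFN}_{\mathrm{LA}}(x) = W_2 \sum_{k=1}^{K} \alpha_k\,\sigma_k(W_1 x) \\
\text{MoA:}\quad & \mathrm{FFN}_{\mathrm{MoA}}(x) = W_2 \sum_{k=1}^{K} g_k(x)\,\sigma_k(W_1 x)
\end{align}
```

Under this reading, a one-hot \alpha recovers a fixed-activation FFN and a gate that ignores x recovers LA, which gives the containments. That the containments are strict at finite width is the paper's result and does not follow from this sketch alone.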

Key facts

Mixture of Activations (MoA) is a token-adaptive FFN design
MoA mixes a dictionary of activation functions using lightweight input-dependent gates
Linear projections are shared across activations in MoA
Learnable activations (LA) are the input-independent counterpart to MoA
LA forms linear combinations of activation functions for ReLU-type and SwiGLU-type FFNs
Strict finite-width expressive separations are established: LA strictly contains fixed-activation FFNs, and MoA strictly contains LA
Most FFN designs use a single fixed activation function applied uniformly to all tokens
Paper available on arXiv with ID 2605.26647
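
The key facts above can be illustrated with a minimal NumPy sketch of a ReLU-type FFN. Everything concrete here is an assumption for illustration, not the paper's implementation: the dictionary (ReLU, tanh-approximated GELU, SiLU), the softmax gate over a single linear map, and the names `mixed_ffn`, `la_ffn`, `moa_ffn`, `w_gate`, and `b_gate` are all hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def gelu_tanh(z):
    # Tanh approximation of GELU.
    return 0.5 * z * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (z + 0.044715 * z**3)))

def silu(z):
    return z / (1.0 + np.exp(-z))

# Assumed activation dictionary; the paper's actual dictionary is not stated here.
ACTIVATIONS = (relu, gelu_tanh, silu)

def softmax(logits):
    shifted = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

def mixed_ffn(x, w1, w2, weights):
    """x: (tokens, d), w1: (d, m), w2: (m, d), weights: (K,) or (tokens, K)."""
    h = x @ w1                                              # shared up-projection
    acts = np.stack([f(h) for f in ACTIVATIONS], axis=-1)   # (tokens, m, K)
    weights = np.broadcast_to(weights, (x.shape[0], len(ACTIVATIONS)))
    mixed = np.einsum("tmk,tk->tm", acts, weights)          # blend the activations
    return mixed @ w2                                       # shared down-projection

def la_ffn(x, w1, w2, alpha):
    # Learnable activations: the same coefficients for every token.
    return mixed_ffn(x, w1, w2, alpha)

def moa_ffn(x, w1, w2, w_gate, b_gate):
    # Mixture of Activations: coefficients come from a per-token gate.
    gates = softmax(x @ w_gate + b_gate)                    # (tokens, K)
    return mixed_ffn(x, w1, w2, gates)
```

In this sketch the gate adds only `d * K + K` parameters on top of the two shared projections, which is one way to read "lightweight". Setting `w_gate` to zero makes the gate ignore the token, so `moa_ffn` reduces to `la_ffn`, and a one-hot `alpha` reduces `la_ffn` to a fixed-activation FFN.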
