Research Reveals How Mixture-of-Experts Models Route Information Through Control and Content Channels
A recent study introduces a parameter-free decomposition technique for Mixture-of-Experts (MoE) models and applies it to six distinct architectures. The findings show that each layer's hidden state splits into two separate channels: a control signal that influences routing choices and an orthogonal content channel that the router cannot detect. The content channel retains surface-level attributes such as language, token identity, and position, whereas the control signal encodes an abstract function that varies across layers. Because routing decisions carry little bandwidth, this division forces compositional specialization across layers. Although individual experts remain polysemantic, the paths tokens take through experts become monosemantic, grouping tokens by semantic function across languages and surface forms. The same token may take different paths depending on its semantic context. The research was published on arXiv under identifier 2604.17837v1.
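The core idea of an orthogonal control/content split can be illustrated with a small sketch. This is our own assumption-laden toy, not the paper's method: we assume a linear top-k router (logits = W_r @ h) and project the hidden state onto the router's row space (the part the router can "see") and its orthogonal complement (the part it cannot). The matrix `W_r` and dimensions are invented for illustration.

```python
import numpy as np

# Toy illustration (not the paper's decomposition): split a hidden state
# into the subspace a linear router can observe and its orthogonal
# complement, which the router provably cannot detect.
rng = np.random.default_rng(0)
d_model, n_experts = 64, 8

W_r = rng.standard_normal((n_experts, d_model))  # hypothetical router weights
h = rng.standard_normal(d_model)                 # one token's hidden state

# Orthonormal basis for the router's row space via SVD.
_, _, Vt = np.linalg.svd(W_r, full_matrices=False)
P_control = Vt.T @ Vt            # projector onto the router-visible subspace

h_control = P_control @ h        # "control" part: influences routing
h_content = h - h_control        # "content" part: invisible to the router

# The router's logits depend only on the control component...
assert np.allclose(W_r @ h, W_r @ h_control)
# ...and the two channels are orthogonal.
assert abs(float(h_control @ h_content)) < 1e-8
```

Under this linear-router assumption, any information stored purely in `h_content` (language, token identity, position, per the study) passes through the layer without affecting expert selection.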
Key facts
- A parameter-free decomposition method for Mixture-of-Experts models was introduced
- The method splits each layer's hidden state into control and content channels
- Six different MoE architectures were analyzed in the research
- Surface-level features are preserved in the content channel
- The control signal encodes an abstract function that rotates between layers
- Routing decisions operate with low bandwidth
- Individual experts remain polysemantic while expert paths become monosemantic
- The research was published on arXiv with identifier 2604.17837v1
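The "low bandwidth" point above can be made concrete with back-of-envelope arithmetic. The numbers below (64 experts, top-8 routing, a 4096-dimensional float16 hidden state) are our own illustrative assumptions, not figures from the paper: a top-k router choosing k of N experts can convey at most log2(C(N, k)) bits per layer, orders of magnitude less than the hidden state nominally carries.

```python
from math import comb, log2

# Illustrative configuration (assumed, not from the paper).
n_experts, top_k = 64, 8
d_model = 4096

# Maximum information a top-k routing decision can convey per layer.
routing_bits = log2(comb(n_experts, top_k))   # log2 of the number of expert subsets

# Nominal capacity of a float16 hidden state, for scale.
hidden_bits = d_model * 16

print(f"routing decision: ~{routing_bits:.1f} bits per layer")
print(f"hidden state:     {hidden_bits} bits nominal")
```

With these assumed numbers the routing decision carries roughly 32 bits per layer, which suggests why, as the study argues, specialization must be composed across layers rather than expressed within any single routing choice.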
Entities
Institutions
- arXiv