Mixtral MoE Router Behavior Under Benign and Harmful Prompts
A research paper examines the routing behavior of Mixtral 8x7B-Instruct, a sparse mixture-of-experts language model, on benign and harmful prompts. The researchers use activation-based and gradient-based signals and find that activation-based expert usage is broad and long-tailed, whereas gradient-based importance is more concentrated. At the expert level, the benign and harmful prompt groups show modest separation. Across layers, activation-based routing is most selective in layers 8-15, while gradient-based importance is concentrated in the final layers. The full paper is available on arXiv.
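To make the activation-based signal concrete, here is a minimal sketch of how per-expert usage and a per-layer selectivity score could be computed from router logits. It relies on the fact that Mixtral 8x7B routes each token to the top 2 of 8 experts per layer. The function names, the assumption that router logits have already been extracted as a (tokens, experts) array, and the entropy-based selectivity score are illustrative choices, not the paper's stated method.

```python
import numpy as np

NUM_EXPERTS = 8  # Mixtral 8x7B has 8 experts per MoE layer
TOP_K = 2        # each token is routed to its top-2 experts


def expert_usage(router_logits: np.ndarray, top_k: int = TOP_K) -> np.ndarray:
    """Fraction of routing slots assigned to each expert in one layer.

    router_logits: array of shape (num_tokens, num_experts), assumed to be
    already extracted from the model (hypothetical input for this sketch).
    """
    num_experts = router_logits.shape[1]
    # Indices of the top_k largest logits for every token (order within the k is arbitrary).
    top = np.argpartition(router_logits, -top_k, axis=1)[:, -top_k:]
    counts = np.bincount(top.ravel(), minlength=num_experts)
    return counts / counts.sum()


def selectivity(usage: np.ndarray) -> float:
    """1 minus normalized entropy: 0 for uniform usage, larger when usage is concentrated."""
    p = usage[usage > 0]
    entropy = -(p * np.log(p)).sum()
    return float(1.0 - entropy / np.log(len(usage)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(1000, NUM_EXPERTS))  # synthetic logits, not real model data
    usage = expert_usage(logits)
    print("usage per expert:", np.round(usage, 3))
    print("selectivity:", round(selectivity(usage), 4))
```

Running the same computation per layer and comparing the scores is one way a layer-wise pattern, such as higher selectivity in the middle layers, would show up.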
Key facts
- Study of Mixtral 8x7B-Instruct routing behavior
- Uses activation-based and gradient-based signals
- Activation-based expert usage is broad and long-tailed
- Gradient-based importance is concentrated
- Benign and harmful prompt groups show modest separation at expert level
- Activation-based routing most selective at layers 8-15
- Gradient-based importance concentrated in final layers
- Paper available on arXiv
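One way to quantify expert-level separation between two prompt groups is to compare their expert-usage distributions. The sketch below uses the Jensen-Shannon divergence (base 2, so the value lies between 0 and 1). This metric is an assumption chosen for illustration; the summary does not say which measure the paper uses, and the usage vectors shown are made-up numbers, not results from the study.

```python
import numpy as np


def js_divergence(p, q) -> float:
    """Jensen-Shannon divergence (base 2) between two usage distributions.

    Returns 0.0 for identical distributions and 1.0 for distributions
    with no overlapping support.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        # Only terms with a > 0 contribute; b = m is positive wherever a is.
        mask = a > 0
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


if __name__ == "__main__":
    # Hypothetical per-expert usage for one layer (illustrative values only).
    benign = [0.14, 0.12, 0.13, 0.11, 0.13, 0.12, 0.13, 0.12]
    harmful = [0.10, 0.16, 0.12, 0.14, 0.11, 0.13, 0.12, 0.12]
    print("JS divergence:", js_divergence(benign, harmful))
```

A value near 0 would correspond to the kind of modest separation described above, while a value near 1 would mean the two groups rely on almost entirely different experts.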
Entities
Institutions
- arXiv