ARTFEED — Contemporary Art Intelligence

Mixture of Heterogeneous Grouped Experts for Efficient Language Modeling

ai-technology · 2026-04-29

A recent paper on arXiv introduces the Mixture of Heterogeneous Grouped Experts (MoHGE) to address shortcomings of standard Mixture-of-Experts (MoE) models in Large Language Models (LLMs). Conventional MoEs assign every expert the same size, a rigid design that cannot match compute to the varying complexity of individual tokens. Prior heterogeneous expert designs diversify expert sizes, but they suffer from unbalanced GPU utilization and poor parameter efficiency. MoHGE counters both problems with a two-level routing mechanism that composes experts flexibly while remaining resource-aware, and with a Group-Wise Auxiliary Loss that steers tokens toward the most parameter-efficient experts, improving inference efficiency and aiming to bridge the gap between heterogeneous expert designs in theory and industrial deployment. The paper is available on arXiv under ID 2604.23108.
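
The paper's exact architecture isn't detailed in this summary, but the two-level routing idea can be sketched: a group-level router first selects among expert groups of different hidden widths, then an inner router selects an expert within the chosen group, so a token's compute cost tracks the group it is sent to. The PyTorch sketch below is a minimal illustration under assumed details; the class names, group widths, and hard top-1 routing at both levels are our assumptions, not the paper's specification.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Expert(nn.Module):
        """One feed-forward expert; hidden width varies across groups."""
        def __init__(self, d_model, d_hidden):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.GELU(),
                nn.Linear(d_hidden, d_model),
            )

        def forward(self, x):
            return self.net(x)

    class TwoLevelHeterogeneousMoE(nn.Module):
        """Illustrative two-level router (assumed design, not the paper's):
        level 1 picks an expert group, level 2 picks an expert inside it.
        Hidden widths are uniform within a group and differ across groups."""
        def __init__(self, d_model=256, group_widths=(128, 512, 1024),
                     experts_per_group=4):
            super().__init__()
            self.groups = nn.ModuleList(
                nn.ModuleList(Expert(d_model, w)
                              for _ in range(experts_per_group))
                for w in group_widths)
            self.group_router = nn.Linear(d_model, len(group_widths))
            self.inner_routers = nn.ModuleList(
                nn.Linear(d_model, experts_per_group) for _ in group_widths)

        def forward(self, x):                           # x: (tokens, d_model)
            group_probs = F.softmax(self.group_router(x), dim=-1)
            group_idx = group_probs.argmax(dim=-1)      # hard top-1 group
            out = torch.zeros_like(x)
            for g, experts in enumerate(self.groups):
                mask = group_idx == g
                if not mask.any():
                    continue
                xs = x[mask]
                inner_probs = F.softmax(self.inner_routers[g](xs), dim=-1)
                expert_idx = inner_probs.argmax(dim=-1)  # hard top-1 expert
                ys = torch.zeros_like(xs)
                for e, expert in enumerate(experts):
                    emask = expert_idx == e
                    if not emask.any():
                        continue
                    # scale by the router probabilities so both routers
                    # receive gradients through the selected path
                    p = (group_probs[mask][emask, g]
                         * inner_probs[emask, e]).unsqueeze(-1)
                    ys[emask] = p * expert(xs[emask])
                out[mask] = ys
            return out, group_probs, group_idx

Keeping widths uniform inside each group means every expert in a group has the same shape, which is what allows batched expert computation to stay balanced on a GPU even though widths differ across groups.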

Key facts

  • arXiv paper ID: 2604.23108
  • Proposes Mixture of Heterogeneous Grouped Experts (MoHGE)
  • Addresses rigidity of uniform expert sizes in standard MoE
  • Prior heterogeneous expert architectures suffer from unbalanced GPU utilization
  • MoHGE uses a two-level routing mechanism (sketched above)
  • Introduces a Group-Wise Auxiliary Loss for token steering (see the sketch after this list)
  • Aims to bridge theoretical heterogeneity and industrial application
  • Focuses on optimizing inference efficiency
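
The summary gives no formula for the Group-Wise Auxiliary Loss, so the sketch below adapts the standard Switch-Transformer load-balancing loss to the group level and adds a hypothetical per-group cost weighting to reflect the stated goal of steering tokens toward efficient experts; the function name, signature, and cost term are assumptions, not the paper's formulation.

    import torch

    def group_wise_aux_loss(group_probs, group_idx, group_costs,
                            weight=0.01):
        """Switch-style balance loss at the group level (assumed form).
        group_probs: (tokens, n_groups) group-router softmax outputs
        group_idx:   (tokens,) hard group assignment per token
        group_costs: (n_groups,) relative compute cost per group
        """
        n_groups = group_probs.size(-1)
        # f[g]: fraction of tokens dispatched to group g
        f = torch.zeros(n_groups, dtype=group_probs.dtype,
                        device=group_probs.device)
        f.scatter_add_(0, group_idx,
                       torch.ones_like(group_idx, dtype=group_probs.dtype))
        f = f / group_probs.size(0)
        # P[g]: mean routing probability assigned to group g
        P = group_probs.mean(dim=0)
        # cost weighting penalizes mass routed to expensive groups
        costs = group_costs / group_costs.sum()
        return weight * n_groups * torch.sum(costs * f * P)

In training, this term would simply be added to the language-modeling loss, e.g. loss = lm_loss + group_wise_aux_loss(group_probs, group_idx, torch.tensor([1.0, 4.0, 8.0])).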

Entities

Institutions

  • arXiv

Sources

  • arXiv:2604.23108 (https://arxiv.org/abs/2604.23108)