ARTFEED — Contemporary Art Intelligence

Post-Training Method Converts Dense LLMs to Sparse MoE Architectures

ai-technology · 2026-04-25

Researchers have introduced an analytical framework that converts the feed-forward networks (FFNs) of large language models (LLMs) into sparse Mixture-of-Experts (MoE) architectures using only a small calibration dataset. By examining neuron activation patterns, the method partitions neurons into consistently active shared experts and conditionally active routed experts, then constructs a router analytically from representative neuron statistics. The converted model can be deployed immediately or given optional lightweight fine-tuning, avoiding retraining on hundreds of billions of tokens. The technique applies both to dense models and, recursively, to existing MoE models to obtain hierarchical sparsity.

Scaling LLMs improves performance but drives up inference costs, largely because FFNs account for the bulk of the compute. MoE architectures curb these costs through sparse activation, yet converting a dense model into an MoE normally requires substantial retraining. This framework closes that gap by enabling fast conversion from minimal data.
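
The workflow can be pictured with a short Python sketch. This is not the authors' code: the activation threshold, the share of shared neurons, the clustering step, and every function name below are illustrative assumptions, used only to make the partition-and-route idea concrete.

    import torch

    def partition_ffn(act_fn, W_in, calib_hidden, share_ratio=0.25, n_experts=8):
        """W_in: [d_model, d_ff] input projection of one FFN layer;
        calib_hidden: [n_tokens, d_model] hidden states from calibration data."""
        # 1) Activation statistics: how often each FFN neuron fires on calibration tokens.
        act = act_fn(calib_hidden @ W_in)                      # [n_tokens, d_ff]
        fire_freq = (act.abs() > 1e-3).float().mean(dim=0)     # per-neuron activation frequency

        # 2) Consistently active neurons form the always-on shared expert.
        n_shared = int(share_ratio * W_in.shape[1])
        shared_idx = fire_freq.topk(n_shared).indices
        routed_mask = torch.ones(W_in.shape[1], dtype=torch.bool)
        routed_mask[shared_idx] = False
        routed_idx = routed_mask.nonzero(as_tuple=True)[0]

        # 3) Conditionally active neurons are grouped into experts by clustering their
        #    activation profiles over calibration tokens (k-means here is a stand-in).
        profiles = torch.nn.functional.normalize(act[:, routed_idx].T, dim=1)  # [n_routed, n_tokens]
        centroids = profiles[torch.randperm(profiles.shape[0])[:n_experts]].clone()
        for _ in range(10):
            assign = (profiles @ centroids.T).argmax(dim=1)    # nearest centroid by cosine
            for e in range(n_experts):
                if (assign == e).any():
                    centroids[e] = profiles[assign == e].mean(dim=0)

        # 4) Analytic router: represent each expert by the mean input weights of its
        #    neurons, so routing scores come from statistics rather than gradient training.
        router = torch.stack([
            W_in[:, routed_idx[assign == e]].mean(dim=1) if (assign == e).any()
            else torch.zeros(W_in.shape[0])
            for e in range(n_experts)
        ])                                                     # [n_experts, d_model]
        return shared_idx, routed_idx, assign, router

In a full conversion the shared neurons would keep their original weights, each routed group would be copied into its own expert MLP, and the resulting model could then be served directly or lightly fine-tuned, as the article describes.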

Key facts

  • Framework restructures FFNs into sparse MoE architectures post-training
  • Uses only a small calibration dataset
  • Analyzes neuron activation patterns to partition neurons into shared and routed experts
  • Router constructed analytically from representative neuron statistics
  • Enables immediate deployment or optional lightweight fine-tuning
  • Applies to dense models and recursively to existing MoE models
  • Avoids retraining on hundreds of billions of tokens
  • Reduces LLM inference costs through sparse activation (see the forward-pass sketch after this list)
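
For the inference side, here is a minimal sketch of how such a converted layer might run, assuming the shared expert is always applied and the analytic router selects only the top-k routed experts per token; the function names and the top-k choice are assumptions, not details from the paper.

    import torch

    def moe_ffn_forward(x, shared_expert, routed_experts, router, top_k=2):
        """x: [n_tokens, d_model]; shared_expert and each routed_experts[e] map
        [*, d_model] -> [*, d_model]; router: [n_experts, d_model] (analytic weights)."""
        out = shared_expert(x)                             # always-active shared neurons
        scores = torch.softmax(x @ router.T, dim=-1)       # [n_tokens, n_experts]
        top_scores, top_idx = scores.topk(top_k, dim=-1)
        for slot in range(top_k):
            for e, expert in enumerate(routed_experts):
                hit = top_idx[:, slot] == e                # tokens routed to expert e
                if hit.any():
                    out[hit] = out[hit] + top_scores[hit, slot].unsqueeze(-1) * expert(x[hit])
        return out                                         # most routed neurons never run

Because only the shared expert and the top-k routed experts are evaluated for each token, per-token FFN compute scales with the active fraction of neurons rather than the full hidden dimension, which is the cost reduction the article points to.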

Entities

Institutions

  • arXiv

Sources