Shodh-MoE: Sparse Mixture-of-Experts Architecture for Multi-Physics Foundation Models
Shodh-MoE is a sparse-activated latent transformer architecture developed to mitigate negative transfer in multi-physics foundation models. Negative transfer arises when disparate partial differential equation (PDE) regimes, such as broadband open-channel fluid dynamics and boundary-dominated porous media flows, are trained together, producing gradient conflicts, unstable optimization, and loss of plasticity in dense neural operators. Shodh-MoE operates on compressed 16^3 physical latents produced by a physics-informed autoencoder, whose intra-tokenizer Helmholtz-style velocity parameterization constrains decoded states to a divergence-free velocity manifold. This yields mass conservation to near machine precision, with a velocity divergence of about 2.8 x 10^-10, addressing a key obstacle on the path from scientific machine learning (SciML) toward universal foundation models.
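The summary does not detail the exact parameterization used inside the tokenizer, but the general idea behind a Helmholtz-style (vector-potential) construction can be sketched: the decoder outputs a vector potential A and the velocity is taken as v = curl(A), which is divergence-free by construction, so any measured divergence reflects floating-point round-off only. The NumPy sketch below is a minimal illustration under those assumptions; the grid size, spectral derivatives, and the random stand-in for the decoder output are illustrative choices, not details from the paper.

```python
import numpy as np

# Illustrative sketch (not the paper's code): curl-based velocity parameterization
# on a periodic 16^3 grid. The decoder is assumed to output a vector potential A;
# v = curl(A) is divergence-free by construction, so the measured divergence is
# limited only by floating-point round-off.
N = 16
freq = 2 * np.pi * np.fft.fftfreq(N, d=1.0 / N)
freq[N // 2] = 0.0                       # drop the unresolved Nyquist mode for odd derivatives
kx, ky, kz = np.meshgrid(freq, freq, freq, indexing="ij")

rng = np.random.default_rng(0)
A = rng.standard_normal((3, N, N, N))    # stand-in for a decoder's vector-potential output
A_hat = np.fft.fftn(A, axes=(1, 2, 3))

# Spectral curl: v_hat = i k x A_hat
v_hat = np.stack([
    1j * (ky * A_hat[2] - kz * A_hat[1]),
    1j * (kz * A_hat[0] - kx * A_hat[2]),
    1j * (kx * A_hat[1] - ky * A_hat[0]),
])
v = np.fft.ifftn(v_hat, axes=(1, 2, 3)).real   # decoded, divergence-free velocity field

# Spectral divergence check: should sit near machine precision
w_hat = np.fft.fftn(v, axes=(1, 2, 3))
div = np.fft.ifftn(1j * (kx * w_hat[0] + ky * w_hat[1] + kz * w_hat[2])).real
print("max |div v| =", np.abs(div).max())      # ~1e-12, i.e. round-off only
```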
Key facts
- Shodh-MoE is a sparse-activated latent transformer architecture for multi-physics transport (see the routing sketch after this list).
- It addresses negative transfer that arises when co-training disparate PDE regimes.
- Operates on compressed 16^3 physical latents from a physics-informed autoencoder.
- Uses Helmholtz-style velocity parameterization to enforce divergence-free velocity manifolds.
- Achieves mass conservation to near machine precision (velocity divergence ~2.8 x 10^-10).
- Negative transfer causes gradient conflict, unstable optimization, and plasticity loss.
- Broadband open-channel fluid dynamics and porous media flows impose incompatible demands.
- Published on arXiv with ID 2605.15179.
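The routing scheme of Shodh-MoE itself is not specified in this summary. As a reference point for what "sparse-activated" typically means, the sketch below shows a generic top-k sparse mixture-of-experts feed-forward block: a router scores each latent token, only the top-k experts run per token, and their outputs are combined with renormalized gate weights. The expert count, top-k value, and layer sizes are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoEBlock(nn.Module):
    """Generic top-k sparse mixture-of-experts feed-forward block (illustrative only)."""

    def __init__(self, dim: int, num_experts: int = 8, top_k: int = 2, expansion: int = 4):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, num_experts)        # scores each latent token per expert
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(dim, expansion * dim),
                nn.GELU(),
                nn.Linear(expansion * dim, dim),
            )
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, dim), e.g. 16^3 latent tokens flattened into a sequence
        scores = self.router(x)                          # (num_tokens, num_experts)
        top_scores, top_idx = scores.topk(self.top_k, dim=-1)
        gates = F.softmax(top_scores, dim=-1)            # renormalize over the selected experts only
        out = torch.zeros_like(x)
        for slot in range(self.top_k):                   # sparse activation: only top-k experts run per token
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, slot] == e
                if mask.any():
                    out[mask] += gates[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Example: route 16^3 latent tokens of width 64 through the block
tokens = torch.randn(16 ** 3, 64)
block = SparseMoEBlock(dim=64)
print(block(tokens).shape)                               # torch.Size([4096, 64])
```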
Entities
Institutions
- arXiv