Suiren-1.0: Molecular Foundation Models for Organic Systems
Suiren-1.0 is a family of molecular foundation models designed for accurate modeling of diverse organic systems. The family comprises three variants, Suiren-Base, Suiren-Dimer, and Suiren-ConfAvg, integrated into an algorithmic framework that bridges 3D conformational geometry and 2D statistical ensemble spaces. Suiren-Base has 1.8 billion parameters and was pre-trained on a Density Functional Theory (DFT) dataset of 70 million samples, using spatial self-supervision and SE(3)-equivariant architectures to support reliable quantum property prediction. Suiren-Dimer extends Suiren-Base through continued pre-training on 13.5 million intermolecular interaction samples. The Conformation Compression Distillation (CCD) framework uses a diffusion-based method to distill complex 3D structural representations into 2D conformation-averaged representations for efficient downstream use.
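To make the symmetry requirement behind "SE(3)-equivariant architectures" concrete, the following is a minimal sketch in plain Python, not Suiren code. It checks two toy features under a rigid motion: interatomic distances, which should not change (invariance), and centroid-relative displacement vectors, which should rotate along with the molecule (equivariance). The coordinates, the helper names (`rotate_z`, `translate`, `pairwise_distances`, `centered`), and the restriction to rotations about the z-axis are all assumptions made for illustration.

```python
import math

def rotate_z(point, angle):
    """Rotate a 3D point about the z-axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)

def translate(point, shift):
    """Shift a 3D point by the vector `shift`."""
    return tuple(p + t for p, t in zip(point, shift))

def pairwise_distances(coords):
    """Invariant feature: sorted list of all interatomic distances."""
    n = len(coords)
    return sorted(
        math.dist(coords[i], coords[j])
        for i in range(n)
        for j in range(i + 1, n)
    )

def centered(coords):
    """Equivariant feature: displacement of each atom from the centroid."""
    n = len(coords)
    centroid = tuple(sum(c[k] for c in coords) / n for k in range(3))
    return [tuple(c[k] - centroid[k] for k in range(3)) for c in coords]

# Hypothetical three-atom geometry (arbitrary length units).
coords = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
angle, shift = 0.7, (1.5, -2.0, 0.3)

# Apply one rigid motion (rotation followed by translation) to every atom.
moved = [translate(rotate_z(p, angle), shift) for p in coords]

# Invariance: distances are unchanged by the rigid motion.
invariant_ok = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(pairwise_distances(coords), pairwise_distances(moved))
)

# Equivariance: the centered vectors of the moved molecule equal the rotated
# centered vectors of the original (the translation cancels out).
expected = [rotate_z(v, angle) for v in centered(coords)]
equivariant_ok = all(
    math.isclose(a, b, abs_tol=1e-9)
    for u, v in zip(centered(moved), expected)
    for a, b in zip(u, v)
)

print(invariant_ok, equivariant_ok)
```

A model built from operations with these properties gives scalar predictions such as energies that do not depend on how the molecule is positioned or oriented, while vector predictions such as forces rotate consistently with the input.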
Key facts
- Suiren-1.0 is a family of molecular foundation models for organic systems.
- The family includes three variants: Suiren-Base, Suiren-Dimer, and Suiren-ConfAvg.
- Suiren-Base has 1.8 billion parameters.
- Suiren-Base was pre-trained on a 70-million-sample Density Functional Theory dataset.
- Suiren-Base uses spatial self-supervision and SE(3)-equivariant architectures.
- Suiren-Dimer is further pre-trained on 13.5 million intermolecular interaction samples.
- The Conformation Compression Distillation (CCD) framework uses a diffusion-based method to distill 3D structural representations into 2D conformation-averaged representations.
- Published on arXiv with ID 2603.21942.
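As a rough illustration of what a "conformation-averaged" quantity can mean, the sketch below averages per-conformer feature vectors using Boltzmann weights. This is not the CCD framework, which is described as diffusion-based. The Boltzmann weighting, the temperature, the function names (`boltzmann_weights`, `conformer_average`), and the toy energies and embeddings are all assumptions chosen only to show one common form of ensemble averaging.

```python
import math

# Approximate kT at 298 K in kcal/mol (assumed temperature and units).
KT_KCAL_PER_MOL = 0.593

def boltzmann_weights(energies, kt=KT_KCAL_PER_MOL):
    """Return normalized Boltzmann weights for a list of conformer energies."""
    e_min = min(energies)  # subtract the minimum for numerical stability
    factors = [math.exp(-(e - e_min) / kt) for e in energies]
    total = sum(factors)
    return [f / total for f in factors]

def conformer_average(embeddings, weights):
    """Weighted average of per-conformer feature vectors of equal length."""
    dim = len(embeddings[0])
    return [
        sum(w * emb[k] for w, emb in zip(weights, embeddings))
        for k in range(dim)
    ]

# Hypothetical data: three conformers with relative energies in kcal/mol
# and a 2-dimensional feature vector each.
energies = [0.0, 0.5, 2.0]
embeddings = [[1.0, 0.0], [0.0, 1.0], [4.0, 4.0]]

weights = boltzmann_weights(energies)
averaged = conformer_average(embeddings, weights)
print(weights, averaged)
```

Under this assumption, low-energy conformers dominate the average, so the resulting vector describes the molecule as an ensemble rather than as any single 3D geometry.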
Entities
Institutions
- arXiv