MIMIC: Generative Multimodal AI Model for Biomolecules
A team of researchers has introduced MIMIC, a generative multimodal foundation model for biomolecules, trained on a newly assembled dataset called LORE. The model integrates nucleic acid, protein, evolutionary, structural, regulatory, and semantic/contextual modalities within partially observed biomolecular states. Using a split-track encoder-decoder architecture, MIMIC conditions on arbitrary subsets of observed modalities to reconstruct or generate the missing components across the genome, transcriptome, and proteome. Multimodal conditioning significantly improves sequence reconstruction compared with sequence-only inputs, and the learned representations achieve state-of-the-art results on downstream RNA and protein tasks. The research was published on arXiv (2604.24506).
Key facts
- MIMIC is a generative multimodal foundation model for biomolecules.
- It is trained on the LORE dataset, which aligns multiple modalities.
- Modalities include nucleic acid, protein, evolutionary, structural, regulatory, and semantic/contextual data.
- The architecture is a split-track encoder-decoder.
- It conditions on arbitrary subsets of observed modalities.
- Multimodal conditioning improves sequence reconstruction over sequence-only inputs.
- Learned representations achieve state-of-the-art results on downstream RNA and protein tasks.
- Published on arXiv with ID 2604.24506.
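The idea of conditioning on an arbitrary subset of observed modalities can be illustrated with a small sketch. This is not MIMIC's code and says nothing about its actual implementation: the modality keys mirror the list above, while `split_observed`, `sample_observed_subset`, and the dictionary-based state are hypothetical names introduced only for illustration. The sketch shows how a partially observed state might be partitioned into conditioning inputs and targets to reconstruct or generate.

```python
import random

# Modality names follow the article; the data layout is an assumption.
MODALITIES = [
    "nucleic_acid", "protein", "evolutionary",
    "structural", "regulatory", "semantic",
]


def split_observed(state, observed):
    """Partition a partially observed state into conditioning inputs and targets.

    `state` maps modality name -> value, or None when the modality is absent.
    `observed` is the set of modality names the model is allowed to see.
    Hypothetical helper, not part of any published API.
    """
    unknown = set(observed) - set(MODALITIES)
    if unknown:
        raise ValueError(f"unknown modalities: {sorted(unknown)}")
    # Only modalities that are both requested and actually present are inputs.
    inputs = {
        m: state[m]
        for m in MODALITIES
        if m in observed and state.get(m) is not None
    }
    # Everything else is a target: held out for reconstruction, or missing
    # from the state and therefore to be generated.
    targets = [m for m in MODALITIES if m not in inputs]
    return inputs, targets


def sample_observed_subset(state, rng):
    """Pick a random non-empty subset of the modalities present in `state`."""
    available = [m for m in MODALITIES if state.get(m) is not None]
    if not available:
        raise ValueError("state has no observed modalities")
    k = rng.randint(1, len(available))
    return set(rng.sample(available, k))


# Example: a state where only sequence-level modalities are known.
example_state = {
    "nucleic_acid": "ATGGCC",
    "protein": "MA",
    "evolutionary": None,
    "structural": None,
    "regulatory": None,
    "semantic": None,
}
example_inputs, example_targets = split_observed(example_state, {"nucleic_acid"})
example_subset = sample_observed_subset(example_state, random.Random(0))
```

Here `example_inputs` holds only the nucleic acid sequence, and `example_targets` lists the five remaining modalities. A sequence-only baseline corresponds to always passing a single sequence modality as `observed`, whereas multimodal conditioning passes a larger subset.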
Entities
Institutions
- arXiv