OmicsLM: Multimodal LLM for Multi-Sample Omics Reasoning
OmicsLM is a multimodal large language model that connects quantitative omics data with biological tasks expressed in natural language. It encodes each transcriptomic profile as a compact continuous representation placed directly in the LLM context, which preserves the quantitative expression signal while letting a single prompt combine natural-language instructions, explicit gene mentions, and multiple interleaved biological samples. The model was trained on more than 5.5 million instruction-following examples spanning over 70 task types; these examples combine continuous transcriptomic inputs with experimental information rendered through diverse language templates.
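To make the idea of a "compact continuous representation within the LLM context" concrete, here is a minimal sketch, not the actual OmicsLM encoder. It assumes a log1p transform followed by a fixed linear projection that turns one expression profile into a small number of embedding vectors, which are then interleaved with text token embeddings. All names (`encode_profile`, `build_context`, `toy_text_embedder`), the dimensions, and the random weights are hypothetical stand-ins for learned components.

```python
import math
import random


def encode_profile(expression, weights):
    """Project one expression profile into a few continuous vectors.

    expression: non-negative expression values, one per gene.
    weights: one matrix per output vector, each of shape (embed_dim, n_genes).
             Hypothetical stand-in for a learned encoder.
    """
    # log1p compresses the dynamic range while keeping the ordering of values.
    x = [math.log1p(v) for v in expression]
    return [
        [sum(w * xi for w, xi in zip(row, x)) for row in matrix]
        for matrix in weights
    ]


def build_context(segments, text_embedder, weights):
    """Interleave text and samples into one sequence of embedding vectors."""
    sequence = []
    for kind, payload in segments:
        if kind == "text":
            sequence.extend(text_embedder(tok) for tok in payload.split())
        elif kind == "sample":
            sequence.extend(encode_profile(payload, weights))
        else:
            raise ValueError(f"unknown segment kind: {kind}")
    return sequence


# Toy dimensions, chosen only for illustration.
N_GENES, EMBED_DIM, VECTORS_PER_SAMPLE = 6, 4, 2
rng = random.Random(0)
weights = [
    [[rng.gauss(0, 0.1) for _ in range(N_GENES)] for _ in range(EMBED_DIM)]
    for _ in range(VECTORS_PER_SAMPLE)
]


def toy_text_embedder(token):
    """Deterministic pseudo-embedding per token (assumption, not a real tokenizer)."""
    token_rng = random.Random(token)
    return [token_rng.uniform(-1, 1) for _ in range(EMBED_DIM)]


segments = [
    ("text", "Compare TP53 expression between"),
    ("sample", [0.0, 12.5, 3.0, 250.0, 1.0, 0.0]),
    ("text", "and"),
    ("sample", [5.0, 0.0, 40.0, 2.0, 0.0, 90.0]),
]
context = build_context(segments, toy_text_embedder, weights)
print(len(context), "vectors of dimension", len(context[0]))
```

In this sketch each sample costs only `VECTORS_PER_SAMPLE` positions in the sequence regardless of the number of genes, which is what allows several samples and a text instruction to share one context.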
Key facts
- OmicsLM is a multimodal LLM for multi-sample omics reasoning.
- It connects quantitative omics profiles with natural-language biological tasks.
- Each transcriptomic profile is represented as a compact continuous representation within the LLM context.
- The interface preserves quantitative expression signal.
- It allows natural-language instructions, explicit gene mentions, and multiple interleaved biological samples.
- Trained on over 5.5 million instruction-following examples.
- Spans more than 70 task types.
- Combines continuous transcriptomic inputs and experimental data rendered through diverse language templates.
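
As an illustration of how experimental information might be rendered through diverse language templates alongside sample inputs, the following sketch builds a multi-sample prompt. It is an assumption-laden toy, not the OmicsLM training pipeline: the templates, the `<sample>` slot marker, the metadata fields, and the function names are all hypothetical, and the marker only indicates where a sample's continuous representation would be spliced in.

```python
import random

SAMPLE_SLOT = "<sample>"  # hypothetical marker for a sample's continuous vectors

# Hypothetical templates; each phrases the same metadata differently.
METADATA_TEMPLATES = [
    "Tissue: {tissue}. Treatment: {treatment}.",
    "This sample comes from {tissue} and was treated with {treatment}.",
    "{treatment}-treated {tissue} sample.",
]


def render_sample(metadata, rng):
    """Render one sample's metadata with a randomly chosen template."""
    template = rng.choice(METADATA_TEMPLATES)
    return f"{template.format(**metadata)} {SAMPLE_SLOT}"


def build_prompt(instruction, samples_metadata, seed=0):
    """Place each rendered sample on its own line, followed by the instruction."""
    rng = random.Random(seed)
    parts = [render_sample(m, rng) for m in samples_metadata]
    return "\n".join(parts + [instruction])


# Toy metadata values, invented purely for the example.
instruction = "Which sample shows higher expression of TP53?"
prompt = build_prompt(
    instruction,
    [
        {"tissue": "liver", "treatment": "vehicle"},
        {"tissue": "liver", "treatment": "compound"},
    ],
)
print(prompt)
```

Varying the template per sample means the same experimental facts appear in many surface forms, so a model trained this way is less likely to depend on one fixed phrasing.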