JetBrains Releases Mellum2, a 12B MoE Model for Low-Latency Text and Code

ai-technology · 2026-06-01

JetBrains has released Mellum2, a 12-billion-parameter Mixture-of-Experts (MoE) model built from the ground up for natural language and code. Only 2.5 billion parameters are active per token, which keeps inference efficient and low-latency. The model is released under the Apache 2.0 license and is available on Hugging Face. Mellum2 targets latency-sensitive workloads such as routing, retrieval-augmented generation (RAG), summarization, sub-agents, and private deployments, with inference reported to be more than 2x faster than similar-sized open models. It handles text and code only; leaving out multimodal capabilities keeps it efficient for software engineering work. A technical report covering its architecture, training, benchmarks, and evaluation is available on arXiv.
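
To put the two parameter counts in perspective, the short sketch below computes the share of the model that is active for any single token. Only the 12B and 2.5B figures come from the article; the function and variable names are illustrative, and the ratio is a rough indicator of per-token compute, not a measurement of latency.

```python
# Parameter counts as stated in the article, in billions.
TOTAL_PARAMS_B = 12.0   # total parameters
ACTIVE_PARAMS_B = 2.5   # parameters activated per token


def active_fraction(active_b: float, total_b: float) -> float:
    """Return the share of parameters used for a single token."""
    if total_b <= 0:
        raise ValueError("total_b must be positive")
    return active_b / total_b


fraction = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
print(f"Active per token: {fraction:.1%}")  # prints "Active per token: 20.8%"
```

Roughly one fifth of the weights take part in each forward pass, which is the general reason an MoE model can run faster than a dense model of the same total size. The actual speedup depends on hardware, batching, and routing overhead, none of which this arithmetic captures.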

Key facts

Mellum2 is a 12B-parameter Mixture-of-Experts model.
It activates only 2.5B parameters per token.
Released under the Apache 2.0 license.
Available on Hugging Face.
Achieves more than 2x faster inference than similar-sized open models.
Focused on text and code, not multimodal.
Designed for latency-sensitive workloads in multi-model AI systems.
Technical report available on arXiv.
