ARTFEED — Contemporary Art Intelligence

JetBrains Releases Mellum2, a 12B MoE Model for Low-Latency Text and Code

ai-technology · 2026-06-01

JetBrains has released Mellum2, a 12-billion-parameter Mixture-of-Experts (MoE) model built from the ground up for natural language and code. The model activates only 2.5 billion parameters per token, which keeps inference efficient and low-latency. It is released under the Apache 2.0 license and is available on Hugging Face. Mellum2 targets latency-sensitive workloads such as routing, retrieval-augmented generation (RAG), summarization, sub-agents, and private deployments, with reported inference more than 2x faster than similar-sized open models. It handles only text and code, leaving out multimodal capabilities to stay efficient for software engineering work. A technical report covering its architecture, training, benchmarks, and evaluation is available on arXiv.
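To illustrate how an MoE model can hold many parameters yet use only a fraction of them per token, here is a minimal sketch of top-k expert routing in plain Python. It is a toy, not Mellum2's architecture: the expert count, the top-k value, the scalar "experts", and the function names route_top_k and moe_forward are all assumptions made for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    # Renormalize the router scores over the selected experts only.
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


def moe_forward(x, experts, router_logits, k):
    """Run only the selected experts and mix their outputs by router weight."""
    selected = route_top_k(router_logits, k)
    output = sum(w * experts[i](x) for i, w in selected)
    return output, [i for i, _ in selected]


# Hypothetical toy setup: 8 scalar "experts", 2 of them active per token.
NUM_EXPERTS = 8
TOP_K = 2
experts = [lambda x, scale=i + 1: scale * x for i in range(NUM_EXPERTS)]

rng = random.Random(0)
logits = [rng.uniform(-1.0, 1.0) for _ in range(NUM_EXPERTS)]
y, used = moe_forward(1.0, experts, logits, TOP_K)
print(f"experts used: {used} of {NUM_EXPERTS}, output: {y:.3f}")
```

In this sketch the other six experts are never called for the token, which is the source of the compute savings: total capacity grows with the number of experts, while per-token work grows only with k.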

Key facts

  • Mellum2 is a 12B-parameter Mixture-of-Experts model.
  • It activates only 2.5B parameters per token.
  • Released under Apache 2.0 license.
  • Available on Hugging Face.
  • Achieves more than 2x faster inference than similar-sized open models.
  • Focused on text and code, not multimodal.
  • Designed for latency-sensitive workloads in multi-model AI systems.
  • Technical report available on arXiv.
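
As a rough back-of-the-envelope check on the parameter figures above, the sketch below computes the share of weights active per token and compares approximate forward-pass compute against a hypothetical dense 12B model. The "about 2 FLOPs per active parameter per token" figure is a common rule of thumb used here as an assumption, not a number from the Mellum2 report, and the helper names are illustrative.

```python
TOTAL_PARAMS = 12e9    # total parameters, from the article
ACTIVE_PARAMS = 2.5e9  # parameters activated per token, from the article


def active_fraction(active, total):
    """Share of the model's weights that participate in one token's forward pass."""
    return active / total


def approx_forward_flops_per_token(params):
    """Rule-of-thumb estimate (assumption): ~2 FLOPs per parameter per token."""
    return 2 * params


frac = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
moe_flops = approx_forward_flops_per_token(ACTIVE_PARAMS)
dense_flops = approx_forward_flops_per_token(TOTAL_PARAMS)

print(f"active fraction: {frac:.1%}")
print(f"approx. FLOPs/token (MoE, active only): {moe_flops:.2e}")
print(f"approx. FLOPs/token (dense 12B): {dense_flops:.2e}")
print(f"compute ratio dense/MoE: {dense_flops / moe_flops:.1f}x")
```

The compute ratio from this estimate is not the same thing as the reported inference speedup: real latency also depends on memory bandwidth, routing overhead, batching, and hardware, so the two numbers should not be expected to match.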

Entities

Institutions

  • JetBrains
  • Hugging Face

Sources