ARTFEED — Contemporary Art Intelligence

Orthrus: Dual-View Diffusion for Parallel LLM Generation

ai-technology · 2026-05-14

Researchers have introduced Orthrus, a framework that merges the output fidelity of autoregressive Large Language Models (LLMs) with the rapid parallel token generation of diffusion models. Standard autoregressive decoding is a bottleneck for high-throughput inference, while diffusion language models suffer from performance degradation and high training costs. Orthrus augments a frozen LLM with a lightweight trainable module, adding a parallel diffusion view alongside the conventional autoregressive one. Both views attend to the same high-fidelity Key-Value (KV) cache, preserving generation fidelity during parallel decoding. The framework is designed to integrate seamlessly into existing Transformers. The paper is on arXiv under reference 2605.12825.
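The dual-view idea can be illustrated with a toy sketch. The code below is a minimal, hypothetical mock-up (not the authors' implementation): a frozen backbone builds one KV cache, the autoregressive view reads it one query at a time, and a small trainable "diffusion" module refines a block of positions in parallel against the same cache. All names, sizes, and the denoising step are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8   # toy hidden size (assumption, for illustration only)

# Frozen backbone: fixed projections producing the shared KV cache.
# These weights are never updated, mirroring the "frozen LLM" in the article.
W_k = rng.normal(size=(D, D))
W_v = rng.normal(size=(D, D))

def build_kv_cache(hidden_states):
    """One high-fidelity KV cache shared by both views."""
    return hidden_states @ W_k, hidden_states @ W_v

def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scores = query @ keys.T / np.sqrt(D)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ values

def ar_step(query, cache):
    """Autoregressive view: one token per step, reading the shared cache."""
    keys, values = cache
    return attend(query, keys, values)

# Hypothetical lightweight diffusion module: the ONLY trainable weights.
W_denoise = rng.normal(size=(D, D)) * 0.1

def diffusion_step(noisy_block, cache):
    """Diffusion view: refine a block of future positions in parallel,
    attending to the SAME cache as the autoregressive view."""
    keys, values = cache
    ctx = np.stack([attend(q, keys, values) for q in noisy_block])
    return noisy_block + ctx @ W_denoise   # one parallel refinement step

prefix = rng.normal(size=(5, D))           # hidden states of a 5-token prefix
cache = build_kv_cache(prefix)

ar_out = ar_step(rng.normal(size=D), cache)               # serial: 1 position
par_out = diffusion_step(rng.normal(size=(4, D)), cache)  # parallel: 4 positions
print(ar_out.shape, par_out.shape)
```

The point of the sketch is the asymmetry: the serial view advances one position per call, while the diffusion view updates four positions in a single call without duplicating or re-encoding the cache.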

Key facts

  • Orthrus is a dual-view framework for parallel token generation.
  • It unifies autoregressive LLMs and diffusion models.
  • Standard autoregressive decoding is a bottleneck for high-throughput inference.
  • Diffusion language models suffer from performance degradation and high training costs.
  • Orthrus augments a frozen LLM with a lightweight trainable module.
  • Both views attend to the same high-fidelity KV cache.
  • The framework integrates into existing Transformers.
  • The paper is on arXiv with reference 2605.12825.

Entities

Institutions

  • arXiv

Sources