ARTFEED — Contemporary Art Intelligence

Reasoning Compression via Mixed-Policy Distillation

ai-technology · 2026-05-12

A new method called Mixed-Policy Distillation (MPD) transfers the concise reasoning style of large language models to smaller ones, reducing token usage and inference cost. Larger reasoning models tend to produce compact intermediate trajectories, while smaller models generate longer, more redundant ones. Rather than imposing explicit length constraints, MPD distills from teacher-compressed versions of the student's own trajectories, so the student learns concise reasoning behavior directly. This targets real-world deployment constraints such as memory, latency, and serving cost.

Key facts

  • Reasoning-centric LLMs generate intermediate reasoning trajectories.
  • Larger models produce more concise traces; smaller models generate longer, more redundant trajectories.
  • MPD transfers reasoning compression from large to small models.
  • MPD uses teacher-compressed student trajectories for distillation.
  • The method avoids explicit length constraints.
  • It targets memory, latency, and serving-cost constraints.
  • The paper is on arXiv with ID 2605.08776.
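The "teacher-compressed student trajectories" idea can be illustrated with a toy sketch. Everything below — the function names, the stand-in generators, and the deduplication rule used as a proxy for compression — is an assumption for illustration only, not the paper's actual pipeline:

```python
# Illustrative sketch of mixed-policy distillation data construction.
# The models and the "compression" rule are toy stand-ins (assumptions),
# not the method from arXiv:2605.08776.

def student_generate(prompt):
    # Stand-in for sampling a verbose reasoning trajectory from the
    # small student model: it repeats an intermediate step redundantly.
    return [f"step-{i}" for i in range(6)] + ["step-5", "answer"]

def teacher_compress(trajectory):
    # Stand-in for the large teacher rewriting the student's own
    # trajectory into a shorter one while preserving the final answer.
    seen, compressed = set(), []
    for step in trajectory:
        if step not in seen:
            seen.add(step)
            compressed.append(step)
    return compressed

def build_distillation_pair(prompt):
    # Mixed-policy: the input trajectory comes from the student's own
    # policy, while the training target comes from the teacher-compressed
    # rewrite — so no explicit length penalty is needed during training.
    student_traj = student_generate(prompt)
    target_traj = teacher_compress(student_traj)
    return student_traj, target_traj

verbose, concise = build_distillation_pair("2+2?")
print(len(verbose), len(concise))  # → 8 7
```

The key point the sketch captures is that supervision stays on-policy with respect to the student's inputs (its own sampled trajectories) while the teacher supplies the compressed targets.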

Entities

Institutions

  • arXiv

Sources