ARTFEED — Contemporary Art Intelligence

D-RPC: Compressing Reasoning Paths for LLM Distillation

ai-technology · 2026-05-11

Distillation through Reasoning Path Compression (D-RPC) is a technique for improving the transfer of reasoning skills from large language models (LLMs) to smaller ones. Teacher rationales for similar problems often vary in structure and strategy, producing inconsistent supervision for the student. D-RPC constrains the teacher to a compact, dynamically maintained bank of reusable high-level reasoning paths, retrieving the most relevant path for each training question. This keeps rationales consistent across similar problems while preserving diversity across problem types. A PAC-Bayes analysis formalizes the trade-off between bank size and coverage: smaller banks reduce supervision entropy but risk coverage gaps, and the generalization bound identifies an optimal intermediate size.
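The core loop described above — a small, dynamically maintained bank of reasoning paths, with the most relevant path retrieved per question — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class and method names, the bag-of-words cosine similarity, and the admission threshold are all assumptions standing in for whatever relevance measure and maintenance policy D-RPC actually uses.

```python
# Illustrative sketch of a D-RPC-style reasoning-path bank.
# Assumptions (not from the source): bag-of-words cosine as the relevance
# measure, a fixed max bank size, and a similarity threshold for admission.
from collections import Counter
from math import sqrt

def bow(text):
    """Bag-of-words vector over lowercase whitespace tokens."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two Counter vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class ReasoningPathBank:
    """Compact, dynamically maintained bank of high-level reasoning paths."""

    def __init__(self, max_size=8, admit_threshold=0.3):
        self.paths = []  # list of (description, path_template) pairs
        self.max_size = max_size
        self.admit_threshold = admit_threshold

    def retrieve(self, question):
        """Return the stored path most relevant to a training question."""
        if not self.paths:
            return None
        q = bow(question)
        return max(self.paths, key=lambda p: cosine(q, bow(p[0])))

    def maybe_add(self, description, template):
        """Admit a new path only if no existing path already covers it well,
        keeping the bank compact (low supervision entropy) while limiting
        coverage gaps -- the trade-off the PAC-Bayes analysis formalizes."""
        q = bow(description)
        best = max((cosine(q, bow(d)) for d, _ in self.paths), default=0.0)
        if best < self.admit_threshold and len(self.paths) < self.max_size:
            self.paths.append((description, template))

bank = ReasoningPathBank()
bank.maybe_add("solve linear equation for x",
               "1) isolate x terms 2) divide by the coefficient")
bank.maybe_add("compute area of a triangle",
               "1) identify base and height 2) apply A = b*h/2")

path = bank.retrieve("find x in the linear equation 3x + 2 = 11")
```

Under this sketch, similar questions hit the same stored path, so the teacher would emit structurally identical rationales for them, while distinct problem types still map to distinct paths.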

Key facts

  • D-RPC stands for Distillation through Reasoning Path Compression
  • Teacher rationales for similar problems often vary in structure and strategy
  • D-RPC constrains the teacher to follow a compact bank of reusable reasoning paths
  • The bank is dynamically maintained
  • For each training question, D-RPC retrieves the most relevant path
  • Rationales are consistent across similar problems but diverse across problem types
  • A PAC-Bayes analysis formalizes the trade-off between bank size and coverage
  • Smaller banks reduce supervision entropy but risk coverage gaps
