CoRD: Collaborative Multi-Teacher Decoding for Long-CoT Reasoning
CoRD (Collaborative Multi-Teacher Decoding) is a framework for distilling long chain-of-thought (Long-CoT) reasoning from large reasoning models (LRMs). Existing curation-based methods select complete reasoning paths after generation, which ignores the complementary strengths of diverse teachers and provides no dynamic exploration, so sampling is often redundant. CoRD instead synthesizes reasoning step by step: predictive perplexity-based scoring and beam search let multiple heterogeneous LRMs jointly build a coherent reasoning path while keeping several hypotheses alive. Experiments indicate that CoRD produces higher-quality reasoning data, bringing student performance close to that of the teachers with fewer supervision signals and minimal efficiency cost. The framework also generalizes well to out-of-distribution tasks. The paper is available on arXiv under ID 2605.02290.
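To make the step-wise idea concrete, here is a minimal sketch of beam search over reasoning steps proposed by several teachers and ranked by perplexity. It is an illustration under stated assumptions, not the paper's implementation: the `Proposer` and `Scorer` interfaces, the toy teachers, the choice to average perplexity across scorers, and the cumulative log-perplexity beam cost are all hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical teacher interfaces (assumptions, not from the paper):
# a Proposer suggests candidate next steps for a partial path, and a
# Scorer returns per-token log-probabilities of a step given the path.
Proposer = Callable[[Sequence[str]], List[str]]
Scorer = Callable[[Sequence[str], str], List[float]]
Beam = Tuple[float, List[str]]  # (cumulative log-perplexity, steps so far)


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity = exp of the negative mean token log-probability."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def collaborative_beam_search(
    proposers: Sequence[Proposer],
    scorers: Sequence[Scorer],
    num_steps: int,
    beam_width: int,
) -> List[Beam]:
    """Grow reasoning paths one step at a time, keeping the best beams."""
    beams: List[Beam] = [(0.0, [])]
    for _ in range(num_steps):
        candidates: List[Beam] = []
        for cost, path in beams:
            for propose in proposers:
                for step in propose(path):
                    # Assumed scoring rule: average perplexity over scorers.
                    ppl = sum(perplexity(s(path, step)) for s in scorers) / len(scorers)
                    candidates.append((cost + math.log(ppl), path + [step]))
        # Lower cumulative log-perplexity is better; keep several hypotheses.
        candidates.sort(key=lambda beam: beam[0])
        beams = candidates[:beam_width]
    return beams


def make_proposer(name: str) -> Proposer:
    """Toy teacher that proposes exactly one labelled step per position."""
    return lambda path: [f"{name}:step{len(path)}"]


def toy_scorer(path: Sequence[str], step: str) -> List[float]:
    """Toy stand-in for model log-probs: teacher A is strong on even
    positions, teacher B on odd positions."""
    position = len(path)
    good = (step.startswith("A") and position % 2 == 0) or (
        step.startswith("B") and position % 2 == 1
    )
    return [math.log(0.9 if good else 0.3)] * 3


beams = collaborative_beam_search(
    proposers=[make_proposer("A"), make_proposer("B")],
    scorers=[toy_scorer],
    num_steps=3,
    beam_width=2,
)
best_cost, best_path = beams[0]
print(best_path)
```

In this toy setup the lowest-cost path alternates between the two teachers, which is the kind of complementarity that selecting one teacher's complete path after the fact cannot capture. In practice the proposers and scorers would wrap real LRMs, and the actual scoring function may differ from the simple average used here.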
Key facts
- CoRD stands for Collaborative Multi-Teacher Decoding.
- It addresses limitations of curation-based distillation for Long-CoT reasoning.
- Uses step-wise reasoning synthesis with perplexity-based scoring and beam search.
- Enables heterogeneous LRMs to jointly construct reasoning trajectories.
- Achieves near teacher-level student performance with fewer supervision signals.
- Generalizes well to out-of-distribution tasks.
- Published on arXiv with ID 2605.02290.
- Reduces redundant sampling and makes use of complementary reasoning across teachers that post-hoc selection misses.
Entities
Institutions
- arXiv