Prefix-RFT: Hybrid LLM Post-Training Method

other · 2026-05-18

Prefix-RFT is a new hybrid approach to large language model post-training that combines supervised fine-tuning (SFT) and reinforcement fine-tuning (RFT) to overcome their respective limitations. SFT excels at mimicking demonstration data but, being a form of behavior cloning, can generalize poorly, while RFT enhances performance but is sensitive to the initial policy and prone to unexpected behaviors. Prefix-RFT synergizes learning from both demonstration and exploration, with mathematical reasoning problems as the test bed. In the reported experiments, the method outperforms standalone SFT, standalone RFT, and parallel mixed-policy RFT. The paper highlights the complementary nature of SFT and RFT and proposes a unified view of the two techniques.
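The summary does not spell out how demonstration and exploration are combined. The sketch below is one plausible reading, assuming (as the name Prefix-RFT suggests) that a prefix of a demonstration is kept and the policy generates the rest. The function names, the uniform cut point, and the toy policy are all hypothetical and are not taken from the paper.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical type: a policy maps (prompt, prefix tokens) to continuation tokens.
Policy = Callable[[str, List[str]], List[str]]


def sample_prefix(demo: Sequence[str], rng: random.Random) -> List[str]:
    """Keep the first `cut` tokens of a demonstration.

    Assumption: the cut point is uniform over 0..len(demo). A cut of 0 is
    pure exploration (RFT-like); a cut of len(demo) reproduces the whole
    demonstration (SFT-like).
    """
    cut = rng.randint(0, len(demo))  # randint is inclusive on both ends
    return list(demo[:cut])


def hybrid_rollout(
    prompt: str,
    demo: Sequence[str],
    policy_continue: Policy,
    rng: random.Random,
) -> Tuple[List[str], List[str]]:
    """Build one training sample: demonstration prefix plus policy continuation."""
    prefix = sample_prefix(demo, rng)
    continuation = policy_continue(prompt, prefix)
    return prefix, continuation


if __name__ == "__main__":
    # Toy stand-in for a language model: always answers with a fixed token.
    def toy_policy(prompt: str, prefix: List[str]) -> List[str]:
        return ["<answer>"]

    demo_tokens = ["step1", "step2", "step3", "final"]
    prefix, continuation = hybrid_rollout(
        "2 + 2 = ?", demo_tokens, toy_policy, random.Random(0)
    )
    print(prefix + continuation)
```

In a real training loop, the continuation would be scored by a task reward (for math problems, typically answer correctness) and used for a policy-gradient update. That part is omitted here because the summary gives no details about it.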

Key facts

Prefix-RFT is a hybrid approach combining SFT and RFT
SFT excels at mimicking demonstration data but can lead to problematic generalization
RFT enhances performance but is sensitive to the initial policy
Prefix-RFT outperforms standalone SFT and RFT
Prefix-RFT outperforms parallel mixed-policy RFT
Mathematical reasoning problems were used as the test bed
The approach is described as simple yet effective
The paper proposes a unified view of SFT and RFT

Entities

—

Sources

arXiv cs.AI — 2026-05-18