RIFT: A New Framework for LLM Alignment Using Negative Samples
Researchers propose Reward Informed Fine-Tuning (RIFT), a framework for aligning large language models (LLMs) that repurposes negative samples instead of discarding them. Unlike Supervised Fine-Tuning (SFT), which relies on costly expert data, and Rejection Sampling Fine-Tuning (RFT), which uses hard thresholding to discard negative trajectories, RIFT reweights the loss with scalar rewards to learn from both positive and negative self-generated samples. A stabilized loss formulation prevents training collapse caused by naive reward integration. Experiments on mathematical benchmarks across various base models show RIFT consistently outperforms RFT. The paper is available on arXiv.
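To make the contrast concrete, below is a minimal PyTorch sketch of the two objectives as described in the summary. It is not the paper's code: the function names, tensor shapes, the acceptance threshold, and the reward-centering scheme are all illustrative assumptions. The point is the structural difference, not the exact formulation: RFT zeroes out low-reward trajectories, while a reward-informed loss keeps every trajectory and scales its contribution by a scalar reward, so negative samples receive a negative weight and push probability mass away from bad completions.

```python
import torch
import torch.nn.functional as F

def rft_loss(logits, labels, rewards, threshold=0.5):
    """Rejection Sampling Fine-Tuning: hard-threshold on reward.

    Trajectories with reward <= threshold contribute nothing to the
    gradient, so negative samples are effectively discarded.
    logits: (batch, seq, vocab), labels: (batch, seq), rewards: (batch,).
    """
    # Per-trajectory negative log-likelihood, shape (batch,).
    nll = F.cross_entropy(
        logits.transpose(1, 2), labels, reduction="none"
    ).mean(dim=-1)
    keep = (rewards > threshold).float()  # 0/1 acceptance mask
    return (keep * nll).sum() / keep.sum().clamp(min=1.0)

def rift_loss(logits, labels, rewards):
    """Hypothetical reward-informed reweighting (illustrative only).

    Every self-generated trajectory is kept; its NLL is weighted by a
    centered scalar reward, so below-average samples get w < 0 and the
    model is pushed away from them rather than simply ignoring them.
    """
    nll = F.cross_entropy(
        logits.transpose(1, 2), labels, reduction="none"
    ).mean(dim=-1)
    w = rewards - rewards.mean()  # centered: negatives weighted below zero
    return (w * nll).mean()
```

As the summary notes, this naive reward integration is exactly what can collapse training, which motivates the stabilized formulation sketched after the key facts below.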
Key facts
- RIFT stands for Reward Informed Fine-Tuning.
- RIFT repurposes negative trajectories by reweighting loss with scalar rewards.
- RIFT addresses data inefficiency in SFT and RFT.
- A stabilized loss formulation ensures numerical robustness (see the sketch after this list).
- Experiments on mathematical benchmarks show RIFT outperforms RFT.
- The paper is published on arXiv with ID 2601.09253.
- RIFT uses all self-generated samples, both positive and negative.
- The framework is designed for LLM alignment.
Entities
Institutions
- arXiv