Iterative Finetuning Found Mostly Idempotent in New Study
A recent study published on arXiv investigates what happens when language models are repeatedly fine-tuned on their own generated outputs. Starting from an initial model seeded with a persona or belief, the researchers trained a chain of models, each fine-tuned on data generated by its predecessor. They compared three techniques: supervised fine-tuning (SFT) for instruct models, synthetic document fine-tuning (SDF) for base models, and direct preference optimization (DPO). Across the chain, SFT and SDF generally left the seeded trait unchanged or weakened it, whereas DPO reliably amplified it under continuous training; that amplification disappeared when the model was reinitialized each cycle, making DPO the main exception to the otherwise mostly idempotent behavior of iterative fine-tuning.
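To make the setup concrete, here is a minimal Python sketch of the iterative loop described above, under stated assumptions: the callables generate_outputs, finetune, and measure_trait are hypothetical placeholders, and the reset flag is one plausible reading of the paper's reinitialization control, not the authors' actual code.

```python
from typing import Any, Callable, List, Tuple

def iterative_finetuning(
    initial_model: Any,
    generate_outputs: Callable[[Any], Any],    # model -> synthetic training data
    finetune: Callable[[Any, Any, str], Any],  # (start model, data, method) -> new model
    measure_trait: Callable[[Any], float],     # model -> strength of the seeded trait
    num_generations: int,
    method: str,                               # "sft", "sdf", or "dpo"
    reset_each_cycle: bool = False,
) -> Tuple[Any, List[float]]:
    """Train a chain of models, each finetuned on outputs of its predecessor."""
    model = initial_model  # seeded with a persona or belief
    trait_scores: List[float] = []
    for _ in range(num_generations):
        # The current model generates the training data for the next generation.
        data = generate_outputs(model)
        # Either continue from the current weights or (as in the control that
        # removed DPO's amplification) restart from the original checkpoint.
        start = initial_model if reset_each_cycle else model
        model = finetune(start, data, method)
        trait_scores.append(measure_trait(model))
    return model, trait_scores
```

Under this framing, the reported result is that trait_scores stays flat or declines for SFT and SDF, climbs for DPO with reset_each_cycle=False, and flattens again when the reset control is applied.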
Key facts
- arXiv paper 2605.01130 examines iterative finetuning effects on model behavior.
- Supervised finetuning (SFT) on instruct models showed trait decay or constancy.
- Synthetic document finetuning (SDF) on base models also showed decay or constancy.
- Direct preference optimization (DPO) reliably amplified traits under continuous training (the generic DPO objective is sketched after this list).
- Trait amplification in DPO vanished when models were reinitialized each cycle.
- Rare amplification in SFT/SDF came at the cost of coherence.
- Initial model was seeded with a persona or belief.
- Study concludes iterative finetuning is mostly idempotent.
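For context on the method named above, the snippet below is the standard DPO objective from Rafailov et al. (2023), written as a generic PyTorch loss; it illustrates DPO in general and is not taken from the paper, and the tensor names are placeholders.

```python
import torch.nn.functional as F
from torch import Tensor

def dpo_loss(policy_chosen_logps: Tensor, policy_rejected_logps: Tensor,
             ref_chosen_logps: Tensor, ref_rejected_logps: Tensor,
             beta: float = 0.1) -> Tensor:
    """Standard DPO objective: increase the policy's preference for chosen over
    rejected responses, measured relative to a frozen reference model."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Logistic loss on the reward margin; minimized when chosen outscores rejected.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```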
Entities
Institutions
- arXiv