SFT Effectiveness in LLMs Explained via Interaction Perspective
A new arXiv paper (2605.17967) investigates why supervised fine-tuning (SFT) works well for small neural networks but can have inconsistent or even detrimental effects on large language models (LLMs). Using interaction-based explanations, the researchers found that SFT primarily removes noise-like interactions while rarely acquiring reliable new ones, and that this denoising phase is extremely brief. Fine-tuning beyond that phase introduces overfitted interactions. The study validates these findings across multiple LLMs and datasets, offering insight into when to stop fine-tuning early and practical guidance for LLM training.
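This summary does not say which interaction metric the paper uses. As an illustration only, one definition that is common in interaction-based explanation work is the Harsanyi interaction, which scores a set S of input variables as the sum over all subsets T of S of (-1)^(|S|-|T|) * v(T), where v(T) is the model output when only the variables in T are left unmasked. The sketch below applies that definition to a hypothetical `toy_model`; the function names and the toy scoring rule are assumptions made for the example and are not taken from the paper.

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of `items` as a frozenset, including the empty set."""
    items = tuple(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def harsanyi_interaction(v, S):
    """Harsanyi interaction of variable set S under the scoring function v.

    v maps a frozenset of unmasked variable indices to a scalar output.
    """
    S = frozenset(S)
    return sum((-1) ** (len(S) - len(T)) * v(T) for T in subsets(S))


def toy_model(present):
    """Hypothetical model: variable 0 acts alone, variables 1 and 2 act jointly."""
    score = 0.0
    if 0 in present:
        score += 1.0
    if {1, 2} <= present:
        score += 2.0
    return score


if __name__ == "__main__":
    for S in ({0}, {1, 2}, {0, 1}):
        print(sorted(S), harsanyi_interaction(toy_model, S))
```

In this toy setting only the sets {0} and {1, 2} receive a non-zero score, which is the sense in which an interaction isolates a pattern the model actually relies on. Enumerating subsets is exponential in the number of variables, so a sketch like this is only practical for a handful of inputs.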
Key facts
- arXiv paper 2605.17967 explores SFT effectiveness in LLMs
- SFT removes noise-like interactions but rarely acquires reliable new ones
- Denoising stage is extremely brief
- Continued fine-tuning introduces overfitted interactions
- Validated across multiple LLMs and datasets
- Provides insights into early stopping
- Interaction-based explanations are used as the analysis metric
- SFT can produce inconsistent or detrimental effects on LLMs
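The early-stopping point above can be illustrated with a generic patience rule. This is a minimal sketch, not the paper's criterion, which this summary does not describe: it assumes a held-out loss is recorded at each checkpoint and uses it as a stand-in signal. The function name `early_stop_step` and the parameters `patience` and `min_delta` are hypothetical.

```python
def early_stop_step(val_losses, patience=2, min_delta=0.0):
    """Return the index of the checkpoint to keep under a patience rule.

    Scans the held-out losses in order and stops once `patience`
    consecutive checkpoints fail to improve on the best loss by more
    than `min_delta`. Returns -1 if `val_losses` is empty.
    """
    best_loss = float("inf")
    best_step = -1
    checks_without_improvement = 0
    for step, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # New best checkpoint: remember it and reset the counter.
            best_loss = loss
            best_step = step
            checks_without_improvement = 0
        else:
            checks_without_improvement += 1
            if checks_without_improvement >= patience:
                break
    return best_step


if __name__ == "__main__":
    # Assumed example curve: brief improvement, then degradation.
    losses = [1.0, 0.8, 0.7, 0.75, 0.9, 1.1]
    print(early_stop_step(losses))
```

If the useful phase of fine-tuning is as short as the findings suggest, checkpoints would need to be evaluated frequently early in training for a rule like this to catch the turning point.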
Entities
Institutions
- arXiv