ARTFEED — Contemporary Art Intelligence

Group Fine-Tuning (GFT): A Unified Post-Training Framework for LLMs

other · 2026-04-30

A recent study posted on arXiv introduces Group Fine-Tuning (GFT), a unified post-training framework for large language models that addresses the limitations of both supervised fine-tuning (SFT) and reinforcement learning (RL). Analyzing training dynamics, the researchers show that SFT is a special case of policy gradient optimization with sparse implicit rewards and unstable inverse-probability weighting, which manifests as single-path dependency, entropy collapse, and gradient explosion. GFT introduces Group Advantage Learning, which constructs diverse response groups and applies normalized contrastive supervision to alleviate reward sparsity, and Dynamic Coefficient Rectification, which adaptively bounds the inverse-probability weights to stabilize training. The study is available at arXiv:2604.14258.
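
To ground that interpretation, here is the standard one-line derivation (textbook policy-gradient algebra, not reproduced from the paper) that exhibits the SFT gradient as a policy gradient with a sparse, inverse-probability-weighted implicit reward:

    \nabla_\theta \mathcal{L}_{\mathrm{SFT}}
      = -\nabla_\theta \log \pi_\theta(y^\star \mid x)
      = -\mathbb{E}_{y \sim \pi_\theta(\cdot \mid x)}
          \big[\, r(y)\, \nabla_\theta \log \pi_\theta(y \mid x) \,\big],
    \qquad
    r(y) = \frac{\mathbb{1}[y = y^\star]}{\pi_\theta(y \mid x)}

The implicit reward r(y) is nonzero only on the single demonstrated response y* (reward sparsity and single-path dependency) and carries a 1/\pi_\theta factor that blows up whenever the model assigns the demonstration low probability (unstable weighting and gradient explosion).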

Key facts

  • arXiv:2604.14258
  • Group Fine-Tuning (GFT) proposed
  • SFT interpreted as special case of policy gradient optimization
  • SFT issues: single-path dependency, entropy collapse, gradient explosion
  • GFT includes Group Advantage Learning and Dynamic Coefficient Rectification (see the code sketch after this list)
  • Group Advantage Learning uses diverse response groups and normalized contrastive supervision
  • Dynamic Coefficient Rectification adaptively bounds inverse-probability weights
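
The two components can be sketched in code. What follows is a minimal, hypothetical PyTorch rendering, not the authors' implementation: group-relative reward normalization stands in for Group Advantage Learning's normalized contrastive supervision, and a hard clamp with an illustrative bound c_max stands in for Dynamic Coefficient Rectification's adaptive bounding. The names gft_loss and group_advantages and the value of c_max are invented for illustration.

    import torch

    def group_advantages(rewards: torch.Tensor) -> torch.Tensor:
        # Normalized contrastive supervision over a group of G responses:
        # each response is scored relative to its group, so the learning
        # signal is dense across the group rather than tied to one path.
        return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

    def gft_loss(logp: torch.Tensor, logp_old: torch.Tensor,
                 rewards: torch.Tensor, c_max: float = 10.0) -> torch.Tensor:
        # logp: (G,) sequence log-probs under the current policy (with grad).
        # logp_old: (G,) log-probs under the sampling policy (no grad).
        adv = group_advantages(rewards)      # Group Advantage Learning
        coeff = torch.exp(logp - logp_old)   # inverse-probability weight
        coeff = coeff.clamp(max=c_max)       # Dynamic Coefficient Rectification,
                                             # sketched as a hard upper bound
        return -(coeff * adv).mean()

    # Toy usage: one prompt, a group of G = 4 sampled responses.
    logp = torch.tensor([-1.2, -2.3, -0.8, -6.9], requires_grad=True)
    rewards = torch.tensor([1.0, 0.0, 0.5, 0.0])
    loss = gft_loss(logp, logp_old=logp.detach(), rewards=rewards)
    loss.backward()  # bounded coefficients keep logp.grad finite

Clamping also zeroes the gradient through any coefficient that exceeds the bound, which is one simple way to realize "adaptively bounds" in practice; the paper's actual rectification rule may differ.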

Entities

Institutions

  • arXiv

Sources

  • arXiv:2604.14258