ARTFEED — Contemporary Art Intelligence

IRIS: A New Self-Play Fine-Tuning Framework for LLMs

ai-technology · 2026-04-25

A novel self-play fine-tuning framework named IRIS (Interpolative Rényi Iterative Self-play) has been introduced for large language models. Self-play fine-tuning improves models beyond standard supervised fine-tuning without requiring additional human annotations: the model's own generations are contrasted against the existing annotated outputs. Existing techniques such as SPIN (KL-based), SPACE (Jensen-Shannon via noise contrastive estimation), and SPIF (χ²-regularized) each excel in different regimes depending on the distributional gap between the model and the target, but no single divergence yields optimal learning dynamics across all training phases. IRIS employs a Rényi-based objective with a flexible order parameter α that decomposes into two independent tilted risk terms, one for annotated data and one for synthetic data, with exponential importance weights controlled by α. This design aims to balance learning from real and generated data throughout training. The paper is available on arXiv under ID 2604.20933.
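To make the tilted-risk idea concrete, here is a minimal illustrative sketch, not the paper's actual implementation: a tilted risk is an exponentially weighted average of per-example losses, with the order parameter α controlling how strongly high-loss (or low-loss) examples are emphasized. The function names and the way the two terms are combined in `iris_style_objective` are assumptions for illustration only.

```python
import numpy as np

def tilted_risk(losses, alpha):
    """Exponentially tilted risk: (1/alpha) * log E[exp(alpha * loss)].

    alpha > 0 up-weights high-loss examples (pessimistic);
    alpha < 0 up-weights low-loss examples (optimistic);
    as alpha -> 0 it recovers the ordinary mean loss.
    """
    # Log-sum-exp trick for numerical stability.
    scaled = alpha * np.asarray(losses, dtype=float)
    m = np.max(scaled)
    return (m + np.log(np.mean(np.exp(scaled - m)))) / alpha

def iris_style_objective(real_losses, synth_losses, alpha):
    """Hypothetical combination of two independent tilted-risk terms:
    one over annotated (real) data, one over model-generated (synthetic)
    data, with opposite tilt so training is pulled toward real outputs
    and pushed away from the model's own generations."""
    return tilted_risk(real_losses, alpha) - tilted_risk(synth_losses, -alpha)

# Example: a constant loss is unchanged by tilting, and a positive tilt
# is never below the plain mean (by Jensen's inequality).
print(tilted_risk([2.0, 2.0], 0.5))   # equals 2.0
print(tilted_risk([1.0, 3.0], 1.0))   # >= the mean, 2.0
```

The log-sum-exp rewrite avoids overflow when `alpha * loss` is large, which matters because the exponential importance weights grow quickly with α.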

Key facts

  • IRIS stands for Interpolative Rényi Iterative Self-play.
  • It is a self-play fine-tuning framework for large language models.
  • Self-play fine-tuning improves models beyond supervised fine-tuning without additional human annotations.
  • Existing methods include SPIN (KL-based), SPACE (Jensen-Shannon), and SPIF (χ²-regularized).
  • IRIS uses a Rényi-based objective with adjustable order parameter α.
  • The objective decomposes into two independent tilted risk terms.
  • Exponential importance weights are controlled by α.
  • The paper is on arXiv (ID 2604.20933).

Entities

Institutions

  • arXiv

Sources