Search-E1: Self-Distillation Drives Self-Evolution in Search-Augmented Reasoning

ai-technology · 2026-05-23

A new arXiv preprint (2605.22511) introduces Search-E1, a method for improving search-augmented language models without external supervision or additional modules. The approach interleaves vanilla Group Relative Policy Optimization (GRPO) with offline self-distillation (OFSD), so that the model evolves using its own outputs. Current post-training pipelines often rely on external systems, process reward models, tree search, or hand-crafted rewards, and each of these adds complexity. Search-E1 questions whether these augmentations are necessary and proposes a simpler recipe in which self-distillation, rather than added machinery, drives the improvement.
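To make the two ingredients concrete, here is a minimal Python sketch. The first function shows the group-relative advantage that GRPO is generally known for: rewards for several rollouts of the same prompt are standardized within the group (population standard deviation is used here, which is one common choice). The second function is purely hypothetical: this summary does not say how OFSD selects its training data, so the reward-threshold filter, the function names, and the `threshold` parameter are assumptions for illustration, not the paper's procedure.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of rollouts for a single prompt.

    Each rollout is scored relative to its siblings, so no separate
    value model is needed. eps avoids division by zero when all
    rewards in the group are identical.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def select_for_distillation(rollouts, rewards, threshold=1.0):
    """Hypothetical OFSD-style filter (an assumption, not from the paper).

    Keeps the model's own rollouts whose reward meets the threshold,
    so they could later serve as offline fine-tuning targets.
    """
    return [t for t, r in zip(rollouts, rewards) if r >= threshold]


if __name__ == "__main__":
    rewards = [1.0, 0.0, 0.0, 1.0]
    rollouts = ["trace_a", "trace_b", "trace_c", "trace_d"]
    print(group_relative_advantages(rewards))
    print(select_for_distillation(rollouts, rewards))
```

In an interleaved schedule of the kind the summary describes, a training loop would alternate between policy updates that use such advantages and offline fine-tuning on the retained rollouts; the actual schedule and selection rule should be taken from the paper itself.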

Key facts

Search-E1 is a self-evolution method for search-augmented reasoning agents.
It uses vanilla GRPO interleaved with offline self-distillation (OFSD).
The paper argues that complex augmentations like external supervision or tree search may be unnecessary.
The method is described in arXiv preprint 2605.22511.
Post-training is currently the dominant recipe for search-augmented reasoning agents.
