Search-E1: Self-Distillation Drives Self-Evolution in Search-Augmented Reasoning
A new arXiv preprint (2605.22511) introduces Search-E1, a method for improving search-augmented language models without external supervision or auxiliary modules. The approach interleaves vanilla GRPO (Group Relative Policy Optimization) with offline self-distillation (OFSD) to enable self-evolution. Current post-training pipelines often rely on external systems, process reward models, tree search, or hand-crafted rewards, each of which adds complexity. Search-E1 questions whether these augmentations are necessary and proposes a simpler alternative whose gains come from self-distillation alone.
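For readers unfamiliar with GRPO, the sketch below illustrates its core idea: each sampled rollout is scored relative to the other rollouts drawn for the same question, so no learned value model is needed. This is a generic illustration of group-relative advantages, not code from the paper; the function name, the `eps` constant, and the use of the population standard deviation are assumptions made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each rollout's reward against its own sampling group.

    `rewards` holds one scalar reward per rollout sampled for the same
    question. `eps` (an assumed constant) avoids division by zero when
    every rollout in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical rollouts for one question: 1.0 = correct answer, 0.0 = wrong.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Rollouts that beat their group's average receive a positive advantage and the rest a negative one; a group in which all rollouts score the same contributes no learning signal.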
Key facts
- Search-E1 is a self-evolution method for search-augmented reasoning agents.
- It uses vanilla GRPO interleaved with offline self-distillation (OFSD).
- The paper argues that complex augmentations like external supervision or tree search may be unnecessary.
- The method is described in arXiv preprint 2605.22511.
- Post-training is currently the dominant recipe for search-augmented reasoning agents.
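This summary does not say how the GRPO and OFSD phases are scheduled or how the self-distillation data is built. The sketch below shows only one plausible reading of "interleaved": a few GRPO steps per round, followed by an offline phase that fine-tunes on the policy's own high-reward trajectories. Every name and parameter here (`training_schedule`, `select_self_distillation_data`, `min_reward`, the rollout dictionary layout) is hypothetical and not taken from the paper.

```python
def training_schedule(num_rounds, grpo_steps_per_round):
    """Yield (phase, round, step) tuples for an assumed interleaved schedule.

    Hypothetical: each round runs some GRPO steps, then one offline
    self-distillation phase. The paper's actual schedule may differ.
    """
    for round_idx in range(num_rounds):
        for step in range(grpo_steps_per_round):
            yield ("grpo", round_idx, step)
        yield ("offline_self_distillation", round_idx, 0)


def select_self_distillation_data(rollouts, min_reward=1.0):
    """Keep the policy's own trajectories whose reward meets a threshold.

    Assumed selection rule for illustration only; `rollouts` is a list of
    dicts with "trajectory" and "reward" keys.
    """
    return [r for r in rollouts if r["reward"] >= min_reward]


# Hypothetical rollouts collected from the current policy.
rollouts = [
    {"trajectory": "search -> read -> answer A", "reward": 1.0},
    {"trajectory": "answer B without searching", "reward": 0.0},
]
schedule = list(training_schedule(num_rounds=2, grpo_steps_per_round=2))
distill_set = select_self_distillation_data(rollouts)
```

The point of the sketch is that both phases consume only the model's own rollouts and an outcome reward, which is consistent with the claim that no external supervision, process reward model, or tree search is involved.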
Entities
Institutions
- arXiv