OpenSeeker-v2 Achieves SOTA on BrowseComp with Simple SFT
Researchers have released OpenSeeker-v2, a search agent that reaches state-of-the-art results on four benchmarks using only supervised fine-tuning (SFT) on 10.6k examples. The agent is a 30B-parameter LLM operating in the ReAct paradigm and scores 46.0% on BrowseComp. Its data synthesis pipeline introduces three modifications: scaling up the knowledge graph to enable deeper exploration, expanding the tool set to broaden coverage, and applying strict low-step filtering to discard easy trajectories. This challenges the resource-heavy industry pipeline of pre-training, continual pre-training (CPT), SFT, and reinforcement learning (RL): the findings indicate that sufficiently difficult and informative trajectories can make simple SFT remarkably effective for building state-of-the-art search agents.
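The paper's exact data schema is not reproduced here, but as a minimal sketch of what a ReAct-style SFT example looks like (assuming a thought/action/observation loop; all field names and tool names below are hypothetical):

```python
# Minimal sketch of a ReAct-style SFT trajectory; field and tool
# names are illustrative assumptions, not the paper's actual schema.
import json

trajectory = {
    "question": "Which year did the author of X win prize Y?",
    "steps": [
        {
            "thought": "I need to identify the author of X first.",
            "action": {"tool": "search", "args": {"query": "author of X"}},
            "observation": "X was written by A. Person.",
        },
        {
            "thought": "Now find when A. Person won prize Y.",
            "action": {"tool": "search", "args": {"query": "A. Person prize Y year"}},
            "observation": "A. Person won prize Y in 1998.",
        },
    ],
    "answer": "1998",
}

# For SFT, such a trajectory is typically flattened into a single
# target sequence that the model learns to generate token by token.
print(json.dumps(trajectory, indent=2))
```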
Key facts
- OpenSeeker-v2 achieves state-of-the-art performance on 4 benchmarks.
- Trained on only 10.6k data points using SFT.
- Built on a 30B-parameter LLM using the ReAct paradigm.
- Scores 46.0% on BrowseComp.
- Uses three data synthesis modifications: scaling knowledge graph size, expanding tool set size, and strict low-step filtering (sketched after this list).
- Challenges resource-intensive industry pipeline of pre-training, CPT, SFT, and RL.
- Demonstrates the power of informative, high-difficulty trajectories for SFT.
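Of the three modifications, low-step filtering is the simplest to illustrate. A minimal sketch, assuming each synthesized trajectory records its tool-call steps; the threshold and field names are illustrative, not taken from the paper:

```python
# Minimal sketch of strict low-step filtering: drop trajectories that
# solve the question in too few tool calls, keeping only hard examples.
MIN_STEPS = 5  # hypothetical cutoff; the paper's exact value is not given here


def filter_low_step(trajectories, min_steps=MIN_STEPS):
    """Keep only trajectories with at least `min_steps` tool calls,
    discarding easy questions answerable in a few steps."""
    return [t for t in trajectories if len(t["steps"]) >= min_steps]


# Example: a 2-step trajectory is dropped, a 6-step one is kept.
data = [{"steps": [{}] * 2}, {"steps": [{}] * 6}]
hard_only = filter_low_step(data)
assert len(hard_only) == 1
```

The design intuition is that trajectories requiring many steps force the model to learn multi-hop exploration rather than shallow lookup behavior.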
- Published on arXiv with ID 2605.04036.