CORE Algorithm Boosts Reasoning with Few Samples
Researchers have introduced Contrastive Reflection (CORE), a non-parametric learning algorithm that enables language models to improve their reasoning using as few as five training samples. CORE compares past reasoning traces to generate insights (short natural-language descriptions of strategies and constraints) that capture the differences between successful and unsuccessful attempts. Across four reasoning tasks, CORE outperformed parametric methods such as GRPO and non-parametric methods such as GEPA, episodic RAG, and MemRL, while requiring fewer rollouts. The algorithm addresses the high cost of traditional approaches, which typically need hundreds of samples and thousands of rollouts. The paper is available on arXiv under ID 2605.28742.
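The contrastive step described above can be pictured with a toy sketch. It assumes that reasoning traces are reduced to tuples of step strings, and it replaces the model call that would write each insight with a simple diff of steps. `Trace`, `contrastive_pairs`, `reflect`, `learn_insights`, and `build_prompt` are hypothetical names and do not reflect the paper's implementation. The sketch illustrates the non-parametric idea: "learning" only grows a list of insights that is prepended to future prompts, and no model weights change.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Trace:
    """One reasoning attempt (rollout) on a training task (hypothetical format)."""
    task_id: str
    steps: tuple[str, ...]  # reasoning steps, simplified to short strings
    correct: bool           # whether the attempt reached the right answer


def contrastive_pairs(traces):
    """Pair every successful trace with every failed trace on the same task."""
    by_task = defaultdict(lambda: ([], []))  # task_id -> (successes, failures)
    for t in traces:
        by_task[t.task_id][0 if t.correct else 1].append(t)
    pairs = []
    for successes, failures in by_task.values():
        pairs.extend(product(successes, failures))
    return pairs


def reflect(success, failure):
    """Hypothetical stand-in for an LLM reflection call.

    A real system would prompt a model with both traces and ask it to describe
    what made one succeed; here we just diff the steps to get short insights.
    """
    insights = [f"Do: {s}" for s in success.steps if s not in failure.steps]
    insights += [f"Avoid: {s}" for s in failure.steps if s not in success.steps]
    return insights


def learn_insights(traces):
    """Non-parametric 'training': accumulate insights, never update weights."""
    memory = []
    for success, failure in contrastive_pairs(traces):
        for insight in reflect(success, failure):
            if insight not in memory:  # keep each insight once, in first-seen order
                memory.append(insight)
    return memory


def build_prompt(insights, question):
    """Prepend the learned insights to a new question at inference time."""
    header = "\n".join(f"- {i}" for i in insights)
    return f"Insights:\n{header}\n\nQuestion: {question}"


# Usage with four made-up rollouts on two training tasks.
traces = [
    Trace("t1", ("parse units", "check units", "compute"), correct=True),
    Trace("t1", ("parse units", "compute", "guess"), correct=False),
    Trace("t2", ("list cases", "verify each case"), correct=True),
    Trace("t2", ("list cases",), correct=False),
]
insights = learn_insights(traces)
print(insights)  # ['Do: check units', 'Avoid: guess', 'Do: verify each case']
```

Pairing successes with failures on the same task is what makes the reflection contrastive: each insight is derived from a difference between two attempts rather than from a single trace in isolation.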
Key facts
- CORE is a non-parametric learning algorithm.
- It uses contrastive reflection to generate insights from reasoning traces.
- Requires as few as five training samples.
- Outperforms GRPO, GEPA, episodic RAG, and MemRL.
- Tested on four reasoning tasks.
- Reduces the need for the hundreds of samples and thousands of rollouts that traditional approaches typically require.
- Published on arXiv with ID 2605.28742.
- Focuses on improving language model reasoning.
Entities
Institutions
- arXiv