ANCORA: Self-Play Framework for Verifiable Reasoning Without Human Supervision

other · 2026-05-01

Researchers propose ANCORA, a framework that shifts the objective from learning to answer to learning to question. The system alternates between a Proposer, which generates novel specifications, and a Solver, which produces verified solutions, enabling self-improvement without human supervision. Three stabilizers prevent Proposer collapse under sparse verifier feedback: a two-level group-relative update that couples advantages, iterative self-distilled SFT that projects onto a valid-output manifold, and a UCB-guided Curriculum DAG that grows only through verified specifications. The work is detailed in arXiv:2604.27644.
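To make the idea of a UCB-guided curriculum that grows only through verified specifications more concrete, here is a minimal sketch. It is not the paper's implementation: the class name `CurriculumDAG`, its methods, the exploration constant `c`, and the use of the textbook UCB1 score are all assumptions made for illustration.

```python
import math


class CurriculumDAG:
    """Toy curriculum graph (hypothetical): nodes are specifications."""

    def __init__(self, root):
        self.parents = {root: []}   # node -> list of parent nodes
        self.visits = {root: 0}     # how often each node was selected
        self.value = {root: 0.0}    # running mean reward per node

    def add_verified(self, spec, parent, verified):
        """Grow the graph only when the verifier accepted the specification."""
        if not verified or parent not in self.parents or spec in self.parents:
            return False
        self.parents[spec] = [parent]
        self.visits[spec] = 0
        self.value[spec] = 0.0
        return True

    def select(self, c=1.4):
        """Return the node with the highest UCB1 score; unvisited nodes first."""
        total = sum(self.visits.values())

        def score(node):
            n = self.visits[node]
            if n == 0:
                return math.inf
            return self.value[node] + c * math.sqrt(math.log(total) / n)

        return max(self.parents, key=score)

    def update(self, node, reward):
        """Incrementally update the mean reward of a selected node."""
        self.visits[node] += 1
        self.value[node] += (reward - self.value[node]) / self.visits[node]
```

In this sketch a rejected specification never enters the graph, so selection can only ever draw on verified nodes, while the UCB bonus keeps rarely visited nodes in play.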

Key facts

ANCORA is an anchored-curriculum framework for verifiable reasoning.
It alternates between a Proposer and a Solver.
It uses a two-level group-relative update for advantages.
It employs iterative self-distilled SFT and a UCB-guided Curriculum DAG.
These mechanisms are designed to prevent Proposer collapse under sparse verifier feedback.
It operates without human supervision.
It is published on arXiv with ID 2604.27644.
It represents a shift from learning to answer to learning to question.
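As a rough illustration of what a group-relative update computes, the sketch below normalizes each reward against the mean and standard deviation of its own sampling group. This is the generic single-level form. How ANCORA's two levels are coupled is not described in this summary, so the function name `group_relative_advantages` and the `eps` parameter are assumptions, not the paper's formulation.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of samples (generic sketch)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std of the group
    # eps avoids division by zero when every reward in the group is equal
    return [(r - mean) / (std + eps) for r in rewards]


# Mixed verifier outcomes give a usable signal.
mixed = group_relative_advantages([1.0, 0.0, 0.0, 1.0])

# If the verifier rejects every sample, all advantages are zero.
sparse = group_relative_advantages([0.0, 0.0, 0.0, 0.0])
```

The second call shows why sparse verifier feedback is a problem for this kind of update: a group with identical rewards carries no learning signal at all.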
