Metacognition-as-Reward Framework Enhances LLM Reasoning
A new reinforcement learning framework called Metacognition-as-Reward (MaR) improves reasoning in large language models by incorporating metacognitive knowledge and regulation signals into the reward. MaR addresses limitations of two existing reward paradigms. RLVR (reinforcement learning with verifiable rewards) relies on outcome signals from executable checks or ground-truth answers, so it offers limited guidance on intermediate reasoning. RaR (rubrics as rewards) uses natural-language rubrics but requires instance-specific design. MaR instead introduces two general process dimensions to guide the reward: metacognitive knowledge, which identifies task-relevant information without hand-crafted rubrics, and metacognitive regulation, which plans and adjusts the reasoning process. The framework is detailed in arXiv paper 2605.23384.
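The summary does not say how MaR combines its signals, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes a weighted sum of an RLVR-style outcome check and two process scores in [0, 1]. The names `ProcessScores`, `outcome_reward`, `combined_reward`, and the weights are all hypothetical, and the two process scores are assumed to come from some external grader.

```python
from dataclasses import dataclass


@dataclass
class ProcessScores:
    """Hypothetical per-response process scores, each assumed to lie in [0, 1]."""
    knowledge: float   # how well the response identified task-relevant information
    regulation: float  # how well the response planned and adjusted its reasoning


def outcome_reward(answer: str, reference: str) -> float:
    """RLVR-style outcome signal: 1.0 if the answer matches the ground truth, else 0.0."""
    return 1.0 if answer.strip() == reference.strip() else 0.0


def combined_reward(
    answer: str,
    reference: str,
    scores: ProcessScores,
    w_outcome: float = 0.5,      # assumed weight, not taken from the paper
    w_knowledge: float = 0.25,   # assumed weight, not taken from the paper
    w_regulation: float = 0.25,  # assumed weight, not taken from the paper
) -> float:
    """Blend the outcome signal with the two process scores as a weighted sum."""
    for name, value in (("knowledge", scores.knowledge), ("regulation", scores.regulation)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} score must be in [0, 1], got {value}")
    return (
        w_outcome * outcome_reward(answer, reference)
        + w_knowledge * scores.knowledge
        + w_regulation * scores.regulation
    )


if __name__ == "__main__":
    # A wrong final answer still earns partial reward for a sound process.
    print(combined_reward("41", "42", ProcessScores(knowledge=1.0, regulation=1.0)))
```

Under these assumptions, a response with a wrong final answer can still receive intermediate credit, which is the kind of guidance an outcome-only reward cannot provide.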
Key facts
- MaR stands for Metacognition-as-Reward
- Framework is metacognition-inspired
- Addresses RLVR and RaR limitations
- Two dimensions: metacognitive knowledge and metacognitive regulation
- Metacognitive knowledge identifies task-relevant information without instance-specific rubrics
- Metacognitive regulation plans and adjusts the reasoning process
- Paper available on arXiv with ID 2605.23384
- arXiv announce type is cross (a cross-listing)
Entities
Institutions
- arXiv