Metacognition-as-Reward Framework Enhances LLM Reasoning

ai-technology · 2026-05-25

A new reinforcement learning framework called Metacognition-as-Reward (MaR) improves reasoning in large language models by adding metacognitive knowledge and regulation signals to the reward. MaR targets limitations of two existing reward paradigms. Reinforcement learning with verifiable rewards (RLVR) relies on outcome signals from executable checks or ground-truth answers, so it offers little guidance on intermediate reasoning steps. Rubrics as Rewards (RaR) scores responses against natural-language rubrics, but those rubrics must be designed for each instance. MaR instead introduces two general process dimensions to guide the reward: metacognitive knowledge, which identifies task-relevant information without hand-crafted rubrics, and metacognitive regulation, which covers planning and adjusting the reasoning process. The framework is detailed in arXiv paper 2605.23384.
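To make the idea of process-level reward guidance concrete, here is a minimal sketch of how an outcome signal might be blended with scores for the two metacognitive dimensions. The summary does not say how MaR computes or combines these signals, so everything below is an assumption for illustration: the names `ProcessScores` and `combined_reward`, the [0, 1] score range, the weighted sum, and the default weights are all hypothetical and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class ProcessScores:
    """Hypothetical per-response process scores, each assumed to lie in [0, 1]."""
    knowledge: float   # did the response identify task-relevant information?
    regulation: float  # did the response plan and adjust its reasoning?


def combined_reward(
    outcome_correct: bool,
    scores: ProcessScores,
    w_outcome: float = 0.5,      # illustrative weight, not from the paper
    w_knowledge: float = 0.25,   # illustrative weight, not from the paper
    w_regulation: float = 0.25,  # illustrative weight, not from the paper
) -> float:
    """Blend an outcome signal with two process signals via a weighted sum.

    The weighted sum is an assumed combination rule chosen for simplicity.
    """
    for name, value in (("knowledge", scores.knowledge),
                        ("regulation", scores.regulation)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} score must be in [0, 1], got {value}")

    # Outcome-only signal, as in RLVR-style training: 1 if verified, else 0.
    outcome = 1.0 if outcome_correct else 0.0

    return (w_outcome * outcome
            + w_knowledge * scores.knowledge
            + w_regulation * scores.regulation)


if __name__ == "__main__":
    # A wrong final answer still earns partial credit for a sound process.
    print(combined_reward(False, ProcessScores(knowledge=0.5, regulation=0.5)))
```

Under these assumed weights, a response with an incorrect final answer can still receive a nonzero reward when its process scores are high, which is the kind of intermediate guidance an outcome-only signal cannot provide.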

Key facts

MaR stands for Metacognition-as-Reward
Framework is metacognition-inspired
Addresses RLVR and RaR limitations
Two dimensions: metacognitive knowledge and metacognitive regulation
Metacognitive knowledge identifies task-relevant information without instance-specific rubrics
Metacognitive regulation plans and adjusts reasoning process
Paper available on arXiv with ID 2605.23384
arXiv announce type is cross (a cross-listing)
