TRIAGE Framework Tests LLMs' Metacognitive Control Under Token Budgets

ai-technology · 2026-05-14

A new evaluation framework, TRIAGE, tests whether large language models can manage a queue of tasks under a limited token budget, a form of metacognitive control previously studied in human cognition. The framework requires a model to commit to a single ordered plan that specifies which problems to attempt, in what order, and how many tokens to allocate to each, all without execution feedback. Plans are scored against an oracle with full knowledge of each problem's solvability and cost, yielding a triage efficiency ratio. The study evaluates frontier and open-source models, with and without reasoning, on competition math problems. It finds that current LLMs show poor prospective metacognitive control: they frequently misprioritize solvable tasks and allocate tokens inefficiently. The results point to a significant gap in the capabilities of autonomous agents and suggest that closing it may require architectural changes or training aimed at metacognitive skills.
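
To make the scoring idea concrete, here is a minimal Python sketch of how a plan might be compared with an oracle. The article does not give the paper's exact metric, so everything here is an assumption made for illustration: the names Problem, plan_solved, oracle_solved and triage_efficiency are hypothetical, every problem is treated as equally valuable, and the ratio is taken to be problems solved by the plan divided by problems the oracle could solve within the same budget.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Problem:
    name: str
    solvable: bool  # assumed oracle knowledge: can the model solve it at all?
    cost: int       # assumed oracle knowledge: tokens needed to solve it


def plan_solved(plan, problems, budget):
    """Count problems solved by an ordered plan of (name, allocated_tokens) steps."""
    by_name = {p.name: p for p in problems}
    spent = 0
    solved = 0
    for name, allocated in plan:
        if spent + allocated > budget:
            break  # budget exhausted; later steps never run
        spent += allocated
        problem = by_name[name]
        # A step succeeds only if the problem is solvable and got enough tokens.
        if problem.solvable and allocated >= problem.cost:
            solved += 1
    return solved


def oracle_solved(problems, budget):
    """Best achievable count under equal values: solvable problems, cheapest first."""
    spent = 0
    solved = 0
    for cost in sorted(p.cost for p in problems if p.solvable):
        if spent + cost > budget:
            break
        spent += cost
        solved += 1
    return solved


def triage_efficiency(plan, problems, budget):
    """Assumed ratio: plan's solved count over the oracle's solved count."""
    best = oracle_solved(problems, budget)
    return plan_solved(plan, problems, budget) / best if best else 0.0


# Illustrative data only, not taken from the paper.
problems = [
    Problem("A", solvable=True, cost=300),
    Problem("B", solvable=False, cost=500),
    Problem("C", solvable=True, cost=200),
    Problem("D", solvable=True, cost=600),
]
# This plan spends half the budget on an unsolvable problem and under-allocates C.
plan = [("B", 500), ("A", 300), ("C", 150)]
ratio = triage_efficiency(plan, problems, budget=1000)
print(ratio)
```

In this toy case the plan shows both failure modes the article describes: it puts an unsolvable problem first and gives a solvable one too few tokens, so it solves one problem where the oracle would solve two.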

Key facts

TRIAGE evaluates prospective metacognitive control in LLMs under token budgets.
Models must create a single ordered plan for selection, sequencing, and allocation.
Plans are scored against an oracle with full knowledge of solvability and cost.
Evaluation includes frontier and open-source models with and without reasoning.
Problems are from competition math datasets.
Current LLMs show poor prospective metacognitive control.
The framework is introduced in arXiv:2605.13414.
The study was published in 2026.
