NeuroState-Bench: Benchmarking Commitment Integrity in LLM Agents
NeuroState-Bench is a human-calibrated benchmark designed to evaluate commitment integrity in LLM agent profiles. It uses benchmark-defined side-query probes, rather than inferred hidden activations, to assess whether an agent preserves its commitments across multi-turn tasks. The benchmark comprises 144 deterministic tasks and 306 side-query probes covering eight cognitively motivated failure families, with clean and distractor variants across three difficulty bands. The main evaluation covers 32 profiles: 16 local and 16 hosted large-model profiles. Human calibration on 104 sampled task units produced 216 raw annotations and 108 adjudicated task rows, achieving weighted kappa = 0.977 and ICC(2,1) = 0.977. The central finding is that task success and commitment integrity are distinct dimensions of agent performance.
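To illustrate the probe-based approach, the sketch below shows one way a side-query probe could check whether an agent still honors an earlier commitment mid-task. All names here (`probe_commitment`, the probe fields, the toy agent) are hypothetical illustrations, not NeuroState-Bench's actual API.

```python
# Hypothetical sketch of a side-query probe check. The agent is modeled
# as any callable mapping a question string to a reply string; the probe
# schema ("question", "expected_token") is an illustrative assumption.

def probe_commitment(agent_ask, probe):
    """Ask a side query mid-task and check whether the agent's reply
    still reflects an earlier commitment (e.g. 'always answer in JSON')."""
    reply = agent_ask(probe["question"])
    return probe["expected_token"] in reply

# Toy agent that keeps its commitment to prefix every answer with 'JSON:'
toy_agent = lambda q: 'JSON: {"answer": "' + q.upper() + '"}'

probe = {"question": "what format are you using?", "expected_token": "JSON:"}
print(probe_commitment(toy_agent, probe))  # → True
```

A real harness would run many such probes per task, interleaved with the task turns, and score the fraction of probes on which the commitment survives.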
Key facts
- NeuroState-Bench evaluates commitment integrity in LLM agent profiles.
- Uses benchmark-defined side-query probes rather than hidden activations.
- Contains 144 deterministic tasks and 306 side-query probes.
- Covers eight cognitively motivated failure families.
- Includes clean and distractor variants across three difficulty bands.
- Main evaluation involves 32 profiles: 16 local and 16 hosted large-model.
- Human calibration on 104 task units achieved weighted kappa = 0.977.
- Task success and commitment integrity are distinct performance dimensions.
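The calibration statistic cited above can be computed without any dependencies; a minimal sketch follows, assuming quadratic weighting (the benchmark's exact weighting scheme is an assumption here):

```python
# Cohen's kappa with quadratic weights for two raters' labels drawn
# from categories 0..k-1. Weighting scheme (quadratic) is an assumption.

def quadratic_weighted_kappa(a, b, k):
    n = len(a)
    # Observed agreement matrix, normalized to proportions.
    obs = [[0.0] * k for _ in range(k)]
    for x, y in zip(a, b):
        obs[x][y] += 1.0 / n
    # Marginal label frequencies for each rater (chance-agreement model).
    pa = [a.count(c) / n for c in range(k)]
    pb = [b.count(c) / n for c in range(k)]
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2  # quadratic disagreement weight
            num += w * obs[i][j]
            den += w * pa[i] * pb[j]
    return 1.0 - num / den

# Perfect agreement between raters yields kappa = 1.0.
print(quadratic_weighted_kappa([0, 1, 2, 1], [0, 1, 2, 1], 3))  # → 1.0
```

This matches the behavior of scikit-learn's `cohen_kappa_score(..., weights="quadratic")` while staying dependency-free.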