AuthBench: Benchmarking Least-Privilege Authorization in Coding Agents
A recent study introduces permission-boundary inference, in which a model maps task instructions and the terminal environment to a file-level policy for reading, writing, and executing. The authors build AuthBench, a benchmark of 120 realistic terminal tasks with human-reviewed permission labels and executable validators for measuring both utility and attack outcomes. Results show that authorization is not a simple trade-off between conservative and permissive policies: frontier models frequently omit permissions the task's execution chain requires while simultaneously granting access to unused or sensitive resources, and additional inference-time reasoning does not close this gap. The paper is available on arXiv as 2605.14859.
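The mismatch described above, omitting needed permissions while over-granting others, can be sketched as a set comparison between a model-inferred policy and the benchmark's human-reviewed labels. The structures and names below are illustrative assumptions for exposition, not the paper's actual evaluation code.

```python
# Hypothetical sketch: comparing an inferred file-level policy against
# human-reviewed labels, in the spirit of AuthBench's setup. All names
# here are assumptions, not the benchmark's real API.
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen -> hashable, so grants can live in sets
class Grant:
    path: str
    perm: str  # one of "read", "write", "execute"

def policy_gap(inferred: set, labeled: set):
    """Return (omitted, over_granted):
    - omitted: permissions the labels require but the model did not grant
      (breaks the task's execution chain);
    - over_granted: permissions the model granted that the labels mark
      as unnecessary (violates least privilege)."""
    omitted = labeled - inferred
    over_granted = inferred - labeled
    return omitted, over_granted

# Example: the task needs to read a config file and execute a script,
# but the model skips the config read and grants write access to .env.
labeled = {Grant("app/config.yaml", "read"), Grant("run.sh", "execute")}
inferred = {Grant("run.sh", "execute"), Grant(".env", "write")}

omitted, over_granted = policy_gap(inferred, labeled)
```

In this sketch both failure modes from the paper's findings appear at once: `omitted` is non-empty (the execution chain is broken) and `over_granted` is non-empty (least privilege is violated), illustrating why the problem is not a single conservative-vs-permissive dial.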
Key facts
- Coding agents require least-privilege authorization for safe deployment.
- Permission-boundary inference maps task instructions to file-level policies.
- AuthBench includes 120 realistic terminal tasks.
- Tasks have human-reviewed permission labels and executable validators.
- Frontier models omit required permissions and grant unnecessary ones.
- Increased inference-time reasoning does not fix the mismatch.
- The study is on arXiv with ID 2605.14859.
- The arXiv announcement type is cross-listed.
Entities
Institutions
- arXiv