Counterfactual Trace Auditing for LLM Agent Skills
Counterfactual Trace Auditing (CTA) is a new framework for assessing how skills affect the behavior of LLM agents. Existing evaluation techniques treat skills as black boxes and report only changes in pass rate. CTA instead pairs agent traces produced with and without a given skill on the same task, segments them into goal-directed phases, aligns those phases, and emits structured Skill Influence Pattern (SIP) annotations that capture behavioral changes beyond task outcomes. Instantiated on SWE-Skills-Bench with Claude across 49 software engineering tasks, the pass rate changes by only +0.3 percentage points on average, yet CTA reveals a clear evaluation gap: behavioral differences that pass rates alone miss.
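The four steps described above (pairing, segmentation, alignment, annotation) can be sketched in a few lines of Python. This is only an illustrative sketch, not the framework's actual implementation. The `Phase` type, the `segment` and `annotate` helpers, the pattern names (`phase_added`, `phase_removed`, `phase_resized`), and the toy traces are all hypothetical. The sketch also assumes each trace step already carries a goal label, whereas a real segmenter would have to infer it, and it uses the standard library's `difflib.SequenceMatcher` as a stand-in alignment method.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Phase:
    goal: str          # hypothetical phase label such as "explore" or "test"
    steps: list[str]   # raw actions the agent took during this phase


def segment(trace: list[tuple[str, str]]) -> list[Phase]:
    """Merge consecutive (goal, action) steps that share a goal into phases."""
    phases: list[Phase] = []
    for goal, action in trace:
        if phases and phases[-1].goal == goal:
            phases[-1].steps.append(action)
        else:
            phases.append(Phase(goal, [action]))
    return phases


def annotate(base: list[Phase], skilled: list[Phase]) -> list[dict]:
    """Align phases by goal label and emit illustrative SIP-style records."""
    matcher = SequenceMatcher(
        None, [p.goal for p in base], [p.goal for p in skilled], autojunk=False
    )
    notes: list[dict] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Same goal on both sides: record only a change in step count.
            for b, s in zip(base[i1:i2], skilled[j1:j2]):
                delta = len(s.steps) - len(b.steps)
                if delta:
                    notes.append(
                        {"pattern": "phase_resized", "goal": b.goal, "delta_steps": delta}
                    )
        else:
            # "delete", "insert", or "replace": phases present on one side only.
            notes += [{"pattern": "phase_removed", "goal": p.goal} for p in base[i1:i2]]
            notes += [{"pattern": "phase_added", "goal": p.goal} for p in skilled[j1:j2]]
    return notes


# Toy paired traces for one task (invented for illustration).
without_skill = [("explore", "ls"), ("explore", "cat a.py"),
                 ("edit", "patch a.py"), ("test", "pytest")]
with_skill = [("explore", "ls"), ("plan", "draft plan"),
              ("edit", "patch a.py"), ("test", "pytest"), ("test", "pytest -k fix")]

sips = annotate(segment(without_skill), segment(with_skill))
for sip in sips:
    print(sip)
```

In this toy pair both runs could pass the task, yet the annotations record a shorter exploration phase, an added planning phase, and a longer testing phase, which is the kind of difference a pass/fail comparison cannot express.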
Key facts
- Counterfactual Trace Auditing (CTA) is a new framework for measuring skill effects on LLM agent behavior.
- Current evaluation methods treat skills as black boxes, reporting only pass rate changes.
- CTA pairs agent traces with and without a skill on the same task.
- Traces are segmented into goal-directed phases and aligned.
- CTA emits structured Skill Influence Pattern (SIP) annotations.
- CTA was instantiated on SWE-Skills-Bench with Claude across 49 tasks.
- With skills enabled, the pass rate changes by only +0.3 percentage points on average.
- CTA identifies a clear evaluation gap that pass rates miss.
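To make the pass-rate metric concrete, the following minimal sketch computes a pass-rate change in percentage points from per-task outcomes. The function names and the four-task outcome lists are hypothetical and are not results from SWE-Skills-Bench. The toy data is chosen to show one way an aggregate can stay flat while individual tasks change, which is offered as an illustration rather than as the source's finding.

```python
def pass_rate(results: list[bool]) -> float:
    """Share of passing tasks, expressed in percent."""
    return 100.0 * sum(results) / len(results)


def delta_pp(without_skill: list[bool], with_skill: list[bool]) -> float:
    """Pass-rate change in percentage points (with skill minus without)."""
    return pass_rate(with_skill) - pass_rate(without_skill)


# Invented per-task outcomes for four tasks, in the same task order.
outcomes_without = [True, False, True, False]
outcomes_with = [True, True, False, False]

delta = delta_pp(outcomes_without, outcomes_with)
# Indices of tasks whose outcome differs between the two conditions.
flipped = [i for i, (a, b) in enumerate(zip(outcomes_without, outcomes_with)) if a != b]

print(f"delta = {delta:+.1f} pp, tasks with changed outcome: {flipped}")
```

Here the aggregate change is zero even though two of the four tasks changed outcome, so a near-zero delta by itself says little about whether a skill altered what the agent did.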
Entities
Institutions
- arXiv
- SWE-Skills-Bench
- Claude