Skill Availability Boosts LLM Agent Performance in Controlled Study

ai-technology · 2026-06-01

A new study on arXiv (2605.31408) examines how the granularity at which skill documents are presented affects the task success of large language model agents. Using a pinned SkillsBench version with 30 tasks, two reasoning-enabled models (GPT-5.5 and DeepSeek V4-Flash), six skill conditions, and five trials per cell, the experiment generated 1,800 rows of data (900 per model). Skill availability proved the strongest signal: compared to no skill, skill conditions raised the task-mean pass rate by 26.7–36.0 percentage points for GPT-5.5 and by 18.0–26.0 percentage points for DeepSeek V4-Flash. The primary presentation contrasts showed smaller, uncertain effects. For the analysis, the study first aggregates the five trials in each task-condition-model cell, then computes paired contrasts over the 30 tasks.
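
The aggregate-then-contrast step can be sketched roughly as follows. This is a minimal illustration, not the study's code: the function names, the condition labels "no_skill" and "full_skill", and the toy outcomes are all assumptions made for the example.

```python
from statistics import mean

def cell_pass_rates(rows):
    """Collapse trial-level rows into one pass rate per (task, condition) cell."""
    cells = {}
    for task, condition, passed in rows:
        cells.setdefault((task, condition), []).append(1.0 if passed else 0.0)
    return {cell: mean(outcomes) for cell, outcomes in cells.items()}

def paired_contrast_pp(rows, treatment, baseline):
    """Mean per-task difference in pass rate, in percentage points.

    Trials are aggregated within each cell first, so every task
    contributes exactly one paired difference.
    """
    rates = cell_pass_rates(rows)
    tasks = sorted({task for task, _ in rates})
    diffs = [rates[(t, treatment)] - rates[(t, baseline)] for t in tasks]
    return 100.0 * mean(diffs), diffs

# Toy data for one model: two tasks, two hypothetical conditions, five trials each.
toy_rows = (
    [("t1", "no_skill", p) for p in (0, 0, 0, 0, 1)]
    + [("t1", "full_skill", p) for p in (1, 1, 1, 0, 1)]
    + [("t2", "no_skill", p) for p in (1, 0, 0, 0, 0)]
    + [("t2", "full_skill", p) for p in (1, 1, 0, 0, 0)]
)

effect_pp, per_task_diffs = paired_contrast_pp(toy_rows, "full_skill", "no_skill")
print(f"task-mean contrast: {effect_pp:.1f} pp over {len(per_task_diffs)} tasks")
```

Aggregating before contrasting keeps the task as the unit of analysis, so the five repeated trials within a cell are not treated as independent observations.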

Key facts

Study published on arXiv with ID 2605.31408
Uses a pinned SkillsBench version with 30 domain-balanced tasks
Tests two models: GPT-5.5 and DeepSeek V4-Flash
Six skill conditions applied
Five trials per task-condition-model cell
1,800 total data rows (900 per model)
Skill availability increased pass rate by 26.7–36.0 pp for GPT-5.5
Skill availability increased pass rate by 18.0–26.0 pp for DeepSeek V4-Flash
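
As a quick consistency check, the row counts above follow directly from the design dimensions. The short sketch below enumerates a full factorial grid; the variable names are illustrative only and the grid carries no real results.

```python
from itertools import product

TASKS = 30
MODELS = ("GPT-5.5", "DeepSeek V4-Flash")
CONDITIONS = 6
TRIALS = 5

# One row per (task, model, condition, trial) combination.
grid = list(product(range(TASKS), MODELS, range(CONDITIONS), range(TRIALS)))

total_rows = len(grid)
rows_per_model = total_rows // len(MODELS)
# Cells remaining after the five trials are aggregated.
cells = TASKS * len(MODELS) * CONDITIONS

print(total_rows, rows_per_model, cells)
```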
