AI Agents Consume 1000x More Tokens in Coding Tasks Than Chat
A recent study published on arXiv (2604.22750) presents the first systematic examination of token consumption in AI agentic coding tasks. The researchers analyzed eight frontier large language models on SWE-bench Verified and found that agentic tasks consume roughly 1,000 times more tokens than conventional code reasoning or code chat, with input tokens, not output tokens, driving most of the cost. Token consumption is also highly stochastic: total tokens for the same task can vary by up to 30x across runs. The study further evaluates whether models can predict their own token costs before execution.
Key facts
- arXiv paper 2604.22750 analyzes token consumption in AI agentic coding tasks.
- Eight frontier LLMs were tested on SWE-bench Verified.
- Agentic tasks consume 1000x more tokens than code reasoning or code chat.
- Input tokens are the primary cost driver, not output tokens.
- Token usage varies by up to 30x across runs on the same task.
- The study examines models' ability to predict token costs before task execution.
- Token consumption is described as inherently stochastic.
- The research is the first systematic study of token consumption patterns in agentic coding.
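The input-vs-output cost asymmetry follows from how agents work: each tool-call turn re-sends the growing context as input, so input tokens accumulate far faster than generated output. A minimal sketch of the arithmetic, using hypothetical per-million-token prices and made-up token counts (none of these figures come from the paper):

```python
# Hypothetical cost arithmetic: why input tokens dominate agentic spend.
# Prices and token counts below are illustrative assumptions, not study data.

def run_cost(input_tokens: int, output_tokens: int,
             price_in_per_m: float, price_out_per_m: float) -> float:
    """Dollar cost of one run, given per-million-token prices."""
    return (input_tokens * price_in_per_m +
            output_tokens * price_out_per_m) / 1_000_000

# A single chat-style query vs. an agentic run whose context is re-sent
# on every tool-call turn (producing the ~1000x total-token gap reported).
chat = run_cost(2_000, 500, price_in_per_m=3.0, price_out_per_m=15.0)
agent = run_cost(2_400_000, 60_000, price_in_per_m=3.0, price_out_per_m=15.0)

input_share = (2_400_000 * 3.0 / 1_000_000) / agent
print(f"chat ~ ${chat:.4f}, agent ~ ${agent:.2f}, input share {input_share:.0%}")
```

With these illustrative numbers the agentic run totals about 984x the chat run's tokens, and input tokens account for roughly 89% of its cost, even though output tokens are priced five times higher per token.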