SWE-ContextBench: Benchmarking Context Learning in Coding Agents
Researchers have introduced SWE-ContextBench, a benchmark for assessing context comprehension and retrieval in coding agents powered by large language models. Unlike existing benchmarks, which evaluate tasks in isolation, SWE-ContextBench tests an agent's ability to apply knowledge gained from earlier tasks to interconnected ones. It comprises 1,100 base tasks and 376 related tasks, constructed from authentic dependency and reference links among GitHub issues and pull requests, and spanning 51 distinct repositories and 9 programming languages. The goal is to measure how accurately and efficiently agents solve related problems by exploiting shared context.
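As an illustration of the base/related structure, here is a minimal sketch of how one benchmark entry linking a base task to its related tasks might be represented. The field names (`task_id`, `repo`, `related_task_ids`, etc.) are assumptions chosen for readability, not the paper's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class BenchmarkTask:
    """Hypothetical record for one SWE-ContextBench task.

    All field names are illustrative assumptions, not the paper's schema.
    """
    task_id: str                  # e.g. "owner/repo#1234"
    repo: str                     # one of the 51 repositories
    language: str                 # one of the 9 programming languages
    source_url: str               # GitHub issue or PR the task is derived from
    related_task_ids: list[str] = field(default_factory=list)
    # Tasks are linked when their underlying issues/PRs reference or depend
    # on one another, so an agent that has solved the base task already
    # holds context that is useful for its related tasks.
```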
Key facts
- SWE-ContextBench evaluates context understanding and retrieval in coding agents.
- It consists of 1,100 base tasks and 376 related tasks.
- Tasks are derived from real dependency and reference relationships among GitHub issues and pull requests.
- Tasks span 51 unique repositories and 9 programming languages.
- The benchmark measures how accurately and efficiently agents solve related issues, including efficiency gains from reusing previous experience (see the sketch after this list).
- Existing benchmarks treat tasks as independent and do not assess reuse of prior experience.
- The paper is available on arXiv with ID 2602.08316.
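To make the accuracy and efficiency-gain measurement concrete, here is a minimal sketch of how such metrics could be computed over base/related task pairs. The function names and the cost measure (e.g., agent steps or tokens spent) are assumptions for illustration; the paper's actual metrics may differ.

```python
def resolution_rate(results: list[bool]) -> float:
    """Fraction of tasks solved (standard SWE-bench-style accuracy)."""
    return sum(results) / len(results) if results else 0.0

def efficiency_gain(cost_without_context: float, cost_with_context: float) -> float:
    """Relative reduction in cost when the agent reuses context from an
    already-solved base task.

    This formula is an illustrative assumption, not the paper's metric.
    """
    if cost_without_context == 0:
        return 0.0
    return (cost_without_context - cost_with_context) / cost_without_context

# Example: an agent that needed 40 steps solving a task cold, but only
# 25 steps after solving the related base task, shows a 37.5% gain.
gain = efficiency_gain(40, 25)
print(f"efficiency gain: {gain:.1%}")  # efficiency gain: 37.5%
```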