KOCO-BENCH: Benchmarking LLMs for Domain-Specific Software Development
While large language models (LLMs) are proficient in general programming, they struggle with specialized software development, creating a need for domain specialization methods. Existing benchmarks focus on the knowledge LLMs already possess rather than on how they acquire and apply new information, and they provide no explicit knowledge corpora. To address this gap, we introduce KOCO-BENCH, a novel benchmark for assessing domain specialization techniques in practical software development. It spans six emerging domains, 11 software frameworks, and 25 projects, offering curated knowledge corpora and evaluation tasks that range from function-level to project-level domain code generation, all backed by rigorous test suites.
Key facts
- LLMs excel at general programming but struggle with domain-specific software development.
- Existing domain-specific code benchmarks cannot evaluate the effectiveness of domain specialization methods.
- KOCO-BENCH is a benchmark for evaluating domain specialization methods.
- KOCO-BENCH contains 6 emerging domains.
- KOCO-BENCH includes 11 software frameworks and 25 projects.
- KOCO-BENCH features curated knowledge corpora.
- Evaluation tasks include domain code generation from function-level to project-level.
- Test suites are rigorous.