ARTFEED — Contemporary Art Intelligence

LLM Tool Accuracy Degrades Before Advertised Context Limits

ai-technology · 2026-05-16

An investigation by Paulsen finds that large language models (LLMs) in developer tools lose accuracy well before reaching their claimed context window sizes. The study introduces the Maximum Effective Context Window (MECW), a practical limit notably smaller than the advertised one. Contemporary software repositories often contain large non-code artifacts, such as compiled datasets, binary model weights, minified JavaScript bundles, and extensive log files, that overflow the context window and push out relevant source code. To address this, the study proposes a framework for correctness-aware context hygiene: a size-based heuristic filter that runs before tokenization during repository scans. The filter relies solely on OS-level stat() metadata with sub-millisecond overhead, avoiding the index construction and query-time inference required by semantic retrieval approaches such as RepoCoder, GraphRAG, and AST-based chunking. By excluding irrelevant large files, the framework preserves the effective context for the code that matters.

Key facts

  • Paulsen shows all tested LLMs degrade in accuracy before their advertised context limits.
  • The Maximum Effective Context Window (MECW) is introduced as a practical constraint.
  • Large non-code artifacts overflow the context window and push out relevant source code.
  • The proposed framework uses a pre-execution, size-based heuristic filter with sub-millisecond overhead.
  • The filter uses only OS-level stat() metadata.
  • Semantic retrieval approaches like RepoCoder, GraphRAG, and AST-based chunking require index construction and query-time inference.
  • The framework is correctness-aware and designed for context hygiene.
  • The study is published on arXiv with ID 2605.14362.

Entities

Institutions

  • arXiv
