LLM Context Windows Fail Far Before Advertised Maximums
A new study on arXiv defines the 'Maximum Effective Context Window' (MECW) to measure real-world LLM performance. Most models degrade severely by 1,000 tokens of context, and some fail at just 100, despite advertised context windows of 128K tokens or more. Drawing on hundreds of thousands of data points across multiple models and problem types, the research finds that the MECW varies by task and is drastically smaller than the advertised Maximum Context Window (MCW).
Key facts
- Study defines Maximum Effective Context Window (MECW) concept
- Hundreds of thousands of data points collected across multiple models
- Significant gaps found between the advertised Maximum Context Window (MCW) and the measured MECW
- MECW shifts based on problem type
- Some top models failed with as little as 100 tokens in context
- Most models had severe degradation by 1000 tokens in context
- Published on arXiv with ID 2509.21361
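The gap between an advertised window and an effective one could be estimated along these lines: test a model at increasing context lengths and take the largest length before accuracy first drops below a threshold. This is a minimal sketch, not the paper's methodology; the threshold and the measurement numbers below are illustrative assumptions.

```python
def max_effective_context_window(accuracy_by_length, threshold=0.9):
    """Estimate an effective context window from benchmark results.

    accuracy_by_length: dict mapping context length (tokens) -> accuracy in [0, 1]
    Returns the largest tested length at or below which accuracy
    never falls under the threshold (0 if even the shortest fails).
    """
    mecw = 0
    for length in sorted(accuracy_by_length):
        if accuracy_by_length[length] >= threshold:
            mecw = length
        else:
            break  # first failure caps the effective window
    return mecw

# Illustrative (made-up) measurements: accuracy collapses far below the
# advertised 128K window, echoing the study's finding.
measurements = {100: 0.98, 500: 0.95, 1000: 0.70, 4000: 0.40, 128000: 0.10}
print(max_effective_context_window(measurements, threshold=0.9))  # 500
```

Under these assumed numbers the advertised window would be 128K tokens while the effective one is only 500, the kind of MCW/MECW gap the study reports.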