AI Coding Tools Inflate Productivity Metrics, Developer Investigation Reveals
Software developer William O'Connell conducted an independent investigation into the accuracy of the AI code generation metrics reported by popular IDEs. Testing Windsurf (formerly Codeium) and Cursor, he found that both tools significantly overstate the share of code they attribute to AI. Windsurf's "% new code written by Windsurf" (PCW) metric reported a 98% AI contribution for his work, even though his manual testing showed the actual figure was far lower. The bias stems from counting auto-inserted closing symbols and pasted text as non-human input while crediting the AI with all code the tool moves. Cursor's "AI Share of Committed Code" performed better but still claimed 100% AI generation for a file in which only quote marks were changed. O'Connell warns that such skewed metrics could mislead management into overvaluing AI tools, with potential consequences for team sizes and for legal questions around copyright in AI-generated code. He concludes that vendors have a financial incentive to report high AI percentages and should not be trusted to measure their own impact.
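The attribution rules described above are enough to reproduce the skew numerically. Below is a minimal sketch assuming a simplified edit-event model; the `Edit` class, source labels, and classification rules are illustrative, not Windsurf's actual implementation:

```python
# Hypothetical sketch of how a character-attribution metric like PCW
# could produce the skew O'Connell describes. Not Windsurf's code.
from dataclasses import dataclass

@dataclass
class Edit:
    source: str  # "typed", "autoclose", "paste", "ai_completion"
    chars: int

def pcw(edits: list[Edit]) -> float:
    """Percent of new characters credited to AI under the biased rules."""
    # Only directly typed characters count as human.
    human = sum(e.chars for e in edits if e.source == "typed")
    # Bias 1: auto-inserted closing brackets/quotes are counted as non-human.
    # Bias 2: pasted text is counted as non-human.
    ai = sum(e.chars for e in edits if e.source != "typed")
    total = human + ai
    return 100 * ai / total if total else 0.0

# A mostly hand-made session still scores heavily "AI":
session = [
    Edit("typed", 400),        # code the developer actually wrote
    Edit("autoclose", 60),     # ), ], }, " pairs inserted by the editor
    Edit("paste", 300),        # text the developer pasted in
    Edit("ai_completion", 90), # the only genuinely AI-generated characters
]
print(f"{pcw(session):.0f}% 'written by AI'")
# ~53% "AI", though only ~11% of the characters came from a model
```

Under a fair accounting, typed plus pasted text is about 82% of this session's input; the biased rules report the opposite picture.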
Key facts
- Windsurf's PCW metric reported 98% AI-generated code for O'Connell's work.
- Windsurf counts auto-added closing symbols as non-human, biasing toward AI.
- Pasted text is not counted as human contribution in Windsurf.
- Cursor's line-based metric claimed 100% AI for a file where only quotes were changed.
- Both tools encode their analytics data with protobuf, complicating analysis (see the decoding sketch after this list).
- Windsurf's analytics update almost instantly despite claiming three-hour intervals.
- Git integration in Windsurf appears nonexistent despite documentation claims.
- Cursor only offers analytics on its Team plan.
- O'Connell tested by creating identical files manually and via AI, then comparing byte and line counts (see the comparison sketch after this list).
- The investigation was published on O'Connell's personal blog.
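A protobuf payload can be inspected even without the vendor's schema by using protoc's raw decoder, which prints fields by tag number. This is a generic sketch, not O'Connell's tooling; `payload.bin` is a placeholder filename, and it assumes the bytes were already captured and `protoc` is on the PATH:

```python
# Sketch: dump an unknown protobuf payload by field tag, without a schema.
# Assumes protoc is installed; "payload.bin" is a placeholder filename.
import subprocess

with open("payload.bin", "rb") as f:
    raw = f.read()

result = subprocess.run(
    ["protoc", "--decode_raw"],  # prints tag-numbered fields as text
    input=raw,
    capture_output=True,
    check=True,
)
print(result.stdout.decode())
```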
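The byte/line comparison from the testing methodology is straightforward to replicate. A minimal sketch, with placeholder filenames, that confirms the manually typed and AI-generated files really are identical before checking what each metric attributes for them:

```python
# Hypothetical reproduction of the test: write the same content once by
# hand and once via the AI assistant, then verify the files match byte
# for byte and line for line before comparing metric attributions.
from pathlib import Path

manual = Path("manual_version.py").read_bytes()
ai = Path("ai_version.py").read_bytes()

print("bytes:", len(manual), "vs", len(ai))
print("lines:", manual.count(b"\n"), "vs", ai.count(b"\n"))
print("identical:", manual == ai)
```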
Entities
People
- William O'Connell
Companies and products
- Windsurf
- Codeium
- Cursor
- GitHub Copilot
- Amazon Kiro
- Cognition
- Devin