12.3% of LLM Bug Report Summaries Contain Fabricated Content
A study of 80 structured bug report summaries generated by large language models found that 47.9% contained missing information and 12.3% included fabricated content. The research, published on arXiv (2605.24137), analyzes hallucinations from a section-aware perspective, focusing on the Steps-to-Reproduce, Actual Behavior, and Expected Behavior sections. Existing detection methods evaluate the full response as a single unit and ignore the structure of technical documents. The findings highlight the need for systematic hallucination analysis in automated bug report summarization, because hallucinated summaries can mislead developers.
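To illustrate what a section-aware check might look like, as opposed to scoring the whole response at once, here is a minimal Python sketch. It is not the paper's method. It assumes each section starts with a "Name:" header, and it uses a crude lexical heuristic: any term in a summary section that never appears in the original report is flagged as possibly fabricated. The function names (split_sections, tokens, unsupported_terms) and the header format are hypothetical.

```python
import re

# Section names come from the article; the "Name:" header format is an assumption.
SECTIONS = ("Steps-to-Reproduce", "Actual Behavior", "Expected Behavior")


def split_sections(summary: str) -> dict[str, str]:
    """Split a summary into its named sections, assuming 'Name:' headers."""
    pattern = "|".join(re.escape(name) for name in SECTIONS)
    parts = re.split(rf"^({pattern}):[ \t]*", summary, flags=re.MULTILINE)
    # With one capture group, re.split returns
    # [preamble, name1, body1, name2, body2, ...].
    return {name: body.strip() for name, body in zip(parts[1::2], parts[2::2])}


def tokens(text: str) -> set[str]:
    """Lowercased word-like tokens; a deliberately simple tokenizer."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def unsupported_terms(summary: str, report: str) -> dict[str, set[str]]:
    """Per section, collect summary terms that never occur in the source report."""
    report_terms = tokens(report)
    return {
        name: tokens(body) - report_terms
        for name, body in split_sections(summary).items()
    }


if __name__ == "__main__":
    # Hypothetical report and summary, invented for illustration only.
    report = (
        "Open the profile page and tap Save. The app crashes. "
        "The profile is expected to be saved."
    )
    summary = (
        "Steps-to-Reproduce: Open the profile page, tap Save\n"
        "Actual Behavior: The app crashes\n"
        "Expected Behavior: The profile is saved after login\n"
    )
    for section, extra in unsupported_terms(summary, report).items():
        print(section, sorted(extra))
```

Reporting results per section shows where a summary drifts from the report (here, the invented "after login" detail in Expected Behavior), which a single full-response score would hide. A realistic detector would need semantic matching rather than exact term overlap.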
Key facts
- arXiv paper 2605.24137 analyzes hallucinations in LLM-generated bug report summaries
- 47.9% of summaries contained missing information
- 12.3% of summaries included fabricated content
- Study examined 80 structured bug report summaries
- Focus on Steps-to-Reproduce, Actual Behavior, Expected Behavior sections
- Existing detection approaches evaluate at full-response level
- Current methods do not consider technical document structure
- Hallucinations can mislead developers and reduce trust in automation
Entities
Institutions
- arXiv