LLM Bug Report Summaries Contain 12.3% Fabricated Content

other · 2026-05-26

A study of 80 structured bug report summaries generated by large language models (LLMs) found that 47.9% contained missing information and 12.3% included fabricated content. The paper, posted on arXiv (2605.24137), analyzes hallucinations from a section-aware perspective and focuses on the Steps-to-Reproduce, Actual Behavior, and Expected Behavior sections. Existing hallucination detection methods evaluate the full response as a single unit and ignore the structure of technical documents. The findings highlight the need for systematic hallucination analysis in automated bug report summarization, so that generated summaries do not mislead developers.

Key facts

arXiv paper 2605.24137 analyzes hallucinations in LLM-generated bug report summaries
47.9% of summaries contained missing information
12.3% of summaries included fabricated content
Study examined 80 structured bug report summaries
Focus on the Steps-to-Reproduce, Actual Behavior, and Expected Behavior sections
Existing detection approaches evaluate at the full-response level
Current methods do not consider the structure of technical documents
Hallucinations can mislead developers and reduce trust in automation
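
The difference between full-response and section-aware checking can be illustrated with a small Python sketch. It assumes, purely for illustration, that a bug report and its summary have already been reduced to sets of atomic fact strings per section. The names `Report`, `SectionResult`, `response_level_check`, and `section_level_check` are hypothetical and do not come from the paper, which is not claimed to work this way. A fact counts as missing if it appears in a source section but not in the matching summary section, and as fabricated in the reverse case.

```python
from dataclasses import dataclass

# The three sections the study focuses on (identifiers are hypothetical).
SECTIONS = ("steps_to_reproduce", "actual_behavior", "expected_behavior")

# Hypothetical representation: section name -> set of atomic fact strings.
Report = dict[str, set[str]]


@dataclass
class SectionResult:
    missing: set[str]     # facts in the source that the summary omits
    fabricated: set[str]  # facts in the summary that the source lacks


def response_level_check(source: Report, summary: Report) -> SectionResult:
    """Pool all facts together, ignoring which section they belong to."""
    src = set().union(*source.values())
    out = set().union(*summary.values())
    return SectionResult(missing=src - out, fabricated=out - src)


def section_level_check(source: Report, summary: Report) -> dict[str, SectionResult]:
    """Compare each source section only with the matching summary section."""
    results = {}
    for name in SECTIONS:
        src = source.get(name, set())
        out = summary.get(name, set())
        results[name] = SectionResult(missing=src - out, fabricated=out - src)
    return results


# Invented example: the summary swaps Actual Behavior and Expected Behavior.
source: Report = {
    "steps_to_reproduce": {"open settings", "click save"},
    "actual_behavior": {"app crashes"},
    "expected_behavior": {"settings are saved"},
}
summary: Report = {
    "steps_to_reproduce": {"open settings", "click save"},
    "actual_behavior": {"settings are saved"},
    "expected_behavior": {"app crashes"},
}

if __name__ == "__main__":
    # Pooled check: every fact appears somewhere, so nothing is flagged.
    print(response_level_check(source, summary))
    # Per-section check: the swapped sections are flagged.
    for name, result in section_level_check(source, summary).items():
        print(name, sorted(result.missing), sorted(result.fabricated))
```

In this invented example the pooled check reports no problems because every fact still appears somewhere in the summary, while the per-section check flags both swapped sections. Exact string matching is a deliberate simplification; a practical detector would need some form of semantic matching between facts.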
