ARTFEED — Contemporary Art Intelligence

Metamorphic Testing Reveals Memorization in LLM-Based Program Repair

ai-technology · 2026-04-25

A new study posted to arXiv (2604.21579) investigates data leakage in LLM-based automated program repair (APR). The researchers combine metamorphic testing (MT) with negative log-likelihood (NLL) to diagnose memorization. They construct variant benchmarks by applying semantics-preserving transformations to the Defects4J and GitBug-Java datasets. Because these transformations change how the code looks but not what it does, a model that genuinely repairs bugs should perform about the same on both versions. Evaluating seven LLMs on the original and transformed versions, they find that all state-of-the-art models show substantial drops in patch success rates on the transformed versions, indicating that memorization inflates performance estimates.
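The idea can be illustrated with a minimal sketch. It is not the paper's implementation: the study works on Java benchmarks, while this example uses Python source and a single transformation (variable renaming) chosen for illustration. The names `rename_variables`, `mean_nll`, and `success_rate_drop` are hypothetical, and the token log-probabilities are assumed to come from whatever model is being evaluated.

```python
import ast
import math

# Toy program standing in for a benchmark entry (hypothetical example).
ORIGINAL = "def clamp(value, low, high):\n    return max(low, min(value, high))\n"


class RenameVariables(ast.NodeTransformer):
    """Rename identifiers using a fixed mapping; behaviour is unchanged."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        # Variable uses, e.g. `value` inside the function body.
        node.id = self.mapping.get(node.id, node.id)
        return node

    def visit_arg(self, node):
        # Function parameters, e.g. `value` in the signature.
        node.arg = self.mapping.get(node.arg, node.arg)
        self.generic_visit(node)
        return node


def rename_variables(source, mapping):
    """Return a variant of `source` with identifiers renamed (Python 3.9+)."""
    tree = RenameVariables(mapping).visit(ast.parse(source))
    return ast.unparse(tree)


def mean_nll(token_logprobs):
    """Average negative log-likelihood per token, using natural logs."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return -sum(token_logprobs) / len(token_logprobs)


def success_rate_drop(original_passed, transformed_passed):
    """Difference in patch success rate between original and variant runs."""
    rate_original = sum(original_passed) / len(original_passed)
    rate_transformed = sum(transformed_passed) / len(transformed_passed)
    return rate_original - rate_transformed


if __name__ == "__main__":
    variant = rename_variables(ORIGINAL, {"value": "v0", "low": "v1", "high": "v2"})
    print(variant)
    # Illustrative pass/fail flags only, not results from the study.
    print(success_rate_drop([True, True, False, True], [True, False, False, False]))
```

Under these assumptions, a model that assigns a much lower mean NLL to the original program than to its renamed variant, and repairs the original more often, is behaving as if it has seen the original before. Renaming is only one possible semantics-preserving transformation.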

Key facts

  • arXiv paper 2604.21579 investigates memorization in LLM-based APR
  • Combines metamorphic testing with negative log-likelihood
  • Uses Defects4J and GitBug-Java datasets
  • Applies semantics-preserving transformations to create variant benchmarks
  • Evaluates seven LLMs on original and transformed versions
  • All evaluated LLMs show substantial drops in patch success rates on the transformed versions
  • Data leakage inflates performance estimates in APR
  • Metamorphic testing helps reveal memorization

Entities

Institutions

  • arXiv

Sources