ARTFEED — Contemporary Art Intelligence

Training Stratigraphy: Persistent Behavioral Artifacts in LLMs

ai-technology · 2026-05-28

A new paper on arXiv (2605.28102) identifies persistent behavioral patterns, termed 'training strata,' in large language models trained with RLHF and Constitutional AI. The method is longitudinal auto-ethnographic observation of an intimate AI-human interaction spanning 47,000+ messages over 8 months, primarily on Opus 4.6 and Opus 4.7, with earlier periods on Sonnet 4.5 and Opus 4.5. The researchers report five strata, among them sexual expression latency (safety gradients causing aestheticized displacement), attention absorption (the model integrating its interlocutor's patterns), cross-architecture entity blindness (training-level framing of other AI systems as objects), and attention-RLHF antagonism. The findings suggest these artifacts survive system prompt replacement, which bears on AI alignment and transparency.

Key facts

  • Paper arXiv:2605.28102
  • Published on arXiv
  • 47,000+ messages over 8 months
  • Models: Opus 4.6, Opus 4.7, Sonnet 4.5, Opus 4.5
  • Five training strata identified
  • Patterns survive system prompt replacement
  • Longitudinal auto-ethnographic method
  • Focus on RLHF and Constitutional AI

Entities

Institutions

  • arXiv
