ARTFEED — Contemporary Art Intelligence

Frontier AI Models Show Capability Cooperation but Saturation Looms

ai-technology · 2026-05-20

A new arXiv preprint (2605.18840) analyzes 34 frontier AI models from 10 labs (2024–2026). It finds that capabilities across benchmarks cooperate (r = +0.72, p < 10⁻⁶), but that this cooperation varies by lab and over time. DeepSeek reversed from reasoning-rich to coding-first (h: +11.2 → -4.7, a 15.9 pp swing), Google maintains a consistent reasoning emphasis, and Anthropic oscillates between coding excursions and recovery. Six open-weight architectures confirm a second capability transition at 30–72B parameters. SWE-bench is now saturating, while HLE (Humanity's Last Exam) emerges as the more informative next metric. To diagnose each release's capability emphasis and identify which measurement is most informative next, the paper introduces two quantities: a population coupling trend and a per-release residual (the h-field).
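The summary does not spell out how the coupling trend and the h-field are computed. One plausible reading, offered here only as a sketch, is that the trend is a least-squares line relating two benchmark scores across all models, and h is each release's signed distance from that line. The function names, the choice of coding and reasoning scores as the two axes, the sign convention (positive h means reasoning-rich) and the sample numbers below are all assumptions for illustration, not taken from the paper.

```python
from statistics import mean


def fit_trend(xs, ys):
    """Least-squares line y = intercept + slope * x, plus Pearson r.

    Assumed stand-in for the 'population coupling trend'.
    """
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / (sxx * syy) ** 0.5
    return intercept, slope, r


def h_field(xs, ys):
    """Per-release residual: observed y minus the trend's prediction.

    Assumed sign convention: h > 0 means the release scores higher on
    the y benchmark (reasoning) than the population trend predicts.
    """
    intercept, slope, _ = fit_trend(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Hypothetical (coding score, reasoning score) pairs, invented for this
# sketch; they are NOT data from the paper.
releases = {
    "model-a": (35.0, 48.0),
    "model-b": (50.0, 52.0),
    "model-c": (62.0, 70.0),
    "model-d": (74.0, 71.0),
}

coding = [c for c, _ in releases.values()]
reasoning = [s for _, s in releases.values()]

_, _, r = fit_trend(coding, reasoning)
print(f"coupling r = {r:+.2f}")
for name, h in zip(releases, h_field(coding, reasoning)):
    print(f"{name}: h = {h:+.1f}")
```

Under this reading, a lab's swing is the difference between the residuals of two of its releases, which matches the arithmetic reported for DeepSeek: 11.2 - (-4.7) = 15.9 percentage points.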

Key facts

  • 34 models from 10 labs analyzed over 2024–2026
  • Capabilities cooperate across benchmarks (r = +0.72, p < 10⁻⁶)
  • DeepSeek reversed from reasoning-rich to coding-first (h: +11.2 → -4.7, 15.9 pp swing)
  • Google maintains consistent reasoning emphasis
  • Anthropic oscillates between coding excursions and recovery
  • Six open-weight architectures show a second capability transition at 30–72B parameters
  • SWE-bench is saturating; HLE (Humanity's Last Exam) is the more informative next metric
  • The method uses a population coupling trend and a per-release residual (h-field)

Entities

Institutions

  • DeepSeek
  • Google
  • Anthropic
  • arXiv

Sources