ARTFEED — Contemporary Art Intelligence

A$^2$RD: Agentic Autoregressive Diffusion for Long Video Consistency

other · 2026-05-11

Researchers propose A$^2$RD, an agentic autoregressive diffusion architecture for long video synthesis. The method decouples creative generation from consistency enforcement using a closed-loop process built on multimodal memory, adaptive segment generation, and hierarchical self-improvement. A new benchmark, LVBench-C, is introduced to evaluate non-linear entity and environment transitions. The work targets semantic drift and narrative collapse in long videos.

Key facts

  • A$^2$RD stands for Agentic Auto-Regressive Diffusion.
  • It synthesizes long videos segment-by-segment via a Retrieve-Synthesize-Refine-Update cycle.
  • Three core components: Multimodal Video Memory, Adaptive Segment Generation, Hierarchical Test-Time Self-Improvement.
  • LVBench-C is a new benchmark for non-linear entity and environment transitions.
  • The method aims to prevent semantic drift and narrative collapse.
  • The paper is available on arXiv under identifier 2605.06924.
  • The approach uses a closed-loop process for self-improvement.
  • Adaptive segment generation switches among generation modes to balance natural narrative progression with visual consistency.
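The Retrieve-Synthesize-Refine-Update cycle described above can be sketched as a simple loop. This is an illustrative outline only, assuming generic `synthesize`, `refine`, and `score` callables; the summary does not specify the paper's actual interfaces, memory format, or refinement criteria.

```python
# Hypothetical sketch of a Retrieve-Synthesize-Refine-Update loop for
# segment-by-segment long-video generation. All names and parameters
# (synthesize, refine, score, quality_threshold) are illustrative
# placeholders, not the paper's API.

def generate_long_video(prompt, num_segments, synthesize, refine, score,
                        quality_threshold=0.8, max_refine_steps=3):
    """Generate a video as a list of segments, consulting a memory of
    prior segments to keep entities and environments consistent."""
    memory = []   # stand-in for the multimodal video memory
    video = []
    for i in range(num_segments):
        context = memory[-4:]                     # Retrieve recent context
        segment = synthesize(prompt, context, i)  # Synthesize a candidate
        # Refine: iterate until a consistency score clears the threshold,
        # loosely mirroring test-time self-improvement
        for _ in range(max_refine_steps):
            if score(segment, context) >= quality_threshold:
                break
            segment = refine(segment, context)
        video.append(segment)
        memory.append(segment)                    # Update the memory
    return video


# Toy usage with stub callables standing in for real models:
out = generate_long_video(
    "a cat", 3,
    synthesize=lambda p, ctx, i: f"{p}-seg{i}",
    refine=lambda s, ctx: s + "*",
    score=lambda s, ctx: 1.0,
)
# out == ["a cat-seg0", "a cat-seg1", "a cat-seg2"]
```

The loop structure shows why the closed-loop design helps: each segment is scored against retrieved context before being committed to memory, so drift is caught per segment rather than accumulating across the whole video.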

Entities

Institutions

  • arXiv
