A$^2$RD: Agentic Autoregressive Diffusion for Long Video Consistency
Researchers propose A$^2$RD, an agentic autoregressive diffusion architecture for long video synthesis. The method decouples creative generation from consistency enforcement through a closed-loop process built on multimodal memory, adaptive segment generation, and hierarchical self-improvement, targeting the semantic drift and narrative collapse that degrade long videos. A new benchmark, LVBench-C, is introduced to evaluate non-linear entity and environment transitions.
Key facts
- A$^2$RD stands for Agentic Auto-Regressive Diffusion.
- It synthesizes long videos segment-by-segment via a Retrieve-Synthesize-Refine-Update cycle.
- Three core components: Multimodal Video Memory, Adaptive Segment Generation, Hierarchical Test-Time Self-Improvement.
- LVBench-C is a new benchmark for non-linear entity and environment transitions.
- The method aims to prevent semantic drift and narrative collapse.
- The paper is available on arXiv under identifier 2605.06924.
- The approach uses a closed-loop process for self-improvement.
- It switches among generation modes to balance natural narrative progression with visual consistency.
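The Retrieve-Synthesize-Refine-Update cycle above can be sketched as a simple control loop. This is a minimal illustrative sketch, not the paper's actual implementation: every class, function, and data layout here (`MultimodalMemory`, `synthesize`, `refine`, the dict-based segment representation) is an assumption standing in for the real multimodal memory, diffusion generator, and test-time self-improvement stages.

```python
# Hypothetical sketch of the Retrieve-Synthesize-Refine-Update loop.
# All names and data structures are illustrative assumptions, not the
# paper's API; real components would be diffusion models and learned memory.
from dataclasses import dataclass, field


@dataclass
class MultimodalMemory:
    """Tracks per-entity state across previously generated segments."""
    entries: dict = field(default_factory=dict)

    def retrieve(self, prompt: str) -> dict:
        # Return memory entries for entities mentioned in the segment prompt.
        return {k: v for k, v in self.entries.items() if k in prompt}

    def update(self, segment: dict) -> None:
        # Record the latest segment in which each entity appeared.
        for entity in segment["entities"]:
            self.entries[entity] = segment["id"]


def synthesize(prompt: str, context: dict, seg_id: int) -> dict:
    # Placeholder for the diffusion generator conditioned on retrieved memory.
    # Here, "entities" are crudely taken to be capitalized words in the prompt.
    return {"id": seg_id, "prompt": prompt,
            "entities": [w for w in prompt.split() if w.istitle()]}


def refine(segment: dict, context: dict) -> dict:
    # Placeholder for hierarchical test-time self-improvement: verify the
    # segment against the retrieved context and flag consistency.
    segment["consistent"] = all(e in segment["entities"] for e in context)
    return segment


def generate_long_video(prompts: list[str]) -> list[dict]:
    memory = MultimodalMemory()
    segments = []
    for i, prompt in enumerate(prompts):
        context = memory.retrieve(prompt)      # Retrieve
        seg = synthesize(prompt, context, i)   # Synthesize
        seg = refine(seg, context)             # Refine
        memory.update(seg)                     # Update
        segments.append(seg)
    return segments
```

The loop makes the closed-loop structure concrete: each segment is generated only after consulting memory, checked before being committed, and then written back so later segments see a consistent entity history.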