A$^2$RD: Agentic Autoregressive Diffusion for Long Video Consistency
Researchers propose A$^2$RD, an agentic autoregressive diffusion architecture for long video synthesis. The method decouples creative generation from consistency enforcement through a closed-loop process built on multimodal memory, adaptive segment generation, and hierarchical self-improvement, targeting the semantic drift and narrative collapse that degrade long videos. A new benchmark, LVBench-C, is introduced to evaluate non-linear entity and environment transitions.
Key facts
- A$^2$RD stands for Agentic Auto-Regressive Diffusion.
- It synthesizes long videos segment-by-segment via a Retrieve-Synthesize-Refine-Update cycle.
- Three core components: Multimodal Video Memory, Adaptive Segment Generation, Hierarchical Test-Time Self-Improvement.
- LVBench-C is a new benchmark for non-linear entity and environment transitions.
- The method aims to prevent semantic drift and narrative collapse.
- The paper is available on arXiv under identifier 2605.06924.
- The approach uses a closed-loop process for self-improvement.
- It switches among generation modes to balance natural narrative progression with visual consistency.
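The Retrieve-Synthesize-Refine-Update cycle above can be sketched as a simple control loop. This is a minimal illustrative sketch, not the paper's actual implementation: every class, function, and data layout here (`MultimodalMemory`, `synthesize`, `refine`, the dict-based segment representation) is an assumption standing in for the real multimodal memory, diffusion generator, and test-time self-improvement stages.

```python
# Hypothetical sketch of the Retrieve-Synthesize-Refine-Update loop.
# All names and data structures are illustrative assumptions, not the
# paper's API; real components would be diffusion models and learned memory.
from dataclasses import dataclass, field


@dataclass
class MultimodalMemory:
    """Tracks per-entity state across previously generated segments."""
    entries: dict = field(default_factory=dict)

    def retrieve(self, prompt: str) -> dict:
        # Return memory entries for entities mentioned in the segment prompt.
        return {k: v for k, v in self.entries.items() if k in prompt}

    def update(self, segment: dict) -> None:
        # Record the latest segment in which each entity appeared.
        for entity in segment["entities"]:
            self.entries[entity] = segment["id"]


def synthesize(prompt: str, context: dict, seg_id: int) -> dict:
    # Placeholder for the diffusion generator conditioned on retrieved memory.
    # Here, "entities" are crudely taken to be capitalized words in the prompt.
    return {"id": seg_id, "prompt": prompt,
            "entities": [w for w in prompt.split() if w.istitle()]}


def refine(segment: dict, context: dict) -> dict:
    # Placeholder for hierarchical test-time self-improvement: verify the
    # segment against the retrieved context and flag consistency.
    segment["consistent"] = all(e in segment["entities"] for e in context)
    return segment


def generate_long_video(prompts: list[str]) -> list[dict]:
    memory = MultimodalMemory()
    segments = []
    for i, prompt in enumerate(prompts):
        context = memory.retrieve(prompt)      # Retrieve
        seg = synthesize(prompt, context, i)   # Synthesize
        seg = refine(seg, context)             # Refine
        memory.update(seg)                     # Update
        segments.append(seg)
    return segments
```

The loop makes the closed-loop structure concrete: each segment is generated only after consulting memory, checked before being committed, and then written back so later segments see a consistent entity history.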