ARTFEED — Contemporary Art Intelligence

DySink: Dynamic Frame Sinks for Efficient Long Video Generation

other · 2026-05-22

DySink is a retrieval-based framework for autoregressive long video generation that replaces static early-frame sinks with dynamic ones. Conventional methods keep a fixed set of early frames as long-range references; as the visual context shifts, those frames become outdated, which introduces bias and can lead to sink collapse. DySink instead maintains a compact memory bank from which it adaptively selects visually relevant historical frames, and adds a sink anomaly gate that detects excessive inter-head consensus in attention. The approach is reported to improve both generation quality and efficiency.
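To make the retrieval idea concrete, here is a minimal sketch of a compact memory bank that returns the stored frames most similar to the current one. It is an illustration under stated assumptions, not DySink's actual implementation: the class name FrameMemoryBank, the capacity and k parameters, the first-in-first-out eviction policy, and the use of cosine similarity over per-frame embeddings are all hypothetical choices that the summary does not specify.

```python
import math
from collections import deque


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class FrameMemoryBank:
    """Hypothetical compact memory bank holding (frame_index, embedding) pairs."""

    def __init__(self, capacity):
        # Assumption: the oldest entry is evicted once capacity is reached.
        self.entries = deque(maxlen=capacity)

    def add(self, frame_index, embedding):
        self.entries.append((frame_index, list(embedding)))

    def retrieve(self, query, k):
        """Return indices of the k stored frames most similar to the query."""
        ranked = sorted(
            self.entries,
            key=lambda entry: cosine(query, entry[1]),
            reverse=True,
        )
        return [frame_index for frame_index, _ in ranked[:k]]


bank = FrameMemoryBank(capacity=3)
bank.add(0, [1.0, 0.0])
bank.add(1, [0.0, 1.0])
bank.add(2, [0.7, 0.7])
bank.add(3, [0.9, 0.1])  # bank is full, so frame 0 is evicted

# Frames chosen as dynamic sinks for a query embedding close to [1, 0].
sinks = bank.retrieve([1.0, 0.0], k=2)
print(sinks)
```

In this toy run the selected sinks follow the current visual content rather than the earliest frames, which is the behavior the summary attributes to dynamic sinks.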

Key facts

  • DySink is a retrieval-based framework for autoregressive long video generation.
  • It replaces static early-frame sinks with dynamic frame sinks.
  • Traditional methods use fixed early frames that become outdated as the visual context shifts.
  • Static sinks can cause bias and sink collapse due to RoPE-induced phase re-alignment.
  • DySink maintains a compact memory bank.
  • It selects visually relevant historical frames adaptively.
  • A sink anomaly gate detects excessive inter-head consensus.
  • The framework improves generation quality and efficiency.
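
One way to picture the sink anomaly gate mentioned above is as a check on how similar the attention heads' distributions over the sink frames have become. The sketch below is an assumption-laden illustration, not the method from the paper: measuring consensus as mean pairwise cosine similarity, the function names, and the 0.95 threshold are all hypothetical.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def inter_head_consensus(head_attn):
    """Mean pairwise cosine similarity between per-head attention rows.

    head_attn holds one list per head; each list is that head's attention
    weights over the candidate sink frames (an assumed input layout).
    """
    pairs = list(combinations(head_attn, 2))
    if not pairs:
        return 0.0
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


def sink_anomaly_gate(head_attn, threshold=0.95):
    """Flag an anomaly when heads agree more than the (hypothetical) threshold."""
    return inter_head_consensus(head_attn) > threshold


# Heads attending to different frames: low consensus, gate stays closed.
diverse = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7], [0.2, 0.6, 0.2]]
# Heads all locked onto the same frame: high consensus, gate fires.
collapsed = [[0.9, 0.05, 0.05], [0.88, 0.07, 0.05], [0.91, 0.04, 0.05]]

print(sink_anomaly_gate(diverse), sink_anomaly_gate(collapsed))
```

A gate of this kind could trigger a refresh of the sink set when it fires; how DySink actually responds to a detected anomaly is not described in this summary.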

Entities

Institutions

  • arXiv

Sources