ARTFEED — Contemporary Art Intelligence

SimInsert: Training-Free Video Object Insertion via Sparse Attention

ai-technology · 2026-05-25

SimInsert is a training-free paradigm for video object insertion that decouples the task into two parts: single-frame editing and semantic motion description. It leverages image-to-video diffusion models to propagate the edit across time while keeping the background unchanged and enabling text-driven interactions between the inserted object and its environment. The approach uses non-invasive guidance mechanisms to enforce structural consistency, facilitate seamless boundary fusion, and counteract fidelity drift. The paper is available on arXiv under ID 2605.23245.
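The decoupling described above can be illustrated with a minimal sketch. It is not the paper's implementation: the names `InsertionRequest`, `ImageToVideo`, `insert_object`, and `repeat_first_frame` are hypothetical, frames are toy nested lists, and the "model" is a placeholder that only repeats the edited frame. The point is the interface: the object is inserted once, in a single frame, and the motion is supplied separately as text.

```python
from dataclasses import dataclass
from typing import Callable, List

# Toy stand-in for an image: rows of grayscale pixel values.
Frame = List[List[float]]


@dataclass
class InsertionRequest:
    """Hypothetical container for the two decoupled inputs."""
    source_video: List[Frame]   # original clip
    edited_first_frame: Frame   # first frame with the object already inserted
    motion_prompt: str          # text describing how the object should move


# Assumed signature for an image-to-video model:
# (first frame, prompt, number of frames) -> generated frames.
ImageToVideo = Callable[[Frame, str, int], List[Frame]]


def insert_object(request: InsertionRequest, i2v: ImageToVideo) -> List[Frame]:
    """Propagate the single-frame edit across the clip using the given model."""
    num_frames = len(request.source_video)
    frames = i2v(request.edited_first_frame, request.motion_prompt, num_frames)
    if len(frames) != num_frames:
        raise ValueError("model returned an unexpected number of frames")
    return frames


def repeat_first_frame(first: Frame, prompt: str, n: int) -> List[Frame]:
    """Placeholder model: copies the edited frame n times and ignores the prompt.
    A real image-to-video diffusion model would be called here instead."""
    return [[row[:] for row in first] for _ in range(n)]
```

Because the model is passed in as a callable, any image-to-video backend with a compatible signature could be swapped in without retraining, which is the sense in which such a pipeline is training-free.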

Key facts

  • SimInsert is a training-free paradigm for video object insertion.
  • It decouples the task into single-frame editing and semantic motion description.
  • It uses image-to-video diffusion models for temporal propagation.
  • It preserves background invariance.
  • It enables text-driven interactions between the inserted object and its environment.
  • It uses non-invasive guidance mechanisms for structural consistency.
  • It facilitates seamless boundary fusion.
  • It counteracts fidelity drift.
  • The paper is on arXiv with ID 2605.23245.
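To make "background invariance" and "boundary fusion" concrete, here is a small sketch of a much simpler, swapped-in technique: per-pixel blending with a soft mask. This is an assumption for illustration only. The article says SimInsert uses non-invasive guidance mechanisms and does not describe pixel-space compositing; the function `blend_frame` and the mask convention are hypothetical.

```python
from typing import List

# Toy stand-in for an image: rows of grayscale pixel values.
Frame = List[List[float]]


def blend_frame(original: Frame, generated: Frame, mask: Frame) -> Frame:
    """Blend two frames per pixel.

    mask = 1.0 keeps the generated pixel (object region),
    mask = 0.0 keeps the original pixel (background stays untouched),
    values in between feather the boundary between the two.
    """
    return [
        [m * g + (1.0 - m) * o for o, g, m in zip(o_row, g_row, m_row)]
        for o_row, g_row, m_row in zip(original, generated, mask)
    ]


# Usage: background pixel, boundary pixel, object pixel.
original = [[0.2, 0.2, 0.2]]
generated = [[0.8, 0.8, 0.8]]
mask = [[0.0, 0.5, 1.0]]
blended = blend_frame(original, generated, mask)
```

Where the mask is zero the output equals the original frame, so the background is preserved by construction; the intermediate mask values give a gradual transition at the object's edge rather than a hard seam.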

Entities

Institutions

  • arXiv

Sources