SimInsert: Training-Free Video Object Insertion via Sparse Attention

ai-technology · 2026-05-25

SimInsert is a training-free paradigm for video object insertion that decouples the task into single-frame editing and semantic motion description. It leverages image-to-video diffusion models to propagate edits temporally while preserving background invariance and enabling text-driven interactions. The approach uses non-invasive guidance mechanisms to enforce structural consistency, facilitate seamless boundary fusion, and counteract fidelity drift. The paper is available on arXiv under ID 2605.23245.

Key facts

SimInsert is a training-free paradigm for video object insertion.
It decouples the task into single-frame editing and semantic motion description.
It uses image-to-video diffusion models for temporal propagation.
It preserves background invariance.
It enables text-driven interactions between the inserted object and its environment.
It uses non-invasive guidance mechanisms for structural consistency.
It facilitates seamless boundary fusion.
It counteracts fidelity drift.
The paper is on arXiv with ID 2605.23245.
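
The two-stage decoupling listed above can be pictured with a minimal Python sketch. It is only an illustration of the control flow, not the paper's implementation: the names `InsertionRequest`, `insert_object`, `edit_frame`, and `propagate` are hypothetical, frames are modeled as plain nested lists, and the guidance mechanisms mentioned in the summary are not modeled at all. The sketch also assumes the edit is applied to the first frame, which the summary does not state.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Toy stand-in for an image: rows of pixel values (assumption for illustration).
Frame = List[List[int]]


@dataclass
class InsertionRequest:
    frames: Sequence[Frame]  # source video
    object_prompt: str       # what to insert, consumed by the single-frame editor
    motion_prompt: str       # semantic motion description for the image-to-video model


def insert_object(
    request: InsertionRequest,
    edit_frame: Callable[[Frame, str], Frame],
    propagate: Callable[[Frame, str, int], Sequence[Frame]],
) -> List[Frame]:
    """Compose the two decoupled stages; both callables are hypothetical plug-ins."""
    if not request.frames:
        raise ValueError("video must contain at least one frame")

    # Stage 1: single-frame editing (assumed here to target the first frame).
    edited_first = edit_frame(request.frames[0], request.object_prompt)

    # Stage 2: an image-to-video model carries the edit through time,
    # conditioned on the text description of the motion.
    generated = list(propagate(edited_first, request.motion_prompt, len(request.frames)))

    if len(generated) != len(request.frames):
        raise ValueError("propagation must return one frame per input frame")
    return generated
```

Because both stages are passed in as callables, any off-the-shelf image editor and image-to-video model could be swapped in without retraining, which is the sense in which such a pipeline is training-free.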
