SimInsert: Training-Free Video Object Insertion via Sparse Attention
SimInsert is a training-free paradigm for video object insertion that decouples the task into single-frame editing and semantic motion description. It leverages image-to-video diffusion models to propagate the single-frame edit across time while keeping the background unchanged and enabling text-driven interactions between the inserted object and its environment. Non-invasive guidance mechanisms enforce structural consistency, fuse the inserted object seamlessly with its surroundings at the boundaries, and counteract fidelity drift. The paper is available on arXiv under ID 2605.23245.
Key facts
- SimInsert is a training-free paradigm for video object insertion.
- It decouples the task into single-frame editing and semantic motion description.
- It uses image-to-video diffusion models for temporal propagation.
- It preserves background invariance.
- It enables text-driven interactions between inserted object and environment.
- It uses non-invasive guidance mechanisms for structural consistency.
- It facilitates seamless boundary fusion.
- It counteracts fidelity drift.
- The paper is on arXiv with ID 2605.23245.
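The key facts list background invariance and seamless boundary fusion but do not say how they are achieved. As a rough illustration only, the sketch below shows one generic way such properties are often obtained: per-frame masked compositing with a feathered (box-blurred) object mask. This is an assumed, swapped-in technique, not SimInsert's published mechanism, which the summary describes only as non-invasive guidance. The names `feather_mask`, `blend`, and `blend_video` are hypothetical, and frames are modeled as plain nested lists of floats.

```python
from typing import List

# Hypothetical representation: one frame (or one latent channel) as H x W floats.
Frame = List[List[float]]


def feather_mask(mask: Frame, radius: int) -> Frame:
    """Soften a binary object mask with a box blur so the edit fades out at its edges."""
    h, w = len(mask), len(mask[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total, count = 0.0, 0
            # Average the mask over a (2*radius+1)^2 window, clipped at the frame border.
            for y in range(max(0, i - radius), min(h, i + radius + 1)):
                for x in range(max(0, j - radius), min(w, j + radius + 1)):
                    total += mask[y][x]
                    count += 1
            out[i][j] = total / count
    return out


def blend(generated: Frame, background: Frame, soft_mask: Frame) -> Frame:
    """Per-pixel composite: weight 1 takes generated content, weight 0 keeps the background."""
    return [
        [m * g + (1.0 - m) * b for g, b, m in zip(g_row, b_row, m_row)]
        for g_row, b_row, m_row in zip(generated, background, soft_mask)
    ]


def blend_video(
    generated: List[Frame],
    background: List[Frame],
    masks: List[Frame],
    radius: int = 1,
) -> List[Frame]:
    """Composite every frame; pixels farther than `radius` from the mask keep the original background."""
    return [
        blend(g, b, feather_mask(m, radius))
        for g, b, m in zip(generated, background, masks)
    ]


# Usage: 7x7 frames, background 1.0, generated content 9.0, object mask on the central 3x3 block.
bg = [[1.0] * 7 for _ in range(7)]
gen = [[9.0] * 7 for _ in range(7)]
mask = [[1.0 if 2 <= i <= 4 and 2 <= j <= 4 else 0.0 for j in range(7)] for i in range(7)]
video = blend_video([gen, gen], [bg, bg], [mask, mask])
print(video[0][0][0], video[0][3][3])  # 1.0 9.0
```

Feathering trades a thin band of slightly altered background around the object for a smooth seam. Some diffusion-based editing pipelines apply a similar blend to the noisy latents at every denoising step rather than once to the final frames.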
Entities
Institutions
- arXiv