CoReVAD: Training-Free Video Anomaly Detection with VLMs
Researchers have introduced CoReVAD, a contextual reasoning framework for training-free video anomaly detection (VAD). Unlike existing approaches that rely on dedicated training steps or on external Large Language Models (LLMs), CoReVAD uses a single frozen Vision-Language Model (VLM) to generate both anomaly scores and temporal descriptions directly, with no additional training. It also introduces a Local Respo mechanism to reduce noise in the generated outputs. The approach targets two drawbacks of conventional VAD methods, domain dependency and high training cost, and provides human-readable reasoning alongside scalar anomaly scores. The paper is available on arXiv (2605.23116v1).
Key facts
- CoReVAD is a training-free video anomaly detection framework.
- It uses a single frozen Vision-Language Model (VLM).
- It generates anomaly scores and temporal descriptions directly.
- It introduces a Local Respo mechanism to reduce noise.
- It avoids additional training steps like instruction tuning or verbalized learning.
- It does not require external Large Language Models (LLMs).
- It addresses domain dependency and high training costs of existing VAD methods.
- The paper is available on arXiv with ID 2605.23116v1.
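To make the general pattern concrete, here is a minimal Python sketch of a training-free scoring loop. A frozen VLM is queried once per video segment, its generated text is parsed into a scalar score and a description, and the scores are then smoothed over neighbouring segments. Everything here is an assumption for illustration: the `SCORE: <x> | DESC: <text>` response format, the helper names (`parse_response`, `smooth_scores`, `score_video`), and the `query_vlm` callable (stubbed in the usage example) are hypothetical. The centered moving average is a generic noise-reduction stand-in, not the paper's Local Respo mechanism, whose details are not given here.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# HYPOTHETICAL response format; the paper's actual prompt/output format is not described here.
RESPONSE_PATTERN = re.compile(r"SCORE:\s*([0-9]*\.?[0-9]+)\s*\|\s*DESC:\s*(.*)", re.DOTALL)


@dataclass
class SegmentResult:
    start_s: float    # segment start time in seconds
    end_s: float      # segment end time in seconds
    score: float      # anomaly score in [0, 1]
    description: str  # free-text temporal description generated by the VLM


def parse_response(text: str) -> Tuple[float, str]:
    """Extract (score, description) from one generated response."""
    match = RESPONSE_PATTERN.search(text)
    if match is None:
        return 0.0, ""  # assumption: unparseable output is treated as normal
    score = min(max(float(match.group(1)), 0.0), 1.0)  # clamp to [0, 1]
    return score, match.group(2).strip()


def smooth_scores(scores: Sequence[float], radius: int = 1) -> List[float]:
    """Centered moving average over neighbouring segments.

    Generic stand-in for noise reduction; NOT the paper's Local Respo mechanism.
    """
    if radius < 0:
        raise ValueError("radius must be non-negative")
    n = len(scores)
    smoothed = []
    for i in range(n):
        window = scores[max(0, i - radius): min(n, i + radius + 1)]
        smoothed.append(sum(window) / len(window))
    return smoothed


def score_video(
    segments: Sequence[Tuple[float, float, object]],
    query_vlm: Callable[[object], str],
    radius: int = 1,
) -> List[SegmentResult]:
    """Query a frozen (hypothetical) VLM once per segment, then smooth the raw scores."""
    raw = []
    for start_s, end_s, frames in segments:
        score, description = parse_response(query_vlm(frames))
        raw.append(SegmentResult(start_s, end_s, score, description))
    smoothed = smooth_scores([r.score for r in raw], radius)
    return [
        SegmentResult(r.start_s, r.end_s, s, r.description)
        for r, s in zip(raw, smoothed)
    ]


if __name__ == "__main__":
    # Stub "VLM": each segment's "frames" are simply a canned response string.
    canned = [
        (0.0, 2.0, "SCORE: 0.1 | DESC: people walking on a sidewalk"),
        (2.0, 4.0, "SCORE: 0.9 | DESC: a person falls to the ground"),
        (4.0, 6.0, "SCORE: 0.2 | DESC: bystanders gather around"),
    ]
    for seg in score_video(canned, query_vlm=lambda frames: frames):
        print(f"{seg.start_s:.0f}-{seg.end_s:.0f}s  {seg.score:.2f}  {seg.description}")
```

Because the VLM stays frozen, the only adjustable pieces in a sketch like this are the prompt, the parser, and the smoothing radius. A larger `radius` suppresses isolated spikes but can also blur short anomalies, so in practice it would need to be chosen relative to segment length.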
Entities
Institutions
- arXiv