OmniVL-Guard: Unified Framework for Vision-Language Forgery Detection
Researchers have introduced OmniVL-Guard, a framework for omnibus vision-language forgery detection and grounding. Existing techniques are largely limited to uni-modal or bi-modal settings, so they struggle with real-world misinformation that interleaves text, images, and videos. The framework also targets a 'difficulty bias': when veracity classification and fine-grained grounding are learned together, the simpler veracity classification task dominates the gradients and degrades grounding performance. To counter this, OmniVL-Guard uses balanced reinforcement learning built on two core designs, Self-Evolving CoT Generation and Adaptive Reward Scaling. The paper is available on arXiv.
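The summary does not say how Adaptive Reward Scaling is formulated, so the following is only a generic illustration of the difficulty-bias idea, not the paper's method. It assumes per-task rewards in [0, 1], tracks each task's running average with an exponential moving average, and gives more weight to the task the model still finds hard. The class name `AdaptiveRewardScaler`, the task names `"veracity"` and `"grounding"`, and the momentum value are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AdaptiveRewardScaler:
    """Hypothetical sketch (not OmniVL-Guard's formulation): re-weight
    per-task rewards so an easy task cannot dominate the learning signal."""

    momentum: float = 0.9  # EMA smoothing factor (assumed value)
    eps: float = 1e-6      # keeps every task weight strictly positive
    ema: dict = field(default_factory=dict)  # running mean reward per task

    def update(self, task_rewards: dict[str, float]) -> None:
        # Exponential moving average of each task's reward; the first
        # observation of a task initializes its average.
        for task, r in task_rewards.items():
            prev = self.ema.get(task, r)
            self.ema[task] = self.momentum * prev + (1 - self.momentum) * r

    def weights(self) -> dict[str, float]:
        # Difficulty = 1 - running mean reward (rewards assumed in [0, 1]),
        # so a task the policy already solves receives a small weight.
        difficulty = {t: 1.0 - m + self.eps for t, m in self.ema.items()}
        total = sum(difficulty.values())
        return {t: d / total for t, d in difficulty.items()}

    def combine(self, task_rewards: dict[str, float]) -> float:
        # Update the running averages, then return the weighted total reward.
        self.update(task_rewards)
        w = self.weights()
        return sum(w[t] * r for t, r in task_rewards.items())


if __name__ == "__main__":
    scaler = AdaptiveRewardScaler()
    # An easy, saturated task and a hard, low-reward task.
    total = scaler.combine({"veracity": 1.0, "grounding": 0.2})
    print(scaler.weights(), total)
```

With a naive sum, the saturated veracity reward would contribute most of the signal. Under this weighting almost all of the weight shifts to the harder grounding task. A real system would likely bound the weights so the easy task is never ignored entirely.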
Key facts
- OmniVL-Guard targets unified vision-language forgery detection and grounding.
- Existing methods are limited to uni-modal or bi-modal settings.
- The framework handles interleaved text, images, and videos.
- A 'difficulty bias' arises when the simpler veracity classification task dominates the gradients, weakening fine-grained grounding.
- OmniVL-Guard uses balanced reinforcement learning.
- Two core designs: Self-Evolving CoT Generation and Adaptive Reward Scaling.
- The paper is on arXiv with ID 2602.10687.
- The arXiv announcement type is replace-cross.
Entities
Institutions
- arXiv