New Benchmark for Multimodal Claim Extraction from Social Media Introduced

ai-technology · 2026-04-22

A new benchmark for multimodal claim extraction from social media targets misinformation that pairs short text with images such as memes and screenshots. The work, posted as arXiv:2604.16311v1, presents the first such benchmark: posts containing text and one or more images, annotated with gold-standard claims derived from real-world fact-checkers.

Automated Fact-Checking (AFC) traditionally depends on claim extraction as its first step, but existing methods have largely ignored the multimodal nature of contemporary misinformation. Social media posts often blend informal text with visual elements, which creates challenges not seen in text-only claim extraction or in well-studied multimodal tasks like image captioning.

The researchers evaluated state-of-the-art multimodal LLMs (MLLMs) using a three-part framework that assesses semantic alignment, faithfulness, and decontextualization. They found that baseline MLLMs struggle to model rhetorical intent and contextual cues. To address these limitations, the team introduced MICE, an intent-aware framework intended to improve claim extraction from mixed-media content. The arXiv announcement type is "cross", meaning the paper is cross-listed in more than one subject category.

Key facts

Automated Fact-Checking (AFC) relies on claim extraction as a first step
Existing methods largely overlook the multimodal nature of today's misinformation
Social media posts often combine short, informal text with images such as memes, screenshots, and photos
This creates challenges that differ from both text-only claim extraction and well-studied multimodal tasks
The work presents the first benchmark for multimodal claim extraction from social media
The benchmark consists of posts containing text and one or more images
Posts are annotated with gold-standard claims derived from real-world fact-checkers
Researchers evaluated state-of-the-art multimodal LLMs (MLLMs) under a three-part evaluation framework
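
As a rough illustration of the setup described above, a benchmark item and a per-claim scorecard could be modeled as follows. This is a minimal sketch, not the paper's data format or metrics: the class names, field names, the 0-to-1 score range, and the averaging helper are all assumptions made for illustration. Only the ingredients themselves (post text, one or more images, gold-standard claims, and the three evaluation dimensions) come from the summary.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class BenchmarkPost:
    """Hypothetical record for one benchmark item (field names are assumed)."""
    text: str                      # short, informal post text
    image_paths: tuple[str, ...]   # one or more images (memes, screenshots, photos)
    gold_claims: tuple[str, ...]   # claims derived from real-world fact-checkers

    def __post_init__(self) -> None:
        # The summary says each post has text and one or more images.
        if len(self.image_paths) < 1:
            raise ValueError("expected one or more images per post")


@dataclass(frozen=True)
class ClaimScores:
    """Hypothetical scorecard with one value per evaluation dimension.

    The 0-to-1 range is an assumption; the summary does not say how
    each dimension is actually measured.
    """
    semantic_alignment: float
    faithfulness: float
    decontextualization: float


def average_scores(scores: list[ClaimScores]) -> ClaimScores:
    """Average each dimension separately across a list of scorecards."""
    return ClaimScores(
        semantic_alignment=mean(s.semantic_alignment for s in scores),
        faithfulness=mean(s.faithfulness for s in scores),
        decontextualization=mean(s.decontextualization for s in scores),
    )
```

Keeping the three dimensions as separate fields, rather than collapsing them into one number, mirrors the three-part framing: a model can be well aligned with the gold claim yet still fail to produce a claim that stands on its own without the original post.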
