CHASM Dataset Exposes Covert Ads on Chinese Social Media

ai-technology · 2026-04-24

A team of researchers has unveiled CHASM, the first dataset designed to assess how well multimodal large language models (MLLMs) identify covert advertisements on social media. Covert ads masquerade as ordinary posts, which raises ethical and legal concerns. The dataset consists of 4,992 high-quality, anonymized, manually curated examples sourced from the Chinese platform Rednote and gathered under strict privacy and quality controls. It includes product-experience sharing posts that closely resemble covert ads, which makes detection harder. Zero-shot evaluations indicate that current MLLMs struggle to identify these ads, pointing to a gap in existing social media moderation benchmarks.
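
To make "zero-shot evaluation" concrete, here is a minimal sketch of how a binary covert-ad benchmark could be scored. It is not the CHASM authors' code: the prompt wording, the `Post` fields, the `keyword_stub` stand-in for a model, and the sample posts are all hypothetical and chosen only for illustration. A real run would replace the stub with a call to an actual MLLM that also receives the post's images.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable

# Hypothetical prompt; the wording used in the CHASM paper is not given in this brief.
ZERO_SHOT_PROMPT = (
    "You are reviewing a social media post (text and images). "
    "Is it a covert advertisement? Answer 'yes' or 'no'."
)


@dataclass
class Post:
    """One benchmark instance (field names are assumptions)."""
    text: str
    is_covert_ad: bool  # gold label
    image_paths: list = field(default_factory=list)


def parse_answer(reply: str) -> bool:
    """Map a free-form model reply to a yes/no prediction."""
    return reply.strip().lower().startswith("yes")


def evaluate(posts: Iterable[Post], model: Callable[[str, Post], str]) -> dict:
    """Score a model with no in-context examples (zero-shot)."""
    tp = fp = fn = tn = 0
    for post in posts:
        predicted = parse_answer(model(ZERO_SHOT_PROMPT, post))
        if predicted and post.is_covert_ad:
            tp += 1
        elif predicted and not post.is_covert_ad:
            fp += 1
        elif not predicted and post.is_covert_ad:
            fn += 1
        else:
            tn += 1
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


def keyword_stub(prompt: str, post: Post) -> str:
    """Hypothetical stand-in for an MLLM: flags any post mentioning a discount."""
    return "yes" if "discount" in post.text.lower() else "no"


# Invented sample posts, not drawn from CHASM.
SAMPLE_POSTS = [
    Post("Use my discount code at checkout", True),
    Post("Loved this cream, bought it myself", False),
    Post("My honest routine, link in bio", True),
    Post("Found a discount at the local store", False),
]

if __name__ == "__main__":
    print(evaluate(SAMPLE_POSTS, keyword_stub))
```

The stub shows why surface cues fall short: a real shopping anecdote that mentions a discount is flagged as an ad, while a covert ad with no such keyword slips through. That is the kind of confusion between product-experience posts and covert ads that the dataset is built to probe.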

Key facts

CHASM is the first dataset for evaluating MLLMs on covert ad detection.
The dataset contains 4,992 instances from the Chinese social media platform Rednote.
Instances are anonymized and manually curated under strict privacy protocols.
Covert advertisements are disguised as regular posts to mislead consumers.
Current benchmarks for LLMs in social media moderation overlook covert ads.
The dataset includes product-experience sharing posts that resemble covert ads.
Results show MLLMs perform poorly under zero-shot conditions.
Covert ads pose significant ethical and legal concerns.

Entities

Institutions

arXiv

Locations

China

Sources

arXiv cs.AI — 2026-04-23