CloudWeb Attack Hijacks Remote Sensing Vision-Language RAG via Atmospheric Patterns

ai-technology · 2026-05-11

CloudWeb is an adversarial attack on the retrieval stage of vision-language multimodal retrieval-augmented generation (RAG) systems for remote sensing. Unlike prior attacks that manipulate the retrieval corpus or end-task predictions, CloudWeb modifies only the input image by overlaying parameterized cloud- and haze-like patterns. These patterns are optimized to pull the adversarial image embedding toward target atmospheric evidence while suppressing source-scene evidence, enforcing rank separation, and regularizing naturalness and coverage. The retriever, generator, and knowledge base stay fixed at deployment, so the attack shows how exposed evidence retrieval is to input-space threats. The work highlights the need for robust defenses in remote sensing multimodal RAG systems.
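The objective described above combines several terms, and a small sketch may help make that concrete. The Python below is an illustrative assumption, not the paper's formulation: the function name `cloudweb_style_loss`, the cosine-similarity scoring, the hinge form of the rank term, the total-variation naturalness proxy, and all weights and thresholds are hypothetical choices made for this example.

```python
import numpy as np

def cosine(vec, rows):
    """Cosine similarity between one vector and each row of a matrix."""
    vec = vec / np.linalg.norm(vec)
    rows = rows / np.linalg.norm(rows, axis=-1, keepdims=True)
    return rows @ vec

def cloudweb_style_loss(adv_emb, target_embs, source_embs, alpha_mask,
                        margin=0.1, w_nat=0.1, w_cov=0.1, max_coverage=0.3):
    """Hypothetical composite loss; lower values favor the attacker.

    adv_emb:     embedding of the perturbed image, shape (d,)
    target_embs: embeddings of target atmospheric evidence, shape (n_t, d)
    source_embs: embeddings of source-scene evidence, shape (n_s, d)
    alpha_mask:  per-pixel opacity of the overlay in [0, 1], shape (H, W)
    """
    sim_t = cosine(adv_emb, target_embs)
    sim_s = cosine(adv_emb, source_embs)

    # Pull the adversarial embedding toward target evidence.
    pull = -sim_t.mean()
    # Suppress similarity to source-scene evidence.
    suppress = sim_s.mean()
    # Rank separation (assumed hinge form): the weakest target should
    # outscore the strongest source by at least `margin`.
    rank = max(0.0, margin + sim_s.max() - sim_t.min())
    # Naturalness proxy (assumption): total variation of the opacity mask,
    # which penalizes sharp, unnatural edges in the overlay.
    tv = (np.abs(np.diff(alpha_mask, axis=0)).mean()
          + np.abs(np.diff(alpha_mask, axis=1)).mean())
    # Coverage proxy (assumption): penalize mean opacity above a budget.
    cov = max(0.0, alpha_mask.mean() - max_coverage)

    return pull + suppress + rank + w_nat * tv + w_cov * cov
```

In a real attack the embeddings would come from the fixed retriever's image and text encoders and the loss would be minimized over the overlay parameters with a differentiable framework; NumPy is used here only to keep the arithmetic of each term visible.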

Key facts

CloudWeb is an atmospheric retrieval hijacking attack for remote sensing multimodal RAG.
It modifies only the input image, keeping the retriever, generator, and knowledge base fixed.
The attack overlays cloud- and haze-like patterns on remote sensing images.
Optimization objectives include pulling embeddings toward target evidence and suppressing source evidence.
Existing adversarial studies mainly target the retrieval corpus or end-task predictions.
Input-space threats to evidence retrieval in remote sensing RAG are underexplored.
The attack is introduced in arXiv paper 2605.07273.
CloudWeb exposes vulnerabilities in vision-language retrievers for remote sensing.
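
The overlay step mentioned in the key facts can be sketched as simple alpha compositing. This is a minimal illustration under stated assumptions, not the paper's cloud model: the Gaussian-blob parameterization, the function names `cloud_alpha` and `overlay_clouds`, and the near-white `cloud_color` are all hypothetical.

```python
import numpy as np

def cloud_alpha(height, width, centers, sigmas, opacities):
    """Build a soft opacity mask as a sum of Gaussian blobs (assumed model)."""
    ys, xs = np.mgrid[0:height, 0:width]
    alpha = np.zeros((height, width))
    for (cy, cx), sigma, opacity in zip(centers, sigmas, opacities):
        dist_sq = (ys - cy) ** 2 + (xs - cx) ** 2
        alpha += opacity * np.exp(-dist_sq / (2.0 * sigma ** 2))
    # Keep opacity in the valid compositing range.
    return np.clip(alpha, 0.0, 1.0)

def overlay_clouds(image, alpha, cloud_color=0.95):
    """Alpha-blend a near-white haze over an (H, W, C) image in [0, 1]."""
    a = alpha[..., None]  # broadcast the mask over the channel axis
    return (1.0 - a) * image + a * cloud_color

# Example: one blob centered on a dark 8x8 RGB tile.
image = np.zeros((8, 8, 3))
alpha = cloud_alpha(8, 8, centers=[(4, 4)], sigmas=[2.0], opacities=[0.8])
hazy = overlay_clouds(image, alpha)
```

Here the blob centers, widths, and opacities play the role of the attack's tunable parameters: only these values change during optimization, while the retriever, generator, and knowledge base are left untouched.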
