ARTFEED — Contemporary Art Intelligence

CloudWeb Attack Hijacks Remote Sensing Vision-Language RAG via Atmospheric Patterns

ai-technology · 2026-05-11

CloudWeb is an adversarial attack on the retrieval stage of vision-language multimodal retrieval-augmented generation (RAG) systems for remote sensing. Unlike prior attacks, which manipulate the retrieval corpus or end-task predictions, CloudWeb modifies only the input image, overlaying parameterized cloud- and haze-like patterns. The patterns are optimized to pull the adversarial image embedding toward target atmospheric evidence and suppress source-scene evidence, while enforcing rank separation and regularizing naturalness and coverage. The retriever, generator, and knowledge base all stay fixed at deployment, so the attack shows how vulnerable evidence retrieval is to input-space threats. The work highlights the need for robust defenses in remote sensing multimodal RAG systems.
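The objectives named above can be combined into a single scalar loss. The following is a minimal sketch under stated assumptions, not the paper's formulation: the cosine-similarity scoring, the hinge form of the rank term, the weights, and every name (`hijack_loss`, `margin`, `w_rank`, `w_cov`, `max_coverage`) are hypothetical choices made for illustration, and the naturalness term is omitted.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def hijack_loss(adv_emb, target_emb, source_emb, mean_opacity,
                margin=0.2, w_rank=1.0, w_cov=0.1, max_coverage=0.3):
    """Illustrative composite loss (all terms and weights are assumptions).

    adv_emb      : embedding of the perturbed image
    target_emb   : embedding of the target atmospheric evidence
    source_emb   : embedding of the true source-scene evidence
    mean_opacity : average opacity of the overlay, used as a coverage proxy
    """
    sim_target = cosine(adv_emb, target_emb)
    sim_source = cosine(adv_emb, source_emb)

    pull = 1.0 - sim_target                             # move toward target evidence
    push = sim_source                                   # move away from source evidence
    rank = max(0.0, margin - (sim_target - sim_source)) # hinge: target should outrank source
    coverage = max(0.0, mean_opacity - max_coverage)    # penalize overlays that cover too much

    return pull + push + w_rank * rank + w_cov * coverage


# An embedding aligned with the target scores lower than one aligned with the source.
good = hijack_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], mean_opacity=0.1)
bad = hijack_loss([0.0, 1.0], [1.0, 0.0], [0.0, 1.0], mean_opacity=0.1)
```

In a real attack the embeddings would come from the fixed retriever's image encoder, and the loss would be minimized with respect to the overlay parameters only.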

Key facts

  • CloudWeb is an atmospheric retrieval hijacking attack for remote sensing multimodal RAG.
  • It modifies only the input image, keeping the retriever, generator, and knowledge base fixed.
  • The attack overlays cloud- and haze-like patterns on remote sensing images.
  • Optimization objectives include pulling embeddings toward target evidence and suppressing source evidence.
  • Existing adversarial studies mainly target the retrieval corpus or end-task predictions.
  • Input-space threats to evidence retrieval in remote sensing RAG are underexplored.
  • The attack is introduced in arXiv paper 2605.07273.
  • CloudWeb exposes vulnerabilities in vision-language retrievers for remote sensing.
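
The cloud- and haze-like overlay listed above can be pictured as alpha blending a soft opacity mask onto the image. This is a rough sketch, assuming a grayscale image stored as nested lists and a mask built from Gaussian blobs; the blob parameterization and the names `cloud_mask` and `overlay` are hypothetical and are not taken from the paper.

```python
import math


def cloud_mask(height, width, blobs):
    """Build an opacity mask in [0, 1] from Gaussian blobs.

    blobs: list of (center_y, center_x, sigma, strength) tuples (assumed form).
    """
    mask = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            opacity = 0.0
            for cy, cx, sigma, strength in blobs:
                dist_sq = (y - cy) ** 2 + (x - cx) ** 2
                opacity += strength * math.exp(-dist_sq / (2.0 * sigma * sigma))
            mask[y][x] = min(1.0, max(0.0, opacity))  # clamp to a valid opacity
    return mask


def overlay(image, mask, cloud_value=1.0):
    """Alpha-blend a bright 'cloud' value onto a grayscale image in [0, 1]."""
    return [
        [(1.0 - a) * pixel + a * cloud_value for pixel, a in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


# A dark 5x5 scene with one blob centered at pixel (2, 2).
scene = [[0.0] * 5 for _ in range(5)]
mask = cloud_mask(5, 5, [(2, 2, 1.0, 0.8)])
clouded = overlay(scene, mask)
mean_opacity = sum(sum(row) for row in mask) / 25.0
```

The blob centers, widths, and strengths are the kind of parameters an optimizer could adjust while the rest of the pipeline stays fixed, and `mean_opacity` is one simple way to measure coverage.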

Entities

Institutions

  • arXiv

Sources