ARTFEED — Contemporary Art Intelligence

ImmersedPrivacy: Evaluating VLM Privacy Awareness in Physical Environments

ai-technology · 2026-05-09

A new study introduces ImmersedPrivacy, an interactive audio-visual evaluation framework for assessing privacy awareness in Vision-Language Models (VLMs) deployed as the autonomous cognitive cores of embodied assistants. Unlike digital chatbots, these agents operate in intimate spaces such as homes and hospitals, where they can physically observe and manipulate privacy-sensitive information and artifacts. Current benchmarks, by contrast, are limited to unimodal, text-based representations that fail to capture these real-world demands.

ImmersedPrivacy uses a Unity-based simulator to create realistic physical environments and evaluates privacy awareness across three progressive tiers: identifying sensitive items in cluttered scenes, adapting to shifting social contexts, and resolving conflicts between privacy and task objectives. The framework aims to bridge the gap between existing evaluations and the physical grounding required for safe deployment. The study is published on arXiv under identifier 2605.05340.
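To make the three-tier structure concrete, the sketch below shows how per-tier results from such an evaluation might be aggregated. All names (`Tier`, `Episode`, `tier_scores`) and the pass/fail scoring scheme are illustrative assumptions, not the paper's actual API or metric:

```python
# Hypothetical sketch of a three-tier privacy evaluation harness.
# Tier names mirror the framework's described tiers; everything else
# (class names, pass/fail metric) is an assumption for illustration.
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    IDENTIFY = 1  # spot privacy-sensitive items in a cluttered scene
    ADAPT = 2     # adjust behavior as the social context shifts
    RESOLVE = 3   # trade off privacy against the task objective


@dataclass
class Episode:
    tier: Tier
    passed: bool


def tier_scores(episodes: list[Episode]) -> dict[Tier, float]:
    """Aggregate a simple pass rate per tier (illustrative metric)."""
    totals: dict[Tier, int] = {}
    passes: dict[Tier, int] = {}
    for ep in episodes:
        totals[ep.tier] = totals.get(ep.tier, 0) + 1
        passes[ep.tier] = passes.get(ep.tier, 0) + int(ep.passed)
    return {t: passes[t] / totals[t] for t in totals}


episodes = [
    Episode(Tier.IDENTIFY, True),
    Episode(Tier.IDENTIFY, False),
    Episode(Tier.ADAPT, True),
    Episode(Tier.RESOLVE, False),
]
print(tier_scores(episodes))
# e.g. {Tier.IDENTIFY: 0.5, Tier.ADAPT: 1.0, Tier.RESOLVE: 0.0}
```

Reporting scores per tier rather than as a single number preserves the framework's progressive structure: a model may excel at identifying sensitive items yet still fail when privacy conflicts with its task objective.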

Key facts

  • ImmersedPrivacy is an interactive audio-visual evaluation framework for VLMs.
  • It simulates realistic physical environments using a Unity-based simulator.
  • The framework tests privacy awareness across three progressive tiers.
  • Current benchmarks are limited to unimodal, text-based representations.
  • VLMs are increasingly deployed as autonomous cognitive cores for embodied assistants.
  • These agents operate in intimate spaces like homes and hospitals.
  • They have physical agency to observe and manipulate privacy-sensitive information.
  • The study is published on arXiv with identifier 2605.05340.

Entities

Institutions

  • arXiv

Sources