ARTFEED — Contemporary Art Intelligence

EgoPro-Bench: Benchmarking Proactive AI in Egocentric Video

ai-technology · 2026-05-11

Researchers have released EgoPro-Bench, a new benchmark for training and evaluating the proactive interaction abilities of Multimodal Large Language Models (MLLMs) on streaming egocentric video. The benchmark targets a key shortcoming of current MLLMs: they are largely reactive, neither continuously monitoring their surroundings nor offering assistance unprompted. Unlike earlier benchmarks, which are limited to alert scenarios and ignore personalized context, EgoPro-Bench uses simulated user profiles to generate diverse user intentions and high-quality human-machine interaction (HMI) data across 12 distinct domains. The evaluation set contains 2,400 videos and the training set more than 12,000. The researchers also introduce a tailored evaluation protocol and metrics, along with proactive interaction models designed for efficient reasoning and low-latency responses. The work is described in a paper available on arXiv (2605.07299).

Key facts

  • EgoPro-Bench is a benchmark for proactive interaction in egocentric video streams.
  • It includes 2,400 evaluation videos and over 12,000 training videos.
  • The benchmark covers 12 distinct domains using simulated user profiles.
  • Existing MLLMs are primarily reactive and fail to proactively assist users.
  • Previous benchmarks are confined to alert scenarios and neglect personalized context.
  • The benchmark aims to evaluate precise timing of human-machine interactions.
  • Models are trained for efficient reasoning and low-latency interaction.
  • The paper is available on arXiv with ID 2605.07299.
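The facts above note that the benchmark evaluates the precise timing of proactive interactions, not just their content. As a rough illustration of what timing-aware scoring can look like, here is a minimal sketch that matches predicted interactions to reference ones within a time window and reports precision, recall, and F1. All names, the window size, and the matching rule are assumptions for illustration; the paper's actual protocol and metrics may differ.

```python
# Hypothetical sketch of a timing-aware proactivity metric.
# The Interaction class, the +/- window matching rule, and the
# 2-second default are illustrative assumptions, not the paper's
# actual evaluation protocol.

from dataclasses import dataclass


@dataclass
class Interaction:
    time_s: float  # timestamp in the video stream, in seconds
    text: str      # the model's (or reference) utterance


def score_proactive_timing(predictions, references, window_s=2.0):
    """Greedily match each reference interaction to at most one
    unmatched prediction within +/- window_s seconds, then report
    precision, recall, and F1 over the matches."""
    matched_preds = set()
    hits = 0
    for ref in references:
        for i, pred in enumerate(predictions):
            if i in matched_preds:
                continue
            if abs(pred.time_s - ref.time_s) <= window_s:
                matched_preds.add(i)
                hits += 1
                break
    precision = hits / len(predictions) if predictions else 0.0
    recall = hits / len(references) if references else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}
```

Under this sketch, a model that speaks up too early or too late is penalized the same way as one that stays silent, which is one simple way to make "precise timing" measurable.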

Entities

Institutions

  • arXiv

Sources