New AI Framework Improves In-Context Object Localization Without Category Supervision

ai-technology · 2026-06-01

A research paper introduces a two-stage training framework for in-context object localization (ICL) that operates without category supervision. The method explicitly optimizes attention between support bounding boxes and query images using reinforcement learning, addressing limitations of existing vision-language models that rely on category labels and introduce bias. The approach aims to enable category-agnostic, visually grounded localization for applications like image editing and personalized search. The paper is available on arXiv under ID 2605.31145.

Key facts

In-context localization (ICL) locates, in a query image, a target object specified by support examples, without training or parameter updates.
Existing methods require explicit category supervision, so they apply poorly to unnamed or instance-specific objects.
The new framework uses a two-stage training process to optimize in-context attention without category labels.
Reinforcement learning further refines localization performance.
The approach targets applications such as image editing, personalized visual search, and retrieval.
The paper is published on arXiv with ID 2605.31145.
