New AI Framework Improves In-Context Object Localization Without Category Supervision
A research paper introduces a two-stage training framework for in-context object localization (ICL) that operates without category supervision. Existing vision-language models rely on category labels, which introduces bias and limits them to nameable objects. The new method instead explicitly optimizes attention between the support bounding boxes and the query image, then refines localization with reinforcement learning. The approach aims to enable category-agnostic, visually grounded localization for applications such as image editing and personalized search. The paper is available on arXiv under ID 2605.31145.
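To make "attention between a support box and a query image" concrete, here is a minimal Python sketch. It is not the paper's architecture. It assumes hypothetical, pre-extracted feature vectors: the features inside the support bounding box are averaged into one prototype, and scaled dot-product attention scores that prototype against every patch of the query image. The attention peaks at the patch that looks most like the support object, and no category label is involved at any point. All function names are illustrative.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def mean_pool(features: Sequence[Vector]) -> List[float]:
    """Average the (hypothetical) feature vectors that fall inside the support box."""
    dim = len(features[0])
    return [sum(f[d] for f in features) / len(features) for d in range(dim)]


def support_to_query_attention(support_box_feats: Sequence[Vector],
                               query_patch_feats: Sequence[Vector]) -> List[float]:
    """Scaled dot-product attention from one support prototype to every query patch.

    Returns a probability distribution over the query patches (it sums to 1).
    """
    prototype = mean_pool(support_box_feats)
    scale = math.sqrt(len(prototype))
    logits = [sum(p * q for p, q in zip(prototype, patch)) / scale
              for patch in query_patch_feats]
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def localize(support_box_feats: Sequence[Vector],
             query_patch_feats: Sequence[Vector]) -> int:
    """Return the index of the query patch that receives the most attention."""
    attn = support_to_query_attention(support_box_feats, query_patch_feats)
    return max(range(len(attn)), key=attn.__getitem__)


if __name__ == "__main__":
    # Toy 2-D features: the support box covers two patches resembling [1, 0].
    support = [[1.0, 0.1], [0.9, 0.0]]
    # Query image flattened into four patches; patch 2 matches the support object.
    query = [[0.0, 1.0], [0.2, 0.8], [1.0, 0.0], [-1.0, 0.0]]
    print(localize(support, query))
```

In a trained model the features and attention would come from learned layers, and the output would be a box rather than a single patch index. The sketch only shows why a visual prototype can drive localization where a category name would otherwise be needed.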
Key facts
- In-context localization (ICL) finds a target object in a query image, given support examples of that object, without any parameter updates at inference time.
- Existing methods require explicit category supervision, limiting applicability to unnamed or instance-specific objects.
- The new framework uses a two-stage training process to optimize in-context attention without category labels.
- Reinforcement learning further refines localization performance.
- The approach targets applications such as image editing, personalized visual search, and retrieval.
- The paper is published on arXiv with ID 2605.31145.
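The summary does not say which reward the reinforcement-learning stage optimizes. A common choice for box prediction, assumed here purely for illustration, is the intersection-over-union (IoU) between the predicted box and the ground-truth box of the target in the query image. The sketch below computes such a reward for boxes given as hypothetical `(x1, y1, x2, y2)` coordinates; `localization_reward` is an illustrative name, not the paper's.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x1 < x2 and y1 < y2


def box_area(box: Box) -> float:
    """Area of an axis-aligned box; degenerate (inverted) boxes have area 0."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def iou(pred: Box, target: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes, in [0, 1]."""
    ix1, iy1 = max(pred[0], target[0]), max(pred[1], target[1])
    ix2, iy2 = min(pred[2], target[2]), min(pred[3], target[3])
    inter = box_area((ix1, iy1, ix2, iy2))
    union = box_area(pred) + box_area(target) - inter
    return inter / union if union > 0 else 0.0


def localization_reward(pred: Box, target: Box) -> float:
    """Hypothetical RL reward: the IoU between predicted and ground-truth boxes."""
    return iou(pred, target)


if __name__ == "__main__":
    gt = (10.0, 10.0, 50.0, 50.0)
    print(localization_reward((10.0, 10.0, 50.0, 50.0), gt))  # perfect match
    print(localization_reward((30.0, 30.0, 70.0, 70.0), gt))  # partial overlap
    print(localization_reward((60.0, 60.0, 80.0, 80.0), gt))  # no overlap
```

A dense, bounded reward like IoU gives partial credit for near misses, which is one reason it is a popular signal for refining box predictions with policy-gradient methods.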