SARE: Adaptive Reasoning for Fine-Grained Visual Recognition
SARE (Sample-wise Adaptive Reasoning) is a proposed framework for training-free Fine-Grained Visual Recognition (FGVR) with Large Vision-Language Models (LVLMs). It targets two limitations of existing approaches: they apply the same inference procedure to every sample regardless of difficulty, and they do not reuse experience gained from past errors. SARE uses a cascaded design that combines fast candidate retrieval with adaptive reasoning, a combination reported to improve both accuracy and efficiency. The paper is available on arXiv under ID 2603.17729.
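As a rough illustration of the cascaded idea described above, the following Python sketch routes each sample through a fast retrieval stage and escalates to a slower reasoning stage only when the fast result looks ambiguous. It is not the paper's implementation. The function names (`retrieve_candidates`, `cascaded_predict`), the `slow_reasoner` callback, and the top-1 versus top-2 score-margin rule are all assumptions made for this example, and the error-experience mechanism is not modeled.

```python
from typing import Callable, Dict, List, Tuple


def retrieve_candidates(scores: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
    """Fast stage (hypothetical): return the k highest-scoring labels."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


def cascaded_predict(
    scores: Dict[str, float],
    slow_reasoner: Callable[[List[str]], str],
    margin_threshold: float = 0.1,
    k: int = 3,
) -> Tuple[str, str]:
    """Return (label, stage), where stage is "fast" or "slow".

    Assumed difficulty rule: a sample counts as easy when the gap between
    the top two candidate scores is at least margin_threshold.
    """
    if not scores:
        raise ValueError("scores must contain at least one label")

    candidates = retrieve_candidates(scores, k)

    # Easy sample: a single candidate, or a clear winner by margin.
    if len(candidates) == 1 or candidates[0][1] - candidates[1][1] >= margin_threshold:
        return candidates[0][0], "fast"

    # Hard sample: hand the shortlist to the slower reasoning stage.
    shortlist = [label for label, _ in candidates]
    return slow_reasoner(shortlist), "slow"


def stub_reasoner(shortlist: List[str]) -> str:
    """Stand-in for an LVLM call; picks "finch" if it is on the shortlist."""
    return "finch" if "finch" in shortlist else shortlist[0]


easy = cascaded_predict({"sparrow": 0.9, "finch": 0.3, "wren": 0.2}, stub_reasoner)
hard = cascaded_predict({"sparrow": 0.52, "finch": 0.50, "wren": 0.1}, stub_reasoner)
print(easy, hard)
```

In this sketch the expensive stage runs only for the ambiguous sample, which is the kind of per-sample saving a cascade is meant to provide. A real system would replace the score dictionary with similarity scores from a retrieval model and the stub with an actual LVLM query.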
Key facts
- SARE stands for Sample-wise Adaptive Reasoning
- It targets training-free Fine-Grained Visual Recognition (FGVR)
- It uses Large Vision-Language Models (LVLMs)
- It addresses uneven recognition difficulty across samples
- It incorporates mechanisms to consolidate error-specific experience
- It employs a cascaded design that pairs fast candidate retrieval with adaptive reasoning
- Paper ID: arXiv:2603.17729
- arXiv announcement type: replace-cross (a revised version of a cross-listed paper)
Entities
Institutions
- arXiv