LAGO: Language-Guided Adaptive Object-Region Focus for Zero-Shot Visual-Text Alignment
A new method called LAGO (Language-Guided Adaptive Object-Region Focus) has been proposed for zero-shot visual-text alignment. It targets a known weakness of zero-shot models in fine-grained recognition by using language to adaptively focus on object regions. The design avoids the "prediction loop" failure mode, in which an early semantic bias amplifies later errors. LAGO also reduces inference cost compared with methods that rely on random crops.
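For readers new to zero-shot visual-text alignment, the sketch below shows the generic scoring step that such methods build on. An image embedding is compared with one text embedding per class prompt by cosine similarity, and the best-matching label is returned. It also adds a hypothetical `best_region` helper that picks the candidate region most aligned with a text query. This only illustrates the general idea of language-guided region focus and is not LAGO's actual procedure, which the summary does not describe. The toy embeddings and all function names are assumptions made for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def zero_shot_predict(image_emb, text_embs):
    """Generic zero-shot step: return the label whose text embedding
    is most similar to the image embedding, plus all scores."""
    scores = {label: cosine(image_emb, emb) for label, emb in text_embs.items()}
    return max(scores, key=scores.get), scores

def best_region(region_embs, text_emb):
    """Hypothetical language-guided focus (not LAGO's algorithm):
    return the index of the region embedding closest to a text query."""
    return max(range(len(region_embs)), key=lambda i: cosine(region_embs[i], text_emb))

# Toy 3-d embeddings standing in for real encoder outputs (assumption).
text_embs = {"sparrow": [1.0, 0.0, 0.2], "finch": [0.0, 1.0, 0.2]}
image_emb = [0.9, 0.1, 0.3]
regions = [[0.1, 0.9, 0.0], [0.95, 0.05, 0.1]]

label, scores = zero_shot_predict(image_emb, text_embs)
print(label)                                         # sparrow
print(best_region(regions, text_embs["sparrow"]))    # 1
```

Scoring a few language-selected regions, instead of averaging over many random crops, is one plausible way a method could cut encoder calls. Whether LAGO works this way is not stated in the summary.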
Key facts
- LAGO is a method for zero-shot visual-text alignment.
- It addresses fine-grained recognition by focusing on localized object parts.
- It avoids the prediction loop failure mode.
- It reduces inference cost compared with random-crop methods.
- The method is language-guided and adaptive.
- It was proposed in arXiv paper 2605.08156.
- The paper is a cross-listed submission on arXiv.
- The method targets zero-shot recognition without task-specific supervision.
Entities
Institutions
- arXiv