STBIR Framework Combines Sketches and Text for Enhanced Fine-Grained Image Retrieval
A new research framework, Sketch and Text Based Image Retrieval (STBIR), addresses fine-grained image retrieval by combining hand-drawn sketches with textual descriptions. The sketch supplies the structural outline of the target image, while the text supplies color and texture details that a line sketch typically lacks. STBIR has three components: a robustness enhancement module, driven by curriculum learning, that adapts to queries of varying quality; a feature space optimization module based on category knowledge that strengthens feature representations; and a multi-stage strategy for integrating cross-modal features. The work, described in arXiv preprint 2604.15735v1, argues that pairing sketches with text narrows the modality gap in fine-grained retrieval and enables more precise image matching.
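To make the sketch-plus-text idea concrete, here is a minimal sketch of one way two query modalities could be combined for retrieval. It is not STBIR's method: it substitutes a simple weighted late fusion (controlled by a hypothetical `alpha` parameter) for the paper's multi-stage integration, and it assumes the sketch, the text, and the gallery images have already been encoded into a shared embedding space by some encoder that is not shown. The function names and toy vectors are illustrative only.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale vectors (or each row of a matrix) to unit length so dot products are cosine similarities."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)  # guard against division by zero


def fuse_query(sketch_emb: np.ndarray, text_emb: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Hypothetical late fusion: weighted sum of the normalized sketch and text embeddings."""
    fused = alpha * l2_normalize(sketch_emb) + (1.0 - alpha) * l2_normalize(text_emb)
    return l2_normalize(fused)


def rank_gallery(query: np.ndarray, gallery: np.ndarray):
    """Return gallery indices ordered from most to least similar, plus the cosine scores."""
    scores = l2_normalize(gallery) @ query
    order = np.argsort(-scores, kind="stable")  # stable sort keeps ties in gallery order
    return order, scores


# Toy embeddings (assumed): dimension 0 stands in for "shape", dimension 1 for "color".
sketch = np.array([1.0, 0.0, 0.0, 0.0])  # outline cue from the sketch
text = np.array([0.0, 1.0, 0.0, 0.0])    # color/texture cue from the description
gallery = np.array([
    [1.0, 0.0, 0.0, 0.0],  # image 0: right shape, wrong color
    [0.0, 1.0, 0.0, 0.0],  # image 1: right color, wrong shape
    [1.0, 1.0, 0.0, 0.0],  # image 2: matches both cues
])

order, scores = rank_gallery(fuse_query(sketch, text), gallery)
print(order.tolist())  # [2, 0, 1]
```

Image 2, which matches both the outline and the color cue, ranks first; a sketch-only query would instead score image 0 highest, since the sketch alone cannot tell images 0 and 2 apart by color. A single weighted sum is only the simplest baseline; the multi-stage integration the paper describes is more involved.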
Key facts
- The research proposes the Sketch and Text Based Image Retrieval (STBIR) framework
- STBIR combines hand-drawn sketches with textual descriptions for image retrieval
- Sketches provide structural contours, while text provides color and texture information
- The framework includes a curriculum learning-driven robustness enhancement module
- A category-knowledge-based feature space optimization module boosts representational power
- Multi-stage cross-modal feature integration combines the complementary information from sketches and text
- The research addresses modality gaps in fine-grained image retrieval
- The work appears as arXiv preprint 2604.15735v1, announced as a cross-listing
Entities
Institutions
- arXiv