ARTFEED — Contemporary Art Intelligence

STBIR Framework Combines Sketches and Text for Enhanced Fine-Grained Image Retrieval

ai-technology · 2026-04-20

Sketch and Text Based Image Retrieval (STBIR), a newly introduced research framework, tackles fine-grained image retrieval by combining hand-drawn sketches with textual descriptions. Sketches contribute structural outlines, while text supplies color and texture details. STBIR comprises three components: a robustness enhancement module driven by curriculum learning, which helps the model adapt to queries of varying quality; a feature space optimization module that uses category knowledge to improve representations; and a multi-stage method for integrating features across modalities. The study, detailed in arXiv preprint 2604.15735v1, highlights the advantages of combining sketches and text for precise image matching, a result relevant to multimodal AI systems more broadly.
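
To make the idea of a combined sketch-and-text query concrete, here is a minimal sketch in Python. It is not the STBIR method: the weighted-sum fusion rule, the `alpha` weight, the function names, and the toy three-dimensional embeddings are all assumptions made for illustration. It only shows how a shape cue and a color cue can jointly separate images that neither cue separates alone.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_query(sketch_emb, text_emb, alpha=0.5):
    """Hypothetical fusion rule: weighted sum of unit-length embeddings."""
    s = l2_normalize(sketch_emb)
    t = l2_normalize(text_emb)
    return l2_normalize([alpha * a + (1 - alpha) * b for a, b in zip(s, t)])


def rank_gallery(query, gallery):
    """Return gallery ids ordered by cosine similarity to the query, best first."""
    q = l2_normalize(query)
    scores = {
        image_id: sum(a * b for a, b in zip(q, l2_normalize(emb)))
        for image_id, emb in gallery.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Made-up embeddings: axis 0 stands in for shape, axes 1 and 2 for color.
gallery = {
    "red_sneaker": [1.0, 1.0, 0.0],
    "blue_sneaker": [1.0, 0.0, 1.0],
    "red_boot": [-1.0, 1.0, 0.0],
}
sketch_emb = [1.0, 0.0, 0.0]  # shape cue only: a sneaker outline
text_emb = [0.0, 1.0, 0.0]    # color cue only: "red"

ranking = rank_gallery(fuse_query(sketch_emb, text_emb), gallery)
print(ranking)
```

In this toy setup the sketch alone cannot tell the red sneaker from the blue one, and the text alone cannot tell the red sneaker from the red boot; the fused query ranks the red sneaker first.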

Key facts

  • The research proposes the Sketch and Text Based Image Retrieval (STBIR) framework
  • STBIR combines hand-drawn sketches with textual descriptions for image retrieval
  • Sketches provide structural contours while text provides color and texture information
  • The framework includes a curriculum learning-driven robustness enhancement module
  • A category-knowledge-based feature space optimization module boosts representational power
  • Multi-stage cross-modal feature integration combines the complementary information from sketches and text
  • The research addresses modality gaps in fine-grained image retrieval
  • The work is documented in arXiv preprint 2604.15735v1, announced as a cross-listing
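
The curriculum learning idea mentioned in the key facts can be illustrated with a generic easy-to-hard schedule. The sketch below is an assumption-laden illustration, not the paper's module: the `difficulty` score, the `start_fraction` parameter, the linear schedule, and the sample names are all hypothetical. It shows training starting on the cleanest queries and gradually admitting noisier ones.

```python
import math


def curriculum_pool(samples, epoch, total_epochs, start_fraction=0.25):
    """Return the easiest samples available at a given epoch.

    The pool grows linearly from start_fraction of the data at epoch 0
    to the full dataset at the final epoch (a hypothetical schedule).
    """
    ordered = sorted(samples, key=lambda s: s["difficulty"])
    progress = epoch / (total_epochs - 1) if total_epochs > 1 else 1.0
    fraction = start_fraction + (1.0 - start_fraction) * progress
    count = max(1, math.ceil(fraction * len(ordered)))
    return ordered[:count]


# Hypothetical training queries; difficulty might reflect how rough the
# sketch is or how vague the caption is. Values are made up.
samples = [
    {"id": "rough_sketch_vague_text", "difficulty": 0.9},
    {"id": "clean_sketch_full_text", "difficulty": 0.1},
    {"id": "rough_sketch_full_text", "difficulty": 0.5},
    {"id": "clean_sketch_vague_text", "difficulty": 0.4},
]

pool_sizes = [len(curriculum_pool(samples, e, total_epochs=3)) for e in range(3)]
first_pool = curriculum_pool(samples, 0, total_epochs=3)
print(pool_sizes, [s["id"] for s in first_pool])
```

A real system would need a principled way to score query difficulty; here it is simply given as a number on each sample.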

Entities

Institutions

  • arXiv