ARTFEED — Contemporary Art Intelligence

PFCVR: Part-Level Fine-Grained Model for Text-to-Image Vehicle Retrieval

other · 2026-05-09

Researchers have introduced PFCVR (Part-level Fine-grained Cross-modal Vehicle Retrieval), a model for text-to-image vehicle re-identification. The model builds locally paired image and text data at the part level and uses learnable part-query tokens that combine part-specific information with full-sentence context before being matched against visual part features. A bi-directional mask recovery module lets each modality reconstruct its masked elements with help from the other, linking local correspondences to global feature alignment. The researchers also constructed a large-scale dataset. The paper is available on arXiv (2605.06012).
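To make the part-query idea concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the summary gives no architectural details, so the attention form, the part names ("roof", "wheel"), the helper names, and the toy 4-dimensional vectors are all assumptions chosen for illustration. Each part query attends over the sentence's token embeddings to gather part-relevant context, and the result is compared with the visual feature for the same part.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0

def attend(query, tokens):
    """Scaled dot-product attention of one query over a list of token vectors."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, t) / scale for t in tokens])
    dim = len(tokens[0])
    return [sum(w * t[i] for w, t in zip(weights, tokens)) for i in range(dim)]

def part_level_score(part_queries, text_tokens, visual_parts):
    """Mean cosine similarity between text-side and image-side part features."""
    sims = []
    for name, query in part_queries.items():
        text_part = attend(query, text_tokens)  # part info + sentence context
        sims.append(cosine(text_part, visual_parts[name]))
    return sum(sims) / len(sims)

# Toy data (hypothetical): three sentence tokens and two part queries.
# In a trained model the queries would be learned parameters.
text_tokens = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
part_queries = {"roof": [4, 0, 0, 0], "wheel": [0, 4, 0, 0]}

matching_image = {"roof": [1, 0, 0, 0], "wheel": [0, 1, 0, 0]}
other_image = {"roof": [0, 0, 1, 0], "wheel": [0, 0, 0, 1]}

score_match = part_level_score(part_queries, text_tokens, matching_image)
score_other = part_level_score(part_queries, text_tokens, other_image)
```

In this toy setup the image whose part features line up with the described parts scores higher than the unrelated one, which is the behavior a part-level matching objective would aim for.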

Key facts

  • PFCVR is a Part-level Fine-grained Cross-modal Vehicle Retrieval model.
  • It uses learnable part-query tokens for alignment.
  • A bi-directional mask recovery module bridges local and global features.
  • A new large-scale dataset was constructed.
  • The paper is on arXiv with ID 2605.06012.
  • The work extends vehicle re-identification (Re-ID) to text-based queries.
  • It targets vehicle images captured by non-overlapping cameras.
  • Text queries make retrieval from witness descriptions possible.
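The bi-directional mask recovery idea listed above can be sketched in a few lines of plain Python. This is an illustrative guess, not the paper's module: the summary only says that each modality reconstructs its masked elements with help from the other, so the choice of query (the mean of the unmasked features), the attention form, the mean-squared-error loss, and all function names are assumptions.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, tokens):
    """Scaled dot-product attention of one query over a list of vectors."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, t) / scale for t in tokens])
    dim = len(tokens[0])
    return [sum(w * t[i] for w, t in zip(weights, tokens)) for i in range(dim)]

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def recovery_loss(own, other, mask_index):
    """Hide own[mask_index], rebuild it from the other modality, return the error."""
    target = own[mask_index]
    context = [f for i, f in enumerate(own) if i != mask_index]
    dim = len(target)
    # Assumption: the unmasked context is averaged to form the query.
    query = [sum(f[i] for f in context) / len(context) for i in range(dim)]
    rebuilt = attend(query, other)
    return mse(rebuilt, target)

def bidirectional_loss(text_feats, image_feats, text_mask, image_mask):
    """Sum of text-from-image and image-from-text recovery errors."""
    return (recovery_loss(text_feats, image_feats, text_mask)
            + recovery_loss(image_feats, text_feats, image_mask))

# Toy features (hypothetical 2-dimensional vectors).
text_feats = [[1.0, 0.0], [0.0, 1.0]]
image_feats = [[1.0, 0.0], [0.0, 1.0]]
loss = bidirectional_loss(text_feats, image_feats, text_mask=0, image_mask=1)

# If the other modality already holds the hidden feature, recovery is exact.
perfect = recovery_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [1.0, 0.0]], 0)
```

Minimizing a loss of this kind would push the two modalities to carry enough information about each other to fill in what was hidden, which is one way local correspondences could feed into global alignment.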

Entities

Institutions

  • arXiv

Sources