PFCVR: Part-Level Fine-Grained Model for Text-to-Image Vehicle Retrieval
Researchers have introduced PFCVR (Part-level Fine-grained Cross-modal Vehicle Retrieval), a model for text-to-image vehicle re-identification. The model builds locally paired images and texts at the part level. Learnable part-query tokens fuse part-specific information with full-sentence context before being matched against visual part features. A bi-directional mask recovery module lets each modality reconstruct its masked elements with help from the other, linking local correspondences to global feature alignment. The researchers also constructed a new large-scale dataset. The paper is available on arXiv (2605.06012).
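The part-query idea can be illustrated with a small sketch. This summary does not give the paper's actual architecture, so the code below is only an assumption-laden illustration: it assumes single-head scaled dot-product attention, a residual connection, and cosine similarity for matching, and it uses random vectors in place of learned part queries and encoder outputs. All function names are hypothetical.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)) + 1e-12)

def attend(query, tokens):
    """Single-head scaled dot-product attention of one query over tokens."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, t) / scale for t in tokens])
    return [sum(w * t[i] for w, t in zip(weights, tokens))
            for i in range(len(query))]

def part_text_features(part_queries, text_tokens):
    """Each part query pools sentence context; the residual keeps part identity."""
    features = []
    for q in part_queries:
        context = attend(q, text_tokens)
        features.append([qi + ci for qi, ci in zip(q, context)])
    return features

def part_similarity(text_parts, visual_parts):
    """Mean cosine similarity between same-index text and visual part features."""
    sims = [cosine(t, v) for t, v in zip(text_parts, visual_parts)]
    return sum(sims) / len(sims)

# Random stand-ins (assumption): a real model would learn the part queries
# and obtain text_tokens / visual_parts from trained encoders.
rng = random.Random(0)
DIM, N_PARTS, N_TOKENS = 8, 4, 6
part_queries = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(N_PARTS)]
text_tokens = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(N_TOKENS)]
visual_parts = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(N_PARTS)]

text_parts = part_text_features(part_queries, text_tokens)
score = part_similarity(text_parts, visual_parts)
score_self = part_similarity(text_parts, text_parts)
```

In this sketch each part query ends up carrying both its own identity and a weighted summary of the sentence, which is the property the summary attributes to the part-query tokens. The per-part scores are averaged only for simplicity.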
Key facts
- PFCVR is a Part-level Fine-grained Cross-modal Vehicle Retrieval model.
- It uses learnable part-query tokens for alignment.
- A bi-directional mask recovery module bridges local and global features.
- A new large-scale dataset was constructed.
- The paper is on arXiv with ID 2605.06012.
- Vehicle Re-ID extends to text-based queries.
- The model handles images from non-overlapping cameras.
- It enables retrieval from witness descriptions.
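As a rough illustration of bi-directional mask recovery, the sketch below hides some tokens in each modality and reconstructs them by attending over the other modality. The paper's actual module is not described in this summary, so every detail here is an assumption: the query built from the mean of visible tokens, the single-head attention, and the mean squared error loss. All names are hypothetical.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(query, tokens):
    """Single-head scaled dot-product attention of one query over tokens."""
    scale = math.sqrt(len(query))
    scores = [dot(query, t) / scale for t in tokens]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [sum((e / total) * t[i] for e, t in zip(exps, tokens))
            for i in range(len(query))]

def mean_vec(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def recovery_loss(tokens, masked_idx, other_tokens):
    """Reconstruct masked tokens of one modality from the other modality."""
    visible = [t for i, t in enumerate(tokens) if i not in masked_idx]
    query = mean_vec(visible)            # summary of what is still visible
    guess = attend(query, other_tokens)  # borrow content from the other modality
    errors = [mse(guess, tokens[i]) for i in sorted(masked_idx)]
    return sum(errors) / len(errors)

def bidirectional_loss(text_tokens, image_tokens, text_mask, image_mask):
    """Sum of text-from-image and image-from-text recovery losses."""
    return (recovery_loss(text_tokens, text_mask, image_tokens)
            + recovery_loss(image_tokens, image_mask, text_tokens))

# Random stand-ins (assumption) for encoder outputs of one image-text pair.
rng = random.Random(0)
DIM = 8
text_tokens = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(6)]
image_tokens = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(5)]

loss = bidirectional_loss(text_tokens, image_tokens,
                          text_mask={1, 4}, image_mask={0})
```

Because each direction can only succeed by using the other modality's tokens, minimising such a loss would push the two representations toward each other, which is one way to read the summary's claim that the module links local correspondences to global alignment.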
Entities
Institutions
- arXiv