PFCVR: Part-Level Fine-Grained Model for Text-to-Image Vehicle Retrieval

other · 2026-05-09

Researchers have introduced PFCVR (Part-level Fine-grained Cross-modal Vehicle Retrieval), a model for text-to-image vehicle re-identification. PFCVR constructs local image-text pairs at the level of individual vehicle parts. It uses learnable part-query tokens that combine part-specific information with the context of the full sentence before they are matched against visual part features. A bi-directional mask recovery module lets each modality reconstruct its masked elements with help from the other, so that local part correspondences also drive global feature alignment. The authors also constructed a large-scale dataset for the task. The paper is available on arXiv (2605.06012).
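The part-query matching step can be illustrated with a small sketch. This is a hypothetical, plain-Python illustration, not the authors' implementation. It assumes that each part query attends over the sentence's word features with scaled dot-product attention, that the attended context is added back to the query so the part keeps its identity, and that an image-text pair is scored by averaging cosine similarity over corresponding parts. The function names, the residual combination, and the averaging rule are all assumptions made for illustration.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(a: Vector, b: Vector) -> float:
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0


def part_text_features(part_queries: Sequence[Vector], word_feats: Sequence[Vector]) -> List[Vector]:
    """Let each part query attend over every word of the sentence.

    The attended context carries whole-sentence information; adding it to the
    query keeps the part's own identity (a hypothetical residual design).
    """
    d = len(word_feats[0])
    out = []
    for q in part_queries:
        weights = softmax([dot(q, w) / math.sqrt(d) for w in word_feats])
        context = [sum(a * w[i] for a, w in zip(weights, word_feats)) for i in range(d)]
        out.append([qi + ci for qi, ci in zip(q, context)])
    return out


def part_level_score(text_parts: Sequence[Vector], visual_parts: Sequence[Vector]) -> float:
    """Average cosine similarity over corresponding (text part, visual part) pairs."""
    sims = [cosine(t, v) for t, v in zip(text_parts, visual_parts)]
    return sum(sims) / len(sims)


if __name__ == "__main__":
    # Two hypothetical part queries (e.g. "front" and "rear") and three word features.
    queries = [[1.0, 0.0], [0.0, 1.0]]
    words = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
    text_parts = part_text_features(queries, words)
    aligned = part_level_score(text_parts, [[1.0, 0.0], [0.0, 1.0]])
    swapped = part_level_score(text_parts, [[0.0, 1.0], [1.0, 0.0]])
    print(f"aligned={aligned:.3f} swapped={swapped:.3f}")
```

Scoring parts separately means that a mismatch in one region (say, the rear of the vehicle) lowers the overall score even when the rest of the description fits, which is the intuition behind matching at the part level rather than only globally.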

Key facts

PFCVR is a Part-level Fine-grained Cross-modal Vehicle Retrieval model.
It uses learnable part-query tokens for alignment.
A bi-directional mask recovery module bridges local and global features.
A new large-scale dataset was constructed.
The paper is on arXiv with ID 2605.06012.
The work extends vehicle re-identification (Re-ID) to text-based queries.
It retrieves vehicles across images captured by non-overlapping cameras.
It allows vehicles to be retrieved from witness descriptions.
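The bi-directional mask recovery idea can be illustrated with a toy objective. This sketch is an assumption-laden stand-in, not the paper's module: it has no learned parameters, gives each masked slot only a hypothetical positional embedding as input, reconstructs the slot purely by attending over the other modality's features, and scores the reconstruction with mean squared error. Summing the text-from-image and image-from-text terms gives a bi-directional loss. All function names and the MSE choice are illustrative assumptions.

```python
import math
from typing import List, Sequence

Vector = List[float]


def attend(query: Vector, keys: Sequence[Vector]) -> Vector:
    """Scaled dot-product attention where the keys also serve as values."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    return [sum(w / z * key[i] for w, key in zip(weights, keys)) for i in range(d)]


def masked_recovery_loss(
    feats: Sequence[Vector],
    positions: Sequence[Vector],
    masked: Sequence[int],
    other: Sequence[Vector],
) -> float:
    """Mean squared error between the original features at the masked slots
    and their reconstruction from the other modality.

    A masked slot keeps only its (hypothetical) positional embedding, so all
    of the content in its reconstruction must come from `other`.
    """
    if not masked:
        return 0.0
    d = len(feats[0])
    total = 0.0
    for i in masked:
        recon = attend(positions[i], other)
        total += sum((r - f) ** 2 for r, f in zip(recon, feats[i])) / d
    return total / len(masked)


def bidirectional_recovery_loss(
    text: Sequence[Vector], text_pos: Sequence[Vector], text_mask: Sequence[int],
    image: Sequence[Vector], image_pos: Sequence[Vector], image_mask: Sequence[int],
) -> float:
    """Recover masked words from image parts and masked image parts from words."""
    text_from_image = masked_recovery_loss(text, text_pos, text_mask, image)
    image_from_text = masked_recovery_loss(image, image_pos, image_mask, text)
    return text_from_image + image_from_text


if __name__ == "__main__":
    # Toy 2-D features; real models use learned, high-dimensional embeddings.
    words = [[1.0, 0.0], [0.0, 1.0]]
    parts = [[0.9, 0.1], [0.1, 0.9]]
    pos = [[1.0, 0.0], [0.0, 1.0]]
    print(bidirectional_recovery_loss(words, pos, [0], parts, pos, [1]))
```

Restricting each reconstruction to the other modality makes the cross-modal dependency explicit in this toy: the loss can only fall if the text and visual features carry matching information about the same parts.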
