SAGA-ReID: A New CLIP-Based Method for Person Re-Identification

publication · 2026-04-27

A new arXiv preprint presents SAGA-ReID, a person re-identification method that builds on existing CLIP-based approaches. Those approaches typically compress spatial features into a single global token for image-text alignment, which can lose spatial detail and leaves them fragile under occlusion and cross-camera variation. SAGA-ReID instead aligns intermediate patch tokens with anchor vectors in CLIP's text embedding space, emphasizing reliable spatial evidence and suppressing missing or corrupted regions, without requiring textual descriptions of individual images. In experiments with synthetic masking and realistic human distractors, SAGA-ReID's advantage over global pooling grows as occlusion increases.
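
The contrast between global pooling and anchor-guided patch aggregation can be illustrated with a small NumPy sketch. This is not the paper's implementation. It assumes the patch tokens have already been projected into the same space as the anchors, uses random vectors in place of real CLIP features, and scores each patch by a softmax over its best cosine similarity to any anchor. The function names, the temperature value, and that scoring rule are all hypothetical choices made for illustration.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length along the given axis."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def global_pool(patch_tokens):
    """Baseline: average every patch token into one global descriptor."""
    return l2_normalize(patch_tokens.mean(axis=0))

def anchor_guided_pool(patch_tokens, anchors, temperature=0.1):
    """Hypothetical anchor-guided pooling (an assumption, not the paper's rule).

    Each patch is scored by its highest cosine similarity to any anchor,
    and a softmax over those scores gives the aggregation weights.
    """
    patches = l2_normalize(patch_tokens)
    anchors = l2_normalize(anchors)
    sim = patches @ anchors.T                 # (num_patches, num_anchors)
    score = sim.max(axis=1)                   # per-patch reliability score
    w = np.exp((score - score.max()) / temperature)
    w = w / w.sum()                           # softmax weights, sum to 1
    pooled = (w[:, None] * patches).sum(axis=0)
    return l2_normalize(pooled), w

# Toy data: random stand-ins for CLIP features (assumption).
rng = np.random.default_rng(0)
dim, num_anchors, num_patches = 64, 4, 12
anchors = l2_normalize(rng.normal(size=(num_anchors, dim)))

# "Visible" patches lie close to an anchor.
owner = np.arange(num_patches) % num_anchors
clean = l2_normalize(anchors[owner] + 0.05 * rng.normal(size=(num_patches, dim)))

# Simulate occlusion: the last half of the patches become unrelated noise.
occluded = clean.copy()
num_occluded = num_patches // 2
occluded[-num_occluded:] = l2_normalize(rng.normal(size=(num_occluded, dim)))

descriptor, weights = anchor_guided_pool(occluded, anchors)
visible_weight = weights[:-num_occluded].sum()
occluded_weight = weights[-num_occluded:].sum()

# Compare how far each descriptor drifts from its unoccluded counterpart.
clean_descriptor, _ = anchor_guided_pool(clean, anchors)
drift_anchor = float(descriptor @ clean_descriptor)
drift_global = float(global_pool(occluded) @ global_pool(clean))
print(f"weight on visible patches:  {visible_weight:.4f}")
print(f"weight on occluded patches: {occluded_weight:.4f}")
print(f"cosine to clean descriptor (anchor-guided): {drift_anchor:.3f}")
print(f"cosine to clean descriptor (global pool):   {drift_global:.3f}")
```

In this toy setup the occluded patches have no relationship to the anchors, so they receive almost no weight, while global pooling averages them in with equal weight. Real occluders, such as other people in the frame, may resemble the anchors far more closely than random noise does.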

Key facts

Paper on arXiv with ID 2604.22190
Proposes SAGA-ReID for person re-identification
Addresses fragility of CLIP-based methods under occlusion and cross-camera variation
Aligns intermediate patch tokens with anchor vectors in CLIP's text embedding space
Does not require textual descriptions of individual images
Tested under synthetic masking and realistic human distractors
Advantage over global pooling increases with occlusion
Published as arXiv preprint
