RCSR: Federated Cross-Modal Retrieval with Missing Modalities via Semantic Routing and Adapter Personalization
RCSR is a federated learning framework for cross-modal retrieval over heterogeneous client data, where clients have non-IID semantic distributions and some are missing a modality. It keeps a CLIP backbone frozen and trains lightweight shared adapters that carry global knowledge, together with optional client-specific adapters for personalization. Prototype anchoring ties unimodal clients to global cross-modal semantics, and a server-side semantic router dynamically adjusts aggregation weights according to retrieval consistency, which helps reduce alignment drift. Experiments on the MS-COCO and Flickr30K benchmarks demonstrate its effectiveness.
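The server-side routing step described above can be illustrated with a minimal sketch. The summary does not give the weighting rule, so everything specific here is an assumption: each client is taken to report a flat list of shared-adapter parameters plus a scalar retrieval-consistency score, and the router turns those scores into weights with a temperature-scaled softmax. The names `routing_weights`, `aggregate_shared_adapter`, and `temperature` are hypothetical and do not come from RCSR itself.

```python
import math
from typing import Dict, List


def routing_weights(consistency: Dict[str, float],
                    temperature: float = 0.1) -> Dict[str, float]:
    """Map per-client retrieval-consistency scores to aggregation weights.

    Assumption: a softmax over scores; the actual RCSR rule may differ.
    """
    if not consistency:
        raise ValueError("at least one client score is required")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    top = max(consistency.values())  # subtract the max for numerical stability
    exps = {cid: math.exp((score - top) / temperature)
            for cid, score in consistency.items()}
    total = sum(exps.values())
    return {cid: value / total for cid, value in exps.items()}


def aggregate_shared_adapter(updates: Dict[str, List[float]],
                             weights: Dict[str, float]) -> List[float]:
    """Weighted average of the clients' shared-adapter parameters."""
    dim = len(next(iter(updates.values())))
    merged = [0.0] * dim
    for cid, params in updates.items():
        if len(params) != dim:
            raise ValueError(f"client {cid} sent a vector of the wrong size")
        for i, value in enumerate(params):
            merged[i] += weights[cid] * value
    return merged


if __name__ == "__main__":
    # Hypothetical scores: the image-only client drifts more than the others.
    scores = {"paired": 0.82, "text_only": 0.74, "image_only": 0.55}
    updates = {
        "paired": [0.10, -0.20],
        "text_only": [0.12, -0.18],
        "image_only": [0.40, 0.30],
    }
    weights = routing_weights(scores)
    print(weights)
    print(aggregate_shared_adapter(updates, weights))
```

Only the shared adapters are averaged in this sketch; client-specific adapters would stay on the client. A lower `temperature` concentrates weight on the most consistent clients, while a higher one approaches plain uniform averaging.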
Key facts
- RCSR is a federated cross-modal retrieval framework.
- It handles non-IID semantic distributions and missing modalities.
- Built on a frozen CLIP backbone.
- Uses lightweight shared adapters and optional client-specific adapters.
- Prototype anchoring aligns unimodal clients with global semantics.
- Server-side semantic router assigns aggregation weights based on retrieval consistency.
- Tested on MS-COCO and Flickr30K benchmarks.
- Addresses alignment drift during heterogeneous updates.
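Prototype anchoring, as listed above, can be sketched under stated assumptions. The summary does not define the loss, so this example assumes the server distributes one global prototype vector per concept and that a unimodal client penalizes the cosine distance between each local embedding and the prototype for its label. The function names and the cosine-distance form are illustrative choices, not the RCSR formulation.

```python
import math
from typing import Dict, List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def anchoring_loss(embeddings: Sequence[Sequence[float]],
                   labels: Sequence[str],
                   prototypes: Dict[str, Sequence[float]]) -> float:
    """Mean cosine distance between local embeddings and global prototypes.

    Assumption: one prototype per concept label, shared by the server.
    """
    if not embeddings or len(embeddings) != len(labels):
        raise ValueError("need one label per embedding")
    losses = [1.0 - cosine(emb, prototypes[label])
              for emb, label in zip(embeddings, labels)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Hypothetical 2-D prototypes for two concepts.
    global_prototypes = {"dog": [1.0, 0.0], "car": [0.0, 1.0]}
    # Embeddings produced by an image-only client's adapter.
    local_embeddings = [[0.9, 0.1], [0.2, 0.8]]
    print(anchoring_loss(local_embeddings, ["dog", "car"], global_prototypes))
```

In this reading, a client that holds only images or only text still receives a training signal tied to the shared cross-modal space, because the prototypes stand in for the modality it lacks.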