RCSR: Federated Cross-Modal Retrieval with Missing Modalities via Semantic Routing and Adapter Personalization

other · 2026-04-29

RCSR is a recently introduced federated learning framework for cross-modal retrieval over heterogeneous client data, where clients may hold non-IID semantic distributions and may lack a modality entirely. It keeps a CLIP backbone frozen and trains lightweight shared adapters for global knowledge sharing, plus optional client-specific adapters for personalization. Prototype anchoring ties unimodal clients to global cross-modal semantics, and a server-side semantic router sets aggregation weights dynamically according to retrieval consistency, which helps reduce alignment drift. Experiments on the MS-COCO and Flickr30K benchmarks are reported to demonstrate its effectiveness.
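To make the client-side pieces concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's implementation: the residual form of the adapters, the `scale` factor, the cosine-based anchoring loss, and the idea of keying prototypes by a semantic label are all hypothetical choices, since this summary does not specify them. Features stand in for the output of the frozen CLIP encoder.

```python
import math

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def adapt(frozen_feat, shared_W, personal_W=None, scale=0.1):
    """Add small adapter corrections to a frozen backbone feature.

    Assumption: each adapter is a single linear map applied to the frozen
    feature and added back as a scaled residual. The personal adapter is
    optional, mirroring the optional client-specific adapters.
    """
    out = list(frozen_feat)
    for W in (shared_W, personal_W):
        if W is None:
            continue
        delta = matvec(W, frozen_feat)
        out = [o + scale * d for o, d in zip(out, delta)]
    return out

def cosine(a, b):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)

def anchoring_loss(embeddings, labels, prototypes):
    """Mean (1 - cosine) between each embedding and its global prototype.

    Assumption: the server shares one prototype vector per semantic label,
    so a client holding only one modality can still be pulled toward the
    shared cross-modal space.
    """
    losses = [1.0 - cosine(e, prototypes[y]) for e, y in zip(embeddings, labels)]
    return sum(losses) / len(losses)
```

In this sketch only `shared_W` would be sent to the server; `personal_W` stays on the client, and the backbone feature is never updated.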

Key facts

RCSR is a federated cross-modal retrieval framework.
It handles non-IID semantic distributions and missing modalities.
Built on a frozen CLIP backbone.
Uses lightweight shared adapters and optional client-specific adapters.
Prototype anchoring aligns unimodal clients with global semantics.
Server-side semantic router assigns aggregation weights based on retrieval consistency.
Tested on MS-COCO and Flickr30K benchmarks.
Addresses alignment drift during heterogeneous updates.
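
The server-side routing step listed above can be sketched as a weighted average of client adapter updates. This is a hedged illustration only: the summary does not say how retrieval consistency is measured or how scores become weights, so the softmax with a `temperature` parameter and the flat parameter lists below are assumptions, and the function names are hypothetical.

```python
import math

def routing_weights(consistency, temperature=1.0):
    """Turn per-client consistency scores into aggregation weights.

    Assumption: a softmax over the scores. Subtracting the maximum keeps
    the exponentials numerically stable without changing the result.
    """
    m = max(consistency)
    exps = [math.exp((c - m) / temperature) for c in consistency]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate_adapters(client_params, consistency, temperature=1.0):
    """Weighted average of flattened shared-adapter parameters.

    client_params: one flat list of floats per client, all the same length.
    consistency:   one retrieval-consistency score per client.
    """
    weights = routing_weights(consistency, temperature)
    dim = len(client_params[0])
    return [
        sum(w * params[i] for w, params in zip(weights, client_params))
        for i in range(dim)
    ]
```

With equal scores this reduces to a plain average of the client updates; a client whose update scores lower on consistency contributes less, which is one way such a router could limit alignment drift.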

Sources

arXiv cs.AI — 2026-04-28