ARTFEED — Contemporary Art Intelligence

RCSR: Federated Cross-Modal Retrieval with Missing Modalities via Semantic Routing and Adapter Personalization

other · 2026-04-29

RCSR is a newly proposed federated learning framework for cross-modal retrieval when client data are heterogeneous. That heterogeneity takes two forms: non-IID semantic distributions across clients, and missing modalities, where some clients hold only images or only text. Built on a frozen CLIP backbone, RCSR adds lightweight shared adapters for global knowledge sharing and optional client-specific adapters for personalization. Prototype anchoring links unimodal clients to global cross-modal semantics, and a server-side semantic router dynamically adjusts aggregation weights according to retrieval consistency, which helps reduce alignment drift. The framework is evaluated on the MS-COCO and Flickr30K benchmarks, where it is reported to be effective.
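The paragraph above describes how each client's model is composed and how unimodal clients are tied back to shared semantics. The following is a minimal pure-Python sketch of that idea. It assumes a residual bottleneck adapter (x + W_up·ReLU(W_down·x)) applied to frozen CLIP features, and it assumes a cosine-distance anchoring loss against per-class global prototypes. The source does not specify RCSR's actual adapter design or anchoring objective, and `Adapter`, `adapt`, and `prototype_anchor_loss` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


@dataclass
class Adapter:
    """Hypothetical residual bottleneck adapter: x + up(relu(down(x))).

    Only these small matrices would be trained; the input features are
    assumed to come from a frozen CLIP encoder.
    """
    down: Matrix  # shape (r, d), with r much smaller than d
    up: Matrix    # shape (d, r); zero-init makes the adapter an identity map

    def __call__(self, x: Sequence[float]) -> Vector:
        hidden = [max(0.0, v) for v in matvec(self.down, x)]
        delta = matvec(self.up, hidden)
        return [xi + di for xi, di in zip(x, delta)]


def adapt(frozen_feat: Sequence[float], shared: Adapter,
          personal: Optional[Adapter] = None) -> Vector:
    """Apply the globally shared adapter, then the optional client-specific one."""
    z = shared(frozen_feat)
    return personal(z) if personal is not None else z


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(v * v for v in a))
    nb = math.sqrt(sum(v * v for v in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def prototype_anchor_loss(embeddings: Sequence[Sequence[float]],
                          labels: Sequence[int],
                          prototypes: Dict[int, Sequence[float]]) -> float:
    """Mean (1 - cosine) between each local embedding and the global
    prototype of its class (assumed: prototypes are supplied by the server
    and carry cross-modal semantics a unimodal client cannot see locally)."""
    losses = [1.0 - cosine(e, prototypes[y]) for e, y in zip(embeddings, labels)]
    return sum(losses) / len(losses)
```

With a zero-initialized `up` matrix, a freshly added adapter leaves the frozen feature unchanged, so personalization starts from the shared model. Stacking the personal adapter after the shared one is only one plausible arrangement; a parallel branch would be another.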

Key facts

  • RCSR is a federated cross-modal retrieval framework.
  • It handles non-IID semantic distributions and missing modalities.
  • Built on a frozen CLIP backbone.
  • Uses lightweight shared adapters and optional client-specific adapters.
  • Prototype anchoring aligns unimodal clients with global semantics.
  • Server-side semantic router assigns aggregation weights based on retrieval consistency.
  • Tested on MS-COCO and Flickr30K benchmarks.
  • Addresses alignment drift during heterogeneous updates.
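The server-side semantic router listed above can be pictured as weighted federated averaging, where each client's weight depends on how well its update preserves retrieval. The sketch below is built on several assumptions, none of which is RCSR's published procedure. It assumes the server scores each client update by Recall@1 on a small held-out set of matched image-text pairs. It then converts those scores into weights with a temperature-scaled softmax. The probe set, the metric, the softmax, and the names `recall_at_1`, `router_weights`, and `aggregate` are all illustrative.

```python
import math
from typing import Dict, List, Sequence


def recall_at_1(sim: Sequence[Sequence[float]]) -> float:
    """Fraction of queries i whose top-scoring candidate is i.

    Assumes a square similarity matrix whose matched image-text pairs
    lie on the diagonal.
    """
    hits = 0
    for i, row in enumerate(sim):
        best = max(range(len(row)), key=lambda j: row[j])
        hits += int(best == i)
    return hits / len(sim)


def router_weights(scores: Dict[str, float],
                   temperature: float = 0.1) -> Dict[str, float]:
    """Hypothetical router: softmax over per-client consistency scores.

    Higher retrieval consistency gives a larger aggregation weight.
    Subtracting the max score keeps exp() numerically stable.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    m = max(scores.values())
    exps = {c: math.exp((s - m) / temperature) for c, s in scores.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}


def aggregate(updates: Dict[str, List[float]],
              weights: Dict[str, float]) -> List[float]:
    """Weighted sum of flattened shared-adapter updates, one per client."""
    dim = len(next(iter(updates.values())))
    out = [0.0] * dim
    for client, update in updates.items():
        w = weights[client]
        for k, v in enumerate(update):
            out[k] += w * v
    return out
```

Lowering `temperature` concentrates weight on the most consistent clients. That is one way a router could damp updates that would otherwise pull the shared image-text alignment off course.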
