FedSurg Challenge benchmarks federated learning for surgical video analysis

other · 2026-04-25

The FedSurg EndoVis 2024 Challenge, described in arXiv paper 2510.04772, is the first international benchmarking effort for federated learning (FL) in surgical vision. It was run as a proof of concept on a multi-center dataset of laparoscopic appendectomies, a preliminary subset of Appendix300. Three submissions were evaluated on two criteria: generalization to an unseen center and adaptation to individual centers. Centralized and Swarm Learning baselines were used to separate the effect of task difficulty from the effect of decentralization. Even with centrally pooled data, the F1-score at the unseen center reached only 26.31%, and decentralized training added a further, separable performance drop. The challenge is motivated by the need for multi-institutional data to build generalizable surgical AI: patient privacy constraints preclude direct data sharing, which makes FL a promising alternative. Its application to complex, spatiotemporal surgical video data, however, remains largely unbenchmarked.

Key facts

FedSurg EndoVis 2024 Challenge is the first international benchmarking initiative for FL in surgical vision.
Evaluated on a multi-center laparoscopic appendectomy dataset (preliminary subset of Appendix300).
Three submissions were evaluated on generalization to an unseen center and center-specific adaptation.
Centralized and Swarm Learning baselines were used to isolate task difficulty and decentralization effects.
Centralized training achieved only 26.31% F1-score on the unseen center.
Decentralized training introduced an additional, separable performance drop.
Patient privacy constraints preclude direct data sharing, motivating FL.
Application of FL to spatiotemporal surgical video data remains largely unbenchmarked.
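
For readers unfamiliar with how FL avoids sharing patient data, the sketch below shows the general idea: each center trains locally and only model parameters are combined. It uses federated averaging (FedAvg), a common aggregation rule chosen here purely for illustration; the summary above does not say which aggregation methods the submissions or the Swarm Learning baseline used. The function name, the toy parameter vectors, and the sample counts are all hypothetical.

```python
def federated_average(center_weights, center_sizes):
    """Combine per-center parameter vectors into one global vector.

    Each center's parameters are weighted by its share of the total
    number of training samples (FedAvg-style aggregation). Only the
    parameters are exchanged, never the underlying videos.
    """
    total = sum(center_sizes)
    n_params = len(center_weights[0])
    averaged = [0.0] * n_params
    for weights, size in zip(center_weights, center_sizes):
        for i, w in enumerate(weights):
            averaged[i] += w * size / total
    return averaged


# Hypothetical toy values: three centers, two model parameters each.
weights = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
sizes = [10, 30, 60]  # assumed number of local training samples per center

global_weights = federated_average(weights, sizes)
print(global_weights)  # approximately [4.0, 5.0]
```

In a real training run this aggregation step would repeat over many rounds, with each center continuing local training from the updated global parameters.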
