CODS 2025 AssetOpsBench Challenge Results and Analysis

other · 2026-05-12

This analysis covers the CODS 2025 AssetOpsBench Challenge, a privacy-aware Codabench competition on industrial multi-agent orchestration built on AssetOps. The event drew 149 registered teams and 300 submissions. The public planning leaderboard saturated at 72.73%, and richer prompts did not raise that peak. Hidden evaluation scores correlated moderately with public scores in planning (r=0.69) but weakly negatively in execution (r=-0.13): several systems that scored 45.45% on public execution reached 63.64% on the hidden set. The analysis draws on final rank sheets, server logs, best-submission exports, organizer reports, and verified planning-track source trees.
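The three reported percentages (72.73%, 45.45%, 63.64%) are all very close to fractions with denominator 11, namely 8/11, 5/11, and 7/11. One possible reading, which the source does not state, is that the public and hidden sets were scored over 11 items (or a multiple of 11). The sketch below only checks the arithmetic. The bound `MAX_DENOMINATOR` and the helper `nearest_fraction` are illustrative assumptions, not part of the challenge tooling.

```python
from fractions import Fraction

# Assumed upper bound on the number of scored items; chosen for illustration only.
MAX_DENOMINATOR = 20


def nearest_fraction(percent_text):
    """Return the closest fraction with denominator <= MAX_DENOMINATOR
    to a percentage given as a decimal string such as "72.73"."""
    return (Fraction(percent_text) / 100).limit_denominator(MAX_DENOMINATOR)


# Percentages quoted in the summary above.
for reported in ["72.73", "45.45", "63.64"]:
    frac = nearest_fraction(reported)
    print(f"{reported}% is closest to {frac.numerator}/{frac.denominator}")
```

If that reading holds, a move from 45.45% to 63.64% corresponds to two additional items solved, so a small number of items can shift a score noticeably.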

Key facts

CODS 2025 AssetOpsBench Challenge was a privacy-aware Codabench competition on industrial multi-agent orchestration built on AssetOps.
149 teams registered and 300 submissions were made.
Public planning leaderboard saturated at 72.73%.
Richer prompts did not improve the peak score.
Hidden evaluation scores correlated moderately with public scores in planning (r=0.69).
Hidden evaluation scores correlated weakly and negatively with public scores in execution (r=-0.13).
Several systems with 45.45% public execution scores achieved 63.64% on the hidden set.
Analysis used final rank sheets, server logs, best-submission exports, organizer reports, and verified planning-track source trees.
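The r values above are presumably Pearson correlation coefficients between each team's public and hidden scores, although the source does not name the statistic. As a minimal sketch under that assumption, the function below computes Pearson's r from two score lists. The score lists are made-up placeholders, not challenge data, and `pearson_r` is a hypothetical helper name.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one series is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-team scores in percent; NOT the actual challenge results.
public_scores = [45.45, 54.55, 63.64, 72.73, 72.73]
hidden_scores = [63.64, 45.45, 54.55, 63.64, 54.55]

r = pearson_r(public_scores, hidden_scores)
print(f"r = {r:+.2f}")
```

A value near zero or below, as reported for execution, would mean that the public ranking says little about how a system performs on the hidden set.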
