CODS 2025 AssetOpsBench Challenge Results and Analysis
This analysis covers the CODS 2025 AssetOpsBench Challenge, a privacy-aware Codabench competition on industrial multi-agent orchestration built on AssetOps. The challenge drew 149 registered teams and 300 submissions. The public planning leaderboard saturated at 72.73%, and richer prompts did not raise this ceiling. Public and hidden scores correlated moderately in the planning track (r=0.69) but weakly and negatively in the execution track (r=-0.13). Several systems that scored 45.45% on public execution reached 63.64% on the hidden evaluation. The analysis draws on final rank sheets, server logs, best-submission exports, organizer reports, and verified source trees from the planning track.
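The reported percentages (72.73%, 45.45%, 63.64%) are consistent with scores that are whole-number fractions of 11, namely 8/11, 5/11 and 7/11. The sketch below checks this. It assumes, hypothetically, that each leaderboard split scores 11 items. The summary does not state the split size, so `N_ITEMS` and `as_fraction_of` are illustrative names only.

```python
from fractions import Fraction
from typing import Optional

# Hypothetical assumption: each leaderboard split scores 11 items, so every
# reported percentage should equal 100 * k / 11 for some integer k.
N_ITEMS = 11

def as_fraction_of(percent: float, n: int = N_ITEMS) -> Optional[Fraction]:
    """Return k/n if `percent` matches 100*k/n rounded to 2 decimals, else None."""
    for k in range(n + 1):
        if round(100 * k / n, 2) == percent:
            return Fraction(k, n)
    return None

# Check the three percentages quoted in the summary.
for reported in (72.73, 45.45, 63.64):
    print(reported, as_fraction_of(reported))
```

If the 11-item assumption holds, the jump from 45.45% to 63.64% would amount to two additional items solved (5 to 7 out of 11). On such a coarse grid, a single item moves a score by about 9 percentage points.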
Key facts
- CODS 2025 AssetOpsBench Challenge was a privacy-aware Codabench competition on industrial multi-agent orchestration built on AssetOps.
- 149 teams registered and 300 submissions were made.
- Public planning leaderboard saturated at 72.73%.
- Richer prompts did not improve the peak score.
- Hidden evaluation scores correlated moderately with public scores in planning (r=0.69).
- Hidden evaluation scores correlated weakly and negatively with public scores in execution (r=-0.13).
- Several systems with 45.45% public execution scores achieved 63.64% on the hidden set.
- Analysis used final rank sheets, server logs, best-submission exports, organizer reports, and verified planning-track source trees.
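The r values above are presumably Pearson correlation coefficients over paired public and hidden scores per team. The summary names neither the method nor the sample, so this is an assumption. The minimal sketch below shows how such a coefficient is computed. The score lists are hypothetical placeholders, not challenge data.

```python
import math
from typing import Sequence

def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between paired score lists (e.g. public vs hidden)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, all without the 1/n factor (it cancels in r).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical public/hidden scores for four teams (NOT the challenge's data).
public = [72.73, 63.64, 54.55, 45.45]
hidden = [63.64, 54.55, 54.55, 45.45]
print(round(pearson_r(public, hidden), 2))
```

A value near 0.69 means public rank is a useful but imperfect guide to hidden rank. A value near -0.13, as in execution, means the public score carries almost no information about hidden performance. On Python 3.10+, `statistics.correlation` computes the same coefficient.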
Entities
Institutions
- Codabench