ARTFEED — Contemporary Art Intelligence

Stargazer: AI Agent Benchmark for Astrophysics

other · 2026-05-13

Stargazer serves as a scalable benchmark framework designed to assess AI agents on dynamic, iterative tasks that involve physics-based model fitting with radial-velocity time series data. It comprises 120 distinct tasks categorized into three levels of difficulty, featuring 20 authentic archival cases that range from high-SNR single-planet systems to intricate multi-planet configurations. An evaluation of eight leading agents highlighted a disparity between numerical optimization and compliance with physical constraints, indicating that while agents frequently attain satisfactory statistical fits, they often struggle to accurately retrieve the correct physical parameters.

Key facts

  • Stargazer is a benchmark for AI agents on astrophysical model-fitting tasks.
  • It uses radial-velocity time series data.
  • Includes 120 tasks across three difficulty tiers.
  • 20 tasks are real archival cases.
  • Covers high-SNR single-planet to complex multi-planetary systems.
  • Eight frontier agents were evaluated.
  • Agents often achieve good statistical fit but fail to recover correct physical parameters.

Entities

Sources