EO-Gym: Interactive Environment for Earth Observation Agents
Researchers have introduced EO-Gym, an interactive environment for Earth Observation (EO) agents that addresses a limitation of current benchmarks, which treat EO analysis as a static, single-turn task. EO-Gym provides a structured, executable, Gymnasium-style local geospatial workspace backed by more than 660,000 multimodal files indexed by location, time, and sensor type, along with 35 EO-specialized tools spanning six task families. From this environment, the team built EO-Gym-Data, a benchmark of 9,078 trajectories and 34,604 reasoning steps based on eight public EO datasets, including Landsat and Sentinel-2 imagery. An evaluation of 10 open and closed vision-language models (VLMs) indicated that even strong general-purpose models struggle with the interactive, multi-step reasoning that EO analysis requires.
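To make the idea of a Gymnasium-style workspace concrete, here is a minimal Python sketch of the interaction pattern. It is not EO-Gym's actual API: the class `ToyEOWorkspace`, the `FileRecord` fields, the tool names `search_files` and `submit_answer`, and the file paths are all hypothetical, chosen only for illustration. The one convention taken from Gymnasium is the shape of the loop, in which `reset()` returns an observation and an info dict, and `step()` returns an observation, reward, terminated flag, truncated flag, and info dict.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FileRecord:
    """One workspace file. All field names are illustrative assumptions."""
    path: str
    location: str   # e.g. a tile or region identifier
    acquired: date  # acquisition date
    sensor: str     # e.g. "Sentinel-2" or "Landsat"


class ToyEOWorkspace:
    """Toy reset/step loop over a file collection indexed by location, time, sensor."""

    def __init__(self, records, max_steps=5):
        self.records = list(records)
        self.max_steps = max_steps
        self.steps = 0

    def reset(self):
        """Start a new episode and return (observation, info)."""
        self.steps = 0
        return {"message": "workspace ready", "files": []}, {}

    def step(self, action):
        """Run one tool call and return the Gymnasium-style 5-tuple."""
        self.steps += 1
        terminated = False
        reward = 0.0  # a real benchmark would score the trajectory here
        if action["tool"] == "search_files":
            # Filter the index on all three keys: location, sensor, time range.
            hits = [
                r.path for r in self.records
                if r.location == action["location"]
                and r.sensor == action["sensor"]
                and action["start"] <= r.acquired <= action["end"]
            ]
            observation = {"message": f"{len(hits)} file(s) found", "files": hits}
        elif action["tool"] == "submit_answer":
            observation = {"message": "answer received", "files": []}
            terminated = True
        else:
            observation = {"message": f"unknown tool: {action['tool']}", "files": []}
        truncated = self.steps >= self.max_steps and not terminated
        return observation, reward, terminated, truncated, {}


# Hypothetical three-file workspace and a two-step episode.
records = [
    FileRecord("tiles/a/s2_2021.tif", "tile_a", date(2021, 6, 1), "Sentinel-2"),
    FileRecord("tiles/a/l8_2015.tif", "tile_a", date(2015, 6, 1), "Landsat"),
    FileRecord("tiles/b/s2_2021.tif", "tile_b", date(2021, 6, 1), "Sentinel-2"),
]
env = ToyEOWorkspace(records)
obs, info = env.reset()
search_obs, reward, terminated, truncated, info = env.step({
    "tool": "search_files",
    "location": "tile_a",
    "sensor": "Sentinel-2",
    "start": date(2020, 1, 1),
    "end": date(2022, 1, 1),
})
final_obs, reward, done, truncated, info = env.step(
    {"tool": "submit_answer", "answer": "placeholder"}
)
```

The point of the sketch is the contrast with single-turn benchmarks: the agent does not receive a fixed input but must decide which files to retrieve and which tool to call next, with each observation feeding into the following step.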
Key facts
- EO-Gym is a multimodal interactive environment for Earth Observation agents.
- It addresses the gap left by fixed-input, single-turn EO benchmarks.
- The environment is a Gymnasium-style local geospatial workspace.
- It contains over 660,000 multimodal files indexed by location, time, and sensor type.
- It includes 35 EO-specialized tools across six task families.
- EO-Gym-Data benchmark has 9,078 trajectories and 34,604 reasoning steps.
- Grounded in eight public EO datasets plus Landsat and Sentinel-2 imagery.
- 10 VLMs evaluated; general-purpose models still struggle.
Entities
Institutions
- arXiv