GeoX: Self-Play Framework Enhances Geospatial Reasoning in VLMs

ai-technology · 2026-05-20

GeoX is a self-play framework that improves geospatial reasoning in vision-language models (VLMs) without large human-curated datasets. It poses spatial problems as executable programs and solves them through abduction, deduction, and induction, while a verifier executes the programs to supply reward signals for reinforcement learning. On satellite and aerial imagery, GeoX improves base VLMs by up to 5.5 points on average, matching or exceeding baselines trained on millions of curated examples. Because its rewards are verified by program execution, the framework avoids the costly annotation that complex spatial queries would otherwise require.
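
The propose, solve, and verify loop described above can be illustrated with a minimal Python sketch. It is not GeoX's implementation. It assumes the three modes follow the usual program-reasoning definitions (deduction predicts a program's output, abduction finds an input that yields a given output, induction writes a program that matches examples) and that the verifier returns a binary reward. The toy program `count_in_region` and the `reward_*` helpers are hypothetical names introduced only for illustration.

```python
from typing import Callable, List, Tuple

# Axis-aligned box in pixel coordinates: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]
Program = Callable[[List[Box], Box], int]


def count_in_region(boxes: List[Box], region: Box) -> int:
    """Toy spatial program (hypothetical): count boxes whose centre lies in `region`."""
    rx0, ry0, rx1, ry1 = region
    count = 0
    for x0, y0, x1, y1 in boxes:
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        if rx0 <= cx <= rx1 and ry0 <= cy <= ry1:
            count += 1
    return count


def reward_deduction(program: Program, boxes: List[Box], region: Box,
                     predicted_output: int) -> float:
    """Deduction (assumed definition): the solver predicts the program's output."""
    return 1.0 if program(boxes, region) == predicted_output else 0.0


def reward_abduction(program: Program, boxes: List[Box], target_output: int,
                     proposed_region: Box) -> float:
    """Abduction (assumed definition): the solver proposes an input that yields the target."""
    return 1.0 if program(boxes, proposed_region) == target_output else 0.0


def reward_induction(candidate: Program,
                     examples: List[Tuple[List[Box], Box, int]]) -> float:
    """Induction (assumed definition): the solver writes a program matching all examples."""
    try:
        ok = all(candidate(b, r) == out for b, r, out in examples)
    except Exception:
        ok = False  # a crashing candidate earns no reward
    return 1.0 if ok else 0.0
```

In each case the reward comes from running the program rather than from a human label, which is the property the summary attributes to the verifier. In the actual framework the answers would come from a VLM looking at imagery, and the reward design may differ from this binary scheme.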

Key facts

GeoX is a self-play framework for geospatial reasoning.
It uses executable programs to propose and solve spatial problems.
Three reasoning modes: abduction, deduction, induction.
A verifier executes programs to provide reward signals.
Reinforcement learning jointly optimizes problem proposal and solving.
GeoX improves base VLMs by up to 5.5 points on average.
It matches or exceeds baselines trained on millions of curated examples.
The framework targets satellite and aerial images.

Sources

arXiv cs.AI — 2026-05-20