ARTFEED — Contemporary Art Intelligence

GeoSym127K: A Scalable Framework for Multimodal Geometric Reasoning

other · 2026-05-20

Researchers have released GeoSym127K, a dataset built to improve multimodal geometric reasoning. The project targets two weaknesses of Large Multimodal Models (LMMs): visual hallucinations and a shortage of precise Chain-of-Thought (CoT) data. The data is produced by the GeoSym Engine, an automated neuro-symbolic framework that combines a type-conditional grammar with an analytic SymGT Solver to generate accurate symbolic ground truths, plus a rendering pipeline that draws precise diagrams. The dataset contains 51,000 high-resolution images, 127,000 questions with symbolic ground truths, and 55,000 answer-verified CoT QA pairs, organized by difficulty. A companion benchmark, GeoSym-Bench, provides an expert-curated set of 511 complex samples for fine-grained evaluation. In supervised fine-tuning experiments, GeoSym data improves performance on diagram-dependent and multi-step geometry tasks.

Key facts

  • GeoSym127K is a dataset for multimodal geometric reasoning.
  • It addresses visual hallucinations and lack of precise CoT data in LMMs.
  • The GeoSym Engine is an automated neuro-symbolic framework.
  • It uses a type-conditional grammar and SymGT Solver for symbolic ground truths.
  • The dataset includes 51K high-resolution images and 127K questions.
  • It has 55K answer-verified CoT QA pairs.
  • GeoSym-Bench is an expert-curated suite of 511 complex samples.
  • Supervised fine-tuning shows improvements on diagram-dependent and multi-step tasks.
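
To illustrate the general idea of pairing a type-conditional grammar with an analytic solver, here is a minimal Python sketch. It is not the GeoSym Engine's implementation: the grammar table, the shape types, and the names GRAMMAR, sample_problem, and solve are all hypothetical, and exact rational arithmetic stands in for a real symbolic solver.

```python
import random
from fractions import Fraction

# Hypothetical grammar: the shape type conditions which parameters
# and which question kinds are valid for a generated problem.
GRAMMAR = {
    "rectangle": {
        "params": ("width", "height"),
        "questions": ("area", "perimeter"),
    },
    "right_triangle": {
        "params": ("leg_a", "leg_b"),
        "questions": ("area", "hypotenuse_squared"),
    },
}


def sample_problem(rng):
    """Sample a shape type, then only parameters and questions valid for it."""
    shape = rng.choice(sorted(GRAMMAR))
    rule = GRAMMAR[shape]
    params = {
        name: Fraction(rng.randint(1, 20), rng.randint(1, 4))
        for name in rule["params"]
    }
    question = rng.choice(rule["questions"])
    return {"shape": shape, "params": params, "question": question}


def solve(problem):
    """Compute an exact ground-truth answer analytically (no rounding)."""
    p = problem["params"]
    shape, question = problem["shape"], problem["question"]
    if shape == "rectangle":
        w, h = p["width"], p["height"]
        return w * h if question == "area" else 2 * (w + h)
    if shape == "right_triangle":
        a, b = p["leg_a"], p["leg_b"]
        return a * b / 2 if question == "area" else a * a + b * b
    raise ValueError(f"unknown shape type: {shape}")


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so generation is reproducible
    for _ in range(3):
        problem = sample_problem(rng)
        print(problem["shape"], problem["question"], solve(problem))
```

Because each answer is derived from the sampled parameters rather than read off a rendered image, the ground truth in this sketch is exact by construction, which is the property the article attributes to the symbolic pipeline.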
