Symbolic Inputs Boost LLM Performance on Abstract Visual Reasoning
A recent study published on arXiv (2604.21346) asks whether vision-language models (VLMs) struggle with abstract visual reasoning because of weak reasoning or weak representations. Using the Bongard-LOGO benchmark, the researchers compared end-to-end VLMs given raw images against large language models (LLMs) given symbolic inputs derived from the same images. The study introduces the Componential-Grammatical (C-G) approach, which recasts Bongard-LOGO as a symbolic reasoning task by representing each shape as a LOGO-style action program or a structured description. LLMs reached mid-90s accuracy on Free-form problems, while a strong visual baseline remained near chance under matched task definitions. Ablations over input formats, explicit concept prompts, and minimal visual cues point to representation, not reasoning, as the bottleneck.
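To make the symbolic reformulation concrete, the sketch below shows one plausible way a LOGO-style action program could be rendered as a textual few-shot prompt for an LLM. The action vocabulary, tuple format, and prompt wording are illustrative assumptions, not the paper's actual encoding.

```python
# Hypothetical sketch of the C-G idea: turn LOGO-style action programs into
# a symbolic text prompt for an LLM. The (action, length, angle) tuple format
# and the prompt phrasing are assumptions for illustration only.

def program_to_description(program):
    """Render a list of (action, length, angle) tuples as a readable description."""
    steps = [
        f"{action} stroke of length {length}, then turn {angle} degrees"
        for action, length, angle in program
    ]
    return "; ".join(steps)

def build_prompt(positive, negative, query):
    """Assemble a few-shot symbolic classification prompt from example programs."""
    lines = ["Each shape is described by its drawing program."]
    lines.append("Positive examples:")
    for p in positive:
        lines.append("- " + program_to_description(p))
    lines.append("Negative examples:")
    for n in negative:
        lines.append("- " + program_to_description(n))
    lines.append("Query: " + program_to_description(query))
    lines.append("Does the query belong to the positive set? Answer yes or no.")
    return "\n".join(lines)

# Toy example: positive shapes are two straight strokes, negatives are arcs.
demo_pos = [[("line", 2, 90), ("line", 2, 90)]]
demo_neg = [[("arc", 1, 45)]]
query = [("line", 2, 90), ("line", 2, 90)]
print(build_prompt(demo_pos, demo_neg, query))
```

The point of such an encoding is that the LLM never sees pixels: the perceptual problem is solved upstream, isolating the reasoning step the study wanted to measure.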
Key facts
- Study compares VLMs on raw images with LLMs on symbolic inputs
- Uses Bongard-LOGO synthetic benchmark for abstract concept learning
- C-G paradigm reformulates benchmark as symbolic reasoning task
- LLMs reach mid-90s accuracy on Free-form problems
- Visual baseline remains near chance under matched task definitions
- Ablations test input format, concept prompts, and visual cues
- Published on arXiv with ID 2604.21346
- Findings point to representational bottlenecks in VLMs rather than reasoning failures
Entities
Institutions
- arXiv