ARTFEED — Contemporary Art Intelligence

Symbolic Inputs Boost LLM Performance on Abstract Visual Reasoning

ai-technology · 2026-04-25

A recent study posted to arXiv (2604.21346) asks whether vision-language models (VLMs) fail at abstract visual reasoning because of weak reasoning or weak representation. Using the Bongard-LOGO benchmark, the researchers compared end-to-end VLMs given raw images against large language models (LLMs) given symbolic inputs derived from those same images. They introduced the Componential-Grammatical (C-G) paradigm, which reformulates Bongard-LOGO as a symbolic reasoning task via LOGO-style action programs or structured descriptions. LLMs reached mid-90s accuracy on Free-form problems, while a strong visual baseline stayed near chance under matched task definitions. Ablations over input formats, explicit concept prompts, and minimal visual cues point to representation, not reasoning, as the bottleneck.

Key facts

  • Study compares VLMs on raw images with LLMs on symbolic inputs
  • Uses Bongard-LOGO synthetic benchmark for abstract concept learning
  • C-G paradigm reformulates benchmark as symbolic reasoning task
  • LLMs reach mid-90s accuracy on Free-form problems
  • Visual baseline remains near chance under matched definitions
  • Ablations test input format, concept prompts, and visual cues
  • Published on arXiv with ID 2604.21346
  • Research highlights representational bottlenecks in VLMs
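To make the setup concrete, the sketch below shows one plausible way to serialize LOGO-style action programs into a text prompt for an LLM, in the spirit of the C-G reformulation. The `Action` vocabulary, field names, and prompt wording here are illustrative assumptions, not the paper's actual format.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical LOGO-style stroke vocabulary; the paper's real
# action-program grammar may differ.
@dataclass
class Action:
    op: str        # stroke type, e.g. "line" or "arc"
    length: float  # normalized stroke length
    turn: float    # turn angle (degrees) applied after the stroke

def render_program(actions: List[Action]) -> str:
    """Serialize one shape's action program as a single text line."""
    return " ".join(f"{a.op}({a.length:.1f},{a.turn:.0f})" for a in actions)

def build_prompt(positives: List[List[Action]],
                 negatives: List[List[Action]],
                 query: List[Action]) -> str:
    """Assemble a Bongard-style few-shot prompt from symbolic programs."""
    lines = ["Set A (these shapes share a hidden concept):"]
    lines += [f"  {render_program(p)}" for p in positives]
    lines.append("Set B (these shapes violate the concept):")
    lines += [f"  {render_program(n)}" for n in negatives]
    lines.append("Query shape:")
    lines.append(f"  {render_program(query)}")
    lines.append("Does the query belong to Set A or Set B? Answer A or B.")
    return "\n".join(lines)

# Tiny worked example: Set A = closed zigzags, Set B = smooth arcs.
pos = [[Action("line", 1.0, 90), Action("line", 1.0, 90)],
       [Action("line", 0.8, 120), Action("line", 0.8, 120)]]
neg = [[Action("arc", 1.0, 30)], [Action("arc", 0.6, 45)]]
qry = [Action("line", 0.9, 90), Action("line", 0.9, 90)]
prompt = build_prompt(pos, neg, qry)
print(prompt)
```

The point of the exercise is that once shapes are expressed this way, the concept-induction step becomes pure symbolic pattern matching over text, which is where the study finds LLMs excel even as raw-pixel baselines hover near chance.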

Entities

Institutions

  • arXiv
