GraphARC Benchmark Tests AI on Graph-Based Abstract Reasoning
Researchers have introduced GraphARC, a benchmark for assessing abstract reasoning over graph-structured data. Where earlier benchmarks are limited to grids or text, GraphARC extends the few-shot transformation learning paradigm of the Abstraction and Reasoning Corpus (ARC) to graphs. Each task requires inferring a transformation rule from a few input-output graph pairs and applying it to a new test graph; the transformations span local, global, and hierarchical changes. Instances can be generated at scale across diverse graph families and sizes, which supports thorough evaluation of generalization. Evaluations of state-of-the-art language models reveal a comprehension-execution gap: the models can answer questions about graph properties but often fail to solve full transformation tasks, and performance degrades further as complexity increases. The benchmark is described in a paper available on arXiv (2605.31031).
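To make the task format concrete, here is a minimal Python sketch of a single few-shot graph transformation task. It is an illustration only and not the benchmark's actual data format or code: the graph encoding, the helper names (`make_graph`, `mark_leaves`, `fits_examples`), and the "recolor every leaf" rule are all assumptions chosen for the example.

```python
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

# Assumed encoding: a graph is (set of undirected edges, color per node).
Edge = FrozenSet[int]
Graph = Tuple[Set[Edge], Dict[int, str]]
Rule = Callable[[Graph], Graph]


def make_graph(edges: List[Tuple[int, int]], color: str = "blue") -> Graph:
    """Build an undirected graph whose nodes all start with the same color."""
    edge_set = {frozenset(e) for e in edges}
    nodes = {n for e in edges for n in e}
    return edge_set, {n: color for n in nodes}


def degree(graph: Graph, node: int) -> int:
    """Count the edges that touch the given node."""
    edges, _ = graph
    return sum(1 for e in edges if node in e)


def mark_leaves(graph: Graph) -> Graph:
    """Hypothetical local rule: recolor every degree-1 node to 'red'."""
    edges, colors = graph
    recolored = {
        n: "red" if degree(graph, n) == 1 else c for n, c in colors.items()
    }
    return set(edges), recolored


def fits_examples(rule: Rule, examples: List[Tuple[Graph, Graph]]) -> bool:
    """A candidate rule is acceptable only if it reproduces every example output."""
    return all(rule(inp) == out for inp, out in examples)


# Two input-output pairs that demonstrate the hidden rule.
path3 = make_graph([(0, 1), (1, 2)])
star4 = make_graph([(0, 1), (0, 2), (0, 3)])
examples = [
    (path3, (path3[0], {0: "red", 1: "blue", 2: "red"})),
    (star4, (star4[0], {0: "blue", 1: "red", 2: "red", 3: "red"})),
]

# A fresh test graph from the same task: a path on four nodes.
test_input = make_graph([(0, 1), (1, 2), (2, 3)])
prediction = mark_leaves(test_input)
```

In this sketch a solver would first check `fits_examples(mark_leaves, examples)` and then submit `prediction` as its answer for the test graph. The rule touches only individual nodes, so it stands in for the local category; global or hierarchical rules would need the whole graph or its substructures.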
Key facts
- GraphARC is a benchmark for abstract reasoning on graph-structured data.
- It generalizes the few-shot transformation learning paradigm of the Abstraction and Reasoning Corpus (ARC).
- Each task requires inferring a transformation rule from a few input-output pairs and applying it to a new test graph.
- Transformations cover local, global, and hierarchical graph changes.
- GraphARC instances can be generated at scale across diverse graph families and sizes.
- State-of-the-art language models show a comprehension-execution gap on GraphARC.
- Models can answer questions about graph properties but often fail to solve full transformation tasks.
- Performance degrades further as complexity increases.
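The points about scalable generation and full-task evaluation can be sketched in a few lines of Python. This is a hedged illustration, not GraphARC's generator: the three graph families, the edge-complement rule (used here as a stand-in for a global transformation), and all function names are assumptions made for the example.

```python
from typing import Callable, Dict, List, Set, Tuple

# Assumed encoding: an undirected edge is a pair (u, v) with u < v.
Edge = Tuple[int, int]


def path_graph(n: int) -> Set[Edge]:
    return {(i, i + 1) for i in range(n - 1)}


def cycle_graph(n: int) -> Set[Edge]:
    # Assumes n >= 3 so that the closing edge (0, n - 1) is well formed.
    return path_graph(n) | {(0, n - 1)}


def star_graph(n: int) -> Set[Edge]:
    return {(0, i) for i in range(1, n)}


# Hypothetical set of graph families to sample instances from.
FAMILIES: Dict[str, Callable[[int], Set[Edge]]] = {
    "path": path_graph,
    "cycle": cycle_graph,
    "star": star_graph,
}


def complement(n: int, edges: Set[Edge]) -> Set[Edge]:
    """Hypothetical global rule: keep exactly the edges that were absent."""
    all_pairs = {(u, v) for u in range(n) for v in range(u + 1, n)}
    return all_pairs - edges


def make_instance(family: str, n: int) -> dict:
    """Create one input-output pair for a given family and size."""
    edges = FAMILIES[family](n)
    return {"family": family, "n": n, "input": edges,
            "output": complement(n, edges)}


def generate(families: List[str], sizes: List[int]) -> List[dict]:
    """Sweep every family at every size to probe generalization."""
    return [make_instance(f, n) for f in families for n in sizes]


def exact_match(predicted: Set[Edge], instance: dict) -> bool:
    """Full-task scoring: the whole output graph must be correct."""
    return predicted == instance["output"]


instances = generate(["path", "cycle", "star"], [4, 8, 16])
```

Exact-match scoring over the whole output graph is one plausible way to measure the execution side of the gap: a prediction with a single missing or extra edge counts as a failure, even if the solver could describe the rule correctly.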
Entities
Institutions
- arXiv