New AI Benchmark DW-Bench Tests LLMs on Data Warehouse Graph Reasoning

ai-technology · 2026-04-22

A research paper introduces DW-Bench, a benchmark that evaluates how well large language models reason about graph topology in data warehouse schemas. The benchmark combines foreign-key relationships and data-lineage edges across five schemas and contains 1,046 automatically generated questions that have been verified for correctness. In the reported experiments, tool-augmented methods perform significantly better than static approaches, but they plateau on hard compositional question subtypes. The paper was submitted to arXiv, a repository for scientific preprints.

Key facts

DW-Bench is a new benchmark for evaluating LLMs.
It focuses on graph-topology reasoning over data warehouse schemas.
The benchmark integrates foreign-key and data-lineage edges.
It comprises 1,046 automatically generated questions.
Questions are verifiably correct.
Five different schemas are used.
Tool-augmented methods outperform static approaches.
Performance plateaus on hard compositional subtypes.
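
To make the idea of graph-topology reasoning over a schema concrete, here is a minimal sketch in Python. It is not taken from the paper: the table names, the edge list, and the helper functions are all hypothetical, and the question format is only an assumption about what a compositional question might look like. It models a schema as a directed graph with two edge kinds (foreign key and lineage) and answers one question that needs both.

```python
from collections import defaultdict, deque

# Hypothetical toy schema. Table names and edges are illustrative only
# and do not come from DW-Bench.
# Each edge is (source, target, kind):
#   "foreign_key": source table references target table
#   "lineage":     data flows from source table into target table
EDGES = [
    ("fact_sales", "dim_customer", "foreign_key"),
    ("fact_sales", "dim_product", "foreign_key"),
    ("raw_orders", "stg_orders", "lineage"),
    ("stg_orders", "fact_sales", "lineage"),
    ("fact_sales", "rpt_revenue", "lineage"),
]


def build_graph(edges, kind):
    """Build an adjacency map containing only edges of the given kind."""
    graph = defaultdict(set)
    for src, dst, edge_kind in edges:
        if edge_kind == kind:
            graph[src].add(dst)
    return graph


def reachable(graph, start):
    """Return every node reachable from start by breadth-first search."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def lineage_downstream(table):
    """Tables that receive data, directly or transitively, from table."""
    return reachable(build_graph(EDGES, "lineage"), table)


def downstream_with_fk(table, referenced):
    """Compositional question: which tables downstream of `table` by
    lineage also hold a direct foreign key to `referenced`?"""
    fk_graph = build_graph(EDGES, "foreign_key")
    return sorted(
        t for t in lineage_downstream(table) if referenced in fk_graph.get(t, ())
    )


if __name__ == "__main__":
    print(sorted(lineage_downstream("raw_orders")))
    print(downstream_with_fk("raw_orders", "dim_customer"))
```

The second function chains a multi-hop lineage traversal with a foreign-key lookup. That is one plausible reading of a "compositional" question, in which the answer depends on combining more than one kind of edge.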
