ARTFEED — Contemporary Art Intelligence

Semantic Context Boosts LLM Accuracy on Database Queries by 17-23 Points

ai-technology · 2026-04-30

A recent study posted on arXiv (2604.25149) finds that supplying LLMs with a semantic context document substantially improves their accuracy on natural-language queries against analytical databases. Researchers evaluated three leading models (Claude Opus 4.7, Claude Sonnet 4.6, and GPT-5.4) on 100 questions over the Cleaned Contoso Retail Dataset in ClickHouse. Using a paired single-shot protocol, each model answered every question twice: once with only the warehouse schema, and once with the schema plus a 4 KB markdown document detailing measures, conventions, and disambiguation rules. Adding the document lifted accuracy by +17 to +23 percentage points for every model. With the document, the models scored between 67.7% and 68.7%; without it, their scores were statistically indistinguishable from one another. The authors note that both wrong answers and confident hallucinations arise when models try to infer business semantics that the schema does not encode.
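
The paired protocol described above can be sketched as follows. This is an illustrative mock-up, not the paper's code: the table, the semantic-context snippets, and the `build_prompt` helper are all hypothetical, and the real context document is a full 4 KB markdown file.

```python
# Sketch of the paired single-shot protocol: each question is asked twice,
# once with the schema alone and once with the schema plus a semantic
# context document. All names here are illustrative, not from the paper.

SCHEMA = """\
CREATE TABLE sales (
    order_date Date,
    product_key UInt32,
    net_price Decimal(18, 2),
    quantity UInt16
) ENGINE = MergeTree ORDER BY order_date;
"""

# ~4 KB in the study; abbreviated here. Spells out measures, conventions,
# and disambiguation rules that the schema alone cannot express.
SEMANTIC_CONTEXT = """\
## Measures
- "revenue" means SUM(net_price * quantity), not SUM(net_price).
## Conventions
- "last year" is the most recent complete calendar year in the data.
## Disambiguation
- "best-selling" ranks by quantity unless revenue is requested explicitly.
"""

def build_prompt(question: str, with_context: bool) -> str:
    """Assemble one single-shot prompt; the paired run flips with_context."""
    parts = ["You answer questions by writing one ClickHouse SQL query.",
             "Schema:\n" + SCHEMA]
    if with_context:
        parts.append("Semantic context:\n" + SEMANTIC_CONTEXT)
    parts.append("Question: " + question)
    return "\n\n".join(parts)

question = "What was our revenue last year?"
bare = build_prompt(question, with_context=False)
enriched = build_prompt(question, with_context=True)
# Only the enriched prompt tells the model what "revenue" means; with the
# bare prompt the model must guess, which is where hallucinations arise.
```

The point of the design is that the schema alone fixes syntax but not semantics: both prompts yield valid SQL targets, but only the enriched one disambiguates business terms like "revenue".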

Key facts

  • Study published on arXiv under ID 2604.25149
  • Benchmarked Claude Opus 4.7, Claude Sonnet 4.6, and GPT-5.4
  • Used Cleaned Contoso Retail Dataset in ClickHouse
  • 100 natural-language questions tested
  • Paired single-shot protocol applied
  • Accuracy improved by +17 to +23 percentage points with semantic context
  • With context, models scored 67.7-68.7%
  • Without context, models were statistically indistinguishable

Entities

Institutions

  • arXiv

Sources