SymbolBench Benchmark Tests LLMs on Time Series Symbolic Reasoning
A new benchmark, SymbolBench, has been introduced to assess how well large language models (LLMs) perform symbolic reasoning over real-world time series data. It comprises three tasks: causal discovery, Boolean network inference, and multivariate symbolic regression. Unlike earlier efforts that focused solely on simple algebraic equations, SymbolBench covers a variety of symbolic forms at differing levels of complexity. The accompanying study also introduces a unified framework that combines LLMs with genetic programming to form a closed-loop system for symbolic reasoning. The work addresses a problem that traces back to Kepler's discovery of the laws of planetary motion: uncovering hidden symbolic laws from time series data. The paper is available on arXiv under identifier 2508.03963.
Key facts
- SymbolBench is a comprehensive benchmark for symbolic reasoning over real-world time series.
- The benchmark assesses three tasks: multivariate symbolic regression, Boolean network inference, and causal discovery.
- SymbolBench covers diverse symbolic forms with varying complexity, unlike prior limited efforts.
- A unified framework integrates LLMs with genetic programming for closed-loop symbolic reasoning.
- The aspiration of uncovering symbolic laws from time series dates back to Kepler's discovery of the laws of planetary motion.
- The research is published on arXiv with identifier 2508.03963.
- LLMs show promise in structured reasoning tasks, but their ability to reason symbolically over time series is underexplored.
- The study systematically evaluates LLM capability in inferring interpretable, context-aligned symbolic structures.
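To make the genetic-programming half of the closed-loop framework concrete, here is a minimal, hypothetical sketch of symbolic regression by evolutionary search. It is not SymbolBench's actual implementation: in the paper's framework an LLM would propose and critique candidate expressions, whereas here candidates are random expression trees evolved toward fit on observed data. All names (`random_expr`, `evolve`, etc.) are illustrative assumptions.

```python
# Hypothetical sketch: genetic programming for symbolic regression.
# Candidates are expression trees over one variable 'x' and constants;
# fitness is mean squared error against an observed series.
import random

OPS = {"+": lambda a, b: a + b,
       "-": lambda a, b: a - b,
       "*": lambda a, b: a * b}

def random_expr(depth=2):
    """Build a random expression tree over 'x' and constants."""
    if depth == 0 or random.random() < 0.3:
        return random.choice(["x", random.uniform(-2, 2)])
    op = random.choice(list(OPS))
    return (op, random_expr(depth - 1), random_expr(depth - 1))

def evaluate(expr, x):
    """Recursively evaluate an expression tree at a point x."""
    if expr == "x":
        return x
    if isinstance(expr, float):
        return expr
    op, left, right = expr
    return OPS[op](evaluate(left, x), evaluate(right, x))

def fitness(expr, xs, ys):
    """Mean squared error of the candidate against the observed series."""
    return sum((evaluate(expr, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def mutate(expr):
    """Replace a random subtree with a freshly generated one."""
    if not isinstance(expr, tuple) or random.random() < 0.3:
        return random_expr(2)
    op, left, right = expr
    if random.random() < 0.5:
        return (op, mutate(left), right)
    return (op, left, mutate(right))

def evolve(xs, ys, pop_size=50, generations=30):
    """Evolve a population, keeping the best half each generation."""
    pop = [random_expr() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda e: fitness(e, xs, ys))
        survivors = pop[: pop_size // 2]
        pop = survivors + [mutate(random.choice(survivors))
                           for _ in range(pop_size - len(survivors))]
    return min(pop, key=lambda e: fitness(e, xs, ys))

# Try to recover y = 2x + 1 from noiseless samples.
random.seed(0)
xs = [i / 10 for i in range(20)]
ys = [2 * x + 1 for x in xs]
best = evolve(xs, ys)
print(fitness(best, xs, ys))
```

In the closed-loop setting the paper describes, the random `mutate` step would be replaced or guided by LLM proposals, letting contextual knowledge steer the search toward interpretable, domain-aligned expressions.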