ARTFEED — Contemporary Art Intelligence

SymbolBench Benchmark Tests LLMs on Time Series Symbolic Reasoning

ai-technology · 2026-04-27

A new benchmark, SymbolBench, evaluates how well large language models (LLMs) perform symbolic reasoning over real-world time series data. It comprises three tasks: causal discovery, Boolean network inference, and multivariate symbolic regression. Unlike earlier benchmarks that focused narrowly on basic algebraic equations, SymbolBench spans diverse symbolic forms at varying levels of complexity. The accompanying study also introduces a unified framework that couples LLMs with genetic programming in a closed loop for symbolic reasoning. The work addresses a problem that dates back to Kepler's laws of planetary motion: uncovering hidden symbolic laws from time series observations. The paper is available on arXiv under identifier 2508.03963.
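
To make the symbolic-regression task concrete, here is a minimal, self-contained sketch of a genetic-programming-style search, not the SymbolBench framework itself: it evolves small expression trees to fit samples of a hidden law (here y = x² + x, a toy target chosen for illustration). In the paper's closed loop an LLM would propose and refine candidates; simple random mutation stands in for that step.

```python
import random

# Operators available to the search (assumption: a tiny fixed operator set).
OPS = {"+": lambda a, b: a + b,
       "-": lambda a, b: a - b,
       "*": lambda a, b: a * b}

def random_tree(depth=2):
    """Build a random expression tree over x and small constants."""
    if depth == 0 or random.random() < 0.3:
        return "x" if random.random() < 0.7 else random.choice([1.0, 2.0])
    op = random.choice(list(OPS))
    return (op, random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, x):
    """Recursively evaluate a tree at a point x."""
    if tree == "x":
        return x
    if isinstance(tree, float):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))

def fitness(tree, xs, ys):
    """Sum of squared errors against the observed samples (lower is better)."""
    return sum((evaluate(tree, x) - y) ** 2 for x, y in zip(xs, ys))

def mutate(tree):
    """Replace a random subtree with a freshly generated one."""
    if isinstance(tree, tuple) and random.random() < 0.7:
        op, left, right = tree
        if random.random() < 0.5:
            return (op, mutate(left), right)
        return (op, left, mutate(right))
    return random_tree(depth=2)

def search(xs, ys, generations=300, seed=0):
    """Hill-climbing stand-in for the propose-and-refine loop."""
    random.seed(seed)
    best = random_tree()
    best_err = fitness(best, xs, ys)
    for _ in range(generations):
        cand = mutate(best)
        err = fitness(cand, xs, ys)
        if err < best_err:
            best, best_err = cand, err
    return best, best_err

xs = [i / 4 for i in range(-8, 9)]
ys = [x * x + x for x in xs]          # hidden "law" the search tries to recover
expr, err = search(xs, ys)
print(expr, err)
```

This single-candidate hill-climber is deliberately bare; real genetic programming maintains a population with crossover and selection, and the benchmark's framework additionally constrains candidates to be interpretable and context-aligned.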

Key facts

  • SymbolBench is a comprehensive benchmark for symbolic reasoning over real-world time series.
  • The benchmark assesses three tasks: multivariate symbolic regression, Boolean network inference, and causal discovery.
  • SymbolBench covers diverse symbolic forms with varying complexity, unlike prior limited efforts.
  • A unified framework integrates LLMs with genetic programming for closed-loop symbolic reasoning.
  • The aspiration of uncovering symbolic laws from time series dates back to Kepler's laws of planetary motion.
  • The research is published on arXiv with identifier 2508.03963.
  • LLMs show promise in structured reasoning tasks, but their ability to perform symbolic reasoning over time series remains underexplored.
  • The study systematically evaluates LLM capability in inferring interpretable, context-aligned symbolic structures.
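
The Boolean network inference task listed above can also be illustrated with a toy sketch. The following is a hypothetical example, not the benchmark's method: given a binary state trajectory, it searches for a 2-input AND/OR/XOR update rule per node that is consistent with every observed transition. The 3-node network and its rules are invented for the example.

```python
from itertools import combinations, product

# Candidate 2-input Boolean update rules (assumption: a small fixed family).
RULES = {"AND": lambda a, b: a & b,
         "OR":  lambda a, b: a | b,
         "XOR": lambda a, b: a ^ b}

def infer_rules(trajectory):
    """For each node, find regulators (i, j) and a rule reproducing
    every consecutive transition in the trajectory."""
    n = len(trajectory[0])
    found = {}
    for target in range(n):
        for (i, j), (name, fn) in product(combinations(range(n), 2),
                                          RULES.items()):
            if all(fn(s[i], s[j]) == t[target]
                   for s, t in zip(trajectory, trajectory[1:])):
                found[target] = (name, i, j)   # first consistent rule wins
                break
    return found

# Toy ground-truth network: x0' = x1 OR x2, x1' = x0 AND x2, x2' = x0 XOR x1.
def step(s):
    return (s[1] | s[2], s[0] & s[2], s[0] ^ s[1])

traj = [(1, 0, 1)]
for _ in range(6):
    traj.append(step(traj[-1]))

print(infer_rules(traj))
```

With a short trajectory, several rules may fit the data equally well, so the recovered network is consistent with the observations but not necessarily identical to the generator; this underdetermination is part of what makes the task hard.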

Entities

Institutions

  • arXiv

Sources