SymbolBench Benchmark Tests LLMs on Time Series Symbolic Reasoning
A new benchmark, SymbolBench, has been introduced to assess how well large language models (LLMs) perform symbolic reasoning over real-world time series data. It comprises three tasks: causal discovery, Boolean network inference, and multivariate symbolic regression. Unlike earlier efforts that focused solely on simple algebraic equations, SymbolBench covers a variety of symbolic forms at differing levels of complexity. The accompanying study also introduces a unified framework that combines LLMs with genetic programming to form a closed-loop system for symbolic reasoning. The work addresses a problem that traces back to Kepler's discovery of the laws of planetary motion: uncovering hidden symbolic laws from time series data. The paper is available on arXiv under identifier 2508.03963.
Key facts
- SymbolBench is a comprehensive benchmark for symbolic reasoning over real-world time series.
- The benchmark assesses three tasks: multivariate symbolic regression, Boolean network inference, and causal discovery.
- SymbolBench covers diverse symbolic forms with varying complexity, unlike prior limited efforts.
- A unified framework integrates LLMs with genetic programming for closed-loop symbolic reasoning.
- The aspiration of uncovering symbolic laws from time series dates back to Kepler's discovery of the laws of planetary motion.
- The research is published on arXiv with identifier 2508.03963.
- LLMs show promise in structured reasoning tasks, but their ability to reason symbolically over time series is underexplored.
- The study systematically evaluates LLM capability in inferring interpretable, context-aligned symbolic structures.
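To make the genetic-programming half of the closed-loop framework concrete, here is a minimal, hypothetical sketch of symbolic regression by evolutionary search. It is not SymbolBench's actual implementation: in the paper's framework an LLM would propose and critique candidate expressions, whereas here candidates are random expression trees evolved toward fit on observed data. All names (`random_expr`, `evolve`, etc.) are illustrative assumptions.

```python
# Hypothetical sketch: genetic programming for symbolic regression.
# Candidates are expression trees over one variable 'x' and constants;
# fitness is mean squared error against an observed series.
import random

OPS = {"+": lambda a, b: a + b,
       "-": lambda a, b: a - b,
       "*": lambda a, b: a * b}

def random_expr(depth=2):
    """Build a random expression tree over 'x' and constants."""
    if depth == 0 or random.random() < 0.3:
        return random.choice(["x", random.uniform(-2, 2)])
    op = random.choice(list(OPS))
    return (op, random_expr(depth - 1), random_expr(depth - 1))

def evaluate(expr, x):
    """Recursively evaluate an expression tree at a point x."""
    if expr == "x":
        return x
    if isinstance(expr, float):
        return expr
    op, left, right = expr
    return OPS[op](evaluate(left, x), evaluate(right, x))

def fitness(expr, xs, ys):
    """Mean squared error of the candidate against the observed series."""
    return sum((evaluate(expr, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def mutate(expr):
    """Replace a random subtree with a freshly generated one."""
    if not isinstance(expr, tuple) or random.random() < 0.3:
        return random_expr(2)
    op, left, right = expr
    if random.random() < 0.5:
        return (op, mutate(left), right)
    return (op, left, mutate(right))

def evolve(xs, ys, pop_size=50, generations=30):
    """Evolve a population, keeping the best half each generation."""
    pop = [random_expr() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda e: fitness(e, xs, ys))
        survivors = pop[: pop_size // 2]
        pop = survivors + [mutate(random.choice(survivors))
                           for _ in range(pop_size - len(survivors))]
    return min(pop, key=lambda e: fitness(e, xs, ys))

# Try to recover y = 2x + 1 from noiseless samples.
random.seed(0)
xs = [i / 10 for i in range(20)]
ys = [2 * x + 1 for x in xs]
best = evolve(xs, ys)
print(fitness(best, xs, ys))
```

In the closed-loop setting the paper describes, the random `mutate` step would be replaced or guided by LLM proposals, letting contextual knowledge steer the search toward interpretable, domain-aligned expressions.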