TS-Skill Benchmark Evaluates Analytical Skills in Time-Series QA
A new benchmark, TS-Skill, measures core analytical skills that models need for time-series question answering (TSQA). It assesses three skills: temporal scale selection, temporal localization, and cross-interval integration. Existing TSQA benchmarks are organized by task type or high-level reasoning category. TS-Skill instead evaluates fundamental signal-level skills, using timestamp-aware questions drawn from a broad range of domains, and its question-answer pairs have been validated by humans. The researchers also built SKEvol, a skill-guided agentic framework used to construct the benchmark at scale. The benchmark targets large language models and time-series language models, and the paper is available on arXiv under ID 2605.24703.
Key facts
- TS-Skill evaluates three analytical skills: temporal scale selection (SK1), temporal localization (SK2), and cross-interval integration (SK3).
- Existing TSQA benchmarks are organized by task types or high-level reasoning categories, not signal-level capabilities.
- TS-Skill features timestamp-aware questions, broad domain coverage, and human validation of QA quality.
- SKEvol is a skill-guided agentic framework for constructing the benchmark at scale.
- The benchmark targets large language models (LLMs) and time-series language models (TSLMs) applied to time-series question answering.
- The paper is available on arXiv with ID 2605.24703.
- TSQA requires models to ground answers in temporal signals whose relevant patterns may appear at different scales, at specific time locations, or across separated intervals.
- The work aims to diagnose signal-level capabilities driving model performance in TSQA.
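To make the three skills more concrete, here is a toy illustration. It is not taken from the paper, the TS-Skill data, or SKEvol: the series, the timestamps, and the helper names (`make_series`, `daily_means`, `slope`, `locate_peak`, `interval_mean`) are all hypothetical. The sketch assumes an hourly series with a daily cycle, a slow upward trend, and one injected spike. It then answers one toy question per skill: a trend question that only becomes clear after aggregating to a daily scale (SK1), a "when did the peak occur?" question (SK2), and a comparison between two separated intervals (SK3).

```python
from datetime import datetime, timedelta
import math
from statistics import mean

# Hypothetical toy setup (not TS-Skill data): 14 days of hourly readings.
START = datetime(2024, 1, 1)
HOURS = 14 * 24
SPIKE_AT = 200  # hypothetical hour index of an injected anomaly


def make_series():
    """Daily cycle + slow upward trend + one spike, as (timestamp, value) pairs."""
    series = []
    for h in range(HOURS):
        ts = START + timedelta(hours=h)
        value = 10 * math.sin(2 * math.pi * h / 24)  # daily cycle dominates hourly changes
        value += 0.05 * h                             # slow trend, hidden at hourly scale
        if h == SPIKE_AT:
            value += 50                               # localized event
        series.append((ts, value))
    return series


def daily_means(series):
    """SK1-style step: aggregate to a coarser scale so the daily cycle averages out."""
    buckets = {}
    for ts, v in series:
        buckets.setdefault(ts.date(), []).append(v)
    return [mean(vs) for _, vs in sorted(buckets.items())]


def slope(ys):
    """Ordinary least-squares slope of ys against their index 0..n-1."""
    n = len(ys)
    x_bar = (n - 1) / 2
    y_bar = mean(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(ys))
    den = sum((x - x_bar) ** 2 for x in range(n))
    return num / den


def locate_peak(series):
    """SK2-style question: at which timestamp does the largest value occur?"""
    ts, _ = max(series, key=lambda p: p[1])
    return ts


def interval_mean(series, start, end):
    """SK3-style helper: mean over a half-open interval [start, end)."""
    return mean(v for ts, v in series if start <= ts < end)


if __name__ == "__main__":
    series = make_series()
    print("trend up (daily scale):", slope(daily_means(series)) > 0)
    print("peak at:", locate_peak(series))
    early = interval_mean(series, datetime(2024, 1, 2), datetime(2024, 1, 3))
    late = interval_mean(series, datetime(2024, 1, 12), datetime(2024, 1, 13))
    print("later day higher than earlier day:", late > early)
```

In this toy setup, the daily-mean slope is positive, `locate_peak` returns 2024-01-09 08:00, and the later interval's mean exceeds the earlier one. The split into separate helpers is only a convenience for the illustration and does not claim to mirror how TS-Skill questions are built or scored.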
Entities
Institutions
- arXiv