TS-Skill Benchmark Evaluates Analytical Skills in Time-Series QA

ai-technology · 2026-05-26

A new benchmark, TS-Skill, measures core analytical abilities in time-series question answering (TSQA). It assesses three skills: temporal scale selection, temporal localization, and cross-interval integration. Whereas prior TSQA benchmarks are organized around task types or high-level reasoning categories, TS-Skill evaluates signal-level skills using timestamp-aware questions across a broad range of domains, with QA quality validated by humans. The benchmark is constructed at scale with SKEvol, a skill-guided agentic framework. It targets large language models (LLMs) and TSLMs applied to TSQA. The paper is available on arXiv as 2605.24703.

Key facts

TS-Skill evaluates three analytical skills: temporal scale selection (SK1), temporal localization (SK2), and cross-interval integration (SK3).
Existing TSQA benchmarks are organized by task types or high-level reasoning categories, not signal-level capabilities.
TS-Skill includes timestamp-aware questions, broad domain coverage, and human-validated QA quality.
SKEvol is a skill-guided agentic framework for constructing the benchmark at scale.
The benchmark targets large language models (LLMs) and time-series language models (TSLMs) applied to time-series question answering.
The paper is available on arXiv with ID 2605.24703.
TSQA requires models to ground their answers in temporal signals, where the relevant pattern may appear at a particular scale, at a specific time location, or across separated intervals.
The work aims to diagnose signal-level capabilities driving model performance in TSQA.
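
To make the three skills concrete, here is a minimal Python sketch on a toy hourly series. It is not taken from the paper or from TS-Skill itself: the synthetic data, the function names (make_series, daily_means, locate_peak, interval_mean), and the example questions are all illustrative assumptions about what each skill might involve.

```python
from datetime import datetime, timedelta
from statistics import mean

def make_series(days=4):
    """Toy hourly series (hypothetical data): slow daily trend, afternoon bump, one spike."""
    start = datetime(2024, 1, 1)
    series = []
    for h in range(days * 24):
        ts = start + timedelta(hours=h)
        bump = 5 if 12 <= ts.hour < 18 else 0   # repeating afternoon pattern
        trend = 2 * (h // 24)                   # level rises by 2 each day
        series.append((ts, trend + bump))
    series[40] = (series[40][0], 50)            # isolated spike on Jan 2 at 16:00
    return series

def daily_means(series):
    """SK1-style question: the day-over-day trend is only visible at daily scale."""
    by_day = {}
    for ts, value in series:
        by_day.setdefault(ts.date(), []).append(value)
    return [mean(values) for _, values in sorted(by_day.items())]

def locate_peak(series):
    """SK2-style question: at which timestamp does the maximum occur?"""
    return max(series, key=lambda point: point[1])[0]

def interval_mean(series, start, end):
    """Mean over the half-open interval [start, end)."""
    return mean(value for ts, value in series if start <= ts < end)

series = make_series()

# SK1: pick the daily scale to answer "is the level rising from day to day?"
means = daily_means(series)
rising = all(a < b for a, b in zip(means, means[1:]))

# SK2: localize the spike in time.
peak_time = locate_peak(series)

# SK3: integrate evidence from two separated intervals (same hours, different days).
first = interval_mean(series, datetime(2024, 1, 1, 12), datetime(2024, 1, 1, 18))
last = interval_mean(series, datetime(2024, 1, 4, 12), datetime(2024, 1, 4, 18))
afternoon_shift = last - first
```

In this sketch each question needs a different view of the same signal: an aggregate over days, a single timestamp, and a comparison of two non-adjacent windows. A real benchmark item would pose these as natural-language questions over timestamped data rather than as function calls.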
