ARTFEED — Contemporary Art Intelligence

TS-Skill Benchmark Evaluates Analytical Skills in Time-Series QA

ai-technology · 2026-05-26

A new benchmark, TS-Skill, measures core analytical abilities in time-series question answering (TSQA). It assesses three skills: temporal scale selection, temporal localization, and cross-interval integration. Unlike prior TSQA benchmarks, which are organized by task type or high-level reasoning category, TS-Skill evaluates signal-level skills using timestamp-aware questions across a broad range of domains, and its question-answer pairs are human-validated. The authors also introduce SKEvol, a skill-guided agentic framework for constructing the benchmark at scale. The benchmark targets large language models (LLMs) and TSLMs applied to TSQA. The paper is available on arXiv as 2605.24703.

Key facts

  • TS-Skill evaluates three analytical skills: temporal scale selection (SK1), temporal localization (SK2), and cross-interval integration (SK3).
  • Existing TSQA benchmarks are organized by task types or high-level reasoning categories, not signal-level capabilities.
  • TS-Skill includes timestamp-aware questions, broad domain coverage, and human-validated QA quality.
  • SKEvol is a skill-guided agentic framework for constructing the benchmark at scale.
  • The benchmark targets LLMs and TSLMs applied to time-series question answering.
  • The paper is available on arXiv with ID 2605.24703.
  • TSQA requires models to ground answers in temporal signals with patterns at different scales, specific time locations, or across separated intervals.
  • The work aims to diagnose signal-level capabilities driving model performance in TSQA.
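
To make the three skills concrete, here is a minimal Python sketch on a toy hourly series. It is an illustration only and is not taken from TS-Skill or SKEvol: the data, the function names (resample_mean, locate_peak, interval_mean_gap), and the mapping of each function to SK1, SK2, and SK3 are all assumptions made for this example.

```python
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical toy data: 96 hourly readings over four days.
# Each day repeats a simple ramp, and the base level rises by 1.0 per day.
START = datetime(2026, 1, 1)
series = [
    (START + timedelta(hours=h), (h // 24) * 1.0 + (h % 24) / 24)
    for h in range(96)
]
# Inject one spike so there is something to localize.
spike_time = START + timedelta(hours=50)
series = [(t, v + 10.0 if t == spike_time else v) for t, v in series]


def resample_mean(points, hours):
    """SK1-style: view the signal at a coarser scale (buckets of `hours` hours)."""
    buckets = {}
    for t, v in points:
        idx = int((t - START).total_seconds() // 3600) // hours
        buckets.setdefault(idx, []).append(v)
    return [mean(vs) for _, vs in sorted(buckets.items())]


def locate_peak(points, begin, end):
    """SK2-style: timestamp of the largest value inside [begin, end)."""
    window = [(t, v) for t, v in points if begin <= t < end]
    return max(window, key=lambda p: p[1])[0]


def interval_mean_gap(points, first, second):
    """SK3-style: difference in mean level between two separated intervals."""
    def interval_mean(bounds):
        lo, hi = bounds
        return mean(v for t, v in points if lo <= t < hi)
    return interval_mean(second) - interval_mean(first)


# The day-over-day rise is only obvious at the daily scale.
daily = resample_mean(series, 24)
# When did the peak on day three occur?
peak = locate_peak(series, START + timedelta(days=2), START + timedelta(days=3))
# How much higher is the last day than the first?
gap = interval_mean_gap(
    series,
    (START, START + timedelta(days=1)),
    (START + timedelta(days=3), START + timedelta(days=4)),
)
print(daily, peak, gap)
```

Each function answers a different kind of question about the same signal: which scale exposes the pattern, when an event happened, and how two non-adjacent intervals compare. A model answering such questions in natural language would need to perform the equivalent reasoning over the raw values and timestamps.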

Entities

Institutions

  • arXiv