New AI Benchmark RT-QA Uses Executable Code for Real-Time Question Answering
Researchers have unveiled RT-QA, an evaluation framework that measures the real-time question answering abilities of AI models, addressing a core shortcoming of static benchmarks: their ground truth goes stale. Instead of fixed answer keys, RT-QA autonomously generates executable code workflows that gather up-to-date information via web crawling and extract answers from the page DOM, while a self-repair mechanism adapts these workflows when web page layouts change, keeping the benchmark reliable over time. RT-QA spans 12 domains, including Finance and Sports, and comprises 320 Chinese questions divided into three difficulty levels. Because ground truth is resolved at evaluation time, the benchmark captures the temporal dynamics and ever-changing nature of real-world knowledge that practical search-integrated agents must handle, and the authors use it to conduct extensive evaluations of state-of-the-art models. Detailed findings are available in a preprint on arXiv (arXiv:2604.16349v1), announced as a cross-listed submission.
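As a rough illustration of what one of these executable workflows might look like, here is a minimal sketch assuming a requests/BeautifulSoup stack; the URL, CSS selector, and function name are hypothetical and not taken from the paper.

```python
# Hypothetical RT-QA-style workflow: crawl a page and extract the live
# answer from the DOM at evaluation time (all names are illustrative).
import requests
from bs4 import BeautifulSoup

def fetch_live_answer(url: str, css_selector: str, timeout: int = 10) -> str | None:
    """Fetch a page and pull the answer text out of the DOM."""
    resp = requests.get(url, timeout=timeout, headers={"User-Agent": "rt-qa-demo"})
    resp.raise_for_status()
    soup = BeautifulSoup(resp.text, "html.parser")
    node = soup.select_one(css_selector)
    return node.get_text(strip=True) if node else None

if __name__ == "__main__":
    # Ground truth is resolved at query time rather than stored in the benchmark.
    print(fetch_live_answer("https://example.com", "h1"))
```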
Key facts
- RT-QA is a dynamic evaluation framework for real-time question answering
- It uses executable code workflows to retrieve up-to-date answers at evaluation time
- The framework includes a self-repair mechanism for adapting to web page structure changes (see the sketch after this list)
- It spans 12 domains such as Finance and Sports
- There are 320 Chinese questions categorized into three difficulty levels
- Extensive evaluations of state-of-the-art models are conducted
- The pipeline autonomously generates code for web crawling and DOM-based answer extraction
- The work is detailed in a preprint on arXiv under arXiv:2604.16349v1
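How the self-repair step works is not detailed in this summary; one plausible reading, sketched below purely as an assumption, is a fallback chain of DOM selectors that absorbs layout changes without regenerating the whole workflow (the selectors and HTML here are invented for illustration).

```python
# Minimal sketch of selector-level self-repair; an assumed design,
# not necessarily the paper's actual mechanism.
from bs4 import BeautifulSoup

def extract_with_repair(html: str, selectors: list[str]) -> str | None:
    """Try the primary selector first, then fall back to repair candidates."""
    soup = BeautifulSoup(html, "html.parser")
    for css in selectors:
        node = soup.select_one(css)
        if node:
            return node.get_text(strip=True)
    return None  # all candidates failed: the workflow needs regeneration

# A layout change that renames the answer container is absorbed by the
# fallback selector without rerunning the full pipeline.
html = '<div class="score-v2">3-1</div>'
print(extract_with_repair(html, ["div.score", "div.score-v2"]))  # -> 3-1
```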
Entities
Institutions
- arXiv