RSAT: Structured Attribution Improves Table Reasoning in Small Language Models
Researchers have introduced RSAT, a method that trains small language models (SLMs) of 1-8 billion parameters to produce step-by-step reasoning with cell-level citations when answering questions over tables. The approach has two phases: Phase 1 uses supervised fine-tuning (SFT) to teach a structured JSON output format from verified reasoning traces, and Phase 2 applies group relative policy optimization (GRPO) with a composite reward centered on NLI-based faithfulness, citation validity, and parsimony. Tested across six models from two families, Qwen 2.5 (1.5B, 3B, 7B) and Llama 3 (1B, 3B, 8B), RSAT improved faithfulness 3.7× over SFT alone (from 0.224 to 0.826) while reaching near-perfect citation validity of 0.992. Post-hoc attribution methods collapsed to below 13% format success, indicating that attribution must be integrated into the reasoning process rather than retrofitted. Ablations showed the faithfulness reward is essential: removing it dropped faithfulness from 0.97 to 0.03.
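To make the structured format concrete, the sketch below shows what an output with cell-level citations might look like. The paper's exact JSON schema is not reproduced in this summary, so the field names (`steps`, `reasoning`, `citations`, `row`, `col`) and the `citations_valid` helper are hypothetical illustrations of the described shape: reasoning steps paired with citations that can be checked against the table's bounds.

```python
import json

# Hypothetical example of RSAT-style structured output (field names are
# assumptions for illustration; the paper's exact schema may differ).
answer = {
    "steps": [
        {
            "reasoning": "Revenue for 2023 appears in the 'Revenue' column of the 2023 row.",
            "citations": [{"row": 2, "col": 1}],  # cell-level citation
        },
        {
            "reasoning": "Revenue for 2022 is in the row above.",
            "citations": [{"row": 1, "col": 1}],
        },
    ],
    "answer": "Revenue grew from 2022 to 2023.",
}

def citations_valid(output: dict, n_rows: int, n_cols: int) -> bool:
    """Check that every cited cell actually exists in the table."""
    return all(
        0 <= c["row"] < n_rows and 0 <= c["col"] < n_cols
        for step in output["steps"]
        for c in step["citations"]
    )

print(json.dumps(answer, indent=2))
print("citations valid:", citations_valid(answer, n_rows=3, n_cols=2))
```

A bounds check like this is presumably what makes citation validity directly measurable (and rewardable) during training, in contrast to free-text attributions.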
Key facts
- RSAT trains small language models (1-8B parameters) for table reasoning with cell-level citations.
- Phase 1: SFT teaches structured JSON output from verified reasoning traces.
- Phase 2: GRPO optimizes a composite reward including NLI-based faithfulness, citation validity, and parsimony (a sketch follows this list).
- Tested on Qwen 2.5 (1.5B, 3B, 7B) and Llama 3 (1B, 3B, 8B) models.
- Faithfulness improved 3.7× over SFT alone (0.224 to 0.826).
- Citation validity reached near-perfect 0.992.
- Post-hoc attribution methods had less than 13% format success.
- Removing faithfulness reward dropped faithfulness from 0.97 to 0.03.
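For concreteness, here is a minimal sketch of how the Phase 2 composite reward might be computed per sampled completion (in GRPO, each completion in a group is scored and its advantage is taken relative to the group's mean reward). The weights, the `nli_entailment_score` stub, and the one-citation-per-step parsimony baseline are illustrative assumptions, not the authors' implementation.

```python
def nli_entailment_score(premise: str, hypothesis: str) -> float:
    """Stand-in for an NLI model's entailment probability in [0, 1].
    In practice this would call a trained NLI classifier on the cited
    cells (premise) vs. the reasoning step (hypothesis)."""
    return 1.0  # placeholder

def composite_reward(steps, table, n_rows, n_cols,
                     w_faith=0.6, w_cite=0.3, w_parsimony=0.1):
    """Weighted sum of the three reward components named in the paper.
    Weights are hypothetical; the paper may combine terms differently."""
    # Faithfulness: mean entailment of each step by its (valid) cited cells.
    faith = sum(
        nli_entailment_score(
            premise=" ".join(
                table[c["row"]][c["col"]]
                for c in s["citations"]
                if 0 <= c["row"] < n_rows and 0 <= c["col"] < n_cols
            ),
            hypothesis=s["reasoning"],
        )
        for s in steps
    ) / max(len(steps), 1)

    # Citation validity: fraction of citations pointing at real cells.
    cites = [c for s in steps for c in s["citations"]]
    valid = sum(
        0 <= c["row"] < n_rows and 0 <= c["col"] < n_cols for c in cites
    ) / max(len(cites), 1)

    # Parsimony: mild penalty for citing more cells than steps require
    # (assuming one citation per step as the baseline).
    parsimony = 1.0 / (1.0 + max(len(cites) - len(steps), 0))

    return w_faith * faith + w_cite * valid + w_parsimony * parsimony

table = [["Year", "Revenue"], ["2022", "10"], ["2023", "14"]]
steps = [{"reasoning": "2023 revenue is 14.", "citations": [{"row": 2, "col": 1}]}]
print(composite_reward(steps, table, n_rows=3, n_cols=2))  # 1.0 with the stub
```

The ablation result reported above (faithfulness falling from 0.97 to 0.03 without the faithfulness term) suggests the NLI component carries most of the optimization pressure, with the validity and parsimony terms acting as constraints.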
Entities
Model families
- Qwen 2.5
- Llama 3