ARTFEED — Contemporary Art Intelligence

Study Examines RLVR Effectiveness for Small Language Models with Limited Data and Compute

ai-technology · 2026-04-22

A new empirical study investigates how Reinforcement Learning with Verifiable Rewards (RLVR) performs when applied to open-source Small Language Models (SLMs) under constrained data and compute budgets. The work addresses a gap in prior research, which typically assumes abundant high-quality annotated data and substantial compute for fine-tuning Large Language Models (LLMs).

The authors present a comprehensive analysis across three novel datasets covering number counting, graph reasoning, and spatial reasoning, and characterize how model performance scales with dataset size, diversity, and complexity. A key finding is that procedurally generated datasets enable fine-grained evaluation and make it possible to build training sets with controllable properties. The study demonstrates that RLVR can be used effectively in low-data regimes, extending its applicability to real-world settings where annotated data and accessible compute are scarce.

The research is documented in the preprint arXiv:2604.18381v1. The approach contrasts with prior explorations that focused on scaling both data and compute to enhance model reasoning capabilities through RLVR.
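To illustrate what a procedural dataset with controllable properties and a verifiable reward might look like, here is a minimal sketch for the number-counting task. All function names and parameters are illustrative assumptions, not the study's actual implementation; the preprint's datasets and reward functions are not described in detail here.

```python
# Hypothetical sketch: a procedural number-counting task generator paired with a
# verifiable (exact-match) reward. seq_len and n_symbols act as explicit,
# tunable difficulty knobs, the kind of controllable property the study describes.
import random

def make_counting_example(seq_len, n_symbols, target_symbol="a", rng=None):
    """Generate one counting problem whose ground-truth answer is known by construction."""
    rng = rng or random.Random()
    alphabet = [chr(ord("a") + i) for i in range(n_symbols)]
    seq = [rng.choice(alphabet) for _ in range(seq_len)]
    answer = seq.count(target_symbol)
    prompt = f"How many times does '{target_symbol}' appear in: {' '.join(seq)}?"
    return {"prompt": prompt, "answer": answer}

def verifiable_reward(model_output, example):
    """Binary reward: 1.0 iff the last number in the model's output matches the answer."""
    digits = [tok for tok in model_output.split() if tok.lstrip("-").isdigit()]
    if not digits:
        return 0.0
    return 1.0 if int(digits[-1]) == example["answer"] else 0.0

# Build a small training set; dataset size and difficulty are explicit parameters,
# which is what makes low-data scaling experiments straightforward to run.
rng = random.Random(0)
dataset = [make_counting_example(seq_len=12, n_symbols=4, rng=rng) for _ in range(8)]
```

Because the answer is computed at generation time, the reward requires no human annotation, which is the property that makes RLVR viable in low-data regimes.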

Key facts

  • The study focuses on Reinforcement Learning with Verifiable Rewards (RLVR) for Small Language Models (SLMs).
  • It examines performance in low-data and low-compute regimes.
  • Three novel datasets cover number counting, graph reasoning, and spatial reasoning.
  • Research characterizes scaling of model performance with dataset size, diversity, and complexity.
  • Procedural datasets allow for fine-grained evaluation and training dataset development with controllable properties.
  • The work addresses limitations of previous RLVR studies that assumed abundant data and compute.
  • Findings aim to increase RLVR applicability in real-world settings with scarce resources.
  • The research is documented in the preprint arXiv:2604.18381v1.