ARTFEED — Contemporary Art Intelligence

Sample Difficulty's Non-Monotonic Role in RLVR for LLMs

ai-technology · 2026-05-28

A new arXiv preprint (2605.28388) investigates the mechanistic role of sample difficulty in Reinforcement Learning with Verifiable Reward (RLVR) for large language models (LLMs). The study finds that difficulty has a non-monotonic effect on training: easy and medium-difficulty problems yield the strongest reasoning improvements, while overly hard problems provide weak learning signals, induce degenerate behaviors such as answer repetition or skipping necessary computation, and can degrade pre-existing capabilities. Using Temporal Sparse Autoencoders (T-SAE) to analyze internal feature dynamics, the authors report that easy problems reinforce direct-answer and basic-computation pathways. The research focuses on mathematics and programming tasks.
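To make the "weak learning signal" point concrete, here is a minimal sketch of one common intuition. It is not the preprint's method. It assumes a binary verifier reward (1 for a correct answer, 0 otherwise) and a group-normalized advantage, as used in some RLVR setups; the summary does not say which algorithm the authors use. The function names and the difficulty thresholds are hypothetical, chosen only for illustration.

```python
from statistics import mean, pstdev


def pass_rate(rewards):
    """Fraction of sampled answers the verifier accepted (rewards are 0 or 1)."""
    return mean(rewards)


def difficulty_bucket(rate, hard_below=0.2, easy_above=0.8):
    """Label a problem by its pass rate. Thresholds are hypothetical."""
    if rate < hard_below:
        return "hard"
    if rate > easy_above:
        return "easy"
    return "medium"


def group_advantages(rewards, eps=1e-6):
    """Group-normalized advantage (assumed form): (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# A problem the model never solves: every sample gets reward 0,
# so every advantage is zero and the update carries no signal.
unsolved = [0, 0, 0, 0]
unsolved_adv = group_advantages(unsolved)

# A problem solved half the time: correct and incorrect samples
# receive advantages of opposite sign.
mixed = [1, 0, 1, 0]
mixed_adv = group_advantages(mixed)
```

Under these assumptions, a problem that is never solved contributes nothing to the gradient, which is one plausible reading of why overly hard samples teach little. The sketch says nothing about the degenerate behaviors or capability loss the preprint reports.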

Key facts

  • Study examines RLVR for LLMs
  • Sample difficulty has non-monotonic effect
  • Easy and medium problems yield strongest improvements
  • Overly hard problems provide weak learning signals and induce degenerate behaviors
  • Degenerate behaviors include answer repetition and skipping computation
  • Hard problems can degrade pre-existing capabilities
  • Temporal Sparse Autoencoders (T-SAE) used for internal analysis
  • Focus on mathematics and programming tasks

Entities

Institutions

  • arXiv
