OPT-BENCH: Quality-Aware RLVR Framework for NP-Hard Optimization in LLMs
A recent arXiv preprint introduces OPT-BENCH, presented as the first comprehensive framework for training and evaluating Large Language Models (LLMs) on NP-hard optimization problems using quality-aware Reinforcement Learning with Verifiable Rewards (RLVR). The framework addresses a gap in existing benchmarks, which measure only correctness and ignore optimality, i.e., how close a solution comes to the best achievable under the problem's constraints. OPT-BENCH consists of three components: a scalable training setup with instance generators, quality verifiers, and optimal baselines across ten tasks; a 1,000-instance benchmark that scores feasibility via Success Rate and solution quality via Quality Ratio; and quality-aware rewards that allow continued improvement beyond binary correctness. Training used Qwen2.5-7B-Instruct-1M with 15,000 examples. The paper is available on arXiv under ID 2605.08905.
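The core idea of a quality-aware reward, as opposed to a binary pass/fail signal, can be sketched as follows. This is an illustrative sketch, not the paper's implementation: the function names, the zero reward for infeasible solutions, and the exact ratio shaping are assumptions.

```python
def quality_aware_reward(solution_value, optimal_value, feasible, minimize=True):
    """Sketch of a quality-aware reward: infeasible solutions get 0, and
    feasible solutions are scored by their ratio to an optimal baseline,
    so a better feasible solution always earns a strictly higher reward."""
    if not feasible:
        return 0.0
    if minimize:
        # For minimization problems, optimal/found lies in (0, 1]; 1.0 = optimal.
        return optimal_value / solution_value
    # For maximization problems, found/optimal lies in [0, 1]; 1.0 = optimal.
    return solution_value / optimal_value


def binary_reward(solution_value, optimal_value, feasible):
    """Binary-correctness baseline for contrast: only exact optimality scores,
    so near-optimal solutions receive no learning signal at all."""
    return 1.0 if feasible and solution_value == optimal_value else 0.0
```

Under the quality-aware scheme, a feasible tour of cost 120 against an optimum of 100 earns a reward of about 0.83 rather than 0, giving the policy a gradient toward better solutions even before it reaches optimality.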
Key facts
- OPT-BENCH is the first framework for training and evaluating LLMs on NP-hard optimization problems with quality-aware RLVR.
- Existing benchmarks evaluate only correctness, not optimality.
- The framework includes instance generators, quality verifiers, and optimal baselines across 10 tasks.
- The benchmark comprises 1,000 instances.
- Success Rate measures feasibility; Quality Ratio measures quality.
- Quality-aware rewards enable continuous improvement beyond binary correctness.
- Training used Qwen2.5-7B-Instruct-1M with 15,000 examples.
- The paper is published on arXiv with ID 2605.08905.
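The two benchmark metrics above could be aggregated as in the sketch below. The per-instance record fields (`feasible`, `value`, `optimal`) and the convention of counting infeasible instances as a quality ratio of 0 are illustrative assumptions, not details from the paper.

```python
def success_rate(results):
    """Success Rate: fraction of instances with a feasible
    (constraint-satisfying) solution."""
    return sum(1 for r in results if r["feasible"]) / len(results)


def mean_quality_ratio(results, minimize=True):
    """Quality Ratio: average solution quality relative to the optimal
    baseline; infeasible instances contribute 0 in this sketch."""
    ratios = []
    for r in results:
        if not r["feasible"]:
            ratios.append(0.0)
        elif minimize:
            ratios.append(r["optimal"] / r["value"])
        else:
            ratios.append(r["value"] / r["optimal"])
    return sum(ratios) / len(ratios)
```

Keeping the two metrics separate matters: a model can reach a high Success Rate with mediocre solutions, and the Quality Ratio exposes that gap.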
Entities
Institutions
- arXiv