ARTFEED — Contemporary Art Intelligence

OPT-BENCH: Quality-Aware RLVR Framework for NP-Hard Optimization in LLMs

ai-technology · 2026-05-12

A recent arXiv preprint presents OPT-BENCH, described as the first comprehensive framework for training and evaluating Large Language Models (LLMs) on NP-hard optimization problems using quality-aware Reinforcement Learning with Verifiable Rewards (RLVR). The framework addresses a gap in existing benchmarks, which evaluate only correctness rather than optimality, i.e. the ability to find the best solution within a problem's constraints. OPT-BENCH has three components: a scalable training environment with instance generators, quality verifiers, and optimal baselines across ten tasks; a 1,000-instance benchmark that measures feasibility via Success Rate and quality via Quality Ratio; and quality-aware rewards that enable continuous improvement beyond binary correctness. Training used Qwen2.5-7B-Instruct-1M on 15,000 examples. The paper is available on arXiv under ID 2605.08905.
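The paper does not give its reward formula here, but the idea of a quality-aware reward (zero for infeasible outputs, a graded score for feasible ones) can be sketched as follows. The function name, arguments, and the ratio-based scoring are illustrative assumptions, not the authors' actual implementation:

```python
def quality_aware_reward(solution_cost: float, optimal_cost: float,
                         feasible: bool, minimize: bool = True) -> float:
    """Hypothetical quality-aware reward, assuming positive objective values.

    Infeasible solutions get 0, mirroring binary RLVR correctness checks.
    Feasible solutions get a score in (0, 1], the ratio of the optimal
    baseline's objective to the solution's objective (for minimization),
    so reward 1.0 means the model matched the optimal baseline and any
    improvement in solution quality raises the reward continuously.
    """
    if not feasible:
        return 0.0
    if minimize:
        return optimal_cost / solution_cost
    return solution_cost / optimal_cost


# Example: a feasible tour of cost 100 against an optimal baseline of 80
# scores 0.8; an infeasible tour scores 0 regardless of its cost.
print(quality_aware_reward(100.0, 80.0, feasible=True))   # 0.8
print(quality_aware_reward(100.0, 80.0, feasible=False))  # 0.0
```

Unlike a binary pass/fail reward, this graded signal keeps improving as solutions approach the optimal baseline, which is what lets RL training continue past mere feasibility.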

Key facts

  • OPT-BENCH is the first framework for training and evaluating LLMs on NP-hard optimization problems with quality-aware RLVR.
  • Existing benchmarks evaluate only correctness, not optimality.
  • The framework includes instance generators, quality verifiers, and optimal baselines across 10 tasks.
  • The benchmark comprises 1,000 instances.
  • Success Rate measures feasibility; Quality Ratio measures quality.
  • Quality-aware rewards enable continuous improvement beyond binary correctness.
  • Training used Qwen2.5-7B-Instruct-1M with 15,000 examples.
  • The paper is published on arXiv with ID 2605.08905.

Entities

Institutions

  • arXiv

Sources