ARTFEED — Contemporary Art Intelligence

MemReward: Graph-Based Memory for LLM Reward with Scarce Labels

other · 2026-05-25

MemReward is a graph-based experience memory framework designed to improve reward prediction for large language models (LLMs) in reinforcement learning when ground-truth labels are limited. The system stores rollouts (the model's thinking processes) in a graph and propagates reward signals from labeled to unlabeled samples, an approach inspired by semi-supervised learning. It targets data-scarce scenarios, such as evaluating mathematical proofs or open-ended question answering, where human annotation or expert verification is expensive. MemReward integrates directly into online policy optimization, so reinforcement learning fine-tuning can proceed even when only a fraction of rollouts carry ground-truth rewards. The paper is available on arXiv under ID 2603.19310.
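
The idea of propagating rewards from labeled to unlabeled rollouts can be illustrated with classic label propagation over a similarity graph. The sketch below is not MemReward's actual method, which the summary does not specify; it is a generic semi-supervised baseline under stated assumptions: each rollout is represented by a small embedding vector, edges connect each rollout to its k most similar neighbors by cosine similarity, and labeled rewards stay fixed while unlabeled ones are repeatedly averaged from their neighbors. The names `cosine`, `propagate_rewards`, `embeddings`, and `labeled_rewards` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def propagate_rewards(embeddings, labeled_rewards, k=2, iterations=50):
    """Spread known rewards to unlabeled rollouts over a k-nearest-neighbor graph.

    embeddings: list of vectors, one per rollout (assumed representation).
    labeled_rewards: dict mapping rollout index -> ground-truth reward.
    Returns a list with one estimated reward per rollout.
    """
    n = len(embeddings)

    # Build the graph: each node keeps its k most similar neighbors,
    # with negative similarities clipped to zero so weights stay non-negative.
    neighbors = []
    for i in range(n):
        sims = [
            (max(cosine(embeddings[i], embeddings[j]), 0.0), j)
            for j in range(n)
            if j != i
        ]
        sims.sort(reverse=True)
        neighbors.append(sims[:k])

    # Start unlabeled nodes at the mean of the known rewards.
    mean = sum(labeled_rewards.values()) / len(labeled_rewards)
    rewards = [labeled_rewards.get(i, mean) for i in range(n)]

    for _ in range(iterations):
        updated = []
        for i in range(n):
            if i in labeled_rewards:
                # Labeled nodes are clamped to their ground-truth reward.
                updated.append(labeled_rewards[i])
                continue
            total = sum(w for w, _ in neighbors[i])
            if total == 0:
                updated.append(rewards[i])
            else:
                # Unlabeled nodes take the similarity-weighted neighbor average.
                updated.append(
                    sum(w * rewards[j] for w, j in neighbors[i]) / total
                )
        rewards = updated
    return rewards


# Four toy rollouts: 0 and 1 are similar, 2 and 3 are similar.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labeled_rewards = {0: 1.0, 2: 0.0}  # only two rollouts have ground truth
rewards = propagate_rewards(embeddings, labeled_rewards)
```

In this toy setup, rollout 1 ends up with a high estimated reward because it sits next to the labeled success, while rollout 3 inherits a low one from the labeled failure. A system like the one described would presumably use learned representations and a learned propagation model rather than fixed averaging, but the flow of information from labeled to unlabeled nodes is the same in spirit.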

Key facts

  • MemReward is a graph-based experience memory framework for LLM reward prediction.
  • It addresses reinforcement learning with limited ground-truth labels.
  • The method propagates rewards from labeled to unlabeled rollouts.
  • It is inspired by semi-supervised learning techniques.
  • Target applications include mathematical proof evaluation and open-ended QA.
  • MemReward integrates into online policy optimization.
  • The paper is available on arXiv under ID 2603.19310.
  • It aims to reduce reliance on expensive human annotation.

Entities

Institutions

  • arXiv
