LoPE: Prompt Perturbation Boosts LLM Reasoning in GRPO
Researchers propose Lorem Perturbation for Exploration (LoPE), a training framework that addresses the zero-advantage problem in Group Relative Policy Optimization (GRPO) for large language models. When all sampled rollouts for a query fail, every rollout receives the same reward, the group-relative advantages collapse to zero, and the update carries no training signal. LoPE introduces task-irrelevant prompt-space perturbations that shift the model's output distribution, enabling broader reasoning exploration without increasing the sampling budget. The method aims to improve success rates on complex reasoning tasks.
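The zero-advantage problem can be illustrated with a minimal sketch of GRPO-style group-relative advantages (the function name and normalization details here are illustrative, not taken from the paper):

```python
import statistics

def group_relative_advantages(rewards):
    """Compute group-relative advantages for one query's rollouts.

    Each rollout's advantage is its reward minus the group mean, scaled
    by the group's standard deviation. When every rollout gets the same
    reward (e.g. all fail with reward 0), the advantages are all zero
    and the policy update carries no signal -- the zero-advantage problem.
    """
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # All rewards identical: no relative preference, no gradient signal.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]
```

For example, a group where all four rollouts fail (`[0, 0, 0, 0]`) yields all-zero advantages, while a group with even one success produces non-zero advantages that sum to zero across the group.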
Key facts
- GRPO suffers from zero-advantage problem when all rollouts fail
- LoPE uses prompt-space perturbations to unlock exploration
- LoPE is a simple yet effective training framework
- Task-irrelevant perturbations shift output distribution
- LoPE aims to improve success rates on complex tasks
- Method does not require increasing the sampling budget
- Paper published on arXiv with ID 2605.05566
- LoPE stands for Lorem Perturbation for Exploration
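A prompt-space perturbation of the kind described above can be sketched as follows. The perturbation strings and helper below are hypothetical; the summary does not specify how LoPE constructs its task-irrelevant perturbations:

```python
import random

# Hypothetical task-irrelevant perturbation strings (placeholders; the
# paper's actual perturbation construction is not given in this summary).
PERTURBATIONS = [
    "Note: the following line is unrelated to the task.",
    "Aside: lorem ipsum dolor sit amet.",
    "Reminder: ignore this sentence when answering.",
]

def perturb_prompt(prompt, rng=random):
    """Prepend a task-irrelevant string to the prompt.

    The question itself is unchanged; the extra context shifts the
    model's output distribution, encouraging different rollouts
    without any increase in the sampling budget.
    """
    return rng.choice(PERTURBATIONS) + "\n" + prompt
```

Because the perturbation is prepended rather than woven into the query, the original task text is preserved verbatim at the end of the perturbed prompt.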
Entities
Institutions
- arXiv