ReCast: New RL Framework Lifts Generative Recommendation Pass@1 by Up to 36.6%
Researchers have introduced ReCast, a repair-then-contrast learning-signal framework for reinforcement learning (RL) in generative recommendation. It targets a weakness of generic group-based RL, which assumes that every sampled rollout group is immediately usable as a learning signal. In sparse-hit generative recommendation that assumption breaks down, and many sampled groups never become learnable. ReCast restores learnability for all-zero groups and replaces full-group reward normalization with a boundary-focused contrastive update on the strongest positive and the hardest negative. The outer RL framework stays unchanged: only the within-group signal construction is modified, and rollout search width is partially decoupled from actor-side update width. Across multiple generative recommendation tasks, ReCast consistently outperforms OpenOneRec-RL, with up to a 36.6% relative improvement in Pass@1. The matched-budget result is more striking: ReCast reaches the baseline's target performance with only 4.1% of the budget. The paper is available on arXiv under the identifier 2604.22169.
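To illustrate why an all-zero group carries no signal under full-group reward normalization, and what a two-sample contrast could look like, here is a minimal sketch. It rests on assumptions not stated in the summary: GRPO-style normalization (reward minus group mean, divided by group standard deviation), binary hit rewards, and a hypothetical per-rollout `scores` value (for example, sequence log-probability) used to rank negatives. The repair step is left out because the summary does not say how it works, and the function names are illustrative rather than taken from the paper.

```python
from statistics import mean, pstdev


def group_normalized_advantages(rewards, eps=1e-8):
    """Assumed baseline: GRPO-style advantages, (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def contrast_pair(rewards, scores):
    """Pick one positive and one negative rollout to update on.

    Illustrative selection rule (an assumption): the positive with the
    highest reward, and the negative the model scores highest, i.e. the
    miss it is most confident about.
    Returns (pos_index, neg_index), or None if the group has no hit or no miss.
    """
    positives = [i for i, r in enumerate(rewards) if r > 0]
    negatives = [i for i, r in enumerate(rewards) if r <= 0]
    if not positives or not negatives:
        return None
    best_pos = max(positives, key=lambda i: (rewards[i], scores[i]))
    hardest_neg = max(negatives, key=lambda i: scores[i])
    return best_pos, hardest_neg


# Hypothetical rollout groups of width 4 with binary hit rewards.
all_zero = [0.0, 0.0, 0.0, 0.0]
one_hit = [0.0, 1.0, 0.0, 0.0]
scores = [-1.2, -2.5, -0.4, -3.0]  # made-up model scores per rollout

zero_adv = group_normalized_advantages(all_zero)  # every advantage is 0.0
pair = contrast_pair(one_hit, scores)             # indices of the chosen pair
```

In the all-zero group every normalized advantage is zero, so the policy gradient vanishes. In the one-hit group the sketch samples four rollouts but updates on only two, which is one way to read the decoupling of rollout search width from actor-side update width.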
Key facts
- ReCast is a repair-then-contrast learning-signal framework for RL in generative recommendation.
- It addresses the breakdown of generic group-based RL in sparse-hit scenarios.
- ReCast restores learnability for all-zero groups and uses boundary-focused contrastive updates.
- It modifies only within-group signal construction, leaving the outer RL framework unchanged.
- ReCast partially decouples rollout search width from actor-side update width.
- It outperforms OpenOneRec-RL by up to 36.6% relative improvement in Pass@1.
- ReCast reaches baseline target performance with only 4.1% of the budget.
- The paper is available on arXiv under the identifier 2604.22169.
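The two headline numbers above are easy to misread, so here is a short arithmetic sketch. It assumes "relative improvement" means (new - baseline) / baseline and that "4.1% of the budget" is a fraction of the baseline's training budget. The Pass@1 values in the example are made up purely to show the calculation and are not results from the paper.

```python
def relative_improvement(baseline, new):
    """Relative gain of `new` over `baseline`, as a fraction."""
    return (new - baseline) / baseline


def budget_reduction_factor(budget_fraction):
    """How many times less budget is used, given the fraction consumed."""
    return 1.0 / budget_fraction


# Hypothetical Pass@1 values chosen only to illustrate a 36.6% relative gain.
example_gain = relative_improvement(0.100, 0.1366)

# 4.1% of the budget corresponds to roughly a 24x reduction.
factor = budget_reduction_factor(0.041)
```

A 36.6% relative gain is therefore not 36.6 percentage points of Pass@1, and reaching the target with 4.1% of the budget means using roughly one twenty-fourth of the baseline's budget.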
Entities
Institutions
- arXiv