Ecom-RLVE Framework Extends Verifiable Environments to E-Commerce Conversational AI

ai-technology · 2026-04-19

The Ecom-RLVE framework adapts the RLVE approach to multi-turn, tool-augmented e-commerce dialogues and introduces EcomRLVE-GYM, a suite of eight verifiable environments covering product discovery, substitution, cart creation, returns, order tracking, policy Q&A, bundle planning, and multi-intent journeys. Each environment provides procedural problem generation, a 12-axis difficulty curriculum, and rewards that can be verified algorithmically. A Qwen 3 8B model was trained with DAPO for 300 steps, in a run the source describes as showing effective scaling under adaptive difficulty. The project emerged from the PyTorch OpenEnv Hackathon and is still in development. The environments and training configurations are open-source, and a 2M-product catalog is available on the Hugging Face Hub. Key related work includes RLVE (ICML 2025), DAPO, and the Qwen3 Technical Report.
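
To make the idea of an adaptive difficulty curriculum concrete, here is a minimal sketch in Python. It is not the project's code: the class name, the promotion threshold, the window size, and the use of a single scalar level (rather than the 12 axes the framework describes) are all assumptions chosen for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class AdaptiveDifficulty:
    """Hypothetical per-environment difficulty tracker (illustrative only)."""

    level: int = 0
    max_level: int = 5         # assumed cap, not taken from the source
    promote_at: float = 0.8    # assumed success-rate threshold for promotion
    window: int = 20           # assumed number of recent episodes to average
    recent: deque = field(default_factory=deque)

    def record(self, success: bool) -> int:
        """Log one verified episode outcome and return the current level."""
        self.recent.append(1.0 if success else 0.0)
        if len(self.recent) > self.window:
            self.recent.popleft()
        window_full = len(self.recent) == self.window
        if window_full and sum(self.recent) / self.window >= self.promote_at:
            if self.level < self.max_level:
                self.level += 1
            # Start a fresh window so the next promotion is earned at the new level.
            self.recent.clear()
        return self.level
```

A training loop would call `record` after each verified episode and sample the next problem at the returned level, so the policy keeps seeing tasks near the edge of its current ability.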

Key facts

Ecom-RLVE extends RLVE framework to multi-turn e-commerce conversations
EcomRLVE-GYM provides eight verifiable environments with algorithmic rewards
Trained Qwen 3 8B model with DAPO over 300 steps
Features 12-axis difficulty curriculum with adaptive scheduling
Rewards combine task completion, efficiency, and hallucination penalties
Project originated in PyTorch OpenEnv Hackathon
Environments and training configs are open-source
Includes user simulator based on Qwen3.5 (9.7B)
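
As a rough illustration of how a reward might combine task completion, efficiency, and a hallucination penalty, consider the Python sketch below. The function name, the weights, and the rule of counting product IDs that are absent from the catalog are assumptions made for this example; the source does not give the actual formula.

```python
def episode_reward(
    task_done: bool,
    turns_used: int,
    turn_budget: int,
    mentioned_ids: set[str],
    catalog_ids: set[str],
    w_eff: float = 0.2,    # assumed weight for the efficiency bonus
    w_hall: float = 0.5,   # assumed penalty per hallucinated product ID
) -> float:
    """Hypothetical composite reward; every term is checkable without a judge model."""
    if turn_budget <= 0:
        raise ValueError("turn_budget must be positive")
    task = 1.0 if task_done else 0.0
    # The efficiency bonus only applies when the task was actually completed.
    efficiency = max(0.0, 1.0 - turns_used / turn_budget) if task_done else 0.0
    # Any product ID the agent mentioned that is not in the catalog counts as hallucinated.
    hallucinated = len(mentioned_ids - catalog_ids)
    return task + w_eff * efficiency - w_hall * hallucinated
```

Because each term is computed from the episode trace and the catalog alone, the score can be verified algorithmically, which is the property the environments rely on.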

Entities

Institutions

Hugging Face
PyTorch
Meta AI
DeepSeek-AI
Qwen Team

Sources

Hugging Face Blog — 2026-04-16