Ecom-RLVE Framework Extends Verifiable Environments to E-Commerce Conversational AI
The Ecom-RLVE framework adapts the RLVE approach to multi-turn, tool-augmented e-commerce dialogues and introduces EcomRLVE-GYM, a suite of eight verifiable environments covering product discovery, substitution, cart creation, returns, order tracking, policy Q&A, bundle planning, and multi-intent journeys. Each environment provides procedural problem generation, a 12-axis difficulty curriculum, and rewards that can be verified algorithmically. A Qwen3 8B model was trained with DAPO for 300 steps, with results indicating effective scaling under adaptively adjusted difficulty. The project originated at the PyTorch OpenEnv Hackathon and is still in development. All environments and training configurations are open-source, and a 2M-product catalog is available on the Hugging Face Hub. Key references include RLVE (ICML 2025), DAPO, and the Qwen3 Technical Report.
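As a rough illustration of what an adaptive difficulty curriculum can look like, the sketch below raises a single scalar difficulty level once the policy's recent success rate crosses a threshold. It is not taken from the Ecom-RLVE code: the class name, the threshold, the window size, and the collapse of the 12 axes into one scalar level are all assumptions made for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class DifficultyScheduler:
    """Hypothetical scheduler: promote the level when recent episodes are mostly solved."""

    max_level: int = 10        # assumed cap, not from the source
    promote_at: float = 0.9    # assumed success-rate threshold for promotion
    window: int = 20           # assumed number of episodes per evaluation window
    level: int = 0
    _recent: list = field(default_factory=list)

    def record(self, solved: bool) -> int:
        """Log one episode outcome and return the (possibly updated) level."""
        self._recent.append(1.0 if solved else 0.0)
        if len(self._recent) >= self.window:
            rate = sum(self._recent) / len(self._recent)
            if rate >= self.promote_at and self.level < self.max_level:
                self.level += 1
            # Start a fresh window whether or not the level changed.
            self._recent.clear()
        return self.level


scheduler = DifficultyScheduler(window=4, promote_at=0.75)
for outcome in [True, True, True, False]:
    current_level = scheduler.record(outcome)
print(current_level)
```

In a real environment the returned level would be mapped to concrete settings on each difficulty axis before the next problem is generated; that mapping is omitted here because the source does not name the axes.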
Key facts
- Ecom-RLVE extends the RLVE framework to multi-turn e-commerce conversations
- EcomRLVE-GYM provides eight verifiable environments with algorithmic rewards
- A Qwen3 8B model was trained with DAPO for 300 steps
- Features a 12-axis difficulty curriculum with adaptive scheduling
- Rewards combine task completion, efficiency, and hallucination penalties
- Project originated at the PyTorch OpenEnv Hackathon
- Environments and training configs are open-source
- Includes a user simulator based on Qwen3.5 (9.7B)
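The reward composition listed above (task completion, efficiency, hallucination penalty) could be combined along the following lines. This is a minimal sketch under stated assumptions, not the project's actual reward: the function name, the weights, the linear efficiency term, and the definition of a hallucination as a mentioned product ID missing from the catalog are all hypothetical.

```python
def episode_reward(task_score, turns_used, max_turns, mentioned_ids, catalog_ids,
                   w_task=1.0, w_eff=0.2, w_halluc=0.5):
    """Combine three algorithmically checkable terms into one scalar reward.

    All weights are illustrative assumptions, not values from the source.
    """
    if max_turns <= 0:
        raise ValueError("max_turns must be positive")

    # Efficiency: fewer turns used leaves a larger bonus (0 when the budget is exhausted).
    efficiency = 1.0 - min(turns_used, max_turns) / max_turns

    # Hallucination rate: share of mentioned product IDs that do not exist in the catalog.
    mentioned = set(mentioned_ids)
    hallucinated = mentioned - set(catalog_ids)
    halluc_rate = len(hallucinated) / len(mentioned) if mentioned else 0.0

    return w_task * task_score + w_eff * efficiency - w_halluc * halluc_rate


catalog = {"p1", "p2", "p3"}
clean = episode_reward(1.0, turns_used=2, max_turns=8,
                       mentioned_ids=["p1", "p2"], catalog_ids=catalog)
invented = episode_reward(1.0, turns_used=2, max_turns=8,
                          mentioned_ids=["p1", "px"], catalog_ids=catalog)
print(clean > invented)  # the episode citing a non-existent product scores lower
```

Every term here can be checked against the catalog and the dialogue log without a learned judge, which is the property that makes such a reward verifiable.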
Entities
Institutions
- Hugging Face
- PyTorch
- Meta AI
- DeepSeek-AI
- Qwen Team