AstraFlow: A Dataflow-Oriented RL System for Agentic LLMs
AstraFlow is a new reinforcement learning system designed to scale agentic training for large language models. It replaces conventional trainer-centered control with decoupled, autonomous components for rollout services, dataflow management, and training. This architecture supports multi-policy collaborative training and efficient use of elastic, heterogeneous, cross-region compute resources, addressing the prohibitive cost of agentic RL. The system is presented in arXiv paper 2605.15565.
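The decoupling described above can be illustrated with a small toy sketch. It is not AstraFlow's implementation or API: every name in it (`Trajectory`, `DataflowBuffer`, `rollout_worker`, `trainer`, `run_demo`, and the policy names `planner` and `executor`) is hypothetical, and in-process threads with bounded queues stand in for what would in practice be distributed services. The sketch assumes only what the summary states: rollout producers, a dataflow layer, and trainers run independently, and more than one policy can be trained at once.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class Trajectory:
    """Hypothetical record produced by one rollout episode."""
    policy_id: str
    worker_id: int
    reward: float


class DataflowBuffer:
    """Hypothetical stand-in for a dataflow manager: one bounded queue per policy."""

    def __init__(self, policy_ids, capacity=64):
        self._queues = {pid: queue.Queue(maxsize=capacity) for pid in policy_ids}

    def publish(self, traj):
        # Blocks when the queue is full, giving rollout workers simple backpressure.
        self._queues[traj.policy_id].put(traj)

    def next_batch(self, policy_id, batch_size):
        # Blocks until enough trajectories for this policy have arrived.
        q = self._queues[policy_id]
        return [q.get() for _ in range(batch_size)]


def rollout_worker(worker_id, policy_id, buffer, n_episodes):
    """Produces trajectories on its own schedule; never calls the trainer."""
    for step in range(n_episodes):
        buffer.publish(Trajectory(policy_id, worker_id, reward=float(step)))


def trainer(policy_id, buffer, n_batches, batch_size, results):
    """Pulls batches from the buffer; the mean reward stands in for an update step."""
    means = []
    for _ in range(n_batches):
        batch = buffer.next_batch(policy_id, batch_size)
        means.append(sum(t.reward for t in batch) / len(batch))
    results[policy_id] = means


def run_demo():
    policies = ["planner", "executor"]  # two policies trained side by side
    buffer = DataflowBuffer(policies, capacity=8)
    results = {}
    threads = []
    for pid in policies:
        for w in range(2):  # two rollout workers per policy
            threads.append(
                threading.Thread(target=rollout_worker, args=(w, pid, buffer, 4))
            )
        threads.append(
            threading.Thread(target=trainer, args=(pid, buffer, 2, 4, results))
        )
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_demo())
```

In this sketch neither side drives the other: workers only publish, trainers only pull, and the buffer is the sole point of contact, so rollout workers could in principle be added or removed without changing the trainer code.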
Key facts
- AstraFlow is a dataflow-oriented RL system for agentic LLMs.
- It decouples rollout services, dataflow management, and training into autonomous components.
- It supports multi-policy collaborative training.
- It efficiently uses elastic, heterogeneous, and cross-region compute resources.
- The system addresses the high cost of agentic RL.
- It replaces trainer-centered control architectures.
- The paper is available on arXiv with ID 2605.15565.
- The approach aims to reduce the systems-engineering burden of adding new extensions.
Entities
Institutions
- arXiv