ASTOR: Utility-Guided Multi-Task RL for Code LLMs
A new framework called ASTOR (multi-tASk code reinforcement learning via uTility-driven coORdination), proposed in arXiv:2605.06111, addresses limitations in multi-task reinforcement learning (RL) for code large language models (LLMs). ASTOR introduces task utility, a signal capturing each task's learning potential and its synergy with other tasks, to guide training. The framework comprises two modules: a Hierarchical Utility-Routed Data Scheduling module, which allocates the training budget across tasks and prioritizes informative prompts within each task, and an Adaptive Utility-Calibrated Policy Optimization module, which adjusts policy updates using the same utility signal. Together, these aim to overcome the inefficiency of deploying separate task-specific specialists and the limitations of existing multi-task RL methods, which treat all coding tasks uniformly under fixed data curricula.
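The paper's summary does not specify how task utility is computed or how the scheduling module turns utilities into a data allocation. As a purely illustrative sketch (the function names, the variance-based learning-potential proxy, the `alpha` synergy weight, and the softmax allocation are all assumptions, not ASTOR's actual method), utility-driven budget allocation across tasks might look like:

```python
import math

def task_utility(reward_history, synergy, alpha=0.5):
    """Hypothetical task utility: learning potential (variance of recent
    rewards, a proxy for remaining room to improve) plus a weighted
    cross-task synergy term. Not the paper's actual definition."""
    if len(reward_history) < 2:
        # Unexplored tasks get a default-high learning-potential score.
        return 1.0 + alpha * synergy
    mean = sum(reward_history) / len(reward_history)
    var = sum((r - mean) ** 2 for r in reward_history) / len(reward_history)
    return var + alpha * synergy

def allocate_budget(utilities, total_batch, temperature=1.0):
    """Softmax over task utilities -> per-task share of the training batch,
    so higher-utility tasks receive more prompts per round."""
    exps = {t: math.exp(u / temperature) for t, u in utilities.items()}
    z = sum(exps.values())
    return {t: round(total_batch * e / z) for t, e in exps.items()}

# Example: a task whose rewards still vary a lot (much left to learn)
# receives a larger slice of the batch than near-saturated tasks.
utils = {
    "codegen": task_utility([0.0, 1.0, 0.0, 1.0], synergy=0.4),
    "repair": task_utility([1.0, 1.0, 1.0, 1.0], synergy=0.1),
}
budget = allocate_budget(utils, total_batch=128)
```

A real scheduler would refresh these utilities periodically as training rewards evolve, which is what makes the curriculum adaptive rather than fixed.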
Key facts
- ASTOR is a multi-task code reinforcement learning framework.
- It uses utility-driven coordination.
- Task utility captures learning potential and cross-task synergy.
- Two modules: data scheduling and policy optimization.
- Addresses limitations of fixed data curricula in multi-task RL.
- Published on arXiv with ID 2605.06111.
- Aims to reduce costs of deploying separate task-specific specialists.
- Announced on arXiv as a cross-listing.
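The Adaptive Utility-Calibrated Policy Optimization module is not detailed in the summary. One plausible reading of the name, sketched below purely as an assumption (the function, the relative-utility weight `w`, and the clipping bound are hypothetical, not the paper's mechanism), is that per-task advantages are rescaled by task utility before the policy update, with clipping to keep gradient magnitudes bounded:

```python
def calibrate_advantages(advantages, utility, mean_utility, clip=2.0):
    """Hypothetical calibration: scale one task's advantages by its
    utility relative to the average task utility, clipped to [1/clip,
    clip] so no task dominates or vanishes from the update."""
    w = utility / max(mean_utility, 1e-8)
    w = max(min(w, clip), 1.0 / clip)
    return [w * a for a in advantages]

# A high-utility task's advantages are amplified (here doubled),
# steering more of the policy update toward it.
scaled = calibrate_advantages([1.0, -0.5], utility=2.0, mean_utility=1.0)
```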