ASTOR: Utility-Guided Multi-Task RL for Code LLMs
A new framework called ASTOR (multi-tASk code reinforcement learning via uTility-driven coORdination), proposed in arXiv:2605.06111, addresses limitations in multi-task reinforcement learning (RL) for code large language models (LLMs). ASTOR introduces task utility, a signal capturing each task's learning potential and its synergy with other tasks, to guide training. The framework comprises two modules: a Hierarchical Utility-Routed Data Scheduling module, which allocates the training budget across tasks and prioritizes informative prompts within each task, and an Adaptive Utility-Calibrated Policy Optimization module, which adjusts policy updates using the same utility signal. Together, these aim to overcome the inefficiency of deploying separate task-specific specialists and the limitations of existing multi-task RL methods, which treat all coding tasks uniformly under fixed data curricula.
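The paper's summary does not specify how task utility is computed or how the scheduling module turns utilities into a data allocation. As a purely illustrative sketch (the function names, the variance-based learning-potential proxy, the `alpha` synergy weight, and the softmax allocation are all assumptions, not ASTOR's actual method), utility-driven budget allocation across tasks might look like:

```python
import math

def task_utility(reward_history, synergy, alpha=0.5):
    """Hypothetical task utility: learning potential (variance of recent
    rewards, a proxy for remaining room to improve) plus a weighted
    cross-task synergy term. Not the paper's actual definition."""
    if len(reward_history) < 2:
        # Unexplored tasks get a default-high learning-potential score.
        return 1.0 + alpha * synergy
    mean = sum(reward_history) / len(reward_history)
    var = sum((r - mean) ** 2 for r in reward_history) / len(reward_history)
    return var + alpha * synergy

def allocate_budget(utilities, total_batch, temperature=1.0):
    """Softmax over task utilities -> per-task share of the training batch,
    so higher-utility tasks receive more prompts per round."""
    exps = {t: math.exp(u / temperature) for t, u in utilities.items()}
    z = sum(exps.values())
    return {t: round(total_batch * e / z) for t, e in exps.items()}

# Example: a task whose rewards still vary a lot (much left to learn)
# receives a larger slice of the batch than near-saturated tasks.
utils = {
    "codegen": task_utility([0.0, 1.0, 0.0, 1.0], synergy=0.4),
    "repair": task_utility([1.0, 1.0, 1.0, 1.0], synergy=0.1),
}
budget = allocate_budget(utils, total_batch=128)
```

A real scheduler would refresh these utilities periodically as training rewards evolve, which is what makes the curriculum adaptive rather than fixed.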
Key facts
- ASTOR is a multi-task code reinforcement learning framework.
- It uses utility-driven coordination.
- Task utility captures learning potential and cross-task synergy.
- Two modules: data scheduling and policy optimization.
- Addresses limitations of fixed data curricula in multi-task RL.
- Published on arXiv with ID 2605.06111.
- Aims to reduce costs of deploying separate task-specific specialists.
- Announced on arXiv as a cross-listing.
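The Adaptive Utility-Calibrated Policy Optimization module is not detailed in the summary. One plausible reading of the name, sketched below purely as an assumption (the function, the relative-utility weight `w`, and the clipping bound are hypothetical, not the paper's mechanism), is that per-task advantages are rescaled by task utility before the policy update, with clipping to keep gradient magnitudes bounded:

```python
def calibrate_advantages(advantages, utility, mean_utility, clip=2.0):
    """Hypothetical calibration: scale one task's advantages by its
    utility relative to the average task utility, clipped to [1/clip,
    clip] so no task dominates or vanishes from the update."""
    w = utility / max(mean_utility, 1e-8)
    w = max(min(w, clip), 1.0 / clip)
    return [w * a for a in advantages]

# A high-utility task's advantages are amplified (here doubled),
# steering more of the policy update toward it.
scaled = calibrate_advantages([1.0, -0.5], utility=2.0, mean_utility=1.0)
```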