ARTFEED — Contemporary Art Intelligence

ASTOR: Utility-Guided Multi-Task RL for Code LLMs

publication · 2026-05-09

ASTOR (multi-tASk code reinforcement learning via uTility-driven coORdination) is a new framework that addresses limitations of multi-task reinforcement learning for code large language models. Proposed in arXiv:2605.06111, ASTOR introduces task utility, a signal capturing each task's learning potential and its synergy with other tasks, and uses it to guide training. The framework comprises two modules: a Hierarchical Utility-Routed Data Scheduling module, which allocates the training budget across tasks and prioritizes informative prompts, and an Adaptive Utility-Calibrated Policy Optimization module. The approach aims to avoid both the cost of deploying separate task-specific specialist models and the shortcomings of existing multi-task RL methods, which treat all coding tasks uniformly under fixed data curricula.
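The paper's details are not reproduced here, but the idea of routing a training budget by task utility can be illustrated with a minimal sketch. All names, the utility formula, and the softmax allocation below are illustrative assumptions, not ASTOR's actual method:

```python
# Hypothetical sketch of utility-driven budget allocation across coding
# tasks, in the spirit of a utility-routed data scheduler. The utility
# combination and the softmax allocation are assumptions for illustration.
import math


def task_utility(learning_potential: float, synergy: float, alpha: float = 0.5) -> float:
    """Blend a task's own learning potential with its cross-task synergy."""
    return alpha * learning_potential + (1 - alpha) * synergy


def allocate_budget(utilities: dict[str, float], total_prompts: int,
                    temperature: float = 1.0) -> dict[str, int]:
    """Turn per-task utilities into per-task prompt budgets via a softmax."""
    exps = {t: math.exp(u / temperature) for t, u in utilities.items()}
    z = sum(exps.values())
    return {t: round(total_prompts * e / z) for t, e in exps.items()}


# Example: three hypothetical coding tasks with made-up utility inputs.
utilities = {
    "codegen": task_utility(0.8, 0.6),   # high learning potential
    "repair": task_utility(0.5, 0.7),    # strong synergy with codegen
    "test_gen": task_utility(0.3, 0.4),  # currently less informative
}
budget = allocate_budget(utilities, total_prompts=1000)
```

Higher-utility tasks receive a larger share of the prompt budget, and the temperature controls how aggressively the scheduler concentrates on them; recomputing utilities periodically would let the allocation adapt as tasks saturate.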

Key facts

  • ASTOR is a multi-task code reinforcement learning framework.
  • It uses utility-driven coordination.
  • Task utility captures learning potential and cross-task synergy.
  • Two modules: data scheduling and policy optimization.
  • Addresses limitations of fixed data curricula in multi-task RL.
  • Published on arXiv with ID 2605.06111.
  • Aims to reduce costs of deploying separate task-specific specialists.
  • Announced on arXiv as a cross-listing (announcement type "cross").

Entities

Institutions

  • arXiv
