ARTFEED — Contemporary Art Intelligence

UnityMAS-O: A General RL Optimization Framework for LLM-Based Multi-Agent Systems

other · 2026-05-27

The arXiv preprint 2605.26646 presents UnityMAS-O, a general reinforcement learning (RL) optimization framework for multi-agent systems built on large language models (LLMs). Whereas existing RL post-training frameworks mainly optimize a single policy, UnityMAS-O treats the entire workflow as the optimization unit. It supports user-defined multi-agent workflows, structured interaction, role-specific credit assignment, and configurable parameter sharing. The framework represents workflows through four first-class objects: logical agent roles, graph trajectories, user-defined rewards, and agent-model mappings. By decoupling logical agents from their physical model parameters, it supports full sharing, full separation, and partial sharing of parameters.
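The decoupling of logical agents from physical model parameters can be pictured with a small sketch. The Python below is purely illustrative and is not UnityMAS-O's API: the role names, the model identifiers, and the helpers `sharing_mode` and `roles_per_model` are all hypothetical. It assumes an agent-model mapping is simply a dictionary from a logical role name to a model identifier, and shows how the three sharing regimes fall out of that one mapping.

```python
from collections import defaultdict


def sharing_mode(agent_to_model: dict[str, str]) -> str:
    """Classify a hypothetical agent-model mapping by how roles share models."""
    if not agent_to_model:
        raise ValueError("mapping must contain at least one agent role")
    n_roles = len(agent_to_model)
    n_models = len(set(agent_to_model.values()))
    if n_models == 1:
        # Every role points at the same parameters.
        # (A single-role mapping is trivially classified here too.)
        return "full sharing"
    if n_models == n_roles:
        # Every role has its own parameters.
        return "full separation"
    # Some roles share parameters, others do not.
    return "partial sharing"


def roles_per_model(agent_to_model: dict[str, str]) -> dict[str, list[str]]:
    """Group roles by the model they map to, i.e. whose data would update which parameters."""
    groups: dict[str, list[str]] = defaultdict(list)
    for role, model in agent_to_model.items():
        groups[model].append(role)
    return dict(groups)


# Hypothetical three-role workflow in which planner and reviewer share one model.
mapping = {"planner": "model_a", "coder": "model_b", "reviewer": "model_a"}
print(sharing_mode(mapping))
print(roles_per_model(mapping))
```

Under this assumption, switching between sharing regimes means editing the mapping only; the logical roles and the workflow definition stay untouched.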

Key facts

  • UnityMAS-O is a general RL optimization framework for LLM-based multi-agent systems.
  • It treats the complete workflow as the optimization unit.
  • Existing RL post-training frameworks mainly target single-policy optimization.
  • UnityMAS-O supports user-defined multi-agent workflows, structured interaction, role-specific credit assignment, and configurable parameter sharing.
  • The framework represents workflows through four first-class objects: logical agent roles, graph trajectories, user-defined rewards, and agent-model mappings.
  • It decouples logical agents from physical model parameters.
  • It supports full sharing, full separation, and partial sharing of parameters.
  • The paper is available on arXiv with ID 2605.26646.
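To make "graph trajectories", "user-defined rewards", and "role-specific credit assignment" concrete, here is a minimal Python sketch. It is not taken from the paper: the summary does not say how UnityMAS-O assigns credit, so the rule used here (each step receives a shared workflow-level reward plus an optional reward specific to its role) is an assumption, and `Step`, `assign_credit`, and the example roles are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Step:
    role: str                      # logical agent role that produced this node
    output: str                    # text the role emitted
    parents: list[int] = field(default_factory=list)  # indices of upstream steps


def assign_credit(
    trajectory: list[Step],
    workflow_reward: Callable[[list[Step]], float],
    role_rewards: Optional[dict[str, Callable[[Step], float]]] = None,
) -> list[float]:
    """Return one reward per step: shared workflow reward plus a role-specific term."""
    # Assumed convention: steps are stored in topological order, so parents come first.
    for i, step in enumerate(trajectory):
        if any(p < 0 or p >= i for p in step.parents):
            raise ValueError(f"step {i} has a parent that is not an earlier step")
    role_rewards = role_rewards or {}
    shared = workflow_reward(trajectory)  # user-defined, scores the whole workflow
    credits = []
    for step in trajectory:
        local_fn = role_rewards.get(step.role)
        local = local_fn(step) if local_fn is not None else 0.0
        credits.append(shared + local)
    return credits


# Hypothetical planner -> coder -> reviewer chain.
trajectory = [
    Step("planner", "plan: add two numbers"),
    Step("coder", "def add(a, b): return a + b", parents=[0]),
    Step("reviewer", "approve", parents=[1]),
]
credits = assign_credit(
    trajectory,
    workflow_reward=lambda t: 1.0 if t[-1].output == "approve" else 0.0,
    role_rewards={"coder": lambda s: 0.5 if "def " in s.output else 0.0},
)
print(credits)
```

In a setup like this, the per-step rewards could then be routed to whichever model each role maps to, which is where an agent-model mapping would come in.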

Entities

Institutions

  • arXiv