UnityMAS-O: A General RL Optimization Framework for LLM-Based Multi-Agent Systems

other · 2026-05-27

The arXiv preprint 2605.26646 presents UnityMAS-O, a general reinforcement learning (RL) optimization framework for multi-agent systems built on large language models (LLMs). Whereas existing RL post-training frameworks mainly optimize a single policy, UnityMAS-O treats the complete workflow as the optimization unit. This supports user-defined multi-agent workflows, structured interaction, role-specific credit assignment, and configurable parameter sharing. The framework represents workflows through four first-class objects: logical agent roles, graph trajectories, user-defined rewards, and agent-model mappings. Because the agent-model mappings decouple logical agents from physical model parameters, the same workflow can be trained with full sharing, full separation, or partial sharing of parameters.
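The decoupling of logical agents from physical model parameters can be pictured as a plain mapping from role names to model identifiers. The following is a minimal Python sketch of that idea, not the framework's actual API: the role names (`planner`, `solver`, `verifier`), the model identifiers, and both helper functions are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical agent-model mappings: logical role -> physical model id.
# None of these names come from the paper; they only illustrate the idea.
FULL_SHARING = {"planner": "model_a", "solver": "model_a", "verifier": "model_a"}
FULL_SEPARATION = {"planner": "model_a", "solver": "model_b", "verifier": "model_c"}
PARTIAL_SHARING = {"planner": "model_a", "solver": "model_b", "verifier": "model_b"}


def group_roles_by_model(agent_model_map):
    """Invert role -> model id into model id -> sorted list of roles."""
    groups = defaultdict(list)
    for role, model_id in agent_model_map.items():
        groups[model_id].append(role)
    return {model_id: sorted(roles) for model_id, roles in groups.items()}


def sharing_mode(agent_model_map):
    """Classify a mapping by how many distinct models back the roles."""
    n_roles = len(agent_model_map)
    n_models = len(set(agent_model_map.values()))
    if n_models == 1:
        return "full sharing"      # every role uses the same parameters
    if n_models == n_roles:
        return "full separation"   # every role has its own parameters
    return "partial sharing"       # some roles share, others do not


if __name__ == "__main__":
    for name, mapping in [
        ("FULL_SHARING", FULL_SHARING),
        ("FULL_SEPARATION", FULL_SEPARATION),
        ("PARTIAL_SHARING", PARTIAL_SHARING),
    ]:
        print(name, sharing_mode(mapping), group_roles_by_model(mapping))
```

In this sketch the workflow definition never changes between the three configurations; only the mapping does, which is the practical benefit of keeping roles and parameters separate.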

Key facts

UnityMAS-O is a general RL optimization framework for LLM-based multi-agent systems.
It treats the complete workflow as the optimization unit.
Existing RL post-training frameworks mainly target single-policy optimization.
UnityMAS-O supports user-defined multi-agent workflows, structured interaction, role-specific credit assignment, and configurable parameter sharing.
The framework represents workflows through four first-class objects: logical agent roles, graph trajectories, user-defined rewards, and agent-model mappings.
It decouples logical agents from physical model parameters.
It supports full sharing, full separation, and partial sharing of parameters.
The paper is available on arXiv with ID 2605.26646.
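
To make the interplay of graph trajectories, user-defined rewards, and agent-model mappings more concrete, here is a toy Python sketch. It is an assumption-laden illustration and not the paper's algorithm: the `Step` record, the keyword-based reward function, the summation rule for per-role credit, and all role and model names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One visited node of a (hypothetical) graph trajectory."""
    node: str    # node id in the workflow graph
    role: str    # logical agent role that acted at this node
    output: str  # text produced by that role


def keyword_reward(step):
    """Toy user-defined reward: 1.0 if the output contains 'ok', else 0.0."""
    return 1.0 if "ok" in step.output else 0.0


def role_rewards(trajectory, reward_fn):
    """Accumulate the user-defined reward separately for each role."""
    totals = {}
    for step in trajectory:
        totals[step.role] = totals.get(step.role, 0.0) + reward_fn(step)
    return totals


def model_rewards(role_totals, agent_model_map):
    """Route per-role credit to whichever physical model backs each role."""
    totals = {}
    for role, value in role_totals.items():
        model_id = agent_model_map[role]
        totals[model_id] = totals.get(model_id, 0.0) + value
    return totals


# Hypothetical trajectory through a three-role workflow.
trajectory = [
    Step("n0", "planner", "plan ok"),
    Step("n1", "solver", "draft"),
    Step("n2", "solver", "answer ok"),
    Step("n3", "verifier", "reject"),
]
# Hypothetical partial-sharing mapping: solver and verifier share a model.
mapping = {"planner": "model_a", "solver": "model_b", "verifier": "model_b"}

per_role = role_rewards(trajectory, keyword_reward)
per_model = model_rewards(per_role, mapping)

if __name__ == "__main__":
    print(per_role)
    print(per_model)
```

The point of the sketch is the separation of concerns: credit is computed per logical role over the whole trajectory, and only afterwards routed to physical models through the mapping, so changing the sharing configuration does not require touching the reward definition.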
