UnityMAS-O: A General RL Optimization Framework for LLM-Based Multi-Agent Systems
The arXiv preprint 2605.26646 presents UnityMAS-O, a general reinforcement learning (RL) optimization framework for multi-agent systems built on large language models (LLMs). Existing RL post-training frameworks mainly optimize a single policy. UnityMAS-O instead treats the complete multi-agent workflow as the unit of optimization. This lets it support user-defined multi-agent workflows, structured interaction between agents, role-specific credit assignment, and configurable parameter sharing. The framework represents a workflow through four first-class objects: logical agent roles, graph trajectories, user-defined rewards, and agent-model mappings. The agent-model mapping decouples logical agents from the physical model parameters behind them. As a result, the same workflow can be trained with full sharing (all roles use one model), full separation (each role has its own model), or partial sharing (some roles share a model while others do not).
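The decoupling of logical agents from physical parameters can be made concrete with a small Python sketch. It assumes an agent-model mapping is simply a dictionary from role name to model ID. `AgentRole`, `build_mapping`, `sharing_mode`, `batches_per_model`, and the role names `planner`, `coder`, and `reviewer` are hypothetical illustrations, not UnityMAS-O's API. The sketch shows that one workflow can move among the three sharing modes by changing only how roles are grouped. It also shows how training samples from different roles would be routed to whichever physical model backs them.

```python
# Hypothetical sketch of an agent-model mapping; not the UnityMAS-O API.
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class AgentRole:
    """A logical agent role, e.g. 'planner' or 'coder' (hypothetical names)."""
    name: str


def build_mapping(roles: Iterable[AgentRole], groups: List[Set[str]]) -> Dict[str, str]:
    """Map each logical role name to a physical model ID.

    Roles listed in the same group share one set of model parameters;
    every role must appear in exactly one group.
    """
    mapping: Dict[str, str] = {}
    for idx, group in enumerate(groups):
        for role_name in group:
            if role_name in mapping:
                raise ValueError(f"role {role_name!r} assigned to two models")
            mapping[role_name] = f"model_{idx}"
    missing = {r.name for r in roles} - set(mapping)
    if missing:
        raise ValueError(f"unmapped roles: {sorted(missing)}")
    return mapping


def sharing_mode(mapping: Dict[str, str]) -> str:
    """Classify a role-to-model mapping by how parameters are shared."""
    n_models = len(set(mapping.values()))
    if n_models == 1:
        return "full sharing"
    if n_models == len(mapping):
        return "full separation"
    return "partial sharing"


def batches_per_model(mapping: Dict[str, str], steps: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Route (role_name, sample) pairs to the physical model each one would update."""
    batches: Dict[str, List[str]] = {}
    for role_name, sample in steps:
        batches.setdefault(mapping[role_name], []).append(sample)
    return batches


roles = [AgentRole("planner"), AgentRole("coder"), AgentRole("reviewer")]
shared = build_mapping(roles, [{"planner", "coder", "reviewer"}])
separate = build_mapping(roles, [{"planner"}, {"coder"}, {"reviewer"}])
partial = build_mapping(roles, [{"planner", "reviewer"}, {"coder"}])

# prints: full sharing | full separation | partial sharing
print(sharing_mode(shared), "|", sharing_mode(separate), "|", sharing_mode(partial))
# prints: {'model_0': ['p1', 'r1'], 'model_1': ['c1']}
print(batches_per_model(partial, [("planner", "p1"), ("coder", "c1"), ("reviewer", "r1")]))
```

Only the `groups` argument changes between the three configurations. The role definitions stay untouched, which mirrors the separation between logical agents and physical parameters described above.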
Key facts
- UnityMAS-O is a general RL optimization framework for LLM-based multi-agent systems.
- It treats the complete workflow as the optimization unit.
- Existing RL post-training frameworks mainly target single-policy optimization.
- UnityMAS-O supports user-defined multi-agent workflows, structured interaction, role-specific credit assignment, and configurable parameter sharing.
- The framework represents workflows through four first-class objects: logical agent roles, graph trajectories, user-defined rewards, and agent-model mappings.
- It decouples logical agents from physical model parameters.
- It supports full sharing, full separation, and partial sharing of model parameters among agents.
- The paper is available on arXiv with ID 2605.26646.
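The summary does not specify how UnityMAS-O performs role-specific credit assignment over graph trajectories. The following Python sketch shows one plausible scheme and is purely an assumption. It stores a graph trajectory as a list of agent turns with parent links and scores each turn with a user-defined reward function. It then normalizes rewards separately for each role across a batch of trajectories, in the style of a group-relative baseline. `Node`, `GraphTrajectory`, `role_normalized_advantages`, and `example_reward` are hypothetical names. The per-role normalization is a swapped-in technique, not the paper's documented method.

```python
# Hypothetical sketch of role-specific credit assignment; not the UnityMAS-O method.
from dataclasses import dataclass, field
from statistics import mean, pstdev
from typing import Callable, List


@dataclass
class Node:
    """One agent turn in a graph trajectory (hypothetical schema)."""
    node_id: int
    role: str
    output: str
    parents: List[int] = field(default_factory=list)  # earlier turns this turn consumed


@dataclass
class GraphTrajectory:
    nodes: List[Node]  # assumed topologically ordered; the last node holds the final answer


# A user-defined reward scores one node in the context of its whole trajectory.
RewardFn = Callable[[GraphTrajectory, Node], float]


def role_normalized_advantages(trajs: List[GraphTrajectory], reward_fn: RewardFn) -> List[List[float]]:
    """Score every node with reward_fn, then normalize within each role.

    Each node's reward is centered on the batch mean for its role and divided by
    that role's population std, so a turn is compared only with turns of the same role.
    """
    raw = [[reward_fn(t, n) for n in t.nodes] for t in trajs]
    by_role = {}
    for t, rewards in zip(trajs, raw):
        for n, r in zip(t.nodes, rewards):
            by_role.setdefault(n.role, []).append(r)
    stats = {role: (mean(rs), pstdev(rs)) for role, rs in by_role.items()}
    advantages = []
    for t, rewards in zip(trajs, raw):
        row = []
        for n, r in zip(t.nodes, rewards):
            mu, sd = stats[n.role]
            row.append((r - mu) / sd if sd > 0 else 0.0)  # zero when a role has no reward spread
        advantages.append(row)
    return advantages


def example_reward(traj: GraphTrajectory, node: Node) -> float:
    """Hypothetical reward: shared outcome score plus a small planner-specific bonus."""
    outcome = 1.0 if traj.nodes[-1].output == "42" else 0.0
    bonus = 0.1 if node.role == "planner" and node.output else 0.0
    return outcome + bonus


good = GraphTrajectory([Node(0, "planner", "plan A"), Node(1, "solver", "42", parents=[0])])
bad = GraphTrajectory([Node(0, "planner", ""), Node(1, "solver", "41", parents=[0])])
print(role_normalized_advantages([good, bad], example_reward))
```

Normalizing per role keeps a planner's reward scale separate from a solver's. A role whose turns all receive the same reward gets an advantage of zero instead of a division by zero.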
Entities
Institutions
- arXiv