Decision Language Model for Multi-Agent Sequential Decision Making
The Decision Language Model (DLM) is a framework for offline multi-agent reinforcement learning (MARL). It recasts multi-agent decision making as a dialogue-style sequence-prediction problem under the centralized-training, decentralized-execution paradigm, and it uses large language models (LLMs) to model heterogeneous observations and actions flexibly, sidestepping the fixed observation formats and action spaces that constrain conventional MARL methods. Training combines supervised fine-tuning on dialogue-formatted offline datasets with group relative policy optimization (GRPO) to improve robustness.
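The dialogue-style framing can be sketched as follows. This is a hypothetical illustration, not the paper's actual prompt format: the message schema, field names, and the `build_dialogue_prompt` helper are all assumptions about how a per-agent decision step might be rendered as chat turns for an LLM.

```python
# Hypothetical sketch of dialogue-style sequence prediction for one agent.
# The message schema and field names are illustrative assumptions, not the
# format used in the DLM paper.

def build_dialogue_prompt(agent_id, observation, history):
    """Render one agent's decision step as a chat-style message list.

    Only the agent's own local observations appear in the prompt, matching
    decentralized execution: each agent acts on local information.
    """
    messages = [
        {"role": "system",
         "content": f"You are agent {agent_id}. Choose one action."},
    ]
    # Replay prior (observation, action) pairs as alternating dialogue turns.
    for obs, act in history:
        messages.append({"role": "user", "content": f"Observation: {obs}"})
        messages.append({"role": "assistant", "content": f"Action: {act}"})
    # The current observation is the final user turn; the model's reply is
    # parsed as the next action.
    messages.append({"role": "user", "content": f"Observation: {observation}"})
    return messages
```

Because each agent's prompt depends only on its own trajectory, the same fine-tuned model can be queried independently per agent at execution time.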
Key facts
- DLM is proposed for offline multi-agent sequential decision making.
- It uses a dialogue-style sequence prediction approach.
- Training includes supervised fine-tuning and group relative policy optimization.
- DLM leverages LLMs for flexible modeling of observations and actions.
- The framework operates under centralized training with decentralized execution.
- It aims to improve generalization from offline datasets.
- The approach addresses limitations of fixed observation formats and action spaces.
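The group relative policy optimization step mentioned above can be sketched at its core: rather than learning a separate value baseline, GRPO samples a group of responses per prompt and normalizes each response's reward against the group's statistics. This is a minimal sketch of the group-relative advantage only; the reward source and any clipping or KL terms used in the paper are not shown.

```python
# Minimal sketch of the group-relative advantage at the heart of GRPO:
# each sampled response is scored relative to the mean and standard
# deviation of rewards within its own sampling group.

from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize a group of rewards to per-response advantages.

    Responses above the group mean get positive advantage, those below
    get negative; eps guards against a zero-variance group.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

These advantages then weight a standard policy-gradient update on the sampled token sequences, so no learned critic is needed.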
- The paper is available on arXiv with ID 2604.23557.