New Mamba-based AI Framework Developed for Biomimetic Underwater Robot Coordination
A new artificial intelligence framework, Mamba-based multi-agent group relative policy optimization (M²GRPO), has been introduced for cooperative pursuit with biomimetic underwater robots. It targets three difficulties in that setting: long-horizon decision making, partial observability, and inter-robot coordination. To address them, it combines a selective state-space Mamba policy with group-relative policy optimization. Training is centralized and execution is decentralized. The Mamba-based policy reads each robot's observation history to capture long-range temporal dependencies, uses attention-based relational features to encode interactions between agents, and produces bounded continuous actions through normalized Gaussian sampling. Group-relative advantages, computed by normalization, are used to improve credit assignment while keeping training stable. The stated aim is a policy learning method that is both expressive and stable for underwater robotic systems that mimic biological organisms.
The work is described in arXiv preprint 2604.19404v1, which was announced as a cross-disciplinary abstract. It specifically targets cooperative pursuit scenarios where traditional methods have proven inadequate for the demands of biomimetic underwater environments.
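The summary does not give the formula behind the group-relative advantages. As a minimal sketch, assuming the common GRPO-style form in which each return is standardized against the mean and standard deviation of its own group, the computation might look like the following. The function name, the `eps` term, and the sample returns are illustrative assumptions, not details from the preprint.

```python
from statistics import fmean, pstdev


def group_relative_advantages(returns, eps=1e-8):
    """Standardize a group of scalar returns (assumed GRPO-style form).

    Each return is compared with the group's own mean, so no learned
    value function is needed as a baseline.
    """
    mean = fmean(returns)
    std = pstdev(returns)  # population standard deviation over the group
    # eps avoids division by zero when every return in the group is equal.
    return [(r - mean) / (std + eps) for r in returns]


# Hypothetical returns from four rollouts collected under the same conditions.
group_returns = [1.0, 3.0, 2.0, 6.0]
advantages = group_relative_advantages(group_returns)
```

Rollouts that beat their group average receive positive advantages and the rest receive negative ones, which is one plausible reading of how normalization helps credit assignment without destabilizing updates.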
Key facts
- M²GRPO integrates a selective state-space Mamba policy with group-relative policy optimization
- Designed for biomimetic underwater robots in cooperative pursuit scenarios
- Addresses long-horizon decision making, partial observability, and inter-robot coordination
- Uses the centralized-training, decentralized-execution paradigm
- Mamba-based policy captures temporal dependencies from observation history
- Employs attention-based relational features to encode inter-agent interactions
- Produces bounded continuous actions through normalized Gaussian sampling
- Improves credit assignment while maintaining stability through group-relative advantages
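How "normalized Gaussian sampling" bounds the actions is not spelled out in the summary. One widely used approach, assumed here purely for illustration, is to sample from a diagonal Gaussian, squash the sample with tanh, and rescale it to the actuator range. The function name, its parameters, and the two-dimensional action in the usage lines are hypothetical.

```python
import math
import random


def sample_bounded_action(mean, log_std, low, high, rng=random):
    """Sample one bounded action per dimension (assumed tanh squashing)."""
    action = []
    for mu, ls, lo, hi in zip(mean, log_std, low, high):
        raw = rng.gauss(mu, math.exp(ls))  # unbounded Gaussian sample
        squashed = math.tanh(raw)          # squash into [-1, 1]
        # Rescale from [-1, 1] to the actuator range [lo, hi].
        action.append(lo + 0.5 * (squashed + 1.0) * (hi - lo))
    return action


# Hypothetical two-dimensional action with per-dimension bounds.
rng = random.Random(0)
action = sample_bounded_action(
    mean=[0.0, 0.5], log_std=[-1.0, -1.0],
    low=[-1.0, 0.0], high=[1.0, 2.0], rng=rng,
)
```

Whatever the policy network outputs for the mean and log standard deviation, every sampled action stays inside its bounds, which matters for actuators with hard physical limits.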
Entities
Institutions
- arXiv