ICRL4AHT Benchmark Tests In-Context Reinforcement Learning for Ad-Hoc Teamwork
Researchers have introduced ICRL4AHT, a benchmark built on a high-throughput JAX implementation of Overcooked-V2, to assess In-Context Reinforcement Learning (ICRL) in Ad-Hoc Teamwork (AHT) settings. The benchmark provides a large, diverse suite of teammates spanning RL and heuristic policies, which allows controlled shifts between the teammates used for training and those used for testing. It also offers a reproducible end-to-end pipeline for teammate generation, learning-history collection, dataset construction, and online multi-episode evaluation. The study evaluated two baselines, Algorithm Distillation (AD) and Decision-Pretrained Transformer (DPT), across millions of transitions. The results expose clear limitations: in contrast to their performance in single-agent settings, both methods fail to coordinate effectively with unknown partners, underscoring how difficult it is to apply ICRL to AHT.
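To make the "learning-history collection" and "dataset construction" steps more concrete, the sketch below shows one way a learning history could be cut into training samples in the style of Algorithm Distillation, where the context spans episode boundaries so a sequence model can pick up on improvement across episodes. This is an illustration only and not the benchmark's code: `Transition`, `cross_episode_windows`, the integer observations, and the toy history are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Transition:
    obs: int        # toy observation id (real observations would be arrays)
    action: int     # action taken by the learning agent
    reward: float
    done: bool      # True on the last step of an episode


def cross_episode_windows(
    episodes: List[List[Transition]],
    context_len: int,
    stride: int = 1,
) -> List[Tuple[List[Transition], int, int]]:
    """Slice one learning history into (context, query_obs, target_action) samples."""
    # Keep episodes in the order the source learner produced them, so later
    # transitions come from a more trained policy than earlier ones.
    flat = [t for episode in episodes for t in episode]
    samples = []
    for start in range(0, len(flat) - context_len, stride):
        context = flat[start:start + context_len]   # may cross episode boundaries
        query = flat[start + context_len]           # step whose action is predicted
        samples.append((context, query.obs, query.action))
    return samples


# Hypothetical toy history: 3 episodes of 4 steps each with one fixed teammate.
history = [
    [Transition(obs=s, action=(s + e) % 3, reward=float(e), done=(s == 3))
     for s in range(4)]
    for e in range(3)
]
samples = cross_episode_windows(history, context_len=6)
```

With a context of 6 steps and episodes of 4 steps, every sample's context contains at least one episode boundary, which is the property that separates this kind of dataset from ordinary single-episode behavior cloning.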
Key facts
- ICRL4AHT benchmark is built on a high-throughput JAX implementation of Overcooked-V2
- Benchmark includes a large, diverse teammate suite spanning RL and heuristic policies
- Enables controlled train-test shifts
- Provides a reproducible end-to-end pipeline for teammate generation, learning-history collection, dataset construction, and online multi-episode evaluation
- Evaluated Algorithm Distillation (AD) and Decision-Pretrained Transformer (DPT)
- Evaluated across millions of transitions
- Baselines fail to effectively coordinate with unknown partners
- Study highlights limitations of ICRL in Ad-Hoc Teamwork
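As a rough illustration of what a controlled train-test shift over a teammate suite might look like, the sketch below holds out one whole family of teammates for testing and splits the remaining ones into training and in-distribution test sets. The split rule, the function `split_teammates`, and the teammate names are assumptions made for this example; the source does not describe how the benchmark defines its shifts.

```python
import random
from typing import Dict, List, Tuple


def split_teammates(
    suite: Dict[str, str],
    held_out_family: str,
    in_dist_test_frac: float = 0.25,
    seed: int = 0,
) -> Tuple[List[str], List[str], List[str]]:
    """Return (train, in_dist_test, shifted_test) lists of teammate names.

    `suite` maps a teammate name to its family, e.g. "rl" or "heuristic".
    """
    rng = random.Random(seed)  # local RNG keeps the split reproducible
    # Teammates from the held-out family are never seen during training.
    shifted_test = sorted(n for n, fam in suite.items() if fam == held_out_family)
    seen = sorted(n for n, fam in suite.items() if fam != held_out_family)
    rng.shuffle(seen)
    n_test = int(len(seen) * in_dist_test_frac)
    in_dist_test = sorted(seen[:n_test])
    train = sorted(seen[n_test:])
    return train, in_dist_test, shifted_test


# Hypothetical suite: four RL teammates and two heuristic ones.
suite = {
    "rl_seed0": "rl", "rl_seed1": "rl", "rl_seed2": "rl", "rl_seed3": "rl",
    "heuristic_a": "heuristic", "heuristic_b": "heuristic",
}
train, in_dist_test, shifted_test = split_teammates(suite, held_out_family="heuristic")
```

Comparing results on `in_dist_test` against `shifted_test` would separate generalization to new teammates of a familiar kind from generalization to an unfamiliar kind of teammate.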
Entities
Institutions
- arXiv