ICRL4AHT Benchmark Tests In-Context Reinforcement Learning for Ad-Hoc Teamwork

other · 2026-05-26

Researchers have introduced ICRL4AHT, a benchmark for evaluating In-Context Reinforcement Learning (ICRL) in Ad-Hoc Teamwork (AHT), where an agent must coordinate with teammates it has not encountered before. The benchmark is built on a high-throughput JAX implementation of Overcooked-V2. It includes a large, diverse teammate suite spanning reinforcement-learning (RL) and heuristic policies, which enables controlled shifts between training and test teammates. It also provides a reproducible end-to-end pipeline covering teammate generation, learning-history collection, dataset construction, and online multi-episode evaluation. The authors evaluated two ICRL baselines, Algorithm Distillation (AD) and Decision-Pretrained Transformer (DPT), across millions of transitions. The results reveal clear limitations: unlike in single-agent settings, both methods fail to coordinate effectively with unknown partners, highlighting the challenges of applying ICRL to AHT.
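The two baselines consume learning data differently. As described in their original papers, AD trains a causal sequence model to predict the source learner's next action from a context spanning several episodes of that learner's history. DPT, by contrast, is trained to predict the optimal action for a query state, given an in-context set of interactions. The sketch below illustrates only the dataset-construction step for each, under stated assumptions. Transitions are toy `(obs, action, reward)` tuples. All function names (`ad_windows`, `ad_examples`, `dpt_example`) and the `optimal_action` oracle are hypothetical and are not taken from the ICRL4AHT pipeline.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Toy transition: (observation id, action id, reward). Real Overcooked-V2
# observations are arrays; integers keep this sketch self-contained.
Transition = Tuple[int, int, float]


def ad_windows(history: Sequence[Transition], window: int,
               stride: int = 1) -> List[List[Transition]]:
    """AD-style slicing: cut one learning history (episodes concatenated in the
    order the source learner produced them) into fixed-length windows.
    Windows ignore episode boundaries, so one window can span several
    episodes and expose the learner's improvement over time."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    return [list(history[i:i + window])
            for i in range(0, len(history) - window + 1, stride)]


def ad_examples(win: List[Transition]) -> List[Tuple[List[Transition], int, int]]:
    """Causal next-action targets: (earlier transitions, current obs,
    action the source learner took at that step)."""
    return [(win[:t], win[t][0], win[t][1]) for t in range(len(win))]


def dpt_example(interactions: Sequence[Transition], query_obs: int,
                optimal_action: Callable[[int], int], context_size: int,
                rng: random.Random) -> Tuple[List[Transition], int, int]:
    """DPT-style example: an unordered sample of interactions as context, a
    query observation, and the optimal action for that query as the label.
    `optimal_action` is a hypothetical oracle supplied by the caller."""
    k = min(context_size, len(interactions))
    context = rng.sample(list(interactions), k)
    return context, query_obs, optimal_action(query_obs)


# Usage on a made-up five-step history.
history = [(0, 1, 0.0), (1, 0, 1.0), (0, 1, 0.0), (1, 1, 1.0), (2, 0, 0.0)]
windows = ad_windows(history, window=3)   # 5 - 3 + 1 = 3 overlapping windows
examples = ad_examples(windows[0])        # one next-action example per step
ctx, q, label = dpt_example(history, 2, lambda o: o % 2, 2, random.Random(0))
```

AD windows preserve temporal order across episode boundaries, so the model can imitate the learner's improvement. The DPT context, in contrast, is sampled without regard to order and supervised with optimal-action labels rather than the learner's own actions.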

Key facts

ICRL4AHT benchmark is built on a high-throughput JAX implementation of Overcooked-V2
Benchmark includes a large, diverse teammate suite spanning RL and heuristic policies
Enables controlled train-test shifts
Provides reproducible end-to-end pipeline for teammate generation, learning-history collection, dataset construction, and online multi-episode evaluation
Evaluated Algorithm Distillation (AD) and Decision-Pretrained Transformer (DPT)
Evaluated across millions of transitions
Baselines fail to coordinate effectively with unknown partners
Study highlights limitations of ICRL in Ad-Hoc Teamwork
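Online multi-episode evaluation and the controlled train-test shift can be illustrated with a deliberately tiny stand-in. The sketch below rests on three assumptions. First, a hypothetical two-player matching game replaces Overcooked-V2. Second, the heuristic teammates always play a fixed preferred action. Third, a simple count-based rule replaces a trained AD or DPT transformer. What the sketch does show is the evaluation protocol: teammates are split into disjoint train and test sets. At test time, the agent's context is kept across consecutive episodes with the same unknown partner, so any in-context adaptation appears as rising per-episode return. None of these names come from the ICRL4AHT codebase.

```python
import random
from typing import Callable, List, Sequence, Tuple

Transition = Tuple[int, int, float]  # (observation id, action id, reward)
Teammate = Callable[[int], int]      # maps an observation to the teammate's action


def split_teammates(ids: Sequence[int], test_fraction: float,
                    seed: int) -> Tuple[List[int], List[int]]:
    """Disjoint train/test teammate split, so test partners are never seen in training."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def make_teammate(preferred: int) -> Teammate:
    """Hypothetical heuristic teammate that always plays its preferred action."""
    return lambda obs: preferred


def payoff(agent_action: int, teammate_action: int) -> float:
    """Toy coordination reward: 1 only when both players choose the same action."""
    return 1.0 if agent_action == teammate_action else 0.0


def in_context_policy(context: List[Transition], n_actions: int) -> int:
    """Stand-in for a trained ICRL model: try each untried action once (lowest
    index first), then pick the action with the highest mean reward in context."""
    totals = [0.0] * n_actions
    counts = [0] * n_actions
    for _, action, reward in context:
        totals[action] += reward
        counts[action] += 1
    for action in range(n_actions):
        if counts[action] == 0:
            return action
    return max(range(n_actions), key=lambda a: totals[a] / counts[a])


def evaluate(teammate: Teammate, n_episodes: int, horizon: int,
             n_actions: int) -> List[float]:
    """Online multi-episode evaluation: the context is NOT reset between
    episodes, so later episodes can exploit what earlier ones revealed."""
    context: List[Transition] = []
    returns = []
    for _ in range(n_episodes):
        episode_return = 0.0
        for t in range(horizon):
            action = in_context_policy(context, n_actions)
            reward = payoff(action, teammate(t))
            context.append((t, action, reward))
            episode_return += reward
        returns.append(episode_return)
    return returns


if __name__ == "__main__":
    # train_ids would be used to generate training histories (not shown here).
    train_ids, test_ids = split_teammates([0, 1, 2], test_fraction=0.34, seed=0)
    for preferred in test_ids:
        # Per-episode returns against an unseen partner; a rising curve
        # indicates in-context adaptation.
        print(preferred, evaluate(make_teammate(preferred), n_episodes=3,
                                  horizon=2, n_actions=3))
```

Resetting `context` at the start of each episode would turn this into a standard per-episode evaluation and remove any chance of cross-episode adaptation. Keeping the context across episodes is exactly what the multi-episode protocol is meant to test.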
