ARTFEED — Contemporary Art Intelligence

ICRL4AHT Benchmark Tests In-Context Reinforcement Learning for Ad-Hoc Teamwork

other · 2026-05-26

Researchers have introduced ICRL4AHT, a benchmark built on a high-throughput JAX implementation of Overcooked-V2, to assess In-Context Reinforcement Learning (ICRL) in Ad-Hoc Teamwork (AHT) settings. The benchmark features a large, diverse suite of teammates spanning both RL and heuristic policies, which allows controlled shifts between training and test teammates. It also offers a reproducible end-to-end pipeline for generating teammates, collecting learning histories, constructing datasets, and running online multi-episode evaluation. The study evaluated two baselines, Algorithm Distillation (AD) and Decision-Pretrained Transformer (DPT), across millions of transitions. The findings point to significant limitations: in contrast to their performance in single-agent settings, these methods fail to coordinate effectively with unknown partners, underscoring the difficulty of applying ICRL to AHT.
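To make the "learning histories" and "dataset construction" steps more concrete, here is a minimal sketch of how an Algorithm Distillation style dataset is typically assembled: episodes from one learner's training run are concatenated in chronological order, and each sample asks a sequence model to predict the next action from a context window that may span episode boundaries. This is an illustration under assumptions, not the benchmark's actual code: the toy `Transition` encoding, the function names, and the `context_len` parameter are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical toy encoding: (observation id, action id, reward).
Transition = Tuple[int, int, float]
# One training sample: (context transitions, query observation, target action).
Sample = Tuple[List[Transition], int, int]


def flatten_history(episodes: List[List[Transition]]) -> List[Transition]:
    """Concatenate episodes in the order the learner produced them."""
    return [t for episode in episodes for t in episode]


def make_ad_samples(episodes: List[List[Transition]], context_len: int) -> List[Sample]:
    """Build next-action prediction samples with cross-episode context.

    The context is NOT reset at episode boundaries, so a model trained on
    these samples can observe how the learner's behaviour changes over time.
    """
    flat = flatten_history(episodes)
    samples: List[Sample] = []
    for i, (obs, action, _reward) in enumerate(flat):
        start = max(0, i - context_len)
        context = flat[start:i]  # up to context_len preceding transitions
        samples.append((context, obs, action))
    return samples
```

For example, with two episodes of two transitions each and `context_len=3`, the last sample's context holds both transitions of the first episode plus the first transition of the second, which is the cross-episode property that distinguishes this setup from ordinary per-episode behaviour cloning.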

Key facts

  • ICRL4AHT benchmark is built on a high-throughput JAX implementation of Overcooked-V2
  • Benchmark includes a large, diverse teammate suite spanning RL and heuristic policies
  • Enables controlled train-test shifts
  • Provides a reproducible end-to-end pipeline for teammate generation, learning-history collection, dataset construction, and online multi-episode evaluation
  • Evaluated Algorithm Distillation (AD) and Decision-Pretrained Transformer (DPT)
  • Evaluated across millions of transitions
  • Baselines fail to effectively coordinate with unknown partners
  • Study highlights limitations of ICRL in Ad-Hoc Teamwork
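
The controlled train-test shifts and online multi-episode evaluation listed above can be sketched roughly as follows. This is a hedged illustration only: the teammate record layout (`name`, `kind`), the `run_episode` callback, and both function names are assumptions made for the example and do not come from the benchmark itself.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical teammate record, e.g. {"name": "ppo_a", "kind": "rl"}.
Teammate = Dict[str, str]
# Hypothetical toy encoding: (observation id, action id, reward).
Transition = Tuple[int, int, float]
# Assumed callback: plays one episode with a teammate, given the context so far,
# and returns (episode return, transitions observed during that episode).
RunEpisode = Callable[[Teammate, List[Transition]], Tuple[float, List[Transition]]]


def split_teammates(
    pool: List[Teammate], held_out_kinds: Set[str]
) -> Tuple[List[Teammate], List[Teammate]]:
    """Hold out whole teammate kinds so test partners are unseen in training."""
    train = [t for t in pool if t["kind"] not in held_out_kinds]
    test = [t for t in pool if t["kind"] in held_out_kinds]
    return train, test


def evaluate_multi_episode(
    run_episode: RunEpisode, teammate: Teammate, num_episodes: int
) -> List[float]:
    """Play several episodes with one teammate, keeping context across episodes.

    Returning per-episode returns (rather than one average) makes it visible
    whether the agent adapts to the partner as its context grows.
    """
    context: List[Transition] = []
    returns: List[float] = []
    for _ in range(num_episodes):
        episode_return, transitions = run_episode(teammate, context)
        returns.append(episode_return)
        context.extend(transitions)  # carried over, never reset between episodes
    return returns
```

Under this framing, an in-context learner that adapts to its partner would show returns rising over successive episodes with a held-out teammate, whereas a flat curve would match the coordination failure the study reports.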

Entities

Institutions

  • arXiv
