TowerMind: A New Benchmark for LLM Agents Using Tower Defense Games
Researchers have introduced TowerMind, a novel environment and benchmark for evaluating Large Language Models (LLMs) as agents, grounded in the tower defense (TD) subgenre of real-time strategy (RTS) games. TowerMind addresses limitations of existing RTS game environments, which either have high computational demands or lack textual observations, by offering low computational requirements and a multimodal observation space. This allows for assessing LLMs' long-term planning and decision-making capabilities, which are crucial for adapting to diverse scenarios. The environment preserves key evaluation strengths of RTS games while being more accessible for LLM testing.
Key facts
- TowerMind is a new environment for LLM agents based on tower defense games.
- It features low computational demands and a multimodal observation space.
- Existing RTS environments have high computational demands or lack textual observations.
- The benchmark assesses LLMs' long-term planning and decision-making capabilities.
- RTS games require macro-level strategic planning and micro-level tactical adaptation.
- The environment is designed to benchmark LLMs as agents.
- TowerMind is presented in arXiv paper 2601.05899.
- The arXiv announcement is of type "replace", indicating a revised version of an earlier submission.
Entities
Institutions
- arXiv