ARTFEED — Contemporary Art Intelligence

CreativityBench Tests LLMs on Affordance-Based Tool Repurposing

ai-technology · 2026-05-07

CreativityBench, a newly introduced benchmark, evaluates large language models on their ability to use tools based on their attributes and affordances rather than their canonical applications. It is built on an affordance knowledge base of 4,000 entities and more than 150,000 annotations linking objects, parts, attributes, and uses. From this foundation, 14,000 grounded tasks are generated, each requiring a solution that is non-obvious yet physically plausible under stated constraints. Evaluations of 10 state-of-the-art LLMs, spanning open-source and closed models, show that while the models can often identify a plausible tool, they frequently falter in creative reasoning. The work is published on arXiv under ID 2605.02910.
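To make the structure concrete, here is a minimal sketch of the kind of affordance record such a knowledge base might contain and how attribute matching could surface non-canonical tool candidates. The field names, example entries, and matching rule are illustrative assumptions, not the benchmark's actual schema.

```python
from dataclasses import dataclass

@dataclass
class AffordanceEntry:
    entity: str          # object, e.g. "credit card"
    parts: list          # components, e.g. ["edge"]
    attributes: list     # properties, e.g. ["rigid", "thin"]
    uses: list           # non-canonical uses grounded in parts/attributes

# Toy knowledge base (hypothetical entries for illustration only)
KB = [
    AffordanceEntry("credit card", ["edge"], ["rigid", "thin", "flat"],
                    ["scrape ice", "spread adhesive"]),
    AffordanceEntry("shoelace", ["lace"], ["flexible", "strong"],
                    ["tie a bundle", "replace a drawstring"]),
]

def candidate_tools(required_attributes, kb):
    """Return entities whose attributes cover a task's requirements."""
    return [e.entity for e in kb
            if set(required_attributes) <= set(e.attributes)]

# A task constrained to need a rigid, thin scraper:
print(candidate_tools(["rigid", "thin"], KB))  # -> ['credit card']
```

Selecting a plausible tool this way is the step the evaluated models largely handle; the harder part the benchmark probes is reasoning from affordances to a non-obvious, physically workable solution.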

Key facts

  • CreativityBench evaluates creative tool use in LLMs.
  • Benchmark uses a knowledge base of 4,000 entities and 150,000+ affordance annotations.
  • 14,000 grounded tasks require non-obvious, physically plausible solutions.
  • 10 state-of-the-art LLMs were evaluated.
  • Models can select plausible tools but struggle with creative reasoning.
  • Published on arXiv with ID 2605.02910.
  • Focus on affordance-based reasoning rather than canonical usage.
  • Tasks generated under constraints to test creative problem-solving.

Entities

Institutions

  • arXiv

Sources