LLMs as Goal Recognizers: First Systematic Zero-Shot Evaluation

ai-technology · 2026-05-18

A recent arXiv paper (2605.15333) presents the first systematic zero-shot evaluation of frontier large language models (LLMs) as goal recognizers on classical PDDL benchmarks. The results show that goal-recognition ability varies widely across models. Some models improve as more evidence becomes available and approach the accuracy of landmark-based recognizers when given full observations. Others stay anchored to world-knowledge priors no matter how much evidence accumulates. An analysis of the models' reasoning traces indicates that this divergence stems from a fundamental difference in how evidence is integrated, not from differences in deductive reasoning. The authors argue that goal recognition is structurally better suited to LLM strengths than planning. Goal recognition is an abductive task that evaluates consistency with world knowledge, whereas planning requires constructing novel action sequences.
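
For context on the baseline, landmark-based goal recognition scores each candidate goal by how many of its landmarks the observations have already achieved. Landmarks are facts that must hold on every plan that reaches that goal. The following is a minimal sketch of that idea and assumes observations are given directly as achieved facts rather than as PDDL actions. The domain, goal names, and fact strings are hypothetical. This is not the paper's evaluation code or the exact baseline it compares against.

```python
from typing import Dict, FrozenSet, List, Set

# Hypothetical toy domain: each candidate goal maps to its set of fact
# landmarks (facts that must become true on every plan reaching that goal).
Landmarks = Dict[str, FrozenSet[str]]


def goal_completion(landmarks: Landmarks, observed_facts: Set[str]) -> Dict[str, float]:
    """Score each goal by the fraction of its landmarks already achieved."""
    scores: Dict[str, float] = {}
    for goal, lms in landmarks.items():
        achieved = len(lms & observed_facts)
        # Guard against goals with no computed landmarks.
        scores[goal] = achieved / len(lms) if lms else 0.0
    return scores


def recognize(landmarks: Landmarks, observed_facts: Set[str]) -> List[str]:
    """Return every goal tied for the highest completion score, sorted by name."""
    scores = goal_completion(landmarks, observed_facts)
    best = max(scores.values())
    return sorted(g for g, s in scores.items() if s == best)


if __name__ == "__main__":
    kitchen = {
        "make_tea": frozenset({"kettle_on", "cup_out", "tea_bag_in_cup"}),
        "make_coffee": frozenset({"kettle_on", "cup_out", "coffee_in_cup"}),
        "wash_dishes": frozenset({"sink_full", "soap_out"}),
    }
    print(recognize(kitchen, {"kettle_on", "cup_out"}))
    print(recognize(kitchen, {"kettle_on", "cup_out", "tea_bag_in_cup"}))
```

Observing only `kettle_on` and `cup_out` leaves `make_tea` and `make_coffee` tied at 2/3. Adding `tea_bag_in_cup` raises `make_tea` to 1.0 and resolves the tie. This is the sense in which a recognizer's accuracy can scale with the amount of observed evidence.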

Key facts

First systematic zero-shot evaluation of frontier LLMs as goal recognizers on classical PDDL benchmarks.
Some LLMs scale with evidence and approach landmark-based accuracy at full observations.
Other LLMs remain anchored to world-knowledge priors regardless of evidence accumulation.
Divergence reflects a fundamental difference in evidence integration rather than deduction.
Goal recognition is an abductive task evaluating consistency with world knowledge.
LLM planning competence relies on world-knowledge exploitation rather than genuine symbolic reasoning.
Goal recognition is structurally better suited to LLM strengths than planning.
Paper available on arXiv with ID 2605.15333.
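
A toy Bayesian recognizer can illustrate the contrast between models that scale with evidence and models that stay anchored to priors. This is an analogy only, not the paper's analysis, which examines LLM reasoning traces. The priors stand in for world knowledge, and the likelihoods stand in for how well each observation fits each goal. Every goal name and number here is hypothetical. The evidence-scaling recognizer multiplies the prior by the likelihood of each observation. The prior-anchored recognizer returns the prior unchanged, however many observations arrive.

```python
import math
from typing import Dict, List

# Hypothetical inputs: priors play the role of world knowledge,
# likelihoods give P(observation | goal) and must be strictly positive.
Prior = Dict[str, float]
Likelihood = Dict[str, Dict[str, float]]  # observation -> goal -> P(obs | goal)


def evidence_scaling(prior: Prior, lik: Likelihood, obs: List[str]) -> Dict[str, float]:
    """Bayesian update: posterior(g) is proportional to prior(g) * prod P(o | g)."""
    log_post = {g: math.log(p) for g, p in prior.items()}
    for o in obs:
        for g in log_post:
            log_post[g] += math.log(lik[o][g])
    # Normalize in log space for numerical stability.
    m = max(log_post.values())
    z = sum(math.exp(v - m) for v in log_post.values())
    return {g: math.exp(v - m) / z for g, v in log_post.items()}


def prior_anchored(prior: Prior, lik: Likelihood, obs: List[str]) -> Dict[str, float]:
    """Same signature for comparison, but ignores the observations entirely."""
    return dict(prior)


def top(dist: Dict[str, float]) -> str:
    """Return the most probable goal."""
    return max(dist, key=dist.get)


if __name__ == "__main__":
    prior = {"make_coffee": 0.7, "make_tea": 0.3}
    lik = {
        "kettle_on": {"make_coffee": 0.9, "make_tea": 0.9},
        "tea_bag_in_cup": {"make_coffee": 0.05, "make_tea": 0.95},
    }
    obs = ["kettle_on", "tea_bag_in_cup"]
    print(top(evidence_scaling(prior, lik, obs)), top(prior_anchored(prior, lik, obs)))
```

Observing only `kettle_on` is equally likely under both goals, so `make_coffee` stays on top. Adding `tea_bag_in_cup` flips the evidence-scaling posterior to `make_tea`. The prior-anchored recognizer still answers `make_coffee`.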