XGrammar-2: Dynamic Structured Generation Engine for LLM Agents

ai-technology · 2026-05-27

XGrammar-2 is a structured generation engine built for dynamic agentic workloads in large language models (LLMs), such as tool invocation and response protocols. It addresses output formats that vary both across requests and within a single request. Its two main contributions are TagDispatch, which dispatches between structures dynamically, and Cross-Grammar Cache, which reuses cached results at the substructure level across grammars. Further enhancements are an Earley-based adaptive token mask cache, just-in-time compilation, and repetition state compression. Experiments show compilation more than 6x faster than prior engines.
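To make tag-triggered structure switching concrete, here is a minimal Python sketch. It is not the XGrammar-2 API: the tag table, the `dispatch` function, and the `<tool_call>` tag are all hypothetical names chosen for illustration. It also validates finished text after the fact, whereas a real engine would enforce the structure as a token mask during decoding.

```python
import json

def is_json_object(body: str) -> bool:
    """Return True if body parses as a JSON object (hypothetical validator)."""
    try:
        return isinstance(json.loads(body), dict)
    except json.JSONDecodeError:
        return False

# Hypothetical tag table: opening tag -> (closing tag, validator for the body).
TAGS = {
    "<tool_call>": ("</tool_call>", is_json_object),
}

def dispatch(text: str, tags=TAGS):
    """Split text into free-text and tagged segments.

    Each segment is (kind, content, valid). Free text is always valid;
    a tagged body is valid only if the validator for its tag accepts it.
    """
    segments = []
    pos = 0
    while pos < len(text):
        # Find the earliest opening tag at or after pos.
        hits = [(text.find(tag, pos), tag) for tag in tags]
        hits = [(index, tag) for index, tag in hits if index != -1]
        if not hits:
            segments.append(("text", text[pos:], True))
            break
        start, tag = min(hits)
        if start > pos:
            segments.append(("text", text[pos:start], True))
        close, validate = tags[tag]
        body_start = start + len(tag)
        end = text.find(close, body_start)
        if end == -1:
            # Unterminated structured region: report it as invalid.
            segments.append((tag, text[body_start:], False))
            break
        body = text[body_start:end]
        segments.append((tag, body, validate(body)))
        pos = end + len(close)
    return segments

if __name__ == "__main__":
    for segment in dispatch('Hi <tool_call>{"name": "f"}</tool_call> bye'):
        print(segment)
```

The point of the sketch is the control flow: output is unconstrained until an opening tag appears, at which point the structure registered for that tag takes over until the closing tag.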

Key facts

XGrammar-2 is a structured generation engine for dynamic agentic workloads.
It supports tag-triggered structure switching and fine-grained reuse across requests.
TagDispatch enables dynamic structural dispatching.
Cross-Grammar Cache allows substructure-level cache reuse across grammars.
It uses an Earley-based adaptive token mask cache.
Just-in-time compilation and repetition state compression improve efficiency.
Experiments show over 6x faster compilation than prior engines.
The engine targets modern LLM agents with dynamic structured generation needs.
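The idea of substructure-level cache reuse across grammars can be sketched as follows. This is an illustrative toy, not XGrammar-2's implementation: the `SubstructureCache` class, its methods, and the rule bodies are assumptions, and "compilation" is replaced by a trivial stand-in. It also keys the cache on the rule body text alone, which ignores the fact that a body may refer to other rules defined differently in each grammar.

```python
import hashlib

class SubstructureCache:
    """Toy cache that shares compiled rules between grammars (hypothetical)."""

    def __init__(self):
        self._store = {}
        self.compilations = 0  # counts cache misses, i.e. real compile work

    def compile_rule(self, body: str):
        # Key on the rule body so identical rules in different grammars match.
        key = hashlib.sha256(body.encode("utf-8")).hexdigest()
        if key not in self._store:
            self.compilations += 1
            # Stand-in for expensive compilation: tokenize the rule body.
            self._store[key] = tuple(body.split())
        return self._store[key]

    def compile_grammar(self, rules: dict):
        """Compile every rule of a grammar, reusing cached substructures."""
        return {name: self.compile_rule(body) for name, body in rules.items()}

if __name__ == "__main__":
    cache = SubstructureCache()
    weather_tool = {
        "string": '"\\"" chars "\\""',
        "number": "digit+",
        "root": '"{" string ":" number "}"',
    }
    search_tool = {
        "string": '"\\"" chars "\\""',
        "number": "digit+",
        "root": '"[" string "]"',
    }
    cache.compile_grammar(weather_tool)
    print(cache.compilations)  # three rules compiled
    cache.compile_grammar(search_tool)
    print(cache.compilations)  # only the new root rule was compiled
```

In this toy, the second grammar pays only for the one rule it does not share with the first, which is the kind of saving the key facts above attribute to reuse across grammars.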
