Intent2Tx Benchmark Tests LLMs for Ethereum Transaction Translation

ai-technology · 2026-05-01

Researchers have introduced Intent2Tx, a benchmark for evaluating how well large language models (LLMs) translate natural language intents into Ethereum transactions. The benchmark comprises 29,921 single-step and 1,575 multi-step instances derived from 300 days of real-world Ethereum mainnet traces, covering 11 categories that include long-tail DeFi primitives. Unlike prior benchmarks that rely on synthetic instructions, Intent2Tx grounds each intent in actual protocol interactions. For evaluation, an execution-aware framework applies differential state analysis in forked mainnet environments. Tests of 16 state-of-the-art LLMs showed that performance varies across models and that scaling and retrieval methods improve results. The work is detailed in arXiv:2604.27763.
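To make the idea of differential state analysis concrete, the following is a minimal sketch and not the benchmark's actual framework, whose internals are not described here. It assumes state can be modeled as a dictionary of balances, and the names `state_diff` and `same_effect` are hypothetical. The idea shown is that a candidate transaction is judged by the state change it produces, compared against the change produced by a reference transaction run from the same starting snapshot.

```python
from typing import Dict, Tuple

# Hypothetical state model: (account, asset) -> balance in base units.
# A real forked-mainnet evaluation would read balances and storage from a node.
State = Dict[Tuple[str, str], int]
Diff = Dict[Tuple[str, str], int]


def state_diff(before: State, after: State) -> Diff:
    """Return only the non-zero balance changes between two snapshots."""
    keys = set(before) | set(after)
    changes = {k: after.get(k, 0) - before.get(k, 0) for k in keys}
    return {k: v for k, v in changes.items() if v != 0}


def same_effect(before: State, after_reference: State, after_candidate: State) -> bool:
    """True if the candidate changed state exactly as the reference did."""
    return state_diff(before, after_reference) == state_diff(before, after_candidate)


# Toy snapshots standing in for a forked-mainnet state.
before = {("alice", "USDC"): 100, ("bob", "USDC"): 0}
after_reference = {("alice", "USDC"): 95, ("bob", "USDC"): 5}
# Same amount leaves alice, but it reaches the wrong recipient.
after_wrong = {("alice", "USDC"): 95, ("bob", "USDC"): 0, ("carol", "USDC"): 5}

print(same_effect(before, after_reference, after_reference))
print(same_effect(before, after_reference, after_wrong))
```

Comparing effects rather than raw transaction bytes means two differently encoded transactions can both count as correct, as long as they leave the chain in the same state.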

Key facts

Intent2Tx benchmark has 29,921 single-step and 1,575 multi-step instances
Instances derived from 300 days of real Ethereum mainnet traces
Covers 11 categories including long-tail DeFi primitives
Uses execution-aware framework with differential state analysis on forked mainnet
Evaluated 16 state-of-the-art LLMs
Scaling and retrieval methods improved performance
Published on arXiv with ID 2604.27763
Focuses on translating high-level user intents into on-chain transactions
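As an illustration of what translating an intent into a transaction can involve, here is a small sketch for one simple case, an ERC-20 token transfer. It is not taken from the benchmark. The token and recipient addresses are placeholders, the 6-decimal token is an assumption, and `encode_erc20_transfer` is a hypothetical helper. The only fixed fact used is the standard 4-byte selector for `transfer(address,uint256)`.

```python
# Selector: first 4 bytes of keccak256("transfer(address,uint256)").
TRANSFER_SELECTOR = "a9059cbb"


def encode_erc20_transfer(recipient: str, amount: int) -> str:
    """Build calldata for transfer(recipient, amount) using ABI encoding."""
    addr = recipient[2:] if recipient.lower().startswith("0x") else recipient
    addr = addr.lower()
    if len(addr) != 40:
        raise ValueError("address must be 20 bytes (40 hex characters)")
    int(addr, 16)  # raises ValueError if the address is not valid hex
    if not 0 <= amount < 2**256:
        raise ValueError("amount must fit in uint256")
    # Each argument is left-padded to a 32-byte word (64 hex characters).
    return "0x" + TRANSFER_SELECTOR + addr.rjust(64, "0") + format(amount, "064x")


# Intent: "Send 5 tokens to the recipient", assuming a token with 6 decimals.
TOKEN_ADDRESS = "0x" + "22" * 20  # placeholder, not a real contract
RECIPIENT = "0x" + "11" * 20      # placeholder recipient
DECIMALS = 6                      # assumption for this example

tx = {
    "to": TOKEN_ADDRESS,  # the call targets the token contract, not the recipient
    "value": 0,           # no ETH is sent with a token transfer
    "data": encode_erc20_transfer(RECIPIENT, 5 * 10**DECIMALS),
}
print(tx["data"])
```

Even this simple case requires choosing the right contract, scaling the amount by the token's decimals, and encoding the arguments correctly. Multi-step intents and long-tail DeFi protocols add further steps on top of this.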
