Intent2Tx Benchmark Tests LLMs for Ethereum Transaction Translation
Researchers have introduced Intent2Tx, a benchmark that evaluates how well large language models (LLMs) translate natural-language intents into Ethereum transactions. The benchmark comprises 29,921 single-step and 1,575 multi-step instances derived from 300 days of real-world Ethereum mainnet traces, covering 11 categories that include long-tail DeFi primitives. Unlike prior benchmarks that rely on synthetic instructions, Intent2Tx grounds intents in actual protocol interactions. Evaluation uses an execution-aware framework that applies differential state analysis on forked mainnet environments. Tests of 16 state-of-the-art LLMs showed that performance varies across models, with scaling and retrieval methods showing promise. The work is detailed in arXiv:2604.27763.
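To make the idea of differential state analysis concrete, here is a minimal Python sketch. It is not the benchmark's implementation: the state model (a flat mapping from an account and asset pair to a balance), the function names, and the toy balances are all assumptions chosen for illustration. The idea shown is that two transactions can be judged equivalent when they produce the same state changes from the same starting state, even if their raw calldata differs.

```python
from typing import Dict, Tuple

# Hypothetical simplification: chain state as {(account, asset): balance}.
# A real forked-mainnet state would also include storage slots, nonces, etc.
State = Dict[Tuple[str, str], int]


def state_diff(pre: State, post: State) -> Dict[Tuple[str, str], int]:
    """Return the net change for every key whose value differs between pre and post."""
    keys = set(pre) | set(post)
    return {
        key: post.get(key, 0) - pre.get(key, 0)
        for key in keys
        if post.get(key, 0) != pre.get(key, 0)
    }


def same_effect(pre: State, post_reference: State, post_candidate: State) -> bool:
    """True when the candidate run changed state exactly as the reference run did."""
    return state_diff(pre, post_reference) == state_diff(pre, post_candidate)


# Toy data (assumed): Alice sends 30 USDC to Bob in the reference transaction.
pre = {("alice", "USDC"): 100}
post_reference = {("alice", "USDC"): 70, ("bob", "USDC"): 30}
post_candidate_ok = {("alice", "USDC"): 70, ("bob", "USDC"): 30}
post_candidate_wrong = {("alice", "USDC"): 70, ("carol", "USDC"): 30}

print(same_effect(pre, post_reference, post_candidate_ok))     # matches reference
print(same_effect(pre, post_reference, post_candidate_wrong))  # wrong recipient
```

Comparing diffs rather than raw transactions lets a checker accept any candidate that reaches the same outcome, which is one plausible reason to prefer an execution-aware evaluation over string matching.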
Key facts
- Intent2Tx benchmark has 29,921 single-step and 1,575 multi-step instances
- Instances derived from 300 days of real Ethereum mainnet traces
- Covers 11 categories including long-tail DeFi primitives
- Uses execution-aware framework with differential state analysis on forked mainnet
- Evaluated 16 state-of-the-art LLMs
- Scaling and retrieval methods showed promise for improving performance
- Published on arXiv with ID 2604.27763
- Focuses on translating high-level user intents into on-chain transactions
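As an illustration of what translating an intent into a transaction can involve, the following Python sketch turns a structured "send tokens" intent into ERC-20 `transfer` calldata. The `TransferIntent` class, the `encode_transfer` helper, and the example addresses are hypothetical and are not taken from the benchmark; only the `transfer(address,uint256)` selector `0xa9059cbb` and the 32-byte argument padding come from the ERC-20 standard and the Ethereum ABI encoding rules.

```python
from dataclasses import dataclass

# First 4 bytes of keccak256("transfer(address,uint256)") per the ERC-20 standard.
TRANSFER_SELECTOR = "a9059cbb"


@dataclass
class TransferIntent:
    """Hypothetical structured form of an intent like 'send 1000 units of token T to R'."""
    token: str       # token contract address (0x-prefixed hex)
    recipient: str   # recipient address (0x-prefixed hex)
    amount: int      # amount in the token's base units


def encode_transfer(intent: TransferIntent) -> dict:
    """Build a minimal transaction dict carrying ERC-20 transfer calldata."""
    recipient = intent.recipient.lower()
    if recipient.startswith("0x"):
        recipient = recipient[2:]
    if len(recipient) != 40:
        raise ValueError("recipient must be a 20-byte hex address")
    int(recipient, 16)  # raises ValueError if the address is not valid hex
    if not 0 <= intent.amount < 2**256:
        raise ValueError("amount must fit in uint256")

    # ABI encoding: selector, then each argument left-padded to 32 bytes.
    data = "0x" + TRANSFER_SELECTOR + recipient.rjust(64, "0") + format(intent.amount, "064x")
    return {"to": intent.token, "value": 0, "data": data}


# Placeholder addresses, assumed for the example only.
intent = TransferIntent(token="0x" + "11" * 20, recipient="0x" + "22" * 20, amount=1000)
tx = encode_transfer(intent)
print(tx["data"])
```

A model evaluated on this kind of task would need to produce the target contract, function, and arguments from free-form text; the sketch covers only the final, deterministic encoding step.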
Entities
Institutions
- arXiv