ARTFEED — Contemporary Art Intelligence

Intent2Tx Benchmark Tests LLMs for Ethereum Transaction Translation

ai-technology · 2026-05-01

Researchers have introduced Intent2Tx, a benchmark that evaluates how well large language models (LLMs) translate natural-language intents into Ethereum transactions. The benchmark comprises 29,921 single-step and 1,575 multi-step instances derived from 300 days of real-world Ethereum mainnet traces, covering 11 categories including long-tail DeFi primitives. Unlike prior benchmarks that rely on synthetic instructions, Intent2Tx grounds intents in actual protocol interactions. Evaluation uses an execution-aware framework that applies differential state analysis on forked mainnet environments. Tests of 16 state-of-the-art LLMs showed performance varying across models, with model scaling and retrieval methods improving results. The work is detailed in arXiv:2604.27763.
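
The idea behind differential state analysis can be sketched roughly as follows. This is a minimal illustration, not the benchmark's actual framework: plain dictionaries stand in for chain state on a forked environment, and the names state_diff and matches_reference are hypothetical. A candidate transaction counts as correct here if it produces the same net state changes as the reference transaction.

```python
from typing import Dict, Tuple

# Hypothetical stand-in for chain state: (account, asset) -> balance.
# A real evaluation would read balances and storage from a forked node.
State = Dict[Tuple[str, str], int]


def state_diff(before: State, after: State) -> Dict[Tuple[str, str], int]:
    """Return the net change for every key whose value differs."""
    keys = set(before) | set(after)
    return {
        key: after.get(key, 0) - before.get(key, 0)
        for key in keys
        if after.get(key, 0) != before.get(key, 0)
    }


def matches_reference(pre: State, post_reference: State, post_candidate: State) -> bool:
    """Compare the candidate's state changes with the reference's changes."""
    return state_diff(pre, post_reference) == state_diff(pre, post_candidate)


if __name__ == "__main__":
    pre = {("alice", "usdc"): 500, ("bob", "usdc"): 0}
    # Reference transaction: alice sends 100 units to bob.
    post_reference = {("alice", "usdc"): 400, ("bob", "usdc"): 100}
    # A candidate that sends only 90 units changes state differently.
    post_candidate = {("alice", "usdc"): 410, ("bob", "usdc"): 90}
    print(matches_reference(pre, post_reference, post_reference))
    print(matches_reference(pre, post_reference, post_candidate))
```

Comparing diffs rather than raw calldata means two differently encoded transactions can both pass, as long as they leave the chain in the same state.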

Key facts

  • Intent2Tx benchmark has 29,921 single-step and 1,575 multi-step instances
  • Instances derived from 300 days of real Ethereum mainnet traces
  • Covers 11 categories including long-tail DeFi primitives
  • Uses execution-aware framework with differential state analysis on forked mainnet
  • Evaluated 16 state-of-the-art LLMs
  • Scaling and retrieval methods improved performance
  • Published on arXiv with ID 2604.27763
  • Focuses on translating high-level user intents into on-chain transactions
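
To make the last point concrete, the sketch below shows what one translated intent might look like for a simple ERC-20 transfer. It is an illustrative assumption and is not taken from the benchmark: the function names, the example intent wording, and the dictionary layout are hypothetical. The selector 0xa9059cbb is the standard one for transfer(address,uint256), and the arguments follow the usual 32-byte ABI padding.

```python
# First 4 bytes of keccak256("transfer(address,uint256)").
TRANSFER_SELECTOR = "a9059cbb"


def encode_erc20_transfer(recipient: str, amount: int) -> str:
    """Build hex calldata for an ERC-20 transfer(recipient, amount) call."""
    addr = recipient[2:] if recipient.lower().startswith("0x") else recipient
    addr = addr.lower()
    if len(addr) != 40:
        raise ValueError("address must be 20 bytes (40 hex characters)")
    int(addr, 16)  # raises ValueError if the address is not valid hex
    if not 0 <= amount < 2**256:
        raise ValueError("amount must fit in uint256")
    # Each argument is left-padded to 32 bytes (64 hex characters).
    return "0x" + TRANSFER_SELECTOR + addr.rjust(64, "0") + format(amount, "064x")


def intent_to_transaction(token: str, recipient: str, amount: int) -> dict:
    """Hypothetical target format: the fields a model would need to produce."""
    return {
        "to": token,  # the token contract receives the call
        "value": 0,   # no ether is sent with a token transfer
        "data": encode_erc20_transfer(recipient, amount),
    }


if __name__ == "__main__":
    # Intent (hypothetical wording): "Send 1 base unit of the token to 0x1111...1111"
    token = "0x" + "22" * 20      # placeholder token contract address
    recipient = "0x" + "11" * 20  # placeholder recipient address
    print(intent_to_transaction(token, recipient, 1))
```

Multi-step instances would chain several such calls, for example an approval followed by a swap, which is where matching on final state rather than on exact calldata becomes useful.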

Entities

Institutions

  • arXiv
