ARTFEED — Contemporary Art Intelligence

Tool Attention Reduces MCP Token Overhead in LLM Agent Workflows

ai-technology · 2026-04-25

A recent study published on arXiv (2604.21816) presents Tool Attention, a middleware layer aimed at the 'MCP Tax' (also called the 'Tools Tax') in scalable agentic workflows. The Model Context Protocol (MCP) typically links LLM agents to external tools, but its stateless design and eager schema injection impose a per-turn overhead of 10k to 60k tokens. This overhead inflates the key-value cache and degrades reasoning once context utilization reaches 70%. Tool Attention adapts the 'Attention Is All You Need' framing, shifting from self-attention over tokens to gated attention over tools. It combines an Intent Schema Overlap (ISO) score derived from sentence embeddings, a state-aware gating mechanism that enforces preconditions and access scopes, and a two-phase lazy schema loader, with the goal of cutting the recurring token-budget cost.
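To make the ISO-plus-gating idea concrete, here is a minimal sketch of how such a filter might look. It is not the paper's implementation: the function names (`iso_score`, `gate_tools`), the `requires` field, and the threshold are illustrative assumptions, and a toy bag-of-words vector stands in for real sentence embeddings.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; the paper uses sentence embeddings.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def iso_score(intent: str, schema_summary: str) -> float:
    # Intent Schema Overlap: similarity between the user's intent
    # and a short summary of the tool's schema.
    return cosine(embed(intent), embed(schema_summary))

def gate_tools(intent, tools, state, threshold=0.2):
    # State-aware gating: drop tools whose preconditions fail in the
    # current agent state or whose ISO score is below the threshold.
    # The "requires" field is a hypothetical precondition/scope key.
    kept = []
    for tool in tools:
        if tool.get("requires") and not state.get(tool["requires"], False):
            continue
        if iso_score(intent, tool["summary"]) >= threshold:
            kept.append(tool["name"])
    return kept
```

Only the tools that survive the gate would have their schemas injected into the next turn, which is where the per-turn token savings come from.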

Key facts

  • Paper arXiv:2604.21816 introduces Tool Attention.
  • MCP imposes a per-turn overhead of 10k to 60k tokens.
  • Overhead inflates key-value cache and degrades reasoning at 70% context utilization.
  • Tool Attention uses gated attention over tools.
  • Components: Intent Schema Overlap (ISO) score, state-aware gating, two-phase lazy schema loader.
  • ISO score derived from sentence embeddings.
  • Gating enforces preconditions and access scopes.
  • Aims to reduce the MCP Tax in scalable agentic workflows.

Entities

Institutions

  • arXiv
