ARTFEED — Contemporary Art Intelligence

New Metric Reveals Hidden Workflow Failures in LLM Payment Agents

ai-technology · 2026-05-09

Researchers have introduced a metric called Agentic Success Rate (ASR) to assess trajectory fidelity in LLM-based multi-agent payment systems. Unlike Task Success Rate (TSR) and Agent Handoff F1-Score (HF1), ASR compares observed and expected agent execution sequences at the transition level and decomposes performance into Transition Recall and Transition Precision. Applied to the Hierarchical Multi-Agent System for Payments (HMASP) across 18 LLMs and 90,000 task instances, ASR shows that 10 of the 18 models systematically skip a confirmation checkpoint during payment checkout, a failure that TSR and HF1 do not detect, while the other 8 models execute the checkpoint correctly. GPT-4.1 takes hidden workflow shortcuts despite perfect TSR and HF1 scores, whereas GPT-5.2 achieves a perfect ASR. The researchers propose prompt refinements and deterministic routing guards, both guided by ASR.
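The summary does not give the exact formulas behind ASR, so the following is only a rough sketch of how a transition-level comparison might work. It assumes a transition is a consecutive pair of agents in an execution sequence, and that Transition Recall and Transition Precision are the matched share of expected and observed transitions respectively. The agent names and the function names are hypothetical and do not come from the research.

```python
from collections import Counter

def transitions(sequence):
    """Return consecutive (from_agent, to_agent) pairs as a multiset."""
    return Counter(zip(sequence, sequence[1:]))

def transition_scores(expected, observed):
    """Compute (recall, precision) over agent-to-agent transitions.

    Assumed definitions, not taken from the source:
      recall    = matched transitions / expected transitions
      precision = matched transitions / observed transitions
    """
    exp, obs = transitions(expected), transitions(observed)
    matched = sum((exp & obs).values())  # multiset intersection
    recall = matched / sum(exp.values()) if exp else 1.0
    precision = matched / sum(obs.values()) if obs else 1.0
    return recall, precision

# Hypothetical checkout workflow with a confirmation checkpoint.
expected = ["orchestrator", "cart", "confirmation", "payment"]
# Hypothetical observed run that jumps straight from cart to payment.
shortcut = ["orchestrator", "cart", "payment"]

recall, precision = transition_scores(expected, shortcut)
print(f"recall={recall:.2f} precision={precision:.2f}")
```

Under these assumptions, the shortcut run still ends at the payment agent, so an outcome-only metric could score it as a success, while both transition scores fall below 1 because the cart-to-confirmation and confirmation-to-payment transitions are missing.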

Key facts

  • Agentic Success Rate (ASR) is a new trajectory-fidelity metric for LLM-based multi-agent payment systems.
  • ASR compares observed and expected agent execution sequences at the transition level.
  • ASR decomposes performance into Transition Recall and Transition Precision.
  • ASR was applied to the Hierarchical Multi-Agent System for Payments (HMASP) across 18 LLMs and 90,000 task instances.
  • 10 of 18 models systematically skip a confirmation checkpoint during payment checkout.
  • The skipped checkpoint is invisible to Task Success Rate (TSR) and Agent Handoff F1-Score (HF1).
  • GPT-4.1 exhibits hidden workflow shortcuts despite achieving perfect TSR and HF1.
  • GPT-5.2 achieves perfect ASR.
  • Prompt refinements and deterministic routing guards guided by ASR are proposed.
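The summary does not describe how the proposed deterministic routing guards are built. As a minimal sketch of the general idea, the guard below sits between the LLM's routing decision and the actual handoff, and redirects to a required checkpoint when that checkpoint has not yet run. The agent names, the `REQUIRED_BEFORE` table, and `guard_route` are all hypothetical illustrations, not the researchers' implementation.

```python
# Hypothetical rule table: the agent on the left may only run
# after the agent on the right has appeared in the run history.
REQUIRED_BEFORE = {"payment": "confirmation"}

def guard_route(history, proposed_agent):
    """Return the agent that should actually run next.

    If the LLM proposes an agent whose prerequisite checkpoint is
    missing from the history, route to the checkpoint instead.
    The decision depends only on the rule table and the history,
    so it is deterministic regardless of model behaviour.
    """
    required = REQUIRED_BEFORE.get(proposed_agent)
    if required is not None and required not in history:
        return required
    return proposed_agent

# The model tries to skip confirmation: the guard redirects.
print(guard_route(["orchestrator", "cart"], "payment"))
# Confirmation already ran: the proposed handoff goes through.
print(guard_route(["orchestrator", "cart", "confirmation"], "payment"))
```

This sketch only checks that the checkpoint appears somewhere in the history. A stricter guard could require it to be the immediately preceding step.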
