ARTFEED — Contemporary Art Intelligence

Decentralized Q-Learning for Multi-Agent Workflow Handoffs

other · 2026-05-20

A recent arXiv preprint formalizes workflow learning in multi-agent systems where specialized agents pass control to one another through a shared artifact while each agent sees only local data. The study models this setting as an interface-constrained semi-Markov decision process (IC-SMDP), whose decision points occur at handoffs. It also proposes IC-Q, an asynchronous decentralized Q-learning method that limits inter-agent coordination to a single scalar per handoff. The authors derive a finite-sample bound for neural IC-Q that splits the error into three components: neural function approximation, an interface representation gap, and a mixing-time residual tied to the random option-duration discount. The work is relevant to multi-agent LLM pipelines that span trust or organizational boundaries, where no centralized learner has access to joint trajectories.
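To make the handoff idea concrete, here is a minimal tabular sketch of decentralized Q-learning in which the only value crossing the agent boundary is one scalar per handoff. It is not the paper's IC-Q algorithm, whose details are not given in this summary. It uses the standard SMDP Q-learning target, R + gamma**tau * V_next, and everything named here (HandoffAgent, run_episode, train, the two-stage "drafter"/"reviewer" workflow, the reward rule) is a hypothetical toy chosen for illustration.

```python
import random
from collections import defaultdict


class HandoffAgent:
    """One agent with a private Q-table over its own local observations.

    Hypothetical illustration only; not the IC-Q method from the preprint.
    """

    def __init__(self, n_actions, alpha=0.1, gamma=0.95, epsilon=0.2, seed=0):
        self.n_actions = n_actions
        self.alpha = alpha
        self.gamma = gamma
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.q = defaultdict(lambda: [0.0] * n_actions)

    def act(self, obs):
        """Epsilon-greedy choice over this agent's local Q-values."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        values = self.q[obs]
        return max(range(self.n_actions), key=values.__getitem__)

    def value(self, obs):
        """The single scalar returned to the previous agent at a handoff."""
        return max(self.q[obs])

    def update(self, obs, action, reward, duration, next_value):
        """Standard SMDP Q-learning step.

        `reward` is the return already accumulated during the option and
        `duration` is how many time steps the option lasted.
        """
        target = reward + (self.gamma ** duration) * next_value
        self.q[obs][action] += self.alpha * (target - self.q[obs][action])


def run_episode(drafter, reviewer, rng):
    """Toy two-stage workflow: drafter writes the artifact, reviewer finishes."""
    # Stage 1: the drafter acts on its local view and writes the artifact.
    a0 = drafter.act("start")
    tau0 = rng.randint(1, 3)  # random option duration
    artifact = a0             # the reviewer's local observation

    # Handoff: the reviewer sends back exactly one scalar, nothing else.
    scalar = reviewer.value(artifact)
    drafter.update("start", a0, 0.0, tau0, scalar)

    # Stage 2: the reviewer acts; reward only if both stages chose action 1.
    a1 = reviewer.act(artifact)
    tau1 = rng.randint(1, 3)
    reward = 1.0 if (artifact == 1 and a1 == 1) else 0.0
    reviewer.update(artifact, a1, reward, tau1, 0.0)  # terminal: no bootstrap
    return reward


def train(episodes=5000, seed=0):
    rng = random.Random(seed)
    drafter = HandoffAgent(n_actions=2, seed=seed + 1)
    reviewer = HandoffAgent(n_actions=2, seed=seed + 2)
    for _ in range(episodes):
        run_episode(drafter, reviewer, rng)
    return drafter, reviewer


if __name__ == "__main__":
    drafter, reviewer = train()
    print("drafter Q at start:", drafter.q["start"])
    print("reviewer Q at artifact 1:", reviewer.q[1])
```

In this toy the drafter never observes a reward of its own. It learns which artifact to produce only from the scalar the reviewer returns, so neither agent needs the other's Q-table, observations, or trajectory.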

Key facts

  • arXiv:2605.19140v1
  • Published on arXiv
  • Introduces IC-SMDP framework
  • Proposes IC-Q algorithm
  • Coordination limited to one scalar per handoff
  • Finite-sample bound for neural IC-Q
  • Three error sources identified
  • Targets multi-agent LLM pipelines

Entities

Institutions

  • arXiv
