ARTFEED — Contemporary Art Intelligence

CoSee Framework Reveals Failure Modes in Collaborative Visual AI

ai-technology · 2026-06-01

A recent publication on arXiv presents CoSee, an auditing framework that formalizes the read-write-verify loop for tracing information flow in document visual question answering. The study examines how collaborative reasoning fails when the collaborators are weak learners (4B–8B-parameter models) and how noise accumulates across their shared state. Testing on multi-page, chart, and web-based benchmarks, the researchers found that naive shared workspaces tend to amplify hallucinations rather than mitigate them. They identified two primary failure modes: Noise Reinforcement, where ungrounded notes are treated as evidence, and Policy Collapse, where additional context leads to brief, under-specified answers. Cost-accuracy Pareto frontiers indicate that spending more compute can reduce accuracy in low-capacity settings. The paper is cataloged as arXiv:2605.31354.
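To make the read-write-verify idea concrete, here is a minimal Python sketch of a shared workspace with and without a verification gate. It is not the CoSee implementation: the `Note`, `Workspace`, `verify`, and `run_step` names are hypothetical, and the substring check stands in for whatever grounding test the paper actually uses.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Note:
    text: str
    # Quoted span from the document supporting the note, or None if ungrounded.
    evidence: Optional[str] = None


@dataclass
class Workspace:
    notes: List[Note] = field(default_factory=list)

    def read(self) -> List[Note]:
        # Agents read a copy of everything written so far.
        return list(self.notes)

    def write(self, note: Note) -> None:
        self.notes.append(note)


def verify(note: Note, document_text: str) -> bool:
    # Assumption: a note counts as grounded if its evidence appears verbatim.
    return note.evidence is not None and note.evidence in document_text


def run_step(ws: Workspace, proposed: List[Note], document_text: str,
             use_verify: bool) -> List[Note]:
    # Write phase: a naive workspace accepts every note; a gated one filters.
    for note in proposed:
        if not use_verify or verify(note, document_text):
            ws.write(note)
    return ws.read()


# Made-up document and notes, for illustration only.
document = "Revenue in 2023 was 4.2M."
proposed = [
    Note("Revenue was 4.2M", evidence="Revenue in 2023 was 4.2M"),
    Note("Revenue doubled year over year"),  # ungrounded note
]

naive_notes = run_step(Workspace(), proposed, document, use_verify=False)
gated_notes = run_step(Workspace(), proposed, document, use_verify=True)
```

In the naive run the ungrounded note enters the workspace, where later readers could treat it as evidence, which is the pattern the article calls Noise Reinforcement. The gated run keeps only the note backed by a quoted span.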

Key facts

  • Study focuses on failure modes of shared-state collaboration in resource-constrained visual agents
  • CoSee framework formalizes read-write-verify loop for tracing information flow
  • Experiments use weak learners (4B–8B-parameter models)
  • Benchmarks include multi-page, chart, and web-based document VQA tasks
  • Naive shared workspaces can amplify hallucinations
  • Two failure modes identified: Noise Reinforcement and Policy Collapse
  • Cost-accuracy Pareto frontiers show increased compute can correlate negatively with performance
  • Paper published on arXiv with identifier 2605.31354
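
A cost-accuracy Pareto frontier, as mentioned in the key facts, keeps only the configurations that no cheaper or equally priced configuration beats on accuracy. The sketch below shows one common way to compute it; the `pareto_frontier` function name and the `(cost, accuracy)` values are made up for illustration and are not results from the paper.

```python
from typing import List, Tuple


def pareto_frontier(points: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Return the (cost, accuracy) points that are not dominated.

    A point is dominated if another point costs no more and is at least
    as accurate, with at least one of the two strictly better.
    """
    frontier = []
    best_acc = float("-inf")
    # Sort by cost ascending; on equal cost, try the higher accuracy first.
    for cost, acc in sorted(points, key=lambda p: (p[0], -p[1])):
        if acc > best_acc:
            frontier.append((cost, acc))
            best_acc = acc
    return frontier


# Hypothetical configurations: (relative compute cost, accuracy).
configs = [(1.0, 0.50), (2.0, 0.58), (4.0, 0.55), (8.0, 0.52)]
frontier = pareto_frontier(configs)
```

With these invented numbers the two most expensive configurations fall off the frontier because a cheaper one is more accurate, which is the shape of result the article describes for low-capacity settings.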

Entities

Institutions

  • arXiv

Sources