Human-AI Complementarity Shows Modest Gains Across Diverse Tasks
A recent study published on arXiv examines whether human-AI collaboration can improve task performance in realistic settings. The researchers assembled a diverse dataset of 1,886 items spanning knowledge, factuality, long-context reasoning, and deception detection, and compared a baseline human-AI hybridization scheme against two AI assistance strategies: top-2 assistance and subtask delegation. Baseline hybridization yielded only a marginal improvement of +0.4 percentage points over AI alone (69.3% versus 68.9%). The study attributes this limited gain to two factors: a narrow complementarity region (only 8.9% of items are cases where the AI fails but humans succeed) and the failure of confidence-based routing, since model confidence is distributed similarly across correct and incorrect predictions. These results suggest that achieving effective human-AI complementarity remains a significant challenge.
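The arithmetic behind these headline numbers can be sketched as follows. This is an illustrative back-of-the-envelope calculation using only the aggregate figures reported above, not the study's per-item data:

```python
# Aggregate figures reported in the study.
ai_alone = 0.689        # AI-alone accuracy: 68.9%
hybrid = 0.693          # hybridization accuracy: 69.3%

gain_pp = (hybrid - ai_alone) * 100
print(f"hybrid gain: {gain_pp:+.1f} percentage points")  # -> +0.4

# The complementarity region is the fraction of items where the AI
# fails but a human succeeds -- the only items on which deferring to
# a human can lift accuracy above the AI-alone baseline.
complementarity_region = 0.089  # 8.9% of items

# Even a perfect router that sent exactly those items to humans
# (and left everything else to the AI) would be capped here:
ceiling = ai_alone + complementarity_region
print(f"ceiling with perfect routing: {ceiling:.1%}")  # -> 77.8%
```

The gap between the +0.4-point observed gain and the roughly 9-point ceiling shows that the routing mechanism, not the size of the complementarity region alone, is the binding constraint.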
Key facts
- Study published on arXiv (2605.04070)
- Dataset of 1,886 samples across knowledge, factuality, long-context reasoning, and deception detection
- Baseline hybridization yields +0.4 percentage points over AI alone
- AI alone accuracy: 68.9%
- Hybridization accuracy: 69.3%
- Complementarity region: only 8.9% of items
- Confidence-based routing fails due to similar confidence distributions
- Two AI assistance methods tested: top-2 assistance and subtask delegation
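The routing failure noted above can be illustrated with a small simulation. The confidence distributions below are hypothetical (Gaussian, with heavily overlapping means), chosen only to mirror the study's qualitative finding that confidence is spread similarly across correct and incorrect predictions; none of the numbers are from the paper:

```python
import random

random.seed(0)

# Hypothetical confidence scores for a model that is right ~69% of the
# time. The correct/incorrect distributions overlap heavily, as the
# study reports for its models.
correct = [random.gauss(0.75, 0.15) for _ in range(689)]
incorrect = [random.gauss(0.70, 0.15) for _ in range(311)]

threshold = 0.72  # route items below this confidence to a human
routed_incorrect = sum(c < threshold for c in incorrect) / len(incorrect)
routed_correct = sum(c < threshold for c in correct) / len(correct)

# Because the distributions overlap, a low-confidence flag catches
# nearly as many correct answers as incorrect ones, so routing trades
# away almost as much accuracy as it recovers.
print(f"incorrect items routed to human: {routed_incorrect:.0%}")
print(f"correct items routed to human:   {routed_correct:.0%}")
```

Under these assumed distributions, any threshold that catches most AI errors also diverts a large share of items the AI had right, which is the mechanism behind the small net gain reported.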