Human-AI Complementarity Shows Modest Gains Across Diverse Tasks
A recent study published on arXiv examines whether human-AI collaboration can improve task performance in realistic settings. The researchers assembled a diverse dataset of 1,886 items spanning knowledge, factuality, long-context reasoning, and deception detection, and compared a baseline human-AI hybridization scheme against two AI assistance strategies: top-2 assistance and subtask delegation. Baseline hybridization yielded only a marginal improvement of +0.4 percentage points over AI alone (69.3% versus 68.9%). The study attributes this limited gain to two factors: a narrow complementarity region (only 8.9% of items are cases where the AI fails but humans succeed) and the failure of confidence-based routing, since model confidence is distributed similarly across correct and incorrect predictions. These results suggest that achieving effective human-AI complementarity remains a significant challenge.
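The arithmetic behind these headline numbers can be sketched as follows. This is an illustrative back-of-the-envelope calculation using only the aggregate figures reported above, not the study's per-item data:

```python
# Aggregate figures reported in the study.
ai_alone = 0.689        # AI-alone accuracy: 68.9%
hybrid = 0.693          # hybridization accuracy: 69.3%

gain_pp = (hybrid - ai_alone) * 100
print(f"hybrid gain: {gain_pp:+.1f} percentage points")  # -> +0.4

# The complementarity region is the fraction of items where the AI
# fails but a human succeeds -- the only items on which deferring to
# a human can lift accuracy above the AI-alone baseline.
complementarity_region = 0.089  # 8.9% of items

# Even a perfect router that sent exactly those items to humans
# (and left everything else to the AI) would be capped here:
ceiling = ai_alone + complementarity_region
print(f"ceiling with perfect routing: {ceiling:.1%}")  # -> 77.8%
```

The gap between the +0.4-point observed gain and the roughly 9-point ceiling shows that the routing mechanism, not the size of the complementarity region alone, is the binding constraint.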
Key facts
- Study published on arXiv (2605.04070)
- Dataset of 1,886 samples across knowledge, factuality, long-context reasoning, and deception detection
- Baseline hybridization yields +0.4 percentage points over AI alone
- AI alone accuracy: 68.9%
- Hybridization accuracy: 69.3%
- Complementarity region: only 8.9% of items
- Confidence-based routing fails due to similar confidence distributions
- Two AI assistance methods tested: top-2 assistance and subtask delegation
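The routing failure noted above can be illustrated with a small simulation. The confidence distributions below are hypothetical (Gaussian, with heavily overlapping means), chosen only to mirror the study's qualitative finding that confidence is spread similarly across correct and incorrect predictions; none of the numbers are from the paper:

```python
import random

random.seed(0)

# Hypothetical confidence scores for a model that is right ~69% of the
# time. The correct/incorrect distributions overlap heavily, as the
# study reports for its models.
correct = [random.gauss(0.75, 0.15) for _ in range(689)]
incorrect = [random.gauss(0.70, 0.15) for _ in range(311)]

threshold = 0.72  # route items below this confidence to a human
routed_incorrect = sum(c < threshold for c in incorrect) / len(incorrect)
routed_correct = sum(c < threshold for c in correct) / len(correct)

# Because the distributions overlap, a low-confidence flag catches
# nearly as many correct answers as incorrect ones, so routing trades
# away almost as much accuracy as it recovers.
print(f"incorrect items routed to human: {routed_incorrect:.0%}")
print(f"correct items routed to human:   {routed_correct:.0%}")
```

Under these assumed distributions, any threshold that catches most AI errors also diverts a large share of items the AI had right, which is the mechanism behind the small net gain reported.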