ARTFEED — Contemporary Art Intelligence

LLMs Boost Human Annotation Efficiency but Fall Short of Expert Reliability

ai-technology · 2026-04-30

A new study posted to arXiv evaluates large language models (LLMs) in event-annotation workflows, finding that while LLMs are not reliable enough to replace human experts as independent annotators, they substantially improve expert efficiency. The research, submitted on March 10, 2025, tests a holistic pipeline that filters irrelevant documents, merges related event reports, and annotates event variables. LLM-based automated annotation outperforms traditional TF-IDF baselines in Event Set Curation but still lags behind human coders. When LLMs assist experts in Variable Annotation, however, they reduce annotation time and cognitive load, and LLM-assisted experts reach higher agreement on extracted variables than fully automated systems achieve. The study positions LLMs as collaborative tools rather than replacements in tasks such as market analysis, news monitoring, and sociological research.
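The three-stage pipeline the study describes — filter irrelevant documents, merge reports of the same event, annotate event variables — can be sketched roughly as below. All prompts, variable names, and the `ask_llm` callable are illustrative assumptions, not the authors' implementation; a toy stand-in replaces the actual model so the sketch runs end to end.

```python
def filter_relevant(docs, ask_llm):
    """Stage 1: keep only documents the model judges event-relevant."""
    return [d for d in docs if ask_llm(f"Is this an event report? {d}") == "yes"]

def merge_events(docs, ask_llm):
    """Stage 2: group documents that describe the same underlying event."""
    groups = []
    for doc in docs:
        for group in groups:
            if ask_llm(f"Same event? A: {group[0]} B: {doc}") == "yes":
                group.append(doc)
                break
        else:  # no existing group matched; start a new one
            groups.append([doc])
    return groups

def annotate_variables(event_docs, ask_llm):
    """Stage 3: draft structured variables for an expert to review."""
    text = " ".join(event_docs)
    return {
        "location": ask_llm(f"Location of the event? {text}"),
        "date": ask_llm(f"Date of the event? {text}"),
    }

def toy_llm(prompt):
    """Toy stand-in for a real LLM call (purely for demonstration)."""
    if prompt.startswith("Is this"):
        return "yes" if "protest" in prompt else "no"
    if prompt.startswith("Same event"):
        return "yes"
    if prompt.startswith("Location"):
        return "Berlin"
    return "2025-03-10"

docs = ["protest in Berlin on March 10", "weather report", "Berlin protest coverage"]
relevant = filter_relevant(docs, toy_llm)      # drops the weather report
events = merge_events(relevant, toy_llm)       # one merged event group
draft = annotate_variables(events[0], toy_llm) # draft for expert review
print(len(relevant), len(events), draft["location"])  # → 2 1 Berlin
```

The key design point, per the study's findings, is that stage 3 produces a draft for an expert to correct rather than a final answer — that human-in-the-loop step is where agreement rises above fully automated annotation.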

Key facts

  • LLMs are not reliable independent annotators compared to human experts.
  • LLMs outperform TF-IDF-based methods in Event Set Curation.
  • LLMs reduce time and mental effort for expert annotators in Variable Annotation.
  • LLM-assisted experts show higher agreement on extracted variables than fully automated LLMs.
  • The study evaluates a holistic workflow: filtering, merging, and annotating events.
  • Event annotation is crucial for market changes, breaking news, and sociological trends.
  • Human coding is expensive and time-consuming.
  • The research was submitted to arXiv on March 10, 2025.

Entities

Institutions

  • arXiv

Sources