LLMs Outperform Fine-Tuned Models on Rare Suicide Circumstance Extraction

other · 2026-05-23

A study posted to arXiv introduces a 'Complexity Score' algorithm that predicts when detailed prompts improve large language model (LLM) performance over simple name-only prompts for extracting structured data from death investigation narratives. Suicide is a leading cause of death in the United States, and many of the circumstances recorded in these narratives require semantic inference rather than keyword matching. The study evaluates LLMs against a fine-tuned RoBERTa model on 25 inferentially complex circumstances drawn from the National Violent Death Reporting System (NVDRS). LLMs substantially outperform the fine-tuned model on low-prevalence circumstances, where training data is limited. A hybrid approach selects the prompt strategy for each circumstance, and the framework generalizes across GPT-5.2, Gemini 2.5 Pro, and Llama-3 70B.

Key facts

Suicide is a leading cause of death in the United States.
The study uses the National Violent Death Reporting System (NVDRS).
A 'Complexity Score' algorithm predicts when detailed prompts improve performance.
LLMs were compared against fine-tuned RoBERTa on 25 inferentially complex circumstances.
LLMs substantially outperform fine-tuned RoBERTa on low-prevalence circumstances, where training data is limited.
The framework generalizes across GPT-5.2, Gemini 2.5 Pro, and Llama-3 70B.
Many circumstances require semantic inference beyond keyword matching.
A hybrid approach selects the prompt strategy (name-only or detailed) for each circumstance.
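
The per-circumstance selection described in the key facts could look roughly like the sketch below. This is an illustration under stated assumptions, not the study's implementation: the source does not say how the Complexity Score is computed or thresholded, so the score is treated here as a precomputed number, and the names Circumstance, build_prompt, and DETAILED_PROMPT_THRESHOLD, the 0.5 cutoff, the prompt wording, and the convention that a higher score favors the detailed prompt are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical cutoff; the source does not state how scores are thresholded.
DETAILED_PROMPT_THRESHOLD = 0.5


@dataclass(frozen=True)
class Circumstance:
    name: str                # circumstance label used in the name-only prompt
    definition: str          # longer coding guidance used in the detailed prompt
    complexity_score: float  # assumed precomputed; the formula is not in the source


def build_prompt(circ: Circumstance, narrative: str,
                 threshold: float = DETAILED_PROMPT_THRESHOLD) -> str:
    """Build a name-only or detailed prompt for a single circumstance."""
    # Assumption: a higher score means the detailed prompt is expected to help.
    if circ.complexity_score >= threshold:
        header = f"Circumstance: {circ.name}\nDefinition: {circ.definition}"
    else:
        header = f"Circumstance: {circ.name}"
    return (f"{header}\n"
            f"Narrative: {narrative}\n"
            "Is this circumstance present? Answer yes or no.")
```

Calling build_prompt once per circumstance yields a mixed set of prompts for the same narrative, which is the sense in which the approach is hybrid: low-scoring circumstances keep the short name-only prompt, and only the higher-scoring ones carry the extra definition text.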

Entities

Institutions

arXiv
National Violent Death Reporting System (NVDRS)

Locations

United States

Sources

arXiv cs.AI — 2026-05-23