LLMs Outperform Fine-Tuned Models on Rare Suicide Circumstance Extraction
A recent study posted on arXiv presents a 'Complexity Score' algorithm that predicts when detailed prompts improve large language model (LLM) performance over simple name-only prompts for extracting structured data from death investigation narratives. The study examines 25 inferentially complex circumstances drawn from the National Violent Death Reporting System (NVDRS) and compares LLMs against a fine-tuned RoBERTa model. LLMs substantially outperform the fine-tuned model on low-prevalence circumstances, where training data is limited. A hybrid approach selects the prompt strategy for each circumstance, and the framework generalizes across GPT-5.2, Gemini 2.5 Pro, and Llama-3 70B. The work is motivated by the fact that suicide remains a leading cause of death in the United States and that many circumstances require semantic inference beyond keyword matching.
Key facts
- Suicide is a leading cause of death in the United States.
- The study uses the National Violent Death Reporting System (NVDRS).
- A 'Complexity Score' algorithm predicts when detailed prompts improve performance.
- LLMs were compared against fine-tuned RoBERTa on 25 inferentially complex circumstances.
- LLMs substantially outperform fine-tuned RoBERTa on low-prevalence circumstances, where training data is limited.
- The framework generalizes across GPT-5.2, Gemini 2.5 Pro, and Llama-3 70B.
- Many circumstances require semantic inference beyond keyword matching.
- The hybrid approach selects a prompt strategy (detailed or name-only) for each circumstance.
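The per-circumstance selection described above can be pictured with a minimal Python sketch. This is not the study's implementation: the summary does not say how the Complexity Score is computed or scaled, so the sketch assumes each circumstance already has a precomputed score, and the threshold value, class, function names, and prompt wording are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical cutoff: the source does not state a threshold or a score range.
DETAILED_PROMPT_THRESHOLD = 0.5


@dataclass(frozen=True)
class Circumstance:
    name: str
    definition: str          # longer description used only in the detailed prompt
    complexity_score: float  # assumed precomputed; higher means more inference needed


def choose_strategy(c: Circumstance, threshold: float = DETAILED_PROMPT_THRESHOLD) -> str:
    """Return 'detailed' when the score meets the threshold, else 'name_only'."""
    return "detailed" if c.complexity_score >= threshold else "name_only"


def build_prompt(c: Circumstance, narrative: str,
                 threshold: float = DETAILED_PROMPT_THRESHOLD) -> str:
    """Assemble the extraction prompt using the strategy chosen for this circumstance."""
    if choose_strategy(c, threshold) == "detailed":
        task = f"Circumstance: {c.name}\nDefinition: {c.definition}"
    else:
        task = f"Circumstance: {c.name}"
    return (f"{task}\nNarrative: {narrative}\n"
            "Answer yes or no: is this circumstance present?")


# Illustrative placeholders only; these names and scores are not from the study.
high = Circumstance("circumstance_a", "placeholder definition", 0.8)
low = Circumstance("circumstance_b", "placeholder definition", 0.2)
```

With these placeholder values, `choose_strategy(high)` selects the detailed prompt and `choose_strategy(low)` selects the name-only prompt, so the longer definition is included only where the score suggests it helps.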
Entities
Institutions
- arXiv
- National Violent Death Reporting System (NVDRS)
Locations
- United States