ARTFEED — Contemporary Art Intelligence

LLMs Outperform Fine-Tuned Models on Rare Suicide Circumstance Extraction

other · 2026-05-23

A recent study posted on arXiv presents a 'Complexity Score' algorithm that predicts when detailed prompts improve large language model (LLM) performance over simple name-only prompts for extracting structured information from death investigation narratives. The research examines 25 inferentially complex circumstances drawn from the National Violent Death Reporting System (NVDRS) and compares LLMs against a fine-tuned RoBERTa model. LLMs substantially outperform the fine-tuned model on low-prevalence circumstances, where training data are limited. The framework generalizes across frontier LLMs, including GPT-5.2, Gemini 2.5 Pro, and Llama-3 70B. Suicide remains a leading cause of death in the United States, and many of the circumstances recorded in these narratives require semantic inference beyond keyword matching.

Key facts

  • Suicide is a leading cause of death in the United States.
  • The study uses the National Violent Death Reporting System (NVDRS).
  • A 'Complexity Score' algorithm predicts when detailed prompts improve performance over name-only prompts.
  • LLMs were compared against fine-tuned RoBERTa on 25 inferentially complex circumstances.
  • LLMs substantially outperform fine-tuned RoBERTa on low-prevalence circumstances.
  • The framework generalizes across GPT-5.2, Gemini 2.5 Pro, and Llama-3 70B.
  • Many circumstances require semantic inference beyond keyword matching.
  • The hybrid approach selects a prompt strategy (detailed or name-only) for each circumstance.
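The per-circumstance selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the article does not say how the Complexity Score is computed or thresholded, so the score is taken here as a precomputed input, a higher score is assumed to favor the detailed prompt, and the cutoff (`DEFAULT_THRESHOLD`), the helper names (`choose_strategy`, `build_prompt`), the prompt wording, the circumstance definitions, and the example scores are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Circumstance:
    name: str          # illustrative circumstance label
    definition: str    # hypothetical coding guidance, used only by the detailed prompt
    complexity: float  # hypothetical precomputed Complexity Score


# Hypothetical cutoff: the article does not report how scores map to a strategy.
DEFAULT_THRESHOLD = 0.5


def choose_strategy(c: Circumstance, threshold: float = DEFAULT_THRESHOLD) -> str:
    """Return 'detailed' when the score meets the cutoff, otherwise 'name-only'.

    Assumes (hypothetically) that higher scores predict a benefit from detail.
    """
    return "detailed" if c.complexity >= threshold else "name-only"


def build_prompt(c: Circumstance, narrative: str,
                 threshold: float = DEFAULT_THRESHOLD) -> str:
    """Assemble a yes/no extraction prompt using the chosen strategy."""
    if choose_strategy(c, threshold) == "detailed":
        # Detailed prompt: name plus a definition of the circumstance.
        header = f"Circumstance: {c.name}\nDefinition: {c.definition}"
    else:
        # Name-only prompt: just the circumstance label.
        header = f"Circumstance: {c.name}"
    return (f"{header}\nNarrative: {narrative}\n"
            "Does the narrative indicate this circumstance? Answer yes or no.")


if __name__ == "__main__":
    # Toy scores and paraphrased definitions, not official NVDRS coding text.
    circumstances = [
        Circumstance("job problem",
                     "Problems at work or with unemployment.", 0.2),
        Circumstance("intimate partner problem",
                     "Problems with a current or former intimate partner.", 0.8),
    ]
    for c in circumstances:
        print(c.name, "->", choose_strategy(c))
```

Keeping the strategy decision in its own function mirrors the hybrid idea: the prompt builder stays the same, and only the per-circumstance choice changes as the scoring rule or threshold is revised.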

Entities

Institutions

  • arXiv
  • National Violent Death Reporting System (NVDRS)

Locations

  • United States

Sources