LLMs for Named Entity Recognition in Historical Texts

ai-technology · 2026-04-30

A new paper on arXiv (2508.18090) explores the use of large language models (LLMs) for Named Entity Recognition (NER) in historical texts. NER identifies and classifies mentions of entities such as people, organizations, locations, and dates. Traditional supervised methods require large annotated datasets, which are scarce for historical documents because labeling them is costly and requires specialist expertise. Historical language adds further difficulty through inconsistent spelling and archaic vocabulary. The study investigates whether LLMs can perform NER without extensive training data, as a way around these obstacles.
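To make the idea of NER without task-specific training data concrete, here is a minimal sketch of prompting an LLM for entities and checking its reply. It is not the paper's method: the prompt wording, the label set, the JSON reply format, the helper names `build_prompt` and `parse_entities`, and the sample sentence are all assumptions made for illustration. No model is called; the reply is a hand-written stand-in for what a model might return.

```python
import json

# Label set taken from the entity types named in the article (an assumption
# about how they would be encoded, not the paper's tag scheme).
ENTITY_TYPES = ("PERSON", "ORGANIZATION", "LOCATION", "DATE")


def build_prompt(text, examples=()):
    """Build a zero-shot prompt, or a few-shot one if `examples` is given.

    `examples` is a sequence of (text, entities) pairs, where entities is a
    list of {"text": ..., "type": ...} dictionaries.
    """
    lines = [
        "Extract the named entities from the historical text below.",
        "Allowed types: " + ", ".join(ENTITY_TYPES) + ".",
        "Keep the original spelling. Answer with a JSON list of",
        '{"text": ..., "type": ...} objects and nothing else.',
        "",
    ]
    for ex_text, ex_entities in examples:
        lines += ["Text: " + ex_text, "Entities: " + json.dumps(ex_entities), ""]
    lines += ["Text: " + text, "Entities:"]
    return "\n".join(lines)


def parse_entities(reply, source_text):
    """Keep well-formed entities whose surface form occurs verbatim in the source."""
    try:
        items = json.loads(reply)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    entities = []
    for item in items:
        if not isinstance(item, dict):
            continue
        surface, label = item.get("text"), item.get("type")
        if (
            isinstance(surface, str)
            and surface
            and label in ENTITY_TYPES
            and surface in source_text
        ):
            entities.append((surface, label))
    return entities


# Made-up sentence with archaic spelling, and a hand-written stand-in reply.
text = "Master Iohn Smyth of Yorke came to Londonne in the yeare 1603."
prompt = build_prompt(text)
reply = (
    '[{"text": "Iohn Smyth", "type": "PERSON"},'
    ' {"text": "Londonne", "type": "LOCATION"},'
    ' {"text": "Paris", "type": "LOCATION"}]'
)
entities = parse_entities(reply, text)
```

The verbatim check in `parse_entities` is one simple guard for this setting: it drops entities the model invents or silently modernizes (here, "Paris"), so the surviving spans keep the historical spelling of the source.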

Key facts

arXiv paper 2508.18090
Focuses on NER for historical texts
LLMs used as an alternative to supervised learning
Annotated datasets for historical texts are scarce
Challenges include spelling variability and archaic language
NER identifies people, organizations, locations, dates
Supervised approaches require large annotated datasets
Paper explores LLM versatility in NLP tasks
