AI Pipeline Transcribes Medieval English Legal Manuscripts

digital · 2026-05-06

A new open-source, end-to-end AI pipeline achieves 79% word accuracy in transcribing medieval English legal manuscripts written in abbreviated Latin. The dataset comprises 4,029 lines from 193 criminal and civil cases. The system uses R-Blla for line segmentation and a CNN+LSTM model with CTC decoding for handwriting recognition. Despite a small training set and the added difficulty of expanding abbreviations, simple post-processing significantly improves accuracy. The project aims to democratize access to records from the first centuries of the Anglo-American legal system, which only a few dozen scholars worldwide can currently read.
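
To illustrate the CTC decoding step mentioned above, here is a minimal sketch of greedy (best-path) CTC decoding in plain Python. It is not the project's code: the summary does not say whether the pipeline uses greedy or beam-search decoding, and the function name, the blank index, and the toy alphabet are all assumptions made for illustration.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical convention)


def ctc_greedy_decode(frame_scores, alphabet, blank=BLANK):
    """Turn per-frame label scores into text using best-path CTC decoding."""
    # Pick the highest-scoring label at every time step.
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # Merge consecutive repeats first, then drop the blank symbol.
    collapsed = [label for label, _ in groupby(best_path)]
    return "".join(alphabet[label] for label in collapsed if label != blank)


# Toy alphabet for illustration only: index 0 is the blank.
alphabet = ["", "a", "b"]
frames = [
    [0.10, 0.80, 0.10],  # a
    [0.10, 0.70, 0.20],  # a again (merged with the previous frame)
    [0.90, 0.05, 0.05],  # blank (separates the two a's)
    [0.20, 0.70, 0.10],  # a
    [0.10, 0.20, 0.70],  # b
]
decoded = ctc_greedy_decode(frames, alphabet)
print(decoded)
```

The blank frame is what allows a doubled letter to survive the merge step: without it, the two runs of "a" would collapse into one.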

Key facts

Dataset of 4,029 lines from 193 medieval cases
Uses R-Blla and CNN+LSTM with CTC decoding
79% word accuracy achieved
Post-processing significantly boosts accuracy
Manuscripts in abbreviated medieval Latin
Only a few dozen scholars can read them
Open-source end-to-end pipeline
Records of the first centuries of the Anglo-American legal system
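
The summary does not define how the 79% word accuracy figure is computed. One common convention, assumed here, is one minus the word error rate, where errors are counted with a word-level edit distance against a reference transcription. The sketch below follows that assumption; the function names and the sample strings are hypothetical and are not taken from the dataset.

```python
def word_edit_distance(ref_words, hyp_words):
    """Levenshtein distance over word tokens (substitute, insert, delete)."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp_words, 1):
            cur.append(min(
                prev[j] + 1,                           # delete a reference word
                cur[j - 1] + 1,                        # insert a hypothesis word
                prev[j - 1] + (ref_word != hyp_word),  # match or substitute
            ))
        prev = cur
    return prev[-1]


def word_accuracy(reference, hypothesis):
    """Assumed definition: 1 - (word edit distance / reference length)."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        return 1.0 if not hyp_words else 0.0
    errors = word_edit_distance(ref_words, hyp_words)
    # Clamp at zero: many insertions can push the raw value below zero.
    return max(0.0, 1.0 - errors / len(ref_words))


# Made-up sample strings: one of four reference words is missing.
score = word_accuracy("venit et dicit quod", "venit et dicit")
print(score)
```

Under this definition a single wrong or missing word in a four-word line costs 25 points, which is why small post-processing corrections can move the score noticeably on short lines.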

Sources

arXiv cs.AI — 2026-05-05