AI Pipeline Transcribes Medieval English Legal Manuscripts
A new open-source, end-to-end AI pipeline reaches 79% word accuracy when transcribing medieval English legal manuscripts written in abbreviated Latin. Its dataset comprises 4,029 lines drawn from 193 criminal and civil cases. The system uses R-Blla for line segmentation and a CNN+LSTM model with CTC decoding for handwriting recognition. Simple post-processing significantly improves accuracy, even though the training set is small and abbreviations have to be expanded during transcription. The project aims to democratize access to the records of the first centuries of the Anglo-American legal system, which only a few dozen scholars worldwide can currently read.
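To illustrate what CTC decoding does at the end of a CNN+LSTM recognizer, here is a minimal Python sketch of greedy (best-path) decoding. It is not the project's code: the function name `ctc_greedy_decode`, the toy `alphabet`, and the choice of index 0 for the blank symbol are assumptions made for illustration, and the summary does not say which decoding strategy the pipeline actually uses.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical convention)


def ctc_greedy_decode(frame_scores, alphabet, blank=BLANK):
    """Greedy CTC decoding: pick the best label per frame,
    merge consecutive repeats, then drop blanks."""
    # Best label index for each time step (column of the line image).
    best_path = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in frame_scores
    ]
    # Merge runs of the same label into a single label.
    collapsed = [label for label, _ in groupby(best_path)]
    # Remove blanks and map the remaining indices to characters.
    return "".join(alphabet[label] for label in collapsed if label != blank)


def one_hot(index, size):
    """Helper for the toy example: a score vector peaking at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


# Toy alphabet: index 0 is the blank, the rest are characters.
alphabet = ["", "r", "e", "x"]
# Per-frame best labels: r r blank e e x x
frames = [one_hot(i, len(alphabet)) for i in [1, 1, 0, 2, 2, 3, 3]]
print(ctc_greedy_decode(frames, alphabet))
```

The blank symbol is what lets the network emit a genuinely doubled letter: two identical labels separated by a blank survive the merge step, while two adjacent identical labels collapse into one.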
Key facts
- Dataset of 4,029 lines from 193 medieval cases
- Uses R-Blla and CNN+LSTM with CTC decoding
- 79% word accuracy achieved
- Post-processing significantly boosts accuracy
- Manuscripts in abbreviated medieval Latin
- Only a few dozen scholars can read them
- Open-source end-to-end pipeline
- Records from the first centuries of the Anglo-American legal system
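The summary does not say how the 79% word accuracy figure is computed. One common definition is 1 minus the word error rate, where errors are counted with a word-level edit distance. The Python sketch below assumes that definition; the function names and the sample Latin strings are invented for illustration and are not taken from the dataset.

```python
def word_edit_distance(ref_words, hyp_words):
    """Levenshtein distance over words (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            curr.append(min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def word_accuracy(reference, hypothesis):
    """Assumed definition: 1 - (word edit distance / reference length)."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        return 1.0 if not hyp_words else 0.0
    errors = word_edit_distance(ref_words, hyp_words)
    # Clamp at zero: many insertions can push the error rate above 1.
    return max(0.0, 1.0 - errors / len(ref_words))


# Invented example: one wrong word out of four.
reference = "et predictus Johannes venit"
hypothesis = "et predictus Johannis venit"
print(word_accuracy(reference, hypothesis))
```

Under this definition a single wrong character makes the whole word count as an error, so word accuracy is a stricter measure than character accuracy.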