BERT Classifier Identifies 55K Letters in Ming-Qing Chinese Wenji
Lepton, a fine-tuned BERT classifier, determines whether a title listed in the contents of a Classical Chinese wenji (collected works) refers to a personal letter or to an easily confused preface. It was fine-tuned on 5,438 manually labeled titles from 33 literati of the late Ming and early Qing periods. The model is available on Hugging Face, and the China Biographical Database (CBDB) has used it to locate around 55,000 letters in wenji from the mid-Ming through the early Qing, which feed the Ming Letter Platform.
Key facts
- Lepton is a fine-tuned BERT classifier for personal-letter titles in Classical Chinese wenji.
- It distinguishes personal letters from closely confusable prefaces, especially farewell prefaces.
- It is fine-tuned from bert-base-chinese on 5,438 hand-labeled wenji titles.
- The titles come from 33 late-Ming and early-Qing literati.
- The model is available on Hugging Face.
- It is used by the China Biographical Database (CBDB).
- CBDB used it to identify approximately 55,000 letters in wenji from the mid-Ming through the early Qing.
- Populates the Ming Letter Platform.
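A minimal sketch of how a classifier of this kind might be applied to a list of wenji titles, assuming it is loaded through the Hugging Face `transformers` text-classification pipeline. The source does not give the repository id or the label names, so `MODEL_ID` and `LETTER_LABEL` are hypothetical placeholders, and the two example titles and the keyword stub are illustrative stand-ins rather than Lepton's actual data or behavior.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical placeholders: the source names neither the repository id
# nor the label strings the model emits.
MODEL_ID = "your-org/lepton"
LETTER_LABEL = "letter"

Prediction = Dict[str, object]
Classifier = Callable[[List[str]], List[Prediction]]


def load_classifier(model_id: str = MODEL_ID) -> Classifier:
    """Build a text-classification pipeline (needs `transformers` and network access)."""
    from transformers import pipeline  # imported lazily so the rest runs offline

    return pipeline("text-classification", model=model_id)


def find_letters(
    titles: Iterable[str],
    classifier: Classifier,
    letter_label: str = LETTER_LABEL,
    threshold: float = 0.5,
) -> List[str]:
    """Return the titles the classifier labels as letters with enough confidence."""
    batch = list(titles)
    if not batch:
        return []
    # A text-classification pipeline returns one {"label", "score"} dict per input.
    predictions = classifier(batch)
    return [
        title
        for title, pred in zip(batch, predictions)
        if pred["label"] == letter_label and float(pred["score"]) >= threshold
    ]


def keyword_stub(batch: List[str]) -> List[Prediction]:
    """Offline stand-in for the model: a crude suffix rule, NOT how Lepton works."""
    return [
        {"label": LETTER_LABEL if t.endswith("書") else "preface", "score": 0.9}
        for t in batch
    ]


if __name__ == "__main__":
    # Illustrative titles: one letter-style title and one farewell-preface-style title.
    print(find_letters(["與友人書", "送友人序"], keyword_stub))
```

To run against the real model, replace `keyword_stub` with `load_classifier(...)` once the actual repository id and label names are known. Keeping the classifier as a plain callable lets the filtering logic be checked without downloading any weights.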
Entities
Institutions
- China Biographical Database (CBDB)
- Ming Letter Platform
- Hugging Face