Low-Resource Languages in Semantic Web Defined
A new methodology categorizes languages in Linked Open Data Knowledge Graphs as low-, medium-, or high-resource, aiming to address the digital divide. The study uses DBpedia, BabelNet, and Wikidata to propose a formal definition for low-resource languages, enabling better cross-lingual transfer in AI. This work highlights how emerging technologies exacerbate Open Access Data inequality, excluding many communities from digital transformation.
Key facts
- Emerging digital technologies worsen the divide in Open Access Data between high- and low-resource languages.
- Multilingual Linked Open Data Knowledge Graphs could mitigate the divide through cross-lingual transfer.
- No clear quantitative definition of low-resource languages existed for LOD KGs before this study.
- The methodology analyzes language distribution across LOD KGs.
- A preliminary multi-level categorization is based on DBpedia, BabelNet, and Wikidata.
- The categorization provides formal definitions for low-, medium-, and high-resource languages.
- The definitions can be used to select cross-lingual transfer candidates.
- The work is a poster paper, listed under Computer Science > Artificial Intelligence.
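To make the idea of a distribution-based categorization concrete, here is a minimal Python sketch. It is not the study's method: the summary does not state the actual criteria, so the share-of-labels measure, the two cut-off values, and all function names below are assumptions chosen only for illustration. The toy input stands in for language tags that would in practice be extracted from a knowledge graph such as DBpedia, BabelNet, or Wikidata.

```python
from collections import Counter

# Hypothetical cut-offs on a language's share of all labels in one graph.
# The study's real thresholds are not given in this summary.
LOW_MAX_SHARE = 0.01
HIGH_MIN_SHARE = 0.10


def language_shares(language_tags):
    """Return each language's fraction of all labels in one graph."""
    counts = Counter(language_tags)
    total = sum(counts.values())
    return {lang: n / total for lang, n in counts.items()}


def categorize(share, low_max=LOW_MAX_SHARE, high_min=HIGH_MIN_SHARE):
    """Map a label share to 'low', 'medium', or 'high' (assumed rule)."""
    if share < low_max:
        return "low"
    if share >= high_min:
        return "high"
    return "medium"


def categorize_graph(language_tags):
    """Categorize every language that appears in one graph's labels."""
    shares = language_shares(language_tags)
    return {lang: categorize(share) for lang, share in shares.items()}


def transfer_sources(levels):
    """High-resource languages, as candidate sources for cross-lingual transfer."""
    return sorted(lang for lang, level in levels.items() if level == "high")


# Toy data: one language tag per label, standing in for a real graph dump.
toy_tags = ["en"] * 900 + ["de"] * 95 + ["yo"] * 5
levels = categorize_graph(toy_tags)
sources = transfer_sources(levels)
```

Running the same function on each of the three graphs separately and comparing the results per language would be one way to approximate a multi-level view, though how the study combines the graphs is not described here.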
Entities
Institutions
- DBpedia
- BabelNet
- Wikidata
- Semantic Scholar
- arXiv