Hybrid Neurosymbolic Framework for Vietnamese NER
A preprint on arXiv (2605.04489) introduces a hybrid neurosymbolic framework for Named Entity Recognition (NER) in low-resource languages, with a focus on Vietnamese. The framework is a two-stage pipeline that combines rule-based techniques with deep learning models. First, a rule-based system reduces label complexity by categorizing relational and special labels. Second, pre-trained language models are fine-tuned to improve extraction accuracy. A post-processing step then restores the fine-grained labels so that expressiveness is preserved. To address data scarcity, the authors propose a scalable data augmentation method based on Large Language Models (LLMs). The aim is to improve NER performance in settings with limited annotated data and heterogeneous label sets.
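The flow of simplifying labels, predicting, and then restoring them can be illustrated with a minimal sketch. It is not the authors' implementation: the label names (`PER`, `PER_FATHER`, `PER_MOTHER`), the cue words, the two-token look-back window, and the function names `coarsen`, `restore`, and `run_pipeline` are all assumptions made for illustration, and the fine-tuned model is replaced by any callable that returns coarse spans.

```python
from typing import Callable, Dict, List, Tuple

Span = Tuple[int, int, str]  # (start token, end token exclusive, label)

# Hypothetical fine-to-coarse mapping; the paper's actual label set is not shown here.
FINE_TO_COARSE: Dict[str, str] = {
    "PER_FATHER": "PER",
    "PER_MOTHER": "PER",
}

# Hypothetical cue words ("cha" = father, "mẹ" = mother) that mark a relational label.
CUE_TO_FINE: Dict[str, str] = {"cha": "PER_FATHER", "mẹ": "PER_MOTHER"}


def coarsen(spans: List[Span]) -> List[Span]:
    """Rule-based step: collapse fine-grained labels into coarse ones."""
    return [(s, e, FINE_TO_COARSE.get(lab, lab)) for s, e, lab in spans]


def restore(tokens: List[str], spans: List[Span]) -> List[Span]:
    """Post-processing: reinstate a fine label when a cue word precedes a PER span."""
    out: List[Span] = []
    for s, e, lab in spans:
        # Look at up to two tokens before the span (an assumed window size).
        window = [t.lower() for t in tokens[max(0, s - 2):s]]
        fine = next((CUE_TO_FINE[t] for t in window if t in CUE_TO_FINE), None)
        if lab == "PER" and fine is not None:
            lab = fine
        out.append((s, e, lab))
    return out


def run_pipeline(tokens: List[str],
                 model: Callable[[List[str]], List[Span]]) -> List[Span]:
    """Run a coarse-label model (a stand-in for the fine-tuned PLM), then restore."""
    return restore(tokens, model(tokens))


# "Họ tên cha: Nguyễn Văn An" ("Father's full name: Nguyễn Văn An")
tokens = ["Họ", "tên", "cha", ":", "Nguyễn", "Văn", "An"]
gold = [(4, 7, "PER_FATHER")]
training_target = coarsen(gold)          # what the model would be trained on
stub_model = lambda toks: [(4, 7, "PER")]  # placeholder for a fine-tuned model
predicted = run_pipeline(tokens, stub_model)
```

In this sketch the model only has to distinguish coarse classes, while the relational detail is recovered from surrounding context by rules; how the paper actually defines its rules is not specified in this summary.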
Key facts
- arXiv:2605.04489v1
- Hybrid neurosymbolic framework for Vietnamese NER
- Two-stage pipeline: rule-based then deep learning
- Post-processing module restores fine-grained labels
- LLM-based data augmentation for data scarcity
- Addresses limited annotated data and heterogeneous label sets
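The summary does not describe how the LLM-based augmentation works, so the following is only one plausible sketch under stated assumptions: an LLM is asked to paraphrase an annotated sentence, and a paraphrase is kept only if every annotated entity string still appears verbatim, so the original labels can be carried over. The `generate` callable is a hypothetical stand-in for any LLM client, and `build_prompt` and `augment` are illustrative names, not part of the paper.

```python
from typing import Callable, List, Tuple

Entity = Tuple[str, str]  # (surface text, label)


def build_prompt(sentence: str, entities: List[Entity], n: int) -> str:
    """Compose an instruction asking for paraphrases that keep entities unchanged."""
    names = ", ".join(f'"{text}"' for text, _ in entities)
    return (
        f"Rewrite the following Vietnamese sentence in {n} different ways. "
        f"Keep these names exactly as written: {names}.\n"
        f"Sentence: {sentence}"
    )


def augment(sentence: str,
            entities: List[Entity],
            generate: Callable[[str], List[str]],
            n: int = 3) -> List[Tuple[str, List[Entity]]]:
    """Return paraphrases that still contain every annotated entity string."""
    kept: List[Tuple[str, List[Entity]]] = []
    seen = {sentence}
    for cand in generate(build_prompt(sentence, entities, n)):
        cand = cand.strip()
        # Discard empty, duplicate, or entity-dropping candidates.
        if not cand or cand in seen:
            continue
        if all(text in cand for text, _ in entities):
            kept.append((cand, entities))
            seen.add(cand)
    return kept


# A fake generator standing in for an LLM call, so the sketch runs offline.
def fake_generate(prompt: str) -> List[str]:
    return [
        "Ông Nguyễn Văn An hiện sống tại Huế.",
        "Ông ấy sống ở Huế.",          # drops the person name, so it is rejected
        "Nguyễn Văn An sống ở Huế.",   # identical to the input, so it is rejected
    ]


source = "Nguyễn Văn An sống ở Huế."
ents = [("Nguyễn Văn An", "PER"), ("Huế", "LOC")]
augmented = augment(source, ents, fake_generate)
```

The verbatim-match filter is a deliberately conservative choice: it trades away some diversity to avoid producing training sentences whose labels no longer match the text.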
Entities
Institutions
- arXiv