LLM-Guided Semi-Supervised Learning for Crisis Tweet Classification
A recent study published on arXiv presents the first empirical evaluation of semi-supervised learning guided by large language models (LLMs) for classifying crisis-related tweets. The study compares two recent techniques, LLM-guided Co-Training (LG-CoTrain) and VerifyMatch, against established baselines such as Self-Training. LG-CoTrain substantially outperforms the classical methods in low-resource settings with 5, 10, and 25 labeled examples per class, achieving the best average Macro F1 score across events, while VerifyMatch delivers competitive results with strong calibration. As the amount of labeled data grows, the performance gaps narrow and Self-Training proves a robust baseline. The findings highlight the potential of LLMs to improve crisis data classification when labeled data is scarce, supporting disaster response efforts.
Key facts
- First empirical evaluation of LLM-guided semi-supervised learning for crisis tweet classification.
- Compares VerifyMatch and LG-CoTrain against established semi-supervised baselines.
- LG-CoTrain outperforms classical approaches with 5, 10, and 25 labeled examples per class.
- VerifyMatch achieves competitive performance with strong calibration.
- Performance gap narrows as labeled examples increase; Self-Training becomes strong baseline.
- Study focuses on social media data in disaster management contexts.
- Published on arXiv with ID 2605.08448.
- The semi-supervised methods are evaluated using compact models.
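To make the core idea concrete, here is a minimal sketch of LLM-verified self-training: a small model pseudo-labels unlabeled tweets, and a pseudo-label is kept only when an external check agrees. This is an illustration of the general technique, not the paper's actual LG-CoTrain or VerifyMatch implementation; all names here (`tiny_classifier`, `llm_verify`, `KEYWORD_HINTS`, the confidence threshold) are hypothetical, and the keyword-based verifier stands in for a real LLM call.

```python
from collections import Counter

CONFIDENCE = 0.6  # hypothetical threshold: keep pseudo-labels only above this

KEYWORD_HINTS = {  # hypothetical keyword lists used by the stand-in verifier
    "damage": ["collapsed", "destroyed", "damage"],
    "aid": ["volunteers", "supplies", "donate"],
}

def featurize(text):
    """Bag-of-words features for a tweet."""
    return Counter(text.lower().split())

def tiny_classifier(labeled):
    """Train a toy centroid classifier: per-class word counts."""
    centroids = {}
    for text, label in labeled:
        centroids.setdefault(label, Counter()).update(featurize(text))
    def predict(text):
        words = featurize(text)
        scores = {lab: sum(words[w] * c[w] for w in words)
                  for lab, c in centroids.items()}
        total = sum(scores.values()) or 1
        best = max(scores, key=scores.get)
        return best, scores[best] / total  # label and normalized confidence
    return predict

def llm_verify(text, label):
    """Stand-in for an LLM check; a real system would prompt an LLM here."""
    hints = KEYWORD_HINTS.get(label, [])
    return any(w in text.lower() for w in hints)

def self_train(labeled, unlabeled, rounds=2):
    """Iteratively absorb confident, verifier-approved pseudo-labels."""
    labeled = list(labeled)
    for _ in range(rounds):
        predict = tiny_classifier(labeled)
        remaining = []
        for text in unlabeled:
            label, conf = predict(text)
            if conf >= CONFIDENCE and llm_verify(text, label):
                labeled.append((text, label))  # accept verified pseudo-label
            else:
                remaining.append(text)  # leave unlabeled for the next round
        unlabeled = remaining
    return tiny_classifier(labeled)

seed = [("bridge collapsed downtown", "damage"),
        ("volunteers handing out supplies", "aid")]
pool = ["the old bridge is destroyed", "please donate supplies today"]
model = self_train(seed, pool)
print(model("houses destroyed by the quake")[0])  # → damage
```

The verification step is what distinguishes this from plain Self-Training: without it, a confident but wrong pseudo-label would be absorbed into the training set and compound over rounds.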