Fine-Tuning LLMs with Synthetic Data for Toxicity Detection in Gaming Chat
A team of researchers participated in the EEUCA 2026 Shared Task on Understanding Toxic Behavior in Gaming Communities, placing 4th among 35 competing teams. The task required classifying chat messages from World of Tanks into six toxicity categories: Non-toxic, Insults/Flaming, Other Offensive, Hate/Harassment, Threats, and Extremism. Their best system fine-tuned Llama 3.1 8B on training data augmented with 5% synthetic examples, reaching an F1-macro score of 0.6234 on the test set. The work also identified a 'validation trap': strong validation results did not reliably translate into good generalization on the test set.
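F1-macro, the evaluation metric reported here, averages the per-class F1 scores so that rare classes such as Extremism count as much as the dominant Non-toxic class. A minimal pure-Python sketch (the label names follow the task's six categories; the example predictions are illustrative, not from the paper):

```python
LABELS = ["Non-toxic", "Insults/Flaming", "Other Offensive",
          "Hate/Harassment", "Threats", "Extremism"]

def macro_f1(y_true, y_pred, labels=LABELS):
    """Average per-class F1; a class with no true positives scores 0."""
    f1s = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when TP is 0
        f1s.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(f1s) / len(labels)

# Illustrative usage: one Threats message misclassified as Non-toxic
y_true = ["Non-toxic", "Insults/Flaming", "Threats"]
y_pred = ["Non-toxic", "Insults/Flaming", "Non-toxic"]
score = macro_f1(y_true, y_pred)
```

Because every class contributes equally to the average, a model that ignores the rarest categories is penalized heavily, which is why F1-macro is a common choice for imbalanced toxicity data.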
Key facts
- System placed 4th out of 35 teams in EEUCA 2026 Shared Task
- Task involves classifying World of Tanks chat messages into six toxicity categories
- Best system uses Llama 3.1 8B with 5% synthetic data augmentation
- Achieved F1-macro score of 0.6234 on test set
- Identified 'validation trap' phenomenon in model generalization
- Explored encoder-based models, instruction-tuned LLMs with LoRA, hierarchical classification, one-vs-rest strategies, and ensemble methods
- Categories: Non-toxic, Insults/Flaming, Other Offensive, Hate/Harassment, Threats, Extremism
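The 5% synthetic augmentation listed above can be sketched as a simple mixing step: sample synthetic examples amounting to 5% of the real training set and shuffle them in. The helper below is a hypothetical illustration of that idea, not the team's actual pipeline:

```python
import random

def augment(train_set, synthetic_pool, ratio=0.05, seed=42):
    """Mix in synthetic examples equal to `ratio` of the real training set."""
    rng = random.Random(seed)
    n_synth = int(len(train_set) * ratio)
    # Sample without replacement, capped at the pool size
    sampled = rng.sample(synthetic_pool, min(n_synth, len(synthetic_pool)))
    mixed = train_set + sampled
    rng.shuffle(mixed)
    return mixed

# Illustrative usage with (message, label) pairs
real = [(f"msg{i}", "Non-toxic") for i in range(100)]
synth = [(f"syn{i}", "Extremism") for i in range(10)]
mixed = augment(real, synth)   # 100 real + 5 synthetic examples
```

Keeping the ratio small is a common design choice: enough synthetic data to boost rare classes, but not so much that distribution shift from the generator dominates training.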