Fine-Tuning LLMs with Synthetic Data for Toxicity Detection in Gaming Chat
A team of researchers participated in the EEUCA 2026 Shared Task on Understanding Toxic Behavior in Gaming Communities, placing 4th among 35 competing teams. The task required classifying chat messages from World of Tanks into six toxicity categories: Non-toxic, Insults/Flaming, Other Offensive, Hate/Harassment, Threats, and Extremism. Their best system fine-tuned Llama 3.1 8B on training data augmented with 5% synthetic examples, reaching an F1-macro score of 0.6234 on the test set. The work also identified a 'validation trap': strong validation results did not reliably translate into good generalization on the test set.
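F1-macro, the evaluation metric reported here, averages the per-class F1 scores so that rare classes such as Extremism count as much as the dominant Non-toxic class. A minimal pure-Python sketch (the label names follow the task's six categories; the example predictions are illustrative, not from the paper):

```python
LABELS = ["Non-toxic", "Insults/Flaming", "Other Offensive",
          "Hate/Harassment", "Threats", "Extremism"]

def macro_f1(y_true, y_pred, labels=LABELS):
    """Average per-class F1; a class with no true positives scores 0."""
    f1s = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when TP is 0
        f1s.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(f1s) / len(labels)

# Illustrative usage: one Threats message misclassified as Non-toxic
y_true = ["Non-toxic", "Insults/Flaming", "Threats"]
y_pred = ["Non-toxic", "Insults/Flaming", "Non-toxic"]
score = macro_f1(y_true, y_pred)
```

Because every class contributes equally to the average, a model that ignores the rarest categories is penalized heavily, which is why F1-macro is a common choice for imbalanced toxicity data.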
Key facts
- System placed 4th out of 35 teams in EEUCA 2026 Shared Task
- Task involves classifying World of Tanks chat messages into six toxicity categories
- Best system uses Llama 3.1 8B with 5% synthetic data augmentation
- Achieved F1-macro score of 0.6234 on test set
- Identified 'validation trap' phenomenon in model generalization
- Explored encoder-based models, instruction-tuned LLMs with LoRA, hierarchical classification, one-vs-rest strategies, and ensemble methods
- Categories: Non-toxic, Insults/Flaming, Other Offensive, Hate/Harassment, Threats, Extremism
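The 5% synthetic augmentation listed above can be sketched as a simple mixing step: sample synthetic examples amounting to 5% of the real training set and shuffle them in. The helper below is a hypothetical illustration of that idea, not the team's actual pipeline:

```python
import random

def augment(train_set, synthetic_pool, ratio=0.05, seed=42):
    """Mix in synthetic examples equal to `ratio` of the real training set."""
    rng = random.Random(seed)
    n_synth = int(len(train_set) * ratio)
    # Sample without replacement, capped at the pool size
    sampled = rng.sample(synthetic_pool, min(n_synth, len(synthetic_pool)))
    mixed = train_set + sampled
    rng.shuffle(mixed)
    return mixed

# Illustrative usage with (message, label) pairs
real = [(f"msg{i}", "Non-toxic") for i in range(100)]
synth = [(f"syn{i}", "Extremism") for i in range(10)]
mixed = augment(real, synth)   # 100 real + 5 synthetic examples
```

Keeping the ratio small is a common design choice: enough synthetic data to boost rare classes, but not so much that distribution shift from the generator dominates training.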