Third-Place Solution in the Hume-ABAW10 Emotional Mimicry Intensity (EMI) Challenge
A team secured third place in the Hume-ABAW10 Emotional Mimicry Intensity (EMI) Challenge with a two-stage multimodal framework. The challenge asks participants to predict six continuous emotion-intensity dimensions (Admiration, Amusement, Determination, Empathic Pain, Excitement, and Joy) from in-the-wild multimodal video clips. The framework integrates textual, acoustic, and visual data, with an optional motion component. In the first stage, modality-specific encoders are trained separately; in the second stage, their outputs are combined by a lightweight regressor that employs modality dropout and controlled encoder adaptation. The best validation result was an average Pearson correlation of 0.4722, achieved by the text–audio–vision–motion fusion model under a 4:1 split. The motion branch yielded only minimal improvements but pointed to directions for further research.
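To make the reported metric concrete, the following is a minimal sketch of an average Pearson correlation over the six emotion dimensions. It assumes predictions and labels are arrays of shape (n_clips, 6), with columns in the order listed above; the function name `mean_pearson` and that column order are illustrative choices, not taken from the challenge's official evaluation code.

```python
import numpy as np

# Column order is an assumption made for this sketch.
EMOTIONS = ["Admiration", "Amusement", "Determination",
            "Empathic Pain", "Excitement", "Joy"]

def mean_pearson(y_true, y_pred):
    """Return (average, per-dimension dict) of Pearson correlations.

    y_true, y_pred: array-likes of shape (n_clips, 6).
    A column with zero variance produces NaN for that dimension.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")

    # Center each emotion column, then apply the Pearson formula per column.
    t = y_true - y_true.mean(axis=0)
    p = y_pred - y_pred.mean(axis=0)
    num = (t * p).sum(axis=0)
    den = np.sqrt((t ** 2).sum(axis=0) * (p ** 2).sum(axis=0))
    per_dim = num / den

    return float(per_dim.mean()), dict(zip(EMOTIONS, per_dim))
```

The correlation is computed per dimension and then averaged, so every emotion contributes equally to the final score regardless of its scale.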
Key facts
- Team placed third in Hume-ABAW10 EMI Challenge
- Predicts six emotion dimensions: Admiration, Amusement, Determination, Empathic Pain, Excitement, Joy
- Two-stage multimodal framework combines text, audio, vision, and optional motion
- Best validation Pearson correlation: 0.4722
- Model uses modality dropout and controlled encoder adaptation
- Motion branch yields slight gains
- Challenge focuses on in-the-wild multimodal video clips
- Framework trains modality-specific encoders independently before fusion
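The second-stage fusion with modality dropout can be sketched as follows. This is a hedged illustration only: the feature dictionaries stand in for the outputs of the independently trained encoders, the linear layer is a stand-in for the lightweight regressor, and the names `modality_dropout`, `fuse_and_regress`, and `p_drop` are hypothetical. The team's actual architecture, dropout rate, and controlled encoder adaptation are not shown here.

```python
import numpy as np

MODALITIES = ["text", "audio", "vision", "motion"]

def modality_dropout(features, p_drop, rng):
    """Zero out whole modalities per sample with probability p_drop.

    features: dict mapping modality name -> array of shape (batch, dim).
    At least one modality is always kept for every sample.
    """
    names = list(features)
    batch = next(iter(features.values())).shape[0]

    # keep[i, j] is True when sample i keeps modality j.
    keep = rng.random((batch, len(names))) >= p_drop

    # For samples that lost every modality, restore one at random.
    rows = np.flatnonzero(~keep.any(axis=1))
    keep[rows, rng.integers(len(names), size=rows.size)] = True

    return {m: features[m] * keep[:, j:j + 1] for j, m in enumerate(names)}

def fuse_and_regress(features, weight, bias):
    """Concatenate modality features and apply one linear layer.

    The linear layer is an assumption standing in for the regressor.
    """
    x = np.concatenate([features[m] for m in MODALITIES], axis=-1)
    return x @ weight + bias

# Example: 4 clips, 8-dimensional features per modality, 6 outputs.
rng = np.random.default_rng(0)
feats = {m: rng.normal(size=(4, 8)) for m in MODALITIES}
dropped = modality_dropout(feats, p_drop=0.3, rng=rng)
W = rng.normal(size=(32, 6)) * 0.1
b = np.zeros(6)
preds = fuse_and_regress(dropped, W, b)  # shape (4, 6)
```

Dropping whole modalities during training encourages the regressor not to rely on any single input stream, which is one plausible reason an optional branch such as motion can be added or removed with little effect.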
Entities
Institutions
- Hume-ABAW10
- EMI Challenge
- arXiv