ARTFEED — Contemporary Art Intelligence

Rank-Aware Fusion Improves Blended Emotion Recognition

ai-technology · 2026-05-22

A newly proposed multi-encoder framework for blended emotion recognition selectively fuses pre-extracted features from the most informative video and audio encoders. The method projects heterogeneous encoder features into a shared latent space, estimates per-sample encoder importance with an attention-based gating module, and fuses only the top-n encoders. It decouples prediction into separate presence and salience heads, aligned through probability-level fusion, and adds feature-level unsupervised domain adaptation, without pseudo-labeling, for robustness. In experiments on the BlEmoRE challenge, it outperforms strong individual encoders and naive multi-encoder baselines.
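The projection, gating, and top-n steps can be illustrated with a minimal sketch. This is not the authors' implementation: the linear projections, the dot-product scoring against a single gate vector, the renormalization over the selected encoders, and every name below (`project`, `top_n_fusion`, `gate_vector`) are assumptions made for illustration, and the numbers are toy values rather than learned parameters.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def project(features, weights):
    """Linear map into the shared latent space (one weight row per latent dim)."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def top_n_fusion(latents, gate_vector, n):
    """Fuse the n highest-scoring encoders for a single sample.

    latents: one latent vector per encoder, all of the same dimension.
    gate_vector: hypothetical scoring vector standing in for a learned gate.
    Returns the fused vector and the indices of the selected encoders.
    """
    # Score each encoder, then turn the scores into importance weights.
    scores = [sum(g * z for g, z in zip(gate_vector, lat)) for lat in latents]
    weights = softmax(scores)
    # Rank encoders by importance and keep only the top n.
    ranked = sorted(range(len(latents)), key=lambda i: weights[i], reverse=True)
    selected = ranked[:n]
    # Renormalize so the kept weights sum to 1 (an assumption of this sketch).
    norm = sum(weights[i] for i in selected)
    dim = len(latents[0])
    fused = [
        sum(weights[i] / norm * latents[i][d] for i in selected)
        for d in range(dim)
    ]
    return fused, selected


# Toy example: a 3-dim "video" feature and a 2-dim "audio" feature are
# projected into the same 2-dim latent space before fusion.
video_latent = project([1.0, 0.0, 2.0], [[0.5, 0.0, 0.5], [0.0, 1.0, 0.0]])
audio_latent = project([2.0, 1.0], [[1.0, 0.0], [0.0, 1.0]])
fused, selected = top_n_fusion([video_latent, audio_latent], [1.0, 1.0], n=1)
print(selected, fused)
```

Because the weights are computed per sample, different inputs can select different encoders, which is the sense in which the fusion is rank-aware.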

Key facts

  • Proposes a rank-aware multi-encoder framework for blended emotion recognition
  • Selectively fuses the top-n most informative encoders, using pre-extracted video and audio features
  • Projects heterogeneous encoder features into a shared latent space
  • Estimates sample-wise encoder importance via an attention-based gating module
  • Decouples prediction into presence and salience heads
  • Aligns heads through probability-level fusion
  • Incorporates feature-level unsupervised domain adaptation without pseudo-labeling
  • Outperforms strong individual encoders and naive multi-encoder baselines on the BlEmoRE challenge
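The summary does not say how the presence and salience heads are combined, so the following is only one plausible reading of "probability-level fusion". It assumes the presence head gives an independent sigmoid probability per emotion, the salience head gives a softmax distribution over emotions, and the two are merged by multiplying and renormalizing. The function name `fuse_heads`, the 0.5 threshold, and the logits are all hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_heads(presence_logits, salience_logits, threshold=0.5):
    """Combine the two heads at the probability level (assumed scheme).

    Returns a per-emotion presence decision and a salience distribution
    that has been down-weighted for emotions judged unlikely to be present.
    """
    presence = [sigmoid(v) for v in presence_logits]
    salience = softmax(salience_logits)
    # Multiply the two probabilities, then renormalize to sum to 1.
    joint = [p * s for p, s in zip(presence, salience)]
    total = sum(joint)
    fused_salience = [j / total for j in joint]
    present = [p >= threshold for p in presence]
    return present, fused_salience


# Toy logits for three emotions; the third is unlikely to be present.
present, fused_salience = fuse_heads([3.0, 2.0, -4.0], [1.0, 0.5, 1.0])
print(present, fused_salience)
```

Under this scheme an emotion the presence head rejects receives little salience mass even if the salience head alone rates it highly, which keeps the two outputs consistent with each other.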
