Sticky HDP-HMM Framework for Persistent Emotional State Tracking
A novel lightweight framework tracks persistent emotional states in conversation by applying sticky factorial HDP-HMMs to multimodal valence-arousal data from video, audio, and text. It models conversational emotion as a sequence of latent emotional regimes, addressing a limitation of utterance-level emotion recognition, which does not capture sustained phases. Evaluated with LLM-as-a-Judge, geometric, and temporal consistency metrics, the sticky HDP-HMM yields more interpretable regime sequences than standard Gaussian HMMs, at a significantly lower computational cost than LLM-based dialogue state tracking. The framework is intended to support communication understanding and guidance, particularly in clinical conversational settings.
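To illustrate what "sticky" means here, the following is a minimal sketch of the standard weak-limit (finite truncation) construction of sticky HDP-HMM transition rows, in which a self-transition bias kappa is added to the diagonal of each row's Dirichlet parameters. It is not the framework's implementation: the function names, the truncation level K, and the hyperparameter values are assumptions chosen for illustration, and the factorial structure and emission model are omitted.

```python
import random


def sample_dirichlet(params, rng):
    """Draw from a Dirichlet distribution by normalizing gamma variates."""
    # Guard against a zero parameter, which gammavariate rejects.
    draws = [rng.gammavariate(max(p, 1e-12), 1.0) for p in params]
    total = sum(draws)
    return [d / total for d in draws]


def sample_sticky_transitions(K, alpha, gamma, kappa, rng):
    """Weak-limit sticky HDP-HMM prior over a K x K transition matrix.

    beta  ~ Dirichlet(gamma / K, ..., gamma / K)   (shared regime weights)
    pi_j  ~ Dirichlet(alpha * beta + kappa * e_j)  (row j, extra mass on j)
    """
    beta = sample_dirichlet([gamma / K] * K, rng)
    rows = []
    for j in range(K):
        params = [alpha * beta[k] + (kappa if k == j else 0.0) for k in range(K)]
        rows.append(sample_dirichlet(params, rng))
    return beta, rows


def expected_self_transition(alpha, kappa, beta_j):
    """Prior mean of pi_jj: (alpha * beta_j + kappa) / (alpha + kappa)."""
    return (alpha * beta_j + kappa) / (alpha + kappa)


def simulate_regimes(rows, T, rng, start=0):
    """Simulate a length-T latent regime sequence from the transition rows."""
    z = [start]
    for _ in range(T - 1):
        z.append(rng.choices(range(len(rows)), weights=rows[z[-1]])[0])
    return z


if __name__ == "__main__":
    rng = random.Random(0)
    beta, rows = sample_sticky_transitions(K=4, alpha=1.0, gamma=4.0, kappa=9.0, rng=rng)
    print(simulate_regimes(rows, T=30, rng=rng))
```

With alpha = 1 and beta_j = 0.25, the prior mean self-transition probability is 0.25 when kappa = 0 and 0.925 when kappa = 9. A larger kappa therefore favors long runs in the same regime, which is the persistence that utterance-level labels do not capture.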
Key facts
- Framework uses sticky factorial HDP-HMMs
- Models emotion as latent regimes
- Inputs: video, audio, text
- Observations: multimodal valence-arousal representations; output: latent regime sequences
- Evaluated with LLM-as-a-Judge, geometric, and temporal consistency metrics
- Yields more interpretable regime sequences than standard Gaussian HMMs
- Lower computational cost than LLM-based methods
- Targeted at clinical conversational contexts
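The summary does not define its temporal consistency metrics. As a hypothetical proxy only, one simple way to inspect how persistent a decoded regime sequence is would be to collapse it into runs and compute the mean dwell length. The function names and regime labels below are illustrative assumptions, not part of the framework.

```python
from itertools import groupby


def regime_runs(labels):
    """Collapse a per-utterance regime sequence into (label, run_length) pairs."""
    return [(label, sum(1 for _ in group)) for label, group in groupby(labels)]


def mean_dwell(labels):
    """Average number of consecutive utterances spent in one regime."""
    runs = regime_runs(labels)
    return sum(n for _, n in runs) / len(runs) if runs else 0.0


if __name__ == "__main__":
    # Hypothetical decoded regimes for six utterances.
    decoded = ["calm", "calm", "tense", "tense", "tense", "calm"]
    print(regime_runs(decoded))
    print(mean_dwell(decoded))
```

A sequence that switches regime at almost every utterance has a mean dwell near 1, while a sequence with sustained phases has a larger value.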