MICA: A New RL Framework for Multi-Turn Emotional Support Dialogue

other · 2026-05-07

Researchers have introduced MICA (Multi-granularity Intertemporal Credit Assignment), a critic-free reinforcement learning framework for improving multi-turn emotional support dialogue with large language models. MICA targets two problems: sparse rewards and poor per-turn credit assignment. It derives both immediate and delayed credit from a single shared potential function defined over the user's structured support state. An Incremental Distance Reward measures how far each turn moves the user toward the target state, while Monte Carlo returns capture the delayed effects of a response. After scope-specific normalization, the two signals are combined into a mixed advantage that supports consistent per-turn optimization without matched-state comparisons. The framework is aimed at long-horizon emotional support tasks, in which each response influences future user states. The paper is available on arXiv under ID 2603.06194.
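
To make the two credit signals concrete, here is a minimal Python sketch of how an immediate and a delayed signal could both come from one potential function. It is an illustration under stated assumptions, not the paper's implementation: the support state is assumed to be a numeric vector, the potential is assumed to be the negative Euclidean distance to the target state, and the function names and the discount factor `gamma` are hypothetical.

```python
import math


def potential(state, target):
    """Hypothetical potential: negative Euclidean distance to the target state."""
    return -math.dist(state, target)


def incremental_distance_rewards(states, target):
    """Immediate credit: change in potential between consecutive user states."""
    phis = [potential(s, target) for s in states]
    return [phis[t + 1] - phis[t] for t in range(len(phis) - 1)]


def monte_carlo_returns(rewards, gamma=0.9):
    """Delayed credit: discounted sum of the rewards from each turn onward."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backward so each turn reuses the return of the turn after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Toy dialogue: user states observed before turn 1 and after each of 3 turns.
states = [(4.0, 0.0), (3.0, 0.0), (3.0, 0.0), (1.0, 0.0)]
target = (0.0, 0.0)

rewards = incremental_distance_rewards(states, target)
returns = monte_carlo_returns(rewards, gamma=0.5)
```

In this toy dialogue the second turn makes no immediate progress, so its incremental reward is zero, yet its return is positive because the following turn moves the user closer to the target. That gap between the two signals is the delayed effect the summary refers to.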

Key facts

MICA stands for Multi-granularity Intertemporal Credit Assignment
It is a critic-free RL framework for multi-turn emotional support dialogue
Addresses sparse rewards and poor per-turn credit assignment in LLMs
Uses Incremental Distance Reward to measure per-turn progress
Monte Carlo returns capture delayed effects of actions
Scope-specific normalization creates a mixed advantage signal
No matched-state comparison is needed
Paper available on arXiv: 2603.06194
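
The normalization and mixing step from the list above can be sketched as follows. This is a hedged illustration only: the source does not specify the normalization scopes or how the signals are weighted, so the sketch assumes each signal is z-scored over the values passed in and then blended with a hypothetical `weight` parameter. The function names are illustrative, not the paper's.

```python
from statistics import mean, pstdev


def zscore(values, eps=1e-8):
    """Standardize one signal over its own scope (assumed: the values given)."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / (sigma + eps) for v in values]


def mixed_advantage(immediate, delayed, weight=0.5):
    """Blend the normalized immediate and delayed signals per turn.

    `weight` is a hypothetical mixing coefficient, not taken from the paper.
    """
    norm_immediate = zscore(immediate)
    norm_delayed = zscore(delayed)
    return [
        weight * a + (1.0 - weight) * b
        for a, b in zip(norm_immediate, norm_delayed)
    ]


# Example per-turn signals for a three-turn dialogue (made-up numbers).
immediate = [1.0, 0.0, 2.0]
delayed = [1.5, 1.0, 2.0]
advantages = mixed_advantage(immediate, delayed)
```

Because both signals are standardized before mixing, neither dominates purely because of its scale, and no learned critic or matched-state baseline is needed to produce a per-turn advantage.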
