New Framework Improves Multi-Role Dialogue Summarization with Reasoning and Rewards
A new framework for summarizing dialogues involving multiple roles has been introduced by researchers, integrating cognitive-style reasoning with reward-based optimization. Initially, structured reasoning traces are extracted from a large teacher model to set up a reasoning-aware summarizer through a phased supervised fine-tuning process. Subsequently, GRPO is employed, utilizing a dual-principle reward that merges metric-based signals with criteria aligned to human preferences for essential information coverage. This innovative approach overcomes the shortcomings of current methods that focus solely on surface-level metrics such as ROUGE and BERTScore, which fail to ensure accuracy or alignment with human expectations. The framework seeks to enhance factual consistency and maintain role-specific information in intricate multi-speaker dialogues.
Key facts
- arXiv:2604.17188v2
- Announce Type: replace-cross
- Multi-role dialogue summarization requires modeling complex interactions among multiple speakers
- Existing methods optimize for ROUGE and BERTScore
- Proposed framework couples cognitive-style reasoning with reward-based optimization
- Method distills structured reasoning traces from a large teacher model
- Uses staged supervised fine-tuning to initialize a reasoning-aware summarizer
- Applies GRPO with a dual-principle reward
Entities
—