New Framework Improves Multi-Role Dialogue Summarization with Reasoning and Rewards

other · 2026-04-30

Researchers have introduced a framework for multi-role dialogue summarization that couples cognitive-style reasoning with reward-based optimization. The method first distills structured reasoning traces from a large teacher model and uses them in staged supervised fine-tuning to initialize a reasoning-aware summarizer. It then applies GRPO (Group Relative Policy Optimization) with a dual-principle reward that combines metric-based signals with human-aligned criteria for key-information coverage. The approach targets a limitation of existing methods, which optimize surface-level metrics such as ROUGE and BERTScore that do not guarantee factual accuracy or alignment with human expectations. The framework aims to improve factual consistency and preserve role-specific information in complex multi-speaker dialogues.
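To make the reward-based stage more concrete, here is a minimal Python sketch of how a dual-principle reward could feed GRPO-style group-relative advantages. It is illustrative only and is not the authors' implementation: the function names, the unigram-overlap stand-in for a metric such as ROUGE, the substring-based coverage check, and the 0.5 weighting are all assumptions, since the summary does not specify how the two reward components are defined or combined.

```python
from statistics import mean, pstdev


def unigram_f1(candidate: str, reference: str) -> float:
    """Toy metric-based signal: unigram-overlap F1 (a stand-in, not real ROUGE)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    # Clipped overlap: each word counts at most as often as it appears in both.
    overlap = sum(min(cand.count(w), ref.count(w)) for w in set(cand))
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def key_info_coverage(candidate: str, key_points: list[str]) -> float:
    """Toy coverage signal: fraction of key points mentioned in the summary.

    Hypothetical proxy for a human-aligned criterion; a real system would
    likely use a learned or model-based judge instead of substring matching.
    """
    if not key_points:
        return 0.0
    text = candidate.lower()
    hits = sum(1 for point in key_points if point.lower() in text)
    return hits / len(key_points)


def dual_reward(candidate: str, reference: str, key_points: list[str],
                metric_weight: float = 0.5) -> float:
    """Weighted mix of the two signals; the weight is an assumed value."""
    metric = unigram_f1(candidate, reference)
    coverage = key_info_coverage(candidate, key_points)
    return metric_weight * metric + (1.0 - metric_weight) * coverage


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """GRPO-style advantages: standardize each reward within its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical group of summaries sampled for one dialogue.
    reference = "the doctor advised rest and the patient agreed to return in a week"
    key_points = ["doctor", "rest", "return in a week"]
    group = [
        "the doctor advised rest and the patient will return in a week",
        "the patient talked to someone about feeling unwell",
        "the doctor advised rest",
    ]
    rewards = [dual_reward(s, reference, key_points) for s in group]
    for summary, adv in zip(group, group_relative_advantages(rewards)):
        print(f"{adv:+.2f}  {summary}")
```

Because advantages are computed relative to the other samples in the same group, a summary is pushed up only when it scores better than its siblings on the combined reward, which is what lets GRPO work without a separate value model.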

Key facts

arXiv:2604.17188v2
Multi-role dialogue summarization requires modeling complex interactions among multiple speakers
Existing methods optimize for ROUGE and BERTScore
Proposed framework couples cognitive-style reasoning with reward-based optimization
Method distills structured reasoning traces from a large teacher model
Uses staged supervised fine-tuning to initialize a reasoning-aware summarizer
Applies GRPO with a dual-principle reward

Entities

—

Sources

arXiv cs.AI — 2026-04-29