ARTFEED — Contemporary Art Intelligence

New Framework Improves Multi-Role Dialogue Summarization with Reasoning and Rewards

other · 2026-04-30

A new framework for summarizing dialogues involving multiple roles has been introduced by researchers, integrating cognitive-style reasoning with reward-based optimization. Initially, structured reasoning traces are extracted from a large teacher model to set up a reasoning-aware summarizer through a phased supervised fine-tuning process. Subsequently, GRPO is employed, utilizing a dual-principle reward that merges metric-based signals with criteria aligned to human preferences for essential information coverage. This innovative approach overcomes the shortcomings of current methods that focus solely on surface-level metrics such as ROUGE and BERTScore, which fail to ensure accuracy or alignment with human expectations. The framework seeks to enhance factual consistency and maintain role-specific information in intricate multi-speaker dialogues.

Key facts

  • arXiv:2604.17188v2
  • Announce Type: replace-cross
  • Multi-role dialogue summarization requires modeling complex interactions among multiple speakers
  • Existing methods optimize for ROUGE and BERTScore
  • Proposed framework couples cognitive-style reasoning with reward-based optimization
  • Method distills structured reasoning traces from a large teacher model
  • Uses staged supervised fine-tuning to initialize a reasoning-aware summarizer
  • Applies GRPO with a dual-principle reward

Entities

Sources