When2Speak Dataset Improves LLM Turn-Taking in Multi-Party Conversations
When2Speak is a synthetic dataset designed to teach large language models when to speak during multi-party discussions. It comprises over 215,000 examples drawn from 16,000 conversations with 2 to 6 participants, spanning a range of conversational styles, tones, and speaker dynamics. Each example models a binary SPEAK versus SILENT decision at a given conversational turn. The data is produced by a four-stage generation pipeline combining real-world grounding, structured augmentation, controlled transcript creation, and supervision suitable for fine-tuning. Both the dataset and the pipeline are fully open-sourced, supporting reproducibility and adaptation to specific conversational domains. The work targets a persistent weakness of current LLMs: in group interactions they frequently interrupt at the wrong moments, degrading conversational coherence.
Key facts
- Dataset named When2Speak
- Over 215,000 examples
- Derived from 16,000 conversations
- Involves 2 to 6 speakers
- Models SPEAK vs. SILENT decisions
- Four-stage generation pipeline
- Fully open-sourced
- Addresses interruption problem in multi-party conversations
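The SPEAK vs. SILENT framing above can be sketched as a supervised classification example. The field names and layout below are illustrative assumptions for a JSON-lines record, not the dataset's actual published schema:

```python
import json

# Hypothetical When2Speak-style training instance: given the conversation
# so far and a candidate speaker, the label says whether that speaker
# should take the floor (SPEAK) or hold back (SILENT).
example = {
    "conversation_id": "conv-000123",   # assumed identifier field
    "num_speakers": 4,                  # within the dataset's 2-6 range
    "context": [
        {"speaker": "A", "text": "Has anyone reviewed the draft yet?"},
        {"speaker": "B", "text": "I skimmed it this morning, one sec."},
    ],
    "candidate_speaker": "C",
    "label": "SILENT",  # C should wait: B still holds the floor
}

# Serialize and read back one JSON-lines record
line = json.dumps(example)
decoded = json.loads(line)
print(decoded["label"])
```

A binary turn-level label like this lets a model be fine-tuned with standard classification or next-token supervision, which is one plausible reading of the "supervision suitable for fine-tuning" stage described above.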