When2Speak Dataset Improves LLM Turn-Taking in Multi-Party Conversations
When2Speak is a synthetic dataset designed to teach large language models when to speak during multi-party discussions. It comprises over 215,000 examples drawn from 16,000 conversations with 2 to 6 participants, spanning a range of conversational styles, tones, and speaker dynamics. Each example models a binary SPEAK versus SILENT decision at a given conversational turn. The data is produced by a four-stage generation pipeline combining real-world grounding, structured augmentation, controlled transcript creation, and supervision suitable for fine-tuning. Both the dataset and the pipeline are fully open-sourced, supporting reproducibility and adaptation to specific conversational domains. The work targets a persistent weakness of current LLMs: in group interactions they frequently interrupt at the wrong moments, degrading conversational coherence.
Key facts
- Dataset named When2Speak
- Over 215,000 examples
- Derived from 16,000 conversations
- Involves 2 to 6 speakers
- Models SPEAK vs. SILENT decisions
- Four-stage generation pipeline
- Fully open-sourced
- Addresses interruption problem in multi-party conversations
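The SPEAK vs. SILENT framing above can be sketched as a supervised classification example. The field names and layout below are illustrative assumptions for a JSON-lines record, not the dataset's actual published schema:

```python
import json

# Hypothetical When2Speak-style training instance: given the conversation
# so far and a candidate speaker, the label says whether that speaker
# should take the floor (SPEAK) or hold back (SILENT).
example = {
    "conversation_id": "conv-000123",   # assumed identifier field
    "num_speakers": 4,                  # within the dataset's 2-6 range
    "context": [
        {"speaker": "A", "text": "Has anyone reviewed the draft yet?"},
        {"speaker": "B", "text": "I skimmed it this morning, one sec."},
    ],
    "candidate_speaker": "C",
    "label": "SILENT",  # C should wait: B still holds the floor
}

# Serialize and read back one JSON-lines record
line = json.dumps(example)
decoded = json.loads(line)
print(decoded["label"])
```

A binary turn-level label like this lets a model be fine-tuned with standard classification or next-token supervision, which is one plausible reading of the "supervision suitable for fine-tuning" stage described above.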