Google DeepMind Unveils Gemini 3.1 Flash TTS with Granular Audio Control
Google DeepMind has released Gemini 3.1 Flash TTS, a new text-to-speech model that introduces granular audio tags for precise control over AI-generated speech. The tags let users direct expressive audio generation with fine-grained adjustments, a significant step for AI speech synthesis. The release builds on DeepMind's ongoing research in generative audio and targets applications in accessibility, content creation, and interactive systems.
Key facts
- Gemini 3.1 Flash TTS is a new audio model from Google DeepMind.
- It introduces granular audio tags for precise control of AI speech.
- The model enables expressive audio generation.
- Google DeepMind presents it as the next generation of AI speech technology.
- The announcement was made on the Google DeepMind blog.
- The model is designed for applications such as accessibility, content creation, and interactive systems.
- Granular audio tags allow users to direct speech output.
- This is part of DeepMind's ongoing work in generative audio.
Entities
Institutions
- Google DeepMind