Google DeepMind Unveils Gemini 3.1 Flash TTS with Granular Audio Control

ai-technology · 2026-05-07

Google DeepMind has released Gemini 3.1 Flash TTS, a text-to-speech model that introduces granular audio tags for precise control over AI-generated speech. The tags let users direct expressive speech generation with fine-grained adjustments, and DeepMind presents the model as the next generation of its AI speech technology. The release builds on DeepMind's ongoing research in generative audio and targets applications in accessibility, content creation, and interactive systems.

Key facts

Gemini 3.1 Flash TTS is a new audio model from Google DeepMind.
It introduces granular audio tags that let users direct AI-generated speech with precise control.
The model enables expressive audio generation.
DeepMind describes it as the next generation of AI speech technology.
The announcement was made on the Google DeepMind blog.
The model is designed for applications such as accessibility and content creation.
This is part of DeepMind's ongoing work in generative audio.
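The announcement does not spell out the tag syntax, so the following is only a hypothetical sketch of how inline audio tags might be attached to spans of a script before it is sent to a speech model. It assumes square-bracket tags such as `[whispers]` that apply to the text that follows them; `TAG_PATTERN`, `Segment`, and `parse_tagged_script` are illustrative names, not part of any Gemini API, and the code does not call the model.

```python
import re
from dataclasses import dataclass, field

# Hypothetical tag syntax: a lowercase word or phrase in square brackets, e.g. "[whispers]".
# This is an illustrative assumption, not Gemini 3.1 Flash TTS's documented format.
TAG_PATTERN = re.compile(r"\[([a-z][a-z ]*)\]")


@dataclass
class Segment:
    """A run of script text plus the tags that immediately precede it."""
    tags: list[str] = field(default_factory=list)
    text: str = ""


def parse_tagged_script(script: str) -> list[Segment]:
    """Split a script into segments; each tag attaches to the text that follows it."""
    segments: list[Segment] = []
    pending_tags: list[str] = []
    position = 0
    for match in TAG_PATTERN.finditer(script):
        # Text between the previous tag and this one belongs to the tags collected so far.
        chunk = script[position:match.start()].strip()
        if chunk:
            segments.append(Segment(tags=pending_tags, text=chunk))
            pending_tags = []  # start a fresh list so earlier segments are not mutated
        pending_tags.append(match.group(1))
        position = match.end()
    # Whatever follows the last tag (or the whole script, if there are no tags).
    tail = script[position:].strip()
    if tail or pending_tags:
        segments.append(Segment(tags=pending_tags, text=tail))
    return segments


if __name__ == "__main__":
    script = "Welcome back. [whispers] This part is a secret. [excited] And now, the big reveal!"
    for seg in parse_tagged_script(script):
        print(seg.tags, repr(seg.text))
```

Consecutive tags such as `[slow][whispers]` accumulate on the same segment, which is one plausible way to combine several fine-grained adjustments on a single phrase.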
