ARTFEED — Contemporary Art Intelligence

Google DeepMind Unveils Gemini 3.1 Flash TTS with Granular Audio Control

ai-technology · 2026-05-07

Google DeepMind has released Gemini 3.1 Flash TTS, a text-to-speech model that introduces granular audio tags for precise control over AI-generated speech. The tags let users direct expressive audio generation through fine-grained adjustments. The model builds on DeepMind's ongoing research in generative audio and targets applications in accessibility, content creation, and interactive systems.

Key facts

  • Gemini 3.1 Flash TTS is a new text-to-speech model from Google DeepMind.
  • It introduces granular audio tags that let users direct AI speech output precisely.
  • The model enables expressive audio generation.
  • It is presented as the next generation of AI speech technology.
  • The announcement was made on the Google DeepMind blog.
  • Target applications include accessibility, content creation, and interactive systems.
  • The model is part of DeepMind's ongoing work in generative audio.

Entities

Institutions

  • Google DeepMind

Sources