Google's Gemma 4 AI models get 3x speed boost by predicting future tokens
Google has released Multi-Token Prediction (MTP) drafters for its Gemma 4 open models. The drafters act as small draft models for speculative decoding: they predict several future tokens cheaply, the main model verifies them, and accepted tokens yield up to 3x faster generation. The Gemma 4 models, launched this spring, are built on the same technology as Google's frontier Gemini AI but are designed to run locally on user hardware: at full precision on a single high-power AI accelerator, or on a consumer GPU with quantization. Google also moved Gemma 4 to the permissive Apache 2.0 license, replacing its previous custom licenses. By speeding up token generation, MTP helps offset the hardware limitations of running AI locally.
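The draft-then-verify loop behind speculative decoding can be sketched in a few lines. This is a toy greedy version with stand-in models, not Google's MTP implementation; the function names and the `k`/`max_new` parameters are illustrative assumptions.

```python
def speculative_decode(target, draft, prompt, k=4, max_new=12):
    """Toy greedy speculative decoding: a cheap draft model proposes k
    tokens, the target model verifies them, and the matching prefix is
    kept. `target` and `draft` each map a token sequence to its greedy
    next token. (Illustrative sketch, not Gemma's actual MTP drafter.)"""
    tokens = list(prompt)
    while len(tokens) < len(prompt) + max_new:
        # 1. Draft model proposes k tokens autoregressively (cheap).
        proposal = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))
        # 2. Target model checks each proposed position. A real system
        #    scores all k positions in one parallel forward pass; we do
        #    it sequentially here for clarity.
        accepted = 0
        for i in range(k):
            expected = target(tokens + proposal[:i])
            if proposal[i] == expected:
                accepted += 1
            else:
                # First mismatch: substitute the target's own token, so
                # the output always equals pure target-only decoding.
                proposal = proposal[:i] + [expected]
                accepted += 1
                break
        tokens += proposal[:accepted]
    return tokens[: len(prompt) + max_new]

# Stand-in models: the target always emits previous token + 1; the
# draft agrees except after multiples of 5, where it guesses wrong.
target = lambda seq: seq[-1] + 1
draft = lambda seq: seq[-1] + 1 if seq[-1] % 5 else seq[-1] + 2

out = speculative_decode(target, draft, [0], k=4, max_new=8)
# → [0, 1, 2, 3, 4, 5, 6, 7, 8], identical to target-only decoding
```

The key property is that verification makes the scheme lossless: whenever the draft is right, several tokens are emitted for one target pass, which is where the speedup comes from; whenever it is wrong, the output falls back to exactly what the target model alone would have produced.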
Key facts
- Google released Multi-Token Prediction (MTP) drafters for Gemma 4.
- MTP uses speculative decoding to predict future tokens.
- Gemma 4 models can achieve up to 3x faster generation with MTP.
- Gemma 4 was launched in spring 2026.
- Gemma 4 is built on the same technology as Gemini AI.
- Gemma 4 is designed to run locally on user hardware.
- Gemma 4 can run at full precision on a single high-power AI accelerator.
- Gemma 4 license is Apache 2.0.
- Gemma 4 can run on a consumer GPU with quantization.
- MTP addresses hardware limitations of local AI.
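The consumer-GPU path above relies on quantization, i.e. storing weights at lower precision to shrink memory use. A minimal sketch of symmetric per-tensor int8 quantization follows; this is a generic illustration of the idea, not Gemma 4's actual quantization scheme.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization (illustrative sketch):
    scale float weights into [-127, 127] and round to 8-bit integers,
    cutting storage 4x versus float32."""
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.0, 0.25, 0.9], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# w_hat approximates w to within about one quantization step (~scale)
```

The trade-off is a small, bounded rounding error per weight in exchange for a model that fits in consumer-GPU memory.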
Entities
Institutions
- Ars Technica