ARTFEED — Contemporary Art Intelligence

Google's Gemma 4 AI models get 3x speed boost by predicting future tokens

ai-technology · 2026-05-06

Google has released Multi-Token Prediction (MTP) drafters for its Gemma 4 open models. The drafters rely on speculative decoding, in which a lightweight drafter predicts several upcoming tokens that the full model then verifies, and make generation up to 3x faster. The Gemma 4 models, launched this spring, are built on the same technology as Google's frontier Gemini models but are designed to run locally on user hardware: at full precision on a single high-power AI accelerator, or on a consumer GPU with quantization. Google also moved Gemma 4 to the Apache 2.0 license, which is more permissive than the custom licenses that covered earlier Gemma releases. Because local hardware limits how quickly these models can generate text, MTP targets that bottleneck directly by speeding up token generation.
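The draft-and-verify idea behind speculative decoding can be sketched as follows. This is a minimal, generic greedy version using toy stand-in "models" (the `toy_target` and `toy_drafter` functions are hypothetical). It is not Google's MTP drafter implementation, whose internals the article does not describe. What it shows is the key property: the output is identical to plain greedy decoding with the target model, but the target is consulted in fewer rounds.

```python
from typing import Callable, List, Tuple

# A "model" here is any function mapping a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def greedy_decode(target: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Plain autoregressive decoding: one target call per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target(seq))
    return seq


def speculative_decode(target: NextToken, drafter: NextToken, prompt: List[int],
                       n_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding.

    The cheap drafter proposes k tokens and the target checks them. In real
    systems all k checks (plus one bonus prediction) come from a single batched
    forward pass, which is where the speedup comes from; here each verification
    round is counted as one target pass. Returns (sequence, target_passes).
    """
    seq = list(prompt)
    target_passes = 0
    while len(seq) - len(prompt) < n_new:
        # 1. Draft k tokens autoregressively with the cheap model.
        draft: List[int] = []
        for _ in range(k):
            draft.append(drafter(seq + draft))
        # 2. Verify the drafts against the target (one parallel pass in practice).
        target_passes += 1
        accepted: List[int] = []
        for tok in draft:
            expected = target(seq + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                # First mismatch: keep the target's own token and discard the rest.
                accepted.append(expected)
                break
        else:
            # Every draft was accepted: the same pass yields one extra token.
            accepted.append(target(seq + accepted))
        seq.extend(accepted)
    # Trim any overshoot beyond the requested number of new tokens.
    return seq[: len(prompt) + n_new], target_passes


def toy_target(seq: List[int]) -> int:
    # Hypothetical "large model": always counts up modulo 10.
    return (seq[-1] + 1) % 10


def toy_drafter(seq: List[int]) -> int:
    # Hypothetical "small model": agrees with the target except right after a 6.
    return 0 if seq[-1] == 6 else (seq[-1] + 1) % 10


if __name__ == "__main__":
    out, passes = speculative_decode(toy_target, toy_drafter, prompt=[0], n_new=12, k=4)
    assert out == greedy_decode(toy_target, [0], 12)  # same text as plain decoding
    print(out, passes)
```

On this toy input, the 12 new tokens come out of 3 verification rounds instead of 12 sequential target calls. Real speedups depend on how often the drafter agrees with the target and on how expensive each verification pass is.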

Key facts

  • Google released Multi-Token Prediction (MTP) drafters for Gemma 4.
  • MTP uses speculative decoding to predict future tokens.
  • Gemma 4 models can achieve up to 3x faster generation with MTP.
  • Gemma 4 was launched in spring 2026.
  • Gemma 4 is built on the same technology as Gemini AI.
  • Gemma 4 is designed to run locally on user hardware.
  • Gemma 4 can run at full precision on a single high-power AI accelerator.
  • Gemma 4 can run on a consumer GPU with quantization.
  • Gemma 4 is licensed under Apache 2.0.
  • MTP is meant to offset the hardware limits of running AI locally by speeding up token generation.
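The full-precision versus quantized distinction comes down to weight memory: parameter count times bytes per parameter. The sketch below does that arithmetic for a hypothetical 30-billion-parameter model (not an official Gemma 4 size). It assumes that "full precision" means 16-bit bf16 weights and that a single accelerator has 80 GB of memory while a consumer GPU has 24 GB. It counts weights only and ignores the KV cache, activations, and framework overhead, so real requirements are higher.

```python
# Rough weights-only memory estimate; ignores KV cache, activations and runtime overhead.
def weight_memory_gb(n_params: float, bits_per_weight: int) -> float:
    """Memory needed to hold the weights, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 30e9       # hypothetical model size, not an official Gemma 4 figure
ACCELERATOR_GB = 80   # assumed memory of a single high-end AI accelerator
CONSUMER_GPU_GB = 24  # assumed memory of a high-end consumer GPU

if __name__ == "__main__":
    for label, bits in [("bf16 (unquantized)", 16), ("8-bit", 8), ("4-bit", 4)]:
        gb = weight_memory_gb(N_PARAMS, bits)
        print(f"{label:>20}: {gb:5.1f} GB  "
              f"fits accelerator={gb <= ACCELERATOR_GB}  "
              f"fits consumer GPU={gb <= CONSUMER_GPU_GB}")
```

Under these assumptions the bf16 weights need about 60 GB, which fits on the 80 GB accelerator but not on the 24 GB card, while 4-bit quantization brings them down to about 15 GB.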

Entities

Institutions

  • Google
  • Ars Technica

Sources