Large language models normally generate text one token at a time. At each step, billions of parameters must be read from memory, so the processor's compute units sit largely idle while they wait for the next batch of weights to arrive.
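
A rough back-of-the-envelope estimate illustrates how severely this memory bottleneck caps generation speed. The sketch below is a minimal illustration; the parameter count and bandwidth figure are assumed values, not numbers from Google:

```python
# Memory-bound decoding: every weight must be read once per generated token,
# so memory bandwidth, not compute, sets the speed limit.
# All figures below are illustrative assumptions.
params = 8e9          # an 8B-parameter model
bytes_per_param = 2   # fp16/bf16 weights
bandwidth = 100e9     # ~100 GB/s, plausible for consumer hardware

bytes_per_token = params * bytes_per_param
tokens_per_second = bandwidth / bytes_per_token
print(f"upper bound: ~{tokens_per_second:.1f} tokens/s")  # ~6.2 tokens/s
```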

This is exactly where MTP (multi-token prediction) comes in. While the large main model waits on its data, a small, fast drafter model uses the spare compute capacity to propose several tokens at once. The main model then verifies these suggestions in a single batched pass: correct proposals are all accepted in one go, while at the first mismatch the remaining drafts are discarded and the main model's own token is used instead. Although two models run simultaneously, the small drafter fills idle cycles that would otherwise go to waste, producing the same output in significantly less time and, reportedly, without any loss in quality or accuracy.
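
The sketch below illustrates the general speculative-decoding loop this paragraph describes. It is a minimal illustration, not Google's implementation: the GPT-2 checkpoints, the greedy accept-on-exact-match rule, and the omission of KV caching are all simplifying assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def speculative_step(target, drafter, input_ids, k=4):
    """Draft k tokens with the small model, verify them in one batched
    pass of the large model, and keep the longest correct prefix."""
    draft = input_ids
    for _ in range(k):  # drafter proposes k tokens autoregressively
        logits = drafter(draft).logits[:, -1, :]
        draft = torch.cat([draft, logits.argmax(dim=-1, keepdim=True)], dim=-1)

    # A single forward pass of the target model scores every drafted position.
    target_logits = target(draft).logits
    accepted = input_ids
    n = input_ids.shape[1]
    for i in range(n - 1, draft.shape[1] - 1):
        expected = target_logits[:, i, :].argmax(dim=-1, keepdim=True)
        accepted = torch.cat([accepted, expected], dim=-1)
        if not torch.equal(expected, draft[:, i + 1 : i + 2]):
            break  # first mismatch: keep the target's own token, drop the rest
    return accepted

# Usage (distilgpt2 drafts for gpt2; both share one tokenizer/vocabulary):
tok = AutoTokenizer.from_pretrained("gpt2")
target = AutoModelForCausalLM.from_pretrained("gpt2").eval()
drafter = AutoModelForCausalLM.from_pretrained("distilgpt2").eval()
ids = tok("Speculative decoding works because", return_tensors="pt").input_ids
with torch.no_grad():
    ids = speculative_step(target, drafter, ids, k=4)
print(tok.decode(ids[0]))
```

Because verification only accepts a drafted token when it matches the large model's own greedy choice, the output is exactly what the large model would have produced alone; only the wall-clock time changes.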

According to Google, smartphones, local machines, and cloud applications all stand to benefit. The drafters are available under the permissive Apache 2.0 license on Hugging Face and Kaggle. The Gemma 4 open-weight model, introduced in early April, has already been downloaded more than 60 million times, the company says.

MTP drafters represent a meaningful engineering step toward making large language models practical on consumer hardware — closing the gap between raw model capability and real-world deployment speed. If the reported 3x throughput gains hold across diverse workloads, this could significantly accelerate the adoption of on-device AI without requiring hardware upgrades.