affects: inference, open-source
This drop
Product: Gemma 4
Type: Tooling
Shipped: May 7, 2026
Gemma 4 multi-token prediction drafters speed up inference via speculative decoding
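Google shipped multi-token prediction drafters for Gemma 4. The drafters enable speculative decoding with a shared KV cache: a small drafter proposes several tokens ahead and the main model verifies them. Google reports roughly 2x tokens/sec for the 26B model on an RTX PRO 6000 and about 2.2x for edge models on Apple Silicon.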
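For readers new to the technique, here is a minimal sketch of greedy speculative decoding in plain Python. It is not Gemma 4's drafter, does not model the shared KV cache, and uses no Google API. The functions `target_next` and `draft_next` are hypothetical toy stand-ins for a large model and a cheap drafter, and the greedy accept rule is a simplification of the sampling-based acceptance that production systems typically use.

```python
from typing import Callable, List, Tuple

Model = Callable[[List[int]], int]  # maps a token sequence to the next token


def target_next(tokens: List[int]) -> int:
    """Hypothetical 'large' model: a deterministic toy rule."""
    return (tokens[-1] * 3 + 1) % 17


def draft_next(tokens: List[int]) -> int:
    """Hypothetical cheap drafter: agrees with the target except
    when the last token is a multiple of 5."""
    guess = (tokens[-1] * 3 + 1) % 17
    return (guess + 1) % 17 if tokens[-1] % 5 == 0 else guess


def greedy_decode(model: Model, prompt: List[int], n: int) -> List[int]:
    """Baseline: one model call per generated token."""
    out = list(prompt)
    for _ in range(n):
        out.append(model(out))
    return out


def speculative_decode(
    target: Model, draft: Model, prompt: List[int], n: int, k: int = 4
) -> Tuple[List[int], int]:
    """Generate n tokens; return (tokens, number of verification passes).

    Each loop iteration stands for ONE target forward pass. A real
    implementation scores all k drafted positions in a single batched
    pass; here that pass is simulated with per-position calls.
    """
    out = list(prompt)
    goal = len(prompt) + n
    passes = 0
    while len(out) < goal:
        # 1. The drafter proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))

        # 2. The target verifies the proposal position by position.
        passes += 1
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok != expected:
                # First mismatch: keep the target's token, drop the rest.
                accepted.append(expected)
                break
            accepted.append(tok)
        else:
            # Every draft was accepted: the same pass yields a bonus token.
            accepted.append(target(out + accepted))

        out.extend(accepted)
    return out[:goal], passes


if __name__ == "__main__":
    baseline = greedy_decode(target_next, [1], 20)
    tokens, passes = speculative_decode(target_next, draft_next, [1], 20)
    print("same output as baseline:", tokens == baseline)
    print("target passes:", passes, "vs 20 for plain greedy decoding")
```

The output matches plain greedy decoding from the target, because every emitted token is either confirmed or supplied by the target. Any speedup comes from needing fewer target passes when the drafter guesses well, which is why reported gains vary with model, hardware, and workload.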