straced/layer
/drops

Drops · Google

Gemma 4 multi-token prediction drafters speed up inference via speculative decoding

Google shipped multi-token prediction drafters for Gemma 4 — speculative decoding with shared KV cache; Google reports roughly 2x tokens/sec for the 26B model on an RTX PRO 6000 and ~2.2x on Apple Silicon edge models.

ProductGemma 4TypeToolingShippedMay 7, 2026