diffusiongemma-26B-A4B-it

diffusiongemma‑26B‑A4B‑it is a 26B‑parameter (4B active) Gemma 4 MoE block‑diffusion model. It generates text in 256‑token chunks via iterative denoising and delivers up to ~6× higher per‑request throughput than the autoregressive Gemma 4 26B‑A4B. It offers a 256K‑token multimodal context window and is released under the Apache 2.0 license.
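
To make the chunked generation concrete, here is a minimal sketch that counts how many 256‑token chunks a response of a given length would span. It assumes that a partially filled final chunk still counts as one full chunk, which the description above does not confirm. `chunks_needed` is a hypothetical helper name, not part of any model API.

```python
import math

CHUNK_SIZE = 256  # tokens per chunk, taken from the model description


def chunks_needed(response_tokens: int, chunk_size: int = CHUNK_SIZE) -> int:
    """Return how many chunks cover `response_tokens` tokens.

    Assumption: a partially filled final chunk counts as a whole chunk.
    """
    if response_tokens < 0:
        raise ValueError("response_tokens must be non-negative")
    return math.ceil(response_tokens / chunk_size)


if __name__ == "__main__":
    for n in (100, 256, 1000):
        print(f"{n} tokens -> {chunks_needed(n)} chunk(s)")
```

Under this assumption, a 1,000‑token response spans four chunks, so the number of denoising rounds grows with the number of chunks rather than with each individual token.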

Core Model

Chat

Getting Started

Step 1

Step 2

Paste the URL of the Hugging Face repository: https://huggingface.co/google/diffusiongemma-26B-A4B-it

Step 3

Choose the GPU machine to deploy on.

That’s all! The model is ready to use within a few minutes, with no infrastructure complexity.

Additional Information

Credits to Google

Applications & Use Cases

High‑throughput content generation services (summaries, drafts, marketing copy, documentation) where batched requests and total completion time matter more than time to first token.
Long‑context multimodal analysis over documents plus images (and short videos via frames) using the 256K context window and thinking mode for large‑batch offline processing.
Function‑calling and agent backends that execute multi‑step reasoning in a single large response, benefiting from the MoE efficiency (4B active) and block‑wise decoding.
Cost‑sensitive deployments that want near‑Gemma‑4‑31B quality with better GPU utilization and higher tokens‑per‑second per request on a single H100 or similar GPU.