Google Gemma 4 Is Out Today and the Numbers Are Hard to Ignore

Google dropped something today: Gemma 4, the newest generation of its open-weight model family, built from the same research stack that powers Gemini 3. Four models, Apache 2.0 license, and a claim that sounds like a direct challenge to the rest of the industry: “unprecedented intelligence per parameter.”

Let’s break down what that actually means, and why it matters even if you’re not a developer.

Four Models for Every Setup

Gemma 4 comes in four sizes, and Google has been unusually specific about where each one fits:

  • Gemma 4 E2B (2 billion parameters) — designed for smartphones, Raspberry Pi, and Jetson Nano devices. Google says it runs with near-zero latency on phones. Supports audio input natively.
  • Gemma 4 E4B (4 billion parameters) — same edge-device focus, more capable. Also handles audio. The “E” stands for “Effective,” meaning Google engineered these to punch above their weight on resource-constrained hardware.
  • Gemma 4 26B MoE (26 billion parameters, Mixture of Experts) — the smart middle ground. MoE architecture means the model only activates a fraction of its parameters for any given token, making it far cheaper to run than a traditional 26B dense model (there's a toy sketch of the routing idea right after this list). Ranked #6 globally on Arena AI.
  • Gemma 4 31B Dense (31 billion parameters) — the flagship. Ranked #3 on Arena AI’s text leaderboard, against models 20 times larger.
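
To make the MoE bullet concrete, here is a toy sketch of top-k expert routing in plain Python with NumPy. This illustrates the general technique, not Google's actual implementation; the expert count, hidden size, and top_k value are made up for the demo.

```python
import numpy as np

def moe_layer(x, experts, router, top_k=2):
    """Toy Mixture-of-Experts layer: route one token to its top-k experts.

    x       : (d,) hidden vector for a single token
    experts : list of (d, d) weight matrices, one per expert
    router  : (num_experts, d) gating matrix
    """
    # The router scores every expert; softmax turns scores into weights.
    logits = router @ x
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    # Only the top-k experts actually run; the rest are skipped entirely.
    # That skip is why an MoE model needs far less compute per token than
    # a dense model of the same size, even though every parameter still
    # has to sit in memory.
    chosen = np.argsort(probs)[-top_k:]
    out = np.zeros_like(x)
    for i in chosen:
        out += probs[i] * (experts[i] @ x)
    return out / probs[chosen].sum()  # renormalize over the chosen experts

# Tiny demo: 8 experts, hidden size 16, only 2 experts fire per token.
rng = np.random.default_rng(0)
d, n_experts = 16, 8
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]
router = rng.normal(size=(n_experts, d))
print(moe_layer(rng.normal(size=d), experts, router).shape)  # (16,)
```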

That 31B ranking deserves a moment. A 31B model finishing third on a leaderboard that includes models with hundreds of billions of parameters is not a minor technical footnote. It’s the entire pitch.

What Can These Things Actually Do?

All four Gemma 4 models share a common baseline of capabilities:

  • Multimodal input — every model processes video and images. Useful for OCR, chart understanding, and visual analysis without a separate vision model.
  • Audio understanding — the E2B and E4B edge models handle speech input natively. Practical for on-device voice assistants that don’t send data to a remote server.
  • Long context windows — 128K tokens for the edge models, 256K for the larger ones. At 256K you can feed a book-length document or a small codebase in a single prompt.
  • 140+ languages — trained natively, not via translation layers.
  • Agentic workflows — native function-calling, structured JSON output, system instructions. Google is explicitly positioning these for autonomous agent use, not just chat (see the sketch after this list).
  • Offline code generation — you can run a capable local code assistant without an internet connection. For developers with sensitive codebases or patchy connectivity, this is genuinely useful.
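
To give a feel for the function-calling piece, here is a minimal Python sketch of the usual loop: describe a tool as a JSON schema, have the model reply in structured JSON, parse it, and dispatch. The schema style and the get_weather tool are assumptions for illustration, Gemma 4's exact wire format isn't specified in this post, and the model reply is stubbed so the snippet runs on its own.

```python
import json

# Hypothetical tool schema in the JSON-schema style commonly used for
# function calling. Gemma 4's exact expected format may differ.
TOOLS = [{
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

def dispatch(tool_call: dict) -> str:
    """Route a model-emitted tool call to real code."""
    if tool_call["name"] == "get_weather":
        # Stub; a real agent would call a weather API here.
        return f"18°C and overcast in {tool_call['arguments']['city']}"
    raise ValueError(f"unknown tool: {tool_call['name']}")

# With structured JSON output enabled, the model replies with machine-
# parseable JSON instead of prose. Stubbed here so this runs offline:
model_reply = '{"name": "get_weather", "arguments": {"city": "Nairobi"}}'

call = json.loads(model_reply)
print(dispatch(call))  # -> 18°C and overcast in Nairobi
```

The point of structured output is that the json.loads step stops being a gamble: the model is constrained to emit valid JSON, so the agent loop never has to regex tool calls out of prose.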

The Apache 2.0 Shift Is a Bigger Deal Than It Sounds

Previous Gemma models shipped under Google’s own custom license, which carried usage restrictions and a prohibited-use policy. Gemma 4 is Apache 2.0, one of the most permissive open-source licenses in existence. You can use it commercially, modify it, redistribute it, and build products on top of it. No royalties, no special agreements, no asking Google for permission.

Google’s own framing: “complete developer flexibility and digital sovereignty; granting you complete control over your data, infrastructure and models.”

That’s a direct response to the narrative that AI means handing your data to a big tech company. If you run Gemma 4 locally, your data doesn’t go anywhere. For enterprises with privacy requirements, healthcare organizations, or anyone operating under strict data regulations, this changes the calculus on whether local AI is viable.

The timing matters too. Meta’s Llama 4 has dominated the open-weight AI conversation for months. Google is signaling it wants back in, with models that perform better at equivalent parameter counts and a license that’s arguably cleaner.

How Does It Stack Up Against the Competition?

Arena AI’s text leaderboard is the closest thing the industry has to an impartial benchmark, because it uses crowdsourced human preferences rather than automated test suites that labs can tune their models against. Gemma 4 31B at #3 means real humans, comparing real outputs, preferred it over most of what’s available.

The 26B MoE at #6 is also worth noting. MoE architectures have a reputation for being fast and cheap to run but sometimes inconsistent in quality. A top-6 ranking suggests Google managed to keep quality high while keeping compute requirements lower than a comparable dense model.

For context: most proprietary models from major labs cluster in the top 10-20 on this leaderboard. Gemma 4’s two largest models are competing directly with them, not just within the open-weight tier.

This fits a broader pattern worth watching. The open-source AI wave has been steadily closing the gap between what you can run locally and what requires a cloud API. Models like Qwen, Mistral, and now Gemma 4 keep moving that line. The AI coding war that dominated 2025 is now playing out on a wider front, with open-weight models claiming territory that was proprietary six months ago.

Where to Get It

Google has made Gemma 4 available through the standard developer channels:

  • Hugging Face — model weights at google/gemma-4
  • Kaggle — for experimentation without local setup
  • Ollama — the simplest path if you want to run it locally with a single command
  • Google AI Studio — the 31B and 26B variants in a hosted environment
  • Google AI Edge Gallery — for the E2B and E4B edge models

The Ollama path is worth calling out specifically. If you have a capable enough laptop, you can be running a top-6-on-Arena model locally within minutes. That was not the situation with open-weight releases at this quality level a year ago.
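
As a rough sketch of what that looks like from Python, using the official ollama client library (pip install ollama). The "gemma4" model tag is a guess on my part; check the Ollama model library for whatever tag Google actually publishes.

```python
import ollama  # official Ollama Python client; talks to the local server

response = ollama.chat(
    model="gemma4",  # hypothetical tag; substitute the real one from `ollama list`
    messages=[{
        "role": "user",
        "content": "Write a Python function that reverses a linked list.",
    }],
)
print(response["message"]["content"])
```

On the command line it's even shorter: a single ollama run invocation with the model tag, which pulls the weights on first use and drops you into an interactive prompt.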

Why This Matters Beyond Developers

The practical implication of models like Gemma 4 isn’t just about code. It’s about what kind of AI infrastructure becomes viable outside of Big Tech’s cloud services.

Hospitals that can’t send patient data to OpenAI can run Gemma 4 on-premises. Schools in regions with unreliable internet can deploy it locally. Independent developers in countries where API costs are prohibitive can build on it without ongoing subscription fees. Journalists working in environments where US cloud services carry legal or safety risks have an option that doesn’t require those services.

There’s also a competitive dynamics angle. The more capable open-weight models become, the harder it is for any single provider to maintain lock-in. That’s good for users and problematic for the kind of platform monopolies that form in AI markets. The top-tier closed models still have advantages in specific benchmarks, but the gap is narrowing in ways that weren’t true twelve months ago.

Gemma 4 isn’t the end of that story. But it’s a meaningful point in the trajectory.

The Short Version

Four open-weight models, Apache 2.0 license, top-3 on Arena AI’s text leaderboard, runs on a phone or a workstation. Built from the same research stack as Gemini 3. Available today on Hugging Face, Kaggle, and Ollama.

Google needed a statement in the open-source AI space after Meta’s Llama 4 dominated the conversation. Gemma 4 is that statement. Whether the benchmark numbers hold up under real-world use is the next question, but the initial figures are hard to wave away.

