Google has launched Gemma 4



Built from the same research as Gemini 3, the new family ranges from a 2B edge model running on a Raspberry Pi to a dense 31B model that currently sits third in Arena AI’s open model rankings. The Apache 2.0 license is a significant change from previous versions of Gemma.

Google has launched Gemma 4, the latest generation of its family of open-weight models, in four sizes designed to cover everything from on-device inference on smartphones to workstation-style deployments.

The models are built from the same research and technology that underpins Gemini 3, Google’s proprietary frontier model, and are released under an Apache 2.0 license, more permissive terms than previous generations of Gemma and a change that Hugging Face co-founder Clément Delangue described as “a great milestone.”

Demis Hassabis, CEO of Google DeepMind, called the new models “the best open models in the world for their respective sizes.”

The four variants are the Effective 2B (E2B) and Effective 4B (E4B) edge models, designed to run on-device on phones, Raspberry Pi, and Jetson Nano hardware and developed in collaboration with the Pixel team, Qualcomm, and MediaTek; and the 26B Mixture-of-Experts (MoE) and 31B Dense models, intended for offline use on developer hardware and consumer GPUs.

The 31B Dense model currently ranks third among all open models in Arena AI’s text rankings; the 26B MoE ranks sixth. Google claims that both larger models outperform models up to 20 times their size on that benchmark.

The 31B’s unquantized weights fit on a single 80GB Nvidia H100 GPU; quantized versions run on commodity hardware.
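The single-GPU fit follows from simple arithmetic. A back-of-envelope sketch, assuming standard 16-bit weights (2 bytes per parameter) and 4-bit quantization (0.5 bytes per parameter), and ignoring activation and KV-cache overhead (these assumptions are not from Google’s announcement):

```python
# Rough VRAM footprint of model weights alone (1 GB = 1e9 bytes).
# Assumed precisions: bf16 = 2 bytes/param, int4 = 0.5 bytes/param.

def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate weight footprint in GB for a dense model."""
    return params_billion * 1e9 * bytes_per_param / 1e9

bf16_gb = weight_memory_gb(31, 2.0)   # ~62 GB: fits within an 80GB H100
int4_gb = weight_memory_gb(31, 0.5)   # ~15.5 GB: in reach of consumer GPUs
print(f"bf16: {bf16_gb:.1f} GB, int4: {int4_gb:.1f} GB")
```

At 16-bit precision the 31B weights come to roughly 62 GB, leaving headroom on an 80GB H100 for activations and the KV cache; a 4-bit quantization shrinks the weights to about 15.5 GB.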

All four models are multimodal, processing video and images natively, and are trained on more than 140 languages. The E2B and E4B models also support native audio input for voice recognition. Context windows are 128,000 tokens for the edge models and 256,000 for the two larger variants.

In terms of capability, Google highlights improvements to multi-step reasoning, native function calling, and structured JSON output for agent workflows and offline code generation. On performance, the Android Developer Blog notes that the E2B model runs three times faster than the E4B, while the Edge family overall is up to four times faster than previous versions of Gemma and uses up to 60% less battery.
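Structured JSON output is what makes function calling usable in an agent loop: the host program can parse the model’s reply mechanically instead of scraping free text. A minimal sketch of the host side, with an invented tool registry and a fabricated model reply for illustration (none of this comes from Gemma’s actual API):

```python
import json

# Hypothetical tool registry. The model would be instructed via its
# system prompt to reply only with JSON: {"tool": ..., "args": {...}}.
TOOLS = {
    "get_weather": lambda city: f"Sunny in {city}",
}

def dispatch(model_output: str) -> str:
    """Parse a structured-JSON model reply and invoke the named tool."""
    call = json.loads(model_output)   # raises ValueError on malformed JSON
    tool = TOOLS[call["tool"]]        # raises KeyError for unknown tools
    return tool(**call["args"])

# Fabricated example of a structured model reply:
reply = '{"tool": "get_weather", "args": {"city": "Berlin"}}'
print(dispatch(reply))  # prints: Sunny in Berlin
```

The constrained JSON format is what lets `dispatch` stay this small; with free-form text output, the host would need brittle regex parsing and error recovery instead.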

The E2B and E4B models also form the basis of Gemini Nano 4, Google’s next-generation model for Android devices, which will reach consumer devices later this year.

Gemma has amassed more than 400 million downloads and more than 100,000 community-created variants since its first release, a figure Google points to as evidence of wide-scale adoption by developers.

Gemma 4 is available immediately on Hugging Face, Kaggle and Ollama, with the 31B and 26B models accessible through Google AI Studio and the edge models through the AI Edge Gallery.

The Apache 2.0 licensing decision is the most important commercial signal of the release: it removes restrictions that prevented some enterprise and commercial deployments under Gemma’s previous terms, opening the ecosystem to a broader range of production use cases.


