Gemma 4 models use a training trick to reduce their memory footprint

The promotional graphic for the Gemma 4 QAT models.

TL;DR

Gemma 4 models are now available for download with quantization-aware training (QAT), which reduces the size and memory footprint of the models.
With QAT, these open source models retain quality better than versions quantized after training with post-training quantization (PTQ).
The QAT-optimized Gemma 4 models are available in five sizes: Gemma 4 E2B, Gemma 4 E4B, Gemma 4 12B, Gemma 4 26B A4B and Gemma 4 31B.

Following the launch of its laptop-friendly Gemma 4 12B model earlier this week, Google has released new Gemma 4 model checkpoints with quantization-aware training. Quantization reduces the amount of memory required to run lightweight models. The standard method is post-training quantization (PTQ), which quantizes the model after training but can result in weaker performance. The latest versions of Gemma 4 use quantization-aware training (QAT) to reduce the loss in model quality and speed up decoding, according to Google's blog post.
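To make the trade-off concrete, here is a minimal sketch of the quantize-then-dequantize round trip that causes the quality loss. It assumes simple symmetric per-tensor quantization and made-up weight values; it is not Google's scheme, and the function names are illustrative only. PTQ applies this rounding once after training, whereas QAT typically simulates it during training so the model can learn to tolerate the error.

```python
def quantize(values, bits=4):
    """Symmetric per-tensor quantization: map floats to signed integer codes."""
    qmax = 2 ** (bits - 1) - 1                      # 7 for 4-bit, 127 for 8-bit
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0  # float step between codes
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]


def fake_quantize(values, bits=4):
    """Quantize then dequantize, exposing the rounding error."""
    codes, scale = quantize(values, bits)
    return dequantize(codes, scale)


def max_error(values, bits):
    """Largest absolute difference between original and round-tripped values."""
    approx = fake_quantize(values, bits)
    return max(abs(v - a) for v, a in zip(values, approx))


weights = [0.82, -0.31, 0.05, -0.77, 0.40]  # made-up example weights
print("4-bit max error:", max_error(weights, 4))
print("8-bit max error:", max_error(weights, 8))
```

Fewer bits mean a coarser grid and a larger rounding error, which is why aggressive quantization benefits most from being accounted for during training.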

Google says that incorporating quantization into the training process results in checkpoints that perform better than models quantized with PTQ. The compressed models work well on phones and laptops thanks to a custom mobile quantization scheme. This involves the use of precomputed settings, 2-bit compression on certain parts of the model and vocabulary list, and short-term memory compression. For the user, this results in a smaller model that consumes less system memory.

Various model sizes are available with QAT optimization, including Gemma 4 E2B, Gemma 4 E4B, Gemma 4 12B, Gemma 4 26B A4B and Gemma 4 31B. The smaller versions, like the text-only Gemma 4 E2B model, require less than a gigabyte of memory to run. With such modest resource requirements, these small Gemma 4 checkpoints are well suited to running on phones.

Google shared the approximate memory requirements to load the new Gemma 4 models with QAT in various sizes:

The memory requirements of the Gemma 4 model sizes.
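As a rough guide to why lower bit widths matter, the memory needed for weights alone can be estimated as parameter count times bits per weight, divided by eight. The sketch below assumes a hypothetical model with exactly 12 billion parameters stored at a single uniform precision. These are not Google's published figures, and real footprints also include runtime overhead and may mix precisions across layers.

```python
def weight_memory_gb(num_params, bits_per_weight):
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Hypothetical: treats "12B" as exactly 12 billion parameters.
params = 12e9

for bits in (16, 8, 4):
    print(f"{bits}-bit weights: ~{weight_memory_gb(params, bits):.1f} GB")
```

Under these assumptions, going from 16-bit to 4-bit weights cuts the weight memory to a quarter.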

There are four different formats of Gemma 4 QAT models available for download: Unquantized QAT Checkpoints, GPT Generated Unified Format (GGUF), Mobile Optimized, and Compressed Tensors. These models retain “bfloat16-like quality while dramatically reducing the memory requirements to load the model,” according to Google.

After downloading the Gemma 4 QAT model weights, users can run the checkpoints on their phones, laptops or desktop computers. The mobile and desktop models are available on Hugging Face, as well as in LM Studio.
