
During the WWDC26 keynote, Apple announced its third generation of Apple Foundation Models (AFM), comprising five models: some run on-device, some are cloud-based, and one runs on Google servers powered by Nvidia chips. Here’s a breakdown of how it will work.
A little background
When Apple first announced its foundation models in 2024, the lineup included an on-device language model with about 3 billion parameters and “a larger server-based language model available with Private Cloud Compute and running on Apple silicon servers,” as the company put it at the time.
Private Cloud Compute was an ambitious undertaking, as it aimed to offer cloud-based artificial intelligence capabilities while preserving the same privacy guarantees that users expect from on-device processing.
For this reason, it was essential to keep everything in-house. Private Cloud Compute ran in Apple data centers, on servers powered by Apple silicon. Still, third-party security researchers could independently verify its privacy guarantees.
However, as Apple struggled to get its AI aspirations off the ground, the company partnered with Google to use Gemini as the backbone of its new AI efforts, the results of which it announced earlier this week during the WWDC26 keynote.
Apple’s new foundation models
The third generation of AFM includes five models: AFM 3 Core and AFM 3 Core Advanced, which are on-device models, and AFM 3 Cloud, ADM 3 Cloud (Image), and AFM 3 Cloud Pro, which are server-based. The D in ADM 3 Cloud (Image) stands for diffusion, a technology we have covered here in the past.
With the exception of AFM 3 Cloud Pro, all of the models were built to run on Apple silicon. AFM 3 Cloud Pro, meanwhile, runs on NVIDIA GPUs hosted on Google Cloud.
This was possible because Apple extended its Private Cloud Compute architecture to third-party infrastructure for the first time, “while maintaining Apple’s powerful security and privacy protections,” according to the company.
As for the models themselves, here’s a breakdown of each one, as Apple explains:
- AFM 3 Core, the next generation of our dense 3 billion parameter model that offers a step forward in quality.
- AFM 3 Core Advanced, our most powerful device model. It is natively multimodal and enables useful features such as expressive voices and higher precision dictation. Based on Apple’s cutting-edge research, this 20 billion parameter model uses a sparse architecture and activates only 1 to 4 billion parameters at a time, depending on the application. AFM 3 Core Advanced is unlocked and optimized for our most capable Apple Silicon systems.
- AFM 3 Cloud, our server-side workhorse, optimized for speed, efficiency and performance.
- ADM 3 Cloud (Image), for image generation and editing, unlocking advanced photo editing tools, the new Image Playground and more.
- AFM 3 Cloud Pro, our most capable server-based model, driving our most demanding use cases, such as agent tooling and complex reasoning.
The highlights here are AFM 3 Core Advanced and AFM 3 Cloud Pro.
Starting with AFM 3 Core Advanced, it packs 20 billion parameters into a single on-device model, which is no small feat. Most on-device models aimed at consumers tend to stay in the single-digit billions of parameters.
To make AFM 3 Core Advanced work well, Apple used a sparse architecture that activates up to 4 billion parameters at a time, depending on the prompt, rather than a dense architecture that would need to keep all 20 billion parameters active for each request.
Although conceptually similar to the Mixture of Experts approach, this selective activation is based on a technique developed by Apple and detailed in the interesting study Instruction-Following Pruning for Large Language Models, released a year ago.
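To give a rough sense of what selective activation means in practice, here is a toy sketch in Python. It is not Apple's implementation: the function names, the tiny layer sizes, and the per-unit scores (which a real system would predict from the prompt) are all assumptions made for illustration. It shows a feed-forward layer in which only the highest-scoring hidden units are computed, plus the simple arithmetic behind the 1 to 4 billion active parameters out of 20 billion.

```python
def active_fraction(active_billions, total_billions):
    """Share of the model's parameters that are active for a request."""
    return active_billions / total_billions


def select_active_units(scores, budget):
    """Return a 0/1 mask keeping the `budget` highest-scoring hidden units.

    `scores` is a hypothetical per-unit relevance score; in a real system
    it would be predicted from the prompt rather than hard-coded.
    """
    keep = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:budget]
    mask = [0] * len(scores)
    for i in keep:
        mask[i] = 1
    return mask


def masked_ffn(x, w_in, w_out, mask):
    """Toy feed-forward layer with ReLU; masked-out units are never computed."""
    hidden = [
        max(0.0, sum(xi * w for xi, w in zip(x, w_in[j]))) if mask[j] else 0.0
        for j in range(len(w_in))
    ]
    n_out = len(w_out[0])
    return [sum(h * w_out[j][k] for j, h in enumerate(hidden)) for k in range(n_out)]


# Illustrative values only: 2 inputs, 4 hidden units, 1 output.
x = [1.0, 2.0]
w_in = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]
w_out = [[1.0], [1.0], [1.0], [1.0]]
scores = [0.9, 0.1, 0.8, 0.3]  # assumed to come from a prompt-conditioned predictor

mask = select_active_units(scores, budget=2)
sparse_out = masked_ffn(x, w_in, w_out, mask)
dense_out = masked_ffn(x, w_in, w_out, [1, 1, 1, 1])
```

With the figures from the article, `active_fraction(1, 20)` and `active_fraction(4, 20)` work out to 5% and 20% of the full model. The sparse and dense outputs differ in this toy example, which is the trade-off a real system has to manage: the selection step needs to keep the units that matter most for the prompt at hand.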

As for AFM 3 Cloud Pro, this is the one that runs on external infrastructure. You can read some of the technical details of this expansion in this article posted on Apple’s security blog earlier this week, but here’s the most important part:
Building on this, Apple and Google collaborated to develop capabilities that go far beyond a traditional confidential computing implementation:
- We do not rely solely on confidential computing technologies to mitigate attacks that leverage privileged access outside of a confidential virtual machine, including side-channel attacks. We consider all components (from firmware to host and guest operating system stacks to application code) to be part of our trusted computing base, subject to our verifiable transparency and no-privileged-access guarantees.
- To mitigate the risk of supply chain attacks, we maintain a cryptographically verifiable, append-only ledger of all Google Cloud hardware that is part of the PCC fleet. For components that could be abused to extract user data if compromised, our software certification is based on at least two separate roots of trust from independent vendors.
- Even when deployed with confidential computing, we believe the inference stack should be designed with privacy and security in mind from the beginning. PCC on Google Cloud leverages many of the same architectural security patterns as PCC on Apple silicon to implement these layered protections: initial analysis of network data for each request occurs in a dedicated process within its own namespace, shared inference software is recycled with a short lifetime, and certified keys are stored in a separate, dedicated confidential virtual machine, isolated from external input.
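Two of the ideas in that excerpt, the append-only ledger and the requirement for at least two independent roots of trust, can be illustrated with a short Python sketch. This is a simplified illustration under stated assumptions, not Apple's or Google's design: a plain SHA-256 hash chain stands in for whatever verifiable structure the real ledger uses, the record fields and root names are hypothetical, and the roots-of-trust check only counts names rather than verifying any signatures.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def entry_hash(prev_hash, record):
    """Hash a record together with the previous entry's hash."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + payload).hexdigest()


class AppendOnlyLedger:
    """Toy hash-chained ledger: entries can be added, never edited in place."""

    def __init__(self):
        self.entries = []  # list of (record, hash) tuples

    def head(self):
        return self.entries[-1][1] if self.entries else GENESIS

    def append(self, record):
        h = entry_hash(self.head(), record)
        self.entries.append((record, h))
        return h


def verify(entries, expected_head):
    """Recompute the chain; any edited, dropped, or reordered entry breaks it."""
    prev = GENESIS
    for record, h in entries:
        if entry_hash(prev, record) != h:
            return False
        prev = h
    return prev == expected_head


def endorsed_by_two_roots(endorsements, trusted_roots):
    """True if at least two distinct trusted roots endorsed a component."""
    return len(set(endorsements) & set(trusted_roots)) >= 2


# Hypothetical hardware records, for illustration only.
ledger = AppendOnlyLedger()
ledger.append({"serial": "node-001", "kind": "gpu-host"})
ledger.append({"serial": "node-002", "kind": "gpu-host"})
published_head = ledger.head()

tampered = [(dict(record, serial="node-999"), h) for record, h in ledger.entries]
```

Anyone holding `published_head` can re-run `verify` over the full list of entries and detect a rewritten history, which is the property that "cryptographically verifiable" and "append-only" are pointing at. Production transparency logs generally use more elaborate structures so that a single entry can be checked without replaying the whole log.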
On its Machine Learning Research blog, Apple says that the five models “shared a common initial foundation before specializing in their respective architectures and use cases, adding multimodal capabilities such as audio, image understanding, long context reasoning, and high-quality visual generation.”
The company adds that to train these models, it used “a combination of data that includes publicly available information, data licensed or purchased from third parties, open source data, data obtained through dedicated studies, and synthetic data.” Apple also emphasizes that the training process did not include user data or interactions and that web publishers can opt out of foundation model training.
The results
Apple says it conducted extensive human evaluations of its third-generation foundation models, with internal reviewers rating responses in categories such as instruction following, truthfulness, presentation, and image understanding.
The models were evaluated against their predecessors (where applicable) and you can see some of the results below:

Fraction of preferred responses in side-by-side human evaluations of general text capabilities, comparing AFM 3 Core and AFM 3 Cloud with our previous generation of models. Results are presented in four different locale groups to demonstrate consistent performance across all international variants. “English” represents our global English evaluation set, while “PFIGSCJK”, “DNNSTV”, and “AFIHHMPRTU” represent our remaining supported global locales.

Fraction of preferred responses in side-by-side human evaluations of English image understanding capabilities. The results compare AFM 3 Core and AFM 3 Cloud with their 2025 predecessors.

Fraction of preferred responses in side-by-side human evaluations for dictation tasks. The results compare AFM 3 Core Advanced to Apple’s existing production dictation system across seven dimensions of quality. AFM 3 Core Advanced demonstrates a positive gain rate in overall quality, with the preference holding consistently across all individual dimensions of formatting and comprehension.
To dive even deeper into the third-generation Apple Foundation Models, follow this link.