Running local models on Mac gets faster with Ollama’s MLX support



Ollama, a runtime for running large language models on a local computer, has introduced support for Apple's open source MLX framework for machine learning. Additionally, Ollama says it has improved caching performance and now supports Nvidia's NVFP4 format for model compression, making memory usage much more efficient on certain models.

Combined, these developments promise significantly improved performance on Macs with Apple Silicon chips (M1 or later), and the timing couldn't be better: local models are gaining traction like never before outside the researcher and hobbyist communities.

The recent overwhelming success of OpenClaw, which reached more than 300,000 stars on GitHub, made the news with experiments like Moltbook, and became an obsession in China in particular, has a lot of people experimenting with running models on their own machines.

As developers grow frustrated with usage caps and the high cost of premium subscriptions to tools like Claude Code or ChatGPT Codex, experimentation with local coding models has intensified. (Ollama also recently expanded its Visual Studio Code integration.)

The new support is available in preview (as of Ollama 0.19) and currently covers just one model: the 35-billion-parameter variant of Alibaba's Qwen3.5. The hardware requirements are steep by normal user standards: users need a Mac with Apple Silicon, naturally, but also at least 32GB of RAM, according to Ollama's announcement.
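For readers who want to try the preview, it should be reachable through the same interfaces as any other Ollama model. Below is a minimal sketch using the official `ollama` Python client; note that the model tag `qwen3.5:35b` is an assumption, since the announcement doesn't pin down the exact name of the preview build, and the model must be pulled first (e.g. with `ollama pull`).

```python
# pip install ollama  -- the official Python client for the Ollama runtime
import ollama

# Hypothetical model tag: the exact name of the MLX-backed preview build
# may differ, so check Ollama's model library and pull it with
# `ollama pull <tag>` before running this.
MODEL = "qwen3.5:35b"

response = ollama.chat(
    model=MODEL,
    messages=[
        {"role": "user", "content": "Summarize what MLX is in one sentence."}
    ],
)

# The client returns the assistant's reply under message.content;
# dict-style access also works for backward compatibility.
print(response["message"]["content"])
```

Assuming the MLX backend is selected by the runtime rather than by the caller, existing client code like this should need no changes to benefit from it.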


