Large language models (LLMs) are incredibly useful. They’re not perfect, but when prompted and used effectively, they can improve your productivity and free up valuable time for other tasks. Most of the hard work is done in the cloud with ChatGPT, Claude, NotebookLM and Copilot, to name just a few. Servers in data centers are primed to handle incoming requests, and you’ve probably read a news story or two about how much power these vast complexes require to handle AI. If you thought Bitcoin mining wasted resources, you may be surprised to see AI doing the same.
But that’s where running your own LLMs can make a big difference. You can load highly optimized models in free and open-source software, and all it takes is a PC and some electricity each time the system ramps up to handle your requests. It will never be as smooth and capable as cloud-based AI, but as long as you keep expectations in check and learn how best to prompt each model, you can achieve incredible results with nothing more than a discrete GPU and a basic desktop setup. Add an NVIDIA GeForce RTX 5090 and suddenly you have access to some really powerful models.
Using Ollama in an LXC with Open WebUI
It’s easy, fast and what I already know.
I never really bothered to use the CPU or, specifically, the integrated GPU found on the chip. That was until I decided I’d had enough of my LLM box consuming 100 watts at idle and up to 300 watts or so when handling a request. I swapped it for a compact, low-power mini PC with a fairly mediocre processor, and the results weren’t as bad as I expected. I decided to keep the mini PC running as my new LLM box, excited to see how future models are refined further and how well they cope with some pretty heavy hardware restrictions. Should you run an LLM on a cheap mini PC? Not if you’re expecting ChatGPT levels of responsiveness, but it can be a fun project.
I fired up Proxmox on the mini PC and verified that all available CPU cores and RAM were locked and loaded. Then it was a quick trip to the Proxmox community scripts page to grab the command to install Open WebUI with Ollama. Once I had it installed and configured with a dedicated IP address via OPNsense, replacing the previous Open WebUI instance running on a beefier PC, I was ready to go. Like many other home lab projects, there are countless ways to do it, but I felt that Proxmox and an LXC were the best way to make the most of the hardware available.
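Once the container is up, a quick sanity check is to ask Ollama which models it has installed. The snippet below is only a rough sketch: it assumes the Ollama API inside the LXC is reachable from your network on its default port (11434), and the IP address is a made-up placeholder, not the one from my setup. The helper names are my own.

```python
import json
from urllib.request import urlopen

# Hypothetical address of the LXC; replace with whatever your router hands out.
OLLAMA_URL = "http://192.168.1.50:11434"


def parse_model_names(tags_response: dict) -> list[str]:
    """Pull the model names out of the JSON returned by GET /api/tags."""
    return sorted(m["name"] for m in tags_response.get("models", []))


def list_models(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> list[str]:
    """Ask the Ollama server which models are installed locally."""
    with urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return parse_model_names(json.load(resp))
```

Calling `list_models()` from another machine on the network should return an empty list on a fresh install and the model tags once something has been pulled. If the request times out, Ollama may only be listening on localhost inside the container.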
I’m not looking for the best possible result (the CPU has a TDP of only 15W), at least not yet. I know I’m going to hit hardware limitations before anything Ollama-related. Using llama.cpp can provide a performance boost, but even then, there is the question of whether it is worth it. This is something I will discuss later. For reference, this Minisforum U850 mini PC has an Intel Core i5-10210U CPU with four cores and 16 GB of DDR4-2666 RAM. That’s pretty disappointing for a local LLM setup, especially the memory, since we’ll be limited to the CPU and that RAM is super slow compared to DDR5 or the memory on a discrete GPU.
Low-power CPUs are surprisingly capable
But there is absolutely nothing in the way of dedicated hardware.
It’s also pretty easy to configure Open WebUI. After creating the first account (which also gets administrator privileges), I downloaded qwen3:4b-q4_k_m and qwen2.5-coder:7b-instruct-q4_k_s, which would be my two testbeds to see how capable this system is at running smaller but highly optimized LLMs. The results were surprising, as my esteemed colleague Ayush Pande discovered while running a similar test on a mini PC with an Intel N100 CPU. For Qwen3 on my compact system, the 4B model achieved around 4 tok/s, both with a simple question and when asked what XDA Developers is. It’s not brilliant, but it’s more than enough to fire off queries while doing something else.
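If you want to reproduce a tok/s figure yourself, Ollama reports the numbers you need in the final JSON of a non-streaming /api/generate call: `eval_count` is the number of generated tokens and `eval_duration` is the generation time in nanoseconds. Here is a small sketch of the arithmetic; the function names are mine, and the sample values in the usage note are illustrative rather than measurements from this box.

```python
def tokens_per_second(response: dict) -> float:
    """Generation speed from an Ollama /api/generate response.

    eval_count    = number of tokens generated
    eval_duration = time spent generating, in nanoseconds
    """
    eval_count = response["eval_count"]
    eval_duration_ns = response["eval_duration"]
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    return eval_count / (eval_duration_ns / 1e9)


def seconds_for_reply(tokens: int, tok_per_s: float) -> float:
    """How long a reply of a given length takes at a given speed."""
    return tokens / tok_per_s
```

For example, a response with `eval_count` of 200 and `eval_duration` of 50,000,000,000 ns works out to 4 tok/s, and at that speed a 240-token answer takes about a minute.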
The Intel Core i5-10210U was never designed with local LLMs in mind. It is a mobile chip placed on a compact mini PC motherboard. Having it do a lot of heavy lifting will result in long waits, but the four physical cores and upgradable RAM provide some headroom for heavier tasks, like running local models. I’ve found that anything below 10B is entirely possible without entering swap territory and waiting an absolute age for the CPU to handle everything. The downloaded qwen3:4b test model is great for general inquiries, and the slightly larger qwen2.5-coder:7b model is a solid helper for coding.
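To get a feel for why sub-10B models stay out of swap on 16 GB, you can estimate the size of the weights from the parameter count and the bits stored per weight. This is a back-of-the-envelope sketch, not a precise formula: the 4.5 bits per weight I plug in for a 4-bit quant and the 4 GB I reserve for the OS, Open WebUI and context are both my own assumptions.

```python
def estimate_weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits_in_ram(params_billion: float, bits_per_weight: float,
                ram_gb: float, reserved_gb: float = 4.0) -> bool:
    """True if the weights fit after setting aside an assumed reserve.

    reserved_gb is a guess covering the OS, Open WebUI and the model's
    context; tune it for your own system.
    """
    return estimate_weights_gb(params_billion, bits_per_weight) <= ram_gb - reserved_gb
```

Under those assumptions, a 7B model at roughly 4.5 bits per weight needs a little under 4 GB and fits comfortably in 16 GB, while a 32B model at the same quantization would need about 18 GB and would not.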
I found it funny how Qwen3 believed that XDA does not cover LLMs or PC hardware, although it is interesting how the model’s answer was largely based on the community forum. That’s what happens with these more compact models with fewer total parameters. You need to prompt them the right way to get the most out of the technology. It’s no use simply asking if XDA covers PC hardware and LLMs, especially after asking what XDA is. The LLM will base its follow-up response on the forum, and that’s where people may have difficulty interacting with both local and cloud-based models.
Not a great daily driver.
Although around 4 tok/s is perfectly fine for my needs with a local LLM, it is not an ideal setup for running models on a daily basis. If you expect fast responses and high accuracy, you’ll need the cloud or powerful hardware to run everything locally, but then you’ll have to pay electricity costs. Sometimes, depending on where you live and what PC parts you have available, cloud AI can be more affordable. For those of us who don’t mind waiting a minute or two for a response and use LLMs for specific needs, even a cheap, low-power mini PC with a 15W CPU like this one can get the job done.
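The electricity side of that trade-off is easy to put a number on. The sketch below converts an average power draw into a monthly bill; the 100 watts in the usage note is the idle figure of my old LLM box mentioned earlier, while the price per kWh is purely an assumed example, so substitute your own tariff.

```python
def monthly_kwh(avg_watts: float, hours_per_day: float = 24.0, days: int = 30) -> float:
    """Energy used over a month at a constant average power draw."""
    return avg_watts / 1000 * hours_per_day * days


def monthly_cost(avg_watts: float, price_per_kwh: float,
                 hours_per_day: float = 24.0, days: int = 30) -> float:
    """Monthly running cost in whatever currency price_per_kwh uses."""
    return monthly_kwh(avg_watts, hours_per_day, days) * price_per_kwh
```

A box idling at 100 watts around the clock uses about 72 kWh in a 30-day month, which at an assumed 0.30 per kWh comes to roughly 21.60 before it has answered a single prompt. Run the same calculation with your mini PC’s measured wall draw to see how the two compare.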
Do you have money for a mini PC designed to run AI? Grab something like the GMKtec EVO-X2 AI.
- Brand: GMKtec
- CPU: AMD Ryzen AI Max+ 395
- Memory: LPDDR5X-8000
- Operating system: Windows 11/Ubuntu
- Graphics: AMD Radeon 8060S







