Building a Local LLM on a Budget: Part I — The Hardware

LLM on a Budget series

Part I — The hardware plan
Interlude — Gemma 4 on laptop hardware
Part II — The build
Part III — Software, the 4am saga, and benchmarks
Companion — Model comparison deep dive
Part IV — GPU upgrade, six models, 16GB VRAM

The problem with depending on cloud AI

I've been spending a lot of time working with AI lately — and I mean a lot. The productivity gains are real: things that used to take days now take hours, things that took hours take minutes. Once you've worked that way for a while, going back isn't really an option.

But using AI at that volume isn't free, and the costs come in two flavours. The first is the high-performance work — complex tasks that genuinely need a capable model, burn through tokens fast, and push hard against subscription limits. The second is the mundane stuff: tedious, time-consuming busywork where I'm not asking for anything particularly clever, just help moving through a pile of repetitive tasks more efficiently. Both categories eat into the same limits, and I tend to hit those limits at exactly the wrong moment — mid-task, mid-thought, mid-project.

The obvious solution for the mundane work is to run something locally. A model that handles routine tasks without touching my cloud subscription, running on hardware I own, available whenever I need it. No token counts, no rate limits, no interruptions.

The budget constraint

Building a machine capable of running decent local models isn't cheap — unless, as it happens, you're the sort of person who has a pile of old server hardware gathering dust.

I have enough financial commitments right now that pouring serious money into this isn't sensible, even if I could justify the expense long-term. So the budget option it is, which turns out to be an interesting constraint to work within.

The hardware

The starting point was four 32GB ECC RAM modules pulled from a ProLiant Gen9 — 128GB total — and an Intel Xeon E5-2640 v3 CPU that had been sitting unused. Given the current cost of RAM, those modules are probably the most valuable components in the entire build.

A bit of research led me to the Machinist X99 MR9S motherboard: an affordable board with 8 RAM slots, quad-channel memory support with this CPU, and multiple full-length PCIe x16 slots. It's not glamorous, but it gives the build a solid foundation.

Component	Details
CPU	Intel Xeon E5-2640 v3 (8 cores / 16 threads)
Motherboard	Machinist X99 MR9S
RAM	128GB ECC DDR4, quad channel (4× 32GB modules, salvaged from ProLiant Gen9)
GPU	Asus Dual RTX 2060 Super Evo V2, 8GB VRAM
Storage	2× SSD, rear-mounted via 3D-printed brackets
Case	DS Webhosting original 4RU server case, circa 2001

The GPU was the tightest constraint. Affordable options with useful VRAM are limited, and I landed on a used Asus Dual RTX 2060 Super Evo V2 with 8GB. It's not a great GPU for current large models — Gemma 4's bigger variants are out of reach — but the smaller E2B and E4B models should fit comfortably. The other factor working in my favour is that you can split a model across GPU VRAM and system RAM, and with the amount of RAM this machine will carry, there's some useful extra headroom there.

On model selection: 8GB of VRAM limits which models run well entirely on-GPU. The practical options at that budget are quantised versions of mid-sized models — Gemma 3 4B, Llama 3.1 8B, Phi-4 Mini, and similar. For routine tasks, these are more than adequate. The goal isn't to match a frontier model; it's to handle the busywork without touching cloud credits.
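As a rough sanity check on what fits in 8GB, the weights of a quantised model take approximately (parameters × bits per weight ÷ 8) bytes. The sketch below applies that arithmetic. The 4.5 bits-per-weight figure for a 4-bit quantisation and the 1.5GB allowance for context cache and runtime buffers are my assumptions for illustration, not measured values, and the function names are hypothetical.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 10^9 bytes)."""
    if params_billion <= 0 or bits_per_weight <= 0:
        raise ValueError("params_billion and bits_per_weight must be positive")
    # (params_billion * 1e9 weights) * (bits / 8 bytes each) / 1e9 bytes per GB
    return params_billion * bits_per_weight / 8


def fits_in_vram(
    params_billion: float,
    bits_per_weight: float,
    vram_gb: float = 8.0,       # the RTX 2060 Super in this build
    overhead_gb: float = 1.5,   # ASSUMED allowance for context cache and buffers
) -> bool:
    """True if weights plus the assumed overhead fit within the given VRAM."""
    needed = weight_footprint_gb(params_billion, bits_per_weight) + overhead_gb
    return needed <= vram_gb


if __name__ == "__main__":
    # Generic parameter counts; real models differ slightly from the round number.
    for params in (4.0, 8.0):
        for label, bits in (("16-bit", 16.0), ("4-bit (assumed 4.5 bpw)", 4.5)):
            size = weight_footprint_gb(params, bits)
            verdict = "fits" if fits_in_vram(params, bits) else "does not fit"
            print(f"{params:.0f}B at {label}: {size:.2f} GB of weights, {verdict} in 8 GB")
```

Under these assumptions an 8B-parameter model only fits once it is quantised to around 4 bits, which is why the quantised mid-sized models are the practical options here.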

The case — a very old friend

The case is an old DS Webhosting 4RU server chassis — something I've been lugging around for about 25 years. I still remember picking it up new from the wholesaler. The slide rails defy any logical interpretation of standard rack unit mounting, and I've long since lost all the 5.25" drive bay cages that originally filled the front.

Regular readers of this blog will not be surprised to learn that the missing hardware has been addressed with a 3D printer. A bracket on one side now mounts a 120mm Noctua fan to pull cool air in from the front. A pair of brackets on the rear hold the two SSDs that will serve as storage. Neither solution is elegant. Both work.

The case has outlasted every piece of hardware that's ever lived inside it, and it will outlast this build too. Solid steel construction with thick aluminium doors will do that. At this point it's less a server case and more a family heirloom.

Power consumption and old habits

I'll be honest: I have a long-standing aversion to turning computers off. There is one server here that stays powered down by default — ready to spin up automatically and restore services if the primary fails — but everything else runs continuously. Always has.

This probably comes from years of hard-won experience. In the early days, turning off a computer was something you did with a degree of optimism rather than confidence — old hard disks had a habit of deciding, after a power cycle, that today was not the day they'd be spinning up again. Servers that had been running faithfully for years had a similar tendency: keep them going and they'd run indefinitely; give them a rest and you'd find out which components had been hanging on by sheer momentum. And then there were the boot times. Anyone who spent meaningful time with 1990s hardware knows that "I'll just restart it" was not a casual commitment — it was a decision you sat with for several minutes, possibly with a cup of tea.

Those habits don't entirely leave you.

This machine might genuinely be the first one I make a real effort to manage power on — throttling back at idle, sleeping when not actively in use. The Xeon E5-2640 v3 has a 90W TDP and the RTX 2060 Super adds another 175W under load; leaving that running around the clock for something I'm not always actively using doesn't make a lot of sense.
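To put a number on that, here is a small sketch of the arithmetic. It treats the two rated figures above as a worst-case draw for the CPU and GPU alone. Real consumption at the wall will differ (idle draw is much lower, and the rest of the system adds its own), so treat the result as an assumed upper bound for those two parts, not a measurement. The four-hour figure is a hypothetical usage pattern.

```python
CPU_TDP_W = 90    # Xeon E5-2640 v3 rated TDP
GPU_LOAD_W = 175  # RTX 2060 Super rated draw under load


def energy_kwh(watts: float, hours: float) -> float:
    """Energy used by a constant load of `watts` over `hours`, in kWh."""
    return watts * hours / 1000.0


if __name__ == "__main__":
    peak_w = CPU_TDP_W + GPU_LOAD_W
    print(f"Combined rated draw:        {peak_w} W")
    print(f"24 hours at sustained load: {energy_kwh(peak_w, 24):.2f} kWh")
    # Hypothetical: only four hours of real work per day, asleep otherwise.
    print(f"4 hours at sustained load:  {energy_kwh(peak_w, 4):.2f} kWh")
```

At the rated figures that is 265W combined, or about 6.4kWh for a full day of sustained load.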

Whether I'll follow through on this noble intention is another matter entirely. My track record with turning computers off is not good. Every machine I've ever owned has started with a careful power management plan and ended up running permanently "just because." I expect this one will be no different, but I live in hope.

The one thing that does make the power consumption easier to stomach: solar panels. It's remarkable how quickly the mental calculus on running hardware changes when a meaningful chunk of what you're consuming is coming off the roof for free. Installing them was, in retrospect, very easy to justify.

What comes next

Right now I'm waiting for hardware to arrive. The motherboard and GPU are on their way; the RAM and CPU are already here. Once everything lands and I've got it assembled, Part II will cover the build itself, the software stack (almost certainly Ollama with an Open WebUI frontend), initial model testing, and whether this thing is actually usable for the day-to-day tasks I have in mind.

The honest answer to "will this work?" is: probably, with caveats. The 8GB of VRAM is the binding constraint, and a model split across VRAM and system RAM runs meaningfully slower than one held entirely on the GPU, largely because the part left in system RAM is read at system-memory speed rather than VRAM speed. Whether it's fast enough to be genuinely useful rather than just a curiosity is what Part II will tell us.
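A back-of-envelope way to see why the split matters: generating each token means reading essentially all of a dense model's weights once, so memory bandwidth sets a ceiling on tokens per second. The sketch below estimates that ceiling for a model split between VRAM and system RAM. The figures are theoretical peaks I am assuming for illustration: 448 GB/s for the RTX 2060 Super's memory, and quad-channel DDR4 at 1866 MT/s for system RAM (the actual speed the salvaged modules will run at is not confirmed here). It ignores compute time, PCIe transfers and cache effects, so real numbers will be lower. The 6GB model size and the function names are hypothetical.

```python
from typing import Optional

GPU_BANDWIDTH_GBS = 448.0  # ASSUMED theoretical peak for the RTX 2060 Super


def ddr_bandwidth_gbs(
    channels: int = 4,                 # quad channel, as in this build
    mega_transfers: float = 1866.0,    # ASSUMED DDR4 speed in MT/s
    bytes_per_transfer: int = 8,       # 64-bit channel
) -> float:
    """Theoretical peak system-memory bandwidth in GB/s."""
    return channels * mega_transfers * bytes_per_transfer / 1000.0


def tokens_per_second_ceiling(
    model_gb: float,
    gpu_fraction: float,
    gpu_bw_gbs: float = GPU_BANDWIDTH_GBS,
    ram_bw_gbs: Optional[float] = None,
) -> float:
    """Bandwidth-bound upper limit on tokens/s for a model split across
    VRAM (gpu_fraction of the weights) and system RAM (the remainder)."""
    if model_gb <= 0:
        raise ValueError("model_gb must be positive")
    if not 0.0 <= gpu_fraction <= 1.0:
        raise ValueError("gpu_fraction must be between 0 and 1")
    if ram_bw_gbs is None:
        ram_bw_gbs = ddr_bandwidth_gbs()
    gpu_gb = model_gb * gpu_fraction
    ram_gb = model_gb - gpu_gb
    # Time to read each portion once per generated token.
    seconds_per_token = gpu_gb / gpu_bw_gbs + ram_gb / ram_bw_gbs
    return 1.0 / seconds_per_token


if __name__ == "__main__":
    model_gb = 6.0  # hypothetical quantised model size
    for fraction in (1.0, 0.75, 0.5, 0.0):
        rate = tokens_per_second_ceiling(model_gb, fraction)
        print(f"{fraction:>4.0%} of weights in VRAM: at most {rate:.1f} tokens/s")
```

Because the two read times add, the slower system-RAM portion dominates quickly: even with half the weights in VRAM, the ceiling sits much closer to the all-RAM figure than to the all-GPU one.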
