The Z420 is a 2012–2013 HP professional workstation. These machines are everywhere on eBay and the second-hand market for next to nothing — corporations cycled them out, and nobody thought to run 120B-parameter AI models on them. That was a mistake on their part.
// Why this machine
The Z420's secret weapon isn't the CPU — it's the RAM ceiling. With the right Xeon (E5-16xx or E5-26xx series), it takes up to 64GB of ECC DDR3. Large language models run in RAM when there's no GPU to speak of. More RAM means larger models, less swapping, faster inference. The Z420 gives you 64GB on a budget that would embarrass a new gaming PC.
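The RAM arithmetic is worth making explicit. A minimal sketch that converts parameter count and quantisation width into raw weight size; it ignores KV cache and runtime overhead, so treat these numbers as floors:

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Raw weight storage in GB: params * bits / 8, with 1B params = 1e9 bytes per GB."""
    return params_billion * bits_per_weight / 8

print(weights_gb(120, 4))   # 60.0  -> the ~60GB Q4 figure for a 120B model
print(weights_gb(120, 8))   # 120.0 -> Q8 already blows past the 64GB ceiling
print(weights_gb(120, 16))  # 240.0 -> full precision is out of the question
```

This is why 64GB is the build: Q4 at roughly 60GB is the only quantisation of a 120B model that fits in the Z420's memory ceiling.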
CPU
Xeon E5-1650 v2
6 cores, 12 threads. 3.5GHz base, 3.9GHz boost. Ivy Bridge-EP architecture. More than enough for inference orchestration.
↗ v2 suffix matters — LGA2011 socket; early Z420s need the 2013 boot-block BIOS for v2 CPUs, so check compatibility
RAM — the important part
64GB DDR3 ECC 1866MHz
8× 8GB sticks in quad-channel configuration. ECC is a bonus — the capacity is the point. This is your VRAM substitute.
↗ Don't go below 32GB. 64GB is the build.
Storage
SSD for OS + HDD for weights
Any modern SSD for Ubuntu. A large spinning disk or second SSD for model files — Nemotron-3-Super 120B in Q4 quantisation is ~60GB.
↗ Weights don't need speed. Capacity matters.
GPU
Optional — CPU inference is the play
The Z420's proprietary 625W PSU limits GPU options. Avoid high-TDP cards. A GTX 1060 6GB or a Quadro P2000 is a safe choice. But honestly — not required.
↗ CPU-only inference works. Don't block on GPU.
Power Supply
625W HP Proprietary PSU
Non-standard connector. Don't try to swap it out unless you know what you're doing. Factor this into any GPU plans.
↗ This is the Z420's one real constraint.
Operating System
Ubuntu 24.04 LTS
Ollama has excellent Linux support. Ubuntu 24.04 is stable and well-documented. Avoid bleeding-edge kernels — driver conflicts multiply fast.
↗ Stick to LTS. Seriously.
// What to buy and where
Search eBay, Facebook Marketplace, Gumtree, or local government surplus auctions. Keywords: "HP Z420 workstation", "Xeon workstation", "E5-1650". Budget AUD $150–300 for the base unit. Add $80–150 for RAM if it needs topping up. Total outlay for the Fortress: under $400.
Nvidia Nemotron-3-Super 120B is a Mixture-of-Experts model with 120 billion total parameters and 12 billion active per forward pass — meaning it reasons like a large model but runs cheaper than its parameter count implies. It's the engine behind both Arktron and Arbor, accessed two different ways.
Arktron — Cloud path
openrouter/nvidia/nemotron-3-super-120b-a12b:free
Routed through OpenRouter's free tier. No local resources consumed. Arktron uses this for tasks where latency matters less than throughput, or when the local model is under load.
↗ Free tier — rate limited but zero cost
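The cloud path can be sketched against OpenRouter's OpenAI-compatible chat completions endpoint. A minimal request builder, assuming an `OPENROUTER_API_KEY` environment variable; note that the API itself takes the bare `nvidia/...` id, while the `openrouter/` prefix above is gateway routing notation (an assumption worth verifying against OpenRouter's model list):

```python
import json
import os

def build_openrouter_request(prompt: str):
    """Assemble the POST for OpenRouter's chat completions endpoint (OpenAI-compatible)."""
    url = "https://openrouter.ai/api/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {os.environ.get('OPENROUTER_API_KEY', '')}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        # Bare model id; the "openrouter/" prefix in the doc is routing notation
        "model": "nvidia/nemotron-3-super-120b-a12b:free",
        "messages": [{"role": "user", "content": prompt}],
    })
    return url, headers, body
```

Send it with any HTTP client; the `:free` suffix selects the rate-limited free tier.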
Arbor — Local path
nemotron-3-super:120b (Ollama)
Running directly on the Fortress via Ollama. Private, offline-capable, no rate limits. Arbor owns this instance — it's the model that lives in the machine.
↗ CPU inference — 1–3 tok/s on the Z420
Architecture
Mixture of Experts — 12B active
120B total parameters, but only ~12B active per token. This is why it's runnable locally — the actual compute per forward pass is far less than the total parameter count suggests.
↗ MoE is the reason this works at all
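A back-of-envelope check on the tok/s figure quoted above: CPU inference is memory-bandwidth-bound, and per token a MoE only has to stream its active weights. Rough numbers (quad-channel DDR3-1866 peaks near 60GB/s theoretical; sustained bandwidth and efficiency are assumptions, not measurements):

```python
active_params = 12e9            # only the active experts are streamed per token
bytes_per_weight = 0.5          # Q4 ~ 4 bits per weight
bandwidth_gbs = 60              # quad-channel DDR3-1866, theoretical peak

gb_per_token = active_params * bytes_per_weight / 1e9  # GB streamed per token
ceiling = bandwidth_gbs / gb_per_token                 # tok/s upper bound
print(gb_per_token, ceiling)
# Real-world efficiency of 10-30% of peak lands in the 1-3 tok/s range quoted above.
# A dense 120B would stream 10x the bytes per token and be unusable on this box.
```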
Ollama is the runtime layer — it handles model downloads and quantised-weight loading, and it exposes a local REST API that everything else talks to. It's the part that makes this actually work without pain.
1. Install Ollama
One-line installer on Ubuntu. This handles the binary, systemd service, and default paths.
2. Pull the model
Downloads directly from the Ollama registry. ~60GB — make sure you have the space and a decent connection. It resumes if interrupted.
3. Configure for CPU inference
Set OLLAMA_NUM_GPU=0 if you want pure CPU. On the Z420 with a low-end GPU, Ollama will usually figure this out — but being explicit avoids surprises.
4. Configure RAM allocation
Set OLLAMA_MAX_LOADED_MODELS=1 so Ollama never loads multiple models simultaneously. With 64GB you could technically fit two smaller models, but for the 120B, single-model is essential.
5. Test the endpoint
Ollama serves on http://localhost:11434 by default. Hit it with a quick curl to confirm it's alive before building anything on top.
6. Enable as a systemd service
Ollama's installer registers a systemd unit. Enable it so the model server starts on boot — important if this machine runs headless or via RDP.
$ curl -fsSL https://ollama.com/install.sh | sh
$ ollama pull nemotron-3-super:120b
$ curl http://localhost:11434/api/generate \
    -d '{"model":"nemotron-3-super:120b","prompt":"Hello."}'
$ sudo systemctl enable --now ollama
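The `/api/generate` endpoint streams newline-delimited JSON, one object per chunk, with the final object carrying `"done": true`. A minimal parser sketch (offline here; the sample lines stand in for a live response):

```python
import json

def collect_response(lines):
    """Concatenate the 'response' fields of Ollama's streamed JSON lines."""
    out = []
    for line in lines:
        chunk = json.loads(line)
        out.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(out)

# Stand-in for what the curl above streams back
sample = [
    '{"model":"nemotron-3-super:120b","response":"Hel","done":false}',
    '{"model":"nemotron-3-super:120b","response":"lo.","done":true}',
]
print(collect_response(sample))  # Hello.
```

Anything built on top of the Fortress talks to this endpoint, so getting the streaming shape right once pays off everywhere.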
// Environment config — /etc/systemd/system/ollama.service.d/override.conf
[Service]
Environment="OLLAMA_NUM_GPU=0"
Environment="OLLAMA_MAX_LOADED_MODELS=1"
Environment="OLLAMA_HOST=0.0.0.0:11434"
// On swap and RAM pressure
With 64GB, the ~60GB Q4 model fits: Ollama's llama.cpp backend memory-maps the weights by default, so the OS pages them in as needed, but headroom is thin and anything else memory-hungry should be shut down. On 32GB the model spills into swap. This works, but throughput drops to roughly 0.2–0.5 tok/s. Still technically functional for offline batch work, but painful for interactive use. The 64GB target isn't perfectionism — it's the usability threshold.
OpenClaw is the API gateway layer — it sits between your agents and the model backends, handling multi-provider routing, prompt caching, model fallbacks, and unified API access. Nemotron via Ollama becomes one provider among several. Claude, Gemini, Qwen — all addressable through a single interface.
◌ Chapter In Progress
AGENT PERFORMANCE TESTING UNDERWAY
The OpenClaw chapter is being written as the system is put through its paces. Real numbers from real workloads — not benchmarks. This section will cover installation, provider configuration, routing logic, and prompt cache setup.
Multi-provider routing
Prompt caching config
Model fallback logic
Ollama integration
openclaw.json structure
// Why OpenClaw matters for the Fortress
Without a gateway, you're choosing one model for every task. With OpenClaw, the Fortress runs Nemotron locally for private or heavy reasoning tasks, routes to Claude or Gemini for tasks that benefit from cloud scale, and caches repeated context so you're not re-uploading the same 50k-token prompt for every request. The local model stops being a limitation and starts being a deliberate choice.
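The routing logic described above can be sketched as a simple policy function. Every name here is hypothetical (OpenClaw's actual config schema isn't covered until the chapter lands); the point is the shape of the decision, not the API:

```python
def pick_provider(task: dict) -> str:
    """Hypothetical routing policy: private or offline work stays local,
    scale-hungry work goes to the cloud, everything else rides the free tier."""
    if task.get("private") or task.get("offline"):
        return "ollama/nemotron-3-super:120b"       # Arbor: local, no rate limits
    if task.get("needs_scale"):
        return "anthropic/claude"                    # cloud scale when it pays
    return "openrouter/nvidia/nemotron-3-super-120b-a12b:free"  # Arktron: free tier

print(pick_provider({"private": True}))      # ollama/nemotron-3-super:120b
print(pick_provider({"needs_scale": True}))  # anthropic/claude
print(pick_provider({}))                     # openrouter/nvidia/nemotron-3-super-120b-a12b:free
```

In a real gateway this policy lives in config rather than code, but the local-first default is the design choice that makes the Fortress worth building.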