Home Lab Infrastructure: Why I Built It Backwards

Cloud-first AI infrastructure that doesn't require a data center. A Raspberry Pi handles orchestration, Ollama Cloud handles inference, and a gaming mini-PC provides local processing when privacy matters.

You don’t start with a data center. You start with a problem.

My problem was simple: I wanted AI that just works. Cloud-first for speed, local when I need privacy. No GPU management on my daily driver.

What I built looks weird from the outside. A Raspberry Pi doing brain work. A gaming mini-PC acting as the heavy lifter. A Surface Go and a Pixel 8 as edge nodes. None of it matches the “proper” architecture diagrams.

But that’s the pattern: start with constraints, build what works, add sophistication later.

How the Pieces Connect

The Pi orchestrates. The cloud does the thinking. The EVO handles privacy-sensitive work. The GL.iNet Flint 2 routes everything — and provides WireGuard VPN for remote access from the Surface and Pixel when I’m not home.

The Pi 5 Runs the Show

Eight gigabytes. Plenty when you’re not running models.

The Pi is pure infrastructure — no inference, just orchestration:

  • OpenClaw — the orchestrator, always listening
  • SearXNG — local search, no API limits
  • fmem — memory system, semantic search
  • Hermes — personal AI assistant (Docker container, for family use)
  • Browser Node — disposable Chromium container for web automation (Docker)

No Ollama here. The Pi routes requests to Ollama Cloud by default and hands off to the EVO when I need local processing. This keeps the Pi cool, responsive, and reliable.

5GB buffer means no memory pressure. The Pi runs 24/7 without breaking a sweat.
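The routing rule is simple enough to sketch. A minimal illustration in Python — the endpoint URLs, the `private` flag, and the function name are hypothetical, not the actual OpenClaw implementation:

```python
# Sketch of the Pi's cloud-first routing rule (hypothetical names,
# not the real OpenClaw code). Requests default to Ollama Cloud;
# anything flagged private is handed to the EVO's local endpoint.

OLLAMA_CLOUD = "https://ollama.com"   # primary inference, no local GPU
EVO_LOCAL = "http://evo.lan:11434"    # assumed local Ollama on the EVO

def route(request: dict) -> str:
    """Return the inference endpoint for a request."""
    if request.get("private", False):
        return EVO_LOCAL    # sensitive data stays on the LAN
    return OLLAMA_CLOUD     # default: cloud does the thinking

# A normal query goes to the cloud; a tagged one stays local.
print(route({"prompt": "summarize this article"}))
print(route({"prompt": "review my tax notes", "private": True}))
```

The point of the sketch: the Pi never loads a model, it only picks a URL.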

Where the Thinking Happens

Primary: Ollama Cloud

Most queries go to ollama.com. No local GPU management, no memory pressure, no model updates to track.

Tiers: Free (with session limits), Pro ($20/mo for frontier models), Max ($100/mo for heavy use). I use the free tier for day-to-day; Pro when I need frontier models.
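What makes the cloud/local handoff cheap is that both sides speak the same Ollama API. A sketch of the request shape — the model name is a placeholder, and the real call needs a base URL (plus an API key for the cloud tier) that I'm not showing here:

```python
import json

def chat_payload(model: str, prompt: str, stream: bool = False) -> str:
    """Build a JSON body for Ollama's /api/chat endpoint.
    The same payload works against ollama.com and a local Ollama
    on the EVO -- only the base URL and auth differ."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,  # False: one JSON response instead of chunks
    })

# Placeholder model name; swap in whatever the target endpoint serves.
body = chat_payload("qwen3", "Explain unified memory in one paragraph.")
```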

Local: EVO X2

When I need privacy — sensitive documents, work data, personal notes — the request routes to the EVO instead. Local inference, nothing leaves the network.

Privacy note: “Nothing leaves the network” means the request doesn’t go to cloud. The Pi gateway still sees all traffic. This is “not sent to cloud” privacy, not adversarial security — if someone compromises the Pi, they see everything.

This isn’t cloud-native. It’s cloud-first, local-when-needed.

The EVO Wakes When Needed

The EVO doesn’t run 24/7. It’s the heavy lifter — 96GB unified memory, Radeon 890M GPU, NPU for inference.

What makes it work:

| Component | Why It Matters |
| --- | --- |
| 96GB unified memory | Models don't need to fit in VRAM — CPU and GPU share the pool |
| ROCm 7.1.1 | AMD's CUDA alternative, experimental but working |
| Distrobox | Container isolation without losing hardware access |
| Bazzite | Immutable OS, atomic updates, SteamOS-like desktop |

What runs here:

  • Ollama (local privacy) — When cloud isn’t appropriate. GLM 4.7 Flash is my preferred model (has a KV cache bug I work around), but Qwen 3.5 runs without issues. Also serves as the backend for coding agents (Pi, Opencode) and other services that need local LLM access.
  • ComfyUI — Image generation (FLUX Schnell, Real-ESRGAN)

The EVO wakes when I need local inference. Most of the time, Ollama Cloud handles the load.
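"Wakes when needed" can be as simple as a Wake-on-LAN packet from the Pi. A sketch of that approach — it assumes the EVO's NIC has WoL enabled, and the MAC address is a placeholder, not the actual machine's:

```python
import socket

def magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet: 6 bytes of 0xFF
    followed by the target MAC repeated 16 times."""
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    return b"\xff" * 6 + mac_bytes * 16

def wake(mac: str) -> None:
    """Broadcast the magic packet on the LAN (UDP port 9)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(magic_packet(mac), ("255.255.255.255", 9))

# wake("aa:bb:cc:dd:ee:ff")  # placeholder MAC for the EVO
```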

Edge Devices

The GL.iNet Flint 2 sits at the edge — it’s the router that connects everything and provides WireGuard VPN for remote access.

Why the Flint 2?

  • WireGuard at line rate — no CPU bottleneck, no latency penalty
  • Surface Go 2 and Pixel 8 connect through it when I’m not home
  • Routes traffic to the Pi gateway without exposing services directly

The Surface Go 2 runs Ultramarine Linux — a Fedora Spin, not Atomic. Performance matters on constrained hardware. Immutable distros add weight; Ultramarine keeps things light. Plus, it has Surface kernel support out of the box.

The Pixel 8 is my daily driver. Both connect to the Pi gateway through standard OpenClaw channels — no dedicated node software needed.

Neither runs heavy compute. They’re terminals with a direct line to the Pi — whether I’m at my desk or on WireGuard from somewhere else.

Why This Layout Works

Keep data close to compute.

| Service | Location | Why |
| --- | --- | --- |
| OpenClaw | Pi | Orchestrates everything, must be always-on |
| fmem | Pi | Memory queries need low latency |
| SearXNG | Pi | Search during conversations, no API limits |
| Hermes | Pi (Docker) | Family assistant, lightweight container |
| Ollama Cloud | Remote | Primary inference, zero local overhead |
| Ollama Local | EVO | Privacy-sensitive work, offline fallback |
| ComfyUI | EVO | GPU-required, not time-critical |

What doesn’t move: the orchestrator and the memory. What moves: heavy compute to where the RAM is.

This is the same pattern as grocery shopping. You don’t check every aisle. You check the list. The list is the cache — local, fast, filtered.

What I Didn’t Build

I didn’t build a Kubernetes cluster. No Proxmox, no TrueNAS, no homelab staples.

This isn’t competing with homelabs — it’s a different focus. Kubernetes would help with GPU scheduling if I needed it. Proxmox would help if I were running many services. I’m not. I’m running three things: orchestration, memory, inference.

| Homelab Focus | AI Lab Focus |
| --- | --- |
| Service availability | Inference speed |
| High availability | Privacy-first |
| Many services | Few services, deep |
| GUI dashboards | CLI and APIs |

I don’t need five nines uptime. I need inference available, my sensitive data local when needed, and minimal cloud costs.

What It Costs

| Component | Hardware | Power Draw | Est. Cost/Month |
| --- | --- | --- | --- |
| Pi 5 | 8GB, always-on | ~5W | ~$0.50 |
| GL.iNet Flint 2 | Router, WireGuard | ~6W | ~$0.60 |
| EVO X2 | 96GB, on-demand | ~120W active | ~$5 (occasional use) |
| Surface Go 2 | On-demand | ~15W | Negligible |
| Pixel 8 | Personal device | | |
| Ollama Cloud | Remote inference | | Free tier / $20 Pro |

Total: ~$6/month power + Ollama Cloud tier.
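The always-on numbers check out with simple arithmetic. A quick sanity check — the $0.14/kWh rate is my assumption, inferred from the ~$0.50 figure, not stated in the post:

```python
RATE = 0.14  # assumed $/kWh; the post's ~$0.50 for a 5W load implies roughly this

def monthly_cost(watts: float, hours_per_day: float = 24) -> float:
    """kWh over a 30-day month times the electricity rate."""
    kwh = watts / 1000 * hours_per_day * 30
    return kwh * RATE

print(round(monthly_cost(5), 2))       # Pi 5, always-on -> 0.5
print(round(monthly_cost(6), 2))       # Flint 2, always-on -> 0.6
# The EVO's cost depends entirely on duty cycle: at 120W for
# ~1h/day it's ~$0.50/month; heavier use pushes it toward the
# post's ~$5 estimate.
print(round(monthly_cost(120, 1), 2))
```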

How It Stays Secure

Nothing accepts inbound connections from the public internet. The entire lab runs on a private network — no open ports, no port forwarding, no inbound attack surface.

The GL.iNet Flint 2 handles the perimeter:

  • WireGuard VPN for remote access — Surface Go and Pixel 8 connect securely from anywhere
  • Line-rate encryption — no CPU bottleneck, no noticeable latency
  • Routes all traffic through the Pi, nothing bypasses the gateway

What’s publicly reachable (through Cloudflare tunnels):

  • SpudHub — Dashboard via Cloudflare tunnel
  • BingeWatching — Entertainment tracking
  • Foundry VTT — Tabletop gaming (on-demand)

What’s not exposed:

  • OpenClaw API
  • Ollama endpoints
  • fmem, SearXNG, Hermes

Cloudflare tunnels (cloudflared) handle the routing. No inbound connections. The tunnel dials out, Cloudflare routes traffic back. If the tunnel dies, the service disappears — no stale attack surface.
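A cloudflared ingress config for this pattern looks roughly like the following sketch — the tunnel ID, hostnames, and ports are placeholders, not the actual setup (Foundry VTT's default port is 30000):

```yaml
# ~/.cloudflared/config.yml (sketch; IDs and hostnames are placeholders)
tunnel: 00000000-0000-0000-0000-000000000000
credentials-file: /home/pi/.cloudflared/tunnel.json

ingress:
  # Only the three public services get a route.
  - hostname: spudhub.example.com
    service: http://localhost:8080
  - hostname: binge.example.com
    service: http://localhost:8081
  - hostname: vtt.example.com
    service: http://localhost:30000
  # Everything else (OpenClaw, Ollama, fmem, ...) falls through to a 404.
  - service: http_status:404
```

The catch-all 404 rule is what keeps the internal services invisible even if someone guesses a hostname.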

Where This Breaks

This architecture isn’t for everyone.

Pi 5 limits:

  • 8GB RAM is fine without Ollama — 5GB+ buffer
  • No GPU. But with cloud-first, this doesn’t matter
  • External SSD for OS — fast storage for orchestration workloads

EVO X2 limits:

  • ROCm 7.1.1 works for Ollama out of the box (vLLM has issues, but I’m not using it)
  • Binary execution requires approval in Distrobox containers — by design, but can be disabled with yolo mode if needed
  • Not always-on without accepting power cost

The mistake: copying this because it looks cool. Don’t.

The pattern: Start with your constraints. Build what works. Add sophistication when the constraint bites.

The Point

You don’t need a data center. You need a Raspberry Pi and a clear idea of what you’re optimizing for.

For me, it was: cloud-first for convenience, local for privacy.

The Pi orchestrates. Ollama Cloud thinks. The EVO handles sensitive work. Edge devices are terminals with a direct line home.

Same pattern as grocery delivery: let someone else stock the warehouse, cook in your own kitchen when it matters.


Disclaimer: This architecture reflects my constraints — power cost priority, local-first preference, no need for enterprise HA. If you’re running a production workload, build for your constraints, not mine.