You don’t start with a data center. You start with a problem.
My problem was simple: I wanted AI that just works. Cloud-first for speed, local when I need privacy. No GPU management on my main machine.
What I built looks weird from the outside. A Raspberry Pi doing brain work. A gaming mini-PC acting as the heavy lifter. A Surface Go and a Pixel 8 as edge nodes. None of it matches the “proper” architecture diagrams.
But that’s the pattern: start with constraints, build what works, add sophistication later.
How the Pieces Connect
The Pi orchestrates. The cloud does the thinking. The EVO handles privacy-sensitive work. The GL.iNet Flint 2 routes everything — and provides WireGuard VPN for remote access from the Surface and Pixel when I’m not home.
The Pi 5 Runs the Show
Eight gigabytes. Plenty when you’re not running models.
The Pi is pure infrastructure — no inference, just orchestration:
- OpenClaw — the orchestrator, always listening
- SearXNG — local search, no API limits
- fmem — memory system, semantic search
- Hermes — personal AI assistant (Docker container, for family use)
- Browser Node — disposable Chromium container for web automation (Docker)
No Ollama here. The Pi routes requests to Ollama Cloud by default, hands off to EVO when I need local processing. This keeps the Pi cool, responsive, and reliable.
Roughly 5GB of headroom means no memory pressure. The Pi runs 24/7 without breaking a sweat.
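The cloud-first routing rule is simple enough to sketch. This is an illustrative toy, not OpenClaw's actual logic — the endpoint URLs and the `sensitive` flag are my assumptions, stand-ins for however the real gateway tags requests.

```python
# Minimal sketch of the routing rule: cloud by default, EVO when private.
# Endpoints and the `sensitive` flag are illustrative assumptions,
# not OpenClaw's real API.

OLLAMA_CLOUD = "https://ollama.com"    # primary: zero local overhead
EVO_LOCAL = "http://evo.lan:11434"     # hypothetical LAN address for the EVO

def route(prompt: str, sensitive: bool = False) -> str:
    """Return the inference endpoint for a request.

    Everything goes to the cloud unless the caller marks it sensitive,
    in which case it stays on the local network.
    """
    return EVO_LOCAL if sensitive else OLLAMA_CLOUD
```

The point of the pattern is that the default path carries no local cost at all; the local path is opt-in, per request.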
Where the Thinking Happens
Primary: Ollama Cloud
Most queries go to ollama.com. No local GPU management, no memory pressure, no model updates to track.
Tiers: Free (with session limits), Pro ($20/mo for frontier models), Max ($100/mo for heavy use). I use the free tier for day-to-day; Pro when I need frontier models.
Local: EVO X2
When I need privacy — sensitive documents, work data, personal notes — the request routes to the EVO instead. Local inference, nothing leaves the network.
Privacy note: “Nothing leaves the network” means the request doesn’t go to cloud. The Pi gateway still sees all traffic. This is “not sent to cloud” privacy, not adversarial security — if someone compromises the Pi, they see everything.
This isn’t cloud-native. It’s cloud-first, local-when-needed.
The EVO Wakes When Needed
The EVO doesn’t run 24/7. It’s the heavy lifter — 96GB unified memory, Radeon 890M GPU, NPU for inference.
What makes it work:
| Component | Why It Matters |
|---|---|
| 96GB unified | Models don’t need to fit in VRAM — CPU and GPU share the pool |
| ROCm 7.1.1 | AMD’s CUDA alternative, experimental but working |
| Distrobox | Container isolation without losing hardware access |
| Bazzite | Immutable OS, atomic updates, SteamOS-like experience on desktop hardware |
What runs here:
- Ollama (local privacy) — When cloud isn’t appropriate. GLM 4.7 Flash is my preferred model (has a KV cache bug I work around), but Qwen 3.5 runs without issues. Also serves as the backend for coding agents (Pi, Opencode) and other services that need local LLM access.
- ComfyUI — Image generation (FLUX Schnell, Real-ESRGAN)
The EVO wakes when I need local inference. Most of the time, Ollama Cloud handles the load.
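The post doesn't specify how the EVO wakes; Wake-on-LAN is one common way to do on-demand power-up from an always-on box like the Pi. A sketch under that assumption — the MAC address is a placeholder, and it presumes WoL is enabled on the EVO's NIC and in firmware.

```python
import socket

def wol_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet: 6 x 0xFF, then the MAC 16 times."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    return b"\xff" * 6 + mac_bytes * 16

def wake(mac: str, broadcast: str = "255.255.255.255") -> None:
    """Send the magic packet to the LAN broadcast address (UDP port 9)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(wol_packet(mac), (broadcast, 9))

# wake("aa:bb:cc:dd:ee:ff")  # placeholder MAC -- substitute the EVO's NIC
```

With something like this on the Pi, "wakes when needed" becomes one function call before routing a local request.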
Edge Devices
The GL.iNet Flint 2 sits at the edge — it’s the router that connects everything and provides WireGuard VPN for remote access.
Why the Flint 2?
- WireGuard at line rate — no CPU bottleneck, no latency penalty
- Surface Go 2 and Pixel 8 connect through it when I’m not home
- Routes traffic to the Pi gateway without exposing services directly
The Surface Go 2 runs Ultramarine Linux — Fedora-based, not Atomic. Performance matters on constrained hardware; immutable distros add weight, and Ultramarine keeps things light. Plus, it ships Surface kernel support out of the box.
The Pixel 8 is my daily driver. Both connect to the Pi gateway through standard OpenClaw channels — no dedicated node software needed.
Neither runs heavy compute. They’re terminals with a direct line to the Pi — whether I’m at my desk or on WireGuard from somewhere else.
Why This Layout Works
Keep data close to compute.
| Service | Location | Why |
|---|---|---|
| OpenClaw | Pi | Orchestrates everything, must be always-on |
| fmem | Pi | Memory queries need low latency |
| SearXNG | Pi | Search during conversations, no API limits |
| Hermes | Pi (Docker) | Family assistant, lightweight container |
| Ollama Cloud | Remote | Primary inference, zero local overhead |
| Ollama Local | EVO | Privacy-sensitive work, offline fallback |
| ComfyUI | EVO | GPU-required, not time-critical |
What doesn’t move: the orchestrator and the memory. What moves: heavy compute to where the RAM is.
This is the same pattern as grocery shopping. You don’t check every aisle. You check the list. The list is the cache — local, fast, filtered.
What I Didn’t Build
I didn’t build a Kubernetes cluster. No Proxmox, no TrueNAS, no homelab staples.
This isn’t competing with homelabs — it’s a different focus. Kubernetes would help with GPU scheduling if I needed it. Proxmox would help if I were running many services. I’m not. I’m running three things: orchestration, memory, inference.
| Homelab Focus | AI Lab Focus |
|---|---|
| Service availability | Inference speed |
| High availability | Privacy-first |
| Many services | Few services, deep |
| GUI dashboards | CLI and APIs |
I don’t need five nines uptime. I need inference available, my sensitive data local when needed, and minimal cloud costs.
What It Costs
| Component | Hardware | Power Draw | Est. Cost/Month |
|---|---|---|---|
| Pi 5 | 8GB, always-on | ~5W | ~$0.50 |
| GL.iNet Flint 2 | Router, WireGuard | ~6W | ~$0.60 |
| EVO X2 | 96GB, on-demand | ~120W active | ~$5 (occasional use) |
| Surface Go 2 | On-demand | ~15W | Negligible |
| Pixel 8 | Personal device | — | — |
| Ollama Cloud | Remote inference | — | Free tier / $20 Pro |
Total: ~$6/month power + Ollama Cloud tier.
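The per-device numbers above imply an electricity rate of roughly $0.14/kWh — that rate is my back-of-envelope assumption, not stated in the table. The arithmetic:

```python
RATE_PER_KWH = 0.14  # assumed electricity rate in USD; adjust for your utility

def monthly_cost(watts: float, hours_per_day: float = 24.0) -> float:
    """Convert a steady power draw to dollars per 30-day month."""
    kwh = watts * hours_per_day * 30 / 1000
    return kwh * RATE_PER_KWH

# Pi 5 always-on: 5 W * 720 h = 3.6 kWh -> about $0.50/month,
# matching the table. The EVO's ~$5 figure reflects occasional duty,
# not 24/7 at 120 W (which would be closer to $12).
```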
How It Stays Secure
Nothing accepts inbound connections from the public internet. The lab runs on a private network — no open ports, no port forwarding. The handful of public-facing services are published through outbound Cloudflare tunnels instead of exposed directly.
The GL.iNet Flint 2 handles the perimeter:
- WireGuard VPN for remote access — Surface Go and Pixel 8 connect securely from anywhere
- Line-rate encryption — no CPU bottleneck, no noticeable latency
- Routes all traffic through the Pi, nothing bypasses the gateway
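A client-side WireGuard config for the Surface or Pixel looks roughly like this — keys, addresses, and the endpoint are placeholders, and the subnet is invented for illustration. The `AllowedIPs = 0.0.0.0/0` line is what makes "nothing bypasses the gateway" hold while roaming: all traffic, not just lab traffic, rides the tunnel.

```ini
# Hypothetical client config -- keys, addresses, endpoint are placeholders.
[Interface]
PrivateKey = <client-private-key>
Address = 10.0.0.2/32
DNS = 10.0.0.1              # resolve names through the home network

[Peer]
PublicKey = <flint2-public-key>
Endpoint = home.example.com:51820
AllowedIPs = 0.0.0.0/0      # route everything through the tunnel
PersistentKeepalive = 25    # keep NAT mappings alive on mobile networks
```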
What’s exposed:
- SpudHub — Dashboard via Cloudflare tunnel
- BingeWatching — Entertainment tracking
- Foundry VTT — Tabletop gaming (on-demand)
What’s not exposed:
- OpenClaw API
- Ollama endpoints
- fmem, SearXNG, Hermes
Cloudflare tunnels (cloudflared) handle the routing. No inbound connections. The tunnel dials out, Cloudflare routes traffic back. If the tunnel dies, the service disappears — no stale attack surface.
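A cloudflared ingress config for this split looks roughly like the sketch below — the tunnel UUID, hostnames, and local ports are placeholders (30000 is Foundry VTT's default, the rest are guesses). The catch-all rule is what keeps everything unlisted invisible.

```yaml
# Hypothetical cloudflared config -- UUID, hostnames, ports are placeholders.
tunnel: <tunnel-uuid>
credentials-file: /etc/cloudflared/<tunnel-uuid>.json

ingress:
  - hostname: spudhub.example.com
    service: http://localhost:8080      # SpudHub dashboard
  - hostname: binge.example.com
    service: http://localhost:8081      # BingeWatching
  - hostname: foundry.example.com
    service: http://localhost:30000     # Foundry VTT default port
  - service: http_status:404            # everything else: not found
```

OpenClaw, Ollama, fmem, SearXNG, and Hermes simply have no ingress rule, so they cannot be reached from outside even if someone guesses a hostname.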
Where This Breaks
This architecture isn’t for everyone.
Pi 5 limits:
- 8GB RAM is fine without Ollama — 5GB+ buffer
- No GPU. But with cloud-first, this doesn’t matter
- microSD is too slow for this workload — the OS runs from an external SSD instead
EVO X2 limits:
- ROCm 7.1.1 works for Ollama out of the box (vLLM has issues, but I’m not using it)
- Binary execution requires approval in Distrobox containers — by design, but can be disabled with yolo mode if needed
- Not always-on without accepting power cost
The mistake: copying this because it looks cool. Don’t.
The pattern: Start with your constraints. Build what works. Add sophistication when the constraint bites.
The Point
You don’t need a data center. You need a Raspberry Pi and a clear idea of what you’re optimizing for.
For me, it was: cloud-first for convenience, local for privacy.
The Pi orchestrates. Ollama Cloud thinks. The EVO handles sensitive work. Edge devices are terminals with a direct line home.
Same pattern as grocery delivery: let someone else stock the warehouse, cook in your own kitchen when it matters.
Disclaimer: This architecture reflects my constraints — power cost priority, local-first preference, no need for enterprise HA. If you’re running a production workload, build for your constraints, not mine.