VPN & Remote Access Architecture
Last updated: 2026-04-10
Overview
Remote access to the homelab is provided through a hybrid VPN architecture: WireGuard site-to-site tunnels connect physical locations (Sofia, London, Valchedrym), while Headscale (self-hosted Tailscale control server) provides mesh overlay networking for roaming clients. Split DNS architecture ensures resilience: AdGuard serves as the global DNS resolver for all VPN clients, while Technitium handles internal .lan domains. This design prevents tunnel dependency for public DNS resolution — if the Cloudflared tunnel goes down, clients can still access the internet.
Architecture Diagram
VPN Topology
```mermaid
graph TB
    subgraph "Site-to-Site WireGuard (Hub-and-Spoke)"
        Sofia[Sofia pfSense<br/>10.3.2.1<br/>tun_wg0]
        London[London GL-iNet Flint 2<br/>10.3.2.6<br/>192.168.8.0/24]
        Valchedrym[Valchedrym OpenWRT<br/>10.3.2.5<br/>192.168.0.0/24]
        Sofia ---|WireGuard Tunnel| London
        Sofia ---|WireGuard Tunnel| Valchedrym
    end
    subgraph "Headscale Mesh Overlay"
        HS[Headscale<br/>headscale.viktorbarzin.me<br/>K8s Service]
        Authentik[Authentik OIDC<br/>SSO Login]
        DERP[DERP Relay<br/>Region 999<br/>Embedded in Headscale]
        subgraph "Clients"
            Laptop[MacBook<br/>Tailscale Client]
            Phone[iPhone<br/>Tailscale Client]
            Remote[Remote VM<br/>Tailscale Client]
        end
        HS --> Authentik
        HS --> DERP
        Laptop -.mesh.- Phone
        Laptop -.mesh.- Remote
        Phone -.mesh.- Remote
        Laptop --> HS
        Phone --> HS
        Remote --> HS
        Laptop -.relay fallback.- DERP
        Phone -.relay fallback.- DERP
    end
    Sofia --> HS
```
DNS Resolution Flow
```mermaid
sequenceDiagram
    participant Client as VPN Client
    participant AdGuard as AdGuard DNS<br/>(Global)
    participant Technitium as Technitium DNS<br/>(Internal .lan)
    participant Cloudflare as Cloudflare DNS<br/>(Public Domains)
    Note over Client: Query: immich.viktorbarzin.me
    Client->>AdGuard: DNS query
    AdGuard->>Cloudflare: Forward (not .lan)
    Cloudflare-->>AdGuard: A record (Cloudflare IP)
    AdGuard-->>Client: Response
    Note over Client: Query: nextcloud.viktorbarzin.lan
    Client->>AdGuard: DNS query
    AdGuard->>Technitium: Forward (.lan domain)
    Technitium-->>AdGuard: A record (10.0.20.200)
    AdGuard-->>Client: Response
    Note over Client,Technitium: If Cloudflared tunnel is down:
    Client->>AdGuard: DNS query (google.com)
    AdGuard->>Cloudflare: Forward (public DNS works)
    Cloudflare-->>AdGuard: A record
    AdGuard-->>Client: Response (no tunnel dependency)
```
Components
| Component | Version/Type | Location | Purpose |
|---|---|---|---|
| WireGuard | Built-in (pfSense/OpenWRT) | Sofia (pfSense), London (GL-iNet Flint 2), Valchedrym (OpenWRT) | Site-to-site encrypted tunnels (hub-and-spoke) |
| Headscale | v0.23.x (container) | K8s (headscale.viktorbarzin.me) | Tailscale control server, mesh coordinator |
| Tailscale | Client v1.x | User devices | Mesh VPN client |
| Authentik | OIDC provider | K8s | SSO authentication for Headscale |
| DERP Relay | Embedded in Headscale | K8s (region 999) | Relay for NAT traversal |
| AdGuard DNS | Container | K8s | Global DNS resolver with ad-blocking |
| Technitium DNS | Container | K8s (10.0.20.101) | Internal .lan domain resolver |
How It Works
WireGuard Site-to-Site
Three physical locations are permanently connected via WireGuard in a hub-and-spoke topology with Sofia as the hub. A single WireGuard interface (tun_wg0) on pfSense carries both peers on the 10.3.2.0/24 tunnel subnet:
- Sofia (hub): `10.3.2.1` — pfSense, K8s cluster on `10.0.20.0/24`, management on `10.0.10.0/24`, LAN on `192.168.1.0/24`
- London (spoke): `10.3.2.6` — GL-iNet Flint 2 (GL-MT6000), LAN `192.168.8.0/24`, guest `192.168.9.0/24`
- Valchedrym (spoke): `10.3.2.5` — OpenWRT router, LAN `192.168.0.0/24`
Routes are configured as static routes on pfSense. London and Valchedrym route Sofia-bound traffic through their WireGuard tunnels. London ↔ Valchedrym traffic transits through Sofia (no direct tunnel).
Use cases:
- Replication of Vault data between Sofia and London
- Offsite database replicas
- Accessing Proxmox hosts across locations
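A spoke-side peer config for this topology would look roughly like the following wg-quick-style sketch. This is illustrative only: the keys are placeholders, and the real configs live in the pfSense/GL-iNet/OpenWRT UIs, not in this repo. Addresses and AllowedIPs are taken from the London peer details documented below.

```ini
# Hypothetical London spoke config (wg-quick style, for illustration only)
[Interface]
PrivateKey = <london-private-key>
Address = 10.3.2.6/32

[Peer]
# Sofia hub (pfSense tun_wg0)
PublicKey = <sofia-public-key>
Endpoint = vpn.viktorbarzin.me:51821
# Sofia-side networks reachable through the tunnel
AllowedIPs = 10.0.0.0/8, 192.168.1.0/24, 192.168.0.0/24
PersistentKeepalive = 25
```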
Headscale Mesh Overlay
Headscale is a self-hosted alternative to Tailscale's commercial control plane. It provides:
- Mesh networking: Clients establish direct WireGuard connections to each other (peer-to-peer).
- NAT traversal: DERP relays provide connectivity when direct connections fail.
- OIDC authentication: Users log in via Authentik, no pre-shared keys.
- ACL policies: Fine-grained control over which clients can reach which destinations.
Client onboarding:
- User installs the Tailscale client (official macOS/iOS/Android app)
- Runs `tailscale login --login-server https://headscale.viktorbarzin.me`
- Browser opens to Authentik SSO login
- After successful login, Tailscale presents a registration URL
- Admin approves the device via `headscale nodes register --user <username> --key <key>`
- Client is added to the mesh and receives an IP in the 100.64.0.0/10 range
Connectivity test: `ping 10.0.20.100` (Sofia K8s API server) verifies full access to the homelab network.
DERP Relay for NAT Traversal
Problem: Symmetric NAT or restrictive firewalls prevent direct WireGuard connections between clients.
Solution: Headscale runs an embedded DERP relay server (region 999, named "Home DERP"). DERP is Tailscale's NAT traversal protocol, implemented as an HTTPS-based relay.
How it works:
- Clients attempt direct WireGuard connection via STUN/ICE.
- If direct connection fails, both clients connect to the DERP relay via HTTPS.
- Traffic is encrypted end-to-end with WireGuard, DERP only relays packets.
- No additional ports needed — DERP uses the same HTTPS ingress as Headscale (443).
Performance: DERP adds latency (extra hop through Sofia K8s cluster), but ensures connectivity in all scenarios.
Split DNS Architecture
Design goal: Prevent tunnel dependency for public DNS resolution. If the Headscale tunnel or Cloudflared tunnel fails, clients must still resolve public domains.
Implementation:
- AdGuard DNS: Global recursive resolver, serves all VPN clients. Includes ad-blocking and malicious domain filtering.
- Technitium DNS: Internal authoritative server for `.viktorbarzin.lan` domains.
Resolution flow:
- Client queries AdGuard for any domain.
- If the domain ends in `.lan`, AdGuard forwards to Technitium (10.0.20.101).
- For all other domains, AdGuard resolves directly via upstream (Cloudflare 1.1.1.1).
- AdGuard caches responses, reducing load on Technitium and upstream.
Resilience: Even if the tunnel to Sofia is down, clients can still resolve google.com, github.com, etc., because AdGuard talks directly to Cloudflare. Only .lan domains become unavailable.
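The forwarding decision above can be sketched as a small function. The function name is illustrative (AdGuard implements this internally via its conditional-forwarding rules); the upstream addresses come from this document.

```python
# Sketch of AdGuard's split-DNS forwarding decision (illustrative names).
TECHNITIUM = "10.0.20.101"   # internal .lan resolver
CLOUDFLARE = "1.1.1.1"       # public upstream

def pick_upstream(domain: str) -> str:
    """Return the upstream resolver a .lan-aware forwarder would use."""
    if domain.rstrip(".").endswith(".lan"):
        return TECHNITIUM   # internal domains stay inside the homelab
    return CLOUDFLARE       # public domains resolve even if the tunnel is down

print(pick_upstream("nextcloud.viktorbarzin.lan"))  # 10.0.20.101
print(pick_upstream("google.com"))                  # 1.1.1.1
```

Note that public resolution never touches Technitium, which is exactly why it keeps working when the tunnel to Sofia is down.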
Access Control (Authentik Groups)
Headscale Users group in Authentik controls VPN access. Membership is invitation-only:
- Admin creates user in Authentik.
- Admin adds user to "Headscale Users" group.
- User logs in via OIDC during `tailscale login`.
- Headscale verifies group membership via OIDC claims.
Removing a user from the group revokes VPN access on next re-authentication (every 30 days).
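The group gate can be illustrated with a minimal sketch. The claim structure is hypothetical; in practice Headscale compares the token's `groups` claim against its `allowed_groups` setting (shown in the config below).

```python
# Illustrative sketch of the OIDC group check (not Headscale's actual code).
ALLOWED_GROUPS = {"Headscale Users"}

def may_register(oidc_claims: dict) -> bool:
    """True if the token's groups claim intersects the allowed set."""
    return bool(ALLOWED_GROUPS & set(oidc_claims.get("groups", [])))

print(may_register({"email": "a@b.c", "groups": ["Headscale Users"]}))  # True
print(may_register({"email": "a@b.c", "groups": ["Guests"]}))           # False
```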
Configuration
Terraform Stacks
| Stack | Path | Resources |
|---|---|---|
| Headscale | `stacks/headscale/` | Deployment, Service, Ingress, ConfigMap |
| AdGuard | `stacks/adguard/` | Deployment, Service, PVC |
| Technitium | `stacks/technitium/` | Deployment, Service, PVC |
| pfSense (Sofia) | Not in Terraform | WireGuard tunnel configs (managed via pfSense UI) |
Headscale Configuration
ConfigMap: `stacks/headscale/main.tf`

```yaml
server_url: https://headscale.viktorbarzin.me
listen_addr: 0.0.0.0:8080
metrics_listen_addr: 0.0.0.0:9090
oidc:
  issuer: https://authentik.viktorbarzin.me/application/o/headscale/
  client_id: <redacted>
  client_secret: <from Vault>
  scope: ["openid", "profile", "email", "groups"]
  allowed_groups: ["Headscale Users"]
derp:
  server:
    enabled: true
    region_id: 999
    region_code: "home"
    region_name: "Home DERP"
    stun_listen_addr: "0.0.0.0:3478"
  urls:
    - https://controlplane.tailscale.com/derpmap/default
  auto_update_enabled: true
  update_frequency: 24h
ip_prefixes:
  - 100.64.0.0/10
dns_config:
  nameservers:
    - 10.0.20.102 # AdGuard DNS
  domains:
    - viktorbarzin.lan
  magic_dns: true
```
Secrets (Vault): `secret/headscale/oidc_client_secret`
Ingress: Standard `ingress_factory` with `protected = false` (OIDC is handled by Headscale itself).
AdGuard Configuration
Upstream DNS servers:
- Cloudflare: `1.1.1.1`, `1.0.0.1`
- Google: `8.8.8.8`, `8.8.4.4`

Conditional forwarding: `viktorbarzin.lan` → `10.0.20.101` (Technitium)
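In AdGuard Home's upstream syntax, the conditional forward above would look roughly like this sketch (the actual rule lives in the AdGuard UI / `AdGuardHome.yaml`; the `[/domain/]upstream` form is AdGuard's domain-specific upstream syntax):

```yaml
upstream_dns:
  - 1.1.1.1
  - 1.0.0.1
  - 8.8.8.8
  - 8.8.4.4
  # Domain-specific upstream: .viktorbarzin.lan queries go to Technitium
  - '[/viktorbarzin.lan/]10.0.20.101'
```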
Ad-blocking lists:
- AdGuard DNS filter
- OISD full list
- Developer Dan's ads and tracking list
Custom rules: Block telemetry for Windows, macOS, and smart TVs.
WireGuard (pfSense — Hub)
Single interface `tun_wg0` (OPT2) with two peers on subnet `10.3.2.0/24`. Listens on `*:51821` for both IPv4 and IPv6. IPv6 access via the HE tunnel (`gif0`, 2001:470:6e:43d::2) requires a `pass in` pf rule on the `HE_IPv6` interface (interface name `opt3` in `config.xml`):
Peer: London Flint 2:
- WireGuard IP: `10.3.2.6`
- Remote endpoint: `vpn.viktorbarzin.me:51821` (dual-stack: A=176.12.22.76, AAAA=2001:470:6e:43d::2)
- Allowed IPs: `192.168.8.0/24, 192.168.9.0/24, 192.168.10.0/24, 10.3.2.6/32`
- Keepalive: 25 seconds (configured on London side)
Peer: Valchedrym:
- WireGuard IP: `10.3.2.5`
- Remote endpoint: `85.130.41.28:51820`
- Allowed IPs: `10.3.2.5/32, 192.168.0.0/24`
- Keepalive: none (should be added)
Static routes on pfSense:
- `192.168.0.0/24` → gateway `valchedrym` (10.3.2.5)
- `192.168.8.0/24` → gateway `london_flint_2` (10.3.2.6)
- `192.168.9.0/24` → gateway `london_flint_2` (10.3.2.6)
- `192.168.10.0/24` → gateway `london_flint_2` (10.3.2.6)
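The effect of these static routes can be illustrated with a longest-prefix-match sketch using Python's stdlib `ipaddress` module. The route table is copied from above; the lookup function is illustrative, not pfSense's actual implementation.

```python
import ipaddress

# Static routes on the pfSense hub (destination -> tunnel gateway)
ROUTES = {
    "192.168.0.0/24": "10.3.2.5",   # valchedrym
    "192.168.8.0/24": "10.3.2.6",   # london_flint_2
    "192.168.9.0/24": "10.3.2.6",   # london_flint_2
    "192.168.10.0/24": "10.3.2.6",  # london_flint_2
}

def next_hop(dst: str):
    """Longest-prefix match over the static routes; None = no tunnel route."""
    ip = ipaddress.ip_address(dst)
    matches = [
        (ipaddress.ip_network(net), gw)
        for net, gw in ROUTES.items()
        if ip in ipaddress.ip_network(net)
    ]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]

print(next_hop("192.168.8.40"))  # 10.3.2.6 (London LAN via Flint 2)
print(next_hop("192.168.0.10"))  # 10.3.2.5 (Valchedrym LAN)
```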
Note: WireGuard on pfSense is NOT managed by Terraform — configured via pfSense UI/shell.
WireGuard (London — GL-iNet Flint 2)
- Interface:
wgclient1(protowgclient, configpeer_855) - Local IP:
10.3.2.6/32 - Remote endpoint:
vpn.viktorbarzin.me:51821(dual-stack — resolves to IPv4 or IPv6) - Allowed IPs:
10.0.0.0/8, 192.168.1.0/24, 192.168.0.0/24 - Keepalive: 25 seconds
- Policy routing: GL-iNet marks traffic via iptables mangle → routing table 1001 (ipset
dst_net10) - Persistence:
/etc/firewall.userinjects LOCAL_POLICY mangle rule (GL-iNet'sgl-tertfcreates TUNNEL10_ROUTE_POLICY but not the LOCAL_POLICY rule for router-originated traffic)
GL-iNet AllowedIPs format: UCI `list allowed_ips` entries are concatenated by the `wgclient` protocol handler. Use a single comma-separated entry (`'10.0.0.0/8,192.168.1.0/24,192.168.0.0/24'`), NOT multiple `list` entries. Multiple entries produce a parse error like `10.0.0.0/8192.168.1.0/24` (no separator).
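The failure mode can be reproduced with a string-join sketch (illustrative; the actual concatenation happens inside GL-iNet's `wgclient` protocol handler):

```python
entries = ["10.0.0.0/8", "192.168.1.0/24", "192.168.0.0/24"]

# Multiple UCI `list allowed_ips` entries: joined with no separator -> unparseable
broken = "".join(entries)
print(broken)  # 10.0.0.0/8192.168.1.0/24192.168.0.0/24

# Single comma-separated entry: the form wg(8) expects for AllowedIPs
correct = ",".join(entries)
print(correct)  # 10.0.0.0/8,192.168.1.0/24,192.168.0.0/24
```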
DNS: AdGuardHome runs on the router. Upstream DNS should NOT include `1.1.1.1`: it creates conntrack conflicts with ICMP, and GL-iNet's carrier-monitor health check floods Cloudflare, triggering ICMP rate limits. Use `9.9.9.9` and `8.8.4.4` instead. Health-check IPs (`glconfig.general.track_ip`) should use `1.0.0.1`, not `1.1.1.1`.
WireGuard (Valchedrym — OpenWRT)
- WireGuard IP: `10.3.2.5`
- Remote endpoint: Sofia public IP
- LAN: `192.168.0.0/24`
Vault Secrets
- Headscale OIDC client secret: `secret/headscale/oidc_client_secret`
- WireGuard private keys: `secret/pfsense/wg_privkey_london`, `secret/pfsense/wg_privkey_valchedrym`
Decisions & Rationale
Why Headscale Instead of Plain WireGuard?
Alternatives considered:
- WireGuard with static configs: Requires manual key distribution, complex peer management.
- OpenVPN: Slower, more overhead, less mobile-friendly.
- Commercial Tailscale: SaaS, not self-hosted, less control over data.
Decision: Headscale provides:
- Mesh networking: Clients connect directly, not through a central server.
- OIDC authentication: No pre-shared keys, integrates with existing SSO.
- Easy onboarding: Users install official Tailscale app, no custom configs.
- Self-hosted: Full control over control plane and data.
Trade-off: More complex setup than plain WireGuard, but operational benefits outweigh initial complexity.
Why Split DNS (AdGuard + Technitium)?
Alternatives considered:
- Single DNS server (Technitium only): Requires forwarding all public domains to upstream, creating single point of failure.
- Cloudflare only: Fast, but no internal `.lan` domain support without zone delegation.
- Tailscale MagicDNS only: Depends on the Headscale control plane; fails if the control plane is down.
Decision: Split DNS architecture provides:
- Resilience: If Headscale tunnel fails, public DNS still works via AdGuard → Cloudflare.
- Ad-blocking: AdGuard filters ads and malicious domains for all VPN clients.
- Internal domains: Technitium authoritatively serves `.lan`, with no external dependency.
Key benefit: Zero tunnel dependency for public DNS. Users can browse the internet even if the homelab is completely offline.
Why Embedded DERP Relay?
Alternatives considered:
- External DERP relays only (Tailscale's public relays): Free, but adds latency and exposes traffic metadata to Tailscale.
- No DERP, direct connections only: Fails for symmetric NAT clients (mobile networks).
Decision: Embedded DERP (region 999) provides:
- Privacy: All relay traffic stays within the homelab.
- Reliability: Not dependent on Tailscale's public infrastructure.
- No extra ports: DERP uses HTTPS (443), same as Headscale API.
Trade-off: Adds CPU/memory overhead to Headscale pod, but minimal compared to benefits.
Why OIDC Authentication Instead of Pre-Authorized Keys?
Alternatives considered:
- Pre-authorized keys: Headscale generates keys, admin shares with users.
- Shared secret: Single password for all users.
Decision: OIDC via Authentik provides:
- Centralized access control: Add/remove users in one place.
- Audit trail: Authentik logs all login attempts.
- Group-based authorization: Only "Headscale Users" group can access VPN.
- SSO integration: Users already have accounts in Authentik for other services.
Key workflow: Admin invites user → user logs in via Authentik → admin approves device → access granted. No key exchange needed.
Troubleshooting
Headscale Login Fails (OIDC Error)
Symptoms: tailscale login --login-server opens browser, but after Authentik login, shows "OIDC error: invalid state".
Diagnosis: Check Headscale logs: kubectl logs -n headscale deploy/headscale
Common causes:
- Client clock skew: OIDC tokens have short validity (5 minutes). Ensure client's system time is accurate.
- Callback URL mismatch: The Authentik application must have `https://headscale.viktorbarzin.me/oidc/callback` in its Redirect URIs.
- Group membership: User is not in the "Headscale Users" group in Authentik.
Fix: Sync system clock, verify Authentik application config, add user to group.
Direct Connection Fails, Traffic Goes via DERP
Symptoms: tailscale status shows relay "home" instead of direct connection. Higher latency.
Diagnosis: Check DERP usage: tailscale netcheck
Common causes:
- Symmetric NAT: Mobile networks or restrictive corporate firewalls block UDP hole-punching.
- Firewall blocking WireGuard: Port 51820 UDP blocked on one or both clients.
- STUN failure: Can't determine external IP and port.
Fix: This is expected behavior in many environments. DERP relay ensures connectivity. If latency is unacceptable, use site-to-site WireGuard instead.
Can't Resolve .lan Domains from VPN
Symptoms: nslookup nextcloud.viktorbarzin.lan returns NXDOMAIN.
Diagnosis: Check DNS chain: Client → AdGuard → Technitium.
Steps:
- Verify AdGuard is running: `kubectl get pod -n adguard`
- Check AdGuard conditional forwarding by querying AdGuard directly: `nslookup nextcloud.viktorbarzin.lan <adguard-ip>`
- Check Technitium: `nslookup nextcloud.viktorbarzin.lan 10.0.20.101`
Common causes:
- AdGuard not forwarding .lan: Conditional forwarding rule missing or misconfigured.
- Technitium down: Pod crash-looping or PVC corrupted.
- DNS propagation delay: Technitium zone update not yet applied.
Fix: Verify conditional forwarding in AdGuard UI. Restart Technitium if needed. Check zone file in Technitium UI.
VPN Client Can't Reach K8s Services
Symptoms: Can ping 10.0.20.1 (pfSense), but curl https://immich.viktorbarzin.me times out.
Diagnosis: Check connectivity at each layer:
- DNS: Does `nslookup immich.viktorbarzin.me` return the correct IP?
- Routing: Can the client reach the MetalLB IP? `ping <loadbalancer-ip>`
- Firewall: Is pfSense blocking traffic from the VPN subnet?
Common causes:
- Split DNS working too well: Client resolves to a Cloudflare IP instead of the internal LAN IP. Expected for proxied domains — use the direct domain (e.g., `immich-direct.viktorbarzin.me`).
- ACL policy: Headscale ACL blocks the client from accessing certain subnets.
- pfSense NAT rule missing: Traffic from VPN subnet not routed to VLAN 20.
Fix: For proxied domains, use non-proxied DNS names. Check Headscale ACL policy. Verify pfSense NAT rules.
DERP Relay Returns 502 Bad Gateway
Symptoms: Tailscale clients can't connect, DERP shows offline in tailscale netcheck.
Diagnosis: Check Headscale ingress: kubectl get ingress -n headscale
Common causes:
- Traefik middleware blocking DERP traffic: Forward-auth interferes with WebSocket upgrade.
- Headscale pod not ready: Liveness probe failing.
- Cloudflared tunnel issue: DERP uses WebSockets, which require HTTP/1.1 upgrade support.
Fix: Ensure Headscale ingress has protected = false (no forward-auth). Check Headscale pod readiness. Verify Cloudflared supports WebSocket upgrades.
WireGuard Site-to-Site Tunnel Disconnects
Symptoms: Can't reach services in London from Sofia. ping 192.168.8.1 fails.
Diagnosis: Check pfSense WireGuard status via pfsense.py wireguard or Dashboard → VPN → WireGuard → Status
Common causes:
- AllowedIPs parse error on GL-iNet: If `wg show wgclient1` shows no peers and the interface is DOWN with `qdisc noop`, check the `/etc/config/wireguard` peer config. AllowedIPs must be a single comma-separated entry, not multiple `list` entries (see the London section above).
- IPv6 endpoint resolution: If IPv4 is down, DNS resolves to IPv6 (AAAA record). Ensure the pfSense `HE_IPv6` (`gif0`) interface has a `pass in` rule for UDP 51821.
- Keepalive packets dropped: Firewall or ISP blocking UDP 51821.
- Public IP changed: Dynamic IP on the remote site changed; config still has the old IP.
- GL-iNet policy routing lost: After a firewall reload, check whether the `TUNNEL10_ROUTE_POLICY` and `LOCAL_POLICY` mangle rules exist. If not, run `/etc/init.d/firewall restart` and check `/etc/firewall.user` execution.
- Kill switch active: If the WG interface is DOWN, table 1001 only has blackhole routes → all marked traffic dropped → IPv4 internet broken.
Fix: Check `wg show wgclient1` on the London router. If no peers are listed, fix the AllowedIPs format and `ifdown`/`ifup` wgclient1. Verify the handshake with `ping 10.3.2.1`.
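When diagnosing a dead tunnel, handshake age is the key signal. A hedged sketch of that check, parsing the tab-separated `<peer-pubkey> <unix-timestamp>` lines that `wg show <iface> latest-handshakes` emits (the sample output here is fabricated for illustration):

```python
def stale_peers(wg_output: str, now: int, max_age: int = 180):
    """Return peer pubkeys whose last handshake is older than max_age seconds.

    A timestamp of 0 means the peer never completed a handshake.
    """
    stale = []
    for line in wg_output.strip().splitlines():
        pubkey, ts = line.split()
        ts = int(ts)
        if ts == 0 or now - ts > max_age:
            stale.append(pubkey)
    return stale

# Fabricated sample: one healthy peer, one that never completed a handshake
sample = "PEERKEY_LONDON=\t1760000000\nPEERKEY_VALCH=\t0\n"
print(stale_peers(sample, now=1760000060))  # ['PEERKEY_VALCH=']
```

With a 25-second keepalive configured, a handshake older than roughly three minutes almost always means the tunnel is down, not merely idle.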
Related
- Runbooks:
  - `docs/runbooks/add-headscale-user.md`
  - `docs/runbooks/reset-derp-relay.md`
  - `docs/runbooks/update-wireguard-peer.md`
- Architecture Docs:
  - `docs/architecture/networking.md` — Core network architecture
  - `docs/architecture/dns.md` — Full DNS architecture (coming soon)
- Reference:
  - `.claude/reference/authentik-state.md` — OIDC application configs
  - `.claude/reference/service-catalog.md` — Full service inventory