Remote access to the homelab is provided through a hybrid VPN architecture: WireGuard site-to-site tunnels connect physical locations (Sofia, London, Valchedrym), while Headscale (self-hosted Tailscale control server) provides mesh overlay networking for roaming clients. Split DNS architecture ensures resilience: AdGuard serves as the global DNS resolver for all VPN clients, while Technitium handles internal `.lan` domains. This design prevents tunnel dependency for public DNS resolution — if the Cloudflared tunnel goes down, clients can still access the internet.
Three physical locations are permanently connected via WireGuard in a **hub-and-spoke** topology with Sofia as the hub. A single WireGuard interface (`tun_wg0`) on pfSense carries both peers on the `10.3.2.0/24` tunnel subnet:
- **Valchedrym** (spoke): `10.3.2.5` — OpenWRT router, LAN `192.168.0.0/24`
Routes are configured as static routes on pfSense. London and Valchedrym route Sofia-bound traffic through their WireGuard tunnels. London ↔ Valchedrym traffic transits through Sofia (no direct tunnel).
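WireGuard chooses the outgoing peer by longest-prefix match on each peer's AllowedIPs (cryptokey routing). That is why a spoke can reach the other spoke through the hub without a direct tunnel. The sketch below models that lookup from London's side. It reuses the AllowedIPs value from the GL-iNet notes in this document, and the peer label `sofia-hub` is hypothetical.

```python
from __future__ import annotations

import ipaddress

# London has a single WireGuard peer (Sofia). These prefixes are copied from the
# GL-iNet AllowedIPs example in this document; treat them as illustrative.
LONDON_PEERS = {
    "sofia-hub": ["10.0.0.0/8", "192.168.1.0/24", "192.168.0.0/24"],
}


def next_hop_peer(dst: str, peers: dict[str, list[str]]) -> str | None:
    """Return the peer whose AllowedIPs contain dst with the longest prefix.

    This mirrors WireGuard's cryptokey routing table. None means the packet
    matches no peer and will not be sent into the tunnel.
    """
    addr = ipaddress.ip_address(dst)
    best_len, best_peer = -1, None
    for peer, prefixes in peers.items():
        for prefix in prefixes:
            net = ipaddress.ip_network(prefix)
            if addr in net and net.prefixlen > best_len:
                best_len, best_peer = net.prefixlen, peer
    return best_peer


print(next_hop_peer("10.0.20.100", LONDON_PEERS))   # -> sofia-hub (K8s API server)
print(next_hop_peer("192.168.0.10", LONDON_PEERS))  # -> sofia-hub (Valchedrym LAN, via hub)
print(next_hop_peer("8.8.8.8", LONDON_PEERS))       # -> None (not tunnelled)
```

Because Valchedrym's `192.168.0.0/24` sits under the Sofia peer, London-to-Valchedrym packets enter the Sofia tunnel, and pfSense's static routes forward them on to the other spoke.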
4. After successful login, Tailscale presents a registration URL
5. Admin approves the device via `headscale nodes register --user <username> --key <key>`
6. Client is added to the mesh and receives an IP in the `100.64.0.0/10` range
**Connectivity test**: `ping 10.0.20.100` (Sofia K8s API server) verifies full access to the homelab network.
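Steps 5 and 6 can be scripted on the admin side. The sketch below builds the `headscale nodes register` argv from step 5 and checks that an address falls in the CGNAT range Headscale assigns. The user name and key are placeholders, and the helper names are hypothetical.

```python
import ipaddress
import shlex

# Headscale assigns tailnet IPv4 addresses from the CGNAT block (step 6).
TAILNET_V4 = ipaddress.ip_network("100.64.0.0/10")


def register_argv(username: str, key: str) -> list[str]:
    """argv for the approval command in step 5 (suitable for subprocess.run)."""
    return ["headscale", "nodes", "register", "--user", username, "--key", key]


def is_tailnet_ip(ip: str) -> bool:
    """True if ip lies inside 100.64.0.0/10; IPv6 addresses never match."""
    return ipaddress.ip_address(ip) in TAILNET_V4


argv = register_argv("alice", "PLACEHOLDER_KEY")
print(shlex.join(argv))  # -> headscale nodes register --user alice --key PLACEHOLDER_KEY
print(is_tailnet_ip("100.64.0.7"), is_tailnet_ip("10.0.20.100"))  # -> True False
```

On the Headscale host the argv could be passed to `subprocess.run(argv, check=True)`. The connectivity test itself remains a plain `ping 10.0.20.100` from the client.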
### DERP Relay for NAT Traversal
**Problem**: Symmetric NAT or restrictive firewalls prevent direct WireGuard connections between clients.
**Solution**: Headscale runs an embedded DERP relay server (region 999, named "Home DERP"). DERP (Designated Encrypted Relay for Packets) is Tailscale's relay protocol. It is carried over HTTPS and used when NAT traversal fails.
**How it works**:
1. Clients attempt direct WireGuard connection via STUN/ICE.
2. If direct connection fails, both clients connect to the DERP relay via HTTPS.
3. Traffic stays end-to-end encrypted with WireGuard; DERP only relays the already-encrypted packets.
4. No additional ports needed — DERP uses the same HTTPS ingress as Headscale (443).
**Performance**: DERP adds latency (extra hop through Sofia K8s cluster), but ensures connectivity in all scenarios.
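The fallback decision can be illustrated with a toy model. This is a rough heuristic, not Tailscale's actual algorithm. It treats a peer with a public IP, or two peers behind endpoint-independent ("easy") NAT, as directly reachable. Anything involving symmetric ("hard") NAT or blocked UDP goes through the DERP region. In practice the client keeps probing for a direct path and can switch between the two over the life of a connection.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Nat(Enum):
    NONE = "public"  # reachable public IP, no NAT
    EASY = "easy"    # endpoint-independent mapping: same external port for all peers
    HARD = "hard"    # symmetric NAT: new external port per destination


@dataclass
class Client:
    name: str
    nat: Nat
    udp_ok: bool = True  # False if a firewall drops WireGuard UDP


def direct_possible(a: Client, b: Client) -> bool:
    """Rough heuristic for whether STUN-assisted hole punching succeeds."""
    if not (a.udp_ok and b.udp_ok):
        return False
    if Nat.NONE in (a.nat, b.nat):
        return True  # the NATed side can send straight to the public endpoint
    return a.nat is Nat.EASY and b.nat is Nat.EASY


def choose_path(a: Client, b: Client, derp_region: int = 999) -> str:
    """Direct WireGuard if hole punching works, otherwise the Home DERP region."""
    return "direct" if direct_possible(a, b) else f"derp:{derp_region}"


laptop = Client("laptop", Nat.EASY)
phone = Client("phone-on-lte", Nat.HARD)
print(choose_path(laptop, phone))  # -> derp:999
```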
### Split DNS Architecture
**Design goal**: Prevent tunnel dependency for public DNS resolution. If the Headscale tunnel or Cloudflared tunnel fails, clients must still resolve public domains.
**Implementation**:
- **AdGuard DNS**: Global resolver for all VPN clients, forwarding public queries to an upstream resolver. Includes ad-blocking and malicious domain filtering.
- **Technitium DNS**: Internal authoritative server for `.viktorbarzin.lan` domains.
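1. VPN clients send all DNS queries to AdGuard.
2. For `.viktorbarzin.lan` names, AdGuard conditionally forwards the query to Technitium.
3. For all other domains, AdGuard forwards directly to the upstream resolver (Cloudflare `1.1.1.1`).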
4. AdGuard caches responses, reducing load on Technitium and upstream.
**Resilience**: Even if the tunnel to Sofia is down, clients can still resolve `google.com`, `github.com`, etc., because AdGuard talks directly to Cloudflare. Only `.lan` domains become unavailable.
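The forwarding rule and the resilience property fit in a small pure function. This sketch assumes AdGuard matches on the `viktorbarzin.lan` suffix as described above. `TECHNITIUM` is a placeholder label rather than a real address, the host name `host.viktorbarzin.lan` is hypothetical, and the model ignores caching.

```python
INTERNAL_ZONE = "viktorbarzin.lan"
TECHNITIUM = "technitium"  # placeholder for the internal server's address
UPSTREAM = "1.1.1.1"       # public upstream named in this section


def pick_upstream(qname: str) -> str:
    """Where AdGuard sends a query: Technitium for the internal zone, else upstream."""
    name = qname.rstrip(".").lower()
    if name == INTERNAL_ZONE or name.endswith("." + INTERNAL_ZONE):
        return TECHNITIUM
    return UPSTREAM


def resolvable(qname: str, tunnel_up: bool) -> bool:
    """Per the resilience note: only internal-zone lookups depend on the tunnel."""
    return tunnel_up or pick_upstream(qname) == UPSTREAM


for q in ("host.viktorbarzin.lan", "github.com"):
    print(q, pick_upstream(q), resolvable(q, tunnel_up=False))
```

Matching on `"." + INTERNAL_ZONE` rather than a bare suffix keeps look-alike names such as `notviktorbarzin.lan` on the public path.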
### Access Control (Authentik Groups)
**Headscale Users** group in Authentik controls VPN access. Membership is invitation-only:
1. Admin creates user in Authentik.
2. Admin adds user to "Headscale Users" group.
3. User logs in via OIDC during `tailscale login`.
4. Headscale verifies group membership via OIDC claims.
Removing a user from the group revokes VPN access on next re-authentication (every 30 days).
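The check in step 4 amounts to a membership test on the ID token's claims. This sketch assumes Authentik puts group names in a `groups` claim. Headscale's own check (configured through its `oidc.allowed_groups` setting) is not reproduced here, and the function names are hypothetical. The sketch also computes the latest moment a removed user keeps access under the 30-day re-authentication interval.

```python
from datetime import datetime, timedelta, timezone

REQUIRED_GROUP = "Headscale Users"
REAUTH_INTERVAL = timedelta(days=30)


def group_allowed(claims: dict) -> bool:
    """Admit a login only if the OIDC groups claim lists the required group."""
    groups = claims.get("groups") or []
    return REQUIRED_GROUP in groups


def access_revoked_by(last_auth: datetime) -> datetime:
    """Upper bound on revocation delay: the next forced re-authentication."""
    return last_auth + REAUTH_INTERVAL


print(group_allowed({"sub": "u1", "groups": ["Headscale Users"]}))  # -> True
print(group_allowed({"sub": "u2", "groups": ["Family"]}))           # -> False
print(access_revoked_by(datetime(2024, 1, 1, tzinfo=timezone.utc)))
# -> 2024-01-31 00:00:00+00:00
```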
**Single interface `tun_wg0`** (OPT2) with two peers on subnet `10.3.2.0/24`. Listens on `*:51821` for both IPv4 and IPv6. IPv6 access via HE tunnel (`gif0`, `2001:470:6e:43d::2`) requires a `pass in` pf rule on the `HE_IPv6` interface (interface name `opt3` in config.xml):
- Policy routing: GL-iNet marks traffic via iptables mangle → routing table 1001 (ipset `dst_net10`)
- Persistence: `/etc/firewall.user` injects LOCAL_POLICY mangle rule (GL-iNet's `gl-tertf` creates TUNNEL10_ROUTE_POLICY but not the LOCAL_POLICY rule for router-originated traffic)
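The mark-then-route behaviour, and the kill-switch failure mode listed under troubleshooting, can be modelled as below. Which destinations get marked depends on the `dst_net10` ipset and on the LOCAL_POLICY rule for router-originated traffic, so the marked set is passed in. The `10.0.0.0/8` used in the usage lines is an assumption, not a dump of the real ipset.

```python
from __future__ import annotations

import ipaddress
from collections.abc import Iterable


def route_decision(dst: str, wg_up: bool, marked_nets: Iterable[str]) -> str:
    """Model the GL-iNet policy routing described above.

    - destination not in the marked set -> main routing table ("main")
    - marked and wgclient1 up           -> table 1001 via the tunnel ("tunnel")
    - marked and wgclient1 down         -> table 1001 has only blackhole routes ("drop")
    """
    addr = ipaddress.ip_address(dst)
    marked = any(addr in ipaddress.ip_network(n) for n in marked_nets)
    if not marked:
        return "main"
    return "tunnel" if wg_up else "drop"


nets = ["10.0.0.0/8"]  # assumed content of ipset dst_net10
print(route_decision("10.0.20.100", wg_up=True, marked_nets=nets))   # -> tunnel
print(route_decision("10.0.20.100", wg_up=False, marked_nets=nets))  # -> drop
print(route_decision("1.0.0.1", wg_up=False, marked_nets=nets))      # -> main
```

If the marked set is wider than the homelab prefixes, the `drop` branch is what takes down general IPv4 access when the tunnel is down.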
**GL-iNet AllowedIPs format**: UCI `list allowed_ips` entries are concatenated by the `wgclient` protocol handler. Use a **single comma-separated entry** (`'10.0.0.0/8,192.168.1.0/24,192.168.0.0/24'`), NOT multiple list entries. Multiple entries cause a parse error like `10.0.0.0/8192.168.1.0/24` (no separator).
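The failure mode is easy to reproduce offline. The sketch below mimics the concatenation described above (it is not the `wgclient` handler's code) and parses the result with Python's `ipaddress` module, which rejects the run-together string.

```python
from __future__ import annotations

import ipaddress


def join_like_handler(entries: list[str]) -> str:
    """Mimic the described behaviour: UCI list entries glued with no separator."""
    return "".join(entries)


def parse_allowed_ips(value: str) -> list[ipaddress.IPv4Network | ipaddress.IPv6Network]:
    """Parse a comma-separated AllowedIPs value; raises ValueError on bad input."""
    return [ipaddress.ip_network(p.strip()) for p in value.split(",") if p.strip()]


good = join_like_handler(["10.0.0.0/8,192.168.1.0/24,192.168.0.0/24"])
print([str(n) for n in parse_allowed_ips(good)])

bad = join_like_handler(["10.0.0.0/8", "192.168.1.0/24", "192.168.0.0/24"])
print(bad)  # -> 10.0.0.0/8192.168.1.0/24192.168.0.0/24
try:
    parse_allowed_ips(bad)
except ValueError as exc:
    print("rejected:", exc)
```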
**DNS**: AdGuardHome runs on the router. Its upstream DNS should NOT include `1.1.1.1`: that creates conntrack conflicts with ICMP, and GL-iNet's `carrier-monitor` health check floods Cloudflare, triggering ICMP rate limits. Use `9.9.9.9` and `8.8.4.4` instead. The health check IPs (`glconfig.general.track_ip`) should use `1.0.0.1`, not `1.1.1.1`.
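A small lint can catch the `1.1.1.1` cases before they cause trouble. This is a hypothetical helper, not a GL-iNet tool. Feed it the upstream list from AdGuardHome and the value of `glconfig.general.track_ip` (for example read with `uci get`).

```python
from __future__ import annotations

AVOID = "1.1.1.1"


def host_of(upstream: str) -> str:
    """Strip an optional scheme, path and :port from an upstream like tls://1.1.1.1."""
    rest = upstream.split("://", 1)[-1].split("/", 1)[0]
    if rest.count(":") == 1:  # IPv4 or hostname with a port; bare IPv6 has several
        rest = rest.split(":", 1)[0]
    return rest


def lint_router_dns(upstreams: list[str], track_ips: list[str]) -> list[str]:
    """Return one message per setting that still points at 1.1.1.1."""
    problems = []
    if any(host_of(u) == AVOID for u in upstreams):
        problems.append("AdGuardHome upstream uses 1.1.1.1; use 9.9.9.9 / 8.8.4.4")
    if AVOID in track_ips:
        problems.append("glconfig.general.track_ip uses 1.1.1.1; use 1.0.0.1")
    return problems


print(lint_router_dns(["9.9.9.9", "8.8.4.4"], ["1.0.0.1"]))  # -> []
print(lint_router_dns(["tls://1.1.1.1"], ["1.1.1.1"]))
```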
1. **Symmetric NAT**: Mobile networks or restrictive corporate firewalls block UDP hole punching.
2. **Firewall blocking WireGuard**: WireGuard UDP traffic is blocked on one or both clients (Tailscale clients use UDP 41641 by default, not 51820).
3. **STUN failure**: The client can't determine its external IP and port.
**Fix**: This is expected behavior in many environments. DERP relay ensures connectivity. If latency is unacceptable, use site-to-site WireGuard instead.
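To confirm whether a given peer is relayed, `tailscale ping <peer>` reports the path for each pong. The parser below assumes the usual `pong from NAME (IP) via PATH in LATENCY` line format, which may differ between Tailscale versions. A `DERP(...)` path means the traffic is relayed. The region code `home` in the examples is hypothetical.

```python
from __future__ import annotations

import re

PONG = re.compile(
    r"pong from (?P<name>\S+) \((?P<ip>[^)]*)\) via (?P<via>\S+) "
    r"in (?P<val>[\d.]+)(?P<unit>ms|µs|us|s)"
)
TO_MS = {"s": 1000.0, "ms": 1.0, "µs": 0.001, "us": 0.001}


def classify_pong(line: str) -> tuple[str, float] | None:
    """Return ("derp" | "direct", latency_ms) for a pong line, else None."""
    m = PONG.search(line)
    if m is None:
        return None
    path = "derp" if m["via"].startswith("DERP(") else "direct"
    return path, float(m["val"]) * TO_MS[m["unit"]]


print(classify_pong("pong from laptop (100.64.0.2) via DERP(home) in 48ms"))
# -> ('derp', 48.0)
print(classify_pong("pong from laptop (100.64.0.2) via 203.0.113.5:41641 in 4ms"))
# -> ('direct', 4.0)
```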
1. **AdGuard not forwarding .lan**: Conditional forwarding rule missing or misconfigured.
2. **Technitium down**: Pod crash-looping or PVC corrupted.
3. **DNS propagation delay**: Technitium zone update not yet applied.
**Fix**: Verify conditional forwarding in AdGuard UI. Restart Technitium if needed. Check zone file in Technitium UI.
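Once you know which lookups succeed, the three causes map onto a simple decision table. This hypothetical helper takes three inputs: whether the name resolves via AdGuard, whether it resolves when querying Technitium directly (for example with `nslookup <name> <server>`), and whether the Technitium pod is Ready. It returns the most likely cause.

```python
def diagnose_lan_dns(via_adguard: bool, via_technitium: bool, technitium_ready: bool) -> str:
    """Map observations for one .viktorbarzin.lan name to a likely cause."""
    if via_adguard:
        return "ok"
    if not technitium_ready:
        return "technitium down: check pod restarts and PVC"
    if via_technitium:
        return "adguard not forwarding .lan: check conditional forwarding"
    return "record not served yet: check the zone in the Technitium UI"


print(diagnose_lan_dns(via_adguard=False, via_technitium=True, technitium_ready=True))
# -> adguard not forwarding .lan: check conditional forwarding
```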
### VPN Client Can't Reach K8s Services
**Symptoms**: Can `ping 10.0.20.1` (pfSense), but `curl https://immich.viktorbarzin.me` times out.
**Diagnosis**: Check connectivity at each layer:
1. **DNS**: Does `nslookup immich.viktorbarzin.me` return the correct IP?
2. **Routing**: Can the client reach the MetalLB IP? `ping <loadbalancer-ip>`
3. **Firewall**: Is pfSense blocking traffic from the VPN subnet?
**Common causes**:
1. **Split DNS working too well**: The client resolves the Cloudflare IP instead of the internal LAN IP. This is expected for proxied domains; use the direct domain instead (e.g., `immich-direct.viktorbarzin.me`).
2. **ACL policy**: A Headscale ACL blocks the client from accessing certain subnets.
3. **pfSense NAT rule missing**: Traffic from the VPN subnet is not routed to VLAN 20.
**Fix**: For proxied domains, use non-proxied DNS names. Check Headscale ACL policy. Verify pfSense NAT rules.
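The layered diagnosis can be partly automated. This sketch swaps ICMP ping for a TCP connect, because raw ICMP sockets usually need root. It checks DNS resolution and then whether the HTTPS port answers. The function names are hypothetical, and the sketch cannot tell a routing problem from a firewall drop.

```python
from __future__ import annotations

import socket


def resolved_ips(host: str) -> list[str]:
    """Layer 1: which addresses does the client's resolver return?"""
    return sorted({info[4][0] for info in socket.getaddrinfo(host, None)})


def first_failing_layer(host: str, port: int = 443, timeout: float = 3.0) -> str | None:
    """Return "dns", "routing/firewall", or None if a TCP connection succeeds."""
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns"
    for family, socktype, proto, _canon, addr in infos:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(addr)
                return None  # reachable: look at TLS/HTTP or the ingress next
        except OSError:
            continue  # refused or timed out; try the next resolved address
    return "routing/firewall"
```

Usage from a VPN client: `print(resolved_ips("immich.viktorbarzin.me"), first_failing_layer("immich.viktorbarzin.me"))`. If `resolved_ips` returns Cloudflare addresses rather than a MetalLB IP, you are in the proxied-domain case listed under common causes.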
1. **AllowedIPs parse error on GL-iNet**: If `wg show wgclient1` shows no peers and the interface is DOWN with `qdisc noop`, check the peer config in `/etc/config/wireguard`. AllowedIPs must be a single comma-separated entry, not multiple `list` entries (see London section above).
2. **IPv6 endpoint resolution**: If IPv4 is down, DNS resolves to IPv6 (AAAA record). Ensure the pfSense `HE_IPv6` (gif0) interface has a `pass in` rule for UDP 51821.
3. **Keepalive packets dropped**: A firewall or ISP is blocking UDP 51821.
4. **Public IP changed**: The remote site's dynamic IP changed, but the config still has the old IP.
5. **GL-iNet policy routing lost**: After a firewall reload, check whether the `TUNNEL10_ROUTE_POLICY` and `LOCAL_POLICY` mangle rules exist. If not, run `/etc/init.d/firewall restart` and check that `/etc/firewall.user` was executed.
6. **Kill switch active**: If the WG interface is DOWN, table 1001 only has blackhole routes → all marked traffic is dropped → IPv4 internet breaks.
**Fix**: Check `wg show wgclient1` on the London router. If it shows no peers, fix the AllowedIPs format and run `ifdown wgclient1 && ifup wgclient1`. Verify the tunnel with `ping 10.3.2.1`.
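The `wg show` check can be scripted against the machine-readable form `wg show wgclient1 latest-handshakes`. That command prints one `public-key<TAB>unix-timestamp` line per peer, with 0 if no handshake has happened yet. The staleness threshold below is an assumption: with persistent keepalive enabled, a healthy tunnel re-handshakes roughly every two minutes, so anything much older suggests the tunnel is down.

```python
from __future__ import annotations

import time

STALE_AFTER_S = 180  # assumption: > ~3 min without a handshake means trouble


def handshake_status(output: str, now: float | None = None) -> str:
    """Classify `wg show <iface> latest-handshakes` output."""
    now = time.time() if now is None else now
    stamps = [int(line.split()[-1]) for line in output.splitlines() if line.strip()]
    if not stamps:
        return "no peers: check the AllowedIPs format, then ifdown/ifup"
    newest = max(stamps)
    if newest == 0:
        return "never handshaked: check endpoint and UDP 51821 reachability"
    age = int(now - newest)
    return "ok" if age <= STALE_AFTER_S else f"stale: last handshake {age}s ago"
```

On the router the input could come from `subprocess.run(["wg", "show", "wgclient1", "latest-handshakes"], capture_output=True, text=True).stdout`. Note that an empty result also matches the AllowedIPs parse-error symptom above.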