infra/docs/plans/2026-07-04-immich-frame-lan-only-design.md
Viktor Barzin 8bac9914ec
immich-frame: LAN-only access via home-lans-only allowlist + dns_type=internal
Viktor asked to tighten who can see the immich-frame deployments: make
them not public while keeping the two Meta Portals working as frames.
The Portal app bakes the URL into the APK, so the same hostnames must
keep loading from the home networks with zero device or router changes.

- New shared Traefik middleware home-lans-only (Sofia/London/Valchedrym
  LANs + 10/8 + internal v6) — separate from local-only so the remote
  LANs don't inherit access to admin surfaces.
- New ingress_factory dns_type="internal": publicly-resolvable A record
  carrying the internal Traefik LB IP (10.0.20.203). Outsiders resolve
  but can't route; WG spokes policy-route 10/8 down the tunnel. Never
  combine the allowlist with proxied DNS (cloudflared pod IPs are in
  10/8 and would bypass it).
- Both frame ingresses: dns_type internal + allowlist attached +
  external_monitor=false (drop the doomed [External] monitors).
- rybbit worker: highlights-immich route/site removed (off Cloudflare).
- Docs: CLAUDE.md/AGENTS.md ingress tiers, networking.md DNS categories,
  design doc docs/plans/2026-07-04-immich-frame-lan-only-design.md.

Pre-verified: London router DNS returns RFC1918 answers unfiltered;
Technitium already CNAMEs both hosts to the LB; no public wildcard.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-04 14:21:01 +00:00

immich-frame: LAN-only access, Portals untouched (2026-07-04)

Goal

Strangers must no longer be able to view highlights-immich.viktorbarzin.me (Viktor's London Portal Plus frame) or highlights-immich-emo.viktorbarzin.me (Emo's Sofia Portal Mini frame) — pages or ImmichFrame API. Both were auth = "none", Cloudflare-proxied, fully public.

Who keeps access (per Viktor, this session): the two Portals plus any household device on the Sofia, London, or Valchedrym home networks. No public access, no tailnet requirement. Hard constraint: the Portal app is a WebView with the URL baked in at APK build time (portal-immich-frame, -PframeUrl), so the exact URLs must keep loading from where the Portals sit — zero app rebuilds, zero device touches, zero router changes.

Design

Two cooperating pieces — the gate and the reachability pointer:

  1. The gate — home-lans-only Traefik middleware (traefik stack, next to local-only): ipAllowList of 192.168.1.0/24 (Sofia LAN), 10.0.0.0/8 (VLANs, K8s pods 10.10.0.0/16, services 10.96.0.0/12, WG tunnel 10.3.2.0/24), 192.168.8.0/24 (London LAN), 192.168.0.0/24 (Valchedrym LAN), fc00::/7, fe80::/10. Attached to both frame ingresses via extra_middlewares. Everyone else gets a Traefik 403 — including direct-to-WAN-IP requests carrying the right SNI, which DNS changes alone cannot stop. A separate middleware rather than a widened local-only, because widening would silently grant the remote LANs access to the 9 admin surfaces using it (Prometheus, iDRAC, Loki, …).

  2. The pointer — dns_type = "internal" (new ingress_factory tier, Viktor's idea): a non-proxied public A record → 10.0.20.203 (module var internal_lb_ip). Outsiders resolve it but get an unroutable RFC1918 address; every household resolver path delivers a working answer with no config anywhere: Sofia LAN already gets the internal CNAME from Technitium, London/Valchedrym resolve the public record via any upstream and policy-route 10.0.0.0/8 down the WireGuard tunnel. IPv4-only (spokes route no internal v6 range).
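The gate's decision can be sketched in a few lines of Python. This is a model of the allowlist semantics only, not Traefik's implementation; the source ranges are copied from item 1, while the function name `is_allowed` and the sample addresses are illustrative assumptions.

```python
import ipaddress

# Source ranges as listed for the home-lans-only middleware above.
HOME_LANS_ONLY = [
    "192.168.1.0/24",  # Sofia LAN
    "10.0.0.0/8",      # VLANs, K8s pods, services, WG tunnel
    "192.168.8.0/24",  # London LAN
    "192.168.0.0/24",  # Valchedrym LAN
    "fc00::/7",        # IPv6 unique local addresses
    "fe80::/10",       # IPv6 link-local addresses
]
NETWORKS = [ipaddress.ip_network(cidr) for cidr in HOME_LANS_ONLY]


def is_allowed(source_ip: str) -> bool:
    """Return True if the source IP falls inside any allowlisted range.

    Hypothetical helper: anything returning False here corresponds to
    the 403 described above.
    """
    addr = ipaddress.ip_address(source_ip)
    # Membership is always False when the IP versions differ, so IPv4 and
    # IPv6 ranges can share one list.
    return any(addr in net for net in NETWORKS)


if __name__ == "__main__":
    for ip in ("192.168.8.23", "10.3.2.4", "203.0.113.9", "192.168.2.5"):
        print(ip, "allowed" if is_allowed(ip) else "403")
```

Note that 192.168.2.5 is rejected even though it is RFC1918: the list names three specific /24 LANs rather than all of 192.168.0.0/16.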

Interlock (the reason both flip together): with a proxied record, public traffic arrives from cloudflared pod IPs inside 10/8 and would sail through the allowlist. dns_type = "internal" removes the Cloudflare path entirely (the CF edge stops serving the hostname), so every request reaches Traefik with its real source IP (ETP=Local, i.e. externalTrafficPolicy: Local). Verified: no wildcard *.viktorbarzin.me record exists to resurrect public resolution.
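The interlock is easiest to see by modelling which source IP Traefik observes on each path. The sketch below is illustrative only: the stranger address (198.51.100.7) and the cloudflared pod IP (10.10.42.7) are made-up values, and `source_ip_at_traefik` is a hypothetical helper, not part of any real tooling.

```python
import ipaddress

# Only the 10/8 entry matters for this scenario; the LAN /24s are omitted.
TEN_SLASH_EIGHT = ipaddress.ip_network("10.0.0.0/8")

# Hypothetical cloudflared pod address inside the pod CIDR 10.10.0.0/16.
CLOUDFLARED_POD_IP = "10.10.42.7"


def source_ip_at_traefik(client_ip: str, via_cloudflare: bool) -> str:
    """Model the source IP Traefik sees for a request.

    Through the Cloudflare tunnel the request is re-originated by the
    cloudflared pod; on the direct path (ETP=Local) the client IP survives.
    """
    return CLOUDFLARED_POD_IP if via_cloudflare else client_ip


def passes_gate(seen_ip: str) -> bool:
    return ipaddress.ip_address(seen_ip) in TEN_SLASH_EIGHT


stranger = "198.51.100.7"  # documentation-range address, stands in for "anyone"

# Proxied record + allowlist: the stranger inherits the pod's 10/8 address.
proxied_bypass = passes_gate(source_ip_at_traefik(stranger, via_cloudflare=True))

# Internal record: no Cloudflare path, so the real source IP is evaluated.
direct_blocked = not passes_gate(source_ip_at_traefik(stranger, via_cloudflare=False))
```

Under these assumptions `proxied_bypass` and `direct_blocked` are both true, which is why the allowlist is never combined with proxied DNS.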

auth stays "none": there is still no user auth by design (kiosk WebView; forward-auth would 302 the device to a login it can't complete, and Emo's Google-only account can't log in inside a WebView at all); the convention comment now names the ipAllowList as the gate.

Resulting flows

| Client | Path | Result |
|---|---|---|
| Emo's Portal Mini (Sofia LAN) | Technitium CNAME → .203 direct (unchanged) | allowed (192.168.1.x) |
| Viktor's Portal Plus (London LAN) | public A → 10.0.20.203 → WG tunnel | allowed (192.168.8.x) |
| Household browsers (any of the 3 LANs) | same as above | allowed |
| In-cluster checks (homelab browser, blackbox) | CoreDNS → Technitium → .203 | allowed (pod IP in 10/8) |
| Stranger, resolves hostname | gets 10.0.20.203 | unroutable |
| Stranger, hits WAN IP with SNI | pfSense NAT → Traefik (real source IP) | 403 |
| Stranger, via Cloudflare | no proxied record | CF edge won't serve the host |

Rejected alternatives

  • ImmichFrame AuthenticationSecret (supported upstream: web input field or ?authsecret= param + bearer API): real auth from anywhere, but family browsers would face a secret prompt (fails "household devices just work"), the secret leaks into URLs/analytics/APK, and robust rollout needs APK rebuild + USB-adb sideload on both Portals (the Sofia one is high-friction).
  • Authentik forward-auth / auth = "public": WebView can't complete SSO (Google blocks WebView logins; session expiry silently bricks an appliance); the anonymous outpost is an audit trail, not a gate.
  • Remove DNS + London router AdGuardHome rewrites: works, but adds an out-of-band, un-IaC'd router dependency the internal-IP record makes unnecessary. Kept as documented fallback if resolver-side private-IP filtering ever appears in the London path.

Pre-verified facts (2026-07-04)

  • London Flint 2 DNS chain returns RFC1918 answers unfiltered (nslookup 10.0.20.203.nip.io 127.0.0.1 on the router → 10.0.20.203; dnsmasq rebind_protection '0', no AdGuardHome rebind filtering).
  • Technitium already CNAMEs both hostnames → apex → 10.0.20.203 (technitium-ingress-dns-sync is ingress-driven, not DNS-record-driven, so the internal answer survives the Cloudflare record swap).
  • Pod CIDR 10.10.0.0/16, service CIDR 10.96.0.0/12 — inside 10.0.0.0/8.
  • No public wildcard record in the zone.
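The CIDR containment claim above can be re-checked mechanically. A minimal sketch using Python's standard `ipaddress` module; the helper name `inside_allowlist` is an assumption introduced for illustration.

```python
import ipaddress

# The single 10/8 entry in the home-lans-only allowlist.
ALLOWLISTED_SUPERNET = ipaddress.ip_network("10.0.0.0/8")


def inside_allowlist(cidr: str) -> bool:
    """Return True if the whole CIDR is contained in 10.0.0.0/8."""
    return ipaddress.ip_network(cidr).subnet_of(ALLOWLISTED_SUPERNET)


# Ranges named in this document.
checks = {
    "pod CIDR": inside_allowlist("10.10.0.0/16"),
    "service CIDR": inside_allowlist("10.96.0.0/12"),
    "WG tunnel": inside_allowlist("10.3.2.0/24"),
}
```

All three entries evaluate to True, so in-cluster checks and tunnel traffic are covered by the 10/8 entry without extra ranges.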

Blast radius & cleanups

  • external_monitor = false set explicitly on both ingresses: the external-monitor-sync default opt-in would otherwise keep the now-doomed [External] highlights-immich* uptime-kuma monitors alive and red. Verify the sync drops them post-apply.
  • rybbit CF worker: highlights-immich removed from SITE_IDS (index.js) and wrangler.toml routes — off Cloudflare the route can never fire. Requires a wrangler deploy to take effect (route removal is hygiene, not functional).
  • Homepage dashboard link keeps working from LANs (hostname unchanged).
  • Docs updated in the same change: .claude/CLAUDE.md (DNS tier + external-monitor mechanism), AGENTS.md, docs/architecture/networking.md (Internal-IP domains category). The portal-immich-frame repo's glossary ("public, login-less URL") updated separately in that repo.

Failure-mode delta

London frame now depends on the WG tunnel instead of Cloudflare+cloudflared (the app self-heals with 5s retries; tunnel-flap modes documented in docs/architecture/vpn.md). A Traefik LB renumber must update internal_lb_ip in the module alongside the split-horizon apex record. Cutover window: cached proxied answers keep working ≤ ~5 min TTL, then the WebView's own retry picks up the new path.

Verification & rollback

Verify: public dig → 10.0.20.203 (both hosts); Technitium dig → .203; curl from devvm (10/8) → 200; external vantage (WebFetch/cloud) → unreachable or 403; middleware attached on both ingresses; Emo's frame renders via homelab browser; London Portal image fetches visible in Traefik access logs from 192.168.8.x. Rollback: git revert + apply traefik/immich — records and middleware chain restore (allow_overwrite = true re-adopts the records).
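The DNS half of the verification could be scripted roughly as follows. This is a sketch under assumptions: it uses whatever resolver the host running it is configured with (so the vantage point matters), and the function names are hypothetical. The resolver is injectable so the check logic can be exercised without network access.

```python
import ipaddress
import socket

EXPECTED_LB_IP = "10.0.20.203"  # internal Traefik LB IP from the design
FRAME_HOSTS = [
    "highlights-immich.viktorbarzin.me",
    "highlights-immich-emo.viktorbarzin.me",
]


def resolve_ipv4(hostname: str) -> set[str]:
    """Return the IPv4 addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string.
    return {info[4][0] for info in infos}


def answer_is_internal_only(addresses: set[str]) -> bool:
    """True when the only answer is the internal LB IP, a private address."""
    if addresses != {EXPECTED_LB_IP}:
        return False
    return all(ipaddress.ip_address(a).is_private for a in addresses)


def check_hosts(resolver=resolve_ipv4) -> dict[str, bool]:
    """Map each frame hostname to whether it resolves as the design expects."""
    return {host: answer_is_internal_only(resolver(host)) for host in FRAME_HOSTS}
```

Calling `check_hosts()` from a public vantage and from the Sofia LAN should report True for both hosts after the cutover; a Cloudflare edge address in the answer would indicate the proxied record is still being served or cached. The HTTP-level checks (200 from 10/8, 403 or unreachable from outside) still need the curl and external-vantage steps listed above.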