From eae35c511a1ae80c4fb3c1442d93d10e50847746 Mon Sep 17 00:00:00 2001
From: Viktor Barzin
Date: Wed, 10 Jun 2026 18:41:07 +0000
Subject: [PATCH] =?UTF-8?q?pfsense:=20SNI-routed=20internal=20443=20?=
 =?UTF-8?q?=E2=80=94=20mail.viktorbarzin.me=20serves=20webmail=20everywher?=
 =?UTF-8?q?e?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Completes the internal port table of the mail front door (10.0.20.1):
443 was squatted by the pfSense webGUI (self-signed cert expired 2022),
so internal webmail and the kuma [External] mail probe hit the firewall
login instead of Roundcube — the last leg of the mail split-brain name.

Design (Viktor): route by what the client asked for. New HAProxy
frontend internal_https_443 (binds 10.0.20.1+10.0.10.1 :443, mode tcp):
SNI present -> Traefik .203 with send-proxy-v2 (trusted, IPv6-bridge
pattern, no health check per the PROXY-probe gotcha); SNI of
pfsense.viktorbarzin.{lan,me} or NO SNI (bare-IP admin access) ->
webGUI, which moved to :8443 (invisible to habits — https://10.0.20.1
still lands on the login page; :8443 doubles as direct fallback). The
reverse-proxy pfsense ingress now targets :8443 directly.

Declared idempotently in pfsense-haproxy-bootstrap.php; config.xml
backed up on-box (config.xml.bak-2026-06-10-pre-sni443).

Verified: bare IP -> GUI login; pfsense.viktorbarzin.lan -> GUI;
pfsense.viktorbarzin.me -> 302 via ingress; mail.viktorbarzin.me ->
Roundcube with STRICT cert validation; :993 IMAPS untouched.

Co-Authored-By: Claude Fable 5 --- docs/architecture/dns.md | 2 +- docs/runbooks/mailserver-pfsense-haproxy.md | 24 +++++- scripts/pfsense-haproxy-bootstrap.php | 86 ++++++++++++++++++- .../modules/reverse_proxy/main.tf | 5 +- 4 files changed, 113 insertions(+), 4 deletions(-) diff --git a/docs/architecture/dns.md b/docs/architecture/dns.md index d82bcd5f..6150d226 100644 --- a/docs/architecture/dns.md +++ b/docs/architecture/dns.md @@ -269,7 +269,7 @@ Technitium's **Split Horizon AddressTranslation** app post-processes DNS respons - **Affected**: Non-proxied domains (ha-sofia, immich, headscale, calibre, vaultwarden, etc.) for 192.168.1.x clients - **Not affected**: Cloudflare-proxied domains (resolve to Cloudflare edge IPs, no translation needed) -- **10.0.x.x clients (k8s nodes, devvm, other VMs)** — handled at the resolver since 2026-06-10: **pfSense Unbound carries a domain override forwarding the whole `viktorbarzin.me` zone to Technitium** (`10.0.20.201`). Technitium's split-horizon zone answers with the zone apex A record, which auto-tracks the live Traefik LB IP (`technitium-ingress-dns-sync` CNAMEs every ingress host hourly; `viktorbarzin-apex-probe` is the drift canary). Every client of pfSense Unbound — all VLANs, k8s nodes included — therefore gets internal answers with **zero per-host configuration** (no `/etc/hosts` pins, no resolved drop-ins; both earlier same-day approaches were removed, nodes are stock). Names not behind Traefik keep distinct records in the zone (e.g. `mail.viktorbarzin.me → 10.0.20.1`, verified working on :993/:25). See `docs/runbooks/pfsense-unbound.md` for the override config + rollback, and `docs/post-mortems/2026-06-10-tuya-bridge-forgejo-pull-hairpin.md` for the incident that motivated this (kubelet forgejo pulls riding the broken hairpin; the containerd hosts.toml mirror cannot fix it — Traefik 404s bare-IP requests and the registry auth realm is an absolute public URL). 
+- **10.0.x.x clients (k8s nodes, devvm, other VMs)** — handled at the resolver since 2026-06-10: **pfSense Unbound carries a domain override forwarding the whole `viktorbarzin.me` zone to Technitium** (`10.0.20.201`). Technitium's split-horizon zone answers with the zone apex A record, which auto-tracks the live Traefik LB IP (`technitium-ingress-dns-sync` CNAMEs every ingress host hourly; `viktorbarzin-apex-probe` is the drift canary). Every client of pfSense Unbound — all VLANs, k8s nodes included — therefore gets internal answers with **zero per-host configuration** (no `/etc/hosts` pins, no resolved drop-ins; both earlier same-day approaches were removed, nodes are stock). Names not behind Traefik keep distinct records in the zone (e.g. `mail.viktorbarzin.me → 10.0.20.1`, verified working on :993/:25; since 2026-06-10 its :443 also works internally — pfSense carries an SNI-routed HAProxy frontend on 443 that sends hostname traffic to Traefik and bare-IP/no-SNI traffic to the webGUI, which moved to :8443; see `docs/runbooks/mailserver-pfsense-haproxy.md`). See `docs/runbooks/pfsense-unbound.md` for the override config + rollback, and `docs/post-mortems/2026-06-10-tuya-bridge-forgejo-pull-hairpin.md` for the incident that motivated this (kubelet forgejo pulls riding the broken hairpin; the containerd hosts.toml mirror cannot fix it — Traefik 404s bare-IP requests and the registry auth realm is an absolute public URL). - **devvm**: also covered by a `~viktorbarzin.me → 10.0.20.201` resolved routing domain (predates the pfSense override, provisioned by `setup-devvm.sh`) — redundant-but-harmless belt-and-suspenders. - **in-cluster PODS are ordinary internal clients too** (since 2026-06-10 evening): CoreDNS's dedicated `viktorbarzin.me:53` block (in `stacks/technitium`, TF-managed) forwards to the Technitium ClusterIP (`10.96.0.53`, same as the `.lan` block), so pods get the same split-horizon answers as everyone else. 
This works because on k8s 1.34 **pods CAN reach the ETP=Local Traefik LB IP** — kube-proxy short-circuits in-cluster traffic to LB IPs via the cluster path (verified from pods on three non-Traefik nodes; re-verify after major k8s upgrades — the canary is the uptime-kuma `[External]` fleet going red). forgejo stays pinned to Traefik's **ClusterIP** in the same block so CI pushes survive a Technitium outage. History: the block briefly forwarded to `8.8.8.8/1.1.1.1` (morning of 2026-06-10), which kept pods on public IPs and the broken TP-Link NAT loopback — 27 non-proxied `[External]` uptime-kuma monitors dark (beads code-yh33). Note: in-cluster `[External]` monitors now test DNS+Traefik+service via the internal path for ALL names, including Cloudflare-proxied ones — genuine edge-path fidelity is the job of a true external vantage (ha-london), not in-cluster probes. - **Trade-off**: `viktorbarzin.me` resolution via pfSense now depends on in-cluster Technitium (3 replicas). During a full cluster outage the zone SERVFAILs LAN-wide — acceptable, the services behind it are down anyway; node bootstrap images pull via the IP-addressed `10.0.20.10` mirrors, so cold-start self-unwinds. 
diff --git a/docs/runbooks/mailserver-pfsense-haproxy.md b/docs/runbooks/mailserver-pfsense-haproxy.md index 329be214..f0780caf 100644 --- a/docs/runbooks/mailserver-pfsense-haproxy.md +++ b/docs/runbooks/mailserver-pfsense-haproxy.md @@ -55,7 +55,7 @@ External mail (WAN) path — PROXY v2 │ pfSense WAN:{25,465,587,993} │ │ │ NAT rdr → 10.0.20.1:{same} │ │ ▼ │ -│ pfSense HAProxy (mode tcp, 4 frontends, 4 backend pools) │ +│ pfSense HAProxy (mode tcp, 5 frontends, 6 backend pools) │ │ │ data: send-proxy-v2 → :{30125..30128} (PROXY-aware pod) │ │ │ health: TCP-check → :{30145..30147} (no-PROXY pod) │ │ │ inter 5000 │ @@ -113,6 +113,28 @@ kubectl logs -c docker-mailserver deployment/mailserver -n mailserver \ # Expect external source IPs (e.g., Brevo 77.32.148.x), NOT 10.0.20.x ``` +## SNI-routed internal :443 frontend (2026-06-10) + +`internal_https_443` binds `10.0.20.1:443` + `10.0.10.1:443` and completes +the internal port table of the mail front door so `mail.viktorbarzin.me` +(internal A record → 10.0.20.1) serves webmail too. Routing (Viktor's +design — route by what the client asked for): + +| Client connects with | Routed to | +|---|---| +| SNI = `pfsense.viktorbarzin.{lan,me}` | webgui backend `127.0.0.1:8443` | +| any other SNI (hostnames, e.g. `mail.…`) | Traefik `10.0.20.203:443`, send-proxy-v2 | +| no SNI (bare IP — `https://10.0.20.1`) | webgui backend `127.0.0.1:8443` | + +The **pfSense webGUI was moved to `:8443`** (config.xml +`system.webgui.port`, 2026-06-10) to free the 443 socket; admin access by +IP keeps working through the no-SNI route, and `:8443` remains a direct +fallback if HAProxy is down. The `pfsense.viktorbarzin.me` Traefik ingress +(stacks/reverse-proxy) targets `:8443` directly. Traefik leg mirrors the +IPv6 bridge: send-proxy-v2 (Traefik trusts 10.0.20.1), **no health check** +(PROXY-expecting receivers reject bare probes — gotcha above). All of this +is declared in `pfsense-haproxy-bootstrap.php` — re-run to reset. 
+ ## Bootstrap / restore from scratch pfSense HAProxy config lives in `/cf/conf/config.xml` under diff --git a/scripts/pfsense-haproxy-bootstrap.php b/scripts/pfsense-haproxy-bootstrap.php index 5452b198..5b9119b2 100644 --- a/scripts/pfsense-haproxy-bootstrap.php +++ b/scripts/pfsense-haproxy-bootstrap.php @@ -45,6 +45,8 @@ $h['maxconn'] = '1000'; // Our declared object names (anything starting with mailserver_ is ours) $POOL_NAMES = [ + 'webgui_traefik_443', // SNI-routed 443: hostname traffic -> Traefik + 'pfsense_webgui_8443', // SNI-routed 443: no-SNI / pfsense.* -> webgui 'mailserver_nodes', // legacy (Phase 2/3 test) 'mailserver_nodes_smtp', 'mailserver_nodes_smtps', @@ -52,6 +54,7 @@ $POOL_NAMES = [ 'mailserver_nodes_imaps', ]; $FRONTEND_NAMES = [ + 'internal_https_443', // SNI-routed internal 443 (2026-06-10) 'mailserver_proxy_test', // legacy (Phase 2/3 test, :2525) 'mailserver_proxy_25', 'mailserver_proxy_465', @@ -185,6 +188,58 @@ $h['ha_pools']['item'][] = build_pool('mailserver_nodes_smtps', '30126', $NODES, $h['ha_pools']['item'][] = build_pool('mailserver_nodes_sub', '30127', $NODES, 'TCP', '30147'); $h['ha_pools']['item'][] = build_pool('mailserver_nodes_imaps', '30128', $NODES); +// ── SNI-routed internal :443 pools (2026-06-10) ───────────────────────── +// Completes the internal port table of 10.0.20.1 so mail.viktorbarzin.me +// (internal A record -> 10.0.20.1) serves webmail too. Routing rule +// (Viktor's design): TLS with a hostname (SNI present) -> Traefik; bare-IP +// /no-SNI (admin hitting https://10.0.20.1) -> pfSense webgui, which moved +// to :8443 to free the socket. pfsense.viktorbarzin.{lan,me} SNI is +// excepted back to the webgui. Traefik leg mirrors the IPv6 bridge: +// send-proxy-v2 (Traefik trusts 10.0.20.1), NO health check (PROXY- +// expecting receivers reject bare probes — see runbook gotcha). 
+$h['ha_pools']['item'][] = [
+    'name' => 'webgui_traefik_443',
+    'balance' => '',
+    'check_type' => 'none',
+    'monitor_domain' => '',
+    'checkinter' => '',
+    'retries' => '',
+    'ha_servers' => ['item' => [[
+        'name' => 'traefik',
+        'address' => '10.0.20.203',
+        'port' => '443',
+        'weight' => '10',
+        'ssl' => '',
+        'advanced' => 'send-proxy-v2',
+        'status' => 'active',
+    ]]],
+    'advanced_bind' => '',
+    'persist_cookie_enabled' => '',
+    'transparent_clientip' => '',
+    'advanced' => '',
+];
+$h['ha_pools']['item'][] = [
+    'name' => 'pfsense_webgui_8443',
+    'balance' => '',
+    'check_type' => 'none',
+    'monitor_domain' => '',
+    'checkinter' => '',
+    'retries' => '',
+    'ha_servers' => ['item' => [[
+        'name' => 'webgui',
+        'address' => '127.0.0.1',
+        'port' => '8443',
+        'weight' => '10',
+        'ssl' => '',
+        'advanced' => '',
+        'status' => 'active',
+    ]]],
+    'advanced_bind' => '',
+    'persist_cookie_enabled' => '',
+    'transparent_clientip' => '',
+    'advanced' => '',
+];
+
 // ── Frontends ───────────────────────────────────────────────────────────
 if (!is_array($h['ha_backends'])) $h['ha_backends'] = ['item' => []];
 if (!is_array($h['ha_backends']['item'])) $h['ha_backends']['item'] = [];
@@ -228,7 +283,36 @@ $h['ha_backends']['item'][] = build_frontend(
     'mailserver_nodes_imaps'
 );
 
-write_config('code-yiu: mailserver HAProxy — 4 production frontends + legacy :2525 test');
+// ── SNI-routed internal :443 frontend (2026-06-10) ──────────────────────
+// Binds both internal interface IPs so IP-based GUI access works from
+// either VLAN. mode tcp + SNI inspection; TLS passthrough on both legs
+// (Traefik serves the real certs; the webgui keeps its self-signed one).
+$h['ha_backends']['item'][] = [
+    'name' => 'internal_https_443',
+    'descr' => 'SNI-routed internal 443: hostname->Traefik (proxy-v2), no-SNI/pfsense.*->webgui:8443',
+    'status' => 'active',
+    'secondary' => '',
+    'type' => 'tcp',
+    'a_extaddr' => ['item' => [
+        ['extaddr' => 'custom', 'extaddr_custom' => '10.0.20.1', 'extaddr_port' => '443', 'extaddr_ssl' => '', 'extaddr_advanced' => ''],
+        ['extaddr' => 'custom', 'extaddr_custom' => '10.0.10.1', 'extaddr_port' => '443', 'extaddr_ssl' => '', 'extaddr_advanced' => ''],
+    ]],
+    'backend_serverpool' => 'pfsense_webgui_8443',
+    'ha_acls' => ['item' => [
+        ['name' => 'sni_pfsense', 'expression' => 'custom', 'value' => 'req.ssl_sni -i -m str pfsense.viktorbarzin.lan pfsense.viktorbarzin.me', 'casesensitive' => '', 'not' => ''],
+        ['name' => 'sni_any', 'expression' => 'custom', 'value' => 'req.ssl_sni -m found', 'casesensitive' => '', 'not' => ''],
+    ]],
+    'a_actionitems' => ['item' => [
+        ['action' => 'use_backend', 'use_backendbackend' => 'pfsense_webgui_8443', 'acl' => 'sni_pfsense'],
+        ['action' => 'use_backend', 'use_backendbackend' => 'webgui_traefik_443', 'acl' => 'sni_any'],
+    ]],
+    'dontlognull' => '',
+    'httpclose' => '',
+    'forwardfor' => '',
+    'advanced' => base64_encode("tcp-request inspect-delay 5s\n\ttcp-request content accept if { req.ssl_hello_type 1 } || !{ req.ssl_hello_type 1 }"),
+];
+
+write_config('mailserver HAProxy + SNI-routed internal 443 (hostname->Traefik, no-SNI->webgui:8443)');
 
 $messages = '';
 $rc = haproxy_check_and_run($messages, true);
diff --git a/stacks/reverse-proxy/modules/reverse_proxy/main.tf b/stacks/reverse-proxy/modules/reverse_proxy/main.tf
index ebb145e9..92a3cd34 100644
--- a/stacks/reverse-proxy/modules/reverse_proxy/main.tf
+++ b/stacks/reverse-proxy/modules/reverse_proxy/main.tf
@@ -36,7 +36,10 @@ module "pfsense" {
   name = "pfsense"
   external_name = "pfsense.viktorbarzin.lan"
   tls_secret_name = var.tls_secret_name
-  port = 443
+  # webGUI moved to :8443 on 2026-06-10 — :443 on pfSense is now the
+  # SNI-routed HAProxy frontend (hostname->Traefik, no-SNI->GUI). Direct
+  # backend port avoids a Traefik->HAProxy->GUI double hop.
+  port = 8443
   backend_protocol = "HTTPS"
   extra_annotations = {