Merge origin/master (pfsense SNI-routed internal 443) into forgejo/master
Reconciles the two live infra remotes after the pve-host logging change landed on forgejo (which was a commit behind origin). Non-destructive merge: keeps both eae35c51 (pfsense webmail SNI routing) and aac807fb (pve-host Loki shipping).
commit 8304ef0f70
4 changed files with 113 additions and 4 deletions
@@ -269,7 +269,7 @@ Technitium's **Split Horizon AddressTranslation** app post-processes DNS respons
 - **Affected**: Non-proxied domains (ha-sofia, immich, headscale, calibre, vaultwarden, etc.) for 192.168.1.x clients
 - **Not affected**: Cloudflare-proxied domains (resolve to Cloudflare edge IPs, no translation needed)
-- **10.0.x.x clients (k8s nodes, devvm, other VMs)** — handled at the resolver since 2026-06-10: **pfSense Unbound carries a domain override forwarding the whole `viktorbarzin.me` zone to Technitium** (`10.0.20.201`). Technitium's split-horizon zone answers with the zone apex A record, which auto-tracks the live Traefik LB IP (`technitium-ingress-dns-sync` CNAMEs every ingress host hourly; `viktorbarzin-apex-probe` is the drift canary). Every client of pfSense Unbound — all VLANs, k8s nodes included — therefore gets internal answers with **zero per-host configuration** (no `/etc/hosts` pins, no resolved drop-ins; both earlier same-day approaches were removed, nodes are stock). Names not behind Traefik keep distinct records in the zone (e.g. `mail.viktorbarzin.me → 10.0.20.1`, verified working on :993/:25). See `docs/runbooks/pfsense-unbound.md` for the override config + rollback, and `docs/post-mortems/2026-06-10-tuya-bridge-forgejo-pull-hairpin.md` for the incident that motivated this (kubelet forgejo pulls riding the broken hairpin; the containerd hosts.toml mirror cannot fix it — Traefik 404s bare-IP requests and the registry auth realm is an absolute public URL).
+- **10.0.x.x clients (k8s nodes, devvm, other VMs)** — handled at the resolver since 2026-06-10: **pfSense Unbound carries a domain override forwarding the whole `viktorbarzin.me` zone to Technitium** (`10.0.20.201`). Technitium's split-horizon zone answers with the zone apex A record, which auto-tracks the live Traefik LB IP (`technitium-ingress-dns-sync` CNAMEs every ingress host hourly; `viktorbarzin-apex-probe` is the drift canary). Every client of pfSense Unbound — all VLANs, k8s nodes included — therefore gets internal answers with **zero per-host configuration** (no `/etc/hosts` pins, no resolved drop-ins; both earlier same-day approaches were removed, nodes are stock). Names not behind Traefik keep distinct records in the zone (e.g. `mail.viktorbarzin.me → 10.0.20.1`, verified working on :993/:25; since 2026-06-10 its :443 also works internally — pfSense carries an SNI-routed HAProxy frontend on 443 that sends hostname traffic to Traefik and bare-IP/no-SNI traffic to the webGUI, which moved to :8443; see `docs/runbooks/mailserver-pfsense-haproxy.md`). See `docs/runbooks/pfsense-unbound.md` for the override config + rollback, and `docs/post-mortems/2026-06-10-tuya-bridge-forgejo-pull-hairpin.md` for the incident that motivated this (kubelet forgejo pulls riding the broken hairpin; the containerd hosts.toml mirror cannot fix it — Traefik 404s bare-IP requests and the registry auth realm is an absolute public URL).
 - **devvm**: also covered by a `~viktorbarzin.me → 10.0.20.201` resolved routing domain (predates the pfSense override, provisioned by `setup-devvm.sh`) — redundant-but-harmless belt-and-suspenders.
 - **in-cluster PODS are ordinary internal clients too** (since 2026-06-10 evening): CoreDNS's dedicated `viktorbarzin.me:53` block (in `stacks/technitium`, TF-managed) forwards to the Technitium ClusterIP (`10.96.0.53`, same as the `.lan` block), so pods get the same split-horizon answers as everyone else. This works because on k8s 1.34 **pods CAN reach the ETP=Local Traefik LB IP** — kube-proxy short-circuits in-cluster traffic to LB IPs via the cluster path (verified from pods on three non-Traefik nodes; re-verify after major k8s upgrades — the canary is the uptime-kuma `[External]` fleet going red). forgejo stays pinned to Traefik's **ClusterIP** in the same block so CI pushes survive a Technitium outage. History: the block briefly forwarded to `8.8.8.8/1.1.1.1` (morning of 2026-06-10), which kept pods on public IPs and the broken TP-Link NAT loopback — 27 non-proxied `[External]` uptime-kuma monitors dark (beads code-yh33). Note: in-cluster `[External]` monitors now test DNS+Traefik+service via the internal path for ALL names, including Cloudflare-proxied ones — genuine edge-path fidelity is the job of a true external vantage (ha-london), not in-cluster probes.
 - **Trade-off**: `viktorbarzin.me` resolution via pfSense now depends on in-cluster Technitium (3 replicas). During a full cluster outage the zone SERVFAILs LAN-wide — acceptable, the services behind it are down anyway; node bootstrap images pull via the IP-addressed `10.0.20.10` mirrors, so cold-start self-unwinds.
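The hunk above says every internal client should now receive internal answers for `viktorbarzin.me` names. A minimal Python sketch of a sanity check for that claim follows. It assumes the internal ranges are `10.0.0.0/8` and `192.168.0.0/16`, which is inferred from the addresses mentioned in the diff rather than taken from a documented list. `is_internal_answer` is a hypothetical helper, and the address list would come from whatever resolver query is run on the client.

```python
import ipaddress

# Assumed internal ranges, inferred from the addresses in the diff above
# (not an authoritative list for this network).
INTERNAL_NETS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_internal_answer(addresses):
    """Return True only if at least one address was returned and all are internal."""
    parsed = [ipaddress.ip_address(a) for a in addresses]
    return bool(parsed) and all(
        any(ip in net for net in INTERNAL_NETS) for ip in parsed
    )


# Example: the mail record quoted in the diff.
mail_ok = is_internal_answer(["10.0.20.1"])
```

An empty answer is treated as a failure on purpose, so a SERVFAIL during a Technitium outage would not pass as "internal".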
@@ -55,7 +55,7 @@ External mail (WAN) path — PROXY v2
 │ pfSense WAN:{25,465,587,993} │
 │ │ NAT rdr → 10.0.20.1:{same} │
 │ ▼ │
-│ pfSense HAProxy (mode tcp, 4 frontends, 4 backend pools) │
+│ pfSense HAProxy (mode tcp, 5 frontends, 6 backend pools) │
 │ │ data: send-proxy-v2 → :{30125..30128} (PROXY-aware pod) │
 │ │ health: TCP-check → :{30145..30147} (no-PROXY pod) │
 │ │ inter 5000 │
@@ -113,6 +113,28 @@ kubectl logs -c docker-mailserver deployment/mailserver -n mailserver \
 # Expect external source IPs (e.g., Brevo 77.32.148.x), NOT 10.0.20.x
 ```

+## SNI-routed internal :443 frontend (2026-06-10)
+
+`internal_https_443` binds `10.0.20.1:443` + `10.0.10.1:443` and completes
+the internal port table of the mail front door so `mail.viktorbarzin.me`
+(internal A record → 10.0.20.1) serves webmail too. Routing (Viktor's
+design — route by what the client asked for):
+
+| Client connects with | Routed to |
+|---|---|
+| SNI = `pfsense.viktorbarzin.{lan,me}` | webgui backend `127.0.0.1:8443` |
+| any other SNI (hostnames, e.g. `mail.…`) | Traefik `10.0.20.203:443`, send-proxy-v2 |
+| no SNI (bare IP — `https://10.0.20.1`) | webgui backend `127.0.0.1:8443` |
+
+The **pfSense webGUI was moved to `:8443`** (config.xml
+`system.webgui.port`, 2026-06-10) to free the 443 socket; admin access by
+IP keeps working through the no-SNI route, and `:8443` remains a direct
+fallback if HAProxy is down. The `pfsense.viktorbarzin.me` Traefik ingress
+(stacks/reverse-proxy) targets `:8443` directly. Traefik leg mirrors the
+IPv6 bridge: send-proxy-v2 (Traefik trusts 10.0.20.1), **no health check**
+(PROXY-expecting receivers reject bare probes — gotcha above). All of this
+is declared in `pfsense-haproxy-bootstrap.php` — re-run to reset.
+
 ## Bootstrap / restore from scratch

 pfSense HAProxy config lives in `/cf/conf/config.xml` under
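The routing table added in this hunk can be read as a three-way decision on the TLS SNI value. A minimal Python sketch of that decision follows. It is an illustration of the table only, not the HAProxy configuration itself, which per the runbook is declared in `pfsense-haproxy-bootstrap.php`. The function name `route_by_sni` and the case-insensitive comparison are assumptions made for this example.

```python
# Backends as named in the routing table above.
WEBGUI_BACKEND = "127.0.0.1:8443"
TRAEFIK_BACKEND = "10.0.20.203:443"  # reached with send-proxy-v2 per the table

# SNI values that the table sends to the webGUI.
WEBGUI_SNI = {"pfsense.viktorbarzin.lan", "pfsense.viktorbarzin.me"}


def route_by_sni(sni):
    """Pick a backend for a TLS connection given its SNI (None if absent)."""
    if not sni:
        # No SNI, e.g. a client that connected by bare IP.
        return WEBGUI_BACKEND
    if sni.lower() in WEBGUI_SNI:  # lowercasing is an assumption of this sketch
        return WEBGUI_BACKEND
    # Any other hostname goes to Traefik.
    return TRAEFIK_BACKEND


# Example: webmail by hostname lands on Traefik.
mail_backend = route_by_sni("mail.viktorbarzin.me")
```

Ordering the no-SNI check first matches the stated intent that admin access by IP keeps working even though 443 is now shared.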