dns: pfSense forward-zone for viktorbarzin.me, nodes fully stock [ci skip]
Round 3 of the forgejo-pull hairpin fix (per Viktor: no per-node
customization; split-brain lives in the DNS infra):

- pfSense Unbound domain override viktorbarzin.me -> Technitium 10.0.20.201
  (applied via php write_config, backup on-box). Every Unbound client on
  every VLAN now gets the internal split-horizon answers (live Traefik IP
  via apex CNAME) with zero per-host config.
- CoreDNS carve-out (TF, applied): dedicated viktorbarzin.me:53 block.
  forgejo is pinned to the Traefik ClusterIP via data source (pods cannot
  reach the ETP=Local LB IP pfSense now returns); all other .me names stay
  on public resolvers (pods' pre-existing behavior). Replaces the .:53
  forgejo rewrite.
- Removed the same-day resolved routing-domain drop-ins from all 7 nodes;
  node5/6 link DNS repointed Technitium -> pfSense (netplan + qm 205/206)
  for fleet parity; cloud-init no longer writes any DNS drop-ins.
- Docs: dns.md, pfsense-unbound runbook (override + rollback), registry
  bullet, post-mortem final-architecture addendum.

Verified: nodes resolve forgejo -> .203 via pfSense, crictl pull OK, pods
resolve forgejo -> ClusterIP / others -> public, mail record works, .lan
zone unaffected.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
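The node-side verification described above ("nodes resolve forgejo -> .203 via pfSense") could be scripted roughly as follows. This is a minimal sketch, not part of the commit: `is_private_ipv4` and `check_internal` are hypothetical helper names, the resolver address is left as a placeholder, and the check only asserts that the answer is an RFC 1918 address rather than a specific IP.

```shell
#!/usr/bin/env bash
# Sketch only; is_private_ipv4 and check_internal are hypothetical helpers,
# not part of the commit.

# Return 0 if $1 is an RFC 1918 IPv4 address, 1 if it is any other IPv4
# address, 2 if it is not a dotted quad at all.
is_private_ipv4() {
  local a b c d o
  IFS=. read -r a b c d <<<"$1"
  for o in "$a" "$b" "$c" "$d"; do
    [[ "$o" =~ ^[0-9]{1,3}$ ]] || return 2
    (( 10#$o <= 255 )) || return 2
  done
  (( 10#$a == 10 )) && return 0
  (( 10#$a == 172 && 10#$b >= 16 && 10#$b <= 31 )) && return 0
  (( 10#$a == 192 && 10#$b == 168 )) && return 0
  return 1
}

# Resolve NAME against RESOLVER and fail unless the final answer is internal.
# dig +short prints any CNAME targets first, so only the last line is checked.
check_internal() {
  local name="$1" resolver="$2" answer
  answer="$(dig +short "@$resolver" "$name" A | tail -n1)"
  if is_private_ipv4 "$answer"; then
    echo "ok: $name -> $answer via $resolver"
  else
    echo "FAIL: $name -> ${answer:-<no answer>} via $resolver" >&2
    return 1
  fi
}

# Example (PFSENSE_IP is a placeholder for the pfSense resolver address):
#   check_internal forgejo.viktorbarzin.me "$PFSENSE_IP"
```

Run from a node with stock DNS, a public answer here would suggest the Unbound domain override is not being hit.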
parent 1ee1bf0817
commit 2b8c0def30
8 changed files with 182 additions and 101 deletions
@@ -1,23 +1,23 @@
 #!/usr/bin/env bash
-# One-shot deployment of the forgejo pull path across every k8s node:
-# systemd-resolved routing domain ~viktorbarzin.me -> Technitium, plus the
-# (vestigial) containerd hosts.toml entry. Cloud-init only fires on VM
-# provision, so existing nodes need this manual rollout.
+# One-shot deployment of the (vestigial) forgejo containerd hosts.toml entry
+# across every k8s node, plus cleanup of legacy node-side DNS customization.
+# Cloud-init only fires on VM provision, so existing nodes need this manual
+# rollout.
 #
-# The routing domain is what actually makes pulls hairpin-proof: Technitium's
-# split-horizon zone resolves forgejo.viktorbarzin.me (CNAME, auto-synced from
-# ingresses) to the zone apex whose A record tracks the live Traefik LB IP —
-# no hardcoded service IPs on nodes. The hosts.toml mirror alone CANNOT do
-# this: Traefik 404s its bare-IP requests (no Host/SNI match) and the registry
-# Bearer auth realm is the absolute public URL fetched outside the mirror
-# (2026-06-10 tuya-bridge outage; see
-# docs/post-mortems/2026-06-10-tuya-bridge-forgejo-pull-hairpin.md).
+# Node DNS is intentionally STOCK: internal split-horizon for
+# *.viktorbarzin.me happens at pfSense Unbound (domain override ->
+# Technitium), whose split-horizon zone serves the live Traefik LB IP for
+# every ingress host — nodes need no resolved drop-ins or /etc/hosts pins.
+# The hosts.toml mirror alone CANNOT keep pulls internal: Traefik 404s its
+# bare-IP requests (no Host/SNI match) and the registry Bearer auth realm is
+# the absolute public URL fetched outside the mirror (2026-06-10 tuya-bridge
+# outage; see docs/post-mortems/2026-06-10-tuya-bridge-forgejo-pull-hairpin.md).
 #
 # What it does, per node:
 # 1. drain (ignore-daemonsets, delete-emptydir-data)
-# 2. ssh in: write /etc/systemd/resolved.conf.d/viktorbarzin.conf (routing
-# domain), neuter any public global-dns.conf to FallbackDNS-only, drop
-# legacy forgejo-internal-pin /etc/hosts lines, restart systemd-resolved,
+# 2. ssh in: remove legacy DNS customization (forgejo-internal-pin
+# /etc/hosts lines, viktorbarzin.conf / global-dns.conf resolved
+# drop-ins), restart systemd-resolved,
 # write /etc/containerd/certs.d/forgejo.viktorbarzin.me/hosts.toml
 # 3. systemctl restart containerd
 # 4. uncordon
@@ -51,26 +51,9 @@ for n in $NODES; do
 
 ssh -o StrictHostKeyChecking=accept-new "wizard@$n" sudo bash <<EOF
 set -euo pipefail
-mkdir -p /etc/systemd/resolved.conf.d
-cat > /etc/systemd/resolved.conf.d/viktorbarzin.conf <<'CONF'
-# Route *.viktorbarzin.me to Technitium (split-horizon zone -> live Traefik LB),
-# so kubelet image pulls of forgejo.viktorbarzin.me never traverse the public
-# NAT-hairpin. Everything else uses the link DNS.
-# Managed: setup-forgejo-containerd-mirror.sh / cloud_init.yaml
-[Resolve]
-DNS=10.0.20.201
-Domains=~viktorbarzin.me
-CONF
-# Public servers in the global DNS= set would race the routing domain —
-# demote any legacy global-dns.conf to emergency fallback only.
-if [ -f /etc/systemd/resolved.conf.d/global-dns.conf ]; then
-cat > /etc/systemd/resolved.conf.d/global-dns.conf <<'CONF'
-# Emergency fallback only (used when no link DNS is configured at all).
-[Resolve]
-FallbackDNS=8.8.8.8 1.1.1.1
-CONF
-fi
 sed -i '/forgejo-internal-pin/d' /etc/hosts
+rm -f /etc/systemd/resolved.conf.d/viktorbarzin.conf \
+/etc/systemd/resolved.conf.d/global-dns.conf
 systemctl restart systemd-resolved
 mkdir -p "$CERTS_DIR"
 cat > "$CERTS_DIR/hosts.toml" <<'TOML'
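
The cleanup that this hunk leaves in place (drop the `forgejo-internal-pin` lines from /etc/hosts, delete the two resolved drop-ins) can be tried against a scratch directory before touching a node. The following is a rough sketch under stated assumptions: `cleanup_node_dns` and its root-prefix argument are hypothetical and do not appear in the commit, `sed -i` is assumed to be GNU sed, and the `systemctl restart systemd-resolved` step is deliberately left out so the function is safe to dry-run.

```shell
#!/usr/bin/env bash
# Sketch only: cleanup_node_dns and its root-prefix argument are hypothetical,
# introduced here so the cleanup can be exercised against a scratch tree.

cleanup_node_dns() {
  local root="${1:-}"   # empty means the real filesystem root
  local hosts="$root/etc/hosts"
  local dropins="$root/etc/systemd/resolved.conf.d"

  # Drop legacy /etc/hosts pins tagged forgejo-internal-pin (GNU sed -i).
  if [ -f "$hosts" ]; then
    sed -i '/forgejo-internal-pin/d' "$hosts"
  fi

  # Remove the legacy resolved drop-ins; rm -f makes re-runs a no-op.
  rm -f "$dropins/viktorbarzin.conf" "$dropins/global-dns.conf"
}

# Example dry run against a scratch copy:
#   scratch="$(mktemp -d)"; cp -a /etc "$scratch/"; cleanup_node_dns "$scratch"
```

Because both steps are idempotent, re-running the rollout on an already clean node should change nothing.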