forgejo pulls: route *.viktorbarzin.me to Technitium, drop /etc/hosts pins [ci skip]

Supersedes this morning's per-node /etc/hosts pin (no hardcoded service
IPs on nodes, per Viktor). Technitium's split-horizon zone already
resolves forgejo.viktorbarzin.me -> CNAME apex -> live Traefik LB IP
(ingress-dns-sync auto-CNAMEs every ingress host; apex drift probe
alerts) -- the nodes just never queried it. Rolled the devvm's
systemd-resolved routing-domain pattern (~viktorbarzin.me ->
10.0.20.201) to all 7 nodes, removed the pins, verified getent +
crictl pull via pure DNS.

Also demoted node5/6's cloud-init global-dns.conf (DNS=8.8.8.8 1.1.1.1)
to FallbackDNS-only: public servers in the global set race the routing
domain. Its justification ("Technitium NXDOMAINs forgejo") was obsolete
-- exactly the stale comment that pointed new nodes at the hairpin.

hosts.toml mirror kept but documented as vestigial (Traefik 404s
bare-IP requests; registry auth realm is an absolute URL).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
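
The "getent + crictl pull via pure DNS" check mentioned above could be repeated on a node with something like the sketch below. It is not part of this commit. The helper names `is_internal_addr` and `verify_forgejo_dns` are hypothetical, "internal" is assumed to mean the 10.0.20.x range that Technitium (10.0.20.201) and Traefik (10.0.20.203) sit in, and the image reference is left to the caller because the commit does not name one. It also assumes the node's NSS lookups go through systemd-resolved.

```shell
#!/bin/sh
# Assumption: an "internal" answer is any address in 10.0.20.x.
is_internal_addr() {
  case "$1" in
    10.0.20.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Resolve the host through NSS (getent), then optionally pull an image.
# $1: hostname (defaults to the Forgejo registry name from this commit)
# $2: optional image reference to pull with crictl (caller-supplied)
verify_forgejo_dns() {
  host="${1:-forgejo.viktorbarzin.me}"
  image="${2:-}"
  addr=$(getent ahostsv4 "$host" | awk 'NR==1 { print $1 }')
  if ! is_internal_addr "$addr"; then
    echo "$host resolved to '$addr', expected an internal address" >&2
    return 1
  fi
  echo "$host -> $addr"
  if [ -n "$image" ]; then
    crictl pull "$image"
  fi
}
```

Calling `verify_forgejo_dns` with no arguments only checks resolution; passing an image reference as the second argument also exercises the pull path.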
parent b6976ce014
commit 1ee1bf0817
7 changed files with 135 additions and 66 deletions
@@ -90,18 +90,35 @@ runcmd:
   - sed -i 's/#Compress=yes/Compress=yes/' /etc/systemd/journald.conf
   - systemctl restart systemd-journald
   %{if is_k8s_template}
-  # systemd-resolved global DNS fallback. Without this, only the
-  # link-level DNS from Proxmox's `qm set --nameserver` (Technitium,
-  # 10.0.20.201) is consulted — and Technitium returns NXDOMAIN for
-  # forgejo.viktorbarzin.me, so kubelet image pulls from the Forgejo
-  # registry break. Public DNS upstream + Technitium fallback matches
-  # the pre-existing manual setup on k8s-node1..4.
+  # systemd-resolved split DNS, two drop-ins (2026-06-10, replaces the
+  # public-first global DNS that was here before):
+  #
+  # viktorbarzin.conf — routing domain ~viktorbarzin.me -> Technitium
+  # (10.0.20.201). The technitium-ingress-dns-sync CronJob keeps a CNAME
+  # for every ingress host (incl. forgejo.viktorbarzin.me) chained to the
+  # zone apex, whose A record auto-tracks the live Traefik LB IP (canary:
+  # viktorbarzin-apex-probe). Keeps kubelet pulls of forgejo images off
+  # the flaky public NAT-hairpin with no hardcoded service IPs. (The old
+  # comment claiming Technitium NXDOMAINs forgejo.viktorbarzin.me is
+  # obsolete — ingress-dns-sync added it to the split-horizon zone. See
+  # docs/post-mortems/2026-06-10-tuya-bridge-forgejo-pull-hairpin.md.)
+  #
+  # global-dns.conf — emergency fallback only. Public servers must NOT
+  # sit in the global DNS= set: they merge with viktorbarzin.conf's set
+  # and race the ~viktorbarzin.me routing domain, intermittently
+  # returning the public IP again. General resolution uses the
+  # link-level DNS from Proxmox's `qm set --nameserver`.
   - mkdir -p /etc/systemd/resolved.conf.d
   - |
+    cat > /etc/systemd/resolved.conf.d/viktorbarzin.conf <<'EOF'
+    [Resolve]
+    DNS=10.0.20.201
+    Domains=~viktorbarzin.me
+    EOF
+  - |
     cat > /etc/systemd/resolved.conf.d/global-dns.conf <<'EOF'
     [Resolve]
-    DNS=8.8.8.8 1.1.1.1
-    FallbackDNS=10.0.20.201
+    FallbackDNS=8.8.8.8 1.1.1.1
     EOF
   - systemctl restart systemd-resolved
   # Re-enabled 2026-05-10: unattended-upgrades is back on, but with a tight
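
The rule stated in the new comment (no public servers in a `DNS=` line, only in `FallbackDNS=`) could be checked mechanically. The sketch below is one way to do that and is not part of this commit. The function name `lint_resolved_dropins` and the `INTERNAL_DNS` variable are hypothetical, and the only resolver treated as internal is the Technitium address 10.0.20.201 from the diff.

```shell
#!/bin/sh
# Sketch: flag resolved.conf.d drop-ins whose DNS= line lists any server
# other than the internal resolver. FallbackDNS= lines are not inspected.
# INTERNAL_DNS is an assumption taken from the diff (Technitium).
INTERNAL_DNS="${INTERNAL_DNS:-10.0.20.201}"

lint_resolved_dropins() {
  dir="$1"
  status=0
  for f in "$dir"/*.conf; do
    [ -e "$f" ] || continue
    # The ^ anchor keeps FallbackDNS= lines out of the match.
    for addr in $(sed -n 's/^DNS=//p' "$f"); do
      if [ "$addr" != "$INTERNAL_DNS" ]; then
        echo "$f: DNS=$addr is not the internal resolver" >&2
        status=1
      fi
    done
  done
  return "$status"
}
```

Run as `lint_resolved_dropins /etc/systemd/resolved.conf.d` on a node; a non-zero exit status means at least one drop-in still carries a non-internal server in `DNS=`.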
@@ -49,10 +49,16 @@ server = "https://ghcr.io"
 capabilities = ["pull", "resolve"]
 GHCR

-# Forgejo OCI registry: prefer in-cluster Traefik LB (10.0.20.203) to
-# avoid hairpin NAT. Traefik serves the *.viktorbarzin.me wildcard so
-# SNI verification succeeds. If the mirror is unreachable, fall back to
-# public DNS resolution (needs the global DNS fallback set up below).
+# Forgejo OCI registry. NOTE: this hosts.toml mirror is VESTIGIAL — it
+# cannot keep pulls off the public hairpin on its own (Traefik routes by
+# Host/SNI and 404s the mirror's bare-IP requests, and the registry's
+# Bearer auth realm is the absolute https://forgejo.viktorbarzin.me/v2/token
+# URL fetched outside the mirror). What actually keeps forgejo pulls
+# internal is the systemd-resolved routing domain ~viktorbarzin.me ->
+# Technitium (viktorbarzin.conf, written by cloud_init.yaml), which
+# resolves forgejo to the live Traefik LB via the split-horizon zone.
+# Kept for config uniformity; harmless. See
+# docs/post-mortems/2026-06-10-tuya-bridge-forgejo-pull-hairpin.md.
 mkdir -p /etc/containerd/certs.d/forgejo.viktorbarzin.me
 cat > /etc/containerd/certs.d/forgejo.viktorbarzin.me/hosts.toml <<'FORGEJO'
 server = "https://forgejo.viktorbarzin.me"
@@ -62,20 +68,6 @@ server = "https://forgejo.viktorbarzin.me"
 skip_verify = true
 FORGEJO

-# /etc/hosts pin — REQUIRED in addition to the hosts.toml mirror. The
-# mirror alone cannot make forgejo pulls hairpin-proof for two reasons
-# (2026-06-10 tuya-bridge outage, third incident of this class):
-# a) Traefik routes by Host/SNI and 404s the mirror's bare-IP requests,
-# so containerd always falls back to `server` (public DNS → hairpin).
-# b) The registry's Bearer auth realm is the absolute URL
-# https://forgejo.viktorbarzin.me/v2/token, which containerd fetches
-# verbatim — that leg never goes through the mirror at all.
-# Pinning the name to Traefik's LB fixes resolve + token + blob legs with
-# correct SNI and a valid cert. If Traefik's LB IP ever changes, update
-# this pin together with the hosts.toml IP above.
-grep -q forgejo-internal-pin /etc/hosts || \
-  echo '10.0.20.203 forgejo.viktorbarzin.me # forgejo-internal-pin (managed: setup-forgejo-containerd-mirror.sh)' >> /etc/hosts
-
 # quay.io + registry.k8s.io: include mirror configs that match node4's
 # layout (no real pull-through cache today, server line is the direct
 # upstream). Keeping these present makes the per-node config uniform and
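
The hunk above stops new pins from being written, but nodes that already ran the old script still carry the tagged line in /etc/hosts. The commit message says the pins were removed without showing how. A minimal sketch of an idempotent removal, keyed on the `forgejo-internal-pin` marker from the deleted lines, might look like this. The function name `remove_forgejo_pin` is hypothetical and this is not taken from the commit.

```shell
#!/bin/sh
# Sketch: delete any line tagged with the forgejo-internal-pin marker.
# Rewrites the file in place with cat so its inode and mode are kept.
# $1: hosts file to clean (defaults to /etc/hosts)
remove_forgejo_pin() {
  hosts_file="${1:-/etc/hosts}"
  # Nothing to do if the marker is absent, so repeated runs are safe.
  grep -q 'forgejo-internal-pin' "$hosts_file" || return 0
  tmp=$(mktemp) || return 1
  # grep -v exits 1 when it prints no lines; only exit codes above 1 are errors.
  grep -v 'forgejo-internal-pin' "$hosts_file" > "$tmp" || [ $? -eq 1 ] || {
    rm -f "$tmp"
    return 1
  }
  cat "$tmp" > "$hosts_file"
  rm -f "$tmp"
}
```

Running it a second time is a no-op, which matters if it is rolled out to all nodes in a loop.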