infra: fix stale Traefik LB-IP refs + accurate LB-IP registry

Part of the L4 LB-IP review (docs/plans/2026-06-03-lb-ip-hygiene-design.md). The 2026-05-30 Traefik .200->.203 move left consumers pointing at the dead .200; this fixes the two in-Terraform ones and replaces the stale networking doc with an accurate registry + a renumber checklist.

- woodpecker: forgejo.viktorbarzin.me hostAlias hardcoded 10.0.20.200 (.200:443 refuses TLS now; the next woodpecker apply would re-pin it and break pipeline creation). Now reads the Traefik ClusterIP dynamically via a kubernetes_service data source -- cannot rot on a future renumber and avoids the ETP=Local hairpin trap.
- monitoring: ViktorBarzinApexDrift alert summary said "expected 10.0.20.200" -> 10.0.20.203 (cosmetic; alert logic already correct).
- docs/architecture/networking.md: rewrote the MetalLB section (it wrongly had KMS on .200, mailserver on a LB IP, "two dedicated") into an accurate 4-IP registry + LB-IP renumber checklist (in-band + out-of-band consumers).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
parent dcb7c74531
commit 7d7a0ad474
4 changed files with 147 additions and 24 deletions

@@ -2455,7 +2455,7 @@ serverFiles:
       labels:
         severity: critical
       annotations:
-        summary: "viktorbarzin.me apex A drifted from expected 10.0.20.200"
+        summary: "viktorbarzin.me apex A drifted from expected 10.0.20.203"
         description: "Technitium serves the split-horizon apex for ~80 *.viktorbarzin.me CNAMEs. If this is wrong, every internal service (auth, vault, immich, ha-sofia, ...) breaks. Check Technitium primary zone records via API or web console."
     - alert: ViktorBarzinApexProbeStale
       expr: (time() - viktorbarzin_apex_last_correct_timestamp{job="viktorbarzin-apex-probe"}) > 900

@@ -178,20 +178,34 @@ resource "helm_release" "woodpecker" {
 # Keeps the OAuth/forge-API path off the WAN gateway (forgejo.viktorbarzin.me
 # resolves to the public IP via DNS, which round-trips through Cloudflare
 # and routinely tripped 30s context-deadline timeouts when fetching pipeline
-# config). 10.0.20.200 is the Traefik LB that fronts forgejo internally;
-# Traefik serves the *.viktorbarzin.me wildcard so SNI verification still
-# passes.
+# config). We pin forgejo.viktorbarzin.me to the in-cluster Traefik Service
+# ClusterIP (read dynamically below); Traefik serves the *.viktorbarzin.me
+# wildcard so SNI verification still passes. ClusterIP (not the LB IP) avoids
+# two traps: (1) the Traefik LB IP is ETP=Local and not reliably hairpin-
+# reachable from an arbitrary pod's node; (2) a hard-coded LB IP rots on every
+# Traefik renumber — this previously pinned 10.0.20.200, which went dead when
+# Traefik moved to its dedicated .203 (2026-05-30), the same failure class the
+# cloudflared origin hit (also fixed by targeting the in-cluster Service).
+data "kubernetes_service" "traefik" {
+  metadata {
+    name      = "traefik"
+    namespace = "traefik"
+  }
+}
+
 resource "null_resource" "woodpecker_server_host_alias" {
   triggers = {
-    # Re-run on every helm_release version bump or values change.
-    helm_version = helm_release.woodpecker.version
-    helm_values  = sha256(join("", helm_release.woodpecker.values))
+    # Re-run on helm version/values change, and when the Traefik ClusterIP
+    # changes (e.g. the Service is recreated) so the alias self-heals.
+    helm_version       = helm_release.woodpecker.version
+    helm_values        = sha256(join("", helm_release.woodpecker.values))
+    traefik_cluster_ip = data.kubernetes_service.traefik.spec[0].cluster_ip
   }
 
   provisioner "local-exec" {
     command = <<-BASH
       set -euo pipefail
-      kubectl -n woodpecker patch statefulset/woodpecker-server --type=strategic --patch '{"spec":{"template":{"spec":{"hostAliases":[{"ip":"10.0.20.200","hostnames":["forgejo.viktorbarzin.me"]}]}}}}'
+      kubectl -n woodpecker patch statefulset/woodpecker-server --type=strategic --patch '{"spec":{"template":{"spec":{"hostAliases":[{"ip":"${data.kubernetes_service.traefik.spec[0].cluster_ip}","hostnames":["forgejo.viktorbarzin.me"]}]}}}}'
       kubectl -n woodpecker rollout status statefulset/woodpecker-server --timeout=120s
     BASH
     interpreter = ["/bin/bash", "-c"]
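
Reviewer note: one way to confirm the alias tracks the Traefik ClusterIP after an apply is to compare the two values directly. The sketch below is not part of this commit. The helper names `host_alias_patch` and `check_alias_drift` are hypothetical, and it assumes `kubectl` is configured for the cluster and that the namespaces and object names match those in the diff above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names are illustrative and not taken from the repo.

# Print the strategic-merge patch body that pins one hostname to one IP.
# Same JSON shape as the patch in the local-exec provisioner.
host_alias_patch() {
  local ip="$1" hostname="$2"
  printf '{"spec":{"template":{"spec":{"hostAliases":[{"ip":"%s","hostnames":["%s"]}]}}}}' \
    "$ip" "$hostname"
}

# Compare the live Traefik ClusterIP with the first hostAlias on the
# woodpecker-server StatefulSet. Needs kubectl and cluster access.
# Returns non-zero when the two differ.
check_alias_drift() {
  local want have
  want="$(kubectl -n traefik get service traefik -o jsonpath='{.spec.clusterIP}')"
  have="$(kubectl -n woodpecker get statefulset woodpecker-server \
    -o jsonpath='{.spec.template.spec.hostAliases[0].ip}')"
  if [ "$want" != "$have" ]; then
    echo "drift: hostAlias=$have traefik ClusterIP=$want" >&2
    return 1
  fi
  echo "ok: hostAlias matches Traefik ClusterIP $want"
}
```

Running `check_alias_drift` after `terraform apply` should report a match; a mismatch would suggest the `traefik_cluster_ip` trigger did not fire or the patch was overwritten.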