infra/stacks/k8s-version-upgrade/scripts
Viktor Barzin e4e2babd6a k8s-version-upgrade: FQDN SSH targets + python3 in place of envsubst
Two latent bugs in the K8s-version-upgrade pipeline surfaced when a
real detection run ran post-26.04 upgrade today:

1. **DNS**: pod's CoreDNS search path is `<ns>.svc.cluster.local
   svc.cluster.local cluster.local` (+ ndots=2 via Kyverno mutation).
   Unqualified `k8s-master` falls through all of those and then queries
   upstream Technitium for the bare name → NXDOMAIN. The FQDN
   `k8s-master.viktorbarzin.lan` is what Technitium actually serves.
   Suffix every node SSH target with `$NODE_DOMAIN`.

2. **envsubst missing**: claude-agent-service image doesn't ship
   `gettext-base`. Replace `envsubst <template | apply` with
   `python3 -c 'import os,sys; sys.stdout.write(os.path.expandvars(
   sys.stdin.read()))' <template | apply`. Same semantics, image
   already has python3. Multi-line $SCHEDULING_BLOCK is preserved
   correctly through expandvars.

Verified by manually triggering `k8s-version-check` post-fix:
detection now reads `Latest patch: v1.34.8` (currently running 1.34.7)
and spawns `k8s-upgrade-preflight-1-34-8`. The Job pod scheduled and
started; killed before it touched the cluster (will land on Sunday
2026-05-24 12:00 UTC like the schedule says).

Root cause of why these bugs lay dormant: yesterday's first
manual-test detection found "no upgrade needed" so neither code path
exercised SSH or envsubst. Today's apt-source restore (do-release-
upgrade had mangled them) unmasked the v1.34.8 candidate, which made
detection finally proceed past the SSH step.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-22 14:16:56 +00:00
..
upgrade-step.sh k8s-version-upgrade: FQDN SSH targets + python3 in place of envsubst 2026-05-22 14:16:56 +00:00