k8s-version-upgrade: FQDN SSH targets + python3 in place of envsubst

Two latent bugs in the K8s-version-upgrade pipeline surfaced today
during the first real detection run after the 26.04 upgrade:

1. **DNS**: the pod's resolver search path is `<ns>.svc.cluster.local
   svc.cluster.local cluster.local` (+ ndots=2 via Kyverno mutation).
   Unqualified `k8s-master` falls through all of those, then CoreDNS
   forwards the bare name to upstream Technitium → NXDOMAIN. The FQDN
   `k8s-master.viktorbarzin.lan` is what Technitium actually serves.
   Suffix every node SSH target with `$NODE_DOMAIN`.

2. **envsubst missing**: the claude-agent-service image doesn't ship
   `gettext-base`. Replace `envsubst <template | apply` with
   `python3 -c 'import os,sys; sys.stdout.write(os.path.expandvars(
   sys.stdin.read()))' <template | apply`. Same result for every
   variable that is set (unset ones are left as literal text instead
   of being blanked), and the image already has python3. The
   multi-line $SCHEDULING_BLOCK passes through expandvars intact.
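
To make the lookup order in item 1 concrete, here is a minimal Python
sketch of how a glibc-style stub resolver orders its candidate queries.
It is an illustration, not the pipeline's code: `candidate_queries` is
a hypothetical helper, `example-ns` stands in for the unspecified
`<ns>`, and other resolver implementations may differ in detail.

```python
from __future__ import annotations


def candidate_queries(name: str, search: list[str], ndots: int) -> list[str]:
    """Return the absolute names a glibc-style resolver would try, in order."""
    if name.endswith("."):
        # A trailing dot marks the name as absolute: the search list is skipped.
        return [name]
    as_is = name + "."
    via_search = [f"{name}.{domain}." for domain in search]
    if name.count(".") >= ndots:
        # Enough dots: try the name as given first, search list afterwards.
        return [as_is] + via_search
    # Too few dots: walk the search list first, bare name last.
    return via_search + [as_is]


# Assumption: "example-ns" is a placeholder for the pod's real namespace.
SEARCH = ["example-ns.svc.cluster.local", "svc.cluster.local", "cluster.local"]

if __name__ == "__main__":
    for host in ("k8s-master", "k8s-master.viktorbarzin.lan"):
        print(host, "->", candidate_queries(host, SEARCH, ndots=2))
```

With `ndots=2`, `k8s-master` (zero dots) goes through the whole search
list before the bare name is tried, whereas
`k8s-master.viktorbarzin.lan` (two dots) is queried as-is first.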

Verified by manually triggering `k8s-version-check` post-fix:
detection now reads `Latest patch: v1.34.8` (currently running 1.34.7)
and spawns `k8s-upgrade-preflight-1-34-8`. The Job pod was scheduled
and started; it was killed before it touched the cluster (the upgrade
will land on Sunday 2026-05-24 12:00 UTC, as scheduled).

Why these bugs lay dormant: yesterday's first manual-test detection
found "no upgrade needed", so neither the SSH nor the envsubst code
path was exercised. Today's apt-source restore (do-release-upgrade had
mangled the sources) unmasked the v1.34.8 candidate, which finally
made detection proceed past the SSH step.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Viktor Barzin 2026-05-17 21:10:58 +00:00
parent 6de4549a96
commit e4e2babd6a
2 changed files with 24 additions and 11 deletions

@@ -333,7 +333,7 @@ resource "kubernetes_cron_job_v1" "k8s_version_check" {
echo "Running version: v$RUNNING (minor $RUNNING_MINOR)"
# 2. Latest patch within current minor (refresh master's apt cache)
-LATEST_PATCH=$($SSH wizard@k8s-master \
+LATEST_PATCH=$($SSH wizard@k8s-master.viktorbarzin.lan \
"sudo apt-get update -qq -o Dir::Etc::sourcelist='sources.list.d/kubernetes.list' -o Dir::Etc::sourceparts='-' -o APT::Get::List-Cleanup='0' >/dev/null 2>&1 ; \
apt-cache madison kubeadm 2>/dev/null \
| awk '{print \$3}' \
@@ -360,7 +360,7 @@ resource "kubernetes_cron_job_v1" "k8s_version_check" {
TARGET="$LATEST_PATCH"
KIND="patch"
elif [ "$NEXT_MINOR_AVAILABLE" = "yes" ]; then
-NEXT_MINOR_PATCH=$($SSH wizard@k8s-master \
+NEXT_MINOR_PATCH=$($SSH wizard@k8s-master.viktorbarzin.lan \
"curl -sf 'https://pkgs.k8s.io/core:/stable:/v$NEXT_MINOR/deb/Packages' \
| grep -oE 'Version: [0-9.-]+' \
| awk '{print \$2}' | sed 's/-.*//' \
@@ -411,7 +411,8 @@ resource "kubernetes_cron_job_v1" "k8s_version_check" {
KIND="$KIND" IMAGE="$${IMAGE}" \
SCHEDULING_BLOCK=$' nodeSelector:\n kubernetes.io/hostname: k8s-node1'
-envsubst < /template/job-template.yaml \
+python3 -c 'import os,sys;sys.stdout.write(os.path.expandvars(sys.stdin.read()))' \
+< /template/job-template.yaml \
| /usr/local/bin/kubectl apply -f -
slack "Spawned $JOB_NAME (target=v$TARGET kind=$KIND)"
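
The `python3 -c` one-liner in the last hunk can be tried outside the
pod. The sketch below is a hedged illustration rather than the real
pipeline: the template text, the `render` helper and the
`NOT_SET_ANYWHERE` variable are made up for the example, since the
contents of `/template/job-template.yaml` are not part of this commit.
It shows that a multi-line value is inserted verbatim and that, unlike
`envsubst`, `os.path.expandvars` leaves an unset variable as literal
text.

```python
import os


def render(template: str) -> str:
    """Expand $VAR and ${VAR} from os.environ, as the one-liner does on stdin."""
    return os.path.expandvars(template)


# Hypothetical template; the real job-template.yaml is not shown in the commit.
TEMPLATE = (
    "metadata:\n"
    "  name: $JOB_NAME\n"
    "spec:\n"
    "${SCHEDULING_BLOCK}\n"
    "  note: $NOT_SET_ANYWHERE\n"
)

os.environ["JOB_NAME"] = "k8s-upgrade-preflight-1-34-8"
# Multi-line value, analogous to the $'...\n...' assignment in the shell script.
os.environ["SCHEDULING_BLOCK"] = (
    "  nodeSelector:\n"
    "    kubernetes.io/hostname: k8s-node1"
)
# Make sure the "unset" case really is unset for this demonstration.
os.environ.pop("NOT_SET_ANYWHERE", None)

rendered = render(TEMPLATE)

if __name__ == "__main__":
    print(rendered, end="")
```

Because unset references survive as `$NAME` text, a typo in a variable
name would reach `kubectl apply` unexpanded instead of as an empty
string, so failures from that cause will look different from the
`envsubst` days.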