Goal: re-clone the worker template, boot, and have it appear as `kubectl
get nodes …Ready` with no manual steps. Adds `scripts/provision-k8s-worker
NAME VMID IP` and rebuilds the cloud-init pipeline that was failing five
distinct ways on a clean boot.
Bugs fixed (all hit during the k8s-node5 + k8s-node6 builds today):
1. `indent(6, containerd_config_update_command)` indented the bodies of
`cat >> /etc/containerd/config.toml <<'CONTAINERD_GC'` heredocs, so
[plugins.*] TOML sections landed in /etc/containerd/config.toml at
col 6 — containerd refused to parse them. Source is now a normal
.sh file (`modules/create-template-vm/k8s-node-containerd-setup.sh`)
base64-embedded into `write_files`; YAML whitespace never touches
the heredoc bodies.
2. The same script tried to `cat >> /etc/containerd/config.toml`
`[plugins."io.containerd.gc.v1.scheduler"]` etc., which containerd
v2.2.4's `config default` ALREADY emits. Result: `toml: table …
already exists`. Patched with sed-in-place overrides instead.
3. Kubelet tuning (sed against /var/lib/kubelet/config.yaml) ran from
the containerd setup script — BEFORE `kubeadm join` writes that
file. Sed aborted with "No such file or directory", `set -e` killed
the script, post-script cloud-init steps kept going (cloud-init
doesn't stop on runcmd failure). Split into a dedicated
`k8s-node-post-join-tune.sh` invoked AFTER kubeadm join.
4. cloud_init.yaml fallocate'd a 4G swapfile and `swapon`'d it BEFORE
kubeadm join. kubelet defaults to failSwapOn=true → exited 1
immediately. Replaced the swap setup with `swapoff -a` (node4
already runs this way and the cluster is fine).
5. Without `hostname:` in the shared user-data snippet, Proxmox's
auto-generated meta-data does NOT include local-hostname when
`cicustom user=…` is set — so cloud-init falls back to the cloud
image's default `ubuntu` and `kubeadm join` registers the wrong
node name. `provision-k8s-worker` now writes a per-node
`<NAME>-meta.yaml` snippet and passes both via
`cicustom user=…,meta=…`.
Other improvements rolled in while fixing the above:
- `ssh_public_key` read from Vault (`secret/viktor.ssh_public_key`,
added today) instead of `var.ssh_public_key`. The last
`terragrunt apply` was run with that var empty, leaving the snippet's
`ssh_authorized_keys` with a single blank entry; the wizard user
was effectively locked out of every fresh node.
- `cloud_init.yaml` adds `/etc/systemd/resolved.conf.d/global-dns.conf`
with `DNS=8.8.8.8 1.1.1.1, FallbackDNS=10.0.20.201`. Without it,
systemd-resolved only consulted Technitium (link-level), which
returns NXDOMAIN for `forgejo.viktorbarzin.me` — kubelet pulls from
the Forgejo registry then failed DNS until I patched it manually
on node5.
- k8s apt repo bumped v1.32 → v1.34 (matches cluster).
- The containerd setup script now creates hosts.toml for forgejo,
quay, registry.k8s.io in addition to docker.io + ghcr.io. node3/4
had these added by hand post-bootstrap; now they're baked in.
- `config_path` sed matches both `""` (containerd v1) and `''`
(containerd v2.x). Without the v2 match, the certs.d mirror dir was
silently ignored.
- `proxmox-csi` node map adds k8s-node5 + k8s-node6 entries so CSI
topology labels (region/zone, max-volume-attachments=28) apply on
next `tg apply`.
- `stacks/infra/main.tf` shed the 160-line inline containerd setup
heredoc — that whole thing now lives in the module as a .sh file.
Known unsolved gaps (deferred):
- iscsid restart hangs ~90s on first boot before SIGKILL releases it
(systemd-resolved restart kicks iscsid via dependency). Adds wall-
clock time but doesn't block the join.
- `provision-k8s-worker` doesn't run `tg apply` on `proxmox-csi`
afterward, so the CSI topology labels need a manual apply after
the node joins. Solving cleanly needs the CSI map to derive from
`kubectl get nodes` instead of a static local — separate work.
- `var.containerd_config_update_command` is now ignored when
is_k8s_template=true (replaced by the bundled .sh file). Variable
kept with a deprecation note to avoid breaking other call sites.
E2E proof: k8s-node6 (VMID 206) boots hands-off from
`provision-k8s-worker k8s-node6 206 10.0.20.106` and appears as
`kubectl get nodes …Ready` ~7 min later (most of which is the apt
package_upgrade — separate optimization).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Moves the containerd_config_update_command interpolation out of the
runcmd list and into a write_files block delivering
/usr/local/bin/k8s-node-containerd-setup.sh. runcmd then just calls
the script.
Why: the heredoc in stacks/infra/main.tf has mixed-indent inner shell
heredocs (CONTAINERD_GC, KUBELET_PATCH bodies at col 0, surrounding
text at col 2). When inserted into a `runcmd: - $${var}` item — even
wrapped in a `- |` literal block — YAML's block-indent rule
terminates the block early on the col-0 lines. The result is a silent
cloud-init parse failure on every new k8s node (observed 2026-05-26
during node4 rebuild — node booted into the minimal default config,
no kubeadm join, no containerd tuning, no kubelet shutdown grace).
write_files writes the multi-line content into a YAML literal block
where the script body is just opaque text — the block's content
indent is set by the `content: |` block's own indentation (col 6)
and any indent >= 6 is valid content. Any further indent inside the
script (like the col-0 `[plugins...]` heredoc lines now at col 6 via
indent(6, ...)) is preserved cleanly.
Verified: `yaml.safe_load()` on the rendered snippet now reports
`runcmd=36 write_files=1` (was throwing ParserError before).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The previous indent(6, containerd_config_update_command) attempt didn't
fix the YAML parse error — the heredoc in stacks/infra/main.tf has
mixed indentation (most lines at col 2, inner shell heredoc bodies
like CONTAINERD_GC and KUBELET_PATCH at col 0). Any uniform-prefix
function (indent / replace / join) preserves the relative offset, so
the column-0 lines always end up below the block's first-line indent
and YAML terminates the literal block early.
The cleanest fix is a refactor: move the containerd setup snippet out
of the inline heredoc into a cloud-init `write_files` block (script
file delivered to the VM, then `bash /path/to/script.sh` in runcmd).
That bypasses the multi-line YAML interpolation entirely.
Reverting to the previous (also-broken) interpolation pattern with a
big WARNING comment instead. New k8s nodes still need manual backfill
after first boot — node4 was backfilled today; see memory id=2767/2772
for the backfill steps. Tracked separately.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two bugs found while rebuilding k8s-node4 (2026-05-26):
1. **runcmd YAML breakage**: `- $${containerd_config_update_command}`
interpolated a multi-line heredoc as bare list-item content. The
trailing lines lost their list-item prefix, breaking cloud-config
parsing. Cloud-init silently fell back to the minimal default
(hostname + package_upgrade only) — kubeadm join, containerd config,
kubelet tuning, iSCSI hardening, swap, ALL skipped. No error visible
in `cloud-init status`.
Fix: wrap the interpolation in `- |` literal block with `indent(4, ...)`.
2. **containerd v2 single-quote mismatch**: `containerd config default`
in v2 writes `config_path = ''` (single quotes), v1 writes `""` (double).
The sed pattern matched only double quotes → silent no-op on fresh
containerd 2.x nodes → registry-mirror hosts.toml ignored → all image
pulls hit upstream registries → DNS-to-MetalLB chicken-and-egg loop.
Fix: match any value with `config_path = .*`.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Reverses the March 2026 outage mitigation that disabled unattended-
upgrades cluster-wide. Now re-enables it on the k8s template VM with:
- Allowed-Origins limited to security/updates pockets
- Package-Blacklist for k8s/containerd/runc/calico-node (apt-mark
hold on the cluster-critical components)
- Automatic-Reboot disabled — kured drives the actual reboots
- Compatible with the existing kured + sentinel-gate flow
kured side:
- rebootDelay 30s, concurrency 1
- Sentinel cool-down stretched 30m → 24h (aligns with the 24h soak
window from the post-mortem)
- prometheusUrl + alertFilterRegexp wired so any firing non-ignored
alert halts the rollout. Ignore-list excludes self-referential
alerts (Watchdog/RebootRequired/KuredNodeWasNotDrained/
InfoInhibitor) that would otherwise deadlock kured.
Prometheus side (already partly landed in 6c4e0966 — the "Upgrade
Gates" rule group):
- Refine `KubeQuotaAlmostFull` to include the resourcequota label in
both the on-clause and the summary, so multi-quota namespaces
(authentik, beads-server, frigate) report the quota name correctly.
grafana.tf: terraform fmt whitespace only.
Together with the post-mortem 2026-03-22 (memory id=390) the loop is
closed: unattended-upgrades runs again, kernel-class updates can land,
but only when cluster health is green and the reboot window is open.
The kured sentinel gate DaemonSet requires /sentinel to exist on
all nodes. Without it, kured pods get stuck in ContainerCreating
with hostPath mount failure. Previously created manually; now
provisioned automatically for new nodes.