cloud-init: revert indent(6) wrap; document the YAML interpolation bug

The previous indent(6, containerd_config_update_command) attempt did
not fix the YAML parse error. The heredoc in stacks/infra/main.tf has
mixed indentation: most lines sit at col 2, but the bodies of the inner
shell heredocs (CONTAINERD_GC, KUBELET_PATCH) sit at col 0. Any
uniform-prefix function (indent / replace / join) preserves that
relative offset, so the column-0 lines always land at a shallower
indent than the block's first line, and YAML terminates the literal
block early.
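
To illustrate the offset problem, here is a hypothetical rendering.
The commands and the 4-space prefix are placeholders, not the real
snippet. The first content line fixes the literal block's indent at 6;
the heredoc body, at col 0 in the variable, lands at col 4 and ends the
block:

```yaml
runcmd:
  # Illustrative rendering only; not the real snippet contents.
  - |
      echo "line at col 2 in the variable"
      cat > /tmp/example.conf <<CONTAINERD_GC
    key = value
    CONTAINERD_GC
```

A YAML parser should then treat `key = value` as stray content after
the list item and reject the document.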

The cleanest fix is a refactor: move the containerd setup snippet out
of the inline heredoc into a cloud-init `write_files` block (script
file delivered to the VM, then `bash /path/to/script.sh` in runcmd).
That bypasses the multi-line YAML interpolation entirely.
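
A minimal sketch of that refactor, under stated assumptions: the script
path is hypothetical, and delivering the content base64-encoded (so the
rendered value is a single line) is one possible choice, not something
this commit implements:

```yaml
#cloud-config
# Sketch only: the path below is a placeholder.
write_files:
  - path: /usr/local/sbin/containerd-config-update.sh  # hypothetical path
    permissions: "0755"
    encoding: b64
    # Single-line value, so the heredoc's mixed indentation never reaches YAML.
    content: ${base64encode(containerd_config_update_command)}

runcmd:
  - containerd config default | sudo tee /etc/containerd/config.toml
  - bash /usr/local/sbin/containerd-config-update.sh
  - systemctl restart containerd
```

cloud-init processes `write_files` before `runcmd`, so the script
should already exist on disk when `runcmd` invokes it.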

This commit instead reverts to the previous (also-broken) interpolation
pattern and adds a big WARNING comment. New k8s nodes still need manual
backfill after first boot; node4 was backfilled today (see memory
id=2767/2772 for the backfill steps). Tracked separately.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Viktor Barzin 2026-05-26 07:11:20 +00:00
parent 321c073ca0
commit 146dc143c6

@@ -107,12 +107,16 @@ runcmd:
- apt-mark hold containerd containerd.io runc 2>/dev/null || true
- systemctl stop kubelet
- containerd config default | sudo tee /etc/containerd/config.toml
# Multi-line containerd config update — wrapped in `- |` literal block so the
# heredoc content survives YAML rendering. Without this, the multi-line var
# gets inserted as bare top-level lines and breaks the cloud-config parser
# (silent fallback to default — observed 2026-05-26 during node4 rebuild).
- |
${indent(4, containerd_config_update_command)}
# KNOWN BUG (2026-05-26): the `${containerd_config_update_command}` heredoc in
# stacks/infra/main.tf has lines at mixed indent (most at col 2, inner shell
# heredocs CONTAINERD_GC/KUBELET_PATCH bodies at col 0). When interpolated
# into a YAML runcmd item, the rendered output is invalid YAML and cloud-init
# silently falls back to the default minimal config (skips kubeadm join,
# containerd config, kubelet tuning, iSCSI, swap). Properly fixing requires
# refactoring the inner heredocs to write_files or normalising to uniform
# indent. Until then, new k8s nodes must be backfilled manually after first
# boot (see infra/docs/runbooks/k8s-node-bootstrap-backfill.md TODO).
- ${containerd_config_update_command}
- systemctl restart containerd
- systemctl enable --now iscsid
# Harden iSCSI: increase recovery timeout (300s vs 120s default) and enable