infra

Author	SHA1	Message	Date
Viktor Barzin	8ed427a7e4	cloud-init: hands-off k8s worker provisioning + 5 bug fixes Goal: re-clone the worker template, boot, and have it appear as `kubectl get nodes …Ready` with no manual steps. Adds `scripts/provision-k8s-worker NAME VMID IP` and rebuilds the cloud-init pipeline that was failing five distinct ways on a clean boot. Bugs fixed (all hit during the k8s-node5 + k8s-node6 builds today): 1. `indent(6, containerd_config_update_command)` indented the bodies of `cat >> /etc/containerd/config.toml <<'CONTAINERD_GC'` heredocs, so [plugins.*] TOML sections landed in /etc/containerd/config.toml at col 6 — containerd refused to parse them. Source is now a normal .sh file (`modules/create-template-vm/k8s-node-containerd-setup.sh`) base64-embedded into `write_files`; YAML whitespace never touches the heredoc bodies. 2. The same script tried to `cat >> /etc/containerd/config.toml` `[plugins."io.containerd.gc.v1.scheduler"]` etc., which containerd v2.2.4's `config default` ALREADY emits. Result: `toml: table … already exists`. Patched with sed-in-place overrides instead. 3. Kubelet tuning (sed against /var/lib/kubelet/config.yaml) ran from the containerd setup script — BEFORE `kubeadm join` writes that file. Sed aborted with "No such file or directory", `set -e` killed the script, post-script cloud-init steps kept going (cloud-init doesn't stop on runcmd failure). Split into a dedicated `k8s-node-post-join-tune.sh` invoked AFTER kubeadm join. 4. cloud_init.yaml fallocate'd a 4G swapfile and `swapon`'d it BEFORE kubeadm join. kubelet defaults to failSwapOn=true → exited 1 immediately. Replaced the swap setup with `swapoff -a` (node4 already runs this way and the cluster is fine). 5. Without `hostname:` in the shared user-data snippet, Proxmox's auto-generated meta-data does NOT include local-hostname when `cicustom user=…` is set — so cloud-init falls back to the cloud image's default `ubuntu` and `kubeadm join` registers the wrong node name. 
`provision-k8s-worker` now writes a per-node `<NAME>-meta.yaml` snippet and passes both via `cicustom user=…,meta=…`. Other improvements rolled in while fixing the above: - `ssh_public_key` read from Vault (`secret/viktor.ssh_public_key`, added today) instead of `var.ssh_public_key`. The last `terragrunt apply` was run with that var empty, leaving the snippet's `ssh_authorized_keys` with a single blank entry; the wizard user was effectively locked out of every fresh node. - `cloud_init.yaml` adds `/etc/systemd/resolved.conf.d/global-dns.conf` with `DNS=8.8.8.8 1.1.1.1, FallbackDNS=10.0.20.201`. Without it, systemd-resolved only consulted Technitium (link-level), which returns NXDOMAIN for `forgejo.viktorbarzin.me`; kubelet pulls from the Forgejo registry then failed DNS until I patched it manually on node5. - k8s apt repo bumped v1.32 → v1.34 (matches cluster). - The containerd setup script now creates hosts.toml for forgejo, quay, registry.k8s.io in addition to docker.io + ghcr.io. node3/4 had these added by hand post-bootstrap; now they're baked in. - `config_path` sed matches both `""` (containerd v1) and `''` (containerd v2.x). Without the v2 match, the certs.d mirror dir was silently ignored. - `proxmox-csi` node map adds k8s-node5 + k8s-node6 entries so CSI topology labels (region/zone, max-volume-attachments=28) apply on next `tg apply`. - `stacks/infra/main.tf` shed the 160-line inline containerd setup heredoc; that whole thing now lives in the module as a .sh file. Known unsolved gaps (deferred): - iscsid restart hangs ~90s on first boot before SIGKILL releases it (systemd-resolved restart kicks iscsid via dependency). Adds wall-clock time but doesn't block the join. - `provision-k8s-worker` doesn't run `tg apply` on `proxmox-csi` afterward, so the CSI topology labels need a manual apply after the node joins. Solving cleanly needs the CSI map to derive from `kubectl get nodes` instead of a static local (separate work).
- `var.containerd_config_update_command` is now ignored when is_k8s_template=true (replaced by the bundled .sh file). Variable kept with a deprecation note to avoid breaking other call sites. E2E proof: k8s-node6 (VMID 206) boots hands-off from `provision-k8s-worker k8s-node6 206 10.0.20.106` and appears as `kubectl get nodes …Ready` ~7 min later (most of which is the apt package_upgrade — separate optimization). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-26 11:52:00 +00:00
Viktor Barzin	5cc91e67bf	cloud-init: refactor to write_files for multi-line containerd setup Moves the containerd_config_update_command interpolation out of the runcmd list and into a write_files block delivering /usr/local/bin/k8s-node-containerd-setup.sh. runcmd then just calls the script. Why: the heredoc in stacks/infra/main.tf has mixed-indent inner shell heredocs (CONTAINERD_GC, KUBELET_PATCH bodies at col 0, surrounding text at col 2). When inserted into a `runcmd: - $${var}` item, even wrapped in a `- |` literal block, YAML's block-indent rule terminates the block early on the col-0 lines. The result is a silent cloud-init parse failure on every new k8s node (observed 2026-05-26 during node4 rebuild: node booted into the minimal default config, no kubeadm join, no containerd tuning, no kubelet shutdown grace). write_files writes the multi-line content into a YAML literal block where the script body is just opaque text: the block's content indent is set by the `content: |` block's own indentation (col 6) and any indent >= 6 is valid content. Any further indent inside the script (like the col-0 `[plugins...]` heredoc lines now at col 6 via indent(6, ...)) is preserved cleanly. Verified: `yaml.safe_load()` on the rendered snippet now reports `runcmd=36 write_files=1` (was throwing ParserError before). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-26 08:30:53 +00:00
Viktor Barzin	146dc143c6	cloud-init: revert indent(6) wrap; document the YAML interpolation bug The previous indent(6, containerd_config_update_command) attempt didn't fix the YAML parse error — the heredoc in stacks/infra/main.tf has mixed indentation (most lines at col 2, inner shell heredoc bodies like CONTAINERD_GC and KUBELET_PATCH at col 0). Any uniform-prefix function (indent / replace / join) preserves the relative offset, so the column-0 lines always end up below the block's first-line indent and YAML terminates the literal block early. The cleanest fix is a refactor: move the containerd setup snippet out of the inline heredoc into a cloud-init `write_files` block (script file delivered to the VM, then `bash /path/to/script.sh` in runcmd). That bypasses the multi-line YAML interpolation entirely. Reverting to the previous (also-broken) interpolation pattern with a big WARNING comment instead. New k8s nodes still need manual backfill after first boot — node4 was backfilled today; see memory id=2767/2772 for the backfill steps. Tracked separately. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-26 07:11:20 +00:00
Viktor Barzin	9b75b2817b	cloud-init: fix k8s node bootstrap snippet (multi-line interp + containerd v2 quotes) Two bugs found while rebuilding k8s-node4 (2026-05-26): 1. runcmd YAML breakage: `- $${containerd_config_update_command}` interpolated a multi-line heredoc as bare list-item content. The trailing lines lost their list-item prefix, breaking cloud-config parsing. Cloud-init silently fell back to the minimal default (hostname + package_upgrade only): kubeadm join, containerd config, kubelet tuning, iSCSI hardening, swap, ALL skipped. No error visible in `cloud-init status`. Fix: wrap the interpolation in `- |` literal block with `indent(4, ...)`. 2. containerd v2 single-quote mismatch: `containerd config default` in v2 writes `config_path = ''` (single quotes), v1 writes `""` (double). The sed pattern matched only double quotes → silent no-op on fresh containerd 2.x nodes → registry-mirror hosts.toml ignored → all image pulls hit upstream registries → DNS-to-MetalLB chicken-and-egg loop. Fix: match any value with `config_path = .*`. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-26 07:06:50 +00:00
Viktor Barzin	c0991f7f8f	infra: re-enable unattended-upgrades with kured prometheus-gating Reverses the March 2026 outage mitigation that disabled unattended-upgrades cluster-wide. Now re-enables it on the k8s template VM with: - Allowed-Origins limited to security/updates pockets - Package-Blacklist for k8s/containerd/runc/calico-node (apt-mark hold on the cluster-critical components) - Automatic-Reboot disabled; kured drives the actual reboots - Compatible with the existing kured + sentinel-gate flow kured side: - rebootDelay 30s, concurrency 1 - Sentinel cool-down stretched 30m → 24h (aligns with the 24h soak window from the post-mortem) - prometheusUrl + alertFilterRegexp wired so any firing non-ignored alert halts the rollout. Ignore-list excludes self-referential alerts (Watchdog/RebootRequired/KuredNodeWasNotDrained/InfoInhibitor) that would otherwise deadlock kured. Prometheus side (already partly landed in `6c4e0966`, the "Upgrade Gates" rule group): - Refine `KubeQuotaAlmostFull` to include the resourcequota label in both the on-clause and the summary, so multi-quota namespaces (authentik, beads-server, frigate) report the quota name correctly. grafana.tf: terraform fmt whitespace only. Together with the post-mortem 2026-03-22 (memory id=390) the loop is closed: unattended-upgrades runs again, kernel-class updates can land, but only when cluster health is green and the reboot window is open.	2026-05-10 17:07:32 +00:00
Viktor Barzin	6101fb99f9	Reduce disk write amplification across cluster (~200-350 GB/day savings) [ci skip] - Prometheus: persist metric whitelist (keep rules) to Helm template, preventing regression from 33K to 250K samples/scrape on next apply. Reduce retention 52w→26w. - MySQL InnoDB: aggressive write reduction — flush_log_at_trx_commit=0, sync_binlog=0, doublewrite=OFF, io_capacity=100/200, redo_log=1GB, flush_neighbors=1, reduced page cleaners. - etcd: increase snapshot-count 10000→50000 to reduce WAL snapshot frequency. - VM disks: enable TRIM/discard passthrough to LVM thin pool via create-vm module. - Cloud-init: enable fstrim.timer, journald limits (500M/7d/compress). - Kubelet: containerLogMaxSize=10Mi, containerLogMaxFiles=3. - Technitium: DNS query log retention 0→30 days (was unlimited writes to MySQL). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-09 19:01:21 +00:00
Viktor Barzin	c2f9ca0d13	modules: improve create-vm with additional config options and cloud-init updates	2026-04-06 11:57:55 +03:00
Viktor Barzin	a44f35bcf8	harden vaultwarden iSCSI storage and increase backup frequency - Increase backup from daily to every 6 hours (0 */6 * * *) - Add pre/post-flight SQLite integrity checks to backup job - Harden iSCSI on all nodes: increase recovery timeout (300s), enable CRC32C data/header digests for bit-flip detection - Fix restore runbook PVC name (vaultwarden-data-iscsi) Motivated by SQLite corruption from iSCSI I/O errors.	2026-03-23 00:36:11 +02:00
Viktor Barzin	67d1ce453c	add /sentinel dir to cloud-init for kured reboot gating The kured sentinel gate DaemonSet requires /sentinel to exist on all nodes. Without it, kured pods get stuck in ContainerCreating with hostPath mount failure. Previously created manually; now provisioned automatically for new nodes.	2026-03-19 19:57:27 +00:00
Viktor Barzin	c034adab5f	mitigate cluster instability during terraform applies - Recreate strategy for heavy single-replica deployments (onlyoffice, stirling-pdf) - Reduce maxSurge on multi-replica deployments (traefik, authentik, grafana, kyverno) to prevent memory request surge overwhelming scheduler - Weekly etcd defrag CronJob (Sunday 3 AM) to prevent fragmentation buildup - Disable Kyverno policy reports (ephemeral report cleanup) - Cloud-init: journald persistence + 4Gi swap for worker nodes - Kubelet: LimitedSwap behavior for memory pressure relief	2026-03-15 17:23:39 +00:00
Viktor Barzin	0638e2cc2e	[ci skip] iSCSI migration, healthcheck fixes, health probes, etcd backup - Migrate MySQL/PostgreSQL storage from local-path to iscsi-truenas - Add democratic-csi iSCSI driver module for TrueNAS - Add open-iscsi to cloud-init VM template - Fix Shlink health probe path (/api/v3 -> /rest/v3 for Shlink 5.0) - Fix etcd backup: use etcd 3.5.21-0 (3.6.x is distroless, no /bin/sh) - Fix cluster healthcheck CronJob: always exit 0 to prevent circular JobFailed alerts (reporting via Slack, not exit codes) - Fix Uptime Kuma nested list handling in cluster-health.sh - Add health probes to: audiobookshelf, immich ML, ntfy, headscale, uptime-kuma, vaultwarden, rybbit (clickhouse + server + client), shlink, shlink-web - Add iSCSI storage documentation to CLAUDE.md	2026-03-06 19:54:21 +00:00
Viktor Barzin	946b5b1745	[ci skip] add qemu-guest-agent to VM templates and enable agent by default	2026-03-01 01:58:46 +00:00
Viktor Barzin	3b7d295119	add nginx reverse proxy to serialize registry requests for the same path to avoid race conditions [ci skip]	2025-12-29 20:16:13 +00:00
Viktor Barzin	45e74bedc6	update vm creation templates [ci skip]	2025-12-14 09:50:15 +00:00
Viktor Barzin	b15246a2cb	add docker registry vm and allow multiple provisioning cmds in templates [ci skip]	2025-10-12 18:54:29 +00:00
Viktor Barzin	1968f353a2	add module to create a k8s worker [ci skip]	2025-10-11 20:40:34 +00:00
Viktor Barzin	51a94faff4	add template vm in proxmox [ci skip]	2025-10-11 17:07:47 +00:00

17 commits
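
Note on 8ed427a7e4 (bug 1): the fix there is to base64-embed the setup script into `write_files` so that YAML indentation never touches the heredoc bodies. The Python sketch below illustrates that idea only; it is not the module's actual Terraform. The script body, the `/tmp/example.toml` path, the TOML section name, and both helper functions are hypothetical stand-ins. `encoding: b64` is the standard cloud-init `write_files` option for base64 content.

```python
import base64
import textwrap

# Hypothetical stand-in for the real setup script. The heredoc body sits at
# column 0, which is exactly what uniform YAML indentation would disturb.
SCRIPT = """#!/bin/bash
set -e
cat > /tmp/example.toml <<'EXAMPLE_TOML'
[example.section]
  key = "value"
EXAMPLE_TOML
"""


def render_write_files(path: str, script: str) -> str:
    """Render a cloud-init write_files entry carrying the script as base64."""
    encoded = base64.b64encode(script.encode("utf-8")).decode("ascii")
    # The payload is a single line, so no indentation rule can alter it.
    return textwrap.dedent(f"""\
        write_files:
          - path: {path}
            permissions: '0755'
            encoding: b64
            content: {encoded}
        """)


def extract_script(snippet: str) -> str:
    """Recover the script from the rendered snippet, as cloud-init would decode it."""
    for line in snippet.splitlines():
        key, _, value = line.strip().partition(": ")
        if key == "content":
            return base64.b64decode(value).decode("utf-8")
    raise ValueError("no content field found")


if __name__ == "__main__":
    snippet = render_write_files("/usr/local/bin/k8s-node-containerd-setup.sh", SCRIPT)
    print(snippet)
    # The round trip is byte-identical: heredoc bodies stay at column 0.
    assert extract_script(snippet) == SCRIPT
```

By contrast, `textwrap.indent(SCRIPT, " " * 6)` would move `[example.section]` to column 6, which is the failure mode described in that commit.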