Merge forgejo/master — reconcile 18-day divergence with origin

Origin and forgejo had drifted since 2026-05-05 (merge base b45c45e4).
Each remote was receiving Viktor's commits independently: origin since
2026-05-23 and forgejo from 2026-05-06 to 2026-05-22 14:15. Both had ~30
substantive commits. This merge brings forgejo's work into the local
branch.

13 conflict files resolved as follows (all favoured HEAD = origin/local,
which is newer in every case):

- secrets/{fullchain,privkey}.pem: kept HEAD (renewed 2026-05-24, vs
  forgejo's 2026-05-17 renewal)
- stacks/blog/main.tf: kept HEAD (ingress-www intentionally removed
  today after DNS+monitor cleanup; forgejo had the old block)
- stacks/xray/modules/xray/main.tf: kept HEAD (vless dropped today as
  dead ingress; forgejo had the old 3-port service)
- stacks/k8s-version-upgrade/scripts/upgrade-step.sh: kept HEAD
  (allowlist refactor, master-phase idempotency skip, tigera-operator
  quiesce/restore, IngressTTFBCritical ignore; all newer than forgejo)
- stacks/k8s-version-upgrade/main.tf: kept HEAD (deployments/scale RBAC,
  oldest-kubelet detection; both added 2026-05-23)
- scripts/update_k8s.sh: kept HEAD (--etcd-upgrade=false fallback)
- stacks/llama-cpp/main.tf: kept HEAD (KEEL_LIFECYCLE_V1 ignore_changes
  block added today, commit 0b1282a1)
- stacks/openclaw/main.tf: kept HEAD (nim/meta/llama-3.1-70b primary)
- stacks/trading-bot/main.tf: kept HEAD (claude-haiku-4-5 pin +
  kevin-signal-bridge container)
- stacks/postiz/modules/postiz/main.tf: kept HEAD (memory 2Gi/3Gi bump,
  despite postiz being destroyed today; kept TF intent)
- stacks/nvidia/modules/nvidia/values.yaml: kept HEAD (mem 822Mi)
- stacks/monitoring/modules/monitoring/prometheus_chart_values.tpl: kept
  HEAD (richer alert list + raised StatefulSet `for: 3m`)
- stacks/kyverno/modules/kyverno/security-policies.tf: kept HEAD
  (expanded registry allowlist + comments)
- docs/architecture/security.md: kept HEAD (detailed W1.7 analysis)
- docs/plans/2026-05-21-ha-control-plane-design.md: kept HEAD (178-line
  superset incl. 2026-05-23 deferral rationale)

Auto-merged (no conflict): broker-sync, claude-agent-service,
cloudflared, mailserver, n8n, technitium, traefik, url, proxmox-csi,
xray (deployment portion).

Brings in forgejo-only substantive commits: fire-planner, openclaw v3
flow + recruiter-responder wiring, several k8s-version-upgrade hardening
passes (kill-switch, RecentNodeReboot ignore, pipefail fixes), HA control
plane design, security wave 1 expansion to tier 3+4, alloy file-tail
switch, prometheus scrape 2m, authentik replica cut, forgejo archive
disable.

Meta: forgejo and origin drift is a coordination bug. Going forward we
need to either (a) have one CI mirror to the other, or (b) standardize on
one remote. Filed mentally; not addressed in this commit.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
36 changed files with 1167 additions and 244 deletions
--- a/docs/architecture/networking.md
+++ b/docs/architecture/networking.md
@@ -416,7 +416,7 @@ Containerd on all K8s nodes uses `hosts.toml` to redirect pulls to the local cac

 ### Ingress Returns 502 Bad Gateway

-**Symptoms**: Cloudflared tunnel is up, Traefik logs show `dial tcp: lookup <service> on 10.0.20.101:53: no such host`.
+**Symptoms**: Cloudflared tunnel is up, Traefik logs show `dial tcp: lookup <service> on 10.0.20.201:53: no such host`.

 **Diagnosis**: DNS resolution failed. Check:
 1. Is Technitium pod running? `kubectl get pod -n technitium`
--- a/docs/architecture/security.md
+++ b/docs/architecture/security.md
@@ -181,7 +181,7 @@ Beads epic: `code-8ywc`. **Status: partially live as of 2026-05-18.**
 | W1.4 Kyverno security policies → Enforce | **LIVE** — 3 policies in Enforce mode with 35-namespace exclude list. |
 | W1.5 Kyverno trusted-registries → Enforce | **LIVE** — explicit allowlist (15 registries + 6 DockerHub library bare names + 56 DockerHub user repos). Verified by admission dry-run: `evilcorp.example/malware:v1` BLOCKED, `alpine:3.20` and `docker.io/library/alpine:3.20` ALLOWED. |
 | W1.6 Calico observe-phase (pilot: recruiter-responder) | **LIVE** (2026-05-19) — GlobalNetworkPolicy `wave1-egress-observe-recruiter-responder` with rules `[action:Log, action:Allow]`. FelixConfiguration.flowLogsFileEnabled approach abandoned (Calico Enterprise-only field, rejected by OSS v3.26). Log action emits iptables LOG with prefix `calico-packet: ` → kernel → journald → Alloy → Loki. Verified: `{job="node-journal"} \|~ "calico-packet"` returns real packet metadata (SRC/DST/PROTO). Expand to more namespaces by adding to `namespaceSelector`. |
-| W1.7 NetworkPolicy phased enforce | **PENDING** — needs ~1 week of W1.6 observation, then build empirical allowlist from Loki queries, flip GNP rules from `[Log, Allow]` to `[Allow specific dests, Deny rest]`. |
+| W1.7 NetworkPolicy phased enforce | **PARTIAL ANALYSIS** — first observation snapshot at `docs/architecture/wave1-egress-observation-2026-05-22.md` (36 source namespaces seen so far, 29 thin-profile candidates). Recommend continuing observation through 2026-05-29 (full week) before any enforce flip. Pilot enforce target: `recruiter-responder` (2 destinations only). `servarr` stays in Log+Allow indefinitely (BitTorrent P2P incompatible with static enforce). |

 The block below documents the locked design.

--- a/docs/architecture/vpn.md
+++ b/docs/architecture/vpn.md
@@ -86,7 +86,7 @@ sequenceDiagram
 | Authentik | OIDC provider | K8s | SSO authentication for Headscale |
 | DERP Relay | Embedded in Headscale | K8s (region 999) | Relay for NAT traversal |
 | AdGuard DNS | Container | K8s | Global DNS resolver with ad-blocking |
-| Technitium DNS | Container | K8s (10.0.20.101) | Internal .lan domain resolver |
+| Technitium DNS | Container | K8s (10.0.20.201) | Internal .lan domain resolver |

 ## How It Works

@@ -224,7 +224,7 @@ dns_config:
 - Google: `8.8.8.8`, `8.8.4.4`

 **Conditional forwarding**:
-- `viktorbarzin.lan` → `10.0.20.101` (Technitium)
+- `viktorbarzin.lan` → `10.0.20.201` (Technitium)

 **Ad-blocking lists**:
 - AdGuard DNS filter
@@ -377,7 +377,7 @@ dns_config:
 **Steps**:
 1. Verify AdGuard is running: `kubectl get pod -n adguard`
 2. Check AdGuard conditional forwarding: Query AdGuard directly: `nslookup nextcloud.viktorbarzin.lan <adguard-ip>`
-3. Check Technitium: `nslookup nextcloud.viktorbarzin.lan 10.0.20.101`
+3. Check Technitium: `nslookup nextcloud.viktorbarzin.lan 10.0.20.201`

 **Common causes**:
 1. **AdGuard not forwarding .lan**: Conditional forwarding rule missing or misconfigured.
--- a/docs/architecture/wave1-egress-observation-2026-05-22.md
+++ b/docs/architecture/wave1-egress-observation-2026-05-22.md
@@ -0,0 +1,141 @@
+# Wave 1 W1.6/W1.7 — Egress Observation Snapshot (2026-05-22)
+
+First analysis pass over the Calico GNP `wave1-egress-observe-tier34` data
+captured in Loki via `{job="node-journal"} |~ "calico-packet"`.
+
+**Data scope:** ~10000 flow log lines pulled from Loki over ~6h+24h windows.
+Loki caps queries at 5000 records so longer windows are sample-capped.
+
+**Coverage:** 36 source namespaces observed making egress (out of 82 selected
+by `tier in {3-edge, 4-aux}`). Namespaces missing from data are either idle,
+scaled to 0, or producing only intra-namespace traffic (which Calico Log
+captures from-workload but most pods in those namespaces talk locally).
+
+## Egress fan-out per namespace
+
+| Namespace | dests | pod-ns | svc | external |
+|---|---:|---:|---:|---:|
+| affine | 3 | 2 | 1 | 0 |
+| beads-server | 4 | 3 | 1 | 0 |
+| cyberchef | 2 | 1 | 1 | 0 |
+| dawarich | 3 | 2 | 1 | 0 |
+| default | 1 | 0 | 0 | 1 |
+| ebooks | 3 | 2 | 1 | 0 |
+| f1-stream | 16 | 2 | 1 | 13 |
+| forgejo | 2 | 1 | 1 | 0 |
+| hackmd | 2 | 1 | 1 | 0 |
+| homepage | 2 | 1 | 1 | 0 |
+| isponsorblocktv | 2 | 0 | 1 | 1 |
+| jsoncrack | 2 | 1 | 1 | 0 |
+| kms | 2 | 1 | 1 | 0 |
+| mailserver | 2 | 0 | 1 | 1 |
+| meshcentral | 2 | 2 | 0 | 0 |
+| n8n | 2 | 1 | 1 | 0 |
+| nextcloud | 5 | 2 | 1 | 2 |
+| onlyoffice | 2 | 1 | 1 | 0 |
+| openclaw | 18 | 4 | 1 | 13 |
+| paperless-ngx | 3 | 2 | 1 | 0 |
+| phpipam | 3 | 2 | 1 | 0 |
+| poison-fountain | 2 | 1 | 1 | 0 |
+| postiz | 9 | 8 | 1 | 0 |
+| realestate-crawler | 2 | 1 | 1 | 0 |
+| recruiter-responder | 2 | 0 | 1 | 1 |
+| rybbit | 2 | 1 | 1 | 0 |
+| send | 2 | 1 | 1 | 0 |
+| servarr | 134 | 2 | 2 | 130 |
+| speedtest | 2 | 1 | 1 | 0 |
+| status-page | 10 | 2 | 1 | 7 |
+| tandoor | 2 | 1 | 1 | 0 |
+| technitium | 5 | 2 | 1 | 2 |
+| trading-bot | 5 | 2 | 1 | 2 |
+| url | 2 | 1 | 1 | 0 |
+| website | 2 | 1 | 1 | 0 |
+| woodpecker | 8 | 2 | 1 | 5 |
+
+## Common patterns
+
+**Universal baseline** (every observed namespace makes these):
+- `kube-system/kube-dns` UDP/53 — DNS resolution
+- Often `dbaas` TCP/3306 (MySQL) or TCP/5432 (Postgres)
+- Often `redis` TCP/6379
+
+**Per-namespace specifics** (the part that varies):
+- External HTTPS to specific IPs (CDNs, APIs)
+- Internal pod-to-pod for service-specific clients
+
+## W1.7 rollout candidates (sorted by simplicity)
+
+**Tier A — trivial egress (recommend first wave):**
+
+`recruiter-responder` has the simplest profile of all observed:
+- `kube-system/kube-dns` :53/UDP
+- `99.83.136.103` :443/TCP (Telegram API)
+
+That's it. Two destinations. Perfect first enforce candidate.
+
+**Tier B — small egress (≤3 external + ≤5 internal, 29 namespaces):**
+
+affine, beads-server, cyberchef, dawarich, ebooks, forgejo, hackmd, homepage,
+isponsorblocktv, jsoncrack, kms, mailserver, meshcentral, n8n, nextcloud,
+onlyoffice, paperless-ngx, phpipam, poison-fountain, realestate-crawler,
+rybbit, send, speedtest, tandoor, technitium, trading-bot, url, website.
+
+These can be enforce'd in batches of 3-5/day after the recruiter-responder
+pilot proves out.
+
+**Tier C — moderate egress (5–18 external):**
+
+f1-stream (13 ext), openclaw (13 ext), woodpecker (5 ext), status-page (7 ext).
+Need per-IP allowlist or domain-based selectors.
+
+**Tier D — broad egress (do NOT enforce statically):**
+
+`servarr` has 130+ external IPs because it runs BitTorrent peer-to-peer.
+Static IP enforcement won't work; either leave in Log+Allow mode permanently
+or use a port-only allowlist (TCP+UDP 6881+random high ports outbound).
+
+## Important caveats before flipping to enforce
+
+1. **Observation horizon is too short.** Only ~6h of dense data and ~24h
+   total. CronJobs that run weekly, periodic Vault token rotations (7d),
+   external service maintenance windows, Keel auto-rollouts pulling new
+   image versions — all missed. Recommend collecting **at least 7 days**
+   before declaring an allowlist complete.
+
+2. **`servarr`** is fundamentally incompatible with static enforce — keep
+   in Log+Allow (or explicit deny for known-bad CIDRs only).
+
+3. **External IPs are dynamic.** Cloudflare-fronted services rotate IPs.
+   The recruiter-responder external IP `99.83.136.103` is one of Telegram's
+   API endpoints — Telegram has a CIDR range. Allowing single IPs will break
+   when DNS resolves to a different IP. Prefer Calico's `domains:` selector
+   (Calico OSS supports DNS-based egress allowlists via `dns_policy_resolver`)
+   OR allow the full Cloudflare/AWS CIDR range OR use a per-app egress
+   gateway.
+
+4. **The observation didn't capture intra-namespace traffic** by design —
+   the Calico Log rule fires on egress from workload endpoint, but
+   pod-to-same-namespace-pod traffic on the same node may bypass the
+   filter chain (varies). Real-world testing needed after enforce flip.
+
+## Suggested next-session sequencing
+
+1. **Continue observation for at least 7 days** before any enforce flip.
+   Compare data on 2026-05-29 vs today; if no new destinations show up,
+   the allowlist is stable.
+2. **First enforce: recruiter-responder.** GNP with allowlist =
+   {kube-dns, telegram CIDR, vault svc, eso svc}. Watch for breakage.
+3. **Tier B batch rollout** at 3-5 namespaces/day per Keel-style phased
+   rollout pattern (memory id=1972).
+4. **Tier C requires per-namespace investigation** — what are those
+   external IPs? Map to known services first.
+5. **servarr stays in Log+Allow** indefinitely (or migrate to dedicated
+   egress proxy).
+
+## Source data location
+
+- Loki LogQL: `{job="node-journal"} |~ "calico-packet"`
+- Pod IP → namespace map at observation time saved at
+  `/tmp/pod-ip-map.txt` on the analysis host (ephemeral).
+- Analysis scripts: `/tmp/analyze_flows2.py`, `/tmp/build_allowlist.py`.
+- Tracked under beads `code-8ywc` (W1.7).
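The tier boundaries under "W1.7 rollout candidates" can be sketched as a small classifier over the fan-out table. This is a hypothetical helper, not one of the analysis scripts listed in the snapshot: `classify_egress` and the `unclassified` bucket are assumptions made for illustration, the thresholds are read off the tier headings, and Tier A is left out because the pilot namespace was picked by hand.

```python
# Sketch only: thresholds come from the tier headings in the snapshot;
# the function name and the "unclassified" bucket are illustrative assumptions.

def classify_egress(pod_ns: int, svc: int, external: int) -> str:
    """Map one row of the fan-out table to a rollout tier."""
    internal = pod_ns + svc
    if external <= 3 and internal <= 5:
        return "B"  # small egress: batch-enforce candidates
    if 5 <= external <= 18:
        return "C"  # moderate egress: needs per-IP or domain allowlist
    if external > 18:
        return "D"  # broad egress: keep in Log+Allow
    return "unclassified"  # matches no heading; needs a manual look


# A few rows copied from the fan-out table: (pod-ns, svc, external)
rows = {
    "affine": (2, 1, 0),
    "f1-stream": (2, 1, 13),
    "postiz": (8, 1, 0),
    "servarr": (2, 2, 130),
    "woodpecker": (2, 1, 5),
}

tiers = {ns: classify_egress(*counts) for ns, counts in rows.items()}
for ns, tier in tiers.items():
    print(f"{ns}: {tier}")
```

Rows that match no heading (for example `postiz`, with 9 internal destinations and no external ones) fall into `unclassified` rather than being forced into a tier.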
--- a/docs/plans/2026-02-22-talos-linux-migration-evaluation.md
+++ b/docs/plans/2026-02-22-talos-linux-migration-evaluation.md
@@ -106,7 +106,7 @@ machine:
          - network: 0.0.0.0/0
            gateway: 10.0.20.1
    nameservers:
-      - 10.0.20.101  # Technitium
+      - 10.0.20.201  # Technitium
      - 1.1.1.1
  registries:
    mirrors:
--- a/docs/plans/2026-05-21-ha-control-plane-design.md
+++ b/docs/plans/2026-05-21-ha-control-plane-design.md
@@ -1,9 +1,32 @@
 # HA Control Plane (3 masters) — Design

-**Date**: 2026-05-21
-**Status**: Drafted, NOT scheduled
-**Beads**: code-n0ow
-**Trigger**: today's k8s 1.34.7→1.34.8 autonomous-upgrade session repeatedly hit a storm cascade rooted in single-master apiserver outages
+**Date**: 2026-05-21 (decisions locked 2026-05-22; **deferred 2026-05-23**)
+**Status**: **DEFERRED** — design + plan complete, NOT scheduled. Awaiting either PVE host capacity expansion OR a separate right-sizing pass on the existing master before this becomes affordable. Paired plan: `2026-05-21-ha-control-plane-plan.md`.
+**Beads**: code-n0ow (open, deferred — see `bd show code-n0ow`)
+**Trigger**: 2026-05-21 k8s 1.34.7→1.34.8 autonomous-upgrade session repeatedly hit a storm cascade rooted in single-master apiserver outages
+
+## Why deferred (2026-05-23)
+
+Measured during the locking pass:
+
+- **k8s-master uses 4.6 GB of 32 GB allocated** (kube-apiserver 2.6 GB + etcd 660 MB + cm 360 MB + ~1 GB everything else). The 32 GB sizing is ~5-6× oversized vs working set.
+- **PVE host is already 98% RAM-committed** — 262 GB allocated to VMs against 267 GB physical, with 1.5 GB of active swap. The planned 3 × 32 GB control plane (+64 GB net) would push allocation to 326 GB → OOM on the host.
+- **Software-only HA on a single PVE host has bounded value** — a hypervisor crash still loses all 3 masters. The big resilience wins (kubeadm upgrades, cert rotation, planned reboots) are real but the disaster-recovery angle is limited until a second PVE host exists.
+
+### Revisit triggers — any of:
+
+1. **Second PVE host added** to the lab. Hardware HA becomes possible; HA control plane becomes the natural follow-up. Spread the 3 masters across 2 hosts (2+1).
+2. **Cluster-wide right-sizing pass** that frees enough headroom for the original 3 × 32 GB plan, OR pre-agreed amendment to provision 16 GB masters (right-sized to actual usage; 3-4× current working-set headroom).
+3. **Storm cascade burns enough hours** that the operational cost outweighs the memory cost — track minutes spent manually nursing kubeadm upgrades; if cumulative > ~10h over a few months, revisit.
+
+### What's still good
+
+The design + plan in this directory remain authoritative. When we revisit:
+
+- All 14 locked decisions stand.
+- Challenger amendments (cloud-init template bump, rbac multi-master refactor, HTTPS `/readyz` health check, expanded blast radius, etcd-backup nodeSelector, chain extension as Phase 7) are baked in.
+- Only the sizing decision needs revisiting — likely 16 GB per master instead of 32 GB.
+- Adding `k8s_master_hosts` list-based refactor to the rbac stack (Phase 1.5) is a **standalone win** that could be done independently of HA — it would future-proof the cluster against the day HA lands. Consider lifting that as its own task.

 ## Problem statement

@@ -50,18 +73,24 @@ The k8s upgrade chain doesn't need to be aware of *any* of this — the
 underlying availability of apiserver makes the chain's gates
 naturally pass on each iteration.

-## Decisions (proposed — to be confirmed)
+## Decisions (locked 2026-05-22)

 | # | Decision | Notes |
 |---|----------|-------|
 | 1 | **3 masters** (not 5) | Quorum tolerates 1 failure, sufficient for home-lab. 5 would tolerate 2 but doubles etcd write amplification. |
-| 2 | **Sizing**: match current `k8s-master` (8 vCPU, 32GB RAM, ~64 GB disk) for all 3 | Symmetric. New VMs `k8s-master-2`, `k8s-master-3` on Proxmox. |
-| 3 | **Apiserver LB**: **pfSense HAProxy** (existing pattern, see mailserver-pfsense-haproxy.md) over keepalived+haproxy-on-each-master | Pros: no per-node moving parts, mirrors the mailserver layout already in production. Cons: pfSense becomes more SPoF — but it's already SPoF for everything else (DNS, gateway, ingress). |
-| 4 | **VIP**: pick an unused IP on the cluster VLAN, e.g. `10.0.20.99`, point all kubeconfigs + kubelet `--server` at it | Internal-only VIP; external API access stays via Cloudflared. |
-| 5 | **etcd**: kubeadm-managed (existing); just `kubeadm join --control-plane` brings new members into the etcd cluster automatically | Avoids running etcd separately. |
-| 6 | **kured-sentinel-gate**: extend "quorum-safe" check to verify ≥2 control-plane nodes Ready before allowing a reboot | Otherwise kured could reboot 2 masters at once and break quorum. |
-| 7 | **etcd backup**: today's `etcd-backup` CronJob already takes a snapshot from one member; that's still sufficient (etcd snapshot is a consistent point-in-time). No new work needed. | |
-| 8 | **Migration order**: add masters one at a time, run smoke (kubectl from each), then cut over kubeconfigs | Each `kubeadm join --control-plane` is reversible (just `kubeadm reset` + remove from etcd member list). |
+| 2 | **Sizing**: match current `k8s-master` (8 vCPU, 32GB RAM, ~64 GB disk) for all 3 | Symmetric. New VMs `k8s-master-2` (VMID 205, 10.0.20.110), `k8s-master-3` (VMID 206, 10.0.20.111). |
+| 3 | **Apiserver LB**: **pfSense HAProxy** — new TCP frontend on `10.0.20.99:6443` mirroring the mailserver pattern. Idempotent via `scripts/pfsense-haproxy-bootstrap.php`. | Pros: no per-node moving parts, mirrors existing mailserver layout. Cons: pfSense becomes more SPoF — but it's already SPoF for everything else (gateway/DNS/ingress). |
+| 4 | **VIP**: `10.0.20.99` (one below current master `.100`, well clear of MetalLB pool `.200-.220`). Internal-only — external API access stays via Cloudflared. | All kubeconfigs + kubelet.conf entries flip from `10.0.20.100:6443` → `10.0.20.99:6443`. |
+| 5 | **etcd**: kubeadm-managed stacked; `kubeadm join --control-plane` brings new members into the etcd cluster automatically | Avoids running etcd separately. |
+| 6 | **kured-sentinel-gate**: extend the bash loop in `stacks/kured/main.tf` with a "≥2 control-plane nodes Ready" check between the existing all-nodes-Ready and calico-Ready checks | Otherwise kured could reboot 2 masters at once and break quorum. |
+| 7 | **etcd backup**: `etcdctl snapshot save` from any member is a consistent point-in-time of the full quorum state — but the existing CronJob is pinned `node_name = "k8s-master"`. Phase 4.5 flips this to a control-plane label + toleration so backups don't silently skip when master-1 is drained. | Snapshot CORRECTNESS unchanged; SCHEDULING needs fixing. |
+| 8 | **Migration order**: Phase 0 (retrofit existing cluster) → Phase 1 (LB up, single backend, HTTPS health check) → Phase 1.5 (rbac stack refactor) → Phase 2 (cloud-init bump + master-2 join + add to LB) → Phase 3 (master-3 join + add to LB) → Phase 4 (flip clients + workers to VIP) → Phase 4.5 (etcd-backup CronJob fix) → Phase 5 (kured-sentinel-gate quorum check) → Phase 6 (E2E validation) → Phase 7 (k8s-version-upgrade chain extension) | Each kubeadm join is reversible (`kubeadm reset` + `etcdctl member remove`). |
+| 9 | **VM provisioning**: cloud-init via `create-template-vm` module, **but the template needs an apt-source bump first** (v1.32 → v1.34) and a control-plane gate on `k8s_join_command` so master VMs don't auto-join as workers. Existing master stays as the legacy manual VM (not rebuilt). | The repo has zero VMs using cloud-init for provisioning today — we're the first user. Update template first, then use it. |
+| 10 | **Cert SAN + controlPlaneEndpoint retrofit**: Phase 0, before any new master joins. Patch `kubeadm-config` via `kubeadm init phase upload-config kubeadm --config <file>` (kubeadm-owned write, future-proof against `kubeadm upgrade apply`), regen `apiserver.crt` via `kubeadm init phase certs apiserver`, restart the kube-apiserver pod (~30s outage on the existing master only). | Standard kubeadm retrofit path; `kubeadm join --control-plane` requires controlPlaneEndpoint to be set. |
+| 11 | **Multi-master config propagation (Phase 1.5)**: refactor `stacks/rbac/modules/rbac/{apiserver-oidc,audit-policy,etcd-tuning}.tf` to loop over a list of master hosts. Apply BEFORE master-2/3 join so they boot with OIDC, audit policy, and etcd tuning already in place. | Today these stacks SSH into a single master and sed into `kube-apiserver.yaml` — if not propagated, Authentik login flaps depending on which master the LB lands on. |
+| 12 | **k8s-version-upgrade chain extension (Phase 7)**: extend `stacks/k8s-version-upgrade/scripts/upgrade-step.sh` to discover and iterate over all control-plane nodes (drain → upgrade → uncordon, gated by quorum check). | Without this, chain only upgrades master-1; masters 2/3 drift behind one version per upgrade. Original autonomous-upgrades goal unmet. |
+| 13 | **LB health check**: HTTPS `GET /readyz` (with `verify none` for self-signed apiserver cert), NOT plain TCP. | Plain TCP misses apiserver-NotReady states (etcd unreachable, controller-manager flapping). |
+| 14 | **VIP DNS name**: add `k8s-apiserver IN A 10.0.20.99` to `config.tfvars` BEFORE Phase 4. Delete stale `kubernetes IN A 10.0.20.100`. Consumers reference the FQDN, not the bare IP — future renumbering is then a single record change. | |

 ## Out of scope

@@ -74,12 +103,15 @@ naturally pass on each iteration.

 | Risk | Mitigation |
 |---|---|
+| Phase 0 cert regen on existing master triggers a brief apiserver outage (~30s) | Already a known cluster behaviour during static-pod restart. Schedule during a low-activity window. Tigera/operators will crash-loop briefly but recover — same blast radius as today's k8s upgrade. **Once HA is up, future restarts won't have this surface at all.** |
 | etcd quorum split-brain during member join | kubeadm join is atomic; if it fails, the new member doesn't join the quorum. Existing etcd stays healthy. |
-| LB misconfiguration → all kubectl breaks | Smoke-test from each master before flipping clients. Keep a kubeconfig pointing directly at one master as fallback. |
-| Existing kubeconfigs (dev VM, agents, woodpecker) need updating | List all consumers, update in a single TF apply. |
-| New masters get scheduled some workload pods unintentionally | Verify control-plane taint is applied at join time. |
-| Cluster-wide cert rotation might be needed | kubeadm join handles certs automatically using the `--certificate-key` from `kubeadm init phase upload-certs`. |
-| 32GB per master × 3 = 96GB RAM used for control plane alone | Proxmox host has headroom; not blocking. |
+| LB misconfiguration → all kubectl breaks | Smoke-test from each master directly (bypass LB) before flipping clients. Keep a kubeconfig pointing at `10.0.20.100:6443` as fallback. |
+| Existing kubeconfigs (Woodpecker pipelines, agents, dev VM, in-cluster RBAC default) need updating | Single Terraform apply touches `stacks/rbac/modules/rbac/apiserver-oidc.tf` (default), `.woodpecker/*.yml` (committed kubeconfigs). Worker `kubelet.conf` files patched in Phase 4 via ssh loop. |
+| New masters get scheduled workload pods unintentionally | Verify `node-role.kubernetes.io/control-plane:NoSchedule` taint is applied at join time (default with `--control-plane`). |
+| Cert rotation propagation | kubeadm join uses the `--certificate-key` from `kubeadm init phase upload-certs` to fetch existing CA materials. Single short-lived secret in `kube-system/kubeadm-certs` (**2h TTL** — Phases 2 + 3 must complete within the window, or re-upload between them). |
+| 32GB per master × 3 = 96GB RAM used for control plane alone | PVE host has 272GB total, 176GB allocated to cluster pre-HA. Post-HA: 240GB allocated, 32GB headroom. Sufficient. |
+| Pre-existing kubeadm-config does NOT have `controlPlaneEndpoint` set | Phase 0 patches it. Verify: `kubectl -n kube-system get cm kubeadm-config -o yaml \| grep controlPlaneEndpoint` (absent → `10.0.20.99:6443` post-Phase 0). |
+| Existing master cert SANs are `[k8s-master, 10.96.0.1, 10.0.20.100]` only — missing VIP | Phase 0 regens with `--apiserver-cert-extra-sans 10.0.20.99` after patching kubeadm-config. |

 ## Verification

@@ -91,12 +123,23 @@ kubectl get nodes -l node-role.kubernetes.io/control-plane=

 # etcd quorum healthy
 kubectl -n kube-system exec etcd-k8s-master -- etcdctl \
-    --endpoints=https://10.0.20.100:2379,https://10.0.20.X:2379,https://10.0.20.Y:2379 \
+    --endpoints=https://10.0.20.100:2379,https://10.0.20.110:2379,https://10.0.20.111:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
+   
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    endpoint health --cluster

+# Kubeconfig points at VIP
+kubectl --kubeconfig ~/.kube/config config view --minify -o jsonpath='{.clusters[0].cluster.server}'
+# Expect: https://10.0.20.99:6443
+
+# Worker kubelet.conf points at VIP
+for n in k8s-node{1,2,3,4}; do
+  ssh wizard@$n.viktorbarzin.lan "sudo grep -E '^\s+server:' /etc/kubernetes/kubelet.conf"
+done
+# Expect: server: https://10.0.20.99:6443 on every node
+
 # Failover test: cordon master-1, reboot it, observe kubectl still works through LB
 kubectl drain k8s-master --delete-emptydir-data --ignore-daemonsets
 ssh wizard@k8s-master.viktorbarzin.lan sudo reboot
@@ -110,7 +153,7 @@ kubectl -n k8s-upgrade create job --from=cronjob/k8s-version-check ha-validation

 - 2× VMs at 8 vCPU + 32GB RAM each = +64GB RAM on Proxmox host
 - ~+128GB disk usage (2× 64GB master disks)
-- ~2-4 hours of operator time end-to-end (VM provisioning + kubeadm join + LB config + smoke)
+- **~5-7 hours of operator time end-to-end** (cloud-init template bump + Phase 0 retrofit + LB + Phase 1.5 rbac refactor + 2× kubeadm join + Phase 4 cutover + Phase 4.5 etcd-backup fix + Phase 5 kured-gate + Phase 6 validation + Phase 7 chain extension). Phases 0–6 can land in one session; Phase 7 can be deferred a few days if needed.

 ## What's already in place from today's work

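The RAM figures in "Why deferred (2026-05-23)" reduce to simple arithmetic. A minimal sketch, assuming the measured values quoted in that section; the constant and function names are illustrative and not part of the repo:

```python
# Values quoted in the "Why deferred" section; names are illustrative.
PHYSICAL_GB = 267    # physical RAM on the PVE host
ALLOCATED_GB = 262   # RAM already allocated to VMs
NEW_MASTERS = 2      # k8s-master-2 and k8s-master-3


def commit_ratio(allocated_gb: float) -> float:
    """Fraction of physical RAM that is allocated to VMs."""
    return allocated_gb / PHYSICAL_GB


def projected_allocation(per_master_gb: int) -> int:
    """Total allocation after adding the new masters at the given size."""
    return ALLOCATED_GB + NEW_MASTERS * per_master_gb


current = commit_ratio(ALLOCATED_GB)
planned = projected_allocation(32)
print(f"current commit: {current:.0%}")
print(f"planned allocation: {planned} GB ({commit_ratio(planned):.0%} of physical)")
```

Calling `projected_allocation` with a different per-master size is a quick way to sanity-check a revised sizing against the host before revisiting the plan.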
--- a/docs/plans/2026-05-21-ha-control-plane-plan.md
+++ b/docs/plans/2026-05-21-ha-control-plane-plan.md
@@ -0,0 +1,325 @@
+# HA Control Plane (3 masters) — Plan
+
+**Date**: 2026-05-21 (locked + revised 2026-05-22 after challenger pass)
+**Status**: Drafted, awaiting approval
+**Pairs with**: `2026-05-21-ha-control-plane-design.md`
+**Beads**: `code-n0ow`
+
+## Goal
+
+Migrate the single-master cluster to a 3-master HA control plane behind
+a pfSense HAProxy VIP (`10.0.20.99:6443`), enabling autonomous k8s
+upgrades without storm-cascade manual nursing.
+
+## Topology — before / after
+
+```
+Before                            After
+                                  ┌──────────────────────┐
+                                  │ pfSense HAProxy      │
+                                  │  10.0.20.99:6443     │
+                                  │  TCP, /readyz health │
+                                  └──┬───────┬───────┬───┘
+┌───────────────┐                    │       │       │
+│ k8s-master    │                    ▼       ▼       ▼
+│ 10.0.20.100   │     ┌──────────────┐ ┌────────────┐ ┌────────────┐
+│ apiserver+etcd│     │k8s-master    │ │k8s-master-2│ │k8s-master-3│
+│ + workers join│     │10.0.20.100   │ │10.0.20.110 │ │10.0.20.111 │
+│ directly      │     │(VMID 200)    │ │(VMID 205)  │ │(VMID 206)  │
+└───────────────┘     │apiserver+etcd│ │apiserver+e.│ │apiserver+e.│
+                      └──────────────┘ └────────────┘ └────────────┘
+                          ▲                ▲                ▲
+                          └────────────────┼────────────────┘
+                                           │
+                            etcd quorum (3 members, tolerates 1 down)
+```
+
+## Research decisions (locked — see design doc for full table)
+
+| Decision | Value |
+|---|---|
+| LB strategy | pfSense HAProxy, TCP mode, HTTPS `/readyz` health check |
+| VIP | `10.0.20.99` (FQDN `k8s-apiserver.viktorbarzin.lan`) |
+| New master IPs | `10.0.20.110`, `10.0.20.111` |
+| New master VMIDs | `205`, `206` |
+| Master sizing | 8 vCPU, 32 GB RAM, 64 GB disk (matches existing) |
+| VM provisioning | cloud-init via `create-template-vm` (template bumped v1.32 → v1.34 first; `k8s_join_command = ""` for masters) |
+| etcd | stacked (kubeadm-managed) |
+| Multi-master apiserver flags | rbac stack refactored to loop over master list (Phase 1.5) |
+| controlPlaneEndpoint + cert SAN retrofit | Phase 0, before any new master joins |
+| k8s-version-upgrade chain | extended to multi-master in Phase 7 |
+
+## Callers / blast radius
+
+| Surface | Path | Phase |
+|---|---|---|
+| Worker `/etc/kubernetes/kubelet.conf` × 4 | nodes 1-4 | 4.2 |
+| `/home/wizard/code/infra/config` (root kubeconfig used by every `tg apply`) | repo root | 4.1 |
+| `config.tfvars:115` (`kubernetes IN A 10.0.20.100` zone-file record) | repo root | 1.1 (delete) |
+| `config.tfvars:231` (`k8s_join_command` for cloud-init template) | repo root | 4.1 (flip to VIP) |
+| `stacks/rbac/modules/rbac/{apiserver-oidc,audit-policy,etcd-tuning}.tf` | `var.k8s_master_host` defaults | 1.5 (refactor to list) |
+| `.woodpecker/{default,drift-detection,renew-tls,provision-user}.yml` (4 files × 2 refs each — kubeconfig `server:` AND `curl` lines) | repo root | 4.1 |
+| `stacks/k8s-portal/.../files/src/routes/{download,setup/script}/+server.ts` (`CLUSTER_SERVER` const used to generate user kubeconfigs) | k8s-portal module | 4.1 |
+| `stacks/k8s-version-upgrade/scripts/upgrade-step.sh` (hard-coded `k8s-master` in phase_master) | stack | 7.1 |
+| `stacks/infra-maintenance/.../main.tf` lines 98 + 218 (`node_name = "k8s-master"` on etcd-backup + defrag-etcd CronJobs) | stack | 4.5 |
+| `kured-sentinel-gate` bash loop | `stacks/kured/main.tf` | 5.1 |
+| `docs/architecture/compute.md`, `.claude/skills/uptime-kuma/SKILL.md`, runbooks | docs | 6.3 |
+| **No-op surfaces** (confirmed clean): Vault (uses `kubernetes.default.svc`), Cloudflared (no apiserver tunnel), in-cluster `kubernetes.default.svc` / `10.96.0.1`, etcd-backup CORRECTNESS (snapshot is cluster-wide), kubeadm-managed etcd peer certs (auto-generated on join) | | — |
+
+## Edge cases
+
+- **Phase 0 apiserver restart (~30s)** = same blast radius as today's k8s upgrade (tigera/cnpg/gpu-operator briefly crash). The LB doesn't help here because the new cert isn't yet trusted by clients. Accept the brief outage. Schedule during a low-activity window.
+- **`kubeadm-certs` secret TTL = 2h** (NOT 24h as initially stated). Phase 2 + 3 must complete within the window, or re-upload between them.
+- **pfSense haproxy bootstrap = reset-to-declared-state** on each run (lines 155-158 of the script). Adding master-2 means the apiserver pool is briefly torn down + rebuilt. TCP frontends bounce. Long-poll connections from kubelets break + reconnect. Expect ~2-5s of "kubectl: unable to connect" during pool rewrites.
+- **TCP health check is too lax** for apiserver (listener up ≠ ready). Phase 1 uses HTTPS `GET /readyz` with `verify none` — catches NotReady (etcd unreachable, controller-manager flapping).
+- **Worker kubelet.conf flip**: kubelet TLS bootstrap re-auths against new endpoint on restart. Expect 5-10s NotReady per node during the Phase 4.2 loop.
+- **VIP cannot be the existing master IP**: confirmed `.99` is free (no grep matches, no MetalLB pool conflict — pool is .200-.220).
+- **pfSense reboot windows**: pre-Phase-4 OK (clients still on direct IP), post-Phase-4 breaks everything. Don't migrate near a pfSense maintenance window.
+
+## Phased plan
+
+Each phase before Phase 4 can be undone on its own. From Phase 4 onward clients point at the VIP, so reversal follows the "After Phase 4" steps in the rollback plan.
+
+### Phase 0 — Retrofit existing cluster (~30 min, ~30s of apiserver outage)
+
+- [ ] **0.1 Pre-flight**
+  - [ ] Cluster healthy: `kubectl get nodes` (all Ready), `kubectl get pods -A --field-selector=status.phase!=Running,status.phase!=Succeeded` empty
+  - [ ] Recent etcd backup valid: `ls -lh /srv/nfs/etcd-backup/ | tail -5`
+  - [ ] Proxmox VM snapshot of `k8s-master`: `ssh root@192.168.1.127 qm snapshot 200 pre-ha-retrofit`
+  - [ ] IPs free: `for ip in 99 110 111; do ping -c1 -W1 10.0.20.$ip && echo "BUSY $ip" || echo "free $ip"; done`
+- [ ] **0.2 Patch `kubeadm-config` ConfigMap via kubeadm (NOT kubectl apply)**
+  - [ ] On master: `sudo kubeadm config print init-defaults --component-configs=KubeletConfiguration > /tmp/kubeadm-new.yaml`
+  - [ ] Hand-edit /tmp/kubeadm-new.yaml: take the existing CM as base, add `controlPlaneEndpoint: 10.0.20.99:6443` under ClusterConfiguration, add `apiServer.certSANs: [10.0.20.99, k8s-apiserver.viktorbarzin.lan]`
+  - [ ] Apply via kubeadm (kubeadm-owned, future `kubeadm upgrade apply` won't overwrite): `sudo kubeadm init phase upload-config kubeadm --config /tmp/kubeadm-new.yaml`
+  - [ ] Verify: `kubectl -n kube-system get cm kubeadm-config -o yaml | grep -E 'controlPlaneEndpoint|certSANs'`
+- [ ] **0.3 Regen apiserver cert**
+  - [ ] On master: `sudo mkdir -p /tmp/apiserver-backup && sudo mv /etc/kubernetes/pki/apiserver.{crt,key} /tmp/apiserver-backup/`
+  - [ ] `sudo kubeadm init phase certs apiserver` (reads patched kubeadm-config)
+  - [ ] Verify: `sudo openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text | grep -A2 'Subject Alternative'` — expect `IP Address:10.0.20.99` PLUS existing SANs (kubeadm adds, doesn't replace)
+- [ ] **0.4 Restart kube-apiserver static pod**
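+  - [ ] On master: `sudo mv /etc/kubernetes/manifests/kube-apiserver.yaml /tmp/ && sleep 20 && sudo mv /tmp/kube-apiserver.yaml /etc/kubernetes/manifests/` (deleting the mirror pod with `kubectl delete pod` only recreates the API object; the kubelet restarts the container when the manifest is removed and re-added)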
+  - [ ] Wait: `kubectl wait --for=condition=Ready pod/kube-apiserver-k8s-master -n kube-system --timeout=180s`
+  - [ ] Verify: `kubectl get nodes` works (apiserver alive on direct IP)
+- [ ] **0.5 Panic-mode rollback procedure (DOCUMENTED ONLY — only run if 0.4 fails)**
+  - [ ] `sudo cp /tmp/apiserver-backup/apiserver.{crt,key} /etc/kubernetes/pki/`
+  - [ ] `sudo systemctl restart kubelet` (forces static pod re-read)
+  - [ ] Wait for apiserver Ready; revert kubeadm-config edits via the file backup
+- [ ] **0.6 Verify operators recovered from brief outage**
+  - [ ] `kubectl get pods -n calico-system -l app=tigera-operator -o wide` — Running, restart count incremented by 1 max
+  - [ ] `kubectl get pods -n gpu-operator -o wide` — same
+  - [ ] `kubectl get pods -n cnpg-system -o wide` — same
+
+### Phase 1 — pfSense HAProxy + DNS (~30 min)
+
+- [ ] **1.1 Reserve VIP `10.0.20.99` + DNS**
+  - [ ] Add Virtual IP on pfSense (Firewall → Virtual IPs → IP Alias on VLAN20, `10.0.20.99/24`)
+  - [ ] Add `k8s-apiserver-vip → 10.0.20.99` host alias (Firewall → Aliases → Hosts)
+  - [ ] phpIPAM: register `10.0.20.99` under section "K8s cluster"
+  - [ ] Add DNS A record `k8s-apiserver IN A 10.0.20.99` to `config.tfvars` (and **delete** stale `kubernetes IN A 10.0.20.100` on line 115)
+  - [ ] `scripts/tg apply -target=module.technitium` — confirm zone reload
+- [ ] **1.2 Extend `infra/scripts/pfsense-haproxy-bootstrap.php` for apiserver pool with HTTPS health check**
+  - [ ] Add `build_pool_https()` helper variant (or add `$use_https_readyz` param to existing `build_pool()`) that emits `check_type='HTTP'`, `monitor_uri='/readyz'`, `httpchk_method='GET'`, `ssl='yes'`, `sslverify='no'`
+  - [ ] Add `'apiserver_nodes'` to `$POOL_NAMES`; `'apiserver_proxy_6443'` to `$FRONTEND_NAMES`
+  - [ ] `build_pool_https('apiserver_nodes', '6443', [['k8s-master', '10.0.20.100']])`
+  - [ ] `build_frontend('apiserver_proxy_6443', 'K8s apiserver VIP', '10.0.20.99', '6443', 'apiserver_nodes')`
+- [ ] **1.3 Deploy + validate**
+  - [ ] `scp infra/scripts/pfsense-haproxy-bootstrap.php admin@10.0.20.1:/tmp/ && ssh admin@10.0.20.1 'php /tmp/pfsense-haproxy-bootstrap.php'`
+  - [ ] `ssh admin@10.0.20.1 'sockstat -l | grep 10.0.20.99:6443'` — expect haproxy listening
+  - [ ] `ssh admin@10.0.20.1 "echo 'show servers state' | socat /tmp/haproxy.socket stdio" | grep apiserver` — backend UP (op_state=2)
+- [ ] **1.4 Smoke via VIP**
+  - [ ] From devvm: `curl --cacert /etc/kubernetes/pki/ca.crt https://10.0.20.99:6443/readyz` — expect `ok`
+  - [ ] Build a transient kubeconfig pointing at VIP, run `kubectl get nodes` — succeeds
+  - [ ] **If TLS validation fails: STOP. The Phase 0 cert regen didn't include the VIP.** Roll back Phase 1 and retry Phase 0
+
+### Phase 1.5 — Refactor rbac stack for multi-master (~45 min)
+
+- [ ] **1.5.1 Refactor `stacks/rbac/modules/rbac/{apiserver-oidc,audit-policy,etcd-tuning}.tf`**
+  - [ ] Replace `var.k8s_master_host = "10.0.20.100"` with `var.k8s_master_hosts = list(string)` (default `["10.0.20.100"]`)
+  - [ ] Wrap each `null_resource` / `provisioner "remote-exec"` block in `for_each = toset(var.k8s_master_hosts)` so the same sed runs on every master
+  - [ ] In `stacks/rbac/main.tf` set `k8s_master_hosts = ["10.0.20.100"]` (still single-master in this phase — variable is forward-looking, no behaviour change yet)
+- [ ] **1.5.2 `scripts/tg apply` rbac stack** — confirm zero diff against today (no-op refactor)
+- [ ] **1.5.3 Verify** — sanity: `ssh wizard@k8s-master 'sudo grep oidc-issuer-url /etc/kubernetes/manifests/kube-apiserver.yaml | wc -l'` — expect `1`. Cluster healthy.
+
+### Phase 2 — Cloud-init template bump + master-2 (~75 min)
+
+- [ ] **2.0 Bump cloud-init template (one-time)**
+  - [ ] Edit `infra/modules/create-template-vm/cloud_init.yaml`:
+    - line 49: apt source `pkgs.k8s.io/core:/stable:/v1.32/deb/` → `pkgs.k8s.io/core:/stable:/v1.34/deb/`
+    - line 135: wrap `${k8s_join_command}` in a conditional via cloud-init `if:` template logic, or simpler: add `${k8s_join_command_or_noop}` and let the module pass `""` for masters and the real worker join command for workers (default)
+  - [ ] Update `infra/modules/create-template-vm/main.tf` to add `variable "k8s_join_command" { default = "" }` and a conditional in the templatefile to skip the runcmd line when empty
+  - [ ] Rebuild the template: `scripts/tg apply -target=module.k8s_template` (or whatever the existing template-build target name is in `stacks/infra/main.tf`)
+  - [ ] Verify new template registered in Proxmox at the same template_id
+- [ ] **2.1 Add master-2 VM to Terraform**
+  - [ ] In `stacks/infra/main.tf`: add `module "k8s-master-2"` using `create-vm` from the (now-v1.34) k8s template, with master sizing (8 vCPU / 32GB / 64GB), VMID 205, IP `10.0.20.110`, unique MAC, `vmbr1/vlan 20`, `use_cloud_init = true`, and explicitly pass `k8s_join_command = ""` (so first-boot does NOT auto-join as worker)
+  - [ ] `scripts/tg apply -target=module.k8s-master-2`
+  - [ ] Verify VM booted: `ssh wizard@k8s-master-2.viktorbarzin.lan uname -a` (expect Ubuntu 26.04 LTS, kernel 7.0.x)
+- [ ] **2.2 Prep master-2 for kubeadm join**
+  - [ ] Confirm versions: `ssh wizard@k8s-master-2.viktorbarzin.lan 'kubeadm version; containerd --version'` — expect kubeadm v1.34.x, containerd 2.2.2+
+  - [ ] DNS resolves: `getent hosts k8s-master-2.viktorbarzin.lan`
+- [ ] **2.3 Upload certs on existing master**
+  - [ ] `sudo kubeadm init phase upload-certs --upload-certs` → records `--certificate-key <KEY>`
+  - [ ] **2h TTL** — Phase 2 + 3 must complete within window or re-upload
+- [ ] **2.4 Generate join command**
+  - [ ] `sudo kubeadm token create --print-join-command` → `kubeadm join 10.0.20.99:6443 --token <T> --discovery-token-ca-cert-hash sha256:<H>`
+  - [ ] Append `--control-plane --certificate-key <KEY>`
+- [ ] **2.5 Run join on master-2**
+  - [ ] `ssh wizard@k8s-master-2.viktorbarzin.lan` → run sudo join command from 2.4
+  - [ ] Wait for "This node has joined the cluster"
+- [ ] **2.6 Update rbac stack to include master-2 (propagates OIDC/audit/etcd tuning to it)**
+  - [ ] Edit `stacks/rbac/main.tf`: `k8s_master_hosts = ["10.0.20.100", "10.0.20.110"]`
+  - [ ] `scripts/tg apply` rbac stack
+  - [ ] Verify: `ssh wizard@k8s-master-2 'sudo grep -c oidc-issuer-url /etc/kubernetes/manifests/kube-apiserver.yaml'` — expect `1`
+- [ ] **2.7 Smoke**
+  - [ ] `kubectl get nodes` — 6 nodes, master-2 Ready control-plane
+  - [ ] `kubectl -n kube-system get pods -o wide | grep k8s-master-2` — 4 static pods Running
+  - [ ] etcd member list shows 2 members
+  - [ ] `kubectl --server=https://10.0.20.110:6443 get nodes` — direct probe works
+- [ ] **2.8 Add master-2 to LB pool**
+  - [ ] Edit `pfsense-haproxy-bootstrap.php`: pool now `[['k8s-master', '10.0.20.100'], ['k8s-master-2', '10.0.20.110']]`
+  - [ ] Deploy + verify both backends UP
+
+### Phase 3 — master-3 (~45 min) — same pattern as Phase 2
+
+- [ ] **3.1 Add `module.k8s-master-3` to Terraform** (VMID 206, IP `10.0.20.111`, same template, `k8s_join_command = ""`)
+- [ ] **3.2 Prep verify**
+- [ ] **3.3 Re-upload certs if >2h since Phase 2.3, refresh `--certificate-key`**
+- [ ] **3.4 Generate fresh join command**
+- [ ] **3.5 Run join on master-3**
+- [ ] **3.6 Update rbac stack: `k8s_master_hosts = ["10.0.20.100", "10.0.20.110", "10.0.20.111"]`, apply, verify master-3 has OIDC flag**
+- [ ] **3.7 Smoke (7 nodes, 3 control-plane, etcd quorum 3/3)**
+- [ ] **3.8 Add master-3 to LB pool — all three backends UP**
+
+### Phase 4 — Cut over clients and workers to VIP (~45 min)
+
+- [ ] **4.1 Update in-repo kubeconfig consumers (single commit)**
+  - [ ] `/home/wizard/code/infra/config` — flip `server:` to `https://10.0.20.99:6443`
+  - [ ] `config.tfvars:231` — `k8s_join_command` to `kubeadm join 10.0.20.99:6443 ...`
+  - [ ] `stacks/rbac/modules/rbac/apiserver-oidc.tf` — variable `default = "10.0.20.99"` (or whatever the multi-master refactor needs)
+  - [ ] `.woodpecker/default.yml` — flip server: AND curl URL
+  - [ ] `.woodpecker/drift-detection.yml` — flip server: AND curl URL
+  - [ ] `.woodpecker/renew-tls.yml` — flip curl URL (line 18)
+  - [ ] `.woodpecker/provision-user.yml` — flip curl URL (line 41)
+  - [ ] `stacks/k8s-portal/modules/k8s-portal/files/src/routes/download/+server.ts` — `CLUSTER_SERVER` const
+  - [ ] `stacks/k8s-portal/modules/k8s-portal/files/src/routes/setup/script/+server.ts` — same
+  - [ ] Final sweep: `cd /home/wizard/code/infra && grep -rn '10.0.20.100:6443' --include='*.tf' --include='*.yml' --include='*.yaml' --include='*.ts' --include='*.php' --include='*.sh'` — handle anything remaining
+  - [ ] `scripts/tg apply` for rbac + k8s-portal (and any other stacks touched)
+  - [ ] Commit + push (single conventional commit referencing `code-n0ow`)
+- [ ] **4.2 Worker `kubelet.conf` flip (one at a time, with 5-10s expected NotReady)**
+  ```bash
+  for n in k8s-node1 k8s-node2 k8s-node3 k8s-node4; do
+    echo "=== $n ==="
+    ssh wizard@$n.viktorbarzin.lan "sudo sed -i.bak 's|server: https://10.0.20.100:6443|server: https://10.0.20.99:6443|' /etc/kubernetes/kubelet.conf"
+    ssh wizard@$n.viktorbarzin.lan "sudo systemctl restart kubelet"
+    kubectl wait --for=condition=Ready node/$n --timeout=180s
+    echo "$n Ready"
+    sleep 15
+  done
+  ```
+- [ ] **4.3 Existing master's `kubelet.conf`** — same sed + restart on `k8s-master`
+- [ ] **4.4 Verify master-2 + master-3 kubelet.conf already at VIP** (the Phase 2.5/3.5 `kubeadm join` targeted the VIP via controlPlaneEndpoint; cloud-init ran no join on masters)
+- [ ] **4.5 Verify everything**
+  - [ ] `kubectl get nodes` — all 7 Ready
+  - [ ] `kubectl --kubeconfig ~/.kube/config config view --minify -o jsonpath='{.clusters[0].cluster.server}'` → `https://10.0.20.99:6443`
+  - [ ] All-node loop: `for n in k8s-{master,node1,node2,node3,node4,master-2,master-3}; do ssh wizard@$n.viktorbarzin.lan "sudo grep server: /etc/kubernetes/kubelet.conf"; done` (every node should show the VIP)
+  - [ ] Trigger a no-op Woodpecker pipeline (commit a typo fix in a runbook) — verify the kubeconfig path through the new VIP
+
+### Phase 4.5 — Fix etcd-backup CronJob node pinning (~15 min)
+
+- [ ] **4.5.1 Edit `stacks/infra-maintenance/modules/infra-maintenance/main.tf`**
+  - [ ] backup-etcd (line 98): replace `node_name = "k8s-master"` with `node_selector = { "node-role.kubernetes.io/control-plane" = "" }` + the corresponding `toleration` block
+  - [ ] defrag-etcd (line 218): same change
+- [ ] **4.5.2 `scripts/tg apply` infra-maintenance**
+- [ ] **4.5.3 Verify backup runs** — trigger a manual job-from-cronjob, confirm it lands on one of the 3 masters and produces a valid snapshot
+
+### Phase 5 — kured-sentinel-gate quorum check (~15 min)
+
+- [ ] **5.1 Edit `infra/stacks/kured/main.tf`** (insert into the bash heredoc in the sentinel-gate ConfigMap, between all-nodes-Ready and calico-Ready checks)
+  ```bash
+  # Check 3b: control-plane quorum safety (HA invariant)
+  CP_READY=$(kubectl get nodes -l node-role.kubernetes.io/control-plane= --no-headers | grep ' Ready ' | wc -l | tr -d ' ')
+  if [ "$CP_READY" -lt 2 ]; then
+    echo "  BLOCKED: Only $CP_READY control-plane node(s) Ready (need ≥2 for HA)"
+    rm -f /host/var-run/gated-reboot-required
+    sleep 300
+    continue
+  fi
+  echo "  Control-plane quorum safe ($CP_READY Ready)"
+  ```
+- [ ] **5.2 `scripts/tg apply` kured**
+- [ ] **5.3 Verify**
+  - [ ] `kubectl -n kured logs ds/kured-sentinel-gate | tail -50` — expect "Control-plane quorum safe (3 Ready)" line
+  - [ ] Negative test: cordon `k8s-master-2`, wait for the gate to re-evaluate, confirm block message. Restore.
+
+### Phase 6 — E2E validation (~30 min)
+
+- [ ] **6.1 Failover test**
+  - [ ] `kubectl drain k8s-master --delete-emptydir-data --ignore-daemonsets`
+  - [ ] `ssh wizard@k8s-master.viktorbarzin.lan sudo reboot`
+  - [ ] During the 50-90s reboot: tight loop `while true; do kubectl get nodes -o name | wc -l; sleep 2; done` from devvm — line count never drops to 0 (LB transparent)
+  - [ ] After boot: `kubectl uncordon k8s-master`, verify apiserver static pod re-registers in LB pool (op_state=2)
+- [ ] **6.2 All-masters apiserver flag parity**
+  - [ ] `for h in k8s-master k8s-master-2 k8s-master-3; do echo "=== $h ==="; ssh wizard@$h.viktorbarzin.lan 'sudo grep -E "oidc-issuer-url|audit-policy|auto-compaction-retention|snapshot-count" /etc/kubernetes/manifests/{kube-apiserver,etcd}.yaml | sort'; done`
+  - [ ] Expect identical flag set across all 3 masters
+- [ ] **6.3 Update documentation**
+  - [ ] Add `docs/architecture/control-plane.md` — HA topology, etcd member list, LB config location
+  - [ ] Update `.claude/reference/proxmox-inventory.md` — add VMIDs 205, 206
+  - [ ] Add `docs/runbooks/control-plane-add-remove-master.md`
+  - [ ] Update `docs/runbooks/restore-etcd.md` to cover 3-member quorum restore (was single-master only)
+  - [ ] Cross-link `docs/runbooks/mailserver-pfsense-haproxy.md` with the new apiserver_proxy_6443 pool
+
+### Phase 7 — Extend k8s-version-upgrade chain to multi-master (~60 min)
+
+- [ ] **7.1 Edit `stacks/k8s-version-upgrade/scripts/upgrade-step.sh`**
+  - [ ] phase_master: discover masters dynamically — `MASTERS=$($KUBECTL get nodes -l node-role.kubernetes.io/control-plane= -o name | sed 's|node/||')`
+  - [ ] Wrap drain → `update_k8s.sh` → uncordon → wait-ready in a `for m in $MASTERS; do ... done` loop
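+  - [ ] Between masters, quorum check: `READY=$($KUBECTL get nodes -l node-role.kubernetes.io/control-plane= --no-headers | { grep ' Ready ' || true; } | wc -l); [ "$READY" -ge 2 ] || { slack "ABORT quorum lost"; exit 1; }` (the `|| true` keeps a zero-match `grep` from aborting the script under `set -o pipefail` before the Slack message is sent)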
+  - [ ] Update line 9 + 17 comment block to reflect multi-master phase
+  - [ ] Update line 326-340 containerd-bump section to loop over masters
+- [ ] **7.2 Edit `phase_preflight` and the master phase pin**
+  - [ ] Line 209-210 (scheduling_block): allow any control-plane node to be the target
+  - [ ] Line 285 (`kubeadm upgrade plan` check): run against the first master in the list, not specifically `k8s-master`
+- [ ] **7.3 `scripts/tg apply` k8s-version-upgrade**
+- [ ] **7.4 Dry-run test**
+  - [ ] `kubectl -n k8s-upgrade create job --from=cronjob/k8s-version-check ha-validation-$(date +%s)` (no actual upgrade pending — chain should noop the upgrade phase but exercise the discovery loop)
+  - [ ] Verify logs show 3 masters discovered in correct order
+- [ ] **7.5 (Real test on next patch release)** — when 1.34.8 ships:
+  - [ ] Watch the chain execute drain → upgrade → uncordon across all 3 masters in turn
+  - [ ] Confirm no manual intervention needed
+
+### Phase 8 — Close out
+
+- [ ] **8.1 Update beads** — `bd close code-n0ow` once all 6 acceptance criteria met (see below)
+
+## Rollback plan
+
+### Before Phase 4 (no clients flipped)
+
+- **Phase 0**: restore apiserver cert/key from `/tmp/apiserver-backup/`, edit kubeadm-config back, restart kubelet on master.
+- **Phase 1**: remove `apiserver_proxy_6443` + `apiserver_nodes` from `pfsense-haproxy-bootstrap.php`, re-run; revert DNS A record in config.tfvars.
+- **Phase 1.5**: revert rbac stack to single `k8s_master_host` var; apply.
+- **Phase 2/3**: on failed master `sudo kubeadm reset --force`; from a surviving master `etcdctl member remove <id>`; `tg destroy -target=module.k8s-master-N`.
+
+### After Phase 4 (clients flipped)
+
+- Revert all the Phase 4.1 file changes (single revert commit).
+- Reverse the kubelet.conf sed loop (VIP → direct IP) on all 7 nodes.
+- Phase 0 controlPlaneEndpoint can stay — harmless even on full rollback.
+
+### Worst case (etcd corruption / multi-master split-brain)
+
+- Restore from latest etcd snapshot via `etcdctl snapshot restore` to a single master.
+- Rebuild master VM from the Proxmox snapshot taken in Phase 0.1.
+- Cluster back to single-master.
+
+## Acceptance criteria (beads `code-n0ow`)
+
+- [x] 1. Design doc + plan doc written (this commit)
+- [ ] 2. Plan approved by user
+- [ ] 3. 3 masters online, etcd quorum healthy, apiserver LB working
+- [ ] 4. k8s upgrade chain runs end-to-end across **all 3 masters** without manual intervention (Phase 7)
+- [ ] 5. kured-sentinel-gate respects quorum (Phase 5)
+- [ ] 6. etcd backup runs from any control-plane node (Phase 4.5)
+
+## Open questions
+
+None — all locked via 2026-05-22 decision pass + challenger amendment pass.
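Reviewer sketch (not part of the commit): the Phase 5 gate and the Phase 7.1 between-masters check in the plan above both count Ready control-plane nodes with `grep ' Ready ' | wc -l`. One possible alternative, assuming the default `kubectl get nodes --no-headers` column layout (NAME, then STATUS), compares the STATUS column exactly. The function names `count_ready_nodes` and `quorum_safe` are hypothetical and do not exist in the repo.

```bash
#!/usr/bin/env bash
# Hypothetical helpers. They only parse text, so kubectl is not needed to try them.

# Count lines on stdin whose second column (STATUS) is exactly "Ready".
# A cordoned node reports "Ready,SchedulingDisabled" and is not counted,
# which matches the plan's negative test (cordon one master, expect a block).
count_ready_nodes() {
  awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Succeed when at least MIN (default 2) nodes listed on stdin are Ready.
quorum_safe() {
  local min="${1:-2}" ready
  ready=$(count_ready_nodes)
  [ "$ready" -ge "$min" ]
}
```

A call site might look like `kubectl get nodes -l node-role.kubernetes.io/control-plane= --no-headers | quorum_safe 2 || echo "BLOCKED"`. Because `awk` exits 0 even when nothing matches, this form does not need the `|| true` guard that `grep` requires under `set -o pipefail`.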
--- a/scripts/update_k8s.sh
+++ b/scripts/update_k8s.sh
@@ -89,17 +89,26 @@ if [[ "$ROLE" == "master" ]]; then
    # sync latency post-master-reboot can exceed it). The etcd image IS
    # actually updated by then, so a 2nd attempt sees etcd already on
    # target and skips it. Up to 3 attempts with a 30s delay between.
+    # First attempt: full kubeadm upgrade (incl. etcd). On the static-pod-
+    # hash 5min-timeout failure, retry with --etcd-upgrade=false. The
+    # timeout happens reliably for patch upgrades where etcd's image
+    # doesn't change (kubeadm writes identical manifest → hash doesn't
+    # change → kubeadm waits forever for a change that will never come).
+    # Skipping the etcd phase on retry is safe IF etcd is already on the
+    # right version (which is the only case where this timeout fires).
    attempt=1
-    while ! sudo kubeadm upgrade apply "v$RELEASE" -y; do
+    extra_flags=""
+    while ! sudo kubeadm upgrade apply "v$RELEASE" -y $extra_flags; do
        if (( attempt >= 3 )); then
            echo "ERROR: kubeadm upgrade apply failed after 3 attempts" >&2
            exit 1
        fi
-        echo "==> kubeadm apply attempt $attempt failed (likely static-pod-hash 5m timeout). Sleeping 30s then retrying — the previous attempt's manifest writes usually take hold on the 2nd try."
+        echo "==> kubeadm apply attempt $attempt failed. Retrying with --etcd-upgrade=false (etcd image is unchanged for patch upgrades; kubeadm's static-pod-hash watch is the only thing failing)."
+        extra_flags="--etcd-upgrade=false"
        sleep 30
        attempt=$(( attempt + 1 ))
    done
-    echo "==> kubeadm upgrade apply succeeded on attempt $attempt"
+    echo "==> kubeadm upgrade apply succeeded on attempt $attempt (flags: '$extra_flags')"
 else
    echo "==> Worker path: kubeadm upgrade node"
    sudo kubeadm upgrade node
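Reviewer sketch (not part of the commit): the hunk above retries `kubeadm upgrade apply` and adds `--etcd-upgrade=false` from the second attempt on. The same pattern can be written as a reusable function that holds the extra flag in an array instead of an unquoted string. The name `retry_with_fallback` and its argument order are assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: retry_with_fallback MAX DELAY FALLBACK_FLAG CMD [ARGS...]
# Runs CMD up to MAX times. The first attempt runs CMD as given; every retry
# appends FALLBACK_FLAG. The array keeps the flag as a single argument, so no
# unquoted expansion is needed.
retry_with_fallback() {
  local max="$1" delay="$2" fallback="$3"
  shift 3
  local attempt=1
  local -a extra=()
  # ${extra[@]+...} expands to nothing while the array is empty, which also
  # stays safe under `set -u` on older bash releases.
  while ! "$@" ${extra[@]+"${extra[@]}"}; do
    if (( attempt >= max )); then
      echo "ERROR: command failed after $max attempts" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying with $fallback" >&2
    extra=("$fallback")
    sleep "$delay"
    attempt=$(( attempt + 1 ))
  done
  echo "succeeded on attempt $attempt"
}
```

With this helper the loop in the script could reduce to `retry_with_fallback 3 30 --etcd-upgrade=false sudo kubeadm upgrade apply "v$RELEASE" -y`, keeping the same three-attempt limit and 30s delay.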
--- a/secrets/fullchain.pem
+++ b/secrets/fullchain.pem
--- a/secrets/privkey.pem
+++ b/secrets/privkey.pem
--- a/stacks/blog/main.tf
+++ b/stacks/blog/main.tf
@@ -150,19 +150,6 @@ module "ingress" {
  }
 }

-module "ingress-www" {
-  source            = "../../modules/kubernetes/ingress_factory"
-  auth              = "none" # Anubis-fronted; PoW challenge gates bots, no Authentik
-  namespace         = kubernetes_namespace.website.metadata[0].name
-  name              = "blog-www"
-  service_name      = module.anubis.service_name
-  port              = module.anubis.service_port
-  extra_middlewares = ["traefik-x402@kubernetescrd"]
-  full_host         = "www.viktorbarzin.me"
-  tls_secret_name   = var.tls_secret_name
-  anti_ai_scraping  = false
-}
-
 # CI retrigger 2026-05-16T13:42:57+00:00 — bulk enrollment apply (pipeline #689 killed)
 # CI retrigger v2 2026-05-16T13:46:35+00:00

--- a/stacks/broker-sync/main.tf
+++ b/stacks/broker-sync/main.tf
@@ -271,10 +271,20 @@ resource "kubernetes_cron_job_v1" "imap" {
          }
          spec {
            restart_policy = "OnFailure"
+            # The broker image's user is uid=10001 gid=999, but the shared
+            # data PVC's /data root was created with gid=10001 (legacy from
+            # an earlier image build). Without fsGroup the pod can't write
+            # to the directory — sqlite3 can't create the journal next to
+            # sync.db, hits 'attempt to write a readonly database'.
+            # fsGroup=10001 adds the matching gid to the pod's supplemental
+            # groups so writes succeed.
+            security_context {
+              fs_group = 10001
+            }
            container {
              name    = "broker-sync"
              image   = local.broker_sync_image
-              command = ["broker-sync", "imap"]
+              command = ["broker-sync", "imap-ingest"]

              env {
                name  = "BROKER_SYNC_DATA_DIR"
--- a/stacks/claude-agent-service/main.tf
+++ b/stacks/claude-agent-service/main.tf
@@ -454,10 +454,10 @@ resource "kubernetes_deployment" "claude_agent" {
          resources {
            requests = {
              cpu    = "500m"
-              memory = "2Gi"
+              memory = "1Gi"
            }
            limits = {
-              memory = "4Gi"
+              memory = "2Gi"
            }
          }
        }
--- a/stacks/cloudflared/modules/cloudflared/cloudflare.tf
+++ b/stacks/cloudflared/modules/cloudflared/cloudflare.tf
@@ -145,16 +145,6 @@ resource "cloudflare_record" "mail_mx" {
 }


-resource "cloudflare_record" "mail_domainkey" {
-  content  = "\"v=DKIM1; k=rsa; p=MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQDIDLB8mhAHNqs1s6GeZMQHOxWweoNKIrqo5tqRM3yFilgfPUX34aTIXNZg9xAmlK+2S/xXO1ymt127ZGMjnoFKOEP8/uZ54iHTCnioHaPZWMfJ7o6TYIXjr+9ShKfoJxZLv7lHJ2wKQK3yOw4lg4cvja5nxQ6fNoGRwo+mQ/mgJQIDAQAB\""
-  name     = "s1._domainkey.viktorbarzin.me"
-  proxied  = false
-  ttl      = 1
-  type     = "TXT"
-  priority = 1
-  zone_id  = var.cloudflare_zone_id
-}
-
 resource "cloudflare_record" "mail_spf" {
  # Brevo replaced Mailgun as the outbound relay on 2026-04-12 (see docs/architecture/mailserver.md).
  # Soft-fail (~all) is intentional during cutover — revisit once relay delivery is stable.
--- a/stacks/cnpg/modules/cnpg/main.tf
+++ b/stacks/cnpg/modules/cnpg/main.tf
@@ -47,6 +47,16 @@ resource "helm_release" "cnpg" {
        memory = "256Mi"
      }
    }
+
+    # Tune webhook-cert renewal threshold. CNPG default is 7 days remaining,
+    # which leaves no buffer when the cluster-health check (#22) flags
+    # certs at <30d. Bump to 30 days so the operator rotates well before
+    # external monitoring notices. Cert lifetime stays at chart default 90d.
+    config = {
+      data = {
+        EXPIRING_CHECK_THRESHOLD = "30"
+      }
+    }
  })]
 }

--- a/stacks/crowdsec/modules/crowdsec/values.yaml
+++ b/stacks/crowdsec/modules/crowdsec/values.yaml
@@ -5,7 +5,7 @@ agent:
  resources:
    requests:
      cpu: 25m
-      memory: 64Mi
+      memory: 128Mi
    limits:
      memory: 512Mi
  priorityClassName: "tier-1-cluster"
--- a/stacks/k8s-version-upgrade/main.tf
+++ b/stacks/k8s-version-upgrade/main.tf
@@ -172,11 +172,22 @@ resource "kubernetes_cluster_role" "k8s_upgrade_job" {
  # --ignore-daemonsets` can classify each pod's owner. Without daemonsets
  # GET permission, drain bails with "cannot delete daemonsets ... is
  # forbidden" for every daemonset-managed pod on the node. (2026-05-20)
+  #
+  # `patch` on deployments added 2026-05-23: phase_master scales tigera-operator
+  # to 0 before drain (operator crashloops during apiserver static-pod swaps,
+  # generates I/O storm that breaks kubeadm's 5-min watch) and back to 1
+  # after master is upgraded. Until HA control plane lands (beads code-n0ow),
+  # this is how we keep autonomous upgrades unblocked.
  rule {
    api_groups = ["apps"]
    resources  = ["daemonsets", "statefulsets", "replicasets", "deployments"]
    verbs      = ["get", "list"]
  }
+  rule {
+    api_groups = ["apps"]
+    resources  = ["deployments", "deployments/scale"]
+    verbs      = ["patch", "update"]
+  }
  # Chain dispatch — create the next Job; reconcile via apply on retry.
  # In `default` ns to also create the etcd-snapshot Job from cronjob/backup-etcd.
  rule {
@@ -359,11 +370,17 @@ resource "kubernetes_cron_job_v1" "k8s_version_check" {
                  exit 0
                fi

-                # 1. Detect running version
+                # 1. Detect running version — use the OLDEST kubelet across
+                # all nodes so partial chains (e.g. master upgraded but
+                # workers still pending) don't trick the chain into
+                # thinking the upgrade is complete. Was `.items[0]` (master
+                # only) which made the chain skip when workers were behind.
+                # Fixed 2026-05-23 after node4-only chain failure.
                RUNNING=$(/usr/local/bin/kubectl get nodes \
-                  -o jsonpath='{.items[0].status.nodeInfo.kubeletVersion}' | tr -d v)
+                  -o jsonpath='{range .items[*]}{.status.nodeInfo.kubeletVersion}{"\n"}{end}' \
+                  | tr -d v | sort -V | head -1)
                RUNNING_MINOR=$(echo "$RUNNING" | awk -F. '{print $1"."$2}')
-                echo "Running version: v$RUNNING (minor $RUNNING_MINOR)"
+                echo "Running version (oldest kubelet): v$RUNNING (minor $RUNNING_MINOR)"

                # 2. Latest patch within current minor (refresh master's apt cache)
                LATEST_PATCH=$($SSH wizard@k8s-master.viktorbarzin.lan \
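Reviewer sketch (not part of the commit): the hunk above picks the oldest kubelet version with `sort -V | head -1`. The function below isolates that step so it can be tried without a cluster. It assumes GNU `sort` (for `-V`) and input of one version per line; the names `oldest_version` and `minor_of` are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read one kubelet version per line (e.g. "v1.34.7")
# and print the oldest one without the leading "v".
# sort -V compares numeric components, so 1.34.10 sorts after 1.34.2;
# a plain lexical sort would place 1.34.10 first.
oldest_version() {
  sed 's/^v//' | sort -V | head -n 1
}

# Reduce a full version such as "1.34.7" to its minor series "1.34".
minor_of() {
  awk -F. '{ print $1 "." $2 }' <<<"$1"
}
```

Stripping only a leading `v` with `sed` is slightly narrower than `tr -d v`, which deletes every `v` in the string; for plain release versions the two give the same result.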
--- a/stacks/k8s-version-upgrade/scripts/upgrade-step.sh
+++ b/stacks/k8s-version-upgrade/scripts/upgrade-step.sh
@@ -94,18 +94,41 @@ push() {

 halt_on_alert_query() {
  local extra_ignore="${1:-}"
-  local regex='^(Watchdog|RebootRequired|KuredNodeWasNotDrained|InfoInhibitor'
-  [ -n "$extra_ignore" ] && regex="$regex|$extra_ignore"
-  regex="$regex)$"
+  # ALLOWLIST design (refactored 2026-05-23 from a denylist): halt only on
+  # alerts with severity=critical. Any warning/info-level alert is treated
+  # as informational and doesn't block the chain.
+  #
+  # Why this is the right model:
+  #   - The cluster has long-running warning-level alerts that are NOT
+  #     blockers for a k8s patch (e.g. GPU operator crashloop on the GPU
+  #     node, ingress latency spikes, IO-wait warnings).
+  #   - Maintaining a denylist of every "noisy" alert is a losing battle.
+  #   - Critical alerts are the only ones that should actually stop us
+  #     mid-chain (apiserver down, etcd down, node not ready, etc.).
+  #
+  # `extra_ignore` is now mostly historical — kept for backwards compat with
+  # `halt_on_alert_query "RecentNodeReboot|IngressTTFBCritical"`-style calls. With severity-based
+  # filtering, RecentNodeReboot (severity=info) is filtered automatically.
+  # We still build the regex for any critical alert the caller wants to
+  # explicitly ignore (e.g. a known-broken thing we're aware of).
+  local ignore_regex=""
+  [ -n "$extra_ignore" ] && ignore_regex="^($extra_ignore)\$"

-  # `grep -vE` returns 1 when nothing matches, which under `set -o pipefail`
-  # bubbles up and (via the caller's `alerts=$(...)`) aborts the whole script.
-  # Trailing `|| true` keeps a no-alerts-firing cluster from looking like a
-  # script error. Discovered 2026-05-19 when the chain wouldn't fire on a
-  # genuinely-clean cluster (every alert was Watchdog/RebootRequired/etc.).
-  curl -sf "$PROM/api/v1/alerts" \
-    | jq -r '.data.alerts[] | select(.state == "firing") | .labels.alertname' \
-    | { grep -vE "$regex" || true; } | sort -u
+  # `grep` returns 1 when nothing matches → under `set -o pipefail` that
+  # bubbles up and aborts the script via the caller's `alerts=$(...)`.
+  # Trailing `|| true` on each grep handles the no-matches case.
+  local critical_firing
+  critical_firing=$(curl -sf "$PROM/api/v1/alerts" \
+    | jq -r '.data.alerts[]
+              | select(.state == "firing" and .labels.severity == "critical")
+              | .labels.alertname' 2>/dev/null \
+    | sort -u || true)
+
+  if [ -n "$ignore_regex" ]; then
+    echo "$critical_firing" | { grep -vE "$ignore_regex" || true; }
+  else
+    echo "$critical_firing"
+  fi
 }

 wait_for_node_ready() {
@@ -257,7 +280,7 @@ phase_preflight() {
  # is set, often daily). Now skipped — check 3 is the single source of truth
  # for "is the cluster quiet enough to upgrade".
  local alerts
-  alerts=$(halt_on_alert_query RecentNodeReboot)
+  alerts=$(halt_on_alert_query "RecentNodeReboot|IngressTTFBCritical")
  if [ -n "$alerts" ]; then
    slack "ABORT preflight — firing alerts:\n$alerts"
    exit 1
@@ -357,15 +380,46 @@ phase_preflight() {
 }

 phase_master() {
+  # Idempotency: skip the whole phase if k8s-master is already on target.
+  # The chain can re-run after a partial failure (e.g. workers got cut
+  # short); without this short-circuit we re-drain and re-kubeadm an
+  # already-upgraded master for no reason. Added 2026-05-23.
+  local current_v
+  current_v=$($KUBECTL get node k8s-master -o jsonpath='{.status.nodeInfo.kubeletVersion}' 2>/dev/null | tr -d v)
+  if [ "$current_v" = "$TARGET_VERSION" ]; then
+    slack "k8s-master already on v$TARGET_VERSION (kubelet=$current_v) — skipping master phase"
+    echo "k8s-master already on v$TARGET_VERSION — skipping"
+    return 0
+  fi
+
  slack "Draining k8s-master"

  # Re-check halt-on-alert before drain. Always ignore RecentNodeReboot —
  # the chain itself causes node reboots, so this alert firing is expected
  # mid-chain (e.g. master was already upgraded+rebooted before this phase).
  local alerts
-  alerts=$(halt_on_alert_query RecentNodeReboot)
+  alerts=$(halt_on_alert_query "RecentNodeReboot|IngressTTFBCritical")
  [ -n "$alerts" ] && { slack "ABORT master — alerts firing pre-drain: $alerts"; exit 1; }

+  # Quiesce noisy operators that crashloop when apiserver briefly disappears
+  # during the static-pod manifest swaps. The crashloop generates a disk-I/O
+  # storm (~500 MB/s observed from tigera-operator alone) that slows the
+  # apiserver↔kubelet status sync past kubeadm's hardcoded 5-min watch on
+  # `kubernetes.io/config.hash`, causing kubeadm to roll back the upgrade.
+  #
+  # The data plane (calico-node DaemonSet, calico-typha, calico-kube-controllers)
+  # keeps running unchanged — only the OPERATOR (a config reconciler) goes away
+  # briefly. Restored at the end of the phase below.
+  #
+  # If the chain dies between quiesce and restore (e.g. kubeadm fails),
+  # manually restore with:
+  #   kubectl -n tigera-operator scale deploy tigera-operator --replicas=1
+  #
+  # Long-term fix: HA control plane (3 masters) so apiserver never goes down
+  # — see docs/plans/2026-05-21-ha-control-plane-{design,plan}.md (beads code-n0ow).
+  echo "Quiescing tigera-operator before master upgrade (it crashes on apiserver outage)"
+  $KUBECTL -n tigera-operator scale deploy tigera-operator --replicas=0 2>&1 || true
+
  drain_node k8s-master

  slack "Running update_k8s.sh on k8s-master (--role master --release $TARGET_VERSION)"
@@ -387,21 +441,37 @@ phase_master() {
    exit 1
  fi

-  alerts=$(halt_on_alert_query RecentNodeReboot)
+  alerts=$(halt_on_alert_query "RecentNodeReboot|IngressTTFBCritical")
  [ -n "$alerts" ] && { slack "ABORT master — alerts firing post-upgrade: $alerts"; exit 1; }

+  # Restore tigera-operator (quiesced before drain). It reconciles in seconds.
+  echo "Restoring tigera-operator"
+  $KUBECTL -n tigera-operator scale deploy tigera-operator --replicas=1 2>&1 || true
+
  slack "Master on v$TARGET_VERSION, control-plane Running. Dispatching worker chain."
 }

 phase_worker() {
  [ -z "$TARGET_NODE" ] && { echo "ERROR: worker phase requires TARGET_NODE"; exit 2; }
+
+  # Idempotency: skip if target node is already on target version. Same
+  # rationale as phase_master — chains re-running after partial completion
+  # shouldn't re-drain an already-upgraded worker. Added 2026-05-23.
+  local current_v
+  current_v=$($KUBECTL get node "$TARGET_NODE" -o jsonpath='{.status.nodeInfo.kubeletVersion}' 2>/dev/null | tr -d v)
+  if [ "$current_v" = "$TARGET_VERSION" ]; then
+    slack "$TARGET_NODE already on v$TARGET_VERSION (kubelet=$current_v) — skipping worker phase"
+    echo "$TARGET_NODE already on v$TARGET_VERSION — skipping"
+    return 0
+  fi
+
  slack "Draining $TARGET_NODE"

  # Halt-on-alert wait (up to 30 min). Ignore RecentNodeReboot — the chain
  # just rebooted a node, that's the cause and is expected.
  local attempt alerts
  for attempt in $(seq 1 30); do
-    alerts=$(halt_on_alert_query RecentNodeReboot)
+    alerts=$(halt_on_alert_query "RecentNodeReboot|IngressTTFBCritical")
    [ -z "$alerts" ] && break
    echo "Waiting for alerts to clear (attempt $attempt/30): $alerts"
    sleep 60
@@ -432,7 +502,7 @@ phase_worker() {
  # 10-min soak with halt-on-alert (RecentNodeReboot ignored — we know we restarted it)
  echo "Soaking $TARGET_NODE for 10 min..."
  for i in $(seq 1 10); do
-    alerts=$(halt_on_alert_query RecentNodeReboot)
+    alerts=$(halt_on_alert_query "RecentNodeReboot|IngressTTFBCritical")
    [ -n "$alerts" ] && { slack "ABORT $TARGET_NODE mid-soak — alerts: $alerts"; exit 1; }
    sleep 60
  done
@@ -458,7 +528,7 @@ phase_postflight() {
  # No alerts firing. Ignore RecentNodeReboot — by definition we just
  # rebooted every node; this alert clears naturally in <1h.
  local alerts
-  alerts=$(halt_on_alert_query RecentNodeReboot)
+  alerts=$(halt_on_alert_query "RecentNodeReboot|IngressTTFBCritical")
  [ -n "$alerts" ] && slack "Postflight WARN — alerts still firing (cluster on target, please check):\n$alerts"

  # Pod-ready ratio
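
Reviewer sketch (not part of the patch): every call site above now passes `"RecentNodeReboot|IngressTTFBCritical"` to `halt_on_alert_query`. The body of that helper is not shown in this diff, so the Python below only assumes that the argument is a regular expression of alert names to ignore and that anything else still firing blocks the chain. `unignored_alerts` is a hypothetical name used for illustration.

```python
import re


def unignored_alerts(firing, ignore_pattern):
    """Return firing alert names that do not fully match ignore_pattern.

    Assumption: the shell helper treats its argument as an alternation of
    alert names to ignore; this mirrors that idea, not its actual code.
    """
    ignore = re.compile(ignore_pattern)
    return [name for name in firing if not ignore.fullmatch(name)]


# Illustrative input: two expected mid-upgrade alerts plus one real problem.
firing = ["RecentNodeReboot", "IngressTTFBCritical", "NodeDown"]
blocking = unignored_alerts(firing, "RecentNodeReboot|IngressTTFBCritical")
print(blocking)  # only NodeDown should remain and abort the phase
```

Using a full match rather than a substring search keeps an ignore entry from accidentally hiding a longer alert name that merely contains it.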
--- a/stacks/kyverno/modules/kyverno/security-policies.tf
+++ b/stacks/kyverno/modules/kyverno/security-policies.tf
@@ -328,25 +328,35 @@ resource "kubectl_manifest" "policy_require_trusted_registries" {
                  "docker.n8n.io/*", "registry.gitlab.com/*",
                  # Private
                  "forgejo.viktorbarzin.me/*", "10.0.20.10*",
+                  # Legacy private registry (decommissioned 2026-05-07 per CLAUDE.md,
+                  # but council-complaints still references it; migrate to Forgejo).
+                  "registry.viktorbarzin.me/*",
                  # DockerHub library (bare image names without slash)
                  "alpine*", "busybox*", "kong*", "mysql*", "nginx*", "postgres*", "python*",
                  # DockerHub user repos (no registry prefix, has slash) —
-                  # enumerated from current cluster state.
-                  "actualbudget/*", "afadil/*", "binwiederhier/*", "bitnami/*",
+                  # enumerated from current cluster state. New entries added
+                  # 2026-05-22 after Enforce caught these as unallowlisted:
+                  # amruthpillai (resume), athomasson2 (ebook2audiobook),
+                  # netboxcommunity (netbox), nousresearch (hermes-agent),
+                  # opentripplanner (osm-routing), rhasspy (whisper/piper).
+                  "actualbudget/*", "afadil/*", "amruthpillai/*", "athomasson2/*",
+                  "binwiederhier/*", "bitnami/*",
                  "clickhouse/*", "cloudflare/*", "coturn/*", "crowdsecurity/*",
                  "curlimages/*", "deluan/*", "dgtlmoon/*", "dolthub/*",
                  "dpage/*", "dperson/*", "edoburu/*", "esanchezm/*",
                  "freikin/*", "freshrss/*", "hackmdio/*", "hashicorp/*",
                  "headscale/*", "jhonderson/*", "kebe/*", "library/*",
                  "lissy93/*", "louislam/*", "matrixdotorg/*", "mendhak/*",
-                  "mghee/*", "mindflavor/*", "mpepping/*", "netsampler/*",
-                  "nvidia/*", "onlyoffice/*", "openresty/*", "owntracks/*",
+                  "mghee/*", "mindflavor/*", "mpepping/*", "netboxcommunity/*",
+                  "netsampler/*", "nousresearch/*", "nvidia/*", "onlyoffice/*",
+                  "openresty/*", "opentripplanner/*", "owntracks/*",
                  "phpipam/*", "phpmyadmin/*", "privatebin/*", "prom/*",
-                  "prompve/*", "rancher/*", "roundcube/*", "sclevine/*",
+                  "prompve/*", "rancher/*", "rhasspy/*", "roundcube/*", "sclevine/*",
                  "shadowsocks/*", "shlinkio/*", "stirlingtools/*",
                  "technitium/*", "teddysun/*", "temporalio/*",
                  "typhonragewind/*", "tzahi12345/*", "vabene1111/*",
-                  "vaultwarden/*", "viktorbarzin/*", "viren070/*", "zelest/*",
+                  "vaultwarden/*", "viktorbarzin/*", "viren070/*",
+                  "woodpeckerci/*", "zelest/*",
                ])
              }]
            }
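
Reviewer sketch (not part of the patch): the allowlist above relies on `*` wildcards matched against the full image reference. The Python below approximates that with `fnmatch.fnmatchcase`. This is an assumption for illustration only: Kyverno uses its own wildcard matcher, and `fnmatch` additionally gives `[` and `]` special meaning. `ALLOWED` is a small subset of the real list, and the image names are made up.

```python
from fnmatch import fnmatchcase

# Subset of the patterns from the policy, for illustration only.
ALLOWED = [
    "forgejo.viktorbarzin.me/*",
    "registry.viktorbarzin.me/*",
    "alpine*",
    "rhasspy/*",
    "woodpeckerci/*",
]


def is_trusted(image, patterns=ALLOWED):
    """True if the image reference matches at least one allowlist pattern."""
    return any(fnmatchcase(image, pattern) for pattern in patterns)


for image in ("rhasspy/example:1", "alpine:3.20", "ghcr.io/example/app:1"):
    print(image, is_trusted(image))
```

The bare-name patterns such as `alpine*` exist because DockerHub library images carry no registry prefix or slash, so a `*/…` style pattern would never match them.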
--- a/stacks/llama-cpp/main.tf
+++ b/stacks/llama-cpp/main.tf
@@ -373,10 +373,22 @@ resource "kubernetes_deployment" "llama_swap" {
  lifecycle {
    ignore_changes = [
      spec[0].template[0].spec[0].dns_config, # KYVERNO_LIFECYCLE_V1
+      metadata[0].annotations["keel.sh/match-tag"],
      metadata[0].annotations["keel.sh/policy"],
      metadata[0].annotations["keel.sh/trigger"],
      metadata[0].annotations["keel.sh/pollSchedule"], # KYVERNO_LIFECYCLE_V2
      spec[0].template[0].spec[0].container[0].image, # KEEL_IGNORE_IMAGE
+      # KEEL_LIFECYCLE_V1 — stop the apply→keel fight: every keel digest
+      # update patches `keel.sh/update-time` on the pod template and
+      # `kubernetes.io/change-cause` + bumps the K8s rollout revision on
+      # the Deployment. Without these ignore_changes, every `tg apply`
+      # reverts those, forcing a rollout, which keel then re-patches on
+      # the next 1h poll → llama-swap was rolling several times a day
+      # (~10s model-load downtime each). Upstream :cuda nightly cadence
+      # still triggers a legitimate daily rollout.
+      metadata[0].annotations["kubernetes.io/change-cause"],
+      metadata[0].annotations["deployment.kubernetes.io/revision"],
+      spec[0].template[0].metadata[0].annotations["keel.sh/update-time"],
    ]
  }

--- a/stacks/mailserver/modules/mailserver/main.tf
+++ b/stacks/mailserver/modules/mailserver/main.tf
@@ -3,7 +3,7 @@ variable "tier" { type = string }
 variable "mailserver_accounts" {}
 variable "postfix_account_aliases" {}
 variable "opendkim_key" {}
-variable "sasl_passwd" {} # For sendgrid i.e relayhost
+variable "sasl_passwd" {} # SMTP relay (Brevo) SASL credentials
 variable "nfs_server" { type = string }
 # Build the virtual-alias map, dropping aliases where BOTH the source and
 # target are real mailboxes in var.mailserver_accounts (and are different).
@@ -83,7 +83,6 @@ resource "kubernetes_config_map" "mailserver_env_config" {
    POSTFIX_MESSAGE_SIZE_LIMIT             = 1024 * 1024 * 200 # 200 MB
    POSTFIX_REJECT_UNKNOWN_CLIENT_HOSTNAME = "1"
    # TLS_LEVEL                              = "intermediate"
-    # DEFAULT_RELAY_HOST = "[smtp.sendgrid.net]:587"
    DEFAULT_RELAY_HOST = "[smtp-relay.brevo.com]:587"
    SPOOF_PROTECTION   = "1"
    SSL_TYPE           = "manual"
--- a/stacks/mailserver/modules/mailserver/variables.tf
+++ b/stacks/mailserver/modules/mailserver/variables.tf
@@ -2,7 +2,6 @@
 # see defaults - https://github.com/docker-mailserver/docker-mailserver/blob/master/target/postfix/main.cf
 variable "postfix_cf" {
  default = <<EOT
-#relayhost = [smtp.sendgrid.net]:587
 relayhost = [smtp-relay.brevo.com]:587
 smtp_sasl_auth_enable = yes
 smtp_sasl_password_maps = hash:/etc/postfix/sasl/passwd
--- a/stacks/monitoring/modules/monitoring/loki.yaml
+++ b/stacks/monitoring/modules/monitoring/loki.yaml
@@ -70,7 +70,7 @@ singleBinary:
  resources:
    requests:
      cpu: 250m
-      memory: 2Gi
+      memory: 3Gi
    limits:
      memory: 4Gi

--- a/stacks/monitoring/modules/monitoring/prometheus_chart_values.tpl
+++ b/stacks/monitoring/modules/monitoring/prometheus_chart_values.tpl
@@ -84,12 +84,12 @@ alertmanager:
      - source_matchers:
          - alertname = NodeDown
        target_matchers:
-          - alertname =~ "NodeNotReady|NodeConditionBad|PodCrashLooping|ContainerOOMKilled|DeploymentReplicasMismatch|StatefulSetReplicasMismatch|DaemonSetMissingPods|ScrapeTargetDown|NodeLowFreeMemory|PostgreSQLDown|RedisDown|HeadscaleDown|HeadscaleReplicasMismatch|AuthentikDown|PoisonFountainDown|HackmdDown|PrivatebinDown|MailServerDown|EmailRoundtripFailing|EmailRoundtripStale|NodeExporterDown|DockerRegistryDown|HomeAssistantDown|HomeAssistantCriticalSensorUnavailable|CloudflaredDown|TechnitiumDNSDown|iDRACRedfishMetricsMissing|iDRACSNMPMetricsMissing|HomeAssistantMetricsMissing"
+          - alertname =~ "NodeNotReady|NodeConditionBad|PodCrashLooping|ContainerOOMKilled|DeploymentReplicasMismatch|StatefulSetReplicasMismatch|DaemonSetMissingPods|ScrapeTargetDown|NodeLowFreeMemory|PostgreSQLDown|RedisDown|HeadscaleDown|HeadscaleReplicasMismatch|AuthentikDown|PoisonFountainDown|HackmdDown|PrivatebinDown|MailServerDown|EmailRoundtripFailing|EmailRoundtripStale|ViktorBarzinApexDrift|ViktorBarzinApexProbeStale|NodeExporterDown|DockerRegistryDown|HomeAssistantDown|HomeAssistantCriticalSensorUnavailable|CloudflaredDown|TechnitiumDNSDown|iDRACRedfishMetricsMissing|iDRACSNMPMetricsMissing|HomeAssistantMetricsMissing"
      # NFS down causes mass pod failures and NFS-dependent service outages
      - source_matchers:
          - alertname = NFSServerUnresponsive
        target_matchers:
-          - alertname =~ "PodCrashLooping|ContainerOOMKilled|DeploymentReplicasMismatch|StatefulSetReplicasMismatch|DaemonSetMissingPods|ScrapeTargetDown|PostgreSQLDown|RedisDown|AuthentikDown|PoisonFountainDown|HackmdDown|PrivatebinDown|MailServerDown|EmailRoundtripFailing|EmailRoundtripStale|HomeAssistantDown|HomeAssistantCriticalSensorUnavailable"
+          - alertname =~ "PodCrashLooping|ContainerOOMKilled|DeploymentReplicasMismatch|StatefulSetReplicasMismatch|DaemonSetMissingPods|ScrapeTargetDown|PostgreSQLDown|RedisDown|AuthentikDown|PoisonFountainDown|HackmdDown|PrivatebinDown|MailServerDown|EmailRoundtripFailing|EmailRoundtripStale|ViktorBarzinApexDrift|ViktorBarzinApexProbeStale|HomeAssistantDown|HomeAssistantCriticalSensorUnavailable"
      # Traefik down makes service-level alerts noise
      - source_matchers:
          - alertname = TraefikDown
@@ -1870,12 +1870,21 @@ serverFiles:
            annotations:
              summary: "Kubelet {{ $labels.operation_type }} p99: {{ $value | printf \"%.0f\" }}s on {{ $labels.instance }} (threshold: 30s)"
          - alert: KubeletRunningContainersDrop
-            expr: (kubelet_running_containers{container_state="running"} - kubelet_running_containers{container_state="running"} offset 10m) < -10
+            # Relative >50% drop vs. 10m ago, sustained for 5m.
+            # Absolute-count threshold removed 2026-05-18: ordinary node
+            # drains drop 10-30 containers and tripped the old `< -10`
+            # rule; only a >50% drop that persists 5m+ indicates a real
+            # node-level fault (kubelet hang, runtime crash, mass eviction).
+            expr: |
+              (
+                (kubelet_running_containers{container_state="running"} - kubelet_running_containers{container_state="running"} offset 10m)
+                / kubelet_running_containers{container_state="running"} offset 10m
+              ) < -0.5
            for: 5m
            labels:
              severity: critical
            annotations:
-              summary: "Running containers on {{ $labels.instance }} dropped by {{ $value | printf \"%.0f\" }} in 10m"
+              summary: "Running containers on {{ $labels.instance }} dropped >50% in 10m ({{ $value | printf \"%.2f\" }} ratio)"
          - alert: CalicoNodeNotReady
            expr: kube_daemonset_status_number_ready{namespace="calico-system", daemonset="calico-node"} < kube_daemonset_status_desired_number_scheduled{namespace="calico-system", daemonset="calico-node"}
            for: 5m
@@ -1934,8 +1943,11 @@ serverFiles:
            annotations:
              summary: "Node {{ $labels.node }} kubelet started {{ $value | humanizeDuration }} ago — 1h settle window halts further reboots"
          - alert: MysqlStandaloneDown
+            # Single-replica StatefulSet: brief drain re-scheduling routinely
+            # takes 1-3 min during k8s upgrades. 3m suppresses those blips;
+            # real outages persist longer. Raised from 2m on 2026-05-18.
            expr: kube_statefulset_status_replicas_ready{statefulset="mysql-standalone"} < 1
-            for: 2m
+            for: 3m
            labels:
              severity: critical
            annotations:
@@ -2178,6 +2190,9 @@ serverFiles:
            annotations:
              summary: "Critically slow ingress on {{ $labels.service }}: avg latency {{ $value | printf \"%.2f\" }}s (threshold: 3s for 5m)"
          - alert: IngressErrorRate5xxHigh
+            # Rolling upgrades / pod migrations cause brief 5xx spikes that
+            # clear within 1-2 min. Only persistent 5xx indicates a real
+            # problem. Raised from 5m to 10m on 2026-05-18.
            expr: |
              (
                sum(rate(traefik_service_requests_total{code=~"5..", service!~".*nextcloud.*"}[5m])) by (service)
@@ -2186,11 +2201,11 @@ serverFiles:
              ) > 5
              and sum(rate(traefik_service_requests_total{service!~".*nextcloud.*"}[5m])) by (service) > 0.1
              and on() (time() - process_start_time_seconds{job="prometheus"}) > 900
-            for: 5m
+            for: 10m
            labels:
              severity: critical
            annotations:
-              summary: "5xx rate on {{ $labels.service }}: {{ $value | printf \"%.1f\" }}% (threshold: 5% for 5m)"
+              summary: "5xx rate on {{ $labels.service }}: {{ $value | printf \"%.1f\" }}% (threshold: 5% for 10m)"
          - alert: AnubisChallengeStoreErrors
            # Anubis exposes only Go-runtime metrics on :9090 (no anubis_* /
            # challenge_* counters), so we proxy via Traefik 5xx on services
@@ -2227,12 +2242,23 @@ serverFiles:
            annotations:
              summary: "Cloudflared: {{ $value | printf \"%.0f\" }} replica(s) unavailable"
          - alert: MetalLBSpeakerDown
+            # kubelet restart during k8s upgrade briefly takes the speaker
+            # pod down; typical recovery is 30-45s. The full drain+kubeadm+
+            # apt+kubelet-restart+uncordon cycle in the chain's worker phase
+            # can take a single node out of MetalLB rotation for 5-7 min in
+            # the worst case (depending on PDB stickiness). 10m suppresses
+            # those upgrade-induced blips while still catching genuine
+            # speaker-down conditions.
+            # Raised from 2m to 10m on 2026-05-23 after the node4 upgrade
+            # tripped it mid-soak and aborted the chain. The value had been
+            # 5m (set 2026-05-18) until a short-lived patch tightened it
+            # to 2m.
            expr: |
              (
                kube_daemonset_status_desired_number_scheduled{namespace="metallb-system", daemonset="metallb-speaker"}
                - on(namespace, daemonset) kube_daemonset_status_number_ready{namespace="metallb-system", daemonset="metallb-speaker"}
              ) > 0
-            for: 5m
+            for: 10m
            labels:
              severity: critical
            annotations:
@@ -2337,6 +2363,30 @@ serverFiles:
              severity: warning
            annotations:
              summary: "Email round-trip monitor never reported - check CronJob in mailserver namespace"
+          - alert: ViktorBarzinApexDrift
+            expr: viktorbarzin_apex_correct{job="viktorbarzin-apex-probe"} == 0
+            for: 10m
+            labels:
+              severity: critical
+            annotations:
+              summary: "viktorbarzin.me apex A drifted from expected 10.0.20.200"
+              description: "Technitium serves the split-horizon apex for ~80 *.viktorbarzin.me CNAMEs. If this is wrong, every internal service (auth, vault, immich, ha-sofia, ...) breaks. Check Technitium primary zone records via API or web console."
+          - alert: ViktorBarzinApexProbeStale
+            expr: (time() - viktorbarzin_apex_last_correct_timestamp{job="viktorbarzin-apex-probe"}) > 900
+            for: 5m
+            labels:
+              severity: warning
+            annotations:
+              summary: "viktorbarzin.me apex probe has not seen a correct result in >15 min"
+              description: "Probe may be failing intermittently or apex may be drifting. Check CronJob `viktorbarzin-apex-probe` in `technitium` namespace."
+          - alert: ViktorBarzinApexProbeNeverRun
+            expr: absent(viktorbarzin_apex_correct{job="viktorbarzin-apex-probe"})
+            for: 30m
+            labels:
+              severity: warning
+            annotations:
+              summary: "viktorbarzin.me apex probe never reported"
+              description: "Check `kubectl -n technitium get cronjob viktorbarzin-apex-probe` and the most recent job pod logs."
          - alert: AIOStreamsStreamCountLow
            expr: aiostreams_stream_count{job="aiostreams-stream-probe"} < 50
            for: 30m
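
Reviewer sketch (not part of the patch): the `KubeletRunningContainersDrop` change swaps an absolute threshold for a relative one. The Python below works through the arithmetic with made-up container counts. It does not model the `for: 5m` hold, and the zero-baseline guard is an assumption that stands in for PromQL's NaN/Inf behaviour on division by zero.

```python
def relative_drop(now, ten_min_ago):
    """(now - before) / before, matching the shape of the new expr."""
    return (now - ten_min_ago) / ten_min_ago


def old_rule(now, ten_min_ago):
    """Old absolute rule: more than 10 containers lost in 10 minutes."""
    return (now - ten_min_ago) < -10


def new_rule(now, ten_min_ago):
    """New relative rule: more than 50% of containers lost in 10 minutes."""
    if ten_min_ago == 0:
        # Assumed guard: with a zero baseline there is nothing to drop from.
        return False
    return relative_drop(now, ten_min_ago) < -0.5


# Hypothetical node going from 40 to 25 containers during a drain,
# and from 40 to 10 during a runtime crash.
for label, before, now in (("drain", 40, 25), ("crash", 40, 10)):
    print(label, relative_drop(now, before), old_rule(now, before), new_rule(now, before))
```

With these numbers the drain is a -0.375 ratio, which the old rule flags and the new rule does not, while the crash is a -0.75 ratio that both rules flag.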
--- a/stacks/n8n/main.tf
+++ b/stacks/n8n/main.tf
@@ -228,7 +228,7 @@ resource "kubernetes_deployment" "n8n" {
        service_account_name = kubernetes_service_account.n8n.metadata[0].name
        container {
          name  = "n8n"
-          image = "docker.n8n.io/n8nio/n8n:1.80.0"
+          image = "docker.n8n.io/n8nio/n8n:1.80.5"
          env {
            name  = "N8N_PORT"
            value = "5678"
@@ -352,10 +352,10 @@ resource "kubernetes_deployment" "n8n" {
          resources {
            requests = {
              cpu    = "25m"
-              memory = "1Gi"
+              memory = "512Mi"
            }
            limits = {
-              memory = "1Gi"
+              memory = "512Mi"
            }
          }
        }
--- a/stacks/nvidia/modules/nvidia/values.yaml
+++ b/stacks/nvidia/modules/nvidia/values.yaml
@@ -37,7 +37,7 @@ driver:
  resources:
    requests:
      cpu: "50m"
-      memory: "256Mi"
+      memory: "822Mi"
    limits:
      memory: "2Gi"

--- a/stacks/openclaw/main.tf
+++ b/stacks/openclaw/main.tf
@@ -132,24 +132,29 @@ resource "kubernetes_config_map" "openclaw_config" {
            mode = "off"
          }
          model = {
-            # ChatGPT Plus OAuth via openai-codex plugin (account:
-            # ancaelena98@gmail.com). gpt-5.4-mini is the only mini
-            # variant the Codex backend accepts for Plus tier;
-            # gpt-5-mini / gpt-5.1-codex-mini return model_not_found
-            # / "not supported with ChatGPT account". Plus rate-card:
-            # 1,200–7,000 local msgs / 5h on gpt-5.4-mini.
-            #
-            # If you see "No API key found for provider openai-codex"
-            # / "OAuth refresh failed" in logs, the OAuth token has
-            # expired. Re-auth:
-            #   kubectl -n openclaw exec -it $(kubectl -n openclaw \
-            #     get pods -l app=openclaw -o jsonpath='{.items[0].metadata.name}') \
-            #     -c openclaw -- node /app/openclaw.mjs models auth login \
-            #     --provider openai-codex
-            # Follow the OAuth URL+code prompt. Tokens persist on the
-            # openclaw-home PVC so it sticks across pod restarts.
-            primary   = "openai-codex/gpt-5.4-mini"
-            fallbacks = ["openai-codex/gpt-5.5", "modelrelay/auto-fastest", "nim/qwen/qwen3-coder-480b-a35b-instruct"]
+            # 2026-05-22: switched primary to nim/meta/llama-3.1-70b-instruct.
+            # Verified end-to-end with tool calls (sub-second responses,
+            # proper tool_calls in API response). Auth audit on this date:
+            #   - openai-codex OAuth: EXPIRED (ancaelena98@gmail.com,
+            #     ChatGPT Plus). Re-auth requires interactive TTY:
+            #       kubectl -n openclaw exec -it $(kubectl -n openclaw \
+            #         get pods -l app=openclaw -o jsonpath='{.items[0].metadata.name}') \
+            #         -c openclaw -- node /app/openclaw.mjs models auth \
+            #         login --provider openai-codex
+            #   - secret/openclaw → openai_api_key (sk-svcacct…):
+            #     insufficient_quota (billing exhausted)
+            #   - openrouter_api_key: "Key limit exceeded"
+            #   - llama_api_key: region-blocked
+            #   - anthropic_api_key: sk-ant-oat-… (OAuth refresh token,
+            #     NOT a real x-api-key — won't auth)
+            #   - nvidia_api_key: WORKS. nim/meta/llama-3.1-70b-instruct
+            #     and nim/meta/llama-4-maverick-17b-128e-instruct both
+            #     tool-call reliably.
+            # Keep codex as a fallback so it auto-promotes once
+            # re-authed; modelrelay last because it routes to a
+            # small model that hallucinates instead of tool-calling.
+            primary   = "nim/meta/llama-3.1-70b-instruct"
+            fallbacks = ["nim/meta/llama-4-maverick-17b-128e-instruct", "openai-codex/gpt-5.4-mini", "modelrelay/auto-fastest"]
          }
          models = {
            "modelrelay/auto-fastest"                                = {}
@@ -159,6 +164,8 @@ resource "kubernetes_config_map" "openclaw_config" {
            "nim/qwen/qwen3-coder-480b-a35b-instruct"                = {}
            "nim/nvidia/llama-3.1-nemotron-ultra-253b-v1"            = {}
            "nim/z-ai/glm5"                                          = {}
+            "nim/meta/llama-3.1-70b-instruct"                        = {}
+            "nim/meta/llama-4-maverick-17b-128e-instruct"            = {}
            "llama-as-openai/Llama-4-Maverick-17B-128E-Instruct-FP8" = {}
            "llama-as-openai/Llama-4-Scout-17B-16E-Instruct-FP8"     = {}
            "openrouter/stepfun/step-3.5-flash:free"                 = {}
@@ -244,6 +251,8 @@ resource "kubernetes_config_map" "openclaw_config" {
              { id = "qwen/qwen3-coder-480b-a35b-instruct", name = "Qwen 3 Coder", reasoning = false, input = ["text"], contextWindow = 262000, maxTokens = 16384, cost = { input = 0, output = 0, cacheRead = 0, cacheWrite = 0 } },
              { id = "nvidia/llama-3.1-nemotron-ultra-253b-v1", name = "Nemotron Ultra 253B", reasoning = true, input = ["text"], contextWindow = 128000, maxTokens = 16384, cost = { input = 0, output = 0, cacheRead = 0, cacheWrite = 0 } },
              { id = "z-ai/glm5", name = "GLM-5", reasoning = false, input = ["text"], contextWindow = 128000, maxTokens = 16384, cost = { input = 0, output = 0, cacheRead = 0, cacheWrite = 0 } },
+              { id = "meta/llama-3.1-70b-instruct", name = "Llama 3.1 70B Instruct", reasoning = false, input = ["text"], contextWindow = 128000, maxTokens = 16384, cost = { input = 0, output = 0, cacheRead = 0, cacheWrite = 0 } },
+              { id = "meta/llama-4-maverick-17b-128e-instruct", name = "Llama 4 Maverick (NIM)", reasoning = false, input = ["text"], contextWindow = 1000000, maxTokens = 16384, cost = { input = 0, output = 0, cacheRead = 0, cacheWrite = 0 } },
            ]
          }
          openrouter = {
@@ -1110,7 +1119,7 @@ resource "kubernetes_deployment" "openclaw" {
            # at /home/node/.openclaw/.ssh (set up by init 5).
            ln -sfn /home/node/.openclaw/.ssh /home/node/.ssh
            node openclaw.mjs doctor --fix 2>/dev/null
-            node openclaw.mjs models set openai-codex/gpt-5.4-mini 2>/dev/null
+            node openclaw.mjs models set nim/meta/llama-3.1-70b-instruct 2>/dev/null
            node openclaw.mjs mcp set ha "{\"url\":\"$HA_SOFIA_MCP_URL\",\"transport\":\"streamable-http\"}" 2>/dev/null
            node openclaw.mjs mcp set context7 '{"command":"npx","args":["-y","@upstash/context7-mcp"]}' 2>/dev/null
            node openclaw.mjs mcp set playwright '{"url":"http://localhost:3000/mcp","transport":"streamable-http"}' 2>/dev/null
--- a/stacks/postiz/modules/postiz/main.tf
+++ b/stacks/postiz/modules/postiz/main.tf
@@ -207,10 +207,10 @@ resource "helm_release" "postiz" {
    resources = {
      requests = {
        cpu    = "100m"
-        memory = "512Mi"
+        memory = "2Gi"
      }
      limits = {
-        memory = "4Gi"
+        memory = "3Gi"
      }
    }

--- a/stacks/proxmox-csi/modules/proxmox-csi/main.tf
+++ b/stacks/proxmox-csi/modules/proxmox-csi/main.tf
@@ -83,11 +83,13 @@ resource "helm_release" "proxmox_csi" {
      }
    }

-    # LUKS2 Argon2id key derivation needs ~1GiB memory
+    # LUKS2 Argon2id key derivation needs ~1GiB memory (memory id=712).
+    # Request bumped from 64Mi to 1024Mi (2026-05-23) so the scheduler reserves
+    # memory for the unlock burst instead of risking OOM under node pressure.
    node = {
      plugin = {
        resources = {
-          requests = { cpu = "10m", memory = "64Mi" }
+          requests = { cpu = "10m", memory = "1024Mi" }
          limits   = { memory = "1280Mi" }
        }
      }
--- a/stacks/technitium/modules/technitium/ha.tf
+++ b/stacks/technitium/modules/technitium/ha.tf
@@ -123,10 +123,10 @@ resource "kubernetes_deployment" "technitium_secondary" {
          resources {
            requests = {
              cpu    = "100m"
-              memory = "2Gi"
+              memory = "512Mi"
            }
            limits = {
-              memory = "2Gi"
+              memory = "512Mi"
            }
          }
          port {
@@ -285,10 +285,10 @@ resource "kubernetes_deployment" "technitium_tertiary" {
          resources {
            requests = {
              cpu    = "100m"
-              memory = "2Gi"
+              memory = "512Mi"
            }
            limits = {
-              memory = "2Gi"
+              memory = "512Mi"
            }
          }
          port {
--- a/stacks/technitium/modules/technitium/main.tf
+++ b/stacks/technitium/modules/technitium/main.tf
@@ -179,10 +179,10 @@ resource "kubernetes_deployment" "technitium" {
          resources {
            requests = {
              cpu    = "100m"
-              memory = "2Gi"
+              memory = "1Gi"
            }
            limits = {
-              memory = "2Gi"
+              memory = "1Gi"
            }
          }
          port {
@@ -696,3 +696,106 @@ resource "kubernetes_cron_job_v1" "technitium_dns_optimization" {
  }
 }

+# viktorbarzin.me apex DNS drift probe
+# Resolves `viktorbarzin.me A` against the Technitium LoadBalancer IP every
+# 5 min and pushes a Pushgateway gauge. Backstop for the entire
+# split-horizon zone: every internal `*.viktorbarzin.me` CNAME chains through
+# this apex, so if it drifts (ISP rollover, accidental edit), this is the
+# canary. Alerts: ViktorBarzinApexDrift, ViktorBarzinApexProbeStale,
+# ViktorBarzinApexProbeNeverRun (defined in stacks/monitoring/).
+resource "kubernetes_cron_job_v1" "viktorbarzin_apex_probe" {
+  metadata {
+    name      = "viktorbarzin-apex-probe"
+    namespace = kubernetes_namespace.technitium.metadata[0].name
+  }
+  spec {
+    concurrency_policy            = "Replace"
+    schedule                      = "*/5 * * * *"
+    successful_jobs_history_limit = 1
+    failed_jobs_history_limit     = 3
+    job_template {
+      metadata {}
+      spec {
+        backoff_limit              = 1
+        ttl_seconds_after_finished = 300
+        template {
+          metadata {}
+          spec {
+            container {
+              name  = "probe"
+              image = "docker.io/library/python:3.12-alpine"
+              resources {
+                requests = {
+                  cpu    = "10m"
+                  memory = "48Mi"
+                }
+                limits = {
+                  memory = "96Mi"
+                }
+              }
+              command = ["/bin/sh", "-c", <<-EOT
+                pip install --quiet --disable-pip-version-check dnspython requests && python3 -c '
+import dns.resolver, requests, time, sys
+
+EXPECTED = {"10.0.20.200"}
+NAMESERVER = "10.0.20.201"  # Technitium LB IP
+NAME = "viktorbarzin.me"
+PUSHGATEWAY = "http://prometheus-prometheus-pushgateway.monitoring:9091/metrics/job/viktorbarzin-apex-probe"
+
+resolver = dns.resolver.Resolver(configure=False)
+resolver.nameservers = [NAMESERVER]
+resolver.timeout = 5
+resolver.lifetime = 8
+
+correct = 0
+observed = "unknown"
+try:
+    answer = resolver.resolve(NAME, "A")
+    ips = sorted(str(r) for r in answer)
+    observed = ",".join(ips)
+    correct = 1 if set(ips) <= EXPECTED and ips else 0
+    print(f"apex {NAME} -> {observed} (expected one of {EXPECTED}); correct={correct}")
+except Exception as e:
+    observed = f"error:{type(e).__name__}"
+    print(f"resolve error: {e}", file=sys.stderr)
+
+metric_lines = [
+    "# HELP viktorbarzin_apex_correct 1 if viktorbarzin.me apex resolves to expected IP, 0 otherwise",
+    "# TYPE viktorbarzin_apex_correct gauge",
+    f"viktorbarzin_apex_correct {correct}",
+]
+if correct:
+    metric_lines += [
+        "# HELP viktorbarzin_apex_last_correct_timestamp Unix time of last correct resolution",
+        "# TYPE viktorbarzin_apex_last_correct_timestamp gauge",
+        f"viktorbarzin_apex_last_correct_timestamp {int(time.time())}",
+    ]
+metrics = "\n".join(metric_lines) + "\n"
+try:
+    r = requests.post(PUSHGATEWAY, data=metrics, timeout=10)
+    print(f"pushgateway: {r.status_code}")
+except Exception as e:
+    print(f"pushgateway error: {e}", file=sys.stderr)
+sys.exit(0 if correct else 1)
+'
+              EOT
+              ]
+            }
+            dns_config {
+              option {
+                name  = "ndots"
+                value = "2"
+              }
+            }
+            restart_policy = "OnFailure"
+          }
+        }
+      }
+    }
+  }
+  lifecycle {
+    # KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
+    ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config]
+  }
+}
+
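
Reviewer sketch (not part of the patch): the probe's decision logic can be separated from DNS and HTTP for testing. The Python below restates the correctness check and the metric payload as two pure functions. `apex_correct` and `render_metrics` are hypothetical names, and the HELP/TYPE lines are omitted for brevity.

```python
EXPECTED = {"10.0.20.200"}


def apex_correct(ips, expected=EXPECTED):
    """1 only when at least one A record came back and all are expected."""
    return 1 if ips and set(ips) <= expected else 0


def render_metrics(correct, now):
    """Build the Pushgateway payload; the timestamp is sent only on success."""
    lines = [f"viktorbarzin_apex_correct {correct}"]
    if correct:
        lines.append(f"viktorbarzin_apex_last_correct_timestamp {now}")
    return "\n".join(lines) + "\n"


# Illustrative answers: a good result, an empty answer, and a drifted record.
for ips in (["10.0.20.200"], [], ["10.0.20.200", "192.0.2.1"]):
    print(ips, apex_correct(ips))
print(render_metrics(0, 1700000000), end="")
```

Leaving the timestamp out on failure matters because the probe pushes with POST, which, as far as the Pushgateway documentation describes it, replaces only metrics of the same name within the group. The last good timestamp therefore stays in place and `ViktorBarzinApexProbeStale` can age against it.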
--- a/stacks/trading-bot/main.tf
+++ b/stacks/trading-bot/main.tf
@@ -24,7 +24,10 @@ locals {
    TRADING_CORS_ORIGINS                     = "[\"https://trading.viktorbarzin.me\"]"
    TRADING_MEET_KEVIN_POLL_INTERVAL_SECONDS = "10800"
    TRADING_MEET_KEVIN_DAILY_COST_CAP_USD    = "5"
-    TRADING_MEET_KEVIN_LLM_MODEL             = "anthropic/claude-sonnet-4.5"
+    # Haiku-4-5 used in v1 because sk-ant-oat01 OAuth quota on Enterprise
+    # trips a sticky multi-hour 429 on Sonnet after 5-10 burst calls.
+    # Switch to "claude-sonnet-4-5" if/when the Enterprise quota allows.
+    TRADING_MEET_KEVIN_LLM_MODEL             = "claude-haiku-4-5-20251001"
    TRADING_MEET_KEVIN_PROMPT_VERSION        = "v1"
  }
 }
@@ -71,7 +74,7 @@ resource "kubernetes_manifest" "external_secret" {
            TRADING_ALPHA_VANTAGE_API_KEY = "{{ .alpha_vantage_api_key }}"
            TRADING_FMP_API_KEY           = "{{ .fmp_api_key }}"
            DBAAS_ROOT_PASSWORD           = "{{ .dbaas_root_password }}"
-            TRADING_OPENROUTER_API_KEY    = "{{ .openrouter_api_key }}"
+            TRADING_ANTHROPIC_OAUTH_TOKEN = "{{ .anthropic_oauth_token }}"
            TRADING_MEET_KEVIN_CHANNEL_ID = "{{ .meet_kevin_channel_id }}"
          }
        }
@@ -85,7 +88,7 @@ resource "kubernetes_manifest" "external_secret" {
        { secretKey = "alpha_vantage_api_key", remoteRef = { key = "trading-bot", property = "alpha_vantage_api_key" } },
        { secretKey = "fmp_api_key", remoteRef = { key = "trading-bot", property = "fmp_api_key" } },
        { secretKey = "dbaas_root_password", remoteRef = { key = "trading-bot", property = "dbaas_root_password" } },
-        { secretKey = "openrouter_api_key", remoteRef = { key = "trading-bot", property = "openrouter_api_key" } },
+        { secretKey = "anthropic_oauth_token", remoteRef = { key = "trading-bot", property = "anthropic_oauth_token" } },
        { secretKey = "meet_kevin_channel_id", remoteRef = { key = "trading-bot", property = "meet_kevin_channel_id" } },
      ]
    }
@@ -507,16 +510,59 @@ resource "kubernetes_deployment" "trading-bot-workers" {
            }
          }
        }
+        container {
+          name              = "kevin-signal-bridge"
+          image             = "viktorbarzin/trading-bot-service:latest"
+          image_pull_policy = "Always"
+          command           = ["python", "-m", "services.kevin_signal_bridge.main"]
+          dynamic "env" {
+            for_each = local.common_env
+            content {
+              name  = env.key
+              value = env.value
+            }
+          }
+          env {
+            name  = "TRADING_OTEL_METRICS_PORT"
+            value = "9098"
+          }
+          # Kill-switch off in Phase 1 — bridge writes audit rows only,
+          # never publishes to signals:generated.
+          env {
+            name  = "TRADING_KEVIN_ENABLE_TRADING"
+            value = "false"
+          }
+          env_from {
+            secret_ref {
+              name = "trading-bot-secrets"
+            }
+          }
+          env_from {
+            secret_ref {
+              name = "trading-bot-db-creds"
+            }
+          }
+          resources {
+            requests = {
+              cpu    = "10m"
+              memory = "128Mi"
+            }
+            limits = {
+              memory = "256Mi"
+            }
+          }
+        }
      }
    }
  }
  lifecycle {
-    # DRIFT_WORKAROUND: CI pipeline owns image tags for all 4 worker containers. Reviewed 2026-05-22.
+    # DRIFT_WORKAROUND: CI pipeline owns image tags for all 5 worker containers. Reviewed 2026-05-24.
    ignore_changes = [
      spec[0].template[0].spec[0].container[0].image,
      spec[0].template[0].spec[0].container[1].image,
      spec[0].template[0].spec[0].container[2].image,
      spec[0].template[0].spec[0].container[3].image,
+      spec[0].template[0].spec[0].container[4].image,
      spec[0].template[0].spec[0].dns_config, # KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
    ]
  }
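Note on the hunk above: the `ignore_changes` list carries one image path per container, so it has to grow by hand whenever a container is added (here 4 to 5). A small sketch of generating the expected paths for a given container count, useful for eyeballing the list against the deployment. The helper name `image_ignore_paths` is hypothetical.

```python
def image_ignore_paths(container_count):
    """Return the ignore_changes image paths for containers 0..container_count-1."""
    return [
        f"spec[0].template[0].spec[0].container[{i}].image"
        for i in range(container_count)
    ]


if __name__ == "__main__":
    # Five worker containers after kevin-signal-bridge is added.
    for path in image_ignore_paths(5):
        print(path)
```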
--- a/stacks/traefik/modules/traefik/main.tf
+++ b/stacks/traefik/modules/traefik/main.tf
@@ -688,6 +688,14 @@ resource "kubernetes_config_map" "auth_proxy_config" {
      server {
          listen 9000;

+          # Browsers accumulate one authentik_proxy_<random> cookie per Authentik
+          # Proxy Provider on the parent domain. With 30+ services under
+          # viktorbarzin.me the combined Cookie header exceeds nginx's default
+          # 4 x 8k large_client_header_buffers and trips "Too big request header"
+          # (431). Bump to 8 x 64k so the auth check accepts the pile.
+          client_header_buffer_size 8k;
+          large_client_header_buffers 8 64k;
+
          location /outpost.goauthentik.io/auth/traefik {
              proxy_pass http://authentik;
              proxy_connect_timeout 3s;
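Note on the hunk above: a rough back-of-the-envelope model of why the Cookie header outgrows an 8k buffer. It assumes the cookies arrive as a single `Cookie:` header line, and that a header line must fit in one large buffer. The per-cookie name and value lengths are illustrative assumptions, not measurements from this cluster.

```python
def cookie_header_bytes(cookie_count, name_len, value_len):
    """Approximate size of a 'Cookie: a=b; c=d' header line in bytes."""
    pair = name_len + 1 + value_len             # "name=value"
    separators = max(cookie_count - 1, 0) * 2   # "; " between pairs
    return len("Cookie: ") + cookie_count * pair + separators


def fits_in_one_buffer(field_bytes, buffer_bytes):
    """A single header line is assumed to need to fit in one buffer."""
    return field_bytes <= buffer_bytes


if __name__ == "__main__":
    # Assumed: 30 proxy cookies, 32-byte names, 400-byte values.
    size = cookie_header_bytes(30, 32, 400)
    print(size, fits_in_one_buffer(size, 8 * 1024), fits_in_one_buffer(size, 64 * 1024))
```

Under these assumed sizes the header line is well past 8k but comfortably inside 64k, which matches the direction of the change. Real cookie sizes should be measured from a browser before relying on the margin.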
--- a/stacks/url/main.tf
+++ b/stacks/url/main.tf
@@ -226,11 +226,11 @@ resource "kubernetes_deployment" "shlink" {
          # }
          resources {
            limits = {
-              memory = "960Mi"
+              memory = "512Mi"
            }
            requests = {
              cpu    = "25m"
-              memory = "960Mi"
+              memory = "512Mi"
            }
          }
          port {
--- a/stacks/xray/modules/xray/main.tf
+++ b/stacks/xray/modules/xray/main.tf
@@ -91,10 +91,6 @@ resource "kubernetes_deployment" "xray" {
          image             = "teddysun/xray"
          name              = "xray"
          image_pull_policy = "IfNotPresent"
-          port {
-            container_port = 6443 // vless
-            protocol       = "TCP"
-          }
          port {
            container_port = 7443 // reality
            protocol       = "TCP"
@@ -174,19 +170,16 @@ resource "kubernetes_service" "xray" {
      app = "xray"
    }
    port {
-      name     = "vless"
-      port     = 6443
-      protocol = "TCP"
+      name        = "websocket"
+      port        = 8443
+      target_port = 8443
+      protocol    = "TCP"
    }
    port {
-      name     = "websocket"
-      port     = 8443
-      protocol = "TCP"
-    }
-    port {
-      name     = "grpc"
-      port     = 9443
-      protocol = "TCP"
+      name        = "grpc"
+      port        = 9443
+      target_port = 9443
+      protocol    = "TCP"
    }
  }
 }
@@ -249,16 +242,3 @@ module "ingress_grpc" {
  }
 }

-module "ingress_vless" {
-  source = "../../../../modules/kubernetes/ingress_factory"
-  # VPN protocol (VLESS) — native xray clients, not browsers.
-  # auth = "none": VPN protocol (VLESS) — native xray clients, not browsers; forward-auth incompatible.
-  auth            = "none"
-  dns_type        = "proxied"
-  namespace       = kubernetes_namespace.xray.metadata[0].name
-  name            = "xray-vless"
-  service_name    = "xray"
-  host            = "xray-vless"
-  port            = 6443
-  tls_secret_name = var.tls_secret_name
-}
--- a/state/stacks/cnpg/terraform.tfstate.enc
+++ b/state/stacks/cnpg/terraform.tfstate.enc