Kyverno ClusterPolicy reads dependency.kyverno.io/wait-for annotation
and injects busybox init containers that block until each dependency
is reachable (nc -z). Annotations added to 18 stacks (24 deployments).
Includes graceful-db-maintenance.sh script for planned DB maintenance
(scales dependents to 0, saves replica counts, restores on startup).
The + in +00:00 timezone offsets was being URL-decoded to a space,
causing ValueError on the /api/memories/sync endpoint. Build :17
includes the fix. Using versioned tag instead of :latest to avoid
pull-through cache serving stale images.
- Set memory requests = limits across 56 stacks to prevent overcommit
- Right-sized limits based on actual pod usage (2x actual, rounded up)
- Scaled down trading-bot (replicas=0) to free memory
- Fixed OOMKilled services: forgejo, dawarich, health, meshcentral,
paperless-ngx, vault auto-unseal, rybbit, whisper, openclaw, clickhouse
- Added startup+liveness probes to calibre-web
- Bumped inotify limits on nodes 2,3 (max_user_instances 128->8192)
Post node2 OOM incident (2026-03-14). Previous kubelet config had no
kubeReserved/systemReserved set, allowing pods to starve the kernel.
- Add vault provider to root terragrunt.hcl (generated providers.tf)
- Delete stacks/vault/vault_provider.tf (now in generated providers.tf)
- Add 124 variable declarations + 43 vault_kv_secret_v2 resources to
vault/main.tf to populate Vault KV at secret/<stack-name>
- Migrate 43 consuming stacks to read secrets from Vault KV via
data "vault_kv_secret_v2" instead of SOPS var-file
- Add dependency "vault" to all migrated stacks' terragrunt.hcl
- Complex types (maps/lists) stored as JSON strings, decoded with
jsondecode() in locals blocks
Bootstrap secrets (vault_root_token, vault_authentik_client_id,
vault_authentik_client_secret) remain in SOPS permanently.
Apply order: vault stack first (populates KV), then all others.