Context
-------
The cluster policy is "no CPU limits anywhere" — CFS throttling causes
more harm than good for bursty single-threaded workloads (Node.js,
Python). LimitRanges are already correct (defaultRequest.cpu only, no
default.cpu), but 22 pods still carried CPU limits injected by upstream
Helm chart defaults — CrowdSec (lapi + agents), descheduler,
kubernetes-dashboard (×4), nvidia gpu-operator.
Previous attempts were ad-hoc: patch each values.yaml, occasionally
missing things on chart upgrade. This replaces that with a declarative
Kyverno mutation at admission time.
This change
-----------
Adds a new ClusterPolicy `strip-cpu-limits` with two foreach rules:
strip-container-cpu-limit → containers[]
strip-initcontainer-cpu-limit → initContainers[]
Each rule uses `patchesJson6902` with an `op: remove` on
`resources/limits/cpu`. JSON6902 `remove` fails on missing paths, so
per-element preconditions gate the mutation — pods without CPU limits
pass through untouched. A top-level rule precondition short-circuits
using JMESPath filter (`[?resources.limits.cpu != null] | length(@) > 0`)
so the mutation is a no-op for the overwhelming majority of pods.
Admission-time only. No `mutateExistingOnPolicyUpdate`, no `background`.
Existing pods keep their CPU limits until they're restarted naturally
(Helm upgrade, node drain, rollout). We rely on churn, not forced
restarts, to avoid unnecessary thrash.
Memory limits are preserved — they prevent OOM, still useful.
Flow
----
admission request → match Pod + CREATE
→ top-level precondition: any container has limits.cpu?
no → skip (fast path)
yes → foreach container:
element.limits.cpu present?
no → skip element
yes → remove /spec/containers/N/resources/limits/cpu
→ same again for initContainers
→ mutated pod proceeds to API server
Verification
------------
kubectl run test-strip-cpu --overrides='{limits:{cpu:500m,memory:64Mi}}'
→ admitted pod.resources = {limits:{memory:64Mi}, requests:{cpu:50m,memory:32Mi}}
→ CPU limit stripped, memory preserved, requests untouched
kubectl rollout restart deploy/kubernetes-dashboard-metrics-scraper
→ new pod.resources = {limits:{memory:400Mi}, requests:{cpu:100m,memory:200Mi}}
→ cluster-wide count of pods with CPU limits: 22 → 21
Rollout
-------
Remaining 21 pods will drop their CPU limits on natural churn. No manual
restarts in this change — user may want to time a mass restart with a
maintenance window.
Closes: code-eaf
Closes: code-4bz
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two-tier state architecture:
- Tier 0 (infra, platform, cnpg, vault, dbaas, external-secrets): local
state with SOPS encryption in git — unchanged, required for bootstrap.
- Tier 1 (105 app stacks): PostgreSQL backend on CNPG cluster at
10.0.20.200:5432/terraform_state with native pg_advisory_lock.
Motivation: multi-operator friction (every workstation needed SOPS + age +
git-crypt), bootstrap complexity for new operators, and headless agents/CI
needing the full encryption toolchain just to read state.
Changes:
- terragrunt.hcl: conditional backend (local vs pg) based on tier0 list
- scripts/tg: tier detection, auto-fetch PG creds from Vault for Tier 1,
skip SOPS and Vault KV locking for Tier 1 stacks
- scripts/state-sync: tier-aware encrypt/decrypt (skips Tier 1)
- scripts/migrate-state-to-pg: one-shot migration script (idempotent)
- stacks/vault/main.tf: pg-terraform-state static role + K8s auth role
for claude-agent namespace
- stacks/dbaas: terraform_state DB creation + MetalLB LoadBalancer
service on shared IP 10.0.20.200
- Deleted 107 .tfstate.enc files for migrated Tier 1 stacks
- Cleaned up per-stack tiers.tf (now generated by root terragrunt.hcl)
[ci skip]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pipeline pods pull from registry.viktorbarzin.me:5050 but the
registry-credentials secret only had auth for registry.viktorbarzin.me
(without port). Containerd requires exact hostname:port match.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add cleanup-failed-pods policy that runs hourly (at :15) to delete all
pods in Failed phase cluster-wide. Prevents stale evicted and failed
CronJob pods from accumulating and creating healthcheck noise.
Also adds ClusterRole + ClusterRoleBinding to grant Kyverno cleanup
controller permission to delete Pods (not included by default).
[ci skip]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- daily-backup: handle rsync exit 23 (partial transfer) as OK for LUKS
noload mounts — in-flight writes have corrupt metadata from skipped
journal replay, but core data is intact
- daily-backup: clean up stale LUKS dm mappings from previous crashed
runs before attempting to open
- daily-backup: capture rsync exit code safely with set -e (|| pattern)
- kyverno: bump tier-4-aux requests.memory 2Gi→3Gi (servarr was at 83%)
- actualbudget: patched custom quota 5Gi→6Gi (was at 82%)
Verified: backup now completes status=0 (96 PVCs OK, 0 failed)
[ci skip]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Kyverno's tier-1-cluster LimitRange had max=4Gi which blocked
mysql-cluster-2 from starting after we bumped MySQL to 6Gi limit.
Also added custom LimitRange in dbaas stack (for when Terraform
manages it directly).
Kyverno ClusterPolicy clones tls-secret from kyverno namespace to all
namespaces with synchronize=true. Renewal pipeline now updates the source
secret via kubectl, verifies cert validity, and sends Slack notification.
- Increase tier-2-gpu requests.memory from 8Gi to 12Gi to give immich
ML pods scheduling headroom (was at 96% utilization)
- Add critical NvidiaExporterDown Prometheus alert that fires when GPU
metrics are absent for >10 minutes (faster than generic ScrapeTargetDown)
- Grant kyverno-admission-controller and kyverno-background-controller
permissions to manage Secrets (required for generate clone rules)
- Add containerd hosts.toml for 10.0.20.10:5050 with skip_verify=true
(wildcard cert doesn't cover IP SANs) — applied to all nodes + template
- Add auth.htpasswd section to config-private.yml
- Mount htpasswd file in registry-private container, fix healthcheck for 401
- Rename registry UI from registry.viktorbarzin.me → docker.viktorbarzin.me
- Add Docker CLI ingress at registry.viktorbarzin.me (HTTPS backend, no rate-limit, unlimited body)
- Add docker to cloudflare_proxied_names (registry stays non-proxied)
- Add Kyverno ClusterPolicy to sync registry-credentials secret to all namespaces
- Update infra provisioning to install apache2-utils and generate htpasswd from Vault
Phase 2 of platform stack split. 5 more modules extracted into
independent stacks. All applied successfully with zero destroys.
Cloudflared now reads k8s_users from Vault directly to compute
user_domains. Woodpecker pipeline runs all 8 extracted stacks
in parallel. Memory bumped to 6Gi for 9 concurrent TF processes.
Platform reduced from 27 to 19 modules.