infra/stacks/kyverno/modules/kyverno
Viktor Barzin 0216e993dc
etcd-load-reduction: remove VPA/Goldilocks, disable kyverno reporting, descheduler hourly
The control-plane flap (etcd lease-renewal timeouts) recurred. Rather than move
etcd to SSD (code-oflt, deferred again), this change aims to reduce etcd load
enough that the leader-election-timeout band-aid (renew 10s->30s) can be
removed. These are the big, clean cuts:

1. Remove VPA/Goldilocks (stacks/vpa emptied). All 349 VPAs ran updateMode=Off
   (no auto-right-sizing) yet cost ~800 etcd objects + continuous recommender
   writes + a pod-creation admission webhook, purely to feed a dashboard. krr
   (Dockerized, on-demand) replaces it. Reverses the re-add after memory 2431.

2. Disable kyverno reporting (admission/aggregate/background). policyReports were
   already off, so the pipeline generated ephemeralreports + an hourly
   all-resource etcd re-scan for NO user-facing output. Admission enforcement
   (deny-* policies) and Keel mutation are unaffected; violations surface via
   Loki->Slack.

3. descheduler */5 -> hourly (fewer list/evict cycles; rebalancing isn't urgent).
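
As a rough illustration of item 3, the sketch below counts how often each schedule fires per day. The cron strings are assumptions: the message only says "*/5 -> hourly", so "*/5 * * * *" and "0 * * * *" are plausible spellings, not the values confirmed in the stack. The helper names are hypothetical and only the minute field is interpreted.

```python
def runs_per_hour(minute_field: str) -> int:
    """Count the minutes in 0..59 matched by a cron minute field.

    Handles '*', '*/N', 'A-B', 'A-B/N', single values, and comma lists.
    """
    matched = set()
    for part in minute_field.split(","):
        rng, _, step = part.partition("/")
        step_n = int(step) if step else 1
        if rng == "*":
            lo, hi = 0, 59
        elif "-" in rng:
            lo_s, hi_s = rng.split("-")
            lo, hi = int(lo_s), int(hi_s)
        else:
            lo = hi = int(rng)
        matched.update(range(lo, hi + 1, step_n))
    return len(matched)


def runs_per_day(schedule: str) -> int:
    """Runs per day, assuming the hour/day/month/weekday fields are all '*'."""
    minute_field = schedule.split()[0]
    return runs_per_hour(minute_field) * 24


before = runs_per_day("*/5 * * * *")  # assumed old schedule
after = runs_per_day("0 * * * *")     # assumed new schedule
print(before, after, before // after)
```

Under those assumptions the descheduler goes from 288 runs per day to 24, a 12x reduction in list/evict cycles. The sketch says nothing about how much etcd load each cycle produces.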

Deferred (poor ROI or unsafe as planned): ESO refreshInterval 15m->1h is a
~20-stack sprawl for ~0.1 writes/s; keel background=false is invalid for a
mutate-existing policy, and its churn occurs at apply time, not in steady
state. Both are filed as follow-up beads.

Post-apply: delete the chart-orphaned VPA CRDs to cascade-clean leftover CRs.
Then measure etcd apply-latency and revert the timeouts. Docs updated
(VPA/Goldilocks -> krr). See memory 5402-5407.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-12 19:41:22 +00:00
File | Last commit | Date
dependency-init-containers.tf | fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] | 2026-06-09 08:45:33 +00:00
keel-annotations.tf | fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] | 2026-06-09 08:45:33 +00:00
main.tf | etcd-load-reduction: remove VPA/Goldilocks, disable kyverno reporting, descheduler hourly | 2026-06-12 19:41:22 +00:00
registry-credentials.tf | fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] | 2026-06-09 08:45:33 +00:00
resource-governance.tf | tts+kyverno: non-merge apply trigger (merge-commit diff hid stacks/tts from the stack detector) | 2026-06-11 19:08:23 +00:00
security-policies.tf | android-emulator: new stack, shared in-cluster Android 16 testing instance | 2026-06-11 19:51:57 +00:00
tls-secret-sync.tf | fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] | 2026-06-09 08:45:33 +00:00
versions.tf | fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] | 2026-06-09 08:45:33 +00:00