infra/stacks/descheduler/main.tf
Viktor Barzin 3bdba9f388 keel: enroll 15 critical-path namespaces for digest-only auto-update
Per user decision today: monitoring, mailserver, vault, descheduler,
metrics-server, traefik, technitium, crowdsec, redis, reverse-proxy,
reloader, headscale, wireguard, xray, cloudflared now participate in
the same `force + match-tag` regime as the rest of the cluster — Keel
watches the deployment's CURRENT tag for digest changes only and rolls
on push, never rewriting tag strings.

Two-part change:

stacks/kyverno/modules/kyverno/keel-annotations.tf
  Trim the policy-level namespace exclude list from 31 → 16. The 16
  remaining exclusions are the irreducible cluster-operator and
  state-coupled set: keel itself, calico-system + tigera-operator
  (operator loop), authentik (bitten by the 2026-05-17 pgbouncer
  incident), cnpg-system + dbaas (state-coupled), kyverno,
  metallb-system, external-secrets, proxmox-csi + nfs-csi + nvidia
  (just stabilized today, chart-pinned), kube-system, vpa,
  sealed-secrets, infra-maintenance.

stacks/<each-of-15>/.../main.tf
  Add `"keel.sh/enrolled" = "true"` label to the `kubernetes_namespace`
  resource so the Kyverno mutate policy can target the workloads via
  its namespaceSelector matchLabels.
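
For illustration only, the label-based targeting described in this second
part could be sketched roughly as below. The real keel-annotations.tf is not
shown here, so the resource name, policy name, rule name, matched kind, and
the use of add-if-absent anchors are all assumptions; the namespace exclude
list is left out for brevity.

```hcl
# Hypothetical sketch, not the actual policy from keel-annotations.tf.
resource "kubernetes_manifest" "keel_annotations_sketch" {
  manifest = {
    apiVersion = "kyverno.io/v1"
    kind       = "ClusterPolicy"
    metadata = {
      name = "keel-annotations-sketch" # assumed name
    }
    spec = {
      rules = [{
        name = "stamp-keel-annotations" # assumed name
        match = {
          any = [{
            resources = {
              kinds = ["Deployment"] # assumed; the real policy may match more kinds
              # Select only workloads in namespaces carrying the enrollment label.
              namespaceSelector = {
                matchLabels = {
                  "keel.sh/enrolled" = "true"
                }
              }
            }
          }]
        }
        mutate = {
          patchStrategicMerge = {
            metadata = {
              annotations = {
                # "+(key)" is Kyverno's add-if-absent anchor: the annotation
                # is only written when the workload does not already set it.
                "+(keel.sh/policy)"    = "force"
                "+(keel.sh/match-tag)" = "true"
              }
            }
          }
        }
      }]
    }
  }
}
```

With a rule shaped like this, adding the `"keel.sh/enrolled" = "true"` label
to a `kubernetes_namespace` is the only per-stack change needed to opt a
namespace in.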

Note on the apply path: the live ClusterPolicy was patched via
`kubectl patch` because the hashicorp/kubernetes provider v3.1.0 panics
during state refresh on Kyverno ClusterPolicy schemas with deeply
nested optional `context.celPreconditions` / `imageRegistry` fields
(see crash dump). The TF source above has the desired state, so any
clean future apply on a fixed provider version will be a no-op against
the live cluster.

Floating-tag workloads in the newly-enrolled set (will roll on every
upstream digest update — acceptable risk per user):
  - wireguard: sclevine/wg:latest (image fixed today via iptables-nft
    postStart shim)
  - xray: teddysun/xray
  - crowdsec-web: viktorbarzin/crowdsec_web
  - monitoring: prompve/prometheus-pve-exporter:latest, prom/snmp-exporter
  - traefik: nginx:1-alpine, openresty/openresty:alpine,
    ghcr.io/tarampampam/error-pages:3
  - redis: haproxy:3.1-alpine, redis:8-alpine

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-22 14:16:56 +00:00


resource "kubernetes_namespace" "descheduler" {
  metadata {
    name = "descheduler"
    labels = {
      tier               = local.tiers.cluster
      "keel.sh/enrolled" = "true"
    }
  }
  lifecycle {
    # KYVERNO_LIFECYCLE_V1: goldilocks-vpa-auto-mode ClusterPolicy stamps this label on every namespace
    ignore_changes = [metadata[0].labels["goldilocks.fairwinds.com/vpa-update-mode"]]
  }
}

resource "kubernetes_cluster_role" "descheduler" {
  metadata {
    name = "descheduler-cluster-role"
  }
  # Emit events describing descheduling decisions.
  rule {
    api_groups = [""]
    resources  = ["events"]
    verbs      = ["create", "update"]
  }
  # Node usage metrics (Metrics API).
  rule {
    api_groups = ["metrics.k8s.io"]
    resources  = ["nodes"]
    verbs      = ["get", "watch", "list"]
  }
  rule {
    api_groups = [""]
    resources  = ["namespaces"]
    verbs      = ["get", "list", "watch"]
  }
  # Pod usage metrics (Metrics API). The Metrics API is read-only, so
  # "delete" grants nothing useful in this group.
  rule {
    api_groups = ["metrics.k8s.io"]
    resources  = ["pods"]
    verbs      = ["get", "watch", "list", "delete"]
  }
  # Evictions are created through the pods/eviction subresource.
  rule {
    api_groups = [""]
    resources  = ["pods/eviction"]
    verbs      = ["create"]
  }
  # priorityclasses live in the scheduling.k8s.io API group.
  rule {
    api_groups = ["scheduling.k8s.io"]
    resources  = ["priorityclasses"]
    verbs      = ["get", "list", "watch"]
  }
  rule {
    api_groups = ["policy"]
    resources  = ["poddisruptionbudgets"]
    verbs      = ["get", "list", "watch"]
  }
}

resource "kubernetes_service_account" "descheduler" {
  metadata {
    name      = "descheduler-sa"
    namespace = kubernetes_namespace.descheduler.metadata[0].name
  }
}
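
resource "kubernetes_cluster_role_binding" "descheduler" {
  metadata {
    name = "descheduler-cluster-role-binding"
  }
  role_ref {
    api_group = "rbac.authorization.k8s.io"
    kind      = "ClusterRole"
    # Reference the role resource so Terraform orders creation correctly.
    name = kubernetes_cluster_role.descheduler.metadata[0].name
  }
  subject {
    # Reference the service account instead of repeating its name.
    name      = kubernetes_service_account.descheduler.metadata[0].name
    kind      = "ServiceAccount"
    namespace = kubernetes_namespace.descheduler.metadata[0].name
  }
}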

resource "helm_release" "descheduler" { # rename me
  namespace  = kubernetes_namespace.descheduler.metadata[0].name
  name       = "descheduler"
  repository = "https://kubernetes-sigs.github.io/descheduler/"
  chart      = "descheduler"
  values     = [templatefile("${path.module}/values.yaml", {})]
}
# CI retrigger 2026-05-16T13:42:57+00:00 — bulk enrollment apply (pipeline #689 killed)
# CI retrigger v2 2026-05-16T13:46:35+00:00
# CI retrigger v3 2026-05-16T14:06:39Z
# CI retrigger v4 2026-05-16T14:13:59Z
# CI retrigger v5 2026-05-16T23:10:38Z
# CI retrigger v6 2026-05-16T23:18:58Z