keel: enroll 15 critical-path namespaces for digest-only auto-update

Per user decision today: monitoring, mailserver, vault, descheduler,
metrics-server, traefik, technitium, crowdsec, redis, reverse-proxy,
reloader, headscale, wireguard, xray, cloudflared now participate in
the same `force + match-tag` regime as the rest of the cluster — Keel
watches the deployment's CURRENT tag for digest changes only and rolls
on push, never rewriting tag strings.
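
For reference, a minimal sketch of what the `force + match-tag` pairing
looks like on a workload once the annotations are in place. The annotation
keys are Keel's documented ones; the resource name, namespace, image, and
the poll trigger are illustrative assumptions, not copied from this repo:

```hcl
# Hypothetical deployment, for illustration only.
resource "kubernetes_deployment" "example" {
  metadata {
    name      = "example"
    namespace = "example"
    annotations = {
      "keel.sh/policy"    = "force" # update even though the tag is not semver
      "keel.sh/match-tag" = "true"  # only react to pushes of the tag already deployed
      "keel.sh/trigger"   = "poll"  # assumption: registry polling rather than webhooks
    }
  }
  spec {
    selector {
      match_labels = { app = "example" }
    }
    template {
      metadata {
        labels = { app = "example" }
      }
      spec {
        container {
          name  = "example"
          image = "redis:8-alpine" # tag string stays fixed; only the digest behind it moves
        }
      }
    }
  }
}
```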

Two-part change:

stacks/kyverno/modules/kyverno/keel-annotations.tf
  Trim the policy-level namespace exclude list from 31 → 16. The 16
  remaining exclusions are the irreducible cluster-operator + state-
  coupled set: keel itself, calico-system + tigera-operator (operator
  loop), authentik (bitten by the 2026-05-17 pgbouncer incident), cnpg-system +
  dbaas (state-coupled), kyverno, metallb-system, external-secrets,
  proxmox-csi + nfs-csi + nvidia (just stabilized today, chart-pinned),
  kube-system, vpa, sealed-secrets, infra-maintenance.

stacks/<each-of-15>/.../main.tf
  Add `"keel.sh/enrolled" = "true"` label to the `kubernetes_namespace`
  resource so the Kyverno mutate policy can target the workloads via
  its namespaceSelector matchLabels.
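
A rough sketch of how the label and the exclude list could interact inside
the policy. This is not the actual `policy_inject_keel_annotations`
resource: the resource name, rule name, kinds, shortened namespace list, and
annotation set are assumptions chosen to show the shape only:

```hcl
# Hypothetical policy, for illustration; values are assumptions.
resource "kubernetes_manifest" "example_keel_enrollment" {
  manifest = {
    apiVersion = "kyverno.io/v1"
    kind       = "ClusterPolicy"
    metadata   = { name = "example-keel-enrollment" }
    spec = {
      rules = [{
        name = "inject-keel-annotations"
        match = {
          any = [{
            resources = {
              kinds = ["Deployment"]
              # Only namespaces carrying the enrollment label are matched.
              namespaceSelector = {
                matchLabels = { "keel.sh/enrolled" = "true" }
              }
            }
          }]
        }
        exclude = {
          any = [{
            resources = {
              # Policy-level exclude list; an excluded namespace is skipped
              # even if it carries the label.
              namespaces = ["keel", "kyverno", "kube-system"]
            }
          }]
        }
        mutate = {
          patchStrategicMerge = {
            metadata = {
              annotations = {
                # The +(key) anchor adds the annotation only when it is absent.
                "+(keel.sh/policy)"    = "force"
                "+(keel.sh/match-tag)" = "true"
              }
            }
          }
        }
      }]
    }
  }
}
```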

Note on the apply path: the live ClusterPolicy was patched via
`kubectl patch` because the hashicorp/kubernetes provider v3.1.0 panics
during state refresh on Kyverno ClusterPolicy schemas with deeply
nested optional `context.celPreconditions` / `imageRegistry` fields
(see crash dump). The TF source above has the desired state, so any
clean future apply on a fixed provider version will be a no-op against
the live cluster.

Floating-tag workloads in the newly-enrolled set (will roll on every
upstream digest update — acceptable risk per user):
  - wireguard: sclevine/wg:latest (image fixed today via iptables-nft
    postStart shim)
  - xray: teddysun/xray
  - crowdsec-web: viktorbarzin/crowdsec_web
  - monitoring: prompve/prometheus-pve-exporter:latest, prom/snmp-exporter
  - traefik: nginx:1-alpine, openresty/openresty:alpine,
    ghcr.io/tarampampam/error-pages:3
  - redis: haproxy:3.1-alpine, redis:8-alpine

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Viktor Barzin 2026-05-17 12:13:22 +00:00
parent f5cf6ec051
commit 3bdba9f388
16 changed files with 40 additions and 28 deletions

@@ -6,7 +6,8 @@ resource "kubernetes_namespace" "cloudflared" {
   metadata {
     name = "cloudflared"
     labels = {
-      tier = var.tier
+      tier               = var.tier
+      "keel.sh/enrolled" = "true"
     }
   }
   lifecycle {

@@ -29,6 +29,7 @@ resource "kubernetes_namespace" "crowdsec" {
     labels = {
       tier = var.tier
       "resource-governance/custom-quota" = "true"
+      "keel.sh/enrolled" = "true"
     }
   }
   lifecycle {

@@ -4,7 +4,7 @@ resource "kubernetes_namespace" "descheduler" {
   metadata {
     name = "descheduler"
     labels = {
-      tier = local.tiers.cluster
+      tier               = local.tiers.cluster
+      "keel.sh/enrolled" = "true"
     }
   }

@@ -25,7 +25,8 @@ resource "kubernetes_namespace" "headscale" {
   metadata {
     name = "headscale"
     labels = {
-      tier = var.tier
+      tier               = var.tier
+      "keel.sh/enrolled" = "true"
     }
   }
   lifecycle {

@@ -72,19 +72,27 @@ resource "kubernetes_manifest" "policy_inject_keel_annotations" {
   # - proxmox-csi, nfs-csi, nvidia, tigera-operator: hardware/CNI
   # coordination
   # - cloudflared, headscale, wireguard, xray: VPN/tunnel critical
-  # - mailserver, crowdsec, redis, reverse-proxy: stateful critical
-  # - infra-maintenance, metrics-server: cluster utilities
+  # - infra-maintenance: cluster utilities
+  #
+  # 2026-05-17 ENROLLMENT EXPANSION: removed 15 namespaces from
+  # the exclude list per explicit user decision; auto-updates
+  # are now allowed in monitoring, mailserver, vault,
+  # descheduler, metrics-server, traefik, technitium, crowdsec,
+  # redis, reverse-proxy, reloader, headscale, wireguard, xray,
+  # cloudflared. The `force + match-tag` pairing limits each to
+  # digest-only watches under the deployment's CURRENT tag
+  # string: no tag-switching, just rolls on upstream digest
+  # changes for the pinned tag. A few are on floating tags
+  # (sclevine/wg:latest, teddysun/xray, prompve/...:latest,
+  # nginx:1-alpine, redis:8-alpine, error-pages:3); those will
+  # roll whenever upstream pushes. Acceptable risk: the user
+  # has alerts in place to catch regressions.
   namespaces = [
     "keel",
     "calico-system",
     "authentik",
-    "vault",
     "cnpg-system",
     "dbaas",
-    "monitoring",
-    "traefik",
-    "technitium",
-    "mailserver",
     "kyverno",
     "metallb-system",
     "external-secrets",
@@ -92,19 +100,9 @@ resource "kubernetes_manifest" "policy_inject_keel_annotations" {
     "nfs-csi",
     "nvidia",
     "kube-system",
-    "cloudflared",
-    "crowdsec",
-    "reverse-proxy",
-    "reloader",
-    "descheduler",
     "vpa",
-    "redis",
     "sealed-secrets",
-    "headscale",
-    "wireguard",
-    "xray",
     "infra-maintenance",
-    "metrics-server",
     "tigera-operator",
   ]
 }

@@ -29,7 +29,8 @@ resource "kubernetes_namespace" "mailserver" {
   metadata {
     name = "mailserver"
     labels = {
-      tier = var.tier
+      tier               = var.tier
+      "keel.sh/enrolled" = "true"
     }
     # connecting via localhost does not seem to work?
     # labels = {

@@ -5,7 +5,8 @@ resource "kubernetes_namespace" "metrics-server" {
   metadata {
     name = "metrics-server"
     labels = {
-      tier = var.tier
+      tier               = var.tier
+      "keel.sh/enrolled" = "true"
     }
   }
   lifecycle {

@@ -54,6 +54,7 @@ resource "kubernetes_namespace" "monitoring" {
       "istio-injection" : "disabled"
       tier = var.tier
       "resource-governance/custom-quota" = "true"
+      "keel.sh/enrolled" = "true"
     }
   }
   lifecycle {

@@ -6,7 +6,8 @@ resource "kubernetes_namespace" "redis" {
   metadata {
     name = "redis"
     labels = {
-      tier = var.tier
+      tier               = var.tier
+      "keel.sh/enrolled" = "true"
     }
   }
   lifecycle {

@@ -2,7 +2,7 @@ resource "kubernetes_namespace" "crowdsec" {
   metadata {
     name = "reloader"
     labels = {
-      tier = local.tiers.aux
+      tier               = local.tiers.aux
+      "keel.sh/enrolled" = "true"
     }
   }

@@ -11,6 +11,9 @@ variable "haos_homepage_token" {
 resource "kubernetes_namespace" "reverse-proxy" {
   metadata {
+    labels = {
+      "keel.sh/enrolled" = "true"
+    }
     name = "reverse-proxy"
   }
   lifecycle {

@@ -13,7 +13,8 @@ resource "kubernetes_namespace" "technitium" {
   metadata {
     name = "technitium"
     labels = {
-      tier = var.tier
+      tier               = var.tier
+      "keel.sh/enrolled" = "true"
     }
     # stale cache error when trying to resolve
     # labels = {

@@ -29,6 +29,7 @@ resource "kubernetes_namespace" "traefik" {
       "app.kubernetes.io/name" = "traefik"
       "app.kubernetes.io/instance" = "traefik"
       tier = var.tier
+      "keel.sh/enrolled" = "true"
     }
   }
   lifecycle {

@@ -10,7 +10,7 @@ resource "kubernetes_namespace" "vault" {
   metadata {
     name = "vault"
     labels = {
-      tier = local.tiers.core
+      tier               = local.tiers.core
+      "keel.sh/enrolled" = "true"
     }
   }

@@ -14,7 +14,8 @@ resource "kubernetes_namespace" "wireguard" {
   metadata {
     name = "wireguard"
     labels = {
-      tier = var.tier
+      tier               = var.tier
+      "keel.sh/enrolled" = "true"
     }
   }
   lifecycle {

@@ -23,7 +23,8 @@ resource "kubernetes_namespace" "xray" {
   metadata {
     name = "xray"
     labels = {
-      tier = var.tier
+      tier               = var.tier
+      "keel.sh/enrolled" = "true"
     }
   }
   lifecycle {