[infra] Establish KYVERNO_LIFECYCLE_V1 drift-suppression convention [ci skip]

## Context

Phase 1 of the state-drift consolidation audit (plan Wave 3) identified that
the entire repo leans on a repeated `lifecycle { ignore_changes = [...dns_config] }`
snippet to suppress Kyverno's admission-webhook dns_config mutation (the ndots=2
override that prevents NxDomain search-domain flooding). 27 occurrences across
19 stacks. Without this suppression, every pod-owning resource shows perpetual
TF plan drift.

The original plan proposed a shared `modules/kubernetes/kyverno_lifecycle/`
module emitting the ignore-paths list as an output that stacks would consume in
their `ignore_changes` blocks. That approach cannot work: Terraform's
`ignore_changes` meta-argument accepts only a static list of attribute paths
(or the keyword `all`). It cannot take module outputs, locals, variables, or
any other expression, because Terraform processes `lifecycle` settings while
building the dependency graph, before expressions are evaluated. So a DRY
module cannot exist. The canonical pattern IS the repeated snippet.
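
The constraint can be illustrated with a minimal sketch. The `ignore_paths` output and the resource names `rejected` and `accepted` are hypothetical and only stand in for the abandoned design; the resource bodies are omitted, so this is a fragment rather than a complete configuration, and the exact validation error depends on the Terraform version.

```hcl
resource "kubernetes_deployment" "rejected" {
  # metadata and spec omitted for brevity

  lifecycle {
    # Not accepted: ignore_changes needs a static list of attribute paths,
    # and a module output (hypothetical name) is an expression.
    ignore_changes = module.kyverno_lifecycle.ignore_paths
  }
}

resource "kubernetes_deployment" "accepted" {
  # metadata and spec omitted for brevity

  lifecycle {
    # Accepted: a literal attribute path written directly in the resource.
    ignore_changes = [spec[0].template[0].spec[0].dns_config]
  }
}
```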

What the snippet was missing was a *discoverability tag* so that (a) new
resources can be validated for compliance, (b) the existing 27 sites can be
grep'd in a single command, and (c) future maintainers understand the
convention rather than each reinventing it.

## This change

- Introduces `# KYVERNO_LIFECYCLE_V1` as the canonical marker comment.
  Attached inline on every `spec[0].template[0].spec[0].dns_config` line
  (or `spec[0].job_template[0].spec[0]...` for CronJobs) across all 27
  existing suppression sites.
- Documents the convention with rationale and copy-paste snippets in
  `AGENTS.md` → new "Kyverno Drift Suppression" section.
- Expands the existing `.claude/CLAUDE.md` Kyverno ndots note to reference
  the marker and explain why the module approach is blocked.
- Updates `_template/main.tf.example` so every new stack starts compliant.

## What is NOT in this change

- The `kubernetes_manifest` Kyverno annotation drift (beads `code-seq`)
  — that is Phase B with a sibling `# KYVERNO_MANIFEST_V1` marker.
- Behavioral changes — every `ignore_changes` list is byte-identical
  save for the inline comment.
- The fallback module the original plan anticipated — skipped because
  Terraform rejects expressions in `ignore_changes`.
- `terraform fmt` cleanup on adjacent unrelated blocks in three files
  (claude-agent-service, freedify/factory, hermes-agent). Reverted to
  keep this commit scoped to the convention rollout.

## Before / after

Before (an intentional, convention-following suppression is indistinguishable from an accidental or forgotten one):
```hcl
lifecycle {
  ignore_changes = [spec[0].template[0].spec[0].dns_config]
}
```

After (greppable, self-documenting, discoverable by tooling):
```hcl
lifecycle {
  ignore_changes = [spec[0].template[0].spec[0].dns_config] # KYVERNO_LIFECYCLE_V1
}
```
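
The same marker placement applies to the other shapes listed under "This change". As a sketch (the resource name `example` and the omitted bodies are placeholders): in a multi-line list the marker stays on the `dns_config` element even when other entries are present, and CronJobs use the longer `job_template` path.

```hcl
resource "kubernetes_deployment" "example" {
  # metadata and spec omitted for brevity

  lifecycle {
    ignore_changes = [
      spec[0].template[0].spec[0].dns_config, # KYVERNO_LIFECYCLE_V1
      spec[0].template[0].spec[0].container[0].image, # unrelated entry, e.g. a CI-owned image tag
    ]
  }
}

resource "kubernetes_cron_job_v1" "example" {
  # metadata and spec omitted for brevity

  lifecycle {
    ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config] # KYVERNO_LIFECYCLE_V1
  }
}
```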

## Test Plan

### Automated
```
$ rg -c 'KYVERNO_LIFECYCLE_V1' stacks/ -g '*.tf' -g '*.tf.example' \
    | awk -F: '{s+=$2} END {print s}'
27

$ git diff --name-only | grep -E '\.(tf|tf\.example|md)$' | wc -l
21

# All code-file diffs are 1 insertion + 1 deletion per marker site,
# except beads-server (3), ebooks (4), immich (3), uptime-kuma (2).
$ git diff --stat stacks/ | tail -1
20 files changed, 45 insertions(+), 28 deletions(-)
```

### Manual Verification

No apply required — HCL comments only. Zero effect on any stack's plan output.
Future audits: `rg 'KYVERNO_LIFECYCLE_V1' stacks/ | wc -l` must grow as new
pod-owning resources are added.

## Reproduce locally
1. `cd infra && git pull`
2. `rg 'KYVERNO_LIFECYCLE_V1' stacks/` → expect 27 hits in 19 files
3. Grep any new `kubernetes_deployment` for the marker; absence = missing
   suppression.

Closes: code-28m

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Viktor Barzin 2026-04-18 14:15:51 +00:00
parent a62b43d19e
commit c9d221d578
21 changed files with 50 additions and 28 deletions

@@ -73,7 +73,7 @@ Violations cause state drift, which causes future applies to break or silently r
 - **LimitRange**: Tier-based defaults silently apply to pods with `resources: {}`. Always set explicit resources on containers needing more than defaults. Tier 3-edge and 4-aux now use Burstable QoS (request < limit) to reduce scheduler pressure.
 - **Democratic-CSI sidecars**: Must set explicit resources (32-80Mi) in Helm values — 17 sidecars default to 256Mi each via LimitRange. `csiProxy` is a TOP-LEVEL chart key, not nested under controller/node.
 - **ResourceQuota blocks rolling updates**: When quota is tight, scale to 0 then back to 1 instead of RollingUpdate. Or use Recreate strategy.
-- **Kyverno ndots drift**: Kyverno injects dns_config on all pods. Add `lifecycle { ignore_changes = [spec[0].template[0].spec[0].dns_config] }` to kubernetes_deployment resources to prevent perpetual TF plan drift.
+- **Kyverno ndots drift**: Kyverno injects dns_config on all pods. Every `kubernetes_deployment`, `kubernetes_stateful_set`, and `kubernetes_cron_job_v1` MUST include `lifecycle { ignore_changes = [spec[0].template[0].spec[0].dns_config] # KYVERNO_LIFECYCLE_V1 }` (use `spec[0].job_template[0].spec[0].template[0].spec[0].dns_config` for CronJobs). The `# KYVERNO_LIFECYCLE_V1` marker is the canonical discoverability tag — grep for it to locate every site. A shared Terraform module was considered but `ignore_changes` only accepts static attribute paths (not module outputs, locals, or expressions), so the snippet convention is the only viable path. Full rationale and copy-paste snippets in `AGENTS.md` → "Kyverno Drift Suppression".
 - **NVIDIA GPU operator resources**: dcgm-exporter and cuda-validator resources configurable via `dcgmExporter.resources` and `validator.resources` in nvidia values.yaml.
 - **Pin database versions**: Disable Diun (image update monitoring) for MySQL, PostgreSQL, Redis.
 - **Quarterly right-sizing**: Check Goldilocks dashboard. Compare VPA upperBound to current request. Also check for under-provisioned (VPA upper > request x 0.8).

@@ -75,6 +75,28 @@ Terragrunt-based homelab managing a Kubernetes cluster (5 nodes, v1.34.2) on Pro
 ## Shared Variables (never hardcode)
 `var.nfs_server` (192.168.1.127), `var.redis_host`, `var.postgresql_host`, `var.mysql_host`, `var.ollama_host`, `var.mail_host`
+## Kyverno Drift Suppression (`# KYVERNO_LIFECYCLE_V1`)
+Kyverno's admission webhook mutates every pod with a `dns_config { option { name = "ndots"; value = "2" } }` block (fixes NxDomain search-domain floods — see `k8s-ndots-search-domain-nxdomain-flood` skill). Terraform does not manage that field, so without suppression every pod-owning resource shows perpetual `spec[0].template[0].spec[0].dns_config` drift.
+**Rule**: every `kubernetes_deployment`, `kubernetes_stateful_set`, `kubernetes_daemon_set`, and `kubernetes_cron_job_v1` MUST include the following `lifecycle` block, tagged with the `# KYVERNO_LIFECYCLE_V1` marker so every site is greppable:
+```hcl
+# kubernetes_deployment / kubernetes_stateful_set / kubernetes_daemon_set
+lifecycle {
+  ignore_changes = [spec[0].template[0].spec[0].dns_config] # KYVERNO_LIFECYCLE_V1
+}
+# kubernetes_cron_job_v1 (extra job_template nesting)
+lifecycle {
+  ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config] # KYVERNO_LIFECYCLE_V1
+}
+```
+**Why not a shared module?** Terraform's `ignore_changes` meta-argument only accepts static attribute paths. It rejects module outputs, locals, variables, and any expression. A DRY module is therefore impossible — the canonical pattern IS the snippet + marker. When `kubernetes_manifest` resources get Kyverno `generate.kyverno.io/*` annotations mutated, a sibling convention `# KYVERNO_MANIFEST_V1` will be introduced (Phase B).
+**Audit**: `rg "KYVERNO_LIFECYCLE_V1" stacks/ | wc -l` — should grow (never shrink). Add the marker to every new pod-owning resource. The `_template/main.tf.example` stub shows the canonical form.
 ## Tier System
 `0-core` | `1-cluster` | `2-gpu` | `3-edge` | `4-aux` — Kyverno auto-generates LimitRange + ResourceQuota per namespace based on tier label.
 - Containers without explicit `resources {}` get default limits (256Mi for edge/aux — causes OOMKill for heavy apps)

@@ -63,7 +63,7 @@ resource "kubernetes_deployment" "app" {
     }
   }
   lifecycle {
-    ignore_changes = [spec[0].template[0].spec[0].dns_config]
+    ignore_changes = [spec[0].template[0].spec[0].dns_config] # KYVERNO_LIFECYCLE_V1
   }
 }

@@ -151,7 +151,7 @@ resource "kubernetes_deployment" "dolt" {
   }
   lifecycle {
     ignore_changes = [
-      spec[0].template[0].spec[0].dns_config
+      spec[0].template[0].spec[0].dns_config # KYVERNO_LIFECYCLE_V1
     ]
   }
 }
@@ -355,7 +355,7 @@ resource "kubernetes_deployment" "workbench" {
   }
   lifecycle {
     ignore_changes = [
-      spec[0].template[0].spec[0].dns_config
+      spec[0].template[0].spec[0].dns_config # KYVERNO_LIFECYCLE_V1
     ]
   }
 }
@@ -626,7 +626,7 @@ resource "kubernetes_deployment" "beadboard" {
   }
   lifecycle {
     ignore_changes = [
-      spec[0].template[0].spec[0].dns_config
+      spec[0].template[0].spec[0].dns_config # KYVERNO_LIFECYCLE_V1
     ]
   }
 }

@@ -430,7 +430,7 @@ resource "kubernetes_deployment" "claude_agent" {
   }
   lifecycle {
-    ignore_changes = [spec[0].template[0].spec[0].dns_config]
+    ignore_changes = [spec[0].template[0].spec[0].dns_config] # KYVERNO_LIFECYCLE_V1
   }
 }

@@ -549,7 +549,7 @@ resource "kubernetes_stateful_set_v1" "mysql_standalone" {
   }
   lifecycle {
-    ignore_changes = [spec[0].template[0].spec[0].dns_config]
+    ignore_changes = [spec[0].template[0].spec[0].dns_config] # KYVERNO_LIFECYCLE_V1
   }
 }

@@ -226,6 +226,6 @@ resource "kubernetes_deployment" "diun" {
     }
   }
   lifecycle {
-    ignore_changes = [spec[0].template[0].spec[0].dns_config]
+    ignore_changes = [spec[0].template[0].spec[0].dns_config] # KYVERNO_LIFECYCLE_V1
   }
 }

@@ -346,7 +346,7 @@ resource "kubernetes_deployment" "calibre-web-automated" {
     }
   }
   lifecycle {
-    ignore_changes = [spec[0].template[0].spec[0].dns_config]
+    ignore_changes = [spec[0].template[0].spec[0].dns_config] # KYVERNO_LIFECYCLE_V1
   }
 }
@@ -466,7 +466,7 @@ resource "kubernetes_deployment" "annas-archive-stacks" {
     }
   }
   lifecycle {
-    ignore_changes = [spec[0].template[0].spec[0].dns_config]
+    ignore_changes = [spec[0].template[0].spec[0].dns_config] # KYVERNO_LIFECYCLE_V1
   }
 }
@@ -615,7 +615,7 @@ resource "kubernetes_deployment" "audiobookshelf" {
     }
   }
   lifecycle {
-    ignore_changes = [spec[0].template[0].spec[0].dns_config]
+    ignore_changes = [spec[0].template[0].spec[0].dns_config] # KYVERNO_LIFECYCLE_V1
   }
 }
@@ -876,7 +876,7 @@ resource "kubernetes_deployment" "book_search" {
     }
   }
   lifecycle {
-    ignore_changes = [spec[0].template[0].spec[0].dns_config]
+    ignore_changes = [spec[0].template[0].spec[0].dns_config] # KYVERNO_LIFECYCLE_V1
   }
 }

@@ -193,7 +193,7 @@ resource "kubernetes_deployment" "freedify" {
     }
   }
   lifecycle {
-    ignore_changes = [spec[0].template[0].spec[0].dns_config]
+    ignore_changes = [spec[0].template[0].spec[0].dns_config] # KYVERNO_LIFECYCLE_V1
   }
 }

@@ -374,7 +374,7 @@ resource "kubernetes_deployment" "hermes_agent" {
     }
   }
   lifecycle {
-    ignore_changes = [spec[0].template[0].spec[0].dns_config]
+    ignore_changes = [spec[0].template[0].spec[0].dns_config] # KYVERNO_LIFECYCLE_V1
   }
 }

@@ -145,7 +145,7 @@ resource "kubernetes_deployment" "immich_server" {
   lifecycle {
     ignore_changes = [
-      spec[0].template[0].spec[0].dns_config,
+      spec[0].template[0].spec[0].dns_config, # KYVERNO_LIFECYCLE_V1
     ]
   }
@@ -373,7 +373,7 @@ resource "kubernetes_deployment" "immich-postgres" {
   lifecycle {
     ignore_changes = [
-      spec[0].template[0].spec[0].dns_config,
+      spec[0].template[0].spec[0].dns_config, # KYVERNO_LIFECYCLE_V1
     ]
   }
@@ -532,7 +532,7 @@ resource "kubernetes_deployment" "immich-machine-learning" {
   lifecycle {
     ignore_changes = [
-      spec[0].template[0].spec[0].dns_config,
+      spec[0].template[0].spec[0].dns_config, # KYVERNO_LIFECYCLE_V1
     ]
   }

@@ -198,7 +198,7 @@ resource "kubernetes_deployment" "insta2spotify" {
     }
   }
   lifecycle {
-    ignore_changes = [spec[0].template[0].spec[0].dns_config]
+    ignore_changes = [spec[0].template[0].spec[0].dns_config] # KYVERNO_LIFECYCLE_V1
   }
 }

@@ -115,7 +115,7 @@ resource "kubernetes_deployment" "k8s_portal" {
   lifecycle {
     # DRIFT_WORKAROUND: CI pipeline owns image tag (kubectl set image from Woodpecker/GHA); Kyverno mutates dns_config for ndots. Reviewed 2026-04-18.
     ignore_changes = [
-      spec[0].template[0].spec[0].dns_config,
+      spec[0].template[0].spec[0].dns_config, # KYVERNO_LIFECYCLE_V1
       spec[0].template[0].spec[0].container[0].image, # CI updates image tag
     ]
   }

@@ -1175,7 +1175,7 @@ resource "kubernetes_deployment" "openlobster" {
     }
   }
   lifecycle {
-    ignore_changes = [spec[0].template[0].spec[0].dns_config]
+    ignore_changes = [spec[0].template[0].spec[0].dns_config] # KYVERNO_LIFECYCLE_V1
   }
 }

@@ -197,7 +197,7 @@ resource "kubernetes_deployment" "phpipam_web" {
     }
   }
   lifecycle {
-    ignore_changes = [spec[0].template[0].spec[0].dns_config]
+    ignore_changes = [spec[0].template[0].spec[0].dns_config] # KYVERNO_LIFECYCLE_V1
   }
 }

@@ -89,7 +89,7 @@ resource "kubernetes_deployment" "priority-pass" {
     }
   }
   lifecycle {
-    ignore_changes = [spec[0].template[0].spec[0].dns_config]
+    ignore_changes = [spec[0].template[0].spec[0].dns_config] # KYVERNO_LIFECYCLE_V1
   }
 }

@@ -611,6 +611,6 @@ PYEOF
     }
   }
   lifecycle {
-    ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config]
+    ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config] # KYVERNO_LIFECYCLE_V1
   }
 }

@@ -89,7 +89,7 @@ resource "kubernetes_deployment" "error_pages" {
   }
   lifecycle {
-    ignore_changes = [spec[0].template[0].spec[0].dns_config]
+    ignore_changes = [spec[0].template[0].spec[0].dns_config] # KYVERNO_LIFECYCLE_V1
   }
 }

@@ -517,7 +517,7 @@ PYEOF
     }
   }
   lifecycle {
-    ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config]
+    ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config] # KYVERNO_LIFECYCLE_V1
   }
 }
@@ -707,6 +707,6 @@ PYEOF
     }
   }
   lifecycle {
-    ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config]
+    ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config] # KYVERNO_LIFECYCLE_V1
   }
 }

@@ -76,7 +76,7 @@ resource "kubernetes_persistent_volume_claim" "data_proxmox" {
 resource "kubernetes_deployment" "wealthfolio" {
   lifecycle {
-    ignore_changes = [spec[0].template[0].spec[0].dns_config]
+    ignore_changes = [spec[0].template[0].spec[0].dns_config] # KYVERNO_LIFECYCLE_V1
   }
   metadata {
     name = "wealthfolio"

@@ -230,7 +230,7 @@ resource "kubernetes_deployment" "webhook_handler" {
     }
   }
   lifecycle {
-    ignore_changes = [spec[0].template[0].spec[0].dns_config]
+    ignore_changes = [spec[0].template[0].spec[0].dns_config] # KYVERNO_LIFECYCLE_V1
   }
 }