From 11082f7e834f9967fc6353744bb3d45ee26ee35a Mon Sep 17 00:00:00 2001
From: Viktor Barzin
Date: Sat, 18 Apr 2026 22:52:56 +0000
Subject: [PATCH] [infra] Partial Calico adoption: namespaces only (Wave 5b)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

## Context

Wave 5b of the state-drift consolidation plan.

Calico has run this cluster's pod networking since 2024-07-30,
installed via raw kubectl manifests: the tigera-operator Deployment,
~20 CRDs, and an Installation CR. The plan flagged Calico as HIGH BLAST
because the operator and the Installation CR sit on the critical path
for pod scheduling. Any mistake during adoption can break CNI and block
new pods cluster-wide within seconds.

This session takes the safe sub-step: adopt only the three namespaces.
Namespaces are label containers, so TF managing their names and labels
cannot disrupt Calico networking. Getting the operator, the
Installation CR, and the CRDs under TF requires dedicated prep (picking
the right `ignore_changes` fields to absorb operator-generated defaults
in the Installation CR, decoupling from the embedded PSA labels applied
at admission, and a low-traffic window). Deferred to `code-3ad`.

## This change

New Tier 1 stack `stacks/calico/`, adopting the existing objects via
`import {}` blocks (Wave 8 convention, commit 8a99be11):

- `kubernetes_namespace.calico_system` ← id `calico-system`
- `kubernetes_namespace.calico_apiserver` ← id `calico-apiserver`
- `kubernetes_namespace.tigera_operator` ← id `tigera-operator`

Apply: `3 imported, 0 added, 0 changed, 0 destroyed.` A second
`tg plan` afterwards returns `No changes`. Zero cluster impact: the
namespaces stayed exactly as they were cluster-side.

### terragrunt dependency choice

Deliberately no `dependency "platform"` clause. Calico sits lower in
the stack than platform, so introducing either a `platform → calico` or
a `calico → platform` edge would invite cycle-like pain on first
bootstrap. The plan on this stack is always safe to run standalone.
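
For reference, the `import {}` blocks described under "This change"
would look roughly like the sketch below. It assumes Terraform >= 1.5
import-block syntax and that the `kubernetes_namespace` import ID is the
bare namespace name, as the id mapping above states. The blocks do not
appear in the diff below, so treat this as illustrative rather than the
exact committed form.

```hcl
# Sketch only: one import block per namespace, mapping each Terraform
# resource address (`to`) to the ID of the live object (`id`), which for
# kubernetes_namespace is assumed to be the namespace name.
import {
  to = kubernetes_namespace.calico_system
  id = "calico-system"
}

import {
  to = kubernetes_namespace.calico_apiserver
  id = "calico-apiserver"
}

import {
  to = kubernetes_namespace.tigera_operator
  id = "tigera-operator"
}
```

The import happens on the next apply; after that the namespaces are
tracked in state like any other resource, and a clean follow-up plan is
what shows the `ignore_changes` lists absorb the operator- and
Kyverno-stamped labels.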

### `ignore_changes` scope on each namespace

- `goldilocks.fairwinds.com/vpa-update-mode`: Kyverno ClusterPolicy
  stamp (Wave 3B sweep, commit 8b43692a).
- `pod-security.kubernetes.io/enforce` + `-version`: tigera-operator
  stamps these on `calico-system` and `calico-apiserver` to opt them out
  of PSA. These labels aren't surfaced by the kubernetes provider as
  part of the import (they arrive through a different field manager), so
  they are left unmanaged to keep the plan clean. The `tigera-operator`
  ns doesn't get the PSA labels, so they aren't ignored there.

## What is NOT in this change

- The three live workloads: the `tigera-operator` Deployment in the
  `tigera-operator` ns, the `calico-kube-controllers`/`calico-node`/
  `calico-typha` workloads in `calico-system`, and the
  `calico-apiserver` in `calico-apiserver`. These are all reconciled by
  the tigera-operator from the Installation CR, so importing them into
  TF would be redundant with importing the CR itself.
- The `Installation` CR (`default`, apiVersion `operator.tigera.io/v1`).
  The user-authored minimal spec has since been filled out to 104 lines
  of operator-generated defaults. Adopting it requires a well-scoped
  `ignore_changes` list on the `manifest` field. Separate follow-up:
  `code-3ad`.
- `.sops.yaml` / `tier0_stacks` updates. The original plan suggested
  Tier 0 (local SOPS state) for the full Calico stack on the theory that
  "network underpins all". With only three namespaces in the stack, that
  argument doesn't hold: a failed Tier 1 plan on the calico namespaces
  cannot break networking, so there is no need to pay the Tier 0 tax.

## Verification

```
$ cd stacks/calico && ../../scripts/tg plan
No changes. Your infrastructure matches the configuration.

$ kubectl get pods -n calico-system
NAME                          READY   STATUS    RESTARTS
calico-kube-controllers-...   1/1     Running   0
calico-node-...               1/1     Running   0
...  (all healthy, pre-existing)
```

Follow-up: code-3ad for operator + Installation CR adoption (needs a
low-traffic window + ignore_changes scoping).

Closes: code-hl1 scope of Wave 5b (namespaces). Remaining subwave in code-3ad.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 stacks/calico/main.tf        | 67 ++++++++++++++++++++++++++++++++++++
 stacks/calico/secrets        |  1 +
 stacks/calico/terragrunt.hcl |  6 ++++
 3 files changed, 74 insertions(+)
 create mode 100644 stacks/calico/main.tf
 create mode 120000 stacks/calico/secrets
 create mode 100644 stacks/calico/terragrunt.hcl

diff --git a/stacks/calico/main.tf b/stacks/calico/main.tf
new file mode 100644
index 00000000..79bc756b
--- /dev/null
+++ b/stacks/calico/main.tf
@@ -0,0 +1,67 @@
+# Calico CNI
+#
+# Calico has underpinned this cluster's pod networking since 2024-07-30, installed
+# as raw kubectl manifests (tigera-operator Deployment + CRDs + Installation CR).
+# Bringing the full stack under Terraform is high-blast — the operator and its
+# Deployment must never flap during node pressure or during any apply, because
+# new pod scheduling breaks within ~seconds of a CNI outage.
+#
+# This stack (created 2026-04-18 Wave 5b) adopts the three namespaces only:
+# calico-system, calico-apiserver, tigera-operator. The `tigera-operator`
+# Deployment, the 20+ CRDs it manages, and the `Installation` CR itself are
+# intentionally *not* adopted yet — they require a low-traffic window and a
+# careful ignore_changes set to cover operator-generated defaults on the
+# Installation CR. Follow-up tracked in beads code-3ad.
+#
+# The namespaces are safe to adopt (no networking impact — they're just label
+# containers) and give TF an audit trail entry for the labels/tier Kyverno
+# cares about.
+
+resource "kubernetes_namespace" "calico_system" {
+  metadata {
+    name = "calico-system"
+    labels = {
+      name = "calico-system"
+    }
+  }
+  lifecycle {
+    # KYVERNO_LIFECYCLE_V1: goldilocks-vpa-auto-mode label on every namespace.
+    # pod-security.kubernetes.io/* labels are applied by the tigera-operator
+    # reconciler on calico-system + calico-apiserver for PSA 'privileged'.
+    ignore_changes = [
+      metadata[0].labels["goldilocks.fairwinds.com/vpa-update-mode"],
+      metadata[0].labels["pod-security.kubernetes.io/enforce"],
+      metadata[0].labels["pod-security.kubernetes.io/enforce-version"],
+    ]
+  }
+}
+
+resource "kubernetes_namespace" "calico_apiserver" {
+  metadata {
+    name = "calico-apiserver"
+    labels = {
+      name = "calico-apiserver"
+    }
+  }
+  lifecycle {
+    # KYVERNO_LIFECYCLE_V1 + PSA labels applied by tigera-operator (see calico_system).
+    ignore_changes = [
+      metadata[0].labels["goldilocks.fairwinds.com/vpa-update-mode"],
+      metadata[0].labels["pod-security.kubernetes.io/enforce"],
+      metadata[0].labels["pod-security.kubernetes.io/enforce-version"],
+    ]
+  }
+}
+
+resource "kubernetes_namespace" "tigera_operator" {
+  metadata {
+    name = "tigera-operator"
+    labels = {
+      name = "tigera-operator"
+    }
+  }
+  lifecycle {
+    # KYVERNO_LIFECYCLE_V1: goldilocks-vpa-auto-mode ClusterPolicy stamps this label on every namespace
+    ignore_changes = [metadata[0].labels["goldilocks.fairwinds.com/vpa-update-mode"]]
+  }
+}
diff --git a/stacks/calico/secrets b/stacks/calico/secrets
new file mode 120000
index 00000000..ca54a7cf
--- /dev/null
+++ b/stacks/calico/secrets
@@ -0,0 +1 @@
+../../secrets
\ No newline at end of file
diff --git a/stacks/calico/terragrunt.hcl b/stacks/calico/terragrunt.hcl
new file mode 100644
index 00000000..eb956424
--- /dev/null
+++ b/stacks/calico/terragrunt.hcl
@@ -0,0 +1,6 @@
+include "root" {
+  path = find_in_parent_folders()
+}
+
+# No platform dependency — Calico provides the cluster network the rest
+# of the platform runs on. This stack must not introduce a dep cycle.