infra/stacks/phpipam/main.tf

variable "tls_secret_name" {
type = string
sensitive = true
}
variable "mysql_host" { type = string }
data "vault_kv_secret_v2" "secrets" {
mount = "secret"
name = "platform"
}
locals {
technitium_password = data.vault_kv_secret_v2.secrets.data["technitium_password"]
}
resource "kubernetes_namespace" "phpipam" {
metadata {
name = "phpipam"
labels = {
tier = local.tiers.aux
}
}
lifecycle {
# KYVERNO_LIFECYCLE_V1: goldilocks-vpa-auto-mode ClusterPolicy stamps this label on every namespace
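# (the stamped value is "off": Terraform owns resource limits, Goldilocks only recommends)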
ignore_changes = [metadata[0].labels["goldilocks.fairwinds.com/vpa-update-mode"]]
}
}
resource "kubernetes_manifest" "external_secret" {
manifest = {
apiVersion = "external-secrets.io/v1beta1"
kind = "ExternalSecret"
metadata = {
name = "phpipam-secrets"
namespace = "phpipam"
}
spec = {
refreshInterval = "15m"
secretStoreRef = {
name = "vault-database"
kind = "ClusterSecretStore"
}
target = {
name = "phpipam-secrets"
}
data = [{
secretKey = "db_password"
remoteRef = {
key = "static-creds/mysql-phpipam"
property = "password"
}
}]
}
}
depends_on = [kubernetes_namespace.phpipam]
}
resource "kubernetes_manifest" "external_secret_pfsense_ssh" {
manifest = {
apiVersion = "external-secrets.io/v1beta1"
kind = "ExternalSecret"
metadata = {
name = "phpipam-pfsense-ssh"
namespace = "phpipam"
}
spec = {
refreshInterval = "1h"
secretStoreRef = {
name = "vault-kv"
kind = "ClusterSecretStore"
}
target = {
name = "phpipam-pfsense-ssh"
}
data = [{
secretKey = "ssh_key"
remoteRef = {
key = "viktor"
property = "phpipam_pfsense_ssh_key"
}
}]
}
}
depends_on = [kubernetes_namespace.phpipam]
}
resource "kubernetes_manifest" "external_secret_admin" {
manifest = {
apiVersion = "external-secrets.io/v1beta1"
kind = "ExternalSecret"
metadata = {
name = "phpipam-admin-password"
namespace = "phpipam"
}
spec = {
refreshInterval = "1h"
secretStoreRef = {
name = "vault-kv"
kind = "ClusterSecretStore"
}
target = {
name = "phpipam-admin-password"
}
data = [{
secretKey = "password"
remoteRef = {
key = "viktor"
property = "phpipam_admin_password"
}
}]
}
}
depends_on = [kubernetes_namespace.phpipam]
}
module "tls_secret" {
source = "../../modules/kubernetes/setup_tls_secret"
namespace = kubernetes_namespace.phpipam.metadata[0].name
tls_secret_name = var.tls_secret_name
}
resource "kubernetes_deployment" "phpipam_web" {
metadata {
name = "phpipam-web"
namespace = kubernetes_namespace.phpipam.metadata[0].name
labels = {
app = "phpipam"
tier = local.tiers.aux
}
annotations = {
"reloader.stakater.com/auto" = "true"
}
}
spec {
replicas = 1
strategy {
type = "Recreate"
}
selector {
match_labels = {
app = "phpipam"
}
}
template {
metadata {
labels = {
app = "phpipam"
}
annotations = {
"diun.enable" = "true"
"dependency.kyverno.io/wait-for" = "mysql.dbaas:3306"
}
}
spec {
container {
image = "phpipam/phpipam-www:v1.7.4"
name = "phpipam-web"
port {
container_port = 80
}
env {
name = "TZ"
value = "Europe/Sofia"
}
env {
name = "IPAM_DATABASE_HOST"
value = var.mysql_host
}
env {
name = "IPAM_DATABASE_USER"
value = "phpipam"
}
env {
name = "IPAM_DATABASE_PASS"
value_from {
secret_key_ref {
name = "phpipam-secrets"
key = "db_password"
}
}
}
env {
name = "IPAM_DATABASE_NAME"
value = "phpipam"
}
env {
name = "IPAM_TRUST_X_FORWARDED"
value = "true"
}
resources {
requests = {
cpu = "10m"
memory = "64Mi"
}
limits = {
memory = "256Mi"
}
}
}
}
}
}
lifecycle {
ignore_changes = [spec[0].template[0].spec[0].dns_config] # KYVERNO_LIFECYCLE_V1
}
}
# phpipam-cron container removed — discovery now handled by phpipam-pfsense-import CronJob
# which queries Kea DHCP leases + pfSense ARP table directly (no fping needed)
resource "kubernetes_service" "phpipam" {
metadata {
name = "phpipam"
namespace = kubernetes_namespace.phpipam.metadata[0].name
labels = {
app = "phpipam"
}
}
spec {
selector = {
app = "phpipam"
}
port {
name = "http"
port = 80
target_port = 80
}
}
}
module "ingress" {
source = "../../modules/kubernetes/ingress_factory"
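# dns_type = "proxied" makes ingress_factory auto-create the Cloudflare DNS record (CNAME to the tunnel)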
dns_type = "proxied"
namespace = kubernetes_namespace.phpipam.metadata[0].name
name = "phpipam"
tls_secret_name = var.tls_secret_name
protected = true
extra_annotations = {
"gethomepage.dev/enabled" = "true"
"gethomepage.dev/name" = "phpIPAM"
"gethomepage.dev/description" = "IP Address Management"
"gethomepage.dev/icon" = "phpipam.png"
"gethomepage.dev/group" = "Infrastructure"
"gethomepage.dev/pod-selector" = ""
}
}
# CronJob: Bidirectional sync between phpIPAM and Technitium DNS
# 1. Push: named phpIPAM hosts → Technitium A + PTR records
# 2. Pull: Technitium reverse DNS → phpIPAM hostnames for unnamed entries
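# Illustrative round trip (hypothetical values, mirroring the script below):
#   push: phpIPAM row (10.0.20.5, hostname "nas") -> A nas.viktorbarzin.lan -> 10.0.20.5
#         plus PTR 5.20.0.10.in-addr.arpa -> nas.viktorbarzin.lan
#   pull: unnamed row 10.0.20.6 with an existing PTR gets its hostname backfilled from DNS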
resource "kubernetes_cron_job_v1" "phpipam_dns_sync" {
metadata {
name = "phpipam-dns-sync"
namespace = kubernetes_namespace.phpipam.metadata[0].name
}
spec {
schedule = "*/15 * * * *"
successful_jobs_history_limit = 1
failed_jobs_history_limit = 3
concurrency_policy = "Forbid"
job_template {
metadata {}
spec {
backoff_limit = 1
template {
metadata {}
spec {
restart_policy = "Never"
container {
name = "sync"
image = "mysql:8.0"
command = ["/bin/bash", "-c", <<-EOT
set -e
TECH_URL="http://technitium-web.technitium.svc.cluster.local:5380"
# Login to Technitium
TECH_TOKEN=$$(curl -sf "$$TECH_URL/api/user/login?user=admin&pass=$$TECH_PASS" | sed 's/.*"token":"\([^"]*\)".*/\1/')
if [ -z "$$TECH_TOKEN" ]; then echo "Technitium login failed"; exit 1; fi
echo "Technitium auth OK"
# Query phpIPAM MySQL directly for hosts with hostnames
HOSTS=$$(mysql -h $$DB_HOST -u $$DB_USER -p$$DB_PASS $$DB_NAME -N -B -e \
"SELECT INET_NTOA(ip_addr), hostname FROM ipaddresses WHERE hostname != '' AND hostname IS NOT NULL AND subnetId >= 7")
SYNCED=0
echo "$$HOSTS" | while IFS=$$'\t' read -r IP HOSTNAME; do
[ -z "$$IP" ] || [ -z "$$HOSTNAME" ] && continue
SHORT=$$(echo "$$HOSTNAME" | cut -d. -f1)
FQDN="$$SHORT.viktorbarzin.lan"
# A record
curl -sf -o /dev/null -X POST "$$TECH_URL/api/zones/records/add?token=$$TECH_TOKEN" \
-d "domain=$$FQDN&zone=viktorbarzin.lan&type=A&ipAddress=$$IP&overwrite=true&ttl=300"
# PTR record
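# e.g. IP 10.0.20.5 -> domain=5.20.0.10.in-addr.arpa in zone 20.0.10.in-addr.arpa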
O1=$$(echo $$IP | cut -d. -f1); O2=$$(echo $$IP | cut -d. -f2)
O3=$$(echo $$IP | cut -d. -f3); O4=$$(echo $$IP | cut -d. -f4)
curl -sf -o /dev/null -X POST "$$TECH_URL/api/zones/records/add?token=$$TECH_TOKEN" \
-d "domain=$$O4.$$O3.$$O2.$$O1.in-addr.arpa&zone=$$O3.$$O2.$$O1.in-addr.arpa&type=PTR&ptrName=$$FQDN&overwrite=true&ttl=300" 2>/dev/null || true
SYNCED=$$((SYNCED + 1))
echo " $$IP -> $$FQDN"
done
echo "Push sync complete"
# Reverse sync: pull hostnames from DNS into phpIPAM for unnamed entries
echo ""
echo "=== Reverse sync: DNS -> phpIPAM ==="
UNNAMED=$$(mysql -h $$DB_HOST -u $$DB_USER -p$$DB_PASS $$DB_NAME -N -B -e \
"SELECT id, INET_NTOA(ip_addr) FROM ipaddresses WHERE (hostname IS NULL OR hostname = '') AND subnetId >= 7")
echo "$$UNNAMED" | while IFS=$$'\t' read -r ID IP; do
[ -z "$$ID" ] || [ -z "$$IP" ] && continue
# Query Technitium for PTR record
O1=$$(echo $$IP | cut -d. -f1); O2=$$(echo $$IP | cut -d. -f2)
O3=$$(echo $$IP | cut -d. -f3); O4=$$(echo $$IP | cut -d. -f4)
PTR_NAME="$$O4.$$O3.$$O2.$$O1.in-addr.arpa"
REV_ZONE="$$O3.$$O2.$$O1.in-addr.arpa"
RESULT=$$(curl -sf "$$TECH_URL/api/zones/records/get?token=$$TECH_TOKEN&domain=$$PTR_NAME&zone=$$REV_ZONE&type=PTR" 2>/dev/null)
HOSTNAME=$$(echo "$$RESULT" | sed -n 's/.*"ptrName":"\([^"]*\)".*/\1/p' | head -1)
[ -z "$$HOSTNAME" ] && continue
# Extract short name
SHORT=$$(echo "$$HOSTNAME" | cut -d. -f1)
[ -z "$$SHORT" ] && continue
# Update phpIPAM
mysql -h $$DB_HOST -u $$DB_USER -p$$DB_PASS $$DB_NAME -e \
"UPDATE ipaddresses SET hostname='$$SHORT' WHERE id=$$ID AND (hostname IS NULL OR hostname = '')"
echo " $$IP -> $$SHORT (from DNS)"
done
echo "Bidirectional sync complete"
EOT
]
env {
name = "TECH_PASS"
value = local.technitium_password
}
env {
name = "DB_HOST"
value = var.mysql_host
}
env {
name = "DB_USER"
value = "phpipam"
}
env {
name = "DB_PASS"
value_from {
secret_key_ref {
name = "phpipam-secrets"
key = "db_password"
}
}
}
env {
name = "DB_NAME"
value = "phpipam"
}
resources {
requests = {
cpu = "10m"
memory = "32Mi"
}
limits = {
memory = "128Mi"
}
}
}
}
}
}
}
}
lifecycle {
# KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config]
}
}
# CronJob: Import devices from pfSense (Kea DHCP leases + ARP table) into phpIPAM
# Replaces active fping scanning with passive data from pfSense
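# Input shapes the import job parses (illustrative samples, inferred from the parsers below):
#   Kea lease4-get-all: {"arguments":{"leases":[{"ip-address":"10.0.20.5","hw-address":"aa:bb:cc:dd:ee:ff","hostname":"nas.viktorbarzin.lan"}]},"result":0}
#   pfSense arp -an:    ? (10.0.20.5) at aa:bb:cc:dd:ee:ff on igb0 expires in 1200 seconds [ethernet]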
resource "kubernetes_cron_job_v1" "phpipam_pfsense_import" {
metadata {
name = "phpipam-pfsense-import"
namespace = kubernetes_namespace.phpipam.metadata[0].name
}
spec {
schedule = "0 * * * *"
successful_jobs_history_limit = 1
failed_jobs_history_limit = 3
concurrency_policy = "Forbid"
job_template {
metadata {}
spec {
backoff_limit = 1
template {
metadata {}
spec {
restart_policy = "Never"
container {
name = "import"
image = "alpine:3.21"
command = ["/bin/sh", "-c", <<-EOT
set -e
apk add --no-cache -q openssh-client mysql-client python3 > /dev/null 2>&1
# Setup SSH key
mkdir -p /root/.ssh
cp /ssh/ssh_key /root/.ssh/id_rsa
chmod 600 /root/.ssh/id_rsa
echo "StrictHostKeyChecking no" > /root/.ssh/config
# 1. Get Kea DHCP leases via control socket
echo "=== Fetching Kea leases ==="
LEASES=$$(ssh admin@10.0.20.1 'echo "{\"command\": \"lease4-get-all\"}" | /usr/bin/nc -U /tmp/kea4-ctrl-socket 2>/dev/null')
# 2. Get ARP table
echo "=== Fetching ARP table ==="
ARP=$$(ssh admin@10.0.20.1 'arp -an' 2>/dev/null)
# Remote sites handled by phpipam-remote-import CronJob (hourly)
# 3. Parse and import into phpIPAM MySQL
echo "=== Importing into phpIPAM ==="
export LEASES_DATA="$$LEASES"
export ARP_DATA="$$ARP"
python3 << 'PYEOF'
import json, subprocess, sys, re, os
db_host = os.environ["DB_HOST"]
db_user = os.environ["DB_USER"]
db_pass = os.environ["DB_PASS"]
db_name = os.environ["DB_NAME"]
def mysql_exec(sql):
    r = subprocess.run(
        ["mysql", "-h", db_host, "-u", db_user, f"-p{db_pass}", db_name, "-N", "-B", "-e", sql],
        capture_output=True, text=True
    )
    return r.stdout.strip()
# Get existing phpIPAM entries (subnetId >= 7 = our subnets)
existing = {}
rows = mysql_exec("SELECT INET_NTOA(ip_addr), hostname, mac, subnetId FROM ipaddresses WHERE subnetId >= 7")
for line in rows.split("\n"):
    if not line: continue
    parts = line.split("\t")
    existing[parts[0]] = {"hostname": parts[1] if parts[1] != "NULL" else "", "mac": parts[2] if parts[2] != "NULL" else "", "subnetId": parts[3]}
# Subnet mapping
def get_subnet_id(ip):
    if ip.startswith("10.0.10."): return 7
    if ip.startswith("10.0.20."): return 8
    if ip.startswith("192.168.1."): return 9
    if ip.startswith("10.3.2."): return 10
    if ip.startswith("192.168.8."): return 11
    if ip.startswith("192.168.0."): return 12
    return None
# Parse Kea leases
leases_raw = os.environ.get("LEASES_DATA", "{}")
try:
    leases_json = json.loads(leases_raw)
    leases = leases_json.get("arguments", {}).get("leases", []) if isinstance(leases_json, dict) else leases_json[0].get("arguments", {}).get("leases", [])
except:
    leases = []
imported = 0
updated_mac = 0
updated_hostname = 0
for lease in leases:
    ip = lease["ip-address"]
    mac = lease.get("hw-address", "")
    hostname = lease.get("hostname", "").split(".")[0]  # strip .viktorbarzin.lan
    subnet_id = get_subnet_id(ip)
    if not subnet_id: continue
    if ip not in existing:
        # New host — insert
        mac_sql = f"'{mac}'" if mac else "NULL"
        host_sql = f"'{hostname}'" if hostname else "''"
        mysql_exec(f"INSERT INTO ipaddresses (ip_addr, subnetId, hostname, mac, description, lastSeen) VALUES (INET_ATON('{ip}'), {subnet_id}, {host_sql}, {mac_sql}, '-- kea lease --', NOW())")
        imported += 1
        print(f" NEW {ip} -> {hostname} mac={mac}")
    else:
        # Existing — update MAC if missing, hostname if missing, lastSeen always
        updates = ["lastSeen=NOW()"]
        if mac and not existing[ip]["mac"]:
            updates.append(f"mac='{mac}'")
            updated_mac += 1
        if hostname and not existing[ip]["hostname"]:
            updates.append(f"hostname='{hostname}'")
            updated_hostname += 1
        mysql_exec(f"UPDATE ipaddresses SET {','.join(updates)} WHERE ip_addr=INET_ATON('{ip}')")
# Parse ARP table for devices not in Kea (static IPs)
arp_raw = os.environ.get("ARP_DATA", "")
lease_ips = {l["ip-address"] for l in leases}
for line in arp_raw.split("\n"):
    m = re.match(r'\? \((\d+\.\d+\.\d+\.\d+)\) at ([0-9a-f:]+) on', line)
    if not m: continue
    ip, mac = m.group(1), m.group(2)
    if mac == "(incomplete)": continue
    subnet_id = get_subnet_id(ip)
    if not subnet_id: continue
    if ip in lease_ips: continue  # already handled by Kea
    if ip not in existing:
        mysql_exec(f"INSERT INTO ipaddresses (ip_addr, subnetId, mac, description, lastSeen) VALUES (INET_ATON('{ip}'), {subnet_id}, '{mac}', '-- arp discovered --', NOW())")
        imported += 1
        print(f" NEW (arp) {ip} mac={mac}")
    else:
        updates = ["lastSeen=NOW()"]
        if mac and not existing[ip]["mac"]:
            updates.append(f"mac='{mac}'")
            updated_mac += 1
        mysql_exec(f"UPDATE ipaddresses SET {','.join(updates)} WHERE ip_addr=INET_ATON('{ip}')")
print(f"\nImported: {imported} new, Updated: {updated_mac} MACs, {updated_hostname} hostnames")
PYEOF
echo "Import complete"
EOT
]
env {
name = "DB_HOST"
value = var.mysql_host
}
env {
name = "DB_USER"
value = "phpipam"
}
env {
name = "DB_PASS"
value_from {
secret_key_ref {
name = "phpipam-secrets"
key = "db_password"
}
}
}
env {
name = "DB_NAME"
value = "phpipam"
}
volume_mount {
name = "ssh-key"
mount_path = "/ssh"
read_only = true
}
resources {
requests = {
cpu = "10m"
memory = "64Mi"
}
limits = {
memory = "128Mi"
}
}
}
volume {
name = "ssh-key"
secret {
secret_name = "phpipam-pfsense-ssh"
default_mode = "0400"
}
}
}
}
}
}
}
lifecycle {
# KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config]
}
}
# CronJob: Import devices from remote sites (London + Valchedrym) via SSH
# Runs hourly — these networks are mostly static
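# Assumed raw formats from the remote routers (dnsmasq-style leases + Linux ARP; samples are illustrative):
#   /tmp/dhcp.leases: 1713456789 aa:bb:cc:dd:ee:ff 192.168.8.50 laptop 01:aa:bb:cc:dd:ee:ff
#   /proc/net/arp:    192.168.8.50  0x1  0x2  aa:bb:cc:dd:ee:ff  *  br-lan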
resource "kubernetes_cron_job_v1" "phpipam_remote_import" {
metadata {
name = "phpipam-remote-import"
namespace = kubernetes_namespace.phpipam.metadata[0].name
}
spec {
schedule = "0 * * * *"
successful_jobs_history_limit = 1
failed_jobs_history_limit = 3
concurrency_policy = "Forbid"
job_template {
metadata {}
spec {
backoff_limit = 1
template {
metadata {}
spec {
restart_policy = "Never"
container {
name = "import"
image = "alpine:3.21"
command = ["/bin/sh", "-c", <<-EOT
set -e
apk add --no-cache -q openssh-client mysql-client python3 > /dev/null 2>&1
mkdir -p /root/.ssh
cp /ssh/ssh_key /root/.ssh/id_rsa
chmod 600 /root/.ssh/id_rsa
echo "StrictHostKeyChecking no" > /root/.ssh/config
# Pull DHCP leases + ARP from Valchedrym via pfSense SSH hop
echo "=== Valchedrym (192.168.0.1 via pfSense) ==="
VALCHEDRYM=$$(ssh -o ConnectTimeout=10 admin@10.0.20.1 'timeout 15 ssh -o StrictHostKeyChecking=no -o ConnectTimeout=5 root@192.168.0.1 "cat /tmp/dhcp.leases 2>/dev/null; echo ---ARP---; cat /proc/net/arp 2>/dev/null" 2>/dev/null' 2>/dev/null || echo "")
echo "=== London (192.168.8.1 via pfSense) ==="
LONDON=$$(ssh -o ConnectTimeout=10 admin@10.0.20.1 'timeout 15 ssh -o StrictHostKeyChecking=no -o ConnectTimeout=5 root@192.168.8.1 "cat /tmp/dhcp.leases 2>/dev/null; echo ---ARP---; cat /proc/net/arp 2>/dev/null" 2>/dev/null' 2>/dev/null || echo "")
echo "=== Importing ==="
export LONDON_DATA="$$LONDON"
export VALCHEDRYM_DATA="$$VALCHEDRYM"
python3 << 'PYEOF'
import os, re, subprocess
db_host = os.environ["DB_HOST"]
db_user = os.environ["DB_USER"]
db_pass = os.environ["DB_PASS"]
db_name = os.environ["DB_NAME"]
def mysql_exec(sql):
    subprocess.run(["mysql", "-h", db_host, "-u", db_user, f"-p{db_pass}", db_name, "-N", "-B", "-e", sql], capture_output=True, text=True)
def get_existing():
    r = subprocess.run(["mysql", "-h", db_host, "-u", db_user, f"-p{db_pass}", db_name, "-N", "-B", "-e",
                        "SELECT INET_NTOA(ip_addr), hostname, mac, subnetId FROM ipaddresses WHERE subnetId IN (11, 12)"],
                       capture_output=True, text=True)
    existing = {}
    for line in r.stdout.strip().split("\n"):
        if not line: continue
        parts = line.split("\t")
        existing[parts[0]] = {"hostname": parts[1] if parts[1] != "NULL" else "", "mac": parts[2] if parts[2] != "NULL" else ""}
    return existing
def import_site(data, subnet_prefix, subnet_id, site_name):
    if not data or "---ARP---" not in data:
        print(f" {site_name}: no data")
        return 0
    existing = get_existing()
    dhcp_part, arp_part = data.split("---ARP---", 1)
    imported = 0
    # DHCP leases: timestamp mac ip hostname client_id
    for line in dhcp_part.strip().split("\n"):
        parts = line.split()
        if len(parts) < 4: continue
        mac, ip, hostname = parts[1], parts[2], parts[3]
        if not ip.startswith(subnet_prefix): continue
        short = hostname.split(".")[0] if hostname != "*" else ""
        if ip not in existing:
            mac_sql = f"'{mac}'" if mac else "NULL"
            host_sql = f"'{short}'" if short else "''"
            mysql_exec(f"INSERT INTO ipaddresses (ip_addr, subnetId, hostname, mac, description, lastSeen) VALUES (INET_ATON('{ip}'), {subnet_id}, {host_sql}, {mac_sql}, '-- {site_name} dhcp --', NOW())")
            imported += 1
            print(f" NEW {ip} -> {short} mac={mac}")
        else:
            updates = ["lastSeen=NOW()"]
            if mac and not existing[ip]["mac"]: updates.append(f"mac='{mac}'")
            if short and not existing[ip]["hostname"]: updates.append(f"hostname='{short}'")
            mysql_exec(f"UPDATE ipaddresses SET {','.join(updates)} WHERE ip_addr=INET_ATON('{ip}')")
    # ARP table
    for line in arp_part.strip().split("\n"):
        m = re.match(r'(\d+\.\d+\.\d+\.\d+)\s+\S+\s+\S+\s+([0-9a-f:]+)\s+', line)
        if not m: continue
        ip, mac = m.group(1), m.group(2)
        if not ip.startswith(subnet_prefix) or mac == "00:00:00:00:00:00": continue
        if ip in existing: continue
        mysql_exec(f"INSERT INTO ipaddresses (ip_addr, subnetId, mac, description, lastSeen) VALUES (INET_ATON('{ip}'), {subnet_id}, '{mac}', '-- {site_name} arp --', NOW())")
        imported += 1
        print(f" NEW (arp) {ip} mac={mac}")
    return imported
london = import_site(os.environ.get("LONDON_DATA", ""), "192.168.8.", 11, "london")
valchedrym = import_site(os.environ.get("VALCHEDRYM_DATA", ""), "192.168.0.", 12, "valchedrym")
print(f"\nLondon: {london} new, Valchedrym: {valchedrym} new")
PYEOF
echo "Remote import complete"
EOT
]
env {
name = "DB_HOST"
value = var.mysql_host
}
env {
name = "DB_USER"
value = "phpipam"
}
env {
name = "DB_PASS"
value_from {
secret_key_ref {
name = "phpipam-secrets"
key = "db_password"
}
}
}
env {
name = "DB_NAME"
value = "phpipam"
}
volume_mount {
name = "ssh-key"
mount_path = "/ssh"
read_only = true
}
resources {
requests = {
cpu = "10m"
memory = "64Mi"
}
limits = {
memory = "128Mi"
}
}
}
volume {
name = "ssh-key"
secret {
secret_name = "phpipam-pfsense-ssh"
default_mode = "0400"
}
}
}
}
}
}
}
lifecycle {
# KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config]
}
}