# infra/stacks/phpipam/main.tf
variable "tls_secret_name" {
type = string
sensitive = true
}
variable "mysql_host" { type = string }
data "vault_kv_secret_v2" "secrets" {
mount = "secret"
name = "platform"
}
locals {
technitium_password = data.vault_kv_secret_v2.secrets.data["technitium_password"]
}
resource "kubernetes_namespace" "phpipam" {
metadata {
name = "phpipam"
labels = {
tier = local.tiers.aux
"keel.sh/enrolled" = "true"
}
}
lifecycle {
# KYVERNO_LIFECYCLE_V1: goldilocks-vpa-auto-mode ClusterPolicy stamps this label on every namespace
ignore_changes = [metadata[0].labels["goldilocks.fairwinds.com/vpa-update-mode"]]
}
}
resource "kubernetes_manifest" "external_secret" {
manifest = {
apiVersion = "external-secrets.io/v1beta1"
kind = "ExternalSecret"
metadata = {
name = "phpipam-secrets"
namespace = "phpipam"
}
spec = {
refreshInterval = "15m"
secretStoreRef = {
name = "vault-database"
kind = "ClusterSecretStore"
}
target = {
name = "phpipam-secrets"
}
data = [{
secretKey = "db_password"
remoteRef = {
key = "static-creds/mysql-phpipam"
property = "password"
}
}]
}
}
depends_on = [kubernetes_namespace.phpipam]
}
resource "kubernetes_manifest" "external_secret_pfsense_ssh" {
manifest = {
apiVersion = "external-secrets.io/v1beta1"
kind = "ExternalSecret"
metadata = {
name = "phpipam-pfsense-ssh"
namespace = "phpipam"
}
spec = {
refreshInterval = "1h"
secretStoreRef = {
name = "vault-kv"
kind = "ClusterSecretStore"
}
target = {
name = "phpipam-pfsense-ssh"
}
data = [{
secretKey = "ssh_key"
remoteRef = {
key = "viktor"
property = "phpipam_pfsense_ssh_key"
}
}]
}
}
depends_on = [kubernetes_namespace.phpipam]
}
resource "kubernetes_manifest" "external_secret_admin" {
manifest = {
apiVersion = "external-secrets.io/v1beta1"
kind = "ExternalSecret"
metadata = {
name = "phpipam-admin-password"
namespace = "phpipam"
}
spec = {
refreshInterval = "1h"
secretStoreRef = {
name = "vault-kv"
kind = "ClusterSecretStore"
}
target = {
name = "phpipam-admin-password"
}
data = [{
secretKey = "password"
remoteRef = {
key = "viktor"
property = "phpipam_admin_password"
}
}]
}
}
depends_on = [kubernetes_namespace.phpipam]
}
module "tls_secret" {
source = "../../modules/kubernetes/setup_tls_secret"
namespace = kubernetes_namespace.phpipam.metadata[0].name
tls_secret_name = var.tls_secret_name
}
resource "kubernetes_deployment" "phpipam_web" {
metadata {
name = "phpipam-web"
namespace = kubernetes_namespace.phpipam.metadata[0].name
labels = {
app = "phpipam"
tier = local.tiers.aux
}
annotations = {
"reloader.stakater.com/auto" = "true"
}
}
spec {
replicas = 1
strategy {
type = "Recreate"
}
selector {
match_labels = {
app = "phpipam"
}
}
template {
metadata {
labels = {
app = "phpipam"
}
annotations = {
"diun.enable" = "true"
"dependency.kyverno.io/wait-for" = "mysql.dbaas:3306"
}
}
spec {
container {
image = "phpipam/phpipam-www:v1.7.4"
name = "phpipam-web"
port {
container_port = 80
}
env {
name = "TZ"
value = "Europe/Sofia"
}
env {
name = "IPAM_DATABASE_HOST"
value = var.mysql_host
}
env {
name = "IPAM_DATABASE_USER"
value = "phpipam"
}
env {
name = "IPAM_DATABASE_PASS"
value_from {
secret_key_ref {
name = "phpipam-secrets"
key = "db_password"
}
}
}
env {
name = "IPAM_DATABASE_NAME"
value = "phpipam"
}
env {
name = "IPAM_TRUST_X_FORWARDED"
value = "true"
}
resources {
requests = {
cpu = "10m"
memory = "64Mi"
}
limits = {
memory = "256Mi"
}
}
}
}
}
}
lifecycle {
ignore_changes = [
spec[0].template[0].spec[0].dns_config, # KYVERNO_LIFECYCLE_V1
metadata[0].annotations["keel.sh/policy"],
metadata[0].annotations["keel.sh/trigger"],
metadata[0].annotations["keel.sh/pollSchedule"], # KYVERNO_LIFECYCLE_V2
]
}
}
# phpipam-cron container removed — discovery now handled by phpipam-pfsense-import CronJob
# which queries Kea DHCP leases + pfSense ARP table directly (no fping needed)
resource "kubernetes_service" "phpipam" {
metadata {
name = "phpipam"
namespace = kubernetes_namespace.phpipam.metadata[0].name
labels = {
app = "phpipam"
}
}
spec {
selector = {
app = "phpipam"
}
port {
name = "http"
port = 80
target_port = 80
}
}
}
module "ingress" {
source = "../../modules/kubernetes/ingress_factory"
dns_type = "proxied"
namespace = kubernetes_namespace.phpipam.metadata[0].name
name = "phpipam"
tls_secret_name = var.tls_secret_name
auth = "required"
extra_annotations = {
"gethomepage.dev/enabled" = "true"
"gethomepage.dev/name" = "phpIPAM"
"gethomepage.dev/description" = "IP Address Management"
"gethomepage.dev/icon" = "phpipam.png"
"gethomepage.dev/group" = "Infrastructure"
"gethomepage.dev/pod-selector" = ""
}
}
# CronJob: Bidirectional sync between phpIPAM and Technitium DNS
# 1. Push: named phpIPAM hosts → Technitium A + PTR records
# 2. Pull: Technitium reverse DNS → phpIPAM hostnames for unnamed entries
resource "kubernetes_cron_job_v1" "phpipam_dns_sync" {
metadata {
name = "phpipam-dns-sync"
namespace = kubernetes_namespace.phpipam.metadata[0].name
}
spec {
schedule = "*/15 * * * *"
successful_jobs_history_limit = 1
failed_jobs_history_limit = 3
concurrency_policy = "Forbid"
job_template {
metadata {}
spec {
backoff_limit = 1
template {
metadata {}
spec {
restart_policy = "Never"
container {
name = "sync"
image = "mysql:8.0"
command = ["/bin/bash", "-c", <<-EOT
set -e
TECH_URL="http://technitium-web.technitium.svc.cluster.local:5380"
# Login to Technitium
TECH_TOKEN=$$(curl -sf "$$TECH_URL/api/user/login?user=admin&pass=$$TECH_PASS" | sed 's/.*"token":"\([^"]*\)".*/\1/')
if [ -z "$$TECH_TOKEN" ]; then echo "Technitium login failed"; exit 1; fi
echo "Technitium auth OK"
# Query phpIPAM MySQL directly for hosts with hostnames
HOSTS=$$(mysql -h $$DB_HOST -u $$DB_USER -p$$DB_PASS $$DB_NAME -N -B -e \
"SELECT INET_NTOA(ip_addr), hostname FROM ipaddresses WHERE hostname != '' AND hostname IS NOT NULL AND subnetId >= 7")
SYNCED=0
echo "$$HOSTS" | while IFS=$$'\t' read -r IP HOSTNAME; do
[ -z "$$IP" ] || [ -z "$$HOSTNAME" ] && continue
SHORT=$$(echo "$$HOSTNAME" | cut -d. -f1)
FQDN="$$SHORT.viktorbarzin.lan"
# A record
curl -sf -o /dev/null -X POST "$$TECH_URL/api/zones/records/add?token=$$TECH_TOKEN" \
-d "domain=$$FQDN&zone=viktorbarzin.lan&type=A&ipAddress=$$IP&overwrite=true&ttl=300"
# PTR record
O1=$$(echo $$IP | cut -d. -f1); O2=$$(echo $$IP | cut -d. -f2)
O3=$$(echo $$IP | cut -d. -f3); O4=$$(echo $$IP | cut -d. -f4)
curl -sf -o /dev/null -X POST "$$TECH_URL/api/zones/records/add?token=$$TECH_TOKEN" \
-d "domain=$$O4.$$O3.$$O2.$$O1.in-addr.arpa&zone=$$O3.$$O2.$$O1.in-addr.arpa&type=PTR&ptrName=$$FQDN&overwrite=true&ttl=300" 2>/dev/null || true
SYNCED=$$((SYNCED + 1))
echo " $$IP -> $$FQDN"
done
echo "Push sync complete"
# Reverse sync: pull hostnames from DNS into phpIPAM for unnamed entries
echo ""
echo "=== Reverse sync: DNS -> phpIPAM ==="
UNNAMED=$$(mysql -h $$DB_HOST -u $$DB_USER -p$$DB_PASS $$DB_NAME -N -B -e \
"SELECT id, INET_NTOA(ip_addr) FROM ipaddresses WHERE (hostname IS NULL OR hostname = '') AND subnetId >= 7")
echo "$$UNNAMED" | while IFS=$$'\t' read -r ID IP; do
[ -z "$$ID" ] || [ -z "$$IP" ] && continue
# Query Technitium for PTR record
O1=$$(echo $$IP | cut -d. -f1); O2=$$(echo $$IP | cut -d. -f2)
O3=$$(echo $$IP | cut -d. -f3); O4=$$(echo $$IP | cut -d. -f4)
PTR_NAME="$$O4.$$O3.$$O2.$$O1.in-addr.arpa"
REV_ZONE="$$O3.$$O2.$$O1.in-addr.arpa"
# A failed lookup must not abort the whole run under `set -e`; treat it as "no PTR record"
RESULT=$$(curl -sf "$$TECH_URL/api/zones/records/get?token=$$TECH_TOKEN&domain=$$PTR_NAME&zone=$$REV_ZONE&type=PTR" 2>/dev/null || true)
HOSTNAME=$$(echo "$$RESULT" | sed -n 's/.*"ptrName":"\([^"]*\)".*/\1/p' | head -1)
[ -z "$$HOSTNAME" ] && continue
# Extract short name; keep only characters that are safe to embed in the SQL UPDATE below
SHORT=$$(echo "$$HOSTNAME" | cut -d. -f1 | tr -cd 'A-Za-z0-9_-')
[ -z "$$SHORT" ] && continue
# Update phpIPAM
mysql -h $$DB_HOST -u $$DB_USER -p$$DB_PASS $$DB_NAME -e \
"UPDATE ipaddresses SET hostname='$$SHORT' WHERE id=$$ID AND (hostname IS NULL OR hostname = '')"
echo " $$IP -> $$SHORT (from DNS)"
done
echo "Bidirectional sync complete"
EOT
]
env {
name = "TECH_PASS"
value = local.technitium_password
}
env {
name = "DB_HOST"
value = var.mysql_host
}
env {
name = "DB_USER"
value = "phpipam"
}
env {
name = "DB_PASS"
value_from {
secret_key_ref {
name = "phpipam-secrets"
key = "db_password"
}
}
}
env {
name = "DB_NAME"
value = "phpipam"
}
resources {
requests = {
cpu = "10m"
memory = "32Mi"
}
limits = {
memory = "128Mi"
}
}
}
}
}
}
}
}
lifecycle {
# KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config]
}
}
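# Sketch only, not referenced by any resource in this stack: the PTR record name
# and reverse zone that the sync script above assembles with `cut`, expressed with
# Terraform functions. The `sketch_*` local names and the sample address are
# hypothetical. As in the script, this assumes one reverse zone per /24.
locals {
  sketch_ip     = "10.0.20.5"
  sketch_octets = split(".", local.sketch_ip)
  # All four octets reversed: 5.20.0.10.in-addr.arpa
  sketch_ptr_name = "${join(".", reverse(local.sketch_octets))}.in-addr.arpa"
  # First three octets reversed: 20.0.10.in-addr.arpa
  sketch_rev_zone = "${join(".", reverse(slice(local.sketch_octets, 0, 3)))}.in-addr.arpa"
}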
# CronJob: Import devices from pfSense (Kea DHCP leases + ARP table) into phpIPAM
# Replaces active fping scanning with passive data from pfSense
resource "kubernetes_cron_job_v1" "phpipam_pfsense_import" {
metadata {
name = "phpipam-pfsense-import"
namespace = kubernetes_namespace.phpipam.metadata[0].name
}
spec {
schedule = "0 * * * *"
successful_jobs_history_limit = 1
failed_jobs_history_limit = 3
concurrency_policy = "Forbid"
job_template {
metadata {}
spec {
backoff_limit = 1
template {
metadata {}
spec {
restart_policy = "Never"
container {
name = "import"
image = "alpine:3.21"
command = ["/bin/sh", "-c", <<-EOT
set -e
apk add --no-cache -q openssh-client mysql-client python3 > /dev/null 2>&1
# Setup SSH key
mkdir -p /root/.ssh
cp /ssh/ssh_key /root/.ssh/id_rsa
chmod 600 /root/.ssh/id_rsa
echo "StrictHostKeyChecking no" > /root/.ssh/config
# 1. Get Kea DHCP leases via control socket
echo "=== Fetching Kea leases ==="
LEASES=$$(ssh admin@10.0.20.1 'echo "{\"command\": \"lease4-get-all\"}" | /usr/bin/nc -U /tmp/kea4-ctrl-socket 2>/dev/null')
# 2. Get ARP table
echo "=== Fetching ARP table ==="
ARP=$$(ssh admin@10.0.20.1 'arp -an' 2>/dev/null)
# Remote sites handled by phpipam-remote-import CronJob (hourly)
# 3. Parse and import into phpIPAM MySQL
echo "=== Importing into phpIPAM ==="
export LEASES_DATA="$$LEASES"
export ARP_DATA="$$ARP"
python3 << 'PYEOF'
import json, subprocess, sys, re, os
db_host = os.environ["DB_HOST"]
db_user = os.environ["DB_USER"]
db_pass = os.environ["DB_PASS"]
db_name = os.environ["DB_NAME"]
def mysql_exec(sql):
    r = subprocess.run(
        ["mysql", "-h", db_host, "-u", db_user, f"-p{db_pass}", db_name, "-N", "-B", "-e", sql],
        capture_output=True, text=True
    )
    return r.stdout.strip()
# Get existing phpIPAM entries (subnetId >= 7 = our subnets)
existing = {}
rows = mysql_exec("SELECT INET_NTOA(ip_addr), hostname, mac, subnetId FROM ipaddresses WHERE subnetId >= 7")
for line in rows.split("\n"):
    if not line: continue
    parts = line.split("\t")
    existing[parts[0]] = {"hostname": parts[1] if parts[1] != "NULL" else "", "mac": parts[2] if parts[2] != "NULL" else "", "subnetId": parts[3]}
# Subnet mapping
def get_subnet_id(ip):
    if ip.startswith("10.0.10."): return 7
    if ip.startswith("10.0.20."): return 8
    if ip.startswith("192.168.1."): return 9
    if ip.startswith("10.3.2."): return 10
    if ip.startswith("192.168.8."): return 11
    if ip.startswith("192.168.0."): return 12
    return None
# Parse Kea leases
leases_raw = os.environ.get("LEASES_DATA", "{}")
try:
    leases_json = json.loads(leases_raw)
    leases = leases_json.get("arguments", {}).get("leases", []) if isinstance(leases_json, dict) else leases_json[0].get("arguments", {}).get("leases", [])
except (ValueError, AttributeError, IndexError, KeyError, TypeError):
    # Empty or malformed Kea response: continue with the ARP table only
    leases = []
imported = 0
updated_mac = 0
updated_hostname = 0
for lease in leases:
    ip = lease["ip-address"]
    mac = lease.get("hw-address", "")
    # Strip .viktorbarzin.lan and drop characters that are unsafe to interpolate into SQL
    # (the hostname is supplied by the DHCP client)
    hostname = re.sub(r"[^A-Za-z0-9_-]", "", lease.get("hostname", "").split(".")[0])
    subnet_id = get_subnet_id(ip)
    if not subnet_id: continue
    if ip not in existing:
        # New host: insert
        mac_sql = f"'{mac}'" if mac else "NULL"
        host_sql = f"'{hostname}'" if hostname else "''"
        mysql_exec(f"INSERT INTO ipaddresses (ip_addr, subnetId, hostname, mac, description, lastSeen) VALUES (INET_ATON('{ip}'), {subnet_id}, {host_sql}, {mac_sql}, '-- kea lease --', NOW())")
        imported += 1
        print(f" NEW {ip} -> {hostname} mac={mac}")
    else:
        # Existing host: update MAC if missing, hostname if missing, lastSeen always
        updates = ["lastSeen=NOW()"]
        if mac and not existing[ip]["mac"]:
            updates.append(f"mac='{mac}'")
            updated_mac += 1
        if hostname and not existing[ip]["hostname"]:
            updates.append(f"hostname='{hostname}'")
            updated_hostname += 1
        mysql_exec(f"UPDATE ipaddresses SET {','.join(updates)} WHERE ip_addr=INET_ATON('{ip}')")
# Parse ARP table for devices not in Kea (static IPs)
arp_raw = os.environ.get("ARP_DATA", "")
lease_ips = {l["ip-address"] for l in leases}
for line in arp_raw.split("\n"):
    m = re.match(r'\? \((\d+\.\d+\.\d+\.\d+)\) at ([0-9a-f:]+) on', line)
    if not m: continue
    ip, mac = m.group(1), m.group(2)
    if mac == "(incomplete)": continue
    subnet_id = get_subnet_id(ip)
    if not subnet_id: continue
    if ip in lease_ips: continue # already handled by Kea
    if ip not in existing:
        mysql_exec(f"INSERT INTO ipaddresses (ip_addr, subnetId, mac, description, lastSeen) VALUES (INET_ATON('{ip}'), {subnet_id}, '{mac}', '-- arp discovered --', NOW())")
        imported += 1
        print(f" NEW (arp) {ip} mac={mac}")
    else:
        updates = ["lastSeen=NOW()"]
        if mac and not existing[ip]["mac"]:
            updates.append(f"mac='{mac}'")
            updated_mac += 1
        mysql_exec(f"UPDATE ipaddresses SET {','.join(updates)} WHERE ip_addr=INET_ATON('{ip}')")
print(f"\nImported: {imported} new, Updated: {updated_mac} MACs, {updated_hostname} hostnames")
PYEOF
echo "Import complete"
EOT
]
env {
name = "DB_HOST"
value = var.mysql_host
}
env {
name = "DB_USER"
value = "phpipam"
}
env {
name = "DB_PASS"
value_from {
secret_key_ref {
name = "phpipam-secrets"
key = "db_password"
}
}
}
env {
name = "DB_NAME"
value = "phpipam"
}
volume_mount {
name = "ssh-key"
mount_path = "/ssh"
read_only = true
}
resources {
requests = {
cpu = "10m"
memory = "64Mi"
}
limits = {
memory = "128Mi"
}
}
}
volume {
name = "ssh-key"
secret {
secret_name = "phpipam-pfsense-ssh"
default_mode = "0400"
}
}
}
}
}
}
}
lifecycle {
# KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config]
}
}
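# Sketch only, not referenced elsewhere: the prefix-to-subnetId table that the
# import script above hard-codes in get_subnet_id(), written as a Terraform map.
# Assumptions: every prefix is a /24 (the script matches on the first three octets
# only), and `sketch_subnet_ids`, `sketch_lookup_ip`, and `sketch_subnet_id` are
# hypothetical names.
locals {
  sketch_subnet_ids = {
    "10.0.10"   = 7
    "10.0.20"   = 8
    "192.168.1" = 9
    "10.3.2"    = 10
    "192.168.8" = 11
    "192.168.0" = 12
  }
  sketch_lookup_ip = "10.0.20.5"
  # The first three octets select the subnet; an unknown prefix yields null,
  # mirroring the script's `return None`.
  sketch_subnet_id = lookup(
    local.sketch_subnet_ids,
    join(".", slice(split(".", local.sketch_lookup_ip), 0, 3)),
    null
  )
}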
# CronJob: Import devices from remote sites (London + Valchedrym) via SSH
# Runs hourly — these networks are mostly static
resource "kubernetes_cron_job_v1" "phpipam_remote_import" {
metadata {
name = "phpipam-remote-import"
namespace = kubernetes_namespace.phpipam.metadata[0].name
}
spec {
schedule = "0 * * * *"
successful_jobs_history_limit = 1
failed_jobs_history_limit = 3
concurrency_policy = "Forbid"
job_template {
metadata {}
spec {
backoff_limit = 1
template {
metadata {}
spec {
restart_policy = "Never"
container {
name = "import"
image = "alpine:3.21"
command = ["/bin/sh", "-c", <<-EOT
set -e
apk add --no-cache -q openssh-client mysql-client python3 > /dev/null 2>&1
mkdir -p /root/.ssh
cp /ssh/ssh_key /root/.ssh/id_rsa
chmod 600 /root/.ssh/id_rsa
echo "StrictHostKeyChecking no" > /root/.ssh/config
# Pull DHCP leases + ARP from Valchedrym via pfSense SSH hop
echo "=== Valchedrym (192.168.0.1 via pfSense) ==="
# HCL heredocs only treat a dollar sign followed by an opening brace as interpolation,
# so $(...) and $VAR reach the shell unchanged and need no doubling.
VALCHEDRYM=$(ssh -o ConnectTimeout=10 admin@10.0.20.1 'timeout 15 ssh -o StrictHostKeyChecking=no -o ConnectTimeout=5 root@192.168.0.1 "cat /tmp/dhcp.leases 2>/dev/null; echo ---ARP---; cat /proc/net/arp 2>/dev/null" 2>/dev/null' 2>/dev/null || echo "")
echo "=== London (192.168.8.1 via pfSense) ==="
LONDON=$(ssh -o ConnectTimeout=10 admin@10.0.20.1 'timeout 15 ssh -o StrictHostKeyChecking=no -o ConnectTimeout=5 root@192.168.8.1 "cat /tmp/dhcp.leases 2>/dev/null; echo ---ARP---; cat /proc/net/arp 2>/dev/null" 2>/dev/null' 2>/dev/null || echo "")
echo "=== Importing ==="
export LONDON_DATA="$LONDON"
export VALCHEDRYM_DATA="$VALCHEDRYM"
python3 << 'PYEOF'
import os, re, subprocess

db_host = os.environ["DB_HOST"]
db_user = os.environ["DB_USER"]
db_pass = os.environ["DB_PASS"]
db_name = os.environ["DB_NAME"]

def mysql_exec(sql):
    subprocess.run(["mysql", "-h", db_host, "-u", db_user, f"-p{db_pass}", db_name, "-N", "-B", "-e", sql], capture_output=True, text=True)

def get_existing():
    r = subprocess.run(["mysql", "-h", db_host, "-u", db_user, f"-p{db_pass}", db_name, "-N", "-B", "-e",
        "SELECT INET_NTOA(ip_addr), hostname, mac, subnetId FROM ipaddresses WHERE subnetId IN (11, 12)"],
        capture_output=True, text=True)
    existing = {}
    for line in r.stdout.strip().split("\n"):
        if not line: continue
        parts = line.split("\t")
        existing[parts[0]] = {"hostname": parts[1] if parts[1] != "NULL" else "", "mac": parts[2] if parts[2] != "NULL" else ""}
    return existing

def import_site(data, subnet_prefix, subnet_id, site_name):
    if not data or "---ARP---" not in data:
        print(f" {site_name}: no data")
        return 0
    existing = get_existing()
    dhcp_part, arp_part = data.split("---ARP---", 1)
    imported = 0
    # DHCP leases: timestamp mac ip hostname client_id
    for line in dhcp_part.strip().split("\n"):
        parts = line.split()
        if len(parts) < 4: continue
        mac, ip, hostname = parts[1], parts[2], parts[3]
        if not ip.startswith(subnet_prefix): continue
        short = hostname.split(".")[0] if hostname != "*" else ""
        if ip not in existing:
            mac_sql = f"'{mac}'" if mac else "NULL"
            host_sql = f"'{short}'" if short else "''"
            mysql_exec(f"INSERT INTO ipaddresses (ip_addr, subnetId, hostname, mac, description, lastSeen) VALUES (INET_ATON('{ip}'), {subnet_id}, {host_sql}, {mac_sql}, '-- {site_name} dhcp --', NOW())")
            imported += 1
            print(f" NEW {ip} -> {short} mac={mac}")
        else:
            updates = ["lastSeen=NOW()"]
            if mac and not existing[ip]["mac"]: updates.append(f"mac='{mac}'")
            if short and not existing[ip]["hostname"]: updates.append(f"hostname='{short}'")
            mysql_exec(f"UPDATE ipaddresses SET {','.join(updates)} WHERE ip_addr=INET_ATON('{ip}')")
    # ARP table
    for line in arp_part.strip().split("\n"):
        m = re.match(r'(\d+\.\d+\.\d+\.\d+)\s+\S+\s+\S+\s+([0-9a-f:]+)\s+', line)
        if not m: continue
        ip, mac = m.group(1), m.group(2)
        if not ip.startswith(subnet_prefix) or mac == "00:00:00:00:00:00": continue
        if ip in existing: continue
        mysql_exec(f"INSERT INTO ipaddresses (ip_addr, subnetId, mac, description, lastSeen) VALUES (INET_ATON('{ip}'), {subnet_id}, '{mac}', '-- {site_name} arp --', NOW())")
        imported += 1
        print(f" NEW (arp) {ip} mac={mac}")
    return imported

london = import_site(os.environ.get("LONDON_DATA", ""), "192.168.8.", 11, "london")
valchedrym = import_site(os.environ.get("VALCHEDRYM_DATA", ""), "192.168.0.", 12, "valchedrym")
print(f"\nLondon: {london} new, Valchedrym: {valchedrym} new")
PYEOF
echo "Remote import complete"
EOT
              ]
              env {
                name  = "DB_HOST"
                value = var.mysql_host
              }
              env {
                name  = "DB_USER"
                value = "phpipam"
              }
              env {
                name = "DB_PASS"
                value_from {
                  secret_key_ref {
                    name = "phpipam-secrets"
                    key  = "db_password"
                  }
                }
              }
              env {
                name  = "DB_NAME"
                value = "phpipam"
              }
              volume_mount {
                name       = "ssh-key"
                mount_path = "/ssh"
                read_only  = true
              }
              resources {
                requests = {
                  cpu    = "10m"
                  memory = "64Mi"
                }
                limits = {
                  memory = "128Mi"
                }
              }
            }
            volume {
              name = "ssh-key"
              secret {
                secret_name  = "phpipam-pfsense-ssh"
                default_mode = "0400"
              }
            }
          }
        }
      }
    }
  }
  lifecycle {
    # KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
    ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config]
  }
}
# CI retrigger 2026-05-16T13:42:57+00:00 — bulk enrollment apply (pipeline #689 killed)
# CI retrigger v2 2026-05-16T13:46:35+00:00