From 8cd87431403ba12e6e8f5b43b594057998a70e35 Mon Sep 17 00:00:00 2001
From: Viktor Barzin
Date: Fri, 10 Apr 2026 16:01:32 +0000
Subject: [PATCH] docs: add phpIPAM, Kea DDNS, and DNS sync documentation

- networking.md: Add phpIPAM IPAM section, Kea DDNS config, reverse DNS zones, Technitium dynamic update policy
- CLAUDE.md: Add phpipam to DB rotation list, service notes, networking section
- service-catalog.md: Add phpipam, mark netbox as disabled/replaced

[ci skip]

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 .claude/CLAUDE.md                    |  4 +++-
 .claude/reference/service-catalog.md |  5 +++--
 docs/architecture/networking.md      | 18 +++++++++++++++---
 3 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md
index e51f539b..5d9d7784 100755
--- a/.claude/CLAUDE.md
+++ b/.claude/CLAUDE.md
@@ -55,7 +55,7 @@ Violations cause state drift, which causes future applies to break or silently r
 - **ESO (External Secrets Operator)**: `stacks/external-secrets/` — 43 ExternalSecrets + 9 DB-creds ExternalSecrets. API version `v1beta1`. Two ClusterSecretStores: `vault-kv` and `vault-database`.
 - **Plan-time pattern**: Former plan-time stacks use `data "kubernetes_secret"` to read ESO-created K8s Secrets at plan time (no Vault dependency). First-apply gotcha: must `terragrunt apply -target=kubernetes_manifest.external_secret` first, then full apply. `count` on resources using secret values fails — remove conditional counts.
 - **14 hybrid stacks** still keep `data "vault_kv_secret_v2"` for plan-time needs (job commands, Helm templatefile, module inputs). Platform has 48 plan-time refs — no migration possible without restructuring modules.
-- **Database rotation**: Vault DB engine rotates passwords every 7 days (604800s). MySQL: speedtest, wrongmove, codimd, nextcloud, shlink, grafana, technitium. PostgreSQL: health, linkwarden, affine, woodpecker, claude_memory. Excluded: authentik (PgBouncer), crowdsec (Helm-baked), root users. Technitium uses a password-sync CronJob (every 6h) to push rotated password to the Technitium app config via API.
+- **Database rotation**: Vault DB engine rotates passwords every 7 days (604800s). MySQL: speedtest, wrongmove, codimd, nextcloud, shlink, grafana, technitium, phpipam. PostgreSQL: health, linkwarden, affine, woodpecker, claude_memory. Excluded: authentik (PgBouncer), crowdsec (Helm-baked), root users. Technitium uses a password-sync CronJob (every 6h) to push rotated password to the Technitium app config via API.
 - **K8s credentials**: Vault K8s secrets engine. Roles: `dashboard-admin`, `ci-deployer`, `openclaw`, `local-admin`. Use `vault write kubernetes/creds/ROLE kubernetes_namespace=NS`. Helper: `scripts/vault-kubeconfig`.
 - **CI/CD (GHA + Woodpecker)**: Docker builds run on **GitHub Actions** (free on public repos). Woodpecker is **deploy-only** — receives image tag via API POST, runs `kubectl set image`. Woodpecker authenticates via K8s SA JWT → Vault K8s auth. Sync CronJob pushes `secret/ci/global` → Woodpecker API every 6h. Shell scripts in HCL heredocs: escape `$` → `$$`, `%{}` → `%%{}`.
 - **Platform cannot depend on vault** (circular). Apply order: vault first, then platform. Platform has 48 vault refs, all in module inputs — no ESO migration possible.
@@ -112,6 +112,7 @@ Repo IDs: infra=1, Website=2, finance=3, health=4, travel_blog=5, webhook-handle
 - **Rate limiting**: Return 429 (not 503). Per-service tuning: Immich/Nextcloud need higher limits.
 - **Retry middleware**: 2 attempts, 100ms — in default ingress chain.
 - **HTTP/3 (QUIC)**: Enabled cluster-wide via Traefik.
+- **IPAM & DNS auto-registration**: phpIPAM discovers hosts (fping every 15min). Kea DDNS on pfSense auto-registers VLAN 10/20 hosts in Technitium (RFC 2136). CronJob `phpipam-dns-sync` syncs remaining named hosts (192.168.1.x, VPN) → Technitium A+PTR records every 15min. Technitium zones accept dynamic updates from pfSense IPs.
 
 ## Service-Specific Notes
 | Service | Key Operational Knowledge |
@@ -123,6 +124,7 @@ Repo IDs: infra=1, Website=2, finance=3, health=4, travel_blog=5, webhook-handle
 | Authentik | 3 replicas, PgBouncer in front of PostgreSQL, strip auth headers before forwarding |
 | Kyverno | failurePolicy=Ignore to prevent blocking cluster, pin chart version |
 | MySQL InnoDB | Enable auto-recovery, anti-affinity excludes k8s-node1 (GPU), 2Gi req / 3Gi limit |
+| phpIPAM | IPAM with auto-discovery (fping every 15min). DNS sync CronJob pushes named hosts → Technitium. Kea DDNS handles VLAN 10/20; CronJob handles 192.168.1.x. API app `claude` (ssl_token). Cron container needs NET_RAW + 512Mi. |
 
 ## Monitoring & Alerting
 - Alert cascade inhibitions: if node is down, suppress pod alerts on that node.
diff --git a/.claude/reference/service-catalog.md b/.claude/reference/service-catalog.md
index ce15d665..c5d8e9d6 100644
--- a/.claude/reference/service-catalog.md
+++ b/.claude/reference/service-catalog.md
@@ -95,7 +95,8 @@
 | stirling-pdf | PDF tools | stirling-pdf |
 | speedtest | Speed testing | speedtest |
 | freedify | Music streaming (factory pattern) | freedify |
-| netbox | Network documentation | netbox |
+| phpipam | IP Address Management (IPAM) + auto-discovery | phpipam |
+| ~~netbox~~ | ~~Network documentation~~ (disabled, replaced by phpipam) | netbox |
 | infra-maintenance | Maintenance jobs | infra-maintenance |
 | ollama | LLM server (GPU) | ollama |
 | frigate | NVR/camera (GPU) | frigate |
@@ -117,7 +118,7 @@
 blog, hackmd, privatebin, url, echo, f1tv, excalidraw, send,
 audiobookshelf, jsoncrack, ntfy, cyberchef, homepage, linkwarden,
 changedetection, tandoor, n8n, stirling-pdf, dashy, city-guesser,
-travel, netbox
+travel, netbox, phpipam
 ```
 
 ### Non-Proxied (Direct DNS)
diff --git a/docs/architecture/networking.md b/docs/architecture/networking.md
index 6313229f..5fd42b4d 100644
--- a/docs/architecture/networking.md
+++ b/docs/architecture/networking.md
@@ -1,6 +1,6 @@
 # Networking Architecture
 
-Last updated: 2026-04-08
+Last updated: 2026-04-10
 
 ## Overview
 
@@ -248,15 +248,27 @@ Containerd on all K8s nodes uses `hosts.toml` to redirect pulls to the local cac
 ### Key Configuration Files
 
 **pfSense**:
-- Terraform: `stacks/pfsense/main.tf`
-- DHCP scope: 10.0.20.50-250 (VLAN 20)
+- Config: Not Terraform-managed (pfSense web UI / config.xml)
+- DHCP: Kea DHCP4 on VLAN 10 (10.0.10.0/24) and VLAN 20 (10.0.20.0/24)
+- DHCP DDNS: Kea DHCP-DDNS sends RFC 2136 updates to Technitium on lease grant
 - Firewall rules: Allow K8s egress, block inter-VLAN by default
 
 **Technitium**:
 - Config: Stored in PVC `technitium-data`
 - Zone file: `viktorbarzin.lan` (A records for all internal hosts)
+- Reverse zones: `10.0.10.in-addr.arpa`, `20.0.10.in-addr.arpa`, `1.168.192.in-addr.arpa`, `2.3.10.in-addr.arpa`, `0.168.192.in-addr.arpa`
+- Dynamic updates: Enabled (UseSpecifiedNetworkACL) from pfSense IPs (10.0.20.1, 10.0.10.1)
 - Forwarders: Cloudflare 1.1.1.1, Google 8.8.8.8
 
+**phpIPAM (IP Address Management)**:
+- Stack: `stacks/phpipam/`
+- Web UI: `phpipam.viktorbarzin.me` (Authentik-protected)
+- Database: MySQL InnoDB cluster (`mysql.dbaas.svc.cluster.local`)
+- Auto-discovery: fping scan every 15min via `phpipam-cron` container
+- Subnets tracked: 10.0.10.0/24, 10.0.20.0/24, 192.168.1.0/24, 10.3.2.0/24, 192.168.8.0/24, 192.168.0.0/24
+- DNS sync: CronJob `phpipam-dns-sync` pushes named hosts → Technitium A+PTR records every 15min
+- API: REST API enabled (app `claude`, ssl_token auth), MCP server available for agent access
+
 **Traefik Middleware**:
 - Helm values: `stacks/platform/traefik-values.yaml`
 - Middleware CRDs: Generated by `ingress_factory` module
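
Reviewer note on the A+PTR sync described in the networking.md hunk: the sketch below shows how a named host could map to an A record in `viktorbarzin.lan` and a PTR record in one of the listed reverse zones. It is not the actual `phpipam-dns-sync` job, whose implementation is not part of this patch. The function names, the record tuple shape, and the hostname `nas` are hypothetical, and only the Python standard library is used.

```python
import ipaddress

# Reverse zones exactly as listed in the Technitium section of this patch.
REVERSE_ZONES = [
    "10.0.10.in-addr.arpa",
    "20.0.10.in-addr.arpa",
    "1.168.192.in-addr.arpa",
    "2.3.10.in-addr.arpa",
    "0.168.192.in-addr.arpa",
]

# Forward zone named in the patch.
FORWARD_ZONE = "viktorbarzin.lan"


def ptr_name(ip: str) -> str:
    """Return the PTR owner name for an IP address (e.g. x.y.z.w.in-addr.arpa)."""
    return ipaddress.ip_address(ip).reverse_pointer


def reverse_zone_for(ip: str, zones=REVERSE_ZONES):
    """Return the longest listed zone that contains the PTR name, or None."""
    name = ptr_name(ip)
    # The leading dot keeps "10.168.192..." from matching zone "0.168.192...".
    matches = [z for z in zones if name == z or name.endswith("." + z)]
    return max(matches, key=len) if matches else None


def records_for(hostname: str, ip: str):
    """Build (owner, type, value) tuples a sync job might push (hypothetical shape)."""
    fqdn = f"{hostname}.{FORWARD_ZONE}"
    records = [(fqdn, "A", ip)]
    # Only emit a PTR when a matching reverse zone exists to hold it.
    if reverse_zone_for(ip) is not None:
        records.append((ptr_name(ip), "PTR", fqdn + "."))
    return records


if __name__ == "__main__":
    for host, ip in [("nas", "192.168.1.10"), ("nas", "192.168.8.3")]:
        print(ip, reverse_zone_for(ip), records_for(host, ip))
```

With the zones as listed, an address in 192.168.8.0/24 (one of the tracked subnets) matches no reverse zone, so this sketch would produce only an A record for it. Whether that subnet is meant to have PTR records is worth confirming.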