# infra/stacks/ebooks/main.tf

variable "tls_secret_name" {
type = string
sensitive = true
}
variable "nfs_server" { type = string }
resource "kubernetes_namespace" "ebooks" {
metadata {
name = "ebooks"
labels = {
tier = local.tiers.edge
}
}
lifecycle {
# KYVERNO_LIFECYCLE_V1: goldilocks-vpa-auto-mode ClusterPolicy stamps this label on every namespace
ignore_changes = [metadata[0].labels["goldilocks.fairwinds.com/vpa-update-mode"]]
}
}
# ExternalSecrets for all three sources
resource "kubernetes_manifest" "calibre_external_secret" {
manifest = {
apiVersion = "external-secrets.io/v1beta1"
kind = "ExternalSecret"
metadata = {
name = "calibre-secrets"
namespace = "ebooks"
}
spec = {
refreshInterval = "15m"
secretStoreRef = {
name = "vault-kv"
kind = "ClusterSecretStore"
}
target = {
name = "calibre-secrets"
}
dataFrom = [{
extract = {
key = "calibre"
}
}]
}
}
depends_on = [kubernetes_namespace.ebooks]
}
resource "kubernetes_manifest" "audiobookshelf_external_secret" {
manifest = {
apiVersion = "external-secrets.io/v1beta1"
kind = "ExternalSecret"
metadata = {
name = "audiobookshelf-secrets"
namespace = "ebooks"
}
spec = {
refreshInterval = "15m"
secretStoreRef = {
name = "vault-kv"
kind = "ClusterSecretStore"
}
target = {
name = "audiobookshelf-secrets"
}
dataFrom = [{
extract = {
key = "audiobookshelf"
}
}]
}
}
depends_on = [kubernetes_namespace.ebooks]
}
resource "kubernetes_manifest" "servarr_external_secret" {
manifest = {
apiVersion = "external-secrets.io/v1beta1"
kind = "ExternalSecret"
metadata = {
name = "servarr-secrets"
namespace = "ebooks"
}
spec = {
refreshInterval = "15m"
secretStoreRef = {
name = "vault-kv"
kind = "ClusterSecretStore"
}
target = {
name = "servarr-secrets"
}
dataFrom = [{
extract = {
key = "servarr"
}
}]
}
}
depends_on = [kubernetes_namespace.ebooks]
}
# Data sources to read ExternalSecret-created secrets
data "kubernetes_secret" "calibre_secrets" {
metadata {
name = "calibre-secrets"
namespace = kubernetes_namespace.ebooks.metadata[0].name
}
depends_on = [kubernetes_manifest.calibre_external_secret]
}
data "kubernetes_secret" "audiobookshelf_secrets" {
metadata {
name = "audiobookshelf-secrets"
namespace = kubernetes_namespace.ebooks.metadata[0].name
}
depends_on = [kubernetes_manifest.audiobookshelf_external_secret]
}
data "kubernetes_secret" "servarr_secrets" {
metadata {
name = "servarr-secrets"
namespace = kubernetes_namespace.ebooks.metadata[0].name
}
depends_on = [kubernetes_manifest.servarr_external_secret]
}
locals {
calibre_homepage_credentials = jsondecode(data.kubernetes_secret.calibre_secrets.data["homepage_credentials"])
audiobookshelf_homepage_credentials = jsondecode(data.kubernetes_secret.audiobookshelf_secrets.data["homepage_credentials"])
}
module "tls_secret" {
source = "../../modules/kubernetes/setup_tls_secret"
namespace = kubernetes_namespace.ebooks.metadata[0].name
tls_secret_name = var.tls_secret_name
}
# NFS Volumes - Calibre (prefixed with ebooks- to avoid PV name clash with old stacks)
module "nfs_calibre_library_host" {
source = "../../modules/kubernetes/nfs_volume"
name = "ebooks-calibre-library-host"
namespace = kubernetes_namespace.ebooks.metadata[0].name
nfs_server = "192.168.1.127"
nfs_path = "/srv/nfs/calibre-web-automated/calibre-library"
}
# iSCSI volume for config (SQLite DBs) - enables WAL mode for concurrent reads/writes
resource "kubernetes_persistent_volume_claim" "calibre_config_iscsi" {
metadata {
name = "ebooks-calibre-config-proxmox"
namespace = kubernetes_namespace.ebooks.metadata[0].name
annotations = {
"resize.topolvm.io/threshold" = "10%"
"resize.topolvm.io/increase" = "50%"
"resize.topolvm.io/storage_limit" = "10Gi"
}
}
spec {
access_modes = ["ReadWriteOnce"]
storage_class_name = "proxmox-lvm"
resources {
requests = {
storage = "2Gi"
}
}
}
lifecycle {
# The autoresizer expands requests.storage up to storage_limit and
# PVCs can't shrink. Without this, every TF apply tries to revert
# to the spec value, K8s rejects the shrink, and the PVC ends up
# in Terminating-but-in-use limbo.
ignore_changes = [spec[0].resources[0].requests]
}
}
module "nfs_calibre_ingest_host" {
source = "../../modules/kubernetes/nfs_volume"
name = "ebooks-calibre-ingest-host"
namespace = kubernetes_namespace.ebooks.metadata[0].name
nfs_server = "192.168.1.127"
nfs_path = "/srv/nfs/calibre-web-automated/cwa-book-ingest"
}
module "nfs_calibre_stacks_config_host" {
source = "../../modules/kubernetes/nfs_volume"
name = "ebooks-calibre-stacks-config-host"
namespace = kubernetes_namespace.ebooks.metadata[0].name
nfs_server = "192.168.1.127"
nfs_path = "/srv/nfs/calibre-web-automated/stacks"
}
# NFS Volumes - Audiobookshelf (prefixed with ebooks- to avoid PV name clash)
module "nfs_audiobookshelf_audiobooks_host" {
source = "../../modules/kubernetes/nfs_volume"
name = "ebooks-abs-audiobooks-host"
namespace = kubernetes_namespace.ebooks.metadata[0].name
nfs_server = "192.168.1.127"
nfs_path = "/srv/nfs/audiobookshelf/audiobooks"
}
module "nfs_audiobookshelf_podcasts_host" {
source = "../../modules/kubernetes/nfs_volume"
name = "ebooks-abs-podcasts-host"
namespace = kubernetes_namespace.ebooks.metadata[0].name
nfs_server = "192.168.1.127"
nfs_path = "/srv/nfs/audiobookshelf/podcasts"
}
resource "kubernetes_persistent_volume_claim" "abs_config_proxmox" {
wait_until_bound = false
metadata {
name = "ebooks-abs-config-proxmox"
namespace = kubernetes_namespace.ebooks.metadata[0].name
annotations = {
"resize.topolvm.io/threshold" = "10%"
"resize.topolvm.io/increase" = "100%"
"resize.topolvm.io/storage_limit" = "5Gi"
}
}
spec {
access_modes = ["ReadWriteOnce"]
storage_class_name = "proxmox-lvm"
resources {
requests = {
storage = "1Gi"
}
}
}
lifecycle {
# The autoresizer expands requests.storage up to storage_limit and
# PVCs can't shrink. Without this, every TF apply tries to revert
# to the spec value, K8s rejects the shrink, and the PVC ends up
# in Terminating-but-in-use limbo.
ignore_changes = [spec[0].resources[0].requests]
}
}
module "nfs_audiobookshelf_metadata_host" {
source = "../../modules/kubernetes/nfs_volume"
name = "ebooks-abs-metadata-host"
namespace = kubernetes_namespace.ebooks.metadata[0].name
nfs_server = "192.168.1.127"
nfs_path = "/srv/nfs/audiobookshelf/metadata"
}
# Calibre-Web-Automated Deployment
resource "kubernetes_deployment" "calibre-web-automated" {
wait_for_rollout = true
metadata {
name = "calibre-web-automated"
namespace = kubernetes_namespace.ebooks.metadata[0].name
labels = {
app = "calibre-web-automated"
tier = local.tiers.edge
}
annotations = {
"reloader.stakater.com/search" = "true"
}
}
spec {
replicas = 1
strategy {
type = "Recreate"
}
selector {
match_labels = {
app = "calibre-web-automated"
}
}
template {
metadata {
annotations = {
"diun.enable" = "false"
"diun.include_tags" = "^\\d+(?:\\.\\d+)?(?:\\.\\d+)?$"
}
labels = {
app = "calibre-web-automated"
}
}
spec {
container {
image = "viktorbarzin/calibre-web-automated:latest"
name = "calibre-web-automated"
env {
name = "PUID"
value = "1000"
}
env {
name = "PGID"
value = "1000"
}
env {
name = "NO_CHOWN"
value = "true"
}
env {
name = "CALIBRE_PORT"
value = "8083"
}
port {
container_port = 8083
}
startup_probe {
http_get {
path = "/"
port = 8083
}
initial_delay_seconds = 10
timeout_seconds = 5
period_seconds = 5
failure_threshold = 24
}
liveness_probe {
http_get {
path = "/"
port = 8083
}
timeout_seconds = 10
period_seconds = 30
failure_threshold = 6
}
resources {
requests = {
cpu = "50m"
memory = "512Mi"
}
limits = {
memory = "1Gi"
}
}
volume_mount {
name = "config"
mount_path = "/config"
}
volume_mount {
name = "library"
mount_path = "/calibre-library"
}
volume_mount {
name = "ingest"
mount_path = "/cwa-book-ingest"
}
}
volume {
name = "library"
persistent_volume_claim {
claim_name = module.nfs_calibre_library_host.claim_name
}
}
volume {
name = "config"
persistent_volume_claim {
claim_name = kubernetes_persistent_volume_claim.calibre_config_iscsi.metadata[0].name
}
}
volume {
name = "ingest"
persistent_volume_claim {
claim_name = module.nfs_calibre_ingest_host.claim_name
}
}
}
}
}
lifecycle {
ignore_changes = [spec[0].template[0].spec[0].dns_config] # KYVERNO_LIFECYCLE_V1
}
}
resource "kubernetes_service" "calibre" {
metadata {
name = "calibre"
namespace = kubernetes_namespace.ebooks.metadata[0].name
labels = {
"app" = "calibre"
}
}
spec {
selector = {
app = "calibre-web-automated"
}
port {
name = "http"
target_port = 8083
port = 80
protocol = "TCP"
}
}
}
module "calibre_ingress" {
source = "../../modules/kubernetes/ingress_factory"
auth = "required"
dns_type = "proxied"
namespace = kubernetes_namespace.ebooks.metadata[0].name
name = "calibre"
tls_secret_name = var.tls_secret_name
extra_annotations = {
"gethomepage.dev/enabled" = "true"
"gethomepage.dev/description" = "Book library"
"gethomepage.dev/group" = "Media & Entertainment"
"gethomepage.dev/icon" = "calibre-web.png"
"gethomepage.dev/name" = "Calibre"
"gethomepage.dev/widget.type" = "calibreweb"
"gethomepage.dev/widget.url" = "http://calibre.ebooks.svc.cluster.local"
"gethomepage.dev/widget.username" = local.calibre_homepage_credentials["calibre-web"]["username"]
"gethomepage.dev/widget.password" = local.calibre_homepage_credentials["calibre-web"]["password"]
"gethomepage.dev/pod-selector" = ""
}
}
# Stacks - Anna's Archive Download Manager
resource "kubernetes_deployment" "annas-archive-stacks" {
metadata {
name = "annas-archive-stacks"
namespace = kubernetes_namespace.ebooks.metadata[0].name
labels = {
app = "annas-archive-stacks"
tier = local.tiers.edge
}
}
spec {
replicas = 1
selector {
match_labels = {
app = "annas-archive-stacks"
}
}
template {
metadata {
labels = {
app = "annas-archive-stacks"
}
}
spec {
container {
image = "zelest/stacks:latest"
name = "annas-archive-stacks"
resources {
requests = {
cpu = "10m"
memory = "384Mi"
}
limits = {
memory = "384Mi"
}
}
port {
container_port = 7788
}
liveness_probe {
http_get {
path = "/api/version"
port = 7788
}
initial_delay_seconds = 15
period_seconds = 30
timeout_seconds = 5
failure_threshold = 3
}
volume_mount {
name = "config"
mount_path = "/opt/stacks/config"
}
volume_mount {
name = "ingest"
mount_path = "/opt/stacks/download"
}
}
volume {
name = "config"
persistent_volume_claim {
claim_name = module.nfs_calibre_stacks_config_host.claim_name
}
}
volume {
name = "ingest"
persistent_volume_claim {
claim_name = module.nfs_calibre_ingest_host.claim_name
}
}
}
}
}
lifecycle {
ignore_changes = [spec[0].template[0].spec[0].dns_config] # KYVERNO_LIFECYCLE_V1
}
}
resource "kubernetes_service" "annas-archive-stacks" {
metadata {
name = "annas-archive-stacks"
namespace = kubernetes_namespace.ebooks.metadata[0].name
labels = {
"app" = "annas-archive-stacks"
}
}
spec {
selector = {
app = "annas-archive-stacks"
}
port {
name = "http"
port = 80
target_port = 7788
}
}
}
module "stacks_ingress" {
source = "../../modules/kubernetes/ingress_factory"
dns_type = "proxied"
namespace = kubernetes_namespace.ebooks.metadata[0].name
name = "stacks"
service_name = "annas-archive-stacks"
tls_secret_name = var.tls_secret_name
auth = "required"
extra_annotations = {
"gethomepage.dev/enabled" = "false"
}
}
# Audiobookshelf Deployment
resource "kubernetes_deployment" "audiobookshelf" {
metadata {
name = "audiobookshelf"
namespace = kubernetes_namespace.ebooks.metadata[0].name
labels = {
app = "audiobookshelf"
tier = local.tiers.edge
}
annotations = {
"reloader.stakater.com/search" = "true"
}
}
spec {
replicas = 1
strategy {
type = "Recreate"
}
selector {
match_labels = {
app = "audiobookshelf"
}
}
template {
metadata {
labels = {
app = "audiobookshelf"
}
}
spec {
container {
image = "ghcr.io/advplyr/audiobookshelf:2.33.1"
name = "audiobookshelf"
port {
container_port = 80
}
liveness_probe {
http_get {
path = "/healthcheck"
port = 80
}
initial_delay_seconds = 15
period_seconds = 30
timeout_seconds = 5
failure_threshold = 5
}
readiness_probe {
http_get {
path = "/healthcheck"
port = 80
}
initial_delay_seconds = 5
period_seconds = 30
timeout_seconds = 5
failure_threshold = 3
}
volume_mount {
name = "audiobooks"
mount_path = "/audiobooks"
}
volume_mount {
name = "podcasts"
mount_path = "/podcasts"
}
volume_mount {
name = "config"
mount_path = "/config"
}
volume_mount {
name = "metadata"
mount_path = "/metadata"
}
resources {
requests = {
cpu = "15m"
memory = "64Mi"
}
limits = {
memory = "256Mi"
}
}
}
volume {
name = "audiobooks"
persistent_volume_claim {
claim_name = module.nfs_audiobookshelf_audiobooks_host.claim_name
}
}
volume {
name = "podcasts"
persistent_volume_claim {
claim_name = module.nfs_audiobookshelf_podcasts_host.claim_name
}
}
volume {
name = "config"
persistent_volume_claim {
claim_name = kubernetes_persistent_volume_claim.abs_config_proxmox.metadata[0].name
}
}
volume {
name = "metadata"
persistent_volume_claim {
claim_name = module.nfs_audiobookshelf_metadata_host.claim_name
}
}
}
}
}
lifecycle {
ignore_changes = [spec[0].template[0].spec[0].dns_config] # KYVERNO_LIFECYCLE_V1
}
}
resource "kubernetes_service" "audiobookshelf" {
metadata {
name = "audiobookshelf"
namespace = kubernetes_namespace.ebooks.metadata[0].name
labels = {
"app" = "audiobookshelf"
}
}
spec {
selector = {
app = "audiobookshelf"
}
port {
name = "http"
target_port = 80
port = 80
protocol = "TCP"
}
}
}
module "audiobookshelf_ingress" {
source = "../../modules/kubernetes/ingress_factory"
# auth = "app": Audiobookshelf enforces its own user/password login and the
# API tokens used by the iOS/Android Audiobookshelf apps. Authentik
# forward-auth answered those mobile clients with 302 redirects, so ABS's
# own auth gates access instead.
auth = "app"
dns_type = "non-proxied"
namespace = kubernetes_namespace.ebooks.metadata[0].name
name = "audiobookshelf"
tls_secret_name = var.tls_secret_name
extra_annotations = {
"gethomepage.dev/enabled" = "true"
"gethomepage.dev/name" = "Audiobookshelf"
"gethomepage.dev/description" = "Audiobook library"
"gethomepage.dev/icon" = "audiobookshelf.png"
"gethomepage.dev/group" = "Media & Entertainment"
"gethomepage.dev/pod-selector" = ""
"gethomepage.dev/widget.type" = "audiobookshelf"
"gethomepage.dev/widget.url" = "http://audiobookshelf.ebooks.svc.cluster.local"
"gethomepage.dev/widget.key" = local.audiobookshelf_homepage_credentials["audiobookshelf"]["token"]
}
}
# Book-Search Deployment
resource "kubernetes_deployment" "book_search" {
metadata {
name = "book-search"
namespace = kubernetes_namespace.ebooks.metadata[0].name
labels = {
app = "book-search"
tier = local.tiers.edge
}
}
spec {
replicas = 1
selector {
match_labels = {
app = "book-search"
}
}
template {
metadata {
labels = {
app = "book-search"
}
}
spec {
container {
image = "viktorbarzin/book-search:latest"
image_pull_policy = "Always"
name = "book-search"
port {
container_port = 8000
}
env {
name = "QBITTORRENT_URL"
value = "http://qbittorrent.servarr.svc.cluster.local"
}
env {
name = "QBITTORRENT_PASS"
value_from {
secret_key_ref {
name = "servarr-secrets"
key = "qbittorrent_password"
}
}
}
env {
name = "AUDIOBOOKSHELF_URL"
value = "http://audiobookshelf.ebooks.svc.cluster.local"
}
env {
name = "AUDIOBOOKSHELF_TOKEN"
value_from {
secret_key_ref {
name = "servarr-secrets"
key = "audiobookshelf_api_token"
}
}
}
env {
name = "MAM_EMAIL"
value_from {
secret_key_ref {
name = "servarr-secrets"
key = "mam_email"
}
}
}
env {
name = "MAM_PASSWORD"
value_from {
secret_key_ref {
name = "servarr-secrets"
key = "mam_password"
}
}
}
env {
name = "CWA_INGEST_PATH"
value = "/cwa-book-ingest"
}
env {
name = "MAM_ID"
value_from {
secret_key_ref {
name = "servarr-secrets"
key = "mam_id"
optional = true
}
}
}
env {
name = "API_KEY"
value_from {
secret_key_ref {
name = "calibre-secrets"
key = "book_search_api_key"
}
}
}
env {
name = "SHORTCUT_ICLOUD_URL"
value = ""
}
env {
name = "STACKS_DB_PATH"
value = "/stacks-config/queue.db"
}
env {
name = "CALIBRE_WEB_USER"
value = "admin"
}
env {
name = "CALIBRE_WEB_PASS"
value_from {
secret_key_ref {
name = "calibre-secrets"
key = "calibre_web_password"
}
}
}
env {
name = "SMTP_HOST"
# Use the intra-cluster ClusterIP path, which bypasses pfSense HAProxy and
# PROXY v2. The public path hairpins through HAProxy:587 → NodePort →
# pod :5587, where Postfix's smtpd-proxy587 daemon crashes on ~50% of
# HAProxy healthchecks with
# `smtpd_peer_hostaddr_to_sockaddr: ... Servname not supported`,
# producing intermittent 6s TCP timeouts for clients that land
# mid-respawn. The ClusterIP service points to pod port 587 (stock
# submission daemon, no PROXY) and has been reliable: 12/12 connections
# succeeded in <31ms vs 6/12 timeouts on the public path.
# See docs/runbooks/mailserver-pfsense-haproxy.md.
value = "mailserver.mailserver.svc.cluster.local"
}
env {
name = "SMTP_PORT"
value = "587"
}
env {
name = "SMTP_USER"
value = "calibre-web@viktorbarzin.me"
}
env {
name = "SMTP_FROM"
value = "Calibre-Web <calibre-web@viktorbarzin.me>"
}
env {
name = "SMTP_PASS"
value_from {
secret_key_ref {
name = "calibre-secrets"
key = "smtp_password"
}
}
}
env {
name = "SLACK_WEBHOOK_URL"
value_from {
secret_key_ref {
name = "calibre-secrets"
key = "slack_webhook_url"
}
}
}
resources {
requests = {
cpu = "10m"
memory = "128Mi"
}
limits = {
memory = "512Mi"
}
}
liveness_probe {
http_get {
path = "/health"
port = 8000
}
initial_delay_seconds = 10
period_seconds = 30
}
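# Sketch only (assumption, not current behavior): this container defines a
# liveness probe but no readiness probe. If /health is also a valid
# readiness signal for book-search (unverified), a probe mirroring the
# audiobookshelf deployment's shape could look like this. The timing
# values are illustrative, not measured.
#
# readiness_probe {
#   http_get {
#     path = "/health"
#     port = 8000
#   }
#   initial_delay_seconds = 5
#   period_seconds        = 30
# }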
volume_mount {
name = "cwa-ingest"
mount_path = "/cwa-book-ingest"
}
volume_mount {
name = "audiobooks"
mount_path = "/audiobooks"
}
volume_mount {
name = "stacks-config"
mount_path = "/stacks-config"
}
volume_mount {
name = "calibre-library"
mount_path = "/calibre-library"
}
}
volume {
name = "cwa-ingest"
persistent_volume_claim {
claim_name = module.nfs_calibre_ingest_host.claim_name
}
}
volume {
name = "audiobooks"
persistent_volume_claim {
claim_name = module.nfs_audiobookshelf_audiobooks_host.claim_name
}
}
volume {
name = "calibre-library"
persistent_volume_claim {
claim_name = module.nfs_calibre_library_host.claim_name
}
}
volume {
name = "stacks-config"
persistent_volume_claim {
claim_name = module.nfs_calibre_stacks_config_host.claim_name
}
}
}
}
}
lifecycle {
ignore_changes = [spec[0].template[0].spec[0].dns_config] # KYVERNO_LIFECYCLE_V1
}
}
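# Sketch only (assumption, not applied): the four required servarr-secrets
# env blocks in the book-search container repeat the same value_from shape.
# A dynamic block could generate them from a map. `book_search_servarr_env`
# is a hypothetical local name. MAM_ID is left out because it sets
# `optional = true`. Terraform iterates map keys in lexical order, so
# adopting this would likely reorder the env list and show a one-time
# plan diff.
#
# locals {
#   book_search_servarr_env = {
#     AUDIOBOOKSHELF_TOKEN = "audiobookshelf_api_token"
#     MAM_EMAIL            = "mam_email"
#     MAM_PASSWORD         = "mam_password"
#     QBITTORRENT_PASS     = "qbittorrent_password"
#   }
# }
#
# # Inside the book-search `container` block:
# dynamic "env" {
#   for_each = local.book_search_servarr_env
#   content {
#     name = env.key # map key becomes the env var name
#     value_from {
#       secret_key_ref {
#         name = "servarr-secrets"
#         key  = env.value # map value is the key inside the Secret
#       }
#     }
#   }
# }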
resource "kubernetes_service" "book_search" {
metadata {
name = "book-search"
namespace = kubernetes_namespace.ebooks.metadata[0].name
labels = {
app = "book-search"
}
}
spec {
selector = {
app = "book-search"
}
port {
name = "http"
port = 80
target_port = 8000
}
}
}
module "book_search_ingress" {
source = "../../modules/kubernetes/ingress_factory"
dns_type = "proxied"
namespace = kubernetes_namespace.ebooks.metadata[0].name
name = "book-search"
tls_secret_name = var.tls_secret_name
auth = "required"
extra_annotations = {
"gethomepage.dev/enabled" = "true"
"gethomepage.dev/name" = "Book Search"
"gethomepage.dev/description" = "Search & download books"
"gethomepage.dev/icon" = "audiobookshelf.png"
"gethomepage.dev/group" = "Media & Entertainment"
"gethomepage.dev/pod-selector" = ""
}
}
# API ingress - unprotected (API key auth handled by backend)
module "book_search_api_ingress" {
source = "../../modules/kubernetes/ingress_factory"
namespace = kubernetes_namespace.ebooks.metadata[0].name
name = "book-search-api"
host = "book-search"
service_name = "book-search"
tls_secret_name = var.tls_secret_name
# auth = "none": Book Search API endpoints — API key auth handled by backend; forward-auth would block downloads.
auth = "none"
ingress_path = ["/api/download-url", "/api/download-status", "/api/send-to-kindle", "/shortcut"]
}