Two-tier state architecture:
- Tier 0 (infra, platform, cnpg, vault, dbaas, external-secrets): local
state with SOPS encryption in git — unchanged, required for bootstrap.
- Tier 1 (105 app stacks): PostgreSQL backend on CNPG cluster at
10.0.20.200:5432/terraform_state with native pg_advisory_lock.
Motivation: multi-operator friction (every workstation needed SOPS + age +
git-crypt), bootstrap complexity for new operators, and headless agents/CI
needing the full encryption toolchain just to read state.
Changes:
- terragrunt.hcl: conditional backend (local vs pg) based on tier0 list
- scripts/tg: tier detection, auto-fetch PG creds from Vault for Tier 1,
skip SOPS and Vault KV locking for Tier 1 stacks
- scripts/state-sync: tier-aware encrypt/decrypt (skips Tier 1)
- scripts/migrate-state-to-pg: one-shot migration script (idempotent)
- stacks/vault/main.tf: pg-terraform-state static role + K8s auth role
for claude-agent namespace
- stacks/dbaas: terraform_state DB creation + MetalLB LoadBalancer
service on shared IP 10.0.20.200
- Deleted 107 .tfstate.enc files for migrated Tier 1 stacks
- Cleaned up per-stack tiers.tf (now generated by root terragrunt.hcl)
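
The tier detection described above can be sketched roughly as follows. This is an illustrative Python model, not the actual logic in terragrunt.hcl or scripts/tg; the names `TIER0_STACKS` and `backend_for` are hypothetical, and only the Tier 0 stack list and the local/pg split come from this commit.

```python
# Hypothetical model of tier-based backend selection (names are illustrative).
TIER0_STACKS = {"infra", "platform", "cnpg", "vault", "dbaas", "external-secrets"}


def backend_for(stack: str) -> dict:
    """Return the tier, backend kind, and whether SOPS handling applies."""
    if stack in TIER0_STACKS:
        # Tier 0 keeps local state, SOPS-encrypted in git (required for bootstrap).
        return {"tier": 0, "backend": "local", "sops": True}
    # Every other stack is Tier 1: PostgreSQL backend, no SOPS step.
    return {"tier": 1, "backend": "pg", "sops": False}
```

For example, `backend_for("vault")` selects the local backend, while any app stack such as `backend_for("ntfy")` selects pg.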
[ci skip]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Context
Deploying new services required manually adding hostnames to
cloudflare_proxied_names/cloudflare_non_proxied_names in config.tfvars —
a separate file from the service stack. This was frequently forgotten,
leaving services unreachable externally.
## This change:
- Add `dns_type` parameter to `ingress_factory` and `reverse_proxy/factory`
modules. Setting `dns_type = "proxied"` or `"non-proxied"` auto-creates
the Cloudflare DNS record (CNAME to tunnel or A/AAAA to public IP).
- Simplify cloudflared tunnel from 100 per-hostname rules to wildcard
`*.viktorbarzin.me → Traefik`. Traefik still handles host-based routing.
- Add global Cloudflare provider via terragrunt.hcl (separate
cloudflare_provider.tf with Vault-sourced API key).
- Migrate 118 hostnames from centralized config.tfvars to per-service
dns_type. 17 hostnames remain centrally managed (Helm ingresses,
special cases).
- Update docs, AGENTS.md, CLAUDE.md, dns.md runbook.
```
BEFORE                          AFTER
config.tfvars (manual list)     stacks/<svc>/main.tf
        |                         module "ingress" {
        v                           dns_type = "proxied"
stacks/cloudflared/               }
  for_each = list                     |
  cloudflare_record               auto-creates
  tunnel per-hostname             cloudflare_record + annotation
```
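
As a rough illustration of the `dns_type` behavior, the sketch below models which record types a given value would produce. It assumes `"proxied"` maps to the CNAME-to-tunnel case and `"non-proxied"` to the A/AAAA-to-public-IP case, which is how the description above reads; the function name `record_kinds` and the "unset means no record" default are hypothetical and not taken from the modules.

```python
from typing import List, Optional


def record_kinds(dns_type: Optional[str]) -> List[str]:
    """Model which DNS record types a dns_type value would create."""
    if dns_type is None:
        # Assumption: leaving dns_type unset creates no record.
        return []
    if dns_type == "proxied":
        # Proxied hostnames point at the Cloudflare tunnel.
        return ["CNAME"]
    if dns_type == "non-proxied":
        # Non-proxied hostnames point directly at the public IP.
        return ["A", "AAAA"]
    raise ValueError(f"unsupported dns_type: {dns_type!r}")
```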
## What is NOT in this change:
- Uptime Kuma monitor migration (still reads from config.tfvars)
- 17 remaining centrally-managed hostnames (Helm, special cases)
- Removal of allow_overwrite (kept until the migration is confirmed stable)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The backup CronJob was stuck in ContainerCreating because it couldn't
mount the proxmox-lvm RWO PVC from a different node. Fixed by:
- Adding pod_affinity to co-locate with the headscale pod (same node)
- Mounting both data PVC (read-only) and NFS backup PVC (write)
- Adding integrity check pattern from vaultwarden backup
- Setting concurrency_policy=Replace and ttl_seconds_after_finished=10
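
A minimal sketch of a copy-then-verify backup step, assuming the integrity check is SQLite's `PRAGMA integrity_check` run against the copy. The real CronJob's commands are not shown in this commit, so the function name `backup_with_check` and its paths are hypothetical.

```python
import sqlite3


def backup_with_check(src_path: str, dst_path: str) -> bool:
    """Copy a SQLite database and report whether the copy passes integrity_check."""
    # Open the source read-only, mirroring the read-only data PVC mount.
    src = sqlite3.connect(f"file:{src_path}?mode=ro", uri=True)
    try:
        dst = sqlite3.connect(dst_path)
        try:
            # Online backup API: copies a consistent snapshot of the source.
            src.backup(dst)
            row = dst.execute("PRAGMA integrity_check").fetchone()
        finally:
            dst.close()
    finally:
        src.close()
    # integrity_check returns a single row containing "ok" when no problems are found.
    return row is not None and row[0] == "ok"
```

A caller would treat a `False` result as a failed backup and avoid publishing the copy.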
- Auth-proxy fallback now sets ALL X-authentik-* headers (username, uid,
email, name, groups) to prevent client-supplied header spoofing when
Authentik is down. Previously only username was set, allowing a malicious
client to inject fake X-authentik-groups.
- Catch-all IngressRoute restricted to *.viktorbarzin.me only. Non-matching
domains no longer get the wildcard cert served (TLS info leak).
- Added rate-limit and CrowdSec middleware to catch-all IngressRoute.
- Added rate-limit middleware to Headscale DERP IngressRoute.
- Rotated auth-proxy basicAuth credentials (bcrypt cost 5 → 12, admin → emergency-admin).
- Created Authentik brute-force reputation policy (threshold -5, IP+username).
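
The header fix in the first bullet can be illustrated with the sketch below. It is a Python model of the idea only; the real fallback lives in the auth-proxy configuration, and `apply_fallback_identity` and the empty-string default are hypothetical. The point is that every X-authentik-* header is overwritten, so nothing supplied by the client survives.

```python
# The five header suffixes named in this commit.
AUTHENTIK_FIELDS = ("username", "uid", "email", "name", "groups")


def apply_fallback_identity(request_headers: dict, identity: dict) -> dict:
    """Replace all X-authentik-* headers with values from the fallback identity."""
    # Drop anything the client sent under the X-authentik- prefix (case-insensitive).
    cleaned = {
        key: value
        for key, value in request_headers.items()
        if not key.lower().startswith("x-authentik-")
    }
    # Set every field explicitly; missing fields become empty, never client-controlled.
    for field in AUTHENTIK_FIELDS:
        cleaned[f"X-authentik-{field}"] = identity.get(field, "")
    return cleaned
```

With only the username set, as before this change, a client-supplied `X-authentik-groups` value would have passed through untouched.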
Add proxmox-lvm PVCs with pvc-autoresizer annotations for all
SQLite-backed services. Deployments updated to use new block storage
PVCs. Old NFS modules retained for 1-week rollback.
Services: ntfy, freshrss, insta2spotify, actualbudget (x3),
wealthfolio, navidrome (DB only), audiobookshelf config,
headscale, forgejo, uptime-kuma.
Also: set Recreate strategy on ntfy, forgejo, insta2spotify,
wealthfolio (required for RWO volumes).
- Remove viktorbarzin.me from split DNS (same IPs as public DNS,
was adding unnecessary tunnel overhead for every DNS query)
- Narrow reverse DNS split scope from 10.0.0.0/8 → 10.0.20.0/24
and 10.0.10.0/24 only; 192.168.0.0/16 → 192.168.1.0/24 only
- Add extra_records for key internal services (technitium, k8s-master)
for instant MagicDNS resolution without tunnel roundtrip
- Replace full Tailscale DERP map (29 regions) with curated set:
home + 8 European + 5 global fallback DERPs (14 total)
- Add custom derp.yaml to ConfigMap, sourced from Vault
Port 80 DERP dropped — Traefik's global HTTP→HTTPS redirect
prevents non-TLS DERP upgrades on the web entrypoint.
- Add SQLite backup CronJob (every 6h to NFS for cloud sync pickup)
- Move headscale-ui secrets (COOKIE_SECRET, ROOT_API_KEY) from hardcoded
values to Vault-managed secrets
- Add DERP IPv6 address (2001:470:6e:43d::2) for IPv6-capable clients
- Clean up stale test nodes, duplicate users, rename "localhost" nodes
Also updated headscale_config in Vault to include DERP ipv6 field
and headscale_ui_cookie_secret/headscale_ui_api_key secrets.
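
To make the narrowed reverse-DNS split scope concrete, the sketch below checks addresses against the old and new ranges using Python's `ipaddress` module. The CIDR lists come from this commit; the helper `in_scope` and the sample addresses are illustrative only.

```python
import ipaddress

OLD_SCOPES = ["10.0.0.0/8", "192.168.0.0/16"]
NEW_SCOPES = ["10.0.20.0/24", "10.0.10.0/24", "192.168.1.0/24"]


def in_scope(ip: str, scopes: list) -> bool:
    """Return True if the address falls inside any of the given CIDR ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in scopes)


# Every new range is contained in one of the old ranges, so the change only narrows.
assert all(
    any(ipaddress.ip_network(new).subnet_of(ipaddress.ip_network(old)) for old in OLD_SCOPES)
    for new in NEW_SCOPES
)
```

Under these lists, a reverse lookup for 10.0.20.200 is still split, while one for an address such as 10.1.2.3 no longer is.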
The CrowdSec, rate-limiting, anti-AI, and error-page middlewares were
interfering with the Upgrade: DERP protocol handshake. Also updated the
Headscale ACL in Vault to allow tailnet DNS traffic to Technitium
(10.0.20.200:53).
- Expose STUN port 3479/UDP on container and LoadBalancer service
- Upgrade headscale from 0.23.0 to 0.28.0
- Vault config updated: auto DERP region with ipv4 field, ISP router
port forward for UDP 3479 added
Home DERP now shows ~3ms latency and is selected as nearest relay.
Phase 3: all 27 platform modules now run as independent stacks.
Platform reduced to empty shell (outputs only) for backward compat
with 72 app stacks that declare dependency "platform".
Fixed technitium cross-module dashboard reference by copying file.
Woodpecker pipeline applies all 27+1 stacks in parallel via loop.
All applied with zero destroys.