2026-02-22 13:01:37 +00:00
# Root Terragrunt configuration
# Provides DRY provider, backend, and variable loading for all stacks.

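# Usage sketch (an assumption, not taken from this repository): a stack such as
# the hypothetical stacks/myapp/terragrunt.hcl would typically pull in this root
# configuration with an include block along these lines:
#
#   include "root" {
#     path = find_in_parent_folders()
#   }
#
# With no arguments, find_in_parent_folders() searches parent directories for a
# terragrunt.hcl; newer Terragrunt releases may prefer an explicit file name.
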
[infra] Migrate Terraform state from local SOPS to PostgreSQL backend
Two-tier state architecture:
- Tier 0 (infra, platform, cnpg, vault, dbaas, external-secrets): local
state with SOPS encryption in git — unchanged, required for bootstrap.
- Tier 1 (105 app stacks): PostgreSQL backend on CNPG cluster at
10.0.20.200:5432/terraform_state with native pg_advisory_lock.
Motivation: multi-operator friction (every workstation needed SOPS + age +
git-crypt), bootstrap complexity for new operators, and headless agents/CI
needing the full encryption toolchain just to read state.
Changes:
- terragrunt.hcl: conditional backend (local vs pg) based on tier0 list
- scripts/tg: tier detection, auto-fetch PG creds from Vault for Tier 1,
skip SOPS and Vault KV locking for Tier 1 stacks
- scripts/state-sync: tier-aware encrypt/decrypt (skips Tier 1)
- scripts/migrate-state-to-pg: one-shot migration script (idempotent)
- stacks/vault/main.tf: pg-terraform-state static role + K8s auth role
for claude-agent namespace
- stacks/dbaas: terraform_state DB creation + MetalLB LoadBalancer
service on shared IP 10.0.20.200
- Deleted 107 .tfstate.enc files for migrated Tier 1 stacks
- Cleaned up per-stack tiers.tf (now generated by root terragrunt.hcl)
[ci skip]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 19:33:12 +00:00

# Two-tier state backend:
# Tier 0 (bootstrap): local state, SOPS-encrypted in git; must exist before PG is reachable.
# Tier 1 (everything else): PG backend on CNPG cluster, native pg_advisory_lock.
locals {
  tier0_stacks = ["infra", "platform", "cnpg", "vault", "dbaas", "external-secrets"]
  stack_name   = replace(path_relative_to_include(), "stacks/", "")
  is_tier0     = contains(local.tier0_stacks, local.stack_name)
}

remote_state {
  backend = local.is_tier0 ? "local" : "pg"
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }
  config = local.is_tier0 ? {
    path = "${get_repo_root()}/state/${path_relative_to_include()}/terraform.tfstate"
  } : {
    conn_str    = get_env("PG_CONN_STR", "")
    schema_name = local.stack_name
  }
}

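# Illustrative sketch of the backend.tf that the remote_state block above is
# expected to generate. The stack name "myapp" and the placeholder credentials
# are hypothetical; the host, port, and database name come from the migration
# commit message. Stacks are assumed to live under stacks/.
#
# Tier 1 stack (stacks/myapp), with PG_CONN_STR exported:
#
#   terraform {
#     backend "pg" {
#       conn_str    = "postgres://<user>:<password>@10.0.20.200:5432/terraform_state"
#       schema_name = "myapp"
#     }
#   }
#
# Tier 0 stack (stacks/vault):
#
#   terraform {
#     backend "local" {
#       path = "<repo root>/state/stacks/vault/terraform.tfstate"
#     }
#   }
#
# If PG_CONN_STR is unset, get_env falls back to "" and conn_str is generated
# empty, so Tier 1 runs presumably rely on scripts/tg exporting it first.
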
2026-03-24 11:14:06 +02:00
# Load config.tfvars (plaintext). Secrets come from Vault KV; authenticate via `vault login -method=oidc`.
terraform {
  extra_arguments "common_vars" {
    commands = get_terraform_commands_that_need_vars()
    required_var_files = [
      "${get_repo_root()}/config.tfvars"
    ]
  }

  extra_arguments "no_backup" {
    commands  = ["apply", "plan", "destroy", "import"]
    arguments = ["-backup=-"]
  }

  extra_arguments "kube_config" {
    commands = get_terraform_commands_that_need_vars()
    arguments = [
      "-var", "kube_config_path=${get_repo_root()}/config"
    ]
  }
}

2026-04-16 13:45:04 +00:00
# Generate kubernetes + helm + cloudflare providers for all stacks.
# The infra stack overrides this to add the proxmox provider.
generate "k8s_providers" {
  path      = "providers.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
terraform {
  required_providers {
    vault = {
      source  = "hashicorp/vault"
      version = "~> 4.0"
    }
    cloudflare = {
      source  = "cloudflare/cloudflare"
      version = "~> 4"
    }
  }
}

variable "kube_config_path" {
  type    = string
  default = "~/.kube/config"
}

provider "kubernetes" {
  config_path = var.kube_config_path
}

provider "helm" {
  kubernetes = {
    config_path = var.kube_config_path
  }
}

provider "vault" {
  address          = "https://vault.viktorbarzin.me"
  skip_child_token = true
}
EOF
}

[ci skip] Infrastructure hardening: security, monitoring, reliability, maintainability
Phase 1 - Critical Security:
- Netbox: move hardcoded DB/superuser passwords to variables
- MeshCentral: disable public registration, add Authentik auth
- Traefik: disable insecure API dashboard (api.insecure=false)
- Traefik: configure forwarded headers with Cloudflare trusted IPs
Phase 2 - Security Hardening:
- Add security headers middleware (HSTS, X-Frame-Options, nosniff, etc.)
- Add Kyverno pod security policies in audit mode (privileged, host
namespaces, SYS_ADMIN, trusted registries)
- Tighten rate limiting (avg=10, burst=50)
- Add Authentik protection to grampsweb
Phase 3 - Monitoring & Alerting:
- Add critical service alerts (PostgreSQL, MySQL, Redis, Headscale,
Authentik, Loki)
- Increase Loki retention from 7 to 30 days (720h)
- Add predictive PV filling alert (predict_linear)
- Re-enable Hackmd and Privatebin down alerts
Phase 4 - Reliability:
- Add resource requests/limits to Redis, DBaaS, Technitium, Headscale,
Vaultwarden, Uptime Kuma
- Increase Alloy DaemonSet memory to 512Mi/1Gi
Phase 6 - Maintainability:
- Extract duplicated tiers locals to terragrunt.hcl generate block
(removed from 67 stacks)
- Replace hardcoded NFS IP 10.0.10.15 with var.nfs_server (114
instances across 63 files)
- Replace hardcoded Redis/PostgreSQL/MySQL/Ollama/mail host references
with variables across ~35 stacks
- Migrate xray raw ingress resources to ingress_factory modules
2026-02-23 22:05:28 +00:00

# Generate Cloudflare provider config (separate file to avoid conflicts
# with stacks that override providers.tf, e.g. infra stack).
# DNS records are created per-service via ingress_factory's dns_type param.
generate "cloudflare_provider" {
  path      = "cloudflare_provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
data "vault_kv_secret_v2" "cf_platform" {
  mount = "secret"
  name  = "platform"
}

provider "cloudflare" {
  api_key = data.vault_kv_secret_v2.cf_platform.data["cloudflare_api_key"]
  email   = "vbarzin@gmail.com"
}
EOF
}

# Generate shared tiers locals for all stacks.
# Previously duplicated in 67+ stacks; now defined once here.
generate "tiers" {
  path      = "tiers.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
locals {
  tiers = {
    core    = "0-core"
    cluster = "1-cluster"
    gpu     = "2-gpu"
    edge    = "3-edge"
    aux     = "4-aux"
  }
}
EOF
}
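
# Usage sketch (hypothetical; the resource and the label key below are
# assumptions, not code from this repository): because tiers.tf is generated
# into every stack, stack code can reference the shared map as
# local.tiers.<name>, for example:
#
#   resource "kubernetes_namespace" "example" {
#     metadata {
#       name = "example"
#       labels = {
#         tier = local.tiers.aux # resolves to "4-aux"
#       }
#     }
#   }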