diff --git a/docs/plans/2026-02-22-terragrunt-migration-plan.md b/docs/plans/2026-02-22-terragrunt-migration-plan.md
new file mode 100644
index 00000000..fd1e9446
--- /dev/null
+++ b/docs/plans/2026-02-22-terragrunt-migration-plan.md
@@ -0,0 +1,1235 @@
# Terragrunt Migration Implementation Plan

> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

**Goal:** Migrate the monolithic Terraform setup (857 resources, 15MB state) to Terragrunt with per-service state isolation, proper DAG dependencies, and changed-stack CI/CD detection.

**Architecture:** Flat stacks under `stacks/`, each a thin `main.tf` wrapper that calls existing modules. A root `terragrunt.hcl` provides DRY provider and backend config. The platform stack groups ~20 core services and exports outputs (`redis_host`, `postgresql_host`, etc.). About 65 per-service stacks consume those outputs through Terragrunt `dependency` blocks.

**Tech Stack:** Terragrunt, Terraform 1.14.x, local state backend, Drone CI

**Design Doc:** `docs/plans/2026-02-22-terragrunt-migration-design.md`

---

## Task 1: Install Terragrunt and Create Directory Skeleton

**Files:**
- Create: `stacks/` directory
- Create: `state/` directory
- Modify: `.gitignore`

**Step 1: Install Terragrunt**

Run:
```bash
brew install terragrunt
```
Expected: `terragrunt --version` prints a version.

**Step 2: Create directory skeleton**

Run:
```bash
mkdir -p stacks/{infra,platform}
mkdir -p state
```

**Step 3: Update `.gitignore`**

Add to `.gitignore`:
```
# Terragrunt
.terragrunt-cache/
state/
```

The `state/` directory holds the per-stack Terraform state files. Like the current `terraform.tfstate`, these files are local-only and must not be committed, because they contain resource IDs and potentially sensitive data.
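
Step 3 can be made idempotent, so re-running the skeleton setup never duplicates the entries. Below is a minimal sketch. `ensure_gitignore_entries` is a hypothetical helper and is not part of Terragrunt or Git.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append the Terragrunt ignore entries from Step 3 to a
# .gitignore file, skipping any entry already present as a full line.
ensure_gitignore_entries() {
  local file=${1:-.gitignore}
  local entry
  touch "$file"
  # If the file does not end in a newline, add one so the first appended
  # entry is not glued onto the existing last line.
  if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
    printf '\n' >> "$file"
  fi
  for entry in "# Terragrunt" ".terragrunt-cache/" "state/"; do
    # -x: whole-line match, -F: literal string, --: entry is never an option
    if ! grep -qxF -- "$entry" "$file"; then
      printf '%s\n' "$entry" >> "$file"
    fi
  done
}
```

Run `ensure_gitignore_entries .gitignore` from the repo root. A second run leaves the file unchanged.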

**Step 4: Commit**

```bash
git add stacks/ .gitignore
git commit -m "[ci skip] Add Terragrunt directory skeleton"
```

Git does not track empty directories, so `stacks/` contributes nothing to this commit. The stack directories are committed along with their files in Tasks 3 and 5.

---

## Task 2: Create Root Terragrunt Configuration

**Files:**
- Create: `terragrunt.hcl`

**Step 1: Write root terragrunt.hcl**

```hcl
# Root Terragrunt configuration
# Provides DRY provider, backend, and variable loading for all stacks.

# One local state file per stack, e.g. stacks/blog -> state/blog/terraform.tfstate.
# path_relative_to_include() returns "stacks/blog" for that stack. basename()
# drops the "stacks/" prefix, which matches the state/<stack>/ paths that the
# `terraform state mv -state-out=...` steps below write to.
remote_state {
  backend = "local"
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }
  config = {
    path = "${get_repo_root()}/state/${basename(path_relative_to_include())}/terraform.tfstate"
  }
}

# Load terraform.tfvars for all stacks.
# A var-file value for a variable that a stack does not declare only produces
# a "Value for undeclared variable" warning (Terraform 1.x behavior).
terraform {
  extra_arguments "common_vars" {
    commands = get_terraform_commands_that_need_vars()
    required_var_files = [
      "${get_repo_root()}/terraform.tfvars"
    ]
  }

  # An undeclared -var is an error, unlike an undeclared var-file value.
  # Every stack must therefore declare variable "kube_config_path".
  extra_arguments "kube_config" {
    commands = get_terraform_commands_that_need_vars()
    arguments = [
      "-var", "kube_config_path=${get_repo_root()}/config"
    ]
  }
}

# Generate kubernetes + helm providers for K8s stacks.
# The infra stack overrides this to add the proxmox provider.
generate "k8s_providers" {
  path      = "providers.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <&1 | head -5
```
Expected: no parse errors. Warnings about a missing `main.tf` are fine at this stage.

**Step 3: Commit**

```bash
git add terragrunt.hcl
git commit -m "[ci skip] Add root Terragrunt configuration"
```

---

## Task 3: Create Infra Stack (Proxmox VMs)

**Files:**
- Create: `stacks/infra/terragrunt.hcl`
- Create: `stacks/infra/main.tf`

**Step 1: Write infra terragrunt.hcl**

This stack needs the proxmox provider instead of (or in addition to) the default k8s providers.

```hcl
# stacks/infra/terragrunt.hcl
include "root" {
  path = find_in_parent_folders()
}

# Override provider generation to include proxmox. This block reuses the root's
# label ("k8s_providers"), so it replaces the root block. A differently named
# block would add a second generate block that also writes providers.tf.
generate "k8s_providers" {
  path      = "providers.tf"
  if_exists = "overwrite_terragrunt"
  contents  = < /etc/containerd/certs.d/docker.io/hosts.toml
    mkdir -p /etc/containerd/certs.d/ghcr.io
    printf 'server = "https://ghcr.io"\n\n[host."http://10.0.20.10:5010"]\n capabilities = ["pull", "resolve"]\n' > /etc/containerd/certs.d/ghcr.io/hosts.toml
    mkdir -p /etc/containerd/certs.d/quay.io
    printf 'server = "https://quay.io"\n\n[host."http://10.0.20.10:5020"]\n capabilities = ["pull", "resolve"]\n' > /etc/containerd/certs.d/quay.io/hosts.toml
    mkdir -p /etc/containerd/certs.d/registry.k8s.io
    printf 'server = "https://registry.k8s.io"\n\n[host."http://10.0.20.10:5030"]\n capabilities = ["pull", "resolve"]\n' > /etc/containerd/certs.d/registry.k8s.io/hosts.toml
    mkdir -p /etc/containerd/certs.d/reg.kyverno.io
    printf 'server = "https://reg.kyverno.io"\n\n[host."http://10.0.20.10:5040"]\n capabilities = ["pull", "resolve"]\n' > /etc/containerd/certs.d/reg.kyverno.io/hosts.toml
    sed -i 's/.*max_concurrent_downloads = 3/max_concurrent_downloads = 20/g' /etc/containerd/config.toml
    sudo sed -i '/serializeImagePulls:/d' /var/lib/kubelet/config.yaml && \
    sudo sed -i '/maxParallelImagePulls:/d' /var/lib/kubelet/config.yaml && \
    echo -e 'serializeImagePulls: false\nmaxParallelImagePulls: 50' | sudo tee -a /var/lib/kubelet/config.yaml
  EOF
  k8s_join_command = var.k8s_join_command
}

module "non-k8s-node-template" {
  source       = "../../modules/create-template-vm"
  proxmox_host = var.proxmox_host
  proxmox_user = "root"

  ssh_private_key = var.ssh_private_key
  ssh_public_key  = var.ssh_public_key

  cloud_image_url = local.cloud_init_image_url
  image_path      = local.non_k8s_cloud_init_image_path
  template_id     = 1000
  template_name   = local.non_k8s_vm_template
  user_passwd     = var.vm_wizard_password

  is_k8s_template = false
  snippet_name    = local.non_k8s_cloud_init_snippet_name
}

module "docker-registry-template" {
  source       = "../../modules/create-template-vm"
  proxmox_host = var.proxmox_host
  proxmox_user = "root"

  ssh_private_key = var.ssh_private_key
  ssh_public_key  = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIDHLhYDfyx237eJgOGVoJRECpUS95+7rEBS9vacsIxtx devvm"

  cloud_image_url = local.cloud_init_image_url
  image_path      = local.non_k8s_cloud_init_image_path
  template_id     = 1001
  template_name   = "docker-registry-template"

  user_passwd     = var.vm_wizard_password
  is_k8s_template = false
  snippet_name    = "docker-registry.yaml"

  provision_cmds = [
    "mkdir -p /etc/docker-registry",
    format("echo %s | base64 -d > /etc/docker-registry/config.yml",
      base64encode(
        templatefile("${path.root}/../../modules/docker-registry/config.yaml", {
          password = var.dockerhub_registry_password
        })
      )
    ),
    # ... (copy remaining provision_cmds from main.tf lines 305-371)
  ]
}

module "docker-registry-vm" {
  source         = "../../modules/create-vm"
  vmid           = 220
  vm_cpus        = 4
  vm_mem_mb      = 4196
  vm_disk_size   = "64G"
  template_name  = "docker-registry-template"
  vm_name        = "docker-registry"
  cisnippet_name = "docker-registry.yaml"
  vm_mac_address = "DE:AD:BE:EF:22:22"
  bridge         = "vmbr1"
  vlan_tag       = "20"
  ipconfig0      = "ip=10.0.20.10/24,gw=10.0.20.1"
}
```

**Note:** The `provision_cmds` list for docker-registry-template is long (~60 lines). Copy it exactly from the current `main.tf` lines 296-371. The only change is to the `templatefile` paths: prefix them with `${path.root}/../../`, because the root module is now `stacks/infra/`.

**Step 3: Verify with init (do NOT apply yet)**

Run:
```bash
(cd stacks/infra && terragrunt init)
```
Expected: init succeeds and downloads the providers.

**Step 4: Commit**

```bash
git add stacks/infra/
git commit -m "[ci skip] Add infra stack (Proxmox VMs)"
```

---

## Task 4: Migrate Infra Stack State

**CRITICAL: This task modifies live state. Take a backup first.**

**Step 1: Backup current state**

Run:
```bash
cp terraform.tfstate terraform.tfstate.backup-pre-terragrunt
```

**Step 2: List current infra resources in state**

Run:
```bash
terraform state list | grep -E '^module\.(k8s-node-template|non-k8s-node-template|docker-registry-template|docker-registry-vm)\.'
```
Expected: ~10 resources belonging to these 4 modules.

**Step 3: Move resources to new state file**

Move each of the four modules as a whole. Moving a module address moves every resource listed under it in Step 2. Run all commands from the repo root:
```bash
mkdir -p state/infra
terraform state mv \
  -state=terraform.tfstate \
  -state-out=state/infra/terraform.tfstate \
  'module.k8s-node-template' 'module.k8s-node-template'
terraform state mv \
  -state=terraform.tfstate \
  -state-out=state/infra/terraform.tfstate \
  'module.non-k8s-node-template' 'module.non-k8s-node-template'
terraform state mv \
  -state=terraform.tfstate \
  -state-out=state/infra/terraform.tfstate \
  'module.docker-registry-template' 'module.docker-registry-template'
terraform state mv \
  -state=terraform.tfstate \
  -state-out=state/infra/terraform.tfstate \
  'module.docker-registry-vm' 'module.docker-registry-vm'
```

**Step 4: Verify no changes in new state**

Run:
```bash
(cd stacks/infra && terragrunt plan)
```
Expected: `No changes. Your infrastructure matches the configuration.`

If the plan shows changes, something went wrong: restore from the backup and investigate.
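
Steps 4 and 6 both ask whether the plan came back empty. With `-detailed-exitcode`, `plan` encodes that in its exit status: 0 means no changes, 1 means an error, and 2 means changes are present. Checking the status is more reliable than reading the output.

Below is a minimal sketch. `verify_no_changes` is a hypothetical wrapper. The sketch assumes `terragrunt plan` passes the flag through to Terraform and returns Terraform's exit code; confirm this on your Terragrunt version.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run the given plan command (which should include
# -detailed-exitcode) and map its exit status to a verdict.
#   0 -> no changes, 2 -> changes present, anything else -> plan failed
verify_no_changes() {
  local rc=0
  "$@" || rc=$?   # capture the status without tripping `set -e` callers
  case $rc in
    0) echo "clean" ;;
    2) echo "plan has changes: restore the backup and investigate" >&2; return 2 ;;
    *) echo "plan failed with exit code $rc" >&2; return 1 ;;
  esac
}
```

Example uses:
- Step 4: `(cd stacks/infra && verify_no_changes terragrunt plan -detailed-exitcode)`
- Step 6: `verify_no_changes terraform plan -detailed-exitcode -var="kube_config_path=$(pwd)/config"`

Output-only changes can also make a plan non-empty. Read the platform plan in Task 6 directly instead of relying on this check.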

**Step 5: Remove infra modules from root main.tf**

Remove (or comment out) these blocks from `main.tf` (lines 208-400):
- `module.k8s-node-template`
- `module.non-k8s-node-template`
- `module.docker-registry-template`
- `module.docker-registry-vm`

Also remove the corresponding `locals` block (lines 196-206), since those locals now live in `stacks/infra/main.tf`.

**Step 6: Verify legacy state is clean**

Run:
```bash
terraform plan -var="kube_config_path=$(pwd)/config"
```
Expected: no changes, because the moved resources are gone from both the legacy state and `main.tf`.

**Step 7: Commit**

```bash
git add main.tf stacks/infra/
git commit -m "[ci skip] Migrate infra stack (VMs) to Terragrunt"
```

---

## Task 5: Create Platform Stack

**Files:**
- Create: `stacks/platform/terragrunt.hcl`
- Create: `stacks/platform/main.tf`

This is the largest task: it groups ~20 core services into one stack.

**Step 1: Write platform terragrunt.hcl**

```hcl
# stacks/platform/terragrunt.hcl
include "root" {
  path = find_in_parent_folders()
}

dependency "infra" {
  config_path  = "../infra"
  skip_outputs = true
}
```

**Step 2: Write platform main.tf**

This file contains all core/cluster service module calls. Copy each one from `modules/kubernetes/main.tf`, then:
- change `source` paths from `"./"` to `"../../modules/kubernetes/"`;
- remove the `for_each` conditionals (core services are always present);
- remove `depends_on = [null_resource.core_services]`.
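
The `source` rewrite in Step 2 is mechanical enough to script. Below is a minimal sketch that assumes a `sed` supporting `-E` (GNU and BSD both do). `rewrite_module_sources` is a hypothetical name. It only touches `source` values that start with `./`.

```bash
#!/usr/bin/env bash
# Hypothetical filter: rewrite module sources that are relative to
# modules/kubernetes/ (source = "./redis") so they resolve from a stack
# directory (source = "../../modules/kubernetes/redis"). Every other line,
# including sources that do not start with "./", passes through unchanged.
rewrite_module_sources() {
  sed -E 's|(source[[:space:]]*=[[:space:]]*")\./|\1../../modules/kubernetes/|'
}
```

For example, `sed -n '/^module "redis"/,/^}/p' modules/kubernetes/main.tf | rewrite_module_sources` prints the redis block with its new `source`. This assumes each block's closing `}` sits in column 0. The `for_each` and `depends_on` lines still have to be removed by hand.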

Platform services (from `modules/kubernetes/main.tf`):

```hcl
# stacks/platform/main.tf

# Variables: declare every variable the platform services need.
# A single-line HCL block may hold only one argument (no `;` separators),
# so variables with several arguments are written out in full.
variable "kube_config_path" { default = "~/.kube/config" }
variable "tls_secret_name" {}
variable "prod" { default = false }

# dbaas vars
variable "dbaas_root_password" {}
variable "dbaas_postgresql_root_password" {}
variable "dbaas_pgadmin_password" {}

# traefik vars
variable "ingress_crowdsec_api_key" {}

# technitium vars
variable "technitium_db_password" {}
variable "homepage_credentials" { type = map(any) }

# headscale vars
variable "headscale_config" {}
variable "headscale_acl" {}

# authentik vars
variable "authentik_secret_key" {}
variable "authentik_postgres_password" {}
variable "k8s_users" {
  type    = map(any)
  default = {}
}
variable "ssh_private_key" {
  type      = string
  default   = ""
  sensitive = true
}

# crowdsec vars
variable "crowdsec_enroll_key" { type = string }
variable "crowdsec_db_password" { type = string }
variable "crowdsec_dash_api_key" { type = string }
variable "crowdsec_dash_machine_id" { type = string }
variable "crowdsec_dash_machine_password" { type = string }
variable "alertmanager_slack_api_url" {}

# cloudflared vars
variable "cloudflare_api_key" {}
variable "cloudflare_email" {}
variable "cloudflare_account_id" {}
variable "cloudflare_zone_id" {}
variable "cloudflare_tunnel_id" {}
variable "public_ip" {}
variable "cloudflare_proxied_names" {}
variable "cloudflare_non_proxied_names" {}
variable "cloudflare_tunnel_token" {}

# monitoring vars
variable "alertmanager_account_password" {}
variable "idrac_username" { default = "" }
variable "idrac_password" { default = "" }
variable "tiny_tuya_service_secret" { type = string }
variable "haos_api_token" { type = string }
variable "pve_password" { type = string }
variable "grafana_db_password" { type = string }
variable "grafana_admin_password" { type = string }

# vaultwarden vars
variable "vaultwarden_smtp_password" {}

# reverse-proxy vars (homepage tokens are in homepage_credentials)

# wireguard vars
variable "wireguard_wg_0_conf" {}
variable "wireguard_wg_0_key" {}
variable "wireguard_firewall_sh" {}

# xray vars
variable "xray_reality_clients" { type = list(map(string)) }
variable "xray_reality_private_key" { type = string }
variable "xray_reality_short_ids" { type = list(string) }

# nvidia vars (none beyond tls_secret_name + tier)

# mailserver vars
variable "mailserver_accounts" {}
variable "mailserver_aliases" {}
variable "mailserver_opendkim_key" {}
variable "mailserver_sasl_passwd" {}
variable "mailserver_roundcubemail_db_password" { type = string }

# infra-maintenance vars
variable "webhook_handler_git_user" {}
variable "webhook_handler_git_token" {}
variable "technitium_username" {}
variable "technitium_password" {}

# uptime-kuma (no extra vars)
# metrics-server (no extra vars)
# kyverno (no extra vars)

locals {
  tiers = {
    core    = "0-core"
    cluster = "1-cluster"
    gpu     = "2-gpu"
    edge    = "3-edge"
    aux     = "4-aux"
  }
}

# --- Core Services (no dependencies, deployed first) ---

module "metallb" {
  source = "../../modules/kubernetes/metallb"
  tier   = local.tiers.core
}

module "dbaas" {
  source                   = "../../modules/kubernetes/dbaas"
  prod                     = var.prod
  tls_secret_name          = var.tls_secret_name
  dbaas_root_password      = var.dbaas_root_password
  postgresql_root_password = var.dbaas_postgresql_root_password
  pgadmin_password         = var.dbaas_pgadmin_password
  tier                     = local.tiers.cluster
}

module "redis" {
  source          = "../../modules/kubernetes/redis"
  tls_secret_name = var.tls_secret_name
  tier            = local.tiers.cluster
}

module "traefik" {
  source           = "../../modules/kubernetes/traefik"
  tier             = local.tiers.core
  crowdsec_api_key = var.ingress_crowdsec_api_key
  tls_secret_name  = var.tls_secret_name
}

module "technitium" {
  source                 = "../../modules/kubernetes/technitium"
  tls_secret_name        = var.tls_secret_name
  homepage_token         = var.homepage_credentials["technitium"]["token"]
  technitium_db_password = var.technitium_db_password
  tier                   = local.tiers.core
}

module "headscale" {
  source           = "../../modules/kubernetes/headscale"
  tls_secret_name  = var.tls_secret_name
  headscale_config = var.headscale_config
  headscale_acl    = var.headscale_acl
  tier             = local.tiers.core
}

module "authentik" {
  source            = "../../modules/kubernetes/authentik"
  tier              = local.tiers.cluster
  tls_secret_name   = var.tls_secret_name
  secret_key        = var.authentik_secret_key
  postgres_password = var.authentik_postgres_password
}

module "rbac" {
  source          = "../../modules/kubernetes/rbac"
  tier            = local.tiers.cluster
  tls_secret_name = var.tls_secret_name
  k8s_users       = var.k8s_users
  ssh_private_key = var.ssh_private_key
}

module "k8s-portal" {
  source          = "../../modules/kubernetes/k8s-portal"
  tier            = local.tiers.edge
  tls_secret_name = var.tls_secret_name
}

module "crowdsec" {
  source                         = "../../modules/kubernetes/crowdsec"
  tier                           = local.tiers.cluster
  tls_secret_name                = var.tls_secret_name
  homepage_username              = var.homepage_credentials["crowdsec"]["username"]
  homepage_password              = var.homepage_credentials["crowdsec"]["password"]
  enroll_key                     = var.crowdsec_enroll_key
  db_password                    = var.crowdsec_db_password
  crowdsec_dash_api_key          = var.crowdsec_dash_api_key
  crowdsec_dash_machine_id       = var.crowdsec_dash_machine_id
  crowdsec_dash_machine_password = var.crowdsec_dash_machine_password
  slack_webhook_url              = var.alertmanager_slack_api_url
}

module "cloudflared" {
  source                       = "../../modules/kubernetes/cloudflared"
  tier                         = local.tiers.core
  tls_secret_name              = var.tls_secret_name
  cloudflare_api_key           = var.cloudflare_api_key
  cloudflare_email             = var.cloudflare_email
  cloudflare_account_id        = var.cloudflare_account_id
  cloudflare_zone_id           = var.cloudflare_zone_id
  cloudflare_tunnel_id         = var.cloudflare_tunnel_id
  public_ip                    = var.public_ip
  cloudflare_proxied_names     = var.cloudflare_proxied_names
  cloudflare_non_proxied_names = var.cloudflare_non_proxied_names
  cloudflare_tunnel_token      = var.cloudflare_tunnel_token
}

module "monitoring" {
  source                        = "../../modules/kubernetes/monitoring"
  tls_secret_name               = var.tls_secret_name
  alertmanager_account_password = var.alertmanager_account_password
  idrac_username                = var.idrac_username
  idrac_password                = var.idrac_password
  alertmanager_slack_api_url    = var.alertmanager_slack_api_url
  tiny_tuya_service_secret      = var.tiny_tuya_service_secret
  haos_api_token                = var.haos_api_token
  pve_password                  = var.pve_password
  grafana_db_password           = var.grafana_db_password
  grafana_admin_password        = var.grafana_admin_password
  tier                          = local.tiers.cluster
}

module "vaultwarden" {
  source          = "../../modules/kubernetes/vaultwarden"
  tls_secret_name = var.tls_secret_name
  smtp_password   = var.vaultwarden_smtp_password
  tier            = local.tiers.edge
}

module "reverse-proxy" {
  source                 = "../../modules/kubernetes/reverse_proxy"
  tls_secret_name        = var.tls_secret_name
  truenas_homepage_token = var.homepage_credentials["reverse_proxy"]["truenas_token"]
  pfsense_homepage_token = var.homepage_credentials["reverse_proxy"]["pfsense_token"]
}

module "metrics-server" {
  source          = "../../modules/kubernetes/metrics-server"
  tier            = local.tiers.cluster
  tls_secret_name = var.tls_secret_name
}

module "nvidia" {
  source          = "../../modules/kubernetes/nvidia"
  tls_secret_name = var.tls_secret_name
  tier            = local.tiers.gpu
}

module "kyverno" {
  source = "../../modules/kubernetes/kyverno"
}

module "uptime-kuma" {
  source          = "../../modules/kubernetes/uptime-kuma"
  tls_secret_name = var.tls_secret_name
  tier            = local.tiers.cluster
}

module "wireguard" {
  source          = "../../modules/kubernetes/wireguard"
  tls_secret_name = var.tls_secret_name
  wg_0_conf       = var.wireguard_wg_0_conf
  wg_0_key        = var.wireguard_wg_0_key
  firewall_sh     = var.wireguard_firewall_sh
  tier            = local.tiers.core
}

module "xray" {
  source                   = "../../modules/kubernetes/xray"
  tls_secret_name          = var.tls_secret_name
  tier                     = local.tiers.core
  xray_reality_clients     = var.xray_reality_clients
  xray_reality_private_key = var.xray_reality_private_key
  xray_reality_short_ids   = var.xray_reality_short_ids
}

module "mailserver" {
  source                  = "../../modules/kubernetes/mailserver"
  tls_secret_name         = var.tls_secret_name
  mailserver_accounts     = var.mailserver_accounts
  postfix_account_aliases = var.mailserver_aliases
  opendkim_key            = var.mailserver_opendkim_key
  sasl_passwd             = var.mailserver_sasl_passwd
  roundcube_db_password   = var.mailserver_roundcubemail_db_password
  tier                    = local.tiers.edge
}

module "infra-maintenance" {
  source              = "../../modules/kubernetes/infra-maintenance"
  git_user            = var.webhook_handler_git_user
  git_token           = var.webhook_handler_git_token
  technitium_username = var.technitium_username
  technitium_password = var.technitium_password
}

# --- OUTPUTS (consumed by service stacks via Terragrunt dependency) ---

output "tls_secret_name" { value = var.tls_secret_name }
output "redis_host" { value = "redis.redis.svc.cluster.local" }
output "postgresql_host" { value = "postgresql.dbaas.svc.cluster.local" }
output "postgresql_port" { value = 5432 }
output "mysql_host" { value = "mysql.dbaas.svc.cluster.local" }
output "mysql_port" { value = 3306 }
output "smtp_host" { value = "mail.viktorbarzin.me" }
output "smtp_port" { value = 587 }
```

**Step 3: Verify init succeeds**

Run:
```bash
(cd stacks/platform && terragrunt init)
```

**Step 4: Commit**

```bash
git add stacks/platform/
git commit -m "[ci skip] Add platform stack (core services)"
```

---

## Task 6: Migrate Platform Stack State

**CRITICAL: Largest state migration. Backup first.**

**Step 1: Backup**

```bash
cp terraform.tfstate terraform.tfstate.backup-pre-platform
```

**Step 2: Move core service resources**

The resources are currently at `module.kubernetes_cluster.module.[""]` (the `for_each` key). Services without `for_each` are at `module.kubernetes_cluster.module.`.

Run `terraform state mv` for each platform service, from the repo root. Example pattern:
```bash
mkdir -p state/platform

# Services WITH for_each (note the ["key"] suffix):
terraform state mv \
  -state=terraform.tfstate \
  -state-out=state/platform/terraform.tfstate \
  'module.kubernetes_cluster.module.redis["redis"]' \
  'module.redis'

terraform state mv \
  -state=terraform.tfstate \
  -state-out=state/platform/terraform.tfstate \
  'module.kubernetes_cluster.module.traefik["traefik"]' \
  'module.traefik'

# Services WITHOUT for_each:
terraform state mv \
  -state=terraform.tfstate \
  -state-out=state/platform/terraform.tfstate \
  'module.kubernetes_cluster.module.metallb' \
  'module.metallb'

terraform state mv \
  -state=terraform.tfstate \
  -state-out=state/platform/terraform.tfstate \
  'module.kubernetes_cluster.module.dbaas' \
  'module.dbaas'

terraform state mv \
  -state=terraform.tfstate \
  -state-out=state/platform/terraform.tfstate \
  'module.kubernetes_cluster.module.cloudflared' \
  'module.cloudflared'

terraform state mv \
  -state=terraform.tfstate \
  -state-out=state/platform/terraform.tfstate \
  'module.kubernetes_cluster.module.infra-maintenance' \
  'module.infra-maintenance'

terraform state mv \
  -state=terraform.tfstate \
  -state-out=state/platform/terraform.tfstate \
  'module.kubernetes_cluster.module.reverse-proxy["reverse-proxy"]' \
  'module.reverse-proxy'
```

Repeat for all platform services. To see which ones use `for_each`, check the state list:
```bash
terraform state list | grep 'module.kubernetes_cluster.module' | sort
```

Addresses of services that use `for_each` end in a `["key"]` suffix; the others do not.
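
Working out source and destination addresses by hand for ~20 services is error-prone. The sketch below derives them from `terraform state list` output. It prints the `state mv` commands for review instead of running them.

The function names are hypothetical. The sketch assumes module names contain only letters, digits, `_` and `-`.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for Task 6, Step 2.
#
# module_move_pairs: read `terraform state list` output on stdin and print one
# "<source address><TAB><destination address>" line per module nested under
# module.kubernetes_cluster, e.g.
#   module.kubernetes_cluster.module.redis["redis"] -> module.redis
#   module.kubernetes_cluster.module.metallb        -> module.metallb
# Resources that sit directly in the wrapper (null_resource.core_services)
# are skipped.
module_move_pairs() {
  grep -oE '^module\.kubernetes_cluster\.module\.[A-Za-z0-9_-]+(\["[^"]+"\])?' |
    sort -u |
    awk '{
      dst = $0
      sub(/^module\.kubernetes_cluster\./, "", dst)  # drop the wrapper module
      sub(/\[".*"\]$/, "", dst)                       # drop the for_each key
      printf "%s\t%s\n", $0, dst
    }'
}

# print_state_mv_commands: turn the pairs into `terraform state mv` commands.
# It only prints them; nothing touches the state.
print_state_mv_commands() {
  local state_out=$1 src dst tab
  tab=$(printf '\t')
  while IFS="$tab" read -r src dst; do
    printf "terraform state mv -state=terraform.tfstate -state-out=%s '%s' '%s'\n" \
      "$state_out" "$src" "$dst"
  done
}
```

A possible flow:
1. Run `terraform state list | module_move_pairs | grep -E 'module\.(metallb|redis|traefik)$' | print_state_mv_commands state/platform/terraform.tfstate > /tmp/platform-moves.sh`. Extend the `grep` to the full platform service list.
2. Read `/tmp/platform-moves.sh`.
3. Run it with `bash`.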

**Step 3: Drop null_resource.core_services from state**

```bash
# Remove this resource instead of moving it; Step 5 also deletes it from the config
terraform state rm 'module.kubernetes_cluster.null_resource.core_services'
```

**Step 4: Verify platform state**

Run:
```bash
(cd stacks/platform && terragrunt plan)
```
Expected: no resource changes, or only expected diffs from the removed `for_each` wrappers.

The plan will also list the new outputs under `Changes to Outputs`. This happens because `terraform state mv` copies resources but not root-module outputs. Service stacks read these outputs through their `dependency "platform"` blocks, and they fail while the outputs are missing.

Once the plan contains only these expected changes, apply it once with `(cd stacks/platform && terragrunt apply)`. This writes the outputs to the platform state.

**Step 5: Remove platform services from modules/kubernetes/main.tf**

Remove the module blocks for all services that moved to the platform stack. Also remove `null_resource.core_services` and the `defcon_modules`/`active_modules` locals that reference these modules.

**Step 6: Verify legacy state**

Run:
```bash
terraform plan -var="kube_config_path=$(pwd)/config"
```
Expected: no changes for the remaining services.

**Step 7: Commit**

```bash
git add main.tf modules/kubernetes/main.tf stacks/platform/
git commit -m "[ci skip] Migrate platform stack (core services) to Terragrunt"
```

---

## Task 7: Create Simple Service Stack Template + Migrate First Service (blog)

**Files:**
- Create: `stacks/blog/terragrunt.hcl`
- Create: `stacks/blog/main.tf`

**Step 1: Write blog terragrunt.hcl**

```hcl
# stacks/blog/terragrunt.hcl
include "root" {
  path = find_in_parent_folders()
}

dependency "platform" {
  config_path = "../platform"
}

inputs = {
  tls_secret_name = dependency.platform.outputs.tls_secret_name
}
```

**Step 2: Write blog main.tf**

```hcl
# stacks/blog/main.tf
variable "tls_secret_name" {}
variable "kube_config_path" { default = "~/.kube/config" }

module "blog" {
  source          = "../../modules/kubernetes/blog"
  tls_secret_name = var.tls_secret_name
  tier            = "4-aux"
}
```

**Step 3: Move blog state**

```bash
mkdir -p state/blog
terraform state mv \
  -state=terraform.tfstate \
  -state-out=state/blog/terraform.tfstate \
  'module.kubernetes_cluster.module.blog["blog"]' \
  'module.blog'
```

**Step 4: Verify**

```bash
(cd stacks/blog && terragrunt plan)
```
Expected: `No changes.`

**Step 5: Remove blog from modules/kubernetes/main.tf**

Delete the `module "blog" { ... }` block (lines 197-205).

**Step 6: Commit**

```bash
git add stacks/blog/ modules/kubernetes/main.tf
git commit -m "[ci skip] Migrate blog to Terragrunt stack"
```

---

## Task 8: Batch-Migrate Remaining Simple Services

Simple services only need `tls_secret_name`, plus possibly a few non-DB variables. They follow the same pattern as blog.

**Simple services to migrate** (one stack each):
- echo, privatebin, excalidraw, city-guesser, dashy, travel_blog, jsoncrack, cyberchef, stirling-pdf, networking-toolbox, meshcentral, ntfy, plotting-book, reloader, descheduler, homepage, tor-proxy, forgejo, freshrss, navidrome, audiobookshelf, ebook2audiobook, whisper, frigate, matrix, changedetection, isponsorblocktv

**Services with a few extra variables** (still no DB host refs):
- shadowsocks (password), kms, hackmd (db_password), drone (github creds, rpc_secret), diun (nfty_token, slack_url), calibre (homepage creds), owntracks (credentials), webhook_handler (many tokens), coturn (turn_secret, public_ip), wealthfolio (password_hash), actualbudget (credentials), servarr (aiostreams), onlyoffice (db_password, jwt_token), xray (reality vars), tuya-bridge (api keys), openclaw (ssh_key, api keys), f1-stream (turn_secret), paperless-ngx (db_password), freedify (credentials), netbox

For each service:
1. Create `stacks//terragrunt.hcl`: include root, dependency on platform, inputs from platform outputs
2. Create `stacks//main.tf`: variable declarations + module call with `source = "../../modules/kubernetes/"`
3. Move its state with `terraform state mv` from the legacy state
4. Remove its module block from `modules/kubernetes/main.tf`
5. Verify with `terragrunt plan`

**Automation script** (run for each simple service):
```bash
#!/bin/bash
# Usage: ./migrate-service.sh
# Example: ./migrate-service.sh echo echo echo 3-edge

SERVICE=$1
SOURCE_DIR=${2:-$1}
FOR_EACH_KEY=${3:-$1}
TIER=${4:-4-aux}

mkdir -p "stacks/$SERVICE"

cat > "stacks/$SERVICE/terragrunt.hcl" <<'TGEOF'
include "root" {
  path = find_in_parent_folders()
}

dependency "platform" {
  config_path = "../platform"
}

inputs = {
  tls_secret_name = dependency.platform.outputs.tls_secret_name
}
TGEOF

cat > stacks/$SERVICE/main.tf < to Terragrunt stacks"
```

---

## Task 9: Modify Service Modules to Accept Host Variables

**17 modules (listed below) need modification** to replace hardcoded DNS names with variables.

For each module, the change is mechanical:
1. Add `variable "redis_host" { type = string }` (and/or `postgresql_host`, etc.)
2. Replace the hardcoded string with `var.redis_host`

**Modules to modify and their needed variables:**

| Module | Add variables | Replace in |
|--------|---------------|------------|
| affine | redis_host, postgresql_host, postgresql_port, smtp_host, smtp_port | main.tf:25,29,50-64 |
| immich | redis_host, postgresql_host | main.tf:80,96 |
| nextcloud | redis_host, mysql_host | chart_values.yaml:31,37 |
| grampsweb | redis_host, smtp_host, smtp_port | main.tf:37,41,45,57 |
| dawarich | redis_host, postgresql_host | main.tf:75,79,147 |
| send | redis_host | main.tf:75 |
| linkwarden | postgresql_host, postgresql_port | main.tf:67 |
| n8n | postgresql_host | main.tf:56 |
| health | postgresql_host, postgresql_port | main.tf:54 |
| tandoor | postgresql_host, smtp_host, smtp_port | main.tf:66,98 |
| rybbit | postgresql_host | main.tf:162 |
| netbox | postgresql_host | main.tf:73 |
| speedtest | mysql_host | main.tf:85 |
| real-estate-crawler | redis_host, mysql_host | main.tf:140,153,157,301,305,309,401,405,409 |
| ytdlp | redis_host, ollama_host | main.tf:241,255 |
| resume | smtp_host, smtp_port | main.tf:186 |
| monitoring | mysql_host, smtp_host | grafana_chart_values.yaml:51, prometheus_chart_values.tpl:35,37 |

For nextcloud and monitoring, the hostnames live in Helm values files rather than `main.tf`.
- If the module already renders the file with `templatefile()`, add the host to the template's variable map and reference it in the file as `${redis_host}` (etc.).
- If the module reads the file with `file()`, switch it to `templatefile()` first.

**Example modification for affine:**

In `modules/kubernetes/affine/main.tf`, add variables:
```hcl
variable "redis_host" { type = string }
variable "postgresql_host" { type = string }
variable "postgresql_port" { type = number }
variable "smtp_host" { type = string }
variable "smtp_port" { type = number }
```

Replace:
```hcl
# Before:
DATABASE_URL = "postgresql://postgres:${var.postgresql_password}@postgresql.dbaas.svc.cluster.local:5432/affine"
# After:
DATABASE_URL = "postgresql://postgres:${var.postgresql_password}@${var.postgresql_host}:${var.postgresql_port}/affine"

# Before:
REDIS_SERVER_HOST = "redis.redis.svc.cluster.local"
# After:
REDIS_SERVER_HOST = var.redis_host
```

**Step: Commit each module modification**

```bash
git add modules/kubernetes//
git commit -m "[ci skip] Accept host variables in module"
```

---

## Task 10: Migrate Database-Backed Services to Terragrunt Stacks

Once the modules accept host variables (Task 9), create stacks that wire the platform outputs into the module inputs.
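
Each DB-backed stack's `terragrunt.hcl` differs only in which platform outputs it forwards. Below is a minimal generator sketch. `render_stack_terragrunt_hcl` is hypothetical. It always forwards `tls_secret_name` and does not align the `=` signs.

```bash
#!/usr/bin/env bash
# Hypothetical generator for a DB-backed stack's terragrunt.hcl. Arguments are
# the platform outputs the module needs (e.g. redis_host postgresql_host).
# tls_secret_name is always forwarded, so do not pass it again.
render_stack_terragrunt_hcl() {
  local name
  cat <<'EOF'
include "root" {
  path = find_in_parent_folders()
}

dependency "platform" {
  config_path = "../platform"
}

inputs = {
EOF
  for name in tls_secret_name "$@"; do
    printf '  %s = dependency.platform.outputs.%s\n' "$name" "$name"
  done
  printf '}\n'
}
```

Running `render_stack_terragrunt_hcl redis_host postgresql_host postgresql_port smtp_host smtp_port > stacks/affine/terragrunt.hcl` produces the same content as the affine example below, apart from alignment. Every name passed in must exist as an output in `stacks/platform/main.tf`.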

**Example: stacks/affine/terragrunt.hcl**

```hcl
include "root" {
  path = find_in_parent_folders()
}

dependency "platform" {
  config_path = "../platform"
}

inputs = {
  tls_secret_name = dependency.platform.outputs.tls_secret_name
  redis_host      = dependency.platform.outputs.redis_host
  postgresql_host = dependency.platform.outputs.postgresql_host
  postgresql_port = dependency.platform.outputs.postgresql_port
  smtp_host       = dependency.platform.outputs.smtp_host
  smtp_port       = dependency.platform.outputs.smtp_port
}
```

**stacks/affine/main.tf:**

```hcl
variable "tls_secret_name" {}
variable "kube_config_path" { default = "~/.kube/config" }
variable "affine_postgresql_password" {}
variable "redis_host" { type = string }
variable "postgresql_host" { type = string }
variable "postgresql_port" { type = number }
variable "smtp_host" { type = string }
variable "smtp_port" { type = number }

module "affine" {
  source              = "../../modules/kubernetes/affine"
  tls_secret_name     = var.tls_secret_name
  postgresql_password = var.affine_postgresql_password
  redis_host          = var.redis_host
  postgresql_host     = var.postgresql_host
  postgresql_port     = var.postgresql_port
  smtp_host           = var.smtp_host
  smtp_port           = var.smtp_port
  tier                = "4-aux"
}
```

State migration follows the same pattern as Task 7.

Repeat for all DB-backed services in the Task 9 table, except monitoring, which stays in the platform stack.

---

## Task 11: Migrate Service-to-Service Dependencies

Services that depend on other non-platform services need multi-dependency stacks.
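
Terragrunt derives the apply order from the `dependency` blocks itself, so this is only a way to sanity-check the graph on paper. The sketch uses `tsort`, which topologically sorts whitespace-separated "before after" pairs. The edge list is an illustrative assumption assembled from Tasks 5 and 11, not a complete inventory of stacks.

```bash
#!/usr/bin/env bash
# Illustrative edge list: "A B" means stack A must be applied before stack B.
# Taken from Task 5 (platform depends on infra) and Task 11; every service
# stack also depends on platform.
STACK_EDGES='
infra platform
platform ollama
platform openclaw
ollama openclaw
platform coturn
platform f1-stream
coturn f1-stream
platform osm-routing
platform real-estate-crawler
osm-routing real-estate-crawler
'

# Print one stack per line in an order that respects every edge.
# tsort reports a cycle on stderr if the edges contain one.
stack_apply_order() {
  printf '%s\n' "$STACK_EDGES" | tsort
}
```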

**Step 1: Create ollama stack with outputs**

```hcl
# stacks/ollama/main.tf
variable "tls_secret_name" {}
variable "kube_config_path" { default = "~/.kube/config" }
variable "ollama_api_credentials" {}

module "ollama" {
  source                 = "../../modules/kubernetes/ollama"
  tls_secret_name        = var.tls_secret_name
  tier                   = "2-gpu"
  ollama_api_credentials = var.ollama_api_credentials
}

output "ollama_host" {
  value = "ollama.ollama.svc.cluster.local"
}
```

The ollama stack's `terragrunt.hcl` matches blog's (Task 7). After moving its state, apply it once so `ollama_host` is written to its state. The reason is the same as for the platform outputs in Task 6.

**Step 2: Create openclaw stack with ollama dependency**

```hcl
# stacks/openclaw/terragrunt.hcl
include "root" {
  path = find_in_parent_folders()
}

dependency "platform" {
  config_path = "../platform"
}

dependency "ollama" {
  config_path = "../ollama"
}

inputs = {
  tls_secret_name = dependency.platform.outputs.tls_secret_name
  ollama_host     = dependency.ollama.outputs.ollama_host
}
```

**Step 3: Similarly for coturn → f1-stream and osm-routing → real-estate-crawler**

---

## Task 12: Final Cleanup

**Step 1: Remove legacy modules/kubernetes/main.tf**

After all services are migrated, this file should be empty or contain only commented-out blocks. Delete it.

**Step 2: Remove kubernetes_cluster module call from root main.tf**

The root `main.tf` should now contain only two things:
- provider blocks, which can also be removed because Terragrunt generates them;
- the `variable` declarations used for `terraform.tfvars` loading.

**Step 3: Archive legacy state**

```bash
mv terraform.tfstate terraform.tfstate.legacy
mkdir -p state/backups
mv terraform.tfstate.backup-* state/backups/
```

**Step 4: Verify full DAG**

```bash
(cd stacks && terragrunt run-all plan)
```
Expected: all stacks show `No changes.`

**Step 5: Update CLAUDE.md**

Update the knowledge file to reflect the new Terragrunt architecture, commands, and workflow.
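
Step 6 stages everything with `git add -A`. This can pick up state files that no ignore rule covers:
- Step 3 leaves `terraform.tfstate.legacy` in the repo root.
- Terraform's state commands write backup files next to the state file.

Below is a minimal pre-commit check. `staged_state_files` is hypothetical and only pattern-matches file names.

```bash
#!/usr/bin/env bash
# Hypothetical guard: read file paths (one per line) on stdin and print any
# that look like Terraform state. This covers terraform.tfstate plus any
# suffix starting with "." or "-" (e.g. .legacy, .backup-pre-platform), and
# anything under state/.
# Exit status follows grep: 0 if something matched, 1 if nothing did.
staged_state_files() {
  grep -E '(^|/)terraform\.tfstate([.-].*)?$|^state/'
}
```

To use it, run `git diff --cached --name-only | staged_state_files` after `git add -A`. If it prints anything, unstage those paths (`git restore --staged <path>`) before committing.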

**Step 6: Final commit**

```bash
git add -A
git commit -m "[ci skip] Complete Terragrunt migration — remove legacy monolith"
```

---

## Task 13: Update CI/CD (Drone Pipeline)

**Files:**
- Modify: `.drone.yml`

Create a Drone pipeline that:
1. Detects changed files
2. Maps to affected stacks
3. Runs `terragrunt plan` (on PR) or `terragrunt apply` (on master merge)

See design doc section "CI/CD: Changed-Stack Detection" for the pipeline logic.

---

## Execution Order Summary

| Task | Phase | Risk | Reversible |
|------|-------|------|------------|
| 1. Install + skeleton | 0 | None | Yes (delete dirs) |
| 2. Root terragrunt.hcl | 0 | None | Yes (delete file) |
| 3. Infra stack files | 1 | None | Yes (delete stack) |
| 4. Infra state migration | 1 | Medium | Yes (state mv back) |
| 5. Platform stack files | 2 | None | Yes (delete stack) |
| 6. Platform state migration | 2 | High | Yes (state mv back) |
| 7. First simple service (blog) | 3 | Low | Yes (state mv back) |
| 8. Batch simple services | 3 | Low | Yes (state mv back) |
| 9. Module host variable mods | 4 | Low | Yes (revert changes) |
| 10. DB service stacks | 4 | Low | Yes (state mv back) |
| 11. Service-to-service deps | 5 | Low | Yes (state mv back) |
| 12. Final cleanup | 6 | Medium | Harder to reverse |
| 13. CI/CD update | 6 | Low | Yes (revert .drone.yml) |
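
Task 13 leaves the detection logic to the design doc. The sketch below illustrates its "map changed files to affected stacks" step; it is not the design doc's algorithm. `changed_stacks` is a hypothetical name, and the sketch assumes:
- it runs from the repo root;
- each stack is `stacks/<name>/main.tf`;
- stacks reference modules with `../../modules/...` paths, as in Tasks 3 to 11.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: read changed paths (one per line, relative to the repo
# root) on stdin and print each affected stack name once.
changed_stacks() {
  local path dir hits d f
  while IFS= read -r path; do
    case $path in
      terragrunt.hcl | terraform.tfvars)
        # Root config and shared variables feed every stack.
        for d in stacks/*/; do
          [ -d "$d" ] && basename "$d"
        done
        ;;
      stacks/*/*)
        # A file inside stacks/<name>/ affects that stack only.
        path=${path#stacks/}
        echo "${path%%/*}"
        ;;
      modules/*)
        # Walk up from the file's directory until some stack's main.tf
        # references it as ../../<dir>" or ../../<dir>/. Reaching
        # modules/kubernetes matches every k8s stack, which errs on the
        # side of planning too much rather than too little.
        dir=$(dirname "$path")
        while [ "$dir" != "modules" ] && [ "$dir" != "." ]; do
          hits=$(grep -lF -e "../../$dir\"" -e "../../$dir/" stacks/*/main.tf 2>/dev/null || true)
          if [ -n "$hits" ]; then
            printf '%s\n' "$hits" | while IFS= read -r f; do
              basename "$(dirname "$f")"
            done
            break
          fi
          dir=$(dirname "$dir")
        done
        ;;
    esac
  done | sort -u
}
```

In a pipeline step, something like `git diff --name-only origin/master...HEAD | changed_stacks` yields the stack list. Loop over it, running `(cd "stacks/$s" && terragrunt plan)` on PRs and `terragrunt apply` on master.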