37 KiB
Terragrunt Migration Implementation Plan
For Claude: REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
Goal: Migrate the monolithic Terraform setup (857 resources, 15MB state) to Terragrunt with per-service state isolation, proper DAG dependencies, and changed-stack CI/CD detection.
Architecture: Flat stacks under stacks/ with thin main.tf wrappers calling existing modules. Root terragrunt.hcl provides DRY provider/backend config. Platform stack groups ~20 core services and exports outputs (redis_host, postgresql_host, etc.) consumed by ~65 per-service stacks via Terragrunt dependency blocks.
Tech Stack: Terragrunt, Terraform 1.14.x, local state backend, Drone CI
Design Doc: docs/plans/2026-02-22-terragrunt-migration-design.md
Task 1: Install Terragrunt and Create Directory Skeleton
Files:
- Create:
stacks/directory - Create:
state/directory - Create:
.gitignoreupdates
Step 1: Install Terragrunt
Run:
brew install terragrunt
Expected: Terragrunt available at terragrunt --version
Step 2: Create directory skeleton
Run:
mkdir -p stacks/{infra,platform}
mkdir -p state
Step 3: Update .gitignore
Add to .gitignore:
# Terragrunt
.terragrunt-cache/
state/
The state/ directory contains per-stack terraform state files. These are local-only and should not be committed (they contain resource IDs and potentially sensitive data, same as the current terraform.tfstate).
Step 4: Commit
git add stacks/ .gitignore
git commit -m "[ci skip] Add Terragrunt directory skeleton"
Task 2: Create Root Terragrunt Configuration
Files:
- Create:
terragrunt.hcl
Step 1: Write root terragrunt.hcl
# Root Terragrunt configuration
# Provides DRY provider, backend, and variable loading for all stacks.
# Each stack gets its own local state file under state/<stack-name>/
remote_state {
backend = "local"
generate = {
path = "backend.tf"
if_exists = "overwrite_terragrunt"
}
config = {
path = "${get_repo_root()}/state/${path_relative_to_include()}/terraform.tfstate"
}
}
# Load terraform.tfvars for all stacks.
# Variables not declared by a stack are silently ignored (Terraform 1.x behavior).
terraform {
extra_arguments "common_vars" {
commands = get_terraform_commands_that_need_vars()
required_var_files = [
"${get_repo_root()}/terraform.tfvars"
]
}
extra_arguments "kube_config" {
commands = get_terraform_commands_that_need_vars()
arguments = [
"-var", "kube_config_path=${get_repo_root()}/config"
]
}
}
# Generate kubernetes + helm providers for K8s stacks.
# The infra stack overrides this to add the proxmox provider.
generate "k8s_providers" {
path = "providers.tf"
if_exists = "overwrite_terragrunt"
contents = <<EOF
variable "kube_config_path" {
type = string
default = "~/.kube/config"
}
provider "kubernetes" {
config_path = var.kube_config_path
}
provider "helm" {
kubernetes {
config_path = var.kube_config_path
}
}
EOF
}
Step 2: Verify Terragrunt parses it
Run:
cd stacks/infra && terragrunt validate-inputs 2>&1 | head -5
Expected: No parse errors (may show warnings about missing main.tf, that's fine)
Step 3: Commit
git add terragrunt.hcl
git commit -m "[ci skip] Add root Terragrunt configuration"
Task 3: Create Infra Stack (Proxmox VMs)
Files:
- Create:
stacks/infra/terragrunt.hcl - Create:
stacks/infra/main.tf
Step 1: Write infra terragrunt.hcl
This stack needs the proxmox provider instead of (or in addition to) the default k8s providers.
# stacks/infra/terragrunt.hcl
include "root" {
path = find_in_parent_folders()
}
# Override provider generation to include proxmox
generate "providers" {
path = "providers.tf"
if_exists = "overwrite_terragrunt"
contents = <<EOF
variable "kube_config_path" {
type = string
default = "~/.kube/config"
}
variable "proxmox_pm_api_url" { type = string }
variable "proxmox_pm_api_token_id" { type = string }
variable "proxmox_pm_api_token_secret" { type = string }
provider "proxmox" {
pm_api_url = var.proxmox_pm_api_url
pm_api_token_id = var.proxmox_pm_api_token_id
pm_api_token_secret = var.proxmox_pm_api_token_secret
pm_tls_insecure = true
}
EOF
}
Step 2: Write infra main.tf
Copy the VM template and docker-registry module calls from the current root main.tf (lines 196-400). These reference ./modules/create-template-vm and ./modules/create-vm — adjust paths to ../../modules/....
# stacks/infra/main.tf
# Proxmox VM templates and docker registry VM
variable "proxmox_host" { type = string }
variable "ssh_private_key" {
type = string
default = ""
}
variable "ssh_public_key" {
type = string
default = ""
}
variable "vm_wizard_password" { type = string }
variable "k8s_join_command" { type = string }
variable "dockerhub_registry_password" { type = string }
locals {
k8s_vm_template = "ubuntu-2404-cloudinit-k8s-template"
k8s_cloud_init_snippet_name = "k8s_cloud_init.yaml"
k8s_cloud_init_image_path = "/var/lib/vz/template/iso/noble-server-cloudimg-amd64-k8s.img"
non_k8s_vm_template = "ubuntu-2404-cloudinit-non-k8s-template"
non_k8s_cloud_init_snippet_name = "non_k8s_cloud_init.yaml"
non_k8s_cloud_init_image_path = "/var/lib/vz/template/iso/noble-server-cloudimg-amd64-non-k8s.img"
cloud_init_image_url = "https://cloud-images.ubuntu.com/noble/current/noble-server-cloudimg-amd64.img"
}
module "k8s-node-template" {
source = "../../modules/create-template-vm"
proxmox_host = var.proxmox_host
proxmox_user = "root"
ssh_private_key = var.ssh_private_key
ssh_public_key = var.ssh_public_key
cloud_image_url = local.cloud_init_image_url
image_path = local.k8s_cloud_init_image_path
template_id = 2000
template_name = local.k8s_vm_template
user_passwd = var.vm_wizard_password
is_k8s_template = true
snippet_name = local.k8s_cloud_init_snippet_name
containerd_config_update_command = <<-EOF
sed -i 's|config_path = ""|config_path = "/etc/containerd/certs.d"|' /etc/containerd/config.toml
mkdir -p /etc/containerd/certs.d/docker.io
printf 'server = "https://registry-1.docker.io"\n\n[host."http://10.0.20.10:5000"]\n capabilities = ["pull", "resolve"]\n' > /etc/containerd/certs.d/docker.io/hosts.toml
mkdir -p /etc/containerd/certs.d/ghcr.io
printf 'server = "https://ghcr.io"\n\n[host."http://10.0.20.10:5010"]\n capabilities = ["pull", "resolve"]\n' > /etc/containerd/certs.d/ghcr.io/hosts.toml
mkdir -p /etc/containerd/certs.d/quay.io
printf 'server = "https://quay.io"\n\n[host."http://10.0.20.10:5020"]\n capabilities = ["pull", "resolve"]\n' > /etc/containerd/certs.d/quay.io/hosts.toml
mkdir -p /etc/containerd/certs.d/registry.k8s.io
printf 'server = "https://registry.k8s.io"\n\n[host."http://10.0.20.10:5030"]\n capabilities = ["pull", "resolve"]\n' > /etc/containerd/certs.d/registry.k8s.io/hosts.toml
mkdir -p /etc/containerd/certs.d/reg.kyverno.io
printf 'server = "https://reg.kyverno.io"\n\n[host."http://10.0.20.10:5040"]\n capabilities = ["pull", "resolve"]\n' > /etc/containerd/certs.d/reg.kyverno.io/hosts.toml
sed -i 's/.*max_concurrent_downloads = 3/max_concurrent_downloads = 20/g' /etc/containerd/config.toml
sudo sed -i '/serializeImagePulls:/d' /var/lib/kubelet/config.yaml && \
sudo sed -i '/maxParallelImagePulls:/d' /var/lib/kubelet/config.yaml && \
echo -e 'serializeImagePulls: false\nmaxParallelImagePulls: 50' | sudo tee -a /var/lib/kubelet/config.yaml
EOF
k8s_join_command = var.k8s_join_command
}
module "non-k8s-node-template" {
source = "../../modules/create-template-vm"
proxmox_host = var.proxmox_host
proxmox_user = "root"
ssh_private_key = var.ssh_private_key
ssh_public_key = var.ssh_public_key
cloud_image_url = local.cloud_init_image_url
image_path = local.non_k8s_cloud_init_image_path
template_id = 1000
template_name = local.non_k8s_vm_template
user_passwd = var.vm_wizard_password
is_k8s_template = false
snippet_name = local.non_k8s_cloud_init_snippet_name
}
module "docker-registry-template" {
source = "../../modules/create-template-vm"
proxmox_host = var.proxmox_host
proxmox_user = "root"
ssh_private_key = var.ssh_private_key
ssh_public_key = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIDHLhYDfyx237eJgOGVoJRECpUS95+7rEBS9vacsIxtx devvm"
cloud_image_url = local.cloud_init_image_url
image_path = local.non_k8s_cloud_init_image_path
template_id = 1001
template_name = "docker-registry-template"
user_passwd = var.vm_wizard_password
is_k8s_template = false
snippet_name = "docker-registry.yaml"
provision_cmds = [
"mkdir -p /etc/docker-registry",
format("echo %s | base64 -d > /etc/docker-registry/config.yml",
base64encode(
templatefile("${path.root}/../../modules/docker-registry/config.yaml", {
password = var.dockerhub_registry_password
})
)
),
# ... (copy remaining provision_cmds from main.tf lines 305-371)
]
}
module "docker-registry-vm" {
source = "../../modules/create-vm"
vmid = 220
vm_cpus = 4
vm_mem_mb = 4196
vm_disk_size = "64G"
template_name = "docker-registry-template"
vm_name = "docker-registry"
cisnippet_name = "docker-registry.yaml"
vm_mac_address = "DE:AD:BE:EF:22:22"
bridge = "vmbr1"
vlan_tag = "20"
ipconfig0 = "ip=10.0.20.10/24,gw=10.0.20.1"
}
Note: The provision_cmds for docker-registry-template is long (~60 lines). Copy it exactly from the current main.tf lines 296-371. The only change is templatefile paths: prefix with ${path.root}/../../ since the working directory is now stacks/infra/.
Step 3: Verify with init (do NOT apply yet)
Run:
cd stacks/infra && terragrunt init
Expected: Successful init, providers downloaded
Step 4: Commit
git add stacks/infra/
git commit -m "[ci skip] Add infra stack (Proxmox VMs)"
Task 4: Migrate Infra Stack State
CRITICAL: This task modifies live state. Take a backup first.
Step 1: Backup current state
Run:
cp terraform.tfstate terraform.tfstate.backup-pre-terragrunt
Step 2: List current infra resources in state
Run:
terraform state list | grep -E '^module\.(k8s-node-template|non-k8s-node-template|docker-registry-template|docker-registry-vm)\.'
Expected: List of ~10 resources belonging to these 4 modules
Step 3: Move resources to new state file
For each resource listed in step 2, run:
mkdir -p state/infra
terraform state mv \
-state=terraform.tfstate \
-state-out=state/infra/terraform.tfstate \
'module.k8s-node-template' 'module.k8s-node-template'
terraform state mv \
-state=terraform.tfstate \
-state-out=state/infra/terraform.tfstate \
'module.non-k8s-node-template' 'module.non-k8s-node-template'
terraform state mv \
-state=terraform.tfstate \
-state-out=state/infra/terraform.tfstate \
'module.docker-registry-template' 'module.docker-registry-template'
terraform state mv \
-state=terraform.tfstate \
-state-out=state/infra/terraform.tfstate \
'module.docker-registry-vm' 'module.docker-registry-vm'
Step 4: Verify no changes in new state
Run:
cd stacks/infra && terragrunt plan
Expected: No changes. Infrastructure is up-to-date.
If there are changes, something went wrong — restore from backup and investigate.
Step 5: Remove infra modules from root main.tf
Remove (or comment out) the module.k8s-node-template, module.non-k8s-node-template, module.docker-registry-template, and module.docker-registry-vm blocks from main.tf (lines 208-400).
Also remove the corresponding locals block (lines 196-206) since they're now in stacks/infra/main.tf.
Step 6: Verify legacy state is clean
Run:
terraform plan -var="kube_config_path=$(pwd)/config"
Expected: No changes (the moved resources are gone from this state but also from main.tf)
Step 7: Commit
git add main.tf stacks/infra/
git commit -m "[ci skip] Migrate infra stack (VMs) to Terragrunt"
Task 5: Create Platform Stack
Files:
- Create:
stacks/platform/terragrunt.hcl - Create:
stacks/platform/main.tf
This is the largest task — it groups ~20 core services into one stack.
Step 1: Write platform terragrunt.hcl
# stacks/platform/terragrunt.hcl
include "root" {
path = find_in_parent_folders()
}
dependency "infra" {
config_path = "../infra"
skip_outputs = true
}
Step 2: Write platform main.tf
This file contains all core/cluster service module calls. Copy each from modules/kubernetes/main.tf, adjusting source paths from "./<service>" to "../../modules/kubernetes/<service>". Remove for_each conditionals (core services are always present). Remove depends_on = [null_resource.core_services].
Platform services (from modules/kubernetes/main.tf):
# stacks/platform/main.tf
# Variables — declare all variables needed by platform services
variable "kube_config_path" { default = "~/.kube/config" }
variable "tls_secret_name" {}
variable "prod" { default = false }
# dbaas vars
variable "dbaas_root_password" {}
variable "dbaas_postgresql_root_password" {}
variable "dbaas_pgadmin_password" {}
# traefik vars
variable "ingress_crowdsec_api_key" {}
# technitium vars
variable "technitium_db_password" {}
variable "homepage_credentials" { type = map(any) }
# headscale vars
variable "headscale_config" {}
variable "headscale_acl" {}
# authentik vars
variable "authentik_secret_key" {}
variable "authentik_postgres_password" {}
variable "k8s_users" { type = map(any); default = {} }
variable "ssh_private_key" { type = string; default = ""; sensitive = true }
# crowdsec vars
variable "crowdsec_enroll_key" { type = string }
variable "crowdsec_db_password" { type = string }
variable "crowdsec_dash_api_key" { type = string }
variable "crowdsec_dash_machine_id" { type = string }
variable "crowdsec_dash_machine_password" { type = string }
variable "alertmanager_slack_api_url" {}
# cloudflared vars
variable "cloudflare_api_key" {}
variable "cloudflare_email" {}
variable "cloudflare_account_id" {}
variable "cloudflare_zone_id" {}
variable "cloudflare_tunnel_id" {}
variable "public_ip" {}
variable "cloudflare_proxied_names" {}
variable "cloudflare_non_proxied_names" {}
variable "cloudflare_tunnel_token" {}
# monitoring vars
variable "alertmanager_account_password" {}
variable "idrac_username" { default = "" }
variable "idrac_password" { default = "" }
variable "tiny_tuya_service_secret" { type = string }
variable "haos_api_token" { type = string }
variable "pve_password" { type = string }
variable "grafana_db_password" { type = string }
variable "grafana_admin_password" { type = string }
# vaultwarden vars
variable "vaultwarden_smtp_password" {}
# reverse-proxy vars (homepage tokens are in homepage_credentials)
# wireguard vars
variable "wireguard_wg_0_conf" {}
variable "wireguard_wg_0_key" {}
variable "wireguard_firewall_sh" {}
# xray vars
variable "xray_reality_clients" { type = list(map(string)) }
variable "xray_reality_private_key" { type = string }
variable "xray_reality_short_ids" { type = list(string) }
# nvidia vars (none beyond tls_secret_name + tier)
# mailserver vars
variable "mailserver_accounts" {}
variable "mailserver_aliases" {}
variable "mailserver_opendkim_key" {}
variable "mailserver_sasl_passwd" {}
variable "mailserver_roundcubemail_db_password" { type = string }
# infra-maintenance vars
variable "webhook_handler_git_user" {}
variable "webhook_handler_git_token" {}
variable "technitium_username" {}
variable "technitium_password" {}
# uptime-kuma (no extra vars)
# metrics-server (no extra vars)
# kyverno (no extra vars)
locals {
tiers = {
core = "0-core"
cluster = "1-cluster"
gpu = "2-gpu"
edge = "3-edge"
aux = "4-aux"
}
}
# --- Core Services (no dependencies, deployed first) ---
module "metallb" {
source = "../../modules/kubernetes/metallb"
tier = local.tiers.core
}
module "dbaas" {
source = "../../modules/kubernetes/dbaas"
prod = var.prod
tls_secret_name = var.tls_secret_name
dbaas_root_password = var.dbaas_root_password
postgresql_root_password = var.dbaas_postgresql_root_password
pgadmin_password = var.dbaas_pgadmin_password
tier = local.tiers.cluster
}
module "redis" {
source = "../../modules/kubernetes/redis"
tls_secret_name = var.tls_secret_name
tier = local.tiers.cluster
}
module "traefik" {
source = "../../modules/kubernetes/traefik"
tier = local.tiers.core
crowdsec_api_key = var.ingress_crowdsec_api_key
tls_secret_name = var.tls_secret_name
}
module "technitium" {
source = "../../modules/kubernetes/technitium"
tls_secret_name = var.tls_secret_name
homepage_token = var.homepage_credentials["technitium"]["token"]
technitium_db_password = var.technitium_db_password
tier = local.tiers.core
}
module "headscale" {
source = "../../modules/kubernetes/headscale"
tls_secret_name = var.tls_secret_name
headscale_config = var.headscale_config
headscale_acl = var.headscale_acl
tier = local.tiers.core
}
module "authentik" {
source = "../../modules/kubernetes/authentik"
tier = local.tiers.cluster
tls_secret_name = var.tls_secret_name
secret_key = var.authentik_secret_key
postgres_password = var.authentik_postgres_password
}
module "rbac" {
source = "../../modules/kubernetes/rbac"
tier = local.tiers.cluster
tls_secret_name = var.tls_secret_name
k8s_users = var.k8s_users
ssh_private_key = var.ssh_private_key
}
module "k8s-portal" {
source = "../../modules/kubernetes/k8s-portal"
tier = local.tiers.edge
tls_secret_name = var.tls_secret_name
}
module "crowdsec" {
source = "../../modules/kubernetes/crowdsec"
tier = local.tiers.cluster
tls_secret_name = var.tls_secret_name
homepage_username = var.homepage_credentials["crowdsec"]["username"]
homepage_password = var.homepage_credentials["crowdsec"]["password"]
enroll_key = var.crowdsec_enroll_key
db_password = var.crowdsec_db_password
crowdsec_dash_api_key = var.crowdsec_dash_api_key
crowdsec_dash_machine_id = var.crowdsec_dash_machine_id
crowdsec_dash_machine_password = var.crowdsec_dash_machine_password
slack_webhook_url = var.alertmanager_slack_api_url
}
module "cloudflared" {
source = "../../modules/kubernetes/cloudflared"
tier = local.tiers.core
tls_secret_name = var.tls_secret_name
cloudflare_api_key = var.cloudflare_api_key
cloudflare_email = var.cloudflare_email
cloudflare_account_id = var.cloudflare_account_id
cloudflare_zone_id = var.cloudflare_zone_id
cloudflare_tunnel_id = var.cloudflare_tunnel_id
public_ip = var.public_ip
cloudflare_proxied_names = var.cloudflare_proxied_names
cloudflare_non_proxied_names = var.cloudflare_non_proxied_names
cloudflare_tunnel_token = var.cloudflare_tunnel_token
}
module "monitoring" {
source = "../../modules/kubernetes/monitoring"
tls_secret_name = var.tls_secret_name
alertmanager_account_password = var.alertmanager_account_password
idrac_username = var.idrac_username
idrac_password = var.idrac_password
alertmanager_slack_api_url = var.alertmanager_slack_api_url
tiny_tuya_service_secret = var.tiny_tuya_service_secret
haos_api_token = var.haos_api_token
pve_password = var.pve_password
grafana_db_password = var.grafana_db_password
grafana_admin_password = var.grafana_admin_password
tier = local.tiers.cluster
}
module "vaultwarden" {
source = "../../modules/kubernetes/vaultwarden"
tls_secret_name = var.tls_secret_name
smtp_password = var.vaultwarden_smtp_password
tier = local.tiers.edge
}
module "reverse-proxy" {
source = "../../modules/kubernetes/reverse_proxy"
tls_secret_name = var.tls_secret_name
truenas_homepage_token = var.homepage_credentials["reverse_proxy"]["truenas_token"]
pfsense_homepage_token = var.homepage_credentials["reverse_proxy"]["pfsense_token"]
}
module "metrics-server" {
source = "../../modules/kubernetes/metrics-server"
tier = local.tiers.cluster
tls_secret_name = var.tls_secret_name
}
module "nvidia" {
source = "../../modules/kubernetes/nvidia"
tls_secret_name = var.tls_secret_name
tier = local.tiers.gpu
}
module "kyverno" {
source = "../../modules/kubernetes/kyverno"
}
module "uptime-kuma" {
source = "../../modules/kubernetes/uptime-kuma"
tls_secret_name = var.tls_secret_name
tier = local.tiers.cluster
}
module "wireguard" {
source = "../../modules/kubernetes/wireguard"
tls_secret_name = var.tls_secret_name
wg_0_conf = var.wireguard_wg_0_conf
wg_0_key = var.wireguard_wg_0_key
firewall_sh = var.wireguard_firewall_sh
tier = local.tiers.core
}
module "xray" {
source = "../../modules/kubernetes/xray"
tls_secret_name = var.tls_secret_name
tier = local.tiers.core
xray_reality_clients = var.xray_reality_clients
xray_reality_private_key = var.xray_reality_private_key
xray_reality_short_ids = var.xray_reality_short_ids
}
module "mailserver" {
source = "../../modules/kubernetes/mailserver"
tls_secret_name = var.tls_secret_name
mailserver_accounts = var.mailserver_accounts
postfix_account_aliases = var.mailserver_aliases
opendkim_key = var.mailserver_opendkim_key
sasl_passwd = var.mailserver_sasl_passwd
roundcube_db_password = var.mailserver_roundcubemail_db_password
tier = local.tiers.edge
}
module "infra-maintenance" {
source = "../../modules/kubernetes/infra-maintenance"
git_user = var.webhook_handler_git_user
git_token = var.webhook_handler_git_token
technitium_username = var.technitium_username
technitium_password = var.technitium_password
}
# --- OUTPUTS (consumed by service stacks via Terragrunt dependency) ---
output "tls_secret_name" { value = var.tls_secret_name }
output "redis_host" { value = "redis.redis.svc.cluster.local" }
output "postgresql_host" { value = "postgresql.dbaas.svc.cluster.local" }
output "postgresql_port" { value = 5432 }
output "mysql_host" { value = "mysql.dbaas.svc.cluster.local" }
output "mysql_port" { value = 3306 }
output "smtp_host" { value = "mail.viktorbarzin.me" }
output "smtp_port" { value = 587 }
Step 3: Verify init succeeds
Run:
cd stacks/platform && terragrunt init
Step 4: Commit
git add stacks/platform/
git commit -m "[ci skip] Add platform stack (core services)"
Task 6: Migrate Platform Stack State
CRITICAL: Largest state migration. Backup first.
Step 1: Backup
cp terraform.tfstate terraform.tfstate.backup-pre-platform
Step 2: Move core service resources
The resources are currently at module.kubernetes_cluster.module.<service>["<key>"] (the for_each key). Services without for_each are at module.kubernetes_cluster.module.<service>.
Run state mv for each platform service. Example pattern:
# Services WITH for_each (note the ["key"] suffix):
terraform state mv \
-state=terraform.tfstate \
-state-out=state/platform/terraform.tfstate \
'module.kubernetes_cluster.module.redis["redis"]' \
'module.redis'
terraform state mv \
-state=terraform.tfstate \
-state-out=state/platform/terraform.tfstate \
'module.kubernetes_cluster.module.traefik["traefik"]' \
'module.traefik'
# Services WITHOUT for_each:
terraform state mv \
-state=terraform.tfstate \
-state-out=state/platform/terraform.tfstate \
'module.kubernetes_cluster.module.metallb' \
'module.metallb'
terraform state mv \
-state=terraform.tfstate \
-state-out=state/platform/terraform.tfstate \
'module.kubernetes_cluster.module.dbaas' \
'module.dbaas'
terraform state mv \
-state=terraform.tfstate \
-state-out=state/platform/terraform.tfstate \
'module.kubernetes_cluster.module.cloudflared' \
'module.cloudflared'
terraform state mv \
-state=terraform.tfstate \
-state-out=state/platform/terraform.tfstate \
'module.kubernetes_cluster.module.infra-maintenance' \
'module.infra-maintenance'
terraform state mv \
-state=terraform.tfstate \
-state-out=state/platform/terraform.tfstate \
'module.kubernetes_cluster.module.reverse-proxy["reverse-proxy"]' \
'module.reverse-proxy'
Repeat for all platform services. Check whether each has for_each by looking at the state list:
terraform state list | grep 'module.kubernetes_cluster.module' | sort
Services with for_each have ["key"] suffix; those without don't.
Step 3: Also move null_resource.core_services
# This resource can be dropped — don't move it, just remove it
terraform state rm 'module.kubernetes_cluster.null_resource.core_services'
Step 4: Verify platform state
Run:
cd stacks/platform && terragrunt plan
Expected: No changes. (or only expected diffs from removed for_each wrappers)
Step 5: Remove platform services from modules/kubernetes/main.tf
Remove the module blocks for all services that moved to the platform stack. Also remove null_resource.core_services and the defcon_modules/active_modules locals that reference these modules.
Step 6: Verify legacy state
Run:
terraform plan -var="kube_config_path=$(pwd)/config"
Expected: No changes for remaining services
Step 7: Commit
git add main.tf modules/kubernetes/main.tf stacks/platform/
git commit -m "[ci skip] Migrate platform stack (core services) to Terragrunt"
Task 7: Create Simple Service Stack Template + Migrate First Service (blog)
Files:
- Create:
stacks/blog/terragrunt.hcl - Create:
stacks/blog/main.tf
Step 1: Write blog terragrunt.hcl
# stacks/blog/terragrunt.hcl
include "root" {
path = find_in_parent_folders()
}
dependency "platform" {
config_path = "../platform"
}
inputs = {
tls_secret_name = dependency.platform.outputs.tls_secret_name
}
Step 2: Write blog main.tf
# stacks/blog/main.tf
variable "tls_secret_name" {}
variable "kube_config_path" { default = "~/.kube/config" }
module "blog" {
source = "../../modules/kubernetes/blog"
tls_secret_name = var.tls_secret_name
tier = "4-aux"
}
Step 3: Move blog state
terraform state mv \
-state=terraform.tfstate \
-state-out=state/blog/terraform.tfstate \
'module.kubernetes_cluster.module.blog["blog"]' \
'module.blog'
Step 4: Verify
cd stacks/blog && terragrunt plan
Expected: No changes.
Step 5: Remove blog from modules/kubernetes/main.tf
Delete the module "blog" { ... } block (lines 197-205).
Step 6: Commit
git add stacks/blog/ modules/kubernetes/main.tf
git commit -m "[ci skip] Migrate blog to Terragrunt stack"
Task 8: Batch-Migrate Remaining Simple Services
Simple services only need tls_secret_name (and possibly a few non-DB variables). These follow the exact same pattern as blog.
Simple services to migrate (one stack each):
- echo, privatebin, excalidraw, city-guesser, dashy, travel_blog, jsoncrack, cyberchef, stirling-pdf, networking-toolbox, meshcentral, ntfy, plotting-book, reloader, descheduler, homepage, tor-proxy, forgejo, freshrss, navidrome, audiobookshelf, ebook2audiobook, whisper, frigate, matrix, changedetection, isponsorblocktv
Services with a few extra variables (still no DB host refs):
- shadowsocks (password), kms, hackmd (db_password), drone (github creds, rpc_secret), diun (nfty_token, slack_url), calibre (homepage creds), owntracks (credentials), webhook_handler (many tokens), coturn (turn_secret, public_ip), wealthfolio (password_hash), actualbudget (credentials), servarr (aiostreams), onlyoffice (db_password, jwt_token), xray (reality vars), tuya-bridge (api keys), openclaw (ssh_key, api keys), f1-stream (turn_secret), paperless-ngx (db_password), freedify (credentials), netbox
For each service, create:
stacks/<service>/terragrunt.hcl— include root, dependency on platform, inputs from platform outputsstacks/<service>/main.tf— variable declarations + module call withsource = "../../modules/kubernetes/<service>"terraform state mvfrom legacy state- Remove module block from
modules/kubernetes/main.tf - Verify with
terragrunt plan
Automation script (run for each simple service):
#!/bin/bash
# Usage: ./migrate-service.sh <service-name> <source-dir> <for-each-key> <tier>
# Example: ./migrate-service.sh echo echo echo 3-edge
SERVICE=$1
SOURCE_DIR=${2:-$1}
FOR_EACH_KEY=${3:-$1}
TIER=${4:-4-aux}
mkdir -p stacks/$SERVICE
cat > stacks/$SERVICE/terragrunt.hcl <<'TGEOF'
include "root" {
path = find_in_parent_folders()
}
dependency "platform" {
config_path = "../platform"
}
inputs = {
tls_secret_name = dependency.platform.outputs.tls_secret_name
}
TGEOF
cat > stacks/$SERVICE/main.tf <<EOF
variable "tls_secret_name" {}
variable "kube_config_path" { default = "~/.kube/config" }
module "$SERVICE" {
source = "../../modules/kubernetes/$SOURCE_DIR"
tls_secret_name = var.tls_secret_name
tier = "$TIER"
}
EOF
# Move state
mkdir -p state/$SERVICE
terraform state mv \
-state=terraform.tfstate \
-state-out=state/$SERVICE/terraform.tfstate \
"module.kubernetes_cluster.module.${SERVICE}[\"${FOR_EACH_KEY}\"]" \
"module.${SERVICE}"
# Verify
cd stacks/$SERVICE && terragrunt plan && cd ../..
Services with extra variables need manual wrapper main.tf files (add the variable declarations and pass them to the module).
Commit after each batch of ~5-10 services:
git add stacks/*/
git commit -m "[ci skip] Migrate <service-list> to Terragrunt stacks"
Task 9: Modify Service Modules to Accept Host Variables
~20 modules need modification to replace hardcoded DNS names with variables.
For each module, the change is mechanical:
- Add
variable "redis_host" { type = string }(and/or postgresql_host, etc.) - Replace the hardcoded string with
var.redis_host
Modules to modify and their needed variables:
| Module | Add variables | Replace in |
|---|---|---|
| affine | redis_host, postgresql_host, postgresql_port, smtp_host, smtp_port | main.tf:25,29,50-64 |
| immich | redis_host, postgresql_host | main.tf:80,96 |
| nextcloud | redis_host, mysql_host | chart_values.yaml:31,37 |
| grampsweb | redis_host, smtp_host, smtp_port | main.tf:37,41,45,57 |
| dawarich | redis_host, postgresql_host | main.tf:75,79,147 |
| send | redis_host | main.tf:75 |
| linkwarden | postgresql_host, postgresql_port | main.tf:67 |
| n8n | postgresql_host | main.tf:56 |
| health | postgresql_host, postgresql_port | main.tf:54 |
| tandoor | postgresql_host, smtp_host, smtp_port | main.tf:66,98 |
| rybbit | postgresql_host | main.tf:162 |
| netbox | postgresql_host | main.tf:73 |
| speedtest | mysql_host | main.tf:85 |
| real-estate-crawler | redis_host, mysql_host | main.tf:140,153,157,301,305,309,401,405,409 |
| ytdlp | redis_host, ollama_host | main.tf:241,255 |
| resume | smtp_host, smtp_port | main.tf:186 |
| monitoring | mysql_host, smtp_host | grafana_chart_values.yaml:51, prometheus_chart_values.tpl:35,37 |
Example modification for affine:
In modules/kubernetes/affine/main.tf, add variables:
variable "redis_host" { type = string }
variable "postgresql_host" { type = string }
variable "postgresql_port" { type = number }
variable "smtp_host" { type = string }
variable "smtp_port" { type = number }
Replace:
# Before:
DATABASE_URL = "postgresql://postgres:${var.postgresql_password}@postgresql.dbaas.svc.cluster.local:5432/affine"
# After:
DATABASE_URL = "postgresql://postgres:${var.postgresql_password}@${var.postgresql_host}:${var.postgresql_port}/affine"
# Before:
REDIS_SERVER_HOST = "redis.redis.svc.cluster.local"
# After:
REDIS_SERVER_HOST = var.redis_host
Step: Commit each module modification
git add modules/kubernetes/<service>/
git commit -m "[ci skip] Accept host variables in <service> module"
Task 10: Migrate Database-Backed Services to Terragrunt Stacks
After modules are modified (Task 9), create stacks that wire platform outputs to module inputs.
Example: stacks/affine/terragrunt.hcl
include "root" {
path = find_in_parent_folders()
}
dependency "platform" {
config_path = "../platform"
}
inputs = {
tls_secret_name = dependency.platform.outputs.tls_secret_name
redis_host = dependency.platform.outputs.redis_host
postgresql_host = dependency.platform.outputs.postgresql_host
postgresql_port = dependency.platform.outputs.postgresql_port
smtp_host = dependency.platform.outputs.smtp_host
smtp_port = dependency.platform.outputs.smtp_port
}
stacks/affine/main.tf:
variable "tls_secret_name" {}
variable "kube_config_path" { default = "~/.kube/config" }
variable "affine_postgresql_password" {}
variable "redis_host" { type = string }
variable "postgresql_host" { type = string }
variable "postgresql_port" { type = number }
variable "smtp_host" { type = string }
variable "smtp_port" { type = number }
module "affine" {
source = "../../modules/kubernetes/affine"
tls_secret_name = var.tls_secret_name
postgresql_password = var.affine_postgresql_password
redis_host = var.redis_host
postgresql_host = var.postgresql_host
postgresql_port = var.postgresql_port
smtp_host = var.smtp_host
smtp_port = var.smtp_port
tier = "4-aux"
}
State migration follows the same pattern as Task 7.
Repeat for all DB-backed services from the table in Task 9.
Task 11: Migrate Service-to-Service Dependencies
Services that depend on other non-platform services need multi-dependency stacks.
Step 1: Create ollama stack with outputs
# stacks/ollama/main.tf
variable "tls_secret_name" {}
variable "kube_config_path" { default = "~/.kube/config" }
variable "ollama_api_credentials" {}
module "ollama" {
source = "../../modules/kubernetes/ollama"
tls_secret_name = var.tls_secret_name
tier = "2-gpu"
ollama_api_credentials = var.ollama_api_credentials
}
output "ollama_host" {
value = "ollama.ollama.svc.cluster.local"
}
Step 2: Create openclaw stack with ollama dependency
# stacks/openclaw/terragrunt.hcl
include "root" {
path = find_in_parent_folders()
}
dependency "platform" {
config_path = "../platform"
}
dependency "ollama" {
config_path = "../ollama"
}
inputs = {
tls_secret_name = dependency.platform.outputs.tls_secret_name
ollama_host = dependency.ollama.outputs.ollama_host
}
Step 3: Similarly for coturn → f1-stream and osm-routing → real-estate-crawler
Task 12: Final Cleanup
Step 1: Remove legacy modules/kubernetes/main.tf
After all services are migrated, this file should be empty (or contain only commented-out blocks). Delete it.
Step 2: Remove kubernetes_cluster module call from root main.tf
The root main.tf should now only contain provider blocks (which can also be removed since Terragrunt generates them) and the variable declarations for terraform.tfvars loading.
Step 3: Archive legacy state
mv terraform.tfstate terraform.tfstate.legacy
mv terraform.tfstate.backup-* state/backups/
Step 4: Verify full DAG
cd stacks && terragrunt run-all plan
Expected: All stacks show No changes.
Step 5: Update CLAUDE.md
Update the knowledge file to reflect the new Terragrunt architecture, commands, and workflow.
Step 6: Final commit
git add -A
git commit -m "[ci skip] Complete Terragrunt migration — remove legacy monolith"
Task 13: Update CI/CD (Drone Pipeline)
Files:
- Modify:
.drone.yml
Create a Drone pipeline that:
- Detects changed files
- Maps to affected stacks
- Runs
terragrunt plan(on PR) orterragrunt apply(on master merge)
See design doc section "CI/CD: Changed-Stack Detection" for the pipeline logic.
Execution Order Summary
| Task | Phase | Risk | Reversible |
|---|---|---|---|
| 1. Install + skeleton | 0 | None | Yes (delete dirs) |
| 2. Root terragrunt.hcl | 0 | None | Yes (delete file) |
| 3. Infra stack files | 1 | None | Yes (delete stack) |
| 4. Infra state migration | 1 | Medium | Yes (state mv back) |
| 5. Platform stack files | 2 | None | Yes (delete stack) |
| 6. Platform state migration | 2 | High | Yes (state mv back) |
| 7. First simple service (blog) | 3 | Low | Yes (state mv back) |
| 8. Batch simple services | 3 | Low | Yes (state mv back) |
| 9. Module host variable mods | 4 | Low | Yes (revert changes) |
| 10. DB service stacks | 4 | Low | Yes (state mv back) |
| 11. Service-to-service deps | 5 | Low | Yes (state mv back) |
| 12. Final cleanup | 6 | Medium | Harder to reverse |
| 13. CI/CD update | 6 | Low | Yes (revert .drone.yml) |