infra

Author	SHA1	Message	Date
Viktor Barzin	c8b42f78df	fix DB password rotation desync in 5 stacks Vault DB engine rotates passwords weekly but 5 stacks baked passwords at Terraform plan time, causing stale credentials until next apply. - real-estate-crawler: add vault-database ESO, use secret_key_ref in 3 deployments - nextcloud: switch Helm chart to existingSecret for DB password - grafana: add vault-database ESO, use envFromSecrets in Helm values - woodpecker: use extraSecretNamesForEnvFrom, remove plan-time data source chain - affine: add vault-database ESO, use secret_key_ref in deployment + init container	2026-03-17 07:39:29 +00:00
Viktor Barzin	745e43c983	fix DB password desync + migrate remaining tfvars to Vault DB desync fix: Stacks with Vault DB engine rotation (24h) now read the password from vault-database ClusterSecretStore instead of vault-kv. 9 stacks updated with db ExternalSecrets reading from static-creds/*. Stacks fixed: speedtest, hackmd, health, trading-bot, claude-memory, woodpecker, linkwarden, nextcloud, url. terraform.tfvars migration: - plotting-book: google_client_id/secret → Vault KV + secret_key_ref - tandoor: email_password var removed (was default="", now optional ESO) - infra: ssh_private_key, vm_wizard_password, dockerhub_registry_password → Vault KV at secret/infra + data source	2026-03-15 21:39:45 +00:00
Viktor Barzin	06a0d0599a	regenerate providers.tf: remove vault_root_token variable [ci skip]	2026-03-15 21:21:01 +00:00
Viktor Barzin	0f262ceda3	add pod dependency management via Kyverno init container injection Kyverno ClusterPolicy reads dependency.kyverno.io/wait-for annotation and injects busybox init containers that block until each dependency is reachable (nc -z). Annotations added to 18 stacks (24 deployments). Includes graceful-db-maintenance.sh script for planned DB maintenance (scales dependents to 0, saves replica counts, restores on startup).	2026-03-15 19:17:57 +00:00
Viktor Barzin	1acf8cc4e8	migrate consuming stacks to ESO + remove k8s-dashboard static token Phase 9: ExternalSecret migration across 26 stacks: Fully migrated (vault data source removed, ESO delivers secrets): - speedtest, shadowsocks, wealthfolio, plotting-book, f1-stream, tandoor - n8n, dawarich, diun, netbox, onlyoffice, tuya-bridge - hackmd (ESO template for DB URL), health (ESO template for DB URL) - trading-bot (ESO template for DATABASE_URL + 7 secret env vars) - forgejo (removed unused vault data source) Partially migrated (vault kept for plan-time, ESO added for runtime): - immich, linkwarden, nextcloud, paperless-ngx (jsondecode for homepage) - claude-memory, rybbit, url, webhook_handler (plan-time in locals/jobs) - woodpecker, openclaw, resume (plan-time in helm values/jobs/modules) 17 stacks unchanged (all plan-time: homepage annotations, configmaps, module inputs) — vault data source works with OIDC auth. Phase 17a: Remove k8s-dashboard static admin token secret. Users now get tokens via: vault write kubernetes/creds/dashboard-admin	2026-03-15 19:05:04 +00:00
Viktor Barzin	3aba29e7a3	remove SOPS pipeline, deploy ESO + Vault DB/K8s engines Vault is now the sole source of truth for secrets. SOPS pipeline removed entirely — auth via `vault login -method=oidc`. Part A: SOPS removal - vault/main.tf: delete 990 lines (93 vars + 43 KV write resources), add self-read data source for OIDC creds from secret/vault - terragrunt.hcl: remove SOPS var loading, vault_root_token, check_secrets hook - scripts/tg: remove SOPS decryption, keep -auto-approve logic - .woodpecker/default.yml: replace SOPS with Vault K8s auth via curl - Delete secrets.sops.json, .sops.yaml Part B: External Secrets Operator - New stack stacks/external-secrets/ with Helm chart + 2 ClusterSecretStores (vault-kv for KV v2, vault-database for DB engine) Part C: Database secrets engine (in vault/main.tf) - MySQL + PostgreSQL connections with static role rotation (24h) - 6 MySQL roles (speedtest, wrongmove, codimd, nextcloud, shlink, grafana) - 6 PostgreSQL roles (trading, health, linkwarden, affine, woodpecker, claude_memory) Part D: Kubernetes secrets engine (in vault/main.tf) - RBAC for Vault SA to manage K8s tokens - Roles: dashboard-admin, ci-deployer, openclaw, local-admin - New scripts/vault-kubeconfig helper for dynamic kubeconfig K8s auth method with scoped policies for CI, ESO, OpenClaw, Woodpecker sync.	2026-03-15 16:37:38 +00:00
Viktor Barzin	23019da8e5	equalize memory req=lim across 70+ containers using Prometheus 7d max data After node2 OOM incident, right-size memory across the cluster by setting requests=limits based on max_over_time(container_memory_working_set_bytes[7d]) with 1.3x headroom. Eliminates ~37Gi overcommit gap. Categories: - Safe equalization (50 containers): set req=lim where max7d well within target - Limit increases (8 containers): raise limits for services spiking above current - No Prometheus data (12 containers): conservatively set lim=req - Exception: nextcloud keeps req=256Mi/lim=8Gi due to Apache memory spikes Also increased dbaas namespace quota from 12Gi to 16Gi to accommodate mysql 4Gi limits across 3 replicas.	2026-03-14 21:46:49 +00:00
Viktor Barzin	a8d944eb9b	migrate all secrets from SOPS to Vault KV - Add vault provider to root terragrunt.hcl (generated providers.tf) - Delete stacks/vault/vault_provider.tf (now in generated providers.tf) - Add 124 variable declarations + 43 vault_kv_secret_v2 resources to vault/main.tf to populate Vault KV at secret/<stack-name> - Migrate 43 consuming stacks to read secrets from Vault KV via data "vault_kv_secret_v2" instead of SOPS var-file - Add dependency "vault" to all migrated stacks' terragrunt.hcl - Complex types (maps/lists) stored as JSON strings, decoded with jsondecode() in locals blocks Bootstrap secrets (vault_root_token, vault_authentik_client_id, vault_authentik_client_secret) remain in SOPS permanently. Apply order: vault stack first (populates KV), then all others.	2026-03-14 17:15:48 +00:00
Viktor Barzin	b00f810d3d	Remove all CPU limits cluster-wide to eliminate CFS throttling CPU limits cause CFS throttling even when nodes have idle capacity. Move to a request-only CPU model: keep CPU requests for scheduling fairness but remove all CPU limits. Memory limits stay (incompressible). Changes across 108 files: - Kyverno LimitRange policy: remove cpu from default/max in all 6 tiers - Kyverno ResourceQuota policy: remove limits.cpu from all 5 tiers - Custom ResourceQuotas: remove limits.cpu from 8 namespace quotas - Custom LimitRanges: remove cpu from default/max (nextcloud, onlyoffice) - RBAC module: remove cpu_limits variable and quota reference - Freedify factory: remove cpu_limit variable and limits reference - 86 deployment files: remove cpu from all limits blocks - 6 Helm values files: remove cpu under limits sections	2026-03-14 08:51:45 +00:00
Viktor Barzin	120f83ce93	Nextcloud performance tuning and fix backup cron job - Set loglevel=2 (warnings) and disable mail_smtpdebug via configs - Enable opcache.enable_file_override for faster file checks - Increase APCu shared memory from 32M to 128M - Fix broken module.nfs_nextcloud_data reference in backup cron job to use the iSCSI PVC directly	2026-03-14 08:20:51 +00:00
Viktor Barzin	af5f6a659b	right-size Nextcloud resources after MySQL migration SQLite caused 4.7 CPU / 2GB usage, now MySQL uses ~95m / 95Mi. Reduced limits from 16 CPU / 6Gi to 2 CPU / 1Gi. Reduced requests from 100m / 1Gi to 50m / 256Mi. Frees ~14 CPU cores and 5Gi memory for other workloads.	2026-03-13 22:21:10 +00:00
Viktor Barzin	3e03fbec63	increase MaxRequestWorkers to 150 now that Nextcloud is on MySQL With SQLite, 50 workers caused all workers to block on DB locks. On MySQL, CPU is ~20m and memory ~143Mi — no resource pressure. The crash-looping was caused by hitting MaxRequestWorkers=50 limit ("server reached MaxRequestWorkers setting"), not by DB contention.	2026-03-13 22:20:52 +00:00
Viktor Barzin	aa3d3d0e66	migrate Nextcloud from SQLite to MySQL InnoDB Cluster SQLite was causing constant crash-looping (138 restarts/day) due to write lock contention with concurrent sync clients. Migration required workarounds for multiple occ db:convert-type bugs: - GR error 3100: SET GLOBAL sql_generate_invisible_primary_key = ON - utf8mb3 column creation: stripped 4-byte emoji + invalid UTF-8 from SQLite (F1 calendar events, filecache) - SQLite index corruption: repaired via .dump + INSERT OR IGNORE reimport - kubectl exec timeouts: used nohup + detached process Verified: all 136 tables migrated, 100% row count match across 15 key tables (users, files, calendars, contacts, shares, activity). Also fixed typo: databse → database in chart values.	2026-03-13 22:20:28 +00:00
OpenClaw	fd6c1cca93	fix(nextcloud): Database corruption recovery and conservative Apache tuning - Restored clean SQLite database from pre-migration backup - Fixed severe database corruption (rowid ordering, page corruption, index issues) - Applied conservative MaxRequestWorkers=15 for SQLite stability - Database integrity now perfect, all health checks passing - Ready for future MySQL migration with clean data [ci skip]	2026-03-12 13:38:37 +00:00
OpenClaw	db1e301eea	fix(nextcloud): Increase Apache MaxRequestWorkers to resolve health check timeouts - Increase MaxRequestWorkers from 10 to 25 for 4 CPU + 3Gi memory container - Update Apache tuning for Redis + SQLite backend (not pure SQLite) - Resolves CrashLoopBackOff caused by health probe timeouts - Allows handling concurrent users without MaxRequestWorkers limit errors [ci skip]	2026-03-12 13:14:20 +00:00
Viktor Barzin	3f0cf4ff4d	stabilize Nextcloud: relax probes, reduce resources for 2-client SQLite workload SQLite locks cause slow responses under concurrent access, triggering liveness probe failures and restarts. With only 2 sync clients: - Liveness: period 30→60s, timeout 10→30s, failures 6→10 (tolerates 10min) - Readiness: period 30→60s, timeout 10→30s, failures 3→5 - Startup: timeout 10→30s - Resources: CPU 16→4, memory 6Gi→3Gi (10 workers × 200MB = 2GB max) [ci skip] Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-12 10:01:20 +00:00
Viktor Barzin	81bfccaefc	fix OOM kills: tune MySQL memory, reduce Nextcloud workers, increase Uptime Kuma limit MySQL (3 OOM kills): - Cap group_replication_message_cache_size to 128MB (default 1GB caused OOM) - Reduce innodb_log_buffer_size from 64MB to 16MB - Lower max_connections from 151 to 80 (peak usage ~40) - Increase memory limit from 3Gi to 4Gi for headroom Nextcloud (30+ apache2 OOM kills per incident): - Reduce MaxRequestWorkers from 50 to 10 to prevent fork bomb when SQLite locks cause request pileup - Lower StartServers/MinSpare/MaxSpare proportionally Uptime Kuma (Node.js memory leak): - Increase memory limit from 256Mi to 512Mi - Increase CPU limit from 200m to 500m Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-12 07:26:08 +00:00
Viktor Barzin	f07f05f9bb	migrate Nextcloud data volume from NFS to iSCSI for fsync support SQLite on NFS caused persistent 500 errors on WebDAV PROPFIND due to missing fsync guarantees and database locking under concurrent access. iSCSI (ext4) provides proper fsync and block-level I/O. - Replace nfs_volume module with iscsi-truenas PVC (20Gi) - Update Helm chart to use nextcloud-data-iscsi claim - Excluded 12.5GB nextcloud.log and corrupted DB from migration Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-11 23:24:03 +00:00
Viktor Barzin	d8bcdfef2e	revert MaxRequestWorkers to 50, exclude nextcloud from 5xx alert - MaxRequestWorkers 25→50: too few workers caused ALL workers to block on SQLite locks, making liveness probes fail even faster (131 restarts vs 50 before). 50 is a compromise — enough workers for probes. - Excluded nextcloud from HighServiceErrorRate alert (chronic SQLite issue) - MySQL migration attempted but hit: GR error 3100 (fixed with GIPK), emoji in calendar/filecache (stripped), SQLite corruption (pre-existing from crash-looping). Migration rolled back, Nextcloud restored to SQLite.	2026-03-09 22:05:20 +00:00
Viktor Barzin	0ca81a6112	fix: mount Apache MPM config under nextcloud.extraVolumes (not top-level) The Nextcloud Helm chart expects extraVolumes/extraVolumeMounts nested under the nextcloud: key. Also mount to mods-available/ (the actual file) not mods-enabled/ (which is a symlink). Verified: MaxRequestWorkers 150→25, workers dropped from 49 to 6.	2026-03-08 21:37:39 +00:00
Viktor Barzin	ff03f2b99f	tune Nextcloud Apache/PHP to fix constant crash-looping (50 restarts/6d) Root cause: Apache prefork with 150 MaxRequestWorkers (each ~220MB RSS) on SQLite DB causes worker exhaustion + lock contention → Apache hangs → aggressive liveness probe (3 failures × 10s) kills container. Fixes: - Apache: MaxRequestWorkers 150→25, MaxConnectionsPerChild 0→200, StartServers 5→3 (via ConfigMap mount over mpm_prefork.conf) - PHP: max_execution_time 0→300s, max_input_time 300s (prevent zombie workers) - Liveness probe: period 10s→30s, failureThreshold 3→6, timeout 5s→10s (180s tolerance vs 30s before) - Readiness probe: period 10s→30s, timeout 5s→10s	2026-03-08 21:33:27 +00:00
Viktor Barzin	f3042f318e	[ci skip] fix widget issues: ports, Immich v2 API, Nextcloud trusted domains - qBittorrent: use service port 80 (not container port 8080) - Immich: add version=2 for new API endpoints (/api/server/*) - Nextcloud: use external URL (internal rejects untrusted Host header) - HA London: remove widget (token expired, needs manual regeneration) - Headscale: remove widget (requires nodeId param, not overview)	2026-03-07 20:39:56 +00:00
Viktor Barzin	17256c8f76	[ci skip] fix widget URLs: use correct k8s service ports Services expose port 80 via ClusterIP but widgets were using container target ports (5000, 3001, 4533, 3000). Calibre was using external URL through Authentik. All now use correct internal service URLs.	2026-03-07 20:39:56 +00:00
Viktor Barzin	57eed07370	[ci skip] add widgets for qbittorrent, navidrome, nextcloud, freshrss, linkwarden, uptime-kuma Add API credentials to SOPS and wire homepage_credentials through stacks. Re-add Uptime Kuma widget with new "infra" status page slug.	2026-03-07 20:39:55 +00:00
Viktor Barzin	6bd3970579	[ci skip] add Homepage gethomepage.dev annotations to all services Add Kubernetes ingress annotations for Homepage auto-discovery across ~88 services organized into 11 groups. Enable serviceAccount for RBAC, configure group layouts, and add Grafana/Frigate/Speedtest widgets.	2026-03-07 20:39:54 +00:00
Viktor Barzin	1f2c1ca361	[ci skip] phase 5+6: update CI pipelines for SOPS, add sensitive=true to secret vars Phase 5 — CI pipelines: - default.yml: add SOPS decrypt in prepare step, change git add . to specific paths (stacks/ state/ .woodpecker/), cleanup on success+failure - renew-tls.yml: change git add . to git add secrets/ state/ Phase 6 — sensitive=true: - Add sensitive = true to 256 variable declarations across 149 stack files - Prevents secret values from appearing in terraform plan output - Does NOT modify shared modules (ingress_factory, nfs_volume) to avoid breaking module interface contracts Note: CI pipeline SOPS decryption requires sops_age_key Woodpecker secret to be created before the pipeline will work with SOPS. Until then, the old terraform.tfvars path continues to function.	2026-03-07 14:30:36 +00:00
Viktor Barzin	0abae33c71	[ci skip] complete NFS CSI migration: complex stacks + platform modules Migrate remaining multi-volume stacks and all platform modules from inline NFS volumes to CSI-backed PV/PVC with nfs-truenas StorageClass (soft,timeo=30,retrans=3 mount options). Complex stacks: openclaw (4 vols), immich (8 vols), frigate (2 vols), nextcloud (2 vols + old PV replaced), rybbit (1 vol) Remaining stacks: affine, ebook2audiobook, f1-stream, osm_routing, real-estate-crawler Platform modules: monitoring (prometheus, loki, alertmanager PVs converted from native NFS to CSI), redis, dbaas, technitium, headscale, vaultwarden, uptime-kuma, mailserver, infra-maintenance	2026-03-02 01:24:07 +00:00
Viktor Barzin	9e4fb23b10	[ci skip] right-size all pod resources based on VPA + live metrics audit Full cluster resource audit: cross-referenced Goldilocks VPA recommendations, live kubectl top metrics, and Terraform definitions for 100+ containers. Critical fixes: - dashy: CPU throttled at 98% (490m/500m) → 2 CPU limit - stirling-pdf: CPU throttled at 99.7% (299m/300m) → 2 CPU limit - traefik auth-proxy/bot-block-proxy: mem limit 32Mi → 128Mi Added explicit resources to ~40 containers that had none: - audiobookshelf, changedetection, cyberchef, dawarich, diun, echo, excalidraw, freshrss, hackmd, isponsorblocktv, linkwarden, n8n, navidrome, ntfy, owntracks, privatebin, send, shadowsocks, tandoor, tor-proxy, wealthfolio, networking-toolbox, rybbit, mailserver, cloudflared, pgadmin, phpmyadmin, crowdsec-web, xray, wireguard, k8s-portal, tuya-bridge, ollama-ui, whisper, piper, immich-server, immich-postgresql, osrm-foot GPU containers: added CPU/mem alongside GPU limits: - ollama: removed CPU/mem limits (models vary in size), keep GPU only - frigate: req 500m/2Gi, lim 4/8Gi + GPU - immich-ml: req 100m/1Gi, lim 2/4Gi + GPU Right-sized ~25 over-provisioned containers: - kms-web-page: 500m/512Mi → 50m/64Mi (was using 0m/10Mi) - onlyoffice: CPU 8 → 2 (VPA upper 45m) - realestate-crawler-api: CPU 2000m → 250m - blog/travel-blog/webhook-handler: 500m → 100m - coturn/health/plotting-book: reduced to match actual usage Conservative methodology: limits = max(VPA upper * 2, live usage * 2)	2026-03-01 19:18:50 +00:00
Viktor Barzin	4558688baf	[ci skip] nextcloud: bump CPU limit to 16, add custom ResourceQuota CPU was pegged at 2000m/2000m (100% throttled). Add custom-quota opt-out label and ResourceQuota allowing 32 CPU limits to accommodate the 16 CPU container limit plus sidecar defaults.	2026-03-01 17:41:18 +00:00
Viktor Barzin	f2678d3494	[ci skip] fix MySQL cluster RBAC, Kyverno policy bugs, Nextcloud memory - dbaas: add mysql-sidecar-extra ClusterRole for namespaces/CRD list/watch needed by kopf framework in sidecar containers - kyverno: restrict inject-priority-class-from-tier to CREATE operations only (was blocking pod patches with immutable spec error) - kyverno: add resource-governance/custom-limitrange label opt-out to LimitRange generation policy (mirrors existing custom-quota) - nextcloud: bump memory limit 4Gi -> 6Gi, add custom LimitRange with 8Gi max, opt out of Kyverno-managed LimitRange	2026-03-01 17:16:03 +00:00
Viktor Barzin	fcb7d6780e	[ci skip] fix nextcloud: increase memory to 4Gi, extend startup probe - Memory limit: 2Gi → 4Gi (VPA target is 2.8Gi, was OOMKilling) - Memory request: 512Mi → 1Gi - Startup probe: 30s delay, 10s timeout, 60 failures (10min total) Previous 5min window was too short for NFS-backed SQLite init	2026-02-28 23:32:28 +00:00
Viktor Barzin	379c7e261f	[ci skip] fix nextcloud OOMKilled: increase memory limit to 2Gi	2026-02-28 20:21:00 +00:00
Viktor Barzin	a1ba218cd2	[ci skip] Phase 1: PostgreSQL migrated to CNPG on local disk Major milestone - shared PostgreSQL moved from NFS to CloudNativePG: - CNPG cluster (pg-cluster) running in dbaas namespace on local-path storage - PostGIS image (ghcr.io/cloudnative-pg/postgis:16) for dawarich compatibility - All 20 databases and 19 roles restored from pg_dumpall backup - postgresql.dbaas Service patched to point at CNPG primary - Old PG deployment scaled to 0 (NFS data intact for rollback) - All 12+ dependent services verified running: authentik, n8n, dawarich, tandoor, linkwarden, netbox, woodpecker, rybbit, affine, health, resume, trading-bot, atuin - Authentik PgBouncer working through the switched endpoint TODO: codify CNPG cluster in Terraform, add 2nd replica, update backup CronJob	2026-02-28 19:08:06 +00:00
Viktor Barzin	c6beefc845	[ci skip] nextcloud: increase resource limits to prevent OOM crash loop Default LimitRange (256Mi) was too low — pod was using 227Mi/256Mi and getting OOM killed under sync client load, causing 500s and blank web UI.	2026-02-28 16:26:19 +00:00
Viktor Barzin	89a6e08245	[ci skip] Infrastructure hardening: security, monitoring, reliability, maintainability Phase 1 - Critical Security: - Netbox: move hardcoded DB/superuser passwords to variables - MeshCentral: disable public registration, add Authentik auth - Traefik: disable insecure API dashboard (api.insecure=false) - Traefik: configure forwarded headers with Cloudflare trusted IPs Phase 2 - Security Hardening: - Add security headers middleware (HSTS, X-Frame-Options, nosniff, etc.) - Add Kyverno pod security policies in audit mode (privileged, host namespaces, SYS_ADMIN, trusted registries) - Tighten rate limiting (avg=10, burst=50) - Add Authentik protection to grampsweb Phase 3 - Monitoring & Alerting: - Add critical service alerts (PostgreSQL, MySQL, Redis, Headscale, Authentik, Loki) - Increase Loki retention from 7 to 30 days (720h) - Add predictive PV filling alert (predict_linear) - Re-enable Hackmd and Privatebin down alerts Phase 4 - Reliability: - Add resource requests/limits to Redis, DBaaS, Technitium, Headscale, Vaultwarden, Uptime Kuma - Increase Alloy DaemonSet memory to 512Mi/1Gi Phase 6 - Maintainability: - Extract duplicated tiers locals to terragrunt.hcl generate block (removed from 67 stacks) - Replace hardcoded NFS IP 10.0.10.15 with var.nfs_server (114 instances across 63 files) - Replace hardcoded Redis/PostgreSQL/MySQL/Ollama/mail host references with variables across ~35 stacks - Migrate xray raw ingress resources to ingress_factory modules	2026-02-23 22:05:28 +00:00
Viktor Barzin	c7c7047f1c	[ci skip] Flatten module wrappers into stack roots Remove the module "xxx" { source = "./module" } indirection layer from all 66 service stacks. Resources are now defined directly in each stack's main.tf instead of through a wrapper module. - Merge module/main.tf contents into stack main.tf - Apply variable replacements (var.tier -> local.tiers.X, renamed vars) - Fix shared module paths (one fewer ../ at each level) - Move extra files/dirs (factory/, chart_values, subdirs) to stack root - Update state files to strip module.<name>. prefix - Update CLAUDE.md to reflect flat structure Verified: terragrunt plan shows 0 add, 0 destroy across all stacks.	2026-02-22 15:13:55 +00:00
Viktor Barzin	e6420c7b36	[ci skip] Move Terraform modules into stack directories Move all 88 service modules (66 individual + 22 platform) from modules/kubernetes/<service>/ into their corresponding stack directories: - Service stacks: stacks/<service>/module/ - Platform stack: stacks/platform/modules/<service>/ This collocates module source code with its Terragrunt definition. Only shared utility modules remain in modules/kubernetes/: ingress_factory, setup_tls_secret, dockerhub_secret, oauth-proxy. All cross-references to shared modules updated to use correct relative paths. Verified with terragrunt run --all -- plan: 0 adds, 0 destroys across all 68 stacks.	2026-02-22 14:38:14 +00:00
Viktor Barzin	a9ba8899be	[ci skip] Phase 3: Create 66 service stacks and migrate state Generated individual stack directories for all 66 services under stacks/. Each stack has terragrunt.hcl (depends on platform) and main.tf (thin wrapper calling existing module). Migrated all 64 active service states from root terraform.tfstate to individual state files. Root state is now empty. Verified with terragrunt plan on multiple stacks (no changes).	2026-02-22 13:56:34 +00:00

38 commits
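Several of the Nextcloud commits above quote probe tolerances: 30s before and 180s after ff03f2b99f, 10min after 3f0cf4ff4d, and a 10min startup window in fcb7d6780e. The sketch below reproduces those figures. It assumes a simplified model where tolerance is roughly periodSeconds × failureThreshold, which ignores timeoutSeconds, initialDelaySeconds and probe jitter. `probe_tolerance_seconds` is a hypothetical helper, not part of any Kubernetes API. The startup case assumes the Kubernetes default periodSeconds of 10, because that commit does not state the period.

```python
# Rough probe tolerance: how long a container can keep failing a probe
# before the kubelet acts, using the simplified model
#   tolerance ~= period_seconds * failure_threshold
# (ignores timeoutSeconds, initialDelaySeconds and probe jitter).


def probe_tolerance_seconds(period_seconds: int, failure_threshold: int) -> int:
    """Approximate window of consecutive probe failures tolerated."""
    if period_seconds <= 0 or failure_threshold <= 0:
        raise ValueError("period and threshold must be positive")
    return period_seconds * failure_threshold


# Liveness settings quoted in ff03f2b99f: period 10s x 3 failures before,
# period 30s x 6 failures after.
before_ff03 = probe_tolerance_seconds(10, 3)
after_ff03 = probe_tolerance_seconds(30, 6)

# Liveness settings quoted in 3f0cf4ff4d: period 60s x 10 failures.
after_3f0c = probe_tolerance_seconds(60, 10)

# Startup probe in fcb7d6780e: 60 failures; period assumed to be the
# default 10s because the commit does not state it.
startup_fcb7 = probe_tolerance_seconds(10, 60)

print(before_ff03, after_ff03, after_3f0c, startup_fcb7)  # prints "30 180 600 600"
```

The result of 600s matches the "tolerates 10min" and "10min total" figures in the commit messages. The real restart time can differ slightly, because the kubelet also waits up to timeoutSeconds on each probe.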