infra/stacks/dbaas
Viktor Barzin 3ef860b2be kured + cnpg: drain-safe defaults ahead of Monday reboot wave
Three defensive moves to make the kured rolling-reboot cycle survive
edge cases without operator intervention:

kured (stacks/kured/main.tf):
  - Set `configuration.drainTimeout = "30m"`. The default is unlimited: if
    a future PDB or finalizer stalls the drain, kured retries forever and
    the node stays cordoned silently. 30m caps that silent-failure
    window. After the timeout kured logs the abort and waits for the
    next period, and the node stays Schedulable so cluster capacity isn't
    lost. This lets us fail closed instead of failing silently.
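
A minimal sketch of how the drain timeout above might be wired through
the Helm provider. Only the `configuration.drainTimeout = "30m"` value
comes from this commit; the resource name, namespace, and chart
repository URL are assumptions for illustration and may not match
stacks/kured/main.tf:

```hcl
# Hypothetical release definition; names and repository are assumed.
resource "helm_release" "kured" {
  name       = "kured"
  namespace  = "kured"                               # assumed namespace
  repository = "https://kubereboot.github.io/charts" # assumed chart source
  chart      = "kured"

  values = [yamlencode({
    configuration = {
      # Cap a stalled drain at 30 minutes instead of retrying forever.
      drainTimeout = "30m"
    }
  })]
}
```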

CNPG pg-cluster (stacks/dbaas/modules/dbaas/main.tf):
  - Bump instances 2 → 3 (1 primary + 2 replicas). With 2 instances the
    failover during a primary-node drain depended on the lone replica
    being caught up; a WAL backlog would stall the drain until the
    replica was current. With 3 instances CNPG always has at least one
    fully-current replica to promote, and the PDB's
    `minAvailable=1` on the primary selector is satisfied throughout
    the switchover. Storage: +20Gi PVC on proxmox-lvm-encrypted (about
    35Gi after autoresize). Memory: +3Gi pod limit.
  - Updated `triggers.instances` so the null_resource's local-exec
    actually re-applies the YAML (kubectl apply with the new spec). The
    YAML is the source of truth, but the trigger is what tells Terraform
    to re-run the provisioner.
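
The trigger/provisioner relationship described above can be sketched
roughly as follows. This is not the module's actual code: the local
names, resource name, namespace, and manifest fields are hypothetical,
and the real module may keep the YAML in a separate file. The sketch
drives both the manifest and the trigger from one local so they cannot
drift apart:

```hcl
locals {
  pg_instances = 3 # 1 primary + 2 replicas

  # Hypothetical inline manifest; field values other than `instances`
  # and the storage class are assumptions.
  pg_cluster_manifest = yamlencode({
    apiVersion = "postgresql.cnpg.io/v1"
    kind       = "Cluster"
    metadata   = { name = "pg-cluster", namespace = "dbaas" }
    spec = {
      instances = local.pg_instances
      storage = {
        size         = "20Gi"
        storageClass = "proxmox-lvm-encrypted"
      }
    }
  })
}

resource "null_resource" "pg_cluster" {
  # Changing any trigger value replaces the resource, which is what
  # makes Terraform re-run the local-exec provisioner.
  triggers = {
    instances = tostring(local.pg_instances)
  }

  provisioner "local-exec" {
    # Pass the manifest through the environment and pipe it to kubectl.
    command = "printf '%s' \"$MANIFEST\" | kubectl apply -f -"
    environment = {
      MANIFEST = local.pg_cluster_manifest
    }
  }
}
```

Without a changed trigger value, editing only the manifest would leave
the null_resource untouched and nothing would be re-applied.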

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-22 14:16:48 +00:00
modules/dbaas kured + cnpg: drain-safe defaults ahead of Monday reboot wave 2026-05-22 14:16:48 +00:00
main.tf [dbaas] Declare forgejo + roundcubemail MySQL users in Terraform 2026-04-17 22:06:23 +00:00
secrets extract dbaas, authentik, crowdsec from platform into independent stacks [ci skip] 2026-03-17 18:11:53 +00:00
terragrunt.hcl extract dbaas, authentik, crowdsec from platform into independent stacks [ci skip] 2026-03-17 18:11:53 +00:00