infra/stacks
Viktor Barzin 20774f794d dbaas+monitoring: bump PG max_connections to 200, add scrape + alerts
Cluster grew past the 100-conn default — steady-state idle was 90/100,
leaving zero headroom for terragrunt applies or transient surges. The
ceiling was being discovered by Terraform crashing (pq: "remaining
connection slots are reserved for roles with the SUPERUSER attribute"),
not by alerting, because we had no PG scrape config at all.

dbaas (Tier 0):
  * max_connections: 100 → 200
  * shared_buffers: 512MB → 1GB (Postgres recommends ~25% of pod memory)
  * effective_cache_size: 1536MB → 2560MB (scaled with pod memory)
  * pod memory: 2Gi → 3Gi (rough rule of thumb: enough for shared_buffers
    + ~16MB work_mem * concurrent sorts + OS cache + overhead)
  * Bumping the triggers on null_resource.pg_cluster forces CNPG to
    re-apply, which rolls the cluster (standby first, then primary
    failover).
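
The memory rule of thumb above can be checked with a little arithmetic. This is a minimal sketch, not part of the change itself: the function name `headroom_mb` and the figure of 32 concurrent sorts are assumptions chosen for illustration; only the 3Gi pod limit, the 1GB shared_buffers and the ~16MB work_mem come from the description.

```python
def headroom_mb(pod_mb: int, shared_buffers_mb: int,
                work_mem_mb: int, concurrent_sorts: int) -> int:
    """Memory left for OS cache and overhead after shared_buffers
    and the worst-case work_mem allocations are subtracted."""
    return pod_mb - shared_buffers_mb - work_mem_mb * concurrent_sorts


POD_MB = 3 * 1024          # 3Gi pod limit (from the description)
SHARED_BUFFERS_MB = 1024   # 1GB shared_buffers (from the description)
WORK_MEM_MB = 16           # ~16MB work_mem (from the description)
CONCURRENT_SORTS = 32      # hypothetical: not stated anywhere in the source

left = headroom_mb(POD_MB, SHARED_BUFFERS_MB, WORK_MEM_MB, CONCURRENT_SORTS)
print(f"headroom for OS cache + overhead: {left} MB")
print(f"shared_buffers share of pod memory: {SHARED_BUFFERS_MB / POD_MB:.0%}")
```

With these inputs shared_buffers comes to about a third of the 3Gi limit, so the real headroom depends heavily on how many sorts actually run at once.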

monitoring:
  * New scrape job 'cnpg' on dbaas namespace pods labeled
    cnpg.io/podRole=instance, port name=metrics (9187). Relabels add
    cnpg_cluster + cnpg_role labels for alert grouping.
  * PGConnectionsHigh (warning, >85% for 10m) — heads-up before exhaustion.
  * PGConnectionsCritical (critical, >95% for 3m) — last call before
    refusing connections.

Verified: cnpg targets up, sum(cnpg_backends_total)=84, max_connections
metric=200, alert ratio 0.42 → both alerts inactive.
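
The verification arithmetic can be reproduced as a small sketch. It assumes the alert ratio is simply backends divided by max_connections; the function names are hypothetical, and the real rules also require the condition to hold for 10m or 3m, which is not modelled here.

```python
WARN_THRESHOLD = 0.85   # PGConnectionsHigh
CRIT_THRESHOLD = 0.95   # PGConnectionsCritical


def connection_ratio(backends: int, max_connections: int) -> float:
    """Fraction of connection slots currently in use."""
    return backends / max_connections


def alert_state(ratio: float) -> str:
    """Which alert would match this ratio, ignoring the 'for' duration."""
    if ratio > CRIT_THRESHOLD:
        return "critical"
    if ratio > WARN_THRESHOLD:
        return "warning"
    return "inactive"


after = connection_ratio(84, 200)    # verified values after the bump
before = connection_ratio(90, 100)   # steady state before the bump
print(after, alert_state(after))     # 0.42 inactive
print(before, alert_state(before))   # 0.9 warning
```

Under the same thresholds, the old 90/100 steady state would already have matched the warning condition, which fits the motivation given at the top of the message.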
2026-05-22 14:16:44 +00:00
_template ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
actualbudget infra/ingress_factory: add auth = "app" mode for self-authed backends 2026-05-22 14:16:44 +00:00
affine infra/ingress_factory: add auth = "app" mode for self-authed backends 2026-05-22 14:16:44 +00:00
authentik state(dbaas): update encrypted state 2026-05-22 14:16:43 +00:00
beads-server fix: pvc-autoresizer + TF drift safety — bulk add ignore_changes 2026-05-22 14:16:43 +00:00
blog ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
broker-sync fix: pvc-autoresizer + TF drift safety — bulk add ignore_changes 2026-05-22 14:16:43 +00:00
calico
changedetection fix: pvc-autoresizer + TF drift safety — bulk add ignore_changes 2026-05-22 14:16:43 +00:00
chrome-service fix: pvc-autoresizer + TF drift safety — bulk add ignore_changes 2026-05-22 14:16:43 +00:00
city-guesser ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
claude-agent-service fix: pvc-autoresizer + TF drift safety — bulk add ignore_changes 2026-05-22 14:16:43 +00:00
claude-memory claude-memory / resume: unblock terragrunt apply (var defaults + psql -d postgres) 2026-05-22 14:16:44 +00:00
cloudflared cloudflare: disable AI bot edge-block so x402 can issue payment offers 2026-05-22 14:16:42 +00:00
cnpg
coturn
crowdsec ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
cyberchef ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
dashy ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
dawarich ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
dbaas dbaas+monitoring: bump PG max_connections to 200, add scrape + alerts 2026-05-22 14:16:44 +00:00
descheduler
diun fix: pvc-autoresizer + TF drift safety — bulk add ignore_changes 2026-05-22 14:16:43 +00:00
ebook2audiobook ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
ebooks infra/ingress_factory: add auth = "app" mode for self-authed backends 2026-05-22 14:16:44 +00:00
echo state(dbaas): update encrypted state 2026-05-22 14:16:43 +00:00
excalidraw fix: pvc-autoresizer + TF drift safety — bulk add ignore_changes 2026-05-22 14:16:43 +00:00
external-secrets
f1-stream fix: pvc-autoresizer + TF drift safety — bulk add ignore_changes 2026-05-22 14:16:43 +00:00
fire-planner fire-planner / k8s-portal / insta2spotify: revert auth=public to auth=none 2026-05-22 14:16:42 +00:00
foolery ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
forgejo fix: pvc-autoresizer threshold should be 10%, not 80% 2026-05-22 14:16:43 +00:00
freedify ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
freshrss infra/ingress_factory: add auth = "app" mode for self-authed backends 2026-05-22 14:16:44 +00:00
frigate Woodpecker CI deploy [CI SKIP] 2026-05-22 14:16:43 +00:00
grampsweb fix: pvc-autoresizer + TF drift safety — bulk add ignore_changes 2026-05-22 14:16:43 +00:00
hackmd fix: pvc-autoresizer + TF drift safety — bulk add ignore_changes 2026-05-22 14:16:43 +00:00
headscale fix: pvc-autoresizer + TF drift safety — bulk add ignore_changes 2026-05-22 14:16:43 +00:00
health fix: pvc-autoresizer + TF drift safety — bulk add ignore_changes 2026-05-22 14:16:43 +00:00
hermes-agent fix: pvc-autoresizer + TF drift safety — bulk add ignore_changes 2026-05-22 14:16:43 +00:00
homepage ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
immich infra/ingress_factory: add auth = "app" mode for self-authed backends 2026-05-22 14:16:44 +00:00
infra [forgejo] Phases 3+4+5: cutover, decommission, docs sweep 2026-05-07 23:29:34 +00:00
infra-maintenance
insta2spotify fix: pvc-autoresizer + TF drift safety — bulk add ignore_changes 2026-05-22 14:16:43 +00:00
instagram-poster fix: pvc-autoresizer + TF drift safety — bulk add ignore_changes 2026-05-22 14:16:43 +00:00
isponsorblocktv fix: pvc-autoresizer + TF drift safety — bulk add ignore_changes 2026-05-22 14:16:43 +00:00
job-hunter grafana: env-var datasources + reloader so Vault rotations stop breaking dashboards 2026-05-10 11:12:39 +00:00
jsoncrack ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
k8s-dashboard ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
k8s-portal fire-planner / k8s-portal / insta2spotify: revert auth=public to auth=none 2026-05-22 14:16:42 +00:00
k8s-version-upgrade k8s-version-upgrade: detection script refresh apt before madison + DRY_RUN_OVERRIDE 2026-05-22 14:16:43 +00:00
kms ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
kured kured(sentinel-gate): fix auth + write-perm so safety checks actually run 2026-05-22 14:16:41 +00:00
kyverno [forgejo] Phases 3+4+5: cutover, decommission, docs sweep 2026-05-07 23:29:34 +00:00
linkwarden infra/ingress_factory: add auth = "app" mode for self-authed backends 2026-05-22 14:16:44 +00:00
llama-cpp infra/llama-cpp: benchmark report + -fa flag fix 2026-05-22 14:16:41 +00:00
local-path
mailserver fix: pvc-autoresizer + TF drift safety — bulk add ignore_changes 2026-05-22 14:16:43 +00:00
matrix fix: pvc-autoresizer + TF drift safety — bulk add ignore_changes 2026-05-22 14:16:43 +00:00
meshcentral fix: pvc-autoresizer + TF drift safety — bulk add ignore_changes 2026-05-22 14:16:43 +00:00
metallb
metrics-server
monitoring dbaas+monitoring: bump PG max_connections to 200, add scrape + alerts 2026-05-22 14:16:44 +00:00
n8n fix: pvc-autoresizer + TF drift safety — bulk add ignore_changes 2026-05-22 14:16:43 +00:00
navidrome fix: pvc-autoresizer + TF drift safety — bulk add ignore_changes 2026-05-22 14:16:43 +00:00
netbox ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
networking-toolbox ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
nextcloud fix: pvc-autoresizer + TF drift safety — bulk add ignore_changes 2026-05-22 14:16:43 +00:00
nfs-csi [infra] TrueNAS decommission — remove active references from Terraform + configs 2026-04-19 16:57:05 +00:00
nodelocal-dns [dns] NodeLocal DNSCache — deploy DaemonSet to all nodes (WS C) 2026-04-19 15:46:41 +00:00
novelapp infra/ingress_factory: add auth = "app" mode for self-authed backends 2026-05-22 14:16:44 +00:00
ntfy fix: pvc-autoresizer + TF drift safety — bulk add ignore_changes 2026-05-22 14:16:43 +00:00
nvidia healthcheck: tune noise filters + nvidia-exporter auth=none 2026-05-22 14:16:43 +00:00
onlyoffice fix: pvc-autoresizer + TF drift safety — bulk add ignore_changes 2026-05-22 14:16:43 +00:00
openclaw fix: pvc-autoresizer + TF drift safety — bulk add ignore_changes 2026-05-22 14:16:43 +00:00
osm_routing
owntracks Woodpecker CI deploy [CI SKIP] 2026-05-22 14:16:42 +00:00
paperless-ngx fix: pvc-autoresizer + TF drift safety — bulk add ignore_changes 2026-05-22 14:16:43 +00:00
payslip-ingest grafana: env-var datasources + reloader so Vault rotations stop breaking dashboards 2026-05-10 11:12:39 +00:00
phpipam ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
platform
plotting-book fix: pvc-autoresizer + TF drift safety — bulk add ignore_changes 2026-05-22 14:16:43 +00:00
poison-fountain ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
postiz fix: pvc-autoresizer + TF drift safety — bulk add ignore_changes 2026-05-22 14:16:43 +00:00
priority-pass fix: pvc-autoresizer + TF drift safety — bulk add ignore_changes 2026-05-22 14:16:43 +00:00
privatebin fix: pvc-autoresizer + TF drift safety — bulk add ignore_changes 2026-05-22 14:16:43 +00:00
proxmox-csi proxmox-csi: opt SCs into pvc-autoresizer (resize.topolvm.io/enabled=true) 2026-05-22 14:16:41 +00:00
pvc-autoresizer
rbac
real-estate-crawler ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
redis fix: pvc-autoresizer threshold should be 10%, not 80% 2026-05-22 14:16:43 +00:00
reloader
resume Woodpecker CI deploy [CI SKIP] 2026-05-22 14:16:44 +00:00
reverse-proxy chore: remove decommissioned registry.viktorbarzin.me ingress 2026-05-10 11:12:37 +00:00
rybbit fix: pvc-autoresizer + TF drift safety — bulk add ignore_changes 2026-05-22 14:16:43 +00:00
sealed-secrets
send fix: pvc-autoresizer + TF drift safety — bulk add ignore_changes 2026-05-22 14:16:43 +00:00
servarr fix: pvc-autoresizer + TF drift safety — bulk add ignore_changes 2026-05-22 14:16:43 +00:00
shadowsocks
speedtest fix: pvc-autoresizer + TF drift safety — bulk add ignore_changes 2026-05-22 14:16:43 +00:00
status-page
stirling-pdf fix: pvc-autoresizer + TF drift safety — bulk add ignore_changes 2026-05-22 14:16:43 +00:00
tandoor infra/ingress_factory: add auth = "app" mode for self-authed backends 2026-05-22 14:16:44 +00:00
technitium fix: pvc-autoresizer + TF drift safety — bulk add ignore_changes 2026-05-22 14:16:43 +00:00
terminal ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
tor-proxy fix: pvc-autoresizer + TF drift safety — bulk add ignore_changes 2026-05-22 14:16:43 +00:00
trading-bot ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
traefik ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
travel_blog ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
tuya-bridge state(dbaas): update encrypted state 2026-05-22 14:16:43 +00:00
uptime-kuma fix: pvc-autoresizer + TF drift safety — bulk add ignore_changes 2026-05-22 14:16:43 +00:00
url ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
vault vault: enroll audit-vault-0 in pvc-autoresizer (10Gi limit) 2026-05-22 14:16:43 +00:00
vaultwarden fix: pvc-autoresizer + TF drift safety — bulk add ignore_changes 2026-05-22 14:16:43 +00:00
vpa ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
wealthfolio fix: pvc-autoresizer + TF drift safety — bulk add ignore_changes 2026-05-22 14:16:43 +00:00
webhook_handler ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
whisper fix: pvc-autoresizer + TF drift safety — bulk add ignore_changes 2026-05-22 14:16:43 +00:00
wireguard
woodpecker ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
xray ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00
ytdlp ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks 2026-05-22 14:16:42 +00:00