docs: refresh CNPG tuning note (archive_timeout=0, commit_delay, zstd) + apply gotcha
All checks were successful
ci/woodpecker/push/default Pipeline was successful

Reflects the write-reduction params applied in c3553731, and documents the
null_resource trigger-bump + targeted-apply gotcha so the next agent doesn't
hit the inert-change / mysql-VCT-drift traps.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
Viktor Barzin 2026-06-29 15:17:38 +00:00
parent ec04963bfe
commit a3f2c2947a

View file

@ -197,7 +197,7 @@ the workflow's built-in `GITHUB_TOKEN` (`packages: write`).
**`postgresql_host`** in `config.tfvars` is `pg-cluster-rw.dbaas.svc.cluster.local` (the CNPG primary). The legacy `postgresql.dbaas` service is a live compatibility alias (selector `cnpg.io/instanceRole=primary`, so it also reaches the primary — authentik's PgBouncer still points at it) — but use `pg-cluster-rw` for anything new. This variable is shared by ~12 stacks. **`postgresql_host`** in `config.tfvars` is `pg-cluster-rw.dbaas.svc.cluster.local` (the CNPG primary). The legacy `postgresql.dbaas` service is a live compatibility alias (selector `cnpg.io/instanceRole=primary`, so it also reaches the primary — authentik's PgBouncer still points at it) — but use `pg-cluster-rw` for anything new. This variable is shared by ~12 stacks.
**CNPG tuning** (in `stacks/dbaas/modules/dbaas/main.tf`): `shared_buffers=1024MB`, `effective_cache_size=2560MB`, `work_mem=16MB`, `max_connections=200`, `wal_compression=on`, pod memory 3Gi. **Write-reduction (2026-06-29, code-oflt):** `checkpoint_timeout=15min` + `max_wal_size=4GB` + `min_wal_size=1GB` — checkpoints were 100% timer-driven at the 5-min default, bursting full-page-writes onto the contended sdc HDD; all three are reloadable (no restart). **CNPG tuning** (in `stacks/dbaas/modules/dbaas/main.tf`): `shared_buffers=1024MB`, `effective_cache_size=2560MB`, `work_mem=16MB`, `max_connections=200`, pod memory 3Gi. **Write-reduction (2026-06-29, code-oflt, analysis #6922):** `checkpoint_timeout=15min` + `max_wal_size=4GB` + `min_wal_size=1GB` (checkpoints were 100% timer-driven at the 5-min default, bursting FPIs onto sdc); `archive_timeout=0` (CNPG forces `archive_mode=on` but `.spec.backup` is empty → a 16MB WAL switch every 300s shipped nowhere = ~4.6 GB/day waste; daily `pg_dump` is the real backup); `commit_delay=2500`µs (group-commit fsync coalescing, safe for all DBs incl financial); `wal_compression=zstd` (was pglz). All reloadable (no restart). **Apply gotcha:** the Cluster is a `null_resource.pg_cluster` + local-exec `kubectl apply` — bump its `pg_params` trigger or the YAML edit is inert, and apply with `-target=module.dbaas.null_resource.pg_cluster` to dodge the pre-existing `mysql_standalone` VCT-annotation drift that errors a broad `dbaas` apply.
## Networking & Resilience ## Networking & Resilience
- **Critical path services scaled to 3**: Traefik, Authentik, CrowdSec LAPI, PgBouncer, Cloudflared. - **Critical path services scaled to 3**: Traefik, Authentik, CrowdSec LAPI, PgBouncer, Cloudflared.