docs: refresh CNPG tuning note (archive_timeout=0, commit_delay, zstd) + apply gotcha

Reflects the write-reduction params applied in c3553731, and documents the null_resource trigger-bump + targeted-apply gotcha so the next agent doesn't hit the inert-change / mysql-VCT-drift traps.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-29 15:17:38 +00:00 · a3f2c2947a
commit a3f2c2947a
parent ec04963bfe
1 changed file with 1 addition and 1 deletion
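
The ~4.6 GB/day figure quoted in the updated tuning note can be sanity-checked with a few lines of arithmetic. This is only a sketch: the function name is hypothetical, the 16 MB segment size and 300 s switch interval are taken from the note rather than measured, and the result is best read as an upper bound, since PostgreSQL only forces a segment switch when there has been WAL activity since the previous one.

```python
# Assumptions (from the note, not measured): 16 MB WAL segments and a
# timer-forced segment switch every 300 s before archive_timeout was set to 0.
WAL_SEGMENT_MB = 16
ARCHIVE_TIMEOUT_S = 300
SECONDS_PER_DAY = 24 * 60 * 60


def forced_wal_mb_per_day(segment_mb: int, timeout_s: int) -> float:
    """Upper bound on WAL volume (MB/day) produced by timer-forced switches alone."""
    if timeout_s <= 0:
        # archive_timeout=0 disables timer-forced switches entirely.
        return 0.0
    switches_per_day = SECONDS_PER_DAY / timeout_s
    return switches_per_day * segment_mb


before = forced_wal_mb_per_day(WAL_SEGMENT_MB, ARCHIVE_TIMEOUT_S)
after = forced_wal_mb_per_day(WAL_SEGMENT_MB, 0)
print(f"before: {before / 1000:.1f} GB/day, after: {after / 1000:.1f} GB/day")
```

With the assumed inputs this gives 288 switches per day at 16 MB each, which matches the order of magnitude stated in the note.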
--- a/.claude/CLAUDE.md
+++ b/.claude/CLAUDE.md
@@ -197,7 +197,7 @@ the workflow's built-in `GITHUB_TOKEN` (`packages: write`).
 **`postgresql_host`** in `config.tfvars` is `pg-cluster-rw.dbaas.svc.cluster.local` (the CNPG primary). The legacy `postgresql.dbaas` service is a live compatibility alias (selector `cnpg.io/instanceRole=primary`, so it also reaches the primary — authentik's PgBouncer still points at it) — but use `pg-cluster-rw` for anything new. This variable is shared by ~12 stacks.
-**CNPG tuning** (in `stacks/dbaas/modules/dbaas/main.tf`): `shared_buffers=1024MB`, `effective_cache_size=2560MB`, `work_mem=16MB`, `max_connections=200`, `wal_compression=on`, pod memory 3Gi. **Write-reduction (2026-06-29, code-oflt):** `checkpoint_timeout=15min` + `max_wal_size=4GB` + `min_wal_size=1GB` — checkpoints were 100% timer-driven at the 5-min default, bursting full-page-writes onto the contended sdc HDD; all three are reloadable (no restart).
+**CNPG tuning** (in `stacks/dbaas/modules/dbaas/main.tf`): `shared_buffers=1024MB`, `effective_cache_size=2560MB`, `work_mem=16MB`, `max_connections=200`, pod memory 3Gi. **Write-reduction (2026-06-29, code-oflt, analysis #6922):** `checkpoint_timeout=15min` + `max_wal_size=4GB` + `min_wal_size=1GB` (checkpoints were 100% timer-driven at the 5-min default, bursting FPIs onto sdc); `archive_timeout=0` (CNPG forces `archive_mode=on` but `.spec.backup` is empty → a 16MB WAL switch every 300s shipped nowhere = ~4.6 GB/day waste; daily `pg_dump` is the real backup); `commit_delay=2500`µs (group-commit fsync coalescing, safe for all DBs incl financial); `wal_compression=zstd` (was pglz). All reloadable (no restart). **Apply gotcha:** the Cluster is a `null_resource.pg_cluster` + local-exec `kubectl apply` — bump its `pg_params` trigger or the YAML edit is inert, and apply with `-target=module.dbaas.null_resource.pg_cluster` to dodge the pre-existing `mysql_standalone` VCT-annotation drift that errors a broad `dbaas` apply.
 ## Networking & Resilience
 - **Critical path services scaled to 3**: Traefik, Authentik, CrowdSec LAPI, PgBouncer, Cloudflared.