From a3f2c2947a44b884ede648322a1af08bb395c2a9 Mon Sep 17 00:00:00 2001
From: Viktor Barzin
Date: Mon, 29 Jun 2026 15:17:38 +0000
Subject: [PATCH] docs: refresh CNPG tuning note (archive_timeout=0, commit_delay, zstd) + apply gotcha

Reflects the write-reduction params applied in c3553731, and documents the null_resource trigger-bump + targeted-apply gotcha so the next agent doesn't hit the inert-change / mysql-VCT-drift traps.

Co-Authored-By: Claude Opus 4.8
---
 .claude/CLAUDE.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md
index e89b44f6..b5c5c47d 100755
--- a/.claude/CLAUDE.md
+++ b/.claude/CLAUDE.md
@@ -197,7 +197,7 @@ the workflow's built-in `GITHUB_TOKEN` (`packages: write`).
 
 **`postgresql_host`** in `config.tfvars` is `pg-cluster-rw.dbaas.svc.cluster.local` (the CNPG primary). The legacy `postgresql.dbaas` service is a live compatibility alias (selector `cnpg.io/instanceRole=primary`, so it also reaches the primary — authentik's PgBouncer still points at it) — but use `pg-cluster-rw` for anything new. This variable is shared by ~12 stacks.
 
-**CNPG tuning** (in `stacks/dbaas/modules/dbaas/main.tf`): `shared_buffers=1024MB`, `effective_cache_size=2560MB`, `work_mem=16MB`, `max_connections=200`, `wal_compression=on`, pod memory 3Gi. **Write-reduction (2026-06-29, code-oflt):** `checkpoint_timeout=15min` + `max_wal_size=4GB` + `min_wal_size=1GB` — checkpoints were 100% timer-driven at the 5-min default, bursting full-page-writes onto the contended sdc HDD; all three are reloadable (no restart).
+**CNPG tuning** (in `stacks/dbaas/modules/dbaas/main.tf`): `shared_buffers=1024MB`, `effective_cache_size=2560MB`, `work_mem=16MB`, `max_connections=200`, pod memory 3Gi. **Write-reduction (2026-06-29, code-oflt, analysis #6922):** `checkpoint_timeout=15min` + `max_wal_size=4GB` + `min_wal_size=1GB` (checkpoints were 100% timer-driven at the 5-min default, bursting FPIs onto sdc); `archive_timeout=0` (CNPG forces `archive_mode=on` but `.spec.backup` is empty → a 16MB WAL switch every 300s shipped nowhere = ~4.6 GB/day waste; daily `pg_dump` is the real backup); `commit_delay=2500`µs (group-commit fsync coalescing, safe for all DBs incl financial); `wal_compression=zstd` (was pglz). All reloadable (no restart). **Apply gotcha:** the Cluster is a `null_resource.pg_cluster` + local-exec `kubectl apply` — bump its `pg_params` trigger or the YAML edit is inert, and apply with `-target=module.dbaas.null_resource.pg_cluster` to dodge the pre-existing `mysql_standalone` VCT-annotation drift that errors a broad `dbaas` apply.
 
 ## Networking & Resilience
 - **Critical path services scaled to 3**: Traefik, Authentik, CrowdSec LAPI, PgBouncer, Cloudflared.
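
Reviewer note on the write-reduction figures in the added line: the "~4.6 GB/day" waste and the checkpoint-frequency change can be rechecked with a few lines of arithmetic. This is a minimal sketch, not part of the patch. It assumes every forced WAL switch consumes a full 16 MB segment, that GB means decimal gigabytes, and that checkpoints stay purely timer-driven after the change; the variable names are illustrative only.

```python
# Worked arithmetic for the figures quoted in the CNPG tuning note.
# Assumption: each forced switch burns one full WAL segment (upper bound on an idle system).
SECONDS_PER_DAY = 24 * 60 * 60

ARCHIVE_TIMEOUT_S = 300   # forced WAL switch interval quoted in the note
WAL_SEGMENT_MB = 16       # WAL segment size quoted in the note

switches_per_day = SECONDS_PER_DAY // ARCHIVE_TIMEOUT_S
wasted_mb_per_day = switches_per_day * WAL_SEGMENT_MB
wasted_gb_per_day = wasted_mb_per_day / 1000  # decimal GB

# Assumption: checkpoints remain timer-driven, so the count is just day / timeout.
checkpoints_at_5min = SECONDS_PER_DAY // (5 * 60)
checkpoints_at_15min = SECONDS_PER_DAY // (15 * 60)

print(f"{switches_per_day} WAL switches/day -> {wasted_mb_per_day} MB/day "
      f"(~{wasted_gb_per_day:.1f} GB/day)")
print(f"timer-driven checkpoints/day: {checkpoints_at_5min} at 5 min, "
      f"{checkpoints_at_15min} at 15 min")
```

Under those assumptions the result is 288 switches and 4608 MB per day, which matches the ~4.6 GB/day quoted in the note, and timer-driven checkpoints drop from 288 to 96 per day.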