From a3f2c2947a44b884ede648322a1af08bb395c2a9 Mon Sep 17 00:00:00 2001
From: Viktor Barzin <vbarzin@gmail.com>
Date: Mon, 29 Jun 2026 15:17:38 +0000
Subject: [PATCH] docs: refresh CNPG tuning note (archive_timeout=0,
 commit_delay, zstd) + apply gotcha

Reflects the write-reduction params applied in c3553731, and documents the
null_resource trigger-bump + targeted-apply gotcha so the next agent doesn't
hit the inert-change / mysql-VCT-drift traps.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
---
 .claude/CLAUDE.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md
index e89b44f6..b5c5c47d 100755
--- a/.claude/CLAUDE.md
+++ b/.claude/CLAUDE.md
@@ -197,7 +197,7 @@ the workflow's built-in `GITHUB_TOKEN` (`packages: write`).
 
 **`postgresql_host`** in `config.tfvars` is `pg-cluster-rw.dbaas.svc.cluster.local` (the CNPG primary). The legacy `postgresql.dbaas` service is a live compatibility alias (selector `cnpg.io/instanceRole=primary`, so it also reaches the primary — authentik's PgBouncer still points at it) — but use `pg-cluster-rw` for anything new. This variable is shared by ~12 stacks.
 
-**CNPG tuning** (in `stacks/dbaas/modules/dbaas/main.tf`): `shared_buffers=1024MB`, `effective_cache_size=2560MB`, `work_mem=16MB`, `max_connections=200`, `wal_compression=on`, pod memory 3Gi. **Write-reduction (2026-06-29, code-oflt):** `checkpoint_timeout=15min` + `max_wal_size=4GB` + `min_wal_size=1GB` — checkpoints were 100% timer-driven at the 5-min default, bursting full-page-writes onto the contended sdc HDD; all three are reloadable (no restart).
+**CNPG tuning** (in `stacks/dbaas/modules/dbaas/main.tf`): `shared_buffers=1024MB`, `effective_cache_size=2560MB`, `work_mem=16MB`, `max_connections=200`, pod memory 3Gi. **Write-reduction (2026-06-29, code-oflt, analysis #6922):** `checkpoint_timeout=15min` + `max_wal_size=4GB` + `min_wal_size=1GB` (checkpoints were 100% timer-driven at the 5-min default, bursting full-page writes onto the contended sdc HDD); `archive_timeout=0` (CNPG forces `archive_mode=on` but `.spec.backup` is empty, so a 16MB WAL switch every 300s was shipped nowhere: 288 switches/day × 16MB = ~4.6 GB/day of wasted writes; the daily `pg_dump` is the real backup); `commit_delay=2500`µs (group-commit fsync coalescing, safe for all DBs including financial); `wal_compression=zstd` (was pglz). All are reloadable (no restart). **Apply gotcha:** the Cluster is a `null_resource.pg_cluster` + local-exec `kubectl apply`, so bump its `pg_params` trigger or the YAML edit is inert, and apply with `-target=module.dbaas.null_resource.pg_cluster` to dodge the pre-existing `mysql_standalone` VCT-annotation drift that errors a broad `dbaas` apply.
 
 ## Networking & Resilience
 - **Critical path services scaled to 3**: Traefik, Authentik, CrowdSec LAPI, PgBouncer, Cloudflared.