dbaas: CNPG write-reduction — archive_timeout=0, commit_delay, wal_compression=zstd
Part of code-oflt (cut sdc write IOPS before the SSD move; analysis #6922).

- archive_timeout 300->0: CNPG forces archive_mode=on, but .spec.backup is empty (no ObjectStore), so the 16MB WAL segment switch forced every 5min shipped nowhere: ~4.6 GB/day of pure-waste WAL on the contended sdc. archive_mode stays on (CNPG-managed, reserved); 0 only stops the timed switch. Daily pg_dump cron unchanged.
- commit_delay 0->2500us: group commit coalesces concurrent fsyncs. SAFE for every DB incl financial -- data is still fsynced before COMMIT acks; the only cost is <=2.5ms of added commit latency under concurrency.
- wal_compression on (pglz)->zstd: ~30-50% smaller full-page images.

All three take effect on a config reload (SIGHUP); no restart needed.

Applied via a targeted apply of module.dbaas.null_resource.pg_cluster (trigger bumped) to avoid the pre-existing mysql VCT drift that breaks broad dbaas applies.

Refs: code-oflt.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
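The ~4.6 GB/day figure in the message can be reproduced with a short calculation. This is a minimal sketch, not part of the commit: the function name is hypothetical, and it assumes every 300s window sees enough WAL activity to trigger a switch and that each forced switch costs a full 16MB segment, which makes it an upper bound.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def forced_switch_wal_mb_per_day(segment_mb: int = 16, archive_timeout_s: int = 300) -> int:
    """Upper bound on WAL (in MB) produced per day by timed segment switches.

    Assumption: one full segment is consumed per archive_timeout window.
    archive_timeout = 0 disables the timed switch entirely.
    """
    if archive_timeout_s <= 0:
        return 0
    switches_per_day = SECONDS_PER_DAY // archive_timeout_s
    return switches_per_day * segment_mb


if __name__ == "__main__":
    before = forced_switch_wal_mb_per_day(16, 300)
    after = forced_switch_wal_mb_per_day(16, 0)
    print(f"archive_timeout=300: {before} MB/day (~{before / 1000:.1f} GB/day)")
    print(f"archive_timeout=0:   {after} MB/day")
```

With the values from the message, this is 288 switches per day at 16MB each, i.e. 4608 MB, which matches the quoted ~4.6 GB/day in decimal units.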
parent 5d059786a1
commit c3553731c7
1 changed file with 13 additions and 2 deletions
@@ -1113,7 +1113,7 @@ resource "null_resource" "pg_cluster" {
 storage_size = "20Gi"
 storage_class = "proxmox-lvm-encrypted"
 memory_limit = "3Gi"
-pg_params = "v4-shared1024-walcomp-workmem16-max200-ckpt15m-wal4g-minwal1g"
+pg_params = "v5-shared1024-walcompZSTD-workmem16-max200-ckpt15m-wal4g-minwal1g-archoff-cdelay2500"
 affinity = "required-hostname-v1"
 }
@@ -1156,7 +1156,7 @@ resource "null_resource" "pg_cluster" {
 shared_buffers: "1024MB"
 effective_cache_size: "2560MB"
 work_mem: "16MB"
-wal_compression: "on"
+wal_compression: "zstd"
 random_page_cost: "4"
 checkpoint_completion_target: "0.9"
 # Write-reduction (2026-06-29, code-oflt): checkpoints were 100%
@@ -1169,6 +1169,17 @@ resource "null_resource" "pg_cluster" {
 checkpoint_timeout: "15min"
 max_wal_size: "4GB"
 min_wal_size: "1GB"
+# Write-reduction (2026-06-29, analysis #6922). archive_timeout=0 stops
+# the forced 16MB WAL segment switch every 300s that ships NOWHERE:
+# archive_mode is CNPG-managed-on but .spec.backup is empty (no
+# ObjectStore, firstRecoverabilityPoint empty), so it was ~4.6 GB/day of
+# pure-waste WAL on the contended sdc. Daily pg_dump cron remains the real
+# backup (~24h RPO). commit_delay groups concurrent fsyncs to cut fsync
+# IOPS -- SAFE for ALL DBs incl financial: data is still fsynced before
+# COMMIT acks; it only adds <=2.5ms latency under concurrency. (wal_compression
+# also moved pglz->zstd above: ~30-50% smaller full-page images.)
+archive_timeout: "0"
+commit_delay: "2500"
 enableAlterSystem: true
 enableSuperuserAccess: true
 inheritedMetadata:
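After the reload, the three changed parameters can be checked against the values in this diff. The following is a minimal sketch under stated assumptions: `VERIFY_SQL` and `find_mismatches` are hypothetical names, not part of the commit, and the rows are assumed to come from running the query with whatever client is at hand (for example `psql` inside the cluster's primary pod). `pg_settings.setting` reports values as strings in the parameter's base unit, so `commit_delay` is expected as "2500" (microseconds).

```python
# Expected values, taken from the diff above.
EXPECTED = {
    "archive_timeout": "0",
    "commit_delay": "2500",
    "wal_compression": "zstd",
}

# Query to run against the cluster; pg_settings is a standard PostgreSQL view.
VERIFY_SQL = (
    "SELECT name, setting FROM pg_settings "
    "WHERE name IN ('archive_timeout', 'commit_delay', 'wal_compression') "
    "ORDER BY name;"
)


def find_mismatches(rows, expected=EXPECTED):
    """Return {name: (expected, actual)} for every parameter that differs.

    `rows` is an iterable of (name, setting) pairs as returned by VERIFY_SQL.
    A parameter missing from `rows` is reported with actual=None.
    """
    actual = dict(rows)
    return {
        name: (want, actual.get(name))
        for name, want in expected.items()
        if actual.get(name) != want
    }


if __name__ == "__main__":
    # Example rows as they might look before the change was applied.
    stale = [("archive_timeout", "300"), ("commit_delay", "0"), ("wal_compression", "pglz")]
    print(find_mismatches(stale))
```

An empty result means the running values match the diff; any entry points at a parameter that did not pick up the new configuration.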