Reduce disk write amplification across cluster (~200-350 GB/day savings) [ci skip]

- Prometheus: persist metric whitelist (keep rules) to Helm template, preventing
  regression from 33K to 250K samples/scrape on next apply. Reduce retention 52w→26w.
- MySQL InnoDB: aggressive write reduction — flush_log_at_trx_commit=0, sync_binlog=0,
  doublewrite=OFF, io_capacity=100/200, redo_log=1GB, flush_neighbors=1, reduced page cleaners.
- etcd: increase snapshot-count 10000→50000 to reduce WAL snapshot frequency.
- VM disks: enable TRIM/discard passthrough to LVM thin pool via create-vm module.
- Cloud-init: enable fstrim.timer, journald limits (500M/7d/compress).
- Kubelet: containerLogMaxSize=10Mi, containerLogMaxFiles=3.
- Technitium: DNS query log retention 0 (unlimited)→30 days, bounding the query log stored in MySQL.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Viktor Barzin 2026-04-09 19:01:21 +00:00
parent 98aaba98da
commit 6101fb99f9
8 changed files with 127 additions and 8 deletions
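
The MySQL bullet in the commit message uses shorthand names. As a rough sketch of how the live values could be checked after rollout, the helper below compares server variables against the values from the message. The function name `report_mysql_drift` is hypothetical, and the mapping from shorthand to full variable names (for example `io_capacity=100/200` to `innodb_io_capacity` and `innodb_io_capacity_max`) is an assumption. The redo log size and page cleaner count are left out because the message does not pin down the exact variable or value.

```shell
#!/bin/sh
# Hypothetical helper: reads "name<TAB>value" rows on stdin (the shape of
# `mysql -N -B -e "SHOW GLOBAL VARIABLES"`) and prints one line for every
# expected setting whose live value differs or is missing.
report_mysql_drift() {
    awk '
        BEGIN {
            # Values from the commit message; full variable names are assumed.
            want["innodb_flush_log_at_trx_commit"] = "0"
            want["sync_binlog"]                    = "0"
            want["innodb_doublewrite"]             = "OFF"
            want["innodb_io_capacity"]             = "100"
            want["innodb_io_capacity_max"]         = "200"
            want["innodb_flush_neighbors"]         = "1"
        }
        ($1 in want) {
            seen[$1] = 1
            if ($2 != want[$1])
                printf "%s expected=%s actual=%s\n", $1, want[$1], $2
        }
        END {
            for (k in want)
                if (!(k in seen))
                    printf "%s expected=%s actual=missing\n", k, want[k]
        }
    '
}

# Usage against a running server (credentials and host omitted):
#   mysql -N -B -e "SHOW GLOBAL VARIABLES" | report_mysql_drift
```

No output means every listed variable matches. Each printed line names a setting that was not applied as described.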


@@ -56,9 +56,15 @@ apt:
filename: docker.list
runcmd:
# Enable persistent journald logging for crash forensics
# Enable weekly TRIM/discard to reclaim freed blocks in LVM thin pool
- systemctl enable --now fstrim.timer
# Enable persistent journald logging for crash forensics, with size limits to reduce disk wear
- mkdir -p /var/log/journal
- sed -i 's/#Storage=auto/Storage=persistent/' /etc/systemd/journald.conf
- sed -i 's/#SystemMaxUse=/SystemMaxUse=500M/' /etc/systemd/journald.conf
- sed -i 's/#MaxRetentionSec=/MaxRetentionSec=7day/' /etc/systemd/journald.conf
- sed -i 's/#MaxFileSec=/MaxFileSec=1day/' /etc/systemd/journald.conf
- sed -i 's/#Compress=yes/Compress=yes/' /etc/systemd/journald.conf
- systemctl restart systemd-journald
%{if is_k8s_template}
# Disable unattended-upgrades to prevent unexpected kernel updates that can break containerd/kubelet
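
The `sed` lines in this hunk only take effect when `journald.conf` still contains the stock commented defaults in exactly that form. Otherwise the substitution silently matches nothing. One alternative, sketched here and not what the commit does, is to write the same limits as a drop-in file, which journald reads from `/etc/systemd/journald.conf.d/`. The function name `write_journald_dropin` and the file name `90-disk-wear.conf` are made up for illustration.

```shell
#!/bin/sh
# Hypothetical alternative to the in-place sed edits: write the journald
# limits from this commit into a drop-in file under the given directory.
write_journald_dropin() {
    dropin_dir="$1"   # on a real host: /etc/systemd/journald.conf.d
    mkdir -p "$dropin_dir"
    cat > "$dropin_dir/90-disk-wear.conf" <<'EOF'
[Journal]
Storage=persistent
SystemMaxUse=500M
MaxRetentionSec=7day
MaxFileSec=1day
Compress=yes
EOF
}

# Usage on a real host (needs root):
#   write_journald_dropin /etc/systemd/journald.conf.d
#   systemctl restart systemd-journald
```

A drop-in leaves the distribution's `journald.conf` untouched, so the settings apply whatever the state of the commented defaults.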


@@ -194,6 +194,7 @@ resource "proxmox_vm_qemu" "cloudinit-vm" {
disk {
storage = "local-lvm"
size = var.vm_disk_size
discard = true # Enable TRIM passthrough to the LVM thin pool, which reduces CoW overhead
}
}
}
@@ -203,6 +204,7 @@ resource "proxmox_vm_qemu" "cloudinit-vm" {
disk {
storage = "local-lvm"
size = var.vm_disk_size
discard = true
}
}
}
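
To confirm from inside a guest that `discard = true` actually reached the virtual disk, one option is to check which block devices advertise discard support. The sketch below assumes `lsblk` from util-linux is available in the guest. The function name `devices_without_discard` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads "NAME DISC-GRAN DISC-MAX" rows on stdin and
# prints the devices whose maximum discard size is zero, meaning the device
# does not advertise TRIM/discard support.
devices_without_discard() {
    awk '$3 == "0" || $3 == "0B" { print $1 }'
}

# Usage inside the guest:
#   lsblk --discard --bytes --noheadings --nodeps \
#         --output NAME,DISC-GRAN,DISC-MAX | devices_without_discard
#   systemctl is-enabled fstrim.timer
#   fstrim -v /        # needs root; reports how many bytes were trimmed
```

An empty result from the first command, together with an enabled `fstrim.timer`, suggests that freed blocks can be returned to the thin pool.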