nodes: journald -> volatile (RAM) to cut sdc write-IOPS
ci/woodpecker/push/default Pipeline failed

Node "container churn" investigation (code-oflt): container logs (~30 KB/s)
and overlayfs (~17 KB/s) are negligible; the node OS-disk churn is ext4
journal (jbd2) metadata writes driven mostly by journald's continuous
appends. node4 + node5 had drifted to persistent journald with no
SystemMaxUse cap (4 GB each, ~100 KB/s); master/node1-3 were correctly
capped at 500M.
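One way to spot the kind of drift seen on node4/node5 is to print the effective journald settings on each node. This is a minimal sketch, assuming a systemd recent enough to provide `systemd-analyze cat-config`. The helper names `last_setting` and `check_journald` are hypothetical and not part of this repo.

```shell
#!/bin/sh
# Hypothetical drift check: print the effective journald Storage= and
# SystemMaxUse= on this node, plus current on-disk journal usage.

# Print the value of the last uncommented KEY=value line read from stdin.
# cat-config prints journald.conf first and then its drop-ins in the order
# they are applied, so the last assignment is the effective one.
# An unset or empty value means journald's built-in default applies.
last_setting() {
    awk -v k="$1" '
        /^[ \t]*[#;]/ { next }
        index($0, k "=") == 1 { v = substr($0, length(k) + 2) }
        END { print (v == "" ? "(default)" : v) }
    '
}

check_journald() {
    conf=$(systemd-analyze cat-config systemd/journald.conf) || return 1
    echo "Storage=$(printf '%s\n' "$conf" | last_setting Storage)"
    echo "SystemMaxUse=$(printf '%s\n' "$conf" | last_setting SystemMaxUse)"
    # Reports the space used by archived and active journal files.
    journalctl --disk-usage
}
```

If a node prints `Storage=persistent` together with `SystemMaxUse=(default)`, it is in the same uncapped state that node4/node5 were in.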

Node + pod journals already ship to Loki (alloy loki.source.journal), so
on-disk journald is pure write-IOPS overhead on the IOPS-bound sdc. Switch
journald to Storage=volatile (RAM, RuntimeMaxUse=200M) fleet-wide:
- cloud_init.yaml: drop-in 90-oflt-volatile.conf for new nodes (replaces
  the old persistent seds).
- running nodes (master + node1-5): pushed the same drop-in via qm guest
  exec + journald restart + cleared /var/log/journal.
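The rollout to already-running nodes can be sketched as a loop over Proxmox VM IDs. This is a minimal sketch under three assumptions: it runs on the Proxmox host, the QEMU guest agent is enabled in each VM, and the VM IDs are supplied by hand. The helper names (`dropin_script`, `push_dropin`, `rollout`) are hypothetical. The guest commands mirror the cloud_init.yaml runcmd lines and follow the order described above: restart journald first, then clear /var/log/journal.

```shell
#!/bin/sh
# Hypothetical rollout helper, run on the Proxmox host.

# Commands executed inside each guest (same drop-in as cloud_init.yaml).
dropin_script() {
    cat <<'EOF'
mkdir -p /etc/systemd/journald.conf.d
printf '[Journal]\nStorage=volatile\nRuntimeMaxUse=200M\nCompress=yes\n' > /etc/systemd/journald.conf.d/90-oflt-volatile.conf
systemctl restart systemd-journald
rm -rf /var/log/journal
EOF
}

# Run the script inside one VM through the QEMU guest agent.
push_dropin() {
    qm guest exec "$1" -- sh -c "$(dropin_script)"
}

# Apply to every VM ID given as an argument, reporting qm failures.
rollout() {
    for vmid in "$@"; do
        echo "== VM $vmid"
        push_dropin "$vmid" || echo "VM $vmid: qm guest exec failed" >&2
    done
}
```

Call it as `rollout <vmid> ...` with the real IDs of master and node1-5. `qm guest exec` prints a JSON result that includes the guest command's `exitcode`, so check that field for each VM.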

Verified node5: OS-disk writers jbd2/sda1-8 931->46 KB/s, systemd-journal
gone (~94% drop); ~4 GB freed each on node4/node5. Logs stay queryable in
Loki. Trade-off: a hard crash loses journal entries not yet shipped to
Loki, and the local journal no longer survives a reboot.
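The kernel's block-device counters give a rough before/after check of OS-disk writes. This is a swapped-in, whole-device measure and does not reproduce the per-thread figures above, which need a per-process tool such as iotop. The device name `sda` and the helper names are assumptions. Field 7 of `/sys/block/<dev>/stat` counts sectors written, always in 512-byte units.

```shell
#!/bin/sh
# Hypothetical helpers: whole-device write rate from /sys/block/<dev>/stat.

# Field 7 = sectors written, always counted in 512-byte units.
sectors_written() {
    awk '{ print $7 }' "/sys/block/$1/stat"
}

# kbps BEFORE AFTER SECS: integer KB/s (1024-byte units) between two
# sector counts sampled SECS seconds apart.
kbps() {
    echo $(( ($2 - $1) * 512 / 1024 / $3 ))
}

# Sample the device twice, SECS apart (defaults: sda, 60 s).
disk_write_rate() {
    dev=${1:-sda}
    secs=${2:-60}
    before=$(sectors_written "$dev")
    sleep "$secs"
    after=$(sectors_written "$dev")
    echo "$dev: $(kbps "$before" "$after" "$secs") KB/s written"
}
```

Running `disk_write_rate sda 60` inside a node before and after the change gives a coarse version of the comparison. The total also includes container-log and overlayfs writes, so it will not match the jbd2-only numbers exactly.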

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Viktor Barzin 2026-06-30 08:15:38 +00:00
parent 1afe41880e
commit 71501be408
2 changed files with 7 additions and 8 deletions

@@ -88,13 +88,12 @@ write_files:
 runcmd:
 # Enable weekly TRIM/discard to reclaim freed blocks in LVM thin pool
 - systemctl enable --now fstrim.timer
-# Enable persistent journald logging for crash forensics, with size limits to reduce disk wear
-- mkdir -p /var/log/journal
-- sed -i 's/#Storage=auto/Storage=persistent/' /etc/systemd/journald.conf
-- sed -i 's/#SystemMaxUse=/SystemMaxUse=500M/' /etc/systemd/journald.conf
-- sed -i 's/#MaxRetentionSec=/MaxRetentionSec=7day/' /etc/systemd/journald.conf
-- sed -i 's/#MaxFileSec=/MaxFileSec=1day/' /etc/systemd/journald.conf
-- sed -i 's/#Compress=yes/Compress=yes/' /etc/systemd/journald.conf
+# journald in RAM (volatile, capped) — node + pod journals already ship to
+# Loki via alloy (loki.source.journal), so on-disk journald is pure sdc
+# write-IOPS overhead on the IOPS-bound HDD. code-oflt 2026-06-30.
+- mkdir -p /etc/systemd/journald.conf.d
+- printf '[Journal]\nStorage=volatile\nRuntimeMaxUse=200M\nCompress=yes\n' > /etc/systemd/journald.conf.d/90-oflt-volatile.conf
+- rm -rf /var/log/journal
 - systemctl restart systemd-journald
 %{if is_k8s_template}
 # Node DNS is intentionally STOCK — no resolved drop-ins, no /etc/hosts