Commit graph

4 commits

Author SHA1 Message Date
Viktor Barzin
344fce3692 [monitoring][poison-fountain] pushgateway persistence + cronjob uid-0
Two independent root-cause fixes surfaced by the 2026-04-22 cluster
health check:

1. Pushgateway lost all in-memory metrics when node3 kubelet hiccuped
   at 11:42 UTC, hiding backup_last_success_timestamp{job="offsite-
   backup-sync"} until the next 06:01 UTC push — a ~18h false-negative
   window. Enable persistence on a 2Gi proxmox-lvm-encrypted PVC with
   --persistence.interval=1m. Chart note: values key is
   `prometheus-pushgateway:` (subchart alias), not `pushgateway:`.

2. poison-fountain-fetcher CronJob runs curlimages/curl as UID 100
   but the NFS mount /srv/nfs/poison-fountain is root:root 755 and
   the main Deployment runs as root, so mkdir /data/cache fails
   every 6h. Set run_as_user=0 on the CronJob container (no_root_squash
   is set on the export).

Closes the backup_offsite_sync FAIL on the next 06:01 UTC offsite
sync; closes the recurring poison-fountain evicted-pod noise on the
next 00:00 UTC cron tick.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 18:32:29 +00:00
Viktor Barzin
216d4240c9 [infra] Add Cloudflare provider to all stack lock files and generated providers
Terragrunt now generates cloudflare_provider.tf (Vault-sourced API key)
and includes cloudflare in required_providers. These are the generated
files from running `terragrunt init -upgrade` across all stacks.

[ci skip]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 16:31:36 +00:00
Viktor Barzin
1c13af142d sync regenerated providers.tf + upstream changes
- Terragrunt-regenerated providers.tf across stacks (vault_root_token
  variable removed from root generate block)
- Upstream monitoring/openclaw/CLAUDE.md changes from rebase
2026-03-22 02:56:04 +02:00
Viktor Barzin
06a0d0599a regenerate providers.tf: remove vault_root_token variable [ci skip] 2026-03-15 21:21:01 +00:00