infra/stacks/monitoring/modules/monitoring
Viktor Barzin a64d2ba2b9
upgrades: fix hourly gotenberg error + cap update notifications at weekly
Viktor was getting upgrade-error Slack messages every hour and wants
update notifications at most weekly. Root cause of the errors: Keel kept
trying to roll gotenberg 8.25 -> 8.25.1 in paperless-ngx, but Kyverno's
require-trusted-registries policy denied it. gotenberg/* (and apache/*,
which tika will hit next) were never allowlisted, and Keel's Slack
notifier at info level re-posted the identical failure to #general on
every hourly poll since Jun 28.

Changes:
- allowlist gotenberg/* + apache/* so the patch applies cleanly;
- disable Keel's direct Slack notifier and replace failure visibility
  with a KeelUpdateFailing Loki-ruler alert (alert-on-change: one
  notification plus the daily digest, never an hourly drip);
- remove diun's Slack notifier, whose default message @channel-pinged
  #image-updates for every new upstream tag every 6h (the n8n
  upgrade-agent webhook feed is untouched).

The k8s upgrade report is already weekly (Mon 06:07 UTC). Paperless-ngx
itself stays paused (keel policy=never, user-managed) while the ingest
runs.
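
For illustration only, a Loki-ruler rule in the spirit of KeelUpdateFailing
might look roughly like the sketch below. It is not the rule from loki.tf:
the group name, the namespace="keel" stream selector, the "level=error"
line filter (which assumes Keel logs in logrus key=value format), the 1h
window, and the severity label are all assumptions.

```yaml
# Hypothetical sketch of a Loki ruler alert; every selector and label
# below is an assumption, not taken from this repository.
groups:
  - name: keel-updates            # assumed group name
    rules:
      - alert: KeelUpdateFailing
        # Fire when Keel logged at least one error line in the last
        # hour. Assumes Keel runs in a namespace labelled "keel" and
        # emits logrus-style "level=error" lines.
        expr: |
          sum(count_over_time({namespace="keel"} |= "level=error" [1h])) > 0
        for: 0m
        labels:
          severity: warning       # assumed routing label
        annotations:
          summary: Keel failed to apply an image update in the last hour
```

A rule of this shape stays in the firing state while errors keep
recurring, so an alert-on-change notifier sends one message instead of
one per hourly poll.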

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 07:16:50 +00:00
dashboards fire-countdown dashboard: SQL guards + tax regime + honesty fixes 2026-07-01 22:44:17 +00:00
server-power-cycle fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
alert_digest.py monitoring: reduce Slack alert noise (alert-on-change + daily digest) 2026-06-12 20:35:56 +00:00
alert_digest.tf monitoring: reduce Slack alert noise (alert-on-change + daily digest) 2026-06-12 20:35:56 +00:00
alloy.yaml fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
authentik_walloff_probe.tf monitoring: add pfSense WAN/egress alerting + probes 2026-06-28 16:46:30 +00:00
Dockerfile fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
goflow2.tf goldmane-trail: polish follow-ups #57/#59/#61/#62/#63 + digest→#alerts 2026-06-25 17:49:25 +00:00
grafana.tf fix(monitoring): force_conflicts on grafana_db_creds ExternalSecret 2026-06-24 12:25:36 +00:00
grafana_chart_values.yaml fix(fire-planner): grafana fire-planner-pg datasource survives pw rotation 2026-06-28 16:14:42 +00:00
idrac.tf goldmane-trail: polish follow-ups #57/#59/#61/#62/#63 + digest→#alerts 2026-06-25 17:49:25 +00:00
k8s-monitoring-values.yaml fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
loki.tf upgrades: fix hourly gotenberg error + cap update notifications at weekly 2026-07-02 07:16:50 +00:00
loki.yaml fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
loki_ingress.tf fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
main.tf fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
prometheus.tf fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
prometheus_chart_values.tpl feat(nvidia): GPU VRAM budget + watchdog to stop T4 overallocation 2026-06-30 07:57:40 +00:00
prometheus_snmp_chart_values.yaml fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
pve_exporter.tf goldmane-trail: polish follow-ups #57/#59/#61/#62/#63 + digest→#alerts 2026-06-25 17:49:25 +00:00
snmp_exporter.tf goldmane-trail: polish follow-ups #57/#59/#61/#62/#63 + digest→#alerts 2026-06-25 17:49:25 +00:00
ups_snmp_values.yaml fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00