infra/stacks/monitoring/modules/monitoring
Viktor Barzin 994d305d04
t3: session-auth detection for the gated nightly tracker (dispatch fallback logging + Loki alerts)
Before auto-tracking t3 nightly builds (Viktor's call, risk accepted), stand up
the detection that was missing on 2026-06-09 — when an auto-pulled nightly broke
pairing for ALL users and nothing alerted. Viktor's explicit requirement: make
sure session auth keeps working and revert if the pairing fallback/failure rate
climbs. This is phase 0 (detection) of that work.

- t3-dispatch: exchangeCredential now reports WHICH pairing endpoint answered,
  and autoPair logs every outcome (paired user=.. endpoint=.. fallback=..) — so
  the real-user browser-session->bootstrap fallback rate is observable. A
  non-zero rate flags that a build moved the pairing API (the 2026-06-09 class).
- Loki ruler alerts (devvm journal -> Alertmanager -> Slack): T3PairingBroken
  (real users failing to pair), T3PairFallbackHigh (build moved the pairing API),
  T3AutoUpdateRolledBack / RollbackFailed / Frozen (enforcer outcomes). Closes
  the post-mortem's open "nothing monitors end-to-end pairing" detection gap.
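
As a rough illustration of how the fallback rate described above could be derived from the autoPair log lines, here is a minimal Python sketch. The commit message only shows the log shape as "paired user=.. endpoint=.. fallback=..", so the concrete field values (true/false for fallback, "browser-session" and "bootstrap" as endpoint names), the regex, and the function name fallback_rate are all assumptions for illustration, not the actual t3-dispatch format or the Loki rule expressions.

```python
import re

# Hypothetical log shape, modelled on "paired user=.. endpoint=.. fallback=..".
# The real field values emitted by t3-dispatch are not shown in the commit message.
PAIRED_RE = re.compile(
    r"paired user=(?P<user>\S+) endpoint=(?P<endpoint>\S+) fallback=(?P<fallback>true|false)"
)


def fallback_rate(lines):
    """Return (paired, fallbacks, rate) over the 'paired' lines in `lines`.

    Lines that do not match the assumed format are ignored.
    """
    paired = 0
    fallbacks = 0
    for line in lines:
        match = PAIRED_RE.search(line)
        if match is None:
            continue
        paired += 1
        if match.group("fallback") == "true":
            fallbacks += 1
    # Avoid dividing by zero when no pairing happened in the window.
    rate = fallbacks / paired if paired else 0.0
    return paired, fallbacks, rate


if __name__ == "__main__":
    sample = [
        "paired user=alice endpoint=browser-session fallback=false",
        "paired user=bob endpoint=bootstrap fallback=true",
        "some unrelated journal line",
    ]
    print(fallback_rate(sample))
```

A non-zero rate in such a count is the signal the T3PairFallbackHigh alert is described as watching for; the threshold and evaluation window used by the real alert are not stated here.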

The existing t3-probe only checks that GET /api/auth/session returns 200. That
endpoint keeps returning 200 even when pairing is dead, so the probe never
caught this outage class.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-16 09:56:55 +00:00
dashboards fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
server-power-cycle fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
alert_digest.py monitoring: reduce Slack alert noise (alert-on-change + daily digest) 2026-06-12 20:35:56 +00:00
alert_digest.tf monitoring: reduce Slack alert noise (alert-on-change + daily digest) 2026-06-12 20:35:56 +00:00
alloy.yaml fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
authentik_walloff_probe.tf t3: differential drop-attribution probe + devvm metrics 2026-06-10 21:11:29 +00:00
Dockerfile fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
goflow2.tf fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
grafana.tf fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
grafana_chart_values.yaml fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
idrac.tf fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
k8s-monitoring-values.yaml fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
loki.tf t3: session-auth detection for the gated nightly tracker (dispatch fallback logging + Loki alerts) 2026-06-16 09:56:55 +00:00
loki.yaml fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
loki_ingress.tf fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
main.tf fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
prometheus.tf fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
prometheus_chart_values.tpl chrome-service + mam-farming: doc clarifications (+ re-trigger CI apply missed earlier) 2026-06-16 09:34:23 +00:00
prometheus_snmp_chart_values.yaml fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
pve_exporter.tf fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
snmp_exporter.tf fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00
ups_snmp_values.yaml fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip] 2026-06-09 08:45:33 +00:00