infra/stacks
Viktor Barzin 6c294d4bb0 authentik: zero-endpoints alert + upgrade-validation checklist
Add `AuthentikForwardAuthFallbackActive` Prometheus alert: fires on
sustained 401/s spike on the websecure entrypoint (>5/s for 5m), which
is the symptom of the auth-proxy Emergency-Access fallback firing —
in turn caused by zero ready endpoints on the outpost service.

Why this rule and not `kube_endpoint_address_available == 0`:
kube-state-metrics endpoint metrics exist as series names but never
have current values in this Prometheus pipeline (something is dropping
them silently). Detecting the failure at the edge via Traefik is more
reliable than instrumenting the broken middle.

Also fix the pre-existing `AuthentikOutpostForwardAuth400Spike` regex
— the service label is `authentik-ak-outpost-...`, not
`authentik-authentik-outpost-...`, so the alert never matched any
series and never could have fired. Verified in Prometheus before/after
the fix.

Add an "Upgrade Validation Checklist" section to
`.claude/reference/authentik-state.md` with the seven-step smoke test
to run after Authentik chart bumps, provider bumps, or outpost pod
recreation. Covers the brittle surfaces (Service selector, JSON
patches, postgres backend wiring, access_token_validity TTL, edge
auth flow, plan-to-zero).
2026-05-22 14:16:41 +00:00
..
_template [infra] Establish KYVERNO_LIFECYCLE_V1 drift-suppression convention [ci skip] 2026-04-18 14:15:51 +00:00
actualbudget ig-poster: 69e395f2 + sync IMMICH_PG_* via ESO for CLIP scoring; postiz publish-notify n8n workflow 2026-05-10 11:12:38 +00:00
affine ig-poster: 69e395f2 + sync IMMICH_PG_* via ESO for CLIP scoring; postiz publish-notify n8n workflow 2026-05-10 11:12:38 +00:00
authentik authentik/pgbouncer: image_pull_policy IfNotPresent -> Always (match live) 2026-05-22 14:16:41 +00:00
beads-server [forgejo] Phases 3+4+5: cutover, decommission, docs sweep 2026-05-07 23:29:34 +00:00
blog infra/llama-cpp: benchmark report + -fa flag fix 2026-05-22 14:16:41 +00:00
broker-sync [broker-sync] unsuspend IMAP + Panel 15 RSU vest reconciliation (Phase D) 2026-04-19 18:29:01 +00:00
calico [infra] Partial Calico adoption: namespaces only (Wave 5b) 2026-04-18 22:52:56 +00:00
changedetection ig-poster: 69e395f2 + sync IMMICH_PG_* via ESO for CLIP scoring; postiz publish-notify n8n workflow 2026-05-10 11:12:38 +00:00
chrome-service chrome-service: open NP for Traefik → noVNC sidecar (port 6080) 2026-05-07 23:29:34 +00:00
city-guesser [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
claude-agent-service [forgejo] Phases 3+4+5: cutover, decommission, docs sweep 2026-05-07 23:29:34 +00:00
claude-memory ig-poster: 69e395f2 + sync IMMICH_PG_* via ESO for CLIP scoring; postiz publish-notify n8n workflow 2026-05-10 11:12:38 +00:00
cloudflared [mailserver] Route DMARC rua/ruf to dmarc@viktorbarzin.me [ci skip] 2026-04-18 23:49:14 +00:00
cnpg [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
coturn [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
crowdsec crowdsec/traefik: stop captchaing legit Immich mobile bursts 2026-04-26 09:27:16 +00:00
cyberchef infra/llama-cpp: benchmark report + -fa flag fix 2026-05-22 14:16:41 +00:00
dashy [infra] TrueNAS decommission — remove active references from Terraform + configs 2026-04-19 16:57:05 +00:00
dawarich ig-poster: 69e395f2 + sync IMMICH_PG_* via ESO for CLIP scoring; postiz publish-notify n8n workflow 2026-05-10 11:12:38 +00:00
dbaas infra/instagram-poster: shared CNPG-backed benchmark DB, no PVC for scores 2026-05-22 14:16:41 +00:00
descheduler [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
diun [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
ebook2audiobook gpu: schedule off NFD label, not k8s-node1 hostname 2026-04-22 13:43:07 +00:00
ebooks mailserver: split healthcheck path off PROXY-aware listeners + book-search uses ClusterIP 2026-05-05 19:45:33 +00:00
echo [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
excalidraw [docs] TrueNAS decommission cleanup — remove references from active docs 2026-04-19 16:55:43 +00:00
external-secrets [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
f1-stream anubis: only challenge GET requests; allow everything else 2026-05-22 14:16:40 +00:00
fire-planner fire-planner: imagePullPolicy=Always on alembic-migrate init container 2026-05-22 14:16:40 +00:00
foolery [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
forgejo ig-poster: 69e395f2 + sync IMMICH_PG_* via ESO for CLIP scoring; postiz publish-notify n8n workflow 2026-05-10 11:12:38 +00:00
freedify ig-poster: 69e395f2 + sync IMMICH_PG_* via ESO for CLIP scoring; postiz publish-notify n8n workflow 2026-05-10 11:12:38 +00:00
freshrss [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
frigate gpu: schedule off NFD label, not k8s-node1 hostname 2026-04-22 13:43:07 +00:00
grampsweb ig-poster: 69e395f2 + sync IMMICH_PG_* via ESO for CLIP scoring; postiz publish-notify n8n workflow 2026-05-10 11:12:38 +00:00
hackmd [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
headscale [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
health [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
hermes-agent [hermes-agent] disable deployment — PVC permission mismatch 2026-04-22 14:31:50 +00:00
homepage infra/llama-cpp: benchmark report + -fa flag fix 2026-05-22 14:16:41 +00:00
immich ig-poster: 69e395f2 + sync IMMICH_PG_* via ESO for CLIP scoring; postiz publish-notify n8n workflow 2026-05-10 11:12:38 +00:00
infra [forgejo] Phases 3+4+5: cutover, decommission, docs sweep 2026-05-07 23:29:34 +00:00
infra-maintenance [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
insta2spotify [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
instagram-poster infra/instagram-poster: shared CNPG-backed benchmark DB, no PVC for scores 2026-05-22 14:16:41 +00:00
isponsorblocktv [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
job-hunter grafana: env-var datasources + reloader so Vault rotations stop breaking dashboards 2026-05-10 11:12:39 +00:00
jsoncrack infra/llama-cpp: benchmark report + -fa flag fix 2026-05-22 14:16:41 +00:00
k8s-dashboard [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
k8s-portal gpu: schedule off NFD label, not k8s-node1 hostname 2026-04-22 13:43:07 +00:00
kms kms: per-connection state in notifier (vlmcsd is multi-threaded) 2026-05-22 14:16:40 +00:00
kured [infra] Adopt kured + sentinel-gate into Terraform (Wave 5a) 2026-04-18 22:33:29 +00:00
kyverno [forgejo] Phases 3+4+5: cutover, decommission, docs sweep 2026-05-07 23:29:34 +00:00
linkwarden [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
llama-cpp infra/llama-cpp: benchmark report + -fa flag fix 2026-05-22 14:16:41 +00:00
local-path [infra] Adopt local-path-provisioner into Terraform (Wave 5c) 2026-04-18 22:39:55 +00:00
mailserver fix: restore pvc-autoresizer by allow-listing kubelet_volume_stats_available_bytes 2026-05-10 11:12:37 +00:00
matrix [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
meshcentral [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
metallb [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
metrics-server [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
monitoring authentik: zero-endpoints alert + upgrade-validation checklist 2026-05-22 14:16:41 +00:00
n8n grafana: env-var datasources + reloader so Vault rotations stop breaking dashboards 2026-05-10 11:12:39 +00:00
navidrome [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
netbox [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
networking-toolbox [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
nextcloud ig-poster: 69e395f2 + sync IMMICH_PG_* via ESO for CLIP scoring; postiz publish-notify n8n workflow 2026-05-10 11:12:38 +00:00
nfs-csi [infra] TrueNAS decommission — remove active references from Terraform + configs 2026-04-19 16:57:05 +00:00
nodelocal-dns [dns] NodeLocal DNSCache — deploy DaemonSet to all nodes (WS C) 2026-04-19 15:46:41 +00:00
novelapp [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
ntfy [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
nvidia gpu: schedule off NFD label, not k8s-node1 hostname 2026-04-22 13:43:07 +00:00
onlyoffice ig-poster: 69e395f2 + sync IMMICH_PG_* via ESO for CLIP scoring; postiz publish-notify n8n workflow 2026-05-10 11:12:38 +00:00
openclaw ig-poster: 69e395f2 + sync IMMICH_PG_* via ESO for CLIP scoring; postiz publish-notify n8n workflow 2026-05-10 11:12:38 +00:00
osm_routing [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
owntracks [owntracks] Strip face avatar from hook payload + drop orphan PVC 2026-04-19 12:05:18 +00:00
paperless-ngx ig-poster: 69e395f2 + sync IMMICH_PG_* via ESO for CLIP scoring; postiz publish-notify n8n workflow 2026-05-10 11:12:38 +00:00
payslip-ingest grafana: env-var datasources + reloader so Vault rotations stop breaking dashboards 2026-05-10 11:12:39 +00:00
phpipam phpipam-pfsense-import: every 5min → hourly 2026-04-26 22:48:43 +00:00
platform [infra] Add Cloudflare provider to all stack lock files and generated providers 2026-04-16 16:31:36 +00:00
plotting-book [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
poison-fountain [poison-fountain] opt ingress out of Uptime Kuma external monitor 2026-04-22 21:24:22 +00:00
postiz postiz: disable signups (DISABLE_REGISTRATION=true) 2026-05-10 11:12:40 +00:00
priority-pass priority-pass: bump image_tag to 88f18e53 [ci skip] 2026-05-05 21:13:14 +00:00
privatebin Woodpecker CI deploy [CI SKIP] 2026-05-22 14:16:40 +00:00
proxmox-csi [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
pvc-autoresizer [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
rbac [infra] Migrate Terraform state from local SOPS to PostgreSQL backend 2026-04-16 19:33:12 +00:00
real-estate-crawler infra/llama-cpp: benchmark report + -fa flag fix 2026-05-22 14:16:41 +00:00
redis [redis] stabilise against node-crash flap cascade — RC1-RC5 fixes 2026-04-22 15:59:00 +00:00
reloader [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
resume [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
reverse-proxy chore: remove decommissioned registry.viktorbarzin.me ingress 2026-05-10 11:12:37 +00:00
rybbit [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
sealed-secrets [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
send [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
servarr ig-poster: 69e395f2 + sync IMMICH_PG_* via ESO for CLIP scoring; postiz publish-notify n8n workflow 2026-05-10 11:12:38 +00:00
shadowsocks [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
speedtest [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
status-page [infra] Establish KYVERNO_LIFECYCLE_V1 drift-suppression convention [ci skip] 2026-04-18 14:15:51 +00:00
stirling-pdf [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
tandoor [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
technitium [technitium] zone-sync now reconciles primaryNameServerAddresses 2026-04-22 17:47:18 +00:00
terminal [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
tor-proxy ig-poster: 69e395f2 + sync IMMICH_PG_* via ESO for CLIP scoring; postiz publish-notify n8n workflow 2026-05-10 11:12:38 +00:00
trading-bot ig-poster: 69e395f2 + sync IMMICH_PG_* via ESO for CLIP scoring; postiz publish-notify n8n workflow 2026-05-10 11:12:38 +00:00
traefik x402: consolidate to a single shared forwardAuth gateway 2026-05-10 11:12:40 +00:00
travel_blog infra/llama-cpp: benchmark report + -fa flag fix 2026-05-22 14:16:41 +00:00
tuya-bridge ig-poster: 69e395f2 + sync IMMICH_PG_* via ESO for CLIP scoring; postiz publish-notify n8n workflow 2026-05-10 11:12:38 +00:00
uptime-kuma [reverse-proxy] Fix gw.viktorbarzin.me — point at 192.168.1.1 via EndpointSlice 2026-04-19 15:07:24 +00:00
url [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
vault infra/instagram-poster: shared CNPG-backed benchmark DB, no PVC for scores 2026-05-22 14:16:41 +00:00
vaultwarden [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
vpa [infra] Migrate Terraform state from local SOPS to PostgreSQL backend 2026-04-16 19:33:12 +00:00
wealthfolio grafana: env-var datasources + reloader so Vault rotations stop breaking dashboards 2026-05-10 11:12:39 +00:00
webhook_handler [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
whisper gpu: schedule off NFD label, not k8s-node1 hostname 2026-04-22 13:43:07 +00:00
wireguard [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] 2026-04-18 21:15:27 +00:00
woodpecker ig-poster: 69e395f2 + sync IMMICH_PG_* via ESO for CLIP scoring; postiz publish-notify n8n workflow 2026-05-10 11:12:38 +00:00
xray [infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] 2026-04-18 21:19:48 +00:00
ytdlp gpu: schedule off NFD label, not k8s-node1 hostname 2026-04-22 13:43:07 +00:00