infra/.claude/reference
Viktor Barzin 6c294d4bb0 authentik: zero-endpoints alert + upgrade-validation checklist
Add `AuthentikForwardAuthFallbackActive` Prometheus alert: fires on
sustained 401/s spike on the websecure entrypoint (>5/s for 5m), which
is the symptom of the auth-proxy Emergency-Access fallback firing —
in turn caused by zero ready endpoints on the outpost service.

Why this rule and not `kube_endpoint_address_available == 0`:
kube-state-metrics endpoint metrics exist as series names but never
have current values in this Prometheus pipeline (something is dropping
them silently). Detecting the failure at the edge via Traefik is more
reliable than instrumenting the broken middle.

Also fix the pre-existing `AuthentikOutpostForwardAuth400Spike` regex
— the service label is `authentik-ak-outpost-...`, not
`authentik-authentik-outpost-...`, so the alert never matched any
series and never could have fired. Verified in Prometheus before/after
the fix.

Add an "Upgrade Validation Checklist" section to
`.claude/reference/authentik-state.md` with the seven-step smoke test
to run after Authentik chart bumps, provider bumps, or outpost pod
recreation. Covers the brittle surfaces (Service selector, JSON
patches, postgres backend wiring, access_token_validity TTL, edge
auth flow, plan-to-zero).
2026-05-22 14:16:41 +00:00
..
authentik-state.md authentik: zero-endpoints alert + upgrade-validation checklist 2026-05-22 14:16:41 +00:00
github-api.md [ci skip] Sunset Drone CI: remove all artifacts, DNS, configs, and references 2026-02-23 19:38:55 +00:00
known-issues.md add infrastructure agent team: 8 specialized agents + 14 diagnostic scripts 2026-03-15 02:01:07 +00:00
patterns.md anubis: per-site PoW reverse proxy on blog + kms + travel-blog 2026-05-10 11:12:40 +00:00
proxmox-inventory.md gpu: schedule off NFD label, not k8s-node1 hostname 2026-04-22 13:43:07 +00:00
service-catalog.md chrome-service: in-cluster headed Chromium pool for f1-stream verifier 2026-05-07 23:29:32 +00:00
upgrade-config.json chore: add untracked stacks, scripts, and agent configs 2026-04-15 09:33:06 +00:00