authentik: dedicated rate-limit carve-out + per-router 5xx observability
All checks were successful
ci/woodpecker/push/default Pipeline was successful
Unauthenticated users were getting a blank login screen (which would sometimes just hang). Root-caused via a read-only fan-out + adversarial verify: the login SPA cold-loads ~70 flow-executor JS/CSS chunks from /static through the SHARED 10/50 Traefik limiter, so a fresh/empty-cache load 429s the tail, and a failed ES-module import aborts SPA bootstrap -> permanent blank screen. authentik was the only first-party SPA still on the default limiter (8 siblings already have a carve-out). Clients behind a shared NAT trip it especially easily, because they all draw from one per-IP bucket.

- traefik: new `authentik-rate-limit` Middleware (average 100 / burst 1000, mirroring the existing health/tripit carve-outs). The authentik / and /static ingresses switch to it in the authentik-stack commit.
- monitoring: the `traefik` scrape job's drop-regex was a blanket `traefik_router_.*`, which also dropped `traefik_router_requests_total`, so per-router 4xx/5xx (incl. 429/503) was neither queryable nor alertable. Narrowed it to keep the counter while still dropping the high-cardinality `*_duration_seconds_bucket` histogram, and added `AuthentikRootRouter5xxHigh` for the episodic 502/503/504 cascade when all 3 server pods go NotReady.

Docs updated (networking.md rate-limit list, .claude/CLAUDE.md). GitOps CI applies.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
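The monitoring fix depends on two properties of Prometheus relabel regexes: they are RE2 (no negative lookahead) and fully anchored. "Drop everything under `traefik_router_` except the counter" therefore has to be written as a narrower drop rule, not as an exclusion. The following is a minimal sketch in the repo's Terraform. It assumes the scrape job's `metric_relabel_configs` and the alert rule are built as HCL values and rendered with `yamlencode()`. The local names, the `router` matcher, the 5% threshold and the time windows are all hypothetical, not the committed values:

```hcl
locals {
  # Hypothetical local name. Relabel regexes are anchored RE2, so this drops
  # only the per-router latency histogram buckets and leaves
  # traefik_router_requests_total (and every other series) untouched.
  traefik_metric_relabel_configs = [
    {
      source_labels = ["__name__"]
      regex         = "traefik_router_.*_duration_seconds_bucket"
      action        = "drop"
    },
  ]

  # Hypothetical body for the AuthentikRootRouter5xxHigh rule. The router
  # matcher, 5% ratio and 5m/10m windows are illustrative placeholders.
  authentik_root_router_5xx_rule = {
    alert = "AuthentikRootRouter5xxHigh"
    expr  = <<-EOT
      sum(rate(traefik_router_requests_total{router=~".*authentik.*", code=~"50[234]"}[5m]))
        /
      sum(rate(traefik_router_requests_total{router=~".*authentik.*"}[5m]))
        > 0.05
    EOT
    # Quoted because `for` is an HCL keyword.
    "for" = "10m"
    labels = {
      severity = "warning"
    }
    annotations = {
      summary = "authentik router returning 502/503/504 (server pods NotReady?)"
    }
  }
}
```

Because the regex is anchored, it matches `traefik_router_request_duration_seconds_bucket` but not `traefik_router_requests_total`. The counter keeps its `code` and `router` labels, which is what makes a per-router 5xx ratio expressible in the first place.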
parent 65a09dcbc4
commit b84b0021c2
4 changed files with 57 additions and 5 deletions
@@ -341,6 +341,33 @@ resource "kubernetes_manifest" "middleware_health_rate_limit" {
   depends_on = [helm_release.traefik]
 }
+
+# Authentik-specific rate limit. The login SPA cold-loads its flow-executor
+# JS/CSS chunks from /static (app-served, not a CDN) plus an API burst on / —
+# ~70 parallel requests on a fresh/empty-cache login. The default 10/50 limiter
+# 429s the tail, and a 429'd ES-module import aborts SPA bootstrap → blank login
+# screen for cold/incognito/cache-cleared clients and any clients sharing a NAT
+# egress IP (sixth instance of the burst pattern, after ha-sofia, ActualBudget,
+# noVNC, tripit and health). authentik was the only first-party SPA still on the
+# default limiter. Burst absorbs a couple of full cold loads back-to-back.
+resource "kubernetes_manifest" "middleware_authentik_rate_limit" {
+  manifest = {
+    apiVersion = "traefik.io/v1alpha1"
+    kind       = "Middleware"
+    metadata = {
+      name      = "authentik-rate-limit"
+      namespace = kubernetes_namespace.traefik.metadata[0].name
+    }
+    spec = {
+      rateLimit = {
+        average = 100
+        burst   = 1000
+      }
+    }
+  }
+
+  depends_on = [helm_release.traefik]
+}
 
 # Compress responses to clients at the entrypoint level (outermost).
 # Applied at websecure entrypoint so all responses get compressed.
 # Uses includedContentTypes (whitelist) instead of excludedContentTypes:
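This diff only defines the Middleware. According to the commit message, the authentik / and /static ingresses are switched to it in a separate authentik-stack commit. The sketch below shows one way that opt-in can look with the `hashicorp/kubernetes` provider. It uses Traefik's `router.middlewares` Ingress annotation and its `<namespace>-<name>@kubernetescrd` reference format. The Ingress name, namespace, host, IngressClass and backend Service are hypothetical, and the sketch assumes the Traefik namespace is literally named `traefik`:

```hcl
# Hypothetical sketch: authentik's root Ingress opting into the dedicated
# authentik-rate-limit Middleware instead of the shared 10/50 limiter.
resource "kubernetes_ingress_v1" "authentik_root" {
  metadata {
    name      = "authentik" # hypothetical
    namespace = "authentik" # hypothetical
    annotations = {
      # Traefik resolves CRD middlewares referenced from an Ingress as
      # <middleware-namespace>-<middleware-name>@kubernetescrd.
      # Assumes kubernetes_namespace.traefik is named "traefik".
      "traefik.ingress.kubernetes.io/router.middlewares" = "traefik-authentik-rate-limit@kubernetescrd"
    }
  }

  spec {
    ingress_class_name = "traefik" # hypothetical IngressClass name

    rule {
      host = "authentik.example.com" # hypothetical host

      http {
        path {
          path      = "/"
          path_type = "Prefix"

          backend {
            service {
              name = "authentik-server" # hypothetical Service name
              port {
                number = 80
              }
            }
          }
        }
      }
    }
  }
}
```

The /static Ingress would carry the same annotation value. With burst 1000, a single ~70-request cold load consumes only a small fraction of the bucket, which leaves headroom for several back-to-back cold loads from one NAT egress IP.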