authentik: incident hardening after the signin-speedup rollout storm
The first apply of the signin-speedup change triggered a ~50min authentik
outage (and a shared CNPG primary failover): the helm chart pin (2026.2.2)
silently DOWNGRADED the Keel-managed live image (2026.2.4) against an
already-migrated DB, default liveness probes kill-looped pods queuing on
authentik's migration advisory lock, and kills mid-migration left ghost
idle-in-transaction sessions holding that lock.

Full analysis in
docs/post-mortems/2026-06-10-authentik-downgrade-boot-storm.md.

Hardening (all root causes):
- values.yaml: pin global.image.tag to the Keel-managed live tag (2026.2.4)
  so helm applies can never downgrade under Keel again
- values.yaml: server livenessProbe 6x10s/5s (was chart-default 3x10s/3s)
- values.yaml: REMOVE AUTHENTIK_POSTGRESQL__CONN_MAX_AGE (session-mode
  pgbouncer pins persistent conns 1:1 -> pool saturation, 58s/s waits)
- pgbouncer.ini: idle_transaction_timeout=300 reaps ghost lock holders;
  pgbouncer.tf gets a config-checksum annotation so ini changes roll pods
- authentik_provider.tf: drop the completed import stanza (adoption rule)
- traefik: suppress pre-existing keel.sh annotation/tier-label drift on
  auth-proxy/bot-block/x402/error-pages deployments (KEEL_LIFECYCLE_V1
  pattern) so applies stop stripping live Keel state

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
parent 97ccdbecb8
commit 4e88298976
8 changed files with 156 additions and 23 deletions
@@ -217,11 +217,6 @@ resource "authentik_stage_user_login" "default_login" {
 # screen and bypass the password field.
 # -----------------------------------------------------------------------------
 
-import {
-  to = authentik_stage_identification.default_identification
-  id = "32aca5ab-106e-43f4-a4cc-4513d80e57f3"
-}
-
 data "authentik_stage" "default_authentication_password" {
   name = "default-authentication-password"
 }
@@ -243,8 +238,6 @@ resource "authentik_stage_identification" "default_identification" {
      passwordless_flow,
      pretend_user_exists,
      captcha_stage,
      webauthn_stage,
      enable_remember_me,
    ]
  }
}