fix(authentik): long-lived social-login sessions + shield auth from CrowdSec lockout
All checks were successful
ci/woodpecker/push/default Pipeline was successful
Viktor's passkeys all vanished and he was suddenly being asked to log in
multiple times a day instead of ~monthly. Root cause: on 2026-06-18 an ad-hoc
tripit passkey E2E test (run from the devvm as akadmin via python-httpx) cleaned
up "the demo user's" passkeys with GET /core/users/?search={demo} then DELETE
each device of users[0] — but the fuzzy search returned the REAL account, so it
wiped all 6 real passkeys. Losing passkeys forced fallback to Google login, and
the social-login stage (default-source-authentication-login) had the provider
default session_duration=seconds=0, which falls back to UNAUTHENTICATED_AGE=2h —
hence the constant re-logins. (Password + passkey logins were already weeks=4.)
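
The failure mode described above (treating the first hit of a fuzzy `?search=` as the intended user) can be guarded against by requiring an exact, unique username match before deleting anything. A minimal sketch, assuming the search response is a list of dicts with a `username` key; `pick_exact_user` and the sample usernames are hypothetical and are not part of authentik or the original test:

```python
def pick_exact_user(results, username):
    """Return the single user whose username matches exactly, else raise."""
    matches = [u for u in results if u.get("username") == username]
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly one user named {username!r}, got {len(matches)}"
        )
    return matches[0]


# Simulated fuzzy-search response in which the real account sorts first.
results = [
    {"pk": 1, "username": "viktor"},
    {"pk": 7, "username": "demo-viktor"},
]

target = pick_exact_user(results, "demo-viktor")
print(target["pk"])
```

Refusing to proceed on zero or multiple matches turns a silent wrong-account deletion into a loud test failure.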

Changes:

- authentik: adopt default-source-authentication-login into Terraform (import)
  and pin session_duration=weeks=4, so Google/GitHub/Facebook logins last as long
  as password/passkey. Immediate relief without re-enrolling.
- authentik: document the provider-schema gotcha: authentik_stage_identification
  exposes no webauthn_stage / enable_remember_me attribute, so they must NOT be in
  ignore_changes (commit 4e882989 removed them for this reason; re-adding breaks
  every apply). The passkey break was purely the missing device records, not drift.
- edge (rybbit): shield auth so a CrowdSec hit can never wall a user out of login:
  carve authentik.viktorbarzin.me + public-auth out of the zone WAF block rule,
  make the LAPI->edge sync ban-only (stop downgrading captcha to a hard block),
  and set exclude_crowdsec on the Authentik UI ingress (auth keeps rate-limiting).
- docs: record the session-duration change, the edge enforcement + auth carve-out
  (previously undocumented), and the pre-existing broken crowdsec-cf-sync CronJob
  (CF cursor pagination 400 + ~31k IPs vs list capacity -> edge list inert).

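
To illustrate why a `seconds=0` stage behaved like a 2-hour session, here is a rough sketch of the fallback described in this message. It assumes durations are written as `key=value` pairs separated by `;` (as in `weeks=4`); `parse_duration` and `effective_session_age` are hypothetical helpers, not authentik's own code, and the 2h constant is taken from the message above:

```python
from datetime import timedelta

# Assumption from the commit message: a zero duration falls back to 2 hours.
UNAUTHENTICATED_AGE = timedelta(hours=2)


def parse_duration(expr):
    """Parse 'weeks=4' or 'hours=1;minutes=30' into a timedelta."""
    kwargs = {}
    for part in expr.split(";"):
        key, _, value = part.strip().partition("=")
        kwargs[key] = float(value)
    return timedelta(**kwargs)


def effective_session_age(expr):
    """Return the configured duration, or the fallback when it is zero."""
    duration = parse_duration(expr)
    return duration if duration > timedelta(0) else UNAUTHENTICATED_AGE


print(effective_session_age("seconds=0"))  # 2:00:00
print(effective_session_age("weeks=4"))    # 28 days, 0:00:00
```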
Passkey re-enrollment is a manual user action (devices are gone from the DB);
nothing auto-re-deletes them.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
parent 600f1f933c
commit 46166c63b2
6 changed files with 119 additions and 20 deletions
@@ -4,12 +4,16 @@
 Cloudflare-PROXIED hosts terminate at the CF edge, so the in-cluster CrowdSec
 bouncer (which keys on the client IP Traefik sees) never decides on them. We
 push the decisions into the edge instead: a zone-scoped WAF custom rule blocks
-`(ip.src in $crowdsec_ban)` across EVERY proxied host in the zone. This job is
-the control plane that keeps that one IP List in sync with LAPI.
+`(ip.src in $crowdsec_ban)` across EVERY proxied host in the zone (the Authentik
+auth hosts are carved out in crowdsec_edge.tf so a ban can't break login). This
+job is the control plane that keeps that one IP List in sync with LAPI.
 
-The CF account hard-limits to ONE Rules List, so enforcement is BLOCK-ONLY:
-BOTH ban AND captcha (scope=="ip") decisions are folded into the single
-crowdsec_ban list and captcha is downgraded to block at the proxied edge.
+Enforcement is BAN-ONLY: only scope=="ip" decisions of type "ban" are synced.
+"captcha" decisions are deliberately NOT pushed — the CF account allows only ONE
+Rules List with a single block action, so folding captcha in would hard-block a
+soft challenge across every proxied host. Captcha remediation stays at the
+in-cluster Traefik bouncer (Turnstile) for non-proxied apps. (Changed 2026-06-20
+from the prior ban+captcha fold that downgraded captcha to a hard edge block.)
 
 (Filename kept as lapi_kv_sync.py for path/ConfigMap continuity with the prior
 Workers-KV design; it no longer touches KV — it reconciles a CF Rules List.)
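
The docstring describes the job as reconciling one IP list against LAPI. One common way to do that is a plain set difference between the desired and current contents. The sketch below is an illustration only: `plan_reconcile` is a hypothetical name, the addresses are documentation-range placeholders, and nothing here is taken from lapi_kv_sync.py itself:

```python
def plan_reconcile(desired, current):
    """Return (to_add, to_remove) so that the list ends up equal to desired."""
    to_add = sorted(desired - current)
    to_remove = sorted(current - desired)
    return to_add, to_remove


desired = {"198.51.100.7", "203.0.113.9"}  # what LAPI says should be banned
current = {"203.0.113.9", "192.0.2.44"}    # what the edge list holds now

to_add, to_remove = plan_reconcile(desired, current)
print(to_add, to_remove)
```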
@@ -117,13 +121,17 @@ def _cf(url, *, method="GET", payload=None, timeout=20):
 # LAPI
 # --------------------------------------------------------------------------- #
 def fetch_decisions():
-    """Return the single desired set of IPs to BLOCK at the edge.
+    """Return the desired set of IPs to BLOCK at the edge.
 
-    Only scope=="ip" decisions are projected (the WAF rule keys on ip.src). The
-    CF account allows only ONE Rules List, so BOTH "ban" AND "captcha" decisions
-    are folded into one block set (captcha is downgraded to block at the proxied
-    edge). Raises on transport/HTTP error so the caller can SKIP the run
-    (fail-safe).
+    Only scope=="ip" decisions of type "ban" are projected (the WAF rule keys on
+    ip.src). "captcha" decisions are deliberately NOT pushed to the edge: the CF
+    account allows only ONE Rules List with a single block action, so folding
+    captcha in would HARD-BLOCK a soft challenge across every proxied host (and,
+    before the auth-host carve-out in crowdsec_edge.tf, could lock a user out of
+    Authentik itself). Edge enforcement is therefore ban-only; captcha
+    remediation stays at the in-cluster Traefik bouncer (Turnstile) for
+    non-proxied apps. Raises on transport/HTTP error so the caller can SKIP the
+    run (fail-safe). 2026-06-20.
     """
     data = _req(
         f"{LAPI_URL}/v1/decisions",
@@ -137,9 +145,10 @@ def fetch_decisions():
         if not ip:
             continue
         dtype = (d.get("type") or "").lower()
-        if dtype in ("ban", "captcha"):
+        if dtype == "ban":
             block.add(ip)
-        # other remediation types (e.g. throttle) are ignored
+        # captcha / throttle / other remediation types are ignored at the edge
+        # (ban-only enforcement — see the docstring above)
     return block
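
For readers without the full file, the ban-only projection in this hunk can be approximated as a standalone function. This is a sketch under assumptions: the `scope` and `value` field names are guesses at the decision shape (only `type` is visible in the diff), and `ban_only_ips` is a hypothetical name:

```python
def ban_only_ips(decisions):
    """Project LAPI-style decisions onto the set of IPs to block at the edge."""
    block = set()
    for d in decisions or []:  # tolerate an empty or null response
        if (d.get("scope") or "").lower() != "ip":
            continue
        ip = d.get("value")
        if not ip:
            continue
        if (d.get("type") or "").lower() == "ban":
            block.add(ip)
        # captcha / throttle / anything else is ignored at the edge
    return block


sample = [
    {"scope": "Ip", "value": "203.0.113.9", "type": "ban"},
    {"scope": "Ip", "value": "198.51.100.7", "type": "captcha"},
    {"scope": "Range", "value": "192.0.2.0/24", "type": "ban"},
]
print(ban_only_ips(sample))
```

With the sample above only the first decision survives: the captcha entry and the range-scoped ban are both dropped.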
@@ -298,7 +307,7 @@ def main():
         push_metrics(0, ok=False)
         return 0
 
-    print(f"[info] LAPI desired: {len(block)} block (ban+captcha, ip-scope)")
+    print(f"[info] LAPI desired: {len(block)} block (ban-only, ip-scope)")
 
     # 2. Reconcile the single block list. CF errors fail loud (non-zero exit).
     try:
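
The context lines in this hunk show two different error policies: a LAPI failure skips the run and still exits 0 (fail-safe), while a Cloudflare failure exits non-zero (fail loud). A minimal sketch of that control flow, with `run`, `fetch`, and `reconcile` as hypothetical stand-ins for the real functions:

```python
def run(fetch, reconcile):
    """Return a process exit code following the two error policies."""
    try:
        block = fetch()
    except Exception as exc:
        # Fail-safe: without fresh decisions, leave the edge list untouched.
        print(f"[warn] LAPI fetch failed, skipping run: {exc}")
        return 0
    try:
        reconcile(block)
    except Exception as exc:
        # Fail loud: a broken edge sync should surface as a failed job.
        print(f"[error] edge reconcile failed: {exc}")
        return 1
    return 0


def failing(*_args):
    raise RuntimeError("simulated outage")


print(run(failing, lambda block: None))          # LAPI down
print(run(lambda: {"203.0.113.9"}, failing))     # edge API down
```

Skipping on a LAPI error avoids emptying the edge list just because the source of truth was briefly unreachable.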