infra/.claude/reference/authentik-state.md
Viktor Barzin 8e44ccaa65 docs: dashboard access is forward-auth + token-paste (OIDC SSO blocked)
Correct the docs I'd written for the (reverted) oauth2-proxy SSO. Reality:
apiserver OIDC rejects all Authentik tokens (design §12), so the dashboard
uses forward-auth (admits kubernetes-* groups) + per-namespace SA token-paste.
Updates authentication.md, multi-tenancy.md, service-catalog, authentik-state,
and add-user skill (onboarding now documents the dashboard token). oauth2-proxy
+ k8s-dashboard OIDC app noted as idle. [ci skip]

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 09:19:10 +00:00

14 KiB

Authentik Current State

Snapshot of applications, groups, users, and flows. Use authentik skill for management tasks.

Applications (11)

Application Provider Type Auth Flow
Cloudflare Access OAuth2/OIDC explicit consent
Domain wide catch all Proxy (forward auth) implicit consent
Forgejo OAuth2/OIDC explicit consent
Grafana OAuth2/OIDC implicit consent
Headscale OAuth2/OIDC explicit consent
Immich OAuth2/OIDC explicit consent
Kubernetes OAuth2/OIDC (public) implicit consent
Kubernetes Dashboard OAuth2/OIDC (confidential) implicit consent
linkwarden OAuth2/OIDC explicit consent
Matrix OAuth2/OIDC implicit consent
wrongmove OAuth2/OIDC implicit consent

Kubernetes Dashboard (TF-managed in stacks/k8s-dashboard/authentik.tf): confidential client k8s-dashboard, built for seamless dashboard SSO via oauth2-proxy. Currently IDLE — the apiserver rejects all OIDC tokens (see docs/plans/2026-06-04-k8s-dashboard-sso-design.md §12), so the dashboard runs on forward-auth + token-paste instead and oauth2-proxy is unwired. Kept for a future SSO retry once apiserver OIDC is fixed.

admin-services-restriction policy (TF-managed in stacks/authentik/admin-services-restriction.tf, adopted 2026-06-04): gates the 15 admin-only hostnames to Home Server Admins, with a carve-out admitting the kubernetes-* RBAC groups to k8s.viktorbarzin.me (dashboard login page).

Groups (9)

Group Parent Superuser Purpose
Allow Login Users -- No Parent group for login-permitted users
authentik Admins -- Yes Full admin access
Headscale Users Allow Login Users No VPN access
Home Server Admins Allow Login Users No Server admin access
Wrongmove Users Allow Login Users No Real-estate app access
kubernetes-admins -- No K8s cluster-admin RBAC
kubernetes-power-users -- No K8s power-user RBAC
kubernetes-namespace-owners -- No K8s namespace-owner RBAC
Task Submitters -- No Task submission access

Users (8 real)

Username Name Type Groups
akadmin authentik Default Admin internal authentik Admins, Home Server Admins, Headscale Users
vbarzin@gmail.com Viktor Barzin internal authentik Admins, Home Server Admins, Wrongmove Users, Headscale Users
emil.barzin@gmail.com Emil Barzin internal Home Server Admins, Headscale Users
ancaelena98@gmail.com Anca Milea external Wrongmove Users, Headscale Users
vabbit81@gmail.com GHEORGHE Milea external Headscale Users, kubernetes-namespace-owners, sops-vabbit81
valentinakolevabarzina@gmail.com Valentina internal Headscale Users
anca.r.cristian10@gmail.com -- internal Wrongmove Users
kadir.tugan@gmail.com Kadir internal Wrongmove Users

Login Sources

  • Google (OAuth) -- user matching by identifier
  • GitHub (OAuth) -- user matching by email_link
  • Facebook (OAuth) -- user matching by email_link
  • All sources use invitation-enrollment as enrollment flow (new users require invitation)

Authorization Flows

  • Explicit consent (default-provider-authorization-explicit-consent): Shows consent screen
  • Implicit consent (default-provider-authorization-implicit-consent): Auto-redirects

Invitation Enrollment Flow

Slug: invitation-enrollment | PK: 7d667321-2b02-4e16-8161-148078a8dac1

New users can only sign up via invitation link. Admins generate single-use invite links.

Stages (in order)

Order Stage Type Purpose
10 invitation-validation Invitation Validates ?itoken= parameter, blocks without valid token
20 enrollment-identification Identification Shows social login (Google/GitHub/Facebook) + passkey
30 enrollment-prompt Prompt Collects name and email (pre-filled from social login)
40 enrollment-user-write User Write Creates user in Allow Login Users group
50 enrollment-login User Login Auto-login after signup (policy: invitation-group-assignment adds user to target group from invitation fixed_data.group)

Invitation Management

Script: .claude/scripts/authentik-invite.sh

# Create invitation (single-use, no expiry)
./authentik-invite.sh create "Headscale Users"

# Create invitation with expiry
./authentik-invite.sh create "Wrongmove Users" --days 7

# Add user to group after enrollment
./authentik-invite.sh assign <username> "Headscale Users"

# List pending invitations
./authentik-invite.sh list

Invited users sign up via social login (Google/GitHub/Facebook) or passkey. No username/password enrollment. The target group (e.g. "Headscale Users") is auto-assigned on enrollment via the invitation-group-assignment expression policy. The assign command is available for manual post-enrollment group changes.

Cleanup Log (2026-03-13)

Deleted Flows

  • enrollment-inviation (typo) -- previous invitation attempt
  • headscale-authentication -- not used by any provider
  • headscale-authorization -- not used by any provider
  • default-enrollment-flow -- password-based, unused
  • oauth-enrollment -- replaced by invitation-enrollment

Deleted Stages

  • enrollment-invitation, enrollment-invitation-write (from old invitation flow)
  • invitation (unbound)
  • default-enrollment-prompt-first, default-enrollment-prompt-second (from default enrollment)
  • default-enrollment-user-write, default-enrollment-email-verification, default-enrollment-user-login

Deleted Groups

  • authentik Read-only -- 0 users, unused role

Deleted Policies

  • map github username to email -- unbound
  • Map Google Attributes -- unbound

Deleted Roles

  • authentik Read-only -- no group assignment

Policy Fix (2026-04-06)

Unbound brute-force-protection Policy

The brute-force-protection ReputationPolicy (PK: ac98cb11-31d3-46ab-8883-bf51e6b09a60, check_username=True, check_ip=True, threshold=-5) was bound to 3 authentication flows, causing "Flow does not apply to current user" for all unauthenticated users (no username to evaluate → failure_result=false → flow denied).

Removed bindings from:

  • default-authentication-flow (PK: 34618cf3) — username/password login
  • webauthn (PK: 0b60c2a5) — passkey login
  • default-source-authentication (PK: via policybindingmodel 1a779f24) — Google/GitHub/Facebook OAuth

Policy still exists with 0 bindings. If brute-force protection is needed, bind to the password stage (not the flow level).

Session Duration (2026-05-01)

Pinned via Terraform in stacks/authentik/:

Knob Value Surface Effect
UserLoginStage.session_duration on default-authentication-login weeks=4 authentik_stage_user_login.default_login in authentik_provider.tf Authenticated users stay logged in 4 weeks across browser restarts. No sliding refresh — resets on each login.
ProxyProvider.access_token_validity on Provider for Domain wide catch all weeks=4 authentik_provider_proxy.catchall.access_token_validity in authentik_provider.tf Cookie Max-Age on authentik_proxy_* and expires on rows in authentik_providers_proxy_proxysession. Bumped 2026-05-10 from hours=168. Bumping requires kubectl rollout restart deploy/ak-outpost-authentik-embedded-outpost — the gorilla session store binds the value once at outpost startup; the 5-min provider refresh logs "reusing existing session store" and skips rebuild.
AUTHENTIK_SESSIONS__UNAUTHENTICATED_AGE (server + worker) hours=2 server.env + worker.env in modules/authentik/values.yaml Anonymous Django sessions (bots, healthcheckers, partial flows) are reaped within 2h instead of the 1d default.

Notes:

  • There is no Brand.session_duration; UserLoginStage is the only correct lever for authenticated session lifetime.
  • Embedded outpost session storage: PostgreSQL table authentik_providers_proxy_proxysession in authentik 2025.10+ (PR #16628), but only when IsEmbedded() returns true (i.e. Outpost.managed == "goauthentik.io/outposts/embedded"). Our outpost record had managed=null until 2026-05-10, which silently kept it on the gorilla FilesystemStore at /dev/shm (TMPDIR) and re-exposed the 2026-04-18 mismatched-session-ID class on every pod restart. Fix landed 2026-05-10: see authentik_outpost.embedded in authentik_provider.tf and post-mortem 2026-04-18-authentik-outpost-shm-full.md.
  • The proxy outpost service has a known goauthentik 2026.2.2 bug (internal/outpost/controllers/k8s/service.py:52): for embedded outposts the controller sets the Service selector to app.kubernetes.io/name=authentik (the server pods), not authentik-outpost-proxy. We work around it via a kubernetes_json_patches.service patch on the outpost record (replaces /spec/selector with the outpost's own labels). Without this, endpoints are empty and Traefik forward-auth fails over to the Basic Auth realm Emergency Access.
  • The standalone embedded-outpost deployment needs AUTHENTIK_POSTGRESQL__{HOST,PORT,USER,PASSWORD,NAME} env vars to reach the dbaas cluster — codified via kubernetes_json_patches.deployment envFrom the shared goauthentik Secret. The app.kubernetes.io/component=server pod label is also injected via JSON patch (matches the component:server half of the Service selector that the controller adds for embedded outposts).
  • ProxyProvider.remember_me_offset stays UI-managed via ignore_changes.
  • The Authentik provider's resource schema does not expose the Outpost.managed field. We rely on TF's "write only fields it knows about" semantic: the server-set goauthentik.io/outposts/embedded value is preserved across applies because Terraform never writes managed. Don't change the resource provider schema expectations without verifying this assumption holds.
  • The unauthenticated_age env var is injected via server.env / worker.env (not authentik.sessions.unauthenticated_age) because we set authentik.existingSecret.secretName: goauthentik, which makes the chart skip rendering its own AUTHENTIK_* Secret. The authentik.* value block is therefore inert in this stack — anything new under authentik.* must use the *.env arrays instead. The same applies to the existing authentik.cache.*, authentik.web.*, authentik.worker.* blocks (currently inert; live values come from the orphaned, helm-keep-policy goauthentik Secret created by chart 2025.10.3 before existingSecret was introduced).

Upgrade Validation Checklist

Run after any of these:

  • Authentik chart version bump in stacks/authentik/modules/authentik/main.tf (the version = "..." line on helm_release.authentik).
  • goauthentik/authentik Terraform provider version bump.
  • Outpost pod recreation (kured reboot, eviction, manual rollout restart, scheduler move).

The fragile surfaces are the kubernetes_json_patches and the Outpost.managed field — both rely on assumptions that can silently break across upgrades. The checklist exercises the same path the alerts watch, so it doubles as a smoke test for the alerts.

# 1. Service routes to the outpost pod (NOT the server pods).
#    Empty endpoints => auth-proxy fallback fires; expected: ONE pod IP, ports 9000/9300/9443.
kubectl -n authentik get endpoints ak-outpost-authentik-embedded-outpost

# 2. Service selector still excludes the server pods. Expected: includes
#    `app.kubernetes.io/name: authentik-outpost-proxy`. If it flips to
#    `name: authentik`, the goauthentik upstream bug came back or our
#    JSON patch was unset.
kubectl -n authentik get svc ak-outpost-authentik-embedded-outpost -o jsonpath='{.spec.selector}'

# 3. Outpost mode + session backend. Expected log lines on startup:
#      {"embedded":true,"event":"Outpost mode",...}
#      {"event":"using PostgreSQL session backend",...}
#    If embedded=false or `using filesystem session backend`, the postgres
#    fix is broken — likely `Outpost.managed` got cleared, or the upstream
#    schema started exposing `managed` and TF reset it.
kubectl -n authentik logs deploy/ak-outpost-authentik-embedded-outpost | grep -E '"Outpost mode"|"session backend"' | head -3

# 4. /dev/shm is essentially empty (postgres backend = no filesystem use).
#    A row count > a few dozen indicates filesystem fallback is firing.
kubectl -n authentik exec deploy/ak-outpost-authentik-embedded-outpost -- sh -c 'df -h /dev/shm; ls /dev/shm | wc -l'

# 5. Postgres session table is growing with traffic. Expected: rows with
#    `expires` ~28 days out (matches access_token_validity = weeks=4).
kubectl -n authentik exec deploy/goauthentik-server -- ak shell -c "
from django.db import connection; c = connection.cursor()
c.execute('SELECT COUNT(*), MAX(expires) FROM authentik_providers_proxy_proxysession')
print(c.fetchone())"

# 6. Edge auth flow: should be 302 → authentik. NOT 401 with WWW-Authenticate.
curl -sS -o /dev/null -D - 'https://terminal.viktorbarzin.me/' -H 'User-Agent: Mozilla/5.0' \
  | grep -iE '^HTTP|^location|x-auth-fallback|www-authenticate'

# 7. Terraform plan-to-zero on the whole authentik stack.
( cd stacks/authentik && /home/wizard/code/infra/scripts/tg plan ) | grep -E 'No changes|Plan:'

Steps 1, 3, 6 cover the failure modes the Prometheus alerts trigger on (AuthentikForwardAuthFallbackActive, AuthentikOutpostForwardAuth400Spike). Steps 4 and 5 cover the silent-regression case (filesystem fallback) where the alerts don't fire but the system loses its postgres-backed session persistence on the next pod restart.

If step 2 shows the controller restored app.kubernetes.io/name=authentik, watch goauthentik/authentik issue tracker for fixes around internal/outpost/controllers/k8s/service.py:52 — the upstream patch might let us drop our kubernetes_json_patches.service workaround.