Update authentication.md (structured multi-issuer AuthenticationConfiguration + dashboard SSO flow), multi-tenancy.md (web dashboard access), authentik-state (new k8s-dashboard app + gheorghe groups), service-catalog (dashboard auth), and the k8s-version-upgrade runbook (kubeadm wipes --authentication-config → re-apply rbac post-upgrade). Design/plan addenda record the issuer-constraint pivot from the original dual-aud approach. [ci skip] Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
14 KiB
Authentik Current State
Snapshot of applications, groups, users, and flows. Use
authentikskill for management tasks.
Applications (11)
| Application | Provider Type | Auth Flow |
|---|---|---|
| Cloudflare Access | OAuth2/OIDC | explicit consent |
| Domain wide catch all | Proxy (forward auth) | implicit consent |
| Forgejo | OAuth2/OIDC | explicit consent |
| Grafana | OAuth2/OIDC | implicit consent |
| Headscale | OAuth2/OIDC | explicit consent |
| Immich | OAuth2/OIDC | explicit consent |
| Kubernetes | OAuth2/OIDC (public) | implicit consent |
| Kubernetes Dashboard | OAuth2/OIDC (confidential) | implicit consent |
| linkwarden | OAuth2/OIDC | explicit consent |
| Matrix | OAuth2/OIDC | implicit consent |
| wrongmove | OAuth2/OIDC | implicit consent |
Kubernetes Dashboard (TF-managed in
stacks/k8s-dashboard/authentik.tf): confidential clientk8s-dashboardconsumed by oauth2-proxy in front of the web dashboard. Has a custom scope mappingk8s-dashboard audience(scopek8s-dashboard-audience) emittingaud=[kubernetes,k8s-dashboard], plus a group-access policy restricting login tokubernetes-admins/kubernetes-power-users/kubernetes-namespace-owners. The apiserver trusts this app's issuer via therbacstack structuredAuthenticationConfiguration.
Groups (9)
| Group | Parent | Superuser | Purpose |
|---|---|---|---|
| Allow Login Users | -- | No | Parent group for login-permitted users |
| authentik Admins | -- | Yes | Full admin access |
| Headscale Users | Allow Login Users | No | VPN access |
| Home Server Admins | Allow Login Users | No | Server admin access |
| Wrongmove Users | Allow Login Users | No | Real-estate app access |
| kubernetes-admins | -- | No | K8s cluster-admin RBAC |
| kubernetes-power-users | -- | No | K8s power-user RBAC |
| kubernetes-namespace-owners | -- | No | K8s namespace-owner RBAC |
| Task Submitters | -- | No | Task submission access |
Users (8 real)
| Username | Name | Type | Groups |
|---|---|---|---|
| akadmin | authentik Default Admin | internal | authentik Admins, Home Server Admins, Headscale Users |
| vbarzin@gmail.com | Viktor Barzin | internal | authentik Admins, Home Server Admins, Wrongmove Users, Headscale Users |
| emil.barzin@gmail.com | Emil Barzin | internal | Home Server Admins, Headscale Users |
| ancaelena98@gmail.com | Anca Milea | external | Wrongmove Users, Headscale Users |
| vabbit81@gmail.com | GHEORGHE Milea | external | Headscale Users, kubernetes-namespace-owners, sops-vabbit81 |
| valentinakolevabarzina@gmail.com | Valentina | internal | Headscale Users |
| anca.r.cristian10@gmail.com | -- | internal | Wrongmove Users |
| kadir.tugan@gmail.com | Kadir | internal | Wrongmove Users |
Login Sources
- Google (OAuth) -- user matching by identifier
- GitHub (OAuth) -- user matching by email_link
- Facebook (OAuth) -- user matching by email_link
- All sources use
invitation-enrollmentas enrollment flow (new users require invitation)
Authorization Flows
- Explicit consent (
default-provider-authorization-explicit-consent): Shows consent screen - Implicit consent (
default-provider-authorization-implicit-consent): Auto-redirects
Invitation Enrollment Flow
Slug: invitation-enrollment | PK: 7d667321-2b02-4e16-8161-148078a8dac1
New users can only sign up via invitation link. Admins generate single-use invite links.
Stages (in order)
| Order | Stage | Type | Purpose |
|---|---|---|---|
| 10 | invitation-validation | Invitation | Validates ?itoken= parameter, blocks without valid token |
| 20 | enrollment-identification | Identification | Shows social login (Google/GitHub/Facebook) + passkey |
| 30 | enrollment-prompt | Prompt | Collects name and email (pre-filled from social login) |
| 40 | enrollment-user-write | User Write | Creates user in Allow Login Users group |
| 50 | enrollment-login | User Login | Auto-login after signup (policy: invitation-group-assignment adds user to target group from invitation fixed_data.group) |
Invitation Management
Script: .claude/scripts/authentik-invite.sh
# Create invitation (single-use, no expiry)
./authentik-invite.sh create "Headscale Users"
# Create invitation with expiry
./authentik-invite.sh create "Wrongmove Users" --days 7
# Add user to group after enrollment
./authentik-invite.sh assign <username> "Headscale Users"
# List pending invitations
./authentik-invite.sh list
Invited users sign up via social login (Google/GitHub/Facebook) or passkey. No username/password enrollment.
The target group (e.g. "Headscale Users") is auto-assigned on enrollment via the invitation-group-assignment expression policy. The assign command is available for manual post-enrollment group changes.
Cleanup Log (2026-03-13)
Deleted Flows
enrollment-inviation(typo) -- previous invitation attemptheadscale-authentication-- not used by any providerheadscale-authorization-- not used by any providerdefault-enrollment-flow-- password-based, unusedoauth-enrollment-- replaced by invitation-enrollment
Deleted Stages
enrollment-invitation,enrollment-invitation-write(from old invitation flow)invitation(unbound)default-enrollment-prompt-first,default-enrollment-prompt-second(from default enrollment)default-enrollment-user-write,default-enrollment-email-verification,default-enrollment-user-login
Deleted Groups
authentik Read-only-- 0 users, unused role
Deleted Policies
map github username to email-- unboundMap Google Attributes-- unbound
Deleted Roles
authentik Read-only-- no group assignment
Policy Fix (2026-04-06)
Unbound brute-force-protection Policy
The brute-force-protection ReputationPolicy (PK: ac98cb11-31d3-46ab-8883-bf51e6b09a60, check_username=True, check_ip=True, threshold=-5) was bound to 3 authentication flows, causing "Flow does not apply to current user" for all unauthenticated users (no username to evaluate → failure_result=false → flow denied).
Removed bindings from:
default-authentication-flow(PK:34618cf3) — username/password loginwebauthn(PK:0b60c2a5) — passkey logindefault-source-authentication(PK: via policybindingmodel1a779f24) — Google/GitHub/Facebook OAuth
Policy still exists with 0 bindings. If brute-force protection is needed, bind to the password stage (not the flow level).
Session Duration (2026-05-01)
Pinned via Terraform in stacks/authentik/:
| Knob | Value | Surface | Effect |
|---|---|---|---|
UserLoginStage.session_duration on default-authentication-login |
weeks=4 |
authentik_stage_user_login.default_login in authentik_provider.tf |
Authenticated users stay logged in 4 weeks across browser restarts. No sliding refresh — resets on each login. |
ProxyProvider.access_token_validity on Provider for Domain wide catch all |
weeks=4 |
authentik_provider_proxy.catchall.access_token_validity in authentik_provider.tf |
Cookie Max-Age on authentik_proxy_* and expires on rows in authentik_providers_proxy_proxysession. Bumped 2026-05-10 from hours=168. Bumping requires kubectl rollout restart deploy/ak-outpost-authentik-embedded-outpost — the gorilla session store binds the value once at outpost startup; the 5-min provider refresh logs "reusing existing session store" and skips rebuild. |
AUTHENTIK_SESSIONS__UNAUTHENTICATED_AGE (server + worker) |
hours=2 |
server.env + worker.env in modules/authentik/values.yaml |
Anonymous Django sessions (bots, healthcheckers, partial flows) are reaped within 2h instead of the 1d default. |
Notes:
- There is no
Brand.session_duration;UserLoginStageis the only correct lever for authenticated session lifetime. - Embedded outpost session storage: PostgreSQL table
authentik_providers_proxy_proxysessionin authentik 2025.10+ (PR #16628), but only whenIsEmbedded()returns true (i.e.Outpost.managed == "goauthentik.io/outposts/embedded"). Our outpost record hadmanaged=nulluntil 2026-05-10, which silently kept it on the gorillaFilesystemStoreat/dev/shm(TMPDIR) and re-exposed the 2026-04-18 mismatched-session-ID class on every pod restart. Fix landed 2026-05-10: seeauthentik_outpost.embeddedinauthentik_provider.tfand post-mortem2026-04-18-authentik-outpost-shm-full.md. - The proxy outpost service has a known goauthentik 2026.2.2 bug (
internal/outpost/controllers/k8s/service.py:52): for embedded outposts the controller sets the Service selector toapp.kubernetes.io/name=authentik(the server pods), notauthentik-outpost-proxy. We work around it via akubernetes_json_patches.servicepatch on the outpost record (replaces/spec/selectorwith the outpost's own labels). Without this, endpoints are empty and Traefik forward-auth fails over to the Basic Auth realmEmergency Access. - The standalone embedded-outpost deployment needs
AUTHENTIK_POSTGRESQL__{HOST,PORT,USER,PASSWORD,NAME}env vars to reach the dbaas cluster — codified viakubernetes_json_patches.deploymentenvFrom the sharedgoauthentikSecret. Theapp.kubernetes.io/component=serverpod label is also injected via JSON patch (matches thecomponent:serverhalf of the Service selector that the controller adds for embedded outposts). ProxyProvider.remember_me_offsetstays UI-managed viaignore_changes.- The Authentik provider's resource schema does not expose the
Outpost.managedfield. We rely on TF's "write only fields it knows about" semantic: the server-setgoauthentik.io/outposts/embeddedvalue is preserved across applies because Terraform never writesmanaged. Don't change the resource provider schema expectations without verifying this assumption holds. - The
unauthenticated_ageenv var is injected viaserver.env/worker.env(notauthentik.sessions.unauthenticated_age) because we setauthentik.existingSecret.secretName: goauthentik, which makes the chart skip rendering its ownAUTHENTIK_*Secret. Theauthentik.*value block is therefore inert in this stack — anything new underauthentik.*must use the*.envarrays instead. The same applies to the existingauthentik.cache.*,authentik.web.*,authentik.worker.*blocks (currently inert; live values come from the orphaned, helm-keep-policygoauthentikSecret created by chart 2025.10.3 beforeexistingSecretwas introduced).
Upgrade Validation Checklist
Run after any of these:
- Authentik chart version bump in
stacks/authentik/modules/authentik/main.tf(theversion = "..."line onhelm_release.authentik). goauthentik/authentikTerraform provider version bump.- Outpost pod recreation (kured reboot, eviction, manual
rollout restart, scheduler move).
The fragile surfaces are the kubernetes_json_patches and the Outpost.managed field — both rely on assumptions that can silently break across upgrades. The checklist exercises the same path the alerts watch, so it doubles as a smoke test for the alerts.
# 1. Service routes to the outpost pod (NOT the server pods).
# Empty endpoints => auth-proxy fallback fires; expected: ONE pod IP, ports 9000/9300/9443.
kubectl -n authentik get endpoints ak-outpost-authentik-embedded-outpost
# 2. Service selector still excludes the server pods. Expected: includes
# `app.kubernetes.io/name: authentik-outpost-proxy`. If it flips to
# `name: authentik`, the goauthentik upstream bug came back or our
# JSON patch was unset.
kubectl -n authentik get svc ak-outpost-authentik-embedded-outpost -o jsonpath='{.spec.selector}'
# 3. Outpost mode + session backend. Expected log lines on startup:
# {"embedded":true,"event":"Outpost mode",...}
# {"event":"using PostgreSQL session backend",...}
# If embedded=false or `using filesystem session backend`, the postgres
# fix is broken — likely `Outpost.managed` got cleared, or the upstream
# schema started exposing `managed` and TF reset it.
kubectl -n authentik logs deploy/ak-outpost-authentik-embedded-outpost | grep -E '"Outpost mode"|"session backend"' | head -3
# 4. /dev/shm is essentially empty (postgres backend = no filesystem use).
# A row count > a few dozen indicates filesystem fallback is firing.
kubectl -n authentik exec deploy/ak-outpost-authentik-embedded-outpost -- sh -c 'df -h /dev/shm; ls /dev/shm | wc -l'
# 5. Postgres session table is growing with traffic. Expected: rows with
# `expires` ~28 days out (matches access_token_validity = weeks=4).
kubectl -n authentik exec deploy/goauthentik-server -- ak shell -c "
from django.db import connection; c = connection.cursor()
c.execute('SELECT COUNT(*), MAX(expires) FROM authentik_providers_proxy_proxysession')
print(c.fetchone())"
# 6. Edge auth flow: should be 302 → authentik. NOT 401 with WWW-Authenticate.
curl -sS -o /dev/null -D - 'https://terminal.viktorbarzin.me/' -H 'User-Agent: Mozilla/5.0' \
| grep -iE '^HTTP|^location|x-auth-fallback|www-authenticate'
# 7. Terraform plan-to-zero on the whole authentik stack.
( cd stacks/authentik && /home/wizard/code/infra/scripts/tg plan ) | grep -E 'No changes|Plan:'
Steps 1, 3, 6 cover the failure modes the Prometheus alerts trigger on (AuthentikForwardAuthFallbackActive, AuthentikOutpostForwardAuth400Spike). Steps 4 and 5 cover the silent-regression case (filesystem fallback) where the alerts don't fire but the system loses its postgres-backed session persistence on the next pod restart.
If step 2 shows the controller restored app.kubernetes.io/name=authentik, watch goauthentik/authentik issue tracker for fixes around internal/outpost/controllers/k8s/service.py:52 — the upstream patch might let us drop our kubernetes_json_patches.service workaround.