docs/authentik: document postgres session backend + close out 2026-04-18 post-mortem items
Update `.claude/reference/authentik-state.md`:
- Add `ProxyProvider.access_token_validity = "weeks=4"` to the Session
Duration table with the gotcha that the gorilla session store binds
the value once at outpost startup (rollout restart needed).
- Replace the "session storage moved to Postgres in 2025.10" note that
falsely implied the migration was automatic — explain that the
`Outpost.managed` field gates the postgres path and our outpost
silently stayed on `FilesystemStore` until 2026-05-10.
- Document the goauthentik 2026.2.2 service-selector bug
(service.py:52) and the JSON-patch workaround.
- Document that the standalone embedded-outpost deployment needs
`AUTHENTIK_POSTGRESQL__*` env vars injected via JSON patch, plus the
`app.kubernetes.io/component=server` pod label.
- Note the "Terraform doesn't expose `Outpost.managed`" assumption
that holds the `managed=embedded` value in place across applies.
Close out post-mortem `2026-04-18-authentik-outpost-shm-full.md`:
- P2 codify-in-Terraform: DONE.
- P3 access_token_validity reduce: DONE-alt (we did the opposite —
bumped to 4 weeks — because postgres backend mooted the storage
concern).
- P3 move-off-embedded-outpost: DONE-alt (postgres backend addresses
the loss-of-state class on the embedded outpost itself).
parent 30cdd05bd8
commit 117b99e28f
2 changed files with 9 additions and 5 deletions
@ -127,10 +127,14 @@ Pinned via Terraform in `stacks/authentik/`:
| Knob | Value | Surface | Effect |
|------|-------|---------|--------|
| `UserLoginStage.session_duration` on `default-authentication-login` | `weeks=4` | `authentik_stage_user_login.default_login` in `authentik_provider.tf` | Authenticated users stay logged in 4 weeks across browser restarts. No sliding refresh — resets on each login. |
| `ProxyProvider.access_token_validity` on `Provider for Domain wide catch all` | `weeks=4` | `authentik_provider_proxy.catchall.access_token_validity` in `authentik_provider.tf` | Cookie `Max-Age` on `authentik_proxy_*` and `expires` on rows in `authentik_providers_proxy_proxysession`. Bumped 2026-05-10 from `hours=168`. **Bumping requires `kubectl rollout restart deploy/ak-outpost-authentik-embedded-outpost`** — the gorilla session store binds the value once at outpost startup; the 5-min provider refresh logs `"reusing existing session store"` and skips rebuild. |
| `AUTHENTIK_SESSIONS__UNAUTHENTICATED_AGE` (server + worker) | `hours=2` | `server.env` + `worker.env` in `modules/authentik/values.yaml` | Anonymous Django sessions (bots, healthcheckers, partial flows) are reaped within 2h instead of the 1d default. |
Notes:
- There is **no** `Brand.session_duration`; `UserLoginStage` is the only correct lever for authenticated session lifetime.
- Embedded outpost session storage moved from `/dev/shm` → Postgres table `authentik_providers_proxy_proxysession` in authentik 2025.10. The 2026-04-18 `/dev/shm`-fill outage class is no longer load-bearing in 2026.2.2; the `unauthenticated_age` cap is still the right lever for anonymous-session bloat from external monitors.
- `ProxyProvider.access_token_validity` and `remember_me_offset` stay UI-managed via `ignore_changes`.
- Embedded outpost session storage: PostgreSQL table `authentik_providers_proxy_proxysession` in authentik 2025.10+ (PR #16628), but **only when `IsEmbedded()` returns true** (i.e. `Outpost.managed == "goauthentik.io/outposts/embedded"`). Our outpost record had `managed=null` until 2026-05-10, which silently kept it on the gorilla `FilesystemStore` at `/dev/shm` (TMPDIR) and re-exposed the 2026-04-18 mismatched-session-ID class on every pod restart. Fix landed 2026-05-10: see `authentik_outpost.embedded` in `authentik_provider.tf` and post-mortem `2026-04-18-authentik-outpost-shm-full.md`.
- The proxy outpost service has a known goauthentik 2026.2.2 bug (`internal/outpost/controllers/k8s/service.py:52`): for embedded outposts the controller sets the Service selector to `app.kubernetes.io/name=authentik` (the server pods), not `authentik-outpost-proxy`. We work around it via a `kubernetes_json_patches.service` patch on the outpost record (replaces `/spec/selector` with the outpost's own labels). Without this, endpoints are empty and Traefik forward-auth falls back to the Basic Auth realm `Emergency Access`.
- The standalone embedded-outpost deployment needs `AUTHENTIK_POSTGRESQL__{HOST,PORT,USER,PASSWORD,NAME}` env vars to reach the dbaas cluster. These are codified via a `kubernetes_json_patches.deployment` patch that adds `envFrom` referencing the shared `goauthentik` Secret. The `app.kubernetes.io/component=server` pod label is also injected via JSON patch (it matches the `component:server` half of the Service selector that the controller adds for embedded outposts).
- `ProxyProvider.remember_me_offset` stays UI-managed via `ignore_changes`.
- The authentik Terraform provider's resource schema does **not** expose the `Outpost.managed` field. We rely on TF's "write only fields it knows about" semantics: the server-set `goauthentik.io/outposts/embedded` value is preserved across applies because Terraform never writes `managed`. Verify that this assumption still holds before upgrading the provider or changing the resource definition.
- The `unauthenticated_age` env var is injected via `server.env` / `worker.env` (not `authentik.sessions.unauthenticated_age`) because we set `authentik.existingSecret.secretName: goauthentik`, which makes the chart skip rendering its own `AUTHENTIK_*` Secret. The `authentik.*` value block is therefore inert in this stack — anything new under `authentik.*` must use the `*.env` arrays instead. The same applies to the existing `authentik.cache.*`, `authentik.web.*`, `authentik.worker.*` blocks (currently inert; live values come from the orphaned, helm-keep-policy `goauthentik` Secret created by chart 2025.10.3 before `existingSecret` was introduced).
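
To illustrate the Service-selector workaround described in the notes above, here is a minimal Python sketch that applies an RFC 6902 `replace` operation to a Service-shaped dict. The helper `apply_replace` and the concrete label values are assumptions made for illustration only. The real patch lives in the outpost's `kubernetes_json_patches.service` config and is applied by the authentik controller, not by code like this.

```python
import copy


def apply_replace(doc, path, value):
    """Apply one RFC 6902 'replace' op to a nested dict and return a patched copy."""
    # Decode JSON Pointer tokens (RFC 6901): "~1" -> "/", then "~0" -> "~".
    tokens = [t.replace("~1", "/").replace("~0", "~") for t in path.split("/")[1:]]
    patched = copy.deepcopy(doc)
    parent = patched
    for token in tokens[:-1]:
        parent = parent[token]
    # 'replace' requires the target to exist already.
    if tokens[-1] not in parent:
        raise KeyError(f"replace target does not exist: {path}")
    parent[tokens[-1]] = value
    return patched


# Selector as the note says the controller renders it for an embedded outpost:
# it points at the server pods.
service = {"spec": {"selector": {"app.kubernetes.io/name": "authentik"}}}

# Hypothetical outpost pod labels; the actual labels in this stack may differ.
outpost_labels = {"app.kubernetes.io/name": "authentik-outpost-proxy"}

patch = {"op": "replace", "path": "/spec/selector", "value": outpost_labels}
patched = apply_replace(service, patch["path"], patch["value"])
print(patched["spec"]["selector"])
```

The patch swaps the whole selector map rather than editing a single key, which mirrors the "replaces `/spec/selector`" wording in the note: any extra selector keys the controller adds are dropped in one operation.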
@ -117,7 +117,7 @@ Contributing distractions:
| Priority | Action | Type | Details | Status |
|----------|--------|------|---------|--------|
| P2 | Codify the catch-all Proxy provider + embedded outpost config in Terraform | Architecture | Adopt `goauthentik/authentik` Terraform provider in `infra/stacks/authentik/`. Import the existing UUID `0eecac07-97c7-443c-8925-05f2f4fe3e47` and the catch-all provider pk=5. Move `kubernetes_json_patches` into TF so the fix is reviewable in git. | TODO |
| P2 | Codify the catch-all Proxy provider + embedded outpost config in Terraform | Architecture | Adopt `goauthentik/authentik` Terraform provider in `infra/stacks/authentik/`. Import the existing UUID `0eecac07-97c7-443c-8925-05f2f4fe3e47` and the catch-all provider pk=5. Move `kubernetes_json_patches` into TF so the fix is reviewable in git. **Done 2026-05-10**: `authentik_outpost.embedded` resource + `authentik_provider_proxy.catchall.access_token_validity` codified, plan-to-zero on the whole stack. The `Outpost.managed` field is server-set (not in provider schema) and preserved across applies because TF only writes known fields. Same-day work also flipped the outpost's session backend from filesystem (`/dev/shm`) to PostgreSQL — see `.claude/reference/authentik-state.md`. | **DONE** |
| P2 | Runbook: Authentik forward-auth troubleshooting | Docs | Add a runbook at `docs/runbooks/authentik-forward-auth-400.md` with the "grep outpost logs first" first step, plus pointer commands for `/dev/shm` usage, session file count, and recent authorize events. | TODO |
### P3 — Upstream + architectural
@ -125,8 +125,8 @@ Contributing distractions:
| Priority | Action | Type | Details | Status |
|----------|--------|------|---------|--------|
| P3 | Comment/support on authentik issue [#20093](https://github.com/goauthentik/authentik/issues/20093) | Upstream | Request either a persistent-backed session store (Redis/DB) OR a configurable GC interval shorter than the default 5 min. | TODO |
| P3 | Consider shortening `access_token_validity` from 168h (7 days) to 24h | Config | Reduces steady-state session file count from ~181k to ~26k (7× reduction). Trade-off: users re-auth daily. Viktor's call on UX tolerance. | TODO |
| P3 | Evaluate moving forward-auth away from the embedded outpost | Architecture | The embedded outpost is a single replica Go binary with in-memory session state. An external, multi-replica outpost with Redis-backed sessions is the production-grade deployment. Probably overkill for a home-lab, but worth noting. | TODO (paused) |
| P3 | Consider shortening `access_token_validity` from 168h (7 days) to 24h | Config | Original idea: shrink steady-state session file count (~7× reduction) at the cost of daily re-auth. **Resolved differently 2026-05-10**: switched the outpost to the PostgreSQL session backend (`Outpost.managed = goauthentik.io/outposts/embedded` + `AUTHENTIK_POSTGRESQL__*` envFrom), which makes session count irrelevant for tmpfs sizing and lets us BUMP `access_token_validity` to `weeks=4` for better UX without cost. | **DONE (alt)** |
| P3 | Evaluate moving forward-auth away from the embedded outpost | Architecture | Original framing: external, multi-replica outpost with Redis-backed sessions. **Resolved 2026-05-10** by enabling the postgres-backed session store on the embedded outpost itself (PR goauthentik/authentik#16628). Sessions now persist across pod restarts; the original "in-memory state" concern is moot. Multi-replica still requires a goauthentik upstream fix (PgBouncer-friendly session migration), but the loss-of-state class of failures is gone. | **DONE (alt)** |
## Lessons Learned