6d224861 came from a --no-checkout worktree whose empty index made the
commit drop every file except two. This restores 05b50d2b's full tree and
correctly adds stacks/stem95su/gdrive-sync.tf + the service-catalog stem95su
entry. Forward-only (parent=6d224861, no force-push); [ci skip] since the
live infra was never applied from the broken commit.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Matrix: Synapse → tuwunel migration — Plan (executed)
Date: 2026-06-08 · Companion: 2026-06-08-matrix-synapse-to-tuwunel-design.md
Executed steps
- Vault: generated a 32-byte registration_token, stored at secret/matrix.
- stacks/matrix rewrite: replaced Synapse with tuwunel. Removed the matrix-db-creds ExternalSecret, both init-containers (install-psycopg2, inject-db-password), the extra-packages volume, and the Reloader annotation. Added the matrix-secrets ExternalSecret (vault-kv dataFrom), the TUWUNEL_* env, securityContext 1000, and the tuwunel image. The encrypted PVC, the Service (80→8008), and the ingress (auth="none", proxied) are unchanged.
  - The image is in the deployment's ignore_changes (KEEL_IGNORE_IMAGE). It was temporarily un-ignored for this base-image swap, then re-added at the lock-down step so Keel resumes tag management.
  - tg init -reconfigure was required first (Tier-1 PG-backend creds rotate weekly → "Backend configuration block has changed").
- Apply: Plan: 1 to add, 2 to change, 1 to destroy. tuwunel 1.7.1 came up 1/1 and created a fresh RocksDB on the encrypted PVC (no permission errors, so fsGroup worked).
- Verify: all returned 200: /_tuwunel/server_version, /.well-known/matrix/{client,server}, /_matrix/client/versions, /_matrix/federation/v1/version. Registered @viktor:matrix.viktorbarzin.me (first user → admin) via the token flow; whoami confirmed. Creds stored at secret/matrix (admin_user, admin_password).
- Lock down: set TUWUNEL_ALLOW_REGISTRATION=false and re-added the image ignore_changes; applied. Registration now returns 403 M_FORBIDDEN.
- Cleanup:
  - stacks/vault: removed the pg_matrix static role and its allowed_roles entry (targeted apply; the full plan also wanted an unrelated OIDC tune-TTL change, which was deliberately NOT applied; see residual items).
  - Dropped the orphaned matrix Postgres DB (16 MB) and the matrix role on the CNPG primary (pg-cluster-2).
- Docs updated: .claude/CLAUDE.md (PG-rotation list), service-catalog.md, upgrade-config.json (removed the synapse image-rename and the matrix PG entry), authentication.md and authentik-state.md (Matrix OIDC → orphaned).
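The Verify step can be rerun as a scripted smoke check. This is a minimal sketch. It assumes the public base URL is https://matrix.viktorbarzin.me, which is inferred from the user ID and not stated as the ingress host. The check_endpoints and urllib_fetch helpers are hypothetical. The fetcher is passed in as a parameter so the check can also run offline against a stub.

```python
# Hypothetical smoke check for the Verify step; not part of the stack.
import urllib.error
import urllib.request
from typing import Callable, Dict, List, Tuple

# Assumption: the ingress host matches the server name in @viktor:matrix.viktorbarzin.me.
BASE = "https://matrix.viktorbarzin.me"

# Paths from the Verify step, each expected to answer 200.
EXPECTED: Dict[str, int] = {
    "/_tuwunel/server_version": 200,
    "/.well-known/matrix/client": 200,
    "/.well-known/matrix/server": 200,
    "/_matrix/client/versions": 200,
    "/_matrix/federation/v1/version": 200,
}


def urllib_fetch(url: str) -> int:
    """GET url and return the HTTP status code, including error statuses."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def check_endpoints(base: str, fetch: Callable[[str], int]) -> List[Tuple[str, int, int]]:
    """Return (path, expected, actual) for every path whose status differs."""
    failures = []
    for path, expected in EXPECTED.items():
        actual = fetch(base.rstrip("/") + path)
        if actual != expected:
            failures.append((path, expected, actual))
    return failures
```

Usage: check_endpoints(BASE, urllib_fetch) returns an empty list when every path answers with its expected status.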
Rollback
A fresh start was confirmed, so there is no Synapse data to preserve. To revert the
service: restore the Synapse main.tf from git, re-add the pg_matrix Vault
role, and restore the matrix Postgres DB from the daily per-db dump
(/backup/per-db/matrix/). The reused encrypted PVC still holds Synapse's old
homeserver.yaml / signing key / media at the volume root alongside the new
RocksDB dir.
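The DB part of the rollback starts with locating the latest dump. This is a minimal sketch. It assumes each daily dump is a regular file directly under /backup/per-db/matrix/, and it treats "newest" as the latest modification time. newest_dump is a hypothetical helper. The dump format is not recorded here, so the actual restore command is left out.

```python
# Hypothetical rollback helper: find the most recent per-db dump for matrix.
from pathlib import Path
from typing import Optional

# Directory from the Rollback section; the file layout inside it is an assumption.
DUMP_DIR = Path("/backup/per-db/matrix")


def newest_dump(dump_dir: Path = DUMP_DIR) -> Optional[Path]:
    """Return the most recently modified regular file, or None if there is none."""
    if not dump_dir.is_dir():
        return None
    files = [p for p in dump_dir.iterdir() if p.is_file()]
    if not files:
        return None
    return max(files, key=lambda p: p.stat().st_mtime)
```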
Residual / follow-up items (flagged to user)
- Authentik Matrix OAuth2 app: REMOVED 2026-06-08 (user-confirmed). It was
UI-managed (NOT in the authentik TF stack), so it was deleted via the Authentik
API: application matrix + OAuth2 provider pk=6. tuwunel uses native password auth, so nothing consumed it.
- Pre-existing drift in stacks/vault: vault_jwt_auth_backend.oidc shows a tune diff (the explicit 768h default/max lease TTLs are being dropped). This predates the migration and was not applied; resolve separately.
- Synapse leftover files remain at the encrypted PVC's volume root (unused by
tuwunel). They can be rm'd once there is confidence in the new server.
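Before the rm, a dry run can confirm exactly what would be removed. This sketch only lists candidates and never deletes anything. The patterns are assumptions based on the Synapse Docker image's default layout (homeserver.yaml, &lt;server_name&gt;.signing.key, &lt;server_name&gt;.log.config, media_store/). They are not an inventory of this PVC, and synapse_leftovers is a hypothetical helper.

```python
# Hypothetical dry run: list Synapse-era entries at the volume root; deletes nothing.
from fnmatch import fnmatch
from pathlib import Path
from typing import List

# Assumed Synapse Docker defaults; adjust after inspecting the actual volume.
SYNAPSE_PATTERNS = ["homeserver.yaml", "*.signing.key", "*.log.config", "media_store"]


def synapse_leftovers(volume_root: Path) -> List[Path]:
    """Return top-level entries matching a Synapse pattern, sorted by path."""
    return sorted(
        p for p in volume_root.iterdir()
        if any(fnmatch(p.name, pat) for pat in SYNAPSE_PATTERNS)
    )
```

Anything not matched, such as the RocksDB directory, is left off the list, so tuwunel's data is never a candidate.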
Follow-up: open registration + bot mitigations (2026-06-08, user-chosen)
Registration was opened fully (tokenless): TUWUNEL_ALLOW_REGISTRATION=true plus
TUWUNEL_YES_I_AM_VERY_VERY_SURE_I_WANT_AN_OPEN_REGISTRATION_SERVER_PRONE_TO_ABUSE=true, and the TUWUNEL_REGISTRATION_TOKEN env was dropped. The Vault secret/matrix token and the matrix-secrets ESO are kept, allowing a one-env-change revert to token-gated. tuwunel has no CAPTCHA (only Synapse does), and a browser challenge would break native clients, so bot defense is layered instead:
- Traefik rate-limit on /register: a register-ratelimit Middleware (stacks/matrix) on a path-scoped ingress_register carve-out (the longer prefix wins over the catch-all). The limit is keyed on the request Host (a global /register cap), not the source IP. This is because the host is reachable both via Cloudflare over IPv4 (CF-Connecting-IP) and directly over IPv6 (HE tunnel → pfSense HAProxy → Traefik, no CF header). A per-source key let IPv6 bots bypass the limit entirely (found during testing). 10/min, burst 20, per Traefik replica (×3).
- CrowdSec (already on the ingress chain) is the hard backstop. It bans abusive IPs on both paths and covers the per-replica looseness of the soft rate-limit.
- Notification: Loki ruler rule MatrixNewUserRegistered (stacks/monitoring; matches ... registered on this server, never the rejection line) → lane=security → the existing #security Slack receiver. tuwunel's admin bot (@conduit:matrix.viktorbarzin.me) also natively posts every registration to the server admin room, so there is an in-Matrix notice too.
- Verification: open signup returns 200 (@regtest1, since deactivated via !admin users deactivate in the admin room), and Traefik access logs confirm that /register routes through the rate-limited carve-out router. A live 429 was not force-tested: the per-replica burst allows about 60 requests across 3 replicas, and hammering was avoided so as not to trip CrowdSec on the test source IP.
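A token-bucket model can illustrate the per-replica arithmetic and the choice of key. This is a simplified sketch, not Traefik's implementation. It assumes each of the 3 replicas keeps an independent bucket per key (capacity 20, refill 10/min), and that a burst arrives at one instant and is spread round-robin across replicas. With a Host key, all clients share one budget of about 60 requests. With a per-source key, each distinct source (or each address a client cycles through) gets its own budget. Bucket, Replica and burst_accepted are hypothetical names.

```python
# Simplified token-bucket model of the /register limit; not Traefik's code.
from dataclasses import dataclass, field
from typing import Callable, Dict

RATE_PER_SEC = 10 / 60  # average: 10 requests per minute
BURST = 20              # bucket capacity
REPLICAS = 3            # Traefik replicas, each holding its own buckets


@dataclass
class Bucket:
    tokens: float = BURST
    last: float = 0.0

    def allow(self, now: float) -> bool:
        # Refill for the elapsed time, capped at the burst size, then spend one token.
        self.tokens = min(BURST, self.tokens + (now - self.last) * RATE_PER_SEC)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


@dataclass
class Replica:
    buckets: Dict[str, Bucket] = field(default_factory=dict)

    def allow(self, key: str, now: float) -> bool:
        # Each replica creates a fresh, full bucket the first time it sees a key.
        return self.buckets.setdefault(key, Bucket(last=now)).allow(now)


def burst_accepted(key_for_request: Callable[[int], str], n_requests: int) -> int:
    """Send n requests at t=0, round-robin across replicas; count the accepted ones."""
    replicas = [Replica() for _ in range(REPLICAS)]
    accepted = 0
    for i in range(n_requests):
        if replicas[i % REPLICAS].allow(key_for_request(i), now=0.0):
            accepted += 1
    return accepted
```

With a single Host key, 100 simultaneous requests yield 60 accepted (20 per replica). With 100 distinct source keys, all 100 pass, which is why a per-source key gives little protection against a client that can vary its apparent source.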
Add a user: anyone can self-register now. To provision manually instead, run
!admin users create-user <name> in the admin room (the first user, @viktor, is admin).
Revert to token-gated: drop the YES_I_AM... flag, re-add TUWUNEL_REGISTRATION_TOKEN.
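The three registration states this plan moved through (lock-down, token-gated, open) can be written out as env maps, which also serves as a checklist for the revert. The helper is hypothetical. The variable names come from this plan, but the function and the token encoding are assumptions. The real values live in the stacks/matrix deployment and the matrix-secrets ExternalSecret.

```python
# Hypothetical checklist helper: TUWUNEL_* env for each registration mode.
import secrets
from typing import Dict, Optional

OPEN_FLAG = "TUWUNEL_YES_I_AM_VERY_VERY_SURE_I_WANT_AN_OPEN_REGISTRATION_SERVER_PRONE_TO_ABUSE"


def registration_env(mode: str, token: Optional[str] = None) -> Dict[str, str]:
    """Return the registration-related env for 'closed', 'token' or 'open'."""
    if mode == "closed":  # lock-down step: registration returns 403 M_FORBIDDEN
        return {"TUWUNEL_ALLOW_REGISTRATION": "false"}
    if mode == "token":  # token-gated: no open flag, token required
        if not token:
            raise ValueError("token-gated mode needs a registration token")
        return {"TUWUNEL_ALLOW_REGISTRATION": "true", "TUWUNEL_REGISTRATION_TOKEN": token}
    if mode == "open":  # current tokenless setup
        return {"TUWUNEL_ALLOW_REGISTRATION": "true", OPEN_FLAG: "true"}
    raise ValueError(f"unknown mode: {mode!r}")


def new_registration_token() -> str:
    # 32 random bytes, URL-safe base64 without padding (43 chars).
    # The plan only says "32-byte"; this encoding is an assumption.
    return secrets.token_urlsafe(32)
```

Going from "open" to "token" removes OPEN_FLAG and adds TUWUNEL_REGISTRATION_TOKEN, which matches the revert line above.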