6d224861 came from a --no-checkout worktree whose empty index made the
commit drop every file except two. This restores 05b50d2b's full tree and
correctly adds stacks/stem95su/gdrive-sync.tf + the service-catalog stem95su
entry. Forward-only (parent=6d224861, no force-push); [ci skip] since the
live infra was never applied from the broken commit.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
# Matrix: Synapse → tuwunel migration — Plan (executed)

**Date:** 2026-06-08 · **Companion:** `2026-06-08-matrix-synapse-to-tuwunel-design.md`

## Executed steps

1. **Vault**: generated a 32-byte `registration_token`, stored at
   `secret/matrix`.
2. **`stacks/matrix` rewrite**: replaced Synapse with tuwunel. Removed the
   `matrix-db-creds` ExternalSecret, both init-containers (`install-psycopg2`,
   `inject-db-password`), the `extra-packages` volume, and the Reloader
   annotation; added the `matrix-secrets` ExternalSecret (vault-kv `dataFrom`),
   the `TUWUNEL_*` env, `securityContext` 1000, and the tuwunel image. Encrypted
   PVC, Service (`80→8008`), and ingress (`auth="none"`, proxied) unchanged.
   - The image is in the deployment's `ignore_changes` (KEEL_IGNORE_IMAGE); it
     was **temporarily un-ignored** for this base-image swap, then re-added at
     step 5 so Keel resumes tag management.
   - `tg init -reconfigure` was required first (Tier-1 PG-backend creds rotate
     weekly → "Backend configuration block has changed").
3. **Apply**: `Plan: 1 to add, 2 to change, 1 to destroy`. tuwunel 1.7.1 came up
   1/1 and created a fresh RocksDB on the encrypted PVC (no permission errors,
   so fsGroup worked).
4. **Verify**: these all return `200`: `/_tuwunel/server_version`,
   `.well-known/matrix/{client,server}`, `/_matrix/client/versions`,
   `/_matrix/federation/v1/version`.
   Registered `@viktor:matrix.viktorbarzin.me` (first user → admin) via the token
   flow; `whoami` confirmed. Creds stored at `secret/matrix`
   (`admin_user`, `admin_password`).
5. **Lock down**: set `TUWUNEL_ALLOW_REGISTRATION=false` and re-added the image
   `ignore_changes`; applied. Registration now returns `403 M_FORBIDDEN`.
6. **Cleanup**:
   - `stacks/vault`: removed the `pg_matrix` static role + its `allowed_roles`
     entry (a targeted apply: the full plan also wanted an **unrelated** OIDC
     `tune`-TTL change, which was deliberately NOT applied; see residual items).
   - Dropped the orphaned `matrix` Postgres DB (16 MB) + `matrix` role on the
     CNPG primary (`pg-cluster-2`).
   - Docs updated: `.claude/CLAUDE.md` (PG-rotation list), `service-catalog.md`,
     `upgrade-config.json` (removed synapse image-rename + matrix PG entry),
     `authentication.md` + `authentik-state.md` (Matrix OIDC → orphaned).
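
The checks from steps 4 and 5 could be scripted roughly as follows. This is a minimal sketch, not the procedure that was actually run: `BASE_URL` assumes the server answers on the same name used in the user ID, `REGISTER_PATH` assumes the standard Matrix client-server registration endpoint, and every function name is hypothetical.

```python
import json
import urllib.error
import urllib.request
from typing import List, Optional, Tuple

# Assumption: the homeserver is reachable on the name used in the user ID.
BASE_URL = "https://matrix.viktorbarzin.me"

# Paths listed in step 4; each is expected to return HTTP 200.
HEALTH_PATHS = [
    "/_tuwunel/server_version",
    "/.well-known/matrix/client",
    "/.well-known/matrix/server",
    "/_matrix/client/versions",
    "/_matrix/federation/v1/version",
]

# Assumption: the standard client-server registration endpoint.
REGISTER_PATH = "/_matrix/client/v3/register"


def health_urls(base_url: str) -> List[str]:
    """Expand the step 4 paths into absolute URLs."""
    return [base_url.rstrip("/") + path for path in HEALTH_PATHS]


def registration_locked(status: int, body: bytes) -> bool:
    """True when a response matches the step 5 result: 403 with M_FORBIDDEN."""
    if status != 403:
        return False
    try:
        payload = json.loads(body)
    except ValueError:  # not JSON (or not decodable)
        return False
    return isinstance(payload, dict) and payload.get("errcode") == "M_FORBIDDEN"


def fetch(url: str, data: Optional[bytes] = None) -> Tuple[int, bytes]:
    """Return (status, body); 4xx/5xx responses are returned, not raised."""
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        return err.code, err.read()


def main() -> None:
    """Print the status of each health URL, then probe registration."""
    for url in health_urls(BASE_URL):
        status, _ = fetch(url)
        print(f"{status} {url}")
    status, body = fetch(BASE_URL + REGISTER_PATH, data=b"{}")
    print("registration locked:", registration_locked(status, body))
```

Calling `main()` performs live requests, so it is left uninvoked here; the two pure helpers can be exercised without network access.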

## Rollback

Fresh start was confirmed, so there is no Synapse data to preserve. To revert the
*service*: restore the Synapse `main.tf` from git, re-add the `pg_matrix` Vault
role, and restore the `matrix` Postgres DB from the daily per-db dump
(`/backup/per-db/matrix/`). The reused encrypted PVC still holds Synapse's old
`homeserver.yaml` / signing key / media at the volume root alongside the new
RocksDB dir.

## Residual / follow-up items (flagged to user)

- **Authentik Matrix OAuth2 app: REMOVED 2026-06-08** (user-confirmed). It was
  UI-managed (NOT in the authentik TF stack), so it was deleted via the Authentik
  API: application `matrix` + OAuth2 provider `pk=6`. tuwunel uses native password
  auth, so nothing consumed it.
- **Pre-existing drift in `stacks/vault`**: `vault_jwt_auth_backend.oidc` shows a
  `tune` diff (explicit `768h` default/max lease TTLs being dropped). This
  predates this migration and was **not** applied. Resolve separately.
- **Synapse leftover files** remain on the encrypted PVC volume root (unused by
  tuwunel). They can be `rm`'d once the new server has proven stable.

## Follow-up: open registration + bot mitigations (2026-06-08, user-chosen)

Registration was opened **fully (tokenless)**: set `TUWUNEL_ALLOW_REGISTRATION=true`
and `TUWUNEL_YES_I_AM_VERY_VERY_SURE_I_WANT_AN_OPEN_REGISTRATION_SERVER_PRONE_TO_ABUSE=true`,
and dropped the `TUWUNEL_REGISTRATION_TOKEN` env (the Vault `secret/matrix` token and the
`matrix-secrets` ESO are kept for a one-env-change revert to token-gated). tuwunel
has **no CAPTCHA** (only Synapse does) and a browser challenge would break native
clients, so bot defense is layered instead:

- **Traefik rate-limit on `/register`**: a `register-ratelimit` Middleware
  (`stacks/matrix`) on a path-scoped `ingress_register` carve-out (longer prefix
  wins over the catch-all). Keyed on the **request Host (global `/register` cap),
  not source IP**, because the host is reachable both via Cloudflare-IPv4
  (`CF-Connecting-IP`) and **IPv6-direct (HE tunnel → pfSense HAProxy → Traefik,
  no CF header)**; a per-source key let IPv6 bots bypass the limit entirely (found
  during testing). 10/min, burst 20, **per Traefik replica (×3)**.
- **CrowdSec** (already on the ingress chain) is the hard backstop: it bans abusive
  IPs on both paths and covers the per-replica looseness of the soft rate-limit.
- **Notification:** Loki ruler rule `MatrixNewUserRegistered` (`stacks/monitoring`,
  matches `... registered on this server`, never the rejection line) → `lane=security`
  → existing `#security` Slack receiver. tuwunel's admin bot
  (`@conduit:matrix.viktorbarzin.me`) also **natively posts every registration to the
  server admin room**, so there's an in-Matrix notice too.
- **Verification:** open signup returns 200 (`@regtest1`, since deactivated via
  `!admin users deactivate` in the admin room); Traefik access logs confirm
  `/register` routes through the rate-limited carve-out router. A live 429 was not
  force-tested (burst is 20 per replica, so ~60 across the 3 replicas; hammering was
  avoided so as not to trip CrowdSec on the test source IP).

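
The "per-replica looseness" of the soft limit can be illustrated with a toy token bucket. This is a simplified model under stated assumptions, not Traefik's implementation: the class and function names are hypothetical, requests are assumed to be spread round-robin across replicas, and each replica keeps its own bucket keyed on the request Host.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Toy token bucket: refills `per_minute` tokens a minute, capped at `burst`."""
    per_minute: float
    burst: float
    tokens: float = 0.0
    last: float = 0.0  # timestamp (seconds) of the previous request

    def __post_init__(self) -> None:
        self.tokens = self.burst  # a new bucket starts full

    def allow(self, now: float) -> bool:
        """Refill for the elapsed time, then spend one token if available."""
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.burst, self.tokens + elapsed * self.per_minute / 60.0)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class Replica:
    """One proxy replica with its own buckets, keyed by request Host."""

    def __init__(self, per_minute: float = 10, burst: float = 20) -> None:
        self.per_minute = per_minute
        self.burst = burst
        self.buckets = {}

    def allow(self, host: str, now: float) -> bool:
        if host not in self.buckets:
            self.buckets[host] = TokenBucket(self.per_minute, self.burst)
        return self.buckets[host].allow(now)


def count_allowed(replicas, requests, host="matrix.viktorbarzin.me", now=0.0):
    """Send `requests` at time `now`, round-robin, and count those admitted."""
    return sum(replicas[i % len(replicas)].allow(host, now) for i in range(requests))
```

In this model a single replica admits 20 of a sudden run of requests, while three independent replicas admit 60 between them, which is why a harder backstop is useful behind the soft limit.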

**Add a user:** anyone can self-register now. To provision manually instead:
`!admin users create-user <name>` in the admin room (first user `@viktor` is admin).

**Revert to token-gated:** drop the YES_I_AM... flag, re-add `TUWUNEL_REGISTRATION_TOKEN`.
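
The three registration states this plan passes through differ only in a few environment variables. The sketch below restates them as data; the mode labels and the `registration_env` helper are hypothetical, and it lists only the variables the plan names for each state (other env on the deployment is not modelled).

```python
# Flag name as it appears in the follow-up section above.
OPEN_FLAG = (
    "TUWUNEL_YES_I_AM_VERY_VERY_SURE_I_WANT_AN_OPEN_REGISTRATION_SERVER_PRONE_TO_ABUSE"
)


def registration_env(mode, token=None):
    """Return the registration-related env vars for one mode.

    Modes (hypothetical labels):
      "closed": the lock-down state from step 5
      "token":  token-gated registration
      "open":   tokenless registration
    """
    if mode == "closed":
        return {"TUWUNEL_ALLOW_REGISTRATION": "false"}
    if mode == "token":
        if not token:
            raise ValueError("token-gated mode needs a registration token")
        return {
            "TUWUNEL_ALLOW_REGISTRATION": "true",
            "TUWUNEL_REGISTRATION_TOKEN": token,
        }
    if mode == "open":
        return {"TUWUNEL_ALLOW_REGISTRATION": "true", OPEN_FLAG: "true"}
    raise ValueError(f"unknown mode: {mode!r}")
```

Comparing `registration_env("open")` with `registration_env("token", "<token>")` shows the revert described above: the open-registration flag goes away and the token variable comes back, while `TUWUNEL_ALLOW_REGISTRATION` stays `true`.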