immich(frame-emo): show photos from the last 365 days (was 730)

Emil asked his Sofia Portal Mini photo-frame to show only the past year of photos rolling from today, instead of the last two years. Changes ImagesFromDays 730 -> 365 in the frame-emo Settings.yml. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-07-02 07:49:12 +00:00
78 changed files with 2815 additions and 9164 deletions
--- a/.claude/CLAUDE.md
+++ b/.claude/CLAUDE.md
--- a/.claude/reference/service-catalog.md
+++ b/.claude/reference/service-catalog.md
@ -81,7 +81,7 @@
 | ytdlp | YouTube downloader | ytdlp |
 | wealthfolio | Finance tracking | wealthfolio |
 | audiobookshelf | Audiobook server (may be merged into ebooks stack) | audiobookshelf |
-| paperless-ngx | Document management. Mail ingest: forward document emails to `docs@viktorbarzin.me` — sender maps 1:1 to a paperless account (runbook `paperless-mail-ingest.md`) | paperless-ngx |
+| paperless-ngx | Document management | paperless-ngx |
 | jsoncrack | JSON visualizer | jsoncrack |
 | servarr | Media automation (Sonarr/Radarr/etc) | servarr |
 | aiostreams | Stremio stream aggregator (Real-Debrid + Torrentio/Comet/StremThru Torz/Knaben; **MediaFusion removed 2026-06-07** — broken upstream `500`). `auth=app` (own UUID+password); stream-probe tests **both series+movie paths** with per-source breakdown (`aiostreams_streams_{comet,torrentio,stremthru_torz,knaben}`) + `aiostreams_error_streams` + `aiostreams_movie_stream_count`, success gated on Comet (workhorse) being alive; weekly NFS config + Stremio-account-collection backups to `/srv/nfs/aiostreams-backup/`. PG-backed user config (Comet timeout bumped 5s→10s 2026-06-07). | servarr/aiostreams |
@ -99,7 +99,6 @@
 | tor-proxy | Tor proxy | tor-proxy |
 | forgejo | Git forge. Open native self-signup (Turnstile captcha + email confirm) + Authentik & GitHub OAuth sign-in; see `docs/runbooks/forgejo-open-signups.md` | forgejo |
 | freshrss | RSS reader | freshrss |
-| drone-logbook | DJI flight-log analyzer (Open DroneLog, upstream image) — dronelog.viktorbarzin.me | drone-logbook |
 | navidrome | Music streaming | navidrome |
 | networking-toolbox | Network tools | networking-toolbox |
 | stirling-pdf | PDF tools | stirling-pdf |
@ -121,9 +120,7 @@
 | status-page | Status page | status-page |
 | plotting-book | Book plotting/world-building app | plotting-book |
 | tripit | Self-hosted TripIt-clone travel-itinerary PWA (FastAPI + SvelteKit SPA, same-origin). CNPG (`tripit` db, Vault static role `pg-tripit`) + RWX NFS trip-doc vault (`/srv/nfs/tripit-documents`) + RWO `proxmox-lvm-encrypted` personal-document vault `tripit-personal-documents` (passports/IDs — AES-256-GCM app-layer envelope, master key `DOCUMENT_ENCRYPTION_KEY` in `secret/tripit`). `auth=required` (Authentik forward-auth, reads `X-authentik-email`); second `auth=none` ingress on `/api/calendar` for HMAC-token-gated `.ics` feed. Email-ingest CronJob `tripit-ingest-plans` (`*/15`) is the SOLE inbound path — forward a booking to plans@viktorbarzin.me (catch-all → spam@), polled read-only and routed ONLY to a registered user / verified linked address (no default-owner fallback; strangers ignored), parsed by local LLM (`qwen3vl-4b`), and the sender is emailed the outcome (Added to trip / Couldn't import). Plus `tripit-poll-flights`, `tripit-run-reminders`, `tripit-transport-nudge`, `tripit-weather-brief`. (The old Gmail-scrape `tripit-ingest-mail` CronJob was removed 2026-06-05.) App secrets in Vault `secret/tripit`. | tripit |
-| tasks | Reminders-style tasks PWA over Nextcloud CalDAV (FastAPI + SvelteKit SPA same-origin, single container; code `~/code/tasks`, design `tasks/docs/2026-07-03-tasks-pwa-design.md`). Nextcloud stays the source of truth (VTODOs); the app is the front-end Apple Reminders stopped being. CNPG (`tasks` db, Vault static role `pg-tasks`) stores Connected Accounts — per-user Nextcloud app passwords Fernet-encrypted with `fernet_key` from `secret/tasks`. `auth=required` (Authentik forward-auth; identity = `X-authentik-username`, NO app-level login — `DEV_USER` must never be set in prod) at tasks.viktorbarzin.me (proxied). Exception: the five PWA icon/manifest files (`/apple-touch-icon.png`, `/favicon.png`, `/pwa-192x192.png`, `/pwa-512x512.png`, `/manifest.webmanifest`) are a path-scoped `auth=none` carve-out (`module.ingress_icons`) so cookie-less OS icon fetchers (macOS Safari Add-to-Dock, mobile home-screen installs) get the real icon instead of the Authentik 302; guarded by the `tasks-icons` walloff-probe target. NetworkPolicy `tasks-ingress` (SEC-1) restricts pod ingress to traefik + monitoring namespaces so the trusted header can't be spoofed pod-to-pod. GHA → public ghcr `tasks` → Woodpecker deploy (ADR-0002). | tasks |
-| stem95su | STEM educational platform for **95. СУ „Проф. Иван Шишманов"** (Sofia school) at stem95su.viktorbarzin.me — **a Valia site on Cloudflare Pages since 2026-07-03** (ADR-0018): registry entry in `stacks/valia-sites`, synced from Drive folder "claude" every 10 min, deploy-on-change. The old in-cluster stack (nginx off PVE NFS + per-site rclone CronJob) is RETIRED — stacks/stem95su is a tombstone; `secret/stem95su` superseded by `secret/valia-sites`; `stem_video.mp4` was compressed 42.9→21.4MB (25MB Pages cap) with Viktor's OK. See docs/runbooks/valia-sites.md. | — |
-| valia-sites | **Valia-site registry + sync** (ADR-0018): all sites authored by Valia serve OFF-INFRA on Cloudflare Pages (`bridge` + `stem95su` live). One map entry in `stacks/valia-sites/main.tf` per site fans out Pages project + custom domain + public CNAME + internal split-horizon CNAME (ConfigMap `valia-sites-dns` → technitium sync, declarative incl. removal). CronJob `valia-sites-sync` (`*/10`, image ghcr `valia-sites-sync`) mirrors each Drive Content folder (rclone `drive.readonly`, stem95su-style guards + 25MB Pages-cap guard) and wrangler-deploys ONLY on manifest change (free-tier deploy cap). Secrets `secret/valia-sites` (shared rclone conf + SCOPED CF Pages token — Global API Key never in pods). Failed-Job-only visibility by choice. Runbook: docs/runbooks/valia-sites.md. | valia-sites |
+| stem95su | STEM educational platform for **95. СУ „Проф. Иван Шишманов"** (Sofia school) at stem95su.viktorbarzin.me. Public **open** static site (`auth=none` — CrowdSec + ai-bot-block, no login). Stock `nginx:1.28-alpine` serving content **straight off PVE host NFS** `/srv/nfs/stem-site` (RWX `nfs_volume`, mounted read-only) — **NOT** image-baked, so the externally-authored (Gemini-exported) HTML/media updates with no rebuild; auto-backed-up offsite by `nfs-mirror`. **Content source = Google Drive folder "claude"** (id `1cmOI2jRyBJdnrVPgbr4kx2cx_4DY6pm_`, shared Valentina→vbarzin@gmail.com). **Deploy = scheduled mirror** (since 2026-06-09, reversed the earlier on-demand-only call once content went active): CronJob `stem95su-gdrive-sync` (`*/10`, `stacks/stem95su/gdrive-sync.tf`) mounts the content PVC RW and `rclone sync`s the Drive folder onto it (`docker.io/rclone/rclone:1.74.3`, `scope=drive.readonly` — Drive is READ-ONLY; empty-source guard + `--max-delete 25` so a partial listing can't wipe the site). rclone creds (OAuth refresh-token) in Vault `secret/stem95su` (`rclone_conf`) → ESO secret `stem95su-rclone`. **Requires the GCP OAuth app (project home-lab-1700868541205) published to "Production"** or the refresh token expires ~weekly (re-mint + `vault kv put secret/stem95su rclone_conf=…` after publishing); a dead token surfaces as a failed Job. Manual on-demand sync still possible (throwaway rclone container from devvm; recipe in claude-memory). Nextcloud "PVE NFS Pool"/rsync is a manual fallback. Dashboard `stem_board.html` served at `/` via a small nginx ConfigMap (`index`). No DB, no in-cluster secrets. Reference impl for the NFS-backed static-site pattern (see patterns.md). | stem95su |
 | trek | **TRIAL (2026-06-05)** — self-hosted group-trip planner (upstream [TREK](https://github.com/mauriceboe/TREK), `mauriceboe/trek:3.0.22`, AGPL-3.0). Solo evaluation behind Authentik forward-auth (`auth=required`) before deciding build-vs-adopt; covers collaborative trip planning + accommodation records + activities + per-person budget splitting on free OpenStreetMap (no paid maps key). SQLite + uploads on `proxmox-lvm-encrypted` (`trek-data-encrypted` 2Gi, `trek-uploads-encrypted` 5Gi). For the trial only: `ENCRYPTION_KEY` is TREK-auto-generated onto the data PVC and the bootstrap admin (`admin@trek.local`) is printed to pod logs — NO Vault/ESO wiring (graduation TODO: move key to `secret/trek` + ESO, add an app-level SQLite backup CronJob since host file-backup can't read the LUKS PVC, wire TREK↔Authentik OIDC). Pinned image, TF-managed (no CI/Keel). Availability-poll companion (Rallly) deferred. Teardown: `tg destroy` in `stacks/trek`. | trek |

 ## Cloudflare Domains
@ -133,7 +130,7 @@
 blog, hackmd, privatebin, url, echo, f1tv, excalidraw, send,
 audiobookshelf, jsoncrack, ntfy, cyberchef, homepage, linkwarden,
 changedetection, tandoor, n8n, stirling-pdf, dashy, city-guesser,
-travel, netbox, phpipam, tripit, t3, stem95su, tasks
+travel, netbox, phpipam, tripit, t3, stem95su
 ```

 ### Non-Proxied (Direct DNS)
--- a/.github/workflows/build-excalidraw.yml
+++ b/.github/workflows/build-excalidraw.yml
@ -1,42 +0,0 @@
-name: Build excalidraw-library
-
-# ADR-0002 / no-local-builds: excalidraw-library (infra-owned Go app behind
-# draw.viktorbarzin.me) builds off-infra on GHA → private ghcr; Keel polls
-# ghcr:latest and rolls the deployment. Replaces the manual DockerHub pushes
-# (viktorbarzin/excalidraw-library:v4 stays frozen as the rollback image).
-on:
-  push:
-    branches: [master]
-    paths:
-      - 'stacks/excalidraw/project/**'
-  workflow_dispatch: {}
-
-permissions:
-  contents: read
-  packages: write
-
-jobs:
-  build:
-    runs-on: ubuntu-latest
-    steps:
-      - uses: actions/checkout@v4
-      - uses: actions/setup-go@v5
-        with:
-          go-version: '1.21'
-      - run: go test ./...
-        working-directory: stacks/excalidraw/project
-      - uses: docker/setup-buildx-action@v3
-      - uses: docker/login-action@v3
-        with:
-          registry: ghcr.io
-          username: ${{ github.actor }}
-          password: ${{ secrets.GITHUB_TOKEN }}
-      - uses: docker/build-push-action@v6
-        with:
-          context: stacks/excalidraw/project
-          platforms: linux/amd64
-          provenance: false
-          push: true
-          tags: |
-            ghcr.io/viktorbarzin/excalidraw-library:latest
-            ghcr.io/viktorbarzin/excalidraw-library:${{ github.sha }}
--- a/.github/workflows/build-valia-sites-sync.yml
+++ b/.github/workflows/build-valia-sites-sync.yml
@ -1,39 +0,0 @@
-name: Build valia-sites-sync
-
-# ADR-0002 + ADR-0018: infra-owned image built off-infra on GHA → ghcr (public).
-# Rclone + wrangler runner for the Valia-sites Content-folder mirror CronJob.
-# Rebuilds are rare (tool pins only change deliberately) → dispatch + path.
-# Security note: no untrusted event inputs are interpolated anywhere (only
-# github.actor / github.sha / GITHUB_TOKEN — same shape as the other
-# build-*.yml workflows in this repo).
-on:
-  push:
-    branches: [master]
-    paths:
-      - 'stacks/valia-sites/sync-image/**'
-  workflow_dispatch: {}
-
-permissions:
-  contents: read
-  packages: write
-
-jobs:
-  build:
-    runs-on: ubuntu-latest
-    steps:
-      - uses: actions/checkout@v4
-      - uses: docker/setup-buildx-action@v3
-      - uses: docker/login-action@v3
-        with:
-          registry: ghcr.io
-          username: ${{ github.actor }}
-          password: ${{ secrets.GITHUB_TOKEN }}
-      - uses: docker/build-push-action@v6
-        with:
-          context: stacks/valia-sites/sync-image
-          platforms: linux/amd64
-          provenance: false
-          push: true
-          tags: |
-            ghcr.io/viktorbarzin/valia-sites-sync:latest
-            ghcr.io/viktorbarzin/valia-sites-sync:${{ github.sha }}
--- a/AGENTS.md
+++ b/AGENTS.md
@ -95,7 +95,7 @@ Terragrunt-based homelab managing a Kubernetes cluster (5 nodes, v1.34.2) on Pro
 ## Key Paths
 - `stacks/<service>/main.tf` — service definition
 - `stacks/platform/modules/<service>/` — core infra modules
- `modules/kubernetes/ingress_factory/` — standardized ingress with auth, rate limiting, anti-AI, and auto Cloudflare DNS (`dns_type = "proxied"`, `"non-proxied"`, or `"internal"` — a public A record carrying the internal Traefik LB IP for household-only services; pair with the `home-lans-only` ipAllowList middleware, never with `"proxied"`)
+- `modules/kubernetes/ingress_factory/` — standardized ingress with auth, rate limiting, anti-AI, and auto Cloudflare DNS (`dns_type = "proxied"` or `"non-proxied"`)
 - `modules/kubernetes/nfs_volume/` — NFS volume module (CSI-backed, soft mount)
 - `config.tfvars` — non-secret configuration (plaintext)
 - `secrets.sops.json` — all secrets (SOPS-encrypted JSON)
--- a/CONTEXT.md
+++ b/CONTEXT.md
@ -118,14 +118,6 @@ _Avoid_: "external", "outside".
 `viktorbarzin.lan`, served by Technitium DNS. Resolves only inside the homelab network.
 _Avoid_: bare "lan", "private", "intranet".

-**Segment**:
-One isolated L2/L3 network with pfSense as its gateway — realised as a Proxmox-bridge-level tag feeding one dedicated untagged pfSense interface (dManagementsVms 10.0.10.0/24 = vmbr1 tag 10, dKubernetes 10.0.20.0/24 = vmbr1 tag 20, dCCTV 10.0.30.0/24 = vmbr0 tag 30). pfSense itself never terminates 802.1Q.
-_Avoid_: "VLAN" as the primary name (the tags 10/20/30 are transport detail; the Segment is the concept).
-
-**CCTV segment**:
-The untrusted camera **Segment** (`dCCTV`) — devices in it may be pulled from (RTSP/ISAPI) but may initiate nothing except NTP to their gateway. Deliberately outside every trusted source-IP allowlist (ADR-0017).
-_Avoid_: "camera VLAN", "CCTV LAN".
-
 **Ingress auth**:
 The `auth = "..."` parameter on `ingress_factory` — a discrete *mode*, not a ranked tier — one of `required` (Authentik forward-auth gates every request), `app` (the backend owns its login), `public` (anonymous Authentik binding for audit only), or `none` (Anubis-fronted content, or native-client API). Default `required` (fail-closed).
 _Avoid_: "auth tier" / "auth mode" — refer to it by the canonical key, `auth` (e.g. `auth = "required"`). "tier" is reserved for State tier and Namespace tier.
@ -237,20 +229,6 @@ _Avoid_: expecting Diun to deploy; conflating with **Keel**.
 **Anubis**:
 A PoW reverse-proxy issuing a 30-day JWT cookie, used in front of public content-bearing sites without app-level auth (blog, wiki, landing pages). Never in front of Git, WebDAV, CalDAV, or API endpoints (clients can't solve PoW).

-### Externally-authored sites
-
-**Valia site**:
-A small public static site authored by Valia (Viktor's mother, external to the infra) and hosted for her under `<name>.viktorbarzin.me`. Its source of truth is a **Content folder** she owns; the live site is a mirror of that folder, fresh within ~10 minutes. Hosted **off-infra** (Cloudflare Pages) by decision: a homelab outage freezes content but never takes her sites down. Viktor picks the English subdomain name per site at registration (her folder names stay Bulgarian). Current instances: `stem95su`, `bridge`.
-_Avoid_: "school site" (the family may grow beyond school projects); treating the deployed copy as editable — edits land only in the **Content folder**.
-
-**Content folder**:
-The Google Drive folder (or subfolder) Valia shares with `vbarzin@gmail.com` holding one **Valia site**'s files. Strictly read-only from the infra side — nothing ever writes back to her Drive. Empty or half-uploaded folder states must never wipe a live site.
-_Avoid_: syncing a folder root when the servable content lives in a subfolder (stem95su serves `stem claude/files/`, not the folder root).
-
-**Entry file**:
-The HTML file a **Valia site** serves at `/`. Defaults to `index.html`; per-site override when she names it differently (stem95su: `stem_board.html`). The override is a registration-time setting, not a constraint on her authoring.
-_Avoid_: asking Valia to rename her files to fit hosting conventions.
-
 ## Relationships

 - A **Service** is defined by exactly one **Stack** — **flat** or wrapping a **Stack-local module** — which sources zero or more shared **Factory modules** and resolves to one or more K8s workloads.
@ -262,7 +240,6 @@ _Avoid_: asking Valia to rename her files to fit hosting conventions.
 - A **Service**'s image reaches the cluster via **Woodpecker deploy** (push-driven, on commit) or **Keel** (poll-driven, on a new registry tag); **Diun** only notifies. Operator-managed StatefulSets are rolled by neither.
 - An owned **Service**'s image is built by GitHub Actions from the **Canonical repo**'s **GitHub mirror** and hosted on ghcr.io (ADR-0002); the **Forgejo registry** keeps only a frozen last-known-good tag per **Service**.
 - Tier-1 **State tier** state and ~12 app databases share one **CNPG** `pg-cluster`, reached through **PgBouncer**; their credentials rotate via the `vault-database` store.
- A **Valia site** mirrors exactly one **Content folder** and serves exactly one **Entry file** at `/`; the folder is hers, the subdomain name is Viktor's, the hosting is off-infra.

 ## Example dialogue

--- a/cli/VERSION
+++ b/cli/VERSION
@ -1 +1 @@
-v0.12.0
+v0.11.0
--- a/cli/cmd_memory.go
+++ b/cli/cmd_memory.go
@ -30,21 +30,11 @@ func memoryCommands() []Command {
 	}
 }

-// printMemories renders a {memories:[…]} response as one line per memory, or raw JSON.
+// printMemories renders a {memories:[…]} response as compact lines, or raw JSON.
 func printMemories(raw []byte, jsonOut bool) error {
-	fmt.Print(renderMemories(raw, jsonOut))
-	return nil
-}
-
-// renderMemories formats each memory as a single line with its FULL content
-// (newlines flattened to spaces). Content is deliberately never truncated: the
-// old 240-rune preview cut memories mid-sentence, misled agents into believing
-// no full-content read-back existed, and made blind `update --content` from
-// the preview silently destroy the stored tail. Full passthrough also can't
-// produce invalid UTF-8 (the old mid-rune cut crashed the recall hook).
-func renderMemories(raw []byte, jsonOut bool) string {
 	if jsonOut {
-		return string(raw) + "\n"
+		fmt.Println(string(raw))
+		return nil
 	}
 	var r struct {
 		Memories []struct {
@ -56,20 +46,36 @@ func renderMemories(raw []byte, jsonOut bool) string {
 		} `json:"memories"`
 	}
 	if err := json.Unmarshal(raw, &r); err != nil {
-		return string(raw) + "\n"
+		fmt.Println(string(raw))
+		return nil
 	}
 	if len(r.Memories) == 0 {
-		return "(no memories)\n"
+		fmt.Println("(no memories)")
+		return nil
 	}
-	var b strings.Builder
 	for _, m := range r.Memories {
-		c := strings.ReplaceAll(m.Content, "\n", " ")
-		fmt.Fprintf(&b, "#%d [%s] (%.2f) %s\n", m.ID, m.Category, m.Importance, c)
+		c := truncatePreview(strings.ReplaceAll(m.Content, "\n", " "), 240)
+		fmt.Printf("#%d [%s] (%.2f) %s\n", m.ID, m.Category, m.Importance, c)
 		if m.Tags != "" {
-			fmt.Fprintf(&b, "       tags: %s\n", m.Tags)
+			fmt.Printf("       tags: %s\n", m.Tags)
 		}
 	}
-	return b.String()
+	return nil
+}
+
+// truncatePreview shortens s to at most maxRunes RUNES, appending "…" when it
+// trims. Counting runes (not bytes) is load-bearing: a byte slice like s[:240]
+// can cut through the middle of a multibyte UTF-8 character (e.g. 2-byte
+// Cyrillic), leaving a dangling lead byte = invalid UTF-8. That crashed strict
+// decoders downstream — notably the homelab-memory-recall.py UserPromptSubmit
+// hook (subprocess text=True), which surfaced as a recurring "UserPromptSubmit
+// hook error" for Cyrillic-language users.
+func truncatePreview(s string, maxRunes int) string {
+	r := []rune(s)
+	if len(r) <= maxRunes {
+		return s
+	}
+	return string(r[:maxRunes]) + "…"
 }

 func memoryRecall(args []string) error {
--- a/cli/memory_test.go
+++ b/cli/memory_test.go
@ -8,53 +8,25 @@ import (
 	"unicode/utf8"
 )

-func TestRenderMemoriesFullContent(t *testing.T) {
-	// The pretty view must NOT truncate content: the old 240-rune preview cut
-	// memories mid-sentence, misled agents into thinking no full-content
-	// read-back existed, and made blind `update --content` from the preview
-	// destroy the stored tail. Full passthrough also removes the mid-rune-cut
-	// invalid-UTF-8 class by construction — nothing is ever sliced.
-	long := strings.Repeat("я", 300) + strings.Repeat("a", 300)
-	raw, _ := json.Marshal(map[string]interface{}{"memories": []map[string]interface{}{
-		{"id": 7, "content": long, "category": "facts", "tags": "t1,t2", "importance": 0.7},
-	}})
-	got := renderMemories(raw, false)
-	if !strings.Contains(got, long) {
-		t.Fatalf("content was truncated: %q", got)
-	}
-	if strings.Contains(got, "…") {
-		t.Fatalf("ellipsis in output — truncation still active: %q", got)
-	}
+func TestTruncatePreviewKeepsValidUTF8(t *testing.T) {
+	// Byte-slicing a long Cyrillic string at 240 splits a 2-byte rune and emits
+	// invalid UTF-8 — the bug that crashed the recall hook. truncatePreview must
+	// cut on a rune boundary and always stay valid UTF-8.
+	long := strings.Repeat("я", 300) // 300 runes / 600 bytes
+	got := truncatePreview(long, 240)
 	if !utf8.ValidString(got) {
-		t.Fatalf("invalid UTF-8 in output: %q", got)
+		t.Fatalf("truncatePreview produced invalid UTF-8: %q", got)
 	}
-	if !strings.Contains(got, "#7 [facts] (0.70) ") || !strings.Contains(got, "tags: t1,t2") {
-		t.Fatalf("line format broken: %q", got)
+	if r := []rune(got); len(r) != 241 || string(r[:240]) != strings.Repeat("я", 240) || r[240] != '…' {
+		t.Fatalf("truncatePreview = %d runes, want 240 Cyrillic + ellipsis", len(r))
 	}
-}
-
-func TestRenderMemoriesFlattensNewlinesToOneLine(t *testing.T) {
-	// Consumers (the recall hook, terminal skims) rely on one memory per line;
-	// multi-line content is flattened, never split across lines.
-	raw, _ := json.Marshal(map[string]interface{}{"memories": []map[string]interface{}{
-		{"id": 1, "content": "line one\nline two\nline three", "category": "facts", "importance": 0.5},
-	}})
-	got := renderMemories(raw, false)
-	if !strings.Contains(got, "line one line two line three") {
-		t.Fatalf("newlines not flattened: %q", got)
+	// Short multibyte strings pass through untouched (no ellipsis).
+	if got := truncatePreview("кратко", 240); got != "кратко" {
+		t.Fatalf("short string altered: %q", got)
 	}
-}
-
-func TestRenderMemoriesEdgeCases(t *testing.T) {
-	if got := renderMemories([]byte(`{"memories":[]}`), false); got != "(no memories)\n" {
-		t.Fatalf("empty list: %q", got)
-	}
-	// --json and unparseable responses pass through raw.
-	if got := renderMemories([]byte(`{"x":1}`), true); got != "{\"x\":1}\n" {
-		t.Fatalf("json passthrough: %q", got)
-	}
-	if got := renderMemories([]byte(`not json`), false); got != "not json\n" {
-		t.Fatalf("unparseable passthrough: %q", got)
+	// ASCII boundary still works.
+	if got := truncatePreview(strings.Repeat("a", 500), 240); got != strings.Repeat("a", 240)+"…" {
+		t.Fatalf("ascii truncation wrong: %q", got)
 	}
 }

--- a/config.tfvars
+++ b/config.tfvars
--- a/docs/adr/0017-cctv-physical-cabling.svg
+++ b/docs/adr/0017-cctv-physical-cabling.svg
@ -1,126 +0,0 @@
-<svg xmlns="http://www.w3.org/2000/svg" width="1600" height="820" viewBox="0 0 1600 820" font-family="system-ui, -apple-system, 'Segoe UI', Roboto, sans-serif">
-  <!-- ADR-0017: PHYSICAL cabling only — no VLANs, no flows. Solid = cable in
-       place today · dashed = camera-day work · ~~~ = radio. Palette: neutral
-       grays + blue for copper runs (reference dataviz palette text tokens). -->
-  <defs>
-    <marker id="dot" viewBox="0 0 8 8" refX="4" refY="4" markerWidth="5" markerHeight="5">
-      <circle cx="4" cy="4" r="3" fill="#52514e"/>
-    </marker>
-  </defs>
-
-  <rect width="1600" height="820" fill="#fcfcfb"/>
-
-  <text x="40" y="42" font-size="26" font-weight="700" fill="#0b0b0b">ADR-0017 — physical cabling (single-switch, rev 3)</text>
-  <text x="40" y="66" font-size="15" fill="#52514e">wires only — no VLANs, no traffic · solid = in place · dashed = camera-day · ~ = radio</text>
-
-  <!-- ═════════ APARTMENT ═════════ -->
-  <rect x="40" y="100" width="330" height="330" rx="10" fill="#0b0b0b" fill-opacity="0.03" stroke="#b9b8b2"/>
-  <text x="56" y="126" font-size="13" font-weight="700" fill="#52514e" letter-spacing="1">APARTMENT</text>
-
-  <text x="70" y="158" font-size="13" fill="#52514e">☁ ISP (internet)</text>
-  <path d="M120,166 L120,196" fill="none" stroke="#52514e" stroke-width="2"/>
-
-  <rect x="64" y="198" width="220" height="64" rx="8" fill="#ffffff" stroke="#8a8984"/>
-  <text x="80" y="222" font-size="14.5" font-weight="700" fill="#0b0b0b">AX6000 router</text>
-  <text x="80" y="242" font-size="12" fill="#52514e">192.168.1.1 · WAN←ISP · 8×LAN</text>
-
-  <rect x="64" y="290" width="220" height="52" rx="8" fill="#ffffff" stroke="#8a8984"/>
-  <text x="80" y="312" font-size="14" font-weight="700" fill="#0b0b0b">Synology NAS · .13</text>
-  <text x="80" y="330" font-size="12" fill="#52514e">on an AX6000 LAN port</text>
-  <path d="M174,262 L174,290" fill="none" stroke="#2a78d6" stroke-width="2"/>
-
-  <text x="70" y="376" font-size="12.5" fill="#52514e">📶 wifi clients (phones, laptops)</text>
-  <path d="M110,262 C104,272 106,278 100,286 C106,294 104,300 100,308 C106,316 104,322 100,330 C106,338 104,344 100,352 C104,358 102,362 98,366" fill="none" stroke="#8a8984" stroke-width="1.6" stroke-dasharray="2,3"/>
-
-  <!-- in-wall run apartment -> garage -->
-  <path d="M284,230 C450,230 540,228 616,228" fill="none" stroke="#2a78d6" stroke-width="2.5"/>
-  <text x="330" y="218" font-size="12.5" font-weight="700" fill="#2a78d6">in-wall run → garage</text>
-
-  <!-- ═════════ GARAGE — RACK ═════════ -->
-  <rect x="560" y="100" width="640" height="680" rx="10" fill="#0b0b0b" fill-opacity="0.03" stroke="#b9b8b2"/>
-  <text x="576" y="126" font-size="13" font-weight="700" fill="#52514e" letter-spacing="1">GARAGE — RACK</text>
-
-  <!-- switch -->
-  <rect x="600" y="150" width="560" height="150" rx="8" fill="#ffffff" stroke="#0b0b0b" stroke-opacity="0.5" stroke-width="1.6"/>
-  <text x="616" y="176" font-size="14.5" font-weight="700" fill="#0b0b0b">TL-SG105PE · 5-port gigabit PoE switch</text>
-  <text x="616" y="194" font-size="12" fill="#52514e">mgmt 192.168.1.6 · replaces the old TL-SG105E (→ shelf, cold spare)</text>
-  <g font-size="11.5" text-anchor="middle">
-    <rect x="616" y="210" width="96" height="40" rx="5" fill="#2a78d6" fill-opacity="0.08" stroke="#8a8984"/>
-    <text x="664" y="227" font-weight="700" fill="#0b0b0b">P1</text>
-    <text x="664" y="242" fill="#52514e">← apartment</text>
-    <rect x="722" y="210" width="96" height="40" rx="5" fill="#2a78d6" fill-opacity="0.08" stroke="#8a8984"/>
-    <text x="770" y="227" font-weight="700" fill="#0b0b0b">P2</text>
-    <text x="770" y="242" fill="#52514e">← 4G router</text>
-    <rect x="828" y="210" width="96" height="40" rx="5" fill="#2a78d6" fill-opacity="0.08" stroke="#8a8984"/>
-    <text x="876" y="227" font-weight="700" fill="#0b0b0b">P3</text>
-    <text x="876" y="242" fill="#52514e">← UPS mgmt</text>
-    <rect x="934" y="210" width="96" height="40" rx="5" fill="#2a78d6" fill-opacity="0.08" stroke="#8a8984" stroke-dasharray="4,3"/>
-    <text x="982" y="227" font-weight="700" fill="#0b0b0b">P4 ⚡PoE</text>
-    <text x="982" y="242" fill="#52514e">← camera</text>
-    <rect x="1040" y="210" width="96" height="40" rx="5" fill="#2a78d6" fill-opacity="0.08" stroke="#8a8984"/>
-    <text x="1088" y="227" font-weight="700" fill="#0b0b0b">P5</text>
-    <text x="1088" y="242" fill="#52514e">← R730 eno1</text>
-  </g>
-  <text x="616" y="284" font-size="12" fill="#52514e">every cable below re-plugs old-switch → PE on camera day (≈3 min)</text>
-
-  <!-- 4G router -->
-  <rect x="600" y="360" width="250" height="64" rx="8" fill="#ffffff" stroke="#8a8984"/>
-  <text x="616" y="384" font-size="14" font-weight="700" fill="#0b0b0b">4G router · 192.168.1.7</text>
-  <text x="616" y="403" font-size="12" fill="#52514e">~cellular uplink (out-of-band)</text>
-  <path d="M770,300 L770,360" fill="none" stroke="#2a78d6" stroke-width="2"/>
-  <path d="M856,392 C866,386 864,380 874,376 C866,370 868,364 876,360" fill="none" stroke="#8a8984" stroke-width="1.6" stroke-dasharray="2,3"/>
-  <text x="884" y="380" font-size="12" fill="#52514e">📡 cellular</text>
-
-  <!-- UPS -->
-  <rect x="600" y="452" width="250" height="56" rx="8" fill="#ffffff" stroke="#8a8984"/>
-  <text x="616" y="476" font-size="14" font-weight="700" fill="#0b0b0b">UPS (Huawei)</text>
-  <text x="616" y="494" font-size="12" fill="#52514e">network mgmt card</text>
-  <path d="M876,300 C876,340 800,410 720,452" fill="none" stroke="#2a78d6" stroke-width="2"/>
-
-  <!-- R730 -->
-  <rect x="600" y="540" width="560" height="220" rx="8" fill="#ffffff" stroke="#0b0b0b" stroke-opacity="0.5" stroke-width="1.6"/>
-  <text x="616" y="566" font-size="14.5" font-weight="700" fill="#0b0b0b">Dell R730 · PVE host · 192.168.1.127</text>
-  <g font-size="11.5">
-    <rect x="616" y="582" width="128" height="38" rx="5" fill="#2a78d6" fill-opacity="0.08" stroke="#8a8984"/>
-    <text x="628" y="598" font-weight="700" fill="#0b0b0b">eno1 · LAN1</text>
-    <text x="628" y="613" fill="#52514e">← switch P5 · 1GbE</text>
-    <rect x="756" y="582" width="128" height="38" rx="5" fill="#ffffff" stroke="#8a8984" stroke-dasharray="4,3"/>
-    <text x="768" y="598" font-weight="700" fill="#52514e">eno2 · LAN2</text>
-    <text x="768" y="613" fill="#8a8984">dark · fallback leg</text>
-    <rect x="896" y="582" width="128" height="38" rx="5" fill="#ffffff" stroke="#d8d7d2"/>
-    <text x="908" y="598" fill="#8a8984">eno3 / eno4</text>
-    <text x="908" y="613" fill="#8a8984">free, uncabled</text>
-    <rect x="1036" y="582" width="108" height="38" rx="5" fill="#ffffff" stroke="#d8d7d2"/>
-    <text x="1048" y="598" fill="#8a8984">iDRAC · .4</text>
-    <text x="1048" y="613" fill="#8a8984">shared-LOM/eno1</text>
-  </g>
-  <text x="616" y="648" font-size="12" fill="#52514e">no other network cables — everything else on this host is VIRTUAL:</text>
-  <text x="616" y="668" font-size="12" fill="#52514e">pfSense · ha-sofia (HA) · devvm · k8s-master + node1-6 · registry VM …</text>
-  <text x="616" y="696" font-size="12" fill="#8a8984">(power: host + switch fed from the UPS — power wiring not drawn)</text>
-
-  <path d="M1088,300 C1088,420 720,500 680,582" fill="none" stroke="#2a78d6" stroke-width="2.5"/>
-  <text x="1100" y="330" font-size="12.5" font-weight="700" fill="#2a78d6">LAN1 cable</text>
-
-  <!-- ═════════ GARAGE ENTRANCE ═════════ -->
-  <rect x="1280" y="100" width="280" height="200" rx="10" fill="#0b0b0b" fill-opacity="0.03" stroke="#b9b8b2"/>
-  <text x="1296" y="126" font-size="13" font-weight="700" fill="#52514e" letter-spacing="1">GARAGE ENTRANCE</text>
-  <rect x="1304" y="150" width="232" height="110" rx="8" fill="#ffffff" stroke="#8a8984"/>
-  <text x="1320" y="176" font-size="14" font-weight="700" fill="#0b0b0b">vermont-garage camera</text>
-  <text x="1320" y="196" font-size="12" fill="#52514e">HiLook IPC-T241H-C · 10.0.30.70</text>
-  <text x="1320" y="214" font-size="12" fill="#52514e">powered over the data cable (PoE)</text>
-  <text x="1320" y="232" font-size="12" fill="#52514e">outdoor · armored conduit</text>
-
-  <path d="M982,210 C982,150 1140,140 1304,180" fill="none" stroke="#52514e" stroke-width="2.5" stroke-dasharray="7,5"/>
-  <text x="1080" y="136" font-size="12.5" font-weight="700" fill="#52514e">single cat6 in conduit · data + PoE power (camera day)</text>
-
-  <!-- legend -->
-  <g transform="translate(40,780)" font-size="12.5">
-    <line x1="0" y1="-4" x2="44" y2="-4" stroke="#2a78d6" stroke-width="2.5"/>
-    <text x="52" y="0" fill="#0b0b0b">copper, in place</text>
-    <line x1="190" y1="-4" x2="234" y2="-4" stroke="#52514e" stroke-width="2.5" stroke-dasharray="7,5"/>
-    <text x="242" y="0" fill="#0b0b0b">camera-day cable / dark port</text>
-    <path d="M450,-4 C456,-10 454,-14 460,-18" fill="none" stroke="#8a8984" stroke-width="1.6" stroke-dasharray="2,3"/>
-    <text x="470" y="0" fill="#0b0b0b">radio (wifi / cellular)</text>
-    <text x="650" y="0" fill="#52514e">total wired links at the rack: 5 (all on the one switch) · ADR-0017 rev 3</text>
-  </g>
-</svg>
--- a/docs/adr/0017-cctv-segment-dedicated-pfsense-leg.md
+++ b/docs/adr/0017-cctv-segment-dedicated-pfsense-leg.md
@ -1,99 +0,0 @@
-# CCTV segment: dedicated pfSense interface, VLAN-30 trunk on the LAN1 cable
-
-Status: accepted (2026-07-02, rev 3 — single-switch)
-
-![Network topology — dCCTV segment, flows, and camera-day steps](./0017-cctv-segment-topology.svg)
-
-![Physical cabling — wires only, no VLANs](./0017-cctv-physical-cabling.svg)
-
-The first owned camera at the Sofia/Vermont site (`vermont-garage`, HiLook
-IPC-T241H-C at the garage entrance) needs to be network-isolated: its cable is
-physically exposed outside the apartment, so anything plugged into that cable
-must land in a segment that can reach nothing. The original design doc
-(NAS: `Emo shared/Claude shared/garage-camera/`) called for an "802.1Q trunk
-to pfSense" — but nothing in this network terminates dot1q on pfSense; the
-site idiom is one vlan-aware Proxmox bridge → one tagged VM NIC → one clean
-untagged pfSense interface per segment.
-
-**Decision (rev 3):** ONE switch — the new TL-SG105PE **replaces** the old
-garage TL-SG105E (Viktor prefers not running two switches; retired unit
-becomes a cold spare, its 192.168.1.6 mgmt IP passes to the PE). Five ports,
-all used: apartment uplink, 4G router 192.168.1.7, UPS mgmt (all untagged
-VLAN 1), the camera (untagged VLAN 30, PoE), and the **trunk to R730 `eno1`
-carrying home LAN untagged + CCTV tagged 30** over the existing LAN1 cable.
-pfSense `net3` (vtnet3) sits on `vmbr0` with `tag=30` — exactly the site
-idiom used for dManagementsVms/dKubernetes (bridge-level tag → clean untagged
-vNIC; pfSense still terminates no dot1q itself). The earlier dedicated
-`eno2`/`vmbr2` leg is kept **dormant as a fallback** (rev 2 wired it; moving
-net3 back to vmbr2 restores pure physical isolation in one `qm set`).
-This narrows the earlier 802.1Q objection rather than contradicting it: the
-rejection assumed *unmanaged* switches, where any LAN device could inject
-tagged frames; with the managed PE as the only device on eno1, VLAN-30
-membership is {camera port, trunk port} only, so tag-30 ingress from every
-other port — and from the exposed camera cable — is dropped or contained.
-Cameras are untrusted: default-deny on dCCTV with a single
-NTP-to-gateway exception; Frigate (k8s) pulls RTSP in; ha-sofia (192.168.1.8)
-may reach ISAPI/RTSP directly; home-LAN clients route in via an AX6000 static
-route (10.0.30.0/24 via 192.168.1.2). 10.0.30.0/24 is deliberately NOT in the
-10.0.20.0/22 trusted source-IP allowlist.
-
-## Traffic on the trunk — how one cable carries two networks
-
-The LAN1 cable is shared, but the two networks on it diverge at `vmbr0`
-(the vlan-aware bridge on the PVE host), and only ONE of them ever touches
-pfSense:
-
- **Untagged (VLAN 1, home LAN)** is plain L2 bridging: vmbr0 switches it
-  between the trunk, the host's own IP (192.168.1.127) and pfSense `net0` —
-  where pfSense sits as an ordinary LAN *client* (WAN 192.168.1.2). The home
-  LAN's gateway is and remains the AX6000; home-LAN traffic never transits
-  pfSense. Consequently a pfSense (or R730 VM-level) outage does not affect
-  the home LAN, and the apartment ↔ 4G-router ↔ UPS paths don't even leave
-  the switch (P1/P2/P3 bridge internally), so out-of-band recovery via the
-  4G router survives the whole rack being down.
- **Tagged 30 (CCTV)** has exactly one possible landing: vmbr0 delivers
-  VID 30 only to pfSense `net3` (dCCTV, 10.0.30.1), which is the camera
-  segment's gateway, firewall and sole exit. "Camera → AX6000 → internet"
-  is impossible by construction, not merely by firewall rule.
- pfSense forwards *upstream* only its own segments (10.0.10/20/30), NATed
-  out of its WAN toward the AX6000. Load-wise the trunk gained only the
-  camera's ~8 Mbps — it already carried all rack-bound home-LAN traffic.
-
-![VLAN tagging — where traffic can flow](./0017-cctv-vlan-tagging.svg)
-
-*(editable source: [`0017-cctv-vlan-tagging.excalidraw`](./0017-cctv-vlan-tagging.excalidraw) — open it in excalidraw to tweak)*
-
-## Considered options
-
- **802.1Q over the LAN path behind an UNMANAGED switch** (the original plan
-  read this way) — rejected: any LAN device could inject tagged frames into
-  vmbr0 (`bridge-vids 2-4094`) and tag-passing through a dumb switch is
-  undefined. Rev 3 adopts the tagged path ONLY because the managed PE now
-  polices VLAN-30 membership at the single entry point to eno1; no bridge
-  reconfiguration was needed (vmbr0 was already vlan-aware).
- **Dedicated physical leg (eno2 → vmbr2 → net3), one switch per role**
-  (rev 1/2 as-built) — superseded by rev 3: it forced either a second switch
-  (6 connections vs 5 ports once the PE also replaced the old switch) or new
-  hardware. Strongest isolation of all options; kept dormant as the fallback.
- **AX6000 as the camera gateway** — rejected earlier in the design (consumer
-  router, no inter-VLAN firewall).
-
-## Consequences
-
- The switch is now single-point and load-bearing for everything in the rack
-  (apartment uplink, pfSense backup-WAN via 4G, UPS mgmt, CCTV) AND its VLAN
-  table + mgmt password are part of the isolation boundary — the Easy Smart
-  mgmt UI answers on every port, so the password is the gate between a
-  compromised camera and the switch config. All 5 ports are consumed: the
-  next camera forces an 8-port PoE upgrade (the wiring plan already fits it).
- `eno2`/`vmbr2` stay cabled-ready but dormant (fallback to rev 2's physical
-  leg); eno3/eno4 remain free.
- The old TL-SG105E is retired to cold spare; the PE inherits 192.168.1.6
-  (Kea reservation by MAC).
- Revision history (all 2026-07-02): rev 1 assumed one shared PE with a
-  port-VLAN split (conflated the two devices); rev 2 split into two switches
-  after inspecting 192.168.1.6 (old non-PoE SG105E, 4/5 ports used); rev 3
-  consolidated back to one switch — the PE replacing the SG105E — per
-  Viktor's preference, moving CCTV onto a managed tagged trunk.
- Frigate's ADR-0016 VRAM budget was bumped 2000 → 2300 MiB for the extra
-  NVDEC stream.
--- a/docs/adr/0017-cctv-segment-topology.svg
+++ b/docs/adr/0017-cctv-segment-topology.svg
@ -1,178 +0,0 @@
-<svg xmlns="http://www.w3.org/2000/svg" width="1600" height="880" viewBox="0 0 1600 880" font-family="system-ui, -apple-system, 'Segoe UI', Roboto, sans-serif">
-  <!-- ADR-0017 rev 3 dCCTV topology (single switch, VLAN-30 trunk on LAN1).
-       Colors: reference dataviz palette (light mode). blue #2a78d6 = home LAN ·
-       violet #4a3aa7 = dCCTV · aqua #1baf7a = dKubernetes ·
-       yellow #eda100 = dManagementsVms · green #008300 allow · red #e34948 deny -->
-  <defs>
-    <marker id="arrGreen" viewBox="0 0 10 10" refX="9" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
-      <path d="M0,0 L10,5 L0,10 z" fill="#008300"/>
-    </marker>
-    <marker id="arrRed" viewBox="0 0 10 10" refX="9" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
-      <path d="M0,0 L10,5 L0,10 z" fill="#e34948"/>
-    </marker>
-    <marker id="arrGray" viewBox="0 0 10 10" refX="9" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
-      <path d="M0,0 L10,5 L0,10 z" fill="#52514e"/>
-    </marker>
-  </defs>
-
-  <rect width="1600" height="880" fill="#fcfcfb"/>
-
-  <text x="40" y="42" font-size="26" font-weight="700" fill="#0b0b0b">ADR-0017 — CCTV segment behind pfSense, VLAN-30 trunk on the LAN1 cable</text>
-  <text x="40" y="66" font-size="15" fill="#52514e">Sofia/Vermont · rev 3 (single switch) 2026-07-02 · dashed = camera-day · the ONLY 802.1Q is the trunk between the switch and eno1</text>
-
-  <!-- camera -> everything else (denied) -->
-  <path d="M240,168 C520,104 900,104 1148,140" fill="none" stroke="#e34948" stroke-width="3" marker-end="url(#arrRed)"/>
-  <g transform="translate(560,111)">
-    <circle r="11" fill="#fcfcfb" stroke="#e34948" stroke-width="2.5"/>
-    <path d="M-5,-5 L5,5 M5,-5 L-5,5" stroke="#e34948" stroke-width="2.5"/>
-  </g>
-  <text x="588" y="100" font-size="13.5" font-weight="700" fill="#e34948">DENY · camera → LAN / other segments / internet (default deny on dCCTV)</text>
-
-  <!-- GARAGE ENTRANCE -->
-  <rect x="40" y="128" width="240" height="180" rx="10" fill="#4a3aa7" fill-opacity="0.06" stroke="#4a3aa7" stroke-opacity="0.35"/>
-  <text x="56" y="154" font-size="13" font-weight="700" fill="#52514e" letter-spacing="1">GARAGE ENTRANCE</text>
-  <rect x="64" y="170" width="192" height="112" rx="8" fill="#ffffff" stroke="#4a3aa7" stroke-width="2"/>
-  <text x="80" y="196" font-size="15" font-weight="700" fill="#0b0b0b">vermont-garage</text>
-  <text x="80" y="216" font-size="12.5" fill="#52514e">HiLook IPC-T241H-C · pure IR</text>
-  <text x="80" y="234" font-size="12.5" fill="#52514e">10.0.30.70 (Kea reservation)</text>
-  <text x="80" y="252" font-size="12.5" fill="#52514e">DNS: garage-cam.viktorbarzin.lan</text>
-  <text x="80" y="270" font-size="12.5" fill="#52514e">PoE from switch · cloud/P2P off</text>
-
-  <path d="M256,284 C330,330 412,368 417,430" fill="none" stroke="#52514e" stroke-width="2" stroke-dasharray="6,5" marker-end="url(#arrGray)"/>
-  <text x="330" y="322" font-size="12" fill="#52514e">cat6 in conduit · PoE → P4</text>
-
-  <!-- RACK zone: single switch -->
-  <rect x="40" y="360" width="560" height="265" rx="10" fill="#0b0b0b" fill-opacity="0.03" stroke="#b9b8b2"/>
-  <text x="56" y="384" font-size="13" font-weight="700" fill="#52514e" letter-spacing="1">RACK — GARAGE · ONE SWITCH</text>
-
-  <rect x="64" y="396" width="512" height="176" rx="8" fill="#4a3aa7" fill-opacity="0.04" stroke="#4a3aa7" stroke-width="2"/>
-  <text x="80" y="420" font-size="15" font-weight="700" fill="#0b0b0b">TL-SG105PE <tspan font-size="12.5" font-weight="400" fill="#52514e">replaces the SG105E · mgmt 192.168.1.6 (Kea) · all 5 ports used</tspan></text>
-  <g font-size="11.5" text-anchor="middle">
-    <rect x="80"  y="436" width="88" height="56" rx="6" fill="#2a78d6" fill-opacity="0.12" stroke="#2a78d6"/>
-    <text x="124" y="454" font-weight="700" fill="#0b0b0b">P1 · V1</text>
-    <text x="124" y="470" fill="#52514e">apartment</text>
-    <text x="124" y="484" fill="#52514e">uplink</text>
-    <rect x="178" y="436" width="88" height="56" rx="6" fill="#2a78d6" fill-opacity="0.12" stroke="#2a78d6"/>
-    <text x="222" y="454" font-weight="700" fill="#0b0b0b">P2 · V1</text>
-    <text x="222" y="470" fill="#52514e">4G router</text>
-    <text x="222" y="484" fill="#52514e">192.168.1.7</text>
-    <rect x="276" y="436" width="88" height="56" rx="6" fill="#2a78d6" fill-opacity="0.12" stroke="#2a78d6"/>
-    <text x="320" y="454" font-weight="700" fill="#0b0b0b">P3 · V1</text>
-    <text x="320" y="470" fill="#52514e">UPS mgmt</text>
-    <rect x="374" y="436" width="88" height="56" rx="6" fill="#4a3aa7" fill-opacity="0.12" stroke="#4a3aa7" stroke-width="2"/>
-    <text x="418" y="454" font-weight="700" fill="#0b0b0b">P4 · V30</text>
-    <text x="418" y="470" fill="#52514e">camera</text>
-    <text x="418" y="484" fill="#52514e">PoE ON</text>
-    <rect x="472" y="436" width="88" height="56" rx="6" fill="#2a78d6" fill-opacity="0.10" stroke="#4a3aa7" stroke-width="2" stroke-dasharray="0"/>
-    <text x="516" y="454" font-weight="700" fill="#0b0b0b">P5 · trunk</text>
-    <text x="516" y="470" fill="#52514e">V1 untagged</text>
-    <text x="516" y="484" fill="#4a3aa7">+ V30 tagged</text>
-  </g>
-  <text x="80" y="516" font-size="12" fill="#52514e">802.1Q: VLAN 1 untagged {P1,P2,P3,P5} · VLAN 30 {P4 untagged/PVID 30, P5 tagged}</text>
-  <text x="80" y="534" font-size="12" fill="#52514e">tag-30 ingress on P1/P2/P3 is dropped (not members) — the trunk is the only tagged path</text>
-  <text x="80" y="558" font-size="12" fill="#8a8984">old TL-SG105E → retired, cold spare · backup-WAN (4G) + UPS keep their ports</text>
-
-  <!-- trunk: two parallel lines to eno1 -->
-  <path d="M560,458 C630,458 640,428 692,420" fill="none" stroke="#2a78d6" stroke-width="2.5"/>
-  <path d="M560,466 C632,466 644,436 692,428" fill="none" stroke="#4a3aa7" stroke-width="2.5"/>
-  <text x="588" y="404" font-size="12" font-weight="700" fill="#0b0b0b">LAN1 cable</text>
-
-  <!-- R730 / PVE zone -->
-  <rect x="680" y="330" width="880" height="440" rx="10" fill="#0b0b0b" fill-opacity="0.03" stroke="#b9b8b2"/>
-  <text x="696" y="356" font-size="13" font-weight="700" fill="#52514e" letter-spacing="1">DELL R730 — PVE HOST 192.168.1.127 (IN THE RACK)</text>
-
-  <g font-size="12">
-    <rect x="700" y="400" width="150" height="46" rx="6" fill="#2a78d6" fill-opacity="0.12" stroke="#4a3aa7" stroke-width="2"/>
-    <text x="712" y="419" font-weight="700" fill="#0b0b0b">eno1 → vmbr0</text>
-    <text x="712" y="436" fill="#52514e">untag V1 + tag 30</text>
-
-    <rect x="700" y="471" width="150" height="46" rx="6" fill="#ffffff" stroke="#8a8984" stroke-dasharray="4,3"/>
-    <text x="712" y="490" font-weight="700" fill="#52514e">eno2 → vmbr2</text>
-    <text x="712" y="507" fill="#8a8984">dormant fallback leg</text>
-
-    <rect x="700" y="542" width="150" height="46" rx="6" fill="#0b0b0b" fill-opacity="0.04" stroke="#8a8984"/>
-    <text x="712" y="561" font-weight="700" fill="#0b0b0b">vmbr1</text>
-    <text x="712" y="578" fill="#52514e">internal · tags 10/20</text>
-  </g>
-
-  <!-- pfSense VM -->
-  <rect x="890" y="388" width="300" height="230" rx="8" fill="#ffffff" stroke="#8a8984"/>
-  <text x="906" y="414" font-size="15" font-weight="700" fill="#0b0b0b">pfSense (VM 101)</text>
-  <text x="906" y="432" font-size="12" fill="#52514e">gateway + firewall for every segment</text>
-  <g font-size="12">
-    <rect x="906" y="444" width="268" height="34" rx="5" fill="#2a78d6" fill-opacity="0.12" stroke="#2a78d6"/>
-    <text x="916" y="465" fill="#0b0b0b">net0 · WAN <tspan fill="#52514e">192.168.1.2 · vmbr0 untagged</tspan></text>
-    <rect x="906" y="484" width="268" height="34" rx="5" fill="#eda100" fill-opacity="0.14" stroke="#eda100"/>
-    <text x="916" y="505" fill="#0b0b0b">net1 · dManagementsVms <tspan fill="#52514e">10.0.10.1</tspan></text>
-    <rect x="906" y="524" width="268" height="34" rx="5" fill="#1baf7a" fill-opacity="0.12" stroke="#1baf7a"/>
-    <text x="916" y="545" fill="#0b0b0b">net2 · dKubernetes <tspan fill="#52514e">10.0.20.1</tspan></text>
-    <rect x="906" y="564" width="268" height="34" rx="5" fill="#4a3aa7" fill-opacity="0.12" stroke="#4a3aa7" stroke-width="2"/>
-    <text x="916" y="585" fill="#0b0b0b">net3 · dCCTV <tspan fill="#52514e">10.0.30.1/24 · vmbr0 tag 30</tspan></text>
-  </g>
-  <path d="M850,415 L890,458" fill="none" stroke="#2a78d6" stroke-width="1.6" opacity="0.6"/>
-  <path d="M850,430 L890,581" fill="none" stroke="#4a3aa7" stroke-width="2"/>
-  <path d="M850,565 L890,501" fill="none" stroke="#8a8984" stroke-width="1.6" opacity="0.6"/>
-  <path d="M850,565 L890,541" fill="none" stroke="#8a8984" stroke-width="1.6" opacity="0.6"/>
-
-  <!-- k8s VMs -->
-  <rect x="1240" y="388" width="290" height="230" rx="8" fill="#1baf7a" fill-opacity="0.07" stroke="#1baf7a"/>
-  <text x="1256" y="414" font-size="15" font-weight="700" fill="#0b0b0b">k8s VMs · 10.0.20.0/24</text>
-  <text x="1256" y="434" font-size="12.5" fill="#52514e">vmbr1 tag 20 · pod egress SNATs</text>
-  <text x="1256" y="450" font-size="12.5" fill="#52514e">to node IPs</text>
-  <rect x="1256" y="464" width="258" height="66" rx="6" fill="#ffffff" stroke="#1baf7a"/>
-  <text x="1268" y="486" font-size="13.5" font-weight="700" fill="#0b0b0b">Frigate · k8s-node1 (T4)</text>
-  <text x="1268" y="504" font-size="12" fill="#52514e">detect sub / record main</text>
-  <text x="1268" y="520" font-size="12" fill="#52514e">gpumem budget 2300 MiB</text>
-  <rect x="1256" y="540" width="258" height="52" rx="6" fill="#ffffff" stroke="#1baf7a"/>
-  <text x="1268" y="562" font-size="13.5" font-weight="700" fill="#0b0b0b">go2rtc LB 10.0.20.204</text>
-  <text x="1268" y="580" font-size="12" fill="#52514e">restream → HA live view (MSE/HLS)</text>
-
-  <!-- HOME LAN zone -->
-  <rect x="1148" y="128" width="412" height="180" rx="10" fill="#2a78d6" fill-opacity="0.06" stroke="#2a78d6" stroke-opacity="0.4"/>
-  <text x="1164" y="154" font-size="13" font-weight="700" fill="#52514e" letter-spacing="1">HOME LAN 192.168.1.0/24</text>
-  <rect x="1164" y="168" width="180" height="56" rx="6" fill="#ffffff" stroke="#2a78d6"/>
-  <text x="1176" y="190" font-size="13.5" font-weight="700" fill="#0b0b0b">AX6000 · .1</text>
-  <text x="1176" y="208" font-size="11.5" fill="#52514e">+ route 10.0.30.0/24 → .2</text>
-  <rect x="1164" y="236" width="180" height="52" rx="6" fill="#ffffff" stroke="#2a78d6"/>
-  <text x="1176" y="258" font-size="13.5" font-weight="700" fill="#0b0b0b">ha-sofia · .8</text>
-  <text x="1176" y="275" font-size="11.5" fill="#52514e">Frigate card + hikvision_next</text>
-  <rect x="1360" y="168" width="184" height="56" rx="6" fill="#ffffff" stroke="#2a78d6"/>
-  <text x="1372" y="190" font-size="13.5" font-weight="700" fill="#0b0b0b">apartment clients</text>
-  <text x="1372" y="208" font-size="11.5" fill="#52514e">laptops, phones</text>
-  <rect x="1360" y="236" width="184" height="52" rx="6" fill="#ffffff" stroke="#52514e" stroke-dasharray="5,4"/>
-  <text x="1372" y="256" font-size="11.5" font-weight="700" fill="#52514e">CAMERA DAY: static route</text>
-  <text x="1372" y="272" font-size="11.5" fill="#52514e">10.0.30.0/24 via 192.168.1.2</text>
-
-  <path d="M1254,308 C1150,352 950,372 790,400" fill="none" stroke="#2a78d6" stroke-width="2" opacity="0.6"/>
-  <text x="1010" y="374" font-size="12" fill="#2a78d6">apartment uplink · switch P1 · trunk · eno1</text>
-
-  <!-- FLOWS -->
-  <path d="M1256,497 C1010,690 330,730 120,650 C40,618 40,380 96,286" fill="none" stroke="#008300" stroke-width="3" marker-end="url(#arrGreen)"/>
-  <text x="620" y="700" font-size="13.5" font-weight="700" fill="#008300">ALLOW · Frigate → camera RTSP :554 (routed k8s → dCCTV; opt1 allow-all)</text>
-
-  <path d="M1164,262 C820,282 470,268 302,176 C286,167 278,166 270,172" fill="none" stroke="#008300" stroke-width="3" marker-end="url(#arrGreen)"/>
-  <text x="484" y="216" font-size="13.5" font-weight="700" fill="#008300">ALLOW · ha-sofia → camera :80 ISAPI + :554</text>
-  <text x="484" y="234" font-size="12" fill="#52514e">enters pfSense WAN · reply-to off · needs the AX6000 route</text>
-
-  <path d="M280,232 C660,200 860,320 936,386" fill="none" stroke="#008300" stroke-width="2" opacity="0.85" marker-end="url(#arrGreen)"/>
-  <text x="740" y="322" font-size="12.5" font-weight="700" fill="#008300">ALLOW · camera → 10.0.30.1:123 (NTP)</text>
-
-  <!-- LEGEND -->
-  <g transform="translate(40,800)" font-size="12.5">
-    <rect x="0" y="0" width="18" height="18" rx="4" fill="#2a78d6" fill-opacity="0.12" stroke="#2a78d6"/>
-    <text x="26" y="14" fill="#0b0b0b">home LAN / VLAN 1</text>
-    <rect x="200" y="0" width="18" height="18" rx="4" fill="#4a3aa7" fill-opacity="0.12" stroke="#4a3aa7" stroke-width="2"/>
-    <text x="226" y="14" fill="#0b0b0b">CCTV / VLAN 30 / dCCTV 10.0.30.0/24</text>
-    <rect x="500" y="0" width="18" height="18" rx="4" fill="#1baf7a" fill-opacity="0.12" stroke="#1baf7a"/>
-    <text x="526" y="14" fill="#0b0b0b">dKubernetes</text>
-    <rect x="640" y="0" width="18" height="18" rx="4" fill="#eda100" fill-opacity="0.14" stroke="#eda100"/>
-    <text x="666" y="14" fill="#0b0b0b">dManagementsVms</text>
-    <line x1="820" y1="9" x2="860" y2="9" stroke="#008300" stroke-width="3" marker-end="url(#arrGreen)"/>
-    <text x="870" y="14" fill="#0b0b0b">allowed flow</text>
-    <line x1="980" y1="9" x2="1020" y2="9" stroke="#e34948" stroke-width="3" marker-end="url(#arrRed)"/>
-    <text x="1030" y="14" fill="#0b0b0b">denied</text>
-    <line x1="1100" y1="9" x2="1140" y2="9" stroke="#52514e" stroke-width="2" stroke-dasharray="6,5"/>
-    <text x="1150" y="14" fill="#0b0b0b">camera-day step</text>
-    <text x="1320" y="14" fill="#52514e">ADR-0017 · rev 3</text>
-  </g>
-</svg>
--- a/docs/adr/0017-cctv-vlan-tagging.excalidraw
+++ b/docs/adr/0017-cctv-vlan-tagging.excalidraw
--- a/docs/adr/0017-cctv-vlan-tagging.svg
+++ b/docs/adr/0017-cctv-vlan-tagging.svg
--- a/docs/adr/0018-valia-sites-off-infra-pages-in-cluster-sync.md
+++ b/docs/adr/0018-valia-sites-off-infra-pages-in-cluster-sync.md
@ -1,47 +0,0 @@
-# Valia sites are served off-infra (Cloudflare Pages), synced in-cluster
-
-Valia (Viktor's mother) authors small one-page static sites in Google Drive folders she
-shares, and keeps asking for them to be hosted — two exist already (`stem95su`, `bridge`)
-and more are expected. We decided all **Valia sites** are served **off-infra on Cloudflare
-Pages** under `<english-name>.viktorbarzin.me`, kept fresh by **one shared in-cluster
-CronJob** (`stacks/valia-sites/`) that mirrors each **Content folder** every 10 minutes
-(rclone, drive.readonly) and re-deploys only on change (wrangler direct upload). The
-existing in-cluster `stem95su` serving stack (nginx + NFS + ingress + per-site sync)
-migrates onto this and is retired.
-
-Why off-infra serving: these are her sites, shown to teachers/parents — they must survive
-homelab outages (cf. the 2026-06-27 egress incident that took every proxied in-cluster
-site down). With Pages, a homelab outage degrades to "content frozen until we're back",
-never "site down". Serving costs no cluster resources and no per-site nginx/PVC/ingress/
-Anubis. Why the syncer stays in-cluster anyway: secrets stay in Vault (no per-site GHA
-secret sprawl), and the stem95su guard patterns (hard-fail on Drive auth errors, never
-wipe a live site on an empty/partial folder, capped deletes) carry over wholesale. The
-deliberate asymmetry — off-infra serving, on-infra syncing — is the point, not an
-accident.
-
-## Considered options
-
- **In-cluster everywhere** (generalise stem95su into a factory module): one roof, no
-  Cloudflare Pages dependency — but her sites share the homelab's fate and each site
-  spends cluster resources to serve static files a free CDN serves better.
- **Pages for new sites only**: less work now, two patterns and two runbooks forever.
- **GHA-scheduled sync** (fully off-infra pipeline): no cluster dependency at all, but
-  Drive + Cloudflare credentials would live as GitHub secrets per repo, outside Vault.
-
-## Consequences
-
- Registration is one entry in the `sites` map (name, Content folder, optional Entry
-  file); CI applies Pages project, custom domain, public CNAME, and internal-DNS config
-  together. Names are English, picked by Viktor (most → bridge set the precedent).
- The internal split-horizon zone learns Valia sites from a ConfigMap the
-  `technitium-ingress-dns-sync` script consumes — declaratively, including **removal**
-  (the previous static-CNAME approach was add-only; a retired site left a stale record).
- Deploy-on-change is mandatory, not an optimisation: Pages caps monthly deployments on
-  the free tier, and a 10-minute cadence would burn ~4,300/month if unchanged runs
-  deployed.
- Failure visibility is **failed-Job-only** by explicit choice (no stale-sync alert, no
-  per-site uptime monitors, no notifications to Valia) — Viktor fields "it didn't
-  update" reports, consistent with the alert-noise-reduction posture. Revisit if a
-  silent stall actually bites.
- If the homelab is down, content updates pause; the sites keep serving last-deployed
-  content. Accepted degradation.
--- a/docs/adr/0019-backup-mx-self-hosted-oracle-relay.md
+++ b/docs/adr/0019-backup-mx-self-hosted-oracle-relay.md
@ -1,97 +0,0 @@
-# Inbound mail gets a self-hosted store-and-forward backup MX on Oracle Always-Free
-
-`viktorbarzin.me` has run a single direct MX to the home IP since the 2026-04-12
-inbound overhaul, with sender-MTA retry (1–5 days, sender-dependent) as the only
-outage protection — a documented "No Backup MX" decision made after ForwardEmail's
-forced anti-spoofing rejected legitimate forwarded mail and Cloudflare Email
-Routing proved pass-through-only. Viktor now wants inbound mail to survive
-homelab outages **without loss** (2026-07-04): delayed delivery is fine,
-mid-outage reading is not required, and the budget is **$0** — a hard
-constraint that eliminated every managed option (see below).
-
-We run a minimal **Postfix store-and-forward relay on an Oracle Cloud
-Always-Free `VM.Standard.E2.1.Micro`** (`mx2.viktorbarzin.me`, **reserved**
-public IP, MX preference 20; primary untouched at 1). It accepts everything
-for the domain (catch-all — every RCPT is valid; reputation may only ever
-4xx-defer, via postscreen pregreet + conservative DNSBL-defer on the VM —
-never 5xx: a backup MX that hard-rejects manufactures the loss it exists to
-prevent), queues up to **30 days** (bounce lifetime 1 day — the VM can never
-deliver a DSN, its only egress is the drain), and drains to the primary over
-**port 2526** — one scripted pfSense WAN NAT rule onto the existing HAProxy
-frontend — because Oracle blocks egress TCP 25 tenancy-wide. Management is
-tailnet-only (headscale preauth key, `tag:backup-mx`; OCI console as
-mid-outage break-glass since headscale itself lives in the cluster); TLS via
-certbot HTTP-01 (port 80 permanently open — LE validation is
-multi-perspective and unscopeable); the VM is a cattle-rebuild from a new
-`stacks/backup-mx/` Terraform stack (OCI provider + cloud-init, which must
-also punch 25/80 through the OCI Ubuntu image's OS-level iptables REJECT).
-On the primary, the drain stream (one /32) is enabled at the layers that
-actually bite — `check_client_access` permits past
-`reject_unknown_client_hostname` and spoof-protection, an anvil rate-limit
-exception, and rspamd `external_relay` (score against the *original* sender
-IP) with the reject action capped to tag/fold so drained spam can never force
-the VM to emit backscatter. Go-live is gated on empirical checks: inbound-25
-reachability (recurring probe — Oracle publishes no commitment), drain
-end-to-end, and a live failover test that includes a high-spam-score and a
->10 MB message. Two independent adversarial reviews (2026-07-04) shaped this
-final form. Design:
-[`plans/2026-07-04-backup-mx-design.md`](../plans/2026-07-04-backup-mx-design.md).
-
-## Considered options
-
- **Roller Network free Secondary MX** — v1 of this decision, killed at the
-  validation gates the same day: free tier caps at 200 relayed messages or
-  10 MB per rolling 7 days, and overage suspends the domain for 48 h
-  answering **SMTP 5xx** (permanent bounces) — since spammers target backup
-  MXes even while the primary is up, background spam alone can hold it
-  suspended, making it *worse than no backup MX*. Free accounts are also
-  being discontinued. (Their TLS checked out; their paid Basic at $30/yr is
-  the documented fallback if the OCI route sours.)
- **Dynu Email Backup ($9.99/yr)** — queue lifetime undocumented (FAQ hints
-  12–24 h, barely beating sender retry); filtering black-box; not free.
- **Cloudflare Email Routing / mailflare** — no store-and-forward / terminal
-  inbox on Cloudflare; rejected earlier (2026-04-12; 2026-07-04 memory #7148).
- **Other free tiers** (challenged and re-verified 2026-07-04): GCP e2-micro
-  blocks egress 25 too and its free regions are US-only; AWS's 2025+ "free"
-  plan is a 6-month credit; Azure has no always-free VM and blocks 25;
-  Hetzner has no free tier; Fly.io ended free allowances; Vultr/Linode are
-  trial credits; DNSExit/KisoLabs/DuoCircle backup-MX are paid or dead. OCI
-  is the only standing free option.
- **Harden-only** (5xx-misconfig guards + paging) — does not address
-  multi-day outages or short-retry senders; deferred as a complementary
-  track.
-
-## Consequences
-
- **A pet outside the cluster** — deliberately cattle: rebuilt entirely from
-  Terraform + cloud-init, patched by unattended-upgrades, scraped by the
-  cluster's Prometheus (exporters on the reserved public IP, allowlisted to
-  the homelab WAN /32 — there is **no cluster→tailnet route**, so tailnet
-  scraping was rejected as fictional; blackbox TCP:25 + MX-set drift alerts
-  besides). Never a backup target itself.
- **Oracle free-tier caprice is the top risk**: Oracle silently halved the A1
-  free allowance in June 2026 and terminated over-limit instances, and
-  publishes no commitment that inbound 25 stays open. Mitigations:
-  **Pay-As-You-Go conversion is a required prerequisite** (exempts idle
-  reclamation, stays $0), a recurring inbound-25 probe, `BackupMxDown`, and
-  the queue being empty outside outages (a surprise reclamation loses
-  coverage, never mail). Home region is fixed at signup — Frankfurt, chosen
-  once.
- The drain stream bypasses `reject_unknown_client_hostname`, anvil limits,
-  and rspamd's reject tier for one /32; DKIM verification, SPF/DMARC (against
-  the original IP via `external_relay`), and content scoring stay on — spam
-  arriving via the backup is tagged and folded to Junk, never bounced. The VM
-  is deliberately NOT in the primary's `mynetworks` (a compromised VM must
-  not relay through us).
- **Outages > 30 days lose queued mail silently** — no DSN can ever leave the
-  VM. Stated and accepted (6× better than the status quo).
- Outage mail sits in plaintext on Oracle disk ≤ 30 days — single-tenant but
-  off-premises; accepted (same class as Brevo holding outbound today).
- Cloudflare zone lands at 197/200 records; the MTA-STS follow-up (policy
-  host found dangling during design — inert today; must list `mx2` when
-  fixed) needs 1–2 more → schedule the next record purge proactively.
- `architecture/mailserver.md` §"No Backup MX" superseded at implementation;
-  new runbook `docs/runbooks/backup-mx.md` (incl. OCI console break-glass);
-  `vpn.md`'s stale headscale claims fixed in passing; the roundtrip probe's
-  failure semantics change (a "failing" probe may now mean "delayed via mx2,
-  drains shortly" — noted in alert description).
--- a/docs/architecture/chrome-service.md
+++ b/docs/architecture/chrome-service.md
@ -329,12 +329,6 @@ Two independent grants make up "browser access" for a user:
 the provisioner. To revoke: remove from `CHROME_ALLOWED` and delete the SA (rotate
 a token by deleting its `<user>-browser-token` Secret).

-Because the SA is the user's DEFAULT kubectl credential, other per-namespace
-port-forward grants hang off the same identity: `stacks/excalidraw/rbac.tf`
-grants `emo-browser` `pods/portforward` in `excalidraw` (2026-07-02) so emo's
-agent can upload drawings via the port-forward + `X-Authentik-Username` recipe
-in his `~/.claude/CLAUDE.md`. Revoking the SA revokes those too.
-
 ## Limits + risks

 - **Anti-bot vs stealth arms race** — when an upstream beats us (DRM
--- a/docs/architecture/ci-cd.md
+++ b/docs/architecture/ci-cd.md
@ -94,7 +94,7 @@ can't reach Forgejo's public hairpin.
 | Visibility | Packages | Pull mechanism |
 |------------|----------|----------------|
 | **Public** | beadboard, nextcloud-todos, claude-agent-service, claude-memory-mcp, kms-website, freedify, tuya_bridge, x402-gateway, chrome-service-novnc, android-emulator | Anonymous |
-| **Private** | f1-stream, job-hunter, instagram-poster, payslip-ingest, wealthfolio-sync, fire-planner, recruiter-responder, tripit, infra-cli, infra-ci, k8s-portal, excalidraw-library | `ghcr-credentials` dockerconfigjson |
+| **Private** | f1-stream, job-hunter, instagram-poster, payslip-ingest, wealthfolio-sync, fire-planner, recruiter-responder, tripit, infra-cli, infra-ci | `ghcr-credentials` dockerconfigjson |

 Private-image pulls use the `ghcr-credentials` dockerconfigjson, cloned by the
 kyverno stack's `sync-ghcr-credentials` ClusterPolicy to an explicit
@ -188,8 +188,6 @@ reconciled — the workflows were added to the GitHub lineage via PR):
 | android-emulator | `build-android-emulator.yml` | public `ghcr.io/viktorbarzin/android-emulator` |
 | infra CLI | `build-cli.yml` | DockerHub `viktorbarzin/infra` (kept) + `ghcr.io/viktorbarzin/infra-cli` |
 | infra-ci | `build-infra-ci.yml` | private `ghcr.io/viktorbarzin/infra-ci` |
-| k8s-portal | `build-k8s-portal.yml` | private `ghcr.io/viktorbarzin/k8s-portal` (Keel rolls `:latest` digests) |
-| excalidraw-library | `build-excalidraw.yml` | private `ghcr.io/viktorbarzin/excalidraw-library` (Keel rolls `:latest` digests; DockerHub `:v4` frozen as rollback) |

 **`infra-ci`** is the image the `.woodpecker/default.yml` apply step and
 `drift-detection.yml` run in (proven by pipelines 165/166). `chatterbox-tts` is
--- a/docs/architecture/dns.md
+++ b/docs/architecture/dns.md
@ -277,7 +277,7 @@ Technitium's **Split Horizon AddressTranslation** app post-processes DNS respons

 Config is synced to all 3 Technitium instances by CronJob `technitium-split-horizon-sync` (every 6h).

-**Superset rule for the internal `viktorbarzin.me` zone**: it is authoritative for every internal client (pods included since 2026-06-10), so it must carry every record type those clients consume — not just ingress A/CNAMEs. The `technitium-ingress-dns-sync` CronJob therefore also maintains the static **mail-auth records** (apex SPF + brevo-code TXT, MX → mail.viktorbarzin.me, `_dmarc`, `mail._domainkey` DKIM), mirrored from the public Cloudflare zone. Without them, rspamd on the mailserver saw `SPF=none` for inbound `@viktorbarzin.me` mail and quarantined it (broke the Brevo email-roundtrip probe, 2026-06-10). If these records change in Cloudflare, update the sync script too. **Off-infra Valia sites** (Cloudflare Pages, ADR-0018) are the other class of public-only names with no Traefik ingress — without internal records they NXDOMAIN for every internal client while working fine externally. Since 2026-07-03 they are reconciled **declaratively**: `stacks/valia-sites` writes the ConfigMap `valia-sites-dns` (technitium ns, `<name> → <project>.pages.dev`), and the sync script ensures/updates a CNAME per entry and **deletes** stale internal CNAMEs targeting `*.pages.dev` that left the map (retire/rename cleans itself up; deletion is suffix-scoped so nothing else can be touched).
+**Superset rule for the internal `viktorbarzin.me` zone**: it is authoritative for every internal client (pods included since 2026-06-10), so it must carry every record type those clients consume — not just ingress A/CNAMEs. The `technitium-ingress-dns-sync` CronJob therefore also maintains the static **mail-auth records** (apex SPF + brevo-code TXT, MX → mail.viktorbarzin.me, `_dmarc`, `mail._domainkey` DKIM), mirrored from the public Cloudflare zone. Without them, rspamd on the mailserver saw `SPF=none` for inbound `@viktorbarzin.me` mail and quarantined it (broke the Brevo email-roundtrip probe, 2026-06-10). If these records change in Cloudflare, update the sync script too.

 ## NodeLocal DNSCache

@ -368,7 +368,6 @@ The Cloudflare tunnel uses a **wildcard rule** (`*.viktorbarzin.me → Traefik`)
 | TXT (MTA-STS) | 1 | `v=STSv1; id=20260412` | TLS enforcement |
 | TXT (TLSRPT) | 1 | `v=TLSRPTv1; rua=mailto:postmaster@...` | TLS reporting |
 | A (keyserver) | 1 | `130.162.165.220` (Oracle VPS) | PGP keyserver |
-| CNAME (CF Pages) | 2 | `<project>.pages.dev` (Cloudflare Pages) | bridge, stem95su — Valia sites (ADR-0018), managed by `stacks/valia-sites` |

 ### Proxied vs Non-Proxied

@ -514,7 +513,6 @@ For external `.viktorbarzin.me` records:
 1. Add `dns_type = "proxied"` (or `"non-proxied"`) to the `ingress_factory` module call in the service stack
 2. Run `scripts/tg apply` on the service stack — DNS record is auto-created
 3. For non-standard records (MX, TXT), add a `cloudflare_record` resource in `stacks/cloudflared/modules/cloudflared/cloudflare.tf`
-4. For a Valia site (off-infra Cloudflare Pages), add the entry to `local.sites` in `stacks/valia-sites/main.tf` — public CNAME + internal record both follow (`docs/runbooks/valia-sites.md`)

 ## Incident History

--- a/docs/architecture/mailserver.md
+++ b/docs/architecture/mailserver.md
@ -161,17 +161,6 @@ https://mail.viktorbarzin.me → Traefik → Roundcubemail
  DB: MySQL (mysql.dbaas.svc.cluster.local)
 ```

-### Paperless ingest mailbox (docs@)
-
-`docs@viktorbarzin.me` is a dedicated real mailbox (explicit self-alias in
-`extra/aliases.txt` so the `@domain → spam@` catch-all doesn't shadow it) that
-paperless-ngx polls over IMAP; family members forward document emails to it
-and the sender maps 1:1 to a paperless account. A per-user Dovecot sieve
-(`docs-at-viktorbarzin.me.dovecot.sieve` in the `mailserver.config` ConfigMap,
-mounted as `/tmp/docker-mailserver/docs@viktorbarzin.me.dovecot.sieve`)
-discards mail from non-allowlisted senders at delivery. Full flow, sender map,
-and add-a-sender procedure: [`runbooks/paperless-mail-ingest.md`](../runbooks/paperless-mail-ingest.md).
-
 ## DNS Records

 All managed in Terraform at `stacks/cloudflared/modules/cloudflared/cloudflare.tf`.
@ -311,21 +300,6 @@ Push secrets (`BREVO_API_KEY`, `EMAIL_MONITOR_IMAP_PASSWORD`) come from External

 ## Troubleshooting

-### All mail tempfailing with `451 4.3.0 queue file write error` (postsrsd spin)
-
-Seen 2026-07-03 right after a pod restart. Signature in `/var/log/mail/mail.log`:
-`postfix/cleanup: warning: tcp:localhost:10001 lookup error` +
-`sender_canonical_maps map lookup problem ... message not accepted, try again later`.
-Cause: **postsrsd** (SRS daemon, `sender_canonical_maps = tcp:localhost:10001`)
-came up spinning at 100% CPU without binding 10001/10002 — supervisor shows it
-`RUNNING` but `ss -ltn | grep 1000` is empty and its log is empty. Postfix then
-tempfails every message (inbound AND submission); senders retry so nothing is
-lost, and the roundtrip probe alerts within the hour.
-Fix: `supervisorctl restart postsrsd` inside the container; if the fresh
-process spins again (it did once), `kubectl -n mailserver delete pod` for a
-full re-init — that healed it. Root cause not pinned down (one-off bad init;
-postsrsd 1.10).
-
 ### Inbound mail not arriving
 1. **DNS/MX**: `dig MX viktorbarzin.me +short` → should show `mail.viktorbarzin.me`
 2. **WAN reachability**: `nc -zw5 mail.viktorbarzin.me 25` from outside
--- a/docs/architecture/networking.md
+++ b/docs/architecture/networking.md
@ -1,10 +1,10 @@
 # Networking Architecture

-Last updated: 2026-07-02 (dCCTV segment added — dedicated pfSense leg for the garage camera, ADR-0017)
+Last updated: 2026-04-19 (WS E — Kea DHCP pushes dual DNS per subnet; Kea DDNS TSIG-signed)

 ## Overview

-The homelab network is built on three isolated segments behind pfSense (management VLAN 10, Kubernetes VLAN 20, and the physically-legged dCCTV camera segment — see ADR-0017) with pfSense providing gateway services, Technitium for internal DNS, and Cloudflare for external DNS. Traefik serves as the Kubernetes ingress controller with a middleware chain of anti-AI bot-blocking, Authentik forward-auth, rate limiting, and retry. CrowdSec IP-reputation enforcement is **out-of-band** (not a Traefik hop): banned IPs are dropped in-kernel via nftables on direct hosts and blocked at the Cloudflare edge on proxied hosts (see `docs/architecture/security.md`). All HTTP traffic flows through Cloudflared tunnels, avoiding the need for port forwarding or exposing public IPs.
+The homelab network is built on a dual-VLAN architecture with pfSense providing gateway services, Technitium for internal DNS, and Cloudflare for external DNS. Traefik serves as the Kubernetes ingress controller with a middleware chain of anti-AI bot-blocking, Authentik forward-auth, rate limiting, and retry. CrowdSec IP-reputation enforcement is **out-of-band** (not a Traefik hop): banned IPs are dropped in-kernel via nftables on direct hosts and blocked at the Cloudflare edge on proxied hosts (see `docs/architecture/security.md`). All HTTP traffic flows through Cloudflared tunnels, avoiding the need for port forwarding or exposing public IPs.

 ## Architecture Diagram

@ -24,14 +24,9 @@ graph TB

    CSdrop[CrowdSec drop<br/>nftables / CF edge<br/>out-of-band, pre-Traefik]

-    subgraph "Proxmox Host (eno1, eno2)"
+    subgraph "Proxmox Host (eno1)"
        vmbr0[vmbr0 Bridge<br/>192.168.1.127/24]
        vmbr1[vmbr1 Internal<br/>VLAN-aware]
-        vmbr2[vmbr2 Bridge<br/>eno2 → TL-SG105PE]
-
-        subgraph "dCCTV - 10.0.30.0/24<br/>ADR-0017"
-            Camera[vermont-garage<br/>10.0.30.70]
-        end

        subgraph "VLAN 10 - Management<br/>10.0.10.0/24"
            Proxmox[Proxmox Host<br/>10.0.10.1]
@ -76,9 +71,6 @@ graph TB
    vmbr1 -.VLAN 20.- Tech
    vmbr1 -.VLAN 20.- Master
    vmbr1 -.VLAN 20.- Node1
-    vmbr2 -.physical link.- eno2
-    vmbr2 -.untagged.- Camera
-    vmbr2 -.pfSense net3 = dCCTV 10.0.30.1.- pfSense
 ```

 ## Components
@ -89,7 +81,6 @@ graph TB
 | phpIPAM | v1.7.0 | phpipam.viktorbarzin.me | IP address management, device inventory, DNS sync |
 | vmbr0 | Linux bridge | 192.168.1.127/24 | Physical bridge on eno1, uplink to LAN |
 | vmbr1 | Linux bridge (VLAN-aware) | Internal | VLAN trunk for VM isolation |
-| vmbr2 | Linux bridge | Physical (eno2) | DORMANT fallback leg for dCCTV (ADR-0017 rev 3) — live dCCTV rides vmbr0 tag 30 over the LAN1 trunk |
 | Technitium DNS | Container | 10.0.20.201 (LB) / 10.96.0.53 (ClusterIP) | Internal DNS (viktorbarzin.lan) + full recursive resolver |
 | Cloudflare DNS | SaaS | External | ~50 public domains under viktorbarzin.me |
 | Cloudflared | Container | K8s (3 replicas) | Tunnel ingress, replaces port forwarding |
@ -99,22 +90,6 @@ graph TB
 | MetalLB | v0.15.3 Helm chart | K8s | LoadBalancer IPs (10.0.20.200-10.0.20.220), all services on 10.0.20.200 |
 | Registry Cache | Container | 10.0.20.10 | Pull-through for docker.io:5000, ghcr.io:5010 |

-## CCTV Segment (dCCTV) — as-built 2026-07-02
-
-Isolated camera segment for owned cameras at the Sofia site (first: `vermont-garage`, HiLook IPC-T241H-C at the garage entrance). Decision + rejected alternatives: `docs/adr/0017-cctv-segment-dedicated-pfsense-leg.md`.
-
-**Physical path (rev 3, single switch)**: camera → TL-SG105PE PoE port (untagged VLAN 30) → trunk port (home LAN untagged + CCTV **tagged 30**) → the existing LAN1 cable → R730 `eno1` → `vmbr0` (vlan-aware) → pfSense `net3`/vtnet3 = `vmbr0 tag=30` = interface **dCCTV `10.0.30.1/24`**. The TL-SG105PE **replaces** the old garage TL-SG105E (retired to cold spare) and carries everything: apartment uplink, 4G router `192.168.1.7`, UPS mgmt (VLAN 1), camera (VLAN 30), trunk — all 5 ports used. VLAN-30 membership is {camera port, trunk port} only, so tagged injection from other ports is dropped. `eno2`/`vmbr2` remain dormant as the fallback physical leg (rev 2).
-
-**Addressing**: Kea DHCP pool `10.0.30.100-199`; devices get MAC reservations (camera `10.0.30.70`; the PE switch mgmt inherits the retired switch's `192.168.1.6` on the home LAN). Kea DDNS auto-registers names in Technitium; `phpipam-pfsense-import` picks up leases hourly.
-
-**Firewall** (all on pfSense):
- dCCTV in: pass `udp OPT4-net → 10.0.30.1:123` (NTP) — everything else hits the interface's default deny. Cameras cannot reach LAN, other segments, or the internet.
- WAN in (home LAN side): pass `192.168.1.8` (ha-sofia) → `10.0.30.70:80` (ISAPI/hikvision_next) and `:554` (RTSP), reply-to disabled on both.
- dKubernetes is allow-all, so cluster Frigate/go2rtc pulls RTSP with no extra rule (pod egress SNATs to node IPs).
- Home-LAN clients need the **AX6000 static route** `10.0.30.0/24 via 192.168.1.2` (camera-day step) to reach the camera UI.
-
-**Consumers**: cluster Frigate (`/srv/nfs/frigate/config/config.yml` — NOT Terraform) pulls `rtsp://10.0.30.70:554` main+sub as `vermont-garage`; HA integrates via Frigate plus direct hikvision_next for tamper events.
-
 ## IPAM & DNS Auto-Registration

 Devices are automatically discovered, named, and registered in DNS without manual intervention.
@ -232,8 +207,6 @@ VMs tag traffic on vmbr1 to isolate workloads. pfSense bridges VLAN 20 to the up
  - blog, hackmd, privatebin, url, echo, f1tv, excalidraw, send, audiobookshelf, jsoncrack, ntfy, cyberchef, homepage, linkwarden, changedetection, tandoor, n8n, stirling-pdf, dashy, city-guesser, travel, netbox
 - **Non-proxied domains** (grey cloud, direct IP resolution):
  - mail, wg, headscale, immich, calibre, vaultwarden, and other services requiring direct connections
- **Internal-IP domains** (grey cloud, A → `10.0.20.203` Traefik LB, `ingress_factory` `dns_type = "internal"`):
-  - highlights-immich, highlights-immich-emo — publicly *resolvable* but only *routable* from home LANs / WG sites / VPN (spokes policy-route `10.0.0.0/8` down the tunnel, so kiosk devices with baked-in URLs need no per-site DNS overrides). The record is reachability, not a gate — enforcement is the `home-lans-only` Traefik ipAllowList (Sofia/London/Valchedrym LANs + 10/8) on the ingress. See `docs/plans/2026-07-04-immich-frame-lan-only-design.md`.
 - CNAME records for proxied domains point to Cloudflared tunnel FQDNs

 ### Ingress Flow
@ -288,7 +261,7 @@ Traefik chain:

 1. **Anti-AI bot-block** (`ai-bot-block` ForwardAuth, on by default via `ingress_factory`): blocks/tarpits known AI crawlers. **Fail-open** (currently a no-op `return 200` — poison-fountain scaled to 0; see `docs/architecture/security.md`).
 2. **Authentik Forward-Auth** (if `protected = true`): SSO authentication via OIDC. Non-authenticated users are redirected to login. Auth headers are stripped before forwarding to backend.
-3. **Rate Limiting**: Per-IP throttling. Returns **429 Too Many Requests** (not 503) when limit exceeded. Default is `rate-limit` (average 10 req/s, burst 50). Services whose clients legitimately burst harder get a dedicated middleware via `skip_default_rate_limit = true` + `extra_middlewares`: Immich (`immich-rate-limit`, 1000/20000, photo uploads), ActualBudget (`actualbudget-rate-limit`, 50/300 — the Actual web app boots with ~70 parallel asset/migration revalidations; the default burst 429'd the tail and stalled every page load), authentik (`authentik-rate-limit`, 100/1000, on `/` and `/static` — the login SPA cold-loads ~70 flow-executor JS/CSS chunks from `/static`; the default burst 429'd the tail and a failed ES-module import left a blank login screen for cold/incognito/NAT-shared clients), tripit (`tripit-rate-limit`, 100/1000, photo-tab thumbnail bursts), health (`health-rate-limit`, 100/1000, SPA shell + API burst per page), and dawarich (`dawarich-rate-limit`, 100/1000 — the Rails app self-serves all fingerprinted assets and the map adds an API burst per load; the default burst 429'd the asset tail and risked dropping OwnTracks/mobile location POSTs on the same host).
+3. **Rate Limiting**: Per-IP throttling. Returns **429 Too Many Requests** (not 503) when limit exceeded. Default is `rate-limit` (average 10 req/s, burst 50). Services whose clients legitimately burst harder get a dedicated middleware via `skip_default_rate_limit = true` + `extra_middlewares`: Immich (`immich-rate-limit`, 1000/20000, photo uploads), ActualBudget (`actualbudget-rate-limit`, 50/300 — the Actual web app boots with ~70 parallel asset/migration revalidations; the default burst 429'd the tail and stalled every page load), and authentik (`authentik-rate-limit`, 100/1000, on `/` and `/static` — the login SPA cold-loads ~70 flow-executor JS/CSS chunks from `/static`; the default burst 429'd the tail and a failed ES-module import left a blank login screen for cold/incognito/NAT-shared clients).
 4. **Retry**: 2 attempts with 100ms delay on transient failures (5xx errors, connection errors).

 Additional middleware:
@ -579,7 +552,7 @@ chain — a CrowdSec/LAPI outage cannot cause 503s; it only stops new bans.) Che

 **Diagnosis**: Check Traefik middleware config for the affected IngressRoute.

-**Fix**: Give the service a dedicated higher-limit middleware (don't loosen the shared default): define `<service>-rate-limit` in `stacks/traefik/modules/traefik/middleware.tf`, then set `skip_default_rate_limit = true` + `extra_middlewares = ["traefik-<service>-rate-limit@kubernetescrd"]` on its `ingress_factory` call. Shared default is average 10 req/s / burst 50; Immich uses 1000/20000, ActualBudget 50/300, and tripit/health/authentik/dawarich each 100/1000 (SPA or asset-heavy page loads bursting past the default from one client IP).
+**Fix**: Give the service a dedicated higher-limit middleware (don't loosen the shared default): define `<service>-rate-limit` in `stacks/traefik/modules/traefik/middleware.tf`, then set `skip_default_rate_limit = true` + `extra_middlewares = ["traefik-<service>-rate-limit@kubernetescrd"]` on its `ingress_factory` call. Shared default is average 10 req/s / burst 50; Immich uses 1000/20000, ActualBudget 50/300, authentik 100/1000 (login SPA `/static` chunk burst → blank screen).

 ### Large Downloads or Uploads Truncate / Fail Partway

--- a/docs/plans/2026-07-03-vault-token-self-heal-design.md
+++ b/docs/plans/2026-07-03-vault-token-self-heal-design.md
@ -1,103 +0,0 @@
-# Vault Token Renewer Self-Heal Design
-
-**Date**: 2026-07-03
-**Status**: Approved (brainstorm complete; implementation pending)
-**Owner**: wizard@devvm
-**Supersedes**: the "version-only, no self-heal" scope choice recorded in
-`docs/runbooks/vault-token-renew-devvm.md` (2026-06-07)
-
-## Problem
-
-`wizard@devvm` holds a maintenance-free periodic Vault token
-(`token-devvm-wizard`, `period=768h`, renewed daily by the
-`vault-token-renew` user timer) precisely so no weekly re-login is needed.
-But `~/.vault-token` is the Vault CLI's default token sink, so any
-`vault login -method=oidc` — which the infra docs themselves instruct before
-applies — overwrites it with a 7-day OIDC token. The renewer's drift guard
-(deliberately detect-only) then refuses to renew the foreign token and fails
-the unit daily, into a log nobody watches.
-
-Observed consequence: a self-perpetuating weekly-expiry loop. The OIDC token
-expires after 7 days → Vault 403s → the natural response is another
-`vault login -method=oidc` → clobbers again. Drift persisted unnoticed
-2026-06-18 → 06-26 and 2026-06-29 → 07-03 (memory #7121); Viktor experienced
-it as "the token expires maybe once a week".
-
-**Goal**: `vault login -method=oidc` becomes harmless on devvm. The renewer
-converts any admin-capable clobber back into the permanent periodic token,
-unattended. (Chosen over "never log in" doc-fixes and over instant path-unit
-healing — see Alternatives.)
-
-## Decisions
-
-| # | Decision | Notes |
-|---|----------|-------|
-| 1 | Heal in the existing renewer's drift branch, at its nightly run | ~20-line diff to an already-tested script; no new units. A few-hours window holding the 7-day OIDC token is harmless (heal window 24h ≪ 7d TTL) |
-| 2 | Heal = *attempt* re-mint using the foreign token itself; let Vault's 403 decide | No policy-list guessing — identity-vs-token-policies burned us before (memory #4211). OIDC tokens carry `vault-admin` via `identity_policies`, so the create succeeds |
-| 3 | Weak foreign token (create denied) → keep today's loud DRIFT failure | A read-only clobber (e.g. the 2026-06-05 `kubernetes-woodpecker-default` incident) signals a misbehaving agent flow; auto-papering over it would hide the offender. Log gains a "heal denied — investigate what wrote it" suffix |
-| 4 | Do NOT revoke the clobbering OIDC token | It may still back the user's live login session; it ages out in 7 days on its own |
-| 5 | After a successful heal, revoke stale `token-devvm-wizard` accessors | Anti-sprawl: each heal would otherwise strand the previous periodic **admin** token server-side for up to 32 days. Walk `auth/token/accessors`, revoke every `display_name=token-devvm-wizard` except the just-minted one. Runs only on heal (rare), never on the happy path |
-| 6 | Minted-token sanity check before writing the file | Look up the new token; require `display_name=token-devvm-wizard`. Write via temp file + `mv` + `chmod 600` so a failed mint can never truncate `~/.vault-token` |
-| 7 | Keep timer cadence (daily) and all happy-path behavior unchanged | |
-| 8 | No notification plumbing in this change | devvm alerting is tracked separately (beads `code-aslh`). Heal events are logged; heal-denied/FAIL still fail the unit |
-
-## Behavior matrix
-
-| Token found in `~/.vault-token` | Before | After |
-|---|---|---|
-| Our periodic token | renew-self, log `OK` | unchanged |
-| Foreign, admin-capable (OIDC login) | log `DRIFT`, exit 1 | re-mint periodic token with it, sanity-check, atomic write, revoke stale periodic accessors, log `HEALED: re-minted from foreign dn=<dn> (revoked N stale)`, exit 0 |
-| Foreign, weak (read-only k8s clobber) | log `DRIFT`, exit 1 | log `DRIFT … heal denied — foreign token lacks create authority; investigate what wrote it`, exit 1 |
-| Vault unreachable / lookup fails | log `FAIL`, exit 1 | unchanged |
-
-Re-mint command (identical to the manual recovery the DRIFT log already
-prescribes):
-
-```
-vault token create -orphan -period=768h \
-  -policy=vault-admin -policy=sops-admin -display-name=devvm-wizard
-```
-
-## Testing
-
- **Unit** (`scripts/test-vault-token-renew.sh`, existing source-the-functions
-  harness): new pure functions for (a) the stale-accessor revoke filter
-  (match on `display_name`, exclude the current accessor) and (b) the
-  minted-token sanity predicate; regression cases for the existing drift
-  predicate stay green.
- **Live, post-deploy** (on devvm):
-  1. Mint a fake 1h admin token (`-display-name=fake-oidc`,
-     `-policy=vault-admin -policy=sops-admin`), write to `~/.vault-token`,
-     start the service → expect `HEALED`, file holds `token-devvm-wizard`.
-  2. Mint a fake 10m no-privilege token (`-policy=default`), write it, start
-     the service → expect `DRIFT … heal denied`, unit `failed`; restore real
-     token.
-  3. Revoke both fakes; one-off sweep of stale periodic accessors left by the
-     June 26 / July 3 manual re-mints.
-
-## Docs & rollout
-
- Same commit rewrites the runbook's "Drift guard & recovery" section:
-  self-heal is the recovery for admin-capable clobbers; manual re-mint remains
-  only for weak clobbers (or a dead token with no admin-capable replacement in
-  the file).
- `vault login -method=oidc` instructions across the docs stay as-is — the
-  login is now harmless by design.
- Deploy per the runbook's manual model: `install -m 0755` to
-  `~/.local/bin/vault-token-renew`. Units unchanged — no daemon-reload.
- After landing: update memories #4204/#4211 (gotcha now self-healing).
-
-## Alternatives considered
-
- **Instant heal** (systemd path unit + protected source-copy of the token):
-  strictly more capable (seconds-latency, heals weak clobbers too, zero
-  re-minting), but 2 new units + a second secret file + inotify re-trigger
-  edge cases — machinery disproportionate to the residual risk. Revisit only
-  if the few-hour heal window ever bites.
- **Vault CLI `token_helper` interception**: right interception point in
-  theory, but a helper bug breaks every `vault` CLI call, Terraform reads
-  `~/.vault-token` natively anyway, and it adds latency inside login. Rejected.
- **Docs-only ("never log in")**: rejected by user — the login should keep
-  working, not become forbidden knowledge.
- **Raise the OIDC role's 7-day `token_max_ttl`**: shared role, affects every
-  OIDC user; rejected previously for the same reason (memory #4205).
--- a/docs/plans/2026-07-03-vault-token-self-heal-plan.md
+++ b/docs/plans/2026-07-03-vault-token-self-heal-plan.md
@ -1,443 +0,0 @@
-# Vault Token Renewer Self-Heal Implementation Plan
-
-> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
-
-**Goal:** Make `vault login -method=oidc` harmless on devvm — the nightly renewer re-mints the permanent periodic token from any admin-capable clobber of `~/.vault-token`, unattended.
-
-**Architecture:** Extend the drift branch of `scripts/vault-token-renew.sh` (deployed to `~/.local/bin/vault-token-renew`, driven by an existing systemd user timer). On drift, *attempt* the re-mint with the clobbering token itself and let Vault's 403 be the authority; sanity-check the minted token, replace the file atomically, then revoke stale `token-devvm-wizard` leftovers. Weak clobbers keep today's loud failure. Design: `docs/plans/2026-07-03-vault-token-self-heal-design.md`.
-
-**Tech Stack:** bash + jq + vault CLI; existing test harness `scripts/test-vault-token-renew.sh` (sources the script, `vtr_main` is guarded).
-
-**Working copy:** everything below runs in the worktree
-`~/code/infra/.worktrees/vault-token-self-heal` on branch `wizard/vault-token-self-heal`.
-Per repo policy, EVERY git command in this git-crypt repo worktree carries:
-`-c filter.git-crypt.smudge=cat -c filter.git-crypt.clean=cat -c filter.git-crypt.required=false`
-(abbreviated as `$GCFLAGS` below; define once per shell:
-`GCFLAGS="-c filter.git-crypt.smudge=cat -c filter.git-crypt.clean=cat -c filter.git-crypt.required=false"`
-and use it unquoted: `git $GCFLAGS <verb> …`).
-
---
-
-### Task 1: Unit tests for the two new pure functions (RED)
-
-**Files:**
- Modify: `scripts/test-vault-token-renew.sh` (append before the final `printf`/exit lines)
-
- [ ] **Step 1: Append the failing tests**
-
-Insert this block immediately after the existing "parse + decide end-to-end" section (after the line `no "oidc: parse+decide refused" …`, before the final `printf '\n%d passed…'`):
-
-```bash
-# --- vtr_accessor: parse accessor out of lookup JSON ---
-LOOKUP_NEW='{"data":{"display_name":"token-devvm-wizard","accessor":"acc-new","policies":["default","sops-admin","vault-admin"],"identity_policies":null}}'
-eq "accessor parsed"          "acc-new" "$(vtr_accessor "$LOOKUP_NEW")"
-eq "accessor absent -> empty" ""        "$(vtr_accessor '{"data":{"display_name":"x"}}')"
-
-# --- vtr_is_stale_periodic: the heal's revoke filter — ONLY old token-devvm-wizard
-# --- tokens are swept; the just-minted token, foreign tokens, and anything with an
-# --- unknown accessor are kept. An empty keep-accessor sweeps NOTHING (fail-safe).
-STALE_OURS='{"data":{"display_name":"token-devvm-wizard","accessor":"acc-old","policies":["default","sops-admin","vault-admin"]}}'
-ok "older periodic token is stale"      vtr_is_stale_periodic "$STALE_OURS" "acc-new"
-no "the just-minted token is kept"      vtr_is_stale_periodic "$LOOKUP_NEW" "acc-new"
-no "foreign oidc token never swept"     vtr_is_stale_periodic "$LOOKUP_OIDC" "acc-new"
-no "woodpecker token never swept"       vtr_is_stale_periodic "$LOOKUP_WP" "acc-new"
-no "missing accessor never swept"       vtr_is_stale_periodic '{"data":{"display_name":"token-devvm-wizard"}}' "acc-new"
-no "empty keep-accessor sweeps nothing" vtr_is_stale_periodic "$STALE_OURS" ""
-```
-
-(`LOOKUP_OIDC` / `LOOKUP_WP` and the `ok`/`no`/`eq` helpers already exist in the file.)
-
- [ ] **Step 2: Run tests, verify they fail**
-
-Run: `bash scripts/test-vault-token-renew.sh`
-Expected: FAILs / `command not found` for `vtr_accessor` and `vtr_is_stale_periodic`; the 17 pre-existing tests stay green.
-
-### Task 2: Implement the pure functions (GREEN)
-
-**Files:**
- Modify: `scripts/vault-token-renew.sh` (insert after `vtr_drift_ok()`, before `vtr_main()`)
-
- [ ] **Step 1: Add the two functions**
-
-```bash
-# vtr_accessor <lookup-json> -> the token accessor (empty if absent).
-vtr_accessor() {
-  printf '%s' "$1" | jq -r '.data.accessor // ""'
-}
-
-# vtr_is_stale_periodic <lookup-json> <keep-accessor> -> 0 if this lookup
-# describes one of OUR periodic tokens (display name matches) that is NOT the
-# one to keep — i.e. a stale leftover a heal should revoke. 1 otherwise.
-# Name-only on purpose (no policy check): anything named token-devvm-wizard
-# that isn't the current token is garbage from a previous mint. An empty
-# keep-accessor sweeps NOTHING (fail-safe: never revoke when we don't know
-# which token is current).
-vtr_is_stale_periodic() {
-  local dn acc
-  [ -n "${2:-}" ] || return 1
-  dn=$(vtr_display_name "$1")
-  acc=$(vtr_accessor "$1")
-  [ "$dn" = "$EXPECTED_DN" ] || return 1
-  [ -n "$acc" ] || return 1
-  [ "$acc" != "$2" ]
-}
-```
-
- [ ] **Step 2: Run tests, verify all pass**
-
-Run: `bash scripts/test-vault-token-renew.sh`
-Expected: `25 passed, 0 failed`, exit 0.
-
- [ ] **Step 3: Commit**
-
-```bash
-cd ~/code/infra/.worktrees/vault-token-self-heal
-git $GCFLAGS add scripts/vault-token-renew.sh scripts/test-vault-token-renew.sh
-git $GCFLAGS commit -m "vault-token-renew: pure helpers for the self-heal revoke filter
-
-vtr_accessor parses the accessor from lookup JSON; vtr_is_stale_periodic
-decides which old token-devvm-wizard tokens a heal may revoke (never the
-just-minted one, never foreign tokens, nothing when the keeper is unknown).
-TDD red-green for the heal branch that lands next."
-```
-
-### Task 3: The heal branch (`vtr_heal` + `vtr_main` wiring)
-
-**Files:**
- Modify: `scripts/vault-token-renew.sh`
-
- [ ] **Step 1: Add `vtr_heal` after `vtr_is_stale_periodic()`, before `vtr_main()`**
-
-```bash
-# vtr_heal <foreign-dn> <log-file> -> 0 if ~/.vault-token was re-minted back to
-# our periodic admin token using the foreign token's own authority, 1 if the
-# heal was denied or failed (caller exits non-zero; the unit goes failed).
-#
-# Self-heal added 2026-07-03 (docs/plans/2026-07-03-vault-token-self-heal-design.md):
-# an OIDC login — which the infra docs prescribe before applies — clobbers
-# ~/.vault-token with a 7-day token, and detect-only drift left that unnoticed
-# for weeks (the weekly-expiry loop). We ATTEMPT the re-mint with the
-# clobbering token itself and let Vault's authz decide — a read-only clobber
-# (the 2026-06-05 woodpecker incident) is denied the mint and stays a loud
-# failure, because it signals a misbehaving flow that someone should look at.
-vtr_heal() {
-  local foreign_dn="$1" log="$2"
-  local errf new_token new_info new_dn new_pols new_acc tmp
-  errf=$(mktemp)
-  if ! new_token=$(vault token create -orphan -period=768h \
-        -policy=vault-admin -policy=sops-admin -display-name=devvm-wizard \
-        -field=token 2>"$errf") || [ -z "$new_token" ]; then
-    printf '%s DRIFT: ~/.vault-token is dn=%q — heal denied, foreign token lacks create authority (%s); investigate what wrote it. Manual re-mint: vault login -method=oidc && vault token create -orphan -period=768h -policy=vault-admin -policy=sops-admin -display-name=devvm-wizard -field=token > ~/.vault-token && chmod 600 ~/.vault-token\n' \
-      "$(date -Is)" "$foreign_dn" "$(tr '\n' ' ' <"$errf")" >>"$log"
-    rm -f "$errf"
-    return 1
-  fi
-  rm -f "$errf"
-
-  # Sanity: the minted token must itself pass the drift guard before it may
-  # replace ~/.vault-token.
-  if ! new_info=$(VAULT_TOKEN="$new_token" vault token lookup -format=json 2>&1); then
-    printf '%s FAIL: heal minted a token but its lookup failed: %s\n' \
-      "$(date -Is)" "$new_info" >>"$log"
-    return 1
-  fi
-  new_dn=$(vtr_display_name "$new_info")
-  new_pols=$(vtr_policies_csv "$new_info")
-  if ! vtr_drift_ok "$new_dn" "$new_pols"; then
-    printf '%s FAIL: heal minted an unexpected token (dn=%q policies=%q) — not writing it\n' \
-      "$(date -Is)" "$new_dn" "$new_pols" >>"$log"
-    return 1
-  fi
-
-  # Atomic replace: mktemp files are 0600 from birth; same-filesystem mv.
-  tmp=$(mktemp "$HOME/.vault-token.XXXXXX")
-  printf '%s' "$new_token" >"$tmp"
-  mv "$tmp" "$HOME/.vault-token"
-
-  # Anti-sprawl: revoke previous token-devvm-wizard tokens — each heal would
-  # otherwise strand the prior periodic ADMIN token server-side for up to 32d.
-  # The clobbering foreign token is deliberately NOT revoked: it may still back
-  # the user's live login session, and it ages out on its own (7d for OIDC).
-  local sweep="accessor sweep skipped (list denied)" accessors a a_info revoked=0
-  new_acc=$(vtr_accessor "$new_info")
-  if [ -n "$new_acc" ] && accessors=$(VAULT_TOKEN="$new_token" vault list -format=json auth/token/accessors 2>/dev/null); then
-    while IFS= read -r a; do
-      [ -n "$a" ] || continue
-      a_info=$(VAULT_TOKEN="$new_token" vault token lookup -format=json -accessor "$a" 2>/dev/null) || continue
-      if vtr_is_stale_periodic "$a_info" "$new_acc"; then
-        VAULT_TOKEN="$new_token" vault token revoke -accessor "$a" >/dev/null 2>&1 && revoked=$((revoked + 1))
-      fi
-    done < <(printf '%s' "$accessors" | jq -r '.[]')
-    sweep="revoked $revoked stale periodic token(s)"
-  fi
-
-  printf '%s HEALED: re-minted periodic token from foreign dn=%q (%s)\n' \
-    "$(date -Is)" "$foreign_dn" "$sweep" >>"$log"
-}
-```
-
- [ ] **Step 2: Rewire the drift branch in `vtr_main`**
-
-Replace this exact block (comment + if):
-
-```bash
-  # Drift guard (added 2026-06-07): the renewer must NOT keep a FOREIGN token alive.
-  # On 2026-06-05 a stray `vault login -method=kubernetes` overwrote ~/.vault-token
-  # with a read-only woodpecker token, and this script then silently renewed THAT
-  # for two days — masking the loss of write access. So before renewing, confirm
-  # the token is our periodic admin token; if it has drifted, fail loudly (systemd
-  # marks the unit failed) instead of keeping someone else's token alive.
-  if ! vtr_drift_ok "$dn" "$pols"; then
-    printf '%s DRIFT: ~/.vault-token is dn=%q policies=%q (expected dn=%q with %q). Refusing to renew a foreign token. Re-mint: vault login -method=oidc && vault token create -orphan -period=768h -policy=vault-admin -policy=sops-admin -display-name=devvm-wizard -field=token > ~/.vault-token && chmod 600 ~/.vault-token\n' \
-      "$(date -Is)" "$dn" "$pols" "$EXPECTED_DN" "$REQUIRED_POLICY" >>"$log"
-    exit 1
-  fi
-```
-
-with:
-
-```bash
-  # Drift guard (2026-06-07) + self-heal (2026-07-03): the renewer must not
-  # keep a FOREIGN token alive (on 2026-06-05 a stray kubernetes login was
-  # silently renewed for two days, masking lost write access). But detect-only
-  # drift proved worse in practice: an OIDC login — which the infra docs
-  # prescribe before applies — clobbers this file too, and the resulting DRIFT
-  # failures went unnoticed for weeks while access degraded to a 7-day token
-  # (the weekly-expiry loop). On drift we now ATTEMPT to heal (see vtr_heal):
-  # re-mint the periodic token with the clobbering token's own authority.
-  # Vault's authz keeps the old guarantee — a token that couldn't legitimately
-  # hold vault-admin is denied the mint, and we still fail loud.
-  if ! vtr_drift_ok "$dn" "$pols"; then
-    vtr_heal "$dn" "$log" || exit 1
-    exit 0
-  fi
-```
-
- [ ] **Step 3: Syntax + lint + regression check**
-
-Run: `bash -n scripts/vault-token-renew.sh && bash scripts/test-vault-token-renew.sh; command -v shellcheck >/dev/null && shellcheck scripts/vault-token-renew.sh`
-Expected: syntax OK, `25 passed, 0 failed`; shellcheck (if installed) reports nothing new.
-
- [ ] **Step 4: Commit**
-
-```bash
-git $GCFLAGS add scripts/vault-token-renew.sh
-git $GCFLAGS commit -m "vault-token-renew: self-heal the periodic token on admin-capable clobber
-
-Viktor asked for 'vault login -method=oidc' to work seamlessly: the OIDC
-login the docs prescribe kept clobbering ~/.vault-token with a 7-day token,
-and detect-only DRIFT failures went unnoticed for weeks (weekly-expiry
-loop, twice in June). On drift the renewer now re-mints the periodic token
-with the clobbering token's own authority (Vault's 403 is the judge — no
-policy guessing), sanity-checks it, replaces the file atomically, and
-revokes stale token-devvm-wizard leftovers. Weak/read-only clobbers still
-fail loudly on purpose. Design: docs/plans/2026-07-03-vault-token-self-heal-design.md"
-```
-
-### Task 4: Docs — runbook + test-file header
-
-**Files:**
- Modify: `docs/runbooks/vault-token-renew-devvm.md` (the `## Drift guard & recovery` section + the healthy-log-line note + `## Tests`)
- Modify: `scripts/test-vault-token-renew.sh` (header comment only)
-
- [ ] **Step 1: Replace the runbook's `## Drift guard & recovery` section with:**
-
-```markdown
-## Drift guard & self-heal
-
-`~/.vault-token` is the Vault CLI's default token sink, so **any** `vault login`
-overwrites it. Two confirmed clobber vectors:
-
-1. `vault login -method=oidc` → replaces it with a 7-day OIDC token (the renewer
-   can't push past the OIDC role's 7-day `token_max_ttl`). The infra docs
-   prescribe this login before applies, so it recurs — it went unnoticed for
-   weeks twice (2026-06-18→26, 2026-06-29→07-03) and read as "Vault expires
-   weekly".
-2. A stray `vault login -method=kubernetes` (e.g. a headless agent flow) →
-   writes a read-only `kubernetes-woodpecker-default` token (can read Vault but
-   **cannot** write `secret/*`). Happened 2026-06-05, unnoticed for two days.
-
-Since 2026-07-03 the renewer **self-heals**
-(`docs/plans/2026-07-03-vault-token-self-heal-design.md`). On a foreign token
-it attempts the re-mint **with the clobbering token's own authority** and lets
-Vault's authz decide:
-
- **Admin-capable clobber (OIDC login)** → re-mints the periodic token,
-  sanity-checks it against the drift guard, atomically replaces
-  `~/.vault-token`, revokes stale `token-devvm-wizard` leftovers
-  (anti-sprawl), logs
-  `HEALED: re-minted periodic token from foreign dn=… (revoked N stale periodic token(s))`
-  and exits 0. The clobbering token is NOT revoked — it may still back a live
-  login session; it ages out on its own.
- **Weak clobber (read-only k8s token)** → the mint is denied; logs
-  `DRIFT: … heal denied, foreign token lacks create authority …; investigate what wrote it`
-  and exits non-zero (unit `failed`). Deliberately loud: this signals a
-  misbehaving agent flow — exactly the 2026-06-05 case.
-
-**Manual recovery** is only needed for the weak-clobber case (the DRIFT log
-line still contains the exact command) — run the
-[mint/re-mint](#mint--re-mint-the-token) block.
-```
-
- [ ] **Step 2: In the runbook's `## Health check` section**, after the "A healthy log line looks like…" sentence, add:
-
-```markdown
-After an OIDC login you'll instead see, at the next nightly run:
-`<ts> HEALED: re-minted periodic token from foreign dn="oidc-…" (revoked N stale periodic token(s))` — that's the self-heal working as designed.
-```
-
- [ ] **Step 3: In the runbook's `## Tests` section**, replace the first sentence with:
-
-```markdown
-`infra/scripts/test-vault-token-renew.sh` unit-tests the drift-guard decision,
-the lookup-JSON parsers (including the exact 2026-06-05 woodpecker-clobber
-case), and the self-heal's revoke filter (which stale periodic tokens a heal
-may sweep).
-```
-
- [ ] **Step 4: Update the test file's header comment** (lines 2–7) to:
-
-```bash
-# Unit tests for the pure functions in vault-token-renew.sh.
-# Sources the script (vtr_main is guarded) and exercises (a) the drift-guard
-# decision — is ~/.vault-token OUR periodic admin token (renew) or a foreign
-# clobber (heal / fail loud)? — whose ABSENCE let the 2026-06-05 woodpecker
-# clobber be silently renewed for two days, and (b) the self-heal's revoke
-# filter — which stale token-devvm-wizard tokens a heal may sweep.
-# Run: bash infra/scripts/test-vault-token-renew.sh
-```
-
- [ ] **Step 5: Run tests once more, then commit**
-
-Run: `bash scripts/test-vault-token-renew.sh`
-Expected: `25 passed, 0 failed`.
-
-```bash
-git $GCFLAGS add docs/runbooks/vault-token-renew-devvm.md scripts/test-vault-token-renew.sh
-git $GCFLAGS commit -m "vault-token-renew runbook: document the self-heal behavior
-
-Drift guard section rewritten: admin-capable clobbers now self-heal at the
-nightly run (HEALED log line); weak clobbers keep the loud DRIFT failure;
-manual re-mint is only the weak-clobber recovery now."
-```
-
-### Task 5: Deploy + live verification (on devvm, as wizard)
-
-**Files:** none (host deploy + live checks)
-
- [ ] **Step 1: Install from the worktree**
-
-```bash
-install -m 0755 ~/code/infra/.worktrees/vault-token-self-heal/scripts/vault-token-renew.sh ~/.local/bin/vault-token-renew
-```
-
-(Units unchanged — no `daemon-reload` needed.)
-
- [ ] **Step 2: Live case 1 — admin-capable clobber heals**
-
-```bash
-export VAULT_ADDR=https://vault.viktorbarzin.me
-export XDG_RUNTIME_DIR=/run/user/$(id -u)
-FAKE_ADMIN=$(vault token create -ttl=1h -policy=vault-admin -policy=sops-admin -display-name=fake-oidc -field=token)
-printf '%s' "$FAKE_ADMIN" > ~/.vault-token
-systemctl --user start vault-token-renew.service; echo "exit=$?"
-tail -1 ~/.local/state/vault-token-renew.log
-vault token lookup | grep -E 'display_name|period'
-```
-
-Expected: `exit=0`; log line `HEALED: re-minted periodic token from foreign dn="token-fake-oidc" (revoked N stale periodic token(s))` with N ≥ 1 (the pre-clobber periodic token is itself swept as stale — by design — along with any strays from the June 26 / July 3 manual re-mints); lookup shows `display_name token-devvm-wizard`, `period 768h`. Note: `FAKE_ADMIN` is a child of the swept old token, so the cascade revokes it too — no cleanup needed.
-
- [ ] **Step 3: Verify exactly ONE periodic token remains server-side**
-
-```bash
-for a in $(vault list -format=json auth/token/accessors | jq -r '.[]'); do
-  vault token lookup -format=json -accessor "$a" 2>/dev/null \
-    | jq -r 'select(.data.display_name=="token-devvm-wizard") | .data.accessor'
-done
-```
-
-Expected: exactly one line, matching `vault token lookup -format=json | jq -r .data.accessor`.
-
- [ ] **Step 4: Live case 2 — weak clobber stays a loud failure**
-
-```bash
-GOOD=$(cat ~/.vault-token)
-FAKE_WEAK=$(vault token create -ttl=10m -policy=default -display-name=fake-weak -field=token)
-printf '%s' "$FAKE_WEAK" > ~/.vault-token
-systemctl --user start vault-token-renew.service; echo "exit=$?"
-systemctl --user is-failed vault-token-renew.service
-tail -1 ~/.local/state/vault-token-renew.log
-printf '%s' "$GOOD" > ~/.vault-token && chmod 600 ~/.vault-token
-vault token revoke "$FAKE_WEAK" >/dev/null
-```
-
-Expected: `exit=1` (start reports the oneshot failure), `is-failed` prints `failed`, log line `DRIFT: ~/.vault-token is dn="token-fake-weak" — heal denied, foreign token lacks create authority (… permission denied …); investigate what wrote it. Manual re-mint: …`.
-
- [ ] **Step 5: Happy path still green**
-
-```bash
-systemctl --user start vault-token-renew.service; echo "exit=$?"
-tail -1 ~/.local/state/vault-token-renew.log
-```
-
-Expected: `exit=0`, log `OK renewed (dn=token-devvm-wizard ttl=2764800s)`.
-
-### Task 6: Land on master + cleanup
-
- [ ] **Step 1: Merge latest master into the branch, re-verify, push**
-
-```bash
-cd ~/code/infra/.worktrees/vault-token-self-heal
-git $GCFLAGS fetch forgejo
-git $GCFLAGS merge forgejo/master
-bash scripts/test-vault-token-renew.sh
-git $GCFLAGS push forgejo HEAD:master
-```
-
-Expected: clean merge (or already up to date), `25 passed, 0 failed`, push accepted. Non-fast-forward → fetch, merge, push again.
-
- [ ] **Step 2: Watch CI to completion**
-
-The push fires the infra Woodpecker `default.yml` (terragrunt apply for changed stacks). This change touches only `scripts/` + `docs/` → expect a fast success / no-op apply. Check (Forgejo-forge infra repo = Woodpecker repo id 82):
-
-```bash
-export VAULT_ADDR=https://vault.viktorbarzin.me
-vault kv get -format=json secret/ci/global | jq -r '.data.data | keys[]'   # find the woodpecker admin token key
-WP_TOKEN=$(vault kv get -field=<that-key> secret/ci/global)
-curl -s -H "Authorization: Bearer $WP_TOKEN" 'https://ci.viktorbarzin.me/api/repos/82/pipelines?perPage=1' | jq '.[0] | {number, status, commit: .commit[0:8]}'
-```
-
-Expected: the pipeline for the pushed commit reaches `status: "success"` (poll until terminal). If it fails, fix before proceeding.
-
- [ ] **Step 3: Remove worktree + branch, reconcile main checkout**
-
-```bash
-git -C ~/code/infra $GCFLAGS worktree remove .worktrees/vault-token-self-heal
-git -C ~/code/infra $GCFLAGS branch -d wizard/vault-token-self-heal
-git -C ~/code/infra status --porcelain   # expect clean before pulling
-git -C ~/code/infra $GCFLAGS pull --ff-only forgejo master
-```
-
-Expected: worktree gone, branch deleted (already merged), main checkout fast-forwards to the landed commit.
-
-### Task 7: Memory + wrap-up
-
- [ ] **Step 1: Update the stale memories** (they say the drift guard is detect-only / recovery is manual):
-
-```bash
-homelab memory recall "vault periodic token renewer drift"   # confirm ids 4204, 4211, 7121 still say detect-only
-homelab memory update 4211 "<original gotcha content, amended: since 2026-07-03 the renewer SELF-HEALS admin-capable clobbers at its nightly run (re-mints the periodic token with the clobbering token's authority + revokes stale token-devvm-wizard leftovers; weak clobbers still fail loudly). An OIDC login on devvm is now harmless. Design: infra docs/plans/2026-07-03-vault-token-self-heal-design.md>"
-homelab memory update 7121 "<original content, amended: PLAYBOOK OBSOLETE for admin clobbers — self-heal shipped 2026-07-03; manual re-mint only needed for weak/read-only clobbers>"
-```
-
-(Fetch each memory's current text first and preserve it — amend, don't replace wholesale.)
-
- [ ] **Step 2: End-of-task extraction** — dispatch the standard M.3 memory-mining subagent per `~/.claude/rules/execution.md`, then give the final summary.
-
---
-
-## Plan self-review (done at write time)
-
- **Spec coverage**: heal-on-admin-clobber (T3), loud-fail-on-weak (T3 + live T5.4), no-revoke-foreign (T3 comment + design decision 4), anti-sprawl sweep + fail-safe filter (T2/T3, live T5.3), minted-token sanity + atomic write (T3), unit tests (T1/T2), runbook (T4), deploy + live sim (T5), memory updates (T7). ✓
- **Placeholders**: `<that-key>` in T6.2 is a deliberate discovery step (key name verified live from Vault, not invented). No other TBDs. ✓
- **Name consistency**: `vtr_accessor`, `vtr_is_stale_periodic`, `vtr_heal`, `EXPECTED_DN` match across tasks; test count 17→25 consistent (8 new cases). ✓
--- a/docs/plans/2026-07-04-backup-mx-design.md
+++ b/docs/plans/2026-07-04-backup-mx-design.md
@ -1,335 +0,0 @@
-# Backup MX — self-hosted store-and-forward relay on Oracle Always-Free — design
-
-Date: 2026-07-04 (v3 — post-challenge; v2 Oracle pivot same day) · Status: design,
-pre-implementation · ADR: [0019](../adr/0019-backup-mx-self-hosted-oracle-relay.md)
-
-v3 incorporates two independent adversarial-challenge reviews (same day). Their
-material corrections are marked **[CH]** throughout — the largest: the v2 drain
-path would never have drained (primary-side smtpd rejects), monitoring-over-
-tailnet was fiction (no cluster→tailnet route exists), and the VM's bounce
-model was wrong (it can never deliver a DSN).
-
-## Goal
-
-Inbound mail for `viktorbarzin.me` must survive homelab outages without loss.
-Requirement level (Viktor, 2026-07-04): **never lose mail; delayed delivery is
-acceptable; budget is $0** (hard constraint — reaffirmed after the Rollernet
-gates failed). A store-and-forward backup MX queues mail while the homelab is
-down and re-delivers when it returns.
-
-Out of scope, explicitly:
-
- Reading new mail *during* an outage.
- Outbound mail during outages.
- The "primary up but hard-bouncing 5xx" misconfig class — a backup MX is
-  never consulted when the primary answers. Separate hardening/alerting track.
-
-Known residual limit (state it plainly): an outage **longer than 30 days**
-loses the queued mail *silently* — the VM cannot emit a bounce to anyone
-(egress 25 blocked), so no sender ever learns. Accepted; 30 days is already
-6× the sender-retry status quo.
-
-## v1 → v2: why Rollernet was dropped (gate evidence, 2026-07-04)
-
-v1 selected Roller Network's free Secondary MX. The validation gates killed it
-before any DNS change:
-
- **G2 FAILED**: the [free-accounts policy](https://rollernet.us/policy/free-accounts.html)
-  caps free mail service at **200 relayed messages or 10 MB per rolling 7
-  days**; overage → domain suspended **48 h answering SMTP 5xx** (permanent
-  bounces), repeatable. Spammers deliberately target backup MXes even while
-  the primary is up, so background spam alone can hold the domain suspended —
-  worse than no backup MX.
- **G1 SHAKY**: same policy page says free accounts are being discontinued.
- **G3 PASSED** (for posterity): `mail{,2}.rollernet.us` present valid LE
-  certs over STARTTLS.
- Signup is Cloudflare-Turnstile-gated — moot given G1/G2.
-
-Viktor's decision: stay free → self-host on Oracle Always-Free. **[CH]** The
-external challenger re-searched the free landscape (DNSExit, KisoLabs,
-DuoCircle, AWS/Azure/GCP/Hetzner/Fly/Vultr/Linode free tiers) and confirmed:
-no credible free managed backup-MX or free VM with a usable port-25 story
-exists in 2026 other than OCI. GCP's free e2-micro also blocks egress 25 and
-is US-regions-only (wrong continent).
-
-## Decision
-
-A minimal **Postfix store-and-forward relay** (`mx2.viktorbarzin.me`) on an
-Oracle Cloud **Always-Free** compute instance, published as a lower-preference
-MX. It accepts mail for `viktorbarzin.me` when the primary is unreachable,
-queues up to 30 days, and drains to the primary when it returns. No mailboxes,
-no third-party terms — the queue-lifetime and reject-behavior knobs are ours.
-
-## Architecture
-
-```
-                         ┌── pri 1  mail.viktorbarzin.me ──► pfSense HAProxy ──► mailserver pod
-sender MTA ──► MX lookup ┤                                        ▲
-                         └── pri 20 mx2.viktorbarzin.me           │ drain: smtp to
-                             (Oracle VM, Postfix relay,           │ mail.viktorbarzin.me:2526
-                              queue ≤ 30 days) ───────────────────┘ (pfSense WAN NAT rdr
-                                                                     2526 → 10.0.20.1:25,
-                                                                     existing HAProxy frontend)
-```
-
- **Normal operation**: senders use pri 1; the VM idles (spammers targeting
-  the backup + transient-blip retries get relayed onward immediately).
- **Outage**: senders fall back to pri 20 → VM accepts + queues → Postfix
-  retries the primary on its native schedule → queue drains after recovery
-  through the standard external ingress path (PROXY v2 → :2525 → rspamd →
-  Dovecot).
- **Custom drain port**: Oracle blocks **egress TCP 25** tenancy-wide
-  (post-2021; exemptions unreliable) — the VM cannot reach
-  `mail.viktorbarzin.me:25`. One pfSense WAN NAT rule `TCP 2526 →
-  10.0.20.1:25` reuses the existing HAProxy frontend unchanged. **[CH]
-  Verified against the runbook**: the frontend binds `*:25` on pfSense (not
-  strictly 10.0.20.1), rdr dst-port rewrite is the existing production
-  pattern (WAN:25 already rewrites to 10.0.20.1:25), and port 2526 collides
-  with nothing (the HAProxy test frontend uses :2525). Inbound TCP 25 **to**
-  the VM is unaffected by Oracle's egress-only block per practitioner
-  evidence (iRedMail/mailcow on OCI: receive works, send doesn't) — **to be
-  proven at gate O2 before any DNS change** (Oracle publishes no positive
-  commitment).
-
-## Oracle account & instance
-
- **Account**: Viktor creates it (human signup; card for identity, $0
-  charged). **Home region is fixed at signup and Always-Free compute exists
-  only there — choose `eu-frankfurt-1` deliberately; there is no
-  try-another-region fallback without a new account. [CH]**
- **[CH] PAYG conversion is a REQUIRED prerequisite, not a recommendation**:
-  Oracle stops idle Always-Free instances (95th-pct CPU < 20% over 7 days — an
-  idle Postfix box qualifies) and demonstrably changes free-tier terms without
-  notice, enforcing by termination (June 2026: A1 allowance silently halved,
-  over-limit instances shut down). PAYG keeps Always-Free resources free and
-  exempts them from idle reclamation.
- **Shape**: `VM.Standard.E2.1.Micro` (x86, 1/8 OCPU burst, 1 GB RAM; 2
-  always-free instances allowed; ample for queue-only Postfix — and untouched
-  by the 2026 A1 cuts). ARM A1 fallback is **unreliable** (halved quota,
-  chronic Frankfurt capacity) — treat E2.1.Micro availability as the gate.
- **[CH] Reserved public IP is mandatory** (`oci_core_public_ip`, reserved):
-  an ephemeral IP rotates on stop/start and would silently break all four
-  IP-keyed controls at once (pfSense NAT source-restriction, the primary's
-  smtpd/rspamd exemptions, the Oracle security list, Prometheus scrape
-  allowlist) — discovered only at the next outage's drain.
- **OS**: Ubuntu 24.04. **[CH] OCI Ubuntu images ship an OS-level iptables
-  ruleset (`/etc/iptables/rules.v4`) that ACCEPTs 22 and REJECTs everything
-  else, independent of security lists** — cloud-init must insert ACCEPT rules
-  for 25/80 (+ scrape ports) ahead of the REJECT and persist them, or gate O2
-  fails on day 1 with a correct security list.
- **Credentials**: OCI API key for Terraform → Vault `secret/viktor`
-  (`oci_*`); web login → Vaultwarden item `Oracle Cloud (backup MX)`.
-
-## Networking & security posture
-
- **Ingress on the VM**: TCP 25 world-open (the service). **[CH] TCP 80
-  world-open permanently** — Let's Encrypt validation is multi-perspective
-  with no published source IPs, so it cannot be source-scoped, and a
-  "open-only-during-renewal" toggle is unspecified automation whose realistic
-  failure mode is an expired cert at day ~90. Nothing listens on 80 outside
-  certbot's seconds-long renewal windows; connection-refused surface is
-  negligible. TCP 9100/9154 (exporters) restricted to the homelab WAN /32
-  (176.12.22.76) in both the Oracle security list and the VM firewall.
- **No public SSH**: management rides the headscale tailnet — cloud-init
-  enrolls via a **preauth key for a dedicated non-OIDC headscale user** with
-  node tag `tag:backup-mx` (headscale 0.28.0 file-mode ACL, content in Vault
-  `secret/headscale` → `headscale_acl`); SSH bound to the tailnet interface.
-  ACL grant: `group:admin → tag:backup-mx:22` (cluster pods are NOT tailnet
-  members — see monitoring). **[CH] Outage caveat**: headscale's control
-  plane + DERP live in the cluster, so mid-outage tailnet reachability is
-  cached-netmap best-effort — the runbook documents the **OCI instance
-  console connection as break-glass** management. (Also fix `vpn.md`'s stale
-  "0.23.x / OIDC-only" claims while in there.)
- **VM compromise blast radius**: plaintext of outage-queued mail + a relay
-  surface contained by `relay_domains = viktorbarzin.me` only, no submission
-  ports, no SASL, no local delivery. The VM is deliberately NOT added to the
-  primary's `mynetworks` (that would let a compromised VM relay arbitrary
-  mail *through* the primary) — per-stage exemptions instead, below.
-
-## Postfix configuration (relay-only, accept-and-queue with 4xx-only hygiene)
-
- `relay_domains = viktorbarzin.me`; `mydestination =` (empty).
- **[CH]** `smtpd_relay_restrictions = permit_mynetworks,
-  reject_unauth_destination` — explicit 5xx for foreign-domain RCPTs (the
-  default tail is `defer_unauth_destination`, whose 4xx invites every relay
-  probe to retry forever).
- **[CH]** `relay_recipient_maps` explicitly set to the wildcard form
-  (`@viktorbarzin.me OK`) — documents accept-all-recipients as a decision
-  (the domain is catch-all; every RCPT is valid by definition).
- `transport_maps`: `viktorbarzin.me smtp:[mail.viktorbarzin.me]:2526`.
- `maximal_queue_lifetime = 30d`. **[CH]** `bounce_queue_lifetime = 1d` and
-  `delay_warning_time = 0` — this host can never deliver a DSN to anyone
-  (egress 25 blocked; its only egress is 2526 to the primary), so undeliverable
-  bounces must be discarded quickly or they rot in the queue for a month and
-  permanently poison the queue-depth alert.
- **[CH]** `message_size_limit = 209715200` — exactly the primary's 200 MB
-  (`POSTFIX_MESSAGE_SIZE_LIMIT`, mailserver main.tf:88). The stock 10 MB
-  default would 552-reject large legitimate mail during outages — the exact
-  loss mode this project exists to prevent. Equal, never higher (higher
-  recreates drain-time rejects).
- **[CH] postscreen on the VM in 4xx-only posture**: pregreet test ON
-  (fire-and-forget bots don't retry; real MTAs do — the whole design already
-  rests on sender retry, so 4xx filtering is loss-free by construction),
-  optionally `postscreen_dnsbl_action = defer` with a conservative threshold.
-  v2's blanket "no DNSBL" conflated 5xx reputation rejects (rightly banned)
-  with 4xx tempfail (harmless); without any hygiene the backup is a 24/7
-  spam backdoor since spammers deliberately deliver to the highest-numbered
-  MX. Zero 5xx from reputation, ever.
- `inet_protocols = ipv4` **[CH]** — the primary publishes an AAAA (HE
-  tunnel) but the IPv6 HAProxy bridge has no :2526 listener; skip the wasted
-  v6 attempt per delivery.
- `smtpd_tls_cert_file` = LE cert for `mx2.viktorbarzin.me` (opportunistic
-  STARTTLS inbound; `smtp_tls_security_level = may` on the drain leg).
- Queue disk: the ~45 GB free boot volume dwarfs any realistic 30-day
-  accumulation for a personal domain.
-
-## TLS
-
-certbot standalone HTTP-01 for `mx2.viktorbarzin.me` (no Cloudflare API token
-on an internet-facing VM). Port 80 permanently open (see above); certbot renew
-timer. The MTA-STS follow-up (separate task; policy host currently dangling —
-below) must list `mx2.viktorbarzin.me` when implemented.
-
-## Primary-side drain enablement **[CH — this section replaces v2's "SPF/DMARC exemption + postscreen permit", which exempted the wrong layers]**
-
-The v2 exemptions targeted postscreen DNSBL (which is **off** on the primary —
-`ENABLE_DNSBL` unset) and rspamd SPF/DMARC scoring — but missed the three
-mechanisms that would actually break the drain. All are keyed on the VM's
-reserved /32 (the PROXY-v2-recovered client IP):
-
-1. **`reject_unknown_client_hostname` bypass** — the primary sets
-   `POSTFIX_REJECT_UNKNOWN_CLIENT_HOSTNAME=1` (main.tf:89); an Oracle IP
-   without full FCrDNS (PTR needs an Oracle SR; limited on free accounts)
-   would be **450-deferred on every drain attempt → the queue never drains →
-   mass-bounces at day 30**. Fix: `check_client_access` permit for the VM /32
-   early in `smtpd_client_restrictions`, and a matching permit at the sender
-   stage (SPOOF_PROTECTION=1 rejects unauthenticated own-domain envelope
-   senders — drained self-addressed/bounced mail would 5xx). Attempt the
-   Oracle PTR anyway (belt and braces).
-2. **Anvil rate-limit exception** — `smtpd_client_message_rate_limit = 30`/min
-   keys on the VM's IP at drain; a >3,600-message backlog would throttle for
-   hours and false-fire the queue alert. Add the VM /32 to
-   `smtpd_client_event_limit_exceptions`.
-3. **rspamd: evaluate the original sender, never 5xx the drain stream** — via
-   the existing override.d ConfigMap pattern (same mount as
-   `dkim_signing.conf`): (a) configure rspamd's **`external_relay`** module
-   (ip_map = VM /32) so SPF/DMARC/IP reputation evaluate against the
-   *original* client IP parsed from the VM's Received header — this keeps
-   DMARC protection for the entire drain stream instead of v2's blanket
-   disable; (b) cap rspamd's **action at the VM /32 to tag/fold — never
-   milter-reject**: the primary's default reject tier (DMS default, active
-   since only dkim_signing is overridden today) would 5xx high-score spam at
-   DATA, forcing the VM to generate DSNs to forged senders = classic
-   backup-MX backscatter → mx2's IP blacklisted. Drained spam lands tagged in
-   the catch-all's Junk instead. Validate the external_relay ↔ settings-rule
-   interplay at gate O5 with a high-spam-score message.
-4. postscreen permit for the /32 (harmless; pregreet never trips a real
-   Postfix client and DNSBL is off — kept for future-proofing only).
-
-## Our-side changes (Terraform unless noted)
-
-1. **New stack `stacks/backup-mx/`** (Tier 1): OCI provider (creds from
-   Vault), VCN + subnet + security list + **reserved public IP** +
-   `VM.Standard.E2.1.Micro` + cloud-init (`templatefile`): **OS iptables
-   ACCEPTs for 25/80/9100/9154 ahead of the OCI image's REJECT rule
-   (persisted)**, postfix + config above, certbot, tailscale→headscale
-   enrollment (preauth key from Vault), node_exporter, postfix_exporter,
-   unattended-upgrades.
-2. **DNS** — `stacks/cloudflared/modules/cloudflared/cloudflare.tf`: A
-   `mx2.viktorbarzin.me` → reserved IP (non-proxied), MX pref 20 → `mx2`.
-   **[CH] Live zone count verified: 195/200 → 197/200 after this change; only
-   3 slots remain and the MTA-STS follow-up needs 1–2 → plan the next
-   record-purge now, not at collision time.**
-3. **pfSense (live network device — approved as part of this plan)**: WAN NAT
-   rdr `TCP 2526 → 10.0.20.1:25` + firewall rule, source-restricted to the
-   reserved IP. **[CH] Scripted** (extend the existing
-   `scripts/pfsense-*-haproxy*.php` bootstrap-script family), not
-   hand-clicked — keeps the git-rebuildable parity the rest of the pfSense
-   mail config has. Config.xml rides the nightly backup.
-4. **Mailserver stack**: the four-layer drain enablement above (client+sender
-   `check_client_access` permits, anvil exception, rspamd external_relay +
-   action cap, postscreen permit) — all keyed to one /32, via the existing
-   `postfix_cf` / `user-patches.sh` / rspamd-override hook points (verified
-   present: main.tf:129-144, 222-281, 467-474).
-5. **Monitoring [CH — replaces v2's tailnet scraping, which had no transport:
-   no cluster→tailnet route exists and no existing target is scraped that
-   way]**: Prometheus scrapes `node_exporter`/`postfix_exporter` on the VM's
-   **public reserved IP**, allowed only from the homelab WAN /32 (Oracle SL +
-   VM firewall); blackbox TCP:25 from the cluster (`BackupMxDown`, warning);
-   MX-set drift assertion (both MX records present). Alerts:
-   `BackupMxQueueStuck` = **non-bounce** queue depth > 0 for 2 h while the
-   primary is healthy (gate on the existing `MailServerDown`/roundtrip
-   series, machine-readable — not prose); bounce residue is excluded by the
-   1-day bounce lifetime. Note: during a full homelab outage Prometheus
-   itself is down — queue growth is unobservable live under ANY transport;
-   what we actually watch is the post-recovery drain. A WAN-IP change stales
-   the Oracle allowlist → visible as ScrapeTargetDown (self-signaling).
-   **Probe semantics note**: once mx2 exists, the Brevo roundtrip probe's
-   mail fails over to mx2 on transient primary blips and arrives minutes late
-   via the drain — `EmailRoundtripFailing` may then mean "delayed via mx2",
-   not "lost"; note in the alert description and runbook.
-6. **Docs (same commit as implementation)**: rewrite `mailserver.md` §"No
-   Backup MX", new runbook `docs/runbooks/backup-mx.md` (`postqueue -p`,
-   forced drain `postqueue -f`, cert renewal, **OCI console break-glass**, VM
-   rebuild from stack, Oracle account facts incl. PAYG + home-region lock),
-   `vpn.md` headscale-version/OIDC staleness fix, monitoring rows.
-
-### MTA-STS finding (unchanged; no action in this change)
-
-`_mta-sts` TXT is published but `mta-sts.viktorbarzin.me` has no record and
-nothing serves the policy — MTA-STS is inert today. When fixed, the policy
-MUST include `mx: mx2.viktorbarzin.me` (and budget its DNS records against the
-3 remaining zone slots).
-
-## Validation gates (in order; any failure → stop and report)
-
-| # | Gate | Method | Failure handling |
-|---|------|--------|------------------|
-| O1 | Oracle account (home region `eu-frankfurt-1`, **fixed forever at signup**), **PAYG conversion done**, E2.1.Micro capacity | Viktor signs up + converts; TF apply | A1-in-home-region is a best-effort fallback only (halved quota, contended); else decision returns to Viktor |
-| O2 | Inbound TCP 25 reachable from the internet (after the OS-iptables fix) | `nc -zv <reserved-ip> 25` from outside + recurring Uptime-Kuma TCP monitor (keeps proving it — Oracle publishes no commitment) | Stop; decision returns to Viktor |
-| O3 | Drain works: VM → `mail.viktorbarzin.me:2526` delivers end-to-end | Test message injected on the VM | Debug pfSense NAT / HAProxy path |
-| O4 | LE cert issued | certbot standalone | STARTTLS is opportunistic — non-blocking for go-live; fix before MTA-STS |
-| O5 | Live failover test — **hardened [CH]** | presence-claim → scale mailserver to 0 (~30 min) → send from Gmail + Brevo **plus a high-spam-score message and a >10 MB message** → confirm queued (`postqueue -p`) → scale up → verify full drain within the anvil-exception expectations, spam folded to Junk (not bounced), headers show original-IP SPF/DMARC evaluation, no DSN generated on the VM, roundtrip probe recovers | Debug or roll back (remove MX record) |
-
-## Failure modes
-
-Covered: cluster/pod outages, pfSense/power/ISP outages ≤ 30 days, WAN IP
-changes, short-retry senders. If pfSense is down the drain waits — Postfix
-retries until it heals.
-
-Not covered: primary-up-but-5xx misconfigs; outbound; mid-outage mailbox
-access; **outages > 30 days lose queued mail silently (no DSN possible)**.
-Simultaneous Oracle+homelab outage = status quo ante (sender retries).
-
-Newly introduced, accepted:
-
- **A pet outside the cluster** — deliberately cattle: rebuilt from TF +
-  cloud-init, patched by unattended-upgrades, scraped by Prometheus. Never a
-  backup target.
- **Oracle free-tier caprice [CH — upgraded from v2's framing]**: Oracle has
-  silently cut Always-Free allowances and terminated over-limit instances
-  (June 2026, A1). Mitigations: PAYG (required), recurring inbound-25 probe,
-  `BackupMxDown`, and the fact that outside an active outage the queue is
-  empty — a surprise reclamation loses nothing, only coverage until rebuilt.
-  Rollernet Basic ($30/yr) stays the documented fallback if OCI sours.
- **Spam hygiene**: 4xx-only postscreen on the VM (pregreet + conservative
-  DNSBL-defer) instead of v2's nothing; drained spam is tagged/folded by
-  rspamd, never bounced.
- Outage mail sits plaintext on Oracle disk ≤ 30 days (single-tenant;
-  accepted).
-
-## Rollback
-
-Remove the MX + A records; wait for `postqueue -p` empty; `terraform destroy`
-on `backup-mx`; delete the pfSense NAT rule (scripted); drop the mailserver
-/32 exemptions. Order matters: MX record first.
-
-## Viktor's manual steps (everything else is mine)
-
-1. Create the Oracle Cloud account — **home region `eu-frankfurt-1`** (fixed
-   forever), card for identity, $0 charged.
-2. **Convert the tenancy to Pay-As-You-Go** (required — idle-reclamation
-   exemption; Always-Free stays $0).
-3. Hand me the tenancy OCID + a console user → I mint the API key, store
-   creds (Vault + Vaultwarden), and build the stack.
-4. Approve the (scripted) pfSense NAT rule when I reach that step.
--- a/docs/plans/2026-07-04-drone-logbook-design.md
+++ b/docs/plans/2026-07-04-drone-logbook-design.md
@ -1,89 +0,0 @@
-# Drone Logbook (Open DroneLog) — Design
-
-**Date:** 2026-07-04
-**Status:** Approved (Viktor, 2026-07-04)
-**Owner request:** "I have a DJI Mini 4 Pro. I'm interested in github.com/ViktorBarzin/drone-logbook" → self-host it in the cluster.
-
-## Goal
-
-Self-host [Open DroneLog](https://github.com/arpanghosh8453/open-dronelog) (upstream of the
-`ViktorBarzin/drone-logbook` fork) at **https://dronelog.viktorbarzin.me** so Viktor can import
-DJI Fly flight logs from his DJI Mini 4 Pro and analyze them privately: telemetry charts, 3D map
-replay, per-flight and lifetime stats. All data stays in the cluster (single DuckDB database).
-
-## Decisions (interview, 2026-07-04)
-
-| Question | Decision |
-|---|---|
-| Deployment form | Self-hosted Docker web app in k8s (not desktop app, not hosted webapp) |
-| Exposure | Public `dronelog.viktorbarzin.me`, **Authentik forward-auth** (`auth = "required"`) |
-| Log ingestion | **Both** manual web upload *and* a server-side auto-import drop folder from day one |
-| Image source | **Upstream** `ghcr.io/arpanghosh8453/open-dronelog:latest` — NOT the fork |
-| Fork disposition | Fork is 0 ahead / 372 behind, adds nothing; delete or park it. Only revive (sync + ADR-0002 GHA build) if Viktor starts modifying the code |
-
-## Architecture
-
-New Tier-1 stack `stacks/drone-logbook/`, modeled line-by-line on `stacks/freshrss/`
-(the closest existing shape: single upstream-image app, own data volume, Keel-updated):
-
- **Namespace** `drone-logbook`, tier `4-aux`, label `keel.sh/enrolled=true` → Kyverno injects
-  Keel poll annotations → auto-upgrades as upstream releases (project is actively maintained).
- **Deployment** (1 replica, `Recreate` — DuckDB is single-writer/embedded):
-  - image `ghcr.io/arpanghosh8453/open-dronelog:latest` (nginx frontend + Axum REST backend, port 80)
-  - memory request=limit **512Mi** (DuckDB import/analytics spikes), cpu request 25m, no cpu limit
-  - standard `KYVERNO_LIFECYCLE_V1` / `KEEL_IGNORE_IMAGE` / `KEEL_LIFECYCLE_V1` lifecycle ignores
- **App data** `/data/drone-logbook` (DuckDB db, cached DJI decryption keys, uploaded originals):
-  **`proxmox-lvm-encrypted` block PVC** `drone-logbook-data-encrypted`, 2Gi, topolvm autoresize →
-  10Gi ceiling. Encrypted class because flight logs are GPS traces of home/travel — sensitive data
-  defaults to `proxmox-lvm-encrypted` per the storage decision rule (`.claude/CLAUDE.md`).
-  Embedded DBs stay off NFS (same rationale documented in the freshrss stack: NFS only for static files).
- **Backup CronJob** `drone-logbook-backup` (mandatory for every proxmox-lvm app): daily 01:30
-  file copy of the data volume → NFS `/srv/nfs/drone-logbook-backup` (dated dirs, 30-day retention,
-  Pushgateway metrics), pod-affinity co-scheduled with the app pod (RWO volume). 01:30 sits outside
-  the 00:00/08:00/16:00 sync-import windows so the DuckDB file is quiescent; retained upload
-  originals make even a torn copy recoverable by re-import. `nfs-mirror` (02:00) ships it to sda →
-  Synology offsite. Vaultwarden pattern.
- **Sync drop folder**: static NFS volume (`modules/kubernetes/nfs_volume`)
-  `192.168.1.127:/srv/nfs/drone-logbook/sync-logs`, mounted **read-only** at `/sync-logs`;
-  `SYNC_LOGS_PATH=/sync-logs`, `SYNC_INTERVAL="0 0 */8 * * *"` (every 8 h).
-  Any producer (Nextcloud sync, scp, a future phone pipeline) drops `.txt` logs there; the app
-  imports them automatically. `KEEP_UPLOADED_FILES=true` keeps re-importable originals in the PVC.
- **Ingress** via `ingress_factory`: `name = "dronelog"`, `auth = "required"` (Authentik
-  forward-auth), `dns_type = "proxied"`. External Uptime Kuma HTTPS monitor comes automatically
-  with the ingress annotation. Homepage tile (group "Media & Entertainment", icon `mdi-quadcopter`).
- **Secrets**: Vault KV `secret/drone-logbook` (`profile_creation_pass`) → ExternalSecret
-  (`vault-kv` ClusterSecretStore) → k8s secret `drone-logbook-secrets` → env
-  `PROFILE_CREATION_PASS`. Gates profile create/delete even for other Authentik-logged-in users.
-  No plan-time secret reads needed (no `data "kubernetes_secret"`).
-  No `DJI_API_KEY` — bundled default is fine at personal import volume; add later if rate-limited.
-
-## Operational notes
-
- **DJI egress dependency**: importing a *new* log file requires the pod to reach DJI's servers
-  once (flight-log decryption key fetch; keys are then cached in the data dir). Remember this when
-  egress enforcement lands (Security wave 1, beads `code-8ywc`).
- The web UI is desktop-first; mobile is functional but basic.
- NFS host prerequisite: `/srv/nfs/drone-logbook/sync-logs` (root:www-data, 2775 — same shape as
-  sibling dirs) and `/srv/nfs/drone-logbook-backup` created on 192.168.1.127 and recorded in
-  `secrets/nfs_directories.txt`. `/srv/nfs` is exported whole-tree, so no `/etc/exports`
-  (`scripts/pve-nfs-exports`) change.
- Backup story = the daily app-level backup CronJob (above) + the host `daily-backup` LVM-snapshot
-  leg + original log files retained both in the drop folder and in the data volume
-  (`KEEP_UPLOADED_FILES=true`).
-
-## Alternatives considered
-
- **Build from the fork** (`ghcr.io/viktorbarzin/...` via GHA, ADR-0002): rejected for now — fork
-  has zero custom commits; a build chain adds maintenance for no benefit. Revisit if code changes
-  are wanted.
- **`auth = "app"` + app profile passwords** (would enable the `opendronelog-sync` native uploader
-  from anywhere): rejected — a single app password guarding GPS traces of home/travel on the open
-  internet is weaker than Authentik; the sync drop folder covers automated ingestion instead.
- **Internal-only (.lan + VPN)**: rejected — Authentik-gated public matches the rest of the
-  homelab and works without VPN while traveling.
- **NFS for the DuckDB data**: rejected — embedded-DB-on-NFS locking risk; freshrss precedent
-  keeps app DB data on proxmox-lvm.
-
-## Implementation
-
-See `2026-07-04-drone-logbook-plan.md`.
--- a/docs/plans/2026-07-04-drone-logbook-plan.md
+++ b/docs/plans/2026-07-04-drone-logbook-plan.md
@ -1,542 +0,0 @@
-# Drone Logbook (Open DroneLog) Implementation Plan
-
-> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
-
-**Goal:** Deploy Open DroneLog (DJI flight-log analyzer) at https://dronelog.viktorbarzin.me — new Tier-1 stack `stacks/drone-logbook/`, upstream image, Authentik-gated, with a DuckDB data PVC and an NFS auto-import drop folder.
-
-**Architecture:** Single Deployment running `ghcr.io/arpanghosh8453/open-dronelog:latest` (nginx + Axum + DuckDB, port 80) in namespace `drone-logbook`; data on a `proxmox-lvm-encrypted` PVC (GPS logs = sensitive data), `/sync-logs` drop folder on static NFS, daily backup CronJob to `/srv/nfs/drone-logbook-backup` (vaultwarden pattern), `ingress_factory` with `auth = "required"`, Keel auto-upgrades via namespace enrollment. Modeled line-by-line on `stacks/freshrss/`. Design: `2026-07-04-drone-logbook-design.md`.
-
-**Tech Stack:** Terraform/Terragrunt (Tier-1 PG state), Vault KV + ESO, ingress_factory, nfs_volume module, Keel/Kyverno.
-
-Terraform is exempt from TDD (execution.md); each task ends with a concrete verification instead.
-
---
-
-### Task 1: Vault secret
-
-**Files:** none (Vault KV only)
-
- [ ] **Step 1.1: Create `secret/drone-logbook` with a generated profile-creation password**
-
-```bash
-vault kv put secret/drone-logbook profile_creation_pass="$(openssl rand -base64 24)"
-```
-
- [ ] **Step 1.2: Verify**
-
-```bash
-vault kv get -field=profile_creation_pass secret/drone-logbook | wc -c
-```
-
-Expected: `33` (32 chars + newline). Never echo the value itself.
-
-### Task 2: NFS drop folder on 192.168.1.127
-
-**Files:**
- Modify: `secrets/nfs_directories.txt` (git-crypt'd — **edit from the MAIN checkout only**, never the worktree; sorted list, add `drone-logbook/sync-logs`)
-
- [ ] **Step 2.1: Create the directories** — world-writable + setgid like `vaultwarden-backup` (the `/srv/nfs` export root-squashes, so pod-root writes land as `nobody`):
-
-```bash
-ssh root@192.168.1.127 'mkdir -p /srv/nfs/drone-logbook/sync-logs /srv/nfs/drone-logbook-backup && chown -R root:www-data /srv/nfs/drone-logbook /srv/nfs/drone-logbook-backup && chmod 2777 /srv/nfs/drone-logbook/sync-logs /srv/nfs/drone-logbook-backup && ls -ld /srv/nfs/drone-logbook/sync-logs /srv/nfs/drone-logbook-backup'
-```
-
-Expected: `drwxrwsrwx ... root www-data ...` for both.
-No `/etc/exports` (`scripts/pve-nfs-exports`) change — `/srv/nfs` is exported whole-tree.
-
- [ ] **Step 2.2: Record them in the declarative list (MAIN checkout, plaintext there)** — insert `drone-logbook-backup` and `drone-logbook/sync-logs` (after `diun`, before `etcd-backup`) in `~/code/infra/secrets/nfs_directories.txt`, then commit that single file to master:
-
-```bash
-git -C ~/code/infra add secrets/nfs_directories.txt
-git -C ~/code/infra commit -m "nfs_directories: add drone-logbook/sync-logs
-
-Drop folder for the new drone-logbook stack's auto-import (SYNC_LOGS_PATH).
-Directory created on 192.168.1.127 root:www-data 2775."
-git -C ~/code/infra push forgejo master
-```
-
-(Trivial single-file exception per execution.md; encrypted files cannot be edited from the worktree.)
-
-### Task 3: Stack files (in the `wizard/drone-logbook` worktree)
-
-**Files:**
- Create: `stacks/drone-logbook/main.tf` (content below)
- Create: `stacks/drone-logbook/terragrunt.hcl` (content below)
- Create: `stacks/drone-logbook/secrets` → symlink to `../../secrets`
- (`backend.tf`, `tiers.tf`, `cloudflare_provider.tf`, `providers.tf`, `.terraform.lock.hcl` are terragrunt-generated and **gitignored** — do NOT create or commit them; the tracked copies in old stacks like freshrss predate the ignore rule)
-
- [ ] **Step 3.1: `terragrunt.hcl`**
-
-```hcl
-include "root" {
-  path = find_in_parent_folders()
-}
-
-dependency "platform" {
-  config_path  = "../platform"
-  skip_outputs = true
-}
-```
-
- [ ] **Step 3.2: `main.tf`** — exact content:
-
-```hcl
-variable "tls_secret_name" {
-  type      = string
-  sensitive = true
-}
-variable "nfs_server" { type = string }
-
-# Open DroneLog (https://github.com/arpanghosh8453/open-dronelog) — self-hosted
-# DJI flight-log analyzer for the DJI Mini 4 Pro. Runs the UPSTREAM image (the
-# ViktorBarzin/drone-logbook fork has no custom commits); Keel tracks :latest.
-# Design: docs/plans/2026-07-04-drone-logbook-design.md
-resource "kubernetes_namespace" "drone_logbook" {
-  metadata {
-    name = "drone-logbook"
-    labels = {
-      tier               = local.tiers.aux
-      "keel.sh/enrolled" = "true"
-    }
-  }
-  lifecycle {
-    # KYVERNO_LIFECYCLE_V1: goldilocks-vpa-auto-mode ClusterPolicy stamps this label on every namespace
-    ignore_changes = [metadata[0].labels["goldilocks.fairwinds.com/vpa-update-mode"]]
-  }
-}
-
-resource "kubernetes_manifest" "external_secret" {
-  field_manager {
-    force_conflicts = true
-  }
-  manifest = {
-    apiVersion = "external-secrets.io/v1"
-    kind       = "ExternalSecret"
-    metadata = {
-      name      = "drone-logbook-secrets"
-      namespace = "drone-logbook"
-    }
-    spec = {
-      refreshInterval = "15m"
-      secretStoreRef = {
-        name = "vault-kv"
-        kind = "ClusterSecretStore"
-      }
-      target = {
-        name = "drone-logbook-secrets"
-      }
-      dataFrom = [{
-        extract = {
-          key = "drone-logbook"
-        }
-      }]
-    }
-  }
-  depends_on = [kubernetes_namespace.drone_logbook]
-}
-
-module "tls_secret" {
-  source          = "../../modules/kubernetes/setup_tls_secret"
-  namespace       = kubernetes_namespace.drone_logbook.metadata[0].name
-  tls_secret_name = var.tls_secret_name
-}
-
-# DuckDB database + cached DJI decryption keys + uploaded originals.
-# Embedded DB -> block storage, not NFS (same rationale as freshrss data).
-# Encrypted class: flight logs are GPS traces of home/travel (sensitive data
-# -> proxmox-lvm-encrypted per the storage decision rule in .claude/CLAUDE.md).
-resource "kubernetes_persistent_volume_claim" "data" {
-  wait_until_bound = false
-  metadata {
-    name      = "drone-logbook-data-encrypted"
-    namespace = kubernetes_namespace.drone_logbook.metadata[0].name
-    annotations = {
-      "resize.topolvm.io/threshold"     = "10%"
-      "resize.topolvm.io/increase"      = "100%"
-      "resize.topolvm.io/storage_limit" = "10Gi"
-    }
-  }
-  spec {
-    access_modes       = ["ReadWriteOnce"]
-    storage_class_name = "proxmox-lvm-encrypted"
-    resources {
-      requests = {
-        storage = "2Gi"
-      }
-    }
-  }
-  lifecycle {
-    # The autoresizer expands requests.storage up to storage_limit and PVCs
-    # can't shrink; without this every apply tries to revert the size.
-    ignore_changes = [spec[0].resources[0].requests]
-  }
-}
-
-# Drop folder: any producer (Nextcloud sync, scp, future phone pipeline) lands
-# DJI .txt logs here over NFS; the app auto-imports on SYNC_INTERVAL.
-module "nfs_sync_logs" {
-  source     = "../../modules/kubernetes/nfs_volume"
-  name       = "drone-logbook-sync-logs"
-  namespace  = kubernetes_namespace.drone_logbook.metadata[0].name
-  nfs_server = var.nfs_server
-  nfs_path   = "/srv/nfs/drone-logbook/sync-logs"
-  storage    = "5Gi"
-}
-
-resource "kubernetes_deployment" "drone_logbook" {
-  metadata {
-    name      = "drone-logbook"
-    namespace = kubernetes_namespace.drone_logbook.metadata[0].name
-    labels = {
-      app                             = "drone-logbook"
-      "kubernetes.io/cluster-service" = "true"
-      tier                            = local.tiers.aux
-    }
-  }
-  spec {
-    replicas = 1
-    strategy {
-      # DuckDB is single-writer; never overlap two pods on the same volume
-      type = "Recreate"
-    }
-    selector {
-      match_labels = {
-        app = "drone-logbook"
-      }
-    }
-    template {
-      metadata {
-        labels = {
-          app                             = "drone-logbook"
-          "kubernetes.io/cluster-service" = "true"
-        }
-      }
-      spec {
-        container {
-          name  = "drone-logbook"
-          image = "ghcr.io/arpanghosh8453/open-dronelog:latest"
-          env {
-            name  = "RUST_LOG"
-            value = "info"
-          }
-          env {
-            # keep re-importable originals under /data/drone-logbook/uploaded
-            name  = "KEEP_UPLOADED_FILES"
-            value = "true"
-          }
-          env {
-            name  = "SYNC_LOGS_PATH"
-            value = "/sync-logs"
-          }
-          env {
-            # 6-field cron (sec min hour dom mon dow): scan drop folder every 8h
-            name  = "SYNC_INTERVAL"
-            value = "0 0 */8 * * *"
-          }
-          env {
-            name = "PROFILE_CREATION_PASS"
-            value_from {
-              secret_key_ref {
-                name = "drone-logbook-secrets"
-                key  = "profile_creation_pass"
-              }
-            }
-          }
-          volume_mount {
-            name       = "data"
-            mount_path = "/data/drone-logbook"
-          }
-          volume_mount {
-            name       = "sync-logs"
-            mount_path = "/sync-logs"
-            read_only  = true
-          }
-          port {
-            name           = "http"
-            container_port = 80
-            protocol       = "TCP"
-          }
-          resources {
-            requests = {
-              cpu    = "25m"
-              memory = "512Mi"
-            }
-            limits = {
-              memory = "512Mi"
-            }
-          }
-        }
-        volume {
-          name = "data"
-          persistent_volume_claim {
-            claim_name = kubernetes_persistent_volume_claim.data.metadata[0].name
-          }
-        }
-        volume {
-          name = "sync-logs"
-          persistent_volume_claim {
-            claim_name = module.nfs_sync_logs.claim_name
-          }
-        }
-      }
-    }
-  }
-  depends_on = [kubernetes_manifest.external_secret]
-  lifecycle {
-    ignore_changes = [
-      spec[0].template[0].spec[0].dns_config, # KYVERNO_LIFECYCLE_V1
-      metadata[0].annotations["keel.sh/policy"],
-      metadata[0].annotations["keel.sh/trigger"],
-      metadata[0].annotations["keel.sh/pollSchedule"], # KYVERNO_LIFECYCLE_V2
-      metadata[0].annotations["keel.sh/match-tag"],
-      spec[0].template[0].spec[0].container[0].image, # KEEL_IGNORE_IMAGE — Keel manages tag updates
-      metadata[0].annotations["kubernetes.io/change-cause"],
-      metadata[0].annotations["deployment.kubernetes.io/revision"],
-      spec[0].template[0].metadata[0].annotations["keel.sh/update-time"], # KEEL_LIFECYCLE_V1
-    ]
-  }
-}
-
-resource "kubernetes_service" "drone_logbook" {
-  metadata {
-    name      = "drone-logbook"
-    namespace = kubernetes_namespace.drone_logbook.metadata[0].name
-    labels = {
-      "app" = "drone-logbook"
-    }
-  }
-
-  spec {
-    selector = {
-      app = "drone-logbook"
-    }
-    port {
-      port        = "80"
-      target_port = "80"
-    }
-  }
-}
-
-# -----------------------------------------------------------------------------
-# Backup — required for every proxmox-lvm(-encrypted) app: daily copy of the
-# data volume to NFS /srv/nfs/drone-logbook-backup (picked up by nfs-mirror ->
-# sda -> Synology offsite). 01:30 = outside the 00:00/08:00/16:00 sync-import
-# windows, so the DuckDB file is quiescent; uploaded originals make even a
-# mid-write copy recoverable by re-import. Pod-affinity co-schedules with the
-# app pod (RWO volume mounts twice only on the same node). Vaultwarden pattern.
-# -----------------------------------------------------------------------------
-
-module "nfs_backup" {
-  source     = "../../modules/kubernetes/nfs_volume"
-  name       = "drone-logbook-backup-host"
-  namespace  = kubernetes_namespace.drone_logbook.metadata[0].name
-  nfs_server = var.nfs_server
-  nfs_path   = "/srv/nfs/drone-logbook-backup"
-}
-
-resource "kubernetes_cron_job_v1" "backup" {
-  metadata {
-    name      = "drone-logbook-backup"
-    namespace = kubernetes_namespace.drone_logbook.metadata[0].name
-  }
-  spec {
-    concurrency_policy            = "Replace"
-    failed_jobs_history_limit     = 5
-    schedule                      = "30 1 * * *"
-    starting_deadline_seconds     = 300
-    successful_jobs_history_limit = 3
-    job_template {
-      metadata {}
-      spec {
-        backoff_limit              = 3
-        ttl_seconds_after_finished = 10
-        template {
-          metadata {}
-          spec {
-            affinity {
-              pod_affinity {
-                required_during_scheduling_ignored_during_execution {
-                  label_selector {
-                    match_labels = {
-                      app = "drone-logbook"
-                    }
-                  }
-                  topology_key = "kubernetes.io/hostname"
-                }
-              }
-            }
-            container {
-              name  = "drone-logbook-backup"
-              image = "docker.io/library/alpine"
-              command = ["/bin/sh", "-c", <<-EOT
-                set -euxo pipefail
-                _t0=$(date +%s)
-                now=$(date +"%Y_%m_%d_%H_%M")
-                mkdir -p /backup/$now
-                cp -a /data/. /backup/$now/
-                # Rotate — 30 day retention
-                find /backup -maxdepth 1 -mindepth 1 -type d -mtime +30 -exec rm -rf {} +
-                _dur=$(($(date +%s) - _t0))
-                _out_bytes=$(du -sb /backup/$now | awk '{print $1}')
-                wget -qO- --post-data "backup_duration_seconds $${_dur}
-                backup_output_bytes $${_out_bytes}
-                backup_last_success_timestamp $(date +%s)
-                " "http://prometheus-prometheus-pushgateway.monitoring:9091/metrics/job/drone-logbook-backup" || true
-              EOT
-              ]
-              volume_mount {
-                name       = "data"
-                mount_path = "/data"
-                read_only  = true
-              }
-              volume_mount {
-                name       = "backup"
-                mount_path = "/backup"
-              }
-            }
-            volume {
-              name = "data"
-              persistent_volume_claim {
-                claim_name = kubernetes_persistent_volume_claim.data.metadata[0].name
-              }
-            }
-            volume {
-              name = "backup"
-              persistent_volume_claim {
-                claim_name = module.nfs_backup.claim_name
-              }
-            }
-            dns_config {
-              option {
-                name  = "ndots"
-                value = "2"
-              }
-            }
-          }
-        }
-      }
-    }
-  }
-  lifecycle {
-    # KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
-    ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config]
-  }
-}
-
-# https://dronelog.viktorbarzin.me
-module "ingress" {
-  source          = "../../modules/kubernetes/ingress_factory"
-  auth            = "required" # Authentik forward-auth — flight logs are GPS traces of home/travel
-  dns_type        = "proxied"
-  namespace       = kubernetes_namespace.drone_logbook.metadata[0].name
-  name            = "dronelog"
-  service_name    = "drone-logbook"
-  tls_secret_name = var.tls_secret_name
-  extra_annotations = {
-    "gethomepage.dev/enabled"      = "true"
-    "gethomepage.dev/name"         = "Drone Logbook"
-    "gethomepage.dev/description"  = "DJI flight log analyzer"
-    "gethomepage.dev/icon"         = "mdi-quadcopter"
-    "gethomepage.dev/group"        = "Media & Entertainment"
-    "gethomepage.dev/pod-selector" = ""
-  }
-}
-```
-
- [ ] **Step 3.3: Boilerplate**
-
-```bash
-ln -s ../../secrets ~/code/infra/.worktrees/drone-logbook/stacks/drone-logbook/secrets
-```
-
- [ ] **Step 3.4: Format check**
-
-```bash
-terraform fmt -check -diff $WT/stacks/drone-logbook/ || terraform fmt $WT/stacks/drone-logbook/
-```
-
-Expected: no diff (or auto-fixed).
-
- [ ] **Step 3.5: Commit on the branch (files by name, git-crypt filter flags per execution.md)**
-
-```bash
-git -C $WT -c filter.git-crypt.smudge=cat -c filter.git-crypt.clean=cat -c filter.git-crypt.required=false \
-  add docs/plans/2026-07-04-drone-logbook-design.md docs/plans/2026-07-04-drone-logbook-plan.md \
-      stacks/drone-logbook/main.tf stacks/drone-logbook/terragrunt.hcl stacks/drone-logbook/secrets \
-      .claude/reference/service-catalog.md
-git -C $WT -c filter.git-crypt.smudge=cat -c filter.git-crypt.clean=cat -c filter.git-crypt.required=false \
-  commit -m "drone-logbook: new stack — self-hosted Open DroneLog at dronelog.viktorbarzin.me
-
-Viktor asked to self-host the DJI flight-log analyzer for his DJI Mini 4 Pro
-(fork ViktorBarzin/drone-logbook -> upstream arpanghosh8453/open-dronelog).
-Upstream ghcr image with Keel auto-upgrade, DuckDB data on proxmox-lvm PVC,
-NFS /sync-logs drop folder auto-imported every 8h, Authentik-gated ingress,
-PROFILE_CREATION_PASS from Vault via ESO. Design + plan in docs/plans/."
-```
-
-### Task 4: Land and apply
-
- [ ] **Step 4.1: Presence claim** (CI apply mutates shared infra)
-
-```bash
-~/code/scripts/presence claim infra:drone-logbook --purpose "deploy new drone-logbook stack (Open DroneLog) via CI apply"
-```
-
- [ ] **Step 4.2: Merge latest master into the branch, push to master**
-
-```bash
-git -C $WT -c filter.git-crypt.smudge=cat -c filter.git-crypt.clean=cat -c filter.git-crypt.required=false fetch forgejo
-git -C $WT -c filter.git-crypt.smudge=cat -c filter.git-crypt.clean=cat -c filter.git-crypt.required=false merge forgejo/master
-git -C $WT -c filter.git-crypt.smudge=cat -c filter.git-crypt.clean=cat -c filter.git-crypt.required=false push forgejo HEAD:master
-```
-
-Non-fast-forward → another agent landed first: fetch, merge, push again. Branch-protection rejection → fall back to PR via Forgejo API (token = password in `~/.git-credentials`).
-
- [ ] **Step 4.3: Watch the CI apply to completion** — Woodpecker pipeline on the infra repo (`ci.viktorbarzin.me`), then confirm live:
-
-```bash
-kubectl get ns drone-logbook && kubectl -n drone-logbook get deploy,pvc,pods,externalsecret,cronjob
-kubectl -n drone-logbook rollout status deploy/drone-logbook --timeout=300s
-```
-
-Expected: namespace present, ExternalSecret `SecretSynced`, data PVC `Bound` (the NFS PVCs bind on first pod/job use), CronJob `drone-logbook-backup` scheduled `30 1 * * *`, pod `Running 1/1`.
-
- [ ] **Step 4.4: Cleanup worktree + branch; release presence**
-
-```bash
-git -C ~/code/infra worktree remove .worktrees/drone-logbook
-git -C ~/code/infra branch -d wizard/drone-logbook
-git -C ~/code/infra pull --ff-only   # only if main checkout clean/quiescent
-~/code/scripts/presence release infra:drone-logbook
-```
-
-### Task 5: End-to-end verification
-
- [ ] **Step 5.1: Ingress + Authentik gate**
-
-```bash
-curl -sI https://dronelog.viktorbarzin.me | head -5
-```
-
-Expected: `302` redirect into Authentik (NOT `200`, NOT `404`).
-
- [ ] **Step 5.2: App alive behind the gate** (bypass ingress via port-forward, read-only debug)
-
-```bash
-kubectl -n drone-logbook port-forward svc/drone-logbook 18080:80 &
-sleep 2 && curl -s -o /dev/null -w '%{http_code}\n' http://127.0.0.1:18080/ && kill %1
-```
-
-Expected: `200`.
-
- [ ] **Step 5.3: Sync folder visible in-pod**
-
-```bash
-kubectl -n drone-logbook exec deploy/drone-logbook -- ls -ld /sync-logs /data/drone-logbook
-```
-
-Expected: both directories listed; `/sync-logs` read-only mount.
-
- [ ] **Step 5.4: Monitor + homepage** — Uptime Kuma external monitor for `dronelog.viktorbarzin.me` auto-created (ingress annotation); homepage tile under "Media & Entertainment".
-
- [ ] **Step 5.5: Functional import** — Viktor uploads a real Mini 4 Pro `.txt` log via the web UI (or drops it in `/srv/nfs/drone-logbook/sync-logs`); confirms flight appears with charts/map. Requires pod egress to DJI once per new log (decryption key). If an upstream sample log is available, the agent may pre-verify import via the REST API through the port-forward.
--- a/docs/plans/2026-07-04-immich-frame-lan-only-design.md
+++ b/docs/plans/2026-07-04-immich-frame-lan-only-design.md
@ -1,125 +0,0 @@
-# immich-frame: LAN-only access, Portals untouched (2026-07-04)
-
-## Goal
-
-Strangers must no longer be able to view `highlights-immich.viktorbarzin.me`
-(Viktor's London Portal Plus frame) or `highlights-immich-emo.viktorbarzin.me`
-(Emo's Sofia Portal Mini frame) — pages or ImmichFrame API. Both were
-`auth = "none"`, Cloudflare-proxied, fully public.
-
-Who keeps access (per Viktor, this session): the two Portals plus **any
-household device on the Sofia, London, or Valchedrym home networks**. No
-public access, no tailnet requirement. Hard constraint: the Portal app is a
-WebView with the URL **baked in at APK build time** (`portal-immich-frame`,
-`-PframeUrl`), so the exact URLs must keep loading from where the Portals sit
-— zero app rebuilds, zero device touches, zero router changes.
-
-## Design
-
-Two cooperating pieces — the gate and the reachability pointer:
-
-1. **The gate — `home-lans-only` Traefik middleware** (traefik stack, next to
-   `local-only`): `ipAllowList` of `192.168.1.0/24` (Sofia LAN), `10.0.0.0/8`
-   (VLANs, K8s pods `10.10.0.0/16`, services `10.96.0.0/12`, WG tunnel
-   `10.3.2.0/24`), `192.168.8.0/24` (London LAN), `192.168.9.0/24` (London
-   GUEST net — post-rollout discovery: the Portal Plus actually leases here,
-   `Portal-75AE8F9C2A8A` = `192.168.9.198`, added same day), `192.168.0.0/24`
-   (Valchedrym LAN), `fc00::/7`, `fe80::/10`. Attached to both frame
-   ingresses via `extra_middlewares`. Everyone else gets a Traefik 403 —
-   including direct-to-WAN-IP requests carrying the right SNI, which DNS
-   changes alone cannot stop. A **separate** middleware rather than a widened
-   `local-only`, because widening would silently grant the remote LANs access
-   to the 9 admin surfaces using it (Prometheus, iDRAC, Loki, …).
-
-2. **The pointer — `dns_type = "internal"`** (new `ingress_factory` tier,
-   Viktor's idea): a **non-proxied public A record → `10.0.20.203`** (module
-   var `internal_lb_ip`). Outsiders resolve it but get an unroutable RFC1918
-   address; every household resolver path delivers a working answer with no
-   config anywhere: Sofia LAN already gets the internal CNAME from Technitium,
-   London/Valchedrym resolve the public record via any upstream and
-   policy-route `10.0.0.0/8` down the WireGuard tunnel. IPv4-only (spokes
-   route no internal v6 range).
-
-Interlock (the reason both flip together): with a *proxied* record, public
-traffic arrives from cloudflared **pod IPs inside 10/8** and would sail
-through the allowlist. `internal` removes the Cloudflare path entirely (CF
-edge stops serving the hostname), so every request reaches Traefik with its
-real source IP (ETP=Local). Verified: no wildcard `*.viktorbarzin.me` record
-exists to resurrect public resolution.
-
-`auth` stays `"none"` — there is still no *user* auth by design (kiosk
-WebView; forward-auth would 302 the device to a login it can't complete, and
-emo's Google-only account can't log in inside a WebView at all); the
-convention comment now names the ipAllowList as the gate.
-
-### Resulting flows
-
-| Client | Path | Result |
-|---|---|---|
-| Emo's Portal Mini (Sofia LAN) | Technitium CNAME → `.203` direct (unchanged) | allowed (`192.168.1.x`) |
-| Viktor's Portal Plus (London GUEST net) | public A → `10.0.20.203` → WG tunnel | allowed (`192.168.9.x`) |
-| Household browsers (any of the 3 LANs) | same as above | allowed |
-| In-cluster checks (`homelab browser`, blackbox) | CoreDNS → Technitium → `.203` | allowed (pod IP in 10/8) |
-| Stranger, resolves hostname | gets `10.0.20.203` | unroutable |
-| Stranger, hits WAN IP with SNI | pfSense NAT → Traefik (real source IP) | **403** |
-| Stranger, via Cloudflare | no proxied record | CF edge won't serve the host |
-
-### Rejected alternatives
-
- **ImmichFrame `AuthenticationSecret`** (supported upstream: web input field
-  or `?authsecret=` param + bearer API): real auth from anywhere, but family
-  browsers would face a secret prompt (fails "household devices just work"),
-  the secret leaks into URLs/analytics/APK, and robust rollout needs APK
-  rebuild + USB-adb sideload on both Portals (the Sofia one is high-friction).
- **Authentik forward-auth / `auth = "public"`**: WebView can't complete SSO
-  (Google blocks WebView logins; session expiry silently bricks an appliance);
-  the anonymous outpost is an audit trail, not a gate.
- **Remove DNS + London router AdGuardHome rewrites**: works, but adds an
-  out-of-band, un-IaC'd router dependency the internal-IP record makes
-  unnecessary. Kept as documented fallback if resolver-side private-IP
-  filtering ever appears in the London path.
-
-## Pre-verified facts (2026-07-04)
-
- London Flint 2 DNS chain returns RFC1918 answers unfiltered
-  (`nslookup 10.0.20.203.nip.io 127.0.0.1` on the router → `10.0.20.203`;
-  dnsmasq `rebind_protection '0'`, no AdGuardHome rebind filtering).
- Technitium already CNAMEs both hostnames → apex → `10.0.20.203`
-  (`technitium-ingress-dns-sync` is ingress-driven, not DNS-record-driven, so
-  the internal answer survives the Cloudflare record swap).
- Pod CIDR `10.10.0.0/16`, service CIDR `10.96.0.0/12` — inside `10.0.0.0/8`.
- No public wildcard record in the zone.
-
-## Blast radius & cleanups
-
- `external_monitor = false` set explicitly on both ingresses: the
-  external-monitor-sync default opt-in would otherwise keep the now-doomed
-  `[External] highlights-immich*` uptime-kuma monitors alive and red. Verify
-  the sync drops them post-apply.
- rybbit CF worker: `highlights-immich` removed from `SITE_IDS` (`index.js`)
-  and `wrangler.toml` routes — off Cloudflare the route can never fire.
-  Requires a `wrangler deploy` to take effect (route removal is hygiene, not
-  functional).
- Homepage dashboard link keeps working from LANs (hostname unchanged).
- Docs updated in the same change: `.claude/CLAUDE.md` (DNS tier +
-  external-monitor mechanism), `AGENTS.md`, `docs/architecture/networking.md`
-  (Internal-IP domains category). The `portal-immich-frame` repo's glossary
-  ("public, login-less URL") updated separately in that repo.
-
-## Failure-mode delta
-
-London frame now depends on the WG tunnel instead of Cloudflare+cloudflared
-(the app self-heals with 5s retries; tunnel-flap modes documented in
-`docs/architecture/vpn.md`). A Traefik LB renumber must update
-`internal_lb_ip` in the module alongside the split-horizon apex record.
-Cutover window: cached proxied answers keep working ≤ ~5 min TTL, then the
-WebView's own retry picks up the new path.
-
-## Verification & rollback
-
-Verify: public dig → `10.0.20.203` (both hosts); Technitium dig → `.203`;
-curl from devvm (10/8) → 200; external vantage (WebFetch/cloud) → unreachable
-or 403; middleware attached on both ingresses; Emo's frame renders via
-`homelab browser`; London Portal image fetches visible in Traefik access logs
-from `192.168.8.x`. Rollback: `git revert` + apply traefik/immich — records
-and middleware chain restore (`allow_overwrite = true` re-adopts the records).
--- a/docs/post-mortems/2026-06-22-devvm-mem-io-overload-containment.md
+++ b/docs/post-mortems/2026-06-22-devvm-mem-io-overload-containment.md
@ -129,40 +129,3 @@ heavy user between 12–16G even with RAM free; bump to 16/20 if that bites.
  storm also neuters oomd. earlyoom (free-RAM threshold, swap-independent) is the
  correct pairing. A famous tool that "does OOM" still has to be proven to fire
  under *your* configuration.
-
-## Addendum (2026-07-02): the MemoryHigh throttle band livelocks — removed
-
-The soft-cap layer of this design was falsified in production on 2026-07-02
-(~15:42–16:35 UTC): an agent-spawned `ugrep` (12.35G RSS; `-o` with wide
-alternation captures over a multi-GB `.jsonl` transcript) **plateaued inside
-t3-serve@wizard's `MemoryHigh=12G..MemoryMax=16G` band**. With
-`MemorySwapMax=0` its anonymous pages were unreclaimable, so the kernel parked
-every allocating task of the cgroup in `mem_cgroup_handle_over_high`
-(`memory.pressure full avg60 ≈ 80%`, `memory.events high=882948`, `oom_kill=0`)
-— including the `t3 serve` event loop (~0.5G RSS, pure collateral). The accept
-queue backed up (21 pending connections), t3-probe logged `t3serve: [Errno 104]
-Connection reset by peer`, t3-dispatch logged `proxy error: context canceled`,
-and t3.viktorbarzin.me was dead for its user until the hog was SIGKILLed by
-hand (the D-state high-throttle sleep IS killable; the cgroup dropped 14G→1.4G
-and the service recovered in seconds with no restart).
-
-The Verification bullet above — a soft-capped balloon "throttled to a crawl,
-making no progress and **harming nothing**" — holds only when the hog is alone
-in its cgroup. Sharing the cgroup with a latency-sensitive server, the crawl
-IS the harm: a hog that stabilises below `MemoryMax` never triggers the local
-OOM the design counted on, so the band converts "runaway dies" into "everyone
-in the cgroup stalls forever".
-
-**Fix (same day, admin-approved): `MemoryHigh=infinity` on all three work
-cgroup definitions** — `scripts/t3-serve@.service`, the `user-.slice.d`
-drop-in, and `docker.slice` (`setup-devvm.sh` §10a/§10c). A runaway now runs
-unthrottled into `MemoryMax` and is cgroup-OOM-killed immediately
-(`OOMPolicy=continue` keeps t3-serve itself alive; in slices the kernel kills
-the biggest task). `MemoryMax`, `MemorySwapMax=0`, and earlyoom — the layers
-the stress tests actually validated — are unchanged. Applied live via
-`daemon-reload` + runtime `set-property` on the running cgroups; no session
-restarts.
-
-Lesson: **with `swap=0`, `memory.high` is not a gentler `memory.max` — it is
-an unbounded stall injector for everything sharing the cgroup.** Cap-and-kill
-beats throttle-and-pray for multi-tenant interactive services.
--- a/docs/runbooks/paperless-mail-ingest.md
+++ b/docs/runbooks/paperless-mail-ingest.md
@ -1,135 +0,0 @@
-# Paperless-ngx Mail Ingest (docs@viktorbarzin.me)
-
-Last updated: 2026-07-03 (initial build)
-
-Forward any email with document attachments to **`docs@viktorbarzin.me`** and
-paperless-ngx ingests the attachments, owned by the paperless account mapped
-from the **sender** (From) address. Built entirely from existing parts: a
-docker-mailserver mailbox + Dovecot sieve, and paperless-ngx's native mail
-consumer (the same machinery as the `utility:` rules).
-
-## Flow
-
-```
-family member forwards email ──> MX ──> docker-mailserver
-    │  postfix virtual: docs@ has an explicit self-alias (extra/aliases.txt),
-    │  so the @domain catch-all (→ spam@, swept by TripIt) does NOT apply
-    ▼
-Dovecot LMTP delivery to docs@
-    │  per-user sieve (docs@viktorbarzin.me.dovecot.sieve): sender NOT in
-    │  allowlist → discard (decision 2026-07-03: unmatched = ignore & delete)
-    ▼
-docs@ INBOX ── paperless-ngx mail task (every 10 min, PAPERLESS_EMAIL_TASK_CRON
-    │          default) applies mail rules in order: filter_from = <sender>
-    │          → consume attachments (ALL parts incl. inline — see design
-    │          notes: Apple Mail marks real PDFs inline), owner = mapped user,
-    │          tag = email-ingest, title = mail subject
-    ▼
-consumed mail is MOVED to the "Processed" IMAP folder (audit trail);
-INBOX stays empty in steady state
-```
-
-## Sender → paperless account map (as built)
-
-| Sender (From)            | Paperless user | Rule            |
-|--------------------------|----------------|-----------------|
-| me@viktorbarzin.me       | root (id 3)    | forward: Viktor (me@) |
-| vbarzin@gmail.com        | root (id 3)    | forward: Viktor (gmail) |
-| viktorbarzin@meta.com    | root (id 3)    | forward: Viktor (meta) |
-| ancaelena98@gmail.com    | anca (id 4)    | forward: Anca   |
-| emil.barzin@gmail.com    | emo (id 7)     | forward: Emo    |
-
-The map lives in **two places by design** — keep them in sync:
-
-1. **Delivery gate (infra, Terraform):**
-   `stacks/mailserver/modules/mailserver/extra/docs-at-viktorbarzin.me.dovecot.sieve`
-   — senders not listed here are discarded at delivery (spam control + the
-   "ignore and delete unmatched" behaviour; paperless cannot express
-   "delete without ingesting", so this must happen before the mailbox).
-2. **Owner map (paperless DB, via API/UI):** one mail rule per sender on the
-   `docs@viktorbarzin.me` mail account. DB-state like workflows — NOT
-   Terraform.
-
-## Add a family member / sender
-
-1. Add the address to the sieve allowlist file above; commit; apply the
-   `mailserver` stack (normal apply is enough — the sieve CM key is not under
-   `ignore_changes`; Reloader restarts the pod).
-2. Clone an existing `forward:` mail rule in the paperless admin UI
-   (Mail → Rules) or via API, changing `filter_from` and the rule **owner**
-   (documents are owned by the rule owner — `assign_owner_from_rule=true`).
-   Keep: action = Move to `Processed`, attachment type = **process all files
-   including inline** (`attachment_type=2` — NOT attachments-only, see design
-   notes), consumption scope = attachments only, tag `email-ingest`, order
-   after the existing rules.
-
-## Operations
-
- **Trigger a fetch immediately** (instead of waiting ≤10 min):
-  `kubectl -n paperless-ngx exec deploy/paperless-ngx -c paperless-ngx -- s6-setuidgid paperless python3 manage.py mail_fetcher`
-  The `s6-setuidgid paperless` is **required**: `kubectl exec` runs as root, and a
-  root-run fetcher downloads attachments root-owned into the scratch dir, which
-  the celery consumer (uid 1000) then can't read — `PermissionError` on
-  `/tmp/paperless/paperless-mail-*/...`, consume task FAILURE (hit during the
-  2026-07-03 build E2E). The mail correctly stays in INBOX for retry (the move
-  action is a chord callback on successful consumption). Recover: `rm -rf
-  /tmp/paperless/paperless-mail-*` (as root) and let the next scheduled fetch
-  re-process.
- **Mailbox credentials:** Vault `secret/platform` → `mailserver_accounts`
-  JSON, key `docs@viktorbarzin.me` (also used by the paperless mail account).
- **Inspect the mailbox:**
-  `python3 -c` IMAP to `mailserver.mailserver.svc.cluster.local:993` (in-cluster,
-  from a pod) or `mail.viktorbarzin.me:993` (externally / devvm).
- **Paperless-side logs:** `kubectl -n paperless-ngx logs deploy/paperless-ngx | grep -i mail`
-  (also Loki, ns `paperless-ngx`). Rule/account state: `GET /api/mail_rules/`,
-  `GET /api/mail_accounts/` with the admin token
-  (k8s secret `paperless-ngx-secrets`, field `api_token`).
- **Account/mailbox provisioning:** adding/rotating anything in
-  `mailserver_accounts` requires the ConfigMap replace workaround —
-  `scripts/tg apply mailserver -- -replace=module.mailserver.kubernetes_config_map.mailserver_config`
-  — because `postfix-accounts.cf` is under `ignore_changes`
-  (non-deterministic bcrypt; see the module comment).
-
-## Design notes / caveats
-
- **Why not the catch-all?** Mail to unknown `@viktorbarzin.me` addresses
-  lands in `spam@`, which the TripIt `ingest-plans` CronJob sweeps every
-  15 min: it marks everything `\Seen`, LLM-parses mail from linked senders and
-  replies with ack/failure emails. Forwarded bank statements would get
-  "couldn't parse a trip" replies. `docs@` being a real mailbox bypasses that
-  path entirely; TripIt, the `smoke-test@` roundtrip probe, and `dmarc@` are
-  untouched.
- **Spoofing:** the sender match is on the From header. Rspamd verifies
-  SPF/DKIM/DMARC on inbound mail, but gmail.com publishes `p=none`, so a
-  crafted spoof could ingest documents into a family member's account. Accepted
-  risk (worst case: unwanted documents appear, visible + deletable in
-  paperless).
- **Not PDF-only:** any attachment type paperless supports is consumed
-  (PDF, images, Office via the existing tika+gotenberg pipeline).
- **Inline attachments ARE processed (`attachment_type=2`, flipped
-  2026-07-03):** the rules originally used attachments-only (1) to skip
-  signature logos, but the very first real forward (Apple Mail, Viktor's
-  client) attached the invoice PDF with `Content-Disposition: inline` —
-  paperless matched the rule, consumed nothing, and recorded
-  `PROCESSED_WO_CONSUMPTION` (which, like any ProcessedMail row, blocks that
-  UID from ever being re-processed — delete the row via `manage.py shell` to
-  retry). Trade-off: signature/inline images in forwards may be ingested as
-  junk docs (tagged `email-ingest`, easy to spot). If that gets noisy, add
-  `filter_attachment_filename_exclude` patterns to the rules using the
-  actually-observed junk filenames — do NOT flip back to attachments-only.
- **No dedicated alerting** (deliberate, 2026-07-03): mail-task errors surface
-  in paperless logs; the mailserver inbound path is covered by
-  `email-roundtrip-monitor`. Revisit if forwards start silently failing.
- **Workflows:** the global `payslip-webhook` + `claude-mcp-readers
-  auto-permission` workflows fire for mail-ingested docs like any other
-  consumption source (verified pre-build; payslip receiver does its own
-  filtering).
-
-## Rollback
-
-1. Disable/delete the 5 `forward:` mail rules + the `docs@` mail account
-   (paperless admin UI or API).
-2. Revert the infra commit (aliases.txt entry, sieve file, CM key + mount).
-3. Remove `docs@viktorbarzin.me` from Vault `mailserver_accounts`, then apply
-   with the `-replace` workaround above. Mail to docs@ then falls back to the
-   catch-all (spam@) like any unknown address.
--- a/docs/runbooks/t3-drop-attribution.md
+++ b/docs/runbooks/t3-drop-attribution.md
@ -109,17 +109,10 @@ rate(node_pressure_memory_stalled_seconds_total{instance="devvm"}[5m])
 node_memory_SwapFree_bytes{instance="devvm"}
 ```

-Guardrails in place (2026-06-10, hardened 2026-07-02; `scripts/t3-serve@.service`):
-per-unit `MemoryMax=16G`, `MemorySwapMax=0`, `OOMPolicy=continue`, and
-`MemoryHigh=infinity` — deliberately NO soft throttle band. With swap=0, a hog
-plateauing between high and max never OOMs and the kernel high-throttle stalls
-the whole unit: a 12.3G agent `ugrep` livelocked t3-serve@wizard for ~50min on
-2026-07-02 (signature: probe `t3serve` leg `Connection reset by peer`, dispatch
-`proxy error: context canceled`, server D-state in `mem_cgroup_handle_over_high`,
-`ss` backlog on the serve port; fix: SIGKILL the hog — the D-state is killable).
-A runaway agent now OOMs alone at 16G inside the cgroup instead of throttling
-the WS server with it. Post-mortem addendum:
-`docs/post-mortems/2026-06-22-devvm-mem-io-overload-containment.md`.
+Guardrails in place (2026-06-10, `scripts/t3-serve@.service`): per-unit
+`MemoryHigh=12G`, `MemoryMax=16G`, `MemorySwapMax=0`, `OOMPolicy=continue` —
+a runaway agent now OOMs alone inside the cgroup instead of taking the box
+(and the WS server) with it.

 ## 4. Known root causes (2026-06-10 investigation)

--- a/docs/runbooks/valia-sites.md
+++ b/docs/runbooks/valia-sites.md
@ -1,98 +0,0 @@
-# Valia sites — add / update / retire
-
-Off-infra static sites authored by Valia (ADR-0018, CONTEXT.md "Valia site").
-Serving: Cloudflare Pages. Freshness: the `valia-sites-sync` CronJob
-(`valia-sites` ns) mirrors each Content folder every 10 minutes and deploys
-only when the folder's manifest hash changed. Registry: `local.sites` in
-`stacks/valia-sites/main.tf` — one entry per site drives everything (Pages
-project, custom domain, public CNAME, internal split-horizon CNAME, sync).
-
-Current sites: `bridge` (ОбУ „Отец Паисий“ — "мост"), `stem95su` (95. СУ STEM
-board).
-
-## Add a site
-
-1. Valia shares the Drive folder with **vbarzin@gmail.com** (viewer is enough —
-   the pipeline is strictly read-only towards Drive).
-2. Get the folder id from its URL (`drive.google.com/drive/folders/<ID>`).
-3. Pick the **English** subdomain name (Viktor's call — CONTEXT.md naming rule).
-4. Add one entry to `local.sites` in `stacks/valia-sites/main.tf`:
-
-   ```hcl
-   <name> = {
-     folder_id  = "<ID>"
-     src_path   = ""            # or "sub/folder" if servable files live deeper
-     entry_file = "index.html"  # or whatever her main HTML file is called
-     manage_dns = true
-   }
-   ```
-
-5. Commit + push; CI applies. Within ~10 min the sync deploys content and the
-   site serves at `https://<name>.viktorbarzin.me` (custom-domain TLS takes
-   ~5–10 min extra on first attach — CF returns 522 for the hostname until
-   then). Internal LAN/VLAN/pod resolution appears when the hourly
-   `technitium-ingress-dns-sync` next runs — trigger it early with:
-   `kubectl create job --from=cronjob/technitium-ingress-dns-sync valia-dns-now -n technitium`
-
-## Content rules (what Valia's folder must look like)
-
- The **entry file** must exist — the sync stages a copy as `index.html` at
-  deploy time, so `/` works; the original filename keeps working too (deep
-  links survive). If the folder is empty or the entry file is missing, the
-  sync **skips the site and leaves it as-is** (never wipes a live site).
- Google-native files (Docs/Sheets) are **ignored** (`--drive-skip-gdocs`) —
-  only real files (`.html`, images, …) deploy. Gemini's HTML exports are fine.
- Per-file limit 25 MB (Cloudflare Pages), 20k files max — far beyond a
-  1-page site.
-
-## Update a site
-
-Nothing to do: Valia edits the folder, the site follows within ~10 minutes.
-Force it early: `kubectl create job --from=cronjob/valia-sites-sync sync-now -n valia-sites`
-
-## Rename / retire a site
-
-Rename = retire + add (Pages projects can't be renamed). Retire:
-
-1. Delete the entry from `local.sites`; commit + push. TF destroys the public
-   CNAME + custom domain + Pages project; the internal record is removed by
-   the next `technitium-ingress-dns-sync` run (its deletion pass drops any
-   internal `*.pages.dev` CNAME that left the `valia-sites-dns` ConfigMap —
-   scoped so it can never touch non-Pages records).
-2. That's all — no manual DNS cleanup (the pre-ADR-0018 add-only gotcha is
-   fixed by the deletion pass).
-
-## Failure modes / debugging
-
- **Visibility is failed-Job-only by choice** (ADR-0018): no alerts, no
-  notifications. Check: `kubectl get jobs -n valia-sites | tail`, logs of the
-  last `valia-sites-sync-*` pod.
- **Drive auth broken** (`FATAL … Drive list failed`): the shared
-  `secret/valia-sites.rclone_conf` token died. The GCP OAuth app
-  (`home-lab-1700868541205`) must stay published to "Production" or refresh
-  tokens expire weekly (same constraint as the old stem95su conf, which this
-  one was copied from). Re-mint and `vault kv patch secret/valia-sites
-  rclone_conf=@…`.
- **Wrangler auth broken**: `secret/valia-sites.cloudflare_pages_token` is a
-  SCOPED token (Pages Read+Write on the account, id
-  `355d2c9d11579bdad1e9498dafca30d5`) — re-mint via
-  `POST /user/tokens` with the Global API Key (`secret/platform`), patch
-  Vault. Do NOT put the Global API Key in the pod.
- **Site serves stale content**: check the state CM
-  (`kubectl get cm valia-sites-state -n valia-sites -o yaml`) — deleting a
-  site's key forces a redeploy on the next run.
- **`GUARD … skipping`** in logs: Valia's folder is empty or renamed the
-  entry file — the site deliberately kept its last content. Fix the folder or
-  update `entry_file`.
-
-## History
-
- stem95su served in-cluster (nginx + NFS + its own rclone CronJob) until
-  2026-07-03, when it was cut over to this pattern and the old stack retired
-  (ADR-0018). The blocking 42.9 MB `stem_video.mp4` was compressed to 21.4 MB
-  (same 1080p, ~2.5 Mbps H.264) and replaced in Valia's folder with Viktor's
-  explicit one-time OK. `secret/stem95su` is superseded by
-  `secret/valia-sites`; `/srv/nfs/stem-site` on the PVE host is a harmless
-  leftover.
- bridge started as a hand-deployed wrangler experiment (2026-07-03, memory
-  id 7085) and was adopted into the stack the same day.
--- a/docs/runbooks/vault-token-renew-devvm.md
+++ b/docs/runbooks/vault-token-renew-devvm.md
@ -82,48 +82,33 @@ tail -5 ~/.local/state/vault-token-renew.log              # recent results
 A healthy log line looks like:
 `<ts> OK renewed (dn=token-devvm-wizard ttl=2764800s)` (ttl 2764800s = 768h).

-After an OIDC login you'll instead see, at the next nightly run:
-`<ts> HEALED: re-minted periodic token from foreign dn=oidc-… (revoked N stale periodic token(s))`
-— that's the self-heal working as designed.
-
-## Drift guard & self-heal
+## Drift guard & recovery

 `~/.vault-token` is the Vault CLI's default token sink, so **any** `vault login`
 overwrites it. Two confirmed clobber vectors:

 1. `vault login -method=oidc` → replaces it with a 7-day OIDC token (the renewer
-   can't push past the OIDC role's 7-day `token_max_ttl`). The infra docs
-   prescribe this login before applies, so it recurs — it went unnoticed for
-   weeks twice (2026-06-18→26, 2026-06-29→07-03) and read as "Vault expires
-   weekly".
+   can't push past the OIDC role's 7-day `token_max_ttl`).
 2. A stray `vault login -method=kubernetes` (e.g. a headless agent flow) →
   writes a read-only `kubernetes-woodpecker-default` token (can read Vault but
-   **cannot** write `secret/*`). Happened 2026-06-05, unnoticed for two days.
+   **cannot** write `secret/*`). This happened 2026-06-05 and went unnoticed for
+   two days — reads worked, writes silently 403'd.

-Since 2026-07-03 the renewer **self-heals**
-(`docs/plans/2026-07-03-vault-token-self-heal-design.md`). On a foreign token
-it attempts the re-mint **with the clobbering token's own authority** and lets
-Vault's authz decide:
+To stop the renewer from silently keeping a foreign token alive, it runs a
+**drift guard** first: it refuses to renew unless the token is
+`token-devvm-wizard` **and** carries `vault-admin`. On drift it logs loudly and
+exits non-zero (the systemd unit goes `failed`) rather than renewing someone
+else's token. Symptom in the log:

- **Admin-capable clobber (OIDC login)** → re-mints the periodic token,
-  sanity-checks it against the drift guard, atomically replaces
-  `~/.vault-token`, revokes stale `token-devvm-wizard` leftovers
-  (anti-sprawl), logs
-  `HEALED: re-minted periodic token from foreign dn=… (revoked N stale periodic token(s))`
-  and exits 0. The clobbering token is NOT revoked — it may still back a live
-  login session; it ages out on its own.
- **Weak clobber (read-only k8s token)** → the mint is denied; logs
-  `DRIFT: … heal denied, foreign token lacks create authority …; investigate what wrote it`
-  and exits non-zero (unit `failed`). Deliberately loud: this signals a
-  misbehaving agent flow — exactly the 2026-06-05 case.
+`<ts> DRIFT: ~/.vault-token is dn=... policies=... Refusing to renew a foreign token. Re-mint: ...`

-**Manual recovery** is only needed for the weak-clobber case (the DRIFT log
-line still contains the exact command) — run the
-[mint/re-mint](#mint--re-mint-the-token) block.
+**Recovery: re-mint** (the DRIFT log line contains the exact command) — run the
+[mint/re-mint](#mint--re-mint-the-token) block. The drift guard detects but does
+**not** auto-recover (a deliberate scope choice — version-only, no self-heal);
+recovery is the manual re-mint above.

 ## Tests

-`infra/scripts/test-vault-token-renew.sh` unit-tests the drift-guard decision,
-the lookup-JSON parsers (including the exact 2026-06-05 woodpecker-clobber
-case), and the self-heal's revoke filter (which stale periodic tokens a heal
-may sweep). Run: `bash infra/scripts/test-vault-token-renew.sh`.
+`infra/scripts/test-vault-token-renew.sh` unit-tests the drift-guard decision
+and the lookup-JSON parsers (including the exact 2026-06-05 woodpecker-clobber
+case). Run: `bash infra/scripts/test-vault-token-renew.sh`.
--- a/modules/kubernetes/ingress_factory/main.tf
+++ b/modules/kubernetes/ingress_factory/main.tf
@ -127,29 +127,20 @@ variable "anti_ai_scraping" {
 variable "dns_type" {
  type        = string
  default     = "none"
-  description = <<-EOT
-    Cloudflare DNS: 'proxied' (CNAME to tunnel), 'non-proxied' (A/AAAA to
-    public IP), 'internal' (A to the internal Traefik LB IP — resolvable from
-    any resolver but only ROUTABLE from home LANs / WG sites / VPN; the record
-    is a reachability pointer, NOT a gate: pair it with an ipAllowList via
-    extra_middlewares, e.g. traefik-home-lans-only@kubernetescrd, because
-    direct-to-WAN-IP requests with the right SNI still hit Traefik), or 'none'.
-  EOT
+  description = "Cloudflare DNS: 'proxied' (CNAME to tunnel), 'non-proxied' (A/AAAA to public IP), or 'none'"
  validation {
-    condition     = contains(["proxied", "non-proxied", "internal", "none"], var.dns_type)
-    error_message = "dns_type must be 'proxied', 'non-proxied', 'internal', or 'none'."
+    condition     = contains(["proxied", "non-proxied", "none"], var.dns_type)
+    error_message = "dns_type must be 'proxied', 'non-proxied', or 'none'."
  }
 }

 # Uptime Kuma external monitor: when true, annotate the ingress so the
 # external-monitor-sync CronJob creates a `[External] <name>` monitor pointing
-# at https://<host>. Null means "follow dns_type" — enabled when the ingress
-# has a PUBLIC DNS record (proxied or non-proxied; 'internal' records are not
-# externally reachable, so no external monitor).
+# at https://<host>. Null means "follow dns_type" — enabled when proxied.
 variable "external_monitor" {
  type        = bool
  default     = null
-  description = "Enable Uptime Kuma external monitor. null = auto (enabled when dns_type is 'proxied' or 'non-proxied')."
+  description = "Enable Uptime Kuma external monitor. null = auto (enabled when dns_type == 'proxied')."
 }

 variable "external_monitor_name" {
@ -180,15 +171,6 @@ variable "public_ipv6" {
  default = "2001:470:6e:43d::2"
 }

-# Internal Traefik LB IP used by dns_type = "internal" records. Tracks the
-# dedicated MetalLB IP from stacks/traefik (ETP=Local). A future LB renumber
-# must update this default alongside the split-horizon apex record — see
-# docs/plans/2026-05-30-traefik-dedicated-ip-etp-local-*.
-variable "internal_lb_ip" {
-  type    = string
-  default = "10.0.20.203"
-}
-
 variable "homepage_group" {
  type    = string
  default = null # auto-detect from namespace
@ -219,10 +201,8 @@ locals {
  )

  # External monitor enabled by default when the ingress has a public DNS
-  # record (either CF-proxied or direct A/AAAA). 'internal' records resolve
-  # publicly but are unroutable from outside, so they get no external monitor.
-  # Explicit bool overrides.
-  effective_external_monitor = var.external_monitor != null ? var.external_monitor : (var.dns_type == "proxied" || var.dns_type == "non-proxied")
+  # record (either CF-proxied or direct A/AAAA). Explicit bool overrides.
+  effective_external_monitor = var.external_monitor != null ? var.external_monitor : (var.dns_type != "none")

  # Emit the annotation when effective is true (positive signal), or when the
  # caller explicitly set external_monitor=false (opt-out). When the caller
@ -444,19 +424,3 @@ resource "cloudflare_record" "non_proxied_aaaa" {
  zone_id         = var.cloudflare_zone_id
  allow_overwrite = true
 }
-
-# 'internal': a publicly-resolvable A record carrying the INTERNAL Traefik LB
-# IP. Outsiders resolve it but can't route to it; home-LAN/WG-site/VPN clients
-# reach Traefik directly (the WG spokes policy-route 10.0.0.0/8 through the
-# tunnel), so kiosk devices with baked-in URLs need no DNS overrides anywhere.
-# IPv4-only on purpose: the spokes route no internal IPv6 range.
-resource "cloudflare_record" "internal_a" {
-  count           = var.dns_type == "internal" ? 1 : 0
-  name            = local.dns_name
-  content         = var.internal_lb_ip
-  proxied         = false
-  ttl             = 1
-  type            = "A"
-  zone_id         = var.cloudflare_zone_id
-  allow_overwrite = true
-}
--- a/scripts/t3-serve@.service
+++ b/scripts/t3-serve@.service
@ -21,19 +21,12 @@ WorkingDirectory=/home/%i
 ExecStart=/usr/bin/t3 serve --host 0.0.0.0 --port ${T3_PORT} --base-dir /home/%i/.t3
 Restart=on-failure
 RestartSec=5
-# Memory containment (2026-06-10, amended 2026-07-02): agent children live in
-# this cgroup; a runaway agent (10.8G anon on a 23G host) swap-thrashed the
-# whole devvm — every >20s stall fires the t3 client watchdog (visible
-# "disconnects") — then global-OOMed. Cap the cgroup so a runaway OOMs early
-# and locally, and forbid swap so stalls can't smear into minutes-long freezes.
-# MemoryHigh is DELIBERATELY infinity — do not add a soft band below MemoryMax:
-# with swap=0 a hog that plateaus between high and max is unreclaimable but
-# never OOMs, and the kernel's high-throttle stalls EVERY task in the cgroup
-# (the t3 event loop included) indefinitely. A 12.3G agent ugrep livelocked
-# this unit for ~50min on 2026-07-02 exactly this way. Straight-to-OOM at
-# MemoryMax is the containment; OOMPolicy=continue below keeps the server up.
-# See docs/post-mortems/2026-06-22-devvm-mem-io-overload-containment.md addendum.
-MemoryHigh=infinity
+# Memory containment (2026-06-10): agent children live in this cgroup; a
+# runaway agent (10.8G anon on a 23G host) swap-thrashed the whole devvm —
+# every >20s stall fires the t3 client watchdog (visible "disconnects") —
+# then global-OOMed. Cap the cgroup so a runaway OOMs early and locally,
+# and forbid swap so stalls can't smear into minutes-long freezes.
+MemoryHigh=12G
 MemoryMax=16G
 MemorySwapMax=0
 # Default OOMPolicy=stop kills the WHOLE unit (8.5min outage 2026-06-10
--- a/scripts/test-vault-token-renew.sh
+++ b/scripts/test-vault-token-renew.sh
@ -1,11 +1,10 @@
 #!/usr/bin/env bash
-# Unit tests for the pure functions in vault-token-renew.sh.
-# Sources the script (vtr_main is guarded) and exercises (a) the drift-guard
-# decision — is ~/.vault-token OUR periodic admin token (renew) or a foreign
-# clobber (heal / fail loud)? — whose ABSENCE let the 2026-06-05 woodpecker
-# clobber be silently renewed for two days, and (b) the self-heal's revoke
-# filter — which stale token-devvm-wizard tokens a heal may sweep.
-# Run: bash infra/scripts/test-vault-token-renew.sh
+# Unit tests for the pure drift-guard functions in vault-token-renew.sh.
+# Sources the script (vtr_main is guarded) and exercises the decision logic that
+# decides whether ~/.vault-token is OUR periodic admin token (renew) or a foreign
+# token that clobbered the file (refuse, fail loud). This is exactly the logic
+# whose ABSENCE let the 2026-06-05 woodpecker-token clobber be silently renewed
+# for two days. Run: bash infra/scripts/test-vault-token-renew.sh
 set -uo pipefail
 DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
 # shellcheck source=/dev/null
@ -54,21 +53,5 @@ ok "ours: parse+decide renews"        vtr_drift_ok "$(vtr_display_name "$LOOKUP_
 no "woodpecker: parse+decide refused" vtr_drift_ok "$(vtr_display_name "$LOOKUP_WP")"   "$(vtr_policies_csv "$LOOKUP_WP")"
 no "oidc: parse+decide refused"       vtr_drift_ok "$(vtr_display_name "$LOOKUP_OIDC")" "$(vtr_policies_csv "$LOOKUP_OIDC")"

-# --- vtr_accessor: parse accessor out of lookup JSON ---
-LOOKUP_NEW='{"data":{"display_name":"token-devvm-wizard","accessor":"acc-new","policies":["default","sops-admin","vault-admin"],"identity_policies":null}}'
-eq "accessor parsed"          "acc-new" "$(vtr_accessor "$LOOKUP_NEW")"
-eq "accessor absent -> empty" ""        "$(vtr_accessor '{"data":{"display_name":"x"}}')"
-
-# --- vtr_is_stale_periodic: the heal's revoke filter — ONLY old token-devvm-wizard
-# --- tokens are swept; the just-minted token, foreign tokens, and anything with an
-# --- unknown accessor are kept. An empty keep-accessor sweeps NOTHING (fail-safe).
-STALE_OURS='{"data":{"display_name":"token-devvm-wizard","accessor":"acc-old","policies":["default","sops-admin","vault-admin"]}}'
-ok "older periodic token is stale"      vtr_is_stale_periodic "$STALE_OURS" "acc-new"
-no "the just-minted token is kept"      vtr_is_stale_periodic "$LOOKUP_NEW" "acc-new"
-no "foreign oidc token never swept"     vtr_is_stale_periodic "$LOOKUP_OIDC" "acc-new"
-no "woodpecker token never swept"       vtr_is_stale_periodic "$LOOKUP_WP" "acc-new"
-no "missing accessor never swept"       vtr_is_stale_periodic '{"data":{"display_name":"token-devvm-wizard"}}' "acc-new"
-no "empty keep-accessor sweeps nothing" vtr_is_stale_periodic "$STALE_OURS" ""
-
 printf '\n%d passed, %d failed\n' "$pass" "$fail"
 (( fail == 0 ))
--- a/scripts/vault-token-renew.sh
+++ b/scripts/vault-token-renew.sh
@ -45,94 +45,6 @@ vtr_drift_ok() {
  printf ',%s,' "$pols" | grep -q ",$REQUIRED_POLICY," || return 1
 }

-# vtr_accessor <lookup-json> -> the token accessor (empty if absent).
-vtr_accessor() {
-  printf '%s' "$1" | jq -r '.data.accessor // ""'
-}
-
-# vtr_is_stale_periodic <lookup-json> <keep-accessor> -> 0 if this lookup
-# describes one of OUR periodic tokens (display name matches) that is NOT the
-# one to keep — i.e. a stale leftover a heal should revoke. 1 otherwise.
-# Name-only on purpose (no policy check): anything named token-devvm-wizard
-# that isn't the current token is garbage from a previous mint. An empty
-# keep-accessor sweeps NOTHING (fail-safe: never revoke when we don't know
-# which token is current).
-vtr_is_stale_periodic() {
-  local dn acc
-  [ -n "${2:-}" ] || return 1
-  dn=$(vtr_display_name "$1")
-  acc=$(vtr_accessor "$1")
-  [ "$dn" = "$EXPECTED_DN" ] || return 1
-  [ -n "$acc" ] || return 1
-  [ "$acc" != "$2" ]
-}
-
-# vtr_heal <foreign-dn> <log-file> -> 0 if ~/.vault-token was re-minted back to
-# our periodic admin token using the foreign token's own authority, 1 if the
-# heal was denied or failed (caller exits non-zero; the unit goes failed).
-#
-# Self-heal added 2026-07-03 (docs/plans/2026-07-03-vault-token-self-heal-design.md):
-# an OIDC login — which the infra docs prescribe before applies — clobbers
-# ~/.vault-token with a 7-day token, and detect-only drift left that unnoticed
-# for weeks (the weekly-expiry loop). We ATTEMPT the re-mint with the
-# clobbering token itself and let Vault's authz decide — a read-only clobber
-# (the 2026-06-05 woodpecker incident) is denied the mint and stays a loud
-# failure, because it signals a misbehaving flow that someone should look at.
-vtr_heal() {
-  local foreign_dn="$1" log="$2"
-  local errf new_token new_info new_dn new_pols new_acc tmp
-  errf=$(mktemp)
-  if ! new_token=$(vault token create -orphan -period=768h \
-        -policy=vault-admin -policy=sops-admin -display-name=devvm-wizard \
-        -field=token 2>"$errf") || [ -z "$new_token" ]; then
-    printf '%s DRIFT: ~/.vault-token is dn=%q — heal denied, foreign token lacks create authority (%s); investigate what wrote it. Manual re-mint: vault login -method=oidc && vault token create -orphan -period=768h -policy=vault-admin -policy=sops-admin -display-name=devvm-wizard -field=token > ~/.vault-token && chmod 600 ~/.vault-token\n' \
-      "$(date -Is)" "$foreign_dn" "$(tr '\n' ' ' <"$errf")" >>"$log"
-    rm -f "$errf"
-    return 1
-  fi
-  rm -f "$errf"
-
-  # Sanity: the minted token must itself pass the drift guard before it may
-  # replace ~/.vault-token.
-  if ! new_info=$(VAULT_TOKEN="$new_token" vault token lookup -format=json 2>&1); then
-    printf '%s FAIL: heal minted a token but its lookup failed: %s\n' \
-      "$(date -Is)" "$new_info" >>"$log"
-    return 1
-  fi
-  new_dn=$(vtr_display_name "$new_info")
-  new_pols=$(vtr_policies_csv "$new_info")
-  if ! vtr_drift_ok "$new_dn" "$new_pols"; then
-    printf '%s FAIL: heal minted an unexpected token (dn=%q policies=%q) — not writing it\n' \
-      "$(date -Is)" "$new_dn" "$new_pols" >>"$log"
-    return 1
-  fi
-
-  # Atomic replace: mktemp files are 0600 from birth; same-filesystem mv.
-  tmp=$(mktemp "$HOME/.vault-token.XXXXXX")
-  printf '%s' "$new_token" >"$tmp"
-  mv "$tmp" "$HOME/.vault-token"
-
-  # Anti-sprawl: revoke previous token-devvm-wizard tokens — each heal would
-  # otherwise strand the prior periodic ADMIN token server-side for up to 32d.
-  # The clobbering foreign token is deliberately NOT revoked: it may still back
-  # the user's live login session, and it ages out on its own (7d for OIDC).
-  local sweep="accessor sweep skipped (list denied)" accessors a a_info revoked=0
-  new_acc=$(vtr_accessor "$new_info")
-  if [ -n "$new_acc" ] && accessors=$(VAULT_TOKEN="$new_token" vault list -format=json auth/token/accessors 2>/dev/null); then
-    while IFS= read -r a; do
-      [ -n "$a" ] || continue
-      a_info=$(VAULT_TOKEN="$new_token" vault token lookup -format=json -accessor "$a" 2>/dev/null) || continue
-      if vtr_is_stale_periodic "$a_info" "$new_acc"; then
-        VAULT_TOKEN="$new_token" vault token revoke -accessor "$a" >/dev/null 2>&1 && revoked=$((revoked + 1))
-      fi
-    done < <(printf '%s' "$accessors" | jq -r '.[]')
-    sweep="revoked $revoked stale periodic token(s)"
-  fi
-
-  printf '%s HEALED: re-minted periodic token from foreign dn=%q (%s)\n' \
-    "$(date -Is)" "$foreign_dn" "$sweep" >>"$log"
-}
-
 vtr_main() {
  set -euo pipefail
  export PATH="/usr/local/bin:/usr/bin:/bin:${PATH:-}"
@ -149,19 +61,16 @@ vtr_main() {
  dn=$(vtr_display_name "$info")
  pols=$(vtr_policies_csv "$info")

-  # Drift guard (2026-06-07) + self-heal (2026-07-03): the renewer must not
-  # keep a FOREIGN token alive (on 2026-06-05 a stray kubernetes login was
-  # silently renewed for two days, masking lost write access). But detect-only
-  # drift proved worse in practice: an OIDC login — which the infra docs
-  # prescribe before applies — clobbers this file too, and the resulting DRIFT
-  # failures went unnoticed for weeks while access degraded to a 7-day token
-  # (the weekly-expiry loop). On drift we now ATTEMPT to heal (see vtr_heal):
-  # re-mint the periodic token with the clobbering token's own authority.
-  # Vault's authz keeps the old guarantee — a token that couldn't legitimately
-  # hold vault-admin is denied the mint, and we still fail loud.
+  # Drift guard (added 2026-06-07): the renewer must NOT keep a FOREIGN token alive.
+  # On 2026-06-05 a stray `vault login -method=kubernetes` overwrote ~/.vault-token
+  # with a read-only woodpecker token, and this script then silently renewed THAT
+  # for two days — masking the loss of write access. So before renewing, confirm
+  # the token is our periodic admin token; if it has drifted, fail loudly (systemd
+  # marks the unit failed) instead of keeping someone else's token alive.
  if ! vtr_drift_ok "$dn" "$pols"; then
-    vtr_heal "$dn" "$log" || exit 1
-    exit 0
+    printf '%s DRIFT: ~/.vault-token is dn=%q policies=%q (expected dn=%q with %q). Refusing to renew a foreign token. Re-mint: vault login -method=oidc && vault token create -orphan -period=768h -policy=vault-admin -policy=sops-admin -display-name=devvm-wizard -field=token > ~/.vault-token && chmod 600 ~/.vault-token\n' \
+      "$(date -Is)" "$dn" "$pols" "$EXPECTED_DN" "$REQUIRED_POLICY" >>"$log"
+    exit 1
  fi

  # `vault token renew` with no argument renews the calling token (renew-self).
--- a/scripts/workstation/setup-devvm.sh
+++ b/scripts/workstation/setup-devvm.sh
@ -244,15 +244,9 @@ log "service units installed + enabled (t3-dispatch + 3 timers; t3-serve@ per-us
 #     virtual disk into an IO storm + multi-minute freeze (hard-killed 2026-06-22).
 #     t3-serve@ was already capped (its [Service] block); the HOLE was the uncapped
 #     user-<uid>.slice (all ssh/tmux work). Design — per user, on BOTH trees:
-#     MemoryMax=16G hard + MemorySwapMax=0 (work never touches disk swap → no
-#     thrash; a runaway is cgroup-OOM-killed locally at the ceiling), plus
-#     fair-share CPU/IO weights.
-#     NO MemoryHigh soft band (removed 2026-07-02; was 12G "throttle to a crawl"):
-#     with swap=0, a hog that PLATEAUS between high and max is unreclaimable but
-#     never OOMs — the kernel parks every task of the cgroup in
-#     mem_cgroup_handle_over_high and the whole tree stalls indefinitely. A 12.3G
-#     agent ugrep livelocked t3-serve@wizard (t3 down ~50min) exactly this way.
-#     Cap-and-kill, never throttle-and-pray — see the post-mortem addendum.
+#     MemoryHigh=12G soft (throttles a runaway to a crawl), MemoryMax=16G hard,
+#     MemorySwapMax=0 (work never touches disk swap → no thrash; it OOMs locally at
+#     the ceiling instead), plus fair-share CPU/IO weights.
 #     BACKSTOP = earlyoom, NOT systemd-oomd. We first shipped systemd-oomd but it is
 #     INERT with swap=0: its pressure-kill only acts on cgroups doing active reclaim
 #     (pgscan rising), and a no-swap anon workload never reclaims — verified live, a
@ -266,16 +260,12 @@ log "service units installed + enabled (t3-dispatch + 3 timers; t3-serve@ per-us
 # 10a) per-user caps + fair-share weights on EVERY user-<uid>.slice (ssh/tmux)
 install -d -m 0755 /etc/systemd/system/user-.slice.d
 cat > /etc/systemd/system/user-.slice.d/50-devvm-resource.conf <<'SLICE_EOF'
-# Per-user containment for the shared devvm (setup-devvm.sh §10, 2026-06-22;
-# MemoryHigh dropped 2026-07-02). Applies to EACH user-<uid>.slice = all of one
-# user's ssh/tmux work. Mirrors the t3-serve@.service caps so a user is bounded
-# in whichever surface they work in. MemoryHigh stays infinity: with swap=0 a
-# hog plateauing in a high..max band livelocks the entire slice (every ssh/tmux
-# session of that user) instead of dying — straight-to-OOM at MemoryMax is the
-# containment (see post-mortem addendum 2026-07-02).
+# Per-user containment for the shared devvm (setup-devvm.sh §10, 2026-06-22).
+# Applies to EACH user-<uid>.slice = all of one user's ssh/tmux work. Mirrors the
+# t3-serve@.service caps so a user is bounded in whichever surface they work in.
 [Slice]
 MemoryAccounting=yes
-MemoryHigh=infinity
+MemoryHigh=12G
 MemoryMax=16G
 MemorySwapMax=0
 CPUAccounting=yes
@ -304,14 +294,12 @@ cat > /etc/systemd/system/docker.slice <<'DOCKER_SLICE_EOF'
 # All docker containers live here (cgroup-parent in /etc/docker/daemon.json) so
 # they share one bounded budget and a runaway container is capped at MemoryMax
 # (cgroup-OOM'd locally) instead of escaping into the uncapped system.slice.
-# setup-devvm.sh §10, 2026-06-22; MemoryHigh dropped 2026-07-02 — a container
-# plateauing in the high..max band would throttle-livelock EVERY container in
-# the slice (see post-mortem addendum); MemoryMax OOM is the containment.
+# setup-devvm.sh §10, 2026-06-22.
 [Unit]
 Description=Docker containers slice (capped)
 [Slice]
 MemoryAccounting=yes
-MemoryHigh=infinity
+MemoryHigh=6G
 MemoryMax=8G
 MemorySwapMax=0
 CPUAccounting=yes
--- a/secrets/nfs_directories.txt
+++ b/secrets/nfs_directories.txt
--- a/stacks/cloudflared/modules/cloudflared/cloudflare.tf
+++ b/stacks/cloudflared/modules/cloudflared/cloudflare.tf
@ -235,12 +235,6 @@ resource "cloudflare_record" "keyserver" {
  zone_id  = var.cloudflare_zone_id
 }

-# bridge.viktorbarzin.me (Cloudflare Pages, "мост" school site) moved to
-# stacks/valia-sites (ADR-0018) — all Valia-site records live there now.
-# State handoff was a manual `tg state rm` (2026-07-03): the CI terraform
-# (<1.7) rejects removed{} blocks even at the stack root, so declarative
-# forget wasn't available. valia-sites imported the live record by id.
-
 # Enable HTTP/3 (QUIC) for Cloudflare-proxied domains
 resource "cloudflare_zone_settings_override" "http3" {
  zone_id = var.cloudflare_zone_id
--- a/stacks/dawarich/main.tf
+++ b/stacks/dawarich/main.tf
@ -16,7 +16,7 @@ resource "kubernetes_namespace" "dawarich" {
    name = "dawarich"
    labels = {
      "istio-injection" : "disabled"
-      tier               = local.tiers.edge
+      tier = local.tiers.edge
      "keel.sh/enrolled" = "true"
    }
  }
@ -330,7 +330,7 @@ resource "kubernetes_deployment" "dawarich" {
  }
  lifecycle {
    ignore_changes = [
-      spec[0].template[0].spec[0].dns_config,         # KYVERNO_LIFECYCLE_V1
+      spec[0].template[0].spec[0].dns_config, # KYVERNO_LIFECYCLE_V1
      spec[0].template[0].spec[0].container[0].image, # KEEL_IGNORE_IMAGE — Keel manages tag updates
      metadata[0].annotations["keel.sh/policy"],
      metadata[0].annotations["keel.sh/trigger"],
@ -458,13 +458,6 @@ module "ingress" {
  namespace       = kubernetes_namespace.dawarich.metadata[0].name
  name            = "dawarich"
  tls_secret_name = var.tls_secret_name
-  # Rails serves all its fingerprinted assets itself and the map view adds an
-  # API burst per page load — the default 10/50 limiter 429s the asset tail
-  # from a single client IP (and risks dropping OwnTracks/mobile ingestion
-  # POSTs on the same host). Dedicated 100/1000 limiter defined in
-  # stacks/traefik/modules/traefik/middleware.tf.
-  skip_default_rate_limit = true
-  extra_middlewares       = ["traefik-dawarich-rate-limit@kubernetescrd"]
  extra_annotations = {
    "gethomepage.dev/enabled"      = "true"
    "gethomepage.dev/name"         = "Dawarich"
--- a/stacks/dbaas/modules/dbaas/main.tf
+++ b/stacks/dbaas/modules/dbaas/main.tf
@ -1511,34 +1511,6 @@ resource "null_resource" "pg_instagram_poster_db" {
  }
 }

-# Create tasks database for the tasks PWA (Reminders-style front-end over
-# Nextcloud CalDAV; FastAPI + SvelteKit SPA — see ~/code/tasks). Stores
-# Connected Accounts (Fernet-encrypted Nextcloud app passwords) + sync state.
-# Role password is managed by Vault Database Secrets Engine (static role
-# `pg-tasks`, 7d rotation). Tables are created by alembic on app startup.
-resource "null_resource" "pg_tasks_db" {
-  depends_on = [null_resource.pg_cluster]
-
-  triggers = {
-    db_name  = "tasks"
-    username = "tasks"
-  }
-
-  provisioner "local-exec" {
-    command = <<-EOT
-      PRIMARY=$(kubectl --kubeconfig ${var.kube_config_path} get cluster -n dbaas pg-cluster -o jsonpath='{.status.currentPrimary}')
-      kubectl --kubeconfig ${var.kube_config_path} exec -n dbaas $PRIMARY -c postgres -- \
-        bash -c '
-          psql -U postgres -tc "SELECT 1 FROM pg_catalog.pg_roles WHERE rolname = '"'"'tasks'"'"'" | grep -q 1 || \
-            psql -U postgres -c "CREATE ROLE tasks WITH LOGIN PASSWORD '"'"'changeme-vault-will-rotate'"'"'"
-          psql -U postgres -tc "SELECT 1 FROM pg_catalog.pg_database WHERE datname = '"'"'tasks'"'"'" | grep -q 1 || \
-            psql -U postgres -c "CREATE DATABASE tasks OWNER tasks"
-          psql -U postgres -c "GRANT ALL PRIVILEGES ON DATABASE tasks TO tasks"
-        '
-    EOT
-  }
-}
-
 # Old PostgreSQL deployment — kept commented for rollback reference
 # resource "kubernetes_deployment" "postgres" {
 #   metadata {
--- a/stacks/drone-logbook/main.tf
+++ b/stacks/drone-logbook/main.tf
@ -1,360 +0,0 @@
-variable "tls_secret_name" {
-  type      = string
-  sensitive = true
-}
-variable "nfs_server" { type = string }
-
-# Open DroneLog (https://github.com/arpanghosh8453/open-dronelog) — self-hosted
-# DJI flight-log analyzer for the DJI Mini 4 Pro. Runs the UPSTREAM image (the
-# ViktorBarzin/drone-logbook fork has no custom commits); Keel tracks :latest.
-# Design: docs/plans/2026-07-04-drone-logbook-design.md
-resource "kubernetes_namespace" "drone_logbook" {
-  metadata {
-    name = "drone-logbook"
-    labels = {
-      tier               = local.tiers.aux
-      "keel.sh/enrolled" = "true"
-    }
-  }
-  lifecycle {
-    # KYVERNO_LIFECYCLE_V1: goldilocks-vpa-auto-mode ClusterPolicy stamps this label on every namespace
-    ignore_changes = [metadata[0].labels["goldilocks.fairwinds.com/vpa-update-mode"]]
-  }
-}
-
-resource "kubernetes_manifest" "external_secret" {
-  field_manager {
-    force_conflicts = true
-  }
-  manifest = {
-    apiVersion = "external-secrets.io/v1"
-    kind       = "ExternalSecret"
-    metadata = {
-      name      = "drone-logbook-secrets"
-      namespace = "drone-logbook"
-    }
-    spec = {
-      refreshInterval = "15m"
-      secretStoreRef = {
-        name = "vault-kv"
-        kind = "ClusterSecretStore"
-      }
-      target = {
-        name = "drone-logbook-secrets"
-      }
-      dataFrom = [{
-        extract = {
-          key = "drone-logbook"
-        }
-      }]
-    }
-  }
-  depends_on = [kubernetes_namespace.drone_logbook]
-}
-
-module "tls_secret" {
-  source          = "../../modules/kubernetes/setup_tls_secret"
-  namespace       = kubernetes_namespace.drone_logbook.metadata[0].name
-  tls_secret_name = var.tls_secret_name
-}
-
-# DuckDB database + cached DJI decryption keys + uploaded originals.
-# Embedded DB -> block storage, not NFS (same rationale as freshrss data).
-# Encrypted class: flight logs are GPS traces of home/travel (sensitive data
-# -> proxmox-lvm-encrypted per the storage decision rule in .claude/CLAUDE.md).
-resource "kubernetes_persistent_volume_claim" "data" {
-  wait_until_bound = false
-  metadata {
-    name      = "drone-logbook-data-encrypted"
-    namespace = kubernetes_namespace.drone_logbook.metadata[0].name
-    annotations = {
-      "resize.topolvm.io/threshold"     = "10%"
-      "resize.topolvm.io/increase"      = "100%"
-      "resize.topolvm.io/storage_limit" = "10Gi"
-    }
-  }
-  spec {
-    access_modes       = ["ReadWriteOnce"]
-    storage_class_name = "proxmox-lvm-encrypted"
-    resources {
-      requests = {
-        storage = "2Gi"
-      }
-    }
-  }
-  lifecycle {
-    # The autoresizer expands requests.storage up to storage_limit and PVCs
-    # can't shrink; without this every apply tries to revert the size.
-    ignore_changes = [spec[0].resources[0].requests]
-  }
-}
-
-# Drop folder: any producer (Nextcloud sync, scp, future phone pipeline) lands
-# DJI .txt logs here over NFS; the app auto-imports on SYNC_INTERVAL.
-module "nfs_sync_logs" {
-  source     = "../../modules/kubernetes/nfs_volume"
-  name       = "drone-logbook-sync-logs"
-  namespace  = kubernetes_namespace.drone_logbook.metadata[0].name
-  nfs_server = var.nfs_server
-  nfs_path   = "/srv/nfs/drone-logbook/sync-logs"
-  storage    = "5Gi"
-}
-
-resource "kubernetes_deployment" "drone_logbook" {
-  metadata {
-    name      = "drone-logbook"
-    namespace = kubernetes_namespace.drone_logbook.metadata[0].name
-    labels = {
-      app                             = "drone-logbook"
-      "kubernetes.io/cluster-service" = "true"
-      tier                            = local.tiers.aux
-    }
-  }
-  spec {
-    replicas = 1
-    strategy {
-      # DuckDB is single-writer; never overlap two pods on the same volume
-      type = "Recreate"
-    }
-    selector {
-      match_labels = {
-        app = "drone-logbook"
-      }
-    }
-    template {
-      metadata {
-        labels = {
-          app                             = "drone-logbook"
-          "kubernetes.io/cluster-service" = "true"
-        }
-      }
-      spec {
-        container {
-          name  = "drone-logbook"
-          image = "ghcr.io/arpanghosh8453/open-dronelog:latest"
-          env {
-            name  = "RUST_LOG"
-            value = "info"
-          }
-          env {
-            # keep re-importable originals under /data/drone-logbook/uploaded
-            name  = "KEEP_UPLOADED_FILES"
-            value = "true"
-          }
-          env {
-            name  = "SYNC_LOGS_PATH"
-            value = "/sync-logs"
-          }
-          env {
-            # 6-field cron (sec min hour dom mon dow): scan drop folder every 8h
-            name  = "SYNC_INTERVAL"
-            value = "0 0 */8 * * *"
-          }
-          env {
-            name = "PROFILE_CREATION_PASS"
-            value_from {
-              secret_key_ref {
-                name = "drone-logbook-secrets"
-                key  = "profile_creation_pass"
-              }
-            }
-          }
-          volume_mount {
-            name       = "data"
-            mount_path = "/data/drone-logbook"
-          }
-          volume_mount {
-            name       = "sync-logs"
-            mount_path = "/sync-logs"
-            read_only  = true
-          }
-          port {
-            name           = "http"
-            container_port = 80
-            protocol       = "TCP"
-          }
-          resources {
-            requests = {
-              cpu    = "25m"
-              memory = "512Mi"
-            }
-            limits = {
-              memory = "512Mi"
-            }
-          }
-        }
-        volume {
-          name = "data"
-          persistent_volume_claim {
-            claim_name = kubernetes_persistent_volume_claim.data.metadata[0].name
-          }
-        }
-        volume {
-          name = "sync-logs"
-          persistent_volume_claim {
-            claim_name = module.nfs_sync_logs.claim_name
-          }
-        }
-      }
-    }
-  }
-  depends_on = [kubernetes_manifest.external_secret]
-  lifecycle {
-    ignore_changes = [
-      spec[0].template[0].spec[0].dns_config, # KYVERNO_LIFECYCLE_V1
-      metadata[0].annotations["keel.sh/policy"],
-      metadata[0].annotations["keel.sh/trigger"],
-      metadata[0].annotations["keel.sh/pollSchedule"], # KYVERNO_LIFECYCLE_V2
-      metadata[0].annotations["keel.sh/match-tag"],
-      spec[0].template[0].spec[0].container[0].image, # KEEL_IGNORE_IMAGE — Keel manages tag updates
-      metadata[0].annotations["kubernetes.io/change-cause"],
-      metadata[0].annotations["deployment.kubernetes.io/revision"],
-      spec[0].template[0].metadata[0].annotations["keel.sh/update-time"], # KEEL_LIFECYCLE_V1
-    ]
-  }
-}
-
-resource "kubernetes_service" "drone_logbook" {
-  metadata {
-    name      = "drone-logbook"
-    namespace = kubernetes_namespace.drone_logbook.metadata[0].name
-    labels = {
-      "app" = "drone-logbook"
-    }
-  }
-
-  spec {
-    selector = {
-      app = "drone-logbook"
-    }
-    port {
-      port        = "80"
-      target_port = "80"
-    }
-  }
-}
-
-# -----------------------------------------------------------------------------
-# Backup — required for every proxmox-lvm(-encrypted) app: daily copy of the
-# data volume to NFS /srv/nfs/drone-logbook-backup (picked up by nfs-mirror ->
-# sda -> Synology offsite). 01:30 = outside the 00:00/08:00/16:00 sync-import
-# windows, so the DuckDB file is quiescent; uploaded originals make even a
-# mid-write copy recoverable by re-import. Pod-affinity co-schedules with the
-# app pod (RWO volume mounts twice only on the same node). Vaultwarden pattern.
-# -----------------------------------------------------------------------------
-
-module "nfs_backup" {
-  source     = "../../modules/kubernetes/nfs_volume"
-  name       = "drone-logbook-backup-host"
-  namespace  = kubernetes_namespace.drone_logbook.metadata[0].name
-  nfs_server = var.nfs_server
-  nfs_path   = "/srv/nfs/drone-logbook-backup"
-}
-
-resource "kubernetes_cron_job_v1" "backup" {
-  metadata {
-    name      = "drone-logbook-backup"
-    namespace = kubernetes_namespace.drone_logbook.metadata[0].name
-  }
-  spec {
-    concurrency_policy            = "Replace"
-    failed_jobs_history_limit     = 5
-    schedule                      = "30 1 * * *"
-    starting_deadline_seconds     = 300
-    successful_jobs_history_limit = 3
-    job_template {
-      metadata {}
-      spec {
-        backoff_limit              = 3
-        ttl_seconds_after_finished = 10
-        template {
-          metadata {}
-          spec {
-            affinity {
-              pod_affinity {
-                required_during_scheduling_ignored_during_execution {
-                  label_selector {
-                    match_labels = {
-                      app = "drone-logbook"
-                    }
-                  }
-                  topology_key = "kubernetes.io/hostname"
-                }
-              }
-            }
-            container {
-              name  = "drone-logbook-backup"
-              image = "docker.io/library/alpine"
-              command = ["/bin/sh", "-c", <<-EOT
-                set -euxo pipefail
-                _t0=$(date +%s)
-                now=$(date +"%Y_%m_%d_%H_%M")
-                mkdir -p /backup/$now
-                cp -a /data/. /backup/$now/
-                # Rotate — 30 day retention
-                find /backup -maxdepth 1 -mindepth 1 -type d -mtime +30 -exec rm -rf {} +
-                _dur=$(($(date +%s) - _t0))
-                _out_bytes=$(du -sb /backup/$now | awk '{print $1}')
-                wget -qO- --post-data "backup_duration_seconds $${_dur}
-                backup_output_bytes $${_out_bytes}
-                backup_last_success_timestamp $(date +%s)
-                " "http://prometheus-prometheus-pushgateway.monitoring:9091/metrics/job/drone-logbook-backup" || true
-              EOT
-              ]
-              volume_mount {
-                name       = "data"
-                mount_path = "/data"
-                read_only  = true
-              }
-              volume_mount {
-                name       = "backup"
-                mount_path = "/backup"
-              }
-            }
-            volume {
-              name = "data"
-              persistent_volume_claim {
-                claim_name = kubernetes_persistent_volume_claim.data.metadata[0].name
-              }
-            }
-            volume {
-              name = "backup"
-              persistent_volume_claim {
-                claim_name = module.nfs_backup.claim_name
-              }
-            }
-            dns_config {
-              option {
-                name  = "ndots"
-                value = "2"
-              }
-            }
-          }
-        }
-      }
-    }
-  }
-  lifecycle {
-    # KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
-    ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config]
-  }
-}
-
-# https://dronelog.viktorbarzin.me
-module "ingress" {
-  source          = "../../modules/kubernetes/ingress_factory"
-  auth            = "required" # Authentik forward-auth — flight logs are GPS traces of home/travel
-  dns_type        = "proxied"
-  namespace       = kubernetes_namespace.drone_logbook.metadata[0].name
-  name            = "dronelog"
-  service_name    = "drone-logbook"
-  tls_secret_name = var.tls_secret_name
-  extra_annotations = {
-    "gethomepage.dev/enabled"      = "true"
-    "gethomepage.dev/name"         = "Drone Logbook"
-    "gethomepage.dev/description"  = "DJI flight log analyzer"
-    "gethomepage.dev/icon"         = "mdi-quadcopter"
-    "gethomepage.dev/group"        = "Media & Entertainment"
-    "gethomepage.dev/pod-selector" = ""
-  }
-}
--- a/stacks/drone-logbook/secrets
+++ b/stacks/drone-logbook/secrets
@ -1 +0,0 @@
-../../secrets
--- a/stacks/drone-logbook/terragrunt.hcl
+++ b/stacks/drone-logbook/terragrunt.hcl
@ -1,8 +0,0 @@
-include "root" {
-  path = find_in_parent_folders()
-}
-
-dependency "platform" {
-  config_path  = "../platform"
-  skip_outputs = true
-}
--- a/stacks/excalidraw/main.tf
+++ b/stacks/excalidraw/main.tf
@ -10,7 +10,7 @@ resource "kubernetes_namespace" "excalidraw" {
    name = "excalidraw"
    labels = {
      "istio-injection" : "disabled"
-      tier               = local.tiers.aux
+      tier = local.tiers.aux
      "keel.sh/enrolled" = "true"
    }
  }
@ -45,15 +45,6 @@ resource "kubernetes_deployment" "excalidraw" {
      app  = "excalidraw"
      tier = local.tiers.aux
    }
-    # Keel rolls new ghcr:latest digests (k8s-portal pattern). Values here are
-    # recreate-correct seeds only — the keys are in ignore_changes below, so
-    # the live annotations win on an existing deployment.
-    annotations = {
-      "keel.sh/policy"       = "force"
-      "keel.sh/trigger"      = "poll"
-      "keel.sh/match-tag"    = "true"
-      "keel.sh/pollSchedule" = "@every 5m"
-    }
  }
  spec {
    replicas = 1
@ -76,19 +67,9 @@ resource "kubernetes_deployment" "excalidraw" {
        }
      }
      spec {
-        # GHCR pull secret: the ghcr-credentials Secret in this namespace is
-        # cloned in by the kyverno stack's sync-ghcr-credentials ClusterPolicy
-        # (allowlisted private-ghcr namespaces only — ADR-0002). Source of
-        # truth: stacks/kyverno/modules/kyverno/ghcr-credentials.tf.
-        image_pull_secrets {
-          name = "ghcr-credentials"
-        }
        container {
-          # ADR-0002: GHA-built (.github/workflows/build-excalidraw.yml),
-          # PRIVATE ghcr; Keel rolls new :latest digests. DockerHub
-          # viktorbarzin/excalidraw-library:v4 is the frozen rollback image.
-          image             = "ghcr.io/viktorbarzin/excalidraw-library:latest"
-          image_pull_policy = "Always"
+          image             = "viktorbarzin/excalidraw-library:v4"
+          image_pull_policy = "IfNotPresent"
          name              = "excalidraw"
          port {
            container_port = 8080
@ -126,7 +107,7 @@ resource "kubernetes_deployment" "excalidraw" {
  }
  lifecycle {
    ignore_changes = [
-      spec[0].template[0].spec[0].dns_config,         # KYVERNO_LIFECYCLE_V1
+      spec[0].template[0].spec[0].dns_config, # KYVERNO_LIFECYCLE_V1
      spec[0].template[0].spec[0].container[0].image, # KEEL_IGNORE_IMAGE — Keel manages tag updates
      metadata[0].annotations["keel.sh/policy"],
      metadata[0].annotations["keel.sh/trigger"],
--- a/stacks/excalidraw/project/README.md
+++ b/stacks/excalidraw/project/README.md
@ -4,28 +4,18 @@ A self-hosted Excalidraw library with per-user drawing storage and management.

 ## Features

- Dashboard to manage all your drawings (create, open, rename, delete)
+- Dashboard to manage all your drawings
 - Per-user storage (via Authentik SSO headers)
- Rename drawings from the dashboard or by clicking the drawing name in the editor
- Native Excalidraw export via the editor's hamburger menu: "Save to..."
-  (.excalidraw file) and "Export image..." (PNG / SVG / clipboard)
- Autosave (2s debounce) + manual save (Ctrl+S or menu "Save now")
+- Create, edit, and delete drawings
 - Persistent storage via NFS

 ## Docker Image

 ```
-ghcr.io/viktorbarzin/excalidraw-library:latest
+viktorbarzin/excalidraw-library:v4
 ```

-Built by GitHub Actions (`.github/workflows/build-excalidraw.yml` in the infra
-repo, ADR-0002) on every master push touching `stacks/excalidraw/project/**`;
-tags `:latest` + `:<git-sha>`. The package is PRIVATE — cluster pulls use the
-Kyverno-synced `ghcr-credentials` secret. Keel polls `:latest` and rolls the
-deployment on digest change.
-
-The legacy manually-built DockerHub image `viktorbarzin/excalidraw-library:v4`
-is frozen as the rollback target; nothing pushes to it anymore.
+Available on Docker Hub: https://hub.docker.com/r/viktorbarzin/excalidraw-library

 ## Configuration

@ -49,13 +39,54 @@ Mount a persistent volume to the `DATA_DIR` path. Drawings are stored as `.excal
    └── my-diagram.excalidraw
 ```

-The filename (without extension) is both the drawing ID and its display name;
-renaming a drawing renames the file (`os.Rename`, mtime preserved).
-
 ## Deployment

-Deployed by the `stacks/excalidraw` Terraform stack (namespace `excalidraw`,
-service `draw`, ingress `draw.viktorbarzin.me` with `auth = "required"`).
+### Docker
+
+```bash
+docker run -d \
+  --name excalidraw-rooms \
+  -p 8080:8080 \
+  -v /path/to/storage:/data \
+  viktorbarzin/excalidraw-library:v4
+```
+
+### Kubernetes
+
+```yaml
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: excalidraw
+spec:
+  replicas: 1
+  selector:
+    matchLabels:
+      app: excalidraw
+  template:
+    metadata:
+      labels:
+        app: excalidraw
+    spec:
+      containers:
+        - name: excalidraw
+          image: viktorbarzin/excalidraw-library:v4
+          ports:
+            - containerPort: 8080
+          env:
+            - name: DATA_DIR
+              value: /data
+            - name: PORT
+              value: "8080"
+          volumeMounts:
+            - name: data
+              mountPath: /data
+      volumes:
+        - name: data
+          nfs:
+            server: 192.168.1.127
+            path: /srv/nfs/excalidraw
+```

 ### With Authentik SSO

@ -65,7 +96,23 @@ The application reads user identity from Authentik headers:
 - `X-Authentik-Email` - Displayed in UI
 - `X-Authentik-Name` - Displayed in UI

-Requests without `X-Authentik-Username` fall back to the `anonymous` user.
+Configure your ingress to pass these headers:
+
+```yaml
+annotations:
+  nginx.ingress.kubernetes.io/auth-response-headers: "X-authentik-username,X-authentik-email,X-authentik-name"
+```
+
+## Building
+
+```bash
+# Build the Docker image
+docker build -t excalidraw-library .
+
+# Or build locally
+go build -o excalidraw-library .
+./excalidraw-library
+```

 ## API Endpoints

@ -75,25 +122,10 @@ Requests without `X-Authentik-Username` fall back to the `anonymous` user.
 | GET | `/api/drawings` | List all drawings for current user |
 | GET | `/api/drawings/:id` | Get drawing data |
 | PUT | `/api/drawings/:id` | Save drawing |
-| PATCH | `/api/drawings/:id` | Rename drawing — body `{"name": "<new-name>"}`; returns `{"status":"renamed","id":"<new-id>"}`; 409 if the target name exists |
 | DELETE | `/api/drawings/:id` | Delete drawing |
 | GET | `/api/user` | Get current user info |
 | GET | `/draw/:id` | Open drawing in editor |

-Rename names are sanitized server-side to `[a-zA-Z0-9-_]` (other characters
-become `-`; a trailing `.excalidraw` is stripped). Existing IDs are accepted
-as-is for backward compatibility with API clients.
-
-## Development
-
-```bash
-# Run tests
-go test ./...
-
-# Run locally
-DATA_DIR=/tmp/excalidraw-data go run .
-```
-
 ## License

 MIT
--- a/stacks/excalidraw/project/main.go
+++ b/stacks/excalidraw/project/main.go
@ -9,7 +9,6 @@ import (
 	"net/http"
 	"os"
 	"path/filepath"
-	"regexp"
 	"sort"
 	"strings"
 	"time"
@ -64,21 +63,6 @@ func getUsername(r *http.Request) string {
 	return username
 }

-var invalidNameChars = regexp.MustCompile(`[^a-zA-Z0-9-_]`)
-
-// sanitizeName normalizes a user-supplied drawing name into a safe file ID
-// (same charset the dashboard applies on create). Returns "" if nothing
-// meaningful remains.
-func sanitizeName(name string) string {
-	name = strings.TrimSpace(name)
-	name = strings.TrimSuffix(name, ".excalidraw")
-	name = invalidNameChars.ReplaceAllString(name, "-")
-	if strings.Trim(name, "-") == "" {
-		return ""
-	}
-	return name
-}
-
 // getUserDataDir returns the data directory for a specific user and ensures it exists
 func getUserDataDir(username string) string {
 	userDir := filepath.Join(dataDir, username)
@ -184,41 +168,6 @@ func handleDrawing(w http.ResponseWriter, r *http.Request) {
 		w.Header().Set("Content-Type", "application/json")
 		json.NewEncoder(w).Encode(map[string]string{"status": "saved", "id": id})

-	case http.MethodPatch:
-		var req struct {
-			Name string `json:"name"`
-		}
-		if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
-			http.Error(w, "Invalid JSON body", http.StatusBadRequest)
-			return
-		}
-		newID := sanitizeName(req.Name)
-		if newID == "" {
-			http.Error(w, "Invalid name", http.StatusBadRequest)
-			return
-		}
-		if _, err := os.Stat(filePath); err != nil {
-			if os.IsNotExist(err) {
-				http.Error(w, "Drawing not found", http.StatusNotFound)
-			} else {
-				http.Error(w, err.Error(), http.StatusInternalServerError)
-			}
-			return
-		}
-		if newID != id {
-			newPath := filepath.Join(userDataDir, newID+".excalidraw")
-			if _, err := os.Stat(newPath); err == nil {
-				http.Error(w, "A drawing with that name already exists", http.StatusConflict)
-				return
-			}
-			if err := os.Rename(filePath, newPath); err != nil {
-				http.Error(w, err.Error(), http.StatusInternalServerError)
-				return
-			}
-		}
-		w.Header().Set("Content-Type", "application/json")
-		json.NewEncoder(w).Encode(map[string]string{"status": "renamed", "id": newID})
-
 	case http.MethodDelete:
 		if err := os.Remove(filePath); err != nil {
 			if os.IsNotExist(err) {
@ -315,8 +264,6 @@ const dashboardHTML = `<!DOCTYPE html>
        .btn:hover { background: #5b4cdb; }
        .btn-danger { background: #e74c3c; }
        .btn-danger:hover { background: #c0392b; }
-        .btn-secondary { background: #3d3d5c; }
-        .btn-secondary:hover { background: #4a4a70; }
        .btn-small { padding: 0.4rem 0.8rem; font-size: 0.85rem; }
        .drawings { display: grid; gap: 1rem; }
        .drawing {
@ -395,11 +342,11 @@ const dashboardHTML = `<!DOCTYPE html>

    <div id="modal" class="modal">
        <div class="modal-content">
-            <h2 id="modal-title">New Drawing</h2>
+            <h2>New Drawing</h2>
            <input type="text" id="drawingName" placeholder="Drawing name..." autofocus>
            <div class="modal-actions">
                <button class="btn" style="background:#444" onclick="hideModal()">Cancel</button>
-                <button class="btn" id="modal-confirm" onclick="confirmModal()">Create</button>
+                <button class="btn" onclick="createDrawing()">Create</button>
            </div>
        </div>
    </div>
@ -422,63 +369,31 @@ const dashboardHTML = `<!DOCTYPE html>
            }
        }

-        function drawingRow(d) {
-            var row = document.createElement('div');
-            row.className = 'drawing';
-
-            var info = document.createElement('div');
-            info.className = 'drawing-info';
-            var nameLink = document.createElement('a');
-            nameLink.className = 'drawing-name';
-            nameLink.href = '/draw/' + encodeURIComponent(d.id);
-            nameLink.textContent = d.name;
-            var meta = document.createElement('div');
-            meta.className = 'drawing-meta';
-            meta.textContent = 'Modified: ' + new Date(d.modified).toLocaleDateString() + ' ' +
-                new Date(d.modified).toLocaleTimeString() + ' - ' + formatSize(d.size);
-            info.appendChild(nameLink);
-            info.appendChild(meta);
-
-            var actions = document.createElement('div');
-            actions.className = 'drawing-actions';
-            var open = document.createElement('a');
-            open.className = 'btn btn-small';
-            open.href = '/draw/' + encodeURIComponent(d.id);
-            open.textContent = 'Open';
-            var rename = document.createElement('button');
-            rename.className = 'btn btn-small btn-secondary';
-            rename.textContent = 'Rename';
-            rename.onclick = function() { showRenameModal(d.id); };
-            var del = document.createElement('button');
-            del.className = 'btn btn-small btn-danger';
-            del.textContent = 'Delete';
-            del.onclick = function() { deleteDrawing(d.id); };
-            actions.appendChild(open);
-            actions.appendChild(rename);
-            actions.appendChild(del);
-
-            row.appendChild(info);
-            row.appendChild(actions);
-            return row;
-        }
-
        async function loadDrawings() {
            const resp = await fetch('/api/drawings');
            const drawings = await resp.json();
            const container = document.getElementById('drawings');
-            container.replaceChildren();

            if (!drawings || drawings.length === 0) {
-                var empty = document.createElement('div');
-                empty.className = 'empty';
-                empty.textContent = 'No drawings yet. Create your first one!';
-                container.appendChild(empty);
+                container.innerHTML = '<div class="empty">No drawings yet. Create your first one!</div>';
                return;
            }

-            drawings.forEach(function(d) {
-                container.appendChild(drawingRow(d));
-            });
+            container.innerHTML = drawings.map(function(d) {
+                return '<div class="drawing">' +
+                    '<div class="drawing-info">' +
+                    '<a href="/draw/' + d.id + '" class="drawing-name">' + d.name + '</a>' +
+                    '<div class="drawing-meta">' +
+                    'Modified: ' + new Date(d.modified).toLocaleDateString() + ' ' + new Date(d.modified).toLocaleTimeString() +
+                    ' - ' + formatSize(d.size) +
+                    '</div>' +
+                    '</div>' +
+                    '<div class="drawing-actions">' +
+                    '<a href="/draw/' + d.id + '" class="btn btn-small">Open</a>' +
+                    '<button class="btn btn-small btn-danger" onclick="deleteDrawing(\'' + d.id + '\')">Delete</button>' +
+                    '</div>' +
+                    '</div>';
+            }).join('');
        }

        function formatSize(bytes) {
@ -487,64 +402,18 @@ const dashboardHTML = `<!DOCTYPE html>
            return (bytes / (1024 * 1024)).toFixed(1) + ' MB';
        }

-        var modalAction = null; // invoked with the input value on confirm
-
-        function showModal(title, confirmLabel, initialValue, action) {
-            document.getElementById('modal-title').textContent = title;
-            document.getElementById('modal-confirm').textContent = confirmLabel;
-            var input = document.getElementById('drawingName');
-            input.value = initialValue || '';
-            modalAction = action;
-            document.getElementById('modal').classList.add('active');
-            input.focus();
-            input.select();
-        }
-
        function showNewModal() {
-            showModal('New Drawing', 'Create', '', createDrawing);
-        }
-
-        function showRenameModal(id) {
-            showModal('Rename Drawing', 'Rename', id, function(value) {
-                renameDrawing(id, value);
-            });
+            document.getElementById('modal').classList.add('active');
+            document.getElementById('drawingName').focus();
        }

        function hideModal() {
            document.getElementById('modal').classList.remove('active');
            document.getElementById('drawingName').value = '';
-            modalAction = null;
        }

-        function confirmModal() {
-            if (modalAction) modalAction(document.getElementById('drawingName').value);
-        }
-
-        async function renameDrawing(id, newName) {
-            newName = (newName || '').trim();
-            if (!newName || newName === id) {
-                hideModal();
-                return;
-            }
-            var resp = await fetch('/api/drawings/' + encodeURIComponent(id), {
-                method: 'PATCH',
-                headers: { 'Content-Type': 'application/json' },
-                body: JSON.stringify({ name: newName })
-            });
-            if (resp.status === 409) {
-                alert('A drawing with that name already exists.');
-                return; // keep the modal open so the user can pick another name
-            }
-            if (!resp.ok) {
-                alert('Rename failed: ' + await resp.text());
-                return;
-            }
-            hideModal();
-            loadDrawings();
-        }
-
-        async function createDrawing(name) {
-            name = (name || '').trim();
+        async function createDrawing() {
+            var name = document.getElementById('drawingName').value.trim();
            if (!name) {
                name = 'drawing-' + Date.now();
            }
@ -577,7 +446,7 @@ const dashboardHTML = `<!DOCTYPE html>
        }

        document.getElementById('drawingName').addEventListener('keypress', function(e) {
-            if (e.key === 'Enter') confirmModal();
+            if (e.key === 'Enter') createDrawing();
        });

        document.getElementById('modal').addEventListener('click', function(e) {
--- a/stacks/excalidraw/project/main_test.go
+++ b/stacks/excalidraw/project/main_test.go
@ -1,249 +0,0 @@
-package main
-
-import (
-	"encoding/json"
-	"net/http"
-	"net/http/httptest"
-	"os"
-	"path/filepath"
-	"strings"
-	"testing"
-)
-
-const testDrawing = `{"type":"excalidraw","version":2,"source":"excalidraw-library","elements":[{"id":"e1"}],"appState":{"viewBackgroundColor":"#ffffff"}}`
-
-func setupDataDir(t *testing.T) {
-	t.Helper()
-	dataDir = t.TempDir()
-}
-
-// doDrawing sends a request to handleDrawing for the given user and returns the recorder.
-func doDrawing(t *testing.T, method, id, body, user string) *httptest.ResponseRecorder {
-	t.Helper()
-	var reader *strings.Reader
-	if body == "" {
-		reader = strings.NewReader("")
-	} else {
-		reader = strings.NewReader(body)
-	}
-	req := httptest.NewRequest(method, "/api/drawings/"+id, reader)
-	if user != "" {
-		req.Header.Set("X-Authentik-Username", user)
-	}
-	w := httptest.NewRecorder()
-	handleDrawing(w, req)
-	return w
-}
-
-func listDrawings(t *testing.T, user string) []Drawing {
-	t.Helper()
-	req := httptest.NewRequest(http.MethodGet, "/api/drawings", nil)
-	if user != "" {
-		req.Header.Set("X-Authentik-Username", user)
-	}
-	w := httptest.NewRecorder()
-	handleListDrawings(w, req)
-	if w.Code != http.StatusOK {
-		t.Fatalf("list: expected 200, got %d", w.Code)
-	}
-	var drawings []Drawing
-	if err := json.Unmarshal(w.Body.Bytes(), &drawings); err != nil {
-		t.Fatalf("list: bad JSON: %v", err)
-	}
-	return drawings
-}
-
-func TestPutGetRoundtrip(t *testing.T) {
-	setupDataDir(t)
-	if w := doDrawing(t, http.MethodPut, "foo", testDrawing, "alice"); w.Code != http.StatusOK {
-		t.Fatalf("PUT: expected 200, got %d: %s", w.Code, w.Body.String())
-	}
-	w := doDrawing(t, http.MethodGet, "foo", "", "alice")
-	if w.Code != http.StatusOK {
-		t.Fatalf("GET: expected 200, got %d", w.Code)
-	}
-	if w.Body.String() != testDrawing {
-		t.Errorf("GET: content mismatch: %s", w.Body.String())
-	}
-}
-
-func TestGetMissing(t *testing.T) {
-	setupDataDir(t)
-	if w := doDrawing(t, http.MethodGet, "nope", "", "alice"); w.Code != http.StatusNotFound {
-		t.Fatalf("expected 404, got %d", w.Code)
-	}
-}
-
-func TestListDrawings(t *testing.T) {
-	setupDataDir(t)
-	doDrawing(t, http.MethodPut, "one", testDrawing, "alice")
-	doDrawing(t, http.MethodPut, "two", testDrawing, "alice")
-	drawings := listDrawings(t, "alice")
-	if len(drawings) != 2 {
-		t.Fatalf("expected 2 drawings, got %d", len(drawings))
-	}
-	ids := map[string]bool{drawings[0].ID: true, drawings[1].ID: true}
-	if !ids["one"] || !ids["two"] {
-		t.Errorf("unexpected ids: %v", ids)
-	}
-	for _, d := range drawings {
-		if d.Name != d.ID {
-			t.Errorf("name should equal id: %+v", d)
-		}
-	}
-}
-
-func TestDelete(t *testing.T) {
-	setupDataDir(t)
-	doDrawing(t, http.MethodPut, "foo", testDrawing, "alice")
-	if w := doDrawing(t, http.MethodDelete, "foo", "", "alice"); w.Code != http.StatusOK {
-		t.Fatalf("DELETE: expected 200, got %d", w.Code)
-	}
-	if w := doDrawing(t, http.MethodGet, "foo", "", "alice"); w.Code != http.StatusNotFound {
-		t.Fatalf("GET after delete: expected 404, got %d", w.Code)
-	}
-	if w := doDrawing(t, http.MethodDelete, "foo", "", "alice"); w.Code != http.StatusNotFound {
-		t.Fatalf("second DELETE: expected 404, got %d", w.Code)
-	}
-}
-
-func TestPerUserIsolation(t *testing.T) {
-	setupDataDir(t)
-	doDrawing(t, http.MethodPut, "secret", testDrawing, "alice")
-	if w := doDrawing(t, http.MethodGet, "secret", "", "bob"); w.Code != http.StatusNotFound {
-		t.Fatalf("bob should not see alice's drawing, got %d", w.Code)
-	}
-	if drawings := listDrawings(t, "bob"); len(drawings) != 0 {
-		t.Fatalf("bob's list should be empty, got %d", len(drawings))
-	}
-}
-
-// --- rename (PATCH) ---
-
-func renameReq(t *testing.T, id, newName, user string) *httptest.ResponseRecorder {
-	t.Helper()
-	return doDrawing(t, http.MethodPatch, id, `{"name":`+strconv(newName)+`}`, user)
-}
-
-// strconv JSON-quotes a string without importing encoding/json for a one-liner.
-func strconv(s string) string {
-	b, _ := json.Marshal(s)
-	return string(b)
-}
-
-func TestRenameSuccess(t *testing.T) {
-	setupDataDir(t)
-	doDrawing(t, http.MethodPut, "foo", testDrawing, "alice")
-	w := renameReq(t, "foo", "bar", "alice")
-	if w.Code != http.StatusOK {
-		t.Fatalf("PATCH: expected 200, got %d: %s", w.Code, w.Body.String())
-	}
-	var resp map[string]string
-	if err := json.Unmarshal(w.Body.Bytes(), &resp); err != nil {
-		t.Fatalf("PATCH: bad JSON: %v", err)
-	}
-	if resp["id"] != "bar" || resp["status"] != "renamed" {
-		t.Errorf("unexpected response: %v", resp)
-	}
-	if w := doDrawing(t, http.MethodGet, "bar", "", "alice"); w.Code != http.StatusOK || w.Body.String() != testDrawing {
-		t.Errorf("GET new id: code=%d content=%q", w.Code, w.Body.String())
-	}
-	if w := doDrawing(t, http.MethodGet, "foo", "", "alice"); w.Code != http.StatusNotFound {
-		t.Errorf("GET old id: expected 404, got %d", w.Code)
-	}
-}
-
-func TestRenameConflict(t *testing.T) {
-	setupDataDir(t)
-	doDrawing(t, http.MethodPut, "a", testDrawing, "alice")
-	doDrawing(t, http.MethodPut, "b", testDrawing, "alice")
-	if w := renameReq(t, "a", "b", "alice"); w.Code != http.StatusConflict {
-		t.Fatalf("expected 409, got %d", w.Code)
-	}
-	// both drawings intact
-	for _, id := range []string{"a", "b"} {
-		if w := doDrawing(t, http.MethodGet, id, "", "alice"); w.Code != http.StatusOK {
-			t.Errorf("drawing %q should be intact, got %d", id, w.Code)
-		}
-	}
-}
-
-func TestRenameMissing(t *testing.T) {
-	setupDataDir(t)
-	if w := renameReq(t, "nope", "new", "alice"); w.Code != http.StatusNotFound {
-		t.Fatalf("expected 404, got %d", w.Code)
-	}
-}
-
-func TestRenameSameName(t *testing.T) {
-	setupDataDir(t)
-	doDrawing(t, http.MethodPut, "foo", testDrawing, "alice")
-	w := renameReq(t, "foo", "foo", "alice")
-	if w.Code != http.StatusOK {
-		t.Fatalf("same-name rename: expected 200, got %d: %s", w.Code, w.Body.String())
-	}
-	if w := doDrawing(t, http.MethodGet, "foo", "", "alice"); w.Code != http.StatusOK {
-		t.Errorf("drawing should be intact, got %d", w.Code)
-	}
-}
-
-func TestRenameInvalidNames(t *testing.T) {
-	setupDataDir(t)
-	doDrawing(t, http.MethodPut, "foo", testDrawing, "alice")
-	for _, name := range []string{"", "   ", "../..", "---"} {
-		if w := renameReq(t, "foo", name, "alice"); w.Code != http.StatusBadRequest {
-			t.Errorf("rename to %q: expected 400, got %d", name, w.Code)
-		}
-	}
-	// malformed body
-	if w := doDrawing(t, http.MethodPatch, "foo", `{not json`, "alice"); w.Code != http.StatusBadRequest {
-		t.Errorf("malformed body: expected 400, got %d", w.Code)
-	}
-}
-
-func TestRenameSanitization(t *testing.T) {
-	setupDataDir(t)
-	cases := []struct{ in, want string }{
-		{"My Drawing!", "My-Drawing-"},
-		{"net diag.excalidraw", "net-diag"}, // .excalidraw suffix stripped, not mangled
-		{"a/b\\c", "a-b-c"},
-	}
-	for _, c := range cases {
-		doDrawing(t, http.MethodPut, "src", testDrawing, "alice")
-		w := renameReq(t, "src", c.in, "alice")
-		if w.Code != http.StatusOK {
-			t.Errorf("rename to %q: expected 200, got %d: %s", c.in, w.Code, w.Body.String())
-			continue
-		}
-		var resp map[string]string
-		json.Unmarshal(w.Body.Bytes(), &resp)
-		if resp["id"] != c.want {
-			t.Errorf("rename to %q: expected id %q, got %q", c.in, c.want, resp["id"])
-		}
-		// file must be inside the user dir under the sanitized name
-		if _, err := os.Stat(filepath.Join(dataDir, "alice", c.want+".excalidraw")); err != nil {
-			t.Errorf("rename to %q: expected file %q on disk: %v", c.in, c.want, err)
-		}
-		doDrawing(t, http.MethodDelete, resp["id"], "", "alice")
-	}
-}
-
-func TestRenameTraversalStaysInUserDir(t *testing.T) {
-	setupDataDir(t)
-	doDrawing(t, http.MethodPut, "foo", testDrawing, "alice")
-	w := renameReq(t, "foo", "../../../etc/passwd", "alice")
-	if w.Code == http.StatusOK {
-		var resp map[string]string
-		json.Unmarshal(w.Body.Bytes(), &resp)
-		if strings.Contains(resp["id"], "/") || strings.Contains(resp["id"], "..") {
-			t.Fatalf("traversal characters survived: %q", resp["id"])
-		}
-		if _, err := os.Stat(filepath.Join(dataDir, "alice", resp["id"]+".excalidraw")); err != nil {
-			t.Fatalf("renamed file escaped user dir: %v", err)
-		}
-	}
-	// nothing outside the data dir
-	if _, err := os.Stat(filepath.Join(dataDir, "..", "etc")); err == nil {
-		t.Fatal("file escaped the data dir")
-	}
-}
--- a/stacks/excalidraw/project/static/editor.html
+++ b/stacks/excalidraw/project/static/editor.html
@ -8,41 +8,41 @@
        * { margin: 0; padding: 0; }
        html, body { width: 100%; height: 100%; overflow: hidden; }
        #root { width: 100%; height: 100%; }
-        .top-right-ui {
+        .toolbar {
+            position: fixed;
+            top: 10px;
+            left: 10px;
+            z-index: 1000;
            display: flex;
-            align-items: center;
            gap: 8px;
-            font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif;
-        }
-        .top-right-ui a, .top-right-ui button {
-            display: inline-flex;
-            align-items: center;
-            gap: 6px;
+            background: rgba(255,255,255,0.95);
            padding: 8px 12px;
-            border: 1px solid transparent;
            border-radius: 8px;
+            box-shadow: 0 2px 8px rgba(0,0,0,0.15);
+        }
+        .toolbar button, .toolbar a {
+            padding: 6px 14px;
+            border: none;
+            border-radius: 6px;
            cursor: pointer;
-            font-size: 13px;
+            font-size: 14px;
+            background: #6c5ce7;
+            color: white;
            text-decoration: none;
-            box-shadow: 0 1px 4px rgba(0,0,0,0.12);
-            max-width: 40vw;
-            white-space: nowrap;
-            overflow: hidden;
-            text-overflow: ellipsis;
+            display: inline-block;
        }
-        .top-right-ui.theme-light a, .top-right-ui.theme-light button {
-            background: #ffffff;
-            color: #1b1b1f;
+        .toolbar button:hover, .toolbar a:hover { background: #5b4cdb; }
+        .toolbar .secondary { background: #ddd; color: #333; }
+        .toolbar .secondary:hover { background: #ccc; }
+        .toolbar .title {
+            font-weight: 600;
+            padding: 6px 0;
+            color: #333;
        }
-        .top-right-ui.theme-dark a, .top-right-ui.theme-dark button {
-            background: #232329;
-            color: #e9ecef;
-        }
-        .top-right-ui button:hover, .top-right-ui a:hover { border-color: #a29bfe; }
        .status {
            position: fixed;
            bottom: 10px;
-            right: 60px;
+            right: 10px;
            padding: 6px 12px;
            background: rgba(0,0,0,0.7);
            color: white;
@ -51,7 +51,6 @@
            z-index: 1000;
            opacity: 0;
            transition: opacity 0.3s;
-            pointer-events: none;
        }
        .status.show { opacity: 1; }
        .loading {
@ -68,6 +67,11 @@
    </style>
 </head>
 <body>
+    <div class="toolbar">
+        <a href="/" class="secondary">Back to Library</a>
+        <span class="title" id="title">Loading...</span>
+        <button onclick="saveDrawing()">Save</button>
+    </div>
    <div id="root">
        <div class="loading">
            <div>Loading Excalidraw...</div>
@ -77,33 +81,16 @@
    <div id="status" class="status">Saved</div>

    <script>
-        // Replaces #root with an error panel (safe DOM methods, no innerHTML).
-        function showFatal(title, detail) {
-            var root = document.getElementById('root');
-            root.replaceChildren();
-            var panel = document.createElement('div');
-            panel.className = 'loading error';
-            var titleEl = document.createElement('div');
-            titleEl.textContent = title;
-            panel.appendChild(titleEl);
-            if (detail) {
-                var detailEl = document.createElement('div');
-                detailEl.style.fontSize = '0.9rem';
-                detailEl.textContent = detail;
-                panel.appendChild(detailEl);
-            }
-            root.appendChild(panel);
-        }
-
        // Get drawing ID from URL path: /draw/{id}
        var pathParts = window.location.pathname.split('/');
        var drawingId = pathParts[pathParts.length - 1] || pathParts[pathParts.length - 2];

        if (!drawingId) {
-            showFatal('No drawing ID specified');
+            document.getElementById('root').innerHTML = '<div class="loading error">No drawing ID specified</div>';
            throw new Error('No drawing ID');
        }

+        document.getElementById('title').textContent = drawingId;
        document.title = drawingId + ' - Excalidraw';

        var excalidrawAPI = null;
@ -172,46 +159,6 @@
            autoSaveTimeout = setTimeout(saveDrawing, 2000);
        }

-        // Renames the current drawing via the API. Returns the new ID, or null
-        // if the rename was cancelled or failed.
-        async function renameCurrentDrawing() {
-            var newName = window.prompt('Rename drawing', drawingId);
-            if (newName === null) return null;
-            newName = newName.trim();
-            if (!newName || newName === drawingId) return null;
-
-            // A pending autosave would resurrect the old file after the rename.
-            clearTimeout(autoSaveTimeout);
-
-            var resp;
-            try {
-                resp = await fetch('/api/drawings/' + drawingId, {
-                    method: 'PATCH',
-                    headers: { 'Content-Type': 'application/json' },
-                    body: JSON.stringify({ name: newName })
-                });
-            } catch (e) {
-                showStatus('Rename failed!');
-                return null;
-            }
-            if (resp.status === 409) {
-                window.alert('A drawing with that name already exists.');
-                return null;
-            }
-            if (!resp.ok) {
-                window.alert('Rename failed: ' + (await resp.text()));
-                return null;
-            }
-            var result = await resp.json();
-            drawingId = result.id;
-            document.title = drawingId + ' - Excalidraw';
-            window.history.replaceState(null, '', '/draw/' + encodeURIComponent(drawingId));
-            showStatus('Renamed');
-            // Flush any unsaved changes to the new file.
-            saveDrawing();
-            return drawingId;
-        }
-
        // Load scripts dynamically
        function loadScript(src) {
            return new Promise(function(resolve, reject) {
@ -250,76 +197,33 @@

                updateLoadStatus('Rendering Excalidraw...');

-                var e = React.createElement;
-                var MainMenu = ExcalidrawLib.MainMenu;
-
-                // Native default menu items, existence-guarded so a library
-                // update that drops one degrades gracefully.
-                function defaultItem(name) {
-                    var C = MainMenu && MainMenu.DefaultItems && MainMenu.DefaultItems[name];
-                    return C ? e(C, { key: name }) : null;
-                }
-
+                // Create Excalidraw component
                function App() {
-                    var nameState = React.useState(drawingId);
-                    var name = nameState[0], setName = nameState[1];
-
-                    function onRename() {
-                        renameCurrentDrawing().then(function(newId) {
-                            if (newId) setName(newId);
-                        });
-                    }
-
-                    // The menu is where the native export features live:
-                    // Export = "Save to..." (.excalidraw), SaveAsImage =
-                    // "Export image..." (PNG / SVG / clipboard).
-                    var menu = MainMenu ? e(MainMenu, { key: 'menu' },
-                        e(MainMenu.Item, { key: 'back', onSelect: function() { window.location.href = '/'; } }, 'Back to Library'),
-                        e(MainMenu.Item, { key: 'save', onSelect: saveDrawing }, 'Save now'),
-                        e(MainMenu.Item, { key: 'rename', onSelect: onRename }, 'Rename drawing…'),
-                        MainMenu.Separator ? e(MainMenu.Separator, { key: 'sep1' }) : null,
-                        defaultItem('LoadScene'),
-                        defaultItem('Export'),
-                        defaultItem('SaveAsImage'),
-                        MainMenu.Separator ? e(MainMenu.Separator, { key: 'sep2' }) : null,
-                        defaultItem('ClearCanvas'),
-                        defaultItem('ToggleTheme'),
-                        defaultItem('ChangeCanvasBackground'),
-                        defaultItem('Help')
-                    ) : null;
-
-                    return e(ExcalidrawLib.Excalidraw, {
+                    return React.createElement(ExcalidrawLib.Excalidraw, {
                        initialData: initialData ? {
                            elements: initialData.elements || [],
                            appState: initialData.appState || {}
                        } : undefined,
-                        UIOptions: { canvasActions: { toggleTheme: true } },
                        excalidrawAPI: function(api) {
                            excalidrawAPI = api;
                            console.log('Excalidraw API ready');
                        },
-                        onChange: onChange,
-                        renderTopRightUI: function(isMobile, appState) {
-                            return e('div', { className: 'top-right-ui theme-' + (appState.theme || 'light') },
-                                e('a', { key: 'home', href: '/', title: 'Back to Library' }, '← Library'),
-                                e('button', {
-                                    key: 'name',
-                                    title: 'Click to rename',
-                                    onClick: onRename
-                                }, name + ' ✎')
-                            );
-                        }
-                    }, menu);
+                        onChange: onChange
+                    });
                }

                var root = ReactDOM.createRoot(document.getElementById('root'));
-                root.render(e(App));
+                root.render(React.createElement(App));

                console.log('Excalidraw rendered successfully');

-            } catch (err) {
-                console.error('Init error:', err);
-                showFatal('Failed to load Excalidraw', err.message);
+            } catch (e) {
+                console.error('Init error:', e);
+                document.getElementById('root').innerHTML =
+                    '<div class="loading error">' +
+                    '<div>Failed to load Excalidraw</div>' +
+                    '<div style="font-size:0.9rem">' + e.message + '</div>' +
+                    '</div>';
            }
        }

--- a/stacks/excalidraw/rbac.tf
+++ b/stacks/excalidraw/rbac.tf
@ -1,49 +0,0 @@
-# emo's Claude → Excalidraw upload RBAC.
-#
-# emo's agent uploads drawings with `kubectl -n excalidraw port-forward svc/draw`
-# + `PUT /api/drawings/<name>` carrying the X-Authentik-Username header (the
-# documented recipe in emo's ~/.claude/CLAUDE.md — the app sits behind Authentik
-# forward-auth, so direct curl gets redirected). His hands-off credential is the
-# chrome-service/emo-browser ServiceAccount kubeconfig (stacks/chrome-service/rbac.tf);
-# its cluster-wide grant (oidc-power-user-readonly) is read-only, so pods/portforward
-# must be granted per namespace. This is the excalidraw-namespace grant
-# (Viktor's call, 2026-07-02; same pattern as the chrome-service one).
-#
-# TRADE-OFF (accepted): port-forward into this namespace bypasses the Authentik
-# ingress and the drawings API trusts the X-Authentik-Username header, so the SA
-# can read/write ANY user's drawings, not only emo's. The namespace runs nothing
-# but the drawings app, and the same class of trade-off was already accepted for
-# the shared browser (CDP reach into Viktor's sessions).
-
-resource "kubernetes_role" "portforward" {
-  metadata {
-    name      = "excalidraw-portforward"
-    namespace = kubernetes_namespace.excalidraw.metadata[0].name
-  }
-  rule {
-    api_groups = [""]
-    resources  = ["pods/portforward"]
-    verbs      = ["create"]
-  }
-}
-
-resource "kubernetes_role_binding" "emo_browser_portforward" {
-  metadata {
-    name      = "emo-browser-portforward"
-    namespace = kubernetes_namespace.excalidraw.metadata[0].name
-  }
-  role_ref {
-    api_group = "rbac.authorization.k8s.io"
-    kind      = "Role"
-    name      = kubernetes_role.portforward.metadata[0].name
-  }
-  subject {
-    kind = "ServiceAccount"
-    # Defined in stacks/chrome-service/rbac.tf — referenced by name across
-    # stacks, same as that file references the oidc-power-user-readonly
-    # ClusterRole. get/list on pods+services (needed to resolve svc/draw) comes
-    # from the SA's cluster-read binding there.
-    name      = "emo-browser"
-    namespace = "chrome-service"
-  }
-}
--- a/stacks/f1-stream/main.tf
+++ b/stacks/f1-stream/main.tf
@ -166,33 +166,6 @@ resource "kubernetes_deployment" "f1-stream" {
            name  = "DISCORD_CHANNELS"
            value = var.discord_f1_channel_ids
          }
-          # Replays feature (app repo ADR-0002). optional=true so the pod still
-          # starts before the Reddit app credentials exist; the app treats missing
-          # creds as "replays off" (logs "Replays pipeline disabled"). The
-          # ExternalSecret above uses dataFrom.extract on the Vault "f1-stream"
-          # key, so adding reddit_client_id / reddit_client_secret there auto-syncs
-          # them into this Secret — no ExternalSecret change needed, just a pod
-          # restart to pick them up.
-          env {
-            name = "REDDIT_CLIENT_ID"
-            value_from {
-              secret_key_ref {
-                name     = "f1-stream-secrets"
-                key      = "reddit_client_id"
-                optional = true
-              }
-            }
-          }
-          env {
-            name = "REDDIT_CLIENT_SECRET"
-            value_from {
-              secret_key_ref {
-                name     = "f1-stream-secrets"
-                key      = "reddit_client_secret"
-                optional = true
-              }
-            }
-          }
          # Verifier connects to in-cluster headed Chromium pool — see
          # stacks/chrome-service/. Falls back to in-process headless if unset.
          # 2026-06-04: migrated WS (:3000 / path-token) → CDP (:9222 /
--- a/stacks/frigate/main.tf
+++ b/stacks/frigate/main.tf
@ -117,9 +117,8 @@ resource "kubernetes_deployment" "frigate" {
            limits = {
              memory           = "10Gi"
              "nvidia.com/gpu" = "1"
-              # GPU VRAM budget (ADR-0016): detector + ffmpeg decode (~1.9 GiB),
-              # +~250 MiB NVDEC headroom for the vermont-garage camera (ADR-0017).
-              "viktorbarzin.me/gpumem" = "2300"
+              # GPU VRAM budget (ADR-0016): detector + ffmpeg decode (~1.9 GiB).
+              "viktorbarzin.me/gpumem" = "2000"
            }
          }
          env {
--- a/stacks/immich/frame-emo.tf
+++ b/stacks/immich/frame-emo.tf
@ -73,9 +73,7 @@ resource "kubernetes_deployment" "immich-frame-emo" {
      }
      spec {
        container {
-          # immich_v3: upstream compat tag for Immich v3 — see frame.tf for the
-          # full story; repin to a versioned tag once upstream releases v3 support.
-          image = "ghcr.io/immichframe/immichframe:immich_v3"
+          image = "ghcr.io/immichframe/immichframe:v1.0.32.0"
          name  = "immich-frame-emo"
          resources {
            requests = {
@ -144,21 +142,14 @@ resource "kubernetes_service" "immich-frame-emo" {

 module "ingress_emo" {
  source = "../../modules/kubernetes/ingress_factory"
-  # Photo-frame kiosk display on Emo's Portal Mini (Sofia LAN) — WebView
-  # pulling images via an Immich API key; no user login possible on the
-  # device. Same LAN-only gating as frame.tf: home-lans-only ipAllowList +
-  # dns_type "internal" (Emo's Portal already resolves this host internally
-  # via Technitium; the public internal-IP record covers any resolver).
-  # LAN-only design: docs/plans/2026-07-04-immich-frame-lan-only-design.md.
-  # auth = "none": kiosk WebView, no user auth by design; gated by the home-lans-only ipAllowList instead.
-  auth              = "none"
-  dns_type          = "internal"
-  extra_middlewares = ["traefik-home-lans-only@kubernetescrd"]
-  # Not externally reachable — explicit opt-out so external-monitor-sync
-  # drops the old [External] monitor instead of default-opting it back in.
-  external_monitor = false
-  namespace        = "immich"
-  name             = "highlights-immich-emo"
-  tls_secret_name  = var.tls_secret_name
-  service_name     = "immich-frame-emo"
+  # Photo-frame kiosk display on Emo's Portal — headless browser pulling images
+  # via an Immich API key (no user login). Forward-auth would 302 the device to
+  # Authentik with no way to complete login.
+  # auth = "none": photo-frame kiosk; headless browser with API key; no user login.
+  auth            = "none"
+  dns_type        = "proxied"
+  namespace       = "immich"
+  name            = "highlights-immich-emo"
+  tls_secret_name = var.tls_secret_name
+  service_name    = "immich-frame-emo"
 }
--- a/stacks/immich/frame.tf
+++ b/stacks/immich/frame.tf
@ -69,11 +69,7 @@ resource "kubernetes_deployment" "immich-frame" {
      }
      spec {
        container {
-          # immich_v3 is the upstream compat tag for Immich v3 servers — every
-          # versioned release (≤ v1.0.33.0) crashes deserializing v3 API
-          # responses (immichFrame/immichFrame#653). Pin back to a vX.Y.Z.W tag
-          # once a stable release ships v3 support (upstream PR #654).
-          image = "ghcr.io/immichframe/immichframe:immich_v3"
+          image = "ghcr.io/immichframe/immichframe:v1.0.32.0"
          name  = "immich-frame"
          resources {
            requests = {
@ -142,23 +138,14 @@ resource "kubernetes_service" "immich-frame" {

 module "ingress" {
  source = "../../modules/kubernetes/ingress_factory"
-  # Photo-frame kiosk display (Viktor's London Portal Plus WebView) — pulls
-  # images via an Immich API key; no user login possible on the device, so
-  # forward-auth would 302 it to Authentik with no way to complete login.
-  # The GATE is network-level: the home-lans-only ipAllowList (Sofia/London/
-  # Valchedrym LANs + 10/8) 403s everyone else, and dns_type "internal"
-  # publishes the Traefik LB IP publicly so the Portal's baked-in URL resolves
-  # from any resolver yet routes only via the home LANs / WG tunnel.
-  # LAN-only design: docs/plans/2026-07-04-immich-frame-lan-only-design.md.
-  # auth = "none": kiosk WebView, no user auth by design; gated by the home-lans-only ipAllowList instead.
-  auth              = "none"
-  dns_type          = "internal"
-  extra_middlewares = ["traefik-home-lans-only@kubernetescrd"]
-  # Not externally reachable — explicit opt-out so external-monitor-sync
-  # drops the old [External] monitor instead of default-opting it back in.
-  external_monitor = false
-  namespace        = "immich"
-  name             = "highlights-immich"
-  tls_secret_name  = var.tls_secret_name
-  service_name     = "immich-frame"
+  # Photo-frame kiosk display — runs in headless browser mode on a TV/frame
+  # device and pulls images via an Immich API key (no user login). Forward-auth
+  # would 302 the device to Authentik with no way to complete login.
+  # auth = "none": Photo-frame kiosk display — headless browser with API key; no user login; forward-auth breaks device automation.
+  auth            = "none"
+  dns_type        = "proxied"
+  namespace       = "immich"
+  name            = "highlights-immich"
+  tls_secret_name = var.tls_secret_name
+  service_name    = "immich-frame"
 }
--- a/stacks/immich/main.tf
+++ b/stacks/immich/main.tf
@ -15,7 +15,7 @@ locals {
 variable "immich_version" {
  type = string
  # Change me to upgrade
-  default = "v3.0.0"
+  default = "v2.7.5"
 }
 variable "proxmox_host" { type = string }
 variable "redis_host" { type = string }
@ -492,7 +492,7 @@ resource "kubernetes_deployment" "immich-postgres" {
      }
      spec {
        container {
-          image = "ghcr.io/immich-app/postgres:15-vectorchord0.4.3-pgvectors0.2.0"
+          image = "ghcr.io/immich-app/postgres:15-vectorchord0.3.0-pgvectors0.2.0"
          name  = "immich-postgresql"
          port {
            container_port = 5432
@ -882,7 +882,7 @@ resource "kubernetes_cron_job_v1" "clip-index-prewarm" {
            restart_policy = "Never"
            container {
              name  = "prewarm"
-              image = "ghcr.io/immich-app/postgres:15-vectorchord0.4.3-pgvectors0.2.0"
+              image = "ghcr.io/immich-app/postgres:15-vectorchord0.3.0-pgvectors0.2.0"
              # command overrides the postgres entrypoint → runs psql directly.
              command = [
                "psql", "-v", "ON_ERROR_STOP=1", "-c",
@ -964,7 +964,7 @@ resource "kubernetes_cron_job_v1" "immich-search-probe" {
            }
            init_container {
              name  = "measure"
-              image = "ghcr.io/immich-app/postgres:15-vectorchord0.4.3-pgvectors0.2.0"
+              image = "ghcr.io/immich-app/postgres:15-vectorchord0.3.0-pgvectors0.2.0"
              command = ["/bin/bash", "-c", <<-EOT
                set -uo pipefail
                OUT=/shared/metrics.prom
--- a/stacks/kyverno/modules/kyverno/ghcr-credentials.tf
+++ b/stacks/kyverno/modules/kyverno/ghcr-credentials.tf
@ -43,11 +43,6 @@ locals {
    # ghcr.io/passionprojectsanca/book-plotter (built by GHA in Anca's repo,
    # under her own org's ghcr). The deployment references the cloned secret.
    "plotting-book",
-    # excalidraw: infra-owned image migrated from manual DockerHub pushes to
-    # PRIVATE ghcr.io/viktorbarzin/excalidraw-library (ADR-0002, built by
-    # .github/workflows/build-excalidraw.yml). The deployment references the
-    # cloned secret.
-    "excalidraw",
  ]
 }

--- a/stacks/mailserver/modules/mailserver/extra/aliases.txt
+++ b/stacks/mailserver/modules/mailserver/extra/aliases.txt
@ -19,12 +19,3 @@ plans@viktorbarzin.me spam@viktorbarzin.me
 # to trips@, or every verification/recovery send is rejected (550 sender). Also
 # routes any inbound trips@ to spam@.
 trips@viktorbarzin.me spam@viktorbarzin.me
-
-# docs@ -> docs@: explicit self-alias for the paperless-ngx ingest MAILBOX
-# (a real account in secret/platform.mailserver_accounts). Without this the
-# @domain catch-all above (Vault-side aliases) rewrites docs@ to spam@ and the
-# mail lands in the TripIt-swept catch-all mailbox instead. Same pattern as
-# me@ -> me@. Delivery-time sender allowlist: docs-at-viktorbarzin.me
-# .dovecot.sieve (mounted as docs@viktorbarzin.me.dovecot.sieve).
-# Runbook: docs/runbooks/paperless-mail-ingest.md
-docs@viktorbarzin.me docs@viktorbarzin.me
--- a/stacks/mailserver/modules/mailserver/extra/docs-at-viktorbarzin.me.dovecot.sieve
+++ b/stacks/mailserver/modules/mailserver/extra/docs-at-viktorbarzin.me.dovecot.sieve
@ -1,17 +0,0 @@
-# Sender allowlist for the paperless-ngx ingest mailbox docs@viktorbarzin.me.
-# Family members forward document emails here; paperless-ngx polls the INBOX
-# over IMAP and maps each sender to a paperless account (1 mail rule per
-# sender). Decision (Viktor, 2026-07-03): mail from any OTHER sender is
-# ignored and deleted — discarded here at LMTP delivery, before paperless
-# ever sees it. This also keeps spam to the guessable address out entirely.
-#
-# Keep this list in sync with the paperless mail rules (the sender -> owner
-# map). Add-a-sender procedure: docs/runbooks/paperless-mail-ingest.md
-if not address :is "from" ["me@viktorbarzin.me",
-                           "vbarzin@gmail.com",
-                           "viktorbarzin@meta.com",
-                           "ancaelena98@gmail.com",
-                           "emil.barzin@gmail.com"] {
-    discard;
-    stop;
-}
--- a/stacks/mailserver/modules/mailserver/main.tf
+++ b/stacks/mailserver/modules/mailserver/main.tf
@ -14,15 +14,10 @@ variable "nfs_server" { type = string }
 locals {
  _account_set   = keys(var.mailserver_accounts)
  _virtual_lines = split("\n", format("%s%s", var.postfix_account_aliases, file("${path.module}/extra/aliases.txt")))
-  # NOTE: the length guard must live in a ternary, not a leading `&&` operand.
-  # Terraform only short-circuits && / || from v1.6 — on the older terraform
-  # pinned in the infra-ci image, `split(" ", line)[1]` was still evaluated
-  # for blank/comment lines and failed the whole plan with "Invalid index"
-  # (first hit by CI pipeline #469, 2026-07-03). A conditional expression is
-  # lazy on every terraform version.
  postfix_virtual = join("\n", [
    for line in local._virtual_lines : line
-    if length(split(" ", line)) != 2 ? true : !(
+    if !(
+      length(split(" ", line)) == 2 &&
      contains(local._account_set, split(" ", line)[0]) &&
      contains(local._account_set, split(" ", line)[1]) &&
      split(" ", line)[0] != split(" ", line)[1]
@ -115,12 +110,6 @@ resource "kubernetes_config_map" "mailserver_config" {
    "postfix-main.cf"     = var.postfix_cf
    "postfix-virtual.cf"  = local.postfix_virtual

-    # Per-user Dovecot sieve for the paperless-ngx ingest mailbox: DMS installs
-    # any /tmp/docker-mailserver/<login>.dovecot.sieve at startup. ConfigMap
-    # keys can't contain '@', so the key is sanitized ("-at-") and the
-    # volume_mount below restores the real filename.
-    "docs-at-viktorbarzin.me.dovecot.sieve" = file("${path.module}/extra/docs-at-viktorbarzin.me.dovecot.sieve")
-
    KeyTable      = "mail._domainkey.viktorbarzin.me viktorbarzin.me:mail:/etc/opendkim/keys/viktorbarzin.me-mail.key\n"
    SigningTable  = "*@viktorbarzin.me mail._domainkey.viktorbarzin.me\n"
    TrustedHosts  = "127.0.0.1\nlocalhost\n"
@ -415,12 +404,6 @@ resource "kubernetes_deployment" "mailserver" {
            sub_path   = "postfix-virtual.cf"
            read_only  = true
          }
-          volume_mount {
-            name       = "config"
-            mount_path = "/tmp/docker-mailserver/docs@viktorbarzin.me.dovecot.sieve"
-            sub_path   = "docs-at-viktorbarzin.me.dovecot.sieve"
-            read_only  = true
-          }
          volume_mount {
            name       = "config"
            mount_path = "/tmp/docker-mailserver/fetchmail.cf"
--- a/stacks/monitoring/modules/monitoring/authentik_walloff_probe.tf
+++ b/stacks/monitoring/modules/monitoring/authentik_walloff_probe.tf
@ -60,10 +60,6 @@ locals {
    # t3 dispatch probe surface (auth="none" path carve-out on /probe): WS echo
    # + healthz for the t3-probe drop-attribution client (stacks/t3code).
    "t3-probe-ws" = "https://t3.viktorbarzin.me/probe/healthz"
-    # tasks PWA icons + manifest (auth="none" path carve-out, stacks/tasks
-    # module.ingress_icons): macOS/iOS/Android icon fetchers carry no session
-    # cookies, so an Authentik 302 here breaks Add-to-Dock icons.
-    "tasks-icons" = "https://tasks.viktorbarzin.me/apple-touch-icon.png"
    # NOTE: openclaw task-webhook (auth="none") is intentionally NOT probed — it
    # has no public DNS record (NXDOMAIN, external_monitor=false), so there is no
    # externally GET-able URL to probe. Its carve-out is internal-only.
--- a/stacks/rybbit/worker/index.js
+++ b/stacks/rybbit/worker/index.js
@ -18,6 +18,7 @@ const SITE_IDS = {
  "stacks.viktorbarzin.me": "b38fda4285df",
  "f1.viktorbarzin.me": "7e69786f66d5",
  "frigate.viktorbarzin.me": "0d4044069ff5",
+  "highlights-immich.viktorbarzin.me": "602167601c6b",
  "immich.viktorbarzin.me": "35eedb7a3d2b",
  "mail.viktorbarzin.me": "082f164faa7d",
  "navidrome.viktorbarzin.me": "8a3844ff75ba",
--- a/stacks/rybbit/worker/wrangler.toml
+++ b/stacks/rybbit/worker/wrangler.toml
@ -28,6 +28,7 @@ routes = [
  { pattern = "stacks.viktorbarzin.me/*",               zone_name = "viktorbarzin.me" },
  { pattern = "f1.viktorbarzin.me/*",                   zone_name = "viktorbarzin.me" },
  { pattern = "frigate.viktorbarzin.me/*",              zone_name = "viktorbarzin.me" },
+  { pattern = "highlights-immich.viktorbarzin.me/*",    zone_name = "viktorbarzin.me" },
  { pattern = "immich.viktorbarzin.me/*",               zone_name = "viktorbarzin.me" },
  { pattern = "mail.viktorbarzin.me/*",                 zone_name = "viktorbarzin.me" },
  { pattern = "navidrome.viktorbarzin.me/*",            zone_name = "viktorbarzin.me" },
--- a/stacks/stem95su/gdrive-sync.tf
+++ b/stacks/stem95su/gdrive-sync.tf
@ -0,0 +1,122 @@
+# Automatic Google Drive -> site sync (added 2026-06-09; supersedes the
+# earlier on-demand-only model now that content is actively maintained).
+#
+# A CronJob mirrors the READ-ONLY Drive folder "claude" (servable content in
+# subfolder "stem claude/files/") onto the NFS content volume every 10 min via
+# rclone. rclone is delta-aware: an unchanged run lists ~33 files' metadata and
+# transfers nothing, so the schedule is cheap (not a 24MB re-download). nginx
+# keeps serving the same volume read-only; updates appear within ~5s (actimeo).
+#
+# Drive is treated strictly READ-ONLY: scope=drive.readonly and rclone only ever
+# reads the remote (sync gdrive: -> /data), never writes back.
+#
+# TOKEN LONGEVITY: the GCP OAuth app (project home-lab-1700868541205) MUST be
+# published to "Production" or its refresh token expires ~weekly and this job
+# fails. After publishing, re-mint the token and refresh
+# `secret/stem95su.rclone_conf`. A failed run surfaces as a failed Job.
+
+resource "kubernetes_manifest" "rclone_external_secret" {
+  field_manager {
+    force_conflicts = true
+  }
+  manifest = {
+    apiVersion = "external-secrets.io/v1"
+    kind       = "ExternalSecret"
+    metadata = {
+      name      = "stem95su-rclone"
+      namespace = kubernetes_namespace.stem95su.metadata[0].name
+    }
+    spec = {
+      refreshInterval = "1h"
+      secretStoreRef = {
+        name = "vault-kv"
+        kind = "ClusterSecretStore"
+      }
+      target = { name = "stem95su-rclone" }
+      data = [{
+        secretKey = "rclone.conf"
+        remoteRef = {
+          key      = "stem95su"
+          property = "rclone_conf"
+        }
+      }]
+    }
+  }
+  depends_on = [kubernetes_namespace.stem95su]
+}
+
+resource "kubernetes_cron_job_v1" "gdrive_sync" {
+  metadata {
+    name      = "stem95su-gdrive-sync"
+    namespace = kubernetes_namespace.stem95su.metadata[0].name
+    labels    = { run = "stem95su", component = "gdrive-sync" }
+  }
+  spec {
+    schedule                      = "*/10 * * * *"
+    concurrency_policy            = "Forbid"
+    successful_jobs_history_limit = 2
+    failed_jobs_history_limit     = 3
+    job_template {
+      metadata {}
+      spec {
+        backoff_limit              = 1
+        ttl_seconds_after_finished = 86400
+        template {
+          metadata { labels = { run = "stem95su", component = "gdrive-sync" } }
+          spec {
+            restart_policy = "OnFailure"
+            container {
+              name  = "rclone"
+              image = "docker.io/rclone/rclone:1.74.3"
+              # Mirror Drive folder -> /data. Guard: hard-fail on auth/list error
+              # (so an expired token is visible); skip quietly if the source is
+              # empty / missing the dashboard (never wipe the live site);
+              # --max-delete caps catastrophic deletes from a partial listing.
+              command = ["/bin/sh", "-c", <<-EOT
+                set -eu
+                cp /config/rclone.conf /tmp/rc.conf
+                SRC="gdrive:stem claude/files"
+                LIST=$(rclone --config /tmp/rc.conf lsf "$SRC" --files-only) || { echo "FATAL: Drive list failed (auth/network)"; exit 1; }
+                N=$(printf '%s\n' "$LIST" | grep -c . || true)
+                if [ "$N" -lt 1 ] || ! printf '%s\n' "$LIST" | grep -qx "stem_board.html"; then
+                  echo "GUARD: source N=$N / stem_board.html missing -- skipping, site untouched"; exit 0
+                fi
+                echo "source OK ($N files) -- mirroring to /data"
+                rclone --config /tmp/rc.conf sync "$SRC" /data --exclude ".DS_Store" --fast-list --transfers 4 --max-delete 25 -v
+              EOT
+              ]
+              resources {
+                requests = { cpu = "10m", memory = "64Mi" }
+                limits   = { memory = "192Mi" }
+              }
+              volume_mount {
+                name       = "rclone-config"
+                mount_path = "/config"
+                read_only  = true
+              }
+              volume_mount {
+                name       = "content"
+                mount_path = "/data"
+              }
+            }
+            volume {
+              name = "rclone-config"
+              secret { secret_name = "stem95su-rclone" }
+            }
+            volume {
+              name = "content"
+              persistent_volume_claim {
+                claim_name = module.nfs_content.claim_name
+              }
+            }
+          }
+        }
+      }
+    }
+  }
+  lifecycle {
+    # KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
+    ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config]
+  }
+  depends_on = [kubernetes_manifest.rclone_external_secret]
+}
--- a/stacks/stem95su/main.tf
+++ b/stacks/stem95su/main.tf
@ -1,9 +1,173 @@
-# stem95su moved OFF-INFRA to Cloudflare Pages (ADR-0018 cutover, 2026-07-03) —
-# registry entry `stem95su` in stacks/valia-sites; runbook
-# docs/runbooks/valia-sites.md. This stack intentionally declares NOTHING:
-# the apply that landed this file destroyed the old in-cluster serving
-# (nginx + NFS content PVC + ingress + per-site gdrive-sync CronJob +
-# namespace). Directory kept only so the destroy could run through CI —
-# safe to delete the dir + its PG state schema in a later cleanup.
-# Harmless leftovers (manual cleanup if ever wanted): /srv/nfs/stem-site on
-# the PVE host, and Vault secret/stem95su (superseded by secret/valia-sites).
+# STEM educational platform for 95. СУ „Проф. Иван Шишманов" (Sofia).
+# Public, open static site at stem95su.viktorbarzin.me. Self-contained HTML
+# pages + media authored externally (Gemini exports), served by a stock nginx
+# straight off the PVE host NFS — NOT baked into an image, so content can be
+# updated out-of-band (Nextcloud "PVE NFS Pool" or rsync to /srv/nfs/stem-site)
+# without a rebuild. Auto-backed-up offsite by the existing nfs-mirror job.
+
+resource "kubernetes_namespace" "stem95su" {
+  metadata {
+    name = "stem95su"
+    labels = {
+      "istio-injection" : "disabled"
+      tier = local.tiers.aux
+    }
+  }
+  lifecycle {
+    # KYVERNO_LIFECYCLE_V1: goldilocks-vpa-auto-mode ClusterPolicy stamps this label on every namespace
+    ignore_changes = [metadata[0].labels["goldilocks.fairwinds.com/vpa-update-mode"]]
+  }
+}
+
+module "tls_secret" {
+  source          = "../../modules/kubernetes/setup_tls_secret"
+  namespace       = kubernetes_namespace.stem95su.metadata[0].name
+  tls_secret_name = var.tls_secret_name
+}
+
+# Content lives on the PVE host NFS. NOTE: the nfs_volume module creates only
+# the K8s PV+PVC — the export subdir (/srv/nfs/stem-site) must already exist on
+# 192.168.1.127 or the pod fails to mount (mount.nfs exit 32). It is created
+# during deploy and re-created on demand if ever lost.
+module "nfs_content" {
+  source       = "../../modules/kubernetes/nfs_volume"
+  name         = "stem95su-content"
+  namespace    = kubernetes_namespace.stem95su.metadata[0].name
+  nfs_server   = var.nfs_server
+  nfs_path     = "/srv/nfs/stem-site"
+  storage      = "1Gi"
+  access_modes = ["ReadWriteMany"]
+}
+
+# Minimal nginx server block: serve the static dir, with the dashboard
+# (stem_board.html) as the directory index so "/" loads the platform home.
+# All other pages/assets are reached by their exact filenames (the dashboard
+# links to them by name — those must not be renamed).
+resource "kubernetes_config_map" "nginx_conf" {
+  metadata {
+    name      = "stem95su-nginx-conf"
+    namespace = kubernetes_namespace.stem95su.metadata[0].name
+  }
+  data = {
+    "default.conf" = <<-EOT
+      server {
+          listen       80;
+          server_name  _;
+          root   /usr/share/nginx/html;
+          index  stem_board.html index.html;
+      }
+    EOT
+  }
+}
+
+resource "kubernetes_deployment" "stem95su" {
+  metadata {
+    name      = "stem95su"
+    namespace = kubernetes_namespace.stem95su.metadata[0].name
+    labels = {
+      run  = "stem95su"
+      tier = local.tiers.aux
+    }
+  }
+  spec {
+    replicas = 1
+    selector {
+      match_labels = {
+        run = "stem95su"
+      }
+    }
+    template {
+      metadata {
+        labels = {
+          run = "stem95su"
+        }
+      }
+      spec {
+        container {
+          image = "nginx:1.28-alpine"
+          name  = "nginx"
+          resources {
+            limits = {
+              memory = "64Mi"
+            }
+            requests = {
+              cpu    = "10m"
+              memory = "64Mi"
+            }
+          }
+          port {
+            container_port = 80
+          }
+          volume_mount {
+            name       = "content"
+            mount_path = "/usr/share/nginx/html"
+            read_only  = true
+          }
+          volume_mount {
+            name       = "nginx-conf"
+            mount_path = "/etc/nginx/conf.d"
+            read_only  = true
+          }
+          readiness_probe {
+            http_get {
+              path = "/"
+              port = 80
+            }
+            initial_delay_seconds = 3
+            period_seconds        = 10
+          }
+        }
+        volume {
+          name = "content"
+          persistent_volume_claim {
+            claim_name = module.nfs_content.claim_name
+          }
+        }
+        volume {
+          name = "nginx-conf"
+          config_map {
+            name = kubernetes_config_map.nginx_conf.metadata[0].name
+          }
+        }
+      }
+    }
+  }
+  lifecycle {
+    ignore_changes = [
+      spec[0].template[0].spec[0].dns_config, # KYVERNO_LIFECYCLE_V1
+    ]
+  }
+}
+
+resource "kubernetes_service" "stem95su" {
+  metadata {
+    name      = "stem95su"
+    namespace = kubernetes_namespace.stem95su.metadata[0].name
+    labels = {
+      run = "stem95su"
+    }
+  }
+  spec {
+    selector = {
+      run = "stem95su"
+    }
+    port {
+      name        = "http"
+      port        = "80"
+      target_port = "80"
+    }
+  }
+}
+
+module "ingress" {
+  source = "../../modules/kubernetes/ingress_factory"
+  # auth = "none": public static educational site for 95. СУ, open to the internet by design — CrowdSec + ai-bot-block gate bots; no login.
+  auth            = "none"
+  namespace       = kubernetes_namespace.stem95su.metadata[0].name
+  name            = "stem95su"
+  service_name    = kubernetes_service.stem95su.metadata[0].name
+  port            = "80"
+  host            = "stem95su"
+  dns_type        = "proxied"
+  tls_secret_name = var.tls_secret_name
+}
--- a/stacks/stem95su/variables.tf
+++ b/stacks/stem95su/variables.tf
@ -0,0 +1,9 @@
+variable "tls_secret_name" {
+  type      = string
+  sensitive = true
+}
+
+variable "nfs_server" {
+  type    = string
+  default = "192.168.1.127"
+}
--- a/stacks/tasks/imports.tf
+++ b/stacks/tasks/imports.tf
@ -1,53 +0,0 @@
-# One-shot adoption of the live tasks-stack resources that exist in-cluster but
-# were never persisted to Terraform state: pipeline 477 (2026-07-03, the stack's
-# first apply) died mid-`[tasks] apply` — after creating the resources, before
-# the pg backend write — so `tasks.states` stayed empty and every later apply
-# would create-fail with `namespaces "tasks" already exists` (same class as the
-# monitoring alert-digest adoption in stacks/monitoring/imports.tf). Importing
-# reconciles them into state so `terraform apply` UPDATES instead of failing to
-# create. These blocks are idempotent (a no-op once the resources are in state)
-# and may be removed after the next green apply. Defs: main.tf.
-# (module.ingress_icons is deliberately NOT here — it does not exist live yet;
-# the same apply creates it.)
-
-import {
-  to = kubernetes_namespace.tasks
-  id = "tasks"
-}
-
-import {
-  to = kubernetes_manifest.external_secret
-  id = "apiVersion=external-secrets.io/v1,kind=ExternalSecret,namespace=tasks,name=tasks-secrets"
-}
-
-import {
-  to = kubernetes_manifest.db_external_secret
-  id = "apiVersion=external-secrets.io/v1,kind=ExternalSecret,namespace=tasks,name=tasks-db-creds"
-}
-
-import {
-  to = kubernetes_deployment.tasks
-  id = "tasks/tasks"
-}
-
-import {
-  to = kubernetes_service.tasks
-  id = "tasks/tasks"
-}
-
-import {
-  to = kubernetes_network_policy_v1.tasks_ingress
-  id = "tasks/tasks-ingress"
-}
-
-import {
-  to = module.ingress.kubernetes_ingress_v1.proxied-ingress
-  id = "tasks/tasks"
-}
-
-# Cloudflare record ID looked up via the API (zone fd2c5dd4… / record for
-# tasks.viktorbarzin.me, CNAME → the cfargotunnel target, proxied).
-import {
-  to = module.ingress.cloudflare_record.proxied[0]
-  id = "fd2c5dd4efe8fe38958944e74d0ced6d/a8e6901a074c5255d09700d93eaaf705"
-}
--- a/stacks/tasks/main.tf
+++ b/stacks/tasks/main.tf
@ -1,378 +0,0 @@
-variable "image_tag" {
-  type        = string
-  default     = "latest"
-  description = "tasks image tag. Running tag is set by the Woodpecker deploy (kubectl set image)."
-}
-
-variable "postgresql_host" { type = string }
-
-variable "tls_secret_name" {
-  type      = string
-  sensitive = true
-}
-
-locals {
-  namespace = "tasks"
-  # ADR-0002: built on GHA from the public GitHub mirror, pushed to ghcr
-  # (public package — anonymous pulls). Running tag is managed by the
-  # Woodpecker deploy (kubectl set image); the image ref below is
-  # ignore_changes'd (KEEL_IGNORE_IMAGE), so this base only matters on
-  # (re)create.
-  image = "ghcr.io/viktorbarzin/tasks:${var.image_tag}"
-  labels = {
-    app = "tasks"
-  }
-}
-
-resource "kubernetes_namespace" "tasks" {
-  metadata {
-    name = local.namespace
-    labels = {
-      tier              = local.tiers.aux
-      "istio-injection" = "disabled"
-      # Opt into Keel auto-update (inject-keel-annotations ClusterPolicy).
-      "keel.sh/enrolled" = "true"
-    }
-  }
-  lifecycle {
-    # KYVERNO_LIFECYCLE_V1: goldilocks-vpa-auto-mode ClusterPolicy stamps this label.
-    ignore_changes = [metadata[0].labels["goldilocks.fairwinds.com/vpa-update-mode"]]
-  }
-}
-
-# App secrets — seed these in Vault before applying:
-#   secret/tasks
-#     fernet_key — Fernet key encrypting the per-user Nextcloud app passwords
-#                  stored in the Connected Accounts table (tasks ADR-0002).
-#
-# DB: CNPG database `tasks` (created in dbaas, null_resource.pg_tasks_db);
-# role password managed via the Vault database engine — see
-# static-creds/pg-tasks. Alembic runs migrations on app startup (no init
-# container needed).
-resource "kubernetes_manifest" "external_secret" {
-  field_manager {
-    force_conflicts = true
-  }
-  manifest = {
-    apiVersion = "external-secrets.io/v1"
-    kind       = "ExternalSecret"
-    metadata = {
-      name      = "tasks-secrets"
-      namespace = local.namespace
-    }
-    spec = {
-      refreshInterval = "15m"
-      secretStoreRef = {
-        name = "vault-kv"
-        kind = "ClusterSecretStore"
-      }
-      target = {
-        name = "tasks-secrets"
-        template = {
-          metadata = {
-            annotations = {
-              "reloader.stakater.com/match" = "true"
-            }
-          }
-        }
-      }
-      data = [
-        { secretKey = "TASKS_FERNET_KEY", remoteRef = { key = "tasks", property = "fernet_key" } },
-      ]
-    }
-  }
-  depends_on = [kubernetes_namespace.tasks]
-}
-
-# DB credentials from Vault database engine (7-day rotation).
-# Builds the asyncpg DSN consumed by the FastAPI app as TASKS_DB_DSN.
-# Pre-req in dbaas: CNPG cluster has DB `tasks`, role `tasks`, and Vault
-# role `static-creds/pg-tasks`.
-resource "kubernetes_manifest" "db_external_secret" {
-  field_manager {
-    force_conflicts = true
-  }
-  manifest = {
-    apiVersion = "external-secrets.io/v1"
-    kind       = "ExternalSecret"
-    metadata = {
-      name      = "tasks-db-creds"
-      namespace = local.namespace
-    }
-    spec = {
-      refreshInterval = "15m"
-      secretStoreRef = {
-        name = "vault-database"
-        kind = "ClusterSecretStore"
-      }
-      target = {
-        name = "tasks-db-creds"
-        template = {
-          metadata = {
-            annotations = {
-              "reloader.stakater.com/match" = "true"
-            }
-          }
-          data = {
-            TASKS_DB_DSN = "postgresql+asyncpg://tasks:{{ .password }}@${var.postgresql_host}:5432/tasks"
-            DB_PASSWORD  = "{{ .password }}"
-          }
-        }
-      }
-      data = [{
-        secretKey = "password"
-        remoteRef = {
-          key      = "static-creds/pg-tasks"
-          property = "password"
-        }
-      }]
-    }
-  }
-  depends_on = [kubernetes_namespace.tasks]
-}
-
-resource "kubernetes_deployment" "tasks" {
-  metadata {
-    name      = "tasks"
-    namespace = kubernetes_namespace.tasks.metadata[0].name
-    labels = merge(local.labels, {
-      tier = local.tiers.aux
-    })
-    annotations = {
-      # Reloader restarts the pod when tasks-secrets / tasks-db-creds change
-      # (both carry reloader.stakater.com/match=true) — required because the
-      # DB password rotates every 7 days and is read only at startup.
-      "reloader.stakater.com/search" = "true"
-    }
-  }
-
-  spec {
-    # Single leader: the CalDAV sync engine wants one writer per user's
-    # sync-token cursor; the SPA is served by the same process.
-    replicas = 1
-    strategy {
-      type = "Recreate"
-    }
-
-    selector {
-      match_labels = local.labels
-    }
-
-    template {
-      metadata {
-        labels = local.labels
-        annotations = {
-          # Prometheus scrapes the service-endpoints (annotations live on the
-          # Service below); the pod annotations here let the kubernetes-pods
-          # SD job also discover /metrics directly.
-          "prometheus.io/scrape" = "true"
-          "prometheus.io/path"   = "/metrics"
-          "prometheus.io/port"   = "8000"
-        }
-      }
-
-      spec {
-        image_pull_secrets {
-          name = "registry-credentials"
-        }
-
-        container {
-          name  = "tasks"
-          image = local.image
-
-          port {
-            container_port = 8000
-          }
-
-          # TASKS_FERNET_KEY via tasks-secrets; TASKS_DB_DSN via tasks-db-creds.
-          env_from {
-            secret_ref { name = "tasks-secrets" }
-          }
-          env_from {
-            secret_ref { name = "tasks-db-creds" }
-          }
-
-          # Wall-clock zone for all-day due dates (DUE;VALUE=DATE) and the
-          # Today/Scheduled smart views.
-          env {
-            name  = "TASKS_LOCAL_TZ"
-            value = "Europe/Sofia"
-          }
-          # SECURITY INVARIANT — DEV_USER must NEVER be set here. It is the
-          # dev-only identity fallback: when present the backend treats every
-          # request as that user, bypassing the Authentik forward-auth
-          # identity (X-authentik-username) entirely. Production identity
-          # comes ONLY from the header Traefik/Authentik injects.
-
-          readiness_probe {
-            http_get {
-              path = "/healthz"
-              port = 8000
-            }
-            initial_delay_seconds = 5
-            period_seconds        = 10
-          }
-          liveness_probe {
-            http_get {
-              path = "/healthz"
-              port = 8000
-            }
-            initial_delay_seconds = 30
-            period_seconds        = 30
-          }
-
-          resources {
-            requests = { cpu = "100m", memory = "384Mi" }
-            limits   = { memory = "384Mi" }
-          }
-        }
-      }
-    }
-  }
-
-  lifecycle {
-    ignore_changes = [
-      spec[0].template[0].spec[0].dns_config, # KYVERNO_LIFECYCLE_V1
-      metadata[0].annotations["keel.sh/policy"],
-      metadata[0].annotations["keel.sh/trigger"],
-      metadata[0].annotations["keel.sh/pollSchedule"], # KYVERNO_LIFECYCLE_V2
-      metadata[0].annotations["keel.sh/match-tag"],
-      spec[0].template[0].spec[0].container[0].image, # KEEL_IGNORE_IMAGE — Woodpecker deploy sets the running tag
-      metadata[0].annotations["kubernetes.io/change-cause"],
-      metadata[0].annotations["deployment.kubernetes.io/revision"],
-      spec[0].template[0].metadata[0].annotations["keel.sh/update-time"], # KEEL_LIFECYCLE_V1
-    ]
-  }
-
-  depends_on = [
-    kubernetes_manifest.external_secret,
-    kubernetes_manifest.db_external_secret,
-  ]
-}
-
-resource "kubernetes_service" "tasks" {
-  metadata {
-    name      = "tasks"
-    namespace = kubernetes_namespace.tasks.metadata[0].name
-    labels    = local.labels
-    annotations = {
-      # Prometheus kubernetes-service-endpoints SD scrapes /metrics here.
-      "prometheus.io/scrape" = "true"
-      "prometheus.io/path"   = "/metrics"
-      "prometheus.io/port"   = "8000"
-    }
-  }
-
-  spec {
-    type     = "ClusterIP"
-    selector = local.labels
-
-    port {
-      name        = "http"
-      port        = 8000
-      target_port = 8000
-    }
-  }
-}
-
-# Kyverno ClusterPolicy `sync-tls-secret` auto-clones the wildcard TLS
-# secret into every namespace, so we don't need a setup_tls_secret module.
-
-module "ingress" {
-  source = "../../modules/kubernetes/ingress_factory"
-  # auth = "required": Authentik forward-auth gates EVERY request — the app
-  # has no login of its own and blindly trusts the X-authentik-username
-  # header the outpost injects, so Authentik is the only thing standing
-  # between strangers and everyone's tasks. Do NOT relax this tier (tasks
-  # design decision #3; pairs with the NetworkPolicy below, SEC-1).
-  auth            = "required"
-  dns_type        = "proxied"
-  namespace       = kubernetes_namespace.tasks.metadata[0].name
-  name            = "tasks"
-  port            = 8000
-  tls_secret_name = var.tls_secret_name
-}
-
-# Carve-out for the PWA icon assets + web manifest. macOS Safari's
-# "Add to Dock" (and every other OS icon fetcher: iOS Add-to-Home-Screen,
-# Android install prompt) fetches these in a cookie-less context — behind
-# forward-auth it got the Authentik 302 and fell back to a letter monogram.
-# Traefik prioritises these longer path prefixes over the main "/" router,
-# so ONLY these five static files bypass Authentik; the SPA shell and /api
-# stay gated by the main ingress above (and the app itself 401s /api
-# without the identity header). Guarded against regression by the
-# tasks-icons entry in the Authentik walling-off probe
-# (stacks/monitoring/modules/monitoring/authentik_walloff_probe.tf).
-module "ingress_icons" {
-  source = "../../modules/kubernetes/ingress_factory"
-  # auth = "none": public static icons + manifest, no user data; required for
-  # OS icon fetchers (Safari Add-to-Dock etc.) that carry no session and
-  # cannot complete the Authentik redirect dance.
-  auth         = "none"
-  namespace    = kubernetes_namespace.tasks.metadata[0].name
-  name         = "tasks-icons"
-  service_name = kubernetes_service.tasks.metadata[0].name
-  port         = 8000
-  ingress_path = [
-    "/apple-touch-icon.png",
-    "/favicon.png",
-    "/pwa-192x192.png",
-    "/pwa-512x512.png",
-    "/manifest.webmanifest",
-  ]
-  full_host        = "tasks.viktorbarzin.me" # MUST match the main ingress host; otherwise the factory derives tasks-icons.viktorbarzin.me and the carve-out never matches.
-  dns_type         = "none"                  # host record already owned by the main tasks ingress
-  tls_secret_name  = var.tls_secret_name
-  anti_ai_scraping = false # Five static icons + a manifest; nothing for scrapers to mine.
-  homepage_enabled = false # path carve-out, not its own dashboard tile
-}
-
-# --- NetworkPolicy: scoped pod ingress (security-review finding SEC-1). ---
-# The app trusts X-authentik-username unconditionally, so its ENTIRE auth
-# model depends on requests only ever arriving through Traefik (where the
-# Authentik forward-auth middleware sets that header). Any pod that could
-# reach the pod IP directly could spoof the header and read/write anyone's
-# tasks — hence ingress is restricted to:
-#   - TCP/8000 from the traefik namespace (user traffic, post-forward-auth);
-#   - TCP/8000 from the monitoring namespace (Prometheus /metrics scrape).
-# The cluster has no default-deny, so this NP only takes effect inside the
-# tasks ns — pods elsewhere remain unaffected. (Same shape as
-# chrome-service's chrome-service-ws-ingress.)
-resource "kubernetes_network_policy_v1" "tasks_ingress" {
-  metadata {
-    name      = "tasks-ingress"
-    namespace = kubernetes_namespace.tasks.metadata[0].name
-  }
-  spec {
-    pod_selector {
-      match_labels = local.labels
-    }
-    policy_types = ["Ingress"]
-    ingress {
-      from {
-        namespace_selector {
-          match_labels = {
-            "kubernetes.io/metadata.name" = "traefik"
-          }
-        }
-      }
-      ports {
-        port     = "8000"
-        protocol = "TCP"
-      }
-    }
-    ingress {
-      from {
-        namespace_selector {
-          match_labels = {
-            "kubernetes.io/metadata.name" = "monitoring"
-          }
-        }
-      }
-      ports {
-        port     = "8000"
-        protocol = "TCP"
-      }
-    }
-  }
-}
--- a/stacks/tasks/terragrunt.hcl
+++ b/stacks/tasks/terragrunt.hcl
@ -1,23 +0,0 @@
-include "root" {
-  path = find_in_parent_folders()
-}
-
-dependency "platform" {
-  config_path  = "../platform"
-  skip_outputs = true
-}
-
-dependency "vault" {
-  config_path  = "../vault"
-  skip_outputs = true
-}
-
-dependency "external-secrets" {
-  config_path  = "../external-secrets"
-  skip_outputs = true
-}
-
-inputs = {
-  # Override per-deploy in CI / commit.
-  image_tag = "latest"
-}
--- a/stacks/technitium/modules/technitium/main.tf
+++ b/stacks/technitium/modules/technitium/main.tf
@ -873,14 +873,6 @@ resource "kubernetes_cluster_role" "ingress_dns_sync" {
    resources  = ["services"]
    verbs      = ["get", "list"]
  }
-  # Read the Valia-sites internal-DNS feed (written by stacks/valia-sites,
-  # ADR-0018) so the sync can reconcile off-infra Pages CNAMEs declaratively.
-  rule {
-    api_groups     = [""]
-    resources      = ["configmaps"]
-    resource_names = ["valia-sites-dns"]
-    verbs          = ["get"]
-  }
 }

 resource "kubernetes_cluster_role_binding" "ingress_dns_sync" {
@ -1010,42 +1002,6 @@ resource "kubernetes_cron_job_v1" "technitium_ingress_dns_sync" {
                  echo "mail-auth: MX present"
                fi

-                # Valia sites (ADR-0018) — off-infra Cloudflare Pages sites.
-                # The internal zone is authoritative (superset rule above), so
-                # these public-only names must exist here or every internal
-                # client NXDOMAINs on them. Reconciled DECLARATIVELY from the
-                # ConfigMap valia-sites-dns (written by stacks/valia-sites):
-                # ensure/update every entry, and DELETE stale records that
-                # left the map (site retired/renamed). Deletion is scoped to
-                # CNAMEs targeting *.pages.dev — nothing else is ever touched.
-                # Targets resolve upstream to CF edge IPs; no hairpin involved.
-                VALIA=$$(kubectl get configmap valia-sites-dns -n technitium -o go-template='{{range $$k, $$v := .data}}{{$$k}} {{$$v}}{{"\n"}}{{end}}' 2>/dev/null || true)
-                if [ -n "$$VALIA" ]; then
-                  printf '%s\n' "$$VALIA" | while read -r VNAME VTARGET; do
-                    [ -z "$$VNAME" ] && continue
-                    CUR=$$(curl -sf "$$TECH_API/api/zones/records/get?token=$$TOKEN&zone=$$ZONE&domain=$$VNAME.$$ZONE" | grep -o '"cname":"[^"]*"' | head -1 | cut -d'"' -f4)
-                    if [ "$$CUR" = "$$VTARGET" ]; then
-                      echo "valia: $$VNAME.$$ZONE ok"
-                      continue
-                    fi
-                    if [ -n "$$CUR" ]; then
-                      curl -sf -G "$$TECH_API/api/zones/records/delete" --data-urlencode "token=$$TOKEN" --data-urlencode "zone=$$ZONE" --data-urlencode "domain=$$VNAME.$$ZONE" --data-urlencode "type=CNAME" --data-urlencode "cname=$$CUR" > /dev/null || true
-                    fi
-                    R=$$(curl -sf -G "$$TECH_API/api/zones/records/add" --data-urlencode "token=$$TOKEN" --data-urlencode "zone=$$ZONE" --data-urlencode "domain=$$VNAME.$$ZONE" --data-urlencode "type=CNAME" --data-urlencode "cname=$$VTARGET" --data-urlencode "ttl=3600") || true
-                    echo "$$R" | grep -q '"status":"ok"' && echo "valia: set $$VNAME.$$ZONE -> $$VTARGET" || echo "valia: FAILED $$VNAME.$$ZONE -- $$R"
-                  done
-                  # Deletion pass: zone CNAMEs targeting *.pages.dev that are
-                  # no longer in the map. ZONE_DUMP predates this run's adds,
-                  # but just-set names are in $VALIA so they're never deleted.
-                  printf '%s' "$$ZONE_DUMP" | tr ',' '\n' | awk -F'"' '/"name":/{n=$$4} /"cname":/{print n" "$$4}' | grep '\.pages\.dev *$$' | while read -r RNAME RTARGET; do
-                    SHORT=$${RNAME%%.$$ZONE}
-                    printf '%s\n' "$$VALIA" | grep -q "^$$SHORT " && continue
-                    curl -sf -G "$$TECH_API/api/zones/records/delete" --data-urlencode "token=$$TOKEN" --data-urlencode "zone=$$ZONE" --data-urlencode "domain=$$RNAME" --data-urlencode "type=CNAME" --data-urlencode "cname=$$RTARGET" > /dev/null && echo "valia: removed stale $$RNAME -> $$RTARGET"
-                  done
-                else
-                  echo "valia: CM valia-sites-dns absent/unreadable -- skipping Pages CNAMEs this run"
-                fi
-
                # Pin the .lan ingress anchor A record to the LIVE Traefik LB IP.
                # *.viktorbarzin.lan ingress hosts CNAME to ingress.viktorbarzin.lan,
                # so a Traefik LB IP move that misses the .lan zone silently breaks
--- a/stacks/traefik/modules/traefik/middleware.tf
+++ b/stacks/traefik/modules/traefik/middleware.tf
@ -119,41 +119,6 @@ resource "kubernetes_manifest" "middleware_local_only" {
  depends_on = [helm_release.traefik]
 }

-# IP allowlist for household access across ALL home sites: Sofia LAN + the
-# WireGuard spoke LANs (London, Valchedrym) + 10/8 (VLANs, K8s pods/services,
-# WG tunnel IPs). Deliberately a SEPARATE middleware from `local-only` —
-# widening local-only would grant the remote LANs access to the admin surfaces
-# that use it (Prometheus, iDRAC, Loki, …). Use for family-facing services
-# (e.g. the immich-frame kiosks) that every household device may open but the
-# public internet must not. Pair with ingress_factory `dns_type = "internal"`:
-# a Cloudflare-proxied record would deliver public traffic from cloudflared
-# POD IPs (inside 10/8) and silently bypass this allowlist.
-resource "kubernetes_manifest" "middleware_home_lans_only" {
-  manifest = {
-    apiVersion = "traefik.io/v1alpha1"
-    kind       = "Middleware"
-    metadata = {
-      name      = "home-lans-only"
-      namespace = kubernetes_namespace.traefik.metadata[0].name
-    }
-    spec = {
-      ipAllowList = {
-        sourceRange = [
-          "192.168.1.0/24", # Sofia LAN (hub site)
-          "10.0.0.0/8",     # VLANs, K8s pod/svc CIDRs, WG tunnel subnet
-          "192.168.8.0/24", # London LAN (via WG tunnel)
-          "192.168.9.0/24", # London GUEST net — the Portal Plus actually leases here (Portal-75AE8F9C2A8A = 192.168.9.198)
-          "192.168.0.0/24", # Valchedrym LAN (via WG tunnel)
-          "fc00::/7",
-          "fe80::/10",
-        ]
-      }
-    }
-  }
-
-  depends_on = [helm_release.traefik]
-}
-
 # HTTPS redirect middleware
 resource "kubernetes_manifest" "middleware_redirect_https" {
  manifest = {
@ -403,33 +368,6 @@ resource "kubernetes_manifest" "middleware_authentik_rate_limit" {
  depends_on = [helm_release.traefik]
 }

-# Dawarich-specific rate limit. The Rails app serves all its fingerprinted
-# assets itself (JS/CSS chunks, SVG store badges, favicons, webmanifest) and
-# the map view adds a points/API burst on load — a single page load from one
-# client IP blows past the default 10/50 limiter and 429s the asset tail
-# (seventh instance of the burst pattern, after ha-sofia, ActualBudget, noVNC,
-# tripit, health and authentik). Background location ingestion (OwnTracks
-# bridge + mobile api_key POSTs) rides the same host, so 429s here also risk
-# dropped pings. Burst absorbs a couple of full page loads back-to-back.
-resource "kubernetes_manifest" "middleware_dawarich_rate_limit" {
-  manifest = {
-    apiVersion = "traefik.io/v1alpha1"
-    kind       = "Middleware"
-    metadata = {
-      name      = "dawarich-rate-limit"
-      namespace = kubernetes_namespace.traefik.metadata[0].name
-    }
-    spec = {
-      rateLimit = {
-        average = 100
-        burst   = 1000
-      }
-    }
-  }
-
-  depends_on = [helm_release.traefik]
-}
-
 # Compress responses to clients at the entrypoint level (outermost).
 # Applied at websecure entrypoint so all responses get compressed.
 # Uses includedContentTypes (whitelist) instead of excludedContentTypes:
--- a/stacks/tripit/main.tf
+++ b/stacks/tripit/main.tf
@ -175,12 +175,6 @@ locals {
    STORY_SOURCE_MODE        = "web"
    SCRIPT_WRITER_MODE       = "chat"
    PLACE_RESOLVER_MODE      = "wikipedia"
-    # Saved Place preview photos (tripit ADR-0035/0040): the Wikipedia lead-image
-    # fetcher behind manual-add-time photos and the backfill sweep. Same fake-
-    # default gap as the resolver above — never set, so prod silently ran the
-    # fake and hand-added places (and any backfill) would store placeholder
-    # PNGs instead of real photos.
-    PLACE_PHOTO_PROVIDER = "wikipedia"
  }
 }

--- a/stacks/valia-sites/main.tf
+++ b/stacks/valia-sites/main.tf
@ -1,368 +0,0 @@
-# Valia sites (ADR-0018): small static sites authored by Valia in Google Drive,
-# served OFF-INFRA on Cloudflare Pages, mirrored by the in-cluster CronJob below
-# every 10 minutes. Registering a new site = one entry in local.sites (plus
-# Valia sharing the folder with vbarzin@gmail.com). Full runbook:
-# docs/runbooks/valia-sites.md
-#
-# Per site this stack fans out:
-#   - cloudflare_pages_project + custom domain <name>.viktorbarzin.me
-#   - public proxied CNAME <name> -> <project>.pages.dev   (manage_dns gate)
-#   - internal split-horizon CNAME via ConfigMap valia-sites-dns consumed by
-#     the technitium-ingress-dns-sync script (declarative: add/update/REMOVE)
-#   - a slot in the shared sync CronJob (rclone mirror -> wrangler deploy)
-
-locals {
-  cloudflare_account_id = "02e035473cfc4834fb10c5d35470d8b4" # vbarzin@gmail.com's account (not a secret)
-
-  # THE site registry. Keys are the public subdomain (English, Viktor picks —
-  # CONTEXT.md "Valia site"). folder_id = the Drive folder Valia shared (the
-  # Content folder); src_path = subfolder holding servable files ("" = root);
-  # entry_file = what / must serve (staged as index.html at deploy time).
-  # manage_dns = false parks a site's public CNAME + internal record while the
-  # name is still owned elsewhere (used for the stem95su ingress cutover).
-  sites = {
-    bridge = {
-      folder_id  = "1YWwAtSTsJD9HOzckGRIFXigWqCgYSGEa" # "мост" — ОбУ „Отец Паисий“
-      src_path   = ""
-      entry_file = "index.html"
-      manage_dns = true
-    }
-    stem95su = {
-      folder_id  = "1cmOI2jRyBJdnrVPgbr4kx2cx_4DY6pm_" # "claude" — 95. СУ STEM board
-      src_path   = "stem claude/files"
-      entry_file = "stem_board.html"
-      manage_dns = true
-    }
-  }
-
-  dns_managed_sites = { for k, v in local.sites : k => v if v.manage_dns }
-}
-
-# ---------------------------------------------------------------------------
-# Cloudflare Pages: project + custom domain per site
-# ---------------------------------------------------------------------------
-
-resource "cloudflare_pages_project" "site" {
-  for_each          = local.sites
-  account_id        = local.cloudflare_account_id
-  name              = each.key
-  production_branch = "main"
-}
-
-# bridge was created by hand (wrangler) on 2026-07-03 — adopt, don't recreate.
-import {
-  to = cloudflare_pages_project.site["bridge"]
-  id = "02e035473cfc4834fb10c5d35470d8b4/bridge"
-}
-
-resource "cloudflare_pages_domain" "site" {
-  for_each     = local.sites
-  account_id   = local.cloudflare_account_id
-  project_name = cloudflare_pages_project.site[each.key].name
-  domain       = "${each.key}.viktorbarzin.me"
-}
-
-import {
-  to = cloudflare_pages_domain.site["bridge"]
-  id = "02e035473cfc4834fb10c5d35470d8b4/bridge/bridge.viktorbarzin.me"
-}
-
-# Public proxied CNAME. Gated on manage_dns: a site whose name is still served
-# by an in-cluster ingress keeps its ingress_factory record until cutover
-# (two records can't share one name).
-resource "cloudflare_record" "site" {
-  for_each = local.dns_managed_sites
-  zone_id  = var.cloudflare_zone_id
-  name     = each.key
-  content  = cloudflare_pages_project.site[each.key].subdomain
-  type     = "CNAME"
-  proxied  = true
-  ttl      = 1
-}
-
-# bridge's record predates this stack (created 2026-07-03 in stacks/cloudflared,
-# handed off via removed{} there) — adopt by id.
-import {
-  to = cloudflare_record.site["bridge"]
-  id = "fd2c5dd4efe8fe38958944e74d0ced6d/ff4fb6f4900744d4b22de50d3fdd219b"
-}
-
-# ---------------------------------------------------------------------------
-# Internal split-horizon DNS feed (docs/architecture/dns.md "superset rule"):
-# the technitium-ingress-dns-sync script reads this CM and reconciles internal
-# CNAMEs for every entry — including deleting stale *.pages.dev records when
-# an entry disappears (site retired/renamed).
-# ---------------------------------------------------------------------------
-
-resource "kubernetes_config_map" "valia_sites_dns" {
-  metadata {
-    name      = "valia-sites-dns"
-    namespace = "technitium"
-    labels    = { "app.kubernetes.io/managed-by" = "valia-sites" }
-  }
-  data = { for k, v in local.dns_managed_sites : k => cloudflare_pages_project.site[k].subdomain }
-}
-
-# ---------------------------------------------------------------------------
-# The shared sync CronJob
-# ---------------------------------------------------------------------------
-
-resource "kubernetes_namespace" "valia_sites" {
-  metadata {
-    name = "valia-sites"
-    labels = {
-      "istio-injection" : "disabled"
-      tier = local.tiers.aux
-    }
-  }
-  lifecycle {
-    # KYVERNO_LIFECYCLE_V1: goldilocks-vpa-auto-mode ClusterPolicy stamps this label on every namespace
-    ignore_changes = [metadata[0].labels["goldilocks.fairwinds.com/vpa-update-mode"]]
-  }
-}
-
-# Secrets: shared drive.readonly rclone conf + the SCOPED CF Pages token
-# (Pages Read/Write only — the Global API Key never enters a pod).
-resource "kubernetes_manifest" "sync_external_secret" {
-  field_manager {
-    force_conflicts = true
-  }
-  manifest = {
-    apiVersion = "external-secrets.io/v1"
-    kind       = "ExternalSecret"
-    metadata = {
-      name      = "valia-sites-sync"
-      namespace = kubernetes_namespace.valia_sites.metadata[0].name
-    }
-    spec = {
-      refreshInterval = "1h"
-      secretStoreRef = {
-        name = "vault-kv"
-        kind = "ClusterSecretStore"
-      }
-      target = { name = "valia-sites-sync" }
-      data = [
-        {
-          secretKey = "rclone.conf"
-          remoteRef = { key = "valia-sites", property = "rclone_conf" }
-        },
-        {
-          secretKey = "CLOUDFLARE_API_TOKEN"
-          remoteRef = { key = "valia-sites", property = "cloudflare_pages_token" }
-        },
-        {
-          secretKey = "CLOUDFLARE_ACCOUNT_ID"
-          remoteRef = { key = "valia-sites", property = "account_id" }
-        },
-      ]
-    }
-  }
-  depends_on = [kubernetes_namespace.valia_sites]
-}
-
-# Site registry rendered for the job (folder ids aren't secrets).
-resource "kubernetes_config_map" "sync_config" {
-  metadata {
-    name      = "valia-sites-config"
-    namespace = kubernetes_namespace.valia_sites.metadata[0].name
-  }
-  data = {
-    "sites.json" = jsonencode(local.sites)
-  }
-}
-
-# Last-deployed manifest hash per site — written by the job (merge-patch), so
-# TF must never fight it over data.
-resource "kubernetes_config_map" "sync_state" {
-  metadata {
-    name      = "valia-sites-state"
-    namespace = kubernetes_namespace.valia_sites.metadata[0].name
-  }
-  data = {}
-  lifecycle {
-    ignore_changes = [data]
-  }
-}
-
-resource "kubernetes_service_account" "sync" {
-  metadata {
-    name      = "valia-sites-sync"
-    namespace = kubernetes_namespace.valia_sites.metadata[0].name
-  }
-}
-
-resource "kubernetes_role" "sync_state" {
-  metadata {
-    name      = "valia-sites-sync-state"
-    namespace = kubernetes_namespace.valia_sites.metadata[0].name
-  }
-  rule {
-    api_groups     = [""]
-    resources      = ["configmaps"]
-    resource_names = ["valia-sites-state"]
-    verbs          = ["get", "patch"]
-  }
-}
-
-resource "kubernetes_role_binding" "sync_state" {
-  metadata {
-    name      = "valia-sites-sync-state"
-    namespace = kubernetes_namespace.valia_sites.metadata[0].name
-  }
-  role_ref {
-    api_group = "rbac.authorization.k8s.io"
-    kind      = "Role"
-    name      = kubernetes_role.sync_state.metadata[0].name
-  }
-  subject {
-    kind      = "ServiceAccount"
-    name      = kubernetes_service_account.sync.metadata[0].name
-    namespace = kubernetes_namespace.valia_sites.metadata[0].name
-  }
-}
-
-resource "kubernetes_cron_job_v1" "sync" {
-  metadata {
-    name      = "valia-sites-sync"
-    namespace = kubernetes_namespace.valia_sites.metadata[0].name
-    labels    = { app = "valia-sites", component = "sync" }
-  }
-  spec {
-    schedule                      = "*/10 * * * *"
-    concurrency_policy            = "Forbid"
-    successful_jobs_history_limit = 2
-    failed_jobs_history_limit     = 3
-    job_template {
-      metadata {}
-      spec {
-        backoff_limit              = 1
-        ttl_seconds_after_finished = 86400
-        template {
-          metadata { labels = { app = "valia-sites", component = "sync" } }
-          spec {
-            restart_policy       = "OnFailure"
-            service_account_name = kubernetes_service_account.sync.metadata[0].name
-            container {
-              name  = "sync"
-              image = "ghcr.io/viktorbarzin/valia-sites-sync:latest"
-              # Guards mirror stem95su's proven set: hard-fail on Drive
-              # list/auth errors (visible as a failed Job — the chosen
-              # visibility, ADR-0018), skip quietly when a folder is empty or
-              # missing its entry file (never wipe a live site), capped
-              # deletes. Deploy ONLY on remote-manifest change: CF Pages caps
-              # monthly deployments on the free tier, so 144 no-op
-              # deploys/day is not an option.
-              command = ["/bin/sh", "-c", <<-EOT
-                set -u
-                cp /config/rclone.conf /tmp/rc.conf
-                APISERVER="https://kubernetes.default.svc"
-                SA=/var/run/secrets/kubernetes.io/serviceaccount
-                KTOKEN=$$(cat $$SA/token); NS=$$(cat $$SA/namespace)
-                STATE_URL="$$APISERVER/api/v1/namespaces/$$NS/configmaps/valia-sites-state"
-                FAILED=0
-                for SITE in $$(jq -r 'keys[]' /sites/sites.json); do
-                  FOLDER=$$(jq -r --arg s "$$SITE" '.[$$s].folder_id' /sites/sites.json)
-                  SRC_PATH=$$(jq -r --arg s "$$SITE" '.[$$s].src_path' /sites/sites.json)
-                  ENTRY=$$(jq -r --arg s "$$SITE" '.[$$s].entry_file' /sites/sites.json)
-                  RC="rclone --config /tmp/rc.conf --drive-root-folder-id=$$FOLDER --drive-skip-gdocs"
-                  # 1. Remote manifest (path+size+hash) — metadata only, no download.
-                  MANIFEST=$$($$RC lsf "gdrive:$$SRC_PATH" -R --files-only --format phs 2>/tmp/lsf.err) || {
-                    echo "FATAL [$$SITE]: Drive list failed (auth/network):"; cat /tmp/lsf.err; FAILED=1; continue; }
-                  N=$$(printf '%s\n' "$$MANIFEST" | grep -c . || true)
-                  if [ "$$N" -lt 1 ] || ! printf '%s\n' "$$MANIFEST" | cut -d';' -f1 | grep -qx "$$ENTRY"; then
-                    echo "GUARD [$$SITE]: N=$$N / $$ENTRY missing -- skipping, site untouched"; continue
-                  fi
-                  # Cloudflare Pages hard-caps files at 25 MB — deploying
-                  # without an oversize file would silently break the pages
-                  # that reference it, so skip the whole site instead (last
-                  # deployed content keeps serving) and say so loudly.
-                  OVERSIZE=$$(printf '%s\n' "$$MANIFEST" | awk -F';' '$$3 > 26214400 {print $$1" ("$$3" B)"}')
-                  if [ -n "$$OVERSIZE" ]; then
-                    echo "GUARD [$$SITE]: file(s) exceed the 25MB Pages limit -- skipping, site untouched:"; echo "$$OVERSIZE"; continue
-                  fi
-                  HASH=$$(printf '%s' "$$MANIFEST" | sha256sum | cut -d' ' -f1)
-                  LAST=$$(curl -sf --cacert $$SA/ca.crt -H "Authorization: Bearer $$KTOKEN" "$$STATE_URL" | jq -r --arg s "$$SITE" '.data[$$s] // ""')
-                  if [ "$$HASH" = "$$LAST" ]; then echo "OK [$$SITE]: unchanged"; continue; fi
-                  # 2. Content changed — pull and deploy.
-                  $$RC sync "gdrive:$$SRC_PATH" "/work/$$SITE" --exclude ".DS_Store" --fast-list --transfers 4 --max-delete 25 -v || {
-                    echo "FATAL [$$SITE]: rclone sync failed"; FAILED=1; continue; }
-                  if [ "$$ENTRY" != "index.html" ]; then
-                    cp "/work/$$SITE/$$ENTRY" "/work/$$SITE/index.html"
-                  fi
-                  wrangler pages deploy "/work/$$SITE" --project-name="$$SITE" --branch=main --commit-dirty=true || {
-                    echo "FATAL [$$SITE]: wrangler deploy failed"; FAILED=1; continue; }
-                  curl -sf --cacert $$SA/ca.crt -H "Authorization: Bearer $$KTOKEN" \
-                    -X PATCH -H "Content-Type: application/merge-patch+json" \
-                    -d "{\"data\":{\"$$SITE\":\"$$HASH\"}}" "$$STATE_URL" > /dev/null || {
-                    echo "WARN [$$SITE]: state patch failed (will redeploy next run)"; FAILED=1; }
-                  echo "DEPLOYED [$$SITE]: $$HASH"
-                done
-                exit $$FAILED
-              EOT
-              ]
-              env {
-                name = "CLOUDFLARE_API_TOKEN"
-                value_from {
-                  secret_key_ref {
-                    name = "valia-sites-sync"
-                    key  = "CLOUDFLARE_API_TOKEN"
-                  }
-                }
-              }
-              env {
-                name = "CLOUDFLARE_ACCOUNT_ID"
-                value_from {
-                  secret_key_ref {
-                    name = "valia-sites-sync"
-                    key  = "CLOUDFLARE_ACCOUNT_ID"
-                  }
-                }
-              }
-              resources {
-                requests = { cpu = "25m", memory = "128Mi" }
-                limits   = { memory = "512Mi" }
-              }
-              volume_mount {
-                name       = "rclone-config"
-                mount_path = "/config"
-                read_only  = true
-              }
-              volume_mount {
-                name       = "sites-config"
-                mount_path = "/sites"
-                read_only  = true
-              }
-              volume_mount {
-                name       = "work"
-                mount_path = "/work"
-              }
-            }
-            volume {
-              name = "rclone-config"
-              secret {
-                secret_name = "valia-sites-sync"
-                items {
-                  key  = "rclone.conf"
-                  path = "rclone.conf"
-                }
-              }
-            }
-            volume {
-              name = "sites-config"
-              config_map { name = kubernetes_config_map.sync_config.metadata[0].name }
-            }
-            volume {
-              name = "work"
-              empty_dir {}
-            }
-          }
-        }
-      }
-    }
-  }
-  lifecycle {
-    # KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
-    ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config]
-  }
-  depends_on = [kubernetes_manifest.sync_external_secret]
-}
--- a/stacks/valia-sites/sync-image/Dockerfile
+++ b/stacks/valia-sites/sync-image/Dockerfile
@ -1,15 +0,0 @@
-# valia-sites-sync: everything the 10-min Content-folder mirror needs, baked in
-# (no runtime installs — CronJob pods must not apk/npm on every start).
-# rclone pinned to match the proven stem95su version; wrangler pinned to major 4.
-FROM node:22-alpine
-
-RUN apk add --no-cache curl unzip ca-certificates jq \
-    && curl -fsSL https://downloads.rclone.org/v1.74.3/rclone-v1.74.3-linux-amd64.zip -o /tmp/rclone.zip \
-    && unzip -j /tmp/rclone.zip '*/rclone' -d /usr/local/bin \
-    && chmod +x /usr/local/bin/rclone \
-    && rm /tmp/rclone.zip \
-    && npm install -g wrangler@4 \
-    && npm cache clean --force
-
-# wrangler writes config/cache under $HOME; the CronJob runs as non-root node (uid 1000)
-ENV HOME=/tmp
--- a/stacks/valia-sites/terragrunt.hcl
+++ b/stacks/valia-sites/terragrunt.hcl
@ -1,8 +0,0 @@
-include "root" {
-  path = find_in_parent_folders()
-}
-
-dependency "platform" {
-  config_path  = "../platform"
-  skip_outputs = true
-}
--- a/stacks/valia-sites/variables.tf
+++ b/stacks/valia-sites/variables.tf
@ -1,3 +0,0 @@
-variable "cloudflare_zone_id" {
-  type = string
-}
--- a/stacks/vault/main.tf
+++ b/stacks/vault/main.tf
@ -675,7 +675,6 @@ resource "vault_database_secret_backend_connection" "postgresql" {
    "pg-nextcloud-todos",
    "pg-technitium",
    "pg-goldmane-edges",
-    "pg-tasks",
  ]

  postgresql {
@ -904,17 +903,6 @@ resource "vault_database_secret_backend_static_role" "pg_goldmane_edges" {
  rotation_period = 604800
 }

-# tasks PWA (Reminders-style front-end over Nextcloud CalDAV) — 7-day rotation
-# for the `tasks` CNPG role. Consumed by stacks/tasks via a vault-database
-# ExternalSecret -> TASKS_DB_DSN (remoteRef static-creds/pg-tasks).
-resource "vault_database_secret_backend_static_role" "pg_tasks" {
-  backend         = vault_mount.database.path
-  db_name         = vault_database_secret_backend_connection.postgresql.name
-  name            = "pg-tasks"
-  username        = "tasks"
-  rotation_period = 604800
-}
-
 # =============================================================================
 # Kubernetes Secrets Engine — Dynamic K8s Credentials
 # =============================================================================
--- a/state/stacks/vault/terraform.tfstate.enc
+++ b/state/stacks/vault/terraform.tfstate.enc