Compare commits: master...emo/frame- (1 commit)
| Author | SHA1 | Date | |
|---|---|---|---|
| | 43a5d2cc27 | | |
78 changed files with 2815 additions and 9164 deletions
File diff suppressed because one or more lines are too long
|
|
@ -81,7 +81,7 @@
|
||||||
| ytdlp | YouTube downloader | ytdlp |
|
| ytdlp | YouTube downloader | ytdlp |
|
||||||
| wealthfolio | Finance tracking | wealthfolio |
|
| wealthfolio | Finance tracking | wealthfolio |
|
||||||
| audiobookshelf | Audiobook server (may be merged into ebooks stack) | audiobookshelf |
|
| audiobookshelf | Audiobook server (may be merged into ebooks stack) | audiobookshelf |
|
||||||
| paperless-ngx | Document management. Mail ingest: forward document emails to `docs@viktorbarzin.me` — sender maps 1:1 to a paperless account (runbook `paperless-mail-ingest.md`) | paperless-ngx |
|
| paperless-ngx | Document management | paperless-ngx |
|
||||||
| jsoncrack | JSON visualizer | jsoncrack |
|
| jsoncrack | JSON visualizer | jsoncrack |
|
||||||
| servarr | Media automation (Sonarr/Radarr/etc) | servarr |
|
| servarr | Media automation (Sonarr/Radarr/etc) | servarr |
|
||||||
| aiostreams | Stremio stream aggregator (Real-Debrid + Torrentio/Comet/StremThru Torz/Knaben; **MediaFusion removed 2026-06-07** — broken upstream `500`). `auth=app` (own UUID+password); stream-probe tests **both series+movie paths** with per-source breakdown (`aiostreams_streams_{comet,torrentio,stremthru_torz,knaben}`) + `aiostreams_error_streams` + `aiostreams_movie_stream_count`, success gated on Comet (workhorse) being alive; weekly NFS config + Stremio-account-collection backups to `/srv/nfs/aiostreams-backup/`. PG-backed user config (Comet timeout bumped 5s→10s 2026-06-07). | servarr/aiostreams |
|
| aiostreams | Stremio stream aggregator (Real-Debrid + Torrentio/Comet/StremThru Torz/Knaben; **MediaFusion removed 2026-06-07** — broken upstream `500`). `auth=app` (own UUID+password); stream-probe tests **both series+movie paths** with per-source breakdown (`aiostreams_streams_{comet,torrentio,stremthru_torz,knaben}`) + `aiostreams_error_streams` + `aiostreams_movie_stream_count`, success gated on Comet (workhorse) being alive; weekly NFS config + Stremio-account-collection backups to `/srv/nfs/aiostreams-backup/`. PG-backed user config (Comet timeout bumped 5s→10s 2026-06-07). | servarr/aiostreams |
|
||||||
|
|
@ -99,7 +99,6 @@
|
||||||
| tor-proxy | Tor proxy | tor-proxy |
|
| tor-proxy | Tor proxy | tor-proxy |
|
||||||
| forgejo | Git forge. Open native self-signup (Turnstile captcha + email confirm) + Authentik & GitHub OAuth sign-in; see `docs/runbooks/forgejo-open-signups.md` | forgejo |
|
| forgejo | Git forge. Open native self-signup (Turnstile captcha + email confirm) + Authentik & GitHub OAuth sign-in; see `docs/runbooks/forgejo-open-signups.md` | forgejo |
|
||||||
| freshrss | RSS reader | freshrss |
|
| freshrss | RSS reader | freshrss |
|
||||||
| drone-logbook | DJI flight-log analyzer (Open DroneLog, upstream image) — dronelog.viktorbarzin.me | drone-logbook |
|
|
||||||
| navidrome | Music streaming | navidrome |
|
| navidrome | Music streaming | navidrome |
|
||||||
| networking-toolbox | Network tools | networking-toolbox |
|
| networking-toolbox | Network tools | networking-toolbox |
|
||||||
| stirling-pdf | PDF tools | stirling-pdf |
|
| stirling-pdf | PDF tools | stirling-pdf |
|
||||||
|
|
@ -121,9 +120,7 @@
|
||||||
| status-page | Status page | status-page |
|
| status-page | Status page | status-page |
|
||||||
| plotting-book | Book plotting/world-building app | plotting-book |
|
| plotting-book | Book plotting/world-building app | plotting-book |
|
||||||
| tripit | Self-hosted TripIt-clone travel-itinerary PWA (FastAPI + SvelteKit SPA, same-origin). CNPG (`tripit` db, Vault static role `pg-tripit`) + RWX NFS trip-doc vault (`/srv/nfs/tripit-documents`) + RWO `proxmox-lvm-encrypted` personal-document vault `tripit-personal-documents` (passports/IDs — AES-256-GCM app-layer envelope, master key `DOCUMENT_ENCRYPTION_KEY` in `secret/tripit`). `auth=required` (Authentik forward-auth, reads `X-authentik-email`); second `auth=none` ingress on `/api/calendar` for HMAC-token-gated `.ics` feed. Email-ingest CronJob `tripit-ingest-plans` (`*/15`) is the SOLE inbound path — forward a booking to plans@viktorbarzin.me (catch-all → spam@), polled read-only and routed ONLY to a registered user / verified linked address (no default-owner fallback; strangers ignored), parsed by local LLM (`qwen3vl-4b`), and the sender is emailed the outcome (Added to trip / Couldn't import). Plus `tripit-poll-flights`, `tripit-run-reminders`, `tripit-transport-nudge`, `tripit-weather-brief`. (The old Gmail-scrape `tripit-ingest-mail` CronJob was removed 2026-06-05.) App secrets in Vault `secret/tripit`. | tripit |
|
| tripit | Self-hosted TripIt-clone travel-itinerary PWA (FastAPI + SvelteKit SPA, same-origin). CNPG (`tripit` db, Vault static role `pg-tripit`) + RWX NFS trip-doc vault (`/srv/nfs/tripit-documents`) + RWO `proxmox-lvm-encrypted` personal-document vault `tripit-personal-documents` (passports/IDs — AES-256-GCM app-layer envelope, master key `DOCUMENT_ENCRYPTION_KEY` in `secret/tripit`). `auth=required` (Authentik forward-auth, reads `X-authentik-email`); second `auth=none` ingress on `/api/calendar` for HMAC-token-gated `.ics` feed. Email-ingest CronJob `tripit-ingest-plans` (`*/15`) is the SOLE inbound path — forward a booking to plans@viktorbarzin.me (catch-all → spam@), polled read-only and routed ONLY to a registered user / verified linked address (no default-owner fallback; strangers ignored), parsed by local LLM (`qwen3vl-4b`), and the sender is emailed the outcome (Added to trip / Couldn't import). Plus `tripit-poll-flights`, `tripit-run-reminders`, `tripit-transport-nudge`, `tripit-weather-brief`. (The old Gmail-scrape `tripit-ingest-mail` CronJob was removed 2026-06-05.) App secrets in Vault `secret/tripit`. | tripit |
|
||||||
| tasks | Reminders-style tasks PWA over Nextcloud CalDAV (FastAPI + SvelteKit SPA same-origin, single container; code `~/code/tasks`, design `tasks/docs/2026-07-03-tasks-pwa-design.md`). Nextcloud stays the source of truth (VTODOs); the app is the front-end Apple Reminders stopped being. CNPG (`tasks` db, Vault static role `pg-tasks`) stores Connected Accounts — per-user Nextcloud app passwords Fernet-encrypted with `fernet_key` from `secret/tasks`. `auth=required` (Authentik forward-auth; identity = `X-authentik-username`, NO app-level login — `DEV_USER` must never be set in prod) at tasks.viktorbarzin.me (proxied). Exception: the five PWA icon/manifest files (`/apple-touch-icon.png`, `/favicon.png`, `/pwa-192x192.png`, `/pwa-512x512.png`, `/manifest.webmanifest`) are a path-scoped `auth=none` carve-out (`module.ingress_icons`) so cookie-less OS icon fetchers (macOS Safari Add-to-Dock, mobile home-screen installs) get the real icon instead of the Authentik 302; guarded by the `tasks-icons` walloff-probe target. NetworkPolicy `tasks-ingress` (SEC-1) restricts pod ingress to traefik + monitoring namespaces so the trusted header can't be spoofed pod-to-pod. GHA → public ghcr `tasks` → Woodpecker deploy (ADR-0002). | tasks |
|
| stem95su | STEM educational platform for **95. СУ „Проф. Иван Шишманов"** (Sofia school) at stem95su.viktorbarzin.me. Public **open** static site (`auth=none` — CrowdSec + ai-bot-block, no login). Stock `nginx:1.28-alpine` serving content **straight off PVE host NFS** `/srv/nfs/stem-site` (RWX `nfs_volume`, mounted read-only) — **NOT** image-baked, so the externally-authored (Gemini-exported) HTML/media updates with no rebuild; auto-backed-up offsite by `nfs-mirror`. **Content source = Google Drive folder "claude"** (id `1cmOI2jRyBJdnrVPgbr4kx2cx_4DY6pm_`, shared Valentina→vbarzin@gmail.com). **Deploy = scheduled mirror** (since 2026-06-09, reversed the earlier on-demand-only call once content went active): CronJob `stem95su-gdrive-sync` (`*/10`, `stacks/stem95su/gdrive-sync.tf`) mounts the content PVC RW and `rclone sync`s the Drive folder onto it (`docker.io/rclone/rclone:1.74.3`, `scope=drive.readonly` — Drive is READ-ONLY; empty-source guard + `--max-delete 25` so a partial listing can't wipe the site). rclone creds (OAuth refresh-token) in Vault `secret/stem95su` (`rclone_conf`) → ESO secret `stem95su-rclone`. **Requires the GCP OAuth app (project home-lab-1700868541205) published to "Production"** or the refresh token expires ~weekly (re-mint + `vault kv put secret/stem95su rclone_conf=…` after publishing); a dead token surfaces as a failed Job. Manual on-demand sync still possible (throwaway rclone container from devvm; recipe in claude-memory). Nextcloud "PVE NFS Pool"/rsync is a manual fallback. Dashboard `stem_board.html` served at `/` via a small nginx ConfigMap (`index`). No DB, no in-cluster secrets. Reference impl for the NFS-backed static-site pattern (see patterns.md). | stem95su |
|
||||||
| stem95su | STEM educational platform for **95. СУ „Проф. Иван Шишманов"** (Sofia school) at stem95su.viktorbarzin.me — **a Valia site on Cloudflare Pages since 2026-07-03** (ADR-0018): registry entry in `stacks/valia-sites`, synced from Drive folder "claude" every 10 min, deploy-on-change. The old in-cluster stack (nginx off PVE NFS + per-site rclone CronJob) is RETIRED — stacks/stem95su is a tombstone; `secret/stem95su` superseded by `secret/valia-sites`; `stem_video.mp4` was compressed 42.9→21.4MB (25MB Pages cap) with Viktor's OK. See docs/runbooks/valia-sites.md. | — |
|
|
||||||
| valia-sites | **Valia-site registry + sync** (ADR-0018): all sites authored by Valia serve OFF-INFRA on Cloudflare Pages (`bridge` + `stem95su` live). One map entry in `stacks/valia-sites/main.tf` per site fans out Pages project + custom domain + public CNAME + internal split-horizon CNAME (ConfigMap `valia-sites-dns` → technitium sync, declarative incl. removal). CronJob `valia-sites-sync` (`*/10`, image ghcr `valia-sites-sync`) mirrors each Drive Content folder (rclone `drive.readonly`, stem95su-style guards + 25MB Pages-cap guard) and wrangler-deploys ONLY on manifest change (free-tier deploy cap). Secrets `secret/valia-sites` (shared rclone conf + SCOPED CF Pages token — Global API Key never in pods). Failed-Job-only visibility by choice. Runbook: docs/runbooks/valia-sites.md. | valia-sites |
|
|
||||||
| trek | **TRIAL (2026-06-05)** — self-hosted group-trip planner (upstream [TREK](https://github.com/mauriceboe/TREK), `mauriceboe/trek:3.0.22`, AGPL-3.0). Solo evaluation behind Authentik forward-auth (`auth=required`) before deciding build-vs-adopt; covers collaborative trip planning + accommodation records + activities + per-person budget splitting on free OpenStreetMap (no paid maps key). SQLite + uploads on `proxmox-lvm-encrypted` (`trek-data-encrypted` 2Gi, `trek-uploads-encrypted` 5Gi). For the trial only: `ENCRYPTION_KEY` is TREK-auto-generated onto the data PVC and the bootstrap admin (`admin@trek.local`) is printed to pod logs — NO Vault/ESO wiring (graduation TODO: move key to `secret/trek` + ESO, add an app-level SQLite backup CronJob since host file-backup can't read the LUKS PVC, wire TREK↔Authentik OIDC). Pinned image, TF-managed (no CI/Keel). Availability-poll companion (Rallly) deferred. Teardown: `tg destroy` in `stacks/trek`. | trek |
|
| trek | **TRIAL (2026-06-05)** — self-hosted group-trip planner (upstream [TREK](https://github.com/mauriceboe/TREK), `mauriceboe/trek:3.0.22`, AGPL-3.0). Solo evaluation behind Authentik forward-auth (`auth=required`) before deciding build-vs-adopt; covers collaborative trip planning + accommodation records + activities + per-person budget splitting on free OpenStreetMap (no paid maps key). SQLite + uploads on `proxmox-lvm-encrypted` (`trek-data-encrypted` 2Gi, `trek-uploads-encrypted` 5Gi). For the trial only: `ENCRYPTION_KEY` is TREK-auto-generated onto the data PVC and the bootstrap admin (`admin@trek.local`) is printed to pod logs — NO Vault/ESO wiring (graduation TODO: move key to `secret/trek` + ESO, add an app-level SQLite backup CronJob since host file-backup can't read the LUKS PVC, wire TREK↔Authentik OIDC). Pinned image, TF-managed (no CI/Keel). Availability-poll companion (Rallly) deferred. Teardown: `tg destroy` in `stacks/trek`. | trek |
|
||||||
|
|
||||||
## Cloudflare Domains
|
## Cloudflare Domains
|
||||||
|
|
@ -133,7 +130,7 @@
|
||||||
blog, hackmd, privatebin, url, echo, f1tv, excalidraw, send,
|
blog, hackmd, privatebin, url, echo, f1tv, excalidraw, send,
|
||||||
audiobookshelf, jsoncrack, ntfy, cyberchef, homepage, linkwarden,
|
audiobookshelf, jsoncrack, ntfy, cyberchef, homepage, linkwarden,
|
||||||
changedetection, tandoor, n8n, stirling-pdf, dashy, city-guesser,
|
changedetection, tandoor, n8n, stirling-pdf, dashy, city-guesser,
|
||||||
travel, netbox, phpipam, tripit, t3, stem95su, tasks
|
travel, netbox, phpipam, tripit, t3, stem95su
|
||||||
```
|
```
|
||||||
|
|
||||||
### Non-Proxied (Direct DNS)
|
### Non-Proxied (Direct DNS)
|
||||||
|
|
|
||||||
42
.github/workflows/build-excalidraw.yml
vendored
|
|
@ -1,42 +0,0 @@
|
||||||
name: Build excalidraw-library
|
|
||||||
|
|
||||||
# ADR-0002 / no-local-builds: excalidraw-library (infra-owned Go app behind
|
|
||||||
# draw.viktorbarzin.me) builds off-infra on GHA → private ghcr; Keel polls
|
|
||||||
# ghcr:latest and rolls the deployment. Replaces the manual DockerHub pushes
|
|
||||||
# (viktorbarzin/excalidraw-library:v4 stays frozen as the rollback image).
|
|
||||||
on:
|
|
||||||
push:
|
|
||||||
branches: [master]
|
|
||||||
paths:
|
|
||||||
- 'stacks/excalidraw/project/**'
|
|
||||||
workflow_dispatch: {}
|
|
||||||
|
|
||||||
permissions:
|
|
||||||
contents: read
|
|
||||||
packages: write
|
|
||||||
|
|
||||||
jobs:
|
|
||||||
build:
|
|
||||||
runs-on: ubuntu-latest
|
|
||||||
steps:
|
|
||||||
- uses: actions/checkout@v4
|
|
||||||
- uses: actions/setup-go@v5
|
|
||||||
with:
|
|
||||||
go-version: '1.21'
|
|
||||||
- run: go test ./...
|
|
||||||
working-directory: stacks/excalidraw/project
|
|
||||||
- uses: docker/setup-buildx-action@v3
|
|
||||||
- uses: docker/login-action@v3
|
|
||||||
with:
|
|
||||||
registry: ghcr.io
|
|
||||||
username: ${{ github.actor }}
|
|
||||||
password: ${{ secrets.GITHUB_TOKEN }}
|
|
||||||
- uses: docker/build-push-action@v6
|
|
||||||
with:
|
|
||||||
context: stacks/excalidraw/project
|
|
||||||
platforms: linux/amd64
|
|
||||||
provenance: false
|
|
||||||
push: true
|
|
||||||
tags: |
|
|
||||||
ghcr.io/viktorbarzin/excalidraw-library:latest
|
|
||||||
ghcr.io/viktorbarzin/excalidraw-library:${{ github.sha }}
|
|
||||||
39
.github/workflows/build-valia-sites-sync.yml
vendored
|
|
@ -1,39 +0,0 @@
|
||||||
name: Build valia-sites-sync
|
|
||||||
|
|
||||||
# ADR-0002 + ADR-0018: infra-owned image built off-infra on GHA → ghcr (public).
|
|
||||||
# Rclone + wrangler runner for the Valia-sites Content-folder mirror CronJob.
|
|
||||||
# Rebuilds are rare (tool pins only change deliberately) → dispatch + path.
|
|
||||||
# Security note: no untrusted event inputs are interpolated anywhere (only
|
|
||||||
# github.actor / github.sha / GITHUB_TOKEN — same shape as the other
|
|
||||||
# build-*.yml workflows in this repo).
|
|
||||||
on:
|
|
||||||
push:
|
|
||||||
branches: [master]
|
|
||||||
paths:
|
|
||||||
- 'stacks/valia-sites/sync-image/**'
|
|
||||||
workflow_dispatch: {}
|
|
||||||
|
|
||||||
permissions:
|
|
||||||
contents: read
|
|
||||||
packages: write
|
|
||||||
|
|
||||||
jobs:
|
|
||||||
build:
|
|
||||||
runs-on: ubuntu-latest
|
|
||||||
steps:
|
|
||||||
- uses: actions/checkout@v4
|
|
||||||
- uses: docker/setup-buildx-action@v3
|
|
||||||
- uses: docker/login-action@v3
|
|
||||||
with:
|
|
||||||
registry: ghcr.io
|
|
||||||
username: ${{ github.actor }}
|
|
||||||
password: ${{ secrets.GITHUB_TOKEN }}
|
|
||||||
- uses: docker/build-push-action@v6
|
|
||||||
with:
|
|
||||||
context: stacks/valia-sites/sync-image
|
|
||||||
platforms: linux/amd64
|
|
||||||
provenance: false
|
|
||||||
push: true
|
|
||||||
tags: |
|
|
||||||
ghcr.io/viktorbarzin/valia-sites-sync:latest
|
|
||||||
ghcr.io/viktorbarzin/valia-sites-sync:${{ github.sha }}
|
|
||||||
|
|
@ -95,7 +95,7 @@ Terragrunt-based homelab managing a Kubernetes cluster (5 nodes, v1.34.2) on Pro
|
||||||
## Key Paths
|
## Key Paths
|
||||||
- `stacks/<service>/main.tf` — service definition
|
- `stacks/<service>/main.tf` — service definition
|
||||||
- `stacks/platform/modules/<service>/` — core infra modules
|
- `stacks/platform/modules/<service>/` — core infra modules
|
||||||
- `modules/kubernetes/ingress_factory/` — standardized ingress with auth, rate limiting, anti-AI, and auto Cloudflare DNS (`dns_type = "proxied"`, `"non-proxied"`, or `"internal"` — a public A record carrying the internal Traefik LB IP for household-only services; pair with the `home-lans-only` ipAllowList middleware, never with `"proxied"`)
|
- `modules/kubernetes/ingress_factory/` — standardized ingress with auth, rate limiting, anti-AI, and auto Cloudflare DNS (`dns_type = "proxied"` or `"non-proxied"`)
|
||||||
- `modules/kubernetes/nfs_volume/` — NFS volume module (CSI-backed, soft mount)
|
- `modules/kubernetes/nfs_volume/` — NFS volume module (CSI-backed, soft mount)
|
||||||
- `config.tfvars` — non-secret configuration (plaintext)
|
- `config.tfvars` — non-secret configuration (plaintext)
|
||||||
- `secrets.sops.json` — all secrets (SOPS-encrypted JSON)
|
- `secrets.sops.json` — all secrets (SOPS-encrypted JSON)
|
||||||
|
|
|
||||||
23
CONTEXT.md
|
|
@ -118,14 +118,6 @@ _Avoid_: "external", "outside".
|
||||||
`viktorbarzin.lan`, served by Technitium DNS. Resolves only inside the homelab network.
|
`viktorbarzin.lan`, served by Technitium DNS. Resolves only inside the homelab network.
|
||||||
_Avoid_: bare "lan", "private", "intranet".
|
_Avoid_: bare "lan", "private", "intranet".
|
||||||
|
|
||||||
**Segment**:
|
|
||||||
One isolated L2/L3 network with pfSense as its gateway — realised as a Proxmox-bridge-level tag feeding one dedicated untagged pfSense interface (dManagementsVms 10.0.10.0/24 = vmbr1 tag 10, dKubernetes 10.0.20.0/24 = vmbr1 tag 20, dCCTV 10.0.30.0/24 = vmbr0 tag 30). pfSense itself never terminates 802.1Q.
|
|
||||||
_Avoid_: "VLAN" as the primary name (the tags 10/20/30 are transport detail; the Segment is the concept).
|
|
||||||
|
|
||||||
**CCTV segment**:
|
|
||||||
The untrusted camera **Segment** (`dCCTV`) — devices in it may be pulled from (RTSP/ISAPI) but may initiate nothing except NTP to their gateway. Deliberately outside every trusted source-IP allowlist (ADR-0017).
|
|
||||||
_Avoid_: "camera VLAN", "CCTV LAN".
|
|
||||||
|
|
||||||
**Ingress auth**:
|
**Ingress auth**:
|
||||||
The `auth = "..."` parameter on `ingress_factory` — a discrete *mode*, not a ranked tier — one of `required` (Authentik forward-auth gates every request), `app` (the backend owns its login), `public` (anonymous Authentik binding for audit only), or `none` (Anubis-fronted content, or native-client API). Default `required` (fail-closed).
|
The `auth = "..."` parameter on `ingress_factory` — a discrete *mode*, not a ranked tier — one of `required` (Authentik forward-auth gates every request), `app` (the backend owns its login), `public` (anonymous Authentik binding for audit only), or `none` (Anubis-fronted content, or native-client API). Default `required` (fail-closed).
|
||||||
_Avoid_: "auth tier" / "auth mode" — refer to it by the canonical key, `auth` (e.g. `auth = "required"`). "tier" is reserved for State tier and Namespace tier.
|
_Avoid_: "auth tier" / "auth mode" — refer to it by the canonical key, `auth` (e.g. `auth = "required"`). "tier" is reserved for State tier and Namespace tier.
|
||||||
|
|
@ -237,20 +229,6 @@ _Avoid_: expecting Diun to deploy; conflating with **Keel**.
|
||||||
**Anubis**:
|
**Anubis**:
|
||||||
A PoW reverse-proxy issuing a 30-day JWT cookie, used in front of public content-bearing sites without app-level auth (blog, wiki, landing pages). Never in front of Git, WebDAV, CalDAV, or API endpoints (clients can't solve PoW).
|
A PoW reverse-proxy issuing a 30-day JWT cookie, used in front of public content-bearing sites without app-level auth (blog, wiki, landing pages). Never in front of Git, WebDAV, CalDAV, or API endpoints (clients can't solve PoW).
|
||||||
|
|
||||||
### Externally-authored sites
|
|
||||||
|
|
||||||
**Valia site**:
|
|
||||||
A small public static site authored by Valia (Viktor's mother, external to the infra) and hosted for her under `<name>.viktorbarzin.me`. Its source of truth is a **Content folder** she owns; the live site is a mirror of that folder, fresh within ~10 minutes. Hosted **off-infra** (Cloudflare Pages) by decision: a homelab outage freezes content but never takes her sites down. Viktor picks the English subdomain name per site at registration (her folder names stay Bulgarian). Current instances: `stem95su`, `bridge`.
|
|
||||||
_Avoid_: "school site" (the family may grow beyond school projects); treating the deployed copy as editable — edits land only in the **Content folder**.
|
|
||||||
|
|
||||||
**Content folder**:
|
|
||||||
The Google Drive folder (or subfolder) Valia shares with `vbarzin@gmail.com` holding one **Valia site**'s files. Strictly read-only from the infra side — nothing ever writes back to her Drive. Empty or half-uploaded folder states must never wipe a live site.
|
|
||||||
_Avoid_: syncing a folder root when the servable content lives in a subfolder (stem95su serves `stem claude/files/`, not the folder root).
|
|
||||||
|
|
||||||
**Entry file**:
|
|
||||||
The HTML file a **Valia site** serves at `/`. Defaults to `index.html`; per-site override when she names it differently (stem95su: `stem_board.html`). The override is a registration-time setting, not a constraint on her authoring.
|
|
||||||
_Avoid_: asking Valia to rename her files to fit hosting conventions.
|
|
||||||
|
|
||||||
## Relationships
|
## Relationships
|
||||||
|
|
||||||
- A **Service** is defined by exactly one **Stack** — **flat** or wrapping a **Stack-local module** — which sources zero or more shared **Factory modules** and resolves to one or more K8s workloads.
|
- A **Service** is defined by exactly one **Stack** — **flat** or wrapping a **Stack-local module** — which sources zero or more shared **Factory modules** and resolves to one or more K8s workloads.
|
||||||
|
|
@ -262,7 +240,6 @@ _Avoid_: asking Valia to rename her files to fit hosting conventions.
|
||||||
- A **Service**'s image reaches the cluster via **Woodpecker deploy** (push-driven, on commit) or **Keel** (poll-driven, on a new registry tag); **Diun** only notifies. Operator-managed StatefulSets are rolled by neither.
|
- A **Service**'s image reaches the cluster via **Woodpecker deploy** (push-driven, on commit) or **Keel** (poll-driven, on a new registry tag); **Diun** only notifies. Operator-managed StatefulSets are rolled by neither.
|
||||||
- An owned **Service**'s image is built by GitHub Actions from the **Canonical repo**'s **GitHub mirror** and hosted on ghcr.io (ADR-0002); the **Forgejo registry** keeps only a frozen last-known-good tag per **Service**.
|
- An owned **Service**'s image is built by GitHub Actions from the **Canonical repo**'s **GitHub mirror** and hosted on ghcr.io (ADR-0002); the **Forgejo registry** keeps only a frozen last-known-good tag per **Service**.
|
||||||
- Tier-1 **State tier** state and ~12 app databases share one **CNPG** `pg-cluster`, reached through **PgBouncer**; their credentials rotate via the `vault-database` store.
|
- Tier-1 **State tier** state and ~12 app databases share one **CNPG** `pg-cluster`, reached through **PgBouncer**; their credentials rotate via the `vault-database` store.
|
||||||
- A **Valia site** mirrors exactly one **Content folder** and serves exactly one **Entry file** at `/`; the folder is hers, the subdomain name is Viktor's, the hosting is off-infra.
|
|
||||||
|
|
||||||
## Example dialogue
|
## Example dialogue
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -1 +1 @@
|
||||||
v0.12.0
|
v0.11.0
|
||||||
|
|
|
||||||
|
|
@ -30,21 +30,11 @@ func memoryCommands() []Command {
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
// printMemories renders a {memories:[…]} response as one line per memory, or raw JSON.
|
// printMemories renders a {memories:[…]} response as compact lines, or raw JSON.
|
||||||
func printMemories(raw []byte, jsonOut bool) error {
|
func printMemories(raw []byte, jsonOut bool) error {
|
||||||
fmt.Print(renderMemories(raw, jsonOut))
|
|
||||||
return nil
|
|
||||||
}
|
|
||||||
|
|
||||||
// renderMemories formats each memory as a single line with its FULL content
|
|
||||||
// (newlines flattened to spaces). Content is deliberately never truncated: the
|
|
||||||
// old 240-rune preview cut memories mid-sentence, misled agents into believing
|
|
||||||
// no full-content read-back existed, and made blind `update --content` from
|
|
||||||
// the preview silently destroy the stored tail. Full passthrough also can't
|
|
||||||
// produce invalid UTF-8 (the old mid-rune cut crashed the recall hook).
|
|
||||||
func renderMemories(raw []byte, jsonOut bool) string {
|
|
||||||
if jsonOut {
|
if jsonOut {
|
||||||
return string(raw) + "\n"
|
fmt.Println(string(raw))
|
||||||
|
return nil
|
||||||
}
|
}
|
||||||
var r struct {
|
var r struct {
|
||||||
Memories []struct {
|
Memories []struct {
|
||||||
|
|
@ -56,20 +46,36 @@ func renderMemories(raw []byte, jsonOut bool) string {
|
||||||
} `json:"memories"`
|
} `json:"memories"`
|
||||||
}
|
}
|
||||||
if err := json.Unmarshal(raw, &r); err != nil {
|
if err := json.Unmarshal(raw, &r); err != nil {
|
||||||
return string(raw) + "\n"
|
fmt.Println(string(raw))
|
||||||
|
return nil
|
||||||
}
|
}
|
||||||
if len(r.Memories) == 0 {
|
if len(r.Memories) == 0 {
|
||||||
return "(no memories)\n"
|
fmt.Println("(no memories)")
|
||||||
|
return nil
|
||||||
}
|
}
|
||||||
var b strings.Builder
|
|
||||||
for _, m := range r.Memories {
|
for _, m := range r.Memories {
|
||||||
c := strings.ReplaceAll(m.Content, "\n", " ")
|
c := truncatePreview(strings.ReplaceAll(m.Content, "\n", " "), 240)
|
||||||
fmt.Fprintf(&b, "#%d [%s] (%.2f) %s\n", m.ID, m.Category, m.Importance, c)
|
fmt.Printf("#%d [%s] (%.2f) %s\n", m.ID, m.Category, m.Importance, c)
|
||||||
if m.Tags != "" {
|
if m.Tags != "" {
|
||||||
fmt.Fprintf(&b, " tags: %s\n", m.Tags)
|
fmt.Printf(" tags: %s\n", m.Tags)
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
return b.String()
|
return nil
|
||||||
|
}
|
||||||
|
|
||||||
|
// truncatePreview shortens s to at most maxRunes RUNES, appending "…" when it
|
||||||
|
// trims. Counting runes (not bytes) is load-bearing: a byte slice like s[:240]
|
||||||
|
// can cut through the middle of a multibyte UTF-8 character (e.g. 2-byte
|
||||||
|
// Cyrillic), leaving a dangling lead byte = invalid UTF-8. That crashed strict
|
||||||
|
// decoders downstream — notably the homelab-memory-recall.py UserPromptSubmit
|
||||||
|
// hook (subprocess text=True), which surfaced as a recurring "UserPromptSubmit
|
||||||
|
// hook error" for Cyrillic-language users.
|
||||||
|
func truncatePreview(s string, maxRunes int) string {
|
||||||
|
r := []rune(s)
|
||||||
|
if len(r) <= maxRunes {
|
||||||
|
return s
|
||||||
|
}
|
||||||
|
return string(r[:maxRunes]) + "…"
|
||||||
}
|
}
|
||||||
|
|
||||||
func memoryRecall(args []string) error {
|
func memoryRecall(args []string) error {
|
||||||
|
|
|
||||||
|
|
@ -8,53 +8,25 @@ import (
|
||||||
"unicode/utf8"
|
"unicode/utf8"
|
||||||
)
|
)
|
||||||
|
|
||||||
func TestRenderMemoriesFullContent(t *testing.T) {
|
func TestTruncatePreviewKeepsValidUTF8(t *testing.T) {
|
||||||
// The pretty view must NOT truncate content: the old 240-rune preview cut
|
// Byte-slicing a long Cyrillic string at 240 splits a 2-byte rune and emits
|
||||||
// memories mid-sentence, misled agents into thinking no full-content
|
// invalid UTF-8 — the bug that crashed the recall hook. truncatePreview must
|
||||||
// read-back existed, and made blind `update --content` from the preview
|
// cut on a rune boundary and always stay valid UTF-8.
|
||||||
// destroy the stored tail. Full passthrough also removes the mid-rune-cut
|
long := strings.Repeat("я", 300) // 300 runes / 600 bytes
|
||||||
// invalid-UTF-8 class by construction — nothing is ever sliced.
|
got := truncatePreview(long, 240)
|
||||||
long := strings.Repeat("я", 300) + strings.Repeat("a", 300)
|
|
||||||
raw, _ := json.Marshal(map[string]interface{}{"memories": []map[string]interface{}{
|
|
||||||
{"id": 7, "content": long, "category": "facts", "tags": "t1,t2", "importance": 0.7},
|
|
||||||
}})
|
|
||||||
got := renderMemories(raw, false)
|
|
||||||
if !strings.Contains(got, long) {
|
|
||||||
t.Fatalf("content was truncated: %q", got)
|
|
||||||
}
|
|
||||||
if strings.Contains(got, "…") {
|
|
||||||
t.Fatalf("ellipsis in output — truncation still active: %q", got)
|
|
||||||
}
|
|
||||||
if !utf8.ValidString(got) {
|
if !utf8.ValidString(got) {
|
||||||
t.Fatalf("invalid UTF-8 in output: %q", got)
|
t.Fatalf("truncatePreview produced invalid UTF-8: %q", got)
|
||||||
}
|
}
|
||||||
if !strings.Contains(got, "#7 [facts] (0.70) ") || !strings.Contains(got, "tags: t1,t2") {
|
if r := []rune(got); len(r) != 241 || string(r[:240]) != strings.Repeat("я", 240) || r[240] != '…' {
|
||||||
t.Fatalf("line format broken: %q", got)
|
t.Fatalf("truncatePreview = %d runes, want 240 Cyrillic + ellipsis", len(r))
|
||||||
}
|
}
|
||||||
}
|
// Short multibyte strings pass through untouched (no ellipsis).
|
||||||
|
if got := truncatePreview("кратко", 240); got != "кратко" {
|
||||||
func TestRenderMemoriesFlattensNewlinesToOneLine(t *testing.T) {
|
t.Fatalf("short string altered: %q", got)
|
||||||
// Consumers (the recall hook, terminal skims) rely on one memory per line;
|
|
||||||
// multi-line content is flattened, never split across lines.
|
|
||||||
raw, _ := json.Marshal(map[string]interface{}{"memories": []map[string]interface{}{
|
|
||||||
{"id": 1, "content": "line one\nline two\nline three", "category": "facts", "importance": 0.5},
|
|
||||||
}})
|
|
||||||
got := renderMemories(raw, false)
|
|
||||||
if !strings.Contains(got, "line one line two line three") {
|
|
||||||
t.Fatalf("newlines not flattened: %q", got)
|
|
||||||
}
|
}
|
||||||
}
|
// ASCII boundary still works.
|
||||||
|
if got := truncatePreview(strings.Repeat("a", 500), 240); got != strings.Repeat("a", 240)+"…" {
|
||||||
func TestRenderMemoriesEdgeCases(t *testing.T) {
|
t.Fatalf("ascii truncation wrong: %q", got)
|
||||||
if got := renderMemories([]byte(`{"memories":[]}`), false); got != "(no memories)\n" {
|
|
||||||
t.Fatalf("empty list: %q", got)
|
|
||||||
}
|
|
||||||
// --json and unparseable responses pass through raw.
|
|
||||||
if got := renderMemories([]byte(`{"x":1}`), true); got != "{\"x\":1}\n" {
|
|
||||||
t.Fatalf("json passthrough: %q", got)
|
|
||||||
}
|
|
||||||
if got := renderMemories([]byte(`not json`), false); got != "not json\n" {
|
|
||||||
t.Fatalf("unparseable passthrough: %q", got)
|
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
|
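Reviewer sketch, not part of the diff: one side of this compare tests a `renderMemories` function that returns a string, while the other side prints directly from `printMemories` and can only unit-test the truncation helper. The trimmed example below illustrates why returning a string makes the formatting easy to assert on. The `memory` type and `render` function are hypothetical and carry only two of the fields used in the real code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// memory is a hypothetical, cut-down version of the response item.
type memory struct {
	ID      int    `json:"id"`
	Content string `json:"content"`
}

// render builds the whole output as a string: one line per memory with
// newlines flattened to spaces, raw passthrough when the input is not JSON.
func render(raw []byte) string {
	var r struct {
		Memories []memory `json:"memories"`
	}
	if err := json.Unmarshal(raw, &r); err != nil {
		return string(raw) + "\n"
	}
	if len(r.Memories) == 0 {
		return "(no memories)\n"
	}
	var b strings.Builder
	for _, m := range r.Memories {
		fmt.Fprintf(&b, "#%d %s\n", m.ID, strings.ReplaceAll(m.Content, "\n", " "))
	}
	return b.String()
}

func main() {
	// The caller decides where the text goes; tests can compare it directly.
	fmt.Print(render([]byte(`{"memories":[{"id":1,"content":"a\nb"}]}`))) // #1 a b
}
```

Keeping the print call in a thin wrapper means edge cases such as empty lists and unparseable input can be checked without capturing stdout.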
||||||
BIN
config.tfvars
Binary file not shown.
|
|
@ -1,126 +0,0 @@
|
||||||
<svg xmlns="http://www.w3.org/2000/svg" width="1600" height="820" viewBox="0 0 1600 820" font-family="system-ui, -apple-system, 'Segoe UI', Roboto, sans-serif">
|
|
||||||
<!-- ADR-0017: PHYSICAL cabling only — no VLANs, no flows. Solid = cable in
|
|
||||||
place today · dashed = camera-day work · ~~~ = radio. Palette: neutral
|
|
||||||
grays + blue for copper runs (reference dataviz palette text tokens). -->
|
|
||||||
<defs>
|
|
||||||
<marker id="dot" viewBox="0 0 8 8" refX="4" refY="4" markerWidth="5" markerHeight="5">
|
|
||||||
<circle cx="4" cy="4" r="3" fill="#52514e"/>
|
|
||||||
</marker>
|
|
||||||
</defs>
|
|
||||||
|
|
||||||
<rect width="1600" height="820" fill="#fcfcfb"/>
|
|
||||||
|
|
||||||
<text x="40" y="42" font-size="26" font-weight="700" fill="#0b0b0b">ADR-0017 — physical cabling (single-switch, rev 3)</text>
|
|
||||||
<text x="40" y="66" font-size="15" fill="#52514e">wires only — no VLANs, no traffic · solid = in place · dashed = camera-day · ~ = radio</text>
|
|
||||||
|
|
||||||
<!-- ═════════ APARTMENT ═════════ -->
|
|
||||||
<rect x="40" y="100" width="330" height="330" rx="10" fill="#0b0b0b" fill-opacity="0.03" stroke="#b9b8b2"/>
|
|
||||||
<text x="56" y="126" font-size="13" font-weight="700" fill="#52514e" letter-spacing="1">APARTMENT</text>
|
|
||||||
|
|
||||||
<text x="70" y="158" font-size="13" fill="#52514e">☁ ISP (internet)</text>
|
|
||||||
<path d="M120,166 L120,196" fill="none" stroke="#52514e" stroke-width="2"/>
|
|
||||||
|
|
||||||
<rect x="64" y="198" width="220" height="64" rx="8" fill="#ffffff" stroke="#8a8984"/>
|
|
||||||
<text x="80" y="222" font-size="14.5" font-weight="700" fill="#0b0b0b">AX6000 router</text>
|
|
||||||
<text x="80" y="242" font-size="12" fill="#52514e">192.168.1.1 · WAN←ISP · 8×LAN</text>
|
|
||||||
|
|
||||||
<rect x="64" y="290" width="220" height="52" rx="8" fill="#ffffff" stroke="#8a8984"/>
|
|
||||||
<text x="80" y="312" font-size="14" font-weight="700" fill="#0b0b0b">Synology NAS · .13</text>
|
|
||||||
<text x="80" y="330" font-size="12" fill="#52514e">on an AX6000 LAN port</text>
|
|
||||||
<path d="M174,262 L174,290" fill="none" stroke="#2a78d6" stroke-width="2"/>
|
|
||||||
|
|
||||||
<text x="70" y="376" font-size="12.5" fill="#52514e">📶 wifi clients (phones, laptops)</text>
|
|
||||||
<path d="M110,262 C104,272 106,278 100,286 C106,294 104,300 100,308 C106,316 104,322 100,330 C106,338 104,344 100,352 C104,358 102,362 98,366" fill="none" stroke="#8a8984" stroke-width="1.6" stroke-dasharray="2,3"/>
|
|
||||||
|
|
||||||
<!-- in-wall run apartment -> garage -->
|
|
||||||
<path d="M284,230 C450,230 540,228 616,228" fill="none" stroke="#2a78d6" stroke-width="2.5"/>
|
|
||||||
<text x="330" y="218" font-size="12.5" font-weight="700" fill="#2a78d6">in-wall run → garage</text>
|
|
||||||
|
|
||||||
<!-- ═════════ GARAGE — RACK ═════════ -->
|
|
||||||
<rect x="560" y="100" width="640" height="680" rx="10" fill="#0b0b0b" fill-opacity="0.03" stroke="#b9b8b2"/>
|
|
||||||
<text x="576" y="126" font-size="13" font-weight="700" fill="#52514e" letter-spacing="1">GARAGE — RACK</text>
|
|
||||||
|
|
||||||
<!-- switch -->
|
|
||||||
<rect x="600" y="150" width="560" height="150" rx="8" fill="#ffffff" stroke="#0b0b0b" stroke-opacity="0.5" stroke-width="1.6"/>
|
|
||||||
<text x="616" y="176" font-size="14.5" font-weight="700" fill="#0b0b0b">TL-SG105PE · 5-port gigabit PoE switch</text>
|
|
||||||
<text x="616" y="194" font-size="12" fill="#52514e">mgmt 192.168.1.6 · replaces the old TL-SG105E (→ shelf, cold spare)</text>
|
|
||||||
<g font-size="11.5" text-anchor="middle">
|
|
||||||
<rect x="616" y="210" width="96" height="40" rx="5" fill="#2a78d6" fill-opacity="0.08" stroke="#8a8984"/>
|
|
||||||
<text x="664" y="227" font-weight="700" fill="#0b0b0b">P1</text>
|
|
||||||
<text x="664" y="242" fill="#52514e">← apartment</text>
|
|
||||||
<rect x="722" y="210" width="96" height="40" rx="5" fill="#2a78d6" fill-opacity="0.08" stroke="#8a8984"/>
|
|
||||||
<text x="770" y="227" font-weight="700" fill="#0b0b0b">P2</text>
|
|
||||||
<text x="770" y="242" fill="#52514e">← 4G router</text>
|
|
||||||
<rect x="828" y="210" width="96" height="40" rx="5" fill="#2a78d6" fill-opacity="0.08" stroke="#8a8984"/>
|
|
||||||
<text x="876" y="227" font-weight="700" fill="#0b0b0b">P3</text>
|
|
||||||
<text x="876" y="242" fill="#52514e">← UPS mgmt</text>
|
|
||||||
<rect x="934" y="210" width="96" height="40" rx="5" fill="#2a78d6" fill-opacity="0.08" stroke="#8a8984" stroke-dasharray="4,3"/>
|
|
||||||
<text x="982" y="227" font-weight="700" fill="#0b0b0b">P4 ⚡PoE</text>
|
|
||||||
<text x="982" y="242" fill="#52514e">← camera</text>
|
|
||||||
<rect x="1040" y="210" width="96" height="40" rx="5" fill="#2a78d6" fill-opacity="0.08" stroke="#8a8984"/>
|
|
||||||
<text x="1088" y="227" font-weight="700" fill="#0b0b0b">P5</text>
|
|
||||||
<text x="1088" y="242" fill="#52514e">← R730 eno1</text>
|
|
||||||
</g>
|
|
||||||
<text x="616" y="284" font-size="12" fill="#52514e">every cable below re-plugs old-switch → PE on camera day (≈3 min)</text>
|
|
||||||
|
|
||||||
<!-- 4G router -->
|
|
||||||
<rect x="600" y="360" width="250" height="64" rx="8" fill="#ffffff" stroke="#8a8984"/>
|
|
||||||
<text x="616" y="384" font-size="14" font-weight="700" fill="#0b0b0b">4G router · 192.168.1.7</text>
|
|
||||||
<text x="616" y="403" font-size="12" fill="#52514e">~cellular uplink (out-of-band)</text>
|
|
||||||
<path d="M770,300 L770,360" fill="none" stroke="#2a78d6" stroke-width="2"/>
|
|
||||||
<path d="M856,392 C866,386 864,380 874,376 C866,370 868,364 876,360" fill="none" stroke="#8a8984" stroke-width="1.6" stroke-dasharray="2,3"/>
|
|
||||||
<text x="884" y="380" font-size="12" fill="#52514e">📡 cellular</text>
|
|
||||||
|
|
||||||
<!-- UPS -->
|
|
||||||
<rect x="600" y="452" width="250" height="56" rx="8" fill="#ffffff" stroke="#8a8984"/>
|
|
||||||
<text x="616" y="476" font-size="14" font-weight="700" fill="#0b0b0b">UPS (Huawei)</text>
|
|
||||||
<text x="616" y="494" font-size="12" fill="#52514e">network mgmt card</text>
|
|
||||||
<path d="M876,300 C876,340 800,410 720,452" fill="none" stroke="#2a78d6" stroke-width="2"/>
|
|
||||||
|
|
||||||
<!-- R730 -->
|
|
||||||
<rect x="600" y="540" width="560" height="220" rx="8" fill="#ffffff" stroke="#0b0b0b" stroke-opacity="0.5" stroke-width="1.6"/>
|
|
||||||
<text x="616" y="566" font-size="14.5" font-weight="700" fill="#0b0b0b">Dell R730 · PVE host · 192.168.1.127</text>
|
|
||||||
<g font-size="11.5">
|
|
||||||
<rect x="616" y="582" width="128" height="38" rx="5" fill="#2a78d6" fill-opacity="0.08" stroke="#8a8984"/>
|
|
||||||
<text x="628" y="598" font-weight="700" fill="#0b0b0b">eno1 · LAN1</text>
|
|
||||||
<text x="628" y="613" fill="#52514e">← switch P5 · 1GbE</text>
|
|
||||||
<rect x="756" y="582" width="128" height="38" rx="5" fill="#ffffff" stroke="#8a8984" stroke-dasharray="4,3"/>
|
|
||||||
<text x="768" y="598" font-weight="700" fill="#52514e">eno2 · LAN2</text>
|
|
||||||
<text x="768" y="613" fill="#8a8984">dark · fallback leg</text>
|
|
||||||
<rect x="896" y="582" width="128" height="38" rx="5" fill="#ffffff" stroke="#d8d7d2"/>
|
|
||||||
<text x="908" y="598" fill="#8a8984">eno3 / eno4</text>
|
|
||||||
<text x="908" y="613" fill="#8a8984">free, uncabled</text>
|
|
||||||
<rect x="1036" y="582" width="108" height="38" rx="5" fill="#ffffff" stroke="#d8d7d2"/>
|
|
||||||
<text x="1048" y="598" fill="#8a8984">iDRAC · .4</text>
|
|
||||||
<text x="1048" y="613" fill="#8a8984">shared-LOM/eno1</text>
|
|
||||||
</g>
|
|
||||||
<text x="616" y="648" font-size="12" fill="#52514e">no other network cables — everything else on this host is VIRTUAL:</text>
|
|
||||||
<text x="616" y="668" font-size="12" fill="#52514e">pfSense · ha-sofia (HA) · devvm · k8s-master + node1-6 · registry VM …</text>
|
|
||||||
<text x="616" y="696" font-size="12" fill="#8a8984">(power: host + switch fed from the UPS — power wiring not drawn)</text>
|
|
||||||
|
|
||||||
<path d="M1088,300 C1088,420 720,500 680,582" fill="none" stroke="#2a78d6" stroke-width="2.5"/>
|
|
||||||
<text x="1100" y="330" font-size="12.5" font-weight="700" fill="#2a78d6">LAN1 cable</text>
|
|
||||||
|
|
||||||
<!-- ═════════ GARAGE ENTRANCE ═════════ -->
|
|
||||||
<rect x="1280" y="100" width="280" height="200" rx="10" fill="#0b0b0b" fill-opacity="0.03" stroke="#b9b8b2"/>
|
|
||||||
<text x="1296" y="126" font-size="13" font-weight="700" fill="#52514e" letter-spacing="1">GARAGE ENTRANCE</text>
|
|
||||||
<rect x="1304" y="150" width="232" height="110" rx="8" fill="#ffffff" stroke="#8a8984"/>
|
|
||||||
<text x="1320" y="176" font-size="14" font-weight="700" fill="#0b0b0b">vermont-garage camera</text>
|
|
||||||
<text x="1320" y="196" font-size="12" fill="#52514e">HiLook IPC-T241H-C · 10.0.30.70</text>
|
|
||||||
<text x="1320" y="214" font-size="12" fill="#52514e">powered over the data cable (PoE)</text>
|
|
||||||
<text x="1320" y="232" font-size="12" fill="#52514e">outdoor · armored conduit</text>
|
|
||||||
|
|
||||||
<path d="M982,210 C982,150 1140,140 1304,180" fill="none" stroke="#52514e" stroke-width="2.5" stroke-dasharray="7,5"/>
|
|
||||||
<text x="1080" y="136" font-size="12.5" font-weight="700" fill="#52514e">single cat6 in conduit · data + PoE power (camera day)</text>
|
|
||||||
|
|
||||||
<!-- legend -->
|
|
||||||
<g transform="translate(40,780)" font-size="12.5">
|
|
||||||
<line x1="0" y1="-4" x2="44" y2="-4" stroke="#2a78d6" stroke-width="2.5"/>
|
|
||||||
<text x="52" y="0" fill="#0b0b0b">copper, in place</text>
|
|
||||||
<line x1="190" y1="-4" x2="234" y2="-4" stroke="#52514e" stroke-width="2.5" stroke-dasharray="7,5"/>
|
|
||||||
<text x="242" y="0" fill="#0b0b0b">camera-day cable / dark port</text>
|
|
||||||
<path d="M450,-4 C456,-10 454,-14 460,-18" fill="none" stroke="#8a8984" stroke-width="1.6" stroke-dasharray="2,3"/>
|
|
||||||
<text x="470" y="0" fill="#0b0b0b">radio (wifi / cellular)</text>
|
|
||||||
<text x="650" y="0" fill="#52514e">total wired links at the rack: 5 (all on the one switch) · ADR-0017 rev 3</text>
|
|
||||||
</g>
|
|
||||||
</svg>
|
|
||||||
|
|
|
@ -1,99 +0,0 @@
|
||||||
# CCTV segment: dedicated pfSense interface, VLAN-30 trunk on the LAN1 cable

Status: accepted (2026-07-02, rev 3 — single-switch)

The first owned camera at the Sofia/Vermont site (`vermont-garage`, HiLook
IPC-T241H-C at the garage entrance) needs to be network-isolated: its cable is
physically exposed outside the apartment, so anything plugged into that cable
must land in a segment that can reach nothing. The original design doc
(NAS: `Emo shared/Claude shared/garage-camera/`) called for an "802.1Q trunk
to pfSense" — but nothing in this network terminates dot1q on pfSense; the
site idiom is one VLAN-aware Proxmox bridge → one tagged VM NIC → one clean
untagged pfSense interface per segment.

**Decision (rev 3):** ONE switch — the new TL-SG105PE **replaces** the old
garage TL-SG105E (Viktor prefers not to run two switches; the retired unit
becomes a cold spare and its 192.168.1.6 mgmt IP passes to the PE). Five ports,
all used: apartment uplink, 4G router 192.168.1.7, UPS mgmt (all untagged
VLAN 1), the camera (untagged VLAN 30, PoE), and the **trunk to R730 `eno1`
carrying home LAN untagged + CCTV tagged 30** over the existing LAN1 cable.
pfSense `net3` (vtnet3) sits on `vmbr0` with `tag=30` — exactly the site
idiom used for dManagementsVms/dKubernetes (bridge-level tag → clean untagged
vNIC; pfSense still terminates no dot1q itself). The earlier dedicated
`eno2`/`vmbr2` leg is kept **dormant as a fallback** (rev 2 wired it; moving
net3 back to vmbr2 restores pure physical isolation in one `qm set`).

This narrows the earlier 802.1Q objection rather than contradicting it: the
rejection assumed *unmanaged* switches, where any LAN device could inject
tagged frames; with the managed PE as the only device on eno1, VLAN-30
membership is {camera port, trunk port} only, so tag-30 ingress from every
other port — and from the exposed camera cable — is dropped or contained.

Cameras are untrusted: default-deny on dCCTV with a single
NTP-to-gateway exception; Frigate (k8s) pulls RTSP in; ha-sofia (192.168.1.8)
may reach ISAPI/RTSP directly; home-LAN clients route in via an AX6000 static
route (10.0.30.0/24 via 192.168.1.2). 10.0.30.0/24 is deliberately NOT in the
10.0.20.0/22 trusted source-IP allowlist.

## Traffic on the trunk — how one cable carries two networks

The LAN1 cable is shared, but the two networks on it diverge at `vmbr0`
(the VLAN-aware bridge on the PVE host), and only ONE of them ever touches
pfSense:

- **Untagged (VLAN 1, home LAN)** is plain L2 bridging: vmbr0 switches it
  between the trunk, the host's own IP (192.168.1.127) and pfSense `net0` —
  where pfSense sits as an ordinary LAN *client* (WAN 192.168.1.2). The home
  LAN's gateway is and remains the AX6000; home-LAN traffic never transits
  pfSense. Consequently a pfSense (or R730 VM-level) outage does not affect
  the home LAN, and the apartment ↔ 4G-router ↔ UPS paths don't even leave
  the switch (P1/P2/P3 bridge internally), so out-of-band recovery via the
  4G router survives the whole rack being down.
- **Tagged 30 (CCTV)** has exactly one possible landing: vmbr0 delivers
  VID 30 only to pfSense `net3` (dCCTV, 10.0.30.1), which is the camera
  segment's gateway, firewall and sole exit. "Camera → AX6000 → internet"
  is impossible by construction, not merely by firewall rule.
- pfSense forwards *upstream* only its own segments (10.0.10.x, 10.0.20.x,
  10.0.30.x), NATed out of its WAN toward the AX6000. Load-wise the trunk
  gained only the camera's ~8 Mbps — it already carried all rack-bound
  home-LAN traffic.
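
The switch-side half of this separation can be modelled in a few lines of Go. The sketch is a simplified model, not the TL-SG105PE's firmware logic: `portConfig`, `switchTable` and `classify` are hypothetical names, the port numbering and membership follow the ADR-0017 diagrams (VLAN 1 untagged on P1, P2, P3, P5; VLAN 30 untagged on P4 with PVID 30 and tagged on P5), and it assumes the switch drops ingress frames for any VLAN the receiving port is not a member of.

```go
package main

import "fmt"

// portConfig is a simplified 802.1Q port entry: the PVID applied to
// untagged ingress frames plus the set of VLANs the port belongs to.
type portConfig struct {
	pvid    int
	members map[int]bool
}

// switchTable mirrors the membership described in the ADR diagrams.
var switchTable = map[string]portConfig{
	"P1": {pvid: 1, members: map[int]bool{1: true}},           // apartment uplink
	"P2": {pvid: 1, members: map[int]bool{1: true}},           // 4G router
	"P3": {pvid: 1, members: map[int]bool{1: true}},           // UPS mgmt
	"P4": {pvid: 30, members: map[int]bool{30: true}},         // camera
	"P5": {pvid: 1, members: map[int]bool{1: true, 30: true}}, // trunk to eno1
}

// classify returns the VLAN an ingress frame lands in, or ok=false when
// the frame is dropped. A tag of 0 stands for an untagged frame.
func classify(port string, tag int) (vlan int, ok bool) {
	cfg, known := switchTable[port]
	if !known {
		return 0, false
	}
	vlan = tag
	if tag == 0 {
		vlan = cfg.pvid // untagged frames take the port's PVID
	}
	if !cfg.members[vlan] {
		return 0, false // port is not a member: drop
	}
	return vlan, true
}

func main() {
	cases := []struct {
		port string
		tag  int
	}{{"P1", 30}, {"P4", 0}, {"P4", 1}, {"P5", 30}, {"P5", 0}}
	for _, c := range cases {
		vlan, ok := classify(c.port, c.tag)
		fmt.Printf("%s tag=%d -> vlan=%d accepted=%t\n", c.port, c.tag, vlan, ok)
	}
}
```

In this model a frame forged with tag 30 on P1 is dropped, while anything arriving on the exposed camera port P4 can only ever land in VLAN 30, which matches the "dropped or contained" wording above.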

*(editable diagram source: [`0017-cctv-vlan-tagging.excalidraw`](./0017-cctv-vlan-tagging.excalidraw); open it in Excalidraw to tweak)*

## Considered options

- **802.1Q over the LAN path behind an UNMANAGED switch** (the original plan
  read this way) — rejected: any LAN device could inject tagged frames into
  vmbr0 (`bridge-vids 2-4094`) and tag-passing through a dumb switch is
  undefined. Rev 3 adopts the tagged path ONLY because the managed PE now
  polices VLAN-30 membership at the single entry point to eno1; no bridge
  reconfiguration was needed (vmbr0 was already VLAN-aware).
- **Dedicated physical leg (eno2 → vmbr2 → net3), one switch per role**
  (rev 1/2 as-built) — superseded by rev 3: it forced either a second switch
  (6 connections vs 5 ports once the PE also replaced the old switch) or new
  hardware. Strongest isolation of all options; kept dormant as the fallback.
- **AX6000 as the camera gateway** — rejected earlier in the design (consumer
  router, no inter-VLAN firewall).

## Consequences

- The switch is now single-point and load-bearing for everything in the rack
  (apartment uplink, pfSense backup-WAN via 4G, UPS mgmt, CCTV) AND its VLAN
  table + mgmt password are part of the isolation boundary — the Easy Smart
  mgmt UI answers on every port, so the password is the gate between a
  compromised camera and the switch config. All 5 ports are consumed: the
  next camera forces an 8-port PoE upgrade (the wiring plan already fits it).
- `eno2`/`vmbr2` stay cabled-ready but dormant (fallback to rev 2's physical
  leg); eno3/eno4 remain free.
- The old TL-SG105E is retired to cold spare; the PE inherits 192.168.1.6
  (Kea reservation by MAC).
- Revision history (all 2026-07-02): rev 1 assumed one shared PE with a
  port-VLAN split (conflated the two devices); rev 2 split into two switches
  after inspecting 192.168.1.6 (old non-PoE SG105E, 4/5 ports used); rev 3
  consolidated back to one switch — the PE replacing the SG105E — per
  Viktor's preference, moving CCTV onto a managed tagged trunk.
- Frigate's ADR-0016 VRAM budget was bumped 2000 → 2300 MiB for the extra
  NVDEC stream.
|
|
||||||
|
|
@ -1,178 +0,0 @@
|
||||||
<svg xmlns="http://www.w3.org/2000/svg" width="1600" height="880" viewBox="0 0 1600 880" font-family="system-ui, -apple-system, 'Segoe UI', Roboto, sans-serif">
|
|
||||||
<!-- ADR-0017 rev 3 dCCTV topology (single switch, VLAN-30 trunk on LAN1).
|
|
||||||
Colors: reference dataviz palette (light mode). blue #2a78d6 = home LAN ·
|
|
||||||
violet #4a3aa7 = dCCTV · aqua #1baf7a = dKubernetes ·
|
|
||||||
yellow #eda100 = dManagementsVms · green #008300 allow · red #e34948 deny -->
|
|
||||||
<defs>
|
|
||||||
<marker id="arrGreen" viewBox="0 0 10 10" refX="9" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
|
|
||||||
<path d="M0,0 L10,5 L0,10 z" fill="#008300"/>
|
|
||||||
</marker>
|
|
||||||
<marker id="arrRed" viewBox="0 0 10 10" refX="9" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
|
|
||||||
<path d="M0,0 L10,5 L0,10 z" fill="#e34948"/>
|
|
||||||
</marker>
|
|
||||||
<marker id="arrGray" viewBox="0 0 10 10" refX="9" refY="5" markerWidth="6" markerHeight="6" orient="auto-start-reverse">
|
|
||||||
<path d="M0,0 L10,5 L0,10 z" fill="#52514e"/>
|
|
||||||
</marker>
|
|
||||||
</defs>
|
|
||||||
|
|
||||||
<rect width="1600" height="880" fill="#fcfcfb"/>
|
|
||||||
|
|
||||||
<text x="40" y="42" font-size="26" font-weight="700" fill="#0b0b0b">ADR-0017 — CCTV segment behind pfSense, VLAN-30 trunk on the LAN1 cable</text>
|
|
||||||
<text x="40" y="66" font-size="15" fill="#52514e">Sofia/Vermont · rev 3 (single switch) 2026-07-02 · dashed = camera-day · the ONLY 802.1Q is the trunk between the switch and eno1</text>
|
|
||||||
|
|
||||||
<!-- camera -> everything else (denied) -->
|
|
||||||
<path d="M240,168 C520,104 900,104 1148,140" fill="none" stroke="#e34948" stroke-width="3" marker-end="url(#arrRed)"/>
|
|
||||||
<g transform="translate(560,111)">
|
|
||||||
<circle r="11" fill="#fcfcfb" stroke="#e34948" stroke-width="2.5"/>
|
|
||||||
<path d="M-5,-5 L5,5 M5,-5 L-5,5" stroke="#e34948" stroke-width="2.5"/>
|
|
||||||
</g>
|
|
||||||
<text x="588" y="100" font-size="13.5" font-weight="700" fill="#e34948">DENY · camera → LAN / other segments / internet (default deny on dCCTV)</text>
|
|
||||||
|
|
||||||
<!-- GARAGE ENTRANCE -->
|
|
||||||
<rect x="40" y="128" width="240" height="180" rx="10" fill="#4a3aa7" fill-opacity="0.06" stroke="#4a3aa7" stroke-opacity="0.35"/>
|
|
||||||
<text x="56" y="154" font-size="13" font-weight="700" fill="#52514e" letter-spacing="1">GARAGE ENTRANCE</text>
|
|
||||||
<rect x="64" y="170" width="192" height="112" rx="8" fill="#ffffff" stroke="#4a3aa7" stroke-width="2"/>
|
|
||||||
<text x="80" y="196" font-size="15" font-weight="700" fill="#0b0b0b">vermont-garage</text>
|
|
||||||
<text x="80" y="216" font-size="12.5" fill="#52514e">HiLook IPC-T241H-C · pure IR</text>
|
|
||||||
<text x="80" y="234" font-size="12.5" fill="#52514e">10.0.30.70 (Kea reservation)</text>
|
|
||||||
<text x="80" y="252" font-size="12.5" fill="#52514e">DNS: garage-cam.viktorbarzin.lan</text>
|
|
||||||
<text x="80" y="270" font-size="12.5" fill="#52514e">PoE from switch · cloud/P2P off</text>
|
|
||||||
|
|
||||||
<path d="M256,284 C330,330 412,368 417,430" fill="none" stroke="#52514e" stroke-width="2" stroke-dasharray="6,5" marker-end="url(#arrGray)"/>
|
|
||||||
<text x="330" y="322" font-size="12" fill="#52514e">cat6 in conduit · PoE → P4</text>
|
|
||||||
|
|
||||||
<!-- RACK zone: single switch -->
|
|
||||||
<rect x="40" y="360" width="560" height="265" rx="10" fill="#0b0b0b" fill-opacity="0.03" stroke="#b9b8b2"/>
|
|
||||||
<text x="56" y="384" font-size="13" font-weight="700" fill="#52514e" letter-spacing="1">RACK — GARAGE · ONE SWITCH</text>
|
|
||||||
|
|
||||||
<rect x="64" y="396" width="512" height="176" rx="8" fill="#4a3aa7" fill-opacity="0.04" stroke="#4a3aa7" stroke-width="2"/>
|
|
||||||
<text x="80" y="420" font-size="15" font-weight="700" fill="#0b0b0b">TL-SG105PE <tspan font-size="12.5" font-weight="400" fill="#52514e">replaces the SG105E · mgmt 192.168.1.6 (Kea) · all 5 ports used</tspan></text>
|
|
||||||
<g font-size="11.5" text-anchor="middle">
|
|
||||||
<rect x="80" y="436" width="88" height="56" rx="6" fill="#2a78d6" fill-opacity="0.12" stroke="#2a78d6"/>
|
|
||||||
<text x="124" y="454" font-weight="700" fill="#0b0b0b">P1 · V1</text>
|
|
||||||
<text x="124" y="470" fill="#52514e">apartment</text>
|
|
||||||
<text x="124" y="484" fill="#52514e">uplink</text>
|
|
||||||
<rect x="178" y="436" width="88" height="56" rx="6" fill="#2a78d6" fill-opacity="0.12" stroke="#2a78d6"/>
|
|
||||||
<text x="222" y="454" font-weight="700" fill="#0b0b0b">P2 · V1</text>
|
|
||||||
<text x="222" y="470" fill="#52514e">4G router</text>
|
|
||||||
<text x="222" y="484" fill="#52514e">192.168.1.7</text>
|
|
||||||
<rect x="276" y="436" width="88" height="56" rx="6" fill="#2a78d6" fill-opacity="0.12" stroke="#2a78d6"/>
|
|
||||||
<text x="320" y="454" font-weight="700" fill="#0b0b0b">P3 · V1</text>
|
|
||||||
<text x="320" y="470" fill="#52514e">UPS mgmt</text>
|
|
||||||
<rect x="374" y="436" width="88" height="56" rx="6" fill="#4a3aa7" fill-opacity="0.12" stroke="#4a3aa7" stroke-width="2"/>
|
|
||||||
<text x="418" y="454" font-weight="700" fill="#0b0b0b">P4 · V30</text>
|
|
||||||
<text x="418" y="470" fill="#52514e">camera</text>
|
|
||||||
<text x="418" y="484" fill="#52514e">PoE ON</text>
|
|
||||||
<rect x="472" y="436" width="88" height="56" rx="6" fill="#2a78d6" fill-opacity="0.10" stroke="#4a3aa7" stroke-width="2" stroke-dasharray="0"/>
|
|
||||||
<text x="516" y="454" font-weight="700" fill="#0b0b0b">P5 · trunk</text>
|
|
||||||
<text x="516" y="470" fill="#52514e">V1 untagged</text>
|
|
||||||
<text x="516" y="484" fill="#4a3aa7">+ V30 tagged</text>
|
|
||||||
</g>
|
|
||||||
<text x="80" y="516" font-size="12" fill="#52514e">802.1Q: VLAN 1 untagged {P1,P2,P3,P5} · VLAN 30 {P4 untagged/PVID 30, P5 tagged}</text>
|
|
||||||
<text x="80" y="534" font-size="12" fill="#52514e">tag-30 ingress on P1/P2/P3 is dropped (not members) — the trunk is the only tagged path</text>
|
|
||||||
<text x="80" y="558" font-size="12" fill="#8a8984">old TL-SG105E → retired, cold spare · backup-WAN (4G) + UPS keep their ports</text>
|
|
||||||
|
|
||||||
<!-- trunk: two parallel lines to eno1 -->
|
|
||||||
<path d="M560,458 C630,458 640,428 692,420" fill="none" stroke="#2a78d6" stroke-width="2.5"/>
|
|
||||||
<path d="M560,466 C632,466 644,436 692,428" fill="none" stroke="#4a3aa7" stroke-width="2.5"/>
|
|
||||||
<text x="588" y="404" font-size="12" font-weight="700" fill="#0b0b0b">LAN1 cable</text>
|
|
||||||
|
|
||||||
<!-- R730 / PVE zone -->
|
|
||||||
<rect x="680" y="330" width="880" height="440" rx="10" fill="#0b0b0b" fill-opacity="0.03" stroke="#b9b8b2"/>
|
|
||||||
<text x="696" y="356" font-size="13" font-weight="700" fill="#52514e" letter-spacing="1">DELL R730 — PVE HOST 192.168.1.127 (IN THE RACK)</text>
|
|
||||||
|
|
||||||
<g font-size="12">
|
|
||||||
<rect x="700" y="400" width="150" height="46" rx="6" fill="#2a78d6" fill-opacity="0.12" stroke="#4a3aa7" stroke-width="2"/>
|
|
||||||
<text x="712" y="419" font-weight="700" fill="#0b0b0b">eno1 → vmbr0</text>
|
|
||||||
<text x="712" y="436" fill="#52514e">untag V1 + tag 30</text>
|
|
||||||
|
|
||||||
<rect x="700" y="471" width="150" height="46" rx="6" fill="#ffffff" stroke="#8a8984" stroke-dasharray="4,3"/>
|
|
||||||
<text x="712" y="490" font-weight="700" fill="#52514e">eno2 → vmbr2</text>
|
|
||||||
<text x="712" y="507" fill="#8a8984">dormant fallback leg</text>
|
|
||||||
|
|
||||||
<rect x="700" y="542" width="150" height="46" rx="6" fill="#0b0b0b" fill-opacity="0.04" stroke="#8a8984"/>
|
|
||||||
<text x="712" y="561" font-weight="700" fill="#0b0b0b">vmbr1</text>
|
|
||||||
<text x="712" y="578" fill="#52514e">internal · tags 10/20</text>
|
|
||||||
</g>
|
|
||||||
|
|
||||||
<!-- pfSense VM -->
|
|
||||||
<rect x="890" y="388" width="300" height="230" rx="8" fill="#ffffff" stroke="#8a8984"/>
|
|
||||||
<text x="906" y="414" font-size="15" font-weight="700" fill="#0b0b0b">pfSense (VM 101)</text>
|
|
||||||
<text x="906" y="432" font-size="12" fill="#52514e">gateway + firewall for every segment</text>
|
|
||||||
<g font-size="12">
|
|
||||||
<rect x="906" y="444" width="268" height="34" rx="5" fill="#2a78d6" fill-opacity="0.12" stroke="#2a78d6"/>
|
|
||||||
<text x="916" y="465" fill="#0b0b0b">net0 · WAN <tspan fill="#52514e">192.168.1.2 · vmbr0 untagged</tspan></text>
|
|
||||||
<rect x="906" y="484" width="268" height="34" rx="5" fill="#eda100" fill-opacity="0.14" stroke="#eda100"/>
|
|
||||||
<text x="916" y="505" fill="#0b0b0b">net1 · dManagementsVms <tspan fill="#52514e">10.0.10.1</tspan></text>
|
|
||||||
<rect x="906" y="524" width="268" height="34" rx="5" fill="#1baf7a" fill-opacity="0.12" stroke="#1baf7a"/>
|
|
||||||
<text x="916" y="545" fill="#0b0b0b">net2 · dKubernetes <tspan fill="#52514e">10.0.20.1</tspan></text>
|
|
||||||
<rect x="906" y="564" width="268" height="34" rx="5" fill="#4a3aa7" fill-opacity="0.12" stroke="#4a3aa7" stroke-width="2"/>
|
|
||||||
<text x="916" y="585" fill="#0b0b0b">net3 · dCCTV <tspan fill="#52514e">10.0.30.1/24 · vmbr0 tag 30</tspan></text>
|
|
||||||
</g>
|
|
||||||
<path d="M850,415 L890,458" fill="none" stroke="#2a78d6" stroke-width="1.6" opacity="0.6"/>
|
|
||||||
<path d="M850,430 L890,581" fill="none" stroke="#4a3aa7" stroke-width="2"/>
|
|
||||||
<path d="M850,565 L890,501" fill="none" stroke="#8a8984" stroke-width="1.6" opacity="0.6"/>
|
|
||||||
<path d="M850,565 L890,541" fill="none" stroke="#8a8984" stroke-width="1.6" opacity="0.6"/>
|
|
||||||
|
|
||||||
<!-- k8s VMs -->
|
|
||||||
<rect x="1240" y="388" width="290" height="230" rx="8" fill="#1baf7a" fill-opacity="0.07" stroke="#1baf7a"/>
|
|
||||||
<text x="1256" y="414" font-size="15" font-weight="700" fill="#0b0b0b">k8s VMs · 10.0.20.0/24</text>
|
|
||||||
<text x="1256" y="434" font-size="12.5" fill="#52514e">vmbr1 tag 20 · pod egress SNATs</text>
|
|
||||||
<text x="1256" y="450" font-size="12.5" fill="#52514e">to node IPs</text>
|
|
||||||
<rect x="1256" y="464" width="258" height="66" rx="6" fill="#ffffff" stroke="#1baf7a"/>
|
|
||||||
<text x="1268" y="486" font-size="13.5" font-weight="700" fill="#0b0b0b">Frigate · k8s-node1 (T4)</text>
|
|
||||||
<text x="1268" y="504" font-size="12" fill="#52514e">detect sub / record main</text>
|
|
||||||
<text x="1268" y="520" font-size="12" fill="#52514e">gpumem budget 2300 MiB</text>
|
|
||||||
<rect x="1256" y="540" width="258" height="52" rx="6" fill="#ffffff" stroke="#1baf7a"/>
|
|
||||||
<text x="1268" y="562" font-size="13.5" font-weight="700" fill="#0b0b0b">go2rtc LB 10.0.20.204</text>
|
|
||||||
<text x="1268" y="580" font-size="12" fill="#52514e">restream → HA live view (MSE/HLS)</text>
|
|
||||||
|
|
||||||
<!-- HOME LAN zone -->
|
|
||||||
<rect x="1148" y="128" width="412" height="180" rx="10" fill="#2a78d6" fill-opacity="0.06" stroke="#2a78d6" stroke-opacity="0.4"/>
|
|
||||||
<text x="1164" y="154" font-size="13" font-weight="700" fill="#52514e" letter-spacing="1">HOME LAN 192.168.1.0/24</text>
|
|
||||||
<rect x="1164" y="168" width="180" height="56" rx="6" fill="#ffffff" stroke="#2a78d6"/>
|
|
||||||
<text x="1176" y="190" font-size="13.5" font-weight="700" fill="#0b0b0b">AX6000 · .1</text>
|
|
||||||
<text x="1176" y="208" font-size="11.5" fill="#52514e">+ route 10.0.30.0/24 → .2</text>
|
|
||||||
<rect x="1164" y="236" width="180" height="52" rx="6" fill="#ffffff" stroke="#2a78d6"/>
|
|
||||||
<text x="1176" y="258" font-size="13.5" font-weight="700" fill="#0b0b0b">ha-sofia · .8</text>
|
|
||||||
<text x="1176" y="275" font-size="11.5" fill="#52514e">Frigate card + hikvision_next</text>
|
|
||||||
<rect x="1360" y="168" width="184" height="56" rx="6" fill="#ffffff" stroke="#2a78d6"/>
|
|
||||||
<text x="1372" y="190" font-size="13.5" font-weight="700" fill="#0b0b0b">apartment clients</text>
|
|
||||||
<text x="1372" y="208" font-size="11.5" fill="#52514e">laptops, phones</text>
|
|
||||||
<rect x="1360" y="236" width="184" height="52" rx="6" fill="#ffffff" stroke="#52514e" stroke-dasharray="5,4"/>
|
|
||||||
<text x="1372" y="256" font-size="11.5" font-weight="700" fill="#52514e">CAMERA DAY: static route</text>
|
|
||||||
<text x="1372" y="272" font-size="11.5" fill="#52514e">10.0.30.0/24 via 192.168.1.2</text>
|
|
||||||
|
|
||||||
<path d="M1254,308 C1150,352 950,372 790,400" fill="none" stroke="#2a78d6" stroke-width="2" opacity="0.6"/>
|
|
||||||
<text x="1010" y="374" font-size="12" fill="#2a78d6">apartment uplink · switch P1 · trunk · eno1</text>
|
|
||||||
|
|
||||||
<!-- FLOWS -->
|
|
||||||
<path d="M1256,497 C1010,690 330,730 120,650 C40,618 40,380 96,286" fill="none" stroke="#008300" stroke-width="3" marker-end="url(#arrGreen)"/>
|
|
||||||
<text x="620" y="700" font-size="13.5" font-weight="700" fill="#008300">ALLOW · Frigate → camera RTSP :554 (routed k8s → dCCTV; opt1 allow-all)</text>
|
|
||||||
|
|
||||||
<path d="M1164,262 C820,282 470,268 302,176 C286,167 278,166 270,172" fill="none" stroke="#008300" stroke-width="3" marker-end="url(#arrGreen)"/>
|
|
||||||
<text x="484" y="216" font-size="13.5" font-weight="700" fill="#008300">ALLOW · ha-sofia → camera :80 ISAPI + :554</text>
|
|
||||||
<text x="484" y="234" font-size="12" fill="#52514e">enters pfSense WAN · reply-to off · needs the AX6000 route</text>
|
|
||||||
|
|
||||||
<path d="M280,232 C660,200 860,320 936,386" fill="none" stroke="#008300" stroke-width="2" opacity="0.85" marker-end="url(#arrGreen)"/>
|
|
||||||
<text x="740" y="322" font-size="12.5" font-weight="700" fill="#008300">ALLOW · camera → 10.0.30.1:123 (NTP)</text>
|
|
||||||
|
|
||||||
<!-- LEGEND -->
|
|
||||||
<g transform="translate(40,800)" font-size="12.5">
|
|
||||||
<rect x="0" y="0" width="18" height="18" rx="4" fill="#2a78d6" fill-opacity="0.12" stroke="#2a78d6"/>
|
|
||||||
<text x="26" y="14" fill="#0b0b0b">home LAN / VLAN 1</text>
|
|
||||||
<rect x="200" y="0" width="18" height="18" rx="4" fill="#4a3aa7" fill-opacity="0.12" stroke="#4a3aa7" stroke-width="2"/>
|
|
||||||
<text x="226" y="14" fill="#0b0b0b">CCTV / VLAN 30 / dCCTV 10.0.30.0/24</text>
|
|
||||||
<rect x="500" y="0" width="18" height="18" rx="4" fill="#1baf7a" fill-opacity="0.12" stroke="#1baf7a"/>
|
|
||||||
<text x="526" y="14" fill="#0b0b0b">dKubernetes</text>
|
|
||||||
<rect x="640" y="0" width="18" height="18" rx="4" fill="#eda100" fill-opacity="0.14" stroke="#eda100"/>
|
|
||||||
<text x="666" y="14" fill="#0b0b0b">dManagementsVms</text>
|
|
||||||
<line x1="820" y1="9" x2="860" y2="9" stroke="#008300" stroke-width="3" marker-end="url(#arrGreen)"/>
|
|
||||||
<text x="870" y="14" fill="#0b0b0b">allowed flow</text>
|
|
||||||
<line x1="980" y1="9" x2="1020" y2="9" stroke="#e34948" stroke-width="3" marker-end="url(#arrRed)"/>
|
|
||||||
<text x="1030" y="14" fill="#0b0b0b">denied</text>
|
|
||||||
<line x1="1100" y1="9" x2="1140" y2="9" stroke="#52514e" stroke-width="2" stroke-dasharray="6,5"/>
|
|
||||||
<text x="1150" y="14" fill="#0b0b0b">camera-day step</text>
|
|
||||||
<text x="1320" y="14" fill="#52514e">ADR-0017 · rev 3</text>
|
|
||||||
</g>
|
|
||||||
</svg>
|
|
||||||
|
File diff suppressed because it is too large
File diff suppressed because one or more lines are too long
|
|
|
@ -1,47 +0,0 @@
|
||||||
# Valia sites are served off-infra (Cloudflare Pages), synced in-cluster

Valia (Viktor's mother) authors small one-page static sites in Google Drive folders she
shares, and keeps asking for them to be hosted — two exist already (`stem95su`, `bridge`)
and more are expected. We decided all **Valia sites** are served **off-infra on Cloudflare
Pages** under `<english-name>.viktorbarzin.me`, kept fresh by **one shared in-cluster
CronJob** (`stacks/valia-sites/`) that mirrors each **Content folder** every 10 minutes
(rclone, drive.readonly) and re-deploys only on change (wrangler direct upload). The
existing in-cluster `stem95su` serving stack (nginx + NFS + ingress + per-site sync)
migrates onto this and is retired.

Why off-infra serving: these are her sites, shown to teachers/parents — they must survive
homelab outages (cf. the 2026-06-27 egress incident that took every proxied in-cluster
site down). With Pages, a homelab outage degrades to "content frozen until we're back",
never "site down". Serving costs no cluster resources and no per-site nginx/PVC/ingress/
Anubis. Why the syncer stays in-cluster anyway: secrets stay in Vault (no per-site GHA
secret sprawl), and the stem95su guard patterns (hard-fail on Drive auth errors, never
wipe a live site on an empty/partial folder, capped deletes) carry over wholesale. The
deliberate asymmetry — off-infra serving, on-infra syncing — is the point, not an
accident.
|
|
||||||
|
|
||||||
## Considered options
|
|
||||||
|
|
||||||
- **In-cluster everywhere** (generalise stem95su into a factory module): one roof, no
|
|
||||||
Cloudflare Pages dependency — but her sites share the homelab's fate and each site
|
|
||||||
spends cluster resources to serve static files a free CDN serves better.
|
|
||||||
- **Pages for new sites only**: less work now, two patterns and two runbooks forever.
|
|
||||||
- **GHA-scheduled sync** (fully off-infra pipeline): no cluster dependency at all, but
|
|
||||||
Drive + Cloudflare credentials would live as GitHub secrets per repo, outside Vault.
|
|
||||||
|
|
||||||
## Consequences
|
|
||||||
|
|
||||||
- Registration is one entry in the `sites` map (name, Content folder, optional Entry
|
|
||||||
file); CI applies Pages project, custom domain, public CNAME, and internal-DNS config
|
|
||||||
together. Names are English, picked by Viktor (most → bridge set the precedent).
|
|
||||||
- The internal split-horizon zone learns Valia sites from a ConfigMap the
|
|
||||||
`technitium-ingress-dns-sync` script consumes — declaratively, including **removal**
|
|
||||||
(the previous static-CNAME approach was add-only; a retired site left a stale record).
|
|
||||||
- Deploy-on-change is mandatory, not an optimisation: Pages caps monthly deployments on
|
|
||||||
the free tier, and a 10-minute cadence would burn ~4,300/month if unchanged runs
|
|
||||||
deployed.
|
|
||||||
- Failure visibility is **failed-Job-only** by explicit choice (no stale-sync alert, no
|
|
||||||
per-site uptime monitors, no notifications to Valia) — Viktor fields "it didn't
|
|
||||||
update" reports, consistent with the alert-noise-reduction posture. Revisit if a
|
|
||||||
silent stall actually bites.
|
|
||||||
- If the homelab is down, content updates pause; the sites keep serving last-deployed
|
|
||||||
content. Accepted degradation.
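
The deploy-on-change rule and the empty-folder guard can be sketched as below. This is a minimal illustration under stated assumptions, not the CronJob's actual script: `tree_digest`, `should_deploy`, and the idea of persisting the last deployed digest are hypothetical, and the real job's rclone and wrangler calls are not reproduced. The constant at the end works out the "~4,300/month" figure for a 10-minute cadence over a 30-day month.

```python
import hashlib
from pathlib import Path
from typing import Optional


def tree_digest(root: Path) -> Optional[str]:
    """Hash relative paths and contents of every file; None if the folder is empty."""
    files = sorted(p for p in root.rglob("*") if p.is_file())
    if not files:
        return None
    h = hashlib.sha256()
    for p in files:
        # Include the path so a rename counts as a change.
        h.update(p.relative_to(root).as_posix().encode())
        h.update(b"\0")
        h.update(p.read_bytes())
        h.update(b"\0")
    return h.hexdigest()


def should_deploy(current: Optional[str], last_deployed: Optional[str]) -> bool:
    """Deploy only when the mirror is non-empty and differs from the last deploy."""
    if current is None:
        # Guard: never replace a live site with an empty mirror.
        return False
    return current != last_deployed


# Runs per 30-day month at a 10-minute cadence: 6 per hour * 24 * 30.
RUNS_PER_MONTH = (60 // 10) * 24 * 30
```

Hashing paths together with contents is one simple way to detect "no change"; comparing the sync tool's own change report would be an equally valid design.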
|
|
||||||
|
|
@ -1,97 +0,0 @@
# Inbound mail gets a self-hosted store-and-forward backup MX on Oracle Always-Free

`viktorbarzin.me` has run a single direct MX to the home IP since the 2026-04-12
inbound overhaul, with sender-MTA retry (1–5 days, sender-dependent) as the only
outage protection — a documented "No Backup MX" decision made after ForwardEmail's
forced anti-spoofing rejected legitimate forwarded mail and Cloudflare Email
Routing proved pass-through-only. Viktor now wants inbound mail to survive
homelab outages **without loss** (2026-07-04): delayed delivery is fine,
mid-outage reading is not required, and the budget is **$0** — a hard
constraint that eliminated every managed option (see below).

We run a minimal **Postfix store-and-forward relay on an Oracle Cloud
Always-Free `VM.Standard.E2.1.Micro`** (`mx2.viktorbarzin.me`, **reserved**
public IP, MX preference 20; primary untouched at 1). It accepts everything
for the domain (catch-all — every RCPT is valid; reputation may only ever
4xx-defer, via postscreen pregreet + conservative DNSBL-defer on the VM —
never 5xx: a backup MX that hard-rejects manufactures the loss it exists to
prevent), queues up to **30 days** (bounce lifetime 1 day — the VM can never
deliver a DSN, its only egress is the drain), and drains to the primary over
**port 2526** — one scripted pfSense WAN NAT rule onto the existing HAProxy
frontend — because Oracle blocks egress TCP 25 tenancy-wide. Management is
tailnet-only (headscale preauth key, `tag:backup-mx`; OCI console as
mid-outage break-glass since headscale itself lives in the cluster); TLS via
certbot HTTP-01 (port 80 permanently open — LE validation is
multi-perspective and unscopeable); the VM is a cattle-rebuild from a new
`stacks/backup-mx/` Terraform stack (OCI provider + cloud-init, which must
also punch 25/80 through the OCI Ubuntu image's OS-level iptables REJECT).
On the primary, the drain stream (one /32) is enabled at the layers that
actually bite — `check_client_access` permits past
`reject_unknown_client_hostname` and spoof-protection, an anvil rate-limit
exception, and rspamd `external_relay` (score against the *original* sender
IP) with the reject action capped to tag/fold so drained spam can never force
the VM to emit backscatter. Go-live is gated on empirical checks: inbound-25
reachability (recurring probe — Oracle publishes no commitment), drain
end-to-end, and a live failover test that includes a high-spam-score and a >10 MB
message. Two independent adversarial reviews (2026-07-04) shaped this
final form. Design:
[`plans/2026-07-04-backup-mx-design.md`](../plans/2026-07-04-backup-mx-design.md).

## Considered options

- **Roller Network free Secondary MX** — v1 of this decision, killed at the
  validation gates the same day: free tier caps at 200 relayed messages or
  10 MB per rolling 7 days, and overage suspends the domain for 48 h
  answering **SMTP 5xx** (permanent bounces) — since spammers target backup
  MXes even while the primary is up, background spam alone can hold it
  suspended, making it *worse than no backup MX*. Free accounts are also
  being discontinued. (Their TLS checked out; their paid Basic at $30/yr is
  the documented fallback if the OCI route sours.)
- **Dynu Email Backup ($9.99/yr)** — queue lifetime undocumented (FAQ hints
  12–24 h, barely beating sender retry); filtering black-box; not free.
- **Cloudflare Email Routing / mailflare** — no store-and-forward / terminal
  inbox on Cloudflare; rejected earlier (2026-04-12; 2026-07-04 memory #7148).
- **Other free tiers** (challenged and re-verified 2026-07-04): GCP e2-micro
  blocks egress 25 too and its free regions are US-only; AWS's 2025+ "free"
  plan is a 6-month credit; Azure has no always-free VM and blocks 25;
  Hetzner has no free tier; Fly.io ended free allowances; Vultr/Linode are
  trial credits; DNSExit/KisoLabs/DuoCircle backup-MX are paid or dead. OCI
  is the only standing free option.
- **Harden-only** (5xx-misconfig guards + paging) — does not address
  multi-day outages or short-retry senders; deferred as a complementary
  track.

## Consequences

- **A pet outside the cluster** — deliberately cattle: rebuilt entirely from
  Terraform + cloud-init, patched by unattended-upgrades, scraped by the
  cluster's Prometheus (exporters on the reserved public IP, allowlisted to
  the homelab WAN /32 — there is **no cluster→tailnet route**, so tailnet
  scraping was rejected as fictional; blackbox TCP:25 + MX-set drift alerts
  besides). Never a backup target itself.
- **Oracle free-tier caprice is the top risk**: Oracle silently halved the A1
  free allowance in June 2026 and terminated over-limit instances, and
  publishes no commitment that inbound 25 stays open. Mitigations:
  **Pay-As-You-Go conversion is a required prerequisite** (exempts idle
  reclamation, stays $0), a recurring inbound-25 probe, `BackupMxDown`, and
  the queue being empty outside outages (a surprise reclamation loses
  coverage, never mail). Home region is fixed at signup — Frankfurt, chosen
  once.
- The drain stream bypasses `reject_unknown_client_hostname`, anvil limits,
  and rspamd's reject tier for one /32; DKIM verification, SPF/DMARC (against
  the original IP via `external_relay`), and content scoring stay on — spam
  arriving via the backup is tagged and folded to Junk, never bounced. The VM
  is deliberately NOT in the primary's `mynetworks` (a compromised VM must
  not relay through us).
- **Outages > 30 days lose queued mail silently** — no DSN can ever leave the
  VM. Stated and accepted (6× better than the status quo).
- Outage mail sits in plaintext on Oracle disk ≤ 30 days — single-tenant but
  off-premises; accepted (same class as Brevo holding outbound today).
- Cloudflare zone lands at 197/200 records; the MTA-STS follow-up (policy
  host found dangling during design — inert today; must list `mx2` when
  fixed) needs 1–2 more → schedule the next record purge proactively.
- `architecture/mailserver.md` §"No Backup MX" superseded at implementation;
  new runbook `docs/runbooks/backup-mx.md` (incl. OCI console break-glass);
  `vpn.md`'s stale headscale claims fixed in passing; the roundtrip probe's
  failure semantics change (a "failing" probe may now mean "delayed via mx2,
  drains shortly" — noted in alert description).
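
The MX ordering and the "6× better than the status quo" figure can be checked with a few lines. This is a simplified model, not a description of any real MTA: it assumes senders try the lowest preference value first, that the backup MX stays reachable throughout the outage, and it uses the best-case 5-day sender retry as the baseline. The function names are hypothetical.

```python
from datetime import timedelta

# (host, preference) as described in the ADR: primary at 1, backup at 20.
MX_RECORDS = [("mx2.viktorbarzin.me", 20), ("mail.viktorbarzin.me", 1)]

QUEUE_LIFETIME = timedelta(days=30)   # backup MX queue lifetime
SENDER_RETRY_MAX = timedelta(days=5)  # upper end of the 1-5 day sender retry range


def delivery_order(records):
    """Hosts in the order a sender tries them: lowest preference value first."""
    return [host for host, _pref in sorted(records, key=lambda r: r[1])]


def mail_survives(outage: timedelta, backup_mx: bool) -> bool:
    """True if mail sent at the start of the outage is still queued when it ends."""
    limit = QUEUE_LIFETIME if backup_mx else SENDER_RETRY_MAX
    return outage <= limit


# 30 days of queueing against at most 5 days of sender retry.
IMPROVEMENT = QUEUE_LIFETIME / SENDER_RETRY_MAX
```

Against a sender that gives up after one day the ratio is larger, so 6× is the conservative end of the comparison.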
|
|
||||||
|
|
@ -329,12 +329,6 @@ Two independent grants make up "browser access" for a user:
|
||||||
the provisioner. To revoke: remove from `CHROME_ALLOWED` and delete the SA (rotate
|
the provisioner. To revoke: remove from `CHROME_ALLOWED` and delete the SA (rotate
|
||||||
a token by deleting its `<user>-browser-token` Secret).
|
a token by deleting its `<user>-browser-token` Secret).
|
||||||
|
|
||||||
Because the SA is the user's DEFAULT kubectl credential, other per-namespace
|
|
||||||
port-forward grants hang off the same identity: `stacks/excalidraw/rbac.tf`
|
|
||||||
grants `emo-browser` `pods/portforward` in `excalidraw` (2026-07-02) so emo's
|
|
||||||
agent can upload drawings via the port-forward + `X-Authentik-Username` recipe
|
|
||||||
in his `~/.claude/CLAUDE.md`. Revoking the SA revokes those too.
|
|
||||||
|
|
||||||
## Limits + risks
|
## Limits + risks
|
||||||
|
|
||||||
- **Anti-bot vs stealth arms race** — when an upstream beats us (DRM
|
- **Anti-bot vs stealth arms race** — when an upstream beats us (DRM
|
||||||
|
|
|
||||||
|
|
@ -94,7 +94,7 @@ can't reach Forgejo's public hairpin.
|
||||||
| Visibility | Packages | Pull mechanism |
|
| Visibility | Packages | Pull mechanism |
|
||||||
|------------|----------|----------------|
|
|------------|----------|----------------|
|
||||||
| **Public** | beadboard, nextcloud-todos, claude-agent-service, claude-memory-mcp, kms-website, freedify, tuya_bridge, x402-gateway, chrome-service-novnc, android-emulator | Anonymous |
|
| **Public** | beadboard, nextcloud-todos, claude-agent-service, claude-memory-mcp, kms-website, freedify, tuya_bridge, x402-gateway, chrome-service-novnc, android-emulator | Anonymous |
|
||||||
| **Private** | f1-stream, job-hunter, instagram-poster, payslip-ingest, wealthfolio-sync, fire-planner, recruiter-responder, tripit, infra-cli, infra-ci, k8s-portal, excalidraw-library | `ghcr-credentials` dockerconfigjson |
|
| **Private** | f1-stream, job-hunter, instagram-poster, payslip-ingest, wealthfolio-sync, fire-planner, recruiter-responder, tripit, infra-cli, infra-ci | `ghcr-credentials` dockerconfigjson |
|
||||||
|
|
||||||
Private-image pulls use the `ghcr-credentials` dockerconfigjson, cloned by the
|
Private-image pulls use the `ghcr-credentials` dockerconfigjson, cloned by the
|
||||||
kyverno stack's `sync-ghcr-credentials` ClusterPolicy to an explicit
|
kyverno stack's `sync-ghcr-credentials` ClusterPolicy to an explicit
|
||||||
|
|
@ -188,8 +188,6 @@ reconciled — the workflows were added to the GitHub lineage via PR):
|
||||||
| android-emulator | `build-android-emulator.yml` | public `ghcr.io/viktorbarzin/android-emulator` |
|
| android-emulator | `build-android-emulator.yml` | public `ghcr.io/viktorbarzin/android-emulator` |
|
||||||
| infra CLI | `build-cli.yml` | DockerHub `viktorbarzin/infra` (kept) + `ghcr.io/viktorbarzin/infra-cli` |
|
| infra CLI | `build-cli.yml` | DockerHub `viktorbarzin/infra` (kept) + `ghcr.io/viktorbarzin/infra-cli` |
|
||||||
| infra-ci | `build-infra-ci.yml` | private `ghcr.io/viktorbarzin/infra-ci` |
|
| infra-ci | `build-infra-ci.yml` | private `ghcr.io/viktorbarzin/infra-ci` |
|
||||||
| k8s-portal | `build-k8s-portal.yml` | private `ghcr.io/viktorbarzin/k8s-portal` (Keel rolls `:latest` digests) |
|
|
||||||
| excalidraw-library | `build-excalidraw.yml` | private `ghcr.io/viktorbarzin/excalidraw-library` (Keel rolls `:latest` digests; DockerHub `:v4` frozen as rollback) |
|
|
||||||
|
|
||||||
**`infra-ci`** is the image the `.woodpecker/default.yml` apply step and
|
**`infra-ci`** is the image the `.woodpecker/default.yml` apply step and
|
||||||
`drift-detection.yml` run in (proven by pipelines 165/166). `chatterbox-tts` is
|
`drift-detection.yml` run in (proven by pipelines 165/166). `chatterbox-tts` is
|
||||||
|
|
|
||||||
|
|
@ -277,7 +277,7 @@ Technitium's **Split Horizon AddressTranslation** app post-processes DNS respons
|
||||||
|
|
||||||
Config is synced to all 3 Technitium instances by CronJob `technitium-split-horizon-sync` (every 6h).
|
Config is synced to all 3 Technitium instances by CronJob `technitium-split-horizon-sync` (every 6h).
|
||||||
|
|
||||||
**Superset rule for the internal `viktorbarzin.me` zone**: it is authoritative for every internal client (pods included since 2026-06-10), so it must carry every record type those clients consume — not just ingress A/CNAMEs. The `technitium-ingress-dns-sync` CronJob therefore also maintains the static **mail-auth records** (apex SPF + brevo-code TXT, MX → mail.viktorbarzin.me, `_dmarc`, `mail._domainkey` DKIM), mirrored from the public Cloudflare zone. Without them, rspamd on the mailserver saw `SPF=none` for inbound `@viktorbarzin.me` mail and quarantined it (broke the Brevo email-roundtrip probe, 2026-06-10). If these records change in Cloudflare, update the sync script too. **Off-infra Valia sites** (Cloudflare Pages, ADR-0018) are the other class of public-only names with no Traefik ingress — without internal records they NXDOMAIN for every internal client while working fine externally. Since 2026-07-03 they are reconciled **declaratively**: `stacks/valia-sites` writes the ConfigMap `valia-sites-dns` (technitium ns, `<name> → <project>.pages.dev`), and the sync script ensures/updates a CNAME per entry and **deletes** stale internal CNAMEs targeting `*.pages.dev` that left the map (retire/rename cleans itself up; deletion is suffix-scoped so nothing else can be touched).
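
The declarative reconcile described for Valia sites (ensure a CNAME per map entry, delete stale CNAMEs that target `*.pages.dev` and have left the map) can be sketched as a pure planning function. This is an illustration only: `plan_cname_changes` and the dictionary shapes are hypothetical, and no Technitium API call is shown.

```python
from typing import Dict, List, Tuple

PAGES_SUFFIX = ".pages.dev"


def plan_cname_changes(
    desired: Dict[str, str], existing: Dict[str, str]
) -> Tuple[Dict[str, str], List[str]]:
    """Return (upserts, deletes) needed to make `existing` match `desired`.

    desired:  name -> target from the ConfigMap.
    existing: name -> target for CNAMEs currently in the internal zone.
    """
    # Create or update any entry whose target is missing or different.
    upserts = {
        name: target
        for name, target in desired.items()
        if existing.get(name) != target
    }
    # Suffix-scoped deletion: only records pointing at *.pages.dev may be removed,
    # so unrelated CNAMEs in the zone are never touched.
    deletes = sorted(
        name
        for name, target in existing.items()
        if name not in desired and target.rstrip(".").endswith(PAGES_SUFFIX)
    )
    return upserts, deletes
```

Keeping the plan separate from the API calls makes the deletion scope easy to test before anything is applied.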
|
**Superset rule for the internal `viktorbarzin.me` zone**: it is authoritative for every internal client (pods included since 2026-06-10), so it must carry every record type those clients consume — not just ingress A/CNAMEs. The `technitium-ingress-dns-sync` CronJob therefore also maintains the static **mail-auth records** (apex SPF + brevo-code TXT, MX → mail.viktorbarzin.me, `_dmarc`, `mail._domainkey` DKIM), mirrored from the public Cloudflare zone. Without them, rspamd on the mailserver saw `SPF=none` for inbound `@viktorbarzin.me` mail and quarantined it (broke the Brevo email-roundtrip probe, 2026-06-10). If these records change in Cloudflare, update the sync script too.
|
||||||
|
|
||||||
## NodeLocal DNSCache
|
## NodeLocal DNSCache
|
||||||
|
|
||||||
|
|
@ -368,7 +368,6 @@ The Cloudflare tunnel uses a **wildcard rule** (`*.viktorbarzin.me → Traefik`)
|
||||||
| TXT (MTA-STS) | 1 | `v=STSv1; id=20260412` | TLS enforcement |
|
| TXT (MTA-STS) | 1 | `v=STSv1; id=20260412` | TLS enforcement |
|
||||||
| TXT (TLSRPT) | 1 | `v=TLSRPTv1; rua=mailto:postmaster@...` | TLS reporting |
|
| TXT (TLSRPT) | 1 | `v=TLSRPTv1; rua=mailto:postmaster@...` | TLS reporting |
|
||||||
| A (keyserver) | 1 | `130.162.165.220` (Oracle VPS) | PGP keyserver |
|
| A (keyserver) | 1 | `130.162.165.220` (Oracle VPS) | PGP keyserver |
|
||||||
| CNAME (CF Pages) | 2 | `<project>.pages.dev` (Cloudflare Pages) | bridge, stem95su — Valia sites (ADR-0018), managed by `stacks/valia-sites` |
|
|
||||||
|
|
||||||
### Proxied vs Non-Proxied
|
### Proxied vs Non-Proxied
|
||||||
|
|
||||||
|
|
@ -514,7 +513,6 @@ For external `.viktorbarzin.me` records:
|
||||||
1. Add `dns_type = "proxied"` (or `"non-proxied"`) to the `ingress_factory` module call in the service stack
|
1. Add `dns_type = "proxied"` (or `"non-proxied"`) to the `ingress_factory` module call in the service stack
|
||||||
2. Run `scripts/tg apply` on the service stack — DNS record is auto-created
|
2. Run `scripts/tg apply` on the service stack — DNS record is auto-created
|
||||||
3. For non-standard records (MX, TXT), add a `cloudflare_record` resource in `stacks/cloudflared/modules/cloudflared/cloudflare.tf`
|
3. For non-standard records (MX, TXT), add a `cloudflare_record` resource in `stacks/cloudflared/modules/cloudflared/cloudflare.tf`
|
||||||
4. For a Valia site (off-infra Cloudflare Pages), add the entry to `local.sites` in `stacks/valia-sites/main.tf` — public CNAME + internal record both follow (`docs/runbooks/valia-sites.md`)
|
|
||||||
|
|
||||||
## Incident History
|
## Incident History
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -161,17 +161,6 @@ https://mail.viktorbarzin.me → Traefik → Roundcubemail
|
||||||
DB: MySQL (mysql.dbaas.svc.cluster.local)
|
DB: MySQL (mysql.dbaas.svc.cluster.local)
|
||||||
```
|
```
|
||||||
|
|
||||||
### Paperless ingest mailbox (docs@)
|
|
||||||
|
|
||||||
`docs@viktorbarzin.me` is a dedicated real mailbox (explicit self-alias in
|
|
||||||
`extra/aliases.txt` so the `@domain → spam@` catch-all doesn't shadow it) that
|
|
||||||
paperless-ngx polls over IMAP; family members forward document emails to it
|
|
||||||
and the sender maps 1:1 to a paperless account. A per-user Dovecot sieve
|
|
||||||
(`docs-at-viktorbarzin.me.dovecot.sieve` in the `mailserver.config` ConfigMap,
|
|
||||||
mounted as `/tmp/docker-mailserver/docs@viktorbarzin.me.dovecot.sieve`)
|
|
||||||
discards mail from non-allowlisted senders at delivery. Full flow, sender map,
|
|
||||||
and add-a-sender procedure: [`runbooks/paperless-mail-ingest.md`](../runbooks/paperless-mail-ingest.md).
|
|
||||||
|
|
||||||
## DNS Records
|
## DNS Records
|
||||||
|
|
||||||
All managed in Terraform at `stacks/cloudflared/modules/cloudflared/cloudflare.tf`.
|
All managed in Terraform at `stacks/cloudflared/modules/cloudflared/cloudflare.tf`.
|
||||||
|
|
@ -311,21 +300,6 @@ Push secrets (`BREVO_API_KEY`, `EMAIL_MONITOR_IMAP_PASSWORD`) come from External
|
||||||
|
|
||||||
## Troubleshooting
|
## Troubleshooting
|
||||||
|
|
||||||
### All mail tempfailing with `451 4.3.0 queue file write error` (postsrsd spin)
|
|
||||||
|
|
||||||
Seen 2026-07-03 right after a pod restart. Signature in `/var/log/mail/mail.log`:
|
|
||||||
`postfix/cleanup: warning: tcp:localhost:10001 lookup error` +
|
|
||||||
`sender_canonical_maps map lookup problem ... message not accepted, try again later`.
|
|
||||||
Cause: **postsrsd** (SRS daemon, `sender_canonical_maps = tcp:localhost:10001`)
|
|
||||||
came up spinning at 100% CPU without binding 10001/10002 — supervisor shows it
|
|
||||||
`RUNNING` but `ss -ltn | grep 1000` is empty and its log is empty. Postfix then
|
|
||||||
tempfails every message (inbound AND submission); senders retry so nothing is
|
|
||||||
lost, and the roundtrip probe alerts within the hour.
|
|
||||||
Fix: `supervisorctl restart postsrsd` inside the container; if the fresh
|
|
||||||
process spins again (it did once), `kubectl -n mailserver delete pod` for a
|
|
||||||
full re-init — that healed it. Root cause not pinned down (one-off bad init;
|
|
||||||
postsrsd 1.10).
|
|
||||||
|
|
||||||
### Inbound mail not arriving
|
### Inbound mail not arriving
|
||||||
1. **DNS/MX**: `dig MX viktorbarzin.me +short` → should show `mail.viktorbarzin.me`
|
1. **DNS/MX**: `dig MX viktorbarzin.me +short` → should show `mail.viktorbarzin.me`
|
||||||
2. **WAN reachability**: `nc -zw5 mail.viktorbarzin.me 25` from outside
|
2. **WAN reachability**: `nc -zw5 mail.viktorbarzin.me 25` from outside
|
||||||
|
|
|
||||||
|
|
@ -1,10 +1,10 @@
|
||||||
# Networking Architecture
|
# Networking Architecture
|
||||||
|
|
||||||
Last updated: 2026-07-02 (dCCTV segment added — dedicated pfSense leg for the garage camera, ADR-0017)
|
Last updated: 2026-04-19 (WS E — Kea DHCP pushes dual DNS per subnet; Kea DDNS TSIG-signed)
|
||||||
|
|
||||||
## Overview
|
## Overview
|
||||||
|
|
||||||
The homelab network is built on three isolated segments behind pfSense (management VLAN 10, Kubernetes VLAN 20, and the physically-legged dCCTV camera segment — see ADR-0017) with pfSense providing gateway services, Technitium for internal DNS, and Cloudflare for external DNS. Traefik serves as the Kubernetes ingress controller with a middleware chain of anti-AI bot-blocking, Authentik forward-auth, rate limiting, and retry. CrowdSec IP-reputation enforcement is **out-of-band** (not a Traefik hop): banned IPs are dropped in-kernel via nftables on direct hosts and blocked at the Cloudflare edge on proxied hosts (see `docs/architecture/security.md`). All HTTP traffic flows through Cloudflared tunnels, avoiding the need for port forwarding or exposing public IPs.
|
The homelab network is built on a dual-VLAN architecture with pfSense providing gateway services, Technitium for internal DNS, and Cloudflare for external DNS. Traefik serves as the Kubernetes ingress controller with a middleware chain of anti-AI bot-blocking, Authentik forward-auth, rate limiting, and retry. CrowdSec IP-reputation enforcement is **out-of-band** (not a Traefik hop): banned IPs are dropped in-kernel via nftables on direct hosts and blocked at the Cloudflare edge on proxied hosts (see `docs/architecture/security.md`). All HTTP traffic flows through Cloudflared tunnels, avoiding the need for port forwarding or exposing public IPs.
|
||||||
|
|
||||||
## Architecture Diagram
|
## Architecture Diagram
|
||||||
|
|
||||||
|
|
@ -24,14 +24,9 @@ graph TB
|
||||||
|
|
||||||
CSdrop[CrowdSec drop<br/>nftables / CF edge<br/>out-of-band, pre-Traefik]
|
CSdrop[CrowdSec drop<br/>nftables / CF edge<br/>out-of-band, pre-Traefik]
|
||||||
|
|
||||||
subgraph "Proxmox Host (eno1, eno2)"
|
subgraph "Proxmox Host (eno1)"
|
||||||
vmbr0[vmbr0 Bridge<br/>192.168.1.127/24]
|
vmbr0[vmbr0 Bridge<br/>192.168.1.127/24]
|
||||||
vmbr1[vmbr1 Internal<br/>VLAN-aware]
|
vmbr1[vmbr1 Internal<br/>VLAN-aware]
|
||||||
vmbr2[vmbr2 Bridge<br/>eno2 → TL-SG105PE]
|
|
||||||
|
|
||||||
subgraph "dCCTV - 10.0.30.0/24<br/>ADR-0017"
|
|
||||||
Camera[vermont-garage<br/>10.0.30.70]
|
|
||||||
end
|
|
||||||
|
|
||||||
subgraph "VLAN 10 - Management<br/>10.0.10.0/24"
|
subgraph "VLAN 10 - Management<br/>10.0.10.0/24"
|
||||||
Proxmox[Proxmox Host<br/>10.0.10.1]
|
Proxmox[Proxmox Host<br/>10.0.10.1]
|
||||||
|
|
@ -76,9 +71,6 @@ graph TB
|
||||||
vmbr1 -.VLAN 20.- Tech
|
vmbr1 -.VLAN 20.- Tech
|
||||||
vmbr1 -.VLAN 20.- Master
|
vmbr1 -.VLAN 20.- Master
|
||||||
vmbr1 -.VLAN 20.- Node1
|
vmbr1 -.VLAN 20.- Node1
|
||||||
vmbr2 -.physical link.- eno2
|
|
||||||
vmbr2 -.untagged.- Camera
|
|
||||||
vmbr2 -.pfSense net3 = dCCTV 10.0.30.1.- pfSense
|
|
||||||
```
|
```
|
||||||
|
|
||||||
## Components
|
## Components
|
||||||
|
|
@ -89,7 +81,6 @@ graph TB
|
||||||
| phpIPAM | v1.7.0 | phpipam.viktorbarzin.me | IP address management, device inventory, DNS sync |
|
| phpIPAM | v1.7.0 | phpipam.viktorbarzin.me | IP address management, device inventory, DNS sync |
|
||||||
| vmbr0 | Linux bridge | 192.168.1.127/24 | Physical bridge on eno1, uplink to LAN |
|
| vmbr0 | Linux bridge | 192.168.1.127/24 | Physical bridge on eno1, uplink to LAN |
|
||||||
| vmbr1 | Linux bridge (VLAN-aware) | Internal | VLAN trunk for VM isolation |
|
| vmbr1 | Linux bridge (VLAN-aware) | Internal | VLAN trunk for VM isolation |
|
||||||
| vmbr2 | Linux bridge | Physical (eno2) | DORMANT fallback leg for dCCTV (ADR-0017 rev 3) — live dCCTV rides vmbr0 tag 30 over the LAN1 trunk |
|
|
||||||
| Technitium DNS | Container | 10.0.20.201 (LB) / 10.96.0.53 (ClusterIP) | Internal DNS (viktorbarzin.lan) + full recursive resolver |
|
| Technitium DNS | Container | 10.0.20.201 (LB) / 10.96.0.53 (ClusterIP) | Internal DNS (viktorbarzin.lan) + full recursive resolver |
|
||||||
| Cloudflare DNS | SaaS | External | ~50 public domains under viktorbarzin.me |
|
| Cloudflare DNS | SaaS | External | ~50 public domains under viktorbarzin.me |
|
||||||
| Cloudflared | Container | K8s (3 replicas) | Tunnel ingress, replaces port forwarding |
|
| Cloudflared | Container | K8s (3 replicas) | Tunnel ingress, replaces port forwarding |
|
||||||
|
|
@ -99,22 +90,6 @@ graph TB
|
||||||
| MetalLB | v0.15.3 Helm chart | K8s | LoadBalancer IPs (10.0.20.200-10.0.20.220), all services on 10.0.20.200 |
|
| MetalLB | v0.15.3 Helm chart | K8s | LoadBalancer IPs (10.0.20.200-10.0.20.220), all services on 10.0.20.200 |
|
||||||
| Registry Cache | Container | 10.0.20.10 | Pull-through for docker.io:5000, ghcr.io:5010 |
|
| Registry Cache | Container | 10.0.20.10 | Pull-through for docker.io:5000, ghcr.io:5010 |
|
||||||
|
|
||||||
## CCTV Segment (dCCTV) — as-built 2026-07-02
|
|
||||||
|
|
||||||
Isolated camera segment for owned cameras at the Sofia site (first: `vermont-garage`, HiLook IPC-T241H-C at the garage entrance). Decision + rejected alternatives: `docs/adr/0017-cctv-segment-dedicated-pfsense-leg.md`.
|
|
||||||
|
|
||||||
**Physical path (rev 3, single switch)**: camera → TL-SG105PE PoE port (untagged VLAN 30) → trunk port (home LAN untagged + CCTV **tagged 30**) → the existing LAN1 cable → R730 `eno1` → `vmbr0` (vlan-aware) → pfSense `net3`/vtnet3 = `vmbr0 tag=30` = interface **dCCTV `10.0.30.1/24`**. The TL-SG105PE **replaces** the old garage TL-SG105E (retired to cold spare) and carries everything: apartment uplink, 4G router `192.168.1.7`, UPS mgmt (VLAN 1), camera (VLAN 30), trunk — all 5 ports used. VLAN-30 membership is {camera port, trunk port} only, so tagged injection from other ports is dropped. `eno2`/`vmbr2` remain dormant as the fallback physical leg (rev 2).
|
|
||||||
|
|
||||||
**Addressing**: Kea DHCP pool `10.0.30.100-199`; devices get MAC reservations (camera `10.0.30.70`; the PE switch mgmt inherits the retired switch's `192.168.1.6` on the home LAN). Kea DDNS auto-registers names in Technitium; `phpipam-pfsense-import` picks up leases hourly.
|
|
||||||
|
|
||||||
**Firewall** (all on pfSense):
|
|
||||||
- dCCTV in: pass `udp OPT4-net → 10.0.30.1:123` (NTP) — everything else hits the interface's default deny. Cameras cannot reach LAN, other segments, or the internet.
|
|
||||||
- WAN in (home LAN side): pass `192.168.1.8` (ha-sofia) → `10.0.30.70:80` (ISAPI/hikvision_next) and `:554` (RTSP), reply-to disabled on both.
|
|
||||||
- dKubernetes is allow-all, so cluster Frigate/go2rtc pulls RTSP with no extra rule (pod egress SNATs to node IPs).
|
|
||||||
- Home-LAN clients need the **AX6000 static route** `10.0.30.0/24 via 192.168.1.2` (camera-day step) to reach the camera UI.
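
The pass rules above can be modelled as a small allowlist with default deny. This is a sketch for reasoning about flows, not pfSense configuration: it assumes `OPT4-net` is the dCCTV subnet `10.0.30.0/24`, assumes TCP for the ISAPI and RTSP rules (the text does not name the protocol), and omits the allow-all dKubernetes interface. `is_allowed` is a hypothetical helper.

```python
from ipaddress import ip_address, ip_network

DCCTV_NET = ip_network("10.0.30.0/24")   # assumed to be what OPT4-net refers to
HA_SOFIA = ip_network("192.168.1.8/32")
CAMERA = ip_address("10.0.30.70")

# (source network, destination address, protocol, destination port)
PASS_RULES = [
    (DCCTV_NET, ip_address("10.0.30.1"), "udp", 123),  # NTP to pfSense
    (HA_SOFIA, CAMERA, "tcp", 80),                      # ISAPI / hikvision_next
    (HA_SOFIA, CAMERA, "tcp", 554),                     # RTSP
]


def is_allowed(src: str, dst: str, proto: str, port: int) -> bool:
    """True if some pass rule matches; anything else falls to default deny."""
    s, d = ip_address(src), ip_address(dst)
    return any(
        s in net and d == dest and proto == p and port == dport
        for net, dest, p, dport in PASS_RULES
    )
```

For example, `is_allowed("10.0.30.70", "192.168.1.8", "tcp", 443)` is false in this model, matching the statement that cameras cannot reach the LAN.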
|
|
||||||
|
|
||||||
**Consumers**: cluster Frigate (`/srv/nfs/frigate/config/config.yml` — NOT Terraform) pulls `rtsp://10.0.30.70:554` main+sub as `vermont-garage`; HA integrates via Frigate plus direct hikvision_next for tamper events.
|
|
||||||
|
|
||||||
## IPAM & DNS Auto-Registration
|
## IPAM & DNS Auto-Registration
|
||||||
|
|
||||||
Devices are automatically discovered, named, and registered in DNS without manual intervention.
|
Devices are automatically discovered, named, and registered in DNS without manual intervention.
|
||||||
|
|
@ -232,8 +207,6 @@ VMs tag traffic on vmbr1 to isolate workloads. pfSense bridges VLAN 20 to the up
|
||||||
- blog, hackmd, privatebin, url, echo, f1tv, excalidraw, send, audiobookshelf, jsoncrack, ntfy, cyberchef, homepage, linkwarden, changedetection, tandoor, n8n, stirling-pdf, dashy, city-guesser, travel, netbox
|
- blog, hackmd, privatebin, url, echo, f1tv, excalidraw, send, audiobookshelf, jsoncrack, ntfy, cyberchef, homepage, linkwarden, changedetection, tandoor, n8n, stirling-pdf, dashy, city-guesser, travel, netbox
|
||||||
- **Non-proxied domains** (grey cloud, direct IP resolution):
|
- **Non-proxied domains** (grey cloud, direct IP resolution):
|
||||||
- mail, wg, headscale, immich, calibre, vaultwarden, and other services requiring direct connections
|
- mail, wg, headscale, immich, calibre, vaultwarden, and other services requiring direct connections
|
||||||
- **Internal-IP domains** (grey cloud, A → `10.0.20.203` Traefik LB, `ingress_factory` `dns_type = "internal"`):
|
|
||||||
- highlights-immich, highlights-immich-emo — publicly *resolvable* but only *routable* from home LANs / WG sites / VPN (spokes policy-route `10.0.0.0/8` down the tunnel, so kiosk devices with baked-in URLs need no per-site DNS overrides). The record is reachability, not a gate — enforcement is the `home-lans-only` Traefik ipAllowList (Sofia/London/Valchedrym LANs + 10/8) on the ingress. See `docs/plans/2026-07-04-immich-frame-lan-only-design.md`.
|
|
||||||
- CNAME records for proxied domains point to Cloudflared tunnel FQDNs
|
- CNAME records for proxied domains point to Cloudflared tunnel FQDNs
|
||||||
|
|
||||||
### Ingress Flow
|
### Ingress Flow
|
||||||
|
|
@ -288,7 +261,7 @@ Traefik chain:
|
||||||
|
|
||||||
1. **Anti-AI bot-block** (`ai-bot-block` ForwardAuth, on by default via `ingress_factory`): blocks/tarpits known AI crawlers. **Fail-open** (currently a no-op `return 200` — poison-fountain scaled to 0; see `docs/architecture/security.md`).
|
1. **Anti-AI bot-block** (`ai-bot-block` ForwardAuth, on by default via `ingress_factory`): blocks/tarpits known AI crawlers. **Fail-open** (currently a no-op `return 200` — poison-fountain scaled to 0; see `docs/architecture/security.md`).
|
||||||
2. **Authentik Forward-Auth** (if `protected = true`): SSO authentication via OIDC. Non-authenticated users are redirected to login. Auth headers are stripped before forwarding to backend.
|
2. **Authentik Forward-Auth** (if `protected = true`): SSO authentication via OIDC. Non-authenticated users are redirected to login. Auth headers are stripped before forwarding to backend.
|
||||||
3. **Rate Limiting**: Per-IP throttling. Returns **429 Too Many Requests** (not 503) when limit exceeded. Default is `rate-limit` (average 10 req/s, burst 50). Services whose clients legitimately burst harder get a dedicated middleware via `skip_default_rate_limit = true` + `extra_middlewares`: Immich (`immich-rate-limit`, 1000/20000, photo uploads), ActualBudget (`actualbudget-rate-limit`, 50/300 — the Actual web app boots with ~70 parallel asset/migration revalidations; the default burst 429'd the tail and stalled every page load), authentik (`authentik-rate-limit`, 100/1000, on `/` and `/static` — the login SPA cold-loads ~70 flow-executor JS/CSS chunks from `/static`; the default burst 429'd the tail and a failed ES-module import left a blank login screen for cold/incognito/NAT-shared clients), tripit (`tripit-rate-limit`, 100/1000, photo-tab thumbnail bursts), health (`health-rate-limit`, 100/1000, SPA shell + API burst per page), and dawarich (`dawarich-rate-limit`, 100/1000 — the Rails app self-serves all fingerprinted assets and the map adds an API burst per load; the default burst 429'd the asset tail and risked dropping OwnTracks/mobile location POSTs on the same host).
|
3. **Rate Limiting**: Per-IP throttling. Returns **429 Too Many Requests** (not 503) when limit exceeded. Default is `rate-limit` (average 10 req/s, burst 50). Services whose clients legitimately burst harder get a dedicated middleware via `skip_default_rate_limit = true` + `extra_middlewares`: Immich (`immich-rate-limit`, 1000/20000, photo uploads), ActualBudget (`actualbudget-rate-limit`, 50/300 — the Actual web app boots with ~70 parallel asset/migration revalidations; the default burst 429'd the tail and stalled every page load), and authentik (`authentik-rate-limit`, 100/1000, on `/` and `/static` — the login SPA cold-loads ~70 flow-executor JS/CSS chunks from `/static`; the default burst 429'd the tail and a failed ES-module import left a blank login screen for cold/incognito/NAT-shared clients).
|
||||||
4. **Retry**: 2 attempts with 100ms delay on transient failures (5xx errors, connection errors).
|
||||||
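The rate-limit numbers above can be sanity-checked with a rough back-of-the-envelope sketch. It assumes a plain token bucket that starts full and ignores refill during the burst (at 10 req/s the refill over a short page load is only a few tokens), so it gives an upper bound on rejected requests. `rejected_in_burst` is a hypothetical helper for illustration, not part of the repo or of Traefik.

```bash
#!/usr/bin/env bash
# Hypothetical helper: how many of N simultaneous requests overflow a token
# bucket that starts full with <burst> tokens. Refill during the burst is
# ignored (simplifying assumption), so the result is an upper bound on 429s.
rejected_in_burst() {
  local burst="$1" parallel="$2"
  if [ "$parallel" -gt "$burst" ]; then
    echo $((parallel - burst))
  else
    echo 0
  fi
}

rejected_in_burst 50 70     # shared default burst vs ~70 parallel requests
rejected_in_burst 300 70    # actualbudget-rate-limit burst
rejected_in_burst 1000 70   # authentik-rate-limit burst
```

Under these assumptions the shared default rejects roughly the last 20 of 70 parallel requests, which matches the "429'd the tail" symptom described for ActualBudget and authentik, while the dedicated middlewares absorb the whole burst.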
|
|
||||||
Additional middleware:
|
||||||
|
|
@ -579,7 +552,7 @@ chain — a CrowdSec/LAPI outage cannot cause 503s; it only stops new bans.) Che
|
||||||
|
|
||||||
**Diagnosis**: Check Traefik middleware config for the affected IngressRoute.
|
||||||
|
|
||||||
**Fix**: Give the service a dedicated higher-limit middleware (don't loosen the shared default): define `<service>-rate-limit` in `stacks/traefik/modules/traefik/middleware.tf`, then set `skip_default_rate_limit = true` + `extra_middlewares = ["traefik-<service>-rate-limit@kubernetescrd"]` on its `ingress_factory` call. Shared default is average 10 req/s / burst 50; Immich uses 1000/20000, ActualBudget 50/300, and tripit/health/authentik/dawarich each 100/1000 (SPA or asset-heavy page loads bursting past the default from one client IP).
|
**Fix**: Give the service a dedicated higher-limit middleware (don't loosen the shared default): define `<service>-rate-limit` in `stacks/traefik/modules/traefik/middleware.tf`, then set `skip_default_rate_limit = true` + `extra_middlewares = ["traefik-<service>-rate-limit@kubernetescrd"]` on its `ingress_factory` call. Shared default is average 10 req/s / burst 50; Immich uses 1000/20000, ActualBudget 50/300, authentik 100/1000 (login SPA `/static` chunk burst → blank screen).
|
||||||
|
|
||||||
### Large Downloads or Uploads Truncate / Fail Partway
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -1,103 +0,0 @@
|
||||||
# Vault Token Renewer Self-Heal Design
|
|
||||||
|
|
||||||
**Date**: 2026-07-03
|
|
||||||
**Status**: Approved (brainstorm complete; implementation pending)
|
|
||||||
**Owner**: wizard@devvm
|
|
||||||
**Supersedes**: the "version-only, no self-heal" scope choice recorded in
|
|
||||||
`docs/runbooks/vault-token-renew-devvm.md` (2026-06-07)
|
|
||||||
|
|
||||||
## Problem
|
|
||||||
|
|
||||||
`wizard@devvm` holds a maintenance-free periodic Vault token
|
|
||||||
(`token-devvm-wizard`, `period=768h`, renewed daily by the
|
|
||||||
`vault-token-renew` user timer) precisely so no weekly re-login is needed.
|
|
||||||
But `~/.vault-token` is the Vault CLI's default token sink, so any
|
|
||||||
`vault login -method=oidc` — which the infra docs themselves instruct before
|
|
||||||
applies — overwrites it with a 7-day OIDC token. The renewer's drift guard
|
|
||||||
(deliberately detect-only) then refuses to renew the foreign token and fails
|
|
||||||
the unit daily, into a log nobody watches.
|
|
||||||
|
|
||||||
Observed consequence: a self-perpetuating weekly-expiry loop. The OIDC token
|
|
||||||
expires after 7 days → Vault 403s → the natural response is another
|
|
||||||
`vault login -method=oidc` → clobbers again. Drift persisted unnoticed
|
|
||||||
2026-06-18 → 06-26 and 2026-06-29 → 07-03 (memory #7121); Viktor experienced
|
|
||||||
it as "the token expires maybe once a week".
|
|
||||||
|
|
||||||
**Goal**: `vault login -method=oidc` becomes harmless on devvm. The renewer
|
|
||||||
converts any admin-capable clobber back into the permanent periodic token,
|
|
||||||
unattended. (Chosen over "never log in" doc-fixes and over instant path-unit
|
|
||||||
healing — see Alternatives.)
|
|
||||||
|
|
||||||
## Decisions
|
|
||||||
|
|
||||||
| # | Decision | Notes |
|
|
||||||
|---|----------|-------|
|
|
||||||
| 1 | Heal in the existing renewer's drift branch, at its nightly run | ~20-line diff to an already-tested script; no new units. A few-hours window holding the 7-day OIDC token is harmless (heal window 24h ≪ 7d TTL) |
|
|
||||||
| 2 | Heal = *attempt* re-mint using the foreign token itself; let Vault's 403 decide | No policy-list guessing — identity-vs-token-policies burned us before (memory #4211). OIDC tokens carry `vault-admin` via `identity_policies`, so the create succeeds |
|
|
||||||
| 3 | Weak foreign token (create denied) → keep today's loud DRIFT failure | A read-only clobber (e.g. the 2026-06-05 `kubernetes-woodpecker-default` incident) signals a misbehaving agent flow; auto-papering over it would hide the offender. Log gains a "heal denied — investigate what wrote it" suffix |
|
|
||||||
| 4 | Do NOT revoke the clobbering OIDC token | It may still back the user's live login session; it ages out in 7 days on its own |
|
|
||||||
| 5 | After a successful heal, revoke stale `token-devvm-wizard` accessors | Anti-sprawl: each heal would otherwise strand the previous periodic **admin** token server-side for up to 32 days. Walk `auth/token/accessors`, revoke every `display_name=token-devvm-wizard` except the just-minted one. Runs only on heal (rare), never on the happy path |
|
|
||||||
| 6 | Minted-token sanity check before writing the file | Look up the new token; require `display_name=token-devvm-wizard`. Write via temp file + `mv` + `chmod 600` so a failed mint can never truncate `~/.vault-token` |
|
|
||||||
| 7 | Keep timer cadence (daily) and all happy-path behavior unchanged | |
|
|
||||||
| 8 | No notification plumbing in this change | devvm alerting is tracked separately (beads `code-aslh`). Heal events are logged; heal-denied/FAIL still fail the unit |
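The "up to 32 days" figure in decision 5 follows directly from the token's `period=768h`. A minimal arithmetic sketch, with `period_days` and `period_seconds` as hypothetical helper names used only for illustration:

```bash
#!/usr/bin/env bash
# Hypothetical helpers: convert a Vault period given in whole hours.
period_days()    { echo $(( $1 / 24 )); }
period_seconds() { echo $(( $1 * 3600 )); }

period_days 768      # days a stranded periodic token could survive unrenewed
period_seconds 768   # the same period expressed in seconds
```

The seconds value is what a freshly renewed token's TTL should report, so it is a quick cross-check when reading the renewer's log.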
|
|
||||||
|
|
||||||
## Behavior matrix
|
|
||||||
|
|
||||||
| Token found in `~/.vault-token` | Before | After |
|---|---|---|
| Our periodic token | renew-self, log `OK` | unchanged |
| Foreign, admin-capable (OIDC login) | log `DRIFT`, exit 1 | re-mint periodic token with it, sanity-check, atomic write, revoke stale periodic accessors, log `HEALED: re-minted from foreign dn=<dn> (revoked N stale)`, exit 0 |
| Foreign, weak (read-only k8s clobber) | log `DRIFT`, exit 1 | log `DRIFT … heal denied — foreign token lacks create authority; investigate what wrote it`, exit 1 |
| Vault unreachable / lookup fails | log `FAIL`, exit 1 | unchanged |
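The "After" column of the matrix can be read as a small pure decision function. The sketch below is only an illustration under assumed inputs: `heal_outcome` and its three string arguments are hypothetical and are not the script's real interface.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the behavior matrix ("After" column).
# Assumed inputs, for illustration only:
#   $1 lookup: "ok" if the token lookup succeeded
#   $2 ours:   "yes" if the file holds our periodic token
#   $3 mint:   "ok" if Vault allowed the re-mint with the foreign token
# Prints "<log tag> <exit code>".
heal_outcome() {
  local lookup="$1" ours="$2" mint="$3"
  if [ "$lookup" != "ok" ]; then
    echo "FAIL 1"        # Vault unreachable / lookup fails
  elif [ "$ours" = "yes" ]; then
    echo "OK 0"          # happy path: renew-self
  elif [ "$mint" = "ok" ]; then
    echo "HEALED 0"      # foreign but admin-capable
  else
    echo "DRIFT 1"       # foreign and weak: stay loud
  fi
}

heal_outcome ok yes na
heal_outcome ok no ok
heal_outcome ok no denied
heal_outcome down na na
```

Note that the function never inspects policies: the only signal separating `HEALED` from `DRIFT` is whether the mint attempt succeeded, which mirrors decision 2.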
|
|
||||||
|
|
||||||
Re-mint command (identical to the manual recovery the DRIFT log already
|
|
||||||
prescribes):
|
|
||||||
|
|
||||||
```
|
|
||||||
vault token create -orphan -period=768h \
  -policy=vault-admin -policy=sops-admin -display-name=devvm-wizard
|
|
||||||
```
|
|
||||||
|
|
||||||
## Testing
|
|
||||||
|
|
||||||
- **Unit** (`scripts/test-vault-token-renew.sh`, existing source-the-functions
|
|
||||||
harness): new pure functions for (a) the stale-accessor revoke filter
|
|
||||||
(match on `display_name`, exclude the current accessor) and (b) the
|
|
||||||
minted-token sanity predicate; regression cases for the existing drift
|
|
||||||
predicate stay green.
|
|
||||||
- **Live, post-deploy** (on devvm):
  1. Mint a fake 1h admin token (`-display-name=fake-oidc`,
     `-policy=vault-admin -policy=sops-admin`), write to `~/.vault-token`,
     start the service → expect `HEALED`, file holds `token-devvm-wizard`.
  2. Mint a fake 10m no-privilege token (`-policy=default`), write it, start
     the service → expect `DRIFT … heal denied`, unit `failed`; restore real
     token.
  3. Revoke both fakes; one-off sweep of stale periodic accessors left by the
     June 26 / July 3 manual re-mints.
|
|
||||||
|
|
||||||
## Docs & rollout
|
|
||||||
|
|
||||||
- Same commit rewrites the runbook's "Drift guard & recovery" section:
|
|
||||||
self-heal is the recovery for admin-capable clobbers; manual re-mint remains
|
|
||||||
only for weak clobbers (or a dead token with no admin-capable replacement in
|
|
||||||
the file).
|
|
||||||
- `vault login -method=oidc` instructions across the docs stay as-is — the
|
|
||||||
login is now harmless by design.
|
|
||||||
- Deploy per the runbook's manual model: `install -m 0755` to
|
|
||||||
`~/.local/bin/vault-token-renew`. Units unchanged — no daemon-reload.
|
|
||||||
- After landing: update memories #4204/#4211 (gotcha now self-healing).
|
|
||||||
|
|
||||||
## Alternatives considered
|
|
||||||
|
|
||||||
- **Instant heal** (systemd path unit + protected source-copy of the token):
|
|
||||||
strictly more capable (seconds-latency, heals weak clobbers too, zero
|
|
||||||
re-minting), but 2 new units + a second secret file + inotify re-trigger
|
|
||||||
edge cases — machinery disproportionate to the residual risk. Revisit only
|
|
||||||
if the few-hour heal window ever bites.
|
|
||||||
- **Vault CLI `token_helper` interception**: right interception point in
|
|
||||||
theory, but a helper bug breaks every `vault` CLI call, Terraform reads
|
|
||||||
`~/.vault-token` natively anyway, and it adds latency inside login. Rejected.
|
|
||||||
- **Docs-only ("never log in")**: rejected by user — the login should keep
|
|
||||||
working, not become forbidden knowledge.
|
|
||||||
- **Raise the OIDC role's 7-day `token_max_ttl`**: shared role, affects every
|
|
||||||
OIDC user; rejected previously for the same reason (memory #4205).
|
|
||||||
|
|
@ -1,443 +0,0 @@
|
||||||
# Vault Token Renewer Self-Heal Implementation Plan
|
|
||||||
|
|
||||||
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
|
|
||||||
|
|
||||||
**Goal:** Make `vault login -method=oidc` harmless on devvm — the nightly renewer re-mints the permanent periodic token from any admin-capable clobber of `~/.vault-token`, unattended.
|
|
||||||
|
|
||||||
**Architecture:** Extend the drift branch of `scripts/vault-token-renew.sh` (deployed to `~/.local/bin/vault-token-renew`, driven by an existing systemd user timer). On drift, *attempt* the re-mint with the clobbering token itself and let Vault's 403 be the authority; sanity-check the minted token, replace the file atomically, then revoke stale `token-devvm-wizard` leftovers. Weak clobbers keep today's loud failure. Design: `docs/plans/2026-07-03-vault-token-self-heal-design.md`.
|
|
||||||
|
|
||||||
**Tech Stack:** bash + jq + vault CLI; existing test harness `scripts/test-vault-token-renew.sh` (sources the script, `vtr_main` is guarded).
|
|
||||||
|
|
||||||
**Working copy:** everything below runs in the worktree
|
|
||||||
`~/code/infra/.worktrees/vault-token-self-heal` on branch `wizard/vault-token-self-heal`.
|
|
||||||
Per repo policy, EVERY git command in this git-crypt repo worktree carries:
|
|
||||||
`-c filter.git-crypt.smudge=cat -c filter.git-crypt.clean=cat -c filter.git-crypt.required=false`
|
|
||||||
(abbreviated as `$GCFLAGS` below; define once per shell:
|
|
||||||
`GCFLAGS="-c filter.git-crypt.smudge=cat -c filter.git-crypt.clean=cat -c filter.git-crypt.required=false"`
|
|
||||||
and use it unquoted: `git $GCFLAGS <verb> …`).
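The reason `$GCFLAGS` must stay unquoted is bash word splitting: the single string has to become six separate arguments before git sees it. A minimal sketch, using a hypothetical `count_args` helper in place of git so it can run anywhere:

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for git: report how many arguments were received.
count_args() { echo "$#"; }

GCFLAGS="-c filter.git-crypt.smudge=cat -c filter.git-crypt.clean=cat -c filter.git-crypt.required=false"

count_args $GCFLAGS      # unquoted: bash splits on whitespace into 6 words
count_args "$GCFLAGS"    # quoted: one single word, which git cannot parse as intended
```

This relies on the default `IFS` and on bash-style splitting; shells that do not split unquoted variables (zsh in its default mode, for example) would need a different approach.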
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### Task 1: Unit tests for the two new pure functions (RED)
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Modify: `scripts/test-vault-token-renew.sh` (append before the final `printf`/exit lines)
|
|
||||||
|
|
||||||
- [ ] **Step 1: Append the failing tests**
|
|
||||||
|
|
||||||
Insert this block immediately after the existing "parse + decide end-to-end" section (after the line `no "oidc: parse+decide refused" …`, before the final `printf '\n%d passed…'`):
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# --- vtr_accessor: parse accessor out of lookup JSON ---
|
|
||||||
LOOKUP_NEW='{"data":{"display_name":"token-devvm-wizard","accessor":"acc-new","policies":["default","sops-admin","vault-admin"],"identity_policies":null}}'
|
|
||||||
eq "accessor parsed" "acc-new" "$(vtr_accessor "$LOOKUP_NEW")"
|
|
||||||
eq "accessor absent -> empty" "" "$(vtr_accessor '{"data":{"display_name":"x"}}')"
|
|
||||||
|
|
||||||
# --- vtr_is_stale_periodic: the heal's revoke filter — ONLY old token-devvm-wizard
|
|
||||||
# --- tokens are swept; the just-minted token, foreign tokens, and anything with an
|
|
||||||
# --- unknown accessor are kept. An empty keep-accessor sweeps NOTHING (fail-safe).
|
|
||||||
STALE_OURS='{"data":{"display_name":"token-devvm-wizard","accessor":"acc-old","policies":["default","sops-admin","vault-admin"]}}'
|
|
||||||
ok "older periodic token is stale" vtr_is_stale_periodic "$STALE_OURS" "acc-new"
|
|
||||||
no "the just-minted token is kept" vtr_is_stale_periodic "$LOOKUP_NEW" "acc-new"
|
|
||||||
no "foreign oidc token never swept" vtr_is_stale_periodic "$LOOKUP_OIDC" "acc-new"
|
|
||||||
no "woodpecker token never swept" vtr_is_stale_periodic "$LOOKUP_WP" "acc-new"
|
|
||||||
no "missing accessor never swept" vtr_is_stale_periodic '{"data":{"display_name":"token-devvm-wizard"}}' "acc-new"
|
|
||||||
no "empty keep-accessor sweeps nothing" vtr_is_stale_periodic "$STALE_OURS" ""
|
|
||||||
```
|
|
||||||
|
|
||||||
(`LOOKUP_OIDC` / `LOOKUP_WP` and the `ok`/`no`/`eq` helpers already exist in the file.)
|
|
||||||
|
|
||||||
- [ ] **Step 2: Run tests, verify they fail**
|
|
||||||
|
|
||||||
Run: `bash scripts/test-vault-token-renew.sh`
|
|
||||||
Expected: FAILs / `command not found` for `vtr_accessor` and `vtr_is_stale_periodic`; the 17 pre-existing tests stay green.
|
|
||||||
|
|
||||||
### Task 2: Implement the pure functions (GREEN)
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Modify: `scripts/vault-token-renew.sh` (insert after `vtr_drift_ok()`, before `vtr_main()`)
|
|
||||||
|
|
||||||
- [ ] **Step 1: Add the two functions**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# vtr_accessor <lookup-json> -> the token accessor (empty if absent).
|
|
||||||
vtr_accessor() {
  printf '%s' "$1" | jq -r '.data.accessor // ""'
}
|
|
||||||
|
|
||||||
# vtr_is_stale_periodic <lookup-json> <keep-accessor> -> 0 if this lookup
|
|
||||||
# describes one of OUR periodic tokens (display name matches) that is NOT the
|
|
||||||
# one to keep — i.e. a stale leftover a heal should revoke. 1 otherwise.
|
|
||||||
# Name-only on purpose (no policy check): anything named token-devvm-wizard
|
|
||||||
# that isn't the current token is garbage from a previous mint. An empty
|
|
||||||
# keep-accessor sweeps NOTHING (fail-safe: never revoke when we don't know
|
|
||||||
# which token is current).
|
|
||||||
vtr_is_stale_periodic() {
  local dn acc
  [ -n "${2:-}" ] || return 1
  dn=$(vtr_display_name "$1")
  acc=$(vtr_accessor "$1")
  [ "$dn" = "$EXPECTED_DN" ] || return 1
  [ -n "$acc" ] || return 1
  [ "$acc" != "$2" ]
}
|
|
||||||
```
|
|
||||||
|
|
||||||
- [ ] **Step 2: Run tests, verify all pass**
|
|
||||||
|
|
||||||
Run: `bash scripts/test-vault-token-renew.sh`
|
|
||||||
Expected: `25 passed, 0 failed`, exit 0.
|
|
||||||
|
|
||||||
- [ ] **Step 3: Commit**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
cd ~/code/infra/.worktrees/vault-token-self-heal
|
|
||||||
git $GCFLAGS add scripts/vault-token-renew.sh scripts/test-vault-token-renew.sh
|
|
||||||
git $GCFLAGS commit -m "vault-token-renew: pure helpers for the self-heal revoke filter
|
|
||||||
|
|
||||||
vtr_accessor parses the accessor from lookup JSON; vtr_is_stale_periodic
|
|
||||||
decides which old token-devvm-wizard tokens a heal may revoke (never the
|
|
||||||
just-minted one, never foreign tokens, nothing when the keeper is unknown).
|
|
||||||
TDD red-green for the heal branch that lands next."
|
|
||||||
```
|
|
||||||
|
|
||||||
### Task 3: The heal branch (`vtr_heal` + `vtr_main` wiring)
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Modify: `scripts/vault-token-renew.sh`
|
|
||||||
|
|
||||||
- [ ] **Step 1: Add `vtr_heal` after `vtr_is_stale_periodic()`, before `vtr_main()`**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# vtr_heal <foreign-dn> <log-file> -> 0 if ~/.vault-token was re-minted back to
|
|
||||||
# our periodic admin token using the foreign token's own authority, 1 if the
|
|
||||||
# heal was denied or failed (caller exits non-zero; the unit goes failed).
|
|
||||||
#
|
|
||||||
# Self-heal added 2026-07-03 (docs/plans/2026-07-03-vault-token-self-heal-design.md):
|
|
||||||
# an OIDC login — which the infra docs prescribe before applies — clobbers
|
|
||||||
# ~/.vault-token with a 7-day token, and detect-only drift left that unnoticed
|
|
||||||
# for weeks (the weekly-expiry loop). We ATTEMPT the re-mint with the
|
|
||||||
# clobbering token itself and let Vault's authz decide — a read-only clobber
|
|
||||||
# (the 2026-06-05 woodpecker incident) is denied the mint and stays a loud
|
|
||||||
# failure, because it signals a misbehaving flow that someone should look at.
|
|
||||||
vtr_heal() {
  local foreign_dn="$1" log="$2"
  local errf new_token new_info new_dn new_pols new_acc tmp
  errf=$(mktemp)
  if ! new_token=$(vault token create -orphan -period=768h \
      -policy=vault-admin -policy=sops-admin -display-name=devvm-wizard \
      -field=token 2>"$errf") || [ -z "$new_token" ]; then
    printf '%s DRIFT: ~/.vault-token is dn=%q — heal denied, foreign token lacks create authority (%s); investigate what wrote it. Manual re-mint: vault login -method=oidc && vault token create -orphan -period=768h -policy=vault-admin -policy=sops-admin -display-name=devvm-wizard -field=token > ~/.vault-token && chmod 600 ~/.vault-token\n' \
      "$(date -Is)" "$foreign_dn" "$(tr '\n' ' ' <"$errf")" >>"$log"
    rm -f "$errf"
    return 1
  fi
  rm -f "$errf"

  # Sanity: the minted token must itself pass the drift guard before it may
  # replace ~/.vault-token.
  if ! new_info=$(VAULT_TOKEN="$new_token" vault token lookup -format=json 2>&1); then
    printf '%s FAIL: heal minted a token but its lookup failed: %s\n' \
      "$(date -Is)" "$new_info" >>"$log"
    return 1
  fi
  new_dn=$(vtr_display_name "$new_info")
  new_pols=$(vtr_policies_csv "$new_info")
  if ! vtr_drift_ok "$new_dn" "$new_pols"; then
    printf '%s FAIL: heal minted an unexpected token (dn=%q policies=%q) — not writing it\n' \
      "$(date -Is)" "$new_dn" "$new_pols" >>"$log"
    return 1
  fi

  # Atomic replace: mktemp files are 0600 from birth; same-filesystem mv.
  tmp=$(mktemp "$HOME/.vault-token.XXXXXX")
  printf '%s' "$new_token" >"$tmp"
  mv "$tmp" "$HOME/.vault-token"

  # Anti-sprawl: revoke previous token-devvm-wizard tokens — each heal would
  # otherwise strand the prior periodic ADMIN token server-side for up to 32d.
  # The clobbering foreign token is deliberately NOT revoked: it may still back
  # the user's live login session, and it ages out on its own (7d for OIDC).
  local sweep="accessor sweep skipped (list denied)" accessors a a_info revoked=0
  new_acc=$(vtr_accessor "$new_info")
  if [ -n "$new_acc" ] && accessors=$(VAULT_TOKEN="$new_token" vault list -format=json auth/token/accessors 2>/dev/null); then
    while IFS= read -r a; do
      [ -n "$a" ] || continue
      a_info=$(VAULT_TOKEN="$new_token" vault token lookup -format=json -accessor "$a" 2>/dev/null) || continue
      if vtr_is_stale_periodic "$a_info" "$new_acc"; then
        VAULT_TOKEN="$new_token" vault token revoke -accessor "$a" >/dev/null 2>&1 && revoked=$((revoked + 1))
      fi
    done < <(printf '%s' "$accessors" | jq -r '.[]')
    sweep="revoked $revoked stale periodic token(s)"
  fi

  printf '%s HEALED: re-minted periodic token from foreign dn=%q (%s)\n' \
    "$(date -Is)" "$foreign_dn" "$sweep" >>"$log"
}
|
|
||||||
```
|
|
||||||
|
|
||||||
- [ ] **Step 2: Rewire the drift branch in `vtr_main`**
|
|
||||||
|
|
||||||
Replace this exact block (comment + if):
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Drift guard (added 2026-06-07): the renewer must NOT keep a FOREIGN token alive.
|
|
||||||
# On 2026-06-05 a stray `vault login -method=kubernetes` overwrote ~/.vault-token
|
|
||||||
# with a read-only woodpecker token, and this script then silently renewed THAT
|
|
||||||
# for two days — masking the loss of write access. So before renewing, confirm
|
|
||||||
# the token is our periodic admin token; if it has drifted, fail loudly (systemd
|
|
||||||
# marks the unit failed) instead of keeping someone else's token alive.
|
|
||||||
if ! vtr_drift_ok "$dn" "$pols"; then
|
|
||||||
printf '%s DRIFT: ~/.vault-token is dn=%q policies=%q (expected dn=%q with %q). Refusing to renew a foreign token. Re-mint: vault login -method=oidc && vault token create -orphan -period=768h -policy=vault-admin -policy=sops-admin -display-name=devvm-wizard -field=token > ~/.vault-token && chmod 600 ~/.vault-token\n' \
|
|
||||||
"$(date -Is)" "$dn" "$pols" "$EXPECTED_DN" "$REQUIRED_POLICY" >>"$log"
|
|
||||||
exit 1
|
|
||||||
fi
|
|
||||||
```
|
|
||||||
|
|
||||||
with:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Drift guard (2026-06-07) + self-heal (2026-07-03): the renewer must not
|
|
||||||
# keep a FOREIGN token alive (on 2026-06-05 a stray kubernetes login was
|
|
||||||
# silently renewed for two days, masking lost write access). But detect-only
|
|
||||||
# drift proved worse in practice: an OIDC login — which the infra docs
|
|
||||||
# prescribe before applies — clobbers this file too, and the resulting DRIFT
|
|
||||||
# failures went unnoticed for weeks while access degraded to a 7-day token
|
|
||||||
# (the weekly-expiry loop). On drift we now ATTEMPT to heal (see vtr_heal):
|
|
||||||
# re-mint the periodic token with the clobbering token's own authority.
|
|
||||||
# Vault's authz keeps the old guarantee — a token that couldn't legitimately
|
|
||||||
# hold vault-admin is denied the mint, and we still fail loud.
|
|
||||||
if ! vtr_drift_ok "$dn" "$pols"; then
  vtr_heal "$dn" "$log" || exit 1
  exit 0
fi
|
|
||||||
```
|
|
||||||
|
|
||||||
- [ ] **Step 3: Syntax + lint + regression check**
|
|
||||||
|
|
||||||
Run: `bash -n scripts/vault-token-renew.sh && bash scripts/test-vault-token-renew.sh; command -v shellcheck >/dev/null && shellcheck scripts/vault-token-renew.sh`
|
|
||||||
Expected: syntax OK, `25 passed, 0 failed`; shellcheck (if installed) reports nothing new.
|
|
||||||
|
|
||||||
- [ ] **Step 4: Commit**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git $GCFLAGS add scripts/vault-token-renew.sh
|
|
||||||
git $GCFLAGS commit -m "vault-token-renew: self-heal the periodic token on admin-capable clobber
|
|
||||||
|
|
||||||
Viktor asked for 'vault login -method=oidc' to work seamlessly: the OIDC
|
|
||||||
login the docs prescribe kept clobbering ~/.vault-token with a 7-day token,
|
|
||||||
and detect-only DRIFT failures went unnoticed for weeks (weekly-expiry
|
|
||||||
loop, twice in June). On drift the renewer now re-mints the periodic token
|
|
||||||
with the clobbering token's own authority (Vault's 403 is the judge — no
|
|
||||||
policy guessing), sanity-checks it, replaces the file atomically, and
|
|
||||||
revokes stale token-devvm-wizard leftovers. Weak/read-only clobbers still
|
|
||||||
fail loudly on purpose. Design: docs/plans/2026-07-03-vault-token-self-heal-design.md"
|
|
||||||
```
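The atomic-replace step inside `vtr_heal` (temp file next to the target, then `mv`) is the piece that guarantees a failed mint can never truncate `~/.vault-token`. A minimal standalone sketch of that pattern, run against a scratch directory; `atomic_write` is a hypothetical helper written for illustration and is not a function in the script:

```bash
#!/usr/bin/env bash
# Hypothetical helper illustrating the temp-file + mv pattern.
# atomic_write <target> <content>: the target either keeps its old content
# or holds the complete new content, never a partially written file.
atomic_write() {
  local target="$1" content="$2" tmp
  # Create the temp file beside the target so mv is a same-filesystem rename.
  tmp=$(mktemp "${target}.XXXXXX") || return 1
  if ! printf '%s' "$content" >"$tmp"; then
    rm -f "$tmp"
    return 1
  fi
  mv "$tmp" "$target"
}

# Usage against a scratch directory (deliberately not the real ~/.vault-token).
demo_dir=$(mktemp -d)
printf 'old-value' >"$demo_dir/token"
atomic_write "$demo_dir/token" "new-value"
cat "$demo_dir/token"; echo
rm -rf "$demo_dir"
```

Writing with `>` directly to the target would truncate it before the new content exists; the rename avoids that window because readers only ever see the old file or the finished new one.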
|
|
||||||
|
|
||||||
### Task 4: Docs — runbook + test-file header
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Modify: `docs/runbooks/vault-token-renew-devvm.md` (the `## Drift guard & recovery` section + the healthy-log-line note + `## Tests`)
|
|
||||||
- Modify: `scripts/test-vault-token-renew.sh` (header comment only)
|
|
||||||
|
|
||||||
- [ ] **Step 1: Replace the runbook's `## Drift guard & recovery` section with:**
|
|
||||||
|
|
||||||
```markdown
|
|
||||||
## Drift guard & self-heal
|
|
||||||
|
|
||||||
`~/.vault-token` is the Vault CLI's default token sink, so **any** `vault login`
|
|
||||||
overwrites it. Two confirmed clobber vectors:
|
|
||||||
|
|
||||||
1. `vault login -method=oidc` → replaces it with a 7-day OIDC token (the renewer
|
|
||||||
can't push past the OIDC role's 7-day `token_max_ttl`). The infra docs
|
|
||||||
prescribe this login before applies, so it recurs — it went unnoticed for
|
|
||||||
weeks twice (2026-06-18→26, 2026-06-29→07-03) and read as "Vault expires
|
|
||||||
weekly".
|
|
||||||
2. A stray `vault login -method=kubernetes` (e.g. a headless agent flow) →
|
|
||||||
writes a read-only `kubernetes-woodpecker-default` token (can read Vault but
|
|
||||||
**cannot** write `secret/*`). Happened 2026-06-05, unnoticed for two days.
|
|
||||||
|
|
||||||
Since 2026-07-03 the renewer **self-heals**
|
|
||||||
(`docs/plans/2026-07-03-vault-token-self-heal-design.md`). On a foreign token
|
|
||||||
it attempts the re-mint **with the clobbering token's own authority** and lets
|
|
||||||
Vault's authz decide:
|
|
||||||
|
|
||||||
- **Admin-capable clobber (OIDC login)** → re-mints the periodic token,
|
|
||||||
sanity-checks it against the drift guard, atomically replaces
|
|
||||||
`~/.vault-token`, revokes stale `token-devvm-wizard` leftovers
|
|
||||||
(anti-sprawl), logs
|
|
||||||
`HEALED: re-minted periodic token from foreign dn=… (revoked N stale periodic token(s))`
|
|
||||||
and exits 0. The clobbering token is NOT revoked — it may still back a live
|
|
||||||
login session; it ages out on its own.
|
|
||||||
- **Weak clobber (read-only k8s token)** → the mint is denied; logs
|
|
||||||
`DRIFT: … heal denied, foreign token lacks create authority …; investigate what wrote it`
|
|
||||||
and exits non-zero (unit `failed`). Deliberately loud: this signals a
|
|
||||||
misbehaving agent flow — exactly the 2026-06-05 case.
|
|
||||||
|
|
||||||
**Manual recovery** is only needed for the weak-clobber case (the DRIFT log
|
|
||||||
line still contains the exact command) — run the
|
|
||||||
[mint/re-mint](#mint--re-mint-the-token) block.
|
|
||||||
```
|
|
||||||
|
|
||||||
- [ ] **Step 2: In the runbook's `## Health check` section**, after the "A healthy log line looks like…" sentence, add:
|
|
||||||
|
|
||||||
```markdown
|
|
||||||
After an OIDC login you'll instead see, at the next nightly run:
|
|
||||||
`<ts> HEALED: re-minted periodic token from foreign dn="oidc-…" (revoked N stale periodic token(s))` — that's the self-heal working as designed.
|
|
||||||
```
|
|
||||||
|
|
||||||
- [ ] **Step 3: In the runbook's `## Tests` section**, replace the first sentence with:
|
|
||||||
|
|
||||||
```markdown
|
|
||||||
`infra/scripts/test-vault-token-renew.sh` unit-tests the drift-guard decision,
|
|
||||||
the lookup-JSON parsers (including the exact 2026-06-05 woodpecker-clobber
|
|
||||||
case), and the self-heal's revoke filter (which stale periodic tokens a heal
|
|
||||||
may sweep).
|
|
||||||
```
|
|
||||||
|
|
||||||
- [ ] **Step 4: Update the test file's header comment** (lines 2–7) to:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Unit tests for the pure functions in vault-token-renew.sh.
|
|
||||||
# Sources the script (vtr_main is guarded) and exercises (a) the drift-guard
|
|
||||||
# decision — is ~/.vault-token OUR periodic admin token (renew) or a foreign
|
|
||||||
# clobber (heal / fail loud)? — whose ABSENCE let the 2026-06-05 woodpecker
|
|
||||||
# clobber be silently renewed for two days, and (b) the self-heal's revoke
|
|
||||||
# filter — which stale token-devvm-wizard tokens a heal may sweep.
|
|
||||||
# Run: bash infra/scripts/test-vault-token-renew.sh
|
|
||||||
```
|
|
||||||
|
|
||||||
- [ ] **Step 5: Run tests once more, then commit**
|
|
||||||
|
|
||||||
Run: `bash scripts/test-vault-token-renew.sh`
|
|
||||||
Expected: `25 passed, 0 failed`.
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git $GCFLAGS add docs/runbooks/vault-token-renew-devvm.md scripts/test-vault-token-renew.sh
|
|
||||||
git $GCFLAGS commit -m "vault-token-renew runbook: document the self-heal behavior
|
|
||||||
|
|
||||||
Drift guard section rewritten: admin-capable clobbers now self-heal at the
|
|
||||||
nightly run (HEALED log line); weak clobbers keep the loud DRIFT failure;
|
|
||||||
manual re-mint is only the weak-clobber recovery now."
|
|
||||||
```
|
|
||||||
|
|
||||||
### Task 5: Deploy + live verification (on devvm, as wizard)
|
|
||||||
|
|
||||||
**Files:** none (host deploy + live checks)
|
|
||||||
|
|
||||||
- [ ] **Step 1: Install from the worktree**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
install -m 0755 ~/code/infra/.worktrees/vault-token-self-heal/scripts/vault-token-renew.sh ~/.local/bin/vault-token-renew
|
|
||||||
```
|
|
||||||
|
|
||||||
(Units unchanged — no `daemon-reload` needed.)
|
|
||||||
|
|
||||||
- [ ] **Step 2: Live case 1 — admin-capable clobber heals**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
export VAULT_ADDR=https://vault.viktorbarzin.me
|
|
||||||
export XDG_RUNTIME_DIR=/run/user/$(id -u)
|
|
||||||
FAKE_ADMIN=$(vault token create -ttl=1h -policy=vault-admin -policy=sops-admin -display-name=fake-oidc -field=token)
|
|
||||||
printf '%s' "$FAKE_ADMIN" > ~/.vault-token
|
|
||||||
systemctl --user start vault-token-renew.service; echo "exit=$?"
|
|
||||||
tail -1 ~/.local/state/vault-token-renew.log
|
|
||||||
vault token lookup | grep -E 'display_name|period'
|
|
||||||
```
|
|
||||||
|
|
||||||
Expected: `exit=0`; log line `HEALED: re-minted periodic token from foreign dn="token-fake-oidc" (revoked N stale periodic token(s))` with N ≥ 1 (the pre-clobber periodic token is itself swept as stale — by design — along with any strays from the June 26 / July 3 manual re-mints); lookup shows `display_name token-devvm-wizard`, `period 768h`. Note: `FAKE_ADMIN` is a child of the swept old token, so the cascade revokes it too — no cleanup needed.
|
|
||||||
|
|
||||||
- [ ] **Step 3: Verify exactly ONE periodic token remains server-side**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
for a in $(vault list -format=json auth/token/accessors | jq -r '.[]'); do
  vault token lookup -format=json -accessor "$a" 2>/dev/null \
    | jq -r 'select(.data.display_name=="token-devvm-wizard") | .data.accessor'
done
|
|
||||||
```
|
|
||||||
|
|
||||||
Expected: exactly one line, matching `vault token lookup -format=json | jq -r .data.accessor`.

- [ ] **Step 4: Live case 2 — weak clobber stays a loud failure**

```bash
GOOD=$(cat ~/.vault-token)
FAKE_WEAK=$(vault token create -ttl=10m -policy=default -display-name=fake-weak -field=token)
printf '%s' "$FAKE_WEAK" > ~/.vault-token
systemctl --user start vault-token-renew.service; echo "exit=$?"
systemctl --user is-failed vault-token-renew.service
tail -1 ~/.local/state/vault-token-renew.log
printf '%s' "$GOOD" > ~/.vault-token && chmod 600 ~/.vault-token
vault token revoke "$FAKE_WEAK" >/dev/null
```

Expected: `exit=1` (start reports the oneshot failure), `is-failed` prints `failed`, log line `DRIFT: ~/.vault-token is dn="token-fake-weak" — heal denied, foreign token lacks create authority (… permission denied …); investigate what wrote it. Manual re-mint: …`.
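
Step 4 leaves a deliberately weak token in `~/.vault-token` until the restore line runs, so a failure partway through would strand it. A minimal sketch of guarding the swap with a `trap`, assuming a hypothetical helper `with_temp_token` that is not part of this plan. The token file is a parameter so the helper can be tried on a scratch file first:

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a temporary token into a token file, run a
# command, and put the original contents back even when the command fails.
with_temp_token() {
  local token_file="$1" temp_token="$2"
  shift 2
  (
    # Subshell: the EXIT trap fires when it ends and does not leak to the caller.
    saved=$(cat "$token_file")
    trap 'printf "%s" "$saved" > "$token_file"; chmod 600 "$token_file"' EXIT
    printf '%s' "$temp_token" > "$token_file"
    "$@"
  )
}
```

For example, `with_temp_token ~/.vault-token "$FAKE_WEAK" systemctl --user start vault-token-renew.service` would return the service's exit status and restore the good token afterwards. This only suits the weak-clobber case: in the admin-clobber case the renewer rewrites the file itself, and restoring the saved value would undo the heal.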

- [ ] **Step 5: Happy path still green**

```bash
systemctl --user start vault-token-renew.service; echo "exit=$?"
tail -1 ~/.local/state/vault-token-renew.log
```

Expected: `exit=0`, log `OK renewed (dn=token-devvm-wizard ttl=2764800s)`.

### Task 6: Land on master + cleanup

- [ ] **Step 1: Merge latest master into the branch, re-verify, push**

```bash
cd ~/code/infra/.worktrees/vault-token-self-heal
git $GCFLAGS fetch forgejo
git $GCFLAGS merge forgejo/master
bash scripts/test-vault-token-renew.sh
git $GCFLAGS push forgejo HEAD:master
```

Expected: clean merge (or already up to date), `25 passed, 0 failed`, push accepted. Non-fast-forward → fetch, merge, push again.

- [ ] **Step 2: Watch CI to completion**

The push fires the infra Woodpecker `default.yml` (terragrunt apply for changed stacks). This change touches only `scripts/` + `docs/` → expect a fast success / no-op apply. Check (Forgejo-forge infra repo = Woodpecker repo id 82):

```bash
export VAULT_ADDR=https://vault.viktorbarzin.me
vault kv get -format=json secret/ci/global | jq -r '.data.data | keys[]'   # find the woodpecker admin token key
WP_TOKEN=$(vault kv get -field=<that-key> secret/ci/global)
curl -s -H "Authorization: Bearer $WP_TOKEN" 'https://ci.viktorbarzin.me/api/repos/82/pipelines?perPage=1' | jq '.[0] | {number, status, commit: .commit[0:8]}'
```

Expected: the pipeline for the pushed commit reaches `status: "success"` (poll until terminal). If it fails, fix before proceeding.
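
"Poll until terminal" can be sketched as a small loop. The helper names are hypothetical, and the list of terminal statuses is an assumption that should be checked against the Woodpecker version in use. The status source is passed in as a command so the loop itself stays independent of the API call:

```shell
#!/usr/bin/env bash
# Assumption: these are the statuses after which a pipeline no longer changes.
is_terminal_status() {
  case "$1" in
    success|failure|error|killed|declined|skipped) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical poller. $1 = command that prints the current status,
# $2 = seconds between polls, $3 = maximum number of polls.
wait_for_pipeline() {
  local fetch_status="$1" interval="${2:-15}" max_polls="${3:-80}" status i
  for (( i = 0; i < max_polls; i++ )); do
    status=$("$fetch_status")
    if is_terminal_status "$status"; then
      echo "$status"
      # Exit status 0 only when the terminal state is "success".
      [[ "$status" == "success" ]]
      return
    fi
    sleep "$interval"
  done
  echo "timeout" >&2
  return 2
}
```

A `fetch_status` function for this plan could wrap the `curl` call above with `jq -r '.[0].status'`. It would also be worth confirming that the returned pipeline belongs to the pushed commit before trusting its status.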

- [ ] **Step 3: Remove worktree + branch, reconcile main checkout**

```bash
git -C ~/code/infra $GCFLAGS worktree remove .worktrees/vault-token-self-heal
git -C ~/code/infra $GCFLAGS branch -d wizard/vault-token-self-heal
git -C ~/code/infra status --porcelain   # expect clean before pulling
git -C ~/code/infra $GCFLAGS pull --ff-only forgejo master
```

Expected: worktree gone, branch deleted (already merged), main checkout fast-forwards to the landed commit.

### Task 7: Memory + wrap-up

- [ ] **Step 1: Update the stale memories** (they say the drift guard is detect-only / recovery is manual):

```bash
homelab memory recall "vault periodic token renewer drift"   # confirm ids 4204, 4211, 7121 still say detect-only
homelab memory update 4211 "<original gotcha content, amended: since 2026-07-03 the renewer SELF-HEALS admin-capable clobbers at its nightly run (re-mints the periodic token with the clobbering token's authority + revokes stale token-devvm-wizard leftovers; weak clobbers still fail loudly). An OIDC login on devvm is now harmless. Design: infra docs/plans/2026-07-03-vault-token-self-heal-design.md>"
homelab memory update 7121 "<original content, amended: PLAYBOOK OBSOLETE for admin clobbers — self-heal shipped 2026-07-03; manual re-mint only needed for weak/read-only clobbers>"
```

(Fetch each memory's current text first and preserve it — amend, don't replace wholesale.)

- [ ] **Step 2: End-of-task extraction** — dispatch the standard M.3 memory-mining subagent per `~/.claude/rules/execution.md`, then give the final summary.

---

## Plan self-review (done at write time)

- **Spec coverage**: heal-on-admin-clobber (T3), loud-fail-on-weak (T3 + live T5.4), no-revoke-foreign (T3 comment + design decision 4), anti-sprawl sweep + fail-safe filter (T2/T3, live T5.3), minted-token sanity + atomic write (T3), unit tests (T1/T2), runbook (T4), deploy + live sim (T5), memory updates (T7). ✓
- **Placeholders**: `<that-key>` in T6.2 is a deliberate discovery step (key name verified live from Vault, not invented). No other TBDs. ✓
- **Name consistency**: `vtr_accessor`, `vtr_is_stale_periodic`, `vtr_heal`, `EXPECTED_DN` match across tasks; test count 17→25 consistent (8 new cases). ✓
@ -1,335 +0,0 @@
# Backup MX — self-hosted store-and-forward relay on Oracle Always-Free — design

Date: 2026-07-04 (v3 — post-challenge; v2 Oracle pivot same day) · Status: design,
pre-implementation · ADR: [0019](../adr/0019-backup-mx-self-hosted-oracle-relay.md)

v3 incorporates two independent adversarial-challenge reviews (same day). Their
material corrections are marked **[CH]** throughout — the largest: the v2 drain
path would never have drained (primary-side smtpd rejects),
monitoring-over-tailnet was fiction (no cluster→tailnet route exists), and the
VM's bounce model was wrong (it can never deliver a DSN).

## Goal

Inbound mail for `viktorbarzin.me` must survive homelab outages without loss.
Requirement level (Viktor, 2026-07-04): **never lose mail; delayed delivery is
acceptable; budget is $0** (hard constraint — reaffirmed after the Rollernet
gates failed). A store-and-forward backup MX queues mail while the homelab is
down and re-delivers when it returns.

Out of scope, explicitly:

- Reading new mail *during* an outage.
- Outbound mail during outages.
- The "primary up but hard-bouncing 5xx" misconfig class — a backup MX is
  never consulted when the primary answers. Separate hardening/alerting track.

Known residual limit (state it plainly): an outage **longer than 30 days**
loses the queued mail *silently* — the VM cannot emit a bounce to anyone
(egress 25 blocked), so no sender ever learns. Accepted; 30 days is already
6× the sender-retry status quo.

## v1 → v2: why Rollernet was dropped (gate evidence, 2026-07-04)

v1 selected Roller Network's free Secondary MX. The validation gates killed it
before any DNS change:

- **G2 FAILED**: the [free-accounts policy](https://rollernet.us/policy/free-accounts.html)
  caps free mail service at **200 relayed messages or 10 MB per rolling 7
  days**; overage → domain suspended **48 h answering SMTP 5xx** (permanent
  bounces), repeatable. Spammers deliberately target backup MXes even while
  the primary is up, so background spam alone can hold the domain suspended —
  worse than no backup MX.
- **G1 SHAKY**: same policy page says free accounts are being discontinued.
- **G3 PASSED** (for posterity): `mail{,2}.rollernet.us` present valid LE
  certs over STARTTLS.
- Signup is Cloudflare-Turnstile-gated — moot given G1/G2.

Viktor's decision: stay free → self-host on Oracle Always-Free. **[CH]** The
external challenger re-searched the free landscape (DNSExit, KisoLabs,
DuoCircle, AWS/Azure/GCP/Hetzner/Fly/Vultr/Linode free tiers) and confirmed:
no credible free managed backup-MX or free VM with a usable port-25 story
exists in 2026 other than OCI. GCP's free e2-micro also blocks egress 25 and
is US-regions-only (wrong continent).

## Decision

A minimal **Postfix store-and-forward relay** (`mx2.viktorbarzin.me`) on an
Oracle Cloud **Always-Free** compute instance, published as a lower-preference
MX. It accepts mail for `viktorbarzin.me` when the primary is unreachable,
queues up to 30 days, and drains to the primary when it returns. No mailboxes,
no third-party terms — the queue-lifetime and reject-behavior knobs are ours.

## Architecture

```
                         ┌── pri 1  mail.viktorbarzin.me ──► pfSense HAProxy ──► mailserver pod
sender MTA ──► MX lookup ┤                                        ▲
                         └── pri 20 mx2.viktorbarzin.me           │ drain: smtp to
                             (Oracle VM, Postfix relay,           │ mail.viktorbarzin.me:2526
                              queue ≤ 30 days) ───────────────────┘ (pfSense WAN NAT rdr
                                                                    2526 → 10.0.20.1:25,
                                                                    existing HAProxy frontend)
```

- **Normal operation**: senders use pri 1; the VM idles (spammers targeting
  the backup + transient-blip retries get relayed onward immediately).
- **Outage**: senders fall back to pri 20 → VM accepts + queues → Postfix
  retries the primary on its native schedule → queue drains after recovery
  through the standard external ingress path (PROXY v2 → :2525 → rspamd →
  Dovecot).
- **Custom drain port**: Oracle blocks **egress TCP 25** tenancy-wide
  (post-2021; exemptions unreliable) — the VM cannot reach
  `mail.viktorbarzin.me:25`. One pfSense WAN NAT rule `TCP 2526 →
  10.0.20.1:25` reuses the existing HAProxy frontend unchanged. **[CH]
  Verified against the runbook**: the frontend binds `*:25` on pfSense (not
  strictly 10.0.20.1), rdr dst-port rewrite is the existing production
  pattern (WAN:25 already rewrites to 10.0.20.1:25), and port 2526 collides
  with nothing (the HAProxy test frontend uses :2525). Inbound TCP 25 **to**
  the VM is unaffected by Oracle's egress-only block per practitioner
  evidence (iRedMail/mailcow on OCI: receive works, send doesn't) — **to be
  proven at gate O2 before any DNS change** (Oracle publishes no positive
  commitment).

## Oracle account & instance

- **Account**: Viktor creates it (human signup; card for identity, $0
  charged). **Home region is fixed at signup and Always-Free compute exists
  only there — choose `eu-frankfurt-1` deliberately; there is no
  try-another-region fallback without a new account. [CH]**
- **[CH] PAYG conversion is a REQUIRED prerequisite, not a recommendation**:
  Oracle stops idle Always-Free instances (95th-pct CPU < 20% over 7 days — an
  idle Postfix box qualifies) and demonstrably changes free-tier terms without
  notice, enforcing by termination (June 2026: A1 allowance silently halved,
  over-limit instances shut down). PAYG keeps Always-Free resources free and
  exempts them from idle reclamation.
- **Shape**: `VM.Standard.E2.1.Micro` (x86, 1/8 OCPU burst, 1 GB RAM; 2
  always-free instances allowed; ample for queue-only Postfix — and untouched
  by the 2026 A1 cuts). ARM A1 fallback is **unreliable** (halved quota,
  chronic Frankfurt capacity) — treat E2.1.Micro availability as the gate.
- **[CH] Reserved public IP is mandatory** (`oci_core_public_ip`, reserved):
  an ephemeral IP rotates on stop/start and would silently break all four
  IP-keyed controls at once (pfSense NAT source-restriction, the primary's
  smtpd/rspamd exemptions, the Oracle security list, Prometheus scrape
  allowlist) — discovered only at the next outage's drain.
- **OS**: Ubuntu 24.04. **[CH] OCI Ubuntu images ship an OS-level iptables
  ruleset (`/etc/iptables/rules.v4`) that ACCEPTs 22 and REJECTs everything
  else, independent of security lists** — cloud-init must insert ACCEPT rules
  for 25/80 (+ scrape ports) ahead of the REJECT and persist them, or gate O2
  fails on day 1 with a correct security list.
- **Credentials**: OCI API key for Terraform → Vault `secret/viktor`
  (`oci_*`); web login → Vaultwarden item `Oracle Cloud (backup MX)`.
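
The OS-firewall point in the list above can be sketched as a text transform on the persisted rules file. This assumes the image's catch-all is a single `-A INPUT ... -j REJECT` line, which should be checked against the real `/etc/iptables/rules.v4` on the instance. The function name and the port arguments are illustrative only:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a copy of an iptables-save style rules file with
# ACCEPT rules for the given TCP ports inserted just before the first
# INPUT-chain REJECT rule. The input file itself is not modified.
insert_accepts_before_reject() {
  local rules_file="$1"
  shift
  awk -v ports="$*" '
    BEGIN { n = split(ports, p, " ") }
    !inserted && /^-A INPUT .*-j REJECT/ {
      for (i = 1; i <= n; i++)
        printf "-A INPUT -p tcp -m state --state NEW -m tcp --dport %s -j ACCEPT\n", p[i]
      inserted = 1
    }
    { print }
  ' "$rules_file"
}
```

A cloud-init step might run `insert_accepts_before_reject /etc/iptables/rules.v4 25 80 > /tmp/rules.v4.new`, review the result, and load it with `iptables-restore`. The sketch opens ports to any source, so the exporter ports, which the design restricts to the homelab /32, would need separate source-scoped rules.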

## Networking & security posture

- **Ingress on the VM**: TCP 25 world-open (the service). **[CH] TCP 80
  world-open permanently** — Let's Encrypt validation is multi-perspective
  with no published source IPs, so it cannot be source-scoped, and a
  "open-only-during-renewal" toggle is unspecified automation whose realistic
  failure mode is an expired cert at day ~90. Nothing listens on 80 outside
  certbot's seconds-long renewal windows; connection-refused surface is
  negligible. TCP 9100/9154 (exporters) restricted to the homelab WAN /32
  (176.12.22.76) in both the Oracle security list and the VM firewall.
- **No public SSH**: management rides the headscale tailnet — cloud-init
  enrolls via a **preauth key for a dedicated non-OIDC headscale user** with
  node tag `tag:backup-mx` (headscale 0.28.0 file-mode ACL, content in Vault
  `secret/headscale` → `headscale_acl`); SSH bound to the tailnet interface.
  ACL grant: `group:admin → tag:backup-mx:22` (cluster pods are NOT tailnet
  members — see monitoring). **[CH] Outage caveat**: headscale's control
  plane + DERP live in the cluster, so mid-outage tailnet reachability is
  cached-netmap best-effort — the runbook documents the **OCI instance
  console connection as break-glass** management. (Also fix `vpn.md`'s stale
  "0.23.x / OIDC-only" claims while in there.)
- **VM compromise blast radius**: plaintext of outage-queued mail + a relay
  surface contained by `relay_domains = viktorbarzin.me` only, no submission
  ports, no SASL, no local delivery. The VM is deliberately NOT added to the
  primary's `mynetworks` (that would let a compromised VM relay arbitrary
  mail *through* the primary) — per-stage exemptions instead, below.

## Postfix configuration (relay-only, accept-and-queue with 4xx-only hygiene)

- `relay_domains = viktorbarzin.me`; `mydestination =` (empty).
- **[CH]** `smtpd_relay_restrictions = permit_mynetworks,
  reject_unauth_destination` — explicit 5xx for foreign-domain RCPTs (the
  default tail is `defer_unauth_destination`, whose 4xx invites every relay
  probe to retry forever).
- **[CH]** `relay_recipient_maps` explicitly set to the wildcard form
  (`@viktorbarzin.me OK`) — documents accept-all-recipients as a decision
  (the domain is catch-all; every RCPT is valid by definition).
- `transport_maps`: `viktorbarzin.me smtp:[mail.viktorbarzin.me]:2526`.
- `maximal_queue_lifetime = 30d`. **[CH]** `bounce_queue_lifetime = 1d` and
  `delay_warning_time = 0` — this host can never deliver a DSN to anyone
  (egress 25 blocked; its only egress is 2526 to the primary), so undeliverable
  bounces must be discarded quickly or they rot in the queue for a month and
  permanently poison the queue-depth alert.
- **[CH]** `message_size_limit = 209715200` — exactly the primary's 200 MB
  (`POSTFIX_MESSAGE_SIZE_LIMIT`, mailserver main.tf:88). The stock 10 MB
  default would 552-reject large legitimate mail during outages — the exact
  loss mode this project exists to prevent. Equal, never higher (higher
  recreates drain-time rejects).
- **[CH] postscreen on the VM in 4xx-only posture**: pregreet test ON
  (fire-and-forget bots don't retry; real MTAs do — the whole design already
  rests on sender retry, so 4xx filtering is loss-free by construction),
  optionally `postscreen_dnsbl_action = defer` with a conservative threshold.
  v2's blanket "no DNSBL" conflated 5xx reputation rejects (rightly banned)
  with 4xx tempfail (harmless); without any hygiene the backup is a 24/7
  spam backdoor since spammers deliberately deliver to the highest-numbered
  MX. Zero 5xx from reputation, ever.
- `inet_protocols = ipv4` **[CH]** — the primary publishes an AAAA (HE
  tunnel) but the IPv6 HAProxy bridge has no :2526 listener; skip the wasted
  v6 attempt per delivery.
- `smtpd_tls_cert_file` = LE cert for `mx2.viktorbarzin.me` (opportunistic
  STARTTLS inbound; `smtp_tls_security_level = may` on the drain leg).
- Queue disk: the ~45 GB free boot volume dwarfs any realistic 30-day
  accumulation for a personal domain.
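
The relay-only settings listed above can be collected into one fragment for review. This is a sketch rather than the stack's actual cloud-init template: the function name, the output path, and the `hash:` map file locations are assumptions, and the postscreen and TLS certificate settings are deliberately left out. The size limit is computed so the 200 MB figure is visible as arithmetic:

```shell
#!/usr/bin/env bash
# Sketch only: write the relay-only main.cf settings to a scratch file.
# Map file paths below are illustrative placeholders.
write_relay_main_cf() {
  local out="$1"
  # 200 MB, matching the primary's limit (200 * 1024 * 1024 = 209715200).
  local size_limit=$(( 200 * 1024 * 1024 ))
  cat > "$out" <<EOF
relay_domains = viktorbarzin.me
mydestination =
smtpd_relay_restrictions = permit_mynetworks, reject_unauth_destination
relay_recipient_maps = hash:/etc/postfix/relay_recipients
transport_maps = hash:/etc/postfix/transport
maximal_queue_lifetime = 30d
bounce_queue_lifetime = 1d
delay_warning_time = 0
message_size_limit = ${size_limit}
inet_protocols = ipv4
smtp_tls_security_level = may
EOF
}
```

Under these assumptions the two map files would hold the single lines quoted in the list (`@viktorbarzin.me OK` and `viktorbarzin.me smtp:[mail.viktorbarzin.me]:2526`) and would need `postmap` run on them before Postfix reads them.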

## TLS

certbot standalone HTTP-01 for `mx2.viktorbarzin.me` (no Cloudflare API token
on an internet-facing VM). Port 80 permanently open (see above); certbot renew
timer. The MTA-STS follow-up (separate task; policy host currently dangling —
below) must list `mx2.viktorbarzin.me` when implemented.

## Primary-side drain enablement **[CH — this section replaces v2's "SPF/DMARC exemption + postscreen permit", which exempted the wrong layers]**

The v2 exemptions targeted postscreen DNSBL (which is **off** on the primary —
`ENABLE_DNSBL` unset) and rspamd SPF/DMARC scoring — but missed the three
mechanisms that would actually break the drain. All are keyed on the VM's
reserved /32 (the PROXY-v2-recovered client IP):

1. **`reject_unknown_client_hostname` bypass** — the primary sets
   `POSTFIX_REJECT_UNKNOWN_CLIENT_HOSTNAME=1` (main.tf:89); an Oracle IP
   without full FCrDNS (PTR needs an Oracle SR; limited on free accounts)
   would be **450-deferred on every drain attempt → the queue never drains →
   mass-bounces at day 30**. Fix: `check_client_access` permit for the VM /32
   early in `smtpd_client_restrictions`, and a matching permit at the sender
   stage (SPOOF_PROTECTION=1 rejects unauthenticated own-domain envelope
   senders — drained self-addressed/bounced mail would 5xx). Attempt the
   Oracle PTR anyway (belt and braces).
2. **Anvil rate-limit exception** — `smtpd_client_message_rate_limit = 30`/min
   keys on the VM's IP at drain; a >3,600-message backlog would throttle for
   hours and false-fire the queue alert. Add the VM /32 to
   `smtpd_client_event_limit_exceptions`.
3. **rspamd: evaluate the original sender, never 5xx the drain stream** — via
   the existing override.d ConfigMap pattern (same mount as
   `dkim_signing.conf`): (a) configure rspamd's **`external_relay`** module
   (ip_map = VM /32) so SPF/DMARC/IP reputation evaluate against the
   *original* client IP parsed from the VM's Received header — this keeps
   DMARC protection for the entire drain stream instead of v2's blanket
   disable; (b) cap rspamd's **action at the VM /32 to tag/fold — never
   milter-reject**: the primary's default reject tier (DMS default, active
   since only dkim_signing is overridden today) would 5xx high-score spam at
   DATA, forcing the VM to generate DSNs to forged senders = classic
   backup-MX backscatter → mx2's IP blacklisted. Drained spam lands tagged in
   the catch-all's Junk instead. Validate the external_relay ↔ settings-rule
   interplay at gate O5 with a high-spam-score message.
4. postscreen permit for the /32 (harmless; pregreet never trips a real
   Postfix client and DNSBL is off — kept for future-proofing only).
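
The "throttle for hours" figure in item 2 follows from simple division. A worked sketch, assuming the per-minute limit is the only bottleneck during the drain (the function name is illustrative):

```shell
#!/usr/bin/env bash
# Minutes needed to drain a backlog at a per-minute message rate limit,
# rounded up to the next whole minute. Pure arithmetic; Postfix is not involved.
drain_minutes() {
  local backlog="$1" per_minute="$2"
  echo $(( (backlog + per_minute - 1) / per_minute ))
}

drain_minutes 3600 30   # prints 120
```

At 30 messages per minute, a 3,600-message backlog takes 120 minutes, which is the same 2 h window the `BackupMxQueueStuck` alert uses. Any larger backlog would therefore trip the alert during an otherwise healthy drain unless the VM /32 is exempted.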

## Our-side changes (Terraform unless noted)

1. **New stack `stacks/backup-mx/`** (Tier 1): OCI provider (creds from
   Vault), VCN + subnet + security list + **reserved public IP** +
   `VM.Standard.E2.1.Micro` + cloud-init (`templatefile`): **OS iptables
   ACCEPTs for 25/80/9100/9154 ahead of the OCI image's REJECT rule
   (persisted)**, postfix + config above, certbot, tailscale→headscale
   enrollment (preauth key from Vault), node_exporter, postfix_exporter,
   unattended-upgrades.
2. **DNS** — `stacks/cloudflared/modules/cloudflared/cloudflare.tf`: A
   `mx2.viktorbarzin.me` → reserved IP (non-proxied), MX pref 20 → `mx2`.
   **[CH] Live zone count verified: 195/200 → 197/200 after this change; only
   3 slots remain and the MTA-STS follow-up needs 1–2 → plan the next
   record-purge now, not at collision time.**
3. **pfSense (live network device — approved as part of this plan)**: WAN NAT
   rdr `TCP 2526 → 10.0.20.1:25` + firewall rule, source-restricted to the
   reserved IP. **[CH] Scripted** (extend the existing
   `scripts/pfsense-*-haproxy*.php` bootstrap-script family), not
   hand-clicked — keeps the git-rebuildable parity the rest of the pfSense
   mail config has. Config.xml rides the nightly backup.
4. **Mailserver stack**: the four-layer drain enablement above (client+sender
   `check_client_access` permits, anvil exception, rspamd external_relay +
   action cap, postscreen permit) — all keyed to one /32, via the existing
   `postfix_cf` / `user-patches.sh` / rspamd-override hook points (verified
   present: main.tf:129-144, 222-281, 467-474).
5. **Monitoring [CH — replaces v2's tailnet scraping, which had no transport:
   no cluster→tailnet route exists and no existing target is scraped that
   way]**: Prometheus scrapes `node_exporter`/`postfix_exporter` on the VM's
   **public reserved IP**, allowed only from the homelab WAN /32 (Oracle SL +
   VM firewall); blackbox TCP:25 from the cluster (`BackupMxDown`, warning);
   MX-set drift assertion (both MX records present). Alerts:
   `BackupMxQueueStuck` = **non-bounce** queue depth > 0 for 2 h while the
   primary is healthy (gate on the existing `MailServerDown`/roundtrip
   series, machine-readable — not prose); bounce residue is excluded by the
   1-day bounce lifetime. Note: during a full homelab outage Prometheus
   itself is down — queue growth is unobservable live under ANY transport;
   what we actually watch is the post-recovery drain. A WAN-IP change stales
   the Oracle allowlist → visible as ScrapeTargetDown (self-signaling).
   **Probe semantics note**: once mx2 exists, the Brevo roundtrip probe's
   mail fails over to mx2 on transient primary blips and arrives minutes late
   via the drain — `EmailRoundtripFailing` may then mean "delayed via mx2",
   not "lost"; note in the alert description and runbook.
6. **Docs (same commit as implementation)**: rewrite `mailserver.md` §"No
   Backup MX", new runbook `docs/runbooks/backup-mx.md` (`postqueue -p`,
   forced drain `postqueue -f`, cert renewal, **OCI console break-glass**, VM
   rebuild from stack, Oracle account facts incl. PAYG + home-region lock),
   `vpn.md` headscale-version/OIDC staleness fix, monitoring rows.

### MTA-STS finding (unchanged; no action in this change)

`_mta-sts` TXT is published but `mta-sts.viktorbarzin.me` has no record and
nothing serves the policy — MTA-STS is inert today. When fixed, the policy
MUST include `mx: mx2.viktorbarzin.me` (and budget its DNS records against the
3 remaining zone slots).

## Validation gates (in order; any failure → stop and report)

| # | Gate | Method | Failure handling |
|---|------|--------|------------------|
| O1 | Oracle account (home region `eu-frankfurt-1`, **fixed forever at signup**), **PAYG conversion done**, E2.1.Micro capacity | Viktor signs up + converts; TF apply | A1-in-home-region is a best-effort fallback only (halved quota, contended); else decision returns to Viktor |
| O2 | Inbound TCP 25 reachable from the internet (after the OS-iptables fix) | `nc -zv <reserved-ip> 25` from outside + recurring Uptime-Kuma TCP monitor (keeps proving it — Oracle publishes no commitment) | Stop; decision returns to Viktor |
| O3 | Drain works: VM → `mail.viktorbarzin.me:2526` delivers end-to-end | Test message injected on the VM | Debug pfSense NAT / HAProxy path |
| O4 | LE cert issued | certbot standalone | STARTTLS is opportunistic — non-blocking for go-live; fix before MTA-STS |
| O5 | Live failover test — **hardened [CH]** | presence-claim → scale mailserver to 0 (~30 min) → send from Gmail + Brevo **plus a high-spam-score message and a >10 MB message** → confirm queued (`postqueue -p`) → scale up → verify full drain within the anvil-exception expectations, spam folded to Junk (not bounced), headers show original-IP SPF/DMARC evaluation, no DSN generated on the VM, roundtrip probe recovers | Debug or roll back (remove MX record) |

## Failure modes

Covered: cluster/pod outages, pfSense/power/ISP outages ≤ 30 days, WAN IP
changes, short-retry senders. If pfSense is down the drain waits — Postfix
retries until it heals.

Not covered: primary-up-but-5xx misconfigs; outbound; mid-outage mailbox
access; **outages > 30 days lose queued mail silently (no DSN possible)**.
Simultaneous Oracle+homelab outage = status quo ante (sender retries).

Newly introduced, accepted:

- **A pet outside the cluster** — deliberately cattle: rebuilt from TF +
  cloud-init, patched by unattended-upgrades, scraped by Prometheus. Never a
  backup target.
- **Oracle free-tier caprice [CH — upgraded from v2's framing]**: Oracle has
  silently cut Always-Free allowances and terminated over-limit instances
  (June 2026, A1). Mitigations: PAYG (required), recurring inbound-25 probe,
  `BackupMxDown`, and the fact that outside an active outage the queue is
  empty — a surprise reclamation loses nothing, only coverage until rebuilt.
  Rollernet Basic ($30/yr) stays the documented fallback if OCI sours.
- **Spam hygiene**: 4xx-only postscreen on the VM (pregreet + conservative
  DNSBL-defer) instead of v2's nothing; drained spam is tagged/folded by
  rspamd, never bounced.
- Outage mail sits plaintext on Oracle disk ≤ 30 days (single-tenant;
  accepted).

## Rollback

Remove the MX + A records; wait for `postqueue -p` empty; `terraform destroy`
on `backup-mx`; delete the pfSense NAT rule (scripted); drop the mailserver
/32 exemptions. Order matters: MX record first.

## Viktor's manual steps (everything else is mine)

1. Create the Oracle Cloud account — **home region `eu-frankfurt-1`** (fixed
   forever), card for identity, $0 charged.
2. **Convert the tenancy to Pay-As-You-Go** (required — idle-reclamation
   exemption; Always-Free stays $0).
3. Hand me the tenancy OCID + a console user → I mint the API key, store
   creds (Vault + Vaultwarden), and build the stack.
4. Approve the (scripted) pfSense NAT rule when I reach that step.
@ -1,89 +0,0 @@
# Drone Logbook (Open DroneLog) — Design

**Date:** 2026-07-04
**Status:** Approved (Viktor, 2026-07-04)
**Owner request:** "I have a DJI Mini 4 Pro. I'm interested in github.com/ViktorBarzin/drone-logbook" → self-host it in the cluster.

## Goal

Self-host [Open DroneLog](https://github.com/arpanghosh8453/open-dronelog) (upstream of the
`ViktorBarzin/drone-logbook` fork) at **https://dronelog.viktorbarzin.me** so Viktor can import
DJI Fly flight logs from his DJI Mini 4 Pro and analyze them privately: telemetry charts, 3D map
replay, per-flight and lifetime stats. All data stays in the cluster (single DuckDB database).

## Decisions (interview, 2026-07-04)

| Question | Decision |
|---|---|
| Deployment form | Self-hosted Docker web app in k8s (not desktop app, not hosted webapp) |
| Exposure | Public `dronelog.viktorbarzin.me`, **Authentik forward-auth** (`auth = "required"`) |
| Log ingestion | **Both** manual web upload *and* a server-side auto-import drop folder from day one |
| Image source | **Upstream** `ghcr.io/arpanghosh8453/open-dronelog:latest` — NOT the fork |
| Fork disposition | Fork is 0 ahead / 372 behind, adds nothing; delete or park it. Only revive (sync + ADR-0002 GHA build) if Viktor starts modifying the code |
|
|
||||||
## Architecture
|
|
||||||
|
|
||||||
New Tier-1 stack `stacks/drone-logbook/`, modeled line-by-line on `stacks/freshrss/`
|
|
||||||
(the closest existing shape: single upstream-image app, own data volume, Keel-updated):
|
|
||||||
|
|
||||||
- **Namespace** `drone-logbook`, tier `4-aux`, label `keel.sh/enrolled=true` → Kyverno injects
|
|
||||||
Keel poll annotations → auto-upgrades as upstream releases (project is actively maintained).
|
|
||||||
- **Deployment** (1 replica, `Recreate`, since DuckDB is single-writer/embedded):
  - image `ghcr.io/arpanghosh8453/open-dronelog:latest` (nginx frontend + Axum REST backend, port 80)
  - memory request=limit **512Mi** (DuckDB import/analytics spikes), cpu request 25m, no cpu limit
  - standard `KYVERNO_LIFECYCLE_V1` / `KEEL_IGNORE_IMAGE` / `KEEL_LIFECYCLE_V1` lifecycle ignores
- **App data** `/data/drone-logbook` (DuckDB db, cached DJI decryption keys, uploaded originals):
  **`proxmox-lvm-encrypted` block PVC** `drone-logbook-data-encrypted`, 2Gi, topolvm autoresize →
  10Gi ceiling. Encrypted class because flight logs are GPS traces of home/travel; sensitive data
  defaults to `proxmox-lvm-encrypted` per the storage decision rule (`.claude/CLAUDE.md`).
  Embedded DBs stay off NFS (same rationale documented in the freshrss stack: NFS only for static files).
- **Backup CronJob** `drone-logbook-backup` (mandatory for every proxmox-lvm app): daily 01:30
  file copy of the data volume → NFS `/srv/nfs/drone-logbook-backup` (dated dirs, 30-day retention,
  Pushgateway metrics), pod-affinity co-scheduled with the app pod (RWO volume). 01:30 sits outside
  the 00:00/08:00/16:00 sync-import windows so the DuckDB file is quiescent; retained upload
  originals make even a torn copy recoverable by re-import. `nfs-mirror` (02:00) ships it to sda →
  Synology offsite. Vaultwarden pattern.
- **Sync drop folder**: static NFS volume (`modules/kubernetes/nfs_volume`)
  `192.168.1.127:/srv/nfs/drone-logbook/sync-logs`, mounted **read-only** at `/sync-logs`;
  `SYNC_LOGS_PATH=/sync-logs`, `SYNC_INTERVAL="0 0 */8 * * *"` (every 8 h).
  Any producer (Nextcloud sync, scp, a future phone pipeline) drops `.txt` logs there; the app
  imports them automatically. `KEEP_UPLOADED_FILES=true` keeps re-importable originals in the PVC.
- **Ingress** via `ingress_factory`: `name = "dronelog"`, `auth = "required"` (Authentik
  forward-auth), `dns_type = "proxied"`. External Uptime Kuma HTTPS monitor comes automatically
  with the ingress annotation. Homepage tile (group "Media & Entertainment", icon `mdi-quadcopter`).
- **Secrets**: Vault KV `secret/drone-logbook` (`profile_creation_pass`) → ExternalSecret
  (`vault-kv` ClusterSecretStore) → k8s secret `drone-logbook-secrets` → env
  `PROFILE_CREATION_PASS`. Gates profile create/delete even for other Authentik-logged-in users.
  No plan-time secret reads needed (no `data "kubernetes_secret"`).
  No `DJI_API_KEY`: bundled default is fine at personal import volume; add later if rate-limited.
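
The "2Gi, autoresize → 10Gi ceiling" behaviour of the data PVC can be sketched with a little arithmetic. This is only an illustration, assuming the resizer grows the claim by the `increase` percentage of its current size each time free space falls below the threshold and clamps at the storage limit; `grow_sequence` is a hypothetical helper, not something topolvm ships.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the sizes (in Gi) a claim would pass through.
# Args: initial size, increase in percent of the current size, storage limit.
grow_sequence() {
  local size=$1 inc_pct=$2 limit=$3
  local out="$size"
  while [ "$size" -lt "$limit" ]; do
    size=$(( size + size * inc_pct / 100 ))
    # Assumed behaviour: never grow past the configured limit.
    if [ "$size" -gt "$limit" ]; then size=$limit; fi
    out="$out $size"
  done
  echo "$out"
}

grow_sequence 2 100 10   # prints: 2 4 8 10
```

Under these assumptions the claim reaches its ceiling after three expansions, which is why a 100% increase step looks reasonable for a volume that starts this small.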
|
|
||||||
|
|
||||||
## Operational notes
|
|
||||||
|
|
||||||
- **DJI egress dependency**: importing a *new* log file requires the pod to reach DJI's servers
|
|
||||||
once (flight-log decryption key fetch; keys are then cached in the data dir). Remember this when
|
|
||||||
egress enforcement lands (Security wave 1, beads `code-8ywc`).
|
|
||||||
- The web UI is desktop-first; mobile is functional but basic.
|
|
||||||
- NFS host prerequisite: `/srv/nfs/drone-logbook/sync-logs` (root:www-data, 2775 — same shape as
|
|
||||||
sibling dirs) and `/srv/nfs/drone-logbook-backup` created on 192.168.1.127 and recorded in
|
|
||||||
`secrets/nfs_directories.txt`. `/srv/nfs` is exported whole-tree, so no `/etc/exports`
|
|
||||||
(`scripts/pve-nfs-exports`) change.
|
|
||||||
- Backup story = the daily app-level backup CronJob (above) + the host `daily-backup` LVM-snapshot
|
|
||||||
leg + original log files retained both in the drop folder and in the data volume
|
|
||||||
(`KEEP_UPLOADED_FILES=true`).
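
The dated-directory rotation behind the daily backup can be tried locally before trusting it with real data. The sketch below assumes GNU `find` and `touch`; `rotate` is a hypothetical wrapper around the same `find` expression the CronJob uses, and the directory names are made up.

```bash
#!/usr/bin/env bash
# rotate <backup_root> <days>: delete top-level directories whose mtime is
# more than <days> days old (same find expression as the backup CronJob).
rotate() {
  find "$1" -maxdepth 1 -mindepth 1 -type d -mtime +"$2" -exec rm -rf {} +
}

root=$(mktemp -d)
mkdir "$root/stale" "$root/$(date +%Y_%m_%d_%H_%M)"
touch -d '40 days ago' "$root/stale"   # GNU touch: fake an old backup
rotate "$root" 30
ls "$root"                             # only the freshly dated directory is left
rm -rf "$root"
```

Note that `-mtime +30` compares whole 24-hour periods, so a directory is only removed once it is at least 31 days old.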
|
|
||||||
|
|
||||||
## Alternatives considered
|
|
||||||
|
|
||||||
- **Build from the fork** (`ghcr.io/viktorbarzin/...` via GHA, ADR-0002): rejected for now — fork
|
|
||||||
has zero custom commits; a build chain adds maintenance for no benefit. Revisit if code changes
|
|
||||||
are wanted.
|
|
||||||
- **`auth = "app"` + app profile passwords** (would enable the `opendronelog-sync` native uploader
|
|
||||||
from anywhere): rejected — a single app password guarding GPS traces of home/travel on the open
|
|
||||||
internet is weaker than Authentik; the sync drop folder covers automated ingestion instead.
|
|
||||||
- **Internal-only (.lan + VPN)**: rejected — Authentik-gated public matches the rest of the
|
|
||||||
homelab and works without VPN while traveling.
|
|
||||||
- **NFS for the DuckDB data**: rejected — embedded-DB-on-NFS locking risk; freshrss precedent
|
|
||||||
keeps app DB data on proxmox-lvm.
|
|
||||||
|
|
||||||
## Implementation
|
|
||||||
|
|
||||||
See `2026-07-04-drone-logbook-plan.md`.
|
|
||||||
|
|
@ -1,542 +0,0 @@
|
||||||
# Drone Logbook (Open DroneLog) Implementation Plan
|
|
||||||
|
|
||||||
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
|
|
||||||
|
|
||||||
**Goal:** Deploy Open DroneLog (DJI flight-log analyzer) at https://dronelog.viktorbarzin.me — new Tier-1 stack `stacks/drone-logbook/`, upstream image, Authentik-gated, with a DuckDB data PVC and an NFS auto-import drop folder.
|
|
||||||
|
|
||||||
**Architecture:** Single Deployment running `ghcr.io/arpanghosh8453/open-dronelog:latest` (nginx + Axum + DuckDB, port 80) in namespace `drone-logbook`; data on a `proxmox-lvm-encrypted` PVC (GPS logs = sensitive data), `/sync-logs` drop folder on static NFS, daily backup CronJob to `/srv/nfs/drone-logbook-backup` (vaultwarden pattern), `ingress_factory` with `auth = "required"`, Keel auto-upgrades via namespace enrollment. Modeled line-by-line on `stacks/freshrss/`. Design: `2026-07-04-drone-logbook-design.md`.
|
|
||||||
|
|
||||||
**Tech Stack:** Terraform/Terragrunt (Tier-1 PG state), Vault KV + ESO, ingress_factory, nfs_volume module, Keel/Kyverno.
|
|
||||||
|
|
||||||
Terraform is exempt from TDD (execution.md); each task ends with a concrete verification instead.
|
|
||||||
|
|
||||||
---
|
|
||||||
|
|
||||||
### Task 1: Vault secret
|
|
||||||
|
|
||||||
**Files:** none (Vault KV only)
|
|
||||||
|
|
||||||
- [ ] **Step 1.1: Create `secret/drone-logbook` with a generated profile-creation password**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
vault kv put secret/drone-logbook profile_creation_pass="$(openssl rand -base64 24)"
|
|
||||||
```
|
|
||||||
|
|
||||||
- [ ] **Step 1.2: Verify**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
vault kv get -field=profile_creation_pass secret/drone-logbook | wc -c
|
|
||||||
```
|
|
||||||
|
|
||||||
Expected: `32` or `33` (32 base64 characters, plus one if the CLI appends a trailing newline). Never echo the value itself.
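
The character count follows from how base64 pads its output: every 3 input bytes become 4 characters. A small sketch of that arithmetic, with `b64_len` as a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Length of the base64 encoding of n bytes: 4 characters per started 3-byte group.
b64_len() {
  echo $(( ( ($1 + 2) / 3 ) * 4 ))
}

b64_len 24   # prints: 32

# Cross-check against real random bytes without revealing any secret:
head -c 24 /dev/urandom | base64 | tr -d '\n' | wc -c
```

So `openssl rand -base64 24` yields a 32-character password; any extra byte seen by `wc -c` is a trailing newline, not part of the secret.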
|
|
||||||
|
|
||||||
### Task 2: NFS drop folder on 192.168.1.127
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Modify: `secrets/nfs_directories.txt` (git-crypt'd; **edit from the MAIN checkout only**, never the worktree; sorted list, add `drone-logbook-backup` and `drone-logbook/sync-logs`)
|
|
||||||
|
|
||||||
- [ ] **Step 2.1: Create the directories** — world-writable + setgid like `vaultwarden-backup` (the `/srv/nfs` export root-squashes, so pod-root writes land as `nobody`):
|
|
||||||
|
|
||||||
```bash
|
|
||||||
ssh root@192.168.1.127 'mkdir -p /srv/nfs/drone-logbook/sync-logs /srv/nfs/drone-logbook-backup && chown -R root:www-data /srv/nfs/drone-logbook /srv/nfs/drone-logbook-backup && chmod 2777 /srv/nfs/drone-logbook/sync-logs /srv/nfs/drone-logbook-backup && ls -ld /srv/nfs/drone-logbook/sync-logs /srv/nfs/drone-logbook-backup'
|
|
||||||
```
|
|
||||||
|
|
||||||
Expected: `drwxrwsrwx ... root www-data ...` for both.
|
|
||||||
No `/etc/exports` (`scripts/pve-nfs-exports`) change — `/srv/nfs` is exported whole-tree.
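
To see what the expected `ls -ld` string encodes, the mode bits can be reproduced on a scratch directory. This sketch assumes GNU `stat` (for `-c '%A'`) and a user who belongs to the directory's group; `mode_of` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Print the symbolic permission string of a path (GNU stat).
mode_of() {
  stat -c '%A' "$1"
}

d=$(mktemp -d)
chmod 2777 "$d"; mode_of "$d"   # prints: drwxrwsrwx  (setgid + world-writable)
chmod 2775 "$d"; mode_of "$d"   # prints: drwxrwsr-x  (setgid, not world-writable)
rmdir "$d"
```

The `s` in the group triad is the setgid bit, which makes new entries inherit the `www-data` group; the final `w` is what lets root-squashed (`nobody`) writers create files.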
|
|
||||||
|
|
||||||
- [ ] **Step 2.2: Record them in the declarative list (MAIN checkout, plaintext there)** — insert `drone-logbook-backup` and `drone-logbook/sync-logs` (after `diun`, before `etcd-backup`) in `~/code/infra/secrets/nfs_directories.txt`, then commit that single file to master:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git -C ~/code/infra add secrets/nfs_directories.txt
|
|
||||||
git -C ~/code/infra commit -m "nfs_directories: add drone-logbook/sync-logs
|
|
||||||
|
|
||||||
Drop folder for the new drone-logbook stack's auto-import (SYNC_LOGS_PATH).
|
|
||||||
Directory created on 192.168.1.127 root:www-data 2777."
|
|
||||||
git -C ~/code/infra push forgejo master
|
|
||||||
```
|
|
||||||
|
|
||||||
(Trivial single-file exception per execution.md; encrypted files cannot be edited from the worktree.)
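
Since the list is kept sorted, the insertion point can be sanity-checked mechanically. The sketch below assumes the file uses plain bytewise (C locale) ordering, which the source does not state; `is_sorted` is a hypothetical helper, and only the four entries named in this step are used.

```bash
#!/usr/bin/env bash
# Succeeds when stdin is already in bytewise-sorted order.
is_sorted() {
  LC_ALL=C sort -c 2>/dev/null
}

printf '%s\n' diun drone-logbook-backup drone-logbook/sync-logs etcd-backup \
  | is_sorted && echo "order ok"   # prints: order ok
```

`drone-logbook-backup` sorts before `drone-logbook/sync-logs` because `-` precedes `/` in ASCII.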
|
|
||||||
|
|
||||||
### Task 3: Stack files (in the `wizard/drone-logbook` worktree)
|
|
||||||
|
|
||||||
**Files:**
|
|
||||||
- Create: `stacks/drone-logbook/main.tf` (content below)
|
|
||||||
- Create: `stacks/drone-logbook/terragrunt.hcl` (content below)
|
|
||||||
- Create: `stacks/drone-logbook/secrets` → symlink to `../../secrets`
|
|
||||||
- (`backend.tf`, `tiers.tf`, `cloudflare_provider.tf`, `providers.tf`, `.terraform.lock.hcl` are terragrunt-generated and **gitignored** — do NOT create or commit them; the tracked copies in old stacks like freshrss predate the ignore rule)
|
|
||||||
|
|
||||||
- [ ] **Step 3.1: `terragrunt.hcl`**
|
|
||||||
|
|
||||||
```hcl
include "root" {
  path = find_in_parent_folders()
}

dependency "platform" {
  config_path  = "../platform"
  skip_outputs = true
}
```
|
|
||||||
|
|
||||||
- [ ] **Step 3.2: `main.tf`** — exact content:
|
|
||||||
|
|
||||||
```hcl
|
|
||||||
variable "tls_secret_name" {
|
|
||||||
type = string
|
|
||||||
sensitive = true
|
|
||||||
}
|
|
||||||
variable "nfs_server" { type = string }
|
|
||||||
|
|
||||||
# Open DroneLog (https://github.com/arpanghosh8453/open-dronelog) — self-hosted
|
|
||||||
# DJI flight-log analyzer for the DJI Mini 4 Pro. Runs the UPSTREAM image (the
|
|
||||||
# ViktorBarzin/drone-logbook fork has no custom commits); Keel tracks :latest.
|
|
||||||
# Design: docs/plans/2026-07-04-drone-logbook-design.md
|
|
||||||
resource "kubernetes_namespace" "drone_logbook" {
|
|
||||||
metadata {
|
|
||||||
name = "drone-logbook"
|
|
||||||
labels = {
|
|
||||||
tier = local.tiers.aux
|
|
||||||
"keel.sh/enrolled" = "true"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
lifecycle {
|
|
||||||
# KYVERNO_LIFECYCLE_V1: goldilocks-vpa-auto-mode ClusterPolicy stamps this label on every namespace
|
|
||||||
ignore_changes = [metadata[0].labels["goldilocks.fairwinds.com/vpa-update-mode"]]
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
resource "kubernetes_manifest" "external_secret" {
|
|
||||||
field_manager {
|
|
||||||
force_conflicts = true
|
|
||||||
}
|
|
||||||
manifest = {
|
|
||||||
apiVersion = "external-secrets.io/v1"
|
|
||||||
kind = "ExternalSecret"
|
|
||||||
metadata = {
|
|
||||||
name = "drone-logbook-secrets"
|
|
||||||
namespace = "drone-logbook"
|
|
||||||
}
|
|
||||||
spec = {
|
|
||||||
refreshInterval = "15m"
|
|
||||||
secretStoreRef = {
|
|
||||||
name = "vault-kv"
|
|
||||||
kind = "ClusterSecretStore"
|
|
||||||
}
|
|
||||||
target = {
|
|
||||||
name = "drone-logbook-secrets"
|
|
||||||
}
|
|
||||||
dataFrom = [{
|
|
||||||
extract = {
|
|
||||||
key = "drone-logbook"
|
|
||||||
}
|
|
||||||
}]
|
|
||||||
}
|
|
||||||
}
|
|
||||||
depends_on = [kubernetes_namespace.drone_logbook]
|
|
||||||
}
|
|
||||||
|
|
||||||
module "tls_secret" {
|
|
||||||
source = "../../modules/kubernetes/setup_tls_secret"
|
|
||||||
namespace = kubernetes_namespace.drone_logbook.metadata[0].name
|
|
||||||
tls_secret_name = var.tls_secret_name
|
|
||||||
}
|
|
||||||
|
|
||||||
# DuckDB database + cached DJI decryption keys + uploaded originals.
|
|
||||||
# Embedded DB -> block storage, not NFS (same rationale as freshrss data).
|
|
||||||
# Encrypted class: flight logs are GPS traces of home/travel (sensitive data
|
|
||||||
# -> proxmox-lvm-encrypted per the storage decision rule in .claude/CLAUDE.md).
|
|
||||||
resource "kubernetes_persistent_volume_claim" "data" {
|
|
||||||
wait_until_bound = false
|
|
||||||
metadata {
|
|
||||||
name = "drone-logbook-data-encrypted"
|
|
||||||
namespace = kubernetes_namespace.drone_logbook.metadata[0].name
|
|
||||||
annotations = {
|
|
||||||
"resize.topolvm.io/threshold" = "10%"
|
|
||||||
"resize.topolvm.io/increase" = "100%"
|
|
||||||
"resize.topolvm.io/storage_limit" = "10Gi"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
spec {
|
|
||||||
access_modes = ["ReadWriteOnce"]
|
|
||||||
storage_class_name = "proxmox-lvm-encrypted"
|
|
||||||
resources {
|
|
||||||
requests = {
|
|
||||||
storage = "2Gi"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
lifecycle {
|
|
||||||
# The autoresizer expands requests.storage up to storage_limit and PVCs
|
|
||||||
# can't shrink; without this every apply tries to revert the size.
|
|
||||||
ignore_changes = [spec[0].resources[0].requests]
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
# Drop folder: any producer (Nextcloud sync, scp, future phone pipeline) lands
|
|
||||||
# DJI .txt logs here over NFS; the app auto-imports on SYNC_INTERVAL.
|
|
||||||
module "nfs_sync_logs" {
|
|
||||||
source = "../../modules/kubernetes/nfs_volume"
|
|
||||||
name = "drone-logbook-sync-logs"
|
|
||||||
namespace = kubernetes_namespace.drone_logbook.metadata[0].name
|
|
||||||
nfs_server = var.nfs_server
|
|
||||||
nfs_path = "/srv/nfs/drone-logbook/sync-logs"
|
|
||||||
storage = "5Gi"
|
|
||||||
}
|
|
||||||
|
|
||||||
resource "kubernetes_deployment" "drone_logbook" {
|
|
||||||
metadata {
|
|
||||||
name = "drone-logbook"
|
|
||||||
namespace = kubernetes_namespace.drone_logbook.metadata[0].name
|
|
||||||
labels = {
|
|
||||||
app = "drone-logbook"
|
|
||||||
"kubernetes.io/cluster-service" = "true"
|
|
||||||
tier = local.tiers.aux
|
|
||||||
}
|
|
||||||
}
|
|
||||||
spec {
|
|
||||||
replicas = 1
|
|
||||||
strategy {
|
|
||||||
# DuckDB is single-writer; never overlap two pods on the same volume
|
|
||||||
type = "Recreate"
|
|
||||||
}
|
|
||||||
selector {
|
|
||||||
match_labels = {
|
|
||||||
app = "drone-logbook"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
template {
|
|
||||||
metadata {
|
|
||||||
labels = {
|
|
||||||
app = "drone-logbook"
|
|
||||||
"kubernetes.io/cluster-service" = "true"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
spec {
|
|
||||||
container {
|
|
||||||
name = "drone-logbook"
|
|
||||||
image = "ghcr.io/arpanghosh8453/open-dronelog:latest"
|
|
||||||
env {
|
|
||||||
name = "RUST_LOG"
|
|
||||||
value = "info"
|
|
||||||
}
|
|
||||||
env {
|
|
||||||
# keep re-importable originals under /data/drone-logbook/uploaded
|
|
||||||
name = "KEEP_UPLOADED_FILES"
|
|
||||||
value = "true"
|
|
||||||
}
|
|
||||||
env {
|
|
||||||
name = "SYNC_LOGS_PATH"
|
|
||||||
value = "/sync-logs"
|
|
||||||
}
|
|
||||||
env {
|
|
||||||
# 6-field cron (sec min hour dom mon dow): scan drop folder every 8h
|
|
||||||
name = "SYNC_INTERVAL"
|
|
||||||
value = "0 0 */8 * * *"
|
|
||||||
}
|
|
||||||
env {
|
|
||||||
name = "PROFILE_CREATION_PASS"
|
|
||||||
value_from {
|
|
||||||
secret_key_ref {
|
|
||||||
name = "drone-logbook-secrets"
|
|
||||||
key = "profile_creation_pass"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
volume_mount {
|
|
||||||
name = "data"
|
|
||||||
mount_path = "/data/drone-logbook"
|
|
||||||
}
|
|
||||||
volume_mount {
|
|
||||||
name = "sync-logs"
|
|
||||||
mount_path = "/sync-logs"
|
|
||||||
read_only = true
|
|
||||||
}
|
|
||||||
port {
|
|
||||||
name = "http"
|
|
||||||
container_port = 80
|
|
||||||
protocol = "TCP"
|
|
||||||
}
|
|
||||||
resources {
|
|
||||||
requests = {
|
|
||||||
cpu = "25m"
|
|
||||||
memory = "512Mi"
|
|
||||||
}
|
|
||||||
limits = {
|
|
||||||
memory = "512Mi"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
volume {
|
|
||||||
name = "data"
|
|
||||||
persistent_volume_claim {
|
|
||||||
claim_name = kubernetes_persistent_volume_claim.data.metadata[0].name
|
|
||||||
}
|
|
||||||
}
|
|
||||||
volume {
|
|
||||||
name = "sync-logs"
|
|
||||||
persistent_volume_claim {
|
|
||||||
claim_name = module.nfs_sync_logs.claim_name
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
depends_on = [kubernetes_manifest.external_secret]
|
|
||||||
lifecycle {
|
|
||||||
ignore_changes = [
|
|
||||||
spec[0].template[0].spec[0].dns_config, # KYVERNO_LIFECYCLE_V1
|
|
||||||
metadata[0].annotations["keel.sh/policy"],
|
|
||||||
metadata[0].annotations["keel.sh/trigger"],
|
|
||||||
metadata[0].annotations["keel.sh/pollSchedule"], # KYVERNO_LIFECYCLE_V2
|
|
||||||
metadata[0].annotations["keel.sh/match-tag"],
|
|
||||||
spec[0].template[0].spec[0].container[0].image, # KEEL_IGNORE_IMAGE — Keel manages tag updates
|
|
||||||
metadata[0].annotations["kubernetes.io/change-cause"],
|
|
||||||
metadata[0].annotations["deployment.kubernetes.io/revision"],
|
|
||||||
spec[0].template[0].metadata[0].annotations["keel.sh/update-time"], # KEEL_LIFECYCLE_V1
|
|
||||||
]
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
resource "kubernetes_service" "drone_logbook" {
|
|
||||||
metadata {
|
|
||||||
name = "drone-logbook"
|
|
||||||
namespace = kubernetes_namespace.drone_logbook.metadata[0].name
|
|
||||||
labels = {
|
|
||||||
"app" = "drone-logbook"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
spec {
|
|
||||||
selector = {
|
|
||||||
app = "drone-logbook"
|
|
||||||
}
|
|
||||||
port {
|
|
||||||
port = "80"
|
|
||||||
target_port = "80"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
# -----------------------------------------------------------------------------
|
|
||||||
# Backup — required for every proxmox-lvm(-encrypted) app: daily copy of the
|
|
||||||
# data volume to NFS /srv/nfs/drone-logbook-backup (picked up by nfs-mirror ->
|
|
||||||
# sda -> Synology offsite). 01:30 = outside the 00:00/08:00/16:00 sync-import
|
|
||||||
# windows, so the DuckDB file is quiescent; uploaded originals make even a
|
|
||||||
# mid-write copy recoverable by re-import. Pod-affinity co-schedules with the
|
|
||||||
# app pod (RWO volume mounts twice only on the same node). Vaultwarden pattern.
|
|
||||||
# -----------------------------------------------------------------------------
|
|
||||||
|
|
||||||
module "nfs_backup" {
|
|
||||||
source = "../../modules/kubernetes/nfs_volume"
|
|
||||||
name = "drone-logbook-backup-host"
|
|
||||||
namespace = kubernetes_namespace.drone_logbook.metadata[0].name
|
|
||||||
nfs_server = var.nfs_server
|
|
||||||
nfs_path = "/srv/nfs/drone-logbook-backup"
|
|
||||||
}
|
|
||||||
|
|
||||||
resource "kubernetes_cron_job_v1" "backup" {
|
|
||||||
metadata {
|
|
||||||
name = "drone-logbook-backup"
|
|
||||||
namespace = kubernetes_namespace.drone_logbook.metadata[0].name
|
|
||||||
}
|
|
||||||
spec {
|
|
||||||
concurrency_policy = "Replace"
|
|
||||||
failed_jobs_history_limit = 5
|
|
||||||
schedule = "30 1 * * *"
|
|
||||||
starting_deadline_seconds = 300
|
|
||||||
successful_jobs_history_limit = 3
|
|
||||||
job_template {
|
|
||||||
metadata {}
|
|
||||||
spec {
|
|
||||||
backoff_limit = 3
|
|
||||||
ttl_seconds_after_finished = 10
|
|
||||||
template {
|
|
||||||
metadata {}
|
|
||||||
spec {
|
|
||||||
affinity {
|
|
||||||
pod_affinity {
|
|
||||||
required_during_scheduling_ignored_during_execution {
|
|
||||||
label_selector {
|
|
||||||
match_labels = {
|
|
||||||
app = "drone-logbook"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
topology_key = "kubernetes.io/hostname"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
container {
|
|
||||||
name = "drone-logbook-backup"
|
|
||||||
image = "docker.io/library/alpine"
|
|
||||||
command = ["/bin/sh", "-c", <<-EOT
|
|
||||||
set -euxo pipefail
|
|
||||||
_t0=$(date +%s)
|
|
||||||
now=$(date +"%Y_%m_%d_%H_%M")
|
|
||||||
mkdir -p /backup/$now
|
|
||||||
cp -a /data/. /backup/$now/
|
|
||||||
# Rotate — 30 day retention
|
|
||||||
find /backup -maxdepth 1 -mindepth 1 -type d -mtime +30 -exec rm -rf {} +
|
|
||||||
_dur=$(($(date +%s) - _t0))
|
|
||||||
_out_bytes=$(du -sb /backup/$now | awk '{print $1}')
|
|
||||||
wget -qO- --post-data "backup_duration_seconds $${_dur}
|
|
||||||
backup_output_bytes $${_out_bytes}
|
|
||||||
backup_last_success_timestamp $(date +%s)
|
|
||||||
" "http://prometheus-prometheus-pushgateway.monitoring:9091/metrics/job/drone-logbook-backup" || true
|
|
||||||
EOT
|
|
||||||
]
|
|
||||||
volume_mount {
|
|
||||||
name = "data"
|
|
||||||
mount_path = "/data"
|
|
||||||
read_only = true
|
|
||||||
}
|
|
||||||
volume_mount {
|
|
||||||
name = "backup"
|
|
||||||
mount_path = "/backup"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
volume {
|
|
||||||
name = "data"
|
|
||||||
persistent_volume_claim {
|
|
||||||
claim_name = kubernetes_persistent_volume_claim.data.metadata[0].name
|
|
||||||
}
|
|
||||||
}
|
|
||||||
volume {
|
|
||||||
name = "backup"
|
|
||||||
persistent_volume_claim {
|
|
||||||
claim_name = module.nfs_backup.claim_name
|
|
||||||
}
|
|
||||||
}
|
|
||||||
dns_config {
|
|
||||||
option {
|
|
||||||
name = "ndots"
|
|
||||||
value = "2"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
lifecycle {
|
|
||||||
# KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
|
|
||||||
ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config]
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
# https://dronelog.viktorbarzin.me
|
|
||||||
module "ingress" {
|
|
||||||
source = "../../modules/kubernetes/ingress_factory"
|
|
||||||
auth = "required" # Authentik forward-auth — flight logs are GPS traces of home/travel
|
|
||||||
dns_type = "proxied"
|
|
||||||
namespace = kubernetes_namespace.drone_logbook.metadata[0].name
|
|
||||||
name = "dronelog"
|
|
||||||
service_name = "drone-logbook"
|
|
||||||
tls_secret_name = var.tls_secret_name
|
|
||||||
extra_annotations = {
|
|
||||||
"gethomepage.dev/enabled" = "true"
|
|
||||||
"gethomepage.dev/name" = "Drone Logbook"
|
|
||||||
"gethomepage.dev/description" = "DJI flight log analyzer"
|
|
||||||
"gethomepage.dev/icon" = "mdi-quadcopter"
|
|
||||||
"gethomepage.dev/group" = "Media & Entertainment"
|
|
||||||
"gethomepage.dev/pod-selector" = ""
|
|
||||||
}
|
|
||||||
}
|
|
||||||
```
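
The backup comment in `main.tf` above argues that 01:30 falls outside the sync-import windows. A quick way to check that reasoning, assuming `*/8` in the hour field fires at hours divisible by 8 starting from midnight (the helper names are hypothetical):

```bash
#!/usr/bin/env bash
# List the daily start times for an "every <step> hours" schedule.
sync_hours() {
  local step=$1 h=0 out=""
  while [ "$h" -lt 24 ]; do
    out="$out$(printf '%02d:00' "$h") "
    h=$(( h + step ))
  done
  echo "${out% }"
}

# Minutes elapsed since the most recent sync start at hh:mm.
minutes_after_sync() {
  local hh=$1 mm=$2 step=$3
  echo $(( (hh * 60 + mm) % (step * 60) ))
}

sync_hours 8                # prints: 00:00 08:00 16:00
minutes_after_sync 1 30 8   # prints: 90
```

The backup therefore starts 90 minutes after the midnight scan and 6.5 hours before the next one, which only helps if an import finishes well within that first 90 minutes.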
|
|
||||||
|
|
||||||
- [ ] **Step 3.3: Boilerplate**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
ln -s ../../secrets ~/code/infra/.worktrees/drone-logbook/stacks/drone-logbook/secrets
|
|
||||||
```
|
|
||||||
|
|
||||||
- [ ] **Step 3.4: Format check**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
terraform fmt -check -diff $WT/stacks/drone-logbook/ || terraform fmt $WT/stacks/drone-logbook/
|
|
||||||
```
|
|
||||||
|
|
||||||
Expected: no diff (or auto-fixed).
|
|
||||||
|
|
||||||
- [ ] **Step 3.5: Commit on the branch (files by name, git-crypt filter flags per execution.md)**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git -C $WT -c filter.git-crypt.smudge=cat -c filter.git-crypt.clean=cat -c filter.git-crypt.required=false \
|
|
||||||
add docs/plans/2026-07-04-drone-logbook-design.md docs/plans/2026-07-04-drone-logbook-plan.md \
|
|
||||||
stacks/drone-logbook/main.tf stacks/drone-logbook/terragrunt.hcl stacks/drone-logbook/secrets \
|
|
||||||
.claude/reference/service-catalog.md
|
|
||||||
git -C $WT -c filter.git-crypt.smudge=cat -c filter.git-crypt.clean=cat -c filter.git-crypt.required=false \
|
|
||||||
commit -m "drone-logbook: new stack — self-hosted Open DroneLog at dronelog.viktorbarzin.me
|
|
||||||
|
|
||||||
Viktor asked to self-host the DJI flight-log analyzer for his DJI Mini 4 Pro
|
|
||||||
(fork ViktorBarzin/drone-logbook -> upstream arpanghosh8453/open-dronelog).
|
|
||||||
Upstream ghcr image with Keel auto-upgrade, DuckDB data on proxmox-lvm-encrypted PVC,
|
|
||||||
NFS /sync-logs drop folder auto-imported every 8h, Authentik-gated ingress,
|
|
||||||
PROFILE_CREATION_PASS from Vault via ESO. Design + plan in docs/plans/."
|
|
||||||
```
|
|
||||||
|
|
||||||
### Task 4: Land and apply
|
|
||||||
|
|
||||||
- [ ] **Step 4.1: Presence claim** (CI apply mutates shared infra)
|
|
||||||
|
|
||||||
```bash
|
|
||||||
~/code/scripts/presence claim infra:drone-logbook --purpose "deploy new drone-logbook stack (Open DroneLog) via CI apply"
|
|
||||||
```
|
|
||||||
|
|
||||||
- [ ] **Step 4.2: Merge latest master into the branch, push to master**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git -C $WT -c filter.git-crypt.smudge=cat -c filter.git-crypt.clean=cat -c filter.git-crypt.required=false fetch forgejo
|
|
||||||
git -C $WT -c filter.git-crypt.smudge=cat -c filter.git-crypt.clean=cat -c filter.git-crypt.required=false merge forgejo/master
|
|
||||||
git -C $WT -c filter.git-crypt.smudge=cat -c filter.git-crypt.clean=cat -c filter.git-crypt.required=false push forgejo HEAD:master
|
|
||||||
```
|
|
||||||
|
|
||||||
Non-fast-forward → another agent landed first: fetch, merge, push again. Branch-protection rejection → fall back to PR via Forgejo API (token = password in `~/.git-credentials`).
|
|
||||||
|
|
||||||
- [ ] **Step 4.3: Watch the CI apply to completion** — Woodpecker pipeline on the infra repo (`ci.viktorbarzin.me`), then confirm live:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
kubectl get ns drone-logbook && kubectl -n drone-logbook get deploy,pvc,pods,externalsecret,cronjob
|
|
||||||
kubectl -n drone-logbook rollout status deploy/drone-logbook --timeout=300s
|
|
||||||
```
|
|
||||||
|
|
||||||
Expected: namespace present, ExternalSecret `SecretSynced`, data PVC `Bound` (the NFS PVCs bind on first pod/job use), CronJob `drone-logbook-backup` scheduled `30 1 * * *`, pod `Running 1/1`.
|
|
||||||
|
|
||||||
- [ ] **Step 4.4: Cleanup worktree + branch; release presence**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
git -C ~/code/infra worktree remove .worktrees/drone-logbook
|
|
||||||
git -C ~/code/infra branch -d wizard/drone-logbook
|
|
||||||
git -C ~/code/infra pull --ff-only # only if main checkout clean/quiescent
|
|
||||||
~/code/scripts/presence release infra:drone-logbook
|
|
||||||
```
|
|
||||||
|
|
||||||
### Task 5: End-to-end verification
|
|
||||||
|
|
||||||
- [ ] **Step 5.1: Ingress + Authentik gate**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
curl -sI https://dronelog.viktorbarzin.me | head -5
|
|
||||||
```
|
|
||||||
|
|
||||||
Expected: `302` redirect into Authentik (NOT `200`, NOT `404`).
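
If this check is scripted rather than eyeballed, the status line can be classified automatically. A minimal sketch, assuming the first line of `curl -sI` output carries the status code as its second field; `gate_check` is a hypothetical helper and the sample response is fabricated for the demo.

```bash
#!/usr/bin/env bash
# Read `curl -sI` output on stdin and classify the first status line.
gate_check() {
  local code
  code=$(tr -d '\r' | awk 'NR==1 {print $2}')
  case "$code" in
    302) echo "gated" ;;             # redirected, presumably into Authentik
    200) echo "OPEN" ;;              # served without any auth redirect
    *)   echo "unexpected:$code" ;;
  esac
}

# Real usage (needs network access):
#   curl -sI https://dronelog.viktorbarzin.me | gate_check
printf 'HTTP/2 302 \r\nlocation: https://example.invalid/\r\n' | gate_check   # prints: gated
```

A `302` alone does not prove the redirect targets Authentik, so the `location` header is still worth a glance on first deploy.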
|
|
||||||
|
|
||||||
- [ ] **Step 5.2: App alive behind the gate** (bypass ingress via port-forward, read-only debug)
|
|
||||||
|
|
||||||
```bash
|
|
||||||
kubectl -n drone-logbook port-forward svc/drone-logbook 18080:80 &
|
|
||||||
sleep 2; curl -s -o /dev/null -w '%{http_code}\n' http://127.0.0.1:18080/; kill %1
|
|
||||||
```
|
|
||||||
|
|
||||||
Expected: `200`.
|
|
||||||
|
|
||||||
- [ ] **Step 5.3: Sync folder visible in-pod**
|
|
||||||
|
|
||||||
```bash
|
|
||||||
kubectl -n drone-logbook exec deploy/drone-logbook -- ls -ld /sync-logs /data/drone-logbook
|
|
||||||
```
|
|
||||||
|
|
||||||
Expected: both directories listed; `/sync-logs` read-only mount.
|
|
||||||
|
|
||||||
- [ ] **Step 5.4: Monitor + homepage** — Uptime Kuma external monitor for `dronelog.viktorbarzin.me` auto-created (ingress annotation); homepage tile under "Media & Entertainment".
|
|
||||||
|
|
||||||
- [ ] **Step 5.5: Functional import** — Viktor uploads a real Mini 4 Pro `.txt` log via the web UI (or drops it in `/srv/nfs/drone-logbook/sync-logs`); confirms flight appears with charts/map. Requires pod egress to DJI once per new log (decryption key). If an upstream sample log is available, the agent may pre-verify import via the REST API through the port-forward.
|
|
||||||
|
|
@ -1,125 +0,0 @@
|
||||||
# immich-frame: LAN-only access, Portals untouched (2026-07-04)
|
|
||||||
|
|
||||||
## Goal
|
|
||||||
|
|
||||||
Strangers must no longer be able to view `highlights-immich.viktorbarzin.me`
|
|
||||||
(Viktor's London Portal Plus frame) or `highlights-immich-emo.viktorbarzin.me`
|
|
||||||
(Emo's Sofia Portal Mini frame) — pages or ImmichFrame API. Both were
|
|
||||||
`auth = "none"`, Cloudflare-proxied, fully public.
|
|
||||||
|
|
||||||
Who keeps access (per Viktor, this session): the two Portals plus **any
|
|
||||||
household device on the Sofia, London, or Valchedrym home networks**. No
|
|
||||||
public access, no tailnet requirement. Hard constraint: the Portal app is a
|
|
||||||
WebView with the URL **baked in at APK build time** (`portal-immich-frame`,
|
|
||||||
`-PframeUrl`), so the exact URLs must keep loading from where the Portals sit
|
|
||||||
— zero app rebuilds, zero device touches, zero router changes.
|
|
||||||
|
|
||||||
## Design
|
|
||||||
|
|
||||||
Two cooperating pieces — the gate and the reachability pointer:
|
|
||||||
|
|
||||||
1. **The gate — `home-lans-only` Traefik middleware** (traefik stack, next to
|
|
||||||
`local-only`): `ipAllowList` of `192.168.1.0/24` (Sofia LAN), `10.0.0.0/8`
|
|
||||||
(VLANs, K8s pods `10.10.0.0/16`, services `10.96.0.0/12`, WG tunnel
|
|
||||||
`10.3.2.0/24`), `192.168.8.0/24` (London LAN), `192.168.9.0/24` (London
|
|
||||||
GUEST net — post-rollout discovery: the Portal Plus actually leases here,
|
|
||||||
`Portal-75AE8F9C2A8A` = `192.168.9.198`, added same day), `192.168.0.0/24`
|
|
||||||
(Valchedrym LAN), `fc00::/7`, `fe80::/10`. Attached to both frame
|
|
||||||
ingresses via `extra_middlewares`. Everyone else gets a Traefik 403 —
|
|
||||||
including direct-to-WAN-IP requests carrying the right SNI, which DNS
|
|
||||||
changes alone cannot stop. A **separate** middleware rather than a widened
|
|
||||||
`local-only`, because widening would silently grant the remote LANs access
|
|
||||||
to the 9 admin surfaces using it (Prometheus, iDRAC, Loki, …).
|
|
||||||
|
|
||||||
2. **The pointer — `dns_type = "internal"`** (new `ingress_factory` tier,
|
|
||||||
Viktor's idea): a **non-proxied public A record → `10.0.20.203`** (module
|
|
||||||
var `internal_lb_ip`). Outsiders resolve it but get an unroutable RFC1918
|
|
||||||
address; every household resolver path delivers a working answer with no
|
|
||||||
config anywhere: Sofia LAN already gets the internal CNAME from Technitium,
|
|
||||||
London/Valchedrym resolve the public record via any upstream and
|
|
||||||
policy-route `10.0.0.0/8` down the WireGuard tunnel. IPv4-only (spokes
|
|
||||||
route no internal v6 range).
|
|
||||||
|
|
||||||
Interlock (the reason both flip together): with a *proxied* record, public
|
|
||||||
traffic arrives from cloudflared **pod IPs inside 10/8** and would sail
|
|
||||||
through the allowlist. `internal` removes the Cloudflare path entirely (CF
|
|
||||||
edge stops serving the hostname), so every request reaches Traefik with its
|
|
||||||
real source IP (ETP=Local). Verified: no wildcard `*.viktorbarzin.me` record
|
|
||||||
exists to resurrect public resolution.
|
|
||||||
|
|
||||||
`auth` stays `"none"` — there is still no *user* auth by design (kiosk
|
|
||||||
WebView; forward-auth would 302 the device to a login it can't complete, and
|
|
||||||
Emo's Google-only account can't log in inside a WebView at all); the
|
|
||||||
convention comment now names the ipAllowList as the gate.
|
|
||||||
|
|
||||||
### Resulting flows
|
|
||||||
|
|
||||||
| Client | Path | Result |
|---|---|---|
| Emo's Portal Mini (Sofia LAN) | Technitium CNAME → `.203` direct (unchanged) | allowed (`192.168.1.x`) |
| Viktor's Portal Plus (London GUEST net) | public A → `10.0.20.203` → WG tunnel | allowed (`192.168.9.x`) |
| Household browsers (any of the 3 LANs) | same as above | allowed |
| In-cluster checks (`homelab browser`, blackbox) | CoreDNS → Technitium → `.203` | allowed (pod IP in 10/8) |
| Stranger, resolves hostname | gets `10.0.20.203` | unroutable |
| Stranger, hits WAN IP with SNI | pfSense NAT → Traefik (real source IP) | **403** |
| Stranger, via Cloudflare | no proxied record | CF edge won't serve the host |
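
The allow/deny outcomes in the table can be reproduced with a small IPv4 matcher. This is an illustrative sketch only: it re-implements CIDR matching in bash rather than calling Traefik, it covers just the IPv4 ranges listed in the design (not `fc00::/7` or `fe80::/10`), and all function names are hypothetical.

```bash
#!/usr/bin/env bash
# IPv4 ranges from the home-lans-only allowlist described above.
ALLOW="192.168.1.0/24 10.0.0.0/8 192.168.8.0/24 192.168.9.0/24 192.168.0.0/24"

# Convert a dotted quad to a 32-bit integer.
ip_to_int() {
  local IFS=.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# in_cidr <ip> <net/bits>: succeed when the address falls inside the range.
in_cidr() {
  local ip=$1 net=${2%/*} bits=${2#*/}
  local mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$ip") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}

# Print what the gate would answer for a given source address.
verdict() {
  local cidr
  for cidr in $ALLOW; do
    if in_cidr "$1" "$cidr"; then echo "allowed"; return; fi
  done
  echo "403"
}

verdict 192.168.9.198   # Portal Plus on the London guest net: allowed
verdict 10.10.3.4       # any pod IP inside 10/8: allowed
verdict 203.0.113.7     # stranger hitting the WAN IP: 403
```

The second call is the interlock in miniature: a source address inside `10.0.0.0/8`, such as a cloudflared pod, passes the list, which is why the proxied record has to go at the same time as the middleware is attached.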
|
|
||||||
|
|
||||||
### Rejected alternatives
|
|
||||||
|
|
||||||
- **ImmichFrame `AuthenticationSecret`** (supported upstream: web input field
|
|
||||||
or `?authsecret=` param + bearer API): real auth from anywhere, but family
|
|
||||||
browsers would face a secret prompt (fails "household devices just work"),
|
|
||||||
the secret leaks into URLs/analytics/APK, and robust rollout needs APK
|
|
||||||
rebuild + USB-adb sideload on both Portals (the Sofia one is high-friction).
|
|
||||||
- **Authentik forward-auth / `auth = "public"`**: WebView can't complete SSO
|
|
||||||
(Google blocks WebView logins; session expiry silently bricks an appliance);
|
|
||||||
the anonymous outpost is an audit trail, not a gate.
|
|
||||||
- **Remove DNS + London router AdGuardHome rewrites**: works, but adds an
|
|
||||||
out-of-band, un-IaC'd router dependency the internal-IP record makes
|
|
||||||
unnecessary. Kept as documented fallback if resolver-side private-IP
|
|
||||||
filtering ever appears in the London path.
|
|
||||||
|
|
||||||
## Pre-verified facts (2026-07-04)
|
|
||||||
|
|
||||||
- London Flint 2 DNS chain returns RFC1918 answers unfiltered
|
|
||||||
(`nslookup 10.0.20.203.nip.io 127.0.0.1` on the router → `10.0.20.203`;
|
|
||||||
dnsmasq `rebind_protection '0'`, no AdGuardHome rebind filtering).
|
|
||||||
- Technitium already CNAMEs both hostnames → apex → `10.0.20.203`
|
|
||||||
(`technitium-ingress-dns-sync` is ingress-driven, not DNS-record-driven, so
|
|
||||||
the internal answer survives the Cloudflare record swap).
|
|
||||||
- Pod CIDR `10.10.0.0/16`, service CIDR `10.96.0.0/12` — inside `10.0.0.0/8`.
|
|
||||||
- No public wildcard record in the zone.
|
|
||||||
|
|
||||||
## Blast radius & cleanups
|
|
||||||
|
|
||||||
- `external_monitor = false` set explicitly on both ingresses: the
|
|
||||||
external-monitor-sync default opt-in would otherwise keep the now-doomed
|
|
||||||
`[External] highlights-immich*` uptime-kuma monitors alive and red. Verify
|
|
||||||
the sync drops them post-apply.
|
|
||||||
- rybbit CF worker: `highlights-immich` removed from `SITE_IDS` (`index.js`)
|
|
||||||
and `wrangler.toml` routes — off Cloudflare the route can never fire.
|
|
||||||
Requires a `wrangler deploy` to take effect (route removal is hygiene, not
|
|
||||||
functional).
|
|
||||||
- Homepage dashboard link keeps working from LANs (hostname unchanged).
|
|
||||||
- Docs updated in the same change: `.claude/CLAUDE.md` (DNS tier +
|
|
||||||
external-monitor mechanism), `AGENTS.md`, `docs/architecture/networking.md`
|
|
||||||
(Internal-IP domains category). The `portal-immich-frame` repo's glossary
|
|
||||||
("public, login-less URL") updated separately in that repo.
|
|
||||||
|
|
||||||
## Failure-mode delta
|
|
||||||
|
|
||||||
London frame now depends on the WG tunnel instead of Cloudflare+cloudflared
|
|
||||||
(the app self-heals with 5s retries; tunnel-flap modes documented in
|
|
||||||
`docs/architecture/vpn.md`). A Traefik LB renumber must update
|
|
||||||
`internal_lb_ip` in the module alongside the split-horizon apex record.
|
|
||||||
Cutover window: cached proxied answers keep working ≤ ~5 min TTL, then the
|
|
||||||
WebView's own retry picks up the new path.
|
|
||||||
|
|
||||||
## Verification & rollback
|
|
||||||
|
|
||||||
Verify: public dig → `10.0.20.203` (both hosts); Technitium dig → `.203`;
|
|
||||||
curl from devvm (10/8) → 200; external vantage (WebFetch/cloud) → unreachable
|
|
||||||
or 403; middleware attached on both ingresses; Emo's frame renders via
|
|
||||||
`homelab browser`; London Portal image fetches visible in Traefik access logs
|
|
||||||
from `192.168.8.x`. Rollback: `git revert` + apply traefik/immich — records
|
|
||||||
and middleware chain restore (`allow_overwrite = true` re-adopts the records).
|
|
||||||
|
|
@ -129,40 +129,3 @@ heavy user between 12–16G even with RAM free; bump to 16/20 if that bites.
|
||||||
storm also neuters oomd. earlyoom (free-RAM threshold, swap-independent) is the
|
storm also neuters oomd. earlyoom (free-RAM threshold, swap-independent) is the
|
||||||
correct pairing. A famous tool that "does OOM" still has to be proven to fire
|
correct pairing. A famous tool that "does OOM" still has to be proven to fire
|
||||||
under *your* configuration.
|
under *your* configuration.
|
||||||
|
|
||||||
## Addendum (2026-07-02): the MemoryHigh throttle band livelocks — removed
|
|
||||||
|
|
||||||
The soft-cap layer of this design was falsified in production on 2026-07-02
|
|
||||||
(~15:42–16:35 UTC): an agent-spawned `ugrep` (12.35G RSS; `-o` with wide
|
|
||||||
alternation captures over a multi-GB `.jsonl` transcript) **plateaued inside
|
|
||||||
t3-serve@wizard's `MemoryHigh=12G..MemoryMax=16G` band**. With
|
|
||||||
`MemorySwapMax=0` its anonymous pages were unreclaimable, so the kernel parked
|
|
||||||
every allocating task of the cgroup in `mem_cgroup_handle_over_high`
|
|
||||||
(`memory.pressure full avg60 ≈ 80%`, `memory.events high=882948`, `oom_kill=0`)
|
|
||||||
— including the `t3 serve` event loop (~0.5G RSS, pure collateral). The accept
|
|
||||||
queue backed up (21 pending connections), t3-probe logged `t3serve: [Errno 104]
|
|
||||||
Connection reset by peer`, t3-dispatch logged `proxy error: context canceled`,
|
|
||||||
and t3.viktorbarzin.me was dead for its user until the hog was SIGKILLed by
|
|
||||||
hand (the D-state high-throttle sleep IS killable; the cgroup dropped 14G→1.4G
|
|
||||||
and the service recovered in seconds with no restart).
|
|
||||||
|
|
||||||
The Verification bullet above — a soft-capped balloon "throttled to a crawl,
|
|
||||||
making no progress and **harming nothing**" — holds only when the hog is alone
|
|
||||||
in its cgroup. Sharing the cgroup with a latency-sensitive server, the crawl
|
|
||||||
IS the harm: a hog that stabilises below `MemoryMax` never triggers the local
|
|
||||||
OOM the design counted on, so the band converts "runaway dies" into "everyone
|
|
||||||
in the cgroup stalls forever".
|
|
||||||
|
|
||||||
**Fix (same day, admin-approved): `MemoryHigh=infinity` on all three work
|
|
||||||
cgroup definitions** — `scripts/t3-serve@.service`, the `user-.slice.d`
|
|
||||||
drop-in, and `docker.slice` (`setup-devvm.sh` §10a/§10c). A runaway now runs
|
|
||||||
unthrottled into `MemoryMax` and is cgroup-OOM-killed immediately
|
|
||||||
(`OOMPolicy=continue` keeps t3-serve itself alive; in slices the kernel kills
|
|
||||||
the biggest task). `MemoryMax`, `MemorySwapMax=0`, and earlyoom — the layers
|
|
||||||
the stress tests actually validated — are unchanged. Applied live via
|
|
||||||
`daemon-reload` + runtime `set-property` on the running cgroups; no session
|
|
||||||
restarts.
|
|
||||||
|
|
||||||
Lesson: **with `MemorySwapMax=0`, `memory.high` is not a gentler `memory.max`; it is
|
|
||||||
an unbounded stall injector for everything sharing the cgroup.** Cap-and-kill
|
|
||||||
beats throttle-and-pray for multi-tenant interactive services.
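
The incident signature described in this addendum (a climbing `high` counter, `oom_kill` stuck at 0, and a heavy `full` stall average) can be detected from the cgroup v2 files. The sketch below is illustrative only: the function names and the 50% stall threshold are assumptions, and nothing like it is claimed to exist in the repo.

```python
from pathlib import Path


def parse_events(text: str) -> dict[str, int]:
    """Parse cgroup v2 memory.events, which holds one 'key value' pair per line."""
    events = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            events[parts[0]] = int(parts[1])
    return events


def parse_full_avg60(text: str) -> float:
    """Extract avg60 from the 'full' line of memory.pressure (0.0 if absent)."""
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "full":
            for field in fields[1:]:
                key, _, value = field.partition("=")
                if key == "avg60":
                    return float(value)
    return 0.0


def looks_livelocked(before: dict, after: dict, full_avg60: float,
                     stall_pct: float = 50.0) -> bool:
    """True when throttling keeps firing, nothing was OOM-killed, and tasks stall.

    stall_pct is an assumed threshold; the incident showed roughly 80%.
    """
    high_growing = after.get("high", 0) > before.get("high", 0)
    no_kill = after.get("oom_kill", 0) == before.get("oom_kill", 0)
    return high_growing and no_kill and full_avg60 >= stall_pct


def sample(cgroup_dir: Path) -> tuple[dict, float]:
    """Read both files from a cgroup directory (the path is supplied by the caller)."""
    events = parse_events((cgroup_dir / "memory.events").read_text())
    pressure = parse_full_avg60((cgroup_dir / "memory.pressure").read_text())
    return events, pressure
```

Taking two samples a few seconds apart and passing them to `looks_livelocked` distinguishes a brief throttle from the sustained stall seen on 2026-07-02. With `MemoryHigh=infinity` the `high` counter should stop growing, so this check mainly guards against the soft band being reintroduced.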
|
|
||||||
|
|
|
||||||
|
|
@ -1,135 +0,0 @@
|
||||||
# Paperless-ngx Mail Ingest (docs@viktorbarzin.me)
|
|
||||||
|
|
||||||
Last updated: 2026-07-03 (initial build)
|
|
||||||
|
|
||||||
Forward any email with document attachments to **`docs@viktorbarzin.me`** and
|
|
||||||
paperless-ngx ingests the attachments, owned by the paperless account mapped
|
|
||||||
from the **sender** (From) address. Built entirely from existing parts: a
|
|
||||||
docker-mailserver mailbox + Dovecot sieve, and paperless-ngx's native mail
|
|
||||||
consumer (the same machinery as the `utility:` rules).
|
|
||||||
|
|
||||||
## Flow
|
|
||||||
|
|
||||||
```
|
|
||||||
family member forwards email ──> MX ──> docker-mailserver
|
|
||||||
│ postfix virtual: docs@ has an explicit self-alias (extra/aliases.txt),
|
|
||||||
│ so the @domain catch-all (→ spam@, swept by TripIt) does NOT apply
|
|
||||||
▼
|
|
||||||
Dovecot LMTP delivery to docs@
|
|
||||||
│ per-user sieve (docs@viktorbarzin.me.dovecot.sieve): sender NOT in
|
|
||||||
│ allowlist → discard (decision 2026-07-03: unmatched = ignore & delete)
|
|
||||||
▼
|
|
||||||
docs@ INBOX ── paperless-ngx mail task (every 10 min, PAPERLESS_EMAIL_TASK_CRON
|
|
||||||
│ default) applies mail rules in order: filter_from = <sender>
|
|
||||||
│ → consume attachments (ALL parts incl. inline — see design
|
|
||||||
│ notes: Apple Mail marks real PDFs inline), owner = mapped user,
|
|
||||||
│ tag = email-ingest, title = mail subject
|
|
||||||
▼
|
|
||||||
consumed mail is MOVED to the "Processed" IMAP folder (audit trail);
|
|
||||||
INBOX stays empty in steady state
|
|
||||||
```
|
|
||||||
|
|
||||||
## Sender → paperless account map (as built)
|
|
||||||
|
|
||||||
| Sender (From) | Paperless user | Rule |
|
|
||||||
|--------------------------|----------------|-----------------|
|
|
||||||
| me@viktorbarzin.me | root (id 3) | forward: Viktor (me@) |
|
|
||||||
| vbarzin@gmail.com | root (id 3) | forward: Viktor (gmail) |
|
|
||||||
| viktorbarzin@meta.com | root (id 3) | forward: Viktor (meta) |
|
|
||||||
| ancaelena98@gmail.com | anca (id 4) | forward: Anca |
|
|
||||||
| emil.barzin@gmail.com | emo (id 7) | forward: Emo |
|
|
||||||
|
|
||||||
The map lives in **two places by design** — keep them in sync:
|
|
||||||
|
|
||||||
1. **Delivery gate (infra, Terraform):**
|
|
||||||
`stacks/mailserver/modules/mailserver/extra/docs-at-viktorbarzin.me.dovecot.sieve`
|
|
||||||
— senders not listed here are discarded at delivery (spam control + the
|
|
||||||
"ignore and delete unmatched" behaviour; paperless cannot express
|
|
||||||
"delete without ingesting", so this must happen before the mailbox).
|
|
||||||
2. **Owner map (paperless DB, via API/UI):** one mail rule per sender on the
|
|
||||||
`docs@viktorbarzin.me` mail account. DB-state like workflows — NOT
|
|
||||||
Terraform.
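
Because the map lives in two places, a small drift check can help keep them in sync. The sketch below is a hedged illustration: `SENDER_MAP` is copied from the table above, while `map_drift` and the way the two sender lists are obtained (parsing the sieve file, reading `filter_from` from `GET /api/mail_rules/`) are assumptions left to the caller.

```python
# Sender -> paperless user, copied from the table above.
SENDER_MAP = {
    "me@viktorbarzin.me": "root",
    "vbarzin@gmail.com": "root",
    "viktorbarzin@meta.com": "root",
    "ancaelena98@gmail.com": "anca",
    "emil.barzin@gmail.com": "emo",
}


def map_drift(sieve_senders, rule_senders):
    """Compare the delivery-gate allowlist with the mail-rule senders.

    Returns (gate_only, rules_only), both sorted, after lowercasing and
    trimming each address.
    """
    gate = {s.strip().lower() for s in sieve_senders}
    rules = {s.strip().lower() for s in rule_senders}
    return sorted(gate - rules), sorted(rules - gate)


if __name__ == "__main__":
    gate_only, rules_only = map_drift(SENDER_MAP, SENDER_MAP)
    print("allowed at delivery but no rule:", gate_only)
    print("rule exists but discarded at delivery:", rules_only)
```

A sender in `gate_only` is delivered to the mailbox but presumably matches no rule, so nothing ingests it. A sender in `rules_only` is discarded by the sieve before the rule can ever fire.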
|
|
||||||
|
|
||||||
## Add a family member / sender
|
|
||||||
|
|
||||||
1. Add the address to the sieve allowlist file above; commit; apply the
|
|
||||||
`mailserver` stack (normal apply is enough — the sieve CM key is not under
|
|
||||||
`ignore_changes`; Reloader restarts the pod).
|
|
||||||
2. Clone an existing `forward:` mail rule in the paperless admin UI
|
|
||||||
(Mail → Rules) or via API, changing `filter_from` and the rule **owner**
|
|
||||||
(documents are owned by the rule owner — `assign_owner_from_rule=true`).
|
|
||||||
Keep: action = Move to `Processed`, attachment type = **process all files
|
|
||||||
including inline** (`attachment_type=2` — NOT attachments-only, see design
|
|
||||||
notes), consumption scope = attachments only, tag `email-ingest`, order
|
|
||||||
after the existing rules.
|
|
||||||
|
|
||||||
## Operations
|
|
||||||
|
|
||||||
- **Trigger a fetch immediately** (instead of waiting ≤10 min):
|
|
||||||
`kubectl -n paperless-ngx exec deploy/paperless-ngx -c paperless-ngx -- s6-setuidgid paperless python3 manage.py mail_fetcher`
|
|
||||||
The `s6-setuidgid paperless` is **required**: `kubectl exec` runs as root, and a
|
|
||||||
root-run fetcher downloads attachments root-owned into the scratch dir, which
|
|
||||||
the celery consumer (uid 1000) then can't read — `PermissionError` on
|
|
||||||
`/tmp/paperless/paperless-mail-*/...`, consume task FAILURE (hit during the
|
|
||||||
2026-07-03 build E2E). The mail correctly stays in INBOX for retry (the move
|
|
||||||
action is a chord callback on successful consumption). Recover: `rm -rf
|
|
||||||
/tmp/paperless/paperless-mail-*` (as root) and let the next scheduled fetch
|
|
||||||
re-process.
|
|
||||||
- **Mailbox credentials:** Vault `secret/platform` → `mailserver_accounts`
|
|
||||||
JSON, key `docs@viktorbarzin.me` (also used by the paperless mail account).
|
|
||||||
- **Inspect the mailbox:**
|
|
||||||
`python3 -c` IMAP to `mailserver.mailserver.svc.cluster.local:993` (in-cluster,
|
|
||||||
from a pod) or `mail.viktorbarzin.me:993` (externally / devvm).
|
|
||||||
- **Paperless-side logs:** `kubectl -n paperless-ngx logs deploy/paperless-ngx | grep -i mail`
|
|
||||||
(also Loki, ns `paperless-ngx`). Rule/account state: `GET /api/mail_rules/`,
|
|
||||||
`GET /api/mail_accounts/` with the admin token
|
|
||||||
(k8s secret `paperless-ngx-secrets`, field `api_token`).
|
|
||||||
- **Account/mailbox provisioning:** adding/rotating anything in
|
|
||||||
`mailserver_accounts` requires the ConfigMap replace workaround —
|
|
||||||
`scripts/tg apply mailserver -- -replace=module.mailserver.kubernetes_config_map.mailserver_config`
|
|
||||||
— because `postfix-accounts.cf` is under `ignore_changes`
|
|
||||||
(non-deterministic bcrypt; see the module comment).
|
|
||||||
|
|
||||||
## Design notes / caveats
|
|
||||||
|
|
||||||
- **Why not the catch-all?** Mail to unknown `@viktorbarzin.me` addresses
|
|
||||||
lands in `spam@`, which the TripIt `ingest-plans` CronJob sweeps every
|
|
||||||
15 min: it marks everything `\Seen`, LLM-parses mail from linked senders and
|
|
||||||
replies with ack/failure emails. Forwarded bank statements would get
|
|
||||||
"couldn't parse a trip" replies. `docs@` being a real mailbox bypasses that
|
|
||||||
path entirely; TripIt, the `smoke-test@` roundtrip probe, and `dmarc@` are
|
|
||||||
untouched.
|
|
||||||
- **Spoofing:** the sender match is on the From header. Rspamd verifies
|
|
||||||
SPF/DKIM/DMARC on inbound mail, but gmail.com publishes `p=none`, so a
|
|
||||||
crafted spoof could ingest documents into a family member's account. Accepted
|
|
||||||
risk (worst case: unwanted documents appear, visible + deletable in
|
|
||||||
paperless).
|
|
||||||
- **Not PDF-only:** any attachment type paperless supports is consumed
|
|
||||||
(PDF, images, Office via the existing tika+gotenberg pipeline).
|
|
||||||
- **Inline attachments ARE processed (`attachment_type=2`, flipped
|
|
||||||
2026-07-03):** the rules originally used attachments-only (1) to skip
|
|
||||||
signature logos, but the very first real forward (Apple Mail, Viktor's
|
|
||||||
client) attached the invoice PDF with `Content-Disposition: inline` —
|
|
||||||
paperless matched the rule, consumed nothing, and recorded
|
|
||||||
`PROCESSED_WO_CONSUMPTION` (which, like any ProcessedMail row, blocks that
|
|
||||||
UID from ever being re-processed — delete the row via `manage.py shell` to
|
|
||||||
retry). Trade-off: signature/inline images in forwards may be ingested as
|
|
||||||
junk docs (tagged `email-ingest`, easy to spot). If that gets noisy, add
|
|
||||||
`filter_attachment_filename_exclude` patterns to the rules using the
|
|
||||||
actually-observed junk filenames — do NOT flip back to attachments-only.
|
|
||||||
- **No dedicated alerting** (deliberate, 2026-07-03): mail-task errors surface
|
|
||||||
in paperless logs; the mailserver inbound path is covered by
|
|
||||||
`email-roundtrip-monitor`. Revisit if forwards start silently failing.
|
|
||||||
- **Workflows:** the global `payslip-webhook` + `claude-mcp-readers
|
|
||||||
auto-permission` workflows fire for mail-ingested docs like any other
|
|
||||||
consumption source (verified pre-build; payslip receiver does its own
|
|
||||||
filtering).
|
|
||||||
|
|
||||||
## Rollback
|
|
||||||
|
|
||||||
1. Disable/delete the 5 `forward:` mail rules + the `docs@` mail account
|
|
||||||
(paperless admin UI or API).
|
|
||||||
2. Revert the infra commit (aliases.txt entry, sieve file, CM key + mount).
|
|
||||||
3. Remove `docs@viktorbarzin.me` from Vault `mailserver_accounts`, then apply
|
|
||||||
with the `-replace` workaround above. Mail to docs@ then falls back to the
|
|
||||||
catch-all (spam@) like any unknown address.
|
|
||||||
|
|
@ -109,17 +109,10 @@ rate(node_pressure_memory_stalled_seconds_total{instance="devvm"}[5m])
|
||||||
node_memory_SwapFree_bytes{instance="devvm"}
|
node_memory_SwapFree_bytes{instance="devvm"}
|
||||||
```
|
```
|
||||||
|
|
||||||
Guardrails in place (2026-06-10, hardened 2026-07-02; `scripts/t3-serve@.service`):
|
Guardrails in place (2026-06-10, `scripts/t3-serve@.service`): per-unit
|
||||||
per-unit `MemoryMax=16G`, `MemorySwapMax=0`, `OOMPolicy=continue`, and
|
`MemoryHigh=12G`, `MemoryMax=16G`, `MemorySwapMax=0`, `OOMPolicy=continue` —
|
||||||
`MemoryHigh=infinity` — deliberately NO soft throttle band. With swap=0, a hog
|
a runaway agent now OOMs alone inside the cgroup instead of taking the box
|
||||||
plateauing between high and max never OOMs and the kernel high-throttle stalls
|
(and the WS server) with it.
|
||||||
the whole unit: a 12.3G agent `ugrep` livelocked t3-serve@wizard for ~50min on
|
|
||||||
2026-07-02 (signature: probe `t3serve` leg `Connection reset by peer`, dispatch
|
|
||||||
`proxy error: context canceled`, server D-state in `mem_cgroup_handle_over_high`,
|
|
||||||
`ss` backlog on the serve port; fix: SIGKILL the hog — the D-state is killable).
|
|
||||||
A runaway agent now OOMs alone at 16G inside the cgroup instead of throttling
|
|
||||||
the WS server with it. Post-mortem addendum:
|
|
||||||
`docs/post-mortems/2026-06-22-devvm-mem-io-overload-containment.md`.
|
|
||||||
|
|
||||||
## 4. Known root causes (2026-06-10 investigation)
|
## 4. Known root causes (2026-06-10 investigation)
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -1,98 +0,0 @@
|
||||||
# Valia sites — add / update / retire
|
|
||||||
|
|
||||||
Off-infra static sites authored by Valia (ADR-0018, CONTEXT.md "Valia site").
|
|
||||||
Serving: Cloudflare Pages. Freshness: the `valia-sites-sync` CronJob
|
|
||||||
(`valia-sites` ns) mirrors each Content folder every 10 minutes and deploys
|
|
||||||
only when the folder's manifest hash changed. Registry: `local.sites` in
|
|
||||||
`stacks/valia-sites/main.tf` — one entry per site drives everything (Pages
|
|
||||||
project, custom domain, public CNAME, internal split-horizon CNAME, sync).
|
|
||||||
|
|
||||||
Current sites: `bridge` (ОбУ „Отец Паисий“ — "мост"), `stem95su` (95. СУ STEM
|
|
||||||
board).
|
|
||||||
|
|
||||||
## Add a site
|
|
||||||
|
|
||||||
1. Valia shares the Drive folder with **vbarzin@gmail.com** (viewer is enough —
|
|
||||||
the pipeline is strictly read-only towards Drive).
|
|
||||||
2. Get the folder id from its URL (`drive.google.com/drive/folders/<ID>`).
|
|
||||||
3. Pick the **English** subdomain name (Viktor's call — CONTEXT.md naming rule).
|
|
||||||
4. Add one entry to `local.sites` in `stacks/valia-sites/main.tf`:
|
|
||||||
|
|
||||||
```hcl
|
|
||||||
<name> = {
|
|
||||||
folder_id = "<ID>"
|
|
||||||
src_path = "" # or "sub/folder" if servable files live deeper
|
|
||||||
entry_file = "index.html" # or whatever her main HTML file is called
|
|
||||||
manage_dns = true
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
5. Commit + push; CI applies. Within ~10 min the sync deploys content and the
|
|
||||||
site serves at `https://<name>.viktorbarzin.me` (custom-domain TLS takes
|
|
||||||
~5–10 min extra on first attach — CF returns 522 for the hostname until
|
|
||||||
then). Internal LAN/VLAN/pod resolution appears when the hourly
|
|
||||||
`technitium-ingress-dns-sync` next runs — trigger it early with:
|
|
||||||
`kubectl create job --from=cronjob/technitium-ingress-dns-sync valia-dns-now -n technitium`
|
|
||||||
|
|
||||||
## Content rules (what Valia's folder must look like)
|
|
||||||
|
|
||||||
- The **entry file** must exist — the sync stages a copy as `index.html` at
|
|
||||||
deploy time, so `/` works; the original filename keeps working too (deep
|
|
||||||
links survive). If the folder is empty or the entry file is missing, the
|
|
||||||
sync **skips the site and leaves it as-is** (never wipes a live site).
|
|
||||||
- Google-native files (Docs/Sheets) are **ignored** (`--drive-skip-gdocs`) —
|
|
||||||
only real files (`.html`, images, …) deploy. Gemini's HTML exports are fine.
|
|
||||||
- Per-file limit 25 MB (Cloudflare Pages), 20k files max — far beyond a
|
|
||||||
1-page site.
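
The guard and deploy-on-change behaviour described above can be sketched as a pure decision function. This is not the sync script itself: the function names are hypothetical, and the manifest hash shown here (SHA-256 over sorted paths and content digests) is an assumption, since the runbook only says "manifest hash".

```python
import hashlib


def manifest_hash(files: dict[str, bytes]) -> str:
    """Hash every relative path plus its content digest, in sorted path order."""
    h = hashlib.sha256()
    for path in sorted(files):
        h.update(path.encode("utf-8"))
        h.update(b"\0")
        h.update(hashlib.sha256(files[path]).digest())
    return h.hexdigest()


def decide(files: dict[str, bytes], entry_file: str, last_hash) -> str:
    """Return 'skip', 'unchanged', or 'deploy' for one site."""
    if not files or entry_file not in files:
        return "skip"  # guard: never wipe a live site
    if manifest_hash(files) == last_hash:
        return "unchanged"
    return "deploy"


def stage(files: dict[str, bytes], entry_file: str) -> dict[str, bytes]:
    """Copy the entry file to index.html while keeping the original name."""
    staged = dict(files)
    staged["index.html"] = files[entry_file]
    return staged
```

Keeping the decision separate from the Drive mirror and the Pages deploy makes the guard easy to test without touching either service.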
|
|
||||||
|
|
||||||
## Update a site
|
|
||||||
|
|
||||||
Nothing to do: Valia edits the folder, the site follows within ~10 minutes.
|
|
||||||
Force it early: `kubectl create job --from=cronjob/valia-sites-sync sync-now -n valia-sites`
|
|
||||||
|
|
||||||
## Rename / retire a site
|
|
||||||
|
|
||||||
Rename = retire + add (Pages projects can't be renamed). Retire:
|
|
||||||
|
|
||||||
1. Delete the entry from `local.sites`; commit + push. TF destroys the public
|
|
||||||
CNAME + custom domain + Pages project; the internal record is removed by
|
|
||||||
the next `technitium-ingress-dns-sync` run (its deletion pass drops any
|
|
||||||
internal `*.pages.dev` CNAME that left the `valia-sites-dns` ConfigMap —
|
|
||||||
scoped so it can never touch non-Pages records).
|
|
||||||
2. That's all — no manual DNS cleanup (the pre-ADR-0018 add-only gotcha is
|
|
||||||
fixed by the deletion pass).
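
The scoping rule of the deletion pass (only internal `*.pages.dev` CNAMEs that are no longer in the ConfigMap) can be expressed as a filter. A minimal sketch, assuming the records arrive as a host-to-target mapping; `cnames_to_delete` and the example hostnames in the comments are hypothetical, not the real sync code.

```python
def cnames_to_delete(internal_cnames: dict[str, str],
                     desired_hosts: set[str]) -> list[str]:
    """Return internal hosts whose CNAME points at *.pages.dev but which are
    no longer desired. Records with any other target are never returned."""
    doomed = []
    for host, target in internal_cnames.items():
        # Tolerate a trailing dot and mixed case in the stored target.
        is_pages = target.rstrip(".").lower().endswith(".pages.dev")
        if is_pages and host not in desired_hosts:
            doomed.append(host)
    return sorted(doomed)


if __name__ == "__main__":
    # Hypothetical records: two Pages sites and one unrelated CNAME.
    records = {
        "bridge.viktorbarzin.me": "bridge.pages.dev",
        "old.viktorbarzin.me": "old.pages.dev.",
        "immich.viktorbarzin.me": "viktorbarzin.me",
    }
    print(cnames_to_delete(records, {"bridge.viktorbarzin.me"}))
```

The target-suffix test is what keeps the pass from touching non-Pages records even if the desired set is empty.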
|
|
||||||
|
|
||||||
## Failure modes / debugging
|
|
||||||
|
|
||||||
- **Visibility is failed-Job-only by choice** (ADR-0018): no alerts, no
|
|
||||||
notifications. Check: `kubectl get jobs -n valia-sites | tail`, logs of the
|
|
||||||
last `valia-sites-sync-*` pod.
|
|
||||||
- **Drive auth broken** (`FATAL … Drive list failed`): the shared
|
|
||||||
`secret/valia-sites.rclone_conf` token died. The GCP OAuth app
|
|
||||||
(`home-lab-1700868541205`) must stay published to "Production" or refresh
|
|
||||||
tokens expire weekly (same constraint as the old stem95su conf, which this
|
|
||||||
one was copied from). Re-mint and `vault kv patch secret/valia-sites
|
|
||||||
rclone_conf=@…`.
|
|
||||||
- **Wrangler auth broken**: `secret/valia-sites.cloudflare_pages_token` is a
|
|
||||||
SCOPED token (Pages Read+Write on the account, id
|
|
||||||
`355d2c9d11579bdad1e9498dafca30d5`) — re-mint via
|
|
||||||
`POST /user/tokens` with the Global API Key (`secret/platform`), patch
|
|
||||||
Vault. Do NOT put the Global API Key in the pod.
|
|
||||||
- **Site serves stale content**: check the state CM
|
|
||||||
(`kubectl get cm valia-sites-state -n valia-sites -o yaml`) — deleting a
|
|
||||||
site's key forces a redeploy on the next run.
|
|
||||||
- **`GUARD … skipping`** in logs: Valia's folder is empty or the entry file
|
|
||||||
was renamed; the site deliberately kept its last content. Fix the folder or
|
|
||||||
update `entry_file`.
|
|
||||||
|
|
||||||
## History
|
|
||||||
|
|
||||||
- stem95su served in-cluster (nginx + NFS + its own rclone CronJob) until
|
|
||||||
2026-07-03, when it was cut over to this pattern and the old stack retired
|
|
||||||
(ADR-0018). The blocking 42.9 MB `stem_video.mp4` was compressed to 21.4 MB
|
|
||||||
(same 1080p, ~2.5 Mbps H.264) and replaced in Valia's folder with Viktor's
|
|
||||||
explicit one-time OK. `secret/stem95su` is superseded by
|
|
||||||
`secret/valia-sites`; `/srv/nfs/stem-site` on the PVE host is a harmless
|
|
||||||
leftover.
|
|
||||||
- bridge started as a hand-deployed wrangler experiment (2026-07-03, memory
|
|
||||||
id 7085) and was adopted into the stack the same day.
|
|
||||||
|
|
@ -82,48 +82,33 @@ tail -5 ~/.local/state/vault-token-renew.log # recent results
|
||||||
A healthy log line looks like:
|
A healthy log line looks like:
|
||||||
`<ts> OK renewed (dn=token-devvm-wizard ttl=2764800s)` (ttl 2764800s = 768h).
|
`<ts> OK renewed (dn=token-devvm-wizard ttl=2764800s)` (ttl 2764800s = 768h).
|
||||||
|
|
||||||
After an OIDC login you'll instead see, at the next nightly run:
|
## Drift guard & recovery
|
||||||
`<ts> HEALED: re-minted periodic token from foreign dn=oidc-… (revoked N stale periodic token(s))`
|
|
||||||
— that's the self-heal working as designed.
|
|
||||||
|
|
||||||
## Drift guard & self-heal
|
|
||||||
|
|
||||||
`~/.vault-token` is the Vault CLI's default token sink, so **any** `vault login`
|
`~/.vault-token` is the Vault CLI's default token sink, so **any** `vault login`
|
||||||
overwrites it. Two confirmed clobber vectors:
|
overwrites it. Two confirmed clobber vectors:
|
||||||
|
|
||||||
1. `vault login -method=oidc` → replaces it with a 7-day OIDC token (the renewer
|
1. `vault login -method=oidc` → replaces it with a 7-day OIDC token (the renewer
|
||||||
can't push past the OIDC role's 7-day `token_max_ttl`). The infra docs
|
can't push past the OIDC role's 7-day `token_max_ttl`).
|
||||||
prescribe this login before applies, so it recurs — it went unnoticed for
|
|
||||||
weeks twice (2026-06-18→26, 2026-06-29→07-03) and read as "Vault expires
|
|
||||||
weekly".
|
|
||||||
2. A stray `vault login -method=kubernetes` (e.g. a headless agent flow) →
|
2. A stray `vault login -method=kubernetes` (e.g. a headless agent flow) →
|
||||||
writes a read-only `kubernetes-woodpecker-default` token (can read Vault but
|
writes a read-only `kubernetes-woodpecker-default` token (can read Vault but
|
||||||
**cannot** write `secret/*`). Happened 2026-06-05, unnoticed for two days.
|
**cannot** write `secret/*`). This happened 2026-06-05 and went unnoticed for
|
||||||
|
two days — reads worked, writes silently 403'd.
|
||||||
|
|
||||||
Since 2026-07-03 the renewer **self-heals**
|
To stop the renewer from silently keeping a foreign token alive, it runs a
|
||||||
(`docs/plans/2026-07-03-vault-token-self-heal-design.md`). On a foreign token
|
**drift guard** first: it refuses to renew unless the token is
|
||||||
it attempts the re-mint **with the clobbering token's own authority** and lets
|
`token-devvm-wizard` **and** carries `vault-admin`. On drift it logs loudly and
|
||||||
Vault's authz decide:
|
exits non-zero (the systemd unit goes `failed`) rather than renewing someone
|
||||||
|
else's token. Symptom in the log:
|
||||||
|
|
||||||
- **Admin-capable clobber (OIDC login)** → re-mints the periodic token,
|
`<ts> DRIFT: ~/.vault-token is dn=... policies=... Refusing to renew a foreign token. Re-mint: ...`
|
||||||
sanity-checks it against the drift guard, atomically replaces
|
|
||||||
`~/.vault-token`, revokes stale `token-devvm-wizard` leftovers
|
|
||||||
(anti-sprawl), logs
|
|
||||||
`HEALED: re-minted periodic token from foreign dn=… (revoked N stale periodic token(s))`
|
|
||||||
and exits 0. The clobbering token is NOT revoked — it may still back a live
|
|
||||||
login session; it ages out on its own.
|
|
||||||
- **Weak clobber (read-only k8s token)** → the mint is denied; logs
|
|
||||||
`DRIFT: … heal denied, foreign token lacks create authority …; investigate what wrote it`
|
|
||||||
and exits non-zero (unit `failed`). Deliberately loud: this signals a
|
|
||||||
misbehaving agent flow — exactly the 2026-06-05 case.
|
|
||||||
|
|
||||||
**Manual recovery** is only needed for the weak-clobber case (the DRIFT log
|
**Recovery: re-mint** (the DRIFT log line contains the exact command) — run the
|
||||||
line still contains the exact command) — run the
|
[mint/re-mint](#mint--re-mint-the-token) block. The drift guard detects but does
|
||||||
[mint/re-mint](#mint--re-mint-the-token) block.
|
**not** auto-recover (a deliberate scope choice — version-only, no self-heal);
|
||||||
|
recovery is the manual re-mint above.
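
The drift-guard decision described above reduces to a check on two fields of the token lookup. The real renewer is a bash script; the Python below is only an illustrative sketch of the same rule, assuming the input is the JSON printed by `vault token lookup -format=json`. The function name and the sample payloads are hypothetical.

```python
import json

EXPECTED_DN = "token-devvm-wizard"  # display name of our periodic token
REQUIRED_POLICY = "vault-admin"     # policy the token must carry


def is_own_token(lookup_json: str) -> bool:
    """True only when the token is ours AND carries the admin policy."""
    data = json.loads(lookup_json).get("data") or {}
    policies = data.get("policies") or []
    return data.get("display_name") == EXPECTED_DN and REQUIRED_POLICY in policies


if __name__ == "__main__":
    # Hypothetical payload shaped like the 2026-06-05 clobber.
    foreign = json.dumps({"data": {
        "display_name": "kubernetes-woodpecker-default",
        "policies": ["default"],
    }})
    print("renew" if is_own_token(foreign) else "DRIFT: refuse to renew")
```

Requiring both conditions matters: a token with the right display name but without `vault-admin` would renew happily while writes keep failing.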
|
||||||
|
|
||||||
## Tests
|
## Tests
|
||||||
|
|
||||||
`infra/scripts/test-vault-token-renew.sh` unit-tests the drift-guard decision,
|
`infra/scripts/test-vault-token-renew.sh` unit-tests the drift-guard decision
|
||||||
the lookup-JSON parsers (including the exact 2026-06-05 woodpecker-clobber
|
and the lookup-JSON parsers (including the exact 2026-06-05 woodpecker-clobber
|
||||||
case), and the self-heal's revoke filter (which stale periodic tokens a heal
|
case). Run: `bash infra/scripts/test-vault-token-renew.sh`.
|
||||||
may sweep). Run: `bash infra/scripts/test-vault-token-renew.sh`.
|
|
||||||
|
|
|
||||||
|
|
@ -127,29 +127,20 @@ variable "anti_ai_scraping" {
|
||||||
variable "dns_type" {
|
variable "dns_type" {
|
||||||
type = string
|
type = string
|
||||||
default = "none"
|
default = "none"
|
||||||
description = <<-EOT
|
description = "Cloudflare DNS: 'proxied' (CNAME to tunnel), 'non-proxied' (A/AAAA to public IP), or 'none'"
|
||||||
Cloudflare DNS: 'proxied' (CNAME to tunnel), 'non-proxied' (A/AAAA to
|
|
||||||
public IP), 'internal' (A to the internal Traefik LB IP — resolvable from
|
|
||||||
any resolver but only ROUTABLE from home LANs / WG sites / VPN; the record
|
|
||||||
is a reachability pointer, NOT a gate: pair it with an ipAllowList via
|
|
||||||
extra_middlewares, e.g. traefik-home-lans-only@kubernetescrd, because
|
|
||||||
direct-to-WAN-IP requests with the right SNI still hit Traefik), or 'none'.
|
|
||||||
EOT
|
|
||||||
validation {
|
validation {
|
||||||
condition = contains(["proxied", "non-proxied", "internal", "none"], var.dns_type)
|
condition = contains(["proxied", "non-proxied", "none"], var.dns_type)
|
||||||
error_message = "dns_type must be 'proxied', 'non-proxied', 'internal', or 'none'."
|
error_message = "dns_type must be 'proxied', 'non-proxied', or 'none'."
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
# Uptime Kuma external monitor: when true, annotate the ingress so the
|
# Uptime Kuma external monitor: when true, annotate the ingress so the
|
||||||
# external-monitor-sync CronJob creates a `[External] <name>` monitor pointing
|
# external-monitor-sync CronJob creates a `[External] <name>` monitor pointing
|
||||||
# at https://<host>. Null means "follow dns_type" — enabled when the ingress
|
# at https://<host>. Null means "follow dns_type": enabled for any dns_type other than 'none'.
|
||||||
# has a PUBLIC DNS record (proxied or non-proxied; 'internal' records are not
|
|
||||||
# externally reachable, so no external monitor).
|
|
||||||
variable "external_monitor" {
|
variable "external_monitor" {
|
||||||
type = bool
|
type = bool
|
||||||
default = null
|
default = null
|
||||||
description = "Enable Uptime Kuma external monitor. null = auto (enabled when dns_type is 'proxied' or 'non-proxied')."
|
description = "Enable Uptime Kuma external monitor. null = auto (enabled when dns_type is not 'none')."
|
||||||
}
|
}
|
||||||
|
|
||||||
variable "external_monitor_name" {
|
variable "external_monitor_name" {
|
||||||
|
|
@ -180,15 +171,6 @@ variable "public_ipv6" {
|
||||||
default = "2001:470:6e:43d::2"
|
default = "2001:470:6e:43d::2"
|
||||||
}
|
}
|
||||||
|
|
||||||
# Internal Traefik LB IP used by dns_type = "internal" records. Tracks the
|
|
||||||
# dedicated MetalLB IP from stacks/traefik (ETP=Local). A future LB renumber
|
|
||||||
# must update this default alongside the split-horizon apex record — see
|
|
||||||
# docs/plans/2026-05-30-traefik-dedicated-ip-etp-local-*.
|
|
||||||
variable "internal_lb_ip" {
|
|
||||||
type = string
|
|
||||||
default = "10.0.20.203"
|
|
||||||
}
|
|
||||||
|
|
||||||
variable "homepage_group" {
|
variable "homepage_group" {
|
||||||
type = string
|
type = string
|
||||||
default = null # auto-detect from namespace
|
default = null # auto-detect from namespace
|
||||||
|
|
@ -219,10 +201,8 @@ locals {
|
||||||
)
|
)
|
||||||
|
|
||||||
# External monitor enabled by default when the ingress has a public DNS
|
# External monitor enabled by default when the ingress has a public DNS
|
||||||
# record (either CF-proxied or direct A/AAAA). 'internal' records resolve
|
# record (either CF-proxied or direct A/AAAA). Explicit bool overrides.
|
||||||
# publicly but are unroutable from outside, so they get no external monitor.
|
effective_external_monitor = var.external_monitor != null ? var.external_monitor : (var.dns_type != "none")
|
||||||
# Explicit bool overrides.
|
|
||||||
effective_external_monitor = var.external_monitor != null ? var.external_monitor : (var.dns_type == "proxied" || var.dns_type == "non-proxied")
|
|
||||||
|
|
||||||
# Emit the annotation when effective is true (positive signal), or when the
|
# Emit the annotation when effective is true (positive signal), or when the
|
||||||
# caller explicitly set external_monitor=false (opt-out). When the caller
|
# caller explicitly set external_monitor=false (opt-out). When the caller
|
||||||
|
|
@ -444,19 +424,3 @@ resource "cloudflare_record" "non_proxied_aaaa" {
|
||||||
zone_id = var.cloudflare_zone_id
|
zone_id = var.cloudflare_zone_id
|
||||||
allow_overwrite = true
|
allow_overwrite = true
|
||||||
}
|
}
|
||||||
|
|
||||||
# 'internal': a publicly-resolvable A record carrying the INTERNAL Traefik LB
|
|
||||||
# IP. Outsiders resolve it but can't route to it; home-LAN/WG-site/VPN clients
|
|
||||||
# reach Traefik directly (the WG spokes policy-route 10.0.0.0/8 through the
|
|
||||||
# tunnel), so kiosk devices with baked-in URLs need no DNS overrides anywhere.
|
|
||||||
# IPv4-only on purpose: the spokes route no internal IPv6 range.
|
|
||||||
resource "cloudflare_record" "internal_a" {
|
|
||||||
count = var.dns_type == "internal" ? 1 : 0
|
|
||||||
name = local.dns_name
|
|
||||||
content = var.internal_lb_ip
|
|
||||||
proxied = false
|
|
||||||
ttl = 1
|
|
||||||
type = "A"
|
|
||||||
zone_id = var.cloudflare_zone_id
|
|
||||||
allow_overwrite = true
|
|
||||||
}
|
|
||||||
|
|
|
||||||
|
|
@ -21,19 +21,12 @@ WorkingDirectory=/home/%i
|
||||||
ExecStart=/usr/bin/t3 serve --host 0.0.0.0 --port ${T3_PORT} --base-dir /home/%i/.t3
|
ExecStart=/usr/bin/t3 serve --host 0.0.0.0 --port ${T3_PORT} --base-dir /home/%i/.t3
|
||||||
Restart=on-failure
|
Restart=on-failure
|
||||||
RestartSec=5
|
RestartSec=5
|
||||||
# Memory containment (2026-06-10, amended 2026-07-02): agent children live in
|
# Memory containment (2026-06-10): agent children live in this cgroup; a
|
||||||
# this cgroup; a runaway agent (10.8G anon on a 23G host) swap-thrashed the
|
# runaway agent (10.8G anon on a 23G host) swap-thrashed the whole devvm —
|
||||||
# whole devvm — every >20s stall fires the t3 client watchdog (visible
|
# every >20s stall fires the t3 client watchdog (visible "disconnects") —
|
||||||
# "disconnects") — then global-OOMed. Cap the cgroup so a runaway OOMs early
|
# then global-OOMed. Cap the cgroup so a runaway OOMs early and locally,
|
||||||
# and locally, and forbid swap so stalls can't smear into minutes-long freezes.
|
# and forbid swap so stalls can't smear into minutes-long freezes.
|
||||||
# MemoryHigh is DELIBERATELY infinity — do not add a soft band below MemoryMax:
|
MemoryHigh=12G
|
||||||
# with swap=0 a hog that plateaus between high and max is unreclaimable but
|
|
||||||
# never OOMs, and the kernel's high-throttle stalls EVERY task in the cgroup
|
|
||||||
# (the t3 event loop included) indefinitely. A 12.3G agent ugrep livelocked
|
|
||||||
# this unit for ~50min on 2026-07-02 exactly this way. Straight-to-OOM at
|
|
||||||
# MemoryMax is the containment; OOMPolicy=continue below keeps the server up.
|
|
||||||
# See docs/post-mortems/2026-06-22-devvm-mem-io-overload-containment.md addendum.
|
|
||||||
MemoryHigh=infinity
|
|
||||||
MemoryMax=16G
|
MemoryMax=16G
|
||||||
MemorySwapMax=0
|
MemorySwapMax=0
|
||||||
# Default OOMPolicy=stop kills the WHOLE unit (8.5min outage 2026-06-10
|
# Default OOMPolicy=stop kills the WHOLE unit (8.5min outage 2026-06-10
|
||||||
|
|
|
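The livelock described in the unit comment above (a cgroup parked between `MemoryHigh` and `MemoryMax` that is throttled but never OOM-killed) can be told apart from a clean cap-and-kill by the counters in the cgroup v2 `memory.events` file. A minimal sketch, assuming a hypothetical helper `mem_event` and made-up sample values. On a live host the file would sit under the unit's cgroup directory in `/sys/fs/cgroup`, at a path that depends on the instance name.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the repo: print one counter from a
# cgroup v2 memory.events file (format: "<name> <count>" per line).
# Prints 0 when the counter is not present in the file.
mem_event() {
  local file="$1" key="$2"
  awk -v k="$key" '$1 == k { print $2; found = 1 } END { if (!found) print 0 }' "$file"
}

# Made-up sample standing in for a real memory.events file.
sample=$(mktemp)
printf 'low 0\nhigh 48213\nmax 0\noom 0\noom_kill 0\n' >"$sample"

# A growing "high" count with oom_kill stuck at 0 is the throttle-without-kill
# pattern; with MemoryHigh=infinity the "high" counter is expected to stay at 0.
echo "high=$(mem_event "$sample" high) oom_kill=$(mem_event "$sample" oom_kill)"
rm -f "$sample"
```

Reading the counters from a file argument keeps the helper usable against a saved copy of `memory.events` as well as the live cgroup file.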
||||||
|
|
@ -1,11 +1,10 @@
|
||||||
#!/usr/bin/env bash
|
#!/usr/bin/env bash
|
||||||
# Unit tests for the pure functions in vault-token-renew.sh.
|
# Unit tests for the pure drift-guard functions in vault-token-renew.sh.
|
||||||
# Sources the script (vtr_main is guarded) and exercises (a) the drift-guard
|
# Sources the script (vtr_main is guarded) and exercises the decision logic that
|
||||||
# decision — is ~/.vault-token OUR periodic admin token (renew) or a foreign
|
# decides whether ~/.vault-token is OUR periodic admin token (renew) or a foreign
|
||||||
# clobber (heal / fail loud)? — whose ABSENCE let the 2026-06-05 woodpecker
|
# token that clobbered the file (refuse, fail loud). This is exactly the logic
|
||||||
# clobber be silently renewed for two days, and (b) the self-heal's revoke
|
# whose ABSENCE let the 2026-06-05 woodpecker-token clobber be silently renewed
|
||||||
# filter — which stale token-devvm-wizard tokens a heal may sweep.
|
# for two days. Run: bash infra/scripts/test-vault-token-renew.sh
|
||||||
# Run: bash infra/scripts/test-vault-token-renew.sh
|
|
||||||
set -uo pipefail
|
set -uo pipefail
|
||||||
DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
|
||||||
# shellcheck source=/dev/null
|
# shellcheck source=/dev/null
|
||||||
|
|
@ -54,21 +53,5 @@ ok "ours: parse+decide renews" vtr_drift_ok "$(vtr_display_name "$LOOKUP_
|
||||||
no "woodpecker: parse+decide refused" vtr_drift_ok "$(vtr_display_name "$LOOKUP_WP")" "$(vtr_policies_csv "$LOOKUP_WP")"
|
no "woodpecker: parse+decide refused" vtr_drift_ok "$(vtr_display_name "$LOOKUP_WP")" "$(vtr_policies_csv "$LOOKUP_WP")"
|
||||||
no "oidc: parse+decide refused" vtr_drift_ok "$(vtr_display_name "$LOOKUP_OIDC")" "$(vtr_policies_csv "$LOOKUP_OIDC")"
|
no "oidc: parse+decide refused" vtr_drift_ok "$(vtr_display_name "$LOOKUP_OIDC")" "$(vtr_policies_csv "$LOOKUP_OIDC")"
|
||||||
|
|
||||||
# --- vtr_accessor: parse accessor out of lookup JSON ---
|
|
||||||
LOOKUP_NEW='{"data":{"display_name":"token-devvm-wizard","accessor":"acc-new","policies":["default","sops-admin","vault-admin"],"identity_policies":null}}'
|
|
||||||
eq "accessor parsed" "acc-new" "$(vtr_accessor "$LOOKUP_NEW")"
|
|
||||||
eq "accessor absent -> empty" "" "$(vtr_accessor '{"data":{"display_name":"x"}}')"
|
|
||||||
|
|
||||||
# --- vtr_is_stale_periodic: the heal's revoke filter — ONLY old token-devvm-wizard
|
|
||||||
# --- tokens are swept; the just-minted token, foreign tokens, and anything with an
|
|
||||||
# --- unknown accessor are kept. An empty keep-accessor sweeps NOTHING (fail-safe).
|
|
||||||
STALE_OURS='{"data":{"display_name":"token-devvm-wizard","accessor":"acc-old","policies":["default","sops-admin","vault-admin"]}}'
|
|
||||||
ok "older periodic token is stale" vtr_is_stale_periodic "$STALE_OURS" "acc-new"
|
|
||||||
no "the just-minted token is kept" vtr_is_stale_periodic "$LOOKUP_NEW" "acc-new"
|
|
||||||
no "foreign oidc token never swept" vtr_is_stale_periodic "$LOOKUP_OIDC" "acc-new"
|
|
||||||
no "woodpecker token never swept" vtr_is_stale_periodic "$LOOKUP_WP" "acc-new"
|
|
||||||
no "missing accessor never swept" vtr_is_stale_periodic '{"data":{"display_name":"token-devvm-wizard"}}' "acc-new"
|
|
||||||
no "empty keep-accessor sweeps nothing" vtr_is_stale_periodic "$STALE_OURS" ""
|
|
||||||
|
|
||||||
printf '\n%d passed, %d failed\n' "$pass" "$fail"
|
printf '\n%d passed, %d failed\n' "$pass" "$fail"
|
||||||
(( fail == 0 ))
|
(( fail == 0 ))
|
||||||
|
|
|
||||||
|
|
@ -45,94 +45,6 @@ vtr_drift_ok() {
|
||||||
printf ',%s,' "$pols" | grep -q ",$REQUIRED_POLICY," || return 1
|
printf ',%s,' "$pols" | grep -q ",$REQUIRED_POLICY," || return 1
|
||||||
}
|
}
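The last line of `vtr_drift_ok` wraps the policy list and the required policy in commas before matching. A minimal sketch of why that matters, assuming a hypothetical helper `has_policy` and sample policy lists that are not in the script. It also uses `grep -F` so the policy name is matched literally, which the original line does not do.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of vault-token-renew.sh: succeed only when
# the CSV list in $1 contains the exact policy name in $2.
has_policy() {
  local csv="$1" want="$2"
  # Wrapping both sides in commas anchors the match on whole list items,
  # so "vault-admin" cannot match inside "vault-admin-ro".
  printf ',%s,' "$csv" | grep -qF ",$want,"
}

has_policy "default,sops-admin,vault-admin" "vault-admin" && echo "exact item: match"
has_policy "default,vault-admin-ro" "vault-admin" || echo "substring only: no match"
```

A plain `grep -q vault-admin` on the same input would accept the second list, which is the kind of false positive a drift guard should not allow.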
|
||||||
|
|
||||||
# vtr_accessor <lookup-json> -> the token accessor (empty if absent).
|
|
||||||
vtr_accessor() {
|
|
||||||
printf '%s' "$1" | jq -r '.data.accessor // ""'
|
|
||||||
}
|
|
||||||
|
|
||||||
# vtr_is_stale_periodic <lookup-json> <keep-accessor> -> 0 if this lookup
|
|
||||||
# describes one of OUR periodic tokens (display name matches) that is NOT the
|
|
||||||
# one to keep — i.e. a stale leftover a heal should revoke. 1 otherwise.
|
|
||||||
# Name-only on purpose (no policy check): anything named token-devvm-wizard
|
|
||||||
# that isn't the current token is garbage from a previous mint. An empty
|
|
||||||
# keep-accessor sweeps NOTHING (fail-safe: never revoke when we don't know
|
|
||||||
# which token is current).
|
|
||||||
vtr_is_stale_periodic() {
|
|
||||||
local dn acc
|
|
||||||
[ -n "${2:-}" ] || return 1
|
|
||||||
dn=$(vtr_display_name "$1")
|
|
||||||
acc=$(vtr_accessor "$1")
|
|
||||||
[ "$dn" = "$EXPECTED_DN" ] || return 1
|
|
||||||
[ -n "$acc" ] || return 1
|
|
||||||
[ "$acc" != "$2" ]
|
|
||||||
}
|
|
||||||
|
|
||||||
# vtr_heal <foreign-dn> <log-file> -> 0 if ~/.vault-token was re-minted back to
|
|
||||||
# our periodic admin token using the foreign token's own authority, 1 if the
|
|
||||||
# heal was denied or failed (caller exits non-zero; the unit goes failed).
|
|
||||||
#
|
|
||||||
# Self-heal added 2026-07-03 (docs/plans/2026-07-03-vault-token-self-heal-design.md):
|
|
||||||
# an OIDC login — which the infra docs prescribe before applies — clobbers
|
|
||||||
# ~/.vault-token with a 7-day token, and detect-only drift left that unnoticed
|
|
||||||
# for weeks (the weekly-expiry loop). We ATTEMPT the re-mint with the
|
|
||||||
# clobbering token itself and let Vault's authz decide — a read-only clobber
|
|
||||||
# (the 2026-06-05 woodpecker incident) is denied the mint and stays a loud
|
|
||||||
# failure, because it signals a misbehaving flow that someone should look at.
|
|
||||||
vtr_heal() {
|
|
||||||
local foreign_dn="$1" log="$2"
|
|
||||||
local errf new_token new_info new_dn new_pols new_acc tmp
|
|
||||||
errf=$(mktemp)
|
|
||||||
if ! new_token=$(vault token create -orphan -period=768h \
|
|
||||||
-policy=vault-admin -policy=sops-admin -display-name=devvm-wizard \
|
|
||||||
-field=token 2>"$errf") || [ -z "$new_token" ]; then
|
|
||||||
printf '%s DRIFT: ~/.vault-token is dn=%q — heal denied, foreign token lacks create authority (%s); investigate what wrote it. Manual re-mint: vault login -method=oidc && vault token create -orphan -period=768h -policy=vault-admin -policy=sops-admin -display-name=devvm-wizard -field=token > ~/.vault-token && chmod 600 ~/.vault-token\n' \
|
|
||||||
"$(date -Is)" "$foreign_dn" "$(tr '\n' ' ' <"$errf")" >>"$log"
|
|
||||||
rm -f "$errf"
|
|
||||||
return 1
|
|
||||||
fi
|
|
||||||
rm -f "$errf"
|
|
||||||
|
|
||||||
# Sanity: the minted token must itself pass the drift guard before it may
|
|
||||||
# replace ~/.vault-token.
|
|
||||||
if ! new_info=$(VAULT_TOKEN="$new_token" vault token lookup -format=json 2>&1); then
|
|
||||||
printf '%s FAIL: heal minted a token but its lookup failed: %s\n' \
|
|
||||||
"$(date -Is)" "$new_info" >>"$log"
|
|
||||||
return 1
|
|
||||||
fi
|
|
||||||
new_dn=$(vtr_display_name "$new_info")
|
|
||||||
new_pols=$(vtr_policies_csv "$new_info")
|
|
||||||
if ! vtr_drift_ok "$new_dn" "$new_pols"; then
|
|
||||||
printf '%s FAIL: heal minted an unexpected token (dn=%q policies=%q) — not writing it\n' \
|
|
||||||
"$(date -Is)" "$new_dn" "$new_pols" >>"$log"
|
|
||||||
return 1
|
|
||||||
fi
|
|
||||||
|
|
||||||
# Atomic replace: mktemp files are 0600 from birth; same-filesystem mv.
|
|
||||||
tmp=$(mktemp "$HOME/.vault-token.XXXXXX")
|
|
||||||
printf '%s' "$new_token" >"$tmp"
|
|
||||||
mv "$tmp" "$HOME/.vault-token"
|
|
||||||
|
|
||||||
# Anti-sprawl: revoke previous token-devvm-wizard tokens — each heal would
|
|
||||||
# otherwise strand the prior periodic ADMIN token server-side for up to 32d.
|
|
||||||
# The clobbering foreign token is deliberately NOT revoked: it may still back
|
|
||||||
# the user's live login session, and it ages out on its own (7d for OIDC).
|
|
||||||
local sweep="accessor sweep skipped (list denied)" accessors a a_info revoked=0
|
|
||||||
new_acc=$(vtr_accessor "$new_info")
|
|
||||||
if [ -n "$new_acc" ] && accessors=$(VAULT_TOKEN="$new_token" vault list -format=json auth/token/accessors 2>/dev/null); then
|
|
||||||
while IFS= read -r a; do
|
|
||||||
[ -n "$a" ] || continue
|
|
||||||
a_info=$(VAULT_TOKEN="$new_token" vault token lookup -format=json -accessor "$a" 2>/dev/null) || continue
|
|
||||||
if vtr_is_stale_periodic "$a_info" "$new_acc"; then
|
|
||||||
VAULT_TOKEN="$new_token" vault token revoke -accessor "$a" >/dev/null 2>&1 && revoked=$((revoked + 1))
|
|
||||||
fi
|
|
||||||
done < <(printf '%s' "$accessors" | jq -r '.[]')
|
|
||||||
sweep="revoked $revoked stale periodic token(s)"
|
|
||||||
fi
|
|
||||||
|
|
||||||
printf '%s HEALED: re-minted periodic token from foreign dn=%q (%s)\n' \
|
|
||||||
"$(date -Is)" "$foreign_dn" "$sweep" >>"$log"
|
|
||||||
}
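The atomic-replace step in `vtr_heal` (create the temp file next to the target, write it, then `mv` it into place) can be isolated as a sketch. `write_atomic`, the demo directory, and the sample content are hypothetical, not part of the script.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of vault-token-renew.sh: replace $1 with the
# content $2 so readers never observe a partially written file.
write_atomic() {
  local target="$1" content="$2" tmp
  # mktemp creates the file with mode 0600, and placing it beside the target
  # keeps the later mv on one filesystem, where the rename is atomic.
  tmp=$(mktemp "$target.XXXXXX") || return 1
  if ! printf '%s' "$content" >"$tmp"; then
    rm -f "$tmp"
    return 1
  fi
  mv "$tmp" "$target"
}

# Demo in a throwaway directory; the content is a placeholder, not a token.
demo_dir=$(mktemp -d)
write_atomic "$demo_dir/token" "placeholder-not-a-real-token"
cat "$demo_dir/token"; echo
rm -rf "$demo_dir"
```

Writing to a temp file in `/tmp` instead would risk a cross-filesystem `mv`, which falls back to copy-and-delete and loses the atomicity.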
|
|
||||||
|
|
||||||
vtr_main() {
|
vtr_main() {
|
||||||
set -euo pipefail
|
set -euo pipefail
|
||||||
export PATH="/usr/local/bin:/usr/bin:/bin:${PATH:-}"
|
export PATH="/usr/local/bin:/usr/bin:/bin:${PATH:-}"
|
||||||
|
|
@ -149,19 +61,16 @@ vtr_main() {
|
||||||
dn=$(vtr_display_name "$info")
|
dn=$(vtr_display_name "$info")
|
||||||
pols=$(vtr_policies_csv "$info")
|
pols=$(vtr_policies_csv "$info")
|
||||||
|
|
||||||
# Drift guard (2026-06-07) + self-heal (2026-07-03): the renewer must not
|
# Drift guard (added 2026-06-07): the renewer must NOT keep a FOREIGN token alive.
|
||||||
# keep a FOREIGN token alive (on 2026-06-05 a stray kubernetes login was
|
# On 2026-06-05 a stray `vault login -method=kubernetes` overwrote ~/.vault-token
|
||||||
# silently renewed for two days, masking lost write access). But detect-only
|
# with a read-only woodpecker token, and this script then silently renewed THAT
|
||||||
# drift proved worse in practice: an OIDC login — which the infra docs
|
# for two days — masking the loss of write access. So before renewing, confirm
|
||||||
# prescribe before applies — clobbers this file too, and the resulting DRIFT
|
# the token is our periodic admin token; if it has drifted, fail loudly (systemd
|
||||||
# failures went unnoticed for weeks while access degraded to a 7-day token
|
# marks the unit failed) instead of keeping someone else's token alive.
|
||||||
# (the weekly-expiry loop). On drift we now ATTEMPT to heal (see vtr_heal):
|
|
||||||
# re-mint the periodic token with the clobbering token's own authority.
|
|
||||||
# Vault's authz keeps the old guarantee — a token that couldn't legitimately
|
|
||||||
# hold vault-admin is denied the mint, and we still fail loud.
|
|
||||||
if ! vtr_drift_ok "$dn" "$pols"; then
|
if ! vtr_drift_ok "$dn" "$pols"; then
|
||||||
vtr_heal "$dn" "$log" || exit 1
|
printf '%s DRIFT: ~/.vault-token is dn=%q policies=%q (expected dn=%q with %q). Refusing to renew a foreign token. Re-mint: vault login -method=oidc && vault token create -orphan -period=768h -policy=vault-admin -policy=sops-admin -display-name=devvm-wizard -field=token > ~/.vault-token && chmod 600 ~/.vault-token\n' \
|
||||||
exit 0
|
"$(date -Is)" "$dn" "$pols" "$EXPECTED_DN" "$REQUIRED_POLICY" >>"$log"
|
||||||
|
exit 1
|
||||||
fi
|
fi
|
||||||
|
|
||||||
# `vault token renew` with no argument renews the calling token (renew-self).
|
# `vault token renew` with no argument renews the calling token (renew-self).
|
||||||
|
|
|
||||||
|
|
@ -244,15 +244,9 @@ log "service units installed + enabled (t3-dispatch + 3 timers; t3-serve@ per-us
|
||||||
# virtual disk into an IO storm + multi-minute freeze (hard-killed 2026-06-22).
|
# virtual disk into an IO storm + multi-minute freeze (hard-killed 2026-06-22).
|
||||||
# t3-serve@ was already capped (its [Service] block); the HOLE was the uncapped
|
# t3-serve@ was already capped (its [Service] block); the HOLE was the uncapped
|
||||||
# user-<uid>.slice (all ssh/tmux work). Design — per user, on BOTH trees:
|
# user-<uid>.slice (all ssh/tmux work). Design — per user, on BOTH trees:
|
||||||
# MemoryMax=16G hard + MemorySwapMax=0 (work never touches disk swap → no
|
# MemoryHigh=12G soft (throttles a runaway to a crawl), MemoryMax=16G hard,
|
||||||
# thrash; a runaway is cgroup-OOM-killed locally at the ceiling), plus
|
# MemorySwapMax=0 (work never touches disk swap → no thrash; it OOMs locally at
|
||||||
# fair-share CPU/IO weights.
|
# the ceiling instead), plus fair-share CPU/IO weights.
|
||||||
# NO MemoryHigh soft band (removed 2026-07-02; was 12G "throttle to a crawl"):
|
|
||||||
# with swap=0, a hog that PLATEAUS between high and max is unreclaimable but
|
|
||||||
# never OOMs — the kernel parks every task of the cgroup in
|
|
||||||
# mem_cgroup_handle_over_high and the whole tree stalls indefinitely. A 12.3G
|
|
||||||
# agent ugrep livelocked t3-serve@wizard (t3 down ~50min) exactly this way.
|
|
||||||
# Cap-and-kill, never throttle-and-pray — see the post-mortem addendum.
|
|
||||||
# BACKSTOP = earlyoom, NOT systemd-oomd. We first shipped systemd-oomd but it is
|
# BACKSTOP = earlyoom, NOT systemd-oomd. We first shipped systemd-oomd but it is
|
||||||
# INERT with swap=0: its pressure-kill only acts on cgroups doing active reclaim
|
# INERT with swap=0: its pressure-kill only acts on cgroups doing active reclaim
|
||||||
# (pgscan rising), and a no-swap anon workload never reclaims — verified live, a
|
# (pgscan rising), and a no-swap anon workload never reclaims — verified live, a
|
||||||
|
|
@ -266,16 +260,12 @@ log "service units installed + enabled (t3-dispatch + 3 timers; t3-serve@ per-us
|
||||||
# 10a) per-user caps + fair-share weights on EVERY user-<uid>.slice (ssh/tmux)
|
# 10a) per-user caps + fair-share weights on EVERY user-<uid>.slice (ssh/tmux)
|
||||||
install -d -m 0755 /etc/systemd/system/user-.slice.d
|
install -d -m 0755 /etc/systemd/system/user-.slice.d
|
||||||
cat > /etc/systemd/system/user-.slice.d/50-devvm-resource.conf <<'SLICE_EOF'
|
cat > /etc/systemd/system/user-.slice.d/50-devvm-resource.conf <<'SLICE_EOF'
|
||||||
# Per-user containment for the shared devvm (setup-devvm.sh §10, 2026-06-22;
|
# Per-user containment for the shared devvm (setup-devvm.sh §10, 2026-06-22).
|
||||||
# MemoryHigh dropped 2026-07-02). Applies to EACH user-<uid>.slice = all of one
|
# Applies to EACH user-<uid>.slice = all of one user's ssh/tmux work. Mirrors the
|
||||||
# user's ssh/tmux work. Mirrors the t3-serve@.service caps so a user is bounded
|
# t3-serve@.service caps so a user is bounded in whichever surface they work in.
|
||||||
# in whichever surface they work in. MemoryHigh stays infinity: with swap=0 a
|
|
||||||
# hog plateauing in a high..max band livelocks the entire slice (every ssh/tmux
|
|
||||||
# session of that user) instead of dying — straight-to-OOM at MemoryMax is the
|
|
||||||
# containment (see post-mortem addendum 2026-07-02).
|
|
||||||
[Slice]
|
[Slice]
|
||||||
MemoryAccounting=yes
|
MemoryAccounting=yes
|
||||||
MemoryHigh=infinity
|
MemoryHigh=12G
|
||||||
MemoryMax=16G
|
MemoryMax=16G
|
||||||
MemorySwapMax=0
|
MemorySwapMax=0
|
||||||
CPUAccounting=yes
|
CPUAccounting=yes
|
||||||
|
|
@ -304,14 +294,12 @@ cat > /etc/systemd/system/docker.slice <<'DOCKER_SLICE_EOF'
|
||||||
# All docker containers live here (cgroup-parent in /etc/docker/daemon.json) so
|
# All docker containers live here (cgroup-parent in /etc/docker/daemon.json) so
|
||||||
# they share one bounded budget and a runaway container is capped at MemoryMax
|
# they share one bounded budget and a runaway container is capped at MemoryMax
|
||||||
# (cgroup-OOM'd locally) instead of escaping into the uncapped system.slice.
|
# (cgroup-OOM'd locally) instead of escaping into the uncapped system.slice.
|
||||||
# setup-devvm.sh §10, 2026-06-22; MemoryHigh dropped 2026-07-02 — a container
|
# setup-devvm.sh §10, 2026-06-22.
|
||||||
# plateauing in the high..max band would throttle-livelock EVERY container in
|
|
||||||
# the slice (see post-mortem addendum); MemoryMax OOM is the containment.
|
|
||||||
[Unit]
|
[Unit]
|
||||||
Description=Docker containers slice (capped)
|
Description=Docker containers slice (capped)
|
||||||
[Slice]
|
[Slice]
|
||||||
MemoryAccounting=yes
|
MemoryAccounting=yes
|
||||||
MemoryHigh=infinity
|
MemoryHigh=6G
|
||||||
MemoryMax=8G
|
MemoryMax=8G
|
||||||
MemorySwapMax=0
|
MemorySwapMax=0
|
||||||
CPUAccounting=yes
|
CPUAccounting=yes
|
||||||
|
|
|
||||||
Binary file not shown.
|
|
@ -235,12 +235,6 @@ resource "cloudflare_record" "keyserver" {
|
||||||
zone_id = var.cloudflare_zone_id
|
zone_id = var.cloudflare_zone_id
|
||||||
}
|
}
|
||||||
|
|
||||||
# bridge.viktorbarzin.me (Cloudflare Pages, "мост" school site) moved to
|
|
||||||
# stacks/valia-sites (ADR-0018) — all Valia-site records live there now.
|
|
||||||
# State handoff was a manual `tg state rm` (2026-07-03): the CI terraform
|
|
||||||
# (<1.7) rejects removed{} blocks even at the stack root, so declarative
|
|
||||||
# forget wasn't available. valia-sites imported the live record by id.
|
|
||||||
|
|
||||||
# Enable HTTP/3 (QUIC) for Cloudflare-proxied domains
|
# Enable HTTP/3 (QUIC) for Cloudflare-proxied domains
|
||||||
resource "cloudflare_zone_settings_override" "http3" {
|
resource "cloudflare_zone_settings_override" "http3" {
|
||||||
zone_id = var.cloudflare_zone_id
|
zone_id = var.cloudflare_zone_id
|
||||||
|
|
|
||||||
|
|
@ -16,7 +16,7 @@ resource "kubernetes_namespace" "dawarich" {
|
||||||
name = "dawarich"
|
name = "dawarich"
|
||||||
labels = {
|
labels = {
|
||||||
"istio-injection" : "disabled"
|
"istio-injection" : "disabled"
|
||||||
tier = local.tiers.edge
|
tier = local.tiers.edge
|
||||||
"keel.sh/enrolled" = "true"
|
"keel.sh/enrolled" = "true"
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
@ -330,7 +330,7 @@ resource "kubernetes_deployment" "dawarich" {
|
||||||
}
|
}
|
||||||
lifecycle {
|
lifecycle {
|
||||||
ignore_changes = [
|
ignore_changes = [
|
||||||
spec[0].template[0].spec[0].dns_config, # KYVERNO_LIFECYCLE_V1
|
spec[0].template[0].spec[0].dns_config, # KYVERNO_LIFECYCLE_V1
|
||||||
spec[0].template[0].spec[0].container[0].image, # KEEL_IGNORE_IMAGE — Keel manages tag updates
|
spec[0].template[0].spec[0].container[0].image, # KEEL_IGNORE_IMAGE — Keel manages tag updates
|
||||||
metadata[0].annotations["keel.sh/policy"],
|
metadata[0].annotations["keel.sh/policy"],
|
||||||
metadata[0].annotations["keel.sh/trigger"],
|
metadata[0].annotations["keel.sh/trigger"],
|
||||||
|
|
@ -458,13 +458,6 @@ module "ingress" {
|
||||||
namespace = kubernetes_namespace.dawarich.metadata[0].name
|
namespace = kubernetes_namespace.dawarich.metadata[0].name
|
||||||
name = "dawarich"
|
name = "dawarich"
|
||||||
tls_secret_name = var.tls_secret_name
|
tls_secret_name = var.tls_secret_name
|
||||||
# Rails serves all its fingerprinted assets itself and the map view adds an
|
|
||||||
# API burst per page load — the default 10/50 limiter 429s the asset tail
|
|
||||||
# from a single client IP (and risks dropping OwnTracks/mobile ingestion
|
|
||||||
# POSTs on the same host). Dedicated 100/1000 limiter defined in
|
|
||||||
# stacks/traefik/modules/traefik/middleware.tf.
|
|
||||||
skip_default_rate_limit = true
|
|
||||||
extra_middlewares = ["traefik-dawarich-rate-limit@kubernetescrd"]
|
|
||||||
extra_annotations = {
|
extra_annotations = {
|
||||||
"gethomepage.dev/enabled" = "true"
|
"gethomepage.dev/enabled" = "true"
|
||||||
"gethomepage.dev/name" = "Dawarich"
|
"gethomepage.dev/name" = "Dawarich"
|
||||||
|
|
|
||||||
|
|
@ -1511,34 +1511,6 @@ resource "null_resource" "pg_instagram_poster_db" {
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
# Create tasks database for the tasks PWA (Reminders-style front-end over
|
|
||||||
# Nextcloud CalDAV; FastAPI + SvelteKit SPA — see ~/code/tasks). Stores
|
|
||||||
# Connected Accounts (Fernet-encrypted Nextcloud app passwords) + sync state.
|
|
||||||
# Role password is managed by Vault Database Secrets Engine (static role
|
|
||||||
# `pg-tasks`, 7d rotation). Tables are created by alembic on app startup.
|
|
||||||
resource "null_resource" "pg_tasks_db" {
|
|
||||||
depends_on = [null_resource.pg_cluster]
|
|
||||||
|
|
||||||
triggers = {
|
|
||||||
db_name = "tasks"
|
|
||||||
username = "tasks"
|
|
||||||
}
|
|
||||||
|
|
||||||
provisioner "local-exec" {
|
|
||||||
command = <<-EOT
|
|
||||||
PRIMARY=$(kubectl --kubeconfig ${var.kube_config_path} get cluster -n dbaas pg-cluster -o jsonpath='{.status.currentPrimary}')
|
|
||||||
kubectl --kubeconfig ${var.kube_config_path} exec -n dbaas $PRIMARY -c postgres -- \
|
|
||||||
bash -c '
|
|
||||||
psql -U postgres -tc "SELECT 1 FROM pg_catalog.pg_roles WHERE rolname = '"'"'tasks'"'"'" | grep -q 1 || \
|
|
||||||
psql -U postgres -c "CREATE ROLE tasks WITH LOGIN PASSWORD '"'"'changeme-vault-will-rotate'"'"'"
|
|
||||||
psql -U postgres -tc "SELECT 1 FROM pg_catalog.pg_database WHERE datname = '"'"'tasks'"'"'" | grep -q 1 || \
|
|
||||||
psql -U postgres -c "CREATE DATABASE tasks OWNER tasks"
|
|
||||||
psql -U postgres -c "GRANT ALL PRIVILEGES ON DATABASE tasks TO tasks"
|
|
||||||
'
|
|
||||||
EOT
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
# Old PostgreSQL deployment — kept commented for rollback reference
|
# Old PostgreSQL deployment — kept commented for rollback reference
|
||||||
# resource "kubernetes_deployment" "postgres" {
|
# resource "kubernetes_deployment" "postgres" {
|
||||||
# metadata {
|
# metadata {
|
||||||
|
|
|
||||||
|
|
@ -1,360 +0,0 @@
|
||||||
variable "tls_secret_name" {
|
|
||||||
type = string
|
|
||||||
sensitive = true
|
|
||||||
}
|
|
||||||
variable "nfs_server" { type = string }
|
|
||||||
|
|
||||||
# Open DroneLog (https://github.com/arpanghosh8453/open-dronelog) — self-hosted
|
|
||||||
# DJI flight-log analyzer for the DJI Mini 4 Pro. Runs the UPSTREAM image (the
|
|
||||||
# ViktorBarzin/drone-logbook fork has no custom commits); Keel tracks :latest.
|
|
||||||
# Design: docs/plans/2026-07-04-drone-logbook-design.md
|
|
||||||
resource "kubernetes_namespace" "drone_logbook" {
|
|
||||||
metadata {
|
|
||||||
name = "drone-logbook"
|
|
||||||
labels = {
|
|
||||||
tier = local.tiers.aux
|
|
||||||
"keel.sh/enrolled" = "true"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
lifecycle {
|
|
||||||
# KYVERNO_LIFECYCLE_V1: goldilocks-vpa-auto-mode ClusterPolicy stamps this label on every namespace
|
|
||||||
ignore_changes = [metadata[0].labels["goldilocks.fairwinds.com/vpa-update-mode"]]
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
resource "kubernetes_manifest" "external_secret" {
|
|
||||||
field_manager {
|
|
||||||
force_conflicts = true
|
|
||||||
}
|
|
||||||
manifest = {
|
|
||||||
apiVersion = "external-secrets.io/v1"
|
|
||||||
kind = "ExternalSecret"
|
|
||||||
metadata = {
|
|
||||||
name = "drone-logbook-secrets"
|
|
||||||
namespace = "drone-logbook"
|
|
||||||
}
|
|
||||||
spec = {
|
|
||||||
refreshInterval = "15m"
|
|
||||||
secretStoreRef = {
|
|
||||||
name = "vault-kv"
|
|
||||||
kind = "ClusterSecretStore"
|
|
||||||
}
|
|
||||||
target = {
|
|
||||||
name = "drone-logbook-secrets"
|
|
||||||
}
|
|
||||||
dataFrom = [{
|
|
||||||
extract = {
|
|
||||||
key = "drone-logbook"
|
|
||||||
}
|
|
||||||
}]
|
|
||||||
}
|
|
||||||
}
|
|
||||||
depends_on = [kubernetes_namespace.drone_logbook]
|
|
||||||
}
|
|
||||||
|
|
||||||
module "tls_secret" {
|
|
||||||
source = "../../modules/kubernetes/setup_tls_secret"
|
|
||||||
namespace = kubernetes_namespace.drone_logbook.metadata[0].name
|
|
||||||
tls_secret_name = var.tls_secret_name
|
|
||||||
}
|
|
||||||
|
|
||||||
# DuckDB database + cached DJI decryption keys + uploaded originals.
|
|
||||||
# Embedded DB -> block storage, not NFS (same rationale as freshrss data).
|
|
||||||
# Encrypted class: flight logs are GPS traces of home/travel (sensitive data
|
|
||||||
# -> proxmox-lvm-encrypted per the storage decision rule in .claude/CLAUDE.md).
|
|
||||||
resource "kubernetes_persistent_volume_claim" "data" {
|
|
||||||
wait_until_bound = false
|
|
||||||
metadata {
|
|
||||||
name = "drone-logbook-data-encrypted"
|
|
||||||
namespace = kubernetes_namespace.drone_logbook.metadata[0].name
|
|
||||||
annotations = {
|
|
||||||
"resize.topolvm.io/threshold" = "10%"
|
|
||||||
"resize.topolvm.io/increase" = "100%"
|
|
||||||
"resize.topolvm.io/storage_limit" = "10Gi"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
spec {
|
|
||||||
access_modes = ["ReadWriteOnce"]
|
|
||||||
storage_class_name = "proxmox-lvm-encrypted"
|
|
||||||
resources {
|
|
||||||
requests = {
|
|
||||||
storage = "2Gi"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
lifecycle {
|
|
||||||
# The autoresizer expands requests.storage up to storage_limit and PVCs
|
|
||||||
# can't shrink; without this every apply tries to revert the size.
|
|
||||||
ignore_changes = [spec[0].resources[0].requests]
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
# Drop folder: any producer (Nextcloud sync, scp, future phone pipeline) lands
|
|
||||||
# DJI .txt logs here over NFS; the app auto-imports on SYNC_INTERVAL.
|
|
||||||
module "nfs_sync_logs" {
|
|
||||||
source = "../../modules/kubernetes/nfs_volume"
|
|
||||||
name = "drone-logbook-sync-logs"
|
|
||||||
namespace = kubernetes_namespace.drone_logbook.metadata[0].name
|
|
||||||
nfs_server = var.nfs_server
|
|
||||||
nfs_path = "/srv/nfs/drone-logbook/sync-logs"
|
|
||||||
storage = "5Gi"
|
|
||||||
}
|
|
||||||
|
|
||||||
resource "kubernetes_deployment" "drone_logbook" {
|
|
||||||
metadata {
|
|
||||||
name = "drone-logbook"
|
|
||||||
namespace = kubernetes_namespace.drone_logbook.metadata[0].name
|
|
||||||
labels = {
|
|
||||||
app = "drone-logbook"
|
|
||||||
"kubernetes.io/cluster-service" = "true"
|
|
||||||
tier = local.tiers.aux
|
|
||||||
}
|
|
||||||
}
|
|
||||||
spec {
|
|
||||||
replicas = 1
|
|
||||||
strategy {
|
|
||||||
# DuckDB is single-writer; never overlap two pods on the same volume
|
|
||||||
type = "Recreate"
|
|
||||||
}
|
|
||||||
selector {
|
|
||||||
match_labels = {
|
|
||||||
app = "drone-logbook"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
template {
|
|
||||||
metadata {
|
|
||||||
labels = {
|
|
||||||
app = "drone-logbook"
|
|
||||||
"kubernetes.io/cluster-service" = "true"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
spec {
|
|
||||||
container {
|
|
||||||
name = "drone-logbook"
|
|
||||||
image = "ghcr.io/arpanghosh8453/open-dronelog:latest"
|
|
||||||
env {
|
|
||||||
name = "RUST_LOG"
|
|
||||||
value = "info"
|
|
||||||
}
|
|
||||||
env {
|
|
||||||
# keep re-importable originals under /data/drone-logbook/uploaded
|
|
||||||
name = "KEEP_UPLOADED_FILES"
|
|
||||||
value = "true"
|
|
||||||
}
|
|
||||||
env {
|
|
||||||
name = "SYNC_LOGS_PATH"
|
|
||||||
value = "/sync-logs"
|
|
||||||
}
|
|
||||||
env {
|
|
||||||
# 6-field cron (sec min hour dom mon dow): scan drop folder every 8h
|
|
||||||
name = "SYNC_INTERVAL"
|
|
||||||
value = "0 0 */8 * * *"
|
|
||||||
}
|
|
||||||
env {
|
|
||||||
name = "PROFILE_CREATION_PASS"
|
|
||||||
value_from {
|
|
||||||
secret_key_ref {
|
|
||||||
name = "drone-logbook-secrets"
|
|
||||||
key = "profile_creation_pass"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
volume_mount {
|
|
||||||
name = "data"
|
|
||||||
mount_path = "/data/drone-logbook"
|
|
||||||
}
|
|
||||||
volume_mount {
|
|
||||||
name = "sync-logs"
|
|
||||||
mount_path = "/sync-logs"
|
|
||||||
read_only = true
|
|
||||||
}
|
|
||||||
port {
|
|
||||||
name = "http"
|
|
||||||
container_port = 80
|
|
||||||
protocol = "TCP"
|
|
||||||
}
|
|
||||||
resources {
|
|
||||||
requests = {
|
|
||||||
cpu = "25m"
|
|
||||||
memory = "512Mi"
|
|
||||||
}
|
|
||||||
limits = {
|
|
||||||
memory = "512Mi"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
volume {
|
|
||||||
name = "data"
|
|
||||||
persistent_volume_claim {
|
|
||||||
claim_name = kubernetes_persistent_volume_claim.data.metadata[0].name
|
|
||||||
}
|
|
||||||
}
|
|
||||||
volume {
|
|
||||||
name = "sync-logs"
|
|
||||||
persistent_volume_claim {
|
|
||||||
claim_name = module.nfs_sync_logs.claim_name
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
depends_on = [kubernetes_manifest.external_secret]
|
|
||||||
lifecycle {
|
|
||||||
ignore_changes = [
|
|
||||||
spec[0].template[0].spec[0].dns_config, # KYVERNO_LIFECYCLE_V1
|
|
||||||
metadata[0].annotations["keel.sh/policy"],
|
|
||||||
metadata[0].annotations["keel.sh/trigger"],
|
|
||||||
metadata[0].annotations["keel.sh/pollSchedule"], # KYVERNO_LIFECYCLE_V2
|
|
||||||
metadata[0].annotations["keel.sh/match-tag"],
|
|
||||||
spec[0].template[0].spec[0].container[0].image, # KEEL_IGNORE_IMAGE — Keel manages tag updates
|
|
||||||
metadata[0].annotations["kubernetes.io/change-cause"],
|
|
||||||
metadata[0].annotations["deployment.kubernetes.io/revision"],
|
|
||||||
spec[0].template[0].metadata[0].annotations["keel.sh/update-time"], # KEEL_LIFECYCLE_V1
|
|
||||||
]
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
resource "kubernetes_service" "drone_logbook" {
|
|
||||||
metadata {
|
|
||||||
name = "drone-logbook"
|
|
||||||
namespace = kubernetes_namespace.drone_logbook.metadata[0].name
|
|
||||||
labels = {
|
|
||||||
"app" = "drone-logbook"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
spec {
|
|
||||||
selector = {
|
|
||||||
app = "drone-logbook"
|
|
||||||
}
|
|
||||||
port {
|
|
||||||
port = "80"
|
|
||||||
target_port = "80"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
# -----------------------------------------------------------------------------
|
|
||||||
# Backup — required for every proxmox-lvm(-encrypted) app: daily copy of the
|
|
||||||
# data volume to NFS /srv/nfs/drone-logbook-backup (picked up by nfs-mirror ->
|
|
||||||
# sda -> Synology offsite). 01:30 = outside the 00:00/08:00/16:00 sync-import
|
|
||||||
# windows, so the DuckDB file is quiescent; uploaded originals make even a
|
|
||||||
# mid-write copy recoverable by re-import. Pod-affinity co-schedules with the
|
|
||||||
# app pod (RWO volume mounts twice only on the same node). Vaultwarden pattern.
|
|
||||||
# -----------------------------------------------------------------------------
|
|
||||||
|
|
||||||
module "nfs_backup" {
|
|
||||||
source = "../../modules/kubernetes/nfs_volume"
|
|
||||||
name = "drone-logbook-backup-host"
|
|
||||||
namespace = kubernetes_namespace.drone_logbook.metadata[0].name
|
|
||||||
nfs_server = var.nfs_server
|
|
||||||
nfs_path = "/srv/nfs/drone-logbook-backup"
|
|
||||||
}
|
|
||||||
|
|
||||||
resource "kubernetes_cron_job_v1" "backup" {
|
|
||||||
metadata {
|
|
||||||
name = "drone-logbook-backup"
|
|
||||||
namespace = kubernetes_namespace.drone_logbook.metadata[0].name
|
|
||||||
}
|
|
||||||
spec {
|
|
||||||
concurrency_policy = "Replace"
|
|
||||||
failed_jobs_history_limit = 5
|
|
||||||
schedule = "30 1 * * *"
|
|
||||||
starting_deadline_seconds = 300
|
|
||||||
successful_jobs_history_limit = 3
|
|
||||||
job_template {
|
|
||||||
metadata {}
|
|
||||||
spec {
|
|
||||||
backoff_limit = 3
|
|
||||||
ttl_seconds_after_finished = 10
|
|
||||||
template {
|
|
||||||
metadata {}
|
|
||||||
spec {
|
|
||||||
affinity {
|
|
||||||
pod_affinity {
|
|
||||||
required_during_scheduling_ignored_during_execution {
|
|
||||||
label_selector {
|
|
||||||
match_labels = {
|
|
||||||
app = "drone-logbook"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
topology_key = "kubernetes.io/hostname"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
container {
|
|
||||||
name = "drone-logbook-backup"
|
|
||||||
image = "docker.io/library/alpine"
|
|
||||||
command = ["/bin/sh", "-c", <<-EOT
|
|
||||||
set -euxo pipefail
|
|
||||||
_t0=$(date +%s)
|
|
||||||
now=$(date +"%Y_%m_%d_%H_%M")
|
|
||||||
mkdir -p /backup/$now
|
|
||||||
cp -a /data/. /backup/$now/
|
|
||||||
# Rotate — 30 day retention
|
|
||||||
find /backup -maxdepth 1 -mindepth 1 -type d -mtime +30 -exec rm -rf {} +
|
|
||||||
_dur=$(($(date +%s) - _t0))
|
|
||||||
_out_bytes=$(du -sb /backup/$now | awk '{print $1}')
|
|
||||||
wget -qO- --post-data "backup_duration_seconds $${_dur}
|
|
||||||
backup_output_bytes $${_out_bytes}
|
|
||||||
backup_last_success_timestamp $(date +%s)
|
|
||||||
" "http://prometheus-prometheus-pushgateway.monitoring:9091/metrics/job/drone-logbook-backup" || true
|
|
||||||
EOT
|
|
||||||
]
|
|
||||||
volume_mount {
|
|
||||||
name = "data"
|
|
||||||
mount_path = "/data"
|
|
||||||
read_only = true
|
|
||||||
}
|
|
||||||
volume_mount {
|
|
||||||
name = "backup"
|
|
||||||
mount_path = "/backup"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
volume {
|
|
||||||
name = "data"
|
|
||||||
persistent_volume_claim {
|
|
||||||
claim_name = kubernetes_persistent_volume_claim.data.metadata[0].name
|
|
||||||
}
|
|
||||||
}
|
|
||||||
volume {
|
|
||||||
name = "backup"
|
|
||||||
persistent_volume_claim {
|
|
||||||
claim_name = module.nfs_backup.claim_name
|
|
||||||
}
|
|
||||||
}
|
|
||||||
dns_config {
|
|
||||||
option {
|
|
||||||
name = "ndots"
|
|
||||||
value = "2"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
lifecycle {
|
|
||||||
# KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
|
|
||||||
ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config]
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
# https://dronelog.viktorbarzin.me
|
|
||||||
module "ingress" {
|
|
||||||
source = "../../modules/kubernetes/ingress_factory"
|
|
||||||
auth = "required" # Authentik forward-auth — flight logs are GPS traces of home/travel
|
|
||||||
dns_type = "proxied"
|
|
||||||
namespace = kubernetes_namespace.drone_logbook.metadata[0].name
|
|
||||||
name = "dronelog"
|
|
||||||
service_name = "drone-logbook"
|
|
||||||
tls_secret_name = var.tls_secret_name
|
|
||||||
extra_annotations = {
|
|
||||||
"gethomepage.dev/enabled" = "true"
|
|
||||||
"gethomepage.dev/name" = "Drone Logbook"
|
|
||||||
"gethomepage.dev/description" = "DJI flight log analyzer"
|
|
||||||
"gethomepage.dev/icon" = "mdi-quadcopter"
|
|
||||||
"gethomepage.dev/group" = "Media & Entertainment"
|
|
||||||
"gethomepage.dev/pod-selector" = ""
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
@ -1 +0,0 @@
|
||||||
../../secrets
|
|
||||||
|
|
@ -1,8 +0,0 @@
|
||||||
include "root" {
|
|
||||||
path = find_in_parent_folders()
|
|
||||||
}
|
|
||||||
|
|
||||||
dependency "platform" {
|
|
||||||
config_path = "../platform"
|
|
||||||
skip_outputs = true
|
|
||||||
}
|
|
||||||
|
|
@ -10,7 +10,7 @@ resource "kubernetes_namespace" "excalidraw" {
|
||||||
name = "excalidraw"
|
name = "excalidraw"
|
||||||
labels = {
|
labels = {
|
||||||
"istio-injection" : "disabled"
|
"istio-injection" : "disabled"
|
||||||
tier = local.tiers.aux
|
tier = local.tiers.aux
|
||||||
"keel.sh/enrolled" = "true"
|
"keel.sh/enrolled" = "true"
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
@ -45,15 +45,6 @@ resource "kubernetes_deployment" "excalidraw" {
|
||||||
app = "excalidraw"
|
app = "excalidraw"
|
||||||
tier = local.tiers.aux
|
tier = local.tiers.aux
|
||||||
}
|
}
|
||||||
# Keel rolls new ghcr:latest digests (k8s-portal pattern). Values here are
|
|
||||||
# recreate-correct seeds only — the keys are in ignore_changes below, so
|
|
||||||
# the live annotations win on an existing deployment.
|
|
||||||
annotations = {
|
|
||||||
"keel.sh/policy" = "force"
|
|
||||||
"keel.sh/trigger" = "poll"
|
|
||||||
"keel.sh/match-tag" = "true"
|
|
||||||
"keel.sh/pollSchedule" = "@every 5m"
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
spec {
|
spec {
|
||||||
replicas = 1
|
replicas = 1
|
||||||
|
|
@ -76,19 +67,9 @@ resource "kubernetes_deployment" "excalidraw" {
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
spec {
|
spec {
|
||||||
# GHCR pull secret: the ghcr-credentials Secret in this namespace is
|
|
||||||
# cloned in by the kyverno stack's sync-ghcr-credentials ClusterPolicy
|
|
||||||
# (allowlisted private-ghcr namespaces only — ADR-0002). Source of
|
|
||||||
# truth: stacks/kyverno/modules/kyverno/ghcr-credentials.tf.
|
|
||||||
image_pull_secrets {
|
|
||||||
name = "ghcr-credentials"
|
|
||||||
}
|
|
||||||
container {
|
container {
|
||||||
# ADR-0002: GHA-built (.github/workflows/build-excalidraw.yml),
|
image = "viktorbarzin/excalidraw-library:v4"
|
||||||
# PRIVATE ghcr; Keel rolls new :latest digests. DockerHub
|
image_pull_policy = "IfNotPresent"
|
||||||
# viktorbarzin/excalidraw-library:v4 is the frozen rollback image.
|
|
||||||
image = "ghcr.io/viktorbarzin/excalidraw-library:latest"
|
|
||||||
image_pull_policy = "Always"
|
|
||||||
name = "excalidraw"
|
name = "excalidraw"
|
||||||
port {
|
port {
|
||||||
container_port = 8080
|
container_port = 8080
|
||||||
|
|
@ -126,7 +107,7 @@ resource "kubernetes_deployment" "excalidraw" {
|
||||||
}
|
}
|
||||||
lifecycle {
|
lifecycle {
|
||||||
ignore_changes = [
|
ignore_changes = [
|
||||||
spec[0].template[0].spec[0].dns_config, # KYVERNO_LIFECYCLE_V1
|
spec[0].template[0].spec[0].dns_config, # KYVERNO_LIFECYCLE_V1
|
||||||
spec[0].template[0].spec[0].container[0].image, # KEEL_IGNORE_IMAGE — Keel manages tag updates
|
spec[0].template[0].spec[0].container[0].image, # KEEL_IGNORE_IMAGE — Keel manages tag updates
|
||||||
metadata[0].annotations["keel.sh/policy"],
|
metadata[0].annotations["keel.sh/policy"],
|
||||||
metadata[0].annotations["keel.sh/trigger"],
|
metadata[0].annotations["keel.sh/trigger"],
|
||||||
|
|
|
||||||
|
|
@ -4,28 +4,18 @@ A self-hosted Excalidraw library with per-user drawing storage and management.
|
||||||
|
|
||||||
## Features
|
## Features
|
||||||
|
|
||||||
- Dashboard to manage all your drawings (create, open, rename, delete)
|
- Dashboard to manage all your drawings
|
||||||
- Per-user storage (via Authentik SSO headers)
|
- Per-user storage (via Authentik SSO headers)
|
||||||
- Rename drawings from the dashboard or by clicking the drawing name in the editor
|
- Create, edit, and delete drawings
|
||||||
- Native Excalidraw export via the editor's hamburger menu: "Save to..."
|
|
||||||
(.excalidraw file) and "Export image..." (PNG / SVG / clipboard)
|
|
||||||
- Autosave (2s debounce) + manual save (Ctrl+S or menu "Save now")
|
|
||||||
- Persistent storage via NFS
|
- Persistent storage via NFS
|
||||||
|
|
||||||
## Docker Image
|
## Docker Image
|
||||||
|
|
||||||
```
|
```
|
||||||
ghcr.io/viktorbarzin/excalidraw-library:latest
|
viktorbarzin/excalidraw-library:v4
|
||||||
```
|
```
|
||||||
|
|
||||||
Built by GitHub Actions (`.github/workflows/build-excalidraw.yml` in the infra
|
Available on Docker Hub: https://hub.docker.com/r/viktorbarzin/excalidraw-library
|
||||||
repo, ADR-0002) on every master push touching `stacks/excalidraw/project/**`;
|
|
||||||
tags `:latest` + `:<git-sha>`. The package is PRIVATE — cluster pulls use the
|
|
||||||
Kyverno-synced `ghcr-credentials` secret. Keel polls `:latest` and rolls the
|
|
||||||
deployment on digest change.
|
|
||||||
|
|
||||||
The legacy manually-built DockerHub image `viktorbarzin/excalidraw-library:v4`
|
|
||||||
is frozen as the rollback target; nothing pushes to it anymore.
|
|
||||||
|
|
||||||
## Configuration
|
## Configuration
|
||||||
|
|
||||||
|
|
@ -49,13 +39,54 @@ Mount a persistent volume to the `DATA_DIR` path. Drawings are stored as `.excal
|
||||||
└── my-diagram.excalidraw
|
└── my-diagram.excalidraw
|
||||||
```
|
```
|
||||||
|
|
||||||
The filename (without extension) is both the drawing ID and its display name;
|
|
||||||
renaming a drawing renames the file (`os.Rename`, mtime preserved).
|
|
||||||
|
|
||||||
## Deployment
|
## Deployment
|
||||||
|
|
||||||
Deployed by the `stacks/excalidraw` Terraform stack (namespace `excalidraw`,
|
### Docker
|
||||||
service `draw`, ingress `draw.viktorbarzin.me` with `auth = "required"`).
|
|
||||||
|
```bash
|
||||||
|
docker run -d \
|
||||||
|
--name excalidraw-rooms \
|
||||||
|
-p 8080:8080 \
|
||||||
|
-v /path/to/storage:/data \
|
||||||
|
viktorbarzin/excalidraw-library:v4
|
||||||
|
```
|
||||||
|
|
||||||
|
### Kubernetes
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
apiVersion: apps/v1
|
||||||
|
kind: Deployment
|
||||||
|
metadata:
|
||||||
|
name: excalidraw
|
||||||
|
spec:
|
||||||
|
replicas: 1
|
||||||
|
selector:
|
||||||
|
matchLabels:
|
||||||
|
app: excalidraw
|
||||||
|
template:
|
||||||
|
metadata:
|
||||||
|
labels:
|
||||||
|
app: excalidraw
|
||||||
|
spec:
|
||||||
|
containers:
|
||||||
|
- name: excalidraw
|
||||||
|
image: viktorbarzin/excalidraw-library:v4
|
||||||
|
ports:
|
||||||
|
- containerPort: 8080
|
||||||
|
env:
|
||||||
|
- name: DATA_DIR
|
||||||
|
value: /data
|
||||||
|
- name: PORT
|
||||||
|
value: "8080"
|
||||||
|
volumeMounts:
|
||||||
|
- name: data
|
||||||
|
mountPath: /data
|
||||||
|
volumes:
|
||||||
|
- name: data
|
||||||
|
nfs:
|
||||||
|
server: 192.168.1.127
|
||||||
|
path: /srv/nfs/excalidraw
|
||||||
|
```
|
||||||
|
|
||||||
### With Authentik SSO
|
### With Authentik SSO
|
||||||
|
|
||||||
|
|
@ -65,7 +96,23 @@ The application reads user identity from Authentik headers:
|
||||||
- `X-Authentik-Email` - Displayed in UI
|
- `X-Authentik-Email` - Displayed in UI
|
||||||
- `X-Authentik-Name` - Displayed in UI
|
- `X-Authentik-Name` - Displayed in UI
|
||||||
|
|
||||||
Requests without `X-Authentik-Username` fall back to the `anonymous` user.
|
Configure your ingress to pass these headers:
|
||||||
|
|
||||||
|
```yaml
|
||||||
|
annotations:
|
||||||
|
nginx.ingress.kubernetes.io/auth-response-headers: "X-authentik-username,X-authentik-email,X-authentik-name"
|
||||||
|
```
|
||||||
|
|
||||||
|
## Building
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Build the Docker image
|
||||||
|
docker build -t excalidraw-library .
|
||||||
|
|
||||||
|
# Or build locally
|
||||||
|
go build -o excalidraw-library .
|
||||||
|
./excalidraw-library
|
||||||
|
```
|
||||||
|
|
||||||
## API Endpoints
|
## API Endpoints
|
||||||
|
|
||||||
|
|
@ -75,25 +122,10 @@ Requests without `X-Authentik-Username` fall back to the `anonymous` user.
|
||||||
| GET | `/api/drawings` | List all drawings for current user |
|
| GET | `/api/drawings` | List all drawings for current user |
|
||||||
| GET | `/api/drawings/:id` | Get drawing data |
|
| GET | `/api/drawings/:id` | Get drawing data |
|
||||||
| PUT | `/api/drawings/:id` | Save drawing |
|
| PUT | `/api/drawings/:id` | Save drawing |
|
||||||
| PATCH | `/api/drawings/:id` | Rename drawing — body `{"name": "<new-name>"}`; returns `{"status":"renamed","id":"<new-id>"}`; 409 if the target name exists |
|
|
||||||
| DELETE | `/api/drawings/:id` | Delete drawing |
|
| DELETE | `/api/drawings/:id` | Delete drawing |
|
||||||
| GET | `/api/user` | Get current user info |
|
| GET | `/api/user` | Get current user info |
|
||||||
| GET | `/draw/:id` | Open drawing in editor |
|
| GET | `/draw/:id` | Open drawing in editor |
|
||||||
|
|
||||||
New drawing names are sanitized server-side to `[a-zA-Z0-9-_]` (other characters
|
|
||||||
become `-`; a trailing `.excalidraw` is stripped). Existing IDs are accepted
|
|
||||||
as-is for backward compatibility with API clients.
|
|
||||||
|
|
||||||
## Development
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# Run tests
|
|
||||||
go test ./...
|
|
||||||
|
|
||||||
# Run locally
|
|
||||||
DATA_DIR=/tmp/excalidraw-data go run .
|
|
||||||
```
|
|
||||||
|
|
||||||
## License
|
## License
|
||||||
|
|
||||||
MIT
|
MIT
|
||||||
|
|
|
||||||
|
|
@ -9,7 +9,6 @@ import (
|
||||||
"net/http"
|
"net/http"
|
||||||
"os"
|
"os"
|
||||||
"path/filepath"
|
"path/filepath"
|
||||||
"regexp"
|
|
||||||
"sort"
|
"sort"
|
||||||
"strings"
|
"strings"
|
||||||
"time"
|
"time"
|
||||||
|
|
@ -64,21 +63,6 @@ func getUsername(r *http.Request) string {
|
||||||
return username
|
return username
|
||||||
}
|
}
|
||||||
|
|
||||||
var invalidNameChars = regexp.MustCompile(`[^a-zA-Z0-9-_]`)
|
|
||||||
|
|
||||||
// sanitizeName normalizes a user-supplied drawing name into a safe file ID
|
|
||||||
// (same charset the dashboard applies on create). Returns "" if nothing
|
|
||||||
// meaningful remains.
|
|
||||||
func sanitizeName(name string) string {
|
|
||||||
name = strings.TrimSpace(name)
|
|
||||||
name = strings.TrimSuffix(name, ".excalidraw")
|
|
||||||
name = invalidNameChars.ReplaceAllString(name, "-")
|
|
||||||
if strings.Trim(name, "-") == "" {
|
|
||||||
return ""
|
|
||||||
}
|
|
||||||
return name
|
|
||||||
}
|
|
||||||
|
|
||||||
// getUserDataDir returns the data directory for a specific user and ensures it exists
|
// getUserDataDir returns the data directory for a specific user and ensures it exists
|
||||||
func getUserDataDir(username string) string {
|
func getUserDataDir(username string) string {
|
||||||
userDir := filepath.Join(dataDir, username)
|
userDir := filepath.Join(dataDir, username)
|
||||||
|
|
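To make the behaviour of the `sanitizeName` helper in the hunk above concrete, the sketch below reproduces it in a standalone program and runs it on the inputs used by the test file later in this diff. The helper body is copied from the hunk; the `main` function and its input list are illustrative additions only.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Copied from the hunk above so the example runs on its own.
var invalidNameChars = regexp.MustCompile(`[^a-zA-Z0-9-_]`)

// sanitizeName trims whitespace, strips a trailing ".excalidraw", replaces
// every character outside [a-zA-Z0-9-_] with "-", and returns "" when only
// dashes (or nothing) remain.
func sanitizeName(name string) string {
	name = strings.TrimSpace(name)
	name = strings.TrimSuffix(name, ".excalidraw")
	name = invalidNameChars.ReplaceAllString(name, "-")
	if strings.Trim(name, "-") == "" {
		return ""
	}
	return name
}

func main() {
	// Illustrative inputs: three accepted names, then two rejected ones.
	inputs := []string{"My Drawing!", "net diag.excalidraw", "a/b\\c", "../..", "---"}
	for _, in := range inputs {
		fmt.Printf("%q -> %q\n", in, sanitizeName(in))
	}
}
```

A path-traversal attempt such as `../..` collapses to dashes only and is therefore rejected with an empty result, which the handler turns into a 400 response.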
@ -184,41 +168,6 @@ func handleDrawing(w http.ResponseWriter, r *http.Request) {
|
||||||
w.Header().Set("Content-Type", "application/json")
|
w.Header().Set("Content-Type", "application/json")
|
||||||
json.NewEncoder(w).Encode(map[string]string{"status": "saved", "id": id})
|
json.NewEncoder(w).Encode(map[string]string{"status": "saved", "id": id})
|
||||||
|
|
||||||
case http.MethodPatch:
|
|
||||||
var req struct {
|
|
||||||
Name string `json:"name"`
|
|
||||||
}
|
|
||||||
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
|
|
||||||
http.Error(w, "Invalid JSON body", http.StatusBadRequest)
|
|
||||||
return
|
|
||||||
}
|
|
||||||
newID := sanitizeName(req.Name)
|
|
||||||
if newID == "" {
|
|
||||||
http.Error(w, "Invalid name", http.StatusBadRequest)
|
|
||||||
return
|
|
||||||
}
|
|
||||||
if _, err := os.Stat(filePath); err != nil {
|
|
||||||
if os.IsNotExist(err) {
|
|
||||||
http.Error(w, "Drawing not found", http.StatusNotFound)
|
|
||||||
} else {
|
|
||||||
http.Error(w, err.Error(), http.StatusInternalServerError)
|
|
||||||
}
|
|
||||||
return
|
|
||||||
}
|
|
||||||
if newID != id {
|
|
||||||
newPath := filepath.Join(userDataDir, newID+".excalidraw")
|
|
||||||
if _, err := os.Stat(newPath); err == nil {
|
|
||||||
http.Error(w, "A drawing with that name already exists", http.StatusConflict)
|
|
||||||
return
|
|
||||||
}
|
|
||||||
if err := os.Rename(filePath, newPath); err != nil {
|
|
||||||
http.Error(w, err.Error(), http.StatusInternalServerError)
|
|
||||||
return
|
|
||||||
}
|
|
||||||
}
|
|
||||||
w.Header().Set("Content-Type", "application/json")
|
|
||||||
json.NewEncoder(w).Encode(map[string]string{"status": "renamed", "id": newID})
|
|
||||||
|
|
||||||
case http.MethodDelete:
|
case http.MethodDelete:
|
||||||
if err := os.Remove(filePath); err != nil {
|
if err := os.Remove(filePath); err != nil {
|
||||||
if os.IsNotExist(err) {
|
if os.IsNotExist(err) {
|
||||||
|
|
@ -315,8 +264,6 @@ const dashboardHTML = `<!DOCTYPE html>
|
||||||
.btn:hover { background: #5b4cdb; }
|
.btn:hover { background: #5b4cdb; }
|
||||||
.btn-danger { background: #e74c3c; }
|
.btn-danger { background: #e74c3c; }
|
||||||
.btn-danger:hover { background: #c0392b; }
|
.btn-danger:hover { background: #c0392b; }
|
||||||
.btn-secondary { background: #3d3d5c; }
|
|
||||||
.btn-secondary:hover { background: #4a4a70; }
|
|
||||||
.btn-small { padding: 0.4rem 0.8rem; font-size: 0.85rem; }
|
.btn-small { padding: 0.4rem 0.8rem; font-size: 0.85rem; }
|
||||||
.drawings { display: grid; gap: 1rem; }
|
.drawings { display: grid; gap: 1rem; }
|
||||||
.drawing {
|
.drawing {
|
||||||
|
|
@ -395,11 +342,11 @@ const dashboardHTML = `<!DOCTYPE html>
|
||||||
|
|
||||||
<div id="modal" class="modal">
|
<div id="modal" class="modal">
|
||||||
<div class="modal-content">
|
<div class="modal-content">
|
||||||
<h2 id="modal-title">New Drawing</h2>
|
<h2>New Drawing</h2>
|
||||||
<input type="text" id="drawingName" placeholder="Drawing name..." autofocus>
|
<input type="text" id="drawingName" placeholder="Drawing name..." autofocus>
|
||||||
<div class="modal-actions">
|
<div class="modal-actions">
|
||||||
<button class="btn" style="background:#444" onclick="hideModal()">Cancel</button>
|
<button class="btn" style="background:#444" onclick="hideModal()">Cancel</button>
|
||||||
<button class="btn" id="modal-confirm" onclick="confirmModal()">Create</button>
|
<button class="btn" onclick="createDrawing()">Create</button>
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
|
|
@ -422,63 +369,31 @@ const dashboardHTML = `<!DOCTYPE html>
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
function drawingRow(d) {
|
|
||||||
var row = document.createElement('div');
|
|
||||||
row.className = 'drawing';
|
|
||||||
|
|
||||||
var info = document.createElement('div');
|
|
||||||
info.className = 'drawing-info';
|
|
||||||
var nameLink = document.createElement('a');
|
|
||||||
nameLink.className = 'drawing-name';
|
|
||||||
nameLink.href = '/draw/' + encodeURIComponent(d.id);
|
|
||||||
nameLink.textContent = d.name;
|
|
||||||
var meta = document.createElement('div');
|
|
||||||
meta.className = 'drawing-meta';
|
|
||||||
meta.textContent = 'Modified: ' + new Date(d.modified).toLocaleDateString() + ' ' +
|
|
||||||
new Date(d.modified).toLocaleTimeString() + ' - ' + formatSize(d.size);
|
|
||||||
info.appendChild(nameLink);
|
|
||||||
info.appendChild(meta);
|
|
||||||
|
|
||||||
var actions = document.createElement('div');
|
|
||||||
actions.className = 'drawing-actions';
|
|
||||||
var open = document.createElement('a');
|
|
||||||
open.className = 'btn btn-small';
|
|
||||||
open.href = '/draw/' + encodeURIComponent(d.id);
|
|
||||||
open.textContent = 'Open';
|
|
||||||
var rename = document.createElement('button');
|
|
||||||
rename.className = 'btn btn-small btn-secondary';
|
|
||||||
rename.textContent = 'Rename';
|
|
||||||
rename.onclick = function() { showRenameModal(d.id); };
|
|
||||||
var del = document.createElement('button');
|
|
||||||
del.className = 'btn btn-small btn-danger';
|
|
||||||
del.textContent = 'Delete';
|
|
||||||
del.onclick = function() { deleteDrawing(d.id); };
|
|
||||||
actions.appendChild(open);
|
|
||||||
actions.appendChild(rename);
|
|
||||||
actions.appendChild(del);
|
|
||||||
|
|
||||||
row.appendChild(info);
|
|
||||||
row.appendChild(actions);
|
|
||||||
return row;
|
|
||||||
}
|
|
||||||
|
|
||||||
async function loadDrawings() {
|
async function loadDrawings() {
|
||||||
const resp = await fetch('/api/drawings');
|
const resp = await fetch('/api/drawings');
|
||||||
const drawings = await resp.json();
|
const drawings = await resp.json();
|
||||||
const container = document.getElementById('drawings');
|
const container = document.getElementById('drawings');
|
||||||
container.replaceChildren();
|
|
||||||
|
|
||||||
if (!drawings || drawings.length === 0) {
|
if (!drawings || drawings.length === 0) {
|
||||||
var empty = document.createElement('div');
|
container.innerHTML = '<div class="empty">No drawings yet. Create your first one!</div>';
|
||||||
empty.className = 'empty';
|
|
||||||
empty.textContent = 'No drawings yet. Create your first one!';
|
|
||||||
container.appendChild(empty);
|
|
||||||
return;
|
return;
|
||||||
}
|
}
|
||||||
|
|
||||||
drawings.forEach(function(d) {
|
container.innerHTML = drawings.map(function(d) {
|
||||||
container.appendChild(drawingRow(d));
|
return '<div class="drawing">' +
|
||||||
});
|
'<div class="drawing-info">' +
|
||||||
|
'<a href="/draw/' + d.id + '" class="drawing-name">' + d.name + '</a>' +
|
||||||
|
'<div class="drawing-meta">' +
|
||||||
|
'Modified: ' + new Date(d.modified).toLocaleDateString() + ' ' + new Date(d.modified).toLocaleTimeString() +
|
||||||
|
' - ' + formatSize(d.size) +
|
||||||
|
'</div>' +
|
||||||
|
'</div>' +
|
||||||
|
'<div class="drawing-actions">' +
|
||||||
|
'<a href="/draw/' + d.id + '" class="btn btn-small">Open</a>' +
|
||||||
|
'<button class="btn btn-small btn-danger" onclick="deleteDrawing(\'' + d.id + '\')">Delete</button>' +
|
||||||
|
'</div>' +
|
||||||
|
'</div>';
|
||||||
|
}).join('');
|
||||||
}
|
}
|
||||||
|
|
||||||
function formatSize(bytes) {
|
function formatSize(bytes) {
|
||||||
|
|
@ -487,64 +402,18 @@ const dashboardHTML = `<!DOCTYPE html>
|
||||||
return (bytes / (1024 * 1024)).toFixed(1) + ' MB';
|
return (bytes / (1024 * 1024)).toFixed(1) + ' MB';
|
||||||
}
|
}
|
||||||
|
|
||||||
var modalAction = null; // invoked with the input value on confirm
|
|
||||||
|
|
||||||
function showModal(title, confirmLabel, initialValue, action) {
|
|
||||||
document.getElementById('modal-title').textContent = title;
|
|
||||||
document.getElementById('modal-confirm').textContent = confirmLabel;
|
|
||||||
var input = document.getElementById('drawingName');
|
|
||||||
input.value = initialValue || '';
|
|
||||||
modalAction = action;
|
|
||||||
document.getElementById('modal').classList.add('active');
|
|
||||||
input.focus();
|
|
||||||
input.select();
|
|
||||||
}
|
|
||||||
|
|
||||||
function showNewModal() {
|
function showNewModal() {
|
||||||
showModal('New Drawing', 'Create', '', createDrawing);
|
document.getElementById('modal').classList.add('active');
|
||||||
}
|
document.getElementById('drawingName').focus();
|
||||||
|
|
||||||
function showRenameModal(id) {
|
|
||||||
showModal('Rename Drawing', 'Rename', id, function(value) {
|
|
||||||
renameDrawing(id, value);
|
|
||||||
});
|
|
||||||
}
|
}
|
||||||
|
|
||||||
function hideModal() {
|
function hideModal() {
|
||||||
document.getElementById('modal').classList.remove('active');
|
document.getElementById('modal').classList.remove('active');
|
||||||
document.getElementById('drawingName').value = '';
|
document.getElementById('drawingName').value = '';
|
||||||
modalAction = null;
|
|
||||||
}
|
}
|
||||||
|
|
||||||
function confirmModal() {
|
async function createDrawing() {
|
||||||
if (modalAction) modalAction(document.getElementById('drawingName').value);
|
var name = document.getElementById('drawingName').value.trim();
|
||||||
}
|
|
||||||
|
|
||||||
async function renameDrawing(id, newName) {
|
|
||||||
newName = (newName || '').trim();
|
|
||||||
if (!newName || newName === id) {
|
|
||||||
hideModal();
|
|
||||||
return;
|
|
||||||
}
|
|
||||||
var resp = await fetch('/api/drawings/' + encodeURIComponent(id), {
|
|
||||||
method: 'PATCH',
|
|
||||||
headers: { 'Content-Type': 'application/json' },
|
|
||||||
body: JSON.stringify({ name: newName })
|
|
||||||
});
|
|
||||||
if (resp.status === 409) {
|
|
||||||
alert('A drawing with that name already exists.');
|
|
||||||
return; // keep the modal open so the user can pick another name
|
|
||||||
}
|
|
||||||
if (!resp.ok) {
|
|
||||||
alert('Rename failed: ' + await resp.text());
|
|
||||||
return;
|
|
||||||
}
|
|
||||||
hideModal();
|
|
||||||
loadDrawings();
|
|
||||||
}
|
|
||||||
|
|
||||||
async function createDrawing(name) {
|
|
||||||
name = (name || '').trim();
|
|
||||||
if (!name) {
|
if (!name) {
|
||||||
name = 'drawing-' + Date.now();
|
name = 'drawing-' + Date.now();
|
||||||
}
|
}
|
||||||
|
|
@ -577,7 +446,7 @@ const dashboardHTML = `<!DOCTYPE html>
|
||||||
}
|
}
|
||||||
|
|
||||||
document.getElementById('drawingName').addEventListener('keypress', function(e) {
|
document.getElementById('drawingName').addEventListener('keypress', function(e) {
|
||||||
if (e.key === 'Enter') confirmModal();
|
if (e.key === 'Enter') createDrawing();
|
||||||
});
|
});
|
||||||
|
|
||||||
document.getElementById('modal').addEventListener('click', function(e) {
|
document.getElementById('modal').addEventListener('click', function(e) {
|
||||||
|
|
|
||||||
|
|
@ -1,249 +0,0 @@
|
||||||
package main
|
|
||||||
|
|
||||||
import (
|
|
||||||
"encoding/json"
|
|
||||||
"net/http"
|
|
||||||
"net/http/httptest"
|
|
||||||
"os"
|
|
||||||
"path/filepath"
|
|
||||||
"strings"
|
|
||||||
"testing"
|
|
||||||
)
|
|
||||||
|
|
||||||
const testDrawing = `{"type":"excalidraw","version":2,"source":"excalidraw-library","elements":[{"id":"e1"}],"appState":{"viewBackgroundColor":"#ffffff"}}`
|
|
||||||
|
|
||||||
func setupDataDir(t *testing.T) {
|
|
||||||
t.Helper()
|
|
||||||
dataDir = t.TempDir()
|
|
||||||
}
|
|
||||||
|
|
||||||
// doDrawing sends a request to handleDrawing for the given user and returns the recorder.
|
|
||||||
func doDrawing(t *testing.T, method, id, body, user string) *httptest.ResponseRecorder {
|
|
||||||
t.Helper()
|
|
||||||
reader := strings.NewReader(body) // an empty body simply yields an empty reader
|
|
||||||
req := httptest.NewRequest(method, "/api/drawings/"+id, reader)
|
|
||||||
if user != "" {
|
|
||||||
req.Header.Set("X-Authentik-Username", user)
|
|
||||||
}
|
|
||||||
w := httptest.NewRecorder()
|
|
||||||
handleDrawing(w, req)
|
|
||||||
return w
|
|
||||||
}
|
|
||||||
|
|
||||||
func listDrawings(t *testing.T, user string) []Drawing {
|
|
||||||
t.Helper()
|
|
||||||
req := httptest.NewRequest(http.MethodGet, "/api/drawings", nil)
|
|
||||||
if user != "" {
|
|
||||||
req.Header.Set("X-Authentik-Username", user)
|
|
||||||
}
|
|
||||||
w := httptest.NewRecorder()
|
|
||||||
handleListDrawings(w, req)
|
|
||||||
if w.Code != http.StatusOK {
|
|
||||||
t.Fatalf("list: expected 200, got %d", w.Code)
|
|
||||||
}
|
|
||||||
var drawings []Drawing
|
|
||||||
if err := json.Unmarshal(w.Body.Bytes(), &drawings); err != nil {
|
|
||||||
t.Fatalf("list: bad JSON: %v", err)
|
|
||||||
}
|
|
||||||
return drawings
|
|
||||||
}
|
|
||||||
|
|
||||||
func TestPutGetRoundtrip(t *testing.T) {
|
|
||||||
setupDataDir(t)
|
|
||||||
if w := doDrawing(t, http.MethodPut, "foo", testDrawing, "alice"); w.Code != http.StatusOK {
|
|
||||||
t.Fatalf("PUT: expected 200, got %d: %s", w.Code, w.Body.String())
|
|
||||||
}
|
|
||||||
w := doDrawing(t, http.MethodGet, "foo", "", "alice")
|
|
||||||
if w.Code != http.StatusOK {
|
|
||||||
t.Fatalf("GET: expected 200, got %d", w.Code)
|
|
||||||
}
|
|
||||||
if w.Body.String() != testDrawing {
|
|
||||||
t.Errorf("GET: content mismatch: %s", w.Body.String())
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
func TestGetMissing(t *testing.T) {
|
|
||||||
setupDataDir(t)
|
|
||||||
if w := doDrawing(t, http.MethodGet, "nope", "", "alice"); w.Code != http.StatusNotFound {
|
|
||||||
t.Fatalf("expected 404, got %d", w.Code)
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
func TestListDrawings(t *testing.T) {
|
|
||||||
setupDataDir(t)
|
|
||||||
doDrawing(t, http.MethodPut, "one", testDrawing, "alice")
|
|
||||||
doDrawing(t, http.MethodPut, "two", testDrawing, "alice")
|
|
||||||
drawings := listDrawings(t, "alice")
|
|
||||||
if len(drawings) != 2 {
|
|
||||||
t.Fatalf("expected 2 drawings, got %d", len(drawings))
|
|
||||||
}
|
|
||||||
ids := map[string]bool{drawings[0].ID: true, drawings[1].ID: true}
|
|
||||||
if !ids["one"] || !ids["two"] {
|
|
||||||
t.Errorf("unexpected ids: %v", ids)
|
|
||||||
}
|
|
||||||
for _, d := range drawings {
|
|
||||||
if d.Name != d.ID {
|
|
||||||
t.Errorf("name should equal id: %+v", d)
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
func TestDelete(t *testing.T) {
|
|
||||||
setupDataDir(t)
|
|
||||||
doDrawing(t, http.MethodPut, "foo", testDrawing, "alice")
|
|
||||||
if w := doDrawing(t, http.MethodDelete, "foo", "", "alice"); w.Code != http.StatusOK {
|
|
||||||
t.Fatalf("DELETE: expected 200, got %d", w.Code)
|
|
||||||
}
|
|
||||||
if w := doDrawing(t, http.MethodGet, "foo", "", "alice"); w.Code != http.StatusNotFound {
|
|
||||||
t.Fatalf("GET after delete: expected 404, got %d", w.Code)
|
|
||||||
}
|
|
||||||
if w := doDrawing(t, http.MethodDelete, "foo", "", "alice"); w.Code != http.StatusNotFound {
|
|
||||||
t.Fatalf("second DELETE: expected 404, got %d", w.Code)
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
func TestPerUserIsolation(t *testing.T) {
|
|
||||||
setupDataDir(t)
|
|
||||||
doDrawing(t, http.MethodPut, "secret", testDrawing, "alice")
|
|
||||||
if w := doDrawing(t, http.MethodGet, "secret", "", "bob"); w.Code != http.StatusNotFound {
|
|
||||||
t.Fatalf("bob should not see alice's drawing, got %d", w.Code)
|
|
||||||
}
|
|
||||||
if drawings := listDrawings(t, "bob"); len(drawings) != 0 {
|
|
||||||
t.Fatalf("bob's list should be empty, got %d", len(drawings))
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
// --- rename (PATCH) ---
|
|
||||||
|
|
||||||
func renameReq(t *testing.T, id, newName, user string) *httptest.ResponseRecorder {
|
|
||||||
t.Helper()
|
|
||||||
return doDrawing(t, http.MethodPatch, id, `{"name":`+strconv(newName)+`}`, user)
|
|
||||||
}
|
|
||||||
|
|
||||||
// strconv JSON-quotes a string via json.Marshal (unrelated to the standard strconv package, which this file does not import).
|
|
||||||
func strconv(s string) string {
|
|
||||||
b, _ := json.Marshal(s)
|
|
||||||
return string(b)
|
|
||||||
}
|
|
||||||
|
|
||||||
func TestRenameSuccess(t *testing.T) {
|
|
||||||
setupDataDir(t)
|
|
||||||
doDrawing(t, http.MethodPut, "foo", testDrawing, "alice")
|
|
||||||
w := renameReq(t, "foo", "bar", "alice")
|
|
||||||
if w.Code != http.StatusOK {
|
|
||||||
t.Fatalf("PATCH: expected 200, got %d: %s", w.Code, w.Body.String())
|
|
||||||
}
|
|
||||||
var resp map[string]string
|
|
||||||
if err := json.Unmarshal(w.Body.Bytes(), &resp); err != nil {
|
|
||||||
t.Fatalf("PATCH: bad JSON: %v", err)
|
|
||||||
}
|
|
||||||
if resp["id"] != "bar" || resp["status"] != "renamed" {
|
|
||||||
t.Errorf("unexpected response: %v", resp)
|
|
||||||
}
|
|
||||||
if w := doDrawing(t, http.MethodGet, "bar", "", "alice"); w.Code != http.StatusOK || w.Body.String() != testDrawing {
|
|
||||||
t.Errorf("GET new id: code=%d content=%q", w.Code, w.Body.String())
|
|
||||||
}
|
|
||||||
if w := doDrawing(t, http.MethodGet, "foo", "", "alice"); w.Code != http.StatusNotFound {
|
|
||||||
t.Errorf("GET old id: expected 404, got %d", w.Code)
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
func TestRenameConflict(t *testing.T) {
|
|
||||||
setupDataDir(t)
|
|
||||||
doDrawing(t, http.MethodPut, "a", testDrawing, "alice")
|
|
||||||
doDrawing(t, http.MethodPut, "b", testDrawing, "alice")
|
|
||||||
if w := renameReq(t, "a", "b", "alice"); w.Code != http.StatusConflict {
|
|
||||||
t.Fatalf("expected 409, got %d", w.Code)
|
|
||||||
}
|
|
||||||
// both drawings intact
|
|
||||||
for _, id := range []string{"a", "b"} {
|
|
||||||
if w := doDrawing(t, http.MethodGet, id, "", "alice"); w.Code != http.StatusOK {
|
|
||||||
t.Errorf("drawing %q should be intact, got %d", id, w.Code)
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
func TestRenameMissing(t *testing.T) {
|
|
||||||
setupDataDir(t)
|
|
||||||
if w := renameReq(t, "nope", "new", "alice"); w.Code != http.StatusNotFound {
|
|
||||||
t.Fatalf("expected 404, got %d", w.Code)
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
func TestRenameSameName(t *testing.T) {
|
|
||||||
setupDataDir(t)
|
|
||||||
doDrawing(t, http.MethodPut, "foo", testDrawing, "alice")
|
|
||||||
w := renameReq(t, "foo", "foo", "alice")
|
|
||||||
if w.Code != http.StatusOK {
|
|
||||||
t.Fatalf("same-name rename: expected 200, got %d: %s", w.Code, w.Body.String())
|
|
||||||
}
|
|
||||||
if w := doDrawing(t, http.MethodGet, "foo", "", "alice"); w.Code != http.StatusOK {
|
|
||||||
t.Errorf("drawing should be intact, got %d", w.Code)
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
func TestRenameInvalidNames(t *testing.T) {
|
|
||||||
setupDataDir(t)
|
|
||||||
doDrawing(t, http.MethodPut, "foo", testDrawing, "alice")
|
|
||||||
for _, name := range []string{"", " ", "../..", "---"} {
|
|
||||||
if w := renameReq(t, "foo", name, "alice"); w.Code != http.StatusBadRequest {
|
|
||||||
t.Errorf("rename to %q: expected 400, got %d", name, w.Code)
|
|
||||||
}
|
|
||||||
}
|
|
||||||
// malformed body
|
|
||||||
if w := doDrawing(t, http.MethodPatch, "foo", `{not json`, "alice"); w.Code != http.StatusBadRequest {
|
|
||||||
t.Errorf("malformed body: expected 400, got %d", w.Code)
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
func TestRenameSanitization(t *testing.T) {
|
|
||||||
setupDataDir(t)
|
|
||||||
cases := []struct{ in, want string }{
|
|
||||||
{"My Drawing!", "My-Drawing-"},
|
|
||||||
{"net diag.excalidraw", "net-diag"}, // .excalidraw suffix stripped, not mangled
|
|
||||||
{"a/b\\c", "a-b-c"},
|
|
||||||
}
|
|
||||||
for _, c := range cases {
|
|
||||||
doDrawing(t, http.MethodPut, "src", testDrawing, "alice")
|
|
||||||
w := renameReq(t, "src", c.in, "alice")
|
|
||||||
if w.Code != http.StatusOK {
|
|
||||||
t.Errorf("rename to %q: expected 200, got %d: %s", c.in, w.Code, w.Body.String())
|
|
||||||
continue
|
|
||||||
}
|
|
||||||
var resp map[string]string
|
|
||||||
if err := json.Unmarshal(w.Body.Bytes(), &resp); err != nil {
t.Errorf("rename to %q: bad JSON: %v", c.in, err)
}
|
|
||||||
if resp["id"] != c.want {
|
|
||||||
t.Errorf("rename to %q: expected id %q, got %q", c.in, c.want, resp["id"])
|
|
||||||
}
|
|
||||||
// file must be inside the user dir under the sanitized name
|
|
||||||
if _, err := os.Stat(filepath.Join(dataDir, "alice", c.want+".excalidraw")); err != nil {
|
|
||||||
t.Errorf("rename to %q: expected file %q on disk: %v", c.in, c.want, err)
|
|
||||||
}
|
|
||||||
doDrawing(t, http.MethodDelete, resp["id"], "", "alice")
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
func TestRenameTraversalStaysInUserDir(t *testing.T) {
|
|
||||||
setupDataDir(t)
|
|
||||||
doDrawing(t, http.MethodPut, "foo", testDrawing, "alice")
|
|
||||||
w := renameReq(t, "foo", "../../../etc/passwd", "alice")
|
|
||||||
if w.Code == http.StatusOK {
|
|
||||||
var resp map[string]string
|
|
||||||
json.Unmarshal(w.Body.Bytes(), &resp)
|
|
||||||
if strings.Contains(resp["id"], "/") || strings.Contains(resp["id"], "..") {
|
|
||||||
t.Fatalf("traversal characters survived: %q", resp["id"])
|
|
||||||
}
|
|
||||||
if _, err := os.Stat(filepath.Join(dataDir, "alice", resp["id"]+".excalidraw")); err != nil {
|
|
||||||
t.Fatalf("renamed file escaped user dir: %v", err)
|
|
||||||
}
|
|
||||||
}
|
|
||||||
// nothing outside the data dir
|
|
||||||
if _, err := os.Stat(filepath.Join(dataDir, "..", "etc")); err == nil {
|
|
||||||
t.Fatal("file escaped the data dir")
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
@ -8,41 +8,41 @@
|
||||||
* { margin: 0; padding: 0; }
|
* { margin: 0; padding: 0; }
|
||||||
html, body { width: 100%; height: 100%; overflow: hidden; }
|
html, body { width: 100%; height: 100%; overflow: hidden; }
|
||||||
#root { width: 100%; height: 100%; }
|
#root { width: 100%; height: 100%; }
|
||||||
.top-right-ui {
|
.toolbar {
|
||||||
|
position: fixed;
|
||||||
|
top: 10px;
|
||||||
|
left: 10px;
|
||||||
|
z-index: 1000;
|
||||||
display: flex;
|
display: flex;
|
||||||
align-items: center;
|
|
||||||
gap: 8px;
|
gap: 8px;
|
||||||
font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif;
|
background: rgba(255,255,255,0.95);
|
||||||
}
|
|
||||||
.top-right-ui a, .top-right-ui button {
|
|
||||||
display: inline-flex;
|
|
||||||
align-items: center;
|
|
||||||
gap: 6px;
|
|
||||||
padding: 8px 12px;
|
padding: 8px 12px;
|
||||||
border: 1px solid transparent;
|
|
||||||
border-radius: 8px;
|
border-radius: 8px;
|
||||||
|
box-shadow: 0 2px 8px rgba(0,0,0,0.15);
|
||||||
|
}
|
||||||
|
.toolbar button, .toolbar a {
|
||||||
|
padding: 6px 14px;
|
||||||
|
border: none;
|
||||||
|
border-radius: 6px;
|
||||||
cursor: pointer;
|
cursor: pointer;
|
||||||
font-size: 13px;
|
font-size: 14px;
|
||||||
|
background: #6c5ce7;
|
||||||
|
color: white;
|
||||||
text-decoration: none;
|
text-decoration: none;
|
||||||
box-shadow: 0 1px 4px rgba(0,0,0,0.12);
|
display: inline-block;
|
||||||
max-width: 40vw;
|
|
||||||
white-space: nowrap;
|
|
||||||
overflow: hidden;
|
|
||||||
text-overflow: ellipsis;
|
|
||||||
}
|
}
|
||||||
.top-right-ui.theme-light a, .top-right-ui.theme-light button {
|
.toolbar button:hover, .toolbar a:hover { background: #5b4cdb; }
|
||||||
background: #ffffff;
|
.toolbar .secondary { background: #ddd; color: #333; }
|
||||||
color: #1b1b1f;
|
.toolbar .secondary:hover { background: #ccc; }
|
||||||
|
.toolbar .title {
|
||||||
|
font-weight: 600;
|
||||||
|
padding: 6px 0;
|
||||||
|
color: #333;
|
||||||
}
|
}
|
||||||
.top-right-ui.theme-dark a, .top-right-ui.theme-dark button {
|
|
||||||
background: #232329;
|
|
||||||
color: #e9ecef;
|
|
||||||
}
|
|
||||||
.top-right-ui button:hover, .top-right-ui a:hover { border-color: #a29bfe; }
|
|
||||||
.status {
|
.status {
|
||||||
position: fixed;
|
position: fixed;
|
||||||
bottom: 10px;
|
bottom: 10px;
|
||||||
right: 60px;
|
right: 10px;
|
||||||
padding: 6px 12px;
|
padding: 6px 12px;
|
||||||
background: rgba(0,0,0,0.7);
|
background: rgba(0,0,0,0.7);
|
||||||
color: white;
|
color: white;
|
||||||
|
|
@ -51,7 +51,6 @@
|
||||||
z-index: 1000;
|
z-index: 1000;
|
||||||
opacity: 0;
|
opacity: 0;
|
||||||
transition: opacity 0.3s;
|
transition: opacity 0.3s;
|
||||||
pointer-events: none;
|
|
||||||
}
|
}
|
||||||
.status.show { opacity: 1; }
|
.status.show { opacity: 1; }
|
||||||
.loading {
|
.loading {
|
||||||
|
|
@ -68,6 +67,11 @@
|
||||||
</style>
|
</style>
|
||||||
</head>
|
</head>
|
||||||
<body>
|
<body>
|
||||||
|
<div class="toolbar">
|
||||||
|
<a href="/" class="secondary">Back to Library</a>
|
||||||
|
<span class="title" id="title">Loading...</span>
|
||||||
|
<button onclick="saveDrawing()">Save</button>
|
||||||
|
</div>
|
||||||
<div id="root">
|
<div id="root">
|
||||||
<div class="loading">
|
<div class="loading">
|
||||||
<div>Loading Excalidraw...</div>
|
<div>Loading Excalidraw...</div>
|
||||||
|
|
@ -77,33 +81,16 @@
|
||||||
<div id="status" class="status">Saved</div>
|
<div id="status" class="status">Saved</div>
|
||||||
|
|
||||||
<script>
|
<script>
|
||||||
// Replaces #root with an error panel (safe DOM methods, no innerHTML).
|
|
||||||
function showFatal(title, detail) {
|
|
||||||
var root = document.getElementById('root');
|
|
||||||
root.replaceChildren();
|
|
||||||
var panel = document.createElement('div');
|
|
||||||
panel.className = 'loading error';
|
|
||||||
var titleEl = document.createElement('div');
|
|
||||||
titleEl.textContent = title;
|
|
||||||
panel.appendChild(titleEl);
|
|
||||||
if (detail) {
|
|
||||||
var detailEl = document.createElement('div');
|
|
||||||
detailEl.style.fontSize = '0.9rem';
|
|
||||||
detailEl.textContent = detail;
|
|
||||||
panel.appendChild(detailEl);
|
|
||||||
}
|
|
||||||
root.appendChild(panel);
|
|
||||||
}
|
|
||||||
|
|
||||||
// Get drawing ID from URL path: /draw/{id}
|
// Get drawing ID from URL path: /draw/{id}
|
||||||
var pathParts = window.location.pathname.split('/');
|
var pathParts = window.location.pathname.split('/');
|
||||||
var drawingId = pathParts[pathParts.length - 1] || pathParts[pathParts.length - 2];
|
var drawingId = pathParts[pathParts.length - 1] || pathParts[pathParts.length - 2];
|
||||||
|
|
||||||
if (!drawingId) {
|
if (!drawingId) {
|
||||||
showFatal('No drawing ID specified');
|
document.getElementById('root').innerHTML = '<div class="loading error">No drawing ID specified</div>';
|
||||||
throw new Error('No drawing ID');
|
throw new Error('No drawing ID');
|
||||||
}
|
}
|
||||||
|
|
||||||
|
document.getElementById('title').textContent = drawingId;
|
||||||
document.title = drawingId + ' - Excalidraw';
|
document.title = drawingId + ' - Excalidraw';
|
||||||
|
|
||||||
var excalidrawAPI = null;
|
var excalidrawAPI = null;
|
||||||
|
|
@ -172,46 +159,6 @@
|
||||||
autoSaveTimeout = setTimeout(saveDrawing, 2000);
|
autoSaveTimeout = setTimeout(saveDrawing, 2000);
|
||||||
}
|
}
|
||||||
|
|
||||||
// Renames the current drawing via the API. Returns the new ID, or null
|
|
||||||
// if the rename was cancelled or failed.
|
|
||||||
async function renameCurrentDrawing() {
|
|
||||||
var newName = window.prompt('Rename drawing', drawingId);
|
|
||||||
if (newName === null) return null;
|
|
||||||
newName = newName.trim();
|
|
||||||
if (!newName || newName === drawingId) return null;
|
|
||||||
|
|
||||||
// A pending autosave would resurrect the old file after the rename.
|
|
||||||
clearTimeout(autoSaveTimeout);
|
|
||||||
|
|
||||||
var resp;
|
|
||||||
try {
|
|
||||||
resp = await fetch('/api/drawings/' + drawingId, {
|
|
||||||
method: 'PATCH',
|
|
||||||
headers: { 'Content-Type': 'application/json' },
|
|
||||||
body: JSON.stringify({ name: newName })
|
|
||||||
});
|
|
||||||
} catch (e) {
|
|
||||||
showStatus('Rename failed!');
|
|
||||||
return null;
|
|
||||||
}
|
|
||||||
if (resp.status === 409) {
|
|
||||||
window.alert('A drawing with that name already exists.');
|
|
||||||
return null;
|
|
||||||
}
|
|
||||||
if (!resp.ok) {
|
|
||||||
window.alert('Rename failed: ' + (await resp.text()));
|
|
||||||
return null;
|
|
||||||
}
|
|
||||||
var result = await resp.json();
|
|
||||||
drawingId = result.id;
|
|
||||||
document.title = drawingId + ' - Excalidraw';
|
|
||||||
window.history.replaceState(null, '', '/draw/' + encodeURIComponent(drawingId));
|
|
||||||
showStatus('Renamed');
|
|
||||||
// Flush any unsaved changes to the new file.
|
|
||||||
saveDrawing();
|
|
||||||
return drawingId;
|
|
||||||
}
|
|
||||||
|
|
||||||
// Load scripts dynamically
|
// Load scripts dynamically
|
||||||
function loadScript(src) {
|
function loadScript(src) {
|
||||||
return new Promise(function(resolve, reject) {
|
return new Promise(function(resolve, reject) {
|
||||||
|
|
@ -250,76 +197,33 @@
|
||||||
|
|
||||||
updateLoadStatus('Rendering Excalidraw...');
|
updateLoadStatus('Rendering Excalidraw...');
|
||||||
|
|
||||||
var e = React.createElement;
|
// Create Excalidraw component
|
||||||
var MainMenu = ExcalidrawLib.MainMenu;
|
|
||||||
|
|
||||||
// Native default menu items, existence-guarded so a library
|
|
||||||
// update that drops one degrades gracefully.
|
|
||||||
function defaultItem(name) {
|
|
||||||
var C = MainMenu && MainMenu.DefaultItems && MainMenu.DefaultItems[name];
|
|
||||||
return C ? e(C, { key: name }) : null;
|
|
||||||
}
|
|
||||||
|
|
||||||
function App() {
|
function App() {
|
||||||
var nameState = React.useState(drawingId);
|
return React.createElement(ExcalidrawLib.Excalidraw, {
|
||||||
var name = nameState[0], setName = nameState[1];
|
|
||||||
|
|
||||||
function onRename() {
|
|
||||||
renameCurrentDrawing().then(function(newId) {
|
|
||||||
if (newId) setName(newId);
|
|
||||||
});
|
|
||||||
}
|
|
||||||
|
|
||||||
// The menu is where the native export features live:
|
|
||||||
// Export = "Save to..." (.excalidraw), SaveAsImage =
|
|
||||||
// "Export image..." (PNG / SVG / clipboard).
|
|
||||||
var menu = MainMenu ? e(MainMenu, { key: 'menu' },
|
|
||||||
e(MainMenu.Item, { key: 'back', onSelect: function() { window.location.href = '/'; } }, 'Back to Library'),
|
|
||||||
e(MainMenu.Item, { key: 'save', onSelect: saveDrawing }, 'Save now'),
|
|
||||||
e(MainMenu.Item, { key: 'rename', onSelect: onRename }, 'Rename drawing…'),
|
|
||||||
MainMenu.Separator ? e(MainMenu.Separator, { key: 'sep1' }) : null,
|
|
||||||
defaultItem('LoadScene'),
|
|
||||||
defaultItem('Export'),
|
|
||||||
defaultItem('SaveAsImage'),
|
|
||||||
MainMenu.Separator ? e(MainMenu.Separator, { key: 'sep2' }) : null,
|
|
||||||
defaultItem('ClearCanvas'),
|
|
||||||
defaultItem('ToggleTheme'),
|
|
||||||
defaultItem('ChangeCanvasBackground'),
|
|
||||||
defaultItem('Help')
|
|
||||||
) : null;
|
|
||||||
|
|
||||||
return e(ExcalidrawLib.Excalidraw, {
|
|
||||||
initialData: initialData ? {
|
initialData: initialData ? {
|
||||||
elements: initialData.elements || [],
|
elements: initialData.elements || [],
|
||||||
appState: initialData.appState || {}
|
appState: initialData.appState || {}
|
||||||
} : undefined,
|
} : undefined,
|
||||||
UIOptions: { canvasActions: { toggleTheme: true } },
|
|
||||||
excalidrawAPI: function(api) {
|
excalidrawAPI: function(api) {
|
||||||
excalidrawAPI = api;
|
excalidrawAPI = api;
|
||||||
console.log('Excalidraw API ready');
|
console.log('Excalidraw API ready');
|
||||||
},
|
},
|
||||||
onChange: onChange,
|
onChange: onChange
|
||||||
renderTopRightUI: function(isMobile, appState) {
|
});
|
||||||
return e('div', { className: 'top-right-ui theme-' + (appState.theme || 'light') },
|
|
||||||
e('a', { key: 'home', href: '/', title: 'Back to Library' }, '← Library'),
|
|
||||||
e('button', {
|
|
||||||
key: 'name',
|
|
||||||
title: 'Click to rename',
|
|
||||||
onClick: onRename
|
|
||||||
}, name + ' ✎')
|
|
||||||
);
|
|
||||||
}
|
|
||||||
}, menu);
|
|
||||||
}
|
}
|
||||||
|
|
||||||
var root = ReactDOM.createRoot(document.getElementById('root'));
|
var root = ReactDOM.createRoot(document.getElementById('root'));
|
||||||
root.render(e(App));
|
root.render(React.createElement(App));
|
||||||
|
|
||||||
console.log('Excalidraw rendered successfully');
|
console.log('Excalidraw rendered successfully');
|
||||||
|
|
||||||
} catch (err) {
|
} catch (e) {
|
||||||
console.error('Init error:', err);
|
console.error('Init error:', e);
|
||||||
showFatal('Failed to load Excalidraw', err.message);
|
document.getElementById('root').innerHTML =
|
||||||
|
'<div class="loading error">' +
|
||||||
|
'<div>Failed to load Excalidraw</div>' +
|
||||||
|
'<div style="font-size:0.9rem">' + e.message + '</div>' +
|
||||||
|
'</div>';
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -1,49 +0,0 @@
|
||||||
# emo's Claude → Excalidraw upload RBAC.
|
|
||||||
#
|
|
||||||
# emo's agent uploads drawings with `kubectl -n excalidraw port-forward svc/draw`
|
|
||||||
# + `PUT /api/drawings/<name>` carrying the X-Authentik-Username header (the
|
|
||||||
# documented recipe in emo's ~/.claude/CLAUDE.md — the app sits behind Authentik
|
|
||||||
# forward-auth, so direct curl gets redirected). His hands-off credential is the
|
|
||||||
# chrome-service/emo-browser ServiceAccount kubeconfig (stacks/chrome-service/rbac.tf);
|
|
||||||
# its cluster-wide grant (oidc-power-user-readonly) is read-only, so pods/portforward
|
|
||||||
# must be granted per namespace. This is the excalidraw-namespace grant
|
|
||||||
# (Viktor's call, 2026-07-02; same pattern as the chrome-service one).
|
|
||||||
#
|
|
||||||
# TRADE-OFF (accepted): port-forward into this namespace bypasses the Authentik
|
|
||||||
# ingress and the drawings API trusts the X-Authentik-Username header, so the SA
|
|
||||||
# can read/write ANY user's drawings, not only emo's. The namespace runs nothing
|
|
||||||
# but the drawings app, and the same class of trade-off was already accepted for
|
|
||||||
# the shared browser (CDP reach into Viktor's sessions).
|
|
||||||
|
|
||||||
resource "kubernetes_role" "portforward" {
|
|
||||||
metadata {
|
|
||||||
name = "excalidraw-portforward"
|
|
||||||
namespace = kubernetes_namespace.excalidraw.metadata[0].name
|
|
||||||
}
|
|
||||||
rule {
|
|
||||||
api_groups = [""]
|
|
||||||
resources = ["pods/portforward"]
|
|
||||||
verbs = ["create"]
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
resource "kubernetes_role_binding" "emo_browser_portforward" {
|
|
||||||
metadata {
|
|
||||||
name = "emo-browser-portforward"
|
|
||||||
namespace = kubernetes_namespace.excalidraw.metadata[0].name
|
|
||||||
}
|
|
||||||
role_ref {
|
|
||||||
api_group = "rbac.authorization.k8s.io"
|
|
||||||
kind = "Role"
|
|
||||||
name = kubernetes_role.portforward.metadata[0].name
|
|
||||||
}
|
|
||||||
subject {
|
|
||||||
kind = "ServiceAccount"
|
|
||||||
# Defined in stacks/chrome-service/rbac.tf — referenced by name across
|
|
||||||
# stacks, same as that file references the oidc-power-user-readonly
|
|
||||||
# ClusterRole. get/list on pods+services (needed to resolve svc/draw) comes
|
|
||||||
# from the SA's cluster-read binding there.
|
|
||||||
name = "emo-browser"
|
|
||||||
namespace = "chrome-service"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
@ -166,33 +166,6 @@ resource "kubernetes_deployment" "f1-stream" {
|
||||||
name = "DISCORD_CHANNELS"
|
name = "DISCORD_CHANNELS"
|
||||||
value = var.discord_f1_channel_ids
|
value = var.discord_f1_channel_ids
|
||||||
}
|
}
|
||||||
# Replays feature (app repo ADR-0002). optional=true so the pod still
|
|
||||||
# starts before the Reddit app credentials exist; the app treats missing
|
|
||||||
# creds as "replays off" (logs "Replays pipeline disabled"). The
|
|
||||||
# ExternalSecret above uses dataFrom.extract on the Vault "f1-stream"
|
|
||||||
# key, so adding reddit_client_id / reddit_client_secret there auto-syncs
|
|
||||||
# them into this Secret — no ExternalSecret change needed, just a pod
|
|
||||||
# restart to pick them up.
|
|
||||||
env {
|
|
||||||
name = "REDDIT_CLIENT_ID"
|
|
||||||
value_from {
|
|
||||||
secret_key_ref {
|
|
||||||
name = "f1-stream-secrets"
|
|
||||||
key = "reddit_client_id"
|
|
||||||
optional = true
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
env {
|
|
||||||
name = "REDDIT_CLIENT_SECRET"
|
|
||||||
value_from {
|
|
||||||
secret_key_ref {
|
|
||||||
name = "f1-stream-secrets"
|
|
||||||
key = "reddit_client_secret"
|
|
||||||
optional = true
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
# Verifier connects to in-cluster headed Chromium pool — see
|
# Verifier connects to in-cluster headed Chromium pool — see
|
||||||
# stacks/chrome-service/. Falls back to in-process headless if unset.
|
# stacks/chrome-service/. Falls back to in-process headless if unset.
|
||||||
# 2026-06-04: migrated WS (:3000 / path-token) → CDP (:9222 /
|
# 2026-06-04: migrated WS (:3000 / path-token) → CDP (:9222 /
|
||||||
|
|
|
||||||
|
|
@ -117,9 +117,8 @@ resource "kubernetes_deployment" "frigate" {
|
||||||
limits = {
|
limits = {
|
||||||
memory = "10Gi"
|
memory = "10Gi"
|
||||||
"nvidia.com/gpu" = "1"
|
"nvidia.com/gpu" = "1"
|
||||||
# GPU VRAM budget (ADR-0016): detector + ffmpeg decode (~1.9 GiB),
|
# GPU VRAM budget (ADR-0016): detector + ffmpeg decode (~1.9 GiB).
|
||||||
# +~250 MiB NVDEC headroom for the vermont-garage camera (ADR-0017).
|
"viktorbarzin.me/gpumem" = "2000"
|
||||||
"viktorbarzin.me/gpumem" = "2300"
|
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
env {
|
env {
|
||||||
|
|
|
||||||
|
|
@ -73,9 +73,7 @@ resource "kubernetes_deployment" "immich-frame-emo" {
|
||||||
}
|
}
|
||||||
spec {
|
spec {
|
||||||
container {
|
container {
|
||||||
# immich_v3: upstream compat tag for Immich v3 — see frame.tf for the
|
image = "ghcr.io/immichframe/immichframe:v1.0.32.0"
|
||||||
# full story; repin to a versioned tag once upstream releases v3 support.
|
|
||||||
image = "ghcr.io/immichframe/immichframe:immich_v3"
|
|
||||||
name = "immich-frame-emo"
|
name = "immich-frame-emo"
|
||||||
resources {
|
resources {
|
||||||
requests = {
|
requests = {
|
||||||
|
|
@ -144,21 +142,14 @@ resource "kubernetes_service" "immich-frame-emo" {
|
||||||
|
|
||||||
module "ingress_emo" {
|
module "ingress_emo" {
|
||||||
source = "../../modules/kubernetes/ingress_factory"
|
source = "../../modules/kubernetes/ingress_factory"
|
||||||
# Photo-frame kiosk display on Emo's Portal Mini (Sofia LAN) — WebView
|
# Photo-frame kiosk display on Emo's Portal — headless browser pulling images
|
||||||
# pulling images via an Immich API key; no user login possible on the
|
# via an Immich API key (no user login). Forward-auth would 302 the device to
|
||||||
# device. Same LAN-only gating as frame.tf: home-lans-only ipAllowList +
|
# Authentik with no way to complete login.
|
||||||
# dns_type "internal" (Emo's Portal already resolves this host internally
|
# auth = "none": photo-frame kiosk; headless browser with API key; no user login.
|
||||||
# via Technitium; the public internal-IP record covers any resolver).
|
auth = "none"
|
||||||
# LAN-only design: docs/plans/2026-07-04-immich-frame-lan-only-design.md.
|
dns_type = "proxied"
|
||||||
# auth = "none": kiosk WebView, no user auth by design; gated by the home-lans-only ipAllowList instead.
|
namespace = "immich"
|
||||||
auth = "none"
|
name = "highlights-immich-emo"
|
||||||
dns_type = "internal"
|
tls_secret_name = var.tls_secret_name
|
||||||
extra_middlewares = ["traefik-home-lans-only@kubernetescrd"]
|
service_name = "immich-frame-emo"
|
||||||
# Not externally reachable — explicit opt-out so external-monitor-sync
|
|
||||||
# drops the old [External] monitor instead of default-opting it back in.
|
|
||||||
external_monitor = false
|
|
||||||
namespace = "immich"
|
|
||||||
name = "highlights-immich-emo"
|
|
||||||
tls_secret_name = var.tls_secret_name
|
|
||||||
service_name = "immich-frame-emo"
|
|
||||||
}
|
}
|
||||||
|
|
|
||||||
|
|
@ -69,11 +69,7 @@ resource "kubernetes_deployment" "immich-frame" {
|
||||||
}
|
}
|
||||||
spec {
|
spec {
|
||||||
container {
|
container {
|
||||||
# immich_v3 is the upstream compat tag for Immich v3 servers — every
|
image = "ghcr.io/immichframe/immichframe:v1.0.32.0"
|
||||||
# versioned release (≤ v1.0.33.0) crashes deserializing v3 API
|
|
||||||
# responses (immichFrame/immichFrame#653). Pin back to a vX.Y.Z.W tag
|
|
||||||
# once a stable release ships v3 support (upstream PR #654).
|
|
||||||
image = "ghcr.io/immichframe/immichframe:immich_v3"
|
|
||||||
name = "immich-frame"
|
name = "immich-frame"
|
||||||
resources {
|
resources {
|
||||||
requests = {
|
requests = {
|
||||||
|
|
@ -142,23 +138,14 @@ resource "kubernetes_service" "immich-frame" {
|
||||||
|
|
||||||
module "ingress" {
|
module "ingress" {
|
||||||
source = "../../modules/kubernetes/ingress_factory"
|
source = "../../modules/kubernetes/ingress_factory"
|
||||||
# Photo-frame kiosk display (Viktor's London Portal Plus WebView) — pulls
|
# Photo-frame kiosk display — runs in headless browser mode on a TV/frame
|
||||||
# images via an Immich API key; no user login possible on the device, so
|
# device and pulls images via an Immich API key (no user login). Forward-auth
|
||||||
# forward-auth would 302 it to Authentik with no way to complete login.
|
# would 302 the device to Authentik with no way to complete login.
|
||||||
# The GATE is network-level: the home-lans-only ipAllowList (Sofia/London/
|
# auth = "none": Photo-frame kiosk display — headless browser with API key; no user login; forward-auth breaks device automation.
|
||||||
# Valchedrym LANs + 10/8) 403s everyone else, and dns_type "internal"
|
auth = "none"
|
||||||
# publishes the Traefik LB IP publicly so the Portal's baked-in URL resolves
|
dns_type = "proxied"
|
||||||
# from any resolver yet routes only via the home LANs / WG tunnel.
|
namespace = "immich"
|
||||||
# LAN-only design: docs/plans/2026-07-04-immich-frame-lan-only-design.md.
|
name = "highlights-immich"
|
||||||
# auth = "none": kiosk WebView, no user auth by design; gated by the home-lans-only ipAllowList instead.
|
tls_secret_name = var.tls_secret_name
|
||||||
auth = "none"
|
service_name = "immich-frame"
|
||||||
dns_type = "internal"
|
|
||||||
extra_middlewares = ["traefik-home-lans-only@kubernetescrd"]
|
|
||||||
# Not externally reachable — explicit opt-out so external-monitor-sync
|
|
||||||
# drops the old [External] monitor instead of default-opting it back in.
|
|
||||||
external_monitor = false
|
|
||||||
namespace = "immich"
|
|
||||||
name = "highlights-immich"
|
|
||||||
tls_secret_name = var.tls_secret_name
|
|
||||||
service_name = "immich-frame"
|
|
||||||
}
|
}
|
||||||
|
|
|
||||||
|
|
@ -15,7 +15,7 @@ locals {
|
||||||
variable "immich_version" {
|
variable "immich_version" {
|
||||||
type = string
|
type = string
|
||||||
# Change me to upgrade
|
# Change me to upgrade
|
||||||
default = "v3.0.0"
|
default = "v2.7.5"
|
||||||
}
|
}
|
||||||
variable "proxmox_host" { type = string }
|
variable "proxmox_host" { type = string }
|
||||||
variable "redis_host" { type = string }
|
variable "redis_host" { type = string }
|
||||||
|
|
@ -492,7 +492,7 @@ resource "kubernetes_deployment" "immich-postgres" {
|
||||||
}
|
}
|
||||||
spec {
|
spec {
|
||||||
container {
|
container {
|
||||||
image = "ghcr.io/immich-app/postgres:15-vectorchord0.4.3-pgvectors0.2.0"
|
image = "ghcr.io/immich-app/postgres:15-vectorchord0.3.0-pgvectors0.2.0"
|
||||||
name = "immich-postgresql"
|
name = "immich-postgresql"
|
||||||
port {
|
port {
|
||||||
container_port = 5432
|
container_port = 5432
|
||||||
|
|
@ -882,7 +882,7 @@ resource "kubernetes_cron_job_v1" "clip-index-prewarm" {
|
||||||
restart_policy = "Never"
|
restart_policy = "Never"
|
||||||
container {
|
container {
|
||||||
name = "prewarm"
|
name = "prewarm"
|
||||||
image = "ghcr.io/immich-app/postgres:15-vectorchord0.4.3-pgvectors0.2.0"
|
image = "ghcr.io/immich-app/postgres:15-vectorchord0.3.0-pgvectors0.2.0"
|
||||||
# command overrides the postgres entrypoint → runs psql directly.
|
# command overrides the postgres entrypoint → runs psql directly.
|
||||||
command = [
|
command = [
|
||||||
"psql", "-v", "ON_ERROR_STOP=1", "-c",
|
"psql", "-v", "ON_ERROR_STOP=1", "-c",
|
||||||
|
|
@ -964,7 +964,7 @@ resource "kubernetes_cron_job_v1" "immich-search-probe" {
|
||||||
}
|
}
|
||||||
init_container {
|
init_container {
|
||||||
name = "measure"
|
name = "measure"
|
||||||
image = "ghcr.io/immich-app/postgres:15-vectorchord0.4.3-pgvectors0.2.0"
|
image = "ghcr.io/immich-app/postgres:15-vectorchord0.3.0-pgvectors0.2.0"
|
||||||
command = ["/bin/bash", "-c", <<-EOT
|
command = ["/bin/bash", "-c", <<-EOT
|
||||||
set -uo pipefail
|
set -uo pipefail
|
||||||
OUT=/shared/metrics.prom
|
OUT=/shared/metrics.prom
|
||||||
|
|
|
||||||
|
|
@ -43,11 +43,6 @@ locals {
|
||||||
# ghcr.io/passionprojectsanca/book-plotter (built by GHA in Anca's repo,
|
# ghcr.io/passionprojectsanca/book-plotter (built by GHA in Anca's repo,
|
||||||
# under her own org's ghcr). The deployment references the cloned secret.
|
# under her own org's ghcr). The deployment references the cloned secret.
|
||||||
"plotting-book",
|
"plotting-book",
|
||||||
# excalidraw: infra-owned image migrated from manual DockerHub pushes to
|
|
||||||
# PRIVATE ghcr.io/viktorbarzin/excalidraw-library (ADR-0002, built by
|
|
||||||
# .github/workflows/build-excalidraw.yml). The deployment references the
|
|
||||||
# cloned secret.
|
|
||||||
"excalidraw",
|
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -19,12 +19,3 @@ plans@viktorbarzin.me spam@viktorbarzin.me
|
||||||
# to trips@, or every verification/recovery send is rejected (550 sender). Also
|
# to trips@, or every verification/recovery send is rejected (550 sender). Also
|
||||||
# routes any inbound trips@ to spam@.
|
# routes any inbound trips@ to spam@.
|
||||||
trips@viktorbarzin.me spam@viktorbarzin.me
|
trips@viktorbarzin.me spam@viktorbarzin.me
|
||||||
|
|
||||||
# docs@ -> docs@: explicit self-alias for the paperless-ngx ingest MAILBOX
|
|
||||||
# (a real account in secret/platform.mailserver_accounts). Without this the
|
|
||||||
# @domain catch-all above (Vault-side aliases) rewrites docs@ to spam@ and the
|
|
||||||
# mail lands in the TripIt-swept catch-all mailbox instead. Same pattern as
|
|
||||||
# me@ -> me@. Delivery-time sender allowlist: docs-at-viktorbarzin.me
|
|
||||||
# .dovecot.sieve (mounted as docs@viktorbarzin.me.dovecot.sieve).
|
|
||||||
# Runbook: docs/runbooks/paperless-mail-ingest.md
|
|
||||||
docs@viktorbarzin.me docs@viktorbarzin.me
|
|
||||||
|
|
|
||||||
|
|
@ -1,17 +0,0 @@
|
||||||
# Sender allowlist for the paperless-ngx ingest mailbox docs@viktorbarzin.me.
|
|
||||||
# Family members forward document emails here; paperless-ngx polls the INBOX
|
|
||||||
# over IMAP and maps each sender to a paperless account (1 mail rule per
|
|
||||||
# sender). Decision (Viktor, 2026-07-03): mail from any OTHER sender is
|
|
||||||
# ignored and deleted — discarded here at LMTP delivery, before paperless
|
|
||||||
# ever sees it. This also keeps spam to the guessable address out entirely.
|
|
||||||
#
|
|
||||||
# Keep this list in sync with the paperless mail rules (the sender -> owner
|
|
||||||
# map). Add-a-sender procedure: docs/runbooks/paperless-mail-ingest.md
|
|
||||||
if not address :is "from" ["me@viktorbarzin.me",
|
|
||||||
"vbarzin@gmail.com",
|
|
||||||
"viktorbarzin@meta.com",
|
|
||||||
"ancaelena98@gmail.com",
|
|
||||||
"emil.barzin@gmail.com"] {
|
|
||||||
discard;
|
|
||||||
stop;
|
|
||||||
}
|
|
||||||
|
|
@ -14,15 +14,10 @@ variable "nfs_server" { type = string }
|
||||||
locals {
|
locals {
|
||||||
_account_set = keys(var.mailserver_accounts)
|
_account_set = keys(var.mailserver_accounts)
|
||||||
_virtual_lines = split("\n", format("%s%s", var.postfix_account_aliases, file("${path.module}/extra/aliases.txt")))
|
_virtual_lines = split("\n", format("%s%s", var.postfix_account_aliases, file("${path.module}/extra/aliases.txt")))
|
||||||
# NOTE: the length guard must live in a ternary, not a leading `&&` operand.
|
|
||||||
# Terraform only short-circuits && / || from v1.6 — on the older terraform
|
|
||||||
# pinned in the infra-ci image, `split(" ", line)[1]` was still evaluated
|
|
||||||
# for blank/comment lines and failed the whole plan with "Invalid index"
|
|
||||||
# (first hit by CI pipeline #469, 2026-07-03). A conditional expression is
|
|
||||||
# lazy on every terraform version.
|
|
||||||
postfix_virtual = join("\n", [
|
postfix_virtual = join("\n", [
|
||||||
for line in local._virtual_lines : line
|
for line in local._virtual_lines : line
|
||||||
if length(split(" ", line)) != 2 ? true : !(
|
if !(
|
||||||
|
length(split(" ", line)) == 2 &&
|
||||||
contains(local._account_set, split(" ", line)[0]) &&
|
contains(local._account_set, split(" ", line)[0]) &&
|
||||||
contains(local._account_set, split(" ", line)[1]) &&
|
contains(local._account_set, split(" ", line)[1]) &&
|
||||||
split(" ", line)[0] != split(" ", line)[1]
|
split(" ", line)[0] != split(" ", line)[1]
|
||||||
|
|
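The `postfix_virtual` comprehension in this hunk keeps every line unless it maps one real account to a different real account. The note on the ternary variant explains why the field-count check has to run before anything indexes `[1]`. As a rough illustration of the same filter, here is a Go sketch. The function name `filterVirtual` and the sample addresses are hypothetical, and the early `continue` plays the role of the ternary's length guard.

```go
package main

import (
	"fmt"
	"strings"
)

// filterVirtual is a HYPOTHETICAL Go rendering of the postfix_virtual filter:
// it drops lines that alias one real account to a different real account and
// keeps everything else (blank lines, comments, self-aliases, plain aliases).
func filterVirtual(lines []string, accounts map[string]bool) []string {
	kept := make([]string, 0, len(lines))
	for _, line := range lines {
		fields := strings.Split(line, " ")
		// Guard first, like the ternary: lines without exactly two fields are
		// kept and fields[1] is never read for them.
		if len(fields) != 2 {
			kept = append(kept, line)
			continue
		}
		src, dst := fields[0], fields[1]
		if accounts[src] && accounts[dst] && src != dst {
			continue // account -> different account: filtered out
		}
		kept = append(kept, line)
	}
	return kept
}

func main() {
	accounts := map[string]bool{"me@example.com": true, "docs@example.com": true}
	lines := []string{
		"# comment line with several words",
		"",
		"docs@example.com docs@example.com", // self-alias: kept
		"me@example.com docs@example.com",   // account to other account: dropped
		"alias@example.com me@example.com",  // source is not an account: kept
	}
	for _, l := range filterVirtual(lines, accounts) {
		fmt.Printf("%q\n", l)
	}
}
```

The explicit guard makes the "never index a short line" rule independent of how the language evaluates `&&`, which is the point the removed comment makes about the ternary.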
@ -115,12 +110,6 @@ resource "kubernetes_config_map" "mailserver_config" {
|
||||||
"postfix-main.cf" = var.postfix_cf
|
"postfix-main.cf" = var.postfix_cf
|
||||||
"postfix-virtual.cf" = local.postfix_virtual
|
"postfix-virtual.cf" = local.postfix_virtual
|
||||||
|
|
||||||
# Per-user Dovecot sieve for the paperless-ngx ingest mailbox: DMS installs
|
|
||||||
# any /tmp/docker-mailserver/<login>.dovecot.sieve at startup. ConfigMap
|
|
||||||
# keys can't contain '@', so the key is sanitized ("-at-") and the
|
|
||||||
# volume_mount below restores the real filename.
|
|
||||||
"docs-at-viktorbarzin.me.dovecot.sieve" = file("${path.module}/extra/docs-at-viktorbarzin.me.dovecot.sieve")
|
|
||||||
|
|
||||||
KeyTable = "mail._domainkey.viktorbarzin.me viktorbarzin.me:mail:/etc/opendkim/keys/viktorbarzin.me-mail.key\n"
|
KeyTable = "mail._domainkey.viktorbarzin.me viktorbarzin.me:mail:/etc/opendkim/keys/viktorbarzin.me-mail.key\n"
|
||||||
SigningTable = "*@viktorbarzin.me mail._domainkey.viktorbarzin.me\n"
|
SigningTable = "*@viktorbarzin.me mail._domainkey.viktorbarzin.me\n"
|
||||||
TrustedHosts = "127.0.0.1\nlocalhost\n"
|
TrustedHosts = "127.0.0.1\nlocalhost\n"
|
||||||
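The sanitized sieve key in this hunk and the mount path that restores the real filename can both be derived from the login. The following is a sketch only; `sieve_login`, `sieve_key`, and `sieve_mount_path` are hypothetical locals, and the stack itself hard-codes both strings.

```hcl
locals {
  # Assumption: the mailbox login is available as a plain string.
  sieve_login = "docs@viktorbarzin.me"

  # ConfigMap-safe key: '@' is not allowed in keys, so swap it for "-at-".
  sieve_key = "${replace(local.sieve_login, "@", "-at-")}.dovecot.sieve"

  # Filename the mail server expects; restored at mount time by pairing
  # this mount_path with sub_path = local.sieve_key.
  sieve_mount_path = "/tmp/docker-mailserver/${local.sieve_login}.dovecot.sieve"
}
```

Deriving both values from one string would keep the key and the mount path from drifting apart if the login ever changes.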
|
|
@@ -415,12 +404,6 @@ resource "kubernetes_deployment" "mailserver" {
|
||||||
sub_path = "postfix-virtual.cf"
|
sub_path = "postfix-virtual.cf"
|
||||||
read_only = true
|
read_only = true
|
||||||
}
|
}
|
||||||
volume_mount {
|
|
||||||
name = "config"
|
|
||||||
mount_path = "/tmp/docker-mailserver/docs@viktorbarzin.me.dovecot.sieve"
|
|
||||||
sub_path = "docs-at-viktorbarzin.me.dovecot.sieve"
|
|
||||||
read_only = true
|
|
||||||
}
|
|
||||||
volume_mount {
|
volume_mount {
|
||||||
name = "config"
|
name = "config"
|
||||||
mount_path = "/tmp/docker-mailserver/fetchmail.cf"
|
mount_path = "/tmp/docker-mailserver/fetchmail.cf"
|
||||||
|
|
|
||||||
|
|
@@ -60,10 +60,6 @@ locals {
|
||||||
# t3 dispatch probe surface (auth="none" path carve-out on /probe): WS echo
|
# t3 dispatch probe surface (auth="none" path carve-out on /probe): WS echo
|
||||||
# + healthz for the t3-probe drop-attribution client (stacks/t3code).
|
# + healthz for the t3-probe drop-attribution client (stacks/t3code).
|
||||||
"t3-probe-ws" = "https://t3.viktorbarzin.me/probe/healthz"
|
"t3-probe-ws" = "https://t3.viktorbarzin.me/probe/healthz"
|
||||||
# tasks PWA icons + manifest (auth="none" path carve-out, stacks/tasks
|
|
||||||
# module.ingress_icons): macOS/iOS/Android icon fetchers carry no session
|
|
||||||
# cookies, so an Authentik 302 here breaks Add-to-Dock icons.
|
|
||||||
"tasks-icons" = "https://tasks.viktorbarzin.me/apple-touch-icon.png"
|
|
||||||
# NOTE: openclaw task-webhook (auth="none") is intentionally NOT probed — it
|
# NOTE: openclaw task-webhook (auth="none") is intentionally NOT probed — it
|
||||||
# has no public DNS record (NXDOMAIN, external_monitor=false), so there is no
|
# has no public DNS record (NXDOMAIN, external_monitor=false), so there is no
|
||||||
# externally GET-able URL to probe. Its carve-out is internal-only.
|
# externally GET-able URL to probe. Its carve-out is internal-only.
|
||||||
|
|
|
||||||
|
|
@@ -18,6 +18,7 @@ const SITE_IDS = {
|
||||||
"stacks.viktorbarzin.me": "b38fda4285df",
|
"stacks.viktorbarzin.me": "b38fda4285df",
|
||||||
"f1.viktorbarzin.me": "7e69786f66d5",
|
"f1.viktorbarzin.me": "7e69786f66d5",
|
||||||
"frigate.viktorbarzin.me": "0d4044069ff5",
|
"frigate.viktorbarzin.me": "0d4044069ff5",
|
||||||
|
"highlights-immich.viktorbarzin.me": "602167601c6b",
|
||||||
"immich.viktorbarzin.me": "35eedb7a3d2b",
|
"immich.viktorbarzin.me": "35eedb7a3d2b",
|
||||||
"mail.viktorbarzin.me": "082f164faa7d",
|
"mail.viktorbarzin.me": "082f164faa7d",
|
||||||
"navidrome.viktorbarzin.me": "8a3844ff75ba",
|
"navidrome.viktorbarzin.me": "8a3844ff75ba",
|
||||||
|
|
|
||||||
|
|
@@ -28,6 +28,7 @@ routes = [
|
||||||
{ pattern = "stacks.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
|
{ pattern = "stacks.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
|
||||||
{ pattern = "f1.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
|
{ pattern = "f1.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
|
||||||
{ pattern = "frigate.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
|
{ pattern = "frigate.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
|
||||||
|
{ pattern = "highlights-immich.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
|
||||||
{ pattern = "immich.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
|
{ pattern = "immich.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
|
||||||
{ pattern = "mail.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
|
{ pattern = "mail.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
|
||||||
{ pattern = "navidrome.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
|
{ pattern = "navidrome.viktorbarzin.me/*", zone_name = "viktorbarzin.me" },
|
||||||
|
|
|
||||||
122
stacks/stem95su/gdrive-sync.tf
Normal file
|
|
@@ -0,0 +1,122 @@
|
||||||
|
# Automatic Google Drive -> site sync (added 2026-06-09; supersedes the
|
||||||
|
# earlier on-demand-only model now that content is actively maintained).
|
||||||
|
#
|
||||||
|
# A CronJob mirrors the READ-ONLY Drive folder "claude" (servable content in
|
||||||
|
# subfolder "stem claude/files/") onto the NFS content volume every 10 min via
|
||||||
|
# rclone. rclone is delta-aware: an unchanged run lists ~33 files' metadata and
|
||||||
|
# transfers nothing, so the schedule is cheap (not a 24MB re-download). nginx
|
||||||
|
# keeps serving the same volume read-only; updates appear within ~5s (actimeo).
|
||||||
|
#
|
||||||
|
# Drive is treated strictly READ-ONLY: scope=drive.readonly and rclone only ever
|
||||||
|
# reads the remote (sync gdrive: -> /data), never writes back.
|
||||||
|
#
|
||||||
|
# TOKEN LONGEVITY: the GCP OAuth app (project home-lab-1700868541205) MUST be
|
||||||
|
# published to "Production" or its refresh token expires ~weekly and this job
|
||||||
|
# fails. After publishing, re-mint the token and refresh
|
||||||
|
# `secret/stem95su.rclone_conf`. A failed run surfaces as a failed Job.
|
||||||
|
|
||||||
|
resource "kubernetes_manifest" "rclone_external_secret" {
|
||||||
|
field_manager {
|
||||||
|
force_conflicts = true
|
||||||
|
}
|
||||||
|
manifest = {
|
||||||
|
apiVersion = "external-secrets.io/v1"
|
||||||
|
kind = "ExternalSecret"
|
||||||
|
metadata = {
|
||||||
|
name = "stem95su-rclone"
|
||||||
|
namespace = kubernetes_namespace.stem95su.metadata[0].name
|
||||||
|
}
|
||||||
|
spec = {
|
||||||
|
refreshInterval = "1h"
|
||||||
|
secretStoreRef = {
|
||||||
|
name = "vault-kv"
|
||||||
|
kind = "ClusterSecretStore"
|
||||||
|
}
|
||||||
|
target = { name = "stem95su-rclone" }
|
||||||
|
data = [{
|
||||||
|
secretKey = "rclone.conf"
|
||||||
|
remoteRef = {
|
||||||
|
key = "stem95su"
|
||||||
|
property = "rclone_conf"
|
||||||
|
}
|
||||||
|
}]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
depends_on = [kubernetes_namespace.stem95su]
|
||||||
|
}
|
||||||
|
|
||||||
|
resource "kubernetes_cron_job_v1" "gdrive_sync" {
|
||||||
|
metadata {
|
||||||
|
name = "stem95su-gdrive-sync"
|
||||||
|
namespace = kubernetes_namespace.stem95su.metadata[0].name
|
||||||
|
labels = { run = "stem95su", component = "gdrive-sync" }
|
||||||
|
}
|
||||||
|
spec {
|
||||||
|
schedule = "*/10 * * * *"
|
||||||
|
concurrency_policy = "Forbid"
|
||||||
|
successful_jobs_history_limit = 2
|
||||||
|
failed_jobs_history_limit = 3
|
||||||
|
job_template {
|
||||||
|
metadata {}
|
||||||
|
spec {
|
||||||
|
backoff_limit = 1
|
||||||
|
ttl_seconds_after_finished = 86400
|
||||||
|
template {
|
||||||
|
metadata { labels = { run = "stem95su", component = "gdrive-sync" } }
|
||||||
|
spec {
|
||||||
|
restart_policy = "OnFailure"
|
||||||
|
container {
|
||||||
|
name = "rclone"
|
||||||
|
image = "docker.io/rclone/rclone:1.74.3"
|
||||||
|
# Mirror Drive folder -> /data. Guard: hard-fail on auth/list error
|
||||||
|
# (so an expired token is visible); skip quietly if the source is
|
||||||
|
# empty / missing the dashboard (never wipe the live site);
|
||||||
|
# --max-delete caps catastrophic deletes from a partial listing.
|
||||||
|
command = ["/bin/sh", "-c", <<-EOT
|
||||||
|
set -eu
|
||||||
|
cp /config/rclone.conf /tmp/rc.conf
|
||||||
|
SRC="gdrive:stem claude/files"
|
||||||
|
LIST=$(rclone --config /tmp/rc.conf lsf "$SRC" --files-only) || { echo "FATAL: Drive list failed (auth/network)"; exit 1; }
|
||||||
|
N=$(printf '%s\n' "$LIST" | grep -c . || true)
|
||||||
|
if [ "$N" -lt 1 ] || ! printf '%s\n' "$LIST" | grep -qx "stem_board.html"; then
|
||||||
|
echo "GUARD: source N=$N / stem_board.html missing -- skipping, site untouched"; exit 0
|
||||||
|
fi
|
||||||
|
echo "source OK ($N files) -- mirroring to /data"
|
||||||
|
rclone --config /tmp/rc.conf sync "$SRC" /data --exclude ".DS_Store" --fast-list --transfers 4 --max-delete 25 -v
|
||||||
|
EOT
|
||||||
|
]
|
||||||
|
resources {
|
||||||
|
requests = { cpu = "10m", memory = "64Mi" }
|
||||||
|
limits = { memory = "192Mi" }
|
||||||
|
}
|
||||||
|
volume_mount {
|
||||||
|
name = "rclone-config"
|
||||||
|
mount_path = "/config"
|
||||||
|
read_only = true
|
||||||
|
}
|
||||||
|
volume_mount {
|
||||||
|
name = "content"
|
||||||
|
mount_path = "/data"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
volume {
|
||||||
|
name = "rclone-config"
|
||||||
|
secret { secret_name = "stem95su-rclone" }
|
||||||
|
}
|
||||||
|
volume {
|
||||||
|
name = "content"
|
||||||
|
persistent_volume_claim {
|
||||||
|
claim_name = module.nfs_content.claim_name
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
lifecycle {
|
||||||
|
# KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
|
||||||
|
ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config]
|
||||||
|
}
|
||||||
|
depends_on = [kubernetes_manifest.rclone_external_secret]
|
||||||
|
}
|
||||||
|
|
@@ -1,9 +1,173 @@
|
||||||
# stem95su moved OFF-INFRA to Cloudflare Pages (ADR-0018 cutover, 2026-07-03) —
|
# STEM educational platform for 95. СУ „Проф. Иван Шишманов" (Sofia).
|
||||||
# registry entry `stem95su` in stacks/valia-sites; runbook
|
# Public, open static site at stem95su.viktorbarzin.me. Self-contained HTML
|
||||||
# docs/runbooks/valia-sites.md. This stack intentionally declares NOTHING:
|
# pages + media authored externally (Gemini exports), served by a stock nginx
|
||||||
# the apply that landed this file destroyed the old in-cluster serving
|
# straight off the PVE host NFS — NOT baked into an image, so content can be
|
||||||
# (nginx + NFS content PVC + ingress + per-site gdrive-sync CronJob +
|
# updated out-of-band (Nextcloud "PVE NFS Pool" or rsync to /srv/nfs/stem-site)
|
||||||
# namespace). Directory kept only so the destroy could run through CI —
|
# without a rebuild. Auto-backed-up offsite by the existing nfs-mirror job.
|
||||||
# safe to delete the dir + its PG state schema in a later cleanup.
|
|
||||||
# Harmless leftovers (manual cleanup if ever wanted): /srv/nfs/stem-site on
|
resource "kubernetes_namespace" "stem95su" {
|
||||||
# the PVE host, and Vault secret/stem95su (superseded by secret/valia-sites).
|
metadata {
|
||||||
|
name = "stem95su"
|
||||||
|
labels = {
|
||||||
|
"istio-injection" : "disabled"
|
||||||
|
tier = local.tiers.aux
|
||||||
|
}
|
||||||
|
}
|
||||||
|
lifecycle {
|
||||||
|
# KYVERNO_LIFECYCLE_V1: goldilocks-vpa-auto-mode ClusterPolicy stamps this label on every namespace
|
||||||
|
ignore_changes = [metadata[0].labels["goldilocks.fairwinds.com/vpa-update-mode"]]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
module "tls_secret" {
|
||||||
|
source = "../../modules/kubernetes/setup_tls_secret"
|
||||||
|
namespace = kubernetes_namespace.stem95su.metadata[0].name
|
||||||
|
tls_secret_name = var.tls_secret_name
|
||||||
|
}
|
||||||
|
|
||||||
|
# Content lives on the PVE host NFS. NOTE: the nfs_volume module creates only
|
||||||
|
# the K8s PV+PVC — the export subdir (/srv/nfs/stem-site) must already exist on
|
||||||
|
# 192.168.1.127 or the pod fails to mount (mount.nfs exit 32). It is created
|
||||||
|
# during deploy and re-created on demand if ever lost.
|
||||||
|
module "nfs_content" {
|
||||||
|
source = "../../modules/kubernetes/nfs_volume"
|
||||||
|
name = "stem95su-content"
|
||||||
|
namespace = kubernetes_namespace.stem95su.metadata[0].name
|
||||||
|
nfs_server = var.nfs_server
|
||||||
|
nfs_path = "/srv/nfs/stem-site"
|
||||||
|
storage = "1Gi"
|
||||||
|
access_modes = ["ReadWriteMany"]
|
||||||
|
}
|
||||||
|
|
||||||
|
# Minimal nginx server block: serve the static dir, with the dashboard
|
||||||
|
# (stem_board.html) as the directory index so "/" loads the platform home.
|
||||||
|
# All other pages/assets are reached by their exact filenames (the dashboard
|
||||||
|
# links to them by name — those must not be renamed).
|
||||||
|
resource "kubernetes_config_map" "nginx_conf" {
|
||||||
|
metadata {
|
||||||
|
name = "stem95su-nginx-conf"
|
||||||
|
namespace = kubernetes_namespace.stem95su.metadata[0].name
|
||||||
|
}
|
||||||
|
data = {
|
||||||
|
"default.conf" = <<-EOT
|
||||||
|
server {
|
||||||
|
listen 80;
|
||||||
|
server_name _;
|
||||||
|
root /usr/share/nginx/html;
|
||||||
|
index stem_board.html index.html;
|
||||||
|
}
|
||||||
|
EOT
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
resource "kubernetes_deployment" "stem95su" {
|
||||||
|
metadata {
|
||||||
|
name = "stem95su"
|
||||||
|
namespace = kubernetes_namespace.stem95su.metadata[0].name
|
||||||
|
labels = {
|
||||||
|
run = "stem95su"
|
||||||
|
tier = local.tiers.aux
|
||||||
|
}
|
||||||
|
}
|
||||||
|
spec {
|
||||||
|
replicas = 1
|
||||||
|
selector {
|
||||||
|
match_labels = {
|
||||||
|
run = "stem95su"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
template {
|
||||||
|
metadata {
|
||||||
|
labels = {
|
||||||
|
run = "stem95su"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
spec {
|
||||||
|
container {
|
||||||
|
image = "nginx:1.28-alpine"
|
||||||
|
name = "nginx"
|
||||||
|
resources {
|
||||||
|
limits = {
|
||||||
|
memory = "64Mi"
|
||||||
|
}
|
||||||
|
requests = {
|
||||||
|
cpu = "10m"
|
||||||
|
memory = "64Mi"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
port {
|
||||||
|
container_port = 80
|
||||||
|
}
|
||||||
|
volume_mount {
|
||||||
|
name = "content"
|
||||||
|
mount_path = "/usr/share/nginx/html"
|
||||||
|
read_only = true
|
||||||
|
}
|
||||||
|
volume_mount {
|
||||||
|
name = "nginx-conf"
|
||||||
|
mount_path = "/etc/nginx/conf.d"
|
||||||
|
read_only = true
|
||||||
|
}
|
||||||
|
readiness_probe {
|
||||||
|
http_get {
|
||||||
|
path = "/"
|
||||||
|
port = 80
|
||||||
|
}
|
||||||
|
initial_delay_seconds = 3
|
||||||
|
period_seconds = 10
|
||||||
|
}
|
||||||
|
}
|
||||||
|
volume {
|
||||||
|
name = "content"
|
||||||
|
persistent_volume_claim {
|
||||||
|
claim_name = module.nfs_content.claim_name
|
||||||
|
}
|
||||||
|
}
|
||||||
|
volume {
|
||||||
|
name = "nginx-conf"
|
||||||
|
config_map {
|
||||||
|
name = kubernetes_config_map.nginx_conf.metadata[0].name
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
lifecycle {
|
||||||
|
ignore_changes = [
|
||||||
|
spec[0].template[0].spec[0].dns_config, # KYVERNO_LIFECYCLE_V1
|
||||||
|
]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
resource "kubernetes_service" "stem95su" {
|
||||||
|
metadata {
|
||||||
|
name = "stem95su"
|
||||||
|
namespace = kubernetes_namespace.stem95su.metadata[0].name
|
||||||
|
labels = {
|
||||||
|
run = "stem95su"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
spec {
|
||||||
|
selector = {
|
||||||
|
run = "stem95su"
|
||||||
|
}
|
||||||
|
port {
|
||||||
|
name = "http"
|
||||||
|
port = "80"
|
||||||
|
target_port = "80"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
module "ingress" {
|
||||||
|
source = "../../modules/kubernetes/ingress_factory"
|
||||||
|
# auth = "none": public static educational site for 95. СУ, open to the internet by design — CrowdSec + ai-bot-block gate bots; no login.
|
||||||
|
auth = "none"
|
||||||
|
namespace = kubernetes_namespace.stem95su.metadata[0].name
|
||||||
|
name = "stem95su"
|
||||||
|
service_name = kubernetes_service.stem95su.metadata[0].name
|
||||||
|
port = "80"
|
||||||
|
host = "stem95su"
|
||||||
|
dns_type = "proxied"
|
||||||
|
tls_secret_name = var.tls_secret_name
|
||||||
|
}
|
||||||
|
|
|
||||||
9
stacks/stem95su/variables.tf
Normal file
|
|
@@ -0,0 +1,9 @@
|
||||||
|
variable "tls_secret_name" {
|
||||||
|
type = string
|
||||||
|
sensitive = true
|
||||||
|
}
|
||||||
|
|
||||||
|
variable "nfs_server" {
|
||||||
|
type = string
|
||||||
|
default = "192.168.1.127"
|
||||||
|
}
|
||||||
|
|
@@ -1,53 +0,0 @@
|
||||||
# One-shot adoption of the live tasks-stack resources that exist in-cluster but
|
|
||||||
# were never persisted to Terraform state: pipeline 477 (2026-07-03, the stack's
|
|
||||||
# first apply) died mid-`[tasks] apply` — after creating the resources, before
|
|
||||||
# the pg backend write — so `tasks.states` stayed empty and every later apply
|
|
||||||
# would create-fail with `namespaces "tasks" already exists` (same class as the
|
|
||||||
# monitoring alert-digest adoption in stacks/monitoring/imports.tf). Importing
|
|
||||||
# reconciles them into state so `terraform apply` UPDATES instead of failing to
|
|
||||||
# create. These blocks are idempotent (a no-op once the resources are in state)
|
|
||||||
# and may be removed after the next green apply. Defs: main.tf.
|
|
||||||
# (module.ingress_icons is deliberately NOT here — it does not exist live yet;
|
|
||||||
# the same apply creates it.)
|
|
||||||
|
|
||||||
import {
|
|
||||||
to = kubernetes_namespace.tasks
|
|
||||||
id = "tasks"
|
|
||||||
}
|
|
||||||
|
|
||||||
import {
|
|
||||||
to = kubernetes_manifest.external_secret
|
|
||||||
id = "apiVersion=external-secrets.io/v1,kind=ExternalSecret,namespace=tasks,name=tasks-secrets"
|
|
||||||
}
|
|
||||||
|
|
||||||
import {
|
|
||||||
to = kubernetes_manifest.db_external_secret
|
|
||||||
id = "apiVersion=external-secrets.io/v1,kind=ExternalSecret,namespace=tasks,name=tasks-db-creds"
|
|
||||||
}
|
|
||||||
|
|
||||||
import {
|
|
||||||
to = kubernetes_deployment.tasks
|
|
||||||
id = "tasks/tasks"
|
|
||||||
}
|
|
||||||
|
|
||||||
import {
|
|
||||||
to = kubernetes_service.tasks
|
|
||||||
id = "tasks/tasks"
|
|
||||||
}
|
|
||||||
|
|
||||||
import {
|
|
||||||
to = kubernetes_network_policy_v1.tasks_ingress
|
|
||||||
id = "tasks/tasks-ingress"
|
|
||||||
}
|
|
||||||
|
|
||||||
import {
|
|
||||||
to = module.ingress.kubernetes_ingress_v1.proxied-ingress
|
|
||||||
id = "tasks/tasks"
|
|
||||||
}
|
|
||||||
|
|
||||||
# Cloudflare record ID looked up via the API (zone fd2c5dd4… / record for
|
|
||||||
# tasks.viktorbarzin.me, CNAME → the cfargotunnel target, proxied).
|
|
||||||
import {
|
|
||||||
to = module.ingress.cloudflare_record.proxied[0]
|
|
||||||
id = "fd2c5dd4efe8fe38958944e74d0ced6d/a8e6901a074c5255d09700d93eaaf705"
|
|
||||||
}
|
|
||||||
|
|
@@ -1,378 +0,0 @@
|
||||||
variable "image_tag" {
|
|
||||||
type = string
|
|
||||||
default = "latest"
|
|
||||||
description = "tasks image tag. Running tag is set by the Woodpecker deploy (kubectl set image)."
|
|
||||||
}
|
|
||||||
|
|
||||||
variable "postgresql_host" { type = string }
|
|
||||||
|
|
||||||
variable "tls_secret_name" {
|
|
||||||
type = string
|
|
||||||
sensitive = true
|
|
||||||
}
|
|
||||||
|
|
||||||
locals {
|
|
||||||
namespace = "tasks"
|
|
||||||
# ADR-0002: built on GHA from the public GitHub mirror, pushed to ghcr
|
|
||||||
# (public package — anonymous pulls). Running tag is managed by the
|
|
||||||
# Woodpecker deploy (kubectl set image); the image ref below is
|
|
||||||
# ignore_changes'd (KEEL_IGNORE_IMAGE), so this base only matters on
|
|
||||||
# (re)create.
|
|
||||||
image = "ghcr.io/viktorbarzin/tasks:${var.image_tag}"
|
|
||||||
labels = {
|
|
||||||
app = "tasks"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
resource "kubernetes_namespace" "tasks" {
|
|
||||||
metadata {
|
|
||||||
name = local.namespace
|
|
||||||
labels = {
|
|
||||||
tier = local.tiers.aux
|
|
||||||
"istio-injection" = "disabled"
|
|
||||||
# Opt into Keel auto-update (inject-keel-annotations ClusterPolicy).
|
|
||||||
"keel.sh/enrolled" = "true"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
lifecycle {
|
|
||||||
# KYVERNO_LIFECYCLE_V1: goldilocks-vpa-auto-mode ClusterPolicy stamps this label.
|
|
||||||
ignore_changes = [metadata[0].labels["goldilocks.fairwinds.com/vpa-update-mode"]]
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
# App secrets — seed these in Vault before applying:
|
|
||||||
# secret/tasks
|
|
||||||
# fernet_key — Fernet key encrypting the per-user Nextcloud app passwords
|
|
||||||
# stored in the Connected Accounts table (tasks ADR-0002).
|
|
||||||
#
|
|
||||||
# DB: CNPG database `tasks` (created in dbaas, null_resource.pg_tasks_db);
|
|
||||||
# role password managed via the Vault database engine — see
|
|
||||||
# static-creds/pg-tasks. Alembic runs migrations on app startup (no init
|
|
||||||
# container needed).
|
|
||||||
resource "kubernetes_manifest" "external_secret" {
|
|
||||||
field_manager {
|
|
||||||
force_conflicts = true
|
|
||||||
}
|
|
||||||
manifest = {
|
|
||||||
apiVersion = "external-secrets.io/v1"
|
|
||||||
kind = "ExternalSecret"
|
|
||||||
metadata = {
|
|
||||||
name = "tasks-secrets"
|
|
||||||
namespace = local.namespace
|
|
||||||
}
|
|
||||||
spec = {
|
|
||||||
refreshInterval = "15m"
|
|
||||||
secretStoreRef = {
|
|
||||||
name = "vault-kv"
|
|
||||||
kind = "ClusterSecretStore"
|
|
||||||
}
|
|
||||||
target = {
|
|
||||||
name = "tasks-secrets"
|
|
||||||
template = {
|
|
||||||
metadata = {
|
|
||||||
annotations = {
|
|
||||||
"reloader.stakater.com/match" = "true"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
data = [
|
|
||||||
{ secretKey = "TASKS_FERNET_KEY", remoteRef = { key = "tasks", property = "fernet_key" } },
|
|
||||||
]
|
|
||||||
}
|
|
||||||
}
|
|
||||||
depends_on = [kubernetes_namespace.tasks]
|
|
||||||
}
|
|
||||||
|
|
||||||
# DB credentials from Vault database engine (7-day rotation).
|
|
||||||
# Builds the asyncpg DSN consumed by the FastAPI app as TASKS_DB_DSN.
|
|
||||||
# Pre-req in dbaas: CNPG cluster has DB `tasks`, role `tasks`, and Vault
|
|
||||||
# role `static-creds/pg-tasks`.
|
|
||||||
resource "kubernetes_manifest" "db_external_secret" {
|
|
||||||
field_manager {
|
|
||||||
force_conflicts = true
|
|
||||||
}
|
|
||||||
manifest = {
|
|
||||||
apiVersion = "external-secrets.io/v1"
|
|
||||||
kind = "ExternalSecret"
|
|
||||||
metadata = {
|
|
||||||
name = "tasks-db-creds"
|
|
||||||
namespace = local.namespace
|
|
||||||
}
|
|
||||||
spec = {
|
|
||||||
refreshInterval = "15m"
|
|
||||||
secretStoreRef = {
|
|
||||||
name = "vault-database"
|
|
||||||
kind = "ClusterSecretStore"
|
|
||||||
}
|
|
||||||
target = {
|
|
||||||
name = "tasks-db-creds"
|
|
||||||
template = {
|
|
||||||
metadata = {
|
|
||||||
annotations = {
|
|
||||||
"reloader.stakater.com/match" = "true"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
data = {
|
|
||||||
TASKS_DB_DSN = "postgresql+asyncpg://tasks:{{ .password }}@${var.postgresql_host}:5432/tasks"
|
|
||||||
DB_PASSWORD = "{{ .password }}"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
data = [{
|
|
||||||
secretKey = "password"
|
|
||||||
remoteRef = {
|
|
||||||
key = "static-creds/pg-tasks"
|
|
||||||
property = "password"
|
|
||||||
}
|
|
||||||
}]
|
|
||||||
}
|
|
||||||
}
|
|
||||||
depends_on = [kubernetes_namespace.tasks]
|
|
||||||
}
|
|
||||||
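The comment on this ExternalSecret names a Vault static role as a prerequisite in dbaas. As a rough sketch of what such a role could look like with the Vault provider: only the role name `pg-tasks`, the database user `tasks`, and the 7-day rotation come from the comments above; the mount path and connection name are assumptions, and the real dbaas definition may differ.

```hcl
# Hypothetical sketch, not the dbaas stack's actual definition.
resource "vault_database_secret_backend_static_role" "pg_tasks" {
  backend         = "database"       # assumed mount path of the database engine
  name            = "pg-tasks"       # read back under static-creds/pg-tasks
  db_name         = "postgresql"     # assumed name of the Vault DB connection
  username        = "tasks"          # existing PostgreSQL role to rotate
  rotation_period = 7 * 24 * 60 * 60 # 7 days, expressed in seconds
}
```

Because the password is rotated by Vault rather than by this stack, the consuming Deployment depends on a restart mechanism to pick up each new value.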
|
|
||||||
resource "kubernetes_deployment" "tasks" {
|
|
||||||
metadata {
|
|
||||||
name = "tasks"
|
|
||||||
namespace = kubernetes_namespace.tasks.metadata[0].name
|
|
||||||
labels = merge(local.labels, {
|
|
||||||
tier = local.tiers.aux
|
|
||||||
})
|
|
||||||
annotations = {
|
|
||||||
# Reloader restarts the pod when tasks-secrets / tasks-db-creds change
|
|
||||||
# (both carry reloader.stakater.com/match=true) — required because the
|
|
||||||
# DB password rotates every 7 days and is read only at startup.
|
|
||||||
"reloader.stakater.com/search" = "true"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
spec {
|
|
||||||
# Single leader: the CalDAV sync engine wants one writer per user's
|
|
||||||
# sync-token cursor; the SPA is served by the same process.
|
|
||||||
replicas = 1
|
|
||||||
strategy {
|
|
||||||
type = "Recreate"
|
|
||||||
}
|
|
||||||
|
|
||||||
selector {
|
|
||||||
match_labels = local.labels
|
|
||||||
}
|
|
||||||
|
|
||||||
template {
|
|
||||||
metadata {
|
|
||||||
labels = local.labels
|
|
||||||
annotations = {
|
|
||||||
# Prometheus scrapes the service-endpoints (annotations live on the
|
|
||||||
# Service below); the pod annotations here let the kubernetes-pods
|
|
||||||
# SD job also discover /metrics directly.
|
|
||||||
"prometheus.io/scrape" = "true"
|
|
||||||
"prometheus.io/path" = "/metrics"
|
|
||||||
"prometheus.io/port" = "8000"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
spec {
|
|
||||||
image_pull_secrets {
|
|
||||||
name = "registry-credentials"
|
|
||||||
}
|
|
||||||
|
|
||||||
container {
|
|
||||||
name = "tasks"
|
|
||||||
image = local.image
|
|
||||||
|
|
||||||
port {
|
|
||||||
container_port = 8000
|
|
||||||
}
|
|
||||||
|
|
||||||
# TASKS_FERNET_KEY via tasks-secrets; TASKS_DB_DSN via tasks-db-creds.
|
|
||||||
env_from {
|
|
||||||
secret_ref { name = "tasks-secrets" }
|
|
||||||
}
|
|
||||||
env_from {
|
|
||||||
secret_ref { name = "tasks-db-creds" }
|
|
||||||
}
|
|
||||||
|
|
||||||
# Wall-clock zone for all-day due dates (DUE;VALUE=DATE) and the
|
|
||||||
# Today/Scheduled smart views.
|
|
||||||
env {
|
|
||||||
name = "TASKS_LOCAL_TZ"
|
|
||||||
value = "Europe/Sofia"
|
|
||||||
}
|
|
||||||
# SECURITY INVARIANT — DEV_USER must NEVER be set here. It is the
|
|
||||||
# dev-only identity fallback: when present the backend treats every
|
|
||||||
# request as that user, bypassing the Authentik forward-auth
|
|
||||||
# identity (X-authentik-username) entirely. Production identity
|
|
||||||
# comes ONLY from the header Traefik/Authentik injects.
|
|
||||||
|
|
||||||
readiness_probe {
|
|
||||||
http_get {
|
|
||||||
path = "/healthz"
|
|
||||||
port = 8000
|
|
||||||
}
|
|
||||||
initial_delay_seconds = 5
|
|
||||||
period_seconds = 10
|
|
||||||
}
|
|
||||||
liveness_probe {
|
|
||||||
http_get {
|
|
||||||
path = "/healthz"
|
|
||||||
port = 8000
|
|
||||||
}
|
|
||||||
initial_delay_seconds = 30
|
|
||||||
period_seconds = 30
|
|
||||||
}
|
|
||||||
|
|
||||||
resources {
|
|
||||||
requests = { cpu = "100m", memory = "384Mi" }
|
|
||||||
limits = { memory = "384Mi" }
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
lifecycle {
|
|
||||||
ignore_changes = [
|
|
||||||
spec[0].template[0].spec[0].dns_config, # KYVERNO_LIFECYCLE_V1
|
|
||||||
metadata[0].annotations["keel.sh/policy"],
|
|
||||||
metadata[0].annotations["keel.sh/trigger"],
|
|
||||||
metadata[0].annotations["keel.sh/pollSchedule"], # KYVERNO_LIFECYCLE_V2
|
|
||||||
metadata[0].annotations["keel.sh/match-tag"],
|
|
||||||
spec[0].template[0].spec[0].container[0].image, # KEEL_IGNORE_IMAGE — Woodpecker deploy sets the running tag
|
|
||||||
metadata[0].annotations["kubernetes.io/change-cause"],
|
|
||||||
metadata[0].annotations["deployment.kubernetes.io/revision"],
|
|
||||||
spec[0].template[0].metadata[0].annotations["keel.sh/update-time"], # KEEL_LIFECYCLE_V1
|
|
||||||
]
|
|
||||||
}
|
|
||||||
|
|
||||||
depends_on = [
|
|
||||||
kubernetes_manifest.external_secret,
|
|
||||||
kubernetes_manifest.db_external_secret,
|
|
||||||
]
|
|
||||||
}
|
|
||||||
|
|
||||||
resource "kubernetes_service" "tasks" {
|
|
||||||
metadata {
|
|
||||||
name = "tasks"
|
|
||||||
namespace = kubernetes_namespace.tasks.metadata[0].name
|
|
||||||
labels = local.labels
|
|
||||||
annotations = {
|
|
||||||
# Prometheus kubernetes-service-endpoints SD scrapes /metrics here.
|
|
||||||
"prometheus.io/scrape" = "true"
|
|
||||||
"prometheus.io/path" = "/metrics"
|
|
||||||
"prometheus.io/port" = "8000"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
spec {
|
|
||||||
type = "ClusterIP"
|
|
||||||
selector = local.labels
|
|
||||||
|
|
||||||
port {
|
|
||||||
name = "http"
|
|
||||||
port = 8000
|
|
||||||
target_port = 8000
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
# Kyverno ClusterPolicy `sync-tls-secret` auto-clones the wildcard TLS
|
|
||||||
# secret into every namespace, so we don't need a setup_tls_secret module.
|
|
||||||
|
|
||||||
module "ingress" {
|
|
||||||
source = "../../modules/kubernetes/ingress_factory"
|
|
||||||
# auth = "required": Authentik forward-auth gates EVERY request — the app
|
|
||||||
# has no login of its own and blindly trusts the X-authentik-username
|
|
||||||
# header the outpost injects, so Authentik is the only thing standing
|
|
||||||
# between strangers and everyone's tasks. Do NOT relax this tier (tasks
|
|
||||||
# design decision #3; pairs with the NetworkPolicy below, SEC-1).
|
|
||||||
auth = "required"
|
|
||||||
dns_type = "proxied"
|
|
||||||
namespace = kubernetes_namespace.tasks.metadata[0].name
|
|
||||||
name = "tasks"
|
|
||||||
port = 8000
|
|
||||||
tls_secret_name = var.tls_secret_name
|
|
||||||
}
|
|
||||||
|
|
||||||
# Carve-out for the PWA icon assets + web manifest. macOS Safari's
|
|
||||||
# "Add to Dock" (and every other OS icon fetcher: iOS Add-to-Home-Screen,
|
|
||||||
# Android install prompt) fetches these in a cookie-less context — behind
|
|
||||||
# forward-auth it got the Authentik 302 and fell back to a letter monogram.
|
|
||||||
# Traefik prioritises these longer path prefixes over the main "/" router,
|
|
||||||
# so ONLY these five static files bypass Authentik; the SPA shell and /api
|
|
||||||
# stay gated by the main ingress above (and the app itself 401s /api
|
|
||||||
# without the identity header). Guarded against regression by the
|
|
||||||
# tasks-icons entry in the Authentik walling-off probe
|
|
||||||
# (stacks/monitoring/modules/monitoring/authentik_walloff_probe.tf).
|
|
||||||
module "ingress_icons" {
|
|
||||||
source = "../../modules/kubernetes/ingress_factory"
|
|
||||||
# auth = "none": public static icons + manifest, no user data; required for
|
|
||||||
# OS icon fetchers (Safari Add-to-Dock etc.) that carry no session and
|
|
||||||
# cannot complete the Authentik redirect dance.
|
|
||||||
auth = "none"
|
|
||||||
namespace = kubernetes_namespace.tasks.metadata[0].name
|
|
||||||
name = "tasks-icons"
|
|
||||||
service_name = kubernetes_service.tasks.metadata[0].name
|
|
||||||
port = 8000
|
|
||||||
ingress_path = [
|
|
||||||
"/apple-touch-icon.png",
|
|
||||||
"/favicon.png",
|
|
||||||
"/pwa-192x192.png",
|
|
||||||
"/pwa-512x512.png",
|
|
||||||
"/manifest.webmanifest",
|
|
||||||
]
|
|
||||||
full_host = "tasks.viktorbarzin.me" # MUST match the main ingress host; otherwise the factory derives tasks-icons.viktorbarzin.me and the carve-out never matches.
|
|
||||||
dns_type = "none" # host record already owned by the main tasks ingress
|
|
||||||
tls_secret_name = var.tls_secret_name
|
|
||||||
anti_ai_scraping = false # Four static icons + a manifest; nothing for scrapers to mine.
|
|
||||||
homepage_enabled = false # path carve-out, not its own dashboard tile
|
|
||||||
}
|
|
||||||
|
|
||||||
# --- NetworkPolicy: scoped pod ingress (security-review finding SEC-1). ---
|
|
||||||
# The app trusts X-authentik-username unconditionally, so its ENTIRE auth
|
|
||||||
# model depends on requests only ever arriving through Traefik (where the
|
|
||||||
# Authentik forward-auth middleware sets that header). Any pod that could
|
|
||||||
# reach the pod IP directly could spoof the header and read/write anyone's
|
|
||||||
# tasks — hence ingress is restricted to:
|
|
||||||
# - TCP/8000 from the traefik namespace (user traffic, post-forward-auth);
|
|
||||||
# - TCP/8000 from the monitoring namespace (Prometheus /metrics scrape).
|
|
||||||
# The cluster has no default-deny, so this NP only takes effect inside the
|
|
||||||
# tasks ns — pods elsewhere remain unaffected. (Same shape as
|
|
||||||
# chrome-service's chrome-service-ws-ingress.)
|
|
||||||
resource "kubernetes_network_policy_v1" "tasks_ingress" {
|
|
||||||
metadata {
|
|
||||||
name = "tasks-ingress"
|
|
||||||
namespace = kubernetes_namespace.tasks.metadata[0].name
|
|
||||||
}
|
|
||||||
spec {
|
|
||||||
pod_selector {
|
|
||||||
match_labels = local.labels
|
|
||||||
}
|
|
||||||
policy_types = ["Ingress"]
|
|
||||||
ingress {
|
|
||||||
from {
|
|
||||||
namespace_selector {
|
|
||||||
match_labels = {
|
|
||||||
"kubernetes.io/metadata.name" = "traefik"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
ports {
|
|
||||||
port = "8000"
|
|
||||||
protocol = "TCP"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
ingress {
|
|
||||||
from {
|
|
||||||
namespace_selector {
|
|
||||||
match_labels = {
|
|
||||||
"kubernetes.io/metadata.name" = "monitoring"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
ports {
|
|
||||||
port = "8000"
|
|
||||||
protocol = "TCP"
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
@@ -1,23 +0,0 @@
-include "root" {
-  path = find_in_parent_folders()
-}
-
-dependency "platform" {
-  config_path = "../platform"
-  skip_outputs = true
-}
-
-dependency "vault" {
-  config_path = "../vault"
-  skip_outputs = true
-}
-
-dependency "external-secrets" {
-  config_path = "../external-secrets"
-  skip_outputs = true
-}
-
-inputs = {
-  # Override per-deploy in CI / commit.
-  image_tag = "latest"
-}

@@ -873,14 +873,6 @@ resource "kubernetes_cluster_role" "ingress_dns_sync" {
     resources = ["services"]
     verbs = ["get", "list"]
   }
-  # Read the Valia-sites internal-DNS feed (written by stacks/valia-sites,
-  # ADR-0018) so the sync can reconcile off-infra Pages CNAMEs declaratively.
-  rule {
-    api_groups = [""]
-    resources = ["configmaps"]
-    resource_names = ["valia-sites-dns"]
-    verbs = ["get"]
-  }
 }
 
 resource "kubernetes_cluster_role_binding" "ingress_dns_sync" {

@@ -1010,42 +1002,6 @@ resource "kubernetes_cron_job_v1" "technitium_ingress_dns_sync" {
                 echo "mail-auth: MX present"
               fi
 
-              # Valia sites (ADR-0018) — off-infra Cloudflare Pages sites.
-              # The internal zone is authoritative (superset rule above), so
-              # these public-only names must exist here or every internal
-              # client NXDOMAINs on them. Reconciled DECLARATIVELY from the
-              # ConfigMap valia-sites-dns (written by stacks/valia-sites):
-              # ensure/update every entry, and DELETE stale records that
-              # left the map (site retired/renamed). Deletion is scoped to
-              # CNAMEs targeting *.pages.dev — nothing else is ever touched.
-              # Targets resolve upstream to CF edge IPs; no hairpin involved.
-              VALIA=$$(kubectl get configmap valia-sites-dns -n technitium -o go-template='{{range $$k, $$v := .data}}{{$$k}} {{$$v}}{{"\n"}}{{end}}' 2>/dev/null || true)
-              if [ -n "$$VALIA" ]; then
-                printf '%s\n' "$$VALIA" | while read -r VNAME VTARGET; do
-                  [ -z "$$VNAME" ] && continue
-                  CUR=$$(curl -sf "$$TECH_API/api/zones/records/get?token=$$TOKEN&zone=$$ZONE&domain=$$VNAME.$$ZONE" | grep -o '"cname":"[^"]*"' | head -1 | cut -d'"' -f4)
-                  if [ "$$CUR" = "$$VTARGET" ]; then
-                    echo "valia: $$VNAME.$$ZONE ok"
-                    continue
-                  fi
-                  if [ -n "$$CUR" ]; then
-                    curl -sf -G "$$TECH_API/api/zones/records/delete" --data-urlencode "token=$$TOKEN" --data-urlencode "zone=$$ZONE" --data-urlencode "domain=$$VNAME.$$ZONE" --data-urlencode "type=CNAME" --data-urlencode "cname=$$CUR" > /dev/null || true
-                  fi
-                  R=$$(curl -sf -G "$$TECH_API/api/zones/records/add" --data-urlencode "token=$$TOKEN" --data-urlencode "zone=$$ZONE" --data-urlencode "domain=$$VNAME.$$ZONE" --data-urlencode "type=CNAME" --data-urlencode "cname=$$VTARGET" --data-urlencode "ttl=3600") || true
-                  echo "$$R" | grep -q '"status":"ok"' && echo "valia: set $$VNAME.$$ZONE -> $$VTARGET" || echo "valia: FAILED $$VNAME.$$ZONE -- $$R"
-                done
-                # Deletion pass: zone CNAMEs targeting *.pages.dev that are
-                # no longer in the map. ZONE_DUMP predates this run's adds,
-                # but just-set names are in $VALIA so they're never deleted.
-                printf '%s' "$$ZONE_DUMP" | tr ',' '\n' | awk -F'"' '/"name":/{n=$$4} /"cname":/{print n" "$$4}' | grep '\.pages\.dev *$$' | while read -r RNAME RTARGET; do
-                  SHORT=$${RNAME%%.$$ZONE}
-                  printf '%s\n' "$$VALIA" | grep -q "^$$SHORT " && continue
-                  curl -sf -G "$$TECH_API/api/zones/records/delete" --data-urlencode "token=$$TOKEN" --data-urlencode "zone=$$ZONE" --data-urlencode "domain=$$RNAME" --data-urlencode "type=CNAME" --data-urlencode "cname=$$RTARGET" > /dev/null && echo "valia: removed stale $$RNAME -> $$RTARGET"
-                done
-              else
-                echo "valia: CM valia-sites-dns absent/unreadable -- skipping Pages CNAMEs this run"
-              fi
-
               # Pin the .lan ingress anchor A record to the LIVE Traefik LB IP.
               # *.viktorbarzin.lan ingress hosts CNAME to ingress.viktorbarzin.lan,
               # so a Traefik LB IP move that misses the .lan zone silently breaks

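The deletion pass removed in the hunk above is a set difference: zone CNAMEs that target *.pages.dev minus the names still present in the map. A minimal sketch of just that comparison follows, with no Technitium API calls. The function name `stale_records`, the sample records and the zone `example.test` are hypothetical. Unlike the original `grep -q "^$SHORT "`, this sketch compares the name as an exact string via awk, so a dot in a name is not treated as a regex wildcard; that is a design choice of the sketch, not something the diff states.

```shell
#!/bin/sh
# Hypothetical helper: print "name target" lines from the current zone records
# whose short name is missing from the desired map.
# $1 = desired map ("short-name target" per line)
# $2 = current records ("fqdn target" per line)
# $3 = zone suffix
stale_records() {
  desired=$1
  current=$2
  zone=$3
  # Only CNAMEs that point at *.pages.dev are ever candidates for deletion.
  printf '%s\n' "$current" | grep '\.pages\.dev *$' | while read -r rname rtarget; do
    short=${rname%".$zone"}
    # Exact match on the first column of the desired map.
    if printf '%s\n' "$desired" | awk -v s="$short" '$1 == s { found = 1 } END { exit !found }'; then
      continue
    fi
    printf '%s %s\n' "$rname" "$rtarget"
  done
}

DESIRED='bridge bridge.pages.dev
stem95su stem95su.pages.dev'
CURRENT='bridge.example.test bridge.pages.dev
oldsite.example.test oldsite.pages.dev
mail.example.test mail.elsewhere.net'

stale_records "$DESIRED" "$CURRENT" "example.test"
```

Only the retired `oldsite` record is reported; the record that does not target pages.dev is never considered, which matches the scoping described in the removed comment.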
@@ -119,41 +119,6 @@ resource "kubernetes_manifest" "middleware_local_only" {
   depends_on = [helm_release.traefik]
 }
 
-# IP allowlist for household access across ALL home sites: Sofia LAN + the
-# WireGuard spoke LANs (London, Valchedrym) + 10/8 (VLANs, K8s pods/services,
-# WG tunnel IPs). Deliberately a SEPARATE middleware from `local-only` —
-# widening local-only would grant the remote LANs access to the admin surfaces
-# that use it (Prometheus, iDRAC, Loki, …). Use for family-facing services
-# (e.g. the immich-frame kiosks) that every household device may open but the
-# public internet must not. Pair with ingress_factory `dns_type = "internal"`:
-# a Cloudflare-proxied record would deliver public traffic from cloudflared
-# POD IPs (inside 10/8) and silently bypass this allowlist.
-resource "kubernetes_manifest" "middleware_home_lans_only" {
-  manifest = {
-    apiVersion = "traefik.io/v1alpha1"
-    kind = "Middleware"
-    metadata = {
-      name = "home-lans-only"
-      namespace = kubernetes_namespace.traefik.metadata[0].name
-    }
-    spec = {
-      ipAllowList = {
-        sourceRange = [
-          "192.168.1.0/24", # Sofia LAN (hub site)
-          "10.0.0.0/8", # VLANs, K8s pod/svc CIDRs, WG tunnel subnet
-          "192.168.8.0/24", # London LAN (via WG tunnel)
-          "192.168.9.0/24", # London GUEST net — the Portal Plus actually leases here (Portal-75AE8F9C2A8A = 192.168.9.198)
-          "192.168.0.0/24", # Valchedrym LAN (via WG tunnel)
-          "fc00::/7",
-          "fe80::/10",
-        ]
-      }
-    }
-  }
-
-  depends_on = [helm_release.traefik]
-}
-
 # HTTPS redirect middleware
 resource "kubernetes_manifest" "middleware_redirect_https" {
   manifest = {

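The warning in the removed home-lans-only comment (Cloudflare-proxied traffic arrives from cloudflared pod IPs inside 10/8 and therefore passes the allowlist) comes down to plain CIDR matching on the source address. The sketch below illustrates that check for the IPv4 ranges listed in the hunk. It is not how Traefik implements ipAllowList; the function names and the sample addresses `10.42.3.17` (standing in for a pod IP) and `203.0.113.9` (standing in for a public client) are hypothetical.

```shell
#!/bin/sh
# Pack a dotted-quad IPv4 address into a single integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeeds when address $1 falls inside CIDR $2 (for example 10.0.0.0/8).
in_cidr() {
  _addr=$(ip_to_int "$1")
  _net=$(ip_to_int "${2%/*}")
  _bits=${2#*/}
  _mask=$(( (4294967295 << (32 - _bits)) & 4294967295 ))
  [ $(( _addr & _mask )) -eq $(( _net & _mask )) ]
}

# IPv4 ranges copied from the removed sourceRange list.
allowed() {
  for cidr in 192.168.1.0/24 10.0.0.0/8 192.168.8.0/24 192.168.9.0/24 192.168.0.0/24; do
    if in_cidr "$1" "$cidr"; then return 0; fi
  done
  return 1
}

for addr in 192.168.9.198 10.42.3.17 203.0.113.9; do
  if allowed "$addr"; then echo "$addr allowed"; else echo "$addr denied"; fi
done
```

The middle address is the interesting case: a request relayed by an in-cluster pod is judged by the pod's 10/8 address, not by the original client's, which is why the comment pairs this middleware with `dns_type = "internal"`.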
@@ -403,33 +368,6 @@ resource "kubernetes_manifest" "middleware_authentik_rate_limit" {
   depends_on = [helm_release.traefik]
 }
 
-# Dawarich-specific rate limit. The Rails app serves all its fingerprinted
-# assets itself (JS/CSS chunks, SVG store badges, favicons, webmanifest) and
-# the map view adds a points/API burst on load — a single page load from one
-# client IP blows past the default 10/50 limiter and 429s the asset tail
-# (seventh instance of the burst pattern, after ha-sofia, ActualBudget, noVNC,
-# tripit, health and authentik). Background location ingestion (OwnTracks
-# bridge + mobile api_key POSTs) rides the same host, so 429s here also risk
-# dropped pings. Burst absorbs a couple of full page loads back-to-back.
-resource "kubernetes_manifest" "middleware_dawarich_rate_limit" {
-  manifest = {
-    apiVersion = "traefik.io/v1alpha1"
-    kind = "Middleware"
-    metadata = {
-      name = "dawarich-rate-limit"
-      namespace = kubernetes_namespace.traefik.metadata[0].name
-    }
-    spec = {
-      rateLimit = {
-        average = 100
-        burst = 1000
-      }
-    }
-  }
-
-  depends_on = [helm_release.traefik]
-}
-
 # Compress responses to clients at the entrypoint level (outermost).
 # Applied at websecure entrypoint so all responses get compressed.
 # Uses includedContentTypes (whitelist) instead of excludedContentTypes:

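The burst argument in the removed Dawarich rate-limit comment can be illustrated with a simplified token bucket: the bucket holds at most `burst` tokens, refills at `average` tokens per second, and each request spends one token. This is a rough model for intuition and not Traefik's implementation. The function names and the figure of 120 requests per page load are assumptions made for the example; only the 10/50 and 100/1000 pairs come from the diff.

```shell
#!/bin/sh
# simulate AVERAGE BURST: reads sorted request timestamps (seconds, one per
# line) on stdin and prints "<served> <rate-limited>".
simulate() {
  awk -v rate="$1" -v burst="$2" '
    BEGIN { tokens = burst; last = 0; ok = 0; limited = 0 }
    {
      tokens += ($1 - last) * rate          # refill since the previous request
      if (tokens > burst) tokens = burst    # the bucket never exceeds burst
      last = $1
      if (tokens >= 1) { tokens -= 1; ok++ } else { limited++ }
    }
    END { print ok " " limited }'
}

# Emit N requests that all arrive at t=0, like the asset tail of a page load.
burst_of() {
  awk -v n="$1" 'BEGIN { for (i = 0; i < n; i++) print 0 }'
}

burst_of 120 | simulate 10 50      # default limiter from the comment
burst_of 120 | simulate 100 1000   # dawarich-rate-limit values
```

Under these assumptions the default limiter serves 50 of the 120 simultaneous requests and rejects the rest, while the 100/1000 settings serve all of them with room for further page loads before the bucket drains.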
@@ -175,12 +175,6 @@ locals {
     STORY_SOURCE_MODE = "web"
     SCRIPT_WRITER_MODE = "chat"
     PLACE_RESOLVER_MODE = "wikipedia"
-    # Saved Place preview photos (tripit ADR-0035/0040): the Wikipedia lead-image
-    # fetcher behind manual-add-time photos and the backfill sweep. Same fake-
-    # default gap as the resolver above — never set, so prod silently ran the
-    # fake and hand-added places (and any backfill) would store placeholder
-    # PNGs instead of real photos.
-    PLACE_PHOTO_PROVIDER = "wikipedia"
   }
 }
 

@@ -1,368 +0,0 @@
-# Valia sites (ADR-0018): small static sites authored by Valia in Google Drive,
-# served OFF-INFRA on Cloudflare Pages, mirrored by the in-cluster CronJob below
-# every 10 minutes. Registering a new site = one entry in local.sites (plus
-# Valia sharing the folder with vbarzin@gmail.com). Full runbook:
-# docs/runbooks/valia-sites.md
-#
-# Per site this stack fans out:
-# - cloudflare_pages_project + custom domain <name>.viktorbarzin.me
-# - public proxied CNAME <name> -> <project>.pages.dev (manage_dns gate)
-# - internal split-horizon CNAME via ConfigMap valia-sites-dns consumed by
-# the technitium-ingress-dns-sync script (declarative: add/update/REMOVE)
-# - a slot in the shared sync CronJob (rclone mirror -> wrangler deploy)
-
-locals {
-  cloudflare_account_id = "02e035473cfc4834fb10c5d35470d8b4" # vbarzin@gmail.com's account (not a secret)
-
-  # THE site registry. Keys are the public subdomain (English, Viktor picks —
-  # CONTEXT.md "Valia site"). folder_id = the Drive folder Valia shared (the
-  # Content folder); src_path = subfolder holding servable files ("" = root);
-  # entry_file = what / must serve (staged as index.html at deploy time).
-  # manage_dns = false parks a site's public CNAME + internal record while the
-  # name is still owned elsewhere (used for the stem95su ingress cutover).
-  sites = {
-    bridge = {
-      folder_id = "1YWwAtSTsJD9HOzckGRIFXigWqCgYSGEa" # "мост" — ОбУ „Отец Паисий“
-      src_path = ""
-      entry_file = "index.html"
-      manage_dns = true
-    }
-    stem95su = {
-      folder_id = "1cmOI2jRyBJdnrVPgbr4kx2cx_4DY6pm_" # "claude" — 95. СУ STEM board
-      src_path = "stem claude/files"
-      entry_file = "stem_board.html"
-      manage_dns = true
-    }
-  }
-
-  dns_managed_sites = { for k, v in local.sites : k => v if v.manage_dns }
-}
-
-# ---------------------------------------------------------------------------
-# Cloudflare Pages: project + custom domain per site
-# ---------------------------------------------------------------------------
-
-resource "cloudflare_pages_project" "site" {
-  for_each = local.sites
-  account_id = local.cloudflare_account_id
-  name = each.key
-  production_branch = "main"
-}
-
-# bridge was created by hand (wrangler) on 2026-07-03 — adopt, don't recreate.
-import {
-  to = cloudflare_pages_project.site["bridge"]
-  id = "02e035473cfc4834fb10c5d35470d8b4/bridge"
-}
-
-resource "cloudflare_pages_domain" "site" {
-  for_each = local.sites
-  account_id = local.cloudflare_account_id
-  project_name = cloudflare_pages_project.site[each.key].name
-  domain = "${each.key}.viktorbarzin.me"
-}
-
-import {
-  to = cloudflare_pages_domain.site["bridge"]
-  id = "02e035473cfc4834fb10c5d35470d8b4/bridge/bridge.viktorbarzin.me"
-}
-
-# Public proxied CNAME. Gated on manage_dns: a site whose name is still served
-# by an in-cluster ingress keeps its ingress_factory record until cutover
-# (two records can't share one name).
-resource "cloudflare_record" "site" {
-  for_each = local.dns_managed_sites
-  zone_id = var.cloudflare_zone_id
-  name = each.key
-  content = cloudflare_pages_project.site[each.key].subdomain
-  type = "CNAME"
-  proxied = true
-  ttl = 1
-}
-
-# bridge's record predates this stack (created 2026-07-03 in stacks/cloudflared,
-# handed off via removed{} there) — adopt by id.
-import {
-  to = cloudflare_record.site["bridge"]
-  id = "fd2c5dd4efe8fe38958944e74d0ced6d/ff4fb6f4900744d4b22de50d3fdd219b"
-}
-
-# ---------------------------------------------------------------------------
-# Internal split-horizon DNS feed (docs/architecture/dns.md "superset rule"):
-# the technitium-ingress-dns-sync script reads this CM and reconciles internal
-# CNAMEs for every entry — including deleting stale *.pages.dev records when
-# an entry disappears (site retired/renamed).
-# ---------------------------------------------------------------------------
-
-resource "kubernetes_config_map" "valia_sites_dns" {
-  metadata {
-    name = "valia-sites-dns"
-    namespace = "technitium"
-    labels = { "app.kubernetes.io/managed-by" = "valia-sites" }
-  }
-  data = { for k, v in local.dns_managed_sites : k => cloudflare_pages_project.site[k].subdomain }
-}
-
-# ---------------------------------------------------------------------------
-# The shared sync CronJob
-# ---------------------------------------------------------------------------
-
-resource "kubernetes_namespace" "valia_sites" {
-  metadata {
-    name = "valia-sites"
-    labels = {
-      "istio-injection" : "disabled"
-      tier = local.tiers.aux
-    }
-  }
-  lifecycle {
-    # KYVERNO_LIFECYCLE_V1: goldilocks-vpa-auto-mode ClusterPolicy stamps this label on every namespace
-    ignore_changes = [metadata[0].labels["goldilocks.fairwinds.com/vpa-update-mode"]]
-  }
-}
-
-# Secrets: shared drive.readonly rclone conf + the SCOPED CF Pages token
-# (Pages Read/Write only — the Global API Key never enters a pod).
-resource "kubernetes_manifest" "sync_external_secret" {
-  field_manager {
-    force_conflicts = true
-  }
-  manifest = {
-    apiVersion = "external-secrets.io/v1"
-    kind = "ExternalSecret"
-    metadata = {
-      name = "valia-sites-sync"
-      namespace = kubernetes_namespace.valia_sites.metadata[0].name
-    }
-    spec = {
-      refreshInterval = "1h"
-      secretStoreRef = {
-        name = "vault-kv"
-        kind = "ClusterSecretStore"
-      }
-      target = { name = "valia-sites-sync" }
-      data = [
-        {
-          secretKey = "rclone.conf"
-          remoteRef = { key = "valia-sites", property = "rclone_conf" }
-        },
-        {
-          secretKey = "CLOUDFLARE_API_TOKEN"
-          remoteRef = { key = "valia-sites", property = "cloudflare_pages_token" }
-        },
-        {
-          secretKey = "CLOUDFLARE_ACCOUNT_ID"
-          remoteRef = { key = "valia-sites", property = "account_id" }
-        },
-      ]
-    }
-  }
-  depends_on = [kubernetes_namespace.valia_sites]
-}
-
-# Site registry rendered for the job (folder ids aren't secrets).
-resource "kubernetes_config_map" "sync_config" {
-  metadata {
-    name = "valia-sites-config"
-    namespace = kubernetes_namespace.valia_sites.metadata[0].name
-  }
-  data = {
-    "sites.json" = jsonencode(local.sites)
-  }
-}
-
-# Last-deployed manifest hash per site — written by the job (merge-patch), so
-# TF must never fight it over data.
-resource "kubernetes_config_map" "sync_state" {
-  metadata {
-    name = "valia-sites-state"
-    namespace = kubernetes_namespace.valia_sites.metadata[0].name
-  }
-  data = {}
-  lifecycle {
-    ignore_changes = [data]
-  }
-}
-
-resource "kubernetes_service_account" "sync" {
-  metadata {
-    name = "valia-sites-sync"
-    namespace = kubernetes_namespace.valia_sites.metadata[0].name
-  }
-}
-
-resource "kubernetes_role" "sync_state" {
-  metadata {
-    name = "valia-sites-sync-state"
-    namespace = kubernetes_namespace.valia_sites.metadata[0].name
-  }
-  rule {
-    api_groups = [""]
-    resources = ["configmaps"]
-    resource_names = ["valia-sites-state"]
-    verbs = ["get", "patch"]
-  }
-}
-
-resource "kubernetes_role_binding" "sync_state" {
-  metadata {
-    name = "valia-sites-sync-state"
-    namespace = kubernetes_namespace.valia_sites.metadata[0].name
-  }
-  role_ref {
-    api_group = "rbac.authorization.k8s.io"
-    kind = "Role"
-    name = kubernetes_role.sync_state.metadata[0].name
-  }
-  subject {
-    kind = "ServiceAccount"
-    name = kubernetes_service_account.sync.metadata[0].name
-    namespace = kubernetes_namespace.valia_sites.metadata[0].name
-  }
-}
-
-resource "kubernetes_cron_job_v1" "sync" {
-  metadata {
-    name = "valia-sites-sync"
-    namespace = kubernetes_namespace.valia_sites.metadata[0].name
-    labels = { app = "valia-sites", component = "sync" }
-  }
-  spec {
-    schedule = "*/10 * * * *"
-    concurrency_policy = "Forbid"
-    successful_jobs_history_limit = 2
-    failed_jobs_history_limit = 3
-    job_template {
-      metadata {}
-      spec {
-        backoff_limit = 1
-        ttl_seconds_after_finished = 86400
-        template {
-          metadata { labels = { app = "valia-sites", component = "sync" } }
-          spec {
-            restart_policy = "OnFailure"
-            service_account_name = kubernetes_service_account.sync.metadata[0].name
-            container {
-              name = "sync"
-              image = "ghcr.io/viktorbarzin/valia-sites-sync:latest"
-              # Guards mirror stem95su's proven set: hard-fail on Drive
-              # list/auth errors (visible as a failed Job — the chosen
-              # visibility, ADR-0018), skip quietly when a folder is empty or
-              # missing its entry file (never wipe a live site), capped
-              # deletes. Deploy ONLY on remote-manifest change: CF Pages caps
-              # monthly deployments on the free tier, so 144 no-op
-              # deploys/day is not an option.
-              command = ["/bin/sh", "-c", <<-EOT
-                set -u
-                cp /config/rclone.conf /tmp/rc.conf
-                APISERVER="https://kubernetes.default.svc"
-                SA=/var/run/secrets/kubernetes.io/serviceaccount
-                KTOKEN=$$(cat $$SA/token); NS=$$(cat $$SA/namespace)
-                STATE_URL="$$APISERVER/api/v1/namespaces/$$NS/configmaps/valia-sites-state"
-                FAILED=0
-                for SITE in $$(jq -r 'keys[]' /sites/sites.json); do
-                  FOLDER=$$(jq -r --arg s "$$SITE" '.[$$s].folder_id' /sites/sites.json)
-                  SRC_PATH=$$(jq -r --arg s "$$SITE" '.[$$s].src_path' /sites/sites.json)
-                  ENTRY=$$(jq -r --arg s "$$SITE" '.[$$s].entry_file' /sites/sites.json)
-                  RC="rclone --config /tmp/rc.conf --drive-root-folder-id=$$FOLDER --drive-skip-gdocs"
-                  # 1. Remote manifest (path+size+hash) — metadata only, no download.
-                  MANIFEST=$$($$RC lsf "gdrive:$$SRC_PATH" -R --files-only --format phs 2>/tmp/lsf.err) || {
-                    echo "FATAL [$$SITE]: Drive list failed (auth/network):"; cat /tmp/lsf.err; FAILED=1; continue; }
-                  N=$$(printf '%s\n' "$$MANIFEST" | grep -c . || true)
-                  if [ "$$N" -lt 1 ] || ! printf '%s\n' "$$MANIFEST" | cut -d';' -f1 | grep -qx "$$ENTRY"; then
-                    echo "GUARD [$$SITE]: N=$$N / $$ENTRY missing -- skipping, site untouched"; continue
-                  fi
-                  # Cloudflare Pages hard-caps files at 25 MB — deploying
-                  # without an oversize file would silently break the pages
-                  # that reference it, so skip the whole site instead (last
-                  # deployed content keeps serving) and say so loudly.
-                  OVERSIZE=$$(printf '%s\n' "$$MANIFEST" | awk -F';' '$$3 > 26214400 {print $$1" ("$$3" B)"}')
-                  if [ -n "$$OVERSIZE" ]; then
-                    echo "GUARD [$$SITE]: file(s) exceed the 25MB Pages limit -- skipping, site untouched:"; echo "$$OVERSIZE"; continue
-                  fi
-                  HASH=$$(printf '%s' "$$MANIFEST" | sha256sum | cut -d' ' -f1)
-                  LAST=$$(curl -sf --cacert $$SA/ca.crt -H "Authorization: Bearer $$KTOKEN" "$$STATE_URL" | jq -r --arg s "$$SITE" '.data[$$s] // ""')
-                  if [ "$$HASH" = "$$LAST" ]; then echo "OK [$$SITE]: unchanged"; continue; fi
-                  # 2. Content changed — pull and deploy.
-                  $$RC sync "gdrive:$$SRC_PATH" "/work/$$SITE" --exclude ".DS_Store" --fast-list --transfers 4 --max-delete 25 -v || {
-                    echo "FATAL [$$SITE]: rclone sync failed"; FAILED=1; continue; }
-                  if [ "$$ENTRY" != "index.html" ]; then
-                    cp "/work/$$SITE/$$ENTRY" "/work/$$SITE/index.html"
-                  fi
-                  wrangler pages deploy "/work/$$SITE" --project-name="$$SITE" --branch=main --commit-dirty=true || {
-                    echo "FATAL [$$SITE]: wrangler deploy failed"; FAILED=1; continue; }
-                  curl -sf --cacert $$SA/ca.crt -H "Authorization: Bearer $$KTOKEN" \
-                    -X PATCH -H "Content-Type: application/merge-patch+json" \
-                    -d "{\"data\":{\"$$SITE\":\"$$HASH\"}}" "$$STATE_URL" > /dev/null || {
-                    echo "WARN [$$SITE]: state patch failed (will redeploy next run)"; FAILED=1; }
-                  echo "DEPLOYED [$$SITE]: $$HASH"
-                done
-                exit $$FAILED
-              EOT
-              ]
-              env {
-                name = "CLOUDFLARE_API_TOKEN"
-                value_from {
-                  secret_key_ref {
-                    name = "valia-sites-sync"
-                    key = "CLOUDFLARE_API_TOKEN"
-                  }
-                }
-              }
-              env {
-                name = "CLOUDFLARE_ACCOUNT_ID"
-                value_from {
-                  secret_key_ref {
-                    name = "valia-sites-sync"
-                    key = "CLOUDFLARE_ACCOUNT_ID"
-                  }
-                }
-              }
-              resources {
-                requests = { cpu = "25m", memory = "128Mi" }
-                limits = { memory = "512Mi" }
-              }
-              volume_mount {
-                name = "rclone-config"
-                mount_path = "/config"
-                read_only = true
-              }
-              volume_mount {
-                name = "sites-config"
-                mount_path = "/sites"
-                read_only = true
-              }
-              volume_mount {
-                name = "work"
-                mount_path = "/work"
-              }
-            }
-            volume {
-              name = "rclone-config"
-              secret {
-                secret_name = "valia-sites-sync"
-                items {
-                  key = "rclone.conf"
-                  path = "rclone.conf"
-                }
-              }
-            }
-            volume {
-              name = "sites-config"
-              config_map { name = kubernetes_config_map.sync_config.metadata[0].name }
-            }
-            volume {
-              name = "work"
-              empty_dir {}
-            }
-          }
-        }
-      }
-    }
-  }
-  lifecycle {
-    # KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
-    ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config]
-  }
-  depends_on = [kubernetes_manifest.sync_external_secret]
-}

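The removed sync script decides per site whether to skip, leave unchanged or deploy, using only the remote manifest. A minimal sketch of that decision follows, assuming manifest lines of the form `path;hash;size` (what `rclone lsf --format phs` emits with its default `;` separator). The function names `manifest_hash` and `decide` and the sample manifest are hypothetical, and no rclone, wrangler or Kubernetes API call is made. The entry-file check uses `grep -F` for a literal match, which is a choice of this sketch rather than what the original `grep -qx` does.

```shell
#!/bin/sh
# Hash of the whole manifest text; any change in path, hash or size alters it.
manifest_hash() {
  printf '%s' "$1" | sha256sum | cut -d' ' -f1
}

# decide MANIFEST ENTRY_FILE LAST_HASH -> prints GUARD, OK or DEPLOY.
decide() {
  manifest=$1
  entry=$2
  last=$3
  # Guard 1: never touch a live site when the entry file is missing.
  if ! printf '%s\n' "$manifest" | cut -d';' -f1 | grep -qxF "$entry"; then
    echo GUARD
    return
  fi
  # Guard 2: 26214400 bytes = 25 * 1024 * 1024, the Pages per-file cap
  # named in the removed comment.
  oversize=$(printf '%s\n' "$manifest" | awk -F';' '$3 > 26214400 { print $1 }')
  if [ -n "$oversize" ]; then
    echo GUARD
    return
  fi
  # Deploy only when the manifest differs from the last deployed one.
  if [ "$(manifest_hash "$manifest")" = "$last" ]; then echo OK; else echo DEPLOY; fi
}

M='index.html;abc123;1200
img/logo.png;def456;48000'

decide "$M" index.html ""
decide "$M" index.html "$(manifest_hash "$M")"
decide "$M" stem_board.html ""
```

The three calls print DEPLOY, OK and GUARD in that order. Because the hash covers metadata only, a no-op run costs one listing and one ConfigMap read, which fits the stated goal of avoiding 144 no-op deploys per day.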
@@ -1,15 +0,0 @@
-# valia-sites-sync: everything the 10-min Content-folder mirror needs, baked in
-# (no runtime installs — CronJob pods must not apk/npm on every start).
-# rclone pinned to match the proven stem95su version; wrangler pinned to major 4.
-FROM node:22-alpine
-
-RUN apk add --no-cache curl unzip ca-certificates jq \
-    && curl -fsSL https://downloads.rclone.org/v1.74.3/rclone-v1.74.3-linux-amd64.zip -o /tmp/rclone.zip \
-    && unzip -j /tmp/rclone.zip '*/rclone' -d /usr/local/bin \
-    && chmod +x /usr/local/bin/rclone \
-    && rm /tmp/rclone.zip \
-    && npm install -g wrangler@4 \
-    && npm cache clean --force
-
-# wrangler writes config/cache under $HOME; the CronJob runs as non-root node (uid 1000)
-ENV HOME=/tmp

@@ -1,8 +0,0 @@
-include "root" {
-  path = find_in_parent_folders()
-}
-
-dependency "platform" {
-  config_path = "../platform"
-  skip_outputs = true
-}

@@ -1,3 +0,0 @@
-variable "cloudflare_zone_id" {
-  type = string
-}

@@ -675,7 +675,6 @@ resource "vault_database_secret_backend_connection" "postgresql" {
     "pg-nextcloud-todos",
     "pg-technitium",
     "pg-goldmane-edges",
-    "pg-tasks",
   ]
 
   postgresql {

@@ -904,17 +903,6 @@ resource "vault_database_secret_backend_static_role" "pg_goldmane_edges" {
   rotation_period = 604800
 }
 
-# tasks PWA (Reminders-style front-end over Nextcloud CalDAV) — 7-day rotation
-# for the `tasks` CNPG role. Consumed by stacks/tasks via a vault-database
-# ExternalSecret -> TASKS_DB_DSN (remoteRef static-creds/pg-tasks).
-resource "vault_database_secret_backend_static_role" "pg_tasks" {
-  backend = vault_mount.database.path
-  db_name = vault_database_secret_backend_connection.postgresql.name
-  name = "pg-tasks"
-  username = "tasks"
-  rotation_period = 604800
-}
-
 # =============================================================================
 # Kubernetes Secrets Engine — Dynamic K8s Credentials
 # =============================================================================

File diff suppressed because one or more lines are too long