infra

Author	SHA1	Message	Date
Viktor Barzin	731de63150	fix(beads-server): disable Authentik + CrowdSec on Workbench Authentik forward-auth returns 400 for dolt-workbench (no Authentik application configured for this domain). CrowdSec bouncer also intermittently returns 400. Both disabled — Workbench is accessible via Cloudflare tunnel only. TODO: Create Authentik application for dolt-workbench.viktorbarzin.me Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 20:00:21 +00:00
Viktor Barzin	9ce9a9a7f7	Add broker-sync Terraform stack (pending apply) Context ------- Part of the broker-sync rollout — see the plan at ~/.claude/plans/let-s-work-on-linking-temporal-valiant.md and the companion repo at ViktorBarzin/broker-sync. This change ----------- New stack `stacks/broker-sync/`: - `broker-sync` namespace, aux tier. - ExternalSecret pulling `secret/broker-sync` via vault-kv ClusterSecretStore. - `broker-sync-data-encrypted` PVC (1Gi, proxmox-lvm-encrypted, auto-resizer) — holds the sync SQLite db, FX cache, Wealthfolio cookie, CSV archive, watermarks. - Five CronJobs (all under `viktorbarzin/broker-sync:<tag>`, public DockerHub image; no pull secret): * `broker-sync-version` — daily 01:00 liveness probe (`broker-sync version`), used to smoke-test each new image. * `broker-sync-trading212` — daily 02:00 `broker-sync trading212 --mode steady`. * `broker-sync-imap` — daily 02:30, SUSPENDED (Phase 2). * `broker-sync-csv` — daily 03:00, SUSPENDED (Phase 3). * `broker-sync-fx-reconcile` — 7th of month 05:05, SUSPENDED (Phase 1 tail). - `broker-sync-backup` — daily 04:15, snapshots /data into NFS `/srv/nfs/broker-sync-backup/` with 30-day retention, matches the convention in infra/.claude/CLAUDE.md §3-2-1. NOT in this commit: - Old `wealthfolio-sync` CronJob retirement in stacks/wealthfolio/main.tf — happens in the same commit that first applies this stack, per the plan's "clean cutover" decision. - Vault seed. `secret/broker-sync` must be populated before apply; required keys documented in the ExternalSecret comment block. Test plan --------- ## Automated - `terraform fmt` — clean (ran before commit). - `terraform validate` needs `terragrunt init` first; deferred to apply time. ## Manual Verification 1. Seed Vault `secret/broker-sync/*` (see comment block on the ExternalSecret in main.tf). 2. `cd stacks/broker-sync && scripts/tg apply`. 3. `kubectl -n broker-sync get cronjob` — expect 6 CJs, 3 suspended. 4. `kubectl -n broker-sync create job smoke --from=cronjob/broker-sync-version`. 5. `kubectl -n broker-sync logs -l job-name=smoke` — expect `broker-sync 0.1.0`.	2026-04-17 19:52:36 +00:00
Viktor Barzin	277babc696	[tls] Move 3 outlier stacks from per-stack PEMs to root-wildcard symlink ## Context foolery, terminal, and claude-memory each had their own `stacks/<x>/secrets/` directory with a plaintext EC-256 private key (privkey.pem, 241 B) and matching TLS certificate (fullchain.pem, 2868 B) for .viktorbarzin.me. The 92 other stacks under stacks/ symlink `secrets/` → `../../secrets`, which resolves to the repo-root /secrets/ directory covered by the `secrets/* filter=git-crypt` .gitattributes rule — i.e., every other stack consumes the same git-crypt-encrypted root wildcard cert. The 3 outliers shipped their keys in plaintext because `.gitattributes` secrets/** rule matches only repo-root /secrets/, not stacks//secrets/. Both remotes are public, so the 6 plaintext PEM files have been exposed for 1–6 weeks (commits `5a988133` 2026-03-11, `a6f71fc6` 2026-03-18, `9820f2ce` 2026-04-10). Verified: - Root wildcard cert subject = CN viktorbarzin.me, SAN .viktorbarzin.me + viktorbarzin.me — covers the 3 subdomains. - Root privkey + fullchain are a valid key pair (pubkey SHA256 match). - All 3 outlier certs have the same subject/SAN as root; different distinct cert material but equivalent coverage. ## This change - Delete plaintext PEMs in all 3 outlier stacks (6 files total). - Replace each stacks/<x>/secrets directory with a symlink to ../../secrets, matching the fleet pattern. - Add `stacks//secrets/ filter=git-crypt diff=git-crypt` to .gitattributes as a regression guard — any future real file placed under stacks/<x>/secrets/ gets git-crypt-encrypted automatically. setup_tls_secret module (modules/kubernetes/setup_tls_secret/main.tf) is unchanged. It still reads `file("${path.root}/secrets/fullchain.pem")`, which via the symlink resolves to the root wildcard. ## What is NOT in this change - Revocation of the 3 leaked per-stack certs. Backed up the leaked PEMs to /tmp/leaked-certs/ for `certbot revoke --reason keycompromise` once the user's LE account is authenticated. Revocation must happen before or alongside the history-rewrite force-push to both remotes. - Git-history scrub. The leaked PEM blobs are still reachable in every commit from 2026-03-11 forward. Scheduled for removal via `git filter-repo --path stacks/<x>/secrets/privkey.pem --invert-paths` (and fullchain.pem for each stack) in the broader remediation pass. - cert-manager introduction. The fleet does not use cert-manager today; this commit matches the existing symlink-to-wildcard pattern rather than introducing a new component. ## Test plan ### Automated $ readlink stacks/foolery/secrets ../../secrets (likewise for terminal, claude-memory) $ for s in foolery terminal claude-memory; do openssl x509 -in stacks/$s/secrets/fullchain.pem -noout -subject done subject=CN = viktorbarzin.me (x3 — all resolve via symlink to root wildcard) $ git check-attr filter -- stacks/foolery/secrets/fullchain.pem stacks/foolery/secrets/fullchain.pem: filter: git-crypt (now matched by the new rule, though for the symlink target the repo-root rule already applied) ### Manual Verification 1. `terragrunt plan` in stacks/foolery, stacks/terminal, stacks/claude-memory shows only the K8s TLS secret being re-created with the root-wildcard material. No ingress changes. 2. `terragrunt apply` for each stack → `kubectl -n <ns> get secret <name>-tls -o yaml` → tls.crt decodes to CN viktorbarzin.me with the root serial (different from the pre-change per-stack serials). 3. `curl -v https://foolery.viktorbarzin.me/` (and likewise terminal, claude-memory) → cert chain presents the new serial, handshake OK. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 19:49:09 +00:00
Viktor Barzin	d91fbd4a60	[monitoring] Delete orphan server-power-cycle/main.sh with iDRAC default creds ## Context stacks/monitoring/modules/monitoring/server-power-cycle/main.sh is an old shell implementation of a power-cycle watchdog that polled the Dell iDRAC on 192.168.1.4 for PSU voltage. It hardcoded the Dell iDRAC default credentials (root:calvin) in 5 `curl -u root:calvin` calls. Both remotes are public, so those credentials — and the implicit statement that 'this host has not rotated the default BMC password' — have been exposed. The current implementation is main.py in the same directory. It reads iDRAC credentials from the environment variables `idrac_user` and `idrac_password` (see module's iDRAC_USER_ENV_VAR / iDRAC_PASSWORD_ENV_VAR constants), which are populated from Vault via ExternalSecret at runtime. main.sh is not referenced by any Terraform, ConfigMap, or deploy script — grep confirms no `file()` / `templatefile()` / `filebase64()` call loads it, and no hand-rolled shell wrapper invokes it. ## This change - git rm stacks/monitoring/modules/monitoring/server-power-cycle/main.sh main.py is retained unchanged. ## What is NOT in this change - iDRAC password rotation on 192.168.1.4. The BMC should be moved off the vendor default `calvin` regardless; rotation is tracked in the broader remediation plan and in the iDRAC web UI. - A separate finding in stacks/monitoring/modules/monitoring/idrac.tf (the redfish-exporter ConfigMap has `default: username: root, password: calvin` as a fallback for iDRAC hosts not explicitly listed) is NOT addressed here — filed as its own task so the fix (drop the default block vs. source from env) can be considered in isolation. - Git-history scrub of main.sh is pending the broader filter-repo pass. ## Test plan ### Automated $ grep -rn 'server-power-cycle/main\.sh\\|main\.sh' \ --include='.tf' --include='.hcl' --include='.yaml' \ --include='.yml' --include='*.sh' (no consumer references) ### Manual Verification 1. `git show HEAD --stat` shows only the one deletion. 2. `test ! -e stacks/monitoring/modules/monitoring/server-power-cycle/main.sh` 3. `kubectl -n monitoring get deploy idrac-redfish-exporter` still shows the exporter running — unrelated to this file. 4. main.py continues to run its watchdog loop without regression, because it was never coupled to main.sh. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 19:42:55 +00:00
Viktor Barzin	7a884a0b97	[monitoring] Fix alerts for intentionally scaled-down services PoisonFountainDown and ForwardAuthFallbackActive both fired because poison-fountain was scaled to 0 replicas (intentional). Updated both alert expressions to check kube_deployment_spec_replicas > 0 before alerting on missing available replicas — if desired replicas is 0, the service is intentionally down and should not alert. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 19:17:41 +00:00
Viktor Barzin	a19581e32b	fix(beads-server): fix Workbench timeout — use internal GraphQL URL GRAPHQLAPI_URL must point to localhost:9002 (internal), not the external URL which goes through Authentik. SSR can't authenticate to Authentik. Also removed Authentik from /graphql ingress — browser fetch() can't follow 302 redirects on POST requests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 19:05:47 +00:00
Viktor Barzin	da6b82ed5c	fix(beads-server): persist GRAPHQLAPI_URL in Terraform The env var was only set via kubectl and got overwritten on next apply. Now permanently in the deployment spec. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 18:58:59 +00:00
Viktor Barzin	afb8a16623	[infra] Scale down unused services + remove DoH ingress Scale to 0 replicas: - ollama: low usage, saves ~2Gi memory + 59GB NFS-SSD model data idle - poison-fountain: RSS link archiver, not actively used - travel-blog: Hugo blog, not actively used Remove technitium DoH ingress (dns.viktorbarzin.me): externally unreachable and unused. DNS is served on UDP/TCP port 53 via LoadBalancer (10.0.20.201). Clears 3 of 5 ExternalAccessDivergence services. Remaining 2 (pdf, travel) should clear now that the Uptime Kuma monitors will report both down. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 18:55:52 +00:00
Viktor Barzin	cdc851fc63	[alerts] Fix status-page-pusher crash + Prometheus backup push ## status-page-pusher (ExternalAccessDivergence false positive) The pusher was crashing with `AttributeError: 'list' object has no attribute 'get'` at line 122 — the uptime-kuma-api library changed the heartbeats return format. Fixed by making beat flattening more robust: handle any nesting of lists/dicts in the heartbeat data, and add isinstance check before calling `.get()` on the latest beat. ## Prometheus backup (PrometheusBackupNeverRun) The backup sidecar's Pushgateway push was silently failing because `wget --post-file=-` needs `--header="Content-Type: text/plain"` for Pushgateway to accept the Prometheus exposition format. Added the header. Also manually pushed the metric to clear the `absent()` alert immediately. Note: ExternalAccessDivergence still fires because 5 services (ollama, pdf, poison, dns, travel) ARE genuinely externally unreachable but internally up. This is a real issue (likely Cloudflare tunnel routing) not a false positive. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 18:29:43 +00:00
Viktor Barzin	eef4242408	fix(beads-server): auto-connect Workbench to Dolt on startup The Workbench's database connection is in-memory and lost on pod restart. Added startup script that waits for GraphQL server readiness, then calls addDatabaseConnection mutation automatically. No more manual reconnection. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 18:12:31 +00:00
Viktor Barzin	b034c868db	[traefik] Remove broken rewrite-body plugin and all rybbit/anti-AI injection The rewrite-body Traefik plugin (both packruler/rewrite-body v1.2.0 and the-ccsn/traefik-plugin-rewritebody v0.1.3) silently fails on Traefik v3.6.12 due to Yaegi interpreter issues with ResponseWriter wrapping. Both plugins load without errors but never inject content. Removed: - rewrite-body plugin download (init container) and registration - strip-accept-encoding middleware (only existed for rewrite-body bug) - anti-ai-trap-links middleware (used rewrite-body for injection) - rybbit_site_id variable from ingress_factory and reverse_proxy factory - rybbit_site_id from 25 service stacks (39 instances) - Per-service rybbit-analytics middleware CRD resources Kept: - compress middleware (entrypoint-level, working correctly) - ai-bot-block middleware (ForwardAuth to bot-block-proxy) - anti-ai-headers middleware (X-Robots-Tag: noai, noimageai) - All CrowdSec, Authentik, rate-limit middleware unchanged Next: Cloudflare Workers with HTMLRewriter for edge-side injection. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 12:41:17 +00:00
Viktor Barzin	b24545ffdb	fix(beads-server): fix BeadBoard project ID + install bd binary - Fixed project_id mismatch (was "beadboard", should be actual DB project ID) - Rebuilt Docker image with bd v1.0.2 binary (node:20-slim for glibc compat) - Ran bd migrate to update schema from 1.0.0 → 1.0.2 (adds started_at, etc.) - Task creation and bd CLI now work inside the container Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 11:57:45 +00:00
Viktor Barzin	f2037545b3	fix(beads-server): make BeadBoard .beads dir writable BeadBoard needs to create templates/ and archetypes/ subdirectories inside .beads/. ConfigMap mounts are read-only, causing ENOENT errors and 503 responses. Fix: init container copies ConfigMap to emptyDir. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 11:37:26 +00:00
Viktor Barzin	00e2f15a5d	feat(beads-server): deploy BeadBoard task visualization dashboard Add BeadBoard (zenchantlive/beadboard) alongside Dolt server and Workbench for task dependency graph, kanban, and agent coordination views. - Built custom Docker image (registry.viktorbarzin.me:5050/beadboard) - ConfigMap provides .beads/metadata.json pointing to Dolt server - Behind Authentik auth at beadboard.viktorbarzin.me - Also fixed: GraphQL ingress now has Authentik middleware - Also fixed: Workbench store.json type enum (mysql → Mysql) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 11:30:43 +00:00
Viktor Barzin	366e2ab083	[uptime-kuma] Opt-out external monitoring for every public ingress [ci skip] ## Context After the previous commit migrated monitor discovery to per-ingress annotation (opt-in via `uptime.viktorbarzin.me/external-monitor=true`), coverage expanded from 13 → 26 monitors but still left ~99 public ingresses uncovered — notably Helm-managed services (authentik, grafana, vault, forgejo, ntfy) that don't go through `ingress_factory`, plus any `dns_type = "non-proxied"` ingress (Immich was a direct victim: `dns_type = "non-proxied"` → no annotation added → no monitor → invisible outage). The user's concern: "I should have known external Immich was down before users tried to open it." ## This change Flipped the semantic from opt-in to opt-out by default: - Every ingress whose host ends in `.viktorbarzin.me` gets a `[External] <label>` monitor automatically - Only ingresses with annotation `uptime.viktorbarzin.me/external-monitor=false` are skipped - Host dedup via a `seen` set (one monitor per hostname, regardless of how many Ingress resources share it) ## Verification Triggered a manual CronJob run post-apply: ``` Sync complete: 102 created, 1 deleted, 23 unchanged ``` Coverage jumped from 26 → ~124 external monitors. All 6 Helm-managed services now have dedicated monitors: - [External] immich, authentik, forgejo, grafana, ntfy, vault ## Scope Only `stacks/uptime-kuma/modules/uptime-kuma/main.tf` (Python script in the CronJob resource). No RBAC or service account changes — the ones added in the previous commit still cover this path. ## Test plan ### Automated \`\`\` \$ kubectl -n uptime-kuma logs -l job-name=manual-sync-optout-1776422993 --tail=50 \| grep -iE 'immich\|authentik\|grafana\|forgejo\|vault\|ntfy' Creating monitor: [External] authentik -> https://authentik.viktorbarzin.me Creating monitor: [External] forgejo -> https://forgejo.viktorbarzin.me Creating monitor: [External] immich -> https://immich.viktorbarzin.me Creating monitor: [External] grafana -> https://grafana.viktorbarzin.me Creating monitor: [External] ntfy -> https://ntfy.viktorbarzin.me Creating monitor: [External] vault -> https://vault.viktorbarzin.me \`\`\` ### Manual Verification 1. Open `https://uptime.viktorbarzin.me` → confirm `[External] immich` exists 2. Simulate an Immich outage (scale deploy to 0 briefly) → external monitor should go red within the probe interval (5min); internal monitor stays up (pod-level from a different probe angle) → `ExternalAccessDivergence` alert fires after 15 min Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 11:12:00 +00:00
Viktor Barzin	66d2d9916b	[infra] Per-ingress external-monitor annotation + actualbudget plan-time fix [ci skip] ## Context Two operational gaps surfaced during a healthcheck sweep today: 1. External monitoring coverage: Only ~13 hostnames (via `cloudflare_proxied_names` in `config.tfvars`) had `[External]` monitors in Uptime Kuma. Any service deployed via `ingress_factory` with `dns_type = "proxied"` auto-created its DNS record but was NOT registered for external probing — so outages like Immich going down externally were invisible until a user complained. 99 of ~125 public ingresses had no external monitor. 2. actualbudget stack unplannable: `count = var.budget_encryption_password != null ? 1 : 0` in `factory/main.tf:152` failed with "Invalid count argument" because the value flows from a `data.kubernetes_secret` whose contents are `(known after apply)` at plan time. Blocked CI applies and drift reconciliation. ## This change ### Per-ingress external-monitor annotation (ingress_factory + reverse_proxy/factory) - New variables `external_monitor` (bool, nullable) + `external_monitor_name` (string, nullable). Default is "follow dns_type" — enabled for any public DNS record (`dns_type != "none"`, covers both proxied and non-proxied so Immich and other direct-A records are also monitored). - Emits two annotations on the Ingress: - `uptime.viktorbarzin.me/external-monitor = "true"` - `uptime.viktorbarzin.me/external-monitor-name = "<label>"` (optional override) ### external-monitor-sync CronJob (uptime-kuma stack) - Discovers targets from live Ingress objects via the K8s API first (filter by annotation), falls back to the legacy `external-monitor-targets` ConfigMap on any API error (zero rollout risk). - New `ServiceAccount` + cluster-wide `ClusterRole`/`ClusterRoleBinding` giving `list`/`get` on `networking.k8s.io/ingresses`. - `API_SERVER` now uses the `KUBERNETES_SERVICE_HOST` env var (always injected by K8s) instead of `kubernetes.default.svc` — the search-domain expansion failed in the CronJob pod's DNS config. Verified working: CronJob now logs `Loaded N external monitor targets (source=k8s-api)`. ### actualbudget count-on-unknown refactor - Replaced `count = var.budget_encryption_password != null ? 1 : 0` with two explicit plan-time booleans: `enable_http_api` and `enable_bank_sync`. Values are known at plan; no `-target` workaround needed. - Callers (`stacks/actualbudget/main.tf`) pass `true` explicitly. Runtime behaviour is unchanged — the secret is still consumed via env var. - Also aligned the factory with live state (the 3 budget-* PVCs had been migrated `proxmox-lvm` → `proxmox-lvm-encrypted` outside Terraform): PVC resource renamed `data_proxmox` → `data_encrypted`, storage class updated, orphaned `nfs_data` module removed. State was rm'd + re-imported with matching UIDs, so no data was moved. ## Rollout status (already partially applied in this session) - `stacks/uptime-kuma` applied — SA + RBAC + CronJob changes live; FQDN fix verified - `stacks/actualbudget` applied — budget-{viktor,anca,emo} all 200 OK externally - `stacks/mailserver` + 21 other ingress_factory consumers applied — annotations live - CronJob `external-monitor-sync` latest run: `source=k8s-api`, 26 monitors active (was 13 on the central list) ## Deferred (separate work) - 4 stacks show pre-existing DESTRUCTIVE drift in plan (metallb namespace, claude-memory, rbac, redis) — NOT triggered by this commit but will be by CI's global-file cascade. `[ci skip]` here so those don't auto-apply; they will be fixed manually before the next CI push. - Cleanup of `cloudflare_proxied_names` list once Helm-managed ingresses (authentik, grafana, vault, forgejo) are annotated — separate PR. ## Test plan ### Automated \`\`\` \$ kubectl -n uptime-kuma logs \$(kubectl -n uptime-kuma get pods -l job-name -o name \| tail -1) Loaded 26 external monitor targets (source=k8s-api) Sync complete: 7 created, 0 deleted, 17 unchanged \$ curl -sk -o /dev/null -w "%{http_code}\n" -H "Accept: text/html" \\ https://dawarich.viktorbarzin.me/ https://nextcloud.viktorbarzin.me/ \\ https://budget-viktor.viktorbarzin.me/ 200 302 200 \$ kubectl -n actualbudget get deploy,pvc -l app=budget-viktor deployment.apps/budget-viktor 1/1 1 1 Ready persistentvolumeclaim/budget-viktor-data-encrypted Bound 10Gi RWO proxmox-lvm-encrypted \`\`\` ### Manual Verification 1. Confirm the annotation is present on an ingress_factory ingress: \`\`\` kubectl -n dawarich get ingress dawarich -o \\ jsonpath='{.metadata.annotations.uptime\.viktorbarzin\.me/external-monitor}' # Expected: "true" \`\`\` 2. Confirm the new `[External] <name>` monitor appears in Uptime Kuma within 10 min (CronJob interval). For Immich specifically, it will appear after the immich stack is re-applied. 3. Verify actualbudget plan is clean: \`\`\` cd stacks/actualbudget && scripts/tg plan --non-interactive # Expected: no "Invalid count argument" errors \`\`\` Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 10:34:32 +00:00
Viktor Barzin	996bdfc9b6	[technitium] Uninstall MySQL+SQLite query log plugins instead of just disabling ## Context Disabling MySQL/SQLite query logging via config was not durable — Technitium re-enables disabled plugins on pod restart, causing 46 GB/day of writes to the standalone MySQL (15M inserts to technitium.dns_logs between CronJob runs). ## This change: The password-sync CronJob now UNINSTALLS MySQL and SQLite query log plugins via `/api/apps/uninstall` instead of setting `enableLogging:false`. This is permanent — the plugin files are removed from the PVC, so they can't re-enable on restart. The CronJob checks if the plugins are present first (idempotent). Only PostgreSQL query logging remains (90-day retention). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 08:20:55 +00:00
Viktor Barzin	f0a73815d8	[freedify] Remove stale sed patches from container startup The audio-engine.js, dom.js, and dj.js files were refactored/removed in the upstream Freedify repo. The sed patches that disabled iOS EQ auto-init and visualizer no longer have targets, causing the container to crash on startup. Use the image's default CMD instead. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 06:17:13 +00:00
Viktor Barzin	f8facf44dd	[infra] Fix rewrite-body plugin + cleanup TrueNAS + version bumps ## Context The rewrite-body Traefik plugin (packruler/rewrite-body v1.2.0) silently broke on Traefik v3.6.12 — every service using rybbit analytics or anti-AI injection returned HTTP 200 with "Error 404: Not Found" body. Root cause: middleware specs referenced plugin name `rewrite-body` but Traefik registered it as `traefik-plugin-rewritebody`. Migrated to maintained fork `the-ccsn/traefik-plugin-rewritebody` v0.1.3 which uses the correct plugin name. Also added `lastModified = true` and `methods = ["GET"]` to anti-AI middleware to avoid rewriting non-HTML responses. ## This change - Replace packruler/rewrite-body v1.2.0 with the-ccsn/traefik-plugin-rewritebody v0.1.3 - Fix plugin name in all 3 middleware locations (ingress_factory, reverse-proxy factory, traefik anti-AI) - Remove deprecated TrueNAS cloud sync monitor (VM decommissioned 2026-04-13) - Remove CloudSyncStale/CloudSyncFailing/CloudSyncNeverRun alerts - Fix PrometheusBackupNeverRun alert (for: 48h → 32d to match monthly sidecar schedule) - Bump versions: rybbit v1.0.21→v1.1.0, wealthfolio v1.1.0→v3.2, networking-toolbox 1.1.1→1.6.0, cyberchef v10.24.0→v9.55.0 - MySQL standalone storage_limit 30Gi → 50Gi - beads-server: fix Dolt workbench type casing, remove Authentik on GraphQL endpoint Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 05:51:52 +00:00
Viktor Barzin	4c8e5bea0b	[traefik] Add global compress middleware to fix response compression The rewrite-body plugin (rybbit analytics, anti-AI trap links) requires strip-accept-encoding to work, which killed HTTP compression for 50+ services. This adds Traefik's built-in compress middleware at the websecure entrypoint level to re-compress responses to clients after rewrite-body has modified them. Uses includedContentTypes whitelist (not excludedContentTypes) so only text-based types are compressed. SSE, WebSocket, gRPC, and binary downloads are unaffected. Measured improvement on ha-sofia: - app.js: 540KB → 167KB (3.2x) - core.js: 52KB → 19KB (2.7x) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 22:18:51 +00:00
Viktor Barzin	e80b2f026f	[infra] Migrate Terraform state from local SOPS to PostgreSQL backend Two-tier state architecture: - Tier 0 (infra, platform, cnpg, vault, dbaas, external-secrets): local state with SOPS encryption in git — unchanged, required for bootstrap. - Tier 1 (105 app stacks): PostgreSQL backend on CNPG cluster at 10.0.20.200:5432/terraform_state with native pg_advisory_lock. Motivation: multi-operator friction (every workstation needed SOPS + age + git-crypt), bootstrap complexity for new operators, and headless agents/CI needing the full encryption toolchain just to read state. Changes: - terragrunt.hcl: conditional backend (local vs pg) based on tier0 list - scripts/tg: tier detection, auto-fetch PG creds from Vault for Tier 1, skip SOPS and Vault KV locking for Tier 1 stacks - scripts/state-sync: tier-aware encrypt/decrypt (skips Tier 1) - scripts/migrate-state-to-pg: one-shot migration script (idempotent) - stacks/vault/main.tf: pg-terraform-state static role + K8s auth role for claude-agent namespace - stacks/dbaas: terraform_state DB creation + MetalLB LoadBalancer service on shared IP 10.0.20.200 - Deleted 107 .tfstate.enc files for migrated Tier 1 stacks - Cleaned up per-stack tiers.tf (now generated by root terragrunt.hcl) [ci skip] Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 19:33:12 +00:00
Viktor Barzin	f538115c43	[dbaas] Migrate MySQL from InnoDB Cluster to standalone StatefulSet ## Context Disk write analysis showed MySQL InnoDB Cluster writing ~95 GB/day for only ~35 MB of actual data due to Group Replication overhead (binlog, relay log, GR apply log). The operator enforces GR even with serverInstances=1. Bitnami Helm charts were deprecated by Broadcom in Aug 2025 — no free container images available. Using official mysql:8.4 image instead. ## This change: - Replace helm_release.mysql_cluster service selector with raw kubernetes_stateful_set_v1 using official mysql:8.4 image - ConfigMap mysql-standalone-cnf: skip-log-bin, innodb_flush_log_at_trx_commit=2, innodb_doublewrite=ON (re-enabled for standalone safety) - Service selector switched to standalone pod labels - Technitium: disable SQLite query logging (18 GB/day write amplification), keep PostgreSQL-only logging (90-day retention) - Grafana datasource and dashboards migrated from MySQL to PostgreSQL - Dashboard SQL queries fixed for PG integer division (::float cast) - Updated CLAUDE.md service-specific notes ## What is NOT in this change: - InnoDB Cluster + operator removal (Phase 4, 7+ days from now) - Stale Vault role cleanup (Phase 4) - Old PVC deletion (Phase 4) Expected write reduction: ~113 GB/day (MySQL 95 + Technitium 18) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 19:01:06 +00:00
Viktor Barzin	39b5ed04a7	upgrade: dawarich 0.37.1 -> 1.6.1 (fix entrypoint + add production env) ## Context Version 1.3.0+ changed the recommended command from `bin/dev` (development) to `bin/rails server -p 3000 -b ::` (production). Also requires RAILS_ENV=production, SECRET_KEY_BASE, and RAILS_LOG_TO_STDOUT env vars. ## This change - Command: `bin/dev` → `bin/rails server -p 3000 -b ::` - Add RAILS_ENV=production - Add SECRET_KEY_BASE (stored in Vault secret/dawarich, synced via ESO) - Add RAILS_LOG_TO_STDOUT=true ## What happened 1. Initial upgrade applied version 1.6.1 — DB migrations ran but pod CrashLooped due to wrong entrypoint (bin/dev exits in production mode) 2. Rollback to 0.37.1 failed because 1.6.1 migrations already ran (ActiveRecord::UnknownPrimaryKey on rails_pulse_routes) 3. Rolled forward with corrected entrypoint + env vars 4. Service now stable: 20/20 health checks passed over 5 minutes Co-Authored-By: Service Upgrade Agent <noreply@viktorbarzin.me>	2026-04-16 17:25:29 +00:00
Viktor Barzin	59e99f2a3a	upgrade: paperless-ngx increase memory 1Gi -> 2Gi v2.20.14 OOMKills at 1Gi during search index rebuild on upgrade. Bumped to 2Gi request=limit to handle startup index operations. Co-Authored-By: Service Upgrade Agent <noreply@viktorbarzin.me>	2026-04-16 17:16:23 +00:00
Viktor Barzin	5f1b14ad53	upgrade: dawarich re-apply 1.6.1 (forward-fix after failed rollback) DB migrations from 1.6.1 already ran, making 0.37.1 incompatible (ActiveRecord::UnknownPrimaryKey on rails_pulse_routes table). Rolling forward is the correct path. Co-Authored-By: Service Upgrade Agent <noreply@viktorbarzin.me>	2026-04-16 17:08:20 +00:00
Viktor Barzin	f5883be981	Revert "upgrade: dawarich 0.37.1 -> 1.6.1" This reverts commit `ec8b4dbaac`.	2026-04-16 17:04:06 +00:00
Viktor Barzin	7b69641357	upgrade: linkwarden v2.9.1 -> v2.14.0 Changelog summary: 23 intermediate releases spanning text highlighting, drag & drop organization, tag management page, compact sidebar, mobile app (iOS/Android), SingleFile upload support, performance refactors, and NextJS CVE patches. Risk: SAFE Breaking changes: none DB backup: yes (job: pre-upgrade-linkwarden-1776357253, 254M dump confirmed) Config changes applied: none Flagged for manual review: none Co-Authored-By: Service Upgrade Agent <noreply@viktorbarzin.me>	2026-04-16 16:54:30 +00:00
Viktor Barzin	e1aa59ce53	upgrade: paperless-ngx 2.16.4 -> 2.20.14 Changelog summary: 4 minor versions of bug fixes, security patches (7+ CVEs), and features (nested tags, PDF editor, advanced workflow filters, Trixie base). Risk: CAUTION Breaking changes: - v2.17.0: Scheduled workflow offset sign corrected (restores pre-2.16 behavior) - v2.20.7: Filename template rendering restricted to safe document context DB backup: yes (job: pre-upgrade-paperless-ngx-1776357314, 254MiB) Config changes applied: none Flagged for manual review: - Check scheduled workflow offsets if any were modified during 2.16.x - Verify storage path templates still render correctly after v2.20.7 restriction Co-Authored-By: Service Upgrade Agent <noreply@viktorbarzin.me>	2026-04-16 16:53:42 +00:00
Viktor Barzin	237126eb3a	upgrade: onlyoffice 8.2.3 -> 9.3.1 Changelog summary: Major version bump spanning 13 releases. v9.0.0 adds PDF editor API, macro recording, Service Worker caching. v9.2.1 fixes critical security vulns (XSS, memory manipulation leading to RCE in XLS conversion). v9.3.0 adds GIF animations, multiple pages view, signature settings, hyperlinks on images/shapes. Risk: CAUTION (major version bump 8->9) Breaking changes: none affecting Docker+MySQL deployment. PostgreSQL schema change in v9.0.0 (irrelevant — we use MySQL). API endpoint deprecations (ConvertService.ashx, GET requests to converter/command) — not removals. Config parameter renames (leftMenu->layout.leftMenu etc.) are editor JS API, not server config. DB backup: yes (job: pre-upgrade-onlyoffice-1776357277, MySQL full dump) Config changes applied: none required Flagged for manual review: none Co-Authored-By: Service Upgrade Agent <noreply@viktorbarzin.me>	2026-04-16 16:53:24 +00:00
Viktor Barzin	8637d82817	upgrade: phpipam v1.7.0 -> v1.7.4 Changelog summary: Bugfixes (PHP8 compat, UI performance, jQuery errors) and security fixes (XSS reflected, CSRF cookie, RCE via ping_path, DB credential exposure via mysqldump). Risk: SAFE Breaking changes: none DB backup: yes (job: pre-upgrade-phpipam-1776357227, mysql, dbaas ns) Config changes applied: none Flagged for manual review: none Co-Authored-By: Service Upgrade Agent <noreply@viktorbarzin.me>	2026-04-16 16:53:21 +00:00
Viktor Barzin	6727894573	upgrade: url (shlink) 4.3.4 -> 5.0.2 Changelog summary: Major version bump. v5.0.0 removes QR code generation, REDIRECT_APPEND_EXTRA_PATH env var, and trusted proxy auto-detection. Various CLI option removals. v4.4-4.6 added REDIRECT_EXTRA_PATH_MODE, DB_USE_ENCRYPTION, TRUSTED_PROXIES, CORS controls, FrankenPHP support. Risk: CAUTION (major version bump 4→5) Breaking changes: QR codes removed, REDIRECT_APPEND_EXTRA_PATH removed, trusted proxy auto-detection removed, CLI option renames DB backup: yes (job: pre-upgrade-url-1776357271, completed) Config changes applied: none (no affected env vars in current config) Flagged for manual review: TRUSTED_PROXIES env var may be needed (Shlink behind Cloudflare + Traefik = 2 proxies, auto-detection removed in 5.0.0) Co-Authored-By: Service Upgrade Agent <noreply@viktorbarzin.me>	2026-04-16 16:49:42 +00:00
Viktor Barzin	ec8b4dbaac	upgrade: dawarich 0.37.1 -> 1.6.1 Changelog summary: 19 intermediate releases. 1.0.0 is cosmetic (same as 0.37.3). Key changes: per-user timezone (1.3.0), motion_data column with background migration (1.3.0), GPS noise filtering (1.5.0), family page with map (1.4.0), redesigned archival system (1.4.0). Risk: CAUTION (major version boundary + breaking keyword in 1.3.3) Breaking changes: 1.3.3 API change (distance field integer→object, affects API consumers only) DB backup: yes (job: pre-upgrade-dawarich-1776357303, postgresql, completed) Config changes applied: none (existing TIME_ZONE=Europe/London is compatible) Flagged for manual review: none Co-Authored-By: Service Upgrade Agent <noreply@viktorbarzin.me>	2026-04-16 16:48:34 +00:00
Viktor Barzin	287d5eb28d	upgrade: coturn 4.6.3-r1 -> 4.10.0-r1 Changelog summary: Security fixes (CVE-2025-69217, CVE-2026-27624, CVE-2026-40613), performance improvements (recvmmsg, lock-free atomics), memory safety fixes, and DDoS handling improvements. Risk: CAUTION (4.7.0 has breaking changes for deprecated config options) Breaking changes: 4.7.0 removed keep-address-family, response-origin-only-with-rfc5780, inverted no-stun-backward-compatibility. None of these are in our config — no impact. DB backup: no (not DB-backed) Config changes applied: none (no-tlsv1, no-tlsv1_1, no-cli now unnecessary but still accepted — no removal needed) Flagged for manual review: none Co-Authored-By: Service Upgrade Agent <noreply@viktorbarzin.me>	2026-04-16 16:34:59 +00:00
Viktor Barzin	cce513349a	upgrade: immich v2.7.4 -> v2.7.5 Changelog summary: Bug fix for version check rate limiting and deduplication, translation updates. Patch-only release with no breaking changes. Risk: SAFE Breaking changes: none DB backup: yes (job: pre-upgrade-immich-1776357229, 1.9G, immich namespace) Config changes applied: none Flagged for manual review: none Co-Authored-By: Service Upgrade Agent <noreply@viktorbarzin.me>	2026-04-16 16:34:57 +00:00
Viktor Barzin	3afdc9a6cb	upgrade: ollama (open-webui) v0.7.2 -> v0.8.12 Changelog summary: 13 intermediate releases. v0.8.0 introduces analytics dashboard, Skills support, Open Responses protocol, and a long-running DB migration on chat_message table. v0.8.1-v0.8.12 add model editing shortcuts, OIDC logout endpoint, terminal integration, notebook execution, and numerous bug fixes. Risk: CAUTION Breaking changes: v0.8.0 long-running chat_message table migration + schema changes, v0.8.1 additional schema changes. SQLite auto-migrates on startup. DB backup: skipped (SQLite on proxmox-lvm PVC, LVM snapshots available for rollback) Config changes applied: none Flagged for manual review: none — all changes are additive features/fixes Co-Authored-By: Service Upgrade Agent <noreply@viktorbarzin.me>	2026-04-16 16:34:48 +00:00
Viktor Barzin	1ea48c93e5	upgrade: owntracks 0.9.9 -> 1.0.1 Changelog summary: - 1.0.0: POI inline image support, deprecate google maps in vmap.html, packaging fixes - 1.0.1: ocat JSON array output fix, revgeo error messages, OpenBSD support, storage dir env fix Risk: CAUTION (major version 0→1, but changes are benign — no schema/config/API breaking changes) Breaking changes: none (deprecate keyword hit on vmap.html google maps — cosmetic only) DB backup: skipped (not DB-backed) Config changes applied: none required Flagged for manual review: none Co-Authored-By: Service Upgrade Agent <noreply@viktorbarzin.me>	2026-04-16 16:34:29 +00:00
Viktor Barzin	216d4240c9	[infra] Add Cloudflare provider to all stack lock files and generated providers Terragrunt now generates cloudflare_provider.tf (Vault-sourced API key) and includes cloudflare in required_providers. These are the generated files from running `terragrunt init -upgrade` across all stacks. [ci skip] Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 16:31:36 +00:00
Viktor Barzin	cf93f123f1	upgrade: audiobookshelf 2.32.1 -> 2.33.1 Changelog summary: Security fixes (IDOR vulnerabilities in sessions/progress/bookmarks), DB index + query parallelization for discover performance, crash fixes, HTML sanitization on playlist/collection/podcast endpoints, API key enabled/disabled fix. Risk: SAFE Breaking changes: none DB backup: no (not DB-backed) Config changes applied: none Flagged for manual review: none Co-Authored-By: Service Upgrade Agent <noreply@viktorbarzin.me>	2026-04-16 16:00:26 +00:00
root	af090c818b	Woodpecker CI deploy [CI SKIP]	2026-04-16 13:46:08 +00:00
Viktor Barzin	b1d152be1f	[infra] Auto-create Cloudflare DNS records from ingress_factory ## Context Deploying new services required manually adding hostnames to cloudflare_proxied_names/cloudflare_non_proxied_names in config.tfvars — a separate file from the service stack. This was frequently forgotten, leaving services unreachable externally. ## This change: - Add `dns_type` parameter to `ingress_factory` and `reverse_proxy/factory` modules. Setting `dns_type = "proxied"` or `"non-proxied"` auto-creates the Cloudflare DNS record (CNAME to tunnel or A/AAAA to public IP). - Simplify cloudflared tunnel from 100 per-hostname rules to wildcard `*.viktorbarzin.me → Traefik`. Traefik still handles host-based routing. - Add global Cloudflare provider via terragrunt.hcl (separate cloudflare_provider.tf with Vault-sourced API key). - Migrate 118 hostnames from centralized config.tfvars to per-service dns_type. 17 hostnames remain centrally managed (Helm ingresses, special cases). - Update docs, AGENTS.md, CLAUDE.md, dns.md runbook. ``` BEFORE AFTER config.tfvars (manual list) stacks/<svc>/main.tf \| module "ingress" { v dns_type = "proxied" stacks/cloudflared/ } for_each = list \| cloudflare_record auto-creates tunnel per-hostname cloudflare_record + annotation ``` ## What is NOT in this change: - Uptime Kuma monitor migration (still reads from config.tfvars) - 17 remaining centrally-managed hostnames (Helm, special cases) - Removal of allow_overwrite (keep until migration confirmed stable) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 13:45:04 +00:00
Viktor Barzin	b98890d799	fix(beads-server): fix Workbench GraphQL URL for remote hosting Dolt Workbench hardcodes http://localhost:9002/graphql in the built JS. For k8s hosting, init container patches this to relative /graphql path. Second ingress routes /graphql to port 9002 behind Authentik auth. - Init container copies static JS to writable emptyDir, patches URL - Pre-seeds store.json with Dolt connection config - Added /graphql ingress with Authentik forward-auth Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 21:53:57 +00:00
Viktor Barzin	375a3d91d5	[monitoring] Exclude websocket protocol from HighServiceLatency alert Traefik records websocket connection lifetimes (minutes to hours) as "request duration." When websockets close, the full lifetime pollutes the average latency metric — Authentik showed 6.7s avg (201s websocket avg) vs 0.065s actual HTTP avg. This caused ~90 false alerts/day across 12 services (Authentik, Vaultwarden, Terminal, HA, etc.). Changes: - Add protocol!="websocket" filter to HighServiceLatency alert expr - Raise minimum traffic threshold from 0.01 to 0.05 rps to filter statistical noise from services with <3 req/min - Remove .githooks/pre-commit file-size hook (blocked state commits) Validated against 7-day historical data: 637 breaches → ~2 with both filters applied (99.7% reduction). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 21:51:19 +00:00
Viktor Barzin	3e273399c1	fix(ci): add registry.viktorbarzin.me:5050 to imagePullSecrets Pipeline pods pull from registry.viktorbarzin.me:5050 but the registry-credentials secret only had auth for registry.viktorbarzin.me (without port). Containerd requires exact hostname:port match. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 21:50:51 +00:00
Viktor Barzin	116fdcf82d	fix(ci): Woodpecker secret sync includes all event types The vault-woodpecker-sync script was creating global secrets with only push/tag/deployment events. Manual and cron-triggered pipelines couldn't access secrets, causing "secret not found" errors and pipeline failures. Also fixes three root causes of CI failures: 1. Pull-through cache corruption: purged stale blobs, added post-GC registry restart cron to prevent recurrence 2. Missing repo-level secrets: added registry_user/registry_password for the infra repo's build-ci-image workflow 3. Stuck pipelines: cleaned up 3 pipelines stuck in "running" since March Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 21:43:48 +00:00
Viktor Barzin	d9ed166640	fix(beads-server): add Authentik auth to Dolt Workbench - Set protected=true on ingress (Authentik forward-auth) - Remove unused DATABASE_URL env var (Workbench uses browser-based connection config) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 21:43:22 +00:00
Viktor Barzin	c33f597111	feat(upgrade-agent): add automated service upgrade pipeline with n8n + DIUN Pipeline: DIUN detects new image versions every 6h → webhook to n8n → n8n filters (skip databases/custom/infra/:latest) and rate-limits (max 5/6h) → SSH to dev VM → claude -p runs upgrade agent. Agent workflow: resolve GitHub repo → fetch changelogs → classify risk (SAFE/CAUTION) → backup DB if needed → bump version in .tf → commit+push → wait for CI → verify (pod ready + HTTP + Uptime Kuma) → rollback on failure. Changes: - stacks/n8n: add N8N_PORT=5678 to fix K8s env var conflict - stacks/n8n/workflows: version-controlled n8n workflow backup - docs/architecture/automated-upgrades.md: full pipeline documentation - AGENTS.md: add upgrade agent section - service-catalog.md: update DIUN description Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 21:38:27 +00:00
Viktor Barzin	27d7c91608	feat(beads-server): add Dolt Workbench web UI Deploy dolthub/dolt-workbench alongside the Dolt server in beads-server namespace. Provides SQL console, spreadsheet editor, and commit graph visualization for the centralized beads task database. - Workbench at dolt-workbench.viktorbarzin.me (Cloudflare-proxied) - Connects to Dolt server via in-cluster service DNS - Added to cloudflare_proxied_names for external access Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 21:32:45 +00:00
Viktor Barzin	c124a23390	fix(ci): add K8s pull secrets to Woodpecker agents Pipeline pods were failing with "authorization failed: no basic auth credentials" when pulling from the private registry. The WOODPECKER_BACKEND_K8S_PULL_SECRET_NAMES env var was in values.yaml but never deployed to the agents. Also removes the stale db-init job that used `-U root` (incompatible with CNPG's `postgres` superuser). The database already exists. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 21:21:12 +00:00
Viktor Barzin	f7411327d1	fix(affine): update image tag 0.20.7 → 0.26.6 Image ghcr.io/toeverything/affine:0.20.7 was removed from ghcr.io, causing persistent ImagePullBackOff. Updated to latest stable 0.26.6. Prisma migrations run via init container on startup. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 20:49:46 +00:00
Viktor Barzin	8b004c4c94	feat(storage): migrate all sensitive services to proxmox-lvm-encrypted Reconcile Terraform with cluster state after manual encrypted PVC migrations and complete the remaining unfinished migrations. All services storing sensitive data now use LUKS2-encrypted block storage via the Proxmox CSI plugin. ## Context Only Technitium DNS was using encrypted storage in Terraform. Many services had been manually migrated to encrypted PVCs in the cluster, but Terraform was never updated — creating dangerous state drift where a `tg apply` could recreate unencrypted PVCs. ## This change Phase 0 — Infrastructure: - Add `proxmox-lvm-encrypted` StorageClass to Helm values (extraParameters) - Add ExternalSecret for LUKS encryption passphrase to Terraform - Fix CSI node plugin memory: `node.plugin.resources` (not `node.resources`) with 1280Mi limit for LUKS2 Argon2id key derivation Phase 1 — TF state reconciliation (zero downtime): - Health, Matrix, N8N, Forgejo, Vaultwarden, Mailserver: state rm + import - Redis, DBAAS MySQL, DBAAS PostgreSQL: Helm/CNPG value updates Phase 2 — Data migration (encrypted PVCs existed but unused): - Headscale, Frigate, MeshCentral: rsync + switchover - Nextcloud (20Gi): rsync + chart_values update Phase 3 — New encrypted PVCs: - Roundcube HTML, HackMD, Affine, DBAAS pgadmin: create + rsync + switchover Phase 4 — Cleanup: - Deleted 5 orphaned unencrypted PVCs ## Services migrated (18 PVCs across 14 namespaces) ``` vaultwarden → vaultwarden-data-encrypted dbaas → datadir-mysql-cluster-0, pg-cluster-{1,2}, dbaas-pgadmin-encrypted mailserver → mailserver-data-encrypted, roundcubemail-{enigma,html}-encrypted nextcloud → nextcloud-data-encrypted forgejo → forgejo-data-encrypted matrix → matrix-data-encrypted n8n → n8n-data-encrypted affine → affine-data-encrypted health → health-uploads-encrypted hackmd → hackmd-data-encrypted redis → redis-data-redis-node-{0,1} headscale → headscale-data-encrypted frigate → frigate-config-encrypted meshcentral → meshcentral-{data,files}-encrypted ``` Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 20:15:30 +00:00

1 2 3 4 5 ...

545 commits