infra

Author	SHA1	Message	Date
Viktor Barzin	f6812fe69f	[uptime-kuma] Support per-ingress probe path annotation ## Context The `external-monitor-sync` CronJob probed `https://<host>/` for every `.viktorbarzin.me` ingress. Homepages frequently return 200 (or allow-listed 30x/40x) even when the backend or DB is broken, producing false-negatives — the forgejo outage on 2026-04-17 was not caught for this reason: `/` returned a login page while `/api/healthz` returned 503 from the DB probe. Manual monitor edits don't stick: the next sync is create-if-missing only, so a deleted monitor gets recreated pointing at `/` again. ## This change Teaches the sync three things: 1. Reads a new annotation* `uptime.viktorbarzin.me/external-monitor-path`. The annotation value is appended as the probe path; default `/` preserves today's behaviour for every ingress that hasn't opted in. 2. Tightens accepted status codes when an explicit path is set: `['200-299']` (strict — we expect a real healthz). The default `/` path keeps the existing lenient set `['200-299','300-399','400-499']` because homepages routinely 30x redirect or 40x on missing auth. 3. Updates existing monitors when the target URL or accepted status codes drift. Previously the loop was create-if-missing only, so annotating an already-monitored ingress had no effect until the monitor was deleted. Now re-running the sync after changing the annotation converges the live monitor. ## What is NOT in this change - No change to the Ingress annotations on any individual stack. Each service that wants a non-`/` probe path opts in separately. - No change to the ConfigMap fallback payload shape — legacy entries still get the lenient status codes. - Monitor DB state in Uptime Kuma's SQLite is untouched at plan time; the sync CronJob is what reconciles state on each run. ## Flow ``` ingress annotation CronJob Python ------------------ -------------- (none) --> url = https://host/ codes = lenient external-monitor-path --> url = https://host<path> codes = strict ['200-299'] ^^ "/api/healthz" https://host/api/healthz codes = ['200-299'] existing monitor + drifted target url --> api.edit_monitor(id, url=..., accepted_statuscodes=...) ``` ## Test Plan ### Automated - `terraform fmt -check -recursive stacks/uptime-kuma` — exit 0. - `scripts/tg plan` on `stacks/uptime-kuma` — `Plan: 0 to add, 1 to change, 0 to destroy`. The single in-place change is the CronJob command (Python heredoc re-rendered). No other resources drift. - Embedded Python compiles: extracted the `PYEOF` block and ran `python3 -m py_compile` — OK. ### Manual Verification 1. Annotate an ingress: `kubectl annotate ingress/<name> -n <ns> uptime.viktorbarzin.me/external-monitor-path=/api/healthz` 2. Trigger sync early: `kubectl -n uptime-kuma create job --from=cronjob/external-monitor-sync external-monitor-sync-manual` 3. Expected log line: `Updating monitor [External] <name>: https://host/ -> https://host/api/healthz (codes ['200-299','300-399','400-499'] -> ['200-299'])` 4. Inspect monitor in Uptime Kuma UI: URL and accepted status codes reflect the annotation. 5. Final summary line includes updated count: `Sync complete: 0 created, 1 updated, 0 deleted, N unchanged`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 22:06:23 +00:00
Claude Agent	842646ea4f	[ci skip] e2e: test commit from claude-agent-service	2026-04-17 22:03:50 +00:00
Viktor Barzin	65b0f30d5e	[docs] Update anti-AI and rybbit docs after rewrite-body removal - Anti-AI: 5-layer → 3 active layers (bot-block, X-Robots-Tag, tarpit) - Layer 3 (trap links via rewrite-body) removed — Yaegi v3 incompatible - Rybbit analytics now injected via Cloudflare Worker (HTMLRewriter) - strip-accept-encoding middleware removed from all references Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 21:43:13 +00:00
Viktor Barzin	4117809a54	[rybbit] Deploy Cloudflare Worker for analytics injection Replaces the broken Traefik rewrite-body plugin with a Cloudflare Worker using HTMLRewriter to inject the rybbit tracking script into HTML responses at the CDN edge. - Wildcard route: .viktorbarzin.me/ covers all proxied services - 28 services have explicit site ID mappings - Unmapped hosts pass through without injection - Zero Traefik dependency, zero performance impact Closes: code-sed Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 21:26:16 +00:00
Viktor Barzin	498e7f3305	[uptime-kuma] Fix duplicate monitor creation + clean up down monitors ## Duplicate bug fix The external-monitor-sync deduped targets by hostname (`host in seen`) but multiple ingresses can share the same hostname. Changed to dedupe by final monitor name (`f"{PREFIX}{label}" in seen`) — prevents creating duplicate [External] monitors on every sync run. This caused 90 duplicates. ## Monitor cleanup Deleted 118 monitors total: - 90 duplicate [External] monitors (kept lower ID of each pair) - 14 paused internal monitors for decommissioned services - 14 external monitors for non-existent, scaled-down, or non-HTTP services (xray-vless, complaints, hermes-agent, etc.) ## Opt-outs Added `uptime.viktorbarzin.me/external-monitor=false` annotation to ingresses that shouldn't have external HTTP monitors: xray (non-HTTP protocol), council-complaints, hermes-agent, task-webhook, torrserver, www (no CF DNS). 329 monitors → ~210 monitors. Zero down monitors expected. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 21:12:31 +00:00
Viktor Barzin	5319f03ebc	[storage] Fix owntracks + wealthfolio: switch to encrypted PVCs Some checks failed Build Custom DIUN Image / build (push) Has been cancelled Details Deploy Post-Mortems to GitHub Pages / deploy (push) Has been cancelled Details Both services were running against empty unencrypted PVCs after the proxmox-lvm-encrypted migration. Data copied from old Released PVs via LUKS-unlock on PVE host, deployments switched to encrypted PVCs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 20:29:57 +00:00
Viktor Barzin	e51bdb2af8	Add broker-sync Terraform stack (#7 ) * [f1-stream] Remove committed cluster-admin kubeconfig ## Context A kubeconfig granting cluster-admin access was accidentally committed into the f1-stream stack's application bundle in `c7c7047f` (2026-02-22). It contained the cluster CA certificate plus the kubernetes-admin client certificate and its RSA private key. Both remotes (github.com, forgejo) are public, so the credential has been reachable for ~2 months. Grep across the repo confirms no .tf / .hcl / .sh / .yaml file references this path; the file is a stray local artifact, likely swept in during a bulk `git add`. ## This change - git rm stacks/f1-stream/files/.config ## What is NOT in this change - Cluster-admin cert rotation on the control plane. The leaked client cert must be invalidated separately via `kubeadm certs renew admin.conf` or CA regeneration. Tracked in the broader secrets-remediation plan. - Git-history rewrite. The file is still reachable in every commit since `c7c7047f`. A `git filter-repo --path ... --invert-paths` pass against a fresh mirror is planned and will be force-pushed to both remotes. ## Test plan ### Automated No tests needed for a file removal. Sanity: $ grep -rn 'f1-stream/files/\.config' --include='.tf' --include='.hcl' \ --include='.yaml' --include='.yml' --include='.sh' (no output) ### Manual Verification 1. `git show HEAD --stat` shows exactly one path deleted: stacks/f1-stream/files/.config \| 19 ------------------- 2. `test ! -e stacks/f1-stream/files/.config` returns true. 3. A copy of the leaked file is at /tmp/leaked.conf for post-rotation verification (confirming `kubectl --kubeconfig /tmp/leaked.conf get ns` fails with 401/403 once the admin cert is renewed). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> [frigate] Remove orphan config.yaml with leaked RTSP passwords ## Context A Frigate configuration file was added to modules/kubernetes/frigate/ in `bcad200a` (2026-04-15, ~2 days ago) as part of a bulk `chore: add untracked stacks, scripts, and agent configs` commit. The file contains 14 inline rtsp://admin:<password>@<host>:554/... URLs, leaking two distinct RTSP passwords for the cameras at 192.168.1.10 (LAN-only) and valchedrym.ddns.net (confirmed reachable from public internet on port 554). Both remotes are public, so the creds have been exposed for ~2 days. Grep across the repo confirms nothing references this config.yaml — the active stacks/frigate/main.tf stack reads its configuration from a persistent volume claim named `frigate-config-encrypted`, not from this file. The file is therefore an orphan from the bulk add, with no production function. ## This change - git rm modules/kubernetes/frigate/config.yaml ## What is NOT in this change - Camera password rotation. The user does not own the cameras; rotation must be coordinated out-of-band with the camera operators. The DDNS camera (valchedrym.ddns.net:554) is internet-reachable, so the leaked password is high-priority to rotate from the device side. - Git-history rewrite. The file plus its leaked strings remain in all commits from `bcad200a` forward. Scheduled to be purged via `git filter-repo --path modules/kubernetes/frigate/config.yaml --invert-paths --replace-text <list>` in the broader remediation pass. - Future Frigate config provisioning. If the stack is re-platformed to source config from Git rather than the PVC, the replacement should go through ExternalSecret + env-var interpolation, not an inline YAML. ## Test plan ### Automated $ grep -rn 'frigate/config\.yaml' --include='.tf' --include='.hcl' \ --include='.yaml' --include='.yml' --include='.sh' (no output — confirms orphan status) ### Manual Verification 1. `git show HEAD --stat` shows exactly one deletion: modules/kubernetes/frigate/config.yaml \| 229 --------------------------------- 2. `test ! -e modules/kubernetes/frigate/config.yaml` returns true. 3. `kubectl -n frigate get pvc frigate-config-encrypted` still shows the PVC bound (unaffected by this change). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> [setup-tls-secret] Delete deprecated renew.sh with hardcoded Technitium token ## Context modules/kubernetes/setup_tls_secret/renew.sh is a 2.5-year-old expect(1) script for manual Let's Encrypt wildcard-cert renewal via Technitium DNS TXT-record challenges. It hardcodes a 64-char Technitium API token on line 7 (as an expect variable) and line 27 (inside a certbot-cleanup heredoc). Both remotes are public, so the token has been exposed for ~2.5 years. The script is not invoked by the module's Terraform (main.tf only creates a kubernetes.io/tls Secret from PEM files); it is a standalone run-it-yourself tool. grep across the repo confirms nothing references `renew.sh` — neither the 20+ stacks that consume the `setup_tls_secret` module, nor any CI pipeline, nor any shell wrapper. A replacement script `renew2.sh` (4 weeks old) lives alongside it. It sources the Technitium token from `$TECHNITIUM_API_KEY` env var and also supports Cloudflare DNS-01 challenges via `$CLOUDFLARE_TOKEN`. It is the current renewal path. ## This change - git rm modules/kubernetes/setup_tls_secret/renew.sh ## What is NOT in this change - Technitium token rotation. The leaked token still works against `technitium-web.technitium.svc.cluster.local:5380` until revoked in the Technitium admin UI. Rotation is a prerequisite for the upcoming git-history scrub, which will remove the token from every commit via `git filter-repo --replace-text`. - renew2.sh is retained as-is (already env-var-sourced; clean). - The setup_tls_secret module's main.tf is not touched; 20+ consuming stacks keep working. ## Test plan ### Automated $ grep -rn 'renew\.sh' --include='.tf' --include='.hcl' \ --include='.yaml' --include='.yml' --include='.sh' (no output — confirms no consumer) $ git grep -n 'e28818f309a9ce7f72f0fcc867a365cf5d57b214751b75e2ef3ea74943ef23be' (no output in HEAD after this commit) ### Manual Verification 1. `git show HEAD --stat` shows exactly one deletion: modules/kubernetes/setup_tls_secret/renew.sh \| 136 --------- 2. `test ! -e modules/kubernetes/setup_tls_secret/renew.sh` returns true. 3. `renew2.sh` still exists and is executable: ls -la modules/kubernetes/setup_tls_secret/renew2.sh 4. Next cert-renewal run uses renew2.sh with env-var-sourced token; no behavioral regression because renew.sh was never part of the automated flow. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> [monitoring] Delete orphan server-power-cycle/main.sh with iDRAC default creds ## Context stacks/monitoring/modules/monitoring/server-power-cycle/main.sh is an old shell implementation of a power-cycle watchdog that polled the Dell iDRAC on 192.168.1.4 for PSU voltage. It hardcoded the Dell iDRAC default credentials (root:calvin) in 5 `curl -u root:calvin` calls. Both remotes are public, so those credentials — and the implicit statement that 'this host has not rotated the default BMC password' — have been exposed. The current implementation is main.py in the same directory. It reads iDRAC credentials from the environment variables `idrac_user` and `idrac_password` (see module's iDRAC_USER_ENV_VAR / iDRAC_PASSWORD_ENV_VAR constants), which are populated from Vault via ExternalSecret at runtime. main.sh is not referenced by any Terraform, ConfigMap, or deploy script — grep confirms no `file()` / `templatefile()` / `filebase64()` call loads it, and no hand-rolled shell wrapper invokes it. ## This change - git rm stacks/monitoring/modules/monitoring/server-power-cycle/main.sh main.py is retained unchanged. ## What is NOT in this change - iDRAC password rotation on 192.168.1.4. The BMC should be moved off the vendor default `calvin` regardless; rotation is tracked in the broader remediation plan and in the iDRAC web UI. - A separate finding in stacks/monitoring/modules/monitoring/idrac.tf (the redfish-exporter ConfigMap has `default: username: root, password: calvin` as a fallback for iDRAC hosts not explicitly listed) is NOT addressed here — filed as its own task so the fix (drop the default block vs. source from env) can be considered in isolation. - Git-history scrub of main.sh is pending the broader filter-repo pass. ## Test plan ### Automated $ grep -rn 'server-power-cycle/main\.sh\\|main\.sh' \ --include='.tf' --include='.hcl' --include='.yaml' \ --include='.yml' --include='.sh' (no consumer references) ### Manual Verification 1. `git show HEAD --stat` shows only the one deletion. 2. `test ! -e stacks/monitoring/modules/monitoring/server-power-cycle/main.sh` 3. `kubectl -n monitoring get deploy idrac-redfish-exporter` still shows the exporter running — unrelated to this file. 4. main.py continues to run its watchdog loop without regression, because it was never coupled to main.sh. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> [tls] Move 3 outlier stacks from per-stack PEMs to root-wildcard symlink ## Context foolery, terminal, and claude-memory each had their own `stacks/<x>/secrets/` directory with a plaintext EC-256 private key (privkey.pem, 241 B) and matching TLS certificate (fullchain.pem, 2868 B) for .viktorbarzin.me. The 92 other stacks under stacks/ symlink `secrets/` → `../../secrets`, which resolves to the repo-root /secrets/ directory covered by the `secrets/* filter=git-crypt` .gitattributes rule — i.e., every other stack consumes the same git-crypt-encrypted root wildcard cert. The 3 outliers shipped their keys in plaintext because `.gitattributes` secrets/** rule matches only repo-root /secrets/, not stacks//secrets/. Both remotes are public, so the 6 plaintext PEM files have been exposed for 1–6 weeks (commits `5a988133` 2026-03-11, `a6f71fc6` 2026-03-18, `9820f2ce` 2026-04-10). Verified: - Root wildcard cert subject = CN viktorbarzin.me, SAN .viktorbarzin.me + viktorbarzin.me — covers the 3 subdomains. - Root privkey + fullchain are a valid key pair (pubkey SHA256 match). - All 3 outlier certs have the same subject/SAN as root; different distinct cert material but equivalent coverage. ## This change - Delete plaintext PEMs in all 3 outlier stacks (6 files total). - Replace each stacks/<x>/secrets directory with a symlink to ../../secrets, matching the fleet pattern. - Add `stacks//secrets/ filter=git-crypt diff=git-crypt` to .gitattributes as a regression guard — any future real file placed under stacks/<x>/secrets/ gets git-crypt-encrypted automatically. setup_tls_secret module (modules/kubernetes/setup_tls_secret/main.tf) is unchanged. It still reads `file("${path.root}/secrets/fullchain.pem")`, which via the symlink resolves to the root wildcard. ## What is NOT in this change - Revocation of the 3 leaked per-stack certs. Backed up the leaked PEMs to /tmp/leaked-certs/ for `certbot revoke --reason keycompromise` once the user's LE account is authenticated. Revocation must happen before or alongside the history-rewrite force-push to both remotes. - Git-history scrub. The leaked PEM blobs are still reachable in every commit from 2026-03-11 forward. Scheduled for removal via `git filter-repo --path stacks/<x>/secrets/privkey.pem --invert-paths` (and fullchain.pem for each stack) in the broader remediation pass. - cert-manager introduction. The fleet does not use cert-manager today; this commit matches the existing symlink-to-wildcard pattern rather than introducing a new component. ## Test plan ### Automated $ readlink stacks/foolery/secrets ../../secrets (likewise for terminal, claude-memory) $ for s in foolery terminal claude-memory; do openssl x509 -in stacks/$s/secrets/fullchain.pem -noout -subject done subject=CN = viktorbarzin.me (x3 — all resolve via symlink to root wildcard) $ git check-attr filter -- stacks/foolery/secrets/fullchain.pem stacks/foolery/secrets/fullchain.pem: filter: git-crypt (now matched by the new rule, though for the symlink target the repo-root rule already applied) ### Manual Verification 1. `terragrunt plan` in stacks/foolery, stacks/terminal, stacks/claude-memory shows only the K8s TLS secret being re-created with the root-wildcard material. No ingress changes. 2. `terragrunt apply` for each stack → `kubectl -n <ns> get secret <name>-tls -o yaml` → tls.crt decodes to CN viktorbarzin.me with the root serial (different from the pre-change per-stack serials). 3. `curl -v https://foolery.viktorbarzin.me/` (and likewise terminal, claude-memory) → cert chain presents the new serial, handshake OK. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Add broker-sync Terraform stack (pending apply) Context ------- Part of the broker-sync rollout — see the plan at ~/.claude/plans/let-s-work-on-linking-temporal-valiant.md and the companion repo at ViktorBarzin/broker-sync. This change ----------- New stack `stacks/broker-sync/`: - `broker-sync` namespace, aux tier. - ExternalSecret pulling `secret/broker-sync` via vault-kv ClusterSecretStore. - `broker-sync-data-encrypted` PVC (1Gi, proxmox-lvm-encrypted, auto-resizer) — holds the sync SQLite db, FX cache, Wealthfolio cookie, CSV archive, watermarks. - Five CronJobs (all under `viktorbarzin/broker-sync:<tag>`, public DockerHub image; no pull secret): * `broker-sync-version` — daily 01:00 liveness probe (`broker-sync version`), used to smoke-test each new image. * `broker-sync-trading212` — daily 02:00 `broker-sync trading212 --mode steady`. * `broker-sync-imap` — daily 02:30, SUSPENDED (Phase 2). * `broker-sync-csv` — daily 03:00, SUSPENDED (Phase 3). * `broker-sync-fx-reconcile` — 7th of month 05:05, SUSPENDED (Phase 1 tail). - `broker-sync-backup` — daily 04:15, snapshots /data into NFS `/srv/nfs/broker-sync-backup/` with 30-day retention, matches the convention in infra/.claude/CLAUDE.md §3-2-1. NOT in this commit: - Old `wealthfolio-sync` CronJob retirement in stacks/wealthfolio/main.tf — happens in the same commit that first applies this stack, per the plan's "clean cutover" decision. - Vault seed. `secret/broker-sync` must be populated before apply; required keys documented in the ExternalSecret comment block. Test plan --------- ## Automated - `terraform fmt` — clean (ran before commit). - `terraform validate` needs `terragrunt init` first; deferred to apply time. ## Manual Verification 1. Seed Vault `secret/broker-sync/` (see comment block on the ExternalSecret in main.tf). 2. `cd stacks/broker-sync && scripts/tg apply`. 3. `kubectl -n broker-sync get cronjob` — expect 6 CJs, 3 suspended. 4. `kubectl -n broker-sync create job smoke --from=cronjob/broker-sync-version`. 5. `kubectl -n broker-sync logs -l job-name=smoke` — expect `broker-sync 0.1.0`. fix(beads-server): disable Authentik + CrowdSec on Workbench Authentik forward-auth returns 400 for dolt-workbench (no Authentik application configured for this domain). CrowdSec bouncer also intermittently returns 400. Both disabled — Workbench is accessible via Cloudflare tunnel only. TODO: Create Authentik application for dolt-workbench.viktorbarzin.me Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 21:17:45 +01:00
Viktor Barzin	7a884a0b97	[monitoring] Fix alerts for intentionally scaled-down services PoisonFountainDown and ForwardAuthFallbackActive both fired because poison-fountain was scaled to 0 replicas (intentional). Updated both alert expressions to check kube_deployment_spec_replicas > 0 before alerting on missing available replicas — if desired replicas is 0, the service is intentionally down and should not alert. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 19:17:41 +00:00
Viktor Barzin	a19581e32b	fix(beads-server): fix Workbench timeout — use internal GraphQL URL GRAPHQLAPI_URL must point to localhost:9002 (internal), not the external URL which goes through Authentik. SSR can't authenticate to Authentik. Also removed Authentik from /graphql ingress — browser fetch() can't follow 302 redirects on POST requests. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 19:05:47 +00:00
Viktor Barzin	da6b82ed5c	fix(beads-server): persist GRAPHQLAPI_URL in Terraform The env var was only set via kubectl and got overwritten on next apply. Now permanently in the deployment spec. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 18:58:59 +00:00
Viktor Barzin	afb8a16623	[infra] Scale down unused services + remove DoH ingress Scale to 0 replicas: - ollama: low usage, saves ~2Gi memory + 59GB NFS-SSD model data idle - poison-fountain: RSS link archiver, not actively used - travel-blog: Hugo blog, not actively used Remove technitium DoH ingress (dns.viktorbarzin.me): externally unreachable and unused. DNS is served on UDP/TCP port 53 via LoadBalancer (10.0.20.201). Clears 3 of 5 ExternalAccessDivergence services. Remaining 2 (pdf, travel) should clear now that the Uptime Kuma monitors will report both down. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 18:55:52 +00:00
Viktor Barzin	cdc851fc63	[alerts] Fix status-page-pusher crash + Prometheus backup push ## status-page-pusher (ExternalAccessDivergence false positive) The pusher was crashing with `AttributeError: 'list' object has no attribute 'get'` at line 122 — the uptime-kuma-api library changed the heartbeats return format. Fixed by making beat flattening more robust: handle any nesting of lists/dicts in the heartbeat data, and add isinstance check before calling `.get()` on the latest beat. ## Prometheus backup (PrometheusBackupNeverRun) The backup sidecar's Pushgateway push was silently failing because `wget --post-file=-` needs `--header="Content-Type: text/plain"` for Pushgateway to accept the Prometheus exposition format. Added the header. Also manually pushed the metric to clear the `absent()` alert immediately. Note: ExternalAccessDivergence still fires because 5 services (ollama, pdf, poison, dns, travel) ARE genuinely externally unreachable but internally up. This is a real issue (likely Cloudflare tunnel routing) not a false positive. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 18:29:43 +00:00
Viktor Barzin	eef4242408	fix(beads-server): auto-connect Workbench to Dolt on startup The Workbench's database connection is in-memory and lost on pod restart. Added startup script that waits for GraphQL server readiness, then calls addDatabaseConnection mutation automatically. No more manual reconnection. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 18:12:31 +00:00
Viktor Barzin	5e9e487661	feat(setup-project): auto-PR working Dockerfiles back to upstream ## Context The setup-project skill treats "build from a Dockerfile" as priority 6 — "last resort, avoid if possible" — with no formalized path for apps whose upstream lacks a working Dockerfile. When we end up writing one to get the deploy green, that Dockerfile stays private in the infra repo and upstream never benefits. ## This change Adds a closed-loop flow: when we author a new Dockerfile (or fix a broken upstream one) and the deploy is healthy for 10 minutes, auto-open a PR against the upstream repo so the self-hosting community gets the working recipe. Flow: 1. Classify dockerfile_state during research phase (image-used / used-as-is / fixed-broken-upstream / written-from-scratch). Persist to modules/kubernetes/<service>/.contribution-state.json. 2. After Terraform apply, run scripts/stability-gate.sh — polls pod Ready + HTTP 200 every 30s x 20 iterations, requires 18/20 successes. 3. On pass with a trigger state, scripts/contribute-dockerfile.sh does the GitHub API dance: fork → merge-upstream → branch → commit Dockerfile / .dockerignore / BUILD.md via Contents API → open PR with body rendered from templates/PR_BODY.md. Idempotent (skips on recorded PR URL, existing fork, existing branch, open PR, upstream landed a Dockerfile mid-deploy). GitHub API via curl (gh CLI is sandbox-blocked per .claude/CLAUDE.md); token pulled from Vault (`secret/viktor` → `github_pat`). Commits include Signed-off-by for DCO-enforcing repos. Fork branch name is `add-dockerfile` for written-from-scratch or `fix-dockerfile` for fixed-broken-upstream, with timestamp suffix on collision. ## Files - SKILL.md — state classification table, quality bar checklist, §8b stability gate, §10 contribute-upstream step, checklist updates - scripts/stability-gate.sh — 10-minute health probe - scripts/contribute-dockerfile.sh — GitHub API orchestrator - templates/PR_BODY.md — `{{VAR}}` placeholder template for PR description - templates/Dockerfile.README.md — BUILD.md template shipped with the PR ## What is NOT in this change - No Woodpecker / GHA changes (skill-local flow). - No auto-tracking of merge/reject outcomes upstream (manual follow-up). - Not yet exercised end-to-end; first real-world run will validate the API dance. Plan to dry-run against a throwaway sink repo before pointing at a real upstream. ## Test Plan ### Automated - bash -n on both scripts → pass - Manual read-through of SKILL.md — step numbering coherent, existing §1-9 untouched semantics, new §8b/§10 reference real files ### Manual Verification 1. Next time setup-project onboards a Dockerfile-less app: - Confirm .contribution-state.json is written with `written-from-scratch` - Run stability-gate.sh — expect 18/20 passes on a healthy deploy - Run contribute-dockerfile.sh — expect a fork + branch + PR on ViktorBarzin - Verify contribution_pr_url is back-written to the state file 2. Re-run contribute-dockerfile.sh → must be a no-op (idempotent) 3. Upstream-archived case: manually archive a test upstream → re-run → expect SKIP, no PR created [ci skip] Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 18:12:13 +00:00
Viktor Barzin	1860cd1dfb	state(vault): update encrypted state	2026-04-17 14:14:05 +00:00
Viktor Barzin	f0ddfb8cae	state(dbaas): update encrypted state	2026-04-17 14:08:49 +00:00
Viktor Barzin	b034c868db	[traefik] Remove broken rewrite-body plugin and all rybbit/anti-AI injection The rewrite-body Traefik plugin (both packruler/rewrite-body v1.2.0 and the-ccsn/traefik-plugin-rewritebody v0.1.3) silently fails on Traefik v3.6.12 due to Yaegi interpreter issues with ResponseWriter wrapping. Both plugins load without errors but never inject content. Removed: - rewrite-body plugin download (init container) and registration - strip-accept-encoding middleware (only existed for rewrite-body bug) - anti-ai-trap-links middleware (used rewrite-body for injection) - rybbit_site_id variable from ingress_factory and reverse_proxy factory - rybbit_site_id from 25 service stacks (39 instances) - Per-service rybbit-analytics middleware CRD resources Kept: - compress middleware (entrypoint-level, working correctly) - ai-bot-block middleware (ForwardAuth to bot-block-proxy) - anti-ai-headers middleware (X-Robots-Tag: noai, noimageai) - All CrowdSec, Authentik, rate-limit middleware unchanged Next: Cloudflare Workers with HTMLRewriter for edge-side injection. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 12:41:17 +00:00
Viktor Barzin	b24545ffdb	fix(beads-server): fix BeadBoard project ID + install bd binary - Fixed project_id mismatch (was "beadboard", should be actual DB project ID) - Rebuilt Docker image with bd v1.0.2 binary (node:20-slim for glibc compat) - Ran bd migrate to update schema from 1.0.0 → 1.0.2 (adds started_at, etc.) - Task creation and bd CLI now work inside the container Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 11:57:45 +00:00
Viktor Barzin	f2037545b3	fix(beads-server): make BeadBoard .beads dir writable BeadBoard needs to create templates/ and archetypes/ subdirectories inside .beads/. ConfigMap mounts are read-only, causing ENOENT errors and 503 responses. Fix: init container copies ConfigMap to emptyDir. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 11:37:26 +00:00
Viktor Barzin	00e2f15a5d	feat(beads-server): deploy BeadBoard task visualization dashboard Add BeadBoard (zenchantlive/beadboard) alongside Dolt server and Workbench for task dependency graph, kanban, and agent coordination views. - Built custom Docker image (registry.viktorbarzin.me:5050/beadboard) - ConfigMap provides .beads/metadata.json pointing to Dolt server - Behind Authentik auth at beadboard.viktorbarzin.me - Also fixed: GraphQL ingress now has Authentik middleware - Also fixed: Workbench store.json type enum (mysql → Mysql) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 11:30:43 +00:00
Viktor Barzin	26abd8fe94	[skill] Add /disk-wear skill for periodic disk write analysis ## Context After the MySQL standalone migration + Technitium SQLite disable saved ~130 GB/day of disk writes, this methodology should be reusable for periodic health reviews. ## This change: Adds `/disk-wear` skill that combines three data sources: - SSH to PVE host for real-time 30s I/O snapshots and SSD SMART health - Prometheus PromQL for per-app write attribution (node_disk_written_bytes_total joined with node_disk_device_mapper_info for dm->LVM mapping) - kubectl for PVC UUID -> pod/namespace mapping Produces ranked breakdowns by physical disk, VM, k8s namespace, and individual PVC. Includes baselines, red flag detection, and annualized wear projections. Note: container_fs_writes_bytes_total has 0 series (cadvisor doesn't track block device writes per container), so per-app attribution uses the PVE host's dm-device level metrics mapped through Prometheus and kubectl. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 11:15:26 +00:00
Viktor Barzin	366e2ab083	[uptime-kuma] Opt-out external monitoring for every public ingress [ci skip] ## Context After the previous commit migrated monitor discovery to per-ingress annotation (opt-in via `uptime.viktorbarzin.me/external-monitor=true`), coverage expanded from 13 → 26 monitors but still left ~99 public ingresses uncovered — notably Helm-managed services (authentik, grafana, vault, forgejo, ntfy) that don't go through `ingress_factory`, plus any `dns_type = "non-proxied"` ingress (Immich was a direct victim: `dns_type = "non-proxied"` → no annotation added → no monitor → invisible outage). The user's concern: "I should have known external Immich was down before users tried to open it." ## This change Flipped the semantic from opt-in to opt-out by default: - Every ingress whose host ends in `.viktorbarzin.me` gets a `[External] <label>` monitor automatically - Only ingresses with annotation `uptime.viktorbarzin.me/external-monitor=false` are skipped - Host dedup via a `seen` set (one monitor per hostname, regardless of how many Ingress resources share it) ## Verification Triggered a manual CronJob run post-apply: ``` Sync complete: 102 created, 1 deleted, 23 unchanged ``` Coverage jumped from 26 → ~124 external monitors. All 6 Helm-managed services now have dedicated monitors: - [External] immich, authentik, forgejo, grafana, ntfy, vault ## Scope Only `stacks/uptime-kuma/modules/uptime-kuma/main.tf` (Python script in the CronJob resource). No RBAC or service account changes — the ones added in the previous commit still cover this path. ## Test plan ### Automated \`\`\` \$ kubectl -n uptime-kuma logs -l job-name=manual-sync-optout-1776422993 --tail=50 \| grep -iE 'immich\|authentik\|grafana\|forgejo\|vault\|ntfy' Creating monitor: [External] authentik -> https://authentik.viktorbarzin.me Creating monitor: [External] forgejo -> https://forgejo.viktorbarzin.me Creating monitor: [External] immich -> https://immich.viktorbarzin.me Creating monitor: [External] grafana -> https://grafana.viktorbarzin.me Creating monitor: [External] ntfy -> https://ntfy.viktorbarzin.me Creating monitor: [External] vault -> https://vault.viktorbarzin.me \`\`\` ### Manual Verification 1. Open `https://uptime.viktorbarzin.me` → confirm `[External] immich` exists 2. Simulate an Immich outage (scale deploy to 0 briefly) → external monitor should go red within the probe interval (5min); internal monitor stays up (pod-level from a different probe angle) → `ExternalAccessDivergence` alert fires after 15 min Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 11:12:00 +00:00
Viktor Barzin	66d2d9916b	[infra] Per-ingress external-monitor annotation + actualbudget plan-time fix [ci skip] ## Context Two operational gaps surfaced during a healthcheck sweep today: 1. External monitoring coverage: Only ~13 hostnames (via `cloudflare_proxied_names` in `config.tfvars`) had `[External]` monitors in Uptime Kuma. Any service deployed via `ingress_factory` with `dns_type = "proxied"` auto-created its DNS record but was NOT registered for external probing — so outages like Immich going down externally were invisible until a user complained. 99 of ~125 public ingresses had no external monitor. 2. actualbudget stack unplannable: `count = var.budget_encryption_password != null ? 1 : 0` in `factory/main.tf:152` failed with "Invalid count argument" because the value flows from a `data.kubernetes_secret` whose contents are `(known after apply)` at plan time. Blocked CI applies and drift reconciliation. ## This change ### Per-ingress external-monitor annotation (ingress_factory + reverse_proxy/factory) - New variables `external_monitor` (bool, nullable) + `external_monitor_name` (string, nullable). Default is "follow dns_type" — enabled for any public DNS record (`dns_type != "none"`, covers both proxied and non-proxied so Immich and other direct-A records are also monitored). - Emits two annotations on the Ingress: - `uptime.viktorbarzin.me/external-monitor = "true"` - `uptime.viktorbarzin.me/external-monitor-name = "<label>"` (optional override) ### external-monitor-sync CronJob (uptime-kuma stack) - Discovers targets from live Ingress objects via the K8s API first (filter by annotation), falls back to the legacy `external-monitor-targets` ConfigMap on any API error (zero rollout risk). - New `ServiceAccount` + cluster-wide `ClusterRole`/`ClusterRoleBinding` giving `list`/`get` on `networking.k8s.io/ingresses`. - `API_SERVER` now uses the `KUBERNETES_SERVICE_HOST` env var (always injected by K8s) instead of `kubernetes.default.svc` — the search-domain expansion failed in the CronJob pod's DNS config. Verified working: CronJob now logs `Loaded N external monitor targets (source=k8s-api)`. ### actualbudget count-on-unknown refactor - Replaced `count = var.budget_encryption_password != null ? 1 : 0` with two explicit plan-time booleans: `enable_http_api` and `enable_bank_sync`. Values are known at plan; no `-target` workaround needed. - Callers (`stacks/actualbudget/main.tf`) pass `true` explicitly. Runtime behaviour is unchanged — the secret is still consumed via env var. - Also aligned the factory with live state (the 3 budget-* PVCs had been migrated `proxmox-lvm` → `proxmox-lvm-encrypted` outside Terraform): PVC resource renamed `data_proxmox` → `data_encrypted`, storage class updated, orphaned `nfs_data` module removed. State was rm'd + re-imported with matching UIDs, so no data was moved. ## Rollout status (already partially applied in this session) - `stacks/uptime-kuma` applied — SA + RBAC + CronJob changes live; FQDN fix verified - `stacks/actualbudget` applied — budget-{viktor,anca,emo} all 200 OK externally - `stacks/mailserver` + 21 other ingress_factory consumers applied — annotations live - CronJob `external-monitor-sync` latest run: `source=k8s-api`, 26 monitors active (was 13 on the central list) ## Deferred (separate work) - 4 stacks show pre-existing DESTRUCTIVE drift in plan (metallb namespace, claude-memory, rbac, redis) — NOT triggered by this commit but will be by CI's global-file cascade. `[ci skip]` here so those don't auto-apply; they will be fixed manually before the next CI push. - Cleanup of `cloudflare_proxied_names` list once Helm-managed ingresses (authentik, grafana, vault, forgejo) are annotated — separate PR. ## Test plan ### Automated \`\`\` \$ kubectl -n uptime-kuma logs \$(kubectl -n uptime-kuma get pods -l job-name -o name \| tail -1) Loaded 26 external monitor targets (source=k8s-api) Sync complete: 7 created, 0 deleted, 17 unchanged \$ curl -sk -o /dev/null -w "%{http_code}\n" -H "Accept: text/html" \\ https://dawarich.viktorbarzin.me/ https://nextcloud.viktorbarzin.me/ \\ https://budget-viktor.viktorbarzin.me/ 200 302 200 \$ kubectl -n actualbudget get deploy,pvc -l app=budget-viktor deployment.apps/budget-viktor 1/1 1 1 Ready persistentvolumeclaim/budget-viktor-data-encrypted Bound 10Gi RWO proxmox-lvm-encrypted \`\`\` ### Manual Verification 1. Confirm the annotation is present on an ingress_factory ingress: \`\`\` kubectl -n dawarich get ingress dawarich -o \\ jsonpath='{.metadata.annotations.uptime\.viktorbarzin\.me/external-monitor}' # Expected: "true" \`\`\` 2. Confirm the new `[External] <name>` monitor appears in Uptime Kuma within 10 min (CronJob interval). For Immich specifically, it will appear after the immich stack is re-applied. 3. Verify actualbudget plan is clean: \`\`\` cd stacks/actualbudget && scripts/tg plan --non-interactive # Expected: no "Invalid count argument" errors \`\`\` Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 10:34:32 +00:00
Viktor Barzin	0c4fe98d75	state(dbaas): update encrypted state	2026-04-17 10:08:04 +00:00
Viktor Barzin	996bdfc9b6	[technitium] Uninstall MySQL+SQLite query log plugins instead of just disabling ## Context Disabling MySQL/SQLite query logging via config was not durable — Technitium re-enables disabled plugins on pod restart, causing 46 GB/day of writes to the standalone MySQL (15M inserts to technitium.dns_logs between CronJob runs). ## This change: The password-sync CronJob now UNINSTALLS MySQL and SQLite query log plugins via `/api/apps/uninstall` instead of setting `enableLogging:false`. This is permanent — the plugin files are removed from the PVC, so they can't re-enable on restart. The CronJob checks if the plugins are present first (idempotent). Only PostgreSQL query logging remains (90-day retention). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 08:20:55 +00:00
Viktor Barzin	f0a73815d8	[freedify] Remove stale sed patches from container startup The audio-engine.js, dom.js, and dj.js files were refactored/removed in the upstream Freedify repo. The sed patches that disabled iOS EQ auto-init and visualizer no longer have targets, causing the container to crash on startup. Use the image's default CMD instead. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 06:17:13 +00:00
Viktor Barzin	f8facf44dd	[infra] Fix rewrite-body plugin + cleanup TrueNAS + version bumps ## Context The rewrite-body Traefik plugin (packruler/rewrite-body v1.2.0) silently broke on Traefik v3.6.12 — every service using rybbit analytics or anti-AI injection returned HTTP 200 with "Error 404: Not Found" body. Root cause: middleware specs referenced plugin name `rewrite-body` but Traefik registered it as `traefik-plugin-rewritebody`. Migrated to maintained fork `the-ccsn/traefik-plugin-rewritebody` v0.1.3 which uses the correct plugin name. Also added `lastModified = true` and `methods = ["GET"]` to anti-AI middleware to avoid rewriting non-HTML responses. ## This change - Replace packruler/rewrite-body v1.2.0 with the-ccsn/traefik-plugin-rewritebody v0.1.3 - Fix plugin name in all 3 middleware locations (ingress_factory, reverse-proxy factory, traefik anti-AI) - Remove deprecated TrueNAS cloud sync monitor (VM decommissioned 2026-04-13) - Remove CloudSyncStale/CloudSyncFailing/CloudSyncNeverRun alerts - Fix PrometheusBackupNeverRun alert (for: 48h → 32d to match monthly sidecar schedule) - Bump versions: rybbit v1.0.21→v1.1.0, wealthfolio v1.1.0→v3.2, networking-toolbox 1.1.1→1.6.0, cyberchef v10.24.0→v9.55.0 - MySQL standalone storage_limit 30Gi → 50Gi - beads-server: fix Dolt workbench type casing, remove Authentik on GraphQL endpoint Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 05:51:52 +00:00
Viktor Barzin	8b206a63ad	state(dbaas): update encrypted state	2026-04-16 22:55:52 +00:00
Viktor Barzin	4c8e5bea0b	[traefik] Add global compress middleware to fix response compression The rewrite-body plugin (rybbit analytics, anti-AI trap links) requires strip-accept-encoding to work, which killed HTTP compression for 50+ services. This adds Traefik's built-in compress middleware at the websecure entrypoint level to re-compress responses to clients after rewrite-body has modified them. Uses includedContentTypes whitelist (not excludedContentTypes) so only text-based types are compressed. SSE, WebSocket, gRPC, and binary downloads are unaffected. Measured improvement on ha-sofia: - app.js: 540KB → 167KB (3.2x) - core.js: 52KB → 19KB (2.7x) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 22:18:51 +00:00
Viktor Barzin	e80b2f026f	[infra] Migrate Terraform state from local SOPS to PostgreSQL backend Two-tier state architecture: - Tier 0 (infra, platform, cnpg, vault, dbaas, external-secrets): local state with SOPS encryption in git — unchanged, required for bootstrap. - Tier 1 (105 app stacks): PostgreSQL backend on CNPG cluster at 10.0.20.200:5432/terraform_state with native pg_advisory_lock. Motivation: multi-operator friction (every workstation needed SOPS + age + git-crypt), bootstrap complexity for new operators, and headless agents/CI needing the full encryption toolchain just to read state. Changes: - terragrunt.hcl: conditional backend (local vs pg) based on tier0 list - scripts/tg: tier detection, auto-fetch PG creds from Vault for Tier 1, skip SOPS and Vault KV locking for Tier 1 stacks - scripts/state-sync: tier-aware encrypt/decrypt (skips Tier 1) - scripts/migrate-state-to-pg: one-shot migration script (idempotent) - stacks/vault/main.tf: pg-terraform-state static role + K8s auth role for claude-agent namespace - stacks/dbaas: terraform_state DB creation + MetalLB LoadBalancer service on shared IP 10.0.20.200 - Deleted 107 .tfstate.enc files for migrated Tier 1 stacks - Cleaned up per-stack tiers.tf (now generated by root terragrunt.hcl) [ci skip] Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 19:33:12 +00:00
Viktor Barzin	f538115c43	[dbaas] Migrate MySQL from InnoDB Cluster to standalone StatefulSet ## Context Disk write analysis showed MySQL InnoDB Cluster writing ~95 GB/day for only ~35 MB of actual data due to Group Replication overhead (binlog, relay log, GR apply log). The operator enforces GR even with serverInstances=1. Bitnami Helm charts were deprecated by Broadcom in Aug 2025 — no free container images available. Using official mysql:8.4 image instead. ## This change: - Replace helm_release.mysql_cluster service selector with raw kubernetes_stateful_set_v1 using official mysql:8.4 image - ConfigMap mysql-standalone-cnf: skip-log-bin, innodb_flush_log_at_trx_commit=2, innodb_doublewrite=ON (re-enabled for standalone safety) - Service selector switched to standalone pod labels - Technitium: disable SQLite query logging (18 GB/day write amplification), keep PostgreSQL-only logging (90-day retention) - Grafana datasource and dashboards migrated from MySQL to PostgreSQL - Dashboard SQL queries fixed for PG integer division (::float cast) - Updated CLAUDE.md service-specific notes ## What is NOT in this change: - InnoDB Cluster + operator removal (Phase 4, 7+ days from now) - Stale Vault role cleanup (Phase 4) - Old PVC deletion (Phase 4) Expected write reduction: ~113 GB/day (MySQL 95 + Technitium 18) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 19:01:06 +00:00
Viktor Barzin	ef30f27ac9	state(dbaas): update encrypted state	2026-04-16 18:56:59 +00:00
Viktor Barzin	b6fc1e63a6	state(dbaas): import postgresql-lb service	2026-04-16 18:55:40 +00:00
Viktor Barzin	14fa2b9762	state(vault): update encrypted state	2026-04-16 18:43:06 +00:00
Viktor Barzin	1a42f750f8	state(dbaas): update encrypted state	2026-04-16 18:41:34 +00:00
Viktor Barzin	0a43b5c2ac	state(dbaas): update encrypted state	2026-04-16 18:31:33 +00:00
Viktor Barzin	cd513a2226	state(dbaas): update encrypted state	2026-04-16 18:24:31 +00:00
Viktor Barzin	0368601eff	state(dbaas): update encrypted state	2026-04-16 18:24:20 +00:00
Viktor Barzin	a237ac97e0	docs(upgrades): add bulk upgrade results from first production run 12 services upgraded in 30 min: audiobookshelf, owntracks, open-webui, immich, coturn, shlink, phpipam, onlyoffice, paperless-ngx, linkwarden, synapse, dawarich. Documents auto-rollback behavior, resource awareness (paperless memory bump), bulk upgrade procedure, and rate limit reset. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 17:34:27 +00:00
Viktor Barzin	39b5ed04a7	upgrade: dawarich 0.37.1 -> 1.6.1 (fix entrypoint + add production env) ## Context Version 1.3.0+ changed the recommended command from `bin/dev` (development) to `bin/rails server -p 3000 -b ::` (production). Also requires RAILS_ENV=production, SECRET_KEY_BASE, and RAILS_LOG_TO_STDOUT env vars. ## This change - Command: `bin/dev` → `bin/rails server -p 3000 -b ::` - Add RAILS_ENV=production - Add SECRET_KEY_BASE (stored in Vault secret/dawarich, synced via ESO) - Add RAILS_LOG_TO_STDOUT=true ## What happened 1. Initial upgrade applied version 1.6.1 — DB migrations ran but pod CrashLooped due to wrong entrypoint (bin/dev exits in production mode) 2. Rollback to 0.37.1 failed because 1.6.1 migrations already ran (ActiveRecord::UnknownPrimaryKey on rails_pulse_routes) 3. Rolled forward with corrected entrypoint + env vars 4. Service now stable: 20/20 health checks passed over 5 minutes Co-Authored-By: Service Upgrade Agent <noreply@viktorbarzin.me>	2026-04-16 17:25:29 +00:00
Viktor Barzin	8bd2ace00d	state(technitium): update encrypted state	2026-04-16 17:21:06 +00:00
Viktor Barzin	7680d4e009	state(dawarich): update encrypted state	2026-04-16 17:19:29 +00:00
Viktor Barzin	59e99f2a3a	upgrade: paperless-ngx increase memory 1Gi -> 2Gi v2.20.14 OOMKills at 1Gi during search index rebuild on upgrade. Bumped to 2Gi request=limit to handle startup index operations. Co-Authored-By: Service Upgrade Agent <noreply@viktorbarzin.me>	2026-04-16 17:16:23 +00:00
Viktor Barzin	611c67b92c	state(paperless-ngx): update encrypted state	2026-04-16 17:10:57 +00:00
Viktor Barzin	5f1b14ad53	upgrade: dawarich re-apply 1.6.1 (forward-fix after failed rollback) DB migrations from 1.6.1 already ran, making 0.37.1 incompatible (ActiveRecord::UnknownPrimaryKey on rails_pulse_routes table). Rolling forward is the correct path. Co-Authored-By: Service Upgrade Agent <noreply@viktorbarzin.me>	2026-04-16 17:08:20 +00:00
Viktor Barzin	e9275534b6	state(dawarich): update encrypted state	2026-04-16 17:08:15 +00:00
Viktor Barzin	1f589a403c	state(dawarich): update encrypted state	2026-04-16 17:04:44 +00:00
Viktor Barzin	f5883be981	Revert "upgrade: dawarich 0.37.1 -> 1.6.1" This reverts commit `ec8b4dbaac`.	2026-04-16 17:04:06 +00:00
Viktor Barzin	178fc4b398	state(matrix): update encrypted state	2026-04-16 17:01:28 +00:00
Viktor Barzin	449f1af9d6	state(immich): update encrypted state	2026-04-16 17:00:59 +00:00

1 2 3 4 5 ...

2794 commits