Commit graph

1638 commits

Author SHA1 Message Date
Viktor Barzin
695e020111 cloudflared: move bridge removed{} to stack root — removed blocks are root-module-only
Some checks failed
ci/woodpecker/push/default Pipeline failed
Pipeline 461 failed at terraform init: the removed{} handoff block sat in
the stack-local module, but Terraform only allows removed blocks in the
root module. Same intent, correct position (from =
module.cloudflared.cloudflare_record.bridge_pages, destroy=false).
Without this the stale state entry would make the next cloudflared
apply destroy the record valia-sites now owns.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 12:31:53 +00:00
Viktor Barzin
8b80b4cc41 valia-sites: registry stack for Valia's Pages sites + declarative internal DNS (ADR-0018)
Some checks failed
Build valia-sites-sync / build (push) Waiting to run
ci/woodpecker/push/default Pipeline failed
Valia keeps asking Viktor to host 1-page sites from her Drive folders;
this makes it one map entry. New stacks/valia-sites: per site a CF Pages
project + custom domain + proxied CNAME (bridge adopted via import{}),
a ConfigMap feed (valia-sites-dns) from which the technitium
ingress-dns-sync script now reconciles internal CNAMEs (add/update/REMOVE,
fixing the add-only stale-record gotcha), and one shared 10-min CronJob that
mirrors each Content folder (rclone, drive.readonly, stem95su's guards)
and wrangler-deploys ONLY on manifest change (free-tier deploy cap).
Scoped CF Pages token + shared rclone conf in secret/valia-sites; the
Global API Key never enters a pod. cloudflared forgets bridge's record
via removed{} (no destroy). stem95su is in the map dns-parked
(manage_dns=false) until its cutover commit.
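
A minimal sketch of the add/update/remove reconcile described above, in Python rather than the sync script's own language. The function name `reconcile` and the dict-of-CNAMEs shape are assumptions for illustration; the record values reuse names that appear elsewhere in this log. A real script would presumably remove only records it manages itself.

```python
def reconcile(desired: dict[str, str], actual: dict[str, str]):
    """Compute the changes that make `actual` match `desired`.

    Both arguments map a record name to its CNAME target.
    Returns (to_add, to_update, to_remove).
    """
    # Names in the feed that do not exist yet.
    to_add = {n: t for n, t in desired.items() if n not in actual}
    # Names that exist but point at the wrong target.
    to_update = {
        n: t for n, t in desired.items() if n in actual and actual[n] != t
    }
    # Names that exist but are no longer in the feed. An add-only sync
    # skips this step, which is how stale records accumulate.
    to_remove = sorted(n for n in actual if n not in desired)
    return to_add, to_update, to_remove


# Illustrative state: the feed lists only 'bridge', the zone still has 'most'.
desired = {"bridge.viktorbarzin.me": "bridge-cv2.pages.dev"}
actual = {"most.viktorbarzin.me": "most-6if.pages.dev"}
to_add, to_update, to_remove = reconcile(desired, actual)
```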

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 12:28:06 +00:00
Viktor Barzin
e1bd111562 rename CF Pages site most.viktorbarzin.me -> bridge.viktorbarzin.me
All checks were successful
ci/woodpecker/push/default Pipeline was successful
Viktor asked to rename the 'мост' school static site to 'bridge'.
New Cloudflare Pages project 'bridge' (bridge-cv2.pages.dev) already
deployed and the custom domain attached; this renames the public CNAME
(TF resource most_pages -> bridge_pages, destroy+create swaps the
record) and the internal split-horizon static CNAME in the
ingress-dns-sync CronJob. The old 'most' Pages project and the stale
internal 'most' record are removed out-of-band after this applies.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 10:52:30 +00:00
Viktor Barzin
7dd80b6c7c technitium: mirror most.viktorbarzin.me into the internal zone (CF Pages site)
All checks were successful
ci/woodpecker/push/default Pipeline was successful
The internal split-horizon zone is authoritative for viktorbarzin.me,
so the new Cloudflare Pages site (most.viktorbarzin.me, added for
Viktor's 'мост' school static site) NXDOMAINed for every internal
client — LAN, VLANs and pods — while resolving fine externally.
Per the superset rule, add it as a static CNAME (-> most-6if.pages.dev)
in the ingress-dns-sync CronJob next to the mail-auth records, and
document the off-infra-site case in dns.md.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 10:10:46 +00:00
Viktor Barzin
217a54be9d cloudflared: add most.viktorbarzin.me CNAME for Cloudflare Pages site
All checks were successful
ci/woodpecker/push/default Pipeline was successful
Viktor asked to host a static HTML site (the 'мост' school project,
ОбУ „Отец Паисий", pulled from his Google Drive) on Cloudflare Pages
with a custom domain, as a try-out of Pages hosting. The site content
is deployed off-infra via wrangler to the Pages project 'most'
(most-6if.pages.dev); this CNAME points most.viktorbarzin.me at it.
The custom domain is already attached to the Pages project and is
waiting on this DNS record to validate.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 10:06:33 +00:00
Viktor Barzin
08fb65827c tripit: set PLACE_PHOTO_PROVIDER=wikipedia — real place preview photos
All checks were successful
ci/woodpecker/push/default Pipeline was successful
Viktor asked for place photos on the tripit Trip board. The app-side
work (add-time photo fetch, board place cards) shipped in tripit
v0.106.0, but prod never set PLACE_PHOTO_PROVIDER, so the fake provider
would store placeholder PNGs for every hand-added place. Same class of
fake-default gap as PLACE_RESOLVER_MODE (set explicitly for the same
reason); the ADR-0035 rollout had left both the env flip and its
backfill cron undone.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 21:57:21 +00:00
Viktor Barzin
248e186dce CCTV segment (dCCTV 10.0.30.0/24) on a dedicated pfSense leg for the garage camera
All checks were successful
ci/woodpecker/push/default Pipeline was successful
Viktor and emo are adding the first owned camera at the Sofia site (HiLook
IPC-T241H-C watching the garage / server rack). Viktor asked to finalize
emo's plan; the grilling session resolved emo's five open decisions and
replaced the doc's 802.1Q-trunk idea with the site idiom: a dedicated
physical leg (R730 eno2 -> vmbr2 -> pfSense net3 = dCCTV 10.0.30.1/24),
port-based VLAN split on the shared TL-SG105PE, camera default-deny with
NTP-only egress, Frigate + ha-sofia as the only consumers.

The PVE bridge, pfSense interface, Kea subnet and firewall rules were
applied live this session (hand-managed hosts, backed up). This commit
records the decision (ADR-0017), the glossary terms (Segment / CCTV
segment), the as-built architecture doc, and bumps Frigate's ADR-0016
VRAM budget 2000 -> 2300 MiB for the upcoming NVDEC stream.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 20:01:45 +00:00
9e253d409a immich(frame-emo): show photos from the last 365 days (was 730)
Emil asked his Sofia Portal Mini photo-frame to show only the past
year of photos rolling from today, instead of the last two years.
Changes ImagesFromDays 730 -> 365 in the frame-emo Settings.yml.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-07-02 19:05:31 +00:00
Viktor Barzin
21afae85c9 dawarich: dedicated 100/1000 Traefik rate limit (default 10/50 429'd page loads)
All checks were successful
ci/woodpecker/push/default Pipeline was successful
Viktor saw dawarich throwing 429s through Traefik and asked to loosen
the burst for it. The access log confirms the burst pattern: one page
load fires the whole fingerprinted-asset tail (SVG store badges,
favicons, webmanifest) from a single client IP and trips the default
10 req/s / burst 50 limiter (repro: 80 parallel GETs -> 28x 429).
Same remedy as ha-sofia, ActualBudget, noVNC, tripit, health and
authentik: dedicated dawarich-rate-limit middleware (average 100 /
burst 1000) + skip_default_rate_limit on the dawarich ingress. Also
updates the networking.md middleware enumerations (adding the
previously undocumented tripit/health limiters alongside dawarich).
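
To illustrate why a single page load trips the default limiter, here is a simplified token-bucket model in Python. It is a sketch only: the class and function names are hypothetical, and it assumes all 80 requests arrive at the same instant, which Traefik's real limiter and a real client will not match exactly.

```python
class TokenBucket:
    """Simplified token bucket: `rate` tokens/s refill, capacity `burst`."""

    def __init__(self, rate: float, burst: int):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # the bucket starts full
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Refill for the elapsed time, capped at the burst size.
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # the proxy would answer 429 here


def count_rejected(rate: float, burst: int, arrivals: list[float]) -> int:
    """Count requests rejected for a list of arrival times in seconds."""
    bucket = TokenBucket(rate, burst)
    return sum(0 if bucket.allow(t) else 1 for t in arrivals)


# 80 simultaneous requests from one client IP (idealized).
default_rejected = count_rejected(10, 50, [0.0] * 80)
dedicated_rejected = count_rejected(100, 1000, [0.0] * 80)
```

Under this idealized model the default 10/50 limiter rejects 30 of the 80 requests, in the same range as the 28 observed in the repro, while the dedicated 100/1000 limiter rejects none.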

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 15:03:08 +00:00
Viktor Barzin
91d0213d1a Merge remote-tracking branch 'forgejo/master' into wizard/excalidraw-export-rename
Some checks failed
ci/woodpecker/push/default Pipeline was successful
Build excalidraw-library / build (push) Has been cancelled
2026-07-02 14:29:34 +00:00
Viktor Barzin
8fc657f431 excalidraw: migrate image build to GHA -> private ghcr (ADR-0002)
The image was still built by hand and pushed to DockerHub (v1..v4),
predating the all-builds-off-infra doctrine; Viktor chose to move it
onto the standard pipeline while shipping the export/rename feature
rather than keep the manual flow.

Mirrors the k8s-portal pattern: .github/workflows/build-excalidraw.yml
(go test + buildx linux/amd64, pushes ghcr latest+sha), excalidraw ns
added to the Kyverno ghcr-credentials allowlist (package is PRIVATE),
deployment now pins ghcr :latest with pullPolicy Always + pull secret,
Keel force/match-tag/5m annotations seed the metadata (live values win
via ignore_changes). DockerHub viktorbarzin/excalidraw-library:v4 stays
frozen as the rollback image. Docs: ci-cd.md + .claude/CLAUDE.md image
lists updated (also backfilled the missing k8s-portal rows in ci-cd.md).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 14:29:23 +00:00
Viktor Barzin
1cbc1e962b excalidraw: native export menu + drawing rename
Users couldn't see Excalidraw's built-in Save as / Export image options:
the app's custom toolbar was drawn exactly on top of the native hamburger
menu button, hiding it. Removed the overlay and integrated Back to
Library / Save now / Rename into the native menu, so the native export
formats (.excalidraw file, PNG, SVG, clipboard) are now reachable.
Viktor asked for exports to work via the native Excalidraw feature and
for drawings to be renameable by clicking their name.

Rename: new PATCH /api/drawings/{id} endpoint (server-side name
sanitization, 409 on conflict) + click-to-rename title pill in the
editor (updates URL in place) + Rename button/modal in the dashboard.
Existing GET/PUT/DELETE semantics unchanged for API compatibility
(emo's upload pipeline). Added main_test.go (httptest) covering rename
+ existing handler behavior; dashboard rows now DOM-built (XSS-safe).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 14:29:10 +00:00
Viktor Barzin
d94f267c93 immich: upgrade v2.7.5 → v3.0.0 (postgres → vectorchord 0.4.3, frames → immich_v3 tag)
All checks were successful
ci/woodpecker/push/default Pipeline was successful
Viktor asked to upgrade Immich to the just-released v3.0.0 (release notes,
migration guide and release discussion #29439 reviewed — no config-breaking
changes for this stack: we already use the split MACHINE_LEARNING_PRELOAD
vars, don't set DB_VECTOR_EXTENSION, OAuth goes through Authentik over
HTTPS, and the GPU node's CPU meets the new x86-64-v2 requirement).

The Immich Postgres image moves to VectorChord 0.4.3 to match the upstream
v3 reference stack (0.3.0 is still within v3's supported range '>=0.3 <2';
Immich upgrades the extension itself at startup). Both photo frames switch
to ImmichFrame's immich_v3 compatibility tag because every versioned
ImmichFrame release (≤ v1.0.33.0) crashes deserializing Immich v3 API
responses; repin to a versioned tag once upstream ships stable v3 support.
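
The range claim can be checked with a small Python sketch. The helper names are hypothetical, and the comparison assumes plain dotted numeric versions; it is not how Immich itself evaluates the range.

```python
def parse(version: str) -> tuple[int, ...]:
    """Turn '0.4.3' into (0, 4, 3) for tuple comparison."""
    return tuple(int(part) for part in version.split("."))


def in_supported_range(version: str, low: str = "0.3", high: str = "2") -> bool:
    """True if low <= version < high, mirroring the range '>=0.3 <2'."""
    return parse(low) <= parse(version) < parse(high)


old_ok = in_supported_range("0.3.0")  # the previous extension version
new_ok = in_supported_range("0.4.3")  # the version this commit moves to
```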

Deployment images are Keel-managed (KEEL_IGNORE_IMAGE, policy=patch), so
this commit is the source-of-truth record; the live rollout happens via
kubectl set image in the same session. Pre-upgrade pg_dumpall taken
(job postgresql-backup-pre-v3).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 14:18:22 +00:00
Viktor Barzin
6f03ccd1aa excalidraw: grant emo-browser SA port-forward for drawing uploads
All checks were successful
ci/woodpecker/push/default Pipeline was successful
Viktor asked to fix emo's permission so his Claude can upload to the
Excalidraw service. emo's recent sessions show the documented upload
recipe (kubectl port-forward svc/draw + X-Authentik-Username header,
from his ~/.claude/CLAUDE.md) failing with:

  pods/portforward forbidden for system:serviceaccount:chrome-service:emo-browser
  in namespace excalidraw

because his default kubeconfig is the read-only emo-browser SA (its
port-forward grant covers only chrome-service) and his old admin
kubeconfig at /home/emo/code/config expired and was removed.

Add a namespace-scoped Role (pods/portforward create) + RoleBinding for
that SA in the excalidraw namespace, mirroring the 2026-06-28
chrome-service grant. Trade-off (any-user drawings via the trusted
username header) documented in the file and accepted.

Also record the grant in docs/architecture/chrome-service.md.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 11:08:28 +00:00
Viktor Barzin
a64d2ba2b9 upgrades: fix hourly gotenberg error + cap update notifications at weekly
All checks were successful
ci/woodpecker/push/default Pipeline was successful
Viktor was getting upgrade-error Slack messages every hour and wants
update notifications at most weekly. Root cause of the errors: Keel kept
trying to roll gotenberg 8.25->8.25.1 in paperless-ngx but kyverno's
require-trusted-registries denied it — gotenberg/* (and apache/*, which
tika will hit next) were never allowlisted, and Keel's Slack notifier at
info level re-posted the identical failure to #general on every hourly
poll since Jun 28.

Changes: allowlist gotenberg/* + apache/* so the patch applies cleanly;
disable Keel's direct Slack notifier and replace failure visibility with
a KeelUpdateFailing Loki-ruler alert (alert-on-change: one notification
plus the daily digest, never an hourly drip); remove diun's Slack
notifier whose default message @channel-pinged #image-updates for every
new upstream tag every 6h (the n8n upgrade-agent webhook feed is
untouched). The k8s upgrade report is already weekly (Mon 06:07 UTC).
Paperless-ngx itself stays paused (keel policy=never, user-managed) while
the ingest runs.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 07:16:50 +00:00
Viktor Barzin
dab307f9f8 Merge remote-tracking branch 'origin/master'
All checks were successful
ci/woodpecker/push/default Pipeline was successful
2026-07-02 05:39:15 +00:00
Viktor Barzin
f1e81772d5 broker-sync: repoint image to ghcr (was frozen on pre-migration DockerHub)
All checks were successful
ci/woodpecker/push/default Pipeline was successful
The nightly ibkr sync failed with 'No such command ibkr': every broker-sync
CronJob still pulled viktorbarzin/broker-sync:latest from DockerHub, which
nothing has pushed to since the ADR-0002 move to GHA->ghcr on 2026-06-13 —
the jobs were silently running a frozen pre-ibkr build. The migration had
allowlisted only the wealthfolio namespace for the private
ghcr.io/viktorbarzin/wealthfolio-sync image, so broker-sync also lacked
pull credentials. Repoint the image, add ghcr-credentials imagePullSecrets
to all eight CronJobs, and allowlist the broker-sync namespace (wealthfolio
stays — its own monthly sync pulls the same image). Related: code-9ko8.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 05:31:00 +00:00
Viktor Barzin
ac41e7c017 nvidia: run advertise-gpumem provisioner under bash (dash rejects pipefail)
First apply of ADR-0016 failed: terraform local-exec defaults to /bin/sh,
which on Ubuntu is dash — 'set -euo pipefail' exits 2 before running kubectl.
Pin the interpreter to bash. Everything else in the gpumem apply succeeded.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-07-02 05:21:47 +00:00
Viktor Barzin
968b2b9c64 Merge remote-tracking branch 'origin/master' into wizard/gpu-vram-budget
2026-07-02 05:18:34 +00:00
Viktor Barzin
a12b09af04 broker-sync: pin data-mounting CronJobs to k8s-node4 (stop nightly RWO wedge)
All checks were successful
ci/woodpecker/push/default Pipeline was successful
All broker-sync CronJobs share one RWO proxmox-lvm volume. With free
scheduling the nightly 02:00-04:15 runs land on different nodes, forcing
a detach/attach cycle whose QMP hotplug intermittently ghost-attaches on
disk-heavy VMs — every job then sits in ContainerCreating for hours
(happened 2026-06-30, 07-01 and again 07-02; fires
PodsStuckContainerCreating and skips the day's trade syncs). Pinning all
seven volume-mounting jobs to k8s-node4 (fewest CSI disks, 11) makes the
volume attach once and stay put — no hotplug dance, no wedge.
version_probe mounts nothing and stays unpinned. Durable fix for the
recurrence tracked in beads code-9ko8.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 05:16:38 +00:00
Viktor Barzin
3c85af2dc2 fire-countdown dashboard: SQL guards + tax regime + honesty fixes
All checks were successful
ci/woodpecker/push/default Pipeline was successful
From the flaw-hunt workflow (all verified):
- Projected-FIRE-date panels (solo/household/family) now guard savings £/yr:
  0 / empty / negative all render "Set savings £/yr" instead of a blank tile,
  a SQL error, or a nonsensical past date ("Jan 1849"). Verified across cases.
- New "Tax regime" panel surfaces the per-country jurisdiction — 14/22 countries
  fall back to the neutral 'nomad' 1% assumption, which was previously invisible.
- Intro no longer hard-codes "£139k pension" (contradicted the £328k tranche
  panel); pension value is now only shown data-bound in the tranche panel.
- Intro adds caveats: Anca's spend is an estimate (pending live re-pull), and
  non-modelled countries use the nomad tax fallback.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-07-01 22:44:17 +00:00
Viktor Barzin
339f5d89b9 onlyoffice: decommission (stack destroyed, dir removed)
All checks were successful
ci/woodpecker/push/default Pipeline was successful
The document server had been deliberately scaled to 0/0 for 184 days, but
its ingress kept the uptime-kuma monitors alive, so 'onlyoffice down'
showed up in every daily alert digest. Viktor approved tearing it down.
terragrunt destroy ran clean (11 resources) before this commit; the kuma
monitors auto-prune with the ingress. Also drops the onlyoffice/* image
prefix from the kyverno trusted-registries allowlist, the service-catalog
rows, and updates the nextcloud collabora comment. Document data (if any)
remains on the PVE NFS share.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-01 22:35:22 +00:00
Viktor Barzin
3c476dab32 postiz+portal: remove broken alert sources (stale backup CronJob, bogus scrape annotations)
Viktor is getting daily Slack alert noise; these two were the recurring
generators. The postiz-postgres-backup CronJob still dumped from the old
in-namespace postiz-postgresql service that was removed in the CNPG
migration (2026-06-28) — it failed every night at 03:00 and re-fired
BackupCronJobFailed each day. The postiz DB now lives on the shared CNPG
cluster and is already covered by the dbaas per-db dumps, so the CronJob
(and its NFS backup volume) is redundant and removed rather than repaired.

portal-stt/portal-tts advertised prometheus.io scrape annotations that
never worked: the deployed Speaches build 404s /metrics, and openai-edge-tts
has no metrics at all (its annotation pointed at a JSON endpoint, which
fails exposition parsing regardless). Both produced a permanently firing
ScrapeTargetDown. Annotations removed until the apps actually serve metrics.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-01 22:35:21 +00:00
Viktor Barzin
5a312563c6 monitoring/wealth: dash the in-progress year on the hourly-rate panel
All checks were successful
ci/woodpecker/push/default Pipeline was successful
The current, still-accruing calendar year read misleadingly high (e.g. 2026
at 5 months showed £149/h gross, above all of 2025) because the full-year
bonus - paid every March - plus front-loaded quarterly RSU vests get divided
by only the months worked so far. It settles lower as the year completes.

Split each line into a solid series (complete years) and a dashed series
(the latest, still-accruing year), so the provisional point is visually
flagged. The split auto-detects the in-progress year (latest year with
< 12 months of payslips), so it needs no per-year maintenance. Panel
description now explains the caveat.
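
The auto-detection rule can be sketched in Python as follows. The panel itself is presumably driven by a dashboard query, so the function name and the (year, month) input shape here are assumptions for illustration only.

```python
from collections import defaultdict


def split_years(payslip_months):
    """Split years into solid-line years and the dashed in-progress year.

    `payslip_months` is an iterable of (year, month) tuples, one per payslip.
    Returns (solid_years, in_progress_year); the second item is None when
    the latest year already has all 12 months.
    """
    months = defaultdict(set)
    for year, month in payslip_months:
        months[year].add(month)
    latest = max(months)
    # Only the latest year can be "still accruing".
    in_progress = latest if len(months[latest]) < 12 else None
    solid = sorted(y for y in months if y != in_progress)
    return solid, in_progress


# Illustrative data: a full 2025 and five months of 2026.
slips = [(2025, m) for m in range(1, 13)] + [(2026, m) for m in range(1, 6)]
solid_years, dashed_year = split_years(slips)
```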

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-30 12:45:51 +00:00
Viktor Barzin
28984dda9a monitoring/wealth: add per-year effective hourly-rate panel (gross vs net)
All checks were successful
ci/woodpecker/push/default Pipeline was successful
Viktor wanted to see, on the wealth dashboard, the hourly wage he earned
each year - both gross and net - with year on the X axis.

New timeseries (line) panel "Effective hourly rate - gross vs net":
- hourly = annual pay / hours worked; hours = contractual 40h/week
  (2,080h per full year, confirmed from the Facebook/Meta UK offer letter:
  Mon-Fri 09:00-18:00 less a 1h lunch), prorated by the months actually
  worked so partial years (2019, 2020, 2026) read correctly.
- Gross = gross_pay incl. notional RSU vest; Net = take-home.
- timeFrom 10y so all years show under the dashboard's default 180d range.
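
The proration above amounts to the following arithmetic, shown as a Python sketch. The function names and the pay figure in the usage lines are hypothetical; only the 40h/week and 2,080h/year figures come from the commit message.

```python
FULL_YEAR_HOURS = 40 * 52  # contractual 40h/week, i.e. 2,080h per full year


def hours_worked(months_worked: int) -> float:
    """Contractual hours, prorated by the months actually worked."""
    return FULL_YEAR_HOURS * months_worked / 12


def hourly_rate(annual_pay: float, months_worked: int) -> float:
    """Effective hourly rate = pay for the period / prorated hours."""
    return annual_pay / hours_worked(months_worked)


# Hypothetical pay figures, not taken from the dashboard.
full_year = hourly_rate(104_000, 12)
half_year = hourly_rate(52_000, 6)
```

With these made-up inputs both calls give the same rate, which is the point of prorating: a partial year is divided by fewer hours rather than by a full 2,080.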

Source data: a duplicate March-2023 payslip (Paperless doc 347, a re-upload
of doc 33) was removed separately, so 2023 is no longer double-counted; this
also corrects the existing net-pay panel.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-30 12:28:46 +00:00
Viktor Barzin
82371d1ef8 dbaas/mysql: innodb_doublewrite=DETECT_ONLY to halve page-flush writes
All checks were successful
ci/woodpecker/push/default Pipeline was successful
MySQL device-write investigation (code-oflt): after the nextcloud webcal
throttle settled (the earlier 3.4-8.8 MB/s were post-restart transients),
MySQL is ~1.74 MB/s at the InnoDB level — and HALF of that (~0.86 MB/s,
~55 pages/s) is the doublewrite buffer writing every flushed page twice.
Redo is negligible (0.01 MB/s), no temp-table spilling.
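
The figures above are consistent with each other under two assumptions that the commit message does not state: the default 16 KiB InnoDB page size, and "MB" meaning MiB. A quick Python check:

```python
PAGE_BYTES = 16 * 1024  # assumed: default innodb_page_size

total_mib_s = 1.74        # InnoDB-level write rate from the investigation
doublewrite_mib_s = 0.86  # share attributed to the doublewrite buffer

# Pages per second written to the doublewrite buffer.
pages_per_s = doublewrite_mib_s * 1024 * 1024 / PAGE_BYTES
# Fraction of InnoDB writes that is doublewrite traffic.
doublewrite_share = doublewrite_mib_s / total_mib_s
```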

Set innodb_doublewrite=DETECT_ONLY (dynamic, no restart; persisted in the
cnf): InnoDB stops writing full page CONTENT to the doublewrite buffer
(~halves MySQL's page-flush writes on the IOPS-bound sdc) but keeps
torn-page DETECTION metadata — a crash-torn page is flagged on recovery
(restore from the daily mysqldump) rather than silently corrupt. Chosen
over full OFF: same write saving, keeps detection, and OFF requires a
shutdown ("cannot change to OFF if doublewrite is enabled"). Acceptable
risk given the PERC BBU cache + UPS (in-flight writes complete on power
loss) + daily per-db backups.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-30 08:47:09 +00:00
Viktor Barzin
74819d4061 feat(nvidia): GPU VRAM budget + watchdog to stop T4 overallocation
The single time-sliced Tesla T4 has no per-tenant memory isolation, so its
~9 GPU workloads can collectively overallocate VRAM. On 2026-06-02 immich-ml's
onnxruntime arena grew to 10.7 GB and silently starved llama-swap, breaking
recruiter-responder for ~5h. Viktor asked for memory protection so we don't
overallocate GPU memory, and chose to do it at the scheduling level (no
device-plugin swap) after weighing HAMi and MPS.

Make the scheduler VRAM-aware and add runtime teeth, all repo-native,
time-slicing untouched:
- Advertise a node extended resource viktorbarzin.me/gpumem (~14000 MiB) via a
  reconcile null_resource (immediate, apply-time) + hourly re-assert CronJob.
- Each always-on GPU tenant declares a gpumem budget (immich-ml 3000,
  llama-swap 5000, frigate 2000, immich-server 1800, portal-stt 1500; sum 13300
  <= advertised) so the scheduler refuses to co-schedule past the card
  (overflow -> Pending).
- gpu-vram-watchdog Deployment recycles the biggest over-budget tenant ONLY when
  actual free VRAM < floor. Ships DRY_RUN=true (observe-then-enforce); flip to
  false after a few cycles look right.
- Prometheus alerts GPUVRAMLow / GPUVRAMTelemetryDown / GPUVRAMWatchdogDown --
  the 2026-06-02 post-mortem's never-built free-VRAM follow-up.
- Docs: ADR-0016 (records why HAMi/MPS were rejected), CONTEXT.md GPU-sharing
  glossary; fix the stale "whole T4 / scale immich-ml to 0" llama-cpp comment.
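
The budget check and the watchdog rule above can be sketched in Python. The budgets and the advertised capacity come from this commit message; the function names, the choice of "largest overage" as the meaning of "biggest", and the usage and floor values in the example are assumptions, not the watchdog's actual code.

```python
ADVERTISED_MIB = 14000
BUDGETS_MIB = {
    "immich-ml": 3000,
    "llama-swap": 5000,
    "frigate": 2000,
    "immich-server": 1800,
    "portal-stt": 1500,
}


def fits(budgets: dict[str, int], capacity_mib: int) -> bool:
    """Scheduler-side view: declared budgets must fit the advertised total."""
    return sum(budgets.values()) <= capacity_mib


def pick_victim(usage_mib, budgets, free_mib, floor_mib):
    """Return the tenant to recycle, or None if nothing should be touched.

    Acts only when free VRAM is below the floor, and then only on tenants
    that exceed their own budget (largest overage first; an assumption).
    """
    if free_mib >= floor_mib:
        return None
    overage = {
        tenant: usage_mib[tenant] - budget
        for tenant, budget in budgets.items()
        if usage_mib.get(tenant, 0) > budget
    }
    if not overage:
        return None
    return max(overage, key=overage.get)


# Hypothetical readings resembling the 2026-06-02 incident.
usage = {"immich-ml": 10700, "llama-swap": 2500, "frigate": 400}
victim = pick_victim(usage, BUDGETS_MIB, free_mib=300, floor_mib=1000)
```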

HITL GPU-node change: apply nvidia FIRST (advertise gpumem), verify the node
shows the capacity, THEN the consumer stacks -- the cutover bounces GPU pods.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-30 07:57:40 +00:00
Viktor Barzin
82c9e69b77 dbaas/mysql: 2Gi InnoDB buffer pool + 6Gi limit + ignore VCT drift
Some checks failed
ci/woodpecker/push/default Pipeline was canceled
Cut MySQL's write-IOPS footprint on the contended PVE sdc HDD (code-oflt).
Standalone MySQL was the #1 sdc bandwidth writer (~2.8-3.5 MB/s). Live
attribution found ~60% of its writes were nextcloud webcal calendar churn
(throttled separately at the app layer); this addresses write amplification
on the remainder:

- innodb_buffer_pool_size 1Gi -> 2Gi: the pool was too small for the ~5.6Gi
  hot set (Innodb_buffer_pool_wait_free=1.78M = threads stalling for a free
  page -> constant flush-to-make-room write IOPS).
- container memory limit 4Gi -> 6Gi (requests 3->4Gi): the pod was already
  at ~3.7Gi/4Gi (near OOM) with the 1Gi pool, so the 2Gi pool needs the
  headroom. One-time MySQL pod restart to apply.
- ignore_changes on the StatefulSet volume_claim_template: the VCT is
  immutable post-creation and pvc-autoresizer rewrites its annotations on
  the live object, so TF's desired VCT could never apply and errored every
  broad dbaas apply. Ignoring it (autoresizer owns PVC sizing) removes the
  long-standing need to -target around it.

Applied + verified live: buffer_pool=2.0GiB, limit=6Gi, pod healthy,
24 DBs reachable, restart clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-30 07:55:18 +00:00
469cdd7507 frigate: expose go2rtc on a dedicated MetalLB LB IP (RTSP 8554 + WebRTC 8555)
All checks were successful
ci/woodpecker/push/default Pipeline was successful
HA live video from the cluster Frigate hangs/fails because the only path
to Frigate is the Traefik HTTP(S) ingress (frigate-lan -> 10.0.20.203),
which cannot carry RTSP or WebRTC. The container already listens on
8554+8555 but only RTSP had a Service (NodePort), and WebRTC (8555) was
never exposed. Convert frigate-rtsp to a LoadBalancer on a dedicated MetalLB
IP (.204, ETP=Local, pod pinned to the GPU node) carrying RTSP 8554 +
WebRTC 8555 (TCP+UDP), giving HA Sofia + LAN browsers a stable cross-VLAN
endpoint for native HLS/WebRTC live (parity with the Hikvision NVR).
Companion non-Terraform steps are in the PR body.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-30 07:15:22 +00:00
Viktor Barzin
9ea9cae073 rightsize: reconcile batch-2/3 stacks blocked by killed #427 (job-hunter, wealthfolio, f1-stream)
Some checks failed
ci/woodpecker/push/default Pipeline failed
Memory limits were committed (batch 2/3) but pipeline #427 was killed mid-apply and the local homelab tf apply hit a stale backend-init; this comment-only diff re-triggers a clean CI apply for the three stacks so live matches master (job-hunter 768Mi, wealthfolio 512Mi, f1-stream 384Mi).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-29 15:59:41 +00:00
Viktor Barzin
7cc9cde5b1 external-secrets: enable ESO Vault token cache to cut sdc write churn
All checks were successful
ci/woodpecker/push/default Pipeline was successful
Add --enable-vault-token-cache to the ESO controller (a graduated,
non-experimental flag in chart 2.6.0). Until now ESO authenticated to
Vault with login -> lookup-self -> revoke-self on *every* secret fetch.
Across 92 ExternalSecrets refreshing every 15m that measured ~0.22
logins/s + ~0.22 revoke-self/s on the active Vault member, and each
cycle is a token create+revoke (plus its lease) written to the Raft log
on all three members. Those fsync-heavy writes land on the contended
PVE RAID1 7200rpm HDD (sdc) -- one of the write sources behind the
recurring control-plane flaps (code-oflt write-reduction).

The eso kubernetes-auth role already issues a 240h periodic, unlimited-
use token, so the churn was pure waste: ESO discarded a perfectly good
token after a single use. With token caching ESO mints one token and
reuses/renews it, collapsing logins from ~13/min to a handful per token
lifetime. Verified live: vault cache initialized, 112/113 ExternalSecrets
Ready (the one failure, instagram-poster, is pre-existing data drift
unrelated to auth), logins dropped to ~0 after warm-up.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-29 15:32:37 +00:00
Viktor Barzin
bc626a2d89 rightsize: raise OOM-tight memory limits (batch 3/N — spike protection)
Some checks failed
ci/woodpecker/push/default Pipeline failed
shlink 512->704Mi, linkwarden 1Gi->1280Mi, chrome-service 2Gi->2624Mi, forgejo 4Gi->5Gi, f1-stream 256->384Mi. All were request==limit with 30d peak at 91-100% of the ceiling — a spike would OOM-kill them. Raising the limit (now Burstable, request<limit) gives real burst headroom. This is the genuine 'don't OOM on occasional spike' fix. Small add (~2.2Gi limits) vs the ~20Gi of fat removed in batches 1-2, so net overcommit keeps dropping.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-29 15:28:11 +00:00
Viktor Barzin
418d1efb4b rightsize: trim over-provisioned memory (batch 2/N)
Some checks failed
ci/woodpecker/push/default Pipeline was canceled
claude-agent-service 12Gi->3Gi (peak 585Mi — the single biggest fat, ~9Gi of limit-overcommit removed), job-hunter 1280->768Mi (kept chromium headroom; 30d peak 118Mi), fire-planner 1024->320Mi, wealthfolio 1Gi->512Mi (kept history-growth headroom). Burstable, limits kept >= generous peak headroom, never below peak. ~10.7Gi of limit overcommit removed. paperless-ai intentionally LEFT at 4Gi (documented in-process RAG model load).
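
The "~10.7Gi" total follows from the four limit changes listed above; a quick Python check, with the dictionary name chosen here for illustration:

```python
# (old limit, new limit) in Mi, as listed in the commit message.
LIMIT_CHANGES_MI = {
    "claude-agent-service": (12 * 1024, 3 * 1024),
    "job-hunter": (1280, 768),
    "fire-planner": (1024, 320),
    "wealthfolio": (1024, 512),
}

removed_mi = sum(old - new for old, new in LIMIT_CHANGES_MI.values())
removed_gi = removed_mi / 1024
```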

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-29 15:27:17 +00:00
Viktor Barzin
c3553731c7 dbaas: CNPG write-reduction — archive_timeout=0, commit_delay, wal_compression=zstd
Part of code-oflt (cut sdc write IOPS before the SSD move; analysis #6922).
- archive_timeout 300->0: CNPG forces archive_mode=on but .spec.backup is empty
  (no ObjectStore), so a 16MB WAL segment switch every 5min shipped NOWHERE =
  ~4.6 GB/day of pure-waste WAL on the contended sdc. archive_mode stays CNPG-on
  (reserved); 0 just stops the timed switch. Daily pg_dump cron unchanged.
- commit_delay 0->2500us: group-commit coalesces concurrent fsyncs. SAFE for
  every DB incl financial -- data still fsynced before COMMIT acks, only <=2.5ms
  added latency under concurrency.
- wal_compression pglz->zstd: ~30-50% smaller full-page images.
All sighup-reloadable. Applied via targeted apply of
module.dbaas.null_resource.pg_cluster (trigger bumped) to avoid the pre-existing
mysql VCT drift that breaks broad dbaas applies.
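
The "~4.6 GB/day" figure for archive_timeout can be reproduced with simple arithmetic. This Python sketch treats it as an upper bound, assuming every 5-minute interval sees enough activity to force a segment switch and that each switch costs a full 16 MB segment.

```python
SECONDS_PER_DAY = 24 * 60 * 60
ARCHIVE_TIMEOUT_S = 300  # the old setting: forced switch every 5 min
WAL_SEGMENT_MB = 16      # WAL segment size named in the commit message

switches_per_day = SECONDS_PER_DAY // ARCHIVE_TIMEOUT_S
wal_mb_per_day = switches_per_day * WAL_SEGMENT_MB
wal_gb_per_day = wal_mb_per_day / 1000
```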

Refs: code-oflt.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-29 15:16:38 +00:00
Viktor Barzin
5d059786a1 rightsize: trim over-provisioned memory limits+requests (batch 1/N)
All checks were successful
ci/woodpecker/push/default Pipeline was successful
claude-breakglass 4Gi->512Mi, stirling-pdf 1536->512Mi, insta2spotify 2Gi->256Mi, recruiter-responder 768->256Mi. These idle/utility services had memory LIMITS sitting 4-15x above their 30d peak, inflating cluster limit-overcommit to 142% across the 5 post-node6 nodes. Burstable (request<limit), limits capped at ~peak x1.5 (never below peak), so no OOM risk (verified zero OOMKills cluster-wide in 30d). Reduces phantom limit overcommit + frees scheduler requests.

Follows the 3-reviewer adversarial review: raising limits on an already-overcommitted cluster worsens correlated node-OOM; the real fix is trimming the fat. Limits only lowered where peak is far below; tuned/DB/GPU limits untouched.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-29 14:46:58 +00:00
Viktor Barzin
256122ff5b monitoring: make ClusterCannotTolerateNonGpuNodeLoss topology-agnostic
All checks were successful
ci/woodpecker/push/default Pipeline was successful
The N-1 capacity alert was hardcoded to k8s-node[234]/[1234], predating node5/node6 (added 2026-05-26) and the 2026-06-29 removal of node6 — so it no longer reflected the real cluster and gave no trustworthy N-1 signal. Generalize node selection via metrics: GPU node by nvidia_com_gpu capacity, drained/cordoned by kube_node_spec_unschedulable, down by the Ready condition. Control-plane excluded by name (node!~"k8s-master.*") because this cluster's kube-state-metrics exposes neither kube_node_role nor node taints/labels (verified live).

Also fixes a latent bug (multiplying by kube_node_spec_unschedulable==0 zeroed the result) and refreshes the remediation text (krr, not the removed Goldilocks). With node6 gone the rule now correctly evaluates LHS 31.0Gi > RHS 27.9Gi (fires) — the honest signal that removing node6 tightened requests-based N-1; trimming the inflated requests clears it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-29 12:34:01 +00:00
Viktor Barzin
c0e0911afa dbaas: bump pg_cluster trigger so the checkpoint/WAL params actually apply
a2c8f906 added checkpoint_timeout=15min + max/min_wal_size to the CNPG
Cluster YAML, but the cluster is applied via null_resource.pg_cluster +
local-exec kubectl apply, which only re-runs when its `triggers` change.
The YAML edit didn't bump a trigger, so the change was inert and never
applied (incl. via CI). Bump the pg_params trigger so the kubectl apply
re-runs and CNPG hot-reloads the new params (reloadable, no restart).
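
A rough model of why the YAML edit was inert: a `null_resource` only re-runs its provisioner when the `triggers` map differs from what is stored in state. The class below is a simplified emulation in Python, not Terraform itself; the trigger values are hypothetical, and only the `pg_params` key name comes from this commit.

```python
class TriggeredStep:
    """Simplified emulation of a null_resource with a local-exec provisioner."""

    def __init__(self) -> None:
        self.stored_triggers = None  # what the "state" last recorded
        self.runs = 0

    def apply(self, triggers: dict, command) -> bool:
        """Run command only if triggers changed since the last apply."""
        if triggers == self.stored_triggers:
            return False  # no diff in triggers, so the provisioner is skipped
        command()
        self.stored_triggers = dict(triggers)
        self.runs += 1
        return True


applied_yaml = []
step = TriggeredStep()

# First apply: trigger value "v1" (hypothetical) with the original YAML.
step.apply({"pg_params": "v1"}, lambda: applied_yaml.append("checkpoint_timeout=5min"))

# YAML edited, trigger untouched: the new parameters never reach the cluster.
ran_without_bump = step.apply(
    {"pg_params": "v1"}, lambda: applied_yaml.append("checkpoint_timeout=15min")
)

# Trigger bumped: the kubectl apply equivalent re-runs.
ran_with_bump = step.apply(
    {"pg_params": "v2"}, lambda: applied_yaml.append("checkpoint_timeout=15min")
)

print(ran_without_bump, ran_with_bump, applied_yaml)
```

Deriving the trigger from a hash of the YAML content would remove the need for manual bumps; whether that fits this stack is a separate decision and is not part of this commit.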

Landing it via a targeted apply (-target=null_resource.pg_cluster) to avoid
3 pre-existing unrelated drifts in this stack -- notably a mysql_standalone
volumeClaimTemplate annotation diff the apiserver rejects as immutable,
which is what fails broad dbaas applies (and silently blocked a2c8f906).

Refs: code-oflt.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-29 12:25:37 +00:00
Viktor Barzin
a2c8f906ec dbaas: stretch CNPG checkpoint timer 5->15min + raise WAL size (cut sdc write IOPS)
All checks were successful
ci/woodpecker/push/default Pipeline was successful
Viktor asked to reduce CNPG checkpoint/WAL writes as part of the sdc
IOPS-isolation work (code-oflt). The IOPS deep-dive found CNPG checkpoints
fire 100% on the 5-min timer (checkpoints_timed >> checkpoints_req), each
triggering a full-page-write burst + flush onto the contended 7200rpm sdc
spindle -- a top write-IOPS source after etcd.

Set checkpoint_timeout=15min + max_wal_size=4GB + min_wal_size=1GB so
checkpoints fire ~1/3 as often (fewer FPW) and WAL segments are recycled
rather than churned. All three are sighup-reloadable -> CNPG applies them
without a restart or failover. checkpoint_completion_target stays 0.9 so
each checkpoint's IO is still smeared across the interval. Bounded
recovery-time tradeoff (more WAL to replay on crash), acceptable for the
write relief. wal_compression left at pglz ('on') pending image
zstd-support verification.
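
The "~1/3 as often" figure follows from simple arithmetic, sketched below under the assumption stated above that every checkpoint is timer-driven (none requested by WAL volume). The function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def timed_checkpoints_per_day(checkpoint_timeout_min: int) -> float:
    """Timer-driven checkpoints per day, assuming none are triggered by WAL size."""
    return SECONDS_PER_DAY / (checkpoint_timeout_min * 60)


before = timed_checkpoints_per_day(5)    # old 5-min timer
after = timed_checkpoints_per_day(15)    # new 15-min timer

print(before, after, after / before)
```

If WAL volume ever exceeds max_wal_size within an interval, requested checkpoints would add to this count, so the timed figure is a lower bound.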

Also refreshes the stale CNPG tuning note in .claude/CLAUDE.md (it listed
shared_buffers=512MB / effective_cache_size=1536MB / 2Gi; live is 1024MB /
2560MB / 3Gi).

Refs: code-oflt (etcd/sdc IO isolation).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-29 11:41:09 +00:00
Viktor Barzin
3398873a16 k8s-upgrade: move version-check cadence from daily to weekly (Sun check, Mon report)
All checks were successful
ci/woodpecker/push/default Pipeline was successful
Viktor asked to move the upgrade checks to weekly. With the actionable-vs-held
gate now quieting the routine 'held' churn (e.g. 1.36), a daily check + attempt
buys little; weekly is enough. Accepted trade-off: k8s patch (incl. security)
uptake now lags up to 7 days instead of <=1.

- var.schedule:        0 23 * * *  ->  0 23 * * 0   (detector: weekly Sunday 23:00 UTC)
- var.report_schedule: 7 6 * * *   ->  7 6 * * 1    (report: Monday 06:07 UTC, ~7h
  after the Sunday check, so nightly-report.py's ~25h staleness threshold stays
  valid AND still flags a missed weekly run; no STALE_SECONDS change needed)
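
A small sanity check of the schedule reasoning above. The 25h threshold is taken as an approximation of the "~25h" figure, and `is_stale` is a hypothetical stand-in for the check in nightly-report.py, whose actual logic is not shown here.

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(hours=25)  # approximation of the ~25h threshold


def is_stale(last_check: datetime, report_time: datetime,
             threshold: timedelta = STALE_AFTER) -> bool:
    """True when the last detector run is older than the staleness threshold."""
    return report_time - last_check > threshold


# Detector: Sunday 23:00 UTC (0 23 * * 0). Report: Monday 06:07 UTC (7 6 * * 1).
sunday_check = datetime(2026, 6, 28, 23, 0, tzinfo=timezone.utc)
monday_report = datetime(2026, 6, 29, 6, 7, tzinfo=timezone.utc)

gap = monday_report - sunday_check                # 7h07m between check and report
on_time = is_stale(sunday_check, monday_report)   # fresh: well under the threshold

# If this week's check is missed, the newest result is a week older.
missed = is_stale(sunday_check - timedelta(days=7), monday_report)

print(gap, on_time, missed)
```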

The report CronJob keeps its historical name k8s-upgrade-nightly-report (rename
= churn). Cadence wording updated across main.tf comments, nightly-report.py
docstring, and the runbook.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-29 06:22:20 +00:00
Viktor Barzin
e43e64c666 kyverno: disable reports-controller to stop etcd ephemeralreport load
All checks were successful
ci/woodpecker/push/default Pipeline was successful
Viktor flagged not wanting to wear the single non-RAID SSD with useless etcd
writes if etcd moves there. Investigation found the avoidable load is kyverno
reporting: the 2026-06-12 etcd-load-reduction disabled the report *features*
but left the reports-controller running (default --enableReporting +
--validatingAdmissionPolicyReports=true), so the 2026-06-21 kyverno upgrade
left a one-time pile of ~10.5k cluster/namespaced ephemeralreports (~114MB in
etcd) that nothing reaps (aggregation off). Listing that range starves etcd's
fdatasync enough to flap the apiserver (observed live 2026-06-28).

Disable the reports-controller outright (reportsController.enabled=false),
completing the 2026-06-12 intent. Reports are not consumed (violations surface
via Loki->Slack); admission enforcement (deny-* policies) and Keel mutation are
independent of it. The ~10.5k stale reports already in etcd are cleared
separately (throttled, out-of-band) since bulk-deleting them is itself
etcd-heavy.

Refs: code-oflt (etcd IO isolation), code-at4f (etcd starvation alerting).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-29 05:35:36 +00:00
Viktor Barzin
cf42042cba monitoring: re-trigger apply to persist state after CI cancel-race
All checks were successful
ci/woodpecker/push/default Pipeline was successful
No-op comment touch in loki.tf to force a clean `terragrunt apply monitoring`.
The pfSense egress-monitoring apply (commit 7fe2d978, CI pipeline #414) was
cancelled by a newer push and SIGKILLed mid-helm-upgrade: the live resources
applied (probes green, rules loaded) but the Terraform state write and the helm
release finalize were lost, leaving the prometheus release stuck in
pending-upgrade (manually unstuck). This commit re-applies the unchanged
monitoring stack so state matches live, with zero resource changes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-28 16:58:49 +00:00
Viktor Barzin
f92075b7c5 fire-planner: solve FIRE targets to age 100 (horizon 60→72)
All checks were successful
ci/woodpecker/push/default Pipeline was successful
Viktor plans to live to 100, so the portfolio must last that long. The
fire-targets CronJob was solving a 60-year horizon (to ≈ age 88); set it to 72
(retire ~age 28 → age 100). Raises every case's FIRE number modestly (more years
to fund). A one-off in-cluster job re-solves the existing rows at the new horizon.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-28 16:49:20 +00:00
Viktor Barzin
7fe2d9780e monitoring: add pfSense WAN/egress alerting + probes
Some checks failed
ci/woodpecker/push/default Pipeline was canceled
On 2026-06-27 pfSense (Proxmox VMID 101) stopped passing internet egress for
~20 min while internal routing + Unbound stayed up; recovery needed a manual
reboot and NOTHING alerted — there was no egress probe and the cloudflared
replica metric stayed green. Add first-class egress monitoring so the next
occurrence pages in ~2 min instead of being noticed by a human.

- blackbox-exporter: new icmp_egress + dns_external probe modules (+ NET_RAW
  so ICMP can use raw sockets).
- Three in-cluster probe jobs exercising the pod->node->pfSense-NAT path that
  failed: wan-gateway-icmp (192.168.1.1), internet-egress-icmp (9.9.9.9 +
  1.1.1.1), internet-egress-dns (cloudflare.com via both resolvers).
- Prometheus alerts (group "Egress / pfSense"): WANGatewayUnreachable,
  InternetEgressDown (both providers dead), ExternalDNSResolutionDown,
  EgressOnlyDivergence (reuses the existing t3-probe legs — the incident's
  exact "external down while internal up" signature), PfSenseVMDown.
- Loki ruler: CloudflaredTunnelConnLoss — the canary that fired first; the
  cloudflared replica metric is blind to tunnel-connection loss. Threshold
  calibrated against live Loki (steady-state ~2/6h vs 37-85/5m in-incident).
- Alertmanager inhibit: WAN/egress-down suppresses the downstream egress
  symptom alerts so one root alert pages, not a storm.
- Runbook docs/runbooks/pfsense-egress.md + .claude/CLAUDE.md.

All metric names + the cloudflared threshold verified against live
Prometheus/Loki. Pure GitOps, no pfSense change. Firewall-side hardening
(dpinger retargeting, failover gateway, pfSense syslog -> Loki) is deferred
and documented in the runbook.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-28 16:46:30 +00:00
Viktor Barzin
6f042ee239 fix(fire-planner): grafana fire-planner-pg datasource survives pw rotation
Some checks failed
ci/woodpecker/push/default Pipeline failed
The fire-planner-pg Grafana datasource baked the rotating fire_planner DB
password into its provisioning ConfigMap at terraform plan-time, so on every
7-day static-role rotation the password went stale and ALL fire-planner-pg
dashboards (fire-planner, cost-of-living, and the new wealth FIRE Countdown)
silently failed with "password authentication failed for user fire_planner"
until the next stack apply.

Switch to the same live-env pattern wealth-pg / payslips-pg already use:
- new ExternalSecret grafana-fire-planner-pg-creds (monitoring ns, Reloader
  match) mirrors the rotating Vault static-creds/pg-fire-planner password
- datasource ConfigMap now references $__env{FIRE_PLANNER_PG_PASSWORD}
- Grafana mounts it via envFromSecrets; reloader (auto) restarts Grafana on
  rotation so the provisioned datasource never goes stale

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-28 16:14:42 +00:00
Viktor Barzin
35c0057d83 chrome-service: raise noVNC sidecar memory limit 96Mi->256Mi (fix OOMKill)
All checks were successful
ci/woodpecker/push/default Pipeline was successful
The noVNC sidecar (x11vnc + websockify) was OOMKilled (exit 137) repeatedly
whenever someone actively opened chrome.viktorbarzin.me — the view connected
then froze/hung. Idle usage is ~37Mi, but x11vnc + websockify
framebuffer/encode buffers spike past the 96Mi cap when streaming the
1280x720 screen to a client. Raised request 32Mi->64Mi, limit 96Mi->256Mi
(Burstable, aux tier). Already applied live via a transient kubectl patch
(Recreate rollout, verified 0 restarts since); this lands the durable state
so the next apply / daily drift-detection doesn't revert it to 96Mi.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-28 15:39:17 +00:00
Viktor Barzin
2e50c1235c chrome-service: grant emo shared browser access (noVNC + homelab browser CLI)
All checks were successful
ci/woodpecker/push/default Pipeline was successful
Viktor asked to give emo access to the cluster's headed Chrome so he can fill
in forms and get past anti-bot / captcha pages. emo was deliberately locked
out of chrome-service (noVNC Authentik allowlist was Viktor-only + his
power-user RBAC has no pods/portforward). Viktor's explicit decision: SHARE
his existing browser rather than stand up an isolated per-user instance,
accepting that emo can therefore reach Viktor's warmed logged-in sessions
(CDP has no per-context auth, so the single shared persistent profile is
reachable by anyone who can drive the browser). emo's CLI use is hands-off
(his agent can run it unattended).

- authentik: add emo (emil.barzin / emil.barzin@gmail.com) to CHROME_ALLOWED
  so the admin-services-restriction policy admits him to chrome.viktorbarzin.me
  (noVNC). Reverses the prior Viktor-only lock; comment updated to record why.
- chrome-service/rbac.tf (new): emo-browser ServiceAccount + long-lived token
  (dashboard-sa.tf pattern), a chrome-service-portforward Role granting
  pods/portforward, and a cluster read-only binding (oidc-power-user-readonly)
  so the SA can resolve the Service and emo's normal read access doesn't regress.
- t3-provision-users.sh: install_browser_kubeconfig installs a dual-context
  kubeconfig for any user with a <user>-browser SA — SA token as the default
  context (non-interactive, works headless), personal OIDC retained as the
  oidc@homelab named context. emo's OIDC-only kubeconfig can't authenticate the
  headless agent session that homelab browser needs.
- docs/architecture/chrome-service.md: document the shared-browser multi-user
  access model, the session-exposure trade-off, and how to grant/revoke a user.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-28 15:20:07 +00:00
Viktor Barzin
50077b43d4 paperless-ngx: drop TASK_WORKERS 6->4 (6 OOMKilled the pod mid-import)
All checks were successful
ci/woodpecker/push/default Pipeline was successful
6 OCR workers crept past the 8Gi per-container memory cap over ~6h and
OOMKilled paperless at 15:00 during the Emo bulk import. The import
auto-recovered (the consume dir lives on the PVC, so a restart re-scans
and reprocesses — nothing lost), but it left the queue inflated with
re-queued duplicates and spiked etcd on each restart.

The 8Gi cap is the shared edge-tier `tier-defaults` LimitRange, not worth
raising for one namespace. 4 workers fit with headroom (measured ~1.3Gi
at 4). This matches the value applied live via `kubectl set env` during
incident response; this removes the drift so the next apply keeps it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-28 15:06:46 +00:00
Viktor Barzin
8236ae309d postiz: reconcile HCL to live (adopt unmerged stack config), keep parked
All checks were successful
ci/woodpecker/push/default Pipeline was successful
postiz's live deployment (Helm + Temporal + Elasticsearch + Authentik
OIDC + static-DB password) came from the never-merged branch
`wizard/postiz-cnpg-oidc`, so master's HCL was stale and a `terragrunt
apply` would have DESTROYED the stack. This lands that postiz config to
master so HCL == state == live (CI green; destroy-landmine gone).

Kept PARKED (postiz + temporal replicas = 0): IG-via-postiz is Meta-
blocked (it hardcodes retired Instagram scopes → OAuth "Invalid Scopes"),
which is why it was parked; IG runs via the instagram-poster service. To
revive later: flip postiz `replicaCount` + temporal `replicas` back to 1
and re-check image pins.

Notes captured in this reconcile:
- ES image pinned to 7.17.28 (the branch's 7.17.24 was a DOWNGRADE vs the
  live data → ES refused to start "cannot downgrade node 7.17.28→7.17.24";
  caught + rolled back during this work).
- The 4 Authentik resources (app/provider/group/binding) were re-imported
  into state (adopted, not recreated — no duplicate AK objects); the
  obsolete `external_secret_jwt` ExternalSecret was removed (Retain → its
  synced secret was kept).
- Vault-side cleanup (removing the unused pg-postiz rotated role) is
  deliberately NOT included here — deferred, postiz uses a static
  secret/postiz database_url.

State was already reconciled by a local `scripts/tg apply`; this commit is
the HCL catch-up (CI re-apply is a no-op).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-28 12:54:59 +00:00
Viktor Barzin
e518ada3d4 authentik: repoint to overlay patch3 (all-iOS SFE + SFE social links) + docs
All checks were successful
ci/woodpecker/push/default Pipeline was successful
global.image -> 2026.2.4-patch3. Old iPad Chrome (and any iOS browser) now gets
the SFE too, and the SFE login shows social-login buttons (emo is Google-only with
no password, so the password form alone was a dead end). Docs: .claude/CLAUDE.md +
authentication.md.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-28 11:53:26 +00:00
Viktor Barzin
4fc09b7a61 Merge remote-tracking branch 'origin/master' into wizard/authentik-sfe-social
Some checks failed
ci/woodpecker/push/default Pipeline was canceled
Build Custom Authentik Image / build (push) Has been cancelled
2026-06-28 11:53:04 +00:00