Commit graph

3957 commits

Author SHA1 Message Date
Viktor Barzin
c958f6a589 feat(nextcloud-todos): Phase 4 IaC — service stack, Vault role, DB bootstrap, OpenClaw plugin, monitoring
Phase 4 infrastructure-as-code for the nextcloud-todos service (watches the
Nextcloud Personal task list; classifies todos via local qwen3-8b and routes
research/mutating work through claude-agent-service). Clones the
recruiter-responder service pattern end-to-end. Written only — NOT applied.

- stacks/nextcloud-todos/{main.tf,terragrunt.hcl}: new aux stack cloning
  recruiter-responder — ns (tier aux, istio-injection disabled, keel enrolled),
  two ExternalSecrets (vault-kv app secrets + vault-database DSN), Recreate
  deployment with alembic-migrate init-container, ClusterIP svc, /cb-only
  HMAC-gated ingress (auth=none, proxied), and an idempotent webhook-register
  null_resource (OCS webhook_listeners API, both CalendarObject Created/Updated
  events -> internal svc URL, Bearer auth).
- stacks/vault/main.tf: pg_nextcloud_todos static role (nextcloud_todos, 7d
  rotation) + pg-nextcloud-todos in the postgresql allowed_roles array.
- stacks/dbaas/modules/dbaas/main.tf: pg_nextcloud_todos_db null_resource
  (clone of pg_tripit_db) — creates role+DB, pins role search_path, and
  creates schema nextcloud_todos AUTHORIZATION nextcloud_todos.
- stacks/openclaw/main.tf: install-nextcloud-todos-plugin init-container,
  nextcloud-todos-api in plugins.allow + the doctor-fix re-add + plugins
  enable, NEXTCLOUD_TODOS_URL/NEXTCLOUD_TODOS_TOKEN env, and the cross-path
  ESO key (secret/nextcloud-todos.webhook_bearer_token).
- stacks/uptime-kuma/modules/uptime-kuma/main.tf: internal /healthz HTTP
  monitor. Prometheus /metrics scrape via svc annotations in the new stack.
- .gitleaksignore: allowlist two curl-auth-user false positives (the OCS
  webhook curl uses a Vault-sourced shell var, not a literal credential).

KV seed (secret/nextcloud-todos) + applies are deferred to the apply runbook.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 09:19:10 +00:00
Viktor Barzin
c3c3d5e010 feat(claude-agent-service): seed nextcloud-todos planner + exec agents
Add cp lines in the seed-beads-agent init-container so the two new
nextcloud-todos agent definitions (baked into the image at
/usr/share/agent-seed/ by the claude-agent-service Dockerfile) land in
~/.claude/agents/ at pod start. Phase 3, task 3.3.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 09:19:09 +00:00
github-actions[bot]
d0206848cb priority-pass: bump image_tag to 63e118c3 [ci skip]
Auto-committed by ViktorBarzin/priority-pass GHA on push to main.
Source: 63e118c334
2026-06-05 09:19:09 +00:00
Viktor Barzin
55a8b238a0 mam-farming: migrate data volume proxmox-lvm → NFS
The grabber + bp-spender shared a 1Gi proxmox-lvm RWO PVC holding two
plain-text files (mam_id cookie + grabbed_ids.txt dedup list — no embedded
DB). On 2026-06-04 a grabber pod wedged in ContainerCreating for >1h because
its proxmox-lvm disk couldn't hot-plug onto a SCSI-LUN-saturated node VM
(k8s-node3, QEMU `query-pci` QMP timeout); `concurrencyPolicy: Forbid` then
blocked every run → 0 grabs → MAMFarmingStuck.

NFS (nfs_volume module, matching the other 9 servarr apps) removes this
volume from the per-VM SCSI hotplug path entirely: it mounts over the
network, consumes zero LUN slots, and is RWX so the grabber + bp-spender can
co-schedule on any node. Data (mam_id + grabbed_ids.txt) was copied across
before the switch; verified a grabber run succeeds on NFS on node4 with the
preserved dedup list (tracked IDs carried over). Lever #1 from
docs/architecture/storage.md "Per-VM SCSI-LUN cap".
2026-06-05 09:19:09 +00:00
Viktor Barzin
cb96d5d590 fix(k8s-dashboard): use email_verified=true + groups scope mappings
The apiserver rejects the email username-claim when email_verified is false
(invalid bearer token 401). Authentik external/social users are unverified,
so the default scope-email mapping fails. Mirror the proven kubernetes
provider: use the custom 'Kubernetes Email (verified)' mapping (hardcodes
email_verified=true) + 'Kubernetes Groups'. Drop the now-unneeded dual-aud
mapping (apiserver trusts the k8s-dashboard issuer w/ audience=client_id) and
align oauth2-proxy scope to 'openid email profile groups'.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 09:19:09 +00:00
Viktor Barzin
1042c0f082 fix(k8s-dashboard): set RS256 signing_key on Authentik OIDC provider
Provider had signing_key=null → Authentik signed id_tokens with HS256 and
served an empty JWKS, so oauth2-proxy (and the apiserver) failed signature
verification (500 'failed to verify id token signature' on the callback).
Use the same 'authentik Self-signed Certificate' keypair the kubernetes
provider uses.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 09:19:09 +00:00
Viktor Barzin
e436af8d8c fix(k8s-dashboard): drop group-restriction policy; RBAC is the gate
The Authentik group policy denied admins: it gated on kubernetes-* group
membership, but cluster access is email-based RBAC (User bindings from
k8s_users), not group-based. vbarzin@gmail.com (Home Server Admins) gets
cluster-admin via oidc-admin-vbarzin but isn't in any kubernetes-* group,
so the gate locked him out. Apiserver RBAC is now the sole gate — matching
the kubelogin CLI (authenticate freely, RBAC decides actions).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 09:19:09 +00:00
Viktor Barzin
ad3432d685 docs(k8s-dashboard): dashboard SSO as-built (Option B multi-issuer apiserver)
Update authentication.md (structured multi-issuer AuthenticationConfiguration
+ dashboard SSO flow), multi-tenancy.md (web dashboard access), authentik-state
(new k8s-dashboard app + gheorghe groups), service-catalog (dashboard auth),
and the k8s-version-upgrade runbook (kubeadm wipes --authentication-config →
re-apply rbac post-upgrade). Design/plan addenda record the issuer-constraint
pivot from the original dual-aud approach. [ci skip]

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 09:19:09 +00:00
Viktor Barzin
c9b22c7dd3 feat(k8s-dashboard): cut over ingress to oauth2-proxy SSO
Dashboard now authenticates via Authentik (oauth2-proxy, k8s-dashboard
issuer) and applies each user's own RBAC via the apiserver multi-issuer
AuthenticationConfiguration. Committed so CI converges (uncommitted local
applies were being reverted by the Woodpecker terragrunt-apply pipeline).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 09:19:09 +00:00
Viktor Barzin
ed4ed6bd09 fix(k8s-dashboard): ignore Keel/tier drift on oauth2-proxy deployment 2026-06-05 09:19:09 +00:00
Viktor Barzin
75c2b6dc5e feat(rbac): apiserver multi-issuer OIDC via structured AuthenticationConfiguration
Replace the legacy single --oidc-* flags (which kubeadm v1.34 had wiped,
silently disabling apiserver OIDC) with an apiserver.config.k8s.io/v1
AuthenticationConfiguration trusting BOTH the kubernetes (CLI) and
k8s-dashboard (oauth2-proxy) issuers. Enables per-user RBAC for the
dashboard via SSO while keeping the CLI issuer working. Remote script
health-gates /livez and auto-rolls-back on failure (single master).
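The remote script itself is not part of this log. The following is a minimal sketch of the health-gate-then-rollback pattern, not the script's code. It assumes two hypothetical callables: `probe` returns True when /livez answers 200, and `rollback` restores the previous kube-apiserver config.

```python
import time
from typing import Callable


def health_gate(probe: Callable[[], bool],
                rollback: Callable[[], None],
                attempts: int = 30,
                interval_s: float = 2.0,
                sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `probe` until it reports healthy; roll back if it never does.

    Returns True if the new config passed the gate, or False after rollback.
    With a single master there is no other apiserver to fall back on, so a
    failed gate must restore the old config rather than just report an error.
    """
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:  # no point sleeping after the last probe
            sleep(interval_s)
    rollback()
    return False
```

Injecting `sleep` keeps the gate testable without waiting in real time. In practice `probe` might wrap an HTTPS GET of `/livez` on the local apiserver. That wrapper is an assumption and is not shown here.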

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 09:19:09 +00:00
github-actions[bot]
5b25ce1ec5 priority-pass: bump image_tag to 061a66ad [ci skip]
Auto-committed by ViktorBarzin/priority-pass GHA on push to main.
Source: 061a66ad3b
2026-06-05 09:19:09 +00:00
Viktor Barzin
9c4335025d feat(tripit): linked-email verification (SMTP + confirm carve-out) [ci skip]
Adds outbound mail for linked-email verification: EMAIL_PROVIDER=smtp + SMTP_*
app env (submits via the cluster mailserver as spam@, relayed by Brevo),
SMTP_PASSWORD mapped to the existing PLANS_IMAP_PASSWORD (no new secret), and a
token-gated /api/emails/confirm ingress carve-out (auth=none, like the calendar
feed). Applied locally via scripts/tg.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 09:19:09 +00:00
Viktor Barzin
b8c55732e0 feat(k8s-dashboard): deploy oauth2-proxy (not yet wired to ingress)
2 replicas in kubernetes-dashboard ns; OIDC code-flow against the
k8s-dashboard Authentik client, injects user id_token as Bearer upstream
to kong-proxy. ESO syncs client/cookie secrets from Vault. Ingress still
points at kong-proxy — no user-facing change yet.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 09:19:09 +00:00
root
7c4375d7cd Woodpecker CI deploy [CI SKIP] 2026-06-05 09:19:09 +00:00
Viktor Barzin
4ed0c5a834 uptime-kuma: codify Traefik LB internal monitor at .203 (was stale .200)
A hand-created (non-TF) uptime-kuma monitor "Traefik LoadBalancer" (id=95)
port-checked 10.0.20.200:443 — the shared LB IP Traefik moved OFF on
2026-05-30 when it took its dedicated .203 (ETP=Local). It had been DOWN
for ~5 days, surfacing as the cluster-health "uptime_kuma internal down(1)"
WARN.

Add it to local.internal_monitors as "Traefik LoadBalancer (10.0.20.203)"
(port 10.0.20.203:443) so it's managed like the TP-Link/Proxmox direct-IP
probes — a direct check of the MetalLB L2 + Traefik bind, complementing the
[External] traefik (full CF path) and Traefik Dashboard (in-cluster)
monitors. The sync CronJob created it (id=902, reporting UP @1ms); the
orphan id=95 was deleted via the uptime-kuma API.
2026-06-05 09:19:08 +00:00
Viktor Barzin
011c63c92d feat(k8s-dashboard): add Authentik OIDC app for dashboard SSO
Confidential client k8s-dashboard + custom scope mapping emitting
aud=[kubernetes,k8s-dashboard] + group-restriction policy (kubernetes-*
RBAC groups). Additive — dashboard ingress unchanged. Token via Vault
secret/k8s-dashboard.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 09:19:07 +00:00
Viktor Barzin
549320f79c docs(k8s-dashboard): SSO via Authentik oauth2-proxy — implementation plan [ci skip]
Task-by-task plan: Vault secret, Authentik OIDC app (TF), oauth2-proxy
deploy, ingress cutover with blocking audience-verification gate, docs.
Additive + one revertible ingress repoint.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 09:19:07 +00:00
Viktor Barzin
23d87d8885 cluster-health #20: fix false NFS FAIL on Linux (nc -G is macOS-only)
The NFS connectivity check fell through to `nc -z -G 3 192.168.1.127 2049`
when `showmount` is absent (the DevVM ships no nfs-common). But `-G` is a
macOS/Darwin-only connect-timeout flag — OpenBSD/GNU nc on Linux rejects it
with "invalid option -- 'G'", so the elif failed and the check reported
"NFS unreachable" on every Linux run even though port 2049 was wide open
(confirmed via /dev/tcp). All deployment/PVC/statefulset checks were green
throughout — a real PVE NFS outage would have taken down 30+ services.

Fix: use the portable `-w` timeout flag, and add a final bash /dev/tcp
fallback so the probe is correct even on hosts with neither showmount nor a
usable nc.
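The fix itself is in shell. The connect-with-timeout logic that both `nc -z -w 3` and the bash `/dev/tcp` fallback rely on can be illustrated in Python. This is a sketch for hosts that have Python but neither tool, not the healthcheck's code:

```python
import socket


def tcp_port_open(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout_s.

    Same question as `nc -z -w 3 host port`: is anything accepting on the port?
    It does not check NFS exports, only TCP reachability.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # refused, unreachable, or timed out (socket.timeout is an OSError)
        return False
```

For example, `tcp_port_open("192.168.1.127", 2049)` corresponds to the NFS probe described above.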
2026-06-05 09:19:07 +00:00
Viktor Barzin
8b72eaebb0 docs(k8s-dashboard): SSO via Authentik oauth2-proxy — design [ci skip]
Design for letting namespace-owner users (e.g. gheorghe/vabbit81) open the
K8s Dashboard with their Authentik account, mapped to their per-user RBAC.
oauth2-proxy fronts kong-proxy, runs the OIDC code-flow, and injects the
user's id_token as Bearer so the apiserver applies existing namespace-owner
bindings. Additive + one ingress repoint; multi-audience scope mapping
keeps the CLI flow untouched.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 09:19:07 +00:00
Viktor Barzin
f201e4573e immich: fix slow context search — prewarm clip_index + latency alert/healthcheck
Context (smart) search latency was caused by the 665MB vchord clip_index
decaying out of PG shared_buffers (~33% resident -> ~1.8s cold ANN reads vs
~4ms warm), NOT by yesterday's ML MODEL_TTL/clip-keepalive change (CLIP textual
is warm ~15ms on GPU). The postStart prewarm runs once at pod start and
pg_prewarm.autoprewarm only re-warms at startup, so the index decays under job
buffer-pressure over days.

- clip-index-prewarm CronJob (immich, */5): pg_prewarm('clip_index') keeps the
  whole index resident -> searches stay ~4ms.
- immich-search-probe CronJob (immich, */5): times a random-vector ANN query +
  reads clip_index residency, pushes gauges to the Pushgateway.
- Prometheus alerts ImmichSmartSearchSlow / ImmichClipIndexColdCache /
  ImmichSearchProbeStale (+ inhibition when the probe is stale).
- cluster_healthcheck.sh check #46 check_immich_search (TOTAL_CHECKS 45->46).
- Docs: infra CLAUDE.md immich note, monitoring.md, cluster-health skill.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 09:19:07 +00:00
Viktor Barzin
38c77048fd chore(travel-agent): decommission — merged into tripit [ci skip]
travel-agent's transport-to-airport + weather-brief workflows now run inside
tripit (DB-driven instead of CalDAV), so the standalone CronJob stack is
retired (namespace + ExternalSecret + 2 CronJobs destroyed via scripts/tg).
secret/travel-agent left in Vault as an archive. Applied locally.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 09:19:07 +00:00
Viktor Barzin
be4ee7315a feat(tripit): proactive-nudge CronJobs (transport + weather brief) [ci skip]
Merge travel-agent's two workflows into tripit (beads code-muqi): adds
tripit-transport-nudge (08:00) + tripit-weather-brief (21:00) CronJobs on
Europe/London, an optional per-job timezone, and SLACK_BOT_TOKEN +
DAWARICH_API_KEY in the tripit-secrets ExternalSecret. Nudges post to Slack
#travel + Web Push; Dawarich location via the public host (in-cluster *.svc
is 403'd by Rails host-auth). Vault secret/tripit seeded from
secret/travel-agent + secret/owntracks. Applied locally via scripts/tg.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 09:19:06 +00:00
Viktor Barzin
a2fa912b44 cluster-health: add check #45 — HA Sofia Status Dashboard
Mirrors the verdict of emo's curated Барзини → Статус Lovelace view
(dashboard-barzini / path 'status', 8 sections, ~43 mushroom-template
cards). Pulls the dashboard config via the HA WebSocket API (one-shot,
shared cache), batch-renders every card's secondary Jinja template
against /api/template in a single POST, and classifies the rendered
text per card:

  FAIL — contains "Offline" / "Disconnected" / "Разкачен" / "— No data"
  WARN — contains "⚠️" / "Abnormal" / "Trouble (" / "(ниска)" /
         "Пълен резервоар" / "Грешка" / "attention" / "Внимание"

Roll-up is a single check with a per-section breakdown
(Сигурност 0F/0W/4P; Мрежа 0F/1W/10P; …). On WARN/FAIL the non-quiet
non-JSON path lists each offending card with its rendered status line.

Verified live against ha-sofia: 2 offline devices (Пералня, Гардероб
спалня) and 1 degraded (NAS_Barzini volume attention, 7% free) surfaced
correctly in both human and JSON output.
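The marker lists above can be turned into a classifier and per-section roll-up. The following minimal sketch has several assumptions that are not from the check's source: matching is a case-sensitive substring test, FAIL takes precedence over WARN, and any FAIL card makes the whole check FAIL. The "⚠️" marker is matched on U+26A0 alone, so it hits with or without the emoji variation selector.

```python
from collections import Counter

FAIL_MARKERS = ("Offline", "Disconnected", "Разкачен", "\u2014 No data")
WARN_MARKERS = ("\u26a0", "Abnormal", "Trouble (", "(ниска)",
                "Пълен резервоар", "Грешка", "attention", "Внимание")


def classify(rendered: str) -> str:
    """Classify one card's rendered secondary text as FAIL, WARN or PASS."""
    if any(m in rendered for m in FAIL_MARKERS):
        return "FAIL"
    if any(m in rendered for m in WARN_MARKERS):
        return "WARN"
    return "PASS"


def roll_up(cards):
    """cards: iterable of (section, rendered_text) pairs.

    Returns (overall_verdict, {section: Counter of verdicts}).
    """
    per_section = {}
    for section, text in cards:
        per_section.setdefault(section, Counter())[classify(text)] += 1
    totals = Counter()
    for counts in per_section.values():
        totals.update(counts)
    overall = "FAIL" if totals["FAIL"] else "WARN" if totals["WARN"] else "PASS"
    return overall, per_section


def format_section(name, counts):
    """Render the per-section breakdown, e.g. 'Мрежа 0F/1W/10P'."""
    return f"{name} {counts['FAIL']}F/{counts['WARN']}W/{counts['PASS']}P"
```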
2026-06-05 09:19:06 +00:00
Viktor Barzin
98f29edf34 technitium: CoreDNS rewrite forgejo.viktorbarzin.me -> Traefik ClusterIP
In-cluster pods resolved forgejo.viktorbarzin.me to the public IP
(176.12.22.76) and hairpinned out through the WAN gateway, intermittently
timing out buildkit pushes from Woodpecker build pods (which, unlike
kubelet, don't use the per-node containerd Forgejo mirror). This silently
failed CI build-and-push for Forgejo-hosted repos (recruiter-responder
pipelines #15-#18 at the push step).

Add a CoreDNS `rewrite name exact forgejo.viktorbarzin.me
traefik.traefik.svc.cluster.local` so pods resolve to the Traefik ClusterIP
(reachable in-cluster, unlike the ETP=Local LB .203; the Service-name target
auto-tracks the ClusterIP so it can't rot on a Traefik renumber). Traefik's
*.viktorbarzin.me wildcard keeps SNI/TLS valid. The per-pod
woodpecker-server hostAlias is now belt-and-suspenders.

Applied via targeted apply (coredns ConfigMap only, to avoid reconciling 7
unrelated pre-existing drifts in the stack) + verified:
- pod resolves forgejo.viktorbarzin.me -> 10.111.111.95 (Traefik ClusterIP)
- recruiter-responder pipeline #20 build-and-push succeeds via ClusterIP

Docs: networking.md (K8s cluster DNS path) + .claude/CLAUDE.md (forgejo
registry quick-ref). Advances beads code-yh33.

[ci skip]

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 07:34:30 +00:00
Viktor Barzin
7302cd7908 infra: untrack generated backend.tf (stale PG creds + .200 literal) [CI SKIP]
terragrunt generates backend.tf per run (remote_state generate,
if_exists=overwrite_terragrunt) from get_env("PG_CONN_STR"); these 72 committed
copies are stale artifacts already covered by .gitignore:65. They held a
plaintext (Vault-rotated, ~expired) PG password + the .200 state-backend literal
and were re-committed by CI on every run. git rm --cached stops that; they
regenerate locally from PG_CONN_STR. The live .200:5432 literal now lives only
in scripts/tg (its single bootstrap source).

Part of the L4 LB-IP review (docs/plans/2026-06-03-lb-ip-hygiene-design.md).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-03 10:52:46 +00:00
Viktor Barzin
7d7a0ad474 infra: fix stale Traefik LB-IP refs + accurate LB-IP registry
Some checks failed
ci/woodpecker/push/build-cli Pipeline was successful
ci/woodpecker/push/default Pipeline was canceled
Part of the L4 LB-IP review (docs/plans/2026-06-03-lb-ip-hygiene-design.md).
The 2026-05-30 Traefik .200->.203 move left consumers pointing at the dead
.200; this fixes the two in-Terraform ones and replaces the stale networking
doc with an accurate registry + a renumber checklist.

- woodpecker: forgejo.viktorbarzin.me hostAlias hardcoded 10.0.20.200
  (.200:443 refuses TLS now; the next woodpecker apply would re-pin it and
  break pipeline creation). Now reads the Traefik ClusterIP dynamically via a
  kubernetes_service data source -- cannot rot on a future renumber and avoids
  the ETP=Local hairpin trap.
- monitoring: ViktorBarzinApexDrift alert summary said "expected 10.0.20.200"
  -> 10.0.20.203 (cosmetic; alert logic already correct).
- docs/architecture/networking.md: rewrote the MetalLB section (it wrongly had
  KMS on .200, mailserver on an LB IP, "two dedicated") into an accurate 4-IP
  registry + LB-IP renumber checklist (in-band + out-of-band consumers).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-03 10:24:25 +00:00
Viktor Barzin
dcb7c74531 url/shlink: fix admin UI — pin shlink-web-client 4.7.1 + port 8080
The shlink-web admin SPA (shlink.viktorbarzin.me) showed "Something went
wrong while loading short URLs". Root cause: the web client was untagged
(:latest) and Keel's 2026-05-26 match-tag rewrite downgraded it to the
ancient 0.1.1 (2019 image), which speaks the removed /rest/v1/authenticate
API (404) and serves nginx on port 80. Backend (shlink:5.0.2) was healthy.

Pin shlink-web-client to 4.7.1 (current stable; :latest/:stable resolve to
it) and align container port + both probes + service target_port to 8080
(the port the 4.x nginx listens on). A clean semver anchor can no longer be
Keel-downgraded to 0.1.1.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-03 10:24:25 +00:00
Viktor Barzin
c7cf21a986 Revert mail LAN-redirect approach; pending VIP-based redesign
The pfSense NAT rdr rules added in f7cf9f07 hardcoded 10.0.20.203
(Traefik LB IP) as the redirect source. That couples mail's LAN
path to Traefik's IP choice — if Traefik moves again (it just
moved .200 → .203 on 2026-05-30), the mail path silently breaks.

Removing the script and the matching doc paragraph; keeping the
networking.md .200 → .203 staleness fix (separate correction).

Follow-up: give the mail HAProxy listener a dedicated pfSense
Virtual IP (IP Alias on opt1), update Technitium internal zone
+ WAN port-forwards to target the VIP, so mail's LAN-side path
is decoupled from any other service's LB IP.
2026-06-03 10:24:25 +00:00
Viktor Barzin
922d95af9c Reapply "tripit: Gmail ingest (12-month) + vbarzin owner + plans@ forward-to-parse"
This reverts commit a82ba46ad83e85a231d839564c2f009c700dc4d1.
2026-06-03 10:24:25 +00:00
Viktor Barzin
f0843e398b Revert "tripit: Gmail ingest (12-month) + vbarzin owner + plans@ forward-to-parse"
This reverts commit 4cc9229e716b6683418a148a0f896442d5ab07ad.
2026-06-03 10:24:25 +00:00
Viktor Barzin
0c7ec3d470 tripit: Gmail ingest (12-month) + vbarzin owner + plans@ forward-to-parse
Reconciles the tripit stack source with live state and adds the forward
flow. Ingest now polls vbarzin@gmail.com [Gmail]/All Mail read-only over a
rolling 12-month X-GM-RAW travel-sender window (Croatia Jet2 refs excluded),
filing trips under MAIL_DEFAULT_OWNER_EMAIL=vbarzin@gmail.com (Viktor's
Authentik login identity). Adds an ingest-plans CronJob that polls spam@
filtered to To:plans@viktorbarzin.me (the @viktorbarzin.me catch-all target)
so forwarded bookings are extracted and attached to the matching trip;
IMAP_PASSWORD is overridden per-job to spam@'s creds (PLANS_IMAP_PASSWORD).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-03 10:24:25 +00:00
Viktor Barzin
fd35c4f303 pfSense: LAN-side NAT redirect for mail ports landing on Traefik LB IP
Technitium's split-horizon rewrites *.viktorbarzin.me to 10.0.20.203
(Traefik LB) for the 192.168.1.0/24 Barzini WiFi (TP-Link router has
no hairpin NAT). The rule is name-agnostic so mail.viktorbarzin.me
(and imap./smtp.) get sent to .203 too — where Traefik does not
listen on 25/465/587/993. iOS Mail on Barzini WiFi silently hangs
while Roundcube (port 443 via Traefik) keeps working.

Adds pfSense NAT rdr rules so traffic to 10.0.20.203:{25,465,587,993}
gets redirected to 10.0.20.1 (the mail HAProxy listener already
serving the public path). Loaded on every incoming interface by
pfSense rule generation, so any LAN/VPN client falling into the
split-horizon answer lands on the right service unchanged.

Includes an idempotent reproducer script (mirrors the existing
pfsense-haproxy-bootstrap.php pattern), the networking.md mail
carve-out paragraph, and a fix for the stale .200 → .203 reference.
2026-06-03 10:24:25 +00:00
Viktor Barzin
ff26d1c957 openclaw: give recruiter-api plugin the Telegram bot token so it can announce
The recruiter-api plugin's announceEvent() sends recruiter cards to Telegram
via OPENLOBSTER_CHANNELS_TELEGRAM_TOKEN (its fallback path, since OpenClaw
doesn't pass api.bot to "kind: tools" plugins). That env was never set in the
container, so every hourly poll threw on the send, events were never marked
consumed, and no Telegram notification ever went out — the rest of the
"recruiter pipeline has no responses" problem (the GPU/triage half was fixed
separately). Wire it from openclaw-secrets.telegram_bot_token (same token as
channels.telegram.botToken). Verified: the 3 backlogged events were announced
+ consumed on the openclaw restart.

Drafting (the /api/draft 500 that also degraded the cards) was fixed in
parallel by swapping Vault secret/recruiter-responder gpt_mini_model from the
slow/timing-out qwen3-coder-480b to meta/llama-3.3-70b-instruct (~1.6s).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-03 10:24:25 +00:00
root
c85533d2d9 Woodpecker CI deploy [CI SKIP] 2026-06-03 10:24:25 +00:00
Viktor Barzin
982dc9e63a openclaw: task-webhook ingress auth required->none (inbound Forgejo webhook)
The task-webhook host is an inbound webhook receiver: Forgejo (a machine
with no Authentik SSO cookie) POSTs issue/comment events, so forward-auth
302-bounced every delivery and silently dropped all webhooks. Flip only
this ingress to auth=none; the do_POST handler gates on payload action +
bot-user filtering. Gateway (openclaw) and openlobster stay auth=required.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-03 10:24:25 +00:00
root
91d110acf5 Woodpecker CI deploy [CI SKIP] 2026-06-03 10:24:24 +00:00
Viktor Barzin
fde2d19bf7 trading-bot: ingress auth required->app (app has own WebAuthn/JWT)
The app ships complete auth — WebAuthn/passkey (RP_ID=trading.viktorbarzin.me)
+ JWT bearer on every /api/* route + a /ws?token=<JWT> WebSocket. Authentik
forward-auth on / was 302-bouncing the WebAuthn XHR flow and the WS upgrade,
making the app unusable. Flip to auth = "app" so the backend's own auth is the
gate (same-origin SPA + bearer-token API, same pattern as immich). Verified all
11 route modules enforce Depends(get_current_user) and dev_mode defaults False
before flipping.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-03 10:24:24 +00:00
Viktor Barzin
e18e0d51a0 uptime-kuma: public status pages + push monitors bypass Authentik
The single uptime ingress gated the ENTIRE site (path "/") behind
Authentik forward-auth, so public-by-design endpoints 302-bounced to
SSO: status pages (/status/<slug>), push-monitor ingest
(/api/push/<key>), status-page API + heartbeat (/api/status-page),
badges (/api/badge), and static assets. Status pages are for
logged-out viewers and push monitors POST from machines — neither can
follow the Authentik OAuth cookie dance, so all were broken.

Fix mirrors the meshcentral agent carve-out (9a15f3f2): add a second
path-scoped ingress_factory (auth="none") pointing at the same
uptime-kuma Service. Traefik routes longest-rule-first, so these
out-prioritise the "/" catch-all; the dashboard (/, /dashboard,
/manage-*, /settings, etc.) stays Authentik-gated via the original
ingress. WebSocket status UI keeps working — the default middleware
chain passes Upgrade/Connection through.
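The following simplified model shows why the carve-out wins. It assumes the longest-rule-first behaviour described above, where a router's default priority is the length of its rule string. The host `uptime.example` and the router names are hypothetical placeholders, and each rule's matching is reduced to a path-prefix lambda:

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Router:
    name: str
    rule: str                       # rule text; its length acts as the default priority
    matches: Callable[[str], bool]  # simplified stand-in for evaluating the rule


def pick_router(routers: List[Router], path: str) -> Optional[Router]:
    """Try routers longest-rule-first and return the first that matches."""
    for r in sorted(routers, key=lambda r: len(r.rule), reverse=True):
        if r.matches(path):
            return r
    return None


# Hypothetical routers mirroring the two ingresses: "/" catch-all vs public carve-out.
dashboard = Router("uptime-kuma",
                   "Host(`uptime.example`) && PathPrefix(`/`)",
                   lambda p: p.startswith("/"))
public = Router("uptime-kuma-public",
                "Host(`uptime.example`) && (PathPrefix(`/status`) || PathPrefix(`/api/push`))",
                lambda p: p.startswith(("/status", "/api/push")))
```

The catch-all matches every path, but it is tried last because its rule is shorter.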

Verified: /status/infra, /api/status-page/{,heartbeat/}infra,
/api/badge no longer 302 (200); / still 302s to authentik.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-03 10:24:24 +00:00
root
17f91f6167 Woodpecker CI deploy [CI SKIP] 2026-06-03 10:24:24 +00:00
Viktor Barzin
bc5aba34b6 meshcentral: fix agent connectivity behind Authentik + TLS-offload Traefik
Two root causes kept all 8 mesh agents (incl. family laptops) offline:

1. The single ingress gated the ENTIRE site (path "/") behind Authentik
   forward-auth, so the agent/relay endpoints (/agent.ashx, /meshrelay.ashx,
   /control.ashx, etc.) got 302-bounced to SSO. Native mesh clients can't do
   the OAuth cookie dance. Fix: add a second ingress_factory (auth="none")
   path-scoped to the agent endpoints, pointing at the same meshcentral
   service. Traefik routes by rule length so these out-prioritise the "/"
   catch-all; the human web UI stays Authentik-gated.

2. After the auth fix, agents reached /agent.ashx but were rejected with
   "Agent bad web cert hash" — MeshCentral pins the OUTER TLS cert, but with
   TLS offload the agent sees Traefik's Let's Encrypt cert (which differs
   between the internal .203 LB and the external Cloudflare path, and rotates
   monthly), not MeshCentral's own webserver cert. Fix: set
   ignoreAgentHashCheck=true in the init-container config so MeshCentral
   echoes back the agent-reported hash. The separate mesh-certificate
   (ServerID) handshake still authenticates the server.

Verified: agent paths no longer 302->authentik; web UI root still does;
laptop "Valia_Laptop" enrolled in group "laptops" and ONLINE.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-03 10:24:24 +00:00
Viktor Barzin
01ea7d6fa1 immich: clip-keepalive CronJob to pin smart-search model warm
MACHINE_LEARNING_MODEL_TTL=600 is a single global knob, so it unloads the
CLIP textual (smart-search) encoder after idle exactly like OCR/face —
immich has no per-model pin. This CronJob pings the textual encoder every
5 min (< the 600s TTL) via immich-ml /predict, so a search query never
pays the ~1.5s cold-load, while idle OCR/face still free their VRAM on the
shared T4. Textual-only (search = text->embedding->pgvector); the visual
encoder is import-time and left to unload. curl baked into the image (no
runtime install).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-03 10:24:24 +00:00
Viktor Barzin
f0948493b3 claude-agent-service: wire parallel execution (git-crypt mount, memory, MAX_CONCURRENCY)
The service now runs agent calls concurrently (bounded semaphore, per-job
isolated clones) instead of single-flight. Infra side:
- mount git-crypt-key into the main container (each job re-unlocks its own clone)
- MAX_CONCURRENCY=10 env (excess calls queue FIFO)
- bump pod memory 2Gi req / 12Gi limit, cpu req 1 (Burstable, tier-aux) — sized
  for ~10 concurrent claude+terraform runs; fits node2/3/5 headroom
- docs: beads-auto-dispatch + automated-upgrades no longer describe single-slot
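The service code lives in viktor/claude-agent-service and is not shown here. The following sketch illustrates a bounded semaphore with queued overflow using asyncio, which is an assumption rather than necessarily what the service uses. `asyncio.Semaphore` keeps its waiters in a queue and wakes them in arrival order, which matches the FIFO behaviour described.

```python
import asyncio

MAX_CONCURRENCY = 10


async def run_job(job_id: int, sem: asyncio.Semaphore, stats: dict) -> int:
    async with sem:  # callers beyond the limit wait here until a slot frees up
        stats["running"] += 1
        stats["peak"] = max(stats["peak"], stats["running"])
        try:
            await asyncio.sleep(0.01)  # stand-in for one agent run in its own clone
            return job_id
        finally:
            stats["running"] -= 1


async def main(n_jobs: int, limit: int = MAX_CONCURRENCY):
    sem = asyncio.Semaphore(limit)
    stats = {"running": 0, "peak": 0}
    results = await asyncio.gather(*(run_job(i, sem, stats) for i in range(n_jobs)))
    return results, stats["peak"]
```

With 25 jobs and a limit of 10, at most 10 run at once and the remaining 15 queue.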

Service code: viktor/claude-agent-service @ 66104a3.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-03 10:24:24 +00:00
Viktor Barzin
16763464cd job-hunter dashboard: role panels now respect the $location filter
All checks were successful
ci/woodpecker/push/default Pipeline was successful
ci/woodpecker/push/build-cli Pipeline was successful
The role panels (Top roles, Top companies by role volume, New roles/day,
Roles by source, Salary distribution) had no location filter, so they showed
all locations regardless of the $location dropdown. Add
'primary_location IN (${location:sqlstring})' to each (matching the comp
panels' pattern). Also switch the 'Your comp vs the market' panel from
hardcoded 'london' to the same $location filter for consistency. Data was
fine (all london-tagged roles genuinely contain 'london').
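Grafana's `sqlstring` variable format quotes each selected value, doubles any embedded single quote, and joins the values with commas. The following Python sketch reproduces that behaviour to show what the added clause expands to. The location values are examples only.

```python
def sqlstring(values):
    """Mimic Grafana's ${var:sqlstring}: quote each value, escape ' as '', join with commas."""
    return ",".join("'" + v.replace("'", "''") + "'" for v in values)


def location_filter(values):
    """Build the panel filter clause for the selected $location values."""
    return f"primary_location IN ({sqlstring(values)})"
```

For example, `location_filter(["london", "remote"])` gives `primary_location IN ('london','remote')`.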

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-02 23:35:25 +00:00
Viktor Barzin
7a7abe4cbe uk-payslip dashboard: count gross comp on taxable_pay (P60) basis
All checks were successful
ci/woodpecker/push/default Pipeline was successful
ci/woodpecker/push/build-cli Pipeline was successful
The 'Yearly receipt' + 'YTD gross salary' panels summed salary+bonus+rsu_vest
(rsu_vest = net/partial RSU), understating gross by ~£73k/yr. Switch to
COALESCE(taxable_pay, gross_pay) + pension_sacrifice = true P60 gross (verified:
23/24 -> £286,288, 25/26 -> £416,646, matching the P60 + job-hunter realized
bar). 'Yearly receipt' rsu_gross is now the real gross RSU (£150k/£271k, not
£70k/£128k). Relabel the Sankey RSU inflow 'RSU (net vested)' for honesty;
leave cash-flow/net_pay + the (taxable_pay-based) reconciliation/rate panels.
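The dashboard does this in SQL. The following sketch restates the per-payslip rule in Python. The `Payslip` fields mirror the column names above, and the figures in the test are made up.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Payslip:
    gross_pay: float
    taxable_pay: Optional[float]  # None where the payslip has no taxable_pay value
    pension_sacrifice: float = 0.0


def p60_gross(slip: Payslip) -> float:
    """COALESCE(taxable_pay, gross_pay) + pension_sacrifice for one payslip."""
    base = slip.taxable_pay if slip.taxable_pay is not None else slip.gross_pay
    return base + slip.pension_sacrifice


def yearly_gross(slips: Iterable[Payslip]) -> float:
    """Sum the P60-basis gross over a tax year's payslips."""
    return sum(p60_gross(s) for s in slips)
```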

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-02 23:23:15 +00:00
Viktor Barzin
aa0d6511b2 job-hunter runbook: document two self baselines + taxable_pay gotcha
All checks were successful
ci/woodpecker/push/default Pipeline was successful
ci/woodpecker/push/build-cli Pipeline was successful
Dashboard now shows two 'Me' bars: realized gross (~£409k, from
SUM(payslip taxable_pay) = P60 basis) and package/grant-value (~£267k,
levels.fyi-comparable). Document that gross MUST come from taxable_pay, NOT
salary+bonus+rsu_vest (rsu_vest is net/partial, understates RSU ~50%).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-02 23:13:35 +00:00
Viktor Barzin
50a4ad70f0 job-hunter runbook: self-comp re-seed stores full TC breakdown
All checks were successful
ci/woodpecker/push/default Pipeline was successful
ci/woodpecker/push/build-cli Pipeline was successful
total_value (what the comparison bar uses) must be full TC; document storing
base+bonus+RSU components too so it's verifiable that RSU+bonus are included.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-02 22:23:42 +00:00
Viktor Barzin
deb0dd4778 monitoring: "Your comp vs the market" panel on Job Hunter dashboard
Add a barchart (panel 10) ranking every company's London p50 total comp
(COALESCE total/base) with the user's current comp shown in line, so it's a
direct "how do I compare" view. The user's figure is NOT hardcoded in the
dashboard JSON — it's a labeled comp_point in the DB (company_slug
'self-current', source 'self', "Me (Meta IC5)"), keeping the sensitive number
out of git. It's below the £500k alert bar (no Slack ping) and ranks too low
to appear in analyze leaders. Runbook documents the panel + how to update the
baseline.

[ci skip] — dashboard ConfigMap applied locally (targeted).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-02 21:27:26 +00:00
Viktor Barzin
74313149dd job-hunter: weekly above-target Slack alert CronJob
Add job-hunter-alert CronJob (Sundays 05:00 UTC, an hour after the refresh):
`python -m job_hunter alert --threshold 500000 --location london --slack`
posts to Slack the companies whose London p50 total comp >= £500k, flagging
any that newly crossed since last week's snapshot. SLACK_WEBHOOK_URL wired via
the job-hunter-secrets ExternalSecret from Vault secret/job-hunter
slack_webhook_url (seeded from the shared workspace webhook; repointable to a
dedicated channel). Runbook gains an "above-target Slack alert" section.
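The alert logic lives in job_hunter and is not shown here. The following sketch shows the threshold-plus-weekly-diff idea. The company slugs and figures are hypothetical. It also assumes that "newly crossed" means at or above the threshold this week but absent from or below it in last week's snapshot.

```python
THRESHOLD = 500_000


def above_target(p50_by_company: dict, threshold: int = THRESHOLD) -> set:
    """Companies whose London p50 total comp meets or exceeds the threshold."""
    return {c for c, tc in p50_by_company.items() if tc >= threshold}


def alert_summary(this_week: dict, last_week: dict, threshold: int = THRESHOLD):
    """Return (all companies above target, companies that newly crossed)."""
    now = above_target(this_week, threshold)
    before = above_target(last_week, threshold)
    return sorted(now), sorted(now - before)
```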

[ci skip] — applied locally (stack-scoped).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-02 20:49:42 +00:00
Viktor Barzin
5dc5cd53c0 url/shlink: ingress url.viktorbarzin.me auth required -> none
Some checks failed
ci/woodpecker/push/build-cli Pipeline was successful
ci/woodpecker/push/default Pipeline was canceled
Authentik forward-auth on the shlink REST API + short-link domain
(url.viktorbarzin.me) 302s shlink-web's cross-origin API XHR (CORS
preflight) and SSO-bounces every public short link. Result: the admin
UI showed "Something went wrong while loading short URLs" and short
links never resolved for logged-out clients.

The shlink REST API is self-gated by its X-Api-Key and short links are
public by design, so Authentik must not front this domain. CrowdSec +
rate-limit + anti-AI bot-block still apply. The admin web UI
(shlink.viktorbarzin.me) stays auth=required via module.ingress-web.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-02 20:37:33 +00:00