Commit graph

4519 commits

Author SHA1 Message Date
Viktor Barzin
6f3cfb18c7 state(external-secrets): update encrypted state 2026-06-23 09:41:46 +00:00
Viktor Barzin
6e8e066215 state(external-secrets): update encrypted state 2026-06-23 09:40:14 +00:00
Viktor Barzin
de1fb04d9f state(external-secrets): update encrypted state 2026-06-23 09:39:12 +00:00
Viktor Barzin
606cfdb544 state(external-secrets): update encrypted state 2026-06-23 09:38:12 +00:00
Viktor Barzin
72464e7880 state(external-secrets): update encrypted state 2026-06-23 09:37:11 +00:00
Viktor Barzin
7061b1dfc6 state(external-secrets): update encrypted state 2026-06-22 20:55:27 +00:00
Viktor Barzin
e2f328ff4a state(external-secrets): update encrypted state 2026-06-22 20:45:24 +00:00
Viktor Barzin
a735be9ba4 state(external-secrets): update encrypted state 2026-06-22 20:45:08 +00:00
Viktor Barzin
c670cb7118 eso: Phase 2 — migrate all 104 ExternalSecrets + 2 ClusterSecretStores to v1
Some checks failed
ci/woodpecker/push/default Pipeline failed
The API rewrite half of the ESO 0.12->2.6 migration (last k8s-1.35 compat-gate
blocker). Done on chart 0.16.2, which serves BOTH external-secrets.io/v1beta1
and v1, so this is the safe window — MUST land before 0.17 removes v1beta1
(there is no conversion webhook). Pure apiVersion bump, schema is byte-identical:
106 occurrences (104 ExternalSecrets + 2 ClusterSecretStores vault-kv/vault-database)
across 73 .tf files, v1beta1 -> v1, no other field changes.

Validated live first on tandoor (single, non-coupled, synced ES): the
kubernetes_manifest apiVersion bump forces a REPLACE; the target Secret is
cascade-GC'd for ONE ~0.3s poll then ESO recreates it (identical value re-synced
from Vault, new UID) and the ES returns SecretSynced=True on v1. Running pods
keep their mounted copy through the sub-second blip. All 110 target Secrets were
snapshotted to /tmp first as a backstop.

CI applies the changed stacks serially (staged rollout); watching aggregate ES
sync back to 108 synced (2 pre-existing dead: instagram-poster, payslip-ingest).
Next: Phase 3 climb 0.16.2 -> 2.6.0.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 19:13:04 +00:00
Viktor Barzin
98cd535b97 authentik: lock chrome.viktorbarzin.me noVNC to Viktor only
All checks were successful
ci/woodpecker/push/default Pipeline was successful
The chrome-service noVNC exposes Viktor's live logged-in browser sessions
(Instagram etc. — he'll sign in there for homelab browser to reuse). It was
auth="required" = any authenticated user, and "Home Server Admins" includes emo
(emil.barzin@gmail.com), so the admin group is not a sufficient gate. Add a
host-specific case to the domain-wide forward-auth restriction allowing only
Viktor's accounts (vbarzin@gmail.com + akadmin break-glass); everyone else,
incl. emo, is denied at the noVNC. emo's AGENT already can't reach the browser
(read-only RBAC blocks port-forward); this closes the human noVNC path.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 18:09:27 +00:00
Viktor Barzin
a3cdc0d6d0 chrome-service: size headed Chrome window to fill Xvfb (noVNC cut-off)
All checks were successful
ci/woodpecker/push/default Pipeline was successful
The noVNC view showed the browser in the top-left with the rest of the
framebuffer black. Cause: Chrome launched with no --window-size, and there's no
window manager, so it opened at its profile-persisted (smaller) size inside the
1280x720 Xvfb. Add --window-size=1280,720 --window-position=0,0 so the window
fills the screen on every launch (fresh pods/profiles too). Live windows were
already resized via CDP as a stopgap.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 18:00:20 +00:00
Viktor Barzin
c7ead032ec chrome-service: fix noVNC stuck-"Connecting" (x11vnc fd-sweep under nofile=2^31)
Some checks failed
ci/woodpecker/push/default Pipeline was successful
Build chrome-service-novnc / build (push) Has been cancelled
The noVNC view hung on "Connecting" forever then timed out. Root cause: x11vnc
sweeps the entire fd table (fcntl per fd) on every client connection, and
containerd grants pods RLIMIT_NOFILE=2^31, so the RFB handshake never completes
(websockify accepts the WS and dials localhost:5900, but x11vnc never sends its
banner — verified: handshake timed out at 8s, x11vnc had burned 1h41m CPU
spinning). Same bug + fix the android-emulator stack already carries.

Cap nofile before x11vnc starts, in two places:
- files/novnc/entrypoint.sh: `ulimit -n 65536` (root fix, makes the image correct)
- main.tf novnc container: `command = ["bash","-c","ulimit -n 65536; exec /entrypoint.sh"]`
  so the cap applies deterministically on rollout even though the image is
  :latest/IfNotPresent (a rebuilt entrypoint isn't guaranteed to be re-pulled).

Also documents the gotcha + diagnosis in docs/architecture/chrome-service.md and
notes the black-when-idle behaviour + the autoconnect URL.

(A live x11vnc relaunch with the cap already unblocked the running pod; this
makes it survive restarts.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 17:34:03 +00:00
Viktor Barzin
20ca5ee624 tripit: REEL_PROVIDER=anonymous — actually fetch reels (was fake canned caption)
All checks were successful
ci/woodpecker/push/default Pipeline was successful
REEL_PROVIDER was unset, so the reel pipeline used FakeReelExtractor, which returns
a CANNED caption — every pasted (tripit #120) or forwarded reel produced a DUMMY
Saved Place instead of reading the real reel. Set REEL_PROVIDER=anonymous in app_env
(covers the web Deployment + the ingest CronJob) so AnonymousReelExtractor does the
real anonymous read. Verified live from the cluster: yt-dlp fetched a real IG /p/
caption (no IG_GRAPHQL_DOC_ID needed — the internal-API path is an optional
optimisation; yt-dlp fallback works). LLM extraction + Nominatim POI geocoding were
already real (prior commits); this was the last fake link in the chain.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 17:30:47 +00:00
Viktor Barzin
f46b69f372 tripit: enable real LLM + Nominatim on the web Deployment (in-app reel paste #120)
All checks were successful
ci/woodpecker/push/default Pipeline was successful
The web Deployment ran LLM_MODE=fake with no reel geocoder — only the ingest-plans
CronJob had real providers. The in-app reel-URL paste feature (tripit #120) runs
ingest_reel IN the web pod (BackgroundTask), so the Deployment now needs real
extraction: LLM_MODE=llamacpp (qwen3vl-8b; qwen3-8b segfaults on the current
llama-swap image) with the ADR-0033 claude-agent-service fallback, plus
REEL_GEOCODER_PROVIDER=nominatim for venue->city/country POI geocoding. Set in
app_env (feeds the Deployment; the CronJobs already had these via extra_env). Bonus:
this also un-fakes the in-app booking *share* import, which used the same fake LLM.
MAIL_INGEST_ENABLED stays false on the Deployment (only the CronJob polls mail).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 16:50:04 +00:00
Viktor Barzin
59f2070e56 tripit: switch mail-ingest LLM_MODEL qwen3-8b -> qwen3vl-8b (qwen3-8b segfaults)
All checks were successful
ci/woodpecker/push/default Pipeline was successful
The qwen3-8b GGUF segfaults on load on the current llama-swap :cuda image
("common_init_from_params: failed to create context"; llama-swap returns 502),
which broke ALL tripit mail ingest text extraction — booking emails AND forwarded
reels (status=failed, "no place could be read"). The GGUF isn't corrupt (valid
header, full size, worked for weeks) — it's a llama.cpp/image regression. Rather
than pin the SHARED llama-swap image (cross-user blast radius), repoint the
ingest-plans CronJob at qwen3vl-8b, an already-provisioned 8B model that loads
fine and extracts flight numbers + places reliably. Restores the auto-path
(reels resolve via the Nominatim geocoder; bookings parse again). The broken
qwen3-8b GGUF is a separate, non-urgent llama-cpp cleanup.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 15:52:09 +00:00
Viktor Barzin
7dbbb74163 homelab v0.8.1: frame browser as escalation (default headless), match CLAUDE.md
Some checks failed
ci/woodpecker/push/default Pipeline was successful
Build infra CLI / build (push) Has been cancelled
Make `homelab browser --help` and chrome-service.md state the same tiered rule
now in ~/code/CLAUDE.md: default to the Playwright MCP/headless browser for all
routine automation; reach for `homelab browser` ONLY when headless is blocked
(loads-but-submit-fails / one request errors while siblings 200 / explicit bot
wall). Removes the "co-equal choice" framing so agents have one non-conflicting
instruction. Adds a test asserting the tiered wording so it can't regress.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 15:44:43 +00:00
Viktor Barzin
f96cde35bd tripit: enable Nominatim POI geocoding for reel→Wishlist ingest
All checks were successful
ci/woodpecker/push/default Pipeline was successful
Forwarded reels (tripit ADR-0031) geocode their venue to map a Saved Place to a
country + city, but the reel route was wired to the global geocoder, which here is
GEOCODER_PROVIDER=openmeteo (city-level, name-based). OpenMeteo returns nothing for
a venue query like "Time Out Market, Lisbon" so reels never resolved and no Saved
Place was created. The app fix (tripit 3c62d596) gave the reel route its own
geocoder behind REEL_GEOCODER_PROVIDER; set it to nominatim on the ingest-plans
CronJob (the only one running the reel route) so forwarded reels resolve to real
venue coords + city + country. Isolated from the global geocoder, which stays
openmeteo for weather/tours. Verified Nominatim resolves the venue from the cluster.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 14:59:37 +00:00
Viktor Barzin
a6b52a5839 homelab v0.8.0: browser verbs for headful anti-bot web automation
Some checks are pending
Build infra CLI / build (push) Waiting to run
ci/woodpecker/push/default Pipeline was successful
Add `homelab browser run|open` so agents can drive the cluster's headful
Chrome (chrome-service) over CDP from the devvm. The headless playwright/mcp
browser can load anti-bot sites and fill their forms, but the gated submit
silently fails — e.g. the Stirling Ackroyd Fixflo tenant portal returned
net::ERR_FILE_NOT_FOUND on its pre-submit check and hung, creating nothing.
Driving the real headful Chrome submits first try. That capability already
existed but was undiscoverable, so it cost ~40 min + redundant form re-runs to
find; now it is one command, versioned, test-covered, and `browser --help`
carries the when-to-use signature + an error-code cheat-sheet so the right tool
is reached at the right moment (the failure was judgment, not setup).

- port-forward svc/chrome-service:9222 (tunnels API-server->pod, so it bypasses
  the :9222 NetworkPolicy), assert non-headless via /json/version,
  connect_over_cdp, inject the same vendored stealth.js the in-cluster callers
  use; the port-forward is always torn down, on success and on error.
- node CDP client pinned to playwright-core@1.48.2 to match the v1.48.0-noble
  image (Chromium 130); self-provisioned lazily into ~/.cache/homelab, no
  per-user setup.
- default is a fresh incognito context (safe for the shared browser + concurrent
  callers); --shared-context reuses the warmed persistent profile.
- TDD: cmd_browser_test.go covers arg parsing, headless detection, the version
  pin, the help cheat-sheet, and a stealth.js drift guard. Verified end-to-end
  against bot.sannysoft.com (real Chrome UA, webdriver hidden, plugins/WebGL
  spoofed) and `browser open`.
- docs: README v0.8 section, ADR-0013, and a chrome-service.md "driving from
  outside the cluster" section.

Closes: code-nepg

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 12:22:22 +00:00
Viktor Barzin
de163aa6af workstation: switch devvm OOM backstop from systemd-oomd to earlyoom
All checks were successful
ci/woodpecker/push/postmortem-todos Pipeline was successful
ci/woodpecker/push/default Pipeline was successful
The systemd-oomd backstop added in the previous commit is INERT on this box.
oomd's memory-pressure kill only acts on cgroups doing active reclaim (pgscan
rising); with MemorySwapMax=0 + anonymous agent memory there is nothing to
reclaim, so pgscan stays 0 and oomd never fires. Proven live: a cgroup held at
96-99% memory.pressure for >70s with pgscan=0 was never killed (oomctl + balloon).
The very swap=0 that kills the IO storm also neuters oomd.

Replace it with earlyoom, which watches free RAM (MemAvailable%) and is
swap-independent: SIGTERM the biggest task at 5%, SIGKILL at 3%, swap ignored
(-s 100). It --avoids sshd/systemd/dockerd/containerd/t3-dispatch/tmux (the
admin's way in always survives) and --prefers the agent/browser hogs. Verified
via --dryrun: fires on the RAM threshold and selects a chrome process, not a
protected daemon.

The per-cgroup caps (MemoryHigh=12G/MemoryMax=16G/MemorySwapMax=0 per user,
docker.slice 8G) are unchanged and remain the PRIMARY guard — earlyoom is the
aggregate net for the rare all-users-maxed case. systemd-oomd purged; its config
+ ManagedOOM drop-ins removed. Post-mortem updated with the finding.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 10:39:16 +00:00
Viktor Barzin
3a59f4a8bf workstation: per-user memory caps + systemd-oomd backstop on devvm
All checks were successful
ci/woodpecker/push/postmortem-todos Pipeline was successful
ci/woodpecker/push/default Pipeline was successful
The shared devvm keeps overloading and had to be hard-killed again today
(2026-06-22): a runaway in one user's ssh/tmux session (a 10G ugrep, plus
stacked max-effort agents) grew unbounded, spilled into the disk swap, and
swap-thrashed the throttled virtual disk into an IO storm until the box wedged.

Root cause: ssh/tmux work runs under user-<uid>.slice, left memory-uncontained
by the explicit 2026-06-10 "swap-only" decision, while only the t3-serve tree
was capped. So one user could starve everyone.

This bounds every user on BOTH trees (MemoryHigh=12G, MemoryMax=16G,
MemorySwapMax=0 so work OOMs locally at its ceiling instead of thrashing swap),
adds a systemd-oomd PSI backstop that sheds the single worst work cgroup under
box-wide pressure while leaving system.slice (sshd/services/your way in)
protected, gives system.slice a fair-share CPU/IO priority edge, and routes
docker containers into a capped, oomd-policed docker.slice so they can't dodge
the caps or mis-target oomd. All durable in setup-devvm.sh so a VM rebuild
reproduces them; systemd-oomd added to packages.txt.

Applied live and verified: oomctl shows the backstop armed (not dry-run) on the
work slices with system.slice protected; a capped-balloon stress test OOM-killed
locally at the ceiling with swap flat (no thrash).

Post-mortem: docs/post-mortems/2026-06-22-devvm-mem-io-overload-containment.md

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 10:25:09 +00:00
Viktor Barzin
2169e0de5f workstation: harden memory hooks — prune dead plugin refs + homelab-CLI-only store
All checks were successful
ci/woodpecker/push/default Pipeline was successful
wire-memory-hooks.py now PRUNES any settings.json hook still pointing at the
retired claude-memory plugin (plugins/claude-memory/hooks/) before the additive
pass. install_memory() rm -rf's that dir, so those entries are dangling — and a
missing UserPromptSubmit hook exits 2, a BLOCKING error that erases the prompt
and froze emo's sessions (2026-06-22). The plugin shares basenames with the
homelab hooks, so the old additive-only logic saw the dead plugin path as
"already present" and skipped installing the real ~/.claude/hooks/ copy; pruning
first fixes that. Verified against emo's exact original config: yields the
correct 4-hook set, drops the dead PermissionRequest entry, idempotent on rerun.

auto-learn.py now stores via the `homelab memory` CLI only — dropped the direct
HTTP path and the local-SQLite fallback (memory is homelab-CLI-only per Viktor;
never local files). No-ops silently when no API key is in env (e.g. ancamilea).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 09:24:42 +00:00
Viktor Barzin
aeed461591 Revert "feat(monitoring): Tempo + OTel Collector for tripit tracing, hardened (ADR-0032 Phase 2)"
All checks were successful
ci/woodpecker/push/default Pipeline was successful
This reverts commit 1595bddfc2.
2026-06-22 08:31:17 +00:00
Viktor Barzin
1595bddfc2 feat(monitoring): Tempo + OTel Collector for tripit tracing, hardened (ADR-0032 Phase 2)
Some checks failed
ci/woodpecker/push/default Pipeline failed
Re-land Phase 2 after the first attempt's two failure modes, both fixed:
- tempo.resources set under the correct single-binary chart key (was OOMKilled on
  the namespace LimitRange default when mis-placed at top level).
- atomic=true + cleanup_on_fail=true on BOTH helm releases — a failed install
  auto-rolls-back instead of leaving a stuck/orphaned release (memory #6479).

Tempo (single-binary, proxmox-lvm 20Gi, 30d) + OTel Collector (contrib; otlp ->
redaction -> batch -> tempo) + Tempo datasource + additive trace_id->Tempo
derivedField on Loki + tripit LOG_FORMAT=json/OTEL_EXPORTER_OTLP_ENDPOINT.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 08:17:59 +00:00
Viktor Barzin
a0897de7c3 workstation: document homelab-memory hooks + provisioner self-deploy [ci skip]
multi-tenancy.md never mentioned the homelab-memory hooks rollout and still
listed claude_memory credential injection as purely "future". Document what is
actually true now: install_memory provisions the recall/auto-learn/compaction
hooks per user, the provisioner binary self-deploys from the repo (step 0), the
set -e abort fix, and that the hooks no-op without a MEMORY_API_KEY in env (CLI
defaults the URL) — emo has a key, ancamilea is keyless until one is minted.
Also clarify setup-devvm.sh's binary install is now bootstrap-only (ongoing
edits self-deploy).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 08:04:38 +00:00
Viktor Barzin
92f35550f2 workstation: self-deploy t3-provision-users from the repo each reconcile [ci skip]
Root cause of emo's lost memory: nothing redeployed /usr/local/bin/t3-provision-users
except the manual setup-devvm.sh, so the homelab-memory rollout (44562535/9aa2438e,
Jun 21) sat committed-but-undeployed for a day — the hourly reconcile kept running the
pre-memory binary and never wired the new memory hooks for emo/anca.

Close the gap the same way the script already treats managed-settings.json and
start-claude.sh (sync_managed_config / deploy_user_launcher): the repo is the
authoring surface. At the top of the run, if the repo copy differs from the deployed
binary, install it and re-exec the fresh one. Guards: a re-exec env flag (no loop),
bash -n (never deploy a broken script), DRY_RUN (no mutation), cmp (no churn when
unchanged). Verified across all four paths in isolation.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 08:02:31 +00:00
Viktor Barzin
0b11a28d66 workstation: stop install_memory aborting the reconcile under set -e
install_memory (added in 44562535) ended with `[[ -d <plugin-dir> ]] && rm && log`
and guarded a chmod with a bare `[[ -f settings ]] && chmod`. When the plugin dir
or settings file is absent — the normal case for users who never had the
claude-memory plugin — those return non-zero, and under `set -euo pipefail` the
function returns non-zero and kills the whole hourly reconcile after the FIRST
user, before the rest are processed.

It never fired before because the rollout was committed but the deployed
/usr/local/bin/t3-provision-users was never updated, so install_memory had never
run. On first real run it aborted right after ancamilea, so emo (and wizard)
never got their memory hooks wired — the reason emo's sessions lost memory. Wrap
the cleanup in an if-block, guard the chmod, and end the function with return 0.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 07:59:47 +00:00
Viktor Barzin
464e0bfb97 Revert "feat(monitoring): Tempo + OTel Collector for tripit tracing (ADR-0032 Phase 2)"
All checks were successful
ci/woodpecker/push/default Pipeline was successful
This reverts commit 7513468a2d.
2026-06-22 06:46:56 +00:00
Viktor Barzin
72dcb125d5 Revert "fix(monitoring): tempo OOMKilled — move resources under tempo.resources"
This reverts commit a02782d11f.
2026-06-22 06:46:56 +00:00
Viktor Barzin
a02782d11f fix(monitoring): tempo OOMKilled — move resources under tempo.resources
Some checks failed
ci/woodpecker/push/default Pipeline failed
Pipeline #315 failed: tempo-0 CrashLoopBackOff / OOMKilled (exit 137). The
single-binary grafana/tempo chart (v1.24.4) takes container resources at
tempo.resources, not a top-level resources: — so my block was ignored and the pod
fell to the namespace LimitRange default and OOMed. Set tempo.resources explicitly
(req 256Mi / limit 2Gi). tripit + existing monitoring were unaffected throughout.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 06:44:31 +00:00
Viktor Barzin
7513468a2d feat(monitoring): Tempo + OTel Collector for tripit tracing (ADR-0032 Phase 2)
Some checks failed
ci/woodpecker/push/default Pipeline failed
Stand up the cluster's first trace store + OTLP ingress so tripit's OpenTelemetry
spans (Phase 1, already live in prod) export and correlate with logs:
- Grafana Tempo (single-binary, filesystem on proxmox-lvm 20Gi, 30d)
- OTel Collector (contrib; otlp -> redaction deny-list backstop -> batch -> tempo)
- Grafana: a Tempo datasource + an ADDITIVE trace_id->Tempo derivedField on the
  Loki datasource (no uid change, so existing dashboards are unaffected)
- tripit deployment: LOG_FORMAT=json + OTEL_EXPORTER_OTLP_ENDPOINT -> the Collector

Additive (new helm releases; Loki/Prometheus/Grafana untouched). Offline
'terraform validate' clean; full plan+apply runs in CI (locked git-crypt blocks a
local plan as non-admin).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 06:31:11 +00:00
Viktor Barzin
1a32c07ffe docs(eso): Phase 1 done (0.16.2) + confirmed Phase 2 GC findings
All checks were successful
ci/woodpecker/push/default Pipeline was successful
Execution log added to the ESO migration plan. Phase 1 complete: ESO at 0.16.2
(both v1beta1+v1 served). Phase 2 findings confirmed live: apiVersion bump forces
a kubernetes_manifest REPLACE, and ESO ESs use creationPolicy=Owner (target Secret
ownerRef → cascade-GC risk on the replace's delete). Phase 2 must snapshot Secrets
+ empirically validate GC-survival on the first live ES + per-stack two-phase
-target apply (fallback: state rm + import). Corrected the doc's k8s assumption
(cluster is on 1.34; whole climb stays on 1.34, no interleave).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 20:44:50 +00:00
Viktor Barzin
ac27e41fde Merge remote-tracking branch 'origin/master'
All checks were successful
ci/woodpecker/push/default Pipeline was successful
2026-06-21 20:41:35 +00:00
Viktor Barzin
296deda3b4 eso: Phase 1 — climb chart 0.12.1 -> 0.16.2 (transition version) + atomic
First half of the ESO 0.12->2.6 migration (docs/plans/2026-06-21-eso-0.12-to-2.x-migration-design.md),
clearing the LAST k8s-1.35 compat-gate blocker. Stepped one minor at a time on
k8s 1.34 (no k8s interleave — cluster already on 1.34, ESO bands are conservative
tested ranges not hard limits): 0.12.1 -> 0.13.0 -> 0.14.4 -> 0.15.1 -> 0.16.2.
Each hop applied + verified: controller healthy, all 108 live ExternalSecrets
stayed SecretSynced (2 pre-existing dead — instagram-poster, payslip-ingest —
missing Vault data, untouched). Added atomic=true + timeout=600 (ESO had no
rollback safety net). 0.16.2 serves BOTH v1beta1 AND v1 (storedVersions now
["v1beta1","v1"]) — the safe window to rewrite all 104 CRs to v1 (Phase 2) before
0.17 removes v1beta1. State auto-committed per hop by scripts/tg (Tier-0 SOPS).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 20:41:30 +00:00
Viktor Barzin
0cd59d2c55 state(external-secrets): update encrypted state 2026-06-21 20:41:10 +00:00
Viktor Barzin
b8612e788d state(external-secrets): update encrypted state 2026-06-21 20:39:45 +00:00
Viktor Barzin
877e5c73b2 state(external-secrets): update encrypted state 2026-06-21 20:38:34 +00:00
Viktor Barzin
de2250f667 immich-frame: set photo date format to dd/MM/yyyy
All checks were successful
ci/woodpecker/push/default Pipeline was successful
The photo date overlay was showing US-style MM/dd/yyyy — ImmichFrame's built-in default when PhotoDateFormat is unset. Viktor wants UK day/month/year ordering instead. Pin PhotoDateFormat to the date-fns pattern "dd/MM/yyyy" (uppercase MM = month; lowercase mm would render minutes). The config map carries reloader.stakater.com/match, so Reloader restarts the immich-frame pod automatically on apply.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 20:36:43 +00:00
Viktor Barzin
8e6eff03dd state(external-secrets): update encrypted state 2026-06-21 20:36:37 +00:00
Viktor Barzin
0bae025b9b wealth dashboard: spend-down figures in today's money (inflation-adjusted)
All checks were successful
ci/woodpecker/push/default Pipeline was successful
Viktor asked whether the spend-down numbers were inflation-adjusted —
they were not (all nominal). He chose to switch the card to today's
money, so every row now shows constant purchasing power for life.

Each row is a die-with-zero annuity at the REAL rate (1+g)/1.03−1
(3% inflation), spending a constant inflation-adjusted amount (the
actual pounds withdrawn rise with inflation) until net worth hits £0
at age 100:
  • No growth (0%)  → £12/day, £370/mo,   £4,446/yr   (negative real: loses to inflation)
  • Inflation (3%)  → £43/day, £1,315/mo, £15,776/yr  (0% real: holds value)
  • Market (7%)     → £130/day, £3,942/mo, £47,300/yr (~3.9% real)

Title now flags "(today's £)". Same panel/layout; only the SQL, title,
and tooltip changed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 20:13:59 +00:00
Viktor Barzin
3fb6284e2b immich-frame: use 24-hour clock (ClockFormat HH:mm)
All checks were successful
ci/woodpecker/push/default Pipeline was successful
Viktor asked to switch the Immich photo-frame shown on the Portal
kitchen appliance to a 24-hour clock. immichFrame defaults ClockFormat
to 'hh:mm' (12-hour) and we never overrode it, so the frame was showing
12-hour time. Set ClockFormat: "HH:mm" (date-fns 24h token) in the
frame Settings.yml ConfigMap; Reloader restarts the pod on apply.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 20:10:51 +00:00
Viktor Barzin
e89de86af0 wealth dashboard: spend-down table → three growth scenarios
All checks were successful
ci/woodpecker/push/default Pipeline was successful
Viktor wanted the spend-down card to compare three portfolio-growth
scenarios rather than the previous floor-vs-4%-real pair.

The table now has three rows, each a die-with-zero annuity (drain net
worth to £0 by age 100) spending a constant number of ACTUAL (nominal)
pounds, differing only by the assumed nominal growth rate:
  • No growth (0%)      → £43/day,  £1,315/mo, £15,776/yr  (= NW ÷ years)
  • Inflation (3%)      → £106/day, £3,233/mo, £38,792/yr  (NEW)
  • Avg market (7%)     → £220/day, £6,703/mo, £80,435/yr

This keeps the £43 no-growth floor he anchored on. The old third row
was "4% real" (£133) expressed in today's money; it's replaced by the
7%-nominal market row (£220, actual pounds) so all three rows share one
basis (nominal pounds) and are directly comparable. 3%/7% are hardcoded
(one-line SQL edit). Table height 4→5 for the extra row; panels below
shifted down 1.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 20:06:29 +00:00
Viktor Barzin
85d42f2c13 wealth dashboard: merge spend-down tiles into one compact table
All checks were successful
ci/woodpecker/push/default Pipeline was successful
Viktor wanted the six separate spend-down stat tiles consolidated into a
single, more compact card with the figures laid out as rows.

Replaces stat panels 9220-9225 with one table panel (id 9220) in the
Overview row: 2 rows (Floor / 4% real) × 3 columns (per day / month /
year). Same underlying math and live values (£43/£1,315/£15,776 floor;
£133/£4,039/£48,463 at 4% real). w=9 instead of the full-width tile row,
so it takes ~a third of the width.

Note: this intentionally overrides the "table panels live at the bottom"
layout convention — Viktor chose to keep this headline KPI glanceable at
the top of the dashboard rather than scroll for it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 19:55:57 +00:00
Viktor Barzin
63add2a126 feat(tripit): finalize ADR-0028 auth env — AUTH_MODE=normal, trips@ sender, trust XFF
All checks were successful
ci/woodpecker/push/default Pipeline was successful
Now that the native-auth rollout is complete: (1) AUTH_MODE hybrid->normal — the legacy Authentik OIDC-bearer + forward-auth arms were removed in #96, and 'hybrid' already resolved to 'normal' via backward-compat parsing; this makes it explicit and corrects the now-false comment. (2) SMTP_FROM plans@->trips@ — the dedicated native-auth sender; the trips@->spam@ send-as alias is live + verified (RCPT 250). (3) TRUST_FORWARDED_FOR=true — so #95's per-IP signup rate-limit keys on the real client behind Traefik, not the shared ingress pod IP. Env-only; the Deployment image is KEEL_IGNORE_IMAGE (lifecycle-ignored), so this does NOT touch the running image. Reloader restarts the pod to pick up the new env.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 19:50:20 +00:00
Viktor Barzin
166a2bcab4 wealth dashboard: add "spend-down to £0 at 100" stat tiles
All checks were successful
ci/woodpecker/push/default Pipeline was successful
Viktor wanted a glanceable number on the Wealth dashboard for how much
he can spend for the rest of his life — spending the whole net worth
down to zero by age 100.

Adds a third line of six stat tiles to the Overview section, two
equations × three cadences (per day / month / year):

  • FLOOR  — net worth ÷ time remaining to age 100. Treats the money as
    cash (no growth, no inflation): a conservative lower bound.
    ≈ £43/day, £1.3k/mo, £15.8k/yr.
  • 4% REAL — die-with-zero annuity: the constant, inflation-adjusted
    spend that drains the balance to £0 at 100 while it keeps earning
    4% real. PMT = NW·r/(1−(1+r)^−n). ≈ £133/day, £4.0k/mo, £48.5k/yr.

Horizon is today → his 100th birthday (DOB 1998-10-04 → 2098-10-04),
computed live so the figures tick as net worth and the horizon move.
Net worth reuses the existing latest-per-account dav_corrected math, so
the tiles always agree with the "Net worth (current)" stat (pension
included; target £0). The 4% real rate is hard-coded per his "keep it
simple, just a number" steer — a one-line SQL edit to change later.

Layout: tiles inserted at y=9; all sections below shifted down 4 rows.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 19:48:30 +00:00
c830f9f462 Merge pull request 'workstation: wire-memory-hooks as root (fix non-admin wiring)' (#14) from wizard/mem-fix into master
All checks were successful
ci/woodpecker/push/default Pipeline was successful
2026-06-21 17:45:39 +00:00
Viktor Barzin
9aa2438e75 workstation: run wire-memory-hooks as root, not runuser (fix non-admin wiring)
install_memory ran the JSON-merge helper via 'runuser -u $user', but the helper
lives under the admin's mode-700 home ($WORKSTATION_DIR) which non-admin users
can't traverse -> wiring silently failed for emo/anca (hooks copied but never
wired into settings.json). Run the helper as root (it reads both the repo helper
and the user's home) and chown the result back to the user. Verified by the live
all-users rollout: emo + anca now wired correctly.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 17:45:36 +00:00
f318773cb0 Merge pull request 'workstation: homelab-memory for all users (retire claude-memory MCP)' (#13) from wizard/memory-allusers into master
All checks were successful
ci/woodpecker/push/default Pipeline was successful
2026-06-21 17:42:51 +00:00
Viktor Barzin
44562535a2 workstation: provision homelab-memory hooks for all users (retire claude-memory MCP)
Roll the wizard MCP->homelab-CLI memory migration out to every devvm user. Adds
install_memory() to t3-provision-users.sh (mirrors install_playwright: per-user,
idempotent, if-absent, as-the-user): installs the 4 memory hook scripts into
~/.claude/hooks, wires them into settings.json additively (wire-memory-hooks.py
never touches env / the per-user MEMORY_API_KEY), and removes ONLY the
claude_memory MCP + plugin if present. Reuses each user's existing key (no
minting; per-user isolation stays deferred per the 2026-06-07 design). The
homelab CLI hits the same remote HTTP API the MCP used; recall runs via the
homelab-memory-recall.py UserPromptSubmit hook. Shared instructions (rules/skills
symlinked from base; root+infra CLAUDE.md) already cover all users.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 17:42:42 +00:00
Viktor Barzin
79749d7324 Merge remote-tracking branch 'origin/master'
All checks were successful
ci/woodpecker/push/default Pipeline was successful
2026-06-21 17:27:42 +00:00
Viktor Barzin
5e3fe2e8e2 docs(plans): ESO 0.12->2.6 (v1beta1->v1) migration design — the last k8s-1.35 blocker
Design doc for migrating External Secrets Operator off v0.12 (k8s <=1.31), now
the ONLY remaining compat-gate blocker for autonomous k8s 1.35 (kyverno cleared
to 1.18.1 today). Decisive findings: NO v1beta1->v1 conversion webhook, so all
104 ExternalSecrets (across 73 stacks) + 2 ClusterSecretStores must be rewritten
to external-secrets.io/v1 (byte-identical apiVersion bump) while on 0.16.2, BEFORE
crossing 0.17 (which removes v1beta1 — the point of no return). Step one minor at
a time (no skipping); chart==app version; downstream Secrets survive. 5-phase
ordered plan + per-phase rollback + the plan-time data.kubernetes_secret -target
gotcha (15 stacks) + Tier-0/SOPS handling. Plan only — nothing applied.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 17:27:37 +00:00