Commit graph

3103 commits

Author SHA1 Message Date
Viktor Barzin
afafc9928f [docs] Onboarding runbook for new Forgejo repos in Woodpecker 2026-05-07 23:29:35 +00:00
Viktor Barzin
5b255cf6f2 state(vault): update encrypted state 2026-05-07 23:29:35 +00:00
Viktor Barzin
108bef7b1a f1-stream: subreddit extractor scans r/motorsportsstreams2 (active sub)
User asked specifically for r/motorsportstreams. Reddit banned that sub
years ago; the active 12.5k-subscriber successor is r/motorsportsstreams2.
Added it to SUBREDDITS plus r/f1streams (709 subs, public).

Also extended:
- SEARCH_QUERIES with three Sky Sports F1 / live-stream phrases that
  catch the `[F1 STREAM]` post pattern the community uses on race
  weekends (titles like "[F1 STREAM] Bahrain GP - Live Race | No Buffer
  | Mobile Friendly" linking to boxboxbox.pro/stream-1).
- _INTERESTING_HOSTS allowlist with boxboxbox.{pro,live,lol},
  pitsport.live, ppv.to, streamed.pk, acestrlms/aceztrims, and the
  Super Formula direct CDNs (racelive.jp, cdn.sfgo.jp) — all observed
  in last-50-posts on r/motorsportsstreams2.

Where this leaves us, honestly:
- The r/motorsportsstreams2 megathread "Where to watch every F1 race"
  recommends EXACTLY the four sites we already pull from: pitsport.xyz,
  streamed.pk, ppv.to, acestrlms. The community has the same broken JW
  Player chain we have for Sky Sports F1 24/7 streams. There is no
  free-and-working alternative they know about.
- boxboxbox.pro (the most-promoted F1 stream domain in race-weekend
  posts) is currently NXDOMAIN; .live is parked, .lol unreachable. The
  domain rotates after takedowns; Reddit posts will surface fresh ones
  when posters share them.
- For F1 specifically: extractor surfaces 2 motomundo.net candidates
  (MotoGP wrappers) and lights up to ~6+ during F1 race weekends as
  posters share fresh boxboxbox/equivalent URLs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-07 23:29:35 +00:00
Viktor Barzin
e110b40a4a monitoring(wealth): monthly contrib-vs-mkt as line chart, not bars
User asked for two lines instead of side-by-side bars at monthly
granularity. Converts panel 25 from barchart to timeseries:

  * type: barchart -> timeseries
  * format: table -> time_series, SELECT month::timestamp AS time
  * drawStyle line, lineWidth 2, fillOpacity 0, showPoints auto
  * Same blue (contributions) / green (market gain) colour overrides

Where the green line rises above the blue line is the visual cue that
the market out-earned new contributions for that month -- the trend
the user wants to track.

Diff is small (15 ins / 28 del) because the bar-chart-only fields
(barRadius, barWidth, groupWidth, stacking, xField, xTickLabelRotation)
are dropped.
2026-05-07 23:29:35 +00:00
Viktor Barzin
84fd752747 monitoring(wealth): monthly contributions vs market gain bar chart
Goal stated by user: see when monthly market gain starts to exceed
monthly contributions, i.e. the inflection point where the market is
out-earning savings rather than the other way around.

New panel id=25 between the annual decomposition (13) and per-account
ROI (14): bar chart with two side-by-side bars per month --
contributions (blue) and market gain (green). Same calculation as
panel 13 but month-grain instead of year-grain. Months where the
green bar dwarfs the blue one are visible at a glance.

SQL: same endpoints CTE pattern as panel 13, with date_trunc('month',
valuation_date) as the grouping key. Uses max_complete cutoff so
partial-today doesn't skew the latest month.

Layout: panels at y >= 75 shifted down by 11 (chart height). New
chart at y=75; panel 14 (per-account ROI) -> y=86; panel 10
(activity log) -> y=96.

Spot check (recent months from PG):
  2025-07: contrib +£5,601    market +£42,295   <- big market month
  2025-09: contrib +£1,501    market +£24,206
  2026-02: contrib +£35,501   market +£41,382
  2026-03: contrib +£5,501    market -£38,483   <- correction
  2026-04: contrib +£73,267   market +£21,448
2026-05-07 23:29:34 +00:00
Viktor Barzin
f1d69b0a7a [wealthfolio] Flip wealthfolio-sync CronJob image to Forgejo
The CronJob has been broken since registry-private lost the
wealthfolio-sync image (last successful run 36+ days ago). The image
is built from /home/wizard/code/broker-sync (the brokerage data sync —
Trading 212, Schwab, Fidelity, IMAP-CSV → wealthfolio).

Set up: viktor/broker-sync repo on Forgejo with .woodpecker/build.yml
that pushes to forgejo.viktorbarzin.me/viktor/wealthfolio-sync. Until
Woodpecker recognises the new repo's webhook, the image was bootstrapped
via 'docker pull viktorbarzin/broker-sync:latest && docker tag … &&
docker push forgejo.viktorbarzin.me/viktor/wealthfolio-sync:latest' so
the CronJob unblocks immediately.
2026-05-07 23:29:34 +00:00
Viktor Barzin
d942a21d93 [woodpecker] Bump server + agent v3.13.0 → v3.14.0
Fixes the 'could not load config from forge: context deadline exceeded'
issue that blocked every Forgejo-triggered pipeline during the
forgejo-registry-consolidation cutover. Helm chart 3.5.1 stays
(no 3.6 yet); only the image tag overrides change.
2026-05-07 23:29:34 +00:00
Viktor Barzin
8c73a0243a [forgejo] Phase 4 final decommission: drop registry-private container + port 5050
Image migration completed (forgejo-migrate-orphan-images.sh ran +
all in-scope images now under forgejo.viktorbarzin.me/viktor/) and
the cluster cutover landed in commit 3148d15d. registry-private is
no longer needed.

* infra/modules/docker-registry/docker-compose.yml — registry-private
  service block removed; nginx 5050 port mapping dropped.
* infra/modules/docker-registry/nginx_registry.conf — upstream
  private block + port 5050 server block removed.
* infra/.woodpecker/build-ci-image.yml — drop the dual-push to
  registry.viktorbarzin.me:5050; only push to Forgejo. Verify-
  integrity step removed (the every-15min forgejo-integrity-probe
  in monitoring covers it). Break-glass tarball step still runs but
  pulls from Forgejo (the only registry left).

The registry-config-sync.yml pipeline will pick this commit up and
sync the new compose+nginx to the VM. Manual final step on the VM:
  ssh root@10.0.20.10 'cd /opt/registry && docker compose up -d --remove-orphans'
to actually destroy the registry-private container — compose does
NOT do orphan removal on a normal up -d.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-07 23:29:34 +00:00
Viktor Barzin
59885c21d0 [claude-memory] Restore truncated main.tf — apply Phase 3 image flip on full file
The Phase 3 commit 3148d15d ran into a disk-full ENOSPC during edit
of stacks/claude-memory/main.tf, and the file was committed truncated
at line 286 mid-string ('Cor instead of 'Core Platform' / closing
braces). terraform validate failed with 'Unterminated template string'.
Restoring the trailing 2 lines + re-applying the
viktorbarzin/claude-memory-mcp:17 → forgejo.viktorbarzin.me/viktor/
claude-memory-mcp:17 cutover that Phase 3 was meant to do.
2026-05-07 23:29:34 +00:00
Viktor Barzin
3f3e5fc954 chrome-service: open NP for Traefik → noVNC sidecar (port 6080)
Existing NetworkPolicy only admitted port 3000 (Playwright WS) from
labelled client namespaces, blocking Traefik's traffic to the noVNC
sidecar on port 6080. The chrome.viktorbarzin.me ingress would hang
forever — page never loads, eventually times out.

Adds a second ingress rule allowing TCP/6080 from the traefik
namespace only. Authentik forward-auth still gates external access
at the Traefik layer.

Also reconciles the noVNC image to the new Forgejo registry path
(:v4 unchanged) — already declared in TF, just live-state drift from
the Phase 3 registry consolidation.

Updates the architecture doc; the previous text still described the
old nginx static health stub that noVNC replaced.
2026-05-07 23:29:34 +00:00
Viktor Barzin
56fbd281c9 [forgejo] Restore registry-private temporarily until image migration completes
The Phase 4 docker-compose + nginx changes I landed earlier dropped
the registry-private container's port-5050 listener BEFORE migrating
the existing images to Forgejo. The registry-config-sync pipeline
applied the new nginx config, breaking pulls from registry-private —
which is the source of every image we still need to copy to Forgejo.

Restore registry-private + the 5050 listener until the migration
script has finished. Subsequent commit will drop them once images
are confirmed in Forgejo.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-07 23:29:34 +00:00
Viktor Barzin
a91bbe189e f1-stream: subreddit extractor finds Reddit '[Watch / Download]' threads
Two fixes for the previously-dormant subreddit extractor + a chrome-browser TARGETS pivot to MotoGP weekend live URLs.

1. **Reddit fetch was 403'd by `Accept: application/json`**. Cluster IP +
   that header trips Reddit's anti-bot fingerprint and returns HTML 403.
   Removing the explicit Accept (default `*/*`) restores HTTP 200 with
   JSON. Confirmed via direct httpx test from the f1-stream pod.

2. **Search the right things**. The community uses a stable
   `[Watch / Download] <Series> <Year> - <Round> | <Event>` post pattern
   with selftext links to admin-curated WordPress sites (motomundo.net
   for MotoGP, sister sites for F1 when active). New extractor:
   - Hits both /new.json and /search.json across r/MotorsportsReplays
     and three smaller motorsport subs.
   - Filters posts where title contains `[watch`, `watch online`, or
     flair = `live`.
   - Extracts URLs from selftext (regex), filters to a positive
     `_INTERESTING_HOSTS` allowlist (motomundo, freemotorsports,
     pitsport, rerace, dd12, etc.) so we don't drown the verifier in
     YouTube/Discord/gofile links.
   - Returns each as embed-type so the chrome-service verifier visits.

3. **chrome_browser.TARGETS pivoted** to the live MotoMundo MotoGP
   French GP iframes (motomundo.top/e/<id> + motomundo.upns.xyz/#<id>)
   while the weekend is on. The previous DD12 NASCAR + Acestrlms F1
   targets were both broken JW Player paths anyway.

State after deploy:
- /streams: 3 verified live (WRC Rally Portugal, NASCAR 24/7, Premier League Darts) — Darts is currently active because UK is mid-match.
- Subreddit extractor surfaces the live MotoMundo URL but the verifier
  marks the WordPress wrapper page playable=False (no top-level <video>
  element; the m3u8 lives in nested iframes). Next iteration: drill the
  verifier into iframe contentDocument and capture from there.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-07 23:29:34 +00:00
Viktor Barzin
4ec40ea804 [forgejo] Phases 3+4+5: cutover, decommission, docs sweep
End of forgejo-registry-consolidation. After Phase 0/1 already landed
(Forgejo ready, dual-push CI, integrity probe, retention CronJob,
images migrated via forgejo-migrate-orphan-images.sh), this commit
flips everything off registry.viktorbarzin.me onto Forgejo and
removes the legacy infrastructure.

Phase 3 — image= flips:
* infra/stacks/{payslip-ingest,job-hunter,claude-agent-service,
  fire-planner,freedify/factory,chrome-service,beads-server}/main.tf
  — image= now points to forgejo.viktorbarzin.me/viktor/<name>.
* infra/stacks/claude-memory/main.tf — also moved off DockerHub
  (viktorbarzin/claude-memory-mcp:17 → forgejo.viktorbarzin.me/viktor/...).
* infra/.woodpecker/{default,drift-detection}.yml — infra-ci pulled
  from Forgejo. build-ci-image.yml dual-pushes still until next
  build cycle confirms Forgejo as canonical.
* /home/wizard/code/CLAUDE.md — claude-memory-mcp install URL updated.

Phase 4 — decommission registry-private:
* registry-credentials Secret: dropped registry.viktorbarzin.me /
  registry.viktorbarzin.me:5050 / 10.0.20.10:5050 auths entries.
  Forgejo entry is the only one left.
* infra/stacks/infra/main.tf cloud-init: dropped containerd
  hosts.toml entries for registry.viktorbarzin.me +
  10.0.20.10:5050. (Existing nodes already had the file removed
  manually by `setup-forgejo-containerd-mirror.sh` rollout — the
  cloud-init template only fires on new VM provision.)
* infra/modules/docker-registry/docker-compose.yml: registry-private
  service block removed; nginx 5050 port mapping dropped. Pull-
  through caches for upstream registries (5000/5010/5020/5030/5040)
  stay on the VM permanently.
* infra/modules/docker-registry/nginx_registry.conf: upstream
  `private` block + port 5050 server block removed.
* infra/stacks/monitoring/modules/monitoring/main.tf: registry_
  integrity_probe + registry_probe_credentials resources stripped.
  forgejo_integrity_probe is the only manifest probe now.

Phase 5 — final docs sweep:
* infra/docs/runbooks/registry-vm.md — VM scope reduced to pull-
  through caches; forgejo-registry-breakglass.md cross-ref added.
* infra/docs/architecture/ci-cd.md — registry component table +
  diagram now reflect Forgejo. Pre-migration root-cause sentence
  preserved as historical context with a pointer to the design doc.
* infra/docs/architecture/monitoring.md — Registry Integrity Probe
  row updated to point at the Forgejo probe.
* infra/.claude/CLAUDE.md — Private registry section rewritten end-
  to-end (auth, retention, integrity, where the bake came from).
* prometheus_chart_values.tpl — RegistryManifestIntegrityFailure
  alert annotation simplified now that only one registry is in
  scope.

Operational follow-up (cannot be done from a TF apply):
1. ssh root@10.0.20.10 — edit /opt/registry/docker-compose.yml to
   match the new template AND `docker compose up -d --remove-orphans`
   to actually stop the registry-private container. Memory id=1078
   confirms cloud-init won't redeploy on TF apply alone.
2. After 1 week of no incidents, `rm -rf /opt/registry/data/private/`
   on the VM (~2.6GB freed).
3. Open the dual-push step in build-ci-image.yml and drop
   registry.viktorbarzin.me:5050 from the `repo:` list — at that
   point the post-push integrity check at line 33-107 also needs
   to be repointed at Forgejo or removed (the per-build verify is
   redundant with the every-15min Forgejo probe).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-07 23:29:34 +00:00
Viktor Barzin
e86efd107a [forgejo] Migration script: exclude empty repos, all-images full mode
Updated to handle the actual situation: wealthfolio-sync and
fire-planner have registry repos but no tags (broken/abandoned
deployments). Skip those with a SKIP marker. Migrate everything
else as a stop-gap until Woodpecker pipelines start producing
Forgejo images on their own.

The image list now covers all private images currently in scope.
2026-05-07 23:29:34 +00:00
Viktor Barzin
874f80ecbe [woodpecker] Persist hostAliases patch via null_resource (chart doesn't expose it)
Helm chart 3.5.1 has no `server.hostAliases` field, so the YAML
addition I made earlier was a no-op. Apply via kubectl patch in a
null_resource keyed on helm revision so it re-asserts on every
chart upgrade. Same pattern as the CoreDNS replicas/affinity patch
in stacks/technitium/.

Without this, every helm upgrade on woodpecker reverts the
hostAliases fix and the Forgejo pipeline triggers start failing
with context-deadline-exceeded again.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-07 23:29:34 +00:00
Viktor Barzin
ff19d86557 [woodpecker] Pin forgejo.viktorbarzin.me to in-cluster Traefik LB
Pipeline triggers from Forgejo were failing with "could not load
config from forge: context deadline exceeded" — Woodpecker's
forge-API fetch path was round-tripping through Cloudflare via the
public IP, hitting 30s deadline timeouts on cold connections. The
in-cluster path via the Traefik LB (10.0.20.200) is consistently
sub-100ms.

Same trick we use for the containerd hosts.toml redirect on each
node — Traefik serves the *.viktorbarzin.me wildcard cert so SNI
verification still passes. OAuth callbacks still use the public
hostname (correct, those come from the user's browser).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-07 23:29:34 +00:00
Viktor Barzin
a0b70482fe [forgejo] Bump webhook DELIVER_TIMEOUT 5s -> 30s
Forgejo→Woodpecker webhooks were timing out on first request after
pod restart. The default 5s deadline is too tight for the cold
Cloudflare-tunnel TLS handshake (observed 6-8s). 30s comfortably
covers retries.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-07 23:29:34 +00:00
Viktor Barzin
83496f6e0c [forgejo] Allow webhook delivery to ci.viktorbarzin.me + *.viktorbarzin.me
The Forgejo→Woodpecker webhook (so Woodpecker fires on each push to
viktor/<repo>) was being blocked by the existing ALLOWED_HOST_LIST
of *.svc.cluster.local — ci.viktorbarzin.me resolves to the public IP
because Cloudflare proxying wasn't covering that path. Without this
fix, no Woodpecker pipeline run was triggered on push, the dual-push
bake would never start, and Forgejo's package catalog stays empty.

Add ci.viktorbarzin.me explicitly + *.viktorbarzin.me as a future-
proofing wildcard. The list still excludes arbitrary external hosts,
so this is not a security regression — just unblocking the webhook
to our own CI.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-07 23:29:34 +00:00
Viktor Barzin
76d2d0e536 [forgejo] Add chrome-service-novnc:v4 to orphan-image migrator 2026-05-07 23:29:34 +00:00
Viktor Barzin
413ceec35c [forgejo] securityContext.fsGroup=1000 so /data is writable to forgejo
Phase 0 enabled packages but the pod crashloops on
`mkdir /data/tmp: permission denied` — Forgejo loads the chunked
upload path (default /data/tmp/package-upload) before s6-overlay
gets a chance to chown /data. fsGroup tells kubelet to recursively
chown the volume to GID 1000 on mount, which fixes it.

Pre-23-day Forgejo deployed with packages off so this code path
never ran.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-07 23:29:34 +00:00
Viktor Barzin
3fb05825d8 [forgejo] Drop the FORGEJO__packages__CHUNKED_UPLOAD_PATH override
Setting it to /data/tmp/package-upload triggers a CrashLoopBackOff
because /data is the volume mount root and is owned by root, not
the forgejo user (uid 1000) — Forgejo can't `mkdir /data/tmp`.

The default value resolves under the AppDataPath (a subdir Forgejo
itself owns) which works fine. Keep the ENABLED=true override; v11
ships packages on but explicit is safer.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-07 23:29:34 +00:00
Viktor Barzin
d67e8ddaf8 f1-stream: add chrome-browser, subreddit, dd12 extractors; fix streamed.pk
User asked to broaden the source pipeline so f1-stream can find F1 (and
adjacent motorsport) streams from Sky Sports / DAZN / Reddit / etc.,
using the in-cluster chrome-service headed browser where needed. Four
changes:

1. **streamed.py**: BASE_URL streamed.su → streamed.pk. The .su domain
   stopped serving the API host in 2026 (only the marketing page is
   left); .pk hosts the JSON API now. Adds 3 events/round (currently
   all routed through embedsports.top — see #2 caveat).

2. **chrome_browser.py** (new): generic chrome-service-driven extractor.
   Connects to the existing chrome-service WS (CHROME_WS_URL +
   CHROME_WS_TOKEN env), navigates a list of TARGETS, captures any HLS
   playlist URL the page fetches at runtime, returns one ExtractedStream
   per discovery. Uses the same stealth init script as the verifier so
   anti-bot checks don't trip the page. Handles iframes (DD12-style
   /nas → /new-nas/jwplayer) and probes child-frame <video>/source
   elements after settle. Caveat: most aggregator sites (pooembed,
   embedsports, hmembeds, even DD12's JW Player path) use a broken
   runtime decoder that produces no m3u8 in our environment, so the
   TARGETS list is currently 0-yielding; the framework is the
   contribution and concrete sites can be added as they're discovered.

3. **subreddit.py** (new): scans r/MotorsportsReplays, r/motorsports,
   r/formula1, r/motogp via the public old.reddit.com JSON API for
   posts whose flair/title indicates a live stream. Discovered URLs
   are returned as embed-type streams; the verifier visits each via
   chrome-service to confirm playability. Note: Reddit currently HTTP
   403's our cluster outbound IP for anonymous JSON requests; the
   extractor returns 0 in that state and logs a debug message. Will
   work from any IP Reddit isn't blocking.

4. **dd12.py** (new): inline-HTML scraper for DD12Streams. The site
   embeds `playerInstance.setup({file: "..."})` directly in HTML — no
   JS decoder needed. Currently surfaces NASCAR Cup Series 24/7 (clean
   BunnyCDN-hosted HLS at w9329432hnf3h34.b-cdn.net/pdfs/master.m3u8);
   add new `(path, label, title)` tuples to CHANNELS as DD12 expands.

Result: /streams now shows 2 verified live streams (Rally TV via
pitsport + DD12 NASCAR Cup 24/7). When the next F1 weekend (Canadian
GP, May 22-24) goes live, pitsport will surface F1 sessions
automatically via the existing pushembdz path.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-07 23:29:34 +00:00
Viktor Barzin
a3024d1f51 [docs] Forgejo registry image-rebuild runbook
Companion to forgejo-registry-breakglass.md but for the more common
case: the Forgejo registry is healthy as a whole, but one image's
manifest/blob references are broken (orphan child, half-pushed
upload, retention-vs-pull race). The
RegistryManifestIntegrityFailure alert annotation already points
here.

Mirrors registry-rebuild-image.md (the registry-private equivalent)
in structure: confirm via probe + curl, delete broken version
through Forgejo API, rebuild via Woodpecker manual run, force
consumers to re-pull, verify integrity recovery.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-07 23:29:33 +00:00
Viktor Barzin
fbb41eff9d [ci] Phase 1: infra-ci dual-push + break-glass tarball
Adds Forgejo as a second push target on the build-ci-image pipeline
and saves the just-pushed image as a gzipped tarball on the registry
VM disk (/opt/registry/data/private/_breakglass/) so we can recover
infra-ci with `ctr images import` if both registries are down.

* Dual-push: registry.viktorbarzin.me:5050/infra-ci AND
  forgejo.viktorbarzin.me/viktor/infra-ci, in the same
  woodpeckerci/plugin-docker-buildx step. Same image bytes; the
  Forgejo integrity probe (every 15min) catches any divergence.
* Break-glass step: SSHes to 10.0.20.10, docker pulls + saves +
  gzips, keeps last 5 tarballs (latest symlink). Failure-tolerant
  so a transient registry blip doesn't fail the build pipeline.
* Runbook docs/runbooks/forgejo-registry-breakglass.md documents
  the recovery flow (when to use, scp+ctr import, node cordon,
  underlying-issue fix).

Tarball mirrors to Synology automatically through the existing
daily offsite-sync-backup job — no new sync wiring needed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-07 23:29:33 +00:00
Viktor Barzin
70ea1cf6fd [forgejo] Tolerate missing Vault keys during Phase 0 bootstrap
Wrap the three new Vault key reads in try(...) so the first apply
succeeds even when forgejo_pull_token / forgejo_cleanup_token /
secret/ci/global haven't been populated yet. Without this, CI
auto-apply blocks on the very push that introduces the references —
chicken-and-egg with the runbook order (which is: apply Forgejo bumps,
then create users + PATs, then apply the rest).

Empty tokens are intentionally visible-broken (auth fails, probe
reports auth failure, cleanup CronJob errors) — that's the signal
to run the bootstrap runbook. Subsequent apply picks up the real
values.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-07 23:29:33 +00:00
Viktor Barzin
f793a5f50b [forgejo] Phase 0 of registry consolidation: prepare Forgejo OCI registry
Stage 1 of moving private images off the registry:2 container at
registry.viktorbarzin.me:5050 (which has hit distribution#3324 corruption
3x in 3 weeks) onto Forgejo's built-in OCI registry. No cutover risk —
pods still pull from the existing registry until Phase 3.

What changes:
* Forgejo deployment: memory 384Mi→1Gi, PVC 5Gi→15Gi (cap 50Gi).
  Explicit FORGEJO__packages__ENABLED + CHUNKED_UPLOAD_PATH (defensive,
  v11 default-on).
* ingress_factory: max_body_size variable was declared but never wired
  in after the nginx→Traefik migration. Now creates a per-ingress
  Buffering middleware when set; default null = no limit (preserves
  existing behavior). Forgejo ingress sets max_body_size=5g to allow
  multi-GB layer pushes.
* Cluster-wide registry-credentials Secret: 4th auths entry for
  forgejo.viktorbarzin.me, populated from Vault secret/viktor/
  forgejo_pull_token (cluster-puller PAT, read:package). Existing
  Kyverno ClusterPolicy syncs cluster-wide — no policy edits.
* Containerd hosts.toml redirect: forgejo.viktorbarzin.me → in-cluster
  Traefik LB 10.0.20.200 (avoids hairpin NAT for in-cluster pulls).
  Cloud-init for new VMs + scripts/setup-forgejo-containerd-mirror.sh
  for existing nodes.
* Forgejo retention CronJob (0 4 * * *): keeps newest 10 versions per
  package + always :latest. First 7 days dry-run (DRY_RUN=true);
  flip the local in cleanup.tf after log review.
* Forgejo integrity probe CronJob (*/15): same algorithm as the
  existing registry-integrity-probe. Existing Prometheus alerts
  (RegistryManifestIntegrityFailure et al) made instance-aware so
  they cover both registries during the bake.
* Docs: design+plan in docs/plans/, setup runbook in docs/runbooks/.

Operational note — the apply order is non-trivial because the new
Vault keys (forgejo_pull_token, forgejo_cleanup_token,
secret/ci/global/forgejo_*) must exist BEFORE terragrunt apply in the
kyverno + monitoring + forgejo stacks. The setup runbook documents
the bootstrap sequence.

Phase 1 (per-project dual-push pipelines) follows in subsequent
commits. Bake clock starts when the last project goes dual-push.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-07 23:29:33 +00:00
Viktor Barzin
00614a3302 f1-stream: drop broken curated, dedupe streams, accept all pitsport categories
User feedback: every stream on /watch shows ads but the player fails
to load. Three causes, three fixes:

1. CuratedExtractor's two hmembeds 24/7 channels (Sky F1, DAZN F1)
   sat at the top of the list and ALWAYS failed: they load the
   upstream's ad overlay then JW Player throws error 102630 (empty
   playlist; the obfuscated decoder produces no fileURL in our
   environment). Disabled the registration in extractors/__init__.py
   until/unless we find a working bypass — leaving the existing
   `CURATED_BYPASS = {"curated"}` shim in service.py so the swap is
   reversible.

2. Pitsport surfaces every WRC stage / MotoGP session as its own
   /watch UUID, but they all resolve to the same upstream m3u8 URL
   (e.g. RallyTV one master.m3u8 across all 22 Rally de Portugal
   stages). Added URL-keyed dedupe in service.run_extraction so the
   /streams response shows one row per actual stream.

3. The pitsport category filter was still narrowed to motorsport.
   Pitsport.xyz only lists curated sports broadcasts (WRC, MotoGP,
   IndyCar, NASCAR, Premier League Darts, Premier League football…),
   so the site's own selection is the right filter. Replaced the
   hand-maintained MOTORSPORT_KEYWORDS list with `bool(category or
   title)` — anything pitsport returns goes through. Streams that
   aren't actually live get filtered out downstream when the embed
   API returns an empty manifest.

Frontend: hls.js `lowLatencyMode` was on by default but RallyTV (and
most non-LL-HLS providers) don't ship the LL-HLS extensions, which
broke playback in real browsers. Default to `lowLatencyMode: false`.

Result: /streams is now 1 verified live entry (Rally TV WRC stage
currently airing); was 24 with the top 2 always broken + 22 dupes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-07 23:29:33 +00:00
Viktor Barzin
18d96712c7 f1-stream: pitsport extractor — broaden categories + new safeStream payload
The previous extractor only surfaced Formula 1/2/3 and never returned
anything outside race weekends. Two fixes:

1. Broadened category filter from {formula 1/2/3} to a motorsport set
   (MotoGP/Moto2/Moto3, WRC/WEC/IndyCar/NASCAR + the F1 series).
   Replaces the NON_F1_KEYWORDS exclusion list with a positive-match
   MOTORSPORT_KEYWORDS set; removes the F1-specific filter on title
   keywords. Old `_is_f1_*` aliases retained as compat shims.

2. Updated `_parse_stream_config` for the current pushembdz.store embed
   payload — Next.js now serves `safeStream` (just title + method) and
   the actual stream URL is fetched at runtime from
   `pushembdz.store/api/stream/<slug>`. Extractor now hits that endpoint
   when the inline link is missing. Treats `method=jwp` as HLS and
   accepts URLs ending in `.css` (pushembdz disguises some HLS playlists
   with a `.css` extension).

End-to-end result: /streams went from 2 (curated, broken JW decoder) to
24 streams marked `is_live=True`. The verifier confirms each via
`manifest_parsed_codec_missing_in_verifier` (Playwright Chromium has no
H.264 — manifest fetch alone is the codec-independent positive signal).
Currently surfaces Rally de Portugal SS1–SS22 (WRC); MotoGP starts
appearing once the French GP weekend goes live tomorrow.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-07 23:29:33 +00:00
Viktor Barzin
8146d05191 chrome-service: replace static health stub with noVNC view
The static nginx stub at chrome.viktorbarzin.me wasn't useful for
debugging anti-bot interactions. Swap it for a live noVNC HTML5 view
of the headed Chromium session: x11vnc taps Xvfb's :99 over localhost
TCP (added `-listen tcp -ac` to Xvfb), websockify wraps it as a WS
endpoint, and noVNC's vendored web client serves it on :6080.

The ingress chain is unchanged — chrome.viktorbarzin.me stays
Authentik-gated, dns_type=proxied, port 3000 (the Playwright WS) stays
internal-only behind the NetworkPolicy + token. Custom image
`registry.viktorbarzin.me/chrome-service-novnc:v4` (ubuntu:24.04 +
x11vnc + websockify + novnc apt packages) needs imagePullSecrets, so
also added registry-credentials reference to the deployment spec.

x11vnc flags: `-noshm -noxdamage -nopw -shared -forever`. SHM is
disabled because each container has its own /dev/shm so the X server
can't grant access; XDAMAGE isn't compiled into the noble Xvfb. The
sidecar entrypoint waits up to 30s for both Xvfb (:6099) and x11vnc
(:5900) to bind before exec'ing websockify.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-07 23:29:33 +00:00
Viktor Barzin
f18cd1d314 chrome-service: in-cluster headed Chromium pool for f1-stream verifier
The f1-stream verifier's in-process headless Chromium kept tripping
hmembeds' disable-devtool.js Performance detector (CDP latency on
console.log vs console.table) and getting redirected to google.com.

This adds a single-replica chrome-service stack running Playwright
launch-server under Xvfb so callers can connect via WS+token to a
shared headed browser. f1-stream's _ensure_browser now prefers
chromium.connect(CHROME_WS_URL/CHROME_WS_TOKEN) and adds a vendored
stealth init script (webdriver/plugins/languages/Permissions/WebGL
spoofs + querySelector hijack to disarm disable-devtool-auto) on
every new context. Falls back to in-process headless if the env
vars aren't set.

Encrypted PVC for profile + npm cache, NetworkPolicy to TCP/3000
gated by client-namespace label, 6h tar.gz backup CronJob to NFS,
Authentik-gated nginx sidecar at chrome.viktorbarzin.me for human
liveness checks. Image pinned to playwright:v1.48.0-noble in
lockstep with the Python client's playwright==1.48.0.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-07 23:29:32 +00:00
Viktor Barzin
41655096c7 openclaw: realtime usage dashboard via Prometheus exporter sidecar
Stdlib-only Python exporter ($1) reads ~/.openclaw/agents/*/sessions/*.jsonl
(assistant messages with usage) plus auth-profiles.json (OAuth expiry,
Plus-tier label) and exposes Prometheus text format on :9099/metrics.
Container is python:3.12-slim; pod template gets prometheus.io/scrape
annotations so the existing kubernetes-pods job picks it up — no
ServiceMonitor needed.

Metrics exported:
  openclaw_codex_messages_total{provider,model,session_kind}    counter
  openclaw_codex_input/output/cache_read/cache_write_tokens_total
  openclaw_codex_message_errors_total{reason}
  openclaw_codex_active_sessions{kind}                          gauge
  openclaw_codex_oauth_expiry_seconds{provider,account,plan}    gauge
  openclaw_codex_last_run_timestamp                             gauge

Grafana dashboard "OpenClaw — Codex Usage" (Applications folder, 30s
refresh): messages/5h vs Plus rate-card, % of 1,200 floor, tokens/5h,
cache hit %, OAuth expiry days, active sessions, last-turn age, errors,
plus per-model timeseries + bar gauge + error table.

Plus rate-card thresholds in the gauge are conservative (1,200/5h floor;
real cap is dynamic 1,200–7,000). Re-baseline if throttling shows up
below 80%.
2026-05-07 23:29:32 +00:00
Viktor Barzin
115ca184ff openclaw: switch primary to ChatGPT Plus OAuth (openai-codex/gpt-5.4-mini)
Bumps image 2026.2.26 → 2026.5.4 (openai-codex provider plugin landed in
2026.4.21+). Auth profile is OAuth via the device-pairing flow against the
Codex backend (account ancaelena98@gmail.com); token persists in
/home/node/.openclaw/agents/main/agent/auth-state.json on NFS so it survives
pod restarts. Plus tier accepts gpt-5.4-mini (1,200–7,000 local msgs/5h);
gpt-5-mini and gpt-5.1-codex-mini both return errors on Plus, so we pin
gpt-5.4-mini explicitly. doctor --fix auto-promotes the highest-tier model
(gpt-5-pro) after model discovery, so the container command pins the mini
back as default after doctor runs but before gateway start.
2026-05-07 23:29:32 +00:00
Viktor Barzin
574cdf08d2 f1-stream: drop demo + landing-page extractors, add fetch-proxy injection
Per user feedback: the demo Big Buck Bunny / Apple test streams aren't
useful in an F1-streams app. Removed DemoExtractor entirely. Tightened
the discord-extractor path filter from "any stream-shaped path" to
"direct embed/player path only" — the previous filter still let
sportsurge `/event/...` landing pages through, which the verifier
mistook for playable because they render player-class divs without a
real player.

Embed proxy now also rewrites window.fetch + XMLHttpRequest.open inside
the upstream HTML so that cross-origin XHRs (e.g. the hmembeds
`/sec/<JWT>` token-binding endpoint) go through our /embed-asset relay.
This avoids the CORS reject that fired when the player JS tried to call
hghndasw.gbgdhdffhf.shop/sec/... from an `f1.viktorbarzin.me` origin.

The verifier now requires a `<video>` element to mark embed streams
playable (not just a player-class div). Curated streams bypass the
verifier — hmembeds aggressively detects headless Chromium (devtool
trap, console-clear timing, automation flags) and won't progress past
JW Player init in our pod, but the user's real browser should clear
those checks. We can't honestly headless-verify hmembeds, so we trust
the curator instead of falsely rejecting them.

Image: viktorbarzin/f1-stream:v6.1.1
2026-05-07 23:29:32 +00:00
Viktor Barzin
f90d79ed4e f1-stream: only show streams confirmed playable by headless browser
Cuts the stream list from 23 mostly-broken entries to ~6 confirmed-playable
ones, and adds an iframe-stripping proxy so embed sources (hmembeds, etc.)
load through our origin without X-Frame-Options / CSP / JS frame-buster
blocks.

Why: the previous list was dominated by Discord-shared news article URLs,
hardcoded aggregator landing pages, and other non-stream URLs that all sat
at is_live=true because embed streams skipped the health check entirely.
Users could not tell which links would actually play.

What:
- backend/playback_verifier.py: new headless-Chromium verifier (Playwright)
  that polls each candidate stream for a codec-independent "playable" signal
  (hls.js MANIFEST_PARSED for m3u8; <video>/player div for embed). Replaces
  the unconditional is_live=True for embed streams in service.py.
- backend/embed_proxy.py: new /embed and /embed-asset routes that fetch
  upstream embed pages, strip X-Frame-Options/CSP/Set-Cookie, and inject a
  <base href> + frame-buster-defeat <script> that locks down window.top,
  document.referrer, console.clear/table, and window.location so the
  hmembeds disable-devtool.js redirect-to-google trap can't fire.
- extractors/curated.py: new always-on extractor with two known-good 24/7
  hmembeds embeds (Sky Sports F1, DAZN F1) so the list isn't empty between
  race weekends.
- extractors/__init__.py: register CuratedExtractor first; drop
  FallbackExtractor (its 10 aggregator landing-pages can't iframe-play).
- extractors/discord_source.py: positive-match path filter (must look like
  /embed/, /stream, /watch, /live, /player, *.m3u8, *.php) plus expanded
  domain blocklist for news sites — was 10 noise URLs, now ~1.
- extractors/service.py: run_extraction now health-checks AND verifier-
  checks both stream types; only verified-playable streams reach is_live.
- main.py: register /embed + /embed-asset routes; defer initial extraction
  by 8s so the verifier can reach the local /embed proxy on 127.0.0.1:8000.
- frontend/lib/api.js + watch/+page.svelte: route embed iframes through
  /embed proxy instead of the upstream URL, so X-Frame-Options/CSP can't
  block them.
- Dockerfile: install Playwright chromium + system codec-runtime libs.
- main.tf: bump pod memory 256Mi → 1Gi for chromium.

Verified end-to-end with Playwright against
https://f1.viktorbarzin.me/watch — 6/6 streams reach a player UI; the 3
demo m3u8s actually play (codec-bearing browser); the 3 embeds (Sky
Sports F1, DAZN F1, sportsurge) render iframes through the proxy.

Image: viktorbarzin/f1-stream:v6.0.5

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-07 23:29:31 +00:00
Viktor Barzin
8b180f7662 openclaw: switch primary model to qwen3-coder-480b (qwen3.5-397b dead on NIM)
NVIDIA retired nim/qwen/qwen3.5-397b-a17b — modelrelay shows consistent
TIMEOUTs over 24h+ of pings, and nim/nvidia/llama-3.1-nemotron-ultra-253b-v1
returns 404. With both gone the openclaw failover never reached
mistral-large-3 in time, so every message hung until the 120s embedded-run
timeout. Promote qwen3-coder-480b-a35b-instruct (already in models list, UP
~1-2s, 256k ctx) to primary; drop the dead nemotron-ultra fallback.
2026-05-07 23:29:31 +00:00
Viktor Barzin
f006b48566 monitoring(wealth): delta panels to 2x4 grid (rows = type, cols = window)
Better visual grouping: instead of 8 paired panels in a single row at
w=3 (cramped, hard to scan), arrange as a 2x4 grid at w=6. Top row
("all" — wealth change incl new money), bottom row ("mkt" — pure
market gain). Columns are timeframes 1d / 7d / 30d / 90d.

Reading vertically: same window, two interpretations side by side.
Reading horizontally: same metric across timeframes.

Layout shift: delta row goes from y=4 (4 wide) to y=4..11 (8 high).
All chart/log panels with y >= 8 shift down by another 4 rows
(net-worth chart 8->12, activity log 81->85, etc.).
2026-05-07 23:29:31 +00:00
Viktor Barzin
0f107aeacb monitoring(wealth): pair every delta panel with market-only twin
User feedback: net-worth delta panels (1d/7d/30d/90d) confused
because +£174k over 90d looked too big against the £271k cumulative
unrealised gain. Decomposition showed the 90d delta was £114k of new
money in (contributions) + £60k of actual market gain.

So now the delta row shows BOTH:
  Δ Nd (all)  — net-worth change incl new money (the original number)
  Δ Nd (mkt)  — pure market gain, contributions stripped out

Pattern for "(mkt)" panels: same now_snap / past_snap CTEs but
selecting both total_value and net_contribution, then computing
(nw_delta - contrib_delta) = market_gain over window.

Layout: 8 panels at w=3 each on the y=4 row, paired by window
(all next to mkt for each timeframe), so you can see "wealth
change vs investment performance" at a glance.

Verified live (90d): all=+£174,612, mkt=+£60,343, contrib=+£114,268.
2026-05-07 23:29:31 +00:00
Viktor Barzin
87069ae5c3 monitoring(wealth): add delta row (1d / 7d / 30d / 90d net-worth changes)
New row at y=4 with 4 stat panels showing net-worth change over the
trailing windows. Each uses the latest-per-account stitching pattern
(skew-resilient against partial-day syncs) and computes:

  delta = SUM(latest per account) - SUM(latest per account at or
                                       before max_complete - N)

Where max_complete is the most recent date all accounts have a row.
For each window: 1d, 7d, 30d, 90d.

Verified live values: +£8,575 / +£22,696 / +£144,633 / +£174,612.

All panels at y >= 4 shifted down by 4 rows to make room (Net worth
chart 4->8, Per-account stacked 24->28, Activity log 77->81, etc.).

Note: this commit also reformats the dashboard JSON from compact-
object form to indented form (json.dump indent=2 side effect from the
Python patch script). No semantic changes outside the new panels and
y-shifts.
2026-05-07 23:29:31 +00:00
Viktor Barzin
da7a11eb3b fix: strip conditional headers in bot-block-proxy to fix CalDAV sync
nginx's not_modified_filter evaluated If-Match headers forwarded by
Traefik's forwardAuth, returning 412 and breaking CalDAV VTODO updates
from macOS/iOS Reminders. Switch to OpenResty and clear conditional
headers with Lua before proxy processing.
2026-05-07 23:29:31 +00:00
Viktor Barzin
813148c4af kms: switch to non-proxied DNS so port 1688 is reachable externally
Cloudflare cannot proxy raw TCP/1688 (KMS protocol). Switch
kms.viktorbarzin.me from CF-proxied CNAME to direct A/AAAA so
clients can reach the vlmcsd LoadBalancer (10.0.20.200) via the
existing pfSense WAN port-forward for 1688.

Verified end-to-end: vlmcs against 176.12.22.76:1688 completes
the KMS V4 handshake for Office Professional Plus 2019.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-06 18:02:25 +00:00
github-actions[bot]
b45c45e419 priority-pass: bump image_tag to 88f18e53 [ci skip]
Auto-committed by ViktorBarzin/priority-pass GHA on push to main.
Source: 88f18e532b
2026-05-05 21:13:14 +00:00
Viktor Barzin
fb454e16d5 priority-pass: parameterise image_tag via var pattern (matches job-hunter)
Adopts the always-latest convention used by job-hunter, payslip-ingest,
and fire-planner: image SHA lives in stacks/priority-pass/terragrunt.hcl
inputs, default in main.tf var. The priority-pass GHA build workflow
auto-commits new SHAs to this file on every successful push.

- Add `variable "image_tag"` (default = current value 7c01448d).
- Both containers now use `local.{frontend,backend}_image` interpolation.
- Replace symlinked terragrunt.hcl with a real file so the stack-local
  inputs block can override image_tag (mirrors payslip-ingest exactly).

State note: priority-pass TF state is currently empty (Tier 1 PG migration
skipped this stack). A subsequent `terragrunt import` is required to
adopt the live deployment + namespace + ingress before running apply.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-05 21:03:46 +00:00
Viktor Barzin
4c8d12229f mailserver: split healthcheck path off PROXY-aware listeners + book-search uses ClusterIP
Two coordinated fixes for the same root cause: Postfix's smtpd_upstream_proxy_protocol
listener fatals on every HAProxy health probe with `smtpd_peer_hostaddr_to_sockaddr:
... Servname not supported for ai_socktype` — the daemon respawns get throttled by
postfix master, and real client connections that land mid-respawn time out. We saw
this as ~50% timeout rate on public 587 from inside the cluster.

Layer 1 (book-search) — stacks/ebooks/main.tf:
  SMTP_HOST mail.viktorbarzin.me → mailserver.mailserver.svc.cluster.local
  Internal services should use ClusterIP, not hairpin through pfSense+HAProxy.
  12/12 OK in <28ms vs ~6/12 timeouts on the public path.

Layer 2 (pfSense HAProxy) — stacks/mailserver + scripts/pfsense-haproxy-bootstrap.php:
  Add 3 non-PROXY healthcheck NodePorts to mailserver-proxy svc:
    30145 → pod 25  (stock postscreen)
    30146 → pod 465 (stock smtps)
    30147 → pod 587 (stock submission)
  HAProxy uses `port <healthcheck-nodeport>` (per-server in advanced field) to
  redirect L4 health probes to those ports while real client traffic keeps
  going to 30125-30128 with PROXY v2.
  Result: 0 fatals/min (was 96), 30/30 probes OK on 587, e2e roundtrip 20.4s.
  Inter dropped 120000 → 5000 since log-spam concern is gone.

`option smtpchk EHLO` was tried first but flapped against postscreen (multi-line
greet + DNSBL silence + anti-pre-greet detection trip HAProxy's parser → L7RSP).
Plain TCP accept-on-port check is sufficient for both submission and postscreen.

Updated docs/runbooks/mailserver-pfsense-haproxy.md to reflect the new healthcheck
path and mark the "Known warts" entry as resolved.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-05 19:45:33 +00:00
Viktor Barzin
c4c5057edc priority-pass: pin to backend 7c01448d (transplant QR into golden-position container)
[ci skip]
2026-05-05 19:14:11 +00:00
Viktor Barzin
1cb2bb30f7 monitoring(wealth): show pre-2024 historical data on timeseries
Bug: timeseries panels were empty before 2024-04-10. Cause was the
complete_dates CTE filtering to "every active account has a row for
this date" -- which excluded every day before the most-recently-added
account first appeared. The 6th account (Trading212 Invest GIA) only
started 2024-04-10, so 4 years of legitimate historical data
(2020-06-07 onwards, when the user genuinely had fewer accounts) got
hidden.

New pattern across panels 5/6/7/8/9/12/13: replace complete_dates with
max_complete cutoff. Compute the most-recent date where all current
accounts have a row, then include every historical date up to and
including that day. Partial-today is still excluded automatically.
Historical days with fewer accounts now show as their actual smaller
sums -- which is the correct historical net worth at the time.

Verified via PG: new pattern returns 2,159 distinct days from
2020-06-07 to 2026-05-05 (vs the previous 391 from 2024-04-10).

Per-account first-seen dates:
  InvestEngine ISA       - 2020-06-07
  Schwab US workplace    - 2020-11-17
  InvestEngine GIA       - 2022-03-17
  Fidelity UK Pension    - 2022-05-16
  Trading212 ISA         - 2024-04-08
  Trading212 Invest GIA  - 2024-04-10  (was the bottleneck)
2026-05-05 18:43:26 +00:00
Viktor Barzin
11a615e723 mailserver: retrigger CI to apply 6e77d187
Pipeline 1846 apply step errored at ~5s post-start, leaving the
e2e probe shell-quoting fix unapplied. K8s state was patched
manually as a temporary unblock; this empty commit retriggers CI
to land the source fix and resolve drift.

[ci]

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 08:05:53 +00:00
Viktor Barzin
6e77d1870e mailserver: fix e2e probe shell-quoting bug (apostrophe in comment)
The 2026-05-02 change that added the Brevo defensive-unblock step
to the email-roundtrip-monitor cron contained an apostrophe in a
Python comment ("wasn't"). The whole script is wrapped in shell
single quotes (python3 -c '...'), so the apostrophe terminated
the shell string. Python only parsed up to the apostrophe and
raised IndentationError on the now-bodyless try: block; everything
after was handed to /bin/sh which complained about "try::" and
unmatched parens. Result: every probe run since 2026-05-02 00:41 UTC
crashed before it could push, and the "Email Roundtrip E2E" Uptime
Kuma push monitor went DOWN with "No heartbeat in the time window".

Fix: rewrite the comment without an apostrophe and add a banner
warning so the next person editing this heredoc does not regress.

Validated: shell parses (bash -n), Python compiles (py_compile)
with the wrapping single quotes intact.
2026-05-04 07:52:32 +00:00
root
0aea98f225 Woodpecker CI Update TLS Certificates Commit 2026-05-03 00:02:02 +00:00
Viktor Barzin
6715cdc51f monitoring(wealth): re-add milestone annotations (now that PG creds rotated)
Re-applies the milestone annotation commit reverted in 0ef36aec. The
earlier "nothing loads / syntax error" was a red herring: Vault had
rotated the wealthfolio_sync DB password 7 days prior, the K8s Secret
picked it up automatically (pg-sync sidecar still working), but the
Grafana datasource ConfigMap is baked at TF-apply time so Grafana was
sending the old password. Every panel + the new annotation alike
failed with: pq password authentication failed for user wealthfolio_sync.

Fix today: refresh the datasource ConfigMap and roll Grafana.

  scripts/tg apply -target=kubernetes_config_map.grafana_wealth_datasource
  kubectl -n monitoring rollout restart deploy/grafana

Annotation source verified live via /api/ds/query: SQL returns 5
milestone rows correctly. Dashboard charts now show vertical dashed
lines at GBP100k 2021-11-01, GBP250k 2023-07-18, GBP500k 2024-09-19,
GBP750k 2025-08-26, GBP1M 2026-04-18.

KNOWN FOLLOW-UP: Vault rotates pg-wealthfolio-sync every 7 days
(static role). Todays failure will recur unless the Grafana
datasource auto-refreshes. Options:
  1. Annotate Grafana deploy with stakater/reloader so it restarts
     when wealthfolio-sync-db-creds Secret changes.
  2. Switch datasource provisioning to read password from an env var
     sourced from the Secret instead of baking into the ConfigMap.
     Combined with reloader, picks up rotation cleanly.
2026-05-02 20:27:21 +00:00
Viktor Barzin
0ef36aec36 Revert "monitoring(wealth): milestone annotations on every timeseries chart"
This reverts commit 5a00b9c096.
2026-05-02 20:20:18 +00:00