Commit graph

3152 commits

Author SHA1 Message Date
Viktor Barzin
ce4a75d79a x402: deploy payment gateway in front of Anubis on all 9 public sites
Adds modules/kubernetes/x402_instance/ — a small Go reverse proxy
(forgejo.viktorbarzin.me/viktor/x402-gateway:ce333419) that selectively
issues HTTP 402 Payment Required to declared AI-bot User-Agents and
validates X-PAYMENT headers against a Coinbase x402 facilitator.
Browsers are forwarded transparently to Anubis (which then handles the
JS PoW gate as before).
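For illustration, the per-request decision reduces to the following
(Python sketch — the actual gateway is Go; the function shape and the
trimmed UA list here are illustrative, not its real API):

```python
import re

# Trimmed UA subset for illustration; the deployed list is the full
# catch-all from the module default config.
BOT_UA_RE = re.compile(
    r"ClaudeBot|GPTBot|Bytespider|PerplexityBot|Amazonbot", re.I)

def route(user_agent: str, payment_valid: bool, dry_run: bool) -> str:
    """Return what the proxy does with one request."""
    if dry_run:
        return "proxy"   # wallet_address empty: transparent pass-through
    if not BOT_UA_RE.search(user_agent):
        return "proxy"   # browsers fall through to Anubis' JS PoW gate
    if payment_valid:
        return "proxy"   # facilitator accepted the X-PAYMENT header
    return "402"         # declared AI bot, no payment

assert route("Mozilla/5.0 (X11; Linux)", False, dry_run=False) == "proxy"
assert route("GPTBot/1.0", False, dry_run=False) == "402"
assert route("GPTBot/1.0", False, dry_run=True) == "proxy"
```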

Wired into all nine Anubis-fronted sites:
  ingress -> x402-X -> anubis-X -> backend

While `wallet_address` is empty the gateway runs in DRY_RUN — every
request is transparent-proxied, no 402s issued. This lets the pod sit
in the request path with zero behavioural impact today; flipping the
wallet variable in the per-stack module call activates payment-required
mode for AI-bot UAs.

Default config: Base mainnet USDC, $0.01/req, x402.org/facilitator,
catch-all UA list (ClaudeBot|GPTBot|Bytespider|meta-externalagent|
PerplexityBot|GoogleOther|cohere-ai|Diffbot|Amazonbot|
Applebot-Extended|FacebookBot|ImagesiftBot|YouBot|anthropic-ai|
Claude-Web|petalbot|spawning-ai|scrapy|python-requests).

Verified post-apply: 9/9 pods Running, all 9 sites still serve the
Anubis challenge to plain curl with identical TTFB, x402 logs confirm
"dry_run":true on every instance.
2026-05-10 11:12:40 +00:00
root
a1b659de2a Woodpecker CI deploy [CI SKIP] 2026-05-10 11:12:40 +00:00
Viktor Barzin
04cb22fd3b anubis: re-protect f1 with a per-host policy that allows JSON routes
Earlier f1 revert left the host fully unprotected (no Anubis,
exclude_crowdsec=true on the ingress already). Re-add Anubis with
a custom policy_yaml that:

- ALLOWs /_app/* (SvelteKit immutable JS/CSS chunks loaded before
  any cookie exists), /openapi.json, /docs, /api/* (FastAPI meta).
- ALLOWs the 9 known JSON/proxy routes (schedule, streams,
  embed, embed-asset, extract, extractors, health, proxy, relay)
  so the SvelteKit SPA's XHRs return JSON instead of the challenge
  HTML.
- Catch-all CHALLENGE for everything else — the SPA HTML pages
  (which fall through to FastAPI's `/{path}` catch-all) get the
  PoW gate.

The ALLOWed JSON routes are technically scrapeable by a determined
bot, but the user's stated goal is "avoid accidental scrapes" — the
HTML/SPA is the AI-training target, and that stays gated.
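Rough shape of the policy as first-match-wins rules (Python sketch;
regexes abbreviated, not the literal policy_yaml):

```python
import re

# Order matters: first matching rule wins; the final catch-all
# challenges everything the allow rules didn't claim.
RULES = [
    (re.compile(r"^/_app/"), "ALLOW"),                      # SvelteKit chunks
    (re.compile(r"^/(openapi\.json|docs|api/)"), "ALLOW"),  # FastAPI meta
    (re.compile(r"^/(schedule|streams|embed|embed-asset|extract"
                r"|extractors|health|proxy|relay)"), "ALLOW"),
    (re.compile(r".*"), "CHALLENGE"),                       # SPA HTML pages
]

def action(path: str) -> str:
    return next(act for rx, act in RULES if rx.search(path))

assert action("/schedule") == "ALLOW"
assert action("/") == "CHALLENGE"
```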

Verified: / → Anubis challenge HTML; /schedule, /streams → JSON;
/_app/.../app.js → text/javascript; ClaudeBot UA → Anubis deny page.
2026-05-10 11:12:40 +00:00
Viktor Barzin
a89d4a7d2a anubis: pull f1 off Anubis (XHR-vs-challenge collision) + add latency alerts
f1.viktorbarzin.me is a SPA whose JS fetches /schedule, /embed,
/embed-asset, … on the same path tree. With Anubis fronting `/`,
those XHRs land on the challenge HTML even when the cookie *should*
be valid, breaking the page with `Unexpected token '<', "<!doctype "
... is not valid JSON`. Removed Anubis from f1 — would need a path
carve-out (the way wrongmove does for /api) to re-enable. Added a
top-of-block comment so future me remembers why.

Plus four new Prometheus alerts in `Slow Ingress Latency` group
(stacks/monitoring/.../prometheus_chart_values.tpl):

- IngressTTFBHigh         (warn, 10m, avg latency >1s)
- IngressTTFBCritical     (crit, 5m,  avg latency >3s)
- IngressErrorRate5xxHigh (crit, 5m,  5xx >5%)
- AnubisChallengeStoreErrors (crit, 5m, any 5xx on *anubis* services
  via Traefik — proxies for the in-pod challenge-store error since
  Anubis itself only exposes Go-runtime metrics)

Notes from the alert author: avg-not-p95 because the existing
Prometheus scrape config drops traefik bucket series; once those
are restored, swap to histogram_quantile(0.95). TraefikDown inhibit
rule extended to suppress these four during a Traefik outage.
2026-05-10 11:12:40 +00:00
Viktor Barzin
8197842646 anubis: fix 500 on multi-replica + roll out to 6 more public sites
Browser visits to viktorbarzin.me started returning HTTP 500 with
`store: key not found: "challenge:..."` in pod logs. Root cause:
each Anubis pod stores in-flight challenges in process memory; with
2 replicas behind a ClusterIP, the PoW-solved request can be
routed to a different pod than the one that issued the challenge.
Anubis upstream documents the same caveat ("when running multiple
instances on the same base domain, the key must be the same across
all instances" — true for the ed25519 signing key, but the
challenge store is still pod-local without a shared backend).

Drop module default replicas: 2 → 1. Worst-case: ~1s cold-start on
pod restart. Real fix (Redis-backed challenge store) noted as a
follow-up in CLAUDE.md.

Roll Anubis out to: f1-stream, cyberchef (cc), jsoncrack (json),
privatebin (pb), homepage (home), real-estate-crawler (wrongmove
UI only — `/api` ingress stays direct via a path-based ingress
carve-out so XHRs from the SPA bypass the challenge).

End-state: 9 public sites now Anubis-fronted (blog/www on one
instance, kms, travel, f1, cc, json, pb, home, wrongmove). All return the
challenge HTML to bare curl/browser; verified-IP search engines and
/robots.txt + /.well-known still skip via the strict-policy
allowlist.
2026-05-10 11:12:40 +00:00
Viktor Barzin
abdef1781c anubis: strict bot policy — catch-all CHALLENGE for unmatched UAs
The default upstream policy only WEIGHs Mozilla|Opera UAs and lets
everything else (curl, wget, python-requests, scrapy, headless CLI
scrapers) fall through to the implicit ALLOW. On non-CDN-fronted
hosts (kms, anything dns_type=non-proxied) this meant a plain
`curl https://kms.viktorbarzin.me/` returned the real backend
content with no challenge — defeating the whole "avoid casual
scrapers" intent.

Now the module ships a custom POLICY_FNAME mounted via ConfigMap:
- Imports the upstream deny-pathological / ai-block-aggressive /
  allow-good-crawlers / keep-internet-working snippets unchanged
- Adds a final `path_regex: .*` → action: CHALLENGE catch-all

Result: only IP-verified search engines (Googlebot from Google IPs,
Bingbot, etc.) and well-known paths (robots.txt, .well-known,
favicon, sitemap) skip the challenge. Everything else — including
spoofed-Googlebot-UA-from-random-IP — solves PoW or gets nothing.

Verified post-apply: curl default UA on viktorbarzin.me + kms +
travel returns the Anubis challenge HTML; /robots.txt still 200s
straight through.
2026-05-10 11:12:40 +00:00
Viktor Barzin
2d6812f951 fire-planner: dual ingress — /api/* unprotected, / behind Authentik
The SPA can't carry an Authentik session on its own fetch() XHRs in
all cases (a cross-origin redirect to authentik.viktorbarzin.me on a
stale cookie returns HTML, so the fetch().json() parse fails). Splitting
the ingress so /api/ paths skip forward-auth lets the React app talk
to its API end-to-end. The browser still has to log in via
Authentik to load the SPA at /.

Verified end-to-end via chrome-service Playwright: dashboard load,
scenario list, what-if run with real Monte Carlo, save-as-scenario
round-trip, run-now on detail, delete — all pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 11:12:40 +00:00
Viktor Barzin
58fd4025f8 anubis: per-site PoW reverse proxy on blog + kms + travel-blog
Adds modules/kubernetes/anubis_instance/ — a per-site reverse proxy
instance pinned to ghcr.io/techarohq/anubis:v1.25.0. Each instance
issues a 30-day JWT cookie scoped to viktorbarzin.me after a tiny
proof-of-work (difficulty 2 ≈ 250 ms desktop / 700 ms mobile). The
shared ed25519 signing key (Vault: secret/viktor → anubis_ed25519_key)
makes a single solve good across every Anubis-fronted subdomain.
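What the challenge page makes the browser burn, roughly (hashcash-style
sketch; Anubis' actual wire format and validation live upstream):

```python
import hashlib
from itertools import count

def solve(challenge: str, difficulty: int = 2) -> int:
    """Find a nonce whose SHA-256(challenge + nonce) hex digest starts
    with `difficulty` zero nibbles — assumed scheme, for intuition only."""
    for nonce in count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith("0" * difficulty):
            return nonce

# difficulty 2 ≈ 16**2 = 256 attempts on average — trivial for a real
# browser, but it forces JS execution, which curl/scrapy never do.
print(solve("example-challenge"))
```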

Wired into blog (viktorbarzin.me + www), kms.viktorbarzin.me, and
travel.viktorbarzin.me — each with anti_ai_scraping=false on the
ingress so the redundant ai-bot-block forwardAuth is dropped from the
chain. Skipped forgejo (Git/API clients can't solve PoW) and resume
(replicas=0).

Also tightens bot-block-proxy nginx timeouts (3s/5s → 100ms/200ms) so
any ingress still using the ai-bot-block forwardAuth pays at most
~150 ms when poison-fountain is scaled down, instead of 3 s.

End-to-end TTFB on viktorbarzin.me dropped from ~3.2 s to ~150-200 ms.

Docs: .claude/reference/patterns.md "Anti-AI Scraping" updated to
4 layers; .claude/CLAUDE.md adds the Anubis usage paragraph and
Forgejo/API caveat.
2026-05-10 11:12:40 +00:00
root
ea2cb57e69 Woodpecker CI Update TLS Certificates Commit 2026-05-10 11:12:40 +00:00
Viktor Barzin
248279605b postiz: disable signups (DISABLE_REGISTRATION=true)
Admin account already exists; we don't want random users registering
on the public-facing instance. Sign-in only from now on.
2026-05-10 11:12:40 +00:00
Viktor Barzin
9904561c26 fire-planner: ingress port 8080 (was defaulting to 80)
ingress_factory's port var defaults to 80, but fire-planner publishes
on 8080. Traefik logged 'Cannot create service error="service port
not found"' and 404'd every request. Cloudflare's standard
origin-error decoy page (with the noindex meta + cdn-cgi/content
honeypot link) made it look like a bot-block, but it was just the
upstream coming back 404.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 11:12:39 +00:00
root
017e139b80 Woodpecker CI deploy [CI SKIP] 2026-05-10 11:12:39 +00:00
Viktor Barzin
08edd92b22 kms: deploy slack-notifier sidecar with Prometheus metrics + document public exposure
Slack notifier now also exposes /metrics on :9101 with stdlib HTTP — counts
activations and dedup-skips by product, gauges last-activation timestamp.
Pod template gets the standard prometheus.io/scrape annotations so the
cluster-wide kubernetes-pods job picks it up via pod IP. Memory request
bumped to 48Mi to cover counter dicts + HTTPServer.
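The pattern is just the Prometheus text format over a stdlib server —
minimal sketch (metric names illustrative, not the notifier's actual ones):

```python
from collections import Counter
from http.server import BaseHTTPRequestHandler, HTTPServer

activations = Counter()        # keyed by product, bumped by the notifier
dedup_skips = Counter()
last_activation_ts = 0.0

class Metrics(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        lines = [f'kms_activations_total{{product="{p}"}} {n}'
                 for p, n in activations.items()]
        lines += [f'kms_dedup_skips_total{{product="{p}"}} {n}'
                  for p, n in dedup_skips.items()]
        lines.append(f"kms_last_activation_timestamp {last_activation_ts}")
        body = ("\n".join(lines) + "\n").encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("", 9101), Metrics).serve_forever()
```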

Plus docs: networking.md footnotes the windows-kms row noting public WAN
exposure with the rate-limited (max-src-conn 50, max-src-conn-rate 10/60,
overload <virusprot> flush) pfSense filter rule, and a new runbook covers
log locations, rate-limit tuning, and how to revoke the WAN forward.

The matching pfSense rule was tightened in place (TCP-only + rate limits)
via SSH; pfSense isn't Terraform-managed.
2026-05-10 11:12:39 +00:00
Viktor Barzin
efadeb531d state(dbaas): update encrypted state 2026-05-10 11:12:39 +00:00
Viktor Barzin
0d8e0ca6fc backup: fix daily-backup silent failures, postiz pg_dump CronJob, doc reconcile
daily-backup ran out of its 1h budget and SIGTERMed for 10 days straight (Apr
30 → May 9). Each failed run left its snapshot mount stacked on /tmp/pvc-mount,
which blocked the next run from completing — root cause of the WeeklyBackupStale
alert going silent (the metric never reached its end-of-script push).

Fixes:
- TimeoutStartSec 1h → 4h (current workload of 118 PVCs needs ~1.5h, was hitting
  the wall during week 18 runs)
- Recursive umount + LUKS cleanup on EXIT trap, plus the same at script start as
  belt-and-braces for any inherited stuck state from a prior crashed run
- TERM/INT trap pushes status=2 metric so WeeklyBackupFailing fires instead of
  the alert going blind on systemd kills
- pfsense metric pushed in BOTH success and failure paths (was only on success;
  any ssh-to-pfsense outage made PfsenseBackupStale silent until the alert
  threshold expired)

Postiz backup CronJob: bundled bitnami PG/Redis live on local-path (K8s node
OS disk) — outside Layer 1+2 of the 3-2-1 pipeline. Added postiz-postgres-backup
that pg_dumps postiz + temporal + temporal_visibility daily 03:00 to
/srv/nfs/postiz-backup, getting Layer 3 offsite coverage. Verified end-to-end:
3 dumps written, Pushgateway metric received. Note: bitnamilegacy/postgresql
image is stripped (no curl/wget/python) — switched to docker.io/library/postgres
matching the dbaas/postgresql-backup pattern with apt-installed curl.

Doc reconcile (backup-dr.md): metric names had drifted (e.g. the docs claimed
backup_weekly_last_success_timestamp but the script pushes
daily_backup_last_run_timestamp). Updated to match what's actually emitted, and
added a "default-covered" footnote to the Service Protection Matrix so the
~40 services with PVCs not enumerated in the table are no longer ambiguous.

Manual PVE-host actions (out-of-band, not in TF):
- unmounted 6 stacked snapshots from /tmp/pvc-mount
- pruned 5 stale snapshots on vm-9999-pvc-67c90b6b... (origin LV that the
  loop got SIGTERMed against repeatedly, so prune kept failing)
- created /srv/nfs/postiz-backup directory
- triggered a one-shot daily-backup run with the new TimeoutStartSec to
  validate the fix end-to-end

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 11:12:39 +00:00
Viktor Barzin
8c619278d3 grafana: env-var datasources + reloader so Vault rotations stop breaking dashboards
Wealth, Payslips, and Job-Hunter Grafana datasources all baked the
rotating PG password into their ConfigMap at TF-apply time, so every
7-day Vault static-role rotation silently broke the panels until a
manual `terragrunt apply`. Same family as the recurring grafana-mysql
backend bug — Grafana caches creds at startup and never picks up the
new ESO-synced password without a restart.

Fix:
- Each source stack now creates an ExternalSecret in `monitoring`
  exposing the rotating password as `<NAME>_PG_PASSWORD` env-var.
- Grafana mounts those via `envFromSecrets` (optional=true so a
  missing source stack doesn't block boot) and the datasource
  ConfigMaps reference `$__env{<NAME>_PG_PASSWORD}` instead of a
  literal password.
- `reloader.stakater.com/auto: "true"` on the Grafana pod restarts
  it whenever any of the four DB-cred Secrets is updated.

Tested end-to-end: forced `vault write -force database/rotate-role/
pg-wealthfolio-sync` → ESO synced (~30s) → reloader fired →
Grafana booted with new env in ~50s total → all three /api/datasources
/uid/*/health endpoints return "Database Connection OK".

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 11:12:39 +00:00
Viktor Barzin
57250cfda2 mysql: bump to 4Gi limit / 3Gi request; grow /srv/nfs LV to 3 TiB
mysql-standalone OOMKilled May 8 18:05 (anon-rss 2 GB at the 2 Gi limit).
innodb_buffer_pool_size=1Gi plus connection buffers and InnoDB internals
don't fit in 2 Gi. Bumping limit to 4 Gi (request 3 Gi) leaves headroom
without changing the buffer pool config.

/srv/nfs was at 90% (1.7T / 2T); grew the underlying pve/nfs-data LV
1 TiB online and ran resize2fs (now 60% used). Surfaced during the
2026-05-09 IO-pressure post-mortem; the thinpool had ~4.6 TiB free.

The post-mortem also covers the stale-NFS-client trigger (legacy
/usr/local/bin/weekly-backup pointing at the decommissioned TrueNAS IP)
and the resulting wedged kthread on the PVE host. Script removed and
node_exporter restarted out-of-band; kthread will clear at next PVE
reboot. See docs/post-mortems/2026-05-09-io-pressure-stale-nfs.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 11:12:38 +00:00
Viktor Barzin
b254c536f9 ig-poster: bump to da5b4191 (auto-curate from recent favorites) 2026-05-10 11:12:38 +00:00
root
a5a54aebe3 Woodpecker CI deploy [CI SKIP] 2026-05-10 11:12:38 +00:00
Viktor Barzin
72013a0890 n8n: real-time training loop + decoupled posting
instagram-approval: after every tap, immediately fetch /candidates?limit=1
and send the next photo as a fresh inline-keyboard message — the user's
tap chains back into this same workflow, so the loop is user-paced.
When the pool is exhausted, send an 'all caught up' summary with the
backlog count + cumulative training stats.

instagram-discover: cron throttled from every-30-min to daily 09:00.
The chain handles ongoing training; the daily run only kickstarts a
session if the user hasn't been tapping. Limit reduced from 3 → 1 so
each kickstart sends a single photo (chain takes over).
2026-05-10 11:12:38 +00:00
Viktor Barzin
ff2f32a33e ig-poster b17a9737 + n8n discover rewritten to use /candidates with CLIP scoring 2026-05-10 11:12:38 +00:00
Viktor Barzin
94e2f34e2a ig-poster: bump to 3b862fe4 (EXIF orientation + auto-pending /candidates) 2026-05-10 11:12:38 +00:00
Viktor Barzin
29bb434e1e ig-poster: 69e395f2 + sync IMMICH_PG_* via ESO for CLIP scoring; postiz publish-notify n8n workflow 2026-05-10 11:12:38 +00:00
Viktor Barzin
cb83972b79 ig-poster: bump to cac6fa97 + sync POSTIZ_INTEGRATION_ID via ESO 2026-05-10 11:12:37 +00:00
Viktor Barzin
40ca011bd6 postiz: expose /uploads publicly so Meta IG fetcher can pull JPEGs
Stories+feed posts via Postiz failed with state=ERROR and Postiz
mistranslated the cause as 'Invalid Instagram image resolution
max: 1920x1080px'. Real cause: Postiz hands Meta an upload URL
under https://postiz.viktorbarzin.me/uploads/... and Meta gets a
302 to the Authentik login page instead of bytes. Meta returns
error 36001 (image not fetchable) which Postiz maps to that
misleading resolution string.

Split the ingress: /uploads/* on a public ingress (matches the
instagram-poster /image+/original pattern), everything else
remains behind Authentik forward-auth. /uploads contents are
random UUIDs, low blast radius if scraped.
2026-05-10 11:12:37 +00:00
Viktor Barzin
b3ae2c5476 docs: PVC templates need lifecycle.ignore_changes for autoresizer
The canonical proxmox-lvm and proxmox-lvm-encrypted PVC templates were
missing `lifecycle { ignore_changes = [spec[0].resources[0].requests] }`.
Without it, every PVC created from these templates becomes a drift bomb
the moment pvc-autoresizer expands it: the next `tg apply` on that stack
will try to shrink the PVC back to the TF-declared size, K8s rejects the
shrink, and apply fails.

This was latent because pvc-autoresizer was silently broken cluster-wide
(commit 9d5da4d8 fixed it by allow-listing kubelet_volume_stats_available_bytes
in Prometheus). Now that the autoresizer actually works, every existing
proxmox-lvm/encrypted PVC without ignore_changes is at risk.

Sweep needed (separate task): grep for kubernetes_persistent_volume_claim
across stacks/ and add ignore_changes to any with resize.topolvm.io
annotations.
2026-05-10 11:12:37 +00:00
Viktor Barzin
ce9bf5b676 postiz: wire INSTAGRAM_APP_ID/SECRET via ESO for IG-standalone provider
Standalone provider (instagram-standalone OAuth flow) is what the user
is trying after the FB-Login path was blocked by their Business Account
ad-policy flag. Uses modern scope names (instagram_business_*), so,
unlike the FB-Login provider, no JS patch is needed.
2026-05-10 11:12:37 +00:00
Viktor Barzin
e883c9d63f ci(drift-detection): generate kubeconfig from projected SA token
Same fix as default.yml — drift-detection cron also runs terragrunt
plan on every stack, which requires the kubeconfig at <repo>/config
that terragrunt.hcl injects via -var kube_config_path. Pipeline #547
(latest scheduled drift-detection run) failed with the same
'config_path refers to an invalid path' error.
2026-05-10 11:12:37 +00:00
Viktor Barzin
ce45e69e38 ci(woodpecker): generate kubeconfig from projected SA token
terragrunt.hcl injects -var kube_config_path=${repo_root}/config for
every terraform invocation, but the pipeline never created that file.
Every commit that touched a TF stack since #545 (2026-05-08) failed
with 'config_path refers to an invalid path: \"../../config\": no such
file or directory' followed by the kubernetes provider falling back
to localhost:80.

Add a step that writes a kubeconfig at <repo>/config using the
projected SA token + cluster CA. The woodpecker namespace's default
SA is already cluster-admin (woodpecker-default ClusterRoleBinding),
so the projected token is sufficient for any stack apply. Using
tokenFile (not an inline token) lets the provider re-read it if
kubelet rotates the projected token mid-pipeline.
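The written file amounts to this shape (sketch; JSON parses as YAML so
kubectl and the provider accept it — paths are the standard projected-SA
mount, the server URL the in-cluster default):

```python
import json

SA = "/var/run/secrets/kubernetes.io/serviceaccount"
kubeconfig = {
    "apiVersion": "v1", "kind": "Config",
    "clusters": [{"name": "in-cluster", "cluster": {
        "server": "https://kubernetes.default.svc",
        "certificate-authority": f"{SA}/ca.crt"}}],
    # tokenFile (not an inline token): re-read on every call, so a
    # mid-pipeline kubelet rotation can't invalidate the config.
    "users": [{"name": "sa", "user": {"tokenFile": f"{SA}/token"}}],
    "contexts": [{"name": "ci", "context": {"cluster": "in-cluster",
                                            "user": "sa"}}],
    "current-context": "ci",
}
with open("config", "w") as f:   # <repo>/config, where terragrunt.hcl looks
    json.dump(kubeconfig, f, indent=2)
```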

#545 was the last green run because that commit only changed the
build-cli pipeline — 0 stacks applied so the missing kubeconfig
never mattered.
2026-05-10 11:12:37 +00:00
Viktor Barzin
9c1df3ad96 chore: remove decommissioned registry.viktorbarzin.me ingress
The old port-5050 R/W private registry was decommissioned 2026-05-07
(forgejo-registry-consolidation Phase 4). The reverse-proxy ingress
+ ExternalName service + Cloudflare DNS record kept pointing at the
dead backend, returning 502 to anyone hitting registry.viktorbarzin.me.

This was driving 3 monitoring artifacts that auto-cleared on cleanup:
- Uptime Kuma external monitor #586 (deleted)
- Pushgateway stale registry-integrity-probe metrics (deleted)
- ExternalAccessDivergence + RegistryIntegrityProbeStale alerts
2026-05-10 11:12:37 +00:00
Viktor Barzin
8c09543391 fix: restore pvc-autoresizer by allow-listing kubelet_volume_stats_available_bytes
The Prometheus scrape config for the kubernetes-nodes job kept
capacity_bytes + used_bytes but dropped available_bytes. pvc-autoresizer
computes utilization from available/capacity, so without that metric it
was silent for every PVC in the cluster — including mailserver, which
filled to 89% (1.7G/2.0G) and started rejecting all inbound mail with
'452 4.3.1 Insufficient system storage' (15+ hours, all real senders:
Brevo, Gmail, Facebook).

Also bumps the floors of mailserver (2Gi -> 5Gi, limit 10Gi) and forgejo
(15Gi -> 30Gi) PVCs to recover from the immediate outage, and adds
ignore_changes on requests.storage so future autoresizer expansions
don't cause TF drift.
2026-05-10 11:12:37 +00:00
Viktor Barzin
c44d855960 ig-poster: pivot to Telegram-only delivery (manual IG upload)
User dropped Postiz/Instagram OAuth (Meta Business Account flagged
+ Postiz scope drift). New pipeline ends at Telegram — full-quality
JPEG delivered to the bot chat, manually uploaded to IG by the user.

- Image bumped to 25e46efd: adds /deliver/{asset_id} endpoint that
  multipart-uploads to Telegram (URL-fetch fails through Cloudflare
  for >5MB; see the sketch after this list), then tags 'posted' in Immich.
- ESO now syncs telegram_bot_token + telegram_chat_id from Vault.
- Public ingress paths grow to ['/image', '/original'] (Authentik
  bypass on /original is harmless — files are user-tagged, low blast
  radius — and useful for ad-hoc browser downloads).
- Memory limit 512Mi -> 1500Mi: full-resolution Pillow HEIC decode
  was OOMing on 12MP+ phone photos.
- discover.json simplified to scan -> deliver per item; approval and
  post workflows already deactivated. Telegram bot webhook removed.
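Sketch of the multipart delivery (requests; token/chat id are
placeholders — sendDocument keeps the JPEG un-recompressed, which is
the point of "full-quality"):

```python
import requests

def deliver(jpeg_bytes: bytes, bot_token: str, chat_id: str) -> None:
    # Uploading the bytes ourselves sidesteps Telegram's server-side
    # URL fetch, which is what fails through Cloudflare for >5MB files.
    resp = requests.post(
        f"https://api.telegram.org/bot{bot_token}/sendDocument",
        data={"chat_id": chat_id},
        files={"document": ("photo.jpg", jpeg_bytes, "image/jpeg")},
        timeout=60,
    )
    resp.raise_for_status()
```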
2026-05-10 11:12:37 +00:00
Viktor Barzin
bd8dbbc76f postiz: wire FACEBOOK_APP_ID/SECRET via ESO for IG-Business integration 2026-05-10 11:12:37 +00:00
Viktor Barzin
02e28294e9 postiz: idempotent Job to drop default Text search attributes (Temporal SQL visibility caps at 3 Text attrs; auto-setup ships with 2, Postiz adds 2 more — gitroomhq/postiz-app#1504) 2026-05-10 11:12:37 +00:00
Viktor Barzin
16e408ee59 postiz: bump memory limit to 4Gi (was OOMing during NestJS startup) 2026-05-10 11:12:37 +00:00
Viktor Barzin
888df84fb5 postiz: add Temporal sidecar; lock both stacks behind Authentik
Postiz backend was crashlooping on connect ECONNREFUSED ::1:7233 —
Postiz needs Temporal for cron/scheduled posts and the Helm chart
doesn't bundle it. Added a single-replica temporalio/auto-setup:1.28.1
Deployment in the postiz namespace, backed by the bundled
postiz-postgresql (separate `temporal` + `temporal_visibility`
databases pre-created via init container), ENABLE_ES=false (Postiz
only uses the workflow engine, not visibility search). Skips
DYNAMIC_CONFIG_FILE_PATH because that file isn't bundled in
auto-setup.

Auth audit:
- postiz: ingress now `protected = true` (Authentik forward-auth).
  Postiz also has its own login on top, but registration is no
  longer exposed to the open internet.
- instagram-poster: split into two ingresses on the same host.
  `/image/*` stays public (Meta + Telegram fetch the 9:16
  derivatives). Everything else (/healthz, /queue, /scan,
  /enqueue, /reject, /post-next) sits behind Authentik. The
  protected ingress sets dns_type=none — the public one already
  created the CF DNS record.
2026-05-10 11:12:37 +00:00
Viktor Barzin
c6939c3d53 postiz + n8n: real DB URL + webhook-trigger approval
- postiz: set DATABASE_URL/REDIS_URL pointing at the bundled subcharts;
  the chart does NOT auto-wire even when postgresql.enabled=true, so
  the prisma db:push was failing with empty DATABASE_URL.
- n8n approval workflow: swap telegramTrigger -> webhook node so it
  works without an n8n-stored Telegram credential. Telegram bot's
  webhook is set via setWebhook to https://n8n.viktorbarzin.me/webhook/instagram-approval.
  Parse-callback Code node tolerates both shapes ({body:{callback_query:...}}
  vs {callback_query:...}) so a future move back to telegramTrigger doesn't break.
2026-05-10 11:12:37 +00:00
Viktor Barzin
5057341d09 postiz + instagram-poster: deploy fixes after first apply
- postiz: pin chart name to 'postiz-app' (was 'postiz', wrong path)
  and override bundled bitnami subchart images to bitnamilegacy/* —
  Bitnami removed bitnami/postgresql + bitnami/redis from DockerHub
  in Aug 2025 (Broadcom acquisition).
- postiz: enable initial registration (DISABLE_REGISTRATION=false)
  so first admin user can be created in UI; tighten after.
- instagram-poster: add securityContext (fsGroup/runAsUser=10001)
  so kubelet chowns the PVC mount for the non-root 'poster' user;
  was crashing on alembic with 'unable to open database file'.
- instagram-poster: bump image_tag to 24935ab4 (uvicorn now binds
  to port 8000 to match Service contract; was 8080 -> probe 404).
2026-05-10 11:12:37 +00:00
Viktor Barzin
2d1dfa49f6 instagram-poster: pin image tag to 23f8b4ed (initial push) 2026-05-10 11:12:37 +00:00
Viktor Barzin
73eb01f994 add postiz + instagram-poster stacks for IG Stories pipeline
New stacks:
- stacks/postiz/ — Postiz scheduler (Helm chart v1.0.5, image v2.21.7)
  with bundled PG/Redis, /uploads PVC on proxmox-lvm, JWT_SECRET
  via ESO from secret/instagram-poster.
- stacks/instagram-poster/ — custom Python service that polls Immich
  for the 'instagram' tag, reformats photos to 9:16 with blurred-bg
  letterbox, exposes /image/<asset_id> publicly so Postiz can fetch.
  Image: forgejo.viktorbarzin.me/viktor/instagram-poster.
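The 9:16 reformat is roughly this Pillow recipe (sketch; output size
and blur radius are illustrative):

```python
from PIL import Image, ImageFilter

def to_story(src: Image.Image, w: int = 1080, h: int = 1920) -> Image.Image:
    # Background: cover-crop the photo to 9:16, then blur it hard.
    bg, scale = src.copy(), max(w / src.width, h / src.height)
    bg = bg.resize((round(bg.width * scale), round(bg.height * scale)))
    left, top = (bg.width - w) // 2, (bg.height - h) // 2
    bg = bg.crop((left, top, left + w, top + h))
    bg = bg.filter(ImageFilter.GaussianBlur(40))
    # Foreground: fit the untouched photo inside the frame, centered.
    fg = src.copy()
    fg.thumbnail((w, h))
    bg.paste(fg, ((w - fg.width) // 2, (h - fg.height) // 2))
    return bg
```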

n8n: 3 new workflows (discover, approval, post) for the Telegram
inline-button approval UX. Adds ExternalSecret + env vars for
TELEGRAM_BOT_TOKEN, TELEGRAM_CHAT_ID, IMMICH_API_KEY, plus static
URLs for the new service.

Vault: seed secret/instagram-poster with telegram_bot_token,
telegram_chat_id, immich_api_key, postiz_api_token,
postiz_jwt_secret before applying.
2026-05-10 11:12:37 +00:00
Viktor Barzin
badc341669 openclaw: regenerate kubeconfig at pod start using projected SA tokenFile
The previously-baked kubeconfig at /home/node/.openclaw/kubeconfig retained
a service-account token bound to the original (long-dead) pod, so kubectl
calls from inside the openclaw container failed with "the server has asked
for the client to provide credentials" even though the openclaw SA has
cluster-admin and kubelet projects a fresh token at
/var/run/secrets/kubernetes.io/serviceaccount/token.

Add init-container "setup-kubeconfig" that writes a kubeconfig with
tokenFile + certificate-authority paths pointing at the projected
SA volume — kubelet auto-rotates the token, kubectl always reads
fresh creds, no Vault K8s-creds-engine refresh needed.

Verified end-to-end: agent ran `kubectl get nodes -o wide` inside the
pod and delivered a correct one-line summary to Telegram via
openai-codex/gpt-5.4-mini.
2026-05-10 11:12:37 +00:00
Viktor Barzin
8b0b4e5148 [ci] build-cli: drop registry.viktorbarzin.me:5050 push (decommissioned)
The build-cli pipeline was still pushing to the
registry.viktorbarzin.me:5050/infra path that no longer exists
post Phase 4 — failing with 'error authenticating: exit status 1'
on every infra push. Drop the second repo + login; DockerHub +
Forgejo are the canonical destinations now.
2026-05-10 11:12:37 +00:00
Viktor Barzin
a39893bb60 [woodpecker] Re-fix null_resource trigger after lint reverted it
The helm provider in this Terraform version doesn't support
list-index access on helm_release.metadata[0]. Switch the
woodpecker_server_host_alias trigger to {helm_version, sha256(values)}
which works regardless of provider quirks. (Original fix landed
2026-05-07; got reverted by a linter pass.)
2026-05-10 11:12:36 +00:00
Viktor Barzin
564c64f4c7 f1-stream: register HmembedsExtractor in registry
Companion commit to 92474254 — the new extractor wasn't being
registered, only the file was added. Add the import + register call
in create_registry().

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 11:12:36 +00:00
Viktor Barzin
18604d808e f1-stream: hmembeds offline decoder — reverse-engineered the JW Player trap
Four-agent parallel investigation finally pinned down what's happening
with the hmembeds.one streams. The TL;DR is unexpected: there is no
fingerprint check, no decoder failure, no broken JS — the obfuscated
decoder is trivial to reproduce, but the upstream origin is dead.

Findings (saved at /tmp/jwre/{findings.md, blob-analysis.md,
fingerprint-gap.md, trace-summary.md}):

1. **The "ZpQw9XkLmN8c3vR3" blob is decoy.** It's an Adcash adblock-
   bypass config — not the stream URL. The actual stream URL is in a
   different inline `<script>` block of the embed HTML.

2. **The real decoder is base64 + XOR with a hardcoded key**; the key
   appears literally in the HTML (e.g. `var k="bux7ver6mow4trh1"`).
   No browser-derived inputs. We can run it in Python in 50µs.

3. **The decoded URL is JWT-bound to /24 of the requestor's IP**. JWT
   payload: `{stream, ip:"176.12.22.0/24", session_id, exp}`. From our
   cluster (egress 176.12.22.76) the JWT IP-binding is satisfied.

4. **The origin still returns 404 (GET) / 403 (HEAD).** Tested both
   curated embeds (Sky F1 888520f3..., DAZN F1 fc3a5463...) — same
   404. Origin landing page (`/`) returns 200, so the host is up;
   the `/sec/<JWT>/<embed_id>.m3u8` endpoint specifically refuses.

5. **No fingerprint surface trips this.** Runtime trace via
   chrome-service hooks confirmed: decoder reads navigator.userAgent
   (heavy), screen dimensions, and a single WebGL getParameter call.
   No canvas, audio, fonts, fetch-to-fingerprint-API. JW Player setup
   is given a valid file URL — the playlist stays empty because JW
   can't fetch the manifest from the (dead) origin.

Verdict: **the legacy curated hmembeds embeds (`888520f3...` Sky F1,
`fc3a5463...` DAZN F1) are upstream-dead.** No browser-side fix is
possible. The community uses these IDs as "24/7 channels" but they're
in a perpetually-offline state right now.

This commit ships the offline decoder anyway, registered as a new
extractor. Two reasons:
- If those origins come back online, no code change needed.
- Future curated hmembeds IDs (added by hand or discovered via
  subreddit posts) will resolve through the same path.

Files added: `extractors/hmembeds.py` (~120 lines incl. the decoder
and a `decode_embed(html) -> str | None` helper that's reusable).
Registered in `__init__.py`. The existing CuratedExtractor stays
disabled; this replaces its mechanism with one that can absorb new
embed IDs without code changes.
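For reference, the decode step reduces to something like this
(standalone sketch; the extraction regexes are illustrative — the
shipped decode_embed() in extractors/hmembeds.py is canonical):

```python
import base64
import re

def decode_embed(html: str) -> str | None:
    key_m = re.search(r'var k="([^"]+)"', html)                # hardcoded key
    blob_m = re.search(r'atob\("([A-Za-z0-9+/=]+)"\)', html)   # b64 payload
    if not (key_m and blob_m):
        return None
    key = key_m.group(1).encode()
    data = base64.b64decode(blob_m.group(1))
    # Repeating-key XOR — no browser-derived inputs anywhere.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data)).decode()
```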

Bonus from the agent work:
- Confirmed our stealth.js is sufficient — the runtime trace showed
  the decoder reads only the surfaces we already cover.
- Identified ~10 fingerprint surfaces we don't spoof (platform,
  userAgentData, hardwareConcurrency, deviceMemory, timezone,
  AudioContext, ICE candidates) but proved they're not what's
  blocking us, so no change needed for now.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 11:12:36 +00:00
Viktor Barzin
ffa1d6d5dc [woodpecker] Programmatic Forgejo repo registration
Earlier I claimed the OAuth Web UI flow was the only way to onboard
new Forgejo repos in Woodpecker. That's wrong.

Two parts to the actual workaround:
1. Woodpecker session JWTs are HS256 signed with the user's per-user
   `hash` column from the PG `users` table (NOT the global agent
   secret). Mint a session JWT for the Forgejo viktor user (id=2,
   forge_id=2), and you're authenticated as that user.
2. POST /api/repos?forge_remote_id=N as viktor → Woodpecker calls
   Forgejo with viktor's stored OAuth access_token to create the
   webhook + per-repo signing key. Works.

The 500 I saw earlier was from POST'ing as ViktorBarzin (GitHub
admin), whose user row has no Forgejo OAuth token — Woodpecker's
forge-API call fails for that user, surfacing as a 500.

scripts/woodpecker-register-forgejo-repo.sh wraps the whole flow:
extract hash from PG → mint JWT → activate repo. Verified against
viktor/{broker-sync,claude-agent-service,freedify,hmrc-sync} in
this session — all activated cleanly.
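Sketch of the mint-and-register flow the script automates (PyJWT; the
exact claim set is an assumption and the base URL a placeholder — the
script is the source of truth):

```python
import jwt        # PyJWT
import requests

def register_repo(user_hash: str, user_id: int, forge_remote_id: int,
                  base: str = "https://woodpecker.example.com") -> dict:
    # HS256 signed with the per-user `hash` column, NOT the agent secret.
    token = jwt.encode({"user-id": user_id}, user_hash, algorithm="HS256")
    resp = requests.post(
        f"{base}/api/repos",
        params={"forge_remote_id": forge_remote_id},
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    resp.raise_for_status()  # Woodpecker creates the webhook + signing key
    return resp.json()
```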

Also updated the runbook with the actual mechanism + the
WOODPECKER_FORGE_TIMEOUT=30s tip (the real root cause of the
'context deadline exceeded' failures, NOT the v3.14 upgrade).
2026-05-10 11:12:36 +00:00
Viktor Barzin
afd78f8d3e kms: replace inline ConfigMap nginx with custom Hugo image
The kms-web-page deployment now pulls
forgejo.viktorbarzin.me/viktor/kms-website:${var.image_tag} (source
in the new Forgejo repo viktor/kms-website). The ConfigMap-mounted
index.html is gone — the new site is a Hugo build with full GVLK
catalog for every Microsoft KMS-eligible Windows + Office edition,
copy-to-clipboard, dark/light themes.

The container image tag is managed by CI (kubectl set image), so
add lifecycle ignore_changes on container[0].image alongside the
existing dns_config (Kyverno) ignore.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-07 23:29:35 +00:00
Viktor Barzin
4518aff71c f1-stream: Stremio addon extractor — TvVoo + StremVerse Sky F1 / DAZN F1
5 parallel research agents surveyed Stremio addons, F1 TV / Sky / DAZN
official APIs, IPTV M3U lists, and free-to-air broadcasters. The clean
finding: two community Stremio addons already index Sky Sports F1 +
DAZN F1 via their public HTTP APIs — no Stremio client required, just
GET /stream/<type>/<id>.json on the addon's hosted instance.

New `stremio.py` extractor pulls from:
- **TvVoo** (`https://tvvoo.hayd.uk/manifest.json`) — wraps Vavoo IPTV.
  Lists Sky Sports F1 UK + Sky Sports F1 HD + Sky Sport F1 IT + Sky
  Sport F1 HD DE + DAZN F1 ES. Returns 2 IP-bound m3u8 URLs per
  channel. Source: github.com/qwertyuiop8899/tvvoo. Vavoo's CDN SSL
  certs are currently expired so most clients fail verification today
  — addon framework is right but delivery is degraded.
- **StremVerse** (`https://stremverse.onrender.com/manifest.json`) —
  Returns 11+ streams per id (`stremevent_591` = F1, `stremevent_866`
  = MotoGP). Mix of DRM-walled DASH, JW-broken-chain JWT URLs, and
  HuggingFace-Space proxies that 404 without a per-instance api_password.

The extractor surfaces 15 candidate URLs per run; verifier filters to
the playable subset. Today that subset is 0 (Vavoo cert expiry + JW
chain + proxy auth), but the wiring is correct: as the addons fix
delivery or rotate to fresh URLs, candidates will start passing.
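The addon protocol is plain HTTP + JSON, so the per-channel fetch is
essentially (sketch; stream-object shape per the public Stremio addon
convention):

```python
import requests

def addon_streams(base: str, ctype: str, cid: str) -> list[str]:
    url = f"{base}/stream/{ctype}/{cid}.json"
    streams = requests.get(url, timeout=15).json().get("streams", [])
    return [s["url"] for s in streams if "url" in s]

# e.g. addon_streams("https://stremverse.onrender.com", "tv", "stremevent_591")
```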

Other agent findings worth noting (not coded but documented):
- F1 TV Pro live = Widevine DASH; impossible without a CDM. VOD is
  clean HLS but only post-session.
- Sky Go / DAZN / Viaplay / Canal+ = all Widevine + geo-fenced + active
  DMCA enforcement. Pursuing not feasible.
- ServusTV AT (free F1 race weekends) = clean public HLS at
  rbmn-live.akamaized.net/hls/live/2002825/geoSTVATweb/master.m3u8 but
  geo-fenced; needs an Austrian-IP egress proxy/VPN.
- iptv-org/iptv has an F1 Channel (Pluto TV IE) at
  jmp2.uk/plu-6661739641af6400080cd8f1.m3u8 — 24/7 free, BG works,
  but only historic races + shoulder programming. Worth adding as a
  curated entry later.
- boxboxbox.* (community-favourite F1 race-weekend domain) is dead
  across all known TLDs as of today.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-07 23:29:35 +00:00
Viktor Barzin
d832a33039 [woodpecker] Bump WOODPECKER_FORGE_TIMEOUT 3s → 30s
The default forge-API timeout is 3 seconds. The config-loader makes
4-6 sequential calls per pipeline trigger (probing for .woodpecker dir
then each .woodpecker.{yaml,yml} variant), and Forgejo responses on
this cluster spike to 1-2s under load — easy to trip the cumulative
3s deadline. Result: 'could not load config from forge: context
deadline exceeded' on virtually every pipeline trigger.

This was the actual root cause of the 'Woodpecker forge-API bug'
that v3.13 → v3.14 was supposed to fix — turns out v3.14 didn't
change the timeout default, and the v3.13 successes I saw earlier
were warm-cache flukes.
2026-05-07 23:29:35 +00:00
Viktor Barzin
afafc9928f [docs] Onboarding runbook for new Forgejo repos in Woodpecker 2026-05-07 23:29:35 +00:00