Compare commits

48 commits

Author SHA1 Message Date
Viktor Barzin
a5e9fd8c71 fire-planner: expose actualbudget creds via ExternalSecret
Adds 3 new keys (ACTUALBUDGET_API_URL/KEY/SYNC_ID) sourced from
Vault secret/fire-planner so the FastAPI backend can read viktor's
spending from the in-cluster actualbudget HTTP API and prefill the
Annual spending field on the WhatIf form. Vault keys seeded manually
ahead of this commit; ESO has already synced the K8s Secret.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 11:12:40 +00:00
Viktor Barzin
753e9bb971 x402: consolidate to a single shared forwardAuth gateway
The per-site `x402_instance` module created one Deployment + Service +
PDB per protected host (9 in total, 9×64Mi). Every pod was running the
exact same logic with the same config — the only thing that varied
was the upstream URL, which we don't even need since the gateway can
return 200 to "allow" and Traefik handles the upstream itself.

Refactor to the same pattern as `ai-bot-block`:
 * single deployment + service in `traefik` namespace, 2 replicas, HA
 * Traefik `Middleware` CRD `x402` (forwardAuth → x402-gateway:8080/auth)
 * each consumer ingress just appends `traefik-x402@kubernetescrd` to
   its middleware chain via `extra_middlewares`

x402-gateway gains a `MODE=forwardauth` env var that returns 200 (allow)
or 402 (with x402 PaymentRequiredResponse body) instead of reverse-
proxying. Image: ghcr ... f4804d62.

Pod count: 9 → 2 (78% memory saved). All 9 sites verified still
serving the Anubis challenge to plain curl with identical TTFB.
DRY_RUN until `var.x402_wallet_address` is set on the traefik stack.

Removes `modules/kubernetes/x402_instance/` (dead code now).
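
A minimal sketch of the forwardAuth decision described above, not the gateway's actual Go code. The UA pattern is an illustrative subset of the default list, case-insensitive matching is an assumption, and `verify_payment` is a hypothetical stand-in for the facilitator call.

```python
import re
from typing import Callable, Optional

# Illustrative subset of the AI-bot User-Agent patterns (assumption: matched
# case-insensitively; the real list lives in the gateway config).
AI_BOT_UA = re.compile(
    r"ClaudeBot|GPTBot|Bytespider|PerplexityBot|python-requests", re.IGNORECASE
)


def forward_auth_status(
    user_agent: str,
    x_payment: Optional[str],
    dry_run: bool,
    verify_payment: Callable[[str], bool],  # hypothetical facilitator check
) -> int:
    """Return the status a forwardAuth endpoint would hand back to Traefik.

    200 means "allow, Traefik routes to the upstream itself";
    402 means payment required.
    """
    if dry_run:
        return 200  # DRY_RUN: never issue a 402
    if not AI_BOT_UA.search(user_agent):
        return 200  # browsers and unlisted clients pass through
    if x_payment and verify_payment(x_payment):
        return 200  # declared bot that presented a valid payment
    return 402
```

Because the function only returns a status, the same logic serves every protected host; the upstream URL never enters the decision.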
2026-05-10 11:12:40 +00:00
Viktor Barzin
ce4a75d79a x402: deploy payment gateway in front of Anubis on all 9 public sites
Adds modules/kubernetes/x402_instance/ — a small Go reverse proxy
(forgejo.viktorbarzin.me/viktor/x402-gateway:ce333419) that selectively
issues HTTP 402 Payment Required to declared AI-bot User-Agents and
validates X-PAYMENT headers against a Coinbase x402 facilitator.
Browsers are forwarded transparently to Anubis (which then handles the
JS PoW gate as before).

Wired into all nine Anubis-fronted sites:
  ingress -> x402-X -> anubis-X -> backend

While `wallet_address` is empty the gateway runs in DRY_RUN — every
request is transparent-proxied, no 402s issued. This lets the pod sit
in the request path with zero behavioural impact today; flipping the
wallet variable in the per-stack module call activates payment-required
mode for AI-bot UAs.

Default config: Base mainnet USDC, $0.01/req, x402.org/facilitator,
catch-all UA list (ClaudeBot|GPTBot|Bytespider|meta-externalagent|
PerplexityBot|GoogleOther|cohere-ai|Diffbot|Amazonbot|
Applebot-Extended|FacebookBot|ImagesiftBot|YouBot|anthropic-ai|
Claude-Web|petalbot|spawning-ai|scrapy|python-requests).

Verified post-apply: 9/9 pods Running, all 9 sites still serve the
Anubis challenge to plain curl with identical TTFB, x402 logs confirm
"dry_run":true on every instance.
2026-05-10 11:12:40 +00:00
root
a1b659de2a Woodpecker CI deploy [CI SKIP] 2026-05-10 11:12:40 +00:00
Viktor Barzin
04cb22fd3b anubis: re-protect f1 with a per-host policy that allows JSON routes
Earlier f1 revert left the host fully unprotected (no Anubis,
exclude_crowdsec=true on the ingress already). Re-add Anubis with
a custom policy_yaml that:

- ALLOWs /_app/* (SvelteKit immutable JS/CSS chunks loaded before
  any cookie exists), /openapi.json, /docs, /api/* (FastAPI meta).
- ALLOWs the 9 known JSON/proxy routes (schedule, streams,
  embed, embed-asset, extract, extractors, health, proxy, relay)
  so the SvelteKit SPA's XHRs return JSON instead of the challenge
  HTML.
- Catch-all CHALLENGE for everything else — the SPA HTML pages
  (which fall through to FastAPI's `/{path}` catch-all) get the
  PoW gate.

The ALLOWed JSON routes are technically scrapeable by a determined
bot, but the user's stated goal is "avoid accidental scrapes" — the
HTML/SPA is the AI-training target, and that stays gated.

Verified: / → Anubis challenge HTML; /schedule, /streams → JSON;
/_app/.../app.js → text/javascript; ClaudeBot UA → Anubis deny page.
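
The rule ordering above can be sketched as follows, assuming first-match-wins evaluation. The regexes are hypothetical reconstructions from the route names in this message, not the contents of the real policy_yaml.

```python
import re

# Hypothetical rule list mirroring the policy described above; first match wins.
RULES = [
    ("ALLOW", re.compile(r"^/_app/")),
    ("ALLOW", re.compile(r"^/(openapi\.json|docs)(/|$)")),
    ("ALLOW", re.compile(r"^/api/")),
    (
        "ALLOW",
        re.compile(
            r"^/(schedule|streams|embed|embed-asset|extract|extractors"
            r"|health|proxy|relay)(/|$)"
        ),
    ),
    ("CHALLENGE", re.compile(r".*")),  # catch-all for the SPA HTML pages
]


def decide(path: str) -> str:
    """Return the action of the first rule whose pattern matches the path."""
    for action, pattern in RULES:
        if pattern.search(path):
            return action
    return "CHALLENGE"
```

Anchoring each route with `(/|$)` keeps a lookalike path such as `/schedules` from slipping through the ALLOW rule.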
2026-05-10 11:12:40 +00:00
Viktor Barzin
a89d4a7d2a anubis: pull f1 off Anubis (XHR-vs-challenge collision) + add latency alerts
f1.viktorbarzin.me is a SPA whose JS fetches /schedule, /embed,
/embed-asset, … on the same path tree. With Anubis fronting `/`,
those XHRs land on the challenge HTML even when the cookie *should*
be valid, breaking the page with `Unexpected token '<', "<!doctype "
... is not valid JSON`. Removed Anubis from f1 — would need a path
carve-out (the way wrongmove does for /api) to re-enable. Added a
top-of-block comment so future me remembers why.

Plus four new Prometheus alerts in `Slow Ingress Latency` group
(stacks/monitoring/.../prometheus_chart_values.tpl):

- IngressTTFBHigh         (warn, 10m, avg latency >1s)
- IngressTTFBCritical     (crit, 5m,  avg latency >3s)
- IngressErrorRate5xxHigh (crit, 5m,  5xx >5%)
- AnubisChallengeStoreErrors (crit, 5m, any 5xx on *anubis* services
  via Traefik — proxies for the in-pod challenge-store error since
  Anubis itself only exposes Go-runtime metrics)

Notes from the alert author: avg-not-p95 because the existing
Prometheus scrape config drops traefik bucket series; once those
are restored, swap to histogram_quantile(0.95). TraefikDown inhibit
rule extended to suppress these four during a Traefik outage.
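
A small sketch of the average-latency arithmetic behind the two TTFB alerts. It assumes the average comes from the deltas of a duration sum counter and a request count counter over the window (the exact series are not named here); the 1 s and 3 s thresholds are the ones listed above, and the 10m/5m hold durations are not modelled.

```python
from typing import Optional


def average_latency_seconds(
    sum_start: float, sum_end: float, count_start: int, count_end: int
) -> Optional[float]:
    """Average request duration over a window, from two counter deltas."""
    requests = count_end - count_start
    if requests <= 0:
        return None  # no traffic in the window, so no average
    return (sum_end - sum_start) / requests


def severity(avg: Optional[float]) -> str:
    """Map an average latency onto the warn/crit thresholds."""
    if avg is None:
        return "none"
    if avg > 3.0:
        return "critical"
    if avg > 1.0:
        return "warning"
    return "ok"
```

An average hides tail latency, which is why the note above calls for swapping to a p95 quantile once the bucket series are back.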
2026-05-10 11:12:40 +00:00
Viktor Barzin
8197842646 anubis: fix 500 on multi-replica + roll out to 6 more public sites
Browser visits to viktorbarzin.me started returning HTTP 500 with
`store: key not found: "challenge:..."` in pod logs. Root cause:
each Anubis pod stores in-flight challenges in process memory; with
2 replicas behind a ClusterIP, the PoW-solved request can be
routed to a different pod than the one that issued the challenge.
Anubis upstream documents the same caveat ("when running multiple
instances on the same base domain, the key must be the same across
all instances" — true for the ed25519 signing key, but the
challenge store is still pod-local without a shared backend).

Drop module default replicas: 2 → 1. Worst-case: ~1s cold-start on
pod restart. Real fix (Redis-backed challenge store) noted as a
follow-up in CLAUDE.md.
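
A toy model of the failure mode, assuming the Service alternates between backends in the worst case. `Pod` and `solve_round_trip` are hypothetical names for illustration and are not Anubis code.

```python
import itertools
import secrets


class Pod:
    """One replica with a process-local challenge store."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: dict[str, bool] = {}

    def issue(self) -> str:
        challenge_id = secrets.token_hex(8)
        self.store[f"challenge:{challenge_id}"] = True
        return challenge_id

    def verify(self, challenge_id: str) -> bool:
        # Raises KeyError when the challenge was issued by another replica.
        return self.store.pop(f"challenge:{challenge_id}")


def solve_round_trip(pods: list[Pod]) -> int:
    """Issue on one backend and verify on the next one in rotation."""
    backends = itertools.cycle(pods)
    challenge_id = next(backends).issue()
    try:
        next(backends).verify(challenge_id)
        return 200
    except KeyError:
        return 500  # "store: key not found"
```

With a single replica both requests hit the same store, which is why dropping to one replica removes the 500s without a shared backend.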

Roll Anubis out to: f1-stream, cyberchef (cc), jsoncrack (json),
privatebin (pb), homepage (home), real-estate-crawler (wrongmove
UI only — `/api` ingress stays direct via path-based ingress carve-
out so XHRs from the SPA bypass the challenge).

End-state: 9 public sites now Anubis-fronted (blog + www as one site,
kms, travel, f1, cc, json, pb, home, wrongmove). All return the
challenge HTML to bare curl/browser; verified-IP search engines and
/robots.txt + /.well-known still skip via the strict-policy
allowlist.
2026-05-10 11:12:40 +00:00
Viktor Barzin
abdef1781c anubis: strict bot policy — catch-all CHALLENGE for unmatched UAs
The default upstream policy only WEIGHs Mozilla|Opera UAs and lets
everything else (curl, wget, python-requests, scrapy, headless CLI
scrapers) fall through to the implicit ALLOW. On non-CDN-fronted
hosts (kms, anything dns_type=non-proxied) this meant a plain
`curl https://kms.viktorbarzin.me/` returned the real backend
content with no challenge — defeating the whole point of the
"avoid casual scrapers" intent.

Now the module ships a custom POLICY_FNAME mounted via ConfigMap:
- Imports the upstream deny-pathological / ai-block-aggressive /
  allow-good-crawlers / keep-internet-working snippets unchanged
- Adds a final `path_regex: .*` → action: CHALLENGE catch-all

Result: only IP-verified search engines (Googlebot from Google IPs,
Bingbot, etc.) and well-known paths (robots.txt, .well-known,
favicon, sitemap) skip the challenge. Everything else — including
spoofed-Googlebot-UA-from-random-IP — solves PoW or gets nothing.

Verified post-apply: curl default UA on viktorbarzin.me + kms +
travel returns the Anubis challenge HTML; /robots.txt still 200s
straight through.
2026-05-10 11:12:40 +00:00
Viktor Barzin
2d6812f951 fire-planner: dual ingress — /api/* unprotected, / behind Authentik
The SPA can't carry an Authentik session on its own fetch() XHRs in
all cases (cross-origin redirect to authentik.viktorbarzin.me on a
stale cookie returns HTML, fetch().json() parse fails). Splitting
the ingress so /api/ paths skip forward-auth lets the React app talk
to its API end-to-end. The browser still has to log in via
Authentik to load the SPA at /.

Verified end-to-end via chrome-service Playwright: dashboard load,
scenario list, what-if run with real Monte Carlo, save-as-scenario
round-trip, run-now on detail, delete — all pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 11:12:40 +00:00
Viktor Barzin
58fd4025f8 anubis: per-site PoW reverse proxy on blog + kms + travel-blog
Adds modules/kubernetes/anubis_instance/ — a per-site reverse proxy
instance pinned to ghcr.io/techarohq/anubis:v1.25.0. Each instance
issues a 30-day JWT cookie scoped to viktorbarzin.me after a tiny
proof-of-work (difficulty 2 ≈ 250 ms desktop / 700 ms mobile). The
shared ed25519 signing key (Vault: secret/viktor → anubis_ed25519_key)
makes a single solve good across every Anubis-fronted subdomain.

Wired into blog (viktorbarzin.me + www), kms.viktorbarzin.me, and
travel.viktorbarzin.me — each with anti_ai_scraping=false on the
ingress so the redundant ai-bot-block forwardAuth is dropped from the
chain. Skipped forgejo (Git/API clients can't solve PoW) and resume
(replicas=0).

Also tightens bot-block-proxy nginx timeouts (3s/5s → 100ms/200ms) so
any ingress still using the ai-bot-block forwardAuth pays at most
~150 ms when poison-fountain is scaled down, instead of 3 s.

End-to-end TTFB on viktorbarzin.me dropped from ~3.2 s to ~150-200 ms.

Docs: .claude/reference/patterns.md "Anti-AI Scraping" updated to
4 layers; .claude/CLAUDE.md adds the Anubis usage paragraph and
Forgejo/API caveat.
2026-05-10 11:12:40 +00:00
root
ea2cb57e69 Woodpecker CI Update TLS Certificates Commit 2026-05-10 11:12:40 +00:00
Viktor Barzin
248279605b postiz: disable signups (DISABLE_REGISTRATION=true)
Admin account already exists; we don't want random users registering
on the public-facing instance. Sign-in only from now on.
2026-05-10 11:12:40 +00:00
Viktor Barzin
9904561c26 fire-planner: ingress port 8080 (was defaulting to 80)
ingress_factory's port var defaults to 80, but fire-planner publishes
on 8080. Traefik logged 'Cannot create service error="service port
not found"' and 404'd every request. Cloudflare's standard
origin-error decoy page (with the noindex meta + cdn-cgi/content
honeypot link) made it look like a bot-block, but it was just the
upstream coming back 404.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 11:12:39 +00:00
root
017e139b80 Woodpecker CI deploy [CI SKIP] 2026-05-10 11:12:39 +00:00
Viktor Barzin
08edd92b22 kms: deploy slack-notifier sidecar with Prometheus metrics + document public exposure
Slack notifier now also exposes /metrics on :9101 with stdlib HTTP — counts
activations and dedup-skips by product, gauges last-activation timestamp.
Pod template gets the standard prometheus.io/scrape annotations so the
cluster-wide kubernetes-pods job picks it up via pod IP. Memory request
bumped to 48Mi to cover counter dicts + HTTPServer.

Plus docs: networking.md footnotes the windows-kms row noting public WAN
exposure with the rate-limited (max-src-conn 50, max-src-conn-rate 10/60,
overload <virusprot> flush) pfSense filter rule, and a new runbook covers
log locations, rate-limit tuning, and how to revoke the WAN forward.

The matching pfSense rule was tightened in place (TCP-only + rate limits)
via SSH; pfSense isn't Terraform-managed.
2026-05-10 11:12:39 +00:00
Viktor Barzin
efadeb531d state(dbaas): update encrypted state 2026-05-10 11:12:39 +00:00
Viktor Barzin
0d8e0ca6fc backup: fix daily-backup silent failures, postiz pg_dump CronJob, doc reconcile
daily-backup ran out of its 1h budget and SIGTERMed for 10 days straight (Apr
30 → May 9). Each failed run left its snapshot mount stacked on /tmp/pvc-mount,
which blocked the next run from completing — root cause of the WeeklyBackupStale
alert going silent (the metric never reached its end-of-script push).

Fixes:
- TimeoutStartSec 1h → 4h (current workload of 118 PVCs needs ~1.5h, was hitting
  the wall during week 18 runs)
- Recursive umount + LUKS cleanup on EXIT trap, plus the same at script start as
  belt-and-braces for any inherited stuck state from a prior crashed run
- TERM/INT trap pushes status=2 metric so WeeklyBackupFailing fires instead of
  the alert going blind on systemd kills
- pfsense metric pushed in BOTH success and failure paths (was only on success;
  any ssh-to-pfsense outage made PfsenseBackupStale silent until the alert
  threshold expired)

Postiz backup CronJob: bundled bitnami PG/Redis live on local-path (K8s node
OS disk) — outside Layer 1+2 of the 3-2-1 pipeline. Added postiz-postgres-backup
that pg_dumps postiz + temporal + temporal_visibility daily 03:00 to
/srv/nfs/postiz-backup, getting Layer 3 offsite coverage. Verified end-to-end:
3 dumps written, Pushgateway metric received. Note: bitnamilegacy/postgresql
image is stripped (no curl/wget/python) — switched to docker.io/library/postgres
matching the dbaas/postgresql-backup pattern with apt-installed curl.

Doc reconcile (backup-dr.md): metric names had drifted (e.g. the docs claimed
backup_weekly_last_success_timestamp but the script pushes
daily_backup_last_run_timestamp). Updated to match what's actually emitted, and
added a "default-covered" footnote to the Service Protection Matrix so the
~40 services with PVCs not enumerated in the table are no longer ambiguous.

Manual PVE-host actions (out-of-band, not in TF):
- unmounted 6 stacked snapshots from /tmp/pvc-mount
- pruned 5 stale snapshots on vm-9999-pvc-67c90b6b... (origin LV that the
  loop got SIGTERMed against repeatedly, so prune kept failing)
- created /srv/nfs/postiz-backup directory
- triggered a one-shot daily-backup run with the new TimeoutStartSec to
  validate the fix end-to-end

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 11:12:39 +00:00
Viktor Barzin
8c619278d3 grafana: env-var datasources + reloader so Vault rotations stop breaking dashboards
Wealth, Payslips, and Job-Hunter Grafana datasources all baked the
rotating PG password into their ConfigMap at TF-apply time, so every
7-day Vault static-role rotation silently broke the panels until a
manual `terragrunt apply`. Same family as the recurring grafana-mysql
backend bug — Grafana caches creds at startup and never picks up the
new ESO-synced password without a restart.

Fix:
- Each source stack now creates an ExternalSecret in `monitoring`
  exposing the rotating password as `<NAME>_PG_PASSWORD` env-var.
- Grafana mounts those via `envFromSecrets` (optional=true so a
  missing source stack doesn't block boot) and the datasource
  ConfigMaps reference `$__env{<NAME>_PG_PASSWORD}` instead of a
  literal password.
- `reloader.stakater.com/auto: "true"` on the Grafana pod restarts
  it whenever any of the four DB-cred Secrets is updated.

Tested end-to-end: forced `vault write -force database/rotate-role/
pg-wealthfolio-sync` → ESO synced (~30s) → reloader fired →
Grafana booted with new env in ~50s total → all three /api/datasources
/uid/*/health endpoints return "Database Connection OK".

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 11:12:39 +00:00
Viktor Barzin
57250cfda2 mysql: bump to 4Gi limit / 3Gi request; grow /srv/nfs LV to 3 TiB
mysql-standalone OOMKilled May 8 18:05 (anon-rss 2 GB at the 2 Gi limit).
innodb_buffer_pool_size=1Gi plus connection buffers and InnoDB internals
don't fit in 2 Gi. Bumping limit to 4 Gi (request 3 Gi) leaves headroom
without changing the buffer pool config.

/srv/nfs was at 90% (1.7T / 2T); grew the underlying pve/nfs-data LV
by 1 TiB online and ran resize2fs (now 60% used). The shortage surfaced
during the 2026-05-09 IO-pressure post-mortem; the thinpool had ~4.6 TiB
free.

The post-mortem also covers the stale-NFS-client trigger (legacy
/usr/local/bin/weekly-backup pointing at the decommissioned TrueNAS IP)
and the resulting wedged kthread on the PVE host. Script removed and
node_exporter restarted out-of-band; kthread will clear at next PVE
reboot. See docs/post-mortems/2026-05-09-io-pressure-stale-nfs.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 11:12:38 +00:00
Viktor Barzin
b254c536f9 ig-poster: bump to da5b4191 (auto-curate from recent favorites) 2026-05-10 11:12:38 +00:00
root
a5a54aebe3 Woodpecker CI deploy [CI SKIP] 2026-05-10 11:12:38 +00:00
Viktor Barzin
72013a0890 n8n: real-time training loop + decoupled posting
instagram-approval: after every tap, immediately fetch /candidates?limit=1
and send the next photo as a fresh inline-keyboard message — the user's
tap chains back into this same workflow, so the loop is user-paced.
When the pool is exhausted, send an 'all caught up' summary with the
backlog count + cumulative training stats.

instagram-discover: cron throttled from every-30-min to daily 09:00.
The chain handles ongoing training; the daily run only kickstarts a
session if the user hasn't been tapping. Limit reduced from 3 → 1 so
each kickstart sends a single photo (chain takes over).
2026-05-10 11:12:38 +00:00
Viktor Barzin
ff2f32a33e ig-poster b17a9737 + n8n discover rewritten to use /candidates with CLIP scoring 2026-05-10 11:12:38 +00:00
Viktor Barzin
94e2f34e2a ig-poster: bump to 3b862fe4 (EXIF orientation + auto-pending /candidates) 2026-05-10 11:12:38 +00:00
Viktor Barzin
29bb434e1e ig-poster: 69e395f2 + sync IMMICH_PG_* via ESO for CLIP scoring; postiz publish-notify n8n workflow 2026-05-10 11:12:38 +00:00
Viktor Barzin
cb83972b79 ig-poster: bump to cac6fa97 + sync POSTIZ_INTEGRATION_ID via ESO 2026-05-10 11:12:37 +00:00
Viktor Barzin
40ca011bd6 postiz: expose /uploads publicly so Meta IG fetcher can pull JPEGs
Stories+feed posts via Postiz failed with state=ERROR and Postiz
mistranslated the cause as 'Invalid Instagram image resolution
max: 1920x1080px'. Real cause: Postiz hands Meta an upload URL
under https://postiz.viktorbarzin.me/uploads/... and Meta gets a
302 to the Authentik login page instead of bytes. Meta returns
error 36001 (image not fetchable) which Postiz maps to that
misleading resolution string.

Split the ingress: /uploads/* on a public ingress (matches the
instagram-poster /image+/original pattern), everything else
remains behind Authentik forward-auth. /uploads contents are
random UUIDs, low blast radius if scraped.
2026-05-10 11:12:37 +00:00
Viktor Barzin
b3ae2c5476 docs: PVC templates need lifecycle.ignore_changes for autoresizer
The canonical proxmox-lvm and proxmox-lvm-encrypted PVC templates were
missing `lifecycle { ignore_changes = [spec[0].resources[0].requests] }`.
Without it, every PVC created from these templates becomes a drift bomb
the moment pvc-autoresizer expands it: the next `tg apply` on that stack
will try to shrink the PVC back to the TF-declared size, K8s rejects the
shrink, and apply fails.

This was latent because pvc-autoresizer was silently broken cluster-wide
(commit 9d5da4d8 fixed it by allow-listing kubelet_volume_stats_available_bytes
in Prometheus). Now that the autoresizer actually works, every existing
proxmox-lvm/encrypted PVC without ignore_changes is at risk.

Sweep needed (separate task): grep for kubernetes_persistent_volume_claim
across stacks/ and add ignore_changes to any with resize.topolvm.io
annotations.
2026-05-10 11:12:37 +00:00
Viktor Barzin
ce9bf5b676 postiz: wire INSTAGRAM_APP_ID/SECRET via ESO for IG-standalone provider
Standalone provider (instagram-standalone OAuth flow) is what the user
is trying after the FB-Login path was blocked by their Business Account
ad-policy flag. Uses modern scope names (instagram_business_*), so no
JS patch needed unlike the FB-Login provider.
2026-05-10 11:12:37 +00:00
Viktor Barzin
e883c9d63f ci(drift-detection): generate kubeconfig from projected SA token
Same fix as default.yml — drift-detection cron also runs terragrunt
plan on every stack, which requires the kubeconfig at <repo>/config
that terragrunt.hcl injects via -var kube_config_path. Pipeline #547
(latest scheduled drift-detection run) failed with the same
'config_path refers to an invalid path' error.
2026-05-10 11:12:37 +00:00
Viktor Barzin
ce45e69e38 ci(woodpecker): generate kubeconfig from projected SA token
terragrunt.hcl injects -var kube_config_path=${repo_root}/config for
every terraform invocation, but the pipeline never created that file.
Every commit that touched a TF stack since #545 (2026-05-08) failed
with 'config_path refers to an invalid path: \"../../config\": no such
file or directory' followed by the kubernetes provider falling back
to localhost:80.

Add a step that writes a kubeconfig at <repo>/config using the
projected SA token + cluster CA. The woodpecker namespace's default
SA is already cluster-admin (woodpecker-default ClusterRoleBinding),
so the projected token is sufficient for any stack apply. Using
tokenFile (not an inline token) lets the provider re-read it if
kubelet rotates the projected token mid-pipeline.

#545 was the last green run because that commit only changed the
build-cli pipeline — 0 stacks applied so the missing kubeconfig
never mattered.
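
A sketch of the kubeconfig such a step could write, not the actual pipeline step. It is shown as a Python dict rendered to JSON, which kubeconfig loaders accept because JSON is a subset of YAML. The cluster, user, and context names are arbitrary, and the server URL assumes the standard in-cluster service address.

```python
import json

# Standard mount point of the projected service-account volume.
SA_DIR = "/var/run/secrets/kubernetes.io/serviceaccount"


def build_kubeconfig(
    server: str = "https://kubernetes.default.svc",  # assumed in-cluster address
    sa_dir: str = SA_DIR,
) -> dict:
    """Kubeconfig that points at the token file instead of embedding the token."""
    return {
        "apiVersion": "v1",
        "kind": "Config",
        "clusters": [
            {
                "name": "in-cluster",
                "cluster": {
                    "server": server,
                    "certificate-authority": f"{sa_dir}/ca.crt",
                },
            }
        ],
        # tokenFile is re-read by the client, so a rotated token is picked up.
        "users": [{"name": "sa", "user": {"tokenFile": f"{sa_dir}/token"}}],
        "contexts": [
            {"name": "in-cluster", "context": {"cluster": "in-cluster", "user": "sa"}}
        ],
        "current-context": "in-cluster",
    }


def write_kubeconfig(path: str) -> None:
    """Write the kubeconfig to the given path as JSON."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(build_kubeconfig(), fh, indent=2)
```

Calling `write_kubeconfig("config")` from the repo root would produce the file that terragrunt.hcl expects at <repo>/config.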
2026-05-10 11:12:37 +00:00
Viktor Barzin
9c1df3ad96 chore: remove decommissioned registry.viktorbarzin.me ingress
The old port-5050 R/W private registry was decommissioned 2026-05-07
(forgejo-registry-consolidation Phase 4). The reverse-proxy ingress
+ ExternalName service + Cloudflare DNS record kept pointing at the
dead backend, returning 502 to anyone hitting registry.viktorbarzin.me.

This was driving 3 monitoring artifacts that auto-cleared on cleanup:
- Uptime Kuma external monitor #586 (deleted)
- Pushgateway stale registry-integrity-probe metrics (deleted)
- ExternalAccessDivergence + RegistryIntegrityProbeStale alerts
2026-05-10 11:12:37 +00:00
Viktor Barzin
8c09543391 fix: restore pvc-autoresizer by allow-listing kubelet_volume_stats_available_bytes
The Prometheus scrape config for the kubernetes-nodes job kept
capacity_bytes + used_bytes but dropped available_bytes. pvc-autoresizer
computes utilization from available/capacity, so without that metric it
was silent for every PVC in the cluster — including mailserver, which
filled to 89% (1.7G/2.0G) and started rejecting all inbound mail with
'452 4.3.1 Insufficient system storage' (15+ hours, all real senders:
Brevo, Gmail, Facebook).

Also bumps the floors of mailserver (2Gi -> 5Gi, limit 10Gi) and forgejo
(15Gi -> 30Gi) PVCs to recover from the immediate outage, and adds
ignore_changes on requests.storage so future autoresizer expansions
don't cause TF drift.
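
Why the dropped metric silenced resizing, as a sketch: the decision needs both available and capacity bytes. The 10% free-space threshold is an assumption for illustration; the thresholds configured on these PVCs are not stated here.

```python
from typing import Optional


def free_fraction(available_bytes: int, capacity_bytes: int) -> float:
    """Share of the volume that is still free."""
    return available_bytes / capacity_bytes


def needs_resize(
    available_bytes: Optional[int],
    capacity_bytes: int,
    threshold: float = 0.10,  # assumed free-space threshold
) -> bool:
    """True when free space has dropped to or below the threshold."""
    if available_bytes is None:
        # Metric dropped by the scrape config: nothing to compute, no resize.
        return False
    return free_fraction(available_bytes, capacity_bytes) <= threshold
```

The `None` branch is the outage in miniature: a full volume and a missing metric look identical to "nothing to do".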
2026-05-10 11:12:37 +00:00
Viktor Barzin
c44d855960 ig-poster: pivot to Telegram-only delivery (manual IG upload)
User dropped Postiz/Instagram OAuth (Meta Business Account flagged
+ Postiz scope drift). New pipeline ends at Telegram — full-quality
JPEG delivered to the bot chat, manually uploaded to IG by the user.

- Image bumped to 25e46efd: adds /deliver/{asset_id} endpoint that
  multipart-uploads to Telegram (URL-fetch fails through Cloudflare
  for >5MB), then tags 'posted' in Immich.
- ESO now syncs telegram_bot_token + telegram_chat_id from Vault.
- Public ingress paths grow to ['/image', '/original'] (Authentik
  bypass on /original is harmless — files are user-tagged, low blast
  radius — and useful for ad-hoc browser downloads).
- Memory limit 512Mi -> 1500Mi: full-resolution Pillow HEIC decode
  was OOMing on 12MP+ phone photos.
- discover.json simplified to scan -> deliver per item; approval and
  post workflows already deactivated. Telegram bot webhook removed.
2026-05-10 11:12:37 +00:00
Viktor Barzin
bd8dbbc76f postiz: wire FACEBOOK_APP_ID/SECRET via ESO for IG-Business integration 2026-05-10 11:12:37 +00:00
Viktor Barzin
02e28294e9 postiz: idempotent Job to drop default Text search attributes (Temporal SQL visibility caps at 3 Text attrs; auto-setup ships with 2, Postiz adds 2 more — gitroomhq/postiz-app#1504) 2026-05-10 11:12:37 +00:00
Viktor Barzin
16e408ee59 postiz: bump memory limit to 4Gi (was OOMing during NestJS startup) 2026-05-10 11:12:37 +00:00
Viktor Barzin
888df84fb5 postiz: add Temporal sidecar; lock both stacks behind Authentik
Postiz backend was crashlooping on connect ECONNREFUSED ::1:7233 —
Postiz needs Temporal for cron/scheduled posts and the Helm chart
doesn't bundle it. Added a single-replica temporalio/auto-setup:1.28.1
Deployment in the postiz namespace, backed by the bundled
postiz-postgresql (separate `temporal` + `temporal_visibility`
databases pre-created via init container), ENABLE_ES=false (Postiz
only uses the workflow engine, not visibility search). Skips
DYNAMIC_CONFIG_FILE_PATH because that file isn't bundled in
auto-setup.

Auth audit:
- postiz: ingress now `protected = true` (Authentik forward-auth).
  Postiz also has its own login on top, but registration is no
  longer exposed to the open internet.
- instagram-poster: split into two ingresses on the same host.
  `/image/*` stays public (Meta + Telegram fetch the 9:16
  derivatives). Everything else (/healthz, /queue, /scan,
  /enqueue, /reject, /post-next) sits behind Authentik. The
  protected ingress sets dns_type=none — the public one already
  created the CF DNS record.
2026-05-10 11:12:37 +00:00
Viktor Barzin
c6939c3d53 postiz + n8n: real DB URL + webhook-trigger approval
- postiz: set DATABASE_URL/REDIS_URL pointing at the bundled subcharts;
  the chart does NOT auto-wire even when postgresql.enabled=true, so
  the prisma db:push was failing with empty DATABASE_URL.
- n8n approval workflow: swap telegramTrigger -> webhook node so it
  works without an n8n-stored Telegram credential. Telegram bot's
  webhook is set via setWebhook to https://n8n.viktorbarzin.me/webhook/instagram-approval.
  Parse-callback Code node tolerates both shapes ({body:{callback_query:...}}
  vs {callback_query:...}) so a future move back to telegramTrigger doesn't break.
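
The two-shape tolerance can be sketched like this. The real node is an n8n Code node; this Python version only illustrates the lookup order, and `extract_callback_query` is a hypothetical name.

```python
from typing import Any, Optional


def extract_callback_query(item: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return callback_query from either payload shape.

    Webhook node:    {"body": {"callback_query": {...}}}
    telegramTrigger: {"callback_query": {...}}
    """
    body = item.get("body")
    if isinstance(body, dict) and "callback_query" in body:
        return body["callback_query"]
    return item.get("callback_query")
```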
2026-05-10 11:12:37 +00:00
Viktor Barzin
5057341d09 postiz + instagram-poster: deploy fixes after first apply
- postiz: pin chart name to 'postiz-app' (was 'postiz', wrong path)
  and override bundled bitnami subchart images to bitnamilegacy/* —
  Bitnami removed bitnami/postgresql + bitnami/redis from DockerHub
  in Aug 2025 (Broadcom acquisition).
- postiz: enable initial registration (DISABLE_REGISTRATION=false)
  so first admin user can be created in UI; tighten after.
- instagram-poster: add securityContext (fsGroup/runAsUser=10001)
  so kubelet chowns the PVC mount for the non-root 'poster' user;
  was crashing on alembic with 'unable to open database file'.
- instagram-poster: bump image_tag to 24935ab4 (uvicorn now binds
  to port 8000 to match Service contract; was 8080 -> probe 404).
2026-05-10 11:12:37 +00:00
Viktor Barzin
2d1dfa49f6 instagram-poster: pin image tag to 23f8b4ed (initial push) 2026-05-10 11:12:37 +00:00
Viktor Barzin
73eb01f994 add postiz + instagram-poster stacks for IG Stories pipeline
New stacks:
- stacks/postiz/ — Postiz scheduler (Helm chart v1.0.5, image v2.21.7)
  with bundled PG/Redis, /uploads PVC on proxmox-lvm, JWT_SECRET
  via ESO from secret/instagram-poster.
- stacks/instagram-poster/ — custom Python service that polls Immich
  for the 'instagram' tag, reformats photos to 9:16 with blurred-bg
  letterbox, exposes /image/<asset_id> publicly so Postiz can fetch.
  Image: forgejo.viktorbarzin.me/viktor/instagram-poster.

n8n: 3 new workflows (discover, approval, post) for the Telegram
inline-button approval UX. Adds ExternalSecret + env vars for
TELEGRAM_BOT_TOKEN, TELEGRAM_CHAT_ID, IMMICH_API_KEY, plus static
URLs for the new service.

Vault: seed secret/instagram-poster with telegram_bot_token,
telegram_chat_id, immich_api_key, postiz_api_token,
postiz_jwt_secret before applying.
2026-05-10 11:12:37 +00:00
Viktor Barzin
badc341669 openclaw: regenerate kubeconfig at pod start using projected SA tokenFile
The previously-baked kubeconfig at /home/node/.openclaw/kubeconfig retained
a service-account token bound to the original (long-dead) pod, so kubectl
calls from inside the openclaw container failed with "the server has asked
for the client to provide credentials" even though the openclaw SA has
cluster-admin and kubelet projects a fresh token at
/var/run/secrets/kubernetes.io/serviceaccount/token.

Add init-container "setup-kubeconfig" that writes a kubeconfig with
tokenFile + certificate-authority paths pointing at the projected
SA volume — kubelet auto-rotates the token, kubectl always reads
fresh creds, no Vault K8s-creds-engine refresh needed.

Verified end-to-end: agent ran `kubectl get nodes -o wide` inside the
pod and delivered a correct one-line summary to Telegram via
openai-codex/gpt-5.4-mini.
2026-05-10 11:12:37 +00:00
Viktor Barzin
8b0b4e5148 [ci] build-cli: drop registry.viktorbarzin.me:5050 push (decommissioned)
The build-cli pipeline was still pushing to the
registry.viktorbarzin.me:5050/infra path that no longer exists
post Phase 4 — failing with 'error authenticating: exit status 1'
on every infra push. Drop the second repo + login; DockerHub +
Forgejo are the canonical destinations now.
2026-05-10 11:12:37 +00:00
Viktor Barzin
a39893bb60 [woodpecker] Re-fix null_resource trigger after lint reverted it
The helm provider in this Terraform version doesn't support
list-index access on helm_release.metadata[0]. Switch the
woodpecker_server_host_alias trigger to {helm_version, sha256(values)}
which works regardless of provider quirks. (Original fix landed
2026-05-07; got reverted by a linter pass.)
2026-05-10 11:12:36 +00:00
Viktor Barzin
564c64f4c7 f1-stream: register HmembedsExtractor in registry
Companion commit to 92474254 — the new extractor wasn't being
registered, only the file was added. Add the import + register call
in create_registry().

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 11:12:36 +00:00
Viktor Barzin
18604d808e f1-stream: hmembeds offline decoder — reverse-engineered the JW Player trap
Four-agent parallel investigation finally pinned down what's happening
with the hmembeds.one streams. The TL;DR is unexpected: there is no
fingerprint check, no decoder failure, no broken JS — the obfuscated
decoder is trivial to reproduce, but the upstream origin is dead.

Findings (saved at /tmp/jwre/{findings.md, blob-analysis.md,
fingerprint-gap.md, trace-summary.md}):

1. **The "ZpQw9XkLmN8c3vR3" blob is a decoy.** It's an Adcash
   adblock-bypass config, not the stream URL. The actual stream URL is
   in a different inline `<script>` block of the embed HTML.

2. **The real decoder is base64 + XOR with a hardcoded key.** The key
   appears literally in the HTML (e.g. `var k="bux7ver6mow4trh1"`).
   No browser-derived inputs. We can run it in Python in 50µs.

3. **The decoded URL is JWT-bound to /24 of the requestor's IP**. JWT
   payload: `{stream, ip:"176.12.22.0/24", session_id, exp}`. From our
   cluster (egress 176.12.22.76) the JWT IP-binding is satisfied.

4. **The origin still returns 404 (GET) / 403 (HEAD).** Tested both
   curated embeds (Sky F1 888520f3..., DAZN F1 fc3a5463...) — same
   404. Origin landing page (`/`) returns 200, so the host is up;
   the `/sec/<JWT>/<embed_id>.m3u8` endpoint specifically refuses.

5. **No fingerprint surface trips this.** Runtime trace via
   chrome-service hooks confirmed: decoder reads navigator.userAgent
   (heavy), screen dimensions, and a single WebGL getParameter call.
   No canvas, audio, fonts, fetch-to-fingerprint-API. JW Player setup
   is given a valid file URL — the playlist stays empty because JW
   can't fetch the manifest from the (dead) origin.
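
Finding 3 can be re-checked offline. The following is a minimal sketch, assuming the `ip` claim carries a CIDR string as in the payload quoted above. `_demo_token` is a hypothetical helper that builds an unsigned stand-in token for illustration; it is not how the origin issues tokens, and the payload is read without verifying any signature.

```python
import base64
import ipaddress
import json


def jwt_payload(token: str) -> dict:
    """Decode the payload segment of a compact JWT without verifying it."""
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url without padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def ip_binding_satisfied(token: str, egress_ip: str) -> bool:
    """True if egress_ip falls inside the network named by the `ip` claim."""
    network = ipaddress.ip_network(jwt_payload(token)["ip"], strict=False)
    return ipaddress.ip_address(egress_ip) in network


def _demo_token(payload: dict) -> str:
    """Hypothetical helper: build an unsigned token for illustration only."""
    def enc(obj: dict) -> str:
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()
    return f"{enc({'alg': 'none'})}.{enc(payload)}."


token = _demo_token(
    {"stream": "demo", "ip": "176.12.22.0/24", "session_id": "x", "exp": 0}
)
print(ip_binding_satisfied(token, "176.12.22.76"))
```

With the cluster egress address from the finding, the check passes, which matches the conclusion that the 404 is not caused by the IP binding.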

Verdict: **the legacy curated hmembeds embeds (`888520f3...` Sky F1,
`fc3a5463...` DAZN F1) are upstream-dead.** No browser-side fix is
possible. The community uses these IDs as "24/7 channels" but they're
in a perpetually-offline state right now.

This commit ships the offline decoder anyway, registered as a new
extractor. Two reasons:
- If those origins come back online, no code change needed.
- Future curated hmembeds IDs (added by hand or discovered via
  subreddit posts) will resolve through the same path.

Files added: `extractors/hmembeds.py` (~120 lines incl. the decoder
and a `decode_embed(html) -> str | None` helper that's reusable).
Registered in `__init__.py`. The existing CuratedExtractor stays
disabled; this replaces its mechanism with one that can absorb new
embed IDs without code changes.
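
For orientation, a helper with the `decode_embed(html) -> str | None` shape could look roughly like the sketch below. It is not the code in `extractors/hmembeds.py`. It assumes the payload is base64-decoded first and then XORed with the repeating key, and the `var d="..."` payload variable name is hypothetical (only `var k="..."` is quoted above).

```python
from __future__ import annotations

import base64
import re
from itertools import cycle

# The key pattern follows the `var k="..."` form quoted in the findings.
KEY_RE = re.compile(r'var\s+k\s*=\s*"([^"]+)"')
# Hypothetical: the real embed may store the payload under another name.
BLOB_RE = re.compile(r'var\s+d\s*=\s*"([A-Za-z0-9+/=]+)"')


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating key."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def decode_embed(html: str) -> str | None:
    """Return the decoded stream URL, or None if the page does not match."""
    key = KEY_RE.search(html)
    blob = BLOB_RE.search(html)
    if not key or not blob:
        return None
    try:
        raw = base64.b64decode(blob.group(1), validate=True)
        return xor_bytes(raw, key.group(1).encode()).decode("utf-8")
    except ValueError:
        # Covers malformed base64 and non-UTF-8 output alike.
        return None


# Round trip against a synthetic page (the URL is a placeholder).
demo_key = "bux7ver6mow4trh1"
demo_url = "https://example.invalid/sec/TOKEN/embed.m3u8"
demo_blob = base64.b64encode(
    xor_bytes(demo_url.encode(), demo_key.encode())
).decode()
demo_html = f'<script>var k="{demo_key}";var d="{demo_blob}";</script>'
print(decode_embed(demo_html) == demo_url)
```

Returning `None` on any mismatch keeps the extractor quiet when an embed page changes shape, rather than raising inside the registry.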

Bonus from the agent work:
- Confirmed our stealth.js is sufficient — the runtime trace showed
  the decoder reads only the surfaces we already cover.
- Identified ~10 fingerprint surfaces we don't spoof (platform,
  userAgentData, hardwareConcurrency, deviceMemory, timezone,
  AudioContext, ICE candidates) but proved they're not what's
  blocking us, so no change needed for now.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 11:12:36 +00:00
Viktor Barzin
ffa1d6d5dc [woodpecker] Programmatic Forgejo repo registration
Earlier I claimed the OAuth Web UI flow was the only way to onboard
new Forgejo repos in Woodpecker. That's wrong.

Two parts to the actual workaround:
1. Woodpecker session JWTs are HS256 signed with the user's per-user
   `hash` column from the PG `users` table (NOT the global agent
   secret). Mint a session JWT for the Forgejo viktor user (id=2,
   forge_id=2), and you're authenticated as that user.
2. POST /api/repos?forge_remote_id=N as viktor → Woodpecker calls
   Forgejo with viktor's stored OAuth access_token to create the
   webhook + per-repo signing key. Works.
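
The token-minting half of step 1 can be sketched with the standard library alone. This is an illustration of HS256 signing, not the contents of `scripts/woodpecker-register-forgejo-repo.sh`. The claim names below are hypothetical and should be checked against the Woodpecker source, and the secret is a placeholder for the per-user `hash` value.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """base64url without padding, as used by compact JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def mint_hs256(claims: dict, secret: str) -> str:
    """Sign header.payload with HMAC-SHA256 and return a compact JWT."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, claims)
    )
    sig = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256)
    return f"{signing_input}.{b64url(sig.digest())}"


def verify_hs256(token: str, secret: str) -> bool:
    """Recompute the signature and compare in constant time."""
    signing_input, _, sig = token.rpartition(".")
    expected = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256)
    return hmac.compare_digest(b64url(expected.digest()), sig)


# Hypothetical claims and placeholder secret, for illustration only.
demo_secret = "per-user-hash-from-users-table"
demo_token = mint_hs256({"type": "sess", "user-id": "2"}, demo_secret)
print(verify_hs256(demo_token, demo_secret))
```

Because the signing key is per user, a token minted from one user's `hash` authenticates only as that user, which is why the request has to be made as viktor rather than as the GitHub admin.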

The 500 I saw earlier was from POST'ing as ViktorBarzin (GitHub
admin), whose user row has no Forgejo OAuth token — Woodpecker's
forge-API call fails for that user, surfacing as a 500.

scripts/woodpecker-register-forgejo-repo.sh wraps the whole flow:
extract hash from PG → mint JWT → activate repo. Verified against
viktor/{broker-sync,claude-agent-service,freedify,hmrc-sync} in
this session — all activated cleanly.

Also updated the runbook with the actual mechanism + the
WOODPECKER_FORGE_TIMEOUT=30s tip (the real root cause of the
'context deadline exceeded' failures, NOT the v3.14 upgrade).
2026-05-10 11:12:36 +00:00
116 changed files with 5982 additions and 1814 deletions

@ -29,6 +29,7 @@ Violations cause state drift, which causes future applies to break or silently r
- **New services need CI/CD** and **monitoring** (Prometheus/Uptime Kuma)
- **New service**: Use `setup-project` skill for full workflow
- **Ingress**: `ingress_factory` module. Auth: `protected = true`. Anti-AI: on by default. **DNS**: `dns_type = "proxied"` (Cloudflare CDN) or `"non-proxied"` (direct A/AAAA). DNS records are auto-created — no need to edit `config.tfvars`.
- **Anubis PoW challenge** (`modules/kubernetes/anubis_instance/`): per-site reverse proxy that issues a 30-day JWT cookie after a tiny PoW solve. Use for **public, content-bearing sites without app-level auth** (blog, docs, wikis, static landing pages). Pattern: declare `module "anubis" { source = "../../modules/kubernetes/anubis_instance"; name = "X"; namespace = ...; target_url = "http://<backend>.<ns>.svc.cluster.local" }`, then in `ingress_factory` set `service_name = module.anubis.service_name`, `port = module.anubis.service_port`, `anti_ai_scraping = false`. Shared ed25519 key in Vault `secret/viktor` -> `anubis_ed25519_key`; cookie scoped to `viktorbarzin.me` so one solve covers all Anubis-fronted subdomains. **DO NOT put Anubis in front of Git/API/WebDAV/CLI endpoints** — clients without JS can't solve PoW. **Replicas default to 1** because Anubis stores in-flight challenges in process memory; a challenge issued by pod A and solved against pod B errors with `store: key not found` (HTTP 500). Bumping replicas requires wiring a shared Redis store (TODO). For path-level carve-outs (e.g. wrongmove has `/` behind Anubis but `/api` direct), declare a second `ingress_factory` with `ingress_path = ["/api"]` pointing at the bare backend service. Active on: blog, www, kms, travel, f1, cc, json, pb (privatebin), home (homepage), wrongmove (UI only). See `.claude/reference/patterns.md` "Anti-AI Scraping" for full layering.
- **Docker images**: Always build for `linux/amd64`. Use 8-char git SHA tags — `:latest` causes stale pull-through cache.
- **Private registry**: `forgejo.viktorbarzin.me/viktor/<name>` (Forgejo packages, OAuth-style PAT auth). Use `image: forgejo.viktorbarzin.me/viktor/<name>:<tag>` + `imagePullSecrets: [{name: registry-credentials}]`. Kyverno auto-syncs the Secret to all namespaces. Containerd `hosts.toml` on every node redirects to in-cluster Traefik LB `10.0.20.200` to avoid hairpin NAT. Push-side: viktor PAT in Vault `secret/ci/global/forgejo_push_token` (Forgejo container packages are scoped per-user; only the package owner can push, ci-pusher cannot write to viktor/*). Pull-side: cluster-puller PAT in Vault `secret/viktor/forgejo_pull_token`. Retention CronJob (`forgejo-cleanup` in `forgejo` ns, daily 04:00) keeps newest 10 versions + always `:latest`; integrity probed every 15min by `forgejo-integrity-probe` in `monitoring` ns (catalog walk + manifest HEAD on every blob). See `docs/plans/2026-05-07-forgejo-registry-consolidation-{design,plan}.md` for the migration history. Pull-through caches for upstream registries (DockerHub, GHCR, Quay, k8s.gcr, Kyverno) stay on the registry VM at `10.0.20.10` ports 5000/5010/5020/5030/5040 — the old port-5050 R/W private registry was decommissioned 2026-05-07.
- **LinuxServer.io containers**: `DOCKER_MODS` runs apt-get on every start — bake slow mods into a custom image (`RUN /docker-mods || true` then `ENV DOCKER_MODS=`). Set `NO_CHOWN=true` to skip recursive chown that hangs on NFS mounts.
@ -188,11 +189,20 @@ resource "kubernetes_persistent_volume_claim" "data_proxmox" {
requests = { storage = "1Gi" }
}
}
lifecycle {
# pvc-autoresizer expands this PVC up to storage_limit; ignore drift on
# requests.storage so the next TF apply doesn't try to shrink it back
# (K8s rejects shrinks → apply fails). To bump the floor manually:
# temporarily remove this block, apply the new size, re-add the block,
# apply again.
ignore_changes = [spec[0].resources[0].requests]
}
}
```
- `wait_until_bound = false` is **required** (WaitForFirstConsumer binding)
- Deployment strategy **must be Recreate** (RWO volumes)
- Autoresizer annotations are **required** on all proxmox-lvm PVCs
- `lifecycle.ignore_changes` on `requests` is **required** to coexist with the autoresizer
- Every proxmox-lvm app **MUST** add a backup CronJob writing to NFS `/mnt/main/<app>-backup/`
**proxmox-lvm-encrypted PVC template** (Terraform) — use for all sensitive data:
@ -215,9 +225,13 @@ resource "kubernetes_persistent_volume_claim" "data_encrypted" {
requests = { storage = "1Gi" }
}
}
lifecycle {
# See data_proxmox above — required for autoresizer coexistence.
ignore_changes = [spec[0].resources[0].requests]
}
}
```
- Same rules as `proxmox-lvm` (wait_until_bound, Recreate strategy, autoresizer, backup CronJob)
- Same rules as `proxmox-lvm` (wait_until_bound, Recreate strategy, autoresizer, backup CronJob, `lifecycle.ignore_changes`)
- Uses LUKS2 encryption with Argon2id key derivation via Proxmox CSI plugin
- Encryption passphrase stored in Vault KV (`secret/viktor/proxmox_csi_encryption_passphrase`), synced to K8s Secret `proxmox-csi-encryption` in `kube-system` via ExternalSecret
- Backup key at `/root/.luks-backup-key` on PVE host (chmod 600)

@ -26,12 +26,16 @@ module "nfs_data" {
## ~~iSCSI Storage~~ (REMOVED — replaced by proxmox-lvm)
> iSCSI via democratic-csi and TrueNAS has been fully removed (2026-04). All database storage now uses `StorageClass: proxmox-lvm` (Proxmox CSI, LVM-thin hotplug). TrueNAS has been decommissioned.
## Anti-AI Scraping (3 Active Layers) (Updated 2026-04-17)
## Anti-AI Scraping (4 Active Layers) (Updated 2026-05-10)
Default `anti_ai_scraping = true` in ingress_factory. Disable per-service: `anti_ai_scraping = false`.
1. Bot blocking (ForwardAuth → poison-fountain) 2. X-Robots-Tag noai 3. Tarpit/poison content (standalone at poison.viktorbarzin.me)
Trap links (formerly layer 3) removed April 2026 — rewrite-body plugin broken on Traefik v3.6.12 (Yaegi bugs). `strip-accept-encoding` and `anti-ai-trap-links` middlewares deleted.
1. **Anubis PoW challenge** (per-site reverse proxy) — `modules/kubernetes/anubis_instance/`. Latest: `ghcr.io/techarohq/anubis:v1.25.0`. Difficulty 2 (~250 ms desktop / ~700 ms mobile), 30-day JWT cookie scoped to `viktorbarzin.me` so a single solve covers every Anubis-fronted subdomain. Active on: `viktorbarzin.me`, `kms.viktorbarzin.me`, `travel.viktorbarzin.me`. Add to a stack: `module "anubis" { source = "../../modules/kubernetes/anubis_instance"; name = "X"; namespace = ...; target_url = "http://<svc>.<ns>.svc.cluster.local" }`, then point ingress_factory at `module.anubis.service_name` + `port = module.anubis.service_port` and set `anti_ai_scraping = false`. Shared ed25519 signing key in Vault `secret/viktor` -> `anubis_ed25519_key`. **Avoid putting Anubis in front of CLI/API/Git endpoints (Forgejo, APIs, WebDAV)** — clients without JS can't solve PoW.
2. **Bot blocking forwardAuth** (ForwardAuth → bot-block-proxy → poison-fountain) — global default for non-Anubis sites. `bot-block-proxy` (OpenResty in `traefik` ns) is fail-open with 100 ms connect / 200 ms read timeouts so a downed poison-fountain costs ≤200 ms per request. Source: `stacks/traefik/modules/traefik/main.tf`.
3. **X-Robots-Tag noai** — set by `traefik-anti-ai-headers` middleware. Anubis additionally serves a comprehensive `/robots.txt` (`SERVE_ROBOTS_TXT=true`) to well-behaved bots.
4. **Tarpit/poison content** (standalone at poison.viktorbarzin.me, `stacks/poison-fountain/`). Currently scaled to `replicas = 0` — fail-open path means no live traffic, no penalty.
Trap links (formerly a layer) removed April 2026 — rewrite-body plugin broken on Traefik v3.6.12 (Yaegi bugs). `strip-accept-encoding` and `anti-ai-trap-links` middlewares deleted.
Rybbit analytics injection now via Cloudflare Worker (`stacks/rybbit/worker/`, HTMLRewriter, wildcard route `*.viktorbarzin.me/*`, 28 site ID mappings).
Key files: `stacks/poison-fountain/`, `stacks/rybbit/worker/`, `stacks/platform/modules/traefik/middleware.tf`
Key files: `modules/kubernetes/anubis_instance/`, `stacks/poison-fountain/`, `stacks/rybbit/worker/`, `stacks/traefik/modules/traefik/main.tf`
## Terragrunt Architecture
- Root `terragrunt.hcl`: DRY providers, backend, variable loading, `generate "tiers"` block

@ -15,22 +15,23 @@ steps:
username: "viktorbarzin"
password:
from_secret: dockerhub-pat
# Phase 4 of forgejo-registry-consolidation 2026-05-07 —
# registry.viktorbarzin.me:5050 decommissioned. Push to DockerHub
# (the public-facing infra image) AND Forgejo (the cluster pull
# source). Same image, two locations.
repo:
- viktorbarzin/infra
- registry.viktorbarzin.me:5050/infra
- forgejo.viktorbarzin.me/viktor/infra
logins:
- registry: https://index.docker.io/v1/
username: viktorbarzin
password:
from_secret: dockerhub-pat
# Private registry on :5050 requires htpasswd auth since 2026-03-22.
# Without this, buildx pushes the second repo but blob HEAD comes
# back 401 → pipeline fails → CI false-negative (see bd code-12b).
- registry: registry.viktorbarzin.me:5050
- registry: forgejo.viktorbarzin.me
username:
from_secret: registry_user
from_secret: forgejo_user
password:
from_secret: registry_password
from_secret: forgejo_push_token
dockerfile: cli/Dockerfile
context: cli
auto_tag: true

@ -73,6 +73,38 @@ steps:
# the env var is unset.
umask 077; printf '%s' "$VAULT_TOKEN" > "$HOME/.vault-token"
# ── Generate kubeconfig from projected SA token ──
# terragrunt.hcl injects `-var kube_config_path=<repo>/config` for every
# terraform invocation, so we need a kubeconfig file at that path. The
# `default` SA in the woodpecker namespace is cluster-admin (via the
# `woodpecker-default` ClusterRoleBinding), so the projected token is
# sufficient to apply any stack. Using `tokenFile` (not an inline token)
# so the provider re-reads it if kubelet rotates the projected token
# mid-pipeline.
- |
cat > config <<'EOF'
apiVersion: v1
kind: Config
clusters:
- name: kubernetes
cluster:
server: https://10.0.20.100:6443
certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
contexts:
- name: ci
context:
cluster: kubernetes
user: ci
current-context: ci
users:
- name: ci
user:
tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
EOF
chmod 600 config
# Sanity check: kubeconfig works
kubectl --kubeconfig=config get ns kube-system -o name >/dev/null
# ── Detect changed stacks ──
- |
PLATFORM_STACKS="dbaas authentik crowdsec monitoring nvidia mailserver cloudflared kyverno metallb redis traefik technitium headscale rbac k8s-portal vaultwarden reverse-proxy metrics-server vpa nfs-csi iscsi-csi cnpg sealed-secrets uptime-kuma wireguard xray infra-maintenance platform vault reloader descheduler external-secrets"

@ -41,6 +41,34 @@ steps:
export VAULT_TOKEN=$(curl -s -X POST "$VAULT_ADDR/v1/auth/kubernetes/login" \
-d "{\"role\":\"ci\",\"jwt\":\"$SA_TOKEN\"}" | jq -r .auth.client_token)
# ── Generate kubeconfig from projected SA token ──
# See default.yml for rationale. terragrunt.hcl injects
# `-var kube_config_path=<repo>/config` for every terraform invocation,
# so we need a kubeconfig file at that path. The woodpecker default SA
# is cluster-admin, so the projected token is sufficient.
- |
cat > config <<'EOF'
apiVersion: v1
kind: Config
clusters:
- name: kubernetes
cluster:
server: https://10.0.20.100:6443
certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
contexts:
- name: ci
context:
cluster: kubernetes
user: ci
current-context: ci
users:
- name: ci
user:
tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
EOF
chmod 600 config
kubectl --kubeconfig=config get ns kube-system -o name >/dev/null
# ── Run terraform plan on all stacks ──
# Emits two timestamps per drifted stack so the Pushgateway/Prometheus
# side can compute drift-age-hours via `time() - drift_stack_first_seen`.

@ -267,7 +267,7 @@ Native LVM thin snapshots provide crash-consistent point-in-time recovery for 62
**Snapshot Pruning**: Deletes LVM snapshots older than 7 days (safety net for snapshots that outlive `lvm-pvc-snapshot` timer).
**Monitoring**: Pushes `backup_weekly_last_success_timestamp` to Pushgateway. Alerts: `WeeklyBackupStale` (>8d), `WeeklyBackupFailing`.
**Monitoring**: Pushes `daily_backup_last_run_timestamp`, `daily_backup_last_status`, and `daily_backup_bytes_synced` to Pushgateway (job `daily-backup`). Alerts: `WeeklyBackupStale` (>9d on `daily_backup_last_run_timestamp`), `WeeklyBackupFailing` (`daily_backup_last_status != 0`). The metric is pushed both on clean exit AND from a `trap TERM INT` handler — a 2026-04-30 → 2026-05-09 silent-failure incident traced to systemd SIGTERMing the script before it reached its final push, leaving the alert blind.
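
The push-on-signal pattern described above can be sketched as follows. The real script is a shell script using `trap TERM INT`; this is a Python rendering of the same idea, not a port of it. The Pushgateway address is hypothetical, and status `1` is assumed to mean failure because the alert fires on `daily_backup_last_status != 0`.

```python
from __future__ import annotations

import signal
import sys
import time
import urllib.request

PUSHGATEWAY = "http://pushgateway.example:9091"  # hypothetical address
JOB = "daily-backup"


def render_metrics(status: int, bytes_synced: int | None, now: float) -> str:
    """Render the metrics in Prometheus text exposition format."""
    lines = [
        f"daily_backup_last_run_timestamp {int(now)}",
        f"daily_backup_last_status {status}",
    ]
    if bytes_synced is not None:
        # Omitted on the failure path so a partial run does not report 0.
        lines.append(f"daily_backup_bytes_synced {bytes_synced}")
    return "\n".join(lines) + "\n"


def push(status: int, bytes_synced: int | None = None) -> None:
    """POST the metrics to the Pushgateway job endpoint."""
    body = render_metrics(status, bytes_synced, time.time()).encode()
    req = urllib.request.Request(
        f"{PUSHGATEWAY}/metrics/job/{JOB}", data=body, method="POST"
    )
    urllib.request.urlopen(req, timeout=5)


def on_signal(signum, _frame):
    """Push a failure status before exiting, so the alert is never blind."""
    try:
        push(status=1)
    finally:
        sys.exit(128 + signum)


def main() -> None:
    signal.signal(signal.SIGTERM, on_signal)
    signal.signal(signal.SIGINT, on_signal)
    # ... backup work would run here ...
    push(status=0, bytes_synced=0)
```

`main()` is deliberately not called here; the point is that the handler is installed before any work starts, so a SIGTERM from systemd still produces a final push.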
### Layer 2b: Application-Level Backups
@ -686,9 +686,11 @@ module "nfs_backup" {
**Metrics sources**:
- Backup CronJobs: Push `backup_last_success_timestamp` to Pushgateway on completion
- LVM snapshot script: Pushes `lvm_snapshot_last_success_timestamp`, `lvm_snapshot_count`, `lvm_thin_pool_free_percent`
- Daily backup script: Pushes `backup_weekly_last_success_timestamp`, `backup_disk_usage_percent`
- Offsite sync script: Pushes `offsite_backup_sync_last_success_timestamp`
- LVM snapshot script: Pushes `lvm_snapshot_last_run_timestamp`, `lvm_snapshot_last_status`, `lvm_snapshot_created_total`, `lvm_snapshot_failed_total`, `lvm_snapshot_pruned_total`, `lvm_snapshot_thinpool_free_pct` (job `lvm-pvc-snapshot`)
- Daily backup script: Pushes `daily_backup_last_run_timestamp`, `daily_backup_last_status`, `daily_backup_bytes_synced` (job `daily-backup`). Disk-fullness alert (`BackupDiskFull`) does NOT use a script-pushed metric; it derives from node-exporter `node_filesystem_avail_bytes{job="proxmox-host", mountpoint="/mnt/backup"}`.
- pfSense backup (step 3 of `daily-backup`): Pushes `backup_last_run_timestamp`, `backup_last_status`, and `backup_last_success_timestamp` (only on success) under job `pfsense-backup`. Pushed in BOTH success and failure paths so `PfsenseBackupStale` doesn't go silent when SSH-to-pfsense breaks.
- Offsite sync script: Pushes `backup_last_success_timestamp`, `offsite_sync_last_status` (job `offsite-backup-sync`)
- Prometheus backup (sidecar in prometheus-server pod, monthly 1st-Sunday 04:00 UTC): Pushes `prometheus_backup_last_success_timestamp` (job `prometheus-backup`)
- ~~CloudSync monitor~~: Removed (TrueNAS decommissioned)
- Vaultwarden integrity: Pushes `vaultwarden_sqlite_integrity_ok` hourly
@ -728,6 +730,8 @@ the 2026-04-22 backup_offsite_sync FAIL (node3 kubelet hiccup at
| NovelApp | ✓ | ✓ | — | ✓ | proxmox-lvm |
| Headscale | ✓ | ✓ | — | ✓ | proxmox-lvm |
| Uptime Kuma | ✓ | ✓ | — | ✓ | proxmox-lvm |
| **Other apps not enumerated above** | ✓¹ | ✓¹ | varies | ✓ | proxmox-lvm / proxmox-lvm-encrypted |
| **Postiz** (bundled bitnami PG on local-path) | — | — | ✓ daily pg_dump → NFS | ✓ | local-path + NFS |
| **Media (NFS)** |
| Immich (~800GB) | — | — | — | ✓ | NFS |
| Audiobookshelf | — | — | — | ✓ | NFS |
@ -739,7 +743,13 @@ the 2026-04-22 backup_offsite_sync FAIL (node3 kubelet hiccup at
- — = Not needed (other layers cover it, or data is regenerable/disposable)
- excluded = Too large/regenerable, not worth offsite bandwidth
**Note**: All 65 proxmox-lvm PVCs get LVM snapshots (except dbaas+monitoring = 3 PVCs) + file-level backup (except dbaas+monitoring). NFS-backed media syncs directly to Synology `nfs/` and `nfs-ssd/` via inotify change tracking.
**Note**: All proxmox-lvm and proxmox-lvm-encrypted PVCs get LVM snapshots (except `dbaas` and `monitoring` namespaces, excluded for write-amplification reasons) + file-level backup. NFS-backed media syncs directly to Synology `nfs/` and `nfs-ssd/` via inotify change tracking.
¹ **"Other apps not enumerated above"** — the table only enumerates services worth calling out. The default backup posture for any service using `proxmox-lvm` or `proxmox-lvm-encrypted` (outside `dbaas`/`monitoring`) is **automatic** Layer 1 (LVM thin snapshots, 7d retention) + Layer 2 (file backup, 4 weekly versions on sda) + Layer 3 (offsite to Synology). Auto-discovery is by LV name pattern (`vm-*-pvc-*`), so adding a new service to the cluster gets it covered without any explicit registration. Run `ssh root@192.168.1.127 lvs --noheadings -o lv_name pve | grep '^vm-.*-pvc-' | grep -v _snap_ | wc -l` to see the live count.
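
The auto-discovery rule in the footnote amounts to a name filter. A minimal sketch, assuming discovery depends only on the LV name as stated; the sample names are invented, and the real scripts may apply further checks.

```python
import fnmatch


def pvc_volumes(lv_names):
    """Keep LVs matching vm-*-pvc-* and drop snapshot LVs."""
    return [
        name
        for name in (raw.strip() for raw in lv_names)
        if fnmatch.fnmatchcase(name, "vm-*-pvc-*") and "_snap_" not in name
    ]


# Invented sample of `lvs --noheadings -o lv_name` output lines.
sample = [
    "  vm-9999-pvc-1111",
    "  vm-9999-pvc-1111_snap_20260509",
    "  vm-101-disk-0",
    "  nfs-data",
]
print(pvc_volumes(sample))
```

This mirrors the `grep '^vm-.*-pvc-' | grep -v _snap_` pipeline quoted above: a new PVC-backed LV is covered as soon as its name fits the pattern.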
**Known gaps** — services with PVCs not on the proxmox-lvm path lose Layer 1+2:
- **Postiz** PG and Redis (bundled bitnami chart) live on `local-path` (K8s node OS disk). PG covered by the postiz-postgres-backup CronJob (daily pg_dump → `/srv/nfs/postiz-backup/`, Layer 3 via offsite sync). Redis is regenerable cache — not backed up.
- **Prometheus, Alertmanager, Pushgateway**: `monitoring` namespace excluded by policy; loss is acceptable (metrics regenerable, silences ephemeral, Pushgateway has on-disk persistence for 24h gap tolerance).
## Recovery Procedures

@ -261,7 +261,7 @@ MetalLB v0.15.3 allocates IPs from the range 10.0.20.200-10.0.20.220 in **Layer
| traefik | traefik | 10.0.20.200 (shared) | 80, 443, 443/UDP (HTTP/3), 10200, 10300, 11434/TCP |
| coturn | coturn | 10.0.20.200 (shared) | 3478/UDP (STUN/TURN), 49152-49252/UDP (relay) |
| headscale | headscale | 10.0.20.200 (shared) | 41641/UDP, 3479/UDP |
| windows-kms | kms | 10.0.20.200 (shared) | 1688/TCP |
| windows-kms¹ | kms | 10.0.20.200 (shared) | 1688/TCP |
| qbittorrent | servarr | 10.0.20.200 (shared) | 50000/TCP+UDP |
| shadowsocks | shadowsocks | 10.0.20.200 (shared) | 8388/TCP+UDP |
| torrserver-bt | tor-proxy | 10.0.20.200 (shared) | 5665/TCP |
@ -272,6 +272,8 @@ MetalLB v0.15.3 allocates IPs from the range 10.0.20.200-10.0.20.220 in **Layer
pfSense aliases reference these IPs: `k8s_shared_lb` (10.0.20.200), `technitium_dns` (10.0.20.201). NAT rules use aliases for maintainability.
¹ **windows-kms is publicly WAN-exposed.** pfSense forwards WAN TCP/1688 → `k8s_shared_lb:1688` so any internet host can activate. The matching filter rule applies a per-source rate limit (`max-src-conn 50`, `max-src-conn-rate 10/60`) with `overload <virusprot>` flush — offenders are auto-added to pfSense's stock `virusprot` pf table for follow-on blocks. Operations (rate-limit tuning, log locations, revocation) are documented in `docs/runbooks/kms-public-exposure.md`.
Critical services are scaled to **3 replicas**:
- Traefik (PDB: minAvailable=2)
- Authentik (PDB: minAvailable=2)

@ -1,6 +1,6 @@
# Storage Architecture
Last updated: 2026-04-15
Last updated: 2026-05-09
## Overview
@ -13,7 +13,7 @@ The cluster uses two storage backends: **Proxmox CSI** for database block storag
All services storing sensitive data were migrated to `proxmox-lvm-encrypted` on 2026-04-15. This eliminates the previous double-CoW (ZFS + LVM-thin) path and ensures data-at-rest encryption.
**NFS storage (Proxmox host)**: ~100 NFS shares for media libraries (Immich, audiobookshelf, servarr, navidrome), backup targets (`*-backup/` directories), and app data are served directly from the Proxmox host at `192.168.1.127`. Two NFS export roots exist:
- **HDD NFS**: `/srv/nfs` on ext4 LV `pve/nfs-data` (2TB) — bulk media and backup targets
- **HDD NFS**: `/srv/nfs` on ext4 LV `pve/nfs-data` (3TB) — bulk media and backup targets
- **SSD NFS**: `/srv/nfs-ssd` on ext4 LV `ssd/nfs-ssd-data` (100GB) — high-performance data (Immich ML)
Both `StorageClass: nfs-truenas` and `StorageClass: nfs-proxmox` point to the Proxmox host and are functionally identical. The `nfs-truenas` name is historical — it was retained because StorageClass names are immutable on bound PVs (48 PVs reference it) and renaming would force mass PV churn across the cluster.
@ -31,7 +31,7 @@ graph TB
subgraph Proxmox["Proxmox Host (192.168.1.127)"]
sdc["sdc: 10.7TB RAID1 HDD<br/>VG pve, LV data (thin pool)<br/>~67 proxmox-lvm PVCs<br/>~28 proxmox-lvm-encrypted PVCs"]
sda["sda: 1.1TB RAID1 SAS<br/>VG backup, LV data (ext4)<br/>/mnt/backup"]
NFS_HDD["LV pve/nfs-data (2TB ext4)<br/>/srv/nfs<br/>~100 NFS shares<br/>Media + backup targets"]
NFS_HDD["LV pve/nfs-data (3TB ext4)<br/>/srv/nfs<br/>~100 NFS shares<br/>Media + backup targets"]
NFS_SSD["LV ssd/nfs-ssd-data (100GB ext4)<br/>/srv/nfs-ssd<br/>High-performance data<br/>(Immich ML)"]
NFS_Exports["NFS Exports<br/>managed by /etc/exports"]
NFS_HDD --> NFS_Exports
@ -74,7 +74,7 @@ graph TB
| **Proxmox CSI plugin** | Helm chart | Namespace: proxmox-csi | Block storage via LVM-thin hotplug |
| **StorageClass `proxmox-lvm`** | RWO, WaitForFirstConsumer | Cluster-wide | Non-sensitive stateful apps |
| **StorageClass `proxmox-lvm-encrypted`** | RWO, WaitForFirstConsumer, LUKS2 | Cluster-wide | **All sensitive data** (databases, auth, email, passwords, git) |
| Proxmox NFS (HDD) | LV `pve/nfs-data`, 2TB ext4 | 192.168.1.127:/srv/nfs | Bulk NFS data for all services |
| Proxmox NFS (HDD) | LV `pve/nfs-data`, 3TB ext4 | 192.168.1.127:/srv/nfs | Bulk NFS data for all services |
| Proxmox NFS (SSD) | LV `ssd/nfs-ssd-data`, 100GB ext4 | 192.168.1.127:/srv/nfs-ssd | High-performance data (Immich ML) |
| nfs-csi | Helm chart | Namespace: nfs-csi | NFS CSI driver |
| StorageClass `nfs-proxmox` | RWX, soft mount | Cluster-wide | NFS storage, points to Proxmox host |

@ -0,0 +1,56 @@
# Post-Mortem: IO Pressure Stalls from Stale NFS Client to Decommissioned TrueNAS
| Field | Value |
|-------|-------|
| **Date** | 2026-05-09 (issue first observable in journal at 2026-05-08 00:00:04) |
| **Duration** | Intermittent IO PSI stalls and kubectl TLS handshake timeouts during the session; PVE host loadavg ~15 sustained. No user-visible outage. |
| **Severity** | SEV3 (degraded host I/O, no service down) |
| **Affected Components** | PVE host (192.168.1.127), `node_exporter` (PID 1479, D-state), kernel NFS kthread `[10.0.10.15-manager]`, k8s-node3 (downstream IO PSI). |
| **Status** | Resolved structurally. Stale connection source removed; recurring trigger eliminated. Wedged kthread persists in kernel queue — clears on next PVE reboot. |
## Summary
The PVE host's NFS client was retaining a wedged connection to `10.0.10.15` — the IP of the TrueNAS VM that was operationally decommissioned 2026-04-13 (storage migrated to `192.168.1.127:/srv/nfs`). The connection was created by `/usr/local/bin/weekly-backup`, a legacy script left over from before the NFS migration that had never been removed. Its kernel kthread `[10.0.10.15-manager]` parked itself in `rpc_wait_bit_killable` and stayed there. Any process that touched `/proc/mountstats` — including `node_exporter` — got dragged into D-state alongside it, which in turn fed back into IO pressure metrics. cluster-health surfaced this as `k8s-node3 full avg10=23%` and PVE loadavg sustained at ~15.
## Impact
- **User-facing**: None directly. Intermittent kubectl TLS handshake timeouts during the session, attributable to the elevated PVE loadavg.
- **Blast radius**: Single PVE host. node_exporter (PID 1479) wedged in D-state with the kthread. k8s-node3 downstream IO PSI peaked at `full avg10=23%`.
- **Data loss**: None.
- **Observability gap**: No alert fired for "stale NFS connection to decommissioned host". The IO PSI watchdog caught the symptom, not the cause.
## Root Cause
`/usr/local/bin/weekly-backup` was an artifact of the pre-2026-04-13 backup pipeline (when TrueNAS at `10.0.10.15` was the NFS server). After the TrueNAS decommission and migration to host NFS at `192.168.1.127`, the script was never deleted. It executed at least once recently (manually, or via a cron entry that has since been pruned), opening an NFS RPC session to `10.0.10.15`. With no peer answering, the kernel's RPC retry timer parked the manager kthread in `rpc_wait_bit_killable`. The kthread holds a lock that any reader of `/proc/mountstats` must take — `node_exporter` reads that file every scrape interval, so its scrape goroutine wedged in D-state too.
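
Processes wedged this way show state `D` in `/proc/<pid>/stat`. A small sketch of how that state could be read, assuming the standard Linux `pid (comm) state ...` layout; it is not part of cluster-health, and the sample lines are invented.

```python
from pathlib import Path


def proc_state(stat_line: str) -> str:
    """Return the state letter from a /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split on the last ')' rather than on spaces.
    """
    return stat_line.rsplit(")", 1)[1].split()[0]


def d_state_pids(proc_root: str = "/proc") -> list:
    """List PIDs currently in uninterruptible sleep (state D)."""
    pids = []
    for stat in Path(proc_root).glob("[0-9]*/stat"):
        try:
            line = stat.read_text()
        except OSError:
            continue  # the process exited between glob and read
        if proc_state(line) == "D":
            pids.append(int(stat.parent.name))
    return sorted(pids)


print(proc_state("1479 (node_exporter) D 1 1479 1479 0"))
```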
## Resolution
1. `lvextend -L +1T /dev/pve/nfs-data` + `resize2fs`: `/srv/nfs` 2 TiB → 3 TiB (90% → 60% used). Unrelated to the IO issue but bundled because `/srv/nfs` was at 90% and the user picked "grow LV" over "diet Immich". Thinpool (sdc) had ~4.6 TiB free.
2. `rm /usr/local/bin/weekly-backup` — eliminates the trigger. Backup pipeline is now `daily-backup.service` + `offsite-sync-backup.service` + per-app CronJobs (mysql/postgres/vault/etc.); `weekly-backup` was fully redundant.
3. `systemctl restart node_exporter` — replaces the wedged process. New PID 183319 healthy, `:9100/metrics` responsive.
4. `mysql-standalone` memory bump 2 Gi → 4 Gi limit, 1.5 Gi → 3 Gi request (commit forthcoming). Coincident May 8 18:05 OOM, not caused by this incident — `innodb_buffer_pool_size=1Gi` plus connection buffers and InnoDB internals didn't fit in 2 Gi.
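
The 90% → 60% figure in step 1 follows from the used space staying constant while the filesystem grows. A quick check, treating the quoted percentages as exact:

```python
def used_pct_after_grow(size_tib: float, used_pct: float, grow_tib: float) -> float:
    """Usage percentage after growing a filesystem with unchanged used space."""
    used_tib = size_tib * used_pct / 100
    return 100 * used_tib / (size_tib + grow_tib)


# 2 TiB at 90% is 1.8 TiB used; 1.8 TiB of 3 TiB is 60%.
print(round(used_pct_after_grow(2, 90, 1)))  # → 60
```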
## Open / Out-of-Scope
- **Wedged kthread `[10.0.10.15-manager]` (PID 3796184)** persists in the kernel queue. The kernel will eventually reap it once the RPC retry timer gives up, or it clears at next PVE reboot. With the script gone, no new ops queue against it. **Plan**: if PVE host PSI does not fully clear within 24 h, fold a PVE reboot into the next maintenance window. Not done in this change.
- **Transient OOMs unrelated to this incident**:
- `mysql-standalone-0` May 8 18:05 (anon-rss 2 GB at 2 Gi limit) — addressed by the limit bump above.
- postgres helpers May 9 12:37 — anon-rss <8 MB, pods no longer exist, no recurrence. No action.
- python pod May 9 13:36 (anon-rss 518 MB on k8s-node2) — pod no longer exists, no recurrence. No action.
- **Pre-existing TF drift**: `null_resource.pg_job_hunter_db` in `stacks/dbaas/modules/dbaas/main.tf` execs against `pg-cluster-1`, but the current CNPG primary is `pg-cluster-2`. Unrelated to this incident; surfaced during the targeted MySQL apply. Fix is a separate ticket — should resolve the primary dynamically (e.g., via the `cnpg.io/instanceRole=primary` selector) instead of hardcoding pod ordinal.
## Action Items
- [x] Delete `/usr/local/bin/weekly-backup` on PVE host.
- [x] Restart `node_exporter.service` on PVE host.
- [x] Grow `pve/nfs-data` LV to 3 TiB; online `resize2fs`.
- [x] Bump `mysql-standalone` memory request/limit to 3 Gi / 4 Gi.
- [x] Update `docs/architecture/storage.md` to record the new LV size.
- [ ] Reboot PVE host at next maintenance window if `[10.0.10.15-manager]` kthread does not clear within 24 h.
- [ ] (Separate ticket) Fix `null_resource.pg_*_db` resources to target the actual CNPG primary instead of hardcoding `pg-cluster-1`.
## Related
- TrueNAS decommission: memory `id=674` (2026-04-13).
- Prior LV grow on `pve/nfs-data` (2 TiB out-of-band): memory `id=691` (2026-04-12).
- Architecture: `docs/architecture/storage.md`, `docs/architecture/backup-dr.md`.

@ -0,0 +1,115 @@
# Runbook: KMS public exposure (kms.viktorbarzin.me:1688)
`kms.viktorbarzin.me:1688/TCP` is intentionally open to the internet so any
visitor can activate Volume License Microsoft products. The webpage at
`https://kms.viktorbarzin.me/` documents how to use it.
This runbook covers operations on the public exposure: where to find logs,
how to tune the rate limit, how to revoke if abused.
## Architecture
- **K8s service**: `windows-kms` in namespace `kms`, MetalLB shared LB IP
`10.0.20.200:1688`. ETP=Cluster, so client IPs in vlmcsd logs are SNAT'd
k8s node IPs (not real-world client IPs). Trade-off accepted —
preserving real client IPs would require a dedicated MetalLB IP with
ETP=Local or a PROXY-protocol bounce; vlmcsd doesn't speak PROXY-v2.
- **pfSense WAN forward**: `WAN TCP/1688 → k8s_shared_lb:1688`
(alias = `10.0.20.200`). Description: `KMS public — kms.viktorbarzin.me`.
- **Filter rule** on the WAN interface, TCP/1688, with state-table
per-source caps:
- `max-src-conn 50` — concurrent connections per source IP
- `max-src-conn-rate 10/60` — 10 new connections per 60 seconds per
source
- `overload <virusprot>` flush — sources that exceed either cap get added
to pfSense's stock `virusprot` pf table and have their existing states
flushed. (`virusprot` is the only table pfSense's filter generator
targets for `overload`; see `/etc/inc/filter.inc`. Don't try to point
it at a custom table — the schema doesn't expose that knob.)
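
To get a feel for what `max-src-conn-rate 10/60` with `overload` does, the toy model below may help. It is a sliding-window approximation and an assumption about behaviour, not pf's actual accounting. It models only the rate cap (not `max-src-conn`), and the addresses are documentation-range placeholders.

```python
from collections import defaultdict, deque


class SourceLimiter:
    """Toy model: at most max_new new connections per window per source."""

    def __init__(self, max_new: int = 10, window: float = 60.0):
        self.max_new = max_new
        self.window = window
        self.recent = defaultdict(deque)  # source -> timestamps of new conns
        self.overload = set()             # stands in for the virusprot table

    def allow(self, src: str, now: float) -> bool:
        if src in self.overload:
            return False  # already blocked until the table is flushed
        stamps = self.recent[src]
        while stamps and now - stamps[0] >= self.window:
            stamps.popleft()  # forget connections older than the window
        if len(stamps) >= self.max_new:
            self.overload.add(src)
            return False
        stamps.append(now)
        return True


limiter = SourceLimiter()
results = [limiter.allow("203.0.113.7", float(t)) for t in range(11)]
print(results.count(True), "203.0.113.7" in limiter.overload)
```

The eleventh connection inside 60 seconds trips the cap and the source lands in the overload set, after which every later attempt is refused regardless of rate.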
## Where the logs are
### vlmcsd (kms namespace, k8s)
```bash
# Live tail
kubectl logs -n kms -l app=kms-service -c windows-kms --tail=50 -f
# All activations in the running pod
kubectl logs -n kms -l app=kms-service -c windows-kms | grep "Incoming KMS request"
```
Source IPs in this log are the SNAT'd node IPs because the LB Service uses
ETP=Cluster on a shared MetalLB IP. Don't expect real WAN client IPs here.
### Slack notifier (kms namespace, k8s)
```bash
kubectl logs -n kms -l app=kms-service -c slack-notifier --tail=50 -f
```
Posts to `#alerts`, dedup window 1h per (source-IP, product). Activations
also increment the Prometheus counter `kms_activations_total{product,status}`
exposed on the same pod at `:9101/metrics` (scraped by the cluster-wide
`kubernetes-pods` job; query via Prometheus or Grafana directly).
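
The 1h dedup per (source-IP, product) can be pictured with the sketch below. It is not the notifier's code. It assumes the window is measured from the last notification actually sent, and the product name and address are placeholders.

```python
class Deduper:
    """Suppress repeat notifications for the same (source IP, product)."""

    def __init__(self, window: float = 3600.0):
        self.window = window
        self.last_sent = {}  # (source_ip, product) -> time of last send

    def should_notify(self, source_ip: str, product: str, now: float) -> bool:
        key = (source_ip, product)
        last = self.last_sent.get(key)
        if last is not None and now - last < self.window:
            return False
        self.last_sent[key] = now
        return True


dedup = Deduper()
print(dedup.should_notify("10.0.20.101", "Windows", 0.0))     # first sighting
print(dedup.should_notify("10.0.20.101", "Windows", 1800.0))  # inside the window
```

Since vlmcsd sees SNAT'd node IPs, the source-IP half of the key distinguishes cluster nodes rather than real clients, so the dedup is coarser than it looks.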
### pfSense — virusprot table and filter hits
```bash
# SSH to 10.0.20.1 as root
pfctl -t virusprot -T show # who's currently in the virusprot table
pfctl -t virusprot -T expire 86400 # boot anyone added more than 24h ago
pfctl -t virusprot -T flush # nuke the entire table
# Filter rule hit counts (find the KMS public rule, look at Evaluations / States)
pfctl -sr -v | grep -A 4 1688
# State table — current TCP/1688 connections, per source
pfctl -ss | grep ':1688 '
```
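To see how close each source is to the `max-src-conn 50` cap, the state listing can be tallied per source address. A rough sketch, assuming `pfctl -ss` prints inbound states as `... <dest>:1688 <- <src>:<port> ...` with IPv4 addresses (the exact columns vary between pf versions, so check a sample line first); `kms_conns_per_source` is a made-up helper name:

```bash
# Count TCP/1688 states per source address from `pfctl -ss` output on stdin.
# Assumes the source endpoint is the field right after "<-" and is IPv4
# in addr:port form.
kms_conns_per_source() {
  awk '
    /:1688 / {
      for (i = 1; i < NF; i++) {
        if ($i == "<-") {
          src = $(i + 1)
          sub(/:[0-9]+$/, "", src)   # drop the source port
          count[src]++
        }
      }
    }
    END { for (s in count) printf "%d %s\n", count[s], s }
  ' | sort -rn
}

# Usage on the pfSense shell:
#   pfctl -ss | kms_conns_per_source
```

Sources at or near 50 are the ones about to land in `virusprot`.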
## Tightening or loosening the rate limit
The filter rule is configured via the pfSense web UI
(`Firewall → Rules → WAN`, look for the `KMS public — kms.viktorbarzin.me`
rule) under **Advanced Options → "Maximum new connections per source per
seconds"** and **"Maximum state entries per source"**.
- **Default**: `max-src-conn 50`, `max-src-conn-rate 10/60`
- To **tighten** (suspected abuse): drop to `max-src-conn 10`,
`max-src-conn-rate 3/60`, then save and apply the rule. pfSense reloads
pf, and sources already in `virusprot` stay blocked, so a full state
flush (`pfctl -k 0.0.0.0/0 -K 0.0.0.0/0`) is overkill.
- To **loosen** (legitimate users blocked): bump to
`max-src-conn-rate 30/60`. The `virusprot` table flush still applies on
overload; reduce its lifetime via
`Firewall → Advanced → State Timeouts` if entries linger.
The `overload` table entry survives pf reloads. Running
`pfctl -t virusprot -T flush` after a tuning change clears the slate.
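When a single legitimate client was caught by the overload rule, flushing the whole table is more than is needed. A minimal sketch that removes one address using pfctl's `-T delete` table command; the `unblock_source` wrapper and the `PFCTL` override variable are illustrative, not part of pfSense:

```bash
# unblock_source <ip>: remove one address from the virusprot table.
# PFCTL can be overridden (e.g. PFCTL=echo) for a dry run.
unblock_source() {
  local ip="${1:-}"
  case "$ip" in
    "" | *[!0-9a-fA-F.:]*)
      # Reject empty input and anything that is not an IPv4/IPv6 literal.
      echo "refusing suspicious address: '$ip'" >&2
      return 2
      ;;
  esac
  "${PFCTL:-pfctl}" -t virusprot -T delete "$ip"
}

# Usage on the pfSense shell:
#   unblock_source 203.0.113.5
```

The client is re-added to the table if it exceeds either cap again, so loosen the limits first if the block was a false positive.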
## Revoking the public exposure
If the activation surface needs to come down (abuse, legal, audit):
1. **pfSense web UI** → `Firewall → NAT → Port Forward` → find
`WAN TCP/1688 → k8s_shared_lb` → **delete** (or disable). Apply.
2. **pfSense web UI** → `Firewall → Rules → WAN` → find
`KMS public — kms.viktorbarzin.me` → **delete** (or disable). Apply.
3. Verify externally: from a phone tether, `nc -zw3 kms.viktorbarzin.me 1688`
should now fail.
The k8s service stays reachable on the LAN
(`10.0.20.200:1688` and the internal `kms.viktorbarzin.lan` ingress for
the webpage) — only the WAN port-forward is removed.
To put it back, recreate the NAT rule (target alias `k8s_shared_lb`,
port `1688`) and the filter rule with the same per-source caps.
## Related
- Stack: `stacks/kms/` (Terraform; deployment, MetalLB Service, ingress,
ExternalSecret for the Slack webhook)
- Webpage source: `kms-website/` repo (Hugo + nginx, deployed via Drone CI)
- Networking architecture footnote:
`docs/architecture/networking.md` § "MetalLB & Load Balancing"

@ -2,72 +2,85 @@
Last updated: 2026-05-07
When you create a new repo on `forgejo.viktorbarzin.me`, Woodpecker
does NOT auto-discover it via the cluster's existing OAuth session.
The `forgejo` user inside Woodpecker (Forgejo-OAuth'd) needs to:
## Programmatic (preferred)
1. Open `https://ci.viktorbarzin.me/` in a browser.
2. Log in via Forgejo OAuth (the "Sign in with Forgejo" button).
3. Click "Add Repository" — your new repo should appear.
4. Click the toggle to activate it. Woodpecker will:
- Add a webhook on the Forgejo repo (push, PR, release events).
- Register the repo's `forge_remote_id` in its DB so subsequent
hooks deserialize correctly.
5. Push a commit (or hit "Run pipeline" in Woodpecker UI) — first
build fires.
```bash
infra/scripts/woodpecker-register-forgejo-repo.sh viktor/<repo-name>
```
## Why API-only doesn't work
The script:
1. Pulls the `viktor` (Forgejo-OAuth'd) user's `hash` from the
Woodpecker PG `users` table.
2. Mints a session JWT (HS256, signed with that hash) — Woodpecker
per-user session JWTs have payload
`{"type":"user","user-id":"<id>"}` and the signing key is the
user's `hash` column. (Confirmed against a known-good admin
token: same payload shape, signature reproducible from the user's
stored hash via `openssl dgst -sha256 -hmac "$HASH"`.)
3. Looks up the Forgejo repo id and POSTs to
`https://ci.viktorbarzin.me/api/repos?forge_remote_id=<id>` as
that user. Woodpecker server creates the per-repo webhook +
per-repo signing key on the Forgejo side automatically (uses
the user's stored Forgejo OAuth `access_token` to do so — that's
why this only works with viktor's user, not the GitHub admin's).
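The token-minting step can be sketched in isolation. This is a minimal HS256 example using only `openssl` and `tr`, not a copy of the script; the function names are made up, and the `user-id` value in the usage line is a placeholder:

```bash
# base64url-encode stdin without padding.
b64url() { openssl base64 -A | tr '+/' '-_' | tr -d '='; }

# mint_hs256_jwt <json-payload> <hmac-key>
# Prints header.payload.signature, signed with HMAC-SHA256.
mint_hs256_jwt() {
  local payload_json="$1" key="$2" header payload sig
  header=$(printf '%s' '{"alg":"HS256","typ":"JWT"}' | b64url)
  payload=$(printf '%s' "$payload_json" | b64url)
  sig=$(printf '%s.%s' "$header" "$payload" \
    | openssl dgst -sha256 -hmac "$key" -binary | b64url)
  printf '%s.%s.%s\n' "$header" "$payload" "$sig"
}

# Usage (HASH is the value read from the users table):
#   TOKEN=$(mint_hs256_jwt '{"type":"user","user-id":"2"}' "$HASH")
```

Passing the key via `-hmac` exposes it in the process list for the duration of the call, which is acceptable on a trusted workstation but worth knowing.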
The webhook URL contains a JWT signed with a per-server key that's
stored in the DB and only accessible at OAuth-flow time. POST'ing
`/api/repos` as the admin (`ViktorBarzin` GitHub user) returns 500
because the lookup queries forge-side OAuth state for THAT user,
which doesn't exist for the Forgejo `viktor` user. We confirmed:
Pre-requisites:
- `vault login -method=oidc` with read access to
`database/static-creds/pg-woodpecker`.
- `kubectl` cluster access (the script spawns a 5-min psql pod in
the `woodpecker` namespace to query the DB).
- A Forgejo PAT in `secret/viktor/forgejo_admin_token` (or pass
`FORGEJO_TOKEN=…` env), used to look up the repo's numeric ID.
- The `viktor` Woodpecker user must already exist (i.e., they've
logged in via Forgejo OAuth at least once on the Web UI).
If user_id=2 / forge_id=2 doesn't exist in `users`, the OAuth
bootstrap is unavoidable — but it only needs to happen once for
the lifetime of the Woodpecker DB.
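Before running the script it can help to confirm the CLIs it shells out to are on `PATH`. A small preflight sketch; `require_cmds` is a hypothetical helper, and the tool list in the usage line is read off the commands the registration script invokes:

```bash
# require_cmds <cmd>...: report every missing command, fail if any is absent.
require_cmds() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing required command: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   require_cmds vault kubectl jq curl openssl || exit 1
```

Reporting all missing tools in one pass avoids the fix-one-rerun-find-the-next loop.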
- Direct `POST /api/repos?forge_remote_id=N` → HTTP 500 server-side.
- Generating a JWT with the agent secret → "token is unverifiable"
on hook delivery (the signing key is repo-specific, not the
global agent secret).
## Why the GitHub admin token can't do this
There's no admin endpoint that side-steps the OAuth flow.
The earlier 500 from `POST /api/repos?forge_remote_id=N` was
because my admin session token authenticates as `ViktorBarzin`
(GitHub user, forge_id=1). Woodpecker tries to call Forgejo as
that user (using their stored Forgejo OAuth token) — which doesn't
exist for the GitHub user, hence the lookup error. There's no way
around this without acting as the Forgejo user.
## Bootstrap when UI access isn't available
## Why the previous "JWT for the webhook" approach didn't work
If you absolutely need to bootstrap a new image without UI access
(e.g., during an outage), the workaround is:
I tried generating a webhook JWT signed with `WOODPECKER_AGENT_SECRET`
(the global agent secret) and registering it directly on Forgejo.
That fails because the webhook JWT verification path runs through a
DB-backed `keyfunc` — Woodpecker stores a per-repo signing key when
the repo is activated, and rejects any JWT signed with a different
key. POST /api/repos is what creates that per-repo key.
1. Build locally:
```bash
docker build -t forgejo.viktorbarzin.me/viktor/<name>:<tag> /path/to/source
docker push forgejo.viktorbarzin.me/viktor/<name>:<tag>
```
2. Or pull from another already-built source and retag:
```bash
docker pull viktorbarzin/<name>:<tag> # DockerHub
docker tag viktorbarzin/<name>:<tag> forgejo.viktorbarzin.me/viktor/<name>:<tag>
docker push forgejo.viktorbarzin.me/viktor/<name>:<tag>
```
3. Flip the cluster `image=` reference and restart deployments.
## After registration
Document the bootstrap in the relevant stack so future maintainers
know the image was put there by hand. After Woodpecker UI onboarding,
the next pipeline run replaces the bootstrap image with a CI-built one.
Pipelines fire automatically on push. The `WOODPECKER_FORGE_TIMEOUT`
default of 3s was too tight for our cluster (Forgejo response time
spikes to 1-2s under load) — bumped to 30s in
`infra/stacks/woodpecker/values.yaml` 2026-05-07. Without that bump,
config-loader hits the deadline and every pipeline errors with
`could not load config from forge: context deadline exceeded`.
## Repos onboarded in flight 2026-05-07
## When the v3.13 → v3.14 server upgrade matters
These were created during the forgejo-registry-consolidation but the
UI step above hasn't been done yet — their `.woodpecker.yml` /
`.woodpecker/build.yml` exists on Forgejo but no pipeline fires:
`v3.14.0` doesn't fix this on its own — the timeout default is the
same. Set `WOODPECKER_FORGE_TIMEOUT` regardless of version. The
v3.14 upgrade was useful for unrelated forge-API changes (smarter
config-loader, fewer redundant calls per trigger).
- `viktor/broker-sync` — image bootstrapped via DockerHub (see
`infra/stacks/wealthfolio/main.tf` comment).
- `viktor/fire-planner` — image bootstrapped via local docker build.
- `viktor/hmrc-sync`
- `viktor/freedify`
- `viktor/claude-agent-service`
- `viktor/beadboard` — image bootstrapped via local docker build.
- `viktor/claude-memory-mcp`
## Troubleshooting
Walk through each in the Woodpecker UI to enable. Pipelines for
already-onboarded repos (payslip-ingest, job-hunter, infra) fired
correctly after the v3.13 → v3.14 upgrade.
- Pipeline status `error` with `could not load config from forge`:
bump `WOODPECKER_FORGE_TIMEOUT`. 30s is plenty.
- Pipeline status `error` with `secret "registry-password" not found`:
the repo's `.woodpecker.yml` still references registry-private
credentials. Drop the `registry.viktorbarzin.me` block — Forgejo
is the only registry now.
- Pipeline status `failure` with `"/vault": not found` (or any
other COPY of a binary): the gitignored binary wasn't pushed to
Forgejo. Switch the Dockerfile to `curl … && unzip` from the
HashiCorp/upstream release URL. See `claude-agent-service/Dockerfile`
commit bab6dd2 for the pattern.

@ -0,0 +1,406 @@
terraform {
required_providers {
kubernetes = {
source = "hashicorp/kubernetes"
}
}
}
# Per-site Anubis reverse proxy.
# Sits between Traefik and the real backend. On first visit, serves a
# proof-of-work challenge; on success, drops a long-lived JWT cookie and
# proxies the request through to `target_url`.
#
# Sharing a single ed25519 signing key across instances + COOKIE_DOMAIN at
# the registrable domain means a token solved on one viktorbarzin.me subdomain
# is honoured by every other Anubis-fronted site.
variable "name" {
type = string
description = "Short logical name (e.g. \"blog\"). Used to derive Service / Deployment / Secret names as anubis-<name>."
}
variable "namespace" {
type = string
description = "Namespace to deploy into — typically the same as the protected backend service."
}
variable "target_url" {
type = string
description = "Backend URL Anubis forwards passing requests to (e.g. http://blog.website.svc.cluster.local)."
}
variable "cookie_domain" {
type = string
default = "viktorbarzin.me"
description = "Cookie domain — set to the registrable domain so a single PoW solve covers every Anubis-fronted subdomain."
}
variable "difficulty" {
type = number
default = 2
description = "PoW difficulty (leading-zero hex chars). 2 = ~250ms desktop / ~700ms mobile. Bump for stronger filtering."
}
variable "cookie_expiration_hours" {
type = number
default = 720 # 30 days
description = "Lifetime of the issued JWT cookie in hours."
}
variable "image_tag" {
type = string
default = "v1.25.0"
description = "ghcr.io/techarohq/anubis tag — pin to a release, never :latest."
}
variable "replicas" {
type = number
default = 1
description = "Replica count. Default 1 because Anubis stores in-flight challenges in process memory — with N>1 a challenge issued by pod A and solved against pod B fails with `store: key not found` (HTTP 500). For HA, configure a shared store (Redis) and bump this. Per-pod 128Mi @ idle is cheap, single-pod restart is sub-second, so 1 is fine for content sites."
}
variable "memory" {
type = string
default = "128Mi"
description = "requests==limits memory. Anubis docs suggest 128Mi handles many concurrent clients."
}
variable "policy_yaml" {
type = string
default = null
description = "Override the strict default bot-policy YAML. Leave null to use the catch-all CHALLENGE policy."
}
variable "cpu_request" {
type = string
default = "20m"
description = "CPU request. PoW verification is server-cheap (just hash check)."
}
locals {
full_name = "anubis-${var.name}"
labels = {
"app" = local.full_name
"app.kubernetes.io/name" = "anubis"
"app.kubernetes.io/instance" = local.full_name
"app.kubernetes.io/component" = "ai-bot-challenge"
"app.kubernetes.io/managed-by" = "terraform"
}
# Strict bot policy. Default Anubis policy only WEIGHs Mozilla|Opera UAs
# and lets unmatched UAs (curl, wget, Python-requests, scrapy, headless
# CLI scrapers) fall through to ALLOW. We import the same upstream
# snippets and append a catch-all CHALLENGE so anyone without JS+PoW
# capability is filtered.
default_policy_yaml = <<-EOT
bots:
# Hard-deny known-bad bots first.
- import: (data)/bots/_deny-pathological.yaml
- import: (data)/bots/aggressive-brazilian-scrapers.yaml
# Hard-deny declared AI/LLM crawlers (ClaudeBot, GPTBot, Bytespider, …).
- import: (data)/meta/ai-block-aggressive.yaml
# Whitelist legitimate search-engine crawlers (Googlebot, Bingbot, …).
- import: (data)/crawlers/_allow-good.yaml
# Challenge Firefox AI previews specifically.
- import: (data)/clients/x-firefox-ai.yaml
# Allow /.well-known, /robots.txt, /favicon.*, /sitemap.xml; keeps
# the internet working for benign crawlers and discovery clients.
- import: (data)/common/keep-internet-working.yaml
# Catch-all: every remaining request must solve the challenge. This
# closes the "unmatched UA falls through to ALLOW" gap that lets
# curl/wget/Python-requests scrape non-CDN-fronted hosts.
- name: catchall-challenge
path_regex: .*
action: CHALLENGE
EOT
}
# Bot policy ConfigMap. Mounted into the pod and referenced by POLICY_FNAME.
resource "kubernetes_config_map" "policy" {
metadata {
name = "${local.full_name}-policy"
namespace = var.namespace
labels = local.labels
}
data = {
"botPolicies.yaml" = coalesce(var.policy_yaml, local.default_policy_yaml)
}
}
# ED25519 signing key pulled from Vault `secret/viktor` -> field
# `anubis_ed25519_key`. Same key across every instance so JWTs are
# cross-validatable, enabling cross-subdomain SSO.
resource "kubernetes_manifest" "ed25519_secret" {
manifest = {
apiVersion = "external-secrets.io/v1beta1"
kind = "ExternalSecret"
metadata = {
name = "${local.full_name}-key"
namespace = var.namespace
}
spec = {
refreshInterval = "1h"
secretStoreRef = {
name = "vault-kv"
kind = "ClusterSecretStore"
}
target = {
name = "${local.full_name}-key"
creationPolicy = "Owner"
}
data = [{
secretKey = "key"
remoteRef = {
key = "viktor"
property = "anubis_ed25519_key"
}
}]
}
}
}
resource "kubernetes_deployment" "anubis" {
metadata {
name = local.full_name
namespace = var.namespace
labels = local.labels
}
spec {
replicas = var.replicas
selector {
match_labels = { app = local.full_name }
}
strategy {
type = "RollingUpdate"
rolling_update {
max_surge = 1
max_unavailable = 0
}
}
template {
metadata {
labels = local.labels
}
spec {
# Spread replicas across nodes to survive a single node failure.
topology_spread_constraint {
max_skew = 1
topology_key = "kubernetes.io/hostname"
when_unsatisfiable = "ScheduleAnyway"
label_selector {
match_labels = { app = local.full_name }
}
}
container {
name = "anubis"
image = "ghcr.io/techarohq/anubis:${var.image_tag}"
port {
name = "http"
container_port = 8923
}
port {
name = "metrics"
container_port = 9090
}
env {
name = "BIND"
value = ":8923"
}
env {
name = "METRICS_BIND"
value = ":9090"
}
env {
name = "TARGET"
value = var.target_url
}
env {
name = "DIFFICULTY"
value = tostring(var.difficulty)
}
env {
name = "COOKIE_EXPIRATION_TIME"
value = "${var.cookie_expiration_hours}h"
}
# Cross-subdomain SSO: cookie scoped to the registrable domain so
# a JWT solved on any Anubis-fronted subdomain is honoured on every
# other one. (COOKIE_DOMAIN and COOKIE_DYNAMIC_DOMAIN are mutually
# exclusive; picking the explicit form.)
env {
name = "COOKIE_DOMAIN"
value = var.cookie_domain
}
env {
name = "COOKIE_SECURE"
value = "true"
}
env {
name = "COOKIE_SAME_SITE"
value = "Lax"
}
# Built-in robots.txt that disallows known AI scrapers; well-behaved
# bots get blocked here without ever paying the PoW cost.
env {
name = "SERVE_ROBOTS_TXT"
value = "true"
}
# Drop cluster-internal IPs from XFF so Anubis sees the real client.
env {
name = "XFF_STRIP_PRIVATE"
value = "true"
}
env {
name = "SLOG_LEVEL"
value = "INFO"
}
env {
name = "ED25519_PRIVATE_KEY_HEX_FILE"
# Mounted from the ESO-managed Secret below.
value = "/keys/key"
}
env {
name = "POLICY_FNAME"
value = "/config/botPolicies.yaml"
}
volume_mount {
name = "ed25519-key"
mount_path = "/keys"
read_only = true
}
volume_mount {
name = "policy"
mount_path = "/config"
read_only = true
}
resources {
requests = {
cpu = var.cpu_request
memory = var.memory
}
limits = {
memory = var.memory
}
}
# Liveness + readiness on the metrics endpoint (zero auth, always 200).
liveness_probe {
http_get {
path = "/metrics"
port = "metrics"
}
initial_delay_seconds = 10
period_seconds = 30
failure_threshold = 3
}
readiness_probe {
http_get {
path = "/metrics"
port = "metrics"
}
initial_delay_seconds = 2
period_seconds = 5
failure_threshold = 2
}
security_context {
run_as_non_root = true
run_as_user = 1000
run_as_group = 1000
allow_privilege_escalation = false
read_only_root_filesystem = true
capabilities {
drop = ["ALL"]
}
}
}
volume {
name = "ed25519-key"
secret {
secret_name = "${local.full_name}-key"
items {
key = "key"
path = "key"
}
}
}
volume {
name = "policy"
config_map {
name = kubernetes_config_map.policy.metadata[0].name
}
}
}
}
}
lifecycle {
# KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
ignore_changes = [spec[0].template[0].spec[0].dns_config]
}
depends_on = [kubernetes_manifest.ed25519_secret]
}
resource "kubernetes_service" "anubis" {
metadata {
name = local.full_name
namespace = var.namespace
labels = local.labels
annotations = {
"prometheus.io/scrape" = "true"
"prometheus.io/path" = "/metrics"
"prometheus.io/port" = "9090"
}
}
spec {
selector = { app = local.full_name }
port {
name = "http"
port = 8080
target_port = 8923
protocol = "TCP"
}
port {
name = "metrics"
port = 9090
target_port = 9090
protocol = "TCP"
}
}
}
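# NOTE: with the default replicas = 1, min_available = 1 means this PDB blocks
# voluntary evictions (e.g. node drains) of the only pod.
resource "kubernetes_pod_disruption_budget_v1" "anubis" {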
metadata {
name = local.full_name
namespace = var.namespace
}
spec {
min_available = "1"
selector {
match_labels = { app = local.full_name }
}
}
}
output "service_name" {
value = kubernetes_service.anubis.metadata[0].name
description = "ClusterIP service name. Pass this to ingress_factory's `service_name` so Traefik routes through Anubis."
}
output "service_port" {
value = 8080
description = "Service port. Anubis listens on 8923 inside; the Service exposes 8080."
}

@ -8,4 +8,7 @@ ExecStart=/usr/local/bin/daily-backup
StandardOutput=journal
StandardError=journal
SyslogIdentifier=daily-backup
TimeoutStartSec=3600
# 4h budget — the snapshot mount + LUKS decrypt + rsync + sqlite scan loop
# scales with the number of PVCs (118 today). Hit the 1h ceiling around week
# 18 of 2026 and silently SIGTERM'd for 10 days. Bumped to 4h with margin.
TimeoutStartSec=14400

@ -21,15 +21,48 @@ warn() { log "WARN: $*" >&2; }
die() { log "FATAL: $*" >&2; push_metrics 1 0; exit 1; }
# --- Locking ---
# Track whether we got SIGTERM/SIGINT so cleanup can push a non-success metric.
# Without this, a systemd timeout-kill leaves WeeklyBackupFailing alerts blind:
# the script never reaches the success push at the end and the metric goes stale
# silently. (Root cause of 2026-04-30 → 2026-05-09 silent-failure run.)
KILLED=""
cleanup() {
umount "${PVC_MOUNT}" 2>/dev/null || true
# Recursively unmount /tmp/pvc-mount: previous SIGTERM'd runs left snapshot
# mounts stacked here, which made every subsequent run start with an
# already-occupied mountpoint and time out before reaching its own umount.
while mountpoint -q "${PVC_MOUNT}" 2>/dev/null; do
umount "${PVC_MOUNT}" 2>/dev/null || umount -l "${PVC_MOUNT}" 2>/dev/null || break
done
# Close any LUKS mappers we opened (or that were left over from a prior crash).
for m in /dev/mapper/pvc-snap-*; do
[ -e "$m" ] || continue
cryptsetup close "$(basename "$m")" 2>/dev/null || true
done
rm -f "${LOCKFILE}"
if [ -n "${KILLED}" ]; then
# status=2 = aborted (matches lvm-pvc-snapshot's convention)
push_metrics 2 "${TOTAL_BYTES:-0}"
fi
}
trap cleanup EXIT
trap 'KILLED=1; exit 143' TERM INT
if ! ( set -o noclobber; echo $$ > "${LOCKFILE}" ) 2>/dev/null; then
die "Another instance is running (PID $(cat "${LOCKFILE}" 2>/dev/null || echo unknown))"
fi
# Belt-and-braces: if a previous run was SIGTERM'd before its trap completed,
# /tmp/pvc-mount may have stacked mounts and stale LUKS mappers. The lock above
# guarantees we're alone, so it's safe to clean these up now.
while mountpoint -q "${PVC_MOUNT}" 2>/dev/null; do
umount "${PVC_MOUNT}" 2>/dev/null || umount -l "${PVC_MOUNT}" 2>/dev/null || break
done
for m in /dev/mapper/pvc-snap-*; do
[ -e "$m" ] || continue
cryptsetup close "$(basename "$m")" 2>/dev/null || true
done
# --- Metrics ---
push_metrics() {
local status="${1:-0}" bytes="${2:-0}"
@ -243,6 +276,7 @@ fi
log "--- Step 3: pfsense backup ---"
PFSENSE_DEST="${BACKUP_ROOT}/pfsense"
DATE=$(date +%Y%m%d)
PFSENSE_STATUS=0
mkdir -p "${PFSENSE_DEST}"
if timeout 10 ssh -o BatchMode=yes -o ConnectTimeout=5 root@10.0.20.1 true 2>/dev/null; then
@ -253,6 +287,7 @@ if timeout 10 ssh -o BatchMode=yes -o ConnectTimeout=5 root@10.0.20.1 true 2>/de
else
warn "Failed to copy pfsense config.xml"
STATUS=1
PFSENSE_STATUS=1
fi
# Full filesystem tar
@ -264,21 +299,28 @@ if timeout 10 ssh -o BatchMode=yes -o ConnectTimeout=5 root@10.0.20.1 true 2>/de
else
warn "Failed to tar pfsense filesystem"
STATUS=1
PFSENSE_STATUS=1
fi
# Retention: keep 4 weekly copies
ls -t "${PFSENSE_DEST}"/config-*.xml 2>/dev/null | tail -n +5 | xargs rm -f 2>/dev/null || true
ls -t "${PFSENSE_DEST}"/pfsense-full-*.tar.gz 2>/dev/null | tail -n +5 | xargs rm -f 2>/dev/null || true
# Push pfsense-specific metric
echo "backup_last_success_timestamp $(date +%s)" | \
curl -s --connect-timeout 5 --max-time 10 --data-binary @- \
"${PUSHGATEWAY}/metrics/job/pfsense-backup" 2>/dev/null || true
else
warn "Cannot SSH to pfsense (10.0.20.1) — skipping"
STATUS=1
PFSENSE_STATUS=1
fi
# Push pfsense-backup metrics in BOTH success and failure paths so
# PfsenseBackupStale + PfsenseBackupFailing alerts can fire instead of going
# silent when ssh-to-pfsense is broken.
{
echo "backup_last_run_timestamp $(date +%s)"
echo "backup_last_status ${PFSENSE_STATUS}"
[ "${PFSENSE_STATUS}" -eq 0 ] && echo "backup_last_success_timestamp $(date +%s)"
} | curl -s --connect-timeout 5 --max-time 10 --data-binary @- \
"${PUSHGATEWAY}/metrics/job/pfsense-backup" 2>/dev/null || true
# ============================================================
# STEP 4: PVE host config backup
# ============================================================

@ -0,0 +1,121 @@
#!/usr/bin/env bash
# Programmatically register a Forgejo repo in Woodpecker without needing the
# Web UI's OAuth flow.
#
# Earlier we believed only the OAuth login could create a working webhook
# because the webhook URL contains a JWT signed with a server-side key.
# That's true for the JWT, BUT the webhook is created server-side when the
# repo is activated through POST /api/repos — Woodpecker handles the JWT
# generation internally. We just need to call that endpoint as the right
# user (the one whose forge OAuth token can read the repo).
#
# The Woodpecker admin token (mine, ViktorBarzin@github) is a session JWT
# of the form `{"type":"user","user-id":"1"}` signed with the user's
# `hash` column (per-user, stored in the `users` table). Forge-API calls
# made on behalf of that user use the user's stored OAuth `access_token`
# from the same row. My GitHub admin can't read Forgejo repos, so the
# admin token can't activate Forgejo repos.
#
# The fix: mint a session JWT for the Forgejo `viktor` user (user_id=2)
# using `viktor`'s `hash`. Then POST /api/repos as viktor — viktor's
# stored Forgejo OAuth token has the access needed.
#
# Usage:
# ./woodpecker-register-forgejo-repo.sh <forgejo-org/repo> [<forgejo-org/repo> ...]
# Example:
# ./woodpecker-register-forgejo-repo.sh viktor/broker-sync viktor/freedify
#
# Requires:
# - vault CLI logged in (oidc or token), with read access to
# database/static-creds/pg-woodpecker AND a Forgejo PAT in the
# forgejo_admin_token field of secret/viktor (or pass FORGEJO_TOKEN env var)
# - kubectl with cluster access (for the temporary psql pod)
# - openssl, jq, curl
set -euo pipefail
NS=${NS:-woodpecker}
WP_URL=${WP_URL:-https://ci.viktorbarzin.me}
FORGEJO_URL=${FORGEJO_URL:-https://forgejo.viktorbarzin.me}
FORGEJO_USER_LOGIN=${FORGEJO_USER_LOGIN:-viktor}
if [ "$#" -lt 1 ]; then
echo "usage: $0 <org/repo> [<org/repo> ...]" >&2
exit 1
fi
# Pull viktor's `hash` from the woodpecker DB (used to sign the session JWT)
# and OAuth access_token (sanity check it exists).
WP_DB_USER=$(vault read -format=json database/static-creds/pg-woodpecker | jq -r .data.username)
WP_DB_PASS=$(vault read -format=json database/static-creds/pg-woodpecker | jq -r .data.password)
PG_POD=tmp-wp-register-$$
cat <<EOF | kubectl apply -f - >/dev/null
apiVersion: v1
kind: Pod
metadata: { name: $PG_POD, namespace: $NS }
spec:
restartPolicy: Never
containers:
- name: psql
image: postgres:15
env: [{name: PGPASSWORD, value: "$WP_DB_PASS"}]
command: ["sleep", "300"]
EOF
trap "kubectl delete pod -n $NS $PG_POD --wait=false >/dev/null 2>&1 || true" EXIT
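for _ in $(seq 1 30); do
PHASE=$(kubectl get pod -n $NS $PG_POD -o jsonpath='{.status.phase}' 2>/dev/null || true)
[ "$PHASE" = "Running" ] && break
sleep 1
done
# Fail fast if the psql pod never came up, instead of erroring in kubectl exec below.
[ "${PHASE:-}" = "Running" ] || { echo "ERROR: pod $PG_POD not Running after 30s (phase: ${PHASE:-unknown})" >&2; exit 1; }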
VIKTOR_HASH=$(kubectl exec -n $NS $PG_POD -- psql -h pg-cluster-rw.dbaas -U "$WP_DB_USER" -d woodpecker -tA -c \
"SELECT hash FROM users WHERE login='$FORGEJO_USER_LOGIN' AND forge_id=2" | tr -d '[:space:]')
if [ -z "$VIKTOR_HASH" ]; then
echo "ERROR: no woodpecker user found for forge_id=2 login=$FORGEJO_USER_LOGIN" >&2
echo " (have they ever logged in via Forgejo OAuth?)" >&2
exit 1
fi
# Mint a session JWT (HS256) for that user.
b64() { openssl base64 -A | tr '+/' '-_' | tr -d '='; }
HEADER=$(printf '%s' '{"alg":"HS256","typ":"JWT"}' | b64)
PAYLOAD=$(printf '{"type":"user","user-id":"%s"}' \
"$(kubectl exec -n $NS $PG_POD -- psql -h pg-cluster-rw.dbaas -U "$WP_DB_USER" -d woodpecker -tA -c \
"SELECT id FROM users WHERE login='$FORGEJO_USER_LOGIN' AND forge_id=2" | tr -d '[:space:]')" | b64)
SIG=$(printf '%s.%s' "$HEADER" "$PAYLOAD" | openssl dgst -sha256 -hmac "$VIKTOR_HASH" -binary | b64)
TOKEN="$HEADER.$PAYLOAD.$SIG"
# Sanity check: am I really logged in as viktor?
ME=$(curl -sf "$WP_URL/api/user" -H "Authorization: Bearer $TOKEN" | jq -r '.login')
if [ "$ME" != "$FORGEJO_USER_LOGIN" ]; then
echo "ERROR: minted token authenticates as '$ME', not '$FORGEJO_USER_LOGIN'" >&2
exit 1
fi
echo "Authenticated as: $ME"
# Activate each repo via POST /api/repos?forge_remote_id=N
# Forgejo repo ID is fetched via the Forgejo API.
FORGEJO_AUTH="${FORGEJO_TOKEN:-$(vault kv get -field=forgejo_admin_token secret/viktor 2>/dev/null || true)}"
if [ -z "$FORGEJO_AUTH" ]; then
echo "ERROR: set FORGEJO_TOKEN env or seed secret/viktor/forgejo_admin_token in vault" >&2
exit 1
fi
for repo in "$@"; do
FRID=$(curl -sf "$FORGEJO_URL/api/v1/repos/$repo" -H "Authorization: token $FORGEJO_AUTH" | jq -r .id 2>/dev/null || true)
if [ -z "$FRID" ] || [ "$FRID" = "null" ]; then
echo " $repo: ERROR resolving Forgejo repo id" >&2
continue
fi
HTTP=$(curl -s -X POST "$WP_URL/api/repos?forge_remote_id=$FRID" \
-H "Authorization: Bearer $TOKEN" \
-o /tmp/wp-add-$FRID.json -w "%{http_code}")
case "$HTTP" in
200) echo " $repo: activated (id=$(jq -r .id /tmp/wp-add-$FRID.json))" ;;
409) echo " $repo: already active" ;;
*) echo " $repo: HTTP $HTTP: $(cat /tmp/wp-add-$FRID.json)" ;;
esac
rm -f /tmp/wp-add-$FRID.json
done

Binary file not shown.

Binary file not shown.

@ -24,6 +24,14 @@ provider "registry.terraform.io/cloudflare/cloudflare" {
]
}
provider "registry.terraform.io/goauthentik/authentik" {
version = "2024.12.1"
constraints = "~> 2024.10"
hashes = [
"h1:roBMd+gi+TGgikH/bMzEI8JfvJiMAQWt+8FmokCrQIs=",
]
}
provider "registry.terraform.io/hashicorp/helm" {
version = "3.1.1"
hashes = [

@ -33,6 +33,10 @@ variable "homepage_annotations" {
type = map(string)
default = {}
}
variable "storage_size" {
type = string
default = "1Gi"
}
resource "kubernetes_persistent_volume_claim" "data_encrypted" {
wait_until_bound = false
@ -50,7 +54,7 @@ resource "kubernetes_persistent_volume_claim" "data_encrypted" {
storage_class_name = "proxmox-lvm-encrypted"
resources {
requests = {
storage = "1Gi"
storage = var.storage_size
}
}
}
@ -261,7 +265,7 @@ resource "kubernetes_cron_job_v1" "bank-sync" {
metadata {}
spec {
backoff_limit = 1
ttl_seconds_after_finished = 300
ttl_seconds_after_finished = 86400
template {
metadata {}
spec {
@ -287,23 +291,28 @@ resource "kubernetes_cron_job_v1" "bank-sync" {
LAST_SUCCESS=$END
else
SUCCESS=0
LAST_SUCCESS=0
echo "Bank sync failed with HTTP $HTTP_CODE:"
cat /tmp/response.txt
echo ""
fi
cat <<METRICS | curl -s --data-binary @- "$PUSHGATEWAY"
# HELP bank_sync_success Whether the last bank sync succeeded (1=ok, 0=fail)
# TYPE bank_sync_success gauge
bank_sync_success $SUCCESS
# HELP bank_sync_duration_seconds Duration of the last bank sync run
# TYPE bank_sync_duration_seconds gauge
bank_sync_duration_seconds $DURATION
# HELP bank_sync_last_success_timestamp Unix timestamp of the last successful sync
# TYPE bank_sync_last_success_timestamp gauge
bank_sync_last_success_timestamp $LAST_SUCCESS
METRICS
# Pushgateway POST preserves metrics not in the payload, so on
# failure we omit bank_sync_last_success_timestamp to keep the
# prior success value; this prevents BankSyncStale from firing
# alongside BankSyncFailing after a single failed run.
{
printf '# HELP bank_sync_success Whether the last bank sync succeeded (1=ok, 0=fail)\n'
printf '# TYPE bank_sync_success gauge\n'
printf 'bank_sync_success %s\n' "$SUCCESS"
printf '# HELP bank_sync_duration_seconds Duration of the last bank sync run\n'
printf '# TYPE bank_sync_duration_seconds gauge\n'
printf 'bank_sync_duration_seconds %s\n' "$DURATION"
if [ "$SUCCESS" = "1" ]; then
printf '# HELP bank_sync_last_success_timestamp Unix timestamp of the last successful sync\n'
printf '# TYPE bank_sync_last_success_timestamp gauge\n'
printf 'bank_sync_last_success_timestamp %s\n' "$LAST_SUCCESS"
fi
} | curl -s --data-binary @- "$PUSHGATEWAY"
EOT
]
}

@ -9,6 +9,10 @@ terraform {
source = "cloudflare/cloudflare"
version = "~> 4"
}
authentik = {
source = "goauthentik/authentik"
version = "~> 2024.10"
}
}
}

@ -24,6 +24,14 @@ provider "registry.terraform.io/cloudflare/cloudflare" {
]
}
provider "registry.terraform.io/goauthentik/authentik" {
version = "2024.12.1"
constraints = "~> 2024.10"
hashes = [
"h1:roBMd+gi+TGgikH/bMzEI8JfvJiMAQWt+8FmokCrQIs=",
]
}
provider "registry.terraform.io/hashicorp/helm" {
version = "3.1.1"
hashes = [

@ -9,6 +9,10 @@ terraform {
source = "cloudflare/cloudflare"
version = "~> 4"
}
authentik = {
source = "goauthentik/authentik"
version = "~> 2024.10"
}
}
}

@ -112,14 +112,27 @@ resource "kubernetes_service" "blog" {
}
}
# Anubis reverse proxy in front of the blog. First-time visitors solve a
# tiny PoW (~250ms desktop), get a 30-day cookie, and pass through. Replaces
# the global ai-bot-block forwardAuth for this site.
module "anubis" {
source = "../../modules/kubernetes/anubis_instance"
name = "blog"
namespace = kubernetes_namespace.website.metadata[0].name
target_url = "http://${kubernetes_service.blog.metadata[0].name}.${kubernetes_namespace.website.metadata[0].name}.svc.cluster.local"
}
module "ingress" {
source = "../../modules/kubernetes/ingress_factory"
namespace = kubernetes_namespace.website.metadata[0].name
name = "blog"
service_name = "blog"
full_host = "viktorbarzin.me"
dns_type = "proxied"
tls_secret_name = var.tls_secret_name
source = "../../modules/kubernetes/ingress_factory"
namespace = kubernetes_namespace.website.metadata[0].name
name = "blog"
service_name = module.anubis.service_name
port = module.anubis.service_port
extra_middlewares = ["traefik-x402@kubernetescrd"]
full_host = "viktorbarzin.me"
dns_type = "proxied"
tls_secret_name = var.tls_secret_name
anti_ai_scraping = false # Anubis is the gatekeeper now; drop the redundant ai-bot-block forwardAuth.
extra_annotations = {
"gethomepage.dev/enabled" = "true"
"gethomepage.dev/name" = "Blog"
@ -131,10 +144,13 @@ module "ingress" {
}
module "ingress-www" {
source = "../../modules/kubernetes/ingress_factory"
namespace = kubernetes_namespace.website.metadata[0].name
name = "blog-www"
service_name = "blog"
full_host = "www.viktorbarzin.me"
tls_secret_name = var.tls_secret_name
source = "../../modules/kubernetes/ingress_factory"
namespace = kubernetes_namespace.website.metadata[0].name
name = "blog-www"
service_name = module.anubis.service_name
port = module.anubis.service_port
extra_middlewares = ["traefik-x402@kubernetescrd"]
full_host = "www.viktorbarzin.me"
tls_secret_name = var.tls_secret_name
anti_ai_scraping = false
}

@ -24,6 +24,14 @@ provider "registry.terraform.io/cloudflare/cloudflare" {
]
}
provider "registry.terraform.io/goauthentik/authentik" {
version = "2024.12.1"
constraints = "~> 2024.10"
hashes = [
"h1:roBMd+gi+TGgikH/bMzEI8JfvJiMAQWt+8FmokCrQIs=",
]
}
provider "registry.terraform.io/hashicorp/helm" {
version = "3.1.1"
hashes = [

@ -70,7 +70,7 @@ resource "kubernetes_persistent_volume_claim" "data_proxmox" {
annotations = {
"resize.topolvm.io/threshold" = "80%"
"resize.topolvm.io/increase" = "100%"
"resize.topolvm.io/storage_limit" = "5Gi"
"resize.topolvm.io/storage_limit" = "8Gi"
}
}
spec {
@ -78,7 +78,7 @@ resource "kubernetes_persistent_volume_claim" "data_proxmox" {
storage_class_name = "proxmox-lvm"
resources {
requests = {
storage = "1Gi"
storage = "4Gi"
}
}
}
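
Reviewer note: a quick arithmetic check on the new sizes. This is a minimal sketch, assuming pvc-autoresizer expands a PVC by `increase` percent of its current request and never beyond `storage_limit`. The helper name `next_size_gib` is hypothetical and belongs to no tool.

```python
def next_size_gib(current_gib: float, increase_pct: float, limit_gib: float) -> float:
    """Return the request after one expansion step, capped at the limit."""
    grown = current_gib * (1 + increase_pct / 100.0)
    return min(grown, limit_gib)


# Old values from this hunk: 1Gi request, 100% increase, 5Gi limit.
old_step = next_size_gib(1, 100, 5)
# New values from this hunk: 4Gi request, 100% increase, 8Gi limit.
new_step = next_size_gib(4, 100, 8)
```

Under these assumptions the new 4Gi request would reach the 8Gi limit after a single expansion, so the limit may need another bump if the volume keeps growing.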

@ -9,6 +9,10 @@ terraform {
source = "cloudflare/cloudflare"
version = "~> 4"
}
authentik = {
source = "goauthentik/authentik"
version = "~> 2024.10"
}
}
}

@ -1,7 +1,7 @@
# Generated by Terragrunt. Sig: nIlQXj57tbuaRZEa
terraform {
backend "pg" {
conn_str = "postgres://terraform_state:SBlzGxotNUN6HH9d0S-m@10.0.20.200:5432/terraform_state?sslmode=disable"
conn_str = "postgres://terraform_state:ts7DGcKmTTY-5ujz4mhh@10.0.20.200:5432/terraform_state?sslmode=disable"
schema_name = "claude-memory"
}
}

@ -9,6 +9,10 @@ terraform {
source = "cloudflare/cloudflare"
version = "~> 4"
}
authentik = {
source = "goauthentik/authentik"
version = "~> 2024.10"
}
}
}

@ -104,12 +104,23 @@ resource "kubernetes_service" "cyberchef" {
}
module "anubis" {
source = "../../modules/kubernetes/anubis_instance"
name = "cc"
namespace = kubernetes_namespace.cyberchef.metadata[0].name
target_url = "http://${kubernetes_service.cyberchef.metadata[0].name}.${kubernetes_namespace.cyberchef.metadata[0].name}.svc.cluster.local"
}
module "ingress" {
source = "../../modules/kubernetes/ingress_factory"
dns_type = "proxied"
namespace = kubernetes_namespace.cyberchef.metadata[0].name
name = "cc"
tls_secret_name = var.tls_secret_name
source = "../../modules/kubernetes/ingress_factory"
dns_type = "proxied"
namespace = kubernetes_namespace.cyberchef.metadata[0].name
name = "cc"
service_name = module.anubis.service_name
port = module.anubis.service_port
extra_middlewares = ["traefik-x402@kubernetescrd"]
tls_secret_name = var.tls_secret_name
anti_ai_scraping = false
extra_annotations = {
"gethomepage.dev/enabled" = "true"
"gethomepage.dev/name" = "CyberChef"

@ -24,6 +24,14 @@ provider "registry.terraform.io/cloudflare/cloudflare" {
]
}
provider "registry.terraform.io/goauthentik/authentik" {
version = "2024.12.1"
constraints = "~> 2024.10"
hashes = [
"h1:roBMd+gi+TGgikH/bMzEI8JfvJiMAQWt+8FmokCrQIs=",
]
}
provider "registry.terraform.io/hashicorp/helm" {
version = "3.1.1"
hashes = [

@ -9,6 +9,10 @@ terraform {
source = "cloudflare/cloudflare"
version = "~> 4"
}
authentik = {
source = "goauthentik/authentik"
version = "~> 2024.10"
}
}
}

@ -188,10 +188,10 @@ resource "kubernetes_stateful_set_v1" "mysql_standalone" {
resources {
requests = {
cpu = "250m"
memory = "1536Mi"
memory = "3Gi"
}
limits = {
memory = "2Gi"
memory = "4Gi"
}
}

@ -24,6 +24,28 @@ provider "registry.terraform.io/cloudflare/cloudflare" {
]
}
provider "registry.terraform.io/goauthentik/authentik" {
version = "2024.12.1"
constraints = "~> 2024.10"
hashes = [
"h1:roBMd+gi+TGgikH/bMzEI8JfvJiMAQWt+8FmokCrQIs=",
"zh:090260dc7889ea822ec1d899344e1ee23eba5290461989c0796149c9511f2316",
"zh:13c2655ff824b0dc4b9bb832b5ca6d41dba97cb280330258c5fef4115e236209",
"zh:166a73c3a810c9c895d68a8ff968158f339f8a2c1c03e20ec9fc5ed99cc64e20",
"zh:203777eae1cdc711233315499643180604cff2324411b186b7cf07fdbe16f655",
"zh:3b2f18c9a8d28dac74dc6bbf168c946855ab9c68f053578d4630c50d5eaf30a0",
"zh:4822275985f6b74b6196c47112316a4252db22cf4ceaef7c9ab4c66d488abf2f",
"zh:53ea97562666c8a5a2f6d63d418a302a7f8ee4b7bb7da35dedaa89aa5708b7f0",
"zh:56b8a230901e3550c92a1d3f58ee9dafe9853f30fe4315af3ab28ae63262e15d",
"zh:6293ab7b1fd8206a0c853591f50186aca4a1eff117b2a773e10760a23a2c83e9",
"zh:9433970f79fb92d8aae3ee436db5630ab312c78b6dc9df9c1db3273a18f8aaa1",
"zh:95df406214f79b3b98222d7c7fe8fc319a3d90b7a9d53e1d5abbda5dfb8b9436",
"zh:a85880da0552a42c8f449390fbd7d8b03541d1a13e04bba9f1404fa658754260",
"zh:a95f6e9bd62c67e70eba1b1a14728856b9a6a28cd1e5e3be54a7718882c87e7f",
"zh:dd599b51c5beb34a4c6feece244fde07d2558d69929449ab1fd39a5ebe738781",
]
}
provider "registry.terraform.io/hashicorp/helm" {
version = "3.1.1"
hashes = [
@ -48,6 +70,18 @@ provider "registry.terraform.io/hashicorp/kubernetes" {
version = "3.1.0"
hashes = [
"h1:oodIAuFMikXNmEtil5MQgP4dfSctUBYQiGJfjbsF3NY=",
"zh:0215c5c60be62028c09a2f22458e89cda3ef5830a632299f1d401eb3538874b0",
"zh:09ebb9f442431e278a310a9423f32caf467cb4b3cad3fe59573ca71fa7b14e20",
"zh:0c4e5912f83bb35846ae0a9ae54fc320706ee61894cd21cc6b4181b1c5a2fa5c",
"zh:1678c982853ad461e65ccb5e79d585e13ed109dd47dab2a66d3a7a304faeef65",
"zh:1c050a5c15e330457a9c18caacf61a923c59d663e13f2962e4b32f04fef523a0",
"zh:2c55bcec83be58ec132c7cb0a1ac644758b800d794fdc636d53a0eada0358a3a",
"zh:a062bb0aa316c08d8460c66a5d68da71da40de5d3bc3b31abcf3a1a9a19650f1",
"zh:a26fdea0afaa9b247c73c0b42843ca51ba7db0ac2571f9d3d50dcabd20ca1b98",
"zh:c872c9385a78d502bf5823d61cd3bb0f9a0585030e025eb12585c83451beeaa1",
"zh:f180879af931182beee4c8c0d9dab62b81d86f17ddcbe3786ef4c7cec9163a4e",
"zh:f569b65999264a9416862bca5cd2a6177d94ccb0424f3a4ef424428912b9cb3c",
"zh:f70f5789264069e0eef06f9b5d5fde955ef7206f7d446d1ce51a4c37a3f3e02f",
]
}

@ -1,7 +1,7 @@
# Generated by Terragrunt. Sig: nIlQXj57tbuaRZEa
terraform {
backend "pg" {
conn_str = "postgres://terraform_state:SBlzGxotNUN6HH9d0S-m@10.0.20.200:5432/terraform_state?sslmode=disable"
conn_str = "postgres://terraform_state:ts7DGcKmTTY-5ujz4mhh@10.0.20.200:5432/terraform_state?sslmode=disable"
schema_name = "f1-stream"
}
}

@ -15,6 +15,7 @@ from backend.extractors.aceztrims import AceztrimsExtractor
from backend.extractors.chrome_browser import ChromeBrowserExtractor
from backend.extractors.curated import CuratedExtractor
from backend.extractors.dd12 import DD12Extractor
from backend.extractors.hmembeds import HmembedsExtractor
from backend.extractors.stremio import StremioAddonExtractor
from backend.extractors.subreddit import SubredditExtractor
from backend.extractors.daddylive import DaddyLiveExtractor
@ -64,6 +65,10 @@ def create_registry() -> ExtractorRegistry:
# JW Player file URL. The site embeds the m3u8 in HTML so curl-based
# parsing is enough — no browser needed.
registry.register(DD12Extractor())
# HmembedsExtractor offline-decodes hmembeds.one JWT m3u8 URLs
# (base64+XOR with hardcoded key per page; reverse-engineered
# 2026-05-07). Verifier filters dead origins.
registry.register(HmembedsExtractor())
# StremioAddonExtractor calls Stremio addon HTTP APIs (TvVoo, StremVerse)
# which already index Sky F1 / DAZN F1 / Vavoo IPTV channels. No
# Stremio client needed — just /stream/<type>/<id>.json calls.

@ -0,0 +1,131 @@
"""hmembeds.one decoder + extractor.

Reverse-engineered 2026-05-07 (4-agent parallel session). The hmembeds
embed page contains an inline `<script>` block of the form:

    var k = "<16-char ASCII key>";
    var b = atob("<URI-encoded XOR-encrypted blob>");
    var c = decodeURIComponent(escape(b));
    var d = "";
    for (var i = 0; i < c.length; i++)
        d += String.fromCharCode(c.charCodeAt(i) ^ k.charCodeAt(i % k.length));
    (new Function(d))();

The decoded `d` is plain JavaScript that calls `jwplayer('player').setup({
file: <m3u8_url>, ... })`. The `<m3u8_url>` is a JWT-bound URL on
`amsterdam-0183.zulo-0084.online/sec/<JWT>/<embed_id>.m3u8` where the
JWT pins the request to a /24 of the requestor's IP.

So: pure client-side decoding. No fingerprint check, no canvas hash, no
browser-derived input. We can produce the m3u8 URL with curl + Python
faster than launching Chromium.

**Caveat (2026-05-07 reality)**: the hmembeds backend issues JWT URLs
for the curated `888520f3...` (Sky Sports F1 24/7) and `fc3a5463...`
(DAZN F1 24/7) embeds, but the origin (`amsterdam-0183.zulo-0084.online`)
returns 404/403 on the m3u8 fetch from any IP we tested (cluster IPv4
176.12.22.x, dev VM IPv6 2001:470:6f:43d::). Both legacy embed IDs
appear to be offline upstream. This extractor will produce JWT URLs
that the verifier marks unplayable for those specific embeds; if the
upstream broadcasts come back online or fresh IDs are added, the same
extractor logic just works.
"""

import base64
import logging
import re
import urllib.parse

import httpx

from backend.extractors.base import BaseExtractor
from backend.extractors.models import ExtractedStream

logger = logging.getLogger(__name__)

USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/605.1.15 (KHTML, like Gecko) "
    "Version/17.4 Safari/605.1.15"
)

# Curated hmembeds embed IDs that the community treats as 24/7 channels.
# `_CHANNELS` mirrors the legacy `CuratedExtractor` list — keeping them
# here means the resolver can attempt offline-decoded JWT URLs and the
# verifier filters out the ones that are upstream-offline.
_CHANNELS = (
    ("888520f36cd94c5da4c71fddc1a5fc9b", "Sky Sports F1 (24/7) — hmembeds"),
    ("fc3a54634d0867b0c02ee3223292e7c6", "DAZN F1 (24/7) — hmembeds"),
)

_KEY_RE = re.compile(r'k\s*=\s*"([a-z0-9]+)"')
_BLOB_RE = re.compile(r'b\s*=\s*atob\("([^"]+)"\)')
_URL_RE = re.compile(r'streamUrl\s*=\s*"([^"]+)"')


def decode_embed(html: str) -> str | None:
    """Pull the m3u8 URL out of an hmembeds embed HTML.

    Returns the JWT-bound m3u8 URL the page would tell JW Player to
    play, or None if the page doesn't match the expected shape.
    """
    km = _KEY_RE.search(html)
    bm = _BLOB_RE.search(html)
    if not km or not bm:
        return None
    key = km.group(1)
    blob = bm.group(1)
    try:
        # b = atob(blob) — base64-decode bytes
        # c = decodeURIComponent(escape(b)) — Latin-1 → UTF-8 round-trip
        # d[i] = c[i] ^ k[i % len(k)] — XOR with rotating key
        raw = base64.b64decode(blob).decode("latin-1")
        deuri = urllib.parse.unquote(raw)
        decoded = "".join(
            chr(ord(c) ^ ord(key[i % len(key)])) for i, c in enumerate(deuri)
        )
    except Exception:
        return None
    m = _URL_RE.search(decoded)
    return m.group(1) if m else None


class HmembedsExtractor(BaseExtractor):
    @property
    def site_key(self) -> str:
        return "hmembeds"

    @property
    def site_name(self) -> str:
        return "hmembeds.one"

    async def extract(self) -> list[ExtractedStream]:
        results: list[ExtractedStream] = []
        async with httpx.AsyncClient(
            timeout=15.0,
            follow_redirects=True,
            headers={"User-Agent": USER_AGENT, "Referer": "https://hmembeds.one/"},
        ) as client:
            for embed_id, label in _CHANNELS:
                try:
                    page = await client.get(f"https://hmembeds.one/embed/{embed_id}")
                except Exception:
                    logger.debug("[hmembeds] embed %s fetch failed", embed_id, exc_info=True)
                    continue
                if page.status_code != 200:
                    continue
                m3u8 = decode_embed(page.text)
                if not m3u8:
                    continue
                results.append(
                    ExtractedStream(
                        url=m3u8,
                        site_key=self.site_key,
                        site_name=self.site_name,
                        quality="",
                        title=label,
                        stream_type="m3u8",
                    )
                )
        logger.info("[hmembeds] resolved %d JWT URL(s) (verifier filters dead origins)", len(results))
        return results
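
Reviewer note: the decode step described in the module docstring can be exercised offline with a round-trip. The sketch below is not the extractor's code. `xor_decode` and `xor_encode` are hypothetical helper names, and the key and URL are made-up fixtures. It relies on the JavaScript idiom `decodeURIComponent(escape(b))` reinterpreting the `atob` byte string as UTF-8, so this port decodes the bytes as UTF-8 directly.

```python
import base64


def xor_decode(blob_b64: str, key: str) -> str:
    """Port of d[i] = c[i] ^ k[i % len(k)], with c = the atob() output read as UTF-8."""
    text = base64.b64decode(blob_b64).decode("utf-8")
    return "".join(chr(ord(ch) ^ ord(key[i % len(key)])) for i, ch in enumerate(text))


def xor_encode(plain: str, key: str) -> str:
    """Inverse helper, used only to build a test fixture."""
    xored = "".join(chr(ord(ch) ^ ord(key[i % len(key)])) for i, ch in enumerate(plain))
    return base64.b64encode(xored.encode("utf-8")).decode("ascii")


# Made-up fixture values, not taken from any real embed page.
KEY = "abcd1234efgh5678"
PLAIN = 'var streamUrl = "https://example.invalid/sec/token/id.m3u8";'
ROUND_TRIP = xor_decode(xor_encode(PLAIN, KEY), KEY)
```

One difference worth checking against real pages: `urllib.parse.unquote`, as used in `decode_embed`, would also rewrite any literal `%xx` sequence that happens to appear in the XOR-ed payload, whereas a UTF-8 decode leaves such sequences alone.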

@ -228,13 +228,57 @@ module "tls_secret" {
}
# f1-stream serves its SvelteKit SPA via the FastAPI `/{path}` catch-all
# and exposes 14 JSON/proxy routes at root (/schedule, /streams, /embed,
# /embed-asset, /relay, /proxy, /extract, /extractors, /health). A flat
# Anubis catch-all CHALLENGE breaks the SPA's XHRs with "Unexpected token
# '<', '<!doctype '" because the schedule fetch lands on the challenge HTML.
# Custom policy: ALLOW the known JSON routes + SvelteKit `_app/` assets
# (which load before any user has a chance to solve PoW), CHALLENGE
# everything else (the HTML pages).
module "anubis" {
source = "../../modules/kubernetes/anubis_instance"
name = "f1"
namespace = kubernetes_namespace.f1-stream.metadata[0].name
target_url = "http://${kubernetes_service.f1-stream.metadata[0].name}.${kubernetes_namespace.f1-stream.metadata[0].name}.svc.cluster.local"
policy_yaml = <<-EOT
bots:
- import: (data)/bots/_deny-pathological.yaml
- import: (data)/bots/aggressive-brazilian-scrapers.yaml
- import: (data)/meta/ai-block-aggressive.yaml
- import: (data)/crawlers/_allow-good.yaml
- import: (data)/clients/x-firefox-ai.yaml
- import: (data)/common/keep-internet-working.yaml
# SvelteKit immutable assets (CSS/JS chunks) and OpenAPI/health routes
# served pre-cookie, must pass without challenge.
- name: f1-svelte-assets-and-meta
path_regex: ^/(_app/|openapi\.json|docs|api/)
action: ALLOW
# Application JSON routes XHR'd by the SPA after the user has solved
# the PoW for `/`. We allow them unconditionally because the alternative
# (carve-out per route via separate Ingress objects) is brittle and
# because the data they expose (stream URLs, schedule metadata) is not
# the AI-scraping target the HTML/SPA is.
- name: f1-data-routes
path_regex: ^/(embed|embed-asset|extract|extractors|health|proxy|relay|schedule|streams)(/|\?|$)
action: ALLOW
- name: catchall-challenge
path_regex: .*
action: CHALLENGE
EOT
}
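
Reviewer note: the two ALLOW patterns can be sanity-checked against sample paths before applying. This is a sketch only. It copies the two `path_regex` values from the policy, ignores the imported bot rules, and assumes rules are evaluated in order with the first match winning. `action_for` is a hypothetical helper, not part of Anubis.

```python
import re

# Patterns copied from the f1-svelte-assets-and-meta and f1-data-routes rules.
ASSETS = re.compile(r"^/(_app/|openapi\.json|docs|api/)")
DATA = re.compile(
    r"^/(embed|embed-asset|extract|extractors|health|proxy|relay|schedule|streams)(/|\?|$)"
)


def action_for(path: str) -> str:
    """Return ALLOW if either carve-out matches, else the catch-all CHALLENGE."""
    if ASSETS.search(path) or DATA.search(path):
        return "ALLOW"
    return "CHALLENGE"


SAMPLES = {p: action_for(p) for p in ["/", "/schedule", "/embed-asset/x.js", "/healthz"]}
```

Note that `docs` has no terminator in the first pattern, so a path such as `/docsfoo` would also be allowed under this reading.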
module "ingress" {
source = "../../modules/kubernetes/ingress_factory"
dns_type = "non-proxied"
namespace = kubernetes_namespace.f1-stream.metadata[0].name
name = "f1"
tls_secret_name = var.tls_secret_name
exclude_crowdsec = true
source = "../../modules/kubernetes/ingress_factory"
dns_type = "non-proxied"
namespace = kubernetes_namespace.f1-stream.metadata[0].name
name = "f1"
service_name = module.anubis.service_name
port = module.anubis.service_port
tls_secret_name = var.tls_secret_name
exclude_crowdsec = true
anti_ai_scraping = false
extra_middlewares = ["traefik-x402@kubernetescrd"]
extra_annotations = {
"gethomepage.dev/enabled" = "true"
"gethomepage.dev/name" = "F1 Stream"

@ -9,6 +9,10 @@ terraform {
source = "cloudflare/cloudflare"
version = "~> 4"
}
authentik = {
source = "goauthentik/authentik"
version = "~> 2024.10"
}
}
}

@ -6,6 +6,11 @@ variable "image_tag" {
variable "postgresql_host" { type = string }
variable "tls_secret_name" {
type = string
sensitive = true
}
locals {
namespace = "fire-planner"
# Phase 3 cutover 2026-05-07. NOTE: the registry-private repo for
@ -24,6 +29,10 @@ resource "kubernetes_namespace" "fire_planner" {
labels = {
tier = local.tiers.aux
"istio-injection" = "disabled"
# Lets us drive the deployed UI from the in-cluster chrome-service
# for headless verification (NetworkPolicy in chrome-service ns admits
# any namespace carrying this label).
"chrome-service.viktorbarzin.me/client" = "true"
}
}
lifecycle {
@ -68,6 +77,27 @@ resource "kubernetes_manifest" "external_secret" {
property = "recompute_bearer_token"
}
},
{
secretKey = "ACTUALBUDGET_API_URL"
remoteRef = {
key = "fire-planner"
property = "actualbudget_api_url"
}
},
{
secretKey = "ACTUALBUDGET_API_KEY"
remoteRef = {
key = "fire-planner"
property = "actualbudget_api_key"
}
},
{
secretKey = "ACTUALBUDGET_SYNC_ID"
remoteRef = {
key = "fire-planner"
property = "actualbudget_sync_id"
}
},
]
}
}
@ -117,6 +147,53 @@ resource "kubernetes_manifest" "db_external_secret" {
depends_on = [kubernetes_namespace.fire_planner]
}
# Read-only credentials for the wealthfolio_sync mirror DB (a separate
# Postgres database on the same CNPG cluster). The wealthfolio pod's
# pg-sync sidecar populates `daily_account_valuation` etc. hourly; the
# fire-planner ingest reads those tables via this role.
resource "kubernetes_manifest" "wealthfolio_sync_db_external_secret" {
manifest = {
apiVersion = "external-secrets.io/v1beta1"
kind = "ExternalSecret"
metadata = {
name = "wealthfolio-sync-db-creds"
namespace = local.namespace
}
spec = {
refreshInterval = "15m"
secretStoreRef = {
name = "vault-database"
kind = "ClusterSecretStore"
}
target = {
name = "wealthfolio-sync-db-creds"
template = {
metadata = {
annotations = {
"reloader.stakater.com/match" = "true"
}
}
data = {
WEALTHFOLIO_SYNC_DB_CONNECTION_STRING = "postgresql+asyncpg://wealthfolio_sync:{{ .password }}@${var.postgresql_host}:5432/wealthfolio_sync"
}
}
}
data = [{
secretKey = "password"
remoteRef = {
key = "static-creds/pg-wealthfolio-sync"
property = "password"
}
}]
}
}
depends_on = [kubernetes_namespace.fire_planner]
}
# tls-secret for fire-planner.viktorbarzin.me is auto-cloned into every
# namespace by Kyverno's `sync-tls-secret` ClusterPolicy, so no local module
# call is needed.
resource "kubernetes_deployment" "fire_planner" {
metadata {
name = "fire-planner"
@ -194,6 +271,11 @@ resource "kubernetes_deployment" "fire_planner" {
name = "fire-planner-db-creds"
}
}
env_from {
secret_ref {
name = "wealthfolio-sync-db-creds"
}
}
readiness_probe {
http_get {
@ -304,6 +386,11 @@ resource "kubernetes_cron_job_v1" "fire_planner_recompute" {
name = "fire-planner-db-creds"
}
}
env_from {
secret_ref {
name = "wealthfolio-sync-db-creds"
}
}
resources {
requests = {
@ -329,9 +416,51 @@ resource "kubernetes_cron_job_v1" "fire_planner_recompute" {
depends_on = [
kubernetes_manifest.external_secret,
kubernetes_manifest.db_external_secret,
kubernetes_manifest.wealthfolio_sync_db_external_secret,
]
}
# Public ingress at fire-planner.viktorbarzin.me. Authentik-protected
# (forward-auth at the Traefik layer); Cloudflare-proxied for CDN +
# DDoS shielding. Backend FastAPI serves the SPA at / and the API
# under /api/* (FRONTEND_DIST=/app/frontend_dist, baked into the image).
module "ingress" {
source = "../../modules/kubernetes/ingress_factory"
dns_type = "proxied"
namespace = kubernetes_namespace.fire_planner.metadata[0].name
name = "fire-planner"
port = 8080
tls_secret_name = var.tls_secret_name
protected = true
extra_annotations = {
"gethomepage.dev/enabled" = "true"
"gethomepage.dev/name" = "FIRE Planner"
"gethomepage.dev/description" = "Risk-adjusted retirement projections (ProjectionLab clone)"
"gethomepage.dev/icon" = "mdi-fire"
"gethomepage.dev/group" = "Finance"
}
}
# Second ingress at the same host for the /api/ prefix WITHOUT Authentik
# forward-auth. The SPA loads under Authentik (main ingress at /), then its
# fetch() XHRs hit /api/* directly; forward-auth on /api/* would 302 the
# XHR to a cross-origin Authentik login page, which fetch().json() can't
# parse. App-layer bearer auth still gates writes (POST/PATCH/DELETE on
# scenarios, /recompute, /simulate); read endpoints are open. Acceptable
# for a personal tool whose only data is anonymous numeric projections.
module "ingress_api" {
source = "../../modules/kubernetes/ingress_factory"
dns_type = "none"
namespace = kubernetes_namespace.fire_planner.metadata[0].name
name = "fire-planner-api"
host = "fire-planner" # share effective_host with main ingress
service_name = "fire-planner"
port = 8080
ingress_path = ["/api/"]
tls_secret_name = var.tls_secret_name
protected = false
}
# Plan-time read of the ESO-created K8s Secret for Grafana datasource
# password. First-apply gotcha: must
# `terragrunt apply -target=kubernetes_manifest.db_external_secret` so

@ -24,6 +24,14 @@ provider "registry.terraform.io/cloudflare/cloudflare" {
]
}
provider "registry.terraform.io/goauthentik/authentik" {
version = "2024.12.1"
constraints = "~> 2024.10"
hashes = [
"h1:roBMd+gi+TGgikH/bMzEI8JfvJiMAQWt+8FmokCrQIs=",
]
}
provider "registry.terraform.io/hashicorp/helm" {
version = "3.1.1"
hashes = [

@ -1,7 +1,7 @@
# Generated by Terragrunt. Sig: nIlQXj57tbuaRZEa
terraform {
backend "pg" {
conn_str = "postgres://terraform_state:SBlzGxotNUN6HH9d0S-m@10.0.20.200:5432/terraform_state?sslmode=disable"
conn_str = "postgres://terraform_state:ts7DGcKmTTY-5ujz4mhh@10.0.20.200:5432/terraform_state?sslmode=disable"
schema_name = "forgejo"
}
}

@ -40,10 +40,16 @@ resource "kubernetes_persistent_volume_claim" "data_encrypted" {
storage_class_name = "proxmox-lvm-encrypted"
resources {
requests = {
storage = "15Gi"
storage = "30Gi"
}
}
}
lifecycle {
# pvc-autoresizer expands this PVC up to storage_limit; ignore drift on
# requests.storage. To bump the floor manually: temporarily remove this
# block, apply the new size, re-add the block, apply again.
ignore_changes = [spec[0].resources[0].requests]
}
}
resource "kubernetes_deployment" "forgejo" {

@ -9,6 +9,10 @@ terraform {
source = "cloudflare/cloudflare"
version = "~> 4"
}
authentik = {
source = "goauthentik/authentik"
version = "~> 2024.10"
}
}
}

@ -24,6 +24,14 @@ provider "registry.terraform.io/cloudflare/cloudflare" {
]
}
provider "registry.terraform.io/goauthentik/authentik" {
version = "2024.12.1"
constraints = "~> 2024.10"
hashes = [
"h1:roBMd+gi+TGgikH/bMzEI8JfvJiMAQWt+8FmokCrQIs=",
]
}
provider "registry.terraform.io/hashicorp/helm" {
version = "3.1.1"
hashes = [

@ -1,7 +1,7 @@
# Generated by Terragrunt. Sig: nIlQXj57tbuaRZEa
terraform {
backend "pg" {
conn_str = "postgres://terraform_state:SBlzGxotNUN6HH9d0S-m@10.0.20.200:5432/terraform_state?sslmode=disable"
conn_str = "postgres://terraform_state:ts7DGcKmTTY-5ujz4mhh@10.0.20.200:5432/terraform_state?sslmode=disable"
schema_name = "freedify"
}
}

@ -9,6 +9,10 @@ terraform {
source = "cloudflare/cloudflare"
version = "~> 4"
}
authentik = {
source = "goauthentik/authentik"
version = "~> 2024.10"
}
}
}

@ -24,6 +24,14 @@ provider "registry.terraform.io/cloudflare/cloudflare" {
]
}
provider "registry.terraform.io/goauthentik/authentik" {
version = "2024.12.1"
constraints = "~> 2024.10"
hashes = [
"h1:roBMd+gi+TGgikH/bMzEI8JfvJiMAQWt+8FmokCrQIs=",
]
}
provider "registry.terraform.io/hashicorp/helm" {
version = "3.1.1"
hashes = [
@ -45,22 +53,9 @@ provider "registry.terraform.io/hashicorp/helm" {
}
provider "registry.terraform.io/hashicorp/kubernetes" {
version = "3.0.1"
version = "3.1.0"
hashes = [
"h1:P0c8knzZnouTNFIRij8IS7+pqd0OKaFDYX0j4GRsiqo=",
"h1:vyHdH0p6bf9xp1NPePObAJkXTJb/I09FQQmmevTzZe0=",
"zh:02d55b0b2238fd17ffa12d5464593864e80f402b90b31f6e1bd02249b9727281",
"zh:20b93a51bfeed82682b3c12f09bac3031f5bdb4977c47c97a042e4df4fb2f9ba",
"zh:6e14486ecfaee38c09ccf33d4fdaf791409f90795c1b66e026c226fad8bc03c7",
"zh:8d0656ff422df94575668e32c310980193fccb1c28117e5c78dd2d4050a760a6",
"zh:9795119b30ec0c1baa99a79abace56ac850b6e6fbce60e7f6067792f6eb4b5f4",
"zh:b388c87acc40f6bd9620f4e23f01f3c7b41d9b88a68d5255dec0a72f0bdec249",
"zh:b59abd0a980649c2f97f172392f080eaeb18e486b603f83bf95f5d93aeccc090",
"zh:ba6e3060fddf4a022087d8f09e38aa0001c705f21170c2ded3d1c26c12f70d97",
"zh:c12626d044b1d5501cf95ca78cbe507c13ad1dd9f12d4736df66eb8e5f336eb8",
"zh:c55203240d50f4cdeb3df1e1760630d677679f5b1a6ffd9eba23662a4ad05119",
"zh:ea206a5a32d6e0d6e32f1849ad703da9a28355d9c516282a8458b5cf1502b2a1",
"zh:f569b65999264a9416862bca5cd2a6177d94ccb0424f3a4ef424428912b9cb3c",
"h1:oodIAuFMikXNmEtil5MQgP4dfSctUBYQiGJfjbsF3NY=",
]
}

@ -9,6 +9,10 @@ terraform {
source = "cloudflare/cloudflare"
version = "~> 4"
}
authentik = {
source = "goauthentik/authentik"
version = "~> 2024.10"
}
}
}

@ -137,14 +137,24 @@ resource "kubernetes_service" "cache_proxy" {
}
}
module "anubis" {
source = "../../modules/kubernetes/anubis_instance"
name = "homepage"
namespace = kubernetes_namespace.homepage.metadata[0].name
target_url = "http://${kubernetes_service.cache_proxy.metadata[0].name}.${kubernetes_namespace.homepage.metadata[0].name}.svc.cluster.local"
}
module "ingress" {
source = "../../modules/kubernetes/ingress_factory"
namespace = kubernetes_namespace.homepage.metadata[0].name
name = "homepage"
host = "home"
dns_type = "proxied"
service_name = kubernetes_service.cache_proxy.metadata[0].name
tls_secret_name = var.tls_secret_name
source = "../../modules/kubernetes/ingress_factory"
namespace = kubernetes_namespace.homepage.metadata[0].name
name = "homepage"
host = "home"
dns_type = "proxied"
service_name = module.anubis.service_name
port = module.anubis.service_port
extra_middlewares = ["traefik-x402@kubernetescrd"]
tls_secret_name = var.tls_secret_name
anti_ai_scraping = false
extra_annotations = {
"gethomepage.dev/enabled" = "true"
"gethomepage.dev/name" = "Homepage"

@ -24,6 +24,14 @@ provider "registry.terraform.io/cloudflare/cloudflare" {
]
}
provider "registry.terraform.io/goauthentik/authentik" {
version = "2024.12.1"
constraints = "~> 2024.10"
hashes = [
"h1:roBMd+gi+TGgikH/bMzEI8JfvJiMAQWt+8FmokCrQIs=",
]
}
provider "registry.terraform.io/hashicorp/helm" {
version = "3.1.1"
hashes = [

@ -1,7 +1,7 @@
# Generated by Terragrunt. Sig: nIlQXj57tbuaRZEa
terraform {
backend "pg" {
conn_str = "postgres://terraform_state:SBlzGxotNUN6HH9d0S-m@10.0.20.200:5432/terraform_state?sslmode=disable"
conn_str = "postgres://terraform_state:tOvxJ-7fxdWq0p3jKeYB@10.0.20.200:5432/terraform_state?sslmode=disable"
schema_name = "immich"
}
}

@ -9,6 +9,10 @@ terraform {
source = "cloudflare/cloudflare"
version = "~> 4"
}
authentik = {
source = "goauthentik/authentik"
version = "~> 2024.10"
}
}
}

@ -0,0 +1,17 @@
variable "tls_secret_name" {
type = string
sensitive = true
}
variable "image_tag" {
type = string
default = "latest"
description = "instagram-poster image tag. Use 8-char git SHA in CI; :latest only for local trials."
}
module "instagram_poster" {
source = "./modules/instagram-poster"
tier = local.tiers.aux
tls_secret_name = var.tls_secret_name
image_tag = var.image_tag
}

@ -0,0 +1,324 @@
locals {
namespace = "instagram-poster"
# Forgejo registry consolidation (2026-05-07): all custom service images
# live under forgejo.viktorbarzin.me/viktor/<name>. The old 10.0.20.10
# private registry was decommissioned the same day.
image = "forgejo.viktorbarzin.me/viktor/instagram-poster:${var.image_tag}"
labels = {
app = "instagram-poster"
}
}
resource "kubernetes_namespace" "instagram_poster" {
metadata {
name = local.namespace
labels = {
tier = var.tier
"istio-injection" = "disabled"
}
}
lifecycle {
# KYVERNO_LIFECYCLE_V1: goldilocks-vpa-auto-mode ClusterPolicy stamps this label on every namespace
ignore_changes = [metadata[0].labels["goldilocks.fairwinds.com/vpa-update-mode"]]
}
}
# App secrets sourced from Vault KV `secret/instagram-poster`.
# Seed these manually in Vault before applying:
# secret/instagram-poster -> properties:
# - immich_api_key (required)
# - postiz_api_token (required)
# - immich_tag_instagram (optional; auto-resolved if missing)
# - immich_tag_posted (optional; auto-resolved if missing)
resource "kubernetes_manifest" "external_secret" {
manifest = {
apiVersion = "external-secrets.io/v1beta1"
kind = "ExternalSecret"
metadata = {
name = "instagram-poster-secrets"
namespace = local.namespace
}
spec = {
refreshInterval = "15m"
secretStoreRef = {
name = "vault-kv"
kind = "ClusterSecretStore"
}
target = {
name = "instagram-poster-secrets"
template = {
metadata = {
annotations = {
"reloader.stakater.com/match" = "true"
}
}
}
}
data = [
{
secretKey = "IMMICH_API_KEY"
remoteRef = { key = "instagram-poster", property = "immich_api_key" }
},
{
secretKey = "POSTIZ_API_TOKEN"
remoteRef = { key = "instagram-poster", property = "postiz_api_token" }
},
{
secretKey = "IMMICH_TAG_INSTAGRAM"
remoteRef = { key = "instagram-poster", property = "immich_tag_instagram" }
},
{
secretKey = "IMMICH_TAG_POSTED"
remoteRef = { key = "instagram-poster", property = "immich_tag_posted" }
},
{
secretKey = "TELEGRAM_BOT_TOKEN"
remoteRef = { key = "instagram-poster", property = "telegram_bot_token" }
},
{
secretKey = "TELEGRAM_CHAT_ID"
remoteRef = { key = "instagram-poster", property = "telegram_chat_id" }
},
{
secretKey = "POSTIZ_INTEGRATION_ID"
remoteRef = { key = "instagram-poster", property = "postiz_integration_id" }
},
{
secretKey = "IMMICH_PG_HOST"
remoteRef = { key = "instagram-poster", property = "immich_pg_host" }
},
{
secretKey = "IMMICH_PG_PORT"
remoteRef = { key = "instagram-poster", property = "immich_pg_port" }
},
{
secretKey = "IMMICH_PG_DATABASE"
remoteRef = { key = "instagram-poster", property = "immich_pg_database" }
},
{
secretKey = "IMMICH_PG_USER"
remoteRef = { key = "instagram-poster", property = "immich_pg_user" }
},
{
secretKey = "IMMICH_PG_PASSWORD"
remoteRef = { key = "instagram-poster", property = "immich_pg_password" }
},
]
}
}
depends_on = [kubernetes_namespace.instagram_poster]
}
# Persistent state: SQLite + image cache. Sensitive (API tokens may end up
# in cached images / debug logs), but the proxmox-lvm-encrypted SC is for
# user-data DBs; this is a small app cache so plain proxmox-lvm fits the
# infra/.claude/CLAUDE.md decision rule.
resource "kubernetes_persistent_volume_claim" "data" {
wait_until_bound = false
metadata {
name = "instagram-poster-data"
namespace = kubernetes_namespace.instagram_poster.metadata[0].name
annotations = {
"resize.topolvm.io/threshold" = "80%"
"resize.topolvm.io/increase" = "100%"
"resize.topolvm.io/storage_limit" = "20Gi"
}
}
spec {
access_modes = ["ReadWriteOnce"]
storage_class_name = "proxmox-lvm"
resources {
requests = {
storage = "10Gi"
}
}
}
}
resource "kubernetes_deployment" "instagram_poster" {
metadata {
name = "instagram-poster"
namespace = kubernetes_namespace.instagram_poster.metadata[0].name
labels = merge(local.labels, {
tier = var.tier
})
annotations = {
"reloader.stakater.com/search" = "true"
}
}
spec {
replicas = 1
# RWO PVC cannot rolling-update.
strategy {
type = "Recreate"
}
selector {
match_labels = local.labels
}
template {
metadata {
labels = local.labels
annotations = {
# Diun watches this image tag and POSTs to the auto-upgrade pipeline.
"diun.enable" = "true"
}
}
spec {
image_pull_secrets {
name = "registry-credentials"
}
# PVC mounts as root by default; pod runs as uid/gid 10001 (poster).
# fs_group makes kubelet chown the volume to gid 10001 on mount.
security_context {
fs_group = 10001
run_as_user = 10001
run_as_group = 10001
run_as_non_root = true
}
container {
name = "instagram-poster"
image = local.image
port {
container_port = 8000
}
env_from {
secret_ref {
name = "instagram-poster-secrets"
}
}
env {
name = "IMMICH_BASE_URL"
value = "https://immich.viktorbarzin.me"
}
env {
name = "POSTIZ_BASE_URL"
value = "http://postiz.postiz.svc.cluster.local"
}
env {
name = "PUBLIC_BASE_URL"
value = "https://instagram-poster.viktorbarzin.me"
}
env {
name = "DATA_DIR"
value = "/data"
}
env {
name = "LOG_LEVEL"
value = "INFO"
}
volume_mount {
name = "data"
mount_path = "/data"
}
readiness_probe {
http_get {
path = "/healthz"
port = 8000
}
initial_delay_seconds = 5
period_seconds = 10
}
liveness_probe {
http_get {
path = "/healthz"
port = 8000
}
initial_delay_seconds = 15
period_seconds = 20
}
resources {
requests = {
cpu = "50m"
memory = "128Mi"
}
# Pillow full-resolution HEIC decode peaks ~600-800Mi for big phone
# photos; 512Mi was OOMKilling on /original requests.
limits = {
memory = "1500Mi"
}
}
}
volume {
name = "data"
persistent_volume_claim {
claim_name = kubernetes_persistent_volume_claim.data.metadata[0].name
}
}
}
}
}
lifecycle {
ignore_changes = [spec[0].template[0].spec[0].dns_config] # KYVERNO_LIFECYCLE_V1
}
depends_on = [
kubernetes_manifest.external_secret,
]
}
resource "kubernetes_service" "instagram_poster" {
metadata {
name = "instagram-poster"
namespace = kubernetes_namespace.instagram_poster.metadata[0].name
labels = local.labels
}
spec {
type = "ClusterIP"
selector = local.labels
port {
name = "http"
port = 80
target_port = 8000
}
}
}
# Two ingresses on the same host; Traefik picks the longest path prefix.
#
# `/image/*` must be reachable WITHOUT auth so Meta's content fetcher (and
# Telegram's photo preview) can render the 9:16 derivatives we produce.
# Everything else (/queue, /scan, /enqueue, /post-next, /reject, /healthz)
# sits behind Authentik forward-auth same defense as every other UI on
# the cluster, no random caller can pop items off the approval queue.
module "ingress_image_public" {
source = "../../../../modules/kubernetes/ingress_factory"
dns_type = "proxied"
namespace = kubernetes_namespace.instagram_poster.metadata[0].name
name = "instagram-poster-image"
host = "instagram-poster"
tls_secret_name = var.tls_secret_name
protected = false
ingress_path = ["/image", "/original"]
port = 80
service_name = "instagram-poster"
}
module "ingress_protected" {
source = "../../../../modules/kubernetes/ingress_factory"
dns_type = "none" # DNS record already created by the public ingress above
namespace = kubernetes_namespace.instagram_poster.metadata[0].name
name = "instagram-poster"
host = "instagram-poster"
tls_secret_name = var.tls_secret_name
protected = true
ingress_path = ["/"]
port = 80
service_name = "instagram-poster"
}
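The routing rule described in the comment above (two ingresses on one host, longest path prefix wins) can be illustrated with a minimal sketch. This is not Traefik's implementation: the `ROUTES` table and `requires_auth` helper are hypothetical names, and the sketch assumes priority is simply the length of the matching prefix, which approximates Traefik's default rule-length priority for this pair of ingresses.

```python
# Hypothetical route table mirroring the two ingress modules above:
# (path prefix, protected by forward-auth?)
ROUTES = [
    ("/image", False),
    ("/original", False),
    ("/", True),
]


def requires_auth(path: str) -> bool:
    """Return the auth flag of the longest prefix that matches `path`."""
    matches = [(prefix, protected) for prefix, protected in ROUTES
               if path.startswith(prefix)]
    # "/" matches every absolute path, so `matches` is never empty here.
    _, protected = max(matches, key=lambda item: len(item[0]))
    return protected


public_image = requires_auth("/image/abc.jpg")   # public derivative
queue_page = requires_auth("/queue")              # falls through to "/"
```

Under these assumptions, only the catch-all `/` route carries the forward-auth middleware, so every path that does not start with one of the public prefixes stays protected.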

View file

@ -0,0 +1,15 @@
variable "tls_secret_name" {
type = string
sensitive = true
}
variable "image_tag" {
type = string
default = "latest"
description = "instagram-poster image tag. Use 8-char git SHA in CI; :latest only for local trials."
}
variable "tier" {
type = string
default = "4-aux"
}

View file

@ -0,0 +1 @@
../../secrets

View file

@ -0,0 +1,23 @@
include "root" {
path = find_in_parent_folders()
}
dependency "platform" {
config_path = "../platform"
skip_outputs = true
}
dependency "vault" {
config_path = "../vault"
skip_outputs = true
}
dependency "external-secrets" {
config_path = "../external-secrets"
skip_outputs = true
}
inputs = {
# Bump per deploy. Use 8-char git SHA; :latest causes stale pull-through cache.
image_tag = "da5b4191"
}

View file

@ -294,18 +294,52 @@ resource "kubernetes_service" "job_hunter" {
}
}
# Plan-time read of the ESO-created DB creds Secret for Grafana datasource.
# First apply: -target=kubernetes_manifest.db_external_secret first so the Secret exists.
data "kubernetes_secret" "job_hunter_db_creds" {
metadata {
name = "job-hunter-db-creds"
namespace = kubernetes_namespace.job_hunter.metadata[0].name
# ExternalSecret in the monitoring namespace mirroring the rotating
# job_hunter DB password. Grafana mounts this via envFromSecrets in
# monitoring/grafana_chart_values.yaml; the datasource ConfigMap below
# references it as $__env{JOB_HUNTER_PG_PASSWORD}. Reloader restarts
# Grafana whenever ESO updates this secret (every 7d on rotation).
resource "kubernetes_manifest" "grafana_job_hunter_db_external_secret" {
manifest = {
apiVersion = "external-secrets.io/v1beta1"
kind = "ExternalSecret"
metadata = {
name = "grafana-job-hunter-pg-creds"
namespace = "monitoring"
}
spec = {
refreshInterval = "15m"
secretStoreRef = {
name = "vault-database"
kind = "ClusterSecretStore"
}
target = {
name = "grafana-job-hunter-pg-creds"
template = {
metadata = {
annotations = {
"reloader.stakater.com/match" = "true"
}
}
data = {
JOB_HUNTER_PG_PASSWORD = "{{ .password }}"
}
}
}
data = [{
secretKey = "password"
remoteRef = {
key = "static-creds/pg-job-hunter"
property = "password"
}
}]
}
}
depends_on = [kubernetes_manifest.db_external_secret]
}
# Grafana datasource for the job_hunter Postgres DB. Lives in the monitoring
# namespace so the grafana sidecar (label grafana_datasource=1) picks it up.
# Password is injected via $__env{...} from grafana-job-hunter-pg-creds (above).
resource "kubernetes_config_map" "grafana_job_hunter_datasource" {
metadata {
name = "grafana-job-hunter-datasource"
@ -333,10 +367,11 @@ resource "kubernetes_config_map" "grafana_job_hunter_datasource" {
timescaledb = false
}
secureJsonData = {
password = data.kubernetes_secret.job_hunter_db_creds.data["DB_PASSWORD"]
password = "$__env{JOB_HUNTER_PG_PASSWORD}"
}
editable = true
}]
})
}
depends_on = [kubernetes_manifest.grafana_job_hunter_db_external_secret]
}

View file

@ -84,12 +84,23 @@ resource "kubernetes_service" "jsoncrack" {
}
}
module "anubis" {
source = "../../modules/kubernetes/anubis_instance"
name = "json"
namespace = kubernetes_namespace.jsoncrack.metadata[0].name
target_url = "http://${kubernetes_service.jsoncrack.metadata[0].name}.${kubernetes_namespace.jsoncrack.metadata[0].name}.svc.cluster.local"
}
module "ingress" {
source = "../../modules/kubernetes/ingress_factory"
dns_type = "proxied"
namespace = kubernetes_namespace.jsoncrack.metadata[0].name
name = "json"
tls_secret_name = var.tls_secret_name
source = "../../modules/kubernetes/ingress_factory"
dns_type = "proxied"
namespace = kubernetes_namespace.jsoncrack.metadata[0].name
name = "json"
service_name = module.anubis.service_name
port = module.anubis.service_port
extra_middlewares = ["traefik-x402@kubernetescrd"]
tls_secret_name = var.tls_secret_name
anti_ai_scraping = false
extra_annotations = {
"gethomepage.dev/enabled" = "true"
"gethomepage.dev/name" = "JSON Crack"

View file

@ -29,6 +29,20 @@ provider "registry.terraform.io/goauthentik/authentik" {
constraints = "~> 2024.10"
hashes = [
"h1:roBMd+gi+TGgikH/bMzEI8JfvJiMAQWt+8FmokCrQIs=",
"zh:090260dc7889ea822ec1d899344e1ee23eba5290461989c0796149c9511f2316",
"zh:13c2655ff824b0dc4b9bb832b5ca6d41dba97cb280330258c5fef4115e236209",
"zh:166a73c3a810c9c895d68a8ff968158f339f8a2c1c03e20ec9fc5ed99cc64e20",
"zh:203777eae1cdc711233315499643180604cff2324411b186b7cf07fdbe16f655",
"zh:3b2f18c9a8d28dac74dc6bbf168c946855ab9c68f053578d4630c50d5eaf30a0",
"zh:4822275985f6b74b6196c47112316a4252db22cf4ceaef7c9ab4c66d488abf2f",
"zh:53ea97562666c8a5a2f6d63d418a302a7f8ee4b7bb7da35dedaa89aa5708b7f0",
"zh:56b8a230901e3550c92a1d3f58ee9dafe9853f30fe4315af3ab28ae63262e15d",
"zh:6293ab7b1fd8206a0c853591f50186aca4a1eff117b2a773e10760a23a2c83e9",
"zh:9433970f79fb92d8aae3ee436db5630ab312c78b6dc9df9c1db3273a18f8aaa1",
"zh:95df406214f79b3b98222d7c7fe8fc319a3d90b7a9d53e1d5abbda5dfb8b9436",
"zh:a85880da0552a42c8f449390fbd7d8b03541d1a13e04bba9f1404fa658754260",
"zh:a95f6e9bd62c67e70eba1b1a14728856b9a6a28cd1e5e3be54a7718882c87e7f",
"zh:dd599b51c5beb34a4c6feece244fde07d2558d69929449ab1fd39a5ebe738781",
]
}
@ -56,6 +70,18 @@ provider "registry.terraform.io/hashicorp/kubernetes" {
version = "3.1.0"
hashes = [
"h1:oodIAuFMikXNmEtil5MQgP4dfSctUBYQiGJfjbsF3NY=",
"zh:0215c5c60be62028c09a2f22458e89cda3ef5830a632299f1d401eb3538874b0",
"zh:09ebb9f442431e278a310a9423f32caf467cb4b3cad3fe59573ca71fa7b14e20",
"zh:0c4e5912f83bb35846ae0a9ae54fc320706ee61894cd21cc6b4181b1c5a2fa5c",
"zh:1678c982853ad461e65ccb5e79d585e13ed109dd47dab2a66d3a7a304faeef65",
"zh:1c050a5c15e330457a9c18caacf61a923c59d663e13f2962e4b32f04fef523a0",
"zh:2c55bcec83be58ec132c7cb0a1ac644758b800d794fdc636d53a0eada0358a3a",
"zh:a062bb0aa316c08d8460c66a5d68da71da40de5d3bc3b31abcf3a1a9a19650f1",
"zh:a26fdea0afaa9b247c73c0b42843ca51ba7db0ac2571f9d3d50dcabd20ca1b98",
"zh:c872c9385a78d502bf5823d61cd3bb0f9a0585030e025eb12585c83451beeaa1",
"zh:f180879af931182beee4c8c0d9dab62b81d86f17ddcbe3786ef4c7cec9163a4e",
"zh:f569b65999264a9416862bca5cd2a6177d94ccb0424f3a4ef424428912b9cb3c",
"zh:f70f5789264069e0eef06f9b5d5fde955ef7206f7d446d1ce51a4c37a3f3e02f",
]
}

View file

@ -1,7 +1,7 @@
# Generated by Terragrunt. Sig: nIlQXj57tbuaRZEa
terraform {
backend "pg" {
conn_str = "postgres://terraform_state:SBlzGxotNUN6HH9d0S-m@10.0.20.200:5432/terraform_state?sslmode=disable"
conn_str = "postgres://terraform_state:ts7DGcKmTTY-5ujz4mhh@10.0.20.200:5432/terraform_state?sslmode=disable"
schema_name = "kms"
}
}

View file

@ -0,0 +1,222 @@
#!/usr/bin/env python3
"""
Tail vlmcsd verbose log; post a Slack message per activation, and expose
Prometheus metrics on /metrics for activation counts.

vlmcsd verbose output emits a multi-line block per request:

    <ts>: IPv4 connection accepted: <ip>:<port>.
    <ts>: <<< Incoming KMS request
    <ts>: Application ID : <uuid> (<name>)
    <ts>: Activation ID (Product): <uuid> (<product>)
    <ts>: Workstation name : <hostname>
    ...
    <ts>: IPv4 connection closed: <ip>:<port>.

We accumulate per-connection state and emit on close. Dedupes by
(client_ip, product) within DEDUP_WINDOW_SECONDS to avoid spam from
Windows' default 7-day re-activation cycle hitting us repeatedly.

Prometheus metrics (text format, no client_ip label cardinality):

    kms_activations_total{product, status}         counter
    kms_activations_dedup_skipped_total{product}   counter
    kms_last_activation_timestamp_seconds          gauge
    kms_slack_notifier_up                          gauge (heartbeat)
"""

import json
import os
import re
import sys
import threading
import time
import urllib.error
import urllib.request
from collections import OrderedDict
from http.server import BaseHTTPRequestHandler, HTTPServer

LOG_PATH = os.environ.get("VLMCSD_LOG", "/var/log/vlmcsd/vlmcsd.log")
WEBHOOK = os.environ["SLACK_WEBHOOK_URL"]
CHANNEL = os.environ.get("SLACK_CHANNEL", "#alerts")
DEDUP_WINDOW = int(os.environ.get("DEDUP_WINDOW_SECONDS", "3600"))
DEDUP_MAX = 4096
METRICS_PORT = int(os.environ.get("METRICS_PORT", "9101"))

OPEN_RE = re.compile(r":\s*IPv[46] connection accepted:\s*([0-9a-f.:\[\]]+):\d+")
CLOSE_RE = re.compile(r":\s*IPv[46] connection closed:\s*([0-9a-f.:\[\]]+):\d+")
APP_RE = re.compile(r":\s*Application ID\s*:\s*[0-9a-f-]+\s*\(([^)]+)\)")
PROD_RE = re.compile(r":\s*Activation ID \(Product\)\s*:\s*[0-9a-f-]+\s*\(([^)]+)\)")
HOST_RE = re.compile(r":\s*Workstation name\s*:\s*(.+?)\s*$")
STATUS_RE = re.compile(r":\s*Licensing status\s*:\s*\d+\s*\((.+?)\)\s*$")

_metrics_lock = threading.Lock()
_activations: dict = {}
_dedup_skipped: dict = {}
_last_activation_ts: float = 0.0


def _esc(value: str) -> str:
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def record_activation(product: str, status: str) -> None:
    global _last_activation_ts
    with _metrics_lock:
        key = (product, status)
        _activations[key] = _activations.get(key, 0) + 1
        _last_activation_ts = time.time()


def record_dedup_skip(product: str) -> None:
    with _metrics_lock:
        _dedup_skipped[product] = _dedup_skipped.get(product, 0) + 1


def render_metrics() -> bytes:
    out = []
    # Snapshot the counters under the lock, then render without holding it.
    with _metrics_lock:
        activations = dict(_activations)
        dedup_skipped = dict(_dedup_skipped)
        last_ts = _last_activation_ts
    out.append("# HELP kms_activations_total KMS activation events that resulted in a Slack post.")
    out.append("# TYPE kms_activations_total counter")
    for (product, status), count in sorted(activations.items()):
        out.append(
            f'kms_activations_total{{product="{_esc(product)}",status="{_esc(status)}"}} {count}'
        )
    out.append("# HELP kms_activations_dedup_skipped_total KMS activation events suppressed by dedup window.")
    out.append("# TYPE kms_activations_dedup_skipped_total counter")
    for product, count in sorted(dedup_skipped.items()):
        out.append(f'kms_activations_dedup_skipped_total{{product="{_esc(product)}"}} {count}')
    out.append("# HELP kms_last_activation_timestamp_seconds Unix ts of the last non-deduped activation.")
    out.append("# TYPE kms_last_activation_timestamp_seconds gauge")
    out.append(f"kms_last_activation_timestamp_seconds {last_ts}")
    out.append("# HELP kms_slack_notifier_up 1 while the notifier process is running.")
    out.append("# TYPE kms_slack_notifier_up gauge")
    out.append("kms_slack_notifier_up 1")
    return ("\n".join(out) + "\n").encode("utf-8")


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz":
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(b"ok\n")
            return
        if self.path != "/metrics":
            self.send_response(404)
            self.end_headers()
            return
        body = render_metrics()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args, **kwargs):
        # Silence per-request access logging from the scrape endpoint.
        pass


def start_metrics_server() -> None:
    server = HTTPServer(("0.0.0.0", METRICS_PORT), MetricsHandler)
    print(f"[slack-notifier] metrics on :{METRICS_PORT}/metrics", flush=True)
    server.serve_forever()


def slack_post(text: str) -> None:
    payload = json.dumps({"channel": CHANNEL, "text": text, "username": "kms.viktorbarzin.me", "icon_emoji": ":computer:"}).encode("utf-8")
    req = urllib.request.Request(WEBHOOK, data=payload, headers={"Content-Type": "application/json"})
    try:
        urllib.request.urlopen(req, timeout=10).read()
    except (urllib.error.URLError, TimeoutError) as exc:
        # A read timeout can surface as a bare TimeoutError rather than a
        # URLError; catch both so a slow webhook never kills the tail loop.
        print(f"[slack] post failed: {exc}", file=sys.stderr)


class DedupCache(OrderedDict):
    def should_send(self, key: str) -> bool:
        now = time.time()
        # Entries are kept in insertion order, so the oldest is always first.
        while self and (now - next(iter(self.values()))) > DEDUP_WINDOW:
            self.popitem(last=False)
        if key in self and (now - self[key]) < DEDUP_WINDOW:
            return False
        if len(self) >= DEDUP_MAX:
            self.popitem(last=False)
        self[key] = now
        self.move_to_end(key)
        return True


def follow(path: str):
    while not os.path.exists(path):
        time.sleep(1)
    fh = open(path, "r", encoding="utf-8", errors="replace")
    fh.seek(0, 2)  # start at end of file; only new lines are reported
    inode = os.fstat(fh.fileno()).st_ino
    while True:
        line = fh.readline()
        if line:
            yield line.rstrip("\n")
            continue
        time.sleep(0.5)
        try:
            # Reopen if the log was rotated (path now points at a new inode).
            new_inode = os.stat(path).st_ino
            if new_inode != inode:
                fh.close()
                fh = open(path, "r", encoding="utf-8", errors="replace")
                inode = new_inode
        except FileNotFoundError:
            time.sleep(1)


def main() -> None:
    threading.Thread(target=start_metrics_server, daemon=True).start()
    dedup = DedupCache()
    print(f"[slack-notifier] tailing {LOG_PATH}, posting to {CHANNEL} via Slack", flush=True)
    state: dict = {}
    for line in follow(LOG_PATH):
        if (m := OPEN_RE.search(line)):
            state = {"ip": m.group(1)}
            continue
        if not state:
            continue
        if (m := APP_RE.search(line)):
            state["app"] = m.group(1)
        elif (m := PROD_RE.search(line)):
            state["product"] = m.group(1)
        elif (m := HOST_RE.search(line)):
            state["host"] = m.group(1)
        elif (m := STATUS_RE.search(line)):
            state["status"] = m.group(1)
        elif CLOSE_RE.search(line):
            ip = state.get("ip", "?")
            product = state.get("product", state.get("app", "unknown"))
            host = state.get("host", "?")
            status = state.get("status", "unknown")
            key = f"{ip}|{product}"
            if dedup.should_send(key):
                text = (
                    f":computer: KMS activation\n"
                    f"• *Client*: `{ip}`\n"
                    f"• *Workstation*: `{host}`\n"
                    f"• *Product*: `{product}`\n"
                    f"• *Status before*: {status}"
                )
                slack_post(text)
                record_activation(product, status)
                print(f"[slack-notifier] sent: ip={ip} product={product!r} host={host!r}", flush=True)
            else:
                record_dedup_skip(product)
                print(f"[slack-notifier] dedup-skip: ip={ip} product={product!r}", flush=True)
            state = {}


if __name__ == "__main__":
    main()
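
The dedup window in the notifier above can be exercised in isolation with a small sketch. It is a simplified variant, not the script's own class: `WindowDedup` is a hypothetical name, the clock is passed in as `now` so the behaviour is deterministic, and the client IP and product string are made-up sample values.

```python
from collections import OrderedDict


class WindowDedup:
    """Suppress repeats of the same key inside a sliding time window."""

    def __init__(self, window_seconds: float, max_entries: int = 4096):
        self.window = window_seconds
        self.max_entries = max_entries
        self._seen = OrderedDict()  # key -> timestamp of last accepted send

    def should_send(self, key: str, now: float) -> bool:
        # Drop expired entries from the oldest end.
        while self._seen and now - next(iter(self._seen.values())) > self.window:
            self._seen.popitem(last=False)
        # A key seen inside the window is a duplicate.
        if key in self._seen and now - self._seen[key] < self.window:
            return False
        # Bound memory: evict the oldest entry when the cache is full.
        if len(self._seen) >= self.max_entries:
            self._seen.popitem(last=False)
        self._seen[key] = now
        self._seen.move_to_end(key)
        return True


dedup = WindowDedup(window_seconds=3600)
key = "10.0.0.5|Windows 10 Professional"    # hypothetical "<ip>|<product>" key
first = dedup.should_send(key, now=0)       # new key: send
repeat = dedup.should_send(key, now=1800)   # inside the window: suppress
later = dedup.should_send(key, now=3601)    # window elapsed: send again
```

Passing the timestamp explicitly is only a testing convenience; the script itself reads `time.time()` inside `should_send`.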

View file

@ -103,12 +103,23 @@ resource "kubernetes_service" "kms-web-page" {
}
}
module "anubis" {
source = "../../modules/kubernetes/anubis_instance"
name = "kms"
namespace = kubernetes_namespace.kms.metadata[0].name
target_url = "http://${kubernetes_service.kms-web-page.metadata[0].name}.${kubernetes_namespace.kms.metadata[0].name}.svc.cluster.local"
}
module "ingress" {
source = "../../modules/kubernetes/ingress_factory"
dns_type = "non-proxied"
namespace = kubernetes_namespace.kms.metadata[0].name
name = "kms"
tls_secret_name = var.tls_secret_name
source = "../../modules/kubernetes/ingress_factory"
dns_type = "non-proxied"
namespace = kubernetes_namespace.kms.metadata[0].name
name = "kms"
service_name = module.anubis.service_name
port = module.anubis.service_port
extra_middlewares = ["traefik-x402@kubernetescrd"]
tls_secret_name = var.tls_secret_name
anti_ai_scraping = false
extra_annotations = {
"gethomepage.dev/enabled" = "true"
"gethomepage.dev/name" = "KMS"
@ -119,6 +130,46 @@ module "ingress" {
}
}
resource "kubernetes_config_map" "kms_slack_notifier" {
metadata {
name = "kms-slack-notifier"
namespace = kubernetes_namespace.kms.metadata[0].name
}
data = {
"notifier.py" = file("${path.module}/files/slack-notifier.py")
}
}
resource "kubernetes_manifest" "kms_slack_external_secret" {
manifest = {
apiVersion = "external-secrets.io/v1beta1"
kind = "ExternalSecret"
metadata = {
name = "kms-slack-webhook"
namespace = kubernetes_namespace.kms.metadata[0].name
}
spec = {
refreshInterval = "1h"
secretStoreRef = {
name = "vault-kv"
kind = "ClusterSecretStore"
}
target = {
name = "kms-slack-webhook"
creationPolicy = "Owner"
}
data = [{
secretKey = "url"
remoteRef = {
key = "kms"
property = "slack_webhook_url"
}
}]
}
}
depends_on = [kubernetes_namespace.kms]
}
resource "kubernetes_deployment" "windows_kms" {
metadata {
name = "kms"
@ -140,11 +191,31 @@ resource "kubernetes_deployment" "windows_kms" {
labels = {
app = "kms-service"
}
annotations = {
# Reload pods when the notifier script changes
"checksum/notifier" = sha1(file("${path.module}/files/slack-notifier.py"))
# Prometheus scrape: the kubernetes-pods job picks this up via pod IP
"prometheus.io/scrape" = "true"
"prometheus.io/port" = "9101"
"prometheus.io/path" = "/metrics"
}
}
spec {
volume {
name = "vlmcsd-log"
empty_dir {}
}
volume {
name = "slack-notifier-script"
config_map {
name = kubernetes_config_map.kms_slack_notifier.metadata[0].name
}
}
container {
image = "kebe/vlmcsd:latest"
name = "windows-kms"
image = "kebe/vlmcsd:latest"
name = "windows-kms"
command = ["/usr/bin/vlmcsd"]
args = ["-D", "-v", "-l", "/var/log/vlmcsd/vlmcsd.log"]
resources {
limits = {
memory = "64Mi"
@ -157,6 +228,59 @@ resource "kubernetes_deployment" "windows_kms" {
port {
container_port = 1688
}
volume_mount {
name = "vlmcsd-log"
mount_path = "/var/log/vlmcsd"
}
}
container {
image = "python:3.12-alpine"
name = "slack-notifier"
command = ["python3", "-u", "/scripts/notifier.py"]
env {
name = "VLMCSD_LOG"
value = "/var/log/vlmcsd/vlmcsd.log"
}
env {
name = "SLACK_CHANNEL"
value = "#alerts"
}
env {
name = "DEDUP_WINDOW_SECONDS"
value = "3600"
}
env {
name = "SLACK_WEBHOOK_URL"
value_from {
secret_key_ref {
name = "kms-slack-webhook"
key = "url"
}
}
}
port {
container_port = 9101
name = "metrics"
}
resources {
limits = {
memory = "64Mi"
}
requests = {
cpu = "5m"
memory = "48Mi"
}
}
volume_mount {
name = "vlmcsd-log"
mount_path = "/var/log/vlmcsd"
read_only = true
}
volume_mount {
name = "slack-notifier-script"
mount_path = "/scripts"
read_only = true
}
}
}
}
@ -165,6 +289,7 @@ resource "kubernetes_deployment" "windows_kms" {
# KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
ignore_changes = [spec[0].template[0].spec[0].dns_config]
}
depends_on = [kubernetes_manifest.kms_slack_external_secret]
}
resource "kubernetes_service" "windows_kms" {
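
The `checksum/notifier` annotation in the deployment above relies on a simple property: a change to the script text changes its SHA-1 digest, which changes the pod template and so triggers a rollout. A minimal sketch of that property follows; `notifier_checksum` is a hypothetical helper and the two script bodies are made-up stand-ins for successive versions of `slack-notifier.py`.

```python
import hashlib


def notifier_checksum(script_text: str) -> str:
    """Hex SHA-1 of the script text, assumed to be UTF-8 encoded."""
    return hashlib.sha1(script_text.encode("utf-8")).hexdigest()


# Hypothetical script contents before and after an edit.
before = notifier_checksum("print('v1')\n")
after = notifier_checksum("print('v2')\n")
unchanged = notifier_checksum("print('v1')\n")

# The annotation value only moves when the script text moves.
needs_rollout = before != after
```

Without such an annotation, updating only the ConfigMap would leave the pod template untouched, so running pods would keep the old script until something else restarted them.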

View file

@ -9,6 +9,10 @@ terraform {
source = "cloudflare/cloudflare"
version = "~> 4"
}
authentik = {
source = "goauthentik/authentik"
version = "~> 2024.10"
}
}
}

View file

@ -293,7 +293,7 @@ resource "kubernetes_persistent_volume_claim" "data_encrypted" {
annotations = {
"resize.topolvm.io/threshold" = "80%"
"resize.topolvm.io/increase" = "100%"
"resize.topolvm.io/storage_limit" = "5Gi"
"resize.topolvm.io/storage_limit" = "10Gi"
}
}
spec {
@ -301,10 +301,16 @@ resource "kubernetes_persistent_volume_claim" "data_encrypted" {
storage_class_name = "proxmox-lvm-encrypted"
resources {
requests = {
storage = "2Gi"
storage = "5Gi"
}
}
}
lifecycle {
# pvc-autoresizer expands this PVC up to storage_limit; ignore drift on
# requests.storage. To bump the floor manually: temporarily remove this
# block, apply the new size, re-add the block, apply again.
ignore_changes = [spec[0].resources[0].requests]
}
}
resource "kubernetes_deployment" "mailserver" {
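
The effect of the new annotations can be worked through with a short sketch. It assumes the autoresizer grows a PVC by `increase` percent of its current size and never exceeds `storage_limit`; the function names are hypothetical, and details such as the trigger threshold or any size rounding are deliberately left out.

```python
def next_size_gi(current_gi: float, increase_pct: float, limit_gi: float) -> float:
    """One resize step: grow by `increase_pct` percent, capped at the limit."""
    grown = current_gi * (1 + increase_pct / 100)
    return min(grown, limit_gi)


def growth_path(start_gi: float, increase_pct: float, limit_gi: float) -> list:
    """Sizes the PVC would pass through until it reaches the limit."""
    if increase_pct <= 0:
        raise ValueError("increase_pct must be positive")
    sizes = [start_gi]
    while sizes[-1] < limit_gi:
        sizes.append(next_size_gi(sizes[-1], increase_pct, limit_gi))
    return sizes


old_path = growth_path(2, 100, 5)    # previous request and limit
new_path = growth_path(5, 100, 10)   # values introduced in this change
```

Under these assumptions the new settings leave room for exactly one doubling (5Gi to 10Gi), after which the PVC sits at its limit and further growth needs a manual bump of `storage_limit`.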

View file

@ -22,6 +22,7 @@ topologySpreadConstraints:
app.kubernetes.io/name: grafana
podAnnotations:
dependency.kyverno.io/wait-for: "mysql.dbaas:3306"
reloader.stakater.com/auto: "true"
podDisruptionBudget:
maxUnavailable: 1
persistence:
@ -72,6 +73,19 @@ dashboardProviders:
envFromSecrets:
- name: grafana-db-creds
optional: false
# Cross-namespace passwords for provisioned datasources backed by
# rotating Vault static-roles. Each source stack creates the secret
# via its own ExternalSecret in `monitoring`. `optional: true` lets
# Grafana boot if a stack hasn't applied yet; reloader (podAnnotation
# above) restarts Grafana when any of these secrets is created or
# rotated, so $__env{...} substitution in datasource ConfigMaps stays
# current.
- name: grafana-wealth-pg-creds
optional: true
- name: grafana-payslips-pg-creds
optional: true
- name: grafana-job-hunter-pg-creds
optional: true
env:
GF_SERVER_ROOT_URL: https://grafana.viktorbarzin.me

View file

@ -83,7 +83,7 @@ alertmanager:
- source_matchers:
- alertname = TraefikDown
target_matchers:
- alertname =~ "HighServiceErrorRate|HighService4xxRate|HighServiceLatency|TraefikHighOpenConnections"
- alertname =~ "HighServiceErrorRate|HighService4xxRate|HighServiceLatency|TraefikHighOpenConnections|IngressTTFBHigh|IngressTTFBCritical|IngressErrorRate5xxHigh|AnubisChallengeStoreErrors"
# Traefik down makes ForwardAuth alerts redundant
- source_matchers:
- alertname = TraefikDown
@ -380,8 +380,11 @@ serverFiles:
regex: 'kubernetes_feature_enabled|kubelet_container_log_filesystem_used_bytes'
action: drop
# Whitelist: only keep essential kubelet metrics
# kubelet_volume_stats_available_bytes is required by pvc-autoresizer
# (it computes utilization as 1 - available/capacity). Without it the
# autoresizer is silent for every PVC in the cluster.
- source_labels: [__name__]
regex: 'kubelet_volume_stats_capacity_bytes|kubelet_volume_stats_used_bytes|kubelet_volume_stats_inodes_used|kubelet_running_containers|kubelet_runtime_operations_errors_total|process_cpu_seconds_total|process_resident_memory_bytes|process_start_time_seconds|go_memstats_alloc_bytes|up'
regex: 'kubelet_volume_stats_capacity_bytes|kubelet_volume_stats_used_bytes|kubelet_volume_stats_available_bytes|kubelet_volume_stats_inodes_used|kubelet_running_containers|kubelet_runtime_operations_errors_total|process_cpu_seconds_total|process_resident_memory_bytes|process_start_time_seconds|go_memstats_alloc_bytes|up'
action: keep
- job_name: kubernetes-nodes-cadvisor
scheme: https
@ -1879,6 +1882,71 @@ serverFiles:
# summary: OpenWRT high memory usage. Can cause services getting stuck.
# MailServerDown, HackmdDown, PrivatebinDown moved to "Application Health" group
# New Tailscale client moved to "Infrastructure Health" group
- name: "Slow Ingress Latency"
# Per-host slow-latency + Anubis-specific 5xx alerts. Sourced from
# `traefik_service_*` metrics scraped via `kubernetes-pods` (only fresh
# samples we have — `*_bucket` series are scraped but the `traefik`
# job's metric_relabel drops them, so `histogram_quantile` produces no
# samples). Once buckets are restored, replace the avg expressions with
# `histogram_quantile(0.95, ...)`. The `service` label format is
# `<ns>-<release>-<port>@kubernetes` and maps roughly 1:1 to a public
# host (e.g. `travel-blog-anubis-travel-8080@kubernetes`).
rules:
- alert: IngressTTFBHigh
expr: |
(
sum(rate(traefik_service_request_duration_seconds_sum{service!~".*idrac.*|.*headscale.*|.*nextcloud.*|.*immich.*",protocol!="websocket"}[5m])) by (service)
/ sum(rate(traefik_service_request_duration_seconds_count{service!~".*idrac.*|.*headscale.*|.*nextcloud.*|.*immich.*",protocol!="websocket"}[5m])) by (service)
) > 1
and sum(rate(traefik_service_request_duration_seconds_count{service!~".*idrac.*|.*headscale.*|.*nextcloud.*|.*immich.*",protocol!="websocket"}[5m])) by (service) > 0.05
and on() (time() - process_start_time_seconds{job="prometheus"}) > 900
for: 10m
labels:
severity: warning
annotations:
summary: "Slow ingress on {{ $labels.service }}: avg latency {{ $value | printf \"%.2f\" }}s (threshold: 1s for 10m)"
- alert: IngressTTFBCritical
expr: |
(
sum(rate(traefik_service_request_duration_seconds_sum{service!~".*idrac.*|.*headscale.*|.*nextcloud.*|.*immich.*",protocol!="websocket"}[5m])) by (service)
/ sum(rate(traefik_service_request_duration_seconds_count{service!~".*idrac.*|.*headscale.*|.*nextcloud.*|.*immich.*",protocol!="websocket"}[5m])) by (service)
) > 3
and sum(rate(traefik_service_request_duration_seconds_count{service!~".*idrac.*|.*headscale.*|.*nextcloud.*|.*immich.*",protocol!="websocket"}[5m])) by (service) > 0.05
and on() (time() - process_start_time_seconds{job="prometheus"}) > 900
for: 5m
labels:
severity: critical
annotations:
summary: "Critically slow ingress on {{ $labels.service }}: avg latency {{ $value | printf \"%.2f\" }}s (threshold: 3s for 5m)"
- alert: IngressErrorRate5xxHigh
expr: |
(
sum(rate(traefik_service_requests_total{code=~"5..", service!~".*nextcloud.*"}[5m])) by (service)
/ sum(rate(traefik_service_requests_total{service!~".*nextcloud.*"}[5m])) by (service)
* 100
) > 5
and sum(rate(traefik_service_requests_total{service!~".*nextcloud.*"}[5m])) by (service) > 0.1
and on() (time() - process_start_time_seconds{job="prometheus"}) > 900
for: 5m
labels:
severity: critical
annotations:
summary: "5xx rate on {{ $labels.service }}: {{ $value | printf \"%.1f\" }}% (threshold: 5% for 5m)"
- alert: AnubisChallengeStoreErrors
# Anubis exposes only Go-runtime metrics on :9090 (no anubis_* /
# challenge_* counters), so we proxy via Traefik 5xx on services
# whose name contains `anubis`. Catches the "store: key not found"
# 500 we saw — every Anubis 5xx is suspicious because the only
# legitimate path through it is /.within.website/x/cmd/anubis or a
# redirect to the upstream, both 200/3xx in healthy operation.
expr: |
sum(rate(traefik_service_requests_total{service=~".*anubis.*",code=~"5.."}[5m])) by (service) > 0
and on() (time() - process_start_time_seconds{job="prometheus"}) > 900
for: 5m
labels:
severity: critical
annotations:
summary: "Anubis service {{ $labels.service }} returning 5xx ({{ $value | printf \"%.2f\" }} req/s) — likely challenge-store error"
- name: "Networking & Access"
rules:
- alert: CloudflaredDown
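
The latency rules above divide the rate of `..._duration_seconds_sum` by the rate of `..._duration_seconds_count`, which yields the mean request duration over the window. A rough sketch of that arithmetic, using made-up counter samples taken 5 minutes apart; the function names are hypothetical, and the `for:` durations, the service filters and the Prometheus-uptime guard are left out.

```python
def average_latency(sum_start, sum_end, count_start, count_end):
    """Mean seconds per request between two counter samples, or None if idle."""
    requests = count_end - count_start
    if requests <= 0:
        return None
    # rate(sum) / rate(count): the window length cancels out.
    return (sum_end - sum_start) / requests


def request_rate(count_start, count_end, window_seconds=300):
    """Requests per second over the window."""
    return (count_end - count_start) / window_seconds


def severity(avg_seconds, rate_per_second):
    """Highest matching severity using the thresholds from the rules above."""
    if avg_seconds is None or rate_per_second <= 0.05:
        return None          # too little traffic to judge
    if avg_seconds > 3:
        return "critical"
    if avg_seconds > 1:
        return "warning"
    return None


# Hypothetical samples: 60 requests took 90 s in total over 5 minutes.
avg = average_latency(sum_start=100.0, sum_end=190.0, count_start=1000, count_end=1060)
rate = request_rate(count_start=1000, count_end=1060)
level = severity(avg, rate)
```

A mean hides tail latency, which is why the rule comment plans a switch to `histogram_quantile(0.95, ...)` once the bucket series are available again.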

View file

@ -24,10 +24,31 @@ provider "registry.terraform.io/cloudflare/cloudflare" {
]
}
provider "registry.terraform.io/goauthentik/authentik" {
version = "2024.12.1"
constraints = "~> 2024.10"
hashes = [
"h1:roBMd+gi+TGgikH/bMzEI8JfvJiMAQWt+8FmokCrQIs=",
"zh:090260dc7889ea822ec1d899344e1ee23eba5290461989c0796149c9511f2316",
"zh:13c2655ff824b0dc4b9bb832b5ca6d41dba97cb280330258c5fef4115e236209",
"zh:166a73c3a810c9c895d68a8ff968158f339f8a2c1c03e20ec9fc5ed99cc64e20",
"zh:203777eae1cdc711233315499643180604cff2324411b186b7cf07fdbe16f655",
"zh:3b2f18c9a8d28dac74dc6bbf168c946855ab9c68f053578d4630c50d5eaf30a0",
"zh:4822275985f6b74b6196c47112316a4252db22cf4ceaef7c9ab4c66d488abf2f",
"zh:53ea97562666c8a5a2f6d63d418a302a7f8ee4b7bb7da35dedaa89aa5708b7f0",
"zh:56b8a230901e3550c92a1d3f58ee9dafe9853f30fe4315af3ab28ae63262e15d",
"zh:6293ab7b1fd8206a0c853591f50186aca4a1eff117b2a773e10760a23a2c83e9",
"zh:9433970f79fb92d8aae3ee436db5630ab312c78b6dc9df9c1db3273a18f8aaa1",
"zh:95df406214f79b3b98222d7c7fe8fc319a3d90b7a9d53e1d5abbda5dfb8b9436",
"zh:a85880da0552a42c8f449390fbd7d8b03541d1a13e04bba9f1404fa658754260",
"zh:a95f6e9bd62c67e70eba1b1a14728856b9a6a28cd1e5e3be54a7718882c87e7f",
"zh:dd599b51c5beb34a4c6feece244fde07d2558d69929449ab1fd39a5ebe738781",
]
}
provider "registry.terraform.io/hashicorp/helm" {
version = "3.1.1"
hashes = [
"h1:47CqNwkxctJtL/N/JuEj+8QMg8mRNI/NWeKO5/ydfZU=",
"h1:5b2ojWKT0noujHiweCds37ZreRFRQLNaErdJLusJN88=",
"zh:1a6d5ce931708aec29d1f3d9e360c2a0c35ba5a54d03eeaff0ce3ca597cd0275",
"zh:3411919ba2a5941801e677f0fea08bdd0ae22ba3c9ce3309f55554699e06524a",
@ -48,6 +69,18 @@ provider "registry.terraform.io/hashicorp/kubernetes" {
version = "3.1.0"
hashes = [
"h1:oodIAuFMikXNmEtil5MQgP4dfSctUBYQiGJfjbsF3NY=",
"zh:0215c5c60be62028c09a2f22458e89cda3ef5830a632299f1d401eb3538874b0",
"zh:09ebb9f442431e278a310a9423f32caf467cb4b3cad3fe59573ca71fa7b14e20",
"zh:0c4e5912f83bb35846ae0a9ae54fc320706ee61894cd21cc6b4181b1c5a2fa5c",
"zh:1678c982853ad461e65ccb5e79d585e13ed109dd47dab2a66d3a7a304faeef65",
"zh:1c050a5c15e330457a9c18caacf61a923c59d663e13f2962e4b32f04fef523a0",
"zh:2c55bcec83be58ec132c7cb0a1ac644758b800d794fdc636d53a0eada0358a3a",
"zh:a062bb0aa316c08d8460c66a5d68da71da40de5d3bc3b31abcf3a1a9a19650f1",
"zh:a26fdea0afaa9b247c73c0b42843ca51ba7db0ac2571f9d3d50dcabd20ca1b98",
"zh:c872c9385a78d502bf5823d61cd3bb0f9a0585030e025eb12585c83451beeaa1",
"zh:f180879af931182beee4c8c0d9dab62b81d86f17ddcbe3786ef4c7cec9163a4e",
"zh:f569b65999264a9416862bca5cd2a6177d94ccb0424f3a4ef424428912b9cb3c",
"zh:f70f5789264069e0eef06f9b5d5fde955ef7206f7d446d1ce51a4c37a3f3e02f",
]
}
@ -55,7 +88,6 @@ provider "registry.terraform.io/hashicorp/vault" {
version = "4.8.0"
constraints = "~> 4.0"
hashes = [
"h1:GPfhH6dr1LY0foPBDYv9bEGifx7eSwYqFcEAOWOUxLk=",
"h1:aHqgWQhDBMeZO9iUKwJYMlh4q+xNMUlMIcjRbF4d02Y=",
"zh:269ab13433f67684012ae7e15876532b0312f5d0d2002a9cf9febb1279ce5ea6",
"zh:4babc95bf0c40eb85005db1dc2ca403c46be4a71dd3e409db3711a56f7a5ca0e",

View file

@ -1,7 +1,7 @@
# Generated by Terragrunt. Sig: nIlQXj57tbuaRZEa
terraform {
backend "pg" {
conn_str = "postgres://terraform_state:SBlzGxotNUN6HH9d0S-m@10.0.20.200:5432/terraform_state?sslmode=disable"
conn_str = "postgres://terraform_state:ts7DGcKmTTY-5ujz4mhh@10.0.20.200:5432/terraform_state?sslmode=disable"
schema_name = "n8n"
}
}

View file

@ -80,6 +80,44 @@ resource "kubernetes_manifest" "external_secret_claude_agent" {
depends_on = [kubernetes_namespace.n8n]
}
# Shared secrets for the Immich -> Telegram -> Postiz -> Instagram pipeline.
# Workflows in stacks/n8n/workflows/instagram-*.json reference these env vars.
resource "kubernetes_manifest" "external_secret_instagram_pipeline" {
manifest = {
apiVersion = "external-secrets.io/v1beta1"
kind = "ExternalSecret"
metadata = {
name = "instagram-pipeline-secrets"
namespace = "n8n"
}
spec = {
refreshInterval = "15m"
secretStoreRef = {
name = "vault-kv"
kind = "ClusterSecretStore"
}
target = {
name = "instagram-pipeline-secrets"
}
data = [
{
secretKey = "telegram_bot_token"
remoteRef = { key = "instagram-poster", property = "telegram_bot_token" }
},
{
secretKey = "telegram_chat_id"
remoteRef = { key = "instagram-poster", property = "telegram_chat_id" }
},
{
secretKey = "immich_api_key"
remoteRef = { key = "instagram-poster", property = "immich_api_key" }
},
]
}
}
depends_on = [kubernetes_namespace.n8n]
}
resource "kubernetes_persistent_volume_claim" "data_encrypted" {
wait_until_bound = false
metadata {
@ -253,6 +291,47 @@ resource "kubernetes_deployment" "n8n" {
name = "N8N_BLOCK_ENV_ACCESS_IN_NODE"
value = "false"
}
# Instagram pipeline env (consumed by workflows in
# stacks/n8n/workflows/instagram-*.json).
env {
name = "TELEGRAM_BOT_TOKEN"
value_from {
secret_key_ref {
name = "instagram-pipeline-secrets"
key = "telegram_bot_token"
}
}
}
env {
name = "TELEGRAM_CHAT_ID"
value_from {
secret_key_ref {
name = "instagram-pipeline-secrets"
key = "telegram_chat_id"
}
}
}
env {
name = "IMMICH_API_KEY"
value_from {
secret_key_ref {
name = "instagram-pipeline-secrets"
key = "immich_api_key"
}
}
}
env {
name = "IMMICH_BASE_URL"
value = "https://immich.viktorbarzin.me"
}
env {
name = "INSTAGRAM_POSTER_INTERNAL_URL"
value = "http://instagram-poster.instagram-poster.svc.cluster.local"
}
env {
name = "PUBLIC_INSTAGRAM_POSTER_URL"
value = "https://instagram-poster.viktorbarzin.me"
}
volume_mount {
name = "data"
mount_path = "/home/node/.n8n"

View file

@ -9,6 +9,10 @@ terraform {
source = "cloudflare/cloudflare"
version = "~> 4"
}
authentik = {
source = "goauthentik/authentik"
version = "~> 2024.10"
}
}
}

View file

@ -0,0 +1,390 @@
{
"name": "Instagram Approval",
"active": true,
"id": "483773c0-0b62-4ae5-b1b1-345f5df7b133",
"versionId": "483773c0-0b62-4ae5-b1b1-345f5df7b133",
"nodes": [
{
"parameters": {
"httpMethod": "POST",
"path": "instagram-approval",
"responseMode": "onReceived",
"options": {}
},
"id": "telegram-trigger",
"name": "Telegram Webhook",
"type": "n8n-nodes-base.webhook",
"typeVersion": 2,
"position": [250, 400],
"webhookId": "f2c7c254-ebaf-4f66-b1b4-5c1629c07e08",
"notes": "Receives Telegram inline-button taps."
},
{
"parameters": {
"jsCode": "const raw = $input.first().json;\nconst update = raw.body || raw;\nconst cb = update.callback_query || {};\nconst data = cb.data || '';\nconst [action, assetId] = data.split(':');\nconst message = cb.message || {};\nconst chatId = (message.chat || {}).id;\nconst messageId = message.message_id;\nconst originalCaption = message.caption || '';\nconst callbackQueryId = cb.id;\nreturn [{\n json: {\n action,\n asset_id: assetId,\n chat_id: chatId,\n message_id: messageId,\n original_caption: originalCaption,\n callback_query_id: callbackQueryId,\n }\n}];"
},
"id": "parse-callback",
"name": "Parse callback_data",
"type": "n8n-nodes-base.code",
"typeVersion": 2,
"position": [470, 400],
"notes": "Splits callback_data into action + asset_id."
},
{
"parameters": {
"rules": {
"values": [
{
"conditions": {
"options": {"caseSensitive": true, "leftValue": "", "typeValidation": "strict"},
"conditions": [{
"id": "is-approve",
"leftValue": "={{ $json.action }}",
"rightValue": "approve",
"operator": {"type": "string", "operation": "equals"}
}],
"combinator": "and"
},
"outputKey": "approve"
},
{
"conditions": {
"options": {"caseSensitive": true, "leftValue": "", "typeValidation": "strict"},
"conditions": [{
"id": "is-reject",
"leftValue": "={{ $json.action }}",
"rightValue": "reject",
"operator": {"type": "string", "operation": "equals"}
}],
"combinator": "and"
},
"outputKey": "reject"
}
]
},
"options": {}
},
"id": "switch-action",
"name": "Switch on action",
"type": "n8n-nodes-base.switch",
"typeVersion": 3.2,
"position": [690, 400],
"notes": "approve | reject branches; unknown actions dropped."
},
{
"parameters": {
"method": "POST",
"url": "={{ $env.INSTAGRAM_POSTER_INTERNAL_URL }}/enqueue",
"sendHeaders": true,
"headerParameters": {
"parameters": [
{"name": "Content-Type", "value": "application/json"}
]
},
"sendBody": true,
"specifyBody": "json",
"jsonBody": "={{ JSON.stringify({ asset_id: $json.asset_id }) }}",
"options": {"timeout": 30000}
},
"id": "approve-enqueue",
"name": "Approve: enqueue + log decision",
"type": "n8n-nodes-base.httpRequest",
"typeVersion": 4.2,
"position": [910, 250],
"onError": "continueErrorOutput",
"notes": "Calls /enqueue → moves story_queue row to 'approved' (= backlog) AND records decision row with embedding for CLIP scoring."
},
{
"parameters": {
"method": "POST",
"url": "={{ $env.INSTAGRAM_POSTER_INTERNAL_URL }}/reject",
"sendHeaders": true,
"headerParameters": {
"parameters": [
{"name": "Content-Type", "value": "application/json"}
]
},
"sendBody": true,
"specifyBody": "json",
"jsonBody": "={{ JSON.stringify({ asset_id: $json.asset_id }) }}",
"options": {"timeout": 30000}
},
"id": "reject-mark",
"name": "Reject: mark seen + log decision",
"type": "n8n-nodes-base.httpRequest",
"typeVersion": 4.2,
"position": [910, 550],
"onError": "continueErrorOutput",
"notes": "Calls /reject → records decision (negative training signal); doesn't add to backlog."
},
{
"parameters": {
"jsCode": "const upstream = $('Parse callback_data').item.json;\nreturn [{ json: { ...upstream, new_caption: (upstream.original_caption || '') + '\\n\\n✅ Saved to backlog' } }];"
},
"id": "approve-caption",
"name": "Approve: build new caption",
"type": "n8n-nodes-base.code",
"typeVersion": 2,
"position": [1130, 250],
"notes": "Append confirmation."
},
{
"parameters": {
"jsCode": "const upstream = $('Parse callback_data').item.json;\nreturn [{ json: { ...upstream, new_caption: (upstream.original_caption || '') + '\\n\\n❌ Rejected' } }];"
},
"id": "reject-caption",
"name": "Reject: build new caption",
"type": "n8n-nodes-base.code",
"typeVersion": 2,
"position": [1130, 550],
"notes": "Append rejection."
},
{
"parameters": {
"method": "POST",
"url": "=https://api.telegram.org/bot{{ $env.TELEGRAM_BOT_TOKEN }}/editMessageCaption",
"sendHeaders": true,
"headerParameters": {
"parameters": [
{"name": "Content-Type", "value": "application/json"}
]
},
"sendBody": true,
"specifyBody": "json",
"jsonBody": "={{ JSON.stringify({ chat_id: $json.chat_id, message_id: $json.message_id, caption: $json.new_caption, parse_mode: 'HTML' }) }}",
"options": {"timeout": 30000}
},
"id": "edit-caption",
"name": "Telegram editMessageCaption",
"type": "n8n-nodes-base.httpRequest",
"typeVersion": 4.2,
"position": [1350, 400],
"notes": "Updates the original DM caption to show the resulting state."
},
{
"parameters": {
"method": "POST",
"url": "=https://api.telegram.org/bot{{ $env.TELEGRAM_BOT_TOKEN }}/editMessageReplyMarkup",
"sendHeaders": true,
"headerParameters": {
"parameters": [
{"name": "Content-Type", "value": "application/json"}
]
},
"sendBody": true,
"specifyBody": "json",
"jsonBody": "={{ JSON.stringify({ chat_id: $('Parse callback_data').item.json.chat_id, message_id: $('Parse callback_data').item.json.message_id, reply_markup: { inline_keyboard: [] } }) }}",
"options": {"timeout": 30000}
},
"id": "edit-reply-markup",
"name": "Telegram editMessageReplyMarkup",
"type": "n8n-nodes-base.httpRequest",
"typeVersion": 4.2,
"position": [1570, 400],
"notes": "Strip the inline buttons from the original DM. Refers back to Parse callback_data because the previous Telegram HTTP call replaced $json with its API response (which has result.chat.id, not chat_id at root)."
},
{
"parameters": {
"method": "POST",
"url": "=https://api.telegram.org/bot{{ $env.TELEGRAM_BOT_TOKEN }}/answerCallbackQuery",
"sendHeaders": true,
"headerParameters": {
"parameters": [
{"name": "Content-Type", "value": "application/json"}
]
},
"sendBody": true,
"specifyBody": "json",
"jsonBody": "={{ JSON.stringify({ callback_query_id: $('Parse callback_data').item.json.callback_query_id, text: 'Recorded' }) }}",
"options": {"timeout": 15000}
},
"id": "answer-callback",
"name": "Telegram answerCallbackQuery",
"type": "n8n-nodes-base.httpRequest",
"typeVersion": 4.2,
"position": [1790, 400],
"notes": "Dismiss the spinner on the user's tap. callback_query_id from Parse callback_data (upstream HTTP responses don't carry it)."
},
{
"parameters": {
"method": "GET",
"url": "={{ $env.INSTAGRAM_POSTER_INTERNAL_URL }}/candidates?limit=1",
"options": {"timeout": 60000}
},
"id": "get-next",
"name": "Get next candidate",
"type": "n8n-nodes-base.httpRequest",
"typeVersion": 4.2,
"position": [2010, 400],
"notes": "Real-time training loop: after every approve/reject, immediately fetch the next ranked candidate so the user can keep tapping. Endpoint excludes already-decided assets so no repeats."
},
{
"parameters": {
"method": "GET",
"url": "={{ $env.INSTAGRAM_POSTER_INTERNAL_URL }}/queue?status=approved",
"options": {"timeout": 30000}
},
"id": "backlog-count",
"name": "Get backlog count",
"type": "n8n-nodes-base.httpRequest",
"typeVersion": 4.2,
"position": [2230, 400],
"notes": "Count of approved-but-not-yet-posted rows. Shown to the user so they know how many photos are queued for posting."
},
{
"parameters": {
"jsCode": "// Decide: send next candidate, or 'all caught up' message.\nconst nextResp = $('Get next candidate').item.json;\nconst backlog = $input.first().json;\nconst chatId = $('Parse callback_data').item.json.chat_id;\nconst candidates = (nextResp && nextResp.candidates) || [];\nconst stats = (nextResp && nextResp.stats) || {};\nconst backlogCount = Array.isArray(backlog) ? backlog.length : 0;\n\nif (candidates.length === 0) {\n return [{ json: { has_next: false, chat_id: chatId, backlog_count: backlogCount, stats } }];\n}\n\nconst c = candidates[0];\nconst score = (typeof c.score === 'number') ? c.score.toFixed(2) : '';\nconst takenDate = c.taken_at ? c.taken_at.slice(0, 10) : '';\nconst lines = [\n '<b>📸 Next</b>',\n '',\n '<b>File:</b> ' + (c.filename || c.asset_id),\n];\nif (takenDate) lines.push('<b>Taken:</b> ' + takenDate);\nlines.push('<b>Score:</b> ' + score + (c.has_embedding ? '' : ' (no embedding)'));\nlines.push('', '<i>Backlog: ' + backlogCount + ' · trained on ' + (stats.approved || 0) + '✅ / ' + (stats.rejected || 0) + '❌</i>');\nreturn [{ json: { has_next: true, asset_id: c.asset_id, caption: lines.join('\\n'), chat_id: chatId } }];"
},
"id": "build-next",
"name": "Build next-candidate payload",
"type": "n8n-nodes-base.code",
"typeVersion": 2,
"position": [2450, 400],
"notes": "Assemble caption with score + cumulative stats + backlog count, OR signal 'all caught up'."
},
{
"parameters": {
"rules": {
"values": [{
"conditions": {
"options": {"caseSensitive": true, "leftValue": "", "typeValidation": "strict"},
"conditions": [{
"id": "has-next",
"leftValue": "={{ $json.has_next }}",
"rightValue": true,
"operator": {"type": "boolean", "operation": "true"}
}],
"combinator": "and"
},
"outputKey": "next"
}]
},
"options": {"fallbackOutput": "extra"}
},
"id": "switch-has-next",
"name": "Branch: has next?",
"type": "n8n-nodes-base.switch",
"typeVersion": 3.2,
"position": [2670, 400],
"notes": "Route to sendPhoto if there's another candidate, otherwise to 'all caught up' message."
},
{
"parameters": {
"method": "POST",
"url": "=https://api.telegram.org/bot{{ $env.TELEGRAM_BOT_TOKEN }}/sendPhoto",
"sendHeaders": true,
"headerParameters": {
"parameters": [
{"name": "Content-Type", "value": "application/json"}
]
},
"sendBody": true,
"specifyBody": "json",
"jsonBody": "={{ JSON.stringify({ chat_id: $json.chat_id, photo: $env.PUBLIC_INSTAGRAM_POSTER_URL + '/image/' + $json.asset_id, caption: $json.caption, parse_mode: 'HTML', reply_markup: { inline_keyboard: [[ { text: '✅ Approve', callback_data: 'approve:' + $json.asset_id }, { text: '❌ Reject', callback_data: 'reject:' + $json.asset_id } ]] } }) }}",
"options": {"timeout": 30000}
},
"id": "send-next",
"name": "Telegram sendPhoto (next)",
"type": "n8n-nodes-base.httpRequest",
"typeVersion": 4.2,
"position": [2890, 250],
"notes": "Sends the next candidate with its own approve/reject buttons; tap chains back into this same workflow."
},
{
"parameters": {
"method": "POST",
"url": "=https://api.telegram.org/bot{{ $env.TELEGRAM_BOT_TOKEN }}/sendMessage",
"sendHeaders": true,
"headerParameters": {
"parameters": [
{"name": "Content-Type", "value": "application/json"}
]
},
"sendBody": true,
"specifyBody": "json",
"jsonBody": "={{ JSON.stringify({ chat_id: $json.chat_id, text: '🎉 All caught up — nothing more tagged in Immich.\\n\\nBacklog: ' + $json.backlog_count + ' approved photos waiting to post.\\nTrained on ' + (($json.stats && $json.stats.approved) || 0) + '✅ / ' + (($json.stats && $json.stats.rejected) || 0) + '❌.', parse_mode: 'HTML' }) }}",
"options": {"timeout": 15000}
},
"id": "send-empty",
"name": "Telegram all-caught-up",
"type": "n8n-nodes-base.httpRequest",
"typeVersion": 4.2,
"position": [2890, 550],
"notes": "When no more candidates, tell the user how big the backlog is so they know how many days of content are queued."
},
{
"parameters": {
"jsCode": "const cb = $('Parse callback_data').item.json;\nconst err = $input.first().json.error || $input.first().json;\nconst msg = (err && (err.message || err.description || JSON.stringify(err))) || 'unknown error';\nreturn [{ json: { chat_id: cb.chat_id, text: 'Instagram poster error for ' + cb.asset_id + ':\\n' + msg } }];"
},
"id": "build-error-msg",
"name": "Build error message",
"type": "n8n-nodes-base.code",
"typeVersion": 2,
"position": [1130, 750],
"notes": "Catches non-2xx from /enqueue or /reject."
},
{
"parameters": {
"method": "POST",
"url": "=https://api.telegram.org/bot{{ $env.TELEGRAM_BOT_TOKEN }}/sendMessage",
"sendHeaders": true,
"headerParameters": {
"parameters": [
{"name": "Content-Type", "value": "application/json"}
]
},
"sendBody": true,
"specifyBody": "json",
"jsonBody": "={{ JSON.stringify({ chat_id: $json.chat_id, text: $json.text }) }}",
"options": {"timeout": 15000}
},
"id": "telegram-error-msg",
"name": "Telegram error notice",
"type": "n8n-nodes-base.httpRequest",
"typeVersion": 4.2,
"position": [1350, 750],
"notes": "Sends the error text to the user."
}
],
"connections": {
"Telegram Webhook": {"main": [[{"node": "Parse callback_data", "type": "main", "index": 0}]]},
"Parse callback_data": {"main": [[{"node": "Switch on action", "type": "main", "index": 0}]]},
"Switch on action": {
"main": [
[{"node": "Approve: enqueue + log decision", "type": "main", "index": 0}],
[{"node": "Reject: mark seen + log decision", "type": "main", "index": 0}]
]
},
"Approve: enqueue + log decision": {
"main": [
[{"node": "Approve: build new caption", "type": "main", "index": 0}],
[{"node": "Build error message", "type": "main", "index": 0}]
]
},
"Reject: mark seen + log decision": {
"main": [
[{"node": "Reject: build new caption", "type": "main", "index": 0}],
[{"node": "Build error message", "type": "main", "index": 0}]
]
},
"Approve: build new caption": {"main": [[{"node": "Telegram editMessageCaption", "type": "main", "index": 0}]]},
"Reject: build new caption": {"main": [[{"node": "Telegram editMessageCaption", "type": "main", "index": 0}]]},
"Telegram editMessageCaption": {"main": [[{"node": "Telegram editMessageReplyMarkup", "type": "main", "index": 0}]]},
"Telegram editMessageReplyMarkup": {"main": [[{"node": "Telegram answerCallbackQuery", "type": "main", "index": 0}]]},
"Telegram answerCallbackQuery": {"main": [[{"node": "Get next candidate", "type": "main", "index": 0}]]},
"Get next candidate": {"main": [[{"node": "Get backlog count", "type": "main", "index": 0}]]},
"Get backlog count": {"main": [[{"node": "Build next-candidate payload", "type": "main", "index": 0}]]},
"Build next-candidate payload": {"main": [[{"node": "Branch: has next?", "type": "main", "index": 0}]]},
"Branch: has next?": {
"main": [
[{"node": "Telegram sendPhoto (next)", "type": "main", "index": 0}],
[{"node": "Telegram all-caught-up", "type": "main", "index": 0}]
]
},
"Build error message": {"main": [[{"node": "Telegram error notice", "type": "main", "index": 0}]]}
},
"settings": {"executionOrder": "v1", "saveExecutionProgress": false, "saveManualExecutions": true},
"staticData": null,
"meta": {"templateCredsSetupCompleted": false},
"pinData": {}
}
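
Reviewer sketch: the `Parse callback_data` Code node is hard to read as an escaped one-line string, so its logic is restated below as a standalone function. This is a minimal sketch, not the deployed node: `parseCallback` is a hypothetical name, and the function takes the Telegram update as a plain argument because `$input` only exists inside n8n.

```javascript
// Standalone restatement of the "Parse callback_data" Code node.
// `update` is a Telegram Update object (the webhook body).
// parseCallback is a hypothetical helper name used only for illustration.
function parseCallback(update) {
  const cb = update.callback_query || {};
  const data = cb.data || '';
  // callback_data has the form "<action>:<asset_id>", e.g. "approve:1234".
  const [action, assetId] = data.split(':');
  const message = cb.message || {};
  return {
    action,
    asset_id: assetId,
    chat_id: (message.chat || {}).id,
    message_id: message.message_id,
    original_caption: message.caption || '',
    callback_query_id: cb.id,
  };
}

const parsed = parseCallback({
  callback_query: {
    id: 'q1',
    data: 'approve:abc-123',
    message: { message_id: 7, chat: { id: 42 }, caption: 'hello' },
  },
});
console.log(parsed.action, parsed.asset_id, parsed.chat_id);
```

Because `split(':')` keeps only the first two segments, an asset ID that itself contained a colon would be truncated. An update without `callback_query` yields an empty `action`, which the Switch node then drops.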

View file

@ -0,0 +1,106 @@
{
"name": "Instagram Discover",
"active": true,
"id": "3bae241e-c693-49aa-b271-51af0ec811dc",
"versionId": "3bae241e-c693-49aa-b271-51af0ec811dc",
"nodes": [
{
"parameters": {
"rule": {
"interval": [{
"field": "cronExpression",
"expression": "0 9 * * *"
}]
}
},
"id": "cron-daily-9",
"name": "Daily 09:00",
"type": "n8n-nodes-base.scheduleTrigger",
"typeVersion": 1.1,
"position": [250, 300],
"notes": "Once a day kickstart. Sends 1 candidate so the user can start a training session by tapping. The approval workflow's chain takes over from there — every approve/reject sends the next candidate immediately. Daily cadence avoids spamming Telegram if the user is actively training; the loop is user-paced."
},
{
"parameters": {
"method": "GET",
"url": "={{ $env.INSTAGRAM_POSTER_INTERNAL_URL }}/candidates?limit=1",
"options": {"timeout": 60000}
},
"id": "candidates",
"name": "Get top-3 ranked candidates",
"type": "n8n-nodes-base.httpRequest",
"typeVersion": 4.2,
"position": [500, 300],
"notes": "GET /candidates?limit=1 returns the top-ranked asset, ranked by CLIP similarity to approved/rejected centroids. Cold-start (no decision history) falls back to recency. Endpoint also auto-adds returned items to story_queue as pending so /enqueue can transition them on approve."
},
{
"parameters": {
"fieldToSplitOut": "candidates",
"options": {}
},
"id": "split-items",
"name": "Split candidates",
"type": "n8n-nodes-base.splitOut",
"typeVersion": 1,
"position": [750, 300],
"notes": "One Telegram message per candidate."
},
{
"parameters": {
"batchSize": 1,
"options": {}
},
"id": "batch-loop",
"name": "Loop one at a time",
"type": "n8n-nodes-base.splitInBatches",
"typeVersion": 3,
"position": [970, 300],
"notes": "Process one asset at a time so a single Telegram error doesn't stop the others."
},
{
"parameters": {
"jsCode": "const c = $input.first().json;\nconst score = (typeof c.score === 'number') ? c.score.toFixed(2) : '';\nconst takenDate = c.taken_at ? c.taken_at.slice(0, 10) : '';\nconst lines = [\n '<b>📸 New candidate</b>',\n '',\n '<b>File:</b> ' + (c.filename || c.asset_id),\n];\nif (takenDate) lines.push('<b>Taken:</b> ' + takenDate);\nlines.push('<b>Score:</b> ' + score + (c.has_embedding ? '' : ' (no embedding yet)'));\nlines.push('', 'Approve to queue for posting, reject to mark seen.');\nreturn [{ json: { asset_id: c.asset_id, caption: lines.join('\\n') } }];"
},
"id": "build-caption",
"name": "Build caption",
"type": "n8n-nodes-base.code",
"typeVersion": 2,
"position": [1190, 300],
"notes": "Format the Telegram caption with the CLIP-similarity score, taken date, filename. Score is approve_centroid_cos minus reject_centroid_cos; null scores render as an empty string."
},
{
"parameters": {
"method": "POST",
"url": "=https://api.telegram.org/bot{{ $env.TELEGRAM_BOT_TOKEN }}/sendPhoto",
"sendHeaders": true,
"headerParameters": {
"parameters": [
{"name": "Content-Type", "value": "application/json"}
]
},
"sendBody": true,
"specifyBody": "json",
"jsonBody": "={{ JSON.stringify({ chat_id: $env.TELEGRAM_CHAT_ID, photo: $env.PUBLIC_INSTAGRAM_POSTER_URL + '/image/' + $json.asset_id, caption: $json.caption, parse_mode: 'HTML', reply_markup: { inline_keyboard: [[ { text: '✅ Approve', callback_data: 'approve:' + $json.asset_id }, { text: '❌ Reject', callback_data: 'reject:' + $json.asset_id } ]] } }) }}",
"options": {"timeout": 30000}
},
"id": "telegram-send-photo",
"name": "Telegram sendPhoto with buttons",
"type": "n8n-nodes-base.httpRequest",
"typeVersion": 4.2,
"position": [1410, 300],
"notes": "Telegram fetches the 9:16 derivative from instagram-poster.viktorbarzin.me/image/<id>. Inline keyboard wires the action:asset_id format consumed by instagram-approval workflow."
}
],
"connections": {
"Daily 09:00": {"main": [[{"node": "Get top-3 ranked candidates", "type": "main", "index": 0}]]},
"Get top-3 ranked candidates": {"main": [[{"node": "Split candidates", "type": "main", "index": 0}]]},
"Split candidates": {"main": [[{"node": "Loop one at a time", "type": "main", "index": 0}]]},
"Loop one at a time": {"main": [[{"node": "Build caption", "type": "main", "index": 0}]]},
"Build caption": {"main": [[{"node": "Telegram sendPhoto with buttons", "type": "main", "index": 0}]]},
"Telegram sendPhoto with buttons": {"main": [[{"node": "Loop one at a time", "type": "main", "index": 0}]]}
},
"settings": {"executionOrder": "v1", "saveExecutionProgress": false, "saveManualExecutions": true},
"staticData": null,
"meta": {"templateCredsSetupCompleted": false},
"pinData": {}
}
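
Reviewer sketch: the `Build caption` notes describe the score in terms of `approve_centroid_cos` and `reject_centroid_cos`. Assuming the score is the difference of the two cosine similarities, the ranking could look roughly like the code below. The poster service's implementation is not part of this diff, so `cosine`, `centroid`, and `scoreCandidate` are hypothetical names, as are the cold-start fallbacks.

```javascript
// Cosine similarity of two equal-length numeric vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  if (na === 0 || nb === 0) return 0; // avoid division by zero
  return dot / Math.sqrt(na * nb);
}

// Component-wise mean of a non-empty list of equal-length vectors.
function centroid(vectors) {
  const sum = new Array(vectors[0].length).fill(0);
  for (const v of vectors) {
    for (let i = 0; i < v.length; i++) sum[i] += v[i];
  }
  return sum.map((x) => x / vectors.length);
}

// Hypothetical scorer: similarity to the approved centroid minus
// similarity to the rejected centroid. Returns null without an embedding.
function scoreCandidate(embedding, approved, rejected) {
  if (!embedding) return null;
  const a = approved.length ? cosine(embedding, centroid(approved)) : 0;
  const r = rejected.length ? cosine(embedding, centroid(rejected)) : 0;
  return a - r;
}

console.log(scoreCandidate([1, 0], [[1, 0]], [[0, 1]]));
```

Under this assumption the score ranges from -1 to 1. A `null` score corresponds to the `has_embedding` false case that the caption flags.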

View file

@ -0,0 +1,177 @@
{
"name": "Instagram Post",
"active": true,
"id": "8964902b-b106-4cea-8965-77724baa71be",
"versionId": "8964902b-b106-4cea-8965-77724baa71be",
"nodes": [
{
"parameters": {
"rule": {
"interval": [{"field": "days", "daysInterval": 1, "triggerAtHour": 11, "triggerAtMinute": 0}]
}
},
"id": "cron-daily-11",
"name": "Daily 11:00 Europe/London",
"type": "n8n-nodes-base.scheduleTrigger",
"typeVersion": 1.1,
"position": [250, 300],
"notes": "Fires once a day. Postiz handles per-platform scheduling windows; this just feeds the next approved asset to the poster service."
},
{
"parameters": {
"method": "POST",
"url": "={{ $env.INSTAGRAM_POSTER_INTERNAL_URL }}/post-next",
"sendHeaders": true,
"headerParameters": {
"parameters": [
{"name": "Authorization", "value": "=Bearer {{ $env.INSTAGRAM_POSTER_TOKEN }}"},
{"name": "Content-Type", "value": "application/json"}
]
},
"sendBody": false,
"options": {
"timeout": 60000,
"response": {"response": {"fullResponse": true, "neverError": true}}
}
},
"id": "post-next",
"name": "Call /post-next",
"type": "n8n-nodes-base.httpRequest",
"typeVersion": 4.2,
"position": [500, 300],
"notes": "neverError + fullResponse gives us the status code so we can branch on 200 / 404 / 5xx without throwing."
},
{
"parameters": {
"rules": {
"values": [
{
"conditions": {
"options": {"caseSensitive": true, "leftValue": "", "typeValidation": "strict"},
"conditions": [{"id": "is-200", "leftValue": "={{ $json.statusCode }}", "rightValue": 200, "operator": {"type": "number", "operation": "equals"}}],
"combinator": "and"
},
"outputKey": "ok"
},
{
"conditions": {
"options": {"caseSensitive": true, "leftValue": "", "typeValidation": "strict"},
"conditions": [{"id": "is-404", "leftValue": "={{ $json.statusCode }}", "rightValue": 404, "operator": {"type": "number", "operation": "equals"}}],
"combinator": "and"
},
"outputKey": "empty"
},
{
"conditions": {
"options": {"caseSensitive": true, "leftValue": "", "typeValidation": "strict"},
"conditions": [{"id": "is-5xx", "leftValue": "={{ $json.statusCode }}", "rightValue": 500, "operator": {"type": "number", "operation": "largerEqual"}}],
"combinator": "and"
},
"outputKey": "error"
}
]
},
"options": {"fallbackOutput": "extra", "renameFallbackOutput": "other"}
},
"id": "switch-status",
"name": "Switch on status code",
"type": "n8n-nodes-base.switch",
"typeVersion": 3.2,
"position": [750, 300],
"notes": "200 -> success notify, 404 -> silent no-op, 5xx -> alert. Other 4xx falls into the fallback branch and is also alerted."
},
{
"parameters": {
"jsCode": "const body = $input.first().json.body || $input.first().json;\nconst assetId = (body && (body.asset_id || body.id)) || 'unknown';\nreturn [{ json: { chat_id: $env.TELEGRAM_CHAT_ID, text: 'Story scheduled: ' + assetId } }];"
},
"id": "build-success-msg",
"name": "Build success message",
"type": "n8n-nodes-base.code",
"typeVersion": 2,
"position": [1000, 150],
"notes": "Pulls asset_id out of the response body for the confirmation Telegram message."
},
{
"parameters": {
"method": "POST",
"url": "=https://api.telegram.org/bot{{ $env.TELEGRAM_BOT_TOKEN }}/sendMessage",
"sendHeaders": true,
"headerParameters": {
"parameters": [
{"name": "Content-Type", "value": "application/json"}
]
},
"sendBody": true,
"specifyBody": "json",
"jsonBody": "={{ JSON.stringify({ chat_id: $json.chat_id, text: $json.text }) }}",
"options": {"timeout": 15000}
},
"id": "telegram-success",
"name": "Telegram success notice",
"type": "n8n-nodes-base.httpRequest",
"typeVersion": 4.2,
"position": [1250, 150],
"notes": "Confirms the scheduled post to the user."
},
{
"parameters": {
"jsCode": "const r = $input.first().json;\nconst body = r.body || {};\nconst err = body.error || JSON.stringify(body) || ('HTTP ' + r.statusCode);\nreturn [{ json: { chat_id: $env.TELEGRAM_CHAT_ID, text: 'Instagram post-next failed (HTTP ' + r.statusCode + '): ' + err } }];"
},
"id": "build-error-msg",
"name": "Build error message",
"type": "n8n-nodes-base.code",
"typeVersion": 2,
"position": [1000, 450],
"notes": "Formats a Telegram alert with status code + body error message."
},
{
"parameters": {
"method": "POST",
"url": "=https://api.telegram.org/bot{{ $env.TELEGRAM_BOT_TOKEN }}/sendMessage",
"sendHeaders": true,
"headerParameters": {
"parameters": [
{"name": "Content-Type", "value": "application/json"}
]
},
"sendBody": true,
"specifyBody": "json",
"jsonBody": "={{ JSON.stringify({ chat_id: $json.chat_id, text: $json.text }) }}",
"options": {"timeout": 15000}
},
"id": "telegram-error",
"name": "Telegram error alert",
"type": "n8n-nodes-base.httpRequest",
"typeVersion": 4.2,
"position": [1250, 450],
"notes": "Sends the error message to the user."
},
{
"parameters": {},
"id": "noop-empty",
"name": "Empty queue (no-op)",
"type": "n8n-nodes-base.noOp",
"typeVersion": 1,
"position": [1000, 300],
"notes": "404 means there are no approved items waiting; do nothing instead of spamming Telegram."
}
],
"connections": {
"Daily 11:00 Europe/London": {"main": [[{"node": "Call /post-next", "type": "main", "index": 0}]]},
"Call /post-next": {"main": [[{"node": "Switch on status code", "type": "main", "index": 0}]]},
"Switch on status code": {
"main": [
[{"node": "Build success message", "type": "main", "index": 0}],
[{"node": "Empty queue (no-op)", "type": "main", "index": 0}],
[{"node": "Build error message", "type": "main", "index": 0}],
[{"node": "Build error message", "type": "main", "index": 0}]
]
},
"Build success message": {"main": [[{"node": "Telegram success notice", "type": "main", "index": 0}]]},
"Build error message": {"main": [[{"node": "Telegram error alert", "type": "main", "index": 0}]]}
},
"settings": {"executionOrder": "v1", "saveExecutionProgress": false, "saveManualExecutions": true},
"staticData": null,
"meta": {"templateCredsSetupCompleted": false},
"pinData": {}
}
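
Reviewer sketch: the routing done by `Switch on status code` can be restated as a small function, which makes the fallback behaviour easier to see. `routeStatus` is a hypothetical name; the branch labels mirror the `outputKey` values in the workflow.

```javascript
// Mirrors the "Switch on status code" rules, evaluated in order:
// 200 -> ok, 404 -> empty, >= 500 -> error, anything else -> other.
// routeStatus is a hypothetical helper used only for illustration.
function routeStatus(statusCode) {
  if (statusCode === 200) return 'ok';
  if (statusCode === 404) return 'empty';
  if (statusCode >= 500) return 'error';
  return 'other';
}

for (const code of [200, 404, 503, 401, 201]) {
  console.log(code, routeStatus(code));
}
```

Note that only an exact 200 counts as success. A 201 or 204 from `/post-next` would fall into `other`, which the connections wire to `Build error message` and therefore alert on.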

View file

@ -0,0 +1,64 @@
{
"name": "Postiz Publish Notify",
"active": true,
"id": "9c1b3d76-4e2a-4f8b-b1d5-2a9c4e3d7f01",
"versionId": "9c1b3d76-4e2a-4f8b-b1d5-2a9c4e3d7f01",
"nodes": [
{
"parameters": {
"httpMethod": "POST",
"path": "postiz-publish",
"responseMode": "onReceived",
"options": {}
},
"id": "postiz-webhook",
"name": "Postiz webhook (publish)",
"type": "n8n-nodes-base.webhook",
"typeVersion": 2,
"position": [250, 300],
"webhookId": "9c1b3d76-postiz-publish",
"notes": "Postiz fires this webhook AFTER a successful publish (post.workflow.v1.0.2.js -> sendWebhooks). Body = full post JSON. Register URL in Postiz UI → Settings → Webhooks → https://n8n.viktorbarzin.me/webhook/postiz-publish"
},
{
"parameters": {
"jsCode": "// Postiz webhook payload is the full post object.\nconst raw = $input.first().json;\nconst body = raw.body || raw;\nconst integ = body.integration || {};\nconst providerName = integ.name || 'unknown';\nconst providerIdentifier = integ.providerIdentifier || 'unknown';\nconst content = (body.content || '').slice(0, 200);\nconst releaseURL = body.releaseURL || '';\nconst publishDate = body.publishDate || '';\nconst state = body.state || '';\nconst integrationPicture = integ.picture || '';\n\nconst when = publishDate ? new Date(publishDate).toLocaleString('en-GB', { timeZone: 'Europe/Sofia' }) : 'just now';\n\nconst lines = [\n '<b>📤 Posted to ' + providerName + '</b> (' + providerIdentifier + ')',\n '',\n];\nif (releaseURL) lines.push('<a href=\"' + releaseURL + '\">View on Instagram</a>');\nif (content) lines.push('', '<i>' + content + '</i>');\nlines.push('', 'state=' + state + ' · published ' + when);\n\nreturn [{ json: {\n text: lines.join('\\n'),\n release_url: releaseURL,\n post_id: body.id,\n integration_id: integ.id,\n}}];"
},
"id": "format-message",
"name": "Format Telegram message",
"type": "n8n-nodes-base.code",
"typeVersion": 2,
"position": [500, 300],
"notes": "Build the HTML-formatted Telegram message from Postiz's post JSON. Defensive for missing fields — Postiz only fires on success, but webhooks elsewhere might send partial data."
},
{
"parameters": {
"method": "POST",
"url": "=https://api.telegram.org/bot{{ $env.TELEGRAM_BOT_TOKEN }}/sendMessage",
"sendHeaders": true,
"headerParameters": {
"parameters": [
{"name": "Content-Type", "value": "application/json"}
]
},
"sendBody": true,
"specifyBody": "json",
"jsonBody": "={{ JSON.stringify({ chat_id: $env.TELEGRAM_CHAT_ID, text: $json.text, parse_mode: 'HTML', disable_web_page_preview: false }) }}",
"options": {"timeout": 30000}
},
"id": "telegram-notify",
"name": "Telegram sendMessage",
"type": "n8n-nodes-base.httpRequest",
"typeVersion": 4.2,
"position": [750, 300],
"notes": "Send the formatted notification to the user's Telegram chat. parse_mode=HTML so the link is clickable; preview enabled so the IG card renders inline."
}
],
"connections": {
"Postiz webhook (publish)": {"main": [[{"node": "Format Telegram message", "type": "main", "index": 0}]]},
"Format Telegram message": {"main": [[{"node": "Telegram sendMessage", "type": "main", "index": 0}]]}
},
"settings": {"executionOrder": "v1", "saveExecutionProgress": false, "saveManualExecutions": true},
"staticData": null,
"meta": {"templateCredsSetupCompleted": false},
"pinData": {}
}
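
Reviewer sketch: `Format Telegram message` interpolates Postiz fields directly into a message sent with `parse_mode: 'HTML'`. Telegram's HTML mode expects `<`, `>` and `&` in plain text to be entity-escaped, so post content containing those characters may be rejected or rendered incorrectly. A possible hardening, assuming the same payload fields as the node, is sketched below; `escapeHtml` and `formatPublishMessage` are hypothetical helpers, not part of the workflow.

```javascript
// Escape the characters Telegram's HTML parse mode treats specially.
// '&' must be replaced first so earlier replacements are not double-escaped.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical variant of the node's formatter with escaping applied.
// `body` is the Postiz post object from the webhook payload.
function formatPublishMessage(body) {
  const integ = body.integration || {};
  const lines = [
    '<b>Posted to ' + escapeHtml(integ.name || 'unknown') + '</b> (' +
      escapeHtml(integ.providerIdentifier || 'unknown') + ')',
    '',
  ];
  if (body.releaseURL) {
    lines.push('<a href="' + escapeHtml(body.releaseURL) + '">View on Instagram</a>');
  }
  const content = (body.content || '').slice(0, 200);
  if (content) lines.push('', '<i>' + escapeHtml(content) + '</i>');
  return lines.join('\n');
}

console.log(formatPublishMessage({ content: 'cats & dogs <3' }));
```

The same consideration applies wherever a filename or an original caption is re-sent in HTML mode.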

View file

@ -24,6 +24,14 @@ provider "registry.terraform.io/cloudflare/cloudflare" {
]
}
provider "registry.terraform.io/goauthentik/authentik" {
version = "2024.12.1"
constraints = "~> 2024.10"
hashes = [
"h1:roBMd+gi+TGgikH/bMzEI8JfvJiMAQWt+8FmokCrQIs=",
]
}
provider "registry.terraform.io/hashicorp/helm" {
version = "3.1.1"
hashes = [

View file

@ -1,7 +1,7 @@
# Generated by Terragrunt. Sig: nIlQXj57tbuaRZEa
terraform {
backend "pg" {
conn_str = "postgres://terraform_state:SBlzGxotNUN6HH9d0S-m@10.0.20.200:5432/terraform_state?sslmode=disable"
conn_str = "postgres://terraform_state:tOvxJ-7fxdWq0p3jKeYB@10.0.20.200:5432/terraform_state?sslmode=disable"
schema_name = "nextcloud"
}
}

View file

@ -9,6 +9,10 @@ terraform {
source = "cloudflare/cloudflare"
version = "~> 4"
}
authentik = {
source = "goauthentik/authentik"
version = "~> 2024.10"
}
}
}

View file

@ -24,6 +24,14 @@ provider "registry.terraform.io/cloudflare/cloudflare" {
]
}
provider "registry.terraform.io/goauthentik/authentik" {
version = "2024.12.1"
constraints = "~> 2024.10"
hashes = [
"h1:roBMd+gi+TGgikH/bMzEI8JfvJiMAQWt+8FmokCrQIs=",
]
}
provider "registry.terraform.io/hashicorp/helm" {
version = "3.1.1"
hashes = [

View file

@ -9,6 +9,10 @@ terraform {
source = "cloudflare/cloudflare"
version = "~> 4"
}
authentik = {
source = "goauthentik/authentik"
version = "~> 2024.10"
}
}
}

View file

@ -24,6 +24,14 @@ provider "registry.terraform.io/cloudflare/cloudflare" {
]
}
provider "registry.terraform.io/goauthentik/authentik" {
version = "2024.12.1"
constraints = "~> 2024.10"
hashes = [
"h1:roBMd+gi+TGgikH/bMzEI8JfvJiMAQWt+8FmokCrQIs=",
]
}
provider "registry.terraform.io/hashicorp/helm" {
version = "3.1.1"
hashes = [

View file

@ -1,7 +1,7 @@
# Generated by Terragrunt. Sig: nIlQXj57tbuaRZEa
terraform {
backend "pg" {
conn_str = "postgres://terraform_state:SBlzGxotNUN6HH9d0S-m@10.0.20.200:5432/terraform_state?sslmode=disable"
conn_str = "postgres://terraform_state:ts7DGcKmTTY-5ujz4mhh@10.0.20.200:5432/terraform_state?sslmode=disable"
schema_name = "openclaw"
}
}

View file

@ -399,6 +399,44 @@ resource "kubernetes_deployment" "openclaw" {
}
}
# Init 1b: regenerate kubeconfig pointing at the projected SA tokenFile
# so kubectl always reads the fresh, kubelet-rotated token. Without
# this the previously-baked kubeconfig retains a SA token bound to a
# long-dead pod and kubectl returns "must be logged in to the server".
init_container {
name = "setup-kubeconfig"
image = "busybox:1.37"
command = ["sh", "-c", <<-EOT
cat > /home/node/.openclaw/kubeconfig <<'KUBECONFIG_EOF'
apiVersion: v1
kind: Config
clusters:
- cluster:
certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
server: https://kubernetes.default.svc
name: in-cluster
contexts:
- context:
cluster: in-cluster
user: openclaw
namespace: openclaw
name: in-cluster
current-context: in-cluster
users:
- name: openclaw
user:
tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
KUBECONFIG_EOF
chown 1000:1000 /home/node/.openclaw/kubeconfig
chmod 0644 /home/node/.openclaw/kubeconfig
EOT
]
volume_mount {
name = "openclaw-home"
mount_path = "/home/node/.openclaw"
}
}
# Init 2 removed: install-dotfiles init container was cloning dotfiles
# repo via git on every pod start, causing 200+ small NFS writes.
# Dotfiles already exist on NFS at /home/node/.openclaw/dotfiles from

View file

@ -9,6 +9,10 @@ terraform {
source = "cloudflare/cloudflare"
version = "~> 4"
}
authentik = {
source = "goauthentik/authentik"
version = "~> 2024.10"
}
}
}

View file

@ -24,6 +24,14 @@ provider "registry.terraform.io/cloudflare/cloudflare" {
]
}
provider "registry.terraform.io/goauthentik/authentik" {
version = "2024.12.1"
constraints = "~> 2024.10"
hashes = [
"h1:roBMd+gi+TGgikH/bMzEI8JfvJiMAQWt+8FmokCrQIs=",
]
}
provider "registry.terraform.io/hashicorp/helm" {
version = "3.1.1"
hashes = [

View file

@ -1,7 +1,7 @@
# Generated by Terragrunt. Sig: nIlQXj57tbuaRZEa
terraform {
backend "pg" {
conn_str = "postgres://terraform_state:SBlzGxotNUN6HH9d0S-m@10.0.20.200:5432/terraform_state?sslmode=disable"
conn_str = "postgres://terraform_state:tOvxJ-7fxdWq0p3jKeYB@10.0.20.200:5432/terraform_state?sslmode=disable"
schema_name = "paperless-ngx"
}
}

View file

@ -9,6 +9,10 @@ terraform {
source = "cloudflare/cloudflare"
version = "~> 4"
}
authentik = {
source = "goauthentik/authentik"
version = "~> 2024.10"
}
}
}

View file

@ -404,18 +404,52 @@ resource "kubernetes_cron_job_v1" "actualbudget_payroll_sync" {
]
}
# Plan-time read of the ESO-created K8s Secret for Grafana datasource password.
# First apply: -target=kubernetes_manifest.db_external_secret first so the Secret exists.
data "kubernetes_secret" "payslip_ingest_db_creds" {
metadata {
name = "payslip-ingest-db-creds"
namespace = kubernetes_namespace.payslip_ingest.metadata[0].name
# ExternalSecret in the monitoring namespace mirroring the rotating
# payslip-ingest DB password. Grafana mounts this via envFromSecrets in
# monitoring/grafana_chart_values.yaml; the datasource ConfigMap below
# references it as $__env{PAYSLIPS_PG_PASSWORD}. Reloader restarts
# Grafana whenever ESO updates this secret (every 7d on rotation).
resource "kubernetes_manifest" "grafana_payslips_db_external_secret" {
manifest = {
apiVersion = "external-secrets.io/v1beta1"
kind = "ExternalSecret"
metadata = {
name = "grafana-payslips-pg-creds"
namespace = "monitoring"
}
spec = {
refreshInterval = "15m"
secretStoreRef = {
name = "vault-database"
kind = "ClusterSecretStore"
}
target = {
name = "grafana-payslips-pg-creds"
template = {
metadata = {
annotations = {
"reloader.stakater.com/match" = "true"
}
}
data = {
PAYSLIPS_PG_PASSWORD = "{{ .password }}"
}
}
}
data = [{
secretKey = "password"
remoteRef = {
key = "static-creds/pg-payslip-ingest"
property = "password"
}
}]
}
}
depends_on = [kubernetes_manifest.db_external_secret]
}
# Grafana datasource for payslip_ingest PostgreSQL DB.
# Lives in the monitoring namespace so the grafana sidecar (label grafana_datasource=1) picks it up.
# Password is injected via $__env{...} from grafana-payslips-pg-creds (above).
resource "kubernetes_config_map" "grafana_payslips_datasource" {
metadata {
name = "grafana-payslips-datasource"
@ -445,10 +479,11 @@ resource "kubernetes_config_map" "grafana_payslips_datasource" {
timescaledb = false
}
secureJsonData = {
password = data.kubernetes_secret.payslip_ingest_db_creds.data["DB_PASSWORD"]
password = "$__env{PAYSLIPS_PG_PASSWORD}"
}
editable = true
}]
})
}
depends_on = [kubernetes_manifest.grafana_payslips_db_external_secret]
}
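
The datasource above only resolves `$__env{PAYSLIPS_PG_PASSWORD}` if the Grafana container actually has that variable in its environment. A minimal sketch of the Helm values that would provide it, written as HCL to match the rest of this diff. The real values live in monitoring/grafana_chart_values.yaml, which is not shown here, so the local name `grafana_values_fragment` is hypothetical; the `envFromSecrets` and `annotations` keys assume the upstream Grafana chart, and the `search` annotation assumes Reloader is used in its search/match mode:

```hcl
# Hypothetical fragment, not part of this commit: illustrates the wiring the
# comments above describe, under the assumptions stated in the lead-in.
locals {
  grafana_values_fragment = yamlencode({
    # Expose every key of the ESO-managed Secret as a container env var,
    # so PAYSLIPS_PG_PASSWORD is visible to the datasource provisioner.
    envFromSecrets = [{
      name     = "grafana-payslips-pg-creds"
      optional = true # pod can still start before ESO's first sync
    }]
    # Deployment-level annotation: Reloader restarts Grafana only for
    # Secrets/ConfigMaps carrying "reloader.stakater.com/match" = "true".
    annotations = {
      "reloader.stakater.com/search" = "true"
    }
  })
}
```

With this shape the password never enters Terraform state, which is the point of dropping the plan-time `data "kubernetes_secret"` read.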

stacks/postiz/main.tf Normal file

@ -0,0 +1,11 @@
variable "tls_secret_name" {
type = string
sensitive = true
}
variable "nfs_server" { type = string }
module "postiz" {
source = "./modules/postiz"
tls_secret_name = var.tls_secret_name
tier = local.tiers.aux
}

@ -0,0 +1,578 @@
#
# Postiz social media post scheduler (Instagram Stories + others).
#
# Chart: oci://ghcr.io/gitroomhq/postiz-helmchart/charts/postiz-app (v1.0.5)
# App : ghcr.io/gitroomhq/postiz-app:v2.21.7
#
# Layout:
# - Bundled Postgres + Redis (chart subcharts); fine for v1.
# - Local file storage for uploads on a `proxmox-lvm` PVC mounted at /uploads.
# - JWT_SECRET is sourced from Vault via ESO. The chart's helper-templated
# Secret name is `<release>-secrets`; we pin `fullnameOverride: postiz` so
# the Secret resolves to `postiz-secrets`. The chart already mounts that
# Secret via `envFrom: secretRef: <fullname>-secrets`, so ESO patching the
# same Secret with `creationPolicy: Merge` injects `JWT_SECRET` into the
# pod env without forking the chart.
# - OAuth credentials for Meta/X/LinkedIn etc. are NOT pre-seeded; Postiz
# stores those in its own DB once the user adds providers via the UI.
#
resource "kubernetes_namespace" "postiz" {
metadata {
name = var.namespace
labels = {
tier = var.tier
}
}
lifecycle {
# KYVERNO_LIFECYCLE_V1: goldilocks-vpa-auto-mode ClusterPolicy stamps this label on every namespace
ignore_changes = [metadata[0].labels["goldilocks.fairwinds.com/vpa-update-mode"]]
}
}
module "tls_secret" {
source = "../../../../modules/kubernetes/setup_tls_secret"
namespace = kubernetes_namespace.postiz.metadata[0].name
tls_secret_name = var.tls_secret_name
}
# /uploads PVC keeps user-uploaded media across pod restarts.
resource "kubernetes_persistent_volume_claim" "uploads" {
wait_until_bound = false
metadata {
name = "postiz-uploads"
namespace = kubernetes_namespace.postiz.metadata[0].name
annotations = {
"resize.topolvm.io/threshold" = "80%"
"resize.topolvm.io/increase" = "100%"
"resize.topolvm.io/storage_limit" = "50Gi"
}
}
spec {
access_modes = ["ReadWriteOnce"]
storage_class_name = "proxmox-lvm"
resources {
requests = {
storage = var.storage_size
}
}
}
}
# ExternalSecret: patches the chart-managed `postiz-secrets` Secret with
# JWT_SECRET pulled from Vault. `creationPolicy: Merge` means ESO will not
# take ownership; it just adds/updates the keys it manages, leaving the
# Helm-owned Secret resource intact. The chart's deployment already wires
# this Secret in via `envFrom: secretRef: postiz-secrets`.
resource "kubernetes_manifest" "external_secret_jwt" {
manifest = {
apiVersion = "external-secrets.io/v1beta1"
kind = "ExternalSecret"
metadata = {
name = "postiz-jwt-secret"
namespace = kubernetes_namespace.postiz.metadata[0].name
}
spec = {
refreshInterval = "15m"
secretStoreRef = {
name = "vault-kv"
kind = "ClusterSecretStore"
}
target = {
name = "postiz-secrets"
creationPolicy = "Merge"
}
data = [
{
secretKey = "JWT_SECRET"
remoteRef = { key = "instagram-poster", property = "postiz_jwt_secret" }
},
{
secretKey = "FACEBOOK_APP_ID"
remoteRef = { key = "instagram-poster", property = "facebook_app_id" }
},
{
secretKey = "FACEBOOK_APP_SECRET"
remoteRef = { key = "instagram-poster", property = "facebook_app_secret" }
},
{
secretKey = "INSTAGRAM_APP_ID"
remoteRef = { key = "instagram-poster", property = "instagram_app_id" }
},
{
secretKey = "INSTAGRAM_APP_SECRET"
remoteRef = { key = "instagram-poster", property = "instagram_app_secret" }
},
]
}
}
depends_on = [kubernetes_namespace.postiz]
}
resource "helm_release" "postiz" {
namespace = kubernetes_namespace.postiz.metadata[0].name
name = "postiz"
create_namespace = false
atomic = true
timeout = 600
repository = "oci://ghcr.io/gitroomhq/postiz-helmchart/charts"
chart = "postiz-app"
version = var.chart_version
values = [yamlencode({
fullnameOverride = "postiz"
image = {
repository = "ghcr.io/gitroomhq/postiz-app"
tag = var.image_tag
pullPolicy = "IfNotPresent"
}
service = {
type = "ClusterIP"
port = 80 # chart maps Service port 80 -> targetPort http (containerPort 5000)
}
# Non-secret env. Note: BACKEND_INTERNAL_URL stays in-pod (Postiz convention).
env = {
MAIN_URL = "https://postiz.viktorbarzin.me"
FRONTEND_URL = "https://postiz.viktorbarzin.me"
NEXT_PUBLIC_BACKEND_URL = "https://postiz.viktorbarzin.me/api"
BACKEND_INTERNAL_URL = "http://localhost:3000"
STORAGE_PROVIDER = "local"
UPLOAD_DIRECTORY = "/uploads"
NEXT_PUBLIC_UPLOAD_DIRECTORY = "/uploads"
# Registration disabled: admin user already created; sign-in only.
DISABLE_REGISTRATION = "true"
IS_GENERAL = "true"
NX_ADD_PLUGINS = "false"
# Postiz uses Temporal for cron/scheduling; we bring our own because the Helm chart doesn't ship one.
TEMPORAL_ADDRESS = "temporal:7233"
}
# Postiz reads DATABASE_URL/REDIS_URL from this Secret. The chart does
# NOT auto-wire bundled subcharts, so we have to point at the in-namespace
# PG/Redis Services. ESO patches JWT_SECRET + FACEBOOK_APP_* on top via
# creationPolicy=Merge from secret/instagram-poster.
# Subchart auth uses the chart defaults (postiz / postiz-password,
# postiz-redis-password); both Services are ClusterIP, only routable
# from inside the postiz namespace, so the well-known creds are safe.
secrets = {
DATABASE_URL = "postgresql://postiz:postiz-password@postiz-postgresql:5432/postiz"
REDIS_URL = "redis://default:postiz-redis-password@postiz-redis-master:6379"
JWT_SECRET = ""
# IG-via-Facebook OAuth (Postiz Instagram-Business integration). Empty
# placeholder; ESO patches the real values from Vault below.
FACEBOOK_APP_ID = ""
FACEBOOK_APP_SECRET = ""
# IG standalone (Postiz Instagram-Login integration). Uses the modern
# `instagram_business_*` scopes; does not require the FB Login dance.
INSTAGRAM_APP_ID = ""
INSTAGRAM_APP_SECRET = ""
}
# Use our PVC for uploads (overrides the chart's emptyDir default).
extraVolumes = [{
name = "uploads-volume"
persistentVolumeClaim = {
claimName = kubernetes_persistent_volume_claim.uploads.metadata[0].name
}
}]
extraVolumeMounts = [{
name = "uploads-volume"
mountPath = "/uploads"
}]
# Postiz runs frontend (Next 16) + backend (NestJS) + orchestrator
# (Temporal worker with webpack bundling) in one pod. The orchestrator
# alone bundles ~3MB JS per task queue, and on cold start it bundles
# several queues, which pushed peak RSS past 2Gi and caused an OOMKill mid-NestJS init.
resources = {
requests = {
cpu = "100m"
memory = "512Mi"
}
limits = {
memory = "4Gi"
}
}
# Bundled stateful deps: fine for v1, reconsider promotion to CNPG later.
# Subchart passwords intentionally left to chart defaults; the bundled
# PG/Redis Services are ClusterIP and only routable from the postiz
# namespace, so the credentials never leave the pod network. Promotion to
# CNPG with Vault-rotated creds is the next step.
# Bitnami removed bitnami/postgresql + bitnami/redis from DockerHub
# (catalog change under Broadcom, Aug 2025). Older tags moved to bitnamilegacy/*.
postgresql = {
enabled = true
image = {
registry = "docker.io"
repository = "bitnamilegacy/postgresql"
tag = "16.4.0-debian-12-r7"
}
auth = {
username = "postiz"
database = "postiz"
}
}
redis = {
enabled = true
image = {
registry = "docker.io"
repository = "bitnamilegacy/redis"
tag = "7.4.0-debian-12-r2"
}
}
})]
depends_on = [
kubernetes_persistent_volume_claim.uploads,
kubernetes_manifest.external_secret_jwt,
]
}
# Two ingresses on the same host. /uploads/* must be reachable WITHOUT auth
# so Meta's IG Graph API fetcher can pull the JPEG when Postiz hands it the
# upload URL. When behind Authentik, Meta receives a 302 to the login page
# and rejects with error code 36001 (Postiz mistranslates this as "Invalid
# Instagram image resolution"). Everything else stays behind Authentik.
module "ingress_uploads_public" {
source = "../../../../modules/kubernetes/ingress_factory"
dns_type = "proxied"
namespace = kubernetes_namespace.postiz.metadata[0].name
name = "postiz-uploads"
host = var.host
service_name = "postiz"
port = 80
protected = false
ingress_path = ["/uploads"]
tls_secret_name = var.tls_secret_name
}
module "ingress" {
source = "../../../../modules/kubernetes/ingress_factory"
dns_type = "none" # DNS already created by ingress_uploads_public
namespace = kubernetes_namespace.postiz.metadata[0].name
name = "postiz"
host = var.host
service_name = "postiz"
port = 80
protected = true # Authentik forward-auth on the UI / API path
ingress_path = ["/"]
tls_secret_name = var.tls_secret_name
extra_annotations = {
"gethomepage.dev/enabled" = "true"
"gethomepage.dev/name" = "Postiz"
"gethomepage.dev/description" = "Social media post scheduler"
"gethomepage.dev/icon" = "postiz.png"
"gethomepage.dev/group" = "Automation"
"gethomepage.dev/pod-selector" = ""
}
}
#
# Temporal: the cron/workflow engine Postiz requires for scheduled posts.
#
# Lightweight single-replica deployment using temporalio/auto-setup, backed
# by the bundled postiz-postgresql (separate `temporal` database). Visibility
# search via Elasticsearch is disabled (ENABLE_ES=false); Postiz only uses
# the workflow engine, not visibility, so SQL is enough.
#
# Important: temporalio/auto-setup creates schemas in the `temporal` and
# `temporal_visibility` databases on first boot. We pre-create them with an
# init container running psql against postiz-postgresql.
#
resource "kubernetes_deployment" "temporal" {
metadata {
name = "temporal"
namespace = kubernetes_namespace.postiz.metadata[0].name
labels = {
app = "temporal"
}
}
spec {
replicas = 1
strategy {
type = "Recreate"
}
selector {
match_labels = { app = "temporal" }
}
template {
metadata {
labels = { app = "temporal" }
}
spec {
# Pre-create the two databases Temporal expects on the bundled PG.
init_container {
name = "create-temporal-dbs"
image = "docker.io/bitnamilegacy/postgresql:16.4.0-debian-12-r7"
env {
name = "PGPASSWORD"
value = "postiz-password"
}
command = ["/bin/bash", "-c"]
args = [
<<-EOT
set -e
for db in temporal temporal_visibility; do
psql -h postiz-postgresql -U postiz -d postgres -tc "SELECT 1 FROM pg_database WHERE datname='$db'" | grep -q 1 \
|| psql -h postiz-postgresql -U postiz -d postgres -c "CREATE DATABASE \"$db\""
done
EOT
]
}
container {
name = "temporal"
image = "temporalio/auto-setup:1.28.1"
port {
container_port = 7233
name = "grpc"
}
env {
name = "DB"
value = "postgres12"
}
env {
name = "DB_PORT"
value = "5432"
}
env {
name = "POSTGRES_USER"
value = "postiz"
}
env {
name = "POSTGRES_PWD"
value = "postiz-password"
}
env {
name = "POSTGRES_SEEDS"
value = "postiz-postgresql"
}
env {
name = "DBNAME"
value = "temporal"
}
env {
name = "VISIBILITY_DBNAME"
value = "temporal_visibility"
}
env {
name = "ENABLE_ES"
value = "false"
}
env {
name = "TEMPORAL_NAMESPACE"
value = "default"
}
# NOTE: not setting DYNAMIC_CONFIG_FILE_PATH; that file isn't
# bundled in temporalio/auto-setup. Defaults are fine for our
# use (Postiz only needs the workflow engine, not dynamic config).
resources {
requests = {
cpu = "50m"
memory = "256Mi"
}
limits = {
memory = "1Gi"
}
}
# Auto-setup runs schema migrations on first boot; give it time.
startup_probe {
tcp_socket {
port = 7233
}
failure_threshold = 30
period_seconds = 5
initial_delay_seconds = 10
}
liveness_probe {
tcp_socket {
port = 7233
}
period_seconds = 30
}
}
}
}
}
lifecycle {
ignore_changes = [spec[0].template[0].spec[0].dns_config] # KYVERNO_LIFECYCLE_V1
}
depends_on = [helm_release.postiz]
}
resource "kubernetes_service" "temporal" {
metadata {
name = "temporal"
namespace = kubernetes_namespace.postiz.metadata[0].name
}
spec {
selector = { app = "temporal" }
port {
name = "grpc"
port = 7233
target_port = 7233
}
}
}
# One-shot Job (`kubernetes_job.temporal_search_attr_cleanup`, defined after
# the backup CronJob below): removes the two default Text-typed search
# attributes (CustomTextField, CustomStringField) that temporalio/auto-setup
# ships with. Postiz needs to register `organizationId` + `postId`, and SQL
# visibility caps at 3 Text attributes total; without this, Postiz's
# NestJS bootstrap crashes with "cannot have more than 3 search attribute
# of type Text" and the backend never starts.
# Upstream issue: https://github.com/gitroomhq/postiz-app/issues/1504
#
# Backup CronJob: nightly pg_dump of the bundled postiz-postgresql to NFS.
#
# The bundled PostgreSQL StatefulSet uses local-path storage on the K8s node
# OS disk (chart default), which is NOT covered by Layer 1 (LVM thin
# snapshots) or Layer 2 (sda file backup) of the 3-2-1 pipeline. A pg_dump
# CronJob writing to /srv/nfs/postiz-backup/ closes the gap: dumps land on
# Proxmox host NFS covered by inotify-driven offsite sync to Synology.
# Three databases are dumped: postiz (app data), temporal (workflow engine),
# temporal_visibility (workflow search). Bitnami chart-default credentials
# are used: the same creds the Postiz pod itself uses, scoped to the postiz
# namespace via ClusterIP-only Services.
#
module "nfs_backup_host" {
source = "../../../../modules/kubernetes/nfs_volume"
name = "postiz-backup-host"
namespace = kubernetes_namespace.postiz.metadata[0].name
nfs_server = "192.168.1.127"
nfs_path = "/srv/nfs/postiz-backup"
}
resource "kubernetes_cron_job_v1" "postgres_backup" {
metadata {
name = "postiz-postgres-backup"
namespace = kubernetes_namespace.postiz.metadata[0].name
labels = { app = "postiz", component = "backup" }
}
spec {
schedule = "0 3 * * *"
concurrency_policy = "Forbid"
successful_jobs_history_limit = 3
failed_jobs_history_limit = 5
job_template {
metadata {}
spec {
backoff_limit = 1
ttl_seconds_after_finished = 86400
template {
metadata {
labels = { app = "postiz", component = "backup" }
}
spec {
restart_policy = "OnFailure"
container {
name = "backup"
# Same image/pattern as dbaas/postgresql-backup: official postgres
# client tools + apt-installed curl for the Pushgateway push. The
# bitnamilegacy/postgresql variant is stripped (no curl/wget/python),
# so the metric push silently failed there.
image = "docker.io/library/postgres:16.4-bullseye"
command = ["/bin/bash", "-c"]
args = [
<<-EOT
set -uo pipefail
apt-get update -qq && apt-get install -yqq curl >/dev/null 2>&1 || true
TIMESTAMP=$(date +%Y%m%d_%H%M)
BACKUP_DIR=/backup
STATUS=0
for db in postiz temporal temporal_visibility; do
echo "Dumping $db..."
if PGPASSWORD=postiz-password pg_dump -h postiz-postgresql -U postiz \
--format=custom --compress=6 \
--file="$BACKUP_DIR/$db-$TIMESTAMP.dump" \
"$db"; then
echo " OK: $db ($(du -h "$BACKUP_DIR/$db-$TIMESTAMP.dump" | cut -f1))"
else
echo " FAIL: $db" >&2
STATUS=1
fi
done
find "$BACKUP_DIR" -name '*.dump' -mtime +30 -delete 2>/dev/null || true
{
echo "backup_last_run_timestamp $(date +%s)"
echo "backup_last_status $STATUS"
[ "$STATUS" -eq 0 ] && echo "backup_last_success_timestamp $(date +%s)"
} | curl -sf --connect-timeout 5 --max-time 10 --data-binary @- \
"http://prometheus-prometheus-pushgateway.monitoring:9091/metrics/job/postiz-postgres-backup" || true
exit $STATUS
EOT
]
volume_mount {
name = "backup"
mount_path = "/backup"
}
resources {
requests = { cpu = "10m", memory = "64Mi" }
limits = { memory = "256Mi" }
}
}
volume {
name = "backup"
persistent_volume_claim {
claim_name = module.nfs_backup_host.claim_name
}
}
}
}
}
}
}
lifecycle {
ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config] # KYVERNO_LIFECYCLE_V1
}
depends_on = [helm_release.postiz]
}
resource "kubernetes_job" "temporal_search_attr_cleanup" {
metadata {
name = "temporal-search-attr-cleanup"
namespace = kubernetes_namespace.postiz.metadata[0].name
}
spec {
backoff_limit = 30
ttl_seconds_after_finished = 300
template {
metadata {}
spec {
restart_policy = "OnFailure"
container {
name = "cleanup"
image = "temporalio/auto-setup:1.28.1"
command = ["/bin/sh", "-c"]
args = [
<<-EOT
set -e
# Wait for Temporal to be reachable (auto-setup may take 30s).
for i in $(seq 1 60); do
if temporal --address temporal:7233 operator search-attribute list >/dev/null 2>&1; then break; fi
sleep 5
done
for attr in CustomTextField CustomStringField; do
if temporal --address temporal:7233 operator search-attribute list 2>/dev/null | grep -q "$attr"; then
temporal --address temporal:7233 operator search-attribute remove --name "$attr" --yes
fi
done
EOT
]
}
}
}
}
wait_for_completion = false
lifecycle {
ignore_changes = [spec[0].template[0].spec[0].dns_config] # KYVERNO_LIFECYCLE_V1
}
depends_on = [kubernetes_deployment.temporal]
}
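
The backup CronJob in this file pushes `backup_last_success_timestamp` to the Pushgateway, but nothing in this diff alerts when that value goes stale. A minimal sketch of such an alert follows. It assumes the Prometheus Operator `PrometheusRule` CRD is installed and that the Pushgateway is scraped with `honor_labels: true`, so the `job` label is `postiz-postgres-backup`. The resource name, rule names, and the 36-hour threshold are hypothetical:

```hcl
# Hypothetical alert, not part of this commit. Fires when the nightly
# pg_dump has not reported a success for 36h (129600s), i.e. one missed
# run plus some slack.
resource "kubernetes_manifest" "postiz_backup_stale_alert" {
  manifest = {
    apiVersion = "monitoring.coreos.com/v1"
    kind       = "PrometheusRule"
    metadata = {
      name      = "postiz-backup-stale"
      namespace = "monitoring"
    }
    spec = {
      groups = [{
        name = "postiz-backup"
        rules = [{
          alert = "PostizBackupStale"
          # The metric name matches what the CronJob script echoes into curl.
          expr  = "time() - backup_last_success_timestamp{job=\"postiz-postgres-backup\"} > 129600"
          "for" = "15m"
          labels = {
            severity = "warning"
          }
          annotations = {
            summary = "No successful Postiz pg_dump in the last 36 hours"
          }
        }]
      }]
    }
  }
}
```

If the Pushgateway scrape does not honor pushed labels, the selector would likely need `exported_job` instead of `job`.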

@ -0,0 +1,40 @@
variable "tls_secret_name" {
type = string
sensitive = true
description = "Name of the wildcard TLS Secret to copy into the postiz namespace."
}
variable "tier" {
type = string
description = "Workload tier label applied to the namespace (e.g. 4-aux)."
}
variable "namespace" {
type = string
default = "postiz"
description = "Kubernetes namespace for Postiz."
}
variable "host" {
type = string
default = "postiz"
description = "Ingress hostname label (joined with root_domain by ingress_factory)."
}
variable "image_tag" {
type = string
default = "v2.21.7"
description = "Postiz container image tag."
}
variable "chart_version" {
type = string
default = "1.0.5"
description = "Postiz Helm chart version (OCI ghcr.io/gitroomhq/postiz-helmchart)."
}
variable "storage_size" {
type = string
default = "20Gi"
description = "Persistent volume size for /uploads."
}
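
Every variable above except `tls_secret_name` and `tier` has a default, so the stack's module call can stay short. A sketch of overriding some of the defaults from the stack. The override values are illustrative only and do not come from this commit:

```hcl
# Hypothetical variant of the stacks/postiz/main.tf module call.
module "postiz" {
  source          = "./modules/postiz"
  tls_secret_name = var.tls_secret_name
  tier            = local.tiers.aux

  # Optional overrides; omit any of these to keep the module default.
  storage_size  = "50Gi"    # default "20Gi"; illustrative larger /uploads PVC
  image_tag     = "v2.21.7" # pin explicitly instead of relying on the default
  chart_version = "1.0.5"
}
```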

stacks/postiz/secrets Symbolic link

@ -0,0 +1 @@
../../secrets

@ -0,0 +1,13 @@
include "root" {
path = find_in_parent_folders()
}
dependency "platform" {
config_path = "../platform"
skip_outputs = true
}
dependency "vault" {
config_path = "../vault"
skip_outputs = true
}

@ -131,12 +131,23 @@ resource "kubernetes_service" "privatebin" {
}
}
module "anubis" {
source = "../../modules/kubernetes/anubis_instance"
name = "privatebin"
namespace = kubernetes_namespace.privatebin.metadata[0].name
target_url = "http://${kubernetes_service.privatebin.metadata[0].name}.${kubernetes_namespace.privatebin.metadata[0].name}.svc.cluster.local"
}
module "ingress" {
source = "../../modules/kubernetes/ingress_factory"
namespace = kubernetes_namespace.privatebin.metadata[0].name
name = "privatebin"
host = "pb"
dns_type = "proxied"
service_name = module.anubis.service_name
port = module.anubis.service_port
extra_middlewares = ["traefik-x402@kubernetescrd"]
anti_ai_scraping = false
tls_secret_name = var.tls_secret_name
custom_content_security_policy = "script-src 'self' 'unsafe-inline' 'unsafe-eval' 'wasm-unsafe-eval'"
extra_annotations = {

@ -24,6 +24,14 @@ provider "registry.terraform.io/cloudflare/cloudflare" {
]
}
provider "registry.terraform.io/goauthentik/authentik" {
version = "2024.12.1"
constraints = "~> 2024.10"
hashes = [
"h1:roBMd+gi+TGgikH/bMzEI8JfvJiMAQWt+8FmokCrQIs=",
]
}
provider "registry.terraform.io/hashicorp/helm" {
version = "3.1.1"
hashes = [

@ -330,13 +330,25 @@ resource "kubernetes_service" "realestate-crawler-api" {
}
}
# Anubis fronts the UI ingress only; the /api ingress (`module "ingress-api"`)
# stays direct so XHRs from the UI bypass the challenge.
module "anubis" {
source = "../../modules/kubernetes/anubis_instance"
name = "wrongmove"
namespace = kubernetes_namespace.realestate-crawler.metadata[0].name
target_url = "http://realestate-crawler-ui.${kubernetes_namespace.realestate-crawler.metadata[0].name}.svc.cluster.local"
}
module "ingress" {
source = "../../modules/kubernetes/ingress_factory"
dns_type = "proxied"
namespace = kubernetes_namespace.realestate-crawler.metadata[0].name
name = "wrongmove"
service_name = "realestate-crawler-ui"
tls_secret_name = var.tls_secret_name
source = "../../modules/kubernetes/ingress_factory"
dns_type = "proxied"
namespace = kubernetes_namespace.realestate-crawler.metadata[0].name
name = "wrongmove"
service_name = module.anubis.service_name
port = module.anubis.service_port
extra_middlewares = ["traefik-x402@kubernetescrd"]
anti_ai_scraping = false
tls_secret_name = var.tls_secret_name
extra_annotations = {
"gethomepage.dev/enabled" = "true"
"gethomepage.dev/name" = "Wrongmove"

@ -9,6 +9,10 @@ terraform {
source = "cloudflare/cloudflare"
version = "~> 4"
}
authentik = {
source = "goauthentik/authentik"
version = "~> 2024.10"
}
}
}

@ -167,25 +167,9 @@ module "docker-registry-ui" {
}
}
# https://registry.viktorbarzin.me/ (Docker CLI push/pull endpoint)
module "docker-registry-cli" {
source = "./factory"
dns_type = "non-proxied"
name = "registry"
external_name = "docker-registry.viktorbarzin.lan"
port = 5050
backend_protocol = "HTTPS"
tls_secret_name = var.tls_secret_name
protected = false # Docker CLI uses htpasswd, NOT Authentik
max_body_size = "0" # unlimited - Docker layers can be large
depends_on = [kubernetes_namespace.reverse-proxy]
extra_annotations = {
# Skip rate-limit (Docker push/pull generates many rapid requests)
# Keep CrowdSec for L7 protection
"traefik.ingress.kubernetes.io/router.middlewares" = "traefik-csp-headers@kubernetescrd,traefik-crowdsec@kubernetescrd"
"gethomepage.dev/enabled" = "false"
}
}
# registry.viktorbarzin.me decommissioned 2026-05-07 (forgejo-registry-consolidation
# Phase 4). Forgejo at forgejo.viktorbarzin.me is the only writable private
# registry now. Pull-through caches stay on registry VM at 10.0.20.10:5000-5040.
# https://valchedrym.viktorbarzin.me/
module "valchedrym" {

@ -24,6 +24,14 @@ provider "registry.terraform.io/cloudflare/cloudflare" {
]
}
provider "registry.terraform.io/goauthentik/authentik" {
version = "2024.12.1"
constraints = "~> 2024.10"
hashes = [
"h1:roBMd+gi+TGgikH/bMzEI8JfvJiMAQWt+8FmokCrQIs=",
]
}
provider "registry.terraform.io/hashicorp/helm" {
version = "3.1.1"
hashes = [

@ -9,6 +9,10 @@ terraform {
source = "cloudflare/cloudflare"
version = "~> 4"
}
authentik = {
source = "goauthentik/authentik"
version = "~> 2024.10"
}
}
}

Some files were not shown because too many files have changed in this diff