infra

Author	SHA1	Message	Date
Viktor Barzin	f10784ddb6	infra: document auth = "app\|none" tier on every legacy ingress Sweep through the 30+ stacks that predated the auth = "app" tier and were tagged auth = "none" without a comment explaining why they weren't behind Authentik. Each is now self-documenting at the call site, so the tg-level anti-exposure guard passes and future readers don't have to reverse-engineer the intent. Flipped 6 stacks from "none" to "app" — their backends have their own user auth and the new tier records that more accurately: - navidrome (Subsonic user/password) - ntfy (deny-all default + user.db tokens) - nextcloud (WebDAV/CalDAV/CardDAV app passwords) - vaultwarden (Bitwarden-compatible token auth) - headscale (OIDC + preauth keys for Tailscale nodes) - paperless-ngx (app-layer login + API tokens) Kept "none" with a comment on the rest — they're genuinely public, webhook receivers, native-protocol endpoints, OAuth callbacks, or Anubis-fronted: authentik (×2 + guest outpost), beads-server (dolt), claude-memory (bearer-token MCP), dawarich, ebooks/book-search-api, fire-planner /api, forgejo (git/OCI native clients), frigate (HA integration), immich/frame, insta2spotify /api, instagram-poster (meta fetcher), k8s-portal, matrix (native bearer), monitoring×2 (HA REST scrapes), n8n (webhooks), nvidia, onlyoffice (JWT), owntracks (HTTP Basic), postiz, privatebin (client-side enc), rybbit (analytics tracker), send (E2E file drop), tuya-bridge (API key), vault (own auth + CLI), webhook_handler, woodpecker (forgejo webhooks + OAuth), xray (×3 VPN transports). real-estate-crawler/main.tf:400 already had its comment from a prior edit — not touched here. No live state changes — auth = "app" produces the same middleware chain as auth = "none" (verified earlier this session). This commit is purely documentation + intent-tagging.	2026-05-22 14:16:44 +00:00
Viktor Barzin	09f83b4e83	fire-planner / k8s-portal / insta2spotify: revert auth=public to auth=none The Phase 4 audit promoted three "smoke-test candidates" from `protected = false` to `auth = "public"`, but all three are XHR / curl-driven endpoints (fetch() calls, automation scripts) that don't survive the 302+cookie redirect dance that the public-auto-login flow requires on first visit. fire-planner's SPA broke immediately — every fetch() to /api/* hit a cross-origin redirect and CORS preflight rejected it. Important learning for the `auth = "public"` design: `auth = "public"` is functionally equivalent to a normal Authentik forward-auth for the FIRST request — it issues a 302 to authentik to set a guest session cookie, then 302s back. This is invisible for top-level browser navigation but BREAKS: - XHR/fetch() under CORS preflight (preflight rejects redirects) - curl/automation scripts that don't preserve cookies across requests - Mobile / native clients that can't follow OAuth-style redirects Use `auth = "public"` only for top-level HTML pages where the user navigates via the browser address bar (or links). For XHR APIs, native-client surfaces, webhooks, OAuth callbacks — use `auth = "none"`. The plan's "smoke test 3 candidates" were misjudged on this front. Reverting all three to `auth = "none"` (their previous behaviour). The end-to-end public flow IS verified working via curl + flow API — the design is sound, just the test targets were wrong. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-22 14:16:42 +00:00
Viktor Barzin	ff5538a667	ingress_factory: replace `protected` bool with `auth` enum + audit pass across 100 stacks Phase 3+4 of default-deny ingress plan. Replaces the `protected = bool` (default false → unprotected) variable in `modules/kubernetes/ingress_factory` with `auth = string` enum (default "required" → fail-closed). Touches every ingress_factory caller so the audit decision is recorded explicitly in code. ingress_factory (Phase 3): - `auth = "required"`: standard Authentik forward-auth (the legacy `protected = true` semantic). - `auth = "public"`: forward-auth via the new `authentik-forward-auth-public` middleware → dedicated public outpost → guest auto-bind. Logged-in users keep their real identity. - `auth = "none"`: no Authentik middleware. For Anubis-fronted content, native client APIs (Git, /v2/, WebDAV), webhook receivers, the Authentik outpost itself. - `effective_anti_ai` default flips ON only when `auth = "none"` (auth-gated ingresses don't need anti-AI noise; the auth flow already discourages bots). Audit pass (Phase 4) across 96 ingress_factory call sites: - 49 explicit `protected = true` → `auth = "required"` - 8 explicit `protected = false` → `auth = "none"` (5) or `auth = "public"` (3) - 64 previously-default (no protected line) → `auth = "required"` ADDED, then reviewed individually: * 9 Anubis-fronted (blog, www, kms, travel, f1, cyberchef, jsoncrack, homepage, wrongmove UI, privatebin) → `auth = "none"` * 22 native-client / programmatic surfaces (Forgejo Git+/v2/, webhook handler, claude-memory MCP, Nextcloud WebDAV, Matrix, Vault CLI/OIDC, xray VPN, ntfy, woodpecker webhooks, n8n triggers, ntfy push, dawarich location ingestion, immich frame kiosk, headscale CP, send anonymous drops, rybbit beacon, vaultwarden API, Authentik UI itself + outposts) → `auth = "none"` * Remaining ~33 → `auth = "required"` confirmed (admin tools, internal UIs, services without app-level auth) - Smoke-test promotions to `auth = "public"`: fire-planner public UI, k8s-portal API, insta2spotify callback. Three call sites in wrapper modules (`stacks/freedify/factory/`, `stacks/reverse-proxy/modules/reverse_proxy/`) keep their internal `protected` bool — they translate to `auth` internally, out of scope for this rename. Behavior change: previously-default ingresses now fail closed (require Authentik login) unless explicitly flipped to `auth = "none"` or `auth = "public"`. This is the audit goal — no more accidentally-unprotected surfaces. Sites that were intentionally public (Anubis content, native APIs, webhooks) are now explicitly recorded as `auth = "none"`. Drive-by: `modules/create-vm/main.tf` picked up cosmetic alignment via `terraform fmt -recursive` during the audit. Behavior-neutral. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-22 14:16:42 +00:00
Viktor Barzin	4a3ca572e8	fire-planner: imagePullPolicy=Always on alembic-migrate init container After a rollout-restart, the main container (default Always for :latest) pulled the new image with alembic 0003, but the init container defaulted to IfNotPresent and reused a cached old image lacking 0003 → "Can't locate revision identified by '0003'" → CrashLoopBackOff. Setting Always on the init container so both containers stay in lockstep across rollouts. Longer term we should switch the deployment to 8-char git-SHA tags per the cluster policy in .claude/CLAUDE.md, but this unblocks the Wave 1 deploy in the meantime.	2026-05-22 14:16:40 +00:00
Viktor Barzin	a5e9fd8c71	fire-planner: expose actualbudget creds via ExternalSecret All checks were successful ci/woodpecker/push/default Pipeline was successful Details ci/woodpecker/push/build-cli Pipeline was successful Details Adds 3 new keys (ACTUALBUDGET_API_URL/KEY/SYNC_ID) sourced from Vault secret/fire-planner so the FastAPI backend can read viktor's spending from the in-cluster actualbudget HTTP API and prefill the Annual spending field on the WhatIf form. Vault keys seeded manually ahead of this commit; ESO has already synced the K8s Secret. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-10 11:12:40 +00:00
Viktor Barzin	2d6812f951	fire-planner: dual ingress — /api/* unprotected, / behind Authentik The SPA can't carry an Authentik session on its own fetch() XHRs in all cases (cross-origin redirect to authentik.viktorbarzin.me on a stale cookie returns HTML, fetch().json() parse fails). Splitting the ingress so /api/ paths skip forward-auth lets the React app talk to its API end-to-end. The browser still has to log in via Authentik to load the SPA at /. Verified end-to-end via chrome-service Playwright: dashboard load, scenario list, what-if run with real Monte Carlo, save-as-scenario round-trip, run-now on detail, delete — all pass. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-10 11:12:40 +00:00
Viktor Barzin	9904561c26	fire-planner: ingress port 8080 (was defaulting to 80) ingress_factory's port var defaults to 80, but fire-planner publishes on 8080. Traefik logged 'Cannot create service error="service port not found"' and 404'd every request. Cloudflare's standard origin-error decoy page (with the noindex meta + cdn-cgi/content honeypot link) made it look like a bot-block, but it was just the upstream coming back 404. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-10 11:12:39 +00:00
Viktor Barzin	4ec40ea804	[forgejo] Phases 3+4+5: cutover, decommission, docs sweep End of forgejo-registry-consolidation. After Phase 0/1 already landed (Forgejo ready, dual-push CI, integrity probe, retention CronJob, images migrated via forgejo-migrate-orphan-images.sh), this commit flips everything off registry.viktorbarzin.me onto Forgejo and removes the legacy infrastructure. Phase 3 — image= flips: * infra/stacks/{payslip-ingest,job-hunter,claude-agent-service, fire-planner,freedify/factory,chrome-service,beads-server}/main.tf — image= now points to forgejo.viktorbarzin.me/viktor/<name>. * infra/stacks/claude-memory/main.tf — also moved off DockerHub (viktorbarzin/claude-memory-mcp:17 → forgejo.viktorbarzin.me/viktor/...). * infra/.woodpecker/{default,drift-detection}.yml — infra-ci pulled from Forgejo. build-ci-image.yml dual-pushes still until next build cycle confirms Forgejo as canonical. * /home/wizard/code/CLAUDE.md — claude-memory-mcp install URL updated. Phase 4 — decommission registry-private: * registry-credentials Secret: dropped registry.viktorbarzin.me / registry.viktorbarzin.me:5050 / 10.0.20.10:5050 auths entries. Forgejo entry is the only one left. * infra/stacks/infra/main.tf cloud-init: dropped containerd hosts.toml entries for registry.viktorbarzin.me + 10.0.20.10:5050. (Existing nodes already had the file removed manually by `setup-forgejo-containerd-mirror.sh` rollout — the cloud-init template only fires on new VM provision.) * infra/modules/docker-registry/docker-compose.yml: registry-private service block removed; nginx 5050 port mapping dropped. Pull- through caches for upstream registries (5000/5010/5020/5030/5040) stay on the VM permanently. * infra/modules/docker-registry/nginx_registry.conf: upstream `private` block + port 5050 server block removed. * infra/stacks/monitoring/modules/monitoring/main.tf: registry_ integrity_probe + registry_probe_credentials resources stripped. forgejo_integrity_probe is the only manifest probe now. Phase 5 — final docs sweep: * infra/docs/runbooks/registry-vm.md — VM scope reduced to pull- through caches; forgejo-registry-breakglass.md cross-ref added. * infra/docs/architecture/ci-cd.md — registry component table + diagram now reflect Forgejo. Pre-migration root-cause sentence preserved as historical context with a pointer to the design doc. * infra/docs/architecture/monitoring.md — Registry Integrity Probe row updated to point at the Forgejo probe. * infra/.claude/CLAUDE.md — Private registry section rewritten end- to-end (auth, retention, integrity, where the bake came from). * prometheus_chart_values.tpl — RegistryManifestIntegrityFailure alert annotation simplified now that only one registry is in scope. Operational follow-up (cannot be done from a TF apply): 1. ssh root@10.0.20.10 — edit /opt/registry/docker-compose.yml to match the new template AND `docker compose up -d --remove-orphans` to actually stop the registry-private container. Memory id=1078 confirms cloud-init won't redeploy on TF apply alone. 2. After 1 week of no incidents, `rm -rf /opt/registry/data/private/` on the VM (~2.6GB freed). 3. Open the dual-push step in build-ci-image.yml and drop registry.viktorbarzin.me:5050 from the `repo:` list — at that point the post-push integrity check at line 33-107 also needs to be repointed at Forgejo or removed (the per-build verify is redundant with the every-15min Forgejo probe). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-07 23:29:34 +00:00
Viktor Barzin	f0ce7b0363	fire-planner: add stack, Vault DB role, dashboard, DB New stacks/fire-planner/ mirrors payslip-ingest layout: - ExternalSecret pulling RECOMPUTE_BEARER_TOKEN from Vault secret/fire-planner - DB ExternalSecret templating DB_CONNECTION_STRING via static role pg-fire-planner - FastAPI Deployment (serve), CronJob (recompute-all monthly on 2nd at 09:00 UTC, scheduled after wealthfolio-sync's 1st at 08:00), ClusterIP Service - Grafana datasource ConfigMap "FirePlanner" — `database` inside jsonData (`cc56ba29` fix; otherwise Grafana 11.2+ hits "you do not have default database") Plus: - vault/main.tf: pg-fire-planner static role (7d rotation), allowed_roles - dbaas/modules/dbaas/main.tf: null_resource creates fire_planner DB+role - monitoring/dashboards/fire-planner.json: 9-panel Finance-folder dashboard (NW timeseries, MC fan chart, success heatmap, lifetime tax bars, years-to-ruin table, optimal leave-UK stat, ending wealth stat, UK success-by-strategy bars, sequence-risk correlation table) - monitoring/modules/monitoring/grafana.tf: register "fire-planner.json" in Finance folder Apply order: 1. vault stack — creates the static role 2. dbaas stack — creates the database & role 3. external-secrets stack picks up vault-database refs (no change needed) 4. fire-planner stack — first apply with -target=kubernetes_manifest.db_external_secret before full apply, per the plan-time-data-source pattern 5. monitoring stack — picks up the new dashboard ConfigMap [ci skip] Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-25 17:27:19 +00:00

9 commits