infra

Author	SHA1	Message	Date
Viktor Barzin	8f0502230b	grafana: env-var datasources + reloader so Vault rotations stop breaking dashboards Wealth, Payslips, and Job-Hunter Grafana datasources all baked the rotating PG password into their ConfigMap at TF-apply time, so every 7-day Vault static-role rotation silently broke the panels until a manual `terragrunt apply`. Same family as the recurring grafana-mysql backend bug: Grafana caches creds at startup and never picks up the new ESO-synced password without a restart. Fix: - Each source stack now creates an ExternalSecret in `monitoring` exposing the rotating password as a `<NAME>_PG_PASSWORD` env var. - Grafana mounts those via `envFromSecrets` (optional=true so a missing source stack doesn't block boot) and the datasource ConfigMaps reference `$__env{<NAME>_PG_PASSWORD}` instead of a literal password. - `reloader.stakater.com/auto: "true"` on the Grafana pod restarts it whenever any of the four DB-cred Secrets is updated. Tested end-to-end: forced `vault write -force database/rotate-role/pg-wealthfolio-sync` → ESO synced (~30s) → reloader fired → Grafana booted with new env in ~50s total → all three /api/datasources/uid/*/health endpoints return "Database Connection OK". Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-09 17:38:38 +00:00
Viktor Barzin	3148d15d5a	[forgejo] Phases 3+4+5: cutover, decommission, docs sweep End of forgejo-registry-consolidation. With Phases 0/1 already landed (Forgejo ready, dual-push CI, integrity probe, retention CronJob, images migrated via forgejo-migrate-orphan-images.sh), this commit flips everything off registry.viktorbarzin.me onto Forgejo and removes the legacy infrastructure. Phase 3 (image= flips): * infra/stacks/{payslip-ingest,job-hunter,claude-agent-service,fire-planner,freedify/factory,chrome-service,beads-server}/main.tf: image= now points to forgejo.viktorbarzin.me/viktor/<name>. * infra/stacks/claude-memory/main.tf: also moved off DockerHub (viktorbarzin/claude-memory-mcp:17 → forgejo.viktorbarzin.me/viktor/...). * infra/.woodpecker/{default,drift-detection}.yml: infra-ci pulled from Forgejo. build-ci-image.yml still dual-pushes until the next build cycle confirms Forgejo as canonical. * /home/wizard/code/CLAUDE.md: claude-memory-mcp install URL updated. Phase 4 (decommission registry-private): * registry-credentials Secret: dropped the registry.viktorbarzin.me / registry.viktorbarzin.me:5050 / 10.0.20.10:5050 auths entries. The Forgejo entry is the only one left. * infra/stacks/infra/main.tf cloud-init: dropped containerd hosts.toml entries for registry.viktorbarzin.me + 10.0.20.10:5050. (Existing nodes already had the file removed manually by the `setup-forgejo-containerd-mirror.sh` rollout; the cloud-init template only fires on new VM provisioning.) * infra/modules/docker-registry/docker-compose.yml: registry-private service block removed; nginx 5050 port mapping dropped. Pull-through caches for upstream registries (5000/5010/5020/5030/5040) stay on the VM permanently. * infra/modules/docker-registry/nginx_registry.conf: upstream `private` block + port 5050 server block removed. * infra/stacks/monitoring/modules/monitoring/main.tf: registry_integrity_probe + registry_probe_credentials resources stripped. forgejo_integrity_probe is the only manifest probe now. Phase 5 (final docs sweep): * infra/docs/runbooks/registry-vm.md: VM scope reduced to pull-through caches; forgejo-registry-breakglass.md cross-ref added. * infra/docs/architecture/ci-cd.md: registry component table + diagram now reflect Forgejo. Pre-migration root-cause sentence preserved as historical context with a pointer to the design doc. * infra/docs/architecture/monitoring.md: Registry Integrity Probe row updated to point at the Forgejo probe. * infra/.claude/CLAUDE.md: Private registry section rewritten end-to-end (auth, retention, integrity, where the bake came from). * prometheus_chart_values.tpl: RegistryManifestIntegrityFailure alert annotation simplified now that only one registry is in scope. Operational follow-up (cannot be done from a TF apply): 1. ssh root@10.0.20.10, edit /opt/registry/docker-compose.yml to match the new template AND run `docker compose up -d --remove-orphans` to actually stop the registry-private container. Memory id=1078 confirms cloud-init won't redeploy on TF apply alone. 2. After 1 week of no incidents, `rm -rf /opt/registry/data/private/` on the VM (~2.6GB freed). 3. Open the dual-push step in build-ci-image.yml and drop registry.viktorbarzin.me:5050 from the `repo:` list. At that point the post-push integrity check at lines 33-107 also needs to be repointed at Forgejo or removed (the per-build verify is redundant with the every-15min Forgejo probe). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-07 18:30:02 +00:00
Viktor Barzin	2224a6b2cc	[job-hunter] Bump image to 92afc38d — Frankfurter FX + comp_table COALESCE	2026-04-19 19:09:54 +00:00
Viktor Barzin	e813170960	[job-hunter] Bump image to 99ab188f — levels.fyi per-level + comp_points 99ab188f adds the structured-comp pipeline: levels.fyi __NEXT_DATA__ scraper, Robert Walters + Hays PDF parser, comp_points/levels tables (alembic 0003), CLI comp/comp-table/comp-band/backfill-levels, and Grafana panels 6-9. Alembic 0003 runs via the existing init container. After apply, exec: kubectl -n job-hunter exec deploy/job-hunter -c job-hunter -- python -m job_hunter backfill-levels; kubectl -n job-hunter exec deploy/job-hunter -c job-hunter -- python -m job_hunter refresh --source levels_fyi; kubectl -n job-hunter exec deploy/job-hunter -c job-hunter -- python -m job_hunter refresh --source uk_surveys	2026-04-19 18:56:20 +00:00
Viktor Barzin	ef53053ae6	[job-hunter] Bump image to 48f8615d — London filter + AI CLI New image adds Alembic 0002 (primary_location column), London-default query/bands/report commands, and FX-priming on refresh so USD/EUR salaries convert correctly. Applied live; 5826 rows backfilled. Refs: code-snp Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 18:13:26 +00:00
Viktor Barzin	fec0bbb7dd	[job-hunter] Pin to first built image tag 9c42eac9 Locally-built image pushed to registry.viktorbarzin.me/job-hunter:9c42eac9 after Woodpecker v3.13 Forgejo webhook parsing bug left CI unable to build the initial image (server/forge/forgejo/helper.go:57 nil pointer panic on parse — see repaired webhooks still not triggering pipelines). Unblocks code-97n (TF apply) without waiting for CI recovery. Refs: code-snp, code-0c6 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 17:48:16 +00:00
Viktor Barzin	c9d6343a9b	[job-hunter] Switch ExternalSecret to explicit UPPERCASE data mappings Replaces dataFrom.extract with per-key `data` entries so the Secret keys in K8s (and therefore env vars in the pod) are always UPPERCASE: WEBHOOK_BEARER_TOKEN, CDIO_API_KEY, SMTP_USERNAME, SMTP_PASSWORD, DIGEST_TO_ADDRESS, DIGEST_FROM_ADDRESS. Vault KV keys at secret/job-hunter stay lowercase (webhook_bearer_token etc.). Refs: code-snp Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 17:23:28 +00:00
Viktor Barzin	e7ce545da2	[job-hunter] Add infra stack + Grafana dashboard + n8n digest workflow New service stack at stacks/job-hunter/ mirroring the payslip-ingest pattern: per-service CNPG database + role (via dbaas null_resource), Vault static role pg-job-hunter (7d rotation), ExternalSecrets for app secrets and DB creds, Deployment with alembic-migrate init container, ClusterIP Service, Grafana datasource ConfigMap. Grafana dashboard job-hunter.json in Finance folder: new roles per day, source breakdown, top companies, GBP salary distribution, recent roles table (sorted by parse confidence then salary). n8n weekly-digest workflow calls POST /digest/generate with bearer auth every Monday 07:00 London; digest_runs table provides idempotency. Refs: code-snp Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 17:09:29 +00:00

8 commits
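The Grafana fix in 8f0502230b works because the datasource ConfigMap now stores a reference rather than the password, and that reference is resolved from the pod environment each time Grafana boots. A minimal Python sketch of the resolution step, assuming a simplified `$__env{NAME}` matcher (this is not Grafana's own implementation; the `WEALTH_PG_PASSWORD` name and the YAML fragment are hypothetical):

```python
import re
from typing import Mapping

# Simplified matcher for Grafana-style $__env{NAME} references (illustrative only).
ENV_REF = re.compile(r"\$__env\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env_refs(text: str, env: Mapping[str, str]) -> str:
    """Replace every $__env{NAME} in text with env[NAME].

    Raises KeyError for an unset NAME; how Grafana itself treats an unset
    variable is not modelled here.
    """
    return ENV_REF.sub(lambda m: env[m.group(1)], text)


# Hypothetical datasource fragment: the ConfigMap holds only the reference.
DATASOURCE_YAML = "secureJsonData:\n  password: $__env{WEALTH_PG_PASSWORD}\n"

if __name__ == "__main__":
    # Two boots around a Vault rotation: same ConfigMap text, different env.
    print(expand_env_refs(DATASOURCE_YAML, {"WEALTH_PG_PASSWORD": "old-pass"}))
    print(expand_env_refs(DATASOURCE_YAML, {"WEALTH_PG_PASSWORD": "new-pass"}))
```

Only the environment differs between the two boots, which is why having reloader restart the pod is enough to pick up a rotated password without re-applying the ConfigMap.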
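Commit c9d6343a9b replaces `dataFrom.extract` with explicit per-key `data` entries so each Kubernetes Secret key is UPPERCASE while the Vault properties stay lowercase. The sketch below builds those entries in Python and prints them as JSON, purely to show the mapping; the real stack expresses this in Terraform. It assumes the Vault property names are the lowercased env var names (the commit only spells out `webhook_bearer_token`), and the `job-hunter` remote key is an assumption, since how the path is written depends on the SecretStore's Vault mount configuration.

```python
import json

# Env var names listed in c9d6343a9b; the Vault properties are assumed to be
# their lowercase forms (e.g. webhook_bearer_token).
ENV_NAMES = [
    "WEBHOOK_BEARER_TOKEN",
    "CDIO_API_KEY",
    "SMTP_USERNAME",
    "SMTP_PASSWORD",
    "DIGEST_TO_ADDRESS",
    "DIGEST_FROM_ADDRESS",
]


def data_entries(remote_key: str, env_names: list[str]) -> list[dict]:
    """Build ExternalSecret spec.data entries: UPPERCASE secretKey <- lowercase property."""
    return [
        {
            "secretKey": name,
            "remoteRef": {"key": remote_key, "property": name.lower()},
        }
        for name in env_names
    ]


if __name__ == "__main__":
    # "job-hunter" as the remote key is an assumption about the SecretStore setup.
    print(json.dumps({"data": data_entries("job-hunter", ENV_NAMES)}, indent=2))
```

Listing keys explicitly trades a little verbosity for a guaranteed env var name, which `dataFrom.extract` cannot give when the source keys are lowercase.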
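In e7ce545da2 the `digest_runs` table is what makes the weekly n8n call to POST /digest/generate safe to repeat. One way such idempotency can work is a unique key per week, so a retry in the same week inserts nothing and the caller skips sending. A minimal sketch using the standard-library `sqlite3` in place of the CNPG Postgres database; the `week_start` column and the `claim_digest_run` function are hypothetical, and the real service's schema may differ.

```python
import sqlite3
from datetime import date, timedelta


def week_start(day: date) -> date:
    """Monday of the week containing day (the digest is scheduled for Mondays)."""
    return day - timedelta(days=day.weekday())


def claim_digest_run(conn: sqlite3.Connection, today: date) -> bool:
    """Record this week's run; return False if it was already recorded."""
    cur = conn.execute(
        # Hypothetical schema: one row per week, enforced by the PRIMARY KEY.
        "INSERT OR IGNORE INTO digest_runs (week_start) VALUES (?)",
        (week_start(today).isoformat(),),
    )
    conn.commit()
    # rowcount is 0 when the insert was ignored because the week already exists.
    return cur.rowcount == 1


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE digest_runs (week_start TEXT PRIMARY KEY)")
    print(claim_digest_run(conn, date(2026, 4, 20)))  # first call of the week
    print(claim_digest_run(conn, date(2026, 4, 22)))  # retry later that week
```

Letting the database's unique constraint decide, rather than a read-then-write check, keeps two overlapping calls from both sending a digest.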
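Phase 3 of 3148d15d5a repoints each stack's `image=` at forgejo.viktorbarzin.me/viktor/<name>. The hypothetical helper below sketches that mapping for refs on the legacy private registry, using the hosts whose auths entries Phase 4 drops. It assumes the repository name and tag carry over unchanged, which the commit does not state, and it deliberately leaves other refs (such as the DockerHub claude-memory-mcp image, whose Forgejo path is elided in the commit) untouched.

```python
# Hypothetical helper: legacy registry prefixes taken from the auths entries
# dropped in Phase 4; the Forgejo prefix follows the Phase 3 image= pattern.
LEGACY_PREFIXES = (
    "registry.viktorbarzin.me/",
    "registry.viktorbarzin.me:5050/",
    "10.0.20.10:5050/",
)
FORGEJO_PREFIX = "forgejo.viktorbarzin.me/viktor/"


def to_forgejo(image: str) -> str:
    """Rewrite a legacy private-registry image ref to its Forgejo equivalent.

    Assumes the repository name and tag carry over unchanged; refs that do
    not start with a legacy prefix are returned as-is.
    """
    for prefix in LEGACY_PREFIXES:
        if image.startswith(prefix):
            return FORGEJO_PREFIX + image[len(prefix):]
    return image


if __name__ == "__main__":
    # Tag taken from fec0bbb7dd, which pushed this image to the legacy registry.
    print(to_forgejo("registry.viktorbarzin.me/job-hunter:9c42eac9"))
```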