infra

Author	SHA1	Message	Date
Viktor Barzin	e6699ed20b	uptime-kuma: retry Kuma login in monitor-sync jobs (intermittent socket.io timeout) All checks were successful ci/woodpecker/push/default Pipeline was successful Details The internal + external monitor-sync CronJobs intermittently failed with socketio.exceptions.TimeoutError on api.login(), firing JobFailed -> Slack noise (and leaving monitor sync stale). Kuma 2.3.2 itself is healthy (1/1, 30m CPU); its single Node event loop just briefly stalls under ~300 monitors so the socket.io login handshake occasionally exceeds the client timeout. Wrap connect+login in a 5-attempt / 15s-backoff retry (disconnecting the half-open client between tries) so a transient stall no longer fails the whole job. Applied to both sync scripts. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-13 20:54:14 +00:00
Viktor Barzin	a6381b8cf8	forgejo: custom 8Gi ResourceQuota (was pegged at the 4Gi tier cap) Some checks failed ci/woodpecker/push/default Pipeline failed Details Yesterday's Forgejo 3Gi->4Gi OOM fix pushed its tier-3-edge namespace quota (requests.memory=4Gi) to 100%, firing KubeQuotaAlmostFull + the healthcheck resourcequota check. Forgejo is the git + OCI-registry backbone and legitimately needs ~4Gi, so the edge tier's 4Gi ceiling is too tight. Opt the namespace out of the auto tier quota (resource-governance/custom-quota=true) and define a forgejo-specific ResourceQuota at requests.memory=8Gi, so the 4Gi pod sits at ~50% with headroom. Same opt-out pattern dbaas uses. Re-tiering was rejected: tier 1-cluster is also 4Gi, and 0-core (8Gi) would over-classify Forgejo's priority/eviction. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-13 17:16:47 +00:00
Viktor Barzin	25a39fd54e	k8s-portal: wire private-ghcr pull (allowlist + imagePullSecrets) All checks were successful ci/woodpecker/push/default Pipeline was successful Details k8s-portal was the last in-cluster image build; it now builds on GHA and pushes ghcr.io/viktorbarzin/k8s-portal:latest, which is PRIVATE (infra repo default). To pull it: add k8s-portal to the sync-ghcr-credentials Kyverno allowlist (clones the ghcr-credentials Secret into the namespace) and reference that secret via imagePullSecrets on the deployment — same wiring as tripit/recruiter-responder. Completes the no-local-builds migration so nothing builds container images on the cluster anymore (ADR-0002). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-13 15:38:42 +00:00
Viktor Barzin	a7d33abec9	k8s-portal: commit package.json + lock (force; was gitignored) — unblocks GHA build Some checks failed ci/woodpecker/push/default Pipeline was successful Details Build k8s-portal / build (push) Has been cancelled Details Recovered the real manifest + resolved lockfile (lockfileVersion 3, 71 pkgs) from the running pod. A parent .gitignore force-ignored package.json, so the git source tree was incomplete and the image only ever built manually. Now reproducible on GHA (ADR-0002 no-local-builds). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-13 15:29:27 +00:00
Viktor Barzin	a9b08c03cf	fix(k8s-portal): npm install (no committed lockfile) so GHA can build Some checks are pending Build k8s-portal / build (push) Waiting to run Details ci/woodpecker/push/default Pipeline was successful Details package-lock.json was never committed to either lineage — npm ci needs it, so the build only ever worked from a manual devvm build with a local lock. npm install resolves from package.json, unblocking the GHA build (ADR-0002). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-13 15:26:42 +00:00
Viktor Barzin	b906f61ac3	k8s-portal: build off-infra GHA -> ghcr + Keel; remove Woodpecker build (no-local-builds) Some checks failed ci/woodpecker/push/default Pipeline was canceled Details The last in-cluster image build. GHA build-k8s-portal.yml builds ghcr.io/viktorbarzin/k8s-portal:latest+sha (path-filtered on the Dockerfile dir); Keel (force/poll/match-tag) rolls the deployment. Stack image repointed to ghcr (ignore_changed); .woodpecker/k8s-portal.yml deleted. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-13 15:21:35 +00:00
Viktor Barzin	9501da81a0	dbaas: document postgresql-backup startingDeadlineSeconds rationale All checks were successful ci/woodpecker/push/default Pipeline was successful Details Inline note on why the four backup CronJobs moved 10s->600s (`bda1bdcb`): a 10s deadline silently dropped the 2026-06-13 midnight full-backup run, firing PostgreSQLBackupStale. `bda1bdcb` rode in the same push as a forgejo change that failed CI on a namespace-quota error, so that pipeline failed before the dbaas apply took effect (live deadline was still 10s). This dbaas-only commit re-triggers the dbaas apply at a clean master so the 600s deadline actually goes live. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-13 14:22:24 +00:00
Viktor Barzin	ba72621e52	forgejo: 6Gi exceeded namespace quota, set to 4Gi (quota ceiling) Some checks failed ci/woodpecker/push/default Pipeline was canceled Details The 3Gi->6Gi bump in `ff3cc44a` was rejected by the forgejo namespace tier-quota (requests.memory capped at 4Gi). With Guaranteed QoS the 6Gi request exceeded quota; FailedCreate left forgejo with 0 pods for ~6 min (git remote + OCI registry outage) until I patched the live Deployment back to a schedulable 4Gi. 4Gi is the most the quota allows and is still a headroom bump over the OOM-prone 3Gi. To go higher the tier-quota must be raised in the same change. This reconciles TF to the live 4Gi so the pending/next apply is a no-op rather than reverting to the quota-busting 6Gi. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-13 14:13:36 +00:00
Viktor Barzin	ff3cc44a29	forgejo: raise memory limit from 3Gi to 6Gi (OOMKilled at 3Gi) Some checks failed ci/woodpecker/push/default Pipeline failed Details Forgejo OOMKilled twice on 2026-06-13 at the 3Gi cap (exit 137), briefly taking the git remote and OCI registry down and spiking ingress TTFB to 4.7s and the 4xx rate to 51%. Steady-state is ~2.2Gi but it spiked into the cap (true demand above 3.2Gi). The 2026-06-09 bump to 3Gi was sized for tripit buildkit registry pushes, but that driver is gone now that the Forgejo registry was frozen and emptied today (ADR-0002, images on ghcr), so the spike is git ops / the integrity-probe catalog walk / a possible leak. 6Gi gives headroom on the critical git backbone while we watch whether working-set keeps climbing (which would indicate a leak). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-13 14:02:55 +00:00
Viktor Barzin	bda1bdcbf3	dbaas: widen backup CronJob startingDeadlineSeconds from 10s to 600s The daily full PostgreSQL backup silently skipped its 2026-06-13 00:00 run, leaving the last full dump 37h old and firing the critical PostgreSQLBackupStale alert. Root cause: startingDeadlineSeconds was 10s on all four dbaas backup CronJobs, so when the CronJob controller was more than 10s late to the midnight tick (many IO-heavy backups all fire at 00:00, the known etcd-starvation window) the run was dropped entirely instead of starting late. 600s lets a brief controller lag still launch the job. Applied to all four (mysql + pg, full + per-db) since they share the footgun and the midnight contention. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-13 14:02:54 +00:00
Viktor Barzin	3e82c64a76	docs: sync CI/CD docs to ADR-0002 final state (ghcr + Woodpecker deploy-only) [ci skip] ADR-0002 is fully landed (issues #11-#32 closed): every owned image now builds on GitHub Actions and pushes to ghcr.io/viktorbarzin/<name>, with Woodpecker reduced to deploy-only. The Forgejo container registry is frozen and emptied; there are no in-cluster image builds or CI test runs anywhere. The docs still described the old hybrid topology (DockerHub builds, Woodpecker-native owned-app builds, the per-pattern migration lists, the tripit-only pilot framing), which would mislead future sessions and incident response. This brings the docs to the completed reality (closes #33): - docs/architecture/ci-cd.md: full rewrite as the canonical CI/CD reference — the fleet GHA->ghcr->Woodpecker-deploy pattern, public/private ghcr package split, infra-owned image workflows (incl. infra-ci on ghcr), the frozen Forgejo registry, what Woodpecker still runs, and the #31 decommissions. - .claude/CLAUDE.md: rewrite the "CI/CD Architecture" section to the fleet-wide final state; FIX the stale claim that claude-memory-mcp builds to DockerHub (it is GHA->ghcr); note owned images now live on ghcr and the Forgejo registry is frozen/break-glass near the image-registry bullet. - .claude/reference/service-catalog.md: f1-stream is GHA->ghcr + Woodpecker deploy-only (was "Woodpecker-native build->deploy"). - stacks/{tuya-bridge,android-emulator}/variables.tf + stacks/terminal/main.tf: cosmetic description/comment updates (forgejo -> ghcr; terminal-lobby has no CI pipeline). Description/comment text only — no stack logic changed. Historical records (docs/post-mortems/, docs/plans/) and ADR-0002 itself are left untouched as point-in-time records. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-13 12:55:49 +00:00
Viktor Barzin	6e4db0ddc6	openclaw + f1-stream: last forgejo image refs -> ghcr (ADR-0002 #32 prep) All checks were successful ci/woodpecker/push/default Pipeline was successful Details openclaw's install-nextcloud-todos-plugin init still pulled forgejo nextcloud-todos (would ImagePullBackOff on restart once the forgejo registry is wiped) -> ghcr:latest. f1-stream stack base (KEEL_IGNORE'd, live already ghcr via set-image) repointed for fresh-create correctness. Clears the last LIVE forgejo viktor/* refs before the registry reclaim. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-13 12:36:10 +00:00
Viktor Barzin	eb8b550521	chrome-service: TF-manage novnc image (ghcr:latest), drop its KEEL_IGNORE (ADR-0002 #29) All checks were successful ci/woodpecker/push/default Pipeline was successful Details novnc's image was ignore_changed (KEEL_IGNORE) but nothing manages its tag (keel.sh/policy=never), so the earlier forgejo->ghcr repoint never took. Removing container[1].image from ignore_changes lets terragrunt own novnc=ghcr:latest and roll it. container[0]/[2] (pinned playwright) stay ignored. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-13 09:49:58 +00:00
Viktor Barzin	94a3d1b870	chrome-service-novnc + android-emulator images -> ghcr (ADR-0002 #29/#30) All checks were successful ci/woodpecker/push/default Pipeline was successful Details Both now built by GHA → public ghcr. Repoint stack image bases forgejo→ghcr:latest (terragrunt-managed, imagePullPolicy Always picks up rebuilds). android var default api36-v8 -> latest. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-13 09:43:40 +00:00
Viktor Barzin	a69847a0f3	tripit: enable Wikipedia city cover photos (CITY_IMAGE_PROVIDER=wikipedia, #47) Flips the planning workspace's Stay cover photos from the fake provider to live Wikipedia lead-image fetches (downloaded into STORAGE_DIR, served by the backend, editable per Stay). Part of the new-trip flow feature: every picked destination city gets a banner-ready cover. HOLD-ORDER: pushed only after the tripit image containing CityImageMode.wikipedia rolled out. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-13 09:43:40 +00:00
Viktor Barzin	f61d707d75	travel_blog: remove decommissioned stack (ADR-0002 infra#31) All checks were successful ci/woodpecker/push/default Pipeline was successful Details ci/woodpecker/push/build-cli Pipeline was successful Details Service was already scaled 0/0 and unused (Viktor: 'not used anymore'). Live resources destroyed via scripts/tg destroy (10 resources: deployment, namespace, service, anubis-travel + PDB/cm/svc/secret, ingress, TLS). Removing the stack dir; old Woodpecker build (repo 5) deactivated separately. The harmless legacy 'travel' CNAME->apex in config.tfvars is left (now 404s; removing it would trigger a full-platform apply). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-13 09:32:39 +00:00
Viktor Barzin	90fb0685ae	traefik: x402-gateway image forgejo -> ghcr + KEEL_IGNORE_IMAGE (ADR-0002 infra#28) All checks were successful ci/woodpecker/push/default Pipeline was successful Details ci/woodpecker/push/build-cli Pipeline was successful Details Formalizing x402-gateway CI (was a manual no-CI image). The deployment lives in the traefik module; its image was NOT in ignore_changes, so a set-image deploy would be reverted on the next traefik apply — added it (KEEL_IGNORE_IMAGE). Base repointed to ghcr:latest; the GHA deploy set-images the :sha8. Public ghcr package = no pull secret. Inert on the live pod (image now ignored); rolling cutover keeps forwardAuth up. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-13 02:42:45 +00:00
Viktor Barzin	3960eac716	claude-memory: image base forgejo -> ghcr (ADR-0002 infra#20) Some checks failed ci/woodpecker/push/build-cli Pipeline was successful Details ci/woodpecker/push/default Pipeline was canceled Details GHA now builds+pushes ghcr.io/viktorbarzin/claude-memory-mcp (public). Image is KEEL_IGNORE_IMAGE (set-image managed), so this apply is inert on the live pod; the stale :17 default literal is corrected to :latest. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-13 02:34:20 +00:00
Viktor Barzin	2f3c58dff1	claude-agent-service image -> ghcr across all five consumer stacks (infra#19) All checks were successful ci/woodpecker/push/build-cli Pipeline was successful Details ci/woodpecker/push/default Pipeline was successful Details GHA now builds+pushes ghcr.io/viktorbarzin/claude-agent-service (public package, anonymous pulls). Repointed: claude-agent-service (deployment + git-init/seed-beads-agent inits), claude-breakglass, ci-pipeline-health, beads-server CronJobs, k8s-version-upgrade (tag var 2fd7670d -> latest — the Forgejo registry lost that sha; node caches were the only thing keeping those CronJobs alive). publish-gate: vendor-contact emails (licensing@/legal@/security@/sales@) ruled license-boilerplate, not PII. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-13 01:47:54 +00:00
Viktor Barzin	8aba3a0179	offinfra-onboard --no-deploy; wealthfolio-sync image -> ghcr (ADR-0002 infra#25) All checks were successful ci/woodpecker/push/build-cli Pipeline was successful Details ci/woodpecker/push/default Pipeline was successful Details broker-sync is a CronJob-only consumer (no deployment): new --no-deploy mode skips Woodpecker registration and renders build.yml without the deploy job — :latest+Always CronJobs pick up builds on the next run. wealthfolio stack: ghcr-credentials pull secret + image base repoint. The wealthfolio-sync image regains a reproducible rebuild path. Closes: code-62tm Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-13 01:39:35 +00:00
Viktor Barzin	2dde480795	openclaw: install-recruiter-plugin init image forgejo -> ghcr :latest (infra#27) All checks were successful ci/woodpecker/push/build-cli Pipeline was successful Details ci/woodpecker/push/default Pipeline was successful Details Second half of the recruiter-responder off-infra migration: the first GHA build has published ghcr.io/viktorbarzin/recruiter-responder:{1d99a8d5,latest}, so the openclaw plugin-install init container can now follow the ghcr :latest. The forgejo-side build pipeline was removed by the onboarding commit, so the old forgejo :latest tag is frozen and would silently serve stale plugin code. Deferred from the first commit on purpose - flipping it before the package existed would have wedged the openclaw rollout on ImagePullBackOff. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-13 00:57:30 +00:00
Viktor Barzin	57ff41e47e	recruiter-responder: pull image from ghcr + ghcr-credentials on all consumers (ADR-0002, infra#27) All checks were successful ci/woodpecker/push/build-cli Pipeline was successful Details ci/woodpecker/push/default Pipeline was successful Details Migrating recruiter-responder off in-cluster Woodpecker builds: GHA will build and push ghcr.io/viktorbarzin/recruiter-responder (PRIVATE package). This commit lands the pull-side prerequisites BEFORE the first off-infra build fires: - stacks/recruiter-responder: image base forgejo -> ghcr (inert on the live Deployment - both containers are ignore_changes'd; the Woodpecker deploy moves the tag) + ghcr-credentials imagePullSecrets on the Deployment (covers the recruiter-responder container AND the alembic-migrate init container, which share the image). - stacks/openclaw: ghcr-credentials imagePullSecrets on the openclaw Deployment - its install-recruiter-plugin init container consumes the :latest tag of this image. The image ref itself flips to ghcr in a follow-up once the first GHA build has created the package (flipping now would ImagePullBackOff on a not-yet-existing package and wedge the apply). - stacks/kyverno: allowlist openclaw in sync-ghcr-credentials so the pull secret is cloned into that namespace too. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-13 00:43:35 +00:00
Viktor Barzin	c594274c83	ci: re-apply fire-planner stack after pipeline race All checks were successful ci/woodpecker/push/default Pipeline was successful Details ci/woodpecker/push/build-cli Pipeline was successful Details Comment-only touch so the changed-stack detection applies stacks/fire-planner from the current master tree. Pipeline 150 (commit `f18dfa4c` — the ghcr image base + ghcr-credentials migration for issue #26) was auto-killed when the concurrent nextcloud-todos push superseded it, and pipeline 151 diffed from `f18dfa4c` onward so the fire-planner stack changes were never applied (cronjobs still point at the forgejo image, pod specs lack ghcr-credentials). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-13 00:41:20 +00:00
Viktor Barzin	a264a19629	Merge remote-tracking branch 'forgejo/master' into wizard/nextcloud-todos-ghcr All checks were successful ci/woodpecker/push/default Pipeline was successful Details ci/woodpecker/push/build-cli Pipeline was successful Details	2026-06-13 00:38:27 +00:00
Viktor Barzin	d5c328d23c	nextcloud-todos: image base forgejo -> ghcr (ADR-0002, infra#18) The nextcloud-todos build moved off-infra: GHA builds on the public GitHub mirror and pushes ghcr.io/viktorbarzin/nextcloud-todos (public package, anonymous pulls); Woodpecker repo 207 is deploy-only. First ghcr image (:19c22d8c) is already built, deployed and rolled out, so this repoint lands after the image exists. Both deployment image refs (main + alembic-migrate init) are ignore_changes'd — no live churn, the base matters only on resource (re)create. Old image was pulled from a Forgejo registry package that no longer exists (pods survived on node image cache only). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-13 00:38:25 +00:00
Viktor Barzin	f18dfa4c8b	fire-planner: pull image from ghcr + add ghcr-credentials to all pod specs Some checks failed ci/woodpecker/push/build-cli Pipeline was canceled Details ci/woodpecker/push/default Pipeline was canceled Details Migrating fire-planner off in-cluster Woodpecker builds to GitHub Actions -> ghcr.io (ADR-0002, issue #26). The image base moves forgejo.viktorbarzin.me/viktor/fire-planner -> ghcr.io/viktorbarzin/fire-planner (a PRIVATE ghcr package), so the deployment, all three cronjobs (recompute, col-refresh, examples-weekly) and the examples bulk job gain the ghcr-credentials imagePullSecret (the kyverno sync-ghcr-credentials allowlist already covers the fire-planner namespace). registry-credentials stays alongside so the currently-running sha-pinned forgejo image can still be pulled until the first ghcr deploy lands; the cronjob images are TF literals and flip to ghcr :latest on this apply. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-13 00:38:09 +00:00
Viktor Barzin	cdd60d9078	ci: re-apply instagram-poster + payslip-ingest stacks after pipeline race All checks were successful ci/woodpecker/push/build-cli Pipeline was successful Details ci/woodpecker/push/default Pipeline was successful Details Comment-only touch of both stacks so the changed-stack detection applies them from the current master tree. Two pipelines went wrong in sequence during the parallel ADR-0002 wave-2 migrations (issues #23/#24): - pipeline 146 (instagram-poster stack prep, commit `29c69250`) was auto-killed when the concurrent payslip-ingest push superseded it, so its apply never ran; - restarting it as pipeline 148 inherited CI_PREV_COMMIT_SHA = the NEW branch head (`6928ce0b`) with the OLD checkout (`29c69250`) — a reverse diff that re-applied stacks/payslip-ingest from the pre-migration tree, stripping the ghcr image base + ghcr-credentials pull secrets that pipeline 147 had just applied (2 resources reverted). This commit restores the committed payslip-ingest config exactly as issue #24 landed it and finally applies the instagram-poster ghcr prep from issue #23. Lesson encoded in the comments: do not restart killed infra pipelines after master has moved — re-trigger with a touch commit instead. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-13 00:11:17 +00:00
Viktor Barzin	6928ce0be5	Merge remote-tracking branch 'forgejo/master' into wizard/payslip-ingest-ghcr All checks were successful ci/woodpecker/push/default Pipeline was successful Details ci/woodpecker/push/build-cli Pipeline was successful Details	2026-06-13 00:03:29 +00:00
Viktor Barzin	5d236c2352	payslip-ingest: image base forgejo -> ghcr, ghcr-credentials pull secret, cron to :latest+Always Prep for moving payslip-ingest's image build off-infra to GitHub Actions -> ghcr.io (ADR-0002 wave 2, issue #24). One stack commit before onboarding: - image base repointed forgejo.viktorbarzin.me/viktor/payslip-ingest -> ghcr.io/viktorbarzin/payslip-ingest (private ghcr package) - ghcr-credentials imagePullSecrets added on the Deployment AND the actualbudget-payroll-sync CronJob pod specs (namespace is already in the kyverno sync-ghcr-credentials allowlist; secret verified present) - the CronJob's SHA pin is retired: terragrunt image_tag 4f70681d -> latest plus explicit imagePullPolicy Always on the cron container, per the fleet convention for owned-app CronJobs — one less set-image target, and the cron can never go back to pulling the dead Forgejo tag The Deployment keeps KEEL_IGNORE_IMAGE; its concrete :sha8 tag is set by the Woodpecker deploy pipeline after each GHA build. Closes: nothing yet — the repo-side onboarding (offinfra-onboard) follows. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-13 00:03:11 +00:00
Viktor Barzin	29c6925031	instagram-poster: image base forgejo->ghcr + ghcr-credentials pull secret All checks were successful ci/woodpecker/push/default Pipeline was successful Details ci/woodpecker/push/build-cli Pipeline was successful Details Prep for migrating instagram-poster off in-cluster Woodpecker builds to GitHub Actions -> ghcr.io (ADR-0002, issue #23, PRIVATE-repo path). Viktor asked for the wave-2 migration of instagram-poster per the wave-1 retro recipe: before onboarding, the stack must (a) carry the ghcr-credentials imagePullSecret on the Deployment so the cluster can pull the private ghcr image, and (b) repoint the image base from forgejo.viktorbarzin.me/viktor to ghcr.io/viktorbarzin. The Deployment image is KEEL_IGNORE_IMAGE (ignore_changes), so this apply does NOT roll the pod to a not-yet-existing ghcr image — the live forgejo-built :da5b4191 keeps running until the first GHA build POSTs the Woodpecker deploy. The three CronJobs run curlimages/curl (public DockerHub), not the app image, so they need neither the pull secret nor a repoint. registry-credentials stays for the transition window. Closes: nothing (stack prep only; repo onboarding follows) Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-13 00:02:04 +00:00
Viktor Barzin	72b5843e4b	publish-gate: exclude package-lock + beads tracker from email heuristic; beadboard image base -> ghcr All checks were successful ci/woodpecker/push/default Pipeline was successful Details ci/woodpecker/push/build-cli Pipeline was successful Details infra#17: the gate flagged npm deprecation boilerplate (package-lock.json escapes the *.lock filter) and the upstream fork author's email in tracked .beads data — both already-public upstream content, ruled false positives. Lock files excluded properly; .beads moved to the eyeball inventory. beads-server stack: beadboard image base repointed (deployment image is KEEL-ignored; no CronJobs use it). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 23:52:07 +00:00
Viktor Barzin	57ffd0ed8d	Merge remote-tracking branch 'forgejo/master' into wizard/freedify-mig All checks were successful ci/woodpecker/push/default Pipeline was successful Details ci/woodpecker/push/build-cli Pipeline was successful Details	2026-06-12 23:37:19 +00:00
Viktor Barzin	c16fe56180	freedify: image base forgejo registry -> ghcr (ADR-0002) Freedify builds moved off-infra per issue #22: GitHub Actions on the ViktorBarzin/freedify mirror now builds and pushes the public image ghcr.io/viktorbarzin/freedify, and the Woodpecker deploy pipeline (repo 202) rolls :sha8 via kubectl set image. Both factory deployments (music-viktor, music-emo) now seed from ghcr instead of the retired in-cluster Forgejo build, and the container image joins lifecycle ignore_changes (KEEL_IGNORE_IMAGE) so terraform applies do not revert the deployed :sha8. Landed after the first GHA push so ghcr :latest already existed when this repoint applied. Public package - no pull secret needed. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 23:37:10 +00:00
Viktor Barzin	9f742b544c	kms: image base forgejo registry -> ghcr (ADR-0002 infra#21) All checks were successful ci/woodpecker/push/default Pipeline was successful Details ci/woodpecker/push/build-cli Pipeline was successful Details kms-website moves off in-cluster Woodpecker builds to GHA -> ghcr. The kms-web-page deployment image is ignore_changes'd (CI sets the live tag), so this repoint only governs future creates; package is PUBLIC so no pull secret is wired. No CronJobs in this stack. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 23:30:07 +00:00
Viktor Barzin	fb88440ec4	ci-pipeline-health: billing moved to the enhanced usage endpoint All checks were successful ci/woodpecker/push/default Pipeline was successful Details ci/woodpecker/push/build-cli Pipeline was successful Details The legacy /settings/billing/actions endpoint now returns 410; sum Minutes usageItems from /settings/billing/usage instead (found during the infra#16 retro: June-to-date = 420/2000). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 23:24:18 +00:00
Viktor Barzin	12bdd06f74	kyverno: force_new on sync-ghcr-credentials — generate rules are immutable All checks were successful ci/woodpecker/push/default Pipeline was successful Details ci/woodpecker/push/build-cli Pipeline was successful Details Pipeline 138: the validate-policy webhook denies in-place edits of a generate rule (allowlist additions). force_new = delete+recreate; generated secrets survive and generateExisting re-adopts. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 23:18:15 +00:00
Viktor Barzin	6b0d42c7bc	publish-gate + tuya-bridge ghcr cutover prep (ADR-0002 infra#15) Some checks failed ci/woodpecker/push/build-cli Pipeline was successful Details ci/woodpecker/push/default Pipeline failed Details publish-gate: gitleaks + trufflehog (full history) + PII heuristics; CLEAN verdict gates any public flip, DIRTY = stays private. tuya-bridge: ghcr-credentials pull secret + image base -> ghcr; namespace added to the ghcr-credentials allowlist as a safety net (new ghcr packages default PRIVATE even from public repos — prune after visibility flip). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 23:12:02 +00:00
Viktor Barzin	54dfaf6edc	job-hunter: image base forgejo registry -> ghcr (ADR-0002) All checks were successful ci/woodpecker/push/default Pipeline was successful Details ci/woodpecker/push/build-cli Pipeline was successful Details CronJobs track :latest via the TF literal (unlike the ignore_changes'd deployment), so they kept pulling the dead Forgejo image after the GHA/ghcr cutover — repoint the stack's image base. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 23:06:54 +00:00
Viktor Barzin	1c41781996	job-hunter: ghcr-credentials pull secret on deployment + CronJobs All checks were successful ci/woodpecker/push/default Pipeline was successful Details ci/woodpecker/push/build-cli Pipeline was successful Details ADR-0002 wave 1 (infra#14): job-hunter's image moves to private ghcr; the deployment AND both :latest CronJobs need the Kyverno-cloned pull secret. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 22:56:48 +00:00
Viktor Barzin	baff3d7477	offinfra-onboard: per-repo GHA->ghcr migration tool + f1-stream ghcr pull secret All checks were successful ci/woodpecker/push/default Pipeline was successful Details ci/woodpecker/push/build-cli Pipeline was successful Details ADR-0002 tracer bullet (infra#13), per Viktor's go-ahead. Idempotent script: GitHub mirror repo (create/unarchive/visibility), GHA secrets via gh, Forgejo push-mirror (sync_on_commit) + initial sync, Woodpecker mirror registration, renders build.yml/deploy.yml from templates (single-manifest provenance:false, svu semver to Forgejo, ghcr keep-10 retention, Slack notify-failure, manual-event deploy), removes the old in-cluster build pipeline, commits on the Canonical side. f1-stream stack gains the ghcr-credentials imagePullSecret (first consumer). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 22:21:22 +00:00
Viktor Barzin	3138a0a040	Merge remote-tracking branch 'forgejo/master' into wizard/breakglass Some checks failed ci/woodpecker/push/build-cli Pipeline was successful Details ci/woodpecker/push/default Pipeline failed Details	2026-06-12 21:41:58 +00:00
Viktor Barzin	32cf75635f	claude-breakglass: in-cluster warm break-glass UI for the devvm Stand up the infra for Viktor's break-glass: when the devvm is wedged (cluster healthy), open breakglass.viktorbarzin.me, have Claude SSH in to diagnose/fix, and power-cycle VM 102 via the Proxmox host if needed. App half landed in the claude-agent-service repo. New stack stacks/claude-breakglass/ — own namespace + SA, NO Vault role (ESO syncs only its key, so the pod has zero direct Vault access). Hardened to survive the pressure it exists to fix: priorityClassName tier-0-core, broad node-pressure tolerations, anti-affinity off node1, imagePullPolicy Always. auth="required" ingress so it rides the Authentik resilience proxy and stays reachable via the basic-auth fallback during an auth-stack outage. Runs the shared claude-agent-service image with the breakglass entrypoint. files/breakglass-pve is the PVE forced-command (status|forensics|reset|stop|start|cycle on VM 102, forensics-first). Isolation: the shared claude-agent pod's terraform-state Vault policy is explicitly DENIED secret/claude-breakglass/* (stacks/vault/main.tf) so a prompt-injected agent on that pod can't read the root-on-devvm key. traefik: add a checksum/auth-proxy-htpasswd annotation so the auth-proxy rolls when the emergency basic-auth password rotates (it's a subPath mount that doesn't auto-update) — regenerated this session so Viktor has a known emergency credential, which the auth-stack-outage failure domain requires. Docs: docs/runbooks/breakglass-ui.md (full incident + bootstrap procedure, incl. the per-host from= NAT quirks) and a security.md note recording the two new privileged footholds. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 21:40:17 +00:00
Viktor Barzin	1eee2d6eb6	Merge remote-tracking branch 'forgejo/master' into wizard/tripit-sub-mode All checks were successful ci/woodpecker/push/default Pipeline was successful Details ci/woodpecker/push/build-cli Pipeline was successful Details	2026-06-12 21:17:09 +00:00
Viktor Barzin	42cd7d8272	tripit: flip AUTH_MODE to hybrid + OTA bundle env (Android Shell live) The 81a816f7 image (hybrid auth + OTA endpoints) is rolled out, so the env can flip: AUTH_MODE=hybrid with the tripit-app OIDC knobs makes the bearer-only tripit-api host actually authenticate Shell logins (browser cookie path unchanged); BUNDLE_PUBLIC_BASE pins the signed OTA zip URLs to that host; BUNDLE_TOKEN_SECRET joins the tripit-secrets ES (value already written to Vault secret/tripit). Part of the Android APK work (tripit #50/#51). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 21:16:14 +00:00
Viktor Barzin	02785987dd	ci-pipeline-health: image :latest+Always — registry lost the 2fd7670d tag All checks were successful ci/woodpecker/push/default Pipeline was successful Details ci/woodpecker/push/build-cli Pipeline was successful Details The sha tag other claude-agent-service CronJobs pin no longer exists in the Forgejo registry (node caches mask it); fresh pulls 404. Follow the owned-app CronJob convention until infra#19 moves this image to ghcr. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 21:06:20 +00:00
Viktor Barzin	765cfe803f	tripit: tripit-app provider issues sub = user email (hybrid-auth identity fix) Review of tripit slice #50 caught that the provider's default sub_mode (hashed_user_id) would make Shell JWTs carry a sub that never matches the email-keyed prod user rows - first app login would either 500 in placeholder reconciliation or split the user's identity. sub_mode = user_email makes bearer and forward-auth resolve the same row. Part of the Android APK work (tripit #50). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 21:00:33 +00:00
Viktor Barzin	bd0cb71f17	tts: TCP probes — http liveness killed the server mid-synthesis All checks were successful ci/woodpecker/push/default Pipeline was successful Details ci/woodpecker/push/build-cli Pipeline was successful Details The devnen server runs chunked synthesis as a blocking call inside its async handler, so the event loop (and every HTTP probe) hangs for the whole multi-minute story. Kubelet's http liveness probe (1s timeout) then killed the container mid-story (exit 137, twice within 10 min of the first real drain), which reset the engine, so every following pass started cold and tripit's 120s synthesis budget could never be met — the queue would never drain. TCP probes keep the meaning that matters: uvicorn binds 8004 only after the model finishes loading in the lifespan hook, so readiness still gates 'model loaded', while a GPU-busy server is left alive. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 20:57:28 +00:00
Viktor Barzin	30ff8f2db3	ci: diff changed stacks against CI_PREV_COMMIT_SHA, not HEAD~1 All checks were successful ci/woodpecker/push/default Pipeline was successful Details ci/woodpecker/push/build-cli Pipeline was successful Details HEAD~1 on a merge commit is the feature-branch parent, so the changed-stack detection diffed the WRONG side and silently skipped the stacks the push actually changed — pipeline 128 'succeeded' without applying the new ci-pipeline-health stack. Use the push's true before-state (CI_PREV_COMMIT_SHA) when it resolves, HEAD~1 as fallback (first build / shallow edge cases). Also touches the ci-pipeline-health stack so THIS push applies it. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 20:50:43 +00:00
Viktor Barzin	fb8b6aa2f3	Merge remote-tracking branch 'forgejo/master' into wizard/ci-pipeline-health All checks were successful ci/woodpecker/push/build-cli Pipeline was successful Details ci/woodpecker/push/default Pipeline was successful Details	2026-06-12 20:45:30 +00:00
Viktor Barzin	d02ca4f2db	ci-pipeline-health: daily sweep of the off-infra CI chain (ADR-0002) Viktor asked to monitor the pipelines closely as builds move off-infra (PRD infra#10). New aux stack: daily 07:30 UTC CronJob on the claude-agent-service image running a deterministic shell sweep — GitHub Actions failures/stuck runs across owned repos, Woodpecker pipeline failures, GHA free-tier minutes burn. Healthy = one quiet Slack line; issues = Slack alert + comment on infra#10. In-cluster (not a cloud routine) because Vault + the Woodpecker token are LAN-only. Secrets via ExternalSecret (github_pat deliberately, not the ghcr_pull_token alias — a scoped packages-only rotation couldn't read Actions runs). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 20:45:28 +00:00

1401 commits