infra

Author	SHA1	Message	Date
Viktor Barzin	214638216b	fix(anisette): wait_for_rollout=false so a slow first start can't strand the deploy out of state All checks were successful ci/woodpecker/push/default Pipeline was successful Details The docker.io fix created the deployment, but wait_for_rollout (default true) then hung on the OOMing pod and the apply failed — leaving the deployment in the cluster but NOT in terraform state, so every later apply hit 'deployments.apps "anisette" already exists'. Deleted that orphan and set wait_for_rollout=false (mirrors tts/llama-cpp slow-start services); readiness probe still gates Service traffic. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 20:56:30 +00:00
Viktor Barzin	d8c60d7ab8	t3-afk: dedicated in-cluster T3 Code instance (AFK executor + cockpit) All checks were successful ci/woodpecker/push/default Pipeline was successful Details Slice #2 of claude-agent-service PRD #1 (AFK implementation pipeline). Dedicated in-cluster T3 Code instance the control plane dispatches issues into; runs the issue-implementer agent in a git worktree with a live cockpit. Applied + live 2026-06-14 (9 resources). Pilot-fast: stock docker.io/library/node:24 + install pinned t3@0.0.27 + Claude CLI at startup onto an SSD-NFS PVC. Authentik-gated ingress. issue-implementer behaviour ships as a user-level ~/.claude/CLAUDE.md (T3 hardcodes the system prompt; settingSources loads it) and forbids plan-mode/clarifying-questions so unattended threads don't stall. Keel-excluded (ADR 0003). wait_for_rollout=false (slow first start). Image fully-qualified for the Kyverno trusted-registries allowlist; container mem limit 4Gi (tier-aux LimitRange cap). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 20:06:33 +00:00
Viktor Barzin	bc7b28244f	fix(anisette): raise memory limit to 512Mi — 128Mi OOMKilled at startup Some checks failed ci/woodpecker/push/default Pipeline failed Details The pod CrashLooped with OOMKilled (exit 137): anisette downloads and initializes Apple's CoreADI provisioning library on startup, spiking past the 128Mi limit before it can bind :6969 (empty logs, liveness 'connection refused'). Bump request 256Mi / limit 512Mi; steady state is much lower. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 19:54:13 +00:00
Viktor Barzin	96addf65b4	fix(anisette): docker.io/ image prefix to pass Kyverno require-trusted-registries Some checks failed ci/woodpecker/push/default Pipeline was canceled Details First apply was denied at admission — a bare dadoum/anisette-v3-server@sha256 ref isn't in the trusted-registries allowlist (only enumerated DockerHub user-repo prefixes are). docker.io/* IS allowlisted, so use the explicit registry prefix; still pulls via the 10.0.20.10 pull-through cache. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 19:47:05 +00:00
Viktor Barzin	0bfa6f0774	feat(anisette): self-hosted Apple anisette server for SideStore (infra #40) Some checks failed ci/woodpecker/push/default Pipeline failed Details Deploy a small stateless anisette-data server so the TripIt iOS Shell can be sideloaded with SideStore using a free Apple ID, without brokering the Apple-ID auth dance through a public third-party anisette server (which would see every login). SideStore points at a stable internal endpoint we control. - Image: Dadoum/anisette-v3-server, the de-facto standard anisette-v3 server for SideStore/AltStore. Upstream ships only a mutable :latest (no GitHub releases / semver / sha tags), so pinned by manifest digest instead of a tag per the "never :latest" rule. Pulled from DockerHub via the registry-VM pull-through cache like echo/cyberchef. Diun watches :latest (notify-only) so a new upstream build prompts a digest re-pin. - Stateless: emptyDir backs the provisioning-library cache dir (regenerable download; upstream issue #23 means it doesn't preserve client auth across restarts anyway) — no PVC, no Vault secret. - Internal-only endpoint http://anisette.viktorbarzin.lan (auth=none, allow_local_access_only, ssl_redirect off) — SideStore is a native client that can't do the Authentik cookie dance, same reasoning as android-emulator's adb. The .lan CNAME is auto-created by technitium-ingress-dns-sync; never publicly exposed. Mirrors the echo/networking-toolbox/android-emulator stack pattern. Service catalog updated. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 19:35:57 +00:00
Viktor Barzin	fe1f8d62e7	tripit: re-apply tripit stack to land CITY_IMAGE_PROVIDER=wikipedia All checks were successful ci/woodpecker/push/default Pipeline was successful Details The commit that enabled real city cover photos (`a69847a0`, CITY_IMAGE_PROVIDER=wikipedia, #47) was committed to master but its CI run skipped the tripit stack apply (changed-stack diff race — same class as the prior "re-apply after pipeline race" fixes). The env never landed in-cluster, so the provider stayed on its fake 1x1-PNG default and every trip/stay cover rendered blank/placeholder in prod. This comment touch forces CI to re-apply the tripit stack; terraform then reconciles the drift (desired HCL already has the env) so the deployment picks up CITY_IMAGE_PROVIDER=wikipedia. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 17:45:07 +00:00
Viktor Barzin	2df6ebf305	health: fix middleware ref namespace prefix (restore site from 404) Some checks failed ci/woodpecker/push/default Pipeline was canceled Details My previous commit referenced the new limiter as `health-rate-limit@kubernetescrd`, omitting the namespace prefix. Traefik CRD middleware refs are `<namespace>-<name>@kubernetescrd`, and the Middleware lives in the `traefik` ns, so the router couldn't resolve it — Traefik failed the whole health.viktorbarzin.me router and returned 404 on every path (the app + pod were healthy throughout; verified via port-forward). Correct it to `traefik-health-rate-limit@kubernetescrd`, matching the working traefik-tripit-rate-limit / traefik-actualbudget-rate-limit references. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 17:43:08 +00:00
Viktor Barzin	086ff85911	health: dedicated 100/1000 rate limit for the redesigned SPA Some checks failed ci/woodpecker/push/default Pipeline failed Details Viktor hit 429s browsing the redesigned health app. The default shared limiter is 10 req/s / burst 50, but each page load is the shell (JS chunks + two self-hosted Geist woff2) plus a 5-8 call API burst, so fast tab-to-tab navigation from one client IP overruns burst 50 — Traefik 429s the tail and the affected cards/pages render empty. Give health its own limiter (average 100, burst 1000) and skip the default, exactly as tripit/immich/actualbudget/ha-sofia already do for the same parallel-burst pattern. Attached via the ingress_factory escape hatch (skip_default_rate_limit + extra_middlewares). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 13:03:51 +00:00
Viktor Barzin	6dc77f4612	uptime-kuma: add CONTEXT.md + ADR-0001 (intentionally lean; sizing/placement review) All checks were successful ci/woodpecker/push/default Pipeline was successful Details Documents the 2026-06-13 right-sizing review: Kuma is already lean (~1 check/s, 227 monitors mostly at 300s, 77MB on shared MySQL, 30d retention); the 'scraping too much' concern traced to a fixed socket.io login-timeout incident, not load. Records the deliberate decisions (keep per-service [External] monitors over canaries; keep datastore on shared mysql.dbaas) with rejected alternatives + rationale, plus the known internal-sync no-prune gap (stale Goldilocks monitor cleaned up by hand). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 09:11:22 +00:00
Viktor Barzin	05bec26d09	health: internal test-access ingress + DEV_AUTH_EMAIL (ADR-0008) Some checks failed ci/woodpecker/push/default Pipeline was canceled Details Add health-test.viktorbarzin.lan (auth=none, allow_local_access_only, anti-AI off) pointing at the same health deployment, plus a DEV_AUTH_EMAIL=vbarzin@gmail.com env on the container. Lets automated E2E / Playwright / manual screenshots reach the live app without the Authentik SSO redirect, for testing — while the public health.viktorbarzin.me ingress stays auth=required (forward-auth fails closed, so the public path always carries the real X-authentik-email header and never hits the DEV_AUTH_EMAIL fallback). LAN-only, no public exposure. Decision recorded in health repo ADR-0008. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-14 04:02:34 +00:00
Viktor Barzin	e6699ed20b	uptime-kuma: retry Kuma login in monitor-sync jobs (intermittent socket.io timeout) All checks were successful ci/woodpecker/push/default Pipeline was successful Details The internal + external monitor-sync CronJobs intermittently failed with socketio.exceptions.TimeoutError on api.login(), firing JobFailed -> Slack noise (and leaving monitor sync stale). Kuma 2.3.2 itself is healthy (1/1, 30m CPU); its single Node event loop just briefly stalls under ~300 monitors so the socket.io login handshake occasionally exceeds the client timeout. Wrap connect+login in a 5-attempt / 15s-backoff retry (disconnecting the half-open client between tries) so a transient stall no longer fails the whole job. Applied to both sync scripts. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-13 20:54:14 +00:00
Viktor Barzin	a6381b8cf8	forgejo: custom 8Gi ResourceQuota (was pegged at the 4Gi tier cap) Some checks failed ci/woodpecker/push/default Pipeline failed Details Yesterday's Forgejo 3Gi->4Gi OOM fix pushed its tier-3-edge namespace quota (requests.memory=4Gi) to 100%, firing KubeQuotaAlmostFull + the healthcheck resourcequota check. Forgejo is the git + OCI-registry backbone and legitimately needs ~4Gi, so the edge tier's 4Gi ceiling is too tight. Opt the namespace out of the auto tier quota (resource-governance/custom-quota=true) and define a forgejo-specific ResourceQuota at requests.memory=8Gi, so the 4Gi pod sits at ~50% with headroom. Same opt-out pattern dbaas uses. Re-tiering was rejected: tier 1-cluster is also 4Gi, and 0-core (8Gi) would over-classify Forgejo's priority/eviction. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-13 17:16:47 +00:00
Viktor Barzin	25a39fd54e	k8s-portal: wire private-ghcr pull (allowlist + imagePullSecrets) All checks were successful ci/woodpecker/push/default Pipeline was successful Details k8s-portal was the last in-cluster image build; it now builds on GHA and pushes ghcr.io/viktorbarzin/k8s-portal:latest, which is PRIVATE (infra repo default). To pull it: add k8s-portal to the sync-ghcr-credentials Kyverno allowlist (clones the ghcr-credentials Secret into the namespace) and reference that secret via imagePullSecrets on the deployment — same wiring as tripit/recruiter-responder. Completes the no-local-builds migration so nothing builds container images on the cluster anymore (ADR-0002). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-13 15:38:42 +00:00
Viktor Barzin	a7d33abec9	k8s-portal: commit package.json + lock (force; was gitignored) — unblocks GHA build Some checks failed ci/woodpecker/push/default Pipeline was successful Details Build k8s-portal / build (push) Has been cancelled Details Recovered the real manifest + resolved lockfile (lockfileVersion 3, 71 pkgs) from the running pod. A parent .gitignore force-ignored package.json, so the git source tree was incomplete and the image only ever built manually. Now reproducible on GHA (ADR-0002 no-local-builds). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-13 15:29:27 +00:00
Viktor Barzin	a9b08c03cf	fix(k8s-portal): npm install (no committed lockfile) so GHA can build Some checks are pending Build k8s-portal / build (push) Waiting to run Details ci/woodpecker/push/default Pipeline was successful Details package-lock.json was never committed to either lineage — npm ci needs it, so the build only ever worked from a manual devvm build with a local lock. npm install resolves from package.json, unblocking the GHA build (ADR-0002). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-13 15:26:42 +00:00
Viktor Barzin	b906f61ac3	k8s-portal: build off-infra GHA -> ghcr + Keel; remove Woodpecker build (no-local-builds) Some checks failed ci/woodpecker/push/default Pipeline was canceled Details The last in-cluster image build. GHA build-k8s-portal.yml builds ghcr.io/viktorbarzin/k8s-portal:latest+sha (path-filtered on the Dockerfile dir); Keel (force/poll/match-tag) rolls the deployment. Stack image repointed to ghcr (ignore_changed); .woodpecker/k8s-portal.yml deleted. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-13 15:21:35 +00:00
Viktor Barzin	9501da81a0	dbaas: document postgresql-backup startingDeadlineSeconds rationale All checks were successful ci/woodpecker/push/default Pipeline was successful Details Inline note on why the four backup CronJobs moved 10s->600s (`bda1bdcb`): a 10s deadline silently dropped the 2026-06-13 midnight full-backup run, firing PostgreSQLBackupStale. `bda1bdcb` rode in the same push as a forgejo change that failed CI on a namespace-quota error, so that pipeline failed before the dbaas apply took effect (live deadline was still 10s). This dbaas-only commit re-triggers the dbaas apply at a clean master so the 600s deadline actually goes live. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-13 14:22:24 +00:00
Viktor Barzin	ba72621e52	forgejo: 6Gi exceeded namespace quota, set to 4Gi (quota ceiling) Some checks failed ci/woodpecker/push/default Pipeline was canceled Details The 3Gi->6Gi bump in `ff3cc44a` was rejected by the forgejo namespace tier-quota (requests.memory capped at 4Gi). With Guaranteed QoS the 6Gi request exceeded quota; FailedCreate left forgejo with 0 pods for ~6 min (git remote + OCI registry outage) until I patched the live Deployment back to a schedulable 4Gi. 4Gi is the most the quota allows and is still a headroom bump over the OOM-prone 3Gi. To go higher the tier-quota must be raised in the same change. This reconciles TF to the live 4Gi so the pending/next apply is a no-op rather than reverting to the quota-busting 6Gi. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-13 14:13:36 +00:00
Viktor Barzin	ff3cc44a29	forgejo: raise memory limit from 3Gi to 6Gi (OOMKilled at 3Gi) Some checks failed ci/woodpecker/push/default Pipeline failed Details Forgejo OOMKilled twice on 2026-06-13 at the 3Gi cap (exit 137), briefly taking the git remote and OCI registry down and spiking ingress TTFB to 4.7s and the 4xx rate to 51%. Steady-state is ~2.2Gi but it spiked into the cap (true demand above 3.2Gi). The 2026-06-09 bump to 3Gi was sized for tripit buildkit registry pushes, but that driver is gone now that the Forgejo registry was frozen and emptied today (ADR-0002, images on ghcr), so the spike is git ops / the integrity-probe catalog walk / a possible leak. 6Gi gives headroom on the critical git backbone while we watch whether working-set keeps climbing (which would indicate a leak). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-13 14:02:55 +00:00
Viktor Barzin	bda1bdcbf3	dbaas: widen backup CronJob startingDeadlineSeconds from 10s to 600s The daily full PostgreSQL backup silently skipped its 2026-06-13 00:00 run, leaving the last full dump 37h old and firing the critical PostgreSQLBackupStale alert. Root cause: startingDeadlineSeconds was 10s on all four dbaas backup CronJobs, so when the CronJob controller was more than 10s late to the midnight tick (many IO-heavy backups all fire at 00:00, the known etcd-starvation window) the run was dropped entirely instead of starting late. 600s lets a brief controller lag still launch the job. Applied to all four (mysql + pg, full + per-db) since they share the footgun and the midnight contention. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-13 14:02:54 +00:00
Viktor Barzin	3e82c64a76	docs: sync CI/CD docs to ADR-0002 final state (ghcr + Woodpecker deploy-only) [ci skip] ADR-0002 is fully landed (issues #11-#32 closed): every owned image now builds on GitHub Actions and pushes to ghcr.io/viktorbarzin/<name>, with Woodpecker reduced to deploy-only. The Forgejo container registry is frozen and emptied; there are no in-cluster image builds or CI test runs anywhere. The docs still described the old hybrid topology (DockerHub builds, Woodpecker-native owned-app builds, the per-pattern migration lists, the tripit-only pilot framing), which would mislead future sessions and incident response. This brings the docs to the completed reality (closes #33): - docs/architecture/ci-cd.md: full rewrite as the canonical CI/CD reference — the fleet GHA->ghcr->Woodpecker-deploy pattern, public/private ghcr package split, infra-owned image workflows (incl. infra-ci on ghcr), the frozen Forgejo registry, what Woodpecker still runs, and the #31 decommissions. - .claude/CLAUDE.md: rewrite the "CI/CD Architecture" section to the fleet-wide final state; FIX the stale claim that claude-memory-mcp builds to DockerHub (it is GHA->ghcr); note owned images now live on ghcr and the Forgejo registry is frozen/break-glass near the image-registry bullet. - .claude/reference/service-catalog.md: f1-stream is GHA->ghcr + Woodpecker deploy-only (was "Woodpecker-native build->deploy"). - stacks/{tuya-bridge,android-emulator}/variables.tf + stacks/terminal/main.tf: cosmetic description/comment updates (forgejo -> ghcr; terminal-lobby has no CI pipeline). Description/comment text only — no stack logic changed. Historical records (docs/post-mortems/, docs/plans/) and ADR-0002 itself are left untouched as point-in-time records. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-13 12:55:49 +00:00
Viktor Barzin	6e4db0ddc6	openclaw + f1-stream: last forgejo image refs -> ghcr (ADR-0002 #32 prep) All checks were successful ci/woodpecker/push/default Pipeline was successful Details openclaw's install-nextcloud-todos-plugin init still pulled forgejo nextcloud-todos (would ImagePullBackOff on restart once the forgejo registry is wiped) -> ghcr:latest. f1-stream stack base (KEEL_IGNORE'd, live already ghcr via set-image) repointed for fresh-create correctness. Clears the last LIVE forgejo viktor/* refs before the registry reclaim. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-13 12:36:10 +00:00
Viktor Barzin	eb8b550521	chrome-service: TF-manage novnc image (ghcr:latest), drop its KEEL_IGNORE (ADR-0002 #29) All checks were successful ci/woodpecker/push/default Pipeline was successful Details novnc's image was ignore_changed (KEEL_IGNORE) but nothing manages its tag (keel.sh/policy=never), so the earlier forgejo->ghcr repoint never took. Removing container[1].image from ignore_changes lets terragrunt own novnc=ghcr:latest and roll it. container[0]/[2] (pinned playwright) stay ignored. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-13 09:49:58 +00:00
Viktor Barzin	94a3d1b870	chrome-service-novnc + android-emulator images -> ghcr (ADR-0002 #29/#30) All checks were successful ci/woodpecker/push/default Pipeline was successful Details Both now built by GHA → public ghcr. Repoint stack image bases forgejo→ghcr:latest (terragrunt-managed, imagePullPolicy Always picks up rebuilds). android var default api36-v8 -> latest. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-13 09:43:40 +00:00
Viktor Barzin	a69847a0f3	tripit: enable Wikipedia city cover photos (CITY_IMAGE_PROVIDER=wikipedia, #47) Flips the planning workspace's Stay cover photos from the fake provider to live Wikipedia lead-image fetches (downloaded into STORAGE_DIR, served by the backend, editable per Stay). Part of the new-trip flow feature: every picked destination city gets a banner-ready cover. HOLD-ORDER: pushed only after the tripit image containing CityImageMode.wikipedia rolled out. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-13 09:43:40 +00:00
Viktor Barzin	f61d707d75	travel_blog: remove decommissioned stack (ADR-0002 infra#31) All checks were successful ci/woodpecker/push/default Pipeline was successful Details ci/woodpecker/push/build-cli Pipeline was successful Details Service was already scaled 0/0 and unused (Viktor: 'not used anymore'). Live resources destroyed via scripts/tg destroy (10 resources: deployment, namespace, service, anubis-travel + PDB/cm/svc/secret, ingress, TLS). Removing the stack dir; old Woodpecker build (repo 5) deactivated separately. The harmless legacy 'travel' CNAME->apex in config.tfvars is left (now 404s; removing it would trigger a full-platform apply). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-13 09:32:39 +00:00
Viktor Barzin	90fb0685ae	traefik: x402-gateway image forgejo -> ghcr + KEEL_IGNORE_IMAGE (ADR-0002 infra#28) All checks were successful ci/woodpecker/push/default Pipeline was successful Details ci/woodpecker/push/build-cli Pipeline was successful Details Formalizing x402-gateway CI (was a manual no-CI image). The deployment lives in the traefik module; its image was NOT in ignore_changes, so a set-image deploy would be reverted on the next traefik apply — added it (KEEL_IGNORE_IMAGE). Base repointed to ghcr:latest; the GHA deploy set-images the :sha8. Public ghcr package = no pull secret. Inert on the live pod (image now ignored); rolling cutover keeps forwardAuth up. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-13 02:42:45 +00:00
Viktor Barzin	3960eac716	claude-memory: image base forgejo -> ghcr (ADR-0002 infra#20) Some checks failed ci/woodpecker/push/build-cli Pipeline was successful Details ci/woodpecker/push/default Pipeline was canceled Details GHA now builds+pushes ghcr.io/viktorbarzin/claude-memory-mcp (public). Image is KEEL_IGNORE_IMAGE (set-image managed), so this apply is inert on the live pod; the stale :17 default literal is corrected to :latest. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-13 02:34:20 +00:00
Viktor Barzin	2f3c58dff1	claude-agent-service image -> ghcr across all five consumer stacks (infra#19) All checks were successful ci/woodpecker/push/build-cli Pipeline was successful Details ci/woodpecker/push/default Pipeline was successful Details GHA now builds+pushes ghcr.io/viktorbarzin/claude-agent-service (public package, anonymous pulls). Repointed: claude-agent-service (deployment + git-init/seed-beads-agent inits), claude-breakglass, ci-pipeline-health, beads-server CronJobs, k8s-version-upgrade (tag var 2fd7670d -> latest — the Forgejo registry lost that sha; node caches were the only thing keeping those CronJobs alive). publish-gate: vendor-contact emails (licensing@/legal@/security@/sales@) ruled license-boilerplate, not PII. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-13 01:47:54 +00:00
Viktor Barzin	8aba3a0179	offinfra-onboard --no-deploy; wealthfolio-sync image -> ghcr (ADR-0002 infra#25) All checks were successful ci/woodpecker/push/build-cli Pipeline was successful Details ci/woodpecker/push/default Pipeline was successful Details broker-sync is a CronJob-only consumer (no deployment): new --no-deploy mode skips Woodpecker registration and renders build.yml without the deploy job — :latest+Always CronJobs pick up builds on the next run. wealthfolio stack: ghcr-credentials pull secret + image base repoint. The wealthfolio-sync image regains a reproducible rebuild path. Closes: code-62tm Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-13 01:39:35 +00:00
Viktor Barzin	2dde480795	openclaw: install-recruiter-plugin init image forgejo -> ghcr :latest (infra#27) All checks were successful ci/woodpecker/push/build-cli Pipeline was successful Details ci/woodpecker/push/default Pipeline was successful Details Second half of the recruiter-responder off-infra migration: the first GHA build has published ghcr.io/viktorbarzin/recruiter-responder:{1d99a8d5,latest}, so the openclaw plugin-install init container can now follow the ghcr :latest. The forgejo-side build pipeline was removed by the onboarding commit, so the old forgejo :latest tag is frozen and would silently serve stale plugin code. Deferred from the first commit on purpose - flipping it before the package existed would have wedged the openclaw rollout on ImagePullBackOff. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-13 00:57:30 +00:00
Viktor Barzin	57ff41e47e	recruiter-responder: pull image from ghcr + ghcr-credentials on all consumers (ADR-0002, infra#27) All checks were successful ci/woodpecker/push/build-cli Pipeline was successful Details ci/woodpecker/push/default Pipeline was successful Details Migrating recruiter-responder off in-cluster Woodpecker builds: GHA will build and push ghcr.io/viktorbarzin/recruiter-responder (PRIVATE package). This commit lands the pull-side prerequisites BEFORE the first off-infra build fires: - stacks/recruiter-responder: image base forgejo -> ghcr (inert on the live Deployment - both containers are ignore_changes'd; the Woodpecker deploy moves the tag) + ghcr-credentials imagePullSecrets on the Deployment (covers the recruiter-responder container AND the alembic-migrate init container, which share the image). - stacks/openclaw: ghcr-credentials imagePullSecrets on the openclaw Deployment - its install-recruiter-plugin init container consumes the :latest tag of this image. The image ref itself flips to ghcr in a follow-up once the first GHA build has created the package (flipping now would ImagePullBackOff on a not-yet-existing package and wedge the apply). - stacks/kyverno: allowlist openclaw in sync-ghcr-credentials so the pull secret is cloned into that namespace too. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-13 00:43:35 +00:00
Viktor Barzin	c594274c83	ci: re-apply fire-planner stack after pipeline race All checks were successful ci/woodpecker/push/default Pipeline was successful Details ci/woodpecker/push/build-cli Pipeline was successful Details Comment-only touch so the changed-stack detection applies stacks/fire-planner from the current master tree. Pipeline 150 (commit `f18dfa4c` — the ghcr image base + ghcr-credentials migration for issue #26) was auto-killed when the concurrent nextcloud-todos push superseded it, and pipeline 151 diffed from `f18dfa4c` onward so the fire-planner stack changes were never applied (cronjobs still point at the forgejo image, pod specs lack ghcr-credentials). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-13 00:41:20 +00:00
Viktor Barzin	a264a19629	Merge remote-tracking branch 'forgejo/master' into wizard/nextcloud-todos-ghcr All checks were successful ci/woodpecker/push/default Pipeline was successful Details ci/woodpecker/push/build-cli Pipeline was successful Details	2026-06-13 00:38:27 +00:00
Viktor Barzin	d5c328d23c	nextcloud-todos: image base forgejo -> ghcr (ADR-0002, infra#18) The nextcloud-todos build moved off-infra: GHA builds on the public GitHub mirror and pushes ghcr.io/viktorbarzin/nextcloud-todos (public package, anonymous pulls); Woodpecker repo 207 is deploy-only. First ghcr image (:19c22d8c) is already built, deployed and rolled out, so this repoint lands after the image exists. Both deployment image refs (main + alembic-migrate init) are ignore_changes'd — no live churn, the base matters only on resource (re)create. Old image was pulled from a Forgejo registry package that no longer exists (pods survived on node image cache only). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-13 00:38:25 +00:00
Viktor Barzin	f18dfa4c8b	fire-planner: pull image from ghcr + add ghcr-credentials to all pod specs Some checks failed ci/woodpecker/push/build-cli Pipeline was canceled Details ci/woodpecker/push/default Pipeline was canceled Details Migrating fire-planner off in-cluster Woodpecker builds to GitHub Actions -> ghcr.io (ADR-0002, issue #26). The image base moves forgejo.viktorbarzin.me/viktor/fire-planner -> ghcr.io/viktorbarzin/fire-planner (a PRIVATE ghcr package), so the deployment, all three cronjobs (recompute, col-refresh, examples-weekly) and the examples bulk job gain the ghcr-credentials imagePullSecret (the kyverno sync-ghcr-credentials allowlist already covers the fire-planner namespace). registry-credentials stays alongside so the currently-running sha-pinned forgejo image can still be pulled until the first ghcr deploy lands; the cronjob images are TF literals and flip to ghcr :latest on this apply. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-13 00:38:09 +00:00
Viktor Barzin	cdd60d9078	ci: re-apply instagram-poster + payslip-ingest stacks after pipeline race All checks were successful ci/woodpecker/push/build-cli Pipeline was successful Details ci/woodpecker/push/default Pipeline was successful Details Comment-only touch of both stacks so the changed-stack detection applies them from the current master tree. Two pipelines went wrong in sequence during the parallel ADR-0002 wave-2 migrations (issues #23/#24): - pipeline 146 (instagram-poster stack prep, commit `29c69250`) was auto-killed when the concurrent payslip-ingest push superseded it, so its apply never ran; - restarting it as pipeline 148 inherited CI_PREV_COMMIT_SHA = the NEW branch head (`6928ce0b`) with the OLD checkout (`29c69250`) — a reverse diff that re-applied stacks/payslip-ingest from the pre-migration tree, stripping the ghcr image base + ghcr-credentials pull secrets that pipeline 147 had just applied (2 resources reverted). This commit restores the committed payslip-ingest config exactly as issue #24 landed it and finally applies the instagram-poster ghcr prep from issue #23. Lesson encoded in the comments: do not restart killed infra pipelines after master has moved — re-trigger with a touch commit instead. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-13 00:11:17 +00:00
Viktor Barzin	6928ce0be5	Merge remote-tracking branch 'forgejo/master' into wizard/payslip-ingest-ghcr All checks were successful ci/woodpecker/push/default Pipeline was successful Details ci/woodpecker/push/build-cli Pipeline was successful Details	2026-06-13 00:03:29 +00:00
Viktor Barzin	5d236c2352	payslip-ingest: image base forgejo -> ghcr, ghcr-credentials pull secret, cron to :latest+Always Prep for moving payslip-ingest's image build off-infra to GitHub Actions -> ghcr.io (ADR-0002 wave 2, issue #24). One stack commit before onboarding: - image base repointed forgejo.viktorbarzin.me/viktor/payslip-ingest -> ghcr.io/viktorbarzin/payslip-ingest (private ghcr package) - ghcr-credentials imagePullSecrets added on the Deployment AND the actualbudget-payroll-sync CronJob pod specs (namespace is already in the kyverno sync-ghcr-credentials allowlist; secret verified present) - the CronJob's SHA pin is retired: terragrunt image_tag 4f70681d -> latest plus explicit imagePullPolicy Always on the cron container, per the fleet convention for owned-app CronJobs — one less set-image target, and the cron can never go back to pulling the dead Forgejo tag The Deployment keeps KEEL_IGNORE_IMAGE; its concrete :sha8 tag is set by the Woodpecker deploy pipeline after each GHA build. Closes: nothing yet — the repo-side onboarding (offinfra-onboard) follows. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-13 00:03:11 +00:00
Viktor Barzin	29c6925031	instagram-poster: image base forgejo->ghcr + ghcr-credentials pull secret All checks were successful ci/woodpecker/push/default Pipeline was successful Details ci/woodpecker/push/build-cli Pipeline was successful Details Prep for migrating instagram-poster off in-cluster Woodpecker builds to GitHub Actions -> ghcr.io (ADR-0002, issue #23, PRIVATE-repo path). Viktor asked for the wave-2 migration of instagram-poster per the wave-1 retro recipe: before onboarding, the stack must (a) carry the ghcr-credentials imagePullSecret on the Deployment so the cluster can pull the private ghcr image, and (b) repoint the image base from forgejo.viktorbarzin.me/viktor to ghcr.io/viktorbarzin. The Deployment image is KEEL_IGNORE_IMAGE (ignore_changes), so this apply does NOT roll the pod to a not-yet-existing ghcr image — the live forgejo-built :da5b4191 keeps running until the first GHA build POSTs the Woodpecker deploy. The three CronJobs run curlimages/curl (public DockerHub), not the app image, so they need neither the pull secret nor a repoint. registry-credentials stays for the transition window. Closes: nothing (stack prep only; repo onboarding follows) Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-13 00:02:04 +00:00
Viktor Barzin	72b5843e4b	publish-gate: exclude package-lock + beads tracker from email heuristic; beadboard image base -> ghcr All checks were successful ci/woodpecker/push/default Pipeline was successful Details ci/woodpecker/push/build-cli Pipeline was successful Details infra#17: the gate flagged npm deprecation boilerplate (package-lock.json escapes the *.lock filter) and the upstream fork author's email in tracked .beads data — both already-public upstream content, ruled false positives. Lock files excluded properly; .beads moved to the eyeball inventory. beads-server stack: beadboard image base repointed (deployment image is KEEL-ignored; no CronJobs use it). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 23:52:07 +00:00
Viktor Barzin	57ffd0ed8d	Merge remote-tracking branch 'forgejo/master' into wizard/freedify-mig All checks were successful ci/woodpecker/push/default Pipeline was successful Details ci/woodpecker/push/build-cli Pipeline was successful Details	2026-06-12 23:37:19 +00:00
Viktor Barzin	c16fe56180	freedify: image base forgejo registry -> ghcr (ADR-0002) Freedify builds moved off-infra per issue #22: GitHub Actions on the ViktorBarzin/freedify mirror now builds and pushes the public image ghcr.io/viktorbarzin/freedify, and the Woodpecker deploy pipeline (repo 202) rolls :sha8 via kubectl set image. Both factory deployments (music-viktor, music-emo) now seed from ghcr instead of the retired in-cluster Forgejo build, and the container image joins lifecycle ignore_changes (KEEL_IGNORE_IMAGE) so terraform applies do not revert the deployed :sha8. Landed after the first GHA push so ghcr :latest already existed when this repoint applied. Public package - no pull secret needed. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 23:37:10 +00:00
Viktor Barzin	9f742b544c	kms: image base forgejo registry -> ghcr (ADR-0002 infra#21) All checks were successful ci/woodpecker/push/default Pipeline was successful Details ci/woodpecker/push/build-cli Pipeline was successful Details kms-website moves off in-cluster Woodpecker builds to GHA -> ghcr. The kms-web-page deployment image is ignore_changes'd (CI sets the live tag), so this repoint only governs future creates; package is PUBLIC so no pull secret is wired. No CronJobs in this stack. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 23:30:07 +00:00
Viktor Barzin	fb88440ec4	ci-pipeline-health: billing moved to the enhanced usage endpoint All checks were successful ci/woodpecker/push/default Pipeline was successful Details ci/woodpecker/push/build-cli Pipeline was successful Details The legacy /settings/billing/actions endpoint now returns 410; sum Minutes usageItems from /settings/billing/usage instead (found during the infra#16 retro: June-to-date = 420/2000). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 23:24:18 +00:00
Viktor Barzin	12bdd06f74	kyverno: force_new on sync-ghcr-credentials — generate rules are immutable All checks were successful ci/woodpecker/push/default Pipeline was successful Details ci/woodpecker/push/build-cli Pipeline was successful Details Pipeline 138: the validate-policy webhook denies in-place edits of a generate rule (allowlist additions). force_new = delete+recreate; generated secrets survive and generateExisting re-adopts. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 23:18:15 +00:00
Viktor Barzin	6b0d42c7bc	publish-gate + tuya-bridge ghcr cutover prep (ADR-0002 infra#15) Some checks failed ci/woodpecker/push/build-cli Pipeline was successful Details ci/woodpecker/push/default Pipeline failed Details publish-gate: gitleaks + trufflehog (full history) + PII heuristics; CLEAN verdict gates any public flip, DIRTY = stays private. tuya-bridge: ghcr-credentials pull secret + image base -> ghcr; namespace added to the ghcr-credentials allowlist as a safety net (new ghcr packages default PRIVATE even from public repos — prune after visibility flip). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 23:12:02 +00:00
Viktor Barzin	54dfaf6edc	job-hunter: image base forgejo registry -> ghcr (ADR-0002) All checks were successful ci/woodpecker/push/default Pipeline was successful Details ci/woodpecker/push/build-cli Pipeline was successful Details CronJobs track :latest via the TF literal (unlike the ignore_changes'd deployment), so they kept pulling the dead Forgejo image after the GHA/ghcr cutover — repoint the stack's image base. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 23:06:54 +00:00
Viktor Barzin	1c41781996	job-hunter: ghcr-credentials pull secret on deployment + CronJobs All checks were successful ci/woodpecker/push/default Pipeline was successful Details ci/woodpecker/push/build-cli Pipeline was successful Details ADR-0002 wave 1 (infra#14): job-hunter's image moves to private ghcr; the deployment AND both :latest CronJobs need the Kyverno-cloned pull secret. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 22:56:48 +00:00
Viktor Barzin	baff3d7477	offinfra-onboard: per-repo GHA->ghcr migration tool + f1-stream ghcr pull secret All checks were successful ci/woodpecker/push/default Pipeline was successful Details ci/woodpecker/push/build-cli Pipeline was successful Details ADR-0002 tracer bullet (infra#13), per Viktor's go-ahead. Idempotent script: GitHub mirror repo (create/unarchive/visibility), GHA secrets via gh, Forgejo push-mirror (sync_on_commit) + initial sync, Woodpecker mirror registration, renders build.yml/deploy.yml from templates (single-manifest provenance:false, svu semver to Forgejo, ghcr keep-10 retention, Slack notify-failure, manual-event deploy), removes the old in-cluster build pipeline, commits on the Canonical side. f1-stream stack gains the ghcr-credentials imagePullSecret (first consumer). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 22:21:22 +00:00


1411 commits