Commit graph

4295 commits

Author SHA1 Message Date
Viktor Barzin
086ff85911 health: dedicated 100/1000 rate limit for the redesigned SPA
Some checks failed
ci/woodpecker/push/default Pipeline failed
Viktor hit 429s browsing the redesigned health app. The default shared limiter
is 10 req/s / burst 50, but each page load is the shell (JS chunks + two
self-hosted Geist woff2) plus a 5-8 call API burst, so fast tab-to-tab
navigation from one client IP overruns burst 50 — Traefik 429s the tail and the
affected cards/pages render empty.

Give health its own limiter (average 100, burst 1000) and skip the default,
exactly as tripit/immich/actualbudget/ha-sofia already do for the same
parallel-burst pattern. Attached via the ingress_factory escape hatch
(skip_default_rate_limit + extra_middlewares).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 13:03:51 +00:00
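The limiter arithmetic behind 086ff85911 can be illustrated with a generic token bucket. This is a minimal sketch, not Traefik's actual implementation: the class, the helper, and the traffic pattern (6 page loads, 0.5 s apart, 20 requests each) are hypothetical; only the two average/burst pairs come from the commit message.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Generic token bucket: refills at `average` tokens/s, holds at most `burst`."""
    average: float
    burst: int

    def __post_init__(self) -> None:
        self.tokens = float(self.burst)  # start with a full bucket
        self.last = 0.0                  # time of the previous request, in seconds

    def allow(self, now: float) -> bool:
        # Refill for the elapsed time, capped at the bucket size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.average)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # this request would get a 429


def count_rejected(bucket: TokenBucket, arrivals: list[float]) -> int:
    """Number of requests in `arrivals` (timestamps in seconds) that are rejected."""
    return sum(0 if bucket.allow(t) else 1 for t in arrivals)


# Hypothetical traffic: 6 page loads, 0.5 s apart, 20 requests per load
# (shell assets plus API calls), all requests of one load arriving together.
arrivals = [load * 0.5 for load in range(6) for _ in range(20)]

shared = count_rejected(TokenBucket(average=10, burst=50), arrivals)
dedicated = count_rejected(TokenBucket(average=100, burst=1000), arrivals)
```

Under this invented traffic the shared limiter rejects 45 of the 120 requests (the tail of the later page loads), while the dedicated limiter rejects none.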
Viktor Barzin
6dc77f4612 uptime-kuma: add CONTEXT.md + ADR-0001 (intentionally lean; sizing/placement review)
All checks were successful
ci/woodpecker/push/default Pipeline was successful
Documents the 2026-06-13 right-sizing review: Kuma is already lean (~1 check/s, 227 monitors mostly at 300s, 77MB on shared MySQL, 30d retention); the 'scraping too much' concern was traced to a since-fixed socket.io login-timeout incident, not load. Records the deliberate decisions (keep per-service [External] monitors over canaries; keep datastore on shared mysql.dbaas) with rejected alternatives + rationale, plus the known internal-sync no-prune gap (stale Goldilocks monitor cleaned up by hand).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 09:11:22 +00:00
Viktor Barzin
05bec26d09 health: internal test-access ingress + DEV_AUTH_EMAIL (ADR-0008)
Some checks failed
ci/woodpecker/push/default Pipeline was canceled
Add health-test.viktorbarzin.lan (auth=none, allow_local_access_only,
anti-AI off) pointing at the same health deployment, plus a
DEV_AUTH_EMAIL=vbarzin@gmail.com env on the container. Lets automated
E2E / Playwright / manual screenshots reach the live app without the
Authentik SSO redirect, for testing — while the public
health.viktorbarzin.me ingress stays auth=required (forward-auth fails
closed, so the public path always carries the real X-authentik-email
header and never hits the DEV_AUTH_EMAIL fallback). LAN-only, no public
exposure. Decision recorded in health repo ADR-0008.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-14 04:02:34 +00:00
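The header-then-fallback behaviour described in 05bec26d09 might look roughly like the sketch below. The function name and the exact lookup logic are assumptions (the health app's real code is not shown here); only the X-authentik-email header and the DEV_AUTH_EMAIL variable come from the commit message.

```python
from typing import Mapping, Optional


def resolve_user_email(headers: Mapping[str, str], env: Mapping[str, str]) -> Optional[str]:
    """Prefer the forward-auth header; use DEV_AUTH_EMAIL only when it is absent.

    Hypothetical helper. Assumes `headers` uses the exact key
    "X-authentik-email"; a real server would match header names
    case-insensitively.
    """
    email = headers.get("X-authentik-email", "").strip()
    if email:
        return email  # public path: forward-auth always sets this header
    fallback = env.get("DEV_AUTH_EMAIL", "").strip()
    return fallback or None  # LAN test ingress: no header, so the fallback applies
```

Because the public ingress fails closed, the first branch is always taken there; the fallback is reachable only through the auth=none test ingress.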
Viktor Barzin
e6699ed20b uptime-kuma: retry Kuma login in monitor-sync jobs (intermittent socket.io timeout)
All checks were successful
ci/woodpecker/push/default Pipeline was successful
The internal + external monitor-sync CronJobs intermittently failed with socketio.exceptions.TimeoutError on api.login(), firing JobFailed -> Slack noise (and leaving monitor sync stale). Kuma 2.3.2 itself is healthy (1/1, 30m CPU); its single Node event loop just briefly stalls under ~300 monitors so the socket.io login handshake occasionally exceeds the client timeout. Wrap connect+login in a 5-attempt / 15s-backoff retry (disconnecting the half-open client between tries) so a transient stall no longer fails the whole job. Applied to both sync scripts.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-13 20:54:14 +00:00
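The retry wrapper described in e6699ed20b can be sketched as follows. This is an illustration under assumptions, not the sync scripts' actual code: the function and parameter names are hypothetical, the built-in TimeoutError stands in for socketio.exceptions.TimeoutError, and the 15s backoff is assumed to be a fixed delay between attempts.

```python
import time
from typing import Callable, Optional


def login_with_retry(
    connect_and_login: Callable[[], None],
    disconnect: Callable[[], None],
    attempts: int = 5,
    backoff_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run connect+login up to `attempts` times; return the attempt that succeeded."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    last_error: Optional[BaseException] = None
    for attempt in range(1, attempts + 1):
        try:
            connect_and_login()
            return attempt
        except TimeoutError as exc:  # stand-in for the socket.io timeout
            last_error = exc
            disconnect()  # drop the half-open client before the next try
            if attempt < attempts:
                sleep(backoff_s)
    # Every attempt timed out: surface the last timeout so the job still fails.
    assert last_error is not None
    raise last_error
```

Passing `sleep` as a parameter is a design choice for this sketch only; it lets the backoff be exercised without actually waiting.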
Viktor Barzin
a6381b8cf8 forgejo: custom 8Gi ResourceQuota (was pegged at the 4Gi tier cap)
Some checks failed
ci/woodpecker/push/default Pipeline failed
Yesterday's Forgejo 3Gi->4Gi OOM fix pushed its tier-3-edge namespace quota (requests.memory=4Gi) to 100%, firing KubeQuotaAlmostFull + the healthcheck resourcequota check. Forgejo is the git + OCI-registry backbone and legitimately needs ~4Gi, so the edge tier's 4Gi ceiling is too tight. Opt the namespace out of the auto tier quota (resource-governance/custom-quota=true) and define a forgejo-specific ResourceQuota at requests.memory=8Gi, so the 4Gi pod sits at ~50% with headroom. Same opt-out pattern dbaas uses. Re-tiering was rejected: tier 1-cluster is also 4Gi, and 0-core (8Gi) would over-classify Forgejo's priority/eviction.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-13 17:16:47 +00:00
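The quota arithmetic in a6381b8cf8 (and in the earlier 6Gi rejection, ba72621e52) can be checked with a small sketch. The helper names are hypothetical, and the admission rule is simplified to a single requests.memory figure per namespace.

```python
GI = 1024 ** 3  # bytes in one GiB


def quota_utilisation(requested: int, hard: int) -> float:
    """Fraction of the requests.memory quota consumed."""
    return requested / hard


def fits_quota(used: int, new_request: int, hard: int) -> bool:
    """A new pod is admitted only if used + request stays within the hard limit."""
    return used + new_request <= hard


# 4Gi pod against the 4Gi tier quota: pegged at 100%.
tier_util = quota_utilisation(4 * GI, 4 * GI)
# Same pod against the custom 8Gi quota: 50%, with headroom.
custom_util = quota_utilisation(4 * GI, 8 * GI)
# The earlier 6Gi request against the 4Gi tier quota: rejected.
six_gi_admitted = fits_quota(0, 6 * GI, 4 * GI)
```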
Viktor Barzin
72982683bc docs(CLAUDE.md): k8s-portal now GHA->ghcr, not a Woodpecker build
All checks were successful
ci/woodpecker/push/default Pipeline was successful
k8s-portal was the last in-cluster image builder. Its .woodpecker/k8s-portal.yml
was deleted; it now builds on GHA (build-k8s-portal.yml) -> PRIVATE ghcr, pulled
via the Kyverno ghcr-credentials allowlist and deployed by Keel. Fix the CI/CD
section: drop k8s-portal from the Woodpecker-pipelines list (stale), move it from
'already on GHA' to the infra-owned private-ghcr images, and add it to the
PRIVATE ghcr allowlist roster. Completes the no-local-builds migration.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 16:10:56 +00:00
Viktor Barzin
25a39fd54e k8s-portal: wire private-ghcr pull (allowlist + imagePullSecrets)
All checks were successful
ci/woodpecker/push/default Pipeline was successful
k8s-portal was the last in-cluster image build; it now builds on GHA and
pushes ghcr.io/viktorbarzin/k8s-portal:latest, which is PRIVATE (infra repo
default). To pull it: add k8s-portal to the sync-ghcr-credentials Kyverno
allowlist (clones the ghcr-credentials Secret into the namespace) and
reference that secret via imagePullSecrets on the deployment — same wiring
as tripit/recruiter-responder. Completes the no-local-builds migration so
nothing builds container images on the cluster anymore (ADR-0002).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 15:38:42 +00:00
Viktor Barzin
a7d33abec9 k8s-portal: commit package.json + lock (force; was gitignored) — unblocks GHA build
Some checks failed
ci/woodpecker/push/default Pipeline was successful
Build k8s-portal / build (push) Has been cancelled
Recovered the real manifest + resolved lockfile (lockfileVersion 3, 71 pkgs)
from the running pod. A parent .gitignore force-ignored package.json, so the
git source tree was incomplete and the image only ever built manually. Now
reproducible on GHA (ADR-0002 no-local-builds).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 15:29:27 +00:00
Viktor Barzin
a9b08c03cf fix(k8s-portal): npm install (no committed lockfile) so GHA can build
Some checks are pending
Build k8s-portal / build (push) Waiting to run
ci/woodpecker/push/default Pipeline was successful
package-lock.json was never committed to either lineage — npm ci needs it,
so the build only ever worked from a manual devvm build with a local lock.
npm install resolves from package.json, unblocking the GHA build (ADR-0002).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 15:26:42 +00:00
Viktor Barzin
bdfdf8db72 fix(ci): k8s-portal build context is stacks/k8s-portal/modules/k8s-portal/files (was stale platform/ path)
All checks were successful
ci/woodpecker/push/default Pipeline was successful
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 15:23:46 +00:00
Viktor Barzin
b906f61ac3 k8s-portal: build off-infra GHA -> ghcr + Keel; remove Woodpecker build (no-local-builds)
Some checks failed
ci/woodpecker/push/default Pipeline was canceled
The last in-cluster image build. GHA build-k8s-portal.yml builds
ghcr.io/viktorbarzin/k8s-portal:latest+sha (path-filtered on the Dockerfile
dir); Keel (force/poll/match-tag) rolls the deployment. Stack image repointed
to ghcr (ignore_changed); .woodpecker/k8s-portal.yml deleted.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 15:21:35 +00:00
Viktor Barzin
9501da81a0 dbaas: document postgresql-backup startingDeadlineSeconds rationale
All checks were successful
ci/woodpecker/push/default Pipeline was successful
Inline note on why the four backup CronJobs moved 10s->600s (bda1bdcb): a 10s deadline silently dropped the 2026-06-13 midnight full-backup run, firing PostgreSQLBackupStale. bda1bdcb rode in the same push as a forgejo change that failed CI on a namespace-quota error, so that pipeline failed before the dbaas apply took effect (live deadline was still 10s). This dbaas-only commit re-triggers the dbaas apply at a clean master so the 600s deadline actually goes live.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-13 14:22:24 +00:00
Viktor Barzin
ba72621e52 forgejo: 6Gi exceeded namespace quota, set to 4Gi (quota ceiling)
Some checks failed
ci/woodpecker/push/default Pipeline was canceled
The 3Gi->6Gi bump in ff3cc44a was rejected by the forgejo namespace tier-quota (requests.memory capped at 4Gi). With Guaranteed QoS the 6Gi request exceeded quota; FailedCreate left forgejo with 0 pods for ~6 min (git remote + OCI registry outage) until I patched the live Deployment back to a schedulable 4Gi. 4Gi is the most the quota allows and is still a headroom bump over the OOM-prone 3Gi. To go higher the tier-quota must be raised in the same change. This reconciles TF to the live 4Gi so the pending/next apply is a no-op rather than reverting to the quota-busting 6Gi.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-13 14:13:36 +00:00
Viktor Barzin
ff3cc44a29 forgejo: raise memory limit from 3Gi to 6Gi (OOMKilled at 3Gi)
Some checks failed
ci/woodpecker/push/default Pipeline failed
Forgejo OOMKilled twice on 2026-06-13 at the 3Gi cap (exit 137), briefly taking the git remote and OCI registry down and spiking ingress TTFB to 4.7s and the 4xx rate to 51%. Steady-state is ~2.2Gi but it spiked into the cap (true demand above 3.2Gi). The 2026-06-09 bump to 3Gi was sized for tripit buildkit registry pushes, but that driver is gone now that the Forgejo registry was frozen and emptied today (ADR-0002, images on ghcr), so the spike is git ops / the integrity-probe catalog walk / a possible leak. 6Gi gives headroom on the critical git backbone while we watch whether working-set keeps climbing (which would indicate a leak).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-13 14:02:55 +00:00
Viktor Barzin
bda1bdcbf3 dbaas: widen backup CronJob startingDeadlineSeconds from 10s to 600s
The daily full PostgreSQL backup silently skipped its 2026-06-13 00:00 run, leaving the last full dump 37h old and firing the critical PostgreSQLBackupStale alert. Root cause: startingDeadlineSeconds was 10s on all four dbaas backup CronJobs, so when the CronJob controller was more than 10s late to the midnight tick (many IO-heavy backups all fire at 00:00, the known etcd-starvation window) the run was dropped entirely instead of starting late. 600s lets a brief controller lag still launch the job. Applied to all four (mysql + pg, full + per-db) since they share the footgun and the midnight contention.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-13 14:02:54 +00:00
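The failure mode in bda1bdcbf3 comes down to one comparison, sketched below. This is a simplified model of the CronJob start window rather than the controller's actual code; the function name and the 25-second lag are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def should_start(scheduled: datetime, now: datetime, starting_deadline_s: int) -> bool:
    """A late run still starts only while now <= scheduled + deadline."""
    return scheduled <= now <= scheduled + timedelta(seconds=starting_deadline_s)


midnight = datetime(2026, 6, 13, 0, 0, 0, tzinfo=timezone.utc)
# Hypothetical: the controller notices the tick 25 s late.
noticed_at = midnight + timedelta(seconds=25)

dropped_with_10s = not should_start(midnight, noticed_at, 10)
started_with_600s = should_start(midnight, noticed_at, 600)
```

With the 10s deadline the run is skipped outright; with 600s the same lag still launches the job, just late.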
Viktor Barzin
3e82c64a76 docs: sync CI/CD docs to ADR-0002 final state (ghcr + Woodpecker deploy-only) [ci skip]
ADR-0002 is fully landed (issues #11-#32 closed): every owned image now
builds on GitHub Actions and pushes to ghcr.io/viktorbarzin/<name>, with
Woodpecker reduced to deploy-only. The Forgejo container registry is frozen
and emptied; there are no in-cluster image builds or CI test runs anywhere.
The docs still described the old hybrid topology (DockerHub builds,
Woodpecker-native owned-app builds, the per-pattern migration lists, the
tripit-only pilot framing), which would mislead future sessions and
incident response.

This brings the docs to the completed reality (closes #33):

- docs/architecture/ci-cd.md: full rewrite as the canonical CI/CD reference —
  the fleet GHA->ghcr->Woodpecker-deploy pattern, public/private ghcr package
  split, infra-owned image workflows (incl. infra-ci on ghcr), the frozen
  Forgejo registry, what Woodpecker still runs, and the #31 decommissions.
- .claude/CLAUDE.md: rewrite the "CI/CD Architecture" section to the
  fleet-wide final state; FIX the stale claim that claude-memory-mcp builds
  to DockerHub (it is GHA->ghcr); note owned images now live on ghcr and the
  Forgejo registry is frozen/break-glass near the image-registry bullet.
- .claude/reference/service-catalog.md: f1-stream is GHA->ghcr + Woodpecker
  deploy-only (was "Woodpecker-native build->deploy").
- stacks/{tuya-bridge,android-emulator}/variables.tf + stacks/terminal/main.tf:
  cosmetic description/comment updates (forgejo -> ghcr; terminal-lobby has no
  CI pipeline). Description/comment text only — no stack logic changed.

Historical records (docs/post-mortems/*, docs/plans/*) and ADR-0002 itself
are left untouched as point-in-time records.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 12:55:49 +00:00
Viktor Barzin
6e4db0ddc6 openclaw + f1-stream: last forgejo image refs -> ghcr (ADR-0002 #32 prep)
All checks were successful
ci/woodpecker/push/default Pipeline was successful
openclaw's install-nextcloud-todos-plugin init still pulled forgejo
nextcloud-todos (would ImagePullBackOff on restart once the forgejo
registry is wiped) -> ghcr:latest. f1-stream stack base (KEEL_IGNORE'd,
live already ghcr via set-image) repointed for fresh-create correctness.
Clears the last LIVE forgejo viktor/* refs before the registry reclaim.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 12:36:10 +00:00
Viktor Barzin
3c3e6bfc95 ci: retire in-cluster infra-ci build; breakglass becomes manual ghcr pull-and-save (ADR-0002 #30)
All checks were successful
ci/woodpecker/push/default Pipeline was successful
infra-ci now builds on GHA → ghcr and the ghcr-based apply is PROVEN
(pipeline 165 ran terragrunt apply in the ghcr image). Removing the
Woodpecker build-ci-image.yml (clean cut). The breakglass tarball is
preserved as a MANUAL Woodpecker job pulling ghcr (public) → registry VM;
infra-ci on ghcr is external + node-cached, so the Forgejo-down rationale
for the old auto-tarball is moot — this is belt-and-braces DR.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 10:07:58 +00:00
Viktor Barzin
ee25a41c74 ci: apply + drift steps run on ghcr infra-ci (ADR-0002 #30)
All checks were successful
ci/woodpecker/push/default Pipeline was successful
The terragrunt apply step (default.yml) and drift-detection now pull
ghcr.io/viktorbarzin/infra-ci:latest (GHA-built, verified toolchain:
tf 1.5.7 / tg 0.99.4 / sops / kubectl 1.34 / vault / git-crypt). ghcr is
public + proven pullable in-cluster. build-ci-image.yml (forgejo build)
KEPT as the fallback copy until this ghcr-based apply is proven, so a
revert restores the working forgejo image if needed.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 10:05:34 +00:00
Viktor Barzin
23fc2bf2ec ci: GHA→ghcr build for infra-ci (ADR-0002 #30, bootstrap-safe — woodpecker build kept until proven)
All checks were successful
ci/woodpecker/push/default Pipeline was successful
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 09:53:43 +00:00
Viktor Barzin
eb8b550521 chrome-service: TF-manage novnc image (ghcr:latest), drop its KEEL_IGNORE (ADR-0002 #29)
All checks were successful
ci/woodpecker/push/default Pipeline was successful
novnc's image was ignore_changed (KEEL_IGNORE) but nothing manages its
tag (keel.sh/policy=never), so the earlier forgejo->ghcr repoint never
took. Removing container[1].image from ignore_changes lets terragrunt
own novnc=ghcr:latest and roll it. container[0]/[2] (pinned playwright)
stay ignored.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 09:49:58 +00:00
Viktor Barzin
94a3d1b870 chrome-service-novnc + android-emulator images -> ghcr (ADR-0002 #29/#30)
All checks were successful
ci/woodpecker/push/default Pipeline was successful
Both now built by GHA → public ghcr. Repoint stack image bases
forgejo→ghcr:latest (terragrunt-managed, imagePullPolicy Always picks up
rebuilds). android var default api36-v8 -> latest.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 09:43:40 +00:00
Viktor Barzin
a69847a0f3 tripit: enable Wikipedia city cover photos (CITY_IMAGE_PROVIDER=wikipedia, #47)
Flips the planning workspace's Stay cover photos from the fake provider to live Wikipedia lead-image fetches (downloaded into STORAGE_DIR, served by the backend, editable per Stay). Part of the new-trip flow feature: every picked destination city gets a banner-ready cover. HOLD-ORDER: pushed only after the tripit image containing CityImageMode.wikipedia rolled out.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-13 09:43:40 +00:00
Viktor Barzin
1621f0b204 ci: GHA→ghcr builds for chrome-service-novnc, android-emulator, infra CLI (ADR-0002 #29/#30)
All checks were successful
ci/woodpecker/push/default Pipeline was successful
Infra-owned rare-build images move off Woodpecker/manual to GHA (build
from the github checkout — Dockerfiles verified identical on both
remotes). chrome-service-novnc + android-emulator → public ghcr
(dispatch+path). CLI → DockerHub (kept) + ghcr; Woodpecker build-cli.yml
removed. infra-ci handled separately (bootstrap-critical).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 09:38:36 +00:00
Viktor Barzin
f61d707d75 travel_blog: remove decommissioned stack (ADR-0002 infra#31)
All checks were successful
ci/woodpecker/push/default Pipeline was successful
ci/woodpecker/push/build-cli Pipeline was successful
Service was already scaled 0/0 and unused (Viktor: 'not used anymore').
Live resources destroyed via scripts/tg destroy (10 resources: deployment,
namespace, service, anubis-travel + PDB/cm/svc/secret, ingress, TLS).
Removing the stack dir; old Woodpecker build (repo 5) deactivated
separately. The harmless legacy 'travel' CNAME->apex in config.tfvars is
left (now 404s; removing it would trigger a full-platform apply).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 09:32:39 +00:00
Viktor Barzin
90fb0685ae traefik: x402-gateway image forgejo -> ghcr + KEEL_IGNORE_IMAGE (ADR-0002 infra#28)
All checks were successful
ci/woodpecker/push/default Pipeline was successful
ci/woodpecker/push/build-cli Pipeline was successful
Formalizing x402-gateway CI (was a manual no-CI image). The deployment
lives in the traefik module; its image was NOT in ignore_changes, so a
set-image deploy would be reverted on the next traefik apply — added it
(KEEL_IGNORE_IMAGE). Base repointed to ghcr:latest; the GHA deploy
set-images the :sha8. Public ghcr package = no pull secret. Inert on the
live pod (image now ignored); rolling cutover keeps forwardAuth up.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 02:42:45 +00:00
Viktor Barzin
bdea34b992 offinfra-onboard: --dockerfile flag for non-root Dockerfiles
All checks were successful
ci/woodpecker/push/default Pipeline was successful
ci/woodpecker/push/build-cli Pipeline was successful
claude-memory-mcp's Dockerfile is at docker/Dockerfile, not repo root
(infra#20 build failed: 'open Dockerfile: no such file or directory').
build.yml template gains file: {{DOCKERFILE}} (default ./Dockerfile).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 02:37:25 +00:00
Viktor Barzin
3960eac716 claude-memory: image base forgejo -> ghcr (ADR-0002 infra#20)
Some checks failed
ci/woodpecker/push/build-cli Pipeline was successful
ci/woodpecker/push/default Pipeline was canceled
GHA now builds+pushes ghcr.io/viktorbarzin/claude-memory-mcp (public).
Image is KEEL_IGNORE_IMAGE (set-image managed), so this apply is inert
on the live pod; the stale :17 default literal is corrected to :latest.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 02:34:20 +00:00
Viktor Barzin
2f3c58dff1 claude-agent-service image -> ghcr across all five consumer stacks (infra#19)
All checks were successful
ci/woodpecker/push/build-cli Pipeline was successful
ci/woodpecker/push/default Pipeline was successful
GHA now builds+pushes ghcr.io/viktorbarzin/claude-agent-service (public
package, anonymous pulls). Repointed: claude-agent-service (deployment +
git-init/seed-beads-agent inits), claude-breakglass, ci-pipeline-health,
beads-server CronJobs, k8s-version-upgrade (tag var 2fd7670d -> latest —
the Forgejo registry lost that sha; node caches were the only thing
keeping those CronJobs alive). publish-gate: vendor-contact emails
(licensing@/legal@/security@/sales@) ruled license-boilerplate, not PII.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 01:47:54 +00:00
Viktor Barzin
8aba3a0179 offinfra-onboard --no-deploy; wealthfolio-sync image -> ghcr (ADR-0002 infra#25)
All checks were successful
ci/woodpecker/push/build-cli Pipeline was successful
ci/woodpecker/push/default Pipeline was successful
broker-sync is a CronJob-only consumer (no deployment): new --no-deploy
mode skips Woodpecker registration and renders build.yml without the
deploy job — :latest+Always CronJobs pick up builds on the next run.
wealthfolio stack: ghcr-credentials pull secret + image base repoint.
The wealthfolio-sync image regains a reproducible rebuild path.

Closes: code-62tm

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 01:39:35 +00:00
Viktor Barzin
2dde480795 openclaw: install-recruiter-plugin init image forgejo -> ghcr :latest (infra#27)
All checks were successful
ci/woodpecker/push/build-cli Pipeline was successful
ci/woodpecker/push/default Pipeline was successful
Second half of the recruiter-responder off-infra migration: the first GHA
build has published ghcr.io/viktorbarzin/recruiter-responder:{1d99a8d5,latest},
so the openclaw plugin-install init container can now follow the ghcr
:latest. The forgejo-side build pipeline was removed by the onboarding
commit, so the old forgejo :latest tag is frozen and would silently serve
stale plugin code. Deferred from the first commit on purpose - flipping it
before the package existed would have wedged the openclaw rollout on
ImagePullBackOff.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 00:57:30 +00:00
Viktor Barzin
57ff41e47e recruiter-responder: pull image from ghcr + ghcr-credentials on all consumers (ADR-0002, infra#27)
All checks were successful
ci/woodpecker/push/build-cli Pipeline was successful
ci/woodpecker/push/default Pipeline was successful
Migrating recruiter-responder off in-cluster Woodpecker builds: GHA will
build and push ghcr.io/viktorbarzin/recruiter-responder (PRIVATE package).
This commit lands the pull-side prerequisites BEFORE the first off-infra
build fires:

- stacks/recruiter-responder: image base forgejo -> ghcr (inert on the live
  Deployment - both containers are ignore_changes'd; the Woodpecker deploy
  moves the tag) + ghcr-credentials imagePullSecrets on the Deployment
  (covers the recruiter-responder container AND the alembic-migrate init
  container, which share the image).
- stacks/openclaw: ghcr-credentials imagePullSecrets on the openclaw
  Deployment - its install-recruiter-plugin init container consumes the
  :latest tag of this image. The image ref itself flips to ghcr in a
  follow-up once the first GHA build has created the package (flipping now
  would ImagePullBackOff on a not-yet-existing package and wedge the apply).
- stacks/kyverno: allowlist openclaw in sync-ghcr-credentials so the pull
  secret is cloned into that namespace too.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 00:43:35 +00:00
Viktor Barzin
c594274c83 ci: re-apply fire-planner stack after pipeline race
All checks were successful
ci/woodpecker/push/default Pipeline was successful
ci/woodpecker/push/build-cli Pipeline was successful
Comment-only touch so the changed-stack detection applies
stacks/fire-planner from the current master tree. Pipeline 150 (commit
f18dfa4c — the ghcr image base + ghcr-credentials migration for issue
#26) was auto-killed when the concurrent nextcloud-todos push superseded
it, and pipeline 151 diffed from f18dfa4c onward so the fire-planner
stack changes were never applied (cronjobs still point at the forgejo
image, pod specs lack ghcr-credentials).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 00:41:20 +00:00
Viktor Barzin
a264a19629 Merge remote-tracking branch 'forgejo/master' into wizard/nextcloud-todos-ghcr
All checks were successful
ci/woodpecker/push/default Pipeline was successful
ci/woodpecker/push/build-cli Pipeline was successful
2026-06-13 00:38:27 +00:00
Viktor Barzin
d5c328d23c nextcloud-todos: image base forgejo -> ghcr (ADR-0002, infra#18)
The nextcloud-todos build moved off-infra: GHA builds on the public
GitHub mirror and pushes ghcr.io/viktorbarzin/nextcloud-todos (public
package, anonymous pulls); Woodpecker repo 207 is deploy-only. First
ghcr image (:19c22d8c) is already built, deployed and rolled out, so
this repoint lands after the image exists. Both deployment image refs
(main + alembic-migrate init) are ignore_changes'd — no live churn,
the base matters only on resource (re)create. Old image was pulled
from a Forgejo registry package that no longer exists (pods survived
on node image cache only).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 00:38:25 +00:00
Viktor Barzin
f18dfa4c8b fire-planner: pull image from ghcr + add ghcr-credentials to all pod specs
Some checks failed
ci/woodpecker/push/build-cli Pipeline was canceled
ci/woodpecker/push/default Pipeline was canceled
Migrating fire-planner off in-cluster Woodpecker builds to GitHub
Actions -> ghcr.io (ADR-0002, issue #26). The image base moves
forgejo.viktorbarzin.me/viktor/fire-planner ->
ghcr.io/viktorbarzin/fire-planner (a PRIVATE ghcr package), so the
deployment, all three cronjobs (recompute, col-refresh,
examples-weekly) and the examples bulk job gain the ghcr-credentials
imagePullSecret (the kyverno sync-ghcr-credentials allowlist already
covers the fire-planner namespace). registry-credentials stays
alongside so the currently-running sha-pinned forgejo image can still
be pulled until the first ghcr deploy lands; the cronjob images are TF
literals and flip to ghcr :latest on this apply.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 00:38:09 +00:00
Viktor Barzin
e696957ebf ci: ancestor guard on DIFF_BASE; gate allowlists the owner's work email [ci skip]
Infra pipelines that were restarted after master moved diffed in REVERSE and
re-applied stale trees (pipeline 148 reverted payslip-ingest's fresh
ghcr config — repaired by the wave-2 agent). Only trust
CI_PREV_COMMIT_SHA when it is an ancestor of HEAD. publish-gate:
viktorbarzin@meta.com is the owner's own work email (same class as the
allowlisted personal domain), not blockable PII — unblocks infra#18.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 00:31:33 +00:00
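The ancestor guard from e696957ebf can be sketched over an in-memory commit graph. This is an illustration, not the pipeline's actual script (which is not shown here): the function names and the three-commit graph are hypothetical, and what the pipeline falls back to when the guard fails is left open. In a real checkout the same test is available as `git merge-base --is-ancestor <prev> HEAD`.

```python
from typing import Mapping, Optional, Sequence


def is_ancestor(parents: Mapping[str, Sequence[str]], candidate: str, head: str) -> bool:
    """True if `candidate` is reachable from `head` by following parent links."""
    stack, seen = [head], set()
    while stack:
        sha = stack.pop()
        if sha == candidate:
            return True
        if sha in seen:
            continue
        seen.add(sha)
        stack.extend(parents.get(sha, ()))
    return False


def pick_diff_base(
    parents: Mapping[str, Sequence[str]], prev_sha: Optional[str], head: str
) -> Optional[str]:
    """Trust prev_sha only when it is an ancestor of head; otherwise return None."""
    if prev_sha and is_ancestor(parents, prev_sha, head):
        return prev_sha
    return None  # caller must choose a safe fallback instead of a reverse diff


# Hypothetical linear history: a <- b <- c
graph = {"c": ["b"], "b": ["a"], "a": []}
normal = pick_diff_base(graph, "a", "c")    # previous commit is behind HEAD
reversed_ = pick_diff_base(graph, "c", "a")  # restarted pipeline: "previous" is ahead
```

The second call models the restart case: the old checkout is HEAD while the "previous" SHA is the newer branch head, so the guard refuses it.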
Viktor Barzin
cdd60d9078 ci: re-apply instagram-poster + payslip-ingest stacks after pipeline race
All checks were successful
ci/woodpecker/push/build-cli Pipeline was successful
ci/woodpecker/push/default Pipeline was successful
Comment-only touch of both stacks so the changed-stack detection applies
them from the current master tree. Two pipelines went wrong in sequence
during the parallel ADR-0002 wave-2 migrations (issues #23/#24):

- pipeline 146 (instagram-poster stack prep, commit 29c69250) was
  auto-killed when the concurrent payslip-ingest push superseded it, so
  its apply never ran;
- restarting it as pipeline 148 inherited CI_PREV_COMMIT_SHA = the NEW
  branch head (6928ce0b) with the OLD checkout (29c69250) — a reverse
  diff that re-applied stacks/payslip-ingest from the pre-migration
  tree, stripping the ghcr image base + ghcr-credentials pull secrets
  that pipeline 147 had just applied (2 resources reverted).

This commit restores the committed payslip-ingest config exactly as
issue #24 landed it and finally applies the instagram-poster ghcr prep
from issue #23. Lesson encoded in the comments: do not restart killed
infra pipelines after master has moved — re-trigger with a touch commit
instead.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 00:11:17 +00:00
Viktor Barzin
6928ce0be5 Merge remote-tracking branch 'forgejo/master' into wizard/payslip-ingest-ghcr
All checks were successful
ci/woodpecker/push/default Pipeline was successful
ci/woodpecker/push/build-cli Pipeline was successful
2026-06-13 00:03:29 +00:00
Viktor Barzin
5d236c2352 payslip-ingest: image base forgejo -> ghcr, ghcr-credentials pull secret, cron to :latest+Always
Prep for moving payslip-ingest's image build off-infra to GitHub Actions ->
ghcr.io (ADR-0002 wave 2, issue #24). One stack commit before onboarding:

- image base repointed forgejo.viktorbarzin.me/viktor/payslip-ingest ->
  ghcr.io/viktorbarzin/payslip-ingest (private ghcr package)
- ghcr-credentials imagePullSecrets added on the Deployment AND the
  actualbudget-payroll-sync CronJob pod specs (namespace is already in the
  kyverno sync-ghcr-credentials allowlist; secret verified present)
- the CronJob's SHA pin is retired: terragrunt image_tag 4f70681d -> latest
  plus explicit imagePullPolicy Always on the cron container, per the fleet
  convention for owned-app CronJobs — one less set-image target, and the
  cron can never go back to pulling the dead Forgejo tag

The Deployment keeps KEEL_IGNORE_IMAGE; its concrete :sha8 tag is set by
the Woodpecker deploy pipeline after each GHA build.

Closes: nothing yet — the repo-side onboarding (offinfra-onboard) follows.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 00:03:11 +00:00
Viktor Barzin
29c6925031 instagram-poster: image base forgejo->ghcr + ghcr-credentials pull secret
All checks were successful
ci/woodpecker/push/default Pipeline was successful
ci/woodpecker/push/build-cli Pipeline was successful
Prep for migrating instagram-poster off in-cluster Woodpecker builds to
GitHub Actions -> ghcr.io (ADR-0002, issue #23, PRIVATE-repo path).
Viktor asked for the wave-2 migration of instagram-poster per the wave-1
retro recipe: before onboarding, the stack must (a) carry the
ghcr-credentials imagePullSecret on the Deployment so the cluster can
pull the private ghcr image, and (b) repoint the image base from
forgejo.viktorbarzin.me/viktor to ghcr.io/viktorbarzin.

The Deployment image is KEEL_IGNORE_IMAGE (ignore_changes), so this
apply does NOT roll the pod to a not-yet-existing ghcr image — the live
forgejo-built :da5b4191 keeps running until the first GHA build POSTs
the Woodpecker deploy. The three CronJobs run curlimages/curl (public
DockerHub), not the app image, so they need neither the pull secret nor
a repoint. registry-credentials stays for the transition window.

Closes: nothing (stack prep only; repo onboarding follows)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-13 00:02:04 +00:00
Viktor Barzin
72b5843e4b publish-gate: exclude package-lock + beads tracker from email heuristic; beadboard image base -> ghcr
All checks were successful
ci/woodpecker/push/default Pipeline was successful
ci/woodpecker/push/build-cli Pipeline was successful
infra#17: the gate flagged npm deprecation boilerplate (package-lock.json
escapes the *.lock filter) and the upstream fork author's email in tracked
.beads data — both already-public upstream content, ruled false positives.
Lock files excluded properly; .beads moved to the eyeball inventory.
beads-server stack: beadboard image base repointed (deployment image is
KEEL-ignored; no CronJobs use it).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 23:52:07 +00:00
Viktor Barzin
57ffd0ed8d Merge remote-tracking branch 'forgejo/master' into wizard/freedify-mig
All checks were successful
ci/woodpecker/push/default Pipeline was successful
ci/woodpecker/push/build-cli Pipeline was successful
2026-06-12 23:37:19 +00:00
Viktor Barzin
c16fe56180 freedify: image base forgejo registry -> ghcr (ADR-0002)
Freedify builds moved off-infra per issue #22: GitHub Actions on the
ViktorBarzin/freedify mirror now builds and pushes the public image
ghcr.io/viktorbarzin/freedify, and the Woodpecker deploy pipeline
(repo 202) rolls :sha8 via kubectl set image. Both factory deployments
(music-viktor, music-emo) now seed from ghcr instead of the retired
in-cluster Forgejo build, and the container image joins lifecycle
ignore_changes (KEEL_IGNORE_IMAGE) so terraform applies do not revert
the deployed :sha8. Landed after the first GHA push so ghcr :latest
already existed when this repoint applied. Public package - no pull
secret needed.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 23:37:10 +00:00
Viktor Barzin
9f742b544c kms: image base forgejo registry -> ghcr (ADR-0002 infra#21)
All checks were successful
ci/woodpecker/push/default Pipeline was successful
ci/woodpecker/push/build-cli Pipeline was successful
kms-website moves off in-cluster Woodpecker builds to GHA -> ghcr.
The kms-web-page deployment image is ignore_changes'd (CI sets the live
tag), so this repoint only governs future creates; package is PUBLIC so
no pull secret is wired. No CronJobs in this stack.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 23:30:07 +00:00
Viktor Barzin
fb88440ec4 ci-pipeline-health: billing moved to the enhanced usage endpoint
All checks were successful
ci/woodpecker/push/default Pipeline was successful
ci/woodpecker/push/build-cli Pipeline was successful
The legacy /settings/billing/actions endpoint now returns 410; sum
Minutes usageItems from /settings/billing/usage instead (found during
the infra#16 retro: June-to-date = 420/2000).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 23:24:18 +00:00
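The summation described in fb88440ec4 might look like the sketch below. The field names `unitType` and `quantity` are assumptions about the shape of each usageItems entry, the function name is hypothetical, and the sample items are invented so that they add up to the 420 minutes quoted in the commit message.

```python
from typing import Iterable, Mapping


def total_minutes(usage_items: Iterable[Mapping[str, object]]) -> float:
    """Sum `quantity` over items whose unit type is minutes (assumed field names)."""
    return float(
        sum(
            float(item["quantity"])
            for item in usage_items
            if str(item.get("unitType", "")).lower() == "minutes"
        )
    )


# Invented sample payload; only the 420 total mirrors the commit message.
sample_items = [
    {"unitType": "Minutes", "quantity": 300},
    {"unitType": "Minutes", "quantity": 120},
    {"unitType": "GigabyteHours", "quantity": 5},  # ignored: not minutes
]
used = total_minutes(sample_items)
```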
Viktor Barzin
12bdd06f74 kyverno: force_new on sync-ghcr-credentials — generate rules are immutable
All checks were successful
ci/woodpecker/push/default Pipeline was successful
ci/woodpecker/push/build-cli Pipeline was successful
Pipeline 138: the validate-policy webhook denies in-place edits of a
generate rule (allowlist additions). force_new = delete+recreate;
generated secrets survive and generateExisting re-adopts.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 23:18:15 +00:00
Viktor Barzin
6b0d42c7bc publish-gate + tuya-bridge ghcr cutover prep (ADR-0002 infra#15)
Some checks failed
ci/woodpecker/push/build-cli Pipeline was successful
ci/woodpecker/push/default Pipeline failed
publish-gate: gitleaks + trufflehog (full history) + PII heuristics;
CLEAN verdict gates any public flip, DIRTY = stays private. tuya-bridge:
ghcr-credentials pull secret + image base -> ghcr; namespace added to
the ghcr-credentials allowlist as a safety net (new ghcr packages
default PRIVATE even from public repos — prune after visibility flip).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 23:12:02 +00:00
Viktor Barzin
54dfaf6edc job-hunter: image base forgejo registry -> ghcr (ADR-0002)
All checks were successful
ci/woodpecker/push/default Pipeline was successful
ci/woodpecker/push/build-cli Pipeline was successful
CronJobs track :latest via the TF literal (unlike the ignore_changes'd
deployment), so they kept pulling the dead Forgejo image after the
GHA/ghcr cutover — repoint the stack's image base.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 23:06:54 +00:00
Viktor Barzin
51682ee939 offinfra-onboard: require clean clone + ff to forgejo master first [ci skip]
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 23:00:55 +00:00