infra

Author	SHA1	Message	Date
Viktor Barzin	ebc8b6588f	ESO: add force_conflicts to all ExternalSecret manifests (fleet sweep) [CI: ci/woodpecker/push/default failed] The 2026-06-22 external-secrets v1 migration made the ESO controller the server-side-apply owner of .spec.refreshInterval on every ExternalSecret, so any stack defining one via kubernetes_manifest fails `terraform apply` with a field-manager conflict the next time it's applied (instagram-poster + grafana hit this on 2026-06-24; it was latent across the whole fleet). Add field_manager { force_conflicts = true } to all 101 remaining ExternalSecret manifests across 70 stacks, matching the fix already on grafana / woodpecker / traefik / k8s-version-upgrade / instagram-poster. TF and ESO set the same value, so it's stable (no perpetual drift). Defuses the landmine before each stack's next apply trips it. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-25 21:28:11 +00:00
Viktor Barzin	c670cb7118	eso: Phase 2 — migrate all 104 ExternalSecrets + 2 ClusterSecretStores to v1 [CI: ci/woodpecker/push/default failed] The API rewrite half of the ESO 0.12->2.6 migration (last k8s-1.35 compat-gate blocker). Done on chart 0.16.2, which serves BOTH external-secrets.io/v1beta1 and v1, so this is the safe window — MUST land before 0.17 removes v1beta1 (there is no conversion webhook). Pure apiVersion bump, schema is byte-identical: 106 occurrences (104 ExternalSecrets + 2 ClusterSecretStores vault-kv/vault-database) across 73 .tf files, v1beta1 -> v1, no other field changes. Validated live first on tandoor (single, non-coupled, synced ES): the kubernetes_manifest apiVersion bump forces a REPLACE; the target Secret is cascade-GC'd for ONE ~0.3s poll then ESO recreates it (identical value re-synced from Vault, new UID) and the ES returns SecretSynced=True on v1. Running pods keep their mounted copy through the sub-second blip. All 110 target Secrets were snapshotted to /tmp first as a backstop. CI applies the changed stacks serially (staged rollout); watching aggregate ES sync back to 108 synced (2 pre-existing dead: instagram-poster, payslip-ingest). Next: Phase 3 climb 0.16.2 -> 2.6.0. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-22 19:13:04 +00:00
Viktor Barzin	2f3c58dff1	claude-agent-service image -> ghcr across all five consumer stacks (infra#19) [CI: ci/woodpecker/push/build-cli successful; ci/woodpecker/push/default successful] GHA now builds+pushes ghcr.io/viktorbarzin/claude-agent-service (public package, anonymous pulls). Repointed: claude-agent-service (deployment + git-init/seed-beads-agent inits), claude-breakglass, ci-pipeline-health, beads-server CronJobs, k8s-version-upgrade (tag var 2fd7670d -> latest — the Forgejo registry lost that sha; node caches were the only thing keeping those CronJobs alive). publish-gate: vendor-contact emails (licensing@/legal@/security@/sales@) ruled license-boilerplate, not PII. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-13 01:47:54 +00:00
Viktor Barzin	02785987dd	ci-pipeline-health: image :latest+Always — registry lost the 2fd7670d tag [CI: ci/woodpecker/push/default successful; ci/woodpecker/push/build-cli successful] The sha tag other claude-agent-service CronJobs pin no longer exists in the Forgejo registry (node caches mask it); fresh pulls 404. Follow the owned-app CronJob convention until infra#19 moves this image to ghcr. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 21:06:20 +00:00
Viktor Barzin	30ff8f2db3	ci: diff changed stacks against CI_PREV_COMMIT_SHA, not HEAD~1 [CI: ci/woodpecker/push/default successful; ci/woodpecker/push/build-cli successful] HEAD~1 on a merge commit is the feature-branch parent, so the changed-stack detection diffed the WRONG side and silently skipped the stacks the push actually changed — pipeline 128 'succeeded' without applying the new ci-pipeline-health stack. Use the push's true before-state (CI_PREV_COMMIT_SHA) when it resolves, HEAD~1 as fallback (first build / shallow edge cases). Also touches the ci-pipeline-health stack so THIS push applies it. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 20:50:43 +00:00
Viktor Barzin	d02ca4f2db	ci-pipeline-health: daily sweep of the off-infra CI chain (ADR-0002) Viktor asked to monitor the pipelines closely as builds move off-infra (PRD infra#10). New aux stack: daily 07:30 UTC CronJob on the claude-agent-service image running a deterministic shell sweep — GitHub Actions failures/stuck runs across owned repos, Woodpecker pipeline failures, GHA free-tier minutes burn. Healthy = one quiet Slack line; issues = Slack alert + comment on infra#10. In-cluster (not a cloud routine) because Vault + the Woodpecker token are LAN-only. Secrets via ExternalSecret (github_pat deliberately, not the ghcr_pull_token alias — a scoped packages-only rotation couldn't read Actions runs). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>	2026-06-12 20:45:28 +00:00

6 commits
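
The field_manager change described in ebc8b6588f, applied to a manifest already on the v1 API from c670cb7118, might look roughly like the sketch below. It is an illustration under stated assumptions, not a manifest taken from the repo: the resource name, namespace, Secret name, Vault key, and the 1h refresh interval are hypothetical. Only the apiVersion, the vault-kv ClusterSecretStore name, and the field_manager block come from the commit messages above.

```hcl
# Hypothetical ExternalSecret defined through the Terraform kubernetes provider.
resource "kubernetes_manifest" "example_external_secret" {
  manifest = {
    apiVersion = "external-secrets.io/v1" # bumped from v1beta1 in c670cb7118
    kind       = "ExternalSecret"
    metadata = {
      name      = "example-secrets" # hypothetical
      namespace = "example"         # hypothetical
    }
    spec = {
      # The ESO controller also claims this field via server-side apply,
      # which is what triggers the field-manager conflict on apply.
      refreshInterval = "1h" # hypothetical value
      secretStoreRef = {
        kind = "ClusterSecretStore"
        name = "vault-kv" # store name mentioned in c670cb7118
      }
      target = {
        name = "example-secrets" # hypothetical target Secret
      }
      data = [
        {
          secretKey = "token" # hypothetical
          remoteRef = {
            key      = "example" # hypothetical Vault path
            property = "token"
          }
        }
      ]
    }
  }

  # Take ownership of conflicting fields instead of failing the apply.
  field_manager {
    force_conflicts = true
  }
}
```

With force_conflicts set, the provider forces its apply through even where another field manager owns a field. As the commit message notes, this stays stable only because Terraform and ESO write the same value.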