The 2026-06-22 external-secrets v1 migration made the ESO controller the
server-side-apply owner of .spec.refreshInterval on every ExternalSecret, so any
stack defining one via kubernetes_manifest fails `terraform apply` with a
field-manager conflict the next time it's applied (instagram-poster + grafana hit
this on 2026-06-24; it was latent across the whole fleet). Add
field_manager { force_conflicts = true } to all 101 remaining ExternalSecret
manifests across 70 stacks, matching the fix already on grafana / woodpecker /
traefik / k8s-version-upgrade / instagram-poster. TF and ESO set the same value,
so it's stable (no perpetual drift). Defuses the landmine before each stack's
next apply trips it.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The API rewrite half of the ESO 0.12->2.6 migration (last k8s-1.35 compat-gate
blocker). Done on chart 0.16.2, which serves BOTH external-secrets.io/v1beta1
and v1, so this is the safe window — MUST land before 0.17 removes v1beta1
(there is no conversion webhook). Pure apiVersion bump, schema is byte-identical:
106 occurrences (104 ExternalSecrets + 2 ClusterSecretStores vault-kv/vault-database)
across 73 .tf files, v1beta1 -> v1, no other field changes.
Validated live first on tandoor (single, non-coupled, synced ES): the
kubernetes_manifest apiVersion bump forces a REPLACE; the target Secret is
cascade-GC'd for ONE ~0.3s poll then ESO recreates it (identical value re-synced
from Vault, new UID) and the ES returns SecretSynced=True on v1. Running pods
keep their mounted copy through the sub-second blip. All 110 target Secrets were
snapshotted to /tmp first as a backstop.
CI applies the changed stacks serially (staged rollout); watching aggregate ES
sync back to 108 synced (2 pre-existing dead: instagram-poster, payslip-ingest).
Next: Phase 3 climb 0.16.2 -> 2.6.0.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
GHA now builds+pushes ghcr.io/viktorbarzin/claude-agent-service (public
package, anonymous pulls). Repointed: claude-agent-service (deployment +
git-init/seed-beads-agent inits), claude-breakglass, ci-pipeline-health,
beads-server CronJobs, k8s-version-upgrade (tag var 2fd7670d -> latest —
the Forgejo registry lost that sha; node caches were the only thing
keeping those CronJobs alive). publish-gate: vendor-contact emails
(licensing@/legal@/security@/sales@) ruled license-boilerplate, not PII.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The sha tag other claude-agent-service CronJobs pin no longer exists in
the Forgejo registry (node caches mask it); fresh pulls 404. Follow the
owned-app CronJob convention until infra#19 moves this image to ghcr.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
HEAD~1 on a merge commit is the feature-branch parent, so the
changed-stack detection diffed the WRONG side and silently skipped the
stacks the push actually changed — pipeline 128 'succeeded' without
applying the new ci-pipeline-health stack. Use the push's true
before-state (CI_PREV_COMMIT_SHA) when it resolves, HEAD~1 as fallback
(first build / shallow edge cases). Also touches the ci-pipeline-health
stack so THIS push applies it.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Viktor asked to monitor the pipelines closely as builds move off-infra
(PRD infra#10). New aux stack: daily 07:30 UTC CronJob on the
claude-agent-service image running a deterministic shell sweep —
GitHub Actions failures/stuck runs across owned repos, Woodpecker
pipeline failures, GHA free-tier minutes burn. Healthy = one quiet
Slack line; issues = Slack alert + comment on infra#10. In-cluster
(not a cloud routine) because Vault + the Woodpecker token are
LAN-only. Secrets via ExternalSecret (github_pat deliberately, not the
ghcr_pull_token alias — a scoped packages-only rotation couldn't read
Actions runs).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>