6d224861 came from a --no-checkout worktree whose empty index made the
commit drop every file except two. This restores 05b50d2b's full tree and
correctly adds stacks/stem95su/gdrive-sync.tf + the service-catalog stem95su
entry. Forward-only (parent=6d224861, no force-push); [ci skip] since the
live infra was never applied from the broken commit.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
CronJob stem95su-gdrive-sync (*/10) mounts the content PVC RW and
rclone-syncs the read-only Drive folder "claude" (stem claude/files) onto
it (rclone/rclone:1.74.3, scope=drive.readonly, empty-source guard +
--max-delete 25). ESO ExternalSecret stem95su-rclone <- Vault
secret/stem95su. Requires the GCP OAuth app published to Production or the
refresh token expires ~weekly.
Lands the gdrive-sync stack on master (it had landed on a feature branch
by accident on the shared devvm checkout).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Root cause of today's CAPI 403 crashloop: chart 0.21.0 pins appVersion
to v1.7.3, but Keel had auto-bumped the running pods to v1.7.8 on
2026-05-16 and they ran fine with CAPI for 8 days. Today's TF apply
(b59acbc1 agent memory bump) re-rendered the deployment from chart
defaults, reverting the image to v1.7.3 — and v1.7.3 has a CAPI
watcher-auth bug against the current api.crowdsec.net behaviour, so
every fresh replica started 403'ing on startup.
Fix: set `image.tag: "v1.7.8"` in values.yaml so the image survives
future TF applies independently of the chart's appVersion. Verified
CAPI auth succeeds on all 3 fresh pods with v1.7.8.
Also dropped the ENROLL_KEY env block — the existing key `cmey5e636…`
is single-shot and was already consumed by the first replica;
subsequent pods hit 403 on `cscli console enroll`. CAPI works WITHOUT
console enrollment (separate flows). Re-enable console reporting by
generating a fresh enroll key at app.crowdsec.net (procedure
documented in the values.yaml comment block).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
CAPI auth at api.crowdsec.net is rejecting watcher logins from inside
the cluster within ~1h of registration, even after rotating creds via
`cscli capi register`. The same login successfully authenticates from
devvm but fails from cluster pods → IP-throttle or account-state issue
at the central API. Until that's resolved with CrowdSec support (or
the throttle window resets), running with CAPI on is just chronic
crashloops on every fresh replica.
`DISABLE_ONLINE_API=true` makes the chart entrypoint
`conf_set 'del(.api.server.online_client)'`, removing the online_client
block entirely. Pods skip CAPI auth, no 403, no crashloop. Trade-off:
no community blocklists. Local scenarios + bouncers continue
unchanged.
Side-effect of disabling CAPI in this chart (v0.21.0) — `role.yaml`
is gated on `IsOnlineAPIDisabled=false` while `cscli-lapi-register-job`
is gated on `StoreLAPICscliCredentialsInSecret=true` (orthogonal). So
the hook runs without the Role it needs, and atomic apply rolls back.
Mitigation: pre-created the `crowdsec-lapi-cscli-credentials` Secret
manually (the hook short-circuits when the secret already exists) and
re-applied the missing Role for future re-enablement.
Re-enable path documented in the comment block.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
krr 2026-05-22 flagged crowdsec-agent DaemonSet (4 pods) as under-
requested by ~588 MiB across the cluster. Live usage around the
80-128 MiB mark for active log parsing — 64 MiB request risked eviction
ahead of more-needed pods. Limit stays at 512 MiB.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Phase 1 of platform stack split for parallel CI applies.
All 3 modules were fully independent (no cross-module refs).
State migrated via terraform state mv. All 3 stacks applied
with zero changes (dbaas had pre-existing ResourceQuota drift).
Woodpecker pipeline updated to run extracted stacks in parallel.