nextcloud: re-enable Keel auto-upgrades with occ-upgrade self-heal + live-tag floor
Re-enrolls Nextcloud in Keel (opted out after the 2026-05-26 32.0.3->32.0.9 bump stuck the pod in maintenance mode ~22h). Two safeguards engineer around both failure modes:

- F1 (interrupted occ upgrade -> 503): nextcloud-watchdog CronJob runs `occ upgrade` + clears maintenance mode when occ reports needsDbUpgrade=true; Job deadline bumped 120->600s so it isn't killed mid-migration.
- F2 (helm re-renders a tag below the Keel-bumped live image -> downgrade CrashLoop): chart_values renders the live tag via a plural kubernetes_resources data source (empty-list-on-absence -> floor 32.0.9 on fresh install/DR), so a re-render never downgrades below live.

Scope is patch -- Kyverno's shared inject-keel-annotations policy stamps it and its background-controller overrides a TF-set value, and patch == minor for Nextcloud in practice (32.0.x only; major 33 stays manual). Dropped the per-workload keel.sh/policy override resources to avoid perpetual drift; ns enrollment + Kyverno now own the keel annotations like other workloads.

Also bumps the external-storage bootstrap Job create timeout 1m->12m to match its own 10m pod-wait, since Keel bumps now roll the pod mid-apply.

Verified: Keel auto-upgraded 32.0.9->32.0.10 on apply, entrypoint occ upgrade completed clean (no watchdog needed), pod 2/2, HTTP 200, plan shows no drift.
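The F1 self-heal rule from the message can be sketched as a small decision function. This is only an illustration, not the CronJob's actual script: it assumes the watchdog reads the JSON printed by `occ status --output=json`, and the function name `watchdog_actions` is hypothetical. It returns the occ commands a run would issue rather than executing them.

```python
import json


def watchdog_actions(status_json: str) -> list[str]:
    """Return the occ commands one watchdog run would issue (hypothetical helper).

    status_json is assumed to be the output of `occ status --output=json`.
    """
    status = json.loads(status_json)
    actions: list[str] = []
    if status.get("needsDbUpgrade"):
        # An interrupted upgrade leaves the DB behind the code: finish the
        # migration first, then lift maintenance mode so the 503s stop.
        actions.append("occ upgrade")
        actions.append("occ maintenance:mode --off")
    # Otherwise do nothing: maintenance mode alone may be operator-set.
    return actions


if __name__ == "__main__":
    stuck = '{"installed": true, "maintenance": true, "needsDbUpgrade": true}'
    print(watchdog_actions(stuck))
```

Keying the action on needsDbUpgrade rather than on maintenance mode alone follows the message's wording, and it leaves a deliberately enabled maintenance window untouched.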
parent 50d0f1affa
commit fb1e47a20a
4 changed files with 133 additions and 56 deletions
@@ -1,13 +1,16 @@
-# Pin the image to 32.0.9 (apache). On 2026-05-26 Keel bumped the live
-# Deployment 32.0.3 → 32.0.9-apache and the DATA migrated to 32.0.9.2; Keel
-# was then disabled but chart_values was never pinned, so it kept defaulting
-# to the chart's appVersion (32.0.3). A 2026-06-01 `terragrunt apply`
-# reconciled that drift, rolled a 32.0.3 pod, and Nextcloud refused to
-# downgrade (data 32.0.9.2 > image 32.0.3.2) → CrashLoopBackOff. Pinning here
-# keeps TF the source of truth and matches the on-disk data version.
+# image.tag is rendered dynamically (templatefile var `image_tag`) from the
+# CURRENT live Deployment tag, falling back to var.nextcloud_image_tag_floor
+# (32.0.9) on fresh install / DR — see stacks/nextcloud/main.tf
+# `data.kubernetes_resource.nextcloud_live` + locals. This makes helm upgrades
+# image-no-ops in steady state and means a re-render can NEVER downgrade below
+# the Keel-bumped live tag (the 2026-06-01 CrashLoop: a pinned 32.0.3 lost to
+# live 32.0.9 and Nextcloud refused the downgrade). Keel (keel.sh/policy=minor)
+# bumps the live tag upward within major 32; the next apply just follows it.
+# flavor=apache renders the bare apache-default tag (live image is
+# `nextcloud:<tag>`, no -apache suffix).
 image:
   flavor: apache
-  tag: "32.0.9"
+  tag: "${image_tag}"

 nextcloud:
   host: nextcloud.viktorbarzin.me
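The tag selection described in the new comment (live tag when a Deployment exists, floor otherwise) can be sketched outside Terraform as follows. This is a hedged illustration, not the locals in stacks/nextcloud/main.tf: `parse_tag` and `render_image_tag` are hypothetical names, and clamping a live tag that sits below the floor is an assumption the commit does not state. Tags are compared numerically because a plain string comparison would rank "32.0.10" below "32.0.9".

```python
def parse_tag(tag: str) -> tuple[int, ...]:
    """Turn '32.0.10' or '32.0.10-apache' into (32, 0, 10) for numeric comparison."""
    version = tag.split("-")[0]  # drop a flavor suffix such as '-apache'
    return tuple(int(part) for part in version.split("."))


def render_image_tag(live_tags: list[str], floor: str) -> str:
    """Pick the tag to render into chart values (hypothetical helper).

    live_tags mirrors the plural data source: an empty list means no live
    Deployment exists yet (fresh install / DR), so the floor is used.
    """
    if not live_tags:
        return floor
    live = live_tags[0]
    # Assumption: never render below the floor, even if the live tag is older.
    return live if parse_tag(live) >= parse_tag(floor) else floor


if __name__ == "__main__":
    print(render_image_tag([], "32.0.9"))           # fresh install: floor
    print(render_image_tag(["32.0.10"], "32.0.9"))  # follows the Keel-bumped tag
```

Because the rendered tag equals the live tag in steady state, a helm upgrade leaves the image unchanged, which is the no-op behaviour the comment describes.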