nextcloud: re-enable Keel auto-upgrades with occ-upgrade self-heal + live-tag floor
Re-enrolls Nextcloud in Keel (opted out after the 2026-05-26 32.0.3->32.0.9 bump stuck the pod in maintenance mode ~22h). Two safeguards engineer around both failure modes: - F1 (interrupted occ upgrade -> 503): nextcloud-watchdog CronJob runs `occ upgrade` + clears maintenance mode when occ reports needsDbUpgrade=true; Job deadline bumped 120->600s so it isn't killed mid-migration. - F2 (helm re-renders a tag below the Keel-bumped live image -> downgrade CrashLoop): chart_values renders the live tag via a plural kubernetes_resources data source (empty-list-on-absence -> floor 32.0.9 on fresh install/DR), so a re-render never downgrades below live. Scope is patch -- Kyverno's shared inject-keel-annotations policy stamps it and its background-controller overrides a TF-set value, and patch == minor for Nextcloud in practice (32.0.x only; major 33 stays manual). Dropped the per-workload keel.sh/policy override resources to avoid perpetual drift; ns enrollment + Kyverno now own the keel annotations like other workloads. Also bumps the external-storage bootstrap Job create timeout 1m->12m to match its own 10m pod-wait, since Keel bumps now roll the pod mid-apply. Verified: Keel auto-upgraded 32.0.9->32.0.10 on apply, entrypoint occ upgrade completed clean (no watchdog needed), pod 2/2, HTTP 200, plan shows no drift.
This commit is contained in:
parent
50d0f1affa
commit
fb1e47a20a
4 changed files with 133 additions and 56 deletions
|
|
@ -116,6 +116,16 @@ resource "kubernetes_role_binding" "nextcloud_external_storage_bootstrap" {
|
|||
# ── Bootstrap Job ────────────────────────────────────────────────────────────
|
||||
|
||||
resource "kubernetes_job_v1" "nextcloud_external_storage_bootstrap" {
|
||||
# The bootstrap script (below) waits up to 10m for the NC pod to be Ready.
|
||||
# kubernetes_job_v1's default create timeout is only 1m, which spuriously
|
||||
# fails the apply whenever the NC pod takes >1m to come up — e.g. now that
|
||||
# Keel auto-upgrades nextcloud, a bump mid-apply runs `occ upgrade` in the
|
||||
# entrypoint and delays readiness past 1m (observed 2026-06-01). Match the
|
||||
# script's 10m wait plus margin.
|
||||
timeouts {
|
||||
create = "12m"
|
||||
}
|
||||
|
||||
metadata {
|
||||
name = "nextcloud-external-storage-bootstrap"
|
||||
namespace = kubernetes_namespace.nextcloud.metadata[0].name
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue