infra

viktor/infra

Fork 0

Commit graph

Author	SHA1	Message	Date
Viktor Barzin	fb1e47a20a	nextcloud: re-enable Keel auto-upgrades with occ-upgrade self-heal + live-tag floor Re-enrolls Nextcloud in Keel (opted out after the 2026-05-26 32.0.3->32.0.9 bump stuck the pod in maintenance mode ~22h). Two safeguards engineer around both failure modes: - F1 (interrupted occ upgrade -> 503): nextcloud-watchdog CronJob runs `occ upgrade` + clears maintenance mode when occ reports needsDbUpgrade=true; Job deadline bumped 120->600s so it isn't killed mid-migration. - F2 (helm re-renders a tag below the Keel-bumped live image -> downgrade CrashLoop): chart_values renders the live tag via a plural kubernetes_resources data source (empty-list-on-absence -> floor 32.0.9 on fresh install/DR), so a re-render never downgrades below live. Scope is patch -- Kyverno's shared inject-keel-annotations policy stamps it and its background-controller overrides a TF-set value, and patch == minor for Nextcloud in practice (32.0.x only; major 33 stays manual). Dropped the per-workload keel.sh/policy override resources to avoid perpetual drift; ns enrollment + Kyverno now own the keel annotations like other workloads. Also bumps the external-storage bootstrap Job create timeout 1m->12m to match its own 10m pod-wait, since Keel bumps now roll the pod mid-apply. Verified: Keel auto-upgraded 32.0.9->32.0.10 on apply, entrypoint occ upgrade completed clean (no watchdog needed), pod 2/2, HTTP 200, plan shows no drift.	2026-06-01 19:50:41 +00:00
Viktor Barzin	c624caf65a	nextcloud(external_storage): add per-mount enableSharing option Some checks failed ci/woodpecker/push/build-cli Pipeline failed Details ci/woodpecker/push/default Pipeline was successful Details Lets admin natively share folders from inside an external mount with internal users/groups or via public link. The two PVE pool browsers (visible to admin only) get enableSharing=true so they can act as a "share-from picker" over /srv/nfs and /srv/nfs-ssd; /anca-elements stays false so anca manages re-sharing inside her own view. - Manifest schema gains enableSharing on rootMounts + archiveMounts. - Bootstrap Job adds sync_option() and reconciles enable_sharing via occ files_external:option (idempotent — occ no-ops same-value set).	2026-05-24 11:39:16 +00:00
Viktor Barzin	cb1a34fd00	nextcloud: expose PVE NFS roots + /anca-elements via Files External Some checks failed ci/woodpecker/push/build-cli Pipeline failed Details ci/woodpecker/push/default Pipeline was successful Details Mounts the Proxmox host NFS exports (/srv/nfs and /srv/nfs-ssd) into the NC pod and surfaces them through occ files_external:create: - /PVE NFS Pool → /mnt/pve-nfs (admin group only) - /PVE NFS-SSD Pool → /mnt/pve-nfs-ssd (admin group only) - /anca-elements → /mnt/pve-nfs/anca-elements (admin, anca users) Mount visibility is controlled by occ files_external:applicable; no Files Access Control. ACL state is reconciled idempotently by a bootstrap Job that diffs desired vs current applicable_users / applicable_groups (via files_external:list --output=json). Bootstrap fixes vs initial design: - Sync loop used `[ -n "$U" ] && cmd` which returns 1 on empty input, triggering set -e on no-op re-runs. Switched to process substitution `< <(jq ...)` so empty diff -> loop body never runs -> 0 exit. - RBAC missed `watch` verb (kubectl wait spammed reflector errors). - Manifest used display-name "viktor" instead of NC username "admin" for the /anca-elements applicable list. Chart values: added two PV-backed volume mounts at /mnt/pve-nfs[+ssd] and pinned securityContext to fsGroup=33 with fsGroupChangePolicy: OnRootMismatch (chart default Always would recurse 600k+ files on every pod restart).	2026-05-24 11:27:26 +00:00

Author

SHA1

Message

Date

Viktor Barzin

fb1e47a20a

nextcloud: re-enable Keel auto-upgrades with occ-upgrade self-heal + live-tag floor

Re-enrolls Nextcloud in Keel (opted out after the 2026-05-26 32.0.3->32.0.9
bump stuck the pod in maintenance mode ~22h). Two safeguards engineer around
both failure modes:

- F1 (interrupted occ upgrade -> 503): nextcloud-watchdog CronJob runs
  `occ upgrade` + clears maintenance mode when occ reports needsDbUpgrade=true;
  Job deadline bumped 120->600s so it isn't killed mid-migration.
- F2 (helm re-renders a tag below the Keel-bumped live image -> downgrade
  CrashLoop): chart_values renders the live tag via a plural
  kubernetes_resources data source (empty-list-on-absence -> floor 32.0.9 on
  fresh install/DR), so a re-render never downgrades below live.

Scope is patch -- Kyverno's shared inject-keel-annotations policy stamps it and
its background-controller overrides a TF-set value, and patch == minor for
Nextcloud in practice (32.0.x only; major 33 stays manual). Dropped the
per-workload keel.sh/policy override resources to avoid perpetual drift; ns
enrollment + Kyverno now own the keel annotations like other workloads.

Also bumps the external-storage bootstrap Job create timeout 1m->12m to match
its own 10m pod-wait, since Keel bumps now roll the pod mid-apply.

Verified: Keel auto-upgraded 32.0.9->32.0.10 on apply, entrypoint occ upgrade
completed clean (no watchdog needed), pod 2/2, HTTP 200, plan shows no drift.

2026-06-01 19:50:41 +00:00

Viktor Barzin

c624caf65a

nextcloud(external_storage): add per-mount enableSharing option

ci/woodpecker/push/build-cli Pipeline failed

Details

ci/woodpecker/push/default Pipeline was successful

Details

Lets admin natively share folders from inside an external mount with
internal users/groups or via public link. The two PVE pool browsers
(visible to admin only) get enableSharing=true so they can act as a
"share-from picker" over /srv/nfs and /srv/nfs-ssd; /anca-elements
stays false so anca manages re-sharing inside her own view.

- Manifest schema gains enableSharing on rootMounts + archiveMounts.
- Bootstrap Job adds sync_option() and reconciles enable_sharing via
  occ files_external:option (idempotent — occ no-ops same-value set).

2026-05-24 11:39:16 +00:00

Viktor Barzin

cb1a34fd00

nextcloud: expose PVE NFS roots + /anca-elements via Files External

ci/woodpecker/push/build-cli Pipeline failed

Details

ci/woodpecker/push/default Pipeline was successful

Details

Mounts the Proxmox host NFS exports (/srv/nfs and /srv/nfs-ssd) into
the NC pod and surfaces them through occ files_external:create:

- /PVE NFS Pool      → /mnt/pve-nfs       (admin group only)
- /PVE NFS-SSD Pool  → /mnt/pve-nfs-ssd   (admin group only)
- /anca-elements     → /mnt/pve-nfs/anca-elements  (admin, anca users)

Mount visibility is controlled by occ files_external:applicable; no
Files Access Control. ACL state is reconciled idempotently by a
bootstrap Job that diffs desired vs current applicable_users /
applicable_groups (via files_external:list --output=json).

Bootstrap fixes vs initial design:
- Sync loop used `[ -n "$U" ] && cmd` which returns 1 on empty input,
  triggering set -e on no-op re-runs. Switched to process substitution
  `< <(jq ...)` so empty diff -> loop body never runs -> 0 exit.
- RBAC missed `watch` verb (kubectl wait spammed reflector errors).
- Manifest used display-name "viktor" instead of NC username "admin"
  for the /anca-elements applicable list.

Chart values: added two PV-backed volume mounts at /mnt/pve-nfs[+ssd]
and pinned securityContext to fsGroup=33 with fsGroupChangePolicy:
OnRootMismatch (chart default Always would recurse 600k+ files on
every pod restart).

2026-05-24 11:27:26 +00:00

3 commits