Default to proxmox-lvm for all new services. NFS only for RWX,
backup destinations, or shared media libraries. Updated iSCSI
backup section to reflect proxmox-lvm migration.
Add proxmox-lvm PVCs with pvc-autoresizer annotations for all
SQLite-backed services. Deployments updated to use new block storage
PVCs. Old NFS modules retained for 1-week rollback.
Services: ntfy, freshrss, insta2spotify, actualbudget (x3),
wealthfolio, navidrome (DB only), audiobookshelf config,
headscale, forgejo, uptime-kuma.
Also: set Recreate strategy on ntfy, forgejo, insta2spotify,
wealthfolio (required for RWO volumes).
Cluster doesn't have cert-manager installed. Use self-signed certificate
for the controller and disable the PVC mutating webhook (annotations are
set directly on PVCs via Terraform).
Deploy topolvm/pvc-autoresizer controller that monitors kubelet_volume_stats
via Prometheus and auto-expands annotated PVCs. Annotated all 9 block-storage
PVCs (proxmox-lvm) with per-PVC thresholds and max limits. Updated PVFillingUp
alert to critical/10m (means auto-expansion failed) and added PVAutoExpanding
info alert at 80%.
- Migrate ebooks-calibre-config-iscsi (2Gi, 2380 files) to proxmox-lvm
- Update docs/architecture/storage.md: document Proxmox CSI as primary
block storage, mark democratic-csi iSCSI as deprecated
- Add full migration plan to docs/plans/
When the pull-through proxy (10.0.20.10) is down, containerd now falls
back to the official upstream registries (registry-1.docker.io, ghcr.io)
instead of failing. Also cleans up stale disabled registry mirror dirs
and removes unnecessary containerd restart from the rollout script.
Blob caching (content-addressed by SHA256) is unaffected — only manifest
re-validation changes. Every pull now checks upstream for the current
manifest digest, eliminating stale :latest tag issues.
The cleanup-tags.sh + garbage-collect cycle can delete blob data while
leaving _layers/ link files intact. The registry then returns HTTP 200
with 0 bytes for those layers, causing "unexpected EOF" on image pulls.
fix-broken-blobs.sh walks all repositories, checks each layer link
against actual blob data, and removes orphaned links so the registry
re-fetches from upstream on next pull.
Schedule: daily at 2:30am (after tag cleanup) and Sunday 3:30am
(after garbage collection). First run found 2335/2556 (91%) of
layer links were orphaned.
localhost resolves to IPv6 ::1 but containers bind to 0.0.0.0 (IPv4
only), causing wget to fail with "Connection refused". The nginx
proxy had 18,462 consecutive health check failures because of this.
Also cleared corrupted pull-through cache for mghee/novelapp — the
registry had layer link files pointing to non-existent blob data,
causing containerd to get 200 responses with 0 bytes (unexpected EOF).
Caretta eBPF DaemonSet was using 600Mi x 5 nodes = 3Gi total for
non-critical network topology visualization. Removing it to free
memory for novelapp and aiostreams which were stuck in Pending.