From d6590612b233dcf79cee9028057193f0170872ed Mon Sep 17 00:00:00 2001
From: Viktor Barzin
Date: Sun, 24 May 2026 14:12:30 +0000
Subject: [PATCH] immich: bulk-import Anca's Elements photo archive into her account
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Grows pve/nfs-data 3T → 4T (online lvextend + resize2fs) to absorb ~340 GB
of new originals landing under /srv/nfs/immich/upload during the import.

Adds:
- module "nfs_anca_elements_host" — RO PVC over /srv/nfs/anca-elements,
  consumed only by the import Job (not mounted in immich-server).
- kubernetes_job_v1.anca_elements_import — immich-go v0.31.0 uploader
  posting to immich-server.immich.svc:2283 with Anca's API key (synced via
  the existing immich-secrets ExternalSecret from secret/immich.anca_api_key).
  Filters to image extensions, bans the non-photo top-level dirs (filme/,
  Music/, carti/, courses, installers, docs, etc.) and puts every asset in
  the album "Poze (Elements)". immich-go's default `--pause-immich-jobs` is
  overridden to false because non-admin keys can't pause jobs.
- docs/architecture/storage.md — note the new 4 TB size in 3 places.
- docs/runbooks/grow-pve-nfs-lv.md — captures the one-shot lvextend
  procedure (no pve-host TF stack exists for this).

Job is removed in the follow-up cleanup commit once the upload completes;
the PVC stays for a videos batch later.

Co-Authored-By: Claude Opus 4.7
---
 docs/architecture/storage.md     |   8 +-
 docs/runbooks/grow-pve-nfs-lv.md |  47 +++++++++++
 stacks/immich/main.tf            | 130 +++++++++++++++++++++++++++++++
 3 files changed, 181 insertions(+), 4 deletions(-)
 create mode 100644 docs/runbooks/grow-pve-nfs-lv.md

diff --git a/docs/architecture/storage.md b/docs/architecture/storage.md
index 22ddf9a0..856c2d2d 100644
--- a/docs/architecture/storage.md
+++ b/docs/architecture/storage.md
@@ -1,6 +1,6 @@
 # Storage Architecture
 
-Last updated: 2026-05-09
+Last updated: 2026-05-24
 
 ## Overview
 
@@ -13,7 +13,7 @@ The cluster uses two storage backends: **Proxmox CSI** for database block storag
 All services storing sensitive data were migrated to `proxmox-lvm-encrypted` on 2026-04-15. This eliminates the previous double-CoW (ZFS + LVM-thin) path and ensures data-at-rest encryption.
 
 **NFS storage (Proxmox host)**: ~100 NFS shares for media libraries (Immich, audiobookshelf, servarr, navidrome), backup targets (`*-backup/` directories), and app data are served directly from the Proxmox host at `192.168.1.127`. Two NFS export roots exist:
-- **HDD NFS**: `/srv/nfs` on ext4 LV `pve/nfs-data` (3TB) — bulk media and backup targets
+- **HDD NFS**: `/srv/nfs` on ext4 LV `pve/nfs-data` (4TB) — bulk media and backup targets
 - **SSD NFS**: `/srv/nfs-ssd` on ext4 LV `ssd/nfs-ssd-data` (100GB) — high-performance data (Immich ML)
 
 Both `StorageClass: nfs-truenas` and `StorageClass: nfs-proxmox` point to the Proxmox host and are functionally identical. The `nfs-truenas` name is historical — it was retained because StorageClass names are immutable on bound PVs (48 PVs reference it) and renaming would force mass PV churn across the cluster.
@@ -31,7 +31,7 @@ graph TB
     subgraph Proxmox["Proxmox Host (192.168.1.127)"]
         sdc["sdc: 10.7TB RAID1 HDD<br/>VG pve, LV data (thin pool)<br/>~67 proxmox-lvm PVCs<br/>~28 proxmox-lvm-encrypted PVCs"]
         sda["sda: 1.1TB RAID1 SAS<br/>VG backup, LV data (ext4)<br/>/mnt/backup"]
-        NFS_HDD["LV pve/nfs-data (3TB ext4)<br/>/srv/nfs<br/>~100 NFS shares<br/>Media + backup targets"]
+        NFS_HDD["LV pve/nfs-data (4TB ext4)<br/>/srv/nfs<br/>~100 NFS shares<br/>Media + backup targets"]
         NFS_SSD["LV ssd/nfs-ssd-data (100GB ext4)<br/>/srv/nfs-ssd<br/>High-performance data<br/>(Immich ML)"]
         NFS_Exports["NFS Exports<br/>managed by /etc/exports"]
         NFS_HDD --> NFS_Exports
@@ -74,7 +74,7 @@ graph TB
 | **Proxmox CSI plugin** | Helm chart | Namespace: proxmox-csi | Block storage via LVM-thin hotplug |
 | **StorageClass `proxmox-lvm`** | RWO, WaitForFirstConsumer | Cluster-wide | Non-sensitive stateful apps |
 | **StorageClass `proxmox-lvm-encrypted`** | RWO, WaitForFirstConsumer, LUKS2 | Cluster-wide | **All sensitive data** (databases, auth, email, passwords, git) |
-| Proxmox NFS (HDD) | LV `pve/nfs-data`, 3TB ext4 | 192.168.1.127:/srv/nfs | Bulk NFS data for all services |
+| Proxmox NFS (HDD) | LV `pve/nfs-data`, 4TB ext4 | 192.168.1.127:/srv/nfs | Bulk NFS data for all services |
 | Proxmox NFS (SSD) | LV `ssd/nfs-ssd-data`, 100GB ext4 | 192.168.1.127:/srv/nfs-ssd | High-performance data (Immich ML) |
 | nfs-csi | Helm chart | Namespace: nfs-csi | NFS CSI driver |
 | StorageClass `nfs-proxmox` | RWX, soft mount | Cluster-wide | NFS storage, points to Proxmox host |
diff --git a/docs/runbooks/grow-pve-nfs-lv.md b/docs/runbooks/grow-pve-nfs-lv.md
new file mode 100644
index 00000000..156ebe84
--- /dev/null
+++ b/docs/runbooks/grow-pve-nfs-lv.md
@@ -0,0 +1,47 @@
+# Runbook: Grow `/srv/nfs` LV (`pve/nfs-data`)
+
+Use when `/srv/nfs` on the PVE host is filling up and the workloads writing to it cannot be slimmed down. The LV sits on the LVM-thin pool `pve/data` (10.54 TB total). Thin-pool free space is the real gate — confirm before extending.
+
+## When to use
+
+- `df -h /srv/nfs` shows usage > ~85 % and projected growth exceeds free space within a backup retention window.
+- An upcoming bulk write (media import, restore) needs headroom that the current free space won't absorb.
+
+## Steps
+
+1. **Check thin-pool headroom on PVE host:**
+
+   ```bash
+   ssh root@192.168.1.127 'lvs pve/data; lvs pve/nfs-data; df -h /srv/nfs'
+   ```
+
+   Extending a thin LV only raises its virtual size, so the `pve/data` pool's `Data%` does not move at step 2; it climbs as data is actually written. Confirm the pool has room for the data you expect to land (target `Data%` < 90 % after the write).
+
+2. **Extend the LV and online-resize ext4:**
+
+   ```bash
+   ssh root@192.168.1.127 '
+     lvextend -L +1T pve/nfs-data &&
+     resize2fs /dev/pve/nfs-data
+   '
+   ```
+
+   Both commands are safe online: on a thin LV, `lvextend` only raises the virtual size (pool blocks are allocated as data is written), and `resize2fs` grows ext4 while it stays mounted.
+
+3. **Verify:**
+
+   ```bash
+   ssh root@192.168.1.127 'lvs pve/nfs-data; df -h /srv/nfs'
+   ```
+
+   `df` should show the new size; `Use%` should drop proportionally.
+
+## Notes
+
+- **Not Terraform-managed.** PVE host LVs live outside the IaC tree (no `infra/stacks/pve-host/`). Record the new size in `docs/architecture/storage.md` (the "HDD NFS" line, the diagram label, and the components table row) in the same commit.
+- **Thin-pool overcommit warning** from `lvextend` is informational — it reports the sum of all thin volume virtual sizes (currently ~12 TiB) vs. the pool's physical size (10.54 TB). Real fill is `pve/data` `Data%`; ignore the overcommit warning unless `Data%` itself is climbing toward 100 %.
+- **`/srv/nfs-ssd`** lives on a separate LV (`ssd/nfs-ssd-data`) backed by SSDs — the same `lvextend`/`resize2fs` pattern applies, but the source pool is `ssd/data`.
+
+## Backout
+
+There is no online backout: ext4 can only be grown while mounted, so shrinking would mean unmounting `/srv/nfs` (and with it every NFS share) for `e2fsck` + `resize2fs` before any `lvreduce`. Don't shrink `pve/nfs-data` in place; restore from snapshot or migrate data out and rebuild the LV instead.
diff --git a/stacks/immich/main.tf b/stacks/immich/main.tf
index ae9ea310..5a3d2413 100644
--- a/stacks/immich/main.tf
+++ b/stacks/immich/main.tf
@@ -124,6 +124,19 @@ module "nfs_ml_cache_host" {
   nfs_path   = "/srv/nfs-ssd/immich/machine-learning"
 }
 
+# Read-only source for one-shot bulk imports into individual users' accounts
+# (currently: Anca's WD Elements dump, mirrored to /srv/nfs/anca-elements from
+# her Synology). Consumed only by the import Job below — NOT mounted into the
+# immich-server Deployment. PVC stays after the Job is removed so videos can
+# follow in batch 2.
+module "nfs_anca_elements_host" {
+  source     = "../../modules/kubernetes/nfs_volume"
+  name       = "immich-anca-elements-host"
+  namespace  = kubernetes_namespace.immich.metadata[0].name
+  nfs_server = var.proxmox_host
+  nfs_path   = "/srv/nfs/anca-elements"
+}
+
 resource "kubernetes_namespace" "immich" {
   metadata {
     name = "immich"
@@ -865,6 +878,123 @@ resource "kubernetes_cron_job_v1" "postgresql-backup" {
   }
 }
 
+# One-shot bulk import of Anca's WD Elements photo archive into her Immich
+# account. Reads /srv/nfs/anca-elements via the RO PVC above and posts
+# assets to immich-server in-cluster (bypasses ingress + CrowdSec entirely).
+#
+# Auth: Anca's personal Immich API key. Add to Vault `secret/immich` under key
+# `anca_api_key`, then force-refresh the existing `immich-secrets` ExternalSecret:
+#   kubectl annotate externalsecret immich-secrets -n immich \
+#     force-sync=$(date +%s) --overwrite
+#
+# After successful completion: REMOVE this resource block + apply again. The
+# PVC stays for a videos batch later. Filters target a photo-only subset of
+# the dump (image extensions only; installer / doc / course dirs banned);
+# EXIF is preserved end-to-end since immich-go uploads originals byte-for-byte.
+resource "kubernetes_job_v1" "anca_elements_import" {
+  metadata {
+    name      = "anca-elements-import"
+    namespace = kubernetes_namespace.immich.metadata[0].name
+    labels = {
+      app  = "anca-elements-import"
+      tier = local.tiers.gpu
+    }
+  }
+
+  # Don't block `terragrunt apply` on the multi-hour upload — TF returns once
+  # the Job is created; monitor via `kubectl logs -n immich -f job/...`.
+  wait_for_completion = false
+
+  spec {
+    backoff_limit              = 2
+    ttl_seconds_after_finished = 604800
+    template {
+      metadata {
+        labels = {
+          app = "anca-elements-import"
+        }
+      }
+      spec {
+        restart_policy = "OnFailure"
+        container {
+          name  = "immich-go"
+          image = "alpine:3.20"
+          command = [
+            "/bin/sh",
+            "-c",
+            <<-EOT
+              set -eu
+              apk add --no-cache curl tar ca-certificates >/dev/null
+
+              IMMICH_GO_VERSION="v0.31.0"
+              cd /tmp
+              echo "Downloading immich-go $${IMMICH_GO_VERSION}…"
+              curl -fsSL "https://github.com/simulot/immich-go/releases/download/$${IMMICH_GO_VERSION}/immich-go_Linux_x86_64.tar.gz" \
+                | tar -xz
+              chmod +x ./immich-go
+
+              echo "Starting upload from /data → http://immich-server.immich.svc.cluster.local:2283 …"
+              exec ./immich-go upload from-folder /data \
+                --server http://immich-server.immich.svc.cluster.local:2283 \
+                --api-key "$${IMMICH_API_KEY}" \
+                --include-extensions .jpg,.jpeg,.png,.heic,.heif,.gif,.tif,.tiff,.webp,.nef,.cr2,.dng,.raw \
+                --into-album "Poze (Elements)" \
+                --ban-file "filme/" --ban-file "Music/" --ban-file "carti/" \
+                --ban-file "cursuri/" --ban-file "Adobe.*/" \
+                --ban-file "Fullstack Web Development*/" \
+                --ban-file "Contracte and CV/" --ban-file "Cv/" \
+                --ban-file "docum/" --ban-file "finance/" \
+                --ban-file "download/" --ban-file "kit/" \
+                --ban-file "csp/" --ban-file "KOREAN/" \
+                --ban-file "System Volume Information/" \
+                --pause-immich-jobs=false \
+                --concurrent-tasks 8 \
+                --client-timeout 1h \
+                --no-ui \
+                --on-errors continue
+            EOT
+          ]
+          env {
+            name = "IMMICH_API_KEY"
+            value_from {
+              secret_key_ref {
+                name = "immich-secrets"
+                key  = "anca_api_key"
+              }
+            }
+          }
+          volume_mount {
+            name       = "anca-elements"
+            mount_path = "/data"
+            read_only  = true
+          }
+          resources {
+            requests = {
+              cpu    = "500m"
+              memory = "1Gi"
+            }
+            limits = {
+              memory = "1Gi"
+            }
+          }
+        }
+        volume {
+          name = "anca-elements"
+          persistent_volume_claim {
+            claim_name = module.nfs_anca_elements_host.claim_name
+            read_only  = true
+          }
+        }
+      }
+    }
+  }
+  lifecycle {
+    # KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
+    ignore_changes = [spec[0].template[0].spec[0].dns_config]
+  }
+  depends_on = [kubernetes_manifest.external_secret]
+}
+
 # POWER TOOLS
 # resource "kubernetes_deployment" "powertools" {