diff --git a/docs/architecture/storage.md b/docs/architecture/storage.md index 22ddf9a0..856c2d2d 100644 --- a/docs/architecture/storage.md +++ b/docs/architecture/storage.md @@ -1,6 +1,6 @@ # Storage Architecture -Last updated: 2026-05-09 +Last updated: 2026-05-24 ## Overview @@ -13,7 +13,7 @@ The cluster uses two storage backends: **Proxmox CSI** for database block storag All services storing sensitive data were migrated to `proxmox-lvm-encrypted` on 2026-04-15. This eliminates the previous double-CoW (ZFS + LVM-thin) path and ensures data-at-rest encryption. **NFS storage (Proxmox host)**: ~100 NFS shares for media libraries (Immich, audiobookshelf, servarr, navidrome), backup targets (`*-backup/` directories), and app data are served directly from the Proxmox host at `192.168.1.127`. Two NFS export roots exist: -- **HDD NFS**: `/srv/nfs` on ext4 LV `pve/nfs-data` (3TB) — bulk media and backup targets +- **HDD NFS**: `/srv/nfs` on ext4 LV `pve/nfs-data` (4TB) — bulk media and backup targets - **SSD NFS**: `/srv/nfs-ssd` on ext4 LV `ssd/nfs-ssd-data` (100GB) — high-performance data (Immich ML) Both `StorageClass: nfs-truenas` and `StorageClass: nfs-proxmox` point to the Proxmox host and are functionally identical. The `nfs-truenas` name is historical — it was retained because StorageClass names are immutable on bound PVs (48 PVs reference it) and renaming would force mass PV churn across the cluster. @@ -31,7 +31,7 @@ graph TB subgraph Proxmox["Proxmox Host (192.168.1.127)"] sdc["sdc: 10.7TB RAID1 HDD
VG pve, LV data (thin pool)
~67 proxmox-lvm PVCs
~28 proxmox-lvm-encrypted PVCs"] sda["sda: 1.1TB RAID1 SAS
VG backup, LV data (ext4)
/mnt/backup"] - NFS_HDD["LV pve/nfs-data (3TB ext4)
/srv/nfs
~100 NFS shares
Media + backup targets"] + NFS_HDD["LV pve/nfs-data (4TB ext4)
/srv/nfs
~100 NFS shares
Media + backup targets"] NFS_SSD["LV ssd/nfs-ssd-data (100GB ext4)
/srv/nfs-ssd
High-performance data
(Immich ML)"] NFS_Exports["NFS Exports
managed by /etc/exports"] NFS_HDD --> NFS_Exports @@ -74,7 +74,7 @@ graph TB | **Proxmox CSI plugin** | Helm chart | Namespace: proxmox-csi | Block storage via LVM-thin hotplug | | **StorageClass `proxmox-lvm`** | RWO, WaitForFirstConsumer | Cluster-wide | Non-sensitive stateful apps | | **StorageClass `proxmox-lvm-encrypted`** | RWO, WaitForFirstConsumer, LUKS2 | Cluster-wide | **All sensitive data** (databases, auth, email, passwords, git) | -| Proxmox NFS (HDD) | LV `pve/nfs-data`, 3TB ext4 | 192.168.1.127:/srv/nfs | Bulk NFS data for all services | +| Proxmox NFS (HDD) | LV `pve/nfs-data`, 4TB ext4 | 192.168.1.127:/srv/nfs | Bulk NFS data for all services | | Proxmox NFS (SSD) | LV `ssd/nfs-ssd-data`, 100GB ext4 | 192.168.1.127:/srv/nfs-ssd | High-performance data (Immich ML) | | nfs-csi | Helm chart | Namespace: nfs-csi | NFS CSI driver | | StorageClass `nfs-proxmox` | RWX, soft mount | Cluster-wide | NFS storage, points to Proxmox host | diff --git a/docs/runbooks/grow-pve-nfs-lv.md b/docs/runbooks/grow-pve-nfs-lv.md new file mode 100644 index 00000000..156ebe84 --- /dev/null +++ b/docs/runbooks/grow-pve-nfs-lv.md @@ -0,0 +1,47 @@ +# Runbook: Grow `/srv/nfs` LV (`pve/nfs-data`) + +Use when `/srv/nfs` on the PVE host is filling up and the workloads writing to it cannot be slimmed down. The LV sits on the LVM-thin pool `pve/data` (10.54 TB total). Thin-pool free space is the real gate — confirm before extending. + +## When to use + +- `df -h /srv/nfs` shows usage > ~85 % and projected growth exceeds free space within a backup retention window. +- An upcoming bulk write (media import, restore) needs headroom that the current free space won't absorb. + +## Steps + +1. **Check thin-pool headroom on PVE host:** + + ```bash + ssh root@192.168.1.127 'lvs pve/data; lvs pve/nfs-data; df -h /srv/nfs' + ``` + + The `pve/data` thin pool's `Data%` should leave room for the extension (target `Data%` after extend < 90 %). + +2. **Extend the LV and online-resize ext4:** + + ```bash + ssh root@192.168.1.127 ' + lvextend -L +1T pve/nfs-data && + resize2fs /dev/pve/nfs-data + ' + ``` + + Both commands are safe online: `lvextend` only grows allocation, `resize2fs` extends ext4 while mounted. + +3. **Verify:** + + ```bash + ssh root@192.168.1.127 'lvs pve/nfs-data; df -h /srv/nfs' + ``` + + `df` should show the new size; `Use%` should drop proportionally. + +## Notes + +- **Not Terraform-managed.** PVE host LVs live outside the IaC tree (no `infra/stacks/pve-host/`). Record the new size in `docs/architecture/storage.md` (the "HDD NFS" line and the diagram label) in the same commit. +- **Thin-pool overcommit warning** from `lvextend` is informational — it reports the sum of all thin volume virtual sizes (currently ~12 TiB) vs. the physical pool (10.7 TiB). Real fill is `pve/data` `Data%`; ignore the overcommit warning unless `Data%` itself is climbing toward 100 %. +- **`/srv/nfs-ssd`** lives on a separate LV (`ssd/nfs-ssd-data`) backed by SSDs — the same `lvextend`/`resize2fs` pattern applies, but the source pool is `ssd/data`. + +## Backout + +Online shrinks are unsafe with active workloads. Don't try to shrink `pve/nfs-data` in place — restore from snapshot or migrate data out and rebuild the LV instead. diff --git a/scripts/anca-elements-mirror.service b/scripts/anca-elements-mirror.service deleted file mode 100644 index db1bf270..00000000 --- a/scripts/anca-elements-mirror.service +++ /dev/null @@ -1,15 +0,0 @@ -[Unit] -Description=Mirror /srv/nfs/anca-elements to /mnt/backup (single-disk-failure protection) -After=network-online.target local-fs.target -Wants=network-online.target - -[Service] -Type=oneshot -ExecStart=/usr/local/bin/anca-elements-mirror -StandardOutput=journal -StandardError=journal -SyslogIdentifier=anca-elements-mirror -# Big sustained IO — don't compete with foreground services. -Nice=10 -IOSchedulingClass=idle -TimeoutStartSec=18000 diff --git a/scripts/anca-elements-mirror.sh b/scripts/anca-elements-mirror.sh deleted file mode 100644 index 4ce61ca2..00000000 --- a/scripts/anca-elements-mirror.sh +++ /dev/null @@ -1,82 +0,0 @@ -#!/usr/bin/env bash -# anca-elements-mirror — single-disk-failure mirror of /srv/nfs/anca-elements → /mnt/backup -# -# Deploy to PVE host at /usr/local/bin/anca-elements-mirror. -# Schedule: weekly Mon 04:00 via systemd timer (anca-elements-mirror.timer). -# -# WHY: /srv/nfs/anca-elements lives on the sdc thin pool. Synology no longer -# holds the original (deleted after this mirror was verified). sda /mnt/backup -# is the only other local disk with room (~770G) — this gives us a single- -# disk-failure copy. No offsite for this archive (intentional, see backup-dr.md). -# -# Idempotent: `rsync -aH --delete` makes destination match source exactly. -# Re-runs only transfer changed files. - -set -euo pipefail - -SRC=/srv/nfs/anca-elements -DST=/mnt/backup/anca-elements -LOG=/var/log/anca-elements-mirror.log -LOCKFILE=/run/anca-elements-mirror.lock -PUSHGATEWAY="${ANCA_MIRROR_PUSHGATEWAY:-http://10.0.20.100:30091}" -PUSHGATEWAY_JOB=anca-elements-mirror - -log() { echo "[$(date -u '+%Y-%m-%dT%H:%M:%SZ')] $*" | tee -a "$LOG"; } -warn() { log "WARN: $*"; } - -push_metrics() { - local status="${1:-0}" bytes="${2:-0}" - cat </dev/null || true -anca_elements_mirror_last_run_timestamp $(date +%s) -anca_elements_mirror_last_status ${status} -anca_elements_mirror_bytes ${bytes} -EOF -} - -KILLED="" -cleanup() { - rm -f "$LOCKFILE" - if [ -n "$KILLED" ]; then - push_metrics 2 0 # status=2 → aborted (matches lvm-pvc-snapshot convention) - fi -} -trap cleanup EXIT -trap 'KILLED=1; exit 143' TERM INT - -if ! ( set -o noclobber; echo $$ > "$LOCKFILE" ) 2>/dev/null; then - log "FATAL: another instance running (pid $(cat "$LOCKFILE" 2>/dev/null || echo unknown))" - exit 1 -fi - -mountpoint -q /mnt/backup || { log "FATAL: /mnt/backup not mounted"; push_metrics 1 0; exit 1; } -[ -d "$SRC" ] || { log "FATAL: source $SRC missing"; push_metrics 1 0; exit 1; } - -mkdir -p "$DST" - -log "=== mirror starting: $SRC → $DST ===" -SRC_SIZE_GB=$(du -sBG "$SRC" 2>/dev/null | awk '{print $1}') -log "source size: $SRC_SIZE_GB" - -# -aH preserves hardlinks (probably none here, cheap insurance). -# --info=stats2 emits a final transfer summary into the log. -# --no-perms / --no-owner / --no-group: source has root:www-data 2775 and -# we don't need to perfectly preserve those on the mirror copy — dest will -# inherit /mnt/backup's defaults. (Symmetric with anca-elements-sync.sh's -# choice when copying FROM Synology.) -RSYNC_RC=0 -rsync \ - -rlt --delete -H \ - --no-perms --no-owner --no-group \ - --info=stats2 \ - "$SRC/" "$DST/" 2>&1 | tee -a "$LOG" || RSYNC_RC=${PIPESTATUS[0]} - -DST_BYTES=$(du -sb "$DST" 2>/dev/null | awk '{print $1}') - -if [ "$RSYNC_RC" -eq 0 ]; then - log "=== mirror complete; dest size: $(du -sh "$DST" | cut -f1) ===" - push_metrics 0 "$DST_BYTES" -else - log "=== mirror failed: rsync exited $RSYNC_RC ===" - push_metrics 1 "$DST_BYTES" - exit "$RSYNC_RC" -fi diff --git a/scripts/anca-elements-mirror.timer b/scripts/anca-elements-mirror.timer deleted file mode 100644 index 642a7773..00000000 --- a/scripts/anca-elements-mirror.timer +++ /dev/null @@ -1,10 +0,0 @@ -[Unit] -Description=Weekly anca-elements mirror to /mnt/backup - -[Timer] -OnCalendar=Mon *-*-* 04:00:00 -Persistent=true -RandomizedDelaySec=15min - -[Install] -WantedBy=timers.target diff --git a/scripts/anca-elements-sync.sh b/scripts/anca-elements-sync.sh deleted file mode 100755 index e3fa14f7..00000000 --- a/scripts/anca-elements-sync.sh +++ /dev/null @@ -1,112 +0,0 @@ -#!/usr/bin/env bash -# anca-elements-sync.sh — copy Anca's WD-Elements backup from Synology to PVE NFS -# -# Usage: -# /usr/local/bin/anca-elements-sync.sh -# -# Idempotent: re-running after a successful sync is a no-op (only the dry-run -# verification runs, which reports "sync verified clean" immediately). -# -# Resumable: if fpsync was interrupted, resume with: -# fpsync -r /var/tmp/fpsync \ -# -n 4 -s 4G \ -# -o "-lptgoD -H --no-perms --no-owner --no-group --exclude=@eaDir/ --exclude=*@synoeastream --exclude=.DS_Store --exclude=Thumbs.db" \ -# /mnt/synology-backup/Anca/Elements/ /srv/nfs/anca-elements/ -# -# NOTE: fpsync -o = rsync options override (what we want) -# fpsync -O = fpart partition options override (NOT rsync) -# NOTE: Do NOT use -a or -r in fpsync rsync options — fpsync handles -# recursion via fpart; -r causes fpsync to warn and skip the slab. -# -# Log: /var/log/anca-elements-sync.log - -set -euo pipefail - -LOG=/var/log/anca-elements-sync.log -SRC_HOST=192.168.1.13 -SRC_EXPORT=/volume1/Backup -SRC_SUBPATH=Anca/Elements -MOUNT_POINT=/mnt/synology-backup -DEST=/srv/nfs/anca-elements - -log() { - echo "[$(date -u '+%Y-%m-%dT%H:%M:%SZ')] $*" | tee -a "$LOG" -} - -# ── 1. Ensure destination + mount-point directories exist ──────────────────── -log "Step 1: ensuring directories" -mkdir -p "$DEST" "$MOUNT_POINT" - -# ── 2. NFS-mount Synology read-only (skip if already mounted) ─────────────── -MOUNTED_HERE=0 -if mountpoint -q "$MOUNT_POINT"; then - log "Step 2: $MOUNT_POINT already mounted — skipping" -else - log "Step 2: mounting ${SRC_HOST}:${SRC_EXPORT} at $MOUNT_POINT (read-only)" - mount -t nfs \ - -o ro,vers=4,nolock,soft,timeo=300,retrans=2 \ - "${SRC_HOST}:${SRC_EXPORT}" \ - "$MOUNT_POINT" - MOUNTED_HERE=1 - log "Step 2: mount successful" -fi - -# ── 3. Ensure fpsync (from fpart package) is available ────────────────────── -log "Step 3: checking for fpsync" -if ! command -v fpsync >/dev/null 2>&1; then - log "Step 3: fpsync not found — installing fpart" - apt-get install -y fpart - log "Step 3: fpart installed" -else - log "Step 3: fpsync already available" -fi - -# ── 4. Run fpsync (4-way parallel, no compression — source is already-compressed media) ── -log "Step 4: starting fpsync" -log " source : ${MOUNT_POINT}/${SRC_SUBPATH}/" -log " dest : ${DEST}/" -log " workers: 4, slab: 4G" -fpsync \ - -n 4 \ - -s 4G \ - -o "-lptgoD -H --no-perms --no-owner --no-group --exclude=@eaDir/ --exclude=*@synoeastream --exclude=.DS_Store --exclude=Thumbs.db" \ - "${MOUNT_POINT}/${SRC_SUBPATH}/" \ - "${DEST}/" \ - 2>&1 | tee -a "$LOG" -log "Step 4: fpsync completed" - -# ── 5. Verification dry-run ────────────────────────────────────────────────── -log "Step 5: running dry-run verification rsync" -VERIFY_OUT=$(rsync \ - -rlptgoD -H --no-perms --no-owner --no-group \ - --exclude='@eaDir/' --exclude='*@synoeastream' \ - --exclude='.DS_Store' --exclude='Thumbs.db' \ - -n --delete \ - --info=progress2 \ - --out-format='%o %f' \ - "${MOUNT_POINT}/${SRC_SUBPATH}/" \ - "${DEST}/" \ - 2>&1 || true) - -# Count lines that represent actual file changes (send / del. operations) -CHANGE_COUNT=$(echo "$VERIFY_OUT" | grep -cE '^(send|del\.)' || true) - -if [ "$CHANGE_COUNT" -eq 0 ]; then - log "Step 5: sync verified clean — no pending changes" -else - log "Step 5: WARNING — verification found ${CHANGE_COUNT} pending change(s). First 50 lines:" - # Use printf to avoid SIGPIPE from head closing the pipe early (set -o pipefail) - { echo "$VERIFY_OUT" | head -50; } >> "$LOG" 2>&1 || true -fi - -# ── 6. Unmount (only if we mounted it) ────────────────────────────────────── -if [ "$MOUNTED_HERE" -eq 1 ]; then - log "Step 6: unmounting $MOUNT_POINT" - umount "$MOUNT_POINT" - rmdir "$MOUNT_POINT" - log "Step 6: unmounted" -else - log "Step 6: mount was pre-existing — leaving in place" -fi - -log "Done. Final size: $(du -sh "${DEST}" | cut -f1)" diff --git a/stacks/immich/main.tf b/stacks/immich/main.tf index ae9ea310..5a3d2413 100644 --- a/stacks/immich/main.tf +++ b/stacks/immich/main.tf @@ -124,6 +124,19 @@ module "nfs_ml_cache_host" { nfs_path = "/srv/nfs-ssd/immich/machine-learning" } +# Read-only source for one-shot bulk imports into individual users' accounts +# (currently: Anca's WD Elements dump, mirrored to /srv/nfs/anca-elements from +# her Synology). Consumed only by the import Job below — NOT mounted into the +# immich-server Deployment. PVC stays after the Job is removed so videos can +# follow in batch 2. +module "nfs_anca_elements_host" { + source = "../../modules/kubernetes/nfs_volume" + name = "immich-anca-elements-host" + namespace = kubernetes_namespace.immich.metadata[0].name + nfs_server = var.proxmox_host + nfs_path = "/srv/nfs/anca-elements" +} + resource "kubernetes_namespace" "immich" { metadata { name = "immich" @@ -865,6 +878,123 @@ resource "kubernetes_cron_job_v1" "postgresql-backup" { } } +# One-shot bulk import of Anca's Synology Elements photo archive into her +# Immich account. Reads /srv/nfs/anca-elements via the RO PVC above and posts +# assets to immich-server in-cluster (bypasses ingress + CrowdSec entirely). +# +# Auth: Anca's personal Immich API key. Add to Vault `secret/immich` under key +# `anca_api_key`, then force-refresh the existing `immich-secrets` ExternalSecret: +# kubectl annotate externalsecret immich-secrets -n immich \ +# force-sync=$(date +%s) --overwrite +# +# After successful completion: REMOVE this resource block + apply again. The +# PVC stays for a videos batch later. Filters target a photo-only subset of +# the dump (videos / installers / docs / courses banned); EXIF is preserved +# end-to-end since immich-go uploads originals byte-for-byte. +resource "kubernetes_job_v1" "anca_elements_import" { + metadata { + name = "anca-elements-import" + namespace = kubernetes_namespace.immich.metadata[0].name + labels = { + app = "anca-elements-import" + tier = local.tiers.gpu + } + } + + # Don't block `terragrunt apply` on the multi-hour upload — TF returns once + # the Job is created; monitor via `kubectl logs -n immich -f job/...`. + wait_for_completion = false + + spec { + backoff_limit = 2 + ttl_seconds_after_finished = 604800 + template { + metadata { + labels = { + app = "anca-elements-import" + } + } + spec { + restart_policy = "OnFailure" + container { + name = "immich-go" + image = "alpine:3.20" + command = [ + "/bin/sh", + "-c", + <<-EOT + set -eu + apk add --no-cache curl tar ca-certificates >/dev/null + + IMMICH_GO_VERSION="v0.31.0" + cd /tmp + echo "Downloading immich-go $${IMMICH_GO_VERSION}…" + curl -sL "https://github.com/simulot/immich-go/releases/download/$${IMMICH_GO_VERSION}/immich-go_Linux_x86_64.tar.gz" \ + | tar -xz + chmod +x ./immich-go + + echo "Starting upload from /data → http://immich-server.immich.svc.cluster.local:2283 …" + exec ./immich-go upload from-folder /data \ + --server http://immich-server.immich.svc.cluster.local:2283 \ + --api-key "$${IMMICH_API_KEY}" \ + --include-extensions .jpg,.jpeg,.png,.heic,.heif,.gif,.tif,.tiff,.webp,.nef,.cr2,.dng,.raw \ + --into-album "Poze (Elements)" \ + --ban-file "filme/" --ban-file "Music/" --ban-file "carti/" \ + --ban-file "cursuri/" --ban-file "Adobe.*/" \ + --ban-file "Fullstack Web Development*/" \ + --ban-file "Contracte and CV/" --ban-file "Cv/" \ + --ban-file "docum/" --ban-file "finance/" \ + --ban-file "download/" --ban-file "kit/" \ + --ban-file "csp/" --ban-file "KOREAN/" \ + --ban-file "System Volume Information/" \ + --pause-immich-jobs=false \ + --concurrent-tasks 8 \ + --client-timeout 1h \ + --no-ui \ + --on-errors continue + EOT + ] + env { + name = "IMMICH_API_KEY" + value_from { + secret_key_ref { + name = "immich-secrets" + key = "anca_api_key" + } + } + } + volume_mount { + name = "anca-elements" + mount_path = "/data" + read_only = true + } + resources { + requests = { + cpu = "500m" + memory = "1Gi" + } + limits = { + memory = "1Gi" + } + } + } + volume { + name = "anca-elements" + persistent_volume_claim { + claim_name = module.nfs_anca_elements_host.claim_name + read_only = true + } + } + } + } + } + lifecycle { + # KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2 + ignore_changes = [spec[0].template[0].spec[0].dns_config] + } + depends_on = [kubernetes_manifest.external_secret] +} + # POWER TOOLS # resource "kubernetes_deployment" "powertools" {