
Storage Architecture

Last updated: 2026-04-03

Overview

The cluster uses two storage backends: Proxmox CSI for database block storage and TrueNAS NFS for application data.

Block storage (Proxmox CSI): 13 PVCs for databases (CNPG PostgreSQL, MySQL InnoDB, Redis, Vaultwarden, Prometheus, Nextcloud, Calibre-Web) use StorageClass: proxmox-lvm, which provisions thin LVs directly from the Proxmox host's local-lvm storage. This eliminates the previous double-CoW (ZFS + LVM-thin) path that caused 56 ZFS checksum errors.

NFS storage (TrueNAS): ~100 NFS shares for application data, media, configs, and backup targets continue to use TrueNAS ZFS at 10.0.10.15 via StorageClass: nfs-truenas.

Migration (2026-04-02): All iSCSI block volumes were migrated from democratic-csi (TrueNAS iSCSI → ZFS → LVM-thin) to Proxmox CSI (direct LVM-thin hotplug). The democratic-csi iSCSI driver is deprecated and pending removal.

Architecture Diagram

graph TB
    subgraph TrueNAS["TrueNAS (10.0.10.15)<br/>VMID 9000, 16c/16GB"]
        ZFS_Main["ZFS Pool: main<br/>1.64 TiB<br/>32G + 7x256G + 1T disks"]
        ZFS_SSD["ZFS Pool: ssd<br/>~256GB SSD<br/>Immich ML, PostgreSQL hot data"]

        ZFS_Main --> NFS_Datasets["NFS Datasets<br/>~100 shares<br/>main/&lt;service&gt;"]
        ZFS_Main --> iSCSI_Datasets["iSCSI Datasets<br/>main/iscsi (zvols)<br/>main/iscsi-snaps"]

        NFS_Datasets --> NFS_Exports["NFS Exports<br/>managed by secrets/nfs_exports.sh"]
        iSCSI_Datasets --> iSCSI_Targets["iSCSI Targets<br/>SSH-managed via democratic-csi"]

        ZFS_SSD --> SSD_Data["Immich ML models<br/>PostgreSQL CNPG"]
    end

    subgraph K8s["Kubernetes Cluster"]
        CSI_NFS["democratic-csi-nfs<br/>StorageClass: nfs-truenas<br/>soft,timeo=30,retrans=3"]
        CSI_iSCSI["democratic-csi-iscsi (DEPRECATED)<br/>StorageClass: iscsi-truenas<br/>SSH driver"]

        NFS_PV["NFS PersistentVolumes<br/>RWX, ~100 volumes"]
        iSCSI_PV["iSCSI PersistentVolumes<br/>RWO, ~19 volumes"]

        Pods["Application Pods"]
        DBPods["Database Pods<br/>PostgreSQL CNPG<br/>MySQL InnoDB"]
    end

    NFS_Exports -->|CSI driver| CSI_NFS
    iSCSI_Targets -->|CSI driver| CSI_iSCSI

    CSI_NFS --> NFS_PV
    CSI_iSCSI --> iSCSI_PV

    NFS_PV --> Pods
    iSCSI_PV --> DBPods

    style TrueNAS fill:#e1f5ff
    style K8s fill:#fff4e1
    style ZFS_Main fill:#c8e6c9
    style ZFS_SSD fill:#ffe0b2

Components

| Component | Version/Config | Location | Purpose |
|---|---|---|---|
| Proxmox CSI plugin | Helm chart | Namespace: proxmox-csi | Block storage via LVM-thin hotplug |
| StorageClass proxmox-lvm | RWO, WaitForFirstConsumer | Cluster-wide | Databases and stateful apps |
| TrueNAS VM | VMID 9000, 16c/16GB | Proxmox host (10.0.10.15) | ZFS NFS storage server |
| ZFS pool main | 1.64 TiB usable | 32G + 7x256G + 1T disks | NFS data for all services |
| ZFS pool ssd | ~256GB SSD | Dedicated SSD | High-performance data (Immich ML) |
| nfs-csi | Helm chart | Namespace: nfs-csi | NFS CSI driver |
| StorageClass nfs-truenas | RWX, soft mount | Cluster-wide | Default storage for apps |
| democratic-csi-iscsi | DEPRECATED | Namespace: iscsi-csi | Replaced by Proxmox CSI (2026-04-02) |
| StorageClass iscsi-truenas | DEPRECATED | Cluster-wide | Replaced by proxmox-lvm |
| TF module nfs_volume | modules/kubernetes/nfs_volume/ | Infra repo | NFS PV/PVC factory |

How It Works

NFS Storage Flow

  1. Dataset creation: NFS shares are created as ZFS datasets under main/<service> (e.g., main/immich, main/nextcloud)
  2. Export configuration: /root/secrets/nfs_exports.sh on TrueNAS generates /etc/exports with per-dataset exports (/mnt/main/<service>)
  3. CSI provisioning: democratic-csi-nfs mounts NFS shares and creates K8s PersistentVolumes
  4. Terraform module: Stacks use modules/kubernetes/nfs_volume/ to declaratively create PV + PVC pairs:
    module "nfs_data" {
      source     = "../../modules/kubernetes/nfs_volume"
      name       = "immich-data"
      namespace  = kubernetes_namespace.immich.metadata[0].name
      nfs_server = var.nfs_server  # 10.0.10.15
      nfs_path   = "/mnt/main/immich"
    }
    
  5. Pod mount: Applications reference PVCs in their deployment specs
  6. Mount options: All NFS mounts use soft,timeo=30,retrans=3 (set in StorageClass) to prevent indefinite hangs

CRITICAL: Never use inline nfs {} blocks in pod specs. They default to hard,timeo=600, which retries indefinitely and can leave pods hanging for many minutes when the server is unreachable. Always use the nfs-truenas StorageClass via PVCs, as shown below.
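
For illustration, the safe pattern in a pod spec is to reference a claim rather than an inline NFS volume (the claim name below is hypothetical):

# Anti-pattern: inline NFS volume (inherits hard-mount defaults)
# volumes:
#   - name: data
#     nfs:
#       server: 10.0.10.15
#       path: /mnt/main/<service>

# Preferred: PVC backed by the nfs-truenas StorageClass
volumes:
  - name: data
    persistentVolumeClaim:
      claimName: immich-data   # hypothetical claim name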

Block Storage Flow (Proxmox CSI) — NEW

  1. PVC creation: Pod requests a PVC with storageClass: proxmox-lvm
  2. CSI provisioning: Proxmox CSI plugin calls the Proxmox API to create a thin LV in the local-lvm storage
  3. SCSI hotplug: The thin LV is hotplugged as a VirtIO-SCSI disk directly into the K8s node VM
  4. Filesystem: CSI formats the disk as ext4 and mounts it into the pod
  5. Exclusive access: RWO only — disk is attached to one VM at a time
  6. Topology: Nodes are labeled with topology.kubernetes.io/region=pve and zone=pve for scheduling

Key advantage: Single CoW layer (LVM-thin only). No ZFS, no iSCSI network hop, no double-CoW corruption.

Proxmox API token: csi@pve!csi-token with CSI role (VM.Audit VM.Config.Disk Datastore.Allocate Datastore.AllocateSpace Datastore.Audit). Stored in Vault at secret/viktor.
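
As a sketch of step 1 above (the PVC name and size are illustrative, not taken from any stack):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-db-data        # illustrative name
spec:
  accessModes:
    - ReadWriteOnce            # RWO only; the LV attaches to a single node VM
  storageClassName: proxmox-lvm
  resources:
    requests:
      storage: 2Gi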

iSCSI Storage Flow (DEPRECATED — replaced 2026-04-02)

This section is historical. All iSCSI PVCs have been migrated to Proxmox CSI (proxmox-lvm). The democratic-csi iSCSI driver is pending removal.

  1. Zvol creation: democratic-csi creates ZFS zvols under main/iscsi/<pvc-name> via SSH commands
  2. Target setup: TrueNAS iSCSI service exposes zvols as iSCSI LUNs
  3. Initiator connection: K8s nodes connect via open-iscsi

SQLite on NFS — Why It Fails

SQLite uses fsync() to guarantee durability. NFS's soft mount + async semantics break this:

  • Soft mount returns success even if data is still in client cache
  • Network blips during fsync → incomplete writes → corruption
  • WAL mode helps but doesn't eliminate the race

Solution: Use Proxmox CSI (proxmox-lvm) for any SQLite database (Vaultwarden, plotting-book), or a local disk for ephemeral data.

Democratic-CSI Sidecar Resources

The Helm chart spawns 17 sidecar containers (driver-registrar, external-provisioner, etc.) across controller + node DaemonSet pods. Each sidecar defaults to resources: {}, which gets LimitRange defaults of 256Mi.

Fix: Set explicit resources in values.yaml:

csiProxy:  # TOP-LEVEL key, not nested
  resources:
    requests:
      memory: "32Mi"
    limits:
      memory: "32Mi"

controller:
  externalProvisioner:
    resources:
      requests: {memory: "64Mi"}
      limits: {memory: "64Mi"}
  # ... repeat for all sidecars

Total footprint: ~1.5Gi → ~400Mi.

Configuration

Key Files

| Path | Purpose |
|---|---|
| /root/secrets/nfs_exports.sh | TrueNAS: generates /etc/exports with all service shares |
| stacks/proxmox-csi/ | Terraform stack for Proxmox CSI plugin + StorageClass |
| stacks/iscsi-csi/ | DEPRECATED — democratic-csi iSCSI driver (pending removal) |
| stacks/nfs-csi/ | NFS CSI driver |
| modules/kubernetes/nfs_volume/ | Reusable module for NFS PV/PVC creation |
| config.tfvars | Variable nfs_server = "10.0.10.15" shared by all stacks |

Vault Paths

| Path | Contents |
|---|---|
| secret/viktor/truenas_ssh_key | SSH private key for democratic-csi SSH driver |
| secret/viktor/truenas_root_password | TrueNAS root password (web UI access) |

Terraform Stacks

  • stacks/proxmox-csi/: Deploys Proxmox CSI plugin + proxmox-lvm StorageClass + node topology labels
  • stacks/nfs-csi/: Deploys NFS CSI driver for TrueNAS
  • stacks/iscsi-csi/: Deploys democratic-csi iSCSI driver (DEPRECATED, pending removal)
  • All application stacks reference NFS volumes via module "nfs_<name>" calls
  • Database PVCs use storageClass: proxmox-lvm (CNPG, MySQL Helm VCT, Redis Helm, standalone PVCs)
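
For example, a CNPG cluster picks up the class through its storage stanza (a minimal sketch; the cluster name, instance count, and size are illustrative):

apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: example-pg             # illustrative name
spec:
  instances: 3
  storage:
    storageClass: proxmox-lvm  # provisions a thin LV per instance
    size: 10Gi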

NFS Export Management

NFS exports are NOT managed by Terraform. To add a new service:

  1. SSH to TrueNAS: ssh root@10.0.10.15
  2. Edit /root/secrets/nfs_exports.sh
  3. Add dataset + export entry:
    create_nfs_export "main/<service>" "/mnt/main/<service>"
    
  4. Run the script: /root/secrets/nfs_exports.sh
  5. Verify: showmount -e 10.0.10.15

Decisions & Rationale

Why NFS for Most Workloads?

  • Simplicity: No volume provisioning delays, instant mounts
  • RWX support: Multiple pods can share one volume (Nextcloud, Immich)
  • ZFS benefits: Snapshots, compression, dedup all work at dataset level
  • Good enough: For SQLite on NFS specifically, we accept the risk for low-value data (logs, caches) but mandate block storage (proxmox-lvm) for critical DBs

Why Block Storage for Databases?

  • ACID guarantees: Block device + local filesystem = real fsync
  • Performance: No NFS protocol overhead for random I/O
  • Tested: PostgreSQL CNPG and MySQL InnoDB Cluster ran on iSCSI block storage for 2+ years without database-level corruption, and now run on proxmox-lvm

Why SSH Driver Over API?

The democratic-csi API driver (driver: freenas-api-iscsi) has these issues:

  • Requires TrueNAS API credentials in plaintext ConfigMap
  • Fails silently when API schema changes between TrueNAS versions
  • No retry logic on transient API errors

SSH driver (driver: freenas-ssh) is simpler:

  • Direct zfs commands, no API translation layer
  • SSH key auth (Vault-managed)
  • Deterministic error messages

Why Soft Mount for NFS?

Hard mounts with the default timeo=600 (60 s per attempt, retried indefinitely) cause:

  • 10-minute pod startup delays if NFS server is unreachable
  • kubectl delete pod hangs for 10 minutes
  • Kernel task hangs blocking node operations

Soft mount (soft,timeo=30,retrans=3) trades availability for responsiveness:

  • Bounded hangs: timeo=30 (3 s per attempt) with retrans=3 means requests fail after a few short retries instead of blocking indefinitely
  • Operations return EIO after timeout → app can handle error
  • Acceptable for non-critical data paths
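
These options are pinned on the StorageClass via mountOptions, roughly as follows (a sketch; the provisioner string is a placeholder, not the cluster's actual driver name):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-truenas
provisioner: org.democratic-csi.nfs   # placeholder driver name
mountOptions:
  - soft
  - timeo=30      # 3 s per attempt (value is in deciseconds)
  - retrans=3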

Critical paths: Databases use block storage (proxmox-lvm), not NFS, so soft mount never affects data integrity.

Troubleshooting

NFS Mount Hangs

Symptom: Pod stuck in ContainerCreating, df -h hangs on NFS mount

Diagnosis:

# On K8s node
mount | grep nfs
showmount -e 10.0.10.15

# Check NFS server
ssh root@10.0.10.15
zfs list | grep main/<service>
cat /etc/exports | grep <service>

Fix:

  1. Verify dataset exists: zfs list main/<service>
  2. Verify export: grep <service> /etc/exports
  3. If missing: re-run /root/secrets/nfs_exports.sh
  4. Restart NFS server: service nfs-server restart

iSCSI Session Drops

Symptom: PostgreSQL/MySQL pod restarts, iSCSI reconnection loops (legacy: applies only to any remaining democratic-csi iSCSI volumes)

Diagnosis:

# On K8s node
iscsiadm -m session
dmesg | grep iscsi
journalctl -u iscsid -f

Fix:

  1. Check TrueNAS iSCSI service: WebUI → Sharing → iSCSI → Targets
  2. Verify hardened timeouts: iscsiadm -m node -o show | grep timeout
  3. If defaults: re-apply cloud-init or manually update /etc/iscsi/iscsid.conf
  4. Restart session:
    iscsiadm -m node -u
    iscsiadm -m node -l
    

Democratic-CSI Sidecar OOMKill

Symptom: kubectl describe pod shows sidecar containers OOMKilled

Diagnosis:

kubectl get events -n democratic-csi | grep OOM
kubectl top pod -n democratic-csi

Fix:

  1. Set explicit resources in Helm values (see "Democratic-CSI Sidecar Resources" above)
  2. Apply: terragrunt apply in stacks/democratic-csi/

SQLite Corruption on NFS

Symptom: database disk image is malformed, checksum errors

Diagnosis:

# In pod
sqlite3 /data/db.sqlite "PRAGMA integrity_check;"

Fix: Migrate to block storage (proxmox-lvm)

  1. Create a proxmox-lvm PVC in the Terraform stack
  2. Restore from backup to the new volume
  3. Update the deployment to use the new PVC
  4. Delete the old NFS PVC

Slow NFS Performance

Symptom: High latency on file operations, iostat shows NFS wait times

Diagnosis:

# On TrueNAS
zpool iostat -v 5
arc_summary | grep "Hit Rate"

# On K8s node
nfsiostat 5

Optimization:

  1. Check ZFS ARC hit rate (should be >90%)
  2. Move hot datasets to the SSD pool (zfs send requires a snapshot): zfs snapshot main/<dataset>@migrate && zfs send main/<dataset>@migrate | zfs recv ssd/<dataset>
  3. Tune NFS mount: add rsize=1048576,wsize=1048576 to StorageClass mountOptions
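
The tuned transfer sizes from step 3 sit alongside the existing soft-mount options in the StorageClass mountOptions list (a sketch, not the cluster's actual manifest):

mountOptions:
  - soft
  - timeo=30
  - retrans=3
  - rsize=1048576   # 1 MiB read buffer
  - wsize=1048576   # 1 MiB write buffer
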
Related Documentation

  • Runbooks:
    • docs/runbooks/restore-postgresql.md
    • docs/runbooks/restore-mysql.md
    • docs/runbooks/recover-nfs-mount.md
  • Architecture: docs/architecture/backup-dr.md (backup strategy using ZFS snapshots)
  • Reference: .claude/reference/service-catalog.md (which services use NFS vs iSCSI)