From 2e0cebff8752bbbebecfa5d025f60e6f9f4373c9 Mon Sep 17 00:00:00 2001 From: Viktor Barzin Date: Thu, 11 Jun 2026 17:50:43 +0000 Subject: [PATCH] docs: sync compute/storage/proxmox-inventory with live state (memory audit) [ci skip] MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Viktor asked to go through the agent's stored infra facts and straighten out anything wrong about what-is-where. Cross-checking docs against the live cluster surfaced doc drift alongside the stale memories: - compute.md: add k8s-node5/6 (joined 2026-05-26) to diagram + node table; totals 48 vCPU / ~176GB -> 64 vCPU / ~240GB; cluster version v1.34.2 -> v1.34.8 (live-verified) - storage.md: the nfs-proxmox StorageClass no longer exists (removed 2026-04-25, commit 484b4c71) — nfs-truenas is the only NFS SC; fixed three spots that told readers to use nfs-proxmox - proxmox-inventory.md: k8s VM RAM rows live-verified via kubectl (master 32G, node1 48G, node2-4 32G — the old 16/32/24G figures predated the 2026-04-02 resize), added node5/6 rows, devvm swap 8G -> 14G (grown 2026-06-10), recomputed total (~288GB nominal of 272GB physical, overcommitted) Co-Authored-By: Claude Fable 5 --- .claude/reference/proxmox-inventory.md | 16 +++++++++------- docs/architecture/compute.md | 10 +++++++--- docs/architecture/storage.md | 9 ++++----- 3 files changed, 20 insertions(+), 15 deletions(-) diff --git a/.claude/reference/proxmox-inventory.md b/.claude/reference/proxmox-inventory.md index f2f53758..2b9ab3fe 100644 --- a/.claude/reference/proxmox-inventory.md +++ b/.claude/reference/proxmox-inventory.md @@ -92,19 +92,21 @@ Channel 3: A4 [32G] ──── A8 [32G] ──── A12[ 8G ] = 72 GB | VMID | Name | Status | CPUs | RAM | Network | Disk | Notes | |------|------|--------|------|-----|---------|------|-------| | 101 | pfsense | running | 8 | 4GB | vmbr0, vmbr1:vlan10, vmbr1:vlan20 | 32G | Gateway/firewall | -| 102 | devvm | running | 16 | 24GB | vmbr1:vlan10 | 100G | Development VM + t3code Workstation host. 8G swapfile (swappiness=10). Capacity budget: ~4-5G RAM/active user, max ~3-4 concurrent active Claude sessions. NOT Terraform-managed. | +| 102 | devvm | running | 16 | 24GB | vmbr1:vlan10 | 100G | Development VM + t3code Workstation host. 14G swap (8G /swapfile + 6G /swapfile2, grown 2026-06-10; swappiness=10). Capacity budget: ~4-5G RAM/active user, max ~3-4 concurrent active Claude sessions. NOT Terraform-managed. | | 103 | home-assistant | running | 8 | 8GB | vmbr0 | 64G | HA Sofia, net0(vlan10) disabled, SSH: vbarzin@192.168.1.8 | | 105 | pbs | stopped | 16 | 8GB | vmbr1:vlan10 | 32G | Proxmox Backup (unused) | -| 200 | k8s-master | running | 8 | 16GB | vmbr1:vlan20 | 64G | Control plane (10.0.20.100) | -| 201 | k8s-node1 | running | 16 | 32GB | vmbr1:vlan20 | 256G | GPU node, Tesla T4 | -| 202 | k8s-node2 | running | 8 | 24GB | vmbr1:vlan20 | 256G | Worker | -| 203 | k8s-node3 | running | 8 | 24GB | vmbr1:vlan20 | 256G | Worker | -| 204 | k8s-node4 | running | 8 | 24GB | vmbr1:vlan20 | 256G | Worker | +| 200 | k8s-master | running | 8 | 32GB | vmbr1:vlan20 | 64G | Control plane (10.0.20.100) | +| 201 | k8s-node1 | running | 16 | 48GB | vmbr1:vlan20 | 256G | GPU node, Tesla T4 | +| 202 | k8s-node2 | running | 8 | 32GB | vmbr1:vlan20 | 256G | Worker | +| 203 | k8s-node3 | running | 8 | 32GB | vmbr1:vlan20 | 256G | Worker | +| 204 | k8s-node4 | running | 8 | 32GB | vmbr1:vlan20 | 256G | Worker | +| 205 | k8s-node5 | running | 8 | 32GB | vmbr1:vlan20 | 256G | Worker (10.0.20.105, joined 2026-05-26) | +| 206 | k8s-node6 | running | 8 | 32GB | vmbr1:vlan20 | 256G | Worker (10.0.20.106, joined 2026-05-26) | | 220 | docker-registry | running | 4 | 4GB | vmbr1:vlan20 | 64G | MAC DE:AD:BE:EF:22:22 (10.0.20.10) | | 300 | Windows10 | running | 16 | 8GB | vmbr0 | 100G | Windows VM | | ~~9000~~ | ~~truenas~~ | **stopped/decommissioned** | — | — | — | — | NFS migrated to Proxmox host (192.168.1.127) at `/srv/nfs` and `/srv/nfs-ssd` | -**Total VM RAM allocated**: 196 GB of 272 GB (72%) — 76 GB free for future VMs (devvm corrected 8GB→24GB 2026-06-08) +**Total VM RAM allocated**: ~288 GB nominal across running VMs vs 272 GB physical — OVERCOMMITTED (ballooning enabled on K8s workers, host swap in use; see memory id=535/2543). K8s rows live-verified via `kubectl get nodes` capacity 2026-06-11 (master 32G, node1 48G, node2-6 32G; the old 16/32/24GB figures predated the 2026-04-02 resize and node5/6). ## VM Templates | VMID | Name | Purpose | diff --git a/docs/architecture/compute.md b/docs/architecture/compute.md index d4ccf6e1..0b808b55 100644 --- a/docs/architecture/compute.md +++ b/docs/architecture/compute.md @@ -22,9 +22,11 @@ graph TB NODE2["VM 202: k8s-node2
8c / 32GB"] NODE3["VM 203: k8s-node3
8c / 32GB"] NODE4["VM 204: k8s-node4
8c / 32GB"] + NODE5["VM 205: k8s-node5
8c / 32GB"] + NODE6["VM 206: k8s-node6
8c / 32GB"] end - subgraph K8s["Kubernetes Cluster v1.34.2"] + subgraph K8s["Kubernetes Cluster v1.34.8"] direction TB subgraph VPA["VPA (Goldilocks - Initial Mode)"] @@ -62,7 +64,7 @@ graph TB | Model | Dell PowerEdge R730 | | CPU | 1x Intel Xeon E5-2699 v4 (22 cores / 44 threads, CPU2 unpopulated) | | Total Cores/Threads | 22 cores / 44 threads | -| RAM | 272GB DDR4-2400 ECC RDIMM physical (10 DIMMs: 8x32G Samsung + 2x8G Hynix). VMs use ~176GB total (k8s-node1 48GB + 4 K8s VMs x 32GB) | +| RAM | 272GB DDR4-2400 ECC RDIMM physical (10 DIMMs: 8x32G Samsung + 2x8G Hynix). K8s VMs use ~240GB total (k8s-node1 48GB + 6 K8s VMs x 32GB) | | GPU | NVIDIA Tesla T4 (16GB GDDR6, PCIe 0000:06:00.0) | | Storage | 1.1TB SSD + 931GB SSD + 10.7TB HDD | | Hypervisor | Proxmox VE | @@ -76,8 +78,10 @@ graph TB | k8s-node2 | 202 | 8 | 32GB | vmbr1:vlan20 | Worker | None | | k8s-node3 | 203 | 8 | 32GB | vmbr1:vlan20 | Worker | None | | k8s-node4 | 204 | 8 | 32GB | vmbr1:vlan20 | Worker | None | +| k8s-node5 | 205 | 8 | 32GB | vmbr1:vlan20 (10.0.20.105) | Worker (joined 2026-05-26) | None | +| k8s-node6 | 206 | 8 | 32GB | vmbr1:vlan20 (10.0.20.106) | Worker (joined 2026-05-26) | None | -**Total Cluster Resources**: 48 vCPUs, ~176GB RAM (k8s-node1 48GB + 4 nodes x 32GB) +**Total Cluster Resources**: 64 vCPUs, ~240GB RAM (k8s-node1 16c/48GB + master and 5 workers at 8c/32GB each) > **All Linux VMs are hand-managed in Proxmox, NOT in Terraform** > (decided 2026-05-26, commit 44c3770a). The telmate/proxmox v3.0.2 diff --git a/docs/architecture/storage.md b/docs/architecture/storage.md index 486246a6..4b501598 100644 --- a/docs/architecture/storage.md +++ b/docs/architecture/storage.md @@ -17,7 +17,7 @@ All services storing sensitive data were migrated to `proxmox-lvm-encrypted` on - **HDD NFS**: `/srv/nfs` on ext4 LV `pve/nfs-data` (4TB) — bulk media and backup targets - **SSD NFS**: `/srv/nfs-ssd` on ext4 LV `ssd/nfs-ssd-data` (100GB) — high-performance data (Immich ML) -Both `StorageClass: nfs-truenas` and `StorageClass: nfs-proxmox` point to the Proxmox host and are functionally identical. The `nfs-truenas` name is historical — it was retained because StorageClass names are immutable on bound PVs (48 PVs reference it) and renaming would force mass PV churn across the cluster. +`StorageClass: nfs-truenas` is the **only** NFS StorageClass and points to the Proxmox host. The name is historical — it was retained because StorageClass names are immutable on bound PVs (48 PVs reference it) and renaming would force mass PV churn across the cluster. (A short-lived parallel `nfs-proxmox` StorageClass was removed on 2026-04-25, commit 484b4c71, during the vault NFS-hostile migration.) **Backup storage (sda)**: 1.1TB RAID1 SAS disk, VG `backup`, LV `data` (ext4), mounted at `/mnt/backup` on PVE host. Dedicated backup disk for weekly PVC file backups, auto SQLite backups, pfSense backups, and PVE config. NFS data syncs directly to Synology via inotify change tracking (not stored on sda). Independent of live storage (sdc). @@ -47,7 +47,7 @@ graph TB end subgraph K8s["Kubernetes Cluster"] - CSI_NFS["nfs-csi driver
StorageClass: nfs-proxmox (+ legacy nfs-truenas)
soft,timeo=30,retrans=3"] + CSI_NFS["nfs-csi driver
StorageClass: nfs-truenas (historical name)
soft,timeo=30,retrans=3"] CSI_PVE["Proxmox CSI plugin
StorageClass: proxmox-lvm
StorageClass: proxmox-lvm-encrypted"] NFS_PV["NFS PersistentVolumes
RWX, ~100 volumes"] @@ -85,8 +85,7 @@ graph TB | Proxmox NFS (HDD) | LV `pve/nfs-data`, 4TB ext4 | 192.168.1.127:/srv/nfs | Bulk NFS data for all services | | Proxmox NFS (SSD) | LV `ssd/nfs-ssd-data`, 100GB ext4 | 192.168.1.127:/srv/nfs-ssd | High-performance data (Immich ML) | | nfs-csi | Helm chart | Namespace: nfs-csi | NFS CSI driver | -| StorageClass `nfs-proxmox` | RWX, soft mount | Cluster-wide | NFS storage, points to Proxmox host | -| StorageClass `nfs-truenas` | RWX, soft mount | Cluster-wide | **Historical name** — functionally identical to `nfs-proxmox`, points to the Proxmox host. Kept because SC names are immutable on 48 bound PVs. | +| StorageClass `nfs-truenas` | RWX, soft mount | Cluster-wide | The only NFS StorageClass — **historical name**, points to the Proxmox host. Kept because SC names are immutable on 48 bound PVs. (Sibling `nfs-proxmox` SC removed 2026-04-25, commit 484b4c71.) | | TF module `nfs_volume` | `modules/kubernetes/nfs_volume/` | Infra repo | Static NFS PV/PVC factory | | ~~TrueNAS VM~~ | **DECOMMISSIONED 2026-04-13** | Was VM 9000 at 10.0.10.15 | Replaced by Proxmox NFS. VM still in stopped state pending deletion. | | ~~democratic-csi-iscsi~~ | **REMOVED** | Was namespace: iscsi-csi | Replaced by Proxmox CSI (2026-04-02) | @@ -113,7 +112,7 @@ graph TB **Note**: Some legacy PVs still reference `/mnt/main/` paths. These work via compatibility symlinks/bind-mounts on the Proxmox host. New PVs should use `/srv/nfs/` or `/srv/nfs-ssd/`. -**CRITICAL**: Never use inline `nfs {}` blocks in pod specs — they default to `hard,timeo=600` which causes 10-minute hangs on network issues. Always use the `nfs-proxmox` StorageClass (or the legacy `nfs-truenas` for existing PVs) via PVCs. +**CRITICAL**: Never use inline `nfs {}` blocks in pod specs — they default to `hard,timeo=600` which causes 10-minute hangs on network issues. Always use the `nfs-truenas` StorageClass (historical name; it points at the Proxmox host) via PVCs. ### Block Storage Flow (Proxmox CSI) — NEW