docs: update hardware inventory for R730 RAM upgrade to 272GB

Upgraded from 144GB (4x32G + 2x8G) to 272GB (8x32G + 2x8G) DDR4-2400.
Added physical DIMM slot diagram, channel layout, and BIOS speed override
notes. Updated compute architecture with correct CPU (single socket),
VM memory values, and capacity figures.
Viktor Barzin 2026-04-02 00:48:13 +03:00
parent 87c858f026
commit 2d8aa5ed89
3 changed files with 277 additions and 22 deletions

@ -9,16 +9,16 @@ The infrastructure runs on a single Dell R730 server with Proxmox VE, hosting a
```mermaid
graph TB
subgraph Physical["Dell R730 Physical Host"]
CPU["2x Xeon E5-2699 v4<br/>22c/44t each<br/>44c/88t total"]
RAM["142GB DDR4 ECC"]
CPU["1x Xeon E5-2699 v4<br/>22c/44t<br/>CPU2 unpopulated"]
RAM["272GB DDR4-2400 ECC"]
GPU["NVIDIA Tesla T4<br/>PCIe 0000:06:00.0"]
DISK["1.1TB SSD<br/>931GB SSD<br/>10.7TB HDD"]
end
subgraph Proxmox["Proxmox VE"]
direction TB
MASTER["VM 200: k8s-master<br/>8c / 8GB<br/>10.0.20.100"]
NODE1["VM 201: k8s-node1<br/>16c / 16GB<br/>GPU Passthrough<br/>nvidia.com/gpu=true:NoSchedule"]
MASTER["VM 200: k8s-master<br/>8c / 16GB<br/>10.0.20.100"]
NODE1["VM 201: k8s-node1<br/>16c / 32GB<br/>GPU Passthrough<br/>nvidia.com/gpu=true:NoSchedule"]
NODE2["VM 202: k8s-node2<br/>8c / 24GB"]
NODE3["VM 203: k8s-node3<br/>8c / 24GB"]
NODE4["VM 204: k8s-node4<br/>8c / 24GB"]
@ -60,9 +60,9 @@ graph TB
| Component | Specification |
|-----------|---------------|
| Model | Dell PowerEdge R730 |
| CPU | 2x Intel Xeon E5-2699 v4 (22 cores / 44 threads each) |
| Total Cores/Threads | 44 cores / 88 threads |
| RAM | 142GB DDR4 ECC |
| CPU | 1x Intel Xeon E5-2699 v4 (22 cores / 44 threads, CPU2 unpopulated) |
| Total Cores/Threads | 22 cores / 44 threads |
| RAM | 272GB DDR4-2400 ECC RDIMM (10 DIMMs: 8x32G Samsung + 2x8G Hynix) |
| GPU | NVIDIA Tesla T4 (16GB GDDR6, PCIe 0000:06:00.0) |
| Storage | 1.1TB SSD + 931GB SSD + 10.7TB HDD |
| Hypervisor | Proxmox VE |
@ -71,13 +71,13 @@ graph TB
| VM | VMID | vCPUs | RAM | Network | Role | Taints |
|----|------|-------|-----|---------|------|--------|
| k8s-master | 200 | 8 | 8GB | vmbr1:vlan20 (10.0.20.100) | Control Plane | `node-role.kubernetes.io/control-plane:NoSchedule` |
| k8s-node1 | 201 | 16 | 16GB | vmbr1:vlan20 | GPU Worker | `nvidia.com/gpu=true:NoSchedule` |
| k8s-master | 200 | 8 | 16GB | vmbr1:vlan20 (10.0.20.100) | Control Plane | `node-role.kubernetes.io/control-plane:NoSchedule` |
| k8s-node1 | 201 | 16 | 32GB | vmbr1:vlan20 | GPU Worker | `nvidia.com/gpu=true:NoSchedule` |
| k8s-node2 | 202 | 8 | 24GB | vmbr1:vlan20 | Worker | None |
| k8s-node3 | 203 | 8 | 24GB | vmbr1:vlan20 | Worker | None |
| k8s-node4 | 204 | 8 | 24GB | vmbr1:vlan20 | Worker | None |
**Total Cluster Resources**: 48 vCPUs, 104GB RAM (excluding control plane)
**Total Cluster Resources**: 48 vCPUs, 120GB RAM (including control plane)
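
The totals can be re-derived from the VM table. The following is a minimal Python sketch; the vCPU and RAM values are copied from the table, while the `VMS` layout and the `totals` helper are illustrative names, not part of the repository.

```python
# VM sizing copied from the table above: name -> (vCPUs, RAM in GB, is_control_plane).
# The dict layout and helper name are assumptions made for this sketch.
VMS = {
    "k8s-master": (8, 16, True),
    "k8s-node1": (16, 32, False),
    "k8s-node2": (8, 24, False),
    "k8s-node3": (8, 24, False),
    "k8s-node4": (8, 24, False),
}


def totals(vms, include_control_plane=True):
    """Return (total vCPUs, total RAM in GB) for the selected VMs."""
    selected = [
        (vcpus, ram_gb)
        for vcpus, ram_gb, is_control_plane in vms.values()
        if include_control_plane or not is_control_plane
    ]
    return sum(v for v, _ in selected), sum(r for _, r in selected)


all_vcpus, all_ram_gb = totals(VMS)
worker_vcpus, worker_ram_gb = totals(VMS, include_control_plane=False)

print(f"All VMs:      {all_vcpus} vCPUs, {all_ram_gb}GB RAM")
print(f"Workers only: {worker_vcpus} vCPUs, {worker_ram_gb}GB RAM")
```

Keeping the sizing in one structure makes it easy to re-check the summary line whenever a VM is resized.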
### GPU Passthrough
@ -443,7 +443,7 @@ spec:
**Rationale**:
- **CFS Throttling**: The Linux Completely Fair Scheduler (CFS) throttles a container as soon as it reaches its CPU limit, even when the host CPU is otherwise idle. This causes artificial performance degradation.
- **Burstability**: Services can burst to unused CPU during low-load periods, improving response times.
- **Memory-bound**: With 142GB RAM across 48 vCPUs, memory exhaustion occurs before CPU saturation. Memory is the constraining resource.
- **Memory-bound**: With 272GB of host RAM, of which 180GB is allocated to VMs, memory is no longer the primary constraint. The remaining 92GB is headroom for new VMs.
**Tradeoff**: A runaway process could monopolize CPU. Mitigated by CPU requests reserving capacity and PriorityClass preemption.
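
The capacity figures behind this rationale can be checked with simple arithmetic. The sketch below is an illustration only: the constants are taken from the hardware and VM tables in this document, the variable names are hypothetical, and the vCPU ratio covers just the five Kubernetes VMs, since the vCPU counts of any other VMs on the host are not listed here.

```python
# Figures taken from this document; names are illustrative.
HOST_RAM_GB = 272        # installed DDR4-2400 ECC
ALLOCATED_RAM_GB = 180   # RAM allocated to all VMs on the host
HOST_THREADS = 44        # 1x Xeon E5-2699 v4, 22 cores / 44 threads
K8S_VCPUS = 48           # vCPUs across the five Kubernetes VMs

# RAM left on the host for new VMs.
headroom_gb = HOST_RAM_GB - ALLOCATED_RAM_GB

# Share of host RAM already handed out to VMs.
ram_allocated_fraction = ALLOCATED_RAM_GB / HOST_RAM_GB

# vCPUs per hardware thread (Kubernetes VMs only); a value above 1.0
# means the CPU is oversubscribed.
vcpu_ratio = K8S_VCPUS / HOST_THREADS

print(f"RAM headroom:   {headroom_gb}GB")
print(f"RAM allocated:  {ram_allocated_fraction:.0%}")
print(f"vCPU ratio:     {vcpu_ratio:.2f} vCPUs per thread")
```

A ratio slightly above 1.0 with roughly a third of host RAM still free suggests that, after the upgrade, CPU oversubscription deserves as much attention as memory when sizing new VMs.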
@ -492,7 +492,7 @@ spec:
- **Growth Buffer**: Services grow over time (more users, more data). Headroom delays the need for manual intervention.
- **GPU Volatility**: GPU workloads (ML inference) have unpredictable memory usage. 30% headroom reduces OOMKills.
**Tradeoff**: Slightly higher memory allocation. Accepted because 142GB RAM provides ample capacity.
**Tradeoff**: Slightly higher memory allocation. Accepted because 272GB RAM provides ample capacity.
## Troubleshooting