docs: update hardware inventory for R730 RAM upgrade to 272GB
Upgraded from 144GB (4x32G + 2x8G) to 272GB (8x32G + 2x8G) DDR4-2400. Added physical DIMM slot diagram, channel layout, and BIOS speed override notes. Updated compute architecture with correct CPU (single socket), VM memory values, and capacity figures.
This commit is contained in:
parent
87c858f026
commit
2d8aa5ed89
3 changed files with 277 additions and 22 deletions
|
|
@ -9,16 +9,16 @@ The infrastructure runs on a single Dell R730 server with Proxmox VE, hosting a
|
|||
```mermaid
|
||||
graph TB
|
||||
subgraph Physical["Dell R730 Physical Host"]
|
||||
CPU["2x Xeon E5-2699 v4<br/>22c/44t each<br/>44c/88t total"]
|
||||
RAM["142GB DDR4 ECC"]
|
||||
CPU["1x Xeon E5-2699 v4<br/>22c/44t<br/>CPU2 unpopulated"]
|
||||
RAM["272GB DDR4-2400 ECC"]
|
||||
GPU["NVIDIA Tesla T4<br/>PCIe 0000:06:00.0"]
|
||||
DISK["1.1TB SSD<br/>931GB SSD<br/>10.7TB HDD"]
|
||||
end
|
||||
|
||||
subgraph Proxmox["Proxmox VE"]
|
||||
direction TB
|
||||
MASTER["VM 200: k8s-master<br/>8c / 8GB<br/>10.0.20.100"]
|
||||
NODE1["VM 201: k8s-node1<br/>16c / 16GB<br/>GPU Passthrough<br/>nvidia.com/gpu=true:NoSchedule"]
|
||||
MASTER["VM 200: k8s-master<br/>8c / 16GB<br/>10.0.20.100"]
|
||||
NODE1["VM 201: k8s-node1<br/>16c / 32GB<br/>GPU Passthrough<br/>nvidia.com/gpu=true:NoSchedule"]
|
||||
NODE2["VM 202: k8s-node2<br/>8c / 24GB"]
|
||||
NODE3["VM 203: k8s-node3<br/>8c / 24GB"]
|
||||
NODE4["VM 204: k8s-node4<br/>8c / 24GB"]
|
||||
|
|
@ -60,9 +60,9 @@ graph TB
|
|||
| Component | Specification |
|
||||
|-----------|---------------|
|
||||
| Model | Dell PowerEdge R730 |
|
||||
| CPU | 2x Intel Xeon E5-2699 v4 (22 cores / 44 threads each) |
|
||||
| Total Cores/Threads | 44 cores / 88 threads |
|
||||
| RAM | 142GB DDR4 ECC |
|
||||
| CPU | 1x Intel Xeon E5-2699 v4 (22 cores / 44 threads, CPU2 unpopulated) |
|
||||
| Total Cores/Threads | 22 cores / 44 threads |
|
||||
| RAM | 272GB DDR4-2400 ECC RDIMM (10 DIMMs: 8x32G Samsung + 2x8G Hynix) |
|
||||
| GPU | NVIDIA Tesla T4 (16GB GDDR6, PCIe 0000:06:00.0) |
|
||||
| Storage | 1.1TB SSD + 931GB SSD + 10.7TB HDD |
|
||||
| Hypervisor | Proxmox VE |
|
||||
|
|
@ -71,13 +71,13 @@ graph TB
|
|||
|
||||
| VM | VMID | vCPUs | RAM | Network | Role | Taints |
|
||||
|----|------|-------|-----|---------|------|--------|
|
||||
| k8s-master | 200 | 8 | 8GB | vmbr1:vlan20 (10.0.20.100) | Control Plane | `node-role.kubernetes.io/control-plane:NoSchedule` |
|
||||
| k8s-node1 | 201 | 16 | 16GB | vmbr1:vlan20 | GPU Worker | `nvidia.com/gpu=true:NoSchedule` |
|
||||
| k8s-master | 200 | 8 | 16GB | vmbr1:vlan20 (10.0.20.100) | Control Plane | `node-role.kubernetes.io/control-plane:NoSchedule` |
|
||||
| k8s-node1 | 201 | 16 | 32GB | vmbr1:vlan20 | GPU Worker | `nvidia.com/gpu=true:NoSchedule` |
|
||||
| k8s-node2 | 202 | 8 | 24GB | vmbr1:vlan20 | Worker | None |
|
||||
| k8s-node3 | 203 | 8 | 24GB | vmbr1:vlan20 | Worker | None |
|
||||
| k8s-node4 | 204 | 8 | 24GB | vmbr1:vlan20 | Worker | None |
|
||||
|
||||
**Total Cluster Resources**: 48 vCPUs, 104GB RAM (excluding control plane)
|
||||
**Total Cluster Resources**: 48 vCPUs, 120GB RAM (excluding control plane)
|
||||
|
||||
### GPU Passthrough
|
||||
|
||||
|
|
@ -443,7 +443,7 @@ spec:
|
|||
**Rationale**:
|
||||
- **CFS Throttling**: Linux Completely Fair Scheduler throttles containers to their exact CPU limit, even when CPU is idle. This causes artificial performance degradation.
|
||||
- **Burstability**: Services can burst to unused CPU during low-load periods, improving response times.
|
||||
- **Memory-bound**: With 142GB RAM across 48 vCPUs, memory exhaustion occurs before CPU saturation. Memory is the constraining resource.
|
||||
- **Memory-bound**: With 272GB host RAM (180GB allocated to VMs), memory is no longer the primary constraint. 92GB headroom available for new VMs.
|
||||
|
||||
**Tradeoff**: A runaway process could monopolize CPU. Mitigated by CPU requests reserving capacity and PriorityClass preemption.
|
||||
|
||||
|
|
@ -492,7 +492,7 @@ spec:
|
|||
- **Growth Buffer**: Services grow over time (more users, more data). Headroom delays the need for manual intervention.
|
||||
- **GPU Volatility**: GPU workloads (ML inference) have unpredictable memory usage. 30% headroom reduces OOMKills.
|
||||
|
||||
**Tradeoff**: Slightly higher memory allocation. Accepted because 142GB RAM provides ample capacity.
|
||||
**Tradeoff**: Slightly higher memory allocation. Accepted because 272GB RAM provides ample capacity.
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue