docs: update hardware inventory for R730 RAM upgrade to 272GB
Upgraded from 144GB (4x32G + 2x8G) to 272GB (8x32G + 2x8G) DDR4-2400. Added physical DIMM slot diagram, channel layout, and BIOS speed override notes. Updated compute architecture with correct CPU (single socket), VM memory values, and capacity figures.
parent 87c858f026
commit 2d8aa5ed89
3 changed files with 277 additions and 22 deletions
|
|
@ -3,12 +3,77 @@
|
|||
> Static reference for VMs, hardware, and network topology.
|
||||
|
||||
## Proxmox Host Hardware
|
||||
- **CPU**: Intel Xeon E5-2699 v4 @ 2.20GHz (22 cores / 44 threads, single socket)
|
||||
- **RAM**: 142 GB (Dell R730 server)
|
||||
- **Model**: Dell R730
|
||||
- **CPU**: Intel Xeon E5-2699 v4 @ 2.20GHz (22 cores / 44 threads, single socket, CPU2 unpopulated)
|
||||
- **RAM**: 272 GB DDR4-2400 ECC RDIMM (10 DIMMs, see Memory Layout below)
|
||||
- **GPU**: NVIDIA Tesla T4 (PCIe passthrough to k8s-node1)
|
||||
- **Disks**: 1.1TB + 931GB + 10.7TB (local storage)
|
||||
- **iDRAC**: 192.168.1.4 (root/calvin)
|
||||
- **Disks**: 1.1TB RAID1 SAS (unused) + 931GB Samsung SSD + 10.7TB RAID1 HDD
|
||||
- **Proxmox access**: `ssh root@192.168.1.127`
|
||||
|
||||
## Memory Layout (updated 2026-04-01)
|
||||
|
||||
### Physical DIMM Slot Map
|
||||
|
||||
```
|
||||
╔══════════════════════════════════════════════════════════════════════════════╗
|
||||
║ CPU1 DIMM SLOTS ║
|
||||
║ ║
|
||||
║ ┌─── WHITE (1st per channel) ───┐ ║
|
||||
║ │ │ ║
|
||||
║ │ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ ║
|
||||
║ │ │ A1 │ │ A2 │ │ A3 │ │ A4 │ ║
|
||||
║ │ │ 32G │ │ 32G │ │ 32G │ │ 32G │ Samsung M393A4K40BB1-CRC (2R) ║
|
||||
║ │ │██████│ │██████│ │██████│ │██████│ ║
|
||||
║ │ └──────┘ └──────┘ └──────┘ └──────┘ ║
|
||||
║ │ Ch 0 Ch 1 Ch 2 Ch 3 ║
|
||||
║ └────────────────────────────────┘ ║
|
||||
║ ║
|
||||
║ ┌─── BLACK (2nd per channel) ───┐ ║
|
||||
║ │ │ ║
|
||||
║ │ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ ║
|
||||
║ │ │ A5 │ │ A6 │ │ A7 │ │ A8 │ ║
|
||||
║ │ │ 32G │ │ 32G │ │ 32G │ │ 32G │ Samsung M393A4K40CB1-CRC (2R) ║
|
||||
║ │ │▓▓▓▓▓▓│ │▓▓▓▓▓▓│ │▓▓▓▓▓▓│ │▓▓▓▓▓▓│ ║
|
||||
║ │ └──────┘ └──────┘ └──────┘ └──────┘ ║
|
||||
║ │ Ch 0 Ch 1 Ch 2 Ch 3 ║
|
||||
║ └────────────────────────────────┘ ║
|
||||
║ ║
|
||||
║ ┌─── GREEN (3rd per channel) ───┐ ║
|
||||
║ │ │ ║
|
||||
║ │ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ ║
|
||||
║ │ │ A9 │ │ A10 │ │ A11 │ │ A12 │ ║
|
||||
║ │ │ │ │ │ │ 8G │ │ 8G │ SK Hynix HMA81GR7AFR8N-UH (1R) ║
|
||||
║ │ │ empty│ │ empty│ │░░░░░░│ │░░░░░░│ ║
|
||||
║ │ └──────┘ └──────┘ └──────┘ └──────┘ ║
|
||||
║ │ Ch 0 Ch 1 Ch 2 Ch 3 ║
|
||||
║ └────────────────────────────────┘ ║
|
||||
║ ║
|
||||
║ B1-B12: All empty (requires CPU2) ║
|
||||
║ ║
|
||||
║ Legend: ██ = Samsung BB1 32G ▓▓ = Samsung CB1 32G ░░ = Hynix 8G ║
|
||||
╚══════════════════════════════════════════════════════════════════════════════╝
|
||||
```
|
||||
|
||||
### Channel Summary
|
||||
|
||||
```
|
||||
Channel 0: A1 [32G] ──── A5 [32G] ──── A9 [ ] = 64 GB ✓ matched
|
||||
Channel 1: A2 [32G] ──── A6 [32G] ──── A10[ ] = 64 GB ✓ matched
|
||||
Channel 2: A3 [32G] ──── A7 [32G] ──── A11[ 8G ] = 72 GB ~ +8G bonus
|
||||
Channel 3: A4 [32G] ──── A8 [32G] ──── A12[ 8G ] = 72 GB ~ +8G bonus
|
||||
───────── ───────── ──────────
|
||||
WHITE BLACK GREEN TOTAL: 272 GB
|
||||
```
|
||||
|
||||
### DIMM Details
|
||||
|
||||
- **A1-A4**: Samsung M393A4K40BB1-CRC 32GB DDR4-2400 ECC RDIMM (2-rank, original)
|
||||
- **A5-A8**: Samsung M393A4K40CB1-CRC 32GB DDR4-2400 ECC RDIMM (2-rank, added 2026-04-01)
|
||||
- **A11-A12**: SK Hynix HMA81GR7AFR8N-UH 8GB DDR4-2400 ECC RDIMM (1-rank, relocated from A5/A6)
|
||||
- **A9-A10, B1-B12**: Empty (B-side requires CPU2)
|
||||
- **Speed**: 2400 MHz (BIOS override — 3 DPC defaults to 1866 MHz, forced to 2400 via System BIOS > Memory Settings > Memory Frequency)
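
The 272 GB total follows directly from the per-channel population listed above. A minimal sketch of that arithmetic, assuming the slot sizes from the Channel Summary (the `channels` string format and the `total_gb` helper are illustrative, not part of the repository):

```shell
#!/bin/sh
# Per-channel DIMM sizes in GB, copied from the Channel Summary.
# Format (illustrative): "channel:white,black,green", with 0 for an empty slot.
channels="0:32,32,0 1:32,32,0 2:32,32,8 3:32,32,8"

total_gb() {
    total=0
    for ch in $1; do
        sizes=${ch#*:}            # drop the "N:" channel prefix
        old_ifs=$IFS; IFS=,
        for s in $sizes; do
            total=$((total + s))
        done
        IFS=$old_ifs
    done
    echo "$total"
}

total_gb "$channels"   # 64 + 64 + 72 + 72
```

Editing one entry in `channels` is a quick way to check a planned change, for example populating A9/A10, before touching hardware.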
|
||||
|
||||
## Network Topology
|
||||
```
|
||||
10.0.10.0/24 - Management: Wizard (10.0.10.10), TrueNAS NFS (10.0.10.15)
|
||||
|
|
@ -25,18 +90,20 @@
|
|||
|
||||
| VMID | Name | Status | CPUs | RAM | Network | Disk | Notes |
|
||||
|------|------|--------|------|-----|---------|------|-------|
|
||||
| 101 | pfsense | running | 8 | 16GB | vmbr0, vmbr1:vlan10, vmbr1:vlan20 | 32G | Gateway/firewall |
|
||||
| 101 | pfsense | running | 8 | 4GB | vmbr0, vmbr1:vlan10, vmbr1:vlan20 | 32G | Gateway/firewall |
|
||||
| 102 | devvm | running | 16 | 8GB | vmbr1:vlan10 | 100G | Development VM |
|
||||
| 103 | home-assistant | running | 8 | 8GB | vmbr0 | 64G | HA Sofia, net0(vlan10) disabled, SSH: vbarzin@192.168.1.8 |
|
||||
| 105 | pbs | stopped | 16 | 8GB | vmbr1:vlan10 | 32G | Proxmox Backup (unused) |
|
||||
| 200 | k8s-master | running | 8 | 8GB* | vmbr1:vlan20 | 64G | Control plane (10.0.20.100). *Verify via `qm config 200` |
|
||||
| 201 | k8s-node1 | running | 16 | 16GB* | vmbr1:vlan20 | 256G | GPU node, Tesla T4. *Verify via `qm config 201` |
|
||||
| 202 | k8s-node2 | running | 8 | 24GB* | vmbr1:vlan20 | 256G | Worker. *Inferred from k8s allocatable (~22 GiB) |
|
||||
| 203 | k8s-node3 | running | 8 | 24GB* | vmbr1:vlan20 | 256G | Worker. *Inferred from k8s allocatable (~22 GiB) |
|
||||
| 204 | k8s-node4 | running | 8 | 24GB* | vmbr1:vlan20 | 256G | Worker. *Inferred from k8s allocatable (~22 GiB) |
|
||||
| 200 | k8s-master | running | 8 | 16GB | vmbr1:vlan20 | 64G | Control plane (10.0.20.100) |
|
||||
| 201 | k8s-node1 | running | 16 | 32GB | vmbr1:vlan20 | 256G | GPU node, Tesla T4 |
|
||||
| 202 | k8s-node2 | running | 8 | 24GB | vmbr1:vlan20 | 256G | Worker |
|
||||
| 203 | k8s-node3 | running | 8 | 24GB | vmbr1:vlan20 | 256G | Worker |
|
||||
| 204 | k8s-node4 | running | 8 | 24GB | vmbr1:vlan20 | 256G | Worker |
|
||||
| 220 | docker-registry | running | 4 | 4GB | vmbr1:vlan20 | 64G | MAC DE:AD:BE:EF:22:22 (10.0.20.10) |
|
||||
| 300 | Windows10 | running | 16 | 8GB | vmbr0 | 100G | Windows VM |
|
||||
| 9000 | truenas | running | 16 | 16GB | vmbr1:vlan10 | 32G+7x256G+1T | NFS (10.0.10.15) |
|
||||
| 9000 | truenas | running | 16 | 8GB | vmbr1:vlan10 | 32G+7x256G+1T | NFS (10.0.10.15) |
|
||||
|
||||
**Total VM RAM allocated**: 180 GB of 272 GB (66%) — 92 GB free for future VMs
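
The allocated total can drift from the table whenever a VM is resized, so it may be worth recomputing on the host. A minimal sketch, assuming `qm list` prints its usual columns (VMID, NAME, STATUS, MEM(MB), ...) and that only running VMs should count; the function name is hypothetical:

```shell
#!/bin/sh
# Sum the MEM(MB) column of `qm list` for running VMs and print whole GB.
# Reads `qm list` output on stdin; the first line is the column header.
sum_running_vm_ram_gb() {
    awk 'NR > 1 && $3 == "running" { mb += $4 }
         END { printf "%d\n", mb / 1024 }'
}

# On the Proxmox host:
#   qm list | sum_running_vm_ram_gb
```

Stopped VMs are skipped here; drop the `$3 == "running"` condition to count every configured VM instead.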
|
||||
|
||||
## VM Templates
|
||||
| VMID | Name | Purpose |
|
||||
|
|
|
|||
|
|
@ -9,16 +9,16 @@ The infrastructure runs on a single Dell R730 server with Proxmox VE, hosting a
|
|||
```mermaid
|
||||
graph TB
|
||||
subgraph Physical["Dell R730 Physical Host"]
|
||||
CPU["2x Xeon E5-2699 v4<br/>22c/44t each<br/>44c/88t total"]
|
||||
RAM["142GB DDR4 ECC"]
|
||||
CPU["1x Xeon E5-2699 v4<br/>22c/44t<br/>CPU2 unpopulated"]
|
||||
RAM["272GB DDR4-2400 ECC"]
|
||||
GPU["NVIDIA Tesla T4<br/>PCIe 0000:06:00.0"]
|
||||
DISK["1.1TB SSD<br/>931GB SSD<br/>10.7TB HDD"]
|
||||
end
|
||||
|
||||
subgraph Proxmox["Proxmox VE"]
|
||||
direction TB
|
||||
MASTER["VM 200: k8s-master<br/>8c / 8GB<br/>10.0.20.100"]
|
||||
NODE1["VM 201: k8s-node1<br/>16c / 16GB<br/>GPU Passthrough<br/>nvidia.com/gpu=true:NoSchedule"]
|
||||
MASTER["VM 200: k8s-master<br/>8c / 16GB<br/>10.0.20.100"]
|
||||
NODE1["VM 201: k8s-node1<br/>16c / 32GB<br/>GPU Passthrough<br/>nvidia.com/gpu=true:NoSchedule"]
|
||||
NODE2["VM 202: k8s-node2<br/>8c / 24GB"]
|
||||
NODE3["VM 203: k8s-node3<br/>8c / 24GB"]
|
||||
NODE4["VM 204: k8s-node4<br/>8c / 24GB"]
|
||||
|
|
@ -60,9 +60,9 @@ graph TB
|
|||
| Component | Specification |
|
||||
|-----------|---------------|
|
||||
| Model | Dell PowerEdge R730 |
|
||||
| CPU | 2x Intel Xeon E5-2699 v4 (22 cores / 44 threads each) |
|
||||
| Total Cores/Threads | 44 cores / 88 threads |
|
||||
| RAM | 142GB DDR4 ECC |
|
||||
| CPU | 1x Intel Xeon E5-2699 v4 (22 cores / 44 threads, CPU2 unpopulated) |
|
||||
| Total Cores/Threads | 22 cores / 44 threads |
|
||||
| RAM | 272GB DDR4-2400 ECC RDIMM (10 DIMMs: 8x32G Samsung + 2x8G Hynix) |
|
||||
| GPU | NVIDIA Tesla T4 (16GB GDDR6, PCIe 0000:06:00.0) |
|
||||
| Storage | 1.1TB SSD + 931GB SSD + 10.7TB HDD |
|
||||
| Hypervisor | Proxmox VE |
|
||||
|
|
@ -71,13 +71,13 @@ graph TB
|
|||
|
||||
| VM | VMID | vCPUs | RAM | Network | Role | Taints |
|
||||
|----|------|-------|-----|---------|------|--------|
|
||||
| k8s-master | 200 | 8 | 8GB | vmbr1:vlan20 (10.0.20.100) | Control Plane | `node-role.kubernetes.io/control-plane:NoSchedule` |
|
||||
| k8s-node1 | 201 | 16 | 16GB | vmbr1:vlan20 | GPU Worker | `nvidia.com/gpu=true:NoSchedule` |
|
||||
| k8s-master | 200 | 8 | 16GB | vmbr1:vlan20 (10.0.20.100) | Control Plane | `node-role.kubernetes.io/control-plane:NoSchedule` |
|
||||
| k8s-node1 | 201 | 16 | 32GB | vmbr1:vlan20 | GPU Worker | `nvidia.com/gpu=true:NoSchedule` |
|
||||
| k8s-node2 | 202 | 8 | 24GB | vmbr1:vlan20 | Worker | None |
|
||||
| k8s-node3 | 203 | 8 | 24GB | vmbr1:vlan20 | Worker | None |
|
||||
| k8s-node4 | 204 | 8 | 24GB | vmbr1:vlan20 | Worker | None |
|
||||
|
||||
**Total Cluster Resources**: 48 vCPUs, 104GB RAM (excluding control plane)
|
||||
**Total Cluster Resources**: 48 vCPUs, 120GB RAM (including control plane)
|
||||
|
||||
### GPU Passthrough
|
||||
|
||||
|
|
@ -443,7 +443,7 @@ spec:
|
|||
**Rationale**:
|
||||
- **CFS Throttling**: Linux Completely Fair Scheduler throttles containers to their exact CPU limit, even when CPU is idle. This causes artificial performance degradation.
|
||||
- **Burstability**: Services can burst to unused CPU during low-load periods, improving response times.
|
||||
- **Memory-bound**: With 142GB RAM across 48 vCPUs, memory exhaustion occurs before CPU saturation. Memory is the constraining resource.
|
||||
- **Memory-bound**: With 272GB host RAM (180GB allocated to VMs), memory is no longer the primary constraint; 92GB of headroom remains for new VMs.
|
||||
|
||||
**Tradeoff**: A runaway process could monopolize CPU. Mitigated by CPU requests reserving capacity and PriorityClass preemption.
|
||||
|
||||
|
|
@ -492,7 +492,7 @@ spec:
|
|||
- **Growth Buffer**: Services grow over time (more users, more data). Headroom delays the need for manual intervention.
|
||||
- **GPU Volatility**: GPU workloads (ML inference) have unpredictable memory usage. 30% headroom reduces OOMKills.
|
||||
|
||||
**Tradeoff**: Slightly higher memory allocation. Accepted because 142GB RAM provides ample capacity.
|
||||
**Tradeoff**: Slightly higher memory allocation. Accepted because 272GB RAM provides ample capacity.
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
|
|
|
|||
188
docs/runbooks/r730-ram-upgrade-272gb.md
Normal file
|
|
@ -0,0 +1,188 @@
|
|||
# RAM Upgrade — Dell R730 Proxmox Host (Completed 2026-04-01)
|
||||
|
||||
**Host**: Dell R730 @ 192.168.1.127 (Proxmox)
|
||||
**CPU**: Single Xeon E5-2699 v4 (CPU2 unpopulated — B-side slots unavailable)
|
||||
**Before**: 144 GB (4x32G Samsung BB1 + 2x8G SK Hynix) @ 2400 MHz
|
||||
**After**: 272 GB (4x32G Samsung BB1 + 4x32G Samsung CB1 + 2x8G SK Hynix) @ 2400 MHz
|
||||
|
||||
## Lessons Learned
|
||||
|
||||
1. **3 DPC downclock**: Adding DIMMs to the 3rd slot per channel (A11/A12) caused automatic downclocking to 1866 MHz. Dell R730 BIOS allows manual override back to 2400 MHz via **System BIOS > Memory Settings > Memory Frequency > Max Performance**.
|
||||
2. **MySQL InnoDB Cluster CR recreation**: Deleting and recreating the InnoDBCluster CR generates new admin secrets that don't match the existing data on PVCs. Fix: manually create the new admin user in MySQL and configure GR recovery channel credentials.
|
||||
3. **CNPG primary label**: After restarting the CNPG operator, it may not immediately label the primary pod with `role=primary`. Deleting the pod forces the operator to recreate it with the correct labels.
|
||||
4. **LimitRange blocks MySQL**: The `dbaas` namespace LimitRange (4Gi max) blocks MySQL pods that need 5Gi. Kyverno policy resets LimitRange patches. Fix: reduce MySQL memory limit in CR to 4Gi.
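
For lesson 1, the BIOS override can be confirmed from the running host rather than from the setup screen. A minimal sketch, assuming `dmidecode -t memory` reports a `Configured Memory Speed:` line per DIMM (older dmidecode releases label it `Configured Clock Speed:`) and `Unknown` for empty slots; the helper name is hypothetical:

```shell
#!/bin/sh
# Print each distinct configured DIMM speed found in `dmidecode -t memory`
# output (read from stdin). Empty slots report "Unknown" and are skipped.
configured_speeds() {
    grep -E 'Configured (Memory|Clock) Speed:' \
        | grep -v 'Unknown' \
        | sed 's/.*: //' \
        | sort -u
}

# On the Proxmox host:
#   dmidecode -t memory | configured_speeds
```

A single line of output means every populated DIMM runs at the same configured speed; a second line would indicate that the override did not apply to all of them.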
|
||||
|
||||
## Physical DIMM Slot Map (looking down at motherboard, front of server at bottom)
|
||||
|
||||
```
|
||||
╔══════════════════════════════════════════════════════════════════════════════╗
|
||||
║ CPU1 DIMM SLOTS ║
|
||||
║ ║
|
||||
║ ┌─── WHITE (1st per channel) ───┐ ║
|
||||
║ │ │ ║
|
||||
║ │ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ ║
|
||||
║ │ │ A1 │ │ A2 │ │ A3 │ │ A4 │ ║
|
||||
║ │ │ 32G │ │ 32G │ │ 32G │ │ 32G │ ◄── KEEP (existing Samsung 32G) ║
|
||||
║ │ │██████│ │██████│ │██████│ │██████│ ║
|
||||
║ │ └──────┘ └──────┘ └──────┘ └──────┘ ║
|
||||
║ │ Ch 0 Ch 1 Ch 2 Ch 3 ║
|
||||
║ └────────────────────────────────┘ ║
|
||||
║ ║
|
||||
║ ┌─── BLACK (2nd per channel) ───┐ ║
|
||||
║ │ │ ║
|
||||
║ │ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ ║
|
||||
║ │ │ A5 │ │ A6 │ │ A7 │ │ A8 │ ║
|
||||
║ │ │ 32G │ │ 32G │ │ 32G │ │ 32G │ ◄── INSTALL NEW 32G Samsung ║
|
||||
║ │ │▓▓▓▓▓▓│ │▓▓▓▓▓▓│ │▓▓▓▓▓▓│ │▓▓▓▓▓▓│ (remove old 8G from A5/A6) ║
|
||||
║ │ └──────┘ └──────┘ └──────┘ └──────┘ ║
|
||||
║ │ Ch 0 Ch 1 Ch 2 Ch 3 ║
|
||||
║ └────────────────────────────────┘ ║
|
||||
║ ║
|
||||
║ ┌─── GREEN (3rd per channel) ───┐ ║
|
||||
║ │ │ ║
|
||||
║ │ ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐ ║
|
||||
║ │ │ A9 │ │ A10 │ │ A11 │ │ A12 │ ║
|
||||
║ │ │ │ │ │ │ 8G │ │ 8G │ ◄── MOVE old 8G Hynix here ║
|
||||
║ │ │ empty│ │ empty│ │░░░░░░│ │░░░░░░│ (from A5 → A11, A6 → A12) ║
|
||||
║ │ └──────┘ └──────┘ └──────┘ └──────┘ ║
|
||||
║ │ Ch 0 Ch 1 Ch 2 Ch 3 ║
|
||||
║ └────────────────────────────────┘ ║
|
||||
║ ║
|
||||
║ Legend: ██ = existing 32G (keep in place) ║
|
||||
║ ▓▓ = NEW 32G Samsung M393A4K40CB1-CRC (install) ║
|
||||
║ ░░ = relocated 8G SK Hynix HMA81GR7AFR8N-UH (moved from A5/A6) ║
|
||||
║ ║
|
||||
╚══════════════════════════════════════════════════════════════════════════════╝
|
||||
```
|
||||
|
||||
## Channel Summary After Install
|
||||
|
||||
```
|
||||
Channel 0: A1 [32G] ──── A5 [32G] ──── A9 [ ] = 64 GB ✓ matched
|
||||
Channel 1: A2 [32G] ──── A6 [32G] ──── A10[ ] = 64 GB ✓ matched
|
||||
Channel 2: A3 [32G] ──── A7 [32G] ──── A11[ 8G ] = 72 GB ~ +8G bonus
|
||||
Channel 3: A4 [32G] ──── A8 [32G] ──── A12[ 8G ] = 72 GB ~ +8G bonus
|
||||
───────── ───────── ──────────
|
||||
WHITE BLACK GREEN TOTAL: 272 GB
|
||||
(keep) (new 32G) (moved 8G)
|
||||
```
|
||||
|
||||
**Performance**: ~1-2% bandwidth penalty on Ch2/Ch3 due to mixed DIMM sizes. Ch0/Ch1 fully matched.
|
||||
|
||||
## Shutdown Sequence
|
||||
|
||||
### Phase 0: Gracefully Stop Stateful Services
|
||||
|
||||
Scale down databases, caches, and secrets engines before draining nodes to ensure clean shutdown with no data loss.
|
||||
|
||||
```bash
export KUBECONFIG=/path/to/config

# 1. Vault — seal all instances (flushes WAL, closes connections)
kubectl -n vault exec vault-0 -- vault operator step-down 2>/dev/null
kubectl -n vault exec vault-0 -- vault operator seal
kubectl -n vault exec vault-1 -- vault operator seal
kubectl -n vault exec vault-2 -- vault operator seal

# 2. MySQL InnoDB Cluster — set super_read_only, scale router to 0
kubectl -n dbaas scale deploy mysql-cluster-router --replicas=0
kubectl -n dbaas exec mysql-cluster-0 -- mysql -e "SET GLOBAL innodb_fast_shutdown=0; SET GLOBAL super_read_only=ON;"
kubectl -n dbaas exec mysql-cluster-1 -- mysql -e "SET GLOBAL innodb_fast_shutdown=0; SET GLOBAL super_read_only=ON;"
kubectl -n dbaas exec mysql-cluster-2 -- mysql -e "SET GLOBAL innodb_fast_shutdown=0; SET GLOBAL super_read_only=ON;"
# innodb_fast_shutdown=0 forces full purge + change buffer merge on stop

# 3. PostgreSQL CNPG — trigger checkpoint on primaries
kubectl -n dbaas exec pg-cluster-2 -- psql -U postgres -c "CHECKPOINT;"
kubectl -n dbaas exec pg-cluster-4 -- psql -U postgres -c "CHECKPOINT;"
kubectl -n immich exec deploy/immich-postgresql -- psql -U postgres -c "CHECKPOINT;"

# 4. Redis — trigger BGSAVE then scale down
kubectl -n redis exec redis-node-0 -- redis-cli BGSAVE
kubectl -n redis exec redis-node-1 -- redis-cli BGSAVE
sleep 5 # wait for RDB flush
kubectl -n redis scale deploy redis-haproxy --replicas=0

# 5. ClickHouse — flush
kubectl -n rybbit exec deploy/clickhouse -- clickhouse-client --query "SYSTEM FLUSH LOGS"

# 6. Scale down stateful workloads
kubectl -n dbaas scale sts mysql-cluster --replicas=0
kubectl -n redis scale sts redis-node --replicas=0
kubectl -n vault scale sts vault --replicas=0

# 7. Verify all stateful pods terminated
kubectl get pods -A | grep -iE 'mysql-cluster-[0-9]|pg-cluster|redis-node|vault-[0-9]|clickhouse'
```
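
`BGSAVE` returns immediately and the snapshot is written in the background, so a fixed `sleep` may be too short on a large dataset. A minimal sketch of polling for completion instead, assuming `redis-cli INFO persistence` is reachable without authentication (as the commands above imply) and reports the `rdb_bgsave_in_progress` field; both helper names are hypothetical:

```shell
#!/bin/sh
# Succeeds when INFO persistence output (read from stdin) shows that no
# background save is running. redis-cli emits CRLF line endings, so strip CR.
bgsave_done() {
    tr -d '\r' | grep -q '^rdb_bgsave_in_progress:0$'
}

# Poll one Redis pod once per second, up to $2 attempts (default 30).
wait_for_bgsave() {
    pod=$1
    tries=${2:-30}
    i=0
    while [ "$i" -lt "$tries" ]; do
        if kubectl -n redis exec "$pod" -- redis-cli INFO persistence | bgsave_done; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}

# Usage, in place of the fixed sleep:
#   wait_for_bgsave redis-node-0 && wait_for_bgsave redis-node-1
```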
|
||||
|
||||
### Phase 1: Drain K8s Nodes
|
||||
|
||||
```bash
# Drain workers (reverse order)
kubectl drain k8s-node4 --ignore-daemonsets --delete-emptydir-data --force --timeout=120s
kubectl drain k8s-node3 --ignore-daemonsets --delete-emptydir-data --force --timeout=120s
kubectl drain k8s-node2 --ignore-daemonsets --delete-emptydir-data --force --timeout=120s
kubectl drain k8s-node1 --ignore-daemonsets --delete-emptydir-data --force --timeout=120s

# Cordon master
kubectl cordon k8s-master
```
|
||||
|
||||
### Phase 2: Shutdown VMs (via Proxmox)
|
||||
|
||||
```bash
ssh root@192.168.1.127

# K8s workers
for VMID in 201 202 203 204; do qm shutdown $VMID && echo "Shutdown VMID $VMID"; done
sleep 30

# K8s master
qm shutdown 200; sleep 15

# Docker registry
qm shutdown 220; sleep 10

# Secondary VMs
for VMID in 102 300 103; do qm shutdown $VMID; done
sleep 20

# TrueNAS (wait for ZFS flush)
qm shutdown 9000; sleep 60

# pfSense (last — network gateway)
qm shutdown 101; sleep 15

# Verify all VMs stopped
qm list
```
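
The fixed `sleep` values above assume each guest stops within the allotted time. A minimal sketch of waiting on the actual VM state instead, assuming `qm status <vmid>` prints `status: stopped` once the guest is down; the helper name and the 5-second poll interval are illustrative:

```shell
#!/bin/sh
# Poll `qm status` until the VM reports stopped, or give up after $2 seconds
# (default 120). Returns 0 on stop, 1 on timeout.
wait_for_stopped() {
    vmid=$1
    timeout=${2:-120}
    waited=0
    while [ "$waited" -lt "$timeout" ]; do
        if qm status "$vmid" | grep -q '^status: stopped$'; then
            return 0
        fi
        sleep 5
        waited=$((waited + 5))
    done
    echo "VM $vmid still running after ${timeout}s" >&2
    return 1
}

# Usage on the Proxmox host, e.g. for TrueNAS before pfSense:
#   qm shutdown 9000 && wait_for_stopped 9000 300
```

This matters most for TrueNAS and pfSense, where moving on too early risks an unclean ZFS export or losing the gateway while other guests are still shutting down.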
|
||||
|
||||
### Phase 3: Shutdown Proxmox Host
|
||||
|
||||
```bash
shutdown -h now
```
|
||||
|
||||
## Physical RAM Installation
|
||||
|
||||
| Step | Action | Slot(s) | DIMM |
|------|--------|---------|------|
| 1 | Power off host | — | Completed via Phase 3 above |
| 2 | **Remove** | A5 (black clip) | Take out 8G Hynix, set aside |
| 3 | **Remove** | A6 (black clip) | Take out 8G Hynix, set aside |
| 4 | **Install NEW** | A5 (black clip) | Insert 32G Samsung |
| 5 | **Install NEW** | A6 (black clip) | Insert 32G Samsung |
| 6 | **Install NEW** | A7 (black clip) | Insert 32G Samsung |
| 7 | **Install NEW** | A8 (black clip) | Insert 32G Samsung |
| 8 | **Install MOVED** | A11 (green clip) | Insert 8G Hynix (was in A5) |
| 9 | **Install MOVED** | A12 (green clip) | Insert 8G Hynix (was in A6) |
| 10 | Power on | — | — |
|
||||
|
||||
## Post-Boot Verification
|
||||
|
||||
```bash
# Verify all 10 DIMMs detected
ssh root@192.168.1.127 'dmidecode -t memory | grep -E "Locator:|Size:" | grep -v Bank'

# Verify total RAM (~268 GiB usable)
ssh root@192.168.1.127 'free -h'
```
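
Reading the `dmidecode` output by eye is error-prone with 24 slots listed. A minimal sketch that reduces it to a DIMM count and total, assuming each memory device prints a `Size:` line in either GB or MB (the unit varies by dmidecode version) and `No Module Installed` for empty slots; the helper name is hypothetical:

```shell
#!/bin/sh
# Count populated DIMMs and sum their sizes from `dmidecode -t memory`
# output (read from stdin). Only lines whose first field is exactly
# "Size:" are considered, so "Volatile Size:" and similar are ignored.
summarize_dimms() {
    awk '
        $1 == "Size:" && $2 ~ /^[0-9]+$/ {
            count++
            if ($3 == "GB") gb += $2
            else if ($3 == "MB") gb += $2 / 1024
        }
        END { printf "%d DIMMs, %d GB\n", count, gb }
    '
}

# Usage from a workstation:
#   ssh root@192.168.1.127 'dmidecode -t memory' | summarize_dimms
```

For the layout in this runbook the expected result is 10 DIMMs totalling 272 GB; any other figure points at an unseated or undetected module.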