gpu: schedule off NFD label, not k8s-node1 hostname
Remove every hardcoded reference to k8s-node1 that pinned GPU scheduling to a specific host:

- GPU workload nodeSelectors: gpu=true -> nvidia.com/gpu.present=true (frigate, immich, whisper, piper, ytdlp, ebook2audiobook, audiblez, audiblez-web, nvidia-exporter, gpu-pod-exporter). The NFD label is auto-applied by gpu-feature-discovery on any node carrying an NVIDIA PCI device, so the selector follows the card.
- null_resource.gpu_node_config: rewrite to enumerate NFD-labeled nodes (feature.node.kubernetes.io/pci-10de.present=true) and taint each with nvidia.com/gpu=true:PreferNoSchedule. Drop the manual 'kubectl label gpu=true' since NFD handles labeling.
- MySQL anti-affinity: kubernetes.io/hostname NotIn [k8s-node1] -> nvidia.com/gpu.present NotIn [true]. Same intent (keep MySQL off the GPU node) but portable when the card relocates.

Net effect: moving the GPU card between nodes no longer requires any Terraform edit. Verified no-op for current scheduling; both old and new labels resolve to node1 today.

Docs updated to match: AGENTS.md, compute.md, overview.md, proxmox-inventory.md, k8s-portal agent-guidance string.
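For illustration only, the first two bullets could look roughly like the sketch below. It is not the actual code from this commit: the resource name `gpu_example`, the image, the `always_run` trigger, and the use of `kubectl taint -l` are all assumptions, and only the labels and the taint are taken from the message above.

```hcl
# Hypothetical sketch; names and structure are assumptions, not this repo's code.

# Taint every node that NFD has marked as carrying an NVIDIA PCI device (vendor 10de).
resource "null_resource" "gpu_node_config" {
  # Assumption: re-run on every apply so a relocated card is picked up.
  triggers = {
    always_run = timestamp()
  }

  provisioner "local-exec" {
    # The label selector stands in for any hostname; --overwrite keeps re-runs idempotent.
    command = "kubectl taint nodes -l feature.node.kubernetes.io/pci-10de.present=true nvidia.com/gpu=true:PreferNoSchedule --overwrite"
  }
}

# A GPU workload that follows the card via the auto-applied label.
resource "kubernetes_deployment" "gpu_example" {
  metadata {
    name = "gpu-example" # hypothetical workload name
  }

  spec {
    replicas = 1

    selector {
      match_labels = { app = "gpu-example" }
    }

    template {
      metadata {
        labels = { app = "gpu-example" }
      }

      spec {
        # Label applied by gpu-feature-discovery; no hostname pin.
        node_selector = {
          "nvidia.com/gpu.present" = "true"
        }

        # Matches the taint set by null_resource.gpu_node_config.
        toleration {
          key      = "nvidia.com/gpu"
          operator = "Equal"
          value    = "true"
          effect   = "PreferNoSchedule"
        }

        container {
          name  = "app"
          image = "example.invalid/gpu-app:1.0.0" # placeholder image

          resources {
            limits = { "nvidia.com/gpu" = "1" }
          }
        }
      }
    }
  }
}
```

Because both the taint command and the selector key off labels that follow the hardware, nothing in this sketch would need to change if the card moved to another node.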
parent 134d6b9a82
commit e2146e6916
12 changed files with 52 additions and 36 deletions
@@ -138,7 +138,7 @@ Kyverno auto-generates LimitRange + ResourceQuota per namespace based on tier la
 - **Proxmox**: 192.168.1.127 (Dell R730, 22c/44t, 142GB RAM)
 - **Nodes**: k8s-master (10.0.20.100), node1 (GPU, Tesla T4), node2-4
-- **GPU workloads**: \`node_selector = { "gpu": "true" }\` + toleration \`nvidia.com/gpu\`
+- **GPU workloads**: \`node_selector = { "nvidia.com/gpu.present" : "true" }\` + toleration \`nvidia.com/gpu\` (label auto-applied by gpu-feature-discovery, no hostname pins)
 - **Pull-through cache**: 10.0.20.10 — use versioned image tags (cache serves stale :latest manifests)
 - **MySQL InnoDB Cluster**: 3 instances on iSCSI
 - **SMTP**: \`var.mail_host\` port 587 STARTTLS