gpu: schedule off NFD label, not k8s-node1 hostname

Remove every hardcoded reference to k8s-node1 that pinned GPU scheduling to a specific host: - GPU workload nodeSelectors: gpu=true -> nvidia.com/gpu.present=true (frigate, immich, whisper, piper, ytdlp, ebook2audiobook, audiblez, audiblez-web, nvidia-exporter, gpu-pod-exporter). The NFD label is auto-applied by gpu-feature-discovery on any node carrying an NVIDIA PCI device, so the selector follows the card. - null_resource.gpu_node_config: rewrite to enumerate NFD-labeled nodes (feature.node.kubernetes.io/pci-10de.present=true) and taint each with nvidia.com/gpu=true:PreferNoSchedule. Drop the manual 'kubectl label gpu=true' since NFD handles labeling. - MySQL anti-affinity: kubernetes.io/hostname NotIn [k8s-node1] -> nvidia.com/gpu.present NotIn [true]. Same intent (keep MySQL off the GPU node) but portable when the card relocates. Net effect: moving the GPU card between nodes no longer requires any Terraform edit. Verified no-op for current scheduling — both old and new labels resolve to node1 today. Docs updated to match: AGENTS.md, compute.md, overview.md, proxmox-inventory.md, k8s-portal agent-guidance string.
2026-04-22 13:43:07 +00:00 · 2026-04-22 13:43:07 +00:00 · e2146e6916
commit e2146e6916
parent 134d6b9a82
12 changed files with 52 additions and 36 deletions
--- a/docs/architecture/overview.md
+++ b/docs/architecture/overview.md
@ -139,7 +139,7 @@ The Kubernetes cluster consists of 5 nodes:
 - **k8s-node1 (201)**: 16c/32GB GPU node with Tesla T4 passthrough, tainted for GPU workloads only
 - **k8s-node2-4 (202-204)**: 8c/32GB workers running general-purpose workloads

-GPU passthrough on node1 uses PCIe device 0000:06:00.0, with Kubernetes taint `nvidia.com/gpu=true:NoSchedule` and label `gpu=true` to ensure only GPU-requesting pods schedule there.
+GPU passthrough on node1 uses PCIe device 0000:06:00.0. The NVIDIA GPU Operator's gpu-feature-discovery auto-labels whichever node carries the card with `nvidia.com/gpu.present=true`; `null_resource.gpu_node_config` taints the same set of nodes with `nvidia.com/gpu=true:PreferNoSchedule`. No hostname is hardcoded — moving the card to a different node requires no Terraform edits.

 ### Service Organization