infra/.claude/reference
Viktor Barzin e2146e6916 gpu: schedule off NFD label, not k8s-node1 hostname
Remove every hardcoded reference to k8s-node1 that pinned GPU
scheduling to a specific host:

- GPU workload nodeSelectors: gpu=true -> nvidia.com/gpu.present=true
  (frigate, immich, whisper, piper, ytdlp, ebook2audiobook, audiblez,
  audiblez-web, nvidia-exporter, gpu-pod-exporter). The NFD label is
  auto-applied by gpu-feature-discovery on any node carrying an
  NVIDIA PCI device, so the selector follows the card.

- null_resource.gpu_node_config: rewrite to enumerate NFD-labeled
  nodes (feature.node.kubernetes.io/pci-10de.present=true) and taint
  each with nvidia.com/gpu=true:PreferNoSchedule. Drop the manual
  'kubectl label gpu=true' since NFD handles labeling.

- MySQL anti-affinity: kubernetes.io/hostname NotIn [k8s-node1] ->
  nvidia.com/gpu.present NotIn [true]. Same intent (keep MySQL off
  the GPU node) but portable when the card relocates.

Net effect: moving the GPU card between nodes no longer requires any
Terraform edit. Verified no-op for current scheduling — both old and
new labels resolve to node1 today.

Docs updated to match: AGENTS.md, compute.md, overview.md,
proxmox-inventory.md, k8s-portal agent-guidance string.
2026-04-22 13:43:07 +00:00
..
authentik-state.md update VPN architecture docs and Authentik state reference 2026-04-06 16:26:21 +03:00
github-api.md [ci skip] Sunset Drone CI: remove all artifacts, DNS, configs, and references 2026-02-23 19:38:55 +00:00
known-issues.md add infrastructure agent team: 8 specialized agents + 14 diagnostic scripts 2026-03-15 02:01:07 +00:00
patterns.md [docs] Update anti-AI and rybbit docs after rewrite-body removal 2026-04-17 21:43:13 +00:00
proxmox-inventory.md gpu: schedule off NFD label, not k8s-node1 hostname 2026-04-22 13:43:07 +00:00
service-catalog.md [registry] Stop recurring orphan OCI-index incidents — detection + prevention + recovery 2026-04-19 17:08:28 +00:00
upgrade-config.json chore: add untracked stacks, scripts, and agent configs 2026-04-15 09:33:06 +00:00