Refresh CONTEXT.md against current repo + cluster reality (grill-with-docs):
- Module taxonomy rewrite: drop fictional k8s_app/helm_app/postgres_app
factory modules (never existed); name the real four (ingress_factory,
nfs_volume, anubis_instance, setup_tls_secret) + the shared / Stack-local
/ flat distinction; flag vestigial modules/kubernetes/<app> dirs.
- Rename "Ingress auth tier" -> "Ingress auth" (discrete modes, not tiers);
reserve "tier" for State tier + Namespace tier only.
- Add local-path entry (cluster default SC; node-local footgun warning).
- Add concepts: Keel, Diun, CNPG/pg-cluster, MetalLB LB-IP split, Calico.
- Add "policy" ambiguity flag (Kyverno vs Calico NetworkPolicy vs Vault/RBAC).
- Fix node count 5 -> 7 (k8s-master + k8s-node1..6).
Doc-sync (same commit per repo rules):
- overview.md: replace fictional factory modules with the real shared
modules + the flat/stack-local pattern.
- .claude/CLAUDE.md: drop dead nfs-proxmox column from the storage decision
table + stale cross-reference (vault migrated off it 2026-04-25).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Remove every hardcoded reference to k8s-node1 that pinned GPU
scheduling to a specific host:
- GPU workload nodeSelectors: gpu=true -> nvidia.com/gpu.present=true
(frigate, immich, whisper, piper, ytdlp, ebook2audiobook, audiblez,
audiblez-web, nvidia-exporter, gpu-pod-exporter). The NFD label is
auto-applied by gpu-feature-discovery on any node carrying an
NVIDIA PCI device, so the selector follows the card.
- null_resource.gpu_node_config: rewrite to enumerate NFD-labeled
nodes (feature.node.kubernetes.io/pci-10de.present=true) and taint
each with nvidia.com/gpu=true:PreferNoSchedule. Drop the manual
'kubectl label gpu=true' since NFD handles labeling.
- MySQL anti-affinity: kubernetes.io/hostname NotIn [k8s-node1] ->
nvidia.com/gpu.present NotIn [true]. Same intent (keep MySQL off
the GPU node) but portable when the card relocates.
Net effect: moving the GPU card between nodes no longer requires any
Terraform edit. Verified no-op for current scheduling — both old and
new labels resolve to node1 today.
Docs updated to match: AGENTS.md, compute.md, overview.md,
proxmox-inventory.md, k8s-portal agent-guidance string.
TrueNAS VM 9000 was operationally decommissioned 2026-04-13; NFS has been
served by Proxmox host (192.168.1.127) since. This commit scrubs remaining
references from active docs. VM 9000 itself remains on PVE in stopped state
pending user decision on deletion.
In-session cleanup already landed: reverse-proxy ingress + Cloudflare record
removed; Technitium DNS records deleted; Vault truenas_{api_key,ssh_private_key}
purged; homepage_credentials.reverse_proxy.truenas_token removed;
truenas_homepage_token variable + module deleted; Loki + Dashy cleaned;
config.tfvars deprecated DNS lines removed; historical-name comment added to
the nfs-truenas StorageClass (48 bound PVs, immutable name — kept).
Historical records (docs/plans/, docs/post-mortems/, .planning/) intentionally
untouched — they describe state at a point in time.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>