[ci skip] remove low-traffic pull-through caches (registry.k8s.io, quay.io, reg.kyverno.io)

Pull-through cache at 10.0.20.10 was serving corrupted/truncated images for low-traffic registries, breaking VPA certgen (ImagePullBackOff) and previously causing Kyverno image pull failures.

Kept: docker.io (port 5000) and ghcr.io (port 5010). Both are high traffic, and Docker Hub rate limits make caching essential.

Removed from cloud-init template and all 5 live nodes:
- registry.k8s.io (port 5030): 14 system images, very low churn
- quay.io (port 5020): 11 images
- reg.kyverno.io (port 5040): 5 images

The registry containers on the 10.0.20.10 VM still run but nodes no longer route to them. They can be stopped/removed from the VM later.
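
A minimal sketch of how the remaining caches could be sanity-checked after this change. The host and port mapping come from the message above; the plain `http` scheme, the helper names `cache_url` and `probe`, and the idea of probing at all are assumptions for illustration, not part of this commit. It relies only on the Registry HTTP API v2 convention that `GET /v2/` answers 200 (or 401 when auth is required).

```python
"""Sketch: probe the pull-through caches that remain in service."""
import urllib.error
import urllib.request

CACHE_HOST = "10.0.20.10"

# Upstream registry -> cache port, as listed in the commit message.
KEPT = {"docker.io": 5000, "ghcr.io": 5010}
REMOVED = {"quay.io": 5020, "registry.k8s.io": 5030, "reg.kyverno.io": 5040}


def cache_url(port: int, host: str = CACHE_HOST, scheme: str = "http") -> str:
    """Build the v2 API base URL for one cache (scheme is an assumption)."""
    return f"{scheme}://{host}:{port}/v2/"


def probe(url: str, timeout: float = 3.0) -> bool:
    """Return True if the endpoint answers like a v2 registry (200 or 401)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        # A 401 still means a registry is listening; it just wants auth.
        return err.code == 401
    except OSError:
        # Covers URLError, connection refused, and timeouts.
        return False
```

Calling `probe(cache_url(port))` for each port in `KEPT` from a machine that can reach the cache VM would confirm the two retained caches still respond. A successful probe says nothing about image integrity, which was the actual failure here.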
2026-03-01 21:46:41 +00:00 · de598996f1
commit de598996f1
parent 53be356f41
2 changed files with 11 additions and 18 deletions
--- a/.claude/CLAUDE.md
+++ b/.claude/CLAUDE.md
@@ -122,7 +122,7 @@ terraform fmt -recursive                       # Format all
 ## Infrastructure
 - Proxmox hypervisor (192.168.1.127) — see `.claude/reference/proxmox-inventory.md` for full VM table
 - Kubernetes cluster: 5 nodes (k8s-master + k8s-node1-4, v1.34.2), GPU on node1 (Tesla T4)
-- Docker registry pull-through cache at `10.0.20.10` (ports 5000/5010/5020/5030/5040)
+- Docker registry pull-through cache at `10.0.20.10` — only docker.io (port 5000) and ghcr.io (port 5010) are active. quay.io/registry.k8s.io/reg.kyverno.io caches disabled (caused corrupted images).
 - GPU workloads need: `node_selector = { "gpu": "true" }` + `toleration { key = "nvidia.com/gpu", value = "true", effect = "NoSchedule" }`

 ### Node Rebuild Procedure