Viktor Barzin fd0f4a0365 fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip]

6d224861 came from a --no-checkout worktree whose empty index made the
commit drop every file except two. This restores 05b50d2b's full tree and
correctly adds stacks/stem95su/gdrive-sync.tf + the service-catalog stem95su
entry. Forward-only (parent=6d224861, no force-push); [ci skip] since the
live infra was never applied from the broken commit.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-09 08:45:33 +00:00

8 KiB

Raw Blame History

name	description	author	version	date
k8s-container-image-caching	Set up and troubleshoot container image pull-through caches in Kubernetes. Use when: (1) ImagePullBackOff for non-Docker-Hub images routed through a wildcard mirror, (2) containerd has deprecated `registry.mirrors."*"` catching all image pulls, (3) need to add pull-through cache for a new upstream registry, (4) `mirrors` cannot be set when `config_path` is provided error in containerd, (5) containerd 1.6.x vs 1.7.x config_path compatibility issues, (6) kubectl shows correct image tag but container runs old code, (7) local registry mirror caches stale images, (8) imagePullPolicy: Always doesn't force fresh pulls, (9) containerd config has mirror that intercepts pulls serving stale images. Covers multi-registry pull-through cache setup (Docker Registry v2) and cache bypass via image digest pinning.	Claude Code	1.0.0	2026-02-22

Kubernetes Container Image Caching

Pull-Through Cache Setup

Problem

Docker Registry v2 can only proxy one upstream registry per instance. A common misconfiguration is using a containerd wildcard mirror (registry.mirrors."*") pointing to a single Docker Hub proxy, which breaks pulls from ghcr.io, quay.io, registry.k8s.io, and other registries -- they get routed to the Docker Hub proxy which can't serve them, causing ImagePullBackOff.

Context / Trigger Conditions

ImagePullBackOff for images from ghcr.io, quay.io, registry.k8s.io, or other non-Docker-Hub registries
Containerd config has deprecated [plugins."io.containerd.grpc.v1.cri".registry.mirrors."*"]
Error: failed to load plugin io.containerd.grpc.v1.cri: invalid plugin config: mirrors cannot be set when config_path is provided
Need to migrate from deprecated wildcard mirrors to modern config_path approach

Solution

1. Run one Registry v2 container per upstream

Each upstream needs its own Docker Registry v2 instance on a different port:

Port	Registry	Container Name
5000	docker.io	registry
5010	ghcr.io	registry-ghcr
5020	quay.io	registry-quay
5030	registry.k8s.io	registry-k8s
5040	reg.kyverno.io	registry-kyverno

Config for non-Docker-Hub proxies (no auth needed -- they're public):

version: 0.1
storage:
  cache:
    blobdescriptor: inmemory
  filesystem:
    rootdirectory: /var/lib/registry
http:
  addr: :5000
proxy:
  remoteurl: https://ghcr.io  # change per registry

docker run -p 5010:5000 -d --restart always --name registry-ghcr \
  -v /etc/docker-registry/ghcr/config.yml:/etc/docker/registry/config.yml registry:2

2. Replace deprecated wildcard mirror with `config_path`

Instead of:

# DEPRECATED - breaks non-Docker-Hub registries
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."*"]
  endpoint = ["http://10.0.20.10:5000"]

Use the modern config_path approach:

[plugins."io.containerd.grpc.v1.cri".registry]
  config_path = "/etc/containerd/certs.d"

Then create per-registry hosts.toml files:

mkdir -p /etc/containerd/certs.d/docker.io
cat > /etc/containerd/certs.d/docker.io/hosts.toml <<'EOF'
server = "https://registry-1.docker.io"

[host."http://10.0.20.10:5000"]
  capabilities = ["pull", "resolve"]
EOF

Registries without a hosts.toml entry fall through to direct pull (no breakage).

3. Critical: `config_path` and `mirrors` cannot coexist

Containerd will refuse to start the CRI plugin if both config_path and any mirrors entries exist in config.toml. You must remove ALL mirrors entries (including the [plugins."...registry.mirrors"] parent section) before setting config_path.

This is especially dangerous on containerd 1.6.x (used on older nodes like k8s-master) where the config format is slightly different. If unsure, either:

Don't use config_path on that node (skip the pull-through cache)
Remove the entire mirrors section first, then add config_path

4. Static IP for registry VM

If the registry VM uses DHCP and gets the wrong IP, all mirrors break. Use static IP via cloud-init ipconfig0 = "ip=10.0.20.10/24,gw=10.0.20.1" instead of DHCP.

Verification

# Test each proxy responds
for port in 5000 5010 5020 5030 5040; do
  curl -s http://10.0.20.10:$port/v2/_catalog
done

# Test containerd can pull through cache
crictl pull ghcr.io/some/image:tag

# Check containerd logs for mirror usage
journalctl -u containerd --since "5 minutes ago" | grep -i "mirror\|registry"

Notes

Fallback behavior: If the local mirror is unreachable, containerd falls through to direct pull from the upstream server URL. This provides graceful degradation.
GC crontabs: Add weekly garbage collection for each registry container, staggered to avoid I/O spikes.
Hourly restart: Registry v2 has known memory leak issues; hourly restart mitigates.
Cache is ephemeral: VM recreation clears the cache. Images re-cache on demand.

Cache Bypass / Stale Image Fix

Problem

Kubernetes pods continue running old Docker images even after pushing new versions with the same tag (e.g., :latest). This happens when a local registry mirror caches images and serves stale versions, ignoring imagePullPolicy: Always.

Context / Trigger Conditions

Pod is running but application code is outdated
docker push succeeded with new layers
kubectl describe pod shows correct image tag
Cluster has a local registry mirror configured (e.g., in containerd config)
imagePullPolicy: Always doesn't fix the issue
Nodes configured with registry mirrors at /etc/containerd/certs.d/ or similar

Solution

1. Get the image digest after pushing

docker push viktorbarzin/myimage:latest
# Output includes: latest: digest: sha256:abc123... size: 856

2. Use digest instead of tag in deployment

# Terraform
container {
  # Use digest to bypass local registry cache
  image             = "docker.io/viktorbarzin/myimage@sha256:abc123..."
  image_pull_policy = "Always"
  name              = "myimage"
}

# Kubernetes YAML
containers:
  - name: myimage
    image: docker.io/viktorbarzin/myimage@sha256:abc123...
    imagePullPolicy: Always

3. Apply and restart

terraform apply -target=module.kubernetes_cluster.module.myservice
kubectl rollout restart deployment/myservice -n mynamespace

Why This Works

Registry mirrors match by tag, not digest
When you specify a digest, the node must fetch that exact manifest
The mirror may not have the digest cached, forcing a pull from upstream
Even if cached, the digest guarantees the exact image version

Verification

# Check the pod is using the new image
kubectl get pod -n mynamespace -o jsonpath='{.items[*].spec.containers[*].image}'

# Verify application behavior reflects new code
kubectl exec -n mynamespace deploy/myservice -- <verification-command>

Example

Before (problematic):

image = "docker.io/viktorbarzin/audiblez-web:latest"

After (fixed):

image = "docker.io/viktorbarzin/audiblez-web@sha256:4d0e2c839555e2229bc91a0b1273569bac88529e8b3c3cadad3c3cf9d865fa29"

Notes

You must update the digest each time you push a new image
Consider automating digest extraction in CI/CD pipelines
This is a workaround; ideally fix the registry mirror configuration
To find your registry mirror config: cat /etc/containerd/config.toml on nodes
Common mirror locations: /etc/containerd/certs.d/docker.io/hosts.toml

Diagnosing Registry Mirror Issues

# On a k8s node, check containerd config
cat /etc/containerd/config.toml | grep -A5 mirrors

# Check if mirror is intercepting
crictl pull docker.io/library/alpine:latest --debug 2>&1 | grep -i mirror

# List cached images on node
crictl images | grep myimage

8 KiB Raw Blame History

Kubernetes Container Image Caching

Pull-Through Cache Setup

Problem

Context / Trigger Conditions

Solution

1. Run one Registry v2 container per upstream

2. Replace deprecated wildcard mirror with config_path

3. Critical: config_path and mirrors cannot coexist

4. Static IP for registry VM

Verification

Notes

Cache Bypass / Stale Image Fix

Problem

Context / Trigger Conditions

Solution

1. Get the image digest after pushing

2. Use digest instead of tag in deployment

3. Apply and restart

Why This Works

Verification

Example

Notes

Diagnosing Registry Mirror Issues

References

8 KiB

Raw Blame History

2. Replace deprecated wildcard mirror with `config_path`

3. Critical: `config_path` and `mirrors` cannot coexist