Stage 1 of moving private images off the registry:2 container at registry.viktorbarzin.me:5050 (which has hit distribution#3324 corruption 3x in 3 weeks) onto Forgejo's built-in OCI registry. No cutover risk: pods still pull from the existing registry until Phase 3.

What changes:

* Forgejo deployment: memory 384Mi→1Gi, PVC 5Gi→15Gi (cap 50Gi). Explicit FORGEJO__packages__ENABLED + CHUNKED_UPLOAD_PATH (defensive, v11 default-on).
* ingress_factory: max_body_size variable was declared but never wired in after the nginx→Traefik migration. Now creates a per-ingress Buffering middleware when set; default null = no limit (preserves existing behavior). Forgejo ingress sets max_body_size=5g to allow multi-GB layer pushes.
* Cluster-wide registry-credentials Secret: 4th auths entry for forgejo.viktorbarzin.me, populated from Vault secret/viktor/forgejo_pull_token (cluster-puller PAT, read:package). Existing Kyverno ClusterPolicy syncs cluster-wide; no policy edits.
* Containerd hosts.toml redirect: forgejo.viktorbarzin.me → in-cluster Traefik LB 10.0.20.200 (avoids hairpin NAT for in-cluster pulls). Cloud-init for new VMs + scripts/setup-forgejo-containerd-mirror.sh for existing nodes.
* Forgejo retention CronJob (0 4 * * *): keeps newest 10 versions per package + always :latest. First 7 days dry-run (DRY_RUN=true); flip the local in cleanup.tf after log review.
* Forgejo integrity probe CronJob (*/15): same algorithm as the existing registry-integrity-probe. Existing Prometheus alerts (RegistryManifestIntegrityFailure et al) made instance-aware so they cover both registries during the bake.
* Docs: design+plan in docs/plans/, setup runbook in docs/runbooks/.

Operational note: the apply order is non-trivial because the new Vault keys (forgejo_pull_token, forgejo_cleanup_token, secret/ci/global/forgejo_*) must exist BEFORE terragrunt apply in the kyverno + monitoring + forgejo stacks. The setup runbook documents the bootstrap sequence.

Phase 1 (per-project dual-push pipelines) follows in subsequent commits. Bake clock starts when the last project goes dual-push.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
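
The retention rule described in the commit message (keep the newest 10 versions per package, always keep :latest, dry-run first) can be sketched roughly as below. This is not the CronJob's actual code: the function names `select_prunable` and `prune` are hypothetical, the input is assumed to be one tag per line sorted newest first, and it is an assumption that `latest` does not count against the quota of 10.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the retention selection, not the real CronJob.

# Reads tags on stdin (ASSUMED: one per line, newest first) and prints the
# tags that fall outside the retention window.
# $1 = number of versions to keep (defaults to 10, as in the commit message).
select_prunable() {
  local keep="${1:-10}" kept=0 tag
  while IFS= read -r tag; do
    if [ -z "$tag" ]; then
      continue
    fi
    # "latest" is always retained; ASSUMPTION: it does not use up a keep slot.
    if [ "$tag" = "latest" ]; then
      continue
    fi
    if [ "$kept" -lt "$keep" ]; then
      kept=$(( kept + 1 ))
      continue
    fi
    printf '%s\n' "$tag"
  done
}

# Reads prunable tags on stdin. With DRY_RUN=true (the default here) it only
# logs what would be removed.
prune() {
  local tag
  while IFS= read -r tag; do
    if [ "${DRY_RUN:-true}" = "true" ]; then
      echo "DRY_RUN: would delete $tag"
    else
      # A real job would call the registry's delete endpoint at this point.
      echo "deleting $tag"
    fi
  done
}
```

A run such as `printf '%s\n' latest v3 v2 v1 | select_prunable 2 | prune` would only log `v1` as a deletion candidate while DRY_RUN stays at its default.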
59 lines
1.7 KiB
Bash
Executable file
#!/usr/bin/env bash
# One-shot deployment of the forgejo.viktorbarzin.me containerd hosts.toml
# entry across every k8s node. Cloud-init only fires on VM provision, so
# existing nodes need this manual rollout.
#
# What it does, per node:
#   1. drain (ignore-daemonsets, delete-emptydir-data, force)
#   2. ssh in: mkdir + write /etc/containerd/certs.d/forgejo.viktorbarzin.me/hosts.toml
#   3. systemctl restart containerd
#   4. uncordon
#
# hosts.toml is documented as hot-reloaded but the post-2026-04-19
# containerd corruption playbook calls for an explicit restart so the
# config is unambiguously in effect. Running drain/uncordon around it
# avoids pulling against an in-flight containerd restart.
#
# Re-run is safe: writes are idempotent.

set -euo pipefail

CERTS_DIR=/etc/containerd/certs.d/forgejo.viktorbarzin.me
HOSTS_TOML='server = "https://forgejo.viktorbarzin.me"

[host."https://10.0.20.200"]
  capabilities = ["pull", "resolve"]
'

NODES=$(kubectl get nodes -o name | sed 's|^node/||')
if [[ -z "$NODES" ]]; then
  echo "ERROR: no nodes returned from kubectl get nodes" >&2
  exit 1
fi

for n in $NODES; do
  echo "=== $n ==="
  kubectl drain "$n" --ignore-daemonsets --delete-emptydir-data --force --grace-period=60

  # The outer heredoc is unquoted on purpose: $CERTS_DIR and $HOSTS_TOML are
  # expanded locally before the script is sent to the node.
  ssh -o StrictHostKeyChecking=accept-new "wizard@$n" sudo bash <<EOF
set -euo pipefail
mkdir -p "$CERTS_DIR"
cat > "$CERTS_DIR/hosts.toml" <<'TOML'
$HOSTS_TOML
TOML
systemctl restart containerd
EOF

  kubectl uncordon "$n"

  # Wait (up to ~60s) for the node to report Ready before moving to the next one.
  ready=false
  for _ in {1..30}; do
    if kubectl get node "$n" -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}' | grep -q True; then
      echo "  node Ready"
      ready=true
      break
    fi
    sleep 2
  done

  # Stop the rollout instead of silently continuing past an unhealthy node.
  if [[ "$ready" != true ]]; then
    echo "ERROR: $n did not report Ready within 60s; aborting rollout" >&2
    exit 1
  fi
done

echo "All nodes updated."
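
After the rollout, it can be useful to confirm that a node actually carries the expected redirect. The following is a minimal sketch of such a check, not part of the script above: the function name `hosts_toml_ok` is hypothetical, and it only greps for the three lines the script writes rather than parsing TOML.

```bash
#!/usr/bin/env bash
# Hypothetical post-rollout check, assuming the hosts.toml layout written by
# the rollout script (server line, one [host."https://<mirror>"] stanza,
# and a capabilities list that includes "pull").
#
# $1 = path to hosts.toml, $2 = registry hostname, $3 = mirror IP or host.
# Returns 0 when all expected lines are present, 1 otherwise.
hosts_toml_ok() {
  local file="$1" registry="$2" mirror="$3"
  [ -f "$file" ] || return 1
  # -F matches the strings literally, so dots and brackets are not patterns.
  grep -Fq "server = \"https://${registry}\"" "$file" || return 1
  grep -Fq "[host.\"https://${mirror}\"]" "$file" || return 1
  # Indentation is not significant in TOML, so leading whitespace is allowed.
  grep -Eq '^[[:space:]]*capabilities[[:space:]]*=.*"pull"' "$file"
}
```

On a node this could be called as `hosts_toml_ok /etc/containerd/certs.d/forgejo.viktorbarzin.me/hosts.toml forgejo.viktorbarzin.me 10.0.20.200`. It only verifies the file on disk; it does not prove that containerd has loaded it.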