|
All checks were successful
ci/woodpecker/push/default Pipeline was successful
The chain hardcoded master→node4→node3→node2→node1→postflight and SSHed by
FQDN. It silently SKIPPED node5/node6 (added 2026-05-26) — postflight would
have failed even if reachable — and node5/node6 had no .viktorbarzin.lan DNS
records, so the chain couldn't SSH to them at all.
Refactor (upgrade-step.sh):
- Worker set + order derived live from `kubectl get nodes` (worker_nodes /
next_pending_worker), so EVERY worker still off-target is upgraded and a
newly-joined node is covered with zero script change.
- SSH targets are node InternalIPs (ssh_target), removing the dependency on
node DNS records entirely — a new node is reachable the moment it joins.
- The two remaining hardcoded loops (containerd skew, apt-repo rewrite) now
enumerate workers/all-nodes dynamically too.
- Topology preserved: master-drain Job runs on the first worker; every
worker-drain Job runs on the already-upgraded k8s-master (self-preemption
invariant intact).
- next_pending_worker returns 0 explicitly on the no-match path — the
`while read … done < <(…)` loop exits 1 at EOF, which under set -e would
abort the LAST worker's Job before it spawns postflight (cluster upgraded
but no cleanup / in_flight reset). Caught in review.
Docs (runbook + architecture + headers) updated to the dynamic topology.
NOTE: nodes still need the k8s-upgrade SSH public key in authorized_keys; it was
deployed to node4/5/6 by hand this session. Baking it into node provisioning
(so new nodes get it automatically) is the remaining follow-up.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
||
|---|---|---|
| .. | ||
| adr | ||
| architecture | ||
| benchmarks | ||
| plans | ||
| post-mortems | ||
| runbooks | ||
| known-issues.md | ||
| README.md | ||
Infrastructure Documentation
This repository contains the configuration and documentation for a homelab Kubernetes cluster running on Proxmox. The infrastructure hosts 70+ services managed declaratively with Terraform and Terragrunt.
Quick Reference
Network Ranges
- Physical Network:
192.168.1.0/24- Physical devices and host network - Management VLAN 10:
10.0.10.0/24- Infrastructure VMs and management - Kubernetes VLAN 20:
10.0.20.0/24- Kubernetes cluster network
Key URLs
- Public:
viktorbarzin.me - Internal:
viktorbarzin.lan
Architecture Documentation
| Document | Description |
|---|---|
| Overview | Infrastructure overview, hardware specs, VM inventory, and service catalog |
| Networking | Network topology, VLANs, routing, and firewall rules |
| VPN | Headscale mesh VPN and Cloudflare Tunnel configuration |
| Storage | Proxmox host NFS, Proxmox CSI (LVM-thin + LUKS2), and persistent volume management |
| Authentication | Authentik SSO, OIDC flows, and service integration |
| Security | CrowdSec IPS, Kyverno policies, and security controls |
| Monitoring | Prometheus, Grafana, Loki, and observability stack |
| Secrets Management | HashiCorp Vault integration and secret rotation |
| CI/CD | Woodpecker CI pipeline and deployment automation |
| Backup & DR | Backup strategy, disaster recovery, and restore procedures |
| Compute | Proxmox VMs, GPU passthrough, K8s resource management, and VPA |
| Databases | PostgreSQL, MySQL, Redis, and database operators |
| Multi-tenancy | Namespace isolation, tier system, and resource quotas |
Operations
- Runbooks - Step-by-step operational procedures
- Plans - Infrastructure change plans and rollout strategies
Getting Started
- Review the Overview for a high-level understanding
- Read the Networking doc to understand connectivity
- Check Compute for resource management patterns
- Explore individual architecture docs based on your area of interest