New variant documents ghost Running pods with frozen processes after kured
rolling reboots. Key diagnostic: Running 1/1 but zero listening sockets
from ss -tlnp. Fix: force-delete pods to get fresh NFS mounts.
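The diagnostic above can be scripted. A minimal sketch, assuming the text printed by ss -tlnp has already been captured from inside the pod; the function name count_listening_sockets and the sample outputs are illustrative and not taken from the skill:

```python
def count_listening_sockets(ss_output: str) -> int:
    """Count LISTEN rows in the text printed by `ss -tlnp`."""
    count = 0
    for line in ss_output.splitlines():
        fields = line.split()
        # Data rows start with the socket state; the header row starts with "State".
        if fields and fields[0] == "LISTEN":
            count += 1
    return count


# Illustrative samples (not captured from a real pod).
HEALTHY = """State   Recv-Q  Send-Q  Local Address:Port  Peer Address:Port  Process
LISTEN  0       128     0.0.0.0:8080        0.0.0.0:*          users:(("app",pid=1,fd=3))
"""
FROZEN = "State   Recv-Q  Send-Q  Local Address:Port  Peer Address:Port  Process\n"

print(count_listening_sockets(HEALTHY))  # 1
print(count_listening_sockets(FROZEN))   # 0: Running 1/1 but nothing is listening
```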
Drone CI has been fully replaced by Woodpecker CI at ci.viktorbarzin.me.
Destroys 12 K8s resources and removes DNS records, NFS exports, Uptime Kuma
monitor, dashboard entry, and all code/doc references across 18 files.

- Add gpu=true label to Terraform (nvidia null_resource alongside taint)
- Improve API server OIDC config to detect value changes, not just flag presence
- Add policy_hash trigger to audit-policy so rule changes auto-reapply
- Enable prometheus-node-exporter sub-chart, delete unused Ansible playbook
- Document full node rebuild procedure in CLAUDE.md
- Save Talos Linux migration evaluation for future reference
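The OIDC change in the list above amounts to comparing flag values rather than checking whether a flag exists. A minimal sketch of that idea, assuming the kube-apiserver arguments are available as a list of --flag=value strings; the helper name flag_needs_update and the issuer URLs are hypothetical:

```python
def flag_needs_update(args: list[str], flag: str, desired: str) -> bool:
    """Return True when `flag` is missing or set to something other than `desired`."""
    prefix = f"--{flag}="
    for arg in args:
        if arg.startswith(prefix):
            # Presence alone is not enough: compare the actual value.
            return arg[len(prefix):] != desired
    return True  # flag absent


# Hypothetical current arguments, for illustration only.
current = ["--oidc-issuer-url=https://old.example.com/", "--oidc-client-id=kubernetes"]

print(flag_needs_update(current, "oidc-issuer-url", "https://new.example.com/"))  # True
print(flag_needs_update(current, "oidc-client-id", "kubernetes"))                 # False
```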
Consolidated traefik-http3-quic, traefik-udp-cross-namespace, and
traefik-plugin-download-failure-404 into a single skill with sections
for HTTP/3 (QUIC), UDP cross-namespace routing, and plugin download
failure troubleshooting.

Default pipeline now uses terragrunt run --all to apply all stacks
instead of the broken terraform apply -target=module.kubernetes_cluster.
TLS renewal pipeline stripped of unnecessary Terraform download/init
since renew2.sh is pure shell (certbot + Cloudflare DNS).

Delete 20 orphaned module directories and 3 stray files from
modules/kubernetes/ that are no longer referenced by any stack.
Remove 7 root-level legacy files including the empty tfstate,
27MB terraform zip, commented-out main.tf, and migration notes.
Clean up commented-out dockerhub_secret and oauth-proxy references
in blog, travel_blog, and city-guesser stacks. Remove stale
frigate config.yaml entry from .gitignore. Remove ephemeral
docs/plans/ directory.

Remove the module "xxx" { source = "./module" } indirection layer
from all 66 service stacks. Resources are now defined directly in
each stack's main.tf instead of through a wrapper module.
- Merge module/main.tf contents into stack main.tf
- Apply variable replacements (var.tier -> local.tiers.X, renamed vars)
- Fix shared module paths (one fewer ../ at each level)
- Move extra files/dirs (factory/, chart_values, subdirs) to stack root
- Update state files to strip module.<name>. prefix
- Update CLAUDE.md to reflect flat structure
Verified: terragrunt plan shows 0 add, 0 destroy across all stacks.
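The state update in the list above boils down to rewriting resource addresses. A minimal sketch of the address rewrite only, assuming addresses of the form module.NAME.REST; the function name and the example addresses are hypothetical, and moving the entries in the real state would still need a tool such as terraform state mv:

```python
def strip_module_prefix(address: str, module_name: str) -> str:
    """Drop a leading `module.<module_name>.` from a Terraform resource address."""
    prefix = f"module.{module_name}."
    if address.startswith(prefix):
        return address[len(prefix):]
    return address  # already flat, or belongs to a different module


# Hypothetical addresses, for illustration only.
print(strip_module_prefix("module.blog.kubernetes_deployment.blog", "blog"))
# kubernetes_deployment.blog
print(strip_module_prefix("kubernetes_namespace.blog", "blog"))
# kubernetes_namespace.blog
```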
Reflect new directory structure where service modules live inside
their stack directories (stacks/<service>/module/) instead of
modules/kubernetes/<service>/. Update file paths, the instructions for
adding a service, and the stack structure documentation.

New skill: music-assistant-librespot-wrong-account
- Documents fix for Spotify playback failing with "librespot does not support
free accounts" when cached credentials point to wrong Spotify account
- Includes step-by-step solution: find container, inspect cache, clear and restart
Updated: home-assistant skill with Music Assistant addon details for ha-sofia

Two skills extracted from multi-user k8s access implementation:
- authentik-oidc-kubernetes: 6 gotchas for Authentik OIDC + kube-apiserver
- kubelet-static-pod-manifest-update: full restart cycle for static pod changes

- Add skill_secrets variable to moltbot module with HA tokens and
Uptime Kuma password as container env vars
- Install Python packages (requests, caldav, icalendar, uptime-kuma-api)
in init container with PYTHONPATH for main container access
- Update all skills to use python3 directly instead of ~/.venvs/claude
venv path that doesn't exist in the container
- Remove hardcoded Uptime Kuma password from skill, use env var

- Convert setup-project and extend-vm-storage from standalone .md
to directory-based SKILL.md format with YAML frontmatter
- Add symlink in moltbot init container to expose Claude skills
at ~/.openclaw/skills/ for auto-discovery by OpenClaw
- Update CLAUDE.md skill path references

Add MySQL datasource and 15-panel dashboard for DNS analytics:
queries over time, response codes, top domains/clients, response
times, blocked/NXDOMAIN domains. Enable Grafana dashboard sidecar
for auto-provisioning dashboards from ConfigMaps.

Four layers of noisy-neighbor protection using existing tier system:
- PriorityClasses (tier-0-core through tier-4-aux)
- LimitRange defaults auto-generated per namespace tier
- ResourceQuotas auto-generated per namespace tier
- PriorityClassName injection on pods via Kyverno mutate
Add custom quota overrides for the monitoring and crowdsec namespaces,
which exceed the default tier quotas.
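The default-plus-override lookup can be pictured as follows. This is a sketch only: the tier numbers assigned to each namespace, the quota figures, and the override values are made-up placeholders, not the values used in the Terraform code.

```python
# Placeholder numbers for illustration; the real quotas are defined elsewhere.
DEFAULT_QUOTAS = {
    0: {"cpu": "8", "memory": "16Gi"},
    4: {"cpu": "1", "memory": "2Gi"},
}
# Hypothetical per-namespace override.
OVERRIDES = {
    "monitoring": {"memory": "32Gi"},
}


def quota_for(namespace: str, tier: int) -> dict[str, str]:
    """Start from the tier default, then apply any per-namespace override."""
    quota = dict(DEFAULT_QUOTAS[tier])
    quota.update(OVERRIDES.get(namespace, {}))
    return quota


print(quota_for("monitoring", 0))  # {'cpu': '8', 'memory': '32Gi'}
print(quota_for("blog", 4))        # {'cpu': '1', 'memory': '2Gi'}
```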