infra

Viktor Barzin	a3eb309e26	2026-06-28 09:32:28 +00:00	CI: ci/woodpecker/push/default pipeline successful

calico: fix empty Whisker UI — allow whisker egress to the kube-dns ClusterIP

Real root cause of the 2026-06-28 "Whisker UI empty" incident (the watchdog added in `8d1d2fb9` was treating a symptom).

The tigera operator's own `whisker` NetworkPolicy is policyTypes:[Ingress,Egress]; its egress allows DNS only to the kube-dns pods (podSelector k8s-app=kube-dns). But whisker-backend resolves goldmane.calico-system.svc via the kube-dns ClusterIP (10.96.0.10), and Calico drops UDP DNS to a ClusterIP under a podSelector-only egress rule.

Verified in an isolated repro:
- from the whisker pod's netns, ClusterIP DNS = 100% timeout while direct kube-dns pod-IP DNS = OK;
- a pod with no egress policy resolves fine;
- a test pod with the operator's podSelector-only egress rule reproduces the failure, and adding an ipBlock(ClusterIP) egress rule flips it to 100% ok.

whisker-backend resolves goldmane once in the brief startup window before the policy programs, holds its long-lived gRPC stream, and only re-resolves when that stream breaks (e.g. a node-reboot blip); then the blocked ClusterIP DNS wedges its Go resolver and the UI goes empty. The durable aggregator (separate pod, unrestricted namespace) was never affected.

Fix: additive egress NetworkPolicy whisker-allow-dns-clusterip (whisker -> 10.96.0.10/32 on 53 UDP+TCP); k8s egress policies are additive so the operator NP is untouched. The whisker-watchdog CronJob is kept as a backstop (repurposed comment).

Applied + verified: ClusterIP DNS from the whisker netns now 8/8 ok, whisker-backend 0 errors, flow API returns 828 flows / the namespace list. Docs (runbook + CLAUDE.md) updated to the real root cause.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
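
The fix described in this commit can be sketched as a manifest. The following is a minimal illustration, not the manifest from the repository: the policy name, the ClusterIP, and the port/protocol pairs come from the commit message, while the namespace (`calico-system`) and the pod label (`k8s-app: whisker`) are assumptions that should be checked against the running whisker pod.

```python
import json

# Values taken from the commit message.
POLICY_NAME = "whisker-allow-dns-clusterip"
KUBE_DNS_CLUSTER_IP = "10.96.0.10"

# Assumptions, NOT confirmed by the source: the namespace the whisker pod
# runs in and the label that selects it.
NAMESPACE = "calico-system"
WHISKER_LABELS = {"k8s-app": "whisker"}


def build_dns_clusterip_policy():
    """Return an egress-only NetworkPolicy allowing DNS to the kube-dns ClusterIP."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": POLICY_NAME, "namespace": NAMESPACE},
        "spec": {
            "podSelector": {"matchLabels": WHISKER_LABELS},
            # Egress only: the policy adds an allow rule alongside the
            # operator-managed policy instead of modifying it.
            "policyTypes": ["Egress"],
            "egress": [
                {
                    # ipBlock matches the ClusterIP itself, which a
                    # podSelector-only rule does not cover.
                    "to": [{"ipBlock": {"cidr": f"{KUBE_DNS_CLUSTER_IP}/32"}}],
                    "ports": [
                        {"protocol": "UDP", "port": 53},
                        {"protocol": "TCP", "port": 53},
                    ],
                }
            ],
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests as well as YAML.
    print(json.dumps(build_dns_clusterip_policy(), indent=2))
```

Because kubectl reads JSON, the printed output could be piped to `kubectl apply -f -` for a quick test; how the repository itself manages the policy is not shown in this listing.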
apiserver-audit-logging.md	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
beads-auto-dispatch.md	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
breakglass-ssh.md	break-glass SSH: drop port-knock for exposed key-only :52222; version host config	2026-06-11 18:23:39 +00:00
breakglass-ui.md	claude-breakglass: in-cluster warm break-glass UI for the devvm	2026-06-12 21:40:17 +00:00
chrome-service-snapshot.md	workstation: per-user playwright browser MCP for all users, reproducible from git	2026-06-16 20:33:47 +00:00
claude-auth-renew-workstation.md	workstation: per-user long-lived Claude token to end concurrent-refresh logout	2026-06-28 08:07:43 +00:00
fan-control.md	fan-control docs: sync runbook/env/service/design to the HA-actuator + anti-flap model	2026-06-16 08:11:48 +00:00
forgejo-open-signups.md	docs(forgejo): runbook reflects Authentik disabled + zero-click GitHub	2026-06-19 17:37:46 +00:00
forgejo-registry-breakglass.md	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
forgejo-registry-rebuild-image.md	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
forgejo-registry-setup.md	forgejo pulls: route *.viktorbarzin.me to Technitium, drop /etc/hosts pins [ci skip]	2026-06-10 07:56:31 +00:00
goldmane-flow-trail.md	calico: fix empty Whisker UI — allow whisker egress to the kube-dns ClusterIP	2026-06-28 09:32:28 +00:00
grow-pve-nfs-lv.md	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
homelab-vault-onboarding.md	docs(homelab-vault): rebuild snippet uses cli/VERSION, not git describe	2026-06-28 09:05:49 +00:00
immich-transcode-bitrate.md	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
job-hunter.md	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
k8s-node-auto-upgrades.md	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
k8s-version-upgrade.md	k8s-upgrade: reclaim+auto-prune kubeadm /etc/kubernetes/tmp leak; correct crash root cause to etcd IO (not OIDC)	2026-06-25 15:23:15 +00:00
kms-public-exposure.md	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
mailserver-pfsense-haproxy.md	pfsense: SNI-routed internal 443 — mail.viktorbarzin.me serves webmail everywhere	2026-06-10 18:41:07 +00:00
mailserver-proxy-protocol.md	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
nextcloud-add-archive.md	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
nfs-prerequisites.md	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
offboard-user.md	workstation: emo direct master push — allow-then-audit [ci skip]	2026-06-10 14:53:43 +00:00
pfsense-unbound.md	dns: pfSense forward-zone for viktorbarzin.me, nodes fully stock [ci skip]	2026-06-10 08:32:34 +00:00
proxmox-host.md	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
r730-ram-upgrade-272gb.md	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
registry-rebuild-image.md	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
registry-vm.md	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
restore-etcd.md	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
restore-full-cluster.md	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
restore-lvm-snapshot.md	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
restore-mysql.md	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
restore-postgresql.md	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
restore-pvc-from-backup.md	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
restore-vault.md	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
restore-vaultwarden.md	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
scale-k8s-cluster.md	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
security-incident.md	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
synology-storage.md	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
t3-drop-attribution.md	t3: connection logging across the path for drop attribution	2026-06-11 13:48:10 +00:00
t3-version-bump.md	docs: t3-migrate-idle runbook section + service-catalog + design status	2026-06-21 12:40:46 +00:00
technitium-apply.md	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
vault-raft-leader-deadlock.md	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
vault-token-renew-devvm.md	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
woodpecker-onboard-forgejo-repo.md	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00