infra/.claude
Viktor Barzin b64d8d6168 cluster-health: add #47 ghost-disk drift check; fix immich_search set -e crash
Check #47 "Proxmox CSI — Ghost-Disk Drift": per node, compares the real
virtio-scsi CSI disk count in `qm config <vmid>` (SSH PVE) against the
attached proxmox-CSI VolumeAttachments k8s tracks. Catches orphaned "ghost"
disks left by failed detaches (query-pci QMP timeouts) that the scheduler's
28-LUN guard can't see — exactly the drift that wedged the MAM grabber on
node3 (13 tracked vs 23 real). PASS reconciled; WARN drift>0 or real 20-24;
FAIL real ≥25 (near the LUN cap). Already flagging node6 at 21 disks.
Single `qm list` + one `qm config` per VM keeps it ~3s (the naive
once-per-VM version timed out the parallel runner).

Also fixes a PRE-EXISTING set -e crash in #46 immich_search (introduced by
138894cd): `pct=$(kubectl exec … | tr -d ' ')` and the dur_ms probe were
unguarded, so with `set -o pipefail` a non-zero psql/exec propagated and
tripped `set -e`, killing the check before json_add. It silently dropped
from every parallel report and broke --serial entirely (whole run aborted).
Guarded both substitutions with `|| true`; the existing `=~` numeric checks
already handle the empty case. immich_search now reports PASS/WARN instead
of vanishing.
2026-06-05 09:19:10 +00:00
..
agents k8s-version-upgrade: decompose into Job chain to fix self-preemption 2026-05-11 23:54:22 +00:00
commands [ci skip] update kubectl skill to use local kubeconfig 2026-02-07 13:42:35 +00:00
reference docs(k8s-dashboard): dashboard SSO as-built (Option B multi-issuer apiserver) 2026-06-05 09:19:09 +00:00
scripts rename weekly-backup → daily-backup across scripts, timers, services, and docs [ci skip] 2026-04-13 18:37:04 +00:00
skills cluster-health: add #47 ghost-disk drift check; fix immich_search set -e crash 2026-06-05 09:19:10 +00:00
calendar-query.py sync regenerated providers.tf + upstream changes 2026-03-22 02:56:04 +02:00
CLAUDE.md immich: fix slow context search — prewarm clip_index + latency alert/healthcheck 2026-06-05 09:19:07 +00:00
home-assistant-sofia.py [ci skip] Add ha-sofia Home Assistant deployment to skills 2026-02-07 21:26:05 +00:00
home-assistant.py add claude [ci skip] 2026-02-06 20:10:02 +00:00
internet-mode-used_DO_NOT_REMOVE_MANUALLY_SECURITY_RISK add claude [ci skip] 2026-02-06 20:10:02 +00:00
pfsense.py [ci skip] Add pfSense firewall management skill 2026-02-14 12:42:10 +00:00
settings.json add claude files [ci skip] 2026-01-18 15:40:43 +00:00