infra/.claude/skills
Viktor Barzin f201e4573e immich: fix slow context search — prewarm clip_index + latency alert/healthcheck
Context (smart) search latency was caused by the 665MB vchord clip_index
decaying out of PG shared_buffers (~33% resident -> ~1.8s cold ANN reads vs
~4ms warm), NOT by yesterday's ML MODEL_TTL/clip-keepalive change (CLIP textual
is warm ~15ms on GPU). The postStart prewarm runs only once, at pod start, and
pg_prewarm.autoprewarm only re-warms at startup, so over days buffer pressure
from jobs evicts the index from shared_buffers.

- clip-index-prewarm CronJob (immich, */5): pg_prewarm('clip_index') keeps the
  whole index resident -> searches stay ~4ms.
- immich-search-probe CronJob (immich, */5): times a random-vector ANN query +
  reads clip_index residency, pushes gauges to the Pushgateway.
- Prometheus alerts ImmichSmartSearchSlow / ImmichClipIndexColdCache /
  ImmichSearchProbeStale (+ inhibition when the probe is stale).
- cluster_healthcheck.sh check #46 check_immich_search (TOTAL_CHECKS 45->46).
- Docs: infra CLAUDE.md immich note, monitoring.md, cluster-health skill.
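
The probe described above can be sketched roughly as follows. This is a minimal illustration, not the code in this commit: the metric names, the helper functions and the SQL strings are assumptions. pg_prewarm and pg_buffercache are real PostgreSQL extensions, and the Pushgateway does accept text-format metrics at /metrics/job/<job>, but the actual CronJobs may use different statements and names.

```python
import time
from typing import Callable

# Hypothetical job label; the commit does not show the real one.
JOB = "immich-search-probe"

# Assumed SQL. pg_prewarm() loads a relation into shared_buffers;
# pg_buffercache lists the pages currently held there.
PREWARM_SQL = "SELECT pg_prewarm('clip_index');"
RESIDENT_PAGES_SQL = (
    "SELECT count(*) FROM pg_buffercache "
    "WHERE relfilenode = pg_relation_filenode('clip_index'::regclass) "
    "AND reldatabase = (SELECT oid FROM pg_database "
    "WHERE datname = current_database());"
)


def timed(query: Callable[[], object]) -> float:
    """Run a callable (e.g. one ANN query) and return elapsed seconds."""
    start = time.perf_counter()
    query()
    return time.perf_counter() - start


def residency(resident_pages: int, total_pages: int) -> float:
    """Fraction of the index resident in shared_buffers, clamped to [0, 1]."""
    if total_pages <= 0:
        return 0.0
    return max(0.0, min(1.0, resident_pages / total_pages))


def pushgateway_body(latency_s: float, resident_ratio: float, now: float) -> str:
    """Render three gauges in the Prometheus text exposition format.

    The metric names are hypothetical. The timestamp gauge is what a
    staleness alert such as ImmichSearchProbeStale could compare to time().
    """
    lines = [
        f"immich_search_probe_latency_seconds {latency_s:.6f}",
        f"immich_clip_index_resident_ratio {resident_ratio:.4f}",
        f"immich_search_probe_last_run_timestamp_seconds {now:.0f}",
    ]
    return "\n".join(lines) + "\n"


def pushgateway_url(base: str, job: str = JOB) -> str:
    """Build the Pushgateway grouping URL for this job."""
    return f"{base.rstrip('/')}/metrics/job/{job}"
```

A run would time one ANN query with timed(), read the resident and total page counts, and then PUT or POST the body to pushgateway_url(...). Pushing a last-run timestamp alongside the latency lets the alerting side tell "search is slow" apart from "the probe stopped reporting".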

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 09:19:07 +00:00
add-user docs(add-user): update skill with actual working flow (no auto TF apply) 2026-03-18 00:28:46 +00:00
archived [claude-agent-service] Remove orphaned DevVM SSH key wiring 2026-04-18 13:31:15 +00:00
cluster-health immich: fix slow context search — prewarm clip_index + latency alert/healthcheck 2026-06-05 09:19:07 +00:00
disk-wear [skill] Add /disk-wear skill for periodic disk write analysis 2026-04-17 11:15:26 +00:00
extend-vm-storage [ci skip] Import Claude skills into OpenClaw moltbot 2026-02-17 21:09:12 +00:00
home-assistant docs: add ha-sofia Version Control add-on to HA skill [ci skip] 2026-04-12 11:37:02 +01:00
k8s-ndots-search-domain-nxdomain-flood [ci skip] Add pfsense-dnsmasq-interface-binding skill, update ndots skill to v1.1.0 2026-02-16 22:30:57 +00:00
pfsense [ci skip] Add pfSense firewall management skill 2026-02-14 12:42:10 +00:00
post-mortem feat: add incident management system with user reporting 2026-04-14 20:00:31 +00:00
setup-project feat(setup-project): auto-PR working Dockerfiles back to upstream 2026-04-17 18:12:13 +00:00
upgrade-state k8s-version-upgrade: switch detection cron from weekly to daily 2026-05-18 18:29:08 +00:00
uptime-kuma [uptime-kuma] Codify MySQL monitor (id=663) via idempotent sync CronJob 2026-04-18 12:04:17 +00:00