infra/stacks/immich
Viktor Barzin f201e4573e immich: fix slow context search — prewarm clip_index + latency alert/healthcheck
Context (smart) search latency was caused by the 665MB vchord clip_index
decaying out of PG shared_buffers (~33% resident -> ~1.8s cold ANN reads vs
~4ms warm), NOT by yesterday's ML MODEL_TTL/clip-keepalive change (CLIP textual
is warm ~15ms on GPU). The postStart prewarm runs once at pod start and
pg_prewarm.autoprewarm only re-warms at startup, so the index decays under job
buffer-pressure over days.

- clip-index-prewarm CronJob (immich, */5): pg_prewarm('clip_index') keeps the
  whole index resident -> searches stay ~4ms.
- immich-search-probe CronJob (immich, */5): times a random-vector ANN query +
  reads clip_index residency, pushes gauges to the Pushgateway.
- Prometheus alerts ImmichSmartSearchSlow / ImmichClipIndexColdCache /
  ImmichSearchProbeStale (+ inhibition when the probe is stale).
- cluster_healthcheck.sh check #46 check_immich_search (TOTAL_CHECKS 45->46).
- Docs: infra CLAUDE.md immich note, monitoring.md, cluster-health skill.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 09:19:07 +00:00
..
.terraform.lock.hcl kms: revert files accidentally bundled into the docs commit 2026-06-01 10:36:49 +00:00
chart_values.tpl [redis] Migrate live RW consumers off bare redis.redis hostname 2026-04-19 12:42:36 +00:00
frame.tf immich: GPU-accelerate video transcoding (NVENC + NVDEC) 2026-05-29 18:05:34 +00:00
main.tf immich: fix slow context search — prewarm clip_index + latency alert/healthcheck 2026-06-05 09:19:07 +00:00
providers.tf cluster-health: emergency-stop Keel + roll back image downgrades + quota raises 2026-05-26 18:48:50 +00:00
secrets [ci skip] Move Terraform modules into stack directories 2026-02-22 14:38:14 +00:00
terragrunt.hcl migrate all secrets from SOPS to Vault KV 2026-03-14 17:15:48 +00:00