infra

Author	SHA1	Message	Date
Viktor Barzin	4944e508aa	Bucket C: enroll 5 raw-deploy stacks in Keel auto-update * beads-server: 3 Deployments — extended V1 lifecycle blocks to V2 + KEEL_IGNORE_IMAGE; namespace label. * llama-cpp: 1 Deployment — extended V1→V2; namespace label. * novelapp: namespace label only (Deployment has non-standard lifecycle without V1 dns_config — drift expected, accept for now). * plotting-book: namespace label only (same as novelapp). * trading-bot: namespace label only (same as novelapp). immich deferred — the bulk-add script's brace-counter got confused by a HEREDOC in the file, inserting a lifecycle block in the wrong position. Needs manual per-Deployment editing. The 3 ns-only stacks (novelapp, plotting-book, trading-bot) will see their Deployments mutated by Kyverno but their TF lifecycle doesn't yet ignore the keel annotations. Expected behavior: drift visible in terragrunt plan, applied-state oscillates with Kyverno re-injecting. Acceptable starting point; per-Deployment lifecycle work to fix. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-22 14:16:55 +00:00
Viktor Barzin	7e1580ba8c	recruiter-responder: deploy stack + llama-cpp qwen3-8b + openclaw plugin mount Three coupled changes for the new recruiter-responder pipeline: 1. stacks/llama-cpp/: add qwen3-8b text-only model to llama-swap. Uses unsloth/Qwen3-8B-GGUF Q4_K_M, 16k context, no mmproj. Refactored the download Job script + cmd renderer to handle text_only=true (skip mmproj download + --mmproj flag). The 3 existing vision models stay on text_only=false; no behaviour change for them. 2. stacks/recruiter-responder/: new stack. Namespace, 2 ExternalSecrets (app secrets from secret/recruiter-responder, DB creds from Vault DB engine static-creds/pg-recruiter-responder), Deployment (replicas=1, Recreate -- IMAP IDLE + APScheduler want single leader), Service ClusterIP. Image: forgejo.viktorbarzin.me/viktor/recruiter-responder. 3. stacks/openclaw/: add init container `install-recruiter-plugin` that uses the recruiter-responder image to copy the .mjs plugin into /home/node/.openclaw/extensions/recruiter-api/ on NFS. Couples plugin version to the recruiter-responder image tag. Also injects RECRUITER_RESPONDER_URL + RECRUITER_RESPONDER_TOKEN env vars (token from openclaw-secrets.recruiter_responder_bearer_token, optional). Pre-apply checklist for recruiter-responder stack: - Vault: seed secret/recruiter-responder with webhook_bearer_token, imap_{me,spam}_{user,pass}, smtp_password, claude_agent_token, task_webhook_token. - Vault: add secret/openclaw.recruiter_responder_bearer_token (same as above webhook_bearer_token). - dbaas: create DB recruiter_responder + role recruiter_responder, and Vault DB-engine role static-creds/pg-recruiter-responder. - Build + push image via Woodpecker (recruiter-responder repo CI). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-22 14:16:46 +00:00
Viktor Barzin	6e7fe96a40	infra/llama-cpp: benchmark report + -fa flag fix Phase 7 of the vision-LLM benchmark plan. Adds: - docs/benchmarks/2026-05-10-vision-llm.md — curated report (TL;DR, per-model analysis, top-N agreement, cost vs cloud APIs, sample captions). Verdict: qwen3vl-4b for the request path (3.55 s p50, 100% parse, decisive top-N distro); qwen3vl-8b for caption polish. - docs/benchmarks/benchmark-2026-05-10-1424.json — raw 300-row dump for diff-checking against future runs. - main.tf: -fa -> -fa on (b9085 llama.cpp removed the no-value form of the flash-attention flag; without the value llama-server exits before serving any request). - llama-cpp.md architecture doc links the report so future operators land on the deployed-and-evaluated model from one entry point. 300/300 calls, 0 parse errors, 33m32s wall on a single T4 with the GPU exclusively allocated. immich-ml was scaled to 0 for the run (node1 RAM constraint, not GPU - bumping node1 RAM is tracked as a follow-up). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-22 14:16:41 +00:00
Viktor Barzin	9c617e6d38	infra/llama-cpp: add stack — llama-swap fronting Qwen3-VL + MiniCPM-V Single Deployment of mostlygeek/llama-swap:cuda hot-swaps three GGUF vision models (qwen3vl-8b, minicpm-v-4-5, qwen3vl-4b) at one OpenAI-compat /v1 endpoint on Service llama-swap.llama-cpp.svc. Idle TTL 10min so models unload between benchmark batches. Storage: NFS-RWX from /srv/nfs-ssd/llamacpp (30Gi). One-shot download Job pulls Q4_K_M GGUF + mmproj per model, creates stable model.gguf / mmproj.gguf symlinks so the llama-swap config is filename-agnostic, then warms the kernel page cache. GPU: nvidia.com/gpu=1 = whole T4 — operator must scale immich-ml to 0 during benchmark windows. wait_for_rollout=false so apply doesn't block on GPU availability. Initial use case: vision-LLM benchmark for instagram-poster candidate scoring; future consumers (HA, agentic tooling) hit the same endpoint via LiteLLM at the gateway. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-22 14:16:40 +00:00

4 commits