llama-cpp: restore replicas to 1; fire-planner: fix llama-swap URL

llama-cpp was scaled to 0 during the 2026-05-25 IO-storm recovery (TEMP-SCALEDOWN). The cluster is now stable, and only frigate competes for the GPU on k8s-node1. Restoring replicas to 1 unblocks fire-planner's Reddit examples ingest, which needs qwen3-8b for structured extraction.

fire-planner's llama_cpp_base_url default pointed at a non-existent service:port (llama-cpp:8000); the real service is `llama-swap` on port 8080. Because of this, the first bulk Job on 2026-05-28 exited 0 with 0 rows. This commit corrects the default.
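A wrong service name or port in this default only surfaced at Job runtime, as an empty result. One possible guard, sketched here as an assumption and not part of this commit, is a Terraform `validation` block that rejects any value not aimed at the `llama-swap` service on port 8080 at plan time. The regex and error text are illustrative; a real deployment may want a looser pattern so the value can be overridden per environment.

```hcl
# Hypothetical hardening of the variable changed in this commit.
# The validation block is an illustration only; it is not in the diff below.
variable "llama_cpp_base_url" {
  type        = string
  description = "llama-swap /v1/chat/completions endpoint for primary LLM extraction"
  default     = "http://llama-swap.llama-cpp.svc.cluster.local:8080/v1/chat/completions"

  validation {
    # can() turns a regex() non-match error into false, so a bad URL
    # fails the plan with the message below.
    condition = can(regex(
      "^http://llama-swap\\.llama-cpp\\.svc\\.cluster\\.local:8080/",
      var.llama_cpp_base_url
    ))
    error_message = "The llama_cpp_base_url value must target the llama-swap service on port 8080."
  }
}
```

This only checks the shape of the URL. It does not confirm that the Service exists in the cluster, so a check on the Job's row count would still be needed to catch a silent empty run.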
2026-05-29 06:20:03 +00:00 · b10233975b
commit b10233975b
parent 478629c1ee
2 changed files with 14 additions and 9 deletions
--- a/stacks/fire-planner/main.tf
+++ b/stacks/fire-planner/main.tf
@@ -620,8 +620,11 @@ resource "kubernetes_config_map" "grafana_fire_planner_datasource" {

 variable "llama_cpp_base_url" {
  type        = string
-  description = "llama-cpp /v1/chat/completions endpoint for primary LLM extraction"
-  default     = "http://llama-cpp.llama-cpp.svc.cluster.local:8000/v1/chat/completions"
+  description = "llama-swap /v1/chat/completions endpoint for primary LLM extraction"
+  # Service is named `llama-swap`, NOT `llama-cpp` — the proxy in front of
+  # the actual llama-cpp pod. Port 8080. (Initial 2026-05-28 value pointed
+  # at a non-existent service:port and the bulk Job produced 0 rows.)
+  default = "http://llama-swap.llama-cpp.svc.cluster.local:8080/v1/chat/completions"
 }

 variable "claude_agent_service_url" {