infra/llama-cpp: benchmark report + -fa flag fix

Phase 7 of the vision-LLM benchmark plan. Adds: - docs/benchmarks/2026-05-10-vision-llm.md — curated report (TL;DR, per-model analysis, top-N agreement, cost vs cloud APIs, sample captions). Verdict: qwen3vl-4b for the request path (3.55 s p50, 100% parse, decisive top-N distro); qwen3vl-8b for caption polish. - docs/benchmarks/benchmark-2026-05-10-1424.json — raw 300-row dump for diff-checking against future runs. - main.tf: -fa -> -fa on (b9085 llama.cpp removed the no-value form of the flash-attention flag; without the value llama-server exits before serving any request). - llama-cpp.md architecture doc links the report so future operators land on the deployed-and-evaluated model from one entry point. 300/300 calls, 0 parse errors, 33m32s wall on a single T4 with the GPU exclusively allocated. immich-ml was scaled to 0 for the run (node1 RAM constraint, not GPU - bumping node1 RAM is tracked as a follow-up). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 15:03:16 +00:00 · 2026-05-10 15:03:16 +00:00 · 6e7fe96a40
commit 6e7fe96a40
parent 3da01e6e1e
23 changed files with 8504 additions and 11 deletions
--- a/docs/architecture/llama-cpp.md
+++ b/docs/architecture/llama-cpp.md
@ -14,6 +14,11 @@ choosing between **Qwen3-VL-8B**, **MiniCPM-V-4.5**, and
 Future consumers (Home Assistant, agentic tooling) can hit the same
 endpoint via LiteLLM at the cluster gateway.
 First benchmark run (2026-05-10): see
 `infra/docs/benchmarks/2026-05-10-vision-llm.md`. Verdict: **qwen3vl-4b**
 for the request path (3.55 s p50, 100% parse, decisive top-N
 distribution). qwen3vl-8b for caption polish on top picks.
 ## Why llama.cpp + llama-swap (not Ollama)
 Verified across 7+7 research/challenger subagents (2026-05-10):
--- a/docs/benchmarks/2026-05-10-vision-llm.md
+++ b/docs/benchmarks/2026-05-10-vision-llm.md
@ -0,0 +1,253 @@
 # Vision-LLM benchmark — Malaga / Seville album
 **Run ID:** `2026-05-10-1424` · **Date:** 2026-05-10 · **Operator:** wizard
 100 photos randomly sampled (seed=42) from the Immich album `🇪🇸 Malaga
 Seville` (`46565b85-7580-4ac1-91a6-1ece2cf8634d`, 1556 image assets +
 9 videos), scored by three local vision-LLMs served by `llama-swap`
 on a single Tesla T4. Goal: pick a model to wire into
 `instagram-poster`'s `/candidates` ranking path.
 ## TL;DR
 **Recommendation: `qwen3vl-4b`.**
 - **Fastest** by a wide margin (3.55 s p50, 60% of qwen3vl-8b),
  important once this is in the request path of `/candidates`.
 - **100% structured-output success** — same as the other two; GBNF
  grammar enforcement worked across the board.
 - **Captions are competitive** with the 8B model in qualitative review
  (tied or close on 8/10 sampled photos; 8B wins on Flair, 4B wins on
  Latency).
 - **Most decisive scorer** — 47/100 photos got IG-fit=9 vs 17 for
  qwen3vl-8b and 9 for minicpm. We get more signal at the top end
  for ranking.
 Use qwen3vl-8b for *manual* caption refinement (top-1 of the day) if
 caption polish matters. Use minicpm-v-4-5 for nothing immediate — it's
 the most conservative scorer and the slowest at high quantiles, with
 no offsetting wins in this dataset.
 ## Setup
 - Hardware: 1× Tesla T4 (16 GiB VRAM), `nvidia.com/gpu` time-slicing
  enabled (replicas=100), pod scheduled on `k8s-node1`.
 - Server: `mostlygeek/llama-swap:cuda` (ships llama.cpp `b9085-046e28443`)
  on `llama-swap.llama-cpp.svc.cluster.local:8080`.
 - Models: GGUF Q4_K_M, mmproj F16 except qwen3vl-4b which used the
  Q8_0 mmproj (alphabetically first matching the glob).
 - Image prep: EXIF-transposed, long-edge resized to 1024 px, JPEG q=90,
  base64-embedded as `image_url` data URLs.
 - Generation: `temperature=0`, `top_k=1`, `enable_thinking=false`,
  GBNF grammar pinning the JSON schema (6 fields, 1–10 ints, ≤8 tags).
 - Run isolation: `immich-machine-learning` scaled to 0 for the
  duration to avoid noisy GPU contention. *(Diagnostic note: the
  scheduling failure that triggered this was actually node1 RAM —
  not GPU — at 94% allocated. Time-slicing was already on. Bumping
  node1 RAM is tracked as a follow-up.)*
 ## Headline numbers
 | model | n | parse_ok | p50 latency | p95 latency | median IG-fit | median aesthetic |
 |-------|---|----------|-------------|-------------|---------------|------------------|
 | **qwen3vl-4b** | 100 | 100% | **3.55 s** | 4.06 s | 8.0 | 8.0 |
 | minicpm-v-4-5 | 100 | 100% | 5.62 s | 6.00 s | 7.0 | 8.0 |
 | qwen3vl-8b | 100 | 100% | 5.98 s | 6.64 s | 7.0 | 8.0 |
 Total wall time for the run: **33 m 32 s** (300 calls + 3 cold loads
 of ~30 s each).
 ## What each model is good at
 ### qwen3vl-4b — fast and decisive
 - p50 3.55 s — comfortable for adding to `/candidates` request path.
 - IG-fit distribution skews right (47 nines), spreading 6 → 9 fairly
  evenly, which is what you want from a *ranker*.
 - Captions are emoji-friendly, hashtag-friendly, sometimes
  hallucinatory (e.g. labelled a Seville street as "Barcelona's
  colourful streets" once).
 - Failure mode to watch: occasional double-down on the same caption
  template ("Lost in the tiles. 🌿" repeated across two unrelated
  blue-dress photos).
 ### minicpm-v-4-5 — conservative, terse
 - Most conservative scorer: 65% of photos got IG-fit=7. Only 9 nines.
  Less useful as a top-N ranker because the top is squashed.
 - Fastest p95 of the three (6.0 s) but slower p50 than qwen3vl-4b.
 - Captions are short and lower-case ("azulejo dreams.",
  "sunshine & secrets") — distinct voice but less Instagram-native.
 ### qwen3vl-8b — most polished captions
 - Best subject identification (specifically named "Metropol Parasol"
  and "Plaza de España" by name where the others said "modern
  architecture" / "plaza").
 - Captions read well: "Coffee & calm vibes ☕️", "where modern meets
  historic under a brilliant sky".
 - Slowest p50 (5.98 s) and tightest score distribution (median 7,
  17 nines) — middle of the pack as a ranker.
 ## Top-10 agreement (Kendall-tau-style overlap)
 How many of each model's top-10 IG-fit picks appear in another
 model's top-10:
 | pair | overlap |
 |------|---------|
 | qwen3vl-4b ↔ qwen3vl-8b | 5/10 |
 | minicpm-v-4-5 ↔ qwen3vl-4b | 4/10 |
 | minicpm-v-4-5 ↔ qwen3vl-8b | 4/10 |
 Read: there's moderate but not strong agreement. The models pick
 roughly half the same "best" photos and half different ones. For
 ranking, that's a healthy sign — they're not collapsing to a single
 notion of "good", so combining their scores would add real signal.
 ## Cost-equivalent context
 Approximate cost to score the same 100 photos via cloud APIs
 (prompt ≈ 1100 tokens incl. image, completion ≈ 100 tokens):
 | backend | input | output | per-100 photos |
 |---------|-------|--------|----------------|
 | Local llama-swap on T4 | — | — | ≈ $0.04 (electricity, ~70 W × 7 min) |
 | Anthropic Haiku 4.5 | $1.00/M | $5.00/M | ≈ $0.15 |
 | Anthropic Sonnet 4.6 | $3.00/M | $15.00/M | ≈ $0.45 |
 | Google Gemini 2.5 Flash | $0.30/M | $2.50/M | ≈ $0.05 |
 Local is competitive with Gemini Flash on marginal cost. The case
 for keeping it local is privacy (Immich originals never leave the
 LAN), no rate-limits, and no per-call quota planning. The case
 against is the GPU is finite — adding this to a request path means
 sharing T4 time with frigate, ytdlp, and (when we restore it)
 immich-ml. With time-slicing on, that's tractable but each tenant
 gets fewer full-GPU kernels.
 ## Sample captions (10 representative photos, all 3 models)
 `9f9c9faf-...` — man at ornate arch in Seville
 - **minicpm-v-4-5** (7/8): sunshine & secrets
 - **qwen3vl-4b** (9/8): Found the perfect spot to pause. 🌞 #travel #architecture
 - **qwen3vl-8b** (9/8): Standing in Seville's magic, framed by history and color.
 `0d53158b-...` — Metropol Parasol
 - **minicpm-v-4-5** (9/8): Seville's skyward statement.
 - **qwen3vl-4b** (9/8): Seville's iconic modern structure against a vibrant blue sky.
 - **qwen3vl-8b** (9/8): The Metropol Parasol in Seville, where modern meets historic under a brilliant sky.
 `23137c1c-...` — coffee shot
 - **minicpm-v-4-5** (8/7): a morning pick-me-up and a smile.
 - **qwen3vl-4b** (8/7): Coffee & chill. Perfectly imperfect moment.
 - **qwen3vl-8b** (9/8): Coffee & calm vibes ☕️
 `3376ee87-...` — woman against tiled wall
 - **minicpm-v-4-5** (7/8): azulejo dreams.
 - **qwen3vl-4b** (9/8): Lost in the tiles. 🌿
 - **qwen3vl-8b** (9/8): Blue tiles, golden light, and a moment of quiet grace.
 `4f2fea45-...` — courtyard
 - **minicpm-v-4-5** (7/8): hidden gems of seville
 - **qwen3vl-4b** (7/8): Timeless beauty in a Spanish courtyard. 🌿
 - **qwen3vl-8b** (7/8): A serene courtyard in Seville, where palm trees sway under the sun.
 `ea713729-...` — flower-market street (qwen3vl-4b confused location)
 - **minicpm-v-4-5** (7/8): Seville's hidden gems.
 - **qwen3vl-4b** (7/8): Walking through *Barcelona's* colorful streets, backlit by golden hour.
 - **qwen3vl-8b** (7/8): Walking through Seville's vibrant streets, lavender in hand.
 The full list of 10 sample sets is in the auto-generated section
 below; the raw 300-row JSON is at `benchmark-2026-05-10-1424.json`
 in this directory.
 ## Operational cost during the run
 - llama-swap pod (1× T4 wholly allocated for the duration): ~33 min.
 - Immich-ML downtime: ~33 min. New uploads weren't auto-tagged or
  CLIP-embedded during this window. No user-visible impact (Immich
  search against already-indexed assets still worked via pgvector).
 - Network egress: zero — Immich originals stayed on the LAN, all
  scoring traffic was in-cluster.
 ## Reproducibility
 ```bash
 DATA_DIR=/tmp/benchmark \
  IMMICH_API_KEY=… \
  LLAMA_SWAP_URL=http://localhost:18080 \
  poetry run python -m instagram_poster.benchmark run \
    --album-id 46565b85-7580-4ac1-91a6-1ece2cf8634d \
    --models qwen3vl-8b,minicpm-v-4-5,qwen3vl-4b \
    --limit 100 --random-seed 42 --run-id 2026-05-10-1424
 ```
 The same `--random-seed` reproduces the photo sample exactly. Prompt
 version `4bbb7e7721da24d9` is the SHA-256 of the system prompt + user
 prompt + GBNF grammar; rerunning under the same prompt version against
 the same seed should produce within-noise identical scores (the models
 themselves are temperature=0, top_k=1).
 ## Next steps
 - **Wire `qwen3vl-4b` into `instagram-poster`** as an additional ranking
  signal alongside CLIP-based recency in `/candidates`. Cache the score
  per asset_id so we don't re-pay 4 s on every list refresh.
 - **Bump k8s-node1 RAM** so immich-ml + llama-swap can co-exist (drain
  → resize → uncordon, with kubelet `systemReserved` adjusted in
  `stacks/infra/main.tf`).
 - **Re-benchmark with shared GPU** once node1 RAM is bumped, to get
  realistic latency numbers when the T4 is also under load from
  immich-ml and frigate.
 - **Front llama-swap with LiteLLM** so Home Assistant and any other
  consumer can hit one OpenAI-compat gateway. Track separately.
 ---
 ## Auto-generated report
 Below is the unedited output of `python -m instagram_poster.benchmark
 report --run-id 2026-05-10-1424`, kept for diff-checking against
 future runs.
 ### Per-model summary
 | model | n | parse_ok % | error % | p50 latency | p95 latency | median IG-fit | median aesthetic |
 |-------|---|-----------|--------|------------|-------------|--------------|------------------|
 | minicpm-v-4-5 | 100 | 100.0 | 0.0 | 5617 ms | 5998 ms | 7.0 | 8.0 |
 | qwen3vl-4b | 100 | 100.0 | 0.0 | 3552 ms | 4063 ms | 8.0 | 8.0 |
 | qwen3vl-8b | 100 | 100.0 | 0.0 | 5981 ms | 6637 ms | 7.0 | 8.0 |
 ### Score histograms (instagram_fit_score 1–10)
 #### minicpm-v-4-5
 ```
 1: (0)   2: (0)   3: (0)   4: (0)   5: (0)
 6: ███████ (7)
 7: █████████████████████████████████████████████████████████████████ (65)
 8: ███████████████████ (19)
 9: █████████ (9)
 10: (0)
 ```
 #### qwen3vl-4b
 ```
 1: (0)   2: (0)   3: (0)   4: (0)   5: (0)
 6: █████ (5)
 7: ████████████████ (16)
 8: ████████████████████████████████ (32)
 9: ███████████████████████████████████████████████ (47)
 10: (0)
 ```
 #### qwen3vl-8b
 ```
 1: (0)   2: (0)   3: (0)   4: (0)   5: (0)
 6: ███████████ (11)
 7: ███████████████████████████████████████████████████████ (55)
 8: █████████████████ (17)
 9: █████████████████ (17)
 10: (0)
 ```
 ### Top-10 by IG-fit per model — see `benchmark-2026-05-10-1424.json`
 (Tables omitted from the curated report; available in the JSON dump
 alongside this file.)
--- a/docs/benchmarks/benchmark-2026-05-10-1424.json
+++ b/docs/benchmarks/benchmark-2026-05-10-1424.json
--- a/stacks/blog/.terraform.lock.hcl
+++ b/stacks/blog/.terraform.lock.hcl
@ -24,6 +24,14 @@ provider "registry.terraform.io/cloudflare/cloudflare" {
  ]
 }
 provider "registry.terraform.io/goauthentik/authentik" {
  version     = "2024.12.1"
  constraints = "~> 2024.10"
  hashes = [
    "h1:roBMd+gi+TGgikH/bMzEI8JfvJiMAQWt+8FmokCrQIs=",
  ]
 }
 provider "registry.terraform.io/hashicorp/helm" {
  version = "3.1.1"
  hashes = [
--- a/stacks/blog/backend.tf
+++ b/stacks/blog/backend.tf
@ -1,7 +1,7 @@
 # Generated by Terragrunt. Sig: nIlQXj57tbuaRZEa
 terraform {
  backend "pg" {
-    conn_str    = "postgres://terraform_state:SBlzGxotNUN6HH9d0S-m@10.0.20.200:5432/terraform_state?sslmode=disable"
+    conn_str    = "postgres://terraform_state:ts7DGcKmTTY-5ujz4mhh@10.0.20.200:5432/terraform_state?sslmode=disable"
    schema_name = "blog"
  }
 }
--- a/stacks/blog/providers.tf
+++ b/stacks/blog/providers.tf
@ -9,6 +9,10 @@ terraform {
      source  = "cloudflare/cloudflare"
      version = "~> 4"
    }
    authentik = {
      source  = "goauthentik/authentik"
      version = "~> 2024.10"
    }
  }
 }
--- a/stacks/cyberchef/.terraform.lock.hcl
+++ b/stacks/cyberchef/.terraform.lock.hcl
@ -24,6 +24,14 @@ provider "registry.terraform.io/cloudflare/cloudflare" {
  ]
 }
 provider "registry.terraform.io/goauthentik/authentik" {
  version     = "2024.12.1"
  constraints = "~> 2024.10"
  hashes = [
    "h1:roBMd+gi+TGgikH/bMzEI8JfvJiMAQWt+8FmokCrQIs=",
  ]
 }
 provider "registry.terraform.io/hashicorp/helm" {
  version = "3.1.1"
  hashes = [
--- a/stacks/cyberchef/backend.tf
+++ b/stacks/cyberchef/backend.tf
@ -1,7 +1,7 @@
 # Generated by Terragrunt. Sig: nIlQXj57tbuaRZEa
 terraform {
  backend "pg" {
-    conn_str    = "postgres://terraform_state:SBlzGxotNUN6HH9d0S-m@10.0.20.200:5432/terraform_state?sslmode=disable"
+    conn_str    = "postgres://terraform_state:ts7DGcKmTTY-5ujz4mhh@10.0.20.200:5432/terraform_state?sslmode=disable"
    schema_name = "cyberchef"
  }
 }
--- a/stacks/cyberchef/providers.tf
+++ b/stacks/cyberchef/providers.tf
@ -9,6 +9,10 @@ terraform {
      source  = "cloudflare/cloudflare"
      version = "~> 4"
    }
    authentik = {
      source  = "goauthentik/authentik"
      version = "~> 2024.10"
    }
  }
 }
--- a/stacks/dbaas/modules/dbaas/main.tf
+++ b/stacks/dbaas/modules/dbaas/main.tf
@ -1149,7 +1149,8 @@ resource "null_resource" "pg_terraform_state_db" {
  provisioner "local-exec" {
    command = <<-EOT
-      kubectl --kubeconfig ${var.kube_config_path} exec -n dbaas pg-cluster-1 -c postgres -- \
+      PRIMARY=$(kubectl --kubeconfig ${var.kube_config_path} get cluster -n dbaas pg-cluster -o jsonpath='{.status.currentPrimary}')
      kubectl --kubeconfig ${var.kube_config_path} exec -n dbaas $PRIMARY -c postgres -- \
        bash -c '
          psql -U postgres -tc "SELECT 1 FROM pg_catalog.pg_roles WHERE rolname = '"'"'terraform_state'"'"'" | grep -q 1 || \
            psql -U postgres -c "CREATE ROLE terraform_state WITH LOGIN PASSWORD '"'"'changeme-vault-will-rotate'"'"'"
@ -1173,7 +1174,8 @@ resource "null_resource" "pg_payslip_ingest_db" {
  provisioner "local-exec" {
    command = <<-EOT
-      kubectl --kubeconfig ${var.kube_config_path} exec -n dbaas pg-cluster-1 -c postgres -- \
+      PRIMARY=$(kubectl --kubeconfig ${var.kube_config_path} get cluster -n dbaas pg-cluster -o jsonpath='{.status.currentPrimary}')
      kubectl --kubeconfig ${var.kube_config_path} exec -n dbaas $PRIMARY -c postgres -- \
        bash -c '
          psql -U postgres -tc "SELECT 1 FROM pg_catalog.pg_roles WHERE rolname = '"'"'payslip_ingest'"'"'" | grep -q 1 || \
            psql -U postgres -c "CREATE ROLE payslip_ingest WITH LOGIN PASSWORD '"'"'changeme-vault-will-rotate'"'"'"
@ -1197,7 +1199,8 @@ resource "null_resource" "pg_job_hunter_db" {
  provisioner "local-exec" {
    command = <<-EOT
-      kubectl --kubeconfig ${var.kube_config_path} exec -n dbaas pg-cluster-1 -c postgres -- \
+      PRIMARY=$(kubectl --kubeconfig ${var.kube_config_path} get cluster -n dbaas pg-cluster -o jsonpath='{.status.currentPrimary}')
      kubectl --kubeconfig ${var.kube_config_path} exec -n dbaas $PRIMARY -c postgres -- \
        bash -c '
          psql -U postgres -tc "SELECT 1 FROM pg_catalog.pg_roles WHERE rolname = '"'"'job_hunter'"'"'" | grep -q 1 || \
            psql -U postgres -c "CREATE ROLE job_hunter WITH LOGIN PASSWORD '"'"'changeme-vault-will-rotate'"'"'"
@ -1209,6 +1212,35 @@ resource "null_resource" "pg_job_hunter_db" {
  }
 }
 # Postiz: 3 databases (postiz, temporal, temporal_visibility) all owned by the
 # `postiz` role. Bundled bitnami PostgreSQL was retired 2026-05-09 in favour of
 # this CNPG cluster — covered by postgresql-backup-per-db automatically.
 # Role password placeholder; Vault static role `pg-postiz` rotates 7d.
 resource "null_resource" "pg_postiz_dbs" {
  depends_on = [null_resource.pg_cluster]
  triggers = {
    role = "postiz"
    dbs  = "postiz,temporal,temporal_visibility"
  }
  provisioner "local-exec" {
    command = <<-EOT
      PRIMARY=$(kubectl --kubeconfig ${var.kube_config_path} get cluster -n dbaas pg-cluster -o jsonpath='{.status.currentPrimary}')
      kubectl --kubeconfig ${var.kube_config_path} exec -n dbaas $PRIMARY -c postgres -- \
        bash -c '
          psql -U postgres -tc "SELECT 1 FROM pg_catalog.pg_roles WHERE rolname = '"'"'postiz'"'"'" | grep -q 1 || \
            psql -U postgres -c "CREATE ROLE postiz WITH LOGIN PASSWORD '"'"'changeme-vault-will-rotate'"'"'"
          for db in postiz temporal temporal_visibility; do
            psql -U postgres -tc "SELECT 1 FROM pg_catalog.pg_database WHERE datname = '"'"'$db'"'"'" | grep -q 1 || \
              psql -U postgres -c "CREATE DATABASE $db OWNER postiz"
            psql -U postgres -c "GRANT ALL PRIVILEGES ON DATABASE $db TO postiz"
          done
        '
    EOT
  }
 }
 # Create wealthfolio_sync database for the SQLite→PG ETL sidecar that mirrors
 # Wealthfolio's daily_account_valuation/accounts/activities into PG so Grafana
 # can chart net worth, contributions, and growth.
--- a/stacks/homepage/.terraform.lock.hcl
+++ b/stacks/homepage/.terraform.lock.hcl
@ -24,6 +24,14 @@ provider "registry.terraform.io/cloudflare/cloudflare" {
  ]
 }
 provider "registry.terraform.io/goauthentik/authentik" {
  version     = "2024.12.1"
  constraints = "~> 2024.10"
  hashes = [
    "h1:roBMd+gi+TGgikH/bMzEI8JfvJiMAQWt+8FmokCrQIs=",
  ]
 }
 provider "registry.terraform.io/hashicorp/helm" {
  version = "3.1.1"
  hashes = [
--- a/stacks/homepage/backend.tf
+++ b/stacks/homepage/backend.tf
@ -1,7 +1,7 @@
 # Generated by Terragrunt. Sig: nIlQXj57tbuaRZEa
 terraform {
  backend "pg" {
-    conn_str    = "postgres://terraform_state:SBlzGxotNUN6HH9d0S-m@10.0.20.200:5432/terraform_state?sslmode=disable"
+    conn_str    = "postgres://terraform_state:ts7DGcKmTTY-5ujz4mhh@10.0.20.200:5432/terraform_state?sslmode=disable"
    schema_name = "homepage"
  }
 }
--- a/stacks/homepage/providers.tf
+++ b/stacks/homepage/providers.tf
@ -9,6 +9,10 @@ terraform {
      source  = "cloudflare/cloudflare"
      version = "~> 4"
    }
    authentik = {
      source  = "goauthentik/authentik"
      version = "~> 2024.10"
    }
  }
 }
--- a/stacks/instagram-poster/modules/instagram-poster/main.tf
+++ b/stacks/instagram-poster/modules/instagram-poster/main.tf
@ -103,6 +103,26 @@ resource "kubernetes_manifest" "external_secret" {
          secretKey = "IMMICH_PG_PASSWORD"
          remoteRef = { key = "instagram-poster", property = "immich_pg_password" }
        },
        # IG-archive dedup: tokens for Meta Graph API live ingest. Token
        # is the long-lived (60-day) IG user access token. The token-refresh
        # CronJob writes the rotated value back to Vault; ESO syncs it
        # back into this Secret on its 15m interval.
        {
          secretKey = "IG_GRAPH_TOKEN"
          remoteRef = { key = "instagram-poster", property = "ig_graph_long_lived_token" }
        },
        {
          secretKey = "IG_GRAPH_APP_ID"
          remoteRef = { key = "instagram-poster", property = "ig_graph_app_id" }
        },
        {
          secretKey = "IG_GRAPH_APP_SECRET"
          remoteRef = { key = "instagram-poster", property = "ig_graph_app_secret" }
        },
        {
          secretKey = "IG_BUSINESS_ACCOUNT_ID"
          remoteRef = { key = "instagram-poster", property = "ig_business_account_id" }
        },
      ]
    }
  }
@ -215,6 +235,20 @@ resource "kubernetes_deployment" "instagram_poster" {
            name  = "LOG_LEVEL"
            value = "INFO"
          }
          # IG-archive dedup config. Defaults match the Python Settings
          # class; override here so a flag flip is a `terraform apply`
          # not a code change. `enabled=false` until the export-zip
          # backfill + a few days of /ig-ingest produce a populated
          # ig_posted_media table — we don't want to filter everything
          # out on day one when the table is empty.
          env {
            name  = "IG_DEDUP_ENABLED"
            value = "false"
          }
          env {
            name  = "IG_ML_URL"
            value = "http://immich-machine-learning.immich.svc.cluster.local:3003"
          }
          volume_mount {
            name       = "data"
@ -322,3 +356,151 @@ module "ingress_protected" {
  port            = 80
  service_name    = "instagram-poster"
 }
 # IG-archive dedup live ingest. Three CronJobs all curl back into the
 # in-cluster Service. /ig-ingest is idempotent (ON CONFLICT DO NOTHING),
 # so missed runs / restarts are harmless.
 #
 # Cadence rationale (plan §4):
 #   stories — */30. Stories age out at 24h; running every 30m means a
 #             missed run still catches the entire window with margin.
 #   feed    — every 6h. Feed posts don't expire; this is just for
 #             "what's been posted since last poll".
 #   refresh — daily 02:00. Long-lived token has 60-day TTL; refreshing
 #             daily is wildly conservative but cheap, and the
 #             alternative ("rotate at expiry-7d") needs persistent state.
 locals {
  ig_cron_image = "curlimages/curl:8.10.1"
 }
 resource "kubernetes_cron_job_v1" "ig_ingest_stories" {
  metadata {
    name      = "ig-ingest-stories"
    namespace = kubernetes_namespace.instagram_poster.metadata[0].name
    labels    = local.labels
  }
  spec {
    schedule                      = "*/30 * * * *"
    concurrency_policy            = "Forbid"
    successful_jobs_history_limit = 1
    failed_jobs_history_limit     = 3
    job_template {
      metadata {}
      spec {
        backoff_limit = 1
        template {
          metadata {}
          spec {
            restart_policy = "OnFailure"
            container {
              name  = "ingest"
              image = local.ig_cron_image
              command = [
                "sh", "-c",
                "curl -fsS -X POST http://instagram-poster.instagram-poster.svc.cluster.local/ig-ingest -H 'Content-Type: application/json' -d '{\"include\":[\"stories\"]}'",
              ]
              resources {
                requests = { cpu = "10m", memory = "32Mi" }
                limits   = { memory = "64Mi" }
              }
            }
          }
        }
      }
    }
  }
  lifecycle {
    ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config] # KYVERNO_LIFECYCLE_V1
  }
  depends_on = [kubernetes_deployment.instagram_poster]
 }
 resource "kubernetes_cron_job_v1" "ig_ingest_feed" {
  metadata {
    name      = "ig-ingest-feed"
    namespace = kubernetes_namespace.instagram_poster.metadata[0].name
    labels    = local.labels
  }
  spec {
    schedule                      = "0 */6 * * *"
    concurrency_policy            = "Forbid"
    successful_jobs_history_limit = 1
    failed_jobs_history_limit     = 3
    job_template {
      metadata {}
      spec {
        backoff_limit = 1
        template {
          metadata {}
          spec {
            restart_policy = "OnFailure"
            container {
              name  = "ingest"
              image = local.ig_cron_image
              command = [
                "sh", "-c",
                "curl -fsS -X POST http://instagram-poster.instagram-poster.svc.cluster.local/ig-ingest -H 'Content-Type: application/json' -d '{\"include\":[\"feed\"]}'",
              ]
              resources {
                requests = { cpu = "10m", memory = "32Mi" }
                limits   = { memory = "64Mi" }
              }
            }
          }
        }
      }
    }
  }
  lifecycle {
    ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config] # KYVERNO_LIFECYCLE_V1
  }
  depends_on = [kubernetes_deployment.instagram_poster]
 }
 # Daily token refresh. Hits /ig-refresh-token which returns the new token
 # in JSON; we don't write it back to Vault here — that's a follow-up
 # decision (plan §6, open item #2). For now the new token is logged and
 # operators rotate manually if it ever fails. The 60-day TTL gives plenty
 # of room.
 resource "kubernetes_cron_job_v1" "ig_refresh_token" {
  metadata {
    name      = "ig-refresh-token"
    namespace = kubernetes_namespace.instagram_poster.metadata[0].name
    labels    = local.labels
  }
  spec {
    schedule                      = "0 2 * * *"
    concurrency_policy            = "Forbid"
    successful_jobs_history_limit = 1
    failed_jobs_history_limit     = 3
    job_template {
      metadata {}
      spec {
        backoff_limit = 1
        template {
          metadata {}
          spec {
            restart_policy = "OnFailure"
            container {
              name  = "refresh"
              image = local.ig_cron_image
              command = [
                "sh", "-c",
                "curl -fsS -X POST http://instagram-poster.instagram-poster.svc.cluster.local/ig-refresh-token",
              ]
              resources {
                requests = { cpu = "10m", memory = "32Mi" }
                limits   = { memory = "64Mi" }
              }
            }
          }
        }
      }
    }
  }
  lifecycle {
    ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config] # KYVERNO_LIFECYCLE_V1
  }
  depends_on = [kubernetes_deployment.instagram_poster]
 }
--- a/stacks/jsoncrack/.terraform.lock.hcl
+++ b/stacks/jsoncrack/.terraform.lock.hcl
@ -24,6 +24,14 @@ provider "registry.terraform.io/cloudflare/cloudflare" {
  ]
 }
 provider "registry.terraform.io/goauthentik/authentik" {
  version     = "2024.12.1"
  constraints = "~> 2024.10"
  hashes = [
    "h1:roBMd+gi+TGgikH/bMzEI8JfvJiMAQWt+8FmokCrQIs=",
  ]
 }
 provider "registry.terraform.io/hashicorp/helm" {
  version = "3.1.1"
  hashes = [
--- a/stacks/jsoncrack/backend.tf
+++ b/stacks/jsoncrack/backend.tf
@ -1,7 +1,7 @@
 # Generated by Terragrunt. Sig: nIlQXj57tbuaRZEa
 terraform {
  backend "pg" {
-    conn_str    = "postgres://terraform_state:SBlzGxotNUN6HH9d0S-m@10.0.20.200:5432/terraform_state?sslmode=disable"
+    conn_str    = "postgres://terraform_state:ts7DGcKmTTY-5ujz4mhh@10.0.20.200:5432/terraform_state?sslmode=disable"
    schema_name = "jsoncrack"
  }
 }
--- a/stacks/jsoncrack/providers.tf
+++ b/stacks/jsoncrack/providers.tf
@ -9,6 +9,10 @@ terraform {
      source  = "cloudflare/cloudflare"
      version = "~> 4"
    }
    authentik = {
      source  = "goauthentik/authentik"
      version = "~> 2024.10"
    }
  }
 }
--- a/stacks/llama-cpp/main.tf
+++ b/stacks/llama-cpp/main.tf
@ -65,7 +65,7 @@ locals {
          "-c ${cfg.ctx_size}",
          "-np 1",
          "--jinja",
-          "-fa",
+          "-fa on",
        ])
        ttl           = 600 # unload after 10 min idle
        checkEndpoint = "/health"
--- a/stacks/real-estate-crawler/backend.tf
+++ b/stacks/real-estate-crawler/backend.tf
@ -1,7 +1,7 @@
 # Generated by Terragrunt. Sig: nIlQXj57tbuaRZEa
 terraform {
  backend "pg" {
-    conn_str    = "postgres://terraform_state:SBlzGxotNUN6HH9d0S-m@10.0.20.200:5432/terraform_state?sslmode=disable"
+    conn_str    = "postgres://terraform_state:ts7DGcKmTTY-5ujz4mhh@10.0.20.200:5432/terraform_state?sslmode=disable"
    schema_name = "real-estate-crawler"
  }
 }
--- a/stacks/travel_blog/.terraform.lock.hcl
+++ b/stacks/travel_blog/.terraform.lock.hcl
@ -24,6 +24,14 @@ provider "registry.terraform.io/cloudflare/cloudflare" {
  ]
 }
 provider "registry.terraform.io/goauthentik/authentik" {
  version     = "2024.12.1"
  constraints = "~> 2024.10"
  hashes = [
    "h1:roBMd+gi+TGgikH/bMzEI8JfvJiMAQWt+8FmokCrQIs=",
  ]
 }
 provider "registry.terraform.io/hashicorp/helm" {
  version = "3.1.1"
  hashes = [
--- a/stacks/travel_blog/backend.tf
+++ b/stacks/travel_blog/backend.tf
@ -1,7 +1,7 @@
 # Generated by Terragrunt. Sig: nIlQXj57tbuaRZEa
 terraform {
  backend "pg" {
-    conn_str    = "postgres://terraform_state:SBlzGxotNUN6HH9d0S-m@10.0.20.200:5432/terraform_state?sslmode=disable"
+    conn_str    = "postgres://terraform_state:ts7DGcKmTTY-5ujz4mhh@10.0.20.200:5432/terraform_state?sslmode=disable"
    schema_name = "travel_blog"
  }
 }
--- a/stacks/travel_blog/providers.tf
+++ b/stacks/travel_blog/providers.tf
@ -9,6 +9,10 @@ terraform {
      source  = "cloudflare/cloudflare"
      version = "~> 4"
    }
    authentik = {
      source  = "goauthentik/authentik"
      version = "~> 2024.10"
    }
  }
 }
--- a/stacks/vault/main.tf
+++ b/stacks/vault/main.tf
@ -539,7 +539,8 @@ resource "vault_database_secret_backend_connection" "postgresql" {
    "pg-health", "pg-linkwarden",
    "pg-affine", "pg-woodpecker", "pg-claude-memory",
    "pg-terraform-state", "pg-payslip-ingest", "pg-job-hunter",
-    "pg-wealthfolio-sync", "pg-fire-planner"
+    "pg-wealthfolio-sync", "pg-fire-planner",
    "pg-postiz",
  ]
  postgresql {
@ -677,6 +678,17 @@ resource "vault_database_secret_backend_static_role" "pg_terraform_state" {
  rotation_period = 604800
 }
 # Postiz uses three databases (postiz, temporal, temporal_visibility) all owned
 # by the `postiz` PG role. One static role covers all three. Migrated from the
 # bundled bitnami PG StatefulSet to CNPG on 2026-05-09.
 resource "vault_database_secret_backend_static_role" "pg_postiz" {
  backend         = vault_mount.database.path
  db_name         = vault_database_secret_backend_connection.postgresql.name
  name            = "pg-postiz"
  username        = "postiz"
  rotation_period = 604800
 }
 resource "vault_database_secret_backend_static_role" "pg_payslip_ingest" {
  backend         = vault_mount.database.path
  db_name         = vault_database_secret_backend_connection.postgresql.name