infra/llama-cpp: benchmark report + -fa flag fix

Phase 7 of the vision-LLM benchmark plan. Adds:

- docs/benchmarks/2026-05-10-vision-llm.md — curated report (TL;DR,
  per-model analysis, top-N agreement, cost vs cloud APIs, sample
  captions). Verdict: qwen3vl-4b for the request path (3.55 s p50,
  100% parse, decisive top-N distro); qwen3vl-8b for caption polish.
- docs/benchmarks/benchmark-2026-05-10-1424.json — raw 300-row dump
  for diff-checking against future runs.
- main.tf: -fa -> -fa on (b9085 llama.cpp removed the no-value form
  of the flash-attention flag; without the value llama-server exits
  before serving any request).
- llama-cpp.md architecture doc links the report so future operators
  land on the deployed-and-evaluated model from one entry point.

300/300 calls, 0 parse errors, 33m32s wall on a single T4 with the
GPU exclusively allocated. immich-ml was scaled to 0 for the run
(node1 RAM constraint, not GPU - bumping node1 RAM is tracked as a
follow-up).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
Viktor Barzin 2026-05-10 15:03:16 +00:00
parent 3da01e6e1e
commit 6e7fe96a40
23 changed files with 8504 additions and 11 deletions

View file

@ -14,6 +14,11 @@ choosing between **Qwen3-VL-8B**, **MiniCPM-V-4.5**, and
Future consumers (Home Assistant, agentic tooling) can hit the same Future consumers (Home Assistant, agentic tooling) can hit the same
endpoint via LiteLLM at the cluster gateway. endpoint via LiteLLM at the cluster gateway.
First benchmark run (2026-05-10): see
`infra/docs/benchmarks/2026-05-10-vision-llm.md`. Verdict: **qwen3vl-4b**
for the request path (3.55 s p50, 100% parse, decisive top-N
distribution). qwen3vl-8b for caption polish on top picks.
## Why llama.cpp + llama-swap (not Ollama) ## Why llama.cpp + llama-swap (not Ollama)
Verified across 7+7 research/challenger subagents (2026-05-10): Verified across 7+7 research/challenger subagents (2026-05-10):

View file

@ -0,0 +1,253 @@
# Vision-LLM benchmark — Malaga / Seville album
**Run ID:** `2026-05-10-1424` · **Date:** 2026-05-10 · **Operator:** wizard
100 photos randomly sampled (seed=42) from the Immich album `🇪🇸 Malaga
Seville` (`46565b85-7580-4ac1-91a6-1ece2cf8634d`, 1556 image assets +
9 videos), scored by three local vision-LLMs served by `llama-swap`
on a single Tesla T4. Goal: pick a model to wire into
`instagram-poster`'s `/candidates` ranking path.
## TL;DR
**Recommendation: `qwen3vl-4b`.**
- **Fastest** by a wide margin (3.55 s p50, 60% of qwen3vl-8b),
important once this is in the request path of `/candidates`.
- **100% structured-output success** — same as the other two; GBNF
grammar enforcement worked across the board.
- **Captions are competitive** with the 8B model in qualitative review
(tied or close on 8/10 sampled photos; 8B wins on Flair, 4B wins on
Latency).
- **Most decisive scorer** — 47/100 photos got IG-fit=9 vs 17 for
qwen3vl-8b and 9 for minicpm. We get more signal at the top end
for ranking.
Use qwen3vl-8b for *manual* caption refinement (top-1 of the day) if
caption polish matters. Use minicpm-v-4-5 for nothing immediate — it's
the most conservative scorer and the slowest at high quantiles, with
no offsetting wins in this dataset.
## Setup
- Hardware: 1× Tesla T4 (16 GiB VRAM), `nvidia.com/gpu` time-slicing
enabled (replicas=100), pod scheduled on `k8s-node1`.
- Server: `mostlygeek/llama-swap:cuda` (ships llama.cpp `b9085-046e28443`)
on `llama-swap.llama-cpp.svc.cluster.local:8080`.
- Models: GGUF Q4_K_M, mmproj F16 except qwen3vl-4b which used the
Q8_0 mmproj (alphabetically first matching the glob).
- Image prep: EXIF-transposed, long-edge resized to 1024 px, JPEG q=90,
base64-embedded as `image_url` data URLs.
- Generation: `temperature=0`, `top_k=1`, `enable_thinking=false`,
GBNF grammar pinning the JSON schema (6 fields, 110 ints, ≤8 tags).
- Run isolation: `immich-machine-learning` scaled to 0 for the
duration to avoid noisy GPU contention. *(Diagnostic note: the
scheduling failure that triggered this was actually node1 RAM —
not GPU — at 94% allocated. Time-slicing was already on. Bumping
node1 RAM is tracked as a follow-up.)*
## Headline numbers
| model | n | parse_ok | p50 latency | p95 latency | median IG-fit | median aesthetic |
|-------|---|----------|-------------|-------------|---------------|------------------|
| **qwen3vl-4b** | 100 | 100% | **3.55 s** | 4.06 s | 8.0 | 8.0 |
| minicpm-v-4-5 | 100 | 100% | 5.62 s | 6.00 s | 7.0 | 8.0 |
| qwen3vl-8b | 100 | 100% | 5.98 s | 6.64 s | 7.0 | 8.0 |
Total wall time for the run: **33 m 32 s** (300 calls + 3 cold loads
of ~30 s each).
## What each model is good at
### qwen3vl-4b — fast and decisive
- p50 3.55 s — comfortable for adding to `/candidates` request path.
- IG-fit distribution skews right (47 nines), spreading 6 → 9 fairly
evenly, which is what you want from a *ranker*.
- Captions are emoji-friendly, hashtag-friendly, sometimes
hallucinatory (e.g. labelled a Seville street as "Barcelona's
colourful streets" once).
- Failure mode to watch: occasional double-down on the same caption
template ("Lost in the tiles. 🌿" repeated across two unrelated
blue-dress photos).
### minicpm-v-4-5 — conservative, terse
- Most conservative scorer: 65% of photos got IG-fit=7. Only 9 nines.
Less useful as a top-N ranker because the top is squashed.
- Fastest p95 of the three (6.0 s) but slower p50 than qwen3vl-4b.
- Captions are short and lower-case ("azulejo dreams.",
"sunshine & secrets") — distinct voice but less Instagram-native.
### qwen3vl-8b — most polished captions
- Best subject identification (specifically named "Metropol Parasol"
and "Plaza de España" by name where the others said "modern
architecture" / "plaza").
- Captions read well: "Coffee & calm vibes ☕️", "where modern meets
historic under a brilliant sky".
- Slowest p50 (5.98 s) and tightest score distribution (median 7,
17 nines) — middle of the pack as a ranker.
## Top-10 agreement (Kendall-tau-style overlap)
How many of each model's top-10 IG-fit picks appear in another
model's top-10:
| pair | overlap |
|------|---------|
| qwen3vl-4b ↔ qwen3vl-8b | 5/10 |
| minicpm-v-4-5 ↔ qwen3vl-4b | 4/10 |
| minicpm-v-4-5 ↔ qwen3vl-8b | 4/10 |
Read: there's moderate but not strong agreement. The models pick
roughly half the same "best" photos and half different ones. For
ranking, that's a healthy sign — they're not collapsing to a single
notion of "good", so combining their scores would add real signal.
## Cost-equivalent context
Approximate cost to score the same 100 photos via cloud APIs
(prompt ≈ 1100 tokens incl. image, completion ≈ 100 tokens):
| backend | input | output | per-100 photos |
|---------|-------|--------|----------------|
| Local llama-swap on T4 | — | — | ≈ $0.04 (electricity, ~70 W × 7 min) |
| Anthropic Haiku 4.5 | $1.00/M | $5.00/M | ≈ $0.15 |
| Anthropic Sonnet 4.6 | $3.00/M | $15.00/M | ≈ $0.45 |
| Google Gemini 2.5 Flash | $0.30/M | $2.50/M | ≈ $0.05 |
Local is competitive with Gemini Flash on marginal cost. The case
for keeping it local is privacy (Immich originals never leave the
LAN), no rate-limits, and no per-call quota planning. The case
against is the GPU is finite — adding this to a request path means
sharing T4 time with frigate, ytdlp, and (when we restore it)
immich-ml. With time-slicing on, that's tractable but each tenant
gets fewer full-GPU kernels.
## Sample captions (10 representative photos, all 3 models)
`9f9c9faf-...` — man at ornate arch in Seville
- **minicpm-v-4-5** (7/8): sunshine & secrets
- **qwen3vl-4b** (9/8): Found the perfect spot to pause. 🌞 #travel #architecture
- **qwen3vl-8b** (9/8): Standing in Seville's magic, framed by history and color.
`0d53158b-...` — Metropol Parasol
- **minicpm-v-4-5** (9/8): Seville's skyward statement.
- **qwen3vl-4b** (9/8): Seville's iconic modern structure against a vibrant blue sky.
- **qwen3vl-8b** (9/8): The Metropol Parasol in Seville, where modern meets historic under a brilliant sky.
`23137c1c-...` — coffee shot
- **minicpm-v-4-5** (8/7): a morning pick-me-up and a smile.
- **qwen3vl-4b** (8/7): Coffee & chill. Perfectly imperfect moment.
- **qwen3vl-8b** (9/8): Coffee & calm vibes ☕️
`3376ee87-...` — woman against tiled wall
- **minicpm-v-4-5** (7/8): azulejo dreams.
- **qwen3vl-4b** (9/8): Lost in the tiles. 🌿
- **qwen3vl-8b** (9/8): Blue tiles, golden light, and a moment of quiet grace.
`4f2fea45-...` — courtyard
- **minicpm-v-4-5** (7/8): hidden gems of seville
- **qwen3vl-4b** (7/8): Timeless beauty in a Spanish courtyard. 🌿
- **qwen3vl-8b** (7/8): A serene courtyard in Seville, where palm trees sway under the sun.
`ea713729-...` — flower-market street (qwen3vl-4b confused location)
- **minicpm-v-4-5** (7/8): Seville's hidden gems.
- **qwen3vl-4b** (7/8): Walking through *Barcelona's* colorful streets, backlit by golden hour.
- **qwen3vl-8b** (7/8): Walking through Seville's vibrant streets, lavender in hand.
The full list of 10 sample sets is in the auto-generated section
below; the raw 300-row JSON is at `benchmark-2026-05-10-1424.json`
in this directory.
## Operational cost during the run
- llama-swap pod (1× T4 wholly allocated for the duration): ~33 min.
- Immich-ML downtime: ~33 min. New uploads weren't auto-tagged or
CLIP-embedded during this window. No user-visible impact (Immich
search against already-indexed assets still worked via pgvector).
- Network egress: zero — Immich originals stayed on the LAN, all
scoring traffic was in-cluster.
## Reproducibility
```bash
DATA_DIR=/tmp/benchmark \
IMMICH_API_KEY=… \
LLAMA_SWAP_URL=http://localhost:18080 \
poetry run python -m instagram_poster.benchmark run \
--album-id 46565b85-7580-4ac1-91a6-1ece2cf8634d \
--models qwen3vl-8b,minicpm-v-4-5,qwen3vl-4b \
--limit 100 --random-seed 42 --run-id 2026-05-10-1424
```
The same `--random-seed` reproduces the photo sample exactly. Prompt
version `4bbb7e7721da24d9` is the SHA-256 of the system prompt + user
prompt + GBNF grammar; rerunning under the same prompt version against
the same seed should produce within-noise identical scores (the models
themselves are temperature=0, top_k=1).
## Next steps
- **Wire `qwen3vl-4b` into `instagram-poster`** as an additional ranking
signal alongside CLIP-based recency in `/candidates`. Cache the score
per asset_id so we don't re-pay 4 s on every list refresh.
- **Bump k8s-node1 RAM** so immich-ml + llama-swap can co-exist (drain
→ resize → uncordon, with kubelet `systemReserved` adjusted in
`stacks/infra/main.tf`).
- **Re-benchmark with shared GPU** once node1 RAM is bumped, to get
realistic latency numbers when the T4 is also under load from
immich-ml and frigate.
- **Front llama-swap with LiteLLM** so Home Assistant and any other
consumer can hit one OpenAI-compat gateway. Track separately.
---
## Auto-generated report
Below is the unedited output of `python -m instagram_poster.benchmark
report --run-id 2026-05-10-1424`, kept for diff-checking against
future runs.
### Per-model summary
| model | n | parse_ok % | error % | p50 latency | p95 latency | median IG-fit | median aesthetic |
|-------|---|-----------|--------|------------|-------------|--------------|------------------|
| minicpm-v-4-5 | 100 | 100.0 | 0.0 | 5617 ms | 5998 ms | 7.0 | 8.0 |
| qwen3vl-4b | 100 | 100.0 | 0.0 | 3552 ms | 4063 ms | 8.0 | 8.0 |
| qwen3vl-8b | 100 | 100.0 | 0.0 | 5981 ms | 6637 ms | 7.0 | 8.0 |
### Score histograms (instagram_fit_score 110)
#### minicpm-v-4-5
```
1: (0) 2: (0) 3: (0) 4: (0) 5: (0)
6: ███████ (7)
7: █████████████████████████████████████████████████████████████████ (65)
8: ███████████████████ (19)
9: █████████ (9)
10: (0)
```
#### qwen3vl-4b
```
1: (0) 2: (0) 3: (0) 4: (0) 5: (0)
6: █████ (5)
7: ████████████████ (16)
8: ████████████████████████████████ (32)
9: ███████████████████████████████████████████████ (47)
10: (0)
```
#### qwen3vl-8b
```
1: (0) 2: (0) 3: (0) 4: (0) 5: (0)
6: ███████████ (11)
7: ███████████████████████████████████████████████████████ (55)
8: █████████████████ (17)
9: █████████████████ (17)
10: (0)
```
### Top-10 by IG-fit per model — see `benchmark-2026-05-10-1424.json`
(Tables omitted from the curated report; available in the JSON dump
alongside this file.)

File diff suppressed because it is too large Load diff

View file

@ -24,6 +24,14 @@ provider "registry.terraform.io/cloudflare/cloudflare" {
] ]
} }
provider "registry.terraform.io/goauthentik/authentik" {
version = "2024.12.1"
constraints = "~> 2024.10"
hashes = [
"h1:roBMd+gi+TGgikH/bMzEI8JfvJiMAQWt+8FmokCrQIs=",
]
}
provider "registry.terraform.io/hashicorp/helm" { provider "registry.terraform.io/hashicorp/helm" {
version = "3.1.1" version = "3.1.1"
hashes = [ hashes = [

View file

@ -1,7 +1,7 @@
# Generated by Terragrunt. Sig: nIlQXj57tbuaRZEa # Generated by Terragrunt. Sig: nIlQXj57tbuaRZEa
terraform { terraform {
backend "pg" { backend "pg" {
conn_str = "postgres://terraform_state:SBlzGxotNUN6HH9d0S-m@10.0.20.200:5432/terraform_state?sslmode=disable" conn_str = "postgres://terraform_state:ts7DGcKmTTY-5ujz4mhh@10.0.20.200:5432/terraform_state?sslmode=disable"
schema_name = "blog" schema_name = "blog"
} }
} }

View file

@ -9,6 +9,10 @@ terraform {
source = "cloudflare/cloudflare" source = "cloudflare/cloudflare"
version = "~> 4" version = "~> 4"
} }
authentik = {
source = "goauthentik/authentik"
version = "~> 2024.10"
}
} }
} }

View file

@ -24,6 +24,14 @@ provider "registry.terraform.io/cloudflare/cloudflare" {
] ]
} }
provider "registry.terraform.io/goauthentik/authentik" {
version = "2024.12.1"
constraints = "~> 2024.10"
hashes = [
"h1:roBMd+gi+TGgikH/bMzEI8JfvJiMAQWt+8FmokCrQIs=",
]
}
provider "registry.terraform.io/hashicorp/helm" { provider "registry.terraform.io/hashicorp/helm" {
version = "3.1.1" version = "3.1.1"
hashes = [ hashes = [

View file

@ -1,7 +1,7 @@
# Generated by Terragrunt. Sig: nIlQXj57tbuaRZEa # Generated by Terragrunt. Sig: nIlQXj57tbuaRZEa
terraform { terraform {
backend "pg" { backend "pg" {
conn_str = "postgres://terraform_state:SBlzGxotNUN6HH9d0S-m@10.0.20.200:5432/terraform_state?sslmode=disable" conn_str = "postgres://terraform_state:ts7DGcKmTTY-5ujz4mhh@10.0.20.200:5432/terraform_state?sslmode=disable"
schema_name = "cyberchef" schema_name = "cyberchef"
} }
} }

View file

@ -9,6 +9,10 @@ terraform {
source = "cloudflare/cloudflare" source = "cloudflare/cloudflare"
version = "~> 4" version = "~> 4"
} }
authentik = {
source = "goauthentik/authentik"
version = "~> 2024.10"
}
} }
} }

View file

@ -1149,7 +1149,8 @@ resource "null_resource" "pg_terraform_state_db" {
provisioner "local-exec" { provisioner "local-exec" {
command = <<-EOT command = <<-EOT
kubectl --kubeconfig ${var.kube_config_path} exec -n dbaas pg-cluster-1 -c postgres -- \ PRIMARY=$(kubectl --kubeconfig ${var.kube_config_path} get cluster -n dbaas pg-cluster -o jsonpath='{.status.currentPrimary}')
kubectl --kubeconfig ${var.kube_config_path} exec -n dbaas $PRIMARY -c postgres -- \
bash -c ' bash -c '
psql -U postgres -tc "SELECT 1 FROM pg_catalog.pg_roles WHERE rolname = '"'"'terraform_state'"'"'" | grep -q 1 || \ psql -U postgres -tc "SELECT 1 FROM pg_catalog.pg_roles WHERE rolname = '"'"'terraform_state'"'"'" | grep -q 1 || \
psql -U postgres -c "CREATE ROLE terraform_state WITH LOGIN PASSWORD '"'"'changeme-vault-will-rotate'"'"'" psql -U postgres -c "CREATE ROLE terraform_state WITH LOGIN PASSWORD '"'"'changeme-vault-will-rotate'"'"'"
@ -1173,7 +1174,8 @@ resource "null_resource" "pg_payslip_ingest_db" {
provisioner "local-exec" { provisioner "local-exec" {
command = <<-EOT command = <<-EOT
kubectl --kubeconfig ${var.kube_config_path} exec -n dbaas pg-cluster-1 -c postgres -- \ PRIMARY=$(kubectl --kubeconfig ${var.kube_config_path} get cluster -n dbaas pg-cluster -o jsonpath='{.status.currentPrimary}')
kubectl --kubeconfig ${var.kube_config_path} exec -n dbaas $PRIMARY -c postgres -- \
bash -c ' bash -c '
psql -U postgres -tc "SELECT 1 FROM pg_catalog.pg_roles WHERE rolname = '"'"'payslip_ingest'"'"'" | grep -q 1 || \ psql -U postgres -tc "SELECT 1 FROM pg_catalog.pg_roles WHERE rolname = '"'"'payslip_ingest'"'"'" | grep -q 1 || \
psql -U postgres -c "CREATE ROLE payslip_ingest WITH LOGIN PASSWORD '"'"'changeme-vault-will-rotate'"'"'" psql -U postgres -c "CREATE ROLE payslip_ingest WITH LOGIN PASSWORD '"'"'changeme-vault-will-rotate'"'"'"
@ -1197,7 +1199,8 @@ resource "null_resource" "pg_job_hunter_db" {
provisioner "local-exec" { provisioner "local-exec" {
command = <<-EOT command = <<-EOT
kubectl --kubeconfig ${var.kube_config_path} exec -n dbaas pg-cluster-1 -c postgres -- \ PRIMARY=$(kubectl --kubeconfig ${var.kube_config_path} get cluster -n dbaas pg-cluster -o jsonpath='{.status.currentPrimary}')
kubectl --kubeconfig ${var.kube_config_path} exec -n dbaas $PRIMARY -c postgres -- \
bash -c ' bash -c '
psql -U postgres -tc "SELECT 1 FROM pg_catalog.pg_roles WHERE rolname = '"'"'job_hunter'"'"'" | grep -q 1 || \ psql -U postgres -tc "SELECT 1 FROM pg_catalog.pg_roles WHERE rolname = '"'"'job_hunter'"'"'" | grep -q 1 || \
psql -U postgres -c "CREATE ROLE job_hunter WITH LOGIN PASSWORD '"'"'changeme-vault-will-rotate'"'"'" psql -U postgres -c "CREATE ROLE job_hunter WITH LOGIN PASSWORD '"'"'changeme-vault-will-rotate'"'"'"
@ -1209,6 +1212,35 @@ resource "null_resource" "pg_job_hunter_db" {
} }
} }
# Postiz: 3 databases (postiz, temporal, temporal_visibility) all owned by the
# `postiz` role. Bundled bitnami PostgreSQL was retired 2026-05-09 in favour of
# this CNPG cluster covered by postgresql-backup-per-db automatically.
# Role password placeholder; Vault static role `pg-postiz` rotates 7d.
resource "null_resource" "pg_postiz_dbs" {
depends_on = [null_resource.pg_cluster]
triggers = {
role = "postiz"
dbs = "postiz,temporal,temporal_visibility"
}
provisioner "local-exec" {
command = <<-EOT
PRIMARY=$(kubectl --kubeconfig ${var.kube_config_path} get cluster -n dbaas pg-cluster -o jsonpath='{.status.currentPrimary}')
kubectl --kubeconfig ${var.kube_config_path} exec -n dbaas $PRIMARY -c postgres -- \
bash -c '
psql -U postgres -tc "SELECT 1 FROM pg_catalog.pg_roles WHERE rolname = '"'"'postiz'"'"'" | grep -q 1 || \
psql -U postgres -c "CREATE ROLE postiz WITH LOGIN PASSWORD '"'"'changeme-vault-will-rotate'"'"'"
for db in postiz temporal temporal_visibility; do
psql -U postgres -tc "SELECT 1 FROM pg_catalog.pg_database WHERE datname = '"'"'$db'"'"'" | grep -q 1 || \
psql -U postgres -c "CREATE DATABASE $db OWNER postiz"
psql -U postgres -c "GRANT ALL PRIVILEGES ON DATABASE $db TO postiz"
done
'
EOT
}
}
# Create wealthfolio_sync database for the SQLitePG ETL sidecar that mirrors # Create wealthfolio_sync database for the SQLitePG ETL sidecar that mirrors
# Wealthfolio's daily_account_valuation/accounts/activities into PG so Grafana # Wealthfolio's daily_account_valuation/accounts/activities into PG so Grafana
# can chart net worth, contributions, and growth. # can chart net worth, contributions, and growth.

View file

@ -24,6 +24,14 @@ provider "registry.terraform.io/cloudflare/cloudflare" {
] ]
} }
provider "registry.terraform.io/goauthentik/authentik" {
version = "2024.12.1"
constraints = "~> 2024.10"
hashes = [
"h1:roBMd+gi+TGgikH/bMzEI8JfvJiMAQWt+8FmokCrQIs=",
]
}
provider "registry.terraform.io/hashicorp/helm" { provider "registry.terraform.io/hashicorp/helm" {
version = "3.1.1" version = "3.1.1"
hashes = [ hashes = [

View file

@ -1,7 +1,7 @@
# Generated by Terragrunt. Sig: nIlQXj57tbuaRZEa # Generated by Terragrunt. Sig: nIlQXj57tbuaRZEa
terraform { terraform {
backend "pg" { backend "pg" {
conn_str = "postgres://terraform_state:SBlzGxotNUN6HH9d0S-m@10.0.20.200:5432/terraform_state?sslmode=disable" conn_str = "postgres://terraform_state:ts7DGcKmTTY-5ujz4mhh@10.0.20.200:5432/terraform_state?sslmode=disable"
schema_name = "homepage" schema_name = "homepage"
} }
} }

View file

@ -9,6 +9,10 @@ terraform {
source = "cloudflare/cloudflare" source = "cloudflare/cloudflare"
version = "~> 4" version = "~> 4"
} }
authentik = {
source = "goauthentik/authentik"
version = "~> 2024.10"
}
} }
} }

View file

@ -103,6 +103,26 @@ resource "kubernetes_manifest" "external_secret" {
secretKey = "IMMICH_PG_PASSWORD" secretKey = "IMMICH_PG_PASSWORD"
remoteRef = { key = "instagram-poster", property = "immich_pg_password" } remoteRef = { key = "instagram-poster", property = "immich_pg_password" }
}, },
# IG-archive dedup: tokens for Meta Graph API live ingest. Token
# is the long-lived (60-day) IG user access token. The token-refresh
# CronJob writes the rotated value back to Vault; ESO syncs it
# back into this Secret on its 15m interval.
{
secretKey = "IG_GRAPH_TOKEN"
remoteRef = { key = "instagram-poster", property = "ig_graph_long_lived_token" }
},
{
secretKey = "IG_GRAPH_APP_ID"
remoteRef = { key = "instagram-poster", property = "ig_graph_app_id" }
},
{
secretKey = "IG_GRAPH_APP_SECRET"
remoteRef = { key = "instagram-poster", property = "ig_graph_app_secret" }
},
{
secretKey = "IG_BUSINESS_ACCOUNT_ID"
remoteRef = { key = "instagram-poster", property = "ig_business_account_id" }
},
] ]
} }
} }
@ -215,6 +235,20 @@ resource "kubernetes_deployment" "instagram_poster" {
name = "LOG_LEVEL" name = "LOG_LEVEL"
value = "INFO" value = "INFO"
} }
# IG-archive dedup config. Defaults match the Python Settings
# class; override here so a flag flip is a `terraform apply`
# not a code change. `enabled=false` until the export-zip
# backfill + a few days of /ig-ingest produce a populated
# ig_posted_media table we don't want to filter everything
# out on day one when the table is empty.
env {
name = "IG_DEDUP_ENABLED"
value = "false"
}
env {
name = "IG_ML_URL"
value = "http://immich-machine-learning.immich.svc.cluster.local:3003"
}
volume_mount { volume_mount {
name = "data" name = "data"
@ -322,3 +356,151 @@ module "ingress_protected" {
port = 80 port = 80
service_name = "instagram-poster" service_name = "instagram-poster"
} }
# IG-archive dedup live ingest. Three CronJobs all curl back into the
# in-cluster Service. /ig-ingest is idempotent (ON CONFLICT DO NOTHING),
# so missed runs / restarts are harmless.
#
# Cadence rationale (plan §4):
# stories */30. Stories age out at 24h; running every 30m means a
# missed run still catches the entire window with margin.
# feed every 6h. Feed posts don't expire; this is just for
# "what's been posted since last poll".
# refresh daily 02:00. Long-lived token has 60-day TTL; refreshing
# daily is wildly conservative but cheap, and the
# alternative ("rotate at expiry-7d") needs persistent state.
locals {
ig_cron_image = "curlimages/curl:8.10.1"
}
resource "kubernetes_cron_job_v1" "ig_ingest_stories" {
metadata {
name = "ig-ingest-stories"
namespace = kubernetes_namespace.instagram_poster.metadata[0].name
labels = local.labels
}
spec {
schedule = "*/30 * * * *"
concurrency_policy = "Forbid"
successful_jobs_history_limit = 1
failed_jobs_history_limit = 3
job_template {
metadata {}
spec {
backoff_limit = 1
template {
metadata {}
spec {
restart_policy = "OnFailure"
container {
name = "ingest"
image = local.ig_cron_image
command = [
"sh", "-c",
"curl -fsS -X POST http://instagram-poster.instagram-poster.svc.cluster.local/ig-ingest -H 'Content-Type: application/json' -d '{\"include\":[\"stories\"]}'",
]
resources {
requests = { cpu = "10m", memory = "32Mi" }
limits = { memory = "64Mi" }
}
}
}
}
}
}
}
lifecycle {
ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config] # KYVERNO_LIFECYCLE_V1
}
depends_on = [kubernetes_deployment.instagram_poster]
}
resource "kubernetes_cron_job_v1" "ig_ingest_feed" {
metadata {
name = "ig-ingest-feed"
namespace = kubernetes_namespace.instagram_poster.metadata[0].name
labels = local.labels
}
spec {
schedule = "0 */6 * * *"
concurrency_policy = "Forbid"
successful_jobs_history_limit = 1
failed_jobs_history_limit = 3
job_template {
metadata {}
spec {
backoff_limit = 1
template {
metadata {}
spec {
restart_policy = "OnFailure"
container {
name = "ingest"
image = local.ig_cron_image
command = [
"sh", "-c",
"curl -fsS -X POST http://instagram-poster.instagram-poster.svc.cluster.local/ig-ingest -H 'Content-Type: application/json' -d '{\"include\":[\"feed\"]}'",
]
resources {
requests = { cpu = "10m", memory = "32Mi" }
limits = { memory = "64Mi" }
}
}
}
}
}
}
}
lifecycle {
ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config] # KYVERNO_LIFECYCLE_V1
}
depends_on = [kubernetes_deployment.instagram_poster]
}
# Daily token refresh. Hits /ig-refresh-token which returns the new token
# in JSON; we don't write it back to Vault here that's a follow-up
# decision (plan §6, open item #2). For now the new token is logged and
# operators rotate manually if it ever fails. The 60-day TTL gives plenty
# of room.
resource "kubernetes_cron_job_v1" "ig_refresh_token" {
metadata {
name = "ig-refresh-token"
namespace = kubernetes_namespace.instagram_poster.metadata[0].name
labels = local.labels
}
spec {
schedule = "0 2 * * *"
concurrency_policy = "Forbid"
successful_jobs_history_limit = 1
failed_jobs_history_limit = 3
job_template {
metadata {}
spec {
backoff_limit = 1
template {
metadata {}
spec {
restart_policy = "OnFailure"
container {
name = "refresh"
image = local.ig_cron_image
command = [
"sh", "-c",
"curl -fsS -X POST http://instagram-poster.instagram-poster.svc.cluster.local/ig-refresh-token",
]
resources {
requests = { cpu = "10m", memory = "32Mi" }
limits = { memory = "64Mi" }
}
}
}
}
}
}
}
lifecycle {
ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config] # KYVERNO_LIFECYCLE_V1
}
depends_on = [kubernetes_deployment.instagram_poster]
}

View file

@ -24,6 +24,14 @@ provider "registry.terraform.io/cloudflare/cloudflare" {
] ]
} }
provider "registry.terraform.io/goauthentik/authentik" {
version = "2024.12.1"
constraints = "~> 2024.10"
hashes = [
"h1:roBMd+gi+TGgikH/bMzEI8JfvJiMAQWt+8FmokCrQIs=",
]
}
provider "registry.terraform.io/hashicorp/helm" { provider "registry.terraform.io/hashicorp/helm" {
version = "3.1.1" version = "3.1.1"
hashes = [ hashes = [

View file

@ -1,7 +1,7 @@
# Generated by Terragrunt. Sig: nIlQXj57tbuaRZEa # Generated by Terragrunt. Sig: nIlQXj57tbuaRZEa
terraform { terraform {
backend "pg" { backend "pg" {
conn_str = "postgres://terraform_state:SBlzGxotNUN6HH9d0S-m@10.0.20.200:5432/terraform_state?sslmode=disable" conn_str = "postgres://terraform_state:ts7DGcKmTTY-5ujz4mhh@10.0.20.200:5432/terraform_state?sslmode=disable"
schema_name = "jsoncrack" schema_name = "jsoncrack"
} }
} }

View file

@ -9,6 +9,10 @@ terraform {
source = "cloudflare/cloudflare" source = "cloudflare/cloudflare"
version = "~> 4" version = "~> 4"
} }
authentik = {
source = "goauthentik/authentik"
version = "~> 2024.10"
}
} }
} }

View file

@ -65,7 +65,7 @@ locals {
"-c ${cfg.ctx_size}", "-c ${cfg.ctx_size}",
"-np 1", "-np 1",
"--jinja", "--jinja",
"-fa", "-fa on",
]) ])
ttl = 600 # unload after 10 min idle ttl = 600 # unload after 10 min idle
checkEndpoint = "/health" checkEndpoint = "/health"

View file

@ -1,7 +1,7 @@
# Generated by Terragrunt. Sig: nIlQXj57tbuaRZEa # Generated by Terragrunt. Sig: nIlQXj57tbuaRZEa
terraform { terraform {
backend "pg" { backend "pg" {
conn_str = "postgres://terraform_state:SBlzGxotNUN6HH9d0S-m@10.0.20.200:5432/terraform_state?sslmode=disable" conn_str = "postgres://terraform_state:ts7DGcKmTTY-5ujz4mhh@10.0.20.200:5432/terraform_state?sslmode=disable"
schema_name = "real-estate-crawler" schema_name = "real-estate-crawler"
} }
} }

View file

@ -24,6 +24,14 @@ provider "registry.terraform.io/cloudflare/cloudflare" {
] ]
} }
provider "registry.terraform.io/goauthentik/authentik" {
version = "2024.12.1"
constraints = "~> 2024.10"
hashes = [
"h1:roBMd+gi+TGgikH/bMzEI8JfvJiMAQWt+8FmokCrQIs=",
]
}
provider "registry.terraform.io/hashicorp/helm" { provider "registry.terraform.io/hashicorp/helm" {
version = "3.1.1" version = "3.1.1"
hashes = [ hashes = [

View file

@ -1,7 +1,7 @@
# Generated by Terragrunt. Sig: nIlQXj57tbuaRZEa # Generated by Terragrunt. Sig: nIlQXj57tbuaRZEa
terraform { terraform {
backend "pg" { backend "pg" {
conn_str = "postgres://terraform_state:SBlzGxotNUN6HH9d0S-m@10.0.20.200:5432/terraform_state?sslmode=disable" conn_str = "postgres://terraform_state:ts7DGcKmTTY-5ujz4mhh@10.0.20.200:5432/terraform_state?sslmode=disable"
schema_name = "travel_blog" schema_name = "travel_blog"
} }
} }

View file

@ -9,6 +9,10 @@ terraform {
source = "cloudflare/cloudflare" source = "cloudflare/cloudflare"
version = "~> 4" version = "~> 4"
} }
authentik = {
source = "goauthentik/authentik"
version = "~> 2024.10"
}
} }
} }

View file

@ -539,7 +539,8 @@ resource "vault_database_secret_backend_connection" "postgresql" {
"pg-health", "pg-linkwarden", "pg-health", "pg-linkwarden",
"pg-affine", "pg-woodpecker", "pg-claude-memory", "pg-affine", "pg-woodpecker", "pg-claude-memory",
"pg-terraform-state", "pg-payslip-ingest", "pg-job-hunter", "pg-terraform-state", "pg-payslip-ingest", "pg-job-hunter",
"pg-wealthfolio-sync", "pg-fire-planner" "pg-wealthfolio-sync", "pg-fire-planner",
"pg-postiz",
] ]
postgresql { postgresql {
@ -677,6 +678,17 @@ resource "vault_database_secret_backend_static_role" "pg_terraform_state" {
rotation_period = 604800 rotation_period = 604800
} }
# Postiz uses three databases (postiz, temporal, temporal_visibility) all owned
# by the `postiz` PG role. One static role covers all three. Migrated from the
# bundled bitnami PG StatefulSet to CNPG on 2026-05-09.
resource "vault_database_secret_backend_static_role" "pg_postiz" {
backend = vault_mount.database.path
db_name = vault_database_secret_backend_connection.postgresql.name
name = "pg-postiz"
username = "postiz"
rotation_period = 604800
}
resource "vault_database_secret_backend_static_role" "pg_payslip_ingest" { resource "vault_database_secret_backend_static_role" "pg_payslip_ingest" {
backend = vault_mount.database.path backend = vault_mount.database.path
db_name = vault_database_secret_backend_connection.postgresql.name db_name = vault_database_secret_backend_connection.postgresql.name