fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip]

6d224861 came from a --no-checkout worktree whose empty index made the
commit drop every file except two. This restores 05b50d2b's full tree and
correctly adds stacks/stem95su/gdrive-sync.tf + the service-catalog stem95su
entry. Forward-only (parent=6d224861, no force-push); [ci skip] since the
live infra was never applied from the broken commit.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
parent 6d224861c4
commit fd0f4a0365
1166 changed files with 358546 additions and 0 deletions
stacks/llama-cpp/terragrunt.hcl (Normal file, 23 lines changed)
@@ -0,0 +1,23 @@
include "root" {
  path = find_in_parent_folders()
}

dependency "platform" {
  config_path = "../platform"
  skip_outputs = true
}

dependency "vault" {
  config_path = "../vault"
  skip_outputs = true
}

# llama-cpp: in-cluster vision-LLM server. One Deployment of
# `mostlygeek/llama-swap:cuda` fronts three models (qwen3vl-8b,
# minicpm-v-4-5, qwen3vl-4b) at a single OpenAI-compat /v1 endpoint
# on Service `llama-swap`. llama-swap loads/unloads per-model
# llama-server subprocesses on demand (idle TTL 10 min). The T4 is
# allocated wholly to this pod; immich-ml must be scaled to 0 during
# benchmark runs. See infra/docs/architecture/llama-cpp.md for the
# full rationale (build ≥ b6907 for Qwen3-VL, T4 FP16/INT4 only,
# llama-swap over Ollama, etc.).