infra

No description

Find a file

Viktor Barzin 48013a4a92 feat(tts): Chatterbox TTS stack + off-peak T4 gate, wire tripit narration [ci skip] New `infra/stacks/tts/` deploys devnen/Chatterbox-TTS-Server (OpenAI-compatible /v1/audio/speech) as ClusterIP `chatterbox-tts.tts.svc:8000` (server listens on 8004; Service remaps), requesting ONE T4 time-slice. Mirrors stacks/llama-cpp/. Option A off-peak control (no VRAM isolation on the time-sliced T4 — see post-mortem 2026-06-02): Deployment sits at replicas=0; three Europe/London CronJobs own the replica count — `chatterbox-window-up` scales to 1 at 02:00 ONLY IF a free-VRAM preflight passes (sum gpu_pod_memory_used_bytes from gpu-pod-exporter; free = 16GiB - used >= floor), `chatterbox-vram-guard` yields the card mid-window if a resident wakes, `chatterbox-window-down` scales to 0 at 06:00. tripit's bake is best-effort + cached-forever (ADR-0002/0004) so a skipped/aborted window backfills next time. SA+Role+RoleBinding grant the CronJobs deployments/scale (nextcloud-watchdog pattern). Polite-tenant hardening: kyverno `inject-gpu-workload-priority` now excludes the `tts` namespace (new `gpu_priority_excluded_namespaces` local) so Chatterbox keeps tier-2-gpu priority (600k) and is always evicted first under GPU pressure — never immich-ml/frigate/llama-swap. The LimitRange-fallback policy still uses the base exclude list (tts untouched there). tripit: add TTS_MODE=openai_compatible, TTS_BASE_URL, TTS_MODEL=chatterbox to local.app_env (no token — ClusterIP only). No tripit code change. Image build is documented in stacks/tts/README.md (devnen cu128 target -> forgejo.viktorbarzin.me/viktor/chatterbox-tts) — build is impractical inline (large CUDA image + needs the upstream repo). NOT APPLIED — review branch only. Free-VRAM floor (var.vram_free_floor_bytes, default 6GiB) must be set from the measured chatterbox-multilingual T4 peak during the first bake. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>		2026-06-09 21:21:39 +00:00
.beads	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
.claude	docs(ci-cd): tripit auto-deploy (GHA->Woodpecker 167) + svu semver in GHA [ci skip]	2026-06-09 19:41:08 +00:00
.git-crypt	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
.github	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
.planning	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
.woodpecker	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
ci	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
cli	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
diagram	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
docs	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
modules	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
playbooks	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
scripts	workstation: skel start-claude.sh inherits managed default model (drop hardcoded --model)	2026-06-09 19:35:29 +00:00
secrets	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
stacks	feat(tts): Chatterbox TTS stack + off-peak T4 gate, wire tripit narration [ci skip]	2026-06-09 21:21:39 +00:00
state/stacks	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
.gitattributes	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
.gitignore	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
.gitleaksignore	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
.mcp.json	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
.sops.yaml	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
AGENTS.md	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
config.tfvars	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
CONTEXT.md	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
CONTRIBUTING.md	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
LICENSE.txt	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
MEMORY.md	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
README.md	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
terragrunt.hcl	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00
tiers.tf	fix: restore tree dropped by `6d224861`; land stem95su gdrive-sync (10m) [ci skip]	2026-06-09 08:45:33 +00:00

README.md

This repo contains my infra-as-code sources.

My infrastructure is built using Terraform, Kubernetes and CI/CD is done using Woodpecker CI.

Read more by visiting my website: https://viktorbarzin.me

Documentation

Full architecture documentation is available in docs/ — covering networking, storage, security, monitoring, secrets, CI/CD, databases, and more.

Adding a New User (Admin)

Adding a new namespace-owner to the cluster requires three steps — no code changes needed.

1. Authentik Group Assignment

In the Authentik admin UI, add the user to:

kubernetes-namespace-owners group (grants OIDC group claim for K8s RBAC)
Headscale Users group (if they need VPN access)

2. Vault KV Entry

Add a JSON entry to secret/platform → k8s_users key in Vault:

"username": {
  "role": "namespace-owner",
  "email": "user@example.com",
  "namespaces": ["username"],
  "domains": ["myapp"],
  "quota": {
    "cpu_requests": "2",
    "memory_requests": "4Gi",
    "memory_limits": "8Gi",
    "pods": "20"
  }
}

username key must match the user's Forgejo username (for Woodpecker admin access)
namespaces — K8s namespaces to create and grant admin access to
domains — subdomains under viktorbarzin.me for Cloudflare DNS records
quota — resource limits per namespace (defaults shown above)

3. Apply Stacks

vault login -method=oidc

cd stacks/vault && terragrunt apply --non-interactive
# Creates: namespace, Vault policy, identity entity, K8s deployer role

cd ../platform && terragrunt apply --non-interactive
# Creates: RBAC bindings, ResourceQuota, TLS secret, DNS records

cd ../woodpecker && terragrunt apply --non-interactive
# Adds user to Woodpecker admin list

What Gets Auto-Generated

Resource	Stack
Kubernetes namespace	vault
Vault policy (`namespace-owner-{user}`)	vault
Vault identity entity + OIDC alias	vault
K8s deployer Role + Vault K8s role	vault
RBAC RoleBinding (namespace admin)	platform
RBAC ClusterRoleBinding (cluster read-only)	platform
ResourceQuota	platform
TLS secret in namespace	platform
Cloudflare DNS records	platform
Woodpecker admin access	woodpecker

New User Onboarding

If you've been added as a namespace-owner, follow these steps to get started.

1. Join the VPN

# Install Tailscale: https://tailscale.com/download
tailscale login --login-server https://headscale.viktorbarzin.me
# Send the registration URL to Viktor, wait for approval
ping 10.0.20.100  # verify connectivity

2. Install Tools

Run the setup script to install kubectl, kubelogin, Vault CLI, Terraform, and Terragrunt:

# macOS
bash <(curl -fsSL https://k8s-portal.viktorbarzin.me/setup/script?os=mac)

# Linux
bash <(curl -fsSL https://k8s-portal.viktorbarzin.me/setup/script?os=linux)

3. Authenticate

# Log into Vault (opens browser for SSO)
vault login -method=oidc

# Test kubectl (opens browser for OIDC login)
kubectl get pods -n YOUR_NAMESPACE

4. Deploy Your First App

# Clone the infra repo
git clone https://github.com/ViktorBarzin/infra.git && cd infra

# Copy the stack template
cp -r stacks/_template stacks/myapp
mv stacks/myapp/main.tf.example stacks/myapp/main.tf

# Edit main.tf — replace all <placeholders>

# Store secrets in Vault
vault kv put secret/YOUR_USERNAME/myapp DB_PASSWORD=secret123

# Submit a PR
git checkout -b feat/myapp
git add stacks/myapp/
git commit -m "add myapp stack"
git push -u origin feat/myapp

After review and merge, an admin runs cd stacks/myapp && terragrunt apply.

5. Set Up CI/CD (Optional)

Create .woodpecker.yml in your app's Forgejo repo:

steps:
  - name: build
    image: woodpeckerci/plugin-docker-buildx
    settings:
      repo: YOUR_DOCKERHUB_USER/myapp
      tag: ["${CI_PIPELINE_NUMBER}", "latest"]
      username:
        from_secret: dockerhub-username
      password:
        from_secret: dockerhub-token
      platforms: linux/amd64

  - name: deploy
    image: hashicorp/vault:1.18.1
    commands:
      - export VAULT_ADDR=http://vault-active.vault.svc.cluster.local:8200
      - export VAULT_TOKEN=$(vault write -field=token auth/kubernetes/login
          role=ci jwt=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token))
      - KUBE_TOKEN=$(vault write -field=service_account_token
          kubernetes/creds/YOUR_NAMESPACE-deployer
          kubernetes_namespace=YOUR_NAMESPACE)
      - kubectl --server=https://kubernetes.default.svc
          --token=$KUBE_TOKEN
          --certificate-authority=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          -n YOUR_NAMESPACE set image deployment/myapp
          myapp=YOUR_DOCKERHUB_USER/myapp:${CI_PIPELINE_NUMBER}

Useful Commands

# Check your pods
kubectl get pods -n YOUR_NAMESPACE

# View quota usage
kubectl describe resourcequota -n YOUR_NAMESPACE

# Store/read secrets
vault kv put secret/YOUR_USERNAME/myapp KEY=value
vault kv get secret/YOUR_USERNAME/myapp

# Get a short-lived K8s deploy token
vault write kubernetes/creds/YOUR_NAMESPACE-deployer \
  kubernetes_namespace=YOUR_NAMESPACE

Important Rules

All changes go through Terraform — never kubectl apply/edit/patch directly
Never put secrets in code — use Vault: vault kv put secret/YOUR_USERNAME/...
Always use a PR — never push directly to master
Docker images: build for linux/amd64, use versioned tags (not :latest)

git-crypt setup

To decrypt the secrets, you need to setup git-crypt.

Install git-crypt.
Setup gpg keys on the machine
git-crypt unlock

This will unlock the secrets and will lock them on commit