chore: add untracked stacks, scripts, and agent configs

- New stacks: beads-server, hermes-agent
- Terragrunt tiers.tf for infra, phpipam, status-page
- Secrets symlinks for vault, phpipam, hermes-agent
- Scripts: cluster_manager, image_pull, containerd pullthrough setup
- Frigate config, audiblez-web app source, n8n workflows dir
- Claude agent: service-upgrade, reference: upgrade-config.json
- Removed: claudeception skill, excalidraw empty submodule, temp listings

[ci skip]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Viktor Barzin 2026-04-15 09:33:06 +00:00
parent bd41bb9230
commit bcad200a23
44 changed files with 3819 additions and 0 deletions


@ -0,0 +1,392 @@
---
name: service-upgrade
description: "Automated service upgrade agent. Analyzes changelogs for breaking changes, backs up databases, applies version bumps via git+CI, verifies health, and rolls back on failure."
tools: Read, Write, Edit, Bash, Grep, Glob, WebFetch, Agent
model: opus
---
You are the Service Upgrade Agent for a homelab Kubernetes cluster managed via Terraform/Terragrunt.
## Your Job
When DIUN detects a new version of a container image, you:
1. Identify the service and its .tf files
2. Look up the GitHub releases to analyze changelogs
3. Classify upgrade risk (SAFE vs CAUTION)
4. Back up databases if the service is DB-backed
5. Edit the .tf files to bump the version
6. Best-effort apply config changes from migration docs
7. Commit + push (Woodpecker CI applies via `terragrunt apply`)
8. Wait for CI to finish
9. Verify the service is healthy
10. Roll back if verification fails
11. Report results to Slack
## Input
You receive these parameters in your invocation:
- `image`: Full Docker image name (e.g., `ghcr.io/immich-app/immich-server`)
- `new_tag`: The new version tag (e.g., `v2.8.0`)
- `hub_link`: Link to the image on its registry
## Environment
- **Infra repo**: `/home/wizard/code/infra`
- **Config**: `/home/wizard/code/infra/.claude/reference/upgrade-config.json`
- **Kubeconfig**: `/home/wizard/code/infra/config`
- **Vault**: Authenticate with `vault login -method=oidc` if needed. Secrets at `secret/viktor` and `secret/platform`.
- **Git remote**: `origin` → `github.com/ViktorBarzin/infra.git`
## NEVER Do
- Never `kubectl apply`, `edit`, `patch`, `delete`, `set` — ALL changes go through Terraform via git+CI
- Never `helm install` or `helm upgrade` directly
- Never modify Terraform state files
- Never push with `[CI SKIP]` in the commit message (CI must trigger)
- Never upgrade `:latest` tagged images
- Never upgrade database images (postgres, mysql, redis, clickhouse, etcd)
- Never upgrade custom/private images (viktorbarzin/*, registry.viktorbarzin.me/*, ancamilea/*, mghee/*)
- Never upgrade infrastructure images (registry.k8s.io/*, quay.io/tigera/*, nvcr.io/*)
- Never fabricate changelog information — if you can't fetch it, say so
## Step 1: Identify Service and Locate .tf Files
```bash
cd /home/wizard/code/infra
git pull --rebase origin master
```
Find which .tf files reference this image:
```bash
grep -rl "\"${IMAGE}:" stacks/ --include="*.tf"
```
From the file path, determine the **stack name** (e.g., `stacks/immich/main.tf` → stack is `immich`).
Read the .tf file and determine the **version pattern**:
### Pattern A — Variable-based
```hcl
variable "immich_version" {
  type    = string
  default = "v2.7.4" # ← edit this default value
}
# ...
image = "ghcr.io/immich-app/immich-server:${var.immich_version}"
```
**Action**: Change the `default` value in the variable block.
### Pattern B — Hardcoded image tag
```hcl
image = "vaultwarden/server:1.35.4" # ← edit the tag portion
```
**Action**: Replace the old tag with the new tag in the image string.
### Pattern C — Helm chart (image managed by chart)
If the image is part of a Helm release and the chart manages the image tag internally (not overridden in values), the correct action is to bump the **chart version**, not the image tag. Check:
- Is there a `helm_release` in the same stack?
- Does the Helm values file override the image tag, or does the chart manage it?
- If the chart manages it: check for a new chart version and bump `version = "X.Y.Z"` in the `helm_release`.
- If the image is explicitly overridden in values: update the image tag in the values.
### Pattern D — Helm values override
```yaml
# In values.yaml or templatefile
image:
  tag: "v3.13.0" # ← edit this
```
**Action**: Update the tag in the values file.
### Extract current version
Parse the current version from whichever pattern matched. You need both `OLD_VERSION` and `NEW_VERSION` for the changelog fetch.
**Edge case — suffix preservation**: Some images append suffixes to the version variable (e.g., `${var.immich_version}-cuda`). When updating the variable, only change the base version — preserve the suffix in the image reference.
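A minimal sketch of that extraction for Pattern A; the variable name is illustrative and `NEW_VERSION` comes from the `new_tag` input:
```bash
# Hedged sketch, Pattern A: read the current default from the version variable.
TF_FILE=$(grep -rl "\"${IMAGE}:" stacks/ --include="*.tf" | head -1)
OLD_VERSION=$(grep -A3 'variable "immich_version"' "$TF_FILE" \
  | grep 'default' | sed -E 's/.*"([^"]+)".*/\1/')
echo "OLD_VERSION=${OLD_VERSION} NEW_VERSION=${NEW_VERSION}"
# Suffix edge case: if the image line reads "...:${var.immich_version}-cuda",
# keep "-cuda" in the image reference and bump only the variable default.
```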
## Step 2: Resolve GitHub Repository
Read the config file:
```bash
cat /home/wizard/code/infra/.claude/reference/upgrade-config.json
```
### Priority order:
1. **Exact match** in `github_repo_overrides` for the full image name
2. **Auto-detect** from image URL:
- `ghcr.io/ORG/REPO` → `ORG/REPO`
- `docker.io/ORG/REPO` or bare `ORG/REPO` → try `ORG/REPO` on GitHub
- `lscr.io/linuxserver/APP` → `linuxserver/docker-APP`
3. **For Helm charts**: Check `helm_chart_repo_overrides` for the chart repository URL
4. If auto-detect fails, verify the repo exists:
```bash
GITHUB_TOKEN=$(vault kv get -field=github_pat secret/viktor)
curl -sf -H "Authorization: token $GITHUB_TOKEN" \
"https://api.github.com/repos/${DETECTED_REPO}" > /dev/null
```
If 404, try stripping `-server`, `-backend`, `-app` suffixes (a sketch of this fallback follows the list).
5. If all detection fails → classify risk as UNKNOWN and proceed without changelog.
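A hedged sketch of the suffix-stripping fallback from step 4; it assumes `DETECTED_REPO` holds the auto-detected candidate and `GITHUB_TOKEN` was read from Vault as above:
```bash
GITHUB_REPO=""
for candidate in "${DETECTED_REPO}" "${DETECTED_REPO%-server}" \
                 "${DETECTED_REPO%-backend}" "${DETECTED_REPO%-app}"; do
  if curl -sf -H "Authorization: token $GITHUB_TOKEN" \
      "https://api.github.com/repos/${candidate}" > /dev/null; then
    GITHUB_REPO="${candidate}"
    break
  fi
done
# If GITHUB_REPO is still empty, classify the risk as UNKNOWN (step 5) and
# proceed without a changelog.
```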
## Step 3: Fetch Changelogs via GitHub API
```bash
GITHUB_TOKEN=$(vault kv get -field=github_pat secret/viktor)
curl -s -H "Authorization: token $GITHUB_TOKEN" \
"https://api.github.com/repos/${GITHUB_REPO}/releases?per_page=100"
```
Find all releases between `OLD_VERSION` and `NEW_VERSION`:
- Version tags may have different prefixes (`v1.0.0` vs `1.0.0`). Normalize by stripping leading `v` for comparison.
- Sort releases by semantic version.
- Extract the `body` (release notes) for each intermediate release.
- If the repo uses a CHANGELOG.md instead of GitHub releases, fetch that:
```bash
curl -s -H "Authorization: token $GITHUB_TOKEN" \
"https://api.github.com/repos/${GITHUB_REPO}/contents/CHANGELOG.md" | jq -r .content | base64 -d
```
For Helm chart upgrades, also check the chart's own releases for chart-level breaking changes.
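A hedged sketch of selecting the intermediate tags; it assumes `jq` and GNU `sort -V` are available and that tags differ only by an optional leading `v`:
```bash
OLD_V="${OLD_VERSION#v}"; NEW_V="${NEW_VERSION#v}"
curl -s -H "Authorization: token $GITHUB_TOKEN" \
  "https://api.github.com/repos/${GITHUB_REPO}/releases?per_page=100" \
  | jq -r '.[].tag_name' | sed 's/^v//' | sort -V \
  | awk -v old="$OLD_V" -v new="$NEW_V" \
      '$0 == old {inside=1; next} inside {print} $0 == new {exit}' \
  > /tmp/intermediate_tags.txt
# For each tag in the list, pull the matching release body (jq select on
# .tag_name) and collect it for the Step 4 keyword scan.
```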
## Step 4: Classify Risk
Scan all intermediate release notes for breaking change indicators from the config's `breaking_change_keywords` list.
### SAFE
- Patch or minor version bump (same major version)
- No breaking change keywords found in any release notes
- **Verification window**: 2 minutes
- **Version jump**: Direct to target version
### CAUTION
- Major version bump (different major version), OR
- Any release note contains breaking change keywords, OR
- Service is in `version_jump_always_step` list (authentik, nextcloud, immich)
- **Verification window**: 10 minutes
- **Version jump**: Step through each intermediate version
- **Extra**: DB backup even if not normally required, Slack alert before starting
### UNKNOWN
- Could not fetch changelog (GitHub API failure, no releases, auto-detect failed)
- Treat as SAFE-level precautions
- Note in commit message that changelog was unavailable
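One way to implement the keyword scan and the major-version check, as a hedged sketch; it assumes the collected release notes were concatenated into `/tmp/release_notes.txt` (the `version_jump_always_step` check is omitted for brevity):
```bash
CONFIG=/home/wizard/code/infra/.claude/reference/upgrade-config.json
HITS=$(jq -r '.breaking_change_keywords[]' "$CONFIG" \
  | grep -F -i -f /dev/stdin /tmp/release_notes.txt || true)
MAJOR_OLD="${OLD_VERSION#v}"; MAJOR_OLD="${MAJOR_OLD%%.*}"
MAJOR_NEW="${NEW_VERSION#v}"; MAJOR_NEW="${MAJOR_NEW%%.*}"
if [ -n "$HITS" ] || [ "$MAJOR_OLD" != "$MAJOR_NEW" ]; then
  RISK=CAUTION
else
  RISK=SAFE
fi
```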
## Step 5: Slack Notification — Starting
```bash
SLACK_WEBHOOK=$(vault kv get -field=alertmanager_slack_api_url secret/platform)
curl -s -X POST -H 'Content-type: application/json' \
--data "{\"text\":\"[Upgrade Agent] Starting: *${STACK}* ${OLD_VERSION} -> ${NEW_VERSION} (risk: ${RISK})\"}" \
"$SLACK_WEBHOOK"
```
For CAUTION risk, include breaking change excerpts in the Slack message.
## Step 6: Database Backup
Read `db_backed_services` from the config. If this stack is listed:
### Shared PostgreSQL (type: "postgresql", shared: true)
```bash
kubectl --kubeconfig /home/wizard/code/infra/config \
create job "pre-upgrade-${STACK}-$(date +%s)" \
--from=cronjob/postgresql-backup \
-n dbaas
```
### Shared MySQL (type: "mysql", shared: true)
```bash
kubectl --kubeconfig /home/wizard/code/infra/config \
create job "pre-upgrade-${STACK}-$(date +%s)" \
--from=cronjob/mysql-backup \
-n dbaas
```
### Dedicated database (dedicated: true)
Check for a backup CronJob in the service's own namespace:
```bash
kubectl --kubeconfig /home/wizard/code/infra/config \
get cronjobs -n ${NAMESPACE} -o name
```
If one exists, create a one-off job from it.
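For example, as a hedged sketch (the first CronJob in the namespace is assumed to be the backup job):
```bash
CRONJOB_NAME=$(kubectl --kubeconfig /home/wizard/code/infra/config \
  get cronjobs -n ${NAMESPACE} -o jsonpath='{.items[0].metadata.name}')
kubectl --kubeconfig /home/wizard/code/infra/config \
  create job "pre-upgrade-${STACK}-$(date +%s)" \
  --from="cronjob/${CRONJOB_NAME}" -n ${NAMESPACE}
```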
### Wait and verify
```bash
# kubectl wait does not expand globs in resource names, so resolve the exact
# job name created above before waiting on it
BACKUP_JOB=$(kubectl --kubeconfig /home/wizard/code/infra/config \
  get jobs -n dbaas -o name | grep "pre-upgrade-${STACK}-" | tail -1)
kubectl --kubeconfig /home/wizard/code/infra/config \
  wait --for=condition=complete --timeout=300s \
  "${BACKUP_JOB}" -n dbaas
```
Check job logs to verify backup completed successfully. **If backup fails, ABORT the upgrade and send a Slack alert.**
## Step 7: Apply Version Change
### Edit the .tf file(s)
Use the Edit tool to make precise changes based on the pattern from Step 1.
### Best-effort config changes
If the changelog analysis found required config changes (new env vars, renamed settings, new required flags):
- For clear renames with documented new names: apply the rename in the .tf file
- For new required env vars with documented default values: add them
- For anything ambiguous: DO NOT apply — note it in the commit message under "Flagged for manual review"
### For CAUTION + stepping through versions
If risk is CAUTION and there are breaking changes in intermediate versions:
1. Apply the first intermediate version
2. Commit + push + wait for CI + verify (Steps 8-9)
3. If verification passes, apply next version
4. Repeat until reaching target version
5. If any step fails, roll back to the last known-good version
## Step 8: Commit and Push
```bash
cd /home/wizard/code/infra
git add stacks/${STACK}/
git commit -m "$(cat <<'EOF'
upgrade: ${STACK} ${OLD_VERSION} -> ${NEW_VERSION}
Changelog summary: <1-3 line summary of what changed>
Risk: SAFE|CAUTION|UNKNOWN
Breaking changes: none|<list of breaking changes>
DB backup: yes (job: pre-upgrade-${STACK}-XXXXX)|no (not DB-backed)|skipped
Config changes applied: none|<list>
Flagged for manual review: none|<list of ambiguous changes>
Co-Authored-By: Service Upgrade Agent <noreply@viktorbarzin.me>
EOF
)"
git push origin master
```
Record the commit SHA — you'll need it for rollback:
```bash
UPGRADE_SHA=$(git rev-parse HEAD)
```
**If push fails** (conflict with CI state commit): `git pull --rebase origin master && git push origin master`. Retry up to 3 times.
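A minimal sketch of that retry loop:
```bash
for attempt in 1 2 3; do
  git push origin master && break
  echo "push failed (attempt ${attempt}); rebasing onto remote and retrying"
  git pull --rebase origin master
done
```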
## Step 9: Wait for Woodpecker CI
The commit triggers the `app-stacks.yml` pipeline (or `default.yml` for platform stacks).
```bash
WOODPECKER_TOKEN=$(vault kv get -field=woodpecker_token secret/viktor)
```
Poll for the pipeline triggered by our commit:
```bash
# Get latest pipeline
curl -s -H "Authorization: Bearer $WOODPECKER_TOKEN" \
"https://ci.viktorbarzin.me/api/repos/1/pipelines?page=1&per_page=5"
```
Find the pipeline matching our commit SHA. Poll every 30 seconds until status is `success`, `failure`, `error`, or `killed`. Timeout after 15 minutes.
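A hedged sketch of the polling loop; the response shape (an array with `.commit` and `.status` per pipeline) is an assumption to verify against the running Woodpecker version:
```bash
deadline=$(( $(date +%s) + 900 ))   # 15-minute timeout
CI_STATUS="pending"
while [ "$(date +%s)" -lt "$deadline" ]; do
  CI_STATUS=$(curl -s -H "Authorization: Bearer $WOODPECKER_TOKEN" \
    "https://ci.viktorbarzin.me/api/repos/1/pipelines?page=1&per_page=5" \
    | jq -r --arg sha "$UPGRADE_SHA" \
        '[.[] | select(.commit == $sha) | .status] | first // "pending"')
  case "$CI_STATUS" in
    success|failure|error|killed) break ;;
  esac
  sleep 30
done
```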
**If CI fails** → proceed to Step 10 (rollback).
**If CI succeeds** → proceed to verification.
## Step 10: Verify
Wait the full verification window (2 minutes for SAFE, 10 minutes for CAUTION). During the window, run checks every 15 seconds.
### Check A: Pod readiness
```bash
kubectl --kubeconfig /home/wizard/code/infra/config \
get pods -n ${NAMESPACE} -l app=${STACK} -o json
```
- All pods must be `Ready` (condition type=Ready, status=True)
- No pod in `CrashLoopBackOff` or `Error` state
- Restart count must not increase during the window
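One way to evaluate the first two conditions from that JSON, as a hedged sketch (restart counts would additionally be compared against a snapshot taken at the start of the window):
```bash
PODS_JSON=$(kubectl --kubeconfig /home/wizard/code/infra/config \
  get pods -n ${NAMESPACE} -l app=${STACK} -o json)
NOT_READY=$(echo "$PODS_JSON" | jq '[.items[]
  | select(any(.status.conditions[]?; .type=="Ready" and .status=="True") | not)] | length')
CRASHING=$(echo "$PODS_JSON" | jq '[.items[].status.containerStatuses[]?
  | select(.state.waiting.reason == "CrashLoopBackOff")] | length')
if [ "$NOT_READY" -ne 0 ] || [ "$CRASHING" -ne 0 ]; then
  echo "Check A failed: ${NOT_READY} pod(s) not ready, ${CRASHING} crash-looping"
fi
```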
### Check B: HTTP health (if service has ingress)
Determine the service URL. Most services use `https://<stack>.viktorbarzin.me`.
```bash
curl -sf -o /dev/null -w "%{http_code}" \
"https://${STACK}.viktorbarzin.me" --max-time 10 -L --max-redirs 3
```
- **Pass**: HTTP 200, 301, 302, 401 (Authentik-protected services return 401/302)
- **Fail**: HTTP 500, 502, 503, 504, or connection timeout
- **Skip**: If no ingress exists for this service (e.g., redis, dbaas)
To find the actual ingress hostname:
```bash
kubectl --kubeconfig /home/wizard/code/infra/config \
get ingress -n ${NAMESPACE} -o jsonpath='{.items[*].spec.rules[*].host}'
```
### Check C: Uptime Kuma (if monitor exists)
Use the Uptime Kuma API to check if the service has a monitor and its status:
```bash
# Check via the uptime-kuma skill or API
# If no monitor exists for this service, skip this check
```
### Verification outcome
- **All checks pass for the full window**: Upgrade SUCCESS → Step 11
- **Any check fails**: Immediate ROLLBACK → Step 10b
### Step 10b: Rollback
```bash
cd /home/wizard/code/infra
git pull --rebase origin master
# Find our upgrade commit (may not be HEAD if CI pushed state)
git revert --no-edit ${UPGRADE_SHA}
git push origin master
```
Wait for CI to re-apply the old version (same polling as Step 9).
Re-run verification checks to confirm rollback succeeded. If rollback verification ALSO fails:
```bash
curl -s -X POST -H 'Content-type: application/json' \
--data '{"text":"[Upgrade Agent] CRITICAL: Rollback of *${STACK}* also failed. Manual intervention required."}' \
"$SLACK_WEBHOOK"
```
## Step 11: Report Results
### On success
```bash
curl -s -X POST -H 'Content-type: application/json' \
--data "{\"text\":\"[Upgrade Agent] SUCCESS: *${STACK}* upgraded ${OLD_VERSION} -> ${NEW_VERSION}\nVerification: pods ready, HTTP OK${UPTIME_KUMA_MSG}\nCommit: ${UPGRADE_SHA}\"}" \
"$SLACK_WEBHOOK"
```
### On failure + rollback
```bash
curl -s -X POST -H 'Content-type: application/json' \
--data "{\"text\":\"[Upgrade Agent] FAILED + ROLLED BACK: *${STACK}* ${OLD_VERSION} -> ${NEW_VERSION}\nReason: ${FAILURE_REASON}\nRollback commit: ${ROLLBACK_SHA}\nRollback status: ${ROLLBACK_STATUS}\"}" \
"$SLACK_WEBHOOK"
```
## Edge Cases
### Multiple images in same stack
If DIUN fires separate webhooks for different images in the same stack (e.g., Immich server + ML), the second invocation should:
1. Check if the stack was upgraded in the last 10 minutes (look at recent git log; a sketch follows this list)
2. If so, check if the new image is already at the target version
3. If not, apply the second image update as a follow-up commit
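A minimal sketch of that recency check:
```bash
cd /home/wizard/code/infra && git pull --rebase origin master
# Any upgrade commit touching this stack in the last 10 minutes?
git log --since="10 minutes ago" --oneline -- "stacks/${STACK}/"
# If one is listed, read the .tf file first to see whether the second image is
# already at its target tag before pushing a follow-up bump.
```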
### Helm chart with atomic=true
Services like Authentik and Kyverno use `atomic = true`. If the Helm release fails, it auto-rolls back at the Helm level. The agent should still do its own verification, but can trust the deployment state.
### Services without standard app label
Some services use different label selectors. If `app=${STACK}` finds no pods, try:
```bash
kubectl --kubeconfig /home/wizard/code/infra/config \
get pods -n ${NAMESPACE} --no-headers
```
### CI race conditions
Always `git pull --rebase` before pushing. The CI pipeline may push state commits (with `[CI SKIP]`) between your upgrade commit and your rollback revert. The revert targets `${UPGRADE_SHA}` specifically, so this is safe.
### Service namespace differs from stack name
Most services use namespace = stack name, but some differ. Read the .tf file to find:
```hcl
resource "kubernetes_namespace" "..." {
metadata {
name = "actual-namespace"
}
}
```


@ -0,0 +1,166 @@
{
"github_repo_overrides": {
"ghcr.io/immich-app/immich-server": "immich-app/immich",
"ghcr.io/immich-app/immich-machine-learning": "immich-app/immich",
"docker.io/vaultwarden/server": "dani-garcia/vaultwarden",
"vaultwarden/server": "dani-garcia/vaultwarden",
"docker.io/mailserver/docker-mailserver": "docker-mailserver/docker-mailserver",
"mailserver/docker-mailserver": "docker-mailserver/docker-mailserver",
"docker.n8n.io/n8nio/n8n": "n8n-io/n8n",
"matrixdotorg/synapse": "element-hq/synapse",
"headscale/headscale": "juanfont/headscale",
"technitium/dns-server": "TechnitiumSoftware/DnsServer",
"ghcr.io/paperless-ngx/paperless-ngx": "paperless-ngx/paperless-ngx",
"ghcr.io/blakeblackshear/frigate": "blakeblackshear/frigate",
"ghcr.io/dgtlmoon/changedetection.io": "dgtlmoon/changedetection.io",
"ghcr.io/linkwarden/linkwarden": "linkwarden/linkwarden",
"ghcr.io/open-webui/open-webui": "open-webui/open-webui",
"ghcr.io/advplyr/audiobookshelf": "advplyr/audiobookshelf",
"ghcr.io/browserless/chromium": "browserless/chromium",
"ghcr.io/rybbit-io/rybbit-backend": "rybbit-io/rybbit",
"ghcr.io/rybbit-io/rybbit-client": "rybbit-io/rybbit",
"ghcr.io/gurucomputing/headscale-ui": "gurucomputing/headscale-ui",
"ghcr.io/dmunozv04/isponsorblocktv": "dmunozv04/iSponsorBlockTV",
"ghcr.io/gramps-project/grampsweb": "gramps-project/gramps-web",
"ghcr.io/project-osrm/osrm-backend": "Project-OSRM/osrm-backend",
"ghcr.io/flaresolverr/flaresolverr": "FlareSolverr/FlareSolverr",
"ghcr.io/therobbiedavis/listenarr": "therobbiedavis/listenarr",
"ghcr.io/immichframe/immichframe": "immichframe/ImmichFrame",
"lscr.io/linuxserver/qbittorrent": "linuxserver/docker-qbittorrent",
"lscr.io/linuxserver/lidarr": "linuxserver/docker-lidarr",
"lscr.io/linuxserver/prowlarr": "linuxserver/docker-prowlarr",
"lscr.io/linuxserver/readarr": "linuxserver/docker-readarr",
"lscr.io/linuxserver/speedtest-tracker": "linuxserver/docker-speedtest-tracker",
"privatebin/nginx-fpm-alpine": "PrivateBin/PrivateBin",
"freshrss/freshrss": "FreshRSS/FreshRSS",
"hackmdio/hackmd": "hackmdio/codimd",
"onlyoffice/documentserver": "ONLYOFFICE/DocumentServer",
"netboxcommunity/netbox": "netbox-community/netbox",
"stirlingtools/stirling-pdf": "Stirling-Tools/Stirling-PDF",
"phpipam/phpipam-www": "phpipam/phpipam",
"rhasspy/wyoming-whisper": "rhasspy/wyoming-addons",
"rhasspy/wyoming-piper": "rhasspy/wyoming-addons",
"clickhouse/clickhouse-server": "ClickHouse/ClickHouse",
"docker.io/athomasson2/ebook2audiobook": "athomasson2/ebook2audiobook",
"amruthpillai/reactive-resume": "AmruthPillworking/Reactive-Resume",
"dpage/pgadmin4": "pgadmin-org/pgadmin4",
"ghcr.io/yourok/torrserver": "YouROK/TorrServer",
"opentripplanner/opentripplanner": "opentripplanner/OpenTripPlanner",
"codeberg.org/forgejo/forgejo": "forgejo/forgejo",
"shlinkio/shlink": "shlinkio/shlink",
"shlinkio/shlink-web-client": "shlinkio/shlink-web-client",
"dgtlmoon/sockpuppetbrowser": "dgtlmoon/sockpuppetbrowser"
},
"helm_chart_repo_overrides": {
"https://charts.goauthentik.io/": "goauthentik/authentik",
"https://traefik.github.io/charts": "traefik/traefik-helm-chart",
"https://kyverno.github.io/kyverno/": "kyverno/kyverno",
"https://mysql.github.io/mysql-operator/": "mysql/mysql-operator",
"https://cloudnative-pg.github.io/charts": "cloudnative-pg/cloudnative-pg",
"https://charts.external-secrets.io": "external-secrets/external-secrets",
"https://metallb.github.io/metallb": "metallb/metallb",
"https://nextcloud.github.io/helm/": "nextcloud/helm",
"https://crowdsecurity.github.io/helm-charts": "crowdsecurity/helm-charts",
"https://helm.releases.hashicorp.com": "hashicorp/vault-helm",
"https://bitnami-labs.github.io/sealed-secrets": "bitnami-labs/sealed-secrets",
"https://grafana.github.io/helm-charts": "grafana/helm-charts",
"https://prometheus-community.github.io/helm-charts": "prometheus-community/helm-charts",
"https://democratic-csi.github.io/charts/": "democratic-csi/democratic-csi",
"https://stakater.github.io/stakater-charts": "stakater/Reloader",
"https://topolvm.github.io/pvc-autoresizer": "topolvm/pvc-autoresizer",
"https://kubernetes-sigs.github.io/descheduler/": "kubernetes-sigs/descheduler",
"https://kubernetes-sigs.github.io/metrics-server/": "kubernetes-sigs/metrics-server",
"https://charts.fairwinds.com/stable": "FairwindsOps/goldilocks",
"https://helm.ngc.nvidia.com/nvidia": "NVIDIA/gpu-operator",
"oci://ghcr.io/woodpecker-ci/helm": "woodpecker-ci/helm",
"oci://10.0.20.10:5000/bitnamicharts": "bitnami/charts"
},
"db_backed_services": {
"affine": { "type": "postgresql", "db_name": "affine", "shared": true },
"claude-memory": { "type": "postgresql", "db_name": "claude_memory", "shared": true },
"crowdsec": { "type": "postgresql", "db_name": "crowdsec", "shared": true },
"dawarich": { "type": "postgresql", "db_name": "dawarich", "shared": true },
"health": { "type": "postgresql", "db_name": "health", "shared": true },
"linkwarden": { "type": "postgresql", "db_name": "linkwarden", "shared": true },
"matrix": { "type": "postgresql", "db_name": "matrix", "shared": true },
"n8n": { "type": "postgresql", "db_name": "n8n", "shared": true },
"netbox": { "type": "postgresql", "db_name": "netbox", "shared": true },
"rybbit": { "type": "postgresql", "db_name": "rybbit", "shared": true },
"tandoor": { "type": "postgresql", "db_name": "tandoor", "shared": true },
"technitium": { "type": "postgresql", "db_name": "technitium", "shared": true },
"trading-bot": { "type": "postgresql", "db_name": "trading_bot", "shared": true },
"woodpecker": { "type": "postgresql", "db_name": "woodpecker", "shared": true },
"immich": { "type": "postgresql", "db_name": "immich", "dedicated": true, "backup_cronjob": "postgresql-backup", "backup_namespace": "immich" },
"authentik": { "type": "postgresql", "dedicated": true, "notes": "Uses PgBouncer, managed by Helm chart" },
"hackmd": { "type": "mysql", "db_name": "codimd", "shared": true },
"mailserver": { "type": "mysql", "db_name": "mailserver", "shared": true },
"monitoring": { "type": "mysql", "db_name": "monitoring", "shared": true, "notes": "Grafana backend" },
"nextcloud": { "type": "mysql", "db_name": "nextcloud", "shared": true },
"onlyoffice": { "type": "mysql", "db_name": "onlyoffice", "shared": true },
"paperless-ngx": { "type": "mysql", "db_name": "paperless_ngx", "shared": true },
"phpipam": { "type": "mysql", "db_name": "phpipam", "shared": true },
"real-estate-crawler": { "type": "mysql", "db_name": "wrongmove", "shared": true },
"speedtest": { "type": "mysql", "db_name": "speedtest", "shared": true },
"url": { "type": "mysql", "db_name": "shlink", "shared": true },
"vault": { "type": "mysql", "db_name": "vault", "shared": true }
},
"backup_infrastructure": {
"postgresql": {
"cronjob_name": "postgresql-backup",
"namespace": "dbaas",
"credential_secret": "pg-cluster-superuser",
"credential_key": "password",
"host": "pg-cluster-rw.dbaas",
"backup_pvc": "dbaas-postgresql-backup-host"
},
"mysql": {
"cronjob_name": "mysql-backup",
"namespace": "dbaas",
"credential_secret": "cluster-secret",
"credential_key": "ROOT_PASSWORD",
"host": "mysql.dbaas",
"backup_pvc": "dbaas-mysql-backup-host"
}
},
"version_jump_always_step": [
"authentik",
"nextcloud",
"immich"
],
"auto_detect_rules": {
"ghcr.io/{org}/{repo}": "Use org/repo directly, strip -server/-backend suffixes if repo 404s",
"docker.io/{org}/{repo}": "Try org/repo on GitHub",
"lscr.io/linuxserver/{app}": "Map to linuxserver/docker-{app}",
"quay.io/{org}/{repo}": "Try org/repo on GitHub",
"registry.gitlab.com/{org}/{repo}": "Try org/repo on GitHub (may be GitLab-only)"
},
"skip_image_patterns": [
"viktorbarzin/*",
"registry.viktorbarzin.me/*",
"ancamilea/*",
"mghee/*",
"*postgres*",
"*mysql*",
"*redis*",
"*clickhouse*",
"*etcd*",
"registry.k8s.io/*",
"quay.io/tigera/*",
"quay.io/metallb/*",
"nvcr.io/*",
"reg.kyverno.io/*"
],
"breaking_change_keywords": [
"breaking",
"BREAKING",
"migration required",
"schema change",
"database migration",
"manual intervention",
"action required",
"removed",
"deprecated",
"renamed",
"incompatible"
]
}


@ -0,0 +1,53 @@
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
venv/
.venv/
ENV/
# Node
node_modules/
frontend/dist/
npm-debug.log*
yarn-debug.log*
yarn-error.log*
# IDE
.idea/
.vscode/
*.swp
*.swo
*~
# OS
.DS_Store
._*
Thumbs.db
# Uploads and outputs (runtime data)
uploads/
outputs/
# Environment
.env
.env.local
.env.*.local


@ -0,0 +1,35 @@
FROM viktorbarzin/audiblez:latest
# Install Node.js for building frontend
RUN apt-get update && \
apt-get install -y curl && \
curl -fsSL https://deb.nodesource.com/setup_22.x | bash - && \
apt-get install -y nodejs && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
WORKDIR /app
# Build frontend
COPY frontend/package.json frontend/package-lock.json* ./frontend/
WORKDIR /app/frontend
RUN npm install
COPY frontend/ ./
RUN npm run build
# Install backend dependencies
WORKDIR /app/backend
COPY backend/requirements.txt ./
RUN pip install --no-cache-dir --break-system-packages -r requirements.txt
COPY backend/ ./
# Copy voice samples
COPY samples/ /app/samples/
WORKDIR /app/backend
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]


@ -0,0 +1,52 @@
# Audiblez Web UI
Web interface for converting EPUB files to audiobooks using [audiblez](https://github.com/santinic/audiblez).
<img width="1702" height="1145" alt="image" src="https://github.com/user-attachments/assets/ba0f9090-a8e9-4550-9c9b-473058f19cbb" />
## Features
- Upload EPUB files via drag & drop
- Select from 50+ voices across multiple languages
- Preview voice samples before converting
- Real-time progress updates via WebSocket
- Download completed audiobooks
## Development
### Backend
```bash
cd backend
pip install -r requirements.txt
uvicorn main:app --reload
```
### Frontend
```bash
cd frontend
npm install
npm run dev
```
### Voice Samples
Generate voice samples (requires audiblez environment):
```bash
python generate_samples.py samples/
```
## Docker Build
```bash
docker build -t audiblez-web .
docker run -p 8000:8000 -v /path/to/data:/mnt audiblez-web
```
## Deployment
Deployed to Kubernetes via Terraform. The service mounts NFS storage at `/mnt` for:
- `/mnt/uploads` - Uploaded EPUB files
- `/mnt/outputs` - Generated audiobooks


@ -0,0 +1,104 @@
"""
Authentication module for extracting user identity from Authentik headers.
When nginx ingress is protected with Authentik, these headers are forwarded:
- X-Authentik-Username: The user's username
- X-Authentik-Uid: Unique user ID (used for directory separation)
- X-Authentik-Email: User's email
- X-Authentik-Name: User's display name
- X-Authentik-Groups: Comma-separated group list
"""
from dataclasses import dataclass
from fastapi import Request, HTTPException
from typing import Optional
import re
@dataclass
class User:
"""Represents an authenticated user from Authentik."""
uid: str
username: str
email: Optional[str] = None
name: Optional[str] = None
groups: list[str] = None
def __post_init__(self):
if self.groups is None:
self.groups = []
def sanitize_user_id(uid: str) -> str:
"""
Sanitize user ID for use as a directory name.
Only allows alphanumeric, hyphens, and underscores.
"""
if not uid:
raise ValueError("User ID cannot be empty")
# Only allow safe characters for filesystem
safe_uid = re.sub(r'[^a-zA-Z0-9\-_]', '', uid)
if not safe_uid:
raise ValueError("User ID contains no valid characters")
# Limit length to prevent path issues
if len(safe_uid) > 64:
safe_uid = safe_uid[:64]
return safe_uid
async def get_current_user(request: Request) -> User:
"""
Extract user information from Authentik headers.
This is a FastAPI dependency that should be used on protected endpoints.
Raises 401 if user headers are not present (not authenticated).
"""
# Header names are case-insensitive, but commonly forwarded as:
uid = request.headers.get("X-Authentik-Uid")
username = request.headers.get("X-Authentik-Username")
email = request.headers.get("X-Authentik-Email")
name = request.headers.get("X-Authentik-Name")
groups_str = request.headers.get("X-Authentik-Groups", "")
# For development/testing, check for alternative header names
if not uid:
uid = request.headers.get("X-Authentik-Userid")
if not uid:
uid = request.headers.get("Remote-User")
if not uid or not username:
raise HTTPException(
status_code=401,
detail="Authentication required. Authentik headers not found."
)
try:
safe_uid = sanitize_user_id(uid)
except ValueError as e:
raise HTTPException(status_code=400, detail=str(e))
# Parse groups (comma-separated)
groups = [g.strip() for g in groups_str.split(",") if g.strip()]
return User(
uid=safe_uid,
username=username,
email=email,
name=name,
groups=groups
)
async def get_optional_user(request: Request) -> Optional[User]:
"""
Extract user information if available, or return None.
Use this for endpoints that work with or without authentication.
"""
try:
return await get_current_user(request)
except HTTPException:
return None


@ -0,0 +1,307 @@
from fastapi import APIRouter, UploadFile, File, HTTPException, Depends
from fastapi.responses import FileResponse
from pydantic import BaseModel
from pathlib import Path
import shutil
import asyncio
import re
from models.schemas import Voice, JobCreate, Job, JobProgress, ChapterInfo
from services.voices import get_all_voices, get_voices_by_language, get_voice
from services.converter import job_manager
from api.auth import User, get_current_user
router = APIRouter(prefix="/api")
def sanitize_filename(filename: str, max_length: int = 200) -> str:
"""
Sanitize a filename to prevent path traversal and shell injection.
Only allows alphanumeric characters, spaces, hyphens, underscores, parentheses, and dots.
"""
if not filename:
raise ValueError("Filename cannot be empty")
# Remove any path components (prevent path traversal)
filename = Path(filename).name
# Only allow safe characters: alphanumeric, space, hyphen, underscore, parentheses, dot
# This regex removes anything that isn't in the allowed set
safe_filename = re.sub(r'[^a-zA-Z0-9\s\-_().]', '', filename)
# Collapse multiple spaces/dots
safe_filename = re.sub(r'\s+', ' ', safe_filename)
safe_filename = re.sub(r'\.+', '.', safe_filename)
# Strip leading/trailing whitespace and dots
safe_filename = safe_filename.strip(' .')
# Limit length
if len(safe_filename) > max_length:
safe_filename = safe_filename[:max_length]
if not safe_filename:
raise ValueError("Filename contains no valid characters")
return safe_filename
class RenameRequest(BaseModel):
new_name: str
# ============================================================================
# Voice endpoints (no auth required - public info)
# ============================================================================
@router.get("/voices", response_model=list[Voice])
async def list_voices():
"""Get all available voices."""
return get_all_voices()
@router.get("/voices/grouped")
async def list_voices_grouped():
"""Get voices grouped by language."""
return get_voices_by_language()
@router.get("/voices/{voice_id}/sample")
async def get_voice_sample(voice_id: str):
"""Get voice sample audio file."""
voice = get_voice(voice_id)
if not voice:
raise HTTPException(status_code=404, detail="Voice not found")
# Try NFS storage first (persistent), then bundled samples
sample_path = Path("/mnt/samples") / f"{voice_id}.mp3"
if not sample_path.exists():
sample_path = Path("/app/samples") / f"{voice_id}.mp3"
if not sample_path.exists():
raise HTTPException(status_code=404, detail="Sample not available")
return FileResponse(sample_path, media_type="audio/mpeg")
# ============================================================================
# User info endpoint
# ============================================================================
@router.get("/me")
async def get_current_user_info(user: User = Depends(get_current_user)):
"""Get current authenticated user info."""
return {
"uid": user.uid,
"username": user.username,
"email": user.email,
"name": user.name,
"groups": user.groups
}
# ============================================================================
# Upload endpoints (user-scoped)
# ============================================================================
@router.post("/upload")
async def upload_file(file: UploadFile = File(...), user: User = Depends(get_current_user)):
"""Upload an EPUB file to user's directory."""
if not file.filename.endswith(".epub"):
raise HTTPException(status_code=400, detail="Only EPUB files are supported")
# Save file to user's uploads directory
upload_dir = job_manager.get_user_uploads_dir(user.uid)
# Sanitize the filename
try:
safe_filename = sanitize_filename(file.filename)
except ValueError as e:
raise HTTPException(status_code=400, detail=str(e))
file_path = upload_dir / safe_filename
with file_path.open("wb") as buffer:
shutil.copyfileobj(file.file, buffer)
return {"filename": safe_filename, "size": file_path.stat().st_size}
# ============================================================================
# Job endpoints (user-scoped)
# ============================================================================
@router.post("/jobs", response_model=Job)
async def create_job(job_create: JobCreate, user: User = Depends(get_current_user)):
"""Create a new conversion job."""
# Verify file exists in user's uploads
file_path = job_manager.get_user_uploads_dir(user.uid) / job_create.filename
if not file_path.exists():
raise HTTPException(status_code=404, detail="File not found")
# Verify voice exists
voice = get_voice(job_create.voice)
if not voice:
raise HTTPException(status_code=404, detail="Voice not found")
# Create job with user ownership
job = job_manager.create_job(
user_id=user.uid,
filename=job_create.filename,
voice=job_create.voice,
speed=job_create.speed,
use_gpu=job_create.use_gpu
)
# Start conversion in background
asyncio.create_task(job_manager.run_conversion(job.id))
return job
@router.get("/jobs", response_model=list[Job])
async def list_jobs(user: User = Depends(get_current_user)):
"""Get all jobs for current user."""
return job_manager.get_user_jobs(user.uid)
@router.get("/jobs/{job_id}", response_model=Job)
async def get_job(job_id: str, user: User = Depends(get_current_user)):
"""Get a specific job (must be owned by user)."""
job = job_manager.get_job(job_id, user.uid)
if not job:
raise HTTPException(status_code=404, detail="Job not found")
return job
@router.get("/jobs/{job_id}/download")
async def download_job(job_id: str, user: User = Depends(get_current_user)):
"""Download the completed audiobook."""
job = job_manager.get_job(job_id, user.uid)
if not job:
raise HTTPException(status_code=404, detail="Job not found")
if job.status != "completed":
raise HTTPException(status_code=400, detail="Job not completed")
if not job.output_file:
raise HTTPException(status_code=404, detail="Output file not found")
output_path = job_manager.get_user_outputs_dir(user.uid) / job_id / job.output_file
if not output_path.exists():
raise HTTPException(status_code=404, detail="Output file not found")
return FileResponse(
output_path,
media_type="audio/mp4",
filename=job.output_file
)
@router.get("/jobs/{job_id}/chapters", response_model=list[ChapterInfo])
async def get_job_chapters(job_id: str, user: User = Depends(get_current_user)):
"""Get chapter metadata for a job's audiobook."""
job = job_manager.get_job(job_id, user.uid)
if not job:
raise HTTPException(status_code=404, detail="Job not found")
if job.status != "completed":
raise HTTPException(status_code=400, detail="Job not completed")
return job.chapters
@router.delete("/jobs/{job_id}")
async def delete_job(job_id: str, user: User = Depends(get_current_user)):
"""Delete a job (must be owned by user)."""
if not job_manager.delete_job(job_id, user.uid):
raise HTTPException(status_code=404, detail="Job not found")
return {"status": "deleted"}
# ============================================================================
# Audiobook endpoints (user-scoped)
# ============================================================================
@router.get("/audiobooks")
async def list_audiobooks(user: User = Depends(get_current_user)):
"""List all completed audiobooks for current user."""
return job_manager.get_user_audiobooks(user.uid)
@router.get("/audiobooks/{audiobook_id}/download")
async def download_audiobook(audiobook_id: str, user: User = Depends(get_current_user)):
"""Download an audiobook by its ID (job folder name)."""
output_dir = job_manager.get_user_outputs_dir(user.uid) / audiobook_id
if not output_dir.exists():
raise HTTPException(status_code=404, detail="Audiobook not found")
# Find the audio file
audio_files = list(output_dir.glob("*.m4b")) + list(output_dir.glob("*.mp3"))
if not audio_files:
raise HTTPException(status_code=404, detail="Audio file not found")
audio_file = audio_files[0]
media_type = "audio/mp4" if audio_file.suffix == ".m4b" else "audio/mpeg"
return FileResponse(
audio_file,
media_type=media_type,
filename=audio_file.name
)
@router.delete("/audiobooks/{audiobook_id}")
async def delete_audiobook(audiobook_id: str, user: User = Depends(get_current_user)):
"""Delete an audiobook and its folder."""
output_dir = job_manager.get_user_outputs_dir(user.uid) / audiobook_id
if not output_dir.exists():
raise HTTPException(status_code=404, detail="Audiobook not found")
# Delete all files in the directory and the directory itself
for file in output_dir.iterdir():
file.unlink()
output_dir.rmdir()
return {"status": "deleted"}
@router.patch("/audiobooks/{audiobook_id}/rename")
async def rename_audiobook(audiobook_id: str, rename_request: RenameRequest, user: User = Depends(get_current_user)):
"""Rename an audiobook file. Input is sanitized to prevent path traversal and injection."""
output_dir = job_manager.get_user_outputs_dir(user.uid) / audiobook_id
if not output_dir.exists():
raise HTTPException(status_code=404, detail="Audiobook not found")
# Find the audio file
audio_files = list(output_dir.glob("*.m4b")) + list(output_dir.glob("*.mp3"))
if not audio_files:
raise HTTPException(status_code=404, detail="Audio file not found")
current_file = audio_files[0]
current_extension = current_file.suffix # .m4b or .mp3
# Sanitize the new name
try:
safe_name = sanitize_filename(rename_request.new_name)
except ValueError as e:
raise HTTPException(status_code=400, detail=str(e))
# Ensure the new name has the correct extension
if not safe_name.lower().endswith(current_extension.lower()):
safe_name = safe_name + current_extension
# Create the new path (same directory, new filename)
new_file = output_dir / safe_name
# Check if target already exists
if new_file.exists() and new_file != current_file:
raise HTTPException(status_code=400, detail="A file with that name already exists")
# Rename the file using pathlib (no shell commands)
current_file.rename(new_file)
return {"status": "renamed", "new_filename": safe_name}


@ -0,0 +1,101 @@
from fastapi import WebSocket, WebSocketDisconnect, HTTPException
from services.converter import job_manager
from models.schemas import JobProgress
from api.auth import sanitize_user_id


class ConnectionManager:
    """Manages WebSocket connections for job progress updates."""

    def __init__(self):
        self.active_connections: dict[str, list[WebSocket]] = {}

    async def connect(self, job_id: str, websocket: WebSocket):
        """Connect a websocket for a specific job."""
        await websocket.accept()
        if job_id not in self.active_connections:
            self.active_connections[job_id] = []
        self.active_connections[job_id].append(websocket)

    def disconnect(self, job_id: str, websocket: WebSocket):
        """Disconnect a websocket."""
        if job_id in self.active_connections:
            if websocket in self.active_connections[job_id]:
                self.active_connections[job_id].remove(websocket)
            if not self.active_connections[job_id]:
                del self.active_connections[job_id]

    async def send_progress(self, job_id: str, progress: JobProgress):
        """Send progress update to all connected clients for a job."""
        if job_id in self.active_connections:
            disconnected = []
            for connection in self.active_connections[job_id]:
                try:
                    await connection.send_json(progress.model_dump())
                except:
                    disconnected.append(connection)
            # Remove disconnected clients
            for conn in disconnected:
                self.disconnect(job_id, conn)


manager = ConnectionManager()


def get_user_from_websocket(websocket: WebSocket) -> str | None:
    """
    Extract user ID from websocket headers.
    WebSocket connections receive HTTP headers during the upgrade handshake.
    """
    # Try various header name formats
    uid = websocket.headers.get("x-authentik-uid")
    if not uid:
        uid = websocket.headers.get("X-Authentik-Uid")
    if not uid:
        uid = websocket.headers.get("x-authentik-userid")
    if not uid:
        uid = websocket.headers.get("remote-user")
    if uid:
        try:
            return sanitize_user_id(uid)
        except ValueError:
            return None
    return None


async def websocket_endpoint(websocket: WebSocket, job_id: str):
    """WebSocket endpoint for job progress updates."""
    # Extract user from headers
    user_id = get_user_from_websocket(websocket)
    # Verify job exists and user has access
    job = job_manager.get_job(job_id, user_id)
    if not job:
        # Close connection if job not found or not owned by user
        await websocket.close(code=4004, reason="Job not found or access denied")
        return
    await manager.connect(job_id, websocket)

    # Register progress callback
    async def progress_callback(progress: JobProgress):
        await manager.send_progress(job_id, progress)

    job_manager.register_progress_callback(job_id, progress_callback)
    try:
        # Send initial status
        await websocket.send_json({
            "progress": job.progress,
            "status": job.status,
        })
        # Wait for messages (keep-alive)
        while True:
            await websocket.receive_text()
    except WebSocketDisconnect:
        manager.disconnect(job_id, websocket)
        job_manager.unregister_progress_callback(job_id, progress_callback)


@ -0,0 +1,41 @@
from fastapi import FastAPI
from fastapi.staticfiles import StaticFiles
from fastapi.middleware.cors import CORSMiddleware
from pathlib import Path
from api.routes import router
from api.websocket import websocket_endpoint

app = FastAPI(title="Audiblez Web API", version="1.0.0")

# CORS middleware for development
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)


# Health check - must be before static mount
@app.get("/health")
async def health_check():
    return {"status": "healthy"}


# Include API routes
app.include_router(router)


# WebSocket endpoint
@app.websocket("/ws/jobs/{job_id}")
async def websocket_route(websocket, job_id: str):
    await websocket_endpoint(websocket, job_id)


# Serve static frontend files - MUST BE LAST as it catches all routes
static_dir = Path("/app/frontend/dist")
if static_dir.exists():
    app.mount("/", StaticFiles(directory=str(static_dir), html=True), name="static")

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)


@ -0,0 +1,59 @@
from pydantic import BaseModel, Field
from typing import Optional, Literal
from datetime import datetime
from enum import Enum


class JobStatus(str, Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"


class Voice(BaseModel):
    id: str
    name: str
    language: str
    gender: Literal["M", "F"]
    quality: str = "medium"


class JobCreate(BaseModel):
    filename: str
    voice: str
    speed: float = Field(default=1.0, ge=0.5, le=2.0)
    use_gpu: bool = True


class ChapterInfo(BaseModel):
    """Chapter metadata extracted from EPUB and embedded in M4B."""
    title: str
    start_ms: int
    end_ms: int


class JobProgress(BaseModel):
    progress: float = Field(ge=0, le=100)
    eta: Optional[str] = None
    current_chapter: Optional[str] = None
    total_chapters: Optional[int] = None
    status: JobStatus


class Job(BaseModel):
    id: str
    user_id: str  # User who owns this job
    filename: str
    voice: str
    speed: float
    use_gpu: bool
    status: JobStatus
    progress: float = 0
    created_at: datetime
    updated_at: datetime
    error: Optional[str] = None
    output_file: Optional[str] = None
    total_chapters: int = 0
    chapters: list[ChapterInfo] = []


@ -0,0 +1,11 @@
fastapi==0.115.0
uvicorn[standard]==0.32.0
python-multipart==0.0.12
websockets==13.1
aiofiles==24.1.0
pydantic==2.9.2
pydantic-settings==2.6.0
ebooklib>=0.18
pydub>=0.25.1
beautifulsoup4>=4.12.0
lxml>=5.0.0


@ -0,0 +1,156 @@
"""M4B chapter metadata embedding service."""
import re
import subprocess
import tempfile
from pathlib import Path
from pydub import AudioSegment
from .epub_parser import Chapter
def get_chapter_audio_durations(output_dir: Path) -> list[int]:
"""Calculate duration of each chapter WAV file in milliseconds.
audiblez produces files like: {bookname}_chapter_{N}.wav
e.g., mybook_chapter_1.wav, mybook_chapter_2.wav
Args:
output_dir: Directory containing the WAV files
Returns:
List of durations in milliseconds, ordered by chapter number
"""
durations = []
# Find all chapter WAV files - audiblez uses {name}_chapter_{N}.wav
wav_files = list(output_dir.glob("*_chapter_*.wav"))
if not wav_files:
# Fallback: try any WAV files
wav_files = list(output_dir.glob("*.wav"))
if not wav_files:
print(f"No WAV files found in {output_dir}")
return durations
# Sort by extracting chapter number from filename using regex
# Pattern: look for _chapter_N or chapter_N in filename
def extract_chapter_num(path: Path) -> int:
name = path.stem
# Try to find chapter number with regex - handles various patterns
# e.g., "book_chapter_1", "mybook_chapter_12", "chapter_3_voice"
match = re.search(r'chapter[_-]?(\d+)', name, re.IGNORECASE)
if match:
return int(match.group(1))
# Fallback: find any number in the filename
match = re.search(r'(\d+)', name)
if match:
return int(match.group(1))
return 0
wav_files.sort(key=extract_chapter_num)
print(f"Found {len(wav_files)} WAV files to process for durations")
for wav_file in wav_files:
try:
audio = AudioSegment.from_file(str(wav_file))
durations.append(len(audio)) # duration in ms
print(f" Chapter WAV: {wav_file.name} - {len(audio)}ms ({len(audio)/1000:.1f}s)")
except Exception as e:
print(f" Error reading {wav_file}: {e}")
continue
return durations
def generate_ffmpeg_metadata(chapters: list[Chapter], durations: list[int]) -> str:
"""Generate FFmpeg FFMETADATA1 format string with chapter markers.
Args:
chapters: List of Chapter objects with titles
durations: List of durations in milliseconds for each chapter
Returns:
FFMETADATA1 formatted string
"""
metadata = ";FFMETADATA1\n"
current_time_ms = 0
# Match chapters with durations
num_chapters = min(len(chapters), len(durations))
for i in range(num_chapters):
chapter = chapters[i]
duration = durations[i]
chapter.start_ms = current_time_ms
chapter.end_ms = current_time_ms + duration
chapter.duration_ms = duration
metadata += f"\n[CHAPTER]\n"
metadata += f"TIMEBASE=1/1000\n"
metadata += f"START={chapter.start_ms}\n"
metadata += f"END={chapter.end_ms}\n"
metadata += f"title={chapter.title}\n"
current_time_ms = chapter.end_ms
return metadata
def embed_chapters_in_m4b(input_m4b: Path, metadata_content: str) -> Path:
"""Re-mux M4B with chapter metadata using FFmpeg.
Args:
input_m4b: Path to the input M4B file
metadata_content: FFMETADATA1 formatted string
Returns:
Path to the output M4B with chapters (same as input, replaced)
"""
output_m4b = input_m4b.with_suffix('.chaptered.m4b')
# Write metadata to temporary file
with tempfile.NamedTemporaryFile(mode='w', suffix='.txt', delete=False) as f:
f.write(metadata_content)
metadata_file = Path(f.name)
try:
cmd = [
'ffmpeg', '-y',
'-i', str(input_m4b),
'-f', 'ffmetadata', '-i', str(metadata_file),
'-map', '0:a',
'-map_metadata', '1',
'-c:a', 'copy', # Copy audio without re-encoding
'-movflags', '+faststart+use_metadata_tags',
str(output_m4b)
]
print(f"Running FFmpeg: {' '.join(cmd)}")
result = subprocess.run(cmd, check=True, capture_output=True, text=True)
if result.returncode != 0:
print(f"FFmpeg stderr: {result.stderr}")
raise RuntimeError(f"FFmpeg failed: {result.stderr}")
# Replace original with chaptered version
input_m4b.unlink()
output_m4b.rename(input_m4b)
print(f"Successfully embedded chapters in {input_m4b}")
return input_m4b
except subprocess.CalledProcessError as e:
print(f"FFmpeg error: {e.stderr}")
# Clean up temp file
if output_m4b.exists():
output_m4b.unlink()
raise
finally:
# Clean up metadata file
if metadata_file.exists():
metadata_file.unlink()


@ -0,0 +1,291 @@
import asyncio
import uuid
import os
from datetime import datetime
from pathlib import Path
from typing import Callable, Optional
import subprocess
import json
from models.schemas import Job, JobStatus, JobProgress, ChapterInfo
from services.epub_parser import extract_chapters, Chapter
from services.chapter_embedder import (
get_chapter_audio_durations,
generate_ffmpeg_metadata,
embed_chapters_in_m4b
)
class JobManager:
"""Manages conversion jobs and their state with user isolation."""
def __init__(self, storage_path: str = "/mnt"):
self.storage_path = Path(storage_path)
self.jobs: dict[str, Job] = {}
self.progress_callbacks: dict[str, list[Callable]] = {}
def get_user_uploads_dir(self, user_id: str) -> Path:
"""Get the uploads directory for a specific user."""
user_dir = self.storage_path / "users" / user_id / "uploads"
user_dir.mkdir(parents=True, exist_ok=True)
return user_dir
def get_user_outputs_dir(self, user_id: str) -> Path:
"""Get the outputs directory for a specific user."""
user_dir = self.storage_path / "users" / user_id / "outputs"
user_dir.mkdir(parents=True, exist_ok=True)
return user_dir
def create_job(self, user_id: str, filename: str, voice: str, speed: float, use_gpu: bool) -> Job:
"""Create a new conversion job for a user."""
job_id = str(uuid.uuid4())
now = datetime.now()
job = Job(
id=job_id,
user_id=user_id,
filename=filename,
voice=voice,
speed=speed,
use_gpu=use_gpu,
status=JobStatus.PENDING,
created_at=now,
updated_at=now,
)
self.jobs[job_id] = job
return job
def get_job(self, job_id: str, user_id: Optional[str] = None) -> Optional[Job]:
"""Get a job by ID. If user_id is provided, verify ownership."""
job = self.jobs.get(job_id)
if job and user_id and job.user_id != user_id:
return None # User doesn't own this job
return job
def get_user_jobs(self, user_id: str) -> list[Job]:
"""Get all jobs for a specific user."""
return [job for job in self.jobs.values() if job.user_id == user_id]
def get_all_jobs(self) -> list[Job]:
"""Get all jobs (admin use only)."""
return list(self.jobs.values())
def update_job_status(self, job_id: str, status: JobStatus, error: Optional[str] = None):
"""Update job status."""
if job_id in self.jobs:
self.jobs[job_id].status = status
self.jobs[job_id].updated_at = datetime.now()
if error:
self.jobs[job_id].error = error
def update_job_progress(self, job_id: str, progress: float, current_chapter: Optional[str] = None, eta: Optional[str] = None):
"""Update job progress."""
if job_id in self.jobs:
self.jobs[job_id].progress = progress
self.jobs[job_id].updated_at = datetime.now()
# Notify callbacks
if job_id in self.progress_callbacks:
progress_data = JobProgress(
progress=progress,
eta=eta,
current_chapter=current_chapter,
status=self.jobs[job_id].status
)
for callback in self.progress_callbacks[job_id]:
try:
asyncio.create_task(callback(progress_data))
except Exception as e:
print(f"Error in progress callback: {e}")
def register_progress_callback(self, job_id: str, callback: Callable):
"""Register a callback for progress updates."""
if job_id not in self.progress_callbacks:
self.progress_callbacks[job_id] = []
self.progress_callbacks[job_id].append(callback)
def unregister_progress_callback(self, job_id: str, callback: Callable):
"""Unregister a progress callback."""
if job_id in self.progress_callbacks:
self.progress_callbacks[job_id].remove(callback)
if not self.progress_callbacks[job_id]:
del self.progress_callbacks[job_id]
async def run_conversion(self, job_id: str):
"""Run the audiblez conversion in the background."""
job = self.jobs.get(job_id)
if not job:
return
try:
self.update_job_status(job_id, JobStatus.PROCESSING)
# Prepare user-specific paths
input_path = self.get_user_uploads_dir(job.user_id) / job.filename
output_dir = self.get_user_outputs_dir(job.user_id) / job_id
output_dir.mkdir(parents=True, exist_ok=True)
# Extract chapters from EPUB before conversion
chapters: list[Chapter] = []
if input_path.suffix.lower() == '.epub':
chapters = extract_chapters(input_path)
self.jobs[job_id].total_chapters = len(chapters)
print(f"Extracted {len(chapters)} chapters from EPUB")
# Build audiblez command - use the venv python/audiblez
cmd = [
"/app/audiblez/bin/audiblez",
str(input_path),
"-o", str(output_dir),
"-v", job.voice,
"-s", str(job.speed),
]
if job.use_gpu:
cmd.append("-c") # --cuda flag for GPU
# Run conversion
process = await asyncio.create_subprocess_exec(
*cmd,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE,
cwd=str(self.storage_path)
)
# Monitor progress from stderr/stdout
async def read_stream(stream, is_stderr=False):
while True:
line = await stream.readline()
if not line:
break
line_str = line.decode().strip()
print(f"{'[STDERR]' if is_stderr else '[STDOUT]'} {line_str}")
# Parse progress from output
# Audiblez outputs progress in various formats
# We'll do a simple pattern match
if "%" in line_str:
try:
# Try to extract percentage
parts = line_str.split("%")
if len(parts) > 1:
# Find the number before %
num_str = parts[0].split()[-1]
progress = float(num_str)
self.update_job_progress(job_id, progress)
except:
pass
# Check for chapter info
if "Chapter" in line_str or "chapter" in line_str:
self.update_job_progress(
job_id,
job.progress,
current_chapter=line_str
)
# Read both streams concurrently
await asyncio.gather(
read_stream(process.stdout),
read_stream(process.stderr, is_stderr=True)
)
# Wait for completion
returncode = await process.wait()
if returncode == 0:
# Find output file
output_files = list(output_dir.glob("*.m4b"))
if not output_files:
output_files = list(output_dir.glob("*.mp3"))
if output_files:
output_file = output_files[0]
# Embed chapter metadata if we have chapters and WAV files
if chapters:
try:
durations = get_chapter_audio_durations(output_dir)
print(f"Found {len(durations)} chapter audio durations")
if durations:
# Match chapter count with duration count
num_chapters = min(len(chapters), len(durations))
if num_chapters != len(chapters):
print(f"Warning: chapter count ({len(chapters)}) != duration count ({len(durations)})")
metadata = generate_ffmpeg_metadata(chapters[:num_chapters], durations[:num_chapters])
embed_chapters_in_m4b(output_file, metadata)
# Store chapter info in job for API access
self.jobs[job_id].chapters = [
ChapterInfo(
title=c.title,
start_ms=c.start_ms,
end_ms=c.end_ms
)
for c in chapters[:num_chapters]
]
print(f"Embedded {num_chapters} chapters in M4B")
else:
print("No WAV files found for chapter duration calculation")
except Exception as e:
print(f"Failed to embed chapters: {e}")
# Continue without chapter embedding - non-fatal error
self.jobs[job_id].output_file = str(output_file.name)
self.update_job_status(job_id, JobStatus.COMPLETED)
self.update_job_progress(job_id, 100.0)
else:
self.update_job_status(job_id, JobStatus.FAILED, "No output file generated")
else:
self.update_job_status(job_id, JobStatus.FAILED, f"Conversion failed with code {returncode}")
except Exception as e:
print(f"Conversion error: {e}")
self.update_job_status(job_id, JobStatus.FAILED, str(e))
def delete_job(self, job_id: str, user_id: str) -> bool:
"""Delete a job and its output files."""
job = self.get_job(job_id, user_id)
if not job:
return False
# Delete output directory if exists
output_dir = self.get_user_outputs_dir(user_id) / job_id
if output_dir.exists():
import shutil
shutil.rmtree(output_dir)
# Remove from jobs dict
del self.jobs[job_id]
return True
def get_user_audiobooks(self, user_id: str) -> list[dict]:
"""List all completed audiobooks for a user."""
outputs_dir = self.get_user_outputs_dir(user_id)
audiobooks = []
if outputs_dir.exists():
for job_dir in outputs_dir.iterdir():
if job_dir.is_dir():
# Look for m4b or mp3 files
audio_files = list(job_dir.glob("*.m4b")) + list(job_dir.glob("*.mp3"))
for audio_file in audio_files:
stat = audio_file.stat()
audiobooks.append({
"id": job_dir.name,
"filename": audio_file.name,
"size": stat.st_size,
"created_at": stat.st_mtime,
})
# Sort by creation time, newest first
audiobooks.sort(key=lambda x: x["created_at"], reverse=True)
return audiobooks
# Global job manager instance
job_manager = JobManager()


@ -0,0 +1,166 @@
"""EPUB chapter extraction service.
This parser attempts to match audiblez's chapter detection logic to ensure
the extracted chapters align with the WAV files audiblez produces.
audiblez iterates through EPUB ITEM_DOCUMENTs and uses is_chapter() to determine
if a document is a chapter based on content length (100+ chars) and filename patterns.
"""
import re
from dataclasses import dataclass
from pathlib import Path
from bs4 import BeautifulSoup
from ebooklib import epub, ITEM_DOCUMENT
@dataclass
class Chapter:
"""Represents a chapter extracted from an EPUB."""
title: str
index: int
duration_ms: int = 0
start_ms: int = 0
end_ms: int = 0
def sanitize_title(title: str) -> str:
"""Remove characters that break FFmpeg metadata format."""
if not title:
return "Untitled"
# Escape special chars for FFmpeg FFMETADATA format
return (title
.replace('=', '-')
.replace(';', '-')
.replace('#', '')
.replace('\\', '')
.replace('\n', ' ')
.replace('\r', '')
.strip())
def is_chapter(text: str, filename: str) -> bool:
"""Determine if a document is a chapter.
Matches audiblez's is_chapter() logic:
- Content must be over 100 characters
- Filename should match common chapter patterns
"""
if len(text) < 100:
return False
# Check filename patterns that indicate a chapter
filename_lower = filename.lower()
chapter_patterns = [
r'chapter',
r'part[_-]?\d+',
r'split[_-]?\d+',
r'ch[_-]?\d+',
r'chap[_-]?\d+',
r'sect', # section
r'content',
r'text',
]
for pattern in chapter_patterns:
if re.search(pattern, filename_lower):
return True
# If content is substantial (1000+ chars), likely a chapter even without pattern match
if len(text) > 1000:
return True
return False
def extract_title_from_content(soup: BeautifulSoup, filename: str, index: int) -> str:
"""Extract a chapter title from the document content."""
# Try to find title in common heading tags
for tag in ['title', 'h1', 'h2', 'h3']:
element = soup.find(tag)
if element and element.get_text(strip=True):
title = element.get_text(strip=True)
# Truncate long titles
if len(title) > 100:
title = title[:97] + "..."
return title
# Fallback: use filename without extension
stem = Path(filename).stem
# Clean up common patterns
stem = re.sub(r'^(chapter|chap|ch)[_-]?', 'Chapter ', stem, flags=re.IGNORECASE)
stem = re.sub(r'[_-]', ' ', stem)
if stem and len(stem) < 50:
return stem.title()
return f"Chapter {index + 1}"
def extract_chapters(epub_path: Path) -> list[Chapter]:
"""Extract chapter titles matching audiblez's chapter detection logic.
audiblez determines chapters by:
1. Iterating through ITEM_DOCUMENT items
2. Checking is_chapter() based on content length and filename patterns
This ensures our chapter count matches the WAV files audiblez produces.
Args:
epub_path: Path to the EPUB file
Returns:
List of Chapter objects with title and index
"""
try:
book = epub.read_epub(str(epub_path))
except Exception as e:
print(f"Failed to read EPUB: {e}")
return []
chapters: list[Chapter] = []
chapter_index = 0
# Iterate through documents like audiblez does
for item in book.get_items():
if item.get_type() != ITEM_DOCUMENT:
continue
try:
# Get content and parse with BeautifulSoup
content = item.get_content()
soup = BeautifulSoup(content, features='lxml')
# Extract text from relevant tags (matching audiblez)
text_parts = []
for tag in soup.find_all(['title', 'p', 'h1', 'h2', 'h3', 'h4', 'li']):
text = tag.get_text(strip=True)
if text:
text_parts.append(text)
full_text = ' '.join(text_parts)
filename = item.get_name() or ""
# Check if this document is a chapter
if is_chapter(full_text, filename):
title = extract_title_from_content(soup, filename, chapter_index)
chapters.append(Chapter(
title=sanitize_title(title),
index=chapter_index
))
chapter_index += 1
except Exception as e:
print(f"Error processing document {item.get_name()}: {e}")
continue
print(f"Extracted {len(chapters)} chapters from EPUB (audiblez-style detection)")
# Debug: print first few chapters
for i, ch in enumerate(chapters[:5]):
print(f" {i+1}. {ch.title}")
if len(chapters) > 5:
print(f" ... and {len(chapters) - 5} more")
return chapters
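Since the parser above is tuned to mirror audiblez's own chapter detection, a quick way to confirm the two stayed in sync is to compare the extracted chapter count against the chapters actually embedded in the finished file. A small check, assuming ffprobe and jq are available and output.m4b is a converted book:

```bash
# Count the chapters FFmpeg sees in the finished audiobook.
ffprobe -v quiet -print_format json -show_chapters output.m4b | jq '.chapters | length'
```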

@@ -0,0 +1,97 @@
from models.schemas import Voice
# Voice catalog from Kokoro-82M (used by audiblez)
# https://huggingface.co/hexgrad/Kokoro-82M/blob/main/VOICES.md
VOICE_CATALOG = {
# American English
"af_heart": Voice(id="af_heart", name="Heart", language="American English", gender="F"),
"af_alloy": Voice(id="af_alloy", name="Alloy", language="American English", gender="F"),
"af_aoede": Voice(id="af_aoede", name="Aoede", language="American English", gender="F"),
"af_bella": Voice(id="af_bella", name="Bella", language="American English", gender="F"),
"af_jessica": Voice(id="af_jessica", name="Jessica", language="American English", gender="F"),
"af_kore": Voice(id="af_kore", name="Kore", language="American English", gender="F"),
"af_nicole": Voice(id="af_nicole", name="Nicole", language="American English", gender="F"),
"af_nova": Voice(id="af_nova", name="Nova", language="American English", gender="F"),
"af_river": Voice(id="af_river", name="River", language="American English", gender="F"),
"af_sarah": Voice(id="af_sarah", name="Sarah", language="American English", gender="F"),
"af_sky": Voice(id="af_sky", name="Sky", language="American English", gender="F"),
"am_adam": Voice(id="am_adam", name="Adam", language="American English", gender="M"),
"am_echo": Voice(id="am_echo", name="Echo", language="American English", gender="M"),
"am_eric": Voice(id="am_eric", name="Eric", language="American English", gender="M"),
"am_fenrir": Voice(id="am_fenrir", name="Fenrir", language="American English", gender="M"),
"am_liam": Voice(id="am_liam", name="Liam", language="American English", gender="M"),
"am_michael": Voice(id="am_michael", name="Michael", language="American English", gender="M"),
"am_onyx": Voice(id="am_onyx", name="Onyx", language="American English", gender="M"),
"am_puck": Voice(id="am_puck", name="Puck", language="American English", gender="M"),
"am_santa": Voice(id="am_santa", name="Santa", language="American English", gender="M"),
# British English
"bf_alice": Voice(id="bf_alice", name="Alice", language="British English", gender="F"),
"bf_emma": Voice(id="bf_emma", name="Emma", language="British English", gender="F"),
"bf_isabella": Voice(id="bf_isabella", name="Isabella", language="British English", gender="F"),
"bf_lily": Voice(id="bf_lily", name="Lily", language="British English", gender="F"),
"bm_daniel": Voice(id="bm_daniel", name="Daniel", language="British English", gender="M"),
"bm_fable": Voice(id="bm_fable", name="Fable", language="British English", gender="M"),
"bm_george": Voice(id="bm_george", name="George", language="British English", gender="M"),
"bm_lewis": Voice(id="bm_lewis", name="Lewis", language="British English", gender="M"),
# Japanese
"jf_alpha": Voice(id="jf_alpha", name="Alpha", language="Japanese", gender="F"),
"jf_gongitsune": Voice(id="jf_gongitsune", name="Gongitsune", language="Japanese", gender="F"),
"jf_nezumi": Voice(id="jf_nezumi", name="Nezumi", language="Japanese", gender="F"),
"jf_tebukuro": Voice(id="jf_tebukuro", name="Tebukuro", language="Japanese", gender="F"),
"jm_kumo": Voice(id="jm_kumo", name="Kumo", language="Japanese", gender="M"),
# Mandarin Chinese
"zf_xiaobei": Voice(id="zf_xiaobei", name="Xiaobei", language="Mandarin Chinese", gender="F"),
"zf_xiaoni": Voice(id="zf_xiaoni", name="Xiaoni", language="Mandarin Chinese", gender="F"),
"zf_xiaoxiao": Voice(id="zf_xiaoxiao", name="Xiaoxiao", language="Mandarin Chinese", gender="F"),
"zf_xiaoyi": Voice(id="zf_xiaoyi", name="Xiaoyi", language="Mandarin Chinese", gender="F"),
"zm_yunjian": Voice(id="zm_yunjian", name="Yunjian", language="Mandarin Chinese", gender="M"),
"zm_yunxi": Voice(id="zm_yunxi", name="Yunxi", language="Mandarin Chinese", gender="M"),
"zm_yunxia": Voice(id="zm_yunxia", name="Yunxia", language="Mandarin Chinese", gender="M"),
"zm_yunyang": Voice(id="zm_yunyang", name="Yunyang", language="Mandarin Chinese", gender="M"),
# Spanish
"ef_dora": Voice(id="ef_dora", name="Dora", language="Spanish", gender="F"),
"em_alex": Voice(id="em_alex", name="Alex", language="Spanish", gender="M"),
"em_santa": Voice(id="em_santa", name="Santa", language="Spanish", gender="M"),
# French
"ff_siwis": Voice(id="ff_siwis", name="Siwis", language="French", gender="F"),
# Hindi
"hf_alpha": Voice(id="hf_alpha", name="Alpha", language="Hindi", gender="F"),
"hf_beta": Voice(id="hf_beta", name="Beta", language="Hindi", gender="F"),
"hm_omega": Voice(id="hm_omega", name="Omega", language="Hindi", gender="M"),
"hm_psi": Voice(id="hm_psi", name="Psi", language="Hindi", gender="M"),
# Italian
"if_sara": Voice(id="if_sara", name="Sara", language="Italian", gender="F"),
"im_nicola": Voice(id="im_nicola", name="Nicola", language="Italian", gender="M"),
# Brazilian Portuguese
"pf_dora": Voice(id="pf_dora", name="Dora", language="Brazilian Portuguese", gender="F"),
"pm_alex": Voice(id="pm_alex", name="Alex", language="Brazilian Portuguese", gender="M"),
"pm_santa": Voice(id="pm_santa", name="Santa", language="Brazilian Portuguese", gender="M"),
}
def get_all_voices() -> list[Voice]:
"""Get all available voices."""
return list(VOICE_CATALOG.values())
def get_voice(voice_id: str) -> Voice | None:
"""Get a specific voice by ID."""
return VOICE_CATALOG.get(voice_id)
def get_voices_by_language() -> dict[str, list[Voice]]:
"""Get voices grouped by language."""
grouped = {}
for voice in VOICE_CATALOG.values():
if voice.language not in grouped:
grouped[voice.language] = []
grouped[voice.language].append(voice)
return grouped

@@ -0,0 +1,13 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Audiblez Web - EPUB to Audiobook</title>
<link rel="icon" type="image/svg+xml" href="data:image/svg+xml,<svg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 100 100'><text y='.9em' font-size='90'>🎧</text></svg>" />
</head>
<body>
<div id="app"></div>
<script type="module" src="/src/main.js"></script>
</body>
</html>

@@ -0,0 +1,279 @@
<script>
import FileUpload from './lib/FileUpload.svelte';
import VoicePicker from './lib/VoicePicker.svelte';
import JobsList from './lib/JobsList.svelte';
import AudiobooksList from './lib/AudiobooksList.svelte';
import { jobs } from './stores/jobs.js';
let uploadedFilename = $state(null);
let selectedVoice = $state('af_sky');
let speed = $state(1.0);
let useGpu = $state(true);
let isStarting = $state(false);
let error = $state(null);
let currentUser = $state(null);
// Fetch current user on mount
$effect(() => {
fetchCurrentUser();
});
async function fetchCurrentUser() {
try {
const response = await fetch('/api/me');
if (response.ok) {
currentUser = await response.json();
}
} catch (e) {
console.error('Failed to fetch user:', e);
}
}
function handleFileUpload(filename) {
uploadedFilename = filename;
}
async function startConversion() {
if (!uploadedFilename || !selectedVoice) {
error = 'Please upload a file and select a voice';
return;
}
error = null;
isStarting = true;
try {
const response = await fetch('/api/jobs', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
filename: uploadedFilename,
voice: selectedVoice,
speed: speed,
use_gpu: useGpu
})
});
if (!response.ok) {
const data = await response.json();
throw new Error(data.detail || 'Failed to start conversion');
}
const job = await response.json();
jobs.add(job);
// Reset form
uploadedFilename = null;
} catch (e) {
error = e.message;
} finally {
isStarting = false;
}
}
let canStart = $derived(uploadedFilename && selectedVoice && !isStarting);
</script>
<main>
<header>
<div class="header-content">
<div>
<h1>Audiblez Web</h1>
<p class="subtitle">Convert EPUB to Audiobook</p>
</div>
{#if currentUser}
<div class="user-info">
<span class="user-name">{currentUser.name || currentUser.username}</span>
<span class="user-email">{currentUser.email}</span>
</div>
{/if}
</div>
</header>
<div class="content">
<div class="form-section">
<div class="upload-section">
<FileUpload onUpload={handleFileUpload} />
</div>
<div class="voice-section">
<VoicePicker bind:selectedVoice />
</div>
</div>
<div class="options-section">
<div class="option">
<label for="speed">Speed: {speed.toFixed(1)}x</label>
<input
type="range"
id="speed"
min="0.5"
max="2"
step="0.1"
bind:value={speed}
/>
</div>
<div class="option">
<label>
<input type="checkbox" bind:checked={useGpu} />
Use GPU (faster)
</label>
</div>
<button
class="start-btn"
disabled={!canStart}
onclick={startConversion}
>
{#if isStarting}
Starting...
{:else}
Start Conversion
{/if}
</button>
{#if error}
<p class="error">{error}</p>
{/if}
</div>
<JobsList />
<AudiobooksList />
</div>
</main>
<style>
:global(body) {
margin: 0;
font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, sans-serif;
background: #f5f5f5;
}
main {
max-width: 900px;
margin: 0 auto;
padding: 2rem;
}
header {
text-align: center;
margin-bottom: 2rem;
}
.header-content {
display: flex;
justify-content: space-between;
align-items: center;
text-align: left;
}
.user-info {
display: flex;
flex-direction: column;
align-items: flex-end;
padding: 0.5rem 1rem;
background: #e8f0fe;
border-radius: 8px;
}
.user-name {
font-weight: 500;
color: #333;
}
.user-email {
font-size: 0.75rem;
color: #666;
}
h1 {
margin: 0;
color: #333;
font-size: 2rem;
}
.subtitle {
color: #666;
margin: 0.25rem 0 0;
}
.content {
background: white;
border-radius: 12px;
padding: 1.5rem;
box-shadow: 0 2px 8px rgba(0, 0, 0, 0.1);
}
.form-section {
display: grid;
grid-template-columns: 1fr 1fr;
gap: 1.5rem;
}
@media (max-width: 768px) {
.form-section {
grid-template-columns: 1fr;
}
}
.options-section {
margin-top: 1.5rem;
padding-top: 1.5rem;
border-top: 1px solid #e0e0e0;
display: flex;
flex-wrap: wrap;
gap: 1rem;
align-items: center;
}
.option {
display: flex;
align-items: center;
gap: 0.5rem;
}
.option label {
font-size: 0.875rem;
color: #666;
}
.option input[type="range"] {
width: 120px;
}
.option input[type="checkbox"] {
width: 16px;
height: 16px;
}
.start-btn {
margin-left: auto;
padding: 0.75rem 1.5rem;
background: #4a90d9;
color: white;
border: none;
border-radius: 8px;
font-size: 1rem;
font-weight: 500;
cursor: pointer;
transition: background 0.2s;
}
.start-btn:hover:not(:disabled) {
background: #3a7fc9;
}
.start-btn:disabled {
background: #ccc;
cursor: not-allowed;
}
.error {
color: #d32f2f;
font-size: 0.875rem;
width: 100%;
margin-top: 0.5rem;
}
</style>

@@ -0,0 +1,6 @@
import App from './App.svelte'
import { mount } from 'svelte'
const app = mount(App, { target: document.getElementById('app') })
export default app

@@ -0,0 +1,28 @@
import { writable } from 'svelte/store';
function createJobsStore() {
const { subscribe, set, update } = writable([]);
return {
subscribe,
set,
add: (job) => update(jobs => [...jobs, job]),
updateJob: (jobId, updates) => update(jobs =>
jobs.map(j => j.id === jobId ? { ...j, ...updates } : j)
),
remove: (jobId) => update(jobs => jobs.filter(j => j.id !== jobId)),
refresh: async () => {
try {
const response = await fetch('/api/jobs');
if (response.ok) {
const jobs = await response.json();
set(jobs);
}
} catch (e) {
console.error('Failed to fetch jobs:', e);
}
}
};
}
export const jobs = createJobsStore();

@@ -0,0 +1,5 @@
import { vitePreprocess } from '@sveltejs/vite-plugin-svelte'
export default {
preprocess: vitePreprocess()
}

@@ -0,0 +1,15 @@
import { defineConfig } from 'vite'
import { svelte } from '@sveltejs/vite-plugin-svelte'
export default defineConfig({
plugins: [svelte()],
server: {
proxy: {
'/api': 'http://localhost:8000',
'/ws': {
target: 'ws://localhost:8000',
ws: true
}
}
}
})
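For local development the proxy above forwards API and WebSocket traffic to the FastAPI backend on port 8000. A typical loop, assuming the usual Vite npm scripts and a uvicorn entrypoint named main:app (neither is shown in this diff):

```bash
# Terminal 1: backend on :8000 (assumed uvicorn entrypoint).
uvicorn main:app --reload --port 8000

# Terminal 2: Vite dev server; /api and /ws requests are proxied to :8000.
npm run dev
```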

@@ -0,0 +1,136 @@
#!/usr/bin/env python3
"""
Generate voice samples for all available voices.
Run this script in an environment with audiblez installed.
Usage:
python generate_samples.py [output_dir]
"""
import os
import sys
from pathlib import Path
# Sample text for voice preview
SAMPLE_TEXT = "The quick brown fox jumps over the lazy dog. This is a sample of my voice for audiobook narration."
# All voices from Kokoro-82M (audiblez)
VOICES = [
# American English (20 voices)
"af_alloy", "af_aoede", "af_bella", "af_heart", "af_jessica", "af_kore",
"af_nicole", "af_nova", "af_river", "af_sarah", "af_sky",
"am_adam", "am_echo", "am_eric", "am_fenrir", "am_liam", "am_michael",
"am_onyx", "am_puck", "am_santa",
# British English (8 voices)
"bf_alice", "bf_emma", "bf_isabella", "bf_lily",
"bm_daniel", "bm_fable", "bm_george", "bm_lewis",
# Spanish (3 voices)
"ef_dora", "em_alex", "em_santa",
# French (1 voice)
"ff_siwis",
# Hindi (4 voices)
"hf_alpha", "hf_beta", "hm_omega", "hm_psi",
# Italian (2 voices)
"if_sara", "im_nicola",
# Japanese (5 voices)
"jf_alpha", "jf_gongitsune", "jf_nezumi", "jf_tebukuro", "jm_kumo",
# Brazilian Portuguese (3 voices)
"pf_dora", "pm_alex", "pm_santa",
# Mandarin Chinese (8 voices)
"zf_xiaobei", "zf_xiaoni", "zf_xiaoxiao", "zf_xiaoyi",
"zm_yunjian", "zm_yunxi", "zm_yunxia", "zm_yunyang",
]
def generate_sample(voice: str, output_dir: Path):
"""Generate a voice sample using kokoro TTS."""
try:
from kokoro import KPipeline
output_file = output_dir / f"{voice}.mp3"
if output_file.exists():
print(f"Skipping {voice} - already exists")
return True
print(f"Generating sample for {voice}...")
# Map voice prefix to language code
lang_map = {
'a': 'a', # American English
'b': 'b', # British English
'e': 'e', # Spanish
'f': 'f', # French
'h': 'h', # Hindi
'i': 'i', # Italian
'j': 'j', # Japanese
'p': 'p', # Portuguese
'z': 'z', # Chinese
}
# Extract language code from voice (first letter)
lang_code = lang_map.get(voice[0], 'a')
# Initialize the Kokoro pipeline
pipeline = KPipeline(lang_code=lang_code)
# Generate audio
generator = pipeline(SAMPLE_TEXT, voice=voice, speed=1.0)
# Collect all audio chunks
audio_chunks = []
for _, _, audio in generator:
audio_chunks.append(audio)
if audio_chunks:
import soundfile as sf
import numpy as np
# Concatenate audio
audio = np.concatenate(audio_chunks)
# Save as WAV first, then convert to MP3
wav_file = output_dir / f"{voice}.wav"
sf.write(str(wav_file), audio, 24000)
# Convert to MP3 using ffmpeg
import subprocess
result = subprocess.run([
"ffmpeg", "-y", "-i", str(wav_file),
"-codec:a", "libmp3lame", "-qscale:a", "5",
str(output_file)
], capture_output=True)
# Remove WAV file
if wav_file.exists():
wav_file.unlink()
if result.returncode == 0:
print(f"Generated {output_file}")
return True
else:
print(f"FFmpeg failed for {voice}: {result.stderr.decode()}")
return False
else:
print(f"Failed to generate audio for {voice}")
return False
except Exception as e:
print(f"Error generating sample for {voice}: {e}")
return False
def main():
output_dir = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("samples")
output_dir.mkdir(parents=True, exist_ok=True)
print(f"Generating voice samples to {output_dir}")
print(f"Total voices: {len(VOICES)}")
for voice in VOICES:
generate_sample(voice, output_dir)
print("Done!")
if __name__ == "__main__":
main()

@@ -0,0 +1,2 @@
# Voice samples go here
# Run generate_samples.py to generate voice samples for all voices

@@ -0,0 +1,229 @@
mqtt:
enabled: false
birdseye:
quality: 25
detect:
fps: 1
enabled: true
go2rtc:
streams:
vermont-1:
- rtsp://admin:REDACTED_RTSP_PW@192.168.1.10:554/Streaming/Channels/101/3
cameras:
# valchedrym cams: detection stays disabled until valchedrym is back up
valchedrym-cam-1:
enabled: true
ffmpeg:
inputs:
#- path: rtsp://admin:REDACTED_RTSP_PW@192.168.0.11:554/Streaming/Channels/101 # <----- The stream you want to use for detection
- path: rtsp://admin:REDACTED_RTSP_PW@valchedrym.ddns.net:554/Streaming/Channels/101 # <----- The stream you want to use for detection
detect:
enabled: false # <---- disable detection until you have a working camera feed
width: 704 # <---- update for your camera's resolution
height: 576 # <---- update for your camera's resolution
rtmp:
enabled: false
record:
enabled: false
snapshots:
enabled: false
objects:
# Optional: list of objects to track from labelmap.txt (full list - https://docs.frigate.video/configuration/objects)
track:
- person
- bicycle
- car
- bird
- cat
- dog
- horse
valchedrym-cam-2:
enabled: true
ffmpeg:
inputs:
#- path: rtsp://admin:REDACTED_RTSP_PW@192.168.0.11:554/Streaming/Channels/201 # <----- The stream you want to use for detection
- path: rtsp://admin:REDACTED_RTSP_PW@valchedrym.ddns.net:554/Streaming/Channels/201 # <----- The stream you want to use for detection
detect:
enabled: false # <---- disable detection until you have a working camera feed
width: 704 # <---- update for your camera's resolution
height: 576 # <---- update for your camera's resolution
rtmp:
enabled: false
record:
enabled: false
snapshots:
enabled: false
objects:
# Optional: list of objects to track from labelmap.txt (full list - https://docs.frigate.video/configuration/objects)
track:
- person
- bicycle
- car
- bird
- cat
- dog
- horse
vermont-1:
enabled: true
ffmpeg:
inputs:
- path: rtsp://admin:REDACTED_RTSP_PW@192.168.1.10:554/Streaming/Channels/101/3 # <----- The stream you want to use for detection
roles:
- record
rtmp:
enabled: false
record:
enabled: false
snapshots:
enabled: false
detect:
enabled: false
vermont-2:
enabled: true
ffmpeg:
inputs:
- path: rtsp://admin:REDACTED_RTSP_PW@192.168.1.10:554/Streaming/Channels/201/1 # <----- The stream you want to use for detection
detect:
enabled: false # <---- disable detection until you have a working camera feed
width: 704 # <---- update for your camera's resolution
height: 576 # <---- update for your camera's resolution
rtmp:
enabled: false
record:
enabled: false
snapshots:
enabled: false
vermont-3:
enabled: true
ffmpeg:
inputs:
- path: rtsp://admin:REDACTED_RTSP_PW@192.168.1.10:554/Streaming/Channels/301/1 # <----- The stream you want to use for detection
detect:
enabled: false # <---- disable detection until you have a working camera feed
width: 704 # <---- update for your camera's resolution
height: 576 # <---- update for your camera's resolution
rtmp:
enabled: false
record:
enabled: false
snapshots:
enabled: false
vermont-4:
enabled: true
ffmpeg:
inputs:
- path: rtsp://admin:REDACTED_RTSP_PW@192.168.1.10:554/Streaming/Channels/401/1 # <----- The stream you want to use for detection
detect:
enabled: false # <---- disable detection until you have a working camera feed
width: 704 # <---- update for your camera's resolution
height: 576 # <---- update for your camera's resolution
rtmp:
enabled: false
record:
enabled: false
snapshots:
enabled: false
vermont-5:
enabled: true
ffmpeg:
inputs:
- path: rtsp://admin:REDACTED_RTSP_PW@192.168.1.10:554/Streaming/Channels/501/1 # <----- The stream you want to use for detection
detect:
enabled: false # <---- disable detection until you have a working camera feed
width: 704 # <---- update for your camera's resolution
height: 576 # <---- update for your camera's resolution
rtmp:
enabled: false
record:
enabled: false
snapshots:
enabled: false
vermont-6:
enabled: true
ffmpeg:
inputs:
- path: rtsp://admin:REDACTED_RTSP_PW@192.168.1.10:554/Streaming/Channels/601/1 # <----- The stream you want to use for detection
detect:
enabled: false # <---- disable detection until you have a working camera feed
width: 704 # <---- update for your camera's resolution
height: 576 # <---- update for your camera's resolution
rtmp:
enabled: false
record:
enabled: false
snapshots:
enabled: false
vermont-7:
enabled: true
ffmpeg:
inputs:
- path: rtsp://admin:REDACTED_RTSP_PW@192.168.1.10:554/Streaming/Channels/701/1 # <----- The stream you want to use for detection
detect:
enabled: false # <---- disable detection until you have a working camera feed
width: 704 # <---- update for your camera's resolution
height: 576 # <---- update for your camera's resolution
rtmp:
enabled: false
record:
enabled: false
snapshots:
enabled: false
vermont-8:
enabled: true
ffmpeg:
inputs:
- path: rtsp://admin:REDACTED_RTSP_PW@192.168.1.10:554/Streaming/Channels/801/1 # <----- The stream you want to use for detection
detect:
enabled: false # <---- disable detection until you have a working camera feed
width: 704 # <---- update for your camera's resolution
height: 576 # <---- update for your camera's resolution
rtmp:
enabled: false
record:
enabled: false
snapshots:
enabled: false
vermont-9:
enabled: true
ffmpeg:
inputs:
- path: rtsp://admin:REDACTED_RTSP_PW@192.168.1.10:554/Streaming/Channels/901/1 # <----- The stream you want to use for detection
detect:
enabled: false # <---- disable detection until you have a working camera feed
width: 704 # <---- update for your camera's resolution
height: 576 # <---- update for your camera's resolution
rtmp:
enabled: false
record:
enabled: false
snapshots:
enabled: false
# london-ipcam:
# enabled: false
# ffmpeg:
# inputs:
# - path: rtsp://192.168.2.2:8554/london_cam # <----- The stream you want to use for detection
# roles:
# - rtmp
# - record
# - detect
# detect:
# enabled: False
# width: 1280
# height: 720
# record:
# enabled: False # Not needed for this camera but keeping for reference
# events:
# retain:
# default: 10
# objects:
# # Optional: list of objects to track from labelmap.txt (full list - https://docs.frigate.video/configuration/objects)
# track:
# - person
# - shoe
# - handbag
# - wine glass
# - knife
# - pizza
# - laptop
# - book

scripts/cluster_manager.py Normal file
@@ -0,0 +1,277 @@
import asyncio
import click
import logging
import time
from typing import List, Union, Optional
from kubernetes_asyncio import client, config
from kubernetes_asyncio.client.api_client import ApiClient
logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")
logger = logging.getLogger(__name__)
async def wait_for_healthy(
api_instance: client.AppsV1Api,
resource_type: str,
namespace: str,
name: str,
target_replicas: int,
timeout: int = 300,
) -> None:
start_time = time.time()
logger.info(
f"Waiting for {resource_type} {name} to reach {target_replicas} replicas..."
)
while True:
if time.time() - start_time > timeout:
logger.error(f"❌ Timeout reached for {resource_type} {name}")
return
try:
if resource_type.lower() == "deployment":
res = await api_instance.read_namespaced_deployment_status(
name, namespace
)
ready = res.status.ready_replicas or 0
updated = res.status.updated_replicas or 0
if ready == target_replicas and updated == target_replicas:
break
else: # StatefulSet
res = await api_instance.read_namespaced_stateful_set_status(
name, namespace
)
ready = res.status.ready_replicas or 0
if ready == target_replicas:
break
except Exception as e:
logger.debug(f"Retrying status check for {name}: {e}")
await asyncio.sleep(5)
logger.info(f"{resource_type} {name} is now healthy.")
async def wait_for_zero(
api: client.AppsV1Api, kind: str, ns: str, name: str, timeout: int
) -> tuple[str, str]:
start_time = asyncio.get_event_loop().time()
while (asyncio.get_event_loop().time() - start_time) < timeout:
try:
res = await (
api.read_namespaced_deployment_status(name, ns)
if kind.lower() == "deployment"
else api.read_namespaced_stateful_set_status(name, ns)
)
if (res.status.ready_replicas or 0) == 0:
return ns, name
except Exception:
return ns, name # Assume gone if error
await asyncio.sleep(3)
logger.error(f"Timeout: {kind} {ns}/{name} still has running pods.")
return ns, name
async def scale_resource(
api_instance: client.AppsV1Api,
resource_type: str,
namespace: str,
name: str,
replicas: int,
) -> None:
body = {"spec": {"replicas": replicas}}
try:
if resource_type.lower() == "deployment":
await api_instance.patch_namespaced_deployment_scale(name, namespace, body)
else:
await api_instance.patch_namespaced_stateful_set_scale(
name, namespace, body
)
except Exception as e:
logger.error(f"Failed to scale {resource_type} {name}: {e}")
async def run_stop_tier(
api_v1: client.AppsV1Api, label: str, output_file: str, timeout: int
) -> None:
"""Processes a single label tier: saves, scales to 0, and waits."""
excluded_ns = ["kube-system", "kube-public", "kube-node-lease"]
# 1. Discover
targets = [
("Deployment", api_v1.list_deployment_for_all_namespaces),
("StatefulSet", api_v1.list_stateful_set_for_all_namespaces),
]
tier_resources = []
for kind, list_func in targets:
resp = await list_func(label_selector=label)
tier_resources.extend(
[
(kind, item)
for item in resp.items
if item.metadata.namespace not in excluded_ns
]
)
if not tier_resources:
logger.warning(f"No resources found for label: {label}")
return
# 2. Save & Scale
active_jobs: set[tuple[str, str]] = set()
wait_tasks = []
# Append to file so we don't overwrite previous tiers
with open(output_file, "a") as f:
for kind, item in tier_resources:
ns, name = item.metadata.namespace, item.metadata.name
reps = item.spec.replicas or 0
f.write(f"{kind} {ns} {name} {reps}\n")
active_jobs.add((ns, name))
await scale_resource(api_v1, kind, ns, name, 0)
wait_tasks.append(wait_for_zero(api_v1, kind, ns, name, timeout))
# 3. Wait for this tier to finish before moving to next
logger.info(f"Tier [{label}]: Waiting for {len(active_jobs)} resources to stop...")
for coro in asyncio.as_completed(wait_tasks):
finished_ns, finished_name = await coro
active_jobs.discard((finished_ns, finished_name))
if active_jobs:
remaining_ns = sorted({ns for ns, name in active_jobs})
logger.info(
f"[{label}] Pending: {len(active_jobs)} | Namespaces: {', '.join(remaining_ns)}"
)
logger.info(f"✅ Tier [{label}] successfully shut down.")
@click.group()
def cli():
pass
@cli.command()
@click.argument("labels", nargs=-1, required=True)
@click.option("--output", "-o", default="resources.txt", help="Output state file")
@click.option("--timeout", "-t", default=3600)
def stop(labels: List[str], output: str, timeout: int):
"""Stop tiers sequentially. Usage: stop 'app=web' 'app=db'"""
async def main():
await config.load_kube_config()
# Clear/Create file at start
open(output, "w").close()
async with ApiClient() as api_client:
api_v1 = client.AppsV1Api(api_client)
for label in labels:
logger.info(f"🚀 Processing Shutdown Tier: {label}")
await run_stop_tier(api_v1, label, output, timeout)
logger.info("🏁 Sequence complete. Cluster is gracefully stopped.")
asyncio.run(main())
@cli.command()
@click.argument("labels", nargs=-1, required=True)
@click.option("--file", "-f", default="resources.txt")
@click.option("--timeout", "-t", default=3600, help="Seconds to wait per resource")
def start(labels: List[str], file: str, timeout: int):
asyncio.run(run_start_sequence(labels, file, timeout))
async def run_start_sequence(labels: List[str], file_path: str, timeout: int) -> None:
await config.load_kube_config()
async with ApiClient() as api_client:
apps_v1 = client.AppsV1Api(api_client)
# 1. Load the entire snapshot into memory for filtering
try:
with open(file_path, "r") as f:
# Format: Kind Namespace Name Replicas
snapshot_lines = [line.strip().split() for line in f if line.strip()]
except FileNotFoundError:
logger.error(f"Snapshot file {file_path} not found.")
return
# 2. Iterate through labels in the order provided
for label in labels:
logger.info(f"🚀 Starting Tier: {label}")
# Find resources in this tier by querying K8s for the label
# then matching against our snapshot file data
tier_resources = await get_resources_by_label(apps_v1, label)
# Cross-reference: Only start things that are in BOTH the K8s label query AND our file
# This ensures we restore them to the CORRECT previous replica count
to_restore = []
tier_keys = {(r["ns"], r["name"]) for r in tier_resources}
for kind, ns, name, reps in snapshot_lines:
if (ns, name) in tier_keys:
to_restore.append((kind, ns, name, int(reps)))
if not to_restore:
logger.warning(f"No resources found in snapshot for tier: {label}")
continue
# 3. Scale and Wait for this specific tier
await process_start_tier(apps_v1, to_restore, timeout, label)
logger.info("🏁 All tiers started successfully.")
async def get_resources_by_label(api: client.AppsV1Api, label: str) -> List[dict]:
"""Helper to find what currently exists in the cluster with this label."""
targets = [
api.list_deployment_for_all_namespaces,
api.list_stateful_set_for_all_namespaces,
]
found = []
for list_func in targets:
resp = await list_func(label_selector=label)
for item in resp.items:
found.append({"ns": item.metadata.namespace, "name": item.metadata.name})
return found
async def process_start_tier(
api: client.AppsV1Api, resources: list, timeout: int, label: str
):
active_jobs = set()
scale_tasks = []
wait_tasks = []
# Wrapper to track which job finishes
async def tracked_wait(kind, ns, name, target, t_out):
await wait_for_healthy(api, kind, ns, name, target, t_out)
return (ns, name)
for kind, ns, name, reps in resources:
active_jobs.add((ns, name))
scale_tasks.append(scale_resource(api, kind, ns, name, reps))
wait_tasks.append(tracked_wait(kind, ns, name, reps, timeout))
# Trigger all scales for this tier
await asyncio.gather(*scale_tasks)
# Monitor health
for coro in asyncio.as_completed(wait_tasks):
finished_ns, finished_name = await coro
active_jobs.discard((finished_ns, finished_name))
if active_jobs:
remaining_ns = sorted({ns for ns, name in active_jobs})
logger.info(
f"[{label}] Pending Health: {len(active_jobs)} | Namespaces: {', '.join(remaining_ns)}"
)
logger.info(f"✅ Tier [{label}] is healthy.")
if __name__ == "__main__":
cli()
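A hedged usage sketch for the script above, assuming workloads carry the tier labels defined in the generated tiers.tf files (0-core through 4-aux, as set on the beads-server and hermes-agent deployments). Shutdown runs the least critical tiers first and records replica counts; startup replays the snapshot in reverse order:

```bash
# Scale down aux, then edge, then gpu workloads, saving replica counts to resources.txt.
python3 scripts/cluster_manager.py stop "tier=4-aux" "tier=3-edge" "tier=2-gpu" -o resources.txt

# ...maintenance window...

# Bring tiers back in reverse order, restoring the recorded replica counts.
python3 scripts/cluster_manager.py start "tier=2-gpu" "tier=3-edge" "tier=4-aux" -f resources.txt
```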

scripts/image_pull.sh Executable file
@@ -0,0 +1,10 @@
#!/usr/bin/env bash
for n in $(kubectl get nodes -o wide | grep node | awk '{print $1}'); do
echo $n;
kubectl drain $n --ignore-daemonsets --delete-emptydir-data && \
ssh wizard@$n < image_pull_remote.sh
# Check result
kubectl get --raw "/api/v1/nodes/$n/proxy/configz" | jq '.kubeletconfig | {serializeImagePulls, maxParallelImagePulls}'
kubectl uncordon $n
done

scripts/image_pull_remote.sh Executable file
@@ -0,0 +1,14 @@
#!/usr/bin/env bash
# Containerd
sudo sed -i 's/.*max_concurrent_downloads.*/max_concurrent_downloads = 5/g' /etc/containerd/config.toml
sudo systemctl restart containerd
# Kubelet
# serializeImagePulls: false allows images to be pulled in parallel;
# maxParallelImagePulls caps how many pulls run concurrently (set to 5 below).
sudo sed -i '/serializeImagePulls:/d' /var/lib/kubelet/config.yaml && \
sudo sed -i '/maxParallelImagePulls:/d' /var/lib/kubelet/config.yaml && \
echo -e 'serializeImagePulls: false\nmaxParallelImagePulls: 5' | sudo tee -a /var/lib/kubelet/config.yaml
sudo systemctl restart kubelet

@@ -0,0 +1,115 @@
#!/usr/bin/env bash
set -euo pipefail
############################################
# CONFIGURATION
############################################
# Internal pull-through registry endpoint
# Examples:
# http://registry.internal:5000
# https://registry.internal
INTERNAL_REGISTRY="http://10.0.20.10:5002"
# Path where containerd reads registry configs
CERTS_DIR="/etc/containerd/certs.d"
# Optional: path to CA file if INTERNAL_REGISTRY uses HTTPS with custom CA
# Leave empty if not needed
INTERNAL_CA_PATH=""
# Restart containerd at the end
RESTART_CONTAINERD=true
############################################
# REGISTRIES TO MIRROR
############################################
REGISTRIES=(
"docker.io"
"registry-1.docker.io"
"registry.k8s.io"
"quay.io"
"ghcr.io"
"gcr.io"
"us-docker.pkg.dev"
"public.ecr.aws"
"mcr.microsoft.com"
)
############################################
# FUNCTIONS
############################################
require_root() {
if [[ "$(id -u)" -ne 0 ]]; then
echo "ERROR: must be run as root" >&2
exit 1
fi
}
ensure_containerd_config_path() {
local cfg="/etc/containerd/config.toml"
if [[ ! -f "$cfg" ]]; then
echo "Generating default containerd config"
containerd config default > "$cfg"
fi
if ! grep -q 'config_path *= *"/etc/containerd/certs.d"' "$cfg"; then
echo "Enabling config_path in containerd config"
# Minimal and safe append if section exists
if grep -q '\[plugins\."io.containerd.grpc.v1.cri".registry\]' "$cfg"; then
sed -i '/\[plugins\."io.containerd.grpc.v1.cri".registry\]/a \ config_path = "/etc/containerd/certs.d"' "$cfg"
else
cat >> "$cfg" <<'EOF'
[plugins."io.containerd.grpc.v1.cri".registry]
config_path = "/etc/containerd/certs.d"
EOF
fi
fi
}
write_hosts_toml() {
local registry="$1"
local dir="$CERTS_DIR/$registry"
local file="$dir/hosts.toml"
mkdir -p "$dir"
cat > "$file" <<EOF
server = "https://$registry"
[host."$INTERNAL_REGISTRY"]
capabilities = ["pull", "resolve"]
EOF
if [[ -n "$INTERNAL_CA_PATH" ]]; then
cat >> "$file" <<EOF
ca = "$INTERNAL_CA_PATH"
EOF
fi
}
############################################
# MAIN
############################################
require_root
ensure_containerd_config_path
echo "Creating registry mirror configurations..."
for r in "${REGISTRIES[@]}"; do
echo " - $r"
write_hosts_toml "$r"
done
if [[ "$RESTART_CONTAINERD" == "true" ]]; then
echo "Restarting containerd"
systemctl restart containerd
fi
echo "Done."

stacks/beads-server/main.tf Normal file
@@ -0,0 +1,172 @@
resource "kubernetes_namespace" "beads" {
metadata {
name = "beads-server"
labels = {
tier = local.tiers.aux
}
}
}
resource "kubernetes_persistent_volume_claim" "dolt_data" {
wait_until_bound = false
metadata {
name = "dolt-data"
namespace = kubernetes_namespace.beads.metadata[0].name
annotations = {
"resize.topolvm.io/threshold" = "80%"
"resize.topolvm.io/increase" = "100%"
"resize.topolvm.io/storage_limit" = "10Gi"
}
}
spec {
access_modes = ["ReadWriteOnce"]
storage_class_name = "proxmox-lvm"
resources {
requests = { storage = "2Gi" }
}
}
}
resource "kubernetes_config_map" "dolt_init" {
metadata {
name = "dolt-init"
namespace = kubernetes_namespace.beads.metadata[0].name
}
data = {
"01-create-beads-user.sql" = <<-EOT
CREATE USER IF NOT EXISTS 'beads'@'%' IDENTIFIED BY '';
GRANT ALL PRIVILEGES ON *.* TO 'beads'@'%' WITH GRANT OPTION;
EOT
}
}
resource "kubernetes_deployment" "dolt" {
metadata {
name = "dolt"
namespace = kubernetes_namespace.beads.metadata[0].name
labels = {
app = "dolt"
tier = local.tiers.aux
}
}
spec {
replicas = 1
strategy {
type = "Recreate"
}
selector {
match_labels = {
app = "dolt"
}
}
template {
metadata {
labels = {
app = "dolt"
}
}
spec {
container {
name = "dolt"
image = "dolthub/dolt-sql-server:latest"
port {
name = "mysql"
container_port = 3306
}
env {
name = "DOLT_ROOT_HOST"
value = "%"
}
volume_mount {
name = "dolt-data"
mount_path = "/var/lib/dolt"
}
volume_mount {
name = "init-scripts"
mount_path = "/docker-entrypoint-initdb.d"
read_only = true
}
startup_probe {
tcp_socket {
port = 3306
}
failure_threshold = 30
period_seconds = 2
}
liveness_probe {
tcp_socket {
port = 3306
}
initial_delay_seconds = 10
period_seconds = 30
}
readiness_probe {
tcp_socket {
port = 3306
}
initial_delay_seconds = 5
period_seconds = 10
}
resources {
requests = {
memory = "256Mi"
cpu = "50m"
}
limits = {
memory = "512Mi"
}
}
}
volume {
name = "dolt-data"
persistent_volume_claim {
claim_name = kubernetes_persistent_volume_claim.dolt_data.metadata[0].name
}
}
volume {
name = "init-scripts"
config_map {
name = kubernetes_config_map.dolt_init.metadata[0].name
}
}
}
}
}
lifecycle {
ignore_changes = [
spec[0].template[0].spec[0].dns_config
]
}
}
resource "kubernetes_service" "dolt" {
metadata {
name = "dolt"
namespace = kubernetes_namespace.beads.metadata[0].name
labels = {
app = "dolt"
}
annotations = {
"metallb.universe.tf/loadBalancerIPs" = "10.0.20.200"
"metallb.io/allow-shared-ip" = "shared"
}
}
spec {
type = "LoadBalancer"
external_traffic_policy = "Cluster"
selector = {
app = "dolt"
}
port {
name = "mysql"
port = 3306
target_port = 3306
}
}
}
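Since the Dolt SQL server speaks the MySQL wire protocol and is exposed on the MetalLB IP above, a minimal connectivity check from the LAN might look like this, assuming a mysql client and the passwordless beads user created by the init script:

```bash
# Dolt exposes a MySQL-compatible endpoint on the LoadBalancer IP.
mysql -h 10.0.20.200 -P 3306 -u beads -e "SELECT 1;"
```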

@@ -0,0 +1,3 @@
include "root" {
path = find_in_parent_folders()
}

@@ -0,0 +1,10 @@
# Generated by Terragrunt. Sig: nIlQXj57tbuaRZEa
locals {
tiers = {
core = "0-core"
cluster = "1-cluster"
gpu = "2-gpu"
edge = "3-edge"
aux = "4-aux"
}
}

stacks/hermes-agent/main.tf Normal file
@@ -0,0 +1,418 @@
variable "tls_secret_name" {
type = string
sensitive = true
}
# --- Namespace ---
resource "kubernetes_namespace" "hermes_agent" {
metadata {
name = "hermes-agent"
labels = {
tier = local.tiers.aux
}
}
}
module "tls_secret" {
source = "../../modules/kubernetes/setup_tls_secret"
namespace = kubernetes_namespace.hermes_agent.metadata[0].name
tls_secret_name = var.tls_secret_name
}
# --- Secrets (ESO from Vault) ---
resource "kubernetes_manifest" "external_secret" {
manifest = {
apiVersion = "external-secrets.io/v1beta1"
kind = "ExternalSecret"
metadata = {
name = "hermes-agent-secrets"
namespace = "hermes-agent"
}
spec = {
refreshInterval = "15m"
secretStoreRef = {
name = "vault-kv"
kind = "ClusterSecretStore"
}
target = {
name = "hermes-agent-secrets"
}
dataFrom = [{
extract = {
key = "hermes-agent"
}
}]
}
}
depends_on = [kubernetes_namespace.hermes_agent]
}
# --- Storage ---
resource "kubernetes_persistent_volume_claim" "data_proxmox" {
wait_until_bound = false
metadata {
name = "hermes-agent-data-proxmox"
namespace = kubernetes_namespace.hermes_agent.metadata[0].name
annotations = {
"resize.topolvm.io/threshold" = "80%"
"resize.topolvm.io/increase" = "100%"
"resize.topolvm.io/storage_limit" = "5Gi"
}
}
spec {
access_modes = ["ReadWriteOnce"]
storage_class_name = "proxmox-lvm"
resources {
requests = {
storage = "1Gi"
}
}
}
}
# --- ConfigMaps ---
resource "kubernetes_config_map" "hermes_config" {
metadata {
name = "hermes-agent-config"
namespace = kubernetes_namespace.hermes_agent.metadata[0].name
}
data = {
"config.yaml" = yamlencode({
# Primary model: Qwen 3.5 397B via NVIDIA NIM (free tier)
# api_key is injected by init container from the secret
model = {
provider = "custom"
default = "qwen/qwen3.5-397b-a17b"
base_url = "https://integrate.api.nvidia.com/v1"
api_key = "__NVIDIA_API_KEY_PLACEHOLDER__"
context_window = 262000
}
# Terminal execution
terminal = {
backend = "local"
}
# Memory system
memory = {
memory_enabled = true
user_profile_enabled = true
memory_char_limit = 2200
user_char_limit = 1375
}
# Context compression
compression = {
enabled = true
threshold = 0.50
target_ratio = 0.20
protect_last_n = 20
}
# Security
security = {
redact_secrets = true
}
# Display
display = {
streaming = true
}
# Web tools
web = {
backend = "tavily"
}
# Agent behavior
agent = {
max_turns = 90
api_timeout = 90
reasoning_effort = "medium"
}
# Telegram DM policy
unauthorized_dm_behavior = "ignore"
# MCP servers: claude-memory for persistent cross-session memory
mcp_servers = {
claude-memory = {
url = "http://claude-memory.claude-memory.svc.cluster.local/mcp/mcp"
headers = {
Authorization = "Bearer $${CLAUDE_MEMORY_API_KEY}"
}
}
}
})
}
}
resource "kubernetes_config_map" "hermes_soul" {
metadata {
name = "hermes-agent-soul"
namespace = kubernetes_namespace.hermes_agent.metadata[0].name
}
data = {
"SOUL.md" = <<-EOT
# Hermes - Viktor's Personal AI Assistant
You are Hermes, a knowledgeable and helpful AI assistant owned and operated by
Viktor Barzin. You run on Viktor's home lab Kubernetes cluster and are powered
by Anthropic Claude.
## Personality
- Direct and concise, no unnecessary fluff
- Technical when needed, plain language when possible
- Honest about limitations and uncertainties
- Proactive: suggest improvements, flag risks, follow up on open items
## Owner
- Viktor Barzin, software engineer based in London
- Runs a home lab with a 5-node Kubernetes cluster on Proxmox
- Infrastructure managed with Terraform/Terragrunt
- Uses Vault for secrets, Traefik for ingress, Authentik for SSO
## Your Capabilities
- Terminal: local execution inside your container (Python, Node.js, git, ripgrep, ffmpeg)
- Memory: you have access to claude-memory MCP for persistent cross-session memory
- Always check memory at the start of conversations for context
- Store important decisions, learnings, and user preferences
- Skills: you can create and refine your own skills over time
- Web: search and fetch capabilities for research
## Package Persistence
Your container restarts lose system packages. To make installs survive restarts:
- **pip**: packages auto-persist to /opt/data/pip-packages/ (PIP_TARGET is set)
- **npm -g**: packages auto-persist to /opt/data/npm-global/ (NPM_CONFIG_PREFIX is set)
- **apt**: after installing system packages, save them:
`apt install -y foo bar && echo -e "foo\nbar" >> /opt/data/apt-packages.txt && sort -u -o /opt/data/apt-packages.txt /opt/data/apt-packages.txt`
They will be auto-reinstalled on next container restart.
## Communication Preferences
- Viktor prefers terse, technical responses
- No emojis unless asked
- Lead with the answer, not the reasoning
- When unsure, say so rather than guessing
EOT
}
}
# --- Deployment ---
resource "kubernetes_deployment" "hermes_agent" {
metadata {
name = "hermes-agent"
namespace = kubernetes_namespace.hermes_agent.metadata[0].name
labels = {
app = "hermes-agent"
tier = local.tiers.aux
}
}
spec {
strategy {
type = "Recreate"
}
replicas = 1
selector {
match_labels = {
app = "hermes-agent"
}
}
template {
metadata {
labels = {
app = "hermes-agent"
}
annotations = {
"reloader.stakater.com/search" = "true"
}
}
spec {
# Init container 1: bootstrap config into writable PVC
init_container {
name = "bootstrap-config"
image = "docker.io/library/busybox:1.37"
command = ["sh", "-c", <<-EOF
# Create subdirectories
mkdir -p /opt/data/memories /opt/data/skills /opt/data/sessions /opt/data/cron /opt/data/logs
mkdir -p /opt/data/pip-packages/bin /opt/data/npm-global/bin
# Copy config from ConfigMap and inject secrets
cp /config/config.yaml /opt/data/config.yaml
cp /soul/SOUL.md /opt/data/SOUL.md
# Replace API key placeholder with actual value from secret
NVIDIA_KEY=$(cat /secrets/NVIDIA_API_KEY)
sed -i "s|__NVIDIA_API_KEY_PLACEHOLDER__|$${NVIDIA_KEY}|g" /opt/data/config.yaml
# Generate .env from mounted secret files
echo "# Auto-generated from Vault secret/hermes-agent" > /opt/data/.env
for f in /secrets/*; do
key=$(basename "$f")
val=$(cat "$f")
echo "$${key}=$${val}" >> /opt/data/.env
done
# Fix ownership (hermes container user)
chown -R 1000:1000 /opt/data
EOF
]
volume_mount {
name = "data"
mount_path = "/opt/data"
}
volume_mount {
name = "config"
mount_path = "/config"
}
volume_mount {
name = "soul"
mount_path = "/soul"
}
volume_mount {
name = "secrets"
mount_path = "/secrets"
}
}
container {
name = "hermes-agent"
image = "nousresearch/hermes-agent:latest"
# Wrap entrypoint: restore apt packages (as root) then hand off to original entrypoint
command = ["bash", "-c", <<-EOF
if [ -f /opt/data/apt-packages.txt ] && [ -s /opt/data/apt-packages.txt ]; then
echo "Restoring apt packages: $(cat /opt/data/apt-packages.txt | tr '\n' ' ')"
apt-get update -qq && \
xargs -a /opt/data/apt-packages.txt apt-get install -y -qq --no-install-recommends 2>&1 | tail -5
echo "Done restoring apt packages"
fi
exec /opt/hermes/docker/entrypoint.sh gateway run
EOF
]
env {
name = "HERMES_HOME"
value = "/opt/data"
}
# Persist pip packages across restarts
env {
name = "PIP_TARGET"
value = "/opt/data/pip-packages"
}
env {
name = "PYTHONPATH"
value = "/opt/data/pip-packages"
}
# Persist npm global packages across restarts
env {
name = "NPM_CONFIG_PREFIX"
value = "/opt/data/npm-global"
}
# Add persistent bin dirs to PATH
env {
name = "PATH"
value = "/opt/data/pip-packages/bin:/opt/data/npm-global/bin:/opt/data/apt-local/usr/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
}
volume_mount {
name = "data"
mount_path = "/opt/data"
}
resources {
requests = {
cpu = "100m"
memory = "512Mi"
}
limits = {
memory = "1Gi"
}
}
liveness_probe {
exec {
command = ["pgrep", "-f", "hermes"]
}
initial_delay_seconds = 30
period_seconds = 30
failure_threshold = 3
}
}
volume {
name = "data"
persistent_volume_claim {
claim_name = kubernetes_persistent_volume_claim.data_proxmox.metadata[0].name
}
}
volume {
name = "config"
config_map {
name = kubernetes_config_map.hermes_config.metadata[0].name
}
}
volume {
name = "soul"
config_map {
name = kubernetes_config_map.hermes_soul.metadata[0].name
}
}
volume {
name = "secrets"
secret {
secret_name = "hermes-agent-secrets"
}
}
}
}
}
lifecycle {
ignore_changes = [spec[0].template[0].spec[0].dns_config]
}
}
# --- Service ---
resource "kubernetes_service" "hermes_agent" {
metadata {
name = "hermes-agent"
namespace = kubernetes_namespace.hermes_agent.metadata[0].name
labels = {
app = "hermes-agent"
}
}
spec {
selector = {
app = "hermes-agent"
}
port {
port = 80
target_port = 8642
}
}
}
# --- Ingress ---
module "ingress" {
source = "../../modules/kubernetes/ingress_factory"
namespace = kubernetes_namespace.hermes_agent.metadata[0].name
name = "hermes-agent"
tls_secret_name = var.tls_secret_name
protected = true
extra_annotations = {
"gethomepage.dev/enabled" = "true"
"gethomepage.dev/name" = "Hermes Agent"
"gethomepage.dev/description" = "Self-improving AI agent"
"gethomepage.dev/icon" = "mdi-robot"
"gethomepage.dev/group" = "AI & Data"
"gethomepage.dev/pod-selector" = ""
}
}
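The ExternalSecret above pulls every key under the hermes-agent path in Vault into the pod; the init container then expects at least NVIDIA_API_KEY (for the config placeholder) and CLAUDE_MEMORY_API_KEY (for the MCP Authorization header). A hedged sketch of seeding that secret, assuming the KV mount used elsewhere in the repo; any additional keys are simply written into /opt/data/.env as-is:

```bash
# Key names beyond these two are assumptions; values are placeholders.
vault kv put secret/hermes-agent \
  NVIDIA_API_KEY="nvapi-..." \
  CLAUDE_MEMORY_API_KEY="..."
```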

stacks/hermes-agent/secrets Symbolic link
@@ -0,0 +1 @@
../../secrets

@@ -0,0 +1,13 @@
include "root" {
path = find_in_parent_folders()
}
dependency "platform" {
config_path = "../platform"
skip_outputs = true
}
dependency "vault" {
config_path = "../vault"
skip_outputs = true
}

@@ -0,0 +1,10 @@
# Generated by Terragrunt. Sig: nIlQXj57tbuaRZEa
locals {
tiers = {
core = "0-core"
cluster = "1-cluster"
gpu = "2-gpu"
edge = "3-edge"
aux = "4-aux"
}
}

stacks/infra/tiers.tf Normal file
@@ -0,0 +1,10 @@
# Generated by Terragrunt. Sig: nIlQXj57tbuaRZEa
locals {
tiers = {
core = "0-core"
cluster = "1-cluster"
gpu = "2-gpu"
edge = "3-edge"
aux = "4-aux"
}
}

stacks/phpipam/secrets Symbolic link
@@ -0,0 +1 @@
../../secrets

stacks/phpipam/tiers.tf Normal file
@@ -0,0 +1,10 @@
# Generated by Terragrunt. Sig: nIlQXj57tbuaRZEa
locals {
tiers = {
core = "0-core"
cluster = "1-cluster"
gpu = "2-gpu"
edge = "3-edge"
aux = "4-aux"
}
}

@@ -0,0 +1,10 @@
# Generated by Terragrunt. Sig: nIlQXj57tbuaRZEa
locals {
tiers = {
core = "0-core"
cluster = "1-cluster"
gpu = "2-gpu"
edge = "3-edge"
aux = "4-aux"
}
}

stacks/vault/secrets Symbolic link
@@ -0,0 +1 @@
../../secrets