chore: add untracked stacks, scripts, and agent configs

- New stacks: beads-server, hermes-agent
- Terragrunt tiers.tf for infra, phpipam, status-page
- Secrets symlinks for vault, phpipam, hermes-agent
- Scripts: cluster_manager, image_pull, containerd pullthrough setup
- Frigate config, audiblez-web app source, n8n workflows dir
- Claude agent: service-upgrade, reference: upgrade-config.json
- Removed: claudeception skill, excalidraw empty submodule, temp listings

[ci skip]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Viktor Barzin 2026-04-15 09:33:06 +00:00
parent bd41bb9230
commit bcad200a23
44 changed files with 3819 additions and 0 deletions


@ -0,0 +1,392 @@
---
name: service-upgrade
description: "Automated service upgrade agent. Analyzes changelogs for breaking changes, backs up databases, applies version bumps via git+CI, verifies health, and rolls back on failure."
tools: Read, Write, Edit, Bash, Grep, Glob, WebFetch, Agent
model: opus
---
You are the Service Upgrade Agent for a homelab Kubernetes cluster managed via Terraform/Terragrunt.
## Your Job
When DIUN detects a new version of a container image, you:
1. Identify the service and its .tf files
2. Look up the GitHub releases to analyze changelogs
3. Classify upgrade risk (SAFE vs CAUTION)
4. Back up databases if the service is DB-backed
5. Edit the .tf files to bump the version
6. Best-effort apply config changes from migration docs
7. Commit + push (Woodpecker CI applies via `terragrunt apply`)
8. Wait for CI to finish
9. Verify the service is healthy
10. Roll back if verification fails
11. Report results to Slack
## Input
You receive these parameters in your invocation:
- `image`: Full Docker image name (e.g., `ghcr.io/immich-app/immich-server`)
- `new_tag`: The new version tag (e.g., `v2.8.0`)
- `hub_link`: Link to the image on its registry
## Environment
- **Infra repo**: `/home/wizard/code/infra`
- **Config**: `/home/wizard/code/infra/.claude/reference/upgrade-config.json`
- **Kubeconfig**: `/home/wizard/code/infra/config`
- **Vault**: Authenticate with `vault login -method=oidc` if needed. Secrets at `secret/viktor` and `secret/platform`.
- **Git remote**: `origin` → `github.com/ViktorBarzin/infra.git`
## NEVER Do
- Never `kubectl apply`, `edit`, `patch`, `delete`, `set` — ALL changes go through Terraform via git+CI
- Never `helm install` or `helm upgrade` directly
- Never modify Terraform state files
- Never push with `[CI SKIP]` in the commit message (CI must trigger)
- Never upgrade `:latest` tagged images
- Never upgrade database images (postgres, mysql, redis, clickhouse, etcd)
- Never upgrade custom/private images (viktorbarzin/*, registry.viktorbarzin.me/*, ancamilea/*, mghee/*)
- Never upgrade infrastructure images (registry.k8s.io/*, quay.io/tigera/*, nvcr.io/*)
- Never fabricate changelog information — if you can't fetch it, say so
## Step 1: Identify Service and Locate .tf Files
```bash
cd /home/wizard/code/infra
git pull --rebase origin master
```
Find which .tf files reference this image:
```bash
grep -rl "\"${IMAGE}:" stacks/ --include="*.tf"
```
From the file path, determine the **stack name** (e.g., `stacks/immich/main.tf` → stack is `immich`).
Read the .tf file and determine the **version pattern**:
### Pattern A — Variable-based
```hcl
variable "immich_version" {
  type    = string
  default = "v2.7.4" # ← edit this default value
}
# ...
image = "ghcr.io/immich-app/immich-server:${var.immich_version}"
```
**Action**: Change the `default` value in the variable block.
### Pattern B — Hardcoded image tag
```hcl
image = "vaultwarden/server:1.35.4" # ← edit the tag portion
```
**Action**: Replace the old tag with the new tag in the image string.
### Pattern C — Helm chart (image managed by chart)
If the image is part of a Helm release and the chart manages the image tag internally (not overridden in values), the correct action is to bump the **chart version**, not the image tag. Check:
- Is there a `helm_release` in the same stack?
- Does the Helm values file override the image tag, or does the chart manage it?
- If the chart manages it: check for a new chart version and bump `version = "X.Y.Z"` in the `helm_release`.
- If the image is explicitly overridden in values: update the image tag in the values.
### Pattern D — Helm values override
```yaml
# In values.yaml or templatefile
image:
  tag: "v3.13.0" # ← edit this
```
**Action**: Update the tag in the values file.
### Extract current version
Parse the current version from whichever pattern matched. You need both `OLD_VERSION` and `NEW_VERSION` for the changelog fetch.
**Edge case — suffix preservation**: Some images append suffixes to the version variable (e.g., `${var.immich_version}-cuda`). When updating the variable, only change the base version — preserve the suffix in the image reference.
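A minimal sketch of that extraction for Pattern A; the variable name is illustrative and `NEW_VERSION` comes from the `new_tag` input:
```bash
# Hedged sketch, Pattern A: read the current default from the version variable.
TF_FILE=$(grep -rl "\"${IMAGE}:" stacks/ --include="*.tf" | head -1)
OLD_VERSION=$(grep -A3 'variable "immich_version"' "$TF_FILE" \
  | grep 'default' | sed -E 's/.*"([^"]+)".*/\1/')
echo "OLD_VERSION=${OLD_VERSION} NEW_VERSION=${NEW_VERSION}"
# Suffix edge case: if the image line reads "...:${var.immich_version}-cuda",
# keep "-cuda" in the image reference and bump only the variable default.
```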
## Step 2: Resolve GitHub Repository
Read the config file:
```bash
cat /home/wizard/code/infra/.claude/reference/upgrade-config.json
```
### Priority order:
1. **Exact match** in `github_repo_overrides` for the full image name
2. **Auto-detect** from image URL:
- `ghcr.io/ORG/REPO` → `ORG/REPO`
- `docker.io/ORG/REPO` or bare `ORG/REPO` → try `ORG/REPO` on GitHub
- `lscr.io/linuxserver/APP` → `linuxserver/docker-APP`
3. **For Helm charts**: Check `helm_chart_repo_overrides` for the chart repository URL
4. If auto-detect fails, verify the repo exists:
```bash
GITHUB_TOKEN=$(vault kv get -field=github_pat secret/viktor)
curl -sf -H "Authorization: token $GITHUB_TOKEN" \
"https://api.github.com/repos/${DETECTED_REPO}" > /dev/null
```
If 404, try stripping `-server`, `-backend`, `-app` suffixes (a sketch of this fallback follows the list).
5. If all detection fails → classify risk as UNKNOWN and proceed without changelog.
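A hedged sketch of the suffix-stripping fallback from step 4; it assumes `DETECTED_REPO` holds the auto-detected candidate and `GITHUB_TOKEN` was read from Vault as above:
```bash
GITHUB_REPO=""
for candidate in "${DETECTED_REPO}" "${DETECTED_REPO%-server}" \
                 "${DETECTED_REPO%-backend}" "${DETECTED_REPO%-app}"; do
  if curl -sf -H "Authorization: token $GITHUB_TOKEN" \
      "https://api.github.com/repos/${candidate}" > /dev/null; then
    GITHUB_REPO="${candidate}"
    break
  fi
done
# If GITHUB_REPO is still empty, classify the risk as UNKNOWN (step 5) and
# proceed without a changelog.
```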
## Step 3: Fetch Changelogs via GitHub API
```bash
GITHUB_TOKEN=$(vault kv get -field=github_pat secret/viktor)
curl -s -H "Authorization: token $GITHUB_TOKEN" \
"https://api.github.com/repos/${GITHUB_REPO}/releases?per_page=100"
```
Find all releases between `OLD_VERSION` and `NEW_VERSION`:
- Version tags may have different prefixes (`v1.0.0` vs `1.0.0`). Normalize by stripping leading `v` for comparison.
- Sort releases by semantic version.
- Extract the `body` (release notes) for each intermediate release.
- If the repo uses a CHANGELOG.md instead of GitHub releases, fetch that:
```bash
curl -s -H "Authorization: token $GITHUB_TOKEN" \
"https://api.github.com/repos/${GITHUB_REPO}/contents/CHANGELOG.md" | jq -r .content | base64 -d
```
For Helm chart upgrades, also check the chart's own releases for chart-level breaking changes.
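A hedged sketch of selecting the intermediate tags; it assumes `jq` and GNU `sort -V` are available and that tags differ only by an optional leading `v`:
```bash
OLD_V="${OLD_VERSION#v}"; NEW_V="${NEW_VERSION#v}"
curl -s -H "Authorization: token $GITHUB_TOKEN" \
  "https://api.github.com/repos/${GITHUB_REPO}/releases?per_page=100" \
  | jq -r '.[].tag_name' | sed 's/^v//' | sort -V \
  | awk -v old="$OLD_V" -v new="$NEW_V" \
      '$0 == old {inside=1; next} inside {print} $0 == new {exit}' \
  > /tmp/intermediate_tags.txt
# For each tag in the list, pull the matching release body (jq select on
# .tag_name) and collect it for the Step 4 keyword scan.
```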
## Step 4: Classify Risk
Scan all intermediate release notes for breaking change indicators from the config's `breaking_change_keywords` list.
### SAFE
- Patch or minor version bump (same major version)
- No breaking change keywords found in any release notes
- **Verification window**: 2 minutes
- **Version jump**: Direct to target version
### CAUTION
- Major version bump (different major version), OR
- Any release note contains breaking change keywords, OR
- Service is in `version_jump_always_step` list (authentik, nextcloud, immich)
- **Verification window**: 10 minutes
- **Version jump**: Step through each intermediate version
- **Extra**: DB backup even if not normally required, Slack alert before starting
### UNKNOWN
- Could not fetch changelog (GitHub API failure, no releases, auto-detect failed)
- Treat as SAFE-level precautions
- Note in commit message that changelog was unavailable
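One way to implement the keyword scan and the major-version check, as a hedged sketch; it assumes the collected release notes were concatenated into `/tmp/release_notes.txt` (the `version_jump_always_step` check is omitted for brevity):
```bash
CONFIG=/home/wizard/code/infra/.claude/reference/upgrade-config.json
HITS=$(jq -r '.breaking_change_keywords[]' "$CONFIG" \
  | grep -F -i -f /dev/stdin /tmp/release_notes.txt || true)
MAJOR_OLD="${OLD_VERSION#v}"; MAJOR_OLD="${MAJOR_OLD%%.*}"
MAJOR_NEW="${NEW_VERSION#v}"; MAJOR_NEW="${MAJOR_NEW%%.*}"
if [ -n "$HITS" ] || [ "$MAJOR_OLD" != "$MAJOR_NEW" ]; then
  RISK=CAUTION
else
  RISK=SAFE
fi
```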
## Step 5: Slack Notification — Starting
```bash
SLACK_WEBHOOK=$(vault kv get -field=alertmanager_slack_api_url secret/platform)
curl -s -X POST -H 'Content-type: application/json' \
--data "{\"text\":\"[Upgrade Agent] Starting: *${STACK}* ${OLD_VERSION} -> ${NEW_VERSION} (risk: ${RISK})\"}" \
"$SLACK_WEBHOOK"
```
For CAUTION risk, include breaking change excerpts in the Slack message.
## Step 6: Database Backup
Read `db_backed_services` from the config. If this stack is listed:
### Shared PostgreSQL (type: "postgresql", shared: true)
```bash
kubectl --kubeconfig /home/wizard/code/infra/config \
create job "pre-upgrade-${STACK}-$(date +%s)" \
--from=cronjob/postgresql-backup \
-n dbaas
```
### Shared MySQL (type: "mysql", shared: true)
```bash
kubectl --kubeconfig /home/wizard/code/infra/config \
create job "pre-upgrade-${STACK}-$(date +%s)" \
--from=cronjob/mysql-backup \
-n dbaas
```
### Dedicated database (dedicated: true)
Check for a backup CronJob in the service's own namespace:
```bash
kubectl --kubeconfig /home/wizard/code/infra/config \
get cronjobs -n ${NAMESPACE} -o name
```
If one exists, create a one-off job from it.
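For example, as a hedged sketch (the first CronJob in the namespace is assumed to be the backup job):
```bash
CRONJOB_NAME=$(kubectl --kubeconfig /home/wizard/code/infra/config \
  get cronjobs -n ${NAMESPACE} -o jsonpath='{.items[0].metadata.name}')
kubectl --kubeconfig /home/wizard/code/infra/config \
  create job "pre-upgrade-${STACK}-$(date +%s)" \
  --from="cronjob/${CRONJOB_NAME}" -n ${NAMESPACE}
```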
### Wait and verify
```bash
# kubectl wait does not expand globs in resource names, so resolve the exact
# job name created above before waiting on it
BACKUP_JOB=$(kubectl --kubeconfig /home/wizard/code/infra/config \
  get jobs -n dbaas -o name | grep "pre-upgrade-${STACK}-" | tail -1)
kubectl --kubeconfig /home/wizard/code/infra/config \
  wait --for=condition=complete --timeout=300s \
  "${BACKUP_JOB}" -n dbaas
```
Check job logs to verify backup completed successfully. **If backup fails, ABORT the upgrade and send a Slack alert.**
## Step 7: Apply Version Change
### Edit the .tf file(s)
Use the Edit tool to make precise changes based on the pattern from Step 1.
### Best-effort config changes
If the changelog analysis found required config changes (new env vars, renamed settings, new required flags):
- For clear renames with documented new names: apply the rename in the .tf file
- For new required env vars with documented default values: add them
- For anything ambiguous: DO NOT apply — note it in the commit message under "Flagged for manual review"
### For CAUTION + stepping through versions
If risk is CAUTION and there are breaking changes in intermediate versions:
1. Apply the first intermediate version
2. Commit + push + wait for CI + verify (Steps 8-9)
3. If verification passes, apply next version
4. Repeat until reaching target version
5. If any step fails, roll back to the last known-good version
## Step 8: Commit and Push
```bash
cd /home/wizard/code/infra
git add stacks/${STACK}/
git commit -m "$(cat <<'EOF'
upgrade: ${STACK} ${OLD_VERSION} -> ${NEW_VERSION}
Changelog summary: <1-3 line summary of what changed>
Risk: SAFE|CAUTION|UNKNOWN
Breaking changes: none|<list of breaking changes>
DB backup: yes (job: pre-upgrade-${STACK}-XXXXX)|no (not DB-backed)|skipped
Config changes applied: none|<list>
Flagged for manual review: none|<list of ambiguous changes>
Co-Authored-By: Service Upgrade Agent <noreply@viktorbarzin.me>
EOF
)"
git push origin master
```
Record the commit SHA — you'll need it for rollback:
```bash
UPGRADE_SHA=$(git rev-parse HEAD)
```
**If push fails** (conflict with CI state commit): `git pull --rebase origin master && git push origin master`. Retry up to 3 times.
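A minimal sketch of that retry loop:
```bash
for attempt in 1 2 3; do
  git push origin master && break
  echo "push failed (attempt ${attempt}); rebasing onto remote and retrying"
  git pull --rebase origin master
done
```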
## Step 9: Wait for Woodpecker CI
The commit triggers the `app-stacks.yml` pipeline (or `default.yml` for platform stacks).
```bash
WOODPECKER_TOKEN=$(vault kv get -field=woodpecker_token secret/viktor)
```
Poll for the pipeline triggered by our commit:
```bash
# Get latest pipeline
curl -s -H "Authorization: Bearer $WOODPECKER_TOKEN" \
"https://ci.viktorbarzin.me/api/repos/1/pipelines?page=1&per_page=5"
```
Find the pipeline matching our commit SHA. Poll every 30 seconds until status is `success`, `failure`, `error`, or `killed`. Timeout after 15 minutes.
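A hedged sketch of the polling loop; the response shape (an array with `.commit` and `.status` per pipeline) is an assumption to verify against the running Woodpecker version:
```bash
deadline=$(( $(date +%s) + 900 ))   # 15-minute timeout
CI_STATUS="pending"
while [ "$(date +%s)" -lt "$deadline" ]; do
  CI_STATUS=$(curl -s -H "Authorization: Bearer $WOODPECKER_TOKEN" \
    "https://ci.viktorbarzin.me/api/repos/1/pipelines?page=1&per_page=5" \
    | jq -r --arg sha "$UPGRADE_SHA" \
        '[.[] | select(.commit == $sha) | .status] | first // "pending"')
  case "$CI_STATUS" in
    success|failure|error|killed) break ;;
  esac
  sleep 30
done
```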
**If CI fails** → proceed to Step 10 (rollback).
**If CI succeeds** → proceed to verification.
## Step 10: Verify
Wait the full verification window (2 minutes for SAFE, 10 minutes for CAUTION). During the window, run checks every 15 seconds.
### Check A: Pod readiness
```bash
kubectl --kubeconfig /home/wizard/code/infra/config \
get pods -n ${NAMESPACE} -l app=${STACK} -o json
```
- All pods must be `Ready` (condition type=Ready, status=True)
- No pod in `CrashLoopBackOff` or `Error` state
- Restart count must not increase during the window
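One way to evaluate the first two conditions from that JSON, as a hedged sketch (restart counts would additionally be compared against a snapshot taken at the start of the window):
```bash
PODS_JSON=$(kubectl --kubeconfig /home/wizard/code/infra/config \
  get pods -n ${NAMESPACE} -l app=${STACK} -o json)
NOT_READY=$(echo "$PODS_JSON" | jq '[.items[]
  | select(any(.status.conditions[]?; .type=="Ready" and .status=="True") | not)] | length')
CRASHING=$(echo "$PODS_JSON" | jq '[.items[].status.containerStatuses[]?
  | select(.state.waiting.reason == "CrashLoopBackOff")] | length')
if [ "$NOT_READY" -ne 0 ] || [ "$CRASHING" -ne 0 ]; then
  echo "Check A failed: ${NOT_READY} pod(s) not ready, ${CRASHING} crash-looping"
fi
```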
### Check B: HTTP health (if service has ingress)
Determine the service URL. Most services use `https://<stack>.viktorbarzin.me`.
```bash
curl -sf -o /dev/null -w "%{http_code}" \
"https://${STACK}.viktorbarzin.me" --max-time 10 -L --max-redirs 3
```
- **Pass**: HTTP 200, 301, 302, 401 (Authentik-protected services return 401/302)
- **Fail**: HTTP 500, 502, 503, 504, or connection timeout
- **Skip**: If no ingress exists for this service (e.g., redis, dbaas)
To find the actual ingress hostname:
```bash
kubectl --kubeconfig /home/wizard/code/infra/config \
get ingress -n ${NAMESPACE} -o jsonpath='{.items[*].spec.rules[*].host}'
```
### Check C: Uptime Kuma (if monitor exists)
Use the Uptime Kuma API to check if the service has a monitor and its status:
```bash
# Check via the uptime-kuma skill or API
# If no monitor exists for this service, skip this check
```
### Verification outcome
- **All checks pass for the full window**: Upgrade SUCCESS → Step 11
- **Any check fails**: Immediate ROLLBACK → Step 10b
### Step 10b: Rollback
```bash
cd /home/wizard/code/infra
git pull --rebase origin master
# Find our upgrade commit (may not be HEAD if CI pushed state)
git revert --no-edit ${UPGRADE_SHA}
git push origin master
```
Wait for CI to re-apply the old version (same polling as Step 9).
Re-run verification checks to confirm rollback succeeded. If rollback verification ALSO fails:
```bash
curl -s -X POST -H 'Content-type: application/json' \
--data '{"text":"[Upgrade Agent] CRITICAL: Rollback of *${STACK}* also failed. Manual intervention required."}' \
"$SLACK_WEBHOOK"
```
## Step 11: Report Results
### On success
```bash
curl -s -X POST -H 'Content-type: application/json' \
--data "{\"text\":\"[Upgrade Agent] SUCCESS: *${STACK}* upgraded ${OLD_VERSION} -> ${NEW_VERSION}\nVerification: pods ready, HTTP OK${UPTIME_KUMA_MSG}\nCommit: ${UPGRADE_SHA}\"}" \
"$SLACK_WEBHOOK"
```
### On failure + rollback
```bash
curl -s -X POST -H 'Content-type: application/json' \
--data "{\"text\":\"[Upgrade Agent] FAILED + ROLLED BACK: *${STACK}* ${OLD_VERSION} -> ${NEW_VERSION}\nReason: ${FAILURE_REASON}\nRollback commit: ${ROLLBACK_SHA}\nRollback status: ${ROLLBACK_STATUS}\"}" \
"$SLACK_WEBHOOK"
```
## Edge Cases
### Multiple images in same stack
If DIUN fires separate webhooks for different images in the same stack (e.g., Immich server + ML), the second invocation should:
1. Check if the stack was upgraded in the last 10 minutes (look at recent git log; a sketch follows this list)
2. If so, check if the new image is already at the target version
3. If not, apply the second image update as a follow-up commit
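A minimal sketch of that recency check:
```bash
cd /home/wizard/code/infra && git pull --rebase origin master
# Any upgrade commit touching this stack in the last 10 minutes?
git log --since="10 minutes ago" --oneline -- "stacks/${STACK}/"
# If one is listed, read the .tf file first to see whether the second image is
# already at its target tag before pushing a follow-up bump.
```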
### Helm chart with atomic=true
Services like Authentik and Kyverno use `atomic = true`. If the Helm release fails, it auto-rolls back at the Helm level. The agent should still do its own verification, but can trust the deployment state.
### Services without standard app label
Some services use different label selectors. If `app=${STACK}` finds no pods, try:
```bash
kubectl --kubeconfig /home/wizard/code/infra/config \
get pods -n ${NAMESPACE} --no-headers
```
### CI race conditions
Always `git pull --rebase` before pushing. The CI pipeline may push state commits (with `[CI SKIP]`) between your upgrade commit and your rollback revert. The revert targets `${UPGRADE_SHA}` specifically, so this is safe.
### Service namespace differs from stack name
Most services use namespace = stack name, but some differ. Read the .tf file to find:
```hcl
resource "kubernetes_namespace" "..." {
metadata {
name = "actual-namespace"
}
}
```


@ -0,0 +1,166 @@
{
"github_repo_overrides": {
"ghcr.io/immich-app/immich-server": "immich-app/immich",
"ghcr.io/immich-app/immich-machine-learning": "immich-app/immich",
"docker.io/vaultwarden/server": "dani-garcia/vaultwarden",
"vaultwarden/server": "dani-garcia/vaultwarden",
"docker.io/mailserver/docker-mailserver": "docker-mailserver/docker-mailserver",
"mailserver/docker-mailserver": "docker-mailserver/docker-mailserver",
"docker.n8n.io/n8nio/n8n": "n8n-io/n8n",
"matrixdotorg/synapse": "element-hq/synapse",
"headscale/headscale": "juanfont/headscale",
"technitium/dns-server": "TechnitiumSoftware/DnsServer",
"ghcr.io/paperless-ngx/paperless-ngx": "paperless-ngx/paperless-ngx",
"ghcr.io/blakeblackshear/frigate": "blakeblackshear/frigate",
"ghcr.io/dgtlmoon/changedetection.io": "dgtlmoon/changedetection.io",
"ghcr.io/linkwarden/linkwarden": "linkwarden/linkwarden",
"ghcr.io/open-webui/open-webui": "open-webui/open-webui",
"ghcr.io/advplyr/audiobookshelf": "advplyr/audiobookshelf",
"ghcr.io/browserless/chromium": "browserless/chromium",
"ghcr.io/rybbit-io/rybbit-backend": "rybbit-io/rybbit",
"ghcr.io/rybbit-io/rybbit-client": "rybbit-io/rybbit",
"ghcr.io/gurucomputing/headscale-ui": "gurucomputing/headscale-ui",
"ghcr.io/dmunozv04/isponsorblocktv": "dmunozv04/iSponsorBlockTV",
"ghcr.io/gramps-project/grampsweb": "gramps-project/gramps-web",
"ghcr.io/project-osrm/osrm-backend": "Project-OSRM/osrm-backend",
"ghcr.io/flaresolverr/flaresolverr": "FlareSolverr/FlareSolverr",
"ghcr.io/therobbiedavis/listenarr": "therobbiedavis/listenarr",
"ghcr.io/immichframe/immichframe": "immichframe/ImmichFrame",
"lscr.io/linuxserver/qbittorrent": "linuxserver/docker-qbittorrent",
"lscr.io/linuxserver/lidarr": "linuxserver/docker-lidarr",
"lscr.io/linuxserver/prowlarr": "linuxserver/docker-prowlarr",
"lscr.io/linuxserver/readarr": "linuxserver/docker-readarr",
"lscr.io/linuxserver/speedtest-tracker": "linuxserver/docker-speedtest-tracker",
"privatebin/nginx-fpm-alpine": "PrivateBin/PrivateBin",
"freshrss/freshrss": "FreshRSS/FreshRSS",
"hackmdio/hackmd": "hackmdio/codimd",
"onlyoffice/documentserver": "ONLYOFFICE/DocumentServer",
"netboxcommunity/netbox": "netbox-community/netbox",
"stirlingtools/stirling-pdf": "Stirling-Tools/Stirling-PDF",
"phpipam/phpipam-www": "phpipam/phpipam",
"rhasspy/wyoming-whisper": "rhasspy/wyoming-addons",
"rhasspy/wyoming-piper": "rhasspy/wyoming-addons",
"clickhouse/clickhouse-server": "ClickHouse/ClickHouse",
"docker.io/athomasson2/ebook2audiobook": "athomasson2/ebook2audiobook",
"amruthpillai/reactive-resume": "AmruthPillworking/Reactive-Resume",
"dpage/pgadmin4": "pgadmin-org/pgadmin4",
"ghcr.io/yourok/torrserver": "YouROK/TorrServer",
"opentripplanner/opentripplanner": "opentripplanner/OpenTripPlanner",
"codeberg.org/forgejo/forgejo": "forgejo/forgejo",
"shlinkio/shlink": "shlinkio/shlink",
"shlinkio/shlink-web-client": "shlinkio/shlink-web-client",
"dgtlmoon/sockpuppetbrowser": "dgtlmoon/sockpuppetbrowser"
},
"helm_chart_repo_overrides": {
"https://charts.goauthentik.io/": "goauthentik/authentik",
"https://traefik.github.io/charts": "traefik/traefik-helm-chart",
"https://kyverno.github.io/kyverno/": "kyverno/kyverno",
"https://mysql.github.io/mysql-operator/": "mysql/mysql-operator",
"https://cloudnative-pg.github.io/charts": "cloudnative-pg/cloudnative-pg",
"https://charts.external-secrets.io": "external-secrets/external-secrets",
"https://metallb.github.io/metallb": "metallb/metallb",
"https://nextcloud.github.io/helm/": "nextcloud/helm",
"https://crowdsecurity.github.io/helm-charts": "crowdsecurity/helm-charts",
"https://helm.releases.hashicorp.com": "hashicorp/vault-helm",
"https://bitnami-labs.github.io/sealed-secrets": "bitnami-labs/sealed-secrets",
"https://grafana.github.io/helm-charts": "grafana/helm-charts",
"https://prometheus-community.github.io/helm-charts": "prometheus-community/helm-charts",
"https://democratic-csi.github.io/charts/": "democratic-csi/democratic-csi",
"https://stakater.github.io/stakater-charts": "stakater/Reloader",
"https://topolvm.github.io/pvc-autoresizer": "topolvm/pvc-autoresizer",
"https://kubernetes-sigs.github.io/descheduler/": "kubernetes-sigs/descheduler",
"https://kubernetes-sigs.github.io/metrics-server/": "kubernetes-sigs/metrics-server",
"https://charts.fairwinds.com/stable": "FairwindsOps/goldilocks",
"https://helm.ngc.nvidia.com/nvidia": "NVIDIA/gpu-operator",
"oci://ghcr.io/woodpecker-ci/helm": "woodpecker-ci/helm",
"oci://10.0.20.10:5000/bitnamicharts": "bitnami/charts"
},
"db_backed_services": {
"affine": { "type": "postgresql", "db_name": "affine", "shared": true },
"claude-memory": { "type": "postgresql", "db_name": "claude_memory", "shared": true },
"crowdsec": { "type": "postgresql", "db_name": "crowdsec", "shared": true },
"dawarich": { "type": "postgresql", "db_name": "dawarich", "shared": true },
"health": { "type": "postgresql", "db_name": "health", "shared": true },
"linkwarden": { "type": "postgresql", "db_name": "linkwarden", "shared": true },
"matrix": { "type": "postgresql", "db_name": "matrix", "shared": true },
"n8n": { "type": "postgresql", "db_name": "n8n", "shared": true },
"netbox": { "type": "postgresql", "db_name": "netbox", "shared": true },
"rybbit": { "type": "postgresql", "db_name": "rybbit", "shared": true },
"tandoor": { "type": "postgresql", "db_name": "tandoor", "shared": true },
"technitium": { "type": "postgresql", "db_name": "technitium", "shared": true },
"trading-bot": { "type": "postgresql", "db_name": "trading_bot", "shared": true },
"woodpecker": { "type": "postgresql", "db_name": "woodpecker", "shared": true },
"immich": { "type": "postgresql", "db_name": "immich", "dedicated": true, "backup_cronjob": "postgresql-backup", "backup_namespace": "immich" },
"authentik": { "type": "postgresql", "dedicated": true, "notes": "Uses PgBouncer, managed by Helm chart" },
"hackmd": { "type": "mysql", "db_name": "codimd", "shared": true },
"mailserver": { "type": "mysql", "db_name": "mailserver", "shared": true },
"monitoring": { "type": "mysql", "db_name": "monitoring", "shared": true, "notes": "Grafana backend" },
"nextcloud": { "type": "mysql", "db_name": "nextcloud", "shared": true },
"onlyoffice": { "type": "mysql", "db_name": "onlyoffice", "shared": true },
"paperless-ngx": { "type": "mysql", "db_name": "paperless_ngx", "shared": true },
"phpipam": { "type": "mysql", "db_name": "phpipam", "shared": true },
"real-estate-crawler": { "type": "mysql", "db_name": "wrongmove", "shared": true },
"speedtest": { "type": "mysql", "db_name": "speedtest", "shared": true },
"url": { "type": "mysql", "db_name": "shlink", "shared": true },
"vault": { "type": "mysql", "db_name": "vault", "shared": true }
},
"backup_infrastructure": {
"postgresql": {
"cronjob_name": "postgresql-backup",
"namespace": "dbaas",
"credential_secret": "pg-cluster-superuser",
"credential_key": "password",
"host": "pg-cluster-rw.dbaas",
"backup_pvc": "dbaas-postgresql-backup-host"
},
"mysql": {
"cronjob_name": "mysql-backup",
"namespace": "dbaas",
"credential_secret": "cluster-secret",
"credential_key": "ROOT_PASSWORD",
"host": "mysql.dbaas",
"backup_pvc": "dbaas-mysql-backup-host"
}
},
"version_jump_always_step": [
"authentik",
"nextcloud",
"immich"
],
"auto_detect_rules": {
"ghcr.io/{org}/{repo}": "Use org/repo directly, strip -server/-backend suffixes if repo 404s",
"docker.io/{org}/{repo}": "Try org/repo on GitHub",
"lscr.io/linuxserver/{app}": "Map to linuxserver/docker-{app}",
"quay.io/{org}/{repo}": "Try org/repo on GitHub",
"registry.gitlab.com/{org}/{repo}": "Try org/repo on GitHub (may be GitLab-only)"
},
"skip_image_patterns": [
"viktorbarzin/*",
"registry.viktorbarzin.me/*",
"ancamilea/*",
"mghee/*",
"*postgres*",
"*mysql*",
"*redis*",
"*clickhouse*",
"*etcd*",
"registry.k8s.io/*",
"quay.io/tigera/*",
"quay.io/metallb/*",
"nvcr.io/*",
"reg.kyverno.io/*"
],
"breaking_change_keywords": [
"breaking",
"BREAKING",
"migration required",
"schema change",
"database migration",
"manual intervention",
"action required",
"removed",
"deprecated",
"renamed",
"incompatible"
]
}


@ -0,0 +1,53 @@
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
venv/
.venv/
ENV/
# Node
node_modules/
frontend/dist/
npm-debug.log*
yarn-debug.log*
yarn-error.log*
# IDE
.idea/
.vscode/
*.swp
*.swo
*~
# OS
.DS_Store
._*
Thumbs.db
# Uploads and outputs (runtime data)
uploads/
outputs/
# Environment
.env
.env.local
.env.*.local


@ -0,0 +1,35 @@
FROM viktorbarzin/audiblez:latest
# Install Node.js for building frontend
RUN apt-get update && \
apt-get install -y curl && \
curl -fsSL https://deb.nodesource.com/setup_22.x | bash - && \
apt-get install -y nodejs && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
WORKDIR /app
# Build frontend
COPY frontend/package.json frontend/package-lock.json* ./frontend/
WORKDIR /app/frontend
RUN npm install
COPY frontend/ ./
RUN npm run build
# Install backend dependencies
WORKDIR /app/backend
COPY backend/requirements.txt ./
RUN pip install --no-cache-dir --break-system-packages -r requirements.txt
COPY backend/ ./
# Copy voice samples
COPY samples/ /app/samples/
WORKDIR /app/backend
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]


@ -0,0 +1,52 @@
# Audiblez Web UI
Web interface for converting EPUB files to audiobooks using [audiblez](https://github.com/santinic/audiblez).
<img width="1702" height="1145" alt="image" src="https://github.com/user-attachments/assets/ba0f9090-a8e9-4550-9c9b-473058f19cbb" />
## Features
- Upload EPUB files via drag & drop
- Select from 50+ voices across multiple languages
- Preview voice samples before converting
- Real-time progress updates via WebSocket
- Download completed audiobooks
## Development
### Backend
```bash
cd backend
pip install -r requirements.txt
uvicorn main:app --reload
```
### Frontend
```bash
cd frontend
npm install
npm run dev
```
### Voice Samples
Generate voice samples (requires audiblez environment):
```bash
python generate_samples.py samples/
```
## Docker Build
```bash
docker build -t audiblez-web .
docker run -p 8000:8000 -v /path/to/data:/mnt audiblez-web
```
## Deployment
Deployed to Kubernetes via Terraform. The service mounts NFS storage at `/mnt` for:
- `/mnt/uploads` - Uploaded EPUB files
- `/mnt/outputs` - Generated audiobooks


@ -0,0 +1,104 @@
"""
Authentication module for extracting user identity from Authentik headers.
When nginx ingress is protected with Authentik, these headers are forwarded:
- X-Authentik-Username: The user's username
- X-Authentik-Uid: Unique user ID (used for directory separation)
- X-Authentik-Email: User's email
- X-Authentik-Name: User's display name
- X-Authentik-Groups: Comma-separated group list
"""
from dataclasses import dataclass
from fastapi import Request, HTTPException
from typing import Optional
import re
@dataclass
class User:
"""Represents an authenticated user from Authentik."""
uid: str
username: str
email: Optional[str] = None
name: Optional[str] = None
groups: list[str] = None
def __post_init__(self):
if self.groups is None:
self.groups = []
def sanitize_user_id(uid: str) -> str:
"""
Sanitize user ID for use as a directory name.
Only allows alphanumeric, hyphens, and underscores.
"""
if not uid:
raise ValueError("User ID cannot be empty")
# Only allow safe characters for filesystem
safe_uid = re.sub(r'[^a-zA-Z0-9\-_]', '', uid)
if not safe_uid:
raise ValueError("User ID contains no valid characters")
# Limit length to prevent path issues
if len(safe_uid) > 64:
safe_uid = safe_uid[:64]
return safe_uid
async def get_current_user(request: Request) -> User:
"""
Extract user information from Authentik headers.
This is a FastAPI dependency that should be used on protected endpoints.
Raises 401 if user headers are not present (not authenticated).
"""
# Header names are case-insensitive, but commonly forwarded as:
uid = request.headers.get("X-Authentik-Uid")
username = request.headers.get("X-Authentik-Username")
email = request.headers.get("X-Authentik-Email")
name = request.headers.get("X-Authentik-Name")
groups_str = request.headers.get("X-Authentik-Groups", "")
# For development/testing, check for alternative header names
if not uid:
uid = request.headers.get("X-Authentik-Userid")
if not uid:
uid = request.headers.get("Remote-User")
if not uid or not username:
raise HTTPException(
status_code=401,
detail="Authentication required. Authentik headers not found."
)
try:
safe_uid = sanitize_user_id(uid)
except ValueError as e:
raise HTTPException(status_code=400, detail=str(e))
# Parse groups (comma-separated)
groups = [g.strip() for g in groups_str.split(",") if g.strip()]
return User(
uid=safe_uid,
username=username,
email=email,
name=name,
groups=groups
)
async def get_optional_user(request: Request) -> Optional[User]:
"""
Extract user information if available, or return None.
Use this for endpoints that work with or without authentication.
"""
try:
return await get_current_user(request)
except HTTPException:
return None


@ -0,0 +1,307 @@
from fastapi import APIRouter, UploadFile, File, HTTPException, Depends
from fastapi.responses import FileResponse
from pydantic import BaseModel
from pathlib import Path
import shutil
import asyncio
import re
from models.schemas import Voice, JobCreate, Job, JobProgress, ChapterInfo
from services.voices import get_all_voices, get_voices_by_language, get_voice
from services.converter import job_manager
from api.auth import User, get_current_user
router = APIRouter(prefix="/api")
def sanitize_filename(filename: str, max_length: int = 200) -> str:
"""
Sanitize a filename to prevent path traversal and shell injection.
Only allows alphanumeric characters, spaces, hyphens, underscores, parentheses, and dots.
"""
if not filename:
raise ValueError("Filename cannot be empty")
# Remove any path components (prevent path traversal)
filename = Path(filename).name
# Only allow safe characters: alphanumeric, space, hyphen, underscore, parentheses, dot
# This regex removes anything that isn't in the allowed set
safe_filename = re.sub(r'[^a-zA-Z0-9\s\-_().]', '', filename)
# Collapse multiple spaces/dots
safe_filename = re.sub(r'\s+', ' ', safe_filename)
safe_filename = re.sub(r'\.+', '.', safe_filename)
# Strip leading/trailing whitespace and dots
safe_filename = safe_filename.strip(' .')
# Limit length
if len(safe_filename) > max_length:
safe_filename = safe_filename[:max_length]
if not safe_filename:
raise ValueError("Filename contains no valid characters")
return safe_filename
class RenameRequest(BaseModel):
new_name: str
# ============================================================================
# Voice endpoints (no auth required - public info)
# ============================================================================
@router.get("/voices", response_model=list[Voice])
async def list_voices():
"""Get all available voices."""
return get_all_voices()
@router.get("/voices/grouped")
async def list_voices_grouped():
"""Get voices grouped by language."""
return get_voices_by_language()
@router.get("/voices/{voice_id}/sample")
async def get_voice_sample(voice_id: str):
"""Get voice sample audio file."""
voice = get_voice(voice_id)
if not voice:
raise HTTPException(status_code=404, detail="Voice not found")
# Try NFS storage first (persistent), then bundled samples
sample_path = Path("/mnt/samples") / f"{voice_id}.mp3"
if not sample_path.exists():
sample_path = Path("/app/samples") / f"{voice_id}.mp3"
if not sample_path.exists():
raise HTTPException(status_code=404, detail="Sample not available")
return FileResponse(sample_path, media_type="audio/mpeg")
# ============================================================================
# User info endpoint
# ============================================================================
@router.get("/me")
async def get_current_user_info(user: User = Depends(get_current_user)):
"""Get current authenticated user info."""
return {
"uid": user.uid,
"username": user.username,
"email": user.email,
"name": user.name,
"groups": user.groups
}
# ============================================================================
# Upload endpoints (user-scoped)
# ============================================================================
@router.post("/upload")
async def upload_file(file: UploadFile = File(...), user: User = Depends(get_current_user)):
"""Upload an EPUB file to user's directory."""
if not file.filename.endswith(".epub"):
raise HTTPException(status_code=400, detail="Only EPUB files are supported")
# Save file to user's uploads directory
upload_dir = job_manager.get_user_uploads_dir(user.uid)
# Sanitize the filename
try:
safe_filename = sanitize_filename(file.filename)
except ValueError as e:
raise HTTPException(status_code=400, detail=str(e))
file_path = upload_dir / safe_filename
with file_path.open("wb") as buffer:
shutil.copyfileobj(file.file, buffer)
return {"filename": safe_filename, "size": file_path.stat().st_size}
# ============================================================================
# Job endpoints (user-scoped)
# ============================================================================
@router.post("/jobs", response_model=Job)
async def create_job(job_create: JobCreate, user: User = Depends(get_current_user)):
"""Create a new conversion job."""
# Verify file exists in user's uploads
file_path = job_manager.get_user_uploads_dir(user.uid) / job_create.filename
if not file_path.exists():
raise HTTPException(status_code=404, detail="File not found")
# Verify voice exists
voice = get_voice(job_create.voice)
if not voice:
raise HTTPException(status_code=404, detail="Voice not found")
# Create job with user ownership
job = job_manager.create_job(
user_id=user.uid,
filename=job_create.filename,
voice=job_create.voice,
speed=job_create.speed,
use_gpu=job_create.use_gpu
)
# Start conversion in background
asyncio.create_task(job_manager.run_conversion(job.id))
return job
@router.get("/jobs", response_model=list[Job])
async def list_jobs(user: User = Depends(get_current_user)):
"""Get all jobs for current user."""
return job_manager.get_user_jobs(user.uid)
@router.get("/jobs/{job_id}", response_model=Job)
async def get_job(job_id: str, user: User = Depends(get_current_user)):
"""Get a specific job (must be owned by user)."""
job = job_manager.get_job(job_id, user.uid)
if not job:
raise HTTPException(status_code=404, detail="Job not found")
return job
@router.get("/jobs/{job_id}/download")
async def download_job(job_id: str, user: User = Depends(get_current_user)):
"""Download the completed audiobook."""
job = job_manager.get_job(job_id, user.uid)
if not job:
raise HTTPException(status_code=404, detail="Job not found")
if job.status != "completed":
raise HTTPException(status_code=400, detail="Job not completed")
if not job.output_file:
raise HTTPException(status_code=404, detail="Output file not found")
output_path = job_manager.get_user_outputs_dir(user.uid) / job_id / job.output_file
if not output_path.exists():
raise HTTPException(status_code=404, detail="Output file not found")
return FileResponse(
output_path,
media_type="audio/mp4",
filename=job.output_file
)
@router.get("/jobs/{job_id}/chapters", response_model=list[ChapterInfo])
async def get_job_chapters(job_id: str, user: User = Depends(get_current_user)):
"""Get chapter metadata for a job's audiobook."""
job = job_manager.get_job(job_id, user.uid)
if not job:
raise HTTPException(status_code=404, detail="Job not found")
if job.status != "completed":
raise HTTPException(status_code=400, detail="Job not completed")
return job.chapters
@router.delete("/jobs/{job_id}")
async def delete_job(job_id: str, user: User = Depends(get_current_user)):
"""Delete a job (must be owned by user)."""
if not job_manager.delete_job(job_id, user.uid):
raise HTTPException(status_code=404, detail="Job not found")
return {"status": "deleted"}
# ============================================================================
# Audiobook endpoints (user-scoped)
# ============================================================================
@router.get("/audiobooks")
async def list_audiobooks(user: User = Depends(get_current_user)):
"""List all completed audiobooks for current user."""
return job_manager.get_user_audiobooks(user.uid)
@router.get("/audiobooks/{audiobook_id}/download")
async def download_audiobook(audiobook_id: str, user: User = Depends(get_current_user)):
"""Download an audiobook by its ID (job folder name)."""
output_dir = job_manager.get_user_outputs_dir(user.uid) / audiobook_id
if not output_dir.exists():
raise HTTPException(status_code=404, detail="Audiobook not found")
# Find the audio file
audio_files = list(output_dir.glob("*.m4b")) + list(output_dir.glob("*.mp3"))
if not audio_files:
raise HTTPException(status_code=404, detail="Audio file not found")
audio_file = audio_files[0]
media_type = "audio/mp4" if audio_file.suffix == ".m4b" else "audio/mpeg"
return FileResponse(
audio_file,
media_type=media_type,
filename=audio_file.name
)
@router.delete("/audiobooks/{audiobook_id}")
async def delete_audiobook(audiobook_id: str, user: User = Depends(get_current_user)):
"""Delete an audiobook and its folder."""
output_dir = job_manager.get_user_outputs_dir(user.uid) / audiobook_id
if not output_dir.exists():
raise HTTPException(status_code=404, detail="Audiobook not found")
# Delete all files in the directory and the directory itself
for file in output_dir.iterdir():
file.unlink()
output_dir.rmdir()
return {"status": "deleted"}
@router.patch("/audiobooks/{audiobook_id}/rename")
async def rename_audiobook(audiobook_id: str, rename_request: RenameRequest, user: User = Depends(get_current_user)):
"""Rename an audiobook file. Input is sanitized to prevent path traversal and injection."""
output_dir = job_manager.get_user_outputs_dir(user.uid) / audiobook_id
if not output_dir.exists():
raise HTTPException(status_code=404, detail="Audiobook not found")
# Find the audio file
audio_files = list(output_dir.glob("*.m4b")) + list(output_dir.glob("*.mp3"))
if not audio_files:
raise HTTPException(status_code=404, detail="Audio file not found")
current_file = audio_files[0]
current_extension = current_file.suffix # .m4b or .mp3
# Sanitize the new name
try:
safe_name = sanitize_filename(rename_request.new_name)
except ValueError as e:
raise HTTPException(status_code=400, detail=str(e))
# Ensure the new name has the correct extension
if not safe_name.lower().endswith(current_extension.lower()):
safe_name = safe_name + current_extension
# Create the new path (same directory, new filename)
new_file = output_dir / safe_name
# Check if target already exists
if new_file.exists() and new_file != current_file:
raise HTTPException(status_code=400, detail="A file with that name already exists")
# Rename the file using pathlib (no shell commands)
current_file.rename(new_file)
return {"status": "renamed", "new_filename": safe_name}


@ -0,0 +1,101 @@
from fastapi import WebSocket, WebSocketDisconnect, HTTPException
from services.converter import job_manager
from models.schemas import JobProgress
from api.auth import sanitize_user_id


class ConnectionManager:
    """Manages WebSocket connections for job progress updates."""

    def __init__(self):
        self.active_connections: dict[str, list[WebSocket]] = {}

    async def connect(self, job_id: str, websocket: WebSocket):
        """Connect a websocket for a specific job."""
        await websocket.accept()
        if job_id not in self.active_connections:
            self.active_connections[job_id] = []
        self.active_connections[job_id].append(websocket)

    def disconnect(self, job_id: str, websocket: WebSocket):
        """Disconnect a websocket."""
        if job_id in self.active_connections:
            if websocket in self.active_connections[job_id]:
                self.active_connections[job_id].remove(websocket)
            if not self.active_connections[job_id]:
                del self.active_connections[job_id]

    async def send_progress(self, job_id: str, progress: JobProgress):
        """Send progress update to all connected clients for a job."""
        if job_id in self.active_connections:
            disconnected = []
            for connection in self.active_connections[job_id]:
                try:
                    await connection.send_json(progress.model_dump())
                except:
                    disconnected.append(connection)
            # Remove disconnected clients
            for conn in disconnected:
                self.disconnect(job_id, conn)


manager = ConnectionManager()


def get_user_from_websocket(websocket: WebSocket) -> str | None:
    """
    Extract user ID from websocket headers.
    WebSocket connections receive HTTP headers during the upgrade handshake.
    """
    # Try various header name formats
    uid = websocket.headers.get("x-authentik-uid")
    if not uid:
        uid = websocket.headers.get("X-Authentik-Uid")
    if not uid:
        uid = websocket.headers.get("x-authentik-userid")
    if not uid:
        uid = websocket.headers.get("remote-user")
    if uid:
        try:
            return sanitize_user_id(uid)
        except ValueError:
            return None
    return None


async def websocket_endpoint(websocket: WebSocket, job_id: str):
    """WebSocket endpoint for job progress updates."""
    # Extract user from headers
    user_id = get_user_from_websocket(websocket)
    # Verify job exists and user has access
    job = job_manager.get_job(job_id, user_id)
    if not job:
        # Close connection if job not found or not owned by user
        await websocket.close(code=4004, reason="Job not found or access denied")
        return
    await manager.connect(job_id, websocket)

    # Register progress callback
    async def progress_callback(progress: JobProgress):
        await manager.send_progress(job_id, progress)

    job_manager.register_progress_callback(job_id, progress_callback)
    try:
        # Send initial status
        await websocket.send_json({
            "progress": job.progress,
            "status": job.status,
        })
        # Wait for messages (keep-alive)
        while True:
            await websocket.receive_text()
    except WebSocketDisconnect:
        manager.disconnect(job_id, websocket)
        job_manager.unregister_progress_callback(job_id, progress_callback)


@ -0,0 +1,41 @@
from fastapi import FastAPI
from fastapi.staticfiles import StaticFiles
from fastapi.middleware.cors import CORSMiddleware
from pathlib import Path
from api.routes import router
from api.websocket import websocket_endpoint

app = FastAPI(title="Audiblez Web API", version="1.0.0")

# CORS middleware for development
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)


# Health check - must be before static mount
@app.get("/health")
async def health_check():
    return {"status": "healthy"}


# Include API routes
app.include_router(router)


# WebSocket endpoint
@app.websocket("/ws/jobs/{job_id}")
async def websocket_route(websocket, job_id: str):
    await websocket_endpoint(websocket, job_id)


# Serve static frontend files - MUST BE LAST as it catches all routes
static_dir = Path("/app/frontend/dist")
if static_dir.exists():
    app.mount("/", StaticFiles(directory=str(static_dir), html=True), name="static")

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)


@ -0,0 +1,59 @@
from pydantic import BaseModel, Field
from typing import Optional, Literal
from datetime import datetime
from enum import Enum


class JobStatus(str, Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"


class Voice(BaseModel):
    id: str
    name: str
    language: str
    gender: Literal["M", "F"]
    quality: str = "medium"


class JobCreate(BaseModel):
    filename: str
    voice: str
    speed: float = Field(default=1.0, ge=0.5, le=2.0)
    use_gpu: bool = True


class ChapterInfo(BaseModel):
    """Chapter metadata extracted from EPUB and embedded in M4B."""
    title: str
    start_ms: int
    end_ms: int


class JobProgress(BaseModel):
    progress: float = Field(ge=0, le=100)
    eta: Optional[str] = None
    current_chapter: Optional[str] = None
    total_chapters: Optional[int] = None
    status: JobStatus


class Job(BaseModel):
    id: str
    user_id: str  # User who owns this job
    filename: str
    voice: str
    speed: float
    use_gpu: bool
    status: JobStatus
    progress: float = 0
    created_at: datetime
    updated_at: datetime
    error: Optional[str] = None
    output_file: Optional[str] = None
    total_chapters: int = 0
    chapters: list[ChapterInfo] = []


@ -0,0 +1,11 @@
fastapi==0.115.0
uvicorn[standard]==0.32.0
python-multipart==0.0.12
websockets==13.1
aiofiles==24.1.0
pydantic==2.9.2
pydantic-settings==2.6.0
ebooklib>=0.18
pydub>=0.25.1
beautifulsoup4>=4.12.0
lxml>=5.0.0


@ -0,0 +1,156 @@
"""M4B chapter metadata embedding service."""
import re
import subprocess
import tempfile
from pathlib import Path
from pydub import AudioSegment
from .epub_parser import Chapter
def get_chapter_audio_durations(output_dir: Path) -> list[int]:
"""Calculate duration of each chapter WAV file in milliseconds.
audiblez produces files like: {bookname}_chapter_{N}.wav
e.g., mybook_chapter_1.wav, mybook_chapter_2.wav
Args:
output_dir: Directory containing the WAV files
Returns:
List of durations in milliseconds, ordered by chapter number
"""
durations = []
# Find all chapter WAV files - audiblez uses {name}_chapter_{N}.wav
wav_files = list(output_dir.glob("*_chapter_*.wav"))
if not wav_files:
# Fallback: try any WAV files
wav_files = list(output_dir.glob("*.wav"))
if not wav_files:
print(f"No WAV files found in {output_dir}")
return durations
# Sort by extracting chapter number from filename using regex
# Pattern: look for _chapter_N or chapter_N in filename
def extract_chapter_num(path: Path) -> int:
name = path.stem
# Try to find chapter number with regex - handles various patterns
# e.g., "book_chapter_1", "mybook_chapter_12", "chapter_3_voice"
match = re.search(r'chapter[_-]?(\d+)', name, re.IGNORECASE)
if match:
return int(match.group(1))
# Fallback: find any number in the filename
match = re.search(r'(\d+)', name)
if match:
return int(match.group(1))
return 0
wav_files.sort(key=extract_chapter_num)
print(f"Found {len(wav_files)} WAV files to process for durations")
for wav_file in wav_files:
try:
audio = AudioSegment.from_file(str(wav_file))
durations.append(len(audio)) # duration in ms
print(f" Chapter WAV: {wav_file.name} - {len(audio)}ms ({len(audio)/1000:.1f}s)")
except Exception as e:
print(f" Error reading {wav_file}: {e}")
continue
return durations
def generate_ffmpeg_metadata(chapters: list[Chapter], durations: list[int]) -> str:
"""Generate FFmpeg FFMETADATA1 format string with chapter markers.
Args:
chapters: List of Chapter objects with titles
durations: List of durations in milliseconds for each chapter
Returns:
FFMETADATA1 formatted string
"""
metadata = ";FFMETADATA1\n"
current_time_ms = 0
# Match chapters with durations
num_chapters = min(len(chapters), len(durations))
for i in range(num_chapters):
chapter = chapters[i]
duration = durations[i]
chapter.start_ms = current_time_ms
chapter.end_ms = current_time_ms + duration
chapter.duration_ms = duration
metadata += f"\n[CHAPTER]\n"
metadata += f"TIMEBASE=1/1000\n"
metadata += f"START={chapter.start_ms}\n"
metadata += f"END={chapter.end_ms}\n"
metadata += f"title={chapter.title}\n"
current_time_ms = chapter.end_ms
return metadata
def embed_chapters_in_m4b(input_m4b: Path, metadata_content: str) -> Path:
"""Re-mux M4B with chapter metadata using FFmpeg.
Args:
input_m4b: Path to the input M4B file
metadata_content: FFMETADATA1 formatted string
Returns:
Path to the output M4B with chapters (same as input, replaced)
"""
output_m4b = input_m4b.with_suffix('.chaptered.m4b')
# Write metadata to temporary file
with tempfile.NamedTemporaryFile(mode='w', suffix='.txt', delete=False) as f:
f.write(metadata_content)
metadata_file = Path(f.name)
try:
cmd = [
'ffmpeg', '-y',
'-i', str(input_m4b),
'-f', 'ffmetadata', '-i', str(metadata_file),
'-map', '0:a',
'-map_metadata', '1',
'-c:a', 'copy', # Copy audio without re-encoding
'-movflags', '+faststart+use_metadata_tags',
str(output_m4b)
]
print(f"Running FFmpeg: {' '.join(cmd)}")
result = subprocess.run(cmd, check=True, capture_output=True, text=True)
if result.returncode != 0:
print(f"FFmpeg stderr: {result.stderr}")
raise RuntimeError(f"FFmpeg failed: {result.stderr}")
# Replace original with chaptered version
input_m4b.unlink()
output_m4b.rename(input_m4b)
print(f"Successfully embedded chapters in {input_m4b}")
return input_m4b
except subprocess.CalledProcessError as e:
print(f"FFmpeg error: {e.stderr}")
# Clean up temp file
if output_m4b.exists():
output_m4b.unlink()
raise
finally:
# Clean up metadata file
if metadata_file.exists():
metadata_file.unlink()


@ -0,0 +1,291 @@
import asyncio
import uuid
import os
from datetime import datetime
from pathlib import Path
from typing import Callable, Optional
import subprocess
import json
from models.schemas import Job, JobStatus, JobProgress, ChapterInfo
from services.epub_parser import extract_chapters, Chapter
from services.chapter_embedder import (
get_chapter_audio_durations,
generate_ffmpeg_metadata,
embed_chapters_in_m4b
)
class JobManager:
"""Manages conversion jobs and their state with user isolation."""
def __init__(self, storage_path: str = "/mnt"):
self.storage_path = Path(storage_path)
self.jobs: dict[str, Job] = {}
self.progress_callbacks: dict[str, list[Callable]] = {}
def get_user_uploads_dir(self, user_id: str) -> Path:
"""Get the uploads directory for a specific user."""
user_dir = self.storage_path / "users" / user_id / "uploads"
user_dir.mkdir(parents=True, exist_ok=True)
return user_dir
def get_user_outputs_dir(self, user_id: str) -> Path:
"""Get the outputs directory for a specific user."""
user_dir = self.storage_path / "users" / user_id / "outputs"
user_dir.mkdir(parents=True, exist_ok=True)
return user_dir
def create_job(self, user_id: str, filename: str, voice: str, speed: float, use_gpu: bool) -> Job:
"""Create a new conversion job for a user."""
job_id = str(uuid.uuid4())
now = datetime.now()
job = Job(
id=job_id,
user_id=user_id,
filename=filename,
voice=voice,
speed=speed,
use_gpu=use_gpu,
status=JobStatus.PENDING,
created_at=now,
updated_at=now,
)
self.jobs[job_id] = job
return job
def get_job(self, job_id: str, user_id: Optional[str] = None) -> Optional[Job]:
"""Get a job by ID. If user_id is provided, verify ownership."""
job = self.jobs.get(job_id)
if job and user_id and job.user_id != user_id:
return None # User doesn't own this job
return job
def get_user_jobs(self, user_id: str) -> list[Job]:
"""Get all jobs for a specific user."""
return [job for job in self.jobs.values() if job.user_id == user_id]
def get_all_jobs(self) -> list[Job]:
"""Get all jobs (admin use only)."""
return list(self.jobs.values())
def update_job_status(self, job_id: str, status: JobStatus, error: Optional[str] = None):
"""Update job status."""
if job_id in self.jobs:
self.jobs[job_id].status = status
self.jobs[job_id].updated_at = datetime.now()
if error:
self.jobs[job_id].error = error
def update_job_progress(self, job_id: str, progress: float, current_chapter: Optional[str] = None, eta: Optional[str] = None):
"""Update job progress."""
if job_id in self.jobs:
self.jobs[job_id].progress = progress
self.jobs[job_id].updated_at = datetime.now()
# Notify callbacks
if job_id in self.progress_callbacks:
progress_data = JobProgress(
progress=progress,
eta=eta,
current_chapter=current_chapter,
status=self.jobs[job_id].status
)
for callback in self.progress_callbacks[job_id]:
try:
asyncio.create_task(callback(progress_data))
except Exception as e:
print(f"Error in progress callback: {e}")
def register_progress_callback(self, job_id: str, callback: Callable):
"""Register a callback for progress updates."""
if job_id not in self.progress_callbacks:
self.progress_callbacks[job_id] = []
self.progress_callbacks[job_id].append(callback)
def unregister_progress_callback(self, job_id: str, callback: Callable):
"""Unregister a progress callback."""
if job_id in self.progress_callbacks:
self.progress_callbacks[job_id].remove(callback)
if not self.progress_callbacks[job_id]:
del self.progress_callbacks[job_id]
async def run_conversion(self, job_id: str):
"""Run the audiblez conversion in the background."""
job = self.jobs.get(job_id)
if not job:
return
try:
self.update_job_status(job_id, JobStatus.PROCESSING)
# Prepare user-specific paths
input_path = self.get_user_uploads_dir(job.user_id) / job.filename
output_dir = self.get_user_outputs_dir(job.user_id) / job_id
output_dir.mkdir(parents=True, exist_ok=True)
# Extract chapters from EPUB before conversion
chapters: list[Chapter] = []
if input_path.suffix.lower() == '.epub':
chapters = extract_chapters(input_path)
self.jobs[job_id].total_chapters = len(chapters)
print(f"Extracted {len(chapters)} chapters from EPUB")
# Build audiblez command - use the venv python/audiblez
cmd = [
"/app/audiblez/bin/audiblez",
str(input_path),
"-o", str(output_dir),
"-v", job.voice,
"-s", str(job.speed),
]
if job.use_gpu:
cmd.append("-c") # --cuda flag for GPU
# Run conversion
process = await asyncio.create_subprocess_exec(
*cmd,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE,
cwd=str(self.storage_path)
)
# Monitor progress from stderr/stdout
async def read_stream(stream, is_stderr=False):
while True:
line = await stream.readline()
if not line:
break
line_str = line.decode().strip()
print(f"{'[STDERR]' if is_stderr else '[STDOUT]'} {line_str}")
# Parse progress from output
# Audiblez outputs progress in various formats
# We'll do a simple pattern match
if "%" in line_str:
try:
# Try to extract percentage
parts = line_str.split("%")
if len(parts) > 1:
# Find the number before %
num_str = parts[0].split()[-1]
progress = float(num_str)
self.update_job_progress(job_id, progress)
except:
pass
# Check for chapter info
if "Chapter" in line_str or "chapter" in line_str:
self.update_job_progress(
job_id,
job.progress,
current_chapter=line_str
)
# Read both streams concurrently
await asyncio.gather(
read_stream(process.stdout),
read_stream(process.stderr, is_stderr=True)
)
# Wait for completion
returncode = await process.wait()
if returncode == 0:
# Find output file
output_files = list(output_dir.glob("*.m4b"))
if not output_files:
output_files = list(output_dir.glob("*.mp3"))
if output_files:
output_file = output_files[0]
# Embed chapter metadata if we have chapters and WAV files
if chapters:
try:
durations = get_chapter_audio_durations(output_dir)
print(f"Found {len(durations)} chapter audio durations")
if durations:
# Match chapter count with duration count
num_chapters = min(len(chapters), len(durations))
if num_chapters != len(chapters):
print(f"Warning: chapter count ({len(chapters)}) != duration count ({len(durations)})")
metadata = generate_ffmpeg_metadata(chapters[:num_chapters], durations[:num_chapters])
embed_chapters_in_m4b(output_file, metadata)
# Store chapter info in job for API access
self.jobs[job_id].chapters = [
ChapterInfo(
title=c.title,
start_ms=c.start_ms,
end_ms=c.end_ms
)
for c in chapters[:num_chapters]
]
print(f"Embedded {num_chapters} chapters in M4B")
else:
print("No WAV files found for chapter duration calculation")
except Exception as e:
print(f"Failed to embed chapters: {e}")
# Continue without chapter embedding - non-fatal error
self.jobs[job_id].output_file = str(output_file.name)
self.update_job_status(job_id, JobStatus.COMPLETED)
self.update_job_progress(job_id, 100.0)
else:
self.update_job_status(job_id, JobStatus.FAILED, "No output file generated")
else:
self.update_job_status(job_id, JobStatus.FAILED, f"Conversion failed with code {returncode}")
except Exception as e:
print(f"Conversion error: {e}")
self.update_job_status(job_id, JobStatus.FAILED, str(e))
def delete_job(self, job_id: str, user_id: str) -> bool:
"""Delete a job and its output files."""
job = self.get_job(job_id, user_id)
if not job:
return False
# Delete output directory if exists
output_dir = self.get_user_outputs_dir(user_id) / job_id
if output_dir.exists():
import shutil
shutil.rmtree(output_dir)
# Remove from jobs dict
del self.jobs[job_id]
return True
def get_user_audiobooks(self, user_id: str) -> list[dict]:
"""List all completed audiobooks for a user."""
outputs_dir = self.get_user_outputs_dir(user_id)
audiobooks = []
if outputs_dir.exists():
for job_dir in outputs_dir.iterdir():
if job_dir.is_dir():
# Look for m4b or mp3 files
audio_files = list(job_dir.glob("*.m4b")) + list(job_dir.glob("*.mp3"))
for audio_file in audio_files:
stat = audio_file.stat()
audiobooks.append({
"id": job_dir.name,
"filename": audio_file.name,
"size": stat.st_size,
"created_at": stat.st_mtime,
})
# Sort by creation time, newest first
audiobooks.sort(key=lambda x: x["created_at"], reverse=True)
return audiobooks
# Global job manager instance
job_manager = JobManager()


@ -0,0 +1,166 @@
"""EPUB chapter extraction service.
This parser attempts to match audiblez's chapter detection logic to ensure
the extracted chapters align with the WAV files audiblez produces.
audiblez iterates through EPUB ITEM_DOCUMENTs and uses is_chapter() to determine
if a document is a chapter based on content length (100+ chars) and filename patterns.
"""
import re
from dataclasses import dataclass
from pathlib import Path
from bs4 import BeautifulSoup
from ebooklib import epub, ITEM_DOCUMENT
@dataclass
class Chapter:
"""Represents a chapter extracted from an EPUB."""
title: str
index: int
duration_ms: int = 0
start_ms: int = 0
end_ms: int = 0
def sanitize_title(title: str) -> str:
"""Remove characters that break FFmpeg metadata format."""
if not title:
return "Untitled"
# Escape special chars for FFmpeg FFMETADATA format
return (title
.replace('=', '-')
.replace(';', '-')
.replace('#', '')
.replace('\\', '')
.replace('\n', ' ')
.replace('\r', '')
.strip())
def is_chapter(text: str, filename: str) -> bool:
"""Determine if a document is a chapter.
Matches audiblez's is_chapter() logic:
- Content must be over 100 characters
- Filename should match common chapter patterns
"""
if len(text) < 100:
return False
# Check filename patterns that indicate a chapter
filename_lower = filename.lower()
chapter_patterns = [
r'chapter',
r'part[_-]?\d+',
r'split[_-]?\d+',
r'ch[_-]?\d+',
r'chap[_-]?\d+',
r'sect', # section
r'content',
r'text',
]
for pattern in chapter_patterns:
if re.search(pattern, filename_lower):
return True
# If content is substantial (1000+ chars), likely a chapter even without pattern match
if len(text) > 1000:
return True
return False
def extract_title_from_content(soup: BeautifulSoup, filename: str, index: int) -> str:
"""Extract a chapter title from the document content."""
# Try to find title in common heading tags
for tag in ['title', 'h1', 'h2', 'h3']:
element = soup.find(tag)
if element and element.get_text(strip=True):
title = element.get_text(strip=True)
# Truncate long titles
if len(title) > 100:
title = title[:97] + "..."
return title
# Fallback: use filename without extension
stem = Path(filename).stem
# Clean up common patterns
stem = re.sub(r'^(chapter|chap|ch)[_-]?', 'Chapter ', stem, flags=re.IGNORECASE)
stem = re.sub(r'[_-]', ' ', stem)
if stem and len(stem) < 50:
return stem.title()
return f"Chapter {index + 1}"
def extract_chapters(epub_path: Path) -> list[Chapter]:
"""Extract chapter titles matching audiblez's chapter detection logic.
audiblez determines chapters by:
1. Iterating through ITEM_DOCUMENT items
2. Checking is_chapter() based on content length and filename patterns
This ensures our chapter count matches the WAV files audiblez produces.
Args:
epub_path: Path to the EPUB file
Returns:
List of Chapter objects with title and index
"""
try:
book = epub.read_epub(str(epub_path))
except Exception as e:
print(f"Failed to read EPUB: {e}")
return []
chapters: list[Chapter] = []
chapter_index = 0
# Iterate through documents like audiblez does
for item in book.get_items():
if item.get_type() != ITEM_DOCUMENT:
continue
try:
# Get content and parse with BeautifulSoup
content = item.get_content()
soup = BeautifulSoup(content, features='lxml')
# Extract text from relevant tags (matching audiblez)
text_parts = []
for tag in soup.find_all(['title', 'p', 'h1', 'h2', 'h3', 'h4', 'li']):
text = tag.get_text(strip=True)
if text:
text_parts.append(text)
full_text = ' '.join(text_parts)
filename = item.get_name() or ""
# Check if this document is a chapter
if is_chapter(full_text, filename):
title = extract_title_from_content(soup, filename, chapter_index)
chapters.append(Chapter(
title=sanitize_title(title),
index=chapter_index
))
chapter_index += 1
except Exception as e:
print(f"Error processing document {item.get_name()}: {e}")
continue
print(f"Extracted {len(chapters)} chapters from EPUB (audiblez-style detection)")
# Debug: print first few chapters
for i, ch in enumerate(chapters[:5]):
print(f" {i+1}. {ch.title}")
if len(chapters) > 5:
print(f" ... and {len(chapters) - 5} more")
return chapters
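Since the parser above is tuned to mirror audiblez's own chapter detection, a quick way to confirm the two stayed in sync is to compare the extracted chapter count against the chapters actually embedded in the finished file. A small check, assuming ffprobe and jq are available and output.m4b is a converted book:

```bash
# Count the chapters FFmpeg sees in the finished audiobook.
ffprobe -v quiet -print_format json -show_chapters output.m4b | jq '.chapters | length'
```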

@@ -0,0 +1,97 @@
from models.schemas import Voice
# Voice catalog from Kokoro-82M (used by audiblez)
# https://huggingface.co/hexgrad/Kokoro-82M/blob/main/VOICES.md
VOICE_CATALOG = {
# American English
"af_heart": Voice(id="af_heart", name="Heart", language="American English", gender="F"),
"af_alloy": Voice(id="af_alloy", name="Alloy", language="American English", gender="F"),
"af_aoede": Voice(id="af_aoede", name="Aoede", language="American English", gender="F"),
"af_bella": Voice(id="af_bella", name="Bella", language="American English", gender="F"),
"af_jessica": Voice(id="af_jessica", name="Jessica", language="American English", gender="F"),
"af_kore": Voice(id="af_kore", name="Kore", language="American English", gender="F"),
"af_nicole": Voice(id="af_nicole", name="Nicole", language="American English", gender="F"),
"af_nova": Voice(id="af_nova", name="Nova", language="American English", gender="F"),
"af_river": Voice(id="af_river", name="River", language="American English", gender="F"),
"af_sarah": Voice(id="af_sarah", name="Sarah", language="American English", gender="F"),
"af_sky": Voice(id="af_sky", name="Sky", language="American English", gender="F"),
"am_adam": Voice(id="am_adam", name="Adam", language="American English", gender="M"),
"am_echo": Voice(id="am_echo", name="Echo", language="American English", gender="M"),
"am_eric": Voice(id="am_eric", name="Eric", language="American English", gender="M"),
"am_fenrir": Voice(id="am_fenrir", name="Fenrir", language="American English", gender="M"),
"am_liam": Voice(id="am_liam", name="Liam", language="American English", gender="M"),
"am_michael": Voice(id="am_michael", name="Michael", language="American English", gender="M"),
"am_onyx": Voice(id="am_onyx", name="Onyx", language="American English", gender="M"),
"am_puck": Voice(id="am_puck", name="Puck", language="American English", gender="M"),
"am_santa": Voice(id="am_santa", name="Santa", language="American English", gender="M"),
# British English
"bf_alice": Voice(id="bf_alice", name="Alice", language="British English", gender="F"),
"bf_emma": Voice(id="bf_emma", name="Emma", language="British English", gender="F"),
"bf_isabella": Voice(id="bf_isabella", name="Isabella", language="British English", gender="F"),
"bf_lily": Voice(id="bf_lily", name="Lily", language="British English", gender="F"),
"bm_daniel": Voice(id="bm_daniel", name="Daniel", language="British English", gender="M"),
"bm_fable": Voice(id="bm_fable", name="Fable", language="British English", gender="M"),
"bm_george": Voice(id="bm_george", name="George", language="British English", gender="M"),
"bm_lewis": Voice(id="bm_lewis", name="Lewis", language="British English", gender="M"),
# Japanese
"jf_alpha": Voice(id="jf_alpha", name="Alpha", language="Japanese", gender="F"),
"jf_gongitsune": Voice(id="jf_gongitsune", name="Gongitsune", language="Japanese", gender="F"),
"jf_nezumi": Voice(id="jf_nezumi", name="Nezumi", language="Japanese", gender="F"),
"jf_tebukuro": Voice(id="jf_tebukuro", name="Tebukuro", language="Japanese", gender="F"),
"jm_kumo": Voice(id="jm_kumo", name="Kumo", language="Japanese", gender="M"),
# Mandarin Chinese
"zf_xiaobei": Voice(id="zf_xiaobei", name="Xiaobei", language="Mandarin Chinese", gender="F"),
"zf_xiaoni": Voice(id="zf_xiaoni", name="Xiaoni", language="Mandarin Chinese", gender="F"),
"zf_xiaoxiao": Voice(id="zf_xiaoxiao", name="Xiaoxiao", language="Mandarin Chinese", gender="F"),
"zf_xiaoyi": Voice(id="zf_xiaoyi", name="Xiaoyi", language="Mandarin Chinese", gender="F"),
"zm_yunjian": Voice(id="zm_yunjian", name="Yunjian", language="Mandarin Chinese", gender="M"),
"zm_yunxi": Voice(id="zm_yunxi", name="Yunxi", language="Mandarin Chinese", gender="M"),
"zm_yunxia": Voice(id="zm_yunxia", name="Yunxia", language="Mandarin Chinese", gender="M"),
"zm_yunyang": Voice(id="zm_yunyang", name="Yunyang", language="Mandarin Chinese", gender="M"),
# Spanish
"ef_dora": Voice(id="ef_dora", name="Dora", language="Spanish", gender="F"),
"em_alex": Voice(id="em_alex", name="Alex", language="Spanish", gender="M"),
"em_santa": Voice(id="em_santa", name="Santa", language="Spanish", gender="M"),
# French
"ff_siwis": Voice(id="ff_siwis", name="Siwis", language="French", gender="F"),
# Hindi
"hf_alpha": Voice(id="hf_alpha", name="Alpha", language="Hindi", gender="F"),
"hf_beta": Voice(id="hf_beta", name="Beta", language="Hindi", gender="F"),
"hm_omega": Voice(id="hm_omega", name="Omega", language="Hindi", gender="M"),
"hm_psi": Voice(id="hm_psi", name="Psi", language="Hindi", gender="M"),
# Italian
"if_sara": Voice(id="if_sara", name="Sara", language="Italian", gender="F"),
"im_nicola": Voice(id="im_nicola", name="Nicola", language="Italian", gender="M"),
# Brazilian Portuguese
"pf_dora": Voice(id="pf_dora", name="Dora", language="Brazilian Portuguese", gender="F"),
"pm_alex": Voice(id="pm_alex", name="Alex", language="Brazilian Portuguese", gender="M"),
"pm_santa": Voice(id="pm_santa", name="Santa", language="Brazilian Portuguese", gender="M"),
}
def get_all_voices() -> list[Voice]:
"""Get all available voices."""
return list(VOICE_CATALOG.values())
def get_voice(voice_id: str) -> Voice | None:
"""Get a specific voice by ID."""
return VOICE_CATALOG.get(voice_id)
def get_voices_by_language() -> dict[str, list[Voice]]:
"""Get voices grouped by language."""
grouped = {}
for voice in VOICE_CATALOG.values():
if voice.language not in grouped:
grouped[voice.language] = []
grouped[voice.language].append(voice)
return grouped

@@ -0,0 +1,13 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Audiblez Web - EPUB to Audiobook</title>
<link rel="icon" type="image/svg+xml" href="data:image/svg+xml,<svg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 100 100'><text y='.9em' font-size='90'>🎧</text></svg>" />
</head>
<body>
<div id="app"></div>
<script type="module" src="/src/main.js"></script>
</body>
</html>

@@ -0,0 +1,279 @@
<script>
import FileUpload from './lib/FileUpload.svelte';
import VoicePicker from './lib/VoicePicker.svelte';
import JobsList from './lib/JobsList.svelte';
import AudiobooksList from './lib/AudiobooksList.svelte';
import { jobs } from './stores/jobs.js';
let uploadedFilename = $state(null);
let selectedVoice = $state('af_sky');
let speed = $state(1.0);
let useGpu = $state(true);
let isStarting = $state(false);
let error = $state(null);
let currentUser = $state(null);
// Fetch current user on mount
$effect(() => {
fetchCurrentUser();
});
async function fetchCurrentUser() {
try {
const response = await fetch('/api/me');
if (response.ok) {
currentUser = await response.json();
}
} catch (e) {
console.error('Failed to fetch user:', e);
}
}
function handleFileUpload(filename) {
uploadedFilename = filename;
}
async function startConversion() {
if (!uploadedFilename || !selectedVoice) {
error = 'Please upload a file and select a voice';
return;
}
error = null;
isStarting = true;
try {
const response = await fetch('/api/jobs', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
filename: uploadedFilename,
voice: selectedVoice,
speed: speed,
use_gpu: useGpu
})
});
if (!response.ok) {
const data = await response.json();
throw new Error(data.detail || 'Failed to start conversion');
}
const job = await response.json();
jobs.add(job);
// Reset form
uploadedFilename = null;
} catch (e) {
error = e.message;
} finally {
isStarting = false;
}
}
let canStart = $derived(uploadedFilename && selectedVoice && !isStarting);
</script>
<main>
<header>
<div class="header-content">
<div>
<h1>Audiblez Web</h1>
<p class="subtitle">Convert EPUB to Audiobook</p>
</div>
{#if currentUser}
<div class="user-info">
<span class="user-name">{currentUser.name || currentUser.username}</span>
<span class="user-email">{currentUser.email}</span>
</div>
{/if}
</div>
</header>
<div class="content">
<div class="form-section">
<div class="upload-section">
<FileUpload onUpload={handleFileUpload} />
</div>
<div class="voice-section">
<VoicePicker bind:selectedVoice />
</div>
</div>
<div class="options-section">
<div class="option">
<label for="speed">Speed: {speed.toFixed(1)}x</label>
<input
type="range"
id="speed"
min="0.5"
max="2"
step="0.1"
bind:value={speed}
/>
</div>
<div class="option">
<label>
<input type="checkbox" bind:checked={useGpu} />
Use GPU (faster)
</label>
</div>
<button
class="start-btn"
disabled={!canStart}
onclick={startConversion}
>
{#if isStarting}
Starting...
{:else}
Start Conversion
{/if}
</button>
{#if error}
<p class="error">{error}</p>
{/if}
</div>
<JobsList />
<AudiobooksList />
</div>
</main>
<style>
:global(body) {
margin: 0;
font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, sans-serif;
background: #f5f5f5;
}
main {
max-width: 900px;
margin: 0 auto;
padding: 2rem;
}
header {
text-align: center;
margin-bottom: 2rem;
}
.header-content {
display: flex;
justify-content: space-between;
align-items: center;
text-align: left;
}
.user-info {
display: flex;
flex-direction: column;
align-items: flex-end;
padding: 0.5rem 1rem;
background: #e8f0fe;
border-radius: 8px;
}
.user-name {
font-weight: 500;
color: #333;
}
.user-email {
font-size: 0.75rem;
color: #666;
}
h1 {
margin: 0;
color: #333;
font-size: 2rem;
}
.subtitle {
color: #666;
margin: 0.25rem 0 0;
}
.content {
background: white;
border-radius: 12px;
padding: 1.5rem;
box-shadow: 0 2px 8px rgba(0, 0, 0, 0.1);
}
.form-section {
display: grid;
grid-template-columns: 1fr 1fr;
gap: 1.5rem;
}
@media (max-width: 768px) {
.form-section {
grid-template-columns: 1fr;
}
}
.options-section {
margin-top: 1.5rem;
padding-top: 1.5rem;
border-top: 1px solid #e0e0e0;
display: flex;
flex-wrap: wrap;
gap: 1rem;
align-items: center;
}
.option {
display: flex;
align-items: center;
gap: 0.5rem;
}
.option label {
font-size: 0.875rem;
color: #666;
}
.option input[type="range"] {
width: 120px;
}
.option input[type="checkbox"] {
width: 16px;
height: 16px;
}
.start-btn {
margin-left: auto;
padding: 0.75rem 1.5rem;
background: #4a90d9;
color: white;
border: none;
border-radius: 8px;
font-size: 1rem;
font-weight: 500;
cursor: pointer;
transition: background 0.2s;
}
.start-btn:hover:not(:disabled) {
background: #3a7fc9;
}
.start-btn:disabled {
background: #ccc;
cursor: not-allowed;
}
.error {
color: #d32f2f;
font-size: 0.875rem;
width: 100%;
margin-top: 0.5rem;
}
</style>

@@ -0,0 +1,6 @@
import App from './App.svelte'
import { mount } from 'svelte'
const app = mount(App, { target: document.getElementById('app') })
export default app

@@ -0,0 +1,28 @@
import { writable } from 'svelte/store';
function createJobsStore() {
const { subscribe, set, update } = writable([]);
return {
subscribe,
set,
add: (job) => update(jobs => [...jobs, job]),
updateJob: (jobId, updates) => update(jobs =>
jobs.map(j => j.id === jobId ? { ...j, ...updates } : j)
),
remove: (jobId) => update(jobs => jobs.filter(j => j.id !== jobId)),
refresh: async () => {
try {
const response = await fetch('/api/jobs');
if (response.ok) {
const jobs = await response.json();
set(jobs);
}
} catch (e) {
console.error('Failed to fetch jobs:', e);
}
}
};
}
export const jobs = createJobsStore();

@@ -0,0 +1,5 @@
import { vitePreprocess } from '@sveltejs/vite-plugin-svelte'
export default {
preprocess: vitePreprocess()
}

@@ -0,0 +1,15 @@
import { defineConfig } from 'vite'
import { svelte } from '@sveltejs/vite-plugin-svelte'
export default defineConfig({
plugins: [svelte()],
server: {
proxy: {
'/api': 'http://localhost:8000',
'/ws': {
target: 'ws://localhost:8000',
ws: true
}
}
}
})
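For local development the proxy above forwards API and WebSocket traffic to the FastAPI backend on port 8000. A typical loop, assuming the usual Vite npm scripts and a uvicorn entrypoint named main:app (neither is shown in this diff):

```bash
# Terminal 1: backend on :8000 (assumed uvicorn entrypoint).
uvicorn main:app --reload --port 8000

# Terminal 2: Vite dev server; /api and /ws requests are proxied to :8000.
npm run dev
```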

@@ -0,0 +1,136 @@
#!/usr/bin/env python3
"""
Generate voice samples for all available voices.
Run this script in an environment with audiblez installed.
Usage:
python generate_samples.py [output_dir]
"""
import os
import sys
from pathlib import Path
# Sample text for voice preview
SAMPLE_TEXT = "The quick brown fox jumps over the lazy dog. This is a sample of my voice for audiobook narration."
# All voices from Kokoro-82M (audiblez)
VOICES = [
# American English (20 voices)
"af_alloy", "af_aoede", "af_bella", "af_heart", "af_jessica", "af_kore",
"af_nicole", "af_nova", "af_river", "af_sarah", "af_sky",
"am_adam", "am_echo", "am_eric", "am_fenrir", "am_liam", "am_michael",
"am_onyx", "am_puck", "am_santa",
# British English (8 voices)
"bf_alice", "bf_emma", "bf_isabella", "bf_lily",
"bm_daniel", "bm_fable", "bm_george", "bm_lewis",
# Spanish (3 voices)
"ef_dora", "em_alex", "em_santa",
# French (1 voice)
"ff_siwis",
# Hindi (4 voices)
"hf_alpha", "hf_beta", "hm_omega", "hm_psi",
# Italian (2 voices)
"if_sara", "im_nicola",
# Japanese (5 voices)
"jf_alpha", "jf_gongitsune", "jf_nezumi", "jf_tebukuro", "jm_kumo",
# Brazilian Portuguese (3 voices)
"pf_dora", "pm_alex", "pm_santa",
# Mandarin Chinese (8 voices)
"zf_xiaobei", "zf_xiaoni", "zf_xiaoxiao", "zf_xiaoyi",
"zm_yunjian", "zm_yunxi", "zm_yunxia", "zm_yunyang",
]
def generate_sample(voice: str, output_dir: Path):
"""Generate a voice sample using kokoro TTS."""
try:
from kokoro import KPipeline
output_file = output_dir / f"{voice}.mp3"
if output_file.exists():
print(f"Skipping {voice} - already exists")
return True
print(f"Generating sample for {voice}...")
# Map voice prefix to language code
lang_map = {
'a': 'a', # American English
'b': 'b', # British English
'e': 'e', # Spanish
'f': 'f', # French
'h': 'h', # Hindi
'i': 'i', # Italian
'j': 'j', # Japanese
'p': 'p', # Portuguese
'z': 'z', # Chinese
}
# Extract language code from voice (first letter)
lang_code = lang_map.get(voice[0], 'a')
# Initialize the Kokoro pipeline
pipeline = KPipeline(lang_code=lang_code)
# Generate audio
generator = pipeline(SAMPLE_TEXT, voice=voice, speed=1.0)
# Collect all audio chunks
audio_chunks = []
for _, _, audio in generator:
audio_chunks.append(audio)
if audio_chunks:
import soundfile as sf
import numpy as np
# Concatenate audio
audio = np.concatenate(audio_chunks)
# Save as WAV first, then convert to MP3
wav_file = output_dir / f"{voice}.wav"
sf.write(str(wav_file), audio, 24000)
# Convert to MP3 using ffmpeg
import subprocess
result = subprocess.run([
"ffmpeg", "-y", "-i", str(wav_file),
"-codec:a", "libmp3lame", "-qscale:a", "5",
str(output_file)
], capture_output=True)
# Remove WAV file
if wav_file.exists():
wav_file.unlink()
if result.returncode == 0:
print(f"Generated {output_file}")
return True
else:
print(f"FFmpeg failed for {voice}: {result.stderr.decode()}")
return False
else:
print(f"Failed to generate audio for {voice}")
return False
except Exception as e:
print(f"Error generating sample for {voice}: {e}")
return False
def main():
output_dir = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("samples")
output_dir.mkdir(parents=True, exist_ok=True)
print(f"Generating voice samples to {output_dir}")
print(f"Total voices: {len(VOICES)}")
for voice in VOICES:
generate_sample(voice, output_dir)
print("Done!")
if __name__ == "__main__":
main()

@@ -0,0 +1,2 @@
# Voice samples go here
# Run generate_samples.py to generate voice samples for all voices

@@ -0,0 +1,229 @@
mqtt:
enabled: false
birdseye:
quality: 25
detect:
fps: 1
enabled: true
go2rtc:
streams:
vermont-1:
- rtsp://admin:REDACTED_RTSP_PW@192.168.1.10:554/Streaming/Channels/101/3
cameras:
# valchedrym cams: detection stays disabled until valchedrym is back up
valchedrym-cam-1:
enabled: true
ffmpeg:
inputs:
#- path: rtsp://admin:REDACTED_RTSP_PW@192.168.0.11:554/Streaming/Channels/101 # <----- The stream you want to use for detection
- path: rtsp://admin:REDACTED_RTSP_PW@valchedrym.ddns.net:554/Streaming/Channels/101 # <----- The stream you want to use for detection
detect:
enabled: false # <---- disable detection until you have a working camera feed
width: 704 # <---- update for your camera's resolution
height: 576 # <---- update for your camera's resolution
rtmp:
enabled: false
record:
enabled: false
snapshots:
enabled: false
objects:
# Optional: list of objects to track from labelmap.txt (full list - https://docs.frigate.video/configuration/objects)
track:
- person
- bicycle
- car
- bird
- cat
- dog
- horse
valchedrym-cam-2:
enabled: true
ffmpeg:
inputs:
#- path: rtsp://admin:REDACTED_RTSP_PW@192.168.0.11:554/Streaming/Channels/201 # <----- The stream you want to use for detection
- path: rtsp://admin:REDACTED_RTSP_PW@valchedrym.ddns.net:554/Streaming/Channels/201 # <----- The stream you want to use for detection
detect:
enabled: false # <---- disable detection until you have a working camera feed
width: 704 # <---- update for your camera's resolution
height: 576 # <---- update for your camera's resolution
rtmp:
enabled: false
record:
enabled: false
snapshots:
enabled: false
objects:
# Optional: list of objects to track from labelmap.txt (full list - https://docs.frigate.video/configuration/objects)
track:
- person
- bicycle
- car
- bird
- cat
- dog
- horse
vermont-1:
enabled: true
ffmpeg:
inputs:
- path: rtsp://admin:REDACTED_RTSP_PW@192.168.1.10:554/Streaming/Channels/101/3 # <----- The stream you want to use for detection
roles:
- record
rtmp:
enabled: false
record:
enabled: false
snapshots:
enabled: false
detect:
enabled: false
vermont-2:
enabled: true
ffmpeg:
inputs:
- path: rtsp://admin:REDACTED_RTSP_PW@192.168.1.10:554/Streaming/Channels/201/1 # <----- The stream you want to use for detection
detect:
enabled: false # <---- disable detection until you have a working camera feed
width: 704 # <---- update for your camera's resolution
height: 576 # <---- update for your camera's resolution
rtmp:
enabled: false
record:
enabled: false
snapshots:
enabled: false
vermont-3:
enabled: true
ffmpeg:
inputs:
- path: rtsp://admin:REDACTED_RTSP_PW@192.168.1.10:554/Streaming/Channels/301/1 # <----- The stream you want to use for detection
detect:
enabled: false # <---- disable detection until you have a working camera feed
width: 704 # <---- update for your camera's resolution
height: 576 # <---- update for your camera's resolution
rtmp:
enabled: false
record:
enabled: false
snapshots:
enabled: false
vermont-4:
enabled: true
ffmpeg:
inputs:
- path: rtsp://admin:REDACTED_RTSP_PW@192.168.1.10:554/Streaming/Channels/401/1 # <----- The stream you want to use for detection
detect:
enabled: false # <---- disable detection until you have a working camera feed
width: 704 # <---- update for your camera's resolution
height: 576 # <---- update for your camera's resolution
rtmp:
enabled: false
record:
enabled: false
snapshots:
enabled: false
vermont-5:
enabled: true
ffmpeg:
inputs:
- path: rtsp://admin:REDACTED_RTSP_PW@192.168.1.10:554/Streaming/Channels/501/1 # <----- The stream you want to use for detection
detect:
enabled: false # <---- disable detection until you have a working camera feed
width: 704 # <---- update for your camera's resolution
height: 576 # <---- update for your camera's resolution
rtmp:
enabled: false
record:
enabled: false
snapshots:
enabled: false
vermont-6:
enabled: true
ffmpeg:
inputs:
- path: rtsp://admin:REDACTED_RTSP_PW@192.168.1.10:554/Streaming/Channels/601/1 # <----- The stream you want to use for detection
detect:
enabled: false # <---- disable detection until you have a working camera feed
width: 704 # <---- update for your camera's resolution
height: 576 # <---- update for your camera's resolution
rtmp:
enabled: false
record:
enabled: false
snapshots:
enabled: false
vermont-7:
enabled: true
ffmpeg:
inputs:
- path: rtsp://admin:REDACTED_RTSP_PW@192.168.1.10:554/Streaming/Channels/701/1 # <----- The stream you want to use for detection
detect:
enabled: false # <---- disable detection until you have a working camera feed
width: 704 # <---- update for your camera's resolution
height: 576 # <---- update for your camera's resolution
rtmp:
enabled: false
record:
enabled: false
snapshots:
enabled: false
vermont-8:
enabled: true
ffmpeg:
inputs:
- path: rtsp://admin:REDACTED_RTSP_PW@192.168.1.10:554/Streaming/Channels/801/1 # <----- The stream you want to use for detection
detect:
enabled: false # <---- disable detection until you have a working camera feed
width: 704 # <---- update for your camera's resolution
height: 576 # <---- update for your camera's resolution
rtmp:
enabled: false
record:
enabled: false
snapshots:
enabled: false
vermont-9:
enabled: true
ffmpeg:
inputs:
- path: rtsp://admin:REDACTED_RTSP_PW@192.168.1.10:554/Streaming/Channels/901/1 # <----- The stream you want to use for detection
detect:
enabled: false # <---- disable detection until you have a working camera feed
width: 704 # <---- update for your camera's resolution
height: 576 # <---- update for your camera's resolution
rtmp:
enabled: false
record:
enabled: false
snapshots:
enabled: false
# london-ipcam:
# enabled: false
# ffmpeg:
# inputs:
# - path: rtsp://192.168.2.2:8554/london_cam # <----- The stream you want to use for detection
# roles:
# - rtmp
# - record
# - detect
# detect:
# enabled: False
# width: 1280
# height: 720
# record:
# enabled: False # Not needed for this camera but keeping for reference
# events:
# retain:
# default: 10
# objects:
# # Optional: list of objects to track from labelmap.txt (full list - https://docs.frigate.video/configuration/objects)
# track:
# - person
# - shoe
# - handbag
# - wine glass
# - knife
# - pizza
# - laptop
# - book

scripts/cluster_manager.py Normal file
@@ -0,0 +1,277 @@
import asyncio
import click
import logging
import time
from typing import List, Union, Optional
from kubernetes_asyncio import client, config
from kubernetes_asyncio.client.api_client import ApiClient
logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")
logger = logging.getLogger(__name__)
async def wait_for_healthy(
api_instance: client.AppsV1Api,
resource_type: str,
namespace: str,
name: str,
target_replicas: int,
timeout: int = 300,
) -> None:
start_time = time.time()
logger.info(
f"Waiting for {resource_type} {name} to reach {target_replicas} replicas..."
)
while True:
if time.time() - start_time > timeout:
logger.error(f"❌ Timeout reached for {resource_type} {name}")
return
try:
if resource_type.lower() == "deployment":
res = await api_instance.read_namespaced_deployment_status(
name, namespace
)
ready = res.status.ready_replicas or 0
updated = res.status.updated_replicas or 0
if ready == target_replicas and updated == target_replicas:
break
else: # StatefulSet
res = await api_instance.read_namespaced_stateful_set_status(
name, namespace
)
ready = res.status.ready_replicas or 0
if ready == target_replicas:
break
except Exception as e:
logger.debug(f"Retrying status check for {name}: {e}")
await asyncio.sleep(5)
logger.info(f"{resource_type} {name} is now healthy.")
async def wait_for_zero(
api: client.AppsV1Api, kind: str, ns: str, name: str, timeout: int
) -> tuple[str, str]:
start_time = asyncio.get_event_loop().time()
while (asyncio.get_event_loop().time() - start_time) < timeout:
try:
res = await (
api.read_namespaced_deployment_status(name, ns)
if kind.lower() == "deployment"
else api.read_namespaced_stateful_set_status(name, ns)
)
if (res.status.ready_replicas or 0) == 0:
return ns, name
except Exception:
return ns, name # Assume gone if error
await asyncio.sleep(3)
logger.error(f"Timeout: {kind} {ns}/{name} still has running pods.")
return ns, name
async def scale_resource(
api_instance: client.AppsV1Api,
resource_type: str,
namespace: str,
name: str,
replicas: int,
) -> None:
body = {"spec": {"replicas": replicas}}
try:
if resource_type.lower() == "deployment":
await api_instance.patch_namespaced_deployment_scale(name, namespace, body)
else:
await api_instance.patch_namespaced_stateful_set_scale(
name, namespace, body
)
except Exception as e:
logger.error(f"Failed to scale {resource_type} {name}: {e}")
async def run_stop_tier(
api_v1: client.AppsV1Api, label: str, output_file: str, timeout: int
) -> None:
"""Processes a single label tier: saves, scales to 0, and waits."""
excluded_ns = ["kube-system", "kube-public", "kube-node-lease"]
# 1. Discover
targets = [
("Deployment", api_v1.list_deployment_for_all_namespaces),
("StatefulSet", api_v1.list_stateful_set_for_all_namespaces),
]
tier_resources = []
for kind, list_func in targets:
resp = await list_func(label_selector=label)
tier_resources.extend(
[
(kind, item)
for item in resp.items
if item.metadata.namespace not in excluded_ns
]
)
if not tier_resources:
logger.warning(f"No resources found for label: {label}")
return
# 2. Save & Scale
active_jobs: set[tuple[str, str]] = set()
wait_tasks = []
# Append to file so we don't overwrite previous tiers
with open(output_file, "a") as f:
for kind, item in tier_resources:
ns, name = item.metadata.namespace, item.metadata.name
reps = item.spec.replicas or 0
f.write(f"{kind} {ns} {name} {reps}\n")
active_jobs.add((ns, name))
await scale_resource(api_v1, kind, ns, name, 0)
wait_tasks.append(wait_for_zero(api_v1, kind, ns, name, timeout))
# 3. Wait for this tier to finish before moving to next
logger.info(f"Tier [{label}]: Waiting for {len(active_jobs)} resources to stop...")
for coro in asyncio.as_completed(wait_tasks):
finished_ns, finished_name = await coro
active_jobs.discard((finished_ns, finished_name))
if active_jobs:
remaining_ns = sorted({ns for ns, name in active_jobs})
logger.info(
f"[{label}] Pending: {len(active_jobs)} | Namespaces: {', '.join(remaining_ns)}"
)
logger.info(f"✅ Tier [{label}] successfully shut down.")
@click.group()
def cli():
pass
@cli.command()
@click.argument("labels", nargs=-1, required=True)
@click.option("--output", "-o", default="resources.txt", help="Output state file")
@click.option("--timeout", "-t", default=3600)
def stop(labels: List[str], output: str, timeout: int):
"""Stop tiers sequentially. Usage: stop 'app=web' 'app=db'"""
async def main():
await config.load_kube_config()
# Clear/Create file at start
open(output, "w").close()
async with ApiClient() as api_client:
api_v1 = client.AppsV1Api(api_client)
for label in labels:
logger.info(f"🚀 Processing Shutdown Tier: {label}")
await run_stop_tier(api_v1, label, output, timeout)
logger.info("🏁 Sequence complete. Cluster is gracefully stopped.")
asyncio.run(main())
@cli.command()
@click.argument("labels", nargs=-1, required=True)
@click.option("--file", "-f", default="resources.txt")
@click.option("--timeout", "-t", default=3600, help="Seconds to wait per resource")
def start(labels: List[str], file: str, timeout: int):
asyncio.run(run_start_sequence(labels, file, timeout))
async def run_start_sequence(labels: List[str], file_path: str, timeout: int) -> None:
await config.load_kube_config()
async with ApiClient() as api_client:
apps_v1 = client.AppsV1Api(api_client)
# 1. Load the entire snapshot into memory for filtering
try:
with open(file_path, "r") as f:
# Format: Kind Namespace Name Replicas
snapshot_lines = [line.strip().split() for line in f if line.strip()]
except FileNotFoundError:
logger.error(f"Snapshot file {file_path} not found.")
return
# 2. Iterate through labels in the order provided
for label in labels:
logger.info(f"🚀 Starting Tier: {label}")
# Find resources in this tier by querying K8s for the label
# then matching against our snapshot file data
tier_resources = await get_resources_by_label(apps_v1, label)
# Cross-reference: Only start things that are in BOTH the K8s label query AND our file
# This ensures we restore them to the CORRECT previous replica count
to_restore = []
tier_keys = {(r["ns"], r["name"]) for r in tier_resources}
for kind, ns, name, reps in snapshot_lines:
if (ns, name) in tier_keys:
to_restore.append((kind, ns, name, int(reps)))
if not to_restore:
logger.warning(f"No resources found in snapshot for tier: {label}")
continue
# 3. Scale and Wait for this specific tier
await process_start_tier(apps_v1, to_restore, timeout, label)
logger.info("🏁 All tiers started successfully.")
async def get_resources_by_label(api: client.AppsV1Api, label: str) -> List[dict]:
"""Helper to find what currently exists in the cluster with this label."""
targets = [
api.list_deployment_for_all_namespaces,
api.list_stateful_set_for_all_namespaces,
]
found = []
for list_func in targets:
resp = await list_func(label_selector=label)
for item in resp.items:
found.append({"ns": item.metadata.namespace, "name": item.metadata.name})
return found
async def process_start_tier(
api: client.AppsV1Api, resources: list, timeout: int, label: str
):
active_jobs = set()
scale_tasks = []
wait_tasks = []
# Wrapper to track which job finishes
async def tracked_wait(kind, ns, name, target, t_out):
await wait_for_healthy(api, kind, ns, name, target, t_out)
return (ns, name)
for kind, ns, name, reps in resources:
active_jobs.add((ns, name))
scale_tasks.append(scale_resource(api, kind, ns, name, reps))
wait_tasks.append(tracked_wait(kind, ns, name, reps, timeout))
# Trigger all scales for this tier
await asyncio.gather(*scale_tasks)
# Monitor health
for coro in asyncio.as_completed(wait_tasks):
finished_ns, finished_name = await coro
active_jobs.discard((finished_ns, finished_name))
if active_jobs:
remaining_ns = sorted({ns for ns, name in active_jobs})
logger.info(
f"[{label}] Pending Health: {len(active_jobs)} | Namespaces: {', '.join(remaining_ns)}"
)
logger.info(f"✅ Tier [{label}] is healthy.")
if __name__ == "__main__":
cli()
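A hedged usage sketch for the script above, assuming workloads carry the tier labels defined in the generated tiers.tf files (0-core through 4-aux, as set on the beads-server and hermes-agent deployments). Shutdown runs the least critical tiers first and records replica counts; startup replays the snapshot in reverse order:

```bash
# Scale down aux, then edge, then gpu workloads, saving replica counts to resources.txt.
python3 scripts/cluster_manager.py stop "tier=4-aux" "tier=3-edge" "tier=2-gpu" -o resources.txt

# ...maintenance window...

# Bring tiers back in reverse order, restoring the recorded replica counts.
python3 scripts/cluster_manager.py start "tier=2-gpu" "tier=3-edge" "tier=4-aux" -f resources.txt
```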

scripts/image_pull.sh Executable file
@@ -0,0 +1,10 @@
#!/usr/bin/env bash
for n in $(kubectl get nodes -o wide | grep node | awk '{print $1}'); do
echo $n;
kubectl drain $n --ignore-daemonsets --delete-emptydir-data && \
ssh wizard@$n < image_pull_remote.sh
# Check result
kubectl get --raw "/api/v1/nodes/$n/proxy/configz" | jq '.kubeletconfig | {serializeImagePulls, maxParallelImagePulls}'
kubectl uncordon $n
done

scripts/image_pull_remote.sh Executable file
@@ -0,0 +1,14 @@
#!/usr/bin/env bash
# Containerd
sudo sed -i 's/.*max_concurrent_downloads.*/max_concurrent_downloads = 5/g' /etc/containerd/config.toml
sudo systemctl restart containerd
# Kubelet
# serializeImagePulls: false allows images to be pulled in parallel;
# maxParallelImagePulls caps how many pulls run concurrently (set to 5 below).
sudo sed -i '/serializeImagePulls:/d' /var/lib/kubelet/config.yaml && \
sudo sed -i '/maxParallelImagePulls:/d' /var/lib/kubelet/config.yaml && \
echo -e 'serializeImagePulls: false\nmaxParallelImagePulls: 5' | sudo tee -a /var/lib/kubelet/config.yaml
sudo systemctl restart kubelet

@@ -0,0 +1,115 @@
#!/usr/bin/env bash
set -euo pipefail
############################################
# CONFIGURATION
############################################
# Internal pull-through registry endpoint
# Examples:
# http://registry.internal:5000
# https://registry.internal
INTERNAL_REGISTRY="http://10.0.20.10:5002"
# Path where containerd reads registry configs
CERTS_DIR="/etc/containerd/certs.d"
# Optional: path to CA file if INTERNAL_REGISTRY uses HTTPS with custom CA
# Leave empty if not needed
INTERNAL_CA_PATH=""
# Restart containerd at the end
RESTART_CONTAINERD=true
############################################
# REGISTRIES TO MIRROR
############################################
REGISTRIES=(
"docker.io"
"registry-1.docker.io"
"registry.k8s.io"
"quay.io"
"ghcr.io"
"gcr.io"
"us-docker.pkg.dev"
"public.ecr.aws"
"mcr.microsoft.com"
)
############################################
# FUNCTIONS
############################################
require_root() {
if [[ "$(id -u)" -ne 0 ]]; then
echo "ERROR: must be run as root" >&2
exit 1
fi
}
ensure_containerd_config_path() {
local cfg="/etc/containerd/config.toml"
if [[ ! -f "$cfg" ]]; then
echo "Generating default containerd config"
containerd config default > "$cfg"
fi
if ! grep -q 'config_path *= *"/etc/containerd/certs.d"' "$cfg"; then
echo "Enabling config_path in containerd config"
# Minimal and safe append if section exists
if grep -q '\[plugins\."io.containerd.grpc.v1.cri".registry\]' "$cfg"; then
sed -i '/\[plugins\."io.containerd.grpc.v1.cri".registry\]/a \ config_path = "/etc/containerd/certs.d"' "$cfg"
else
cat >> "$cfg" <<'EOF'
[plugins."io.containerd.grpc.v1.cri".registry]
config_path = "/etc/containerd/certs.d"
EOF
fi
fi
}
write_hosts_toml() {
local registry="$1"
local dir="$CERTS_DIR/$registry"
local file="$dir/hosts.toml"
mkdir -p "$dir"
cat > "$file" <<EOF
server = "https://$registry"
[host."$INTERNAL_REGISTRY"]
capabilities = ["pull", "resolve"]
EOF
if [[ -n "$INTERNAL_CA_PATH" ]]; then
cat >> "$file" <<EOF
ca = "$INTERNAL_CA_PATH"
EOF
fi
}
############################################
# MAIN
############################################
require_root
ensure_containerd_config_path
echo "Creating registry mirror configurations..."
for r in "${REGISTRIES[@]}"; do
echo " - $r"
write_hosts_toml "$r"
done
if [[ "$RESTART_CONTAINERD" == "true" ]]; then
echo "Restarting containerd"
systemctl restart containerd
fi
echo "Done."

stacks/beads-server/main.tf Normal file
@@ -0,0 +1,172 @@
resource "kubernetes_namespace" "beads" {
metadata {
name = "beads-server"
labels = {
tier = local.tiers.aux
}
}
}
resource "kubernetes_persistent_volume_claim" "dolt_data" {
wait_until_bound = false
metadata {
name = "dolt-data"
namespace = kubernetes_namespace.beads.metadata[0].name
annotations = {
"resize.topolvm.io/threshold" = "80%"
"resize.topolvm.io/increase" = "100%"
"resize.topolvm.io/storage_limit" = "10Gi"
}
}
spec {
access_modes = ["ReadWriteOnce"]
storage_class_name = "proxmox-lvm"
resources {
requests = { storage = "2Gi" }
}
}
}
resource "kubernetes_config_map" "dolt_init" {
metadata {
name = "dolt-init"
namespace = kubernetes_namespace.beads.metadata[0].name
}
data = {
"01-create-beads-user.sql" = <<-EOT
CREATE USER IF NOT EXISTS 'beads'@'%' IDENTIFIED BY '';
GRANT ALL PRIVILEGES ON *.* TO 'beads'@'%' WITH GRANT OPTION;
EOT
}
}
resource "kubernetes_deployment" "dolt" {
metadata {
name = "dolt"
namespace = kubernetes_namespace.beads.metadata[0].name
labels = {
app = "dolt"
tier = local.tiers.aux
}
}
spec {
replicas = 1
strategy {
type = "Recreate"
}
selector {
match_labels = {
app = "dolt"
}
}
template {
metadata {
labels = {
app = "dolt"
}
}
spec {
container {
name = "dolt"
image = "dolthub/dolt-sql-server:latest"
port {
name = "mysql"
container_port = 3306
}
env {
name = "DOLT_ROOT_HOST"
value = "%"
}
volume_mount {
name = "dolt-data"
mount_path = "/var/lib/dolt"
}
volume_mount {
name = "init-scripts"
mount_path = "/docker-entrypoint-initdb.d"
read_only = true
}
startup_probe {
tcp_socket {
port = 3306
}
failure_threshold = 30
period_seconds = 2
}
liveness_probe {
tcp_socket {
port = 3306
}
initial_delay_seconds = 10
period_seconds = 30
}
readiness_probe {
tcp_socket {
port = 3306
}
initial_delay_seconds = 5
period_seconds = 10
}
resources {
requests = {
memory = "256Mi"
cpu = "50m"
}
limits = {
memory = "512Mi"
}
}
}
volume {
name = "dolt-data"
persistent_volume_claim {
claim_name = kubernetes_persistent_volume_claim.dolt_data.metadata[0].name
}
}
volume {
name = "init-scripts"
config_map {
name = kubernetes_config_map.dolt_init.metadata[0].name
}
}
}
}
}
lifecycle {
ignore_changes = [
spec[0].template[0].spec[0].dns_config
]
}
}
resource "kubernetes_service" "dolt" {
metadata {
name = "dolt"
namespace = kubernetes_namespace.beads.metadata[0].name
labels = {
app = "dolt"
}
annotations = {
"metallb.universe.tf/loadBalancerIPs" = "10.0.20.200"
"metallb.io/allow-shared-ip" = "shared"
}
}
spec {
type = "LoadBalancer"
external_traffic_policy = "Cluster"
selector = {
app = "dolt"
}
port {
name = "mysql"
port = 3306
target_port = 3306
}
}
}
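Since the Dolt SQL server speaks the MySQL wire protocol and is exposed on the MetalLB IP above, a minimal connectivity check from the LAN might look like this, assuming a mysql client and the passwordless beads user created by the init script:

```bash
# Dolt exposes a MySQL-compatible endpoint on the LoadBalancer IP.
mysql -h 10.0.20.200 -P 3306 -u beads -e "SELECT 1;"
```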

@@ -0,0 +1,3 @@
include "root" {
path = find_in_parent_folders()
}

@@ -0,0 +1,10 @@
# Generated by Terragrunt. Sig: nIlQXj57tbuaRZEa
locals {
tiers = {
core = "0-core"
cluster = "1-cluster"
gpu = "2-gpu"
edge = "3-edge"
aux = "4-aux"
}
}

stacks/hermes-agent/main.tf Normal file
@@ -0,0 +1,418 @@
variable "tls_secret_name" {
type = string
sensitive = true
}
# --- Namespace ---
resource "kubernetes_namespace" "hermes_agent" {
metadata {
name = "hermes-agent"
labels = {
tier = local.tiers.aux
}
}
}
module "tls_secret" {
source = "../../modules/kubernetes/setup_tls_secret"
namespace = kubernetes_namespace.hermes_agent.metadata[0].name
tls_secret_name = var.tls_secret_name
}
# --- Secrets (ESO from Vault) ---
resource "kubernetes_manifest" "external_secret" {
manifest = {
apiVersion = "external-secrets.io/v1beta1"
kind = "ExternalSecret"
metadata = {
name = "hermes-agent-secrets"
namespace = "hermes-agent"
}
spec = {
refreshInterval = "15m"
secretStoreRef = {
name = "vault-kv"
kind = "ClusterSecretStore"
}
target = {
name = "hermes-agent-secrets"
}
dataFrom = [{
extract = {
key = "hermes-agent"
}
}]
}
}
depends_on = [kubernetes_namespace.hermes_agent]
}
# --- Storage ---
resource "kubernetes_persistent_volume_claim" "data_proxmox" {
wait_until_bound = false
metadata {
name = "hermes-agent-data-proxmox"
namespace = kubernetes_namespace.hermes_agent.metadata[0].name
annotations = {
"resize.topolvm.io/threshold" = "80%"
"resize.topolvm.io/increase" = "100%"
"resize.topolvm.io/storage_limit" = "5Gi"
}
}
spec {
access_modes = ["ReadWriteOnce"]
storage_class_name = "proxmox-lvm"
resources {
requests = {
storage = "1Gi"
}
}
}
}
# --- ConfigMaps ---
resource "kubernetes_config_map" "hermes_config" {
metadata {
name = "hermes-agent-config"
namespace = kubernetes_namespace.hermes_agent.metadata[0].name
}
data = {
"config.yaml" = yamlencode({
# Primary model: Qwen 3.5 397B via NVIDIA NIM (free tier)
# api_key is injected by init container from the secret
model = {
provider = "custom"
default = "qwen/qwen3.5-397b-a17b"
base_url = "https://integrate.api.nvidia.com/v1"
api_key = "__NVIDIA_API_KEY_PLACEHOLDER__"
context_window = 262000
}
# Terminal execution
terminal = {
backend = "local"
}
# Memory system
memory = {
memory_enabled = true
user_profile_enabled = true
memory_char_limit = 2200
user_char_limit = 1375
}
# Context compression
compression = {
enabled = true
threshold = 0.50
target_ratio = 0.20
protect_last_n = 20
}
# Security
security = {
redact_secrets = true
}
# Display
display = {
streaming = true
}
# Web tools
web = {
backend = "tavily"
}
# Agent behavior
agent = {
max_turns = 90
api_timeout = 90
reasoning_effort = "medium"
}
# Telegram DM policy
unauthorized_dm_behavior = "ignore"
# MCP servers: claude-memory for persistent cross-session memory
mcp_servers = {
claude-memory = {
url = "http://claude-memory.claude-memory.svc.cluster.local/mcp/mcp"
headers = {
Authorization = "Bearer $${CLAUDE_MEMORY_API_KEY}"
}
}
}
})
}
}
resource "kubernetes_config_map" "hermes_soul" {
metadata {
name = "hermes-agent-soul"
namespace = kubernetes_namespace.hermes_agent.metadata[0].name
}
data = {
"SOUL.md" = <<-EOT
# Hermes - Viktor's Personal AI Assistant
You are Hermes, a knowledgeable and helpful AI assistant owned and operated by
Viktor Barzin. You run on Viktor's home lab Kubernetes cluster and are powered
by Anthropic Claude.
## Personality
- Direct and concise, no unnecessary fluff
- Technical when needed, plain language when possible
- Honest about limitations and uncertainties
- Proactive: suggest improvements, flag risks, follow up on open items
## Owner
- Viktor Barzin, software engineer based in London
- Runs a home lab with a 5-node Kubernetes cluster on Proxmox
- Infrastructure managed with Terraform/Terragrunt
- Uses Vault for secrets, Traefik for ingress, Authentik for SSO
## Your Capabilities
- Terminal: local execution inside your container (Python, Node.js, git, ripgrep, ffmpeg)
- Memory: you have access to claude-memory MCP for persistent cross-session memory
- Always check memory at the start of conversations for context
- Store important decisions, learnings, and user preferences
- Skills: you can create and refine your own skills over time
- Web: search and fetch capabilities for research
## Package Persistence
Your container restarts lose system packages. To make installs survive restarts:
- **pip**: packages auto-persist to /opt/data/pip-packages/ (PIP_TARGET is set)
- **npm -g**: packages auto-persist to /opt/data/npm-global/ (NPM_CONFIG_PREFIX is set)
- **apt**: after installing system packages, save them:
`apt install -y foo bar && echo -e "foo\nbar" >> /opt/data/apt-packages.txt && sort -u -o /opt/data/apt-packages.txt /opt/data/apt-packages.txt`
They will be auto-reinstalled on next container restart.
## Communication Preferences
- Viktor prefers terse, technical responses
- No emojis unless asked
- Lead with the answer, not the reasoning
- When unsure, say so rather than guessing
EOT
}
}
# --- Deployment ---
resource "kubernetes_deployment" "hermes_agent" {
metadata {
name = "hermes-agent"
namespace = kubernetes_namespace.hermes_agent.metadata[0].name
labels = {
app = "hermes-agent"
tier = local.tiers.aux
}
}
spec {
strategy {
type = "Recreate"
}
replicas = 1
selector {
match_labels = {
app = "hermes-agent"
}
}
template {
metadata {
labels = {
app = "hermes-agent"
}
annotations = {
"reloader.stakater.com/search" = "true"
}
}
spec {
# Init container 1: bootstrap config into writable PVC
init_container {
name = "bootstrap-config"
image = "docker.io/library/busybox:1.37"
command = ["sh", "-c", <<-EOF
# Create subdirectories
mkdir -p /opt/data/memories /opt/data/skills /opt/data/sessions /opt/data/cron /opt/data/logs
mkdir -p /opt/data/pip-packages/bin /opt/data/npm-global/bin
# Copy config from ConfigMap and inject secrets
cp /config/config.yaml /opt/data/config.yaml
cp /soul/SOUL.md /opt/data/SOUL.md
# Replace API key placeholder with actual value from secret
NVIDIA_KEY=$(cat /secrets/NVIDIA_API_KEY)
sed -i "s|__NVIDIA_API_KEY_PLACEHOLDER__|$${NVIDIA_KEY}|g" /opt/data/config.yaml
# Generate .env from mounted secret files
echo "# Auto-generated from Vault secret/hermes-agent" > /opt/data/.env
for f in /secrets/*; do
key=$(basename "$f")
val=$(cat "$f")
echo "$${key}=$${val}" >> /opt/data/.env
done
# Fix ownership (hermes container user)
chown -R 1000:1000 /opt/data
EOF
]
volume_mount {
name = "data"
mount_path = "/opt/data"
}
volume_mount {
name = "config"
mount_path = "/config"
}
volume_mount {
name = "soul"
mount_path = "/soul"
}
volume_mount {
name = "secrets"
mount_path = "/secrets"
}
}
container {
name = "hermes-agent"
image = "nousresearch/hermes-agent:latest"
# Wrap entrypoint: restore apt packages (as root) then hand off to original entrypoint
command = ["bash", "-c", <<-EOF
if [ -f /opt/data/apt-packages.txt ] && [ -s /opt/data/apt-packages.txt ]; then
echo "Restoring apt packages: $(cat /opt/data/apt-packages.txt | tr '\n' ' ')"
apt-get update -qq && \
xargs -a /opt/data/apt-packages.txt apt-get install -y -qq --no-install-recommends 2>&1 | tail -5
echo "Done restoring apt packages"
fi
exec /opt/hermes/docker/entrypoint.sh gateway run
EOF
]
env {
name = "HERMES_HOME"
value = "/opt/data"
}
# Persist pip packages across restarts
env {
name = "PIP_TARGET"
value = "/opt/data/pip-packages"
}
env {
name = "PYTHONPATH"
value = "/opt/data/pip-packages"
}
# Persist npm global packages across restarts
env {
name = "NPM_CONFIG_PREFIX"
value = "/opt/data/npm-global"
}
# Add persistent bin dirs to PATH
env {
name = "PATH"
value = "/opt/data/pip-packages/bin:/opt/data/npm-global/bin:/opt/data/apt-local/usr/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
}
volume_mount {
name = "data"
mount_path = "/opt/data"
}
resources {
requests = {
cpu = "100m"
memory = "512Mi"
}
limits = {
memory = "1Gi"
}
}
liveness_probe {
exec {
command = ["pgrep", "-f", "hermes"]
}
initial_delay_seconds = 30
period_seconds = 30
failure_threshold = 3
}
}
volume {
name = "data"
persistent_volume_claim {
claim_name = kubernetes_persistent_volume_claim.data_proxmox.metadata[0].name
}
}
volume {
name = "config"
config_map {
name = kubernetes_config_map.hermes_config.metadata[0].name
}
}
volume {
name = "soul"
config_map {
name = kubernetes_config_map.hermes_soul.metadata[0].name
}
}
volume {
name = "secrets"
secret {
secret_name = "hermes-agent-secrets"
}
}
}
}
}
lifecycle {
ignore_changes = [spec[0].template[0].spec[0].dns_config]
}
}
# --- Service ---
resource "kubernetes_service" "hermes_agent" {
metadata {
name = "hermes-agent"
namespace = kubernetes_namespace.hermes_agent.metadata[0].name
labels = {
app = "hermes-agent"
}
}
spec {
selector = {
app = "hermes-agent"
}
port {
port = 80
target_port = 8642
}
}
}
# --- Ingress ---
module "ingress" {
source = "../../modules/kubernetes/ingress_factory"
namespace = kubernetes_namespace.hermes_agent.metadata[0].name
name = "hermes-agent"
tls_secret_name = var.tls_secret_name
protected = true
extra_annotations = {
"gethomepage.dev/enabled" = "true"
"gethomepage.dev/name" = "Hermes Agent"
"gethomepage.dev/description" = "Self-improving AI agent"
"gethomepage.dev/icon" = "mdi-robot"
"gethomepage.dev/group" = "AI & Data"
"gethomepage.dev/pod-selector" = ""
}
}
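The ExternalSecret above pulls every key under the hermes-agent path in Vault into the pod; the init container then expects at least NVIDIA_API_KEY (for the config placeholder) and CLAUDE_MEMORY_API_KEY (for the MCP Authorization header). A hedged sketch of seeding that secret, assuming the KV mount used elsewhere in the repo; any additional keys are simply written into /opt/data/.env as-is:

```bash
# Key names beyond these two are assumptions; values are placeholders.
vault kv put secret/hermes-agent \
  NVIDIA_API_KEY="nvapi-..." \
  CLAUDE_MEMORY_API_KEY="..."
```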

stacks/hermes-agent/secrets Symbolic link
@@ -0,0 +1 @@
../../secrets

@@ -0,0 +1,13 @@
include "root" {
path = find_in_parent_folders()
}
dependency "platform" {
config_path = "../platform"
skip_outputs = true
}
dependency "vault" {
config_path = "../vault"
skip_outputs = true
}

@@ -0,0 +1,10 @@
# Generated by Terragrunt. Sig: nIlQXj57tbuaRZEa
locals {
tiers = {
core = "0-core"
cluster = "1-cluster"
gpu = "2-gpu"
edge = "3-edge"
aux = "4-aux"
}
}

stacks/infra/tiers.tf Normal file
@@ -0,0 +1,10 @@
# Generated by Terragrunt. Sig: nIlQXj57tbuaRZEa
locals {
tiers = {
core = "0-core"
cluster = "1-cluster"
gpu = "2-gpu"
edge = "3-edge"
aux = "4-aux"
}
}

stacks/phpipam/secrets Symbolic link
@@ -0,0 +1 @@
../../secrets

stacks/phpipam/tiers.tf Normal file
@@ -0,0 +1,10 @@
# Generated by Terragrunt. Sig: nIlQXj57tbuaRZEa
locals {
tiers = {
core = "0-core"
cluster = "1-cluster"
gpu = "2-gpu"
edge = "3-edge"
aux = "4-aux"
}
}

@@ -0,0 +1,10 @@
# Generated by Terragrunt. Sig: nIlQXj57tbuaRZEa
locals {
tiers = {
core = "0-core"
cluster = "1-cluster"
gpu = "2-gpu"
edge = "3-edge"
aux = "4-aux"
}
}

stacks/vault/secrets Symbolic link
@@ -0,0 +1 @@
../../secrets