infra/stacks/servarr/main.tf

132 lines
3.3 KiB
Terraform
Raw Normal View History

variable "tls_secret_name" {
type = string
sensitive = true
}
variable "nfs_server" { type = string }
resource "kubernetes_manifest" "external_secret" {
manifest = {
apiVersion = "external-secrets.io/v1beta1"
kind = "ExternalSecret"
metadata = {
name = "servarr-secrets"
namespace = "servarr"
}
spec = {
refreshInterval = "15m"
secretStoreRef = {
name = "vault-kv"
kind = "ClusterSecretStore"
}
target = {
name = "servarr-secrets"
}
dataFrom = [{
extract = {
key = "servarr"
}
}]
}
}
depends_on = [kubernetes_namespace.servarr]
}
data "kubernetes_secret" "eso_secrets" {
metadata {
name = "servarr-secrets"
namespace = kubernetes_namespace.servarr.metadata[0].name
}
depends_on = [kubernetes_manifest.external_secret]
}
locals {
homepage_credentials = jsondecode(data.kubernetes_secret.eso_secrets.data["homepage_credentials"])
}
resource "kubernetes_namespace" "servarr" {
metadata {
name = "servarr"
labels = {
tier = local.tiers.aux
}
}
[infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] ## Context Wave 3B-continued: the Goldilocks VPA dashboard (stacks/vpa) runs a Kyverno ClusterPolicy `goldilocks-vpa-auto-mode` that mutates every namespace with `metadata.labels["goldilocks.fairwinds.com/vpa-update-mode"] = "off"`. This is intentional — Terraform owns container resource limits, and Goldilocks should only provide recommendations, never auto-update. The label is how Goldilocks decides per-namespace whether to run its VPA in `off` mode. Effect on Terraform: every `kubernetes_namespace` resource shows the label as pending-removal (`-> null`) on every `scripts/tg plan`. Dawarich survey 2026-04-18 confirmed the drift. Cluster-side count: 88 namespaces carry the label (`kubectl get ns -o json | jq ... | wc -l`). Every TF-managed namespace is affected. This commit brings the intentional admission drift under the same `# KYVERNO_LIFECYCLE_V1` discoverability marker introduced in c9d221d5 for the ndots dns_config pattern. The marker now stands generically for any Kyverno admission-webhook drift suppression; the inline comment records which specific policy stamps which specific field so future grep audits show why each suppression exists. ## This change 107 `.tf` files touched — every stack's `resource "kubernetes_namespace"` resource gets: ```hcl lifecycle { # KYVERNO_LIFECYCLE_V1: goldilocks-vpa-auto-mode ClusterPolicy stamps this label on every namespace ignore_changes = [metadata[0].labels["goldilocks.fairwinds.com/vpa-update-mode"]] } ``` Injection was done with a brace-depth-tracking Python pass (`/tmp/add_goldilocks_ignore.py`): match `^resource "kubernetes_namespace" ` → track `{` / `}` until the outermost closing brace → insert the lifecycle block before the closing brace. The script is idempotent (skips any file that already mentions `goldilocks.fairwinds.com/vpa-update-mode`) so re-running is safe. Vault stack picked up 2 namespaces in the same file (k8s-users produces one, plus a second explicit ns) — confirmed via file diff (+8 lines). ## What is NOT in this change - `stacks/trading-bot/main.tf` — entire file is `/* … */` commented out (paused 2026-04-06 per user decision). Reverted after the script ran. - `stacks/_template/main.tf.example` — per-stack skeleton, intentionally minimal. User keeps it that way. Not touched by the script (file has no real `resource "kubernetes_namespace"` — only a placeholder comment). - `.terraform/` copies (e.g. `stacks/metallb/.terraform/modules/...`) — gitignored, won't commit; the live path was edited. - `terraform fmt` cleanup of adjacent pre-existing alignment issues in authentik, freedify, hermes-agent, nvidia, vault, meshcentral. Reverted to keep the commit scoped to the Goldilocks sweep. Those files will need a separate fmt-only commit or will be cleaned up on next real apply to that stack. ## Verification Dawarich (one of the hundred-plus touched stacks) showed the pattern before and after: ``` $ cd stacks/dawarich && ../../scripts/tg plan Before: Plan: 0 to add, 2 to change, 0 to destroy. # kubernetes_namespace.dawarich will be updated in-place (goldilocks.fairwinds.com/vpa-update-mode -> null) # module.tls_secret.kubernetes_secret.tls_secret will be updated in-place (Kyverno generate.* labels — fixed in 8d94688d) After: No changes. Your infrastructure matches the configuration. ``` Injection count check: ``` $ rg -c 'KYVERNO_LIFECYCLE_V1: goldilocks-vpa-auto-mode' stacks/ | awk -F: '{s+=$2} END {print s}' 108 ``` ## Reproduce locally 1. `git pull` 2. Pick any stack: `cd stacks/<name> && ../../scripts/tg plan` 3. Expect: no drift on the namespace's goldilocks.fairwinds.com/vpa-update-mode label. Closes: code-dwx Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 21:15:27 +00:00
lifecycle {
# KYVERNO_LIFECYCLE_V1: goldilocks-vpa-auto-mode ClusterPolicy stamps this label on every namespace
ignore_changes = [metadata[0].labels["goldilocks.fairwinds.com/vpa-update-mode"]]
}
}
module "tls_secret" {
source = "../../modules/kubernetes/setup_tls_secret"
namespace = kubernetes_namespace.servarr.metadata[0].name
tls_secret_name = var.tls_secret_name
}
# module "readarr" {
# source = "./readarr"
# tls_secret_name = var.tls_secret_name
# tier = local.tiers.aux
# }
module "prowlarr" {
source = "./prowlarr"
tls_secret_name = var.tls_secret_name
tier = local.tiers.aux
nfs_server = var.nfs_server
homepage_credentials = local.homepage_credentials
}
module "qbittorrent" {
source = "./qbittorrent"
tls_secret_name = var.tls_secret_name
tier = local.tiers.aux
nfs_server = var.nfs_server
homepage_credentials = local.homepage_credentials
}
[servarr] Rewrite MAM ratio farming — break Mouse death spiral, adopt in TF ## Context A MAM (MyAnonamouse) freeleech farming workflow was deployed on 2026-04-14 via kubectl apply (outside Terraform). Five days later the account was still stuck in Mouse class: 715 MiB downloaded, 0 uploaded, ratio 0. Tracker responses on 7 of 9 active torrents returned `status=4 | msg="User currently mouse rank, you need to get your ratio up!"` — MAM was actively refusing to serve peer lists because the account was in Mouse class, and refusing to serve peer lists made the ratio impossible to recover. Meanwhile the grabber kept digging: 501 torrents sat in qBittorrent, 0 completed, 0 bytes uploaded. Root causes (ranked): 1. Death spiral — Mouse class blocks announces, nothing uploads. 2. BP-spender 30 000 BP threshold blocked the only exit even though the account already had 24 500 BP. 3. Grabber selection (`score = 1.0 / (seeders+1)`) preferred low-demand torrents filtered to <100 MiB — ratio-hostile by design. 4. Grabber/cleanup deadlock: cleanup only fired on seed_time > 3d, so torrents that never started never qualified. Combined with the 500- torrent cap this stalled the grabber indefinitely. 5. qBittorrent queueing amplified (4) — 495/501 stuck in queuedDL. 6. Ratio-monitor labelled queued torrents `unknown` (empty tracker field), hiding the problem on the MAM Grafana panel. 7. qBittorrent memory limit (256 Mi LimitRange default) too low. 8. All of the above was Terraform drift with no reviewability. ## This change Introduces `stacks/servarr/mam-farming/` — a new TF module that adopts the three kubectl-applied resources and replaces their scripts with demand-first, H&R-aware logic. Also bumps qBittorrent resources, fixes ratio-monitor labelling, and adds five Prometheus alerts plus a Grafana panel row. ### Architecture MAM API ───┬─── jsonLoad.php (profile: ratio, class, BP) ├─── loadSearchJSONbasic.php (freeleech search) ├─── bonusBuy.php (50 GiB min tier for API) └─── download.php (torrent file) │ Pushgateway <──┬────────────┤ │ mam_ratio ┌────────────────────┐ │ mam_class_code │ freeleech-grabber │ */30 │ mam_bp_balance ◄───│ (ratio-guarded) │ │ mam_farming_* └──────────┬─────────┘ │ mam_janitor_* │ adds to │ ▼ │ Grafana panels qBittorrent (mam-farming) │ + 5 alerts ▲ │ │ deletes by rule │ ┌──────────┴─────────┐ │ ◄───│ farming-janitor │ */15 │ │ (H&R-aware) │ │ └──────────┬─────────┘ │ │ buys credit │ ┌──────────┴─────────┐ └───────────────────────│ bp-spender │ 0 */6 │ (tier-aware) │ └────────────────────┘ ### Key decisions - **Ratio guard on grabber** — refuse to grab if ratio < 1.2 OR class == Mouse. Prevents the death spiral from deepening. Emits `mam_grabber_skipped_reason{reason=...}` and exits clean. - **Demand-first selection** — new score formula `leechers*3 - seeders*0.5 + 200 if freeleech_wedge else 0`; size band 50 MiB – 1 GiB; leecher floor 1; seeder ceiling 50. Picks titles that will actually upload. - **Janitor decoupled from grabber** — runs every 15 min regardless of the ratio-guard state. Without this, stuck torrents accumulate fastest exactly when the grabber is skipping (Mouse class). H&R-aware: never deletes `progress==1.0 AND seeding_time < 72h`. Six delete reasons observable via `mam_janitor_deleted_per_run{reason=...}`. - **BP-spender tier-aware** — MAM imposes a hard 50 GiB minimum on API buyers ("Automated spenders are limited to buying at least 50 GB... due to log spam"). Valid API tiers: 50/100/200/500 GiB at 500 BP/GiB. The spender picks the smallest tier that satisfies the ratio deficit AND fits the budget, preserving a 500 BP reserve. If even the 50 GiB tier is too expensive, it skips and retries on the next 6-hour cron. - **Authoritative metrics use MAM profile fields** — `downloaded_bytes` / `uploaded_bytes` (integers) rather than the pretty-printed `downloaded` / `uploaded` strings like "715.55 MiB" that MAM also returns. - **Ratio-monitor category-first labelling** — `tracker` is empty for queued torrents that never announced. Now maps `category==mam-farming` to label `mam` first, only falls back to tracker-URL parsing when category is absent. Stops hundreds of MAM torrents collecting under `unknown`. - **qBittorrent resources bumped** to `requests=512Mi / limits=1Gi` so hundreds of active torrents don't OOM. ### Emergency recovery performed this session 1. Adopted 5 in-cluster resources via root-module `import {}` blocks (Terraform 1.5+ rejects imports inside child modules). 2. Ran the janitor in DRY_RUN=1 to verify rules against live state — 466 `never_started` candidates, 0 false positives in any other reason bucket. Flipped to enforce mode. 3. Janitor deleted 466 stuck torrents (matches plan's ~495 target; 35 preserved as active/in-progress). 4. Truncated `/data/grabbed_ids.txt` so newly-popular titles become eligible again. The ratio is still 0 because the API cannot buy below 50 GiB and the account sits at 24 551 BP (needs 25 000). Manual 1 GiB purchase via the MAM web UI — 500 BP — would immediately lift the account to ratio ≈ 1.4 and unblock announces. Future automation cannot do this for us due to MAMs anti-spam rule. ### What is NOT in this change - qBittorrent prefs reconciliation (max_active_downloads=20, max_active_uploads=150, max_active_torrents=150). The plan wanted this; deferred to a follow-up because the janitor + ratio recovery handles the 500-torrent backlog first. A small reconciler CronJob posting to /api/v2/app/setPreferences is the intended follow-up. - VIP purchase (~100 k BP) — deferred until BP accumulates. - Cross-seed / autobrr — separate initiative. ## Alerts added - P1 MAMMouseClass — `mam_class_code == 0` for 1h - P1 MAMCookieExpired — `mam_farming_cookie_expired > 0` - P2 MAMRatioBelowOne — `mam_ratio < 1.0` for 24h (replaces old QBittorrentMAMRatioLow, now driven by authoritative profile metric) - P2 MAMFarmingStuck — no grabs in 4h while ratio is healthy - P2 MAMJanitorStuckBacklog — `skipped_active > 400` for 6h ## Test plan ### Automated $ cd infra/stacks/servarr && ../../scripts/tg plan 2>&1 | grep Plan Plan: 5 to import, 2 to add, 6 to change, 0 to destroy. $ ../../scripts/tg apply --non-interactive Apply complete! Resources: 5 imported, 2 added, 6 changed, 0 destroyed. # Re-plan after import block removal (idempotent) $ ../../scripts/tg plan 2>&1 | grep Plan Plan: 0 to add, 1 to change, 0 to destroy. # The 1 change is a pre-existing MetalLB annotation drift on the # qbittorrent-torrenting Service — unrelated to this change. $ cd ../monitoring && ../../scripts/tg apply --non-interactive Apply complete! Resources: 0 added, 2 changed, 0 destroyed. # Python + JSON syntax $ python3 -c 'import ast; [ast.parse(open(p).read()) for p in [ "infra/stacks/servarr/mam-farming/files/freeleech-grabber.py", "infra/stacks/servarr/mam-farming/files/bp-spender.py", "infra/stacks/servarr/mam-farming/files/mam-farming-janitor.py"]]' $ python3 -c 'import json; json.load(open( "infra/stacks/monitoring/modules/monitoring/dashboards/qbittorrent.json"))' ### Manual Verification 1. Grabber ratio-guard path: $ kubectl -n servarr create job --from=cronjob/mam-freeleech-grabber g1 $ kubectl -n servarr logs job/g1 Skip grab: ratio=0.0 class=Mouse (floor=1.2) reason=mouse_class 2. BP-spender tier path: $ kubectl -n servarr create job --from=cronjob/mam-bp-spender s1 $ kubectl -n servarr logs job/s1 Profile: ratio=0.0 class=Mouse DL=0.70 GiB UL=0.00 GiB BP=24551 | deficit=1.40 GiB needed=3 affordable=48 buy=0 Done: BP=24551, spent=0 GiB (needed=3, affordable=48) Correctly skips because affordable (48) < smallest API tier (50). 3. Janitor in enforce mode: $ kubectl -n servarr create job --from=cronjob/mam-farming-janitor j1 $ kubectl -n servarr logs job/j1 | tail -3 Done: deleted=466 preserved_hnr=0 skipped_active=35 dry_run=False per reason: {'never_started': 466, ...} Second run immediately after: `deleted=0 skipped_active=35` — steady state with only active/seeding torrents left. 4. Alerts loaded: $ kubectl -n monitoring get cm prometheus-server \ -o jsonpath='{.data.alerting_rules\.yml}' \ | grep -E "alert: MAM|alert: QBittorrent" - alert: MAMMouseClass - alert: MAMCookieExpired - alert: MAMRatioBelowOne - alert: MAMFarmingStuck - alert: MAMJanitorStuckBacklog - alert: QBittorrentDisconnected - alert: QBittorrentMAMUnsatisfied 5. Dashboard: browse to Grafana "qBittorrent - Seeding & Ratio" → new "MAM Profile (from jsonLoad.php)" row at the bottom shows class, BP balance, profile ratio, transfer, BP-vs-reserve timeseries, janitor deletion stacked chart, janitor state stat, grabber state stat. ## Reproduce locally 1. `cd infra/stacks/servarr && ../../scripts/tg plan` — expect 0 add / 1 change (unrelated MetalLB annotation drift). 2. `kubectl -n servarr get cronjobs` — expect three: mam-freeleech-grabber, mam-bp-spender, mam-farming-janitor. 3. Trigger each via `kubectl create job --from=cronjob/<name> <job>` and read logs; outputs match the manual-verification snippets above. Closes: code-qfs Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 11:45:38 +00:00
module "mam_farming" {
source = "./mam-farming"
namespace = kubernetes_namespace.servarr.metadata[0].name
depends_on = [
kubernetes_manifest.external_secret,
module.qbittorrent,
]
}
module "flaresolverr" {
source = "./flaresolverr"
tls_secret_name = var.tls_secret_name
tier = local.tiers.aux
}
# module "lidarr" {
# source = "./lidarr"
# tls_secret_name = var.tls_secret_name
# tier = local.tiers.aux
# }
# module "soulseek" {
# source = "./soulseek"
# tls_secret_name = var.tls_secret_name
# tier = local.tiers.aux
# }
module "listenarr" {
source = "./listenarr"
tls_secret_name = var.tls_secret_name
tier = local.tiers.aux
nfs_server = var.nfs_server
}
module "aiostreams" {
source = "./aiostreams"
tls_secret_name = var.tls_secret_name
aiostreams_database_connection_string = data.kubernetes_secret.eso_secrets.data["aiostreams_database_connection_string"]
tier = local.tiers.aux
nfs_server = var.nfs_server
}