# infra/stacks/servarr/aiostreams/main.tf

variable "tls_secret_name" {}
variable "tier" { type = string }
variable "aiostreams_database_connection_string" { type = string }
variable "nfs_server" { type = string }

resource "kubernetes_namespace" "aiostreams" {
  metadata {
    name = "aiostreams"
    labels = {
      "istio-injection"  = "disabled"
      "keel.sh/enrolled" = "true"
    }
  }
  lifecycle {
    # KYVERNO_LIFECYCLE_V1: goldilocks-vpa-auto-mode ClusterPolicy stamps this label on every namespace
    ignore_changes = [metadata[0].labels["goldilocks.fairwinds.com/vpa-update-mode"]]
  }
}

resource "random_id" "secret_key" {
  byte_length = 32 # 32 bytes × 2 hex chars = 64 hex characters
}

resource "kubernetes_persistent_volume_claim" "data_proxmox" {
  wait_until_bound = false
  metadata {
    name      = "aiostreams-data-proxmox"
    namespace = kubernetes_namespace.aiostreams.metadata[0].name
    annotations = {
      "resize.topolvm.io/threshold"     = "10%"
      "resize.topolvm.io/increase"      = "100%"
      "resize.topolvm.io/storage_limit" = "5Gi"
    }
  }
  spec {
    access_modes       = ["ReadWriteOnce"]
    storage_class_name = "proxmox-lvm"
    resources {
      requests = {
        storage = "1Gi"
      }
    }
  }
  lifecycle {
    # The autoresizer expands requests.storage up to storage_limit and
    # PVCs can't shrink. Without this, every TF apply tries to revert
    # to the spec value, K8s rejects the shrink, and the PVC ends up
    # in Terminating-but-in-use limbo.
    ignore_changes = [spec[0].resources[0].requests]
  }
}

resource "kubernetes_deployment" "aiostreams" {
  metadata {
    name      = "aiostreams"
    namespace = kubernetes_namespace.aiostreams.metadata[0].name
    labels = {
      app  = "aiostreams"
      tier = var.tier
    }
  }
  spec {
    replicas = 1
    strategy {
      type = "Recreate"
    }
    selector {
      match_labels = {
        app = "aiostreams"
      }
    }
    template {
      metadata {
        labels = {
          app = "aiostreams"
        }
      }
      spec {
        container {
          image = "viren070/aiostreams:2026.05.14.1326-nightly"
          name  = "aiostreams"
          port {
            container_port = 3000
          }
          env {
            name  = "BASE_URL"
            value = "https://aiostreams.viktorbarzin.me"
          }
          env {
            name  = "SECRET_KEY"
            value = random_id.secret_key.hex
          }
          env {
            name  = "DATABASE_URI"
            value = var.aiostreams_database_connection_string
          }
          env {
            # Cache stream-response payloads for 1h. The default is -1 (disabled),
            # which made every Stremio request hit all 5 upstream addons live:
            # slow, and a contributor to the perceived empty-list issue when an
            # upstream was slow or erroring. 1h is short enough that RD cache
            # invalidations are picked up quickly.
            name  = "STREAM_CACHE_TTL"
            value = "3600"
          }
          env {
            # Whitelisted regex sync URLs. Vidhin's regexes.json contains release-group
            # patterns (TRaSH Guides-aligned).
            name = "WHITELISTED_REGEX_PATTERNS_URLS"
            value = jsonencode([
              "https://raw.githubusercontent.com/Vidhin05/Releases-Regex/main/English/regexes.json",
            ])
          }
          env {
            # Whitelisted SEL (Stream Expression Language) sync URLs. Stream-expression
            # files (Vidhin's ranked expressions + Tamtaro's ISE/PSE/ESE) go here, NOT
            # in WHITELISTED_REGEX_PATTERNS_URLS, because AIOStreams validates each
            # field against its own whitelist.
            name = "WHITELISTED_SEL_URLS"
            value = jsonencode([
              "https://raw.githubusercontent.com/Vidhin05/Releases-Regex/main/English/expressions.json",
              "https://raw.githubusercontent.com/Tam-Taro/SEL-Filtering-and-Sorting/main/AIOStreams-SyncedURLs/Tamtaro-synced-ISEs.json",
              "https://raw.githubusercontent.com/Tam-Taro/SEL-Filtering-and-Sorting/main/AIOStreams-SyncedURLs/Tamtaro-synced-PSEs.json",
              "https://raw.githubusercontent.com/Tam-Taro/SEL-Filtering-and-Sorting/main/AIOStreams-SyncedURLs/Tamtaro-synced-ESEs-standard.json",
            ])
          }
          volume_mount {
            name       = "data"
            mount_path = "/app/data"
          }
          resources {
            requests = {
              cpu    = "25m"
              memory = "768Mi"
            }
            limits = {
              memory = "768Mi"
            }
          }
        }
        volume {
          name = "data"
          persistent_volume_claim {
            claim_name = kubernetes_persistent_volume_claim.data_proxmox.metadata[0].name
          }
        }
      }
    }
  }
  lifecycle {
    # KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
    ignore_changes = [
      spec[0].template[0].spec[0].dns_config, # KYVERNO_LIFECYCLE_V1
      metadata[0].annotations["keel.sh/policy"],
      metadata[0].annotations["keel.sh/trigger"],
      metadata[0].annotations["keel.sh/pollSchedule"], # KYVERNO_LIFECYCLE_V2
      metadata[0].annotations["keel.sh/match-tag"],
      spec[0].template[0].spec[0].container[0].image, # KEEL_IGNORE_IMAGE: Keel manages tag updates
      metadata[0].annotations["kubernetes.io/change-cause"],
      metadata[0].annotations["deployment.kubernetes.io/revision"],
      spec[0].template[0].metadata[0].annotations["keel.sh/update-time"], # KEEL_LIFECYCLE_V1
    ]
  }
}

resource "kubernetes_service" "aiostreams" {
  metadata {
    name      = "aiostreams"
    namespace = kubernetes_namespace.aiostreams.metadata[0].name
    labels = {
      "app" = "aiostreams"
    }
  }
  spec {
    selector = {
      app = "aiostreams"
    }
    port {
      name        = "http"
      port        = 80
      target_port = 3000
    }
  }
}

resource "kubernetes_manifest" "probe_secrets" {
  manifest = {
    apiVersion = "external-secrets.io/v1beta1"
    kind       = "ExternalSecret"
    metadata = {
      name      = "aiostreams-probe-secrets"
      namespace = kubernetes_namespace.aiostreams.metadata[0].name
    }
    spec = {
      refreshInterval = "15m"
      secretStoreRef = {
        name = "vault-kv"
        kind = "ClusterSecretStore"
      }
      target = { name = "aiostreams-probe-secrets" }
      data = [
        { secretKey = "AIOSTREAMS_UUID", remoteRef = { key = "viktor", property = "aiostreams_uuid" } },
        { secretKey = "AIOSTREAMS_PASSWORD", remoteRef = { key = "viktor", property = "aiostreams_password" } },
        { secretKey = "STREMIO_EMAIL", remoteRef = { key = "viktor", property = "stremio_email" } },
        { secretKey = "STREMIO_PASSWORD", remoteRef = { key = "viktor", property = "stremio_password" } },
      ]
    }
  }
  depends_on = [kubernetes_namespace.aiostreams]
}

resource "kubernetes_cron_job_v1" "stream_probe" {
  metadata {
    name      = "aiostreams-stream-probe"
    namespace = kubernetes_namespace.aiostreams.metadata[0].name
  }
  spec {
    schedule                      = "*/5 * * * *"
    concurrency_policy            = "Replace"
    successful_jobs_history_limit = 3
    failed_jobs_history_limit     = 3
    job_template {
      metadata {}
      spec {
        backoff_limit              = 1
        ttl_seconds_after_finished = 300
        template {
          metadata {}
          spec {
            restart_policy = "Never"
            container {
              name  = "probe"
              image = "docker.io/library/python:3.12-alpine"
              command = ["/bin/sh", "-c", <<-EOT
                pip install --quiet --disable-pip-version-check requests && python3 -c '
                import requests, os, time, urllib.parse, sys
                BASE = "http://aiostreams.aiostreams.svc.cluster.local"
                PUSHGATEWAY = "http://prometheus-prometheus-pushgateway.monitoring:9091/metrics/job/aiostreams-stream-probe"
                UUID = os.environ["AIOSTREAMS_UUID"]
                PW = os.environ["AIOSTREAMS_PASSWORD"]
                TEST_ID = "tt0903747:1:1" # Breaking Bad S01E01 - stable, always has many streams
                THRESHOLD = 50
                count = 0
                success = 0
                duration = 0
                start = time.time()
                try:
                    r = requests.get(f"{BASE}/api/v1/user/", params={"uuid": UUID, "password": PW}, timeout=10)
                    r.raise_for_status()
                    enc = r.json()["data"]["encryptedPassword"]
                    enc_url = urllib.parse.quote(enc, safe="")
                    r2 = requests.get(
                        f"{BASE}/stremio/{UUID}/{enc_url}/stream/series/{TEST_ID}.json",
                        headers={"User-Agent": "AIOStreams/probe"}, timeout=60,
                    )
                    r2.raise_for_status()
                    count = len(r2.json().get("streams", []))
                    success = 1 if count >= THRESHOLD else 0
                    print(f"streams={count} success={success}")
                except Exception as e:
                    print(f"ERROR: {e}", file=sys.stderr)
                    success = 0
                duration = time.time() - start
                body = (
                    "# TYPE aiostreams_stream_count gauge\n"
                    f"aiostreams_stream_count {count}\n"
                    "# TYPE aiostreams_probe_success gauge\n"
                    f"aiostreams_probe_success {success}\n"
                    "# TYPE aiostreams_probe_duration_seconds gauge\n"
                    f"aiostreams_probe_duration_seconds {duration:.3f}\n"
                    "# TYPE aiostreams_probe_last_run_timestamp gauge\n"
                    f"aiostreams_probe_last_run_timestamp {int(time.time())}\n"
                )
                try:
                    requests.post(PUSHGATEWAY, data=body, timeout=10).raise_for_status()
                except Exception as e:
                    print(f"WARN: pushgateway POST failed: {e}", file=sys.stderr)
                sys.exit(0 if success else 1)
                '
              EOT
              ]
              env_from {
                secret_ref { name = "aiostreams-probe-secrets" }
              }
              resources {
                requests = { memory = "64Mi", cpu = "10m" }
                limits   = { memory = "128Mi" }
              }
            }
          }
        }
      }
    }
  }
  depends_on = [kubernetes_manifest.probe_secrets, kubernetes_deployment.aiostreams]
  lifecycle {
    # KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
    ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config]
  }
}

module "nfs_backup" {
  source     = "../../../modules/kubernetes/nfs_volume"
  name       = "aiostreams-backup"
  namespace  = kubernetes_namespace.aiostreams.metadata[0].name
  nfs_server = var.nfs_server
  nfs_path   = "/srv/nfs/aiostreams-backup"
  storage    = "1Gi"
}

resource "kubernetes_cron_job_v1" "config_backup" {
  metadata {
    name      = "aiostreams-config-backup"
    namespace = kubernetes_namespace.aiostreams.metadata[0].name
  }
  spec {
    schedule                      = "0 3 * * 0" # Sunday 03:00 weekly
    concurrency_policy            = "Forbid"
    successful_jobs_history_limit = 3
    failed_jobs_history_limit     = 3
    job_template {
      metadata {}
      spec {
        backoff_limit              = 2
        ttl_seconds_after_finished = 600
        template {
          metadata {}
          spec {
            restart_policy = "Never"
            container {
              name  = "backup"
              image = "docker.io/library/python:3.12-alpine"
              command = ["/bin/sh", "-c", <<-EOT
                pip install --quiet --disable-pip-version-check requests && python3 -c '
                import requests, os, time, json, sys, datetime, glob
                BASE = "http://aiostreams.aiostreams.svc.cluster.local"
                PUSHGATEWAY = "http://prometheus-prometheus-pushgateway.monitoring:9091/metrics/job/aiostreams-config-backup"
                UUID = os.environ["AIOSTREAMS_UUID"]
                PW = os.environ["AIOSTREAMS_PASSWORD"]
                BACKUP_DIR = "/backup"
                RETENTION_DAYS = 90
                success = 0
                bytes_written = 0
                start = time.time()
                try:
                    r = requests.get(f"{BASE}/api/v1/user/", params={"uuid": UUID, "password": PW, "raw": "true"}, timeout=30)
                    r.raise_for_status()
                    data = r.json()["data"]["userData"]
                    if not data:
                        raise RuntimeError("empty userData from API")
                    os.makedirs(BACKUP_DIR, exist_ok=True)
                    # datetime.utcnow() is deprecated since Python 3.12; use an aware UTC timestamp
                    ts = datetime.datetime.now(datetime.UTC).strftime("%Y-%m-%d_%H%M")
                    path = f"{BACKUP_DIR}/config-{ts}.json"
                    with open(path, "w") as f:
                        json.dump(data, f, indent=2, sort_keys=True)
                    bytes_written = os.path.getsize(path)
                    os.chmod(path, 0o600)
                    print(f"OK wrote {path} ({bytes_written} bytes)")
                    # Prune backups older than RETENTION_DAYS
                    cutoff = time.time() - (RETENTION_DAYS * 86400)
                    pruned = 0
                    for old in glob.glob(f"{BACKUP_DIR}/config-*.json"):
                        if os.path.getmtime(old) < cutoff:
                            os.unlink(old)
                            pruned += 1
                    if pruned:
                        print(f"Pruned {pruned} old backups")
                    success = 1
                except Exception as e:
                    print(f"ERROR: {e}", file=sys.stderr)
                duration = time.time() - start
                body = (
                    "# TYPE aiostreams_config_backup_success gauge\n"
                    f"aiostreams_config_backup_success {success}\n"
                    "# TYPE aiostreams_config_backup_bytes gauge\n"
                    f"aiostreams_config_backup_bytes {bytes_written}\n"
                    "# TYPE aiostreams_config_backup_duration_seconds gauge\n"
                    f"aiostreams_config_backup_duration_seconds {duration:.3f}\n"
                    "# TYPE aiostreams_config_backup_last_run_timestamp gauge\n"
                    f"aiostreams_config_backup_last_run_timestamp {int(time.time())}\n"
                )
                try:
                    requests.post(PUSHGATEWAY, data=body, timeout=10).raise_for_status()
                except Exception as e:
                    print(f"WARN: pushgateway POST failed: {e}", file=sys.stderr)
                sys.exit(0 if success else 1)
                '
              EOT
              ]
              env_from {
                secret_ref { name = "aiostreams-probe-secrets" }
              }
              volume_mount {
                name       = "backup"
                mount_path = "/backup"
              }
              resources {
                requests = { memory = "64Mi", cpu = "10m" }
                limits   = { memory = "128Mi" }
              }
            }
            volume {
              name = "backup"
              persistent_volume_claim {
                claim_name = module.nfs_backup.claim_name
              }
            }
          }
        }
      }
    }
  }
  depends_on = [kubernetes_manifest.probe_secrets, kubernetes_deployment.aiostreams, module.nfs_backup]
  lifecycle {
    # KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
    ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config]
  }
}

resource "kubernetes_cron_job_v1" "stremio_account_backup" {
  metadata {
    name      = "stremio-account-backup"
    namespace = kubernetes_namespace.aiostreams.metadata[0].name
  }
  spec {
    schedule                      = "0 4 * * 0" # Sunday 04:00 weekly (1h after config-backup)
    concurrency_policy            = "Forbid"
    successful_jobs_history_limit = 3
    failed_jobs_history_limit     = 3
    job_template {
      metadata {}
      spec {
        backoff_limit              = 2
        ttl_seconds_after_finished = 600
        template {
          metadata {}
          spec {
            restart_policy = "Never"
            container {
              name  = "backup"
              image = "docker.io/library/python:3.12-alpine"
              command = ["/bin/sh", "-c", <<-EOT
                pip install --quiet --disable-pip-version-check requests && python3 -c '
                import requests, os, time, json, sys, datetime, glob
                BASE = "https://api.strem.io/api"
                PUSHGATEWAY = "http://prometheus-prometheus-pushgateway.monitoring:9091/metrics/job/stremio-account-backup"
                EMAIL = os.environ["STREMIO_EMAIL"]
                PASSWORD = os.environ["STREMIO_PASSWORD"]
                BACKUP_DIR = "/backup"
                RETENTION_DAYS = 90
                success = 0
                bytes_written = 0
                addon_count = 0
                start = time.time()
                try:
                    r = requests.post(f"{BASE}/login", json={"type": "Login", "email": EMAIL, "password": PASSWORD}, timeout=20)
                    r.raise_for_status()
                    auth = r.json()["result"]["authKey"]
                    r2 = requests.post(f"{BASE}/addonCollectionGet", json={"type": "AddonCollectionGet", "authKey": auth, "update": True}, timeout=30)
                    r2.raise_for_status()
                    addons = r2.json()["result"]["addons"]
                    addon_count = len(addons)
                    os.makedirs(BACKUP_DIR, exist_ok=True)
                    ts = datetime.datetime.now(datetime.UTC).strftime("%Y-%m-%d_%H%M")
                    path = f"{BACKUP_DIR}/stremio-collection-{ts}.json"
                    payload = {"capturedAt": ts, "email": EMAIL, "addonCount": addon_count, "addons": addons}
                    with open(path, "w") as f:
                        json.dump(payload, f, indent=2, sort_keys=True)
                    bytes_written = os.path.getsize(path)
                    os.chmod(path, 0o600)
                    print(f"OK wrote {path} ({bytes_written} bytes, {addon_count} addons)")
                    # Log out to invalidate the auth key
                    try:
                        requests.post(f"{BASE}/logout", json={"type": "Logout", "authKey": auth}, timeout=10)
                    except Exception:
                        pass
                    # Prune backups older than RETENTION_DAYS
                    cutoff = time.time() - (RETENTION_DAYS * 86400)
                    pruned = 0
                    for old in glob.glob(f"{BACKUP_DIR}/stremio-collection-*.json"):
                        if os.path.getmtime(old) < cutoff:
                            os.unlink(old)
                            pruned += 1
                    if pruned:
                        print(f"Pruned {pruned} old backups")
                    success = 1
                except Exception as e:
                    print(f"ERROR: {e}", file=sys.stderr)
                duration = time.time() - start
                body = (
                    "# TYPE stremio_account_backup_success gauge\n"
                    f"stremio_account_backup_success {success}\n"
                    "# TYPE stremio_account_backup_bytes gauge\n"
                    f"stremio_account_backup_bytes {bytes_written}\n"
                    "# TYPE stremio_account_backup_addon_count gauge\n"
                    f"stremio_account_backup_addon_count {addon_count}\n"
                    "# TYPE stremio_account_backup_duration_seconds gauge\n"
                    f"stremio_account_backup_duration_seconds {duration:.3f}\n"
                    "# TYPE stremio_account_backup_last_run_timestamp gauge\n"
                    f"stremio_account_backup_last_run_timestamp {int(time.time())}\n"
                )
                try:
                    requests.post(PUSHGATEWAY, data=body, timeout=10).raise_for_status()
                except Exception as e:
                    print(f"WARN: pushgateway POST failed: {e}", file=sys.stderr)
                sys.exit(0 if success else 1)
                '
              EOT
              ]
              env_from {
                secret_ref { name = "aiostreams-probe-secrets" }
              }
              volume_mount {
                name       = "backup"
                mount_path = "/backup"
              }
              resources {
                requests = { memory = "64Mi", cpu = "10m" }
                limits   = { memory = "128Mi" }
              }
            }
            volume {
              name = "backup"
              persistent_volume_claim {
                claim_name = module.nfs_backup.claim_name
              }
            }
          }
        }
      }
    }
  }
  depends_on = [kubernetes_manifest.probe_secrets, module.nfs_backup]
  lifecycle {
    # KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
    ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config]
  }
}

module "ingress" {
  source = "../../../modules/kubernetes/ingress_factory"
  # auth = "app": AIOStreams enforces its own UUID + password gate on /configure
  # and /api/*, and Stremio addon URLs (/stremio/{uuid}/{encryptedPassword}/...)
  # use the encryptedPassword path segment as a bearer token. Authentik forward-auth
  # broke Stremio clients (cannot follow OAuth 302) and is redundant with the app's
  # own auth. UUIDs are 128-bit random; password attempts are rate-limited.
  auth            = "app"
  dns_type        = "proxied"
  namespace       = kubernetes_namespace.aiostreams.metadata[0].name
  name            = "aiostreams"
  tls_secret_name = var.tls_secret_name
  extra_annotations = {
    "gethomepage.dev/enabled"      = "true"
    "gethomepage.dev/name"         = "AIOStreams"
    "gethomepage.dev/description"  = "Streaming addon manager"
    "gethomepage.dev/icon"         = "stremio.png"
    "gethomepage.dev/group"        = "Media & Entertainment"
    "gethomepage.dev/pod-selector" = ""
  }
}