2026-01-03 16:58:57 +00:00
variable "tls_secret_name" {}
2026-01-10 16:28:12 +00:00
variable "tier" { type = string }
2026-01-03 16:58:57 +00:00
variable "aiostreams_database_connection_string" { type = string }
[ci skip] Infrastructure hardening: security, monitoring, reliability, maintainability
Phase 1 - Critical Security:
- Netbox: move hardcoded DB/superuser passwords to variables
- MeshCentral: disable public registration, add Authentik auth
- Traefik: disable insecure API dashboard (api.insecure=false)
- Traefik: configure forwarded headers with Cloudflare trusted IPs
Phase 2 - Security Hardening:
- Add security headers middleware (HSTS, X-Frame-Options, nosniff, etc.)
- Add Kyverno pod security policies in audit mode (privileged, host
namespaces, SYS_ADMIN, trusted registries)
- Tighten rate limiting (avg=10, burst=50)
- Add Authentik protection to grampsweb
Phase 3 - Monitoring & Alerting:
- Add critical service alerts (PostgreSQL, MySQL, Redis, Headscale,
Authentik, Loki)
- Increase Loki retention from 7 to 30 days (720h)
- Add predictive PV filling alert (predict_linear)
- Re-enable Hackmd and Privatebin down alerts
Phase 4 - Reliability:
- Add resource requests/limits to Redis, DBaaS, Technitium, Headscale,
Vaultwarden, Uptime Kuma
- Increase Alloy DaemonSet memory to 512Mi/1Gi
Phase 6 - Maintainability:
- Extract duplicated tiers locals to terragrunt.hcl generate block
(removed from 67 stacks)
- Replace hardcoded NFS IP 10.0.10.15 with var.nfs_server (114
instances across 63 files)
- Replace hardcoded Redis/PostgreSQL/MySQL/Ollama/mail host references
with variables across ~35 stacks
- Migrate xray raw ingress resources to ingress_factory modules
2026-02-23 22:05:28 +00:00
variable "nfs_server" { type = string }
2026-01-03 16:58:57 +00:00
resource "kubernetes_namespace" "aiostreams" {
  metadata {
    name = "aiostreams"
    labels = {
      "istio-injection" : "disabled"
keel: enroll 11 more namespaces (operators + critical infra)
Per user decision, removed authentik, kyverno, metallb-system,
external-secrets, proxmox-csi, nfs-csi, vpa, sealed-secrets,
infra-maintenance from the policy-level exclude list, and added
keel.sh/enrolled=true to aiostreams (alive — 1/1 Running, despite
being earlier flagged as scaled-to-0) and woodpecker.
Net cluster coverage: 197/227 workloads on safe-force (86%), up from
170/227 (74%). All 197 are paired with match-tag=true (digest-only).
Remaining 7 namespaces in Kyverno exclude list (irreducible):
- keel (self-update)
- calico-system + tigera-operator (operator-managed Installation CR)
- cnpg-system + dbaas (state-coupled)
- nvidia (chart-pinned at 570.195.03 per code-8vr0 until NVIDIA ships
ubuntu26.04 driver images)
- kube-system (k8s built-ins)
Files:
- stacks/kyverno/modules/kyverno/keel-annotations.tf — exclude list
trimmed from 16 → 7
- stacks/authentik, kyverno, proxmox-csi, nfs-csi, vpa, sealed-secrets,
servarr/aiostreams, metallb (creates ns "metallb-system"), woodpecker —
added keel.sh/enrolled=true label on kubernetes_namespace resource
- infra-maintenance was in the policy exclude but the namespace doesn't
actually exist in the cluster; the removal is a no-op there
Applied via kubectl patch on the live ClusterPolicy + kubectl label on
namespaces because the kubernetes provider v3.1.0 panics on Kyverno
ClusterPolicy refresh — TF source has the desired state for next clean
apply on a fixed provider.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 20:59:14 +00:00
      "keel.sh/enrolled" = "true"
2026-01-03 16:58:57 +00:00
    }
  }
[infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip]
## Context
Wave 3B-continued: the Goldilocks VPA dashboard (stacks/vpa) runs a Kyverno
ClusterPolicy `goldilocks-vpa-auto-mode` that mutates every namespace with
`metadata.labels["goldilocks.fairwinds.com/vpa-update-mode"] = "off"`. This
is intentional — Terraform owns container resource limits, and Goldilocks
should only provide recommendations, never auto-update. The label is how
Goldilocks decides per-namespace whether to run its VPA in `off` mode.
Effect on Terraform: every `kubernetes_namespace` resource shows the label
as pending-removal (`-> null`) on every `scripts/tg plan`. Dawarich survey
2026-04-18 confirmed the drift. Cluster-side count: 88 namespaces carry the
label (`kubectl get ns -o json | jq ... | wc -l`). Every TF-managed namespace
is affected.
This commit brings the intentional admission drift under the same
`# KYVERNO_LIFECYCLE_V1` discoverability marker introduced in c9d221d5 for
the ndots dns_config pattern. The marker now stands generically for any
Kyverno admission-webhook drift suppression; the inline comment records
which specific policy stamps which specific field so future grep audits
show why each suppression exists.
## This change
107 `.tf` files touched — every stack's `resource "kubernetes_namespace"`
resource gets:
```hcl
lifecycle {
# KYVERNO_LIFECYCLE_V1: goldilocks-vpa-auto-mode ClusterPolicy stamps this label on every namespace
ignore_changes = [metadata[0].labels["goldilocks.fairwinds.com/vpa-update-mode"]]
}
```
Injection was done with a brace-depth-tracking Python pass (`/tmp/add_goldilocks_ignore.py`):
match `^resource "kubernetes_namespace" ` → track `{` / `}` until the
outermost closing brace → insert the lifecycle block before the closing
brace. The script is idempotent (skips any file that already mentions
`goldilocks.fairwinds.com/vpa-update-mode`) so re-running is safe.
Vault stack picked up 2 namespaces in the same file (k8s-users produces
one, plus a second explicit ns) — confirmed via file diff (+8 lines).
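The brace-depth pass described above can be sketched roughly as follows. This is a minimal illustration, not the contents of `/tmp/add_goldilocks_ignore.py`: the function name `add_lifecycle`, the constants, and the naive brace counting (which ignores braces inside strings and comments) are all assumptions.

```python
import re

# Idempotency marker: any source already mentioning it is left untouched.
MARKER = "goldilocks.fairwinds.com/vpa-update-mode"
LIFECYCLE = [
    "  lifecycle {",
    "    # KYVERNO_LIFECYCLE_V1: goldilocks-vpa-auto-mode ClusterPolicy stamps this label on every namespace",
    '    ignore_changes = [metadata[0].labels["goldilocks.fairwinds.com/vpa-update-mode"]]',
    "  }",
]
RESOURCE_RE = re.compile(r'^resource "kubernetes_namespace" ')


def add_lifecycle(source: str) -> str:
    """Insert the lifecycle block before the closing brace of each namespace resource."""
    if MARKER in source:
        return source  # already processed, so re-running is a no-op
    out = []
    depth = 0
    inside = False
    for line in source.splitlines():
        if not inside and RESOURCE_RE.match(line):
            inside = True
            depth = 0
        if inside:
            # Naive counting (assumption): braces in strings/comments are not handled.
            new_depth = depth + line.count("{") - line.count("}")
            if new_depth <= 0:
                if depth > 0:
                    out.extend(LIFECYCLE)  # outermost closing brace reached
                inside = False
            depth = new_depth
        out.append(line)
    return "\n".join(out) + "\n"
```

Running `add_lifecycle` twice on the same text returns the first result unchanged, which mirrors the idempotency property claimed for the real script.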
## What is NOT in this change
- `stacks/trading-bot/main.tf` — entire file is `/* … */` commented out
(paused 2026-04-06 per user decision). Reverted after the script ran.
- `stacks/_template/main.tf.example` — per-stack skeleton, intentionally
minimal. User keeps it that way. Not touched by the script (file
has no real `resource "kubernetes_namespace"` — only a placeholder
comment).
- `.terraform/` copies (e.g. `stacks/metallb/.terraform/modules/...`) —
gitignored, won't commit; the live path was edited.
- `terraform fmt` cleanup of adjacent pre-existing alignment issues in
authentik, freedify, hermes-agent, nvidia, vault, meshcentral. Reverted
to keep the commit scoped to the Goldilocks sweep. Those files will
need a separate fmt-only commit or will be cleaned up on next real
apply to that stack.
## Verification
Dawarich (one of the hundred-plus touched stacks) showed the pattern
before and after:
```
$ cd stacks/dawarich && ../../scripts/tg plan
Before:
Plan: 0 to add, 2 to change, 0 to destroy.
# kubernetes_namespace.dawarich will be updated in-place
(goldilocks.fairwinds.com/vpa-update-mode -> null)
# module.tls_secret.kubernetes_secret.tls_secret will be updated in-place
(Kyverno generate.* labels — fixed in 8d94688d)
After:
No changes. Your infrastructure matches the configuration.
```
Injection count check:
```
$ rg -c 'KYVERNO_LIFECYCLE_V1: goldilocks-vpa-auto-mode' stacks/ | awk -F: '{s+=$2} END {print s}'
108
```
## Reproduce locally
1. `git pull`
2. Pick any stack: `cd stacks/<name> && ../../scripts/tg plan`
3. Expect: no drift on the namespace's goldilocks.fairwinds.com/vpa-update-mode label.
Closes: code-dwx
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 21:15:27 +00:00
  lifecycle {
    # KYVERNO_LIFECYCLE_V1: goldilocks-vpa-auto-mode ClusterPolicy stamps this label on every namespace
    ignore_changes = [metadata[0].labels["goldilocks.fairwinds.com/vpa-update-mode"]]
  }
2026-01-03 16:58:57 +00:00
}
2026-01-03 23:13:55 +00:00
resource "random_id" "secret_key" {
  byte_length = 32 # 32 bytes × 2 hex chars = 64 hex characters
}
2026-01-03 16:58:57 +00:00
feat(storage): migrate 38 NFS PVCs to proxmox-lvm (Wave 2)
Add proxmox-lvm PVCs with pvc-autoresizer annotations for all
remaining single-pod app data services. Deployments updated to
use new block storage PVCs. Old NFS modules retained for rollback.
Services: affine, changedetection, diun, excalidraw, f1-stream,
hackmd, isponsorblocktv, matrix, n8n, send, grampsweb, health,
onlyoffice, owntracks, paperless-ngx, privatebin, resume,
speedtest, stirling-pdf, tandoor, rybbit (clickhouse), tor-proxy
(torrserver), whisper+piper, frigate (config), ollama (ui),
servarr (prowlarr/listenarr/qbittorrent), aiostreams, freshrss
(extensions), meshcentral (data+files), openclaw (data+home+
openlobster), technitium, mailserver (data+roundcube html+enigma),
dbaas (pgadmin).
Strategy set to Recreate where needed for RWO volumes.
2026-04-04 19:25:12 +03:00
resource "kubernetes_persistent_volume_claim" "data_proxmox" {
  wait_until_bound = false
  metadata {
    name = "aiostreams-data-proxmox"
    namespace = kubernetes_namespace.aiostreams.metadata[0].name
    annotations = {
2026-05-10 19:56:16 +00:00
      "resize.topolvm.io/threshold" = "10%"
2026-04-04 19:25:12 +03:00
      "resize.topolvm.io/increase" = "100%"
      "resize.topolvm.io/storage_limit" = "5Gi"
    }
  }
  spec {
    access_modes = ["ReadWriteOnce"]
    storage_class_name = "proxmox-lvm"
    resources {
      requests = {
        storage = "1Gi"
      }
    }
  }
2026-05-10 21:57:01 +00:00
  lifecycle {
    # The autoresizer expands requests.storage up to storage_limit and
    # PVCs can't shrink. Without this, every TF apply tries to revert
    # to the spec value, K8s rejects the shrink, and the PVC ends up
    # in Terminating-but-in-use limbo.
    ignore_changes = [spec[0].resources[0].requests]
  }
2026-04-04 19:25:12 +03:00
}
2026-01-03 16:58:57 +00:00
resource "kubernetes_deployment" "aiostreams" {
  metadata {
    name = "aiostreams"
    namespace = kubernetes_namespace.aiostreams.metadata[0].name
    labels = {
2026-01-10 16:28:12 +00:00
      app = "aiostreams"
      tier = var.tier
2026-01-03 16:58:57 +00:00
    }
  }
  spec {
    replicas = 1
2026-04-04 19:25:12 +03:00
    strategy {
      type = "Recreate"
    }
2026-01-03 16:58:57 +00:00
    selector {
      match_labels = {
        app = "aiostreams"
      }
    }
    template {
      metadata {
        labels = {
          app = "aiostreams"
        }
      }
      spec {
        container {
2026-05-15 21:28:09 +00:00
          image = "viren070/aiostreams:2026.05.14.1326-nightly"
2026-01-03 16:58:57 +00:00
          name = "aiostreams"
          port {
            container_port = 3000
          }
          env {
            name = "BASE_URL"
            value = "https://aiostreams.viktorbarzin.me"
          }
          env {
            name = "SECRET_KEY"
2026-01-03 23:13:55 +00:00
            value = random_id.secret_key.hex
2026-01-03 16:58:57 +00:00
          }
          env {
            name = "DATABASE_URI"
            value = var.aiostreams_database_connection_string
          }
2026-05-15 21:38:50 +00:00
          env {
            # Cache stream-response payloads for 1h. Default is -1 (disabled),
            # which made every Stremio request hit all 5 upstream addons live —
            # slow, and contributed to the perceived empty-list issue when an
            # upstream was slow/erroring. 1h is short enough that RD cache
            # invalidations are picked up quickly.
            name = "STREAM_CACHE_TTL"
            value = "3600"
          }
2026-05-15 23:37:47 +00:00
          env {
            # Whitelisted regex sync URLs. Vidhin's regexes.json contains release-group
            # patterns (TRaSH Guides-aligned).
            name = "WHITELISTED_REGEX_PATTERNS_URLS"
            value = jsonencode([
              "https://raw.githubusercontent.com/Vidhin05/Releases-Regex/main/English/regexes.json",
            ])
          }
          env {
            # Whitelisted SEL (Stream Expression Language) sync URLs. Stream-expression
            # files (Vidhin's ranked expressions + Tamtaro's ISE/PSE/ESE) go here, NOT
            # in WHITELISTED_REGEX_PATTERNS_URLS — AIOStreams validates each field
            # against the correct whitelist.
            name = "WHITELISTED_SEL_URLS"
            value = jsonencode([
              "https://raw.githubusercontent.com/Vidhin05/Releases-Regex/main/English/expressions.json",
              "https://raw.githubusercontent.com/Tam-Taro/SEL-Filtering-and-Sorting/main/AIOStreams-SyncedURLs/Tamtaro-synced-ISEs.json",
              "https://raw.githubusercontent.com/Tam-Taro/SEL-Filtering-and-Sorting/main/AIOStreams-SyncedURLs/Tamtaro-synced-PSEs.json",
              "https://raw.githubusercontent.com/Tam-Taro/SEL-Filtering-and-Sorting/main/AIOStreams-SyncedURLs/Tamtaro-synced-ESEs-standard.json",
            ])
          }
2026-01-03 16:58:57 +00:00
          volume_mount {
            name = "data"
            mount_path = "/app/data"
          }
2026-02-28 17:03:33 +00:00
          resources {
            requests = {
[ci skip] right-size all pod resources based on VPA + live metrics audit
Full cluster resource audit: cross-referenced Goldilocks VPA recommendations,
live kubectl top metrics, and Terraform definitions for 100+ containers.
Critical fixes:
- dashy: CPU throttled at 98% (490m/500m) → 2 CPU limit
- stirling-pdf: CPU throttled at 99.7% (299m/300m) → 2 CPU limit
- traefik auth-proxy/bot-block-proxy: mem limit 32Mi → 128Mi
Added explicit resources to ~40 containers that had none:
- audiobookshelf, changedetection, cyberchef, dawarich, diun, echo,
excalidraw, freshrss, hackmd, isponsorblocktv, linkwarden, n8n,
navidrome, ntfy, owntracks, privatebin, send, shadowsocks, tandoor,
tor-proxy, wealthfolio, networking-toolbox, rybbit, mailserver,
cloudflared, pgadmin, phpmyadmin, crowdsec-web, xray, wireguard,
k8s-portal, tuya-bridge, ollama-ui, whisper, piper, immich-server,
immich-postgresql, osrm-foot
GPU containers: added CPU/mem alongside GPU limits:
- ollama: removed CPU/mem limits (models vary in size), keep GPU only
- frigate: req 500m/2Gi, lim 4/8Gi + GPU
- immich-ml: req 100m/1Gi, lim 2/4Gi + GPU
Right-sized ~25 over-provisioned containers:
- kms-web-page: 500m/512Mi → 50m/64Mi (was using 0m/10Mi)
- onlyoffice: CPU 8 → 2 (VPA upper 45m)
- realestate-crawler-api: CPU 2000m → 250m
- blog/travel-blog/webhook-handler: 500m → 100m
- coturn/health/plotting-book: reduced to match actual usage
Conservative methodology: limits = max(VPA upper * 2, live usage * 2)
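As a worked illustration of the sizing rule above, here is a small sketch. The function name `conservative_limit`, the `factor` parameter, and the sample figures are hypothetical and not taken from the audit; only the `max(VPA upper * 2, live usage * 2)` rule comes from this message.

```python
def conservative_limit(vpa_upper: float, live_usage: float, factor: float = 2.0) -> float:
    """Return the larger of the two padded estimates, in the caller's unit."""
    return max(vpa_upper * factor, live_usage * factor)


# Hypothetical CPU figures in millicores, not taken from the audit.
print(conservative_limit(vpa_upper=45, live_usage=30))  # prints 90.0
```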
2026-03-01 19:18:50 +00:00
              cpu = "25m"
2026-03-14 21:46:49 +00:00
              memory = "768Mi"
2026-02-28 17:03:33 +00:00
            }
            limits = {
2026-03-14 21:46:49 +00:00
              memory = "768Mi"
2026-02-28 17:03:33 +00:00
            }
          }
2026-01-03 16:58:57 +00:00
        }
        volume {
          name = "data"
2026-03-02 02:04:22 +00:00
          persistent_volume_claim {
2026-04-04 19:25:12 +03:00
            claim_name = kubernetes_persistent_volume_claim.data_proxmox.metadata[0].name
2026-01-03 16:58:57 +00:00
          }
        }
      }
    }
  }
[infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip]
## Context
Wave 3A (commit c9d221d5) added the `# KYVERNO_LIFECYCLE_V1` marker to the
27 pre-existing `ignore_changes = [...dns_config]` sites so they could be
grepped and audited. It did NOT address pod-owning resources that were
simply missing the suppression entirely. Post-Wave-3A sampling (2026-04-18)
found that navidrome, f1-stream, frigate, servarr, monitoring, crowdsec,
and many other stacks showed perpetual `dns_config` drift every plan
because their `kubernetes_deployment` / `kubernetes_stateful_set` /
`kubernetes_cron_job_v1` resources had no `lifecycle {}` block at all.
Root cause (same as Wave 3A): Kyverno's admission webhook stamps
`dns_config { option { name = "ndots"; value = "2" } }` on every pod's
`spec.template.spec.dns_config` to prevent NxDomain search-domain flooding
(see `k8s-ndots-search-domain-nxdomain-flood` skill). Without `ignore_changes`
on every Terraform-managed pod-owner, Terraform repeatedly tries to strip
the injected field.
## This change
Extends the Wave 3A convention by sweeping EVERY `kubernetes_deployment`,
`kubernetes_stateful_set`, `kubernetes_daemon_set`, `kubernetes_cron_job_v1`,
`kubernetes_job_v1` (+ their `_v1` variants) in the repo and ensuring each
carries the right `ignore_changes` path:
- **kubernetes_deployment / stateful_set / daemon_set / job_v1**:
`spec[0].template[0].spec[0].dns_config`
- **kubernetes_cron_job_v1**:
`spec[0].job_template[0].spec[0].template[0].spec[0].dns_config`
(extra `job_template[0]` nesting — the CronJob's PodTemplateSpec is
one level deeper)
Each injection / extension is tagged `# KYVERNO_LIFECYCLE_V1: Kyverno
admission webhook mutates dns_config with ndots=2` inline so the
suppression is discoverable via `rg 'KYVERNO_LIFECYCLE_V1' stacks/`.
Two insertion paths are handled by a Python pass (`/tmp/add_dns_config_ignore.py`):
1. **No existing `lifecycle {}`**: inject a brand-new block just before the
resource's closing `}`. 108 new blocks on 93 files.
2. **Existing `lifecycle {}` (usually for `DRIFT_WORKAROUND: CI owns image tag`
from Wave 4, commit a62b43d1)**: extend its `ignore_changes` list with the
dns_config path. Handles both inline (`= [x]`) and multiline
(`= [\n x,\n]`) forms; ensures the last pre-existing list item carries
a trailing comma so the extended list is valid HCL. 34 extensions.
The script skips anything already mentioning `dns_config` inside an
`ignore_changes`, so re-running is a no-op.
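The second insertion path (extending an existing `ignore_changes` list) might look roughly like the sketch below. It is an assumption-laden illustration rather than the actual `/tmp/add_dns_config_ignore.py`: the function name `extend_ignore_changes` is hypothetical, it handles a single `ignore_changes` list per input string, and it assumes the last existing item has no trailing `#` comment.

```python
DNS_PATH = "spec[0].template[0].spec[0].dns_config"


def extend_ignore_changes(block: str, path: str = DNS_PATH) -> str:
    """Append `path` to the first ignore_changes list found in `block`."""
    start = block.find("ignore_changes")
    if start == -1:
        return block
    open_idx = block.find("[", start)
    if open_idx == -1:
        return block
    # Find the matching bracket; attribute paths contain nested [0] indexes.
    depth = 0
    close_idx = -1
    for i in range(open_idx, len(block)):
        if block[i] == "[":
            depth += 1
        elif block[i] == "]":
            depth -= 1
            if depth == 0:
                close_idx = i
                break
    if close_idx == -1:
        return block  # unbalanced brackets: leave untouched
    body = block[open_idx + 1:close_idx]
    if "dns_config" in body:
        return block  # already suppressed, so re-running is a no-op
    if "\n" in body and body.strip():
        # Multiline form: make sure the last item ends with a comma,
        # then add the new path on its own line at the same indentation.
        items = body.rstrip()
        tail = body[len(items):]
        if not items.endswith(","):
            items += ","
        last = items.splitlines()[-1]
        indent = last[: len(last) - len(last.lstrip())]
        new_body = f"{items}\n{indent}{path},{tail}"
    else:
        # Inline form: `= [x]` becomes `= [x, path]`.
        items = body.strip().rstrip(",").strip()
        new_body = f"{items}, {path}" if items else path
    return block[: open_idx + 1] + new_body + block[close_idx:]
```

For a CronJob the caller would pass the deeper `job_template[0]` path as `path` instead of the default.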
## Scale
- 142 total lifecycle injections/extensions
- 93 `.tf` files touched
- 108 brand-new `lifecycle {}` blocks + 34 extensions of existing ones
- Every Tier 0 and Tier 1 stack with a pod-owning resource is covered
- Together with Wave 3A's 27 pre-existing markers → **169 greppable
`KYVERNO_LIFECYCLE_V1` dns_config sites across the repo**
## What is NOT in this change
- `stacks/trading-bot/main.tf` — entirely commented-out block (`/* … */`).
Python script touched the file, reverted manually.
- `_template/main.tf.example` skeleton — kept minimal on purpose; any
future stack created from it should either inherit the Wave 3A one-line
form or add its own on first `kubernetes_deployment`.
- `terraform fmt` fixes to pre-existing alignment issues in meshcentral,
nvidia/modules/nvidia, vault — unrelated to this commit. Left for a
separate fmt-only pass.
- Non-pod resources (`kubernetes_service`, `kubernetes_secret`,
`kubernetes_manifest`, etc.) — they don't own pods so they don't get
Kyverno dns_config mutation.
## Verification
Random sample post-commit:
```
$ cd stacks/navidrome && ../../scripts/tg plan → No changes.
$ cd stacks/f1-stream && ../../scripts/tg plan → No changes.
$ cd stacks/frigate && ../../scripts/tg plan → No changes.
$ rg -c 'KYVERNO_LIFECYCLE_V1' stacks/ -g '*.tf' -g '*.tf.example' \
| awk -F: '{s+=$2} END {print s}'
169
```
## Reproduce locally
1. `git pull`
2. `rg 'KYVERNO_LIFECYCLE_V1' stacks/ | wc -l` → 169+
3. `cd stacks/navidrome && ../../scripts/tg plan` → expect 0 drift on
the deployment's dns_config field.
Refs: code-seq (Wave 3B dns_config class closed; kubernetes_manifest
annotation class handled separately in 8d94688d for tls_secret)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 21:19:48 +00:00
  lifecycle {
    # KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
keel+anubis: extend sweep to non-V2 raw deployments; fix anubis replicas validation
Second-tier keel drift: actualbudget, mailserver (docker-mailserver + roundcube),
servarr (8 deployments), and authentik pgbouncer are live-enrolled (Kyverno injects
keel.sh/policy=patch) and drifting, but never had the V2 block in Terraform. Added
the full block (KYVERNO_LIFECYCLE_V2 + keel.sh/match-tag + per-container
KEEL_IGNORE_IMAGE + KEEL_LIFECYCLE_V1) to all 13 deployments. The docker-mailserver
deployment had no resource-level lifecycle at all — added one.
Also fixes a pre-existing bug in modules/kubernetes/anubis_instance: the `replicas`
validation `var.replicas == null || (...)` doesn't null-short-circuit in the current
TF version, failing apply on every single-replica Anubis site (blog, cyberchef,
f1-stream, homepage, jsoncrack, kms, postiz, real-estate-crawler, travel_blog) with
"argument must not be null". Switched to a null-safe ternary.
Verified: actualbudget plan shows no image drift (http-api 26.5.2 downgrade prevented).
The anubis module change triggers a full platform apply.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-29 06:01:24 +00:00
    ignore_changes = [
      spec[0].template[0].spec[0].dns_config, # KYVERNO_LIFECYCLE_V1
      metadata[0].annotations["keel.sh/policy"],
      metadata[0].annotations["keel.sh/trigger"],
      metadata[0].annotations["keel.sh/pollSchedule"], # KYVERNO_LIFECYCLE_V2
      metadata[0].annotations["keel.sh/match-tag"],
      spec[0].template[0].spec[0].container[0].image, # KEEL_IGNORE_IMAGE — Keel manages tag updates
      metadata[0].annotations["kubernetes.io/change-cause"],
      metadata[0].annotations["deployment.kubernetes.io/revision"],
      spec[0].template[0].metadata[0].annotations["keel.sh/update-time"], # KEEL_LIFECYCLE_V1
    ]
2026-04-18 21:19:48 +00:00
  }
2026-01-03 16:58:57 +00:00
}

resource "kubernetes_service" "aiostreams" {
  metadata {
    name = "aiostreams"
    namespace = kubernetes_namespace.aiostreams.metadata[0].name
    labels = {
      "app" = "aiostreams"
    }
  }

  spec {
    selector = {
      app = "aiostreams"
    }
    port {
      name = "http"
      port = 80
      target_port = 3000
    }
  }
2026-05-15 21:38:50 +00:00
}

resource "kubernetes_manifest" "probe_secrets" {
  manifest = {
    apiVersion = "external-secrets.io/v1beta1"
    kind = "ExternalSecret"
    metadata = {
      name = "aiostreams-probe-secrets"
      namespace = kubernetes_namespace.aiostreams.metadata[0].name
    }
    spec = {
      refreshInterval = "15m"
      secretStoreRef = {
        name = "vault-kv"
        kind = "ClusterSecretStore"
      }
      target = { name = "aiostreams-probe-secrets" }
      data = [
        { secretKey = "AIOSTREAMS_UUID", remoteRef = { key = "viktor", property = "aiostreams_uuid" } },
        { secretKey = "AIOSTREAMS_PASSWORD", remoteRef = { key = "viktor", property = "aiostreams_password" } },
aiostreams: weekly backup of Stremio account addon collection
Adds stremio-account-backup CronJob (Sun 04:00 weekly, offset 1h from
the AIOStreams config-backup at 03:00):
- Logs into api.strem.io with credentials from Vault
(secret/viktor.stremio_email + stremio_password, now also synced
into the aiostreams-probe-secrets ExternalSecret)
- Fetches the full addonCollection via addonCollectionGet
- Writes timestamped JSON to the existing aiostreams-backup PVC
(NFS /srv/nfs/aiostreams-backup/stremio-collection-*.json, mode 0600)
- 90-day retention, logs out to invalidate the auth key
- Pushgateway metrics: stremio_account_backup_{success,bytes,
addon_count,duration_seconds,last_run_timestamp}
Protects against: accidental "uninstall all" / API regression / wrong
account login wiping the curated set of 22 addons (Cinemeta + 16
MDBList + AIOStreams + More Like This + Formulio + Zamunda + Local).
Verified: manual run wrote 93480 bytes, 22 addons, file present on NFS.
2026-05-15 23:48:41 +00:00
        { secretKey = "STREMIO_EMAIL", remoteRef = { key = "viktor", property = "stremio_email" } },
        { secretKey = "STREMIO_PASSWORD", remoteRef = { key = "viktor", property = "stremio_password" } },
2026-05-15 21:38:50 +00:00
      ]
    }
  }
  depends_on = [kubernetes_namespace.aiostreams]
}

resource "kubernetes_cron_job_v1" "stream_probe" {
  metadata {
    name      = "aiostreams-stream-probe"
    namespace = kubernetes_namespace.aiostreams.metadata[0].name
  }
  spec {
    schedule                      = "*/5 * * * *"
    concurrency_policy            = "Replace"
    successful_jobs_history_limit = 3
    failed_jobs_history_limit     = 3
    job_template {
      metadata {}
      spec {
        backoff_limit              = 1
        ttl_seconds_after_finished = 300
        template {
          metadata {}
          spec {
            restart_policy = "Never"
            container {
              name  = "probe"
              image = "docker.io/library/python:3.12-alpine"
              command = ["/bin/sh", "-c", <<-EOT
                pip install --quiet --disable-pip-version-check requests && python3 -c '
                import requests, os, time, urllib.parse, sys

                BASE = "http://aiostreams.aiostreams.svc.cluster.local"
                PUSHGATEWAY = "http://prometheus-prometheus-pushgateway.monitoring:9091/metrics/job/aiostreams-stream-probe"
                UUID = os.environ["AIOSTREAMS_UUID"]
                PW = os.environ["AIOSTREAMS_PASSWORD"]
                TEST_ID = "tt0903747:1:1" # Breaking Bad S01E01 - stable, always has many streams
                THRESHOLD = 50

                count = 0
                success = 0
                duration = 0
                start = time.time()

                try:
                    r = requests.get(f"{BASE}/api/v1/user/", params={"uuid": UUID, "password": PW}, timeout=10)
                    r.raise_for_status()
                    enc = r.json()["data"]["encryptedPassword"]
                    enc_url = urllib.parse.quote(enc, safe="")
                    r2 = requests.get(
                        f"{BASE}/stremio/{UUID}/{enc_url}/stream/series/{TEST_ID}.json",
                        headers={"User-Agent": "AIOStreams/probe"}, timeout=60,
                    )
                    r2.raise_for_status()
                    count = len(r2.json().get("streams", []))
                    success = 1 if count >= THRESHOLD else 0
                    print(f"streams={count} success={success}")
                except Exception as e:
                    print(f"ERROR: {e}", file=sys.stderr)
                    success = 0

                duration = time.time() - start

                body = (
                    "# TYPE aiostreams_stream_count gauge\n"
                    f"aiostreams_stream_count {count}\n"
                    "# TYPE aiostreams_probe_success gauge\n"
                    f"aiostreams_probe_success {success}\n"
                    "# TYPE aiostreams_probe_duration_seconds gauge\n"
                    f"aiostreams_probe_duration_seconds {duration:.3f}\n"
                    "# TYPE aiostreams_probe_last_run_timestamp gauge\n"
                    f"aiostreams_probe_last_run_timestamp {int(time.time())}\n"
                )
                try:
                    requests.post(PUSHGATEWAY, data=body, timeout=10).raise_for_status()
                except Exception as e:
                    print(f"WARN: pushgateway POST failed: {e}", file=sys.stderr)

                sys.exit(0 if success else 1)
                '
              EOT
              ]
              env_from {
                secret_ref { name = "aiostreams-probe-secrets" }
              }
              resources {
                requests = { memory = "64Mi", cpu = "10m" }
                limits   = { memory = "128Mi" }
              }
            }
          }
        }
      }
    }
  }
  depends_on = [kubernetes_manifest.probe_secrets, kubernetes_deployment.aiostreams]
  lifecycle {
    # KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
    ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config]
  }
2026-05-15 23:30:04 +00:00
}

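# Illustrative sketch only, not part of the original stack: one possible way to
# alert on the gauges that aiostreams-stream-probe pushes to the Pushgateway.
# Assumptions: the prometheus-operator PrometheusRule CRD is installed and
# Prometheus selects rules from the "monitoring" namespace. The resource name,
# alert name, severity label and 15m window are hypothetical.
resource "kubernetes_manifest" "stream_probe_alert_example" {
  manifest = {
    apiVersion = "monitoring.coreos.com/v1"
    kind       = "PrometheusRule"
    metadata = {
      name      = "aiostreams-stream-probe"
      namespace = "monitoring"
    }
    spec = {
      groups = [{
        name = "aiostreams-stream-probe"
        rules = [{
          alert = "AIOStreamsProbeFailing"
          # aiostreams_probe_success is 0 when the probe raises an error or
          # sees fewer streams than its THRESHOLD.
          expr   = "aiostreams_probe_success == 0"
          "for"  = "15m"
          labels = { severity = "warning" }
          annotations = {
            summary = "AIOStreams stream probe has reported failure for 15 minutes"
          }
        }]
      }]
    }
  }
}
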
module "nfs_backup" {
  source     = "../../../modules/kubernetes/nfs_volume"
  name       = "aiostreams-backup"
  namespace  = kubernetes_namespace.aiostreams.metadata[0].name
  nfs_server = var.nfs_server
  nfs_path   = "/srv/nfs/aiostreams-backup"
  storage    = "1Gi"
}

resource "kubernetes_cron_job_v1" "config_backup" {
  metadata {
    name      = "aiostreams-config-backup"
    namespace = kubernetes_namespace.aiostreams.metadata[0].name
  }
  spec {
    schedule                      = "0 3 * * 0" # Sunday 03:00 weekly
    concurrency_policy            = "Forbid"
    successful_jobs_history_limit = 3
    failed_jobs_history_limit     = 3
    job_template {
      metadata {}
      spec {
        backoff_limit              = 2
        ttl_seconds_after_finished = 600
        template {
          metadata {}
          spec {
            restart_policy = "Never"
            container {
              name  = "backup"
              image = "docker.io/library/python:3.12-alpine"
              command = ["/bin/sh", "-c", <<-EOT
                pip install --quiet --disable-pip-version-check requests && python3 -c '
                import requests, os, time, json, sys, datetime, glob

                BASE = "http://aiostreams.aiostreams.svc.cluster.local"
                PUSHGATEWAY = "http://prometheus-prometheus-pushgateway.monitoring:9091/metrics/job/aiostreams-config-backup"
                UUID = os.environ["AIOSTREAMS_UUID"]
                PW = os.environ["AIOSTREAMS_PASSWORD"]
                BACKUP_DIR = "/backup"
                RETENTION_DAYS = 90

                success = 0
                bytes_written = 0
                start = time.time()

                try:
                    r = requests.get(f"{BASE}/api/v1/user/", params={"uuid": UUID, "password": PW, "raw": "true"}, timeout=30)
                    r.raise_for_status()
                    data = r.json()["data"]["userData"]
                    if not data:
                        raise RuntimeError("empty userData from API")

                    os.makedirs(BACKUP_DIR, exist_ok=True)
                    ts = datetime.datetime.now(datetime.UTC).strftime("%Y-%m-%d_%H%M")
                    path = f"{BACKUP_DIR}/config-{ts}.json"
                    with open(path, "w") as f:
                        json.dump(data, f, indent=2, sort_keys=True)
                    bytes_written = os.path.getsize(path)
                    os.chmod(path, 0o600)
                    print(f"OK wrote {path} ({bytes_written} bytes)")

                    # Prune backups older than RETENTION_DAYS
                    cutoff = time.time() - (RETENTION_DAYS * 86400)
                    pruned = 0
                    for f in glob.glob(f"{BACKUP_DIR}/config-*.json"):
                        if os.path.getmtime(f) < cutoff:
                            os.unlink(f)
                            pruned += 1
                    if pruned:
                        print(f"Pruned {pruned} old backups")
                    success = 1
                except Exception as e:
                    print(f"ERROR: {e}", file=sys.stderr)

                duration = time.time() - start
                body = (
                    "# TYPE aiostreams_config_backup_success gauge\n"
                    f"aiostreams_config_backup_success {success}\n"
                    "# TYPE aiostreams_config_backup_bytes gauge\n"
                    f"aiostreams_config_backup_bytes {bytes_written}\n"
                    "# TYPE aiostreams_config_backup_duration_seconds gauge\n"
                    f"aiostreams_config_backup_duration_seconds {duration:.3f}\n"
                    "# TYPE aiostreams_config_backup_last_run_timestamp gauge\n"
                    f"aiostreams_config_backup_last_run_timestamp {int(time.time())}\n"
                )
                try:
                    requests.post(PUSHGATEWAY, data=body, timeout=10).raise_for_status()
                except Exception as e:
                    print(f"WARN: pushgateway POST failed: {e}", file=sys.stderr)

                sys.exit(0 if success else 1)
                '
              EOT
              ]
              env_from {
                secret_ref { name = "aiostreams-probe-secrets" }
              }
              volume_mount {
                name       = "backup"
                mount_path = "/backup"
              }
              resources {
                requests = { memory = "64Mi", cpu = "10m" }
                limits   = { memory = "128Mi" }
              }
            }
            volume {
              name = "backup"
              persistent_volume_claim {
                claim_name = module.nfs_backup.claim_name
              }
            }
          }
        }
      }
    }
  }
  depends_on = [kubernetes_manifest.probe_secrets, kubernetes_deployment.aiostreams, module.nfs_backup]
  lifecycle {
    # KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
    ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config]
  }
}

resource "kubernetes_cron_job_v1" "stremio_account_backup" {
  metadata {
    name      = "stremio-account-backup"
    namespace = kubernetes_namespace.aiostreams.metadata[0].name
  }
  spec {
    schedule                      = "0 4 * * 0" # Sunday 04:00 weekly (1h after config-backup)
    concurrency_policy            = "Forbid"
    successful_jobs_history_limit = 3
    failed_jobs_history_limit     = 3
    job_template {
      metadata {}
      spec {
        backoff_limit              = 2
        ttl_seconds_after_finished = 600
        template {
          metadata {}
          spec {
            restart_policy = "Never"
            container {
              name  = "backup"
              image = "docker.io/library/python:3.12-alpine"
              command = ["/bin/sh", "-c", <<-EOT
                pip install --quiet --disable-pip-version-check requests && python3 -c '
                import requests, os, time, json, sys, datetime, glob

                BASE = "https://api.strem.io/api"
                PUSHGATEWAY = "http://prometheus-prometheus-pushgateway.monitoring:9091/metrics/job/stremio-account-backup"
                EMAIL = os.environ["STREMIO_EMAIL"]
                PASSWORD = os.environ["STREMIO_PASSWORD"]
                BACKUP_DIR = "/backup"
                RETENTION_DAYS = 90

                success = 0
                bytes_written = 0
                addon_count = 0
                start = time.time()

                try:
                    r = requests.post(f"{BASE}/login", json={"type":"Login","email":EMAIL,"password":PASSWORD}, timeout=20)
                    r.raise_for_status()
                    auth = r.json()["result"]["authKey"]

                    r2 = requests.post(f"{BASE}/addonCollectionGet", json={"type":"AddonCollectionGet","authKey":auth,"update":True}, timeout=30)
                    r2.raise_for_status()
                    addons = r2.json()["result"]["addons"]
                    addon_count = len(addons)

                    os.makedirs(BACKUP_DIR, exist_ok=True)
                    ts = datetime.datetime.now(datetime.UTC).strftime("%Y-%m-%d_%H%M")
                    path = f"{BACKUP_DIR}/stremio-collection-{ts}.json"
                    payload = {"capturedAt": ts, "email": EMAIL, "addonCount": addon_count, "addons": addons}
                    with open(path, "w") as f:
                        json.dump(payload, f, indent=2, sort_keys=True)
                    bytes_written = os.path.getsize(path)
                    os.chmod(path, 0o600)
                    print(f"OK wrote {path} ({bytes_written} bytes, {addon_count} addons)")

                    # Logout to invalidate the auth key
                    try:
                        requests.post(f"{BASE}/logout", json={"type":"Logout","authKey":auth}, timeout=10)
                    except Exception:
                        pass

                    # Prune older than RETENTION_DAYS
                    cutoff = time.time() - (RETENTION_DAYS * 86400)
                    pruned = 0
                    for f in glob.glob(f"{BACKUP_DIR}/stremio-collection-*.json"):
                        if os.path.getmtime(f) < cutoff:
                            os.unlink(f); pruned += 1
                    if pruned: print(f"Pruned {pruned} old backups")
                    success = 1
                except Exception as e:
                    print(f"ERROR: {e}", file=sys.stderr)

                duration = time.time() - start
                body = (
                    "# TYPE stremio_account_backup_success gauge\n"
                    f"stremio_account_backup_success {success}\n"
                    "# TYPE stremio_account_backup_bytes gauge\n"
                    f"stremio_account_backup_bytes {bytes_written}\n"
                    "# TYPE stremio_account_backup_addon_count gauge\n"
                    f"stremio_account_backup_addon_count {addon_count}\n"
                    "# TYPE stremio_account_backup_duration_seconds gauge\n"
                    f"stremio_account_backup_duration_seconds {duration:.3f}\n"
                    "# TYPE stremio_account_backup_last_run_timestamp gauge\n"
                    f"stremio_account_backup_last_run_timestamp {int(time.time())}\n"
                )
                try:
                    requests.post(PUSHGATEWAY, data=body, timeout=10).raise_for_status()
                except Exception as e:
                    print(f"WARN: pushgateway POST failed: {e}", file=sys.stderr)

                sys.exit(0 if success else 1)
                '
              EOT
              ]
              env_from {
                secret_ref { name = "aiostreams-probe-secrets" }
              }
              volume_mount {
                name       = "backup"
                mount_path = "/backup"
              }
              resources {
                requests = { memory = "64Mi", cpu = "10m" }
                limits   = { memory = "128Mi" }
              }
            }
            volume {
              name = "backup"
              persistent_volume_claim {
                claim_name = module.nfs_backup.claim_name
              }
            }
          }
        }
      }
    }
  }
  depends_on = [kubernetes_manifest.probe_secrets, module.nfs_backup]
  lifecycle {
    # KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
    ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config]
  }
2026-01-03 16:58:57 +00:00
}

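# Illustrative sketch only, not part of the original stack: the two weekly
# backup jobs push a *_last_run_timestamp gauge, so a staleness rule could flag
# a backup that has stopped running. Assumptions: the prometheus-operator
# PrometheusRule CRD is installed and Prometheus selects rules from the
# "monitoring" namespace. The resource name, alert names, severity label and
# the 8-day threshold (one weekly run plus a day of slack) are hypothetical.
resource "kubernetes_manifest" "backup_staleness_alert_example" {
  manifest = {
    apiVersion = "monitoring.coreos.com/v1"
    kind       = "PrometheusRule"
    metadata = {
      name      = "aiostreams-backup-staleness"
      namespace = "monitoring"
    }
    spec = {
      groups = [{
        name = "aiostreams-backup-staleness"
        rules = [
          {
            alert  = "AIOStreamsConfigBackupStale"
            expr   = "time() - aiostreams_config_backup_last_run_timestamp > 8 * 86400"
            labels = { severity = "warning" }
            annotations = {
              summary = "aiostreams-config-backup has not run for more than 8 days"
            }
          },
          {
            alert  = "StremioAccountBackupStale"
            expr   = "time() - stremio_account_backup_last_run_timestamp > 8 * 86400"
            labels = { severity = "warning" }
            annotations = {
              summary = "stremio-account-backup has not run for more than 8 days"
            }
          },
        ]
      }]
    }
  }
}
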
module "ingress" {
2026-05-15 21:28:09 +00:00
  source = "../../../modules/kubernetes/ingress_factory"
  # auth = "app": AIOStreams enforces its own UUID + password gate on /configure
  # and /api/*, and Stremio addon URLs (/stremio/{uuid}/{encryptedPassword}/...)
  # use the encryptedPassword path segment as a bearer token. Authentik forward-auth
  # broke Stremio clients (cannot follow OAuth 302) and is redundant with the app's
  # own auth. UUIDs are 128-bit random; password attempts are rate-limited.
  auth            = "app"
2026-04-16 13:45:04 +00:00
  dns_type        = "proxied"
2026-01-03 16:58:57 +00:00
  namespace       = kubernetes_namespace.aiostreams.metadata[0].name
  name            = "aiostreams"
  tls_secret_name = var.tls_secret_name
2026-03-07 16:41:36 +00:00
  extra_annotations = {
    "gethomepage.dev/enabled"      = "true"
    "gethomepage.dev/name"         = "AIOStreams"
    "gethomepage.dev/description"  = "Streaming addon manager"
    "gethomepage.dev/icon"         = "stremio.png"
    "gethomepage.dev/group"        = "Media & Entertainment"
    "gethomepage.dev/pod-selector" = ""
  }
2026-01-03 16:58:57 +00:00
}