Hardening pass following the empty-stream-list incident:
1. STREAM_CACHE_TTL=3600: re-enables the stream payload cache (was -1,
i.e. disabled). The previous -1 behaviour hit all 5 upstream addons on
every Stremio request; with a 1h TTL, repeat requests for the same
title are served from cache, and RD cache invalidations still
propagate within at most an hour.
2. aiostreams-stream-probe CronJob (every 5 min): fetches the user's
encryptedPassword via the internal ClusterIP, runs a canary stream
search for Breaking Bad S01E01, and pushes streams_count and
probe_success to Pushgateway. Uses an ExternalSecret that pulls the
UUID and password from Vault secret/viktor, the same pattern as
email-roundtrip-monitor.
3. Three alerts in monitoring's prometheus_chart_values.tpl:
- AIOStreamsStreamCountLow (< 50 streams for 30m)
- AIOStreamsProbeFailing (probe_success == 0 for 30m)
- AIOStreamsProbeStale (last_run_timestamp older than 30m, for 10m)
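Item 1's TTL behaviour can be illustrated with a small in-process cache. This is a minimal sketch, not AIOStreams' actual implementation: the `TTLCache` class and `get_or_fetch` method are hypothetical, and treating any negative TTL as "disabled" is an assumption modelled on the old -1 setting.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Hypothetical stream-payload cache; a negative TTL disables caching (like -1)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        # key -> (time stored, cached payload)
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_fetch(self, key: Hashable, fetch: Callable[[], Any]) -> Any:
        if self.ttl < 0:
            # Caching disabled: every call fans out to the upstream addons.
            return fetch()
        now = self.clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            # Fresh entry: serve from cache without touching upstream.
            return hit[1]
        # Missing or expired: refetch and restart the TTL window.
        value = fetch()
        self._entries[key] = (now, value)
        return value


if __name__ == "__main__":
    fake_now = [0.0]
    upstream_calls = []

    def fetch_streams():
        upstream_calls.append(1)
        return ["stream-a", "stream-b"]

    cache = TTLCache(3600, clock=lambda: fake_now[0])
    cache.get_or_fetch("tt0903747:1:1", fetch_streams)  # miss: hits upstream
    cache.get_or_fetch("tt0903747:1:1", fetch_streams)  # served from cache
    fake_now[0] = 3601.0
    cache.get_or_fetch("tt0903747:1:1", fetch_streams)  # expired: refetched
    print(len(upstream_calls))  # 2
```

The trade-off is the one described in item 1: a stale payload (e.g. an outdated RD cache status) can be served for at most one TTL window before the next request refetches it.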
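The probe in item 2 could look roughly like the stdlib-only sketch below. Several details are assumptions, not taken from the actual CronJob: the `/stremio/<uuid>/<encryptedPassword>/` URL layout, the env var names, and the timeouts are hypothetical, and the encryptedPassword fetch over the ClusterIP is taken as already done because its endpoint isn't described here. The `/stream/series/<imdb_id>:<season>:<episode>.json` route and the `{"streams": [...]}` response shape follow the Stremio addon protocol; `/metrics/job/<job>` with a text-format body is the Pushgateway push API. Pushing `last_run_timestamp` is inferred from the stale alert.

```python
import json
import os
import time
import urllib.request

# Canary title: Breaking Bad (IMDb tt0903747), S01E01, in Stremio's id:season:episode form.
CANARY_ID = "tt0903747:1:1"


def stream_url(base: str, uuid: str, encrypted_password: str, video_id: str = CANARY_ID) -> str:
    # ASSUMPTION: the user's addon is served under /stremio/<uuid>/<encryptedPassword>/.
    # The /stream/series/<id>.json suffix is the standard Stremio stream route.
    return f"{base.rstrip('/')}/stremio/{uuid}/{encrypted_password}/stream/series/{video_id}.json"


def count_streams(body: bytes) -> int:
    # Stremio stream responses are shaped like {"streams": [...]}.
    return len(json.loads(body).get("streams", []))


def pushgateway_payload(streams_count: int, success: bool, now: float) -> str:
    # Prometheus text exposition format; the last line must end with a newline.
    lines = [
        "# TYPE streams_count gauge",
        f"streams_count {streams_count}",
        "# TYPE probe_success gauge",
        f"probe_success {1 if success else 0}",
        "# TYPE last_run_timestamp gauge",
        f"last_run_timestamp {int(now)}",
    ]
    return "\n".join(lines) + "\n"


def push(gateway: str, job: str, payload: str) -> None:
    # PUT replaces every metric in this job's group on the Pushgateway.
    req = urllib.request.Request(
        f"{gateway.rstrip('/')}/metrics/job/{job}",
        data=payload.encode(),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )
    with urllib.request.urlopen(req, timeout=10):
        pass


def main() -> None:
    # Hypothetical env var names; UUID and password would come from the ExternalSecret.
    url = stream_url(
        os.environ["AIOSTREAMS_URL"],
        os.environ["AIOSTREAMS_UUID"],
        os.environ["AIOSTREAMS_ENCRYPTED_PASSWORD"],
    )
    try:
        # Generous timeout (an assumption): the search fans out to several upstream addons.
        with urllib.request.urlopen(url, timeout=60) as resp:
            count, ok = count_streams(resp.read()), True
    except Exception:
        # Any fetch or parse failure is reported as probe_success=0.
        count, ok = 0, False
    push(os.environ["PUSHGATEWAY_URL"], "aiostreams-stream-probe",
         pushgateway_payload(count, ok, time.time()))


# Only probe when configured, so importing or testing this module stays offline.
if __name__ == "__main__" and "AIOSTREAMS_URL" in os.environ:
    main()
```

Keeping URL building, response parsing, and payload rendering as pure functions lets them be tested without a cluster; only `main` touches the network.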
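The three alerts in item 3 can be expressed as a Prometheus rule group. The sketch below builds one as a Python dict and prints it as JSON (which is also valid YAML). The PromQL expressions are a plausible reading of the thresholds above, not the exact contents of prometheus_chart_values.tpl; the bare metric names, the group name `aiostreams`, and the summary annotations are assumptions.

```python
import json


def aiostreams_rules() -> dict:
    # ASSUMPTION: metrics are named exactly as the probe pushes them, with no prefix.
    rules = [
        ("AIOStreamsStreamCountLow", "streams_count < 50", "30m",
         "Canary search returned fewer than 50 streams"),
        ("AIOStreamsProbeFailing", "probe_success == 0", "30m",
         "AIOStreams stream probe is failing"),
        # Stale: the last successful push is more than 30 minutes (1800 s) old.
        ("AIOStreamsProbeStale", "time() - last_run_timestamp > 1800", "10m",
         "AIOStreams stream probe has not reported for over 30 minutes"),
    ]
    return {
        "groups": [{
            "name": "aiostreams",  # hypothetical group name
            "rules": [
                {"alert": name, "expr": expr, "for": for_,
                 "annotations": {"summary": summary}}
                for name, expr, for_, summary in rules
            ],
        }]
    }


if __name__ == "__main__":
    # JSON output parses as YAML, so it can be read as a Prometheus rules file.
    print(json.dumps(aiostreams_rules(), indent=2))
```

Because Pushgateway keeps the last pushed value, `time() - last_run_timestamp` keeps growing once the CronJob stops pushing, which is what lets the stale alert fire; a series that was never pushed at all would need an `absent()` check instead.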
Verified: the probe returned streams=411 success=1 on its first run;
all 3 alerts loaded into Prometheus with state=inactive health=ok.