infra/stacks
OpenClaw 8154103ac4 feat(monitoring): Disable Loki centralized logging while preserving configuration
DECISION: Disable Loki because its operational overhead outweighs its benefit

EVIDENCE FROM NODE2 INCIDENT:
- Loki was the root cause of a major cluster outage (PVC storage exhaustion)
- Centralized logging was unavailable when needed most (Loki was down)
- All debugging was accomplished with simpler tools (kubectl logs, events, describe)
- Prometheus metrics proved more valuable than centralized logs

OPERATIONAL OVERHEAD ELIMINATED:
- 50GB iSCSI storage freed up (expensive)
- ~3.5GB memory freed up (Loki + Alloy agents across cluster)
- ~2+ CPU cores freed up for actual workloads
- Reduced complexity - fewer services to maintain and troubleshoot
- Eliminated single point of failure that can cascade cluster-wide

CONFIGURATION PRESERVED:
- All Terraform resources commented out (not deleted)
- loki.yaml preserved with 50GB configuration
- alloy.yaml preserved with log shipping configuration
- Alert rules and Grafana datasource preserved (commented)
- Easy re-enabling: just uncomment resources and apply

ALTERNATIVE DEBUGGING APPROACH:
- kubectl logs (always works, no storage dependency)
- kubectl get events (built-in Kubernetes events)
- Prometheus metrics (more reliable for monitoring)
- Enhanced health check scripts (direct status verification)
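
EXAMPLE (illustrative sketch, not part of the applied change):
The kubectl-based approach above could be wrapped roughly as follows. The
function name, the "monitoring" namespace, and the pod name are hypothetical
placeholders; only standard kubectl subcommands and flags are used (logs with
--tail and --previous, get events with --sort-by, describe pod).

```python
import shlex


def debug_commands(namespace, pod, tail=200):
    """Return kubectl argv lists covering logs, events, and describe for one pod.

    The namespace and pod are supplied by the caller; nothing here is
    specific to this cluster.
    """
    return [
        # Logs from the currently running container
        ["kubectl", "logs", pod, "-n", namespace, f"--tail={tail}"],
        # Logs from the previous container instance (useful after a crash)
        ["kubectl", "logs", pod, "-n", namespace, "--previous", f"--tail={tail}"],
        # Namespace events ordered by time
        ["kubectl", "get", "events", "-n", namespace, "--sort-by=.lastTimestamp"],
        # Scheduling, probe, and volume details for the pod
        ["kubectl", "describe", "pod", pod, "-n", namespace],
    ]


if __name__ == "__main__":
    # Hypothetical names, printed as copy-pasteable shell commands
    for argv in debug_commands("monitoring", "example-pod"):
        print(shlex.join(argv))
```

Printing the commands rather than executing them keeps the sketch free of side
effects; each list can be passed to subprocess.run unchanged if desired.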

RE-ENABLING:
To restore Loki later, uncomment all /* ... */ blocks in loki.tf
and apply via Terraform. All configuration is preserved.
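
EXAMPLE (illustrative sketch, not part of the applied change):
The uncommenting step could be scripted as below, assuming each /* and */
delimiter in loki.tf sits on a line of its own. That layout is an assumption;
the actual file is not shown here. The function name and the helm_release
resource in the usage example are hypothetical.

```python
def uncomment_blocks(hcl_text):
    """Drop lines that contain only a block-comment delimiter (/* or */).

    Assumes delimiters occupy their own lines; inline comments such as
    `foo = 1 /* note */` are left untouched.
    """
    kept = [
        line
        for line in hcl_text.splitlines()
        if line.strip() not in ("/*", "*/")
    ]
    # Preserve a trailing newline only when there is content left
    return "\n".join(kept) + ("\n" if kept else "")


if __name__ == "__main__":
    # Hypothetical commented-out resource, for demonstration only
    sample = '/*\nresource "helm_release" "loki" {\n  name = "loki"\n}\n*/\n'
    print(uncomment_blocks(sample), end="")
```

After rewriting the file, running terraform plan before terraform apply shows
which resources would be recreated.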

[ci skip] - Infrastructure changes applied locally first due to resource cleanup
2026-03-17 16:51:02 +00:00
actualbudget fix cluster health: pin actualbudget, spread MySQL, scale grampsweb, fix GPU toleration 2026-03-11 11:43:34 +00:00
affine [ci skip] add Homepage gethomepage.dev annotations to all services 2026-03-07 20:39:54 +00:00
audiobookshelf [ci skip] add widgets for audiobookshelf, changedetection, prowlarr, headscale 2026-03-07 20:39:55 +00:00
blog [ci skip] add Homepage gethomepage.dev annotations to all services 2026-03-07 20:39:54 +00:00
calibre resource quota review: fix OOM risks, close quota gaps, add HA protections 2026-03-08 18:17:46 +00:00
changedetection [ci skip] fix invalid Homepage dashboard icons for 9 services 2026-03-07 21:14:17 +00:00
city-guesser [ci skip] fix invalid Homepage dashboard icons for 9 services 2026-03-07 21:14:17 +00:00
coturn [ci skip] phase 5+6: update CI pipelines for SOPS, add sensitive=true to secret vars 2026-03-07 14:30:36 +00:00
cyberchef [ci skip] add Homepage gethomepage.dev annotations to all services 2026-03-07 20:39:54 +00:00
dashy resource quota review: fix OOM risks, close quota gaps, add HA protections 2026-03-08 18:17:46 +00:00
dawarich [ci skip] add Homepage gethomepage.dev annotations to all services 2026-03-07 20:39:54 +00:00
descheduler resource quota review: fix OOM risks, close quota gaps, add HA protections 2026-03-08 18:17:46 +00:00
diun [ci skip] phase 5+6: update CI pipelines for SOPS, add sensitive=true to secret vars 2026-03-07 14:30:36 +00:00
ebook2audiobook [ci skip] add Homepage gethomepage.dev annotations to all services 2026-03-07 20:39:54 +00:00
echo [ci skip] phase 5+6: update CI pipelines for SOPS, add sensitive=true to secret vars 2026-03-07 14:30:36 +00:00
excalidraw [ci skip] add Homepage gethomepage.dev annotations to all services 2026-03-07 20:39:54 +00:00
f1-stream [ci skip] fix invalid Homepage dashboard icons for 9 services 2026-03-07 21:14:17 +00:00
forgejo [ci skip] add Forgejo task pipeline for OpenClaw AI agent 2026-03-07 21:11:07 +00:00
freedify [ci skip] add Homepage gethomepage.dev annotations to all services 2026-03-07 20:39:54 +00:00
freshrss [ci skip] add widgets for qbittorrent, navidrome, nextcloud, freshrss, linkwarden, uptime-kuma 2026-03-07 20:39:55 +00:00
frigate fix Frigate GPU stall: add inference speed check to liveness probe 2026-03-13 10:23:21 +00:00
grampsweb fix cluster health: pin actualbudget, spread MySQL, scale grampsweb, fix GPU toleration 2026-03-11 11:43:34 +00:00
hackmd [ci skip] add Homepage gethomepage.dev annotations to all services 2026-03-07 20:39:54 +00:00
health [ci skip] add Homepage gethomepage.dev annotations to all services 2026-03-07 20:39:54 +00:00
homepage add nginx caching proxy for Homepage widget API requests 2026-03-07 21:11:07 +00:00
immich [ci skip] fix widget issues: ports, Immich v2 API, Nextcloud trusted domains 2026-03-07 20:39:56 +00:00
infra feat(monitoring): Enhance disk monitoring and containerd GC after node2 incident 2026-03-17 16:51:02 +00:00
isponsorblocktv [ci skip] phase 5+6: update CI pipelines for SOPS, add sensitive=true to secret vars 2026-03-07 14:30:36 +00:00
jsoncrack [ci skip] add Homepage gethomepage.dev annotations to all services 2026-03-07 20:39:54 +00:00
k8s-dashboard [ci skip] add Homepage gethomepage.dev annotations to all services 2026-03-07 20:39:54 +00:00
kms [ci skip] add Homepage gethomepage.dev annotations to all services 2026-03-07 20:39:54 +00:00
linkwarden [ci skip] fix widget URLs: use correct k8s service ports 2026-03-07 20:39:56 +00:00
matrix [ci skip] add Homepage gethomepage.dev annotations to all services 2026-03-07 20:39:54 +00:00
meshcentral [ci skip] add Homepage gethomepage.dev annotations to all services 2026-03-07 20:39:54 +00:00
n8n [ci skip] add Forgejo task pipeline for OpenClaw AI agent 2026-03-07 21:11:07 +00:00
navidrome [ci skip] fix widget URLs: use correct k8s service ports 2026-03-07 20:39:56 +00:00
netbox [ci skip] add Homepage gethomepage.dev annotations to all services 2026-03-07 20:39:54 +00:00
networking-toolbox [ci skip] fix Homepage icons for Tandoor, Listenarr, Networking Toolbox, Goldilocks 2026-03-07 21:29:51 +00:00
nextcloud fix(nextcloud): Database corruption recovery and conservative Apache tuning 2026-03-17 16:51:01 +00:00
ntfy [ci skip] add Homepage gethomepage.dev annotations to all services 2026-03-07 20:39:54 +00:00
ollama [ci skip] add Homepage gethomepage.dev annotations to all services 2026-03-07 20:39:54 +00:00
onlyoffice [ci skip] add Homepage gethomepage.dev annotations to all services 2026-03-07 20:39:54 +00:00
openclaw [ci skip] add Forgejo task pipeline for OpenClaw AI agent 2026-03-07 21:11:07 +00:00
osm_routing [ci skip] phase 5+6: update CI pipelines for SOPS, add sensitive=true to secret vars 2026-03-07 14:30:36 +00:00
owntracks [ci skip] add Homepage gethomepage.dev annotations to all services 2026-03-07 20:39:54 +00:00
paperless-ngx [ci skip] add widgets for audiobookshelf, changedetection, prowlarr, headscale 2026-03-07 20:39:55 +00:00
platform feat(monitoring): Disable Loki centralized logging while preserving configuration 2026-03-17 16:51:02 +00:00
plotting-book set Recreate strategy for plotting-book deployment 2026-03-10 23:47:30 +00:00
poison-fountain [ci skip] fix invalid Homepage dashboard icons for 9 services 2026-03-07 21:14:17 +00:00
privatebin [ci skip] add Homepage gethomepage.dev annotations to all services 2026-03-07 20:39:54 +00:00
real-estate-crawler resource quota review: fix OOM risks, close quota gaps, add HA protections 2026-03-08 18:17:46 +00:00
reloader [ci skip] phase 5+6: update CI pipelines for SOPS, add sensitive=true to secret vars 2026-03-07 14:30:36 +00:00
resume [ci skip] fix invalid Homepage dashboard icons for 9 services 2026-03-07 21:14:17 +00:00
rybbit [ci skip] add Homepage gethomepage.dev annotations to all services 2026-03-07 20:39:54 +00:00
send [ci skip] add liveness probe to Send deployment 2026-03-07 20:39:57 +00:00
servarr resource quota review: fix OOM risks, close quota gaps, add HA protections 2026-03-08 18:17:46 +00:00
shadowsocks [ci skip] phase 5+6: update CI pipelines for SOPS, add sensitive=true to secret vars 2026-03-07 14:30:36 +00:00
speedtest [ci skip] fix broken Homepage widgets + add service API tokens to SOPS 2026-03-07 20:39:55 +00:00
stirling-pdf [ci skip] add Homepage gethomepage.dev annotations to all services 2026-03-07 20:39:54 +00:00
tandoor [ci skip] fix Homepage icons for Tandoor, Listenarr, Networking Toolbox, Goldilocks 2026-03-07 21:29:51 +00:00
terminal Add terminal stack - reverse proxy to ttyd behind authentik 2026-03-10 23:46:01 +00:00
tor-proxy [ci skip] phase 5+6: update CI pipelines for SOPS, add sensitive=true to secret vars 2026-03-07 14:30:36 +00:00
trading-bot resource quota review: fix OOM risks, close quota gaps, add HA protections 2026-03-08 18:17:46 +00:00
travel_blog [ci skip] add Homepage gethomepage.dev annotations to all services 2026-03-07 20:39:54 +00:00
tuya-bridge [ci skip] fix invalid Homepage dashboard icons for 9 services 2026-03-07 21:14:17 +00:00
url [ci skip] add Homepage widget credentials for Authentik, Shlink, Home Assistant 2026-03-07 20:39:54 +00:00
wealthfolio [ci skip] fix Wealthfolio Homepage icon: wealthfolio.png → mdi-finance 2026-03-07 21:32:58 +00:00
webhook_handler [ci skip] add Homepage gethomepage.dev annotations to all services 2026-03-07 20:39:54 +00:00
whisper [ci skip] phase 5+6: update CI pipelines for SOPS, add sensitive=true to secret vars 2026-03-07 14:30:36 +00:00
woodpecker resource quota review: fix OOM risks, close quota gaps, add HA protections 2026-03-08 18:17:46 +00:00
ytdlp [ci skip] add Homepage gethomepage.dev annotations to all services 2026-03-07 20:39:54 +00:00