infra/stacks/platform/modules/monitoring
Viktor Barzin d8bcdfef2e revert MaxRequestWorkers to 50, exclude nextcloud from 5xx alert
- MaxRequestWorkers 25→50: too few workers caused ALL workers to block
  on SQLite locks, making liveness probes fail even faster (131 restarts
  vs 50 before). 50 is a compromise — enough workers for probes.
- Excluded nextcloud from HighServiceErrorRate alert (chronic SQLite issue)
- MySQL migration attempted but hit: GR error 3100 (fixed with GIPK),
  emoji in calendar/filecache (stripped), SQLite corruption (pre-existing
  from crash-looping). Migration rolled back, Nextcloud restored to SQLite.
2026-03-09 22:05:20 +00:00
..
dashboards [ci skip] color only public IPs red in service map, private IPs (10.x, 192.168.x) get light blue 2026-02-28 19:44:16 +00:00
server-power-cycle [ci skip] Move Terraform modules into stack directories 2026-02-22 14:38:14 +00:00
alloy.yaml [ci skip] Infrastructure hardening: security, monitoring, reliability, maintainability 2026-02-23 22:05:28 +00:00
caretta.tf resource quota review: fix OOM risks, close quota gaps, add HA protections 2026-03-08 18:17:46 +00:00
Dockerfile [ci skip] Move Terraform modules into stack directories 2026-02-22 14:38:14 +00:00
goflow2.tf [ci skip] fix caretta helm values and goflow2 transport args 2026-02-28 18:51:02 +00:00
grafana.tf [ci skip] fix: add mount_options to all NFS PVs (soft,timeo=30,retrans=3) 2026-03-02 20:23:36 +00:00
grafana_chart_values.yaml resource quota review: fix OOM risks, close quota gaps, add HA protections 2026-03-08 18:17:46 +00:00
idrac.tf [ci skip] platform: add ndots=2 dns_config to all deployment pod specs 2026-02-23 22:43:05 +00:00
k8s-monitoring-values.yaml [ci skip] Move Terraform modules into stack directories 2026-02-22 14:38:14 +00:00
loki.tf revert MaxRequestWorkers to 50, exclude nextcloud from 5xx alert 2026-03-09 22:05:20 +00:00
loki.yaml [ci skip] migrate Redis, Prometheus, Loki storage to iSCSI 2026-03-06 20:50:55 +00:00
main.tf resource quota review: fix OOM risks, close quota gaps, add HA protections 2026-03-08 18:17:46 +00:00
prometheus.tf [ci skip] expand Prometheus PVC to 200Gi, increase retention to 180GB for 1-year history 2026-03-06 23:16:32 +00:00
prometheus_chart_values.tpl revert MaxRequestWorkers to 50, exclude nextcloud from 5xx alert 2026-03-09 22:05:20 +00:00
prometheus_snmp_chart_values.yaml [ci skip] Move Terraform modules into stack directories 2026-02-22 14:38:14 +00:00
pve_exporter.tf [ci skip] platform: add ndots=2 dns_config to all deployment pod specs 2026-02-23 22:43:05 +00:00
snmp_exporter.tf [ci skip] platform: add ndots=2 dns_config to all deployment pod specs 2026-02-23 22:43:05 +00:00
ups_snmp_values.yaml [ci skip] Move Terraform modules into stack directories 2026-02-22 14:38:14 +00:00