infra/stacks/platform/modules/monitoring
Viktor Barzin 69b513992a Right-size CPU requests cluster-wide and remove missed CPU limits
Increase requests for under-requested pods (dashy 50m→250m, frigate 500m→1500m,
clickhouse 100m→500m, otp 100m→300m, linkwarden 25m→50m, authentik worker 50m→100m).

Reduce requests for over-requested pods (crowdsec agent/lapi 500m→25m each,
prometheus 200m→100m, dbaas mysql 1800m→100m, pg-cluster 250m→50m,
shlink-web 250m→10m, gpu-pod-exporter 50m→10m, stirling-pdf 100m→25m,
technitium 100m→25m, celery 50m→15m). Reduce crowdsec quota from 8→1 CPU.

Remove missed CPU limits in prometheus (cpu: "2") and dbaas (cpu: "3600m") tpl files.
2026-03-18 08:03:58 +00:00
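For illustration only, the Prometheus part of this change (request 200m→100m, `cpu: "2"` limit removed) might look like the following standard Kubernetes `resources` block. This is a sketch, not copied from the repository: where the block sits inside prometheus_chart_values.tpl and what surrounds it are assumptions, and memory settings are left out because the commit message does not mention them.

```yaml
# Hypothetical fragment: a standard Kubernetes container resources block.
# Only the CPU values come from the commit message; the location of this
# block within the chart values file is assumed.
resources:
  requests:
    cpu: 100m   # reduced from 200m by this commit
  # No limits.cpu entry: the previously missed `cpu: "2"` limit is removed,
  # so the container is not subject to CFS quota throttling.
```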
dashboards Add node hang instrumentation and scale down chromium services 2026-03-18 08:03:58 +00:00
server-power-cycle [ci skip] Move Terraform modules into stack directories 2026-02-22 14:38:14 +00:00
alloy.yaml Remove all CPU limits cluster-wide to eliminate CFS throttling 2026-03-18 08:03:58 +00:00
caretta.tf Remove all CPU limits cluster-wide to eliminate CFS throttling 2026-03-18 08:03:58 +00:00
Dockerfile [ci skip] Move Terraform modules into stack directories 2026-02-22 14:38:14 +00:00
goflow2.tf Remove all CPU limits cluster-wide to eliminate CFS throttling 2026-03-18 08:03:58 +00:00
grafana.tf [ci skip] fix: add mount_options to all NFS PVs (soft,timeo=30,retrans=3) 2026-03-02 20:23:36 +00:00
grafana_chart_values.yaml Remove all CPU limits cluster-wide to eliminate CFS throttling 2026-03-18 08:03:58 +00:00
idrac.tf [ci skip] platform: add ndots=2 dns_config to all deployment pod specs 2026-02-23 22:43:05 +00:00
k8s-monitoring-values.yaml [ci skip] Move Terraform modules into stack directories 2026-02-22 14:38:14 +00:00
loki.tf feat(monitoring): Disable Loki centralized logging while preserving configuration 2026-03-17 16:51:02 +00:00
loki.yaml Remove all CPU limits cluster-wide to eliminate CFS throttling 2026-03-18 08:03:58 +00:00
main.tf Remove all CPU limits cluster-wide to eliminate CFS throttling 2026-03-18 08:03:58 +00:00
prometheus.tf [ci skip] expand Prometheus PVC to 200Gi, increase retention to 180GB for 1-year history 2026-03-06 23:16:32 +00:00
prometheus_chart_values.tpl Right-size CPU requests cluster-wide and remove missed CPU limits 2026-03-18 08:03:58 +00:00
prometheus_snmp_chart_values.yaml [ci skip] Move Terraform modules into stack directories 2026-02-22 14:38:14 +00:00
pve_exporter.tf [ci skip] platform: add ndots=2 dns_config to all deployment pod specs 2026-02-23 22:43:05 +00:00
snmp_exporter.tf [ci skip] platform: add ndots=2 dns_config to all deployment pod specs 2026-02-23 22:43:05 +00:00
ups_snmp_values.yaml [ci skip] Move Terraform modules into stack directories 2026-02-22 14:38:14 +00:00