fix alerts and reduce Prometheus disk write rate
- linkwarden: add Reloader match annotation to DB secret so pods
  auto-restart on Vault credential rotation (was causing 100% 5xx)
- authentik: increase memory limits (server 1Gi→1.5Gi, worker
  896Mi→1Gi) to prevent OOM kills
- prometheus: drop 113k high-cardinality series to reduce HDD write
  rate from ~8.8 to ~6.0 MB/s (31% reduction):
  - drop all traefik/apiserver/etcd histogram bucket metrics
  - drop goflow2_flow_process_nf_templates_total (9.3k series)
  - drop container_tasks_state and container_memory_failures_total
- rewrite HighServiceLatency alert to use avg latency (_sum/_count)
- update cluster_health dashboard to match
- raise KubeletRuntimeOperationsLatency threshold from 30s to 60s
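
The Prometheus changes above are not visible in the hunks shown on this page, so the following is only a rough sketch of what such a change could look like, not the actual contents of this commit. It assumes a plain Prometheus scrape config and rule file. The job name, rule group name, 1s threshold, `for` duration, and severity label are hypothetical, and the Traefik metric names are assumed to be the standard `traefik_service_request_duration_seconds_*` series.

```yaml
# Document 1: scrape config fragment (would live in prometheus.yml).
# Drops series at ingestion time so they are never written to disk.
scrape_configs:
  - job_name: example-job            # hypothetical job name
    metric_relabel_configs:
      # Drop histogram bucket series for traefik/apiserver/etcd.
      # Prometheus anchors the regex, so it must match the full metric name.
      - source_labels: [__name__]
        regex: (traefik|apiserver|etcd)_.*_bucket
        action: drop
      # Drop the other high-cardinality metrics named in the commit message.
      - source_labels: [__name__]
        regex: goflow2_flow_process_nf_templates_total|container_tasks_state|container_memory_failures_total
        action: drop
---
# Document 2: alerting rule fragment (would live in a separate rules file).
# With the _bucket series gone, histogram_quantile() has no input, so the
# alert falls back to average latency computed as rate(_sum) / rate(_count).
groups:
  - name: example-latency            # hypothetical group name
    rules:
      - alert: HighServiceLatency
        expr: |
          sum by (service) (rate(traefik_service_request_duration_seconds_sum[5m]))
          /
          sum by (service) (rate(traefik_service_request_duration_seconds_count[5m]))
          > 1
        for: 10m                     # hypothetical hold duration
        labels:
          severity: warning          # hypothetical severity
        annotations:
          summary: "Average latency for {{ $labels.service }} is above 1s"
```

An average is less sensitive to tail latency than a percentile, so a threshold carried over from a p95 or p99 alert would likely need to be lowered to catch the same incidents.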
parent 7267e53e2f
commit 8a5a53a832
4 changed files with 334 additions and 10 deletions
@@ -25,9 +25,9 @@ server:
   resources:
     requests:
       cpu: 100m
-      memory: 1Gi
+      memory: 1.5Gi
     limits:
-      memory: 1Gi
+      memory: 1.5Gi
   topologySpreadConstraints:
     - maxSkew: 1
       topologyKey: kubernetes.io/hostname
@@ -58,9 +58,9 @@ worker:
   resources:
     requests:
       cpu: 100m
-      memory: 896Mi
+      memory: 1Gi
     limits:
-      memory: 896Mi
+      memory: 1Gi
   topologySpreadConstraints:
     - maxSkew: 1
       topologyKey: kubernetes.io/hostname