fix alerts and reduce Prometheus disk write rate

- linkwarden: add Reloader match annotation to DB secret so pods
  auto-restart on Vault credential rotation (was causing 100% 5xx)
- authentik: increase memory requests and limits (server 1Gi→1.5Gi,
  worker 896Mi→1Gi) to prevent OOM kills
- prometheus: drop 113k high-cardinality series to reduce HDD write rate
  from ~8.8 to ~6.0 MB/s (31% reduction):
  - drop all traefik/apiserver/etcd histogram bucket metrics
  - drop goflow2_flow_process_nf_templates_total (9.3k series)
  - drop container_tasks_state and container_memory_failures_total
  - rewrite HighServiceLatency alert to use avg latency (_sum/_count)
  - update cluster_health dashboard to match
- raise KubeletRuntimeOperationsLatency threshold from 30s to 60s
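
For orientation, the changes above might look roughly like the fragments
below. This is a hedged sketch, not the contents of the changed files: the
Secret name, the regexes, the Traefik metric name, the rule group name, and
the 1s/10m alert values are all assumptions chosen for illustration.

```yaml
# Sketch only. Every name, regex, and threshold here is an assumption.

# 1. Reloader: the Secret opts in with "match". The consuming workload
#    must separately carry reloader.stakater.com/search: "true".
apiVersion: v1
kind: Secret
metadata:
  name: linkwarden-db            # hypothetical Secret name
  annotations:
    reloader.stakater.com/match: "true"
---
# 2. Prometheus scrape-config fragment: drop series at ingestion time.
metric_relabel_configs:
  # Histogram buckets for the three noisy components (regex is a guess).
  - source_labels: [__name__]
    regex: (traefik|apiserver|etcd)_.*_bucket
    action: drop
  # Individual high-cardinality metrics named in this commit.
  - source_labels: [__name__]
    regex: goflow2_flow_process_nf_templates_total|container_tasks_state|container_memory_failures_total
    action: drop
---
# 3. Alert on average latency (_sum/_count) instead of a bucket quantile,
#    since the buckets are no longer stored.
groups:
  - name: service-latency        # hypothetical group name
    rules:
      - alert: HighServiceLatency
        expr: |
          sum by (service) (rate(traefik_service_request_duration_seconds_sum[5m]))
            /
          sum by (service) (rate(traefik_service_request_duration_seconds_count[5m]))
            > 1
        for: 10m                 # placeholder duration
```

An average hides tail latency that a histogram quantile would expose, so the
threshold likely needs to be lower than the one used with buckets.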
Viktor Barzin 2026-03-28 15:42:14 +02:00
parent 7267e53e2f
commit 8a5a53a832
4 changed files with 334 additions and 10 deletions

@@ -25,9 +25,9 @@ server:
   resources:
     requests:
       cpu: 100m
-      memory: 1Gi
+      memory: 1.5Gi
     limits:
-      memory: 1Gi
+      memory: 1.5Gi
   topologySpreadConstraints:
     - maxSkew: 1
       topologyKey: kubernetes.io/hostname
@@ -58,9 +58,9 @@ worker:
   resources:
     requests:
       cpu: 100m
-      memory: 896Mi
+      memory: 1Gi
     limits:
-      memory: 896Mi
+      memory: 1Gi
   topologySpreadConstraints:
     - maxSkew: 1
       topologyKey: kubernetes.io/hostname