equalize memory req=lim across 70+ containers using Prometheus 7d max data

After the node2 OOM incident, right-size memory across the cluster by setting
requests=limits based on max_over_time(container_memory_working_set_bytes[7d])
with 1.3x headroom. Eliminates the ~37Gi overcommit gap.

Categories:
- Safe equalization (50 containers): set req=lim where max7d is well within target
- Limit increases (8 containers): raise limits for services spiking above their current limits
- No Prometheus data (12 containers): conservatively set lim=req
- Exception: nextcloud keeps req=256Mi/lim=8Gi due to Apache memory spikes

Also increased the dbaas namespace quota from 12Gi to 16Gi to accommodate
mysql 4Gi limits across 3 replicas.

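The sizing rule described above (7-day peak working set times 1.3, with memory requests equal to limits) could be sketched in Terraform roughly as follows. This is an illustration only: the local names, the sample peak values, and the round-up-to-32Mi step are assumptions and do not come from this commit.

```hcl
# Hypothetical sketch: names, sample values, and the rounding rule are assumed.
locals {
  headroom = 1.3 # headroom factor stated in the commit message

  # Assumed 7-day peak working set per container, in MiB, e.g. taken from
  # max_over_time(container_memory_working_set_bytes[7d]) / 1024 / 1024.
  max7d_mib = {
    example_app = 90
    example_db  = 700
  }

  # Apply the headroom, then round up to the next multiple of 32 MiB
  # (the rounding step is an assumption, not something the commit specifies).
  memory_mib = {
    for name, peak in local.max7d_mib :
    name => ceil(peak * local.headroom / 32) * 32
  }

  # Memory requests equal limits, so the scheduler reserves exactly what the
  # container is allowed to use and no memory is overcommitted.
  resources = {
    for name, mib in local.memory_mib :
    name => {
      requests = { memory = "${mib}Mi" }
      limits   = { memory = "${mib}Mi" }
    }
  }
}

output "resources" {
  value = local.resources
}
```

With these sample peaks, a container that peaked at 90 MiB would get 128Mi for both request and limit. Containers without Prometheus data have no peak to feed into such a map, which is why the commit handles them separately by setting lim=req.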
parent eb0301b02b
commit 23019da8e5
39 changed files with 211 additions and 74 deletions
@@ -24,14 +24,42 @@ resource "helm_release" "nfs_csi_driver" {
       controller = {
         replicas = 2
         resources = {
-          requests = { cpu = "10m", memory = "32Mi" }
-          limits = { memory = "128Mi" }
+          csiProvisioner = {
+            requests = { cpu = "10m", memory = "128Mi" }
+            limits = { memory = "128Mi" }
+          }
+          csiResizer = {
+            requests = { cpu = "10m", memory = "128Mi" }
+            limits = { memory = "128Mi" }
+          }
+          csiSnapshotter = {
+            requests = { cpu = "10m", memory = "128Mi" }
+            limits = { memory = "128Mi" }
+          }
+          nfs = {
+            requests = { cpu = "10m", memory = "128Mi" }
+            limits = { memory = "128Mi" }
+          }
+          livenessProbe = {
+            requests = { cpu = "10m", memory = "64Mi" }
+            limits = { memory = "64Mi" }
+          }
         }
       }
       node = {
         resources = {
-          requests = { cpu = "10m", memory = "32Mi" }
-          limits = { memory = "128Mi" }
+          nfs = {
+            requests = { cpu = "10m", memory = "128Mi" }
+            limits = { memory = "128Mi" }
+          }
+          livenessProbe = {
+            requests = { cpu = "10m", memory = "64Mi" }
+            limits = { memory = "64Mi" }
+          }
+          nodeDriverRegistrar = {
+            requests = { cpu = "10m", memory = "64Mi" }
+            limits = { memory = "64Mi" }
+          }
         }
       }
       storageClass = {