infra/.planning/quick/resource-audit-live-metrics.md
Viktor Barzin 197cef7f3f [ci skip] add auto-generated tiers.tf, planning docs, and helm chart cache
- tiers.tf: Terragrunt-generated tier locals for all standalone stacks
- .planning/: resource audit research and plans
- docs/plans/: cluster hardening design doc
- redis-25.3.2.tgz: Bitnami Redis Helm chart cache
2026-03-06 23:55:57 +00:00


# Kubernetes Cluster Resource Audit - Live Metrics
**Collected**: 2026-03-01
**Cluster**: 5 nodes (k8s-master + k8s-node1-4), Kubernetes v1.34.2
---
## EXECUTIVE SUMMARY
### Critical Issues
#### OOMKilled Pods
| Namespace | Pod | Status |
|-----------|-----|--------|
| dbaas | mysql-cluster-0 | OOMKilled (last state) |
#### CrashLoopBackOff / ImagePullBackOff Pods
| Namespace | Pod | Status |
|-----------|-----|--------|
| vpa | vpa-admission-certgen-kdvqj | ImagePullBackOff |
#### Pods with NO Resource Limits (unbounded)
These pods have `<none>` for CPU and/or memory limits -- they can consume unlimited node resources:
| Namespace | Pod | Container | CPU Limit | Mem Limit |
|-----------|-----|-----------|-----------|-----------|
| calico-apiserver | calico-apiserver-*-bq6zp | calico-apiserver | `<none>` | `<none>` |
| calico-apiserver | calico-apiserver-*-q794h | calico-apiserver | `<none>` | `<none>` |
| calico-system | calico-kube-controllers-* | calico-kube-controllers | `<none>` | `<none>` |
| calico-system | calico-node-* (5 pods) | calico-node | `<none>` | `<none>` |
| calico-system | calico-typha-*-9wr7z | calico-typha | `<none>` | `<none>` |
| calico-system | calico-typha-*-hw8wt | calico-typha | `<none>` | `<none>` |
| calico-system | calico-typha-*-z69vx | calico-typha | `<none>` | `<none>` |
| calico-system | csi-node-driver-* (5 pods) | calico-csi, csi-node-driver-registrar | `<none>` | `<none>` |
| kube-system | etcd-k8s-master | etcd | `<none>` | `<none>` |
| kube-system | kube-apiserver-k8s-master | kube-apiserver | `<none>` | `<none>` |
| kube-system | kube-controller-manager-k8s-master | kube-controller-manager | `<none>` | `<none>` |
| kube-system | kube-proxy-* (5 pods) | kube-proxy | `<none>` | `<none>` |
| kube-system | kube-scheduler-k8s-master | kube-scheduler | `<none>` | `<none>` |
| kyverno | kyverno-admission-controller-* (2 pods) | kyverno | `<none>` (CPU) | 768Mi |
| kyverno | kyverno-background-controller-* | controller | `<none>` (CPU) | 128Mi |
| kyverno | kyverno-cleanup-controller-* | controller | `<none>` (CPU) | 128Mi |
| kyverno | kyverno-reports-controller-* | controller | `<none>` (CPU) | 128Mi |
| metallb-system | controller-* | controller | `<none>` | `<none>` |
| metallb-system | speaker-dn9bk | speaker | `<none>` | `<none>` |
| metallb-system | speaker-mnpsl | speaker | `<none>` | `<none>` |
| metallb-system | speaker-pl8dz | speaker | `<none>` | `<none>` |
| nvidia | nvidia-driver-daemonset-x2r6b | nvidia-driver-ctr | `<none>` | `<none>` |
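**Note**: Missing limits are typical for the kube-system control-plane components and for the Calico (CNI) pods in calico-system and calico-apiserver. The NVIDIA driver daemonset is also expected to run without limits. The Kyverno pods lack only a CPU limit. MetalLB pods without limits should be monitored.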
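A table like the one above can be regenerated from the output of `kubectl get pods -A -o json`. The following is a minimal sketch, assuming the standard pod JSON layout (`items[].spec.containers[].resources.limits`); the function name and the sample data are hypothetical and only mirror two rows of the table.

```python
def containers_missing_limits(pod_list: dict) -> list[tuple[str, str, str, list[str]]]:
    """Return (namespace, pod, container, missing) for every container
    that lacks a CPU and/or memory limit."""
    findings = []
    for pod in pod_list.get("items", []):
        meta = pod["metadata"]
        for container in pod["spec"]["containers"]:
            limits = container.get("resources", {}).get("limits") or {}
            missing = [r for r in ("cpu", "memory") if r not in limits]
            if missing:
                findings.append(
                    (meta["namespace"], meta["name"], container["name"], missing)
                )
    return findings


# Hypothetical sample shaped like `kubectl get pods -A -o json` output.
sample = {
    "items": [
        {
            "metadata": {"namespace": "kyverno", "name": "kyverno-background-controller-x"},
            "spec": {"containers": [
                {"name": "controller", "resources": {"limits": {"memory": "128Mi"}}}
            ]},
        },
        {
            "metadata": {"namespace": "kube-system", "name": "etcd-k8s-master"},
            "spec": {"containers": [{"name": "etcd", "resources": {}}]},
        },
    ]
}

for row in containers_missing_limits(sample):
    print(row)
```

For live data, the same function could be fed the result of `json.load(sys.stdin)` with the kubectl output piped in. Init containers are ignored in this sketch.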
#### Pods Near or Exceeding Memory Limits (>75% utilization; lower rows listed for context)
| Namespace | Pod | Current Usage | Memory Limit | % Used |
|-----------|-----|--------------|--------------|--------|
| dbaas | mysql-cluster-0 | 1845Mi | 2Gi (sidecar:512Mi + mysql:2Gi) | ~90% of mysql container limit |
| dbaas | mysql-cluster-2 | 1212Mi | 2Gi (sidecar:512Mi + mysql:2Gi) | ~59% of mysql container limit |
| dbaas | mysql-cluster-1 | 1083Mi | 2Gi (sidecar:512Mi + mysql:2Gi) | ~53% of mysql container limit |
| dashy | dashy-* | 1048Mi | 4Gi | 26% but NOTE: 490m CPU near 500m limit (98%) |
| onlyoffice | onlyoffice-document-server-* | 1007Mi | 4Gi | 25% |
| stirling-pdf | stirling-pdf-* | 902Mi | 4Gi | 22% |
| trading-bot | trading-bot-workers-* | 1901Mi | 2Gi (sentiment-analyzer) | ~93% if all usage were in the largest container (pod total across 6 containers) |
| authentik | goauthentik-server-*-x68p7 | 593Mi | 1Gi | 58% |
| authentik | goauthentik-server-*-4bjll | 583Mi | 1Gi | 57% |
| authentik | goauthentik-server-*-z68g8 | 548Mi | 1Gi | 54% |
| authentik | goauthentik-worker-*-klk6z | 551Mi | 1Gi | 54% |
| servarr | flaresolverr-* | 148Mi | 256Mi | 58% |
| speedtest | speedtest-* | 147Mi | ~1.2Gi | 12% |
| cnpg-system | cnpg-cloudnative-pg-* | 72Mi | 256Mi | 28% |
| mailserver | mailserver-* | 183Mi | 256Mi+256Mi | 36% of the combined 512Mi |
| vpa | vpa-recommender-* | 74Mi | 512Mi | 14% (but the 500Mi request is nearly equal to the limit) |
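
The "% Used" column is plain arithmetic once quantities are normalized to one unit. A minimal sketch, assuming only the binary suffixes that appear in this report (`Ki`, `Mi`, `Gi`); the helper names are hypothetical and the 75% threshold is the one from the heading:

```python
# Binary suffixes used in this report, expressed in MiB.
_MIB_PER_UNIT = {"Ki": 1 / 1024, "Mi": 1, "Gi": 1024}


def to_mib(quantity: str) -> float:
    """Convert a quantity such as '1845Mi' or '2Gi' to MiB."""
    for suffix, factor in _MIB_PER_UNIT.items():
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * factor
    raise ValueError(f"unsupported quantity: {quantity!r}")


def percent_of_limit(used: str, limit: str) -> float:
    """Memory usage as a percentage of the memory limit."""
    return 100 * to_mib(used) / to_mib(limit)


# Usage and limit values copied from the table above.
rows = [
    ("dbaas/mysql-cluster-0", "1845Mi", "2Gi"),
    ("authentik/goauthentik-server-*-x68p7", "593Mi", "1Gi"),
    ("servarr/flaresolverr", "148Mi", "256Mi"),
]

for pod, used, limit in rows:
    pct = percent_of_limit(used, limit)
    flag = "above 75%" if pct > 75 else "ok"
    print(f"{pod}: {pct:.0f}% ({flag})")
```

For multi-container pods, `kubectl top pod` reports the pod total, so comparing it against a single container's limit gives an upper bound rather than an exact figure.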
#### Pods with CPU Near Limit (potential throttling)
| Namespace | Pod | Current CPU | CPU Limit | % Used |
|-----------|-----|------------|-----------|--------|
| dashy | dashy-* | 490m | 500m | **98%** -- actively throttling |
| stirling-pdf | stirling-pdf-* | 299m | 300m | **99.7%** -- actively throttling |
| frigate | frigate-* | 860m | 8000m | 11% |
| crowdsec | crowdsec-agent-rkvf2 | 13m | 500m | 3% (but req=limit=500m) |
| redis | redis-node-0 | 44m | 500m (redis) + 200m (sentinel) | 6% |
| redis | redis-node-1 | 43m | 1260m (redis) + 140m (sentinel) | 3% |
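
The throttling flags above follow from comparing usage with the limit in millicores. A minimal sketch; the 95% threshold and the function names are assumptions for illustration, not values from the audit:

```python
def to_millicores(cpu: str) -> int:
    """Convert '490m' or a whole-core value such as '8' to millicores."""
    if cpu.endswith("m"):
        return int(cpu[:-1])
    return int(float(cpu) * 1000)


def throttling_risk(used: str, limit: str, threshold: float = 0.95) -> bool:
    """True when usage is at or above `threshold` of the CPU limit."""
    return to_millicores(used) / to_millicores(limit) >= threshold


# Values copied from the table above.
for pod, used, limit in [
    ("dashy", "490m", "500m"),
    ("stirling-pdf", "299m", "300m"),
    ("frigate", "860m", "8000m"),
]:
    print(pod, "at risk" if throttling_risk(used, limit) else "ok")
```

A usage reading close to the limit is only a hint. Where cAdvisor metrics are scraped, the `container_cpu_cfs_throttled_periods_total` counter is a more direct signal of throttling.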
---
## NODE-LEVEL RESOURCE USAGE
| Node | CPU (cores) | CPU % | Memory | Memory % |
|------|-------------|-------|--------|----------|
| k8s-master | 805m | 10% | 5132Mi | 65% |
| k8s-node1 | 1002m | 6% | 9192Mi | 57% |
| k8s-node2 | 894m | 11% | 11517Mi | 48% |
| k8s-node3 | 781m | 9% | 13103Mi | 54% |
| k8s-node4 | 1333m | 16% | 13122Mi | 54% |
| **TOTAL** | **4815m** | **~10%** | **52066Mi** | **~55%** |
**Observations**:
- Memory is the tighter resource (~55% cluster-wide); CPU is abundant (~10%)
- k8s-master at 65% memory -- highest, but still has headroom
- k8s-node3 and k8s-node4 carry the most memory workloads (~13Gi each)
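The TOTAL row can be checked by summing the per-node figures. A minimal sketch using the values from the node table; the variable names are hypothetical:

```python
# node -> (CPU usage in millicores, memory usage in Mi), from the node table.
nodes = {
    "k8s-master": (805, 5132),
    "k8s-node1": (1002, 9192),
    "k8s-node2": (894, 11517),
    "k8s-node3": (781, 13103),
    "k8s-node4": (1333, 13122),
}

total_cpu = sum(cpu for cpu, _ in nodes.values())
total_mem = sum(mem for _, mem in nodes.values())

# Nodes ordered by memory usage, heaviest first.
by_memory = sorted(nodes, key=lambda name: nodes[name][1], reverse=True)

print(f"total: {total_cpu}m CPU, {total_mem}Mi memory")
print("heaviest by memory:", ", ".join(by_memory[:2]))
```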
---
## POD RESOURCE USAGE BY NAMESPACE (sorted by total memory)
### Top 20 Memory Consumers
| Rank | Namespace/Pod | CPU | Memory | Mem Limit |
|------|--------------|-----|--------|-----------|
| 1 | frigate/frigate | 860m | 3835Mi | 16Gi |
| 2 | kube-system/kube-apiserver | 376m | 2531Mi | `<none>` |
| 3 | monitoring/prometheus-server | 36m | 1912Mi | 4Gi |
| 4 | trading-bot/trading-bot-workers | 7m | 1901Mi | 2Gi (largest) |
| 5 | dbaas/mysql-cluster-0 | 62m | 1845Mi | 2Gi (mysql) |
| 6 | monitoring/loki-0 | 95m | 1335Mi | ~2.9Gi |
| 7 | immich/immich-machine-learning | 8m | 1215Mi | 16Gi |
| 8 | dbaas/mysql-cluster-2 | 32m | 1212Mi | 2Gi (mysql) |
| 9 | nvidia/nvidia-driver-daemonset | 0m | 1168Mi | `<none>` |
| 10 | dbaas/mysql-cluster-1 | 40m | 1083Mi | 2Gi (mysql) |
| 11 | dashy/dashy | 490m | 1048Mi | 4Gi |
| 12 | onlyoffice/onlyoffice-document-server | 3m | 1007Mi | 4Gi |
| 13 | stirling-pdf/stirling-pdf | 299m | 902Mi | 4Gi |
| 14 | tandoor/tandoor | 1m | 754Mi | ~3.1Gi |
| 15 | paperless-ngx/paperless-ngx | 4m | 691Mi | ~3.7Gi |
| 16 | linkwarden/linkwarden | 8m | 682Mi | ~3.3Gi |
| 17 | ollama/ollama-ui | 2m | 658Mi | ~5.8Gi |
| 18 | whisper/whisper | 1m | 628Mi | ~5.8Gi |
| 19 | realestate-crawler/celery | 2m | 608Mi | 2Gi |
| 20 | authentik/goauthentik-server (x3) | ~17m each | ~575Mi each | 1Gi |
### Top 10 CPU Consumers
| Rank | Namespace/Pod | CPU | CPU Limit |
|------|--------------|-----|-----------|
| 1 | frigate/frigate | 860m | 8000m |
| 2 | dashy/dashy | 490m | 500m |
| 3 | kube-system/kube-apiserver | 376m | `<none>` |
| 4 | stirling-pdf/stirling-pdf | 299m | 300m |
| 5 | kube-system/etcd | 216m | `<none>` |
| 6 | monitoring/loki-0 | 95m | 504m |
| 7 | authentik/goauthentik-worker-c5zfs | 81m | 2000m |
| 8 | authentik/goauthentik-worker-b5wzk | 62m | 2000m |
| 9 | dbaas/mysql-cluster-0 | 62m | 2000m |
| 10 | calico-system/calico-node-wllsb | 49m | `<none>` |
---
## DETAILED NAMESPACE BREAKDOWN
### actualbudget
| Pod | CPU Used | Mem Used | CPU Req/Lim | Mem Req/Lim |
|-----|----------|----------|-------------|-------------|
| actualbudget-anca | 1m | 42Mi | 25m/250m | 64Mi/256Mi |
| actualbudget-emo | 1m | 40Mi | 25m/250m | 64Mi/256Mi |
| actualbudget-http-api-anca | 1m | 26Mi | 25m/250m | 64Mi/256Mi |
| actualbudget-http-api-emo | 0m | 26Mi | 25m/250m | 64Mi/256Mi |
| actualbudget-http-api-viktor | 1m | 29Mi | 25m/250m | 64Mi/256Mi |
| actualbudget-viktor | 1m | 56Mi | 25m/250m | 64Mi/256Mi |
**Quota**: 150m/4000m CPU used, 384Mi/4Gi mem used, 6/30 pods
### affine
| Pod | CPU Used | Mem Used | CPU Req/Lim | Mem Req/Lim |
|-----|----------|----------|-------------|-------------|
| affine | 4m | 174Mi | 35m/700m | ~237Mi/~1.9Gi |
**Quota**: 35m/2000m CPU, ~237Mi/2Gi mem, 1/20 pods
### aiostreams
| Pod | CPU Used | Mem Used | CPU Req/Lim | Mem Req/Lim |
|-----|----------|----------|-------------|-------------|
| aiostreams | 1m | 215Mi | 50m/500m | 256Mi/768Mi |
### atuin
| Pod | CPU Used | Mem Used | CPU Req/Lim | Mem Req/Lim |
|-----|----------|----------|-------------|-------------|
| atuin | 1m | 2Mi | 50m/500m | 64Mi/256Mi |
### audiobookshelf
| Pod | CPU Used | Mem Used | CPU Req/Lim | Mem Req/Lim |
|-----|----------|----------|-------------|-------------|
| audiobookshelf | 1m | 55Mi | 15m/150m | ~100Mi/400Mi |
### authentik
| Pod | CPU Used | Mem Used | CPU Req/Lim | Mem Req/Lim |
|-----|----------|----------|-------------|-------------|
| ak-outpost-embedded | 6m | 18Mi | 50m/500m | 64Mi/512Mi |
| goauthentik-server (x3) | 14-21m | 548-593Mi | 100m/2000m | 512Mi/1Gi |
| goauthentik-worker (x3) | 40-81m | 420-551Mi | 50-100m/1000-2000m | 384-600Mi/1-1.6Gi |
| pgbouncer (x3) | 1-2m | 2Mi | 15-50m/150-500m | ~100Mi/512-800Mi |
**Quota**: 680m/16000m CPU, ~3.3Gi/16Gi mem, 10/50 pods
### calibre
| Pod | CPU Used | Mem Used | CPU Req/Lim | Mem Req/Lim |
|-----|----------|----------|-------------|-------------|
| annas-archive-stacks | 1m | 60Mi | 25m/250m | 64Mi/256Mi |
| calibre-web-automated | 1m | 196Mi | 23m/460m | ~640Mi/~2.6Gi |
### changedetection
| Pod | CPU Used | Mem Used | CPU Req/Lim | Mem Req/Lim |
|-----|----------|----------|-------------|-------------|
| changedetection (2 containers) | 6m | 111Mi | 25m+25m/250m+250m | 64Mi+64Mi/256Mi+256Mi |
### cloudflared
| Pod | CPU Used | Mem Used | CPU Req/Lim | Mem Req/Lim |
|-----|----------|----------|-------------|-------------|
| cloudflared (x3) | 3-9m | 31-59Mi | 50m/500m | 64Mi/512Mi |
### crowdsec
| Pod | CPU Used | Mem Used | CPU Req/Lim | Mem Req/Lim |
|-----|----------|----------|-------------|-------------|
| crowdsec-agent (x3) | 3-13m | 43-48Mi | 500m/500m | 250Mi/250Mi |
| crowdsec-lapi (x3) | 1m | 30-34Mi | 23m/23m | ~121Mi/~121Mi |
| crowdsec-web | 2m | 46Mi | 50m/500m | 64Mi/512Mi |
**Note**: crowdsec-agent has CPU req=limit=500m (Guaranteed QoS). Same for memory at 250Mi.
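The QoS class mentioned in the note follows from the requests and limits alone. A minimal sketch of the Kubernetes classification rules, assuming quantities are compared as strings (so `500m` and `0.5` would wrongly count as different); the function name and the input shape are hypothetical:

```python
def qos_class(containers: list[dict]) -> str:
    """Classify a pod from its containers' 'requests' and 'limits' mappings."""
    has_any = False
    guaranteed = True
    for container in containers:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        if requests or limits:
            has_any = True
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # A limit without an explicit request defaults the request to the limit.
            request = requests.get(resource, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not has_any:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


# Values from the crowdsec table above.
crowdsec_agent = [{
    "requests": {"cpu": "500m", "memory": "250Mi"},
    "limits": {"cpu": "500m", "memory": "250Mi"},
}]
crowdsec_web = [{
    "requests": {"cpu": "50m", "memory": "64Mi"},
    "limits": {"cpu": "500m", "memory": "512Mi"},
}]

print(qos_class(crowdsec_agent))
print(qos_class(crowdsec_web))
```

Guaranteed pods are the last to be evicted under node memory pressure, which is the usual reason to set request equal to limit. The cost is that the full request stays reserved even when usage is low.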
### dashy
| Pod | CPU Used | Mem Used | CPU Req/Lim | Mem Req/Lim |
|-----|----------|----------|-------------|-------------|
| dashy | **490m** | 1048Mi | 15m/**500m** | 512Mi/4Gi |
**WARNING**: CPU at 98% of limit -- actively being throttled!
### dawarich
| Pod | CPU Used | Mem Used | CPU Req/Lim | Mem Req/Lim |
|-----|----------|----------|-------------|-------------|
| dawarich | 1m | 438Mi | 15m/150m | ~600Mi/~2.4Gi |
### dbaas
| Pod | CPU Used | Mem Used | CPU Req/Lim | Mem Req/Lim |
|-----|----------|----------|-------------|-------------|
| mysql-cluster-0 | 62m | 1845Mi | 50m+250m/500m+2000m | 64Mi+1Gi/512Mi+2Gi |
| mysql-cluster-1 | 40m | 1083Mi | 50m+250m/500m+2000m | 64Mi+1Gi/512Mi+2Gi |
| mysql-cluster-2 | 32m | 1212Mi | 50m+250m/500m+2000m | 64Mi+1Gi/512Mi+2Gi |
| pg-cluster-1 | 22m | 335Mi | 250m/2000m | 512Mi/4Gi |
| pg-cluster-2 | 11m | 155Mi | 250m/2000m | 512Mi/4Gi |
| pgadmin | 1m | 265Mi | 50m/500m | 64Mi/512Mi |
| phpmyadmin | 1m | 46Mi | 50m/500m | 64Mi/512Mi |
**WARNING**: mysql-cluster-0 was OOMKilled previously. Currently at 1845Mi with 2Gi limit on mysql container (~90%).
**Quota**: 1500m/8000m CPU, 4416Mi/12Gi mem, 7/30 pods
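The quota figures above are the sum of the per-container requests in the dbaas table. A minimal sketch that reproduces them; the variable names are hypothetical:

```python
# pod -> (CPU request in millicores, memory request in Mi), summed over containers.
dbaas_requests = {
    "mysql-cluster-0": (50 + 250, 64 + 1024),
    "mysql-cluster-1": (50 + 250, 64 + 1024),
    "mysql-cluster-2": (50 + 250, 64 + 1024),
    "pg-cluster-1": (250, 512),
    "pg-cluster-2": (250, 512),
    "pgadmin": (50, 64),
    "phpmyadmin": (50, 64),
}

cpu_used = sum(cpu for cpu, _ in dbaas_requests.values())
mem_used = sum(mem for _, mem in dbaas_requests.values())
mem_hard = 12 * 1024  # 12Gi quota expressed in Mi

print(f"{cpu_used}m/8000m CPU, {mem_used}Mi/12Gi mem, "
      f"{len(dbaas_requests)}/30 pods, "
      f"{100 * mem_used / mem_hard:.0f}% of memory quota")
```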
### echo
| Pod | CPU Used | Mem Used | CPU Req/Lim | Mem Req/Lim |
|-----|----------|----------|-------------|-------------|
| echo (x5) | 0-1m | 19-30Mi | 15-25m/150-250m | 64Mi-100Mi/256-400Mi |
### forgejo
| Pod | CPU Used | Mem Used | CPU Req/Lim | Mem Req/Lim |
|-----|----------|----------|-------------|-------------|
| forgejo | 1m | 170Mi | 15m/500m | ~215Mi/~1.7Gi |
### freedify
| Pod | CPU Used | Mem Used | CPU Req/Lim | Mem Req/Lim |
|-----|----------|----------|-------------|-------------|
| music-emo | 2m | 68Mi | 100m/500m | 256Mi/512Mi |
| music-viktor | 2m | 57Mi | 100m/500m | 256Mi/512Mi |
### frigate
| Pod | CPU Used | Mem Used | CPU Req/Lim | Mem Req/Lim |
|-----|----------|----------|-------------|-------------|
| frigate | 860m | 3835Mi | 800m/8000m | 2Gi/16Gi |
**Note**: Highest memory consumer in the cluster. GPU tier (2-gpu).
### headscale
| Pod | CPU Used | Mem Used | CPU Req/Lim | Mem Req/Lim |
|-----|----------|----------|-------------|-------------|
| headscale (2 containers) | 1m | 65Mi | 50m+25m/200m+100m | 64Mi+32Mi/256Mi+128Mi |
### homepage
| Pod | CPU Used | Mem Used | CPU Req/Lim | Mem Req/Lim |
|-----|----------|----------|-------------|-------------|
| homepage | 1m | 86Mi | 15m/150m | ~121Mi/~484Mi |
### immich
| Pod | CPU Used | Mem Used | CPU Req/Lim | Mem Req/Lim |
|-----|----------|----------|-------------|-------------|
| immich-frame | 1m | 30Mi | 15m/150m | ~105Mi/~838Mi |
| immich-machine-learning | 8m | 1215Mi | 15m/150m | 2Gi/16Gi |
| immich-postgresql | 1m | 268Mi | 15m/150m | ~990Mi/~7.9Gi |
| immich-server | 3m | 404Mi | 800m/8000m | ~990Mi/~7.9Gi |
**Quota**: 845m/8000m CPU, ~4.1Gi/8Gi mem, 4/40 pods. Note: mem at ~51% of quota.
### kms
| Pod | CPU Used | Mem Used | CPU Req/Lim | Mem Req/Lim |
|-----|----------|----------|-------------|-------------|
| kms | 0m | 0Mi | 15m/15m | ~100Mi/1Gi |
| kms-web-page | 0m | 10Mi | 500m/500m | 512Mi/512Mi |
**Note**: kms-web-page has req=limit (Guaranteed QoS) at 500m CPU and 512Mi, but uses 0m/10Mi.
### linkwarden
| Pod | CPU Used | Mem Used | CPU Req/Lim | Mem Req/Lim |
|-----|----------|----------|-------------|-------------|
| linkwarden | 8m | 682Mi | 15m/150m | ~826Mi/~3.3Gi |
### mailserver
| Pod | CPU Used | Mem Used | CPU Req/Lim | Mem Req/Lim |
|-----|----------|----------|-------------|-------------|
| mailserver (2 containers) | 9m | 183Mi | 25m+25m/250m+250m | 64Mi+64Mi/256Mi+256Mi |
| roundcubemail | 1m | 44Mi | 25m/250m | 64Mi/256Mi |
### meshcentral
| Pod | CPU Used | Mem Used | CPU Req/Lim | Mem Req/Lim |
|-----|----------|----------|-------------|-------------|
| meshcentral | 1m | 127Mi | 15m/300m | ~283Mi/~850Mi |
### monitoring
| Pod | CPU Used | Mem Used | CPU Req/Lim | Mem Req/Lim |
|-----|----------|----------|-------------|-------------|
| alloy (x3, DaemonSet) | 44-47m | 182-201Mi | 63m+11m/252m+550m | ~422Mi+50Mi/~845Mi+512Mi |
| caretta (x4, DaemonSet) | 2-4m | 250-267Mi | 15m/225m | ~422Mi/~2.5Gi |
| goflow2 | 11m | 28Mi | 15m/60m | ~100Mi/400Mi |
| grafana (x3) | 18m | 232-235Mi | 11m+11m+35m/110m+110m+350m | multi-container |
| idrac-redfish-exporter | 3m | 9Mi | 15m/150m | ~100Mi/800Mi |
| loki-0 (2 containers) | 95m | 1335Mi | 126m+11m/504m+110m | ~1.9Gi+~121Mi/~2.9Gi+~968Mi |
| node-exporter (x5) | 1m | 9-24Mi | 15m/150m | ~100Mi/800Mi |
| prometheus-alertmanager | 2m | 24Mi | 15m/150m | ~100Mi/800Mi |
| prometheus-kube-state-metrics | 3m | 33Mi | 15m/150m | ~100Mi/800Mi |
| prometheus-pushgateway | 1m | 18Mi | 15m/150m | ~100Mi/800Mi |
| prometheus-server (2 containers) | 36m | 1912Mi | 11m+93m/110m+930m | 50Mi+512Mi/400Mi+4Gi |
| proxmox-exporter | 1m | 41Mi | 23m/230m | ~100Mi/800Mi |
| snmp-exporter | 2m | 14Mi | 15m/150m | ~100Mi/800Mi |
| sysctl-inotify (x5) | 0m | 0Mi | 15m/15m | ~100Mi/~100Mi |
**Quota**: 1177m/16000m CPU, ~9Gi/16Gi mem, 32/100 pods
### mysql-operator
| Pod | CPU Used | Mem Used | CPU Req/Lim | Mem Req/Lim |
|-----|----------|----------|-------------|-------------|
| mysql-operator | 4m | 254Mi | 23m/230m | ~309Mi/~1.2Gi |
### n8n
| Pod | CPU Used | Mem Used | CPU Req/Lim | Mem Req/Lim |
|-----|----------|----------|-------------|-------------|
| n8n | 2m | 425Mi | 15m/150m | ~524Mi/~2.1Gi |
### netbox
| Pod | CPU Used | Mem Used | CPU Req/Lim | Mem Req/Lim |
|-----|----------|----------|-------------|-------------|
| netbox | 1m | 480Mi | 50m/2000m | 512Mi/4Gi |
### nextcloud
| Pod | CPU Used | Mem Used | CPU Req/Lim | Mem Req/Lim |
|-----|----------|----------|-------------|-------------|
| nextcloud (2 containers) | 9m | 234Mi | 100m+11m/16000m+110m | ~1.3Gi+~121Mi/~8Gi+~484Mi |
| whiteboard | 1m | 62Mi | 25m/250m | 64Mi/256Mi |
**Quota**: 136m/4000m CPU, ~1.5Gi/8Gi mem, 2/10 pods
### nvidia
| Pod | CPU Used | Mem Used | CPU Req/Lim | Mem Req/Lim |
|-----|----------|----------|-------------|-------------|
| gpu-feature-discovery | 1m | 76Mi | 100m+100m/1000m+1000m | 256Mi+256Mi/2Gi+2Gi |
| gpu-operator | 14m | 63Mi | 200m/500m | 100Mi/350Mi |
| gpu-pod-exporter | 2m | 50Mi | 50m/200m | 128Mi/256Mi |
| nvidia-container-toolkit | 1m | 27Mi | 100m/1000m | 256Mi/2Gi |
| nvidia-dcgm-exporter | 17m | 538Mi | 100m/1000m | 256Mi/2Gi |
| nvidia-device-plugin | 1m | 47Mi | 100m+100m/1000m+1000m | 256Mi+256Mi/2Gi+2Gi |
| nvidia-driver-daemonset | 0m | 1168Mi | `<none>` | `<none>` |
| nvidia-exporter | 1m | 138Mi | 15m/150m | ~121Mi/~968Mi |
| nfd-gc | 1m | 9Mi | 15m/1500m | ~100Mi/800Mi |
| nfd-master | 1m | 27Mi | 100m/4000m | 128Mi/4Gi |
| nfd-worker (x5) | 1m | 14-18Mi | 15m/3000m | ~100Mi/800Mi |
| nvidia-operator-validator | 0m | 1Mi | 100m/1000m | 256Mi/2Gi |
### ollama
| Pod | CPU Used | Mem Used | CPU Req/Lim | Mem Req/Lim |
|-----|----------|----------|-------------|-------------|
| ollama | 1m | 11Mi | 500m/4000m | 4Gi/12Gi |
| ollama-ui | 2m | 658Mi | 15m/150m | ~729Mi/~5.8Gi |
**Note**: ollama pod at only 11Mi but reserves 4Gi -- GPU workload likely using VRAM instead.
### onlyoffice
| Pod | CPU Used | Mem Used | CPU Req/Lim | Mem Req/Lim |
|-----|----------|----------|-------------|-------------|
| onlyoffice-document-server | 3m | 1007Mi | 250m/8000m | 512Mi/4Gi |
**Quota**: 250m/4000m CPU, 512Mi/4Gi mem, 1/10 pods
### openclaw
| Pod | CPU Used | Mem Used | CPU Req/Lim | Mem Req/Lim |
|-----|----------|----------|-------------|-------------|
| openclaw (2 containers) | 2m | 447Mi | 100m+25m/2000m+500m | 512Mi+64Mi/2Gi+256Mi |
### osm-routing
| Pod | CPU Used | Mem Used | CPU Req/Lim | Mem Req/Lim |
|-----|----------|----------|-------------|-------------|
| osrm-bicycle | 0m | 366Mi | 15m/250m | ~454Mi/~909Mi |
| osrm-foot | 0m | 359Mi | 15m/150m | ~454Mi/~1.8Gi |
### paperless-ngx
| Pod | CPU Used | Mem Used | CPU Req/Lim | Mem Req/Lim |
|-----|----------|----------|-------------|-------------|
| paperless-ngx | 4m | 691Mi | 49m/980m | ~933Mi/~3.7Gi |
### realestate-crawler
| Pod | CPU Used | Mem Used | CPU Req/Lim | Mem Req/Lim |
|-----|----------|----------|-------------|-------------|
| realestate-crawler-api (x2) | 2m | 133-134Mi | 15m/600m | ~194Mi/~1.6Gi |
| realestate-crawler-celery | 2m | 608Mi | 100m/2000m | 512Mi/2Gi |
| realestate-crawler-celery-beat | 0m | 107Mi | 15m/300m | ~175Mi/~699Mi |
| realestate-crawler-ui (x2) | 0m | 7-8Mi | 15-25m/150-250m | 64-100Mi/256-400Mi |
### redis
| Pod | CPU Used | Mem Used | CPU Req/Lim | Mem Req/Lim |
|-----|----------|----------|-------------|-------------|
| redis-node-0 (redis+sentinel) | 44m | 47Mi | 50m+50m/500m+200m | 64Mi+64Mi/256Mi+128Mi |
| redis-node-1 (redis+sentinel) | 43m | 25Mi | 126m+35m/1260m+140m | ~50Mi+~50Mi/200Mi+100Mi |
### resume
| Pod | CPU Used | Mem Used | CPU Req/Lim | Mem Req/Lim |
|-----|----------|----------|-------------|-------------|
| printer | 3m | 109Mi | 15m/300m | 1Gi/4Gi |
| resume | 1m | 116Mi | 15m/300m | ~215Mi/~645Mi |
### rybbit
| Pod | CPU Used | Mem Used | CPU Req/Lim | Mem Req/Lim |
|-----|----------|----------|-------------|-------------|
| rybbit | 2m | 185Mi | 15m/150m | ~215Mi/~860Mi |
| rybbit-client | 1m | 89Mi | 25m/250m | 64Mi/256Mi |
**Note**: rybbit-client at 89Mi with 256Mi limit (35%).
### servarr
| Pod | CPU Used | Mem Used | CPU Req/Lim | Mem Req/Lim |
|-----|----------|----------|-------------|-------------|
| flaresolverr | 1m | 148Mi | 25m/250m | 64Mi/256Mi |
| listenarr | 2m | 383Mi | 15m/600m | ~640Mi/~2.6Gi |
| prowlarr | 1m | 149Mi | 15m/150m | ~260Mi/~1Gi |
| qbittorrent | 1m | 29Mi | 25m/250m | 64Mi/256Mi |
**WARNING**: flaresolverr at 148Mi / 256Mi = 58% of mem limit.
### speedtest
| Pod | CPU Used | Mem Used | CPU Req/Lim | Mem Req/Lim |
|-----|----------|----------|-------------|-------------|
| speedtest | 1m | 147Mi | 200m/2000m | ~309Mi/~1.2Gi |
### stirling-pdf
| Pod | CPU Used | Mem Used | CPU Req/Lim | Mem Req/Lim |
|-----|----------|----------|-------------|-------------|
| stirling-pdf | **299m** | 902Mi | 15m/**300m** | 1Gi/4Gi |
**WARNING**: CPU at 99.7% of limit -- actively being throttled!
### tandoor
| Pod | CPU Used | Mem Used | CPU Req/Lim | Mem Req/Lim |
|-----|----------|----------|-------------|-------------|
| tandoor | 1m | 754Mi | 15m/150m | ~776Mi/~3.1Gi |
### technitium
| Pod | CPU Used | Mem Used | CPU Req/Lim | Mem Req/Lim |
|-----|----------|----------|-------------|-------------|
| technitium | 1m | 184Mi | 100m/500m | 128Mi/512Mi |
| technitium-secondary | 9m | 123Mi | 100m/500m | 128Mi/512Mi |
### trading-bot
| Pod | CPU Used | Mem Used | CPU Req/Lim | Mem Req/Lim |
|-----|----------|----------|-------------|-------------|
| trading-bot-frontend (2 containers) | 2m | 174Mi | 10m+50m/200m+1000m | 32Mi+128Mi/128Mi+512Mi |
| trading-bot-workers (6 containers) | 7m | 1901Mi | 10m+100m+10m+10m+10m+10m/500m+2000m+500m+500m+500m+500m | 64Mi*5+512Mi/256Mi*5+2Gi |
**WARNING**: trading-bot-workers at 1901Mi. The sentiment-analyzer container has 2Gi limit, possibly near OOM.
### traefik
| Pod | CPU Used | Mem Used | CPU Req/Lim | Mem Req/Lim |
|-----|----------|----------|-------------|-------------|
| auth-proxy (x2) | 1m | 7Mi | 5m/50m | 16Mi/32Mi |
| bot-block-proxy (x2) | 1m | 7Mi | 5m/50m | 16Mi/32Mi |
| traefik (x3) | 4-14m | 81-120Mi | 100m/500m | 128Mi/512Mi |
### uptime-kuma
| Pod | CPU Used | Mem Used | CPU Req/Lim | Mem Req/Lim |
|-----|----------|----------|-------------|-------------|
| uptime-kuma | 23m | 163Mi | 49m/196m | ~237Mi/~947Mi |
### vpa
| Pod | CPU Used | Mem Used | CPU Req/Lim | Mem Req/Lim |
|-----|----------|----------|-------------|-------------|
| goldilocks-controller | 7m | 30Mi | 49m/980m | ~105Mi/~209Mi |
| goldilocks-dashboard | 1m | 8Mi | 15m/300m | ~105Mi/~209Mi |
| vpa-admission-certgen | N/A | N/A | 50m/500m | 64Mi/512Mi |
| vpa-admission-controller | 3m | 48Mi | 50m/500m | 200Mi/512Mi |
| vpa-recommender | 13m | 74Mi | 50m/500m | 500Mi/512Mi |
| vpa-updater | 2m | 68Mi | 50m/500m | 500Mi/512Mi |
**WARNING**: vpa-admission-certgen in ImagePullBackOff.
### whisper
| Pod | CPU Used | Mem Used | CPU Req/Lim | Mem Req/Lim |
|-----|----------|----------|-------------|-------------|
| piper | 0m | 32Mi | 100m/1000m | 256Mi/2Gi |
| whisper | 1m | 628Mi | 15m/150m | ~729Mi/~5.8Gi |
### wireguard
| Pod | CPU Used | Mem Used | CPU Req/Lim | Mem Req/Lim |
|-----|----------|----------|-------------|-------------|
| wireguard (2 containers) | 1m | 2Mi | 50m+50m/500m+500m | 64Mi+64Mi/512Mi+512Mi |
### woodpecker
| Pod | CPU Used | Mem Used | CPU Req/Lim | Mem Req/Lim |
|-----|----------|----------|-------------|-------------|
| woodpecker-agent-0 | 1m | 17Mi | 15m/150m | ~100Mi/400Mi |
| woodpecker-agent-1 | 1m | 28Mi | 25m/250m | 64Mi/256Mi |
| woodpecker-server-0 | 4m | 32Mi | 25m/250m | 64Mi/256Mi |
### website
| Pod | CPU Used | Mem Used | CPU Req/Lim | Mem Req/Lim |
|-----|----------|----------|-------------|-------------|
| blog (x3, 2 containers each) | 0-1m | 17-19Mi | 11m+11m/22m+110m | ~50Mi+~50Mi/512Mi+200Mi |
### Other Small Namespaces
| Namespace | Pod | CPU Used | Mem Used | CPU Req/Lim | Mem Req/Lim |
|-----------|-----|----------|----------|-------------|-------------|
| city-guesser | city-guesser | 1m | 23Mi | 250m/500m | 50Mi/512Mi |
| coturn | coturn | 1m | 7Mi | 15m/150m | ~100Mi/400Mi |
| cyberchef | cyberchef | 0m | 8Mi | 15m/150m | ~100Mi/400Mi |
| diun | diun | 1m | 24Mi | 15m/150m | ~100Mi/400Mi |
| excalidraw | excalidraw | 0m | 2Mi | 15m/150m | ~100Mi/400Mi |
| f1-stream | f1-stream | 7m | 53Mi | 50m/500m | 64Mi/256Mi |
| freshrss | freshrss | 1m | 56Mi | 25m/250m | 64Mi/256Mi |
| hackmd | hackmd | 2m | 82Mi | 15m/150m | ~138Mi/~552Mi |
| health | health | 2m | 101Mi | 100m/1000m | 256Mi/1Gi |
| isponsorblocktv | isponsorblocktv-vermont | 1m | 42Mi | 15m/150m | ~100Mi/400Mi |
| jsoncrack | jsoncrack | 0m | 7Mi | 15m/150m | ~100Mi/400Mi |
| k8s-portal | k8s-portal | 0m | 14Mi | 25m/250m | 64Mi/256Mi |
| navidrome | navidrome | 1m | 62Mi | 15m/150m | ~156Mi/~623Mi |
| ntfy | ntfy | 1m | 20Mi | 25m/250m | 64Mi/256Mi |
| owntracks | owntracks | 1m | 1Mi | 15m/150m | ~100Mi/400Mi |
| plotting-book | plotting-book | 0m | 22Mi | 50m/500m | 128Mi/512Mi |
| privatebin | privatebin | 1m | 46Mi | 15m/150m | ~100Mi/400Mi |
| send | send | 0m | 53Mi | 15m/150m | ~100Mi/400Mi |
| shadowsocks | shadowsocks | 1m | 0Mi | 15m/150m | ~100Mi/400Mi |
| tor-proxy | tor-proxy | 1m | 61Mi | 15m/150m | ~105Mi/~419Mi |
| vaultwarden | vaultwarden | 1m | 49Mi | 50m/200m | 64Mi/256Mi |
| wealthfolio | wealthfolio | 0m | 8Mi | 15m/150m | ~100Mi/400Mi |
| webhook-handler | webhook-handler | 1m | 8Mi | 15m/30m | ~100Mi/1Gi |
| xray | xray | 0m | 11Mi | 50m/500m | 64Mi/512Mi |
---
## LIMITRANGE DEFAULTS BY NAMESPACE
| Namespace | Default CPU | Default Mem | Max CPU | Max Mem | Tier |
|-----------|-------------|-------------|---------|---------|------|
| **GPU tier (2-gpu)** | | | | | |
| ebook2audiobook | 1 | 2Gi | 8 | 16Gi | 2-gpu |
| frigate | 1 | 2Gi | 8 | 16Gi | 2-gpu |
| immich | 1 | 2Gi | 8 | 16Gi | 2-gpu |
| nvidia | 1 | 2Gi | 8 | 16Gi | 2-gpu |
| ollama | 1 | 2Gi | 8 | 16Gi | 2-gpu |
| whisper | 1 | 2Gi | 8 | 16Gi | 2-gpu |
| **Core tier (0-core)** | | | | | |
| cloudflared | 500m | 512Mi | 4 | 8Gi | 0-core |
| headscale | 500m | 512Mi | 4 | 8Gi | 0-core |
| technitium | 500m | 512Mi | 4 | 8Gi | 0-core |
| traefik | 500m | 512Mi | 4 | 8Gi | 0-core |
| wireguard | 500m | 512Mi | 4 | 8Gi | 0-core |
| xray | 500m | 512Mi | 4 | 8Gi | 0-core |
| **Cluster tier (1-cluster)** | | | | | |
| authentik | 500m | 512Mi | 2 | 4Gi | 1-cluster |
| cnpg-system | 500m | 512Mi | 2 | 4Gi | 1-cluster |
| crowdsec | 500m | 512Mi | 2 | 4Gi | 1-cluster |
| dbaas | 500m | 512Mi | 2 | 4Gi | 1-cluster |
| metrics-server | 500m | 512Mi | 2 | 4Gi | 1-cluster |
| monitoring | 500m | 512Mi | 2 | 4Gi | 1-cluster |
| poison-fountain | 500m | 512Mi | 2 | 4Gi | 1-cluster |
| redis | 500m | 512Mi | 2 | 4Gi | 1-cluster |
| tuya-bridge | 500m | 512Mi | 2 | 4Gi | 1-cluster |
| uptime-kuma | 500m | 512Mi | 2 | 4Gi | 1-cluster |
| vpa | 500m | 512Mi | 2 | 4Gi | 1-cluster |
| **Edge tier (3-edge)** | | | | | |
| Most app namespaces | 250m | 256Mi | 2 | 4Gi | 3-edge |
| **Aux tier (4-aux)** | | | | | |
| Some app namespaces | 250m | 256Mi | 2 | 4Gi | 4-aux |
| **Custom LimitRanges** | | | | | |
| nextcloud | 250m | 256Mi | 16 | 8Gi | Custom |
| onlyoffice | 250m | 256Mi | 8 | 8Gi | Custom |
| **No tier** | | | | | |
| aiostreams | 250m | 256Mi | 1 | 2Gi | None |
| default | 250m | 256Mi | 1 | 2Gi | None |
| descheduler | 250m | 256Mi | 1 | 2Gi | None |
| gadget | 250m | 256Mi | 1 | 2Gi | None |
| kured | 250m | 256Mi | 1 | 2Gi | None |
| local-path-storage | 250m | 256Mi | 1 | 2Gi | None |
| mysql-operator | 250m | 256Mi | 1 | 2Gi | None |
| reverse-proxy | 250m | 256Mi | 1 | 2Gi | None |
| tigera-operator | 250m | 256Mi | 1 | 2Gi | None |
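
For reference, a tier row maps onto a `LimitRange` object roughly as follows. This is a sketch only: it assumes the "Default" columns correspond to the `default` (default limit) field and the "Max" columns to `max`, and the object name is hypothetical. The real objects in this repository are generated by Terragrunt and may set further fields such as `defaultRequest`.

```python
import json

# Tier values copied from the table above: (default cpu, default mem), (max cpu, max mem).
TIER_DEFAULTS = {
    "2-gpu": {"default": ("1", "2Gi"), "max": ("8", "16Gi")},
    "0-core": {"default": ("500m", "512Mi"), "max": ("4", "8Gi")},
    "1-cluster": {"default": ("500m", "512Mi"), "max": ("2", "4Gi")},
}


def limit_range(namespace: str, tier: str) -> dict:
    """Build a LimitRange manifest (as a dict) for one namespace in a tier."""
    default_cpu, default_mem = TIER_DEFAULTS[tier]["default"]
    max_cpu, max_mem = TIER_DEFAULTS[tier]["max"]
    return {
        "apiVersion": "v1",
        "kind": "LimitRange",
        # Hypothetical object name; the generated name may differ.
        "metadata": {"name": f"tier-{tier}", "namespace": namespace},
        "spec": {
            "limits": [
                {
                    "type": "Container",
                    "default": {"cpu": default_cpu, "memory": default_mem},
                    "max": {"cpu": max_cpu, "memory": max_mem},
                }
            ]
        },
    }


print(json.dumps(limit_range("frigate", "2-gpu"), indent=2))
```

The JSON output could be inspected or passed to `kubectl apply -f -`, since kubectl accepts JSON manifests as well as YAML.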
---
## RESOURCEQUOTA UTILIZATION (top consumers)
| Namespace | CPU Req Used/Hard | Mem Req Used/Hard | Pods Used/Hard | % Mem Req |
|-----------|-------------------|-------------------|----------------|-----------|
| monitoring | 1177m/16000m | ~9Gi/16Gi | 32/100 | ~56% |
| authentik | 680m/16000m | ~3.3Gi/16Gi | 10/50 | ~21% |
| crowdsec | 1619m/8000m | ~1.1Gi/8Gi | 7/30 | ~14% |
| dbaas | 1500m/8000m | 4416Mi/12Gi | 7/30 | ~36% |
| immich | 845m/8000m | ~4.1Gi/8Gi | 4/40 | ~51% |
| ollama | 515m/8000m | ~4.7Gi/8Gi | 2/40 | ~59% |
| nextcloud | 136m/4000m | ~1.5Gi/8Gi | 2/10 | ~19% |
| rybbit | 140m/2000m | ~791Mi/2Gi | 3/20 | ~39% |
---
## ACTION ITEMS
### Immediate (potential service impact)
1. **dashy** -- CPU throttled at 98% (490m/500m). Increase CPU limit or investigate high CPU usage.
2. **stirling-pdf** -- CPU throttled at 99.7% (299m/300m). Increase CPU limit.
3. **dbaas/mysql-cluster-0** -- Previously OOMKilled. Currently at ~1845Mi with 2Gi limit on mysql container (~90%). Monitor closely or increase limit.
4. **vpa/vpa-admission-certgen** -- ImagePullBackOff. Fix image reference.
5. **trading-bot-workers** -- 1901Mi in total across 6 containers; the largest container (sentiment-analyzer) has a 2Gi limit. Check per-container usage to verify it is not close to OOM.
### Medium Priority (resource waste or risk)
6. **kms/kms-web-page** -- Guaranteed QoS at 500m CPU / 512Mi, but only uses 0m/10Mi. Massive overprovisioning.
7. **ollama/ollama** -- Requests 4Gi memory but uses 11Mi (models are likely held in GPU VRAM). If the workload does not need system RAM, reduce the request.
8. **resume/printer** -- Requests 1Gi memory but uses 109Mi. Consider reducing.
9. **nvidia-driver-daemonset** -- No limits set, using 1168Mi. Standard for driver but worth noting.
10. **servarr/flaresolverr** -- At 58% of its memory limit (148Mi/256Mi). A single snapshot cannot show a trend, so monitor it over time.
### Low Priority (optimization opportunities)
11. Multiple pods in the monitoring namespace have generous limits but low actual usage (node-exporters at 9-24Mi with 800Mi limits).
12. crowdsec-agent pods have Guaranteed QoS (req=limit) at 500m/250Mi but use only 3-13m CPU and 43-48Mi memory.
13. Many edge-tier pods using <10% of their memory limits -- VPA recommendations could help right-size.
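The overprovisioning items above can be ranked by how much of the memory request is actually used. A minimal sketch with the figures from items 6-8; the 25% cut-off and the names are assumptions for illustration, and a single snapshot is not enough to right-size a workload:

```python
# workload -> (memory used in Mi, memory request in Mi), from action items 6-8.
candidates = {
    "kms/kms-web-page": (10, 512),
    "ollama/ollama": (11, 4096),
    "resume/printer": (109, 1024),
}


def request_utilization(used_mib: float, request_mib: float) -> float:
    """Memory usage as a percentage of the memory request."""
    return 100 * used_mib / request_mib


# Lowest utilization first: the strongest candidates for a smaller request.
ranked = sorted(candidates, key=lambda name: request_utilization(*candidates[name]))

for name in ranked:
    pct = request_utilization(*candidates[name])
    note = "review request" if pct < 25 else "ok"
    print(f"{name}: {pct:.1f}% of request ({note})")
```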