[ci skip] switch VPA to off mode globally, fix Ollama/MySQL resources

- Kyverno policy: VPA mode set to 'off' for all namespaces (was 'initial'
  for non-core). Terraform is now the sole authority for container
  resources; Goldilocks provides recommendations only.
- Ollama: add explicit CPU/memory resources (requests 500m CPU/4Gi,
  limits 4 CPU/12Gi) alongside the GPU allocation. Fixes OOMKills caused
  by VPA scaling resources down.
- MySQL InnoDB Cluster: bump memory limit from 2Gi to 3Gi.
- Remove redundant per-namespace VPA opt-out labels from onlyoffice,
  openclaw, trading-bot (now handled globally by Kyverno policy).
Viktor Barzin 2026-03-01 19:03:49 +00:00
parent 304b5e4b3d
commit 32762a0916
No known key found for this signature in database
GPG key ID: 0EB088298288D958
7 changed files with 21 additions and 61 deletions

@@ -175,9 +175,9 @@ Custom quota namespaces: `authentik` (16 req CPU/16Gi req mem/48 lim CPU/96Gi li
 **LimitRange opt-out**: label `resource-governance/custom-limitrange=true` - skips Kyverno-generated LimitRange, requires a custom `kubernetes_limit_range` in the stack. Used by: `nextcloud` (max 16 CPU/8Gi), `onlyoffice` (max 8 CPU/8Gi).
-**Other mutating policies**: `inject-priority-class-from-tier` (sets priorityClassName, **CREATE only**), `inject-ndots` (ndots:2 on all pods), `sync-tier-label-from-namespace`, `goldilocks-vpa-auto-mode` (sets VPA to `initial` for non-core, `off` for core).
+**Other mutating policies**: `inject-priority-class-from-tier` (sets priorityClassName, **CREATE only**), `inject-ndots` (ndots:2 on all pods), `sync-tier-label-from-namespace`, `goldilocks-vpa-auto-mode` (sets VPA to `off` for ALL namespaces; Terraform owns container resources, Goldilocks is observe-only).
-**Goldilocks VPA warning**: VPA in Initial mode overrides explicit container resource limits on pod creation. To disable for a deployment: annotate with `goldilocks.fairwinds.com/enabled=false` and set namespace label `goldilocks.fairwinds.com/vpa-update-mode=off`.
+**Goldilocks VPA**: VPA is in `off` mode globally; it provides resource recommendations only via the Goldilocks dashboard, but never mutates pods. Terraform is the sole authority for container resources.
 **Security policies** (ALL Audit mode, log-only): `deny-privileged-containers`, `deny-host-namespaces`, `restrict-sys-admin`, `require-trusted-registries`.
@@ -187,8 +187,7 @@ Custom quota namespaces: `authentik` (16 req CPU/16Gi req mem/48 lim CPU/96Gi li
 3. **Evicted?** → aux-tier pods (priority 200K, Never preempt) are first evicted under pressure.
 4. **Unexpected limits?** → LimitRange injects defaults when `resources: {}` or no resources block exists. Always set explicit resources.
 5. **Need more?** → Set explicit `resources {}` on container (overrides LimitRange defaults) or add `resource-governance/custom-quota=true` label + `resource-governance/custom-limitrange=true` label with custom resources in the stack.
-6. **VPA overriding resources?** → Goldilocks VPA in `initial` mode scales down explicit limits. Annotate deployment with `goldilocks.fairwinds.com/enabled=false`.
-7. **Pod patch failing with immutable spec?** → Kyverno `inject-priority-class-from-tier` was fixed to CREATE-only. If similar issues arise, check mutating webhooks with `kubectl get mutatingwebhookconfigurations`.
+6. **Pod patch failing with immutable spec?** → Kyverno `inject-priority-class-from-tier` was fixed to CREATE-only. If similar issues arise, check mutating webhooks with `kubectl get mutatingwebhookconfigurations`.
 ---
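
In VPA terms, the namespace label value `off` should correspond to `updateMode: "Off"` on the VPA objects Goldilocks creates: the recommender keeps computing recommendations for the dashboard, but nothing is applied to pods. The previous `Initial` mode instead rewrote requests/limits at pod creation, which is how the Ollama OOMKills described in the commit message arose. A sketch of such an object in the stack's `kubernetes_manifest` style, for illustration only: the resource name, object name, namespace and target Deployment name are hypothetical, and because Goldilocks creates and owns these objects, this is not meant to be applied alongside it.

```hcl
# Illustration only (hypothetical names): the shape of a VPA object in "Off" mode.
# Goldilocks creates and owns these objects; do not manage them in Terraform.
resource "kubernetes_manifest" "example_vpa_off" {
  manifest = {
    apiVersion = "autoscaling.k8s.io/v1"
    kind       = "VerticalPodAutoscaler"
    metadata = {
      name      = "goldilocks-ollama" # hypothetical object name
      namespace = "ollama"            # hypothetical namespace
    }
    spec = {
      targetRef = {
        apiVersion = "apps/v1"
        kind       = "Deployment"
        name       = "ollama" # hypothetical Deployment name
      }
      updatePolicy = {
        # "Off": publish recommendations only; never evict pods or
        # rewrite their resources at admission ("Initial" would).
        updateMode = "Off"
      }
    }
  }
}
```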

@@ -121,7 +121,13 @@ resource "kubernetes_deployment" "ollama" {
             mount_path = "/root/.ollama"
           }
           resources {
+            requests = {
+              cpu = "500m"
+              memory = "4Gi"
+            }
             limits = {
+              cpu = "4"
+              memory = "12Gi"
               "nvidia.com/gpu" = "1"
             }
           }

@@ -11,8 +11,7 @@ resource "kubernetes_namespace" "onlyoffice" {
     name = "onlyoffice"
     labels = {
       "istio-injection" : "disabled"
       tier = local.tiers.edge
-      "goldilocks.fairwinds.com/vpa-update-mode" = "off"
       "resource-governance/custom-limitrange" = "true"
       "resource-governance/custom-quota" = "true"
     }

@@ -13,8 +13,7 @@ resource "kubernetes_namespace" "openclaw" {
   metadata {
     name = "openclaw"
     labels = {
       tier = local.tiers.aux
-      "goldilocks.fairwinds.com/vpa-update-mode" = "off"
     }
   }
 }

@@ -150,7 +150,7 @@ resource "helm_release" "mysql_cluster" {
       }
       limits = {
         cpu = "2"
-        memory = "2Gi"
+        memory = "3Gi"
       }
     }
@@ -176,7 +176,7 @@ resource "helm_release" "mysql_cluster" {
         cpu = "250m"
       }
       limits = {
-        memory = "2Gi"
+        memory = "3Gi"
         cpu = "2"
       }
     }

@@ -86,12 +86,12 @@ module "ingress" {
 }
 # -----------------------------------------------------------------------------
-# Kyverno policy: label namespaces for VPA mode by tier
+# Kyverno policy: label namespaces for VPA observe-only mode
 # -----------------------------------------------------------------------------
 # Goldilocks reads the goldilocks.fairwinds.com/vpa-update-mode label on
 # namespaces to decide the updateMode for VPA objects it creates.
-# Tier 0-core gets "off" (recommend only: these are critical infra where
-# evictions cause downtime). All other namespaces get "auto".
+# All namespaces get "off"; Terraform is the authoritative source of truth
+# for container resources. Goldilocks provides recommendations only.
 resource "kubernetes_manifest" "vpa_auto_mode_label" {
   manifest = {
@@ -100,25 +100,19 @@ resource "kubernetes_manifest" "vpa_auto_mode_label" {
     metadata = {
       name = "goldilocks-vpa-auto-mode"
       annotations = {
-        "policies.kyverno.io/title" = "Goldilocks VPA Mode by Tier"
-        "policies.kyverno.io/description" = "Sets VPA update mode per namespace: Off for tier-0 critical infra (no evictions), Auto for all others."
+        "policies.kyverno.io/title" = "Goldilocks VPA Observe-Only Mode"
+        "policies.kyverno.io/description" = "Sets VPA update mode to off for all namespaces. Terraform owns container resources; Goldilocks provides recommendations only."
       }
     }
     spec = {
       rules = [
-        # Tier 0-core: recommend only, never evict
         {
-          name = "label-vpa-off-tier-0"
+          name = "label-vpa-off-all"
           match = {
             any = [
               {
                 resources = {
                   kinds = ["Namespace"]
-                  selector = {
-                    matchLabels = {
-                      tier = "0-core"
-                    }
-                  }
                 }
               }
             ]
@@ -133,42 +127,6 @@ resource "kubernetes_manifest" "vpa_auto_mode_label" {
             }
           }
         },
-        # All other namespaces: initial mode (compatible with Terraform:
-        # VPA mutates pods at creation, not the deployment spec)
-        {
-          name = "label-vpa-initial-default"
-          match = {
-            any = [
-              {
-                resources = {
-                  kinds = ["Namespace"]
-                }
-              }
-            ]
-          }
-          exclude = {
-            any = [
-              {
-                resources = {
-                  selector = {
-                    matchLabels = {
-                      tier = "0-core"
-                    }
-                  }
-                }
-              }
-            ]
-          }
-          mutate = {
-            patchStrategicMerge = {
-              metadata = {
-                labels = {
-                  "goldilocks.fairwinds.com/vpa-update-mode" = "initial"
-                }
-              }
-            }
-          }
-        },
       ]
     }
   }
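
The policy hunks above show only fragments: the surviving rule's `mutate` block and the manifest's `apiVersion`/`kind` fall outside the diff context. A sketch of how the complete policy plausibly reads after this commit, assuming (a) a `kyverno.io/v1` `ClusterPolicy`, which matching cluster-scoped Namespace objects would normally require, and (b) a `mutate` block that mirrors the removed `label-vpa-initial-default` rule with the value `off`. Neither assumption is confirmed by the diff.

```hcl
# Sketch only: reconstructed from the diff context above.
# ASSUMPTION: apiVersion/kind are not visible in the diff.
# ASSUMPTION: the mutate block mirrors the removed "initial" rule, with "off".
resource "kubernetes_manifest" "vpa_auto_mode_label" {
  manifest = {
    apiVersion = "kyverno.io/v1"
    kind       = "ClusterPolicy"
    metadata = {
      name = "goldilocks-vpa-auto-mode"
      annotations = {
        "policies.kyverno.io/title"       = "Goldilocks VPA Observe-Only Mode"
        "policies.kyverno.io/description" = "Sets VPA update mode to off for all namespaces. Terraform owns container resources; Goldilocks provides recommendations only."
      }
    }
    spec = {
      rules = [
        {
          # No tier selector and no exclude block: every Namespace matches.
          name = "label-vpa-off-all"
          match = {
            any = [
              {
                resources = {
                  kinds = ["Namespace"]
                }
              }
            ]
          }
          # Stamp the label Goldilocks reads when creating VPA objects.
          mutate = {
            patchStrategicMerge = {
              metadata = {
                labels = {
                  "goldilocks.fairwinds.com/vpa-update-mode" = "off"
                }
              }
            }
          }
        },
      ]
    }
  }
}
```

Dropping the tier selector and the `exclude` block is what lets one rule cover every namespace, which is why the per-namespace `off` labels on onlyoffice, openclaw and trading-bot become redundant.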

@@ -47,9 +47,8 @@ resource "kubernetes_namespace" "trading-bot" {
   metadata {
     name = "trading-bot"
     labels = {
       tier = local.tiers.edge
       "resource-governance/custom-quota" = "true"
-      "goldilocks.fairwinds.com/vpa-update-mode" = "off"
     }
   }
 }