kyverno: disable reports-controller to stop etcd ephemeralreport load
All checks were successful
ci/woodpecker/push/default Pipeline was successful
Viktor flagged a concern about wearing the single non-RAID SSD with useless etcd writes if etcd moves there. Investigation found that the avoidable load is kyverno reporting: the 2026-06-12 etcd-load-reduction change disabled the report *features* but left the reports-controller running (default --enableReporting + --validatingAdmissionPolicyReports=true), so the 2026-06-21 kyverno upgrade left a one-time pile of ~10.5k cluster/namespaced ephemeralreports (~114MB in etcd) that nothing reaps (aggregation off). Listing that range starves etcd's fdatasync enough to flap the apiserver (observed live 2026-06-28).

Disable the reports-controller outright (reportsController.enabled=false), completing the 2026-06-12 intent. Reports are not consumed (violations surface via Loki->Slack); admission enforcement (deny-* policies) and Keel mutation are independent of it.

The ~10.5k stale reports already in etcd are cleared separately (throttled, out-of-band), since bulk-deleting them is itself etcd-heavy.

Refs: code-oflt (etcd IO isolation), code-at4f (etcd starvation alerting).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
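The commit does not include the out-of-band cleanup itself. As a rough illustration of what "throttled" could mean here, the sketch below deletes names in small batches with a pause between batches, so etcd is never handed the whole ~10.5k range at once. Everything in it is an assumption rather than the procedure actually used: `throttled_delete`, `batched`, the `delete_batch` callable (which might wrap a `kubectl delete` call), and the batch size and pause values are all hypothetical.

```python
import time
from typing import Callable, Iterable, Iterator, List


def batched(names: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` names."""
    batch: List[str] = []
    for name in names:
        batch.append(name)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def throttled_delete(
    names: Iterable[str],
    delete_batch: Callable[[List[str]], None],
    batch_size: int = 50,    # illustrative value, not from the commit
    pause_s: float = 2.0,    # illustrative value, not from the commit
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Delete `names` in small batches, pausing between batches.

    `delete_batch` is a hypothetical callable supplied by the caller;
    this function only controls batching and pacing.
    Returns the number of names handed to `delete_batch`.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    deleted = 0
    for index, batch in enumerate(batched(names, batch_size)):
        if index > 0:
            sleep(pause_s)  # give etcd time to sync between batches
        delete_batch(batch)
        deleted += len(batch)
    return deleted
```

Injecting `delete_batch` and `sleep` keeps the pacing logic testable without a cluster; in practice the pause would presumably be tuned against observed etcd fsync latency.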
parent cf42042cba
commit e43e64c666
1 changed file with 15 additions and 2 deletions
@@ -36,8 +36,9 @@ resource "helm_release" "kyverno" {
 forceFailurePolicyIgnore = {
   enabled = true
 }
-# Reporting fully disabled (2026-06-12, etcd-load-reduction). policyReports
-# were already off, so admission/aggregate/background reporting generated
+# Reporting features disabled (2026-06-12, etcd-load-reduction); the
+# reportsController itself is now disabled too (2026-06-28, see below).
+# policyReports were already off, so admission/aggregate/background generated
 # ephemeralreports + an hourly all-resource etcd re-scan for NO user-facing
 # output. Admission enforcement (deny-* policies) and Keel mutation are
 # independent of reporting; policy violations surface via Loki->Slack. This
@@ -56,7 +57,19 @@ resource "helm_release" "kyverno" {
   }
 }
 
+# Fully disable the reports controller (2026-06-28). The 2026-06-12 change
+# turned off the report *features* (policy/admission/aggregate/background) but
+# LEFT this controller running with its default --enableReporting +
+# --validatingAdmissionPolicyReports=true, so it kept emitting ephemeralreports.
+# The 2026-06-21 kyverno upgrade then produced a one-time pile of ~10.5k
+# cluster/namespaced ephemeralreports (~114MB in etcd) that nothing reaps
+# (aggregation off) — and listing that range starves etcd's fdatasync hard
+# enough to flap the apiserver (observed live 2026-06-28). Reports are not
+# consumed (violations surface via Loki->Slack), so disable the controller
+# outright; enforcement (deny-* policies) + Keel mutation are independent of
+# it. Stale reports are cleared out-of-band (one-time, throttled).
 reportsController = {
+  enabled = false
   resources = {
     limits = {
       memory = "512Mi"