apiserver: enable audit logging (low-write Metadata) + ship to Loki

Resource changes/deletions are now attributable (the novelapp deletion this week
was untraceable because apiserver audit was off). Low-write policy: drops
reads/noise, Metadata level on mutations, omitStages RequestReceived. Wired into
the kube-apiserver static-pod manifest + kubeadm-config (v1beta4
extraArgs/extraVolumes -> survives kubeadm upgrade) on k8s-master; Alloy tails
/var/log/kubernetes/audit/audit.log -> Loki {job=kubernetes-audit}.

Root cause that had silently blocked this AND OIDC for weeks: a stray
kube-apiserver.yaml.bak inside /etc/kubernetes/manifests/ was a duplicate
static-pod manifest kubelet ran instead of the real one, dropping every flag
added to the real manifest. Removed it. Runbook added.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
parent 3696ff5922
commit 551412488b
3 changed files with 137 additions and 3 deletions
docs/runbooks/apiserver-audit-logging.md (normal file, 74 lines added)
@@ -0,0 +1,74 @@
# Runbook: kube-apiserver Audit Logging

**Status:** enabled 2026-06-06 on `k8s-master` (10.0.20.100, the single
control-plane node). Motivated by the novelapp incident: a workload was
deleted and the deletion could not be attributed, because apiserver audit
logging had never been enabled (see the note on pre-2026-06-06 deletions below).

## What is configured

- **Audit policy:** `infra/scripts/k8s-apiserver-audit-policy.yaml` (source of
  truth), deployed to `/etc/kubernetes/audit-policy.yaml` on k8s-master.
  Low-write by design: drops reads (get/list/watch), high-churn resources
  (events, leases, endpointslices, token/subjectaccess reviews), and probe
  URLs; logs everything else (create/update/patch/delete) at **Metadata**
  level (who/verb/resource/namespace/name/time/sourceIP, no bodies).
  `omitStages: [RequestReceived]` → one line per mutating request.
- **kube-apiserver static-pod manifest** (`/etc/kubernetes/manifests/kube-apiserver.yaml`):
  `--audit-policy-file=/etc/kubernetes/audit-policy.yaml`,
  `--audit-log-path=/var/log/kubernetes/audit/audit.log`,
  `--audit-log-maxage=30 --audit-log-maxbackup=10 --audit-log-maxsize=100`
  (up to 10 rotated files of 100 MB each, so about 1 GB on disk plus the
  active file; rotated files are kept for at most 30 days), plus the
  `audit-policy` (File, RO) and `audit-logs` (DirectoryOrCreate) hostPath
  volumes/mounts.
- **Persistence across `kubeadm upgrade`:** the same flags + volumes are in the
  `kubeadm-config` ConfigMap (`kube-system`), `ClusterConfiguration.apiServer.{extraArgs,extraVolumes}`
  (v1beta4). Without this, a control-plane upgrade regenerates the manifest and
  silently drops audit (and OIDC). The OIDC flags are recorded there too (see
  below).
- **Shipping to Loki:** the Alloy DaemonSet
  (`infra/stacks/monitoring/modules/monitoring/alloy.yaml`) tails
  `/var/log/kubernetes/audit/audit.log` (it schedules on the control-plane node
  and mounts host `/var/log`). Query in Loki/Grafana with
  `{job="kubernetes-audit"}`.

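To make the drop/keep behaviour of the policy concrete, here is a minimal Python sketch of first-match rule evaluation over the four rules summarized in the audit-policy bullet. It is a simplified model for illustration only, not the apiserver's implementation: `RULES`, `audit_level` and `_url_matches` are hypothetical names, wildcard resource forms are not modelled, and only the fields this policy uses are handled.

```python
# Simplified model of audit-policy evaluation: the first rule that matches a
# request decides its audit level. All names are illustrative.

RULES = [
    # 1. Reads are never logged.
    {"level": "None", "verbs": {"get", "list", "watch"}},
    # 2. High-churn resources, keyed by API group ("" is the core group).
    #    An empty set stands for "every resource in that group".
    {"level": "None", "resources": {
        "": {"events", "endpoints", "nodes/status", "pods/status"},
        "coordination.k8s.io": {"leases"},
        "discovery.k8s.io": {"endpointslices"},
        "metrics.k8s.io": set(),
        "authentication.k8s.io": {"tokenreviews"},
        "authorization.k8s.io": {"subjectaccessreviews",
                                 "selfsubjectaccessreviews"},
    }},
    # 3. Probe / discovery URLs; a trailing * is treated as a prefix match.
    {"level": "None", "non_resource_urls": [
        "/healthz*", "/readyz*", "/livez*", "/version", "/metrics",
        "/openapi/*", "/swagger*",
    ]},
    # 4. Catch-all: everything that reaches this rule is logged at Metadata.
    {"level": "Metadata"},
]


def _url_matches(pattern, path):
    if pattern.endswith("*"):
        return path.startswith(pattern[:-1])
    return path == pattern


def audit_level(verb, group="", resource=None, subresource=None, path=None):
    """Return the level of the first rule matching the request."""
    target = f"{resource}/{subresource}" if resource and subresource else resource
    for rule in RULES:
        if "verbs" in rule and verb not in rule["verbs"]:
            continue
        if "resources" in rule:
            if resource is None:  # resource rules skip non-resource requests
                continue
            names = rule["resources"].get(group)
            if names is None or (names and target not in names):
                continue
        if "non_resource_urls" in rule:
            urls = rule["non_resource_urls"]
            if path is None or not any(_url_matches(p, path) for p in urls):
                continue
        return rule["level"]
    return "None"  # not reachable here: rule 4 matches every request


print(audit_level("delete", group="apps", resource="deployments"))  # Metadata
print(audit_level("get", group="apps", resource="deployments"))     # None
```

Rule order matters in this model: a lease update is dropped by rule 2 before the catch-all is ever consulted, which is what keeps the log volume low.
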
## How to attribute a change ("who deleted X, when")

```
# In Loki (Grafana Explore or logcli), last 24h:
{job="kubernetes-audit"} |= "delete" |= "<resource-name>"
```
Each entry is a JSON `audit.k8s.io/v1` Event: `user.username`, `verb`,
`objectRef.{resource,namespace,name}`, `requestReceivedTimestamp`,
`sourceIPs`, `userAgent`. On-node fallback (Loki down):
`sudo grep <name> /var/log/kubernetes/audit/audit.log` on k8s-master.

Note: direct `kubectl`/dashboard calls now show the real identity (user SA or
OIDC email). Pre-2026-06-06 deletions are NOT recoverable (audit was off).

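When the on-node fallback is needed, plain `grep` also matches the name inside unrelated fields. A minimal Python sketch of a stricter filter follows, assuming one JSON Event per line with the fields listed in this section; `find_events` is a hypothetical helper name, not an existing tool.

```python
import json


def find_events(lines, name, verb="delete"):
    """Return audit events for `verb` on an object called `name`, in file order."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip a partially written or truncated line
        if not isinstance(event, dict):
            continue
        ref = event.get("objectRef") or {}
        if event.get("verb") != verb or ref.get("name") != name:
            continue
        hits.append({
            "when": event.get("requestReceivedTimestamp"),
            "who": (event.get("user") or {}).get("username"),
            "verb": event.get("verb"),
            "resource": ref.get("resource"),
            "namespace": ref.get("namespace"),
            "name": ref.get("name"),
            "sourceIPs": event.get("sourceIPs"),
            "userAgent": event.get("userAgent"),
        })
    return hits


# Example use on k8s-master (reading the log needs root):
# with open("/var/log/kubernetes/audit/audit.log") as f:
#     for hit in find_events(f, "<name>"):
#         print(hit)
```

Matching on `objectRef.name` and `verb` rather than on raw text avoids false positives from other objects that merely mention the same string.
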
## CRITICAL gotcha that blocked this (and OIDC) for weeks

`kubelet` runs **every** non-dotfile in its `staticPodPath`
(`/etc/kubernetes/manifests`) as a static pod. A stray
`kube-apiserver.yaml.bak.<epoch>` left in that directory (from an earlier manual
edit) was a **second** manifest defining pod `kube-apiserver`. kubelet ran the
older `.bak` copy and ignored edits to the real `kube-apiserver.yaml`, so newly
added flags (the OIDC flags, then these audit flags) never reached the running
process even though the file clearly had them. Symptom: the running apiserver's
`/proc/<pid>/cmdline` (or `crictl inspect` args) is SHORTER than the manifest's
`command:` list. Fix: move any `*.bak`/backup OUT of `/etc/kubernetes/manifests/`.
**Always back up control-plane manifests to a sibling dir (e.g.
`/etc/kubernetes/`), never inside `manifests/`.** This also un-blocked OIDC
(memory id=4042) as a side effect.

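Both the cause and the symptom described here can be checked mechanically. The following Python sketch is illustrative only: `stray_manifests` and `missing_flags` are hypothetical helpers, the `EXPECTED` set assumes a kubeadm control plane with stacked etcd, and the manifest is parsed naively (unquoted `- --flag=value` lines) rather than with a YAML library.

```python
from pathlib import Path

# Assumption: the static pods a kubeadm control plane with stacked etcd has.
EXPECTED = frozenset({
    "etcd.yaml",
    "kube-apiserver.yaml",
    "kube-controller-manager.yaml",
    "kube-scheduler.yaml",
})


def stray_manifests(manifest_dir, expected=EXPECTED):
    """List non-dotfiles in the static pod directory that are not expected."""
    return sorted(
        p.name
        for p in Path(manifest_dir).iterdir()
        if p.is_file() and not p.name.startswith(".") and p.name not in expected
    )


def missing_flags(manifest_text, cmdline):
    """Return flags present in the manifest but absent from the running process.

    `cmdline` is the raw content of /proc/<pid>/cmdline (NUL-separated argv).
    """
    running = {arg.decode() for arg in cmdline.split(b"\0") if arg}
    wanted = []
    for line in manifest_text.splitlines():
        item = line.strip()
        if item.startswith("- --"):
            wanted.append(item[2:].strip())
    return [flag for flag in wanted if flag not in running]


# Example use on k8s-master (reading /proc/<pid>/cmdline needs root):
# print(stray_manifests("/etc/kubernetes/manifests"))
# manifest = Path("/etc/kubernetes/manifests/kube-apiserver.yaml").read_text()
# cmdline = Path("/proc/<pid>/cmdline").read_bytes()
# print(missing_flags(manifest, cmdline))
```

A non-empty result from either function suggests the running apiserver is not the one described by `kube-apiserver.yaml`.
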
## Rollback

Backups live in `/etc/kubernetes/apiserver-manifest-archive/` on k8s-master
(the 27-arg pre-audit known-good, and the 36-arg desired). To disable audit:
remove the `--audit-*` flags + audit volumes from the manifest (kubelet
restarts the apiserver in ~30-40s), and remove them from `kubeadm-config`. To
recover from a bad manifest edit, copy the known-good back over
`/etc/kubernetes/manifests/kube-apiserver.yaml`.

Editing the apiserver manifest restarts the apiserver → ~30-40s API blip on this
single-control-plane cluster. Always edit from a backup, and watch
`curl -sk https://10.0.20.100:6443/livez` before declaring success.

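The "watch `/livez`" step can be scripted so that an edit is only declared successful once the apiserver answers again. This is a minimal sketch under stated assumptions: `fetch_status` and `wait_for_livez` are hypothetical helpers, certificate verification is disabled to mirror `curl -k`, and the timeout values are arbitrary.

```python
import http.client
import ssl
import time
import urllib.error
import urllib.request


def fetch_status(url):
    """Return the HTTP status code, or None if the server is unreachable."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False        # same trade-off as curl -k
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, context=ctx, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code               # server answered, but not with 2xx
    except (OSError, http.client.HTTPException):
        return None                   # refused, timed out, TLS or protocol error


def wait_for_livez(url, timeout_s=120.0, interval_s=2.0,
                   fetch=fetch_status, sleep=time.sleep, clock=time.monotonic):
    """Poll `url` until it returns 200; give up after `timeout_s` seconds."""
    deadline = clock() + timeout_s
    while True:
        if fetch(url) == 200:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Example use after editing the manifest:
# ok = wait_for_livez("https://10.0.20.100:6443/livez")
# print("apiserver is live" if ok else "apiserver did not come back; roll back")
```

The `fetch`, `sleep` and `clock` parameters exist only so the loop can be exercised without a real cluster.
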
scripts/k8s-apiserver-audit-policy.yaml (normal file, 57 lines added)
@@ -0,0 +1,57 @@
# kube-apiserver audit policy -- k8s-master (10.0.20.100), single control-plane.
#
# Goal: a durable "who/when/what" trail for MUTATIONS (create/update/patch/
# delete) so resource deletions can be attributed even though direct
# kubectl-to-apiserver calls otherwise leave no trace (see the 2026-06-06
# novelapp incident: a dashboard delete was attributable, a direct-kubectl
# recreate was not). Deployed OUTSIDE Terraform (the k8s VMs are not TF-managed,
# see memory id=1575); this file is the source of truth, scp'd to
# /etc/kubernetes/audit-policy.yaml and wired into the apiserver static-pod
# manifest + the kubeadm-config ConfigMap (so "kubeadm upgrade" preserves it).
#
# Tuned for LOW WRITE VOLUME (the cluster's sdc HDD is write-sensitive, see
# memory id=559): reads are dropped entirely, high-churn resources and probe
# endpoints are dropped, and the verbose RequestReceived stage is omitted, so
# only one Metadata-level line is written per mutating request.
apiVersion: audit.k8s.io/v1
kind: Policy
# Only emit the post-execution stage -- halves volume vs logging both stages.
omitStages:
  - RequestReceived
rules:
  # 1. Never log read-only verbs -- the overwhelming majority of traffic and
  #    irrelevant to "who changed/deleted X".
  - level: None
    verbs: ["get", "list", "watch"]

  # 2. Drop high-churn / low-value resources even on writes.
  - level: None
    resources:
      - group: ""
        resources: ["events", "endpoints", "nodes/status", "pods/status"]
      - group: "coordination.k8s.io"
        resources: ["leases"]
      - group: "discovery.k8s.io"
        resources: ["endpointslices"]
      - group: "metrics.k8s.io"
      - group: "authentication.k8s.io"
        resources: ["tokenreviews"]
      - group: "authorization.k8s.io"
        resources: ["subjectaccessreviews", "selfsubjectaccessreviews"]

  # 3. Drop noisy non-resource probe / discovery URLs.
  - level: None
    nonResourceURLs:
      - "/healthz*"
      - "/readyz*"
      - "/livez*"
      - "/version"
      - "/metrics"
      - "/openapi/*"
      - "/swagger*"

  # 4. Everything else (every create/update/patch/delete on real resources):
  #    record WHO (user + sourceIP + userAgent), WHAT (resource/namespace/name),
  #    WHEN, and the verb -- at Metadata level (no request/response bodies, so
  #    each entry stays small).
  - level: Metadata
@@ -222,11 +222,14 @@ alloy:
     forward_to = [loki.write.default.receiver]
   }

-  // Kubernetes audit log collection from /var/log/kubernetes/audit.log
-  // Requires alloy.mounts.varlog=true to mount /var/log from the host
+  // Kubernetes audit log collection from /var/log/kubernetes/audit/audit.log
+  // (kube-apiserver --audit-log-path on k8s-master; rotated siblings stay in
+  // the audit/ subdir). Requires alloy.mounts.varlog=true to mount /var/log
+  // from the host. Enabled 2026-06-06 once apiserver audit actually started
+  // writing; see infra/scripts/k8s-apiserver-audit-policy.yaml.
   local.file_match "audit_logs" {
     path_targets = [{
-      __path__ = "/var/log/kubernetes/audit.log",
+      __path__ = "/var/log/kubernetes/audit/audit.log",
       job = "kubernetes-audit",
       node = env("HOSTNAME"),
     }]