diff --git a/docs/runbooks/apiserver-audit-logging.md b/docs/runbooks/apiserver-audit-logging.md new file mode 100644 index 00000000..e8d732cd --- /dev/null +++ b/docs/runbooks/apiserver-audit-logging.md @@ -0,0 +1,74 @@ +# Runbook: kube-apiserver Audit Logging + +**Status:** enabled 2026-06-06 on `k8s-master` (10.0.20.100, the single +control-plane node). Motivated by the novelapp incident — a workload was +deleted with no way to attribute it, because apiserver audit logging had never +been on (see post-incident note below). + +## What is configured + +- **Audit policy:** `infra/scripts/k8s-apiserver-audit-policy.yaml` (source of + truth), deployed to `/etc/kubernetes/audit-policy.yaml` on k8s-master. + Low-write by design: drops reads (get/list/watch), high-churn resources + (events, leases, endpointslices, token/subjectaccess reviews), and probe + URLs; logs everything else (create/update/patch/delete) at **Metadata** + level (who/verb/resource/namespace/name/time/sourceIP — no bodies). + `omitStages: [RequestReceived]` → one line per mutating request. +- **kube-apiserver static-pod manifest** (`/etc/kubernetes/manifests/kube-apiserver.yaml`): + `--audit-policy-file=/etc/kubernetes/audit-policy.yaml`, + `--audit-log-path=/var/log/kubernetes/audit/audit.log`, + `--audit-log-maxage=30 --audit-log-maxbackup=10 --audit-log-maxsize=100` + (≤1 GB on disk, 30-day rotation), plus the `audit-policy` (File, RO) and + `audit-logs` (DirectoryOrCreate) hostPath volumes/mounts. +- **Persistence across `kubeadm upgrade`:** the same flags + volumes are in the + `kubeadm-config` ConfigMap (`kube-system`), `ClusterConfiguration.apiServer.{extraArgs,extraVolumes}` + (v1beta4). Without this, a control-plane upgrade regenerates the manifest and + silently drops audit (and oidc). The OIDC flags are recorded there too (see + below). +- **Shipping to Loki:** the Alloy DaemonSet + (`infra/stacks/monitoring/modules/monitoring/alloy.yaml`) tails + `/var/log/kubernetes/audit/audit.log` (it schedules on the control-plane node + and mounts host `/var/log`). Query in Loki/Grafana with + `{job="kubernetes-audit"}`. + +## How to attribute a change ("who deleted X, when") + +``` +# In Loki (Grafana Explore or logcli), last 24h: +{job="kubernetes-audit"} |= "delete" |= "" +``` +Each entry is a JSON `audit.k8s.io/v1` Event: `user.username`, `verb`, +`objectRef.{resource,namespace,name}`, `requestReceivedTimestamp`, +`sourceIPs`, `userAgent`. On-node fallback (Loki down): +`sudo grep /var/log/kubernetes/audit/audit.log` on k8s-master. + +Note: direct `kubectl`/dashboard calls now show the real identity (user SA or +OIDC email). Pre-2026-06-06 deletions are NOT recoverable (audit was off). + +## CRITICAL gotcha that blocked this (and OIDC) for weeks + +`kubelet` runs **every** non-dotfile in its `staticPodPath` +(`/etc/kubernetes/manifests`) as a static pod. A stray +`kube-apiserver.yaml.bak.` left in that directory (from an earlier manual +edit) was a **second** manifest defining pod `kube-apiserver`. kubelet ran the +older `.bak` copy and ignored edits to the real `kube-apiserver.yaml` — so newly +added flags (the OIDC flags, then these audit flags) never reached the running +process even though the file clearly had them. Symptom: the running apiserver's +`/proc//cmdline` (or `crictl inspect` args) is SHORTER than the manifest's +`command:` list. Fix: move any `*.bak`/backup OUT of `/etc/kubernetes/manifests/`. +**Always back up control-plane manifests to a sibling dir (e.g. +`/etc/kubernetes/`), never inside `manifests/`.** This also un-blocked OIDC +(memory id=4042) as a side effect. + +## Rollback + +Backups live in `/etc/kubernetes/apiserver-manifest-archive/` on k8s-master +(the 27-arg pre-audit known-good, and the 36-arg desired). To disable audit: +remove the `--audit-*` flags + audit volumes from the manifest (kubelet +restarts the apiserver in ~30-40s), and remove them from `kubeadm-config`. A bad +manifest edit only needs the known-good copied back over +`/etc/kubernetes/manifests/kube-apiserver.yaml`. + +Editing the apiserver manifest restarts the apiserver → ~30-40s API blip on this +single-control-plane cluster. Always edit from a backup + watch +`curl -sk https://10.0.20.100:6443/livez` before declaring success. diff --git a/scripts/k8s-apiserver-audit-policy.yaml b/scripts/k8s-apiserver-audit-policy.yaml new file mode 100644 index 00000000..7c210f44 --- /dev/null +++ b/scripts/k8s-apiserver-audit-policy.yaml @@ -0,0 +1,57 @@ +# kube-apiserver audit policy -- k8s-master (10.0.20.100), single control-plane. +# +# Goal: a durable "who/when/what" trail for MUTATIONS (create/update/patch/ +# delete) so resource deletions can be attributed even though direct +# kubectl-to-apiserver calls otherwise leave no trace (see the 2026-06-06 +# novelapp incident: a dashboard delete was attributable, a direct-kubectl +# recreate was not). Deployed OUTSIDE Terraform (the k8s VMs are not TF-managed, +# see memory id=1575); this file is the source of truth, scp'd to +# /etc/kubernetes/audit-policy.yaml and wired into the apiserver static-pod +# manifest + the kubeadm-config ConfigMap (so "kubeadm upgrade" preserves it). +# +# Tuned for LOW WRITE VOLUME (the cluster's sdc HDD is write-sensitive, see +# memory id=559): reads are dropped entirely, high-churn resources and probe +# endpoints are dropped, and the verbose RequestReceived stage is omitted, so +# only one Metadata-level line is written per mutating request. +apiVersion: audit.k8s.io/v1 +kind: Policy +# Only emit the post-execution stage -- halves volume vs logging both stages. +omitStages: + - RequestReceived +rules: + # 1. Never log read-only verbs -- the overwhelming majority of traffic and + # irrelevant to "who changed/deleted X". + - level: None + verbs: ["get", "list", "watch"] + + # 2. Drop high-churn / low-value resources even on writes. + - level: None + resources: + - group: "" + resources: ["events", "endpoints", "nodes/status", "pods/status"] + - group: "coordination.k8s.io" + resources: ["leases"] + - group: "discovery.k8s.io" + resources: ["endpointslices"] + - group: "metrics.k8s.io" + - group: "authentication.k8s.io" + resources: ["tokenreviews"] + - group: "authorization.k8s.io" + resources: ["subjectaccessreviews", "selfsubjectaccessreviews"] + + # 3. Drop noisy non-resource probe / discovery URLs. + - level: None + nonResourceURLs: + - "/healthz*" + - "/readyz*" + - "/livez*" + - "/version" + - "/metrics" + - "/openapi/*" + - "/swagger*" + + # 4. Everything else (every create/update/patch/delete on real resources): + # record WHO (user + sourceIP + userAgent), WHAT (resource/namespace/name), + # WHEN, and the verb -- at Metadata level (no request/response bodies, so + # each entry stays small). + - level: Metadata diff --git a/stacks/monitoring/modules/monitoring/alloy.yaml b/stacks/monitoring/modules/monitoring/alloy.yaml index cd9d223b..e146d498 100644 --- a/stacks/monitoring/modules/monitoring/alloy.yaml +++ b/stacks/monitoring/modules/monitoring/alloy.yaml @@ -222,11 +222,14 @@ alloy: forward_to = [loki.write.default.receiver] } - // Kubernetes audit log collection from /var/log/kubernetes/audit.log - // Requires alloy.mounts.varlog=true to mount /var/log from the host + // Kubernetes audit log collection from /var/log/kubernetes/audit/audit.log + // (kube-apiserver --audit-log-path on k8s-master; rotated siblings stay in + // the audit/ subdir). Requires alloy.mounts.varlog=true to mount /var/log + // from the host. Enabled 2026-06-06 once apiserver audit actually started + // writing — see infra/scripts/k8s-apiserver-audit-policy.yaml. local.file_match "audit_logs" { path_targets = [{ - __path__ = "/var/log/kubernetes/audit.log", + __path__ = "/var/log/kubernetes/audit/audit.log", job = "kubernetes-audit", node = env("HOSTNAME"), }]