docs(security): wave 1 plan — Kyverno enforce, NetworkPolicy egress, audit logging, source-IP anomaly

Locked design for wave 1 of cluster security hardening. Plan only; implementation lives in beads code-8ywc and follow-up commits.

Captures:
- security.md: Kyverno policy table updated (Audit → Enforce planned for the four security policies, with the 31-namespace exclude list). New section "Audit Logging & Anomaly Detection" detailing the K8s API audit policy, the Vault audit device + X-Forwarded-For trust, the source-IP anomaly rules (K9, V7, S1), and the rationales for rejecting canary tokens and K1. New section "NetworkPolicy Default-Deny Egress" describing the observe-then-enforce (γ) approach for tiers 3 and 4.
- monitoring.md: new "Security Alerts (Wave 1)" section listing the 16 rules (K2-K9, V1-V7, S1) and the Loki ruler → Alertmanager → #security routing path.
- runbooks/security-incident.md (new): per-alert response playbook with LogQL queries, action steps, false-positive triage, and SEV1 escalation.
- .claude/CLAUDE.md: new "Security Posture" section summarising the locked decisions: the identity allowlist is me@viktorbarzin.me ONLY, the source-IP allowlist CIDRs, the no-public-IP access policy, and the rationale for not adopting canary tokens.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
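A minimal sketch of the source-IP allowlist check that the K9, V7 and S1 rules rely on, using only Python's standard-library `ipaddress` module. The first two CIDRs (`10.0.20.0/22`, `192.168.1.0/24`) are the ones listed in monitoring.md. The pod CIDR, service CIDR and Headscale prefix below are hypothetical placeholders, since the plan names those ranges without spelling them out. `is_allowlisted` and `should_alert` are illustrative names, not part of the planned Loki rules. The sketch takes one address string. A real rule would read it from the audit event, for example the K8s audit event's `sourceIPs` list.

```python
import ipaddress

# Allowlisted source ranges. The first two come from monitoring.md; the
# last three are HYPOTHETICAL placeholders for the K8s pod CIDR, the K8s
# service CIDR and the Headscale tailnet prefix.
ALLOWLIST = [
    ipaddress.ip_network("10.0.20.0/22"),
    ipaddress.ip_network("192.168.1.0/24"),
    ipaddress.ip_network("10.244.0.0/16"),  # assumed pod CIDR
    ipaddress.ip_network("10.96.0.0/12"),   # assumed service CIDR
    ipaddress.ip_network("100.64.0.0/10"),  # assumed Headscale prefix
]


def is_allowlisted(source_ip: str) -> bool:
    """Return True if source_ip falls inside any allowlisted network."""
    try:
        addr = ipaddress.ip_address(source_ip)
    except ValueError:
        # Unparseable addresses are treated as not allowlisted.
        return False
    # An IPv6 address is never "in" an IPv4 network, so mixed versions are safe.
    return any(addr in net for net in ALLOWLIST)


def should_alert(identity: str, source_ip: str,
                 watched_identity: str = "me@viktorbarzin.me") -> bool:
    """K9-style check: fire when the watched identity appears from an
    address outside the allowlist."""
    return identity == watched_identity and not is_allowlisted(source_ip)


if __name__ == "__main__":
    print(should_alert("me@viktorbarzin.me", "10.0.21.5"))    # False: inside 10.0.20.0/22
    print(should_alert("me@viktorbarzin.me", "203.0.113.7"))  # True: public address
```

Parsing the CIDRs once into network objects keeps each check a simple membership test. It also makes a malformed allowlist entry fail loudly at load time instead of silently never matching.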
2026-05-18 19:10:16 +00:00 · 2026-05-18 19:10:16 +00:00 · 01de3babd6
commit 01de3babd6
parent 20018cd9b4
4 changed files with 335 additions and 8 deletions
--- a/docs/architecture/monitoring.md
+++ b/docs/architecture/monitoring.md
@@ -176,6 +176,35 @@ The email monitoring system uses a CronJob (`email-roundtrip-monitor`, every 10

 Uptime Kuma monitors: TCP SMTP (port 25) on `176.12.22.76` (external), IMAP (port 993) on `10.0.20.202`, and Dovecot exporter metrics on port 9166.

+#### Security Alerts (Wave 1 — planned, beads `code-8ywc`)
+
+Routed via **Loki ruler → Alertmanager → `#security` Slack receiver**, the same handling path as infra alerts. A single channel carries all severities, distinguished by severity labels (critical/warning/info), rather than one channel per severity. Detection sources: K8s API audit log (`job=kube-audit`), Vault audit log (`job=vault-audit`), PVE sshd journald (`job=sshd-pve`), and Calico flow logs (`job=calico-flow`, W1.6 only).
+
+| # | Source | Event | Severity |
+|---|---|---|---|
+| K2 | kube-audit | SA token used from outside the cluster | critical |
+| K3 | kube-audit | Secret read in the vault/sealed-secrets/external-secrets namespaces by a non-allowlisted SA | critical |
+| K4 | kube-audit | Exec into a pod in vault/kube-system/dbaas/cnpg-system by a non-allowlisted user | warning |
+| K5 | kube-audit | Mass delete (>5 Pod/Secret/ConfigMap deletions in 60s) | critical |
+| K6 | kube-audit | Audit policy itself modified | critical |
+| K7 | kube-audit | New `*,*` ClusterRole created | warning |
+| K8 | kube-audit | Anonymous binding granted | critical |
+| K9 | kube-audit | `me@viktorbarzin.me` request from a non-allowlisted sourceIP | critical |
+| V1 | vault-audit | Root token created | critical |
+| V2 | vault-audit | Audit device disabled/modified | critical |
+| V3 | vault-audit | Seal status changed | critical |
+| V4 | vault-audit | Policy written/modified by any actor other than the allowlisted Terraform actor | warning |
+| V5 | vault-audit | Auth failure spike (>10/min) | warning |
+| V6 | vault-audit | Token created with policies different from its parent's | critical |
+| V7 | vault-audit | Viktor's entity_id seen from a non-allowlisted remote_addr (requires `x_forwarded_for_authorized_addrs`) | critical |
+| S1 | sshd-pve | sshd auth success from a non-allowlisted IP | critical |
+
+K1 (cluster-admin grant) is intentionally skipped; see security.md for the rationale.
+
+Allowlist source-IP CIDRs (used by K2, K9, V7, S1): `10.0.20.0/22`, `192.168.1.0/24`, K8s pod CIDR, K8s service CIDR, Headscale tailnet. Policy: no public-IP access; all admin paths transit LAN or Headscale.
+
+Estimated overhead: ~1-2 GB/day of additional disk writes once the custom audit policy is tuned. Retention: 90d for security streams.
+
 #### Backup Alerts
 - **PostgreSQLBackupStale**: >36h since last backup
 - **MySQLBackupStale**: >36h since last backup