docs(security): wave 1 plan — Kyverno enforce, NetworkPolicy egress, audit logging, source-IP anomaly

Locked design for wave 1 of cluster security hardening. Plan only — implementation lives in beads
code-8ywc and follow-up commits. Captures:

- security.md: Kyverno policy table updated (Audit → Enforce planned for the four security policies
  with the 31-namespace exclude list). New section "Audit Logging & Anomaly Detection" detailing the
  K8s API audit policy, Vault audit device + X-Forwarded-For trust, source-IP anomaly rules (K9, V7,
  S1), and the rejected-canary-tokens / rejected-K1 rationales. New section "NetworkPolicy
  Default-Deny Egress" describing the observe-then-enforce (γ) approach for tier 3+4.
- monitoring.md: new "Security Alerts (Wave 1)" section listing the 16 rules (K2-K9, V1-V7, S1)
  and the Loki ruler → Alertmanager → #security routing path.
- runbooks/security-incident.md (new): per-alert response playbook with LogQL queries, action
  steps, false-positive triage, and SEV1 escalation.
- .claude/CLAUDE.md: new "Security Posture" section summarising the locked decisions: identity
  allowlist is me@viktorbarzin.me ONLY, source-IP allowlist CIDRs, no public-IP access policy,
  rationale for not adopting canary tokens.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
Viktor Barzin 2026-05-18 19:10:16 +00:00
parent 20018cd9b4
commit 01de3babd6
4 changed files with 335 additions and 8 deletions

View file

@ -176,6 +176,35 @@ The email monitoring system uses a CronJob (`email-roundtrip-monitor`, every 10
Uptime Kuma monitors: TCP SMTP (port 25) on `176.12.22.76` (external), IMAP (port 993) on `10.0.20.202`, and Dovecot exporter metrics on port 9166.
#### Security Alerts (Wave 1 — planned, beads `code-8ywc`)
Routed via **Loki ruler → Alertmanager → `#security` Slack receiver**. Same handling path as infra alerts. Single channel with severity labels inside (critical/warning/info), not three separate channels. Detection sources: K8s API audit log (`job=kube-audit`), Vault audit log (`job=vault-audit`), PVE sshd journald (`job=sshd-pve`), Calico flow logs (`job=calico-flow`, W1.6 only).
| # | Source | Event | Severity |
|---|---|---|---|
| K2 | kube-audit | SA token used from outside cluster | critical |
| K3 | kube-audit | Secret read in vault/sealed-secrets/external-secrets by non-allowlisted SA | critical |
| K4 | kube-audit | Exec into vault/kube-system/dbaas/cnpg-system pod by non-allowlisted user | warning |
| K5 | kube-audit | Mass delete (>5 Pod/Secret/CM in 60s) | critical |
| K6 | kube-audit | Audit policy itself modified | critical |
| K7 | kube-audit | New `*,*` ClusterRole created | warning |
| K8 | kube-audit | Anonymous binding granted | critical |
| K9 | kube-audit | `me@viktorbarzin.me` request from non-allowlist sourceIP | critical |
| V1 | vault-audit | Root token created | critical |
| V2 | vault-audit | Audit device disabled/modified | critical |
| V3 | vault-audit | Seal status changed | critical |
| V4 | vault-audit | Policy written/modified (allowlist Terraform actor) | warning |
| V5 | vault-audit | Auth failure spike >10/min | warning |
| V6 | vault-audit | Token with policies different from parent created | critical |
| V7 | vault-audit | Viktor's entity_id from non-allowlist remote_addr (requires `x_forwarded_for_authorized_addrs`) | critical |
| S1 | sshd-pve | sshd auth success from non-allowlist IP | critical |
K1 (cluster-admin grant) intentionally skipped — see security.md.
Allowlist source-IP CIDRs (used by K2, K9, V7, S1): `10.0.20.0/22`, `192.168.1.0/24`, K8s pod CIDR, K8s service CIDR, Headscale tailnet. Policy: no public-IP access; all admin paths transit LAN or Headscale.
IOPS impact estimated ~1-2 GB/day additional disk writes after custom audit-policy tuning. Retention: 90d for security streams.
#### Backup Alerts
- **PostgreSQLBackupStale**: >36h since last backup
- **MySQLBackupStale**: >36h since last backup