docs(security): wave 1 plan — Kyverno enforce, NetworkPolicy egress, audit logging, source-IP anomaly
Locked design for wave 1 of cluster security hardening. Plan only; implementation lives in beads code-8ywc and follow-up commits.

Captures:

- security.md: Kyverno policy table updated (Audit → Enforce planned for the four security policies with the 31-namespace exclude list). New section "Audit Logging & Anomaly Detection" detailing the K8s API audit policy, Vault audit device + X-Forwarded-For trust, source-IP anomaly rules (K9, V7, S1), and the rejected-canary-tokens / rejected-K1 rationales. New section "NetworkPolicy Default-Deny Egress" describing the observe-then-enforce (γ) approach for tier 3+4.
- monitoring.md: new "Security Alerts (Wave 1)" section listing the 16 rules (K2-K9, V1-V7, S1) and the Loki ruler → Alertmanager → #security routing path.
- runbooks/security-incident.md (new): per-alert response playbook with LogQL queries, action steps, false-positive triage, and SEV1 escalation.
- .claude/CLAUDE.md: new "Security Posture" section summarising the locked decisions: identity allowlist is me@viktorbarzin.me ONLY, source-IP allowlist CIDRs, no public-IP access policy, rationale for not adopting canary tokens.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
parent 20018cd9b4
commit 01de3babd6
4 changed files with 335 additions and 8 deletions
@@ -176,6 +176,35 @@ The email monitoring system uses a CronJob (`email-roundtrip-monitor`, every 10

Uptime Kuma monitors: TCP SMTP (port 25) on `176.12.22.76` (external), IMAP (port 993) on `10.0.20.202`, and Dovecot exporter metrics on port 9166.

#### Security Alerts (Wave 1: planned, beads `code-8ywc`)

Routed via **Loki ruler → Alertmanager → `#security` Slack receiver**, the same handling path as infra alerts. All alerts land in a single channel and carry a severity label (critical/warning/info) instead of being split across three channels. Detection sources: K8s API audit log (`job=kube-audit`), Vault audit log (`job=vault-audit`), PVE sshd journald (`job=sshd-pve`), Calico flow logs (`job=calico-flow`, W1.6 only).

| # | Source | Event | Severity |
|---|---|---|---|
| K2 | kube-audit | SA token used from outside cluster | critical |
| K3 | kube-audit | Secret read in vault/sealed-secrets/external-secrets by non-allowlisted SA | critical |
| K4 | kube-audit | Exec into vault/kube-system/dbaas/cnpg-system pod by non-allowlisted user | warning |
| K5 | kube-audit | Mass delete (>5 Pod/Secret/ConfigMap deletions in 60s) | critical |
| K6 | kube-audit | Audit policy itself modified | critical |
| K7 | kube-audit | New `*,*` ClusterRole created | warning |
| K8 | kube-audit | Anonymous binding granted | critical |
| K9 | kube-audit | `me@viktorbarzin.me` request from non-allowlist sourceIP | critical |
| V1 | vault-audit | Root token created | critical |
| V2 | vault-audit | Audit device disabled/modified | critical |
| V3 | vault-audit | Seal status changed | critical |
| V4 | vault-audit | Policy written/modified (Terraform actor allowlisted) | warning |
| V5 | vault-audit | Auth failure spike >10/min | warning |
| V6 | vault-audit | Token with policies different from parent created | critical |
| V7 | vault-audit | Viktor's entity_id from non-allowlist remote_addr (requires `x_forwarded_for_authorized_addrs`) | critical |
| S1 | sshd-pve | sshd auth success from non-allowlist IP | critical |

K1 (cluster-admin grant) is intentionally skipped; see security.md for the rationale.
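
The two rate-based rules in the table (K5: more than 5 deletions in 60s; V5: more than 10 auth failures per minute) share the same threshold-over-a-trailing-window shape. The following is a minimal Python sketch of that logic only, for reasoning about the thresholds. It is not the Loki ruler implementation, and the class and method names are hypothetical.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events in a trailing time window and flags threshold breaches.

    Hypothetical helper for illustration; the planned rules are evaluated
    by the Loki ruler, not by code like this.
    """

    def __init__(self, threshold, window_seconds):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._timestamps = deque()  # event times (seconds), oldest first

    def observe(self, timestamp):
        """Record one event; return True once the window holds more than `threshold` events."""
        self._timestamps.append(timestamp)
        cutoff = timestamp - self.window_seconds
        # Drop events that fell out of the window (t - window, t].
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) > self.threshold


# K5-style check: the sixth deletion inside 60 seconds crosses ">5".
k5 = SlidingWindowCounter(threshold=5, window_seconds=60)
k5_results = [k5.observe(t) for t in (0, 10, 20, 30, 40, 50)]

# V5-style check: same shape with the ">10/min" threshold.
v5 = SlidingWindowCounter(threshold=10, window_seconds=60)
v5_results = [v5.observe(t) for t in range(11)]
```

In this sketch the strict `>` comparison mirrors the `>5` and `>10` wording in the table: exactly 5 deletions (or exactly 10 failures) inside the window does not fire.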

Allowlist source-IP CIDRs (used by K2, K9, V7, S1): `10.0.20.0/22`, `192.168.1.0/24`, K8s pod CIDR, K8s service CIDR, Headscale tailnet. Policy: no public-IP access; all admin paths transit LAN or Headscale.
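
The membership test behind the source-IP rules can be sketched with Python's standard `ipaddress` module. Only the two CIDRs written out above are included. The pod CIDR, service CIDR, and Headscale tailnet ranges are not stated in this section, so they are left as a caller-supplied parameter rather than guessed. The function names are hypothetical.

```python
import ipaddress

# The only CIDRs spelled out in the text above.
DOCUMENTED_CIDRS = ("10.0.20.0/22", "192.168.1.0/24")


def build_allowlist(extra_cidrs=()):
    """Parse the documented CIDRs plus any caller-supplied ranges.

    `extra_cidrs` is where the K8s pod CIDR, K8s service CIDR, and Headscale
    tailnet would go; their values are not given here, so none are assumed.
    """
    return [ipaddress.ip_network(cidr) for cidr in (*DOCUMENTED_CIDRS, *extra_cidrs)]


def is_allowlisted(source_ip, allowlist):
    """Return True if `source_ip` falls inside any allowlisted network."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in network for network in allowlist)


allowlist = build_allowlist()
internal_ok = is_allowlisted("10.0.20.202", allowlist)   # internal IMAP address from this doc
external_ok = is_allowlisted("176.12.22.76", allowlist)  # external SMTP address from this doc
```

With only the documented CIDRs loaded, the internal address `10.0.20.202` matches `10.0.20.0/22` and the public address `176.12.22.76` does not, which is the distinction K2, K9, V7, and S1 rely on.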

Disk I/O impact: estimated ~1-2 GB/day of additional disk writes after custom audit-policy tuning. Retention: 90d for security streams.
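
As a rough sizing check, the two figures above can be multiplied out. This sketch assumes the whole estimated write volume is kept for the full 90 days with no compression. That is an upper-bound assumption for illustration, not a measured number, and the function name is hypothetical.

```python
RETENTION_DAYS = 90  # retention for security streams, from the text above


def retained_volume_gb(daily_gb, retention_days=RETENTION_DAYS):
    """Upper-bound retained volume, assuming every written byte is kept uncompressed."""
    return daily_gb * retention_days


low_gb = retained_volume_gb(1)   # lower end of the ~1-2 GB/day estimate
high_gb = retained_volume_gb(2)  # upper end of the estimate
print(f"Upper-bound retained volume: {low_gb}-{high_gb} GB")
```

Under those assumptions the security streams would need on the order of 90-180 GB at steady state. Actual usage is likely lower, since Loki stores log chunks compressed.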

#### Backup Alerts
- **PostgreSQLBackupStale**: >36h since last backup
- **MySQLBackupStale**: >36h since last backup