security(wave1): W1.2 Vault XFF (applied) + W1.4/W1.5 Kyverno code prep (apply blocked on provider crash)

## W1.2 — Vault audit device + X-Forwarded-For (APPLIED + VERIFIED)
- Added `x_forwarded_for_authorized_addrs = "10.10.0.0/16"` to vault listener config.
  Trust X-Forwarded-For from in-cluster sources (pod CIDR). Without this, every
  vault audit log entry shows Traefik's pod IP instead of the real client IP —
  the V7 alert rule (Viktor identity from non-allowlist source IP) needs the
  real client IP to be meaningful.
- Applied via `tg apply -target=helm_release.vault` (vault stack has pre-existing
  for_each unknown issues unrelated to this change; -target documented in error
  message itself as the workaround).
- Rolling restart of vault-{0,1,2} performed manually (StatefulSet uses OnDelete
  update strategy, not RollingUpdate). All 3 pods rejoined Raft + auto-unsealed
  within ~10s each. Verified XFF config visible in pod's
  /vault/config/extraconfig-from-values.hcl.
- The `vault_audit "file"` resource was already in TF at line 287 (writing to
  /vault/audit/vault-audit.log) — no change needed.

## W1.4 + W1.5 — Kyverno enforce flip (CODE ONLY, apply BLOCKED)
- Added shared `local.security_policy_exclude_namespaces` (31 critical namespaces
  from memory id=1970 + `frigate, kured, default, changedetection` discovered
  during the live-cluster pre-flight check for privileged/hostNetwork/SYS_ADMIN
  pods that would be blocked by Enforce).
- Flipped 3 security policies Audit → Enforce: deny-privileged-containers,
  deny-host-namespaces, restrict-sys-admin. failurePolicy=Ignore preserved at
  chart level.
- `require-trusted-registries` STAYS in Audit mode pending allowlist tightening
  (current pattern includes `*/*` which matches anything-with-a-slash, so Enforce
  would be a no-op for supply chain). Tracked under beads `code-8ywc` W1.5.

**Apply blocker**: `tg plan` panics with `terraform-provider-kubernetes_v3.1.0`
crash on the kubernetes_manifest resources (`ElementKeyInt(0): can't use
tftypes.Object...` — provider schema mismatch on Kyverno CRDs). The crash
reproduces on the UNMODIFIED file, so it's a pre-existing provider issue, not
caused by these changes. Resolving it requires either upgrading the provider or
finding a kubernetes_manifest-compatible workaround. Tracked under `code-8ywc`.

## Wave 1 status after this commit
- W1.2: APPLIED + VERIFIED (vault XFF + audit device already in place)
- W1.4 + W1.5: code ready, apply blocked on provider crash
- W1.1, W1.3, W1.6, W1.7: not started in this session

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
Viktor Barzin 2026-05-18 19:26:39 +00:00 committed by Viktor Barzin
parent 87961e9ef8
commit ae0c1701ec
2 changed files with 54 additions and 8 deletions

View file

@ -1,8 +1,33 @@
# =============================================================================
# Pod Security Policies (Audit Mode)
# Pod Security Policies
# =============================================================================
# Kyverno validate policies for pod security standards.
# All policies start in Audit mode - violations are logged but not blocked.
# Wave 1 (locked 2026-05-18, beads code-8ywc): deny-privileged-containers,
# deny-host-namespaces, restrict-sys-admin flipped from Audit Enforce with
# a shared 32-namespace exclude list. require-trusted-registries STAYS in
# Audit until the allowlist pattern is tightened beyond `*/*` (separate work
# item current pattern allows everything with a slash, so Enforce would be
# a no-op for supply-chain protection).
# failurePolicy stays Ignore (chart-level) to prevent admission webhook
# failures from cascading.
# Shared namespace exclude list 31 critical namespaces from the Keel rollout
# (memory id=1970) + `frigate` (legitimately needs host access for camera RTSP).
locals {
security_policy_exclude_namespaces = [
"keel", "calico-system", "authentik", "vault", "cnpg-system", "dbaas",
"monitoring", "traefik", "technitium", "mailserver", "kyverno",
"metallb-system", "external-secrets", "proxmox-csi", "nfs-csi", "nvidia",
"kube-system", "cloudflared", "crowdsec", "reverse-proxy", "reloader",
"descheduler", "vpa", "redis", "sealed-secrets", "headscale", "wireguard",
"xray", "infra-maintenance", "metrics-server", "tigera-operator", "frigate",
# Additions discovered during wave 1 enforce flip these contain workloads
# that legitimately need privileged / hostNetwork / SYS_ADMIN:
"kured", # kured DaemonSet is privileged (manages node reboots)
"default", # etcd backup + defrag CronJobs use hostNetwork
"changedetection", # uses SYS_ADMIN for chromium sandbox
]
}
resource "kubernetes_manifest" "policy_deny_privileged" {
manifest = {
@ -18,7 +43,7 @@ resource "kubernetes_manifest" "policy_deny_privileged" {
}
}
spec = {
validationFailureAction = "Audit"
validationFailureAction = "Enforce"
background = true
rules = [{
name = "deny-privileged"
@ -32,7 +57,7 @@ resource "kubernetes_manifest" "policy_deny_privileged" {
exclude = {
any = [{
resources = {
namespaces = ["frigate", "nvidia", "monitoring"]
namespaces = local.security_policy_exclude_namespaces
}
}]
}
@ -74,7 +99,7 @@ resource "kubernetes_manifest" "policy_deny_host_namespaces" {
}
}
spec = {
validationFailureAction = "Audit"
validationFailureAction = "Enforce"
background = true
rules = [{
name = "deny-host-namespaces"
@ -88,7 +113,7 @@ resource "kubernetes_manifest" "policy_deny_host_namespaces" {
exclude = {
any = [{
resources = {
namespaces = ["frigate", "monitoring"]
namespaces = local.security_policy_exclude_namespaces
}
}]
}
@ -123,7 +148,7 @@ resource "kubernetes_manifest" "policy_restrict_capabilities" {
}
}
spec = {
validationFailureAction = "Audit"
validationFailureAction = "Enforce"
background = true
rules = [{
name = "restrict-sys-admin"
@ -137,7 +162,7 @@ resource "kubernetes_manifest" "policy_restrict_capabilities" {
exclude = {
any = [{
resources = {
namespaces = ["nvidia", "monitoring"]
namespaces = local.security_policy_exclude_namespaces
}
}]
}
@ -265,6 +290,11 @@ resource "kubernetes_manifest" "policy_require_trusted_registries" {
}
}
spec = {
# NOTE: Stays in Audit mode pending allowlist tightening. The current
# pattern includes `*/*` which matches any image with a registry flipping
# to Enforce would not actually restrict supply chain. Tightening the
# allowlist to a precise enumeration of in-use registries is tracked
# separately under beads code-8ywc (W1.5 follow-up).
validationFailureAction = "Audit"
background = true
rules = [{
@ -276,6 +306,13 @@ resource "kubernetes_manifest" "policy_require_trusted_registries" {
}
}]
}
exclude = {
any = [{
resources = {
namespaces = local.security_policy_exclude_namespaces
}
}]
}
validate = {
message = "Images must be from trusted registries (docker.io, ghcr.io, quay.io, registry.k8s.io, or local cache)."
pattern = {

View file

@ -89,6 +89,15 @@ resource "helm_release" "vault" {
tls_disable = 1
address = "[::]:8200"
cluster_address = "[::]:8201"
# Trust X-Forwarded-For from in-cluster sources only (Traefik runs in pod CIDR).
# Without this, every audit log entry shows Traefik's pod IP instead of the
# real client IP V7 (Viktor identity from non-allowlist source IP) needs the
# real client IP to work. Pod CIDR = 10.10.0.0/16 per kubeadm-config.
# See docs/architecture/security.md "Audit Logging & Anomaly Detection".
x_forwarded_for_authorized_addrs = "10.10.0.0/16"
x_forwarded_for_reject_not_authorized = false
x_forwarded_for_reject_not_present = false
}
storage "raft" {