ingress_factory: replace protected bool with auth enum + audit pass across 100 stacks

Phase 3+4 of the default-deny ingress plan. Replaces the `protected = bool`
(default false → unprotected) variable in `modules/kubernetes/ingress_factory`
with an `auth = string` enum (default "required" → fail-closed). Touches every
ingress_factory caller so the audit decision is recorded explicitly in code.

ingress_factory (Phase 3):
- `auth = "required"`: standard Authentik forward-auth (the legacy
  `protected = true` semantic).
- `auth = "public"`: forward-auth via the new `authentik-forward-auth-public`
  middleware → dedicated public outpost → guest auto-bind. Logged-in users
  keep their real identity.
- `auth = "none"`: no Authentik middleware. For Anubis-fronted content, native
  client APIs (Git, /v2/, WebDAV), webhook receivers, the Authentik outpost
  itself.
- `effective_anti_ai` default flips ON only when `auth = "none"` (auth-gated
  ingresses don't need anti-AI noise; the auth flow already discourages bots).
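
As a rough sketch of the module-side change described above, the enum can be written as a validated string variable, with the anti-AI default derived from it. Only `auth`, `effective_anti_ai`, and `authentik-forward-auth-public` come from this commit message; the `anti_ai` override variable and the `authentik-forward-auth` middleware name are assumptions for illustration, and the real module may be structured differently.

```hcl
# Sketch only, not the module's actual source.
variable "auth" {
  type        = string
  default     = "required" # fail-closed: omitting the argument requires login
  description = "Ingress auth mode: required, public, or none."

  validation {
    condition     = contains(["required", "public", "none"], var.auth)
    error_message = "The auth value must be one of: required, public, none."
  }
}

# Hypothetical per-ingress override; null means "derive from auth".
variable "anti_ai" {
  type    = bool
  default = null
}

locals {
  # Forward-auth middleware per mode. "authentik-forward-auth" is an assumed
  # name for the standard middleware; null means no Authentik middleware.
  forward_auth_middleware = (
    var.auth == "required" ? "authentik-forward-auth" :
    var.auth == "public" ? "authentik-forward-auth-public" :
    null
  )

  # Anti-AI defaults to on only when nothing gates the ingress.
  effective_anti_ai = var.anti_ai != null ? var.anti_ai : (var.auth == "none")
}
```

With a validation block like this, a typo such as `auth = "publc"` would fail at plan time instead of silently falling back to some default.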

Audit pass (Phase 4) across 96 ingress_factory call sites:
- 49 explicit `protected = true` → `auth = "required"`
- 8 explicit `protected = false` → `auth = "none"` (5) or `auth = "public"` (3)
- 64 previously-default (no protected line) → `auth = "required"` ADDED, then
  reviewed individually:
  * 9 Anubis-fronted (blog, www, kms, travel, f1, cyberchef, jsoncrack,
    homepage, wrongmove UI, privatebin) → `auth = "none"`
  * 22 native-client / programmatic surfaces (Forgejo Git+/v2/, webhook
    handler, claude-memory MCP, Nextcloud WebDAV, Matrix, Vault CLI/OIDC,
    xray VPN, ntfy, woodpecker webhooks, n8n triggers, ntfy push, dawarich
    location ingestion, immich frame kiosk, headscale CP, send anonymous
    drops, rybbit beacon, vaultwarden API, Authentik UI itself + outposts) →
    `auth = "none"`
  * Remaining ~33 → `auth = "required"` confirmed (admin tools, internal
    UIs, services without app-level auth)
- Smoke-test promotions to `auth = "public"`: fire-planner public UI,
  k8s-portal API, insta2spotify callback.
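
For illustration, the three audit outcomes might look like this at a call site. Module names, namespaces, and the `example.com` domain are placeholders, the relative `source` path depends on where the stack lives, and any other arguments the module requires are omitted.

```hcl
# Hypothetical call sites; every name and value below is a placeholder.

# Internal UI with no app-level auth: fail-closed, stated explicitly.
module "admin-tool-ingress" {
  source      = "../../modules/kubernetes/ingress_factory"
  auth        = "required"
  namespace   = "admin-tool"
  name        = "admin-tool"
  root_domain = "example.com"
}

# Public-facing UI: goes through the public outpost, guests are auto-bound.
module "public-ui-ingress" {
  source      = "../../modules/kubernetes/ingress_factory"
  auth        = "public"
  namespace   = "public-ui"
  name        = "public-ui"
  root_domain = "example.com"
}

# Webhook receiver: callers cannot complete a browser login, so no middleware.
module "webhook-ingress" {
  source      = "../../modules/kubernetes/ingress_factory"
  auth        = "none"
  namespace   = "webhooks"
  name        = "webhook"
  root_domain = "example.com"
}
```

A short comment next to each `auth` line, as above, keeps the reason for the audit decision beside the decision itself.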

Three call sites in wrapper modules (`stacks/freedify/factory/`,
`stacks/reverse-proxy/modules/reverse_proxy/`) keep their internal `protected`
bool. They translate it to `auth` internally, so they are out of scope for
this rename.
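
A wrapper that keeps its `protected` bool could translate it along these lines. This is a guess at the shape, not the wrappers' actual code: the mapping of `false` to `"none"`, the variable set, and the `source` path are all assumptions.

```hcl
# Sketch of a wrapper module that keeps its legacy bool interface.
variable "protected" {
  type = bool # default, if any, is whatever the wrapper already defines
}

variable "namespace" {
  type = string
}

variable "name" {
  type = string
}

module "ingress" {
  source = "../../../../modules/kubernetes/ingress_factory"

  # Assumed mapping: true -> "required", false -> "none".
  auth      = var.protected ? "required" : "none"
  namespace = var.namespace
  name      = var.name
}
```

Keeping the translation inside the wrapper leaves its callers untouched while the underlying factory still receives an explicit `auth` value.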

Behavior change: previously-default ingresses now fail closed (require
Authentik login) unless explicitly flipped to `auth = "none"` or
`auth = "public"`. This is the audit goal: no more accidentally unprotected
surfaces. Sites that were intentionally public (Anubis content, native APIs,
webhooks) are now explicitly recorded as `auth = "none"`.

Drive-by: `modules/create-vm/main.tf` picked up cosmetic alignment via
`terraform fmt -recursive` during the audit. Behavior-neutral.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

parent 317d6aa99f
commit e4f806abe3
100 changed files with 351 additions and 165 deletions

@@ -125,6 +125,7 @@ resource "kubernetes_service" "idrac-redfish-exporter" {
 
 module "idrac-redfish-exporter-ingress" {
   source = "../../../../modules/kubernetes/ingress_factory"
+  auth = "required"
   namespace = kubernetes_namespace.monitoring.metadata[0].name
   name = "idrac-redfish-exporter"
   root_domain = "viktorbarzin.lan"

@@ -1874,6 +1874,35 @@ serverFiles:
               severity: warning
             annotations:
               summary: "ResourceQuota {{ $labels.namespace }}/{{ $labels.resourcequota }} {{ $labels.resource }} at {{ $value | printf \"%.1f\" }} — workloads may fail to reschedule"
+          # K8sVersionSkew: kubelet on any node disagrees with the apiserver's gitVersion.
+          # Catches a half-done kubeadm rollout — e.g. master at 1.34.5 but a worker
+          # still on 1.34.2 after the agent aborted mid-flight. Distinct gitVersion
+          # count >1 across kubernetes-nodes + kubernetes-apiservers means skew exists.
+          # 30m for: gives a normal rolling upgrade (master + 4 workers + 10-min soaks
+          # ≈ 60-90 min) room to be in mid-progress without firing during a healthy
+          # run — but only because Prometheus only counts a node post-restart, and the
+          # agent's soak between workers exceeds 10min anyway.
+          - alert: K8sVersionSkew
+            expr: count(count by (git_version) (kubernetes_build_info{job=~"kubernetes-nodes|kubernetes-apiservers"})) > 1
+            for: 30m
+            labels:
+              severity: warning
+            annotations:
+              summary: "Kubelet/apiserver gitVersion skew detected — possible half-done k8s upgrade. Inspect: kubectl get nodes -o jsonpath='{.items[*].status.nodeInfo.kubeletVersion}'"
+          # EtcdPreUpgradeSnapshotMissing: the k8s-version-upgrade agent pushes
+          # k8s_upgrade_in_flight=1 when it starts, and k8s_upgrade_snapshot_taken=1
+          # after the etcdctl snapshot is verified. If we see in_flight=1 with no
+          # corresponding snapshot_taken=1 after 10 min, the agent has skipped or
+          # failed the snapshot — that's a critical safety hole.
+          - alert: EtcdPreUpgradeSnapshotMissing
+            expr: |
+              k8s_upgrade_in_flight == 1
+              unless on() k8s_upgrade_snapshot_taken == 1
+            for: 10m
+            labels:
+              severity: critical
+            annotations:
+              summary: "K8s upgrade is in flight but no etcd snapshot was recorded — pipeline pre-flight failed silently"
       - name: "Traefik Ingress"
         rules:
           - alert: TraefikDown

@@ -124,6 +124,7 @@ resource "kubernetes_service" "snmp-exporter" {
 
 module "snmp-exporter-ingress" {
   source = "../../../../modules/kubernetes/ingress_factory"
+  auth = "required"
   namespace = kubernetes_namespace.monitoring.metadata[0].name
   name = "snmp-exporter"
   root_domain = "viktorbarzin.lan"