infra/stacks/vpa/modules/vpa/main.tf

variable "tls_secret_name" {
  type      = string
  sensitive = true
}

variable "tier" { type = string }
# Namespace shared by VPA and Goldilocks.
resource "kubernetes_namespace" "vpa" {
  metadata {
    name = "vpa"
    labels = {
      tier               = var.tier
      "keel.sh/enrolled" = "true" # enrolls the namespace in Keel
    }
  }
}
module "tls_secret" {
  source          = "../../../../modules/kubernetes/setup_tls_secret"
  namespace       = kubernetes_namespace.vpa.metadata[0].name
  tls_secret_name = var.tls_secret_name
}
# -----------------------------------------------------------------------------
# VPA — Vertical Pod Autoscaler (Fairwinds Helm chart)
# -----------------------------------------------------------------------------
resource "helm_release" "vpa" {
  namespace        = kubernetes_namespace.vpa.metadata[0].name
  create_namespace = false
  name             = "vpa"
  atomic           = true
  repository       = "https://charts.fairwinds.com/stable"
  chart            = "vpa"

  values = [yamlencode({
    recommender = {
      enabled = true
      resources = {
        requests = {
          cpu    = "50m"
          memory = "200Mi"
        }
        limits = {
          memory = "200Mi"
        }
      }
    }
    updater = {
      enabled = true
      resources = {
        requests = {
          cpu    = "50m"
          memory = "200Mi"
        }
        limits = {
          memory = "200Mi"
        }
      }
    }
    admissionController = {
      enabled = true
      resources = {
        requests = {
          cpu    = "50m"
          memory = "200Mi"
        }
        limits = {
          memory = "200Mi"
        }
      }
    }
  })]
}
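# Sketch only (not wired in above): the recommender, updater, and
# admissionController blocks repeat the same resource settings, so they could
# be declared once and reused. `vpa_component_resources` is a hypothetical
# local name, not something defined elsewhere in this module.
locals {
  vpa_component_resources = {
    requests = {
      cpu    = "50m"
      memory = "200Mi"
    }
    limits = {
      memory = "200Mi"
    }
  }
}
# Assumed usage inside the yamlencode() call:
#   recommender = { enabled = true, resources = local.vpa_component_resources }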
# -----------------------------------------------------------------------------
# Goldilocks — VPA dashboard (Fairwinds Helm chart)
# -----------------------------------------------------------------------------
resource "helm_release" "goldilocks" {
  namespace        = kubernetes_namespace.vpa.metadata[0].name
  create_namespace = false
  name             = "goldilocks"
  atomic           = true
  repository       = "https://charts.fairwinds.com/stable"
  chart            = "goldilocks"

  values = [yamlencode({
    controller = {
      flags = {
        on-by-default = "true"
      }
    }
    dashboard = {
      replicaCount = 1
      flags = {
        on-by-default = "true"
      }
    }
  })]

  depends_on = [helm_release.vpa]
}
# -----------------------------------------------------------------------------
# Ingress — Goldilocks dashboard at goldilocks.viktorbarzin.me
# -----------------------------------------------------------------------------
module "ingress" {
  source          = "../../../../modules/kubernetes/ingress_factory"
  dns_type        = "proxied" # ingress_factory auto-creates the Cloudflare DNS record
  namespace       = kubernetes_namespace.vpa.metadata[0].name
  name            = "goldilocks"
  service_name    = "goldilocks-dashboard"
  port            = 80
  tls_secret_name = var.tls_secret_name
  auth            = "required" # standard Authentik forward-auth
  extra_annotations = {
    "gethomepage.dev/enabled"      = "true"
    "gethomepage.dev/name"         = "Goldilocks"
    "gethomepage.dev/description"  = "Resource recommendations"
    "gethomepage.dev/icon"         = "mdi-scale-balance"
    "gethomepage.dev/group"        = "Core Platform"
    "gethomepage.dev/pod-selector" = ""
  }
  depends_on = [helm_release.goldilocks]
}
# -----------------------------------------------------------------------------
# Kyverno policy — label namespaces for VPA observe-only mode
# -----------------------------------------------------------------------------
# Goldilocks reads the goldilocks.fairwinds.com/vpa-update-mode label on
# namespaces to decide the updateMode for VPA objects it creates.
# All namespaces get "off" — Terraform is the authoritative source of truth
# for container resources. Goldilocks provides recommendations only.
# NOTE: the resource and policy names still say "auto-mode", but the rule below
# sets the update mode to "off".
resource "kubernetes_manifest" "vpa_auto_mode_label" {
  manifest = {
    apiVersion = "kyverno.io/v1"
    kind       = "ClusterPolicy"
    metadata = {
      name = "goldilocks-vpa-auto-mode"
      annotations = {
        "policies.kyverno.io/title"       = "Goldilocks VPA Observe-Only Mode"
        "policies.kyverno.io/description" = "Sets VPA update mode to off for all namespaces. Terraform owns container resources; Goldilocks provides recommendations only."
      }
    }
    spec = {
      rules = [
        {
          name = "label-vpa-off-all"
          # Match every Namespace in the cluster.
          match = {
            any = [
              {
                resources = {
                  kinds = ["Namespace"]
                }
              }
            ]
          }
          # Merge the observe-only label into the Namespace metadata.
          mutate = {
            patchStrategicMerge = {
              metadata = {
                labels = {
                  "goldilocks.fairwinds.com/vpa-update-mode" = "off"
                }
              }
            }
          }
        },
      ]
    }
  }
  depends_on = [helm_release.goldilocks]
}
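# Illustration only, kept commented out because Goldilocks creates these
# objects itself: with the label set to "off", a VPA that Goldilocks generates
# for a workload would be expected to look roughly like the manifest below.
# The object, Deployment, and namespace names are hypothetical; the field names
# follow the autoscaling.k8s.io/v1 VerticalPodAutoscaler API.
#
# manifest = {
#   apiVersion = "autoscaling.k8s.io/v1"
#   kind       = "VerticalPodAutoscaler"
#   metadata = {
#     name      = "goldilocks-example-app"
#     namespace = "example"
#   }
#   spec = {
#     targetRef = {
#       apiVersion = "apps/v1"
#       kind       = "Deployment"
#       name       = "example-app"
#     }
#     updatePolicy = {
#       updateMode = "Off" # recommendations are computed but never applied to pods
#     }
#   }
# }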