[ci skip] Infrastructure hardening: security, monitoring, reliability, maintainability

Phase 1 - Critical Security:
- Netbox: move hardcoded DB/superuser passwords to variables
- MeshCentral: disable public registration, add Authentik auth
- Traefik: disable insecure API dashboard (api.insecure=false)
- Traefik: configure forwarded headers with Cloudflare trusted IPs

Phase 2 - Security Hardening:
- Add security headers middleware (HSTS, X-Frame-Options, nosniff, etc.)
- Add Kyverno pod security policies in audit mode (privileged, host
  namespaces, SYS_ADMIN, trusted registries)
- Tighten rate limiting (avg=10, burst=50)
- Add Authentik protection to grampsweb

Phase 3 - Monitoring & Alerting:
- Add critical service alerts (PostgreSQL, MySQL, Redis, Headscale,
  Authentik, Loki)
- Increase Loki retention from 7 to 30 days (720h)
- Add predictive PV filling alert (predict_linear)
- Re-enable Hackmd and Privatebin down alerts

Phase 4 - Reliability:
- Add resource requests/limits to Redis, DBaaS, Technitium, Headscale,
  Vaultwarden, Uptime Kuma
- Increase Alloy DaemonSet memory to 512Mi/1Gi

Phase 6 - Maintainability:
- Extract duplicated tiers locals to terragrunt.hcl generate block
  (removed from 67 stacks)
- Replace hardcoded NFS IP 10.0.10.15 with var.nfs_server (114
  instances across 63 files)
- Replace hardcoded Redis/PostgreSQL/MySQL/Ollama/mail host references
  with variables across ~35 stacks
- Migrate xray raw ingress resources to ingress_factory modules
This commit is contained in:
Viktor Barzin 2026-02-23 22:05:28 +00:00
parent 1b4737c90c
commit 89a6e08245
104 changed files with 773 additions and 920 deletions

View file

@ -8,6 +8,7 @@ variable "crowdsec_dash_machine_id" { type = string } # used for web dash
variable "crowdsec_dash_machine_password" { type = string } # used for web dash
variable "tier" { type = string }
variable "slack_webhook_url" { type = string }
variable "mysql_host" { type = string }
module "tls_secret" {
source = "../../../../modules/kubernetes/setup_tls_secret"
@ -99,7 +100,7 @@ resource "helm_release" "crowdsec" {
repository = "https://crowdsecurity.github.io/helm-charts"
chart = "crowdsec"
values = [templatefile("${path.module}/values.yaml", { homepage_username = var.homepage_username, homepage_password = var.homepage_password, DB_PASSWORD = var.db_password, ENROLL_KEY = var.enroll_key, SLACK_WEBHOOK_URL = var.slack_webhook_url })]
values = [templatefile("${path.module}/values.yaml", { homepage_username = var.homepage_username, homepage_password = var.homepage_password, DB_PASSWORD = var.db_password, ENROLL_KEY = var.enroll_key, SLACK_WEBHOOK_URL = var.slack_webhook_url, mysql_host = var.mysql_host })]
timeout = 3600
}

View file

@ -81,7 +81,7 @@ lapi:
- name: MB_DB_PASS
value: "${DB_PASSWORD}"
- name: MB_DB_HOST
value: "mysql.dbaas.svc.cluster.local"
value: "${mysql_host}"
- name: MB_EMAIL_SMTP_USERNAME
value: "info@viktorbarzin.me"
@ -166,7 +166,7 @@ config:
user: crowdsec
password: ${DB_PASSWORD}
db_name: crowdsec
host: mysql.dbaas.svc.cluster.local
host: ${mysql_host}
port: 3306
api:
server: