Commit graph

3 commits

Author SHA1 Message Date
Viktor Barzin
e097b7eb29 fix(ingress): wire up backend_protocol, remove dead ssl_redirect variable
Post nginx→Traefik migration cleanup:
- backend_protocol now sets serversscheme + serverstransport annotations
  for HTTPS backends (k8s-dashboard, pfsense, nas, idrac, proxmox, etc.)
- Remove ssl_redirect variable (nginx-only, silently ignored by Traefik)
  and all 9 caller references

[ci skip]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 08:45:56 +00:00
Viktor Barzin
7ec627f365 right-size memory requests to unblock GPU workloads and fix dbaas quota [ci skip]
- nvidia: custom LimitRange (128Mi default, was 1Gi from Kyverno) to stop
  inflating GPU operator init containers; saves ~2.5Gi on GPU node
- nvidia: dcgm-exporter 1536Mi → 768Mi (actual usage 489Mi)
- monitoring: prometheus server 4Gi → 3Gi (actual usage 2.6Gi)
- onlyoffice: 2304Mi → 1536Mi (actual usage 1.3Gi)
- immich: frame explicit 64Mi resources (was getting 1Gi LimitRange default)
- dbaas: quota limits.memory 20Gi → 24Gi to fit 3rd MySQL replica

Root cause: Kyverno tier-2-gpu LimitRange injected 1Gi on every NVIDIA init
container (no explicit resources), wasting ~2.5Gi scheduling overhead on the
GPU node. Combined with over-requesting, frigate and immich-ml couldn't schedule.
2026-03-18 08:04:08 +00:00
Viktor Barzin
f7c3a338a5 extract monitoring, nvidia, mailserver, cloudflared, kyverno from platform [ci skip]
Phase 2 of platform stack split. 5 more modules extracted into
independent stacks. All applied successfully with zero destroys.
Cloudflared now reads k8s_users from Vault directly to compute
user_domains. Woodpecker pipeline runs all 8 extracted stacks
in parallel. Memory bumped to 6Gi for 9 concurrent TF processes.
Platform reduced from 27 to 19 modules.
2026-03-18 08:04:07 +00:00