No description
Find a file
Viktor Barzin 194281e527 right-size cluster memory: reduce overprovisioned, fix under-provisioned services
Phase 1 - Quick wins (~4.5 Gi saved):
- democratic-csi: add explicit sidecar resources (64-80Mi vs 256Mi LimitRange default)
- caretta: 768Mi → 600Mi (VPA upper 485Mi)
- immich-ml: 4Gi → 3584Mi (VPA upper 2.95Gi, GPU margin)
- onlyoffice: 3Gi → 2304Mi (VPA upper 1.82Gi)

Phase 2 - Safety fixes (prevent OOMKills):
- frigate: 2Gi/8Gi → 5Gi/10Gi (VPA upper 7.7Gi, was 4% headroom)
- openclaw: 1280Mi req → 2Gi req=limit (documented 2Gi requirement)

Phase 3 - Additional right-sizing:
- authentik workers: 1Gi → 896Mi x3 (VPA upper 722Mi)
- shlink: 512Mi/768Mi → 960Mi req=limit (VPA upper 780Mi, safety increase)

Phase 4 - Burstable QoS for lower tiers:
- tier-3-edge: 128Mi/128Mi → 96Mi req / 192Mi limit
- tier-4-aux: 128Mi/128Mi → 64Mi req / 256Mi limit

Phase 5 - Monitoring:
- Add ClusterMemoryRequestsHigh alert (>85% allocatable, 15m)
- Add ContainerNearOOM alert (>85% limit, 30m)
- Add PodUnschedulable alert (5m, critical)

Cluster: 92.7% → 90.8% memory requests. Stirling-pdf now schedulable.
2026-03-15 15:30:18 +00:00
.claude add name/description/tools to review-loop agent frontmatter [ci skip] 2026-03-15 11:14:31 +00:00
.git-crypt Add 1 git-crypt collaborator [ci skip] 2025-10-24 18:00:00 +00:00
.planning [ci skip] add auto-generated tiers.tf, planning docs, and helm chart cache 2026-03-06 23:55:57 +00:00
.woodpecker [ci skip] update AGENTS.md + CLAUDE.md with SOPS workflow, add k8s-portal CI pipeline 2026-03-07 15:37:19 +00:00
cli update @ record as well 2024-12-02 21:51:05 +00:00
diagram [ci skip] Sunset Drone CI: remove all artifacts, DNS, configs, and references 2026-02-23 19:38:55 +00:00
docs/plans [ci skip] k8s portal: fix setup script + add onboarding hub (5 new pages) 2026-03-07 15:06:26 +00:00
modules exclude manifest requests from nginx registry cache 2026-03-14 23:42:17 +00:00
playbooks [ci skip] Reduce node config drift: GPU label, OIDC idempotency, node-exporter, rebuild docs 2026-02-22 22:59:38 +00:00
scripts fix openclaw config mount and OOM: use init container, increase memory to 2Gi 2026-03-14 23:42:17 +00:00
secrets [ci skip] remove atuin: destroy stack, DNS, NFS export, PostgreSQL credentials 2026-03-06 20:11:14 +00:00
stacks right-size cluster memory: reduce overprovisioned, fix under-provisioned services 2026-03-15 15:30:18 +00:00
.gitattributes add git-crypt terraform 2021-02-14 18:17:40 +00:00
.gitignore Archive terraform.tfvars — secrets now in SOPS 2026-03-11 21:16:11 +00:00
.sops.yaml [ci skip] phase 1: SOPS tooling setup (.sops.yaml, scripts/tg, .gitignore) 2026-03-07 13:57:42 +00:00
AGENTS.md update claude knowledge: infra operational learnings from commit history [ci skip] 2026-03-15 10:46:45 +00:00
config.tfvars add vaultwarden daily backup CronJob to NFS 2026-03-15 00:03:59 +00:00
LICENSE.txt Drone CI Update TLS Certificates Commit 2025-10-12 00:13:18 +00:00
MEMORY.md Update MEMORY.md timestamp 2026-03-07 16:43:15 +00:00
README.md [ci skip] Sunset Drone CI: remove all artifacts, DNS, configs, and references 2026-02-23 19:38:55 +00:00
secrets.sops.json add vaultwarden daily backup CronJob to NFS 2026-03-15 00:03:59 +00:00
setup-monitoring.sh fix(monitoring): Add setup script for automated health check environment 2026-03-13 13:57:11 +00:00
terragrunt.hcl migrate all secrets from SOPS to Vault KV 2026-03-14 17:15:48 +00:00
tiers.tf [ci skip] Phase 1: PostgreSQL migrated to CNPG on local disk 2026-02-28 19:08:06 +00:00

This repo contains my infra-as-code sources.

My infrastructure is built using Terraform, Kubernetes and CI/CD is done using Woodpecker CI.

Read more by visiting my website: https://viktorbarzin.me

git-crypt setup

To decrypt the secrets, you need to setup git-crypt.

  1. Install git-crypt.
  2. Setup gpg keys on the machine
  3. git-crypt unlock

This will unlock the secrets and will lock them on commit