Viktor Barzin aded77d5ab monitoring: alerts for proxmox-csi LUN saturation per node
Vaultwarden + 18 pods got stuck for 7h on 2026-05-26 when k8s-node4 went
down: surviving workloads piled onto node1 and hit the
csi.proxmox.sinextra.dev/max-volume-attachments=28 cap. The Proxmox VM also
had 5 stale scsi entries (PVCs long-migrated to other nodes but never
removed from VM config), which bypassed the K8s scheduler safety until the
plugin returned 'no free lun found' at attach time.

Three new alerts on the kube_volumeattachment_info count per node:
- warning at 24/28 (>= 85%), 10m
- critical at 27/28 (1 slot left), 3m
- critical at 28/28 (cap reached), 1m

Also whitelisted kube_volumeattachment_info — the metric was being dropped
by the disk-write-reduction filter (id=559) and the alert queries returned
zero series until it's kept.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-26 02:45:13 +00:00
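
The three alert tiers described in the commit message above can be sketched as a small classifier. This is only an illustration of the threshold arithmetic (24/28 is roughly 85.7%, which is why the warning tier is described as ">= 85%"); the function name and return strings are hypothetical, and the real alerts are Prometheus rules whose "for" durations (10m, 3m, 1m) are not modelled here.

```python
from typing import Optional

# Cap taken from the commit message:
# csi.proxmox.sinextra.dev/max-volume-attachments=28
MAX_ATTACHMENTS = 28
WARNING_AT = 24  # 24/28 is about 85.7%, i.e. the ">= 85%" tier


def lun_alert_severity(attached: int, cap: int = MAX_ATTACHMENTS) -> Optional[str]:
    """Hypothetical helper: map a node's volume-attachment count to an alert tier."""
    if attached >= cap:
        return "critical: cap reached"
    if attached >= cap - 1:
        return "critical: 1 slot left"
    if attached >= WARNING_AT:
        return "warning"
    return None  # below every threshold, no alert


if __name__ == "__main__":
    for count in (20, 24, 27, 28):
        print(count, lun_alert_severity(count))
```

Checking the most severe tier first keeps the tiers mutually exclusive, so a node at 28/28 is reported once rather than matching all three conditions.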
.beads bd init: initialize beads issue tracking 2026-04-06 15:38:46 +03:00
.claude nfs-mirror: append transferred files to offsite-sync manifest 2026-05-24 15:32:22 +00:00
.git-crypt Add 1 git-crypt collaborator [ci skip] 2025-10-24 18:00:00 +00:00
.github chore: sort outage report service list alphabetically 2026-04-15 18:01:54 +00:00
.planning [ci skip] add auto-generated tiers.tf, planning docs, and helm chart cache 2026-03-06 23:55:57 +00:00
.woodpecker ci(drift-detection): generate kubeconfig from projected SA token 2026-05-09 11:31:53 +00:00
ci ci: retrigger image rebuild — prior pipeline aborted during PG outage 2026-05-11 19:30:34 +00:00
cli add IPv6 connectivity via Hurricane Electric 6in4 tunnel 2026-03-23 02:22:00 +02:00
diagram [docs] TrueNAS decommission cleanup — remove references from active docs 2026-04-19 16:55:43 +00:00
docs nvidia: fix driver install deadlock + extend startup probe 2026-05-25 11:53:44 +00:00
modules anubis: HA with shared valkey/redis store + replicas=2 2026-05-16 11:54:54 +00:00
playbooks [ci skip] Reduce node config drift: GPU label, OIDC idempotency, node-exporter, rebuild docs 2026-02-22 22:59:38 +00:00
scripts backup: exclude /anca-elements/ from nfs-mirror + offsite Step 1 2026-05-24 18:34:41 +00:00
secrets Woodpecker CI Update TLS Certificates Commit 2026-05-24 00:03:48 +00:00
stacks monitoring: alerts for proxmox-csi LUN saturation per node 2026-05-26 02:45:13 +00:00
state/stacks state(cnpg): update encrypted state 2026-05-22 15:00:04 +00:00
.gitattributes Add broker-sync Terraform stack (#7) 2026-04-17 21:17:45 +01:00
.gitignore .gitignore: ignore terragrunt_rendered.json debug output 2026-04-18 13:18:05 +00:00
.gitleaksignore recruiter-responder: public /cb ingress for Telegram URL-button callbacks 2026-05-15 23:46:49 +00:00
.sops.yaml state: per-stack Transit keys for namespace-owner access control 2026-03-17 23:08:18 +00:00
AGENTS.md Phase 0: install Keel + Kyverno auto-update annotation injector 2026-05-16 12:19:34 +00:00
config.tfvars [infra] Update RPi Sofia DNS: 192.168.1.16 → 192.168.1.10 2026-04-22 10:55:34 +00:00
CONTEXT.md docs: add CONTEXT.md domain glossary [ci skip] 2026-05-16 11:48:19 +00:00
CONTRIBUTING.md multi-user access: fix template memory default, add storage quota, add CONTRIBUTING.md [ci skip] 2026-03-19 23:49:15 +00:00
LICENSE.txt Drone CI Update TLS Certificates Commit 2025-10-12 00:13:18 +00:00
MEMORY.md Update MEMORY.md timestamp 2026-03-07 16:43:15 +00:00
README.md add architecture documentation for all infrastructure subsystems [ci skip] 2026-03-24 00:55:25 +02:00
terragrunt.hcl kyverno(wave1): swap kubernetes_manifest → kubectl_manifest + flip 3 security policies to Enforce 2026-05-18 20:10:27 +00:00
tiers.tf [ci skip] Phase 1: PostgreSQL migrated to CNPG on local disk 2026-02-28 19:08:06 +00:00

This repo contains my infra-as-code sources.

My infrastructure is built with Terraform and Kubernetes; CI/CD runs on Woodpecker CI.

Read more by visiting my website: https://viktorbarzin.me

Documentation

Full architecture documentation is available in docs/ — covering networking, storage, security, monitoring, secrets, CI/CD, databases, and more.

Adding a New User (Admin)

Adding a new namespace-owner to the cluster requires three steps — no code changes needed.

1. Authentik Group Assignment

In the Authentik admin UI, add the user to:

  • kubernetes-namespace-owners group (grants OIDC group claim for K8s RBAC)
  • Headscale Users group (if they need VPN access)

2. Vault KV Entry

Add a JSON entry to secret/platformk8s_users key in Vault:

"username": {
  "role": "namespace-owner",
  "email": "user@example.com",
  "namespaces": ["username"],
  "domains": ["myapp"],
  "quota": {
    "cpu_requests": "2",
    "memory_requests": "4Gi",
    "memory_limits": "8Gi",
    "pods": "20"
  }
}
  • username key must match the user's Forgejo username (for Woodpecker admin access)
  • namespaces — K8s namespaces to create and grant admin access to
  • domains — subdomains under viktorbarzin.me for Cloudflare DNS records
  • quota — resource limits per namespace (defaults shown above)
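
Before writing the entry to Vault, it may help to sanity-check its shape. The sketch below is a hypothetical pre-flight check, not part of the repo: it assumes every field shown in the example above is present (the stacks may well apply defaults for some of them), and the function name `validate_user_entry` is made up for illustration.

```python
from typing import List

# Field names taken from the example entry above.
REQUIRED_KEYS = {"role", "email", "namespaces", "domains", "quota"}
QUOTA_KEYS = {"cpu_requests", "memory_requests", "memory_limits", "pods"}


def validate_user_entry(entry: dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means the entry looks fine."""
    problems = []
    for key in sorted(REQUIRED_KEYS - entry.keys()):
        problems.append(f"missing key: {key}")
    # namespaces and domains are JSON arrays of strings in the example
    for key in ("namespaces", "domains"):
        value = entry.get(key)
        if value is not None and not (
            isinstance(value, list) and all(isinstance(v, str) for v in value)
        ):
            problems.append(f"{key} must be a list of strings")
    quota = entry.get("quota")
    if isinstance(quota, dict):
        for key in sorted(QUOTA_KEYS - quota.keys()):
            problems.append(f"missing quota key: {key}")
    elif quota is not None:
        problems.append("quota must be an object")
    return problems


if __name__ == "__main__":
    example = {
        "role": "namespace-owner",
        "email": "user@example.com",
        "namespaces": ["username"],
        "domains": ["myapp"],
        "quota": {
            "cpu_requests": "2",
            "memory_requests": "4Gi",
            "memory_limits": "8Gi",
            "pods": "20",
        },
    }
    print(validate_user_entry(example))
```

The check only covers structure; whether the username matches the Forgejo account still has to be confirmed by hand.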

3. Apply Stacks

vault login -method=oidc

cd stacks/vault && terragrunt apply --non-interactive
# Creates: namespace, Vault policy, identity entity, K8s deployer role

cd ../platform && terragrunt apply --non-interactive
# Creates: RBAC bindings, ResourceQuota, TLS secret, DNS records

cd ../woodpecker && terragrunt apply --non-interactive
# Adds user to Woodpecker admin list
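
The apply sequence above could also be scripted. The following is a minimal sketch, assuming the three stacks must be applied in the order listed and that a failure in one should stop the run; the helper names and the `dry_run` flag are hypothetical, and `vault login -method=oidc` is still expected to have been run first.

```python
import subprocess
from pathlib import Path
from typing import List, Tuple

# Order taken from the steps above: vault, then platform, then woodpecker.
STACK_ORDER = ["vault", "platform", "woodpecker"]
APPLY_CMD = ["terragrunt", "apply", "--non-interactive"]


def build_apply_plan(repo_root: str) -> List[Tuple[Path, List[str]]]:
    """Return (working directory, command) pairs in apply order."""
    return [(Path(repo_root) / "stacks" / name, list(APPLY_CMD)) for name in STACK_ORDER]


def apply_all(repo_root: str, dry_run: bool = True) -> None:
    """Print the plan, or run it when dry_run is False."""
    for cwd, cmd in build_apply_plan(repo_root):
        if dry_run:
            print(f"(cd {cwd} && {' '.join(cmd)})")
        else:
            # check=True raises on a non-zero exit, so later stacks are skipped
            subprocess.run(cmd, cwd=cwd, check=True)


if __name__ == "__main__":
    apply_all(".", dry_run=True)
```

Defaulting to a dry run keeps the sketch safe to execute anywhere; pass `dry_run=False` only from a checkout with valid Vault credentials.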

What Gets Auto-Generated

| Resource | Stack |
| --- | --- |
| Kubernetes namespace | vault |
| Vault policy (namespace-owner-{user}) | vault |
| Vault identity entity + OIDC alias | vault |
| K8s deployer Role + Vault K8s role | vault |
| RBAC RoleBinding (namespace admin) | platform |
| RBAC ClusterRoleBinding (cluster read-only) | platform |
| ResourceQuota | platform |
| TLS secret in namespace | platform |
| Cloudflare DNS records | platform |
| Woodpecker admin access | woodpecker |

New User Onboarding

If you've been added as a namespace-owner, follow these steps to get started.

1. Join the VPN

# Install Tailscale: https://tailscale.com/download
tailscale login --login-server https://headscale.viktorbarzin.me
# Send the registration URL to Viktor, wait for approval
ping 10.0.20.100  # verify connectivity

2. Install Tools

Run the setup script to install kubectl, kubelogin, Vault CLI, Terraform, and Terragrunt:

# macOS
bash <(curl -fsSL "https://k8s-portal.viktorbarzin.me/setup/script?os=mac")

# Linux
bash <(curl -fsSL "https://k8s-portal.viktorbarzin.me/setup/script?os=linux")

3. Authenticate

# Log into Vault (opens browser for SSO)
vault login -method=oidc

# Test kubectl (opens browser for OIDC login)
kubectl get pods -n YOUR_NAMESPACE

4. Deploy Your First App

# Clone the infra repo
git clone https://github.com/ViktorBarzin/infra.git && cd infra

# Copy the stack template
cp -r stacks/_template stacks/myapp
mv stacks/myapp/main.tf.example stacks/myapp/main.tf

# Edit main.tf — replace all <placeholders>

# Store secrets in Vault
vault kv put secret/YOUR_USERNAME/myapp DB_PASSWORD=secret123

# Submit a PR
git checkout -b feat/myapp
git add stacks/myapp/
git commit -m "add myapp stack"
git push -u origin feat/myapp

After review and merge, an admin runs cd stacks/myapp && terragrunt apply.

5. Set Up CI/CD (Optional)

Create .woodpecker.yml in your app's Forgejo repo:

steps:
  - name: build
    image: woodpeckerci/plugin-docker-buildx
    settings:
      repo: YOUR_DOCKERHUB_USER/myapp
      tag: ["${CI_PIPELINE_NUMBER}", "latest"]
      username:
        from_secret: dockerhub-username
      password:
        from_secret: dockerhub-token
      platforms: linux/amd64

  - name: deploy
    image: hashicorp/vault:1.18.1
    commands:
      - export VAULT_ADDR=http://vault-active.vault.svc.cluster.local:8200
      - export VAULT_TOKEN=$(vault write -field=token auth/kubernetes/login
          role=ci jwt=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token))
      - KUBE_TOKEN=$(vault write -field=service_account_token
          kubernetes/creds/YOUR_NAMESPACE-deployer
          kubernetes_namespace=YOUR_NAMESPACE)
      - kubectl --server=https://kubernetes.default.svc
          --token=$KUBE_TOKEN
          --certificate-authority=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          -n YOUR_NAMESPACE set image deployment/myapp
          myapp=YOUR_DOCKERHUB_USER/myapp:${CI_PIPELINE_NUMBER}

Useful Commands

# Check your pods
kubectl get pods -n YOUR_NAMESPACE

# View quota usage
kubectl describe resourcequota -n YOUR_NAMESPACE

# Store/read secrets
vault kv put secret/YOUR_USERNAME/myapp KEY=value
vault kv get secret/YOUR_USERNAME/myapp

# Get a short-lived K8s deploy token
vault write kubernetes/creds/YOUR_NAMESPACE-deployer \
  kubernetes_namespace=YOUR_NAMESPACE

Important Rules

  • All changes go through Terraform — never kubectl apply/edit/patch directly
  • Never put secrets in code — use Vault: vault kv put secret/YOUR_USERNAME/...
  • Always use a PR — never push directly to master
  • Docker images: build for linux/amd64, use versioned tags (not :latest)
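
The versioned-tag rule is easy to check mechanically. Below is a minimal sketch of such a check; `uses_versioned_tag` is a hypothetical helper, not something the repo ships, and it treats digest-pinned references as acceptable, which is an assumption rather than a stated policy.

```python
def uses_versioned_tag(image: str) -> bool:
    """Return True if the image reference has an explicit tag other than 'latest'."""
    # A digest (name@sha256:...) pins an exact image; assumed acceptable here.
    if "@" in image:
        return True
    # Only look at the last path segment so a registry port
    # (registry:5000/app) is not mistaken for a tag.
    last_segment = image.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return False  # no tag at all means the runtime falls back to :latest
    tag = last_segment.rsplit(":", 1)[1]
    return tag not in ("", "latest")


if __name__ == "__main__":
    for ref in ("YOUR_DOCKERHUB_USER/myapp:42", "YOUR_DOCKERHUB_USER/myapp:latest"):
        print(ref, uses_versioned_tag(ref))
```

A check like this could run in review or CI against the image strings in a stack before it is applied.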

git-crypt setup

To decrypt the secrets, you need to set up git-crypt.

  1. Install git-crypt.
  2. Set up GPG keys on the machine.
  3. Run git-crypt unlock.

This unlocks the secrets in your working copy; git-crypt re-encrypts them automatically when you commit.