No description
Find a file
Viktor Barzin dbf7732a66 [uptime-kuma] Bump CPU + memory requests to reduce TTFB jitter
## Context
Uptime Kuma TTFB was bimodal — fast ~150ms responses mixed with slow
~3s responses — median 1.7s, p95 3.2s across 20 samples. CPU request
was 50m (5% of one core) against a Node.js process that handles ~190
monitors plus SQLite DB maintenance. Memory request was 64Mi while
actual RSS sat around 221Mi, so the pod was also running above its
guaranteed memory floor and subject to eviction pressure when nodes got
tight.

CPU limits are intentionally absent cluster-wide (CFS throttling caused
more pain than it solved), so the only knob to give the scheduler a
higher floor is the request itself. Raising the request makes the node
reserve more CPU for the pod and lets the kernel's CFS weight it more
generously when the node is busy — should reduce the tail on the slow
path without introducing throttling.

## This change
- requests.cpu: 50m -> 100m
- requests.memory: 64Mi -> 128Mi
- limits.memory: unchanged at 512Mi
- limits.cpu: still unset (explicit — cluster-wide rule)

## What is NOT in this change
- No CPU limit added
- No readiness/liveness probe tuning
- No replica count change (still 1, Recreate strategy)
- No DB layer / SQLite tuning

## Measurements (20 curl samples of https://uptime.viktorbarzin.me/)

Before:
  min    0.143s
  median 1.727s
  p95    3.163s
  max    3.204s
  mean   1.768s

After:
  min    0.149s
  median 1.228s
  p95    3.154s
  max    3.283s
  mean   1.590s

Median dropped ~29% (1.73s -> 1.23s). Tail (p95/max) essentially
unchanged — the slow bucket appears driven by something other than
CPU scheduling (likely socket.io / SSR render path inside the app,
or TLS/cf-tunnel handshake — worth a separate investigation).

Closes: code-79d
2026-04-18 11:11:39 +00:00
.beads bd init: initialize beads issue tracking 2026-04-06 15:38:46 +03:00
.claude [docs] Update anti-AI and rybbit docs after rewrite-body removal 2026-04-17 21:43:13 +00:00
.git-crypt Add 1 git-crypt collaborator [ci skip] 2025-10-24 18:00:00 +00:00
.github chore: sort outage report service list alphabetically 2026-04-15 18:01:54 +00:00
.planning [ci skip] add auto-generated tiers.tf, planning docs, and helm chart cache 2026-03-06 23:55:57 +00:00
.woodpecker [claude-agent-service] Migrate all pipelines from DevVM SSH to K8s HTTP 2026-04-18 10:12:02 +00:00
ci feat: CI/CD performance overhaul 2026-04-15 11:22:26 +00:00
cli add IPv6 connectivity via Hurricane Electric 6in4 tunnel 2026-03-23 02:22:00 +02:00
diagram [ci skip] Sunset Drone CI: remove all artifacts, DNS, configs, and references 2026-02-23 19:38:55 +00:00
docs [docs] Update auto-upgrade docs — new HTTP auth path + n8n expression gotcha 2026-04-18 10:42:11 +00:00
modules Add broker-sync Terraform stack (#7) 2026-04-17 21:17:45 +01:00
playbooks [ci skip] Reduce node config drift: GPU label, OIDC idempotency, node-exporter, rebuild docs 2026-02-22 22:59:38 +00:00
scripts [claude-agent-service] Migrate all pipelines from DevVM SSH to K8s HTTP 2026-04-18 10:12:02 +00:00
secrets deprecate TrueNAS: migrate Immich NFS to Proxmox, remove all 10.0.10.15 references [ci skip] 2026-04-13 14:42:07 +00:00
stacks [uptime-kuma] Bump CPU + memory requests to reduce TTFB jitter 2026-04-18 11:11:39 +00:00
state/stacks state(dbaas): update encrypted state 2026-04-17 22:33:13 +00:00
.gitattributes Add broker-sync Terraform stack (#7) 2026-04-17 21:17:45 +01:00
.gitignore chore: add pre-commit size guard and harden .gitignore 2026-04-15 14:13:18 +00:00
.sops.yaml state: per-stack Transit keys for namespace-owner access control 2026-03-17 23:08:18 +00:00
AGENTS.md [claude-agent-service] Migrate all pipelines from DevVM SSH to K8s HTTP 2026-04-18 10:12:02 +00:00
config.tfvars [infra] Auto-create Cloudflare DNS records from ingress_factory 2026-04-16 13:45:04 +00:00
CONTRIBUTING.md multi-user access: fix template memory default, add storage quota, add CONTRIBUTING.md [ci skip] 2026-03-19 23:49:15 +00:00
LICENSE.txt Drone CI Update TLS Certificates Commit 2025-10-12 00:13:18 +00:00
MEMORY.md Update MEMORY.md timestamp 2026-03-07 16:43:15 +00:00
README.md add architecture documentation for all infrastructure subsystems [ci skip] 2026-03-24 00:55:25 +02:00
setup-monitoring.sh fix(monitoring): Add setup script for automated health check environment 2026-03-13 13:57:11 +00:00
terragrunt.hcl [infra] Migrate Terraform state from local SOPS to PostgreSQL backend 2026-04-16 19:33:12 +00:00
tiers.tf [ci skip] Phase 1: PostgreSQL migrated to CNPG on local disk 2026-02-28 19:08:06 +00:00

This repo contains my infra-as-code sources.

My infrastructure is built using Terraform, Kubernetes and CI/CD is done using Woodpecker CI.

Read more by visiting my website: https://viktorbarzin.me

Documentation

Full architecture documentation is available in docs/ — covering networking, storage, security, monitoring, secrets, CI/CD, databases, and more.

Adding a New User (Admin)

Adding a new namespace-owner to the cluster requires three steps — no code changes needed.

1. Authentik Group Assignment

In the Authentik admin UI, add the user to:

  • kubernetes-namespace-owners group (grants OIDC group claim for K8s RBAC)
  • Headscale Users group (if they need VPN access)

2. Vault KV Entry

Add a JSON entry to secret/platformk8s_users key in Vault:

"username": {
  "role": "namespace-owner",
  "email": "user@example.com",
  "namespaces": ["username"],
  "domains": ["myapp"],
  "quota": {
    "cpu_requests": "2",
    "memory_requests": "4Gi",
    "memory_limits": "8Gi",
    "pods": "20"
  }
}
  • username key must match the user's Forgejo username (for Woodpecker admin access)
  • namespaces — K8s namespaces to create and grant admin access to
  • domains — subdomains under viktorbarzin.me for Cloudflare DNS records
  • quota — resource limits per namespace (defaults shown above)

3. Apply Stacks

vault login -method=oidc

cd stacks/vault && terragrunt apply --non-interactive
# Creates: namespace, Vault policy, identity entity, K8s deployer role

cd ../platform && terragrunt apply --non-interactive
# Creates: RBAC bindings, ResourceQuota, TLS secret, DNS records

cd ../woodpecker && terragrunt apply --non-interactive
# Adds user to Woodpecker admin list

What Gets Auto-Generated

Resource Stack
Kubernetes namespace vault
Vault policy (namespace-owner-{user}) vault
Vault identity entity + OIDC alias vault
K8s deployer Role + Vault K8s role vault
RBAC RoleBinding (namespace admin) platform
RBAC ClusterRoleBinding (cluster read-only) platform
ResourceQuota platform
TLS secret in namespace platform
Cloudflare DNS records platform
Woodpecker admin access woodpecker

New User Onboarding

If you've been added as a namespace-owner, follow these steps to get started.

1. Join the VPN

# Install Tailscale: https://tailscale.com/download
tailscale login --login-server https://headscale.viktorbarzin.me
# Send the registration URL to Viktor, wait for approval
ping 10.0.20.100  # verify connectivity

2. Install Tools

Run the setup script to install kubectl, kubelogin, Vault CLI, Terraform, and Terragrunt:

# macOS
bash <(curl -fsSL https://k8s-portal.viktorbarzin.me/setup/script?os=mac)

# Linux
bash <(curl -fsSL https://k8s-portal.viktorbarzin.me/setup/script?os=linux)

3. Authenticate

# Log into Vault (opens browser for SSO)
vault login -method=oidc

# Test kubectl (opens browser for OIDC login)
kubectl get pods -n YOUR_NAMESPACE

4. Deploy Your First App

# Clone the infra repo
git clone https://github.com/ViktorBarzin/infra.git && cd infra

# Copy the stack template
cp -r stacks/_template stacks/myapp
mv stacks/myapp/main.tf.example stacks/myapp/main.tf

# Edit main.tf — replace all <placeholders>

# Store secrets in Vault
vault kv put secret/YOUR_USERNAME/myapp DB_PASSWORD=secret123

# Submit a PR
git checkout -b feat/myapp
git add stacks/myapp/
git commit -m "add myapp stack"
git push -u origin feat/myapp

After review and merge, an admin runs cd stacks/myapp && terragrunt apply.

5. Set Up CI/CD (Optional)

Create .woodpecker.yml in your app's Forgejo repo:

steps:
  - name: build
    image: woodpeckerci/plugin-docker-buildx
    settings:
      repo: YOUR_DOCKERHUB_USER/myapp
      tag: ["${CI_PIPELINE_NUMBER}", "latest"]
      username:
        from_secret: dockerhub-username
      password:
        from_secret: dockerhub-token
      platforms: linux/amd64

  - name: deploy
    image: hashicorp/vault:1.18.1
    commands:
      - export VAULT_ADDR=http://vault-active.vault.svc.cluster.local:8200
      - export VAULT_TOKEN=$(vault write -field=token auth/kubernetes/login
          role=ci jwt=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token))
      - KUBE_TOKEN=$(vault write -field=service_account_token
          kubernetes/creds/YOUR_NAMESPACE-deployer
          kubernetes_namespace=YOUR_NAMESPACE)
      - kubectl --server=https://kubernetes.default.svc
          --token=$KUBE_TOKEN
          --certificate-authority=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          -n YOUR_NAMESPACE set image deployment/myapp
          myapp=YOUR_DOCKERHUB_USER/myapp:${CI_PIPELINE_NUMBER}

Useful Commands

# Check your pods
kubectl get pods -n YOUR_NAMESPACE

# View quota usage
kubectl describe resourcequota -n YOUR_NAMESPACE

# Store/read secrets
vault kv put secret/YOUR_USERNAME/myapp KEY=value
vault kv get secret/YOUR_USERNAME/myapp

# Get a short-lived K8s deploy token
vault write kubernetes/creds/YOUR_NAMESPACE-deployer \
  kubernetes_namespace=YOUR_NAMESPACE

Important Rules

  • All changes go through Terraform — never kubectl apply/edit/patch directly
  • Never put secrets in code — use Vault: vault kv put secret/YOUR_USERNAME/...
  • Always use a PR — never push directly to master
  • Docker images: build for linux/amd64, use versioned tags (not :latest)

git-crypt setup

To decrypt the secrets, you need to setup git-crypt.

  1. Install git-crypt.
  2. Setup gpg keys on the machine
  3. git-crypt unlock

This will unlock the secrets and will lock them on commit