
This repo contains my infra-as-code sources.

My infrastructure is built with Terraform and Kubernetes; CI/CD runs on Woodpecker CI.

Read more by visiting my website: https://viktorbarzin.me

Documentation

Full architecture documentation is available in docs/ — covering networking, storage, security, monitoring, secrets, CI/CD, databases, and more.

Adding a New User (Admin)

Adding a new namespace-owner to the cluster requires three steps — no code changes needed.

1. Authentik Group Assignment

In the Authentik admin UI, add the user to:

  • kubernetes-namespace-owners group (grants OIDC group claim for K8s RBAC)
  • Headscale Users group (if they need VPN access)

2. Vault KV Entry

Add a JSON entry to the secret/platformk8s_users key in Vault:

"username": {
  "role": "namespace-owner",
  "email": "user@example.com",
  "namespaces": ["username"],
  "domains": ["myapp"],
  "quota": {
    "cpu_requests": "2",
    "memory_requests": "4Gi",
    "memory_limits": "8Gi",
    "pods": "20"
  }
}
  • username key must match the user's Forgejo username (for Woodpecker admin access)
  • namespaces — K8s namespaces to create and grant admin access to
  • domains — subdomains under viktorbarzin.me for Cloudflare DNS records
  • quota — resource limits per namespace (defaults shown above)
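
The entry shape above can be checked before it is written to Vault. The following is a minimal sketch, not part of this repo: the function name `validate_user_entry` is hypothetical, and the choice of which keys are mandatory (here `role`, `email`, `namespaces`, `domains`, with `quota` optional because defaults exist) is an assumption.

```python
# Assumption: these four keys are mandatory; quota is optional because defaults exist.
REQUIRED_KEYS = {"role", "email", "namespaces", "domains"}
# Quota keys taken from the example entry above.
QUOTA_KEYS = {"cpu_requests", "memory_requests", "memory_limits", "pods"}


def validate_user_entry(username, entry):
    """Return a list of problems found in one user entry (empty list means OK)."""
    problems = []

    missing = REQUIRED_KEYS - set(entry)
    if missing:
        problems.append(f"{username}: missing keys {sorted(missing)}")

    if "role" in entry and entry["role"] != "namespace-owner":
        problems.append(f"{username}: unexpected role {entry['role']!r}")

    # namespaces and domains are both JSON arrays of strings in the example.
    for field in ("namespaces", "domains"):
        value = entry.get(field, [])
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            problems.append(f"{username}: {field} must be a list of strings")

    quota = entry.get("quota", {})
    if not isinstance(quota, dict):
        problems.append(f"{username}: quota must be an object")
    else:
        unknown = set(quota) - QUOTA_KEYS
        if unknown:
            problems.append(f"{username}: unknown quota keys {sorted(unknown)}")

    return problems
```

Calling `validate_user_entry("username", entry)` on the example entry should return an empty list; a typo such as `"namespaces": "username"` (a string instead of a list) is reported as a problem.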

3. Apply Stacks

vault login -method=oidc

cd stacks/vault && terragrunt apply --non-interactive
# Creates: namespace, Vault policy, identity entity, K8s deployer role

cd ../platform && terragrunt apply --non-interactive
# Creates: RBAC bindings, ResourceQuota, TLS secret, DNS records

cd ../woodpecker && terragrunt apply --non-interactive
# Adds user to Woodpecker admin list
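
The three applies above run in a fixed order (vault, then platform, then woodpecker). A small wrapper can make that order explicit and stop at the first failure. This is a hedged sketch: `apply_commands` and `run_all` are hypothetical helpers, and the assumption that later stacks depend on earlier ones is inferred from the comments above (the vault stack creates the namespace that the platform stack binds roles in).

```python
import subprocess
from pathlib import Path

# Order taken from the commands above.
STACK_ORDER = ["vault", "platform", "woodpecker"]


def apply_commands(repo_root):
    """Return (working_directory, argv) pairs, one per stack, in apply order."""
    root = Path(repo_root)
    return [
        (root / "stacks" / name, ["terragrunt", "apply", "--non-interactive"])
        for name in STACK_ORDER
    ]


def run_all(repo_root):
    """Apply each stack in order; check=True aborts on the first failing apply."""
    for cwd, argv in apply_commands(repo_root):
        subprocess.run(argv, cwd=cwd, check=True)
```

`run_all(".")` from the repo root would be the equivalent of the three manual commands; `vault login -method=oidc` still has to be done first.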

What Gets Auto-Generated

| Resource | Stack |
| --- | --- |
| Kubernetes namespace | vault |
| Vault policy (namespace-owner-{user}) | vault |
| Vault identity entity + OIDC alias | vault |
| K8s deployer Role + Vault K8s role | vault |
| RBAC RoleBinding (namespace admin) | platform |
| RBAC ClusterRoleBinding (cluster read-only) | platform |
| ResourceQuota | platform |
| TLS secret in namespace | platform |
| Cloudflare DNS records | platform |
| Woodpecker admin access | woodpecker |

New User Onboarding

If you've been added as a namespace-owner, follow these steps to get started.

1. Join the VPN

# Install Tailscale: https://tailscale.com/download
tailscale login --login-server https://headscale.viktorbarzin.me
# Send the registration URL to Viktor, wait for approval
ping 10.0.20.100  # verify connectivity

2. Install Tools

Run the setup script to install kubectl, kubelogin, Vault CLI, Terraform, and Terragrunt:

# macOS (URL is quoted so zsh does not treat "?" as a glob)
bash <(curl -fsSL "https://k8s-portal.viktorbarzin.me/setup/script?os=mac")

# Linux
bash <(curl -fsSL "https://k8s-portal.viktorbarzin.me/setup/script?os=linux")

3. Authenticate

# Log into Vault (opens browser for SSO)
vault login -method=oidc

# Test kubectl (opens browser for OIDC login)
kubectl get pods -n YOUR_NAMESPACE

4. Deploy Your First App

# Clone the infra repo
git clone https://github.com/ViktorBarzin/infra.git && cd infra

# Copy the stack template
cp -r stacks/_template stacks/myapp
mv stacks/myapp/main.tf.example stacks/myapp/main.tf

# Edit main.tf — replace all <placeholders>

# Store secrets in Vault
vault kv put secret/YOUR_USERNAME/myapp DB_PASSWORD=secret123

# Submit a PR
git checkout -b feat/myapp
git add stacks/myapp/
git commit -m "add myapp stack"
git push -u origin feat/myapp

After review and merge, an admin runs cd stacks/myapp && terragrunt apply.

5. Set Up CI/CD (Optional)

Create .woodpecker.yml in your app's Forgejo repo:

steps:
  - name: build
    image: woodpeckerci/plugin-docker-buildx
    settings:
      repo: YOUR_DOCKERHUB_USER/myapp
      tag: ["${CI_PIPELINE_NUMBER}", "latest"]
      username:
        from_secret: dockerhub-username
      password:
        from_secret: dockerhub-token
      platforms: linux/amd64

  - name: deploy
    image: hashicorp/vault:1.18.1
    commands:
      - export VAULT_ADDR=http://vault-active.vault.svc.cluster.local:8200
      - export VAULT_TOKEN=$(vault write -field=token auth/kubernetes/login
          role=ci jwt=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token))
      - KUBE_TOKEN=$(vault write -field=service_account_token
          kubernetes/creds/YOUR_NAMESPACE-deployer
          kubernetes_namespace=YOUR_NAMESPACE)
      - kubectl --server=https://kubernetes.default.svc
          --token=$KUBE_TOKEN
          --certificate-authority=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          -n YOUR_NAMESPACE set image deployment/myapp
          myapp=YOUR_DOCKERHUB_USER/myapp:${CI_PIPELINE_NUMBER}
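
To make the last command of the deploy step easier to read, here is the same `kubectl set image` invocation assembled piece by piece. It is only an illustrative sketch: `set_image_argv` is a hypothetical helper, not something the pipeline uses, and the container name is assumed to equal the deployment name as in the example.

```python
# Paths and server address copied from the deploy step above.
SERVER = "https://kubernetes.default.svc"
CA_CERT = "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"


def set_image_argv(namespace, deployment, image_repo, pipeline_number, token):
    """Build the kubectl argv that points a deployment at a freshly built image."""
    # The image tag is the pipeline number, matching the tag pushed by the build step.
    image = f"{image_repo}:{pipeline_number}"
    return [
        "kubectl",
        f"--server={SERVER}",
        f"--token={token}",
        f"--certificate-authority={CA_CERT}",
        "-n", namespace,
        "set", "image", f"deployment/{deployment}",
        # Assumption: the container inside the deployment has the same name.
        f"{deployment}={image}",
    ]
```

For example, `set_image_argv("alice", "myapp", "alice/myapp", 42, token)` ends with `deployment/myapp` and `myapp=alice/myapp:42`, so the rollout always references the versioned tag rather than `latest`.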

Useful Commands

# Check your pods
kubectl get pods -n YOUR_NAMESPACE

# View quota usage
kubectl describe resourcequota -n YOUR_NAMESPACE

# Store/read secrets
vault kv put secret/YOUR_USERNAME/myapp KEY=value
vault kv get secret/YOUR_USERNAME/myapp

# Get a short-lived K8s deploy token
vault write kubernetes/creds/YOUR_NAMESPACE-deployer \
  kubernetes_namespace=YOUR_NAMESPACE
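
When the deploy token is needed in a script rather than on screen, the Vault CLI's `-format=json` flag returns a response whose `data.service_account_token` field holds the token (the same field the CI example reads with `-field=service_account_token`). A minimal sketch, with hypothetical function names:

```python
import json
import subprocess


def extract_deploy_token(vault_json_output):
    """Pull the service account token out of `vault write -format=json` output."""
    response = json.loads(vault_json_output)
    return response["data"]["service_account_token"]


def fetch_deploy_token(namespace):
    """Request short-lived credentials for the namespace's deployer role."""
    result = subprocess.run(
        [
            "vault", "write", "-format=json",
            f"kubernetes/creds/{namespace}-deployer",
            f"kubernetes_namespace={namespace}",
        ],
        check=True,
        capture_output=True,
        text=True,
    )
    return extract_deploy_token(result.stdout)
```

`fetch_deploy_token` requires a prior `vault login`; the token it returns expires after the lease set on the Vault role.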

Important Rules

  • All changes go through Terraform — never kubectl apply/edit/patch directly
  • Never put secrets in code — use Vault: vault kv put secret/YOUR_USERNAME/...
  • Always use a PR — never push directly to master
  • Docker images: build for linux/amd64, use versioned tags (not :latest)
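
The image-tag rule can be checked mechanically, for instance in a pre-commit hook. The sketch below is an assumption about how such a check might look, not an existing script in this repo; it treats digest-pinned references (`@sha256:...`) as acceptable, which is also an assumption.

```python
from typing import Optional


def image_tag(image: str) -> Optional[str]:
    """Return the tag of an image reference, or None if it has no tag."""
    name = image.split("@", 1)[0]        # drop any digest suffix
    last = name.rsplit("/", 1)[-1]       # a ":" before the last "/" is a registry port
    if ":" in last:
        return last.split(":", 1)[1]
    return None


def uses_versioned_tag(image: str) -> bool:
    """True if the image is pinned by digest or carries a tag other than 'latest'."""
    if "@" in image:
        return True
    tag = image_tag(image)
    return tag is not None and tag != "latest"
```

An untagged reference such as `user/myapp` is rejected as well, because Docker resolves it to `latest` implicitly.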

git-crypt setup

To decrypt the secrets, you need to set up git-crypt.

  1. Install git-crypt.
  2. Set up GPG keys on the machine.
  3. Run git-crypt unlock.

This decrypts the secrets in your working copy; they are re-encrypted automatically on commit.
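
To confirm whether the unlock worked, you can inspect a secret file: git-crypt-encrypted blobs start with a fixed 10-byte header (a NUL byte, the ASCII text GITCRYPT, and another NUL byte). The helper names below are hypothetical, and the sketch only looks at the header; it does not verify that the rest of the file decrypts correctly.

```python
# Header written by git-crypt at the start of every encrypted file.
GIT_CRYPT_MAGIC = b"\x00GITCRYPT\x00"


def looks_encrypted(data: bytes) -> bool:
    """True if the bytes begin with the git-crypt header."""
    return data.startswith(GIT_CRYPT_MAGIC)


def file_is_encrypted(path) -> bool:
    """Read only the first bytes of a file and check for the git-crypt header."""
    with open(path, "rb") as handle:
        return looks_encrypted(handle.read(len(GIT_CRYPT_MAGIC)))
```

If `file_is_encrypted` still returns True for a file under secrets/ after running git-crypt unlock, the repository is most likely still locked or your GPG key has not been added as a collaborator.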