infra/.claude/skills/setup-project/SKILL.md

name: setup-project
description: Deploy a new self-hosted service to the Kubernetes cluster from a GitHub repository. Use when: (1) User provides a GitHub URL or project name and wants to deploy it, (2) User says "deploy [service]" or "set up [service]", (3) User wants to add a new service to the cluster. Automated workflow: Docker image → Terraform module → Deploy. Handles database setup, ingress, DNS configuration.
author: Claude Code
version: 1.0.0
date: 2025-01-01

Setup Project Skill

Purpose: Deploy a new self-hosted service to the Kubernetes cluster from a GitHub repository.

When to use: User provides a GitHub URL or project name and wants to deploy it to the cluster.

Workflow

1. Research Phase

Input: GitHub repository URL or project name

Actions:

  • Visit the GitHub repository
  • Check the README for:
    • Official Docker image (Docker Hub, ghcr.io, etc.)
    • docker-compose.yml file
    • Self-hosting documentation
    • Required dependencies (PostgreSQL, MySQL, Redis, etc.)
    • Environment variables needed
    • Default ports
    • Storage requirements

Find Docker Image Priority:

  1. Check official documentation for recommended image
  2. Look in docker-compose.yml for image: directive
  3. Check GitHub Container Registry: ghcr.io/<org>/<repo>
  4. Check Docker Hub: <org>/<repo>
  5. Check releases page for container images
  6. Last resort: Build from Dockerfile (avoid if possible)
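
Steps 3 and 4 of the priority list can be derived mechanically from the repository URL. The sketch below is a minimal illustration, not part of the skill's scripts: `candidate_images` is a hypothetical helper, and the example URL is made up. It assumes the URL points at the repository root (no `/tree/...` suffix) and only prints candidates; whether an image actually exists at either location still has to be checked.

```sh
#!/bin/sh
# Sketch: list the registry locations worth checking for a GitHub repo.
# candidate_images is a hypothetical helper name.
candidate_images() {
  # Reduce "https://github.com/<org>/<repo>(.git)(/)" to "<org>/<repo>".
  slug=${1#*github.com/}
  slug=${slug%/}
  slug=${slug%.git}
  # Image repository names must be lowercase; GitHub names need not be.
  slug=$(printf '%s' "$slug" | tr '[:upper:]' '[:lower:]')
  printf 'ghcr.io/%s\n' "$slug"   # priority 3: GitHub Container Registry
  printf '%s\n' "$slug"           # priority 4: Docker Hub
}

candidate_images "https://github.com/Example-Org/Example-App.git"
```

A bare `<org>/<repo>` argument passes through unchanged apart from lowercasing, so the helper works for both of the inputs the skill accepts.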

Classify Dockerfile State (drives whether we contribute a PR back upstream later):

| State | When | Action on deploy success |
| --- | --- | --- |
| image-used | An official/community image worked (priority 1-5). | No upstream PR. Default case. |
| used-as-is | Upstream ships a Dockerfile; it built and ran fine. | No upstream PR. |
| fixed-broken-upstream | Upstream Dockerfile exists but fails to build / run; we patched it. | Open a fix-dockerfile PR after stability gate. |
| written-from-scratch | Upstream has no Dockerfile at all; we authored one. | Open an add-dockerfile PR after stability gate. |
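
The state-to-action mapping above is small enough to encode directly. This is a sketch only: `needs_upstream_pr` and `pr_branch` are hypothetical helper names, and the real scripts may make this decision differently.

```sh
#!/bin/sh
# Sketch: decide from dockerfile_state whether an upstream PR is due.
# Exit status: 0 = open a PR, 1 = no PR, 2 = unknown state.
needs_upstream_pr() {
  case "$1" in
    written-from-scratch|fixed-broken-upstream) return 0 ;;
    image-used|used-as-is) return 1 ;;
    *) echo "unknown dockerfile_state: $1" >&2; return 2 ;;
  esac
}

# Branch name that matches the PR type for a trigger state.
pr_branch() {
  case "$1" in
    written-from-scratch)  echo "add-dockerfile" ;;
    fixed-broken-upstream) echo "fix-dockerfile" ;;
    *) return 1 ;;
  esac
}

needs_upstream_pr used-as-is || echo "used-as-is: no upstream PR"
```

Treating an unrecognized state as a distinct error (status 2) rather than "no PR" keeps a typo in the state file from silently skipping the contribution.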

Record the chosen state and supporting metadata in modules/kubernetes/<service>/.contribution-state.json. When we author or fix a Dockerfile, also write modules/kubernetes/<service>/files/Dockerfile, .dockerignore, and BUILD.md (from templates/Dockerfile.README.md) — these travel with the upstream PR.

{
  "upstream_repo": "owner/name",
  "dockerfile_state": "written-from-scratch",
  "dockerfile_path_in_infra": "modules/kubernetes/<service>/files/Dockerfile",
  "deploy_target_url": "https://<service>.viktorbarzin.me",
  "image_tag": "registry.viktorbarzin.me/<service>:<sha>",
  "image_size": "<MB>",
  "base_image": "<e.g. python:3.12-slim>",
  "dockerfile_shape": "multi-stage, non-root, linux/amd64",
  "deploy_verified_at": null,
  "contribution_pr_url": null
}

Dockerfile quality bar (when writing one ourselves — enforced before PR):

  • Multi-stage build where it makes sense (Node, Go, Rust, Python with compiled deps).
  • Explicit non-root USER.
  • HEALTHCHECK when the app exposes a known endpoint.
  • Minimal base image (alpine / distroless preferred; -slim otherwise).
  • No secrets baked in; runtime config via ENV.
  • .dockerignore that excludes .git, node_modules, test artifacts.
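
Parts of this quality bar can be checked mechanically before a PR is opened. The following is a rough sketch under stated assumptions: `check_dockerfile` is a hypothetical helper, it uses plain text matching rather than a real Dockerfile parser, and it treats only the missing non-root USER as a hard failure. The multi-stage and HEALTHCHECK items are reported as warnings because the bar makes them conditional.

```sh
#!/bin/sh
# Sketch: lint a Dockerfile against the quality bar (text matching only).
# Usage: check_dockerfile path/to/Dockerfile
check_dockerfile() {
  file=$1
  status=0

  # The last USER instruction decides who the container runs as.
  last_user=$(grep -Ei '^[[:space:]]*USER[[:space:]]+' "$file" | tail -n 1 | awk '{print $2}')
  case "$last_user" in
    ""|root|0|root:*|0:*) echo "FAIL: no explicit non-root USER"; status=1 ;;
  esac

  # Two or more FROM lines indicate a multi-stage build (advisory).
  stages=$(grep -Eic '^[[:space:]]*FROM[[:space:]]' "$file" || true)
  if [ "$stages" -lt 2 ]; then echo "WARN: single-stage build"; fi

  # HEALTHCHECK only applies when the app has a known endpoint (advisory).
  if ! grep -Eiq '^[[:space:]]*HEALTHCHECK[[:space:]]' "$file"; then
    echo "WARN: no HEALTHCHECK"
  fi

  # .dockerignore is expected next to the Dockerfile.
  if [ ! -f "$(dirname "$file")/.dockerignore" ]; then
    echo "WARN: no .dockerignore next to the Dockerfile"
  fi

  return "$status"
}
```

The check for baked-in secrets is deliberately left out; that one needs a human read of the file rather than a pattern match.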

Extract Configuration:

  • Container port (default port the app listens on)
  • Environment variables (DATABASE_URL, REDIS_HOST, SMTP, etc.)
  • Volume mounts (what data needs persistence)
  • Dependencies (database type, cache, etc.)

2. Database Setup (if needed)

If project requires PostgreSQL:

  • The user provides the database credentials; if they have no preference, follow the pattern of a <service> user with a secure password
  • Database will be created in shared postgresql.dbaas.svc.cluster.local
  • Connection string format: postgresql://<user>:<password>@postgresql.dbaas.svc.cluster.local:5432/<dbname>

If project requires MySQL:

  • User provides database credentials
  • Database in shared mysql.dbaas.svc.cluster.local
  • Connection string format: mysql://<user>:<password>@mysql.dbaas.svc.cluster.local:3306/<dbname>

If project requires Redis:

  • Use shared Redis: redis.redis.svc.cluster.local:6379
  • No password required

IMPORTANT: Never create databases yourself. Always ask the user which credentials to use.
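
When the credentials are substituted into the connection-string formats above, characters such as `@`, `:` or `/` in a password break URL parsing unless they are percent-encoded. The sketch below shows one way to build the strings safely. `urlencode`, `pg_url` and `mysql_url` are hypothetical helper names, the credentials are placeholders, and it assumes ASCII passwords and an app that parses the URL per RFC 3986.

```sh
#!/bin/sh
# Sketch: percent-encode everything except RFC 3986 unreserved characters.
urlencode() {
  rest=$1
  out=
  while [ -n "$rest" ]; do
    remainder=${rest#?}          # everything after the first character
    c=${rest%"$remainder"}       # the first character itself
    rest=$remainder
    case $c in
      [A-Za-z0-9._~-]) out=$out$c ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;   # e.g. "@" becomes %40
    esac
  done
  printf '%s' "$out"
}

# Arguments: user, password, database name.
pg_url() {
  printf 'postgresql://%s:%s@postgresql.dbaas.svc.cluster.local:5432/%s\n' \
    "$(urlencode "$1")" "$(urlencode "$2")" "$3"
}

mysql_url() {
  printf 'mysql://%s:%s@mysql.dbaas.svc.cluster.local:3306/%s\n' \
    "$(urlencode "$1")" "$(urlencode "$2")" "$3"
}

pg_url myapp 'p@ss:w/rd' myapp
# prints postgresql://myapp:p%40ss%3Aw%2Frd@postgresql.dbaas.svc.cluster.local:5432/myapp
```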

3. NFS Storage Setup (if service needs persistent data)

IMPORTANT: NFS directories must exist and be exported on the NFS server BEFORE deploying the service. If the directory doesn't exist, the pod will fail to mount the volume and get stuck in ContainerCreating.

Steps:

  1. Create the directory on the NFS server:
ssh root@10.0.10.15 'mkdir -p /mnt/main/<service> && chmod 777 /mnt/main/<service>'
  2. Export the directory via TrueNAS:
    • The NFS export must be configured in TrueNAS so Kubernetes nodes can mount it
    • Create the export via TrueNAS WebUI or API, allowing access from the Kubernetes network (10.0.20.0/24)
    • Verify the export is accessible:
# From a k8s node or the dev VM
showmount -e 10.0.10.15 | grep <service>
  3. Verify the mount works before proceeding:
# Quick test from a k8s node
ssh root@10.0.20.100 'mount -t nfs 10.0.10.15:/mnt/main/<service> /tmp/test-mount && ls /tmp/test-mount && umount /tmp/test-mount'

Only proceed to Terraform module creation after confirming the NFS export is accessible.
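
A plain `grep <service>` on the `showmount -e` output also matches longer export names that merely contain the service name, so an export for a similarly named service can make the check pass by accident. A stricter variant, sketched here with a hypothetical helper name `has_export`, compares the whole export path:

```sh
#!/bin/sh
# Sketch: succeed only if /mnt/main/<service> is listed as an export.
# Reads `showmount -e` output on stdin; $1 is the service name.
# Usage: showmount -e 10.0.10.15 | has_export <service>
has_export() {
  awk -v path="/mnt/main/$1" '
    $1 == path { found = 1 }   # first column is the exported path
    END { exit !found }        # exit 0 when found, 1 otherwise
  '
}
```

Because the helper only reads stdin, it can be exercised against saved `showmount` output without touching the NFS server.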

4. Terraform Module Creation

Create module directory:

mkdir -p modules/kubernetes/<service-name>/

Create modules/kubernetes/<service-name>/main.tf:

variable "tls_secret_name" {}
variable "tier" { type = string }
variable "postgresql_password" {}  # Only if needed
# Add other variables as needed (smtp_password, api_keys, etc.)

resource "kubernetes_namespace" "<service>" {
  metadata {
    name = "<service>"
  }
}

module "tls_secret" {
  source          = "../setup_tls_secret"
  namespace       = kubernetes_namespace.<service>.metadata[0].name
  tls_secret_name = var.tls_secret_name
}

# If database migrations needed, add init_container
resource "kubernetes_deployment" "<service>" {
  metadata {
    name      = "<service>"
    namespace = kubernetes_namespace.<service>.metadata[0].name
    labels = {
      app  = "<service>"
      tier = var.tier
    }
  }
  spec {
    replicas = 1
    selector {
      match_labels = {
        app = "<service>"
      }
    }
    template {
      metadata {
        labels = {
          app = "<service>"
        }
      }
      spec {
        # Init container for migrations (if needed)
        # init_container { ... }

        container {
          name  = "<service>"
          image = "<docker-image>:<tag>"

          port {
            container_port = <port>
          }

          # Environment variables
          env {
            name  = "DATABASE_URL"
            value = "postgresql://<service>:${var.postgresql_password}@postgresql.dbaas.svc.cluster.local:5432/<service>"
          }
          # Add other env vars as needed

          # Volume mounts for persistent data
          volume_mount {
            name       = "data"
            mount_path = "<mount-path>"
            sub_path   = "<optional-subpath>"
          }

          resources {
            requests = {
              memory = "256Mi"
              cpu    = "100m"
            }
            limits = {
              memory = "2Gi"
              cpu    = "1"
            }
          }

          # Health checks (if endpoints exist)
          liveness_probe {
            http_get {
              path = "/health"  # or /healthz, /, etc.
              port = <port>
            }
            initial_delay_seconds = 60
            period_seconds        = 30
          }
        }

        # NFS volume for persistence
        volume {
          name = "data"
          nfs {
            server = "10.0.10.15"
            path   = "/mnt/main/<service>"
          }
        }
      }
    }
  }
}

resource "kubernetes_service" "<service>" {
  metadata {
    name      = "<service>"
    namespace = kubernetes_namespace.<service>.metadata[0].name
    labels = {
      app = "<service>"
    }
  }

  spec {
    selector = {
      app = "<service>"
    }
    port {
      name        = "http"
      port        = 80
      target_port = <container-port>
    }
  }
}

module "ingress" {
  source          = "../ingress_factory"
  namespace       = kubernetes_namespace.<service>.metadata[0].name
  name            = "<service>"
  tls_secret_name = var.tls_secret_name
  # Add extra_annotations if needed (proxy-body-size, timeouts, etc.)
}

5. Update Main Terraform Files

Add to modules/kubernetes/main.tf:

  1. Add variable declarations at top:
variable "<service>_postgresql_password" { type = string }
  2. Add to appropriate DEFCON level (ask user which level, default to 5):
5 : [
  ...,
  "<service>"
]
  3. Add module block at bottom:
module "<service>" {
  source              = "./<service>"
  for_each            = contains(local.active_modules, "<service>") ? { <service> = true } : {}
  tls_secret_name     = var.tls_secret_name
  postgresql_password = var.<service>_postgresql_password
  tier                = local.tiers.aux  # or appropriate tier

  depends_on = [null_resource.core_services]
}

Add to main.tf:

  1. Add variable:
variable "<service>_postgresql_password" { type = string }
  2. Pass to kubernetes_cluster module:
module "kubernetes_cluster" {
  ...
  <service>_postgresql_password = var.<service>_postgresql_password
}

Update terraform.tfvars:

  1. Add password/credentials:
<service>_postgresql_password = "<secure-password>"
  2. Add to Cloudflare DNS (ask user if proxied or non-proxied):
cloudflare_non_proxied_names = [
  ...,
  "<service>"
]

6. Email/SMTP Configuration (if needed)

If service needs to send emails:

env {
  name  = "MAILER_HOST"
  value = "mailserver.viktorbarzin.me"  # Public hostname for TLS
}
env {
  name  = "MAILER_PORT"
  value = "587"
}
env {
  name  = "MAILER_USER"
  value = "info@viktorbarzin.me"
}
env {
  name  = "MAILER_PASSWORD"
  value = var.smtp_password  # Declare smtp_password in the module; the module call sets it
}

Add to module call:

smtp_password = var.mailserver_accounts["info@viktorbarzin.me"]

7. Apply Terraform

terraform init
terraform apply -target=module.kubernetes_cluster.module.<service> -var="kube_config_path=$(pwd)/config" -auto-approve

IMPORTANT: Also apply the cloudflared module to create the Cloudflare DNS record:

terraform apply -target=module.kubernetes_cluster.module.cloudflared -var="kube_config_path=$(pwd)/config" -auto-approve

Without this step, the DNS record won't be created even though it's defined in terraform.tfvars.

8. Verification

kubectl get pods -n <service>
kubectl logs -n <service> -l app=<service> --tail=50

Test URL: https://<service>.viktorbarzin.me

8b. Stability Gate (required when dockerfile_state ∈ {written-from-scratch, fixed-broken-upstream})

Before committing — and before any upstream PR in §10 — run a 10-minute stability check to catch pods that crash-loop a few minutes after Ready.

.claude/skills/setup-project/scripts/stability-gate.sh <service> <service> https://<service>.viktorbarzin.me

Polls pod readiness + curl 200 every 30s × 20 iterations. Requires 18/20 successes (tolerates 2 blips).

  • Pass → update the state file, then proceed to §9 and §10: jq '.deploy_verified_at = (now | todate)' .contribution-state.json | sponge .contribution-state.json
  • Fail → stop. Investigate via kubectl logs, kubectl describe. Do NOT commit. Do NOT fire §10. Re-run the gate after fixes.

For image-used / used-as-is states, the gate is optional (app is already running a known-good image).
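
The pass criterion (20 polls, at least 18 successes) can be illustrated with a small loop. This is a sketch of the counting logic only, not the contents of stability-gate.sh: `run_gate` is a hypothetical name, and the probe command is whatever the caller supplies.

```sh
#!/bin/sh
# Sketch: run CMD a fixed number of times and require a minimum number of passes.
# run_gate CHECKS REQUIRED INTERVAL_SECONDS CMD [ARGS...]
run_gate() {
  checks=$1
  required=$2
  interval=$3
  shift 3
  ok=0
  i=0
  while [ "$i" -lt "$checks" ]; do
    # Count a success when the probe command exits 0.
    if "$@" >/dev/null 2>&1; then ok=$((ok + 1)); fi
    i=$((i + 1))
    # Sleep between polls, but not after the last one.
    if [ "$i" -lt "$checks" ]; then sleep "$interval"; fi
  done
  echo "$ok/$checks checks passed (need $required)"
  [ "$ok" -ge "$required" ]
}

# Example probe (assumption: an HTTP status below 400 counts as healthy):
#   run_gate 20 18 30 curl -fsS -o /dev/null --max-time 10 "https://<service>.viktorbarzin.me"
```

Requiring 18 of 20 instead of 20 of 20 is what lets a single slow response or a brief restart pass, while a pod that crash-loops for more than about a minute still fails the gate.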

9. Commit Changes

git add modules/kubernetes/<service>/ main.tf modules/kubernetes/main.tf terraform.tfvars
git commit -m "Add <service> deployment

- Deploy <service> as <description>
- Uses <dependencies>
- Ingress at <service>.viktorbarzin.me

[ci skip]"

10. Contribute Dockerfile Upstream (only when dockerfile_state ∈ {written-from-scratch, fixed-broken-upstream})

Goal: give the community the working Dockerfile we just validated in production.

Preconditions (script enforces):

  • .contribution-state.json present with a trigger state and deploy_verified_at set.
  • files/Dockerfile, files/.dockerignore, files/BUILD.md exist next to the module.
  • GITHUB_TOKEN in env — or vault kv get -field=github_pat secret/viktor is reachable.

Run:

.claude/skills/setup-project/scripts/contribute-dockerfile.sh modules/kubernetes/<service>

What the script does (all via GitHub REST — gh CLI is sandbox-blocked):

  1. Reads .contribution-state.json; skips unless state is written-from-scratch or fixed-broken-upstream and no contribution_pr_url is already recorded.
  2. Upstream sanity checks: repo exists, public, not archived; default branch discoverable; for written-from-scratch, verifies a Dockerfile didn't land upstream while we were deploying; bails cleanly if an open PR from our fork already exists.
  3. POST /repos/<owner>/<name>/forks — idempotent; waits up to 30s for the fork to be ready at ViktorBarzin/<name>.
  4. POST /repos/ViktorBarzin/<name>/merge-upstream — keeps fork current with upstream default branch.
  5. Creates branch add-dockerfile (or fix-dockerfile), timestamp-suffixed if that branch already exists with unrelated commits.
  6. Commits Dockerfile, .dockerignore, BUILD.md via Contents API. Each commit message carries Signed-off-by: for DCO-enforcing repos.
  7. Opens PR against upstream with body rendered from templates/PR_BODY.md.
  8. Writes contribution_pr_url back into .contribution-state.json and echoes the URL.

Failure handling:

  • Upstream archived / private / deleted → logged as SKIP, deploy success stands.
  • Fork/branch/PR already exists → treated as idempotent success; existing URL recorded.
  • GitHub 5xx → 3× exponential backoff, then hard fail with a clear message — safe to re-run the script.

After the PR opens: the URL is in .contribution-state.json. Share it with the user. No automated follow-up on merge/reject — that's a manual check for now.
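
The retry behaviour described under failure handling follows the usual exponential-backoff pattern. The sketch below shows that pattern in isolation. It is not taken from contribute-dockerfile.sh: `retry_backoff` is a hypothetical name, "3×" is interpreted here as three attempts in total, and the helper retries on any non-zero exit, so the caller decides what counts as retryable (for example, only HTTP 5xx).

```sh
#!/bin/sh
# Sketch: retry CMD with a delay that doubles after every failure.
# retry_backoff ATTEMPTS INITIAL_DELAY_SECONDS CMD [ARGS...]
retry_backoff() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while :; do
    if "$@"; then return 0; fi
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))   # 2s, 4s, 8s, ...
    n=$((n + 1))
  done
}

# Example: retry_backoff 3 2 some_github_call
# (some_github_call is a placeholder that should exit non-zero on a 5xx response.)
```

Since the script's steps are described as idempotent, a hard failure after the last attempt leaves nothing to clean up; re-running from the top is enough.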

Common Patterns

Init Container for Migrations

init_container {
  name    = "migration"
  image   = "<same-image>"
  command = ["sh", "-c", "<migration-command>"]

  # Same env vars and volumes as main container
}

Dynamic Environment Variables

locals {
  common_env = [
    { name = "VAR1", value = "value1" },
    { name = "VAR2", value = "value2" },
  ]
}

dynamic "env" {
  for_each = local.common_env
  content {
    name  = env.value.name
    value = env.value.value
  }
}

External URL Configuration

Many apps need their public URL configured:

env {
  name  = "APP_URL"  # or PUBLIC_URL, EXTERNAL_URL, etc.
  value = "https://<service>.viktorbarzin.me"
}
env {
  name  = "HTTPS"  # or ENABLE_HTTPS, etc.
  value = "true"
}

Checklist

  • Find official Docker image or docker-compose
  • Identify dependencies (DB, Redis, etc.)
  • Ask user for database credentials (never create yourself)
  • Create NFS directory and export on TrueNAS (if persistent storage needed)
  • Verify NFS mount is accessible from k8s nodes
  • Create modules/kubernetes/<service>/main.tf
  • Classify dockerfile_state and write .contribution-state.json
  • If writing/fixing Dockerfile: satisfy the quality bar (multi-stage, non-root, .dockerignore, BUILD.md)
  • Update modules/kubernetes/main.tf (variables, DEFCON level, module block)
  • Update main.tf (variable, pass to module)
  • Update terraform.tfvars (password, Cloudflare DNS)
  • Run terraform init and terraform apply
  • Verify pods are running
  • Test the URL
  • Run stability-gate.sh — needed for contribution, optional otherwise
  • Commit changes with [ci skip]
  • Run contribute-dockerfile.sh if state triggers an upstream PR

Questions to Ask User

  1. What DEFCON level should this service be in? (Default: 5)
  2. Should Cloudflare proxy this domain? (Default: no, add to cloudflare_non_proxied_names)
  3. Does this need email/SMTP? (Configure if yes)
  4. What database credentials should I use? (Never create yourself)
  5. What tier? (core/cluster/gpu/edge/aux - default: aux)

Notes

  • Always create NFS directories and exports BEFORE deploying - pods will get stuck in ContainerCreating if the NFS path doesn't exist or isn't exported
  • Always use official documentation as the source of truth
  • Prefer stable/latest tags over pinned versions for self-hosted services
  • Use shared infrastructure: PostgreSQL at postgresql.dbaas.svc.cluster.local, Redis at redis.redis.svc.cluster.local
  • NFS storage: Always at 10.0.10.15:/mnt/main/<service>
  • Email: Use mailserver.viktorbarzin.me (public hostname) not internal service name
  • Resource limits: Start conservative, can increase if needed
  • Health checks: Only add if the app has health endpoints