infra/AGENTS.md

# Infrastructure Repository — AI Agent Instructions

## Critical Rules (MUST FOLLOW)
- **ALL changes through Terraform/Terragrunt** — NEVER `kubectl apply/edit/patch/delete` for persistent changes. Read-only kubectl is fine.
- **NEVER put secrets in plaintext** — use `secrets.sops.json` (SOPS-encrypted) or `terraform.tfvars` (git-crypt, legacy)
- **NEVER restart NFS on TrueNAS** — causes cluster-wide mount failures across all pods
- **NEVER commit secrets** — triple-check before every commit
- **Put `[ci skip]` in commit messages** when changes were already applied locally
- **Ask before `git push`** — always confirm with the user first
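
One way to back the "never commit secrets" rule with a mechanical check is sketched below. It assumes that an encrypted `secrets.sops.json` carries a top-level `sops` metadata key and `ENC[` value markers, which is how SOPS writes encrypted JSON. The function name `looks_sops_encrypted` is hypothetical and not part of this repository, and the check is a heuristic, not a guarantee.

```shell
# Hypothetical helper (not part of this repo): returns 0 when a file looks
# SOPS-encrypted, 1 when it looks like plaintext, 2 when the file is missing.
# Heuristic only: looks for the "sops" metadata key and at least one ENC[ marker.
looks_sops_encrypted() {
  local file="$1"
  [ -f "$file" ] || return 2
  grep -q '"sops"[[:space:]]*:' "$file" && grep -qF 'ENC[' "$file"
}

# Usage sketch before committing:
#   looks_sops_encrypted secrets.sops.json || echo "secrets.sops.json is not encrypted" >&2
```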

## Execution
- **Apply a service**: `scripts/tg apply --non-interactive` (auto-decrypts SOPS secrets)
- **Legacy apply**: `cd stacks/<service> && terragrunt apply --non-interactive` (uses terraform.tfvars)
- **kubectl**: `kubectl --kubeconfig "$(pwd)/config"`
- **Health check**: `bash scripts/cluster_healthcheck.sh --quiet`
- **Plan all**: `cd stacks && terragrunt run --all --non-interactive -- plan`
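
The read-only kubectl rule can be made harder to break by accident with a small wrapper. The sketch below is an illustration only: `kubectl_ro` is a hypothetical name, and it blocks just the four verbs named in the critical rules, so other mutating verbs would still pass through. Deleting crashed pods as described under Common Operations is a transient change and would call `kubectl` directly.

```shell
# Hypothetical wrapper (not part of this repo): forwards commands to kubectl
# with the repo kubeconfig, but refuses the mutating verbs listed in the rules.
kubectl_ro() {
  case "${1:-}" in
    apply|edit|patch|delete)
      echo "refusing '$1': persistent changes go through Terraform/Terragrunt" >&2
      return 1
      ;;
  esac
  kubectl --kubeconfig "$(pwd)/config" "$@"
}

# Usage sketch, run from the repository root:
#   kubectl_ro get pods -A
```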

## Secrets Management (SOPS)
- **`config.tfvars`** — plaintext config (hostnames, IPs, DNS records, public keys)
- **`secrets.sops.json`** — SOPS-encrypted secrets (passwords, tokens, SSH keys, API keys)
- **`.sops.yaml`** — defines who can decrypt (age public keys: Viktor + CI)
- **`scripts/tg`** — wrapper that auto-decrypts SOPS before running terragrunt
- **Edit secrets**: `sops secrets.sops.json` (opens $EDITOR, re-encrypts on save)
- **Add a secret**: `sops set secrets.sops.json '["new_key"]' '"value"'`
- **Operators** push PRs → Viktor reviews → CI decrypts and applies. Operators do not need any age keys.
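
The index argument to `sops set` is a bracketed path such as `'["new_key"]'`, which is easy to mistype. The sketch below builds that path from plain key names. `sops_index` is a hypothetical helper, and the nested example assumes a nested secret layout that this repository may not use. Key names containing double quotes are not handled.

```shell
# Hypothetical helper (not part of this repo): turn key names into the index
# path used by `sops set`, one ["..."] segment per argument.
sops_index() {
  local key out=""
  for key in "$@"; do
    out="${out}[\"${key}\"]"
  done
  printf '%s\n' "$out"
}

# Usage sketch (not run here; needs sops 3.9+ and a key able to decrypt the file):
#   sops set secrets.sops.json "$(sops_index new_key)" '"value"'
#   sops decrypt --extract "$(sops_index new_key)" secrets.sops.json
```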

## Sealed Secrets (User-Managed Secrets)
For secrets that users manage themselves (no SOPS/git-crypt access needed):
1. **Create**: `kubectl create secret generic <name> --from-literal=key=value -n <ns> --dry-run=client -o yaml | kubeseal --controller-name sealed-secrets --controller-namespace sealed-secrets -o yaml > sealed-<name>.yaml`
2. **Commit**: Place `sealed-*.yaml` files in the stack directory (`stacks/<service>/`)
3. **Terraform picks them up** automatically via `fileset` + `for_each`:
   ```hcl
   resource "kubernetes_manifest" "sealed_secrets" {
     for_each = fileset(path.module, "sealed-*.yaml")
     manifest = yamldecode(file("${path.module}/${each.value}"))
   }
   ```
4. **Deploy**: Push → CI runs `terragrunt apply` → controller decrypts into real K8s Secrets
- Only the in-cluster controller has the private key. `kubeseal` uses the public key — safe to distribute.
- Naming convention: files MUST match `sealed-*.yaml` glob pattern.
- The `kubernetes_manifest` block is safe to add even with zero `sealed-*.yaml` files (empty `for_each`).
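
Because `fileset` only matches `sealed-*.yaml`, a sealed secret saved under another name is silently ignored. A possible pre-commit check is sketched below. `misnamed_sealed_secrets` is a hypothetical name, and the check assumes `kind: SealedSecret` appears at the start of a line, as in `kubeseal -o yaml` output.

```shell
# Hypothetical check (not part of this repo) for one stack directory.
# Prints every *.yaml file that declares kind: SealedSecret but whose name
# does not match sealed-*.yaml, so the fileset() pattern would skip it.
misnamed_sealed_secrets() {
  local dir="$1" f base
  for f in "$dir"/*.yaml; do
    [ -e "$f" ] || continue          # directory has no yaml files
    base=$(basename "$f")
    case "$base" in
      sealed-*.yaml) continue ;;     # already follows the convention
    esac
    if grep -q '^kind:[[:space:]]*SealedSecret' "$f"; then
      printf '%s\n' "$base"
    fi
  done
}

# Usage sketch:
#   misnamed_sealed_secrets stacks/<service>
```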

## Architecture
Terragrunt-based homelab managing a Kubernetes cluster (5 nodes, v1.34.2) on Proxmox VMs.
- **70+ services**, each in `stacks/<service>/` with its own Terraform state
- **Core platform**: `stacks/platform/modules/` (~22 modules: Traefik, Kyverno, monitoring, dbaas, sealed-secrets, etc.)
- **Public domain**: `viktorbarzin.me` (Cloudflare) | **Internal**: `viktorbarzin.lan` (Technitium DNS)
- **Onboarding portal**: `https://k8s-portal.viktorbarzin.me` — self-service kubectl setup + docs
- **CI/CD**: Woodpecker CI — PRs run plan, merges to master auto-apply platform stack

## Key Paths
- `stacks/<service>/main.tf` — service definition
- `stacks/platform/modules/<service>/` — core infra modules
- `modules/kubernetes/ingress_factory/` — standardized ingress with auth, rate limiting, anti-AI
- `modules/kubernetes/nfs_volume/` — NFS volume module (CSI-backed, soft mount)
- `config.tfvars` — non-secret configuration (plaintext)
- `secrets.sops.json` — all secrets (SOPS-encrypted JSON)
- `terraform.tfvars` — legacy secrets file (git-crypt, kept for reference)
- `scripts/cluster_healthcheck.sh` — 25-check cluster health script

## Storage
- **NFS** (`nfs-truenas` StorageClass): For app data. Use the `nfs_volume` module, never inline `nfs {}` blocks.
- **iSCSI** (`iscsi-truenas` StorageClass): For databases (PostgreSQL, MySQL). democratic-csi driver.
- **TrueNAS**: 10.0.10.15. NFS exports managed via `secrets/nfs_exports.sh`.
- **SQLite on NFS is unreliable** (fsync issues) — always use iSCSI or local disk for databases.
- **NFS mount options**: Always `soft,timeo=30,retrans=3` to prevent uninterruptible sleep (D state).
- **NFS export directory must exist** on TrueNAS before Terraform can create the PV.
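
The mount-option rule above can be checked against any comma-separated option string. This is a minimal sketch: `has_required_nfs_opts` is a hypothetical helper, and it only tests for the three options named in this section.

```shell
# Hypothetical check (not part of this repo): succeeds only if a
# comma-separated mount option string contains soft, timeo=30 and retrans=3.
has_required_nfs_opts() {
  local opts=",$1," required
  for required in soft timeo=30 retrans=3; do
    case "$opts" in
      *",$required,"*) ;;            # option present as a whole token
      *) return 1 ;;
    esac
  done
}

# Usage sketch on a node or inside a pod with NFS mounts:
#   findmnt -t nfs,nfs4 -n -o OPTIONS | while read -r o; do
#     has_required_nfs_opts "$o" || echo "non-compliant mount: $o" >&2
#   done
```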

## Shared Variables (never hardcode)
`var.nfs_server` (10.0.10.15), `var.redis_host`, `var.postgresql_host`, `var.mysql_host`, `var.ollama_host`, `var.mail_host`

## Tier System
`0-core` | `1-cluster` | `2-gpu` | `3-edge` | `4-aux` — Kyverno auto-generates LimitRange + ResourceQuota per namespace based on tier label.
- Containers without explicit `resources {}` get default limits (256Mi for edge/aux — causes OOMKill for heavy apps)
- Always set explicit resources on containers that need more than the defaults
- Opt-out: labels `resource-governance/custom-quota=true` / `resource-governance/custom-limitrange=true`

## Infrastructure
- **Proxmox**: 192.168.1.127 (Dell R730, 22c/44t, 142GB RAM)
- **Nodes**: k8s-master (10.0.20.100), node1 (GPU, Tesla T4), node2-4
- **GPU**: `node_selector = { "gpu": "true" }` + toleration `nvidia.com/gpu`
- **Pull-through cache**: 10.0.20.10 — docker.io (:5000), ghcr.io (:5010) only. Caches stale manifests for :latest tags — use versioned tags or pre-pull with `ctr --hosts-dir ''` to bypass.
- **pfSense**: 10.0.20.1 (gateway, firewall, DNS forwarding)
- **MySQL InnoDB Cluster**: 3 instances on iSCSI, anti-affinity excludes node2 (SIGBUS bug)
- **SMTP**: `var.mail_host` port 587 STARTTLS (not internal svc address — cert mismatch)
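
Since the pull-through cache can serve stale manifests for `:latest`, it may help to lint image references before they land in a stack. The sketch below treats a reference as pinned when it has a digest or any tag other than `latest`. `is_pinned_image` is a hypothetical helper, not something this repository provides.

```shell
# Hypothetical lint (not part of this repo): returns 0 when an image reference
# is pinned to a version tag or digest, 1 for :latest or a missing tag.
is_pinned_image() {
  local ref="$1" last
  case "$ref" in
    *@sha256:*) return 0 ;;          # digest-pinned
  esac
  last="${ref##*/}"                  # final path component, e.g. nginx:1.27
  case "$last" in
    *:latest) return 1 ;;
    *:*)      return 0 ;;
    *)        return 1 ;;            # no tag resolves to :latest
  esac
}

# Usage sketch:
#   is_pinned_image "ghcr.io/example/app:1.4.2" || echo "use a versioned tag" >&2
```

Taking only the last path component keeps a registry port such as `10.0.20.10:5000` from being mistaken for a tag.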

## Contributor Onboarding
1. Get Authentik account + Headscale VPN access (ask Viktor)
2. Clone repo — `AGENTS.md` is auto-loaded by Codex
3. Create branch → edit → push → open PR
4. Viktor reviews → CI applies → Slack notification
5. Portal: `https://k8s-portal.viktorbarzin.me/onboarding` for full guide

## Common Operations
- **Deploy new service**: Use `stacks/<existing-service>/` as template. Create stack, add DNS in tfvars, apply platform then service.
- **Fix crashed pods**: Run healthcheck first. Safe to delete evicted/failed pods and CrashLoopBackOff pods with >10 restarts.
- **OOMKilled**: Check `kubectl describe limitrange tier-defaults -n <ns>`. Increase `resources.limits.memory` in the stack's main.tf.
- **Add a secret**: `sops set secrets.sops.json '["key"]' '"value"'` then commit.
- **NFS exports**: Create dir on TrueNAS first, add to `secrets/nfs_directories.txt`, run `secrets/nfs_exports.sh`.
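
The "fix crashed pods" rule can be written down as a small decision helper. This is a sketch under assumptions: `safe_to_delete_pod` is a hypothetical name, and the status strings are assumed to match the STATUS column of `kubectl get pods`. The restart count must be passed as a bare number.

```shell
# Hypothetical decision helper (not part of this repo).
# $1 = pod status, $2 = restart count as a plain integer.
# Returns 0 when the rule above says the pod is safe to delete.
safe_to_delete_pod() {
  local status="$1" restarts="${2:-0}"
  case "$status" in
    Evicted|Failed)   return 0 ;;
    CrashLoopBackOff) [ "$restarts" -gt 10 ] ;;   # strictly more than 10 restarts
    *)                return 1 ;;
  esac
}

# Usage sketch, after running the health check:
#   safe_to_delete_pod CrashLoopBackOff 14 && echo "ok to delete"
```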

## Detailed Reference
See `.claude/reference/patterns.md` for: NFS volume code examples, iSCSI details, Kyverno governance tables, anti-AI scraping layers, Terragrunt architecture, node rebuild procedure, archived troubleshooting runbooks index.