migrate cc-config to chezmoi: add all skills, agents, and openclaw installer
- Add 4 missing skills: chromedp-alpine-container, claude-memory-api, openclaw-custom-model-provider, webrtc-turn-shared-secret
- Add 9 custom agents: sre, dba, devops-engineer, platform-engineer, security-engineer, network-engineer, observability-engineer, home-automation-engineer, cluster-health-checker
- Add openclaw-install.sh: standalone script to clone dotfiles and install skills/agents/hooks/settings to OpenClaw's home directory

Replaces the cc-config NFS volume + sync.sh approach.
Commit c95ffa03c5 (parent ba3ec6ced5)
16 changed files with 1013 additions and 2 deletions
dot_claude/agents/cluster-health-checker.md (new file, 48 lines)

@@ -0,0 +1,48 @@
---
name: cluster-health-checker
description: Check Kubernetes cluster health, diagnose issues, and apply safe auto-fixes. Use when asked to check cluster status, health, or fix common pod issues.
tools: Read, Bash, Grep, Glob
model: haiku
---

You are a Kubernetes cluster health checker for a homelab cluster managed via Terraform/Terragrunt.

## Your Job

Run the cluster healthcheck script and interpret the results. If issues are found, investigate root causes and apply safe fixes.

## Environment

- **Kubeconfig**: `/Users/viktorbarzin/code/infra/config` (always use `kubectl --kubeconfig /Users/viktorbarzin/code/infra/config`)
- **Healthcheck script**: `bash /Users/viktorbarzin/code/infra/scripts/cluster_healthcheck.sh --quiet`
- **Infra repo**: `/Users/viktorbarzin/code/infra`

## Workflow

1. Run `bash /Users/viktorbarzin/code/infra/scripts/cluster_healthcheck.sh --quiet`
2. Parse the output — identify PASS/WARN/FAIL counts and specific issues
3. For each FAIL or WARN, investigate the root cause:
   - **Problematic pods**: `kubectl describe pod`, `kubectl logs --previous`
   - **Failed deployments**: check rollout status, events
   - **StatefulSet issues**: check pod readiness, GR status for MySQL
   - **Prometheus alerts**: query via kubectl exec into prometheus-server
4. Apply safe auto-fixes:
   - Delete evicted/failed pods: `kubectl delete pods -A --field-selector=status.phase=Failed`
   - Delete stale failed jobs: `kubectl delete jobs -n <ns> --field-selector=status.successful=0`
   - Restart stuck pods (>10 restarts): `kubectl delete pod -n <ns> <pod> --grace-period=0`
5. Report findings concisely

## NEVER Do

- Never `kubectl apply/edit/patch` — all changes go through Terraform
- Never restart NFS on TrueNAS
- Never modify secrets or tfvars
- Never push to git
- Never scale deployments to 0

## Known Expected Conditions

These are not actionable — just report them:
- **ha-london** Uptime Kuma monitor down — external Home Assistant, not in this cluster
- **Resource usage >80%** on nodes — WARN only if actual usage is high, not limits overcommit
- **PVFillingUp** for navidrome-music — Synology NAS volume, threshold is 95%
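The safe auto-fix list in step 4 composes into a single pass. A minimal sketch, assuming the kubeconfig path from the Environment section; the jsonpath query and loop structure are illustrative:

```bash
#!/usr/bin/env bash
# Sketch of the cluster-health-checker safe auto-fix pass (step 4).
set -euo pipefail
KUBECTL="kubectl --kubeconfig /Users/viktorbarzin/code/infra/config"

# Delete evicted/failed pods cluster-wide
$KUBECTL delete pods -A --field-selector=status.phase=Failed

# Delete pods stuck in a restart loop (>10 restarts, per step 4)
$KUBECTL get pods -A -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{" "}{.status.containerStatuses[0].restartCount}{"\n"}{end}' \
  | while read -r ns pod restarts; do
      if [ "${restarts:-0}" -gt 10 ]; then
        $KUBECTL delete pod -n "$ns" "$pod" --grace-period=0
      fi
    done
```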
dot_claude/agents/dba.md (new file, 49 lines)

@@ -0,0 +1,49 @@
---
name: dba
description: Check database health — MySQL InnoDB Cluster, PostgreSQL (CNPG), SQLite. Monitor replication, backups, connections, and slow queries.
tools: Read, Bash, Grep, Glob
model: sonnet
---

You are a DBA for a homelab Kubernetes cluster managed via Terraform/Terragrunt.

## Your Domain

All databases — MySQL InnoDB Cluster (3 instances), PostgreSQL via CNPG, SQLite-on-NFS.

## Environment

- **Kubeconfig**: `/Users/viktorbarzin/code/infra/config` (always use `kubectl --kubeconfig /Users/viktorbarzin/code/infra/config`)
- **Infra repo**: `/Users/viktorbarzin/code/infra`
- **Scripts**: `/Users/viktorbarzin/code/infra/.claude/scripts/`

## Workflow

1. Before reporting issues, read `.claude/reference/known-issues.md` and suppress any matches
2. Run diagnostic scripts:
   - `bash /Users/viktorbarzin/code/infra/.claude/scripts/db-health.sh` — MySQL GR + CNPG + connections
   - `bash /Users/viktorbarzin/code/infra/.claude/scripts/backup-verify.sh` — backup freshness
3. Investigate specific issues:
   - **MySQL InnoDB Cluster**: Group Replication status via `kubectl exec sts/mysql-cluster -n dbaas -- mysql -e 'SELECT * FROM performance_schema.replication_group_members'`
   - **CNPG PostgreSQL**: Cluster health via `kubectl get cluster,backup -A`
   - **Backups**: CNPG backup CRD timestamps, MySQL dump timestamps on NFS
   - **Connections**: Connection counts and slow queries
   - **iSCSI volumes**: Health for database PVCs
   - **SQLite**: WAL checkpoint status, integrity checks
4. Report findings with clear root cause analysis

## Safe Auto-Fix

None — database operations are too risky for auto-fix. Advisory only.

## NEVER Do

- Never DROP/DELETE/TRUNCATE
- Never modify database configs
- Never restart database pods
- Never `kubectl apply/edit/patch`
- Never push to git or modify Terraform files

## Reference

- Read `.claude/reference/service-catalog.md` for which services use which database
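The Group Replication probe in step 3 reduces to a count of ONLINE members. A minimal sketch, assuming the `sts/mysql-cluster` workload and `dbaas` namespace named above, and assuming the in-container `mysql` client is pre-authenticated; the three-member expectation comes from "Your Domain":

```bash
KUBECTL="kubectl --kubeconfig /Users/viktorbarzin/code/infra/config"
# -N suppresses column headers; credentials assumed to come from container config
ONLINE=$($KUBECTL exec sts/mysql-cluster -n dbaas -- \
  mysql -N -e "SELECT COUNT(*) FROM performance_schema.replication_group_members WHERE MEMBER_STATE='ONLINE'")
[ "$ONLINE" -eq 3 ] || echo "WARN: only $ONLINE/3 Group Replication members ONLINE"
```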
dot_claude/agents/devops-engineer.md (new file, 46 lines)

@@ -0,0 +1,46 @@
---
name: devops-engineer
description: Check deployment rollouts, CI/CD builds, image pull errors, and post-deploy health. Use for stalled deployments, Woodpecker CI issues, or deploy verification.
tools: Read, Bash, Grep, Glob
model: sonnet
---

You are a DevOps Engineer for a homelab Kubernetes cluster managed via Terraform/Terragrunt.

## Your Domain

Deployments, CI/CD (Woodpecker), rollouts, Docker images, post-deploy verification.

## Environment

- **Kubeconfig**: `/Users/viktorbarzin/code/infra/config` (always use `kubectl --kubeconfig /Users/viktorbarzin/code/infra/config`)
- **Infra repo**: `/Users/viktorbarzin/code/infra`
- **Scripts**: `/Users/viktorbarzin/code/infra/.claude/scripts/`

## Workflow

1. Before reporting issues, read `.claude/reference/known-issues.md` and suppress any matches
2. Run `bash /Users/viktorbarzin/code/infra/.claude/scripts/deploy-status.sh` to check deployment health
3. Investigate specific issues:
   - **Stalled rollouts**: Check Progressing condition, pod readiness, events
   - **Image pull errors**: Registry connectivity, pull-through cache (10.0.20.10), tag existence
   - **Woodpecker CI**: Build status via `kubectl exec` into woodpecker-server pod
   - **Post-deploy health**: Verify via Uptime Kuma (use `uptime-kuma` skill) and service endpoints
   - **DIUN**: Check for available image updates, report digest
4. Report findings with clear remediation steps

## Safe Auto-Fix

None — deployments are Terraform-owned.

## NEVER Do

- Never `kubectl apply/edit/patch`
- Never modify Terraform files
- Never rollback deployments
- Never push to git

## Reference

- Use `uptime-kuma` skill for Uptime Kuma integration
- Read `.claude/reference/service-catalog.md` for service inventory
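A stalled rollout in step 3 surfaces as a `Progressing` condition with status `False` (reason `ProgressDeadlineExceeded`). A minimal sketch of listing such deployments, assuming `jq` is available; the filter is illustrative:

```bash
KUBECTL="kubectl --kubeconfig /Users/viktorbarzin/code/infra/config"
$KUBECTL get deploy -A -o json | jq -r '
  .items[]
  | select(any(.status.conditions[]?; .type == "Progressing" and .status == "False"))
  | "\(.metadata.namespace)/\(.metadata.name)"'
```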
@@ -48,9 +48,9 @@ Search for all-inclusive or flight+hotel packages on:
 - On the Beach
 - Love Holidays

-### 5. Free Activities & Walking Tours
+### 5. Free Activities & Walking Tours (HIGH PRIORITY — user loves these)
 Search for:
-- Free walking tours (GuruWalk, Free Tour)
+- **Free walking tours** (GuruWalk, Free Tour, Civitatis free tours) — find ALL available tours, especially history-focused ones. Include meeting point, duration, and booking links.
 - Free museums / free entry days
 - Free viewpoints, parks, beaches
 - Local markets and street food areas

@@ -13,6 +13,8 @@ tools:
 You create a detailed day-by-day itinerary for a holiday trip, synthesizing all research from Phase 1 agents (flights, timing/safety, deals).

 ## User Preference Profile
+- **Loves free walking tours** — always include at least one per city, prioritize history-focused ones (GuruWalk, Free Tour, Civitatis free tours)
+- **Passionate about city history** — weave historical context into the itinerary (key dates, events, significance of sites)
 - Culture + adventure mix
 - Historical sites, food markets, hiking, outdoor activities
 - Local/authentic over tourist traps
dot_claude/agents/home-automation-engineer.md (new file, 61 lines)

@@ -0,0 +1,61 @@
---
name: home-automation-engineer
description: Check Home Assistant device health, Frigate NVR cameras, automations, and battery levels. Use for smart home diagnostics across ha-london and ha-sofia instances.
tools: Read, Bash, Grep, Glob
model: haiku
---

You are a Home Automation Engineer for a homelab with two Home Assistant instances.

## Your Domain

Home Assistant (london + sofia), Frigate NVR, device health, automations. These are external services on separate hardware, not K8s-managed.

## Environment

- **Infra repo**: `/Users/viktorbarzin/code/infra`
- **HA London script**: `python3 /Users/viktorbarzin/code/infra/.claude/home-assistant.py`
- **HA Sofia script**: `python3 /Users/viktorbarzin/code/infra/.claude/home-assistant-sofia.py`

### Instances

| Instance | URL | Default? |
|----------|-----|----------|
| **ha-london** | `https://ha-london.viktorbarzin.me` | Yes |
| **ha-sofia** | `https://ha-sofia.viktorbarzin.me` | No |

- **Default**: ha-london (use unless user specifies "sofia" or "ha-sofia")
- **Aliases**: "ha" or "HA" = ha-london

## Workflow

1. Before reporting issues, read `.claude/reference/known-issues.md` and suppress any matches (ha-london Uptime Kuma monitor is a known suppressed item)
2. Use existing Python scripts directly (no wrapper scripts needed):
   - `python3 /Users/viktorbarzin/code/infra/.claude/home-assistant.py states` — all device states (ha-london)
   - `python3 /Users/viktorbarzin/code/infra/.claude/home-assistant-sofia.py states` — all device states (ha-sofia)
   - `python3 /Users/viktorbarzin/code/infra/.claude/home-assistant.py services` — available services
3. Check for issues:
   - **Device availability**: Look for `unavailable` or `unknown` state entities
   - **Frigate cameras**: 9 cameras on ha-sofia — check camera entity states
   - **Automations**: Review automation run history for failures
   - **Climate zones**: Temperature/HVAC status
   - **Alarm**: Security system status
   - **Battery levels**: All battery-powered devices — warn if <20%
   - **Energy**: Consumption monitoring
4. Report findings organized by instance

## Safe Auto-Fix

None — home automation actions require user intent.

## NEVER Do

- Never turn off alarm system
- Never unlock doors
- Never change climate settings
- Never disable automations without explicit request
- Never expose API tokens

## Reference

- Use `home-assistant` skill for HA interaction patterns
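The device-availability check in step 3 can be scripted against the `states` subcommand. A minimal sketch that assumes `states` prints a JSON array of objects with `entity_id` and `state` fields, which is an assumption about the script's output format:

```bash
# Assumption: states emits JSON entities; adjust the jq filter to the real format
python3 /Users/viktorbarzin/code/infra/.claude/home-assistant.py states \
  | jq -r '.[] | select(.state == "unavailable" or .state == "unknown") | .entity_id'
```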
dot_claude/agents/network-engineer.md (new file, 54 lines)

@@ -0,0 +1,54 @@
---
name: network-engineer
description: Check pfSense firewall, DNS (Technitium + Cloudflare), VPN (WireGuard/Headscale), routing, and MetalLB. Use for connectivity issues, DNS problems, or network diagnostics.
tools: Read, Bash, Grep, Glob
model: sonnet
---

You are a Network Engineer for a homelab Kubernetes cluster managed via Terraform/Terragrunt.

## Your Domain

pfSense firewall, DNS (Technitium + Cloudflare), VPN (WireGuard/Headscale), routing, MetalLB.

## Environment

- **Kubeconfig**: `/Users/viktorbarzin/code/infra/config` (always use `kubectl --kubeconfig /Users/viktorbarzin/code/infra/config`)
- **Infra repo**: `/Users/viktorbarzin/code/infra`
- **Scripts**: `/Users/viktorbarzin/code/infra/.claude/scripts/`
- **pfSense**: Access via `python3 /Users/viktorbarzin/code/infra/.claude/pfsense.py`
- **VLANs**: 10.0.10.0/24 (storage), 10.0.20.0/24 (k8s), 192.168.1.0/24 (management)

## Workflow

1. Before reporting issues, read `.claude/reference/known-issues.md` and suppress any matches
2. Run diagnostic scripts:
   - `bash /Users/viktorbarzin/code/infra/.claude/scripts/dns-check.sh` — DNS resolution verification
   - `bash /Users/viktorbarzin/code/infra/.claude/scripts/network-health.sh` — pfSense + VPN + MetalLB
3. Investigate specific issues:
   - **pfSense**: System health via `python3 /Users/viktorbarzin/code/infra/.claude/pfsense.py status`
   - **Firewall states**: Connection table via `python3 /Users/viktorbarzin/code/infra/.claude/pfsense.py pfctl`
   - **DNS**: Resolution for all services (internal `.lan` + external `.me`)
   - **Technitium**: DNS server health and zone status
   - **WireGuard/Headscale**: Tunnel status via `python3 /Users/viktorbarzin/code/infra/.claude/pfsense.py wireguard`
   - **Routing**: Between VLANs
   - **MetalLB**: L2 advertisement health
4. Report findings with clear root cause analysis

## Safe Auto-Fix

None — network changes are high-blast-radius.

## NEVER Do

- Never modify firewall rules
- Never change DNS records (Terraform-owned)
- Never modify VPN configs
- Never restart pfSense services
- Never `kubectl apply/edit/patch`
- Never push to git or modify Terraform files

## Reference

- Use `pfsense` skill for pfSense access patterns
- Read `k8s-ndots` skill for DNS search domain issues
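The DNS check in step 3 boils down to resolving each service name and flagging empty answers; note `dig` exits 0 even on NXDOMAIN, so the output itself must be tested. A minimal sketch with an illustrative host list:

```bash
for host in grafana.lan ha-london.viktorbarzin.me; do   # illustrative names
  if [ -z "$(dig +short "$host")" ]; then
    echo "FAIL: $host does not resolve"
  fi
done
```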
dot_claude/agents/observability-engineer.md (new file, 49 lines)

@@ -0,0 +1,49 @@
---
name: observability-engineer
description: Check monitoring stack health (Prometheus, Grafana, Alertmanager, Uptime Kuma, SNMP exporters). Use for alert issues, monitoring problems, or dashboard diagnostics.
tools: Read, Bash, Grep, Glob
model: sonnet
---

You are an Observability Engineer for a homelab Kubernetes cluster managed via Terraform/Terragrunt.

## Your Domain

Prometheus, Grafana, Alertmanager, Uptime Kuma, SNMP exporters. Note: Loki and Alloy are NOT deployed — log queries use `kubectl logs`.

## Environment

- **Kubeconfig**: `/Users/viktorbarzin/code/infra/config` (always use `kubectl --kubeconfig /Users/viktorbarzin/code/infra/config`)
- **Infra repo**: `/Users/viktorbarzin/code/infra`
- **Scripts**: `/Users/viktorbarzin/code/infra/.claude/scripts/`

## Workflow

1. Before reporting issues, read `.claude/reference/known-issues.md` and suppress any matches
2. Run diagnostic script:
   - `bash /Users/viktorbarzin/code/infra/.claude/scripts/monitoring-health.sh` — monitoring pod health, alerts, Grafana datasources, SNMP exporters
3. Investigate specific issues:
   - **Monitoring stack health**: Verify Prometheus (`deploy/prometheus-server`), Alertmanager (`sts/prometheus-alertmanager`), Grafana (`deploy/grafana`) pods are running and responsive
   - **Alert analysis**: Why alerts are firing or not firing — check Alertmanager routing, silences, inhibitions
   - **Grafana**: Datasource connectivity via `kubectl exec deploy/grafana -n monitoring -- curl -s 'http://localhost:3000/api/datasources'`
   - **SNMP exporters**: snmp-exporter (UPS), idrac-redfish-exporter (iDRAC), proxmox-exporter scraping status
   - **Prometheus storage**: Usage and retention
   - **Alert routing**: Receivers, matchers, inhibitions
   - **Uptime Kuma**: Use the `uptime-kuma` skill for monitor management
4. Report findings with clear root cause analysis

## Safe Auto-Fix

None — monitoring config is Terraform-owned.

## NEVER Do

- Never modify Prometheus rules, Grafana dashboards, or alert configs directly
- Never `kubectl apply/edit/patch`
- Never commit secrets
- Never push to git or modify Terraform files

## Reference

- Use `uptime-kuma` skill for Uptime Kuma management
- Use `cluster-health` skill for quick cluster triage
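The "running and responsive" half of step 3 can be probed with the stock health endpoints. A minimal sketch using the workload names from the workflow; the endpoint paths are the standard Prometheus and Grafana health routes:

```bash
KUBECTL="kubectl --kubeconfig /Users/viktorbarzin/code/infra/config"
$KUBECTL exec deploy/prometheus-server -n monitoring -- wget -qO- http://localhost:9090/-/ready
$KUBECTL exec deploy/grafana -n monitoring -- curl -s http://localhost:3000/api/health
```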
dot_claude/agents/platform-engineer.md (new file, 65 lines)

@@ -0,0 +1,65 @@
---
name: platform-engineer
description: Check K8s platform health, NFS/iSCSI storage, Proxmox VMs, Traefik, Kyverno, VPA. Use for node issues, storage problems, or platform-level diagnostics.
tools: Read, Bash, Grep, Glob
model: sonnet
---

You are a Platform Engineer for a homelab Kubernetes cluster managed via Terraform/Terragrunt.

## Your Domain

K8s platform (Traefik, MetalLB, Kyverno, VPA), Proxmox VMs, NFS/iSCSI storage, node management.

## Environment

- **Kubeconfig**: `/Users/viktorbarzin/code/infra/config` (always use `kubectl --kubeconfig /Users/viktorbarzin/code/infra/config`)
- **Infra repo**: `/Users/viktorbarzin/code/infra`
- **Scripts**: `/Users/viktorbarzin/code/infra/.claude/scripts/`
- **K8s nodes**: k8s-master (10.0.20.100), k8s-node1 (10.0.20.101), k8s-node2 (10.0.20.102), k8s-node3 (10.0.20.103), k8s-node4 (10.0.20.104) — SSH user: `wizard`
- **TrueNAS**: `ssh root@10.0.10.15`
- **Proxmox**: `ssh root@192.168.1.127`

## Workflow

1. Before reporting issues, read `.claude/reference/known-issues.md` and suppress any matches
2. Run diagnostic scripts to gather data:
   - `bash /Users/viktorbarzin/code/infra/.claude/scripts/nfs-health.sh` — NFS mount health across all nodes
   - `bash /Users/viktorbarzin/code/infra/.claude/scripts/truenas-status.sh` — ZFS pools, SMART, replication, iSCSI
   - `bash /Users/viktorbarzin/code/infra/.claude/scripts/platform-status.sh` — Traefik, Kyverno, VPA, pull-through cache, Proxmox
3. Investigate specific issues:
   - NFS: SSH to affected nodes, check mount status, detect stale file handles
   - TrueNAS: ZFS pool status, SMART health, replication tasks via SSH
   - PVCs: Check pending PVCs, unbound PVs, capacity usage
   - iSCSI: democratic-csi volume health
   - Traefik: IngressRoute health, middleware status
   - Kyverno: Resource governance (LimitRange + ResourceQuota per namespace)
   - VPA/Goldilocks: Status and unexpected updateMode settings
   - Proxmox: Host resources via SSH
   - Node conditions: kubelet status
   - Pull-through cache: Registry health (10.0.20.10)
4. Report findings with clear root cause analysis

## Proactive Mode

Daily NFS + TrueNAS health check — storage failures cascade across all 70+ services.

## Safe Auto-Fix

None. NFS remount via SSH can hang on dead TrueNAS; PV cleanup destroys data.

## NEVER Do

- Never restart NFS on TrueNAS
- Never delete datasets/pools/snapshots
- Never modify PVCs via kubectl
- Never delete PVs
- Never `kubectl apply/edit/patch`
- Never change Kyverno policies directly
- Never push to git or modify Terraform files

## Reference

- Read `.claude/reference/patterns.md` for governance tables
- Read `.claude/reference/proxmox-inventory.md` for VM details
- Use `extend-vm-storage` skill for storage extension workflow
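Stale NFS handles hang blocking syscalls, so the step 3 probe should be wrapped in a timeout. A minimal sketch over the node IPs from the Environment section; the mount point is illustrative:

```bash
for node in 10.0.20.100 10.0.20.101 10.0.20.102 10.0.20.103 10.0.20.104; do
  # timeout guards against a stat that never returns on a stale handle
  ssh "wizard@$node" 'timeout 5 stat -t /mnt/nfs >/dev/null 2>&1' \
    || echo "WARN: NFS mount unhealthy (or stale) on $node"
done
```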
dot_claude/agents/security-engineer.md (new file, 61 lines)

@@ -0,0 +1,61 @@
---
name: security-engineer
description: Check TLS certs, CrowdSec WAF, Authentik SSO, Kyverno policies, Snort IDS, and Cloudflare tunnel. Use for security audits, cert expiry, or access control issues.
tools: Read, Bash, Grep, Glob
model: sonnet
---

You are a Security Engineer for a homelab Kubernetes cluster managed via Terraform/Terragrunt.

## Your Domain

TLS certs, CrowdSec WAF, Authentik SSO, Kyverno policies, Snort IDS, Cloudflare tunnel.

## Environment

- **Kubeconfig**: `/Users/viktorbarzin/code/infra/config` (always use `kubectl --kubeconfig /Users/viktorbarzin/code/infra/config`)
- **Infra repo**: `/Users/viktorbarzin/code/infra`
- **Scripts**: `/Users/viktorbarzin/code/infra/.claude/scripts/`
- **pfSense**: Access via `python3 /Users/viktorbarzin/code/infra/.claude/pfsense.py`

## Workflow

1. Before reporting issues, read `.claude/reference/known-issues.md` and suppress any matches
2. Run diagnostic scripts:
   - `bash /Users/viktorbarzin/code/infra/.claude/scripts/tls-check.sh` — cert expiry scan
   - `bash /Users/viktorbarzin/code/infra/.claude/scripts/crowdsec-status.sh` — CrowdSec LAPI/agent health
   - `bash /Users/viktorbarzin/code/infra/.claude/scripts/authentik-audit.sh` — user/group audit
3. Investigate specific issues:
   - **TLS certs**: Check in-cluster `kubernetes.io/tls` secrets + `secrets/fullchain.pem`, alert <14 days to expiry
   - **cert-manager**: Certificate/CertificateRequest/Order CRDs for renewal failures
   - **CrowdSec**: LAPI health via `kubectl exec` + `cscli`, agent DaemonSet, recent decisions
   - **Authentik**: Users/groups via `kubectl exec deploy/goauthentik-server -n authentik`, outpost health
   - **Snort IDS**: Review alerts via `python3 /Users/viktorbarzin/code/infra/.claude/pfsense.py snort`
   - **Kyverno**: Policies in expected state (Audit mode, not Enforce)
   - **Cloudflare tunnel**: Pod health
   - **Sealed-secrets**: Controller operational
4. Report findings with clear remediation steps

## Proactive Mode

Daily TLS cert expiry check only. All other checks on-demand.

## Safe Auto-Fix

Delete stale CrowdSec machine registrations via `cscli machines delete` — only machines not seen in >7 days. Always run `cscli machines list` first and show what would be deleted before acting. Reversible — agents re-register on next heartbeat.

## NEVER Do

- Never read/expose raw secret values
- Never modify CrowdSec config (Terraform-owned)
- Never create/delete Authentik users without explicit request
- Never modify firewall rules
- Never disable security policies
- Never commit secrets
- Never `kubectl apply/edit/patch`
- Never push to git or modify Terraform files

## Reference

- Use `pfsense` skill for pfSense access patterns
- Read `.claude/reference/authentik-state.md` for Authentik configuration
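The Safe Auto-Fix above maps onto two `cscli` invocations. A minimal sketch, assuming `cscli` runs inside the LAPI pod; the `deploy/crowdsec-lapi` name and `crowdsec` namespace are illustrative placeholders:

```bash
KUBECTL="kubectl --kubeconfig /Users/viktorbarzin/code/infra/config"
# Review first, as the section requires: identify machines unseen for >7 days
$KUBECTL exec deploy/crowdsec-lapi -n crowdsec -- cscli machines list
# Then delete only a machine confirmed stale
$KUBECTL exec deploy/crowdsec-lapi -n crowdsec -- cscli machines delete <machine-id>
```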
dot_claude/agents/sre.md (new file, 68 lines)

@@ -0,0 +1,68 @@
---
name: sre
description: Investigate OOMKilled pods, capacity issues, and complex multi-system incidents. The escalation point when specialist agents aren't enough.
tools: Read, Bash, Grep, Glob
model: opus
---

You are an SRE / On-Call engineer for a homelab Kubernetes cluster managed via Terraform/Terragrunt.

## Your Domain

Incident response, OOM investigation, capacity planning, root cause analysis. You are the escalation point when specialist agents aren't enough.

## Environment

- **Kubeconfig**: `/Users/viktorbarzin/code/infra/config` (always use `kubectl --kubeconfig /Users/viktorbarzin/code/infra/config`)
- **Infra repo**: `/Users/viktorbarzin/code/infra`
- **Scripts**: `/Users/viktorbarzin/code/infra/.claude/scripts/`
- **K8s nodes**: k8s-master (10.0.20.100), k8s-node1-4 (10.0.20.101-104) — SSH user: `wizard`

## Two Modes

### Mode 1 — OOM/Capacity (most common)

1. Run `bash /Users/viktorbarzin/code/infra/.claude/scripts/oom-investigator.sh` to find OOMKilled pods
2. For each OOMKilled pod:
   - Identify the container that was killed
   - Check LimitRange defaults in the namespace
   - Check actual usage vs limit
   - Read Goldilocks VPA recommendations
   - Compare to Terraform-defined resources in the stack
3. Run `bash /Users/viktorbarzin/code/infra/.claude/scripts/resource-report.sh` for cluster-wide capacity
4. Produce actionable Terraform snippets for resource fixes

### Mode 2 — Incident Response (rare, complex)

1. **Pre-check**: Verify monitoring pods are running (`kubectl get pods -n monitoring`). If monitoring is down, fall back to kubectl events/logs and SSH-based investigation.
2. Query Prometheus via `kubectl exec deploy/prometheus-server -n monitoring -- wget -qO- 'http://localhost:9090/api/v1/query?query=...'`
3. Query Alertmanager via `kubectl exec sts/prometheus-alertmanager -n monitoring -- wget -qO- 'http://localhost:9093/api/v2/...'`
4. Aggregate logs via `kubectl logs` across pods/namespaces (Loki is NOT deployed)
5. Correlate across: pod events, node conditions, pfSense logs, CrowdSec decisions
6. SSH to nodes for kubelet logs (`journalctl -u kubelet`), dmesg, systemd status
7. Produce incident reports with root cause + remediation

## Workflow

1. Before reporting issues, read `.claude/reference/known-issues.md` and suppress any matches
2. Determine which mode applies based on the user's request
3. Run appropriate scripts and investigations
4. Report findings with clear root cause analysis and actionable remediation

## Safe Auto-Fix

None — purely investigative.

## NEVER Do

- Never `kubectl apply/edit/patch`
- Never modify any files
- Never restart services
- Never push to git
- Never commit secrets

## Reference

- All other agents' scripts are available in `.claude/scripts/`
- Read `.claude/reference/patterns.md` for governance tables
- Read `.claude/reference/proxmox-inventory.md` for VM details
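The Prometheus pattern in Mode 2, step 2 also answers the Mode 1 usage-vs-limit question. A minimal sketch; the PromQL expression is illustrative:

```bash
KUBECTL="kubectl --kubeconfig /Users/viktorbarzin/code/infra/config"
# Containers using >90% of their memory limit (cAdvisor metrics)
QUERY='container_memory_working_set_bytes / container_spec_memory_limit_bytes > 0.9'
ENCODED=$(python3 -c 'import sys, urllib.parse; print(urllib.parse.quote(sys.argv[1]))' "$QUERY")
$KUBECTL exec deploy/prometheus-server -n monitoring -- \
  wget -qO- "http://localhost:9090/api/v1/query?query=$ENCODED"
```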
dot_claude/executable_openclaw-install.sh (new file, 99 lines)

@@ -0,0 +1,99 @@
#!/bin/bash
# Install Claude Code config for OpenClaw from the dotfiles repo.
#
# Usage:
#   # First time (clone + install):
#   curl -fsSL https://raw.githubusercontent.com/ViktorBarzin/dot_files/master/dot_claude/executable_openclaw-install.sh | bash
#
#   # Update (pull + reinstall):
#   ~/.openclaw/dotfiles/dot_claude/executable_openclaw-install.sh
#
# Environment:
#   OPENCLAW_HOME - OpenClaw home directory (default: /home/node/.openclaw or ~/.openclaw)
#   DOTFILES_REPO - Git repo URL (default: https://github.com/ViktorBarzin/dot_files.git)
#   DOTFILES_DIR  - Where to clone the repo (default: $OPENCLAW_HOME/dotfiles)

set -euo pipefail

log() { echo "[openclaw-install] $*"; }

# Detect environment
if [ -d "/home/node/.openclaw" ]; then
  OPENCLAW_HOME="${OPENCLAW_HOME:-/home/node/.openclaw}"
elif [ -d "$HOME/.openclaw" ]; then
  OPENCLAW_HOME="${OPENCLAW_HOME:-$HOME/.openclaw}"
else
  OPENCLAW_HOME="${OPENCLAW_HOME:-$HOME/.claude}"
fi

DOTFILES_REPO="${DOTFILES_REPO:-https://github.com/ViktorBarzin/dot_files.git}"
DOTFILES_DIR="${DOTFILES_DIR:-$OPENCLAW_HOME/dotfiles}"
SRC="$DOTFILES_DIR/dot_claude"

log "OPENCLAW_HOME=$OPENCLAW_HOME"
log "DOTFILES_DIR=$DOTFILES_DIR"

# Clone or pull
if [ -d "$DOTFILES_DIR/.git" ]; then
  log "Pulling latest dotfiles..."
  git -C "$DOTFILES_DIR" pull --ff-only 2>/dev/null || git -C "$DOTFILES_DIR" pull --rebase || true
else
  log "Cloning dotfiles..."
  git clone --depth 1 "$DOTFILES_REPO" "$DOTFILES_DIR"
fi

# Install skills
if [ -d "$SRC/skills" ]; then
  mkdir -p "$OPENCLAW_HOME/skills"
  rsync -a --delete "$SRC/skills/" "$OPENCLAW_HOME/skills/"
  log "Installed $(ls "$OPENCLAW_HOME/skills/" | wc -l | tr -d ' ') skills"
fi

# Install agents
if [ -d "$SRC/agents" ]; then
  mkdir -p "$OPENCLAW_HOME/agents"
  rsync -a --delete "$SRC/agents/" "$OPENCLAW_HOME/agents/"
  log "Installed $(ls "$OPENCLAW_HOME/agents/" | wc -l | tr -d ' ') agents"
fi

# Install hooks (skip executable_ prefix renaming — OpenClaw doesn't use chezmoi)
if [ -d "$SRC/hooks" ]; then
  mkdir -p "$OPENCLAW_HOME/hooks"
  for f in "$SRC/hooks/"*; do
    base=$(basename "$f")
    # Strip chezmoi executable_ prefix if present
    dest="${base#executable_}"
    cp "$f" "$OPENCLAW_HOME/hooks/$dest"
    chmod +x "$OPENCLAW_HOME/hooks/$dest" 2>/dev/null || true
  done
  log "Installed $(ls "$OPENCLAW_HOME/hooks/" | wc -l | tr -d ' ') hooks"
fi

# Install commands
if [ -d "$SRC/commands" ]; then
  mkdir -p "$OPENCLAW_HOME/commands"
  rsync -a --delete "$SRC/commands/" "$OPENCLAW_HOME/commands/"
  log "Installed commands"
fi

# Install CLAUDE.md (global knowledge)
if [ -f "$SRC/CLAUDE.md" ]; then
  cp "$SRC/CLAUDE.md" "$OPENCLAW_HOME/CLAUDE.md"
  log "Installed CLAUDE.md"
fi

# Install settings (render template: replace {{HOME}} and {{CLAUDE_DIR}} with actual paths)
if [ -f "$SRC/settings.json" ]; then
  sed -e "s|{{CLAUDE_DIR}}|$OPENCLAW_HOME|g" \
      -e "s|{{HOME}}|$(dirname "$OPENCLAW_HOME")|g" \
      "$SRC/settings.json" > "$OPENCLAW_HOME/settings.json"
  log "Installed settings.json (templated)"
fi

# Fix ownership if running as root (init container)
if [ "$(id -u)" = "0" ]; then
  chown -R 1000:1000 "$OPENCLAW_HOME" 2>/dev/null || true
  log "Fixed ownership to UID 1000"
fi

log "Done. Installed to $OPENCLAW_HOME"
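The settings step substitutes literal `{{HOME}}` and `{{CLAUDE_DIR}}` markers with `sed` rather than running chezmoi's template engine. A quick sketch of what that pass produces, with an illustrative input (the real key names live in `dot_claude/settings.json`):

```bash
printf '{"hooksDir": "{{CLAUDE_DIR}}/hooks", "home": "{{HOME}}"}\n' > /tmp/settings.json
OPENCLAW_HOME=/home/node/.openclaw
sed -e "s|{{CLAUDE_DIR}}|$OPENCLAW_HOME|g" \
    -e "s|{{HOME}}|$(dirname "$OPENCLAW_HOME")|g" /tmp/settings.json
# -> {"hooksDir": "/home/node/.openclaw/hooks", "home": "/home/node"}
```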
dot_claude/skills/chromedp-alpine-container/SKILL.md (new file, 102 lines)

@@ -0,0 +1,102 @@
---
name: chromedp-alpine-container
description: |
  Fix Chrome/Chromium startup failures in Alpine Linux containers when using chromedp
  (or similar CDP tools). Use when: (1) chromedp fails with "websocket url timeout reached",
  (2) Chrome crashes with "ZINK: vkCreateInstance failed" or "eglInitialize SwANGLE failed"
  or "glx: failed to create drisw screen", (3) running Chrome non-headless on Xvfb in
  Alpine containers, (4) Chrome starts but DevTools connection times out. Root causes:
  missing mesa software GL drivers, missing dbus, and chromedp's default WSURLReadTimeout
  being too short for containers with GL fallback overhead.
author: Claude Code
version: 1.0.0
date: 2026-02-21
---

# Chrome/Chromedp in Alpine Containers

## Problem
Chrome/Chromium fails to start or chromedp times out connecting to DevTools when running
in Alpine Linux containers, especially when running non-headless on Xvfb for screen capture.

## Context / Trigger Conditions
- `websocket url timeout reached` from chromedp
- `MESA: error: ZINK: vkCreateInstance failed (VK_ERROR_INCOMPATIBLE_DRIVER)`
- `glx: failed to create drisw screen`
- `eglInitialize SwANGLE failed with error EGL_NOT_INITIALIZED`
- `Initialization of all EGL display types failed`
- `Failed to connect to the bus: Failed to connect to socket /var/run/dbus/system_bus_socket`
- Chrome works in headless mode but fails non-headless on Xvfb

## Solution

### 1. Install required Alpine packages

```dockerfile
RUN apk add --no-cache \
    chromium nss freetype harfbuzz ttf-freefont \
    mesa-dri-gallium mesa-gl \
    dbus \
    xvfb-run xorg-server
```

Key packages:
- `mesa-dri-gallium` — software GL rasterizer (llvmpipe/softpipe) Chrome needs
- `mesa-gl` — OpenGL library
- `dbus` — Chrome queries dbus for accessibility/services; without it, startup is slow

### 2. Start dbus before Chrome

```go
exec.Command("mkdir", "-p", "/var/run/dbus").Run()
exec.Command("dbus-daemon", "--system", "--nofork").Start()
```

### 3. Increase chromedp WSURLReadTimeout

Chrome takes longer to start in containers due to GL fallback attempts. The default
chromedp timeout is often too short:

```go
opts := append(chromedp.DefaultExecAllocatorOptions[:],
    chromedp.Flag("headless", false),
    chromedp.Flag("no-sandbox", true),
    chromedp.Flag("disable-gpu", true),
    chromedp.Flag("disable-software-rasterizer", true),
    chromedp.Flag("disable-dev-shm-usage", true),
    chromedp.WSURLReadTimeout(30 * time.Second), // default is too short
)
```

### 4. Required Chrome flags for containers

```
--no-sandbox                    # Required when running as root
--disable-gpu                   # No hardware GPU available
--disable-software-rasterizer   # Avoid SwANGLE failures
--disable-dev-shm-usage         # /dev/shm is only 64MB in k8s by default
```

## Verification

Test Chrome starts and DevTools listens:

```sh
Xvfb :50 -screen 0 1280x720x24 -ac -nolisten tcp &
sleep 2
DISPLAY=:50 chromium-browser --no-sandbox --disable-gpu \
  --disable-software-rasterizer --remote-debugging-port=9222 about:blank 2>&1
# Should see: DevTools listening on ws://127.0.0.1:9222/devtools/browser/...
```

## Notes
- GL errors like `ZINK: vkCreateInstance failed` are warnings, not fatal — Chrome
  still runs after fallback, but fallback takes time (causing the timeout)
- `--disable-gpu` alone is NOT sufficient — Chrome still tries to initialize GL
  for compositing even with GPU disabled
- The dbus errors are non-fatal but cause Chrome to retry connections repeatedly,
  slowing startup
- Default k8s `/dev/shm` is 64MB; use `--disable-dev-shm-usage` or mount a larger
  emptyDir at `/dev/shm`
- `chromedp.Flag("headless", false)` removes the `--headless` flag that
  `DefaultExecAllocatorOptions` includes by default
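Steps 1-2 plus the Xvfb setup usually end up in a container entrypoint. A minimal sketch, assuming an `/app/server` binary and display `:50` (both illustrative); the Go snippet in step 2 performs the same dbus setup in-process:

```sh
#!/bin/sh
# Entrypoint sketch: dbus and Xvfb must be up before the chromedp binary starts
mkdir -p /var/run/dbus
dbus-daemon --system --fork
Xvfb :50 -screen 0 1280x720x24 -ac -nolisten tcp &
sleep 2   # crude readiness wait; polling xdpyinfo would be more robust
DISPLAY=:50 exec /app/server
```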
dot_claude/skills/claude-memory-api/SKILL.md (new file, 47 lines)

@@ -0,0 +1,47 @@
---
name: claude-memory-api
description: Store and recall persistent memories using the memory-tool CLI. Use when the user asks to remember something, recall a previous memory, or when you want to persist knowledge across sessions.
---

# Claude Memory API

You have access to a persistent memory system via the `memory-tool` CLI command.

## When to Use

- User says "remember this", "save this", "note that..."
- User asks "do you remember...", "what do you know about...", "recall..."
- You discover important facts worth persisting (user preferences, project patterns, debugging insights)
- You need to check if you already know something before asking the user

## Commands

### Store a memory
```bash
memory-tool store "content to remember" --category <category> --tags "tag1,tag2"
```
Categories: `facts`, `preferences`, `patterns`, `debugging`, `architecture`

### Recall memories (semantic search)
```bash
memory-tool recall "search query"
```

### List all memories
```bash
memory-tool list
memory-tool list --category facts
```

### Delete a memory
```bash
memory-tool delete <memory-id>
```

## Guidelines

- Always `recall` before storing to avoid duplicates
- Use specific, descriptive content — memories should be self-contained
- Choose the most relevant category
- Add tags for better recall later
- When the user says "remember X", store it immediately and confirm
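The recall-before-store guideline in practice, using only the flags the skill documents; the content strings are illustrative:

```bash
memory-tool recall "preferred kubeconfig path"
# ...if nothing relevant comes back, store it and confirm to the user:
memory-tool store "kubeconfig lives at /Users/viktorbarzin/code/infra/config" \
  --category facts --tags "kubernetes,infra"
```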
dot_claude/skills/openclaw-custom-model-provider/SKILL.md (new file, 155 lines)

@@ -0,0 +1,155 @@
---
name: openclaw-custom-model-provider
description: |
  Configure custom model providers in OpenClaw (openclaw.ai). Use when:
  (1) adding a new LLM provider (Llama API, LM Studio, custom proxy) to OpenClaw,
  (2) changing the default model in OpenClaw, (3) enabling/disabling tools and
  commands in OpenClaw, (4) user mentions openclaw.json or openclaw configuration.
  Covers the models.providers JSON structure, agent defaults, and tool permissions.
author: Claude Code
version: 1.0.0
date: 2026-02-16
---

# OpenClaw Custom Model Provider Configuration

## Problem
OpenClaw supports custom OpenAI-compatible model providers, but the configuration
structure requires checking multiple documentation pages to assemble correctly.

## Context / Trigger Conditions
- User wants to add a new LLM provider to OpenClaw
- User has an API key for Llama API, OpenRouter, LM Studio, or another OpenAI-compatible service
- User wants to change the default model OpenClaw uses
- User wants to enable all tools/commands (remove denyCommands restrictions)

## Solution

### Config File Location
`~/.openclaw/openclaw.json`

### Adding a Custom Provider

Add to the `models.providers` object:

```json
{
  "models": {
    "mode": "merge",
    "providers": {
      "my-provider": {
        "baseUrl": "https://api.example.com/compat/v1",
        "apiKey": "YOUR_API_KEY",
        "api": "openai-completions",
        "models": [
          {
            "id": "model-id",
            "name": "Display Name",
            "reasoning": false,
            "input": ["text"],
            "cost": {
              "input": 0,
              "output": 0,
              "cacheRead": 0,
              "cacheWrite": 0
            },
            "contextWindow": 200000,
            "maxTokens": 8192
          }
        ]
      }
    }
  }
}
```

**Key fields:**
- `api`: Protocol — `"openai-completions"` | `"openai-responses"` | `"anthropic-messages"` | `"google-generative-ai"`
- `mode`: `"merge"` (default, keeps built-in providers) or `"replace"` (only custom)
- `cost`: Set all to `0` for free/self-hosted models
- Model reference format: `provider-name/model-id` (e.g., `llama-as-openai/Llama-4-Maverick-17B-128E-Instruct-FP8`)

### Setting Default Model

```json
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "my-provider/model-id",
        "fallbacks": ["ollama/local-model"]
      },
      "models": {
        "my-provider/model-id": {},
        "ollama/local-model": {}
      }
    }
  }
}
```

### Enabling All Tools/Commands

To remove tool restrictions:

```json
{
  "commands": {
    "native": true,
    "nativeSkills": true
  },
  "gateway": {
    "nodes": {
      "denyCommands": []
    }
  }
}
```

Default `denyCommands` blocks: `camera.snap`, `camera.clip`, `screen.record`,
`calendar.add`, `contacts.add`, `reminders.add`.

### Common Provider Examples

**Llama API:**
```json
"llama-as-openai": {
  "baseUrl": "https://api.llama.com/compat/v1",
  "apiKey": "LLM|...",
  "api": "openai-completions"
}
```

**Local Ollama:**
```json
"ollama": {
  "baseUrl": "http://127.0.0.1:11434/v1",
  "apiKey": "none",
  "api": "openai-completions"
}
```

**LM Studio:**
```json
"lmstudio": {
  "baseUrl": "http://127.0.0.1:1234/v1",
  "apiKey": "lmstudio",
  "api": "openai-responses"
}
```

## Verification
- Restart OpenClaw after config changes
- Run `openclaw` and check that the new model appears in model selection
- Send a test message to verify the provider responds

## Notes
- `mode: "merge"` is the default and recommended — it keeps built-in providers alongside custom ones
- Optional fields: `authHeader` (boolean), `headers` (object for custom HTTP headers)
- Set `reasoning: true` for models that support chain-of-thought (e.g., DeepSeek R1)
- OpenClaw docs: https://docs.openclaw.ai/gateway/configuration-reference.md

## References
- [OpenClaw Configuration Reference](https://docs.openclaw.ai/gateway/configuration-reference.md)
- [OpenClaw Configuration Examples](https://docs.openclaw.ai/gateway/configuration-examples.md)
- [OpenClaw Model Providers](https://docs.openclaw.ai/concepts/model-providers.md)
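Before restarting OpenClaw per the Verification steps, the edited JSON can be sanity-checked. A minimal sketch using `jq`, an external tool rather than anything OpenClaw ships:

```bash
jq '.models.providers | keys' ~/.openclaw/openclaw.json          # provider names parse
jq -e '.agents.defaults.model.primary' ~/.openclaw/openclaw.json \
  || echo "no default model set"
```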
dot_claude/skills/webrtc-turn-shared-secret/SKILL.md (new file, 105 lines)

@@ -0,0 +1,105 @@
---
name: webrtc-turn-shared-secret
description: |
  Generate ephemeral TURN credentials from a shared secret for coturn (--use-auth-secret mode).
  Use when: (1) WebRTC ICE connection state goes to "failed" or stays at "checking",
  (2) STUN-only config can't establish media path through NAT/k8s,
  (3) coturn is configured with --use-auth-secret and you need time-limited credentials,
  (4) need to pass TURN credentials to both server-side (pion/webrtc) and client-side
  (browser RTCPeerConnection). Covers credential generation, Go implementation, and
  client-side WebRTC configuration.
author: Claude Code
version: 1.0.0
date: 2026-02-21
---

# WebRTC TURN Server with Shared Secret Credentials

## Problem
WebRTC connections fail with `ICE connection state: failed` when peers are behind NAT
(especially in Kubernetes pods). STUN alone can't establish a media path through
symmetric NAT. A TURN server is needed, and coturn's shared secret mode requires
generating ephemeral credentials.

## Context / Trigger Conditions
- `webrtc: ICE connection state: failed` in server logs
- `ICE connection state: failed` in browser console
- WebRTC signaling (offer/answer) succeeds but no media flows
- Server is in a k8s pod with private IP, client is behind NAT
- coturn configured with `--use-auth-secret` or `use-auth-secret` in turnserver.conf

## Solution

### Credential Generation (TURN REST API)

```
username = Unix timestamp of expiry (e.g., "1740200000")
password = Base64(HMAC-SHA1(username, shared_secret))
```

### Go Implementation

```go
import (
    "crypto/hmac"
    "crypto/sha1"
    "encoding/base64"
    "fmt"
    "time"
)

func GenerateTURNCredentials(turnURL, sharedSecret string, ttl time.Duration) (urls []string, username, credential string) {
    expiry := time.Now().Add(ttl).Unix()
    username = fmt.Sprintf("%d", expiry)
    mac := hmac.New(sha1.New, []byte(sharedSecret))
    mac.Write([]byte(username))
    credential = base64.StdEncoding.EncodeToString(mac.Sum(nil))
    return []string{turnURL}, username, credential
}
```

### Server-side (pion/webrtc)

```go
iceServers := []webrtc.ICEServer{
    {URLs: []string{"stun:stun.l.google.com:19302"}},
    {
        URLs:           []string{"turn:your-turn-server:3478"},
        Username:       username,
        Credential:     credential,
        CredentialType: webrtc.ICECredentialTypePassword,
    },
}
pc, _ := webrtc.NewPeerConnection(webrtc.Configuration{ICEServers: iceServers})
```

### Client-side (browser)

Send ICE config from server to client via signaling channel (WebSocket),
then create RTCPeerConnection with it:

```javascript
// Server sends: { type: "iceServers", iceServers: [...] }
socket.onmessage = (e) => {
  const msg = JSON.parse(e.data);
  if (msg.type === 'iceServers') {
    pc = new RTCPeerConnection({ iceServers: msg.iceServers });
  }
};
```

## Verification

1. Server logs should show `ICE connection state: connected` (not `failed`)
2. Browser console should show `ICE connection state: connected`
3. Test TURN connectivity: `turnutils_uclient -u username -w credential turn-server-ip`

## Notes
- Both server and client need the TURN credentials — the server uses them for its
  PeerConnection, and the client needs them for its RTCPeerConnection
- Credentials are time-limited (TTL); generate fresh ones per session
- If TURN server hostname doesn't resolve from k8s pods (CoreDNS custom zones),
  use the IP address directly: `turn:1.2.3.4:3478`
- STUN is still useful as a fallback for direct connections; keep it in the ICE
  servers list alongside TURN
- The shared secret must match coturn's `static-auth-secret` config
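The credential scheme above is coturn's TURN REST API convention, so it can be reproduced in shell for ad-hoc testing, equivalent to the Go implementation; the secret and TTL here are illustrative:

```bash
SECRET="my-shared-secret"            # must match coturn's static-auth-secret
USERNAME=$(( $(date +%s) + 3600 ))   # username = expiry timestamp (1h TTL)
CREDENTIAL=$(printf '%s' "$USERNAME" | openssl dgst -sha1 -hmac "$SECRET" -binary | base64)
echo "username=$USERNAME credential=$CREDENTIAL"
# Feed these to turnutils_uclient (see Verification) to test the TURN server
```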