2026-03-07 14:30:36 +00:00
variable "tls_secret_name" {
2026-03-14 08:51:45 +00:00
  type = string
2026-03-07 14:30:36 +00:00
  sensitive = true
}
2026-03-14 17:15:48 +00:00
variable "nfs_server" { type = string }
data "vault_kv_secret_v2" "secrets" {
  mount = "secret"
  name = "openclaw"
2026-03-07 21:09:31 +00:00
}
2026-03-14 17:15:48 +00:00
locals {
  skill_secrets = jsondecode(data.vault_kv_secret_v2.secrets.data["skill_secrets"])
2026-03-14 16:01:41 +00:00
}
2026-02-22 13:56:34 +00:00
2026-02-22 15:13:55 +00:00
resource "kubernetes_namespace" "openclaw" {
  metadata {
    name = "openclaw"
    labels = {
2026-03-15 16:04:02 +00:00
      tier = local.tiers.aux
      "resource-governance/custom-limitrange" = "true"
      "resource-governance/custom-quota" = "true"
2026-05-16 14:01:46 +00:00
      "keel.sh/enrolled" = "true"
2026-02-22 15:13:55 +00:00
    }
  }
[infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip]
## Context
Wave 3B-continued: the Goldilocks VPA dashboard (stacks/vpa) runs a Kyverno
ClusterPolicy `goldilocks-vpa-auto-mode` that mutates every namespace with
`metadata.labels["goldilocks.fairwinds.com/vpa-update-mode"] = "off"`. This
is intentional — Terraform owns container resource limits, and Goldilocks
should only provide recommendations, never auto-update. The label is how
Goldilocks decides per-namespace whether to run its VPA in `off` mode.
Effect on Terraform: every `kubernetes_namespace` resource shows the label
as pending-removal (`-> null`) on every `scripts/tg plan`. Dawarich survey
2026-04-18 confirmed the drift. Cluster-side count: 88 namespaces carry the
label (`kubectl get ns -o json | jq ... | wc -l`). Every TF-managed namespace
is affected.
This commit brings the intentional admission drift under the same
`# KYVERNO_LIFECYCLE_V1` discoverability marker introduced in c9d221d5 for
the ndots dns_config pattern. The marker now stands generically for any
Kyverno admission-webhook drift suppression; the inline comment records
which specific policy stamps which specific field so future grep audits
show why each suppression exists.
## This change
107 `.tf` files touched — every stack's `resource "kubernetes_namespace"`
resource gets:
```hcl
lifecycle {
# KYVERNO_LIFECYCLE_V1: goldilocks-vpa-auto-mode ClusterPolicy stamps this label on every namespace
ignore_changes = [metadata[0].labels["goldilocks.fairwinds.com/vpa-update-mode"]]
}
```
Injection was done with a brace-depth-tracking Python pass (`/tmp/add_goldilocks_ignore.py`):
match `^resource "kubernetes_namespace" ` → track `{` / `}` until the
outermost closing brace → insert the lifecycle block before the closing
brace. The script is idempotent (skips any file that already mentions
`goldilocks.fairwinds.com/vpa-update-mode`) so re-running is safe.
Vault stack picked up 2 namespaces in the same file (k8s-users produces
one, plus a second explicit ns) — confirmed via file diff (+8 lines).
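The original `/tmp/add_goldilocks_ignore.py` is not part of this commit, so the following is only a minimal sketch of the brace-depth pass described above. The function name `inject` and the constants are hypothetical, and the sketch assumes multi-line resource blocks with no braces inside strings or comments.

```python
import re

# Marker used for the idempotence check (skip files that already mention the label).
MARKER = "goldilocks.fairwinds.com/vpa-update-mode"

# Block inserted before the outermost closing brace of each namespace resource.
LIFECYCLE = [
    "  lifecycle {",
    "    # KYVERNO_LIFECYCLE_V1: goldilocks-vpa-auto-mode ClusterPolicy stamps this label on every namespace",
    '    ignore_changes = [metadata[0].labels["goldilocks.fairwinds.com/vpa-update-mode"]]',
    "  }",
]

RESOURCE_RE = re.compile(r'^resource "kubernetes_namespace" ')


def inject(text: str) -> str:
    """Insert the lifecycle block into every kubernetes_namespace resource."""
    if MARKER in text:
        return text  # already processed, so re-running is a no-op
    out = []
    depth = 0
    inside = False
    for line in text.splitlines():
        if not inside and RESOURCE_RE.match(line):
            inside = True
            depth = 0
        if inside:
            # Naive counting: assumes braces never appear in strings or comments.
            depth += line.count("{") - line.count("}")
            if depth == 0:
                out.extend(LIFECYCLE)  # goes just before the closing brace
                inside = False
        out.append(line)
    return "\n".join(out) + ("\n" if text.endswith("\n") else "")
```

Because the idempotence check runs once per file rather than once per resource, a file that declares two namespaces receives two lifecycle blocks on the first run, which is consistent with the Vault stack note above.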
## What is NOT in this change
- `stacks/trading-bot/main.tf` — entire file is `/* … */` commented out
(paused 2026-04-06 per user decision). Reverted after the script ran.
- `stacks/_template/main.tf.example` — per-stack skeleton, intentionally
minimal. User keeps it that way. Not touched by the script (file
has no real `resource "kubernetes_namespace"` — only a placeholder
comment).
- `.terraform/` copies (e.g. `stacks/metallb/.terraform/modules/...`) —
gitignored, won't commit; the live path was edited.
- `terraform fmt` cleanup of adjacent pre-existing alignment issues in
authentik, freedify, hermes-agent, nvidia, vault, meshcentral. Reverted
to keep the commit scoped to the Goldilocks sweep. Those files will
need a separate fmt-only commit or will be cleaned up on next real
apply to that stack.
## Verification
Dawarich (one of the hundred-plus touched stacks) showed the pattern
before and after:
```
$ cd stacks/dawarich && ../../scripts/tg plan
Before:
Plan: 0 to add, 2 to change, 0 to destroy.
# kubernetes_namespace.dawarich will be updated in-place
(goldilocks.fairwinds.com/vpa-update-mode -> null)
# module.tls_secret.kubernetes_secret.tls_secret will be updated in-place
(Kyverno generate.* labels — fixed in 8d94688d)
After:
No changes. Your infrastructure matches the configuration.
```
Injection count check:
```
$ rg -c 'KYVERNO_LIFECYCLE_V1: goldilocks-vpa-auto-mode' stacks/ | awk -F: '{s+=$2} END {print s}'
108
```
## Reproduce locally
1. `git pull`
2. Pick any stack: `cd stacks/<name> && ../../scripts/tg plan`
3. Expect: no drift on the namespace's goldilocks.fairwinds.com/vpa-update-mode label.
Closes: code-dwx
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 21:15:27 +00:00
  lifecycle {
    # KYVERNO_LIFECYCLE_V1: goldilocks-vpa-auto-mode ClusterPolicy stamps this label on every namespace
    ignore_changes = [metadata[0].labels["goldilocks.fairwinds.com/vpa-update-mode"]]
  }
2026-02-22 15:13:55 +00:00
}
module "tls_secret" {
  source = "../../modules/kubernetes/setup_tls_secret"
  namespace = kubernetes_namespace.openclaw.metadata[0].name
  tls_secret_name = var.tls_secret_name
}
migrate consuming stacks to ESO + remove k8s-dashboard static token
Phase 9: ExternalSecret migration across 26 stacks:
Fully migrated (vault data source removed, ESO delivers secrets):
- speedtest, shadowsocks, wealthfolio, plotting-book, f1-stream, tandoor
- n8n, dawarich, diun, netbox, onlyoffice, tuya-bridge
- hackmd (ESO template for DB URL), health (ESO template for DB URL)
- trading-bot (ESO template for DATABASE_URL + 7 secret env vars)
- forgejo (removed unused vault data source)
Partially migrated (vault kept for plan-time, ESO added for runtime):
- immich, linkwarden, nextcloud, paperless-ngx (jsondecode for homepage)
- claude-memory, rybbit, url, webhook_handler (plan-time in locals/jobs)
- woodpecker, openclaw, resume (plan-time in helm values/jobs/modules)
17 stacks unchanged (all plan-time: homepage annotations, configmaps,
module inputs) — vault data source works with OIDC auth.
Phase 17a: Remove k8s-dashboard static admin token secret.
Users now get tokens via: vault write kubernetes/creds/dashboard-admin
2026-03-15 19:05:04 +00:00
resource "kubernetes_manifest" "external_secret" {
  manifest = {
    apiVersion = "external-secrets.io/v1beta1"
    kind = "ExternalSecret"
    metadata = {
      name = "openclaw-secrets"
      namespace = "openclaw"
    }
    spec = {
      refreshInterval = "15m"
      secretStoreRef = {
        name = "vault-kv"
        kind = "ClusterSecretStore"
      }
      target = {
        name = "openclaw-secrets"
      }
      dataFrom = [{
        extract = {
          key = "openclaw"
        }
      }]
    }
  }
  depends_on = [kubernetes_namespace.openclaw]
}
2026-02-22 15:13:55 +00:00
resource "kubernetes_service_account" "openclaw" {
  metadata {
    name = "openclaw"
    namespace = kubernetes_namespace.openclaw.metadata[0].name
  }
}
resource "kubernetes_cluster_role_binding" "openclaw" {
  metadata {
    name = "openclaw-cluster-admin"
  }
  subject {
    kind = "ServiceAccount"
    name = kubernetes_service_account.openclaw.metadata[0].name
    namespace = kubernetes_namespace.openclaw.metadata[0].name
  }
  role_ref {
    api_group = "rbac.authorization.k8s.io"
    kind = "ClusterRole"
    name = "cluster-admin"
  }
}
resource "kubernetes_secret" "ssh_key" {
  metadata {
    name = "ssh-key"
    namespace = kubernetes_namespace.openclaw.metadata[0].name
  }
  data = {
2026-03-14 17:15:48 +00:00
    "id_rsa" = data.vault_kv_secret_v2.secrets.data["ssh_key"]
2026-02-22 15:13:55 +00:00
  }
  type = "generic"
}
resource "kubernetes_config_map" "git_crypt_key" {
  metadata {
    name = "git-crypt-key"
    namespace = kubernetes_namespace.openclaw.metadata[0].name
  }
  data = {
    "key" = filebase64("${path.root}/../../.git/git-crypt/keys/default")
  }
}
resource "kubernetes_config_map" "openclaw_config" {
  metadata {
    name = "openclaw-config"
    namespace = kubernetes_namespace.openclaw.metadata[0].name
  }
  data = {
    "openclaw.json" = jsonencode({
      gateway = {
2026-03-01 15:47:54 +00:00
        mode = "local"
2026-02-22 15:13:55 +00:00
        bind = "lan"
        trustedProxies = ["10.0.0.0/8"]
        controlUi = {
2026-03-01 15:47:54 +00:00
          dangerouslyDisableDeviceAuth = true
          dangerouslyAllowHostHeaderOriginFallback = true
2026-02-22 15:13:55 +00:00
        }
      }
      agents = {
        defaults = {
          contextTokens = 1000000
          bootstrapMaxChars = 30000
2026-03-19 23:32:28 +00:00
          workspace = "/workspace"
2026-03-01 16:51:35 +00:00
          sandbox = {
            mode = "off"
          }
2026-02-22 15:13:55 +00:00
          model = {
2026-05-22 15:23:17 +00:00
# 2026-05-22: switched primary to nim/meta/llama-3.1-70b-instruct.
# Verified end-to-end with tool calls (sub-second responses,
# proper tool_calls in API response). Auth audit on this date:
# - openai-codex OAuth: EXPIRED (ancaelena98@gmail.com,
# ChatGPT Plus). Re-auth requires interactive TTY:
# kubectl -n openclaw exec -it $(kubectl -n openclaw \
# get pods -l app=openclaw -o jsonpath='{.items[0].metadata.name}') \
# -c openclaw -- node /app/openclaw.mjs models auth \
# login --provider openai-codex
# - secret/openclaw → openai_api_key (sk-svcacct…):
# insufficient_quota (billing exhausted)
# - openrouter_api_key: "Key limit exceeded"
# - llama_api_key: region-blocked
# - anthropic_api_key: sk-ant-oat-… (OAuth refresh token,
# NOT a real x-api-key — won't auth)
# - nvidia_api_key: WORKS. nim/meta/llama-3.1-70b-instruct
# and nim/meta/llama-4-maverick-17b-128e-instruct both
# tool-call reliably.
# Keep codex as a fallback so it auto-promotes once
# re-authed; modelrelay last because it routes to a
# small model that hallucinates instead of tool-calling.
            primary = "nim/meta/llama-3.1-70b-instruct"
            fallbacks = ["nim/meta/llama-4-maverick-17b-128e-instruct", "openai-codex/gpt-5.4-mini", "modelrelay/auto-fastest"]
2026-02-22 15:13:55 +00:00
          }
          models = {
2026-03-01 15:57:31 +00:00
            "modelrelay/auto-fastest" = {}
2026-03-01 13:22:47 +00:00
            "nim/deepseek-ai/deepseek-v3.2" = {}
            "nim/qwen/qwen3.5-397b-a17b" = {}
            "nim/mistralai/mistral-large-3-675b-instruct-2512" = {}
            "nim/qwen/qwen3-coder-480b-a35b-instruct" = {}
            "nim/nvidia/llama-3.1-nemotron-ultra-253b-v1" = {}
            "nim/z-ai/glm5" = {}
2026-05-22 15:23:17 +00:00
            "nim/meta/llama-3.1-70b-instruct" = {}
            "nim/meta/llama-4-maverick-17b-128e-instruct" = {}
2026-03-01 13:22:47 +00:00
            "llama-as-openai/Llama-4-Maverick-17B-128E-Instruct-FP8" = {}
            "llama-as-openai/Llama-4-Scout-17B-16E-Instruct-FP8" = {}
            "openrouter/stepfun/step-3.5-flash:free" = {}
            "openrouter/arcee-ai/trinity-large-preview:free" = {}
2026-05-06 22:06:32 +00:00
            "openai-codex/gpt-5.4-mini" = {}
            "openai-codex/gpt-5.5" = {}
2026-02-22 15:13:55 +00:00
          }
        }
      }
      tools = {
        profile = "full"
2026-03-01 15:58:12 +00:00
        deny = []
2026-03-01 17:12:03 +00:00
        elevated = {
          enabled = true
        }
2026-02-22 15:13:55 +00:00
        exec = {
2026-03-01 16:47:14 +00:00
          host = "gateway"
2026-03-01 16:42:22 +00:00
          security = "full"
2026-02-22 15:13:55 +00:00
          ask = "off"
2026-03-19 23:32:28 +00:00
          pathPrepend = ["/tools", "/workspace"]
2026-02-22 15:13:55 +00:00
        }
        web = {
          search = {
            enabled = true
            provider = "brave"
2026-03-14 17:15:48 +00:00
            apiKey = data.vault_kv_secret_v2.secrets.data["brave_api_key"]
2026-02-22 15:13:55 +00:00
            maxResults = 5
          }
          fetch = {
            enabled = true
            maxChars = 50000
            timeoutSeconds = 30
          }
        }
      }
2026-03-14 16:01:41 +00:00
      plugins = {
2026-05-20 21:56:11 +00:00
        allow = ["memory-core", "recruiter-api"]
2026-03-15 02:39:14 +00:00
        slots = { memory = "memory-core" }
2026-03-14 16:01:41 +00:00
        load = {
2026-05-16 14:01:46 +00:00
# /app/extensions is the legacy bundled-plugins path; OpenClaw
# already loads bundled plugins natively (doctor warning).
          paths = ["/home/node/.openclaw/extensions"]
2026-03-14 16:01:41 +00:00
        }
      }
2026-05-16 14:01:46 +00:00
# Note: mcp.servers is configured via `openclaw mcp set` in the main
# container startup command (see below) rather than in this ConfigMap.
# OpenClaw's `doctor --fix` (which runs on every pod start) strips
# bulk-loaded mcp blocks from openclaw.json, but preserves CLI-set
# entries. The CLI is the canonical writer.
2026-03-01 17:12:03 +00:00
      commands = {
        native = true
        nativeSkills = true
      }
2026-03-01 13:44:58 +00:00
      channels = {
        telegram = {
2026-03-01 15:47:54 +00:00
          enabled = true
2026-03-14 17:15:48 +00:00
          botToken = data.vault_kv_secret_v2.secrets.data["telegram_bot_token"]
2026-03-01 15:47:54 +00:00
          dmPolicy = "allowlist"
          allowFrom = ["tg:8281953845"]
          groupPolicy = "allowlist"
          streamMode = "partial"
2026-03-01 13:44:58 +00:00
        }
      }
2026-02-22 15:13:55 +00:00
      models = {
        mode = "merge"
        providers = {
2026-03-01 15:57:31 +00:00
          modelrelay = {
            baseUrl = "http://127.0.0.1:7352/v1"
            api = "openai-completions"
            apiKey = "modelrelay"
            models = [
              { id = "auto-fastest", name = "Auto (Fastest)", reasoning = false, input = ["text"], contextWindow = 200000, maxTokens = 16384, cost = { input = 0, output = 0, cacheRead = 0, cacheWrite = 0 } },
            ]
          }
2026-03-01 13:22:47 +00:00
          nim = {
            baseUrl = "https://integrate.api.nvidia.com/v1"
2026-02-22 15:13:55 +00:00
            api = "openai-completions"
2026-03-14 17:15:48 +00:00
            apiKey = data.vault_kv_secret_v2.secrets.data["nvidia_api_key"]
2026-02-22 15:13:55 +00:00
            models = [
2026-03-01 13:22:47 +00:00
              { id = "deepseek-ai/deepseek-v3.2", name = "DeepSeek V3.2", reasoning = false, input = ["text"], contextWindow = 164000, maxTokens = 16384, cost = { input = 0, output = 0, cacheRead = 0, cacheWrite = 0 } },
              { id = "qwen/qwen3.5-397b-a17b", name = "Qwen 3.5", reasoning = true, input = ["text"], contextWindow = 262000, maxTokens = 16384, cost = { input = 0, output = 0, cacheRead = 0, cacheWrite = 0 } },
              { id = "mistralai/mistral-large-3-675b-instruct-2512", name = "Mistral Large 3", reasoning = false, input = ["text"], contextWindow = 262000, maxTokens = 16384, cost = { input = 0, output = 0, cacheRead = 0, cacheWrite = 0 } },
              { id = "qwen/qwen3-coder-480b-a35b-instruct", name = "Qwen 3 Coder", reasoning = false, input = ["text"], contextWindow = 262000, maxTokens = 16384, cost = { input = 0, output = 0, cacheRead = 0, cacheWrite = 0 } },
              { id = "nvidia/llama-3.1-nemotron-ultra-253b-v1", name = "Nemotron Ultra 253B", reasoning = true, input = ["text"], contextWindow = 128000, maxTokens = 16384, cost = { input = 0, output = 0, cacheRead = 0, cacheWrite = 0 } },
              { id = "z-ai/glm5", name = "GLM-5", reasoning = false, input = ["text"], contextWindow = 128000, maxTokens = 16384, cost = { input = 0, output = 0, cacheRead = 0, cacheWrite = 0 } },
2026-05-22 15:23:17 +00:00
              { id = "meta/llama-3.1-70b-instruct", name = "Llama 3.1 70B Instruct", reasoning = false, input = ["text"], contextWindow = 128000, maxTokens = 16384, cost = { input = 0, output = 0, cacheRead = 0, cacheWrite = 0 } },
              { id = "meta/llama-4-maverick-17b-128e-instruct", name = "Llama 4 Maverick (NIM)", reasoning = false, input = ["text"], contextWindow = 1000000, maxTokens = 16384, cost = { input = 0, output = 0, cacheRead = 0, cacheWrite = 0 } },
2026-02-22 15:13:55 +00:00
            ]
          }
2026-03-01 13:22:47 +00:00
          openrouter = {
            baseUrl = "https://openrouter.ai/api/v1"
2026-02-22 15:13:55 +00:00
            api = "openai-completions"
2026-03-14 17:15:48 +00:00
            apiKey = data.vault_kv_secret_v2.secrets.data["openrouter_api_key"]
2026-02-22 15:13:55 +00:00
            models = [
2026-03-01 13:22:47 +00:00
              { id = "stepfun/step-3.5-flash:free", name = "Step 3.5 Flash", reasoning = true, input = ["text"], contextWindow = 256000, maxTokens = 16384, cost = { input = 0, output = 0, cacheRead = 0, cacheWrite = 0 } },
              { id = "arcee-ai/trinity-large-preview:free", name = "Trinity Large", reasoning = false, input = ["text"], contextWindow = 131000, maxTokens = 16384, cost = { input = 0, output = 0, cacheRead = 0, cacheWrite = 0 } },
2026-02-22 15:13:55 +00:00
            ]
          }
          llama-as-openai = {
            baseUrl = "https://api.llama.com/compat/v1"
2026-03-14 17:15:48 +00:00
            apiKey = data.vault_kv_secret_v2.secrets.data["llama_api_key"]
2026-02-22 15:13:55 +00:00
            api = "openai-completions"
            models = [
2026-03-01 13:22:47 +00:00
              { id = "Llama-4-Maverick-17B-128E-Instruct-FP8", name = "Llama 4 Maverick", reasoning = false, input = ["text"], contextWindow = 200000, maxTokens = 16384, cost = { input = 0, output = 0, cacheRead = 0, cacheWrite = 0 } },
              { id = "Llama-4-Scout-17B-16E-Instruct-FP8", name = "Llama 4 Scout", reasoning = false, input = ["text"], contextWindow = 200000, maxTokens = 16384, cost = { input = 0, output = 0, cacheRead = 0, cacheWrite = 0 } },
2026-02-22 15:13:55 +00:00
            ]
          }
        }
      }
2026-03-01 15:47:54 +00:00
      wizard = {
        lastRunAt = "2026-03-01T15:11:54.176Z"
        lastRunVersion = "2026.2.9"
        lastRunCommand = "configure"
        lastRunMode = "local"
      }
2026-02-22 15:13:55 +00:00
    })
  }
}
resource "random_password" "gateway_token" {
  length = 32
  special = false
}
openclaw: realtime usage dashboard via Prometheus exporter sidecar
Stdlib-only Python exporter ($1) reads ~/.openclaw/agents/*/sessions/*.jsonl
(assistant messages with usage) plus auth-profiles.json (OAuth expiry,
Plus-tier label) and exposes Prometheus text format on :9099/metrics.
Container is python:3.12-slim; pod template gets prometheus.io/scrape
annotations so the existing kubernetes-pods job picks it up — no
ServiceMonitor needed.
Metrics exported:
openclaw_codex_messages_total{provider,model,session_kind} counter
openclaw_codex_input/output/cache_read/cache_write_tokens_total
openclaw_codex_message_errors_total{reason}
openclaw_codex_active_sessions{kind} gauge
openclaw_codex_oauth_expiry_seconds{provider,account,plan} gauge
openclaw_codex_last_run_timestamp gauge
Grafana dashboard "OpenClaw — Codex Usage" (Applications folder, 30s
refresh): messages/5h vs Plus rate-card, % of 1,200 floor, tokens/5h,
cache hit %, OAuth expiry days, active sessions, last-turn age, errors,
plus per-model timeseries + bar gauge + error table.
Plus rate-card thresholds in the gauge are conservative (1,200/5h floor;
real cap is dynamic 1,200–7,000). Re-baseline if throttling shows up
below 80%.
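The exporter source (`files/exporter.py`) is not shown here, so the following is only a hedged sketch of the counting and rendering step for `openclaw_codex_messages_total`. The JSONL field names (`role`, `provider`, `model`, `usage`), the function names, and the default `session_kind` value are assumptions for illustration, not the real session schema.

```python
import json
from collections import Counter


def count_messages(jsonl_lines, session_kind="main"):
    """Count assistant messages that carry a usage block, keyed by label set."""
    counts = Counter()
    for raw in jsonl_lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            rec = json.loads(raw)
        except json.JSONDecodeError:
            continue  # tolerate partially written lines
        # Hypothetical schema: these field names are assumed, not confirmed.
        if not isinstance(rec, dict):
            continue
        if rec.get("role") != "assistant" or "usage" not in rec:
            continue
        key = (rec.get("provider", "unknown"), rec.get("model", "unknown"), session_kind)
        counts[key] += 1
    return counts


def render(counts):
    """Render the counter in the Prometheus text exposition format."""
    lines = [
        "# HELP openclaw_codex_messages_total Assistant messages with usage.",
        "# TYPE openclaw_codex_messages_total counter",
    ]
    for (provider, model, kind), n in sorted(counts.items()):
        # Label values are not escaped here; a real exporter should escape
        # backslashes, double quotes, and newlines.
        lines.append(
            "openclaw_codex_messages_total"
            f'{{provider="{provider}",model="{model}",session_kind="{kind}"}} {n}'
        )
    return "\n".join(lines) + "\n"
```

A stdlib-only exporter could return the string from `render` out of an `http.server.BaseHTTPRequestHandler` bound to port 9099, which keeps the sidecar free of any `pip install` step.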
2026-05-07 09:04:25 +00:00
# Prometheus exporter script — read by the openclaw-exporter sidecar.
# Stdlib-only Python so no pip install at startup. Reads sessions JSONL +
# auth-profiles.json from the NFS-backed openclaw home volume (mounted ro).
resource "kubernetes_config_map" "openclaw_exporter" {
  metadata {
    name = "openclaw-exporter"
    namespace = kubernetes_namespace.openclaw.metadata[0].name
  }
  data = {
    "exporter.py" = file("${path.module}/files/exporter.py")
  }
}
truenas deprecation: migrate all non-immich storage to proxmox NFS
- Migrate 7 backup CronJobs to Proxmox host NFS (192.168.1.127)
(etcd, mysql, postgresql, nextcloud, redis, vaultwarden, plotting-book)
- Migrate headscale backup, ebook2audiobook, osm_routing to Proxmox NFS
- Migrate servarr (lidarr, readarr, soulseek) NFS refs to Proxmox
- Remove 79 orphaned TrueNAS NFS module declarations from 49 stacks
- Delete stacks/platform/modules/ (27 dead module copies, 65MB)
- Update nfs-truenas StorageClass to point to Proxmox (192.168.1.127)
- Remove iscsi DNS record from config.tfvars
- Fix woodpecker persistence config and alertmanager PV
Only Immich (8 PVCs, ~1.4TB) remains on TrueNAS.
2026-04-12 14:35:39 +01:00
module "nfs_tools_host" {
[ci skip] complete NFS CSI migration: complex stacks + platform modules
Migrate remaining multi-volume stacks and all platform modules from
inline NFS volumes to CSI-backed PV/PVC with nfs-truenas StorageClass
(soft,timeo=30,retrans=3 mount options).
Complex stacks: openclaw (4 vols), immich (8 vols), frigate (2 vols),
nextcloud (2 vols + old PV replaced), rybbit (1 vol)
Remaining stacks: affine, ebook2audiobook, f1-stream, osm_routing,
real-estate-crawler
Platform modules: monitoring (prometheus, loki, alertmanager PVs
converted from native NFS to CSI), redis, dbaas, technitium,
headscale, vaultwarden, uptime-kuma, mailserver, infra-maintenance
2026-03-02 01:24:07 +00:00
  source = "../../modules/kubernetes/nfs_volume"
2026-04-12 14:35:39 +01:00
  name = "openclaw-tools-host"
2026-03-02 01:24:07 +00:00
  namespace = kubernetes_namespace.openclaw.metadata[0].name
2026-04-12 14:35:39 +01:00
  nfs_server = "192.168.1.127"
  nfs_path = "/srv/nfs/openclaw/tools"
2026-03-02 01:24:07 +00:00
}
feat(storage): migrate 38 NFS PVCs to proxmox-lvm (Wave 2)
Add proxmox-lvm PVCs with pvc-autoresizer annotations for all
remaining single-pod app data services. Deployments updated to
use new block storage PVCs. Old NFS modules retained for rollback.
Services: affine, changedetection, diun, excalidraw, f1-stream,
hackmd, isponsorblocktv, matrix, n8n, send, grampsweb, health,
onlyoffice, owntracks, paperless-ngx, privatebin, resume,
speedtest, stirling-pdf, tandoor, rybbit (clickhouse), tor-proxy
(torrserver), whisper+piper, frigate (config), ollama (ui),
servarr (prowlarr/listenarr/qbittorrent), aiostreams, freshrss
(extensions), meshcentral (data+files), openclaw (data+home+
openlobster), technitium, mailserver (data+roundcube html+enigma),
dbaas (pgadmin).
Strategy set to Recreate where needed for RWO volumes.
2026-04-04 19:25:12 +03:00
resource "kubernetes_persistent_volume_claim" "home_proxmox" {
  wait_until_bound = false
  metadata {
    name = "openclaw-home-proxmox"
    namespace = kubernetes_namespace.openclaw.metadata[0].name
    annotations = {
2026-05-10 19:56:16 +00:00
      "resize.topolvm.io/threshold" = "10%"
2026-04-04 19:25:12 +03:00
      "resize.topolvm.io/increase" = "100%"
      "resize.topolvm.io/storage_limit" = "5Gi"
    }
  }
  spec {
    access_modes = ["ReadWriteOnce"]
    storage_class_name = "proxmox-lvm"
    resources {
      requests = {
        storage = "1Gi"
      }
    }
  }
2026-05-10 21:57:01 +00:00
  lifecycle {
    # The autoresizer expands requests.storage up to storage_limit and
    # PVCs can't shrink. Without this, every TF apply tries to revert
    # to the spec value, K8s rejects the shrink, and the PVC ends up
    # in Terminating-but-in-use limbo.
    ignore_changes = [spec[0].resources[0].requests]
  }
2026-04-04 19:25:12 +03:00
}
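To illustrate why `ignore_changes` is needed on `requests`, here is a small sketch of how the request could grow under the annotations above. It assumes the autoresizer applies `increase` to the current request and caps the result at `storage_limit`; that behaviour and the function name `growth_steps` are assumptions to check against the pvc-autoresizer documentation.

```python
def growth_steps(request_gi, increase_pct, limit_gi):
    """Return the sequence of PVC sizes (GiB) until the limit is reached."""
    if increase_pct <= 0:
        raise ValueError("increase_pct must be positive")
    sizes = [request_gi]
    while sizes[-1] < limit_gi:
        grown = sizes[-1] * (1 + increase_pct / 100)
        sizes.append(min(grown, limit_gi))  # assumed cap at storage_limit
    return sizes


# 1Gi request, 100% increase, 5Gi limit, as in the PVC above.
print(growth_steps(1, 100, 5))
```

After the first resize the live request no longer matches the `1Gi` in the Terraform spec, so without the lifecycle block every apply would attempt a shrink that Kubernetes rejects.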
2026-04-12 14:35:39 +01:00
module "nfs_workspace_host" {
2026-03-02 01:24:07 +00:00
  source = "../../modules/kubernetes/nfs_volume"
2026-04-12 14:35:39 +01:00
  name = "openclaw-workspace-host"
2026-03-02 01:24:07 +00:00
  namespace = kubernetes_namespace.openclaw.metadata[0].name
2026-04-12 14:35:39 +01:00
  nfs_server = "192.168.1.127"
  nfs_path = "/srv/nfs/openclaw/workspace"
2026-03-02 01:24:07 +00:00
}
2026-04-04 19:25:12 +03:00
resource "kubernetes_persistent_volume_claim" "data_proxmox" {
  wait_until_bound = false
  metadata {
    name = "openclaw-data-proxmox"
    namespace = kubernetes_namespace.openclaw.metadata[0].name
    annotations = {
2026-05-10 19:56:16 +00:00
      "resize.topolvm.io/threshold" = "10%"
2026-04-04 19:25:12 +03:00
" resize.topolvm.io/increase " = " 100% "
" resize.topolvm.io/storage_limit " = " 5Gi "
}
}
  spec {
    access_modes       = ["ReadWriteOnce"]
    storage_class_name = "proxmox-lvm"
    resources {
      requests = {
        storage = "1Gi"
      }
    }
  }
2026-05-10 21:57:01 +00:00
  lifecycle {
    # The autoresizer expands requests.storage up to storage_limit and
    # PVCs can't shrink. Without this, every TF apply tries to revert
    # to the spec value, K8s rejects the shrink, and the PVC ends up
    # in Terminating-but-in-use limbo.
    ignore_changes = [spec[0].resources[0].requests]
  }
2026-04-04 19:25:12 +03:00
}
2026-03-15 16:04:02 +00:00
## cc-config NFS volume removed — replaced by dotfiles repo clone in init container
## See init_container "install-dotfiles" in the deployment
2026-03-14 08:51:45 +00:00
2026-02-22 15:13:55 +00:00
resource "kubernetes_deployment" "openclaw" {
  metadata {
    name      = "openclaw"
    namespace = kubernetes_namespace.openclaw.metadata[0].name
    labels = {
      app  = "openclaw"
      tier = local.tiers.aux
    }
  }
  spec {
    strategy {
      type = "Recreate"
    }
    replicas = 1
    selector {
      match_labels = {
        app = "openclaw"
      }
    }
    template {
      metadata {
        labels = {
          app = "openclaw"
        }
migrate consuming stacks to ESO + remove k8s-dashboard static token
Phase 9: ExternalSecret migration across 26 stacks:
Fully migrated (vault data source removed, ESO delivers secrets):
- speedtest, shadowsocks, wealthfolio, plotting-book, f1-stream, tandoor
- n8n, dawarich, diun, netbox, onlyoffice, tuya-bridge
- hackmd (ESO template for DB URL), health (ESO template for DB URL)
- trading-bot (ESO template for DATABASE_URL + 7 secret env vars)
- forgejo (removed unused vault data source)
Partially migrated (vault kept for plan-time, ESO added for runtime):
- immich, linkwarden, nextcloud, paperless-ngx (jsondecode for homepage)
- claude-memory, rybbit, url, webhook_handler (plan-time in locals/jobs)
- woodpecker, openclaw, resume (plan-time in helm values/jobs/modules)
17 stacks unchanged (all plan-time: homepage annotations, configmaps,
module inputs) — vault data source works with OIDC auth.
Phase 17a: Remove k8s-dashboard static admin token secret.
Users now get tokens via: vault write kubernetes/creds/dashboard-admin
2026-03-15 19:05:04 +00:00
        annotations = {
          "reloader.stakater.com/search" = "true"
openclaw: realtime usage dashboard via Prometheus exporter sidecar
Stdlib-only Python exporter ($1) reads ~/.openclaw/agents/*/sessions/*.jsonl
(assistant messages with usage) plus auth-profiles.json (OAuth expiry,
Plus-tier label) and exposes Prometheus text format on :9099/metrics.
Container is python:3.12-slim; pod template gets prometheus.io/scrape
annotations so the existing kubernetes-pods job picks it up — no
ServiceMonitor needed.
Metrics exported:
openclaw_codex_messages_total{provider,model,session_kind} counter
openclaw_codex_input/output/cache_read/cache_write_tokens_total
openclaw_codex_message_errors_total{reason}
openclaw_codex_active_sessions{kind} gauge
openclaw_codex_oauth_expiry_seconds{provider,account,plan} gauge
openclaw_codex_last_run_timestamp gauge
Grafana dashboard "OpenClaw — Codex Usage" (Applications folder, 30s
refresh): messages/5h vs Plus rate-card, % of 1,200 floor, tokens/5h,
cache hit %, OAuth expiry days, active sessions, last-turn age, errors,
plus per-model timeseries + bar gauge + error table.
Plus rate-card thresholds in the gauge are conservative (1,200/5h floor;
real cap is dynamic 1,200–7,000). Re-baseline if throttling shows up
below 80%.
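
As a rough illustration of the "% of 1,200 floor" arithmetic, here is a minimal shell sketch. It assumes only the 1,200 messages per 5h floor stated above. The function name `pct_of_floor` and the sample count are hypothetical and are not part of the exporter or the dashboard.

```shell
#!/bin/sh
# Hypothetical helper: messages seen in the trailing 5h window, expressed
# as a percentage of the conservative Plus floor (1,200 messages / 5h).
FLOOR=1200

pct_of_floor() {
  # $1 = integer message count for the trailing 5h window
  echo $(( $1 * 100 / FLOOR ))
}

pct_of_floor 600   # prints 50
```

Integer division is enough for a threshold check; the real panel may compute this differently in PromQL.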
2026-05-07 09:04:25 +00:00
          # Prometheus auto-discovers pods with these annotations.
          # It scrapes the openclaw-exporter sidecar, which exposes /metrics on :9099.
          "prometheus.io/scrape" = "true"
          "prometheus.io/port"   = "9099"
          "prometheus.io/path"   = "/metrics"
2026-03-15 19:05:04 +00:00
}
2026-02-22 15:13:55 +00:00
}
      spec {
        service_account_name = kubernetes_service_account.openclaw.metadata[0].name
2026-03-22 02:56:04 +02:00
        # Init 0: fix /workspace ownership so node user can write
        init_container {
          name    = "fix-workspace-perms"
          image   = "busybox:1.37"
          command = ["sh", "-c", "chown 1000:1000 /workspace"]
          volume_mount {
            name       = "workspace"
            mount_path = "/workspace"
          }
        }
2026-03-15 16:04:02 +00:00
# Init 1: copy openclaw.json from ConfigMap into writable NFS home
2026-02-22 15:13:55 +00:00
init_container {
2026-03-14 23:42:08 +00:00
          name    = "copy-config"
          image   = "busybox:1.37"
          command = ["sh", "-c", "cp /config/openclaw.json /home/node/.openclaw/openclaw.json && chown 1000:1000 /home/node/.openclaw/openclaw.json"]
2026-02-22 15:13:55 +00:00
          volume_mount {
            name = "openclaw-config"
2026-03-14 23:42:08 +00:00
            mount_path = "/config"
2026-02-22 15:13:55 +00:00
          }
2026-03-14 08:51:45 +00:00
          volume_mount {
2026-03-14 23:42:08 +00:00
            name       = "openclaw-home"
            mount_path = "/home/node/.openclaw"
2026-03-14 08:51:45 +00:00
          }
2026-02-22 15:13:55 +00:00
}
2026-05-08 08:07:38 +00:00
# Init 1b: regenerate kubeconfig pointing at the projected SA tokenFile
# so kubectl always reads the fresh, kubelet-rotated token. Without
# this the previously-baked kubeconfig retains a SA token bound to a
# long-dead pod and kubectl returns "must be logged in to the server".
init_container {
ingress_factory: replace `protected` bool with `auth` enum + audit pass across 100 stacks
Phase 3+4 of default-deny ingress plan. Replaces the `protected = bool` (default
false → unprotected) variable in `modules/kubernetes/ingress_factory` with
`auth = string` enum (default "required" → fail-closed). Touches every
ingress_factory caller so the audit decision is recorded explicitly in code.
ingress_factory (Phase 3):
- `auth = "required"`: standard Authentik forward-auth (the legacy
`protected = true` semantic).
- `auth = "public"`: forward-auth via the new `authentik-forward-auth-public`
middleware → dedicated public outpost → guest auto-bind. Logged-in users
keep their real identity.
- `auth = "none"`: no Authentik middleware. For Anubis-fronted content, native
client APIs (Git, /v2/, WebDAV), webhook receivers, the Authentik outpost
itself.
- `effective_anti_ai` default flips ON only when `auth = "none"` (auth-gated
ingresses don't need anti-AI noise; the auth flow already discourages bots).
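
The fail-closed default can be sketched in shell. This is an illustration of the enum semantics only, not the module's HCL. The function name `resolve_ingress_defaults` and the output format are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of the `auth` enum semantics described above.
# An unset or empty value falls back to "required" (fail-closed), and
# the anti-AI default is on only for auth = "none".
resolve_ingress_defaults() {
  auth=${1:-required}
  case "$auth" in
    required|public) anti_ai=false ;;
    none)            anti_ai=true ;;
    *)
      echo "invalid auth: $auth" >&2
      return 1
      ;;
  esac
  echo "auth=$auth anti_ai=$anti_ai"
}

resolve_ingress_defaults        # prints auth=required anti_ai=false
resolve_ingress_defaults none   # prints auth=none anti_ai=true
```

Rejecting unknown values instead of silently treating them as public is the point of the rename: a typo should fail rather than expose a surface.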
Audit pass (Phase 4) across 96 ingress_factory call sites:
- 49 explicit `protected = true` → `auth = "required"`
- 8 explicit `protected = false` → `auth = "none"` (5) or `auth = "public"` (3)
- 64 previously-default (no protected line) → `auth = "required"` ADDED, then
reviewed individually:
* 9 Anubis-fronted (blog, www, kms, travel, f1, cyberchef, jsoncrack,
homepage, wrongmove UI, privatebin) → `auth = "none"`
* 22 native-client / programmatic surfaces (Forgejo Git+/v2/, webhook
handler, claude-memory MCP, Nextcloud WebDAV, Matrix, Vault CLI/OIDC,
xray VPN, ntfy, woodpecker webhooks, n8n triggers, ntfy push, dawarich
location ingestion, immich frame kiosk, headscale CP, send anonymous
drops, rybbit beacon, vaultwarden API, Authentik UI itself + outposts) →
`auth = "none"`
* Remaining ~33 → `auth = "required"` confirmed (admin tools, internal
UIs, services without app-level auth)
- Smoke-test promotions to `auth = "public"`: fire-planner public UI,
k8s-portal API, insta2spotify callback.
Three call sites in wrapper modules (`stacks/freedify/factory/`,
`stacks/reverse-proxy/modules/reverse_proxy/`) keep their internal `protected`
bool — they translate to `auth` internally, out of scope for this rename.
Behavior change: previously-default ingresses now fail closed (require
Authentik login) unless explicitly flipped to `auth = "none"` or
`auth = "public"`. This is the audit goal — no more accidentally-unprotected
surfaces. Sites that were intentionally public (Anubis content, native APIs,
webhooks) are now explicitly recorded as `auth = "none"`.
Drive-by: `modules/create-vm/main.tf` picked up cosmetic alignment via
`terraform fmt -recursive` during the audit. Behavior-neutral.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 18:53:49 +00:00
          name  = "setup-kubeconfig"
          image = "busybox:1.37"
2026-05-08 08:07:38 +00:00
          command = ["sh", "-c", <<-EOT
            cat > /home/node/.openclaw/kubeconfig <<'KUBECONFIG_EOF'
            apiVersion: v1
            kind: Config
            clusters:
            - cluster:
                certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
                server: https://kubernetes.default.svc
              name: in-cluster
            contexts:
            - context:
                cluster: in-cluster
                user: openclaw
                namespace: openclaw
              name: in-cluster
            current-context: in-cluster
            users:
            - name: openclaw
              user:
                tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
            KUBECONFIG_EOF
            chown 1000:1000 /home/node/.openclaw/kubeconfig
            chmod 0644 /home/node/.openclaw/kubeconfig
          EOT
          ]
          volume_mount {
            name       = "openclaw-home"
            mount_path = "/home/node/.openclaw"
          }
        }
2026-03-29 01:11:33 +02:00
# Init 2 removed: install-dotfiles init container was cloning dotfiles
# repo via git on every pod start, causing 200+ small NFS writes.
# Dotfiles already exist on NFS at /home/node/.openclaw/dotfiles from
# a previous clone. To update, run git pull manually or via CronJob.
2026-03-15 16:04:02 +00:00
recruiter-responder: deploy stack + llama-cpp qwen3-8b + openclaw plugin mount
Three coupled changes for the new recruiter-responder pipeline:
1. stacks/llama-cpp/: add qwen3-8b text-only model to llama-swap. Uses
unsloth/Qwen3-8B-GGUF Q4_K_M, 16k context, no mmproj. Refactored the
download Job script + cmd renderer to handle text_only=true (skip
mmproj download + --mmproj flag). The 3 existing vision models stay
on text_only=false; no behaviour change for them.
2. stacks/recruiter-responder/: new stack. Namespace, 2 ExternalSecrets
(app secrets from secret/recruiter-responder, DB creds from Vault DB
engine static-creds/pg-recruiter-responder), Deployment (replicas=1,
Recreate -- IMAP IDLE + APScheduler want single leader), Service
ClusterIP. Image: forgejo.viktorbarzin.me/viktor/recruiter-responder.
3. stacks/openclaw/: add init container `install-recruiter-plugin` that
uses the recruiter-responder image to copy the .mjs plugin into
/home/node/.openclaw/extensions/recruiter-api/ on NFS. Couples plugin
version to the recruiter-responder image tag. Also injects
RECRUITER_RESPONDER_URL + RECRUITER_RESPONDER_TOKEN env vars (token
from openclaw-secrets.recruiter_responder_bearer_token, optional).
Pre-apply checklist for recruiter-responder stack:
- Vault: seed secret/recruiter-responder with webhook_bearer_token,
imap_{me,spam}_{user,pass}, smtp_password, claude_agent_token,
task_webhook_token.
- Vault: add secret/openclaw.recruiter_responder_bearer_token (same as
above webhook_bearer_token).
- dbaas: create DB recruiter_responder + role recruiter_responder,
and Vault DB-engine role static-creds/pg-recruiter-responder.
- Build + push image via Woodpecker (recruiter-responder repo CI).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-15 22:38:53 +00:00
# Init 3: install the recruiter-api OpenClaw plugin from the
# recruiter-responder image into NFS extensions/. Plugin lifecycle
# is coupled to the recruiter-responder image tag — bumping that
# tag re-installs the plugin on next openclaw pod restart.
        init_container {
          name  = "install-recruiter-plugin"
          image = "forgejo.viktorbarzin.me/viktor/recruiter-responder:latest"
          command = ["sh", "-c", <<-EOT
            set -eu
            mkdir -p /home/node/.openclaw/extensions/recruiter-api
            cp -r /app/openclaw-plugin/. /home/node/.openclaw/extensions/recruiter-api/
            chown -R 1000:1000 /home/node/.openclaw/extensions/recruiter-api
            echo "recruiter-api plugin installed at /home/node/.openclaw/extensions/recruiter-api"
            ls -la /home/node/.openclaw/extensions/recruiter-api
          EOT
          ]
2026-05-15 23:18:43 +00:00
# /home/node/.openclaw is uid 1000 on NFS; recruiter-responder image
# otherwise drops to uid 10001 which can't write or chown. Run as
# root so mkdir + chown succeed.
security_context {
run_as_user = 0
}
2026-05-15 22:38:53 +00:00
          volume_mount {
            name       = "openclaw-home"
            mount_path = "/home/node/.openclaw"
          }
          resources {
            requests = { cpu = "50m", memory = "64Mi" }
            limits   = { memory = "128Mi" }
          }
        }
2026-05-22 10:20:00 +00:00
# Init 4: install host-tools bundle (ssh, vault, jq, ripgrep, tmux, …)
# into /tools/host-tools/ so the in-pod agent reaches CLI parity
# with the dev VM. Upstream OpenClaw image is minimal Debian
# bookworm running as uid 1000 — can't apt-install at runtime.
# Idempotent via marker file; bump suffix to force reinstall.
# See docs/plans/2026-05-22-openclaw-devvm-access-design.md.
        init_container {
          name  = "install-host-tools"
          image = "debian:bookworm-slim"
          command = ["bash", "-c", <<-EOT
            set -euo pipefail
            DEST=/tools/host-tools
            MARKER="$DEST/.installed-v1"
            if [ -f "$MARKER" ]; then
              echo "host-tools v1 already installed (skipping)"
              exit 0
            fi
            echo "installing host-tools v1 ..."
            rm -rf "$DEST"
            mkdir -p "$DEST/root" "$DEST/bin"
            export DEBIAN_FRONTEND=noninteractive
            apt-get update -qq
            # debian:bookworm-slim doesn't ship wget/unzip; install
            # transiently into this init container's filesystem so we
            # can download the static binaries below.
            apt-get install -y --no-install-recommends wget unzip ca-certificates
            # NOTE: we deliberately do NOT pass --no-install-recommends to
            # the download step. ssh links against libgssapi-krb5-2 which
            # is a hard Depends but its transitive deps (libkrb5-3 etc.)
            # need to come along too. The bundle is a self-contained
            # /usr-like tree that the openclaw container can use via
            # LD_LIBRARY_PATH, so missing deps = broken binaries.
            APT_PKGS="openssh-client dnsutils iputils-ping wget gnupg jq ripgrep fd-find ncdu htop strace tcpdump tmux unzip ca-certificates"
            apt-get install -y --download-only $APT_PKGS
            for d in /var/cache/apt/archives/*.deb; do
              dpkg-deb -x "$d" "$DEST/root/"
            done
            VAULT_VER=1.18.3
            YQ_VER=v4.44.3
            wget -qO /tmp/vault.zip \
              "https://releases.hashicorp.com/vault/$${VAULT_VER}/vault_$${VAULT_VER}_linux_amd64.zip"
            unzip -o /tmp/vault.zip vault -d "$DEST/bin/"
            chmod +x "$DEST/bin/vault"
            wget -qO "$DEST/bin/yq" \
              "https://github.com/mikefarah/yq/releases/download/$${YQ_VER}/yq_linux_amd64"
            chmod +x "$DEST/bin/yq"
            # Smoke test: fail init if any bundled binary has unresolved
            # shared-lib deps, so glibc / shared-lib drift surfaces at
            # deploy time. We don't run --version because flag support
            # varies (older scp returns non-zero, ping/nslookup use weird
            # conventions). ldd is the reliable signal: if any "not
            # found" appears, the binary won't load when called.
            # LD_LIBRARY_PATH points ld.so at the bundled libs (the
            # openclaw main container sets the same env).
            export PATH="$DEST/root/usr/bin:$DEST/root/usr/sbin:$DEST/root/bin:$DEST/root/sbin:$DEST/bin:$PATH"
            export LD_LIBRARY_PATH="$DEST/root/usr/lib/x86_64-linux-gnu:$DEST/root/lib/x86_64-linux-gnu"
            for t in ssh scp ssh-keyscan dig host nslookup ping wget gpg jq rg fdfind tmux vault yq; do
              bin=$(command -v "$t" 2>/dev/null) || { echo "FAIL: $t not on PATH"; exit 1; }
              if ldd "$bin" 2>&1 | grep -q "not found"; then
                echo "FAIL: $t has unresolved shared libs:"
                ldd "$bin"
                exit 1
              fi
              echo "OK: $t"
            done
            chown -R 1000:1000 "$DEST"
            touch "$MARKER"
            echo "host-tools v1 install complete ($(du -sh "$DEST" | cut -f1))"
          EOT
          ]
          volume_mount {
            name       = "tools"
            mount_path = "/tools"
          }
          resources {
            requests = { cpu = "100m", memory = "256Mi" }
            limits   = { memory = "512Mi" }
          }
        }
# Init 5: write /home/node/.openclaw/.ssh/{id_rsa,config,known_hosts}
# so the agent can `ssh devvm` without device-trust prompts. The
# main container symlinks /home/node/.ssh → here at startup so
# the ssh client picks it up via $HOME/.ssh. Installs
# openssh-client transiently into this init container so
# ssh-keyscan works without LD_LIBRARY_PATH gymnastics.
        init_container {
          name  = "setup-ssh-config"
          image = "debian:bookworm-slim"
          command = ["bash", "-c", <<-EOT
            set -euo pipefail
            SSH=/home/node/.openclaw/.ssh
            MARKER="$SSH/.configured-v1"
            if [ -f "$MARKER" ]; then
              echo "ssh-config v1 already set up (skipping)"
              exit 0
            fi
            echo "installing openssh-client for ssh-keyscan ..."
            export DEBIAN_FRONTEND=noninteractive
            apt-get update -qq
            apt-get install -y --no-install-recommends openssh-client >/dev/null
            echo "configuring ssh ..."
            mkdir -p "$SSH"
            # Copy the secret-mounted private key into ~/.ssh with 0600:
            # the secret's tmpfs mount has wider perms (1777 + symlinks)
            # that openssh refuses.
            cp /ssh/id_rsa "$SSH/id_rsa"
            chmod 0600 "$SSH/id_rsa"
            cat > "$SSH/config" <<'SSH_EOF'
            Host devvm
              HostName 10.0.10.10
              User wizard
              IdentityFile ~/.ssh/id_rsa
              UserKnownHostsFile ~/.ssh/known_hosts
              StrictHostKeyChecking yes
            SSH_EOF
            chmod 0600 "$SSH/config"
            ssh-keyscan -H 10.0.10.10 > "$SSH/known_hosts" 2>/tmp/keyscan.err
            if [ ! -s "$SSH/known_hosts" ]; then
              echo "ssh-keyscan produced empty known_hosts; stderr:"
              cat /tmp/keyscan.err
              exit 1
            fi
            chmod 0644 "$SSH/known_hosts"
            chown -R 1000:1000 "$SSH"
            touch "$MARKER"
            echo "ssh-config v1 set up"
          EOT
          ]
          volume_mount {
            name       = "openclaw-home"
            mount_path = "/home/node/.openclaw"
          }
          volume_mount {
            name       = "ssh-key"
            mount_path = "/ssh"
          }
          resources {
            requests = { cpu = "50m", memory = "64Mi" }
            limits   = { memory = "256Mi" }
          }
        }
2026-05-22 13:18:52 +00:00
# Init 6: seed the "know → ask devvm → save" learning loop into
# FIVE places so it's engrained at the identity level, not just
# buried as workflow advice:
2026-05-22 10:50:42 +00:00
#
2026-05-22 13:18:52 +00:00
# 1. /workspace/SOUL.md — read every session per AGENTS.md, this
# is the agent's identity. A marker-delimited section there
# says "learning is who you are" (single short truth, no
# procedural detail).
# 2. /workspace/TOOLS.md — full how-to, inserted at the TOP of
# the file so the agent sees it first when loading bootstrap.
# 3. /workspace/memory/projects/openclaw-runtime/devvm-fallback.md
# — indexed by memory-core (memory_recall surfaces it).
# 4. /workspace/memory/projects/openclaw-learned/ — durable cache
# of recipes the agent has saved. INDEX.md is the first thing
# checked on any task (Step 1 of the flow).
# 5. /home/node/.openclaw/cc-skills/learn-from-tasks/SKILL.md —
# formal skill with the triage logic.
# All marker-delimited (v4) so user edits outside markers
# survive future pod restarts. The four artifacts that ARE
# marker-managed reinforce the same rule from different angles:
# identity (SOUL), how (TOOLS), memory-search (memory note),
# entry-point (skill).
2026-05-22 10:20:00 +00:00
        init_container {
          name  = "seed-devvm-memory-note"
          image = "busybox:1.37"
          command = ["sh", "-c", <<-EOT
            set -eu
2026-05-22 11:12:33 +00:00
            # ---- 1. Memory note (memory-core searchable) ----------------
2026-05-22 10:20:00 +00:00
            DIR=/workspace/memory/projects/openclaw-runtime
            mkdir -p "$DIR"
            cat > "$DIR/devvm-fallback.md" <<'NOTE_EOF'
2026-05-22 13:18:52 +00:00
# The learning loop — applies to EVERY task
2026-05-22 10:20:00 +00:00
2026-05-22 13:18:52 +00:00
            **The flow: know → ask devvm → (rarely) try yourself. Then
            save what you learned, always.** This is not optional and
            not specific to any domain; it's how you compound. Every
            task that ends with a saved recipe makes future-you
            stronger. Every task that doesn't is a wasted lesson.
2026-05-22 10:20:00 +00:00
2026-05-22 13:18:52 +00:00
## The flow
2026-05-22 11:12:33 +00:00
2026-05-22 13:18:52 +00:00
            1. **Know?** `memory_recall "<topic>"` and check
               `/workspace/memory/projects/openclaw-learned/INDEX.md`.
               If yes, use it.
            2. **Don't know?** Default to `ssh devvm` and ask claude.
               Don't brute-force. Most tasks are solvable there.
            3. **Devvm can't either?** (rare) Try in-pod, be honest
               about the risk, save anything you discover.

            ## The save (mandatory, every non-trivial task)

            After ANY task (calendar, container debugging, vault
            lookup, posting to Slack, parsing a log, whatever), pause
            and save the recipe to
            `/workspace/memory/projects/openclaw-learned/`:

            - Script → scripts/<task>.md
            - Knowledge → knowledge/<topic>.md
            - Credential → credentials/<name>.md **Vault pointer
              only, never the value**
2026-05-22 10:20:00 +00:00
2026-05-22 13:18:52 +00:00
            Update INDEX.md so the next session sees it.
2026-05-22 10:20:00 +00:00
2026-05-22 13:18:52 +00:00
            ## SSH cheatsheet
2026-05-22 10:20:00 +00:00
2026-05-22 13:18:52 +00:00
            ssh devvm 'vault kv get -field=KEY secret/PATH'
            ssh devvm 'claude -p "your prompt"'
2026-05-22 10:20:00 +00:00
2026-05-22 13:18:52 +00:00
            # async (>2 min), survives pod restart
            ssh devvm openclaw-task new <id> "<command>"
            ssh devvm openclaw-task capture/log/status/kill
2026-05-22 10:20:00 +00:00
            NOTE_EOF
            chown -R 1000:1000 "$DIR"
2026-05-22 10:50:42 +00:00
2026-05-22 13:18:52 +00:00
# ---- 2. SOUL.md identity-level rule -------------------------
            SOUL=/workspace/SOUL.md
            if [ -f "$SOUL" ]; then
              awk '
                $0 == "<!-- BEGIN openclaw-learning-soul v1 -->" { skip=1; next }
                $0 == "<!-- END openclaw-learning-soul v1 -->" { skip=0; next }
                !skip { print }
              ' "$SOUL" > "$SOUL.tmp"
              # Insert before "## Boundaries" if present, else append.
              # Match the heading exactly, as the awk below does; a prefix
              # match here could skip both the insert and the append.
              if grep -qx "## Boundaries" "$SOUL.tmp"; then
                awk '
                  /^## Boundaries$/ && !done {
                    print "<!-- BEGIN openclaw-learning-soul v1 -->"
                    print "**Learning is your identity.** Every task, anything anyone asks you to do, is also a learning opportunity. The job is not just to answer; it is to come back to the next person (or future-you) smarter than you were. When you figure out how to do something, save the recipe under /workspace/memory/projects/openclaw-learned/. When you needed devvm to figure it out, save what devvm taught you. The point is compounding: every saved learning makes the next task faster, and over time you become independent. See TOOLS.md and the `learn-from-tasks` skill for the protocol."
                    print "<!-- END openclaw-learning-soul v1 -->"
                    print ""
                    done=1
                  }
                  { print }
                ' "$SOUL.tmp" > "$SOUL.new"
              else
                cat "$SOUL.tmp" > "$SOUL.new"
                cat >> "$SOUL.new" <<'SOUL_EOF'
            <!-- BEGIN openclaw-learning-soul v1 -->
            **Learning is your identity.** Every task is a learning opportunity. Save what you figure out under /workspace/memory/projects/openclaw-learned/ so future-you is stronger. See TOOLS.md and the `learn-from-tasks` skill.
            <!-- END openclaw-learning-soul v1 -->
            SOUL_EOF
              fi
              mv "$SOUL.new" "$SOUL"
              rm -f "$SOUL.tmp"
              chown 1000:1000 "$SOUL"
            fi
# ---- 3. TOOLS.md flow section, INSERTED AT TOP --------------
2026-05-22 10:50:42 +00:00
            TOOLS=/workspace/TOOLS.md
            touch "$TOOLS"
2026-05-22 13:18:52 +00:00
            # Strip prior versions (v1, v2, v3, v4) before re-inserting v4.
openclaw: v3 flow — know → ask devvm → (rarely) try yourself
Refines the devvm-fallback into an explicit triage flow that the
agent runs on every task. The default path is to ASK devvm-claude
when uncertain — don't brute-force. Most tasks are solvable there.
## The flow
1. Do I KNOW how? Check `memory_recall` and INDEX.md.
2. If not, SSH devvm and ask claude — and crucially, ask it to
share the steps + credentials needed so I can do it on my own
next time. Save the answer in openclaw memory.
3. (RARE) If devvm-claude says no, try in-pod. This will most likely
fail, and that's OK.
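
A minimal shell sketch of the first two steps, for illustration only. It models just the INDEX.md lookup, because `memory_recall` is an agent tool with no shell equivalent here. The function name `triage` and its messages are hypothetical.

```shell
#!/bin/sh
# Hypothetical triage helper: report which step of the flow applies.
# $1 = topic to look for, $2 = path to INDEX.md
triage() {
  topic=$1
  index=$2
  if [ -f "$index" ] && grep -qiF -- "$topic" "$index"; then
    echo "known: use the saved recipe"
  else
    echo "unknown: ssh devvm and ask claude"
  fi
}
```

Calling `triage restart-frigate /workspace/memory/projects/openclaw-learned/INDEX.md` would print the "known" line only if INDEX.md already mentions that topic.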
## Storage moved to memory-indexed location
Learnings now live under
`/workspace/memory/projects/openclaw-learned/` (was
`/workspace/learned/`) so memory-core indexes them and
`memory_recall` surfaces them. Layout:
- `scripts/<task>.md` runnable recipes
- `knowledge/<topic>.md` decisions, paths, gotchas
- `credentials/<name>.md` **POINTERS to Vault, never values**
## Credentials = Vault pointers only
Previous v2 design saved cred values to plaintext NFS files. v3
flips to pointer-only: cred file documents the Vault path + fetch
command (`ssh devvm 'vault kv get -field=foo secret/bar'`), the
consumer, and rotation expectations. The secret stays in Vault.
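
A sketch of what writing a pointer file could look like, assuming the `credentials/<name>.md` layout above. The function name `write_cred_pointer`, its argument order and the field labels are hypothetical. The fetch command mirrors the one quoted in this message.

```shell
#!/bin/sh
# Hypothetical helper: write a credential *pointer* file. It records where
# the secret lives and how to fetch it, never the secret value itself.
# $1 = base dir, $2 = name, $3 = Vault path, $4 = field,
# $5 = consumer, $6 = rotation expectation
write_cred_pointer() {
  dir=$1 name=$2 vault_path=$3 field=$4 consumer=$5 rotation=$6
  mkdir -p "$dir/credentials"
  cat > "$dir/credentials/$name.md" <<EOF
# $name
- Vault path: $vault_path
- Field: $field
- Fetch: ssh devvm 'vault kv get -field=$field $vault_path'
- Consumer: $consumer
- Rotation: $rotation
EOF
}
```

The helper takes no secret value as input, so a value cannot end up in the file by accident.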
## Init container also migrates
Strips v1/v2/v3 markers from TOOLS.md before re-inserting v3,
moves any files from the legacy `/workspace/learned/` tree into
the new location, removes the empty legacy dir. User edits
outside the markers always survive.
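
The strip-and-reinsert step can be sketched as follows. This is a simplified illustration, not the init container's exact script. The function name `replace_managed_section` is hypothetical, and prepending the new section is an assumption about placement.

```shell
#!/bin/sh
# Sketch: replace a marker-delimited managed section while leaving text
# outside the markers untouched.
# $1 = file to update in place
# $2 = file holding the new section, BEGIN/END markers included
replace_managed_section() {
  file=$1
  section=$2
  touch "$file"
  # Drop every line from a BEGIN marker through its END marker (v1-v9).
  awk '
    /^<!-- BEGIN openclaw-devvm-section v[1-9] -->$/ { skip = 1; next }
    /^<!-- END openclaw-devvm-section v[1-9] -->$/   { skip = 0; next }
    !skip { print }
  ' "$file" > "$file.stripped"
  # New section first, then whatever was written outside the markers.
  cat "$section" "$file.stripped" > "$file.new"
  mv "$file.new" "$file"
  rm -f "$file.stripped"
}
```

Because the strip pattern matches any single-digit version, the function can run on every pod start and still leave exactly one managed section.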
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-22 11:20:54 +00:00
            # User edits outside the markers always survive.
2026-05-22 11:12:33 +00:00
            awk '
2026-05-22 13:18:52 +00:00
              /^<!-- BEGIN openclaw-devvm-section v[1-9] -->$/ { skip=1; next }
              /^<!-- END openclaw-devvm-section v[1-9] -->$/ { skip=0; next }
2026-05-22 10:50:42 +00:00
              !skip { print }
2026-05-22 13:18:52 +00:00
            ' "$TOOLS" > "$TOOLS.stripped"
2026-05-22 10:50:42 +00:00
2026-05-22 13:18:52 +00:00
            # Build the v4 section in /tmp, then insert near the top.
            cat > /tmp/devvm-section.md <<'TOOLS_EOF'
            <!-- BEGIN openclaw-devvm-section v4 -->
            ## THE FLOW: run this on EVERY task. Not just hard ones.
2026-05-22 11:20:54 +00:00
2026-05-22 13:18:52 +00:00
            This is universal. Same triage whether the user asks "what's
            on my calendar today?", "restart frigate", "find disk usage
            on node3", "summarise the inbox", or "deploy a new service".
            **Don't brute-force.** Don't say "I can't" until you've done
            steps 1-2. Most tasks are solvable.
```
                 ┌────────────────────────┐
   new task ────▶│ 1. Do I KNOW how?      │
                 │    memory_recall +     │
                 │    /workspace/memory/  │
                 │    projects/openclaw-  │
                 │    learned/INDEX.md    │
                 └────────────┬───────────┘
                              │
                 yes ◀────────┴────────▶ no
                  │                      │
                  ▼                      ▼
             Just do it.    ┌──────────────────────────┐
                            │ 2. Ask claude on devvm:  │
                            │    - can you do this?    │
                            │    - if yes, do it AND   │
                            │      share the recipe +  │
                            │      credentials needed  │
                            └────────────┬─────────────┘
                                         │
                             yes ◀───────┴───────▶ no
                              │                    │
                              ▼                    ▼
                    Devvm-claude does it.   3. (RARE) Let me try
                    SAVE the recipe to      in-pod. Most likely
                    openclaw-learned        I will fail; that's
                    (vault pointer for      OK. That's the
                    credentials).           point.
```
**The save is not optional.** Every non-trivial task,
success OR partial fail, ends with a row in
`openclaw-learned/INDEX.md` and a markdown file with the
recipe. The 2 minutes you spend saving now turn future-you
into the agent who never has to ask again. This compounds.
## Step 1 — KNOW?
Two checks, both fast:
    node /app/openclaw.mjs memory recall "<topic keywords>"
    cat /workspace/memory/projects/openclaw-learned/INDEX.md
2026-05-22 11:12:33 +00:00
If a row matches, open the file and use it. Don't re-ask.
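A minimal sketch of the INDEX check, assuming a helper named `known` (hypothetical, not part of openclaw) and the INDEX path shown above. It only greps the table rows; `memory recall` remains the primary lookup.

```shell
# Hypothetical helper; the name "known" is an assumption for illustration.
INDEX=/workspace/memory/projects/openclaw-learned/INDEX.md

# Print the INDEX rows that mention a keyword (case-insensitive, literal match).
# Exit status 1 means no row matched, so continue to Step 2.
known() {
  grep -i -F -- "$1" "$INDEX"
}
```

If `known slack` prints a row, open the file named in its Path column. A non-zero exit status means nothing is saved yet for that topic.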
## Step 2 — ASK DEVVM (the default when you don't know)
Use this prompt shape: explicitly ask for the recipe, not
just the answer:
    ssh devvm 'claude -p --dangerously-skip-permissions \
      "I am the openclaw agent. I need to do X (give context). \
      Can you do this? If yes: do it now, then tell me 1) the \
      exact commands I would run on my own next time, and 2) \
      what credentials/paths I need (give Vault paths, NOT \
      values). I will save your answer."'
For work that takes more than ~2 minutes, dispatch async so
the session survives this pod restarting:
2026-05-22 10:50:42 +00:00
    ssh devvm openclaw-task new <id> "<command>"
    ssh devvm openclaw-task capture <id>
## Step 3 — TRY IN-POD (rare)
Only when devvm-claude says it can't. Be honest with the
user about the uncertainty. If you DO find a way, save it
just like Step 2.
## The save (do this on every non-trivial task)
All learnings live under
`/workspace/memory/projects/openclaw-learned/` (memory-core
indexes this path; `memory_recall` surfaces it).
- **Script / recipe** → `scripts/<task>.md`
  Inline a fenced code block. Header: WHAT, WHEN learned,
  HOW (verbatim devvm prompt or "self"), SOURCE (Vault path
  if a credential is involved).
- **Knowledge** (decisions, paths, gotchas, conventions) →
  `knowledge/<topic>.md`
- **Credential POINTER** → `credentials/<name>.md`:
  **NEVER stores the value.** Documents the Vault path +
  field + fetch command + consumer + rotation expectations.
Then add a row to `openclaw-learned/INDEX.md`.
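As an illustration of the pointer-only rule, here is a rough sketch of a helper that writes a credential pointer file and its INDEX row. The function name `save_pointer`, its argument order, and the `FETCH`/`CONSUMER`/`ROTATION` header labels are assumptions made for this example; only the directory layout, the header fields WHAT/WHEN/HOW/SOURCE, and the INDEX columns come from the text above.

```shell
# Hypothetical helper; name, arguments and extra header labels are assumptions.
LEARNED=/workspace/memory/projects/openclaw-learned

# save_pointer NAME VAULT_PATH FIELD CONSUMER ROTATION
# Records WHERE the secret lives. It never reads or stores the value.
save_pointer() {
  name="$1"; vault_path="$2"; field="$3"; consumer="$4"; rotation="$5"
  file="credentials/$name.md"
  today="$(date +%F)"
  mkdir -p "$LEARNED/credentials"
  {
    printf '# %s (pointer only, no value)\n' "$name"
    printf 'WHAT: credential pointer\n'
    printf 'WHEN: %s\n' "$today"
    printf 'HOW: devvm-claude\n'
    printf 'SOURCE: %s (field: %s)\n' "$vault_path" "$field"
    printf "FETCH: ssh devvm 'vault kv get -field=%s %s'\n" "$field" "$vault_path"
    printf 'CONSUMER: %s\n' "$consumer"
    printf 'ROTATION: %s\n' "$rotation"
  } > "$LEARNED/$file"
  # One row per saved item: Task | Type | Path | Source | Added
  printf '| %s | credential | %s | devvm-claude | %s |\n' \
    "$name" "$file" "$today" >> "$LEARNED/INDEX.md"
}
```

A call might look like `save_pointer slack-webhook secret/openclaw slack_webhook scripts/slack-post.md "rotate on leak"`, where the field name and rotation note are made up. The resulting file contains only the Vault path and the fetch command, so it is safe to keep on NFS.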
## devvm — wizard@10.0.10.10 (pre-wired, zero-config)
SSH key at ~/.ssh/id_rsa, host pre-trusted, `ssh devvm`
Just Works. No password prompts, no host-trust prompts.
Devvm has: Vault token, kubectl cluster-admin, git repos
under /home/wizard/code, git-crypt, claude 2.1.126 at
/usr/local/bin/claude.
<!-- END openclaw-devvm-section v4 -->
TOOLS_EOF
# Insert at top: after first non-blank/non-heading lines.
# If the file starts with "# TOOLS.md", inject right after.
awk '
  !inserted && /^#/ {
    print
    print ""
    while ((getline line < "/tmp/devvm-section.md") > 0) print line
    close("/tmp/devvm-section.md")
    inserted=1
    next
  }
  { print }
  END {
    if (!inserted) {
      while ((getline line < "/tmp/devvm-section.md") > 0) print line
      close("/tmp/devvm-section.md")
    }
  }
' "$TOOLS.stripped" > "$TOOLS"
rm -f "$TOOLS.stripped" /tmp/devvm-section.md
chown 1000:1000 "$TOOLS"
# ---- 3. Memory-indexed learned/ scaffold --------------------
LEARNED=/workspace/memory/projects/openclaw-learned
mkdir -p "$LEARNED/scripts" "$LEARNED/knowledge" "$LEARNED/credentials"
chmod 0755 "$LEARNED/credentials"  # pointers only, not secrets
if [ ! -f "$LEARNED/INDEX.md" ]; then
  cat > "$LEARNED/INDEX.md" <<'INDEX_EOF'
# openclaw-learned — index
Things I've figured out (via devvm-claude or self). Check
here FIRST; `memory_recall "<topic>"` also surfaces these.
| Task | Type | Path | Source | Added |
|------|------|------|--------|-------|
| _example: post to slack_ | script | scripts/slack-post.md | devvm-claude | 2026-05-22 |
## Layout
- `scripts/<task>.md`: runnable recipes
- `knowledge/<topic>.md`: decisions, paths, gotchas
- `credentials/<name>.md`: POINTERS to Vault, never values
## When you save something new
2026-05-22 11:12:33 +00:00
2026-05-22 11:20:54 +00:00
1. Drop the file in the right slot above.
2. Header: WHAT, WHEN learned, HOW (verbatim devvm prompt
   or "self"), SOURCE (Vault path if a credential).
3. Add a row to this INDEX.
4. (Optional) `node /app/openclaw.mjs memory index --force`
   to make it immediately searchable; the daily memory-sync
   CronJob re-indexes anyway.
2026-05-22 11:12:33 +00:00
2026-05-22 11:20:54 +00:00
See the `learn-from-tasks` skill for the full protocol.
2026-05-22 11:12:33 +00:00
INDEX_EOF
fi
2026-05-22 11:20:54 +00:00
chown -R 1000:1000 "$LEARNED"
# Migrate v2 scaffold at /workspace/learned/ into the new
# memory-indexed location. Only move actual content — if
# the directory is still empty (no learnings saved yet),
# remove it so the agent isn't confused by two locations.
if [ -d /workspace/learned ]; then
  for sub in scripts knowledge credentials; do
    if [ -d "/workspace/learned/$sub" ]; then
      for f in "/workspace/learned/$sub"/*; do
        [ -e "$f" ] || continue
        mv "$f" "$LEARNED/$sub/" 2>/dev/null || true
      done
    fi
  done
  # INDEX.md rows are not merged automatically: leave both
  # files for now and let the agent clean up by hand. Then
  # remove the v2 dir if it is empty.
  rmdir /workspace/learned/scripts /workspace/learned/knowledge \
    /workspace/learned/credentials /workspace/learned \
    2>/dev/null || true
fi
2026-05-22 11:12:33 +00:00
2026-05-22 13:18:52 +00:00
# ---- 5. Skill at cc-skills/learn-from-tasks/ ----------------
2026-05-22 11:12:33 +00:00
SKILL_DIR=/home/node/.openclaw/cc-skills/learn-from-tasks
mkdir -p "$SKILL_DIR"
cat > "$SKILL_DIR/SKILL.md" << 'SKILL_EOF'
# learn-from-tasks
2026-05-22 13:18:52 +00:00
**This is universal. EVERY task runs through this flow,
not just hard ones, not just unfamiliar ones. The save at
the end is mandatory.**
> know → ask devvm → (rarely) try yourself → SAVE the recipe
2026-05-22 11:20:54 +00:00
2026-05-22 13:18:52 +00:00
The goal isn't only to answer the current task; it is
ALSO to leave a trail so future-you doesn't need to
redo the discovery. Every saved recipe compounds. Skip
the save and you stay at zero.
2026-05-22 11:12:33 +00:00
2026-05-22 11:20:54 +00:00
## Triage every new task
2026-05-22 11:12:33 +00:00
2026-05-22 11:20:54 +00:00
1. **Do I know how?**
2026-05-22 13:18:52 +00:00
   `memory_recall "<topic>"` and scan
2026-05-22 11:20:54 +00:00
   `/workspace/memory/projects/openclaw-learned/INDEX.md`.
   If yes → just do it.
2026-05-22 13:18:52 +00:00
2. **If not, ASK DEVVM.** This is the default path: most
   tasks land here.
2026-05-22 11:12:33 +00:00
     ssh devvm 'claude -p --dangerously-skip-permissions \
2026-05-22 13:18:52 +00:00
       "I am the openclaw agent. I need to do X. Can \
       you do this? If yes: do it now, then tell me 1) \
       the exact commands I would run on my own next \
       time, and 2) what credentials/paths I need (give \
       Vault paths, NOT values). I will save your \
       answer."'
2026-05-22 11:20:54 +00:00
   For multi-step work (>2 min), use async:
2026-05-22 11:12:33 +00:00
     ssh devvm openclaw-task new <id> "<command>"
2026-05-22 11:20:54 +00:00
     ssh devvm openclaw-task claude <id> "<prompt>"
     ssh devvm openclaw-task capture/log/status
2026-05-22 11:12:33 +00:00
2026-05-22 11:20:54 +00:00
   When devvm-claude responds, **save what it told you**
2026-05-22 13:18:52 +00:00
   (see "The save" below).
2026-05-22 11:20:54 +00:00
3. **Only if devvm-claude can't either**, try in-pod. This
   is rare. Be honest with the user about the uncertainty.
   If you find a way, save it just like step 2.
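
The step 1 versus step 2 decision can be sketched in shell. This is a minimal sketch, not the agent's actual implementation: the `triage` helper is hypothetical, it only greps `INDEX.md` (it does not call `memory_recall`), and it assumes `LEARNED` points at the learnings tree.

```shell
# Hypothetical helper: triage <topic>
# Prints which branch of the flow applies, based only on INDEX.md.
triage() {
  topic=$1
  # Assumption: fall back to the documented learnings tree when unset.
  [ -n "$LEARNED" ] || LEARNED=/workspace/memory/projects/openclaw-learned
  if [ -f "$LEARNED/INDEX.md" ] && grep -qiF -- "$topic" "$LEARNED/INDEX.md"; then
    echo "known: follow the recipe listed in INDEX.md"
  else
    echo "unknown: ask devvm-claude, then save the answer"
  fi
}
```

A call such as `triage restart-gateway` (a made-up topic) reports the branch only; running the recipe or asking devvm stays a separate, deliberate step.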
2026-05-22 13:18:52 +00:00
## The save — MANDATORY on every non-trivial task
2026-05-22 11:20:54 +00:00
2026-05-22 13:18:52 +00:00
Don't end the task without it. Even if you figured it out
yourself, even if it was easy this time, recurrence makes
it valuable. The exceptions list at the end is tight; bias
aggressively toward saving.
2026-05-22 11:20:54 +00:00
2026-05-22 13:18:52 +00:00
All learnings live under
2026-05-22 11:20:54 +00:00
`/workspace/memory/projects/openclaw-learned/` because
2026-05-22 13:18:52 +00:00
memory-core indexes that tree.
2026-05-22 11:12:33 +00:00
1. Pick the slot:
2026-05-22 13:18:52 +00:00
   - **Script / recipe** → `scripts/<task>.md`
     Fenced code block(s); agent reads + runs from here.
   - **Knowledge** → `knowledge/<topic>.md`
     Decisions, paths, conventions, gotchas, anti-patterns.
   - **Credential** → `credentials/<name>.md`
     **POINTER ONLY, never the value.** Vault path, field,
     fetch command, consumer, and rotation expectations.
2026-05-22 11:20:54 +00:00
2. Header in the file: WHAT, WHEN learned, HOW (verbatim
   devvm prompt, or "self"), SOURCE (Vault path if cred).
2026-05-22 13:18:52 +00:00
3. Add a row to `openclaw-learned/INDEX.md`.
4. Test the saved recipe end-to-end. If it doesn't work
   as-saved, the artifact is a lie: fix it before you
   consider the task done.
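
For a credential, steps 1 to 3 might look like the sketch below. Treat it as an illustration under assumptions: the `save_pointer` helper, the exact header layout, and the INDEX row format are all hypothetical, and `secret/bar` with field `foo` are placeholder values. Only the Vault path and fetch command are written, never the secret.

```shell
# Hypothetical helper: save_pointer <name> <vault_path> <field> <consumer>
# Writes a pointer-only credential file and appends a row to INDEX.md.
save_pointer() {
  name=$1; vault_path=$2; field=$3; consumer=$4
  # Assumption: fall back to the documented learnings tree when unset.
  [ -n "$LEARNED" ] || LEARNED=/workspace/memory/projects/openclaw-learned
  mkdir -p "$LEARNED/credentials"
  file="$LEARNED/credentials/$name.md"
  {
    printf '# %s\n' "$name"
    printf 'WHAT: pointer to the credential used by %s\n' "$consumer"
    printf 'WHEN: %s\n' "$(date -u +%Y-%m-%d)"
    printf 'HOW: self\n'
    printf 'SOURCE: Vault path %s, field %s\n' "$vault_path" "$field"
    # The fetch command is recorded as text; it is not executed here.
    printf "FETCH: ssh devvm 'vault kv get -field=%s %s'\n" "$field" "$vault_path"
  } > "$file"
  # Assumed INDEX row format: one bullet per saved file.
  printf '%s\n' "- credentials/$name.md: Vault pointer for $consumer" >> "$LEARNED/INDEX.md"
}
```

For example, `save_pointer demo secret/bar foo "demo consumer"` would create `credentials/demo.md` and one INDEX row. Rotation expectations still need to be added by hand, since the sketch has no way to know them.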
## After every task, ask yourself
2026-05-22 11:20:54 +00:00
2026-05-22 13:18:52 +00:00
- Could a future-me (or another agent) do this faster by
  reading what I just figured out?
- Was there any non-obvious URL / Vault path / quirk?
- Did the first attempt fail and need a tweak?
2026-05-22 11:20:54 +00:00
2026-05-22 13:18:52 +00:00
If yes to any → save it. The bar is low.
## When NOT to save
Very narrow exceptions:
- Trivial one-liners (`date`, `whoami`) that take zero
time to redo.
2026-05-22 11:20:54 +00:00
- Things that change every run (ephemeral pod names,
random tokens, timestamps).
2026-05-22 13:18:52 +00:00
- Values of credentials (use the pointer pattern instead).
That's it. Everything else, save.
2026-05-22 11:12:33 +00:00
SKILL_EOF
chown -R 1000:1000 "$SKILL_DIR"
2026-05-22 13:18:52 +00:00
echo "learning-loop v4 seeded:"
2026-05-22 11:12:33 +00:00
echo "  - memory note: $DIR/devvm-fallback.md"
2026-05-22 13:18:52 +00:00
echo "  - SOUL.md: learning-as-identity marker section"
echo "  - TOOLS.md v4: flow section INSERTED AT TOP"
2026-05-22 11:20:54 +00:00
echo "  - openclaw-learned/ at $LEARNED"
2026-05-22 11:12:33 +00:00
echo "  - skill: $SKILL_DIR/SKILL.md"
2026-05-22 10:20:00 +00:00
EOT
]
volume_mount {
name = "workspace"
mount_path = "/workspace"
}
2026-05-22 11:12:33 +00:00
volume_mount {
name = "openclaw-home"
mount_path = "/home/node/.openclaw"
}
2026-05-22 10:20:00 +00:00
resources {
requests = { cpu = "10m", memory = "32Mi" }
2026-05-22 11:12:33 +00:00
limits = { memory = "64Mi" }
2026-05-22 10:20:00 +00:00
}
}
2026-02-22 15:13:55 +00:00
# Main container: OpenClaw
container {
ingress_factory: replace `protected` bool with `auth` enum + audit pass across 100 stacks
Phase 3+4 of default-deny ingress plan. Replaces the `protected = bool` (default
false → unprotected) variable in `modules/kubernetes/ingress_factory` with
`auth = string` enum (default "required" → fail-closed). Touches every
ingress_factory caller so the audit decision is recorded explicitly in code.
ingress_factory (Phase 3):
- `auth = "required"`: standard Authentik forward-auth (the legacy
`protected = true` semantic).
- `auth = "public"`: forward-auth via the new `authentik-forward-auth-public`
middleware → dedicated public outpost → guest auto-bind. Logged-in users
keep their real identity.
- `auth = "none"`: no Authentik middleware. For Anubis-fronted content, native
client APIs (Git, /v2/, WebDAV), webhook receivers, the Authentik outpost
itself.
- `effective_anti_ai` default flips ON only when `auth = "none"` (auth-gated
ingresses don't need anti-AI noise; the auth flow already discourages bots).
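A minimal sketch of how the fail-closed enum described above could be declared, assuming a plain string variable guarded by a validation block. Only the variable name, its default, and the three allowed values come from this message; the description and error text are illustrative.

```hcl
# Hypothetical declaration inside modules/kubernetes/ingress_factory.
variable "auth" {
  type        = string
  default     = "required" # fail closed when a caller omits the argument
  description = "Authentik mode for this ingress: required, public, or none."

  validation {
    # Reject anything outside the enum at plan time.
    condition     = contains(["required", "public", "none"], var.auth)
    error_message = "The auth value must be one of: required, public, none."
  }
}
```

With a declaration like this, a typo such as `auth = "publc"` should fail during `plan` rather than silently producing an unprotected ingress.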
Audit pass (Phase 4) across 96 ingress_factory call sites:
- 49 explicit `protected = true` → `auth = "required"`
- 8 explicit `protected = false` → `auth = "none"` (5) or `auth = "public"` (3)
- 64 previously-default (no protected line) → `auth = "required"` ADDED, then
reviewed individually:
* 9 Anubis-fronted (blog, www, kms, travel, f1, cyberchef, jsoncrack,
homepage, wrongmove UI, privatebin) → `auth = "none"`
* 22 native-client / programmatic surfaces (Forgejo Git+/v2/, webhook
handler, claude-memory MCP, Nextcloud WebDAV, Matrix, Vault CLI/OIDC,
xray VPN, ntfy, woodpecker webhooks, n8n triggers, ntfy push, dawarich
location ingestion, immich frame kiosk, headscale CP, send anonymous
drops, rybbit beacon, vaultwarden API, Authentik UI itself + outposts) →
`auth = "none"`
* Remaining ~33 → `auth = "required"` confirmed (admin tools, internal
UIs, services without app-level auth)
- Smoke-test promotions to `auth = "public"`: fire-planner public UI,
k8s-portal API, insta2spotify callback.
Three call sites in wrapper modules (`stacks/freedify/factory/`,
`stacks/reverse-proxy/modules/reverse_proxy/`) keep their internal `protected`
bool — they translate to `auth` internally, out of scope for this rename.
Behavior change: previously-default ingresses now fail closed (require
Authentik login) unless explicitly flipped to `auth = "none"` or
`auth = "public"`. This is the audit goal — no more accidentally-unprotected
surfaces. Sites that were intentionally public (Anubis content, native APIs,
webhooks) are now explicitly recorded as `auth = "none"`.
Drive-by: `modules/create-vm/main.tf` picked up cosmetic alignment via
`terraform fmt -recursive` during the audit. Behavior-neutral.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 18:53:49 +00:00
name = "openclaw"
image = "ghcr.io/openclaw/openclaw:2026.5.4"
2026-05-16 14:01:46 +00:00
# Startup sequence:
# 1. doctor --fix: repair sessions/state (also resets some config)
# 2. models set: pin nim/meta/llama-3.1-70b-instruct (doctor auto-promotes to gpt-5-pro otherwise)
# 3. mcp set <name>: register MCP servers via the CLI (the
# ConfigMap-baked mcp.servers block gets
# stripped by doctor --fix, but CLI-written
# entries persist). Values: ha URL from
# $HA_SOFIA_MCP_URL env (Vault-sourced),
# others hard-coded.
# 4. gateway: exec into the gateway process
command = ["sh", "-c", <<-EOC
2026-05-22 10:20:00 +00:00
# Symlink /home/node/.ssh → persistent .ssh so the ssh client
# finds id_rsa/config/known_hosts via $HOME/.ssh. HOME is
# /home/node (image overlay), .ssh files live on the PVC
# at /home/node/.openclaw/.ssh (set up by init 5).
ln -sfn /home/node/.openclaw/.ssh /home/node/.ssh
2026-05-16 14:01:46 +00:00
node openclaw.mjs doctor --fix 2>/dev/null
2026-05-22 15:23:17 +00:00
node openclaw.mjs models set nim/meta/llama-3.1-70b-instruct 2>/dev/null
2026-05-16 14:01:46 +00:00
node openclaw.mjs mcp set ha "{\"url\":\"$HA_SOFIA_MCP_URL\",\"transport\":\"streamable-http\"}" 2>/dev/null
node openclaw.mjs mcp set context7 '{"command":"npx","args":["-y","@upstash/context7-mcp"]}' 2>/dev/null
node openclaw.mjs mcp set playwright '{"url":"http://localhost:3000/mcp","transport":"streamable-http"}' 2>/dev/null
2026-05-20 21:56:11 +00:00
# doctor --fix overwrites plugins.allow with its bundled-plugins
# list. Re-add our third-party plugin to the allow list via
# `config patch`, then enable it. (Same pattern as mcp set above.)
echo '{"plugins":{"allow":["memory-core","recruiter-api","telegram","openrouter","brave","openai","codex"]}}' \
| node openclaw.mjs config patch --stdin 2>/dev/null || true
node openclaw.mjs plugins enable recruiter-api 2>/dev/null || true
2026-05-22 10:20:00 +00:00
# Reindex memory-core so the seeded devvm-fallback note (and
# anything else dropped under /workspace/memory/) is searchable
# on first boot; daily memory-sync CronJob also keeps it indexed.
node openclaw.mjs memory index --force 2>/dev/null || true
2026-05-16 14:01:46 +00:00
exec node openclaw.mjs gateway --allow-unconfigured --bind lan
EOC
]
2026-02-22 15:13:55 +00:00
port {
container_port = 18789
}
2026-03-01 14:44:22 +00:00
readiness_probe {
tcp_socket {
port = 18789
}
initial_delay_seconds = 30
period_seconds = 10
}
2026-03-14 23:42:08 +00:00
env {
name = "NODE_OPTIONS"
value = "--max-old-space-size=1536"
}
2026-02-22 15:13:55 +00:00
env {
name = "OPENCLAW_GATEWAY_TOKEN"
value = random_password.gateway_token.result
}
env {
2026-05-22 10:20:00 +00:00
name = "PATH"
# Host-tools bundle (installed by init 4: install-host-tools)
# comes first so ssh/scp/dig/vault/jq/etc. resolve to the
# extracted Debian binaries + the static-binary downloads.
# /bin + /sbin are needed because iputils-ping installs ping
# under /bin (not /usr/bin) on Debian.
value = "/tools/host-tools/root/usr/bin:/tools/host-tools/root/usr/sbin:/tools/host-tools/root/bin:/tools/host-tools/root/sbin:/tools/host-tools/bin:/tools:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
}
env {
# Point ld.so at the bundled libs so the host-tools binaries
# find their shared-lib deps (libgssapi_krb5, libkrb5, etc.).
# Both base images are bookworm so the libs match the
# openclaw image's libc/libssl — no ABI conflicts expected.
name = "LD_LIBRARY_PATH"
value = "/tools/host-tools/root/usr/lib/x86_64-linux-gnu:/tools/host-tools/root/lib/x86_64-linux-gnu"
2026-02-22 15:13:55 +00:00
}
env {
name = "TF_VAR_prod"
value = "true"
}
env {
name = "KUBECONFIG"
value = "/home/node/.openclaw/kubeconfig"
}
env {
name = "GIT_CONFIG_GLOBAL"
value = "/home/node/.openclaw/.gitconfig"
}
# Skill secrets - Home Assistant
env {
name = "HOME_ASSISTANT_URL"
value = "https://ha-london.viktorbarzin.me"
}
env {
name = "HOME_ASSISTANT_TOKEN"
2026-03-14 17:15:48 +00:00
value = local.skill_secrets["home_assistant_token"]
2026-02-22 15:13:55 +00:00
}
env {
name = "HOME_ASSISTANT_SOFIA_URL"
value = "https://ha-sofia.viktorbarzin.me"
}
env {
name = "HOME_ASSISTANT_SOFIA_TOKEN"
2026-03-14 17:15:48 +00:00
value = local.skill_secrets["home_assistant_sofia_token"]
2026-02-22 15:13:55 +00:00
}
2026-05-16 14:01:46 +00:00
# MCP URL for ha-mcp add-on on ha-sofia (secret-path auth).
# Consumed in the startup command by `openclaw mcp set ha ...`.
env {
name = "HA_SOFIA_MCP_URL"
value = data.vault_kv_secret_v2.secrets.data["ha_sofia_mcp_url"]
}
2026-02-22 15:13:55 +00:00
# Skill secrets - Uptime Kuma
env {
name = "UPTIME_KUMA_PASSWORD"
2026-03-14 17:15:48 +00:00
value = local.skill_secrets["uptime_kuma_password"]
2026-02-22 15:13:55 +00:00
}
2026-03-14 16:01:41 +00:00
# Memory API
env {
name = "MEMORY_API_URL"
value = "http://claude-memory.claude-memory.svc.cluster.local"
}
env {
migrate consuming stacks to ESO + remove k8s-dashboard static token
Phase 9: ExternalSecret migration across 26 stacks:
Fully migrated (vault data source removed, ESO delivers secrets):
- speedtest, shadowsocks, wealthfolio, plotting-book, f1-stream, tandoor
- n8n, dawarich, diun, netbox, onlyoffice, tuya-bridge
- hackmd (ESO template for DB URL), health (ESO template for DB URL)
- trading-bot (ESO template for DATABASE_URL + 7 secret env vars)
- forgejo (removed unused vault data source)
Partially migrated (vault kept for plan-time, ESO added for runtime):
- immich, linkwarden, nextcloud, paperless-ngx (jsondecode for homepage)
- claude-memory, rybbit, url, webhook_handler (plan-time in locals/jobs)
- woodpecker, openclaw, resume (plan-time in helm values/jobs/modules)
17 stacks unchanged (all plan-time: homepage annotations, configmaps,
module inputs) — vault data source works with OIDC auth.
Phase 17a: Remove k8s-dashboard static admin token secret.
Users now get tokens via: vault write kubernetes/creds/dashboard-admin
2026-03-15 19:05:04 +00:00
name = "MEMORY_API_KEY"
value_from {
secret_key_ref {
name = "openclaw-secrets"
key = "claude_memory_api_key"
}
}
2026-03-14 16:01:41 +00:00
}
recruiter-responder: deploy stack + llama-cpp qwen3-8b + openclaw plugin mount
Three coupled changes for the new recruiter-responder pipeline:
1. stacks/llama-cpp/: add qwen3-8b text-only model to llama-swap. Uses
unsloth/Qwen3-8B-GGUF Q4_K_M, 16k context, no mmproj. Refactored the
download Job script + cmd renderer to handle text_only=true (skip
mmproj download + --mmproj flag). The 3 existing vision models stay
on text_only=false; no behaviour change for them.
2. stacks/recruiter-responder/: new stack. Namespace, 2 ExternalSecrets
(app secrets from secret/recruiter-responder, DB creds from Vault DB
engine static-creds/pg-recruiter-responder), Deployment (replicas=1,
Recreate -- IMAP IDLE + APScheduler want single leader), Service
ClusterIP. Image: forgejo.viktorbarzin.me/viktor/recruiter-responder.
3. stacks/openclaw/: add init container `install-recruiter-plugin` that
uses the recruiter-responder image to copy the .mjs plugin into
/home/node/.openclaw/extensions/recruiter-api/ on NFS. Couples plugin
version to the recruiter-responder image tag. Also injects
RECRUITER_RESPONDER_URL + RECRUITER_RESPONDER_TOKEN env vars (token
from openclaw-secrets.recruiter_responder_bearer_token, optional).
Pre-apply checklist for recruiter-responder stack:
- Vault: seed secret/recruiter-responder with webhook_bearer_token,
imap_{me,spam}_{user,pass}, smtp_password, claude_agent_token,
task_webhook_token.
- Vault: add secret/openclaw.recruiter_responder_bearer_token (same as
above webhook_bearer_token).
- dbaas: create DB recruiter_responder + role recruiter_responder,
and Vault DB-engine role static-creds/pg-recruiter-responder.
- Build + push image via Woodpecker (recruiter-responder repo CI).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-15 22:38:53 +00:00
# Recruiter Responder API — consumed by the recruiter-api plugin
# (mounted into /home/node/.openclaw/extensions/recruiter-api/ via
# the install-recruiter-plugin init container below).
env {
name = "RECRUITER_RESPONDER_URL"
value = "http://recruiter-responder.recruiter-responder.svc.cluster.local:8080"
}
env {
name = "RECRUITER_RESPONDER_TOKEN"
value_from {
secret_key_ref {
name = "openclaw-secrets"
key = "recruiter_responder_bearer_token"
optional = true
}
}
}
2026-05-20 21:10:56 +00:00
# Telegram chat ID for the recruiter-api plugin's announcement loop.
env {
name = "VIKTOR_CHAT_ID"
value_from {
secret_key_ref {
name = "openclaw-secrets"
key = "viktor_chat_id"
optional = true
}
}
}
2026-02-22 15:13:55 +00:00
# Python packages path for skills
env {
name = "PYTHONPATH"
value = "/tools/python-libs"
}
volume_mount {
name = "tools"
mount_path = "/tools"
}
volume_mount {
name = "workspace"
mount_path = "/workspace"
}
volume_mount {
name = "data"
mount_path = "/data"
}
volume_mount {
name = "ssh-key"
mount_path = "/ssh"
}
volume_mount {
name = "openclaw-home"
mount_path = "/home/node/.openclaw"
}
resources {
limits = {
2026-03-14 23:42:08 +00:00
memory = "2Gi"
2026-02-22 15:13:55 +00:00
}
requests = {
2026-03-01 14:44:22 +00:00
cpu = "100m"
2026-03-15 15:30:18 +00:00
memory = "2Gi"
2026-02-22 15:13:55 +00:00
}
}
}
2026-03-22 02:56:04 +02:00
# Sidecar: playwright-mcp — headless browser for agents
container {
name = "playwright-mcp"
image = "docker.io/viktorbarzin/playwright-mcp:v1"
args = ["--headless", "--browser", "chromium", "--no-sandbox", "--port", "3000", "--host", "0.0.0.0"]
port {
container_port = 3000
}
resources {
requests = {
cpu = "50m"
memory = "256Mi"
}
limits = {
memory = "512Mi"
}
}
}
openclaw: realtime usage dashboard via Prometheus exporter sidecar
Stdlib-only Python exporter ($1) reads ~/.openclaw/agents/*/sessions/*.jsonl
(assistant messages with usage) plus auth-profiles.json (OAuth expiry,
Plus-tier label) and exposes Prometheus text format on :9099/metrics.
Container is python:3.12-slim; pod template gets prometheus.io/scrape
annotations so the existing kubernetes-pods job picks it up — no
ServiceMonitor needed.
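As a rough illustration of the scrape annotations mentioned above, the pod template could carry something like the fragment below. Only `prometheus.io/scrape` is named in this message; the `prometheus.io/port` and `prometheus.io/path` keys are the conventional companions and are an assumption here, since they only take effect if the `kubernetes-pods` job's relabeling honours them.

```hcl
# Fragment of a pod template inside a kubernetes_deployment (sketch only).
template {
  metadata {
    annotations = {
      "prometheus.io/scrape" = "true"
      "prometheus.io/port"   = "9099"     # exporter port from the description above
      "prometheus.io/path"   = "/metrics" # assumed; matches the :9099/metrics endpoint
    }
  }
}
```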
Metrics exported:
openclaw_codex_messages_total{provider,model,session_kind} counter
openclaw_codex_input/output/cache_read/cache_write_tokens_total
openclaw_codex_message_errors_total{reason}
openclaw_codex_active_sessions{kind} gauge
openclaw_codex_oauth_expiry_seconds{provider,account,plan} gauge
openclaw_codex_last_run_timestamp gauge
Grafana dashboard "OpenClaw — Codex Usage" (Applications folder, 30s
refresh): messages/5h vs Plus rate-card, % of 1,200 floor, tokens/5h,
cache hit %, OAuth expiry days, active sessions, last-turn age, errors,
plus per-model timeseries + bar gauge + error table.
Plus rate-card thresholds in the gauge are conservative (1,200/5h floor;
real cap is dynamic 1,200–7,000). Re-baseline if throttling shows up
below 80%.
2026-05-07 09:04:25 +00:00
# Sidecar: openclaw-exporter — Prometheus exporter for Codex/OAuth usage.
# Reads sessions JSONL files + auth-profiles.json, exposes /metrics on :9099.
# Stdlib-only Python; no pip install at startup.
container {
name = "openclaw-exporter"
image = "docker.io/library/python:3.12-slim"
command = ["python3", "/scripts/exporter.py"]
port {
container_port = 9099
name = "metrics"
}
env {
name = "OPENCLAW_HOME"
value = "/home/node/.openclaw"
}
env {
name = "METRICS_PORT"
value = "9099"
}
volume_mount {
name = "openclaw-exporter-script"
mount_path = "/scripts"
read_only = true
}
volume_mount {
name = "openclaw-home"
mount_path = "/home/node/.openclaw"
read_only = true
}
readiness_probe {
http_get {
path = "/healthz"
port = 9099
}
initial_delay_seconds = 5
period_seconds = 30
}
resources {
requests = {
cpu = "10m"
memory = "64Mi"
}
limits = {
memory = "128Mi"
}
}
}
2026-03-01 15:57:31 +00:00
# Sidecar: modelrelay — auto-routes to fastest healthy free model
container {
name = "modelrelay"
2026-03-15 10:37:58 +00:00
image = "docker.io/library/node:22-alpine"
2026-03-01 15:57:31 +00:00
command = ["sh", "-c", <<-EOF
if [ ! -f /tools/modelrelay/node_modules/.package-lock.json ]; then
mkdir -p /tools/modelrelay
cd /tools/modelrelay
npm init -y > /dev/null 2>&1
npm install modelrelay > /dev/null 2>&1
fi
cd /tools/modelrelay
exec npx modelrelay --port 7352
EOF
]
port {
container_port = 7352
}
env {
2026-03-15 19:05:04 +00:00
name = "NVIDIA_API_KEY"
value_from {
secret_key_ref {
name = "openclaw-secrets"
key = "nvidia_api_key"
}
}
2026-03-01 15:57:31 +00:00
}
env {
2026-03-15 19:05:04 +00:00
name = "OPENROUTER_API_KEY"
value_from {
secret_key_ref {
name = "openclaw-secrets"
key = "openrouter_api_key"
}
}
2026-03-01 15:57:31 +00:00
}
volume_mount {
name = "tools"
mount_path = "/tools"
}
resources {
limits = {
right-size memory: set requests=limits based on actual usage
- Set memory requests = limits across 56 stacks to prevent overcommit
- Right-sized limits based on actual pod usage (2x actual, rounded up)
- Scaled down trading-bot (replicas=0) to free memory
- Fixed OOMKilled services: forgejo, dawarich, health, meshcentral,
paperless-ngx, vault auto-unseal, rybbit, whisper, openclaw, clickhouse
- Added startup+liveness probes to calibre-web
- Bumped inotify limits on nodes 2,3 (max_user_instances 128->8192)
Post node2 OOM incident (2026-03-14). Previous kubelet config had no
kubeReserved/systemReserved set, allowing pods to starve the kernel.
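The "memory requests = limits" rule above can be sketched as a `resources` block like the one below. The figures are placeholders rather than values taken from any stack; the point is only that the two memory numbers match, so the scheduler reserves exactly what the container is allowed to use.

```hcl
# Sketch: memory request equals memory limit, so the node is never overcommitted
# on memory for this container. CPU stays request-only, as elsewhere in this file.
resources {
  requests = {
    cpu    = "25m"
    memory = "256Mi"
  }
  limits = {
    memory = "256Mi"
  }
}
```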
2026-03-14 21:01:24 +00:00
memory = "256Mi"
2026-03-01 15:57:31 +00:00
}
requests = {
cpu = "25m"
2026-03-15 10:47:34 +00:00
memory = "128Mi"
2026-03-01 15:57:31 +00:00
}
}
}
2026-02-22 15:13:55 +00:00
volume {
name = "tools"
[ci skip] complete NFS CSI migration: complex stacks + platform modules
Migrate remaining multi-volume stacks and all platform modules from
inline NFS volumes to CSI-backed PV/PVC with nfs-truenas StorageClass
(soft,timeo=30,retrans=3 mount options).
Complex stacks: openclaw (4 vols), immich (8 vols), frigate (2 vols),
nextcloud (2 vols + old PV replaced), rybbit (1 vol)
Remaining stacks: affine, ebook2audiobook, f1-stream, osm_routing,
real-estate-crawler
Platform modules: monitoring (prometheus, loki, alertmanager PVs
converted from native NFS to CSI), redis, dbaas, technitium,
headscale, vaultwarden, uptime-kuma, mailserver, infra-maintenance
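For orientation, a CSI-backed NFS PersistentVolume with the mount options listed above might look roughly like this. The resource name, capacity, share path, and the `nfs.csi.k8s.io` driver name are assumptions for illustration; the stacks themselves wrap this in a module whose interface is not shown here.

```hcl
# Hypothetical standalone PV; the real stacks go through an NFS module instead.
resource "kubernetes_persistent_volume" "example_nfs" {
  metadata {
    name = "example-nfs"
  }
  spec {
    capacity           = { storage = "10Gi" }
    access_modes       = ["ReadWriteMany"]
    storage_class_name = "nfs-truenas"
    # Soft mounts with short timeouts, so a dead NFS server does not hang pods forever.
    mount_options = ["soft", "timeo=30", "retrans=3"]

    persistent_volume_source {
      csi {
        driver        = "nfs.csi.k8s.io" # assumed CSI driver name
        volume_handle = "example-nfs"
        volume_attributes = {
          server = var.nfs_server
          share  = "/mnt/example" # placeholder export path
        }
      }
    }
  }
}
```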
2026-03-02 01:24:07 +00:00
persistent_volume_claim {
truenas deprecation: migrate all non-immich storage to proxmox NFS
- Migrate 7 backup CronJobs to Proxmox host NFS (192.168.1.127)
(etcd, mysql, postgresql, nextcloud, redis, vaultwarden, plotting-book)
- Migrate headscale backup, ebook2audiobook, osm_routing to Proxmox NFS
- Migrate servarr (lidarr, readarr, soulseek) NFS refs to Proxmox
- Remove 79 orphaned TrueNAS NFS module declarations from 49 stacks
- Delete stacks/platform/modules/ (27 dead module copies, 65MB)
- Update nfs-truenas StorageClass to point to Proxmox (192.168.1.127)
- Remove iscsi DNS record from config.tfvars
- Fix woodpecker persistence config and alertmanager PV
Only Immich (8 PVCs, ~1.4TB) remains on TrueNAS.
2026-04-12 14:35:39 +01:00
claim_name = module.nfs_tools_host.claim_name
2026-03-01 13:59:07 +00:00
}
2026-02-22 15:13:55 +00:00
}
volume {
name = "openclaw-home"
2026-03-02 01:24:07 +00:00
persistent_volume_claim {
feat(storage): migrate 38 NFS PVCs to proxmox-lvm (Wave 2)
Add proxmox-lvm PVCs with pvc-autoresizer annotations for all
remaining single-pod app data services. Deployments updated to
use new block storage PVCs. Old NFS modules retained for rollback.
Services: affine, changedetection, diun, excalidraw, f1-stream,
hackmd, isponsorblocktv, matrix, n8n, send, grampsweb, health,
onlyoffice, owntracks, paperless-ngx, privatebin, resume,
speedtest, stirling-pdf, tandoor, rybbit (clickhouse), tor-proxy
(torrserver), whisper+piper, frigate (config), ollama (ui),
servarr (prowlarr/listenarr/qbittorrent), aiostreams, freshrss
(extensions), meshcentral (data+files), openclaw (data+home+
openlobster), technitium, mailserver (data+roundcube html+enigma),
dbaas (pgadmin).
Strategy set to Recreate where needed for RWO volumes.
2026-04-04 19:25:12 +03:00
claim_name = kubernetes_persistent_volume_claim.home_proxmox.metadata[0].name
2026-03-01 16:12:07 +00:00
}
2026-02-22 15:13:55 +00:00
}
volume {
name = "workspace"
2026-03-02 01:24:07 +00:00
persistent_volume_claim {
2026-04-12 14:35:39 +01:00
claim_name = module.nfs_workspace_host.claim_name
2026-02-22 15:13:55 +00:00
}
}
volume {
name = "data"
2026-03-02 01:24:07 +00:00
persistent_volume_claim {
2026-04-04 19:25:12 +03:00
claim_name = kubernetes_persistent_volume_claim.data_proxmox.metadata[0].name
2026-02-22 15:13:55 +00:00
}
}
volume {
name = "ssh-key"
secret {
secret_name = kubernetes_secret.ssh_key.metadata[0].name
default_mode = "0600"
}
}
volume {
name = "openclaw-config"
config_map {
name = kubernetes_config_map.openclaw_config.metadata[0].name
}
}
2026-05-07 09:04:25 +00:00
        volume {
          name = "openclaw-exporter-script"
          config_map {
            name         = kubernetes_config_map.openclaw_exporter.metadata[0].name
            default_mode = "0555"
          }
        }
2026-02-22 15:13:55 +00:00
      }
    }
  }
[infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip]
## Context
Wave 3A (commit c9d221d5) added the `# KYVERNO_LIFECYCLE_V1` marker to the
27 pre-existing `ignore_changes = [...dns_config]` sites so they could be
grepped and audited. It did NOT address pod-owning resources that were
simply missing the suppression entirely. Post-Wave-3A sampling (2026-04-18)
found that navidrome, f1-stream, frigate, servarr, monitoring, crowdsec,
and many other stacks showed perpetual `dns_config` drift every plan
because their `kubernetes_deployment` / `kubernetes_stateful_set` /
`kubernetes_cron_job_v1` resources had no `lifecycle {}` block at all.
Root cause (same as Wave 3A): Kyverno's admission webhook stamps
`dns_config { option { name = "ndots"; value = "2" } }` on every pod's
`spec.template.spec.dns_config` to prevent NxDomain search-domain flooding
(see `k8s-ndots-search-domain-nxdomain-flood` skill). Without `ignore_changes`
on every Terraform-managed pod-owner, Terraform repeatedly tries to strip
the injected field.
## This change
Extends the Wave 3A convention by sweeping EVERY `kubernetes_deployment`,
`kubernetes_stateful_set`, `kubernetes_daemon_set`, `kubernetes_cron_job_v1`,
`kubernetes_job_v1` (+ their `_v1` variants) in the repo and ensuring each
carries the right `ignore_changes` path:
- **kubernetes_deployment / stateful_set / daemon_set / job_v1**:
`spec[0].template[0].spec[0].dns_config`
- **kubernetes_cron_job_v1**:
`spec[0].job_template[0].spec[0].template[0].spec[0].dns_config`
(extra `job_template[0]` nesting — the CronJob's PodTemplateSpec is
one level deeper)
Each injection / extension is tagged `# KYVERNO_LIFECYCLE_V1: Kyverno
admission webhook mutates dns_config with ndots=2` inline so the
suppression is discoverable via `rg 'KYVERNO_LIFECYCLE_V1' stacks/`.
Two insertion paths are handled by a Python pass (`/tmp/add_dns_config_ignore.py`):
1. **No existing `lifecycle {}`**: inject a brand-new block just before the
resource's closing `}`. 108 new blocks on 93 files.
2. **Existing `lifecycle {}` (usually for `DRIFT_WORKAROUND: CI owns image tag`
from Wave 4, commit a62b43d1)**: extend its `ignore_changes` list with the
dns_config path. Handles both inline (`= [x]`) and multiline
(`= [\n x,\n]`) forms; ensures the last pre-existing list item carries
a trailing comma so the extended list is valid HCL. 34 extensions.
The script skips anything already mentioning `dns_config` inside an
`ignore_changes`, so re-running is a no-op.
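The extension path can be sketched as follows. This is not the actual `/tmp/add_dns_config_ignore.py`, which is not shown in this commit; the function names are hypothetical and the sketch assumes the `ignore_changes` list contains no comments. It picks the path by resource type, handles inline and multiline lists, and returns the input unchanged when `dns_config` is already present, which is what makes a re-run a no-op.

```python
import re

DNS_PATH = "spec[0].template[0].spec[0].dns_config"
CRON_DNS_PATH = "spec[0].job_template[0].spec[0].template[0].spec[0].dns_config"


def dns_path_for(resource_type):
    """CronJobs nest the pod template one level deeper than other pod owners."""
    if resource_type.startswith("kubernetes_cron_job"):
        return CRON_DNS_PATH
    return DNS_PATH


def extend_ignore_changes(hcl, path):
    """Append `path` to the first ignore_changes list found in `hcl`."""
    match = re.search(r"ignore_changes\s*=\s*\[", hcl)
    if match is None:
        return hcl
    start = match.end()
    # Walk to the matching "]" by depth, because items such as spec[0]
    # contain brackets of their own.
    depth, pos = 1, start
    while pos < len(hcl) and depth:
        if hcl[pos] == "[":
            depth += 1
        elif hcl[pos] == "]":
            depth -= 1
        pos += 1
    if depth:
        return hcl  # unbalanced brackets: leave the text alone
    end = pos - 1  # index of the closing bracket
    body = hcl[start:end]
    if "dns_config" in body:
        return hcl  # already suppressed, so re-running changes nothing
    if "\n" in body:
        items = body.rstrip()
        tail = body[len(items):]  # whitespace before the closing bracket
        if items.strip() and not items.endswith(","):
            items += ","  # keep the extended list valid HCL
        indent = re.search(r"\n([ \t]*)\S", body)
        pad = indent.group(1) if indent else "  "
        new_body = items + "\n" + pad + path + "," + tail
    else:
        items = body.strip().rstrip(",").rstrip()
        new_body = (items + ", " if items else "") + path
    return hcl[:start] + new_body + hcl[end:]
```

For example, `extend_ignore_changes("ignore_changes = [a[0].image]", DNS_PATH)` yields a list with both entries, and passing that result back in returns it unchanged.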
## Scale
- 142 total lifecycle injections/extensions
- 93 `.tf` files touched
- 108 brand-new `lifecycle {}` blocks + 34 extensions of existing ones
- Every Tier 0 and Tier 1 stack with a pod-owning resource is covered
- Together with Wave 3A's 27 pre-existing markers → **169 greppable
`KYVERNO_LIFECYCLE_V1` dns_config sites across the repo**
## What is NOT in this change
- `stacks/trading-bot/main.tf` — entirely commented-out block (`/* … */`).
Python script touched the file, reverted manually.
- `_template/main.tf.example` skeleton — kept minimal on purpose; any
future stack created from it should either inherit the Wave 3A one-line
form or add its own on first `kubernetes_deployment`.
- `terraform fmt` fixes to pre-existing alignment issues in meshcentral,
nvidia/modules/nvidia, vault — unrelated to this commit. Left for a
separate fmt-only pass.
- Non-pod resources (`kubernetes_service`, `kubernetes_secret`,
`kubernetes_manifest`, etc.) — they don't own pods so they don't get
Kyverno dns_config mutation.
## Verification
Random sample post-commit:
```
$ cd stacks/navidrome && ../../scripts/tg plan → No changes.
$ cd stacks/f1-stream && ../../scripts/tg plan → No changes.
$ cd stacks/frigate && ../../scripts/tg plan → No changes.
$ rg -c 'KYVERNO_LIFECYCLE_V1' stacks/ -g '*.tf' -g '*.tf.example' \
| awk -F: '{s+=$2} END {print s}'
169
```
## Reproduce locally
1. `git pull`
2. `rg 'KYVERNO_LIFECYCLE_V1' stacks/ | wc -l` → 169+
3. `cd stacks/navidrome && ../../scripts/tg plan` → expect 0 drift on
the deployment's dns_config field.
Refs: code-seq (Wave 3B dns_config class closed; kubernetes_manifest
annotation class handled separately in 8d94688d for tls_secret)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 21:19:48 +00:00
  lifecycle {
    # KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
    ignore_changes = [spec[0].template[0].spec[0].dns_config]
  }
2026-02-22 15:13:55 +00:00
}
resource "kubernetes_service" "openclaw" {
  metadata {
    name      = "openclaw"
    namespace = kubernetes_namespace.openclaw.metadata[0].name
    labels = {
      app = "openclaw"
    }
  }
  spec {
    selector = {
      app = "openclaw"
    }
    port {
      port        = 80
      target_port = 18789
    }
  }
}
module "ingress" {
  source = "../../modules/kubernetes/ingress_factory"
2026-04-16 13:45:04 +00:00
  dns_type = "non-proxied"
2026-02-22 15:13:55 +00:00
  namespace       = kubernetes_namespace.openclaw.metadata[0].name
  name            = "openclaw"
  tls_secret_name = var.tls_secret_name
  port            = 80
ingress_factory: replace `protected` bool with `auth` enum + audit pass across 100 stacks
Phase 3+4 of default-deny ingress plan. Replaces the `protected = bool` (default
false → unprotected) variable in `modules/kubernetes/ingress_factory` with
`auth = string` enum (default "required" → fail-closed). Touches every
ingress_factory caller so the audit decision is recorded explicitly in code.
ingress_factory (Phase 3):
- `auth = "required"`: standard Authentik forward-auth (the legacy
`protected = true` semantic).
- `auth = "public"`: forward-auth via the new `authentik-forward-auth-public`
middleware → dedicated public outpost → guest auto-bind. Logged-in users
keep their real identity.
- `auth = "none"`: no Authentik middleware. For Anubis-fronted content, native
client APIs (Git, /v2/, WebDAV), webhook receivers, the Authentik outpost
itself.
- `effective_anti_ai` default flips ON only when `auth = "none"` (auth-gated
ingresses don't need anti-AI noise; the auth flow already discourages bots).
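The three modes above can be summarised as a small lookup. The sketch below is illustrative Python, not the module's HCL: `authentik-forward-auth-public` is the middleware named above, while the `authentik-forward-auth` name for the `required` mode, the function names, and the explicit `anti_ai` override parameter are assumptions.

```python
# Middleware chain per auth mode. Only the "public" middleware name is
# taken from the commit message; the "required" name is an assumption.
AUTH_MIDDLEWARES = {
    "required": ["authentik-forward-auth"],
    "public": ["authentik-forward-auth-public"],
    "none": [],
}


def middlewares_for(auth="required"):
    """Return the middleware list for an auth mode, defaulting to fail-closed."""
    try:
        return list(AUTH_MIDDLEWARES[auth])
    except KeyError:
        raise ValueError(
            "auth must be one of %s, got %r" % (sorted(AUTH_MIDDLEWARES), auth)
        ) from None


def effective_anti_ai(auth, anti_ai=None):
    """Anti-AI defaults to on only for auth = "none"; an explicit value wins."""
    if anti_ai is not None:
        return anti_ai
    return auth == "none"
```

Rejecting unknown values instead of falling back to `none` keeps a typo from silently exposing an ingress, which matches the fail-closed intent of the change.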
Audit pass (Phase 4) across 96 ingress_factory call sites:
- 49 explicit `protected = true` → `auth = "required"`
- 8 explicit `protected = false` → `auth = "none"` (5) or `auth = "public"` (3)
- 64 previously-default (no protected line) → `auth = "required"` ADDED, then
reviewed individually:
* 9 Anubis-fronted (blog, www, kms, travel, f1, cyberchef, jsoncrack,
homepage, wrongmove UI, privatebin) → `auth = "none"`
* 22 native-client / programmatic surfaces (Forgejo Git+/v2/, webhook
handler, claude-memory MCP, Nextcloud WebDAV, Matrix, Vault CLI/OIDC,
xray VPN, ntfy, woodpecker webhooks, n8n triggers, ntfy push, dawarich
location ingestion, immich frame kiosk, headscale CP, send anonymous
drops, rybbit beacon, vaultwarden API, Authentik UI itself + outposts) →
`auth = "none"`
* Remaining ~33 → `auth = "required"` confirmed (admin tools, internal
UIs, services without app-level auth)
- Smoke-test promotions to `auth = "public"`: fire-planner public UI,
k8s-portal API, insta2spotify callback.
Three call sites in wrapper modules (`stacks/freedify/factory/`,
`stacks/reverse-proxy/modules/reverse_proxy/`) keep their internal `protected`
bool — they translate to `auth` internally, out of scope for this rename.
Behavior change: previously-default ingresses now fail closed (require
Authentik login) unless explicitly flipped to `auth = "none"` or
`auth = "public"`. This is the audit goal — no more accidentally-unprotected
surfaces. Sites that were intentionally public (Anubis content, native APIs,
webhooks) are now explicitly recorded as `auth = "none"`.
Drive-by: `modules/create-vm/main.tf` picked up cosmetic alignment via
`terraform fmt -recursive` during the audit. Behavior-neutral.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 18:53:49 +00:00
  auth = "required"
2026-03-07 16:41:36 +00:00
  extra_annotations = {
    "gethomepage.dev/enabled"      = "true"
    "gethomepage.dev/name"         = "OpenClaw"
    "gethomepage.dev/description"  = "AI assistant"
    "gethomepage.dev/icon"         = "openai.png"
    "gethomepage.dev/group"        = "AI & Data"
    "gethomepage.dev/pod-selector" = ""
  }
2026-02-22 15:13:55 +00:00
}
2026-03-07 21:09:31 +00:00
# --- Webhook receiver: triggers task-processor Job on Forgejo issue events ---
resource "kubernetes_config_map" "task_webhook" {
  metadata {
    name      = "task-webhook"
    namespace = kubernetes_namespace.openclaw.metadata[0].name
  }
  data = {
    "server.py" = <<-PYEOF
    from http.server import HTTPServer, BaseHTTPRequestHandler
    import subprocess, time, json, os

    BOT_USER = os.environ.get('FORGEJO_BOT_USER', 'viktor')

    class Handler(BaseHTTPRequestHandler):
        def do_POST(self):
            try:
                body = self.rfile.read(int(self.headers.get('Content-Length', 0)))
                data = json.loads(body)
                action = data.get('action', '')
                # Trigger on: new issue, reopened issue, or new comment
                trigger = False
                if action in ('opened', 'reopened'):
                    issue = data.get('issue', {})
                    print(f"Issue #{issue.get('number','?')} {action}: {issue.get('title','?')}")
                    trigger = True
                elif action == 'created' and 'comment' in data:
                    comment = data.get('comment', {})
                    commenter = comment.get('user', {}).get('login', '')
                    # Skip comments from the bot itself to avoid loops
                    if commenter != BOT_USER:
                        issue = data.get('issue', {})
                        print(f"Comment on #{issue.get('number','?')} by {commenter}")
                        trigger = True
                    else:
                        print(f"Skipping own comment on #{data.get('issue',{}).get('number','?')}")
                if trigger:
                    # Spawn a one-off Job from the task-processor CronJob
                    job_name = f"task-processor-{int(time.time())}"
                    subprocess.run([
                        'kubectl', 'create', 'job', job_name,
                        '--from=cronjob/task-processor',
                        '-n', 'openclaw'
                    ], check=True)
                    self.send_response(200)
                    self.end_headers()
                    self.wfile.write(b'{"ok": true}')
                else:
                    self.send_response(200)
                    self.end_headers()
                    self.wfile.write(b'{"ok": true, "skipped": true}')
            except Exception as e:
                print(f"Error: {e}")
                self.send_response(500)
                self.end_headers()
                # json.dumps keeps the body valid JSON even if the message contains quotes
                self.wfile.write(json.dumps({"error": str(e)}).encode())

        def do_GET(self):
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b'{"status": "ok"}')

        def log_message(self, fmt, *args):
            # Format with fmt: error logs pass fewer than three args
            print(f"[webhook] {fmt % args}")

    print("Task webhook receiver listening on :8080")
    HTTPServer(('', 8080), Handler).serve_forever()
    PYEOF
  }
}
resource "kubernetes_service_account" "task_webhook" {
  metadata {
    name      = "task-webhook"
    namespace = kubernetes_namespace.openclaw.metadata[0].name
  }
}
resource "kubernetes_role" "task_webhook" {
  metadata {
    name      = "task-webhook-job-creator"
    namespace = kubernetes_namespace.openclaw.metadata[0].name
  }
  rule {
    api_groups = ["batch"]
    resources  = ["jobs", "cronjobs"]
    verbs      = ["get", "list", "create"]
  }
}
resource "kubernetes_role_binding" "task_webhook" {
  metadata {
    name      = "task-webhook-job-creator"
    namespace = kubernetes_namespace.openclaw.metadata[0].name
  }
  subject {
    kind      = "ServiceAccount"
    name      = kubernetes_service_account.task_webhook.metadata[0].name
    namespace = kubernetes_namespace.openclaw.metadata[0].name
  }
  role_ref {
    api_group = "rbac.authorization.k8s.io"
    kind      = "Role"
    name      = kubernetes_role.task_webhook.metadata[0].name
  }
}
resource "kubernetes_deployment" "task_webhook" {
  metadata {
    name      = "task-webhook"
    namespace = kubernetes_namespace.openclaw.metadata[0].name
    labels = {
      app  = "task-webhook"
      tier = local.tiers.aux
    }
  }
  spec {
    replicas = 1
    selector {
      match_labels = {
        app = "task-webhook"
      }
    }
    template {
      metadata {
        labels = {
          app = "task-webhook"
        }
      }
      spec {
        service_account_name = kubernetes_service_account.task_webhook.metadata[0].name
        container {
2026-03-14 08:51:45 +00:00
          name  = "webhook"
          image = "python:3-alpine"
2026-03-07 21:09:31 +00:00
          command = ["sh", "-c", "apk add --no-cache curl > /dev/null 2>&1 && curl -sfL https://dl.k8s.io/release/v1.34.2/bin/linux/amd64/kubectl -o /usr/local/bin/kubectl && chmod +x /usr/local/bin/kubectl && exec python3 -u /app/server.py"]
          port {
            container_port = 8080
          }
          volume_mount {
            name       = "app"
            mount_path = "/app"
          }
          resources {
            requests = {
              cpu = "5m"
right-size memory: set requests=limits based on actual usage
- Set memory requests = limits across 56 stacks to prevent overcommit
- Right-sized limits based on actual pod usage (2x actual, rounded up)
- Scaled down trading-bot (replicas=0) to free memory
- Fixed OOMKilled services: forgejo, dawarich, health, meshcentral,
paperless-ngx, vault auto-unseal, rybbit, whisper, openclaw, clickhouse
- Added startup+liveness probes to calibre-web
- Bumped inotify limits on nodes 2,3 (max_user_instances 128->8192)
Post node2 OOM incident (2026-03-14). Previous kubelet config had no
kubeReserved/systemReserved set, allowing pods to starve the kernel.
2026-03-14 21:01:24 +00:00
              memory = "64Mi"
2026-03-07 21:09:31 +00:00
            }
            limits = {
              memory = "64Mi"
            }
          }
        }
        volume {
          name = "app"
          config_map {
            name = kubernetes_config_map.task_webhook.metadata[0].name
          }
        }
      }
    }
  }
2026-04-18 21:19:48 +00:00
  lifecycle {
    # KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
    ignore_changes = [spec[0].template[0].spec[0].dns_config]
  }
2026-03-07 21:09:31 +00:00
}
resource "kubernetes_service" "task_webhook" {
  metadata {
    name      = "task-webhook"
    namespace = kubernetes_namespace.openclaw.metadata[0].name
    labels = {
      app = "task-webhook"
    }
  }
  spec {
    selector = {
      app = "task-webhook"
    }
    port {
      port        = 80
      target_port = 8080
    }
  }
}
module "task_webhook_ingress" {
2026-04-19 13:01:36 +00:00
  source = "../../modules/kubernetes/ingress_factory"
2026-05-10 18:53:49 +00:00
  auth = "required"
2026-04-19 13:01:36 +00:00
  namespace        = kubernetes_namespace.openclaw.metadata[0].name
  name             = "task-webhook"
  tls_secret_name  = var.tls_secret_name
  host             = "task-webhook"
  port             = 80
  external_monitor = false
2026-03-07 21:09:31 +00:00
}
2026-04-19 15:13:03 +00:00
# --- Shared ServiceAccount: grants pod-exec into the openclaw pod ---
# Used by the task_processor CronJob (below). Previously also used by the
# cluster_healthcheck CronJob, which has been decommissioned — the local
# `scripts/cluster_healthcheck.sh` is now the single authoritative runner.
2026-02-22 15:13:55 +00:00
resource "kubernetes_service_account" "healthcheck" {
  metadata {
    name      = "cluster-healthcheck"
    namespace = kubernetes_namespace.openclaw.metadata[0].name
  }
}
resource "kubernetes_role" "healthcheck_exec" {
  metadata {
    name      = "healthcheck-pod-exec"
    namespace = kubernetes_namespace.openclaw.metadata[0].name
  }
  rule {
    api_groups = [""]
    resources  = ["pods"]
    verbs      = ["get", "list"]
  }
  rule {
    api_groups = [""]
    resources  = ["pods/exec"]
    verbs      = ["create"]
  }
}
resource "kubernetes_role_binding" "healthcheck_exec" {
  metadata {
    name      = "healthcheck-pod-exec"
    namespace = kubernetes_namespace.openclaw.metadata[0].name
  }
  subject {
    kind      = "ServiceAccount"
    name      = kubernetes_service_account.healthcheck.metadata[0].name
    namespace = kubernetes_namespace.openclaw.metadata[0].name
  }
  role_ref {
    api_group = "rbac.authorization.k8s.io"
    kind      = "Role"
    name      = kubernetes_role.healthcheck_exec.metadata[0].name
  }
}
2026-03-07 21:09:31 +00:00
# --- CronJob: Task processor — polls Forgejo issues and triggers OpenClaw ---
resource "kubernetes_cron_job_v1" "task_processor" {
  metadata {
    name      = "task-processor"
    namespace = kubernetes_namespace.openclaw.metadata[0].name
    labels = {
      app  = "task-processor"
      tier = local.tiers.aux
    }
  }
  spec {
    schedule                      = "*/5 * * * *"
    concurrency_policy            = "Forbid"
    failed_jobs_history_limit     = 3
    successful_jobs_history_limit = 3
    job_template {
      metadata {
        labels = {
          app = "task-processor"
        }
      }
      spec {
2026-04-19 15:13:03 +00:00
        active_deadline_seconds    = 600
        backoff_limit              = 0
        ttl_seconds_after_finished = 86400
2026-03-07 21:09:31 +00:00
        template {
          metadata {
            labels = {
              app = "task-processor"
            }
          }
          spec {
            service_account_name = kubernetes_service_account.healthcheck.metadata[0].name
            restart_policy       = "Never"
            container {
              name  = "task-processor"
              image = "bitnami/kubectl:latest"
              command = ["bash", "-c", <<-EOF
                # Find the openclaw pod
                POD=$(kubectl get pods -n openclaw -l app=openclaw -o jsonpath='{.items[0].metadata.name}' 2>/dev/null)
                if [ -z "$POD" ]; then
                  echo "ERROR: OpenClaw pod not found"
                  exit 1
                fi
                echo "Executing task processor in pod $POD..."
                kubectl exec -n openclaw "$POD" -c openclaw -- \
                  env FORGEJO_TOKEN="$FORGEJO_TOKEN" \
2026-03-24 19:40:15 +02:00
                  FORGEJO_URL="http://forgejo.forgejo.svc.cluster.local" \
2026-03-07 21:09:31 +00:00
                  OPENCLAW_TOKEN="$OPENCLAW_TOKEN" \
                  OPENCLAW_URL="https://integrate.api.nvidia.com" \
                  bash /workspace/infra/scripts/task-processor.sh
              EOF
              ]
              env {
migrate consuming stacks to ESO + remove k8s-dashboard static token
Phase 9: ExternalSecret migration across 26 stacks:
Fully migrated (vault data source removed, ESO delivers secrets):
- speedtest, shadowsocks, wealthfolio, plotting-book, f1-stream, tandoor
- n8n, dawarich, diun, netbox, onlyoffice, tuya-bridge
- hackmd (ESO template for DB URL), health (ESO template for DB URL)
- trading-bot (ESO template for DATABASE_URL + 7 secret env vars)
- forgejo (removed unused vault data source)
Partially migrated (vault kept for plan-time, ESO added for runtime):
- immich, linkwarden, nextcloud, paperless-ngx (jsondecode for homepage)
- claude-memory, rybbit, url, webhook_handler (plan-time in locals/jobs)
- woodpecker, openclaw, resume (plan-time in helm values/jobs/modules)
17 stacks unchanged (all plan-time: homepage annotations, configmaps,
module inputs) — vault data source works with OIDC auth.
Phase 17a: Remove k8s-dashboard static admin token secret.
Users now get tokens via: vault write kubernetes/creds/dashboard-admin
2026-03-15 19:05:04 +00:00
                name = "FORGEJO_TOKEN"
                value_from {
                  secret_key_ref {
                    name = "openclaw-secrets"
                    key  = "forgejo_api_token"
                  }
                }
2026-03-07 21:09:31 +00:00
              }
              env {
2026-03-15 19:05:04 +00:00
                name = "OPENCLAW_TOKEN"
                value_from {
                  secret_key_ref {
                    name = "openclaw-secrets"
                    key  = "nvidia_api_key"
                  }
                }
2026-03-07 21:09:31 +00:00
              }
              resources {
                requests = {
                  cpu    = "50m"
                  memory = "64Mi"
                }
                limits = {
2026-03-14 21:01:24 +00:00
                  memory = "64Mi"
2026-03-07 21:09:31 +00:00
                }
              }
            }
          }
        }
      }
    }
  }
2026-04-18 21:19:48 +00:00
lifecycle {
# KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
2026-05-16 14:01:46 +00:00
ignore_changes = [ spec [ 0 ] . job_template [ 0 ] . spec [ 0 ] . template [ 0 ] . spec [ 0 ] . dns_config ]
}
}
# --- CronJob: claude-memory → memory-core sync (daily) ---
# Pulls all (non-sensitive) memories from claude-memory's REST API and
# writes them into memory-core's QMD-backed tree at
# /home/node/.openclaw/memory/projects/claude-memory-sync/. Then runs
# `openclaw memory index --force` to rebuild the search index so the
# OpenClaw agent can `memory_search` over the shared knowledge.
#
# Note: the central claude-memory MCP transport (/mcp/mcp) is broken
# on the deployed image (beads code-z1so) — this REST sync is the
# workaround. Once that's fixed we can also wire claude_memory as a
# native MCP server in the mcp.servers block above.
resource " kubernetes_config_map " " memory_sync_script " {
metadata {
name = " memory-sync-script "
namespace = kubernetes_namespace . openclaw . metadata [ 0 ] . name
}
data = {
" memory-sync.py " = file ( " ${ path . module } /files/memory-sync.py " )
}
}
resource " kubernetes_cron_job_v1 " " memory_sync " {
metadata {
name = " memory-sync "
namespace = kubernetes_namespace . openclaw . metadata [ 0 ] . name
labels = {
app = " memory-sync "
tier = local . tiers . aux
}
}
spec {
schedule = " 0 3 * * * "
concurrency_policy = " Forbid "
failed_jobs_history_limit = 3
successful_jobs_history_limit = 3
job_template {
metadata {
labels = {
app = " memory-sync "
}
}
spec {
active_deadline_seconds = 600
backoff_limit = 0
ttl_seconds_after_finished = 86400
template {
metadata {
labels = {
app = " memory-sync "
}
}
spec {
# Reuses the SA created for the (decommissioned) cluster
# healthcheck job — already has pods + pods/exec in this ns.
service_account_name = kubernetes_service_account . healthcheck . metadata [ 0 ] . name
restart_policy = " Never "
container {
name = " memory-sync "
image = " bitnami/kubectl:latest "
command = ["bash", "-c", <<-EOF
set -eu
POD=$(kubectl get pods -n openclaw -l app=openclaw -o jsonpath='{.items[0].metadata.name}')
if [ -z "$POD" ]; then
echo "ERROR: no openclaw pod"
exit 1
fi
echo "syncing into pod $POD ..."
kubectl exec -n openclaw "$POD" -c openclaw -i -- python3 -u - < /scripts/memory-sync.py
echo "reindexing memory-core ..."
kubectl exec -n openclaw "$POD" -c openclaw -- sh -c 'cd /app && node openclaw.mjs memory index --force 2>&1 | tail -20'
echo "memory-sync complete."
EOF
]
volume_mount {
name = " script "
mount_path = " /scripts "
read_only = true
}
resources {
requests = {
cpu = " 20m "
memory = " 64Mi "
}
limits = {
memory = " 64Mi "
}
}
}
volume {
name = " script "
config_map {
name = kubernetes_config_map . memory_sync_script . metadata [ 0 ] . name
}
}
}
}
}
}
}
lifecycle {
# KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
2026-04-18 21:19:48 +00:00
ignore_changes = [ spec [ 0 ] . job_template [ 0 ] . spec [ 0 ] . template [ 0 ] . spec [ 0 ] . dns_config ]
}
2026-03-07 21:09:31 +00:00
}
2026-03-19 20:23:59 +00:00
# --- OpenLobster: Multi-user Telegram AI assistant (trial) ---
feat(storage): migrate 38 NFS PVCs to proxmox-lvm (Wave 2)
Add proxmox-lvm PVCs with pvc-autoresizer annotations for all
remaining single-pod app data services. Deployments updated to
use new block storage PVCs. Old NFS modules retained for rollback.
Services: affine, changedetection, diun, excalidraw, f1-stream,
hackmd, isponsorblocktv, matrix, n8n, send, grampsweb, health,
onlyoffice, owntracks, paperless-ngx, privatebin, resume,
speedtest, stirling-pdf, tandoor, rybbit (clickhouse), tor-proxy
(torrserver), whisper+piper, frigate (config), ollama (ui),
servarr (prowlarr/listenarr/qbittorrent), aiostreams, freshrss
(extensions), meshcentral (data+files), openclaw (data+home+
openlobster), technitium, mailserver (data+roundcube html+enigma),
dbaas (pgadmin).
Strategy set to Recreate where needed for RWO volumes.
2026-04-04 19:25:12 +03:00
resource " kubernetes_persistent_volume_claim " " openlobster_data_proxmox " {
wait_until_bound = false
metadata {
name = " openlobster-data-proxmox "
namespace = kubernetes_namespace . openclaw . metadata [ 0 ] . name
annotations = {
2026-05-10 19:56:16 +00:00
" resize.topolvm.io/threshold " = " 10% "
2026-04-04 19:25:12 +03:00
" resize.topolvm.io/increase " = " 100% "
" resize.topolvm.io/storage_limit " = " 5Gi "
}
}
spec {
access_modes = [ " ReadWriteOnce " ]
storage_class_name = " proxmox-lvm "
resources {
requests = {
storage = " 1Gi "
}
}
}
2026-05-10 21:57:01 +00:00
lifecycle {
# The autoresizer expands requests.storage up to storage_limit and
# PVCs can't shrink. Without this, every TF apply tries to revert
# to the spec value, K8s rejects the shrink, and the PVC ends up
# in Terminating-but-in-use limbo.
ignore_changes = [ spec [ 0 ] . resources [ 0 ] . requests ]
}
2026-04-04 19:25:12 +03:00
}
2026-03-19 20:23:59 +00:00
resource " random_password " " openlobster_graphql_token " {
length = 32
special = false
}
resource " kubernetes_deployment " " openlobster " {
metadata {
name = " openlobster "
namespace = kubernetes_namespace . openclaw . metadata [ 0 ] . name
labels = {
app = " openlobster "
tier = local . tiers . aux
}
}
spec {
strategy {
type = " Recreate "
}
replicas = 0
selector {
match_labels = {
app = " openlobster "
}
}
template {
metadata {
labels = {
app = " openlobster "
}
}
spec {
# node4 has corrupted containerd content store — avoid it
affinity {
node_affinity {
required_during_scheduling_ignored_during_execution {
node_selector_term {
match_expressions {
key = " kubernetes.io/hostname "
operator = " NotIn "
values = [ " k8s-node4 " ]
}
}
}
}
}
container {
name = " openlobster "
image = " ghcr.io/neirth/openlobster/openlobster:latest "
port {
container_port = 8080
}
env {
name = " OPENLOBSTER_GRAPHQL_AUTH_TOKEN "
value = random_password . openlobster_graphql_token . result
}
env {
name = " OPENLOBSTER_PROVIDERS_ANTHROPIC_API_KEY "
value_from {
secret_key_ref {
name = " openclaw-secrets "
key = " anthropic_api_key "
}
}
}
env {
name = " OPENLOBSTER_PROVIDERS_ANTHROPIC_MODEL "
value = " claude-sonnet-4-20250514 "
}
env {
name = " OPENLOBSTER_CHANNELS_TELEGRAM_TOKEN "
value_from {
secret_key_ref {
name = " openclaw-secrets "
key = " telegram_bot_token "
}
}
}
env {
name = " OPENLOBSTER_DATABASE_DRIVER "
value = " sqlite "
}
env {
name = " OPENLOBSTER_DATABASE_DSN "
value = " /app/data/openlobster.db "
}
env {
name = " OPENLOBSTER_AGENT_NAME "
value = " Lobster "
}
env {
name = " OPENLOBSTER_MEMORY_BACKEND "
value = " file "
}
volume_mount {
name = " openlobster-data "
mount_path = " /app/data "
}
resources {
requests = {
cpu = " 10m "
memory = " 64Mi "
}
limits = {
memory = " 256Mi "
}
}
}
volume {
name = " openlobster-data "
persistent_volume_claim {
2026-04-04 19:25:12 +03:00
claim_name = kubernetes_persistent_volume_claim . openlobster_data_proxmox . metadata [ 0 ] . name
2026-03-19 20:23:59 +00:00
}
}
}
}
}
lifecycle {
2026-05-16 12:41:05 +00:00
ignore_changes = [
spec [ 0 ] . template [ 0 ] . spec [ 0 ] . dns_config , # KYVERNO_LIFECYCLE_V1
metadata [ 0 ] . annotations [ " keel.sh/policy " ] ,
metadata [ 0 ] . annotations [ " keel.sh/trigger " ] ,
metadata [ 0 ] . annotations [ " keel.sh/pollSchedule " ] , # KYVERNO_LIFECYCLE_V2
2026-05-28 23:09:30 +00:00
metadata [ 0 ] . annotations [ " keel.sh/match-tag " ] ,
spec [ 0 ] . template [ 0 ] . spec [ 0 ] . container [ 0 ] . image , # KEEL_IGNORE_IMAGE — Keel manages tag updates
metadata [ 0 ] . annotations [ " kubernetes.io/change-cause " ] ,
metadata [ 0 ] . annotations [ " deployment.kubernetes.io/revision " ] ,
spec [ 0 ] . template [ 0 ] . metadata [ 0 ] . annotations [ " keel.sh/update-time " ] , # KEEL_LIFECYCLE_V1
2026-05-16 12:41:05 +00:00
]
2026-03-19 20:23:59 +00:00
}
}
resource " kubernetes_service " " openlobster " {
metadata {
name = " openlobster "
namespace = kubernetes_namespace . openclaw . metadata [ 0 ] . name
labels = {
app = " openlobster "
}
}
spec {
selector = {
app = " openlobster "
}
port {
port = 80
target_port = 8080
}
}
}
module " openlobster_ingress " {
source = " ../../modules/kubernetes/ingress_factory "
2026-04-16 13:45:04 +00:00
dns_type = " proxied "
2026-03-19 20:23:59 +00:00
namespace = kubernetes_namespace . openclaw . metadata [ 0 ] . name
name = " openlobster "
tls_secret_name = var . tls_secret_name
host = " openlobster "
port = 80
ingress_factory: replace `protected` bool with `auth` enum + audit pass across 100 stacks
Phase 3+4 of default-deny ingress plan. Replaces the `protected = bool` (default
false → unprotected) variable in `modules/kubernetes/ingress_factory` with
`auth = string` enum (default "required" → fail-closed). Touches every
ingress_factory caller so the audit decision is recorded explicitly in code.
ingress_factory (Phase 3):
- `auth = "required"`: standard Authentik forward-auth (the legacy
`protected = true` semantic).
- `auth = "public"`: forward-auth via the new `authentik-forward-auth-public`
middleware → dedicated public outpost → guest auto-bind. Logged-in users
keep their real identity.
- `auth = "none"`: no Authentik middleware. For Anubis-fronted content, native
client APIs (Git, /v2/, WebDAV), webhook receivers, the Authentik outpost
itself.
- `effective_anti_ai` default flips ON only when `auth = "none"` (auth-gated
ingresses don't need anti-AI noise; the auth flow already discourages bots).
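The three `auth` values and the anti-AI default described above can be summarised in a small Python sketch. It is illustrative only and does not reproduce the module's HCL: `resolve_ingress` is a hypothetical function, the middleware name used for `"required"` is an assumed placeholder, and the explicit `anti_ai` override is an assumption inferred from the word "default".

```python
from typing import Optional

VALID_AUTH = ("required", "public", "none")

# Middleware per auth mode. Only the "public" name appears in the commit
# message; the "required" name is a placeholder assumption.
MIDDLEWARE = {
    "required": "authentik-forward-auth",
    "public": "authentik-forward-auth-public",
    "none": None,
}


def resolve_ingress(auth: str = "required", anti_ai: Optional[bool] = None) -> dict:
    """Return the middleware and effective anti-AI setting for an ingress."""
    if auth not in VALID_AUTH:
        raise ValueError(f"auth must be one of {VALID_AUTH}, got {auth!r}")
    # The anti-AI default is ON only for unauthenticated ingresses.
    effective_anti_ai = (auth == "none") if anti_ai is None else anti_ai
    return {"middleware": MIDDLEWARE[auth], "effective_anti_ai": effective_anti_ai}
```

Because the default is `"required"`, a caller that passes nothing gets the fail-closed behaviour, and an unknown value is rejected instead of silently falling back to an unprotected ingress.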
Audit pass (Phase 4) across 96 ingress_factory call sites:
- 49 explicit `protected = true` → `auth = "required"`
- 8 explicit `protected = false` → `auth = "none"` (5) or `auth = "public"` (3)
- 64 previously-default (no protected line) → `auth = "required"` ADDED, then
reviewed individually:
* 9 Anubis-fronted (blog, www, kms, travel, f1, cyberchef, jsoncrack,
homepage, wrongmove UI, privatebin) → `auth = "none"`
* 22 native-client / programmatic surfaces (Forgejo Git+/v2/, webhook
handler, claude-memory MCP, Nextcloud WebDAV, Matrix, Vault CLI/OIDC,
xray VPN, ntfy, woodpecker webhooks, n8n triggers, ntfy push, dawarich
location ingestion, immich frame kiosk, headscale CP, send anonymous
drops, rybbit beacon, vaultwarden API, Authentik UI itself + outposts) →
`auth = "none"`
* Remaining ~33 → `auth = "required"` confirmed (admin tools, internal
UIs, services without app-level auth)
- Smoke-test promotions to `auth = "public"`: fire-planner public UI,
k8s-portal API, insta2spotify callback.
Three call sites in wrapper modules (`stacks/freedify/factory/`,
`stacks/reverse-proxy/modules/reverse_proxy/`) keep their internal `protected`
bool — they translate to `auth` internally, out of scope for this rename.
Behavior change: previously-default ingresses now fail closed (require
Authentik login) unless explicitly flipped to `auth = "none"` or
`auth = "public"`. This is the audit goal — no more accidentally-unprotected
surfaces. Sites that were intentionally public (Anubis content, native APIs,
webhooks) are now explicitly recorded as `auth = "none"`.
Drive-by: `modules/create-vm/main.tf` picked up cosmetic alignment via
`terraform fmt -recursive` during the audit. Behavior-neutral.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 18:53:49 +00:00
auth = " required "
2026-03-19 20:23:59 +00:00
}