Wire ha-mcp, context7, and the in-pod playwright sidecar as native
MCP servers on OpenClaw via `mcp set` in the container startup
(ConfigMap-baked mcp.servers gets stripped by `doctor --fix`; CLI-set
entries persist). HA URL pulled from new Vault key
secret/openclaw.ha_sofia_mcp_url and passed via the
HA_SOFIA_MCP_URL env var.
Add a daily 03:00 UTC `memory-sync` CronJob in the openclaw
namespace: pulls all non-sensitive memories from
claude-memory.claude-memory.svc:80/api/memories, groups by category,
writes 18 Markdown files into
/workspace/memory/projects/claude-memory-sync/ (the path memory-core
indexes), then triggers
`openclaw memory index --force` via kubectl exec. Reuses the
existing cluster-healthcheck SA (pods+pods/exec). Smoke test: 1488
memories synced, 25/25 files indexed, search returns hits.
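A minimal sketch of the group-by-category step described above. The
payload keys ("category", "title", "content") and the one-file-per-
category naming are assumptions for illustration; the real
/api/memories response and the CronJob script may differ.

```python
from collections import defaultdict


def render_category_files(memories):
    """Group memories by category and render one Markdown document each.

    `memories` is a list of dicts with hypothetical keys "category",
    "title" and "content". Returns {filename: markdown_text}.
    """
    grouped = defaultdict(list)
    for memory in memories:
        grouped[memory.get("category") or "uncategorized"].append(memory)

    files = {}
    for category, items in sorted(grouped.items()):
        lines = [f"# {category}", ""]
        for item in items:
            lines += [f"## {item['title']}", "", item["content"], ""]
        files[f"{category}.md"] = "\n".join(lines)
    return files


sample = [
    {"category": "infra", "title": "Keel", "content": "Polls hourly."},
    {"category": "infra", "title": "kured", "content": "drainTimeout=30m."},
    {"category": "finance", "title": "Wealthfolio", "content": "Reads dav_corrected."},
]
files = render_category_files(sample)
print(sorted(files))
```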
Also drops the legacy /app/extensions entry from
plugins.load.paths (doctor warning), wires HA_SOFIA_MCP_URL env,
and one-shot deletes the stale 2026-02-28 metaclaw-export.json from
the openclaw home volume.
claude_memory MCP intentionally NOT wired — its /mcp/mcp transport
404s on the deployed claude-memory-mcp:17 image (tracked as
code-z1so). Shared knowledge is delivered via the CronJob's REST
sync instead. Adding claude_memory to mcp.servers is a one-line
follow-up once that's fixed.
The broker-sync Fidelity provider emits 'unrealised-gains-offset'
DEPOSIT activities to reconcile Wealthfolio's total with the
PlanViewer reported pot, because Wealthfolio doesn't track pension
fund units directly. Wealthfolio's data model treats that DEPOSIT as
a cash contribution, which double-inflates net_contribution and
zeroes out the implied growth.
Add a Postgres view 'dav_corrected' in wealthfolio_sync that
subtracts the cumulative gains-offset from net_contribution per
account per date (re-exporting as 'net_contribution' so it's a
drop-in replacement). All 17 wealth dashboard panels that compute
contribution/growth/ROI now read from the view. Total impact:
portfolio Growth jumps from £301,753.19 to £337,474.39 (exactly
the £35,721.20 Fidelity offset that was previously miscategorised).
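The correction can be sketched in Python as a running subtraction. This
illustrates the arithmetic only, not the view's SQL; the row layout
(date, net_contribution, offset deposited that day) is an assumption.

```python
from decimal import Decimal
from itertools import accumulate


def corrected_net_contribution(daily_rows):
    """Subtract the cumulative gains-offset from net_contribution.

    daily_rows: (date, net_contribution, gains_offset_that_day) tuples
    for ONE account, ordered by date (hypothetical layout).
    """
    running_offsets = accumulate(offset for _date, _net, offset in daily_rows)
    return [
        (date, net - running)
        for (date, net, _offset), running in zip(daily_rows, running_offsets)
    ]


rows = [
    ("2026-01-01", Decimal("1000.00"), Decimal("0.00")),
    ("2026-02-01", Decimal("1150.00"), Decimal("150.00")),  # offset DEPOSIT
    ("2026-03-01", Decimal("1250.00"), Decimal("100.00")),  # second offset
]
corrected = corrected_net_contribution(rows)

# Headline figures from this commit: growth rises by exactly the offset.
growth_before = Decimal("301753.19")
offset_total = Decimal("35721.20")
growth_after = growth_before + offset_total
```

In the toy rows the real cash contribution stays at 1000.00 on every
date, because both later increases were offset DEPOSITs rather than
cash.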
Bulk enrollment commit 8f4b1956 had its CI pipeline #689 killed before
terragrunt apply ran. The enrollment label + V2 lifecycle changes are
in master but never reached the cluster. Appending a one-line marker
to each pending stack's main.tf so Woodpecker's diff-detection picks
them up and applies them serially.
Idempotent — re-applying a stack whose state already matches is a no-op.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
For Deployments enrolled in Keel with policy=patch, the image tag is
updated by Keel as new patches release upstream. Without
ignore_changes on the image field, terragrunt apply would fight Keel
in an endless loop (TF reverts → Keel re-rolls → repeat — same shape
as the calico/tigera-operator fight from earlier).
Adding KEEL_IGNORE_IMAGE marker to the lifecycle of these stacks.
Image string in TF becomes the initial seed; Keel rolls it forward.
Stacks: actualbudget, broker-sync, changedetection, city-guesser,
coturn, dashy, dawarich, diun, ebook2audiobook, ebooks, echo,
excalidraw, foolery, forgejo, freedify.
CI-driven self-hosted stacks (fire-planner, job-hunter, payslip-ingest,
recruiter-responder, claude-agent-service, claude-memory) keep TF
ownership of image and policy=never — their image_tag is set by CI
via terragrunt.hcl inputs, not by Keel. Adding image to ignore_changes
on those would break the CI deploy flow.
Caveat: only container[0].image is added. Multi-container Deployments
(immich, beads, etc.) will need additional container[N].image lines
for any container Keel rolls. Those stacks are not currently enrolled.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Keel kept rewriting calico-node + calico-kube-controllers images to
v3.26.5 (proper patch update); tigera-operator immediately reverted
to v3.26.1 because the Installation CR is the source of truth.
Endless churn but no data loss — Calico stayed healthy throughout.
Removing keel.sh/enrolled label and live label from calico-system ns.
Calico upgrades go through the tigera-operator's Installation CR
manually, not Keel.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Move from `never` (no auto-update) to `patch` for the cluster-wide
default. Keel only auto-updates PATCH versions within the current
major.minor: 0.26.6 → 0.26.7 OK; 0.26.6 → :nightly-latest blocked.
Tag-rewrites that broke calico (v3.26.1 → :master) and affine
(0.26.6 → :nightly-latest) on 2026-05-16 cannot recur with patch.
Caveats:
* Patch causes Terraform image drift for semver-pinned services —
drift-detection pipeline will surface it; lifecycle ignore_changes
on container[].image can be added per stack later if drift is
noisy.
* Tags that aren't parseable as semver (:latest, :11, :nightly,
SHA tags) are ignored by patch — those workloads stay on their
current image until promoted to `force` policy individually.
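A rough Python model of the patch rule as described here: same
major.minor, higher patch, and anything that is not plain semver is
ignored. This is a sketch of the behaviour stated above, not Keel's
implementation; pre-release and build suffixes are deliberately not
handled.

```python
import re

# Plain MAJOR.MINOR.PATCH with an optional leading "v" (assumed format).
SEMVER = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def parse(tag):
    """Return (major, minor, patch) or None for non-semver tags."""
    match = SEMVER.match(tag)
    return tuple(int(part) for part in match.groups()) if match else None


def patch_allows(current, candidate):
    """True only for a higher patch within the same major.minor."""
    cur, new = parse(current), parse(candidate)
    if cur is None or new is None:
        return False  # :latest, :11, :nightly, SHA tags are skipped
    return cur[:2] == new[:2] and new[2] > cur[2]


print(patch_allows("0.26.6", "0.26.7"))          # allowed
print(patch_allows("0.26.6", "nightly-latest"))  # blocked
```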
Self-hosted CI-driven services + chrome-service kept on `never`
(deliberate pins / CI controls the tag):
recruiter-responder, claude-agent-service, claude-memory,
chrome-service, fire-planner, job-hunter, payslip-ingest
Live state already updated via kubectl apply + per-workload patches.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- claude-agent-service bumped to 191ed5dd (new AI section in agent
template — leadership stance, approved tools, usage limits / quotas,
code-gen safety, product-side AI depth, follow-up questions for the
recruiter when the web is sparse).
- recruiter-responder bumped to ab59eeab (deep_research prompt asks
for AI culture; warm_engage template adds a written-only ask for
IDE assistants, chat tools, per-seat limits, source-to-external
model policy).
Smoke-tested 2026-05-16: forced fresh research on Datadog, agent
returned full structured AI section with 7 explicit recruiter
questions covering DLP/IDE/limits/code-gen-policy. $0.80 / 192s.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 incident: Keel's `force` policy switched semver-pinned
images (affine 0.26.6 → :nightly-latest, calico v3.26.1 → :master)
instead of digest-tracking. Force is documented as "always update
to the newest tag in the registry" — only safe on already-mutable
tags like :latest.
Changing the cluster-wide default in inject-keel-annotations to
`never`. The namespace enrollment label + V2 lifecycle suppression
stay in place so opt-in is one annotation per Deployment, but no
service auto-updates until explicitly approved.
To opt in a workload now:
1. Verify the Deployment image is on a mutable tag (:latest,
:<major>, or a vendor "stable" tag) — change in Terraform first
if needed.
2. Add to the Deployment's metadata.annotations:
"keel.sh/policy" = "force" (digest tracking)
OR
"keel.sh/policy" = "patch" (semver patch bumps — also
requires ignore_changes on the image)
Live policy already updated via kubectl apply + per-workload
override (force → never).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Wire Keel's Slack notifier to the existing bot token in Vault
(secret/viktor -> slack_bot_token). Posts to #general by default;
override via slack.channel in the Helm values if you want a dedicated
channel like #keel-notifications.
Notification level is "info" so we get every rollout event, not just
errors. Approval flow is OFF — opt-out-pure means all updates apply
unattended. If we later introduce approvals, add slack.approvalsChannel.
Resolves user request: 'keel should send notifications to slack everytime
it upgrades an app'.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
OpenClaw can now answer 'what do we know about <company>?' from cache
via the new recruiter_company_research tool, and recruiter_get embeds
the cached research payload inline.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The Helm repo at https://charts.keel.sh has versions 1.0.0–1.0.5,
1.1.0, 1.2.0. 1.0.6 is not published, so the Phase 0 apply failed
silently. Bump to 1.2.0 (app version 0.21.1, latest stable).
The weekday-only schedule was a 2026-03-16-incident-era guardrail when
the rest of the safety net was thin. Today's gates — halt-on-alert,
sentinel-gate Check 4 (24h soak via node Ready transitions), the
K8sUpgradeStalled alert, drainTimeout=30m, concurrency=1, and the
sentinel-path fix from earlier today — make weekend reboots safe and
just clear the backlog faster.
Effect: 5 pending node reboots clear in 5 calendar days instead of
queueing up over weekends. The K8s version-upgrade detection at Sun
12:00 UTC self-defers if a Sunday-morning kured reboot fires (the
RecentNodeReboot alert is in the Upgrade Gates ignore-less list for
the version-upgrade preflight — same mechanism kured uses).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Enrolls the cleanest Woodpecker-build-only self-hosted services into
the inject-keel-annotations ClusterPolicy by labeling their namespaces
keel.sh/enrolled=true. CI already pushes :latest (auto_tag: true) on
each, so Keel will detect the current upstream digest and trigger a
rolling restart when polling starts (1h cadence).
Per-Deployment lifecycle extended with KYVERNO_LIFECYCLE_V2 to suppress
the annotation drift Kyverno will inject (keel.sh/policy, /trigger,
/pollSchedule).
Services included:
- fire-planner
- job-hunter
- payslip-ingest
- recruiter-responder
Skipped from Phase 1 for follow-up:
- claude-agent-service (user has WIP on main.tf)
- claude-memory (Postgres co-deployed; treat in Phase 9 with other DBs)
- kms (two Deployments; needs per-resource review)
- wealthfolio (sync sidecar pattern; needs review)
- chrome-service (deliberate :v4 pin; needs keel.sh/policy: never label)
- GHA-migrated repos (10) (need per-repo CI cleanup)
- beadboard, freedify (no CI)
See docs/plans/2026-05-16-auto-upgrade-apps-{design,plan}.md.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- claude-agent-service bumped to f764fef6 (agent system prompt adds
the Perks block: food/health/pension/equity/PTO/parental/equipment/
learning/wellness/amenities/commuter). 1200-word cap.
- recruiter-responder bumped to 38a2cdaa (cache-first deep_research:
serves cached payload if fetched_at + ttl_seconds > now; cache
writes upsert; new force flag bypasses).
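The cache-first rule above (serve if fetched_at + ttl_seconds > now,
upsert on write, force bypasses) might look roughly like this. The
dict-backed cache, the `fetch` callable and the 24h default TTL are
assumptions for illustration, not the service's actual code.

```python
from datetime import datetime, timedelta, timezone


def is_fresh(fetched_at, ttl_seconds, now):
    """Freshness test as stated: fetched_at + ttl_seconds > now."""
    return fetched_at + timedelta(seconds=ttl_seconds) > now


def deep_research(company, cache, fetch, now, force=False, ttl_seconds=86400):
    """Return (payload, origin); ttl_seconds default is hypothetical."""
    entry = cache.get(company)
    if entry and not force and is_fresh(entry["fetched_at"], entry["ttl_seconds"], now):
        return entry["payload"], "cache"
    payload = fetch(company)
    # Upsert: overwrite any existing row instead of inserting a duplicate.
    cache[company] = {"payload": payload, "fetched_at": now, "ttl_seconds": ttl_seconds}
    return payload, "fresh"


calls = []


def fake_fetch(company):
    calls.append(company)
    return f"report for {company} #{len(calls)}"


now = datetime(2026, 5, 16, 12, 0, tzinfo=timezone.utc)
cache = {}
first = deep_research("Datadog", cache, fake_fetch, now)
second = deep_research("Datadog", cache, fake_fetch, now + timedelta(hours=1))
forced = deep_research("Datadog", cache, fake_fetch, now + timedelta(hours=2), force=True)
```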
Verified end-to-end: deep_research on Datadog now returns full Perks
section (~220s, $0.60, 23 turns). Earlier 500 fixed (was
uq_research_company_tier dup-key on re-run).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Foundation for opt-out-pure auto-update model per
docs/plans/2026-05-16-auto-upgrade-apps-{design,plan}.md.
- New stack `stacks/keel/` deploys Keel via Helm (charts.keel.sh, v1.0.6).
Polls registries hourly per design decision #8. Default schedule
overridable per-workload via keel.sh/pollSchedule annotation.
- New Kyverno ClusterPolicy `inject-keel-annotations` mutates Deployments,
StatefulSets, and DaemonSets in namespaces labeled `keel.sh/enrolled=true`
with keel.sh/policy=force + trigger=poll + pollSchedule=@every 1h.
- Phase 0 enrolls no namespaces. Phase 1 (next session) labels the
self-hosted set.
- Per-workload opt-out: label `keel.sh/policy: never` (used by rollback
runbook and chrome-service-style deliberate pins).
- Keel namespace excluded from the mutate — supervisor self-update has
too severe a failure mode (decision #11).
- AGENTS.md: KYVERNO_LIFECYCLE_V2 marker convention added for the
ignore_changes block enrolled workloads need.
- .claude/CLAUDE.md: docker-images rule flagged as transitional.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Captures the May 10–16 kured-vs-sentinel-gate hostPath mismatch (chart
derived hostPath from configuration.rebootSentinel) and the companion
work to harden the rolling-reboot pipeline against single-replica
PDB deadlocks: Anubis 1→2 replicas with shared Valkey store, kured
drainTimeout=30m, CNPG pg-cluster 2→3 instances. Includes the
mysql-standalone-PDB orphan cleanup and the k8s-node1 containerd-source
drift audit (benign).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Three defensive moves to make the kured rolling-reboot cycle survive
edge cases without operator intervention:
kured (stacks/kured/main.tf):
- Set `configuration.drainTimeout = "30m"`. Default is unlimited; if
a future PDB or finalizer stalls drain, kured retries forever and
the node stays cordoned silently. 30m caps the silent-failure
window — after timeout kured logs the abort and waits for the
next period; the node stays Schedulable so cluster capacity isn't
lost. Lets us fail closed instead of fail-silent.
CNPG pg-cluster (stacks/dbaas/modules/dbaas/main.tf):
- Bump instances 2 → 3 (1 primary + 2 replicas). With 2 instances the
failover during a primary-node drain depended on the lone replica
being caught up; a WAL backlog would stall the drain until the
replica was current. With 3 instances CNPG always has at least one
fully-current replica to promote, and the PDB's
`minAvailable=1` on the primary selector is satisfied throughout
the switchover. Storage: +20Gi PVC on proxmox-lvm-encrypted (about
35Gi after autoresize). Memory: +3Gi pod limit.
- Updated `triggers.instances` so the null_resource's local-exec
actually re-applies the YAML (kubectl apply with the new spec). The
YAML is the source-of-truth but the trigger is what tells terraform
to re-run the provisioner.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Anubis pre-2026-05-16 ran at replicas=1 because in-flight PoW challenge
state lived in process memory — a challenge issued by pod A wouldn't be
verifiable by pod B (HTTP 500 "store: key not found"). The PDB at
`minAvailable=1` made this worse: with replicas=1 the eviction API can
NEVER satisfy the constraint, so every drain on a node hosting an Anubis
pod looped forever. This is what stalled the manual K8s upgrade on
2026-05-11 (had to delete pods directly to bypass eviction) and was
about to block kured on Monday 2026-05-18 once the kured sentinel fix
landed.
Anubis upstream has first-class support for a Valkey/Redis-protocol
shared store (documented as the "Kubernetes worker pool" pattern).
Wire it up:
- modules/kubernetes/anubis_instance: add `shared_store_url` variable.
When set, appends a `store: { backend: valkey, parameters: { url } }`
block to the rendered policy YAML and defaults replicas to 2 (capped
at 2). PDB switched from `minAvailable=1` to `maxUnavailable=1` so
drains can take down one pod at a time. topologySpreadConstraint
tightened to `DoNotSchedule` so the two replicas land on different
nodes — a single node loss never takes a whole Anubis instance down.
- All 8 call sites (cyberchef, jsoncrack, kms, homepage, blog,
travel_blog, real-estate-crawler, f1-stream) opted in. Each picks a
unique Redis DB index (5–12) on `redis-master.redis:6379`. Cluster
Redis already runs HA via Sentinel + haproxy, no new infra needed.
Verified: every Anubis Deployment now 2/2 Ready with pods on different
nodes; PDBs allow 1 disruption; Redis DBs 5,7,8,10 already populated
by live traffic post-apply; Palo Alto Networks scanner hit blog right
after apply and the challenge log shows the new state path.
Drain on any worker now succeeds without a `predrain_unstick` workaround
— eviction API is satisfied because at most one pod is unavailable at a
time, and the other replica keeps serving. Monday's kured reboot wave
should roll through cleanly.
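The PDB arithmetic behind the deadlock and the fix, as a small Python
model. It covers integer minAvailable/maxUnavailable only (no
percentages) and is a simplification of what the disruption controller
computes.

```python
def disruptions_allowed(healthy, desired, min_available=None, max_unavailable=None):
    """How many pods the eviction API may remove right now.

    healthy: currently Ready pods; desired: the Deployment's replicas.
    Exactly one of min_available / max_unavailable must be set.
    """
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available / max_unavailable")
    if min_available is not None:
        required_healthy = min_available
    else:
        required_healthy = desired - max_unavailable
    return max(0, healthy - required_healthy)


# Before: replicas=1 with minAvailable=1, so eviction is never permitted.
before = disruptions_allowed(healthy=1, desired=1, min_available=1)
# After: replicas=2 with maxUnavailable=1, so one pod can go at a time.
after = disruptions_allowed(healthy=2, desired=2, max_unavailable=1)
# Mid-drain: one pod already down, the second eviction has to wait.
mid_drain = disruptions_allowed(healthy=1, desired=2, max_unavailable=1)
print(before, after, mid_drain)
```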
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- main.tf: bump image_tag to 1b3350c0 (carries the new agent); the init
container also copies recruiter-triage.md into
/home/agent/.claude/agents/.
- terragrunt.hcl: restored (file was missing — apply was blocked).
Standard root include + platform/vault/external-secrets dependencies.
Smoke-tested 2026-05-16: deep_research call on Datadog (thread 42)
via recruiter-responder REST API → 102.5s, $0.43, structured
markdown report with comp bands vs £600k floor, culture signals,
remote policy, recent news, sources cited. End-to-end Tier-2 is live.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds the per-repo domain glossary that engineering skills
(diagnose, tdd, improve-codebase-architecture, grill-with-docs)
read before working in this repo. Terms only — no implementation
detail. Six clusters (code organization, cluster, networking,
storage, secrets, CI/CD), 22 terms, plus relationships, an example
dialogue, and five flagged ambiguities.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The kured Helm chart derives the sentinel hostPath from
`dirname(configuration.rebootSentinel)`. Previously
rebootSentinel=/sentinel/gated-reboot-required pointed hostPath at
`/sentinel/` (an empty auto-created directory on every host) while the
kured-sentinel-gate DaemonSet writes to /var/run/gated-reboot-required.
Two different host directories → kured never saw the open gate, even
though the gate's checks were all green every 5 min on every node.
Result: unattended-upgrades has had packages waiting on every node since
2026-05-10 (when uu was re-enabled) and kured's hourly log says
"Reboot not required" for the entire period.
Set rebootSentinel=/var/run/gated-reboot-required so the chart mounts
hostPath /var/run — same directory the gate writes to. The in-pod
mountPath (/sentinel) is hardcoded by the chart and doesn't matter;
the symlink chain works out: /sentinel/<file> inside the pod resolves
to /var/run/<file> on the host.
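The mismatch reduces to a dirname comparison. A Python illustration
of the derivation described above (the chart itself does this in its
Helm template, not in Python):

```python
import posixpath


def sentinel_host_dir(reboot_sentinel):
    """Host directory the chart mounts, per the dirname() rule above."""
    return posixpath.dirname(reboot_sentinel)


# Directory the kured-sentinel-gate DaemonSet actually writes into.
gate_dir = posixpath.dirname("/var/run/gated-reboot-required")

old = sentinel_host_dir("/sentinel/gated-reboot-required")
new = sentinel_host_dir("/var/run/gated-reboot-required")
print(old == gate_dir, new == gate_dir)
```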
Verified: kured pod can now list /sentinel/gated-reboot-required
(0 B) AND /sentinel/reboot-required (32 B, set by uu on 2026-05-15).
First gated reboot will land Mon 2026-05-18 02:00 London.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Stremio stream aggregator now has its own row in the Active Use tier.
Captures the auth model (own UUID+password, not Authentik), monitoring
posture (canary probe + 3 alerts), and backup pipeline (weekly NFS
dumps of both decrypted config and the Stremio account addon
collection).
Follow-up from the 2026-05-15/16 hardening session: 5 commits on
servarr/aiostreams, none previously catalogued.
Adds stremio-account-backup CronJob (Sun 04:00 weekly, offset 1h from
the AIOStreams config-backup at 03:00):
- Logs into api.strem.io with credentials from Vault
(secret/viktor.stremio_email + stremio_password, now also synced
into the aiostreams-probe-secrets ExternalSecret)
- Fetches the full addonCollection via addonCollectionGet
- Writes timestamped JSON to the existing aiostreams-backup PVC
(NFS /srv/nfs/aiostreams-backup/stremio-collection-*.json, mode 0600)
- 90-day retention, logs out to invalidate the auth key
- Pushgateway metrics: stremio_account_backup_{success,bytes,
addon_count,duration_seconds,last_run_timestamp}
Protects against: accidental "uninstall all" / API regression / wrong
account login wiping the curated set of 22 addons (Cinemeta + 16
MDBList + AIOStreams + More Like This + Formulio + Zamunda + Local).
Verified: manual run wrote 93480 bytes, 22 addons, file present on NFS.
- Add ingress_factory module (auth=none, HMAC + expiry are the gate);
ingress_path=["/cb"] only; /api stays internal and /healthz cluster-only.
dns_type=proxied. anti_ai_scraping=false.
- Drop setup_tls_secret module — Kyverno ClusterPolicy `sync-tls-secret`
auto-clones the wildcard cert into every namespace.
- Bump image_tag to 7383b426 (callback endpoints + SMTP STARTTLS
hostname relax).
- Wire CALLBACK_BASE_URL=https://recruiter-responder.viktorbarzin.me.
- Drop git-crypt-encrypted wildcard cert files into
stacks/recruiter-responder/secrets/. Allowlist privkey.pem in a new
.gitleaksignore — git-crypt encrypts at rest but the working-tree
copy is plaintext, so gitleaks can't tell.
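For readers unfamiliar with the "HMAC + expiry are the gate" pattern,
a generic Python sketch follows. The message layout, SHA-256 choice
and parameter names are all assumptions; the actual callback signing
scheme in recruiter-responder is not documented here.

```python
import hashlib
import hmac


def sign(secret, thread_id, action, expires_at):
    """Sign a callback; the "id:action:expiry" layout is hypothetical."""
    message = f"{thread_id}:{action}:{expires_at}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(secret, thread_id, action, expires_at, signature, now):
    """Reject expired links first, then compare in constant time."""
    if now >= expires_at:
        return False
    expected = sign(secret, thread_id, action, expires_at)
    return hmac.compare_digest(expected, signature)


secret = b"hypothetical-secret"
sig = sign(secret, 42, "approve", expires_at=1000)
ok = verify(secret, 42, "approve", 1000, sig, now=999)
expired = verify(secret, 42, "approve", 1000, sig, now=1000)
tampered = verify(secret, 42, "reject", 1000, sig, now=999)
```

Because the expiry is part of the signed message, a client cannot
extend a link's lifetime without invalidating the signature.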
Smoke-tested end-to-end 2026-05-15 23:45:
synthetic email -> Telegram with ✅/❌ buttons -> ✅ tapped via curl
-> 'Sent' HTML page -> thread.status=sent, decision row recorded
with decided_via=telegram_button, outbound message threaded correctly.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds two env vars on the AIOStreams deployment:
- WHITELISTED_REGEX_PATTERNS_URLS: Vidhin's release-group regex
(TRaSH-aligned) so syncedRankedRegexUrls works for the user
- WHITELISTED_SEL_URLS: Vidhin's ranked stream expressions +
Tamtaro's ISE/PSE/ESE-standard
Gotcha: AIOStreams validates each synced* field against the matching
whitelist — stream-expression files (incl. Vidhin's expressions.json)
go in WHITELISTED_SEL_URLS, not the regex one, even though they live
in Vidhin's regex repo. Mixing them up returns USER_INVALID_CONFIG.
User config: enabled Vidhin's regex + ranked expressions + Tamtaro's
ISEs. Skipped Tamtaro PSE/ESE for now to avoid surprise over-filtering;
can be added later from the same whitelist.
Adds aiostreams-config-backup CronJob (Sun 03:00 weekly):
- Pulls /api/v1/user via internal ClusterIP with UUID + password from
the existing aiostreams-probe-secrets ExternalSecret
- Writes timestamped JSON to nfs-backup PVC mounted at /backup
- 90-day retention, prunes older files
- Pushgateway metrics: aiostreams_config_backup_{success,bytes,duration,last_run_timestamp}
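The 90-day retention step can be sketched as a cutoff comparison on
file modification times. File names and the mtime-based rule are
assumptions; the CronJob may prune differently (for example with
`find -mtime`).

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=90)


def files_to_prune(files, now, retention=RETENTION):
    """Return names whose mtime is older than the retention window.

    files: mapping of filename -> last-modified datetime.
    """
    cutoff = now - retention
    return sorted(name for name, mtime in files.items() if mtime < cutoff)


utc = timezone.utc
now = datetime(2026, 5, 17, 3, 0, tzinfo=utc)
backups = {  # hypothetical file names
    "config-2026-02-01.json": datetime(2026, 2, 1, 3, 0, tzinfo=utc),
    "config-2026-02-20.json": datetime(2026, 2, 20, 3, 0, tzinfo=utc),
    "config-2026-05-10.json": datetime(2026, 5, 10, 3, 0, tzinfo=utc),
}
stale = files_to_prune(backups, now)
print(stale)
```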
NFS path: 192.168.1.127:/srv/nfs/aiostreams-backup (auto-synced offsite
to Synology via the existing offsite-sync-backup CronJob).
Complements the daily postgresql-backup-per-db pipeline (which dumps
the encrypted blob) by storing the decrypted JSON — usable for human
inspection / disaster recovery even without the AIOStreams password.
Verified: manual job wrote 12931 bytes, file present on NFS.
- stacks/recruiter-responder/terragrunt.hcl: bump image_tag to 0500c3d3
(300s LLM timeouts + IMAP BODY.PEEK[] fix).
- stacks/openclaw/main.tf: install-recruiter-plugin init container now
runs as uid 0 — the openclaw NFS volume is owned by uid 1000 and the
recruiter-responder image otherwise drops to uid 10001 which can't
write or chown.
Smoke-tested end-to-end 2026-05-15 ~23:15:
Synthetic recruiter email -> IMAP IDLE EXISTS push -> qwen3-8b triage
(12.1s, JSON output complete with company/role/salary/location/tech)
-> 2 drafts persisted in Postgres -> Telegram sendMessage 200 OK.
Then deleted 3 stale n8n workflows W992Nr7..., 1AU4k7..., IisDNx... from
the n8n Postgres workflow_entity table.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- stacks/vault/main.tf: register pg-recruiter-responder static role on
the postgresql connection (7d password rotation). Adds the role to
allowed_roles and creates vault_database_secret_backend_static_role
for `recruiter_responder` user.
- stacks/recruiter-responder/main.tf: drop TASK_WEBHOOK_URL env, swap
TASK_WEBHOOK_TOKEN secret for TELEGRAM_BOT_TOKEN + TELEGRAM_CHAT_ID.
Updated header doc.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Three coupled changes for the new recruiter-responder pipeline:
1. stacks/llama-cpp/: add qwen3-8b text-only model to llama-swap. Uses
unsloth/Qwen3-8B-GGUF Q4_K_M, 16k context, no mmproj. Refactored the
download Job script + cmd renderer to handle text_only=true (skip
mmproj download + --mmproj flag). The 3 existing vision models stay
on text_only=false; no behaviour change for them.
2. stacks/recruiter-responder/: new stack. Namespace, 2 ExternalSecrets
(app secrets from secret/recruiter-responder, DB creds from Vault DB
engine static-creds/pg-recruiter-responder), Deployment (replicas=1,
Recreate -- IMAP IDLE + APScheduler want single leader), Service
ClusterIP. Image: forgejo.viktorbarzin.me/viktor/recruiter-responder.
3. stacks/openclaw/: add init container `install-recruiter-plugin` that
uses the recruiter-responder image to copy the .mjs plugin into
/home/node/.openclaw/extensions/recruiter-api/ on NFS. Couples plugin
version to the recruiter-responder image tag. Also injects
RECRUITER_RESPONDER_URL + RECRUITER_RESPONDER_TOKEN env vars (token
from openclaw-secrets.recruiter_responder_bearer_token, optional).
Pre-apply checklist for recruiter-responder stack:
- Vault: seed secret/recruiter-responder with webhook_bearer_token,
imap_{me,spam}_{user,pass}, smtp_password, claude_agent_token,
task_webhook_token.
- Vault: add secret/openclaw.recruiter_responder_bearer_token (same as
above webhook_bearer_token).
- dbaas: create DB recruiter_responder + role recruiter_responder,
and Vault DB-engine role static-creds/pg-recruiter-responder.
- Build + push image via Woodpecker (recruiter-responder repo CI).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Hardening pass following the empty-stream-list incident:
1. STREAM_CACHE_TTL=3600 — re-enables stream payload cache (was -1 /
disabled). Default behaviour hit all 5 upstream addons on every
Stremio request; with a 1h TTL repeat requests for the same title
are instant, while RD cache invalidations still propagate quickly.
2. aiostreams-stream-probe CronJob (every 5 min): fetches the user's
encryptedPassword via the internal ClusterIP, runs a canary stream
search for Breaking Bad S01E01, pushes streams_count + probe_success
to Pushgateway. Uses an ExternalSecret pulling UUID + password from
Vault secret/viktor. Same pattern as email-roundtrip-monitor.
3. Three alerts in monitoring's prometheus_chart_values.tpl:
- AIOStreamsStreamCountLow (< 50 streams for 30m)
- AIOStreamsProbeFailing (probe_success == 0 for 30m)
- AIOStreamsProbeStale (last_run_timestamp older than 30min, for 10m)
Verified: probe returned streams=411 success=1 on first run; all 3
alerts loaded into Prometheus with state=inactive health=ok.
- Pin viren070/aiostreams:nightly → :2026.05.14.1326-nightly (avoid
stale-pull cache, matches 8-char SHA convention for rolling tags)
- Switch ingress auth tier required → app: Authentik forward-auth
blocks Stremio clients (cannot follow OAuth 302), and AIOStreams
already enforces UUID + password on /configure and /api/*, with
Stremio addon URLs using encryptedPassword as a bearer token.
Result: empty-stream-list issue fixed for public Stremio clients.
Verified: 410 streams returned via public URL for Breaking Bad S01E01
with no cookies, vs 0 before (502→Authentik OIDC redirect).
Positions panel now sits at y=32 (immediately below the
contrib-vs-market + growth row at y=22..32), and everything from
the per-account stack down shifts 8 rows lower.
pg-sync sidecar now mirrors three extra views from the wealthfolio
SQLite: assets (id/symbol/name/currency), quote_latest (one row per
asset, preferring YAHOO over MANUAL on same-day collisions), and
positions_latest (currently-held positions extracted from the TOTAL
aggregate row of holdings_snapshots — quantity, average cost,
total cost basis).
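The quote_latest selection rule (newest day wins; on the same day
YAHOO beats MANUAL) can be illustrated in Python. The dict keys are
hypothetical stand-ins for the SQLite columns, and the sidecar itself
does this in SQL.

```python
# Lower number = preferred source on same-day collisions.
SOURCE_PRIORITY = {"YAHOO": 0, "MANUAL": 1}


def latest_quotes(quotes):
    """Pick one quote per asset: latest day, then preferred source.

    quotes: iterable of dicts with "asset_id", "day" (ISO date string),
    "source" and "close" (hypothetical column names).
    """
    best = {}
    for quote in quotes:
        # ISO dates sort lexicographically; negate priority so that
        # a better source yields a larger sort key.
        key = (quote["day"], -SOURCE_PRIORITY.get(quote["source"], 99))
        current = best.get(quote["asset_id"])
        if current is None or key > current[0]:
            best[quote["asset_id"]] = (key, quote)
    return {asset: quote for asset, (_key, quote) in best.items()}


quotes = [
    {"asset_id": "VWRL", "day": "2026-05-15", "source": "MANUAL", "close": 100.0},
    {"asset_id": "VWRL", "day": "2026-05-15", "source": "YAHOO", "close": 101.5},
    {"asset_id": "PENSION", "day": "2026-05-14", "source": "YAHOO", "close": 9.0},
    {"asset_id": "PENSION", "day": "2026-05-15", "source": "MANUAL", "close": 9.2},
]
latest = latest_quotes(quotes)
```

Note that a newer MANUAL quote still beats an older YAHOO one; the
source preference only breaks same-day ties.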
Wealth dashboard gets a new bottom Positions table joining the three:
symbol, name, shares, avg cost, last price, market value, cost,
gain, return %. Gain and return % are color-text with red<0, green>=0
thresholds.
The lobby has grown enough (frontend, two Go services, devvm units +
scripts + config) that it earns its own repo. Code now lives at
https://forgejo.viktorbarzin.me/viktor/terminal-lobby with
scripts/deploy.sh covering the manual deploy until CI activation
lands (Woodpecker forge_id=2 activation still 500s; Forgejo Actions
not yet enabled).
This stack now owns only the K8s side — Services, Endpoints,
IngressRoutes, middlewares. main.tf comment block updated to point
at the new repo and the full DevVM port map.
Removed:
- stacks/terminal/files/ (index.html + DevVM artefacts)
- stacks/terminal/tmux-api/ (Go service)
- stacks/terminal/clipboard-upload/ (Go service)
Drops the hardcoded violet/indigo palette. Four themes are defined as
CSS variables on body.theme-{carbon,slate,mono,ink}:
- Carbon (default): warm dark, ivory text, restrained amber accent.
- Slate: cool dark, GitHub/Linear-ish charcoal with electric blue.
- Mono: strict greyscale, off-white accent.
- Ink: warm paper light, deep ink, terracotta accent.
The lobby reads the choice from localStorage and applies the class
before render. The picker lives at the bottom of the sidebar
(margin-top: auto pins it). On change, the iframe is bounced through
about:blank so the inner xterm picks up the new computed CSS vars
(--terminal-bg/fg/cursor/selection) on the next mount.
Picker UI uses native buttons, current theme highlighted with the
accent border + color. No gradients, hairline borders only.
Backend: POST /sessions/<name>/rename in tmux-api runs tmux
rename-session as the mapped OS user. 400 on bad name, 404 on missing
source, 409 on duplicate target, 401 on missing auth header.
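A Python model of the status-code mapping above. The real handler is
in the Go tmux-api service; the name regex, the order of the checks
and the 200 success code here are assumptions.

```python
import re

# Hypothetical stand-in for the shared session-name regex.
NAME_RE = re.compile(r"^[A-Za-z0-9_-]{1,64}$")


def rename_status(auth_user, existing, source, target):
    """Return the HTTP status for POST /sessions/<source>/rename."""
    if not auth_user:
        return 401  # missing auth header
    if not NAME_RE.match(target):
        return 400  # bad name
    if source not in existing:
        return 404  # missing source session
    if target in existing:
        return 409  # duplicate target
    return 200      # assumed success code


sessions = {"main", "build"}
print(rename_status("viktor", sessions, "main", "dev"))
```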
Frontend:
- Rename button per card → prompt() dialog, validates against the
shared regex. Updates currentActive + hash + iframe.src if the
renamed session was active.
- Session order is now user-driven, persisted in localStorage
keyed per osUser. New sessions append at the bottom. The previous
sort-by-lastActivity is gone.
- HTML5 drag-and-drop reorders cards live during dragover; dragend
captures the DOM order into localStorage.
- Polling renderLobby is suppressed while a drag is in flight so the
5s tick doesn't yank the list out from under the user.
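The ordering rule (saved order first, new sessions appended at the
bottom, vanished sessions dropped) is shown here in Python for
brevity. The frontend does this in JavaScript against localStorage;
the function name is hypothetical.

```python
def merge_order(saved_order, live_sessions):
    """Combine the persisted order with the sessions that exist now."""
    live = set(live_sessions)
    # Keep saved entries that still exist, in the user's order.
    kept = [name for name in saved_order if name in live]
    known = set(kept)
    # Anything not seen before goes to the bottom, in arrival order.
    appended = [name for name in live_sessions if name not in known]
    return kept + appended


order = merge_order(["b", "a", "gone"], ["a", "b", "c"])
print(order)
```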
Replace full-page navigation with a two-pane lobby. Sidebar holds the
session list as clickable cards; an iframe in the content pane swaps
its src on click so switching sessions takes one click instead of two
navigations.
- #lobby-shell grid (260px sidebar + iframe pane)
- Cards become role=button, kill button stops propagation
- activateSession/deactivateSession with hash routing
(location.hash <-> active session, replaceState so back stack stays
clean)
- Killed active session deactivates the iframe before re-render
- 5s session poll preserves currentActive; deactivates if gone
- Mobile media query collapses to one column
CSP frame-ancestors already permits same-origin embedding
(*.viktorbarzin.me), no infra changes needed. Direct-link
?arg=<name> path is unchanged.