Commit graph

17 commits

Author SHA1 Message Date
Viktor Barzin
903fc8377f [cleanup] Remove ollama from dashy + docs + nfs_directories
## Context
Final stage (9) of ollama decommission. After the stack was destroyed in
commit 0386f03f, several residual references remained:
- Vault KV `secret/ollama` (metadata + versions)
- `secrets/nfs_directories.txt` line listing `ollama` as a backup target
- `stacks/dashy/conf.yml` — "Ollama" tile linking to `ollama.viktorbarzin.me`
- `stacks/homepage/INGRESS_WIDGET_MAPPING.md` — 3 rows documenting the
  now-removed ingresses (ollama, ollama-api, ollama-server)

## This change
- `vault kv metadata delete secret/ollama` → all versions + metadata deleted.
- `secrets/nfs_directories.txt`: removed the `ollama` entry (line 71).
- `stacks/dashy/conf.yml`: removed the Ollama tile (`&ref_42`) and its
  reference at the end of the list; applied via Terragrunt so the running
  dashy ConfigMap picks up the change. Dashy apply: 0 added, 4 changed, 0
  destroyed (the ConfigMap diff plus the usual benign Kyverno drift).
- `stacks/homepage/INGRESS_WIDGET_MAPPING.md`: removed the 3 ollama rows.

## What was considered but NOT changed
- `stacks/ytdlp/yt-highlights/app/main.py`: `OLLAMA_URL = os.getenv("OLLAMA_URL", "")`
  already falls back to empty string when unset; the env var is no longer
  injected (stage 3) so this path is dead at runtime. Leaving source alone
  to keep this commit scoped to infra-only cleanup — future app-level
  cleanup can remove the dead fallback code.
- `stacks/k8s-portal/modules/k8s-portal/files/src/routes/agent/+server.ts`:
  only mentions `var.ollama_host` in a documentation string inside a
  system-prompt template — non-functional. Will fix in a separate commit
  alongside the k8s-portal agent docs pass.

## Test plan
### Automated
- `vault kv get secret/ollama` → "No value found" (confirmed after delete).
- `scripts/tg apply` on dashy → "Apply complete! Resources: 0 added, 4 changed, 0 destroyed."
- `grep -n ollama secrets/nfs_directories.txt` → empty.

### Manual Verification
1. Open `https://dashy.viktorbarzin.me/` → Ollama tile is gone.
2. `kubectl get cm -n dashy dashy-config -o yaml | grep -i ollama` → no matches.
3. `vault kv get secret/ollama` → error "No value found at secret/data/ollama".
4. On PVE host: `rm -rf /srv/nfs-ssd/ollama` (optional — I skipped the
   on-host disk cleanup; it's a manual ops step the user can run when
   comfortable).

Closes: code-1gu

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 11:17:59 +00:00
Viktor Barzin
fb199e2da9 [ci skip] remove atuin: destroy stack, DNS, NFS export, PostgreSQL credentials 2026-03-06 20:11:14 +00:00
Viktor Barzin
678f92ffb4 [ci skip] onlyoffice: cache fonts/themes on NFS for fast restarts
Persist font cache (159MB) and theme images (10MB) to NFS volume.
Set GENERATE_FONTS=false to skip regeneration on startup since cache
is warm. Startup time: ~3 min -> 5 seconds.
2026-03-01 18:02:38 +00:00
Viktor Barzin
b10d43b7a7 [ci skip] openclaw: persist home directory on NFS
- Switch openclaw-home from emptyDir to NFS (/mnt/main/openclaw/home)
- Persists SOUL.md, IDENTITY.md, sessions, memory DB, telegram state,
  device identity, and all runtime files across pod restarts
- Init container still refreshes openclaw.json and kubeconfig on each start
2026-03-01 16:12:07 +00:00
Viktor Barzin
e8ff760aff [ci skip] openclaw: cache tools on NFS for fast restarts
- Switch /tools volume from emptyDir to NFS (/mnt/main/openclaw/tools)
- Skip download of kubectl, terraform, terragrunt, pip packages if cached
- Startup time: ~2.5min → ~38s on subsequent restarts
2026-03-01 13:59:07 +00:00
Viktor Barzin
2b22c90a56 [ci skip] Phase 2: migrate Redis from NFS to local disk
- Switch from redis/redis-stack:latest to redis:7-alpine
  (modules were completely unused — zero module commands in stats)
- Move data from NFS (/mnt/main/redis) to local-path PVC
  (RDB saves: 39s on NFS → <1s on local disk)
- Start fresh (old RDB had redis-stack module data incompatible with plain redis;
  all Redis data is transient — queues and caches rebuild automatically)
- Add hourly redis-backup CronJob: redis-cli --rdb to NFS for backup pipeline
- Remove RedisInsight UI ingress (port 8001, only in redis-stack)
- Add redis-backup to NFS exports
- 110 clients reconnected immediately after switchover
- Memory savings: ~100MB from dropping unused modules
2026-02-28 19:44:08 +00:00
Viktor Barzin
a1ba218cd2 [ci skip] Phase 1: PostgreSQL migrated to CNPG on local disk
Major milestone - shared PostgreSQL moved from NFS to CloudNativePG:
- CNPG cluster (pg-cluster) running in dbaas namespace on local-path storage
- PostGIS image (ghcr.io/cloudnative-pg/postgis:16) for dawarich compatibility
- All 20 databases and 19 roles restored from pg_dumpall backup
- postgresql.dbaas Service patched to point at CNPG primary
- Old PG deployment scaled to 0 (NFS data intact for rollback)
- All 12+ dependent services verified running:
  authentik, n8n, dawarich, tandoor, linkwarden, netbox, woodpecker,
  rybbit, affine, health, resume, trading-bot, atuin
- Authentik PgBouncer working through the switched endpoint

TODO: codify CNPG cluster in Terraform, add 2nd replica, update backup CronJob
2026-02-28 19:08:06 +00:00
Viktor Barzin
0274cc0722 [ci skip] technitium: add primary-secondary DNS HA with AXFR zone replication
Secondary instance on a separate node replicates all zones from primary via
zone transfer. LoadBalancer routes DNS queries to both pods. PDB ensures at
least 1 DNS pod survives voluntary disruptions. Setup job automates zone
transfer enablement and secondary zone creation via Technitium REST API.
2026-02-28 14:14:20 +00:00
Viktor Barzin
c8de2c4803 [ci skip] Sunset Drone CI: remove all artifacts, DNS, configs, and references
Drone CI has been fully replaced by Woodpecker CI at ci.viktorbarzin.me.
Destroys K8s resources (12), removes DNS records, NFS exports, Uptime Kuma
monitor, dashboard entry, and all code/doc references across 18 files.
2026-02-23 19:38:55 +00:00
Viktor Barzin
cbf041bcc9 [ci skip] Add Woodpecker CI stack (WIP) and claude agents
- Add stacks/woodpecker/ with Helm-based deployment config
- Add .woodpecker/ CI pipeline configs (default, build-cli, renew-tls)
- Add NFS export entry for woodpecker
- Add .claude/agents/ definitions
2026-02-22 21:30:25 +00:00
Viktor Barzin
c277d28bd8 [ci skip] Add NFS export and DNS record for poison-fountain 2026-02-22 19:47:46 +00:00
Viktor Barzin
2fe7fa547c [ci skip] Configure f1-stream: WebAuthn, NFS storage, headless browser
- Set WEBAUTHN_RPID/ORIGIN for f1.viktorbarzin.me domain
- Add NFS volume at /mnt/main/f1-stream for persistent session/stream data
- Enable headless browser extraction (HEADLESS_EXTRACT_ENABLED=true)
- Reduce replicas to 1 (file-based sessions don't work across replicas)
2026-02-21 15:57:25 +00:00
Viktor Barzin
843b9658d5 [ci skip] Rename moltbot to openclaw across Terraform, K8s resources, and DNS
Update terraform version in init container from 1.12.1 to 1.14.5.
2026-02-18 21:53:46 +00:00
Viktor Barzin
a73f3fcb6b Cluster health remediation: cleanup CronJob, disable Collabora, fix GPU probe, add NFS exports [ci skip]
- Add daily CronJob to auto-clean Failed/Evicted pods cluster-wide (infra-maintenance)
- Disable Collabora in Nextcloud (broken HPA caused scaling storm; using OnlyOffice instead)
- Increase gpu-pod-exporter liveness probe timeout from 1s to 5s
- Add osm-routing NFS exports (osrm-data, otp-data)
2026-02-15 17:20:47 +00:00
Viktor Barzin
69aae2ec9d [ci skip] Fix code review findings: correct Alertmanager URL, add atomic to Loki, remove dead minio NFS export, update design doc 2026-02-13 23:08:44 +00:00
Viktor Barzin
a44dfac721 [ci skip] Deploy MoltBot (OpenClaw) AI agent gateway
Add new Kubernetes service for OpenClaw gateway connected to in-cluster
Ollama, with kubectl/terraform/git access for infrastructure management.
Protected behind Authentik SSO.
2026-02-13 22:57:36 +00:00
Viktor Barzin
861cd80c64 add the nfs dirs 2026-02-08 02:29:48 +00:00