Commit graph

221 commits

Author SHA1 Message Date
Viktor Barzin
2b5e4888e0
add Google OAuth env vars to plotting-book deployment
Deploy GOOGLE_CLIENT_ID, GOOGLE_CLIENT_SECRET, and GOOGLE_CALLBACK_URL
to the plotting-book container. Update CSP to allow accounts.google.com
for connect-src and form-action directives.
2026-03-07 20:41:08 +00:00
Viktor Barzin
d4400f8283
[ci skip] remove atuin: destroy stack, DNS, NFS export, PostgreSQL credentials 2026-03-06 20:11:14 +00:00
Viktor Barzin
1d80c49201
[ci skip] iSCSI migration, healthcheck fixes, health probes, etcd backup
- Migrate MySQL/PostgreSQL storage from local-path to iscsi-truenas
- Add democratic-csi iSCSI driver module for TrueNAS
- Add open-iscsi to cloud-init VM template
- Fix Shlink health probe path (/api/v3 -> /rest/v3 for Shlink 5.0)
- Fix etcd backup: use etcd 3.5.21-0 (3.6.x is distroless, no /bin/sh)
- Fix cluster healthcheck CronJob: always exit 0 to prevent circular
  JobFailed alerts (reporting via Slack, not exit codes)
- Fix Uptime Kuma nested list handling in cluster-health.sh
- Add health probes to: audiobookshelf, immich ML, ntfy, headscale,
  uptime-kuma, vaultwarden, rybbit (clickhouse + server + client),
  shlink, shlink-web
- Add iSCSI storage documentation to CLAUDE.md
2026-03-06 19:54:21 +00:00
Viktor Barzin
78d5aeb5db
[ci skip] f1-stream: add Discord token and channel env vars 2026-03-01 20:17:38 +00:00
Viktor Barzin
30702057f9
[ci skip] openclaw: fix Telegram, update to v2026.2.26, fix startup issues
- Update OpenClaw from v2026.2.9 to v2026.2.26 (fixes Telegram channel)
- Add gateway.mode=local + wizard block (required for channel startup)
- Add dangerouslyAllowHostHeaderOriginFallback (v2026.2.26 requirement)
- Run doctor --fix at container startup to auto-enable Telegram
- Create required dirs (canvas, devices, cron, sessions, credentials)
- Fix permissions: chown -R 1000:1000 for node user
- Telegram: DM allowlist, user 8281953845 only
2026-03-01 15:47:54 +00:00
Viktor Barzin
2dda360b29
[ci skip] openclaw: add Telegram channel + install terragrunt in init container
- Add Telegram bot integration (DM allowlist, user 8281953845 only)
- Install terragrunt v0.99.4 in init container alongside terraform
- Remove terraform init from init (terragrunt handles this per-stack)
- Add openclaw_telegram_bot_token variable
2026-03-01 13:44:58 +00:00
Viktor Barzin
d9946506cd
[ci skip] openclaw: switch to free agentic models via NVIDIA NIM, OpenRouter, Llama API
- Primary: Mistral Large 3 (675B) on NIM - always warm, excellent tool calling
- Fallback 1: Nemotron Ultra 253B on NIM
- Fallback 2: Llama 4 Maverick on Llama API (different provider for resilience)
- 10 models total across 3 providers, all free
- Removed: Modal (GLM-5), Gemini, Ollama providers
- Added: NVIDIA NIM provider with DeepSeek V3.2, Qwen 3.5, Qwen 3 Coder, GLM-5
- Bumped maxTokens from 8192 to 16384 for agentic output room
2026-03-01 13:22:47 +00:00
Viktor Barzin
9d9c8fdc12
[ci skip] Phase 1: PostgreSQL migrated to CNPG on local disk
Major milestone - shared PostgreSQL moved from NFS to CloudNativePG:
- CNPG cluster (pg-cluster) running in dbaas namespace on local-path storage
- PostGIS image (ghcr.io/cloudnative-pg/postgis:16) for dawarich compatibility
- All 20 databases and 19 roles restored from pg_dumpall backup
- postgresql.dbaas Service patched to point at CNPG primary
- Old PG deployment scaled to 0 (NFS data intact for rollback)
- All 12+ dependent services verified running:
  authentik, n8n, dawarich, tandoor, linkwarden, netbox, woodpecker,
  rybbit, affine, health, resume, trading-bot, atuin
- Authentik PgBouncer working through the switched endpoint

TODO: codify CNPG cluster in Terraform, add 2nd replica, update backup CronJob
2026-02-28 19:08:06 +00:00
Viktor Barzin
a1b669ae97
[ci skip] Deploy VPA + Goldilocks for dynamic resource right-sizing
Add Vertical Pod Autoscaler (recommender, updater, admission-controller)
and Goldilocks dashboard to monitor resource recommendations across all
namespaces. Dashboard at goldilocks.viktorbarzin.me behind Authentik.
2026-02-25 21:54:01 +00:00
Viktor Barzin
900f72d60d
[ci skip] Fix Woodpecker GitHub forge: add explicit GITHUB_URL to prevent Forgejo URL bleed
When both WOODPECKER_GITHUB and WOODPECKER_FORGEJO are enabled without
an explicit WOODPECKER_GITHUB_URL, the GitHub forge inherits the Forgejo
URL causing all GitHub API calls to hit forgejo.viktorbarzin.me with
GitHub OAuth credentials, resulting in 401 Unauthorized on repo add and
cron jobs. Also adds Forgejo forge variables to Terraform.
2026-02-24 23:02:33 +00:00
Viktor Barzin
c665544f41
[ci skip] fix plotting-book: add SESSION_SECRET env var
Session secret stored in encrypted terraform.tfvars, referenced
via variable to avoid committing secrets in plain text.
2026-02-23 22:44:04 +00:00
Viktor Barzin
2d919c4d34
[ci skip] Infrastructure hardening: security, monitoring, reliability, maintainability
Phase 1 - Critical Security:
- Netbox: move hardcoded DB/superuser passwords to variables
- MeshCentral: disable public registration, add Authentik auth
- Traefik: disable insecure API dashboard (api.insecure=false)
- Traefik: configure forwarded headers with Cloudflare trusted IPs

Phase 2 - Security Hardening:
- Add security headers middleware (HSTS, X-Frame-Options, nosniff, etc.)
- Add Kyverno pod security policies in audit mode (privileged, host
  namespaces, SYS_ADMIN, trusted registries)
- Tighten rate limiting (avg=10, burst=50)
- Add Authentik protection to grampsweb

Phase 3 - Monitoring & Alerting:
- Add critical service alerts (PostgreSQL, MySQL, Redis, Headscale,
  Authentik, Loki)
- Increase Loki retention from 7 to 30 days (720h)
- Add predictive PV filling alert (predict_linear)
- Re-enable Hackmd and Privatebin down alerts

Phase 4 - Reliability:
- Add resource requests/limits to Redis, DBaaS, Technitium, Headscale,
  Vaultwarden, Uptime Kuma
- Increase Alloy DaemonSet memory to 512Mi/1Gi

Phase 6 - Maintainability:
- Extract duplicated tiers locals to terragrunt.hcl generate block
  (removed from 67 stacks)
- Replace hardcoded NFS IP 10.0.10.15 with var.nfs_server (114
  instances across 63 files)
- Replace hardcoded Redis/PostgreSQL/MySQL/Ollama/mail host references
  with variables across ~35 stacks
- Migrate xray raw ingress resources to ingress_factory modules
2026-02-23 22:05:28 +00:00
Viktor Barzin
59c862fe18
[ci skip] dns: remove stale SendGrid CNAME records
Remove em7107, s1._domainkey, and s2._domainkey SendGrid CNAME
records from the bind zone. These are remnants from a previous
relay setup that is no longer in use.
2026-02-23 20:31:07 +00:00
Viktor Barzin
e95ef07b04
[ci skip] mailserver: tighten DMARC policy to quarantine
Move DMARC enforcement from p=none (monitoring only) to p=quarantine
so spoofed emails from viktorbarzin.me are quarantined by recipients.
2026-02-23 20:30:30 +00:00
Viktor Barzin
0eababf212
[ci skip] Sunset Drone CI: remove all artifacts, DNS, configs, and references
Drone CI has been fully replaced by Woodpecker CI at ci.viktorbarzin.me.
Destroys K8s resources (12), removes DNS records, NFS exports, Uptime Kuma
monitor, dashboard entry, and all code/doc references across 18 files.
2026-02-23 19:38:55 +00:00
Viktor Barzin
178884714f
[ci skip] Add NFS export and DNS record for poison-fountain 2026-02-22 19:47:46 +00:00
Viktor Barzin
767a8250f6
[ci skip] Disable grampsweb service and remove family DNS record 2026-02-21 18:55:54 +00:00
Viktor Barzin
a4817e8192
[ci skip] Add turn.viktorbarzin.me to non-proxied DNS names 2026-02-21 18:15:06 +00:00
Viktor Barzin
fdf374b751
[ci skip] Add coturn TURN/STUN server for WebRTC relay
- Deploy coturn on k8s with MetalLB shared IP (10.0.20.200)
- Normal pod networking (no hostNetwork), runs on any node
- 100 relay ports (49152-49252), port 3478 for STUN/TURN signaling
- Shared secret auth for time-limited TURN credentials
- For F1 streaming WebRTC NAT traversal
2026-02-21 18:08:01 +00:00
Viktor Barzin
8ec983e3fd
[ci skip] Real estate crawler: 2 replicas for UI/API, rolling update for celery
- UI and API: 1 → 2 replicas for zero-downtime during restarts/crashes
- Celery worker: Recreate → RollingUpdate strategy
- Celery beat: unchanged (Recreate, singleton scheduler)
- Move f1 from Cloudflare proxied to non-proxied DNS
2026-02-21 17:32:45 +00:00
Viktor Barzin
dbab20995b
[ci skip] Add Modal GLM-5 model to OpenClaw, fix streaming and download reliability
- Add modal provider (GLM-5-FP8) as primary model with non-streaming mode
  (GLM-5 uses non-standard reasoning_content field incompatible with streaming)
- Add curl --retry flags to init container downloads for reliability
- Fallback chain: GLM-5 → Gemini 2.5 Flash → Llama 3.3 70B
2026-02-19 23:17:08 +00:00
Viktor Barzin
1206b3860b
[ci skip] Remove Authentik forward auth from Grafana, add admin password management
Fixes HA mobile app 403 when embedding Grafana dashboards - the webview
blocks third-party cookies needed by Authentik forward auth. Grafana
already has anonymous Viewer access enabled, so forward auth is not
needed. Also adds grafana_admin_password variable and explicit resource
limits to prevent ResourceQuota issues during rolling updates.
2026-02-18 21:40:32 +00:00
Viktor Barzin
f8b07b3bb9
[ci skip] Add anca as namespace-owner for plotting-book
- Add ancaelena98@gmail.com as namespace-owner for plotting-book namespace
- Fix RBAC module: don't create namespaces (they're managed by service modules)
- RoleBinding to built-in admin ClusterRole + cluster-wide read-only access
- ResourceQuota: 2 CPU / 4Gi mem requests, 4 CPU / 8Gi limits, 20 pods
2026-02-17 22:18:37 +00:00
Viktor Barzin
14ab6f115f
[ci skip] Fix multi-user k8s access: redirect URIs, email scope, image ref
- Change portal image to viktorbarzin/k8s-portal:latest (Docker Hub)
- Add k8s-portal to cloudflare_non_proxied_names
- Add k8s_users with viktor admin entry to terraform.tfvars
2026-02-17 22:15:20 +00:00
Viktor Barzin
d0b39f1987
[ci skip] Implement multi-user Kubernetes access with OIDC
- Add RBAC module (modules/kubernetes/rbac/) with admin, power-user,
  and namespace-owner roles, API server OIDC flags, and audit logging
- Add self-service portal (modules/kubernetes/k8s-portal/) SvelteKit app
  with kubeconfig download and setup instructions
- Configure Alloy to collect audit logs from kube-apiserver
- Add Grafana dashboard for Kubernetes audit log visualization
- Configure Authentik OIDC provider with groups scope mapping
- Wire up k8s_users and ssh_private_key variables through module chain
2026-02-17 21:42:39 +00:00
Viktor Barzin
c0363be5e4
[ci skip] Add Grafana dashboard for Technitium DNS query logs
Add MySQL datasource and 15-panel dashboard for DNS analytics:
queries over time, response codes, top domains/clients, response
times, blocked/NxDomain domains. Enable Grafana dashboard sidecar
for auto-provisioning dashboards from ConfigMaps.
2026-02-16 23:06:41 +00:00
Viktor Barzin
c05614e4b8
update the scrape schedule for wrongmove [ci skip] 2026-02-15 14:40:05 +00:00
Viktor Barzin
c330648b7b
[ci skip] Deploy MoltBot (OpenClaw) AI agent gateway
Add new Kubernetes service for OpenClaw gateway connected to in-cluster
Ollama, with kubectl/terraform/git access for infrastructure management.
Protected behind Authentik SSO.
2026-02-13 22:57:36 +00:00
Viktor Barzin
e0ff08978d
[ci skip] add vibetunnel proxy 2026-02-13 18:20:50 +00:00
Viktor Barzin
d911db6cd9
[ci skip] Deploy Gramps Web genealogy service
Add grampsweb module with web app + Celery worker in a single pod,
using shared Redis (DB 2/3), NFS storage, email via mailserver,
and Ollama AI integration. Available at family.viktorbarzin.me.
2026-02-08 02:30:18 +00:00
Viktor Barzin
43bee50de8
[ci skip] Deploy health dashboard service
Apple Health data visualization app (Svelte + FastAPI + Caddy).
Uses shared PostgreSQL via DBaaS, NFS storage for uploads,
accessible at health.viktorbarzin.me.
2026-02-08 01:54:24 +00:00
Viktor Barzin
8bbf4e51da
Add registry DNS record and real-estate scrape schedules
Add registry.viktorbarzin.me to non-proxied DNS names. Add scrape
schedule config for real-estate-crawler. Fix crowdsec var formatting.
2026-02-07 22:38:42 +00:00
Viktor Barzin
792f76454c
Add Traefik dashboard ingress with Authentik protection
- Enable api.insecure in Helm values for internal dashboard access on port 8080
- Add TLS secret, dashboard service, and ingress via ingress_factory (protected=true)
- Pass tls_secret_name to traefik module
- Add traefik to cloudflare_non_proxied_names DNS list
2026-02-07 13:06:57 +00:00
Viktor Barzin
cf25e1af4e
Add Celery worker/beat deployments and fix crawler API config
Add celery worker and celery beat deployments for background task
processing and scheduled scraping. Fix API container name, add
image_pull_policy Always, and add missing path_type to ingress rules.
2026-02-06 20:31:34 +00:00
Viktor Barzin
29567103d6 Add DRONE_WEBHOOK_SECRET for GitHub webhook authentication
Fixes webhook signature validation failures causing 400 errors.
2026-02-01 20:42:07 +00:00
Viktor Barzin
19a41367ba
add reactive resume service [ci skip] 2026-01-28 17:57:39 +00:00
Viktor Barzin
947c5d3d19 Add AFFiNE visual canvas for storytelling
- Deploy AFFiNE as self-hosted visual canvas tool
- Uses shared PostgreSQL and Redis from cluster
- NFS storage for uploads and configuration
- Email configured via mailserver.viktorbarzin.me
- Ingress at affine.viktorbarzin.me

[ci skip]
2026-01-25 21:40:39 +00:00
Viktor Barzin
82ae4b411a
add mcaptcha but disabled as we found another way[ci skip] 2026-01-24 18:43:43 +00:00
Viktor Barzin
6e4cfb4c3a
add ollama-api ingress accessible only locally to allow claude code [ci skip] 2026-01-19 20:15:46 +00:00
Viktor Barzin
4642522fd5
update resume to be a bit more working; still not workign but closer...[ci skip] 2026-01-18 14:05:01 +00:00
Viktor Barzin
4ccf2298fa
add freedify [ci skip] 2026-01-17 22:40:35 +00:00
Viktor Barzin
b30bab8bd7
add emo instance for actual budget [ci skip] 2026-01-17 15:01:29 +00:00
Viktor Barzin
a1fd715e4d
add speedtest deployment [ci skip] 2026-01-13 20:34:44 +00:00
Viktor Barzin
9f34337d04
disable auth-response-headers for idrac and gw ingresses as they cause errors on the upstream [ci skip] 2026-01-10 20:41:00 +00:00
Viktor Barzin
bfa53c5455
add credentials for ab bank sync cronjob [ci skip] 2026-01-10 20:01:06 +00:00
Viktor Barzin
20cd480988
monitor idrac more frequently [ci skip] 2026-01-07 18:55:59 +00:00
Viktor Barzin
e4473efaea
add netbox, ebook2audiobook, audiblez, aiostreams and listenarr; alos reenable prowlarr, qbittorrent [ci skip] 2026-01-03 16:58:57 +00:00
Viktor Barzin
e3387671a8
refactor cloudflared module to make changing between for_each and count easier [ci skip] 2025-12-29 12:22:55 +00:00
Viktor Barzin
8be0fc9699
add more alerts in prometheus and gorup them better [ci skip] 2025-12-28 20:07:33 +00:00
Viktor Barzin
90bdd38de1
migrate grafana to mysql from sqlite [ci skip] 2025-12-27 20:51:05 +00:00