Commit graph

1612 commits

Author SHA1 Message Date
Viktor Barzin
767a8250f6
[ci skip] Disable grampsweb service and remove family DNS record 2026-02-21 18:55:54 +00:00
Viktor Barzin
3739906747
[ci skip] Add skills: pfsense-nat-rule-creation, coturn-k8s-without-hostnetwork 2026-02-21 18:29:32 +00:00
Viktor Barzin
a4817e8192
[ci skip] Add turn.viktorbarzin.me to non-proxied DNS names 2026-02-21 18:15:06 +00:00
Viktor Barzin
fdf374b751
[ci skip] Add coturn TURN/STUN server for WebRTC relay
- Deploy coturn on k8s with MetalLB shared IP (10.0.20.200)
- Normal pod networking (no hostNetwork), runs on any node
- 100 relay ports (49152-49252), port 3478 for STUN/TURN signaling
- Shared secret auth for time-limited TURN credentials
- For F1 streaming WebRTC NAT traversal
2026-02-21 18:08:01 +00:00
Viktor Barzin
8ec983e3fd
[ci skip] Real estate crawler: 2 replicas for UI/API, rolling update for celery
- UI and API: 1 → 2 replicas for zero-downtime during restarts/crashes
- Celery worker: Recreate → RollingUpdate strategy
- Celery beat: unchanged (Recreate, singleton scheduler)
- Move f1 from Cloudflare proxied to non-proxied DNS
2026-02-21 17:32:45 +00:00
Viktor Barzin
8e867cfb55
[ci skip] Use versioned image tag for f1-stream to bypass stale cache
Pull-through cache on registry VM served stale arm64-only manifest for
:latest tag. Switch to v1.0.0 tag so cache fetches the fresh amd64 image.
2026-02-21 16:07:58 +00:00
Viktor Barzin
f23d3c220c
[ci skip] Configure f1-stream: WebAuthn, NFS storage, headless browser
- Set WEBAUTHN_RPID/ORIGIN for f1.viktorbarzin.me domain
- Add NFS volume at /mnt/main/f1-stream for persistent session/stream data
- Enable headless browser extraction (HEADLESS_EXTRACT_ENABLED=true)
- Reduce replicas to 1 (file-based sessions don't work across replicas)
2026-02-21 15:57:25 +00:00
Viktor Barzin
691a41ee8b
[ci skip] Fix f1-stream port mismatch: container listens on 8080, not 80 2026-02-21 15:42:47 +00:00
Viktor Barzin
25df219c86
[ci skip] Increase Drone CI namespace resource quota
Double CPU and memory limits to give CI pipelines more headroom.
2026-02-21 14:49:16 +00:00
Viktor Barzin
71a4fb172c
[ci skip] Add Music Assistant librespot stale credentials skill
New skill: music-assistant-librespot-wrong-account
- Documents fix for Spotify playback failing with "librespot does not support
  free accounts" when cached credentials point to wrong Spotify account
- Includes step-by-step solution: find container, inspect cache, clear and restart

Updated: home-assistant skill with Music Assistant addon details for ha-sofia
2026-02-21 11:23:24 +00:00
Viktor Barzin
500c0c2191
[ci skip] Add Kyverno policy to inject ndots:2 on all pods
Reduces NxDomain query flood caused by Kubernetes default ndots:5 search
domain expansion. 78% of DNS queries were wasted NxDomain lookups.
2026-02-20 00:21:03 +00:00
Viktor Barzin
238db327f9
[ci skip] Add ground rules: no secrets, CI/CD required, monitoring required 2026-02-19 23:48:44 +00:00
Viktor Barzin
dbab20995b
[ci skip] Add Modal GLM-5 model to OpenClaw, fix streaming and download reliability
- Add modal provider (GLM-5-FP8) as primary model with non-streaming mode
  (GLM-5 uses non-standard reasoning_content field incompatible with streaming)
- Add curl --retry flags to init container downloads for reliability
- Fallback chain: GLM-5 → Gemini 2.5 Flash → Llama 3.3 70B
2026-02-19 23:17:08 +00:00
Viktor Barzin
66f1867ccb
[ci skip] Update knowledge base: add OpenClaw service, rename moltbot references 2026-02-18 22:39:58 +00:00
Viktor Barzin
14296a3966
[ci skip] Rename moltbot to openclaw across Terraform, K8s resources, and DNS
Update terraform version in init container from 1.12.1 to 1.14.5.
2026-02-18 21:53:46 +00:00
Viktor Barzin
1206b3860b
[ci skip] Remove Authentik forward auth from Grafana, add admin password management
Fixes HA mobile app 403 when embedding Grafana dashboards - the webview
blocks third-party cookies needed by Authentik forward auth. Grafana
already has anonymous Viewer access enabled, so forward auth is not
needed. Also adds grafana_admin_password variable and explicit resource
limits to prevent ResourceQuota issues during rolling updates.
2026-02-18 21:40:32 +00:00
Viktor Barzin
f23246f9a3
[ci skip] Add skills: authentik-oidc-kubernetes, kubelet-static-pod-manifest-update
Two skills extracted from multi-user k8s access implementation:
- authentik-oidc-kubernetes: 6 gotchas for Authentik OIDC + kube-apiserver
- kubelet-static-pod-manifest-update: full restart cycle for static pod changes
2026-02-17 22:56:03 +00:00
Viktor Barzin
1457a04695
[ci skip] Add Authentik management skill for API-based identity provider control 2026-02-17 22:55:41 +00:00
Viktor Barzin
bcb1e7f79f
[ci skip] Fix setup script: handle sudo-less environments, add extra scopes 2026-02-17 22:27:03 +00:00
Viktor Barzin
87719a197d
[ci skip] Add one-command setup scripts to k8s-portal
- Add /setup/script?os=mac and /setup/script?os=linux endpoints
- Scripts install kubectl, kubelogin, write kubeconfig, update shell rc
- Unprotected ingress for /setup/script (curl-able without auth)
- Fix kubeconfig to include --oidc-extra-scope for email/profile/groups
2026-02-17 22:22:41 +00:00
Viktor Barzin
f8b07b3bb9
[ci skip] Add anca as namespace-owner for plotting-book
- Add ancaelena98@gmail.com as namespace-owner for plotting-book namespace
- Fix RBAC module: don't create namespaces (they're managed by service modules)
- RoleBinding to built-in admin ClusterRole + cluster-wide read-only access
- ResourceQuota: 2 CPU / 4Gi mem requests, 4 CPU / 8Gi limits, 20 pods
2026-02-17 22:18:37 +00:00
Viktor Barzin
84d9a3f926
[ci skip] Update CLAUDE.md with OIDC gotchas and k8s multi-user notes 2026-02-17 22:16:46 +00:00
Viktor Barzin
14ab6f115f
[ci skip] Fix multi-user k8s access: redirect URIs, email scope, image ref
- Change portal image to viktorbarzin/k8s-portal:latest (Docker Hub)
- Add k8s-portal to cloudflare_non_proxied_names
- Add k8s_users with viktor admin entry to terraform.tfvars
2026-02-17 22:15:20 +00:00
Viktor Barzin
03651080bb
[ci skip] Update Authentik API token reference to terraform.tfvars 2026-02-17 22:03:55 +00:00
Viktor Barzin
79ce0db11c
[ci skip] Pass skill secrets to moltbot container and fix Python env
- Add skill_secrets variable to moltbot module with HA tokens and
  Uptime Kuma password as container env vars
- Install Python packages (requests, caldav, icalendar, uptime-kuma-api)
  in init container with PYTHONPATH for main container access
- Update all skills to use python3 directly instead of ~/.venvs/claude
  venv path that doesn't exist in the container
- Remove hardcoded Uptime Kuma password from skill, use env var
2026-02-17 21:53:32 +00:00
Viktor Barzin
d0b39f1987
[ci skip] Implement multi-user Kubernetes access with OIDC
- Add RBAC module (modules/kubernetes/rbac/) with admin, power-user,
  and namespace-owner roles, API server OIDC flags, and audit logging
- Add self-service portal (modules/kubernetes/k8s-portal/) SvelteKit app
  with kubeconfig download and setup instructions
- Configure Alloy to collect audit logs from kube-apiserver
- Add Grafana dashboard for Kubernetes audit log visualization
- Configure Authentik OIDC provider with groups scope mapping
- Wire up k8s_users and ssh_private_key variables through module chain
2026-02-17 21:42:39 +00:00
Viktor Barzin
72ff14e2df
[ci skip] Add Authentik API management knowledge 2026-02-17 21:10:40 +00:00
Viktor Barzin
6a8efa69c4
[ci skip] Import Claude skills into OpenClaw moltbot
- Convert setup-project and extend-vm-storage from standalone .md
  to directory-based SKILL.md format with YAML frontmatter
- Add symlink in moltbot init container to expose Claude skills
  at ~/.openclaw/skills/ for auto-discovery by OpenClaw
- Update CLAUDE.md skill path references
2026-02-17 21:09:12 +00:00
Viktor Barzin
734c173f78
[ci skip] Add multi-user Kubernetes access implementation plan 2026-02-17 20:49:14 +00:00
Viktor Barzin
d9913cde2a
[ci skip] Add multi-user Kubernetes access design document 2026-02-17 20:44:23 +00:00
Viktor Barzin
587b649650
[ci skip] Increase drone namespace memory limits with custom ResourceQuota 2026-02-17 20:40:40 +00:00
Viktor Barzin
6f3395fbf5
[ci skip] Add Smart Home (ha-sofia) section to Cluster Health Overview dashboard 2026-02-17 19:48:02 +00:00
Viktor Barzin
c0363be5e4
[ci skip] Add Grafana dashboard for Technitium DNS query logs
Add MySQL datasource and 15-panel dashboard for DNS analytics:
queries over time, response codes, top domains/clients, response
times, blocked/NxDomain domains. Enable Grafana dashboard sidecar
for auto-provisioning dashboards from ConfigMaps.
2026-02-16 23:06:41 +00:00
Viktor Barzin
1802b4c86d
[ci skip] Add pfsense-dnsmasq-interface-binding skill, update ndots skill to v1.1.0 2026-02-16 22:30:57 +00:00
Viktor Barzin
a268b9107f
[ci skip] Replace specific CoreDNS catch-all blocks with generic template regex
Single template regex in the viktorbarzin.lan block catches ALL search
domain expansion junk (*.com.viktorbarzin.lan, *.cluster.local.viktorbarzin.lan,
etc.) instead of needing separate server blocks per pattern. Legitimate
single-label queries (idrac.viktorbarzin.lan) fall through to Technitium.
2026-02-16 21:49:03 +00:00
Viktor Barzin
19136c21f1
[ci skip] Fix .viktorbarzin.lan.viktorbarzin.lan duplicate DNS queries
Add CoreDNS catch-all block for viktorbarzin.lan.viktorbarzin.lan to
return NXDOMAIN immediately, preventing search domain expansion junk
queries from reaching Technitium. Add trailing dots to Prometheus
scrape targets (idrac, ups, ha-sofia) to bypass ndots expansion.
2026-02-16 21:38:38 +00:00
Viktor Barzin
c08bab7da7
[ci skip] Update preference: always use cluster_healthcheck.sh for health checks 2026-02-16 21:19:49 +00:00
Viktor Barzin
205eb2704b
[ci skip] Fix Technitium DNS client IP logging: bypass Traefik L4 proxy
DNS queries were going through Traefik's IngressRouteUDP, replacing
real client IPs with Traefik pod IPs (10.10.169.150) in Technitium logs.
Changed Technitium DNS service from NodePort to LoadBalancer with
externalTrafficPolicy: Local, removed dns-udp entrypoint and
IngressRouteUDP from Traefik, and updated CoreDNS to forward .lan
queries to Technitium's LoadBalancer IP directly.
2026-02-16 21:16:16 +00:00
Viktor Barzin
3d4cdf3203
[ci skip] Fix Alloy OOMKill and iDRAC priority class conflict
- Alloy: bump memory limits from 64Mi/128Mi to 256Mi/768Mi — pods were
  OOMKilled at 128Mi, steady-state usage is ~400-450Mi per node
- iDRAC Redfish Exporter: add explicit priority_class_name to resolve
  conflict between Kyverno priority injection and default priority: 0
2026-02-16 20:09:53 +00:00
Viktor Barzin
32a7c04deb
[ci skip] Remember to use cluster_healthcheck.sh for cluster status checks 2026-02-16 19:45:31 +00:00
Viktor Barzin
5c7efbdb48
[ci skip] Fix docker-registry VM: add SSH key, remove hourly restart cron
- Set explicit devvm SSH public key for cloud-init (was empty, breaking SSH access)
- Remove hourly cron that restarted all registry containers, which wiped the
  in-memory blobdescriptor cache and caused low pull-through cache hit rates
2026-02-15 22:16:41 +00:00
Viktor Barzin
7854f89add
[ci skip] Add skill: k8s-ndots-search-domain-nxdomain-flood
Documents how Kubernetes ndots:5 search domain expansion floods external
DNS with NxDomain queries, and the CoreDNS template block fix.
2026-02-15 21:52:27 +00:00
Viktor Barzin
2d015c1cb4
Update Cluster Health dashboard: dedup metrics, GPU memory, remove broken panels [ci skip] 2026-02-15 21:51:41 +00:00
Viktor Barzin
a8f42d7fc0
[ci skip] Manage CoreDNS Corefile in Terraform and block junk NxDomain queries
Add kubernetes_config_map for CoreDNS to the technitium module, with a
template block for cluster.local.viktorbarzin.lan that returns NXDOMAIN
immediately. This prevents ndots:5 search domain expansion from flooding
Technitium with ~66k/day junk queries (e.g.
redis.redis.svc.cluster.local.viktorbarzin.lan).

Also enabled saveCache on Technitium so the DNS cache persists across
pod restarts.
2026-02-15 21:51:12 +00:00
Viktor Barzin
a2b44c8ff7
Update Cluster Health dashboard: reorder rows, add links, key services, sorting [ci skip] 2026-02-15 21:24:08 +00:00
Viktor Barzin
8d38e1474b
[ci skip] Document Terraform state splitting plan for future implementation 2026-02-15 21:10:40 +00:00
Viktor Barzin
f447e45ee1
Add Cluster Health Overview Grafana dashboard [ci skip] 2026-02-15 19:38:28 +00:00
Viktor Barzin
1564ec7e79
Add tier-based resource governance via Kyverno [ci skip]
Four layers of noisy-neighbor protection using existing tier system:
- PriorityClasses (tier-0-core through tier-4-aux)
- LimitRange defaults auto-generated per namespace tier
- ResourceQuotas auto-generated per namespace tier
- PriorityClassName injection on pods via Kyverno mutate

Custom quota overrides for monitoring and crowdsec namespaces
which exceed the default tier quotas.
2026-02-15 18:48:33 +00:00
Viktor Barzin
7ef23470cd
Add Uptime Kuma monitor check to cluster health script [ci skip]
Adds check #14 that queries Uptime Kuma API for application-level
monitor status, complementing the kubectl-level checks with HTTP/ping
health data. Reports down monitors by name with PASS/WARN/FAIL thresholds.
2026-02-15 17:49:40 +00:00
Viktor Barzin
df840a0078
[ci skip] remember: spawn subagent to monitor pods instead of sleeping 2026-02-15 17:48:42 +00:00