1198 commits

Author SHA1 Message Date
Viktor Barzin
b4f68d99d8 [ci skip] Fix CrowdSec to monitor Traefik and add Slack notifications
- Switch acquisition from ingress-nginx to traefik namespace/pods
- Change collection from crowdsecurity/nginx to crowdsecurity/traefik
- Add Slack notification plugin for ban/captcha decisions
- Wire alertmanager_slack_api_url through to CrowdSec module
2026-02-11 22:25:03 +00:00
Viktor Barzin
c8a41ac567 [ci skip] Add 12 Prometheus alert rules for monitoring gaps
Add 3 new alert groups and 1 rule to existing group:
- Storage: NodeFilesystemFull (<10% free), PVFillingUp (>85% used)
- K8s Health: PodCrashLooping, ContainerOOMKilled, NodeNotReady,
  NodeConditionBad, JobFailed
- Infrastructure Health: CoreDNSErrors, ScrapeTargetDown,
  PrometheusStorageFull, PrometheusNotificationsFailing
- R730 Host: FanFailure (iDRAC Redfish fan health)
2026-02-11 22:14:30 +00:00
Viktor Barzin
dbf397841a Standardize Prometheus alert formatting and fix Slack notifications
- Add color coding (red/green) to Slack alerts, show alertname in title
- Use summary annotation in Slack text (description was always empty)
- Format all alert summaries consistently: value with units and threshold
- Fix ratio expressions (CPU/memory) to display as percentages
- Fix "failiure" typo, capitalize Tailscale
2026-02-11 21:53:22 +00:00
Viktor Barzin
d48052276e [ci skip] Add skill: traefik-rewrite-body-compression
Extracted from debugging session where packruler/rewrite-body plugin
corrupted gzip responses, breaking HA Companion app auth flow and
WebSocket connections. Fix: strip Accept-Encoding header before
rewrite-body plugin so backends send uncompressed responses.
2026-02-11 21:42:07 +00:00
Viktor Barzin
f03b8a055b [ci skip] Fix rewrite-body plugin corrupting compressed responses
The packruler/rewrite-body plugin (used for rybbit analytics injection)
fails to decompress gzip responses with "flate: corrupt input before
offset 5", corrupting the response body. This broke HA Companion app's
external_auth flow and WebSocket connections on ha-sofia.

Fix: add a strip-accept-encoding middleware that removes Accept-Encoding
from requests when rybbit is active, forcing backends to send uncompressed
responses that the plugin can safely process.

Also add extra_middlewares variable to reverse_proxy factory for
extensibility.
2026-02-11 21:40:11 +00:00
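
Note on f03b8a055b: the effect of stripping Accept-Encoding can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Traefik middleware or the rewrite-body plugin itself. The functions `strip_accept_encoding`, `backend_response`, and `inject_snippet`, the host name, and the script path are hypothetical stand-ins.

```python
import gzip


def strip_accept_encoding(headers):
    """Return a copy of the request headers without Accept-Encoding (case-insensitive)."""
    return {k: v for k, v in headers.items() if k.lower() != "accept-encoding"}


def backend_response(headers, body):
    """Hypothetical backend: gzip the body only when the client advertises gzip."""
    lowered = {k.lower(): v for k, v in headers.items()}
    if "gzip" in lowered.get("accept-encoding", ""):
        return {"Content-Encoding": "gzip"}, gzip.compress(body)
    return {}, body


def inject_snippet(body, snippet):
    """Hypothetical body rewrite: insert a snippet before </head> in an uncompressed body."""
    return body.replace(b"</head>", snippet + b"</head>")


request = {"Host": "example.invalid", "Accept-Encoding": "gzip, br"}
page = b"<html><head></head><body>ok</body></html>"

# Without the middleware the backend answers with gzip-compressed bytes.
gzip_headers, compressed = backend_response(request, page)

# With the header stripped the backend answers uncompressed, so the rewrite is safe.
plain_headers, plain = backend_response(strip_accept_encoding(request), page)
rewritten = inject_snippet(plain, b"<script src='/analytics.js'></script>")
```

The trade-off is that responses from the backend to the proxy are no longer compressed on routes where the rewrite is active.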
Viktor Barzin
036ec06256 immich to 2.5.6 [ci skip] 2026-02-10 22:01:08 +00:00
Viktor Barzin
c82f82af57 [ci skip] Add ingress-factory-migration skill 2026-02-10 21:31:48 +00:00
Viktor Barzin
73aab7f4ce [ci skip] Assorted pending changes: ollama API auth, nvidia dashboard, traefik rewrite-body plugin
- ollama: Add basicAuth middleware for external API access
- monitoring: Update nvidia dashboard (add GPU memory per app panel, bump to v9)
- plotting-book: Switch to ancamilea/book-plotter:latest, add lifecycle ignore
- reverse_proxy/factory: Fix rybbit plugin name (rewritebody -> rewrite-body)
- traefik: Switch to packruler/rewrite-body plugin v1.2.0
2026-02-10 21:29:54 +00:00
Viktor Barzin
5e1e18a044 [ci skip] Use RollingUpdate strategy for real-estate-crawler deployments
Set max_unavailable=0, max_surge=1 on both UI and API deployments
to ensure at least 1 replica is always available during updates.
2026-02-10 21:28:38 +00:00
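
Note on 5e1e18a044: the availability guarantee follows from simple arithmetic on the two settings. The sketch below assumes absolute values (not percentages) and a single-replica deployment; the replica count is an assumption, since the commit message does not state it, and `rollout_bounds` is a hypothetical helper.

```python
def rollout_bounds(replicas, max_unavailable, max_surge):
    """Return (min_available, max_total) pod counts during a rolling update.

    min_available: desired replicas minus the pods allowed to be unavailable.
    max_total: desired replicas plus the extra pods allowed during the update.
    """
    return replicas - max_unavailable, replicas + max_surge


# Assumed single-replica deployment with the settings from the commit message.
min_available, max_total = rollout_bounds(replicas=1, max_unavailable=0, max_surge=1)
```

With these values the old pod is only removed after the surge pod is ready, at the cost of briefly running one extra pod.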
Viktor Barzin
6d6ec0c1e2 [ci skip] Refactor raw ingresses to use ingress_factory module
Enhance ingress_factory with full_host, extra_middlewares, and
skip_default_rate_limit variables. Fix TLS hosts bug to use
effective_host. Migrate 13 services from raw kubernetes_ingress_v1
resources to centralized ingress_factory module calls, removing
manual rybbit middleware CRDs where the factory now handles them.
2026-02-10 21:11:46 +00:00
Viktor Barzin
70376b623e [ci skip] Fix health service port: container listens on 3000, not 80 2026-02-09 21:27:50 +00:00
Viktor Barzin
f04a072beb [ci skip] Add internal OSM routing services (OSRM foot, bicycle, OTP)
New osm-routing namespace with walking, cycling, and transit routing
services for the real-estate-crawler. Internal-only (no public ingress).
2026-02-09 21:03:57 +00:00
Viktor Barzin
5a81ce5774 [ci skip] Allow 100-way time slicing of NVIDIA GPU 2026-02-09 21:00:15 +00:00
Viktor Barzin
7b747350de [ci skip] Add descheduler profile to restart idrac-redfish-exporter every 6h 2026-02-09 20:57:09 +00:00
Viktor Barzin
c408887560 [ci skip] Add WebAuthn env vars to real-estate-crawler API deployment 2026-02-08 20:06:24 +00:00
Viktor Barzin
bcdebfd9c1 [ci skip] update claude knowledge: fix NFS scripts path to secrets/ 2026-02-08 02:41:42 +00:00
Viktor Barzin
13659e0fc6 [ci skip] Fix grampsweb ingress: set service_name to match backend service
The ingress_factory defaults service_name to name, so it was routing
to a non-existent "family" service instead of "grampsweb".
2026-02-08 02:30:19 +00:00
Viktor Barzin
945d2d90a7 [ci skip] update claude knowledge: always apply cloudflared module for DNS
When deploying a new service, the cloudflared module must also be applied
to create the Cloudflare DNS record. Updated CLAUDE.md and setup-project skill.
2026-02-08 02:30:19 +00:00
Viktor Barzin
ce8f81db0c [ci skip] Deploy Gramps Web genealogy service
Add grampsweb module with web app + Celery worker in a single pod,
using shared Redis (DB 2/3), NFS storage, email via mailserver,
and Ollama AI integration. Available at family.viktorbarzin.me.
2026-02-08 02:30:18 +00:00
Viktor Barzin
861cd80c64 add the nfs dirs 2026-02-08 02:29:48 +00:00
Viktor Barzin
a2e1a79286 [ci skip] update claude knowledge: add health service 2026-02-08 01:55:30 +00:00
Viktor Barzin
5ad7b7e76d [ci skip] Deploy health dashboard service
Apple Health data visualization app (Svelte + FastAPI + Caddy).
Uses shared PostgreSQL via DBaaS, NFS storage for uploads,
accessible at health.viktorbarzin.me.
2026-02-08 01:54:24 +00:00
Viktor Barzin
7f871d7675 [ci skip] update add-service skill: require NFS setup before deployment
Add step 3 (NFS Storage Setup) to ensure NFS directories are created
and exported on TrueNAS before deploying services that need persistent
storage. Prevents pods getting stuck in ContainerCreating due to missing
NFS mounts.
2026-02-08 01:51:44 +00:00
Viktor Barzin
b78e60dbf6 [ci skip] Add Ollama TCP entrypoint for HA voice pipeline
Expose Ollama at 10.0.20.202:11434 via Traefik TCP passthrough,
bypassing TLS/auth issues with the HTTPS ingress.
2026-02-08 01:51:43 +00:00
Viktor Barzin
a8caa45589 [ci skip] Add Wyoming Piper TTS alongside Whisper STT
Deploy Piper (rhasspy/wyoming-piper) in the whisper namespace with
en_US-lessac-medium voice. Exposed via Traefik TCP on port 10200.
2026-02-08 01:51:43 +00:00
Viktor Barzin
b22a14c914 [ci skip] Deploy Wyoming Whisper STT service for Home Assistant voice input
Add Wyoming Faster Whisper (rhasspy/wyoming-whisper) as a new K8s service
exposed via Traefik TCP entrypoint on port 10300. Accessible from ha-london
RPi via VPN at 10.0.20.202:10300.
2026-02-08 01:51:43 +00:00
Viktor Barzin
5e3b6c57ad [ci skip] update claude knowledge: fix ha-london IP to 192.168.8.103 2026-02-08 01:51:42 +00:00
Viktor Barzin
65a228632b Drone CI Update TLS Certificates Commit 2026-02-08 00:04:51 +00:00
Viktor Barzin
375e3e115a [ci skip] Fix registry tag cleanup for pull-through cache
- Rewrite cleanup script to use filesystem deletion (shutil.rmtree)
  since proxy registries don't support DELETE via API (405)
- Fix cron entry to invoke with python3
2026-02-07 22:45:17 +00:00
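
Note on 375e3e115a: a minimal sketch of tag cleanup by filesystem deletion, not the repository's actual script. It assumes tags are stored as directories under `<repository>/_manifests/tags/` on the registry's storage volume; that layout, the `keep` parameter, and the `prune_tags` name are assumptions made for illustration.

```python
import shutil
from pathlib import Path


def prune_tags(repo_dir, keep):
    """Delete all but the `keep` most recently modified tag directories.

    repo_dir is the on-disk directory of one image repository (assumed layout).
    Returns the names of the deleted tags, sorted alphabetically.
    """
    tags_dir = Path(repo_dir) / "_manifests" / "tags"
    if not tags_dir.is_dir():
        return []
    # Newest first, so everything past index `keep` is old enough to remove.
    tags = sorted(
        (p for p in tags_dir.iterdir() if p.is_dir()),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    removed = []
    for tag in tags[keep:]:
        shutil.rmtree(tag)  # filesystem deletion instead of an HTTP DELETE
        removed.append(tag.name)
    return sorted(removed)
```

Calling `prune_tags("/path/to/repositories/some-image", keep=5)` would remove every tag directory except the five newest; the path shown is a placeholder.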
Viktor Barzin
4671ef34a3 [ci skip] Add LLM agents, voice stack, and automations to ha-london knowledge map 2026-02-07 22:40:12 +00:00
Viktor Barzin
c6a05d8e26 [ci skip] Add ha-london knowledge map: RPi Docker setup, smart plugs, air quality, e-bike
ha-london runs on Raspberry Pi at 192.168.8.104 (Docker rootless, HA 2025.9.1).
Key systems: TP-Link Kasa smart plugs with energy monitoring, Apollo AIR-1 air
quality sensor (ESPHome), Cowboy e-bike, UptimeRobot, Oral-B BLE toothbrush.
SSH access via pi@192.168.8.104, config at /home/pi/docker/homeAssistant/.
2026-02-07 22:39:20 +00:00
Viktor Barzin
c57873c4d4 Bump Immich version from v2.5.2 to v2.5.5 2026-02-07 22:38:33 +00:00
Viktor Barzin
11d328fb99 Add Docker registry UI and tag cleanup automation
Deploy joxit/docker-registry-ui on port 8080 for browsing images/tags.
Add Python script to prune old registry tags (keeps last N per image),
scheduled daily at 2am via cron. Expose UI via reverse proxy at
registry.viktorbarzin.me with Authentik auth.
2026-02-07 22:38:15 +00:00
Viktor Barzin
f8c25d9c23 [ci skip] Add skill: traefik-udp-cross-namespace
Extracted from debugging DNS forwarding through Traefik v3. Documents
two non-obvious requirements for custom UDP entrypoints in the Helm chart:
expose.default=true (port not added to Service by default) and
allowCrossNamespace=true (IngressRouteUDP cross-namespace refs blocked
by default). Both issues compound silently.
2026-02-07 22:25:54 +00:00
Viktor Barzin
936607ac4f [ci skip] Update ha-sofia SSH to direct IP 192.168.1.8 and document limitations 2026-02-07 22:21:30 +00:00
Viktor Barzin
4e2dbcde77 [ci skip] Add NAS, printer, iDRAC, AC, and AI to ha-sofia knowledge map 2026-02-07 21:40:47 +00:00
Viktor Barzin
0383e502a4 [ci skip] Add ha-sofia knowledge map to home-assistant skill
Document all systems discovered via API: gas boiler (EMS-ESP), 4-room
thermostats, solar/battery (Solarman), ATS, Paradox alarm, Frigate NVR
with 9 cameras, Home Connect appliances, LED controllers, media, UPS,
Pax ventilation, and Bulgarian ↔ English room name mappings.
2026-02-07 21:39:58 +00:00
Viktor Barzin
01affd9727 [ci skip] Add Proxmox VM inventory to claude knowledge 2026-02-07 21:37:38 +00:00
Viktor Barzin
191c760b94 [ci skip] Add ha-sofia Home Assistant deployment to skills
- Update home-assistant skill to v2.0.0 covering both ha-london and ha-sofia
- Add separate API script for ha-sofia (home-assistant-sofia.py)
- ha-sofia: SSH via vbarzin@ha-sofia.viktorbarzin.lan, config at /config/
- Update CLAUDE.md with both HA deployments
2026-02-07 21:26:05 +00:00
Viktor Barzin
a26fdd27b2 [ci skip] Add skills: traefik-http3-quic and helm-release-force-rerender
- traefik-http3-quic: Enable HTTP/3 (QUIC) on Traefik with advertisedPort
  gotcha, Cloudflare zone settings, and testing instructions
- helm-release-force-rerender: Fix Helm releases where Terraform applies
  but K8s resources don't reflect new values (state rm + reimport pattern)
2026-02-07 20:49:34 +00:00
Viktor Barzin
8b8beb78dd [ci skip] update claude knowledge: HTTP/3 enabled for Traefik and Cloudflare 2026-02-07 20:46:14 +00:00
Viktor Barzin
2875bf9d4e [ci skip] Enable HTTP/3 (QUIC) for all ingresses
- Add http3.enabled + advertisedPort=443 to Traefik websecure entrypoint
- Add cloudflare_zone_settings_override to enable HTTP/3 for proxied domains
2026-02-07 20:43:49 +00:00
Viktor Barzin
eef9d25874 [ci skip] Strip Authentik auth headers before forwarding to backend
Add strip-auth-headers Traefik middleware that removes X-authentik-*
headers from requests before they reach the backend. Backends like
iDRAC and TP-Link gateway break when receiving these extra headers.
2026-02-07 20:28:44 +00:00
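
Note on eef9d25874: the header filtering can be sketched as a case-insensitive prefix match. This is an illustration only, not the Traefik middleware configuration; `strip_prefixed_headers` and the sample header values are hypothetical.

```python
def strip_prefixed_headers(headers, prefix="x-authentik-"):
    """Return a copy of the headers without any name starting with prefix.

    Header names are compared case-insensitively, as HTTP header names are.
    """
    wanted = prefix.lower()
    return {k: v for k, v in headers.items() if not k.lower().startswith(wanted)}


# Sample request as it might look after the auth layer has added its headers.
incoming = {
    "Host": "example.invalid",
    "X-authentik-username": "someone",
    "X-Authentik-Email": "someone@example.invalid",
    "Cookie": "session=abc",
}
forwarded = strip_prefixed_headers(incoming)
```

Only headers matching the prefix are dropped; everything else, including cookies, is passed through unchanged.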
Viktor Barzin
30bc2e9386 [ci skip] Fix DNS forwarding through Traefik to Technitium
Expose UDP port 53 on the Traefik LoadBalancer service and enable
cross-namespace CRD references so the IngressRouteUDP in the traefik
namespace can route DNS traffic to technitium-dns in the technitium
namespace. This restores DNS resolution via 10.0.20.202 for pfSense
and Home Assistant.
2026-02-07 20:10:47 +00:00
Viktor Barzin
f01e92b1d9 [ci skip] Fix HTTPS backend proxying for reverse-proxy services
- Add insecureSkipVerify=true globally for self-signed backend certs
- Name service ports with https- prefix for HTTPS backends so Traefik uses HTTPS
- Add ServersTransport CRD for per-service insecureSkipVerify
- Add serversscheme/serverstransport annotations to reverse-proxy factory
2026-02-07 13:56:24 +00:00
Viktor Barzin
b5a74c2016 [ci skip] update kubectl skill to use local kubeconfig 2026-02-07 13:42:35 +00:00
Viktor Barzin
5b9c6484a1 [ci skip] update tf-apply and tf-plan skills to run locally with kubeconfig 2026-02-07 13:42:10 +00:00
Viktor Barzin
0709eb0266 [ci skip] update claude knowledge: always run terraform locally 2026-02-07 13:41:41 +00:00
Viktor Barzin
04d85221c7 [ci skip] Remove unsupported advertisedPort from Traefik Helm values 2026-02-07 13:41:06 +00:00
Viktor Barzin
510673949d [ci skip] Add --api.insecure=true to Traefik for dashboard access on port 8080 2026-02-07 13:35:58 +00:00