Commit graph

1532 commits

Author SHA1 Message Date
Viktor Barzin
04eaf0f989
[ci skip] Add 12 Prometheus alert rules for monitoring gaps
Add 3 new alert groups and 1 rule to existing group:
- Storage: NodeFilesystemFull (<10% free), PVFillingUp (>85% used)
- K8s Health: PodCrashLooping, ContainerOOMKilled, NodeNotReady,
  NodeConditionBad, JobFailed
- Infrastructure Health: CoreDNSErrors, ScrapeTargetDown,
  PrometheusStorageFull, PrometheusNotificationsFailing
- R730 Host: FanFailure (iDRAC Redfish fan health)
2026-02-11 22:14:30 +00:00
Viktor Barzin
12a84c5207
Standardize Prometheus alert formatting and fix Slack notifications
- Add color coding (red/green) to Slack alerts, show alertname in title
- Use summary annotation in Slack text (description was always empty)
- Format all alert summaries consistently: value with units and threshold
- Fix ratio expressions (CPU/memory) to display as percentages
- Fix "failiure" typo, capitalize Tailscale
2026-02-11 21:53:22 +00:00
Viktor Barzin
81bdf53386
[ci skip] Add skill: traefik-rewrite-body-compression
Extracted from debugging session where packruler/rewrite-body plugin
corrupted gzip responses, breaking HA Companion app auth flow and
WebSocket connections. Fix: strip Accept-Encoding header before
rewrite-body plugin so backends send uncompressed responses.
2026-02-11 21:42:07 +00:00
Viktor Barzin
220f4a18b7
[ci skip] Fix rewrite-body plugin corrupting compressed responses
The packruler/rewrite-body plugin (used for rybbit analytics injection)
fails to decompress gzip responses with "flate: corrupt input before
offset 5", corrupting the response body. This broke HA Companion app's
external_auth flow and WebSocket connections on ha-sofia.

Fix: add a strip-accept-encoding middleware that removes Accept-Encoding
from requests when rybbit is active, forcing backends to send uncompressed
responses that the plugin can safely process.

Also add extra_middlewares variable to reverse_proxy factory for
extensibility.
2026-02-11 21:40:11 +00:00
Viktor Barzin
8e54a936c4
immich to 2.5.6 [ci skip] 2026-02-10 22:01:08 +00:00
Viktor Barzin
ed3cb6df07
[ci skip] Add ingress-factory-migration skill 2026-02-10 21:31:48 +00:00
Viktor Barzin
6acf5ee300
[ci skip] Assorted pending changes: ollama API auth, nvidia dashboard, traefik rewrite-body plugin
- ollama: Add basicAuth middleware for external API access
- monitoring: Update nvidia dashboard (add GPU memory per app panel, bump to v9)
- plotting-book: Switch to ancamilea/book-plotter:latest, add lifecycle ignore
- reverse_proxy/factory: Fix rybbit plugin name (rewritebody -> rewrite-body)
- traefik: Switch to packruler/rewrite-body plugin v1.2.0
2026-02-10 21:29:54 +00:00
Viktor Barzin
f9c07f015b
[ci skip] Use RollingUpdate strategy for real-estate-crawler deployments
Set max_unavailable=0, max_surge=1 on both UI and API deployments
to ensure at least 1 replica is always available during updates.
2026-02-10 21:28:38 +00:00
Viktor Barzin
348d706a48
[ci skip] Refactor raw ingresses to use ingress_factory module
Enhance ingress_factory with full_host, extra_middlewares, and
skip_default_rate_limit variables. Fix TLS hosts bug to use
effective_host. Migrate 13 services from raw kubernetes_ingress_v1
resources to centralized ingress_factory module calls, removing
manual rybbit middleware CRDs where the factory now handles them.
2026-02-10 21:11:46 +00:00
Viktor Barzin
97175c5444
[ci skip] Fix health service port: container listens on 3000, not 80 2026-02-09 21:27:50 +00:00
Viktor Barzin
dadee44046
[ci skip] Add internal OSM routing services (OSRM foot, bicycle, OTP)
New osm-routing namespace with walking, cycling, and transit routing
services for the real-estate-crawler. Internal-only (no public ingress).
2026-02-09 21:03:57 +00:00
Viktor Barzin
cb761f90d7
[ci skip] allow 100 time slicing of nvidia gpu 2026-02-09 21:00:15 +00:00
Viktor Barzin
b6499a8090
[ci skip] Add descheduler profile to restart idrac-redfish-exporter every 6h 2026-02-09 20:57:09 +00:00
Viktor Barzin
a33476919f
[ci skip] Add WebAuthn env vars to real-estate-crawler API deployment 2026-02-08 20:06:24 +00:00
Viktor Barzin
748c77ff9f
[ci skip] update claude knowledge: fix NFS scripts path to secrets/ 2026-02-08 02:41:42 +00:00
Viktor Barzin
74cc37a3c2
[ci skip] Fix grampsweb ingress: set service_name to match backend service
The ingress_factory defaults service_name to name, so it was routing
to a non-existent "family" service instead of "grampsweb".
2026-02-08 02:30:19 +00:00
Viktor Barzin
87f851c20e
[ci skip] update claude knowledge: always apply cloudflared module for DNS
When deploying a new service, the cloudflared module must also be applied
to create the Cloudflare DNS record. Updated CLAUDE.md and setup-project skill.
2026-02-08 02:30:19 +00:00
Viktor Barzin
d911db6cd9
[ci skip] Deploy Gramps Web genealogy service
Add grampsweb module with web app + Celery worker in a single pod,
using shared Redis (DB 2/3), NFS storage, email via mailserver,
and Ollama AI integration. Available at family.viktorbarzin.me.
2026-02-08 02:30:18 +00:00
Viktor Barzin
c04a5e6229 add the nfs dirs 2026-02-08 02:29:48 +00:00
Viktor Barzin
71223c02ad
[ci skip] update claude knowledge: add health service 2026-02-08 01:55:30 +00:00
Viktor Barzin
43bee50de8
[ci skip] Deploy health dashboard service
Apple Health data visualization app (Svelte + FastAPI + Caddy).
Uses shared PostgreSQL via DBaaS, NFS storage for uploads,
accessible at health.viktorbarzin.me.
2026-02-08 01:54:24 +00:00
Viktor Barzin
00943e92fe
[ci skip] update add-service skill: require NFS setup before deployment
Add step 3 (NFS Storage Setup) to ensure NFS directories are created
and exported on TrueNAS before deploying services that need persistent
storage. Prevents pods getting stuck in ContainerCreating due to missing
NFS mounts.
2026-02-08 01:51:44 +00:00
Viktor Barzin
44a17f8089
[ci skip] Add Ollama TCP entrypoint for HA voice pipeline
Expose Ollama at 10.0.20.202:11434 via Traefik TCP passthrough,
bypassing TLS/auth issues with the HTTPS ingress.
2026-02-08 01:51:43 +00:00
Viktor Barzin
bdbd354396
[ci skip] Add Wyoming Piper TTS alongside Whisper STT
Deploy Piper (rhasspy/wyoming-piper) in the whisper namespace with
en_US-lessac-medium voice. Exposed via Traefik TCP on port 10200.
2026-02-08 01:51:43 +00:00
Viktor Barzin
d89947c2fd
[ci skip] Deploy Wyoming Whisper STT service for Home Assistant voice input
Add Wyoming Faster Whisper (rhasspy/wyoming-whisper) as a new K8s service
exposed via Traefik TCP entrypoint on port 10300. Accessible from ha-london
RPi via VPN at 10.0.20.202:10300.
2026-02-08 01:51:43 +00:00
Viktor Barzin
e067504170
[ci skip] update claude knowledge: fix ha-london IP to 192.168.8.103 2026-02-08 01:51:42 +00:00
Viktor Barzin
476f2d2b66 Drone CI Update TLS Certificates Commit 2026-02-08 00:04:51 +00:00
Viktor Barzin
e04fabaa72
[ci skip] Fix registry tag cleanup for pull-through cache
- Rewrite cleanup script to use filesystem deletion (shutil.rmtree)
  since proxy registries don't support DELETE via API (405)
- Fix cron entry to invoke with python3
2026-02-07 22:45:17 +00:00
Viktor Barzin
9bc56b3f04
[ci skip] Add LLM agents, voice stack, and automations to ha-london knowledge map 2026-02-07 22:40:12 +00:00
Viktor Barzin
6ba026801c
[ci skip] Update terraform state 2026-02-07 22:39:44 +00:00
Viktor Barzin
a5c782578c
[ci skip] Add ha-london knowledge map: RPi Docker setup, smart plugs, air quality, e-bike
ha-london runs on Raspberry Pi at 192.168.8.104 (Docker rootless, HA 2025.9.1).
Key systems: TP-Link Kasa smart plugs with energy monitoring, Apollo AIR-1 air
quality sensor (ESPHome), Cowboy e-bike, UptimeRobot, Oral-B BLE toothbrush.
SSH access via pi@192.168.8.104, config at /home/pi/docker/homeAssistant/.
2026-02-07 22:39:20 +00:00
Viktor Barzin
8bbf4e51da
Add registry DNS record and real-estate scrape schedules
Add registry.viktorbarzin.me to non-proxied DNS names. Add scrape
schedule config for real-estate-crawler. Fix crowdsec var formatting.
2026-02-07 22:38:42 +00:00
Viktor Barzin
1ff5242a57
Bump Immich version from v2.5.2 to v2.5.5 2026-02-07 22:38:33 +00:00
Viktor Barzin
b27e1ad9f1
Add Docker registry UI and tag cleanup automation
Deploy joxit/docker-registry-ui on port 8080 for browsing images/tags.
Add Python script to prune old registry tags (keeps last N per image),
scheduled daily at 2am via cron. Expose UI via reverse proxy at
registry.viktorbarzin.me with Authentik auth.
2026-02-07 22:38:15 +00:00
Viktor Barzin
4bd87df6d6
[ci skip] Add skill: traefik-udp-cross-namespace
Extracted from debugging DNS forwarding through Traefik v3. Documents
two non-obvious requirements for custom UDP entrypoints in the Helm chart:
expose.default=true (port not added to Service by default) and
allowCrossNamespace=true (IngressRouteUDP cross-namespace refs blocked
by default). Both issues compound silently.
2026-02-07 22:25:54 +00:00
Viktor Barzin
fb2a830cb2
[ci skip] Update ha-sofia SSH to direct IP 192.168.1.8 and document limitations 2026-02-07 22:21:30 +00:00
Viktor Barzin
8cc4f022c3
[ci skip] Add NAS, printer, iDRAC, AC, and AI to ha-sofia knowledge map 2026-02-07 21:40:47 +00:00
Viktor Barzin
f0844bc45a
[ci skip] Add ha-sofia knowledge map to home-assistant skill
Document all systems discovered via API: gas boiler (EMS-ESP), 4-room
thermostats, solar/battery (Solarman), ATS, Paradox alarm, Frigate NVR
with 9 cameras, Home Connect appliances, LED controllers, media, UPS,
Pax ventilation, and Bulgarian ↔ English room name mappings.
2026-02-07 21:39:58 +00:00
Viktor Barzin
fe2283d728
[ci skip] Add Proxmox VM inventory to claude knowledge 2026-02-07 21:37:38 +00:00
Viktor Barzin
4c001178f6
[ci skip] Add ha-sofia Home Assistant deployment to skills
- Update home-assistant skill to v2.0.0 covering both ha-london and ha-sofia
- Add separate API script for ha-sofia (home-assistant-sofia.py)
- ha-sofia: SSH via vbarzin@ha-sofia.viktorbarzin.lan, config at /config/
- Update CLAUDE.md with both HA deployments
2026-02-07 21:26:05 +00:00
Viktor Barzin
cea317a1b7
[ci skip] Add skills: traefik-http3-quic and helm-release-force-rerender
- traefik-http3-quic: Enable HTTP/3 (QUIC) on Traefik with advertisedPort
  gotcha, Cloudflare zone settings, and testing instructions
- helm-release-force-rerender: Fix Helm releases where Terraform applies
  but K8s resources don't reflect new values (state rm + reimport pattern)
2026-02-07 20:49:34 +00:00
Viktor Barzin
b964a92a8b
[ci skip] update claude knowledge: HTTP/3 enabled for Traefik and Cloudflare 2026-02-07 20:46:14 +00:00
Viktor Barzin
8fabc3d49b
[ci skip] Enable HTTP/3 (QUIC) for all ingresses
- Add http3.enabled + advertisedPort=443 to Traefik websecure entrypoint
- Add cloudflare_zone_settings_override to enable HTTP/3 for proxied domains
2026-02-07 20:43:49 +00:00
Viktor Barzin
a81e44dd82
[ci skip] Strip Authentik auth headers before forwarding to backend
Add strip-auth-headers Traefik middleware that removes X-authentik-*
headers from requests before they reach the backend. Backends like
iDRAC and TP-Link gateway break when receiving these extra headers.
2026-02-07 20:28:44 +00:00
Viktor Barzin
c1eac81095
[ci skip] Fix DNS forwarding through Traefik to Technitium
Expose UDP port 53 on the Traefik LoadBalancer service and enable
cross-namespace CRD references so the IngressRouteUDP in the traefik
namespace can route DNS traffic to technitium-dns in the technitium
namespace. This restores DNS resolution via 10.0.20.202 for pfSense
and Home Assistant.
2026-02-07 20:10:47 +00:00
Viktor Barzin
481b51358a
[ci skip] Import 36 existing Traefik middleware resources into terraform state 2026-02-07 18:58:05 +00:00
Viktor Barzin
d4cf63dce9
[ci skip] Fix HTTPS backend proxying for reverse-proxy services
- Add insecureSkipVerify=true globally for self-signed backend certs
- Name service ports with https- prefix for HTTPS backends so Traefik uses HTTPS
- Add ServersTransport CRD for per-service insecureSkipVerify
- Add serversscheme/serverstransport annotations to reverse-proxy factory
2026-02-07 13:56:24 +00:00
Viktor Barzin
4d0d2a3568
[ci skip] update kubectl skill to use local kubeconfig 2026-02-07 13:42:35 +00:00
Viktor Barzin
2d95861add
[ci skip] update tf-apply and tf-plan skills to run locally with kubeconfig 2026-02-07 13:42:10 +00:00
Viktor Barzin
75fb8a0272
[ci skip] update claude knowledge: always run terraform locally 2026-02-07 13:41:41 +00:00