Commit graph

1016 commits

Author SHA1 Message Date
Viktor Barzin
e0ff08978d
[ci skip] add vibetunnel proxy 2026-02-13 18:20:50 +00:00
Viktor Barzin
8bea552664
[ci skip] Add HomeAssistantDown alert for ha-sofia
Fires after 5m if the haos Prometheus scrape target is unreachable.
Covers the HTTP API endpoint which shares the same process as the
WebSocket API used by the mobile app.
2026-02-11 23:24:46 +00:00
Viktor Barzin
0c18a86a7b
[ci skip] Fix all active Prometheus alerts
- meshcentral: rename port from "https" to "http" — MeshCentral serves
  plain HTTP when REVERSE_PROXY=true, but Traefik inferred HTTPS from the
  port name, causing 100% 5xx errors
- osm-routing/otp: scale to 0 — TfL GTFS data expired, OTP crash-loops
  trying to build graph with no valid transit trips
- wireguard: add prometheus.io/port=9586 annotation — without it,
  Prometheus tried scraping all container ports (51820 UDP, 80)
- travel-blog: remove stale prometheus.io annotations and dead port 9113
  — nginx-exporter sidecar was commented out but annotations remained
- dawarich: remove prometheus.io annotations — exporter env vars are
  commented out so nothing listens on port 9394
- monitoring: raise CPU temp threshold 60°C→75°C (E5-2699 v4 Tcase is
  79°C), lower registry cache threshold 50%→25%, add minimum traffic
  floor (>0.1 req/s) to 4xx/5xx rate alerts to prevent false positives
  on low-traffic services
2026-02-11 22:40:56 +00:00
Viktor Barzin
9c3f8adc11
[ci skip] Fix CrowdSec to monitor Traefik and add Slack notifications
- Switch acquisition from ingress-nginx to traefik namespace/pods
- Change collection from crowdsecurity/nginx to crowdsecurity/traefik
- Add Slack notification plugin for ban/captcha decisions
- Wire alertmanager_slack_api_url through to CrowdSec module
2026-02-11 22:25:03 +00:00
Viktor Barzin
04eaf0f989
[ci skip] Add 12 Prometheus alert rules for monitoring gaps
Add 3 new alert groups and 1 rule to existing group:
- Storage: NodeFilesystemFull (<10% free), PVFillingUp (>85% used)
- K8s Health: PodCrashLooping, ContainerOOMKilled, NodeNotReady,
  NodeConditionBad, JobFailed
- Infrastructure Health: CoreDNSErrors, ScrapeTargetDown,
  PrometheusStorageFull, PrometheusNotificationsFailing
- R730 Host: FanFailure (iDRAC Redfish fan health)
2026-02-11 22:14:30 +00:00
Viktor Barzin
12a84c5207
Standardize Prometheus alert formatting and fix Slack notifications
- Add color coding (red/green) to Slack alerts, show alertname in title
- Use summary annotation in Slack text (description was always empty)
- Format all alert summaries consistently: value with units and threshold
- Fix ratio expressions (CPU/memory) to display as percentages
- Fix "failiure" typo, capitalize Tailscale
2026-02-11 21:53:22 +00:00
Viktor Barzin
220f4a18b7
[ci skip] Fix rewrite-body plugin corrupting compressed responses
The packruler/rewrite-body plugin (used for rybbit analytics injection)
fails to decompress gzip responses with "flate: corrupt input before
offset 5", corrupting the response body. This broke HA Companion app's
external_auth flow and WebSocket connections on ha-sofia.

Fix: add a strip-accept-encoding middleware that removes Accept-Encoding
from requests when rybbit is active, forcing backends to send uncompressed
responses that the plugin can safely process.

Also add extra_middlewares variable to reverse_proxy factory for
extensibility.
2026-02-11 21:40:11 +00:00
Viktor Barzin
8e54a936c4
immich to 2.5.6 [ci skip] 2026-02-10 22:01:08 +00:00
Viktor Barzin
6acf5ee300
[ci skip] Assorted pending changes: ollama API auth, nvidia dashboard, traefik rewrite-body plugin
- ollama: Add basicAuth middleware for external API access
- monitoring: Update nvidia dashboard (add GPU memory per app panel, bump to v9)
- plotting-book: Switch to ancamilea/book-plotter:latest, add lifecycle ignore
- reverse_proxy/factory: Fix rybbit plugin name (rewritebody -> rewrite-body)
- traefik: Switch to packruler/rewrite-body plugin v1.2.0
2026-02-10 21:29:54 +00:00
Viktor Barzin
f9c07f015b
[ci skip] Use RollingUpdate strategy for real-estate-crawler deployments
Set max_unavailable=0, max_surge=1 on both UI and API deployments
to ensure at least 1 replica is always available during updates.
2026-02-10 21:28:38 +00:00
Viktor Barzin
348d706a48
[ci skip] Refactor raw ingresses to use ingress_factory module
Enhance ingress_factory with full_host, extra_middlewares, and
skip_default_rate_limit variables. Fix TLS hosts bug to use
effective_host. Migrate 13 services from raw kubernetes_ingress_v1
resources to centralized ingress_factory module calls, removing
manual rybbit middleware CRDs where the factory now handles them.
2026-02-10 21:11:46 +00:00
Viktor Barzin
97175c5444
[ci skip] Fix health service port: container listens on 3000, not 80 2026-02-09 21:27:50 +00:00
Viktor Barzin
dadee44046
[ci skip] Add internal OSM routing services (OSRM foot, bicycle, OTP)
New osm-routing namespace with walking, cycling, and transit routing
services for the real-estate-crawler. Internal-only (no public ingress).
2026-02-09 21:03:57 +00:00
Viktor Barzin
cb761f90d7
[ci skip] allow 100 time slicing of nvidia gpu 2026-02-09 21:00:15 +00:00
Viktor Barzin
b6499a8090
[ci skip] Add descheduler profile to restart idrac-redfish-exporter every 6h 2026-02-09 20:57:09 +00:00
Viktor Barzin
a33476919f
[ci skip] Add WebAuthn env vars to real-estate-crawler API deployment 2026-02-08 20:06:24 +00:00
Viktor Barzin
74cc37a3c2
[ci skip] Fix grampsweb ingress: set service_name to match backend service
The ingress_factory defaults service_name to name, so it was routing
to a non-existent "family" service instead of "grampsweb".
2026-02-08 02:30:19 +00:00
Viktor Barzin
d911db6cd9
[ci skip] Deploy Gramps Web genealogy service
Add grampsweb module with web app + Celery worker in a single pod,
using shared Redis (DB 2/3), NFS storage, email via mailserver,
and Ollama AI integration. Available at family.viktorbarzin.me.
2026-02-08 02:30:18 +00:00
Viktor Barzin
43bee50de8
[ci skip] Deploy health dashboard service
Apple Health data visualization app (Svelte + FastAPI + Caddy).
Uses shared PostgreSQL via DBaaS, NFS storage for uploads,
accessible at health.viktorbarzin.me.
2026-02-08 01:54:24 +00:00
Viktor Barzin
44a17f8089
[ci skip] Add Ollama TCP entrypoint for HA voice pipeline
Expose Ollama at 10.0.20.202:11434 via Traefik TCP passthrough,
bypassing TLS/auth issues with the HTTPS ingress.
2026-02-08 01:51:43 +00:00
Viktor Barzin
bdbd354396
[ci skip] Add Wyoming Piper TTS alongside Whisper STT
Deploy Piper (rhasspy/wyoming-piper) in the whisper namespace with
en_US-lessac-medium voice. Exposed via Traefik TCP on port 10200.
2026-02-08 01:51:43 +00:00
Viktor Barzin
d89947c2fd
[ci skip] Deploy Wyoming Whisper STT service for Home Assistant voice input
Add Wyoming Faster Whisper (rhasspy/wyoming-whisper) as a new K8s service
exposed via Traefik TCP entrypoint on port 10300. Accessible from ha-london
RPi via VPN at 10.0.20.202:10300.
2026-02-08 01:51:43 +00:00
Viktor Barzin
1ff5242a57
Bump Immich version from v2.5.2 to v2.5.5 2026-02-07 22:38:33 +00:00
Viktor Barzin
b27e1ad9f1
Add Docker registry UI and tag cleanup automation
Deploy joxit/docker-registry-ui on port 8080 for browsing images/tags.
Add Python script to prune old registry tags (keeps last N per image),
scheduled daily at 2am via cron. Expose UI via reverse proxy at
registry.viktorbarzin.me with Authentik auth.
2026-02-07 22:38:15 +00:00
Viktor Barzin
8fabc3d49b
[ci skip] Enable HTTP/3 (QUIC) for all ingresses
- Add http3.enabled + advertisedPort=443 to Traefik websecure entrypoint
- Add cloudflare_zone_settings_override to enable HTTP/3 for proxied domains
2026-02-07 20:43:49 +00:00
Viktor Barzin
a81e44dd82
[ci skip] Strip Authentik auth headers before forwarding to backend
Add strip-auth-headers Traefik middleware that removes X-authentik-*
headers from requests before they reach the backend. Backends like
iDRAC and TP-Link gateway break when receiving these extra headers.
2026-02-07 20:28:44 +00:00
Viktor Barzin
c1eac81095
[ci skip] Fix DNS forwarding through Traefik to Technitium
Expose UDP port 53 on the Traefik LoadBalancer service and enable
cross-namespace CRD references so the IngressRouteUDP in the traefik
namespace can route DNS traffic to technitium-dns in the technitium
namespace. This restores DNS resolution via 10.0.20.202 for pfSense
and Home Assistant.
2026-02-07 20:10:47 +00:00
Viktor Barzin
d4cf63dce9
[ci skip] Fix HTTPS backend proxying for reverse-proxy services
- Add insecureSkipVerify=true globally for self-signed backend certs
- Name service ports with https- prefix for HTTPS backends so Traefik uses HTTPS
- Add ServersTransport CRD for per-service insecureSkipVerify
- Add serversscheme/serverstransport annotations to reverse-proxy factory
2026-02-07 13:56:24 +00:00
Viktor Barzin
5bf2040491
[ci skip] Remove unsupported advertisedPort from Traefik Helm values 2026-02-07 13:41:06 +00:00
Viktor Barzin
3c2d496f45
[ci skip] Add --api.insecure=true to Traefik for dashboard access on port 8080 2026-02-07 13:35:58 +00:00
Viktor Barzin
c32acc70e6
Migrate all service modules from nginx-ingress to Traefik
- Remove nginx-specific ingress variables (use_proxy_protocol, proxy_timeout, additional_configuration_snippet)
- Update ingress annotations to use Traefik middleware CRDs
- Delete nginx-ingress module (replaced by traefik)
- Add new traefik middleware.tf for shared middleware definitions
- Update service modules to work with new ingress_factory interface
2026-02-07 13:25:49 +00:00
Viktor Barzin
0315dd4044
Migrate ingress_factory from nginx to Traefik annotations
- Replace nginx ingress class and annotations with Traefik middleware CRDs
- Add Traefik router middleware chain: rate-limit, CSP, CrowdSec, Authentik
- Remove nginx-specific proxy settings (handled by Traefik config)
- Add exclude_crowdsec and custom_content_security_policy options
- Add rybbit analytics and custom CSP middleware resources
2026-02-07 13:24:58 +00:00
Viktor Barzin
792f76454c
Add Traefik dashboard ingress with Authentik protection
- Enable api.insecure in Helm values for internal dashboard access on port 8080
- Add TLS secret, dashboard service, and ingress via ingress_factory (protected=true)
- Pass tls_secret_name to traefik module
- Add traefik to cloudflare_non_proxied_names DNS list
2026-02-07 13:06:57 +00:00
Viktor Barzin
877650034e
Fix AFFiNE init container migration command for v0.26.0
The stable image removed scripts/self-host-predeploy.js. Use the new
predeploy flow: prisma migrate + dist/main.js run.

[ci skip]
2026-02-07 10:33:43 +00:00
Viktor Barzin
af38a71183 Add excalidraw project gitignore and README 2026-02-06 20:38:32 +00:00
Viktor Barzin
306ee8e6ee
[ci skip] add blotting book repo 2026-02-06 20:32:08 +00:00
Viktor Barzin
cf25e1af4e
Add Celery worker/beat deployments and fix crawler API config
Add celery worker and celery beat deployments for background task
processing and scheduled scraping. Fix API container name, add
image_pull_policy Always, and add missing path_type to ingress rules.
2026-02-06 20:31:34 +00:00
Viktor Barzin
ccf25cc99c Upgrade immich to v2.5.2 and add GPU toleration to ML pod
Bump immich version from v2.5.0 to v2.5.2. Add nvidia.com/gpu
toleration to immich-machine-learning deployment.
2026-02-06 20:28:29 +00:00
Viktor Barzin
27c8d60555 Forward authentik response headers through ingress
Add auth-response-headers annotation to pass user identity headers
(username, uid, email, name, groups) from authentik to backend services.
2026-02-06 20:26:21 +00:00
Viktor Barzin
a394c7a7ca Add audiblez-web application source
Web frontend for audiblez audiobook conversion with FastAPI backend.
2026-02-06 20:24:10 +00:00
Viktor Barzin
596c02dfde Add audiblez-web service and refactor ebook2audiobook deployments
Uncomment ebook2audiobook deployment with proper GPU tolerations
(set to 0 replicas). Disable audiblez CLI deployment in favor of
audiblez-web. Add new audiblez-web deployment, service, and ingress
with GPU support, large upload limits, and auth protection.
2026-02-06 20:22:05 +00:00
Viktor Barzin
9689b67895 Add GPU node taint tolerations and enhance GPU memory exporter
Add nvidia.com/gpu toleration to all GPU workloads (frigate, ollama)
to support NoSchedule taint on GPU nodes. Update nvidia operator
helm values with daemonset tolerations. Enhance GPU pod memory
exporter with Kubernetes API integration to resolve container IDs
to pod names/namespaces, adding RBAC resources for API access.
2026-02-06 20:19:26 +00:00
Viktor Barzin
29567103d6 Add DRONE_WEBHOOK_SECRET for GitHub webhook authentication
Fixes webhook signature validation failures causing 400 errors.
2026-02-01 20:42:07 +00:00
Viktor Barzin
4a857ebefd Add per-pod GPU memory metrics exporter
- Add DaemonSet that runs on GPU node and exposes Prometheus metrics
- Uses nvidia-smi to collect per-process GPU memory usage
- Maps PIDs to container IDs via /proc/<pid>/cgroup
- Exposes gpu_pod_memory_used_bytes metric at :9401/metrics
- Add Prometheus scrape config for gpu-pod-memory job

[ci skip]
2026-01-31 16:58:14 +00:00
Viktor Barzin
09a5e3a273
Add crowdsec-blocklist-import CronJob
Import public threat intelligence blocklists into CrowdSec daily at 4 AM.
Uses kubectl exec to run the import script inside an existing CrowdSec
agent pod that is already registered with the LAPI.

Source: https://github.com/wolffcatskyy/crowdsec-blocklist-import

[ci skip]
2026-01-28 20:11:44 +00:00
Viktor Barzin
2ac92167c5
fix resume pdf generation [ci skip] 2026-01-28 19:42:13 +00:00
Viktor Barzin
eeabe652d3
upgrade immich to 2.5.0 [ci skip] 2026-01-28 19:41:52 +00:00
Viktor Barzin
078e1eeeef
add the yt-highlights app [ci skip] 2026-01-28 18:03:49 +00:00
Viktor Barzin
f867de6e7d
ad service for youtube video highlights [ci skip] 2026-01-28 17:58:39 +00:00
Viktor Barzin
19a41367ba
add reactive resume service [ci skip] 2026-01-28 17:57:39 +00:00