infra

Author	SHA1	Message	Date
Viktor Barzin	6cdee231cd	state(shadowsocks): update encrypted state	2026-03-24 18:08:04 +02:00
Viktor Barzin	842e870971	state(headscale): update encrypted state	2026-03-24 18:08:02 +02:00
Viktor Barzin	33037eba46	upgrade MetalLB v0.10.2 → v0.15.3 and update annotations - Replace custom ViktorBarzin/metallb module with official Helm chart - Migrate from ConfigMap-based config to CRD (IPAddressPool + L2Advertisement) - Update Traefik LB annotations from metallb.universe.tf to metallb.io format - Technitium DNS keeps stable IP 10.0.20.204 via MetalLB auto-assignment - Headscale split DNS already configured to use 10.0.20.204	2026-03-24 17:24:05 +02:00
Viktor Barzin	957f13dfd6	state(headscale): update encrypted state	2026-03-24 17:23:34 +02:00
Viktor Barzin	7478f545e0	state(metallb): update encrypted state	2026-03-24 17:23:18 +02:00
Viktor Barzin	dd46252d17	state(metallb): update encrypted state	2026-03-24 17:23:01 +02:00
Viktor Barzin	7ef390f14e	state(metallb): update encrypted state	2026-03-24 17:22:53 +02:00
Viktor Barzin	1defd711fe	state(metallb): update encrypted state	2026-03-24 17:15:06 +02:00
Viktor Barzin	793490eaf4	state(metallb): update encrypted state	2026-03-24 17:14:14 +02:00
Viktor Barzin	d079666d34	state(metallb): update encrypted state	2026-03-24 17:11:26 +02:00
Viktor Barzin	b68f778c5a	state(headscale): update encrypted state	2026-03-24 16:47:26 +02:00
Viktor Barzin	3ecb792a44	state(headscale): update encrypted state	2026-03-24 15:30:25 +02:00
Viktor Barzin	0ee6cade38	state(headscale): update encrypted state	2026-03-24 15:12:01 +02:00
Viktor Barzin	a644eb1c8e	headscale: add STUN port, upgrade to 0.28.0, fix Home DERP connectivity - Expose STUN port 3479/UDP on container and LoadBalancer service - Upgrade headscale from 0.23.0 to 0.28.0 - Vault config updated: auto DERP region with ipv4 field, ISP router port forward for UDP 3479 added. Home DERP now shows ~3ms latency and is selected as nearest relay.	2026-03-24 14:51:09 +02:00
Viktor Barzin	fafea4b110	state(headscale): update encrypted state	2026-03-24 14:45:31 +02:00
Viktor Barzin	2cbcf00b8e	state(headscale): update encrypted state	2026-03-24 14:36:30 +02:00
Viktor Barzin	20b0d564f1	state(headscale): update encrypted state	2026-03-24 14:32:12 +02:00
Viktor Barzin	78f302d6c0	state(headscale): update encrypted state	2026-03-24 14:30:02 +02:00
Viktor Barzin	d2c50be088	state(headscale): update encrypted state	2026-03-24 12:49:23 +02:00
Viktor Barzin	5161f77118	state(headscale): update encrypted state	2026-03-24 12:05:34 +02:00
Viktor Barzin	4aa0e97e1d	remove terraform.tfvars from terragrunt loading: complete Vault migration. All 148 secret variables were migrated to Vault KV / SOPS / ESO. The legacy terraform.tfvars silently overrode config.tfvars values (e.g. stale postgresql_host), creating override risk. [ci skip]	2026-03-24 11:14:06 +02:00
Viktor Barzin	540d7de807	add wealthfolio-sync CronJob for automated portfolio sync. Monthly CronJob (1st at 08:00 UTC) syncs trades from Schwab, Trading 212, and InvestEngine into Wealthfolio SQLite DB. Added Kyverno ndots lifecycle ignore. Removed stale manual sync comment.	2026-03-24 02:07:36 +02:00
Viktor Barzin	5d12f92816	state(wealthfolio): update encrypted state	2026-03-24 02:07:17 +02:00
Viktor Barzin	4ca7af8818	add audiobook-search service to servarr stack - New audiobook-search deployment + service + ingress (Authentik-protected) - qBittorrent: add NFS mount for /audiobooks (shared with Audiobookshelf) - Cloudflare DNS: add audiobook-search.viktorbarzin.me - Env vars: QBITTORRENT_URL/PASS, AUDIOBOOKSHELF_URL/TOKEN from ESO	2026-03-24 01:21:49 +02:00
Viktor Barzin	dbff547741	remove docs/backup-strategy.md, absorbed into architecture/backup-dr.md [ci skip]	2026-03-24 01:08:06 +02:00
Viktor Barzin	5a42643176	add architecture documentation for all infrastructure subsystems [ci skip] 14 docs covering networking, VPN, storage, authentication, security, monitoring, secrets, CI/CD, backup/DR, compute, databases, and multi-tenancy. Each doc includes Mermaid diagrams, component tables, configuration references, decision rationale, and troubleshooting.	2026-03-24 00:55:25 +02:00
Viktor Barzin	31767ed8e7	state(headscale): update encrypted state	2026-03-24 00:03:03 +02:00
Viktor Barzin	2adf68ae03	state(platform): update encrypted state	2026-03-23 23:48:38 +02:00
Viktor Barzin	28f349a8f6	state(servarr): update encrypted state	2026-03-23 23:46:08 +02:00
Viktor Barzin	d9eaf42f36	exclude iDRAC from HighServiceLatency alert. iDRAC Redfish exporter is inherently slow, causing noisy alerts.	2026-03-23 22:51:42 +02:00
root	eeae58861b	Woodpecker CI Update TLS Certificates Commit	2026-03-23 20:38:38 +00:00
Viktor Barzin	3bca7a97c2	fix(renew-tls): update TLS secret in ALL namespaces, not just kyverno. Kyverno generate+synchronize only manages secrets it created itself. Existing Terraform-managed secrets in ~70 namespaces weren't updated. Now loops through all namespaces and runs kubectl apply with the new cert.	2026-03-23 22:36:31 +02:00
root	dadbec0eb4	Woodpecker CI Update TLS Certificates Commit	2026-03-23 20:34:36 +00:00
Viktor Barzin	2dcb4b7fa4	fix(renew-tls): clean stale _acme-challenge TXT records before certbot. 21+ stale TXT records accumulated from previous runs, causing certbot DNS-01 challenge to fail. Now deletes all _acme-challenge records from Cloudflare before certbot creates fresh ones.	2026-03-23 22:32:27 +02:00
Viktor Barzin	b7409cea4e	fix(renew-tls): use alpine+curl for kubectl step to avoid permission denied. bitnami/kubectl runs as non-root UID 1001, cannot read git-crypt decrypted secrets owned by root. Switch to alpine (runs as root) with kubectl downloaded directly.	2026-03-23 22:28:56 +02:00
root	b5dd43aeab	Woodpecker CI Update TLS Certificates Commit	2026-03-23 20:27:00 +00:00
Viktor Barzin	304f0de43a	add Metric Staleness alerts for UPS, iDRAC, ATS, and HA metrics. Replace fragile NoiDRACData alert with proper absent() checks. Add UPSMetricsMissing (critical), iDRACRedfishMetricsMissing, iDRACSNMPMetricsMissing, ATSMetricsMissing, and HomeAssistantMetricsMissing alerts. Update PowerOutage and NodeDown inhibit rules to suppress staleness alerts during outages.	2026-03-23 22:24:17 +02:00
Viktor Barzin	0c307f4d3d	state(kyverno): update encrypted state	2026-03-23 22:20:18 +02:00
Viktor Barzin	16cde1eab5	add Kyverno TLS secret sync + enhance renewal pipeline. Kyverno ClusterPolicy clones tls-secret from kyverno namespace to all namespaces with synchronize=true. Renewal pipeline now updates the source secret via kubectl, verifies cert validity, and sends Slack notification.	2026-03-23 22:19:34 +02:00
Viktor Barzin	6a2bee93b5	fix(monitoring): use patched idrac exporter with PSU input voltage metric. The upstream ghcr.io/mrlhansen/idrac_exporter:2.4.1 is missing NewPowerSupplyInputVoltage in RefreshPowerOld, so the R730 iDRAC never emits idrac_power_supply_input_voltage. Switch to the patched viktorbarzin/idrac-redfish-exporter:2.4.1-voltage-fix image.	2026-03-23 22:07:36 +02:00
Viktor Barzin	b6bc51b42b	state(platform): update encrypted state	2026-03-23 22:04:06 +02:00
Viktor Barzin	a95d434ff1	fix backup IO stats: use /proc/$$/io instead of /proc/self/io. /proc/self/io inside $(awk ...) resolves to the awk subprocess PID, not the parent bash shell. Use $$ (bash PID) to read the correct process IO counters.	2026-03-23 12:33:52 +02:00
Viktor Barzin	0a294a30a6	add backup IO logging, Pushgateway metrics, and Grafana dashboard - Add /proc/self/io read/write tracking to vault raft-backup and etcd backup - Push backup_duration_seconds, backup_read_bytes, backup_written_bytes, backup_last_success_timestamp to Pushgateway from all 6 backup CronJobs (etcd skipped — distroless image has no wget/curl) - Add cloudsync_duration_seconds metric to cloudsync-monitor - New "Backup Health" Grafana dashboard with 8 panels: time since last backup, overview table, duration/IO trends, cloud sync status, alerts, CronJob schedule	2026-03-23 12:19:01 +02:00
Viktor Barzin	0b595751c5	move Frigate cache to tmpfs to eliminate disk writes on node1. Add 512Mi tmpfs emptyDir for /tmp/cache, where Frigate continuously writes 10s MP4 segments for all cameras. With motion-only retention, segments without events are deleted immediately anyway, so losing them on pod restart is acceptable. Node1 disk writes: 3.55 MB/s → 2.08 MB/s (previous commit) → 96 KB/s (now)	2026-03-23 11:52:49 +02:00
Viktor Barzin	2855da2a3c	state(frigate): update encrypted state	2026-03-23 11:49:40 +02:00
Viktor Barzin	3f0ecda737	harden pull-through cache: intercept errors, reduce lock timeout, add healthz - Add proxy_intercept_errors + error_page for 502/503/504 on blob locations to prevent caching truncated upstream responses (root cause of repeated ImagePullBackOff across services) - Reduce proxy_cache_lock_timeout from 15m to 5m — fail fast, let containerd retry instead of all concurrent pulls waiting on a failed first download - Add proxy_cache_valid any 0 — never cache error responses - Add /healthz endpoints on Docker Hub and GHCR servers - Add draintimeout and proxy.ttl to registry proxy configs	2026-03-23 11:33:06 +02:00
Viktor Barzin	1639910043	ingress latency: add histogram buckets, fix restarts, right-size memory - Traefik: add fine-grained Prometheus histogram buckets (0.01-30s) for meaningful P50/P99 - Calibre: relax liveness probe (timeout 5→10s, threshold 3→6) to stop NFS-caused restarts - Novelapp: increase memory 128Mi/256Mi → 640Mi/640Mi (confirmed OOMKilled, VPA upper 505Mi) - Forgejo: increase memory 256Mi → 384Mi (at 80% of limit, VPA upper 311Mi) - ActualBudget: add explicit resources to prevent silent LimitRange defaults - Docs: update Nextcloud note from 4Gi → 8Gi limit (Apache spike history)	2026-03-23 10:52:43 +02:00
Viktor Barzin	5652972c53	fix dashboard: add refIds, explicit panel IDs, fix CrowdSec bouncer metric - Added refId to all targets (required by Grafana) - Added explicit panel IDs for stable references - Fixed CrowdSec bouncer metric: cs_lapi_bouncer_requests_total doesn't exist, use cs_lapi_route_requests_total instead - Added drawStyle/showPoints to all timeseries panels - Updated via MySQL + ConfigMap + Grafana restart	2026-03-23 10:31:44 +02:00
Viktor Barzin	45d48e7ce7	state(headscale): update encrypted state	2026-03-23 10:27:04 +02:00
Viktor Barzin	9527f62c2e	fix network traffic dashboard: use only available GoFlow2 metrics. GoFlow2 v2 only exposes aggregate metrics (traffic_bytes_total, process_nf_total, delay_seconds), with no per-source/dest labels. Removed panels referencing non-existent src_addr/dst_port labels. Replaced with flowset records by type, separated bytes and flows into own panels to avoid scale issues.	2026-03-23 10:16:46 +02:00

1974 commits