Commit graph

2424 commits

Author SHA1 Message Date
Viktor Barzin
562f7b1db1 state(vaultwarden): update encrypted state 2026-04-12 12:54:29 +01:00
Viktor Barzin
a8c6daeaa5 state(infra-maintenance): update encrypted state 2026-04-12 12:51:11 +01:00
Viktor Barzin
5da6d75094 fix(monitoring): PodCrashLooping alert now fires only for active CrashLoopBackOff
Switch from restart-count based detection (increase(restarts[1h]) > 5) to
waiting-reason based (kube_pod_container_status_waiting_reason{reason="CrashLoopBackOff"}).
Alert auto-resolves when pod recovers, making it clear whether the issue is active.
2026-04-12 12:41:07 +01:00
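The difference between the two alert conditions is easiest to see on a timeline. Below is a simplified simulation, not the real PromQL. It assumes one sample per minute, approximates increase() as a plain difference over the window (Prometheus actually extrapolates), and uses a made-up crash pattern. The restart-count rule keeps firing after the pod has recovered, until the 1h window slides past the restarts. The waiting-reason rule clears as soon as the container leaves CrashLoopBackOff.

```python
"""Simplified comparison of two PodCrashLooping alert conditions.

Assumptions: one sample per minute; increase() approximated as a plain
difference over the window; the crash pattern in simulate() is made up.
"""
from dataclasses import dataclass


@dataclass
class Sample:
    minute: int
    restarts: int        # cumulative container restart counter
    waiting_reason: str  # "" while the container is running


def restart_rule(samples: list[Sample], t: int, window: int = 60, threshold: int = 5) -> bool:
    """Old rule: restarts grew by more than `threshold` within `window` minutes."""
    start = max(0, t - window)
    return samples[t].restarts - samples[start].restarts > threshold


def waiting_reason_rule(samples: list[Sample], t: int) -> bool:
    """New rule: the container is currently waiting in CrashLoopBackOff."""
    return samples[t].waiting_reason == "CrashLoopBackOff"


def simulate(crash_until: int, total: int) -> list[Sample]:
    """Pod restarts every 2 minutes until `crash_until`, then runs healthy."""
    samples = []
    restarts = 0
    for m in range(total):
        crashing = m < crash_until
        if crashing and m % 2 == 0:
            restarts += 1
        samples.append(Sample(m, restarts, "CrashLoopBackOff" if crashing else ""))
    return samples


if __name__ == "__main__":
    samples = simulate(crash_until=30, total=120)
    for t in (20, 40, 80):
        # minute, old rule firing?, new rule firing?
        print(t, restart_rule(samples, t), waiting_reason_rule(samples, t))
```

At minute 40 the pod has been healthy for 10 minutes. The old rule still reports it, and the new one does not.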
Viktor Barzin
cc670d949c docs: add ha-sofia Version Control add-on to HA skill [ci skip]
HomeAssistantVersionControl v1.2.0 installed on ha-sofia for git-based
config tracking. Auto-commits on file change, pushes hourly to private
GitHub repo ViktorBarzin/ha-sofia-config.
2026-04-12 11:37:02 +01:00
Viktor Barzin
53df0285bd state(woodpecker): update encrypted state 2026-04-12 11:37:02 +01:00
Viktor Barzin
9ea75d1c6a state(ytdlp): update encrypted state 2026-04-12 11:37:01 +01:00
Viktor Barzin
4df92b1969 state(crowdsec): update encrypted state 2026-04-12 11:37:01 +01:00
root
a495311ed8 Woodpecker CI Update TLS Certificates Commit 2026-04-12 00:03:20 +00:00
Viktor Barzin
6ba4878f3a docs: update storage architecture for NFS migration to Proxmox host [ci skip] 2026-04-11 17:00:10 +01:00
Viktor Barzin
65551e4602 fix(dbaas): relax MySQL anti-affinity from required to preferred
Avoids pods getting stuck Pending during node outages while still
preferring to spread across nodes.
2026-04-11 16:26:24 +01:00
Viktor Barzin
ee66560661 state(ollama): update encrypted state 2026-04-11 12:03:00 +01:00
Viktor Barzin
dd81156316 state(ollama): update encrypted state 2026-04-11 11:47:26 +01:00
Viktor Barzin
6222f5af7e state(ollama): update encrypted state 2026-04-11 11:30:56 +01:00
Viktor Barzin
ed95626ce8 state(woodpecker): update encrypted state 2026-04-11 11:00:56 +01:00
Viktor Barzin
a568c39363 state(woodpecker): update encrypted state 2026-04-11 11:00:40 +01:00
Viktor Barzin
cbb45f5bec state(woodpecker): update encrypted state 2026-04-11 10:57:40 +01:00
Viktor Barzin
2164208120 state(woodpecker): update encrypted state 2026-04-11 10:57:19 +01:00
Viktor Barzin
f35e56759c state(ebooks): update encrypted state 2026-04-11 10:56:40 +01:00
Viktor Barzin
365a42ee72 state(servarr): update encrypted state 2026-04-11 10:40:39 +01:00
Viktor Barzin
2d50aa0714 state(servarr): update encrypted state 2026-04-11 10:40:36 +01:00
Viktor Barzin
9ac958f4db state(servarr): update encrypted state 2026-04-11 10:40:33 +01:00
Viktor Barzin
e60d397ec7 state(servarr): update encrypted state 2026-04-11 10:40:30 +01:00
Viktor Barzin
7fd2f63520 state(servarr): update encrypted state 2026-04-11 10:40:26 +01:00
Viktor Barzin
ae18047c18 state(servarr): update encrypted state 2026-04-11 10:40:22 +01:00
Viktor Barzin
1c48895696 state(servarr): update encrypted state 2026-04-11 10:40:17 +01:00
Viktor Barzin
3473a99e7f state(servarr): update encrypted state 2026-04-11 10:40:09 +01:00
Viktor Barzin
a4dd5aaed6 state(openclaw): update encrypted state 2026-04-11 10:37:46 +01:00
Viktor Barzin
c7bd381424 state(navidrome): update encrypted state 2026-04-11 10:35:57 +01:00
Viktor Barzin
160e8980e5 perf(immich): restore PostgreSQL vector search optimizations
- shared_buffers: 1GB → 2GB (clip_index is 452MB, needs headroom)
- effective_cache_size: 1536MB → 2560MB
- PG memory: 2Gi → 3Gi to support larger shared_buffers
- Add pg_prewarm to shared_preload_libraries with autoprewarm
- First search after restart: 999ms → 25ms
2026-04-11 10:30:44 +01:00
Viktor Barzin
5afef4c83e state(meshcentral): update encrypted state 2026-04-11 10:24:12 +01:00
Viktor Barzin
79ea17fa82 state(frigate): update encrypted state 2026-04-11 10:23:08 +01:00
Viktor Barzin
340e04de9c state(immich): update encrypted state 2026-04-11 10:22:51 +01:00
Viktor Barzin
71819b2c20 state(ytdlp): update encrypted state 2026-04-11 10:20:21 +01:00
Viktor Barzin
09b7163a06 state(real-estate-crawler): update encrypted state 2026-04-11 10:19:15 +01:00
Viktor Barzin
9a7b5b83b1 state(poison-fountain): update encrypted state 2026-04-11 10:17:28 +01:00
Viktor Barzin
0458b69b5d state(poison-fountain): update encrypted state 2026-04-11 10:17:23 +01:00
Viktor Barzin
2f51daf18e state(poison-fountain): update encrypted state 2026-04-11 10:17:19 +01:00
Viktor Barzin
aa58565ecc upgrade immich to v2.7.4 and increase rate limit average and burst
- Immich version: v2.7.3 → v2.7.4
- Immich rate limit: avg 200→500, burst 2000→5000 (both traefik and platform stacks)
2026-04-11 10:15:42 +01:00
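Traefik documents its RateLimit middleware as a token bucket. In that model, `average` is the refill rate and `burst` is the bucket size, so `burst` caps how many near-simultaneous requests get through, which is what a page of Immich thumbnails produces. The sketch below is a generic token bucket written to illustrate those semantics. It is not Traefik's implementation, and it ignores per-source bucketing and the `period` setting.

```python
"""Generic token bucket illustrating what `average` and `burst` control.

Assumption: mirrors the token-bucket semantics documented for Traefik's
RateLimit middleware; this is not Traefik's code.
"""


class TokenBucket:
    def __init__(self, average: float, burst: int) -> None:
        self.average = average      # tokens refilled per second
        self.burst = burst          # maximum tokens the bucket can hold
        self.tokens = float(burst)  # start full
        self.last = 0.0             # time of the previous request, in seconds

    def allow(self, now: float) -> bool:
        """Refill for the elapsed time, then spend one token if available."""
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.average)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def accepted_in_burst(bucket: TokenBucket, requests: int, at: float = 0.0) -> int:
    """How many of `requests` simultaneous requests are let through at time `at`."""
    return sum(bucket.allow(at) for _ in range(requests))


old = TokenBucket(average=200, burst=2000)
new = TokenBucket(average=500, burst=5000)
print(accepted_in_burst(old, 3000), accepted_in_burst(new, 3000))
```

With a burst of 3000 simultaneous requests, the old limits admit 2000 and reject the remainder. The new limits admit all 3000.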
Viktor Barzin
54dd6071d2 state(immich): update encrypted state 2026-04-11 10:15:41 +01:00
Viktor Barzin
223569e87b state(traefik): update encrypted state 2026-04-11 10:15:41 +01:00
Viktor Barzin
75814e4672 state(actualbudget): update encrypted state 2026-04-11 10:15:41 +01:00
Viktor Barzin
75255d22a2 fix(phpipam): fix London SSH via WG MTU reduction (1420→1200)
Root cause: PMTU black hole on WireGuard tunnel. The tunnel runs over
the HE IPv6 6in4 tunnel (gif0 MTU 1280). With WG overhead (~80 bytes),
effective inner MTU is 1200 — but both sides were configured at 1420.
SSH kex packets >1200 bytes were silently dropped.

Fix: Set tun_wg0 MTU to 1200 on pfSense + peer_855 MTU to 1200 on
London GL-iNet. Re-enabled London DHCP/ARP import in remote CronJob.

All 3 sites now fully automated:
- Sofia: Kea leases + ARP every 5min
- London: DHCP + ARP via pfSense→London SSH hop, hourly
- Valchedrym: DHCP + ARP via pfSense→OpenWRT SSH hop, hourly

[ci skip]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 08:18:42 +00:00
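The ~80 bytes of overhead above comes from the outer headers WireGuard adds when its peer is reached over IPv6: a 40-byte IPv6 header, an 8-byte UDP header, a 16-byte WireGuard data-message header, and a 16-byte Poly1305 tag. Subtracting that from gif0's 1280 gives the 1200 used in the fix. Here is the arithmetic as a small sketch. The constant names are illustrative.

```python
"""WireGuard-over-IPv6 MTU arithmetic for the London tunnel."""

IPV6_HEADER = 40     # fixed outer IPv6 header
UDP_HEADER = 8       # outer UDP header
WG_DATA_HEADER = 16  # type/reserved (4) + receiver index (4) + counter (8)
WG_AUTH_TAG = 16     # Poly1305 authentication tag

WG_OVERHEAD = IPV6_HEADER + UDP_HEADER + WG_DATA_HEADER + WG_AUTH_TAG


def max_inner_mtu(outer_mtu: int) -> int:
    """Largest WireGuard interface MTU whose packets still fit the outer link."""
    return outer_mtu - WG_OVERHEAD


def outer_size(inner_size: int) -> int:
    """Size of the encapsulated packet for a given inner packet size."""
    return inner_size + WG_OVERHEAD


GIF0_MTU = 1280  # HE 6in4 tunnel MTU quoted in the commit

print(max_inner_mtu(GIF0_MTU))      # largest safe tunnel MTU
print(outer_size(1420) > GIF0_MTU)  # a full-size packet at the old MTU does not fit
```

With the old 1420 setting, a full-size inner packet becomes a 1500-byte outer packet, which is 220 bytes more than gif0 can carry.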
Viktor Barzin
d7de5de07c fix(monitoring): add pve_* metrics to Prometheus whitelist
ProxmoxMetricsMissing alert was firing because pve_* metrics were
excluded from the kubernetes-service-endpoints metric_relabel_configs
whitelist. The exporter was scraping successfully but metrics were
being dropped before ingestion.
2026-04-10 22:58:49 +01:00
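A `keep` rule in metric_relabel_configs drops every series whose metric name does not match the regex, and Prometheus anchors relabel regexes at both ends. The sketch below emulates that behavior with `re.fullmatch`. Both whitelist regexes and the sample metric names are hypothetical stand-ins; the real whitelist lives in the kubernetes-service-endpoints scrape config.

```python
"""Emulate a Prometheus `keep` relabel rule on metric names.

Assumption: both whitelist regexes are hypothetical stand-ins for the real
kubernetes-service-endpoints whitelist. re.fullmatch reproduces Prometheus's
anchoring of relabel regexes.
"""
import re

OLD_WHITELIST = r"(kube_.*|node_.*|container_.*)"
NEW_WHITELIST = r"(kube_.*|node_.*|container_.*|pve_.*)"


def keep(metric_names: list[str], regex: str) -> list[str]:
    """Return only the metric names the `keep` action would ingest."""
    pattern = re.compile(regex)
    return [name for name in metric_names if pattern.fullmatch(name)]


scraped = ["pve_up", "pve_cpu_usage_ratio", "node_load1", "go_goroutines"]
print(keep(scraped, OLD_WHITELIST))  # pve_* series silently dropped
print(keep(scraped, NEW_WHITELIST))  # pve_* series now ingested
```

This also explains the symptom in the commit: the scrape itself succeeds, so the target looks healthy, and the series only disappear at the relabeling step.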
Viktor Barzin
04beb123eb feat(phpipam): split CronJobs - Sofia 5min, remote sites hourly
- Sofia import (every 5min): Kea leases + pfSense ARP via SSH
- Remote import (hourly): Valchedrym DHCP/ARP via pfSense SSH hop
- London SSH (dropbear) hangs during kex on low-power router — disabled
  for now, data imported manually. TODO: lightweight push agent
- Fixed SSH key filename (id_rsa, not id_ed25519) for RSA keys
- No more ping sweeping anywhere — all passive DHCP/ARP data

[ci skip]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 21:24:58 +00:00
Viktor Barzin
73531c12e0 docs(vpn): update with dual-stack WG, GL-iNet AllowedIPs fix, and troubleshooting [ci skip]
Document fixes from 2026-04-10 London network debugging session:
- pfSense WG now dual-stack (IPv4+IPv6 via HE tunnel gif0 pf rule)
- GL-iNet AllowedIPs must be single comma-separated UCI entry (parse bug)
- AdGuardHome/carrier-monitor must not use 1.1.1.1 (conntrack + rate limit)
- Expanded troubleshooting for site-to-site tunnel disconnects
2026-04-10 22:24:19 +01:00
Viktor Barzin
a0392a9617 fix(nextcloud): auto-sync DB password from Vault rotation into config.php
Nextcloud persists dbpassword in config.php on its PVC and ignores
MYSQL_PASSWORD env var after initial install. When Vault rotates the
MySQL password, config.php goes stale, causing HTTP 500 crash loops.

Adds a before-starting hook that patches config.php with the current
MYSQL_PASSWORD on every pod start. Combined with Stakater Reloader
annotation, the full rotation chain is now automated:
Vault rotates → ESO syncs Secret → Reloader restarts pod → hook
patches config.php → Nextcloud connects with new password.

Also fixes stale existingClaim (nextcloud-data-iscsi → nextcloud-data-proxmox).
2026-04-10 22:23:52 +01:00
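The core of the before-starting hook is a small text substitution: find the `'dbpassword' => '...'` entry in config.php and rewrite it from `MYSQL_PASSWORD`. The sketch below shows that step in Python for readability. It is an illustration, not the hook from this repo, which may well be a shell script. The default path and the exact shape of the config entry are assumptions. The password is escaped for a PHP single-quoted string so that quotes or backslashes in a rotated password cannot break config.php.

```python
"""Sketch of a before-starting hook step: sync config.php's dbpassword from the env.

Assumptions: illustrated in Python (the real hook may be shell); config.php
holds a 'dbpassword' => '...' array entry; the default path is hypothetical.
"""
import os
import re

# Matches the key plus a PHP single-quoted value, allowing escaped characters.
DBPASSWORD_RE = re.compile(r"('dbpassword'\s*=>\s*)'(?:[^'\\]|\\.)*'")


def php_single_quote(value: str) -> str:
    """Escape a value for a PHP single-quoted string literal."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def patch_config(text: str, password: str) -> str:
    """Replace the dbpassword entry; raise if there is not exactly one."""
    new_text, count = DBPASSWORD_RE.subn(
        lambda m: m.group(1) + php_single_quote(password), text
    )
    if count != 1:
        raise ValueError(f"expected one dbpassword entry, found {count}")
    return new_text


def main(path: str = "/var/www/html/config/config.php") -> None:
    """Rewrite config.php in place only when the password actually changed."""
    password = os.environ["MYSQL_PASSWORD"]
    with open(path, encoding="utf-8") as f:
        original = f.read()
    patched = patch_config(original, password)
    if patched != original:
        with open(path, "w", encoding="utf-8") as f:
            f.write(patched)
```

Because the function is idempotent, running it on every pod start is harmless when nothing has rotated. That property is what lets it sit safely behind the Reloader-triggered restart.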
Viktor Barzin
92e0c18e81 feat(phpipam): pull Valchedrym devices from OpenWRT DHCP/ARP via SSH
- CronJob now SSHs to Valchedrym OpenWRT (192.168.0.1) to pull DHCP leases + ARP table
- Parses /tmp/dhcp.leases for hostname + MAC, /proc/net/arp for additional devices
- London still uses ping sweep via pfSense WG tunnel (no SSH access to GL-iNet)
- 6 Valchedrym devices tracked: router, alarm, video, termoregulator, 2 clients

[ci skip]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 21:06:39 +00:00
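The two OpenWRT sources complement each other. The dnsmasq lease file carries hostnames, while the kernel ARP table also lists devices with static addresses that never requested a lease. Below is a minimal parsing and merge sketch. It assumes the usual dnsmasq lease line format (`<expiry> <mac> <ip> <hostname> <client-id>`, with `*` for an unknown hostname) and the standard six-column /proc/net/arp layout with a header row. The merge policy, where leases win and ARP only adds unseen IPs, is an assumption made for illustration.

```python
"""Sketch: merge OpenWRT DHCP leases and the kernel ARP table into one device list.

Assumptions: dnsmasq lease format and the standard /proc/net/arp layout;
the merge policy (leases win, ARP adds unseen IPs) is illustrative.
"""
from typing import Dict, Optional, Tuple

ATF_COM = 0x2  # ARP flag: entry is complete (MAC resolved)

Device = Tuple[str, Optional[str]]  # (MAC, hostname or None)


def parse_leases(text: str) -> Dict[str, Device]:
    """Map IP -> (MAC, hostname) from /tmp/dhcp.leases lines."""
    devices: Dict[str, Device] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        _expiry, mac, ip, hostname = fields[:4]
        devices[ip] = (mac.lower(), None if hostname == "*" else hostname)
    return devices


def parse_arp(text: str) -> Dict[str, str]:
    """Map IP -> MAC for complete entries in /proc/net/arp."""
    entries: Dict[str, str] = {}
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue
        ip, _hw_type, flags, mac = fields[:4]
        if int(flags, 16) & ATF_COM:  # drop incomplete (unresolved) entries
            entries[ip] = mac.lower()
    return entries


def merge(leases: Dict[str, Device], arp: Dict[str, str]) -> Dict[str, Device]:
    """Leases provide hostnames; ARP contributes IPs that have no lease."""
    merged = dict(leases)
    for ip, mac in arp.items():
        merged.setdefault(ip, (mac, None))
    return merged
```

Devices found only through ARP end up with no hostname. That matches the commit's split between named devices (router, alarm, and so on) and generic clients.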
Viktor Barzin
cddbb1c8b0 feat(phpipam): scan London/Valchedrym via WireGuard tunnel
- pfsense-import CronJob now scans remote subnets (192.168.8.0/24,
  192.168.0.0/24) via parallel ping sweep through pfSense WG tunnel
- 13 London devices + 1 Valchedrym device discovered
- Known hosts named: ha-london, rpi-london, openwrt-london
- fping cron container fully removed

[ci skip]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 20:58:49 +00:00
Viktor Barzin
eec6af6aef docs: add IPAM/DDNS architecture diagram and update docs
- networking.md: Add mermaid diagram showing full device discovery pipeline
  (Kea DHCP → DDNS → Technitium, pfSense import → phpIPAM → DNS sync)
- networking.md: Add data flow table, DHCP coverage table
- networking.md: Update pfSense (3 subnets + 42 reservations), phpIPAM
  (passive import replaces fping), Technitium (192.168.1.2 in ACL)
- CLAUDE.md: Update phpIPAM and networking descriptions

[ci skip]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 20:42:10 +00:00
Viktor Barzin
bba2de9eb1 refactor(phpipam): remove fping cron container
All device discovery now handled by phpipam-pfsense-import CronJob
which queries Kea DHCP leases + pfSense ARP table every 5min.
No active scanning needed — pfSense sees all devices passively.

[ci skip]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 20:38:59 +00:00