Commit graph

2137 commits

Author SHA1 Message Date
Viktor Barzin
e79b996624 state(authentik): update encrypted state 2026-03-28 14:51:24 +02:00
Viktor Barzin
7e0b0d9362 fix: headscale VPN setup hardening
- Add SQLite backup CronJob (every 6h to NFS for cloud sync pickup)
- Move headscale-ui secrets (COOKIE_SECRET, ROOT_API_KEY) from hardcoded
  values to Vault-managed secrets
- Add DERP IPv6 address (2001:470:6e:43d::2) for IPv6-capable clients
- Clean up stale test nodes, duplicate users, rename "localhost" nodes

Also updated headscale_config in Vault to include DERP ipv6 field
and headscale_ui_cookie_secret/headscale_ui_api_key secrets.
2026-03-28 14:38:12 +02:00
Viktor Barzin
b339d454dd state(headscale): update encrypted state 2026-03-28 14:37:16 +02:00
Viktor Barzin
a42003fb8f fix: add dedicated DERP IngressRoute bypassing middlewares
CrowdSec, rate limiting, anti-AI, and error pages middlewares were
interfering with the Upgrade: DERP protocol handshake. Also updated
Headscale ACL in Vault to allow tailnet DNS traffic to Technitium
(10.0.20.200:53).
2026-03-28 14:26:51 +02:00
Viktor Barzin
1ec11cdab4 state(headscale): update encrypted state 2026-03-28 14:22:44 +02:00
Viktor Barzin
eadc266691 state(headscale): update encrypted state 2026-03-28 14:06:03 +02:00
Viktor Barzin
04a96955c0 fix: exclude NFS PVs from PVFillingUp alert
NFS PVs report the entire NFS server filesystem usage (e.g., navidrome-music
shows 5.3 TiB Synology volume at 97%), not PVC-specific usage. Filter out
PVs with >1TiB capacity (always NFS mounts; iSCSI PVCs are 10-50Gi).
2026-03-28 01:14:05 +02:00
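The filter described in this commit can be sketched as follows. This is a minimal Python illustration, not the actual Prometheus rule: the `PV` record, the `pvs_to_alert_on` helper, and the 90% fill threshold are hypothetical. Only the 1 TiB capacity cutoff comes from the commit message.

```python
from dataclasses import dataclass

GIB = 1024 ** 3  # bytes in one GiB
TIB = 1024 ** 4  # bytes in one TiB


@dataclass
class PV:
    """Hypothetical record for one persistent volume."""
    name: str
    capacity_bytes: int
    used_bytes: int


def pvs_to_alert_on(pvs, max_capacity_bytes=TIB, fill_threshold=0.90):
    """Return names of PVs that are filling up, skipping very large volumes.

    Volumes larger than max_capacity_bytes are assumed to be NFS mounts that
    report the whole server filesystem, so their usage is ignored.
    The fill_threshold default is an assumption, not taken from the repository.
    """
    alerts = []
    for pv in pvs:
        if pv.capacity_bytes <= 0:
            continue  # no meaningful usage ratio
        if pv.capacity_bytes > max_capacity_bytes:
            continue  # treated as NFS: server-wide usage, not PVC usage
        if pv.used_bytes / pv.capacity_bytes >= fill_threshold:
            alerts.append(pv.name)
    return alerts
```

With this cutoff, a 5.3 TiB volume at 97% is skipped, while a 20 GiB volume at 95% still triggers the alert.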
Viktor Barzin
ae21502698 fix: exclude disabled London Pi cloud sync task from CloudSyncFailing alert
Task 2 (Backup London pi) fails because 192.168.8.102 is unreachable.
Disabled task via TrueNAS, excluded task_id=2 from alert rule.
2026-03-27 15:15:48 +02:00
Viktor Barzin
252b65a574 fix: increase memory limits for OOMKilled pods (immich, clickhouse, speedtest)
- immich-server: limits 1700Mi → 2500Mi (70 restarts from media processing spikes)
- clickhouse: limits 1Gi → 1536Mi, max_server_memory_usage 800Mi → 1200Mi
- speedtest: limits 256Mi → 512Mi, requests 256Mi → 128Mi (daily OOM during test)
2026-03-27 13:57:16 +02:00
Viktor Barzin
399f0e2bd0 state(rybbit): update encrypted state 2026-03-27 13:56:54 +02:00
Viktor Barzin
44a1c3a155 state(immich): update encrypted state 2026-03-27 13:54:19 +02:00
Viktor Barzin
e23202399e state(speedtest): update encrypted state 2026-03-27 13:54:09 +02:00
Viktor Barzin
1ec480e5fa novelapp: grant vabbit81 (Gheorghe) admin RBAC on novelapp namespace 2026-03-26 17:34:48 +02:00
Viktor Barzin
2dc27ca128 state(novelapp): update encrypted state 2026-03-26 17:34:44 +02:00
Viktor Barzin
64d1a3bd24 state(woodpecker): update encrypted state 2026-03-26 17:34:18 +02:00
Viktor Barzin
e774d486fd state(rbac): add vabbit81 RBAC resources 2026-03-26 17:33:16 +02:00
Viktor Barzin
e65647edb4 state(vault): add vabbit81 user resources 2026-03-26 17:32:34 +02:00
Viktor Barzin
4e8d087b24 state(novelapp): update encrypted state 2026-03-26 17:23:34 +02:00
Viktor Barzin
5e6e71e727 novelapp: add NextAuth + Google OAuth env vars
Replace AUTH_SECRET with NEXTAUTH_URL, NEXTAUTH_SECRET, GOOGLE_CLIENT_ID,
and GOOGLE_CLIENT_SECRET for Google OAuth integration.
2026-03-26 17:14:16 +02:00
Viktor Barzin
70ea01fb6e vault: increase k8s auth token TTLs and add periodic renewal
Stagger token periods across roles (7d/8d/9d/10d) to prevent
bulk lease revocation storms that caused transient 504s.
Periodic tokens auto-renew indefinitely, eliminating mass expiry.
2026-03-26 12:21:47 +02:00
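The effect of staggering can be illustrated with a little arithmetic. The sketch below assumes, as a simplification, that every role's tokens were issued at the same moment and come due exactly once per period. The helper names are hypothetical, and `math.lcm` requires Python 3.9 or later.

```python
import math

# Token periods in days, as listed in the commit message.
PERIODS_DAYS = [7, 8, 9, 10]


def first_common_due_day(periods):
    """First day on which every period comes due at once (least common multiple)."""
    return math.lcm(*periods)


def periods_due_on(day, periods):
    """Periods whose tokens come due on the given day, counted from a shared start."""
    return [p for p in periods if day % p == 0]
```

Under these assumptions, four roles that share a 7-day period all come due together every 7 days, whereas periods of 7, 8, 9, and 10 days first coincide only after `math.lcm(7, 8, 9, 10)` days.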
Viktor Barzin
b6ac68d7f2 state(vault): update encrypted state 2026-03-26 12:21:23 +02:00
Viktor Barzin
b8a5740138 reduce alert noise: remove 4 memory alerts, raise latency threshold [ci skip]
- Remove ClusterMemoryRequestsHigh, ContainerNearOOM, NodeLowFreeMemory,
  NodeMemoryPressureTrending — all fire regularly due to intentional
  memory overcommit and are not actionable
- Keep ContainerOOMKilled (actionable — container actually died)
- Raise HighServiceLatency p99 threshold from 10s to 30s to ignore
  transient spikes
2026-03-26 01:15:18 +02:00
Viktor Barzin
2445edea8f state(freedify): update encrypted state 2026-03-26 01:13:29 +02:00
Viktor Barzin
30d58bc4c8 state(freedify): update encrypted state 2026-03-26 01:11:16 +02:00
Viktor Barzin
9e99c14a77 state(freedify): update encrypted state 2026-03-26 00:36:47 +02:00
Viktor Barzin
9bc37bf257 state(freedify): update encrypted state 2026-03-26 00:15:49 +02:00
Viktor Barzin
c732e92613 state(reverse-proxy): update encrypted state 2026-03-26 00:07:46 +02:00
Viktor Barzin
074a2fceec state(reverse-proxy): update encrypted state 2026-03-26 00:07:41 +02:00
Viktor Barzin
50a3f81261 state(freedify): update encrypted state 2026-03-25 23:58:01 +02:00
Viktor Barzin
4e74f816bc cleanup: remove calibre and audiobookshelf stacks after ebooks migration [ci skip]
Both services migrated to unified ebooks namespace. Remove:
- Old stack directories and Terraform state
- calibre references from monitoring namespace lists
- calibre/audiobookshelf from operational scripts
2026-03-25 23:56:07 +02:00
Viktor Barzin
809e2a7624 state(audiobookshelf): update encrypted state 2026-03-25 23:54:55 +02:00
Viktor Barzin
60e83526ec state(calibre): update encrypted state 2026-03-25 23:54:38 +02:00
Viktor Barzin
3eb0418595 state(freedify): update encrypted state 2026-03-25 23:53:31 +02:00
Viktor Barzin
57d31de5a5 state(freedify): update encrypted state 2026-03-25 23:50:51 +02:00
Viktor Barzin
f49776aec9 state(freedify): update encrypted state 2026-03-25 23:41:15 +02:00
Viktor Barzin
95e49134ae cleanup: remove old audiobook-search, superseded by book-search
- Delete servarr/audiobook-search TF module (moved to ebooks/book-search)
- Remove audiobook-search from cloudflare_proxied_names
- Remove commented-out module reference in servarr/main.tf
- Clean up "renamed from" comment in ebooks/main.tf
- K8s resources (deploy/svc/ingress) deleted from servarr namespace
- Cloudflare DNS record already absent
- Import book-search and insta2spotify DNS records into cloudflared state
2026-03-25 23:16:01 +02:00
Viktor Barzin
97c789510e state(freedify): update encrypted state 2026-03-25 23:14:44 +02:00
Viktor Barzin
b731af1b91 state(platform): update encrypted state 2026-03-25 23:10:10 +02:00
Viktor Barzin
111201796b state(freedify): update encrypted state 2026-03-25 23:05:32 +02:00
Viktor Barzin
fe27709fd4 fix email monitor: use internal URL for Uptime Kuma push
Pods can't reach uptime.viktorbarzin.me externally. Switch to
http://uptime-kuma.uptime-kuma.svc.cluster.local for the push endpoint.
2026-03-25 22:59:26 +02:00
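The internal address in this commit follows the usual Kubernetes Service DNS pattern `<service>.<namespace>.svc.<cluster-domain>`. A small sketch of that pattern; the helper name `cluster_service_url` is hypothetical, and `cluster.local` is assumed as the cluster domain because it appears in the URL above.

```python
def cluster_service_url(service, namespace, scheme="http",
                        cluster_domain="cluster.local"):
    """Build the in-cluster URL for a Kubernetes Service.

    The result resolves only from inside the cluster, so pods can reach it
    without going through the external hostname.
    """
    return f"{scheme}://{service}.{namespace}.svc.{cluster_domain}"
```

Calling `cluster_service_url("uptime-kuma", "uptime-kuma")` yields the URL quoted in the commit message.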
Viktor Barzin
a48149ff0d state(mailserver): update encrypted state 2026-03-25 22:58:35 +02:00
Viktor Barzin
b877ff18fc state(freedify): update encrypted state 2026-03-25 22:52:08 +02:00
Viktor Barzin
78dec8f0ad add e2e email roundtrip monitoring
CronJob (every 30 min) sends test email via Mailgun API to
smoke-test@viktorbarzin.me, verifies IMAP delivery in spam@ catch-all,
deletes test email, pushes metrics to Pushgateway + Uptime Kuma.

Prometheus alerts: EmailRoundtripFailing, EmailRoundtripStale,
EmailRoundtripNeverRun. Uptime Kuma: SMTP/IMAP port checks + E2E push.
2026-03-25 22:50:22 +02:00
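The send, verify, delete, and report sequence described above could be structured roughly like this. It is a sketch only: the four callables are hypothetical stand-ins for the Mailgun, IMAP, and metrics-push steps, and the timeout and polling interval are assumptions rather than values from the repository.

```python
import time
import uuid


def email_roundtrip(send, find, delete, report,
                    timeout_s=120.0, poll_s=5.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Send a uniquely tagged test email and wait for it to arrive.

    Caller-supplied callables (all hypothetical):
      send(token)              - send a test email carrying the token
      find(token) -> bool      - True once the email is visible in the mailbox
      delete(token)            - remove the test email after verification
      report(success, seconds) - push the outcome to monitoring
    Returns True if the email was delivered before the timeout.
    """
    token = uuid.uuid4().hex  # unique marker so stale emails are never matched
    start = clock()
    send(token)

    delivered = False
    while True:
        if find(token):
            delivered = True
            delete(token)  # clean up so the mailbox does not fill with probes
            break
        if clock() - start >= timeout_s:
            break  # give up; the failure is still reported below
        sleep(poll_s)

    report(delivered, clock() - start)
    return delivered
```

Injecting the I/O steps keeps the control flow testable without a mail server. A real job would pass functions that call the mail API, search the mailbox, and push metrics.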
Viktor Barzin
b9c2d7c1f6 state(freedify): update encrypted state 2026-03-25 22:24:39 +02:00
Viktor Barzin
49de96a0c1 state(mailserver): update encrypted state 2026-03-25 22:20:02 +02:00
Viktor Barzin
d1036de313 state(mailserver): update encrypted state 2026-03-25 22:16:06 +02:00
Viktor Barzin
a08b1e8384 state(freedify): update encrypted state 2026-03-25 22:15:24 +02:00
Viktor Barzin
f33940cbce state(mailserver): update encrypted state 2026-03-25 22:10:26 +02:00
Viktor Barzin
26ab7acbda state(mailserver): update encrypted state 2026-03-25 22:08:50 +02:00
Viktor Barzin
3adaf88f62 add MAM_ID env var to book-search deployment [ci skip] 2026-03-25 15:52:24 +02:00