Commit graph

2490 commits

Author SHA1 Message Date
Viktor Barzin
b4b6fd5946 state(nfs-csi): update encrypted state 2026-04-14 09:32:41 +00:00
Viktor Barzin
30e5150ecd state(status-page): update encrypted state 2026-04-14 09:31:50 +00:00
Viktor Barzin
ac3a6a96dd state(hermes-agent): update encrypted state 2026-04-14 09:04:35 +00:00
Viktor Barzin
37b3395017 state(hermes-agent): update encrypted state 2026-04-14 09:00:30 +00:00
Viktor Barzin
8b2d3b7e6c state(hermes-agent): update encrypted state 2026-04-14 08:58:36 +00:00
Viktor Barzin
8787ad9f1d state(hermes-agent): update encrypted state 2026-04-14 08:39:18 +00:00
Viktor Barzin
71182f2867 state(openclaw): update encrypted state 2026-04-14 08:39:06 +00:00
Viktor Barzin
aa3af753a6 state(openclaw): update encrypted state 2026-04-14 08:38:54 +00:00
Viktor Barzin
4e059b138c docs: consolidate all post-mortems under docs/post-mortems/
Move HTML post-mortems from repo root post-mortems/ to docs/post-mortems/.
Update index.html with all 3 incidents (newest first).

[ci skip]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 08:24:36 +00:00
Viktor Barzin
bdba15a387 docs: move post-mortems to docs/post-mortems/
Consolidate all outage reports under docs/ for better discoverability.
Moved from .claude/post-mortems/ (agent-internal) to docs/post-mortems/
(repo documentation).

[ci skip]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 08:20:09 +00:00
Viktor Barzin
68c8c5b4a0 fix(technitium): migrate primary to proxmox-lvm-encrypted + post-mortem
SEV1 outage: fsid=0 in PVE /etc/exports broke all NFS subdirectory
mounts from k8s (NFSv4 pseudo-root path resolution). Combined with
lockd failure, both NFSv4 and NFSv3 mount paths broken. Cascaded
into DNS primary, Vault (2/3 pods), Alertmanager, 20+ services.

Changes:
- Primary PVC: NFS (nfs-truenas) → proxmox-lvm-encrypted
- Secondary/tertiary PVCs: proxmox-lvm → proxmox-lvm-encrypted
- Removed NFS module dependency from technitium stack
- Added full post-mortem with prevention plan

[ci skip]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 08:18:59 +00:00
Viktor Barzin
b239af9b6d state(technitium): update encrypted state 2026-04-14 08:04:07 +00:00
Viktor Barzin
6afba3b338 state(hermes-agent): update encrypted state 2026-04-13 22:33:35 +00:00
Viktor Barzin
e8ef81b276 state(hermes-agent): update encrypted state 2026-04-13 22:31:39 +00:00
Viktor Barzin
ab52b8eec2 state(hermes-agent): update encrypted state 2026-04-13 22:26:19 +00:00
Viktor Barzin
110d5d1f86 state(hermes-agent): update encrypted state 2026-04-13 22:20:23 +00:00
Viktor Barzin
d4c71be41c state(hermes-agent): update encrypted state 2026-04-13 22:09:19 +00:00
Viktor Barzin
6dcccbd8fc state(hermes-agent): update encrypted state 2026-04-13 22:07:09 +00:00
Viktor Barzin
04400fb7bd state(hermes-agent): update encrypted state 2026-04-13 22:07:09 +00:00
Viktor Barzin
e0518802f4 state(hermes-agent): update encrypted state 2026-04-13 22:07:08 +00:00
Viktor Barzin
1ef40daeec docs: update for MySQL 3→1, CrowdSec/Technitium PG migration, PG tuning, NFS async, node OS tuning [ci skip] 2026-04-13 23:05:46 +01:00
Viktor Barzin
f6d9959557 state(proxmox-csi): update encrypted state 2026-04-13 23:04:58 +01:00
Viktor Barzin
61988669e4 state(proxmox-csi): update encrypted state 2026-04-13 23:04:58 +01:00
Viktor Barzin
ad53f66bcd state(proxmox-csi): update encrypted state 2026-04-13 23:04:58 +01:00
Viktor Barzin
8b45944b27 state(proxmox-csi): update encrypted state 2026-04-13 23:04:58 +01:00
Viktor Barzin
0eb96e4e22 state(vault): update encrypted state 2026-04-13 23:04:57 +01:00
Viktor Barzin
c5393e6a72 state(wealthfolio): update encrypted state 2026-04-13 23:04:57 +01:00
Viktor Barzin
a5c92c4c78 state(health): update encrypted state 2026-04-13 23:04:57 +01:00
Viktor Barzin
d94b06627d state(affine): update encrypted state 2026-04-13 23:04:56 +01:00
Viktor Barzin
82f674a0b4 rename weekly-backup → daily-backup across scripts, timers, services, and docs [ci skip]
Reflects the schedule change from weekly to daily. All references updated:
- scripts/weekly-backup.{sh,timer,service} → daily-backup.*
- Pushgateway job name: weekly-backup → daily-backup
- Prometheus metric names: weekly_backup_* → daily_backup_*
- All docs, runbooks, AGENTS.md, CLAUDE.md, proxmox-inventory
- offsite-sync dependency: After=daily-backup.service

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 18:37:04 +00:00
Viktor Barzin
ca5039f8aa switch backup + offsite sync from weekly to daily — RPO 7d → 1d [ci skip]
- weekly-backup.timer: Sun 05:00 → daily 05:00
- offsite-sync-backup.timer: Sun 08:00 → daily 06:00
- Monthly full rsync --delete unchanged (1st-7th of month)
- Total daily I/O cost: ~20GB sdc reads, ~3.5GB sda writes, seconds of network
- Updated script headers and service descriptions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 18:24:38 +00:00
Viktor Barzin
b45cee5c4a docs: update backup architecture for inotify change tracking + consolidated Synology layout [ci skip]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 18:16:36 +00:00
Viktor Barzin
28ad11d12c consolidate offsite backup: inotify change tracking, deduplicate Synology paths [ci skip]
Architecture overhaul:
- Synology truenas/ renamed to nfs/, immich paths flattened to match source
- Created nfs-ssd/ on Synology for SSD data (thumbs, ML cache)
- Deleted pve-backup/nfs-mirror (53GB duplication eliminated)
- New inotifywait daemon (nfs-change-tracker.service) watches /srv/nfs + /srv/nfs-ssd
- offsite-sync Step 2: reads inotify change log, rsync --files-from only changed files
- weekly-backup: removed NFS mirror step entirely (NFS goes direct to Synology)
- Cleaned 9 orphaned LVs (101GB + 38 snapshots reclaimed from thin pool)

Performance: incremental sync completes in seconds (vs 30+ min with full rsync)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 18:06:20 +00:00
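
Note on 28ad11d12c: the incremental step described above (read the inotify change log, then rsync --files-from only the changed files) can be sketched roughly as follows. This is a minimal illustration, not the repository's script. It assumes a hypothetical log format of one absolute path per line; the function names, the list file path, and the destination are likewise made up for the example.

```python
from pathlib import PurePosixPath


def changed_relative_paths(log_lines, root):
    """Return sorted, de-duplicated paths relative to `root`.

    Assumes (hypothetically) that each change-log line is one absolute path.
    Paths outside `root` are ignored, so one log can serve several watched trees.
    """
    root_path = PurePosixPath(root)
    seen = set()
    for line in log_lines:
        entry = line.strip()
        if not entry:
            continue  # skip blank lines
        try:
            rel = PurePosixPath(entry).relative_to(root_path)
        except ValueError:
            continue  # path belongs to a different watched tree
        if str(rel) != ".":
            seen.add(str(rel))
    return sorted(seen)


def rsync_command(list_file, root, dest):
    """Build an rsync argv that transfers only the paths listed in `list_file`.

    rsync interprets --files-from entries relative to the source directory.
    """
    return ["rsync", "-a", f"--files-from={list_file}", root.rstrip("/") + "/", dest]
```

De-duplicating first matters because a file that is written many times between syncs appears many times in the log but only needs to be transferred once.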
Viktor Barzin
aa4c125f9c improve 3-2-1 backup: auto-discover dirs, Immich offsite sync, SQLite backup [ci skip]
- weekly-backup.sh: replace hardcoded BACKUP_DIRS with glob auto-discovery
  (catches nextcloud-backup, council-complaints-backup, future dirs)
- weekly-backup.sh: add auto SQLite backup from PVC snapshots
  (magic number check, ?mode=ro URI, fallback to raw copy)
- offsite-sync-backup.sh: add NFS media direct-to-Synology sync
  (Immich, calibre, audiobookshelf — reuses existing TrueNAS Cloud Sync paths)
- Cleaned up 9 orphaned LVs + 38 snapshots on PVE host (101GB reclaimed)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 15:47:56 +00:00
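
Note on aa4c125f9c: the SQLite backup flow named above (magic number check, ?mode=ro URI, fallback to raw copy) might look something like the sketch below. It is an illustration under assumptions, not the code in weekly-backup.sh: the function names and return values are hypothetical, and paths containing URI-special characters such as `?` or `#` are not handled.

```python
import shutil
import sqlite3

# Every SQLite 3 database file starts with this 16-byte header string.
SQLITE_MAGIC = b"SQLite format 3\x00"


def is_sqlite_file(path):
    """Check the first 16 bytes of the file against the SQLite magic header."""
    with open(path, "rb") as fh:
        return fh.read(16) == SQLITE_MAGIC


def backup_sqlite(src, dst):
    """Copy `src` to `dst`; return 'skipped', 'backup', or 'raw-copy'."""
    if not is_sqlite_file(src):
        return "skipped"
    try:
        # mode=ro opens the source read-only, so the backup cannot modify it.
        source = sqlite3.connect(f"file:{src}?mode=ro", uri=True)
        try:
            target = sqlite3.connect(dst)
            try:
                source.backup(target)  # SQLite online backup API
            finally:
                target.close()
        finally:
            source.close()
        return "backup"
    except sqlite3.Error:
        # Fall back to a plain file copy if SQLite cannot read the database.
        shutil.copy2(src, dst)
        return "raw-copy"
```

Going through the backup API rather than copying the file directly is what gives a consistent copy when the database is mid-transaction; the raw copy is only a last resort.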
Viktor Barzin
38d51ab0af deprecate TrueNAS: migrate Immich NFS to Proxmox, remove all 10.0.10.15 references [ci skip]
- Migrate Immich (8 NFS PVs, 1.1TB) from TrueNAS to Proxmox host NFS
- Update config.tfvars nfs_server to 192.168.1.127 (Proxmox)
- Update nfs-csi StorageClass share to /srv/nfs
- Update scripts (weekly-backup, cluster-healthcheck) to Proxmox IP
- Delete obsolete TrueNAS scripts (nfs_exports.sh, truenas-status.sh)
- Rewrite nfs-health.sh for Proxmox NFS monitoring
- Update Freedify nfs_music_server default to Proxmox
- Mark CloudSync monitor CronJob as deprecated
- Update Prometheus alert summaries
- Update all architecture docs, AGENTS.md, and reference docs
- Zero PVs remain on TrueNAS — VM ready for decommission

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 14:42:07 +00:00
Viktor Barzin
69248eaa7b state(nfs-csi): update encrypted state 2026-04-13 14:41:56 +00:00
Viktor Barzin
04eae139c6 state(immich): update encrypted state 2026-04-13 14:41:52 +00:00
Viktor Barzin
50ab67b5f7 state(immich): update encrypted state 2026-04-13 14:41:52 +00:00
root
b0303ab17d Woodpecker CI deploy commit [CI SKIP] 2026-04-12 21:39:26 +00:00
Viktor Barzin
a2fad3f20e docs(mailserver): remove HTML visual, fix probe frequency in diagram 2026-04-12 22:25:34 +01:00
Viktor Barzin
1c300a14cf mailserver: overhaul inbound delivery, monitoring, CrowdSec, and migrate to Brevo relay
Inbound:
- Direct MX to mail.viktorbarzin.me (ForwardEmail relay attempted and abandoned)
- Dedicated MetalLB IP 10.0.20.202 with ETP: Local for CrowdSec real-IP detection
- Removed Cloudflare Email Routing (can't store-and-forward)
- Fixed dual SPF violation, hardened to -all
- Added MTA-STS, TLSRPT, imported Rspamd DKIM into Terraform
- Removed dead BIND zones from config.tfvars (199 lines)

Outbound:
- Migrated from Mailgun (100/day) to Brevo (300/day free)
- Added Brevo DKIM CNAMEs and verification TXT

Monitoring:
- Probe frequency: 30m → 20m, alert thresholds adjusted to 60m
- Enabled Dovecot exporter scraping (port 9166)
- Added external SMTP monitor on public IP

Documentation:
- New docs/architecture/mailserver.md with full architecture
- New docs/architecture/mailserver-visual.html visualization
- Updated monitoring.md, CLAUDE.md, historical plan docs
2026-04-12 22:24:38 +01:00
Viktor Barzin
8bc02d1401 state(rybbit): update encrypted state 2026-04-12 22:17:01 +01:00
Viktor Barzin
4e80ac40c4 state(mailserver): update encrypted state 2026-04-12 22:16:25 +01:00
Viktor Barzin
e71a65acc4 state(mailserver): update encrypted state 2026-04-12 22:15:44 +01:00
Viktor Barzin
887152194c state(mailserver): update encrypted state 2026-04-12 22:12:43 +01:00
Viktor Barzin
333b289545 state(cloudflared): update encrypted state 2026-04-12 22:11:30 +01:00
Viktor Barzin
28934afb9a state(cloudflared): update encrypted state 2026-04-12 22:10:33 +01:00
Viktor Barzin
d227a5c896 state(cloudflared): update encrypted state 2026-04-12 22:10:30 +01:00
Viktor Barzin
2ba456e070 state(mailserver): update encrypted state 2026-04-12 21:46:34 +01:00
Viktor Barzin
92881ee6af state(mailserver): update encrypted state 2026-04-12 20:43:56 +01:00