Commit graph

2509 commits

Author SHA1 Message Date
Viktor Barzin
a39b90bbcc state(ollama): update encrypted state 2026-04-14 11:18:10 +00:00
Viktor Barzin
a25739a572 state(poison-fountain): update encrypted state 2026-04-14 11:13:32 +00:00
Viktor Barzin
d9ddf102ec state(plotting-book): update encrypted state 2026-04-14 11:13:02 +00:00
Viktor Barzin
6d209fffad state(meshcentral): update encrypted state 2026-04-14 11:11:59 +00:00
Viktor Barzin
d0805ed2a8 state(infra-maintenance): update encrypted state 2026-04-14 11:11:09 +00:00
Viktor Barzin
28264e69c6 state(headscale): update encrypted state 2026-04-14 11:11:05 +00:00
Viktor Barzin
1738c3437c state(frigate): update encrypted state 2026-04-14 11:09:30 +00:00
Viktor Barzin
fe42993446 state(ebook2audiobook): update encrypted state 2026-04-14 11:08:37 +00:00
Viktor Barzin
23140cf780 state(real-estate-crawler): update encrypted state 2026-04-14 11:08:24 +00:00
Viktor Barzin
d24e4aac0b state(osm_routing): update encrypted state 2026-04-14 11:08:09 +00:00
Viktor Barzin
94b7097789 state(openclaw): update encrypted state 2026-04-14 11:08:05 +00:00
Viktor Barzin
25f4682dc0 state(nextcloud): update encrypted state 2026-04-14 11:06:41 +00:00
Viktor Barzin
aac81e0a1f state(vault): update encrypted state 2026-04-14 11:06:27 +00:00
Viktor Barzin
047f695129 state(ytdlp): update encrypted state 2026-04-14 11:06:11 +00:00
Viktor Barzin
20e86e96a3 state(servarr): update encrypted state 2026-04-14 11:05:54 +00:00
Viktor Barzin
0d6b6cbd95 state(navidrome): update encrypted state 2026-04-14 11:05:10 +00:00
Viktor Barzin
9ea3b33a55 state(ebooks): update encrypted state 2026-04-14 10:54:47 +00:00
Viktor Barzin
ea18116da9 fix: NFS outage recovery — migrate to NFSv4, add alerting
NFS server restart broke NFSv3 (lockd kernel bug on PVE 6.14).
All 52 NFS PVs patched to nfsvers=4, NFSv3 disabled on PVE.

Changes:
- nfs_volume module: add nfsvers=4 mount option
- nfs-csi StorageClass: add nfsvers=4 mount option
- dbaas: MySQL serverInstances 3→1, mysql-native-password=ON
- monitoring: add NFSCSINodeDown and NFSMountFailures alerts

[ci skip]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 10:28:27 +00:00
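
Note on ea18116da9: the commit pins NFS mounts to version 4 by adding nfsvers=4 to the mount options in both the nfs_volume module and the nfs-csi StorageClass. The repository's own Terraform/Kubernetes definitions are not shown in this log, so the following is only a minimal Python sketch of that rule. The function name and the sample option lists are hypothetical.

```python
def ensure_nfsvers4(mount_options):
    """Return a copy of mount_options that pins the NFS version to 4.

    Any existing version option (nfsvers=... or vers=...) is dropped first,
    so a list that previously asked for NFSv3 ends up asking for NFSv4 only.
    """
    version_prefixes = ("nfsvers=", "vers=")
    kept = [opt for opt in mount_options if not opt.startswith(version_prefixes)]
    return kept + ["nfsvers=4"]


if __name__ == "__main__":
    # Hypothetical option list; the real PV mount options are not in this log.
    print(ensure_nfsvers4(["nfsvers=3", "soft", "timeo=30"]))
```

The function is idempotent, so applying it to a volume that is already patched leaves the options unchanged.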
Viktor Barzin
92900b5e08 state(dbaas): update encrypted state 2026-04-14 10:27:04 +00:00
Viktor Barzin
b4b6fd5946 state(nfs-csi): update encrypted state 2026-04-14 09:32:41 +00:00
Viktor Barzin
30e5150ecd state(status-page): update encrypted state 2026-04-14 09:31:50 +00:00
Viktor Barzin
ac3a6a96dd state(hermes-agent): update encrypted state 2026-04-14 09:04:35 +00:00
Viktor Barzin
37b3395017 state(hermes-agent): update encrypted state 2026-04-14 09:00:30 +00:00
Viktor Barzin
8b2d3b7e6c state(hermes-agent): update encrypted state 2026-04-14 08:58:36 +00:00
Viktor Barzin
8787ad9f1d state(hermes-agent): update encrypted state 2026-04-14 08:39:18 +00:00
Viktor Barzin
71182f2867 state(openclaw): update encrypted state 2026-04-14 08:39:06 +00:00
Viktor Barzin
aa3af753a6 state(openclaw): update encrypted state 2026-04-14 08:38:54 +00:00
Viktor Barzin
4e059b138c docs: consolidate all post-mortems under docs/post-mortems/
Move HTML post-mortems from repo root post-mortems/ to docs/post-mortems/.
Update index.html with all 3 incidents (newest first).

[ci skip]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 08:24:36 +00:00
Viktor Barzin
bdba15a387 docs: move post-mortems to docs/post-mortems/
Consolidate all outage reports under docs/ for better discoverability.
Moved from .claude/post-mortems/ (agent-internal) to docs/post-mortems/
(repo documentation).

[ci skip]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 08:20:09 +00:00
Viktor Barzin
68c8c5b4a0 fix(technitium): migrate primary to proxmox-lvm-encrypted + post-mortem
SEV1 outage: fsid=0 in PVE /etc/exports broke all NFS subdirectory
mounts from k8s (NFSv4 pseudo-root path resolution). Combined with
lockd failure, both NFSv4 and NFSv3 mount paths broken. Cascaded
into DNS primary, Vault (2/3 pods), Alertmanager, 20+ services.

Changes:
- Primary PVC: NFS (nfs-truenas) → proxmox-lvm-encrypted
- Secondary/tertiary PVCs: proxmox-lvm → proxmox-lvm-encrypted
- Removed NFS module dependency from technitium stack
- Added full post-mortem with prevention plan

[ci skip]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 08:18:59 +00:00
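
Note on 68c8c5b4a0: the commit attributes the outage to fsid=0 in /etc/exports, which makes that export the NFSv4 pseudo-root so that client paths are resolved relative to it. The sketch below is a deliberately simplified model of that resolution, not the kernel's actual logic. The export path and subdirectory names are hypothetical, since the real /etc/exports is not part of this log.

```python
import posixpath


def resolve_under_pseudo_root(pseudo_root, client_path):
    """Map the path an NFSv4 client asks for onto the server's filesystem.

    Simplified assumption: the fsid=0 export is the root of the client's
    view, so every client path is interpreted relative to pseudo_root.
    """
    relative = client_path.lstrip("/")
    return posixpath.normpath(posixpath.join(pseudo_root, relative))


if __name__ == "__main__":
    root = "/srv/nfs"  # hypothetical export carrying fsid=0
    # A client that still uses the full server-side path lands on a
    # directory that most likely does not exist:
    print(resolve_under_pseudo_root(root, "/srv/nfs/app"))
    # A client that uses a pseudo-root-relative path reaches the intended one:
    print(resolve_under_pseudo_root(root, "/app"))
```

Under this model, mounts that keep using full server-side paths resolve to a doubled path, which is consistent with the subdirectory mount failures the commit describes.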
Viktor Barzin
b239af9b6d state(technitium): update encrypted state 2026-04-14 08:04:07 +00:00
Viktor Barzin
6afba3b338 state(hermes-agent): update encrypted state 2026-04-13 22:33:35 +00:00
Viktor Barzin
e8ef81b276 state(hermes-agent): update encrypted state 2026-04-13 22:31:39 +00:00
Viktor Barzin
ab52b8eec2 state(hermes-agent): update encrypted state 2026-04-13 22:26:19 +00:00
Viktor Barzin
110d5d1f86 state(hermes-agent): update encrypted state 2026-04-13 22:20:23 +00:00
Viktor Barzin
d4c71be41c state(hermes-agent): update encrypted state 2026-04-13 22:09:19 +00:00
Viktor Barzin
6dcccbd8fc state(hermes-agent): update encrypted state 2026-04-13 22:07:09 +00:00
Viktor Barzin
04400fb7bd state(hermes-agent): update encrypted state 2026-04-13 22:07:09 +00:00
Viktor Barzin
e0518802f4 state(hermes-agent): update encrypted state 2026-04-13 22:07:08 +00:00
Viktor Barzin
1ef40daeec docs: update for MySQL 3→1, CrowdSec/Technitium PG migration, PG tuning, NFS async, node OS tuning [ci skip] 2026-04-13 23:05:46 +01:00
Viktor Barzin
f6d9959557 state(proxmox-csi): update encrypted state 2026-04-13 23:04:58 +01:00
Viktor Barzin
61988669e4 state(proxmox-csi): update encrypted state 2026-04-13 23:04:58 +01:00
Viktor Barzin
ad53f66bcd state(proxmox-csi): update encrypted state 2026-04-13 23:04:58 +01:00
Viktor Barzin
8b45944b27 state(proxmox-csi): update encrypted state 2026-04-13 23:04:58 +01:00
Viktor Barzin
0eb96e4e22 state(vault): update encrypted state 2026-04-13 23:04:57 +01:00
Viktor Barzin
c5393e6a72 state(wealthfolio): update encrypted state 2026-04-13 23:04:57 +01:00
Viktor Barzin
a5c92c4c78 state(health): update encrypted state 2026-04-13 23:04:57 +01:00
Viktor Barzin
d94b06627d state(affine): update encrypted state 2026-04-13 23:04:56 +01:00
Viktor Barzin
82f674a0b4 rename weekly-backup → daily-backup across scripts, timers, services, and docs [ci skip]
Reflects the schedule change from weekly to daily. All references updated:
- scripts/weekly-backup.{sh,timer,service} → daily-backup.*
- Pushgateway job name: weekly-backup → daily-backup
- Prometheus metric names: weekly_backup_* → daily_backup_*
- All docs, runbooks, AGENTS.md, CLAUDE.md, proxmox-inventory
- offsite-sync dependency: After=daily-backup.service

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 18:37:04 +00:00
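
Note on 82f674a0b4: the rename changes the Prometheus metric prefix from weekly_backup_ to daily_backup_, so any alert rule or dashboard query that names the old metrics has to follow. A minimal Python sketch of the prefix mapping is shown here. The metric suffixes in the example are hypothetical, because the log gives only the wildcard pattern.

```python
def rename_metric(name, old_prefix="weekly_backup_", new_prefix="daily_backup_"):
    """Swap the old backup metric prefix for the new one.

    Names that do not start with old_prefix are returned unchanged.
    """
    if name.startswith(old_prefix):
        return new_prefix + name[len(old_prefix):]
    return name


if __name__ == "__main__":
    # Hypothetical metric names used only for illustration.
    for metric in ("weekly_backup_last_success_timestamp", "node_load1"):
        print(metric, "->", rename_metric(metric))
```

Only a leading prefix is rewritten, so an unrelated metric that merely contains the string elsewhere in its name is left alone.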
Viktor Barzin
ca5039f8aa switch backup + offsite sync from weekly to daily — RPO 7d → 1d [ci skip]
- weekly-backup.timer: Sun 05:00 → daily 05:00
- offsite-sync-backup.timer: Sun 08:00 → daily 06:00
- Monthly full rsync --delete unchanged (1st-7th of month)
- Total daily I/O cost: ~20GB sdc reads, ~3.5GB sda writes, seconds of network
- Updated script headers and service descriptions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 18:24:38 +00:00
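
Note on ca5039f8aa: the figures in this commit can be checked with simple arithmetic. The sketch below assumes that worst-case RPO equals the interval between backup runs and that the daily I/O estimates hold on every day of the week. The helper names are hypothetical.

```python
from datetime import datetime, timedelta


def worst_case_rpo(interval_days):
    """Worst-case data loss window, assumed equal to the backup interval."""
    return timedelta(days=interval_days)


def window_before_sync(backup_start, sync_start):
    """Time between the backup timer and the offsite sync timer (same day)."""
    fmt = "%H:%M"
    return datetime.strptime(sync_start, fmt) - datetime.strptime(backup_start, fmt)


def weekly_total_gb(daily_gb):
    """Weekly I/O volume if the daily estimate applies to all seven days."""
    return daily_gb * 7


if __name__ == "__main__":
    print(worst_case_rpo(7), "->", worst_case_rpo(1))      # RPO 7d -> 1d
    print(window_before_sync("05:00", "08:00"))            # old Sunday schedule
    print(window_before_sync("05:00", "06:00"))            # new daily schedule
    print(weekly_total_gb(20), weekly_total_gb(3.5))       # reads, writes per week
```

The new schedule leaves the backup one hour instead of three before the offsite sync timer fires, so the daily run has a smaller margin to finish in.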