Commit graph

2209 commits

Author SHA1 Message Date
Viktor Barzin
db3bcdb6c1 state(n8n): update encrypted state 2026-04-04 17:24:06 +03:00
Viktor Barzin
d57c6fc5a5 state(isponsorblocktv): update encrypted state 2026-04-04 17:13:03 +03:00
Viktor Barzin
80bd7292e0 state(hackmd): update encrypted state 2026-04-04 17:12:44 +03:00
Viktor Barzin
a1c43936ea state(f1-stream): update encrypted state 2026-04-04 17:12:24 +03:00
Viktor Barzin
1fcef591bb state(excalidraw): update encrypted state 2026-04-04 17:12:02 +03:00
Viktor Barzin
c131a50a32 state(diun): update encrypted state 2026-04-04 17:11:41 +03:00
Viktor Barzin
7bab4ead12 state(changedetection): update encrypted state 2026-04-04 17:11:22 +03:00
Viktor Barzin
c2584f4cdc state(affine): update encrypted state 2026-04-04 17:11:17 +03:00
Viktor Barzin
b98dcaef36 state(nextcloud): update encrypted state 2026-04-04 17:06:04 +03:00
Viktor Barzin
2667a19999 state(nextcloud): update encrypted state 2026-04-04 17:02:26 +03:00
Viktor Barzin
bc4fb5da8f state(nextcloud): update encrypted state 2026-04-04 16:38:09 +03:00
Viktor Barzin
2d5c55f7b1 docs: add storage class decision rule to CLAUDE.md
Default to proxmox-lvm for all new services. NFS only for RWX,
backup destinations, or shared media libraries. Updated iSCSI
backup section to reflect proxmox-lvm migration.
2026-04-04 16:35:12 +03:00
Viktor Barzin
ee39dd2fc9 feat(storage): migrate 12 SQLite NFS PVCs to proxmox-lvm (Wave 1)
Add proxmox-lvm PVCs with pvc-autoresizer annotations for all
SQLite-backed services. Deployments updated to use new block storage
PVCs. Old NFS modules retained for 1-week rollback.

Services: ntfy, freshrss, insta2spotify, actualbudget (x3),
wealthfolio, navidrome (DB only), audiobookshelf config,
headscale, forgejo, uptime-kuma.

Also: set Recreate strategy on ntfy, forgejo, insta2spotify,
wealthfolio (required for RWO volumes).
2026-04-04 16:26:59 +03:00
Viktor Barzin
792da5c066 state(platform): update encrypted state 2026-04-04 16:17:16 +03:00
Viktor Barzin
9ea29160d7 state(uptime-kuma): update encrypted state 2026-04-04 16:17:03 +03:00
Viktor Barzin
e0ca0edd51 state(forgejo): update encrypted state 2026-04-04 16:15:43 +03:00
Viktor Barzin
1f2ab8b547 state(headscale): update encrypted state 2026-04-04 16:15:25 +03:00
Viktor Barzin
afd22150f6 state(proxmox-csi): update encrypted state 2026-04-04 16:13:31 +03:00
Viktor Barzin
f48e400087 state(vault): update encrypted state 2026-04-04 16:10:25 +03:00
Viktor Barzin
d07314c0df state(navidrome): update encrypted state 2026-04-04 16:09:35 +03:00
Viktor Barzin
8b9ae390eb state(wealthfolio): update encrypted state 2026-04-04 16:08:57 +03:00
Viktor Barzin
afd4c78cd7 state(actualbudget): update encrypted state 2026-04-04 16:08:18 +03:00
Viktor Barzin
6ec3774aa9 state(freshrss): update encrypted state 2026-04-04 16:02:50 +03:00
Viktor Barzin
56342aedcc state(ntfy): update encrypted state 2026-04-04 15:59:53 +03:00
Viktor Barzin
c422aa2ef6 state(nextcloud): update encrypted state 2026-04-04 15:30:29 +03:00
Viktor Barzin
3d3759ea2f fix: disable cert-manager webhook for pvc-autoresizer, use self-signed cert [ci skip]
Cluster doesn't have cert-manager installed. Use self-signed certificate
for the controller and disable the PVC mutating webhook (annotations are
set directly on PVCs via Terraform).
2026-04-03 23:44:49 +03:00
Viktor Barzin
ce7b8c2b2e add pvc-autoresizer for automatic PVC expansion before volumes fill up [ci skip]
Deploy topolvm/pvc-autoresizer controller that monitors kubelet_volume_stats
via Prometheus and auto-expands annotated PVCs. Annotated all 9 block-storage
PVCs (proxmox-lvm) with per-PVC thresholds and max limits. Updated PVFillingUp
alert to critical/10m (means auto-expansion failed) and added PVAutoExpanding
info alert at 80%.
2026-04-03 23:30:00 +03:00
Viktor Barzin
b2cac8cc97 add proxmox-csi cleanup TODO for post-migration tasks [ci skip] 2026-04-03 20:02:14 +03:00
Viktor Barzin
d49acebd8e migrate ebooks-calibre to proxmox-lvm, update storage docs [ci skip]
- Migrate ebooks-calibre-config-iscsi (2Gi, 2380 files) to proxmox-lvm
- Update docs/architecture/storage.md: document Proxmox CSI as primary
  block storage, mark democratic-csi iSCSI as deprecated
- Add full migration plan to docs/plans/
2026-04-03 19:45:34 +03:00
Viktor Barzin
ca57c8c15c state(uptime-kuma): update encrypted state 2026-04-03 15:11:53 +03:00
Viktor Barzin
8d8534df8a state(crowdsec): update encrypted state 2026-04-03 15:01:08 +03:00
Viktor Barzin
d86c62b6ef state(health): update encrypted state 2026-04-03 15:00:30 +03:00
Viktor Barzin
b49e9d7e69 state(woodpecker): update encrypted state 2026-04-03 14:59:49 +03:00
Viktor Barzin
dd59512153 migrate iSCSI block volumes from democratic-csi to Proxmox CSI [ci skip]
Replace TrueNAS iSCSI (democratic-csi) with Proxmox CSI plugin for all
block storage PVCs. Eliminates double-CoW (ZFS + LVM-thin) and removes
the iSCSI network hop for database I/O.

New stack: stacks/proxmox-csi/ — deploys proxmox-csi-plugin Helm chart
with StorageClass "proxmox-lvm" using existing local-lvm thin pool.

Migrated PVCs (12 total):
- Phase 1 standalone: plotting-book, novelapp, vaultwarden, nextcloud, prometheus
- Phase 2 StatefulSets: CNPG PostgreSQL (2), MySQL InnoDB (3), Redis (2)

All services verified healthy post-migration.
2026-04-02 22:13:04 +03:00
Viktor Barzin
337da2184d add upstream fallback to containerd registry mirrors
When the pull-through proxy (10.0.20.10) is down, containerd now falls
back to the official upstream registries (registry-1.docker.io, ghcr.io)
instead of failing. Also cleans up stale disabled registry mirror dirs
and removes unnecessary containerd restart from the rollout script.
2026-04-02 11:05:30 +03:00
Viktor Barzin
2d8aa5ed89 docs: update hardware inventory for R730 RAM upgrade to 272GB
Upgraded from 144GB (4x32G + 2x8G) to 272GB (8x32G + 2x8G) DDR4-2400.
Added physical DIMM slot diagram, channel layout, and BIOS speed override
notes. Updated compute architecture with correct CPU (single socket),
VM memory values, and capacity figures.
2026-04-02 00:48:13 +03:00
Viktor Barzin
87c858f026 state(platform): update encrypted state 2026-04-01 20:08:32 +03:00
Viktor Barzin
5af6558935 state(platform): update encrypted state 2026-04-01 20:08:29 +03:00
Viktor Barzin
c7369d8a2b state(platform): update encrypted state 2026-04-01 20:07:42 +03:00
Viktor Barzin
d1059d6017 registry: set proxy TTL to 0 to prevent stale :latest images
Blob caching (content-addressed by SHA256) is unaffected — only manifest
re-validation changes. Every pull now checks upstream for the current
manifest digest, eliminating stale :latest tag issues.
2026-03-30 00:02:48 +03:00
Viktor Barzin
28587c674d fix-broken-blobs: use argparse for proper flag handling
--dry-run as first arg was being parsed as the BASE directory path.
2026-03-29 22:33:33 +03:00
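Illustration for commit 28587c674d: a minimal Python sketch of the flag-handling fix, not the repository's actual script. The positional name `base`, the default path, and the function name `parse_args` are assumptions made for this example. With argparse, `--dry-run` is recognized as a flag wherever it appears on the command line, so it can no longer be mistaken for the base directory.

```python
import argparse

# Hypothetical default; the real script's base directory is not shown in the log.
DEFAULT_BASE = "/var/lib/registry"


def parse_args(argv=None):
    """Parse an optional BASE directory and a --dry-run flag."""
    parser = argparse.ArgumentParser(
        description="Remove layer links that point at missing blob data."
    )
    # Optional positional: falls back to the default when omitted.
    parser.add_argument("base", nargs="?", default=DEFAULT_BASE,
                        help="registry storage root to scan")
    # Boolean flag: consumes no value, so it never swallows the path.
    parser.add_argument("--dry-run", action="store_true",
                        help="report orphaned links without deleting them")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(f"base={args.base} dry_run={args.dry_run}")
```

Under this sketch, `--dry-run /data` and `/data --dry-run` parse to the same result, and `--dry-run` on its own leaves `base` at its default.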
Viktor Barzin
dd461beb33 add registry blob integrity checker to self-heal corrupted cache
The cleanup-tags.sh + garbage-collect cycle can delete blob data while
leaving _layers/ link files intact. The registry then returns HTTP 200
with 0 bytes for those layers, causing "unexpected EOF" on image pulls.

fix-broken-blobs.sh walks all repositories, checks each layer link
against actual blob data, and removes orphaned links so the registry
re-fetches from upstream on next pull.

Schedule: daily at 2:30am (after tag cleanup) and Sunday 3:30am
(after garbage collection). First run found 2335/2556 (91%) of
layer links were orphaned.
2026-03-29 22:31:39 +03:00
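Illustration for commit dd461beb33: the link-versus-blob check can be sketched in Python as follows. This is a hedged example, not the contents of fix-broken-blobs.sh. The function names are hypothetical, and the directory layout (`repositories/<name>/_layers/sha256/<digest>/link` and `blobs/sha256/<first two hex chars>/<digest>/data` under the registry's v2 storage root) is assumed to match the usual filesystem layout of the Docker distribution registry. Treating a zero-byte data file as orphaned is an extra assumption added for the example.

```python
from pathlib import Path


def find_orphaned_links(root):
    """Return layer link files whose blob data is missing or empty.

    `root` is assumed to be the directory containing `repositories/`
    and `blobs/` (the registry's v2 storage root).
    """
    root = Path(root)
    orphaned = []
    pattern = "**/_layers/sha256/*/link"
    for link in sorted((root / "repositories").glob(pattern)):
        # A link file holds a digest such as "sha256:<hex>".
        algo, _, digest = link.read_text().strip().partition(":")
        data = root / "blobs" / algo / digest[:2] / digest / "data"
        if not data.is_file() or data.stat().st_size == 0:
            orphaned.append(link)
    return orphaned


def remove_orphaned_links(root, dry_run=True):
    """Delete orphaned link files unless dry_run is set; return them."""
    orphaned = find_orphaned_links(root)
    if not dry_run:
        for link in orphaned:
            link.unlink()
    return orphaned
```

Defaulting `dry_run` to True keeps the sketch non-destructive: calling `remove_orphaned_links(root)` only reports what would be removed, and deletion needs an explicit `dry_run=False`.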
Viktor Barzin
facf959ecf fix registry healthchecks: use 127.0.0.1 instead of localhost
localhost resolves to IPv6 ::1 but containers bind to 0.0.0.0 (IPv4
only), causing wget to fail with "Connection refused". The nginx
proxy had 18,462 consecutive health check failures because of this.

Also cleared corrupted pull-through cache for mghee/novelapp — the
registry had layer link files pointing to non-existent blob data,
causing containerd to get 200 responses with 0 bytes (unexpected EOF).
2026-03-29 22:29:27 +03:00
Viktor Barzin
a2b1b0e817 remove caretta network mapper to free 3Gi cluster memory
Caretta eBPF DaemonSet was using 600Mi x 5 nodes = 3Gi total for
non-critical network topology visualization. Removing it to free
memory for novelapp and aiostreams which were stuck in Pending.
2026-03-29 22:17:35 +03:00
Viktor Barzin
b27b508f10 state(terminal): update encrypted state 2026-03-29 21:45:49 +03:00
Viktor Barzin
7ad01661f0 novelapp: migrate NEXTAUTH env vars to Auth.js v5 (AUTH_*)
Replace NEXTAUTH_URL/NEXTAUTH_SECRET with AUTH_URL/AUTH_SECRET and add
AUTH_TRUST_HOST=true for Auth.js v5 compatibility.
2026-03-29 20:37:26 +03:00
Viktor Barzin
c71a784e1c state(novelapp): update encrypted state 2026-03-29 20:37:16 +03:00
Viktor Barzin
8bf83147db add SLACK_WEBHOOK_URL env var to book-search deployment 2026-03-29 13:53:24 +03:00
Viktor Barzin
10f22350c5 exclude frigate, audiblez, ollama, real-estate-crawler from Synology backup [ci skip]
Expanded cloud sync excludes to reduce sync time and Synology disk usage.
All excluded data is either regenerable or low-value.
TrueNAS Task 1 and incremental script already updated live.
2026-03-29 13:44:32 +03:00
Viktor Barzin
78eff9ab11 fix: bump book-search memory to 512Mi for file upload/email [ci skip]
Downloads and sends ebook files via HTTP — needs more than 128Mi
for large PDFs. Applied live via kubectl, persisting in Terraform.
2026-03-29 13:24:19 +03:00