Commit graph

10 commits

Author SHA1 Message Date
Viktor Barzin
1613003d00 upgrade: vaultwarden 1.35.4 -> 1.35.7
Security fixes (1.35.5): 3 CVEs — org vault purge by unconfirmed owner
(GHSA-937x-3j8m-7w7p), cross-org group binding unauthorized access
(GHSA-569v-845w-g82p), refresh tokens not invalidated on stamp rotation
(GHSA-6j4w-g4jh-xjfx). 2FA remember tokens now max 30 days.
1.35.6: Fix 2FA remember tokens broken in 1.35.5.
1.35.7: Fix 2FA for Android.

Risk: SAFE (patch bump, no breaking changes)
DB backup: yes (job: pre-upgrade-vaultwarden-1776280439, SQLite, 7 MiB)
Config changes applied: none
Flagged for manual review: none

Co-Authored-By: Service Upgrade Agent <noreply@viktorbarzin.me>
2026-04-15 19:14:21 +00:00
Viktor Barzin
82b0f6c4cb truenas deprecation: migrate all non-immich storage to proxmox NFS
- Migrate 7 backup CronJobs to Proxmox host NFS (192.168.1.127)
  (etcd, mysql, postgresql, nextcloud, redis, vaultwarden, plotting-book)
- Migrate headscale backup, ebook2audiobook, osm_routing to Proxmox NFS
- Migrate servarr (lidarr, readarr, soulseek) NFS refs to Proxmox
- Remove 79 orphaned TrueNAS NFS module declarations from 49 stacks
- Delete stacks/platform/modules/ (27 dead module copies, 65MB)
- Update nfs-truenas StorageClass to point to Proxmox (192.168.1.127)
- Remove iscsi DNS record from config.tfvars
- Fix woodpecker persistence config and alertmanager PV

Only Immich (8 PVCs, ~1.4TB) remains on TrueNAS.
2026-04-12 14:35:39 +01:00
Viktor Barzin
ce7b8c2b2e add pvc-autoresizer for automatic PVC expansion before volumes fill up [ci skip]
Deploy topolvm/pvc-autoresizer controller that monitors kubelet_volume_stats
via Prometheus and auto-expands annotated PVCs. Annotated all 9 block-storage
PVCs (proxmox-lvm) with per-PVC thresholds and max limits. Updated PVFillingUp
alert to critical/10m (means auto-expansion failed) and added PVAutoExpanding
info alert at 80%.
2026-04-03 23:30:00 +03:00
Viktor Barzin
dd59512153 migrate iSCSI block volumes from democratic-csi to Proxmox CSI [ci skip]
Replace TrueNAS iSCSI (democratic-csi) with Proxmox CSI plugin for all
block storage PVCs. Eliminates double-CoW (ZFS + LVM-thin) and removes
the iSCSI network hop for database I/O.

New stack: stacks/proxmox-csi/ — deploys proxmox-csi-plugin Helm chart
with StorageClass "proxmox-lvm" using existing local-lvm thin pool.

Migrated PVCs (12 total):
- Phase 1 standalone: plotting-book, novelapp, vaultwarden, nextcloud, prometheus
- Phase 2 StatefulSets: CNPG PostgreSQL (2), MySQL InnoDB (3), Redis (2)

All services verified healthy post-migration.
2026-04-02 22:13:04 +03:00
Viktor Barzin
d20c5e5535 add backup_output_bytes metric and cloudsync_transferred_bytes to backup dashboard
- All 7 backup CronJobs now push backup_output_bytes (file size after backup)
- Cloud Sync monitor parses rclone transfer stats into cloudsync_transferred_bytes
- Grafana dashboard: new Output (MiB) table column, Output Size Trend panel,
  Write Throughput panel, Cloud Sync Transfer Volume bargauge
- All timeseries panels use points-only draw style (discrete backup snapshots)
- etcd backup restructured: init_container for etcdctl (distroless image),
  busybox sidecar for metrics push + purge, ClusterFirstWithHostNet DNS
- Fixed pre-existing curl missing in postgres:16.4-bullseye (immich, dbaas PG)
- Fixed grep -oP not available in alpine/busybox (cloud sync monitor)
2026-03-25 10:44:53 +02:00
Viktor Barzin
a95d434ff1 fix backup IO stats: use /proc/$$/io instead of /proc/self/io
/proc/self/io inside $(awk ...) resolves to the awk subprocess PID,
not the parent bash shell. Use $$ (bash PID) to read the correct
process IO counters.
2026-03-23 12:33:52 +02:00
Viktor Barzin
0a294a30a6 add backup IO logging, Pushgateway metrics, and Grafana dashboard
- Add /proc/self/io read/write tracking to vault raft-backup and etcd backup
- Push backup_duration_seconds, backup_read_bytes, backup_written_bytes,
  backup_last_success_timestamp to Pushgateway from all 6 backup CronJobs
  (etcd skipped — distroless image has no wget/curl)
- Add cloudsync_duration_seconds metric to cloudsync-monitor
- New "Backup Health" Grafana dashboard with 8 panels: time since last backup,
  overview table, duration/IO trends, cloud sync status, alerts, CronJob schedule
2026-03-23 12:19:01 +02:00
Viktor Barzin
311ff5dd9e add hourly SQLite integrity check for vaultwarden with Prometheus alerting
- New CronJob runs PRAGMA integrity_check every hour
- Pushes vaultwarden_sqlite_integrity_ok metric to Prometheus pushgateway
- VaultwardenSQLiteCorrupt alert fires immediately on corruption (critical)
- VaultwardenIntegrityCheckStale alert if check hasn't run in 2h (warning)
- Prevents running for days on a corrupted DB unnoticed
2026-03-23 00:50:15 +02:00
Viktor Barzin
a44f35bcf8 harden vaultwarden iSCSI storage and increase backup frequency
- Increase backup from daily to every 6 hours (0 */6 * * *)
- Add pre/post-flight SQLite integrity checks to backup job
- Harden iSCSI on all nodes: increase recovery timeout (300s),
  enable CRC32C data/header digests for bit-flip detection
- Fix restore runbook PVC name (vaultwarden-data-iscsi)

Motivated by SQLite corruption from iSCSI I/O errors.
2026-03-23 00:36:11 +02:00
Viktor Barzin
73511b1230 extract remaining 19 modules from platform, complete stack split [ci skip]
Phase 3: all 27 platform modules now run as independent stacks.
Platform reduced to empty shell (outputs only) for backward compat
with 72 app stacks that declare dependency "platform".
Fixed technitium cross-module dashboard reference by copying file.
Woodpecker pipeline applies all 27+1 stacks in parallel via loop.
All applied with zero destroys.
2026-03-17 21:42:16 +00:00