Compare commits

2 commits

Author SHA1 Message Date
Viktor Barzin
febf12bddd mail(tripit): send From: plans@viktorbarzin.me instead of spam@
Some checks failed
ci/woodpecker/push/default Pipeline failed
ci/woodpecker/push/build-cli Pipeline was successful
tripit outbound (linked-email verification + trip-share invites) was sent
From: spam@viktorbarzin.me. Switch the From to plans@viktorbarzin.me while
keeping SMTP auth as spam@ (its password, unchanged).

docker-mailserver SPOOF_PROTECTION (reject_sender_login_mismatch) requires
the authed login to "own" the From; the @viktorbarzin.me catch-all does NOT
grant that per-address, so add an explicit `plans@ -> spam@` virtual alias to
authorize it (also keeps inbound plans@ routing to spam@ for the mail-ingest
poller). tripit SMTP_FROM flips to plans@.

Verified: sender-login probe (auth spam@, MAIL FROM plans@) now 250 (was 553);
a real send from the tripit pod logs from=<plans@viktorbarzin.me> accepted.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 18:41:08 +00:00
Viktor Barzin
bc33cd5ac4 monitoring: NodeFilesystemFull 90%->95% + Synology storage runbook
The Synology offsite backup target (/mnt/synology-backup, surfaced via
the PVE host NFS mount) sits at ~94% by design and was firing
NodeFilesystemFull continuously. Per user request, raise the threshold
to 95% (<5% free). NOTE: NodeFilesystemFull is a global node-filesystem
rule, so this also loosens the warning on k8s node/system disks;
BackupDiskFull (sda /mnt/backup) stays at 85%.

Also adds docs/runbooks/synology-storage.md: how to assess Synology
usage WITHOUT du (Storage Analyzer weekly CSVs, df/btrfs/qgroup),
btrfs async/snapshot-pinned reclaim, the 2026-06-05 capacity assessment
(94% full; Backup share 4.42TiB), and ~500GiB of homelab cleanup
candidates (redundant gphotos Takeout, old laptop VM images, archives).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 18:18:31 +00:00
4 changed files with 145 additions and 7 deletions

@@ -0,0 +1,127 @@
# Runbook: Synology NAS storage — navigate, assess, clean
**Target:** Synology DS218 (`NAS_Barzini`), `192.168.1.13`, `/volume1`
(5.3 TiB btrfs). This is the **offsite backup target** (Copy 3 of the
3-2-1 strategy) **and a shared family volume** — homelab data is only
under `Backup/Viki/`; `Anca/`, `Emo/`, `Common/`, `music`, `video`,
`photo` etc. are family data.
Related: [storage architecture](../architecture/storage.md) ·
[backup & DR](../architecture/backup-dr.md)
## Access
- SSH: `ssh Administrator@192.168.1.13` (capital `A`; key-auth works
from devvm and the PVE host). `Administrator` can `sudo`.
- sudo password: Vault `secret/viktor` → `synology_admin_password`
(`VAULT_ADDR=https://vault.viktorbarzin.me`). DSM Web API has 2FA, so
**SSH+sudo is the only unattended path** (`read -r PW; printf '%s\n'
"$PW" | sudo -S -p '' <cmd>` to keep the secret out of `argv`).
## ⚠️ NEVER run `du` / `find` / `ncdu` on this NAS
Recursive walks over the multi-TB `Backup` share take 10+ min (often
never finish) and burn disk/IO on the NAS. Use Synology's own
pre-indexed data instead:
| Need | Instant, non-walking source |
|---|---|
| Volume fill | `df -h /volume1` |
| btrfs real usage | `btrfs filesystem df /volume1` |
| Per-subvolume | `sudo btrfs qgroup show -prce --raw /volume1` |
| **Per-share / per-owner / per-type / largest / oldest / dupes** | **Storage Analyzer weekly report** (below) |
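The volume-fill row can also be read from Python without walking any directories. This is a minimal sketch, assuming a Python 3 interpreter is available where it runs; `fill_percent` is a hypothetical helper name, and the figure may differ slightly from `df`'s Use% because of how reserved and btrfs-accounted space is reported.

```python
import shutil

def fill_percent(path):
    """Return the used share of the filesystem holding `path`, in percent.

    shutil.disk_usage makes a single statvfs-style call, so it returns
    immediately regardless of how many files the volume holds.
    """
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total

if __name__ == "__main__":
    # On the NAS this would be "/volume1"; "." keeps the sketch runnable anywhere.
    print("%.1f%% used" % fill_percent("."))
```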
### Storage Analyzer weekly report
Storage Analyzer is installed and writes a report every **Monday
~00:00** to:
```
/volume1/Backup/Viki/synoreport/weekly storage report/<YYYY-MM-DD_..>/
```
Data is up to ~7 days stale. The useful files are zipped CSVs in
`csv/`. **Content is UTF-16, and there is no `unzip` on the box**, so
read them with Python:
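```python
import os
import zipfile

R = ".../<date>/csv"  # csv/ directory of the dated report folder

def readcsv(n):
    with zipfile.ZipFile(os.path.join(R, n)) as z:
        raw = z.read(z.namelist()[0])  # each zip holds a single CSV
    # Normally UTF-16; fall back to the UTF-8 variants if that fails.
    for enc in ("utf-16", "utf-8-sig", "utf-8"):
        try:
            return raw.decode(enc)
        except UnicodeError:
            pass
    raise ValueError("cannot decode %s" % n)
```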
Key CSVs: `volume_usage`, `share_list` (per-share, incl/excl recycle),
`quota_usage.share` (**per-owner within a share**), `file_group`
(per-file-type), `large_file`, `least_modify` (oldest), `duplicate_file`.
The `*.db` files (`folder.db` etc.) are a **custom Synology format —
NOT sqlite**; `report.html` does not embed clean folder totals.
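Once decoded, the report text can be handed to the standard `csv` module. The sketch below is illustrative only: the comma delimiter and the `Share`/`Used` column names are assumptions (the real column layout is not documented here), and `make_sample_zip` builds an in-memory stand-in so the example runs without a real report.

```python
import csv
import io
import zipfile

def rows_from_zipped_csv(zip_bytes):
    """Decode the first member of a zipped report CSV and return its rows."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as z:
        raw = z.read(z.namelist()[0])
    text = raw.decode("utf-16")  # report CSVs are UTF-16 per the runbook
    # Assumption: comma-delimited; adjust `delimiter` if the real files differ.
    return list(csv.reader(io.StringIO(text), delimiter=","))

def make_sample_zip():
    """Build a stand-in for one report file (made-up columns and layout)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        payload = "Share,Used\r\nBackup,4.42 TiB\r\n".encode("utf-16")
        z.writestr("share_list.csv", payload)
    return buf.getvalue()

rows = rows_from_zipped_csv(make_sample_zip())
header, first = rows[0], rows[1]
```

Keeping the decode step separate from the parse step makes it easy to swap the delimiter or encoding once the real files have been inspected.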
## btrfs space-reclaim is ASYNCHRONOUS — and snapshot-pinned
- Deleting files/snapshots returns instantly but `df` lags minutes
while the btrfs cleaner reclaims extents (~30 GB/min on the DS218).
- Data deleted from the live share **stays on disk until the share
snapshots that still reference it also rotate out.** There are 4
daily `Backup` share snapshots (`GMT-*-21.00.02`), so **expect up to
~4 days of lag** before a delete fully frees space.
- Snapshot CLI (sudo, full path): `/usr/syno/sbin/synosharesnapshot
{list|delete} Backup <snap>...`. Retention:
`/usr/syno/etc/sharesnap/sharesnap.conf`.
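The two delays above can be turned into rough planning numbers. This is a back-of-the-envelope sketch that takes the ~30 GB/min cleaner rate and the 4 retained daily snapshots at face value; the function names are hypothetical, and the snapshot figure is an upper bound that assumes no snapshots are deleted by hand.

```python
from datetime import date, timedelta

CLEANER_GB_PER_MIN = 30.0   # approximate rate observed on the DS218
DAILY_SNAPSHOTS_KEPT = 4    # daily `Backup` share snapshots retained

def cleaner_minutes(deleted_gb):
    """Minutes the btrfs cleaner needs before `df` reflects a delete."""
    return deleted_gb / CLEANER_GB_PER_MIN

def fully_freed_by(deleted_on):
    """Latest date by which every snapshot pinning the data has rotated out."""
    return deleted_on + timedelta(days=DAILY_SNAPSHOTS_KEPT)

if __name__ == "__main__":
    print("300 GB: ~%.0f min of cleaner lag" % cleaner_minutes(300))
    print("space fully freed by", fully_freed_by(date(2026, 6, 5)))
```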
## Capacity alert
The Synology mount surfaces to Prometheus as the PVE host NFS mount
`/mnt/synology-backup` (`job="proxmox-host"`, `fstype=nfs4`), caught by
the **global `NodeFilesystemFull`** rule in
`stacks/monitoring/modules/monitoring/prometheus_chart_values.tpl`.
- **2026-06-05:** threshold changed **90% → 95%** (`* 100 < 5`) at
user request — a backup target legitimately runs hot, so 90% was
noisy. NOTE: this rule is **global**, so the looser 95% now applies to
all node/system disks too. `BackupDiskFull` (the sda `/mnt/backup`
disk, separate alert) stays at 85%.
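To see why the threshold change silences the alert, the rule's arithmetic can be replayed with the 2026-06-05 figures recorded in this runbook (324 GiB free of 5.3 TiB). This is a sketch of the expression only: it ignores the `for: 15m` clause, uses rounded figures, and the values Prometheus sees through the NFS mount may differ slightly.

```python
GIB_PER_TIB = 1024.0

def free_percent(avail_gib, size_gib):
    """Mirror of (node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100."""
    return avail_gib / size_gib * 100.0

def fires(avail_gib, size_gib, threshold_percent):
    """True when the free percentage is below the alert threshold."""
    return free_percent(avail_gib, size_gib) < threshold_percent

size_gib = 5.3 * GIB_PER_TIB
pct = free_percent(324.0, size_gib)        # about 5.97 % free

old_rule = fires(324.0, size_gib, 10.0)    # `* 100 < 10`: fires
new_rule = fires(324.0, size_gib, 5.0)     # `* 100 < 5`: quiet
```

With roughly 6% free, the volume sits between the two thresholds, so the new rule only fires after about another 50 GiB is consumed.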
## Current assessment — 2026-06-05
`/volume1` at **94% (5.0 TiB used / 5.3 TiB, 324 GiB free)**, down from
98% on 2026-05-24. The **`Backup` share is 4.42 TiB (86%)**:
Administrator/homelab **3.92 TiB**, Emo/family **504 GiB**. By type:
Other 1.76 TiB, Videos 1.33 TiB, Pictures 631 GiB, Zipped 495 GiB,
DiskImage 77 GiB. The ~1.9 TiB of media is mostly the **Immich offsite
backup** (`Viki/nfs/immich` + `nfs-ssd/immich`), which **grows daily —
the structural capacity driver now that one-off cleanups are spent.**
### Already reclaimed (verified gone)
`Anca/Elements` (770 GiB — dir now empty), `prometheus-backup` (63 GiB),
`ollama`/`llamacpp`/`audiblez`/`ebook2audiobook` — removed in the
2026-06-01 cleanup; nfs-mirror now excludes the regenerable services.
### Cleanup candidates — homelab (`Backup/Viki/`, Administrator-owned)
| Target | Size | Notes |
|---|---|---|
| `Photos/gphotos-1/` | **208 GiB** zips (+ extracted) | 2023 Google Takeout, **already imported to Immich** (`immich-go.exe` beside them; dupes confirmed). Redundant. |
| `laptop/` | ~167 GiB | old VM images (Kali/windows vdis, metasploitable, soton-rpi.img) |
| `All-in-one/` | ~95 GiB | 2015–2018 archives |
| `#recycle/` (Backup) | ~16 GiB | recycle bin (HA backup rotation) |
| loose `*.asc`/`*.mov` in `Viki/` root | ~8 GiB | old encrypted archives, phone videos |
| `sgs7/` | ~3.5 GiB | 2021 Galaxy S7 backup |
**~500 GiB** reclaimable without touching live backups or family data.
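The ~500 GiB figure is simply the sum of the table rows. The sketch below redoes that arithmetic and projects the free percentage after a full cleanup, using the rounded 324 GiB free / 5.3 TiB total from this assessment. It counts only the sizes listed (the extracted gphotos copies would add more) and ignores the daily Immich growth.

```python
GIB_PER_TIB = 1024.0

# Sizes in GiB, copied from the table above.
candidates_gib = {
    "Photos/gphotos-1/": 208.0,
    "laptop/": 167.0,
    "All-in-one/": 95.0,
    "#recycle/ (Backup)": 16.0,
    "loose *.asc/*.mov": 8.0,
    "sgs7/": 3.5,
}

total_gib = sum(candidates_gib.values())   # 497.5

free_now_gib = 324.0
size_gib = 5.3 * GIB_PER_TIB
# Roughly 15 % free once everything has been reclaimed.
free_after_percent = (free_now_gib + total_gib) / size_gib * 100.0
```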
### Cleanup candidates — family (flag to Emo, do not delete)
- `Emo/D/` Windows 7 vmdks — **3 identical 39.5 GiB copies** (one live +
two under `_SYNCAPP/Versioning/`) → 79 GiB dedup.
- Emo-shared recycle bin: 12.6 GiB.
### Do NOT touch
`Viki/pve-backup/` (live structured backup), `Viki/nfs/immich` +
`nfs-ssd/immich` (irreplaceable), `HomeAssistant/` + `ha_backup_vermont/`
(~7 GiB, healthy 3-copy retention).

@@ -3,3 +3,10 @@ closely-keith-generated@viktorbarzin.me vbarzin@gmail.com
literally-paolo-generated@viktorbarzin.me viktorbarzin@fb.com
hastily-stefanie-generated@viktorbarzin.me elliestamenova@gmail.com
vaultwarden@viktorbarzin.me me@viktorbarzin.me
# plans@ -> spam@: authorizes tripit (SMTP-authed as spam@) to send mail
# From: plans@viktorbarzin.me under docker-mailserver SPOOF_PROTECTION (the
# smtpd_sender_login_maps union exact-matches this alias to spam@; the @domain
# catch-all does NOT, so an explicit entry is required). Also keeps inbound
# plans@ routing to spam@ for the tripit mail-ingest poller.
plans@viktorbarzin.me spam@viktorbarzin.me
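The commit message mentions verifying the alias with a sender-login probe (auth as spam@, MAIL FROM plans@). One way such a probe could look with Python's `smtplib` is sketched below; it is not the probe that was actually run. `probe_sender` and `probe_cluster` are hypothetical names, the recipient is a caller-supplied placeholder, and the sketch also issues RCPT TO on the assumption that Postfix may defer the sender check until then. No DATA is sent, so no mail leaves the server.

```python
import smtplib

def probe_sender(smtp, user, password, mail_from, rcpt_to):
    """Log in as `user`, try MAIL FROM / RCPT TO, return both reply codes.

    `smtp` is an smtplib.SMTP-like object that is already connected.
    """
    smtp.ehlo()
    smtp.starttls()
    smtp.ehlo()
    smtp.login(user, password)            # raises SMTPAuthenticationError on failure
    mail_code, _ = smtp.mail(mail_from)   # 250 if the sender is accepted here
    rcpt_code, _ = smtp.rcpt(rcpt_to)     # a deferred rejection would surface here
    smtp.rset()                           # abort the transaction; nothing is sent
    return mail_code, rcpt_code

def probe_cluster(password, rcpt_to):
    """Hypothetical wrapper using the host and port from the tripit config."""
    with smtplib.SMTP("mailserver.mailserver.svc", 587, timeout=10) as smtp:
        return probe_sender(
            smtp,
            "spam@viktorbarzin.me",
            password,
            "plans@viktorbarzin.me",
            rcpt_to,
        )
```

Passing the connection in as an argument keeps the probe logic testable without a live mail server.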

@@ -1257,12 +1257,12 @@ serverFiles:
- name: Storage
rules:
- alert: NodeFilesystemFull
- expr: (node_filesystem_avail_bytes{fstype!~"tmpfs|fuse.*"} / node_filesystem_size_bytes) * 100 < 10
+ expr: (node_filesystem_avail_bytes{fstype!~"tmpfs|fuse.*"} / node_filesystem_size_bytes) * 100 < 5
for: 15m
labels:
severity: warning
annotations:
- summary: "Disk {{ $labels.mountpoint }} on {{ $labels.instance }}: {{ $value | printf \"%.1f\" }}% free (threshold: 10%)"
+ summary: "Disk {{ $labels.mountpoint }} on {{ $labels.instance }}: {{ $value | printf \"%.1f\" }}% free (threshold: 5%)"
# PVAutoExpanding removed — was info-only at >80% used, but
# pvc-autoresizer's threshold is 10% free (= 90% used), so the
# alert always fired ~10 percentage points before any action

@@ -38,15 +38,19 @@ locals {
PUSH_PROVIDER = "webpush"
LLM_MODE = "fake"
MAIL_INGEST_ENABLED = "false"
- # Outbound mail for linked-email verification submitted via the cluster
- # mailserver as spam@ (which relays out via Brevo). SMTP_PASSWORD comes from
- # tripit-secrets (mapped to the existing PLANS_IMAP_PASSWORD). PUBLIC_BASE_URL
- # builds the confirmation link mailed to the address.
+ # Outbound mail (linked-email verification + trip-share invites) submitted
+ # via the cluster mailserver authenticated as spam@ (SMTP_USER), but sent
+ # From: plans@viktorbarzin.me (SMTP_FROM). docker-mailserver SPOOF_PROTECTION
+ # requires the login to "own" the From; an explicit plans@ -> spam@ virtual
+ # alias grants that (see mailserver extra/aliases.txt) and keeps inbound
+ # plans@ routing to spam@. Relays out via Brevo. SMTP_PASSWORD comes from
+ # tripit-secrets (the existing PLANS_IMAP_PASSWORD = spam@'s password).
+ # PUBLIC_BASE_URL builds the links mailed to recipients.
EMAIL_PROVIDER = "smtp"
SMTP_HOST = "mailserver.mailserver.svc"
SMTP_PORT = "587"
SMTP_USER = "spam@viktorbarzin.me"
- SMTP_FROM = "spam@viktorbarzin.me"
+ SMTP_FROM = "plans@viktorbarzin.me"
PUBLIC_BASE_URL = "https://tripit.viktorbarzin.me"
}
}
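The split between the login identity (`SMTP_USER`) and the visible sender (`SMTP_FROM`) can be illustrated with Python's standard library. This is a sketch only: tripit's actual mail code is not shown in this diff and may look quite different. The `SETTINGS` dict mirrors the values above, and `build_message` and `send` are hypothetical helpers.

```python
import smtplib
from email.message import EmailMessage

# Values copied from the locals block above.
SETTINGS = {
    "SMTP_HOST": "mailserver.mailserver.svc",
    "SMTP_PORT": "587",
    "SMTP_USER": "spam@viktorbarzin.me",
    "SMTP_FROM": "plans@viktorbarzin.me",
}

def build_message(settings, to_addr, subject, body):
    """Build a message whose From: header is SMTP_FROM, not the login."""
    msg = EmailMessage()
    msg["From"] = settings["SMTP_FROM"]
    msg["To"] = to_addr
    msg["Subject"] = subject
    msg.set_content(body)
    return msg

def send(settings, password, msg):
    """Authenticate as SMTP_USER; send_message takes the envelope sender
    from the From: header when no explicit from_addr is given."""
    with smtplib.SMTP(settings["SMTP_HOST"], int(settings["SMTP_PORT"])) as smtp:
        smtp.starttls()
        smtp.login(settings["SMTP_USER"], password)
        smtp.send_message(msg)

msg = build_message(SETTINGS, "user@example.com", "Confirm your email", "Hello")
```

Because the envelope sender follows the From: header here, the mailserver sees a login of spam@ with a sender of plans@, which is exactly the combination the alias has to authorize.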