Compare commits

5 commits

Author SHA1 Message Date
Viktor Barzin
5cdac421c2 forgejo: pin to v11.0.14 + disable Keel (image-rewrite incident 2026-05-24)
Some checks failed
ci/woodpecker/push/build-cli Pipeline failed
ci/woodpecker/push/default Pipeline was successful
On 2026-05-24T15:35:37Z Keel's force-policy rewrote the image tag from
`11.0.14 → 1.18` (codeberg.org/forgejo/forgejo). v1.18 is a Gitea-era
Forgejo (Forgejo forked from Gitea at 1.18 and used pre-Forgejo
versioning early on); the DB had already been migrated to schema 305
by 11.0.14, and 1.18 only knows up to migration 231 → pod refused to
start ("Your database (migration version: 305) is for a newer Gitea,
you can not use the newer database for this old Gitea release (231)").
Exact replay of the 2026-05-16 force-policy tag-rewriting bug
(memory id=1933).
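
A minimal sketch of why the pod refused to start, assuming the startup guard simply compares the schema version stored in the database with the newest migration the binary knows. `can_start` is a hypothetical helper for illustration, not Forgejo's own code; the two numbers are the ones quoted in the error message above.

```python
def can_start(db_migration: int, binary_max_migration: int) -> bool:
    """Illustrative guard: a binary can only open a database whose schema
    version it knows about (hypothetical helper, not Forgejo's own code)."""
    return db_migration <= binary_max_migration


# Numbers quoted in the incident above.
DB_AFTER_11_0_14 = 305   # schema version written by 11.0.14
MAX_KNOWN_BY_1_18 = 231  # newest migration 1.18 understands

print(can_start(DB_AFTER_11_0_14, MAX_KNOWN_BY_1_18))  # False
print(can_start(DB_AFTER_11_0_14, DB_AFTER_11_0_14))   # True
```

Under this model a downgrade is never safe once a newer release has migrated the schema, which is why pinning the tag is the fix rather than retrying the rollout.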

Changes:
- Pin image to explicit `:11.0.14` (latest 11.x, published 2026-05-12)
- Add `keel.sh/policy: "never"` deploy annotation — overrides the
  Kyverno-stamped `force` policy via the chart's `+()` anchor semantics
  (memory id=1972). Keel will no longer touch this workload.
- Drop KEEL_IGNORE_IMAGE from `lifecycle.ignore_changes` (TF owns the
  image now). Restore it if you flip Keel back to `force`.
- Add the KEEL_LIFECYCLE_V1 trio (`kubernetes.io/change-cause`,
  `deployment.kubernetes.io/revision`, `keel.sh/update-time` on the
  pod template) so future TF applies don't fight K8s rollout metadata.

Verified: new pod on v11.0.14 came up Running 1/1.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-24 22:06:59 +00:00
Viktor Barzin
5a0e4b3dac f1-stream: revive aceztrims + pitsport, more ppv variants
- aceztrims: scrape /f11/ (the actual stream page), not /f1/ (the
  cross-sport schedule). Drop the dead /iframe1?s= + onclick m3u8
  regexes (site moved to `getElementById('iframe').src = '...'` ~20
  channels ago). Strip HTML comments first so the ~20 legacy buttons
  kept inside <!-- ... --> stop showing up as false positives.
  Also pick up the default inline <iframe id='iframe' src='...'>.
  Local run: 11 channels (was 0).

- pitsport: decode the RSC payload before regex-matching in
  _parse_live_events (raw HTML had it escape-encoded, so the homepage
  card path was silently 0). Add the new /live-now route (canonical
  what's-live-right-now list). Add "f1" to MOTORSPORT_CATEGORIES — the
  site labels Formula 1 events as just "F1". Refresh the stale
  serveplay.site docstring (host rotates; pushembdz's api/stream link
  is authoritative).
  Local run: 7 m3u8 streams covering Canadian GP (EN1/EN2/MULTI/ITA/ESP)
  + NASCAR Coke 600 (was 0).

- ppv: always emit the parent embed alongside substreams (was dropping
  it whenever substreams existed). Prefer source_tag in substream titles
  so users see "Sky Sport 1 NZ" / "Apple TV (F1TV)" instead of generic
  #1/#2 suffixes.
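
The aceztrims change above (strip comments, then match the default iframe and the onclick handlers) can be sketched as follows. The three regexes are the ones added in this diff; `SAMPLE_HTML`, the `embed.example` URLs, and `extract_embeds` are assumptions made up for the illustration and do not come from the real page.

```python
import re

_ONCLICK_IFRAME_SRC = re.compile(
    r"""document\.getElementById\(['"]iframe['"]\)\.src\s*=\s*['"]([^'"]+)['"]""",
    re.IGNORECASE,
)
_DEFAULT_IFRAME = re.compile(
    r"""<iframe[^>]*id\s*=\s*['"]iframe['"][^>]*src\s*=\s*['"]([^'"]+)['"]""",
    re.IGNORECASE,
)
_HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)

# Hypothetical page: one default iframe, one live button, one commented-out button.
SAMPLE_HTML = """
<iframe id='iframe' src='https://embed.example/default'></iframe>
<button onclick="document.getElementById('iframe').src = 'https://embed.example/ch1'">1</button>
<!--
<button onclick="document.getElementById('iframe').src = 'https://embed.example/legacy'">old</button>
-->
"""


def extract_embeds(html: str, strip_comments: bool = True) -> list[str]:
    """Return unique embed URLs in page order (default iframe first)."""
    if strip_comments:
        html = _HTML_COMMENT.sub("", html)
    seen: set[str] = set()
    urls: list[str] = []
    for pattern in (_DEFAULT_IFRAME, _ONCLICK_IFRAME_SRC):
        for match in pattern.finditer(html):
            url = match.group(1).strip()
            if url and url not in seen:
                seen.add(url)
                urls.append(url)
    return urls


print(extract_embeds(SAMPLE_HTML))
print(extract_embeds(SAMPLE_HTML, strip_comments=False))
```

With comment stripping the legacy URL is absent; without it the commented-out button is picked up as a false positive, which matches the behaviour described in the commit message.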

Diagnosed against the live cluster (curated + 7 other extractors
returning 0 cached streams, only 2 dead hmembeds curated 24/7 channels
visible to users). Each fix verified with the extractor run against
live sites this turn.
2026-05-24 22:05:37 +00:00
Viktor Barzin
d5f73ce109 backup: exclude /anca-elements/ from nfs-mirror + offsite Step 1
Anca's photos are being ingested into Immich (started 2026-05-24
afternoon), so /srv/nfs/immich/library/ becomes the canonical copy
for those photos. The separate /srv/nfs/anca-elements/ archive tree
+ its sda mirror at /mnt/backup/anca-elements/ are now redundant.

Going forward:
- nfs-mirror EXCLUDES /anca-elements/ so future weekly runs don't
  re-touch the 771G subtree (also no longer required since Immich
  has the data via its NFS library).
- offsite-sync Step 1 also excludes /anca-elements/ — the historical
  771G under /mnt/backup/anca-elements/ stays on sda for now but is
  NOT shipped to Synology pve-backup/ (Immich's library reaches
  Synology via Step 2 bypass leg anyway).

The 771G on /mnt/backup/anca-elements/ will be cleaned up manually
once Immich ingest completes and we verify all photos are in the
Immich library. Same for /srv/nfs/anca-elements/ on sdc thin pool —
freeing both would reclaim ~1.5 TB across sdc + sda.

In-flight context: today's nfs-mirror first run was killed mid-flight
at ~70% (was at /srv/nfs/postgresql/). The killed run wrote ~200G of
service NFS subtrees to /mnt/backup/<svc>/, then sda hit 95% used,
prompting this change. Next nfs-mirror run will not touch
anca-elements and will fit comfortably (~250G total for the keep-list
minus anca-elements).
2026-05-24 18:34:41 +00:00
Viktor Barzin
c948dc0dbe backup pipeline: flock manifest + cap + drop LAN -z
Three more audit fixes from the 2026-05-24 backup-pipeline review:

#5 (S1 race) — manifest flock
  daily-backup and nfs-mirror both append to /mnt/backup/.changed-files.
  If they overlap (nfs-mirror Mon 04:11 running long, daily-backup
  starting Mon 05:00), concurrent appends from `find | tee` and
  `find | sed >>` could interleave mid-line — partial paths would slip
  past rsync's --files-from. Both scripts now share a manifest_append()
  helper using `flock -x` on /mnt/backup/.changed-files.lock. The 4
  daily-backup call sites + the 1 nfs-mirror call site all pipe through
  it instead of redirecting directly.
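
For readers who think in Python rather than shell, the same locking idea can be sketched with `fcntl.flock`. This is an illustration only: the deployed helper is the bash `manifest_append()` in the diff, and the temporary paths below are placeholders, not the real `/mnt/backup` files. It assumes a Unix host, since `fcntl` is not available on Windows.

```python
import fcntl
import os
import tempfile


def manifest_append(manifest_path: str, lines: list[str]) -> None:
    """Append lines to the manifest while holding an exclusive lock
    on a sibling `.lock` file, mirroring `flock -x` in the shell helper."""
    lock_path = manifest_path + ".lock"
    with open(lock_path, "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)      # blocks until other writers finish
        try:
            with open(manifest_path, "a") as manifest:
                manifest.writelines(line + "\n" for line in lines)
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)


with tempfile.TemporaryDirectory() as tmp:
    manifest = os.path.join(tmp, ".changed-files")
    manifest_append(manifest, ["pvc-data/a", "pvc-data/b"])   # e.g. daily-backup
    manifest_append(manifest, ["some-svc/c"])                 # e.g. nfs-mirror
    with open(manifest) as fh:
        print(fh.read().splitlines())
```

Because each writer holds the lock for its whole batch, one writer's lines can never be split by another's, which is the property rsync's `--files-from` depends on.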

#7 (S2 unbounded manifest)
  daily-backup gains check_manifest_size() invoked after the PVE-config
  append (the last manifest writer of the run). Above MANIFEST_MAX_LINES
  (500k) it touches /mnt/backup/.force-full-sync — offsite-sync's Step 1
  now treats that flag the same as day-of-month ≤ 7 (full sync with
  --delete) and clears it on success. Catches the "Synology unreachable
  for many days" edge case where the manifest would grow unbounded.
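
The interaction between the cap and the full-sync trigger can be sketched in Python. The function names mirror the shell helpers, but this is a model under stated assumptions, not the deployed code: the paths are temporary placeholders and the demo uses a tiny cap so the behaviour is visible.

```python
import tempfile
from pathlib import Path

MANIFEST_MAX_LINES = 500_000  # cap quoted in the commit message


def check_manifest_size(manifest: Path, flag: Path,
                        max_lines: int = MANIFEST_MAX_LINES) -> bool:
    """Touch the force-full-sync flag when the manifest exceeds the cap."""
    if not manifest.is_file():
        return False
    with manifest.open() as fh:
        lines = sum(1 for _ in fh)
    if lines > max_lines:
        flag.touch()
        return True
    return False


def wants_full_sync(day_of_month: int, flag: Path) -> bool:
    """Step 1 runs a full sync in the monthly window or when the flag exists."""
    return day_of_month <= 7 or flag.exists()


with tempfile.TemporaryDirectory() as tmp:
    manifest = Path(tmp) / ".changed-files"
    flag = Path(tmp) / ".force-full-sync"
    manifest.write_text("a\nb\nc\nd\n")
    print(wants_full_sync(20, flag))                       # False: no flag yet
    print(check_manifest_size(manifest, flag, max_lines=3))  # True: 4 > 3
    print(wants_full_sync(20, flag))                       # True: flag forces full sync
```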

#9 (wear — drop -z on LAN hops)
  offsite-sync rsync calls to Synology over the same 192.168.1.0/24
  gigabit LAN had `-rltz`. Compression burns CPU on the PVE host (already
  IO-busy) and gives nothing on a saturated GigE link. Dropped to `-rlt`
  on all 5 offsite rsync invocations (Step 1 full + Step 1 incremental +
  Step 2 full nfs + Step 2 full nfs-ssd + Step 2 incremental).

Other adjustments:
- nfs-mirror's find-after-rsync now also excludes the new state files
  (.changed-files.lock, .force-full-sync) when populating the manifest.
- offsite-sync Step 1 full-sync excludes the same .force-full-sync flag
  so it doesn't ship to Synology.

Deployed to PVE host (/usr/local/bin/{daily-backup,nfs-mirror,
offsite-sync-backup}). Currently in-flight nfs-mirror run is unaffected
(bash loaded the old script into memory at start). Next runs use the
new behaviour.

Refs: 2026-05-24 audit Section 2 items #1 (manifest race), #4 (unbounded
manifest), #6 (LAN -z wear).
2026-05-24 16:27:42 +00:00
Viktor Barzin
4798583db7 backup pipeline: S1 fixes from 2026-05-24 audit
Three immediate fixes surfaced by the backup-pipeline audit:

1. **S1 silent-loss race fix** (daily-backup.sh:142): remove the
   `> "${MANIFEST}"` truncation at the start of daily-backup. Truncation
   already lives in offsite-sync-backup at line 159, gated on a successful
   sync. With both scripts truncating, an offsite-sync failure followed by
   the next morning's daily-backup would silently wipe yesterday's
   unconsumed manifest entries — those files would only reach Synology
   via the monthly full sync (1st-7th of month). Now only offsite-sync
   truncates, and only on success.

2. **Missing alert OffsiteBackupSyncFailing**: documented in backup-dr.md
   but was never added to prometheus_chart_values.tpl. Step 1 or Step 2
   failure pushes offsite_sync_last_status=1 but nothing read it. Added.

3. **wear: drop `-z` from local-only rsyncs** (daily-backup.sh:218 PVC
   snapshot rsync + line 347 /etc/pve sync). Both are local-to-sda
   transfers — compression wastes CPU and yields nothing (gigabit local
   path, intermediate disk doesn't benefit).
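
The silent-loss race in item 1 can be reproduced with a small in-memory model. This is a sketch for illustration: `simulate` and the file names are hypothetical, and the real scripts work on `/mnt/backup/.changed-files` rather than Python lists.

```python
def simulate(daily_backup_truncates: bool) -> list[str]:
    """Model two days: offsite-sync fails on day 1 and succeeds on day 2.
    Returns the files that actually reached the offsite target."""
    manifest: list[str] = []
    shipped: list[str] = []

    def daily_backup(new_file: str) -> None:
        if daily_backup_truncates:
            manifest.clear()            # old behaviour: `> "${MANIFEST}"` at start
        manifest.append(new_file)

    def offsite_sync(succeeds: bool) -> None:
        if succeeds:
            shipped.extend(manifest)
            manifest.clear()            # truncation gated on a successful sync

    daily_backup("day1.db")
    offsite_sync(succeeds=False)        # e.g. Synology unreachable
    daily_backup("day2.db")
    offsite_sync(succeeds=True)
    return shipped


print(simulate(daily_backup_truncates=True))   # ['day2.db']
print(simulate(daily_backup_truncates=False))  # ['day1.db', 'day2.db']
```

With both scripts truncating, `day1.db` never ships incrementally; with only offsite-sync truncating, the unconsumed entry survives the failed sync.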

Bonus cleanups (zero functional impact):
- "Weekly backup starting/complete" → "daily-backup starting/complete"
  (the timer is daily, not weekly — legacy from earlier monthly-rotation
  schedule).
- "--- Step 2: PVC file copy ---" → "Step 1:" (was numbered from 2 with no
  Step 1 above).
- **wear: pfSense full filesystem tar now Sunday-only** instead of daily.
  config.xml stays daily (it's the primary restore artifact and tiny).
  Full tar is forensic recovery only — re-tarring ~100MB+ daily writes
  ~3G/month to sda + Synology for unchanged content. Weekly is plenty.
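
A quick check of the pfSense tar arithmetic and the Sunday gate, as a sketch. The 100 MB figure is the lower bound quoted above; `runs_full_tar` and `monthly_write_mb` are hypothetical helpers, and a 30-day month is assumed.

```python
from datetime import date


def runs_full_tar(day: date) -> bool:
    # isoweekday() uses the same 1..7 (Mon..Sun) numbering as `date +%u`.
    return day.isoweekday() == 7


def monthly_write_mb(tar_mb: float, runs_per_month: float) -> float:
    """Approximate MB written per month for a tarball of the given size."""
    return tar_mb * runs_per_month


daily_mb = monthly_write_mb(100, 30)        # one tar per day
weekly_mb = monthly_write_mb(100, 30 / 7)   # one tar per week
print(f"daily: {daily_mb:.0f} MB/month, weekly: {weekly_mb:.0f} MB/month")
print(runs_full_tar(date(2026, 5, 24)), runs_full_tar(date(2026, 5, 25)))
```

Daily tars come to roughly 3 GB per month per destination, consistent with the commit message; the weekly schedule cuts that to a little over 0.4 GB.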

docs/architecture/backup-dr.md: rewritten Overview + 3-2-1 breakdown to
reflect today's two-leg architecture; added a "2026-05-24 session"
changelog summary at the top; added a "Synology snapshot management"
subsection with the sudo + `synosharesnapshot` recipe (DSM API is gated
by 2FA so this is the only programmatic path); updated Key Files table
with nfs-mirror + the Synology SSH access notes.

Open follow-ups from the audit (S2 — file as beads if pursued):
- Factor two-leg invariant into /etc/backup-skip-list.conf sourced by
  both nfs-mirror.sh and offsite-sync-backup.sh.
- Manifest write-collision flock between nfs-mirror Mon 04:11 and
  daily-backup Mon 05:00.
- Unbounded manifest cap (force full sync if > 500k lines).
- Synology free-space scraper + alert.
- LVM thin pool meta-pool fill alert.
- nfs-change-tracker.service heartbeat to Pushgateway.
- Synology config drift TF surface (snap retention, share defs).
2026-05-24 16:18:44 +00:00
9 changed files with 347 additions and 167 deletions

View file

@ -1,18 +1,34 @@
# Backup & Disaster Recovery Architecture
Last updated: 2026-04-13
Last updated: 2026-05-24
> **2026-05-24 session — what changed today** (deeper structural review pending — see the open backup-pipeline simplification audit):
> - **anca-elements archive direction inverted** — Synology `/Backup/Anca/Elements` (770G) deleted; PVE `/srv/nfs/anca-elements` is now source of truth. `anca-elements-sync.sh` retired.
> - **`anca-elements-mirror.{sh,service,timer}` retired**, subsumed into the new **`nfs-mirror`** weekly job covering all critical NFS subtrees (anca-elements + ~80 services) → sda.
> - **`offsite-sync-backup` Step 2 filter inverted**: NFS-direct-to-Synology now only carries the sda-bypass paths (immich + frigate + prometheus + `*-backup` + …). Two-leg invariant: `nfs-mirror.sh EXCLUDES` = `offsite-sync-backup Step 2 INCLUDES`. Cross-referenced in both scripts.
> - **Synology `/Backup/Viki/nfs/<svc>/` orphan cleanup** — 84 dirs renamed in-place (btrfs metadata-only) to `/Backup/Viki/pve-backup/<svc>/` so daily-incremental Step 1 sees them as pre-existing and only ships deltas. No re-transfer.
> - **Synology snapshot retention 7d → 3d**, all 8 backlog snapshots deleted via `sudo synosharesnapshot delete Backup ...`. Reclaimed ~800G btrfs (98% → 83% used). DSM API was blocked by 2FA; `sudo` over the existing `Administrator` SSH key worked with the Vault-stored password.
> - **Manifest mechanism extended**: `nfs-mirror` now appends its transferred file list to `/mnt/backup/.changed-files` so daily Step 1 incremental picks it up (was previously only fed by `daily-backup`).
## Overview
The homelab uses a defense-in-depth 3-2-1 backup strategy: **3 copies** (live PVCs on sdc, weekly backups on sda, offsite on Synology), **2 media types** (SSD thin LVM, HDD), **1 offsite copy** (Synology NAS). This architecture provides <1s RPO for recent changes (via 7-day LVM snapshots), <7d RPO for file-level recovery, and <30min RTO for most services.
The homelab runs a 3-2-1 strategy with a **two-leg** path to Synology so every NFS byte takes exactly one route to offsite (no duplication, no gaps):
```
sdc /srv/nfs/<svc>/ ──nfs-mirror weekly──→ sda /mnt/backup/<svc>/ ──offsite-sync Step 1──→ Synology /Backup/Viki/pve-backup/<svc>/ [leg 1]
sdc /srv/nfs/<bypass>/ ──inotify (nfs-change-tracker)──→ offsite-sync Step 2 ──→ Synology /Backup/Viki/nfs/<bypass>/ [leg 2]
sdc PVCs (LVM thin) ──daily-backup~snapshot~rsync──→ sda /mnt/backup/{pvc-data,sqlite-backup,pfsense,pve-config}/ ──Step 1──→ Synology /Backup/Viki/pve-backup/
```
The **bypass list** (paths that take leg 2 — too big for sda, transient, or already-a-backup): `immich`, `frigate`, `prometheus`, `loki`, `temp`, `alertmanager`, `ollama`, `audiblez`, `ebook2audiobook`, `*-backup`. Anything NOT in this list rides leg 1 via `nfs-mirror`.
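
A minimal sketch of the routing rule, assuming top-level directory names under `/srv/nfs/` are matched against the bypass list with shell-style globs. `leg_for` is a hypothetical helper; the real filtering lives in the rsync exclude and include lists of the two scripts, and the example directory names are made up.

```python
from fnmatch import fnmatchcase

# Bypass list as documented above; `*-backup` is the only glob entry.
BYPASS_PATTERNS = [
    "immich", "frigate", "prometheus", "loki", "temp", "alertmanager",
    "ollama", "audiblez", "ebook2audiobook", "*-backup",
]


def leg_for(service_dir: str) -> int:
    """Return 2 for bypass-list directories (direct to Synology), else 1 (via sda)."""
    if any(fnmatchcase(service_dir, pattern) for pattern in BYPASS_PATTERNS):
        return 2
    return 1


for name in ("immich", "mysql-backup", "nextcloud"):  # hypothetical directory names
    print(name, "-> leg", leg_for(name))
```

Every directory maps to exactly one leg, which is the "no duplication, no gaps" property the two-leg design is meant to guarantee.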
**3-2-1 Breakdown**:
- **Copy 1** (live): All PVC data + VM disks on Proxmox sdc thin pool (10.7TB RAID1 HDD)
- **Copy 2** (local backup): Weekly file-level backup to sda `/mnt/backup` (1.1TB RAID1 SAS)
- **Copy 3** (offsite): Synology NAS at 192.168.1.13:
- `Synology/Backup/Viki/pve-backup/` — PVC snapshots, pfSense, PVE config (rsync from sda weekly)
- `Synology/Backup/Viki/nfs/` — NFS HDD data (inotify change-tracked rsync from `/srv/nfs`)
- `Synology/Backup/Viki/nfs-ssd/` — NFS SSD data (inotify change-tracked rsync from `/srv/nfs-ssd`)
- **Copy 1** (live): all PVC data + VM disks on Proxmox sdc thin pool (10.7TB RAID1 HDD); all NFS data at `/srv/nfs[-ssd]/`
- **Copy 2** (local backup): sda `/mnt/backup` (1.1TB RAID1 SAS) — at **~90% used** post-2026-05-24 (was ~10% in April)
- **Copy 3** (offsite): Synology NAS at 192.168.1.13 — at **~83% used / 934G free** post-2026-05-24 (was 98% / 121G before today's cleanup)
- `Synology/Backup/Viki/pve-backup/`: sda contents (PVC backups + nfs-mirror output: ~90 service dirs)
- `Synology/Backup/Viki/nfs/`: bypass-list NFS (immich, frigate, etc.)
- `Synology/Backup/Viki/nfs-ssd/`: bypass-list SSD NFS (immich-ML, ollama, llamacpp)
## Architecture Diagram
@ -366,6 +382,38 @@ Pushes `nfs_mirror_last_run_timestamp` + `nfs_mirror_last_status` + `nfs_mirror_
> TrueNAS Cloud Sync was decommissioned along with TrueNAS (2026-04-13). The current offsite path is inotify-change-tracked rsync from the Proxmox host NFS (`/srv/nfs`, `/srv/nfs-ssd`) to Synology.
### Synology snapshot management
Synology DSM keeps daily btrfs snapshots of every shared folder (the `Backup` share most importantly). Retention is configured per-share in DSM's Snapshot Replication app, and persists in `synosharesnapshot shareconf`.
**Current settings** (`Backup` share, 2026-05-24): daily at 02:00, **`snap_auto_remove_keep_days=3`** (tightened from 7 to reduce the window where deleted data continues to consume space).
Snapshots are CoW — deleting a file from the live filesystem does NOT free its blocks while any retained snapshot references them. Reclaim only happens after ALL referencing snapshots roll off.
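
The reclaim rule can be modelled with plain sets. This is a toy illustration of copy-on-write reference semantics, not btrfs internals; the block and snapshot names are invented.

```python
def reclaimable_blocks(all_blocks: set[str], live_blocks: set[str],
                       snapshots: dict[str, set[str]]) -> set[str]:
    """Blocks can be freed only when neither the live filesystem
    nor any retained snapshot still references them."""
    referenced = set(live_blocks)
    for blocks in snapshots.values():
        referenced |= blocks
    return all_blocks - referenced


all_blocks = {"a", "b"}
live = {"a"}                                   # the file owning block "b" was deleted
snaps = {"snap-1": {"a", "b"}, "snap-2": {"a", "b"}}

print(reclaimable_blocks(all_blocks, live, snaps))                       # set()
print(reclaimable_blocks(all_blocks, live, {"snap-2": snaps["snap-2"]})) # set()
print(reclaimable_blocks(all_blocks, live, {}))                          # {'b'}
```

Deleting one of two referencing snapshots frees nothing; space returns only once the last reference is gone, which is why shortening retention from 7 to 3 days shortens the reclaim delay.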
**DSM Web API is gated by 2FA (FIDO/OTP)** — programmatic snapshot management has to go via SSH + sudo instead:
```bash
# Password is in Vault: secret/viktor → synology_admin_password
PASS=$(VAULT_ADDR=https://vault.viktorbarzin.me vault kv get -field=synology_admin_password secret/viktor)
# List snapshots on the Backup share
ssh Administrator@192.168.1.13 "echo '$PASS' | sudo -S /usr/syno/sbin/synosharesnapshot list Backup"
# Bulk delete ALL snapshots (reclaims everything once btrfs cleaner runs)
ssh Administrator@192.168.1.13 "
SNAPS=\$(echo '$PASS' | sudo -S /usr/syno/sbin/synosharesnapshot list Backup 2>/dev/null \
| grep -oE 'GMT-[0-9]+\.[0-9]+\.[0-9]+-[0-9]+\.[0-9]+\.[0-9]+' | sort -u)
echo '$PASS' | sudo -S /usr/syno/sbin/synosharesnapshot delete Backup \$SNAPS
"
# Tighten retention
ssh Administrator@192.168.1.13 "echo '$PASS' | sudo -S /usr/syno/sbin/synosharesnapshot shareconf set Backup snap_auto_remove_keep_days=3"
```
The btrfs cleaner thread reclaims async — `df` may lag the snapshot-delete by minutes (typical reclaim rate observed 2026-05-24: ~300 MB/s sustained, with bursts of 800 GB in 2 minutes).
> Memory: id=2673-2676 (Synology snapshot retention gotcha — deletion vs reclaim timing).
## Configuration
### Key Files
@ -387,6 +435,8 @@ Pushes `nfs_mirror_last_run_timestamp` + `nfs_mirror_last_status` + `nfs_mirror_
| `stacks/vault/` | Terraform: Vault backup CronJob |
| `stacks/vaultwarden/` | Terraform: Vaultwarden backup + integrity CronJobs |
| `stacks/monitoring/` | Terraform: Prometheus alerts |
| `synology:Administrator@192.168.1.13` | Synology SSH; sudo password = Vault `secret/viktor` `synology_admin_password`; DSM API itself gated by 2FA |
| `/usr/syno/sbin/synosharesnapshot` | Synology: btrfs snapshot CLI — must run as root via sudo |
### Vault Paths

View file

@ -20,6 +20,34 @@ log() { echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*"; }
warn() { log "WARN: $*" >&2; }
die() { log "FATAL: $*" >&2; push_metrics 1 0; exit 1; }
# --- Manifest append helper ---
# Both daily-backup and nfs-mirror append to /mnt/backup/.changed-files.
# If their runs overlap (e.g. nfs-mirror Mon 04:11 still running when
# daily-backup starts Mon 05:00) the appends can interleave mid-line.
# `flock -x` on a sibling lock file makes appends atomic across processes.
MANIFEST_LOCK="${MANIFEST}.lock"
manifest_append() {
(
flock -x 200
cat >> "${MANIFEST}"
) 200>"${MANIFEST_LOCK}"
}
# Cap manifest size to prevent unbounded growth (e.g. Synology unreachable
# for many days, every daily-backup keeps appending). At >500k lines,
# `--files-from=` rsync becomes pathological — fall back to a full Step 1
# sync by signalling offsite-sync to ignore the manifest this round.
MANIFEST_MAX_LINES=500000
check_manifest_size() {
[ -f "${MANIFEST}" ] || return 0
local lines
lines=$(wc -l < "${MANIFEST}" 2>/dev/null || echo 0)
if [ "${lines:-0}" -gt "${MANIFEST_MAX_LINES}" ]; then
warn "manifest at ${lines} lines (>${MANIFEST_MAX_LINES}) — flagging next offsite-sync as full"
touch "${BACKUP_ROOT}/.force-full-sync"
fi
}
# --- Locking ---
# Track whether we got SIGTERM/SIGINT so cleanup can push a non-success metric.
# Without this, a systemd timeout-kill leaves WeeklyBackupFailing alerts blind:
@ -123,7 +151,7 @@ check_nfs_exports() {
}
# --- Main ---
log "=== Weekly backup starting ==="
log "=== daily-backup starting ==="
if ! mountpoint -q "${BACKUP_ROOT}"; then
die "${BACKUP_ROOT} is not mounted"
@ -138,16 +166,25 @@ check_nfs_exports || {
STATUS=0
TOTAL_BYTES=0
# Clear manifest for this run
> "${MANIFEST}"
# DO NOT truncate the manifest here.
#
# Truncation lives in offsite-sync-backup (only on successful sync). If
# offsite-sync failed yesterday — Synology unreachable, transient error —
# the manifest holds yesterday's unconsumed file list. Truncating at the
# start of today's daily-backup would silently lose those entries; they'd
# only reach Synology on the next monthly full sync.
#
# Appending duplicates across multiple runs is harmless — rsync transfers
# each file once. If the manifest grows pathologically (Synology down for
# weeks), the OffsiteBackupSync{Stale,Failing} alerts catch it.
# NFS data is synced directly to Synology via inotifywait + offsite-sync-backup.sh
# No NFS mirror step on sda — saves 53GB and eliminates duplication.
# NFS data is synced to Synology via two paths: nfs-mirror → sda → Step 1
# for the curated subset, and inotify + Step 2 for the sda-bypass list.
# ============================================================
# STEP 1: PVC file-level copy from LVM thin snapshots
# ============================================================
log "--- Step 2: PVC file copy from snapshots ---"
log "--- Step 1: PVC file copy from snapshots ---"
WEEK=$(date +%Y-%W)
PREV=$(ls -1d "${BACKUP_ROOT}/pvc-data"/????-?? 2>/dev/null | tail -1 || true)
@ -215,7 +252,7 @@ else
# (immich-postgres ~10 GiB, ~3 min on local ext4) and well
# below the unit-level budget so we still have headroom to
# finish the rest.
timeout 1800 rsync -az --delete \
timeout 1800 rsync -a --delete \
${PREV:+--link-dest="${PREV}/${ns_pvc}/"} \
"${PVC_MOUNT}/" "${dst}/" 2>&1 || rsync_rc=$?
if [ "$rsync_rc" -eq 0 ]; then
@ -274,10 +311,10 @@ else
log " PVC copy: ${PVC_COUNT} OK, ${PVC_FAIL} failed"
[ "${PVC_FAIL}" -gt 0 ] && STATUS=1
# Add PVC files to manifest
# Add PVC files to manifest (locked append)
if [ -d "${BACKUP_ROOT}/pvc-data/${WEEK}" ]; then
find "${BACKUP_ROOT}/pvc-data/${WEEK}" -type f 2>/dev/null | \
sed "s|^${BACKUP_ROOT}/||" >> "${MANIFEST}"
sed "s|^${BACKUP_ROOT}/||" | manifest_append
fi
# Prune old weekly versions (keep 4)
@ -301,23 +338,31 @@ if timeout 10 ssh -o BatchMode=yes -o ConnectTimeout=5 root@10.0.20.1 true 2>/de
# config.xml — primary restore artifact
if scp -o ConnectTimeout=10 root@10.0.20.1:/cf/conf/config.xml "${PFSENSE_DEST}/config-${DATE}.xml" 2>/dev/null; then
log " OK: config.xml"
echo "pfsense/config-${DATE}.xml" >> "${MANIFEST}"
echo "pfsense/config-${DATE}.xml" | manifest_append
else
warn "Failed to copy pfsense config.xml"
STATUS=1
PFSENSE_STATUS=1
fi
# Full filesystem tar
if ssh -o ConnectTimeout=10 root@10.0.20.1 \
"tar czf - --exclude=/dev --exclude=/proc --exclude=/tmp --exclude=/var/run /" \
> "${PFSENSE_DEST}/pfsense-full-${DATE}.tar.gz" 2>/dev/null; then
log " OK: full tar ($(du -sh "${PFSENSE_DEST}/pfsense-full-${DATE}.tar.gz" | cut -f1))"
echo "pfsense/pfsense-full-${DATE}.tar.gz" >> "${MANIFEST}"
# Full filesystem tar — Sundays only (weekly).
# config.xml is the primary restore artifact and runs daily above; the
# full filesystem tar is for forensic / package-state recovery only and
# rarely-needed. Re-tarring 100M+ daily writes ~3G/month to sda + Synology
# for unchanged content. Keep one fresh tarball per week instead.
if [ "$(date +%u)" = "7" ]; then
if ssh -o ConnectTimeout=10 root@10.0.20.1 \
"tar czf - --exclude=/dev --exclude=/proc --exclude=/tmp --exclude=/var/run /" \
> "${PFSENSE_DEST}/pfsense-full-${DATE}.tar.gz" 2>/dev/null; then
log " OK: weekly full tar ($(du -sh "${PFSENSE_DEST}/pfsense-full-${DATE}.tar.gz" | cut -f1))"
echo "pfsense/pfsense-full-${DATE}.tar.gz" | manifest_append
else
warn "Failed to tar pfsense filesystem"
STATUS=1
PFSENSE_STATUS=1
fi
else
warn "Failed to tar pfsense filesystem"
STATUS=1
PFSENSE_STATUS=1
log " skip weekly full tar (only runs Sundays)"
fi
# Retention: keep 4 weekly copies
@ -344,13 +389,15 @@ fi
# ============================================================
log "--- Step 4: PVE host config ---"
mkdir -p "${BACKUP_ROOT}/pve-config/scripts"
timeout 300 rsync -az --delete /etc/pve/ "${BACKUP_ROOT}/pve-config/etc-pve/" 2>&1 || { warn "Failed to sync /etc/pve"; STATUS=1; }
timeout 300 rsync -a --delete /etc/pve/ "${BACKUP_ROOT}/pve-config/etc-pve/" 2>&1 || { warn "Failed to sync /etc/pve"; STATUS=1; }
for script in /usr/local/bin/lvm-pvc-snapshot /usr/local/bin/daily-backup /usr/local/bin/offsite-sync-backup; do
[ -f "${script}" ] && cp "${script}" "${BACKUP_ROOT}/pve-config/scripts/" 2>/dev/null || true
done
find "${BACKUP_ROOT}/pve-config" -type f 2>/dev/null | sed "s|^${BACKUP_ROOT}/||" >> "${MANIFEST}"
find "${BACKUP_ROOT}/pve-config" -type f 2>/dev/null | sed "s|^${BACKUP_ROOT}/||" | manifest_append
log " OK: PVE config"
check_manifest_size
# ============================================================
# STEP 5: Prune LVM snapshots older than 7 days
# ============================================================
@ -361,6 +408,6 @@ log "--- Step 5: Snapshot pruning (7-day retention) ---"
# Done
# ============================================================
MANIFEST_LINES=$(wc -l < "${MANIFEST}" 2>/dev/null || echo 0)
log "=== Weekly backup complete (status=${STATUS}, ${TOTAL_BYTES} bytes, ${MANIFEST_LINES} files in manifest) ==="
log "=== daily-backup complete (status=${STATUS}, ${TOTAL_BYTES} bytes, ${MANIFEST_LINES} files in manifest) ==="
push_metrics "${STATUS}" "${TOTAL_BYTES}"
exit "${STATUS}"

View file

@ -57,6 +57,14 @@ EXCLUDES=(
--exclude='/.lv-pvc-mapping.json'
--exclude='/.nfs-changes.log'
# ---- anca-elements: photos are being ingested into Immich (2026-05-24),
# so /srv/nfs/immich/library/ becomes the canonical copy and the separate
# anca-elements tree is redundant. Excluded from nfs-mirror going forward.
# The historical 771G at /mnt/backup/anca-elements/ stays put until manual
# cleanup once Immich ingest completes; offsite-sync Step 1 also excludes
# it from the Synology pve-backup/ upload so we don't ship the redundant copy.
--exclude='/anca-elements/'
# ---- NFS paths: too big / transient / re-fetchable ----
--exclude='/immich/'
--exclude='/frigate/'
@ -81,6 +89,17 @@ EXCLUDES=(
log() { echo "[$(date -u '+%Y-%m-%dT%H:%M:%SZ')] $*" | tee -a "$LOG"; }
warn() { log "WARN: $*"; }
# Locked manifest append (shared with daily-backup) — see daily-backup.sh
# for the rationale. flock prevents interleaved appends when nfs-mirror
# (Mon 04:11) overruns into daily-backup (Mon 05:00).
MANIFEST_LOCK="${MANIFEST}.lock"
manifest_append() {
(
flock -x 200
cat >> "${MANIFEST}"
) 200>"${MANIFEST_LOCK}"
}
push_metrics() {
local status="${1:-0}" bytes="${2:-0}"
cat <<EOF | curl -s --connect-timeout 5 --max-time 10 --data-binary @- "${PUSHGATEWAY}/metrics/job/${PUSHGATEWAY_JOB}" 2>/dev/null || true
@ -132,10 +151,12 @@ if [ "$RSYNC_RC" -eq 0 ]; then
# manifest so daily Step 1 incremental picks them up tomorrow morning.
NEW_COUNT=$(find /mnt/backup -newer "$STAMP" -type f \
! -path '/mnt/backup/.changed-files' \
! -path '/mnt/backup/.changed-files.lock' \
! -path '/mnt/backup/.lv-pvc-mapping.json' \
! -path '/mnt/backup/.nfs-changes.log' \
! -path '/mnt/backup/.last-offsite-sync' \
-printf '%P\n' 2>/dev/null | tee -a "$MANIFEST" | wc -l)
! -path '/mnt/backup/.force-full-sync' \
-printf '%P\n' 2>/dev/null | tee >(manifest_append) | wc -l)
log "=== mirror complete; ${NEW_COUNT} files added to offsite manifest ==="
log "/mnt/backup used: $(df -h --output=used /mnt/backup | tail -1 | tr -d ' ')"
push_metrics 0 "$DST_BYTES"

View file

@ -54,18 +54,32 @@ DAY_OF_MONTH=$(date +%d)
# ============================================================
log "--- Step 1: sda → Synology pve-backup/ ---"
if [ "${DAY_OF_MONTH}" -le 7 ]; then
log "Monthly full sync (1st Sunday)..."
rsync -rltz --delete --chmod=Du=rwx,Dgo=rx,Fu=rw,Fog=r \
# Trigger: monthly cleanup window OR daily-backup signalled the manifest grew
# past its cap (Synology was unreachable too long for incremental to keep up).
FORCE_FULL_FLAG="${BACKUP_ROOT}/.force-full-sync"
FORCE_FULL=""
[ -f "${FORCE_FULL_FLAG}" ] && FORCE_FULL=1
if [ "${DAY_OF_MONTH}" -le 7 ] || [ -n "${FORCE_FULL}" ]; then
[ -n "${FORCE_FULL}" ] && log "Forced full sync (manifest size cap tripped)..." || log "Monthly full sync (1st Sunday)..."
# No -z on LAN: gigabit hop to 192.168.1.13 doesn't benefit from compression
# and burns CPU on the PVE host that's already busy with cluster IO.
rsync -rlt --delete --chmod=Du=rwx,Dgo=rx,Fu=rw,Fog=r \
--exclude='.changed-files' \
--exclude='.changed-files.lock' \
--exclude='.last-offsite-sync' \
--exclude='.lv-pvc-mapping.json' \
--exclude='.nfs-changes.log' \
--exclude='.force-full-sync' \
--exclude='/anca-elements/' \
"${BACKUP_ROOT}/" "${PVE_BACKUP_DEST}/" 2>&1 || STATUS=1
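# Clear the flag only when the full sync succeeded, so a failed run retries in full.
[ "${STATUS:-0}" -eq 0 ] && rm -f "${FORCE_FULL_FLAG}"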
elif [ -s "${MANIFEST}" ]; then
MANIFEST_LINES=$(wc -l < "${MANIFEST}")
log "Incremental sync (${MANIFEST_LINES} files from manifest)..."
rsync -rltz --chmod=Du=rwx,Dgo=rx,Fu=rw,Fog=r --files-from="${MANIFEST}" \
# /anca-elements is being ingested into Immich (Immich becomes canonical) —
# skip the redundant copy in /mnt/backup/anca-elements/ until manual cleanup.
rsync -rlt --chmod=Du=rwx,Dgo=rx,Fu=rw,Fog=r --files-from="${MANIFEST}" \
--exclude='anca-elements/' \
"${BACKUP_ROOT}/" "${PVE_BACKUP_DEST}/" 2>&1 || STATUS=1
else
log "No changed files in manifest, nothing to sync"
@ -110,11 +124,11 @@ NFS_FULL_INCLUDES=(
if [ "${DAY_OF_MONTH}" -le 7 ]; then
# Monthly: full sync with --delete for cleanup, restricted to bypass-list.
log "Monthly full NFS sync (sda-bypass paths only)..."
rsync -rltz --delete "${NFS_FULL_INCLUDES[@]}" /srv/nfs/ "${NFS_DEST}/" 2>&1 \
rsync -rlt --delete "${NFS_FULL_INCLUDES[@]}" /srv/nfs/ "${NFS_DEST}/" 2>&1 \
&& log " OK: nfs/ full sync (bypass-list)" || { warn "nfs/ full sync failed"; STATUS=1; }
# nfs-ssd: every dir under it (immich/ollama/llamacpp) is in the bypass list,
# so a plain --delete still applies cleanly.
rsync -rltz --delete /srv/nfs-ssd/ "${NFS_SSD_DEST}/" 2>&1 \
rsync -rlt --delete /srv/nfs-ssd/ "${NFS_SSD_DEST}/" 2>&1 \
&& log " OK: nfs-ssd/ full sync" || { warn "nfs-ssd/ full sync failed"; STATUS=1; }
> "${NFS_CHANGE_LOG}"
elif [ -s "${NFS_CHANGE_LOG}" ]; then
@ -127,7 +141,7 @@ elif [ -s "${NFS_CHANGE_LOG}" ]; then
> /tmp/sync-nfs.list 2>/dev/null
NFS_COUNT=$(wc -l < /tmp/sync-nfs.list 2>/dev/null || echo 0)
if [ "${NFS_COUNT:-0}" -gt 0 ]; then
rsync -rltz --files-from=/tmp/sync-nfs.list /srv/nfs/ "${NFS_DEST}/" 2>&1 \
rsync -rlt --files-from=/tmp/sync-nfs.list /srv/nfs/ "${NFS_DEST}/" 2>&1 \
&& log " OK: nfs/ (${NFS_COUNT} bypass files)" \
|| { warn "nfs/ incremental failed"; STATUS=1; }
fi
@ -138,7 +152,7 @@ elif [ -s "${NFS_CHANGE_LOG}" ]; then
> /tmp/sync-nfs-ssd.list 2>/dev/null || true
SSD_COUNT=$(wc -l < /tmp/sync-nfs-ssd.list 2>/dev/null || echo 0)
if [ "${SSD_COUNT:-0}" -gt 0 ]; then
rsync -rltz --files-from=/tmp/sync-nfs-ssd.list /srv/nfs-ssd/ "${NFS_SSD_DEST}/" 2>&1 \
rsync -rlt --files-from=/tmp/sync-nfs-ssd.list /srv/nfs-ssd/ "${NFS_SSD_DEST}/" 2>&1 \
&& log " OK: nfs-ssd/ (${SSD_COUNT} files)" \
|| { warn "nfs-ssd/ incremental failed"; STATUS=1; }
fi

View file

@ -1,13 +1,24 @@
"""Aceztrims extractor - scrapes F1 streaming links from Aceztrims pages.
"""Aceztrims extractor — scrapes embed URLs from acestrlms.pages.dev/f11/.
Parses HTML for iframe button onclick handlers and extracts streams from:
- /iframe1?s=<m3u8_url> direct m3u8
- https://pooembed.eu/embed/... embed URL
The page (Cloudflare Pages, no anti-bot) hosts an iframe + a strip of
onclick channel-switcher buttons. Each button rewrites the iframe via
`document.getElementById('iframe').src = '<embed_url>'`. The initial
channel is hard-coded as `<iframe id='iframe' src='...'>`.
We strip HTML comments first because the page keeps ~20 legacy channel
buttons inside `<!-- ... -->` blocks for easy re-enablement; the previous
loose regex picked them up as false positives.
All channels are iframe embeds (no direct m3u8), so `stream_type='embed'`.
Site naming note: the extractor key stays `aceztrims` (the previous
domain) so registry/cache identifiers don't churn. The current domain
is `acestrlms.pages.dev` and the F1 path is `/f11/` (two ones; `/f1/`
is the cross-sport schedule page and has no stream buttons).
"""
import logging
import re
from urllib.parse import parse_qs, urlparse
import httpx
@ -17,9 +28,8 @@ from backend.extractors.models import ExtractedStream
logger = logging.getLogger(__name__)
BASE_URL = "https://acestrlms.pages.dev"
# Pages to scrape for streams
F1_PAGES = [
("/f1/", "Formula 1"),
("/f11/", "Formula 1"),
]
USER_AGENT = (
@ -28,13 +38,21 @@ USER_AGENT = (
"Chrome/120.0.0.0 Safari/537.36"
)
# `document.getElementById('iframe').src = '<URL>'` — current channel-switcher format.
_ONCLICK_IFRAME_SRC = re.compile(
r"""document\.getElementById\(['"]iframe['"]\)\.src\s*=\s*['"]([^'"]+)['"]""",
re.IGNORECASE,
)
# `<iframe id='iframe' src='<URL>'>` — the default/initial channel.
_DEFAULT_IFRAME = re.compile(
r"""<iframe[^>]*id\s*=\s*['"]iframe['"][^>]*src\s*=\s*['"]([^'"]+)['"]""",
re.IGNORECASE,
)
_HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
class AceztrimsExtractor(BaseExtractor):
"""Extracts streams from Aceztrims pages by parsing HTML for iframe URLs.
Looks for onclick handlers on buttons/links that open iframes, and
extracts the stream URLs from them.
"""
"""Pulls iframe embed URLs out of the acestrlms.pages.dev F1 page."""
@property
def site_key(self) -> str:
@ -45,7 +63,6 @@ class AceztrimsExtractor(BaseExtractor):
return "Aceztrims"
async def extract(self) -> list[ExtractedStream]:
"""Scrape all configured F1 pages for stream URLs."""
streams: list[ExtractedStream] = []
async with httpx.AsyncClient(
@ -55,12 +72,9 @@ class AceztrimsExtractor(BaseExtractor):
) as client:
for path, category in F1_PAGES:
try:
page_streams = await self._scrape_page(client, path, category)
streams.extend(page_streams)
streams.extend(await self._scrape_page(client, path, category))
except Exception:
logger.exception(
"[aceztrims] Failed to scrape page %s", path
)
logger.exception("[aceztrims] Failed to scrape %s", path)
logger.info("[aceztrims] Extracted %d stream(s)", len(streams))
return streams
@ -68,85 +82,39 @@ class AceztrimsExtractor(BaseExtractor):
async def _scrape_page(
self, client: httpx.AsyncClient, path: str, category: str
) -> list[ExtractedStream]:
"""Scrape a single page for stream URLs."""
url = f"{BASE_URL}{path}"
resp = await client.get(url)
if resp.status_code != 200:
logger.warning(
"[aceztrims] Page %s returned HTTP %d", path, resp.status_code
"[aceztrims] %s returned HTTP %d", path, resp.status_code
)
return []
html = resp.text
# The page keeps a block of legacy channel buttons inside
# `<!-- ... -->` for quick re-enablement. Strip comments first so
# the regex only sees live buttons.
html = _HTML_COMMENT.sub("", resp.text)
seen: set[str] = set()
streams: list[ExtractedStream] = []
seen_urls: set[str] = set()
# Pattern 1: /iframe1?s=<m3u8_url> — direct m3u8
iframe1_pattern = re.compile(
r"""['"]((?:https?://[^'"]*)?/iframe1\?s=([^'"&]+))['""]""",
re.IGNORECASE,
)
for match in iframe1_pattern.finditer(html):
m3u8_url = match.group(2)
if m3u8_url in seen_urls:
continue
seen_urls.add(m3u8_url)
streams.append(
ExtractedStream(
url=m3u8_url,
site_key=self.site_key,
site_name=self.site_name,
quality="",
title=f"{category} Stream",
stream_type="m3u8",
for pattern in (_DEFAULT_IFRAME, _ONCLICK_IFRAME_SRC):
for match in pattern.finditer(html):
embed_url = match.group(1).strip()
if not embed_url or embed_url in seen:
continue
seen.add(embed_url)
streams.append(
ExtractedStream(
url=embed_url,
site_key=self.site_key,
site_name=self.site_name,
quality="",
title=f"{category} Stream",
stream_type="embed",
embed_url=embed_url,
)
)
)
# Pattern 2: embed URLs (pooembed.eu or similar)
embed_pattern = re.compile(
r"""['"]((https?://(?:pooembed\.eu|[^'"]*embed)[^'"]*))['"]""",
re.IGNORECASE,
)
for match in embed_pattern.finditer(html):
embed_url = match.group(1)
if embed_url in seen_urls:
continue
seen_urls.add(embed_url)
streams.append(
ExtractedStream(
url=embed_url,
site_key=self.site_key,
site_name=self.site_name,
quality="",
title=f"{category} Stream (Embed)",
stream_type="embed",
embed_url=embed_url,
)
)
# Pattern 3: Generic onclick handlers with URLs
onclick_pattern = re.compile(
r"""onclick\s*=\s*['"].*?['"]?(https?://[^'")\s]+\.m3u8[^'")\s]*)['"]?""",
re.IGNORECASE,
)
for match in onclick_pattern.finditer(html):
m3u8_url = match.group(1)
if m3u8_url in seen_urls:
continue
seen_urls.add(m3u8_url)
streams.append(
ExtractedStream(
url=m3u8_url,
site_key=self.site_key,
site_name=self.site_name,
quality="",
title=f"{category} Stream",
stream_type="m3u8",
)
)
logger.info(
"[aceztrims] Found %d stream(s) on %s", len(streams), path

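The comment-stripping step in the aceztrims change can be sketched in isolation. This is a minimal illustration, not the extractor itself: it reuses the three regexes from the diff, while `extract_embed_urls`, `SAMPLE`, and the `example.test` URLs are hypothetical names made up for the example.

```python
import re

# Same patterns as in the diff above.
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
ONCLICK_IFRAME_SRC = re.compile(
    r"""document\.getElementById\(['"]iframe['"]\)\.src\s*=\s*['"]([^'"]+)['"]""",
    re.IGNORECASE,
)
DEFAULT_IFRAME = re.compile(
    r"""<iframe[^>]*id\s*=\s*['"]iframe['"][^>]*src\s*=\s*['"]([^'"]+)['"]""",
    re.IGNORECASE,
)


def extract_embed_urls(html: str) -> list[str]:
    """Hypothetical helper: return de-duplicated embed URLs from live markup only."""
    # Drop commented-out legacy buttons before any matching happens.
    live_html = HTML_COMMENT.sub("", html)
    seen: set[str] = set()
    urls: list[str] = []
    # Default iframe first, then the channel-switcher handlers.
    for pattern in (DEFAULT_IFRAME, ONCLICK_IFRAME_SRC):
        for match in pattern.finditer(live_html):
            url = match.group(1).strip()
            if url and url not in seen:
                seen.add(url)
                urls.append(url)
    return urls


# Made-up page: one default iframe, one live button, one commented-out button.
SAMPLE = """
<iframe id='iframe' src='https://example.test/embed/a'></iframe>
<button onclick="document.getElementById('iframe').src = 'https://example.test/embed/b'">2</button>
<!-- <button onclick="document.getElementById('iframe').src = 'https://example.test/embed/old'">x</button> -->
"""

print(extract_embed_urls(SAMPLE))
```

Without the `HTML_COMMENT.sub` call the commented-out button would be reported as a third stream, which is the false-positive case the commit message describes.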

@ -34,7 +34,7 @@ USER_AGENT = (
# to also surface MotoGP and adjacent motorsports — keeps the f1-stream
# UI useful between race weekends and during the off-season.
MOTORSPORT_CATEGORIES = {
"formula 1", "formula 2", "formula 3",
"f1", "formula 1", "formula 2", "formula 3",
"motogp", "moto gp", "moto2", "moto3", "motoe",
"world rally championship", "wrc",
"world endurance championship", "wec",
@ -85,27 +85,61 @@ _is_f1_category = _is_motorsport_category
_is_f1_event = _is_motorsport_event
def _parse_live_events(html: str) -> list[_PitsportEvent]:
"""Parse live events from the main page RSC payload.
def _decode_rsc_payload(html: str) -> str:
"""Concatenate and unescape all `self.__next_f.push([1, "..."])` chunks.
The main page contains event cards with props:
category, title, time, imageUrl
wrapped in <a href="/watch/{UUID}"> links.
Next.js RSC ships its tree as escape-encoded strings inside repeated
`self.__next_f.push` calls. Regex over the raw HTML misses everything
interesting; we have to decode unicode escapes first.
"""
chunks = re.findall(r'self\.__next_f\.push\(\[1,"(.*?)"\]\)', html, re.DOTALL)
if not chunks:
return ""
payload = ""
for chunk in chunks:
try:
payload += chunk.encode().decode("unicode_escape")
except Exception:
payload += chunk
return payload
def _parse_live_events(html: str) -> list[_PitsportEvent]:
"""Parse live events from the main page (or `/live-now`) RSC payload.
The pages embed event cards inside the Next.js RSC payload; the raw
HTML keeps it escape-encoded so we decode first, then match.
Two shapes are common:
1) Older card props: "category":"...","title":"..." next to
"href":"/watch/UUID".
2) Newer `event` prop: an `event` object with `uri:"/watch/UUID"`
carrying `category` and `title`.
"""
payload = _decode_rsc_payload(html) or html
events: list[_PitsportEvent] = []
# Match event cards in the RSC payload - they appear as JSON-like structures
# Pattern: href="/watch/UUID" ... category":"...", "title":"..."
# In the RSC payload, the data is in the format:
# ["$","$L2","/watch/UUID",{"href":"/watch/UUID","children":["$","$L10",null,
# {"category":"...","title":"...","time":...,"imageUrl":"..."}]}]
pattern = re.compile(
href_pattern = re.compile(
r'"href":"(/watch/([0-9a-f-]{36}))"[^}]*?"category":"([^"]+)","title":"([^"]+)"',
)
for match in pattern.finditer(html):
for match in href_pattern.finditer(payload):
_, uuid, category, title = match.groups()
events.append(_PitsportEvent(category=category, title=title, watch_uuid=uuid))
event_pattern = re.compile(
r'"event":\{[^{}]*?"title":"([^"]+)"[^{}]*?"uri":"/watch/([0-9a-f-]{36})"[^{}]*?"category":"([^"]+)"',
)
for match in event_pattern.finditer(payload):
title, uuid, category = match.groups()
events.append(_PitsportEvent(category=category, title=title, watch_uuid=uuid))
event_pattern_alt = re.compile(
r'"event":\{[^{}]*?"category":"([^"]+)"[^{}]*?"title":"([^"]+)"[^{}]*?"uri":"/watch/([0-9a-f-]{36})"',
)
for match in event_pattern_alt.finditer(payload):
category, title, uuid = match.groups()
events.append(_PitsportEvent(category=category, title=title, watch_uuid=uuid))
return events
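One caveat with the `unicode_escape` round trip in `_decode_rsc_payload`: applied to UTF-8 bytes it also re-reads literal non-ASCII characters as Latin-1, so a title like "Café" can come out garbled. A possible alternative, sketched here under the assumption that each pushed chunk is a JSON-compatible string literal, is to let `json.loads` do the unescaping. `decode_rsc_payload_json` and the sample markup are hypothetical and not part of the commit.

```python
import json
import re

# Same chunk regex as in the diff above.
_PUSH_CHUNK = re.compile(r'self\.__next_f\.push\(\[1,"(.*?)"\]\)', re.DOTALL)


def decode_rsc_payload_json(html: str) -> str:
    """Hypothetical variant: unescape each chunk as a JSON string literal."""
    parts: list[str] = []
    for chunk in _PUSH_CHUNK.findall(html):
        try:
            # Re-wrap the captured body in quotes so it parses as one JSON string.
            parts.append(json.loads(f'"{chunk}"'))
        except json.JSONDecodeError:
            # Assumption: fall back to the raw chunk, as the original code does.
            parts.append(chunk)
    return "".join(parts)


# Made-up page fragment with an escaped quote, a \u escape and a literal "é".
sample = r'<script>self.__next_f.push([1,"{\"title\":\"Caf\u00e9 GP é\",\"uri\":\"/watch/x\"}"])</script>'
print(decode_rsc_payload_json(sample))
```

Whether this is safer in practice depends on what the site actually emits; chunks containing escapes that JSON does not accept would take the fallback path unchanged.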
@ -301,13 +335,12 @@ def _is_m3u8_method(method: str) -> bool:
def _extract_m3u8_url(link: str) -> str:
"""Convert a serveplay.site player URL to an m3u8 playlist URL.
"""Pass through the link from pushembdz's `api/stream/<slug>` response.
Input: https://dash.serveplay.site/{channel}/index.html
Output: https://dash.serveplay.site/{channel}/index.html
The index.html IS the m3u8 playlist (served with proper content-type
when fetched with the correct Referer header).
The host has rotated over time (serveplay.site → oe1.ossfeed.store);
the response is always a master playlist URL we hand to the
player as-is. Content-Type may be `text/css` or `application/json`;
treat as HLS based on body sniffing (`#EXTM3U`), not MIME.
"""
return link
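The body-sniffing rule from the docstring could look roughly like the following. This is only a sketch: `looks_like_hls` is a hypothetical helper that does not appear in the diff, and the BOM and whitespace tolerance is an assumption rather than something the commit states.

```python
def looks_like_hls(body: bytes) -> bool:
    """Hypothetical check: decide on HLS from the body, ignoring Content-Type."""
    # "utf-8-sig" drops a leading BOM if present; undecodable bytes are replaced
    # so a binary response cannot raise here.
    text = body.decode("utf-8-sig", errors="replace").lstrip()
    # An HLS playlist starts with the #EXTM3U tag.
    return text.startswith("#EXTM3U")


print(looks_like_hls(b"#EXTM3U\n#EXT-X-VERSION:3\n"))
print(looks_like_hls(b'{"error": "not a playlist"}'))
```

A caller would run this on the fetched bytes before handing the URL to the player, regardless of whether the server labelled the response `text/css` or `application/json`.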
@ -388,6 +421,24 @@ class PitsportExtractor(BaseExtractor):
except Exception:
logger.exception("[pitsport] Failed to fetch main page")
# Fetch /live-now — canonical "currently live" list, added 2026.
try:
resp = await client.get(f"{PITSPORT_BASE}/live-now")
if resp.status_code == 200:
live_now_events = _parse_live_events(resp.text)
logger.info(
"[pitsport] Live-now page: %d event(s)", len(live_now_events)
)
for ev in live_now_events:
if _is_f1_event(ev.category, ev.title):
all_events.append(ev)
else:
logger.warning(
"[pitsport] Live-now page returned HTTP %d", resp.status_code
)
except Exception:
logger.exception("[pitsport] Failed to fetch live-now page")
# Fetch schedule page for upcoming events
try:
resp = await client.get(f"{PITSPORT_BASE}/schedule")


@ -153,21 +153,37 @@ class PPVExtractor(BaseExtractor):
if viewers and int(viewers) > 0:
title += f" ({viewers} viewers)"
# Check for substreams (multiple quality/language options)
# Always emit the parent stream — substreams are
# additional language/source variants, not replacements.
streams.append(
ExtractedStream(
url=embed_url,
site_key=self.site_key,
site_name=self.site_name,
quality=quality,
title=title,
stream_type="embed",
embed_url=embed_url,
)
)
substreams = stream_obj.get("substreams")
if isinstance(substreams, list) and substreams:
if isinstance(substreams, list):
for i, sub in enumerate(substreams):
sub_embed = sub.get("iframe", "") or sub.get("embed_url", "")
if not sub_embed:
# Fall back to the parent embed URL
sub_embed = embed_url
sub_name = sub.get("name", "") or sub.get("label", "")
sub_name = (
sub.get("source_tag", "")
or sub.get("name", "")
or sub.get("label", "")
)
sub_quality = sub.get("tag", "") or sub.get("quality", "") or quality
sub_title = f"{name}"
if sub_name:
sub_title += f" - {sub_name}"
elif i > 0:
sub_title += f" #{i + 1}"
else:
sub_title += f" #{i + 2}"
streams.append(
ExtractedStream(
@ -180,19 +196,6 @@ class PPVExtractor(BaseExtractor):
embed_url=sub_embed,
)
)
else:
# Single stream, no substreams
streams.append(
ExtractedStream(
url=embed_url,
site_key=self.site_key,
site_name=self.site_name,
quality=quality,
title=title,
stream_type="embed",
embed_url=embed_url,
)
)
except Exception:
logger.exception("[ppv] Failed to extract streams")
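The substream naming after this change can be read as a small pure function. The sketch below pulls it out for illustration; `substream_title` is a hypothetical name, and the dictionaries are made-up inputs, not real API responses.

```python
def substream_title(name: str, sub: dict, index: int) -> str:
    """Hypothetical extraction of the title logic for one substream."""
    # Prefer source_tag, then name, then label, as in the updated diff.
    label = sub.get("source_tag", "") or sub.get("name", "") or sub.get("label", "")
    if label:
        return f"{name} - {label}"
    # The parent stream is always emitted first, so unlabeled substreams
    # are numbered from #2 upward.
    return f"{name} #{index + 2}"


print(substream_title("Grand Prix", {"source_tag": "ES"}, 0))
print(substream_title("Grand Prix", {}, 0))
```

Starting the numbering at `index + 2` keeps the unlabeled variants from colliding with the parent entry, which now always occupies the first slot.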


@ -61,6 +61,12 @@ resource "kubernetes_deployment" "forgejo" {
app = "forgejo"
tier = local.tiers.edge
}
annotations = {
# Keel disabled here: its `force` policy rewrote the image tag
# from 11.0.14 → 1.18 on 2026-05-24 (same bug as memory id=1933).
# TF owns the tag now; bump it manually here when upgrading.
"keel.sh/policy" = "never"
}
}
spec {
replicas = 1
@ -89,7 +95,14 @@ resource "kubernetes_deployment" "forgejo" {
}
container {
name = "forgejo"
image = "codeberg.org/forgejo/forgejo:11"
# Pinned to 11.0.14 (latest 11.x as of 2026-05-12); was on
# floating `:11`. On 2026-05-24T15:35:37Z Keel force-policy
# rewrote the tag from `11.0.14 → 1.18` (Gitea-era Forgejo
# v1.18), exact replay of the 2026-05-16 force-policy
# tag-rewriting incident (memory id=1933). The pod crashlooped
# because the DB had already been migrated to schema 305 by
# 11.0.14 and v1.18 only knows up to migration 231.
image = "codeberg.org/forgejo/forgejo:11.0.14"
env {
name = "USER_UID"
value = 1000
@ -182,10 +195,16 @@ resource "kubernetes_deployment" "forgejo" {
lifecycle {
ignore_changes = [
spec[0].template[0].spec[0].dns_config, # KYVERNO_LIFECYCLE_V1
spec[0].template[0].spec[0].container[0].image, # KEEL_IGNORE_IMAGE: Keel manages tag updates
metadata[0].annotations["keel.sh/policy"],
# KEEL_IGNORE_IMAGE removed 2026-05-24: Keel is disabled for this
# workload now (keel.sh/policy=never annotation above), so TF owns
# the image tag. Restore this ignore_changes line if you flip
# keel.sh/policy back to `force` later.
metadata[0].annotations["keel.sh/match-tag"],
metadata[0].annotations["keel.sh/trigger"],
metadata[0].annotations["keel.sh/pollSchedule"], # KYVERNO_LIFECYCLE_V2
metadata[0].annotations["kubernetes.io/change-cause"],
metadata[0].annotations["deployment.kubernetes.io/revision"],
spec[0].template[0].metadata[0].annotations["keel.sh/update-time"],
]
}
}


@ -1562,6 +1562,13 @@ serverFiles:
severity: warning
annotations:
summary: "Offsite backup sync is {{ $value | humanizeDuration }} old (threshold: 9d)"
- alert: OffsiteBackupSyncFailing
expr: offsite_sync_last_status{job="offsite-backup-sync"} != 0
for: 0m
labels:
severity: warning
annotations:
summary: "Offsite backup sync last run reported errors (status={{ $value }})"
- alert: NfsMirrorStale
expr: (time() - nfs_mirror_last_run_timestamp{job="nfs-mirror"}) > 1382400
for: 30m