[ci skip] iSCSI migration, healthcheck fixes, health probes, etcd backup

- Migrate MySQL/PostgreSQL storage from local-path to iscsi-truenas
- Add democratic-csi iSCSI driver module for TrueNAS
- Add open-iscsi to cloud-init VM template
- Fix Shlink health probe path (/api/v3 -> /rest/v3 for Shlink 5.0)
- Fix etcd backup: use etcd 3.5.21-0 (3.6.x is distroless, no /bin/sh)
- Fix cluster healthcheck CronJob: always exit 0 to prevent circular JobFailed alerts (reporting via Slack, not exit codes)
- Fix Uptime Kuma nested list handling in cluster-health.sh
- Add health probes to: audiobookshelf, immich ML, ntfy, headscale, uptime-kuma, vaultwarden, rybbit (clickhouse + server + client), shlink, shlink-web
- Add iSCSI storage documentation to CLAUDE.md
parent a8e07ad930
commit 1d80c49201
17 changed files with 378 additions and 13 deletions
@@ -61,6 +61,14 @@ For platform modules, use `source = "../../../../modules/kubernetes/nfs_volume"`
 **StorageClass**: `nfs-truenas` (deployed via `stacks/platform/modules/nfs-csi/`).
 **DO NOT use inline `nfs {}` blocks** — they mount with `hard,timeo=600` defaults which hang forever on stale mounts.
 
+### iSCSI Storage for Databases
+**StorageClass**: `iscsi-truenas` (deployed via `stacks/platform/modules/iscsi-csi/` using democratic-csi).
+- Used by: PostgreSQL (CNPG), MySQL (InnoDB Cluster) — any pod, any node, same data
+- Driver: `freenas-iscsi` (SSH-based, NOT `freenas-api-iscsi` which is TrueNAS SCALE only)
+- ZFS datasets: `main/iscsi` (zvols), `main/iscsi-snaps` (snapshots)
+- All K8s nodes have `open-iscsi` + `iscsid` running
+- Redis stays on `local-path` (StatefulSet `volumeClaimTemplates` are immutable)
+
 ### Adding NFS Exports
 1. **Create the directory on TrueNAS first**: `ssh root@10.0.10.15 "mkdir -p /mnt/main/<service> && chmod 777 /mnt/main/<service>"`
 2. Edit `secrets/nfs_directories.txt` — add path, keep sorted
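For readers who want to see how a workload would request the new StorageClass, here is a minimal sketch. It is not taken from the commit: the claim name `example-db-data`, the `default` namespace, the `10Gi` size, and the `render_pvc` helper are all hypothetical. Only the `iscsi-truenas` class name comes from the documentation added above. `ReadWriteOnce` is assumed as the usual access mode for iSCSI-backed volumes.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: render a PVC manifest that requests the iscsi-truenas
# StorageClass. Claim name, namespace, and size are illustrative assumptions.
set -euo pipefail

render_pvc() {
    local name="$1" namespace="$2" size="$3"
    cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ${name}
  namespace: ${namespace}
spec:
  storageClassName: iscsi-truenas
  accessModes:
    - ReadWriteOnce   # assumed: typical access mode for iSCSI-backed volumes
  resources:
    requests:
      storage: ${size}
EOF
}

# Print the manifest to stdout.
render_pvc "example-db-data" "default" "10Gi"
```

The output could be piped to `kubectl apply -f -`. In practice the database operators named in the docs usually create their own claims from a storage class setting, so a hand-written PVC like this is mainly useful for testing that provisioning works.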
@@ -1522,12 +1522,9 @@ main() {
     print_summary
     send_slack
 
-    # Exit code: 2 for failures, 1 for warnings, 0 for clean
-    if [[ "$FAIL_COUNT" -gt 0 ]]; then
-        exit 2
-    elif [[ "$WARN_COUNT" -gt 0 ]]; then
-        exit 1
-    fi
+    # Always exit 0 — reporting is done via Slack notification.
+    # Non-zero exits mark the CronJob as Failed, which triggers Prometheus
+    # JobFailed alerts, creating a circular alert loop.
     exit 0
 }
 
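The pattern in this hunk, reporting severity out-of-band while the process itself always succeeds, can be sketched in isolation. This is a simplified illustration rather than the code in `cluster-health.sh`: `FAIL_COUNT` and `WARN_COUNT` mirror the names in the diff, while `summarize_status` and `report` are hypothetical stand-ins (the latter prints instead of posting to Slack).

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the "report out-of-band, always exit 0" pattern.
# summarize_status and report are illustrative, not from cluster-health.sh.
set -euo pipefail

FAIL_COUNT="${FAIL_COUNT:-0}"
WARN_COUNT="${WARN_COUNT:-0}"

# Map the counters to a severity label carried in the notification.
summarize_status() {
    local fails="$1" warns="$2"
    if [[ "$fails" -gt 0 ]]; then
        echo "FAIL"
    elif [[ "$warns" -gt 0 ]]; then
        echo "WARN"
    else
        echo "OK"
    fi
}

# Stand-in for the real Slack notification: print the message instead.
report() {
    echo "cluster health: $1 (failures=$2, warnings=$3)"
}

main() {
    local status
    status="$(summarize_status "$FAIL_COUNT" "$WARN_COUNT")"
    report "$status" "$FAIL_COUNT" "$WARN_COUNT"
    # Severity travels in the message, so the process itself succeeds
    # and the Job is never marked Failed because of unhealthy checks.
    return 0
}

main
```

One trade-off of this design: a crash of the healthcheck script itself (for example under `set -e`) still produces a non-zero exit, so a Job failure alert then signals a broken checker rather than an unhealthy cluster.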