Compare commits

No commits in common. "eb6ceac5f549071e9d009b22d627494e1901555c" and "ac604d4d1faa494986314fe3467b3d392f033416" have entirely different histories.

70 changed files with 4741 additions and 8399 deletions

@ -137,7 +137,7 @@ Repo IDs: infra=1, Website=2, finance=3, health=4, travel_blog=5, webhook-handle
- Every new service gets Prometheus scrape config + Uptime Kuma monitor. External monitors auto-created for Cloudflare-proxied services by `external-monitor-sync` CronJob (10min, uptime-kuma ns). Mechanism: `ingress_factory` auto-adds `uptime.viktorbarzin.me/external-monitor=true` whenever `dns_type != "none"` (see `modules/kubernetes/ingress_factory/main.tf`) — no manual action needed on new services. The `cloudflare_proxied_names` list in `config.tfvars` is a legacy fallback for the 17 hostnames not yet migrated to `ingress_factory` `dns_type`; don't check that list when debugging "is this monitored?" questions.
- **External monitoring**: `[External] <service>` monitors in Uptime Kuma test full external path (DNS → Cloudflare → Tunnel → Traefik). Divergence metric `external_internal_divergence_count` → alert `ExternalAccessDivergence` (15min). Config: `stacks/uptime-kuma/`, targets from `cloudflare_proxied_names` in `config.tfvars` (17 remaining centrally-managed hostnames; most DNS records now auto-created by `ingress_factory` `dns_type` param).
- Key alerts: OOMKill, pod replica mismatch, 4xx/5xx error rates, UPS battery, CPU temp, SSD writes, NFS responsiveness, ClusterMemoryRequestsHigh (>85%), ContainerNearOOM (>85% limit), PodUnschedulable, ExternalAccessDivergence.
- **E2E email monitoring**: CronJob `email-roundtrip-monitor` (every 20 min) sends test email via Brevo HTTP API to `smoke-test@viktorbarzin.me` (catch-all → `spam@`), verifies IMAP delivery, deletes test email, pushes metrics to Pushgateway + Uptime Kuma. Alerts: `EmailRoundtripFailing` (60m), `EmailRoundtripStale` (60m), `EmailRoundtripNeverRun` (60m). Outbound relay: Brevo EU (`smtp-relay.brevo.com:587`, 300/day free — migrated from Mailgun). Inbound external traffic enters via pfSense HAProxy on `10.0.20.1:{25,465,587,993}`, which forwards to k8s `mailserver-proxy` NodePort (30125-30128) with `send-proxy-v2`. Mailserver pod runs alt PROXY-speaking listeners (2525/4465/5587/10993) alongside stock PROXY-free ones (25/465/587/993) for intra-cluster clients. Real client IPs recovered from PROXY v2 header despite kube-proxy SNAT (replaces pre-2026-04-19 MetalLB `10.0.20.202` ETP:Local scheme; see bd code-yiu + `docs/runbooks/mailserver-pfsense-haproxy.md`). Vault: `brevo_api_key` in `secret/viktor` (probe + relay).
- **E2E email monitoring**: CronJob `email-roundtrip-monitor` (every 20 min) sends test email via Mailgun API to `smoke-test@viktorbarzin.me` (catch-all → `spam@`), verifies IMAP delivery, deletes test email, pushes metrics to Pushgateway + Uptime Kuma. Alerts: `EmailRoundtripFailing` (60m), `EmailRoundtripStale` (60m), `EmailRoundtripNeverRun` (60m). Outbound relay: Brevo EU (`smtp-relay.brevo.com:587`, 300/day free — migrated from Mailgun). Mailserver on dedicated MetalLB IP `10.0.20.202` with `externalTrafficPolicy: Local` for CrowdSec real-IP detection. Vault: `mailgun_api_key` in `secret/viktor` (probe), `brevo_api_key` in `secret/viktor` (relay).
## Storage & Backup Architecture

@ -19,25 +19,15 @@ Produce EXACTLY ONE JSON object on stdout matching the schema. No prose. No mark
## RSU handling (important — Meta UK payslips)
UK payslips for equity-compensated employees (e.g. Meta) report RSU vests as NOTIONAL pay for HMRC reporting only — the broker (Schwab) sells shares to cover US-side withholding but the UK payslip ALSO runs the vest through PAYE via a grossed-up Taxable Pay line. Meta UK template:
UK payslips for equity-compensated employees (e.g. Meta) report RSU vests as NOTIONAL pay for HMRC reporting only — the actual share grant + tax is handled by the broker (Schwab), which sells shares to cover withholding. On the payslip:
- EARNINGS lines: `RSU Tax Offset` (grossed-up vest value) and optionally `RSU Excs Refund` (over-withheld amount returned). SUM BOTH into `rsu_vest`. Other labels seen on non-Meta templates: `RSU Vest`, `Restricted Stock Units`, `Notional Pay`, `GSU Vest`.
- Meta's template does NOT use a matching offset deduction — `rsu_offset` should be 0. Taxable Pay is grossed up to (Total Payment + rsu_vest) so PAYE already includes the RSU share.
- For non-Meta templates that DO use an offset (`Shares Retained`, `Notional Pay Offset`), populate `rsu_offset` with the magnitude.
- An EARNINGS line appears with labels like `RSU Vest`, `Restricted Stock Units`, `Stock Value`, `Notional Pay`, `Share Award`, `GSU Vest`, `Equity Vest` → populate `rsu_vest`.
- A DEDUCTION line of equal-or-similar magnitude nets it back out. Labels: `Shares Retained`, `Stock Tax Withholding`, `RSU Offset`, `Notional Pay Offset`, `Shares Withheld` → populate `rsu_offset`.
If you see ANY of these lines, do NOT add them to `other_deductions` and do NOT let them count as regular income_tax/NI.
If you see either line, populate BOTH fields. Do NOT add them to `other_deductions` and do NOT let them count as regular income_tax/NI even though some templates put them near the tax block. They exist for reporting.
If the payslip has no stock component, leave both as 0.
## Earnings decomposition (v2)
- `salary`: the basic salary/pay line (usually the first "Salary" or "Basic Pay" entry in the Earnings/Payments block).
- `bonus`: the bonus line (`Perform Bonus`, `Bonus`, `Performance Bonus`). If absent or 0, leave as 0 — that's meaningful signal (bonus-sacrifice months). Don't invent.
- `pension_sacrifice`: **ABSOLUTE VALUE** of any NEGATIVE pension line in the Payments block (e.g. `AE Pension EE -600.20` → `600.20`). This is salary-sacrifice and is ALREADY subtracted from Total Payment/gross. Do not also put it in `pension_employee`.
- `pension_employee`: use this ONLY when pension appears as a POSITIVE deduction on the Deductions side (legacy Meta variant A, or non-Meta templates). Never double-count.
- `taxable_pay`: the "Taxable Pay" line in the summary block, THIS PERIOD column. For Meta this is the post-sacrifice + RSU-grossed-up base that PAYE is computed on. If the payslip doesn't surface a summary block, null.
- `ytd_tax_paid`, `ytd_taxable_pay`, `ytd_gross`: YTD column values from the same summary block. Null if not present.
## Fast path: PAYSLIP_TEXT is present
If the prompt contains `PAYSLIP_TEXT:`, the caller has already run `pdftotext -layout`. Skip Steps 1-2 entirely — the text is already in your context. Go straight to Step 3.

@ -34,11 +34,7 @@ You receive these parameters in your invocation:
- **Infra repo**: `/home/wizard/code/infra`
- **Config**: `/home/wizard/code/infra/.claude/reference/upgrade-config.json`
- **Kubeconfig**: `/home/wizard/code/infra/config`
- **Secrets (env-var contract)**: You run in the `claude-agent-service` pod, which has NO Vault CLI auth — do NOT call `vault kv get`. The following env vars are pre-loaded via `envFrom: claude-agent-secrets`:
- `GITHUB_TOKEN` — PAT for GitHub API (changelog fetch) and `git push`
- `WOODPECKER_API_TOKEN` — bearer for `ci.viktorbarzin.me/api/...`
- `SLACK_WEBHOOK_URL` — full Slack webhook URL for status messages
- Anything else (e.g. `kubectl`) uses the pod's ServiceAccount or in-repo git-crypt-unlocked secrets.
- **Vault**: Authenticate with `vault login -method=oidc` if needed. Secrets at `secret/viktor` and `secret/platform`.
- **Git remote**: `origin` → `github.com/ViktorBarzin/infra.git`
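The env-var contract above could be checked once at the start of a run. The following is a minimal sketch, not part of the agent prompt itself: `require_env` is a hypothetical helper name, and it assumes `printenv` is available in the pod image.

```bash
# Hypothetical preflight helper (not from the repo): returns non-zero if any
# named variable is missing from the environment or is empty.
require_env() {
  local missing=0
  local name
  for name in "$@"; do
    # printenv only sees exported variables, which matches values injected via envFrom.
    if [ -z "$(printenv "$name")" ]; then
      echo "missing required env var: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example use before Step 1 (left as a comment so the sketch has no side effects):
# require_env GITHUB_TOKEN WOODPECKER_API_TOKEN SLACK_WEBHOOK_URL || exit 1
```

Failing early here gives a clearer error than a later `curl` call that silently sends an empty token.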
## NEVER Do
@ -122,6 +118,7 @@ cat /home/wizard/code/infra/.claude/reference/upgrade-config.json
3. **For Helm charts**: Check `helm_chart_repo_overrides` for the chart repository URL
4. If auto-detect fails, verify the repo exists:
```bash
GITHUB_TOKEN=$(vault kv get -field=github_pat secret/viktor)
curl -sf -H "Authorization: token $GITHUB_TOKEN" \
"https://api.github.com/repos/${DETECTED_REPO}" > /dev/null
```
@ -131,6 +128,7 @@ cat /home/wizard/code/infra/.claude/reference/upgrade-config.json
## Step 3: Fetch Changelogs via GitHub API
```bash
GITHUB_TOKEN=$(vault kv get -field=github_pat secret/viktor)
curl -s -H "Authorization: token $GITHUB_TOKEN" \
"https://api.github.com/repos/${GITHUB_REPO}/releases?per_page=100"
```
@ -173,9 +171,11 @@ Scan all intermediate release notes for breaking change indicators from the conf
## Step 5: Slack Notification — Starting
```bash
SLACK_WEBHOOK=$(vault kv get -field=alertmanager_slack_api_url secret/platform)
curl -s -X POST -H 'Content-type: application/json' \
--data "{\"text\":\"[Upgrade Agent] Starting: *${STACK}* ${OLD_VERSION} -> ${NEW_VERSION} (risk: ${RISK})\"}" \
"$SLACK_WEBHOOK_URL"
"$SLACK_WEBHOOK"
```
For CAUTION risk, include breaking change excerpts in the Slack message.
@ -266,28 +266,23 @@ UPGRADE_SHA=$(git rev-parse HEAD)
## Step 9: Wait for Woodpecker CI
The commit triggers one pipeline that runs multiple **workflows** in parallel — e.g. `default` (terragrunt apply) and `build-cli` (builds the infra CLI image). Only the `default` workflow gates your upgrade; the other workflows may be unrelated and sometimes fail without breaking anything on the cluster (current example: `build-cli` push to `registry.viktorbarzin.me:5050` is known-broken as of 2026-04-19).
**Do not read the overall pipeline `status`** — it reports `failure` whenever *any* workflow fails. Read the `default` workflow's `state` instead.
The commit triggers the `app-stacks.yml` pipeline (or `default.yml` for platform stacks).
```bash
# Find the pipeline for our commit
curl -s -H "Authorization: Bearer $WOODPECKER_API_TOKEN" \
"https://ci.viktorbarzin.me/api/repos/1/pipelines?page=1&per_page=10" \
| jq --arg sha "$UPGRADE_SHA" '.[] | select(.commit==$sha) | .number'
# → $PIPELINE_NUMBER
# Fetch detail (includes workflows[])
curl -s -H "Authorization: Bearer $WOODPECKER_API_TOKEN" \
"https://ci.viktorbarzin.me/api/repos/1/pipelines/$PIPELINE_NUMBER" \
| jq '.workflows[] | select(.name=="default") | .state'
# → "running" | "pending" | "success" | "failure" | "error" | "killed"
WOODPECKER_TOKEN=$(vault kv get -field=woodpecker_token secret/viktor)
```
Poll every 30 seconds until the `default` workflow's `state` is terminal (`success`, `failure`, `error`, `killed`). Timeout after 15 minutes.
Poll for the pipeline triggered by our commit:
```bash
# Get latest pipeline
curl -s -H "Authorization: Bearer $WOODPECKER_TOKEN" \
"https://ci.viktorbarzin.me/api/repos/1/pipelines?page=1&per_page=5"
```
**If `default` state is `success`** → proceed to Step 10 (verification), regardless of other workflows' state.
**If `default` state is terminal-and-not-success, or the poll times out** → proceed to Step 10b (rollback).
Find the pipeline matching our commit SHA. Poll every 30 seconds until status is `success`, `failure`, `error`, or `killed`. Timeout after 15 minutes.
**If CI fails** → proceed to Step 10 (rollback).
**If CI succeeds** → proceed to verification.
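The polling rule in this step (every 30 seconds, 15-minute timeout, stop on a terminal state) can be sketched as a small loop. This is an illustration under assumptions: `poll_workflow_state` and `is_terminal_state` are hypothetical names, and the fetch command passed in is assumed to print the bare state string (for example by wrapping the `curl` call above with `jq -r` so the value is not quoted).

```bash
# Returns 0 when a state string is one of the terminal states named in this step.
is_terminal_state() {
  case "$1" in
    success|failure|error|killed) return 0 ;;
    *) return 1 ;;
  esac
}

# poll_workflow_state FETCH_CMD [INTERVAL_SECONDS] [MAX_ATTEMPTS]
# FETCH_CMD: any command or function that prints the current state on stdout
# (hypothetical wrapper around the API call; not defined here).
# Prints the final state. Returns 0 on success, 1 on any other terminal state,
# 2 on timeout. Defaults: 30 attempts x 30 s = 15 minutes.
poll_workflow_state() {
  local fetch_cmd="$1"
  local interval="${2:-30}"
  local max_attempts="${3:-30}"
  local attempt=0
  local state=""
  while [ "$attempt" -lt "$max_attempts" ]; do
    state=$("$fetch_cmd") || state=""
    if is_terminal_state "$state"; then
      echo "$state"
      if [ "$state" = "success" ]; then
        return 0
      fi
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  echo "timeout"
  return 2
}
```

A non-zero return (terminal failure or timeout) would map to the rollback path, and a zero return to verification.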
## Step 10: Verify
@ -346,7 +341,7 @@ Re-run verification checks to confirm rollback succeeded. If rollback verificati
```bash
curl -s -X POST -H 'Content-type: application/json' \
--data "{\"text\":\"[Upgrade Agent] CRITICAL: Rollback of *${STACK}* also failed. Manual intervention required.\"}" \
"$SLACK_WEBHOOK_URL"
"$SLACK_WEBHOOK"
```
## Step 11: Report Results
@ -355,14 +350,14 @@ curl -s -X POST -H 'Content-type: application/json' \
```bash
curl -s -X POST -H 'Content-type: application/json' \
--data "{\"text\":\"[Upgrade Agent] SUCCESS: *${STACK}* upgraded ${OLD_VERSION} -> ${NEW_VERSION}\nVerification: pods ready, HTTP OK${UPTIME_KUMA_MSG}\nCommit: ${UPGRADE_SHA}\"}" \
"$SLACK_WEBHOOK_URL"
"$SLACK_WEBHOOK"
```
### On failure + rollback
```bash
curl -s -X POST -H 'Content-type: application/json' \
--data "{\"text\":\"[Upgrade Agent] FAILED + ROLLED BACK: *${STACK}* ${OLD_VERSION} -> ${NEW_VERSION}\nReason: ${FAILURE_REASON}\nRollback commit: ${ROLLBACK_SHA}\nRollback status: ${ROLLBACK_STATUS}\"}" \
"$SLACK_WEBHOOK_URL"
"$SLACK_WEBHOOK"
```
## Edge Cases

1728
.claude/cluster-health.sh Executable file

File diff suppressed because it is too large

@ -7,314 +7,339 @@ description: |
(3) User asks to fix stuck pods, evicted pods, or CrashLoopBackOff,
(4) User mentions "health check", "cluster status", "cluster health",
(5) User asks "is everything running" or "any problems".
Runs 42 cluster-wide checks (nodes, workloads, monitoring, certs,
backups, external reachability) with safe auto-fix for evicted pods.
Runs 8 standard K8s health checks with safe auto-fix for evicted pods
and stuck CrashLoopBackOff pods.
author: Claude Code
version: 2.0.0
date: 2026-04-19
version: 1.0.0
date: 2026-02-21
---
# Cluster Health Check
## MANDATORY: Run the script first
## Overview
When this skill is invoked, your **first action** must be to run the
cluster health check script and reason over its output before doing
anything else. Do not improvise individual `kubectl` calls — the
script is the authoritative surface.
- **Script**: `/workspace/infra/.claude/cluster-health.sh`
- **Schedule**: CronJob runs every 30 minutes in the `openclaw` namespace
- **Slack notifications**: Posts results to the webhook URL in `$SLACK_WEBHOOK_URL`
- **Auto-fix**: Automatically deletes evicted/failed pods and CrashLoopBackOff pods with >10 restarts
- **Exit code**: 0 = healthy, 1 = issues found
## Quick Check
Run the health check interactively:
```bash
cd /home/wizard/code
bash infra/scripts/cluster_healthcheck.sh --json | tee /tmp/cluster-health.json
# Report only, no Slack notification
bash /workspace/infra/.claude/cluster-health.sh --no-slack
# Full run with Slack notification
bash /workspace/infra/.claude/cluster-health.sh
# Report only, no auto-fix and no Slack
bash /workspace/infra/.claude/cluster-health.sh --no-fix --no-slack
```
If the session is rooted elsewhere, fall back to the absolute path:
## What It Checks
```bash
bash /home/wizard/code/infra/scripts/cluster_healthcheck.sh --json
```
Then:
1. Parse the JSON. Report the PASS/WARN/FAIL counts + overall verdict.
2. Iterate every FAIL and WARN check, describe what tripped, and propose
the remediation path (use the recipes below).
3. Only reach for ad-hoc `kubectl` commands when investigating a
specific failure beyond what the script reported.
Exit codes: `0` = healthy, `1` = warnings only, `2` = failures.
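The exit-code convention above can be turned into a verdict string with a small helper. This is a sketch only; `verdict_for_exit_code` is a hypothetical name, not something the script provides.

```bash
# Map the health check exit code to a verdict, following the codes listed above.
verdict_for_exit_code() {
  case "$1" in
    0) echo "healthy" ;;
    1) echo "warnings" ;;
    2) echo "failures" ;;
    *) echo "unknown" ;;
  esac
}

# Example (left as comments because it needs cluster access):
#   bash infra/scripts/cluster_healthcheck.sh --json > /tmp/cluster-health.json
#   verdict_for_exit_code "$?"
```

When the script is piped through `tee`, `$?` reports the exit code of `tee` rather than the script; in bash, `${PIPESTATUS[0]}` holds the script's own code.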
## Quick flags
```bash
# Human-readable report (default), no auto-fix
bash infra/scripts/cluster_healthcheck.sh
# Machine-readable JSON summary
bash infra/scripts/cluster_healthcheck.sh --json
# Only show WARN + FAIL (suppress PASS noise)
bash infra/scripts/cluster_healthcheck.sh --quiet
# Enable auto-fix (delete evicted pods, kick stuck CrashLoop pods)
bash infra/scripts/cluster_healthcheck.sh --fix
# Combined: quiet JSON without auto-fix
bash infra/scripts/cluster_healthcheck.sh --no-fix --quiet --json
# Custom kubeconfig
bash infra/scripts/cluster_healthcheck.sh --kubeconfig /path/to/config
```
## What It Checks (42 checks)
| # | Check | Notes |
|---|-------|-------|
| 1 | Node Status | NotReady nodes, version drift |
| 2 | Node Resources | CPU/mem >80% (warn) / >90% (fail) |
| 3 | Node Conditions | MemoryPressure / DiskPressure / PIDPressure |
| 4 | Problematic Pods | CrashLoopBackOff / Error / ImagePullBackOff |
| 5 | Evicted/Failed Pods | `status.phase=Failed` |
| 6 | DaemonSets | desired == ready |
| 7 | Deployments | ready == desired replicas |
| 8 | PVC Status | all Bound |
| 9 | HPA Health | targets not `<unknown>`, utilization <100% |
| 10 | CronJob Failures | job conditions `Failed=True` in last 24h |
| 11 | CrowdSec Agents | all pods Running |
| 12 | Ingress Routes | every ingress has an LB IP + Traefik LB |
| 13 | Prometheus Alerts | count of firing alerts |
| 14 | Uptime Kuma Monitors | internal + external monitors up |
| 15 | ResourceQuota Pressure | any quota >80% used |
| 16 | StatefulSets | ready == desired |
| 17 | Node Disk Usage | ephemeral-storage <80% |
| 18 | Helm Release Health | all `deployed` (no `pending-*`) |
| 19 | Kyverno Policy Engine | all pods Running |
| 20 | NFS Connectivity | 192.168.1.127 showmount / port 2049 |
| 21 | DNS Resolution | Technitium resolves internal + external |
| 22 | TLS Certificate Expiry | TLS `Secret` certs >30d valid |
| 23 | GPU Health | nvidia namespace + device-plugin Running |
| 24 | Cloudflare Tunnel | pods Running |
| 25 | Resource Usage | node CPU/mem headroom |
| 26 | HA Sofia — Entity Availability | Home Assistant unavailable/unknown count |
| 27 | HA Sofia — Integration Health | config entries setup_error / not_loaded |
| 28 | HA Sofia — Automation Status | disabled / stale (>30d) automations |
| 29 | HA Sofia — System Resources | HA CPU / mem / disk |
| 30 | Hardware Exporters | snmp / idrac-redfish / proxmox / tuya pods + scrapes |
| 31 | cert-manager — Certificate Readiness | Certificate CRs with `Ready!=True` |
| 32 | cert-manager — Certificate Expiry (<14d) | notAfter within 14d |
| 33 | cert-manager — Failed CertificateRequests | `Ready=False, reason=Failed` |
| 34 | Backup Freshness — Per-DB Dumps | MySQL + PG dumps within 25h |
| 35 | Backup Freshness — Offsite Sync | Pushgateway `backup_last_success_timestamp` <27h |
| 36 | Backup Freshness — LVM PVC Snapshots | newest thin snapshot <25h (SSH PVE) |
| 37 | Monitoring — Prometheus + Alertmanager | `/-/ready` + AM pods Running |
| 38 | Monitoring — Vault Sealed Status | `vault status` reports `Sealed: false` |
| 39 | Monitoring — ClusterSecretStore Ready | `vault-kv` + `vault-database` Ready |
| 40 | External — Cloudflared + Authentik Replicas | deployments fully ready |
| 41 | External — ExternalAccessDivergence Alert | alert not firing |
| 42 | External — Traefik 5xx Rate (15m) | top-10 services emitting 5xx |
| # | Check | Auto-Fix | Alerts |
|---|-------|----------|--------|
| 1 | **Node Health** — NotReady nodes, MemoryPressure, DiskPressure, PIDPressure | No | Yes |
| 2 | **Pod Health** — CrashLoopBackOff, ImagePullBackOff, ErrImagePull, Error | Yes (CrashLoop >10 restarts) | Yes |
| 3 | **Evicted/Failed Pods** — Pods in `Failed` phase | Yes (deletes all) | Yes |
| 4 | **Failed Deployments** — Deployments with ready != desired replicas | No | Yes |
| 5 | **Pending PVCs** — PersistentVolumeClaims not in `Bound` state | No | Yes |
| 6 | **Resource Pressure** — Node CPU or memory >80% (warn) or >90% (issue) | No | Yes |
| 7 | **CronJob Failures** — Failed CronJob-owned Jobs in the last 24h | No | Yes |
| 8 | **DaemonSet Health** — DaemonSets with desired != ready | No | Yes |
## Safe Auto-Fix Rules
`--fix` only performs operations that are genuinely reversible and
observable. Nothing here rewrites Terraform state or mutates the cluster
beyond "delete pod".
### Safe to auto-fix (the script does these automatically)
### Done automatically by `--fix`
1. **Evicted/Failed pods** — These are already terminated and just cluttering the namespace:
```bash
kubectl delete pods -A --field-selector=status.phase=Failed
```
- **Evicted / Failed pods** — delete them; the controller recreates.
```bash
kubectl delete pods -A --field-selector=status.phase=Failed
```
- **CrashLoopBackOff pods with >10 restarts** — delete once to reset
backoff timer.
2. **CrashLoopBackOff pods with >10 restarts** — The pod is stuck in a crash loop; deleting lets the controller recreate it with a fresh backoff timer:
```bash
kubectl delete pod -n <namespace> <pod-name> --grace-period=0
```
### NEVER auto-fix (requires human investigation)
- NotReady nodes
- MemoryPressure / DiskPressure / PIDPressure
- ImagePullBackOff (usually a bad tag / registry credential)
- Deployment ready-replica mismatch
- Pending PVCs
- Node CPU/memory >90%
- CronJob failures
- DaemonSet desired != ready
- Vault sealed
- ClusterSecretStore not Ready
- cert-manager Certificate failures
- Backup freshness regressions
- Any external-reachability failure
- **NotReady nodes** — Could be network, kubelet, or hardware issue; needs SSH investigation
- **DiskPressure / MemoryPressure / PIDPressure** — Root cause must be identified
- **ImagePullBackOff** — Usually a wrong image tag or registry issue; needs config fix
- **Failed deployments** — Could be resource limits, bad config, missing secrets
- **Pending PVCs** — Usually NFS export missing or storage class issue
- **Resource pressure >90%** — Need to identify which pods are consuming resources
- **CronJob failures** — Need to check job logs to understand why it failed
- **DaemonSet issues** — Could be node taints, resource limits, or image issues
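To preview which pods the ">10 restarts" CrashLoopBackOff rule would touch before enabling auto-fix, a filter along these lines may help. It is a sketch under assumptions: `crashloop_candidates` is a hypothetical helper, it is not how the script itself selects pods, and it assumes the default column layout of `kubectl get pods -A --no-headers` (NAMESPACE, NAME, READY, STATUS, RESTARTS, ...).

```bash
# Reads `kubectl get pods -A --no-headers` style lines on stdin and prints
# "namespace/pod" for pods in CrashLoopBackOff with more than THRESHOLD restarts.
# THRESHOLD defaults to 10, matching the rule described above.
crashloop_candidates() {
  local threshold="${1:-10}"
  # $4 is STATUS and $5 is the restart count; "+ 0" forces numeric comparison.
  awk -v t="$threshold" '$4 == "CrashLoopBackOff" && ($5 + 0) > t { print $1 "/" $2 }'
}

# Example (left as a comment because it needs cluster access):
#   kubectl get pods -A --no-headers | crashloop_candidates
```

The output is only a list, so it is safe to run as a dry check; deleting the listed pods remains a separate, explicit step.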
## Deep-investigation recipes per failure mode
## Deep Investigation
### Node Issues (checks 1, 3, 17, 25)
When the health check reports issues, use these commands to investigate further.
### Node Issues
```bash
kubectl describe node <node>
# Describe the problematic node (events, conditions, capacity)
kubectl describe node <node-name>
# Check resource usage across all nodes
kubectl top nodes
kubectl get events --field-selector involvedObject.name=<node> --sort-by='.lastTimestamp'
# SSH to the node
ssh root@10.0.20.10X
# Check recent events on a specific node
kubectl get events --field-selector involvedObject.name=<node-name> --sort-by='.lastTimestamp'
# SSH to the node for direct inspection
ssh root@<node-ip>
systemctl status kubelet
journalctl -u kubelet --since "30 minutes ago" | tail -100
df -h ; free -h
df -h
free -h
```
Node IPs: `10.0.20.100` master, `.101` node1 (GPU), `.102` node2,
`.103` node3, `.104` node4.
### Pod Issues (checks 4, 5, 11, 19)
### Pod Issues
```bash
kubectl describe pod -n <ns> <pod>
kubectl logs -n <ns> <pod> --tail=200
kubectl logs -n <ns> <pod> --previous --tail=200
kubectl get events -n <ns> --sort-by='.lastTimestamp' | tail -20
# Describe the pod (events, conditions, container statuses)
kubectl describe pod -n <namespace> <pod-name>
# Check current logs
kubectl logs -n <namespace> <pod-name> --tail=100
# Check logs from the previous crashed container
kubectl logs -n <namespace> <pod-name> --previous --tail=100
# Check events in the namespace
kubectl get events -n <namespace> --sort-by='.lastTimestamp' | tail -20
# Check all pods in a namespace
kubectl get pods -n <namespace> -o wide
```
Common failure causes: OOMKilled (raise mem limit in Terraform), bad
config / missing env var, DB connection failure (check `dbaas` pods),
NFS mount failure (`showmount -e 192.168.1.127`), stale
imagePullSecret.
### Deployment / StatefulSet / DaemonSet (checks 6, 7, 16)
### Deployment Issues
```bash
kubectl describe deployment -n <ns> <name>
kubectl rollout status deployment -n <ns> <name>
kubectl rollout history deployment -n <ns> <name>
kubectl get rs -n <ns> -l app=<app>
# Describe the deployment (strategy, conditions, events)
kubectl describe deployment -n <namespace> <deployment-name>
# Check rollout status
kubectl rollout status deployment -n <namespace> <deployment-name>
# Check rollout history
kubectl rollout history deployment -n <namespace> <deployment-name>
# Check the replicaset
kubectl get rs -n <namespace> -l app=<app-label>
```
### PVC (check 8)
### PVC Issues
```bash
kubectl describe pvc -n <ns> <pvc>
kubectl get events -n <ns> --field-selector reason=FailedMount --sort-by='.lastTimestamp'
kubectl get pv | grep <pvc>
showmount -e 192.168.1.127
# Describe the PVC (events, status, storage class)
kubectl describe pvc -n <namespace> <pvc-name>
# Check PVs
kubectl get pv
# Check events related to PVCs
kubectl get events -n <namespace> --field-selector reason=FailedMount --sort-by='.lastTimestamp'
# Verify NFS export exists
showmount -e 10.0.10.15 | grep <service-name>
```
### cert-manager (checks 31, 32, 33)
### Resource Pressure
```bash
kubectl get certificate -A
kubectl describe certificate -n <ns> <name>
kubectl get certificaterequest -A
kubectl describe certificaterequest -n <ns> <name>
kubectl logs -n cert-manager deploy/cert-manager | tail -50
# Top nodes (CPU and memory usage)
kubectl top nodes
# Top pods sorted by memory (cluster-wide)
kubectl top pods -A --sort-by=memory | head -20
# Top pods sorted by CPU (cluster-wide)
kubectl top pods -A --sort-by=cpu | head -20
# Check resource requests/limits in a namespace
kubectl describe resourcequota -n <namespace>
kubectl describe limitrange -n <namespace>
```
Common causes: ACME HTTP-01 challenge blocked, ClusterIssuer missing
DNS provider secret, rate-limit from Let's Encrypt.
## Common Remediation
### Backups (checks 34, 35, 36)
### Persistent CrashLoopBackOff
```bash
# Per-DB dumps (inside the DB pod)
kubectl exec -n dbaas mysql-standalone-0 -- ls -lah /backup/per-db/
kubectl exec -n dbaas pg-cluster-0 -- ls -lah /backup/per-db/
A pod keeps crashing even after the auto-fix deletes it.
# Pushgateway metrics
kubectl exec -n monitoring deploy/prometheus-server -- \
wget -qO- http://prometheus-prometheus-pushgateway:9091/metrics | \
grep backup_last_success_timestamp
1. **Check logs from the crashed container**:
```bash
kubectl logs -n <namespace> <pod-name> --previous --tail=200
```
# LVM snapshots on PVE host
ssh -o BatchMode=yes root@192.168.1.127 \
'lvs -o lv_name,lv_time,lv_size --noheadings | grep snap'
```
2. **Check the pod description for clues**:
```bash
kubectl describe pod -n <namespace> <pod-name>
```
Look for:
- `OOMKilled` in Last State — the container ran out of memory
- `Error` with exit code 1 — application error (bad config, missing env var, DB connection failure)
- `Error` with exit code 137 — killed by OOM killer or liveness probe
- `Error` with exit code 143 — SIGTERM (graceful shutdown failure)
If offsite sync is stale, the common cause is the
`offsite-sync-backup.service` systemd unit on the PVE host failing.
`ssh root@192.168.1.127 'systemctl status offsite-sync-backup'`.
3. **Common causes**:
- **OOMKilled**: Increase memory limits in Terraform (see below)
- **Bad config**: Check environment variables, secrets, config maps
- **DB connection failure**: Verify the database pod is running (`kubectl get pods -n dbaas`)
- **NFS mount failure**: Verify NFS export exists (`showmount -e 10.0.10.15`)
- **Missing secret**: Check if TLS secret or other secrets exist in the namespace
### Monitoring stack (checks 37, 38, 39)
### OOMKilled
```bash
# Prometheus
kubectl exec -n monitoring deploy/prometheus-server -- wget -qO- http://localhost:9090/-/ready
kubectl logs -n monitoring deploy/prometheus-server --tail=100
The container was killed because it exceeded its memory limit.
# Alertmanager
kubectl get pods -n monitoring | grep alertmanager
kubectl logs -n monitoring -l app=prometheus-alertmanager --tail=100
1. **Check current limits**:
```bash
kubectl describe pod -n <namespace> <pod-name> | grep -A 5 "Limits"
```
# Vault
kubectl exec -n vault vault-0 -- sh -c 'VAULT_ADDR=http://127.0.0.1:8200 vault status'
# If sealed: check raft peers with `vault operator raft list-peers` and unseal.
2. **Fix in Terraform** — Edit `modules/kubernetes/<service>/main.tf` and increase the memory limit:
```hcl
resources {
limits = {
memory = "2Gi" # Increase from current value
}
}
```
# ClusterSecretStore
kubectl get clustersecretstore
kubectl describe clustersecretstore vault-kv vault-database
kubectl logs -n external-secrets deploy/external-secrets --tail=100
```
3. **Apply the change**:
```bash
cd /workspace/infra
terraform apply -target=module.kubernetes_cluster.module.<service> -auto-approve
```
### External reachability (checks 40, 41, 42)
### ImagePullBackOff
```bash
# Cloudflared
kubectl get pods -n cloudflared
kubectl logs -n cloudflared -l app=cloudflared --tail=100
The container image cannot be pulled.
# Authentik
kubectl get pods -n authentik -l app=authentik-server
kubectl logs -n authentik -l app=authentik-server --tail=100
1. **Check the exact error**:
```bash
kubectl describe pod -n <namespace> <pod-name> | grep -A 5 "Events"
```
# ExternalAccessDivergence alert
kubectl exec -n monitoring deploy/prometheus-server -- \
wget -qO- 'http://localhost:9090/api/v1/alerts' | \
python3 -m json.tool | grep -A 5 ExternalAccessDivergence
2. **Common causes**:
- **Wrong image tag**: Verify the tag exists on the registry (Docker Hub, ghcr.io, etc.)
- **Private registry without credentials**: Check if imagePullSecrets are configured
- **Pull-through cache issue**: The registry cache at `10.0.20.10` may have a stale entry
```bash
# Check pull-through cache ports:
# 5000 = docker.io, 5010 = ghcr.io, 5020 = quay.io, 5030 = registry.k8s.io
curl -s http://10.0.20.10:5000/v2/_catalog | python3 -m json.tool
```
- **Registry rate limit**: Docker Hub free tier has pull limits; pull-through cache helps avoid this
# Traefik 5xx — find the hot service
kubectl exec -n monitoring deploy/prometheus-server -- \
wget -qO- 'http://localhost:9090/api/v1/query?query=topk(10,rate(traefik_service_requests_total{code=~%225..%22}%5B15m%5D))' \
| python3 -m json.tool
```
3. **Fix**: Update the image tag in the service's Terraform module and re-apply.
### OOMKilled remediation
### Node NotReady
1. `kubectl describe pod -n <ns> <pod> | grep -A 5 Limits`
2. Edit `infra/modules/kubernetes/<service>/main.tf` and raise
`resources.limits.memory`.
3. `cd /home/wizard/code/infra && scripts/tg apply` (Tier 1) or
`terraform apply -target=module.<service>` as appropriate.
A node has gone NotReady.
### ImagePullBackOff remediation
1. **Check node conditions**:
```bash
kubectl describe node <node-name> | grep -A 20 "Conditions"
```
1. `kubectl describe pod -n <ns> <pod> | grep -A 5 Events`
2. Verify tag exists on the source registry.
3. Check pull-through cache at `10.0.20.10:{5000,5010,5020,5030}`.
4. Update the image tag in Terraform + re-apply.
2. **SSH to the node and check kubelet**:
```bash
ssh root@<node-ip>
systemctl status kubelet
journalctl -u kubelet --since "10 minutes ago" | tail -50
```
### Persistent CrashLoopBackOff after auto-fix
3. **Check resources**:
```bash
# On the node
df -h # Disk space
free -h # Memory
top -bn1 # CPU/processes
```
1. `kubectl logs -n <ns> <pod> --previous --tail=200`
2. `kubectl describe pod -n <ns> <pod>` and check Last State:
- `OOMKilled` → raise memory limit
- Exit code 137 → OOM or probe killed
- Exit code 143 → SIGTERM / graceful shutdown failed
3. Cross-check dbaas + NFS + secrets are healthy.
4. **Node IPs** (for SSH):
- `10.0.20.100` — k8s-master
- `10.0.20.101` — k8s-node1 (GPU)
- `10.0.20.102` — k8s-node2
- `10.0.20.103` — k8s-node3
- `10.0.20.104` — k8s-node4
## Notes on the canonical / hardlink setup
## Slack Webhook
The authoritative copy of this SKILL.md lives at
`/home/wizard/code/.claude/skills/cluster-health/SKILL.md`. A hardlink
at `/home/wizard/code/infra/.claude/skills/cluster-health/SKILL.md`
points to the same inode so infra-rooted sessions also discover the
skill.
The script posts results to the Slack incoming webhook URL in `$SLACK_WEBHOOK_URL`. The message format uses Slack mrkdwn:
- All clear: green checkmark with node/pod count
- Warnings only: warning icon with details
- Issues found: red alert icon with auto-fixes applied and remaining issues
To verify the hardlink is intact:
The webhook URL is passed as an environment variable from `openclaw_skill_secrets` in `terraform.tfvars`.
```bash
stat -c '%i %n' \
/home/wizard/code/.claude/skills/cluster-health/SKILL.md \
/home/wizard/code/infra/.claude/skills/cluster-health/SKILL.md
```
## Infrastructure
Both should print the same inode number. If they diverge (e.g. `git
checkout` replaced the file rather than updating it), re-link:
| Component | Path / Location |
|-----------|----------------|
| Health check script | `/workspace/infra/.claude/cluster-health.sh` (in-pod) or `.claude/cluster-health.sh` (repo) |
| Terraform module | `modules/kubernetes/openclaw/main.tf` |
| CronJob definition | Defined in the OpenClaw Terraform module |
| Existing full healthcheck | `scripts/cluster_healthcheck.sh` (local-only, 24 checks with color output) |
| Infra repo (in pod) | `/workspace/infra` |
| kubectl (in pod) | `/tools/kubectl` |
| terraform (in pod) | `/tools/terraform` |
```bash
ln -f /home/wizard/code/.claude/skills/cluster-health/SKILL.md \
/home/wizard/code/infra/.claude/skills/cluster-health/SKILL.md
```
## Auto-File Incidents for SEV1/SEV2
After running health checks, if **SEV1 or SEV2 issues** are found (node down, multiple services affected, core service outage, or single important service down), auto-file a GitHub Issue:
### Severity Classification
- **SEV1**: Node NotReady, multiple services down, data at risk, core service outage (DNS, auth, ingress, databases)
- **SEV2**: Single non-core service down, degraded performance, persistent CrashLoopBackOff
- **SEV3**: Warnings only, resource pressure <90%, or cosmetic issues. Do NOT auto-file.
### Workflow
1. **Dedup check**: Before filing, query open incidents:
```bash
GITHUB_TOKEN=$(vault kv get -field=github_pat secret/viktor)
curl -s -H "Authorization: token $GITHUB_TOKEN" \
"https://api.github.com/repos/ViktorBarzin/infra/issues?labels=incident&state=open&per_page=50"
```
If an open issue already covers the same service/namespace, **skip filing**.
2. **File the issue** with labels `incident`, `sev1` or `sev2`, `postmortem-required`:
- Title: `[AUTO] <Service/Namespace> — <brief symptom>`
- Body: full diagnostic dump (pod status, events, alerts, node state)
- The issue-automation GHA workflow will trigger the post-mortem pipeline automatically
3. **Auto-close recovered services**: If a service that previously had an auto-filed incident is now healthy:
```bash
# Comment and close
curl -s -X POST -H "Authorization: token $GITHUB_TOKEN" \
"https://api.github.com/repos/ViktorBarzin/infra/issues/<N>/comments" \
-d '{"body": "**Resolved** — Service recovered. Auto-closed by cluster health check."}'
curl -s -X PATCH -H "Authorization: token $GITHUB_TOKEN" \
"https://api.github.com/repos/ViktorBarzin/infra/issues/<N>" \
-d '{"state": "closed"}'
```
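Step 2 of the workflow has no command attached, so here is one possible sketch of the label selection and the filing call. `incident_labels` is a hypothetical helper, and the commented `curl` assumes `jq` is available plus `$title` and `$body` variables prepared by the caller; the endpoint is the standard GitHub "create an issue" REST call.

```bash
# Hypothetical helper: print the label set for an auto-filed incident.
# Returns non-zero for SEV3 (or anything unknown), which must not be filed.
incident_labels() {
  case "$1" in
    sev1|sev2) echo "incident,$1,postmortem-required" ;;
    *)         return 1 ;;
  esac
}

# Filing the issue (needs network access and a token, so left commented out):
# labels=$(incident_labels sev2) && curl -s -X POST \
#   -H "Authorization: token $GITHUB_TOKEN" \
#   "https://api.github.com/repos/ViktorBarzin/infra/issues" \
#   -d "$(jq -n --arg t "$title" --arg b "$body" --arg l "$labels" \
#         '{title: $t, body: $b, labels: ($l | split(","))}')"
incident_labels sev1
```

Gating the `curl` on the helper's exit status keeps the "SEV3 is never filed" rule in one place.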
## Post-Mortem Auto-Suggest
After running a healthcheck, if the cluster has **recovered from an unhealthy state** (previous run showed FAIL items that are now resolved), suggest writing a post-mortem:
> The cluster has recovered from the previous unhealthy state. Would you like me to write a post-mortem? Run `/post-mortem` to generate one.
This ensures incidents are documented while context is fresh.
## Notes
1. This script is designed to run inside the OpenClaw pod where kubectl is pre-configured via the ServiceAccount
2. The full `scripts/cluster_healthcheck.sh` script runs 24 checks and is meant for local interactive use; this skill's script runs 8 core checks optimized for automated CronJob execution
3. When investigating issues interactively, prefer running commands directly rather than re-running the script
4. All Terraform changes must go through the `.tf` files — never use `kubectl apply/edit/patch` for persistent changes


@ -23,14 +23,6 @@ steps:
username: viktorbarzin
password:
from_secret: dockerhub-pat
# Private registry on :5050 requires htpasswd auth since 2026-03-22.
# Without this, buildx pushes the second repo but blob HEAD comes
# back 401 → pipeline fails → CI false-negative (see bd code-12b).
- registry: registry.viktorbarzin.me:5050
username:
from_secret: registry_user
password:
from_secret: registry_password
dockerfile: cli/Dockerfile
context: cli
auto_tag: true


@ -37,12 +37,6 @@ steps:
environment:
SLACK_WEBHOOK:
from_secret: slack_webhook
# Each `- |` command runs in a fresh shell, so we can't rely on an
# `export VAULT_ADDR=...` in the auth command persisting — pin it at
# step level. VAULT_TOKEN is still per-command; we persist it to
# ~/.vault-token (auto-read by `vault` CLI) so downstream commands
# don't need explicit token propagation.
VAULT_ADDR: http://vault-active.vault.svc.cluster.local:8200
commands:
# ── Skip CI commits ──
- |
@ -61,17 +55,9 @@ steps:
# ── Vault auth ──
- |
SA_TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
VAULT_TOKEN=$(curl -s -X POST "$VAULT_ADDR/v1/auth/kubernetes/login" \
export VAULT_ADDR=http://vault-active.vault.svc.cluster.local:8200
export VAULT_TOKEN=$(curl -s -X POST "$VAULT_ADDR/v1/auth/kubernetes/login" \
-d "{\"role\":\"ci\",\"jwt\":\"$SA_TOKEN\"}" | jq -r .auth.client_token)
if [ -z "$VAULT_TOKEN" ] || [ "$VAULT_TOKEN" = "null" ]; then
echo "ERROR: Vault K8s auth failed (role=ci, ns=woodpecker)" >&2
exit 1
fi
# Persist for downstream `- |` blocks (each runs in a fresh shell,
# so exporting VAULT_TOKEN wouldn't help). `vault`, `scripts/tg`,
# and `scripts/state-sync` all fall through to ~/.vault-token when
# the env var is unset.
umask 077; printf '%s' "$VAULT_TOKEN" > "$HOME/.vault-token"
# ── Detect changed stacks ──
- |
@ -137,7 +123,6 @@ steps:
# ── Apply platform stacks (serial, with Vault advisory locks) ──
- |
FAILED_PLATFORM_STACKS=""
if [ -s .platform_apply ]; then
echo "=== Applying platform stacks (serial, locked) ==="
while read -r stack; do
@ -152,7 +137,6 @@ steps:
else
echo "$OUTPUT" | tail -5
echo "[$stack] FAILED (exit $EXIT)"
FAILED_PLATFORM_STACKS="$FAILED_PLATFORM_STACKS $stack"
fi
else
echo "$OUTPUT" | tail -3
@ -160,12 +144,9 @@ steps:
fi
done < .platform_apply
fi
# Deferred until after app stacks so both lists get a chance to run.
echo "$FAILED_PLATFORM_STACKS" > .platform_failed
# ── Apply app stacks (serial, with Vault advisory locks) ──
- |
FAILED_APP_STACKS=""
if [ -s .app_apply ]; then
echo "=== Applying app stacks (serial, locked) ==="
while read -r stack; do
@ -180,7 +161,6 @@ steps:
else
echo "$OUTPUT" | tail -5
echo "[$stack] FAILED (exit $EXIT)"
FAILED_APP_STACKS="$FAILED_APP_STACKS $stack"
fi
else
echo "$OUTPUT" | tail -3
@ -188,15 +168,6 @@ steps:
fi
done < .app_apply
fi
# Fail the step loudly so the pipeline `default` workflow state
# reflects reality — the service-upgrade agent and CI alert cascade
# both rely on this (see bd code-e1x). Lock-skipped stacks are NOT
# counted as failures.
FAILED_PLATFORM=$(cat .platform_failed 2>/dev/null | tr -d ' ')
if [ -n "$FAILED_PLATFORM" ] || [ -n "$FAILED_APP_STACKS" ]; then
echo "=== FAILED STACKS: platform=[$FAILED_PLATFORM ] apps=[$FAILED_APP_STACKS ] ==="
exit 1
fi
# ── Commit and push state changes ──
- |


@ -99,7 +99,7 @@ Terragrunt-based homelab managing a Kubernetes cluster (5 nodes, v1.34.2) on Pro
- `config.tfvars` — non-secret configuration (plaintext)
- `secrets.sops.json` — all secrets (SOPS-encrypted JSON)
- `terraform.tfvars` — legacy secrets file (git-crypt, kept for reference)
- `scripts/cluster_healthcheck.sh`: 42-check cluster health script (nodes, workloads, monitoring, certs, backups, external reachability)
- `scripts/cluster_healthcheck.sh` — 25-check cluster health script
## Storage
- **NFS** (`nfs-proxmox` StorageClass): For app data. Use the `nfs_volume` module, never inline `nfs {}` blocks.
@ -118,20 +118,6 @@ Terragrunt-based homelab managing a Kubernetes cluster (5 nodes, v1.34.2) on Pro
## Shared Variables (never hardcode)
`var.nfs_server` (192.168.1.127), `var.redis_host`, `var.postgresql_host`, `var.mysql_host`, `var.ollama_host`, `var.mail_host`
## Redis Service Naming (read before wiring a new consumer)
The Redis stack (`stacks/redis/`) exposes three distinct entry points. Pick the one that matches the client's connection pattern — the wrong one causes READONLY errors or silent connection drops.
| Endpoint | Port(s) | Use for | Backed by |
|----------|---------|---------|-----------|
| `redis-master.redis.svc.cluster.local` | 6379 (redis), 26379 (sentinel) | **Default for new services.** Write-safe — HAProxy health-checks nodes and routes only to the current master. Matches `var.redis_host`. | `kubernetes_service.redis_master` → HAProxy → Bitnami StatefulSet |
| `redis-node-{0,1,2}.redis-headless.redis.svc.cluster.local` | 26379 | **Long-lived connections (PUBSUB, BLPOP, MONITOR, Sidekiq).** Use a sentinel-aware client with master name `mymaster`. Example: `stacks/nextcloud/chart_values.yaml:32-54`. | Bitnami-created headless service → pod DNS |
| `redis.redis.svc.cluster.local` | 6379 | **Do NOT use.** Helm chart's default service — selector patched by `null_resource.patch_redis_service` to match `redis-haproxy`, so today it behaves like `redis-master`. This patch is load-bearing but temporary; consumers hard-coded on this name are tracked in a beads follow-up (T0). | Bitnami chart (patched) |
**HAProxy's `timeout client 30s` closes idle raw Redis connections** — any client that holds a connection open for pub/sub, blocking commands, or replication streams MUST use the sentinel path. Uptime Kuma's Redis monitor hit this limit and had to be re-pointed at the sentinel endpoint (see memory id=748).
**When onboarding a new service:** start from `redis-master.redis.svc.cluster.local:6379` via `var.redis_host`. Only reach for sentinel discovery if the client library supports it natively (ioredis, redis-py Sentinel, go-redis FailoverClient, Sidekiq `sentinels` array) AND the workload uses long-lived connections.
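The endpoint choice described above can be summarised in a short sketch. Both function names are hypothetical; the hostnames, ports, and the `mymaster` master name come from the table, and `SENTINEL get-master-addr-by-name` is the standard Redis Sentinel command for asking which node is currently master.

```bash
# Hypothetical helper: list the three sentinel endpoints from the table.
sentinel_endpoints() {
  for i in 0 1 2; do
    echo "redis-node-$i.redis-headless.redis.svc.cluster.local:26379"
  done
}

# Hypothetical helper: choose an entry point by connection pattern.
#   long-lived -> sentinel discovery (PUBSUB, BLPOP, MONITOR, Sidekiq)
#   anything else -> HAProxy-fronted master (the default)
redis_entrypoint() {
  case "$1" in
    long-lived) sentinel_endpoints ;;
    *)          echo "redis-master.redis.svc.cluster.local:6379" ;;
  esac
}

# Resolving the current master by hand (needs cluster access):
# redis-cli -h redis-node-0.redis-headless.redis.svc.cluster.local -p 26379 \
#   SENTINEL get-master-addr-by-name mymaster
redis_entrypoint short-lived
```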
## Kyverno Drift Suppression (`# KYVERNO_LIFECYCLE_V1`)
Kyverno's admission webhook mutates every pod with a `dns_config { option { name = "ndots"; value = "2" } }` block (fixes NxDomain search-domain floods — see `k8s-ndots-search-domain-nxdomain-flood` skill). Terraform does not manage that field, so without suppression every pod-owning resource shows perpetual `spec[0].template[0].spec[0].dns_config` drift.

Binary file not shown.


@ -120,31 +120,9 @@ graph TB
### Redis
Single shared cluster for all 17 consumers (Immich, Authentik, Nextcloud, Paperless, Dawarich Sidekiq, Traefik, etc.). HAProxy (3 replicas, PDB minAvailable=2) is the sole client-facing path — clients talk only to `redis-master.redis.svc.cluster.local:6379` and HAProxy health-checks backends via `INFO replication`, routing only to `role:master`.
**Current state (as of 2026-04-19, interim — parallel cluster during rework)**:
| Cluster | Pods | Source | Purpose |
|---|---|---|---|
| Legacy `redis-node-*` | 1 master + 1 replica (2 sentinels) | Bitnami Helm chart v25.3.2 | Serving live traffic via HAProxy |
| New `redis-v2-*` | 3 pods, each co-locating redis + sentinel + exporter | Raw `kubernetes_stateful_set_v1` with `redis:7.4-alpine` | Standing by for REPLICAOF-based cutover |
Both clusters live in the `redis` namespace. See `infra/stacks/redis/modules/redis/main.tf` (end-state; legacy `helm_release.redis` + `kubernetes_stateful_set_v1.redis_v2` coexist until cutover).
**Target architecture (post-cutover)**:
- 3 redis pods + 3 co-located sentinels (quorum=2). Odd sentinel count eliminates split-brain.
- `podManagementPolicy=Parallel` + init container that regenerates `sentinel.conf` on every boot by probing peer sentinels for consensus master. No persistent sentinel runtime state — can't drift out of sync with reality (root cause of 2026-04-19 PM incident).
- redis.conf has `include /shared/replica.conf`; the init container writes either an empty file (master) or `replicaof <master> 6379` (replicas), so pods come up already in the right role — no bootstrap race.
- Memory: master + replicas `requests=limits=768Mi`. Concurrent BGSAVE + AOF-rewrite fork can double RSS via COW, so headroom must cover it. `auto-aof-rewrite-percentage=200` + `auto-aof-rewrite-min-size=128mb` tune down rewrite frequency.
- Persistence: RDB (`save 900 1 / 300 100 / 60 10000`) + AOF `appendfsync=everysec`. Disk-wear analysis on 2026-04-19 (sdb Samsung 850 EVO 1TB, 150 TBW): Redis contributes <1 GB/day cluster-wide, which gives a 40+ year runway at the 20% TBW budget.
- `maxmemory=640mb` (83% of 768Mi limit), `maxmemory-policy=allkeys-lru`.
- Weekly RDB backup to NFS (`/srv/nfs/redis-backup/`, Sunday 03:00, 28-day retention, pushes Pushgateway metrics).
- Auth disabled this phase — NetworkPolicy is the isolation layer. Enabling `requirepass` + rolling creds to all 17 clients is a planned follow-up.
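The memory figures in the list above can be checked with integer arithmetic. This is only a worked example; the variable names are illustrative. Redis treats `mb` as 1024*1024 bytes, so `640mb` and `768Mi` are directly comparable.

```bash
limit_mi=768      # container memory limit (requests=limits)
maxmemory_mb=640  # Redis maxmemory

pct=$(( maxmemory_mb * 100 / limit_mi ))     # integer division, rounds down
headroom_mi=$(( limit_mi - maxmemory_mb ))   # left for fork/COW overhead

echo "maxmemory is ${pct}% of the limit, ${headroom_mi}Mi headroom"
```

The headroom is what has to absorb copy-on-write growth while a BGSAVE or AOF rewrite fork is running.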
**Observability** (redis-v2 only): `oliver006/redis_exporter:v1.62.0` sidecar per pod on port 9121, auto-scraped via Prometheus pod annotation. Alerts: `RedisDown`, `RedisMemoryPressure`, `RedisEvictions`, `RedisReplicationLagHigh`, `RedisForkLatencyHigh`, `RedisAOFRewriteLong`, `RedisReplicasMissing`, `RedisBackupStale`, `RedisBackupNeverSucceeded`.
**Why this design** — three incidents in April 2026 drove the rework: (a) 2026-04-04 service selector routed reads+writes to master+replica causing `READONLY` errors; (b) 2026-04-19 AM master OOMKilled during BGSAVE+PSYNC with the 256Mi limit too tight for a 204 MB working set under COW amplification; (c) 2026-04-19 PM sentinel runtime state drifted (only 2 sentinels, no majority) and routed writes to a slave. See beads epic `code-v2b` for the full plan and linked challenger analyses.
- Shared instance at `redis.redis.svc.cluster.local`
- Used for caching and session storage
- No persistence (ephemeral)
### SQLite (Per-App)


@ -1,6 +1,6 @@
# DNS Architecture
Last updated: 2026-04-19
Last updated: 2026-04-15
## Overview
@ -254,42 +254,27 @@ Config is synced to all 3 Technitium instances by CronJob `technitium-split-hori
## CoreDNS Configuration
CoreDNS is managed via Terraform in `stacks/technitium/modules/technitium/` — the Corefile ConfigMap lives in `main.tf`, and scaling/PDB are in `coredns.tf` (a `kubernetes_deployment_v1_patch` against the kubeadm-managed Deployment).
CoreDNS is managed via a Terraform `kubernetes_config_map` resource in `stacks/technitium/modules/technitium/main.tf`.
```
.:53 {
errors / health / ready
kubernetes cluster.local in-addr.arpa ip6.arpa # K8s service discovery
prometheus :9153 # Metrics
forward . 10.0.20.1 8.8.8.8 1.1.1.1 {
policy sequential # try upstreams in order
health_check 5s # mark unhealthy in 5s
max_fails 2
}
cache {
success 10000 300 6
denial 10000 300 60
serve_stale 86400s # resilience during upstream outage
}
forward . 10.0.20.1 8.8.8.8 1.1.1.1 # pfSense → Google → Cloudflare
cache (success 10000 300, denial 10000 300)
loop / reload / loadbalance
}
viktorbarzin.lan:53 {
template: .*\..*\.viktorbarzin\.lan\.$ → NXDOMAIN # ndots:5 junk filter
forward . 10.96.0.53 { # Technitium ClusterIP
health_check 5s
max_fails 2
}
cache (success 10000 300, denial 10000 300, serve_stale 86400s)
forward . 10.96.0.53 # Technitium ClusterIP
cache (success 10000 300, denial 10000 300)
}
```
**Scaling**: 3 replicas, `required` anti-affinity on `kubernetes.io/hostname` (spread across 3 distinct nodes). PodDisruptionBudget `coredns` with `minAvailable=2`.
**Kyverno ndots injection**: A Kyverno policy injects `ndots:2` on all pods cluster-wide to reduce search domain expansion noise. The template regex is a second layer of defense for any queries that still get expanded.
**Failover behaviour**: With `policy sequential` on the root forward block, CoreDNS tries pfSense first; if `health_check 5s` detects pfSense as down, it fails over to 8.8.8.8 then 1.1.1.1 within ~5s rather than timing out per-query. Combined with `serve_stale`, pods keep resolving cached names for up to 24h even with full upstream failure.
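As a toy model of the failover order (not CoreDNS code), `policy sequential` amounts to "use the first upstream that the health check has not marked down". The sketch below assumes a hypothetical `pick_upstream` function fed with `address:state` pairs in configured order.

```bash
# Toy model of `policy sequential`: return the first upstream marked "up".
# Arguments: "address:state" pairs in the order they appear in `forward`.
pick_upstream() {
  for entry in "$@"; do
    addr=${entry%%:*}
    state=${entry##*:}
    if [ "$state" = "up" ]; then
      echo "$addr"
      return 0
    fi
  done
  return 1   # every upstream is down; only serve_stale can still answer
}

# pfSense down, public resolvers healthy: Google is chosen next.
pick_upstream 10.0.20.1:down 8.8.8.8:up 1.1.1.1:up
```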
## Cloudflare DNS — External Domains
All public domains are under the `viktorbarzin.me` zone. DNS records are **auto-created per service** via the `ingress_factory` module's `dns_type` parameter. A small number of records (Helm-managed ingresses, special cases) remain centrally managed in `config.tfvars`.
@ -375,28 +360,9 @@ Vault DB engine rotates password
| Metric Source | Dashboard | Alerts |
|---------------|-----------|--------|
| Technitium query logs (PostgreSQL) | Grafana `technitium-dns.json` | — |
| CoreDNS Prometheus metrics (:9153) | Grafana CoreDNS dashboard | `CoreDNSErrors`, `CoreDNSForwardFailureRate` |
| Technitium zone-sync CronJob (Pushgateway) | — | `TechnitiumZoneSyncFailed`, `TechnitiumZoneSyncStale`, `TechnitiumZoneCountMismatch` |
| Technitium DNS pod availability | — | `TechnitiumDNSDown` |
| `dns-anomaly-monitor` CronJob (Pushgateway) | — | `DNSQuerySpike`, `DNSQueryRateDropped`, `DNSHighErrorRate` |
| CoreDNS Prometheus metrics (:9153) | Grafana CoreDNS dashboard | — |
| Uptime Kuma | External monitors for all proxied domains | ExternalAccessDivergence (15min) |
### Metrics pushed by `technitium-zone-sync`
The zone-sync CronJob (runs every 30min) pushes the following to the Prometheus Pushgateway under `job=technitium-zone-sync`:
| Metric | Labels | Meaning |
|--------|--------|---------|
| `technitium_zone_sync_status` | — | 0 = last run succeeded, 1 = at least one zone failed to create |
| `technitium_zone_sync_failures` | — | Number of zones that failed to create this run |
| `technitium_zone_sync_last_run` | — | Unix timestamp of last run (used by `TechnitiumZoneSyncStale`) |
| `technitium_zone_count` | `instance=primary\|<replica-host>` | Zone count on each Technitium instance (drives `TechnitiumZoneCountMismatch`) |
### DNS alert rewrites
- `DNSQuerySpike` was previously broken: it compared current queries against `dns_anomaly_avg_queries`, which was computed from a per-pod `/tmp/dns_avg` file. Each CronJob run started with a fresh `/tmp`, so `NEW_AVG == TOTAL_QUERIES` every time and the spike condition could never fire. Rewritten to use `avg_over_time(dns_anomaly_total_queries[1h] offset 15m)` which compares against the actual 1h Prometheus history.
- `DNSQueryRateDropped` (new): fires when query rate drops below 50% of 1h average — upstream clients may be failing to reach Technitium.
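The `DNSQuerySpike` failure mode is easy to reproduce in isolation. The sketch below is a hypothetical reconstruction, not the monitor's actual code: `update_avg` and its averaging formula are assumptions, and only the "state file lives in a fresh `/tmp`" behaviour is taken from the description above.

```bash
# Hypothetical reconstruction: running average persisted in a state file.
update_avg() {
  state_file=$1
  total=$2
  if [ -f "$state_file" ]; then
    prev=$(cat "$state_file")
    avg=$(( (prev + total) / 2 ))
  else
    avg=$total   # no history: the average collapses to the current value
  fi
  echo "$avg" > "$state_file"
  echo "$avg"
}

# Two CronJob runs, each with a fresh /tmp (a new, non-existent state file).
f1=$(mktemp -u); f2=$(mktemp -u)
run1=$(update_avg "$f1" 1000)
run2=$(update_avg "$f2" 9000)   # a 9x spike, yet avg == total again
echo "fresh state: $run1 $run2"

# Same two runs with a persistent state file: the spike is now visible.
f3=$(mktemp -u)
update_avg "$f3" 1000 >/dev/null
run3=$(update_avg "$f3" 9000)
echo "persistent state: $run3"
rm -f "$f1" "$f2" "$f3"
```

With fresh state the computed average always equals the current total, so a "current versus average" comparison can never fire. Moving the baseline into Prometheus history sidesteps the state problem entirely.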
## Troubleshooting
### DNS Not Resolving Internal Domains


@ -1,147 +1,72 @@
# Mail Server Architecture
Last updated: 2026-04-19 (code-yiu Phase 6: MetalLB LB retired; traffic now enters via pfSense HAProxy with PROXY v2)
Last updated: 2026-04-18 (SPF switched to Brevo; DMARC reporting address normalized)
## Overview
Self-hosted email for `viktorbarzin.me` using docker-mailserver 15.0.0 on Kubernetes. Inbound mail arrives directly via MX record to the home IP on port 25. Outbound mail relays through Brevo EU (`smtp-relay.brevo.com:587` — migrated from Mailgun on 2026-04-12; SPF record cut over on 2026-04-18). Roundcubemail provides webmail access. CrowdSec protects SMTP/IMAP from brute-force attacks using real client IPs: pfSense HAProxy injects the PROXY v2 header on each backend connection so the mailserver pod sees the true source IP despite kube-proxy SNAT. See [`runbooks/mailserver-pfsense-haproxy.md`](../runbooks/mailserver-pfsense-haproxy.md) for ops details.
Self-hosted email for `viktorbarzin.me` using docker-mailserver 15.0.0 on Kubernetes. Inbound mail arrives directly via MX record to the home IP on port 25. Outbound mail relays through Brevo EU (`smtp-relay.brevo.com:587` — migrated from Mailgun on 2026-04-12; SPF record cut over on 2026-04-18). Roundcubemail provides webmail access. CrowdSec protects SMTP/IMAP from brute-force attacks using real client IPs via `externalTrafficPolicy: Local` on a dedicated MetalLB IP.
## Architecture Diagram
Two independent paths into the mailserver pod:
- **External** (MX traffic, webmail clients over WAN): Internet → pfSense → HAProxy → NodePort → **alt container ports** (2525/4465/5587/10993) that **require** PROXY v2 framing.
- **Intra-cluster** (Roundcube, E2E probe): same pod, **stock container ports** (25/465/587/993), **no** PROXY framing.
One Deployment, one pod, two sets of Postfix `master.cf` services + Dovecot `inet_listener` blocks, two Kubernetes Services (`mailserver` ClusterIP + `mailserver-proxy` NodePort).
```mermaid
flowchart TB
%% External ingress path
SENDER[Sending MTA<br/>arbitrary public IP] -->|MX lookup + SMTP<br/>:25| MX[mail.viktorbarzin.me<br/>A 176.12.22.76]
MX --> PF[pfSense WAN<br/>vtnet0 192.168.1.2]
PF -->|NAT rdr<br/>WAN:25/465/587/993<br/>→ 10.0.20.1:same| HAP
HAP[pfSense HAProxy<br/>4 TCP frontends on 10.0.20.1<br/>send-proxy-v2 to backends]
HAP -->|round-robin<br/>tcp-check inter 120s| KN{k8s worker<br/>node1..4}
KN -->|NodePort 30125-30128<br/>ETP: Cluster → kube-proxy SNAT| PODEXT
%% Internal ingress path
RC[Roundcubemail pod] -->|SMTP :587 + IMAP :993<br/>no PROXY| SVC[Service mailserver<br/>ClusterIP 10.103.108.x<br/>25/465/587/993]
PROBE[email-roundtrip-monitor<br/>CronJob every 20m] -->|IMAP :993<br/>no PROXY| SVC
SVC -->|kube-proxy routes| PODINT
%% The pod — two listener sets, one process tree
subgraph POD["mailserver pod (docker-mailserver 15.0.0)"]
direction LR
PODEXT[Alt ports<br/>2525 / 4465 / 5587 / 10993<br/><b>PROXY v2 REQUIRED</b><br/>smtpd_upstream_proxy_protocol=haproxy<br/>haproxy = yes]
PODINT[Stock ports<br/>25 / 465 / 587 / 993<br/>PROXY-free]
PODEXT --> POSTFIX
PODINT --> POSTFIX
POSTFIX[Postfix<br/>postscreen + smtpd + cleanup + queue]
POSTFIX --> RSPAMD[Rspamd<br/>spam + DKIM + DMARC]
RSPAMD --> DOVECOT[Dovecot IMAP<br/>LMTP deliver]
DOVECOT --> MAILBOX[(Maildir storage<br/>mailserver-data-encrypted PVC<br/>proxmox-lvm-encrypted LUKS2)]
graph TB
subgraph "Inbound Mail"
SENDER[Sending MTA] -->|MX lookup| MX[mail.viktorbarzin.me:25]
MX -->|176.12.22.76:25| PF[pfSense NAT]
PF -->|10.0.20.202:25| MLB[MetalLB<br/>ETP: Local]
MLB --> POSTFIX[Postfix MTA]
end
%% Outbound
POSTFIX -->|queued mail<br/>SASL + TLS| BREVO[Brevo EU Relay<br/>smtp-relay.brevo.com:587<br/>300/day free tier]
BREVO --> RECIPIENT[External Recipient]
subgraph "Mail Processing"
POSTFIX --> RSPAMD[Rspamd<br/>Spam/DKIM/DMARC]
RSPAMD --> DOVECOT[Dovecot IMAP]
DOVECOT --> MAILBOX[(Mailboxes<br/>proxmox-lvm PVC)]
end
%% Webmail HTTP path
USER[User browser] -->|HTTPS| CF[Cloudflare proxy<br/>mail.viktorbarzin.me]
CF --> TUNNEL[Cloudflared tunnel<br/>pfSense → Traefik]
TUNNEL --> TRAEFIK[Traefik Ingress<br/>Authentik-protected]
TRAEFIK --> RC
subgraph "Outbound Mail"
POSTFIX_OUT[Postfix] -->|SASL + TLS| MAILGUN[Brevo EU Relay<br/>smtp-relay.brevo.com:587]
MAILGUN --> RECIPIENT[Recipient]
end
%% Security
POSTFIX -.->|log stream<br/>real client IPs from PROXY v2| CSAGENT[CrowdSec Agent<br/>postfix + dovecot parsers]
CSAGENT -.-> CSLAPI[CrowdSec LAPI]
CSLAPI -.->|bouncer decisions<br/>ban external IPs| PF
subgraph "Webmail"
USER[User] -->|HTTPS| TRAEFIK[Traefik Ingress]
TRAEFIK --> RC[Roundcubemail]
RC -->|IMAP 993| DOVECOT
RC -->|SMTP 587| POSTFIX_OUT
end
%% Monitoring
PROBE -.->|Brevo HTTP API<br/>triggers external delivery| MX
PROBE -.->|Push on roundtrip success| PUSH[Pushgateway + Uptime Kuma]
subgraph "Security"
MLB -->|Real client IPs| CS_AGENT[CrowdSec Agent<br/>postfix + dovecot parsers]
CS_AGENT --> CS_LAPI[CrowdSec LAPI]
end
classDef extPath fill:#ffedd5,stroke:#ea580c,stroke-width:2px
classDef intPath fill:#dbeafe,stroke:#2563eb,stroke-width:2px
classDef pod fill:#dcfce7,stroke:#15803d
classDef sec fill:#fee2e2,stroke:#dc2626
class SENDER,MX,PF,HAP,KN,PODEXT extPath
class RC,PROBE,SVC,PODINT intPath
class POSTFIX,RSPAMD,DOVECOT,MAILBOX pod
class CSAGENT,CSLAPI sec
subgraph "Monitoring"
PROBE[E2E Roundtrip Probe<br/>CronJob every 20m] -->|Mailgun API| SENDER
PROBE -->|IMAP check| DOVECOT
PROBE --> PUSH[Pushgateway + Uptime Kuma]
DEXP[Dovecot Exporter<br/>:9166] --> PROM[Prometheus]
end
```
### PROXY v2 sequence (external SMTP roundtrip)
Illustrates the wire-level sequence of a Brevo probe email arriving at our MX. Same sequence applies to any external sender.
```mermaid
sequenceDiagram
autonumber
participant C as External MTA<br/>(e.g. Brevo 77.32.148.26)
participant PF as pfSense WAN<br/>192.168.1.2:25
participant HAP as pfSense HAProxy<br/>10.0.20.1:25
participant N as k8s-node:30125<br/>ETP: Cluster
participant P as Postfix postscreen<br/>pod:2525
C->>PF: TCP SYN dst=192.168.1.2:25
PF->>HAP: NAT rdr rewrites dst → 10.0.20.1:25
HAP->>N: TCP connect (src=10.0.20.1, dst=k8s-node:30125)
Note over HAP,N: HAProxy opens a NEW TCP flow<br/>to the backend k8s node.
HAP->>N: PROXY v2 header<br/>(source=77.32.148.26, dest=10.0.20.1)
N->>P: kube-proxy SNAT src=k8s-node IP<br/>forwards PROXY header + payload to pod
P->>P: Parse PROXY v2 header<br/>smtpd_client_addr := 77.32.148.26<br/>(despite kube-proxy SNAT on the wire)
P-->>C: SMTP banner 220 mail.viktorbarzin.me
C-->>P: EHLO / MAIL FROM / RCPT TO / DATA
Note over P,C: Real client IP logged in maillog,<br/>fed to CrowdSec postfix parser.
P->>P: → smtpd → Rspamd → Dovecot → mailbox
```
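To make step 5 of the sequence concrete, the sketch below builds the bytes of a PROXY v2 header for a TCP-over-IPv4 connection and prints them as hex. The layout follows the PROXY protocol v2 specification: a fixed 12-byte signature, a version/command byte (`0x21`), a family/protocol byte (`0x11`), a 2-byte address length (`0x000c`), then source address, destination address, source port, and destination port. The function names are hypothetical, and the source port `54321` is an arbitrary example value.

```bash
# Convert a dotted-quad IPv4 address to 8 hex digits.
# The function body runs in a subshell so the IFS change stays local.
ip_hex() (
  IFS=.
  set -- $1
  printf '%02x%02x%02x%02x' "$1" "$2" "$3" "$4"
)

# Print a PROXY v2 header (TCP over IPv4) as a hex string.
# Arguments: source IP, destination IP, source port, destination port.
proxy_v2_hex() {
  printf '0d0a0d0a000d0a515549540a'   # 12-byte protocol signature
  printf '21'                         # version 2, command PROXY
  printf '11'                         # AF_INET + STREAM (TCP over IPv4)
  printf '000c'                       # 12 bytes of address data follow
  ip_hex "$1"
  ip_hex "$2"
  printf '%04x%04x\n' "$3" "$4"
}

# Header HAProxy would prepend for the Brevo sender in the diagram.
proxy_v2_hex 77.32.148.26 10.0.20.1 54321 25
```

These 28 bytes travel inside the TCP payload, which is why kube-proxy SNAT (which rewrites only the IP header) cannot disturb them.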
## Components
| Component | Version | Location | Purpose |
|-----------|---------|----------|---------|
| docker-mailserver | 15.0.0 | `mailserver` namespace | Postfix MTA + Dovecot IMAP + Rspamd (single container) |
| docker-mailserver | 15.0.0 | `mailserver` namespace | Postfix MTA + Dovecot IMAP + Rspamd |
| Roundcubemail | 1.6.13-apache | `mailserver` namespace | Webmail UI (MySQL-backed) |
| Dovecot Exporter | latest | Sidecar in mailserver pod | Prometheus metrics (port 9166) |
| Rspamd | Built into docker-mailserver | — | Spam filtering, DKIM signing, DMARC verification |
| pfSense HAProxy | 2.9-dev6 (`pfSense-pkg-haproxy-devel`) | pfSense VM | TCP reverse proxy injecting PROXY v2 for external mail |
| Brevo EU (ex-Sendinblue) | SaaS | — | Outbound SMTP relay (300/day free) |
Dovecot exporter was retired in code-1ik (2026-04-19) — `viktorbarzin/dovecot_exporter` speaks the pre-2.3 `old_stats` FIFO protocol which docker-mailserver 15.0.0's Dovecot 2.3.19 no longer emits.
## Port mapping
The mailserver pod exposes **8 TCP listeners**: 4 stock + 4 alt. Two Kubernetes Services front them depending on whether the client can inject PROXY v2.
| Mail protocol | Service port | K8s Service | Container port | NodePort | PROXY v2? | Who uses this path |
|---|---|---|---|---|---|---|
| SMTP (plain + STARTTLS) | 25 | `mailserver` ClusterIP | 25 | — | ❌ stock | Intra-cluster only (not used — internal clients send via 587) |
| SMTPS (implicit TLS) | 465 | `mailserver` ClusterIP | 465 | — | ❌ stock | Intra-cluster (Roundcube rarely uses this) |
| Submission (STARTTLS) | 587 | `mailserver` ClusterIP | 587 | — | ❌ stock | **Roundcube pod** → mailserver.svc:587 |
| IMAPS | 993 | `mailserver` ClusterIP | 993 | — | ❌ stock | **Roundcube pod** + E2E probe → mailserver.svc:993 |
| SMTP | 25 | `mailserver-proxy` NodePort | 2525 | 30125 | ✅ required | External MX traffic via pfSense HAProxy |
| SMTPS | 465 | `mailserver-proxy` NodePort | 4465 | 30126 | ✅ required | External SMTPS submission |
| Submission | 587 | `mailserver-proxy` NodePort | 5587 | 30127 | ✅ required | External STARTTLS submission (mail clients over WAN) |
| IMAPS | 993 | `mailserver-proxy` NodePort | 10993 | 30128 | ✅ required | External IMAPS (mail clients over WAN) |
The alt listeners are set up by:
- **Postfix**: `user-patches.sh` (shipped via ConfigMap `mailserver-user-patches`) appends 3 entries to `master.cf` with `-o postscreen_upstream_proxy_protocol=haproxy` (for 2525) or `-o smtpd_upstream_proxy_protocol=haproxy` (for 4465/5587).
- **Dovecot**: `dovecot.cf` ConfigMap adds a second `inet_listener` inside `service imap-login` with `haproxy = yes`, plus `haproxy_trusted_networks = 10.0.20.0/24` to allow PROXY headers from the k8s node subnet (post kube-proxy SNAT the source IP is always a node IP).
## Mail Flow
### Inbound
```
Internet → MX: mail.viktorbarzin.me (priority 1)
→ A record: 176.12.22.76 (non-proxied Cloudflare DNS-only)
→ pfSense NAT rdr: WAN:{25,465,587,993} → 10.0.20.1:{same}
→ pfSense HAProxy (TCP mode, send-proxy-v2 on backend)
→ k8s-node:{30125..30128} NodePort (mailserver-proxy, ETP: Cluster)
→ kube-proxy → pod alt listener (2525/4465/5587/10993)
→ Postfix postscreen / smtpd / Dovecot parses PROXY v2 header
→ Rspamd (spam + DKIM + DMARC) → Dovecot → mailbox
→ pfSense NAT: port 25 → 10.0.20.202:25
→ MetalLB (dedicated IP, ETP: Local — preserves real client IPs)
→ Postfix → Rspamd (spam + DKIM + DMARC check) → Dovecot → mailbox
```
No backup MX. If the server is down, sender MTAs queue and retry for 4-5 days per SMTP standards (RFC 5321).
@ -189,13 +114,9 @@ Reverse DNS for `176.12.22.76` returns `176-12-22-76.pon.spectrumnet.bg.` (ISP-a
### CrowdSec Integration
- **Collections**: `crowdsecurity/postfix` + `crowdsecurity/dovecot` (installed)
- **Log acquisition**: CrowdSec agents parse mailserver pod logs for brute-force patterns
- **Real client IPs**: pfSense HAProxy injects PROXY v2 header on each backend connection; Postfix (`postscreen_upstream_proxy_protocol=haproxy` / `smtpd_upstream_proxy_protocol=haproxy` on alt ports) + Dovecot (`haproxy = yes` on alt IMAPS listener) parse it to recover the true source IP despite kube-proxy SNAT. Replaces the pre-2026-04-19 MetalLB `10.0.20.202` ETP:Local scheme (see code-yiu)
- **Real client IPs**: `externalTrafficPolicy: Local` on dedicated MetalLB IP `10.0.20.202` preserves original client IPs (not SNATed to node IPs)
- **Decisions**: CrowdSec bans/challenges attackers via firewall bouncer rules
### Fail2ban Disabled (CrowdSec is the Policy)
docker-mailserver ships Fail2ban, but it is explicitly disabled here: `ENABLE_FAIL2BAN = "0"` at [`stacks/mailserver/modules/mailserver/main.tf:68`](../../stacks/mailserver/modules/mailserver/main.tf). CrowdSec is the cluster-wide bouncer for SSH, HTTP, and SMTP/IMAP brute-force defence — it already parses the `postfix` and `dovecot` log streams via the collections listed above and applies decisions at the LB/firewall layer. Enabling Fail2ban in-pod would create a duplicate response path (two systems racing to ban the same IP from different enforcement points), add iptables churn inside the container, and fragment the audit trail across two decision stores. Decision (2026-04-18): keep it disabled; CrowdSec owns this policy.
### Rspamd
- Spam filtering with phishing detection and Oletools
- DKIM signing (selector `mail`, 2048-bit RSA)
@ -218,13 +139,11 @@ anvil_rate_time_unit = 60s
## Monitoring
### E2E Roundtrip Probe
CronJob `email-roundtrip-monitor` (every 20 min, `*/20 * * * *`):
1. Sends test email via **Brevo HTTP API** to `smoke-test@viktorbarzin.me` (Brevo delivers it to our MX over the public internet, exercising the full external-ingress path).
2. Email hits WAN → pfSense HAProxy → k8s-node:30125 → pod :2525 postscreen (PROXY v2) → Postfix → catch-all delivers to `spam@` mailbox.
3. Verifies delivery via IMAP — connects to `mailserver.mailserver.svc.cluster.local:993` (intra-cluster path, no PROXY), searches by UUID marker.
4. Deletes test email, pushes metrics to Pushgateway + Uptime Kuma.
Push secrets (`BREVO_API_KEY`, `EMAIL_MONITOR_IMAP_PASSWORD`) come from ExternalSecret `mailserver-probe-secrets` (synced from Vault `secret/viktor` + `secret/platform.mailserver_accounts`) — see code-39v.
CronJob `email-roundtrip-monitor` (every 10 min):
1. Sends test email via Mailgun HTTP API to `smoke-test@viktorbarzin.me`
2. Email hits MX → Postfix → catch-all delivers to `spam@` mailbox
3. Verifies delivery via IMAP (searches by UUID marker)
4. Deletes test email, pushes metrics to Pushgateway + Uptime Kuma
### Prometheus Alerts
| Alert | Threshold | Severity |
@ -235,13 +154,13 @@ Push secrets (`BREVO_API_KEY`, `EMAIL_MONITOR_IMAP_PASSWORD`) come from External
| EmailRoundtripNeverRun | Metric absent for 40m | warning |
### Uptime Kuma Monitors
- TCP SMTP on `176.12.22.76:25` — full external path (DNS → WAN → pfSense HAProxy → mailserver)
- TCP `mailserver.svc:{587,993}` — intra-cluster ClusterIP path
- TCP `10.0.20.1:{25,993}` — pfSense HAProxy health (post code-yiu Phase 6)
- E2E Push monitor (receives push from `email-roundtrip-monitor` probe)
- TCP SMTP on `176.12.22.76:25` (external, 60s interval)
- TCP IMAP on `10.0.20.202:993` (internal)
- E2E Push monitor (receives push from roundtrip probe)
### Dovecot exporter — retired
`viktorbarzin/dovecot_exporter` was removed in code-1ik (2026-04-19). It spoke the pre-2.3 `old_stats` FIFO protocol; Dovecot 2.3.19 (docker-mailserver 15.0.0) no longer emits that, so the scrape only ever returned `dovecot_up{scope="user"} 0`. If Dovecot metrics become valuable, reach for a 2.3+ compatible exporter (e.g. `jtackaberry/dovecot_exporter`) and re-add the scrape + alerts. The previously-created `mailserver-metrics` ClusterIP Service was also removed.
### Dovecot Exporter
- Sidecar container in mailserver pod, port 9166
- Scraped by Prometheus for IMAP connection metrics
## Terraform
@ -259,20 +178,16 @@ Push secrets (`BREVO_API_KEY`, `EMAIL_MONITOR_IMAP_PASSWORD`) come from External
| `secret/platform` | `mailserver_aliases` | Postfix virtual aliases |
| `secret/platform` | `mailserver_opendkim_key` | DKIM private key |
| `secret/platform` | `mailserver_sasl_passwd` | Brevo relay credentials (`[smtp-relay.brevo.com]:587 <login>:<key>`) |
| `secret/viktor` | `brevo_api_key` | Brevo API key — used by BOTH outbound SMTP SASL (postfix) AND the E2E roundtrip probe (sends external test mail via Brevo HTTP) |
| `secret/viktor` | `mailgun_api_key` | Historical; no longer used by the probe post code-n5l/Phase-5 work. Kept for reference. |
| `secret/viktor` | `mailgun_api_key` | Mailgun API for E2E roundtrip probe (retained for inbound delivery testing only; not used for user mail) |
| `secret/viktor` | `brevo_api_key` | Brevo API key (stored for reference) |
## Storage
| PVC | Size | Storage Class | Purpose |
|-----|------|---------------|---------|
| `mailserver-data-encrypted` | 2Gi (auto-resize 5Gi) | `proxmox-lvm-encrypted` (LUKS2) | Maildir + Postfix queue + state + logs |
| `roundcubemail-html-encrypted` | 1Gi | `proxmox-lvm-encrypted` | Roundcube PHP code + user session data |
| `roundcubemail-enigma-encrypted` | 1Gi | `proxmox-lvm-encrypted` | Roundcube Enigma (PGP) user keys |
| `mailserver-backup-host` (RWX) | 10Gi | `nfs-truenas` | `mailserver-backup` CronJob destination (`/srv/nfs/mailserver-backup/<YYYY-WW>/`) |
| `roundcube-backup-host` (RWX) | 10Gi | `nfs-truenas` | `roundcube-backup` CronJob destination |
**Backup**: daily `mailserver-backup` + `roundcube-backup` CronJobs rsync data PVCs to NFS. NFS directory is picked up by the PVE host's inotify-driven `/usr/local/bin/offsite-sync-backup` which pushes to Synology (weekly). See [Storage & Backup Architecture](storage.md) for the 3-2-1 flow.
| `mailserver-data-proxmox` | 2Gi (auto-resize 5Gi) | proxmox-lvm | Mail data, state, logs |
| `roundcubemail-html-proxmox` | 1Gi | proxmox-lvm | Roundcube web files |
| `roundcubemail-enigma-proxmox` | 1Gi | proxmox-lvm | Roundcube encryption |
## Decisions & Rationale
@ -291,23 +206,19 @@ Push secrets (`BREVO_API_KEY`, `EMAIL_MONITOR_IMAP_PASSWORD`) come from External
- **Decision**: Rspamd replaces both SpamAssassin and OpenDKIM in a single component
- **Tradeoff**: Higher memory usage (~150-200MB) but simpler stack
### Client-IP Preservation (pfSense HAProxy + PROXY v2)
- **Current (2026-04-19, bd code-yiu)**: pfSense HAProxy listens on `10.0.20.1:{25,465,587,993}`, forwards to k8s NodePort 30125-30128 with `send-proxy-v2` on each backend connection. The mailserver pod exposes parallel listeners (2525/4465/5587/10993) that REQUIRE the PROXY v2 header, while the stock ports 25/465/587/993 stay PROXY-free for intra-cluster traffic (Roundcube, probe). The mailserver Service is ClusterIP-only; ETP is no longer a concern for external traffic.
- **Historical (2026-04-12 → 2026-04-19)**: Dedicated MetalLB IP `10.0.20.202` with `externalTrafficPolicy: Local` — required pod/speaker colocation; kube-proxy preserved client IP only when pod was on the same node as the advertising speaker.
- **Why switched**: ETP:Local made the mailserver's single replica drop inbound mail silently during pod reschedule (30-60s GARP flip). HAProxy with `send-proxy-v2` lets the pod reschedule to any node and recover IP-preservation through the header.
- **Tradeoff**: pfSense now runs HAProxy (one more service in the firewall's responsibility); alt container ports + extra Service are ~80 lines of Terraform. The win is HA without IP-preservation compromise.
- **Runbook**: [`runbooks/mailserver-pfsense-haproxy.md`](../runbooks/mailserver-pfsense-haproxy.md).
### Dedicated MetalLB IP for CrowdSec
- **Decision**: Mailserver gets `10.0.20.202` (separate from shared `10.0.20.200`) with `externalTrafficPolicy: Local`
- **Why**: Shared IP with ETP: Cluster SNATs away real client IPs, making CrowdSec detections and Postfix rate limiting useless
- **Tradeoff**: Uses one extra IP from the MetalLB pool. Requires separate pfSense NAT rule.
## Troubleshooting
### Inbound mail not arriving
1. **DNS/MX**: `dig MX viktorbarzin.me +short` → should show `mail.viktorbarzin.me`
2. **WAN reachability**: `nc -zw5 mail.viktorbarzin.me 25` from outside
3. **pfSense NAT**: verify WAN:{25,465,587,993} rdr to `10.0.20.1` (HAProxy VIP). `ssh admin@10.0.20.1 'pfctl -sn' | grep '10.0.20.1'`
4. **HAProxy health**: `ssh admin@10.0.20.1 "echo 'show servers state' | socat /tmp/haproxy.socket stdio"` — at least one backend in `srv_op_state=2` (UP) per pool
5. **Container listener**: `kubectl exec -n mailserver -c docker-mailserver deployment/mailserver -- ss -ltn | grep -E ':(25|2525|465|4465|587|5587|993|10993)\b'` — 8 lines expected
6. **Postfix queue + delivery**: `kubectl logs -n mailserver deploy/mailserver -c docker-mailserver | grep -E 'from=|reject|smtpd-proxy'`
7. **CrowdSec decisions**: `kubectl exec -n crowdsec deploy/crowdsec-lapi -- cscli decisions list`
1. Check MX: `dig MX viktorbarzin.me +short` → should show `mail.viktorbarzin.me`
2. Check port 25: `nc -zw5 mail.viktorbarzin.me 25`
3. Check pfSense NAT rule: port 25 → `10.0.20.202:25`
4. Check Postfix logs: `kubectl logs -n mailserver deploy/mailserver -c docker-mailserver | grep -E 'from=|reject'`
5. Check if CrowdSec is blocking the sender: `kubectl exec -n crowdsec deploy/crowdsec-lapi -- cscli decisions list`
### Outbound mail failing
1. Check Brevo relay: `kubectl logs -n mailserver deploy/mailserver -c docker-mailserver | grep relay` — should show `relay=smtp-relay.brevo.com`


@ -75,9 +75,7 @@ Prometheus scrapes metrics from all cluster components and applications using Se
### External Monitoring
The `external-monitor-sync` CronJob (every 10min, `stacks/uptime-kuma/`) ensures Uptime Kuma has `[External] <service>` monitors for externally-reachable ingresses. Discovery is **opt-OUT**: the script lists every ingress via the K8s API and creates a monitor for any host ending in `.viktorbarzin.me`, skipping only those annotated `uptime.viktorbarzin.me/external-monitor: "false"`. Both `ingress_factory` and the `reverse-proxy` factory emit that annotation when the caller sets `external_monitor = false`; leaving it null keeps the opt-in default (important for helm-provisioned ingresses that don't go through our factories). The legacy `cloudflare_proxied_names` ConfigMap is a fallback if the K8s API discovery fails.
These monitors test the full external access path (DNS → Cloudflare → Tunnel → Traefik → Service) from inside the cluster. The status-page-pusher groups them as "External Reachability" and pushes an `external_internal_divergence_count` metric to Pushgateway when services are externally down but internally up. Alert `ExternalAccessDivergence` fires after 15min of divergence.
The `external-monitor-sync` CronJob (every 10min, `stacks/uptime-kuma/`) ensures Uptime Kuma has `[External] <service>` monitors for every service in `cloudflare_proxied_names`. These monitors test the full external access path (DNS → Cloudflare → Tunnel → Traefik → Service) from inside the cluster. The status-page-pusher groups them as "External Reachability" and pushes an `external_internal_divergence_count` metric to Pushgateway when services are externally down but internally up. Alert `ExternalAccessDivergence` fires after 15min of divergence.
Data flows from targets through Prometheus storage to Grafana dashboards. Applications emit logs to stdout/stderr which are aggregated by Loki and queryable through Grafana's log viewer.


@ -1,203 +0,0 @@
# pfSense HAProxy for Mailserver — Runbook
Last updated: 2026-04-19 (Phase 6 complete)
## What & why
External mail traffic (SMTP/IMAP) requires **real client IP visibility** for
CrowdSec + Postfix rate-limiting. MetalLB cannot inject PROXY-protocol
headers (see [`mailserver-proxy-protocol.md`](./mailserver-proxy-protocol.md)),
so pfSense runs a small HAProxy that:
1. Listens on the pfSense VLAN20 IP (`10.0.20.1`) on all 4 mail ports,
2. Forwards each connection to a k8s node's NodePort with `send-proxy-v2`,
3. Injects PROXY v2 framing so Postfix/Dovecot see the original client IP,
4. TCP health-checks every k8s worker — any node can serve (ETP:Cluster).
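
To make step 3 concrete, here is a minimal sketch of the 28-byte PROXY v2 header that would precede the SMTP/IMAP payload on a TCP-over-IPv4 connection. The helper names (`emit_byte`, `emit_ipv4`, `emit_port`, `proxy_v2_header`) and the sample addresses are illustrative assumptions and do not appear in the pfSense or mailserver config; the byte layout follows the public PROXY protocol v2 specification.

```sh
# Emit one raw byte given its decimal value (0-255).
emit_byte() {
  printf "\\$(printf '%03o' "$1")"
}

# Emit a dotted-quad IPv4 address as 4 raw bytes.
emit_ipv4() {
  old_ifs=$IFS
  IFS=.
  set -- $1            # split the address on dots into $1..$4
  IFS=$old_ifs
  for octet in "$@"; do emit_byte "$octet"; done
}

# Emit a 16-bit value (port or length) in network byte order.
emit_port() {
  emit_byte $(( $1 / 256 ))
  emit_byte $(( $1 % 256 ))
}

# proxy_v2_header SRC_IP DST_IP SRC_PORT DST_PORT
proxy_v2_header() {
  printf '\r\n\r\n\000\r\nQUIT\n'   # fixed 12-byte v2 signature
  emit_byte 33                      # 0x21: version 2, command PROXY
  emit_byte 17                      # 0x11: AF_INET + STREAM (TCP over IPv4)
  emit_port 12                      # length of the address block that follows
  emit_ipv4 "$1"
  emit_ipv4 "$2"
  emit_port "$3"
  emit_port "$4"
}
```

Piping `proxy_v2_header 203.0.113.7 10.0.20.1 54321 25 | od -An -tx1` shows the signature followed by the address block. The alt listeners on 2525/4465/5587/10993 expect framing of this kind before the first protocol byte, which is why the stock PROXY-free ports are kept for intra-cluster clients.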
Corresponding k8s-side setup (`stacks/mailserver/modules/mailserver/`):
- ConfigMap `mailserver-user-patches` (`user-patches.sh`) appends 3 alt
  `master.cf` services to Postfix:
  - `:2525` postscreen (alt :25) with `postscreen_upstream_proxy_protocol=haproxy`
  - `:4465` smtpd (alt :465 SMTPS) with `smtpd_upstream_proxy_protocol=haproxy`
  - `:5587` smtpd (alt :587 submission) with `smtpd_upstream_proxy_protocol=haproxy`
- ConfigMap `mailserver.config` adds Dovecot `inet_listener imaps_proxy` on
  port 10993 with `haproxy = yes` and `haproxy_trusted_networks = 10.0.20.0/24`.
- Service `mailserver-proxy` (NodePort, ETP:Cluster) with 4 NodePorts:
  - `port 25 → targetPort 2525 → nodePort 30125`
  - `port 465 → targetPort 4465 → nodePort 30126`
  - `port 587 → targetPort 5587 → nodePort 30127`
  - `port 993 → targetPort 10993 → nodePort 30128`
- Service `mailserver` (ClusterIP): unchanged stock ports 25/465/587/993
  for intra-cluster clients (Roundcube pod, `email-roundtrip-monitor`
  CronJob). These listeners are PROXY-free.
bd: `code-yiu`.
## Steady-state architecture
```
External mail (WAN) path — PROXY v2
┌──────────────────────────────────────────────────────────────────┐
│  Client (real IP)                                                │
│    │  SMTP/SMTPS/Sub/IMAPS                                       │
│    ▼                                                             │
│  pfSense WAN:{25,465,587,993}                                    │
│    │  NAT rdr → 10.0.20.1:{same}                                 │
│    ▼                                                             │
│  pfSense HAProxy (mode tcp, 4 frontends, 4 backend pools)        │
│    │  send-proxy-v2 + tcp-check inter 120000                     │
│    ▼                                                             │
│  k8s-node<1-4>:{30125..30128}  ← any node (ETP:Cluster)          │
│    │  kube-proxy SNAT (source IP lost on the wire)               │
│    ▼                                                             │
│  mailserver pod :{2525,4465,5587,10993}                          │
│    │  postscreen / smtpd / Dovecot parse PROXY v2 header         │
│    │  → real client IP recovered despite kube-proxy SNAT         │
│    ▼                                                             │
│  CrowdSec + Postfix / Dovecot see the true source IP ✓           │
└──────────────────────────────────────────────────────────────────┘
Intra-cluster path — no PROXY
┌──────────────────────────────────────────────────────────────────┐
│  Roundcube pod / email-roundtrip-monitor CronJob                 │
│    │  SMTP/IMAP                                                  │
│    ▼                                                             │
│  mailserver.mailserver.svc.cluster.local:{25,465,587,993}        │
│    │  ClusterIP (bypasses LoadBalancer/NodePort layer entirely)  │
│    ▼                                                             │
│  mailserver pod stock :{25,465,587,993} (PROXY-free)             │
└──────────────────────────────────────────────────────────────────┘
```
## Validation
```sh
# All HAProxy frontends listening
ssh admin@10.0.20.1 'sockstat -l | grep haproxy'
# Expect: *:25, *:465, *:587, *:993, *:2525 (test port)
# All backend pools healthy
ssh admin@10.0.20.1 "echo 'show servers state' | socat /tmp/haproxy.socket stdio" \
| awk 'NR>1 {print $3, $4, $6}'
# srv_op_state 2 = UP, 0 = DOWN
# Container listens on all 8 ports
kubectl exec -n mailserver -c docker-mailserver deployment/mailserver -- \
ss -ltn | grep -E ':(25|2525|465|4465|587|5587|993|10993)\b'
# pf rdr points at pfSense (10.0.20.1), not <mailserver> alias
ssh admin@10.0.20.1 'pfctl -sn' | grep -E 'port = (25|submission|imaps|smtps)'
# E2E probe — Brevo → external MX :25 → IMAP fetch
kubectl create job --from=cronjob/email-roundtrip-monitor probe-test -n mailserver
kubectl wait --for=condition=complete --timeout=90s job/probe-test -n mailserver
kubectl logs job/probe-test -n mailserver | grep SUCCESS
kubectl delete job probe-test -n mailserver
# Real client IP in maillog post-delivery
kubectl logs -c docker-mailserver deployment/mailserver -n mailserver \
| grep 'smtpd-proxy25.*CONNECT from' | tail -5
# Expect external source IPs (e.g., Brevo 77.32.148.x), NOT 10.0.20.x
```
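
The "8 lines expected" listener check above can be tightened so that it names the port that is missing. The sketch below is one way to do that, assuming the usual `ss -ltn` column layout (state in column 1, local address:port in column 4); `missing_mail_ports` is a hypothetical helper name, not something that exists in the repo.

```sh
# Read `ss -ltn` output on stdin and print every expected mail port that has
# no LISTEN socket, one per line. Prints nothing when all eight are present.
missing_mail_ports() {
  awk -v expected="25 2525 465 4465 587 5587 993 10993" '
    $1 == "LISTEN" {
      # Local address is column 4; the port is whatever follows the last ":".
      n = split($4, parts, ":")
      seen[parts[n]] = 1
    }
    END {
      count = split(expected, want, " ")
      for (i = 1; i <= count; i++)
        if (!(want[i] in seen)) print want[i]
    }
  '
}
```

Usage, under the same assumptions as the commands above: `kubectl exec -n mailserver -c docker-mailserver deployment/mailserver -- ss -ltn | missing_mail_ports`. Empty output means both the stock and the PROXY-speaking listeners are up.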
## Bootstrap / restore from scratch
pfSense HAProxy config lives in `/cf/conf/config.xml` under
`<installedpackages><haproxy>`. That file is scp'd nightly to
`/mnt/backup/pfsense/config-YYYYMMDD.xml` by `scripts/daily-backup.sh`, then
synced to Synology. To rebuild from source of truth (git):
```sh
scp infra/scripts/pfsense-haproxy-bootstrap.php admin@10.0.20.1:/tmp/
ssh admin@10.0.20.1 'php /tmp/pfsense-haproxy-bootstrap.php'
```
The script is idempotent — re-runs reset the mailserver frontends + backends
to the declared state.
Expected output:
```
haproxy_check_and_run rc=OK
```
## Operations
### Change backend k8s node IPs / NodePorts
Edit `infra/scripts/pfsense-haproxy-bootstrap.php`: the `$NODES` array + the
`build_pool()` port arguments. Re-run the bootstrap command above. Don't
hand-edit `/var/etc/haproxy/haproxy.cfg`; it is regenerated from XML on
every apply.
### Check health of backends
```sh
ssh admin@10.0.20.1 "echo 'show servers state' | socat /tmp/haproxy.socket stdio"
```
`srv_op_state=2` means UP, `0` means DOWN.
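
When the pool list is long, it can help to print only the servers that are not UP. A minimal sketch, assuming the standard `show servers state` layout (a version line, a `#` column header, then rows with `be_name` in column 2, `srv_name` in column 4 and `srv_op_state` in column 6); `haproxy_not_up` is a hypothetical helper name.

```sh
# Read `show servers state` output on stdin and print "backend/server state"
# for every server whose srv_op_state (column 6) is not 2 (UP).
haproxy_not_up() {
  awk '
    /^#/ || NF < 6 { next }          # skip the version line and column header
    $6 != 2 { print $2 "/" $4, $6 }  # anything other than 2 is not fully UP
  '
}
```

Usage: `ssh admin@10.0.20.1 "echo 'show servers state' | socat /tmp/haproxy.socket stdio" | haproxy_not_up`. Empty output means every backend server reports UP.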
### View live HAProxy stats (WebUI)
`https://pfsense.viktorbarzin.me` → Services → HAProxy → Stats.
### Reload after config.xml edit
```sh
ssh admin@10.0.20.1 'pfSsh.php playback svc restart haproxy'
```
### Rollback (flip NAT back to MetalLB, post-Phase-6 only partial)
There is no Phase-6 rollback one-liner. Phase 6 removed the MetalLB
LoadBalancer 10.0.20.202 entirely, so un-flipping NAT now would send
traffic to a dead alias. To regress:
1. Re-add `metallb.io/loadBalancerIPs = "10.0.20.202"` + `type = "LoadBalancer"`
+ `external_traffic_policy = "Local"` to `kubernetes_service.mailserver`,
apply.
2. Re-add the `mailserver` host alias in pfSense pointing at 10.0.20.202
(Firewall → Aliases → Hosts).
3. Run `infra/scripts/pfsense-nat-mailserver-haproxy-unflip.php` on pfSense.
For rollback of just the NAT (Phase 4) without touching the Service, only
the third step is needed — but only meaningful BEFORE Phase 6.
### Restore from backup
pfSense config backup is a plain XML file:
```
/mnt/backup/pfsense/config-YYYYMMDD.xml # sda host copy (1.1TB RAID1)
/volume1/Backup/Viki/pve-backup/pfsense/... # Synology offsite
```
Full restore: pfSense WebUI → Diagnostics → Backup & Restore → Upload that
`config.xml`. The `<installedpackages><haproxy>` section is included.
## Phase history (bd code-yiu)
| Phase | Status | Description |
|---|---|---|
| 1a | ✅ commit `ef75c02f` | k8s alt :2525 listener + NodePort Service |
| 2 | ✅ 2026-04-19 | pfSense HAProxy pkg installed (`pfSense-pkg-haproxy-devel-0.63_2`, HAProxy 2.9-dev6) |
| 3 | ✅ commit `ba697b02` | HAProxy config persisted in pfSense XML (bootstrap script + this runbook) |
| 4+5| ✅ commit `9806d515` | 4-port alt listeners + HAProxy frontends for 25/465/587/993 + NAT flip |
| 6 | ✅ this commit | Mailserver Service downgraded LoadBalancer → ClusterIP; `10.0.20.202` released back to MetalLB pool; orphan `mailserver` pfSense alias removed; monitors retargeted |
## Known warts
- HAProxy TCP health-check with `send-proxy-v2` generates `getpeername:
Transport endpoint not connected` warnings on postscreen every check cycle.
Mitigated with `inter 120000` (2 min). To reduce further, switch to
`option smtpchk` — but that requires a separate non-PROXY health-check
port on the pod (not done yet).
- Frontend binds on all pfSense interfaces (`bind :25` instead of
`10.0.20.1:25`). `<extaddr>` is set in XML but pfSense templates it
port-only. Low concern in practice because WAN firewall rules plus the
NAT rdr gate external access; internal VLAN clients SHOULD be able to
reach HAProxy on any pfSense-local IP.
- k8s-node5 doesn't exist — cluster has master + 4 workers. Backend pool
capped at 4 servers.
- Postscreen still logs `improper command pipelining` for legitimate
clients that send `EHLO\r\nQUIT\r\n` as a single TCP write. This is
unchanged pre/post-migration — postscreen's anti-bot heuristic.


@ -1,181 +0,0 @@
# Mailserver PROXY protocol — research & decision
Last updated: 2026-04-18 (original research). **Outcome implemented 2026-04-19 — see [UPDATE](#update-2026-04-19) below.**
> ## UPDATE (2026-04-19)
>
> This doc describes the research that led to the Phase-6 rollout. **Option C
> (pfSense HAProxy + PROXY v2)** was chosen and is now live. Operational
> state, cutover history, bootstrap, and rollback procedures live in
> [`mailserver-pfsense-haproxy.md`](mailserver-pfsense-haproxy.md).
>
> This file is retained as a decision record — it explains *why* Option A
> (pod-pinning via nodeSelector) was rejected mid-session in favour of
> Option C, and documents the MetalLB upstream limitation (PROXY injection
> is explicitly won't-implement). Future debates of "why don't we just pin
> the pod?" should land here first.
## TL;DR
**MetalLB does not and will not inject PROXY protocol headers.** The original plan
(`/home/wizard/.claude/plans/let-s-work-on-linking-temporal-valiant.md`, task
`code-rtb`) assumed MetalLB could be configured to emit PROXY v1/v2 on behalf of
the `mailserver` LoadBalancer Service. That assumption is wrong at the product
level. MetalLB is a control-plane-only announcer (ARP/NDP for L2 mode, BGP for
L3 mode); it never touches the L4 payload.
As a result, there is no single Terraform change that can flip
`externalTrafficPolicy: Local` to `Cluster` on the `mailserver` Service while
preserving the real client IP for Postfix/postscreen and Dovecot. Three
alternative paths exist (see below); none is trivial.
## Environment (verified 2026-04-18)
- **MetalLB version**: `quay.io/metallb/controller:v0.15.3` /
`quay.io/metallb/speaker:v0.15.3` (5 speakers).
- **Advertisement type**: L2Advertisement `default` bound to IPAddressPool
`default` (10.0.20.200–10.0.20.220). No BGPAdvertisements.
- **Service**: `mailserver/mailserver` — type `LoadBalancer`, `loadBalancerIPs:
10.0.20.202`, `externalTrafficPolicy: Local`,
`healthCheckNodePort: 30234`, 5 ports (25, 465, 587, 993, 9166/dovecot-metrics).
- **Pod**: single replica today, RWO PVCs prevent horizontal scale without
further work (`mailserver-data-encrypted`, `mailserver-letsencrypt-encrypted`).
## Why the original plan fails
### MetalLB never touches packets
> *"MetalLB is controlplane only, making it part of the dataplane means we
> would be responsible for the performance of the system, so more bugs to
> fight, I personally don't see that happening."*
> — MetalLB maintainer `champtar`, 2021-01-06
> (issue [#797 — Feature Request: Supporting Proxy Protocol v2](https://github.com/metallb/metallb/issues/797))
Issue #797 is closed as "won't implement". Repeat asks in 2022–2023 got the
same answer. The v0.15.3 API surface confirms this: no
`proxyProtocol`/`haproxy`/`protocol: proxy` field exists on `IPAddressPool`,
`L2Advertisement`, `BGPAdvertisement`, or as a Service annotation.
Only managed-cloud LBs (AWS NLB, Azure LB, OCI, DO, OVH, Scaleway, etc.) offer
PROXY protocol as a tick-box. MetalLB's equivalents are:
| MetalLB feature | Does it preserve client IP? | Comment |
|---|---|---|
| `externalTrafficPolicy: Local` (current) | Yes, via iptables DNAT on the speaker node | Forces pod↔speaker colocation on L2 mode. This is the pain we wanted to avoid. |
| `externalTrafficPolicy: Cluster` | No — kube-proxy SNATs to the node IP | The problem we would re-introduce if we flipped without PROXY injection. |
| PROXY protocol injection | N/A — not implemented | Dead end. |
### The `Local` trap is real, but narrower than it seems
Today's `Local` policy means the ARP announcer node must also host the mailserver
pod. MetalLB always picks a single speaker to advertise the VIP (leader
election per IP), so in practice exactly one node matters at any moment. A pod
rescheduled to a different node silently drops inbound SMTP/IMAP until a GARP
flip or node cordon.
The only pods on our cluster that see this same class of risk are Traefik
(3 replicas + PDB `minAvailable=2`, so 2 of 3 nodes always have a pod) and
mailserver (1 replica). Traefik survives because the pods outnumber the nodes
that could be the speaker at once; the mailserver cannot.
## Alternative paths (ranked by effort)
### Option A — Pin the mailserver pod to a specific node (SIMPLEST)
Add `nodeSelector` on the mailserver Deployment pointing at a label that's also
stamped on the MetalLB speaker we want to advertise the VIP from, and use
MetalLB's [node selector](https://metallb.io/configuration/_advanced_l2_configuration/#specify-network-interfaces-that-lb-ip-can-be-announced-from)
on `L2Advertisement.spec.nodeSelectors` to pin the announcer to the same node.
Trade-offs:
- Zero changes to Postfix/Dovecot configs.
- Keeps `externalTrafficPolicy: Local` — real client IP keeps arriving.
- Loses HA (the whole point of the MetalLB layer) but reflects reality — one
replica, one PVC, no HA today anyway.
- Drain of that node requires a planned cutover, but that's no worse than
today's silent failure mode.
Implementation (~10 lines of Terraform):
```hcl
# In stacks/mailserver/modules/mailserver/main.tf, on the Deployment:
node_selector = { "viktorbarzin.me/mailserver-anchor" = "true" }

# In stacks/platform (or wherever the MetalLB CRs live):
resource "kubernetes_manifest" "mailserver_l2ad" {
  manifest = {
    apiVersion = "metallb.io/v1beta1"
    kind       = "L2Advertisement"
    metadata   = { name = "mailserver", namespace = "metallb-system" }
    spec = {
      ipAddressPools = ["default"]
      nodeSelectors  = [{ matchLabels = { "viktorbarzin.me/mailserver-anchor" = "true" } }]
    }
  }
}
```
Plus a node label via `kubectl label node k8s-node3 viktorbarzin.me/mailserver-anchor=true`.
**Recommendation: this is the shortest path to eliminating the silent-drop
failure mode** without taking on a new proxy tier.
### Option B — Put a HAProxy sidecar in front of Postfix/Dovecot
Stand up an in-cluster HAProxy with PROXY v2 enabled on the frontend and
`send-proxy-v2` on the backend to `mailserver:25/465/587/993`. Expose HAProxy
via a new MetalLB Service with `externalTrafficPolicy: Cluster` + kube-proxy
DSR workaround (still loses client IP at that layer), or run HAProxy on the
host-network of the same node (back to Option A's colocation).
Trade-offs:
- Introduces one more network hop and TLS-termination decision for every
SMTP connect.
- HAProxy needs its own cert rotation (or `tls-passthrough`) — adds moving
parts to an already crowded mailserver module.
- Doesn't actually solve the colocation problem on its own — HAProxy itself
needs to receive the client IP, so we are back to externalTrafficPolicy
constraints for HAProxy.
**Recommendation: avoid unless we also get HA for mailserver itself, which
needs RWX storage + DB split-brain work — out of scope.**
### Option C — Replace MetalLB with a different LB for this Service
Candidates: [kube-vip](https://kube-vip.io/) (supports eBPF-based DSR but not
PROXY injection either), [Cilium LB](https://docs.cilium.io/en/stable/network/lb-ipam/)
(preserves client IP via DSR in hybrid mode), or a dedicated HAProxy running on
pfSense and NAT-forwarding 25/465/587/993 with PROXY headers to a
ClusterIP-exposed mailserver. Cilium requires a CNI migration (we run Calico
today); pfSense HAProxy is genuinely feasible but belongs in a different bd
task.
**Recommendation: track as P3 follow-up under a new bd task if Option A proves
insufficient.**
## Decision
Do nothing in this session beyond this runbook + the bd note. The `code-rtb`
task as written is not executable: MetalLB cannot inject PROXY headers, so
the Postfix/Dovecot config changes the plan proposed would never receive the
header they expect. They would hang waiting for it and then time out (5s per
connection).
Follow-up work filed as bd child tasks (if user wants to pursue):
- **Option A — pin mailserver + L2Advertisement nodeSelectors** (new bd task)
- **Option C — HAProxy on pfSense with PROXY v2 to a ClusterIP** (new bd task)
## References
- [MetalLB issue #797 — Feature Request: Supporting Proxy Protocol v2](https://github.com/metallb/metallb/issues/797) (closed, won't implement)
- [MetalLB PR #796 — Source IP Preservation discussion](https://github.com/metallb/metallb/issues/796)
- Postfix [postscreen_upstream_proxy_protocol](https://www.postfix.org/postconf.5.html#postscreen_upstream_proxy_protocol) — expects the PROXY header *on every incoming connection*; if absent, postscreen drops after `postscreen_upstream_proxy_timeout`.
- Dovecot [haproxy_trusted_networks](https://doc.dovecot.org/settings/core/#core_setting-haproxy_trusted_networks) — treats the header as mandatory for listed source networks.
- Cluster state verified against: `kubectl -n metallb-system get pods`,
`kubectl get ipaddresspools.metallb.io -A`,
`kubectl get l2advertisements.metallb.io -A`,
`kubectl get bgpadvertisements.metallb.io -A`,
`kubectl -n mailserver get svc mailserver -o yaml`.


@ -1,66 +0,0 @@
# NFS Prerequisites for `modules/kubernetes/nfs_volume`
The `nfs_volume` Terraform module creates a `PersistentVolume` pointing at a
path on the Proxmox NFS server (`192.168.1.127`). It does **not** create the
underlying directory on the server.
If the path does not exist, the first pod that tries to mount the resulting
PVC gets stuck in `ContainerCreating` with the kubelet event:
```
MountVolume.SetUp failed for volume "<name>" : mount failed: exit status 32
mount.nfs: mounting 192.168.1.127:/srv/nfs/<path> failed, reason given by
server: No such file or directory
```
## Bootstrap before first apply
Before adding a new `nfs_volume` consumer (backup CronJob, data PV, etc.),
create the export root on the PVE host:
```sh
# Replace <app> with the backup stack name, e.g. mailserver-backup,
# roundcube-backup, immich-backup, etc.
ssh root@192.168.1.127 'mkdir -p /srv/nfs/<app> && chmod 755 /srv/nfs/<app>'
# Confirm exports are live (no change to /etc/exports needed — `/srv/nfs`
# is already exported via the root entry in pve-nfs-exports).
ssh root@192.168.1.127 exportfs -v | grep '/srv/nfs\b'
```
`/srv/nfs` is exported with the root entry. Subdirectories inherit the
export automatically; they just have to exist on disk.
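
Since the only requirement is that the subdirectory exists on disk, a pre-flight check before `terraform apply` can be as small as the sketch below. `missing_nfs_dirs` is a hypothetical helper (not part of the `nfs_volume` module) and is meant to run on the PVE host itself.

```sh
# missing_nfs_dirs EXPORT_ROOT CONSUMER...
# Print the full path of every consumer directory that does not exist yet
# under the export root, one per line. Prints nothing if all are present.
missing_nfs_dirs() {
  root=$1
  shift
  for app in "$@"; do
    [ -d "$root/$app" ] || printf '%s\n' "$root/$app"
  done
}
```

For example, `missing_nfs_dirs /srv/nfs mailserver-backup roundcube-backup` lists the paths that still need the `mkdir -p` step shown above.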
## Known consumers
| Consumer | NFS path | Owning stack |
|--------------------------------|---------------------------------|--------------------------|
| `mailserver-backup` | `/srv/nfs/mailserver-backup` | `stacks/mailserver/` |
| `roundcube-backup` | `/srv/nfs/roundcube-backup` | `stacks/mailserver/` |
| `mysql-backup` | `/srv/nfs/mysql-backup` | `stacks/dbaas/` |
| `postgresql-backup` | `/srv/nfs/postgresql-backup` | `stacks/dbaas/` |
| `vaultwarden-backup` | `/srv/nfs/vaultwarden-backup` | `stacks/vaultwarden/` |
Use `grep -rn 'nfs_volume' infra/stacks/` to find all active consumers.
## Why not auto-create?
Two options were considered for automating this:
1. `null_resource` + `local-exec` SSH `mkdir` in the `nfs_volume` module —
works but adds an SSH dependency to every Terraform run, makes the
module non-hermetic, and fails if the operator does not have SSH to
the PVE host.
2. `nfs-subdir-external-provisioner` — handles subdirs automatically but
changes the PV/PVC shape and would require migrating all existing
consumers.
Neither is worth the churn for a one-time operation per new backup stack.
Document + checklist is the current call; re-evaluate if we start adding
one NFS consumer per week.
## Related tasks
- `code-yo4` — this runbook
- `code-z26` — mailserver backup CronJob (first-time setup hit this)
- `code-1f6` — Roundcube backup CronJob (also hit this)


@ -1,103 +0,0 @@
# Runbook: Proxmox host (pve, 192.168.1.127)
Last updated: 2026-04-19
The Proxmox host is a baremetal hypervisor on the storage LAN
(192.168.1.0/24) with a single IP `192.168.1.127`. It hosts every
Kubernetes node VM and the NFS exports that back PVCs. It does **not**
receive DHCP — its network config is static in
`/etc/network/interfaces` (ifupdown). Because of that, DNS must be
configured manually and stays out of the scope of Kea/DHCP-DDNS.
## DNS configuration
The host uses a plain `/etc/resolv.conf` with two nameservers. No
`systemd-resolved`, no `resolvconf`, no NetworkManager — nothing
manages `/etc/resolv.conf`; it is a regular file owned by root.
### Why plain `/etc/resolv.conf` and not systemd-resolved
1. Installing `systemd-resolved` on an active Proxmox node during
business hours is the kind of change that risks breaking the NFS
server or VM networking. PVE's Debian base does not ship
`systemd-resolved` by default.
2. The ifupdown `/etc/network/interfaces` file does not manage
`/etc/resolv.conf` here — ifupdown's resolvconf integration is
only active if the `resolvconf` package is installed, which it is
not (`dpkg -l resolvconf` returns `un`).
3. A plain file is the simplest mental model and avoids a second
layer of "which tool is running now" confusion during an incident.
If you ever want to migrate to `systemd-resolved`, install the
package, enable the service, symlink `/etc/resolv.conf` to
`/run/systemd/resolve/stub-resolv.conf`, and drop the config in
`/etc/systemd/resolved.conf.d/10-internal-dns.conf` — but do this
during a maintenance window, not reactively.
### Current state
```
# /etc/resolv.conf
search viktorbarzin.lan
nameserver 192.168.1.2
nameserver 94.140.14.14
options timeout:2 attempts:2
```
| Field | Value | Purpose |
|---|---|---|
| Primary | `192.168.1.2` | pfSense LAN interface (dnsmasq forwarder → Technitium LB) — resolves `.viktorbarzin.lan` |
| Fallback | `94.140.14.14` | AdGuard public DNS — recursive only, used if pfSense LAN IP unreachable |
| `search` | `viktorbarzin.lan` | Unqualified names (`technitium`, `idrac`, etc.) resolve against the internal zone |
| `timeout:2 attempts:2` | — | Cap glibc resolver at 2s per server, 2 tries — reasonable fallback latency |
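
Because the resolver tries nameservers in the order they are listed, the order in the file is what makes `192.168.1.2` the primary. A small sketch for checking it; `resolv_nameservers` is a hypothetical helper name and the default path is an assumption that matches the plain-file setup described here.

```sh
# Print the nameserver addresses from a resolv.conf-style file, in file order.
# Defaults to /etc/resolv.conf when no path is given.
resolv_nameservers() {
  awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}
```

Running `resolv_nameservers | head -1` on the host should print the pfSense LAN address; if the public fallback comes first, internal `.viktorbarzin.lan` names will stop resolving.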
### Verification commands
```sh
ssh root@192.168.1.127 '
cat /etc/resolv.conf # should show the two nameservers
dig +short idrac.viktorbarzin.lan # expect an A record (192.168.1.4)
dig +short github.com # expect an A record
'
```
Simulated failover — force the primary unreachable and verify the
fallback answers:
```sh
ssh root@192.168.1.127 '
ip route add blackhole 192.168.1.2
dig +short +time=3 github.com # glibc times out on primary, tries 94.140.14.14 → A record returned
ip route del blackhole 192.168.1.2 # cleanup
'
```
Expected behaviour: the first `dig` prints a warning about the UDP
setup failing for 192.168.1.2 and then prints the GitHub A record
(answered by 94.140.14.14).
## Rollback
A pre-change backup of `/etc/resolv.conf`, `/etc/network/interfaces`,
and `/etc/network/interfaces.d/` lives at
`/root/dns-backups/dns-config-backup-YYYYMMDD-HHMMSS.tar.gz` on the
host. To roll back:
```sh
ssh root@192.168.1.127 '
# pick the backup you want (there may be multiple if this runbook has been applied more than once)
BACKUP=$(ls -t /root/dns-backups/dns-config-backup-*.tar.gz | head -1)
tar -xzf "$BACKUP" -C /
cat /etc/resolv.conf
'
```
No service restart is needed — glibc re-reads `/etc/resolv.conf` per
lookup.
## Related docs
- `docs/architecture/dns.md` — where each resolver IP lives and which
subnet it serves.
- `docs/runbooks/nfs-prerequisites.md` — other operations on this
host; read before adding new NFS exports.


@ -1,147 +0,0 @@
# Runbook: Registry VM (docker-registry, 10.0.20.10)
Last updated: 2026-04-19
The registry VM hosts `registry.viktorbarzin.me` (private Docker
registry, htpasswd-auth, NGINX → registry:2). It is an Ubuntu 24.04
VM on the cluster LAN subnet `10.0.20.0/24`, with a static netplan
config (no DHCP). Because it sits on a subnet that only has pfSense
as its gateway, its DNS must be statically configured.
## DNS configuration
Ubuntu ships `systemd-resolved` and uses netplan to declare per-link
`nameservers`. Netplan writes systemd-networkd or NetworkManager
configs that resolved reads at runtime. There is **no automatic
merging** of netplan DNS with the `[Resolve]` section of
`/etc/systemd/resolved.conf` — per-link settings override the global
ones. So both layers must be in sync:
| Layer | File | Role |
|---|---|---|
| Netplan | `/etc/netplan/50-cloud-init.yaml` | Per-link DNS servers that resolved reports on `Link 2 (eth0)` |
| Resolved global | `/etc/systemd/resolved.conf.d/10-internal-dns.conf` | `Global` scope `DNS=` / `FallbackDNS=` — also shown in `resolvectl status` |
### Current state
`/etc/systemd/resolved.conf.d/10-internal-dns.conf`:
```ini
[Resolve]
DNS=10.0.20.1
FallbackDNS=94.140.14.14
Domains=viktorbarzin.lan
```
`/etc/netplan/50-cloud-init.yaml` (eth0 block, simplified):
```yaml
nameservers:
  addresses:
    - 10.0.20.1
    - 94.140.14.14
  search:
    - viktorbarzin.lan
`resolvectl status` output after the change:
```
Global
resolv.conf mode: stub
Current DNS Server: 10.0.20.1
DNS Servers: 10.0.20.1
Fallback DNS Servers: 94.140.14.14
DNS Domain: viktorbarzin.lan
Link 2 (eth0)
Current Scopes: DNS
Current DNS Server: 10.0.20.1
DNS Servers: 10.0.20.1 94.140.14.14
DNS Domain: viktorbarzin.lan
```
| Field | Value | Purpose |
|---|---|---|
| Primary | `10.0.20.1` | pfSense OPT1 interface (dnsmasq forwarder → Technitium LB) — resolves `.viktorbarzin.lan` |
| Fallback | `94.140.14.14` | AdGuard public DNS — used if pfSense unreachable (e.g., OPT1 flap) |
| Search | `viktorbarzin.lan` | Unqualified names resolve against the internal zone |
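
To confirm that both layers agree after a change, the per-scope server lists can be pulled out of `resolvectl status` and compared. This is a rough sketch that assumes the output layout shown above with each server list on a single line; `scope_dns_servers` is a hypothetical helper, not something shipped on the VM.

```sh
# Hypothetical helper: read `resolvectl status` text on stdin and print the
# "DNS Servers:" value for one scope ("Global" or e.g. "Link 2 (eth0)").
scope_dns_servers() {
    awk -v scope="$1" '
        { line = $0; sub(/^[ \t]+/, "", line) }
        # Scope headers are "Global" or "Link <n> (<ifname>)".
        line == "Global" || line ~ /^Link [0-9]+ \(/ {
            in_scope = (line == scope)
            next
        }
        # Anchored match, so "Fallback DNS Servers:" is not picked up.
        in_scope && line ~ /^DNS Servers:/ {
            sub(/^DNS Servers:[ \t]*/, "", line)
            print line
            exit
        }
    '
}

# resolvectl status | scope_dns_servers "Link 2 (eth0)"
```

With the configuration above, the link scope should list both resolvers, while the global scope lists only the primary because the fallback is reported on its own line.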
### Why this matters for the registry
Container builds on this VM reference `.lan` hostnames (Technitium,
NFS, etc.) and external hostnames (Docker Hub, GHCR). Before the
hardening the netplan had `1.1.1.1` / `8.8.8.8` only, which meant:
1. Internal hostname lookups silently failed (slow timeout) — the
VM could not resolve `idrac.viktorbarzin.lan` or any internal
helper.
2. If Cloudflare's 1.1.1.1 had an outage, the VM would lose DNS
entirely.
With the new config the VM can resolve both zones and keeps working
if the primary DNS server is unreachable.
## Apply / re-apply
```sh
ssh root@10.0.20.10 '
netplan generate
netplan apply
systemctl restart systemd-resolved
resolvectl status | head -20
'
```
`netplan apply` is not disruptive when only `nameservers` change — it
does not bounce the link.
## Verification
```sh
ssh root@10.0.20.10 '
dig +short idrac.viktorbarzin.lan # 192.168.1.4
dig +short github.com # GitHub A record
dig +short registry.viktorbarzin.me # 10.0.20.10 + external A
'
```
Fallback test — blackhole the primary and confirm external lookups
still succeed through 94.140.14.14:
```sh
ssh root@10.0.20.10 '
ip route add blackhole 10.0.20.1
dig +short +time=5 +tries=2 github.com # should still answer
ip route del blackhole 10.0.20.1
'
```
Internal lookups do fail during the blackhole (the fallback is a
public resolver and does not know about the internal zone), which is
expected — the fallback buys availability for external pulls, not
internal hostnames.
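
If the `dig` step fails or the command list is interrupted partway, the blackhole route stays in place. One possible way to guard against that is a wrapper that always removes the route after the test command returns. This is a sketch under stated assumptions: `with_blackhole` and the `IP_CMD` override are hypothetical names introduced here, and the wrapper does not cover a dropped SSH session or a signal.

```sh
# Hypothetical wrapper: run a command while a blackhole route is in place,
# then remove the route whether or not the command succeeded.
# IP_CMD is an assumption added so the wrapper can be exercised without
# root; it defaults to the real `ip` binary.
with_blackhole() {
    target=$1
    shift
    "${IP_CMD:-ip}" route add blackhole "$target" || return 1
    rc=0
    "$@" || rc=$?
    "${IP_CMD:-ip}" route del blackhole "$target"
    # Report the wrapped command's exit status, not the cleanup's.
    return "$rc"
}

# with_blackhole 10.0.20.1 dig +short +time=5 +tries=2 github.com
```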
## Rollback
A pre-change backup of `/etc/resolv.conf`, `/etc/systemd/resolved.conf`,
and `/etc/netplan/` lives at
`/root/dns-backups/dns-config-backup-YYYYMMDD-HHMMSS.tar.gz` on the
VM. To roll back:
```sh
ssh root@10.0.20.10 '
BACKUP=$(ls -t /root/dns-backups/dns-config-backup-*.tar.gz | head -1)
tar -xzf "$BACKUP" -C /
rm -f /etc/systemd/resolved.conf.d/10-internal-dns.conf
netplan apply
systemctl restart systemd-resolved
resolvectl status | head -10
'
```
## Related docs
- `docs/architecture/dns.md` — resolver IP assignments per subnet.
- `.claude/CLAUDE.md` (at repo root) — notes on the private registry
and `containerd` `hosts.toml` redirects.


@ -1,51 +0,0 @@
# Runbook: Applying the Technitium Terraform stack
Last updated: 2026-04-19
The `stacks/technitium/` apply has a **post-apply readiness gate** that asserts all three DNS instances are healthy before the apply is allowed to finish. This runbook explains what it checks, how to interpret failures, and how to override it for emergency maintenance.
## What the gate checks
`stacks/technitium/modules/technitium/readiness.tf` defines `null_resource.technitium_readiness_gate`. It runs after the three Technitium deployments, the DNS LoadBalancer service, and the PDB are applied, and performs:
1. **Rollout status** — `kubectl rollout status deploy/<name> --timeout=180s` for `technitium`, `technitium-secondary`, `technitium-tertiary`. Fails if any deployment has not reached its desired pod count within 180s.
2. **Per-pod API health** — for every pod with label `dns-server=true`, executes `wget http://127.0.0.1:5380/api/stats/get` inside the pod and asserts the response contains `"status":"ok"`. Catches Technitium process hangs that TCP probes miss.
3. **Zone-count parity** — queries `technitium-web`, `technitium-secondary-web`, `technitium-tertiary-web` and counts the zones returned. Fails if the three counts differ, which would mean `technitium-zone-sync` has drifted or a replica has lost state.
The gate is re-run whenever any of the deployment container spec, the CoreDNS Corefile, or the apply timestamp changes (see `triggers` in `readiness.tf`).
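
The API-health and zone-parity checks reduce to two small predicates. The sketch below is illustrative only: the real logic lives in `readiness.tf`, which is not reproduced here, and both function names are hypothetical.

```sh
# Hypothetical helper: succeed only if every argument (one zone count per
# instance) is the same value.
zone_counts_match() {
    if [ "$#" -eq 0 ]; then
        return 1
    fi
    first=$1
    for count in "$@"; do
        if [ "$count" != "$first" ]; then
            return 1
        fi
    done
    return 0
}

# Hypothetical helper: check a /api/stats/get response body for the
# "status":"ok" marker that the gate asserts on.
api_status_ok() {
    case $1 in
        *'"status":"ok"'*) return 0 ;;
        *) return 1 ;;
    esac
}

# zone_counts_match "$primary" "$secondary" "$tertiary" || echo "zone drift"
```

Comparing plain counts is cheap but coarse: two instances with the same number of zones can still differ in content, so a passing parity check is a necessary signal rather than proof of sync.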
## Emergency override
Set `skip_readiness=true` via terragrunt inputs or pass it directly to the Terraform apply:
```bash
cd infra/stacks/technitium
scripts/tg apply -var skip_readiness=true
```
Only use this when you need to land a Terraform change while one Technitium instance is intentionally offline (e.g., you are replacing its PVC, migrating storage, or recovering a corrupted config DB). Re-apply without the flag once the instance is back.
You can also target around the gate during emergency work:
```bash
scripts/tg apply -target=kubernetes_config_map.coredns
```
`-target` bypasses the `depends_on` chain feeding the gate, so a single-resource push does not need the gate to pass.
## Failure modes and responses
| Symptom | Likely cause | Fix |
|---------|--------------|-----|
| `rollout status` times out on one deployment | Pod stuck `Pending` (node pressure / anti-affinity with other dns-server pods) or `ImagePullBackOff` | `kubectl describe pod` for events. If anti-affinity is blocking, confirm 3 nodes are Ready. |
| API check fails on a pod but readiness probe passes | Technitium process hung but port 53 still accepting TCP (liveness probe is `tcp_socket` on :53) | `kubectl delete pod <name>` — deployment will recreate it. |
| Zone count differs between instances | `technitium-zone-sync` CronJob is failing or AXFR is blocked | `kubectl logs -n technitium -l job-name=<latest-zone-sync-job>`. Check `TechnitiumZoneSyncFailed` alert. |
| Gate passes but external clients still cannot resolve | Gate only checks in-pod API and intra-cluster zone parity — external path (LoadBalancer → Technitium pod) is not tested | Run the LAN-client drill in `docs/architecture/dns.md` troubleshooting section. |
## What the gate does NOT check
- External reachability through the LoadBalancer IP `10.0.20.201` (that would require a LAN-side probe).
- CoreDNS health (CoreDNS is patched by `coredns.tf`, not this module's deployments — alerts `CoreDNSErrors` / `CoreDNSForwardFailureRate` catch regressions post-apply).
- Upstream resolver health (covered by `CoreDNSForwardFailureRate`).
For broader end-to-end verification, see `docs/architecture/dns.md` → "Verification" section, or run the Uptime Kuma external DNS probe.
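
A LAN-side probe of the LoadBalancer IP could look roughly like the sketch below. It is an assumption-laden example, not the drill from `docs/architecture/dns.md`: `probe_dns` and the `DIG_CMD` override are hypothetical, and the hostname is only a plausible internal record borrowed from the registry runbook.

```sh
# Hypothetical probe: succeed if the resolver at $1 returns any answer for $2.
# DIG_CMD is an assumption so the function can be tested with a stub;
# it defaults to the real `dig` binary.
probe_dns() {
    server=$1
    name=$2
    answer=$("${DIG_CMD:-dig}" +short +time=3 +tries=1 "@$server" "$name") || return 1
    # dig exits 0 on NXDOMAIN, so an empty answer must be treated as failure.
    if [ -z "$answer" ]; then
        return 1
    fi
    printf '%s\n' "$answer"
}

# From a LAN client (not from inside the cluster):
# probe_dns 10.0.20.201 idrac.viktorbarzin.lan
```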


@ -148,19 +148,10 @@ locals {
# record (either CF-proxied or direct A/AAAA). Explicit bool overrides.
effective_external_monitor = var.external_monitor != null ? var.external_monitor : (var.dns_type != "none")
# Emit the annotation when effective is true (positive signal), or when the
# caller explicitly set external_monitor=false (opt-out). When the caller
# leaves it null AND dns_type="none", emit nothing; the sync script's
# default opt-in (any *.viktorbarzin.me ingress) keeps monitoring services
# that are publicly reachable via routes we don't manage here (e.g.
# helm-provisioned ingresses, services behind cloudflared tunnel with DNS
# set elsewhere).
external_monitor_annotations = local.effective_external_monitor ? merge(
{ "uptime.viktorbarzin.me/external-monitor" = "true" },
var.external_monitor_name != null ? { "uptime.viktorbarzin.me/external-monitor-name" = var.external_monitor_name } : {},
) : (var.external_monitor == false ?
{ "uptime.viktorbarzin.me/external-monitor" = "false" } : {}
)
) : {}
ns_to_group = {
monitoring = "Infrastructure"


@ -1,7 +1,7 @@
#!/usr/bin/env bash
# Cluster health check script.
# Runs 42 diagnostic checks against the Kubernetes cluster and prints
# Runs 30 diagnostic checks against the Kubernetes cluster and prints
# a colour-coded report with PASS / WARN / FAIL for each section.
#
# Usage: ./scripts/cluster_healthcheck.sh [--fix] [--quiet|-q] [--json] [--kubeconfig <path>]
@ -26,7 +26,7 @@ JSON=false
KUBECONFIG_PATH="$(pwd)/config"
KUBECTL=""
JSON_RESULTS=()
TOTAL_CHECKS=42
TOTAL_CHECKS=30
# --- Helpers ---
info() { [[ "$JSON" == true ]] && return 0; echo -e "${BLUE}[INFO]${NC} $*"; }
@ -71,16 +71,14 @@ parse_args() {
while [[ $# -gt 0 ]]; do
case "$1" in
--fix) FIX=true; shift ;;
--no-fix) FIX=false; shift ;;
--quiet|-q) QUIET=true; shift ;;
--json) JSON=true; shift ;;
--kubeconfig) KUBECONFIG_PATH="$2"; shift 2 ;;
-h|--help)
echo "Usage: $0 [--fix|--no-fix] [--quiet|-q] [--json] [--kubeconfig <path>]"
echo "Usage: $0 [--fix] [--quiet|-q] [--json] [--kubeconfig <path>]"
echo ""
echo "Flags:"
echo " --fix Auto-remediate safe issues (delete evicted pods)"
echo " --no-fix Disable auto-remediation (default)"
echo " --quiet, -q Only show WARN and FAIL sections"
echo " --json Machine-readable JSON output"
echo " --kubeconfig PATH Override kubeconfig (default: \$(pwd)/config)"
@ -1752,593 +1750,6 @@ else:
json_add "hardware_exporters" "$status" "${detail:-All healthy}"
}
# --- 31. cert-manager: Certificate Readiness ---
check_cert_manager_certificates() {
section 31 "cert-manager — Certificate Readiness"
local certs not_ready detail="" status="PASS"
certs=$($KUBECTL get certificates.cert-manager.io -A -o json 2>/dev/null) || {
warn "cert-manager CRDs not installed or inaccessible"
json_add "certmanager_certificates" "WARN" "CRDs unavailable"
return 0
}
not_ready=$(echo "$certs" | python3 -c '
import json, sys
data = json.load(sys.stdin)
for item in data.get("items", []):
    ns = item["metadata"]["namespace"]
    name = item["metadata"]["name"]
    conds = item.get("status", {}).get("conditions", [])
    ready = next((c for c in conds if c.get("type") == "Ready"), None)
    if not ready or ready.get("status") != "True":
        reason = ready.get("reason", "NoCondition") if ready else "NoCondition"
        print(f"{ns}/{name}:{reason}")
' 2>/dev/null) || true
if [[ -z "$not_ready" ]]; then
pass "All Certificate CRs Ready"
json_add "certmanager_certificates" "PASS" "All Ready"
else
[[ "$QUIET" == true ]] && section_always 31 "cert-manager — Certificate Readiness"
local count
count=$(count_lines "$not_ready")
while IFS= read -r line; do
fail "Certificate not Ready: $line"
detail+="$line; "
done <<< "$not_ready"
status="FAIL"
json_add "certmanager_certificates" "$status" "$count not Ready: $detail"
fi
}
# --- 32. cert-manager: Certificate Expiry (<14d) ---
check_cert_manager_expiry() {
section 32 "cert-manager — Certificate Expiry (<14d)"
local certs expiring detail="" status="PASS"
certs=$($KUBECTL get certificates.cert-manager.io -A -o json 2>/dev/null) || {
warn "cert-manager CRDs not installed or inaccessible"
json_add "certmanager_expiry" "WARN" "CRDs unavailable"
return 0
}
expiring=$(echo "$certs" | python3 -c '
import json, sys
from datetime import datetime, timezone, timedelta
data = json.load(sys.stdin)
cutoff = datetime.now(timezone.utc) + timedelta(days=14)
for item in data.get("items", []):
    ns = item["metadata"]["namespace"]
    name = item["metadata"]["name"]
    not_after = item.get("status", {}).get("notAfter")
    if not not_after:
        continue
    try:
        expiry = datetime.fromisoformat(not_after.replace("Z", "+00:00"))
        if expiry < cutoff:
            days = (expiry - datetime.now(timezone.utc)).days
            level = "FAIL" if days <= 3 else "WARN"
            print(f"{level}:{ns}/{name}:{days}")
    except ValueError:
        pass
' 2>/dev/null) || true
if [[ -z "$expiring" ]]; then
pass "No Certificate CRs expiring within 14 days"
json_add "certmanager_expiry" "PASS" "None expiring <14d"
else
[[ "$QUIET" == true ]] && section_always 32 "cert-manager — Certificate Expiry (<14d)"
while IFS= read -r line; do
local level cert_name days
level=$(echo "$line" | cut -d: -f1)
cert_name=$(echo "$line" | cut -d: -f2)
days=$(echo "$line" | cut -d: -f3)
if [[ "$level" == "FAIL" ]]; then
fail "Certificate $cert_name expires in ${days}d"
status="FAIL"
else
warn "Certificate $cert_name expires in ${days}d"
[[ "$status" != "FAIL" ]] && status="WARN"
fi
detail+="$cert_name=${days}d; "
done <<< "$expiring"
json_add "certmanager_expiry" "$status" "$detail"
fi
}
# --- 33. cert-manager: Failed CertificateRequests ---
check_cert_manager_requests() {
section 33 "cert-manager — Failed CertificateRequests"
local requests failed detail="" status="PASS"
requests=$($KUBECTL get certificaterequests.cert-manager.io -A -o json 2>/dev/null) || {
warn "cert-manager CRDs not installed or inaccessible"
json_add "certmanager_requests" "WARN" "CRDs unavailable"
return 0
}
failed=$(echo "$requests" | python3 -c '
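import json, sys
data = json.load(sys.stdin)
for item in data.get("items", []):
    ns = item["metadata"]["namespace"]
    name = item["metadata"]["name"]
    conds = item.get("status", {}).get("conditions", [])
    for c in conds:
        if c.get("type") == "Ready" and c.get("status") == "False" and c.get("reason") == "Failed":
            # Build the message outside the f-string: backslash-escaped
            # quotes inside an f-string expression are a SyntaxError, which
            # the 2>/dev/null below would hide and report as a PASS.
            msg = c.get("message", "")[:80]
            print(f"{ns}/{name}:{msg}")
            break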
' 2>/dev/null) || true
if [[ -z "$failed" ]]; then
pass "No failed CertificateRequests"
json_add "certmanager_requests" "PASS" "None failed"
else
[[ "$QUIET" == true ]] && section_always 33 "cert-manager — Failed CertificateRequests"
local count
count=$(count_lines "$failed")
while IFS= read -r line; do
fail "CertificateRequest failed: $line"
detail+="$line; "
done <<< "$failed"
status="FAIL"
json_add "certmanager_requests" "$status" "$count failed: $detail"
fi
}
# --- 34. Backup Freshness: Per-DB Dumps ---
check_backup_per_db() {
section 34 "Backup Freshness — Per-DB Dumps"
local detail="" had_issue=false status="PASS"
# Freshness threshold: 25 hours
local now_epoch max_age_sec
now_epoch=$(date -u +%s)
max_age_sec=$((25 * 3600))
_check_cronjob_fresh() {
local ns="$1" cj="$2" label="$3"
local ts age_sec
ts=$($KUBECTL get cronjob -n "$ns" "$cj" -o jsonpath='{.status.lastSuccessfulTime}' 2>/dev/null || true)
if [[ -z "$ts" ]]; then
[[ "$had_issue" == false && "$QUIET" == true ]] && section_always 34 "Backup Freshness — Per-DB Dumps"
fail "$label: CronJob $ns/$cj has no lastSuccessfulTime"
detail+="${label}=no-success; "
had_issue=true
status="FAIL"
return 0
fi
local ts_epoch
ts_epoch=$(date -u -d "$ts" +%s 2>/dev/null || echo 0)
age_sec=$((now_epoch - ts_epoch))
if [[ "$age_sec" -gt "$max_age_sec" ]]; then
[[ "$had_issue" == false && "$QUIET" == true ]] && section_always 34 "Backup Freshness — Per-DB Dumps"
local age_h=$((age_sec / 3600))
fail "$label: last success ${age_h}h ago (>25h)"
detail+="${label}=${age_h}h; "
had_issue=true
status="FAIL"
else
local age_h=$((age_sec / 3600))
detail+="${label}=${age_h}h; "
fi
}
_check_cronjob_fresh dbaas mysql-backup-per-db mysql
_check_cronjob_fresh dbaas postgresql-backup-per-db pg
[[ "$had_issue" == false ]] && pass "Per-DB dumps fresh — $detail"
json_add "backup_per_db" "$status" "$detail"
}
# --- 35. Backup Freshness: Offsite Sync ---
check_backup_offsite_sync() {
section 35 "Backup Freshness — Offsite Sync"
local metrics detail="" status="PASS"
metrics=$($KUBECTL exec -n monitoring deploy/prometheus-server -- \
wget -qO- "http://prometheus-prometheus-pushgateway:9091/metrics" 2>/dev/null || true)
if [[ -z "$metrics" ]]; then
[[ "$QUIET" == true ]] && section_always 35 "Backup Freshness — Offsite Sync"
warn "Cannot query Pushgateway"
json_add "backup_offsite_sync" "WARN" "Pushgateway unreachable"
return 0
fi
local age_hours
age_hours=$(echo "$metrics" | python3 -c '
import sys, re, time
ts = None
for line in sys.stdin:
    if line.startswith("#"):
        continue
    if "backup_last_success_timestamp" in line and "offsite-backup-sync" in line:
        m = re.search(r"\s([0-9.eE+]+)\s*$", line.strip())
        if m:
            try:
                ts = float(m.group(1))
                break
            except ValueError:
                pass
if ts is None:
    print("missing")
else:
    age = (time.time() - ts) / 3600
    print(f"{age:.1f}")
' 2>/dev/null) || age_hours="error"
if [[ "$age_hours" == "missing" ]]; then
[[ "$QUIET" == true ]] && section_always 35 "Backup Freshness — Offsite Sync"
fail "backup_last_success_timestamp metric missing for offsite-backup-sync"
json_add "backup_offsite_sync" "FAIL" "Metric missing"
elif [[ "$age_hours" == "error" ]]; then
[[ "$QUIET" == true ]] && section_always 35 "Backup Freshness — Offsite Sync"
warn "Failed to parse Pushgateway metric"
json_add "backup_offsite_sync" "WARN" "Parse error"
else
local age_int
age_int=$(printf '%.0f' "$age_hours")
if [[ "$age_int" -gt 27 ]]; then
[[ "$QUIET" == true ]] && section_always 35 "Backup Freshness — Offsite Sync"
fail "Offsite sync last success ${age_hours}h ago (>27h)"
status="FAIL"
else
pass "Offsite sync last success ${age_hours}h ago"
fi
detail="age=${age_hours}h"
json_add "backup_offsite_sync" "$status" "$detail"
fi
}
# --- 36. Backup Freshness: LVM PVC Snapshots ---
check_backup_lvm_snapshots() {
section 36 "Backup Freshness — LVM PVC Snapshots"
local snap_output detail="" status="PASS"
snap_output=$(ssh -o BatchMode=yes -o ConnectTimeout=5 -o StrictHostKeyChecking=no \
root@192.168.1.127 "lvs -o lv_name,lv_time --noheadings 2>/dev/null | grep -- -snap" 2>/dev/null || true)
if [[ -z "$snap_output" ]]; then
[[ "$QUIET" == true ]] && section_always 36 "Backup Freshness — LVM PVC Snapshots"
warn "No LVM PVC snapshots found or SSH to 192.168.1.127 failed (BatchMode)"
json_add "backup_lvm_snapshots" "WARN" "SSH failed or no snapshots"
return 0
fi
local newest_age_hours
newest_age_hours=$(echo "$snap_output" | python3 -c '
import sys, re, time
from datetime import datetime
newest = None
for line in sys.stdin:
    line = line.strip()
    if not line:
        continue
    parts = line.split(None, 1)
    if len(parts) < 2:
        continue
    date_str = parts[1].strip()
    # lv_time format: "2026-04-19 03:00:01 +0000" or similar
    for fmt in ("%Y-%m-%d %H:%M:%S %z", "%Y-%m-%d %H:%M:%S"):
        try:
            dt = datetime.strptime(date_str, fmt)
            ts = dt.timestamp()
            if newest is None or ts > newest:
                newest = ts
            break
        except ValueError:
            continue
if newest is None:
    print("parse_error")
else:
    age = (time.time() - newest) / 3600
    print(f"{age:.1f}")
' 2>/dev/null) || newest_age_hours="error"
if [[ "$newest_age_hours" == "parse_error" || "$newest_age_hours" == "error" ]]; then
[[ "$QUIET" == true ]] && section_always 36 "Backup Freshness — LVM PVC Snapshots"
warn "Could not parse LVM snapshot timestamps"
json_add "backup_lvm_snapshots" "WARN" "Parse error"
else
local count age_int
count=$(count_lines "$snap_output")
age_int=$(printf '%.0f' "$newest_age_hours")
if [[ "$age_int" -gt 25 ]]; then
[[ "$QUIET" == true ]] && section_always 36 "Backup Freshness — LVM PVC Snapshots"
fail "Newest LVM snapshot ${newest_age_hours}h old (>25h); $count total"
status="FAIL"
else
pass "LVM snapshots fresh — $count total, newest ${newest_age_hours}h old"
fi
detail="count=$count newest=${newest_age_hours}h"
json_add "backup_lvm_snapshots" "$status" "$detail"
fi
}
# --- 37. Monitoring: Prometheus + Alertmanager ---
check_monitoring_prom_am() {
section 37 "Monitoring — Prometheus + Alertmanager"
local detail="" had_issue=false status="PASS"
# Prometheus /-/ready
local prom_ready
prom_ready=$($KUBECTL exec -n monitoring deploy/prometheus-server -- \
wget -qO- "http://localhost:9090/-/ready" 2>/dev/null || true)
if echo "$prom_ready" | grep -qi "ready"; then
detail+="prometheus=ready; "
else
[[ "$had_issue" == false && "$QUIET" == true ]] && section_always 37 "Monitoring — Prometheus + Alertmanager"
fail "Prometheus /-/ready returned no Ready response"
detail+="prometheus=not-ready; "
had_issue=true
status="FAIL"
fi
# Alertmanager running pod count
local am_running
am_running=$($KUBECTL get pods -n monitoring --no-headers 2>/dev/null | \
grep alertmanager | awk '$3 == "Running"' | wc -l | tr -d ' ')
if [[ "$am_running" -gt 0 ]]; then
detail+="alertmanager=${am_running} running; "
else
[[ "$had_issue" == false && "$QUIET" == true ]] && section_always 37 "Monitoring — Prometheus + Alertmanager"
fail "Alertmanager: 0 Running pods"
detail+="alertmanager=none-running; "
had_issue=true
status="FAIL"
fi
[[ "$had_issue" == false ]] && pass "Prometheus Ready, $am_running Alertmanager pod(s) Running"
json_add "monitoring_prom_am" "$status" "$detail"
}
# --- 38. Monitoring: Vault Sealed Status ---
check_monitoring_vault() {
section 38 "Monitoring — Vault Sealed Status"
local output detail="" status="PASS"
output=$($KUBECTL exec -n vault vault-0 -- \
sh -c 'VAULT_ADDR=http://127.0.0.1:8200 vault status' 2>&1 || true)
if [[ -z "$output" ]]; then
[[ "$QUIET" == true ]] && section_always 38 "Monitoring — Vault Sealed Status"
fail "Cannot exec vault status on vault-0"
json_add "monitoring_vault" "FAIL" "Exec failed"
return 0
fi
if echo "$output" | grep -qi "^Sealed[[:space:]]*false"; then
pass "Vault unsealed"
detail="sealed=false"
json_add "monitoring_vault" "PASS" "$detail"
elif echo "$output" | grep -qi "^Sealed[[:space:]]*true"; then
[[ "$QUIET" == true ]] && section_always 38 "Monitoring — Vault Sealed Status"
fail "Vault is SEALED — secrets unavailable"
detail="sealed=true"
status="FAIL"
json_add "monitoring_vault" "$status" "$detail"
else
[[ "$QUIET" == true ]] && section_always 38 "Monitoring — Vault Sealed Status"
warn "Cannot parse vault status output"
json_add "monitoring_vault" "WARN" "Parse error"
fi
}
# --- 39. Monitoring: ClusterSecretStore Ready ---
check_monitoring_css() {
section 39 "Monitoring — ClusterSecretStore Ready"
local css not_ready detail="" status="PASS"
css=$($KUBECTL get clustersecretstore -o json 2>/dev/null) || {
[[ "$QUIET" == true ]] && section_always 39 "Monitoring — ClusterSecretStore Ready"
warn "ClusterSecretStore CRD not installed"
json_add "monitoring_css" "WARN" "CRD missing"
return 0
}
not_ready=$(echo "$css" | python3 -c '
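import json, sys
data = json.load(sys.stdin)
for item in data.get("items", []):
    name = item["metadata"]["name"]
    conds = item.get("status", {}).get("conditions", [])
    ready = next((c for c in conds if c.get("type") == "Ready"), None)
    if not ready or ready.get("status") != "True":
        # Compute the reason outside the f-string: backslash-escaped quotes
        # inside an f-string expression are a SyntaxError, which the
        # 2>/dev/null below would hide and report as a PASS.
        reason = ready.get("reason", "NoCondition") if ready else "NoCondition"
        print(f"{name}:{reason}")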
' 2>/dev/null) || true
if [[ -z "$not_ready" ]]; then
local total
total=$(echo "$css" | python3 -c 'import json,sys; print(len(json.load(sys.stdin).get("items",[])))' 2>/dev/null || echo "?")
pass "All $total ClusterSecretStores Ready"
json_add "monitoring_css" "PASS" "$total Ready"
else
[[ "$QUIET" == true ]] && section_always 39 "Monitoring — ClusterSecretStore Ready"
while IFS= read -r line; do
fail "ClusterSecretStore not Ready: $line"
detail+="$line; "
done <<< "$not_ready"
status="FAIL"
json_add "monitoring_css" "$status" "$detail"
fi
}
# --- 40. External Reachability: Cloudflared + Authentik Replicas ---
check_external_replicas() {
section 40 "External — Cloudflared + Authentik Replicas"
local detail="" had_issue=false status="PASS"
# Cloudflared
local cf_json cf_ready cf_desired
cf_json=$($KUBECTL get deployment cloudflared -n cloudflared -o json 2>/dev/null || true)
if [[ -z "$cf_json" ]]; then
[[ "$had_issue" == false && "$QUIET" == true ]] && section_always 40 "External — Cloudflared + Authentik Replicas"
fail "Cloudflared deployment not found"
detail+="cloudflared=missing; "
had_issue=true
status="FAIL"
else
cf_ready=$(echo "$cf_json" | python3 -c 'import json,sys; print(json.load(sys.stdin).get("status",{}).get("readyReplicas",0) or 0)' 2>/dev/null || echo "0")
cf_desired=$(echo "$cf_json" | python3 -c 'import json,sys; print(json.load(sys.stdin).get("spec",{}).get("replicas",0) or 0)' 2>/dev/null || echo "0")
if [[ "$cf_ready" != "$cf_desired" ]]; then
[[ "$had_issue" == false && "$QUIET" == true ]] && section_always 40 "External — Cloudflared + Authentik Replicas"
fail "Cloudflared: $cf_ready/$cf_desired ready (external access degraded)"
detail+="cloudflared=${cf_ready}/${cf_desired}; "
had_issue=true
status="FAIL"
else
detail+="cloudflared=${cf_ready}/${cf_desired}; "
fi
fi
# Authentik server (Helm chart names the deployment goauthentik-server)
local auth_json auth_ready auth_desired
auth_json=$($KUBECTL get deployment goauthentik-server -n authentik -o json 2>/dev/null || true)
if [[ -z "$auth_json" ]]; then
[[ "$had_issue" == false && "$QUIET" == true ]] && section_always 40 "External — Cloudflared + Authentik Replicas"
warn "goauthentik-server deployment not found in authentik namespace"
detail+="authentik=missing; "
had_issue=true
[[ "$status" != "FAIL" ]] && status="WARN"
else
auth_ready=$(echo "$auth_json" | python3 -c 'import json,sys; print(json.load(sys.stdin).get("status",{}).get("readyReplicas",0) or 0)' 2>/dev/null || echo "0")
auth_desired=$(echo "$auth_json" | python3 -c 'import json,sys; print(json.load(sys.stdin).get("spec",{}).get("replicas",0) or 0)' 2>/dev/null || echo "0")
if [[ "$auth_ready" != "$auth_desired" ]]; then
[[ "$had_issue" == false && "$QUIET" == true ]] && section_always 40 "External — Cloudflared + Authentik Replicas"
fail "goauthentik-server: $auth_ready/$auth_desired ready (auth degraded)"
detail+="authentik=${auth_ready}/${auth_desired}; "
had_issue=true
status="FAIL"
else
detail+="authentik=${auth_ready}/${auth_desired}; "
fi
fi
[[ "$had_issue" == false ]] && pass "Cloudflared + authentik-server at full replicas ($detail)"
json_add "external_replicas" "$status" "$detail"
}
# --- 41. External Reachability: ExternalAccessDivergence Alert ---
check_external_divergence() {
section 41 "External — ExternalAccessDivergence Alert"
local alerts result detail="" status="PASS"
alerts=$($KUBECTL exec -n monitoring deploy/prometheus-server -- \
wget -qO- "http://localhost:9090/api/v1/alerts" 2>/dev/null || true)
if [[ -z "$alerts" ]]; then
[[ "$QUIET" == true ]] && section_always 41 "External — ExternalAccessDivergence Alert"
warn "Cannot query Prometheus alerts"
json_add "external_divergence" "WARN" "Cannot query"
return 0
fi
result=$(echo "$alerts" | python3 -c '
import json, sys
try:
    data = json.load(sys.stdin)
    alerts = data.get("data", {}).get("alerts", []) if isinstance(data, dict) else data
    firing = [a for a in alerts
              if a.get("labels", {}).get("alertname") == "ExternalAccessDivergence"
              and a.get("state") == "firing"]
    if firing:
        hosts = [a.get("labels", {}).get("host") or a.get("labels", {}).get("service") or "?" for a in firing]
        print(f"{len(firing)}:" + ",".join(hosts))
    else:
        print("0:")
except Exception as e:
    print(f"error:{e}")
' 2>/dev/null) || result="error:parse"
if [[ "$result" == error:* ]]; then
[[ "$QUIET" == true ]] && section_always 41 "External — ExternalAccessDivergence Alert"
warn "Failed to parse alerts JSON: ${result#error:}"
json_add "external_divergence" "WARN" "Parse error"
return 0
fi
local count names
count=$(echo "$result" | cut -d: -f1)
names=$(echo "$result" | cut -d: -f2-)
if [[ "$count" -eq 0 ]]; then
pass "ExternalAccessDivergence not firing"
json_add "external_divergence" "PASS" "Not firing"
else
[[ "$QUIET" == true ]] && section_always 41 "External — ExternalAccessDivergence Alert"
fail "ExternalAccessDivergence firing for $count target(s): $names"
status="FAIL"
detail="$count firing: $names"
json_add "external_divergence" "$status" "$detail"
fi
}
# --- 42. External Reachability: Traefik 5xx Rate ---
check_external_traefik_5xx() {
section 42 "External — Traefik 5xx Rate (15m)"
local query_result detail="" status="PASS"
query_result=$($KUBECTL exec -n monitoring deploy/prometheus-server -- \
wget -qO- 'http://localhost:9090/api/v1/query?query=topk(10,rate(traefik_service_requests_total{code=~%225..%22}%5B15m%5D))' 2>/dev/null || true)
if [[ -z "$query_result" ]]; then
[[ "$QUIET" == true ]] && section_always 42 "External — Traefik 5xx Rate (15m)"
warn "Cannot query Prometheus for traefik 5xx rate"
json_add "external_traefik_5xx" "WARN" "Query failed"
return 0
fi
local parsed
parsed=$(echo "$query_result" | python3 -c '
import json, sys
try:
    data = json.load(sys.stdin)
    results = data.get("data", {}).get("result", [])
    hot = [(r.get("metric", {}).get("service", "?"), float(r.get("value", [0, "0"])[1])) for r in results]
    hot = [(s, v) for s, v in hot if v > 0.01]  # threshold: 0.01 req/s
    hot.sort(key=lambda x: -x[1])
    if not hot:
        print("0:")
    else:
        top = [f"{s}={v:.2f}/s" for s, v in hot[:5]]
        print(f"{len(hot)}:" + "; ".join(top))
except Exception as e:
    print(f"error:{e}")
' 2>/dev/null) || parsed="error:parse"
if [[ "$parsed" == error:* ]]; then
[[ "$QUIET" == true ]] && section_always 42 "External — Traefik 5xx Rate (15m)"
warn "Parse failed: ${parsed#error:}"
json_add "external_traefik_5xx" "WARN" "Parse error"
return 0
fi
local count top
count=$(echo "$parsed" | cut -d: -f1)
top=$(echo "$parsed" | cut -d: -f2-)
if [[ "$count" -eq 0 ]]; then
pass "No Traefik services with 5xx rate >0.01 req/s (last 15m)"
json_add "external_traefik_5xx" "PASS" "None above threshold"
else
[[ "$QUIET" == true ]] && section_always 42 "External — Traefik 5xx Rate (15m)"
# WARN if any service exceeds 0.01 req/s of 5xx; FAIL if the top service exceeds 1 req/s
local top_rate
top_rate=$(echo "$top" | grep -oE '[0-9.]+/s' | head -1 | tr -d '/s')
if awk "BEGIN{exit !($top_rate > 1.0)}" 2>/dev/null; then
fail "$count Traefik service(s) with elevated 5xx: $top"
status="FAIL"
else
warn "$count Traefik service(s) emitting 5xx: $top"
status="WARN"
fi
detail="$count services: $top"
json_add "external_traefik_5xx" "$status" "$detail"
fi
}
# --- Summary ---
print_summary() {
if [[ "$JSON" == true ]]; then
@ -2421,18 +1832,6 @@ main() {
check_ha_automations
check_ha_system
check_hardware_exporters
check_cert_manager_certificates
check_cert_manager_expiry
check_cert_manager_requests
check_backup_per_db
check_backup_offsite_sync
check_backup_lvm_snapshots
check_monitoring_prom_am
check_monitoring_vault
check_monitoring_css
check_external_replicas
check_external_divergence
check_external_traefik_5xx
print_summary
# Exit code: 2 for failures, 1 for warnings, 0 for clean


@ -1,188 +0,0 @@
<?php
// pfSense HAProxy bootstrap — configures the mailserver PROXY-v2 path
// (bd code-yiu, Phases 2/3 + 5).
//
// WHY THIS EXISTS
// pfSense HAProxy config is stored as XML in `/cf/conf/config.xml` under
// `<installedpackages><haproxy>`. That file IS picked up by the nightly
// `daily-backup` on the PVE host (see `scripts/daily-backup.sh` → `scp
// root@10.0.20.1:/cf/conf/config.xml`) and synced to Synology. This script
// is the canonical reproducer: run it to rebuild the pfSense HAProxy config
// from scratch (DR restore, fresh pfSense install, etc.).
//
// WHAT IT BUILDS
// 4 backend pools — one per mail port:
// mailserver_nodes_smtp → k8s-node1..4:30125 (container :2525 postscreen)
// mailserver_nodes_smtps → k8s-node1..4:30126 (container :4465 smtps)
// mailserver_nodes_sub → k8s-node1..4:30127 (container :5587 submission)
// mailserver_nodes_imaps → k8s-node1..4:30128 (container :10993 IMAPS)
// Each server uses `send-proxy-v2` and TCP health-check every 120s.
// 4 frontends on pfSense 10.0.20.1:{25,465,587,993} TCP mode.
// + 1 legacy test frontend on :2525 (kept for validation; safe to remove later).
//
// USAGE (on pfSense host, via SSH as admin)
// scp infra/scripts/pfsense-haproxy-bootstrap.php admin@10.0.20.1:/tmp/
// ssh admin@10.0.20.1 'php /tmp/pfsense-haproxy-bootstrap.php'
//
// IDEMPOTENCY
// Removes any existing entries named mailserver_* before re-adding, so
// repeat runs are safe and behave as reset-to-declared.
require_once('/etc/inc/config.inc');
require_once('/usr/local/pkg/haproxy/haproxy.inc');
require_once('/usr/local/pkg/haproxy/haproxy_utils.inc');
global $config;
parse_config(true);
if (!is_array($config['installedpackages']['haproxy'])) {
$config['installedpackages']['haproxy'] = [];
}
$h = &$config['installedpackages']['haproxy'];
$h['enable'] = 'yes';
$h['maxconn'] = '1000';
// Our declared object names (anything starting with mailserver_ is ours)
$POOL_NAMES = [
'mailserver_nodes', // legacy (Phase 2/3 test)
'mailserver_nodes_smtp',
'mailserver_nodes_smtps',
'mailserver_nodes_sub',
'mailserver_nodes_imaps',
];
$FRONTEND_NAMES = [
'mailserver_proxy_test', // legacy (Phase 2/3 test, :2525)
'mailserver_proxy_25',
'mailserver_proxy_465',
'mailserver_proxy_587',
'mailserver_proxy_993',
];
// k8s workers only. Excluded: master (control-plane) and node5 (does not
// exist in this topology).
$NODES = [
['k8s-node1', '10.0.20.101'],
['k8s-node2', '10.0.20.102'],
['k8s-node3', '10.0.20.103'],
['k8s-node4', '10.0.20.104'],
];
function build_pool(string $name, string $nodeport, array $nodes): array {
$servers = [];
foreach ($nodes as $n) {
$servers[] = [
'name' => $n[0],
'address' => $n[1],
'port' => $nodeport,
'weight' => '10',
'ssl' => '',
// check every 2 min — send-proxy-v2 check + close generates
// noise on postscreen, not worth doing more often.
'checkinter' => '120000',
'advanced' => 'send-proxy-v2',
'status' => 'active',
];
}
return [
'name' => $name,
'balance' => 'roundrobin',
'check_type' => 'TCP',
'checkinter' => '120000',
'retries' => '3',
'ha_servers' => ['item' => $servers],
'advanced_bind' => '',
'persist_cookie_enabled' => '',
'transparent_clientip' => '',
'advanced' => '',
];
}
function build_frontend(string $name, string $descr, string $extaddr, string $port, string $pool): array {
return [
'name' => $name,
'descr' => $descr,
'status' => 'active',
'secondary' => '',
'type' => 'tcp',
'a_extaddr' => ['item' => [[
'extaddr' => $extaddr,
'extaddr_port' => $port,
'extaddr_ssl' => '',
'extaddr_advanced' => '',
]]],
'backend_serverpool' => $pool,
'ha_acls' => '',
'dontlognull'=> '',
'httpclose' => '',
'forwardfor' => '',
'advanced' => '',
];
}
// ── Backend pools ───────────────────────────────────────────────────────
if (!is_array($h['ha_pools'])) $h['ha_pools'] = ['item' => []];
if (!is_array($h['ha_pools']['item'])) $h['ha_pools']['item'] = [];
$h['ha_pools']['item'] = array_values(array_filter(
$h['ha_pools']['item'],
fn($p) => !in_array($p['name'] ?? '', $POOL_NAMES, true)
));
// Legacy test pool (still used by the :2525 test frontend for manual SMTP roundtrip).
$h['ha_pools']['item'][] = build_pool('mailserver_nodes', '30125', $NODES);
// Production pools — one per mail port.
$h['ha_pools']['item'][] = build_pool('mailserver_nodes_smtp', '30125', $NODES);
$h['ha_pools']['item'][] = build_pool('mailserver_nodes_smtps', '30126', $NODES);
$h['ha_pools']['item'][] = build_pool('mailserver_nodes_sub', '30127', $NODES);
$h['ha_pools']['item'][] = build_pool('mailserver_nodes_imaps', '30128', $NODES);
// ── Frontends ───────────────────────────────────────────────────────────
if (!is_array($h['ha_backends'])) $h['ha_backends'] = ['item' => []];
if (!is_array($h['ha_backends']['item'])) $h['ha_backends']['item'] = [];
$h['ha_backends']['item'] = array_values(array_filter(
$h['ha_backends']['item'],
fn($f) => !in_array($f['name'] ?? '', $FRONTEND_NAMES, true)
));
// Legacy test frontend — :2525 — retained so SMTP roundtrip tests keep working
// without touching the real :25. Safe to remove once fully validated.
$h['ha_backends']['item'][] = build_frontend(
'mailserver_proxy_test',
'code-yiu Phase 2/3 test — PROXY v2 to k8s mailserver NodePort 30125 (alt port :2525)',
'10.0.20.1', '2525',
'mailserver_nodes'
);
// Production frontends — 4 ports listening on pfSense VLAN20 IP 10.0.20.1.
$h['ha_backends']['item'][] = build_frontend(
'mailserver_proxy_25',
'code-yiu Phase 4/5 — external SMTP (:25) via PROXY v2 → pod :2525 postscreen',
'10.0.20.1', '25',
'mailserver_nodes_smtp'
);
$h['ha_backends']['item'][] = build_frontend(
'mailserver_proxy_465',
'code-yiu Phase 4/5 — external SMTPS (:465) via PROXY v2 → pod :4465 smtpd',
'10.0.20.1', '465',
'mailserver_nodes_smtps'
);
$h['ha_backends']['item'][] = build_frontend(
'mailserver_proxy_587',
'code-yiu Phase 4/5 — external submission (:587) via PROXY v2 → pod :5587 smtpd',
'10.0.20.1', '587',
'mailserver_nodes_sub'
);
$h['ha_backends']['item'][] = build_frontend(
'mailserver_proxy_993',
'code-yiu Phase 4/5 — external IMAPS (:993) via PROXY v2 → pod :10993 Dovecot',
'10.0.20.1', '993',
'mailserver_nodes_imaps'
);
write_config('code-yiu: mailserver HAProxy — 4 production frontends + legacy :2525 test');
$messages = '';
$rc = haproxy_check_and_run($messages, true);
echo 'haproxy_check_and_run rc=' . ($rc ? 'OK' : 'FAIL') . "\n";
echo "messages: $messages\n";
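
The layout described under WHAT IT BUILDS can be sketched as a small lookup table. This is an illustrative Python sketch, not part of the bootstrap script: the `MailPath` type and `describe` helper are hypothetical names, and the printed lines are a human-readable summary rather than real HAProxy configuration syntax. Ports, pool names, and node addresses are taken from the script above.

```python
from dataclasses import dataclass

# Worker nodes as declared in $NODES in the bootstrap script.
NODES = [
    ("k8s-node1", "10.0.20.101"),
    ("k8s-node2", "10.0.20.102"),
    ("k8s-node3", "10.0.20.103"),
    ("k8s-node4", "10.0.20.104"),
]


@dataclass(frozen=True)
class MailPath:
    """One external mail port and the hops behind it (hypothetical helper type)."""
    frontend_port: int   # pfSense HAProxy listener on 10.0.20.1
    pool: str            # HAProxy backend pool name
    node_port: int       # k8s NodePort on every worker
    container_port: int  # PROXY-speaking listener inside the pod


PATHS = [
    MailPath(25, "mailserver_nodes_smtp", 30125, 2525),
    MailPath(465, "mailserver_nodes_smtps", 30126, 4465),
    MailPath(587, "mailserver_nodes_sub", 30127, 5587),
    MailPath(993, "mailserver_nodes_imaps", 30128, 10993),
]


def describe(path: MailPath) -> list[str]:
    """Render one frontend and its backend servers as summary lines."""
    lines = [f"frontend 10.0.20.1:{path.frontend_port} -> pool {path.pool}"]
    for name, addr in NODES:
        lines.append(
            f"  {name} {addr}:{path.node_port} send-proxy-v2 "
            f"(pod :{path.container_port})"
        )
    return lines


if __name__ == "__main__":
    for p in PATHS:
        print("\n".join(describe(p)))
```

Keeping the four hops in one table makes it easy to spot a mismatch between a frontend port and its NodePort when reviewing the PHP arrays.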

View file

@ -1,68 +0,0 @@
<?php
// pfSense NAT redirect flip — mail ports 25/465/587/993 from
// <mailserver> alias (10.0.20.202 MetalLB LB) to pfSense's own HAProxy
// listener (10.0.20.1). bd code-yiu.
//
// THIS IS THE CUTOVER. After this script:
// Internet → pfSense WAN:{25,465,587,993} → rdr → 10.0.20.1:{...}
// (pfSense HAProxy) → send-proxy-v2 → k8s-node:{30125..30128} NodePort
// → kube-proxy → mailserver pod alt listeners (2525/4465/5587/10993)
// → Postfix/Dovecot parse PROXY v2 → real client IP recovered.
//
// Internal clients (Roundcube, email-roundtrip-monitor CronJob) continue
// using the existing mailserver ClusterIP Service on the stock ports
// (25/465/587/993) which hit container stock listeners WITHOUT PROXY.
// No change to internal traffic paths.
//
// USAGE
// scp infra/scripts/pfsense-nat-mailserver-haproxy-flip.php admin@10.0.20.1:/tmp/
// ssh admin@10.0.20.1 'php /tmp/pfsense-nat-mailserver-haproxy-flip.php'
//
// REVERT — run pfsense-nat-mailserver-haproxy-unflip.php (companion script).
//
// IDEMPOTENT — re-runs converge. Flips nothing if already pointed at 10.0.20.1.
require_once('/etc/inc/config.inc');
require_once('/etc/inc/filter.inc');
global $config;
parse_config(true);
$PORTS_TO_FLIP = ['25', '465', '587', '993'];
$OLD_TARGET = 'mailserver';
$NEW_TARGET = '10.0.20.1';
$changed = 0;
foreach ($config['nat']['rule'] as $i => &$r) {
$iface = $r['interface'] ?? '';
$lport = $r['local-port'] ?? '';
$tgt = $r['target'] ?? '';
if ($iface !== 'wan') continue;
if (!in_array($lport, $PORTS_TO_FLIP, true)) continue;
if ($tgt !== $OLD_TARGET) {
printf("rule %d (dport=%s) target=%s — not flipping (already %s or unexpected)\n",
$i, $lport, $tgt, $NEW_TARGET);
continue;
}
$r['target'] = $NEW_TARGET;
// The linked filter rule ('associated-rule-id') needs no change here:
// pfSense regenerates the associated rule from the NAT rule on apply,
// so leaving associated-rule-id intact is fine.
$changed++;
printf("rule %d (dport=%s): target %s → %s\n", $i, $lport, $OLD_TARGET, $NEW_TARGET);
}
unset($r);
if ($changed === 0) {
echo "No changes. (Already flipped? Run unflip script to revert.)\n";
exit(0);
}
write_config("code-yiu: NAT rdr — mail ports {$changed} flipped to HAProxy (10.0.20.1)");
// Rebuild pf rules & reload.
$rc = filter_configure();
printf("filter_configure rc=%s\n", var_export($rc, true));
echo "done.\n";
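
The cutover relies on HAProxy prepending a PROXY protocol v2 header so that Postfix and Dovecot can recover the real client IP after kube-proxy SNAT. The following is a minimal Python sketch of that header for the TCP-over-IPv4 case only, based on the public PROXY protocol specification. The function names are hypothetical, and this is not how HAProxy, Postfix, or Dovecot implement it internally.

```python
import socket
import struct

# 12-byte magic that starts every PROXY protocol v2 header.
PP2_SIGNATURE = b"\r\n\r\n\x00\r\nQUIT\n"
VER_CMD_PROXY = 0x21   # high nibble: version 2, low nibble: command PROXY
FAM_TCP_IPV4 = 0x11    # high nibble: AF_INET, low nibble: STREAM


def build_proxy_v2(src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> bytes:
    """Build the header a proxy sends before the first payload byte."""
    addrs = (
        socket.inet_aton(src_ip)
        + socket.inet_aton(dst_ip)
        + struct.pack("!HH", src_port, dst_port)
    )
    header = struct.pack("!BBH", VER_CMD_PROXY, FAM_TCP_IPV4, len(addrs))
    return PP2_SIGNATURE + header + addrs


def parse_proxy_v2(data: bytes) -> tuple[str, int, str, int]:
    """Recover (src_ip, src_port, dst_ip, dst_port) from a TCP/IPv4 header."""
    if data[:12] != PP2_SIGNATURE:
        raise ValueError("missing PROXY v2 signature")
    ver_cmd, fam_proto, length = struct.unpack("!BBH", data[12:16])
    if ver_cmd != VER_CMD_PROXY or fam_proto != FAM_TCP_IPV4 or length < 12:
        raise ValueError("this sketch only handles PROXY over TCP/IPv4")
    src, dst, sport, dport = struct.unpack("!4s4sHH", data[16:28])
    return socket.inet_ntoa(src), sport, socket.inet_ntoa(dst), dport


if __name__ == "__main__":
    # 203.0.113.7 is a documentation address standing in for an external client.
    hdr = build_proxy_v2("203.0.113.7", 54321, "10.0.20.1", 25)
    print(len(hdr), parse_proxy_v2(hdr))
```

Because the header travels inside the TCP stream, the source address rewrite done by kube-proxy does not affect it. That is why the listeners on 2525/4465/5587/10993 must require the header while the stock ports must not.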

View file

@ -1,48 +0,0 @@
<?php
// REVERT of pfsense-nat-mailserver-haproxy-flip.php.
// Moves mail-port NAT rdr target from 10.0.20.1 (pfSense HAProxy) back to
// <mailserver> alias (10.0.20.202 MetalLB LB IP). bd code-yiu rollback.
//
// USE THIS IF: external mail breaks after the flip, any postscreen
// PROXY timeouts show up in logs, or you need to back out before Phase 6.
require_once('/etc/inc/config.inc');
require_once('/etc/inc/filter.inc');
global $config;
parse_config(true);
$PORTS_TO_REVERT = ['25', '465', '587', '993'];
$OLD_TARGET = '10.0.20.1';
$NEW_TARGET = 'mailserver';
$changed = 0;
foreach ($config['nat']['rule'] as $i => &$r) {
$iface = $r['interface'] ?? '';
$lport = $r['local-port'] ?? '';
$tgt = $r['target'] ?? '';
if ($iface !== 'wan') continue;
if (!in_array($lport, $PORTS_TO_REVERT, true)) continue;
if ($tgt !== $OLD_TARGET) {
printf("rule %d (dport=%s) target=%s — not reverting (already %s or unexpected)\n",
$i, $lport, $tgt, $NEW_TARGET);
continue;
}
$r['target'] = $NEW_TARGET;
$changed++;
printf("rule %d (dport=%s): target %s → %s\n", $i, $lport, $OLD_TARGET, $NEW_TARGET);
}
unset($r);
if ($changed === 0) {
echo "No changes. (Already reverted.)\n";
exit(0);
}
write_config("code-yiu: NAT rdr — mail ports {$changed} reverted to <mailserver> alias");
$rc = filter_configure();
printf("filter_configure rc=%s\n", var_export($rc, true));
echo "done.\n";
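
The flip and unflip scripts share one pattern: retarget only WAN rules on the four mail ports whose target matches the expected old value, and report how many changed. A minimal Python sketch of that pattern follows. It assumes the NAT rules are plain dictionaries with the same keys the PHP reads (`interface`, `local-port`, `target`); the `retarget` function is a hypothetical name.

```python
MAIL_PORTS = {"25", "465", "587", "993"}


def retarget(rules: list[dict], ports: set[str], old: str, new: str) -> int:
    """Point matching WAN rules from `old` to `new`; return the number changed."""
    changed = 0
    for rule in rules:
        if rule.get("interface") != "wan":
            continue
        if rule.get("local-port") not in ports:
            continue
        if rule.get("target") != old:
            # Already converted or unexpected target: leave untouched.
            continue
        rule["target"] = new
        changed += 1
    return changed


if __name__ == "__main__":
    rules = [
        {"interface": "wan", "local-port": "25", "target": "mailserver"},
        {"interface": "wan", "local-port": "993", "target": "mailserver"},
        {"interface": "wan", "local-port": "443", "target": "mailserver"},
        {"interface": "lan", "local-port": "25", "target": "mailserver"},
    ]
    print(retarget(rules, MAIL_PORTS, "mailserver", "10.0.20.1"))  # flip
    print(retarget(rules, MAIL_PORTS, "mailserver", "10.0.20.1"))  # re-run is a no-op
    print(retarget(rules, MAIL_PORTS, "10.0.20.1", "mailserver"))  # unflip
```

Matching on the old target rather than overwriting unconditionally is what makes both scripts safe to re-run and lets each one act as the inverse of the other.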

Binary file not shown.

Binary file not shown.

29
setup-monitoring.sh Executable file
View file

@ -0,0 +1,29 @@
#!/bin/bash
# Setup script for automated monitoring environment
# Ensures health check scripts have access to kubeconfig
echo "=== Setting up automated monitoring environment ==="
# Copy kubeconfig to location expected by health check scripts
if [ -f /home/node/.openclaw/kubeconfig ]; then
cp /home/node/.openclaw/kubeconfig /workspace/infra/config
echo "✅ Kubeconfig copied to /workspace/infra/config"
else
echo "❌ Source kubeconfig not found at /home/node/.openclaw/kubeconfig"
exit 1
fi
# Test health check access
echo ""
echo "Testing health check script access..."
cd /workspace/infra
if KUBECONFIG="" timeout 30 bash .claude/cluster-health.sh --quiet > /dev/null 2>&1; then
echo "✅ Health check script can access cluster"
else
echo "❌ Health check script cannot access cluster"
exit 1
fi
echo ""
echo "✅ Automated monitoring environment setup complete"
echo "📊 Cron health checks will now work properly"

View file

@ -201,7 +201,7 @@ resource "kubernetes_deployment" "affine" {
annotations = {
"diun.enable" = "true"
"diun.include_tags" = "^\\d+\\.\\d+\\.\\d+$"
"dependency.kyverno.io/wait-for" = "postgresql.dbaas:5432,redis-master.redis:6379"
"dependency.kyverno.io/wait-for" = "postgresql.dbaas:5432,redis.redis:6379"
}
}
spec {

View file

@ -74,36 +74,6 @@ resource "kubernetes_deployment" "pgbouncer" {
container_port = 6432
}
resources {
requests = {
cpu = "50m"
memory = "128Mi"
}
limits = {
memory = "512Mi"
}
}
readiness_probe {
tcp_socket {
port = 6432
}
initial_delay_seconds = 5
period_seconds = 10
timeout_seconds = 3
failure_threshold = 3
}
liveness_probe {
tcp_socket {
port = 6432
}
initial_delay_seconds = 30
period_seconds = 30
timeout_seconds = 5
failure_threshold = 3
}
volume_mount {
name = "config"
mount_path = "/etc/pgbouncer/pgbouncer.ini"
@ -151,25 +121,6 @@ resource "kubernetes_deployment" "pgbouncer" {
}
}
# --- 3b PodDisruptionBudget ---
# Protects auth against simultaneous node drains. With 3 replicas and
# minAvailable=2, a single drain rolls cleanly; a simultaneous two-node
# drain is correctly blocked (PDBs gate voluntary evictions only).
resource "kubernetes_pod_disruption_budget_v1" "pgbouncer" {
metadata {
name = "pgbouncer"
namespace = "authentik"
}
spec {
min_available = 2
selector {
match_labels = {
app = "pgbouncer"
}
}
}
}
# --- 4 Service ---
resource "kubernetes_service" "pgbouncer" {
metadata {

View file

@ -14,29 +14,9 @@ authentik:
port: 6432
user: authentik
password: ""
# Persistent client-side connections (safe with PgBouncer session mode;
# must be < pgbouncer server_idle_timeout=600s). Cuts Django connection
# setup overhead off the ~70 sequential ORM ops per flow stage.
conn_max_age: 60
conn_health_checks: true
cache:
# Cache flow plans for 30m and policy evaluations for 15m. Authentik 2026.2
# moved cache storage from Redis to Postgres, so a TTL hit is still a
# SELECT — but a single indexed lookup beats re-evaluating PolicyBindings.
timeout_flows: 1800
timeout_policies: 900
web:
# Gunicorn: 3 workers × 4 threads per server pod (default 2×4).
# Pairs with the server memory bump to 2Gi (each worker preloads Django ~500Mi).
workers: 3
threads: 4
worker:
# Celery-equivalent worker threads per pod (default 2, renamed from
# AUTHENTIK_WORKER__CONCURRENCY in 2025.8).
threads: 4
server:
replicas: 3
replicas: 2
strategy:
type: RollingUpdate
rollingUpdate:
@ -47,7 +27,7 @@ server:
cpu: 100m
memory: 1.5Gi
limits:
memory: 2Gi
memory: 1.5Gi
topologySpreadConstraints:
- maxSkew: 1
topologyKey: kubernetes.io/hostname
@ -64,12 +44,12 @@ server:
diun.include_tags: "^202[0-9].[0-9]+.*$" # no need to annotate the worker as it uses the same image
pdb:
enabled: true
minAvailable: 2
minAvailable: 1
global:
addPrometheusAnnotations: true
worker:
replicas: 3
replicas: 2
strategy:
type: RollingUpdate
rollingUpdate:
@ -80,7 +60,7 @@ worker:
cpu: 100m
memory: 1.5Gi
limits:
memory: 2Gi
memory: 1.5Gi
topologySpreadConstraints:
- maxSkew: 1
topologyKey: kubernetes.io/hostname

View file

@ -11,7 +11,7 @@ data "vault_kv_secret_v2" "viktor_secrets" {
locals {
namespace = "claude-agent"
image = "registry.viktorbarzin.me/claude-agent-service"
image_tag = "2fd7670d"
image_tag = "0c24c9b6"
labels = {
app = "claude-agent-service"
}
@ -78,25 +78,6 @@ resource "kubernetes_manifest" "external_secret" {
property = "claude_oauth_token"
}
},
{
# Consumed by service-upgrade agent to poll ci.viktorbarzin.me
# per-workflow status. Pod has no Vault CLI auth, so the old
# `vault kv get` path is dead; see bd code-3o3.
secretKey = "WOODPECKER_API_TOKEN"
remoteRef = {
key = "ci/global"
property = "woodpecker_api_token"
}
},
{
# Consumed by service-upgrade agent for Start/Success/Failure
# notifications. Same shared webhook as alertmanager.
secretKey = "SLACK_WEBHOOK_URL"
remoteRef = {
key = "viktor"
property = "alertmanager_slack_api_url"
}
},
]
}
}

View file

@ -206,7 +206,7 @@ resource "cloudflare_record" "mail_tlsrpt" {
}
resource "cloudflare_record" "mail_dmarc" {
content = "\"v=DMARC1; p=quarantine; pct=100; fo=1; ri=3600; sp=quarantine; adkim=r; aspf=r; rua=mailto:dmarc@viktorbarzin.me,mailto:adb84997@inbox.ondmarc.com; ruf=mailto:dmarc@viktorbarzin.me,mailto:adb84997@inbox.ondmarc.com,mailto:postmaster@viktorbarzin.me;\""
content = "\"v=DMARC1; p=quarantine; pct=100; fo=1; ri=3600; sp=quarantine; adkim=r; aspf=r; rua=mailto:e21c0ff8@dmarc.mailgun.org,mailto:adb84997@inbox.ondmarc.com; ruf=mailto:e21c0ff8@dmarc.mailgun.org,mailto:adb84997@inbox.ondmarc.com,mailto:postmaster@viktorbarzin.me;\""
name = "_dmarc.viktorbarzin.me"
proxied = false
ttl = 1

View file

@ -84,7 +84,7 @@ resource "kubernetes_deployment" "dawarich" {
annotations = {
"diun.enable" = "true"
"diun.include_tags" = "^v?\\d+\\.\\d+\\.\\d+$"
"dependency.kyverno.io/wait-for" = "postgresql.dbaas:5432,redis-master.redis:6379"
"dependency.kyverno.io/wait-for" = "postgresql.dbaas:5432,redis.redis:6379"
}
}
spec {

View file

@ -180,7 +180,7 @@ resource "kubernetes_deployment" "grampsweb" {
app = "grampsweb"
}
annotations = {
"dependency.kyverno.io/wait-for" = "redis-master.redis:6379"
"dependency.kyverno.io/wait-for" = "redis.redis:6379"
}
}
spec {
@ -354,14 +354,13 @@ resource "kubernetes_service" "grampsweb" {
}
module "ingress" {
source = "../../modules/kubernetes/ingress_factory"
namespace = kubernetes_namespace.grampsweb.metadata[0].name
name = "family"
service_name = "grampsweb"
tls_secret_name = var.tls_secret_name
max_body_size = "500m"
protected = true
external_monitor = false
source = "../../modules/kubernetes/ingress_factory"
namespace = kubernetes_namespace.grampsweb.metadata[0].name
name = "family"
service_name = "grampsweb"
tls_secret_name = var.tls_secret_name
max_body_size = "500m"
protected = true
extra_annotations = {
"gethomepage.dev/enabled" = "true"
"gethomepage.dev/name" = "GrampsWeb"

View file

@ -9,7 +9,7 @@ defaultPodOptions:
env:
# REDIS_HOSTNAME: '{{ printf "%s-redis-master" .Release.Name }}'
REDIS_HOSTNAME: "redis-master.redis.svc.cluster.local"
REDIS_HOSTNAME: "redis.redis.svc.cluster.local"
# DB_HOSTNAME: "postgresql.dbaas"
# DB_USERNAME: "immich"
# DB_DATABASE_NAME: "immich"

View file

@ -83,7 +83,7 @@ For secrets requiring admin access (shared infra passwords, API keys):
| \`modules/kubernetes/nfs_volume/\` | NFS volume module (CSI-backed, soft mount) |
| \`config.tfvars\` | Non-secret configuration (plaintext) |
| \`secrets.sops.json\` | All secrets (SOPS-encrypted JSON) |
| \`scripts/cluster_healthcheck.sh\` | 42-check cluster health script |
| \`scripts/cluster_healthcheck.sh\` | 25-check cluster health script |
| \`AGENTS.md\` | Full AI agent instructions (auto-loaded by most agents) |
### Tier System

View file

@ -7,7 +7,7 @@
#
# Usage:
# annotations:
# dependency.kyverno.io/wait-for: "postgresql.dbaas:5432,redis-master.redis:6379"
# dependency.kyverno.io/wait-for: "postgresql.dbaas:5432,redis.redis:6379"
#
# Each comma-separated entry becomes a busybox init container that runs
# `nc -z <host> <port>` in a loop until the dependency is reachable.
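
The wait-for behaviour described above can be sketched in Python: split the annotation into host/port pairs, then poll each one with a plain TCP connect. This is a rough equivalent of the `nc -z` loop, not the Kyverno policy itself. The function names, the retry interval, and the `attempts` cap are assumptions made for illustration.

```python
import socket
import time


def parse_wait_for(annotation: str) -> list[tuple[str, int]]:
    """Split 'host:port,host:port' into (host, port) pairs."""
    deps = []
    for entry in annotation.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, _, port = entry.rpartition(":")
        deps.append((host, int(port)))
    return deps


def wait_for(host: str, port: int, interval: float = 2.0, attempts=None) -> bool:
    """Retry a TCP connect until it succeeds; stop early only if attempts is set."""
    tried = 0
    while attempts is None or tried < attempts:
        try:
            with socket.create_connection((host, port), timeout=3):
                return True
        except OSError:
            tried += 1
            time.sleep(interval)
    return False


if __name__ == "__main__":
    for host, port in parse_wait_for("postgresql.dbaas:5432,redis.redis:6379"):
        print(f"would wait for {host}:{port}")
```

A TCP connect only proves the port is open, not that the dependency is ready to serve requests, so application-level readiness probes remain useful alongside it.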

View file

@ -134,29 +134,6 @@ resource "kubernetes_config_map" "mailserver_config" {
# Increase max IMAP connections per user+IP - all Roundcube connections come from same pod IP
"dovecot.cf" = <<-EOF
mail_max_userip_connections = 50
# Throttle IMAP auth brute-force. CrowdSec handles the network-level
# ban; this adds defense in depth at the auth layer: each failed
# attempt waits 5s before responding, stretching a 1000-password
# dictionary attack from <1s to ~83min (1000 x 5s). Addresses code-9mi.
auth_failure_delay = 5s
# code-yiu Phase 5: alt IMAPS listener on :10993 that REQUIRES the
# HAProxy PROXY v2 wire format. pfSense HAProxy injects the header
# on backend connects via k8s-node:30128 → kube-proxy → pod :10993.
# Real client IP recovered from header despite kube-proxy SNAT.
# The stock :993 listener stays PROXY-free for internal clients
# (Roundcube, email-roundtrip-monitor) on the mailserver ClusterIP.
# haproxy_trusted_networks = source IPs allowed to *send* PROXY v2.
# Post kube-proxy SNAT the source is the k8s node IP (10.0.20.101-104);
# allow-list the whole VLAN 20 node subnet.
haproxy_trusted_networks = 10.0.20.0/24
service imap-login {
inet_listener imaps_proxy {
port = 10993
ssl = yes
haproxy = yes
}
}
EOF
fail2ban_conf = <<-EOF
[DEFAULT]
@ -165,110 +142,31 @@ resource "kubernetes_config_map" "mailserver_config" {
logtarget = SYSOUT
EOF
}
# bcrypt() generates a fresh salt on every evaluation, so the hash line
# differs each plan run. ignore_changes is the pragmatic workaround.
#
# INVARIANT (code-7ns, decision 2026-04-19): if a password in Vault
# (secret/platform.mailserver_accounts) is rotated, ignore_changes WILL
# mask that rotation: TF will not re-render the ConfigMap and the pod
# will keep accepting the old password until the ConfigMap is force-
# tainted (`terraform taint module.mailserver.kubernetes_config_map
# .postfix-accounts-cf`) or the resource is addressed explicitly on
# apply (`-replace=...`). Currently there is NO automatic Vault
# rotation for mailserver_accounts, so this is acceptable. If automatic
# rotation is ever added, replace this ignore_changes with either:
# (a) deterministic hashing (bcrypt with a stable salt derived from
# the user string; loses per-user salt uniqueness but keeps TF
# convergent), or
# (b) render postfix-accounts.cf from a K8s Secret synced by ESO
# (CRD consumed by a dedicated volume mount; docker-mailserver
# loads it at pod start).
# Password hashes differ on every evaluation; ignore them to avoid rewriting the ConfigMap constantly.
# Alternatives: (1) generate consistent hashes, or (2) find a way to ignore_changes per password.
lifecycle {
# DRIFT_WORKAROUND: postfix-accounts.cf password hashes non-deterministic; would flap on every apply. Reviewed 2026-04-18.
ignore_changes = [data["postfix-accounts.cf"]]
}
}
# code-yiu Phase 1a: user-patches.sh appends alt PROXY-speaking listeners to
# Postfix master.cf at container startup. docker-mailserver runs
# /tmp/docker-mailserver/user-patches.sh after initial config generation, so
# our append lands on every fresh pod. Idempotent guard prevents double-append
# on in-place container restarts. Dovecot extensions are in the dovecot.cf
# ConfigMap entry (no patches.sh entry needed).
resource "kubernetes_config_map" "mailserver_user_patches" {
metadata {
name = "mailserver-user-patches"
namespace = kubernetes_namespace.mailserver.metadata[0].name
labels = {
app = "mailserver"
}
annotations = {
"reloader.stakater.com/match" = "true"
}
}
# resource "kubernetes_config_map" "user_patches" {
# metadata {
# name = "user-patches"
# namespace = kubernetes_namespace.mailserver.metadata[0].name
# labels = {
# "app" = "mailserver"
# }
# }
data = {
"user-patches.sh" = <<-EOT
#!/bin/bash
# code-yiu Phase 5: append PROXY-speaking alt listeners to Postfix master.cf:
# :2525 postscreen (alt :25): injected with PROXY v2 by pfSense HAProxy
# :4465 smtpd (alt :465 SMTPS): ditto, wrappermode TLS
# :5587 smtpd (alt :587 submission): ditto
# Stock :25/:465/:587 stay in parallel (no PROXY required) so internal
# Roundcube/probe traffic on mailserver.svc ClusterIP keeps working.
# Dovecot alt IMAPS listener on :10993 is configured via dovecot.cf
# (not here) because that's a Dovecot config, not a Postfix master.cf.
set -euxo pipefail
MASTER_CF=/etc/postfix/master.cf
SENTINEL='# code-yiu:alt-proxy'
if ! grep -qF "$SENTINEL" "$MASTER_CF"; then
cat >> "$MASTER_CF" <<'PFXEOF'
# code-yiu:alt-proxy PROXY-speaking alt listeners for pfSense HAProxy backend pool.
# Mirrors stock docker-mailserver submission/submissions options (incl. SASL via
# Dovecot's /dev/shm/sasl-auth.sock) but with PROXY v2 upstream. chroot=n so the
# SASL path is readable from the smtpd process (sockets live outside /var/spool).
2525 inet n - n - 1 postscreen
-o syslog_name=postfix/smtpd-proxy25
-o postscreen_upstream_proxy_protocol=haproxy
-o postscreen_upstream_proxy_timeout=5s
4465 inet n - n - - smtpd
-o syslog_name=postfix/smtpd-proxy465
-o smtpd_tls_wrappermode=yes
-o smtpd_sasl_auth_enable=yes
-o smtpd_sasl_type=dovecot
-o smtpd_tls_auth_only=yes
-o smtpd_reject_unlisted_recipient=no
-o smtpd_sasl_authenticated_header=yes
-o smtpd_client_restrictions=permit_sasl_authenticated,reject
-o smtpd_relay_restrictions=permit_sasl_authenticated,reject
-o smtpd_sender_restrictions=$mua_sender_restrictions
-o smtpd_discard_ehlo_keywords=
-o milter_macro_daemon_name=ORIGINATING
-o cleanup_service_name=sender-cleanup
-o smtpd_upstream_proxy_protocol=haproxy
-o smtpd_upstream_proxy_timeout=5s
5587 inet n - n - - smtpd
-o syslog_name=postfix/smtpd-proxy587
-o smtpd_tls_security_level=encrypt
-o smtpd_sasl_auth_enable=yes
-o smtpd_sasl_type=dovecot
-o smtpd_tls_auth_only=yes
-o smtpd_reject_unlisted_recipient=no
-o smtpd_sasl_authenticated_header=yes
-o smtpd_client_restrictions=permit_sasl_authenticated,reject
-o smtpd_relay_restrictions=permit_sasl_authenticated,reject
-o smtpd_sender_restrictions=$mua_sender_restrictions
-o smtpd_discard_ehlo_keywords=
-o milter_macro_daemon_name=ORIGINATING
-o cleanup_service_name=sender-cleanup
-o smtpd_upstream_proxy_protocol=haproxy
-o smtpd_upstream_proxy_timeout=5s
PFXEOF
fi
EOT
}
}
# data = {
# user_patches = <<EOF
# #!/bin/bash
# cp -f /tmp/dovecot.key /etc/dovecot/ssl/dovecot.key
# cp -f /tmp/dovecot.crt /etc/dovecot/ssl/dovecot.pem
# EOF
# }
# }
resource "kubernetes_secret" "opendkim_key" {
metadata {
@ -332,8 +230,7 @@ resource "kubernetes_deployment" "mailserver" {
template {
metadata {
annotations = {
"diun.enable" = "true"
"diun.include_tags" = "^latest$"
# "diun.enable" = "true"
}
labels = {
"app" = "mailserver"
@ -345,14 +242,11 @@ resource "kubernetes_deployment" "mailserver" {
name = "docker-mailserver"
image = "docker.io/mailserver/docker-mailserver:15.0.0"
image_pull_policy = "IfNotPresent"
# NET_ADMIN was originally required by docker-mailserver's
# Fail2ban (iptables ban actions). Fail2ban is DISABLED in this
# stack (ENABLE_FAIL2BAN=0, see above); CrowdSec owns the
# brute-force policy. The capability is therefore unnecessary.
# Dropped on 2026-04-19 (code-4mu). If mail flow regresses,
# `kubectl logs -n mailserver -l app=mailserver -c docker-mailserver`
# will show permission-denied errors; revert if observed.
security_context {}
security_context {
capabilities {
add = ["NET_ADMIN"]
}
}
lifecycle {
post_start {
@ -482,15 +376,6 @@ resource "kubernetes_deployment" "mailserver" {
sub_path = "fail2ban_conf"
read_only = true
}
# code-yiu Phase 1a: user-patches.sh runs at container startup to
# append PROXY-speaking listeners to master.cf (see
# kubernetes_config_map.mailserver_user_patches).
volume_mount {
name = "user-patches"
mount_path = "/tmp/docker-mailserver/user-patches.sh"
sub_path = "user-patches.sh"
read_only = true
}
port {
name = "smtp"
container_port = 25
@ -511,29 +396,6 @@ resource "kubernetes_deployment" "mailserver" {
container_port = 993
protocol = "TCP"
}
# code-yiu Phase 5: alt PROXY-speaking listeners.
# Postfix: 2525 (postscreen), 4465 (smtps), 5587 (submission).
# Dovecot: 10993 (imaps). All require PROXY v2 from pfSense HAProxy.
port {
name = "smtp-proxy"
container_port = 2525
protocol = "TCP"
}
port {
name = "smtps-proxy"
container_port = 4465
protocol = "TCP"
}
port {
name = "sub-proxy"
container_port = 5587
protocol = "TCP"
}
port {
name = "imaps-proxy"
container_port = 10993
protocol = "TCP"
}
env_from {
config_map_ref {
name = "mailserver.env.config"
@ -550,25 +412,35 @@ resource "kubernetes_deployment" "mailserver" {
}
}
readiness_probe {
tcp_socket {
port = 25
}
initial_delay_seconds = 30
period_seconds = 10
}
liveness_probe {
tcp_socket {
port = 993
}
initial_delay_seconds = 60
period_seconds = 60
timeout_seconds = 15
}
}
container {
name = "dovecot-exporter"
image = "viktorbarzin/dovecot_exporter:latest"
command = [
"/dovecot_exporter/exporter",
"--dovecot.socket-path=/var/run/dovecot/stats-reader"
]
image_pull_policy = "IfNotPresent"
port {
name = "dovecotexporter"
container_port = 9166
protocol = "TCP"
}
volume_mount {
name = "var-run-dovecot"
mount_path = "/var/run/dovecot"
}
resources {
requests = {
cpu = "10m"
memory = "32Mi"
}
limits = {
memory = "32Mi"
}
}
}
volume {
name = "config"
@ -600,14 +472,12 @@ resource "kubernetes_deployment" "mailserver" {
# fs_type = "ext4"
# }
}
# code-yiu Phase 1a
volume {
name = "user-patches"
config_map {
name = kubernetes_config_map.mailserver_user_patches.metadata[0].name
default_mode = "0755"
}
}
# volume {
# name = "user-patches"
# config_map {
# name = "user-patches"
# }
# }
volume {
name = "var-run-dovecot"
empty_dir {}
@ -624,13 +494,6 @@ resource "kubernetes_deployment" "mailserver" {
}
resource "kubernetes_service" "mailserver" {
# code-yiu Phase 6: downgraded from LoadBalancer (MetalLB 10.0.20.202,
# ETP: Local) to ClusterIP on 2026-04-19. External mail now enters via
# pfSense HAProxy → kubernetes_service.mailserver_proxy NodePort → alt
# PROXY-speaking listeners. This Service exists only for intra-cluster
# clients (Roundcube pod, email-roundtrip-monitor CronJob) that talk to
# `mailserver.mailserver.svc.cluster.local:{25,465,587,993}` on the
# stock (PROXY-free) container listeners.
metadata {
name = "mailserver"
namespace = kubernetes_namespace.mailserver.metadata[0].name
@ -638,10 +501,15 @@ resource "kubernetes_service" "mailserver" {
labels = {
app = "mailserver"
}
annotations = {
"metallb.io/loadBalancerIPs" = "10.0.20.202"
}
}
spec {
type = "ClusterIP"
type = "LoadBalancer"
external_traffic_policy = "Local"
selector = {
app = "mailserver"
}
@ -673,65 +541,12 @@ resource "kubernetes_service" "mailserver" {
port = 993
target_port = "imap-secure"
}
}
}
# The `mailserver-metrics` ClusterIP Service (formerly split from the
# main LB in code-izl) was retired in code-1ik when the Dovecot
# exporter was removed: the exporter spoke the pre-Dovecot-2.3
# old_stats protocol which docker-mailserver 15.0.0 no longer
# emits, so the scrape was a no-op. If a working exporter is ever
# re-introduced, add back: ClusterIP Service exposing port 9166
# with selector app=mailserver.
# code-yiu Phase 1a: NodePort Service for pfSense HAProxy backend connections.
# External SMTP flow post-cutover:
# Client → pfSense WAN:25 → pfSense HAProxy → k8s-node:30125 (NodePort
# targeting container :2525 on any node, ETP: Cluster) → pod postscreen
# with PROXY v2 parsing → real client IP in maillog.
# Internal flow (Roundcube, probe) stays on the mailserver ClusterIP Service
# hitting container :25 without PROXY; unchanged.
resource "kubernetes_service" "mailserver_proxy" {
metadata {
name = "mailserver-proxy"
namespace = kubernetes_namespace.mailserver.metadata[0].name
labels = {
app = "mailserver"
}
}
spec {
type = "NodePort"
external_traffic_policy = "Cluster"
selector = {
app = "mailserver"
}
port {
name = "smtp-proxy"
name = "dovecot-metrics"
protocol = "TCP"
port = 25
target_port = 2525
node_port = 30125
}
port {
name = "smtps-proxy"
protocol = "TCP"
port = 465
target_port = 4465
node_port = 30126
}
port {
name = "sub-proxy"
protocol = "TCP"
port = 587
target_port = 5587
node_port = 30127
}
port {
name = "imaps-proxy"
protocol = "TCP"
port = 993
target_port = 10993
node_port = 30128
port = 9166
target_port = 9166
}
}
}
@ -897,68 +712,32 @@ except Exception as e:
duration = time.time() - start
print(f"ERROR: {e}")
# Push metrics to Pushgateway. On failure we omit email_roundtrip_last_success_timestamp
# and POST (not PUT) so the prior successful timestamp is preserved; otherwise pushing 0
# makes EmailRoundtripStale fire immediately alongside EmailRoundtripFailing.
metric_lines = [
"# HELP email_roundtrip_success Whether the last e2e email probe succeeded",
"# TYPE email_roundtrip_success gauge",
f"email_roundtrip_success {success}",
"# HELP email_roundtrip_duration_seconds Duration of the last e2e email probe",
"# TYPE email_roundtrip_duration_seconds gauge",
f"email_roundtrip_duration_seconds {duration:.2f}",
]
if success:
metric_lines += [
"# HELP email_roundtrip_last_success_timestamp Unix timestamp of last successful probe",
"# TYPE email_roundtrip_last_success_timestamp gauge",
f"email_roundtrip_last_success_timestamp {int(time.time())}",
]
metrics = "\n".join(metric_lines) + "\n"
UPTIME_KUMA_URL = "http://uptime-kuma.uptime-kuma.svc.cluster.local/api/push/hLtyRKgeZO?status=up&msg=OK&ping=" + str(int(duration))
def push_with_retry(label, func, url):
# Up to 3 attempts with exponential backoff between them (1s, then 2s). Returns True on success, False otherwise.
# Final failure logs ERROR with URL + status code (or exception) so the pod log surfaces the drop.
last_status = None
last_exc = None
for attempt in range(3):
try:
resp = func()
last_status = resp.status_code
if 200 <= resp.status_code < 300:
print(f"Pushed to {label} (attempt {attempt+1}, status {resp.status_code})")
return True
last_exc = None
except Exception as e:
last_exc = e
last_status = None
if attempt < 2:
time.sleep(2 ** attempt)
detail = f"status={last_status}" if last_exc is None else f"exception={last_exc!r}"
print(f"ERROR: Failed to push to {label} after 3 attempts: url={url} {detail}", file=sys.stderr)
return False
pushgateway_ok = push_with_retry(
"Pushgateway",
lambda: requests.post(PUSHGATEWAY, data=metrics, timeout=10),
PUSHGATEWAY,
)
# Push metrics to Pushgateway
metrics = f"""# HELP email_roundtrip_success Whether the last e2e email probe succeeded
# TYPE email_roundtrip_success gauge
email_roundtrip_success {success}
# HELP email_roundtrip_duration_seconds Duration of the last e2e email probe
# TYPE email_roundtrip_duration_seconds gauge
email_roundtrip_duration_seconds {duration:.2f}
# HELP email_roundtrip_last_success_timestamp Unix timestamp of last successful probe
# TYPE email_roundtrip_last_success_timestamp gauge
email_roundtrip_last_success_timestamp {int(time.time()) if success else 0}
"""
try:
requests.put(PUSHGATEWAY, data=metrics, timeout=10)
print("Pushed metrics to Pushgateway")
except Exception as e:
print(f"Failed to push metrics: {e}")
# Push to Uptime Kuma on success
uptime_kuma_ok = True
if success:
uptime_kuma_ok = push_with_retry(
"Uptime Kuma",
lambda: requests.get(UPTIME_KUMA_URL, timeout=10),
UPTIME_KUMA_URL,
)
try:
requests.get("http://uptime-kuma.uptime-kuma.svc.cluster.local/api/push/hLtyRKgeZO?status=up&msg=OK&ping=" + str(int(duration)), timeout=10)
print("Pushed to Uptime Kuma")
except Exception as e:
print(f"Failed to push to Uptime Kuma: {e}")
# Exit non-zero when the round-trip itself failed, OR when BOTH push endpoints
# failed after all retries (only possible on the success path; on failure we
# only attempt Pushgateway, and the round-trip failure already dominates exit).
both_pushes_failed = success and (not pushgateway_ok) and (not uptime_kuma_ok)
sys.exit(0 if (success and not both_pushes_failed) else 1)
sys.exit(0 if success else 1)
'
EOT
]
@ -1149,381 +928,3 @@ resource "kubernetes_cron_job_v1" "mailserver-backup" {
}
}
# =============================================================================
# Roundcube Backup: daily rsync of html + enigma PVCs to NFS
# Roundcube uses two encrypted RWO PVCs (see roundcubemail.tf):
# - roundcubemail-html-encrypted → /var/www/html (plugins, user sessions, skin overrides)
# - roundcubemail-enigma-encrypted → /var/roundcube/enigma (user-uploaded PGP keys)
# Losing either one = users lose plugin state + have to re-import PGP keys.
# Mirrors the mailserver-backup pattern but:
# - pod_affinity targets app=roundcubemail (both PVCs attach to the
# Roundcube pod, not mailserver)
# - schedule offset by +10m (03:10) so two NFS-writers don't overlap
# - writes to /srv/nfs/roundcube-backup/<YYYY-WW>/{html,enigma}/
# =============================================================================
module "nfs_roundcube_backup_host" {
source = "../../../../modules/kubernetes/nfs_volume"
name = "roundcube-backup-host"
namespace = kubernetes_namespace.mailserver.metadata[0].name
nfs_server = var.nfs_server
nfs_path = "/srv/nfs/roundcube-backup"
}
resource "kubernetes_cron_job_v1" "roundcube-backup" {
metadata {
name = "roundcube-backup"
namespace = kubernetes_namespace.mailserver.metadata[0].name
}
spec {
concurrency_policy = "Replace"
failed_jobs_history_limit = 5
# +10 min offset vs mailserver-backup (03:00) to avoid NFS contention.
schedule = "10 3 * * *"
starting_deadline_seconds = 10
successful_jobs_history_limit = 10
job_template {
metadata {}
spec {
backoff_limit = 3
ttl_seconds_after_finished = 10
template {
metadata {}
spec {
# RWO co-location: Roundcube PVCs are ReadWriteOnce; the backup
# pod must land on the same node as the Roundcube pod (single
# replica, Recreate strategy; see roundcubemail.tf).
affinity {
pod_affinity {
required_during_scheduling_ignored_during_execution {
label_selector {
match_labels = {
app = "roundcubemail"
}
}
topology_key = "kubernetes.io/hostname"
}
}
}
container {
name = "roundcube-backup"
image = "docker.io/library/alpine"
command = ["/bin/sh", "-c", <<-EOT
set -euxo pipefail
apk add --no-cache rsync
_t0=$(date +%s)
_rb0=$(awk '/^read_bytes/{print $2}' /proc/$$/io 2>/dev/null || echo 0)
_wb0=$(awk '/^write_bytes/{print $2}' /proc/$$/io 2>/dev/null || echo 0)
week=$(date +"%Y-%W")
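# BusyBox date (Alpine) does not parse relative strings like "-7 days",
# so derive last week's key from an epoch offset (604800 s = 7 days).
prev_week=$(date -d "@$(( _t0 - 604800 ))" +"%Y-%W" 2>/dev/null || echo "")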
dst=/backup/$week
mkdir -p "$dst"
# Use --link-dest against previous week for space-efficient
# incrementals (unchanged files are hardlinked, not re-copied).
link_dest_arg=""
if [ -n "$prev_week" ] && [ -d "/backup/$prev_week" ]; then
link_dest_arg="--link-dest=/backup/$prev_week"
fi
# Roundcube data layout (from deployment volume mounts in roundcubemail.tf):
# /src/html -> roundcubemail-html-encrypted (html PVC)
# /src/enigma -> roundcubemail-enigma-encrypted (enigma PVC, PGP keys)
for src in /src/html /src/enigma; do
[ -d "$src" ] || { echo "SKIP missing $src"; continue; }
name=$(basename "$src")
rsync -aH --delete $link_dest_arg "$src/" "$dst/$name/"
done
# Rotate: keep 8 weekly snapshots (~2 months)
find /backup -maxdepth 1 -mindepth 1 -type d -regex '.*/[0-9]+-[0-9]+$' | sort | head -n -8 | xargs -r rm -rf
_dur=$(($(date +%s) - _t0))
_rb1=$(awk '/^read_bytes/{print $2}' /proc/$$/io 2>/dev/null || echo 0)
_wb1=$(awk '/^write_bytes/{print $2}' /proc/$$/io 2>/dev/null || echo 0)
echo "=== Backup IO Stats ==="
echo "duration: $${_dur}s"
echo "read: $(( (_rb1 - _rb0) / 1048576 )) MiB"
echo "written: $(( (_wb1 - _wb0) / 1048576 )) MiB"
echo "output: $(du -sh "$dst" | awk '{print $1}')"
_out_bytes=$(du -sb "$dst" | awk '{print $1}')
wget -qO- --post-data "backup_duration_seconds $${_dur}
backup_read_bytes $(( _rb1 - _rb0 ))
backup_written_bytes $(( _wb1 - _wb0 ))
backup_output_bytes $${_out_bytes}
backup_last_success_timestamp $(date +%s)
" "http://prometheus-prometheus-pushgateway.monitoring:9091/metrics/job/roundcube-backup" || true
EOT
]
volume_mount {
name = "html"
mount_path = "/src/html"
read_only = true
}
volume_mount {
name = "enigma"
mount_path = "/src/enigma"
read_only = true
}
volume_mount {
name = "backup"
mount_path = "/backup"
}
}
volume {
name = "html"
persistent_volume_claim {
claim_name = kubernetes_persistent_volume_claim.roundcube_html_encrypted.metadata[0].name
read_only = true
}
}
volume {
name = "enigma"
persistent_volume_claim {
claim_name = kubernetes_persistent_volume_claim.roundcube_enigma_encrypted.metadata[0].name
read_only = true
}
}
volume {
name = "backup"
persistent_volume_claim {
claim_name = module.nfs_roundcube_backup_host.claim_name
}
}
dns_config {
option {
name = "ndots"
value = "2"
}
}
}
}
}
}
}
lifecycle {
# KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config]
}
}
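The weekly snapshot naming and the keep-8 rotation used by the backup script can be sketched in Python. This is an illustrative model only: `week_key` and `snapshots_to_delete` are hypothetical names, and the sketch works on directory basenames rather than the full paths that `find` emits.

```python
import re
from datetime import datetime, timedelta


def week_key(moment: datetime) -> str:
    """Directory name for a snapshot, same format as `date +"%Y-%W"`."""
    return moment.strftime("%Y-%W")


def previous_week_key(moment: datetime) -> str:
    """Key of the snapshot that --link-dest would point at."""
    return week_key(moment - timedelta(days=7))


def snapshots_to_delete(names, keep=8):
    """Return the snapshot names a rotation pass would remove, oldest first."""
    # Only names shaped like <digits>-<digits> count as snapshots, which
    # mirrors the regex filter in the shell pipeline.
    weekly = sorted(n for n in names if re.fullmatch(r"[0-9]+-[0-9]+", n))
    # Lexicographic order matches chronological order for zero-padded keys.
    return weekly[: max(0, len(weekly) - keep)]
```

Because the keys are zero-padded, a plain string sort is enough; entries such as `lost+found` are never candidates for deletion.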
# =============================================================================
# Spam mailbox: targeted retention (code-oy4)
#
# The @viktorbarzin.me catch-all routes to spam@viktorbarzin.me. Unbounded
# growth (~43 MiB baseline on 2026-04-18, 519 messages, top sender
# tldrnewsletter.com = 138 msgs / 8.2 MiB) makes it painful to triage.
# Profile (2026-04-18):
# - 502/519 messages older than 14 days (97 %)
# - 342/519 carry List-Unsubscribe: (66 %)
# - 21/519 carry Precedence: bulk ( 4 %)
# - 177/519 carry neither marker (= human-ish, 34 %)
#
# Strategy (user-signed-off 2026-04-18, do NOT blind-age-expunge):
# - Messages older than 14 days carrying List-Unsubscribe OR
# Precedence: bulk|list|junk OR Auto-Submitted: auto-* -> DELETE
# - Messages older than 90 days with no automated-sender marker
# -> DELETE (long-tail human forwards)
# - Everything else -> KEEP
#
# Implementation: kubectl exec into the mailserver pod because the
# Maildir lives on a RWO encrypted PVC; a sibling CronJob would fail to
# attach the volume while the mailserver pod holds it. Pattern mirrors
# the `nextcloud-watchdog` in stacks/nextcloud/main.tf.
# =============================================================================
resource "kubernetes_service_account" "spam_retention" {
metadata {
name = "spam-retention"
namespace = kubernetes_namespace.mailserver.metadata[0].name
}
}
resource "kubernetes_role" "spam_retention" {
metadata {
name = "spam-retention"
namespace = kubernetes_namespace.mailserver.metadata[0].name
}
rule {
api_groups = [""]
resources = ["pods"]
verbs = ["list", "get"]
}
rule {
api_groups = [""]
resources = ["pods/exec"]
verbs = ["create"]
}
}
resource "kubernetes_role_binding" "spam_retention" {
metadata {
name = "spam-retention"
namespace = kubernetes_namespace.mailserver.metadata[0].name
}
role_ref {
api_group = "rbac.authorization.k8s.io"
kind = "Role"
name = kubernetes_role.spam_retention.metadata[0].name
}
subject {
kind = "ServiceAccount"
name = kubernetes_service_account.spam_retention.metadata[0].name
namespace = kubernetes_namespace.mailserver.metadata[0].name
}
}
resource "kubernetes_cron_job_v1" "spam_retention" {
metadata {
name = "spam-retention"
namespace = kubernetes_namespace.mailserver.metadata[0].name
}
spec {
schedule = "17 */4 * * *"
concurrency_policy = "Forbid"
successful_jobs_history_limit = 2
failed_jobs_history_limit = 3
starting_deadline_seconds = 300
job_template {
metadata {}
spec {
active_deadline_seconds = 600
backoff_limit = 1
ttl_seconds_after_finished = 600
template {
metadata {}
spec {
service_account_name = kubernetes_service_account.spam_retention.metadata[0].name
restart_policy = "Never"
container {
name = "spam-retention"
image = "bitnami/kubectl:latest"
command = ["/bin/bash", "-c", <<-EOF
set -euo pipefail
POD=$(kubectl -n mailserver get pods -l app=mailserver -o jsonpath='{.items[0].metadata.name}')
if [ -z "$POD" ]; then
echo "ERROR: no mailserver pod found" >&2
exit 1
fi
echo "Targeting pod $POD"
# Stream the retention script to python3 inside the mailserver
# container via stdin. Keeping the logic in Python avoids the
# POSIX-sh/awk fragility around stat(1) differences and header
# matching.
kubectl -n mailserver exec -i "$POD" -c docker-mailserver -- python3 - <<'PYEOF'
import os
import re
import sys
import time
SPAM = "/var/mail/viktorbarzin.me/spam/cur"
# Retention thresholds, in days, one per rule.
AUTOMATED_MAX_AGE_DAYS = 14
HUMAN_MAX_AGE_DAYS = 90
HEADER_SCAN_BYTES = 65536
AUTO_PATTERNS = (
re.compile(rb"^list-unsubscribe:", re.IGNORECASE),
re.compile(rb"^precedence:\s*(bulk|list|junk)", re.IGNORECASE),
re.compile(rb"^auto-submitted:\s*auto-", re.IGNORECASE),
)
def is_automated(path):
try:
with open(path, "rb") as fh:
head = fh.read(HEADER_SCAN_BYTES)
except OSError:
return False
hdr, _, _ = head.partition(b"\r\n\r\n")
if hdr == head:
hdr, _, _ = head.partition(b"\n\n")
for line in hdr.splitlines():
for pat in AUTO_PATTERNS:
if pat.search(line):
return True
return False
if not os.path.isdir(SPAM):
print(f"SKIP: {SPAM} does not exist")
sys.exit(0)
now = time.time()
scanned = auto_deleted = human_deleted = kept = errors = 0
for entry in sorted(os.listdir(SPAM)):
path = os.path.join(SPAM, entry)
try:
st = os.stat(path)
except OSError:
errors += 1
continue
if not os.path.isfile(path):
continue
scanned += 1
age_days = (now - st.st_mtime) / 86400
automated = is_automated(path)
if automated and age_days > AUTOMATED_MAX_AGE_DAYS:
try:
os.unlink(path)
auto_deleted += 1
except OSError:
errors += 1
continue
if (not automated) and age_days > HUMAN_MAX_AGE_DAYS:
try:
os.unlink(path)
human_deleted += 1
except OSError:
errors += 1
continue
kept += 1
# Metric lines (Pushgateway-compatible format). The parent
# kubectl wrapper logs them for now; Pushgateway integration
# is a follow-up.
print(f"spam_retention_scanned_total {scanned}")
print(f"spam_retention_auto_deleted_total {auto_deleted}")
print(f"spam_retention_human_deleted_total {human_deleted}")
print(f"spam_retention_kept_total {kept}")
print(f"spam_retention_errors_total {errors}")
sys.exit(1 if errors else 0)
PYEOF
# Refresh Dovecot index so IMAP sees the deletions immediately.
kubectl -n mailserver exec "$POD" -c docker-mailserver -- \
doveadm force-resync -u spam@viktorbarzin.me INBOX/spam || true
echo "Retention pass complete"
EOF
]
resources {
requests = {
cpu = "10m"
memory = "32Mi"
}
limits = {
memory = "128Mi"
}
}
}
dns_config {
option {
name = "ndots"
value = "2"
}
}
}
}
}
}
}
lifecycle {
# KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config]
}
}
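The retention strategy from the spam-retention comment block can be isolated as two small functions, which makes the rule easy to check without touching a Maildir. This is a sketch under assumptions: `is_automated_message` and `retention_action` are hypothetical names, and the function takes raw message bytes instead of a file path.

```python
import re

AUTOMATED_MAX_AGE_DAYS = 14
HUMAN_MAX_AGE_DAYS = 90

# Same three automated-sender markers as the retention script.
AUTO_PATTERNS = (
    re.compile(rb"^list-unsubscribe:", re.IGNORECASE),
    re.compile(rb"^precedence:\s*(bulk|list|junk)", re.IGNORECASE),
    re.compile(rb"^auto-submitted:\s*auto-", re.IGNORECASE),
)


def is_automated_message(raw: bytes) -> bool:
    """True if the header section carries an automated-sender marker."""
    # Cut at the first blank line so body text cannot trigger a match.
    headers, _, _ = raw.partition(b"\r\n\r\n")
    if headers == raw:
        headers, _, _ = raw.partition(b"\n\n")
    return any(p.search(line) for line in headers.splitlines() for p in AUTO_PATTERNS)


def retention_action(age_days: float, automated: bool) -> str:
    """Return 'delete_automated', 'delete_human', or 'keep'."""
    if automated and age_days > AUTOMATED_MAX_AGE_DAYS:
        return "delete_automated"
    if not automated and age_days > HUMAN_MAX_AGE_DAYS:
        return "delete_human"
    return "keep"
```

Note that an automated message is only ever judged against the 14-day limit and a human-looking one only against the 90-day limit, so the two rules never overlap.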

View file

@ -267,7 +267,6 @@ module "ingress" {
name = "mail"
service_name = "roundcubemail"
tls_secret_name = var.tls_secret_name
protected = true
extra_annotations = {
"gethomepage.dev/enabled" = "true"
"gethomepage.dev/name" = "Roundcube Mail"

View file

@ -12,13 +12,6 @@ smtp_tls_security_level = encrypt
smtpd_tls_cert_file=/tmp/ssl/tls.crt
smtpd_tls_key_file=/tmp/ssl/tls.key
smtpd_use_tls=yes
# Require STARTTLS before any AUTH command on the SMTPD listener.
# Without this, a misconfigured client that skips STARTTLS would send
# PLAIN/LOGIN creds in the clear. docker-mailserver's default does NOT
# enforce this at the main.cf level for submission (587).
# Note: smtpd_sasl_auth_only (sometimes cited) is NOT a real Postfix
# parameter; only smtpd_tls_auth_only is. Addresses code-vnw.
smtpd_tls_auth_only = yes
header_size_limit = 4096000
# Debug mail tls
@ -44,3 +37,139 @@ anvil_rate_time_unit = 60s
postscreen_cache_map =
EOT
}
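With `smtpd_tls_auth_only = yes`, Postfix does not announce AUTH on an unencrypted connection, so the effect of the setting can be checked by inspecting the EHLO reply before STARTTLS. The sketch below only parses reply lines; `auth_offered` is a hypothetical helper, and the sample replies in the usage note are made up for illustration.

```python
def auth_offered(ehlo_lines):
    """True if any EHLO reply line advertises the AUTH extension."""
    for line in ehlo_lines:
        # Reply lines look like "250-AUTH PLAIN LOGIN" or "250 AUTH=PLAIN";
        # drop the 3-digit code and the separator character.
        keyword = line[4:].strip().upper()
        if keyword == "AUTH" or keyword.startswith(("AUTH ", "AUTH=")):
            return True
    return False


def violates_tls_auth_only(pre_tls_ehlo_lines):
    """True if AUTH is announced before the session is encrypted."""
    return auth_offered(pre_tls_ehlo_lines)
```

For example, a pre-STARTTLS reply of `["250-mail.example.test", "250-STARTTLS", "250 8BITMIME"]` passes, while one containing `"250-AUTH PLAIN LOGIN"` is flagged. Against a live server, the standard library's `smtplib.SMTP.ehlo()` followed by `has_extn("auth")` gives the same information.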
variable "postfix_cf_reference_DO_NOT_USE" {
default = <<EOT
# See /usr/share/postfix/main.cf.dist for a commented, more complete version
smtpd_banner = $myhostname ESMTP $mail_name (Debian)
biff = no
append_dot_mydomain = no
readme_directory = no
# Basic configuration
# myhostname =
alias_maps = hash:/etc/aliases
alias_database = hash:/etc/aliases
mydestination = $myhostname, localhost.$mydomain, localhost
mynetworks = 127.0.0.0/8 [::1]/128 [fe80::]/64
mailbox_size_limit = 0
recipient_delimiter = +
inet_interfaces = all
inet_protocols = ipv4
# TLS parameters
smtpd_tls_cert_file=/tmp/ssl/tls.crt
smtpd_tls_key_file=/tmp/ssl/tls.key
#smtpd_tls_CAfile=
#smtp_tls_CAfile=
smtpd_tls_security_level = may
smtpd_use_tls=yes
smtpd_tls_loglevel = 1
smtp_tls_loglevel = 1
tls_ssl_options = NO_COMPRESSION
tls_high_cipherlist = ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA:ECDHE-RSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-RSA-AES256-SHA256:DHE-RSA-AES256-SHA:ECDHE-ECDSA-DES-CBC3-SHA:ECDHE-RSA-DES-CBC3-SHA:EDH-RSA-DES-CBC3-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:DES-CBC3-SHA:!DSS
tls_preempt_cipherlist = yes
smtpd_tls_protocols = !SSLv2,!SSLv3
smtp_tls_protocols = !SSLv2,!SSLv3
smtpd_tls_mandatory_ciphers = high
smtpd_tls_mandatory_protocols = !SSLv2,!SSLv3
smtpd_tls_exclude_ciphers = aNULL, LOW, EXP, MEDIUM, ADH, AECDH, MD5, DSS, ECDSA, CAMELLIA128, 3DES, CAMELLIA256, RSA+AES, eNULL
smtpd_tls_dh1024_param_file = /etc/postfix/dhparams.pem
smtpd_tls_CApath = /etc/ssl/certs
smtp_tls_CApath = /etc/ssl/certs
# Settings to prevent SPAM early
smtpd_helo_required = yes
smtpd_delay_reject = yes
smtpd_helo_restrictions = permit_mynetworks, reject_invalid_helo_hostname, permit
#smtpd_relay_restrictions = permit_mynetworks permit_sasl_authenticated defer_unauth_destination
#smtpd_relay_restrictions = reject_sender_login_mismatch permit_sasl_authenticated permit_mynetworks defer_unauth_destination
smtpd_relay_restrictions = reject_sender_login_mismatch permit_sasl_authenticated permit_mynetworks defer_unauth_destination
smtpd_recipient_restrictions = permit_sasl_authenticated, reject_unauth_destination, reject_unauth_pipelining, reject_invalid_helo_hostname, reject_non_fqdn_helo_hostname, reject_unknown_recipient_domain, reject_rbl_client bl.spamcop.net, permit_mynetworks
smtpd_client_restrictions = permit_mynetworks, permit_sasl_authenticated, reject_unauth_destination, reject_unauth_pipelining
#smtpd_sender_restrictions = reject_sender_login_mismatch, permit_sasl_authenticated, permit_mynetworks, reject_unknown_sender_domain
smtpd_sender_restrictions = reject_sender_login_mismatch, reject_authenticated_sender_login_mismatch, reject_unknown_sender_domain, permit_sasl_authenticated, permit_mynetworks
disable_vrfy_command = yes
# Postscreen settings to drop zombies/open relays/spam early
#postscreen_dnsbl_action = enforce
postscreen_dnsbl_action = ignore
postscreen_dnsbl_sites = zen.spamhaus.org*2
bl.mailspike.net
b.barracudacentral.org*2
bl.spameatingmonkey.net
bl.spamcop.net
dnsbl.sorbs.net
psbl.surriel.com
list.dnswl.org=127.0.[0..255].0*-2
list.dnswl.org=127.0.[0..255].1*-3
list.dnswl.org=127.0.[0..255].[2..3]*-4
postscreen_dnsbl_threshold = 3
postscreen_dnsbl_whitelist_threshold = -1
postscreen_greet_action = enforce
postscreen_bare_newline_action = enforce
# SASL
smtpd_sasl_auth_enable = no
#smtpd_sasl_auth_enable = yes
##smtpd_sasl_path = /var/spool/postfix/private/auth
#smtpd_sasl_path = /var/spool/postfix/private/smtpd
##smtpd_sasl_type = dovecot
#smtpd_sasl_type = dovecot
##smtpd_sasl_security_options = noanonymous
#smtpd_sasl_security_options = noanonymous
##smtpd_sasl_local_domain = $mydomain
##broken_sasl_auth_clients = yes
#broken_sasl_auth_clients = yes
# SMTP configuration
smtp_sasl_auth_enable = yes
smtp_sasl_password_maps = hash:/etc/postfix/sasl/passwd
smtp_sasl_security_options = noanonymous
smtp_sasl_tls_security_options = noanonymous
smtp_tls_security_level = encrypt
header_size_limit = 4096000
relayhost = [smtp.sendgrid.net]:587
# Mail directory
virtual_transport = lmtp:unix:/var/run/dovecot/lmtp
virtual_mailbox_domains = /etc/postfix/vhost
virtual_mailbox_maps = texthash:/etc/postfix/vmailbox
virtual_alias_maps = texthash:/etc/postfix/virtual
# Additional option for filtering
content_filter = smtp-amavis:[127.0.0.1]:10024
# Milters used by DKIM
milter_protocol = 6
milter_default_action = accept
dkim_milter = inet:localhost:8891
dmarc_milter = inet:localhost:8893
smtpd_milters = $dkim_milter,$dmarc_milter
non_smtpd_milters = $dkim_milter
# SPF policy settings
policyd-spf_time_limit = 3600
# Header checks for content inspection on receiving
header_checks = pcre:/etc/postfix/maps/header_checks.pcre
# Remove unwanted headers that reveal private information
smtp_header_checks = pcre:/etc/postfix/maps/sender_header_filter.pcre
myhostname = mail.viktorbarzin.me
mydomain = viktorbarzin.me
smtputf8_enable = no
message_size_limit = 20480000
sender_canonical_maps = tcp:localhost:10001
sender_canonical_classes = envelope_sender
recipient_canonical_maps = tcp:localhost:10002
recipient_canonical_classes = envelope_recipient,header_recipient
compatibility_level = 2
# enable_original_recipient = no # b4 uncommenting see https://serverfault.com/questions/661615/how-to-drop-orig-to-using-postfix-virtual-domains
always_add_missing_headers = yes
anvil_status_update_time = 5s
EOT
}

View file

@ -434,223 +434,6 @@
],
"title": "Transfer Speed (Global)",
"type": "timeseries"
},
{
"collapsed": false,
"gridPos": { "h": 1, "w": 24, "x": 0, "y": 39 },
"id": 103,
"title": "MAM Profile (from jsonLoad.php)",
"type": "row"
},
{
"datasource": { "type": "prometheus", "uid": "${datasource}" },
"fieldConfig": {
"defaults": {
"mappings": [
{ "type": "value", "options": {
"0": { "color": "red", "text": "Mouse" },
"1": { "color": "orange", "text": "Vole" },
"2": { "color": "yellow", "text": "User" },
"3": { "color": "green", "text": "Power User" },
"4": { "color": "green", "text": "Elite" },
"5": { "color": "blue", "text": "Torrent Master" },
"6": { "color": "blue", "text": "Power TM" },
"7": { "color": "purple", "text": "Elite TM" },
"8": { "color": "purple", "text": "VIP" }
} }
],
"thresholds": { "mode": "absolute", "steps": [
{ "color": "red", "value": null },
{ "color": "green", "value": 2 }
] }
}
},
"gridPos": { "h": 6, "w": 4, "x": 0, "y": 40 },
"id": 20,
"options": {
"colorMode": "background",
"graphMode": "none",
"justifyMode": "center",
"textMode": "value",
"reduceOptions": { "calcs": ["lastNotNull"] }
},
"targets": [{ "expr": "mam_class_code", "legendFormat": "Class" }],
"title": "MAM Class",
"type": "stat"
},
{
"datasource": { "type": "prometheus", "uid": "${datasource}" },
"fieldConfig": {
"defaults": {
"thresholds": { "mode": "absolute", "steps": [
{ "color": "red", "value": null },
{ "color": "orange", "value": 0.8 },
{ "color": "green", "value": 1.2 }
] },
"decimals": 3
}
},
"gridPos": { "h": 6, "w": 4, "x": 4, "y": 40 },
"id": 21,
"options": {
"colorMode": "background",
"graphMode": "area",
"justifyMode": "center",
"textMode": "value",
"reduceOptions": { "calcs": ["lastNotNull"] }
},
"targets": [{ "expr": "mam_ratio", "legendFormat": "Ratio" }],
"title": "MAM Ratio (profile)",
"type": "stat"
},
{
"datasource": { "type": "prometheus", "uid": "${datasource}" },
"fieldConfig": {
"defaults": {
"unit": "short",
"thresholds": { "mode": "absolute", "steps": [
{ "color": "red", "value": null },
{ "color": "green", "value": 5000 }
] }
}
},
"gridPos": { "h": 6, "w": 4, "x": 8, "y": 40 },
"id": 22,
"options": {
"colorMode": "background",
"graphMode": "area",
"justifyMode": "center",
"textMode": "value",
"reduceOptions": { "calcs": ["lastNotNull"] }
},
"targets": [{ "expr": "mam_bp_balance", "legendFormat": "BP" }],
"title": "MAM Bonus Points",
"type": "stat"
},
{
"datasource": { "type": "prometheus", "uid": "${datasource}" },
"fieldConfig": { "defaults": { "unit": "decbytes" } },
"gridPos": { "h": 6, "w": 12, "x": 12, "y": 40 },
"id": 23,
"options": {
"colorMode": "value",
"graphMode": "area",
"justifyMode": "center",
"textMode": "value_and_name",
"reduceOptions": { "calcs": ["lastNotNull"] }
},
"targets": [
{ "expr": "mam_downloaded_bytes", "legendFormat": "Downloaded" },
{ "expr": "mam_uploaded_bytes", "legendFormat": "Uploaded" }
],
"title": "MAM Transfer (profile)",
"type": "stat"
},
{
"datasource": { "type": "prometheus", "uid": "${datasource}" },
"fieldConfig": {
"defaults": {
"color": { "mode": "palette-classic" },
"custom": {
"drawStyle": "line",
"fillOpacity": 10,
"lineWidth": 2,
"showPoints": "never",
"spanNulls": true,
"thresholdsStyle": { "mode": "line" }
},
"thresholds": { "mode": "absolute", "steps": [
{ "color": "transparent", "value": null },
{ "color": "orange", "value": 500 }
] },
"unit": "short"
}
},
"gridPos": { "h": 8, "w": 12, "x": 0, "y": 46 },
"id": 24,
"options": {
"legend": { "calcs": ["lastNotNull", "min"], "displayMode": "table", "placement": "bottom" },
"tooltip": { "mode": "multi" }
},
"targets": [
{ "expr": "mam_bp_balance", "legendFormat": "BP Balance" },
{ "expr": "mam_bp_needed_gib * 500", "legendFormat": "Next-run cost (BP)" }
],
"title": "BP Balance vs Reserve",
"type": "timeseries"
},
{
"datasource": { "type": "prometheus", "uid": "${datasource}" },
"fieldConfig": {
"defaults": {
"color": { "mode": "palette-classic" },
"custom": {
"drawStyle": "bars",
"fillOpacity": 80,
"lineWidth": 1,
"stacking": { "mode": "normal" }
},
"unit": "short"
}
},
"gridPos": { "h": 8, "w": 12, "x": 12, "y": 46 },
"id": 25,
"options": {
"legend": { "calcs": ["lastNotNull", "sum"], "displayMode": "table", "placement": "bottom" },
"tooltip": { "mode": "multi" }
},
"targets": [
{
"expr": "mam_janitor_deleted_per_run",
"legendFormat": "{{reason}}"
}
],
"title": "Janitor Deletions per Run (by reason)",
"type": "timeseries"
},
{
"datasource": { "type": "prometheus", "uid": "${datasource}" },
"fieldConfig": {
"defaults": { "unit": "short" }
},
"gridPos": { "h": 6, "w": 12, "x": 0, "y": 54 },
"id": 26,
"options": {
"colorMode": "value",
"graphMode": "area",
"justifyMode": "center",
"textMode": "value_and_name",
"reduceOptions": { "calcs": ["lastNotNull"] }
},
"targets": [
{ "expr": "mam_janitor_preserved_hnr", "legendFormat": "Preserved (H&R <72h)" },
{ "expr": "mam_janitor_skipped_active", "legendFormat": "Skipped (in-progress)" },
{ "expr": "mam_janitor_dry_run", "legendFormat": "Dry-run mode" }
],
"title": "Janitor State",
"type": "stat"
},
{
"datasource": { "type": "prometheus", "uid": "${datasource}" },
"fieldConfig": {
"defaults": { "unit": "short" }
},
"gridPos": { "h": 6, "w": 12, "x": 12, "y": 54 },
"id": 27,
"options": {
"colorMode": "value",
"graphMode": "area",
"justifyMode": "center",
"textMode": "value_and_name",
"reduceOptions": { "calcs": ["lastNotNull"] }
},
"targets": [
{ "expr": "mam_farming_grabbed", "legendFormat": "Last run grabbed" },
{ "expr": "mam_farming_total_seeding", "legendFormat": "Total in farming" },
{ "expr": "sum by (reason) (mam_grabber_skipped_reason)", "legendFormat": "Grabber skipped: {{reason}}" }
],
"title": "Grabber State",
"type": "stat"
}
],
"refresh": "1m",

File diff suppressed because it is too large

View file

@ -5,8 +5,6 @@ deploymentStrategy:
maxUnavailable: 1
replicas: 1
adminPassword: "${grafana_admin_password}"
plugins:
- netsage-sankey-panel
resources:
requests:
cpu: 50m

View file

@ -1355,65 +1355,12 @@ serverFiles:
annotations:
summary: "PostgreSQL pod {{ $labels.pod }} is not ready"
- alert: RedisDown
# Covers both the legacy Bitnami StatefulSet (redis-node) and the
# new raw StatefulSet (redis-v2) during the 2026-04-19 migration.
# Drop the redis-node branch after helm_release.redis is removed.
expr: (sum(kube_statefulset_status_replicas_ready{namespace="redis", statefulset=~"redis-node|redis-v2"}) or on() vector(0)) < 1
expr: kube_statefulset_status_replicas_ready{namespace="redis", statefulset="redis-node"} < 1
for: 5m
labels:
severity: critical
annotations:
summary: "Redis has no ready replicas across both clusters"
- alert: RedisMemoryPressure
expr: redis_memory_used_bytes{namespace="redis"} / redis_memory_max_bytes{namespace="redis"} > 0.85
for: 5m
labels:
severity: warning
annotations:
summary: "Redis pod {{ $labels.pod }} using {{ $value | humanizePercentage }} of maxmemory — eviction imminent"
- alert: RedisEvictions
# allkeys-lru is configured so evictions under cache pressure are
# expected, but sustained evictions mean we're thrashing — raise it.
expr: rate(redis_evicted_keys_total{namespace="redis"}[5m]) > 0
for: 5m
labels:
severity: warning
annotations:
summary: "Redis pod {{ $labels.pod }} evicting keys ({{ $value }} keys/s)"
- alert: RedisReplicationLagHigh
expr: redis_connected_slave_lag_seconds{namespace="redis"} > 30
for: 3m
labels:
severity: warning
annotations:
summary: "Redis replica {{ $labels.slave_ip }} lagging {{ $value }}s behind master"
- alert: RedisForkLatencyHigh
# latest_fork_usec > 500ms means BGSAVE fork is stalling the main
# thread long enough to drop client requests. COW pressure or
# constrained memory headroom are the usual causes.
expr: redis_latest_fork_usec{namespace="redis"} > 500000
for: 0m
labels:
severity: warning
annotations:
summary: "Redis pod {{ $labels.pod }} fork took {{ $value }}us (>500ms) — investigate memory headroom"
- alert: RedisAOFRewriteLong
expr: redis_aof_rewrite_in_progress{namespace="redis"} == 1
for: 10m
labels:
severity: warning
annotations:
summary: "Redis pod {{ $labels.pod }} AOF rewrite running >10m — COW memory risk, investigate"
- alert: RedisReplicasMissing
# The redis-v2 StatefulSet runs 3 pods: the master plus 2 replicas
# connected to it. <2 connected_slaves means one replica is
# unreachable or still syncing.
expr: redis_connected_slaves{namespace="redis", pod=~"redis-v2-.*"} < 2 and redis_instance_info{namespace="redis", pod=~"redis-v2-.*", role="master"} == 1
for: 10m
labels:
severity: warning
annotations:
summary: "Redis master {{ $labels.pod }} has only {{ $value }} connected replicas (expected 2)"
summary: "Redis has no ready replicas"
- alert: HeadscaleDown
expr: (kube_deployment_status_replicas_available{namespace="headscale"} or on() vector(0)) < 1
for: 5m
@ -1921,24 +1868,13 @@ serverFiles:
summary: "NetFlow processing delay p50: {{ $value | printf \"%.0f\" }}s — softflowd may be overloaded"
- name: "DNS Anomaly Detection"
rules:
# Spike detection: compare current value against its own 1h history via
# avg_over_time. Previous version compared against dns_anomaly_avg_queries
# which was computed from a per-pod /tmp file and always equalled the
# current value (fresh /tmp each run), so the alert could never fire.
- alert: DNSQuerySpike
expr: dns_anomaly_total_queries > 2 * avg_over_time(dns_anomaly_total_queries[1h] offset 15m) and dns_anomaly_total_queries > 1000
expr: dns_anomaly_total_queries > 2 * dns_anomaly_avg_queries and dns_anomaly_total_queries > 1000
for: 0m
labels:
severity: warning
annotations:
summary: "DNS query spike: {{ $value | printf \"%.0f\" }} queries (>2x 1h avg)"
- alert: DNSQueryRateDropped
expr: dns_anomaly_total_queries < 0.5 * avg_over_time(dns_anomaly_total_queries[1h] offset 15m) and avg_over_time(dns_anomaly_total_queries[1h] offset 15m) > 1000
for: 10m
labels:
severity: warning
annotations:
summary: "DNS query volume dropped: {{ $value | printf \"%.0f\" }} queries (<50% of 1h avg) — upstream clients may be failing to reach Technitium"
summary: "DNS query spike: {{ $value | printf \"%.0f\" }} queries (>2x average)"
- alert: DNSHighErrorRate
expr: dns_anomaly_server_failure > 100
for: 0m
@ -1946,77 +1882,18 @@ serverFiles:
severity: warning
annotations:
summary: "High DNS SERVFAIL rate: {{ $value | printf \"%.0f\" }} failures detected"
- alert: TechnitiumZoneSyncFailed
expr: technitium_zone_sync_status != 0
for: 30m
labels:
severity: warning
annotations:
summary: "Technitium zone-sync CronJob has reported failure for 30m — replicas may be missing zones"
- alert: TechnitiumZoneSyncStale
expr: (time() - technitium_zone_sync_last_run) > 3600
for: 10m
labels:
severity: warning
annotations:
summary: "Technitium zone-sync has not run successfully in >1h (last: {{ $value | humanizeDuration }} ago)"
- alert: TechnitiumZoneCountMismatch
expr: (max(technitium_zone_count) - min(technitium_zone_count)) > 0
for: 15m
labels:
severity: warning
annotations:
summary: "Technitium zone counts differ across instances (max-min delta: {{ $value | printf \"%.0f\" }}) — replica has drifted from primary"
- alert: CoreDNSForwardFailureRate
expr: sum(rate(coredns_forward_responses_total{rcode=~"SERVFAIL|REFUSED"}[5m])) > 0.1
for: 5m
labels:
severity: warning
annotations:
summary: "CoreDNS forward SERVFAIL/REFUSED rate: {{ $value | printf \"%.2f\" }}/s — upstream DNS (pfSense/public) may be unhealthy"
- name: qbittorrent
rules:
- alert: MAMMouseClass
expr: mam_class_code == 0
- alert: QBittorrentMAMRatioLow
expr: qbt_tracker_ratio{tracker="mam"} < 1.0
for: 1h
labels:
severity: critical
annotations:
summary: "MAM account is in Mouse class — tracker is refusing announces, ratio cannot recover"
- alert: MAMCookieExpired
expr: mam_farming_cookie_expired > 0
for: 0m
labels:
severity: critical
annotations:
summary: "MAM session cookie has expired — refresh `mam_id` in Vault servarr/mam_id"
- alert: MAMRatioBelowOne
expr: mam_ratio < 1.0
for: 24h
labels:
severity: warning
annotations:
summary: "MAM ratio is {{ $value | printf \"%.2f\" }} for 24h (target: >= 1.0)"
- alert: MAMFarmingStuck
expr: |
increase(mam_farming_grabbed[4h]) == 0
and mam_farming_total_seeding < 150
and mam_ratio >= 1.2
for: 4h
labels:
severity: warning
annotations:
summary: "Grabber has added 0 torrents in 4h despite healthy ratio ({{ $value | printf \"%.2f\" }})"
- alert: MAMJanitorStuckBacklog
expr: mam_janitor_skipped_active > 400
for: 6h
labels:
severity: warning
annotations:
summary: "Janitor is skipping {{ $value | printf \"%.0f\" }} in-progress torrents — queue not draining"
summary: "MAM ratio is {{ $value | printf \"%.2f\" }} (must be >= 1.0)"
- alert: QBittorrentDisconnected
expr: qbt_connected == 0
for: 10m
for: 5m
labels:
severity: critical
annotations:
@ -2100,41 +1977,6 @@ serverFiles:
severity: warning
annotations:
summary: "Authentik outpost restarted {{ $value | printf \"%.0f\" }} times in 30m — check for OOM or crash loop"
- alert: AuthentikOutpostDevShmFull
# Direct filesystem measure of the /dev/shm emptyDir sizeLimit.
# The 2026-04-18 incident went undetected for 40h because working-set
# memory lags tmpfs fill (files count against memory but not always
# against working set). This rule catches the underlying cause.
# See docs/post-mortems/2026-04-18-authentik-outpost-shm-full.md.
expr: container_fs_usage_bytes{namespace="authentik", pod=~"ak-outpost-.*"} / container_fs_limit_bytes{namespace="authentik", pod=~"ak-outpost-.*"} > 0.8
for: 5m
labels:
severity: critical
annotations:
summary: "Authentik outpost filesystem at {{ $value | humanizePercentage }} on {{ $labels.pod }} — session files filling tmpfs, forward-auth imminent failure"
- alert: AuthentikOutpostForwardAuth400Spike
# Sudden 400 spike from the outpost means forward-auth is broken
# for all protected services. The /dev/shm ENOSPC class of failures
# manifests as the outpost returning 400 on /outpost.goauthentik.io/auth/traefik.
expr: sum by (service) (increase(traefik_service_requests_total{code="400", service=~"authentik-authentik-outpost.*"}[5m])) > 10
for: 2m
labels:
severity: critical
annotations:
summary: "Authentik outpost returning {{ $value | printf \"%.0f\" }} 400s in 5m on {{ $labels.service }} — forward-auth broken for all 43 protected services"
- alert: AuthentikServerReplicasMismatch
# With 3 replicas + PDB minAvailable=2, a sustained drop to <3
# means a node is unschedulable, image pull failing, or quota hit.
expr: (kube_deployment_spec_replicas{namespace="authentik", deployment="goauthentik-server"} - kube_deployment_status_replicas_available{namespace="authentik", deployment="goauthentik-server"}) > 0
for: 15m
labels:
severity: warning
annotations:
summary: "Authentik server has {{ $value }} unavailable replica(s) for 15m — check pod events"
# Mailserver Dovecot alerts were removed with the exporter in
# code-1ik (viktorbarzin/dovecot_exporter incompatible with
# Dovecot 2.3 stats architecture). Re-add the rule group if a
# working exporter is introduced.
- name: Infrastructure Drift
# Metrics pushed by .woodpecker/drift-detection.yml after each cron run.
# See Wave 7 of the state-drift consolidation plan.
@ -2169,11 +2011,6 @@ serverFiles:
summary: "{{ $value | printf \"%.0f\" }} stacks drifting — likely a systemic cause (new admission webhook, provider upgrade). Check the most recent drift-detection run in Woodpecker."
extraScrapeConfigs: |
# The `mailserver-dovecot` scrape job was retired in code-1ik together
# with the Dovecot exporter. docker-mailserver 15.0.0's Dovecot 2.3
# doesn't emit the old_stats protocol the exporter expected, so the
# scrape only ever returned `dovecot_up{scope="user"} 0`. Re-add here
# if a working exporter is introduced.
- job_name: 'proxmox-host'
static_configs:
- targets:
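
The `for: 5m` clause on the outpost filesystem rule above means the usage/limit ratio has to stay above 0.8 across consecutive evaluations before the alert fires. A minimal Python sketch of that hold-down behaviour follows, assuming one sample per evaluation. `Sample` and `firing_times` are hypothetical names used only for illustration, and the logic is a simplification of what Prometheus does, not its implementation.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    t: int        # evaluation time, seconds
    usage: float  # bytes used on the filesystem
    limit: float  # filesystem size limit in bytes


def firing_times(samples, threshold=0.8, hold_seconds=300):
    """Return the evaluation times at which the alert would be firing.

    The condition must hold continuously for `hold_seconds`; a single
    evaluation below the threshold resets the pending timer.
    """
    pending_since = None
    firing = []
    for s in samples:
        if s.limit > 0 and s.usage / s.limit > threshold:
            if pending_since is None:
                pending_since = s.t
            if s.t - pending_since >= hold_seconds:
                firing.append(s.t)
        else:
            pending_since = None
    return firing
```

A brief dip below the threshold restarts the five-minute wait, which is why a short `for` is paired with a direct measure of the underlying cause rather than a lagging one.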


@ -142,7 +142,7 @@ resource "kubernetes_deployment" "onlyoffice-document-server" {
app = "onlyoffice-document-server"
}
annotations = {
"dependency.kyverno.io/wait-for" = "mysql.dbaas:3306,redis-master.redis:6379"
"dependency.kyverno.io/wait-for" = "mysql.dbaas:3306,redis.redis:6379"
}
}
spec {


@ -441,6 +441,11 @@ resource "kubernetes_deployment" "openclaw" {
name = "UPTIME_KUMA_PASSWORD"
value = local.skill_secrets["uptime_kuma_password"]
}
# Skill secrets - Slack
env {
name = "SLACK_WEBHOOK_URL"
value = local.skill_secrets["slack_webhook"]
}
# Memory API
env {
name = "MEMORY_API_URL"
@ -832,19 +837,15 @@ resource "kubernetes_service" "task_webhook" {
}
module "task_webhook_ingress" {
source = "../../modules/kubernetes/ingress_factory"
namespace = kubernetes_namespace.openclaw.metadata[0].name
name = "task-webhook"
tls_secret_name = var.tls_secret_name
host = "task-webhook"
port = 80
external_monitor = false
source = "../../modules/kubernetes/ingress_factory"
namespace = kubernetes_namespace.openclaw.metadata[0].name
name = "task-webhook"
tls_secret_name = var.tls_secret_name
host = "task-webhook"
port = 80
}
# --- Shared ServiceAccount: grants pod-exec into the openclaw pod ---
# Used by the task_processor CronJob (below). Previously also used by the
# cluster_healthcheck CronJob, which has been decommissioned; the local
# `scripts/cluster_healthcheck.sh` is now the single authoritative runner.
# --- CronJob: Scheduled cluster health check ---
resource "kubernetes_service_account" "healthcheck" {
metadata {
@ -887,6 +888,76 @@ resource "kubernetes_role_binding" "healthcheck_exec" {
}
}
resource "kubernetes_cron_job_v1" "cluster_healthcheck" {
metadata {
name = "cluster-healthcheck"
namespace = kubernetes_namespace.openclaw.metadata[0].name
labels = {
app = "cluster-healthcheck"
tier = local.tiers.aux
}
}
spec {
schedule = "0 */8 * * *"
concurrency_policy = "Forbid"
failed_jobs_history_limit = 3
successful_jobs_history_limit = 3
job_template {
metadata {
labels = {
app = "cluster-healthcheck"
}
}
spec {
active_deadline_seconds = 300
backoff_limit = 0
template {
metadata {
labels = {
app = "cluster-healthcheck"
}
}
spec {
service_account_name = kubernetes_service_account.healthcheck.metadata[0].name
restart_policy = "Never"
container {
name = "healthcheck"
image = "bitnami/kubectl:latest"
command = ["bash", "-c", <<-EOF
# Find the openclaw pod
POD=$(kubectl get pods -n openclaw -l app=openclaw -o jsonpath='{.items[0].metadata.name}' 2>/dev/null)
if [ -z "$POD" ]; then
echo "ERROR: OpenClaw pod not found"
exit 1
fi
echo "Executing health check in pod $POD..."
kubectl exec -n openclaw "$POD" -c openclaw -- bash /workspace/infra/.claude/cluster-health.sh
EOF
]
resources {
requests = {
cpu = "50m"
memory = "64Mi"
}
limits = {
memory = "64Mi"
}
}
}
}
}
}
}
}
lifecycle {
# KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config]
}
}
# --- CronJob: Task processor polls Forgejo issues and triggers OpenClaw ---
resource "kubernetes_cron_job_v1" "task_processor" {
@ -911,9 +982,8 @@ resource "kubernetes_cron_job_v1" "task_processor" {
}
}
spec {
active_deadline_seconds = 600
backoff_limit = 0
ttl_seconds_after_finished = 86400
active_deadline_seconds = 600
backoff_limit = 0
template {
metadata {
labels = {


@ -1,82 +0,0 @@
-- ot-recorder Lua hook: forward every location publish to Dawarich.
-- Loaded by ot-recorder via `--lua-script`. The hook() function is invoked
-- synchronously per publish; we fork curl with `&` to keep it fire-and-forget.
-- Dawarich's points table has UNIQUE (lonlat, timestamp, user_id) — duplicates
-- are safely dropped. The .rec file is always written regardless of hook result,
-- so a Dawarich 5xx loses nothing long-term (re-playable via backfill Job).
local function escape_shell_single(s)
return "'" .. tostring(s):gsub("'", "'\\''") .. "'"
end
local function json_escape_string(s)
return (s:gsub("\\", "\\\\")
:gsub('"', '\\"')
:gsub("\n", "\\n")
:gsub("\r", "\\r")
:gsub("\t", "\\t"))
end
-- Minimal JSON serializer — scalars, arrays, maps. Owntracks payloads are
-- all primitive/flat; no bignum or cyclic-ref concerns.
local function to_json(v)
local t = type(v)
if t == "nil" then return "null" end
if t == "number" then return tostring(v) end
if t == "boolean" then return tostring(v) end
if t == "string" then return '"' .. json_escape_string(v) .. '"' end
if t == "table" then
if #v > 0 or next(v) == nil then
local parts = {}
for i, x in ipairs(v) do parts[i] = to_json(x) end
return "[" .. table.concat(parts, ",") .. "]"
end
local parts = {}
for k, x in pairs(v) do
parts[#parts + 1] = '"' .. json_escape_string(tostring(k)) .. '":' .. to_json(x)
end
return "{" .. table.concat(parts, ",") .. "}"
end
return "null"
end
function otr_init()
otr.log("dawarich-bridge: init")
if not os.getenv("DAWARICH_API_KEY") then
otr.log("dawarich-bridge: WARN DAWARICH_API_KEY unset — hook will skip")
end
end
function otr_exit()
otr.log("dawarich-bridge: exit")
end
function otr_hook(topic, _type, data)
if _type ~= "location" then return end
local api_key = os.getenv("DAWARICH_API_KEY")
if not api_key or api_key == "" then
otr.log("dawarich-bridge: DAWARICH_API_KEY missing — dropping point")
return
end
-- Strip the base64 user avatar: ot-recorder appends a ~120KB `face` field
-- to enriched payloads which pushes the curl command past ARG_MAX (code=7
-- "Argument list too long"). Dawarich doesn't need it.
data.face = nil
local url = "https://dawarich.viktorbarzin.me/api/v1/owntracks/points?api_key=" .. api_key
local payload = to_json(data)
local cmd = table.concat({
"curl -sS -o /dev/null --max-time 5 -X POST",
"-H 'Content-Type: application/json'",
"-d", escape_shell_single(payload),
escape_shell_single(url),
"&",
}, " ")
local ok, reason, code = os.execute(cmd)
if not ok then
otr.log("dawarich-bridge: FAIL tst=" .. tostring(data.tst) ..
" reason=" .. tostring(reason) .. " code=" .. tostring(code) ..
" cmd=" .. cmd)
else
otr.log("dawarich-bridge: ok tst=" .. tostring(data.tst))
end
end
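
The Lua hook above drops the `face` field, serializes the payload to JSON, and wraps it in single quotes so it survives the shell. The same quoting step can be sketched in Python as below. `build_curl_command` is a hypothetical helper that only builds the command string (it does not run curl), and the URL in the usage note is a placeholder rather than the real endpoint.

```python
import json
import shlex


def escape_shell_single(s: str) -> str:
    """Wrap s in single quotes, closing and reopening around each embedded quote."""
    return "'" + str(s).replace("'", "'\\''") + "'"


def build_curl_command(data: dict, url: str) -> str:
    """Build the curl command line the hook would hand to the shell."""
    # Drop the large base64 avatar so the argument list stays small.
    payload = {k: v for k, v in data.items() if k != "face"}
    body = json.dumps(payload, separators=(",", ":"))
    return " ".join([
        "curl -sS -o /dev/null --max-time 5 -X POST",
        "-H 'Content-Type: application/json'",
        "-d", escape_shell_single(body),
        escape_shell_single(url),
    ])
```

Parsing the result with `shlex.split` gives back the original body and URL as single arguments, which is a quick way to check that an apostrophe inside a field cannot break out of the quoting.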


@ -86,13 +86,25 @@ resource "kubernetes_secret" "basic_auth" {
}
}
resource "kubernetes_config_map" "dawarich_hook" {
resource "kubernetes_persistent_volume_claim" "data_proxmox" {
wait_until_bound = false
metadata {
name = "dawarich-hook"
name = "owntracks-data-proxmox"
namespace = kubernetes_namespace.owntracks.metadata[0].name
annotations = {
"resize.topolvm.io/threshold" = "80%"
"resize.topolvm.io/increase" = "100%"
"resize.topolvm.io/storage_limit" = "5Gi"
}
}
data = {
"dawarich-hook.lua" = file("${path.module}/dawarich-hook.lua")
spec {
access_modes = ["ReadWriteOnce"]
storage_class_name = "proxmox-lvm"
resources {
requests = {
storage = "1Gi"
}
}
}
}
@ -137,23 +149,10 @@ resource "kubernetes_deployment" "owntracks" {
name = "http"
container_port = 8083
}
# ot-recorder 1.0.1 has no OTR_HTTPHOOK; forwarding to Dawarich is
# done via a Lua hook script loaded with --lua-script. The script
# reads DAWARICH_API_KEY from env and fires curl fire-and-forget.
args = ["--lua-script", "/hook/dawarich-hook.lua", "owntracks/#"]
env {
name = "OTR_PORT"
value = "0"
}
env {
name = "DAWARICH_API_KEY"
value_from {
secret_key_ref {
name = "owntracks-secrets"
key = "dawarich_api_key"
}
}
}
volume_mount {
name = "data"
@ -163,11 +162,6 @@ resource "kubernetes_deployment" "owntracks" {
name = "data"
mount_path = "/config"
}
volume_mount {
name = "hook"
mount_path = "/hook"
read_only = true
}
resources {
requests = {
cpu = "10m"
@ -184,12 +178,6 @@ resource "kubernetes_deployment" "owntracks" {
claim_name = "owntracks-data-encrypted"
}
}
volume {
name = "hook"
config_map {
name = kubernetes_config_map.dawarich_hook.metadata[0].name
}
}
}
}
}


@ -117,7 +117,7 @@ resource "kubernetes_deployment" "paperless-ngx" {
annotations = {
"diun.enable" = "true"
"diun.include_tags" = "^\\d+(?:\\.\\d+)?(?:\\.\\d+)?$"
"dependency.kyverno.io/wait-for" = "mysql.dbaas:3306,redis-master.redis:6379"
"dependency.kyverno.io/wait-for" = "mysql.dbaas:3306,redis.redis:6379"
}
}
spec {


@ -217,7 +217,7 @@ resource "kubernetes_deployment" "realestate-crawler-api" {
"kubernetes.io/cluster-service" = "true"
}
annotations = {
"dependency.kyverno.io/wait-for" = "mysql.dbaas:3306,redis-master.redis:6379"
"dependency.kyverno.io/wait-for" = "mysql.dbaas:3306,redis.redis:6379"
}
}
spec {
@ -395,7 +395,7 @@ resource "kubernetes_deployment" "realestate-crawler-celery" {
app = "realestate-crawler-celery"
}
annotations = {
"dependency.kyverno.io/wait-for" = "mysql.dbaas:3306,redis-master.redis:6379"
"dependency.kyverno.io/wait-for" = "mysql.dbaas:3306,redis.redis:6379"
}
}
spec {
@ -524,7 +524,7 @@ resource "kubernetes_deployment" "realestate-crawler-celery-beat" {
app = "realestate-crawler-celery-beat"
}
annotations = {
"dependency.kyverno.io/wait-for" = "mysql.dbaas:3306,redis-master.redis:6379"
"dependency.kyverno.io/wait-for" = "mysql.dbaas:3306,redis.redis:6379"
}
}
spec {


@ -72,18 +72,13 @@ resource "helm_release" "redis" {
}
}
# 256Mi was too tight once the working set crossed ~200Mi: BGSAVE
# fork during a replica full PSYNC doubled RSS via COW and pushed
# the master past 256Mi OOMKilled (exit 137), HAProxy flapped,
# every redis client (Paperless, Immich, Authentik) saw connection
# resets. 512Mi gives ~2x headroom on the current 204Mi RDB.
resources = {
requests = {
cpu = "100m"
memory = "512Mi"
memory = "64Mi"
}
limits = {
memory = "512Mi"
memory = "64Mi"
}
}
}
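
The sizing comment in this hunk reasons that a BGSAVE fork can roughly double RSS through copy-on-write. A rough Python sketch of that headroom check is given below, assuming a worst case of 2x the working set (an upper bound for illustration, not a measurement) and using the RDB size quoted in the comment as a stand-in for the working set. `fits_with_fork` is a hypothetical helper.

```python
MI = 1024 * 1024


def fits_with_fork(working_set_mi: int, limit_mi: int, cow_factor: float = 2.0) -> bool:
    """True if the memory limit covers the working set times the assumed COW factor."""
    return working_set_mi * MI * cow_factor <= limit_mi * MI
```

With the 204Mi figure from the comment, a 256Mi limit fails the check and a 512Mi limit passes it, which matches the reasoning given for the larger value.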
@ -105,10 +100,10 @@ resource "helm_release" "redis" {
resources = {
requests = {
cpu = "50m"
memory = "512Mi"
memory = "64Mi"
}
limits = {
memory = "512Mi"
memory = "64Mi"
}
}
}
@ -149,24 +144,6 @@ resource "kubernetes_config_map" "haproxy" {
timeout server 30s
timeout check 3s
# Dynamic DNS resolution via cluster CoreDNS. Without this, haproxy
# resolves server hostnames once at startup and caches forever, so
# when redis-node-X pods restart and get new IPs, haproxy keeps
# connecting to the old (dead) IPs and returns "Connection refused"
# until haproxy itself is restarted. This caused an immich outage
# on 2026-04-19 after a redis pod cycle.
resolvers kubernetes
nameserver coredns kube-dns.kube-system.svc.cluster.local:53
resolve_retries 3
timeout resolve 1s
timeout retry 1s
hold other 10s
hold refused 10s
hold nx 10s
hold timeout 10s
hold valid 10s
hold obsolete 10s
frontend redis_front
bind *:6379
default_backend redis_master
@ -186,13 +163,13 @@ resource "kubernetes_config_map" "haproxy" {
tcp-check expect rstring role:master
tcp-check send "QUIT\r\n"
tcp-check expect string +OK
server redis-node-0 redis-node-0.redis-headless.redis.svc.cluster.local:6379 check inter 1s fall 2 rise 2 resolvers kubernetes init-addr last,libc,none
server redis-node-1 redis-node-1.redis-headless.redis.svc.cluster.local:6379 check inter 1s fall 2 rise 2 resolvers kubernetes init-addr last,libc,none
server redis-node-0 redis-node-0.redis-headless.redis.svc.cluster.local:6379 check inter 1s fall 2 rise 2
server redis-node-1 redis-node-1.redis-headless.redis.svc.cluster.local:6379 check inter 1s fall 2 rise 2
backend redis_sentinel
balance roundrobin
server redis-node-0 redis-node-0.redis-headless.redis.svc.cluster.local:26379 check inter 5s resolvers kubernetes init-addr last,libc,none
server redis-node-1 redis-node-1.redis-headless.redis.svc.cluster.local:26379 check inter 5s resolvers kubernetes init-addr last,libc,none
server redis-node-0 redis-node-0.redis-headless.redis.svc.cluster.local:26379 check inter 5s
server redis-node-1 redis-node-1.redis-headless.redis.svc.cluster.local:26379 check inter 5s
EOT
}
}
@ -206,11 +183,7 @@ resource "kubernetes_deployment" "haproxy" {
}
}
spec {
# 3 replicas + PDB minAvailable=2 (see kubernetes_pod_disruption_budget_v1.redis_haproxy).
# After Nextcloud drops its sentinel fallback in Phase 6 of the 2026-04-19 redis
# rework, HAProxy is the sole client-facing path for all 17 redis consumers, so
# it needs HA equivalent to other critical-path pods (Traefik, Authentik, PgBouncer).
replicas = 3
replicas = 2
selector {
match_labels = {
app = "redis-haproxy"
@ -221,11 +194,6 @@ resource "kubernetes_deployment" "haproxy" {
labels = {
app = "redis-haproxy"
}
annotations = {
# Roll the deployment whenever haproxy.cfg content changes so a
# config update (e.g. DNS resolver tweaks) actually takes effect.
"checksum/config" = sha256(kubernetes_config_map.haproxy.data["haproxy.cfg"])
}
}
spec {
container {
@ -314,11 +282,7 @@ resource "kubernetes_service" "redis_master" {
# This runs on every apply to ensure the Helm chart's service is always corrected.
resource "null_resource" "patch_redis_service" {
triggers = {
# Re-patch only when a Helm upgrade (chart version bump) or an HAProxy
# config change could have reset the selector / rotated HAProxy pods.
# timestamp() would force-replace on every apply, hiding real drift.
chart_version = helm_release.redis.version
haproxy_config = sha256(kubernetes_config_map.haproxy.data["haproxy.cfg"])
always = timestamp()
}
provisioner "local-exec" {
@ -340,499 +304,6 @@ module "nfs_backup_host" {
nfs_path = "/srv/nfs/redis-backup"
}
#### Redis v2: parallel 3-node raw StatefulSet (target architecture)
#
# Built alongside the Bitnami helm_release.redis so data can migrate via
# REPLICAOF with <60s cutover downtime (see session plan / beads code-v2b).
#
# Pattern: MySQL standalone precedent (stacks/dbaas/modules/dbaas/main.tf,
# 2026-04-16 migration): raw kubernetes_stateful_set_v1 + official image,
# no Bitnami Helm chart (deprecated by Broadcom Aug 2025; atomic-Helm trap
# caused the 2026-04-04 memory-bump deadlock).
#
# Design choices driven by incident cluster in April 2026:
# - 3 sentinels (odd count, quorum=2): eliminates the split-brain class
# that caused the 2026-04-19 PM incident (2 sentinels, stale master state).
# - Init container regenerates sentinel.conf on every boot by probing
# peers for role:master; no persistent sentinel runtime state, so stale
# entries can never resurface across pod restarts.
# - podManagementPolicy=Parallel: all 3 pods start together, avoiding the
# "sentinel-0 elects before -2 booted" ordering bug.
# - Memory 768Mi (up from 512Mi): concurrent BGSAVE + AOF-rewrite fork can
# double RSS via COW. auto-aof-rewrite-percentage 200 + min-size 128mb
# tune down rewrite frequency.
# - Persistence: RDB snapshots + AOF everysec. Measured <1 GB/day write
# volume (2026-04-19 disk-wear analysis), giving a 40+ year SSD runway.
# - HAProxy remains sole client-facing path for all 17 consumers.
resource "kubernetes_config_map" "redis_v2_conf" {
metadata {
name = "redis-v2-conf"
namespace = kubernetes_namespace.redis.metadata[0].name
}
data = {
"redis.conf" = <<-EOT
bind 0.0.0.0 -::*
port 6379
protected-mode no
dir /data
maxmemory 640mb
maxmemory-policy allkeys-lru
save 900 1
save 300 100
save 60 10000
rdbcompression yes
rdbchecksum yes
stop-writes-on-bgsave-error no
appendonly yes
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 200
auto-aof-rewrite-min-size 128mb
aof-load-truncated yes
aof-use-rdb-preamble yes
replica-read-only yes
replica-serve-stale-data yes
timeout 0
tcp-keepalive 300
tcp-backlog 511
databases 16
loglevel notice
# Included last so `replicaof` directive written by the init container
# overrides the "standalone master" default. Prevents the parallel-
# bootstrap race where all 3 pods claim role:master simultaneously.
include /shared/replica.conf
EOT
}
}
resource "kubernetes_config_map" "redis_v2_sentinel_bootstrap" {
metadata {
name = "redis-v2-sentinel-bootstrap"
namespace = kubernetes_namespace.redis.metadata[0].name
}
data = {
"init.sh" = <<-EOT
#!/bin/sh
set -eu
HOSTNAME=$(hostname)
MY_NUM=$${HOSTNAME##*-}
MY_DNS="$HOSTNAME.redis-v2-headless.redis.svc.cluster.local"
MASTER_HOST=""
echo "=== Redis v2 bootstrap ==="
echo "hostname: $HOSTNAME (index $MY_NUM)"
# Priority 1: ask peer sentinels for the consensus master. Covers the
# "steady-state pod restart" case: sentinels already agree on reality
# and a restarting pod should join that topology.
votes_0=0; votes_1=0; votes_2=0; votes_total=0
for i in 0 1 2; do
if [ "$i" = "$MY_NUM" ]; then continue; fi
peer="redis-v2-$i.redis-v2-headless.redis.svc.cluster.local"
reply=$(redis-cli -h "$peer" -p 26379 -t 2 SENTINEL get-master-addr-by-name mymaster 2>/dev/null | head -n1 || true)
echo "sentinel probe $peer: master=$${reply:-unreachable}"
case "$reply" in
*redis-v2-0*) votes_0=$((votes_0 + 1)); votes_total=$((votes_total + 1)) ;;
*redis-v2-1*) votes_1=$((votes_1 + 1)); votes_total=$((votes_total + 1)) ;;
*redis-v2-2*) votes_2=$((votes_2 + 1)); votes_total=$((votes_total + 1)) ;;
esac
done
if [ "$votes_total" -gt 0 ]; then
if [ "$votes_0" -ge "$votes_1" ] && [ "$votes_0" -ge "$votes_2" ] && [ "$votes_0" -gt 0 ]; then
MASTER_HOST="redis-v2-0.redis-v2-headless.redis.svc.cluster.local"
elif [ "$votes_1" -ge "$votes_2" ] && [ "$votes_1" -gt 0 ]; then
MASTER_HOST="redis-v2-1.redis-v2-headless.redis.svc.cluster.local"
elif [ "$votes_2" -gt 0 ]; then
MASTER_HOST="redis-v2-2.redis-v2-headless.redis.svc.cluster.local"
fi
[ -n "$MASTER_HOST" ] && echo "sentinel vote winner: $MASTER_HOST"
fi
# Priority 2: look for a peer redis that's a master WITH at least one
# replica connected. "Standalone master" peers (bootstrap race) are
# skipped; connected_slaves=0 is ambiguous.
if [ -z "$MASTER_HOST" ]; then
for i in 0 1 2; do
if [ "$i" = "$MY_NUM" ]; then continue; fi
peer="redis-v2-$i.redis-v2-headless.redis.svc.cluster.local"
info=$(redis-cli -h "$peer" -t 2 INFO replication 2>/dev/null || true)
role=$(echo "$info" | awk -F: '/^role:/ {gsub(/\r/,""); print $2; exit}')
slaves=$(echo "$info" | awk -F: '/^connected_slaves:/ {gsub(/\r/,""); print $2; exit}')
echo "redis probe $peer: role=$${role:-unreachable} slaves=$${slaves:-0}"
if [ "$role" = "master" ] && [ "$${slaves:-0}" -gt 0 ]; then
MASTER_HOST="$peer"
break
fi
done
fi
# Priority 3: deterministic fallback; pod -0 is always the bootstrap
# master on a fresh cluster. All sentinels converge here, no race.
if [ -z "$MASTER_HOST" ]; then
MASTER_HOST="redis-v2-0.redis-v2-headless.redis.svc.cluster.local"
echo "no master found via probes — bootstrap default: $MASTER_HOST"
fi
cat > /shared/sentinel.conf <<EOF
port 26379
bind 0.0.0.0 -::*
dir /shared
sentinel resolve-hostnames yes
sentinel announce-hostnames yes
sentinel monitor mymaster $MASTER_HOST 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 30000
sentinel parallel-syncs mymaster 1
EOF
# replica.conf is included by redis.conf (see ConfigMap redis_v2_conf).
# Master pod gets an empty file; replicas get `replicaof <master>`.
# This way pods come up already in the right role; no post-start race.
if [ "$MY_DNS" = "$MASTER_HOST" ]; then
: > /shared/replica.conf
echo "role: master"
else
echo "replicaof $MASTER_HOST 6379" > /shared/replica.conf
echo "role: replica of $MASTER_HOST"
fi
echo "=== bootstrap complete ==="
cat /shared/sentinel.conf
echo "--- replica.conf ---"
cat /shared/replica.conf
EOT
}
}
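
The init.sh script above picks a master in three steps: votes from peer sentinels, then a peer that reports role:master with at least one connected replica, then pod -0 as a fixed fallback. The Python sketch below mirrors that decision with probe results passed in as plain dictionaries instead of being gathered through redis-cli. `pick_master` and its argument shapes are assumptions made for illustration.

```python
PEER_FMT = "redis-v2-{i}.redis-v2-headless.redis.svc.cluster.local"


def pick_master(my_index, sentinel_replies, redis_info):
    """Choose the master host the way the bootstrap script does.

    sentinel_replies: {peer_index: reply string, or None if unreachable}
    redis_info:       {peer_index: (role, connected_slaves), or None}
    """
    # Priority 1: tally which pod each reachable peer sentinel names.
    votes = [0, 0, 0]
    for i in (0, 1, 2):
        reply = sentinel_replies.get(i)
        if i == my_index or not reply:
            continue
        for candidate in (0, 1, 2):
            if f"redis-v2-{candidate}" in reply:
                votes[candidate] += 1
                break
    if sum(votes) > 0:
        # max() keeps the first maximum, so ties go to the lowest index,
        # matching the if/elif chain in the shell script.
        return PEER_FMT.format(i=max((0, 1, 2), key=lambda c: votes[c]))

    # Priority 2: a peer that is a master and already has a replica attached.
    for i in (0, 1, 2):
        info = redis_info.get(i)
        if i == my_index or not info:
            continue
        role, slaves = info
        if role == "master" and slaves > 0:
            return PEER_FMT.format(i=i)

    # Priority 3: deterministic fallback to pod 0.
    return PEER_FMT.format(i=0)
```

A pod whose own name equals the returned host would write an empty replica.conf; any other pod would write a `replicaof` line pointing at it.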
resource "kubernetes_service" "redis_v2_headless" {
metadata {
name = "redis-v2-headless"
namespace = kubernetes_namespace.redis.metadata[0].name
labels = {
app = "redis-v2"
}
}
spec {
cluster_ip = "None"
publish_not_ready_addresses = true
selector = {
app = "redis-v2"
}
port {
name = "redis"
port = 6379
}
port {
name = "sentinel"
port = 26379
}
port {
name = "exporter"
port = 9121
}
}
}
resource "kubernetes_stateful_set_v1" "redis_v2" {
metadata {
name = "redis-v2"
namespace = kubernetes_namespace.redis.metadata[0].name
labels = {
app = "redis-v2"
}
}
spec {
service_name = kubernetes_service.redis_v2_headless.metadata[0].name
replicas = 3
pod_management_policy = "Parallel"
selector {
match_labels = {
app = "redis-v2"
}
}
template {
metadata {
labels = {
app = "redis-v2"
}
annotations = {
"prometheus.io/scrape" = "true"
"prometheus.io/port" = "9121"
"checksum/conf" = sha256(kubernetes_config_map.redis_v2_conf.data["redis.conf"])
"checksum/bootstrap" = sha256(kubernetes_config_map.redis_v2_sentinel_bootstrap.data["init.sh"])
}
}
spec {
termination_grace_period_seconds = 30
affinity {
pod_anti_affinity {
preferred_during_scheduling_ignored_during_execution {
weight = 100
pod_affinity_term {
label_selector {
match_expressions {
key = "app"
operator = "In"
values = ["redis-v2"]
}
}
topology_key = "kubernetes.io/hostname"
}
}
}
}
init_container {
name = "generate-sentinel-conf"
image = "docker.io/library/redis:7.4-alpine"
command = ["/bin/sh", "/bootstrap/init.sh"]
resources {
requests = {
cpu = "10m"
memory = "32Mi"
}
limits = {
memory = "32Mi"
}
}
volume_mount {
name = "bootstrap"
mount_path = "/bootstrap"
read_only = true
}
volume_mount {
name = "shared"
mount_path = "/shared"
}
}
container {
name = "redis"
image = "docker.io/library/redis:7.4-alpine"
command = ["redis-server", "/etc/redis/redis.conf"]
port {
container_port = 6379
name = "redis"
}
resources {
requests = {
cpu = "100m"
memory = "768Mi"
}
limits = {
memory = "768Mi"
}
}
volume_mount {
name = "data"
mount_path = "/data"
}
volume_mount {
name = "conf"
mount_path = "/etc/redis"
read_only = true
}
volume_mount {
# redis.conf `include /shared/replica.conf` written by init container.
name = "shared"
mount_path = "/shared"
read_only = true
}
liveness_probe {
exec {
command = ["redis-cli", "PING"]
}
initial_delay_seconds = 15
period_seconds = 10
timeout_seconds = 3
failure_threshold = 3
}
readiness_probe {
exec {
command = ["redis-cli", "PING"]
}
initial_delay_seconds = 5
period_seconds = 5
timeout_seconds = 3
failure_threshold = 3
}
}
container {
name = "sentinel"
image = "docker.io/library/redis:7.4-alpine"
command = ["redis-sentinel", "/shared/sentinel.conf"]
port {
container_port = 26379
name = "sentinel"
}
resources {
requests = {
cpu = "20m"
memory = "64Mi"
}
limits = {
memory = "64Mi"
}
}
volume_mount {
name = "shared"
mount_path = "/shared"
}
liveness_probe {
exec {
command = ["redis-cli", "-p", "26379", "PING"]
}
initial_delay_seconds = 20
period_seconds = 10
timeout_seconds = 3
failure_threshold = 3
}
readiness_probe {
exec {
command = ["redis-cli", "-p", "26379", "PING"]
}
initial_delay_seconds = 10
period_seconds = 5
timeout_seconds = 3
failure_threshold = 3
}
}
container {
name = "exporter"
image = "docker.io/oliver006/redis_exporter:v1.62.0"
port {
container_port = 9121
name = "exporter"
}
env {
name = "REDIS_ADDR"
value = "redis://localhost:6379"
}
resources {
requests = {
cpu = "10m"
memory = "32Mi"
}
limits = {
memory = "32Mi"
}
}
liveness_probe {
http_get {
path = "/"
port = 9121
}
initial_delay_seconds = 15
period_seconds = 30
timeout_seconds = 5
}
}
volume {
name = "conf"
config_map {
name = kubernetes_config_map.redis_v2_conf.metadata[0].name
}
}
volume {
name = "bootstrap"
config_map {
name = kubernetes_config_map.redis_v2_sentinel_bootstrap.metadata[0].name
default_mode = "0755"
}
}
volume {
name = "shared"
empty_dir {}
}
}
}
volume_claim_template {
metadata {
name = "data"
annotations = {
"resize.topolvm.io/threshold" = "80%"
"resize.topolvm.io/increase" = "100%"
"resize.topolvm.io/storage_limit" = "20Gi"
}
}
spec {
access_modes = ["ReadWriteOnce"]
storage_class_name = "proxmox-lvm-encrypted"
resources {
requests = {
storage = "5Gi"
}
}
}
}
}
lifecycle {
# KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
ignore_changes = [spec[0].template[0].spec[0].dns_config]
}
}
resource "kubernetes_pod_disruption_budget_v1" "redis_v2" {
metadata {
name = "redis-v2"
namespace = kubernetes_namespace.redis.metadata[0].name
}
spec {
min_available = 2
selector {
match_labels = {
app = "redis-v2"
}
}
}
}
resource "kubernetes_pod_disruption_budget_v1" "redis_haproxy" {
metadata {
name = "redis-haproxy"
namespace = kubernetes_namespace.redis.metadata[0].name
}
spec {
min_available = 2
selector {
match_labels = {
app = "redis-haproxy"
}
}
}
}
# Hourly backup: copy RDB snapshot from master to NFS
resource "kubernetes_cron_job_v1" "redis-backup" {
metadata {
@ -864,10 +335,10 @@ resource "kubernetes_cron_job_v1" "redis-backup" {
TIMESTAMP=$(date +%Y%m%d-%H%M)
# Trigger a fresh RDB save on the master
redis-cli -h redis-master.redis BGSAVE
redis-cli -h redis.redis BGSAVE
sleep 5
# Copy the RDB via redis-cli --rdb
redis-cli -h redis-master.redis --rdb /backup/redis-$TIMESTAMP.rdb
redis-cli -h redis.redis --rdb /backup/redis-$TIMESTAMP.rdb
# Rotate: 28-day retention
find /backup -name 'redis-*.rdb' -type f -mtime +28 -delete
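
The rotation step above relies on `find -mtime +28`, which compares a file's age in whole 24-hour periods (fraction discarded) and matches only when that count is greater than 28. A small Python sketch of the same selection is shown below, assuming a flat backup directory (the `find` command itself recurses). `expired_backups` is a hypothetical helper that only lists candidates and deletes nothing.

```python
import time
from pathlib import Path


def expired_backups(backup_dir, now=None, keep_days=28):
    """List redis-*.rdb files whose age in whole days exceeds keep_days."""
    now = time.time() if now is None else now
    expired = []
    for path in sorted(Path(backup_dir).glob("redis-*.rdb")):
        if not path.is_file():
            continue
        # Whole 24-hour periods since last modification, fraction dropped.
        age_days = int((now - path.stat().st_mtime) // 86400)
        if age_days > keep_days:
            expired.append(path)
    return expired
```

Under this rule a snapshot that is 28 days and a few hours old is kept; it becomes eligible once it is at least 29 full days old.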


@ -14,16 +14,7 @@ variable "name" {}
variable "namespace" {
default = "reverse-proxy"
}
variable "external_name" {
type = string
default = null
description = "DNS name for ExternalName Service. Mutually exclusive with backend_ip."
}
variable "backend_ip" {
type = string
default = null
description = "IP address backend. When set, creates a selector-less Service + EndpointSlice pointing at this IP. Mutually exclusive with external_name — use for hosts that aren't in Technitium (e.g. upstream gateways)."
}
variable "external_name" {}
variable "port" {
default = "80"
}
@ -104,14 +95,7 @@ variable "public_ipv6" {
}
locals {
use_backend_ip = var.backend_ip != null
port_name = var.backend_protocol == "HTTPS" ? "https-${var.name}" : "${var.name}-web"
}
# ExternalName flavor: used when the backend is addressable by DNS.
resource "kubernetes_service" "proxied-service" {
count = local.use_backend_ip ? 0 : 1
metadata {
name = var.name
namespace = var.namespace
@ -125,7 +109,7 @@ resource "kubernetes_service" "proxied-service" {
external_name = var.external_name
port {
name = local.port_name
name = var.backend_protocol == "HTTPS" ? "https-${var.name}" : "${var.name}-web"
port = var.port
protocol = "TCP"
target_port = var.port
@ -133,73 +117,14 @@ resource "kubernetes_service" "proxied-service" {
}
}
# IP-backend flavor: selector-less Service + manually-managed EndpointSlice.
# Used for upstreams that have no DNS entry in Technitium (e.g. 192.168.1.1).
resource "kubernetes_service" "ip-backend-service" {
count = local.use_backend_ip ? 1 : 0
metadata {
name = var.name
namespace = var.namespace
labels = {
"app" = var.name
}
}
spec {
type = "ClusterIP"
port {
name = local.port_name
port = var.port
protocol = "TCP"
target_port = var.port
}
}
}
resource "kubernetes_manifest" "ip_backend_endpointslice" {
count = local.use_backend_ip ? 1 : 0
manifest = {
apiVersion = "discovery.k8s.io/v1"
kind = "EndpointSlice"
metadata = {
name = var.name
namespace = var.namespace
labels = {
"kubernetes.io/service-name" = var.name
"app" = var.name
}
}
addressType = "IPv4"
ports = [{
name = local.port_name
port = tonumber(var.port)
protocol = "TCP"
}]
endpoints = [{
addresses = [var.backend_ip]
conditions = {
ready = true
}
}]
}
depends_on = [kubernetes_service.ip-backend-service]
}
locals {
# External monitor defaults: on when proxied, off otherwise. Explicit bool overrides.
effective_external_monitor = var.external_monitor != null ? var.external_monitor : (var.dns_type == "proxied")
# Emit the annotation when effective is true (positive signal), or when the
# caller explicitly set external_monitor=false (opt-out). When the caller
# leaves it null AND dns_type != "proxied", emit nothing; the sync script's
# default opt-in (any *.viktorbarzin.me ingress) keeps monitoring services
# that are publicly reachable via routes we don't manage here.
external_monitor_annotations = local.effective_external_monitor ? merge(
{ "uptime.viktorbarzin.me/external-monitor" = "true" },
var.external_monitor_name != null ? { "uptime.viktorbarzin.me/external-monitor-name" = var.external_monitor_name } : {},
) : (var.external_monitor == false ?
{ "uptime.viktorbarzin.me/external-monitor" = "false" } : {}
)
) : {}
}
resource "kubernetes_ingress_v1" "proxied-ingress" {
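
The `external_monitor_annotations` local described in the comment above is a three-way decision: emit "true" when monitoring is effectively on, emit "false" only on an explicit opt-out, and emit nothing when the caller left the flag unset on a non-proxied ingress. A Python sketch of that variant is given below; the function name and signature are assumptions for illustration, with `None` standing in for Terraform's null.

```python
KEY = "uptime.viktorbarzin.me/external-monitor"


def external_monitor_annotations(external_monitor, dns_type, monitor_name=None):
    """Return the ingress annotations for the external-monitor setting."""
    # An explicit bool overrides; otherwise default to on only when proxied.
    if external_monitor is not None:
        effective = external_monitor
    else:
        effective = dns_type == "proxied"

    if effective:
        annotations = {KEY: "true"}
        if monitor_name is not None:
            annotations[KEY + "-name"] = monitor_name
        return annotations
    if external_monitor is False:
        # Explicit opt-out: tell the sync job not to monitor this ingress.
        return {KEY: "false"}
    # Unset and not proxied: stay silent and leave the sync job's default in place.
    return {}
```

The empty-dictionary case is what distinguishes "nobody decided" from "explicitly disabled".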


@ -112,11 +112,13 @@ module "idrac" {
depends_on = [kubernetes_namespace.reverse-proxy]
}
# Can either listen on https or http; can't do both :/
# TODO: Not working yet
module "tp-link-gateway" {
source = "./factory"
dns_type = "proxied"
name = "gw"
backend_ip = "192.168.1.1"
external_name = "gw.viktorbarzin.lan"
port = 443
tls_secret_name = var.tls_secret_name
backend_protocol = "HTTPS"
@ -151,6 +153,25 @@ module "truenas" {
depends_on = [kubernetes_namespace.reverse-proxy]
}
# https://r730.viktorbarzin.me/
module "r730" {
source = "./factory"
name = "r730"
external_name = "r730.viktorbarzin.lan"
port = 443
tls_secret_name = var.tls_secret_name
backend_protocol = "HTTPS"
depends_on = [kubernetes_namespace.reverse-proxy]
extra_annotations = {
"gethomepage.dev/enabled" = "true"
"gethomepage.dev/name" = "R730"
"gethomepage.dev/description" = "Dell PowerEdge server"
"gethomepage.dev/icon" = "dell.png"
"gethomepage.dev/group" = "Infrastructure"
"gethomepage.dev/pod-selector" = ""
}
}
# https://proxmox.viktorbarzin.me/
module "proxmox" {
source = "./factory"
@ -249,7 +270,6 @@ module "mladost3" {
port = 8080
tls_secret_name = var.tls_secret_name
depends_on = [kubernetes_namespace.reverse-proxy]
external_monitor = false
extra_annotations = { "gethomepage.dev/enabled" = "false" }
}
@ -281,101 +301,43 @@ resource "kubernetes_manifest" "ha_sofia_rate_limit" {
}
}
# Per-service retry: bumps default (attempts=2) to 3 so transient DNS/connect
# stalls on the ha-sofia.viktorbarzin.lan ExternalName are absorbed before
# surfacing a 502. Drives bd code-rd1 Phase 2.2.
resource "kubernetes_manifest" "ha_sofia_retry" {
manifest = {
apiVersion = "traefik.io/v1alpha1"
kind = "Middleware"
metadata = {
name = "ha-sofia-retry"
namespace = "reverse-proxy"
}
spec = {
retry = {
attempts = 3
initialInterval = "100ms"
}
}
}
}
# Per-service ServersTransport overrides the global 60s dialTimeout
# (set for Immich) with 500ms so a stall fails fast and the retry middleware
# kicks in instead of blocking the connection for seconds. Drives bd
# code-rd1 Phase 2.3.
resource "kubernetes_manifest" "ha_sofia_transport" {
manifest = {
apiVersion = "traefik.io/v1alpha1"
kind = "ServersTransport"
metadata = {
name = "ha-sofia-transport"
namespace = "reverse-proxy"
}
spec = {
forwardingTimeouts = {
dialTimeout = "500ms"
responseHeaderTimeout = "30s"
idleConnTimeout = "90s"
}
}
}
}
module "ha-sofia" {
source = "./factory"
dns_type = "non-proxied"
name = "ha-sofia"
external_name = "ha-sofia.viktorbarzin.lan"
port = 8123
tls_secret_name = var.tls_secret_name
# depends_on on the retry/transport manifests avoids a dangling-reference
# window that would 404 ha-sofia traffic (memory 768: 2026-04-17 P0 outage).
depends_on = [
kubernetes_namespace.reverse-proxy,
kubernetes_manifest.ha_sofia_retry,
kubernetes_manifest.ha_sofia_transport,
]
source = "./factory"
dns_type = "non-proxied"
name = "ha-sofia"
external_name = "ha-sofia.viktorbarzin.lan"
port = 8123
tls_secret_name = var.tls_secret_name
depends_on = [kubernetes_namespace.reverse-proxy]
protected = false
skip_global_rate_limit = true
extra_middlewares = [
"reverse-proxy-ha-sofia-rate-limit@kubernetescrd",
"reverse-proxy-ha-sofia-retry@kubernetescrd",
]
extra_annotations = {
"gethomepage.dev/enabled" = "true"
"gethomepage.dev/name" = "Home Assistant Sofia"
"gethomepage.dev/description" = "Smart home hub"
"gethomepage.dev/icon" = "home-assistant.png"
"gethomepage.dev/group" = "Smart Home"
"gethomepage.dev/pod-selector" = ""
"traefik.ingress.kubernetes.io/service.serverstransport" = "reverse-proxy-ha-sofia-transport@kubernetescrd"
"gethomepage.dev/enabled" = "true"
"gethomepage.dev/name" = "Home Assistant Sofia"
"gethomepage.dev/description" = "Smart home hub"
"gethomepage.dev/icon" = "home-assistant.png"
"gethomepage.dev/group" = "Smart Home"
"gethomepage.dev/pod-selector" = ""
}
}
# https://music-assistant.viktorbarzin.me/
module "music-assistant" {
source = "./factory"
dns_type = "non-proxied"
name = "music-assistant"
external_name = "ha-sofia.viktorbarzin.lan"
port = 8095
tls_secret_name = var.tls_secret_name
depends_on = [
kubernetes_namespace.reverse-proxy,
kubernetes_manifest.ha_sofia_retry,
kubernetes_manifest.ha_sofia_transport,
]
source = "./factory"
dns_type = "non-proxied"
name = "music-assistant"
external_name = "ha-sofia.viktorbarzin.lan"
port = 8095
tls_secret_name = var.tls_secret_name
depends_on = [kubernetes_namespace.reverse-proxy]
protected = false
skip_global_rate_limit = true
extra_middlewares = [
"reverse-proxy-ha-sofia-rate-limit@kubernetescrd",
"reverse-proxy-ha-sofia-retry@kubernetescrd",
]
extra_annotations = {
"traefik.ingress.kubernetes.io/service.serverstransport" = "reverse-proxy-ha-sofia-transport@kubernetescrd"
}
}
# https://ha-london.viktorbarzin.me/

View file

@ -86,15 +86,6 @@ module "qbittorrent" {
homepage_credentials = local.homepage_credentials
}
module "mam_farming" {
source = "./mam-farming"
namespace = kubernetes_namespace.servarr.metadata[0].name
depends_on = [
kubernetes_manifest.external_secret,
module.qbittorrent,
]
}
module "flaresolverr" {
source = "./flaresolverr"
tls_secret_name = var.tls_secret_name

View file

@ -1,163 +0,0 @@
"""
MAM bonus-point spender: tier-aware, pay-what-we-owe.
MAM's bonusBuy.php API enforces a hard 50 GiB minimum per purchase
("Automated spenders are limited to buying at least 50 GB... due to log
spam"). Valid API tiers are 50, 100, 200, 500 GiB (@ 500 BP/GiB). That
means the "pay exactly what we owe" approach from the recovery plan
rounds UP to 50 GiB for the first purchase; small buys can only be done
via the web UI, not the API.
Logic: pick the smallest valid tier that both (a) satisfies the ratio
deficit and (b) we can afford without burning the BP reserve. Skip if
nothing fits; the cron will retry in 6 h once BP grows.
"""
import math
import os
import sys
import tempfile
import time
import requests
PUSHGW = "http://prometheus-prometheus-pushgateway.monitoring:9091"
COOKIE_FILE = "/data/mam_id"
TARGET_RATIO = float(os.environ.get("TARGET_RATIO", "2.0"))
RESERVE_BP = int(os.environ.get("RESERVE_BP", "500"))
BP_PER_GB = int(os.environ.get("BP_PER_GB", "500"))
# MAM-enforced minimum purchase for API callers: 50 GiB.
API_TIERS_GIB = (50, 100, 200, 500)
CLASS_CODES = {
"Mouse": 0,
"Vole": 1,
"User": 2,
"Power User": 3,
"Elite": 4,
"Torrent Master": 5,
"Power TM": 6,
"Elite TM": 7,
"VIP": 8,
}
def save_cookie(resp):
for c in resp.cookies:
if c.name == "mam_id":
fd, tmp = tempfile.mkstemp(dir="/data")
os.write(fd, c.value.encode())
os.close(fd)
os.rename(tmp, COOKIE_FILE)
return
def push(metrics):
try:
requests.post(
f"{PUSHGW}/metrics/job/mam-bp-spender", data=metrics, timeout=10
)
except Exception as e:
print(f"pushgateway error: {e}", file=sys.stderr)
def load_cookie():
if os.path.exists(COOKIE_FILE):
return open(COOKIE_FILE).read().strip()
return os.environ.get("MAM_ID", "")
def main():
mam_id = load_cookie()
if not mam_id:
print("No mam_id available", file=sys.stderr)
sys.exit(1)
s = requests.Session()
s.cookies.set("mam_id", mam_id, domain=".myanonamouse.net")
r = s.get("https://www.myanonamouse.net/jsonLoad.php", timeout=15)
if r.status_code != 200:
push("mam_farming_cookie_expired 1\n")
print(f"Cookie expired: {r.status_code}", file=sys.stderr)
sys.exit(1)
save_cookie(r)
profile = r.json()
ratio = float(profile.get("ratio", 0) or 0)
classname = profile.get("classname", "Mouse")
class_code = CLASS_CODES.get(classname, 0)
# MAM returns `downloaded`/`uploaded` as pretty strings ("715.55 MiB");
# `*_bytes` are the authoritative integer fields.
downloaded = int(profile.get("downloaded_bytes", 0) or 0)
uploaded = int(profile.get("uploaded_bytes", 0) or 0)
bp = int(float(profile.get("seedbonus", 0) or 0))
deficit_bytes = max(0, int(downloaded * TARGET_RATIO) - uploaded)
needed_gib = math.ceil(deficit_bytes / (1024**3)) + 1 if deficit_bytes > 0 else 0
affordable_gib = max(0, (bp - RESERVE_BP) // BP_PER_GB)
# Pick the smallest API tier that satisfies the deficit AND fits the
# budget. If even the smallest tier is too expensive, skip — the cron
# will retry in 6 h once BP has grown.
buy_gib = 0
for tier in API_TIERS_GIB:
if tier >= needed_gib and tier <= affordable_gib:
buy_gib = tier
break
if buy_gib == 0 and needed_gib > 0 and affordable_gib >= API_TIERS_GIB[0]:
# Deficit exceeds all tiers we can afford — buy the largest
# tier that fits to make progress.
for tier in reversed(API_TIERS_GIB):
if tier <= affordable_gib:
buy_gib = tier
break
print(
f"Profile: ratio={ratio} class={classname} "
f"DL={downloaded / 1024**3:.2f} GiB UL={uploaded / 1024**3:.2f} GiB "
f"BP={bp} | deficit={deficit_bytes / 1024**3:.2f} GiB "
f"needed={needed_gib} affordable={affordable_gib} buy={buy_gib}"
)
spent_gib = 0
if buy_gib >= API_TIERS_GIB[0]:
time.sleep(3)
url = (
"https://www.myanonamouse.net/json/bonusBuy.php"
f"?spendtype=upload&amount={buy_gib}"
)
r2 = s.get(url, timeout=15)
save_cookie(r2)
try:
body = r2.json()
except ValueError:
body = {}
ok = r2.status_code == 200 and body.get("success") is True
print(
f"Buy {buy_gib} GiB -> {r2.status_code} "
f"success={body.get('success')} {r2.text[:160]}"
)
if ok:
spent_gib = buy_gib
metrics = (
"mam_farming_cookie_expired 0\n"
f"mam_ratio {ratio}\n"
f'mam_class_code{{classname="{classname}"}} {class_code}\n'
f"mam_downloaded_bytes {downloaded}\n"
f"mam_uploaded_bytes {uploaded}\n"
f"mam_bp_balance {bp}\n"
f"mam_bp_spent_gb {spent_gib}\n"
f"mam_bp_needed_gib {needed_gib}\n"
f"mam_bp_affordable_gib {affordable_gib}\n"
)
push(metrics)
print(
f"Done: BP={bp}, spent={spent_gib} GiB (needed={needed_gib}, "
f"affordable={affordable_gib})"
)
if __name__ == "__main__":
main()
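
The tier-selection rule described in the docstring above can be isolated as a small pure function. This is a minimal sketch, not the script's own code: `pick_tier` and its parameter names are hypothetical, the defaults mirror `RESERVE_BP` and `BP_PER_GB`, and the early return on a zero deficit is an assumption about the intended "pay what we owe" behaviour.

```python
import math

# Purchase sizes the docstring lists as valid for API callers (GiB).
API_TIERS_GIB = (50, 100, 200, 500)


def pick_tier(deficit_bytes, bp, reserve_bp=500, bp_per_gib=500, tiers=API_TIERS_GIB):
    """Return the GiB tier to buy this run, or 0 to skip (hypothetical helper)."""
    if deficit_bytes <= 0:
        # Assumption: nothing owed means nothing to buy.
        return 0
    needed = math.ceil(deficit_bytes / 1024**3) + 1  # +1 GiB safety margin
    affordable = max(0, (bp - reserve_bp) // bp_per_gib)
    # Smallest tier that covers the deficit and fits the budget.
    for tier in tiers:
        if needed <= tier <= affordable:
            return tier
    # Deficit exceeds what we can afford: take the largest affordable tier.
    for tier in reversed(tiers):
        if tier <= affordable:
            return tier
    return 0
```

Keeping the decision free of HTTP calls makes it checkable without touching the tracker: a 10 GiB deficit with 30000 BP resolves to the 50 GiB tier, while the same deficit with 1000 BP is skipped.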

View file

@ -1,264 +0,0 @@
"""
MAM freeleech grabber: demand-first, ratio-guarded.
Selects small-but-popular freeleech titles to grow the account's upload
credit. Refuses to grab while the account is in Mouse class or ratio is
below 1.2, because MAM rejects peer-list announces under those conditions
and new grabs only deepen the ratio hole.
Cleanup is handled by `mam-farming-janitor.py`, which runs unconditionally.
"""
import json
import math
import os
import random
import sys
import tempfile
import time
import requests
QB_URL = "http://qbittorrent.servarr.svc.cluster.local"
PUSHGW = "http://prometheus-prometheus-pushgateway.monitoring:9091"
COOKIE_FILE = "/data/mam_id"
GRABBED_IDS_FILE = "/data/grabbed_ids.txt"
MIN_MB = int(os.environ.get("MIN_MB", "50"))
MAX_MB = int(os.environ.get("MAX_MB", "1024"))
LEECHER_FLOOR = int(os.environ.get("LEECHER_FLOOR", "1"))
SEEDER_CEILING = int(os.environ.get("SEEDER_CEILING", "50"))
GRAB_PER_RUN = int(os.environ.get("GRAB_PER_RUN", "5"))
MAX_TORRENTS = int(os.environ.get("MAX_TORRENTS", "500"))
RATIO_FLOOR = float(os.environ.get("RATIO_FLOOR", "1.2"))
REQUEST_SLEEP = float(os.environ.get("REQUEST_SLEEP", "3"))
CLASS_CODES = {
"Mouse": 0,
"Vole": 1,
"User": 2,
"Power User": 3,
"Elite": 4,
"Torrent Master": 5,
"Power TM": 6,
"Elite TM": 7,
"VIP": 8,
}
def parse_size(s):
units = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}
parts = s.split()
if len(parts) != 2:
return 0
return int(float(parts[0]) * units.get(parts[1], 1))
def save_cookie(resp):
for c in resp.cookies:
if c.name == "mam_id":
fd, tmp = tempfile.mkstemp(dir="/data")
os.write(fd, c.value.encode())
os.close(fd)
os.rename(tmp, COOKIE_FILE)
return
def push(metrics):
try:
requests.post(
f"{PUSHGW}/metrics/job/mam-freeleech-grabber", data=metrics, timeout=10
)
except Exception as e:
print(f"pushgateway error: {e}", file=sys.stderr)
def load_cookie():
if os.path.exists(COOKIE_FILE):
return open(COOKIE_FILE).read().strip()
return os.environ.get("MAM_ID", "")
def exit_cookie_expired(status):
push("mam_farming_cookie_expired 1\n")
print(f"Cookie expired: {status}", file=sys.stderr)
sys.exit(1)
def main():
mam_id = load_cookie()
if not mam_id:
print("No mam_id available", file=sys.stderr)
sys.exit(1)
s = requests.Session()
s.cookies.set("mam_id", mam_id, domain=".myanonamouse.net")
r = s.get("https://www.myanonamouse.net/jsonLoad.php", timeout=15)
if r.status_code != 200:
exit_cookie_expired(r.status_code)
save_cookie(r)
profile = r.json()
ratio = float(profile.get("ratio", 0) or 0)
classname = profile.get("classname", "Mouse")
# `*_bytes` are authoritative integers; `downloaded`/`uploaded` are
# pretty strings like "715.55 MiB".
downloaded = int(profile.get("downloaded_bytes", 0) or 0)
uploaded = int(profile.get("uploaded_bytes", 0) or 0)
class_code = CLASS_CODES.get(classname, 0)
profile_metrics = (
f"mam_farming_cookie_expired 0\n"
f"mam_ratio {ratio}\n"
f'mam_class_code{{classname="{classname}"}} {class_code}\n'
f"mam_downloaded_bytes {downloaded}\n"
f"mam_uploaded_bytes {uploaded}\n"
)
if ratio < RATIO_FLOOR or classname == "Mouse":
reason = "mouse_class" if classname == "Mouse" else "low_ratio"
print(
f"Skip grab: ratio={ratio} class={classname} (floor={RATIO_FLOOR}) "
f"reason={reason}"
)
push(
profile_metrics
+ f'mam_grabber_skipped_reason{{reason="{reason}"}} 1\n'
+ f"mam_farming_grabbed 0\n"
)
return
time.sleep(REQUEST_SLEEP)
r = s.get("https://t.myanonamouse.net/json/dynamicSeedbox.php", timeout=15)
save_cookie(r)
print(f"Seedbox: {r.text[:80]}")
grabbed_ids = set()
if os.path.exists(GRABBED_IDS_FILE):
raw = open(GRABBED_IDS_FILE).read().strip()
grabbed_ids = set(raw.split("\n")) if raw else set()
try:
all_torrents = requests.get(
f"{QB_URL}/api/v2/torrents/info", timeout=10
).json()
except Exception as e:
print(f"qBittorrent unreachable: {e}", file=sys.stderr)
push(profile_metrics + "mam_farming_grabbed 0\n")
sys.exit(1)
farming = [t for t in all_torrents if t.get("category") == "mam-farming"]
all_names_lower = {t["name"].lower() for t in all_torrents}
total_size = sum(t.get("size", 0) for t in farming)
print(
f"Profile: ratio={ratio} class={classname} | "
f"Farming: {len(farming)}, {total_size / (1024**3):.1f} GiB, "
f"tracked IDs: {len(grabbed_ids)}"
)
grabbed = 0
if len(farming) >= MAX_TORRENTS:
print(f"At max torrents ({MAX_TORRENTS}), skipping grab")
else:
time.sleep(REQUEST_SLEEP)
offset = random.randint(0, 1400)
params = {
"tor[searchType]": "fl",
"tor[searchIn]": "torrents",
"tor[perpage]": "50",
"tor[startNumber]": str(offset),
}
r = s.get(
"https://www.myanonamouse.net/tor/js/loadSearchJSONbasic.php",
params=params,
timeout=15,
)
save_cookie(r)
data = r.json()
results = data.get("data", []) or []
print(
f"Search offset={offset}, found={data.get('found', 0)}, "
f"page_results={len(results)}"
)
candidates = []
for t in results:
tid = str(t.get("id", ""))
if tid in grabbed_ids:
continue
title = t.get("title", "")
if any(title.lower() in n for n in all_names_lower):
grabbed_ids.add(tid)
continue
size = parse_size(t.get("size", "0 B"))
if size < MIN_MB * 1024**2 or size > MAX_MB * 1024**2:
continue
seeders = int(t.get("seeders", 999) or 999)
leechers = int(t.get("leechers", 0) or 0)
if leechers < LEECHER_FLOOR:
continue
if seeders > SEEDER_CEILING:
continue
wedge_bonus = (
200 if (t.get("free") == 1 or t.get("personal_freeleech") == 1) else 0
)
score = leechers * 3 - seeders * 0.5 + wedge_bonus
candidates.append((score, t))
candidates.sort(key=lambda x: -x[0])
for score, t in candidates[:GRAB_PER_RUN]:
time.sleep(REQUEST_SLEEP)
tid = t["id"]
r = s.get(
f"https://www.myanonamouse.net/tor/download.php?tid={tid}", timeout=15
)
save_cookie(r)
if not r.content.startswith(b"d"):
print(f"Bad torrent body for tid={tid}")
grabbed_ids.add(str(tid))
continue
add_resp = requests.post(
f"{QB_URL}/api/v2/torrents/add",
files={
"torrents": (
f"{tid}.torrent",
r.content,
"application/x-bittorrent",
)
},
data={
"savepath": "/downloads/mam-farming",
"category": "mam-farming",
"tags": "mam,freeleech",
},
timeout=20,
)
ok = add_resp.status_code == 200 and add_resp.text.strip() != "Fails."
print(
f"{'Added' if ok else 'FAILED'} (score={score:.1f}): "
f"{t['title'][:60]} ({t['size']}, S:{t.get('seeders')} "
f"L:{t.get('leechers')}) -> {add_resp.status_code}"
)
grabbed_ids.add(str(tid))
if ok:
grabbed += 1
fd, tmp = tempfile.mkstemp(dir="/data")
os.write(fd, "\n".join(grabbed_ids).encode())
os.close(fd)
os.rename(tmp, GRABBED_IDS_FILE)
metrics = (
profile_metrics
+ f"mam_farming_grabbed {grabbed}\n"
+ f"mam_farming_total_seeding {len(farming) + grabbed}\n"
+ f"mam_farming_size_bytes {total_size}\n"
)
push(metrics)
print(f"Done: grabbed={grabbed}")
if __name__ == "__main__":
main()
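
The demand-first ranking used by the grabber can be sketched on its own. The helper names `score` and `rank` are hypothetical; the weights and the field names (`leechers`, `seeders`, `free`, `personal_freeleech`) are taken from the script above, and the sample records are made up for illustration.

```python
def score(t):
    """Demand-first score: leechers weigh up, seeders weigh down."""
    leechers = int(t.get("leechers", 0) or 0)
    # A missing or zero seeder count is treated as 999, as in the script.
    seeders = int(t.get("seeders", 999) or 999)
    wedge_bonus = 200 if (t.get("free") == 1 or t.get("personal_freeleech") == 1) else 0
    return leechers * 3 - seeders * 0.5 + wedge_bonus


def rank(results, limit=5):
    """Return the top `limit` search results, highest score first."""
    return sorted(results, key=score, reverse=True)[:limit]


# Illustrative records only, not real tracker data.
sample = [
    {"id": 1, "leechers": 1, "seeders": 40},
    {"id": 2, "leechers": 10, "seeders": 5},
    {"id": 3, "leechers": 0, "seeders": 2, "free": 1},
]
top = [t["id"] for t in rank(sample, limit=2)]
```

Because the bonus of 200 dominates the other terms for typical peer counts, a free torrent usually outranks a non-free one even when it has fewer leechers.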

View file

@ -1,177 +0,0 @@
"""
MAM farming janitor: H&R-aware cleanup.
Runs every 15 minutes independently of the grabber's ratio guard: stuck
torrents accumulate fastest precisely when the grabber is skipping. Never
deletes a torrent that's inside MAM's 72-hour Hit-and-Run window.
Set DRY_RUN=1 to log candidates without deleting (used for the first
24 hours after rollout to sanity-check the rules against live state).
"""
import json
import os
import sys
import time
import requests
QB_URL = "http://qbittorrent.servarr.svc.cluster.local"
PUSHGW = "http://prometheus-prometheus-pushgateway.monitoring:9091"
DRY_RUN = os.environ.get("DRY_RUN", "0") == "1"
HNR_SEED_SECONDS = int(os.environ.get("HNR_SEED_SECONDS", str(72 * 3600)))
NEVER_STARTED_AGE = int(os.environ.get("NEVER_STARTED_AGE", str(24 * 3600)))
STALLED_AGE = int(os.environ.get("STALLED_AGE", str(3 * 86400)))
SATISFIED_SEED_AGE = int(os.environ.get("SATISFIED_SEED_AGE", str(3 * 86400)))
SATISFIED_SEEDER_FLOOR = int(os.environ.get("SATISFIED_SEEDER_FLOOR", "5"))
GRACEFUL_SEED_AGE = int(os.environ.get("GRACEFUL_SEED_AGE", str(14 * 86400)))
ZERO_DEMAND_AGE = int(os.environ.get("ZERO_DEMAND_AGE", str(7 * 86400)))
UNREG_KEYWORDS = ("unregistered", "torrent not found", "info hash not authorized")
REASONS = (
"never_started",
"stalled_old",
"satisfied_redundant",
"graceful_retire",
"zero_demand",
"unregistered",
)
def classify(t, now, tracker_msg):
age = now - int(t.get("added_on", 0) or 0)
progress = float(t.get("progress", 0) or 0)
downloaded = int(t.get("downloaded", 0) or 0)
uploaded = int(t.get("uploaded", 0) or 0)
seed_time = int(t.get("seeding_time", 0) or 0)
state = t.get("state", "")
num_complete = int(t.get("num_complete", 0) or 0)
if tracker_msg and any(k in tracker_msg.lower() for k in UNREG_KEYWORDS):
return "unregistered"
if progress < 1.0:
if age > NEVER_STARTED_AGE and downloaded == 0:
return "never_started"
if state == "stalledDL" and age > STALLED_AGE:
return "stalled_old"
return None
if seed_time < HNR_SEED_SECONDS:
return "hnr_window"
if seed_time > GRACEFUL_SEED_AGE:
return "graceful_retire"
if (
seed_time >= HNR_SEED_SECONDS
and uploaded == 0
and age > ZERO_DEMAND_AGE
):
return "zero_demand"
if seed_time > SATISFIED_SEED_AGE and num_complete > SATISFIED_SEEDER_FLOOR:
return "satisfied_redundant"
return None
def fetch_tracker_msg(hash_):
try:
resp = requests.get(
f"{QB_URL}/api/v2/torrents/trackers",
params={"hash": hash_},
timeout=10,
)
trackers = resp.json() or []
except Exception:
return ""
for tr in trackers:
url = tr.get("url", "")
if url.startswith("** ["):
continue
msg = tr.get("msg", "")
if msg:
return msg
return ""
def push(metrics):
try:
requests.post(
f"{PUSHGW}/metrics/job/mam-farming-janitor", data=metrics, timeout=10
)
except Exception as e:
print(f"pushgateway error: {e}", file=sys.stderr)
def main():
try:
all_torrents = requests.get(
f"{QB_URL}/api/v2/torrents/info", timeout=15
).json()
except Exception as e:
print(f"qBittorrent unreachable: {e}", file=sys.stderr)
sys.exit(1)
farming = [t for t in all_torrents if t.get("category") == "mam-farming"]
now = int(time.time())
deleted = {r: 0 for r in REASONS}
preserved_hnr = 0
skipped_active = 0
delete_hashes = []
# Only inspect tracker msg on torrents with a peer problem — avoids
# hundreds of extra API calls when things are healthy.
for t in farming:
state = t.get("state", "")
progress = float(t.get("progress", 0) or 0)
tracker_msg = ""
if progress < 1.0 and state in ("stalledDL", "metaDL", "missingFiles"):
tracker_msg = fetch_tracker_msg(t["hash"])
verdict = classify(t, now, tracker_msg)
if verdict is None:
skipped_active += 1
elif verdict == "hnr_window":
preserved_hnr += 1
else:
deleted[verdict] += 1
delete_hashes.append((t["hash"], verdict, t.get("name", "")[:60]))
for hash_, reason, name in delete_hashes:
if DRY_RUN:
print(f"[DRY_RUN] would delete ({reason}): {name}")
continue
try:
requests.post(
f"{QB_URL}/api/v2/torrents/delete",
data={"hashes": hash_, "deleteFiles": "true"},
timeout=20,
)
print(f"Deleted ({reason}): {name}")
except Exception as e:
print(f"Delete failed for {name}: {e}", file=sys.stderr)
for reason in REASONS:
push(
f'mam_janitor_deleted_per_run{{reason="{reason}"}} '
f"{deleted[reason] if not DRY_RUN else 0}\n"
f'mam_janitor_dry_run_candidates{{reason="{reason}"}} '
f"{deleted[reason] if DRY_RUN else 0}\n"
)
push(
f"mam_janitor_preserved_hnr {preserved_hnr}\n"
f"mam_janitor_skipped_active {skipped_active}\n"
f"mam_janitor_dry_run {1 if DRY_RUN else 0}\n"
f"mam_janitor_last_run_timestamp {now}\n"
)
total = sum(deleted.values())
print(
f"Done: deleted={total} preserved_hnr={preserved_hnr} "
f"skipped_active={skipped_active} dry_run={DRY_RUN}"
)
print(f" per reason: {deleted}")
if __name__ == "__main__":
main()
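
The order of checks for a fully downloaded torrent matters, because the Hit-and-Run guard must win over every deletion rule. The following is a condensed sketch of that branch only; `verdict_for_completed` is a hypothetical name, and the constants repeat the script's defaults rather than reading the environment.

```python
HNR_SEED_SECONDS = 72 * 3600        # MAM's Hit-and-Run window
GRACEFUL_SEED_AGE = 14 * 86400
ZERO_DEMAND_AGE = 7 * 86400
SATISFIED_SEED_AGE = 3 * 86400
SATISFIED_SEEDER_FLOOR = 5


def verdict_for_completed(seed_time, uploaded, age, num_complete):
    """Classify a completed torrent; None means keep seeding."""
    if seed_time < HNR_SEED_SECONDS:
        return "hnr_window"          # never delete inside the H&R window
    if seed_time > GRACEFUL_SEED_AGE:
        return "graceful_retire"
    if uploaded == 0 and age > ZERO_DEMAND_AGE:
        return "zero_demand"
    if seed_time > SATISFIED_SEED_AGE and num_complete > SATISFIED_SEEDER_FLOOR:
        return "satisfied_redundant"
    return None
```

With these defaults a torrent that has seeded for one hour is always preserved, regardless of its upload total or the number of other seeders.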

View file

@ -1,281 +0,0 @@
variable "namespace" {
type = string
default = "servarr"
}
locals {
python_image = "docker.io/library/python:3.12-alpine"
pip_prefix = "pip install -q requests > /dev/null 2>&1; python3 /tmp/script.py"
data_pvc = "mam-farming-data-proxmox"
# Dry-run window was satisfied by a one-shot test on 2026-04-19 that
# produced 466 `never_started` candidates and 0 matches in any other
# reason bucket, consistent with Phase B's expected 495 stuck torrents.
# Enforcing from here on.
janitor_dry_run = "0"
}
# ------------------------------- PVC -------------------------------
# Shared scratch volume for cookie + grabbed-ID dedup list. The existing
# in-cluster PVC (kubectl-applied 2026-04-14) is adopted via an `import {}`
# block declared in the root module (servarr/main.tf); Terraform 1.5+
# rejects imports inside child modules.
resource "kubernetes_persistent_volume_claim" "mam_data" {
wait_until_bound = false
metadata {
name = local.data_pvc
namespace = var.namespace
annotations = {
"resize.topolvm.io/threshold" = "80%"
"resize.topolvm.io/increase" = "100%"
"resize.topolvm.io/storage_limit" = "5Gi"
}
}
spec {
access_modes = ["ReadWriteOnce"]
storage_class_name = "proxmox-lvm"
resources {
requests = {
storage = "1Gi"
}
}
}
}
# --------------------------- Grabber ---------------------------------
# Every 30 minutes: skip while ratio < 1.2 or class == Mouse; otherwise
# grab up to 5 small-but-popular freeleech torrents. Existing ConfigMap
# + CronJob are adopted via imports in the parent stack.
resource "kubernetes_config_map" "grabber_script" {
metadata {
name = "mam-freeleech-grabber-script"
namespace = var.namespace
}
data = {
"script.py" = file("${path.module}/files/freeleech-grabber.py")
}
}
resource "kubernetes_cron_job_v1" "grabber" {
metadata {
name = "mam-freeleech-grabber"
namespace = var.namespace
}
spec {
schedule = "*/30 * * * *"
concurrency_policy = "Forbid"
successful_jobs_history_limit = 3
failed_jobs_history_limit = 3
job_template {
metadata {}
spec {
backoff_limit = 2
ttl_seconds_after_finished = 300
template {
metadata {}
spec {
restart_policy = "Never"
container {
name = "freeleech-grabber"
image = local.python_image
command = ["/bin/sh", "-c", local.pip_prefix]
env {
name = "MAM_ID"
value_from {
secret_key_ref {
name = "servarr-secrets"
key = "mam_id"
}
}
}
resources {
requests = { memory = "64Mi", cpu = "10m" }
limits = { memory = "128Mi" }
}
volume_mount {
name = "script"
mount_path = "/tmp/script.py"
sub_path = "script.py"
}
volume_mount {
name = "data"
mount_path = "/data"
}
}
volume {
name = "script"
config_map {
name = kubernetes_config_map.grabber_script.metadata[0].name
}
}
volume {
name = "data"
persistent_volume_claim {
claim_name = kubernetes_persistent_volume_claim.mam_data.metadata[0].name
}
}
}
}
}
}
}
lifecycle {
# KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config]
}
}
# --------------------------- BP Spender ------------------------------
# Every 6 hours: compute the upload deficit against TARGET_RATIO (+1 GiB
# margin) and buy the smallest API tier (50/100/200/500 GiB) that covers it,
# limited by the BP reserve. Existing ConfigMap + CronJob are adopted via
# imports in the parent stack.
resource "kubernetes_config_map" "bp_spender_script" {
metadata {
name = "mam-bp-spender-script"
namespace = var.namespace
}
data = {
"script.py" = file("${path.module}/files/bp-spender.py")
}
}
resource "kubernetes_cron_job_v1" "bp_spender" {
metadata {
name = "mam-bp-spender"
namespace = var.namespace
}
spec {
schedule = "0 */6 * * *"
concurrency_policy = "Forbid"
successful_jobs_history_limit = 3
failed_jobs_history_limit = 3
job_template {
metadata {}
spec {
backoff_limit = 2
ttl_seconds_after_finished = 300
template {
metadata {}
spec {
restart_policy = "Never"
container {
name = "bp-spender"
image = local.python_image
command = ["/bin/sh", "-c", local.pip_prefix]
env {
name = "MAM_ID"
value_from {
secret_key_ref {
name = "servarr-secrets"
key = "mam_id"
}
}
}
resources {
requests = { memory = "64Mi", cpu = "10m" }
limits = { memory = "128Mi" }
}
volume_mount {
name = "script"
mount_path = "/tmp/script.py"
sub_path = "script.py"
}
volume_mount {
name = "data"
mount_path = "/data"
}
}
volume {
name = "script"
config_map {
name = kubernetes_config_map.bp_spender_script.metadata[0].name
}
}
volume {
name = "data"
persistent_volume_claim {
claim_name = kubernetes_persistent_volume_claim.mam_data.metadata[0].name
}
}
}
}
}
}
}
lifecycle {
# KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config]
}
}
# ----------------------------- Janitor -------------------------------
# New: every 15 minutes, independent of grabber ratio guard. Deletes
# stuck/unregistered/redundant torrents in category=mam-farming while
# preserving torrents inside the 72h H&R window.
resource "kubernetes_config_map" "janitor_script" {
metadata {
name = "mam-farming-janitor-script"
namespace = var.namespace
}
data = {
"script.py" = file("${path.module}/files/mam-farming-janitor.py")
}
}
resource "kubernetes_cron_job_v1" "janitor" {
metadata {
name = "mam-farming-janitor"
namespace = var.namespace
}
spec {
schedule = "*/15 * * * *"
concurrency_policy = "Forbid"
successful_jobs_history_limit = 3
failed_jobs_history_limit = 3
job_template {
metadata {}
spec {
backoff_limit = 2
ttl_seconds_after_finished = 300
template {
metadata {}
spec {
restart_policy = "Never"
container {
name = "farming-janitor"
image = local.python_image
command = ["/bin/sh", "-c", local.pip_prefix]
env {
name = "DRY_RUN"
value = local.janitor_dry_run
}
resources {
requests = { memory = "64Mi", cpu = "10m" }
limits = { memory = "128Mi" }
}
volume_mount {
name = "script"
mount_path = "/tmp/script.py"
sub_path = "script.py"
}
}
volume {
name = "script"
config_map {
name = kubernetes_config_map.janitor_script.metadata[0].name
}
}
}
}
}
}
}
lifecycle {
# KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config]
}
}

View file

@ -79,11 +79,11 @@ resource "kubernetes_deployment" "qbittorrent" {
}
spec {
container {
image = "lscr.io/linuxserver/qbittorrent:5.1.4"
image = "lscr.io/linuxserver/qbittorrent:5.0.4"
name = "qbittorrent"
port {
container_port = 8080
container_port = 8787
}
env {
name = "PUID"
@ -113,15 +113,6 @@ resource "kubernetes_deployment" "qbittorrent" {
name = "audiobooks"
mount_path = "/audiobooks"
}
resources {
requests = {
memory = "512Mi"
cpu = "50m"
}
limits = {
memory = "1Gi"
}
}
}
volume {
name = "data"
@ -298,26 +289,21 @@ tracker_stats = defaultdict(lambda: {
})
for t in torrents:
category = (t.get("category") or "").lower()
tracker_url = t.get("tracker", "")
domain = ""
if tracker_url:
try:
domain = (urlparse(tracker_url).hostname or "").lower()
except Exception:
domain = ""
# Category is the only signal for queuedDL torrents whose announces
# haven't happened yet (tracker field is empty). Map those first so
# hundreds of MAM torrents don't collect under "unknown".
if category == "mam-farming" or "myanonamouse" in domain or "mam" in domain:
label = "mam"
elif category.startswith("abb") or "audiobookbay" in domain or "abb" in domain:
label = "audiobookbay"
elif domain:
label = domain.replace(".", "_")
if not tracker_url:
domain = "unknown"
else:
label = "unknown"
try:
domain = urlparse(tracker_url).hostname or "unknown"
except Exception:
domain = "unknown"
if "myanonamouse" in domain or "mam" in domain.lower():
label = "mam"
elif "audiobookbay" in domain or "abb" in domain.lower():
label = "audiobookbay"
else:
label = domain.replace(".", "_")
s = tracker_stats[label]
s["uploaded"] += t.get("uploaded", 0)

View file

@ -1,69 +0,0 @@
# =============================================================================
# CoreDNS Scaling, Anti-Affinity, PDB
# =============================================================================
#
# CoreDNS is kube-system / kubeadm-managed. We only patch replicas + affinity
# here (the Corefile ConfigMap is in main.tf). The hashicorp/kubernetes v3
# provider removed the *_patch resource family from v2, so we apply the
# desired state via `kubectl patch` inside a null_resource. The patch is
# idempotent: a no-op when the deployment already matches.
#
# Kubeadm upgrades preserve the replica count on the existing deployment but
# reset the pod template (including affinity) from the ClusterConfiguration.
# Re-running `terraform apply` re-asserts the affinity patch; the readiness
# gate in `readiness.tf` catches regressions if the patch is reverted.
resource "null_resource" "coredns_scale_and_affinity" {
triggers = {
replicas = 3
spec_hash = sha256(file("${path.module}/coredns.tf"))
}
provisioner "local-exec" {
command = <<-BASH
set -euo pipefail
# 1. Scale to 3 replicas.
kubectl -n kube-system scale deploy/coredns --replicas=3
# 2. Switch anti-affinity from preferred to required on hostname.
kubectl -n kube-system patch deploy/coredns --type=json -p='[
{
"op": "replace",
"path": "/spec/template/spec/affinity/podAntiAffinity",
"value": {
"requiredDuringSchedulingIgnoredDuringExecution": [
{
"labelSelector": {
"matchExpressions": [
{"key": "k8s-app", "operator": "In", "values": ["kube-dns"]}
]
},
"topologyKey": "kubernetes.io/hostname"
}
]
}
}
]' || true
# 3. Wait for rollout to settle.
kubectl -n kube-system rollout status deploy/coredns --timeout=120s
BASH
interpreter = ["/bin/bash", "-c"]
}
}
# PDB: keep at least 2 CoreDNS pods running during voluntary disruptions.
resource "kubernetes_pod_disruption_budget_v1" "coredns" {
metadata {
name = "coredns"
namespace = "kube-system"
}
spec {
min_available = "2"
selector {
match_labels = {
"k8s-app" = "kube-dns"
}
}
}
}

View file

@ -115,11 +115,11 @@ resource "kubernetes_deployment" "technitium_secondary" {
}
resources {
requests = {
cpu = "100m"
memory = "2Gi"
cpu = "25m"
memory = "512Mi"
}
limits = {
memory = "2Gi"
memory = "512Mi"
}
}
port {
@ -270,11 +270,11 @@ resource "kubernetes_deployment" "technitium_tertiary" {
}
resources {
requests = {
cpu = "100m"
memory = "2Gi"
cpu = "25m"
memory = "512Mi"
}
limits = {
memory = "2Gi"
memory = "512Mi"
}
}
port {
@ -391,90 +391,44 @@ resource "kubernetes_cron_job_v1" "technitium_zone_sync" {
set -e
PRIMARY="http://technitium-primary.technitium.svc.cluster.local:5380"
REPLICAS="http://technitium-secondary-web.technitium.svc.cluster.local:5380 http://technitium-tertiary-web.technitium.svc.cluster.local:5380"
PUSHGW="http://prometheus-prometheus-pushgateway.monitoring:9091/metrics/job/technitium-zone-sync"
# Track overall status: non-zero if any zone fails to create
OVERALL_STATUS=0
FAIL_COUNT=0
SYNCED=0
# Login to primary
P_TOKEN=$(curl -sf "$PRIMARY/api/user/login?user=$TECH_USER&pass=$TECH_PASS" | sed -n 's/.*"token":"\([^"]*\)".*/\1/p')
if [ -z "$P_TOKEN" ]; then echo "ERROR: Cannot login to primary"; OVERALL_STATUS=1; fi
if [ -z "$P_TOKEN" ]; then echo "ERROR: Cannot login to primary"; exit 1; fi
if [ "$OVERALL_STATUS" -eq 0 ]; then
# Get zones from primary (excluding default zones that don't need replication)
curl -sf "$PRIMARY/api/zones/list?token=$P_TOKEN" | tr ',' '\n' | sed -n 's/.*"name":"\([^"]*\)".*/\1/p' | \
grep -v -E '^(localhost|0\.in-addr\.arpa|127\.in-addr\.arpa|255\.in-addr\.arpa|1\.0\.0.*ip6\.arpa)$$' > /tmp/primary_zones.txt
PRIMARY_COUNT=$(wc -l < /tmp/primary_zones.txt)
echo "Primary has $PRIMARY_COUNT zones to replicate"
# Get zones from primary (excluding default zones that don't need replication)
curl -sf "$PRIMARY/api/zones/list?token=$P_TOKEN" | tr ',' '\n' | sed -n 's/.*"name":"\([^"]*\)".*/\1/p' | \
grep -v -E '^(localhost|0\.in-addr\.arpa|127\.in-addr\.arpa|255\.in-addr\.arpa|1\.0\.0.*ip6\.arpa)$$' > /tmp/primary_zones.txt
echo "Primary has $(wc -l < /tmp/primary_zones.txt) zones to replicate"
# Enable zone transfers on primary for all zones
while read -r zone; do
curl -sf "$PRIMARY/api/zones/options/set?token=$P_TOKEN&zone=$zone&zoneTransfer=Allow" > /dev/null || true
done < /tmp/primary_zones.txt
# Sync to each replica
SYNCED=0
for REPLICA in $REPLICAS; do
R_TOKEN=$(curl -sf "$REPLICA/api/user/login?user=$TECH_USER&pass=$TECH_PASS" | sed -n 's/.*"token":"\([^"]*\)".*/\1/p')
if [ -z "$R_TOKEN" ]; then echo "WARN: Cannot login to $REPLICA, skipping"; continue; fi
# Get existing zones on this replica
curl -sf "$REPLICA/api/zones/list?token=$R_TOKEN" | tr ',' '\n' | sed -n 's/.*"name":"\([^"]*\)".*/\1/p' > /tmp/replica_zones.txt
# Enable zone transfers on primary for all zones
while read -r zone; do
curl -sf "$PRIMARY/api/zones/options/set?token=$P_TOKEN&zone=$zone&zoneTransfer=Allow" > /dev/null || true
done < /tmp/primary_zones.txt
# Sync to each replica
for REPLICA in $REPLICAS; do
R_NAME=$(echo "$REPLICA" | sed 's|http://||; s|-web.*||')
R_TOKEN=$(curl -sf "$REPLICA/api/user/login?user=$TECH_USER&pass=$TECH_PASS" | sed -n 's/.*"token":"\([^"]*\)".*/\1/p')
if [ -z "$R_TOKEN" ]; then
echo "ERROR: Cannot login to $REPLICA"
OVERALL_STATUS=1
FAIL_COUNT=$((FAIL_COUNT + 1))
# Push replica zone_count=0 so divergence alert fires
printf 'technitium_zone_count{instance="%s"} 0\n' "$R_NAME" | \
curl -sf --data-binary @- "$PUSHGW/instance/$R_NAME" || true
continue
if grep -qx "$zone" /tmp/replica_zones.txt; then
# Zone exists: just resync
curl -sf "$REPLICA/api/zones/resync?token=$R_TOKEN&zone=$zone" > /dev/null || true
else
# New zone: create as Secondary and sync
echo "NEW: Creating $zone on $REPLICA"
curl -sf "$REPLICA/api/zones/create?token=$R_TOKEN&zone=$zone&type=Secondary&primaryNameServerAddresses=$PRIMARY_IP" > /dev/null || true
SYNCED=$((SYNCED + 1))
fi
done < /tmp/primary_zones.txt
done
# Get existing zones on this replica
curl -sf "$REPLICA/api/zones/list?token=$R_TOKEN" | tr ',' '\n' | sed -n 's/.*"name":"\([^"]*\)".*/\1/p' > /tmp/replica_zones.txt
REPLICA_COUNT=$(wc -l < /tmp/replica_zones.txt)
while read -r zone; do
if grep -qx "$zone" /tmp/replica_zones.txt; then
# Zone exists: just resync
curl -sf "$REPLICA/api/zones/resync?token=$R_TOKEN&zone=$zone" > /dev/null || true
else
# New zone: create as Secondary and validate response
echo "NEW: Creating $zone on $REPLICA"
RESP=$(curl -sf "$REPLICA/api/zones/create?token=$R_TOKEN&zone=$zone&type=Secondary&primaryNameServerAddresses=$PRIMARY_IP" || echo '{"status":"error"}')
if echo "$RESP" | grep -q '"status":"ok"'; then
SYNCED=$((SYNCED + 1))
else
echo "ERROR: Failed to create $zone on $REPLICA: $RESP"
OVERALL_STATUS=1
FAIL_COUNT=$((FAIL_COUNT + 1))
fi
fi
done < /tmp/primary_zones.txt
# Push per-replica zone count
printf 'technitium_zone_count{instance="%s"} %s\n' "$R_NAME" "$REPLICA_COUNT" | \
curl -sf --data-binary @- "$PUSHGW/instance/$R_NAME" || true
done
# Push primary zone count
printf 'technitium_zone_count{instance="primary"} %s\n' "$PRIMARY_COUNT" | \
curl -sf --data-binary @- "$PUSHGW/instance/primary" || true
fi
# Push overall status (0=ok, 1=fail) + last-run timestamp
cat <<METRICS | curl -sf --data-binary @- "$PUSHGW" || true
# HELP technitium_zone_sync_status Zone sync job status (0=ok, 1=fail)
# TYPE technitium_zone_sync_status gauge
technitium_zone_sync_status $OVERALL_STATUS
# HELP technitium_zone_sync_failures Zones that failed to create this run
# TYPE technitium_zone_sync_failures gauge
technitium_zone_sync_failures $FAIL_COUNT
# HELP technitium_zone_sync_last_run Timestamp of last zone-sync run
# TYPE technitium_zone_sync_last_run gauge
technitium_zone_sync_last_run $(date +%s)
METRICS
echo "Zone sync complete. $SYNCED new zone(s) created. $FAIL_COUNT failures. status=$OVERALL_STATUS"
exit $OVERALL_STATUS
echo "Zone sync complete. $$SYNCED new zone(s) created."
SCRIPT
]
env {
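
The per-replica decision in the script above (resync a zone the replica already has, otherwise create it as Secondary) can be sketched outside the shell. This is a minimal Python sketch, assuming the zone names have already been extracted into lists; `plan_zone_sync` and `render_zone_count` are hypothetical helper names and the zone names are made up for illustration, none of them are part of the job.

```python
def plan_zone_sync(primary_zones, replica_zones):
    """Split the primary's zones into those to resync and those to create on a replica."""
    existing = set(replica_zones)
    resync = [z for z in primary_zones if z in existing]
    create = [z for z in primary_zones if z not in existing]
    return resync, create


def render_zone_count(instance, count):
    """Format one sample in the Prometheus text format, as pushed to the Pushgateway."""
    return f'technitium_zone_count{{instance="{instance}"}} {count}\n'


# Hypothetical zone lists, for illustration only.
resync, create = plan_zone_sync(
    ["viktorbarzin.lan", "example.lan"],
    ["viktorbarzin.lan"],
)
print("resync:", resync)
print("create:", create)
print(render_zone_count("primary", 2), end="")
```

Keeping the plan separate from the HTTP calls makes the "create failed" path easy to count, which is what the `FAIL_COUNT` metric in the script relies on.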


@ -60,15 +60,10 @@ resource "kubernetes_config_map" "coredns" {
ttl 30
}
prometheus :9153
forward . 10.0.20.1 8.8.8.8 1.1.1.1 {
policy sequential
health_check 5s
max_fails 2
}
forward . 10.0.20.1 8.8.8.8 1.1.1.1
cache {
success 10000 300 6
denial 10000 300 60
serve_stale 86400s
}
loop
reload
@ -82,14 +77,10 @@ resource "kubernetes_config_map" "coredns" {
rcode NXDOMAIN
fallthrough
}
forward . 10.96.0.53 {
health_check 5s
max_fails 2
}
forward . 10.96.0.53 # Technitium ClusterIP (technitium-dns-internal)
cache {
success 10000 300 6
denial 10000 300 60
serve_stale 86400s
}
}
EOF
@ -170,11 +161,11 @@ resource "kubernetes_deployment" "technitium" {
name = "technitium"
resources {
requests = {
cpu = "100m"
memory = "2Gi"
cpu = "25m"
memory = "512Mi"
}
limits = {
memory = "2Gi"
memory = "512Mi"
}
}
port {
@ -230,10 +221,6 @@ resource "kubernetes_deployment" "technitium" {
}
}
}
lifecycle {
# KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
ignore_changes = [spec[0].template[0].spec[0].dns_config]
}
}
resource "kubernetes_service" "technitium-web" {


@ -1,100 +0,0 @@
# =============================================================================
# Post-apply readiness gate
# =============================================================================
#
# Runs after all three Technitium deployments + the DNS LB service have been
# applied. Verifies that every instance is rolled out, the API responds, the
# DNS pods answer queries, and zone counts agree. Fails the apply if any
# check fails. No canary; this is a hard gate.
#
# Override for emergency maintenance: apply with `-var skip_readiness=true`
# (set via terragrunt inputs when needed), or `terraform apply -target` the
# resources needed without touching this module.
variable "skip_readiness" {
type = bool
default = false
description = "Skip the Technitium readiness gate. Use only for emergency maintenance."
}
resource "null_resource" "technitium_readiness_gate" {
count = var.skip_readiness ? 0 : 1
# Re-run when any deployment image/resource changes, or on every apply
# (timestamp) so transient drift still gets exercised.
triggers = {
primary_digest = sha256(jsonencode(kubernetes_deployment.technitium.spec[0].template[0].spec[0].container[0]))
secondary_digest = sha256(jsonencode(kubernetes_deployment.technitium_secondary.spec[0].template[0].spec[0].container[0]))
tertiary_digest = sha256(jsonencode(kubernetes_deployment.technitium_tertiary.spec[0].template[0].spec[0].container[0]))
corefile = sha256(kubernetes_config_map.coredns.data["Corefile"])
always = timestamp()
}
provisioner "local-exec" {
command = <<-BASH
set -euo pipefail
NS=technitium
echo "=== Technitium readiness gate ==="
# 1. Wait for rollout on all three deployments.
for d in technitium technitium-secondary technitium-tertiary; do
echo "-> rollout status deploy/$d"
kubectl -n $NS rollout status deploy/$d --timeout=180s
done
# 2. Per-pod DNS check + content parity. Technitium pods have `dig` but
# no HTTP client, so we use DNS directly. Each pod must return an A
# record for idrac.viktorbarzin.lan, AND the answer must match across
# all three instances. This catches:
# - Zone not loaded on an instance (NXDOMAIN / empty)
# - Zone drift between primary and replicas (different A record)
# The AXFR chain means all three should converge on the same value.
PODS=$(kubectl -n $NS get pod -l dns-server=true -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}')
if [ -z "$PODS" ]; then
echo "ERROR: no dns-server=true pods found"
exit 1
fi
# Zone load can take tens of seconds after a memory-bump rollout, so retry
# up to 6 times, sleeping 10s between attempts, before giving up.
ANSWERS=""
for POD in $PODS; do
echo "-> dig @127.0.0.1 idrac.viktorbarzin.lan on $POD"
ANSWER=""
for TRY in 1 2 3 4 5 6; do
ANSWER=$(kubectl -n $NS exec "$POD" -- dig +short +time=5 +tries=2 @127.0.0.1 idrac.viktorbarzin.lan A 2>&1 || true)
if echo "$ANSWER" | grep -qE '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$'; then
break
fi
echo " attempt $TRY: no A record yet, sleeping 10s"
sleep 10
ANSWER=""
done
if [ -z "$ANSWER" ]; then
echo "ERROR: pod $POD never returned an A record for idrac.viktorbarzin.lan"
exit 1
fi
echo " $POD → $ANSWER"
ANSWERS="$ANSWERS $ANSWER"
done
# 3. Content parity: all three instances must agree on the A record.
UNIQ=$(echo "$ANSWERS" | tr ' ' '\n' | grep -v '^$' | sort -u | wc -l)
if [ "$UNIQ" -gt 1 ]; then
echo "ERROR: instances returned different A records for idrac.viktorbarzin.lan: $ANSWERS"
exit 1
fi
echo "=== Technitium readiness gate PASSED ==="
BASH
interpreter = ["/bin/bash", "-c"]
}
depends_on = [
kubernetes_deployment.technitium,
kubernetes_deployment.technitium_secondary,
kubernetes_deployment.technitium_tertiary,
kubernetes_service.technitium-dns,
kubernetes_pod_disruption_budget_v1.technitium_dns,
]
}
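
The content-parity step of the gate (every pod must return an A record, and all pods must agree) can be sketched as a small function. This is an approximation in Python under stated assumptions: `check_parity` is a hypothetical name, the input is assumed to be the raw `dig +short` output per pod, and unlike the shell version it keeps only the lines that look like IPv4 addresses.

```python
import re

# Same pattern the gate's grep uses to recognise an A record.
IPV4_RE = re.compile(r"^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$")


def check_parity(answers):
    """answers maps pod name -> raw `dig +short` output; returns the agreed A record."""
    records = set()
    for pod, raw in answers.items():
        ips = [line for line in raw.splitlines() if IPV4_RE.match(line)]
        if not ips:
            raise RuntimeError(f"pod {pod} returned no A record")
        records.update(ips)
    if len(records) > 1:
        raise RuntimeError(f"instances disagree: {sorted(records)}")
    return records.pop()


# Hypothetical pod names and address, for illustration only.
agreed = check_parity({"technitium-0": "10.0.0.5\n", "technitium-1": "10.0.0.5"})
print(agreed)
```

Raising on the first empty answer mirrors the gate's `exit 1`, so a zone that never loaded on one instance fails the check before parity is even compared.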


@ -295,13 +295,12 @@ resource "kubernetes_service" "torrserver-bt" {
}
module "torrserver_ingress" {
source = "../../modules/kubernetes/ingress_factory"
namespace = kubernetes_namespace.tor-proxy.metadata[0].name
name = "torrserver"
tls_secret_name = var.tls_secret_name
port = "8090"
protected = true
external_monitor = false
source = "../../modules/kubernetes/ingress_factory"
namespace = kubernetes_namespace.tor-proxy.metadata[0].name
name = "torrserver"
tls_secret_name = var.tls_secret_name
port = "8090"
protected = true
extra_annotations = {
"gethomepage.dev/enabled" = "true"
"gethomepage.dev/name" = "TorrServer"


@ -252,7 +252,7 @@ resource "kubernetes_deployment" "trading-bot-frontend" {
app = "trading-bot-frontend"
}
annotations = {
"dependency.kyverno.io/wait-for" = "postgresql.dbaas:5432,redis-master.redis:6379"
"dependency.kyverno.io/wait-for" = "postgresql.dbaas:5432,redis.redis:6379"
}
}
spec {
@ -353,7 +353,7 @@ resource "kubernetes_deployment" "trading-bot-workers" {
app = "trading-bot-workers"
}
annotations = {
"dependency.kyverno.io/wait-for" = "postgresql.dbaas:5432,redis-master.redis:6379"
"dependency.kyverno.io/wait-for" = "postgresql.dbaas:5432,redis.redis:6379"
}
}
spec {
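
The `dependency.kyverno.io/wait-for` value is a comma-separated list of `host:port` pairs, so a renamed Service (here `redis-master.redis` versus `redis.redis`) changes which endpoint the pod waits on. A minimal Python sketch of parsing that format follows; `parse_wait_for` is a hypothetical helper, and how the actual Kyverno policy consumes the annotation is not shown in this diff.

```python
def parse_wait_for(value):
    """Parse 'host:port,host:port' into a list of (host, port) tuples."""
    targets = []
    for item in value.split(","):
        # Split on the last colon so the host part stays intact.
        host, _, port = item.strip().rpartition(":")
        if not host or not port.isdigit():
            raise ValueError(f"malformed dependency entry: {item!r}")
        targets.append((host, int(port)))
    return targets


print(parse_wait_for("postgresql.dbaas:5432,redis-master.redis:6379"))
```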


@ -552,42 +552,10 @@ locals {
type = "mysql"
database_connection_string = "mysql://uptimekuma@mysql.dbaas.svc.cluster.local:3306"
database_password_vault_key = "uptimekuma_db_password"
hostname = null
port = null
interval = 60
retry_interval = 60
max_retries = 2
},
{
# HAProxy service in redis ns health-checks INFO replication and
# only routes to the current Sentinel-elected master, so this
# survives failover. Bitnami chart has auth disabled, so no
# password_vault_key.
name = "Redis"
type = "redis"
database_connection_string = "redis://redis-master.redis.svc.cluster.local:6379"
database_password_vault_key = null
hostname = null
port = null
interval = 60
retry_interval = 30
max_retries = 3
},
{
# TP-Link home router upstream of pfSense. Complements the
# `[External] gw` HTTPS monitor: this one checks the router
# directly on 443, so we can tell a Cloudflare/tunnel outage
# apart from the router itself being unreachable.
name = "TP-Link Gateway (192.168.1.1)"
type = "port"
database_connection_string = null
database_password_vault_key = null
hostname = "192.168.1.1"
port = 443
interval = 60
retry_interval = 30
max_retries = 3
},
]
}
@ -602,7 +570,6 @@ resource "kubernetes_secret" "internal_monitor_sync" {
for m in local.internal_monitors :
"DB_PASSWORD_${upper(replace(m.name, "/[^A-Za-z0-9]/", "_"))}" =>
data.vault_kv_secret_v2.viktor.data[m.database_password_vault_key]
if m.database_password_vault_key != null
},
)
}
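
The secret key above is derived from the monitor name by replacing every non-alphanumeric character with `_` and upper-casing the result. A Python equivalent of that HCL expression, as a sketch: `password_env_name` is a hypothetical helper and the monitor name in the example is made up.

```python
import re


def password_env_name(monitor_name):
    """Mirror upper(replace(name, "/[^A-Za-z0-9]/", "_")) with a DB_PASSWORD_ prefix."""
    return "DB_PASSWORD_" + re.sub(r"[^A-Za-z0-9]", "_", monitor_name).upper()


# Hypothetical monitor name, for illustration only.
print(password_env_name("MySQL (dbaas)"))
```

Because each punctuation character and space maps to its own underscore, names that differ only in punctuation can collide on the same key, which is worth keeping in mind when adding monitors.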
@ -618,9 +585,7 @@ resource "kubernetes_config_map_v1" "internal_monitor_targets" {
name = m.name
type = m.type
database_connection_string = m.database_connection_string
hostname = m.hostname
port = m.port
password_env = m.database_password_vault_key != null ? "DB_PASSWORD_${upper(replace(m.name, "/[^A-Za-z0-9]/", "_"))}" : null
password_env = "DB_PASSWORD_${upper(replace(m.name, "/[^A-Za-z0-9]/", "_"))}"
interval = m.interval
retry_interval = m.retry_interval
max_retries = m.max_retries
@ -669,42 +634,40 @@ existing = {m["name"]: m for m in api.get_monitors()}
for t in targets:
name = t["name"]
mtype = MonitorType(t["type"])
# MYSQL uses `databaseConnectionString` + `radiusPassword` (UK v2 re-uses
# radiusPassword for mysql auth, for backwards compat). Redis has auth
# disabled on the cluster, so password_env is null. PORT monitors use
# hostname + port directly.
password = os.environ[t["password_env"]]
# MYSQL monitors use `databaseConnectionString` + `radiusPassword`
# (UK v2 re-uses the radiusPassword field for mysql auth, for backwards compat).
desired = {
"type": mtype,
"type": MonitorType(t["type"]),
"name": name,
"databaseConnectionString": t["database_connection_string"],
"radiusPassword": password,
"interval": t["interval"],
"retryInterval": t["retry_interval"],
"maxretries": t["max_retries"],
}
if mtype == MonitorType.PORT:
desired["hostname"] = t["hostname"]
desired["port"] = t["port"]
else:
desired["databaseConnectionString"] = t["database_connection_string"]
if t.get("password_env"):
desired["radiusPassword"] = os.environ[t["password_env"]]
if name not in existing:
print(f"Creating monitor: {name}")
api.add_monitor(**desired)
continue
m = existing[name]
drift_fields = ["interval", "retryInterval", "maxretries"]
if mtype == MonitorType.PORT:
drift_fields += ["hostname", "port"]
else:
drift_fields += ["databaseConnectionString"]
if "radiusPassword" in desired:
drift_fields += ["radiusPassword"]
drifted = any(m.get(f) != desired.get(f) for f in drift_fields)
drifted = (
m.get("databaseConnectionString") != desired["databaseConnectionString"]
or m.get("radiusPassword") != desired["radiusPassword"]
or m.get("interval") != desired["interval"]
or m.get("retryInterval") != desired["retryInterval"]
or m.get("maxretries") != desired["maxretries"]
)
if drifted:
print(f"Updating monitor {name} (id={m['id']})")
edit_kwargs = {f: desired[f] for f in drift_fields if f in desired}
api.edit_monitor(m["id"], **edit_kwargs)
api.edit_monitor(
m["id"],
databaseConnectionString=desired["databaseConnectionString"],
radiusPassword=desired["radiusPassword"],
interval=desired["interval"],
retryInterval=desired["retryInterval"],
maxretries=desired["maxretries"],
)
else:
print(f"Monitor {name} (id={m['id']}) already in desired state")
time.sleep(0.3)
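
Both versions of the sync loop decide whether to edit a monitor by comparing a fixed set of fields between the live monitor and the desired state. That comparison can be sketched on plain dictionaries without the Uptime Kuma client; `drifted_fields` is a hypothetical helper and the sample values are illustrative.

```python
def drifted_fields(existing, desired, fields):
    """Return the fields whose live value differs from the desired one."""
    return [f for f in fields if existing.get(f) != desired.get(f)]


# Illustrative live and desired monitor state.
live = {"id": 7, "interval": 60, "retryInterval": 60, "maxretries": 2}
want = {"interval": 60, "retryInterval": 30, "maxretries": 3}

changed = drifted_fields(live, want, ["interval", "retryInterval", "maxretries"])
# Only the changed fields would be passed on to an edit call.
edit_kwargs = {f: want[f] for f in changed}
print(changed, edit_kwargs)
```

Returning the list of changed fields, rather than a single boolean, keeps the update payload minimal and makes the log line say what actually drifted.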


@ -394,14 +394,9 @@ resource "vault_kubernetes_auth_backend_role" "ci" {
role_name = "ci"
bound_service_account_names = ["default"]
bound_service_account_namespaces = ["woodpecker"]
# terraform_state policy grants read on
# `database/static-creds/pg-terraform-state`; scripts/tg needs this to fetch
# the Tier-1 PG backend password. Without it, CI's per-stack `tg apply` dies
# with `ERROR: Cannot read PG credentials from Vault` and the default.yml
# apply-loop swallows the exit code (set +e); fixed in bd code-e1x.
token_policies = [vault_policy.ci.name, vault_policy.terraform_state.name]
token_ttl = 604800 # 7d
token_period = 604800 # periodic: auto-renews indefinitely
token_policies = [vault_policy.ci.name]
token_ttl = 604800 # 7d
token_period = 604800 # periodic: auto-renews indefinitely
}
# --- ESO Policy & Role ---


@ -9,10 +9,6 @@ terraform {
source = "cloudflare/cloudflare"
version = "~> 4"
}
authentik = {
source = "goauthentik/authentik"
version = "~> 2024.10"
}
}
}


@ -133,7 +133,7 @@ SLACK_BOT_TOKEN = os.getenv("SLACK_BOT_TOKEN", "")
SLACK_CHANNEL = os.getenv("SLACK_CHANNEL", "automation")
# Redis configuration
REDIS_URL = os.getenv("REDIS_URL", "redis://redis-master.redis.svc.cluster.local:6379/0")
REDIS_URL = os.getenv("REDIS_URL", "redis://redis.redis.svc.cluster.local:6379/0")
REDIS_PREFIX = "yt-highlights:"
# Paths

File diff suppressed because one or more lines are too long