Commit graph

6 commits

Author SHA1 Message Date
Viktor Barzin
6ee283c2f0 [docs] Document external-monitor opt-out mechanism in monitoring.md
The doc said monitors were created for everything in cloudflare_proxied_names,
but since the k8s-api discovery rewrite the ConfigMap is a fallback only.
Describe the opt-OUT semantics and how external_monitor=false on a factory
call translates to the sync script's skip annotation.
2026-04-19 15:19:06 +00:00
Viktor Barzin
f8facf44dd [infra] Fix rewrite-body plugin + cleanup TrueNAS + version bumps
## Context

The rewrite-body Traefik plugin (packruler/rewrite-body v1.2.0) silently
broke on Traefik v3.6.12 — every service using rybbit analytics or anti-AI
injection returned HTTP 200 with "Error 404: Not Found" body. Root cause:
middleware specs referenced plugin name `rewrite-body` but Traefik registered
it as `traefik-plugin-rewritebody`.

Migrated to maintained fork `the-ccsn/traefik-plugin-rewritebody` v0.1.3
which uses the correct plugin name. Also added `lastModified = true` and
`methods = ["GET"]` to anti-AI middleware to avoid rewriting non-HTML
responses.

## This change

- Replace packruler/rewrite-body v1.2.0 with the-ccsn/traefik-plugin-rewritebody v0.1.3
- Fix plugin name in all 3 middleware locations (ingress_factory, reverse-proxy factory, traefik anti-AI)
- Remove deprecated TrueNAS cloud sync monitor (VM decommissioned 2026-04-13)
- Remove CloudSyncStale/CloudSyncFailing/CloudSyncNeverRun alerts
- Fix PrometheusBackupNeverRun alert (for: 48h → 32d to match monthly sidecar schedule)
- Bump versions: rybbit v1.0.21→v1.1.0, wealthfolio v1.1.0→v3.2,
  networking-toolbox 1.1.1→1.6.0, cyberchef v10.24.0→v9.55.0
- MySQL standalone storage_limit 30Gi → 50Gi
- beads-server: fix Dolt workbench type casing, remove Authentik on GraphQL endpoint

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 05:51:52 +00:00
Viktor Barzin
d31bbc9a18 docs: update monitoring and backup docs for external monitors and per-db backups
- CLAUDE.md: document external monitoring (ExternalAccessDivergence alert,
  external-monitor-sync CronJob) and per-database backup/restore paths
- backup-dr.md: add per-db backup CronJobs to inventory table and daily
  timeline, update restore runbook references
- monitoring.md: add External Monitor Sync component and external monitoring
  architecture section

[ci skip]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 06:37:07 +00:00
Viktor Barzin
1c300a14cf mailserver: overhaul inbound delivery, monitoring, CrowdSec, and migrate to Brevo relay
Inbound:
- Direct MX to mail.viktorbarzin.me (ForwardEmail relay attempted and abandoned)
- Dedicated MetalLB IP 10.0.20.202 with ETP: Local for CrowdSec real-IP detection
- Removed Cloudflare Email Routing (can't store-and-forward)
- Fixed dual SPF violation, hardened to -all
- Added MTA-STS, TLSRPT, imported Rspamd DKIM into Terraform
- Removed dead BIND zones from config.tfvars (199 lines)

Outbound:
- Migrated from Mailgun (100/day) to Brevo (300/day free)
- Added Brevo DKIM CNAMEs and verification TXT

Monitoring:
- Probe frequency: 30m → 20m, alert thresholds adjusted to 60m
- Enabled Dovecot exporter scraping (port 9166)
- Added external SMTP monitor on public IP

Documentation:
- New docs/architecture/mailserver.md with full architecture
- New docs/architecture/mailserver-visual.html visualization
- Updated monitoring.md, CLAUDE.md, historical plan docs
2026-04-12 22:24:38 +01:00
Viktor Barzin
78dec8f0ad add e2e email roundtrip monitoring
CronJob (every 30 min) sends test email via Mailgun API to
smoke-test@viktorbarzin.me, verifies IMAP delivery in spam@ catch-all,
deletes test email, pushes metrics to Pushgateway + Uptime Kuma.

Prometheus alerts: EmailRoundtripFailing, EmailRoundtripStale,
EmailRoundtripNeverRun. Uptime Kuma: SMTP/IMAP port checks + E2E push.
2026-03-25 22:50:22 +02:00
Viktor Barzin
5a42643176 add architecture documentation for all infrastructure subsystems [ci skip]
14 docs covering networking, VPN, storage, authentication, security,
monitoring, secrets, CI/CD, backup/DR, compute, databases, and
multi-tenancy. Each doc includes Mermaid diagrams, component tables,
configuration references, decision rationale, and troubleshooting.
2026-03-24 00:55:25 +02:00