Commit graph

10 commits

Author SHA1 Message Date
5258f09230 mailserver: decommission SendGrid
Remove leftover SendGrid references after the Brevo migration was completed:

- Delete TF `cloudflare_record.mail_domainkey` (TXT at `s1._domainkey`,
  SendGrid-era DKIM, hidden behind the SendGrid CNAME but would re-emerge
  once the CNAME is removed).
- Clean up commented-out `smtp.sendgrid.net` relayhost references and the
  `# For sendgrid` comment on `sasl_passwd` in the mailserver module.

DNS records deleted out-of-band (not TF-managed):
- CF: `s1._domainkey CNAME` + `s2._domainkey CNAME` → sendgrid.net (manual entries)
- Technitium internal `viktorbarzin.me`: `em7107`, `s1._domainkey`,
  `s2._domainkey` CNAMEs → sendgrid.net

Verified end-to-end mail flow unaffected (Brevo outbound + IMAP receive,
roundtrip 20.4s — identical to baseline). Active DKIM (`mail._domainkey`
local + `brevo1/brevo2._domainkey` Brevo) untouched.
2026-05-22 20:08:38 +00:00
8ff74bb422 cloudflare: disable AI bot edge-block so x402 can issue payment offers
CF zone was returning 403 to declared AI-bot UAs at the edge
(`ai_bots_protection: "block"`). That meant the in-cluster x402
gateway never saw the request and could never issue an HTTP 402 with
the wallet payment requirements — the bot just bounced.

Adopt `cloudflare_bot_management.zone` via root-module import block,
flip ai_bots_protection to "disabled". Bot Fight Mode (`fight_mode`),
crawler challenge (`crawler_protection`), and managed robots.txt are
unaffected — generic automated traffic still gets the bot fight gate.

End-to-end verified: a request to viktorbarzin.me with
`User-Agent: Mozilla/5.0 (compatible; ClaudeBot/1.0;...)` now returns
HTTP 402 (was a 403 CF block) with `payTo=0xCc33...659f`,
`amount=10000` micro-USDC, `network=base`.

Trade-off: bots that don't pay now reach origin (instead of being
rejected at the CF edge), so expect a small bandwidth uptick. Negligible
at our traffic level.
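
The before/after behaviour described above can be sketched as a small probe. This is a minimal illustration, not the gateway's code: `probe_status` and `interpret` are hypothetical helper names, and the mapping of status codes to stages is taken only from this message (403 from the CF edge, 402 from the x402 gateway).

```python
import urllib.error
import urllib.request


def probe_status(url: str, user_agent: str) -> int:
    """Return the HTTP status code seen when fetching `url` with `user_agent`."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the status code is still the answer we want.
        return err.code


def interpret(status: int) -> str:
    """Map a status code to the stage that answered, as described in this commit."""
    if status == 403:
        return "edge-block"      # CF rejected the bot; the gateway never saw it
    if status == 402:
        return "payment-offer"   # request reached the x402 gateway
    return "other"               # anything else needs manual inspection
```

Calling `interpret(probe_status(url, ua))` against the zone with a declared AI-bot User-Agent would be expected to yield `payment-offer` after this change and `edge-block` before it.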
2026-05-10 18:37:29 +00:00
Viktor Barzin
8f5e131572 [mailserver] Route DMARC rua/ruf to dmarc@viktorbarzin.me [ci skip]
## Context

Mailgun was decommissioned on 2026-04-12 in favour of Brevo as the outbound
SMTP relay. The DMARC aggregate (`rua`) and forensic (`ruf`) report targets
still pointed at `e21c0ff8@dmarc.mailgun.org`, an inbox that no longer
exists — meaning every DMARC report Google/Microsoft/etc. generate has
been bouncing or silently dropped for six days. No alerts fire on this
(DMARC reports are best-effort, not RFC-mandated), but we've lost visibility
into alignment failures and spoofing attempts during the exact window where
the SPF/DKIM/DMARC posture was being reshaped for the Brevo cutover.

Decision (2026-04-18): route reports to `mailto:dmarc@viktorbarzin.me`.
The mailserver's catch-all sieve delivers anything addressed to a
non-existent local-part into `spam@`, so `dmarc@` does not need to be
provisioned as a real mailbox; the reports will land in `spam@`'s
maildir unchanged.

Alternative considered: route to a dedicated `dmarc@` maildir with sieve
rules to file into a folder. Rejected for now: DMARC reports arrive at low
frequency (one aggregate per reporter per day at most), so the catch-all
path is good enough until volume justifies a proper parser. Can be
revisited once we see actual report traffic.

The third-party aggregator target `adb84997@inbox.ondmarc.com` (Red Sift
OnDMARC) is preserved in both rua and ruf — it provides parsed dashboards
that we actually read. The `postmaster@viktorbarzin.me` ruf-only target
also stays as a local mirror.

As a side effect, this apply also canonicalises the TXT record: the
previous value was stored as a two-string split in Cloudflare state
(`...viktorbarzin" ".me;"`) due to the 255-byte TXT string limit
(the record length exceeded 255 chars). The new value is shorter
(dmarc@viktorbarzin.me is 21 chars vs e21c0ff8@dmarc.mailgun.org's
26 chars, doubled across rua and ruf) and fits in a single string,
so the provider serialises it as one string and the prior split-drift
noise disappears from future plans.
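
The length argument above can be checked with a short sketch. It assumes the old record is the new one with the Mailgun address substituted into rua and ruf (the plan output elides part of the value), and that long values are cut at fixed 255-byte boundaries; `split_txt` and `dmarc_record` are illustrative helpers, not part of the stack.

```python
def split_txt(content: str, limit: int = 255) -> list[str]:
    """Split TXT content into character-strings of at most `limit` bytes."""
    data = content.encode("ascii")
    return [data[i:i + limit].decode("ascii") for i in range(0, len(data), limit)]


# Tags that this change leaves untouched, in the order dig reports them.
TAGS = ("v=DMARC1; p=quarantine; pct=100; fo=1; ri=3600; "
        "sp=quarantine; adkim=r; aspf=r; ")


def dmarc_record(report_addr: str) -> str:
    """Build the record with `report_addr` as the first rua/ruf target."""
    local = f"mailto:{report_addr}"
    ondmarc = "mailto:adb84997@inbox.ondmarc.com"
    return (TAGS
            + f"rua={local},{ondmarc}; "
            + f"ruf={local},{ondmarc},mailto:postmaster@viktorbarzin.me;")


OLD = dmarc_record("e21c0ff8@dmarc.mailgun.org")  # assumed reconstruction
NEW = dmarc_record("dmarc@viktorbarzin.me")

for name, record in (("old", OLD), ("new", NEW)):
    print(name, len(record), [len(chunk) for chunk in split_txt(record)])
```

Under these assumptions the old value is 259 bytes and splits into 255 + 4, with `.me;` as the second string, which matches the split described above; the new value is 249 bytes and fits in one string.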

## This change

Single-line content edit on `cloudflare_record.mail_dmarc` in
`stacks/cloudflared/modules/cloudflared/cloudflare.tf`:

Before → After (rua and ruf, both):
```
mailto:e21c0ff8@dmarc.mailgun.org  →  mailto:dmarc@viktorbarzin.me
```

All other DMARC tags unchanged: `v=DMARC1`, `p=quarantine`, `pct=100`,
`fo=1`, `ri=3600`, `sp=quarantine`, `adkim=r`, `aspf=r`.
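
One way to confirm that only rua and ruf differ is to parse both values into tag maps and compare them. The sketch below is illustrative: `parse_dmarc` and `changed_tags` are hypothetical helpers, and `BEFORE` is an assumed reconstruction of the old value from the tags listed here.

```python
def parse_dmarc(record: str) -> dict[str, str]:
    """Parse 'tag=value; tag=value;' DMARC syntax into a dict."""
    tags = {}
    for part in record.split(";"):
        part = part.strip()
        if not part:
            continue  # skip the empty piece after the final ';'
        key, _, value = part.partition("=")
        tags[key.strip()] = value.strip()
    return tags


def changed_tags(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Return the sorted tag names whose values differ between two records."""
    return sorted(k for k in before.keys() | after.keys()
                  if before.get(k) != after.get(k))


COMMON = ("v=DMARC1; p=quarantine; pct=100; fo=1; ri=3600; "
          "sp=quarantine; adkim=r; aspf=r; ")
ONDMARC = "mailto:adb84997@inbox.ondmarc.com"
POSTMASTER = "mailto:postmaster@viktorbarzin.me"

BEFORE = (COMMON + f"rua=mailto:e21c0ff8@dmarc.mailgun.org,{ONDMARC}; "
          f"ruf=mailto:e21c0ff8@dmarc.mailgun.org,{ONDMARC},{POSTMASTER};")
AFTER = (COMMON + f"rua=mailto:dmarc@viktorbarzin.me,{ONDMARC}; "
         f"ruf=mailto:dmarc@viktorbarzin.me,{ONDMARC},{POSTMASTER};")

print(changed_tags(parse_dmarc(BEFORE), parse_dmarc(AFTER)))
```

The same comparison could be run against two captured `dig +short` outputs after stripping the surrounding quotes.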

Delivery flow:
```
DMARC reporter (Gmail/Outlook/...)
      │ aggregate XML.gz to rua / forensic to ruf
      ▼
dmarc@viktorbarzin.me
      │ mailserver catch-all (no local recipient)
      ▼
spam@viktorbarzin.me (Viki's mailbox)
```

## What is NOT in this change

- **Mailbox sieve rules** to file DMARC reports into a dedicated folder
  (separate concern; deferred until traffic justifies it).
- **DMARC parser / dashboard**. OnDMARC (adb84997@inbox.ondmarc.com)
  already provides this for aggregate reports.
- **Policy tightening** (`p=reject`, `pct` ramp) — out of scope.
- **SPF / DKIM records** — not touched.
- **Removal of the split-string drift suppression**, if any existed in
  prior work. The canonicalisation happens naturally on this apply;
  no separate workaround was needed.

## Test Plan

### Automated

Targeted terragrunt plan + apply via `scripts/tg`:

```
$ cd stacks/cloudflared && scripts/tg plan \
    -target=module.cloudflared.cloudflare_record.mail_dmarc
...
Terraform will perform the following actions:
  # module.cloudflared.cloudflare_record.mail_dmarc will be updated in-place
  ~ resource "cloudflare_record" "mail_dmarc" {
      ~ content = "\"v=DMARC1; ...
                    rua=mailto:e21c0ff8@dmarc.mailgun.org,
                        mailto:adb84997@inbox.ondmarc.com; ...
                    ruf=mailto:e21c0ff8@dmarc.mailgun.org,
                        mailto:adb84997@inbox.ondmarc.com,
                        mailto:postmaster@viktorbarzin\" \".me;\""
                -> "\"v=DMARC1; ...
                    rua=mailto:dmarc@viktorbarzin.me,
                        mailto:adb84997@inbox.ondmarc.com; ...
                    ruf=mailto:dmarc@viktorbarzin.me,
                        mailto:adb84997@inbox.ondmarc.com,
                        mailto:postmaster@viktorbarzin.me;\""
    }
Plan: 0 to add, 1 to change, 0 to destroy.

$ scripts/tg apply /tmp/dmarc.tfplan
module.cloudflared.cloudflare_record.mail_dmarc: Modifying...
module.cloudflared.cloudflare_record.mail_dmarc: Modifications complete after 1s
Apply complete! Resources: 0 added, 1 changed, 0 destroyed.
```

Authoritative DNS post-apply:

```
$ dig TXT _dmarc.viktorbarzin.me @evan.ns.cloudflare.com +short
"v=DMARC1; p=quarantine; pct=100; fo=1; ri=3600; sp=quarantine; adkim=r; aspf=r; rua=mailto:dmarc@viktorbarzin.me,mailto:adb84997@inbox.ondmarc.com; ruf=mailto:dmarc@viktorbarzin.me,mailto:adb84997@inbox.ondmarc.com,mailto:postmaster@viktorbarzin.me;"
```

Note: `dig @1.1.1.1` still served the old value immediately after apply —
Cloudflare's public resolver holds its cache until TTL expires
(TTL=1/auto ≈ 5 min). Authoritative NS is the source of truth.

### Manual Verification

**Setup**: none (DNS change only).

**Commands**:
```
# 1. Confirm authoritative DNS (run now, should pass)
dig TXT _dmarc.viktorbarzin.me @evan.ns.cloudflare.com +short
# Expected: rua=mailto:dmarc@viktorbarzin.me,... and ruf similarly.

# 2. Confirm public resolver catches up (run after ~5min)
dig TXT _dmarc.viktorbarzin.me @1.1.1.1 +short
# Expected: same as above (no more mailgun.org entries).

# 3. Within 24-48h, check Viki's spam@ inbox for an incoming DMARC
#    aggregate report from Google/Microsoft/etc. Reports are
#    typically .zip or .gz attachments with XML inside.
```

**Interpretation**: seeing a DMARC report land in spam@ proves the
end-to-end delivery path works: reporter DNS lookup → _dmarc.viktorbarzin.me
→ mailto:dmarc@viktorbarzin.me → catch-all → spam@ maildir.

## Reproduce locally

```
git pull
cd stacks/cloudflared
dig TXT _dmarc.viktorbarzin.me @evan.ns.cloudflare.com +short
# Expected: rua=mailto:dmarc@viktorbarzin.me (and ruf the same).
```

Closes: code-569

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 23:49:14 +00:00
Viktor Barzin
b9e9c3f084 [mailserver] Update SPF + docs for Brevo migration [ci skip]
## Context

Outbound mail relay migrated from Mailgun EU to Brevo EU on 2026-04-12 when
variables.tf:6 of the mailserver stack was switched to `smtp-relay.brevo.com:587`.
Postfix immediately began using Brevo for user mail — but the SPF TXT record
at viktorbarzin.me was left pointing at `include:mailgun.org -all`, so every
Brevo-relayed message failed SPF alignment and was spam-foldered or
DMARC-quarantined by Gmail/Outlook.

Observed on 2026-04-18 via `dig TXT viktorbarzin.me @1.1.1.1`:

    "v=spf1 include:mailgun.org -all"  <-- wrong sender network

User decision (2026-04-18): switch to `v=spf1 include:spf.brevo.com ~all`.
Soft-fail (`~all`) is intentional during cutover: mail from sources not
covered by the record is soft-failed (typically quarantined) rather than
outright rejected while we validate Brevo's sending IPs + rate limits for
real user mail. Tighten to `-all` once the relay is proven stable.
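
The difference between the two records comes down to the `include:` target and the qualifier on the trailing `all`. A minimal sketch of reading those two pieces out of a record follows; `parse_spf` is a hypothetical helper that covers only the mechanisms used here, not full SPF evaluation.

```python
# SPF qualifier prefixes and the result each one produces on a match.
QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def parse_spf(record: str) -> dict:
    """Extract include: domains and the result of the `all` mechanism."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        raise ValueError("not an SPF record")
    includes = []
    all_result = None
    for term in terms[1:]:
        qualifier = "+"  # a mechanism without a prefix defaults to pass
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        if term.lower().startswith("include:"):
            includes.append(term[len("include:"):])
        elif term.lower() == "all":
            all_result = QUALIFIERS[qualifier]
    return {"includes": includes, "all": all_result}


print(parse_spf("v=spf1 include:mailgun.org -all"))
print(parse_spf("v=spf1 include:spf.brevo.com ~all"))
```

The first call reports `fail` for unmatched senders and the second `softfail`, which is the cutover posture described above.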

The docs in `docs/architecture/mailserver.md` still described the old
Mailgun-based configuration (Overview paragraph, DNS table, Vault secrets
table). Per `infra/.claude/CLAUDE.md` rule "Update docs with every change",
those are updated in the same commit.

## This change

Coupled commit covering beads tasks code-q8p (SPF) + code-9pe (docs):

1. `stacks/cloudflared/modules/cloudflared/cloudflare.tf` — SPF TXT content
   flipped from `include:mailgun.org -all` to `include:spf.brevo.com ~all`,
   with an inline comment pointing at the mailserver docs for rationale.
2. `docs/architecture/mailserver.md` —
   - Last-updated stamp moved to 2026-04-18 with the cutover note.
   - Overview paragraph now says "relays through Brevo EU" (was Mailgun).
   - DNS table SPF row reflects the new value plus an annotated history
     note ("was include:mailgun.org -all until 2026-04-18").
   - DMARC row now calls out the intended `dmarc@viktorbarzin.me` rua
     target and flags that the current live record still points at
     e21c0ff8@dmarc.mailgun.org, tracked under follow-up code-569.
   - Vault secrets table: `mailserver_sasl_passwd` relabelled as Brevo
     relay credentials; `mailgun_api_key` annotated as retained for the
     E2E roundtrip probe only (inbound delivery testing, not user mail).

Apply was scoped with `-target=module.cloudflared.cloudflare_record.mail_spf`
to avoid sweeping up two unrelated pre-existing drifts that the Terraform
state shows on this stack: the `mail_dmarc` and `mail_domainkey_rspamd`
records are stored on Cloudflare as RFC-compliant split TXT strings
(>255 bytes), and a naive refresh+apply would normalize them in the state
back to single strings. Those drifts are semantically equivalent (the
strings of a TXT record are concatenated in order by whoever consumes it)
and are out of scope for this commit; they'll be handled under their own
ticket.
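
The equivalence of the split and single-string forms can be illustrated by joining the quoted strings of a `dig +short` line before comparing. This is a sketch only: `join_txt_strings` is a hypothetical helper, the sample values are shortened fragments, and dig's `\DDD` escapes are not handled.

```python
import shlex


def join_txt_strings(dig_line: str) -> str:
    """Concatenate the quoted character-strings of one `dig +short` TXT line."""
    # shlex honours the double quotes, so each quoted string becomes one token.
    return "".join(shlex.split(dig_line))


# Shortened, illustrative fragments of a split and an unsplit value.
split_form = '"ruf=mailto:postmaster@viktorbarzin" ".me;"'
single_form = '"ruf=mailto:postmaster@viktorbarzin.me;"'

print(join_txt_strings(split_form) == join_txt_strings(single_form))
```

Comparing the joined values rather than the raw lines avoids treating a change in string boundaries as a change in content.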

## What is NOT in this change

- DMARC `rua=mailto:dmarc@viktorbarzin.me` cutover — that's code-569 (M1),
  still using the legacy `e21c0ff8@dmarc.mailgun.org` + ondmarc addresses
  in the live record.
- DMARC/DKIM TXT multi-string state reconciliation on `mail_dmarc` and
  `mail_domainkey_rspamd` — pre-existing Cloudflare representation drift,
  untouched here.
- Removal of Mailgun references in history/decision sections of the docs,
  or the Mailgun-backed E2E roundtrip probe — probe still uses Mailgun API
  on purpose for inbound delivery testing (code-569 scope).
- Mailgun DKIM record `s1._domainkey` — left in place; still consumed by
  the roundtrip probe.
- Other pending items from the 2026-04-18 mail audit plan.

## Test Plan

### Automated

Targeted plan showed exactly one change, no other drift sneaking in:

    module.cloudflared.cloudflare_record.mail_spf will be updated in-place
      ~ content = "\"v=spf1 include:mailgun.org -all\""
             -> "\"v=spf1 include:spf.brevo.com ~all\""
    Plan: 0 to add, 1 to change, 0 to destroy.

Apply result:

    Apply complete! Resources: 0 added, 1 changed, 0 destroyed.

DNS propagation verified on three independent resolvers immediately after
apply:

    $ dig TXT viktorbarzin.me @1.1.1.1 +short | grep spf
    "v=spf1 include:spf.brevo.com ~all"

    $ dig TXT viktorbarzin.me @8.8.8.8 +short | grep spf
    "v=spf1 include:spf.brevo.com ~all"

    $ dig TXT viktorbarzin.me @10.0.20.201 +short | grep spf   # Technitium primary
    "v=spf1 include:spf.brevo.com ~all"

### Manual Verification

Setup: nothing extra — change is already live (TF applied before commit
per home-lab convention; `[ci skip]` in title).

1. Confirm SPF is the Brevo-only record from an external resolver:

       dig TXT viktorbarzin.me @1.1.1.1 +short

   Expected: `"v=spf1 include:spf.brevo.com ~all"` — no Mailgun reference.

2. Send a test email via the mailserver (through Brevo relay) to a Gmail
   account and view the original headers:

       Authentication-Results: ... spf=pass smtp.mailfrom=viktorbarzin.me
       ...
       Received-SPF: Pass (google.com: domain of ... designates ... as
       permitted sender)

   Expected: `spf=pass` (it was `spf=fail` or `spf=softfail` before this
   change because the connecting relay IP was a Brevo IP not covered by
   `include:mailgun.org`).

3. Confirm no live Mailgun references in the mailserver doc:

       grep -n mailgun.org infra/docs/architecture/mailserver.md

   Expected: only annotated-history mentions — SPF "was ... until
   2026-04-18" and DMARC "current live record still points at
   e21c0ff8@dmarc.mailgun.org pending cutover". No claims of active
   Mailgun relay.
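
The header check in step 2 can also be scripted once the raw message is saved. A minimal sketch, assuming the `Authentication-Results` value has already been extracted as a string; `auth_results` is a hypothetical helper and `sample` is an invented header value shaped like the excerpt above, not a captured one.

```python
import re


def auth_results(header_value: str) -> dict[str, str]:
    """Return the spf/dkim/dmarc verdicts found in an Authentication-Results value."""
    return {
        method.lower(): result.lower()
        for method, result in re.findall(
            r"\b(spf|dkim|dmarc)=([a-z]+)", header_value, flags=re.IGNORECASE
        )
    }


sample = (  # hypothetical value for illustration
    "mx.google.com; spf=pass smtp.mailfrom=viktorbarzin.me; "
    "dkim=pass header.i=@viktorbarzin.me; dmarc=pass header.from=viktorbarzin.me"
)
print(auth_results(sample))
```

A result other than `pass` under the `spf` key would indicate that the sending IP is still not covered by the published record.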

## Reproduce locally

    cd infra
    git pull
    dig TXT viktorbarzin.me @1.1.1.1 +short | grep spf
    # expected: "v=spf1 include:spf.brevo.com ~all"

    # inspect the TF change:
    git show HEAD -- stacks/cloudflared/modules/cloudflared/cloudflare.tf

    # inspect the doc change:
    git show HEAD -- docs/architecture/mailserver.md

Closes: code-q8p
Closes: code-9pe

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 23:13:47 +00:00
Viktor Barzin
b1d152be1f [infra] Auto-create Cloudflare DNS records from ingress_factory
## Context

Deploying new services required manually adding hostnames to
cloudflare_proxied_names/cloudflare_non_proxied_names in config.tfvars —
a separate file from the service stack. This was frequently forgotten,
leaving services unreachable externally.

## This change:

- Add `dns_type` parameter to `ingress_factory` and `reverse_proxy/factory`
  modules. Setting `dns_type = "proxied"` or `"non-proxied"` auto-creates
  the Cloudflare DNS record (CNAME to tunnel or A/AAAA to public IP).
- Simplify cloudflared tunnel from 100 per-hostname rules to wildcard
  `*.viktorbarzin.me → Traefik`. Traefik still handles host-based routing.
- Add global Cloudflare provider via terragrunt.hcl (separate
  cloudflare_provider.tf with Vault-sourced API key).
- Migrate 118 hostnames from centralized config.tfvars to per-service
  dns_type. 17 hostnames remain centrally managed (Helm ingresses,
  special cases).
- Update docs, AGENTS.md, CLAUDE.md, dns.md runbook.

```
BEFORE                          AFTER
config.tfvars (manual list)     stacks/<svc>/main.tf
        |                         module "ingress" {
        v                           dns_type = "proxied"
stacks/cloudflared/               }
  for_each = list                     |
  cloudflare_record               auto-creates
  tunnel per-hostname             cloudflare_record + annotation
```
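
The mapping from `dns_type` to records can be modelled as below. This is a Python model of the behaviour described in this message, not the Terraform module itself: the function name, the record dict shape, the tunnel hostname and the documentation-range IPs are all assumptions, as is treating an unset `dns_type` as "create nothing".

```python
from typing import Optional


def dns_records(hostname: str, dns_type: Optional[str], tunnel_cname: str,
                public_ipv4: str, public_ipv6: str) -> list[dict]:
    """Return the DNS records a service would get for a given dns_type."""
    if dns_type is None:
        return []  # assumption: no dns_type means no auto-created record
    if dns_type == "proxied":
        # Proxied services point at the tunnel and sit behind Cloudflare.
        return [{"name": hostname, "type": "CNAME",
                 "content": tunnel_cname, "proxied": True}]
    if dns_type == "non-proxied":
        # Non-proxied services resolve straight to the public addresses.
        return [
            {"name": hostname, "type": "A", "content": public_ipv4, "proxied": False},
            {"name": hostname, "type": "AAAA", "content": public_ipv6, "proxied": False},
        ]
    raise ValueError(f"unknown dns_type: {dns_type!r}")


for kind in ("proxied", "non-proxied"):
    print(kind, dns_records("app.viktorbarzin.me", kind,
                            "tunnel.example.net", "203.0.113.10", "2001:db8::10"))
```

Keeping the choice next to the service definition is what removes the separate config.tfvars step that used to be forgotten.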

## What is NOT in this change:

- Uptime Kuma monitor migration (still reads from config.tfvars)
- 17 remaining centrally-managed hostnames (Helm, special cases)
- Removal of allow_overwrite (keep until migration confirmed stable)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 13:45:04 +00:00
Viktor Barzin
1c300a14cf mailserver: overhaul inbound delivery, monitoring, CrowdSec, and migrate to Brevo relay
Inbound:
- Direct MX to mail.viktorbarzin.me (ForwardEmail relay attempted and abandoned)
- Dedicated MetalLB IP 10.0.20.202 with ETP: Local for CrowdSec real-IP detection
- Removed Cloudflare Email Routing (can't store-and-forward)
- Fixed dual SPF violation, hardened to -all
- Added MTA-STS, TLSRPT, imported Rspamd DKIM into Terraform
- Removed dead BIND zones from config.tfvars (199 lines)

Outbound:
- Migrated from Mailgun (100/day) to Brevo (300/day free)
- Added Brevo DKIM CNAMEs and verification TXT

Monitoring:
- Probe frequency: 30m → 20m, alert thresholds adjusted to 60m
- Enabled Dovecot exporter scraping (port 9166)
- Added external SMTP monitor on public IP

Documentation:
- New docs/architecture/mailserver.md with full architecture
- New docs/architecture/mailserver-visual.html visualization
- Updated monitoring.md, CLAUDE.md, historical plan docs
2026-04-12 22:24:38 +01:00
Viktor Barzin
82b0f6c4cb truenas deprecation: migrate all non-immich storage to proxmox NFS
- Migrate 7 backup CronJobs to Proxmox host NFS (192.168.1.127)
  (etcd, mysql, postgresql, nextcloud, redis, vaultwarden, plotting-book)
- Migrate headscale backup, ebook2audiobook, osm_routing to Proxmox NFS
- Migrate servarr (lidarr, readarr, soulseek) NFS refs to Proxmox
- Remove 79 orphaned TrueNAS NFS module declarations from 49 stacks
- Delete stacks/platform/modules/ (27 dead module copies, 65MB)
- Update nfs-truenas StorageClass to point to Proxmox (192.168.1.127)
- Remove iscsi DNS record from config.tfvars
- Fix woodpecker persistence config and alertmanager PV

Only Immich (8 PVCs, ~1.4TB) remains on TrueNAS.
2026-04-12 14:35:39 +01:00
Viktor Barzin
c49e4561a3 consolidate MetalLB IPs: 5 → 1 (10.0.20.200)
Migrate all 11 LoadBalancer services to share 10.0.20.200:
- Update annotations: metallb.universe.tf → metallb.io
- Pin all services to 10.0.20.200 with allow-shared-ip: shared
- Standardize externalTrafficPolicy to Cluster (required for IP sharing)
- Remove redundant port 80 (roundcube) from mailserver LB
- Update CoreDNS forward: 10.0.20.204 → 10.0.20.200
- Update cloudflared tunnel target: 10.0.20.202 → 10.0.20.200

Services consolidated: coturn, headscale, kms, qbittorrent, shadowsocks,
torrserver, wireguard, mailserver, traefik, xray, technitium
2026-03-24 18:35:43 +02:00
Viktor Barzin
644562454c add IPv6 connectivity via Hurricane Electric 6in4 tunnel
- Add public_ipv6 variable and AAAA records for all 34 non-proxied services
- Fix stale DNS records (85.130.108.6 → 176.12.22.76, old IPv6 → HE tunnel)
- Update SPF record with current IPv4/IPv6 addresses
- Add AAAA update support to Technitium DNS updater CLI
- Pin mailserver MetalLB IP to 10.0.20.201 for stable pfSense NAT
- pfSense: HE_IPv6 interface, strict firewall (80,443,25,465,587,993 + ICMPv6),
  socat IPv6→IPv4 proxy, removed dangerous "Allow all DEBUG" rules
2026-03-23 02:22:00 +02:00
Viktor Barzin
ae36dc253b extract monitoring, nvidia, mailserver, cloudflared, kyverno from platform [ci skip]
Phase 2 of platform stack split. 5 more modules extracted into
independent stacks. All applied successfully with zero destroys.
Cloudflared now reads k8s_users from Vault directly to compute
user_domains. Woodpecker pipeline runs all 8 extracted stacks
in parallel. Memory bumped to 6Gi for 9 concurrent TF processes.
Platform reduced from 27 to 19 modules.
2026-03-17 21:34:11 +00:00