[mailserver] Add targeted retention for spam@ mailbox

## Context

The @viktorbarzin.me catch-all routes to spam@viktorbarzin.me. The
mailbox had no retention policy. On 2026-04-18 it held 519 messages
consuming 43 MiB. Without a policy, the only brake on growth was
manual deletion, which has not been happening - hence the bd task.

Viktor's explicit constraint when filing code-oy4: DO NOT blind
age-expunge. We need targeted retention that keeps genuine forwarded
human mail for a long time while shedding the recurring-newsletter
cruft that dominates the byte count.

## Profile findings (2026-04-18, verified on the live pod)

Total: 519 messages, 43 MiB, 0 in new/, 0 in tmp/.

Top senders by volume:
   138  dan@tldrnewsletter.com
    51  hi@ratepunk.com
    40  uber@uber.com
    35  truenas@viktorbarzin.me
    19  ubereats@uber.com
    15  hello@travel.jacksflightclub.com
    12  chris@chriswillx.com
    10  me@viktorbarzin.me

Top senders by storage bytes:
   8,176,481  dan@tldrnewsletter.com  (19 % of 43 MiB alone)
   2,866,104  uber@uber.com
   2,207,458  noreply@mail.selfh.st
   2,066,094  hi@ratepunk.com
   1,675,435  ubereats@uber.com

Age distribution:
    97 %  older than 14 days (502 / 519)
    23 %  older than 90 days (121 / 519)

Automated-sender markers:
    66 %  carry List-Unsubscribe:                   (342 / 519)
     4 %  carry Precedence: bulk|list|junk          ( 21 / 519)
    34 %  carry neither marker (= human-ish tail)   (177 / 519)

Combined "automated AND >14d": 328 messages -> target of rule 1.

## Retention strategy

Signed off by Viktor 2026-04-18. Two rules, both delete-leaf:

  1. Older than 14 days AND header matches one of:
       - `^List-Unsubscribe:`
       - `^Precedence:\s*(bulk|list|junk)`
       - `^Auto-Submitted:\s*auto-`
     -> DELETE.
     Rationale: these markers are the RFC-agreed indicators of bulk /
     robotic senders. A 14-day window still lets genuine subscription
     alerts (delivery, flight, calendar invite) come to attention.

  2. Older than 90 days AND no automated marker at all
     -> DELETE.
     Rationale: these are long-tail forwards from real people to the
     catch-all. 90 days is deliberately generous - I would rather
     leak bytes than lose Viktor's personal correspondence.

  3. Everything else -> KEEP (recent traffic, or aged human tail
     younger than 90d).

## Implementation

A `kubernetes_cron_job_v1.spam_retention` running every 4h (at :17
past) that `kubectl exec`s a Python retention script into the
mailserver pod.

Why kubectl exec and not a sibling CronJob with the Maildir mounted:
mailserver-data-encrypted is a RWO volume held by the mailserver
pod. A sibling would fail to attach. The nextcloud-watchdog pattern
in stacks/nextcloud/main.tf already solves this for a similar
"interact with the live pod on a schedule" shape. Mirrored here with
its own SA + Role + RoleBinding scoped to list/get pods and create
pods/exec in the mailserver namespace only.

Why Python and not pure shell: POSIX `find + stat + awk` struggles
with the header-scan-up-to-blank-line rule, and `stat -c` is Linux-
GNU-specific anyway. The script reads each message's first 64 KiB,
stops at the first blank line, scans headers only, then checks mtime.

The CronJob streams the Python source via `kubectl exec -i ... --
python3 - <<PYEOF`. After the retention pass, `doveadm force-resync
-u spam@viktorbarzin.me INBOX/spam` refreshes Dovecot's cached index
so the deletions appear in IMAP immediately instead of after the next
pod restart.

Includes the standard KYVERNO_LIFECYCLE_V1 marker on the CronJob so
Kyverno ndots mutation does not cause perpetual drift.

## What is NOT in this change

- Dovecot sieve rules (no sieve infrastructure exists in the module;
  the plan file's fallback option was precisely this CronJob path).
- Push of retention metrics to Pushgateway - the script prints them
  to the job log for now; plumbing Pushgateway is a follow-up if
  Viktor wants alerts.
- Any touch of other mailboxes - only `/var/mail/viktorbarzin.me/spam/cur`
  is walked.
- Any mailserver pod restart or config reload.

## Test plan

### Automated

`terraform fmt` + `terragrunt hclfmt` pass. `scripts/tg plan` on the
mailserver stack shows:
  Plan: 7 to add, 3 to change, 0 to destroy.
Of the 7 adds, 4 are mine (SA + Role + RoleBinding + CronJob). The
other 3 adds belong to the concurrent roundcube-backup CronJob +
nfs_roundcube_backup_host PV + PVC already on master in parallel.
The 3 in-place updates are pre-existing drift on the mailserver
Deployment, Service and email_roundtrip_monitor CronJob, not
introduced by this change.

### Manual Verification

After `scripts/tg apply` lands the CronJob:

  1. Trigger an immediate run:
     `kubectl -n mailserver create job --from=cronjob/spam-retention manual-1`
  2. Wait for completion, read the log:
     `kubectl -n mailserver logs job/manual-1`
     -> expected tail:
        spam_retention_scanned_total <N>
        spam_retention_auto_deleted_total <M>
        spam_retention_human_deleted_total <H>
        spam_retention_kept_total <K>
        spam_retention_errors_total 0
        Retention pass complete
  3. Confirm mailbox shrunk:
     `kubectl -n mailserver exec deploy/mailserver -c docker-mailserver \
         -- du -sh /var/mail/viktorbarzin.me/spam/`
     -> expected: well below 43 MiB within one run (bulk rule alone
        purges ~328 messages per the profile numbers above).
  4. Confirm IMAP reflects the deletions:
     `kubectl -n mailserver exec deploy/mailserver -c docker-mailserver \
         -- doveadm mailbox status -u spam@viktorbarzin.me messages INBOX/spam`
     -> expected: message count dropped accordingly.
  5. 4 hours later, confirm the next scheduled run logs a much
     smaller scan count and 0 deletions (nothing new crossed the
     threshold).

Closes: code-oy4

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Viktor Barzin 2026-04-19 00:20:54 +00:00
parent 6cfc4b7836
commit 6a75ed4809

View file

@ -1133,3 +1133,222 @@ resource "kubernetes_cron_job_v1" "roundcube-backup" {
}
}
# =============================================================================
# Spam mailbox targeted retention (code-oy4)
#
# The @viktorbarzin.me catch-all routes to spam@viktorbarzin.me. Unbounded
# growth (~43 MiB baseline on 2026-04-18, 519 messages, top sender
# tldrnewsletter.com = 138 msgs / 8.2 MiB) makes it painful to triage.
# Profile (2026-04-18):
# - 502/519 messages older than 14 days (97 %)
# - 342/519 carry List-Unsubscribe: (66 %)
# - 21/519 carry Precedence: bulk ( 4 %)
# - 177/519 carry neither marker (= human-ish, 34 %)
#
# Strategy (user-signed-off 2026-04-18, do NOT blind-age-expunge):
# - Messages older than 14 days carrying List-Unsubscribe OR
# Precedence: bulk|list|junk OR Auto-Submitted: auto-* -> DELETE
# - Messages older than 90 days with no automated-sender marker
# -> DELETE (long-tail human forwards)
# - Everything else -> KEEP
#
# Implementation: kubectl exec into the mailserver pod because the
# Maildir lives on a RWO encrypted PVC; a sibling CronJob would fail to
# attach the volume while the mailserver pod holds it. Pattern mirrors
# the `nextcloud-watchdog` in stacks/nextcloud/main.tf.
# =============================================================================
resource "kubernetes_service_account" "spam_retention" {
metadata {
name = "spam-retention"
namespace = kubernetes_namespace.mailserver.metadata[0].name
}
}
resource "kubernetes_role" "spam_retention" {
metadata {
name = "spam-retention"
namespace = kubernetes_namespace.mailserver.metadata[0].name
}
rule {
api_groups = [""]
resources = ["pods"]
verbs = ["list", "get"]
}
rule {
api_groups = [""]
resources = ["pods/exec"]
verbs = ["create"]
}
}
resource "kubernetes_role_binding" "spam_retention" {
metadata {
name = "spam-retention"
namespace = kubernetes_namespace.mailserver.metadata[0].name
}
role_ref {
api_group = "rbac.authorization.k8s.io"
kind = "Role"
name = kubernetes_role.spam_retention.metadata[0].name
}
subject {
kind = "ServiceAccount"
name = kubernetes_service_account.spam_retention.metadata[0].name
namespace = kubernetes_namespace.mailserver.metadata[0].name
}
}
resource "kubernetes_cron_job_v1" "spam_retention" {
metadata {
name = "spam-retention"
namespace = kubernetes_namespace.mailserver.metadata[0].name
}
spec {
schedule = "17 */4 * * *"
concurrency_policy = "Forbid"
successful_jobs_history_limit = 2
failed_jobs_history_limit = 3
starting_deadline_seconds = 300
job_template {
metadata {}
spec {
active_deadline_seconds = 600
backoff_limit = 1
ttl_seconds_after_finished = 600
template {
metadata {}
spec {
service_account_name = kubernetes_service_account.spam_retention.metadata[0].name
restart_policy = "Never"
container {
name = "spam-retention"
image = "bitnami/kubectl:latest"
command = ["/bin/bash", "-c", <<-EOF
set -euo pipefail
POD=$(kubectl -n mailserver get pods -l app=mailserver -o jsonpath='{.items[0].metadata.name}')
if [ -z "$POD" ]; then
echo "ERROR: no mailserver pod found" >&2
exit 1
fi
echo "Targeting pod $POD"
# Stream the retention script to python3 inside the mailserver
# container via stdin. Keeping the logic in Python avoids the
# POSIX-sh/awk fragility around stat(1) differences and header
# matching.
kubectl -n mailserver exec -i "$POD" -c docker-mailserver -- python3 - <<'PYEOF'
import os
import re
import sys
import time
SPAM = "/var/mail/viktorbarzin.me/spam/cur"
# Retention thresholds, in days, one per rule.
AUTOMATED_MAX_AGE_DAYS = 14
HUMAN_MAX_AGE_DAYS = 90
HEADER_SCAN_BYTES = 65536
AUTO_PATTERNS = (
re.compile(rb"^list-unsubscribe:", re.IGNORECASE),
re.compile(rb"^precedence:\s*(bulk|list|junk)", re.IGNORECASE),
re.compile(rb"^auto-submitted:\s*auto-", re.IGNORECASE),
)
def is_automated(path):
try:
with open(path, "rb") as fh:
head = fh.read(HEADER_SCAN_BYTES)
except OSError:
return False
hdr, _, _ = head.partition(b"\r\n\r\n")
if hdr == head:
hdr, _, _ = head.partition(b"\n\n")
for line in hdr.splitlines():
for pat in AUTO_PATTERNS:
if pat.search(line):
return True
return False
if not os.path.isdir(SPAM):
print(f"SKIP: {SPAM} does not exist")
sys.exit(0)
now = time.time()
scanned = auto_deleted = human_deleted = kept = errors = 0
for entry in sorted(os.listdir(SPAM)):
path = os.path.join(SPAM, entry)
try:
st = os.stat(path)
except OSError:
errors += 1
continue
if not os.path.isfile(path):
continue
scanned += 1
age_days = (now - st.st_mtime) / 86400
automated = is_automated(path)
if automated and age_days > AUTOMATED_MAX_AGE_DAYS:
try:
os.unlink(path)
auto_deleted += 1
except OSError:
errors += 1
continue
if (not automated) and age_days > HUMAN_MAX_AGE_DAYS:
try:
os.unlink(path)
human_deleted += 1
except OSError:
errors += 1
continue
kept += 1
# Metric lines (Pushgateway-compatible format). The parent
# kubectl wrapper logs them for now; Pushgateway integration
# is a follow-up.
print(f"spam_retention_scanned_total {scanned}")
print(f"spam_retention_auto_deleted_total {auto_deleted}")
print(f"spam_retention_human_deleted_total {human_deleted}")
print(f"spam_retention_kept_total {kept}")
print(f"spam_retention_errors_total {errors}")
sys.exit(1 if errors else 0)
PYEOF
# Refresh Dovecot index so IMAP sees the deletions immediately.
kubectl -n mailserver exec "$POD" -c docker-mailserver -- \
doveadm force-resync -u spam@viktorbarzin.me INBOX/spam || true
echo "Retention pass complete"
EOF
]
resources {
requests = {
cpu = "10m"
memory = "32Mi"
}
limits = {
memory = "128Mi"
}
}
}
dns_config {
option {
name = "ndots"
value = "2"
}
}
}
}
}
}
}
lifecycle {
# KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config]
}
}