Viktor Barzin 6a75ed4809 [mailserver] Add targeted retention for spam@ mailbox
## Context

The @viktorbarzin.me catch-all routes to spam@viktorbarzin.me. The
mailbox had no retention policy. On 2026-04-18 it held 519 messages
consuming 43 MiB. Without a policy, the only brake on growth was
manual deletion, which has not been happening - hence the bd task.

Viktor's explicit constraint when filing code-oy4: DO NOT blind
age-expunge. We need targeted retention that keeps genuine forwarded
human mail for a long time while shedding the recurring-newsletter
cruft that dominates the byte count.

## Profile findings (2026-04-18, verified on the live pod)

Total: 519 messages, 43 MiB, 0 in new/, 0 in tmp/.

Top senders by volume:
   138  dan@tldrnewsletter.com
    51  hi@ratepunk.com
    40  uber@uber.com
    35  truenas@viktorbarzin.me
    19  ubereats@uber.com
    15  hello@travel.jacksflightclub.com
    12  chris@chriswillx.com
    10  me@viktorbarzin.me

Top senders by storage bytes:
   8,176,481  dan@tldrnewsletter.com  (19 % of 43 MiB alone)
   2,866,104  uber@uber.com
   2,207,458  noreply@mail.selfh.st
   2,066,094  hi@ratepunk.com
   1,675,435  ubereats@uber.com

Age distribution:
    97 %  older than 14 days (502 / 519)
    23 %  older than 90 days (121 / 519)

Automated-sender markers:
    66 %  carry List-Unsubscribe:                   (342 / 519)
     4 %  carry Precedence: bulk|list|junk          ( 21 / 519)
    34 %  carry neither marker (= human-ish tail)   (177 / 519)

Combined "automated AND >14d": 328 messages -> target of rule 1.

## Retention strategy

Signed off by Viktor 2026-04-18. Two delete-leaf rules plus a
default keep:

  1. Older than 14 days AND header matches one of:
       - `^List-Unsubscribe:`
       - `^Precedence:\s*(bulk|list|junk)`
       - `^Auto-Submitted:\s*auto-`
     -> DELETE.
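     Rationale: these markers are the conventional indicators of
     bulk / robotic senders (List-Unsubscribe is defined in RFC 2369,
     Auto-Submitted in RFC 3834; Precedence is a de facto convention).
     A 14-day window still lets genuine subscription alerts
     (delivery, flight, calendar invite) come to attention.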

  2. Older than 90 days AND no automated marker at all
     -> DELETE.
     Rationale: these are long-tail forwards from real people to the
     catch-all. 90 days is deliberately generous - I would rather
     leak bytes than lose Viktor's personal correspondence.

  3. Everything else -> KEEP (recent traffic, or aged human tail
     younger than 90d).
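
The decision table above can be sketched in a few lines of Python. This
is a minimal illustration, not the script shipped in the CronJob: the
function names (`is_automated`, `decide`) and the return labels are
hypothetical, and matching the header names case-insensitively is an
assumption on top of the three patterns listed in rule 1.

```python
import re

# Patterns copied from rule 1. Case-insensitive matching is an assumption
# (mail header names are case-insensitive), not something the rules state.
AUTOMATED_MARKERS = [
    re.compile(r"^List-Unsubscribe:", re.IGNORECASE),
    re.compile(r"^Precedence:\s*(bulk|list|junk)", re.IGNORECASE),
    re.compile(r"^Auto-Submitted:\s*auto-", re.IGNORECASE),
]

AUTO_MAX_AGE_DAYS = 14   # rule 1 threshold
HUMAN_MAX_AGE_DAYS = 90  # rule 2 threshold


def is_automated(header_lines):
    """True if any header line carries one of the automated-sender markers."""
    return any(p.match(line) for p in AUTOMATED_MARKERS for line in header_lines)


def decide(header_lines, age_days):
    """Return 'delete_auto', 'delete_human' or 'keep' (hypothetical labels)."""
    if is_automated(header_lines):
        return "delete_auto" if age_days > AUTO_MAX_AGE_DAYS else "keep"
    return "delete_human" if age_days > HUMAN_MAX_AGE_DAYS else "keep"
```

Note that an automated message never falls through to rule 2: once a
marker is present, only the 14-day threshold applies.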

## Implementation

A `kubernetes_cron_job_v1.spam_retention` running every 4h (at :17
past) that `kubectl exec`s a Python retention script into the
mailserver pod.

Why kubectl exec and not a sibling CronJob with the Maildir mounted:
mailserver-data-encrypted is a RWO volume held by the mailserver
pod. A sibling would fail to attach. The nextcloud-watchdog pattern
in stacks/nextcloud/main.tf already solves this for a similar
"interact with the live pod on a schedule" shape. Mirrored here with
its own SA + Role + RoleBinding scoped to list/get pods and create
pods/exec in the mailserver namespace only.

Why Python and not pure shell: a `find + stat + awk` pipeline
struggles with the header-scan-up-to-blank-line rule, and `stat -c`
is GNU-specific rather than POSIX anyway. The script reads each
message's first 64 KiB, stops at the first blank line, scans headers
only, then checks mtime.
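
For illustration, the header scan described above might look roughly
like this. It is a sketch under assumptions, not the committed script:
the helper names (`read_header_lines`, `age_days`) are hypothetical, and
decoding as latin-1 is just a convenient way to never fail on odd bytes.

```python
import os
import time

HEADER_READ_LIMIT = 64 * 1024  # read at most the first 64 KiB of each message


def read_header_lines(path):
    """Return the header lines of one message file, stopping at the first blank line."""
    with open(path, "rb") as fh:
        chunk = fh.read(HEADER_READ_LIMIT)
    lines = []
    for raw in chunk.splitlines():  # handles both \n and \r\n line endings
        if not raw.strip():
            break  # blank line separates headers from body
        lines.append(raw.decode("latin-1"))  # latin-1 decodes any byte sequence
    return lines


def age_days(path, now=None):
    """Age of the file in days, based on its mtime."""
    now = time.time() if now is None else now
    return (now - os.stat(path).st_mtime) / 86400
```

Stopping at the blank line matters: a forwarded newsletter quoted in a
human message body would otherwise trip the `List-Unsubscribe` rule.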

The CronJob streams the Python source via `kubectl exec -i ... --
python3 - <<PYEOF`. After the retention pass, `doveadm force-resync
-u spam@viktorbarzin.me INBOX/spam` refreshes Dovecot's cached index
so the deletions appear in IMAP immediately instead of after the next
pod restart.
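
The two-step flow above (stream the script over stdin, then resync the
index) can be sketched as follows. This is a Python rendering for
illustration only; the real CronJob uses a shell heredoc. The helper
names and the `pod` argument are hypothetical, and the container name
`docker-mailserver` is taken from the manual verification commands in
this message.

```python
import subprocess


def build_commands(pod, user="spam@viktorbarzin.me", mailbox="INBOX/spam",
                   namespace="mailserver", container="docker-mailserver"):
    """Build the argv lists for the retention pass and the Dovecot resync."""
    base = ["kubectl", "-n", namespace, "exec"]
    target = [pod, "-c", container, "--"]
    # -i keeps stdin open so `python3 -` can read the script source from it.
    retention = base + ["-i"] + target + ["python3", "-"]
    resync = base + target + ["doveadm", "force-resync", "-u", user, mailbox]
    return retention, resync


def run_retention(pod, script_source, runner=subprocess.run):
    """Run the retention script inside the pod, then refresh the mailbox index."""
    retention, resync = build_commands(pod)
    runner(retention, input=script_source, text=True, check=True)
    runner(resync, check=True)
```

Passing `runner` as a parameter keeps the sketch testable without a
cluster; with `check=True` a failed retention pass skips the resync.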

Includes the standard KYVERNO_LIFECYCLE_V1 marker on the CronJob so
Kyverno ndots mutation does not cause perpetual drift.

## What is NOT in this change

- Dovecot sieve rules (no sieve infrastructure exists in the module;
  the plan file's fallback option was precisely this CronJob path).
- Push of retention metrics to Pushgateway - the script prints them
  to the job log for now; plumbing Pushgateway is a follow-up if
  Viktor wants alerts.
- Any touch of other mailboxes - only `/var/mail/viktorbarzin.me/spam/cur`
  is walked.
- Any mailserver pod restart or config reload.

## Test plan

### Automated

`terraform fmt` + `terragrunt hclfmt` pass. `scripts/tg plan` on the
mailserver stack shows:
  Plan: 7 to add, 3 to change, 0 to destroy.
Of the 7 adds, 4 are mine (SA + Role + RoleBinding + CronJob). The
other 3 adds belong to the roundcube-backup CronJob +
nfs_roundcube_backup_host PV + PVC, which landed on master in
parallel. The 3 in-place updates are pre-existing drift on the
mailserver Deployment, Service and email_roundtrip_monitor CronJob,
not introduced by this change.

### Manual Verification

After `scripts/tg apply` lands the CronJob:

  1. Trigger an immediate run:
     `kubectl -n mailserver create job --from=cronjob/spam-retention manual-1`
  2. Wait for completion, read the log:
     `kubectl -n mailserver logs job/manual-1`
     -> expected tail:
        spam_retention_scanned_total <N>
        spam_retention_auto_deleted_total <M>
        spam_retention_human_deleted_total <H>
        spam_retention_kept_total <K>
        spam_retention_errors_total 0
        Retention pass complete
  3. Confirm mailbox shrunk:
     `kubectl -n mailserver exec deploy/mailserver -c docker-mailserver \
         -- du -sh /var/mail/viktorbarzin.me/spam/`
     -> expected: well below 43 MiB within one run (bulk rule alone
        purges ~328 messages per the profile numbers above).
  4. Confirm IMAP reflects the deletions:
     `kubectl -n mailserver exec deploy/mailserver -c docker-mailserver \
         -- doveadm mailbox status -u spam@viktorbarzin.me messages INBOX/spam`
     -> expected: message count dropped accordingly.
  5. 4 hours later, confirm the next scheduled run logs a much
     smaller scan count and 0 deletions (nothing new crossed the
     threshold).
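
To check the log tail from step 2 mechanically, something like the
following could be used. It is a hypothetical helper, not part of this
change; it assumes only the `spam_retention_*_total <number>` line
format shown above.

```python
import re

# Matches lines such as "spam_retention_kept_total 191".
METRIC_LINE = re.compile(r"^(spam_retention_\w+_total)\s+(\d+)$")


def parse_metrics(log_text):
    """Extract the spam_retention_* counters from a job log into a dict."""
    metrics = {}
    for line in log_text.splitlines():
        match = METRIC_LINE.match(line.strip())
        if match:
            metrics[match.group(1)] = int(match.group(2))
    return metrics


def run_ok(metrics):
    """A run counts as healthy here if the error counter is present and zero."""
    return metrics.get("spam_retention_errors_total") == 0
```

Feeding it the output of `kubectl -n mailserver logs job/manual-1`
gives a dict that can be compared between the manual run and the next
scheduled one.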

Closes: code-oy4

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 00:22:55 +00:00
.beads bd init: initialize beads issue tracking 2026-04-06 15:38:46 +03:00
.claude [payslip-extractor] Add RSU handling section 2026-04-18 23:37:33 +00:00
.git-crypt Add 1 git-crypt collaborator [ci skip] 2025-10-24 18:00:00 +00:00
.github chore: sort outage report service list alphabetically 2026-04-15 18:01:54 +00:00
.planning [ci skip] add auto-generated tiers.tf, planning docs, and helm chart cache 2026-03-06 23:55:57 +00:00
.woodpecker [infra] Add Woodpecker pipeline to deploy PVE /etc/exports (Wave 6b) 2026-04-18 23:21:36 +00:00
ci feat: CI/CD performance overhaul 2026-04-15 11:22:26 +00:00
cli add IPv6 connectivity via Hurricane Electric 6in4 tunnel 2026-03-23 02:22:00 +02:00
diagram [ci skip] Sunset Drone CI: remove all artifacts, DNS, configs, and references 2026-02-23 19:38:55 +00:00
docs [mailserver] Delete postfix_cf_reference_DO_NOT_USE dead code [ci skip] 2026-04-19 00:05:44 +00:00
modules [infra] Suppress Kyverno label drift on module.tls_secret Secrets [ci skip] 2026-04-18 19:23:02 +00:00
playbooks [ci skip] Reduce node config drift: GPU label, OIDC idempotency, node-exporter, rebuild docs 2026-02-22 22:59:38 +00:00
scripts [claude-agent-service] Migrate all pipelines from DevVM SSH to K8s HTTP 2026-04-18 10:12:02 +00:00
secrets Woodpecker CI Update TLS Certificates Commit 2026-04-19 00:02:53 +00:00
stacks [mailserver] Add targeted retention for spam@ mailbox 2026-04-19 00:22:55 +00:00
state/stacks state(vault): update encrypted state 2026-04-18 22:12:55 +00:00
.gitattributes Add broker-sync Terraform stack (#7) 2026-04-17 21:17:45 +01:00
.gitignore .gitignore: ignore terragrunt_rendered.json debug output 2026-04-18 13:18:05 +00:00
.sops.yaml state: per-stack Transit keys for namespace-owner access control 2026-03-17 23:08:18 +00:00
AGENTS.md [infra] Document HCL import {} block convention [ci skip] 2026-04-18 21:10:05 +00:00
config.tfvars [config] Remove ollama_host root variable 2026-04-18 11:14:53 +00:00
CONTRIBUTING.md multi-user access: fix template memory default, add storage quota, add CONTRIBUTING.md [ci skip] 2026-03-19 23:49:15 +00:00
LICENSE.txt Drone CI Update TLS Certificates Commit 2025-10-12 00:13:18 +00:00
MEMORY.md Update MEMORY.md timestamp 2026-03-07 16:43:15 +00:00
README.md add architecture documentation for all infrastructure subsystems [ci skip] 2026-03-24 00:55:25 +02:00
setup-monitoring.sh fix(monitoring): Add setup script for automated health check environment 2026-03-13 13:57:11 +00:00
terragrunt.hcl [infra] Adopt Authentik catch-all Proxy Provider + Application into TF (Wave 6a) 2026-04-18 22:48:26 +00:00
tiers.tf [ci skip] Phase 1: PostgreSQL migrated to CNPG on local disk 2026-02-28 19:08:06 +00:00

This repo contains my infra-as-code sources.

My infrastructure is built with Terraform and Kubernetes; CI/CD runs on Woodpecker CI.

Read more by visiting my website: https://viktorbarzin.me

Documentation

Full architecture documentation is available in docs/ — covering networking, storage, security, monitoring, secrets, CI/CD, databases, and more.

Adding a New User (Admin)

Adding a new namespace-owner to the cluster requires three steps — no code changes needed.

1. Authentik Group Assignment

In the Authentik admin UI, add the user to:

  • kubernetes-namespace-owners group (grants OIDC group claim for K8s RBAC)
  • Headscale Users group (if they need VPN access)

2. Vault KV Entry

Add a JSON entry to secret/platformk8s_users key in Vault:

"username": {
  "role": "namespace-owner",
  "email": "user@example.com",
  "namespaces": ["username"],
  "domains": ["myapp"],
  "quota": {
    "cpu_requests": "2",
    "memory_requests": "4Gi",
    "memory_limits": "8Gi",
    "pods": "20"
  }
}
  • username key must match the user's Forgejo username (for Woodpecker admin access)
  • namespaces — K8s namespaces to create and grant admin access to
  • domains — subdomains under viktorbarzin.me for Cloudflare DNS records
  • quota — resource limits per namespace (defaults shown above)
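
As a convenience, the entry can be generated rather than typed by hand. The sketch below is a hypothetical helper (`make_user_entry` is not part of this repo); it assumes only the field names and default quota values shown above and prints JSON to paste into Vault.

```python
import json

# Default quota values as shown in the example entry above.
DEFAULT_QUOTA = {
    "cpu_requests": "2",
    "memory_requests": "4Gi",
    "memory_limits": "8Gi",
    "pods": "20",
}


def make_user_entry(username, email, domains, namespaces=None, quota=None):
    """Build the {username: {...}} entry for a new namespace-owner."""
    return {
        username: {
            "role": "namespace-owner",
            "email": email,
            # Assumption: default to one namespace named after the user,
            # as in the example entry.
            "namespaces": namespaces if namespaces is not None else [username],
            "domains": list(domains),
            "quota": dict(DEFAULT_QUOTA, **(quota or {})),
        }
    }


if __name__ == "__main__":
    print(json.dumps(make_user_entry("username", "user@example.com", ["myapp"]), indent=2))
```

Any quota key passed in overrides the default; the rest keep the values listed above.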

3. Apply Stacks

vault login -method=oidc

cd stacks/vault && terragrunt apply --non-interactive
# Creates: namespace, Vault policy, identity entity, K8s deployer role

cd ../platform && terragrunt apply --non-interactive
# Creates: RBAC bindings, ResourceQuota, TLS secret, DNS records

cd ../woodpecker && terragrunt apply --non-interactive
# Adds user to Woodpecker admin list

What Gets Auto-Generated

Resource                                       Stack
Kubernetes namespace                           vault
Vault policy (namespace-owner-{user})          vault
Vault identity entity + OIDC alias             vault
K8s deployer Role + Vault K8s role             vault
RBAC RoleBinding (namespace admin)             platform
RBAC ClusterRoleBinding (cluster read-only)    platform
ResourceQuota                                  platform
TLS secret in namespace                        platform
Cloudflare DNS records                         platform
Woodpecker admin access                        woodpecker

New User Onboarding

If you've been added as a namespace-owner, follow these steps to get started.

1. Join the VPN

# Install Tailscale: https://tailscale.com/download
tailscale login --login-server https://headscale.viktorbarzin.me
# Send the registration URL to Viktor, wait for approval
ping 10.0.20.100  # verify connectivity

2. Install Tools

Run the setup script to install kubectl, kubelogin, Vault CLI, Terraform, and Terragrunt:

# macOS
bash <(curl -fsSL https://k8s-portal.viktorbarzin.me/setup/script?os=mac)

# Linux
bash <(curl -fsSL https://k8s-portal.viktorbarzin.me/setup/script?os=linux)

3. Authenticate

# Log into Vault (opens browser for SSO)
vault login -method=oidc

# Test kubectl (opens browser for OIDC login)
kubectl get pods -n YOUR_NAMESPACE

4. Deploy Your First App

# Clone the infra repo
git clone https://github.com/ViktorBarzin/infra.git && cd infra

# Copy the stack template
cp -r stacks/_template stacks/myapp
mv stacks/myapp/main.tf.example stacks/myapp/main.tf

# Edit main.tf — replace all <placeholders>

# Store secrets in Vault
vault kv put secret/YOUR_USERNAME/myapp DB_PASSWORD=secret123

# Submit a PR
git checkout -b feat/myapp
git add stacks/myapp/
git commit -m "add myapp stack"
git push -u origin feat/myapp

After review and merge, an admin runs cd stacks/myapp && terragrunt apply.

5. Set Up CI/CD (Optional)

Create .woodpecker.yml in your app's Forgejo repo:

steps:
  - name: build
    image: woodpeckerci/plugin-docker-buildx
    settings:
      repo: YOUR_DOCKERHUB_USER/myapp
      tag: ["${CI_PIPELINE_NUMBER}", "latest"]
      username:
        from_secret: dockerhub-username
      password:
        from_secret: dockerhub-token
      platforms: linux/amd64

  - name: deploy
    image: hashicorp/vault:1.18.1
    commands:
      - export VAULT_ADDR=http://vault-active.vault.svc.cluster.local:8200
      - export VAULT_TOKEN=$(vault write -field=token auth/kubernetes/login
          role=ci jwt=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token))
      - KUBE_TOKEN=$(vault write -field=service_account_token
          kubernetes/creds/YOUR_NAMESPACE-deployer
          kubernetes_namespace=YOUR_NAMESPACE)
      - kubectl --server=https://kubernetes.default.svc
          --token=$KUBE_TOKEN
          --certificate-authority=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          -n YOUR_NAMESPACE set image deployment/myapp
          myapp=YOUR_DOCKERHUB_USER/myapp:${CI_PIPELINE_NUMBER}

Useful Commands

# Check your pods
kubectl get pods -n YOUR_NAMESPACE

# View quota usage
kubectl describe resourcequota -n YOUR_NAMESPACE

# Store/read secrets
vault kv put secret/YOUR_USERNAME/myapp KEY=value
vault kv get secret/YOUR_USERNAME/myapp

# Get a short-lived K8s deploy token
vault write kubernetes/creds/YOUR_NAMESPACE-deployer \
  kubernetes_namespace=YOUR_NAMESPACE

Important Rules

  • All changes go through Terraform — never kubectl apply/edit/patch directly
  • Never put secrets in code — use Vault: vault kv put secret/YOUR_USERNAME/...
  • Always use a PR — never push directly to master
  • Docker images: build for linux/amd64, use versioned tags (not :latest)

git-crypt setup

To decrypt the secrets, you need to set up git-crypt.

  1. Install git-crypt.
  2. Set up GPG keys on the machine.
  3. Run git-crypt unlock.

This unlocks the secrets in your working copy; they are encrypted again on commit.