diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md
index cacd3a43..f23a3ba5 100755
--- a/.claude/CLAUDE.md
+++ b/.claude/CLAUDE.md
@@ -168,6 +168,7 @@ Repo IDs: infra=1, Website=2, finance=3, health=4, travel_blog=5, webhook-handle
 - **External monitoring**: `[External] ` monitors in Uptime Kuma test full external path (DNS → Cloudflare → Tunnel → Traefik). Divergence metric `external_internal_divergence_count` → alert `ExternalAccessDivergence` (15min). Config: `stacks/uptime-kuma/`, targets from `cloudflare_proxied_names` in `config.tfvars` (17 remaining centrally-managed hostnames; most DNS records now auto-created by `ingress_factory` `dns_type` param).
 - Key alerts: OOMKill, pod replica mismatch, 4xx/5xx error rates, UPS battery, CPU temp, SSD writes, NFS responsiveness, ClusterMemoryRequestsHigh (>85%), ContainerNearOOM (>85% limit), PodUnschedulable, ExternalAccessDivergence.
 - **E2E email monitoring**: CronJob `email-roundtrip-monitor` (every 20 min) sends test email via Brevo HTTP API to `smoke-test@viktorbarzin.me` (catch-all → `spam@`), verifies IMAP delivery, deletes test email, pushes metrics to Pushgateway + Uptime Kuma. Alerts: `EmailRoundtripFailing` (60m), `EmailRoundtripStale` (60m), `EmailRoundtripNeverRun` (60m). Outbound relay: Brevo EU (`smtp-relay.brevo.com:587`, 300/day free — migrated from Mailgun). Inbound external traffic enters via pfSense HAProxy on `10.0.20.1:{25,465,587,993}`, which forwards to k8s `mailserver-proxy` NodePort (30125-30128) with `send-proxy-v2`. Mailserver pod runs alt PROXY-speaking listeners (2525/4465/5587/10993) alongside stock PROXY-free ones (25/465/587/993) for intra-cluster clients. Real client IPs recovered from PROXY v2 header despite kube-proxy SNAT (replaces pre-2026-04-19 MetalLB `10.0.20.202` ETP:Local scheme; see bd code-yiu + `docs/runbooks/mailserver-pfsense-haproxy.md`). Vault: `brevo_api_key` in `secret/viktor` (probe + relay).
+- **Authentik walling-off guard**: `blackbox-exporter` (monitoring ns, `stacks/monitoring/modules/monitoring/authentik_walloff_probe.tf`) probes each must-stay-public `auth = "none"` carve-out URL with `no_follow_redirects` and FAILS (`fail_if_header_matches` on `Location`) iff it 302s to Authentik. Catches a carve-out regressing (TF revert / deploy / `ingress_factory` `auth` default flipping back to `"required"`). Scrape job `blackbox-authentik-walloff` (1m) → alert `AuthentikWallingOffPublicPath` (`probe_failed_due_to_regex == 1`, for 10m, `lane=security` → `#security` Slack). **To guard a new carve-out: add one line to `local.authentik_walloff_targets`** (a `service → URL` map; `valid_status_codes` includes 301/302 so legit redirects/404s stay green — only the Authentik `Location` fails the probe). `curl -sI ''` must NOT show a Location to `authentik.viktorbarzin.me` before adding.
 
 ## Security Posture (Wave 1 — locked 2026-05-18)
 
diff --git a/docs/architecture/monitoring.md b/docs/architecture/monitoring.md
index a182981e..521540c9 100644
--- a/docs/architecture/monitoring.md
+++ b/docs/architecture/monitoring.md
@@ -64,6 +64,7 @@ graph TB
 | dcgm-exporter | Configurable resources | `stacks/monitoring/modules/monitoring/` | NVIDIA GPU metrics collection |
 | Email Roundtrip Probe | Python 3.12 | `stacks/mailserver/modules/mailserver/` | E2E email delivery verification via Mailgun API + IMAP |
 | Forgejo Registry Integrity Probe | Alpine 3.20 + curl/jq | `stacks/monitoring/modules/monitoring/main.tf` | CronJob every 15m: walks `/v2/_catalog` on `forgejo.viktorbarzin.me` (HTTP via in-cluster service), HEADs every tagged manifest + index child; emits `registry_manifest_integrity_*` metrics to Pushgateway. Replaces the legacy `registry-integrity-probe` against `registry.viktorbarzin.me:5050` decommissioned in Phase 4 of forgejo-registry-consolidation 2026-05-07. |
+| blackbox-exporter (Authentik walling-off guard) | `prom/blackbox-exporter` (Keel-managed) | `stacks/monitoring/modules/monitoring/authentik_walloff_probe.tf` | Single-purpose blackbox-exporter. Its `http_no_authentik_redirect` module probes each must-stay-public carve-out URL with `no_follow_redirects` and FAILS (`fail_if_header_matches` on `Location`) iff the response redirects to Authentik. Scraped by job `blackbox-authentik-walloff` (1m); feeds alert `AuthentikWallingOffPublicPath`. Target list = `local.authentik_walloff_targets` in the same file. |
 
 ## How It Works
 
@@ -205,6 +206,14 @@ Allowlist source-IP CIDRs (used by K2, K9, V7, S1): `10.0.20.0/22`, `192.168.1.0
 
 IOPS impact estimated ~1-2 GB/day additional disk writes after custom audit-policy tuning. Retention: 90d for security streams.
 
+##### Authentik walling-off guard — `AuthentikWallingOffPublicPath`
+
+Detects the inverse of the K-series alerts: a service that **must work WITHOUT Authentik SSO** getting accidentally walled off. Services on `ingress_factory auth = "required"` put Authentik forward-auth on `/`, which 302-bounces native-client / public / webhook / WebSocket / SPA-XHR paths. We carve those out with path-scoped `auth = "none"` ingresses; a TF revert, a bad deploy, or `ingress_factory`'s fail-closed `auth` default flipping back to `"required"` can silently clobber a carve-out.
+
+- **Mechanism**: `blackbox-exporter` (monitoring ns) probes a representative GET-able URL per carve-out with `no_follow_redirects: true`. The `http_no_authentik_redirect` module FAILS the probe (`fail_if_header_matches` on the `Location` header, regex `authentik\.viktorbarzin\.me|/outpost\.goauthentik\.io|/application/o/authorize`) iff the response redirects to Authentik. `valid_status_codes` enumerates all expected non-Authentik responses **including 301/302** (so a legitimate redirect, e.g. a short-link 302, or a 404 carve-out like meshcentral `/agent.ashx`, stays green). Scrape job: `blackbox-authentik-walloff` (1m).
+- **Alert**: `probe_failed_due_to_regex{job="blackbox-authentik-walloff"} == 1` for 10m → `severity=warning`, `lane=security` → **`#security` Slack** (Slack-only, no paging). `probe_failed_due_to_regex` (not bare `probe_success==0`) is the signal: it isolates the Authentik-redirect from unrelated 5xx/DNS/TLS failures already covered by reachability alerts. Inhibited by `TraefikDown` and `AuthentikDown` (symptom, not regression, during those outages).
+- **Target list + how to add one**: `local.authentik_walloff_targets` in `stacks/monitoring/modules/monitoring/authentik_walloff_probe.tf` — a map of `service → URL`. To guard a NEW carve-out, add ONE line. Verify it does NOT already 302 to Authentik first: `curl -s -o /dev/null -w '%{http_code} %{redirect_url}\n' ''`. The map key becomes the `service` label on the metric + alert. (Note: openclaw `task-webhook` is intentionally NOT probed — no public DNS record.)
+
 #### Backup Alerts
 - **PostgreSQLBackupStale**: >36h since last backup
 - **MySQLBackupStale**: >36h since last backup
diff --git a/stacks/monitoring/modules/monitoring/authentik_walloff_probe.tf b/stacks/monitoring/modules/monitoring/authentik_walloff_probe.tf
new file mode 100644
index 00000000..583900fe
--- /dev/null
+++ b/stacks/monitoring/modules/monitoring/authentik_walloff_probe.tf
@@ -0,0 +1,220 @@
+# =============================================================================
+# Authentik walling-off guard
+# =============================================================================
+# Detects regressions where a service that MUST work WITHOUT Authentik SSO gets
+# accidentally walled off — i.e. an ingress that should be `auth = "none"` (or a
+# path-scoped carve-out) starts returning an Authentik forward-auth 302.
+#
+# The "walled off" signature (captured live 2026-06-02): a request to a
+# must-stay-public URL returns 301/302 whose `Location` header points at
+# Authentik:
+#   https://authentik.viktorbarzin.me/application/o/authorize/?client_id=...
+# A correctly-carved path returns a non-redirect (200/400/401/403/404/405/426/…)
+# OR a redirect whose Location is NOT Authentik (e.g. a short-link 302).
+#
+# Mechanism: a tiny blackbox-exporter (below) probes each guarded URL with
+# `no_follow_redirects: true` and FAILS the probe iff the `Location` header
+# matches Authentik (`fail_if_header_matches`). Prometheus scrapes the probe
+# (job `blackbox-authentik-walloff` in extraScrapeConfigs) and the
+# `AuthentikWallingOffPublicPath` alerting rule (alerting_rules.yml, lane=security)
+# routes a firing alert to the #security Slack receiver.
+#
+# Chosen over a CronJob+pushgateway probe (the apex-probe pattern) because that
+# pattern's `pip install`/`apk add` per-run footprint is a known disk-write
+# anti-pattern that got status-page-pusher disabled (memory id=559). blackbox is
+# a single long-lived deployment — zero per-run disk writes, fully declarative.
+#
+# ---------------------------------------------------------------------------
+# TARGET LIST — HOW TO ADD A NEW CARVE-OUT (one-line edit)
+# ---------------------------------------------------------------------------
+# When you add a new `auth = "none"` carve-out (or path-scoped carve-out) to any
+# stack, add ONE representative GET-able URL here that returns a NON-Authentik
+# response today. The map key becomes the `service` label on the probe metric
+# and the alert. Verify with:
+#   curl -s -o /dev/null -w '%{http_code} %{redirect_url}\n' ''
+# It must NOT 302 to authentik.viktorbarzin.me before you add it.
+# ---------------------------------------------------------------------------
+locals {
+  # Representative URL per `auth = "none"` carve-out service. Each MUST return a
+  # non-Authentik response (200/3xx-non-authentik/400/404/426/…) when the
+  # carve-out is intact. Probed every 60s; alert fires only on an Authentik 302.
+  authentik_walloff_targets = {
+    # meshcentral agent/relay paths (auth="none"): native mesh-cert clients.
+    # /agent.ashx 404s without WebSocket upgrade headers — non-redirect = OK.
+    "meshcentral-agent" = "https://meshcentral.viktorbarzin.me/agent.ashx"
+    # uptime-kuma public status page (auth="none" on /status, /api/push, …).
+    "uptime-status" = "https://uptime.viktorbarzin.me/status/infra"
+    # shlink REST API health (auth="none"): X-Api-Key self-gated, CORS XHR.
+    "shlink-rest-health" = "https://url.viktorbarzin.me/rest/health"
+    # rybbit analytics tracker beacon (auth="none"): public sites embed this JS.
+    "rybbit-script" = "https://rybbit.viktorbarzin.me/api/script.js"
+    # insta2spotify API (auth="none"): browser fetch() XHRs, CORS preflight.
+    "insta2spotify-api-health" = "https://insta2spotify.viktorbarzin.me/api/health"
+    # k8s-portal setup script (auth="none"): curl-ed by automation, no cookies.
+    "k8s-portal-setup-script" = "https://k8s-portal.viktorbarzin.me/setup/script"
+    # instagram-poster image derivative endpoint (auth="none"): Meta's fetcher.
+    # /image 404s without a query param — non-redirect = OK.
+    "instagram-poster-image" = "https://instagram-poster.viktorbarzin.me/image"
+    # trading-bot app root (auth="app"): WebAuthn/JWT in-app; was walled, now 200.
+    "trading-bot-app" = "https://trading.viktorbarzin.me/"
+    # NOTE: openclaw task-webhook (auth="none") is intentionally NOT probed — it
+    # has no public DNS record (NXDOMAIN, external_monitor=false), so there is no
+    # externally GET-able URL to probe. Its carve-out is internal-only.
+  }
+}
+
+# --- blackbox-exporter -------------------------------------------------------
+# Single-purpose blackbox-exporter. The `http_no_authentik_redirect` module does
+# NOT follow redirects and FAILS the probe ONLY when the Location header points
+# at Authentik. A carve-out's expected status code must NEVER fail it; carve-outs
+# legitimately return 404 (meshcentral /agent.ashx without WS headers,
+# instagram-poster /image without a query) or 400/401/403/426, all of which mean
+# "carve-out intact". So `valid_status_codes` enumerates every plausible
+# non-Authentik response INCLUDING 301/302 — a redirect is status-valid, and the
+# Authentik case is then singled out by `fail_if_header_matches` on Location
+# (NOT empty: blackbox treats an empty list as "2xx only", which would
+# false-fire on every 404 carve-out). probe_failed_due_to_regex isolates the
+# Authentik match from other failures; the alert expr keys on it alone.
+resource "kubernetes_config_map" "blackbox_exporter_config" {
+  metadata {
+    name      = "blackbox-exporter-config"
+    namespace = kubernetes_namespace.monitoring.metadata[0].name
+    annotations = {
+      "reloader.stakater.com/match" = "true"
+    }
+  }
+  data = {
+    "blackbox.yml" = yamlencode({
+      modules = {
+        http_no_authentik_redirect = {
+          prober  = "http"
+          timeout = "10s"
+          http = {
+            method                = "GET"
+            no_follow_redirects   = true
+            preferred_ip_protocol = "ip4"
+            ip_protocol_fallback  = false
+            fail_if_not_ssl       = false
+            valid_http_versions   = ["HTTP/1.1", "HTTP/2.0"]
+            # Every non-Authentik response a carve-out may legitimately return.
+            # 301/302 are INCLUDED so a redirect passes the status check and is
+            # judged solely by the Location header match below. 5xx is excluded:
+            # a backend 500 isn't a walling-off but is still worth surfacing as a
+            # probe failure. The enumerated 2xx/3xx/4xx codes keep probe_success==1 for
+            # all intact carve-outs (404s included).
+            valid_status_codes = [200, 201, 202, 204, 301, 302, 304, 400, 401, 403, 404, 405, 409, 410, 426, 429]
+            # FAIL the probe if the response redirects to Authentik. This is the
+            # walling-off signature: forward-auth 301/302 -> /application/o/authorize
+            # on authentik.viktorbarzin.me (also matches /outpost.goauthentik.io).
+            fail_if_header_matches = [
+              {
+                header        = "Location"
+                regexp        = "(authentik\\.viktorbarzin\\.me|/outpost\\.goauthentik\\.io|/application/o/authorize)"
+                allow_missing = true
+              },
+            ]
+          }
+        }
+      }
+    })
+  }
+}
+
+resource "kubernetes_deployment" "blackbox_exporter" {
+  metadata {
+    name      = "blackbox-exporter"
+    namespace = kubernetes_namespace.monitoring.metadata[0].name
+    labels = {
+      app  = "blackbox-exporter"
+      tier = var.tier
+    }
+    annotations = {
+      "reloader.stakater.com/search" = "true"
+    }
+  }
+  spec {
+    replicas = 1
+    selector {
+      match_labels = {
+        app = "blackbox-exporter"
+      }
+    }
+    template {
+      metadata {
+        labels = {
+          app = "blackbox-exporter"
+        }
+      }
+      spec {
+        container {
+          name  = "blackbox-exporter"
+          image = "prom/blackbox-exporter:v0.25.0"
+          args  = ["--config.file=/etc/blackbox_exporter/blackbox.yml"]
+          port {
+            container_port = 9115
+            name           = "http"
+          }
+          resources {
+            requests = {
+              cpu    = "5m"
+              memory = "24Mi"
+            }
+            limits = {
+              memory = "48Mi"
+            }
+          }
+          volume_mount {
+            name       = "config-volume"
+            mount_path = "/etc/blackbox_exporter/"
+          }
+        }
+        volume {
+          name = "config-volume"
+          config_map {
+            name = kubernetes_config_map.blackbox_exporter_config.metadata[0].name
+          }
+        }
+        dns_config {
+          option {
+            name  = "ndots"
+            value = "2"
+          }
+        }
+      }
+    }
+  }
+  lifecycle {
+    # KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
+    # KEEL: monitoring ns is keel-enrolled (policy=patch) — Keel owns the image
+    # tag and injects keel.sh annotations. Ignore so TF stops reverting Keel.
+    ignore_changes = [
+      spec[0].template[0].spec[0].dns_config,
+      spec[0].template[0].spec[0].container[0].image, # KEEL_IGNORE_IMAGE
+      metadata[0].annotations["keel.sh/policy"],
+      metadata[0].annotations["keel.sh/trigger"],
+      metadata[0].annotations["keel.sh/pollSchedule"],
+      metadata[0].annotations["keel.sh/match-tag"],
+      spec[0].template[0].metadata[0].annotations["keel.sh/update-time"], # KEEL_LIFECYCLE_V1
+    ]
+  }
+}
+
+resource "kubernetes_service" "blackbox_exporter" {
+  metadata {
+    name      = "blackbox-exporter"
+    namespace = kubernetes_namespace.monitoring.metadata[0].name
+    labels = {
+      app = "blackbox-exporter"
+    }
+  }
+  spec {
+    selector = {
+      app = "blackbox-exporter"
+    }
+    port {
+      name        = "http"
+      port        = 9115
+      target_port = 9115
+    }
+  }
+}
diff --git a/stacks/monitoring/modules/monitoring/prometheus.tf b/stacks/monitoring/modules/monitoring/prometheus.tf
index 75b931df..19f1c80d 100644
--- a/stacks/monitoring/modules/monitoring/prometheus.tf
+++ b/stacks/monitoring/modules/monitoring/prometheus.tf
@@ -60,5 +60,5 @@ resource "helm_release" "prometheus" {
   # Re-enable temporarily only when a StatefulSet volumeClaimTemplate change needs --force.
   force_update = false
 
-  values = [templatefile("${path.module}/prometheus_chart_values.tpl", { alertmanager_mail_pass = var.alertmanager_account_password, alertmanager_slack_api_url = var.alertmanager_slack_api_url, tuya_api_key = var.tiny_tuya_service_secret, haos_api_token = var.haos_api_token })]
+  values = [templatefile("${path.module}/prometheus_chart_values.tpl", { alertmanager_mail_pass = var.alertmanager_account_password, alertmanager_slack_api_url = var.alertmanager_slack_api_url, tuya_api_key = var.tiny_tuya_service_secret, haos_api_token = var.haos_api_token, authentik_walloff_targets = local.authentik_walloff_targets })]
 }
diff --git a/stacks/monitoring/modules/monitoring/prometheus_chart_values.tpl b/stacks/monitoring/modules/monitoring/prometheus_chart_values.tpl
index da01bc92..be54fdf7 100755
--- a/stacks/monitoring/modules/monitoring/prometheus_chart_values.tpl
+++ b/stacks/monitoring/modules/monitoring/prometheus_chart_values.tpl
@@ -99,7 +99,14 @@ alertmanager:
       - source_matchers:
           - alertname = TraefikDown
         target_matchers:
-          - alertname =~ "PoisonFountainDown|ForwardAuthFallbackActive"
+          - alertname =~ "PoisonFountainDown|ForwardAuthFallbackActive|AuthentikWallingOffPublicPath"
+      # Authentik down: every protected ingress behaves oddly (fallback proxies
+      # engage). The walling-off probe failing then is a symptom, not a regressed
+      # carve-out — suppress it so the root-cause AuthentikDown alert stands alone.
+      - source_matchers:
+          - alertname = AuthentikDown
+        target_matchers:
+          - alertname = AuthentikWallingOffPublicPath
       # A stale Traefik replica returns 404 for a fraction of requests; the same
       # bug surfaces as TTFB / 4xx / 5xx / external-divergence symptoms downstream.
       # When TraefikReplicaConfigStale fires, the root cause is identified —
@@ -2942,8 +2949,59 @@ serverFiles:
               subsystem: traefik
             annotations:
               summary: "Traefik replicas have diverging router counts (skew={{ $value | printf \"%.0f\" }}). Restart the laggard pod: `kubectl get pods -n traefik` and delete the one with fewer routers."
+      # Authentik walling-off guard. Fires when a must-stay-public carve-out URL
+      # (job blackbox-authentik-walloff, targets in authentik_walloff_probe.tf)
+      # starts returning an Authentik forward-auth 302. probe_failed_due_to_regex==1
+      # there means blackbox's fail_if_header_matches caught a Location -> Authentik:
+      # a path-scoped `auth = "none"` carve-out was clobbered (TF revert, deploy,
+      # ingress_factory default flipping back to auth="required"). lane=security
+      # routes it to the #security Slack receiver (Slack-only, no paging).
+      - name: Authentik Walling Off
+        rules:
+          - alert: AuthentikWallingOffPublicPath
+            # probe_failed_due_to_regex==1 means the response's Location header
+            # matched Authentik — the precise walling-off signature, independent
+            # of status code. (We deliberately do NOT alert on bare
+            # probe_success==0: with the broad valid_status_codes, a 404 carve-out
+            # is success, and a 5xx/DNS/TLS failure is a DIFFERENT failure mode
+            # already covered by reachability alerts — not a forward-auth wall.)
+            # for:10m rides out scrape blips / brief Traefik restarts.
+            expr: probe_failed_due_to_regex{job="blackbox-authentik-walloff"} == 1
+            for: 10m
+            labels:
+              severity: warning
+              lane: security
+              subsystem: authentik
+            annotations:
+              summary: "Public path walled off by Authentik: {{ $labels.service }} ({{ $labels.instance }})"
+              description: "The must-stay-public URL {{ $labels.instance }} (carve-out `{{ $labels.service }}`) now redirects to Authentik SSO: its blackbox probe's Location-header check matched Authentik. A path-scoped `auth = \"none\"` carve-out probably regressed (TF revert / deploy / ingress_factory auth default flipping back to \"required\"). Native-client / public / webhook / WebSocket / SPA-XHR traffic to this endpoint is broken for strangers and machines. Check the owning stack's ingress_factory `auth` + `ingress_path`, and curl the URL: `curl -sI '{{ $labels.instance }}'` — a Location to authentik.viktorbarzin.me confirms the regression. Probe config + target list: stacks/monitoring/modules/monitoring/authentik_walloff_probe.tf."
 
 extraScrapeConfigs: |
+  # Authentik walling-off guard. Probes each must-stay-public carve-out URL via
+  # blackbox-exporter's `http_no_authentik_redirect` module (no_follow_redirects +
+  # fail_if_header_matches on a Location -> Authentik). A target with
+  # probe_failed_due_to_regex == 1 now redirects to Authentik: a carve-out regressed.
+  # Target list + "how to add a target" docs: authentik_walloff_probe.tf.
+  # Alert: AuthentikWallingOffPublicPath (alerting_rules.yml, lane=security).
+  - job_name: 'blackbox-authentik-walloff'
+    scrape_interval: 1m
+    scrape_timeout: 30s
+    metrics_path: /probe
+    params:
+      module: [http_no_authentik_redirect]
+    static_configs:
+%{ for svc, url in authentik_walloff_targets ~}
+      - targets: ["${url}"]
+        labels:
+          service: "${svc}"
+%{ endfor ~}
+    relabel_configs:
+      - source_labels: [__address__]
+        target_label: __param_target
+      - source_labels: [__param_target]
+        target_label: instance
+      - target_label: __address__
+        replacement: 'blackbox-exporter.monitoring.svc.cluster.local:9115'
   # The `mailserver-dovecot` scrape job was retired in code-1ik together
   # with the Dovecot exporter. docker-mailserver 15.0.0's Dovecot 2.3
   # doesn't emit the old_stats protocol the exporter expected, so the
diff --git a/stacks/tripit/main.tf b/stacks/tripit/main.tf
index a7855287..4f48a328 100644
--- a/stacks/tripit/main.tf
+++ b/stacks/tripit/main.tf
@@ -97,6 +97,9 @@ resource "kubernetes_manifest" "external_secret" {
         { secretKey = "VAPID_SUBJECT", remoteRef = { key = "tripit", property = "VAPID_SUBJECT" } },
         { secretKey = "CALENDAR_TOKEN_SECRET", remoteRef = { key = "tripit", property = "CALENDAR_TOKEN_SECRET" } },
         { secretKey = "IMAP_PASSWORD", remoteRef = { key = "tripit", property = "IMAP_PASSWORD" } },
+        # spam@viktorbarzin.me password — used only by the ingest-plans CronJob
+        # (forward-to-parse via the @viktorbarzin.me -> spam@ catch-all).
+        { secretKey = "PLANS_IMAP_PASSWORD", remoteRef = { key = "tripit", property = "PLANS_IMAP_PASSWORD" } },
       ]
     }
   }
@@ -312,12 +315,17 @@ locals {
       suspend   = false
       extra_env = {}
     }
-    # Ongoing forward-to-parse ingest of me@viktorbarzin.me's mailbox. Uses the
-    # real local LLM (qwen3vl-4b on llama-swap — qwen3-8b OOMs the shared T4).
-    # Read-only IMAP (BODY.PEEK), bounded to the 30 most-recent messages/run;
-    # the pipeline is idempotent (skips message_ids already in inbound_email),
-    # so re-reading the recent window is a no-op for already-seen mail.
-    # IMAP_PASSWORD is injected from secret/tripit via the tripit-secrets ES.
+    # Ongoing forward-to-parse ingest of vbarzin@gmail.com — Viktor's real
+    # travel mailbox (the self-hosted me@ box receives no booking mail). LLM =
+    # qwen3vl-4b on llama-swap (qwen3-8b OOMs the shared T4). Read-only
+    # IMAP_SEARCH over [Gmail]/All Mail (BODY.PEEK, never sets \Seen), bounded
+    # to a rolling 12-month window of travel-sender mail via Gmail X-GM-RAW; the
+    # two Croatia Jet2 refs (33W6Y3/33W7L2) are excluded so the hand-curated
+    # Croatia trip isn't duplicated. Idempotent (skips message_ids already in
+    # inbound_email). Trips land under MAIL_DEFAULT_OWNER_EMAIL (vbarzin@gmail.com
+    # — Viktor's Authentik login identity, so trips show up in his account).
+    # IMAP_PASSWORD (the vbarzin@gmail.com app-password) comes from secret/tripit
+    # via the tripit-secrets ES.
     ingest-mail = {
       schedule = "*/30 * * * *"
       command  = ["python", "-m", "tripit_api", "ingest-mail"]
@@ -327,13 +335,41 @@ locals {
         LLM_ENDPOINT             = "http://llama-swap.llama-cpp.svc.cluster.local:8080"
         LLM_MODEL                = "qwen3vl-4b"
         MAIL_INGEST_ENABLED      = "true"
-        MAIL_DEFAULT_OWNER_EMAIL = "me@viktorbarzin.me"
+        MAIL_DEFAULT_OWNER_EMAIL = "vbarzin@gmail.com"
+        IMAP_HOST                = "imap.gmail.com"
+        IMAP_PORT                = "993"
+        IMAP_USER                = "vbarzin@gmail.com"
+        IMAP_FOLDER              = "[Gmail]/All Mail"
+        IMAP_USE_SSL             = "true"
+        IMAP_SEARCH              = "X-GM-RAW \"newer_than:12m -33W6Y3 -33W7L2 (from:jet2.com OR from:ryanair.com OR from:easyjet.com OR from:wizzair.com OR from:booking.com OR from:airbnb.com OR from:expedia.com OR from:croatiaairlines.com OR from:vueling.com OR from:lufthansa.com OR from:trainline)\""
+      }
+    }
+    # Forward-to-parse: forward any booking confirmation to plans@viktorbarzin.me
+    # (which the @viktorbarzin.me catch-all delivers into the spam@ mailbox), and
+    # this job ingests it. Polls spam@ read-only, filtered by IMAP SEARCH to mail
+    # addressed To plans@ — so only deliberate forwards are processed, not the
+    # rest of the catch-all junk. The LLM extracts segments and the pipeline
+    # attaches them to the date-overlapping trip (or creates one) under
+    # MAIL_DEFAULT_OWNER_EMAIL. IMAP_PASSWORD is overridden for this job to
+    # spam@'s password via imap_pw_secret_key (secret/tripit PLANS_IMAP_PASSWORD),
+    # because env_from otherwise injects the Gmail app-password.
+    ingest-plans = {
+      schedule           = "*/15 * * * *"
+      command            = ["python", "-m", "tripit_api", "ingest-mail"]
+      suspend            = false
+      imap_pw_secret_key = "PLANS_IMAP_PASSWORD"
+      extra_env = {
+        LLM_MODE                 = "llamacpp"
+        LLM_ENDPOINT             = "http://llama-swap.llama-cpp.svc.cluster.local:8080"
+        LLM_MODEL                = "qwen3vl-4b"
+        MAIL_INGEST_ENABLED      = "true"
+        MAIL_DEFAULT_OWNER_EMAIL = "vbarzin@gmail.com"
         IMAP_HOST                = "mailserver.mailserver.svc.cluster.local"
         IMAP_PORT                = "993"
-        IMAP_USER                = "me@viktorbarzin.me"
+        IMAP_USER                = "spam@viktorbarzin.me"
         IMAP_FOLDER              = "INBOX"
         IMAP_USE_SSL             = "true"
-        IMAP_RECENT_N            = "30"
+        IMAP_SEARCH              = "TO \"plans@viktorbarzin.me\""
       }
     }
   }
@@ -391,6 +427,23 @@ resource "kubernetes_cron_job_v1" "tripit_worker" {
                 }
               }
 
+              # Per-job IMAP_PASSWORD override from a secret key. An explicit env
+              # takes precedence over env_from, so a job that polls a different
+              # mailbox (ingest-plans -> spam@) gets its own password instead of
+              # the default IMAP_PASSWORD (vbarzin@gmail.com) from tripit-secrets.
+              dynamic "env" {
+                for_each = lookup(each.value, "imap_pw_secret_key", null) != null ? [1] : []
+                content {
+                  name = "IMAP_PASSWORD"
+                  value_from {
+                    secret_key_ref {
+                      name = "tripit-secrets"
+                      key  = each.value.imap_pw_secret_key
+                    }
+                  }
+                }
+              }
+
               volume_mount {
                 name       = "documents"
                 mount_path = "/data/documents"
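For review, here is a minimal Python sketch of what one `blackbox-authentik-walloff` scrape evaluates. It is a model, not blackbox-exporter's code. `scrape_url` and `probe` are hypothetical helpers, and the status list and `Location` regex are copied from the `http_no_authentik_redirect` module in the diff. It assumes blackbox-exporter runs the `valid_status_codes` check first and applies `fail_if_header_matches` only to a status-valid response. Under that assumption, a non-listed status such as a 502 fails the probe without ever setting `probe_failed_due_to_regex`.

```python
import re
from typing import Dict, Optional
from urllib.parse import urlencode

# Values copied from the http_no_authentik_redirect module in authentik_walloff_probe.tf.
VALID_STATUS_CODES = {200, 201, 202, 204, 301, 302, 304, 400, 401, 403,
                      404, 405, 409, 410, 426, 429}
AUTHENTIK_LOCATION = re.compile(
    r"(authentik\.viktorbarzin\.me|/outpost\.goauthentik\.io|/application/o/authorize)"
)
BLACKBOX_ADDRESS = "blackbox-exporter.monitoring.svc.cluster.local:9115"


def scrape_url(target: str, module: str = "http_no_authentik_redirect") -> str:
    """Approximate URL Prometheus scrapes for one static target.

    relabel_configs copy __address__ (the carve-out URL) into __param_target,
    then point __address__ at the blackbox-exporter Service, so the carve-out
    URL travels as the ?target= query parameter of /probe.
    """
    query = urlencode({"module": module, "target": target})
    return f"http://{BLACKBOX_ADDRESS}/probe?{query}"


def probe(status: int, location: Optional[str]) -> Dict[str, int]:
    """Model the two metrics the alert cares about for one un-followed response.

    Assumed ordering: the status-code check runs first, and the Location regex
    is evaluated only when the status code is in VALID_STATUS_CODES.
    """
    success = status in VALID_STATUS_CODES
    failed_due_to_regex = 0
    # allow_missing = true: a response without a Location header passes the check.
    if success and location is not None and AUTHENTIK_LOCATION.search(location):
        success = False
        failed_due_to_regex = 1
    return {
        "probe_success": int(success),
        "probe_failed_due_to_regex": failed_due_to_regex,
    }
```

Under these assumptions, `probe(404, None)` stays green, as the meshcentral `/agent.ashx` carve-out does. A 302 whose `Location` is an Authentik authorize URL sets `probe_failed_due_to_regex` to 1. `probe(502, None)` fails the probe but leaves the regex metric at 0. That last case is why the alert keys on `probe_failed_due_to_regex` rather than on bare `probe_success`.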