diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md index 0f93847e..37f81406 100755 --- a/.claude/CLAUDE.md +++ b/.claude/CLAUDE.md @@ -73,7 +73,7 @@ Violations cause state drift, which causes future applies to break or silently r - **LimitRange**: Tier-based defaults silently apply to pods with `resources: {}`. Always set explicit resources on containers needing more than defaults. Tier 3-edge and 4-aux now use Burstable QoS (request < limit) to reduce scheduler pressure. - **Democratic-CSI sidecars**: Must set explicit resources (32-80Mi) in Helm values — 17 sidecars default to 256Mi each via LimitRange. `csiProxy` is a TOP-LEVEL chart key, not nested under controller/node. - **ResourceQuota blocks rolling updates**: When quota is tight, scale to 0 then back to 1 instead of RollingUpdate. Or use Recreate strategy. -- **Kyverno ndots drift**: Kyverno injects dns_config on all pods. Add `lifecycle { ignore_changes = [spec[0].template[0].spec[0].dns_config] }` to kubernetes_deployment resources to prevent perpetual TF plan drift. +- **Kyverno ndots drift**: Kyverno injects dns_config on all pods. Every `kubernetes_deployment`, `kubernetes_stateful_set`, and `kubernetes_cron_job_v1` MUST include `lifecycle { ignore_changes = [spec[0].template[0].spec[0].dns_config] # KYVERNO_LIFECYCLE_V1 }` (use `spec[0].job_template[0].spec[0].template[0].spec[0].dns_config` for CronJobs). The `# KYVERNO_LIFECYCLE_V1` marker is the canonical discoverability tag — grep for it to locate every site. A shared Terraform module was considered but `ignore_changes` only accepts static attribute paths (not module outputs, locals, or expressions), so the snippet convention is the only viable path. Full rationale and copy-paste snippets in `AGENTS.md` → "Kyverno Drift Suppression". - **NVIDIA GPU operator resources**: dcgm-exporter and cuda-validator resources configurable via `dcgmExporter.resources` and `validator.resources` in nvidia values.yaml. - **Pin database versions**: Disable Diun (image update monitoring) for MySQL, PostgreSQL, Redis. - **Quarterly right-sizing**: Check Goldilocks dashboard. Compare VPA upperBound to current request. Also check for under-provisioned (VPA upper > request x 0.8). @@ -133,7 +133,7 @@ Repo IDs: infra=1, Website=2, finance=3, health=4, travel_blog=5, webhook-handle ## Monitoring & Alerting - Alert cascade inhibitions: if node is down, suppress pod alerts on that node. - Exclude completed CronJob pods from "pod not ready" alerts. -- Every new service gets Prometheus scrape config + Uptime Kuma monitor. External monitors auto-created for Cloudflare-proxied services by `external-monitor-sync` CronJob (10min, uptime-kuma ns). +- Every new service gets Prometheus scrape config + Uptime Kuma monitor. External monitors auto-created for Cloudflare-proxied services by `external-monitor-sync` CronJob (10min, uptime-kuma ns). Mechanism: `ingress_factory` auto-adds `uptime.viktorbarzin.me/external-monitor=true` whenever `dns_type != "none"` (see `modules/kubernetes/ingress_factory/main.tf`) — no manual action needed on new services. The `cloudflare_proxied_names` list in `config.tfvars` is a legacy fallback for the 17 hostnames not yet migrated to `ingress_factory` `dns_type`; don't check that list when debugging "is this monitored?" questions. - **External monitoring**: `[External] ` monitors in Uptime Kuma test full external path (DNS → Cloudflare → Tunnel → Traefik). Divergence metric `external_internal_divergence_count` → alert `ExternalAccessDivergence` (15min). Config: `stacks/uptime-kuma/`, targets from `cloudflare_proxied_names` in `config.tfvars` (17 remaining centrally-managed hostnames; most DNS records now auto-created by `ingress_factory` `dns_type` param). - Key alerts: OOMKill, pod replica mismatch, 4xx/5xx error rates, UPS battery, CPU temp, SSD writes, NFS responsiveness, ClusterMemoryRequestsHigh (>85%), ContainerNearOOM (>85% limit), PodUnschedulable, ExternalAccessDivergence. - **E2E email monitoring**: CronJob `email-roundtrip-monitor` (every 20 min) sends test email via Mailgun API to `smoke-test@viktorbarzin.me` (catch-all → `spam@`), verifies IMAP delivery, deletes test email, pushes metrics to Pushgateway + Uptime Kuma. Alerts: `EmailRoundtripFailing` (60m), `EmailRoundtripStale` (60m), `EmailRoundtripNeverRun` (60m). Outbound relay: Brevo EU (`smtp-relay.brevo.com:587`, 300/day free — migrated from Mailgun). Mailserver on dedicated MetalLB IP `10.0.20.202` with `externalTrafficPolicy: Local` for CrowdSec real-IP detection. Vault: `mailgun_api_key` in `secret/viktor` (probe), `brevo_api_key` in `secret/viktor` (relay). diff --git a/.claude/agents/payslip-extractor.md b/.claude/agents/payslip-extractor.md new file mode 100644 index 00000000..846e3871 --- /dev/null +++ b/.claude/agents/payslip-extractor.md @@ -0,0 +1,169 @@ +--- +name: payslip-extractor +description: "Extract structured UK payslip fields from a base64-encoded PDF into strict JSON." +model: haiku +allowedTools: + - Bash + - Read +--- + +You are a headless payslip-field extractor. You receive a prompt containing a base64-encoded UK payslip PDF plus a target JSON schema, and you produce exactly one JSON object that matches the schema. + +## Your single job + +Given a prompt that contains: +- A line of the form `PDF_BASE64: ` +- A JSON schema describing the target fields + +Produce EXACTLY ONE JSON object on stdout matching the schema. No prose. No markdown fences. No preamble. No trailing commentary. The final message content must be a single valid JSON object and nothing else. + +## Processing steps + +### Step 1. Extract and decode the base64 PDF + +The prompt will include a line that starts with `PDF_BASE64:` followed by the base64 blob. Decode it to `/tmp/payslip.pdf`. + +Preferred method (handles whitespace and very long blobs robustly): + +```bash +python3 - <<'PY' +import base64, re, pathlib, sys, os +prompt = os.environ.get("PAYSLIP_PROMPT", "") +# If the orchestrator didn't set an env var, fall back to reading the transcript via CWD stdin mechanism. +# In practice the agent receives the prompt in its conversation — you extract the PDF_BASE64 value +# from the prompt text you were given, strip whitespace, and base64-decode. +PY +``` + +In practice: read the `PDF_BASE64:` value out of the prompt you have been given (you can see the full prompt), then run: + +```bash +python3 -c " +import base64, sys +data = sys.stdin.read().strip() +open('/tmp/payslip.pdf','wb').write(base64.b64decode(data)) +print('decoded bytes:', len(base64.b64decode(data))) +" <<'B64' + +B64 +``` + +Or pipe via shell `base64 -d`: + +```bash +printf '%s' '' | base64 -d > /tmp/payslip.pdf +``` + +Verify the file looks like a PDF: + +```bash +head -c 8 /tmp/payslip.pdf | xxd +# Expected: 25 50 44 46 2d (i.e. "%PDF-") +``` + +### Step 2. Extract text from the PDF + +Try tools in this order. Use the first one that works; do not chain all of them. + +1. `pdftotext` from `poppler-utils` (preferred — fastest, most reliable on layout-preserving payslips): + ```bash + pdftotext -layout /tmp/payslip.pdf - 2>/dev/null + ``` + +2. Python `pypdf` fallback: + ```bash + python3 -c " + from pypdf import PdfReader + r = PdfReader('/tmp/payslip.pdf') + for p in r.pages: + print(p.extract_text() or '') + " + ``` + +3. Python `pdfplumber` fallback: + ```bash + python3 -c " + import pdfplumber + with pdfplumber.open('/tmp/payslip.pdf') as pdf: + for page in pdf.pages: + print(page.extract_text() or '') + " + ``` + +4. If none of those are installed, check what IS available: + ```bash + which pdftotext pdf2txt.py mutool + python3 -c "import pypdf, pdfplumber, pdfminer" 2>&1 + ``` + and use whatever you find (e.g. `mutool draw -F txt`). + +If every text-extraction tool fails, emit the failure JSON (see "Failure mode" below). + +### Step 3. Parse the extracted text + +UK payslips are laid out in a few common templates (Sage, Iris, QuickBooks, Xero, in-house ADP/Workday layouts). Common landmarks: + +- "Pay Date" / "Payment Date" / "Date Paid" — the date wages hit the account. Usually at the top or in a header box. +- "Tax Period" / "Period" / "Month" — e.g. "Month 1", "Week 12". +- Two numeric columns per line: "This Period" (or "Amount", "Current") and "Year to Date" (or "YTD"). **Always take the This Period column**, never YTD. +- Payments / Earnings block: "Basic Pay", "Salary", "Bonus", "Overtime", "Commission", "Holiday Pay". +- Deductions block: "Income Tax" / "PAYE", "National Insurance" / "NI" / "NIC", "Pension" / "Pension Contribution" / "Salary Sacrifice Pension", "Student Loan" / "SL", optional: "Union Dues", "Charity", "Season Ticket Loan", "Private Medical", etc. +- "Gross Pay" / "Total Gross" — sum of payments. +- "Net Pay" / "Take Home" / "Amount Payable" — the money actually paid. +- "Tax Code" — e.g. "1257L", "BR", "D0", "NT". +- "NI Number" / "National Insurance Number" — `AA123456A` format. Never invent one. +- "Employer" / "Company" — usually in the letterhead. "Employee" / "Name". +- Currency: almost always GBP / "£" for UK payslips. If the PDF is not in GBP or not a UK payslip, still return the numbers as-is but include a best-effort `currency` field. + +### Step 4. Map to the schema and emit JSON + +Rules that apply regardless of the caller's exact schema: + +- **Dates**: `pay_date` MUST be `YYYY-MM-DD`. If the PDF prints `12/03/2026`, interpret as `DD/MM/YYYY` (UK format) → `2026-03-12`. If ambiguous (`01/02/2026`), prefer UK ordering. If impossible to determine a year, use the pay_period year. +- **Money fields**: emit as JSON numbers, not strings. Two decimal places are acceptable (`2450.17`). Strip `£`, commas, and trailing spaces. Negative values stay negative. +- **Missing numeric fields**: emit `0` (zero), not `null`, not an empty string, not `"N/A"`. +- **`other_deductions`**: an object mapping `{ "