[payslip-extractor] Add PAYSLIP_TEXT fast path

payslip-ingest now runs pdftotext locally before calling claude-agent-service, shrinking the prompt ~20-100x. Agent file documents both paths: PAYSLIP_TEXT (fast) and PDF_BASE64 (fallback for scanned-image PDFs or when pdftotext fails).
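
A minimal sketch of the caller-side logic described above, assuming a Python caller. The function names `extract_text` and `build_prompt` and the `SCHEMA:` label are hypothetical and are not taken from payslip-ingest. Only the `PAYSLIP_TEXT:` and `PDF_BASE64:` markers and the `pdftotext -layout` invocation come from this commit.

```python
import base64
import shutil
import subprocess
from pathlib import Path
from typing import Optional


def extract_text(pdf_path: Path, timeout: float = 30.0) -> Optional[str]:
    """Return `pdftotext -layout` output, or None if the tool is missing,
    fails, times out, or yields no text (e.g. a scanned-image PDF)."""
    if shutil.which("pdftotext") is None:
        return None
    try:
        result = subprocess.run(
            # "-" as the output file makes pdftotext write to stdout.
            ["pdftotext", "-layout", str(pdf_path), "-"],
            capture_output=True,
            encoding="utf-8",
            errors="replace",
            timeout=timeout,
            check=True,
        )
    except (subprocess.SubprocessError, OSError):
        return None
    text = result.stdout.strip()
    return text or None


def build_prompt(pdf_bytes: bytes, text: Optional[str], schema_json: str) -> str:
    """Prefer the PAYSLIP_TEXT fast path; fall back to PDF_BASE64."""
    if text:
        payload = f"PAYSLIP_TEXT:\n{text}"
    else:
        blob = base64.b64encode(pdf_bytes).decode("ascii")
        payload = f"PDF_BASE64: {blob}"
    # "SCHEMA:" is a hypothetical label; the real prompt layout is not shown here.
    return f"{payload}\n\nSCHEMA:\n{schema_json}"
```

Treating empty `pdftotext` output the same as a failure is one way to route scanned-image PDFs to the fallback path, since they typically produce no extractable text.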
2026-04-18 22:48:07 +00:00
commit eee694c915
parent b28c76e371
1 changed file with 9 additions and 5 deletions
--- a/.claude/agents/payslip-extractor.md
+++ b/.claude/agents/payslip-extractor.md
@@ -1,22 +1,26 @@
 ---
 name: payslip-extractor
-description: "Extract structured UK payslip fields from a base64-encoded PDF into strict JSON."
+description: "Extract structured UK payslip fields from already-extracted text (preferred) or a base64 PDF (fallback) into strict JSON."
 model: haiku
 allowedTools:
  - Bash
  - Read
 ---

-You are a headless payslip-field extractor. You receive a prompt containing a base64-encoded UK payslip PDF plus a target JSON schema, and you produce exactly one JSON object that matches the schema.
+You are a headless payslip-field extractor. You receive a prompt containing a UK payslip (either as pre-extracted text or as a base64-encoded PDF) plus a target JSON schema, and you produce exactly one JSON object that matches the schema.

 ## Your single job

-Given a prompt that contains:
-- A line of the form `PDF_BASE64: <base64-blob>`
-- A JSON schema describing the target fields
+Given a prompt that contains EITHER:
+- A line `PAYSLIP_TEXT:` followed by already-extracted text (preferred path — use it directly, skip to Step 3).
+- OR a line `PDF_BASE64:` followed by a base64 blob (fallback path — decode then extract text first).

 Produce EXACTLY ONE JSON object on stdout matching the schema. No prose. No markdown fences. No preamble. No trailing commentary. The final message content must be a single valid JSON object and nothing else.

+## Fast path: PAYSLIP_TEXT is present
+
+If the prompt contains `PAYSLIP_TEXT:`, the caller has already run `pdftotext -layout`. Skip Steps 1-2 entirely — the text is already in your context. Go straight to Step 3.
+
 ## Processing steps

 ### Step 1. Extract and decode the base64 PDF