payslip-ingest/payslip_ingest/schema.py
Viktor Barzin 9105b6b79d extractor: track rsu_vest + rsu_offset separately from cash pay
UK payslips for equity-comp employees report RSU vests as notional pay
for HMRC only. A paired same-magnitude deduction (Shares Retained /
Stock Tax Withholding / RSU Offset) nets it back out of cash. The UK
payslip's income_tax line shows tax on the grossed-up total, but the
actual RSU tax is handled by Schwab (US broker) via share sale. No
cash flows through UK payroll for RSU.

Previously the extractor folded RSU notional into gross_pay and
income_tax, which inflated the dashboard numbers: a payslip with a
£25k RSU vest looked like 2x salary with an 80% tax rate.

Changes:
- schema: add rsu_vest + rsu_offset fields (default 0).
- db + alembic 0002: add two new NUMERIC(12,2) columns with server
  default 0 (backward-compatible; existing rows get 0).
- validate_totals: include rsu_offset in deductions sum so the
  rsu_vest inflation of gross_pay is properly netted out.
- extraction prompt: explicit rules for identifying RSU lines by the
  common Meta/Sage/Workday labels, and for keeping them out of
  other_deductions.

Dashboards in a follow-up commit: cash_gross = gross_pay - rsu_vest,
effective tax rate based on cash metrics.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 23:37:25 +00:00
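
The dashboard derivation named in the commit message above (cash_gross = gross_pay - rsu_vest) can be sketched as follows. This is only an illustration, not the follow-up dashboard code: the function name `cash_gross` and the sample figures are hypothetical, and the effective-tax-rate formula is left out because the commit message does not spell it out.

```python
from decimal import Decimal


def cash_gross(gross_pay: Decimal, rsu_vest: Decimal) -> Decimal:
    """Cash-only gross: strip the notional RSU vest out of reported gross."""
    return gross_pay - rsu_vest


# Hypothetical figures: a reported gross of 31,000.00 that includes a
# 25,000.00 notional RSU vest.
reported_gross = Decimal("31000.00")
vest = Decimal("25000.00")
print(cash_gross(reported_gross, vest))  # 6000.00
```

For a payslip without stock comp, `rsu_vest` defaults to 0 and the cash gross equals the reported gross.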


from datetime import date
from decimal import Decimal

from pydantic import BaseModel, ConfigDict, Field

TOTALS_TOLERANCE = Decimal("0.02")


class ExtractedPayslip(BaseModel):
    model_config = ConfigDict(extra="forbid")

    pay_date: date
    pay_period_start: date | None = None
    pay_period_end: date | None = None
    employer: str | None = None
    currency: str = "GBP"

    gross_pay: Decimal
    income_tax: Decimal = Field(default=Decimal("0"))
    national_insurance: Decimal = Field(default=Decimal("0"))
    pension_employee: Decimal = Field(default=Decimal("0"))
    pension_employer: Decimal = Field(default=Decimal("0"))
    student_loan: Decimal = Field(default=Decimal("0"))
    # RSU vest reported on the UK payslip is notional: the share grant is
    # handled by Schwab, which withholds US-side tax by selling shares. The
    # UK payslip only lists it for HMRC reporting; no cash flows through
    # UK payroll. Track it separately so dashboards can derive cash-only
    # gross = gross_pay - rsu_vest.
    rsu_vest: Decimal = Field(default=Decimal("0"))
    # Corresponding offset deduction that nets the RSU out of cash pay on the
    # UK slip (labels vary: "Shares Retained", "Stock Tax Withholding",
    # "RSU Offset", "Notional Pay Offset"). Same as rsu_vest in magnitude.
    rsu_offset: Decimal = Field(default=Decimal("0"))
    other_deductions: dict[str, Decimal] = Field(default_factory=dict)
    net_pay: Decimal


class WebhookPayload(BaseModel):
    model_config = ConfigDict(extra="forbid")

    document_id: int


def validate_totals(p: ExtractedPayslip) -> bool:
    """Check that gross - deductions ≈ net within a 2p tolerance.

    - Employer pension is excluded: it never leaves the employer's books.
    - `rsu_offset` is included as a deduction: it's the line that nets
      the RSU notional back out of cash pay on UK payslips with stock comp.
      `gross_pay` is inflated by `rsu_vest`, and an `rsu_offset` of equal
      size cancels that inflation.
    """
    deductions = (p.income_tax + p.national_insurance + p.pension_employee + p.student_loan +
                  p.rsu_offset +
                  sum(p.other_deductions.values(), start=Decimal("0")))
    diff = abs(p.gross_pay - deductions - p.net_pay)
    return diff < TOTALS_TOLERANCE
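
To see why `rsu_offset` has to be counted among the deductions, here is a self-contained worked example. It is a sketch under stated assumptions: `Slip` is a hypothetical stdlib stand-in for `ExtractedPayslip` carrying only the fields the check reads, `balance_gap` is a hypothetical helper that mirrors the arithmetic of `validate_totals`, and every figure is invented purely so the sums balance.

```python
from dataclasses import dataclass, field
from decimal import Decimal

TOLERANCE = Decimal("0.02")  # same value as TOTALS_TOLERANCE in the module


@dataclass
class Slip:
    """Hypothetical stand-in for ExtractedPayslip (not the real model)."""
    gross_pay: Decimal
    net_pay: Decimal
    income_tax: Decimal = Decimal("0")
    national_insurance: Decimal = Decimal("0")
    pension_employee: Decimal = Decimal("0")
    student_loan: Decimal = Decimal("0")
    rsu_offset: Decimal = Decimal("0")
    other_deductions: dict[str, Decimal] = field(default_factory=dict)


def balance_gap(s: Slip, include_rsu_offset: bool = True) -> Decimal:
    """Return |gross - deductions - net|, optionally ignoring rsu_offset."""
    deductions = (s.income_tax + s.national_insurance + s.pension_employee
                  + s.student_loan
                  + sum(s.other_deductions.values(), start=Decimal("0")))
    if include_rsu_offset:
        deductions += s.rsu_offset
    return abs(s.gross_pay - deductions - s.net_pay)


# Invented payslip: gross of 31,000.00 includes a 25,000.00 notional vest,
# which the paired 25,000.00 offset removes again before net pay.
slip = Slip(
    gross_pay=Decimal("31000.00"),
    net_pay=Decimal("3800.00"),
    income_tax=Decimal("1500.00"),
    national_insurance=Decimal("400.00"),
    pension_employee=Decimal("300.00"),
    rsu_offset=Decimal("25000.00"),
)

print(balance_gap(slip) < TOLERANCE)                            # True
print(balance_gap(slip, include_rsu_offset=False) < TOLERANCE)  # False
```

With the offset counted, the gap is zero and the check passes; without it, the gap equals the full vest amount and the payslip would be rejected.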