meta_uk parser: add variant A (2019-2022) + variant C (2022-2023)
## Context
The initial v2 parser (commit 9741816) only handled the modern template
(variant B, 2024+). Of Viktor's 73 real payslips in Paperless, 30 from
2021-07 through 2023-11 failed entirely: the Claude fallback errored on
them, so their rows were never stored. Running pdftotext (via
`kubectl exec`) on a sample of the failing docs revealed two previously
unseen layouts that the parser needs to handle directly:
- **Variant A** (2019 → mid-2022): single-column Description/This Period/
This Year. Parenthesized negatives `(152.90)`. Date format `Date : 31
Aug 2021`. Employer is `Facebook UK Ltd` (not `Limited`). RSU lines:
`RSU Gain Taxable` + `RSU Gain Nicable` + `RSU Net Cash UK` on the
earnings side with a matching `RSU Net Gain` on the deductions side.
BIK items (Private Dental/Medical) appear on both sides — net zero in
the gross, but the deduction-side copy must land in other_deductions
for the validation formula to hold.
- **Variant C** (late-2022 → 2023): side-by-side Payments|Deductions|
Year To Date (note capital "To", vs variant B's lowercase "to"). Date
format `Pay Date : 30.11.2022` (dots, not slashes). RSU labels use the
abbreviated `RSU Gain Taxabl` / `Nicabl` and still include the `RSU
Net Gain` offset. `Company Name : Facebook UK Limited` preamble.
Variant B (2024+) is unchanged.
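
The three date notations quoted above (plus variant B's `Pay Date: 27/02/2026`) can be told apart with one regex per layout. This is a minimal sketch modelled on the `_parse_date` change in this commit, not a copy of it: `DATE_PATTERNS` and `find_pay_date` are illustrative names, and returning `None` instead of raising `ParserError` is an assumption made to keep the example self-contained.

```python
from __future__ import annotations

import re
from datetime import date, datetime

# One (regex, strptime format) pair per layout, tried in order.
# Hypothetical table; the commit itself keeps three separate constants.
DATE_PATTERNS = [
    (re.compile(r"Pay Date\s*:\s*(\d{2}/\d{2}/\d{4})"), "%d/%m/%Y"),    # variant B
    (re.compile(r"Pay Date\s*:\s*(\d{2}\.\d{2}\.\d{4})"), "%d.%m.%Y"),  # variant C
    (re.compile(r"\bDate\s*:\s*(\d{1,2}\s+[A-Za-z]{3}\s+\d{4})"), "%d %b %Y"),  # variant A
]


def find_pay_date(text: str) -> date | None:
    """Return the first pay date found, or None if no layout matches."""
    for pattern, fmt in DATE_PATTERNS:
        m = pattern.search(text)
        if m:
            # Collapse runs of whitespace so "31  Aug  2021" still parses.
            raw = re.sub(r"\s+", " ", m.group(1))
            # Note: %b relies on English month abbreviations (default C locale).
            return datetime.strptime(raw, fmt).date()
    return None


print(find_pay_date("Pay By : BACS Date : 31 Aug 2021 Period : 5"))
print(find_pay_date("Pay Date : 30.11.2022 Tax Code : 0T"))
```

Ordering matters only in that the `Pay Date` patterns are tried before the bare `Date :` pattern, so a side-by-side payslip is never matched by the looser variant A regex first.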
## This change
### Parser refactor
- `EMPLOYER_RE = re.compile(r"Facebook UK (?:Limited|Ltd)\b")` — matches
all three eras.
- `AMOUNT_RE` now accepts both `-1,234.56` and `(1,234.56)` — variant A's
accounting-style parenthesized negatives normalize to `-1234.56` in
`_to_decimal`.
- `_parse_date` tries three formats in order: slash (B), dot (C), word (A).
- `_is_variant_b_or_c` collapses B and C into one detector (both have the
side-by-side header with `Year [Tt]o Date`); their parsers share code
because the column mechanics are identical — only the RSU-label set and
date format differ.
- `_parse_variant_a` is a full rewrite: single-column rows split by the
two `Total ...` anchors (payments → deductions), pay_date from the
header's `Date : ...`, gross from first Total, net from the trailing
`Net Pay` line, taxable_pay from the `Taxable Pay : This Period £X`
line at the bottom.
- `RSU_VEST_LABELS` covers 8 aliases for the B/C parser; variant A sums
over its own 6-label `VARIANT_A_RSU_LABELS`. In both, rsu_vest sums every
matching payment line, and rsu_offset maps to `RSU Net Gain` on the
deduction side when present (absent in variant B, present in A and C).
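
To make the amount normalization and the two-`Total` split concrete, here is a small sketch run against a few rows taken from the variant A fixture. The regex is the `AMOUNT_RE` from this commit; `to_decimal`, `split_blocks`, and `SAMPLE` are illustrative stand-ins rather than the repo's functions, and the column spacing in `SAMPLE` is approximated.

```python
import re
from decimal import Decimal

# Same alternation as the commit's AMOUNT_RE: "-1,234.56" or "(1,234.56)".
AMOUNT_RE = re.compile(r"-?\d{1,3}(?:,\d{3})*\.\d{2}|\(\d{1,3}(?:,\d{3})*\.\d{2}\)")

# A few rows from the 2021-08 fixture; spacing is approximate.
SAMPLE = """\
Salary                 5,096.65   25,483.25
AE Pension             (152.90)    (764.50)
Relocation Bonus                   8,184.20
Total                 25,738.37
Tax                    5,500.87   17,836.73
Student Loans          1,165.00    3,649.00
Total                 23,059.83
"""


def to_decimal(s: str) -> Decimal:
    """Turn "(152.90)" into Decimal("-152.90") and drop thousands separators."""
    s = s.strip()
    if s.startswith("(") and s.endswith(")"):
        s = "-" + s[1:-1]
    return Decimal(s.replace(",", ""))


def split_blocks(text: str) -> tuple[dict[str, Decimal], dict[str, Decimal]]:
    """Assign rows to payments until the first Total, then to deductions."""
    payments: dict[str, Decimal] = {}
    deductions: dict[str, Decimal] = {}
    block = payments
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Total "):
            if block is deductions:
                break  # second Total closes the deductions block
            block = deductions  # first Total closes the payments block
            continue
        matches = list(AMOUNT_RE.finditer(stripped))
        if len(matches) < 2:
            continue  # blank line, or a YTD-only row with no This Period value
        label = stripped[: matches[0].start()].strip()
        if label:
            block[label] = to_decimal(matches[0].group())
    return payments, deductions


payments, deductions = split_blocks(SAMPLE)
print(payments)
print(deductions)
```

`Relocation Bonus` carries only a This Year figure, so it is skipped for the period, which mirrors the `len(matches) < 2` guard in `_collect_a_blocks`.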
### Fixtures switched to real pdftotext output
Removed the two synthetic fixtures that no longer reflected real Meta
output (`meta_uk_2019_07.txt`, `meta_uk_2024_03_bonus_sacrificed.txt`)
and replaced with real pdftotext captures:
- `meta_uk_2021_08_variant_a.txt` (doc_id=43)
- `meta_uk_2022_11_variant_c.txt` (doc_id=53)
The remaining synthetic fixtures (`2025_03`, `2026_02`) stay because
they encode specific bonus/no-bonus scenarios and the numbers are
derived from the real Feb-2026 sample in the plan.
## Tests
- 10 parser tests: one per variant (A/B/C) + totals validation across
all 4 fixtures + the existing non-Meta/empty-input guards. All pass.
- 52 total tests across the repo, all green.
## Test Plan
### Automated
```
$ poetry run pytest
============================== 52 passed in 1.66s ==============================
$ poetry run ruff check .
All checks passed!
$ poetry run mypy .
Success: no issues found in 24 source files
```
### Manual verification (after deploy)
1. TRUNCATE + re-run backfill: expect the 73 real payslips to extract via
regex (≥95% hit rate), with validated rows rising from 42 to 70+.
2. Sample a row for each variant via psql: employer, rsu_vest, and
taxable_pay should all be populated.
## Reproduce locally
1. `poetry run pytest tests/test_meta_uk_parser.py -v`
2. Expected: 10 passed, each fixture validates totals to within 2p.
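
As a worked example of the 2p totals check, the figures in the variant A fixture (2021-08) balance exactly. The sketch below assumes the check is `gross - sum(deductions) - net` within a tolerance; the repo's actual validation formula is not shown in this commit, so `totals_balance` and `TOLERANCE` are hypothetical names.

```python
from decimal import Decimal

TOLERANCE = Decimal("0.02")  # "within 2p"


def totals_balance(
    gross: Decimal,
    deductions: dict[str, Decimal],
    net: Decimal,
    tolerance: Decimal = TOLERANCE,
) -> bool:
    """Assumed check: gross minus all deductions should equal net pay."""
    return abs(gross - sum(deductions.values(), Decimal("0")) - net) <= tolerance


# This Period deductions from meta_uk_2021_08_variant_a.txt.
deductions_2021_08 = {
    "Tax": Decimal("5500.87"),
    "National Insurance": Decimal("627.72"),
    "RSU Net Gain": Decimal("15666.13"),
    "Private Dental Insurance": Decimal("15.61"),
    "Private Medical Insurance": Decimal("84.50"),
    "Student Loans": Decimal("1165.00"),
}
gross = Decimal("25738.37")  # first Total
net = Decimal("2678.54")     # Net Pay line

print(totals_balance(gross, deductions_2021_08, net))

# Dropping the deduction-side BIK copies leaves a 100.11 gap.
without_bik = {k: v for k, v in deductions_2021_08.items() if "Insurance" not in k or k == "National Insurance"}
print(totals_balance(gross, without_bik, net))
```

The second call shows why the deduction-side BIK rows have to be kept in other_deductions: without them the same payslip is off by 15.61 + 84.50.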
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parent 974181674d
commit f62c5332e3
6 changed files with 439 additions and 263 deletions
@@ -1,22 +1,28 @@
 """Regex-based Meta UK payslip parser.
 
-Meta UK payslips use a stable template that splits into two layout variants
-with a hard boundary at the 2022-01-31 template change:
+Meta UK payslips come in three layout variants across 2019-2026:
 
-- Variant A (pre-2022): single-column "Description / This Period / This Year"
-layout. No RSU lines (Viktor's pre-vest tenure). AE Pension EE lists as a
-positive deduction against a pre-sacrifice gross.
+- **Variant A** (2019-mid-2022, seen as `Facebook UK Ltd`):
+Single-column `Description | This Period | This Year` layout. Parenthesized
+negatives `(152.90)` = -152.90. Date format `Date : 31 Aug 2021`. RSU
+labels: `RSU Gain Taxable`, `RSU Gain Nicable`, `RSU Net Cash UK`, plus a
+matching `RSU Net Gain` deduction. BIK items (Private Dental/Medical,
+EE Discount) appear as both earnings and deductions.
 
-- Variant B (post-2022): side-by-side "Payments | Deductions | Year to Date"
-three-column layout. AE Pension EE sits in the Payments column as a
-negative line — i.e. salary sacrifice reduces Total Payment before it hits
-PAYE. RSU vest arrives as two lines in Payments: "RSU Tax Offset" (the
-notional RSU value) and "RSU Excs Refund" (any over-withheld amount
-returned). Their sum is what we attribute as `rsu_vest`.
+- **Variant C** (late-2022 - 2023, `Facebook UK Limited`):
+Side-by-side `Payments | Deductions | Year To Date` (capital "To"). Date
+format `Pay Date : 30.11.2022` (dots). `Company Name : Facebook UK Limited`
+preamble. RSU labels use the abbreviated `RSU Gain Taxabl` / `Nicabl` and
+still have the `RSU Net Gain` offset.
 
-Parser returns `ExtractedPayslip`. On any structural miss (header not found,
-Pay Date missing, totals row malformed) it raises `ParserError` — the caller
-falls back to ClaudeExtractor so we never silently drop a payslip.
+- **Variant B** (2024+, `Facebook UK Limited`):
+Side-by-side `Payments | Deductions | Year to Date` (lowercase "to"). Date
+format `Pay Date: 27/02/2026` (slashes). RSU labels are `RSU Tax Offset`
++ `RSU Excs Refund`; there is NO matching offset deduction — the vest
+grosses up Taxable Pay and PAYE is on the grossed-up figure.
+
+Parser returns `ExtractedPayslip`. On any structural miss it raises
+`ParserError` so the caller falls back to ClaudeExtractor.
 """
 import re
 from datetime import date, datetime
@@ -29,30 +35,42 @@ class ParserError(ValueError):
     """Raised when the Meta UK template cannot be matched."""
 
 
-AMOUNT_RE = re.compile(r"-?\d{1,3}(?:,\d{3})*\.\d{2}")
-PAY_DATE_RE = re.compile(r"Pay Date:\s*(\d{2}/\d{2}/\d{4})")
-PERIOD_START_RE = re.compile(r"Period Start:\s*(\d{2}/\d{2}/\d{4})")
-PERIOD_END_RE = re.compile(r"Period End:\s*(\d{2}/\d{2}/\d{4})")
+# Two amount notations:
+# "-1,234.56" (slashes-era) and "(1,234.56)" (variant A parenthesized)
+AMOUNT_RE = re.compile(r"-?\d{1,3}(?:,\d{3})*\.\d{2}|\(\d{1,3}(?:,\d{3})*\.\d{2}\)")
 
-EMPLOYER = "Facebook UK Limited"
+# Pay Date / Date — three accepted formats:
+# "Pay Date: 27/02/2026"
+# "Pay Date : 30.11.2022"
+# "Date : 31 Aug 2021"
+PAY_DATE_SLASH_RE = re.compile(r"Pay Date\s*:\s*(\d{2}/\d{2}/\d{4})")
+PAY_DATE_DOT_RE = re.compile(r"Pay Date\s*:\s*(\d{2}\.\d{2}\.\d{4})")
+PAY_DATE_WORD_RE = re.compile(r"\bDate\s*:\s*(\d{1,2}\s+[A-Za-z]{3}\s+\d{4})")
+
+PERIOD_START_RE = re.compile(r"Period Start\s*:\s*(\d{2}/\d{2}/\d{4})")
+PERIOD_END_RE = re.compile(r"Period End\s*:\s*(\d{2}/\d{2}/\d{4})")
+
+EMPLOYER_RE = re.compile(r"Facebook UK (?:Limited|Ltd)\b")
 
 
 def parse_meta_uk(text: str) -> ExtractedPayslip:
     if not text.strip():
         raise ParserError("empty text")
-    if "Facebook UK Limited" not in text and "Meta Platforms" not in text:
+    employer_match = EMPLOYER_RE.search(text)
+    if not employer_match:
         raise ParserError("does not look like a Meta UK payslip")
+    employer = employer_match.group(0)
 
     lines = text.splitlines()
-    if _is_variant_b(lines):
-        return _parse_variant_b(text, lines)
+    if _is_variant_b_or_c(lines):
+        return _parse_variant_bc(text, lines, employer)
     if _is_variant_a(lines):
-        return _parse_variant_a(text, lines)
-    raise ParserError("neither variant A nor variant B header found")
+        return _parse_variant_a(text, lines, employer)
+    raise ParserError("neither side-by-side nor single-column header found")
 
 
-def _is_variant_b(lines: list[str]) -> bool:
-    return any("Payments" in line and "Deductions" in line and "Year to Date" in line
+def _is_variant_b_or_c(lines: list[str]) -> bool:
+    return any("Payments" in line and "Deductions" in line and re.search(r"Year [Tt]o Date", line)
                for line in lines)
 
 
@@ -62,26 +80,34 @@ def _is_variant_a(lines: list[str]) -> bool:
 
 
 def _to_decimal(s: str) -> Decimal:
+    s = s.strip()
+    if s.startswith("(") and s.endswith(")"):
+        s = "-" + s[1:-1]
     return Decimal(s.replace(",", ""))
 
 
-def _parse_uk_date(s: str) -> date:
-    return datetime.strptime(s, "%d/%m/%Y").date()
+def _parse_date(text: str) -> date:
+    """Try each supported format — whichever matches first wins."""
+    m = PAY_DATE_SLASH_RE.search(text)
+    if m:
+        return datetime.strptime(m.group(1), "%d/%m/%Y").date()
+    m = PAY_DATE_DOT_RE.search(text)
+    if m:
+        return datetime.strptime(m.group(1), "%d.%m.%Y").date()
+    m = PAY_DATE_WORD_RE.search(text)
+    if m:
+        raw = re.sub(r"\s+", " ", m.group(1)).strip()
+        return datetime.strptime(raw, "%d %b %Y").date()
+    raise ParserError("pay date not found")
 
 
-def _find_field(text: str, pattern: re.Pattern[str]) -> str | None:
+def _find_match(text: str, pattern: re.Pattern[str]) -> str | None:
     m = pattern.search(text)
     return m.group(1) if m else None
 
 
 def _last_amount(segment: str) -> tuple[str, Decimal | None]:
-    """Return (label, rightmost numeric amount) parsed out of one cell.
-
-    pdftotext -layout keeps Meta's column alignment stable, so each cell in
-    a row is "label ... amount" (optionally "label units rate amount" but
-    Meta leaves units/rate blank). We take the rightmost token as the
-    amount and whatever precedes it, stripped, as the label.
-    """
+    """Return (label, rightmost numeric amount)."""
     matches = list(AMOUNT_RE.finditer(segment))
     if not matches:
         return segment.strip(), None
@@ -90,44 +116,77 @@ def _last_amount(segment: str) -> tuple[str, Decimal | None]:
     return label, _to_decimal(last.group())
 
 
-def _parse_dates(text: str) -> tuple[date, date | None, date | None]:
-    pay_date_str = _find_field(text, PAY_DATE_RE)
-    if pay_date_str is None:
-        raise ParserError("Pay Date not found")
-    period_start = _find_field(text, PERIOD_START_RE)
-    period_end = _find_field(text, PERIOD_END_RE)
-    return (
-        _parse_uk_date(pay_date_str),
-        _parse_uk_date(period_start) if period_start else None,
-        _parse_uk_date(period_end) if period_end else None,
-    )
+# --------------------------------------------------------------------------
+# Variant B / C — side-by-side Payments | Deductions | Year to/To Date
+# --------------------------------------------------------------------------
+
+PAYMENTS_KNOWN = {
+    "Salary",
+    "Perform Bonus",
+    "Bonus",
+    "AE Pension EE",
+    "AE Pension",
+    "RSU Tax Offset",
+    "RSU Excs Refund",
+    "RSU Gain Taxabl",
+    "RSU Gain Nicabl",
+    "RSU Gain Taxable",
+    "RSU Gain Nicable",
+    "RSU Net Cash",
+    "RSU Net Cash UK",
+}
+DEDUCTIONS_KNOWN = {
+    "Tax paid",
+    "Tax",
+    "Employee NIC",
+    "National Insurance",
+    "Student Loans",
+    "Student Loan",
+    "RSU Net Gain",
+}
+RSU_VEST_LABELS = {
+    "RSU Tax Offset",
+    "RSU Excs Refund",
+    "RSU Gain Taxabl",
+    "RSU Gain Nicabl",
+    "RSU Gain Taxable",
+    "RSU Gain Nicable",
+    "RSU Net Cash",
+    "RSU Net Cash UK",
+}
 
 
-def _parse_variant_b(text: str, lines: list[str]) -> ExtractedPayslip:
-    header_idx, d_col, y_col = _find_variant_b_header(lines)
-    payments, payments_order, deductions = _collect_b_rows(lines, header_idx, d_col, y_col)
-    gross_pay, net_pay = _parse_b_totals_row(lines, header_idx, d_col, y_col)
-    summary = _parse_summary_block(lines)
+def _parse_variant_bc(text: str, lines: list[str], employer: str) -> ExtractedPayslip:
+    header_idx, d_col, y_col = _find_bc_header(lines)
+    payments, payments_order, deductions = _collect_bc_rows(lines, header_idx, d_col, y_col)
+    gross_pay, net_pay = _parse_bc_totals_row(lines, header_idx, d_col, y_col)
+    summary = _parse_bc_summary_block(lines)
 
-    ae_pension = payments.get("AE Pension EE", Decimal("0"))
+    ae_pension = payments.get("AE Pension EE", payments.get("AE Pension", Decimal("0")))
     pension_sacrifice = abs(ae_pension) if ae_pension < 0 else Decimal("0")
 
-    rsu_vest = (payments.get("RSU Tax Offset", Decimal("0")) +
-                payments.get("RSU Excs Refund", Decimal("0")))
+    rsu_vest = sum((payments.get(label, Decimal("0")) for label in RSU_VEST_LABELS),
+                   start=Decimal("0"))
+    rsu_offset = deductions.get("RSU Net Gain", Decimal("0"))
 
     income_tax = deductions.get("Tax paid", deductions.get("Tax", Decimal("0")))
     nic = deductions.get("Employee NIC", deductions.get("National Insurance", Decimal("0")))
     student_loan = deductions.get("Student Loans", deductions.get("Student Loan", Decimal("0")))
 
-    other_deductions = _build_other_deductions_b(deductions, payments_order)
+    other_deductions = {k: v for k, v in deductions.items() if k not in DEDUCTIONS_KNOWN}
+    del payments_order  # retained for future debugging; not used in validation
 
-    pay_date, period_start, period_end = _parse_dates(text)
+    pay_date = _parse_date(text)
+    period_start_s = _find_match(text, PERIOD_START_RE)
+    period_end_s = _find_match(text, PERIOD_END_RE)
+    period_start = datetime.strptime(period_start_s, "%d/%m/%Y").date() if period_start_s else None
+    period_end = datetime.strptime(period_end_s, "%d/%m/%Y").date() if period_end_s else None
 
     return ExtractedPayslip(
         pay_date=pay_date,
         pay_period_start=period_start,
         pay_period_end=period_end,
-        employer=EMPLOYER,
+        employer=employer,
         currency="GBP",
         gross_pay=gross_pay,
         income_tax=income_tax,
@@ -136,7 +195,7 @@ def _parse_variant_b(text: str, lines: list[str]) -> ExtractedPayslip:
         pension_employer=Decimal("0"),
         student_loan=student_loan,
         rsu_vest=rsu_vest,
-        rsu_offset=Decimal("0"),
+        rsu_offset=rsu_offset,
         salary=payments.get("Salary", Decimal("0")),
         bonus=payments.get("Perform Bonus", payments.get("Bonus", Decimal("0"))),
         pension_sacrifice=pension_sacrifice,
@@ -149,14 +208,17 @@ def _parse_variant_b(text: str, lines: list[str]) -> ExtractedPayslip:
     )
 
 
-def _find_variant_b_header(lines: list[str]) -> tuple[int, int, int]:
+def _find_bc_header(lines: list[str]) -> tuple[int, int, int]:
     for i, line in enumerate(lines):
-        if "Payments" in line and "Deductions" in line and "Year to Date" in line:
-            return i, line.index("Deductions"), line.index("Year to Date")
-    raise ParserError("variant B header not found")
+        if ("Payments" in line and "Deductions" in line and re.search(r"Year [Tt]o Date", line)):
+            # Columns anchored on left edge of "Deductions" / "Year [Tt]o Date"
+            ytd_match = re.search(r"Year [Tt]o Date", line)
+            assert ytd_match is not None
+            return i, line.index("Deductions"), ytd_match.start()
+    raise ParserError("variant B/C header not found")
 
 
-def _collect_b_rows(
+def _collect_bc_rows(
     lines: list[str],
     header_idx: int,
     d_col: int,
@@ -167,9 +229,9 @@ def _collect_b_rows(
     deductions: dict[str, Decimal] = {}
     for i in range(header_idx + 1, len(lines)):
         line = lines[i].rstrip()
-        if not line.strip() or "Total Payment" in line:
-            if "Total Payment" in line:
-                return payments, order, deductions
+        if "Total Payment" in line:
+            return payments, order, deductions
+        if not line.strip():
             continue
         p_seg = line[:d_col] if len(line) > d_col else line
         d_seg = line[d_col:y_col] if len(line) > d_col else ""
@@ -179,24 +241,31 @@ def _collect_b_rows(
             order.append((p_label, p_amount))
         d_label, d_amount = _last_amount(d_seg)
         if d_label and d_amount is not None:
+            # RSU Net Gain can show as negative on the YTD side duplication;
+            # normalize to absolute value on the deductions side.
+            if d_label == "RSU Net Gain":
+                d_amount = abs(d_amount)
             deductions[d_label] = d_amount
     return payments, order, deductions
 
 
-def _parse_b_totals_row(
+def _parse_bc_totals_row(
     lines: list[str],
     header_idx: int,
     d_col: int,
     y_col: int,
 ) -> tuple[Decimal, Decimal]:
+    del y_col  # "Net Pay:" aligns with the Amount column, not the left edge of YTD
     for i in range(header_idx + 1, len(lines)):
         line = lines[i]
         if "Total Payment" not in line:
             continue
         p_seg = line[:d_col] if len(line) > d_col else line
-        y_seg = line[y_col:] if len(line) > y_col else ""
         _, gross_pay = _last_amount(p_seg)
-        _, net_pay = _last_amount(y_seg) if "Net Pay" in y_seg else (None, None)
+        net_pay_idx = line.find("Net Pay")
+        if net_pay_idx < 0:
+            raise ParserError("Net Pay missing from totals row")
+        _, net_pay = _last_amount(line[net_pay_idx:])
         if gross_pay is None:
             raise ParserError("Total Payment amount missing")
         if net_pay is None:
@@ -205,13 +274,8 @@ def _parse_b_totals_row(
     raise ParserError("totals row not found")
 
 
-def _parse_summary_block(lines: list[str]) -> dict[str, Decimal]:
-    """Pull Taxable Pay (this period + YTD), Tax Paid (YTD), Total Gross (YTD).
-
-    The summary sits after the totals row. Each row has 4 columns but only
-    the numeric ones matter; we use "2+ numbers on a line starting with
-    LABEL:" as the anchor, period-value first, YTD second.
-    """
+def _parse_bc_summary_block(lines: list[str]) -> dict[str, Decimal]:
+    """Pull Taxable Pay (this period + YTD), Tax Paid (YTD), Total Gross (YTD)."""
     result: dict[str, Decimal] = {}
     for line in lines:
         stripped = line.lstrip()
@@ -232,80 +296,98 @@ def _parse_summary_block(lines: list[str]) -> dict[str, Decimal]:
     return result
 
 
-PAYMENTS_KNOWN = {
+# --------------------------------------------------------------------------
+# Variant A — single-column Description | This Period | This Year
+# --------------------------------------------------------------------------
+
+VARIANT_A_PAYMENTS_KNOWN = {
     "Salary",
-    "Perform Bonus",
     "Bonus",
+    "Perform Bonus",
+    "Relocation Bonus",
     "AE Pension EE",
-    "RSU Tax Offset",
-    "RSU Excs Refund",
+    "AE Pension",
+    "Laundry Expense",
+    "Transportation Allowance",
+    "EE Edu Assist",
+    "RSU Gain Taxable",
+    "RSU Gain Nicable",
+    "RSU Gain Taxabl",
+    "RSU Gain Nicabl",
+    "RSU Net Cash",
+    "RSU Net Cash UK",
+    # BIK earnings mirrored on the deduction side — we exclude them from
+    # bonus/other_earnings so they don't double-count.
+    "Private Dental Insurance",
+    "Private Medical Insurance",
+    "EE Discount BIK",
 }
-DEDUCTIONS_KNOWN = {
-    "Tax paid",
+VARIANT_A_DEDUCTIONS_KNOWN = {
     "Tax",
-    "Employee NIC",
     "National Insurance",
     "Student Loans",
     "Student Loan",
+    "RSU Net Gain",
+    "EE Discount BIK",
 }
 
+VARIANT_A_RSU_LABELS = {
+    "RSU Gain Taxable",
+    "RSU Gain Nicable",
+    "RSU Gain Taxabl",
+    "RSU Gain Nicabl",
+    "RSU Net Cash",
+    "RSU Net Cash UK",
+}
 
-def _build_other_deductions_b(
-    deductions: dict[str, Decimal],
-    payments_order: list[tuple[str, Decimal]],
-) -> dict[str, Decimal]:
-    # Negative payments (Cycle To Work, Share Save, AE Pension EE) are
-    # already subtracted from Total Payment — adding them here would
-    # double-count in the validation formula. They remain visible in
-    # raw_extraction for historical reference.
-    del payments_order
-    return {k: v for k, v in deductions.items() if k not in DEDUCTIONS_KNOWN}
+# "Taxable Pay : This Period £15323.16 : To Date £52446.53"
+TAXABLE_PAY_A_RE = re.compile(r"Taxable Pay\s*:\s*This Period\s*£([\d,]+\.\d{2})")
+NET_PAY_A_RE = re.compile(r"Net Pay\s+(-?[\d,]+\.\d{2})")
 
 
-def _parse_variant_a(text: str, lines: list[str]) -> ExtractedPayslip:
+def _parse_variant_a(text: str, lines: list[str], employer: str) -> ExtractedPayslip:
     header_idx = _find_variant_a_header(lines)
-    items = _collect_a_rows(lines, header_idx)
-    gross_pay, net_pay = _parse_a_gross_net(lines)
+    payments, deductions = _collect_a_blocks(lines, header_idx)
+    gross_pay = _parse_a_gross(lines, header_idx, payments)
+    net_pay = _parse_a_net(text)
 
-    salary = items.get("Salary", Decimal("0"))
-    bonus = items.get("Bonus", Decimal("0"))
-    taxable_pay = items.get("Taxable Pay")
-    income_tax = items.get("Tax", Decimal("0"))
-    nic = items.get("National Insurance", Decimal("0"))
-    student_loan = items.get("Student Loans", items.get("Student Loan", Decimal("0")))
-    pension_employee = items.get("AE Pension EE", Decimal("0"))
+    ae_pension = payments.get("AE Pension EE", payments.get("AE Pension", Decimal("0")))
+    pension_sacrifice = abs(ae_pension) if ae_pension < 0 else Decimal("0")
 
-    known = {
-        "Salary",
-        "Bonus",
-        "Taxable Pay",
-        "Tax",
-        "National Insurance",
-        "Student Loans",
-        "Student Loan",
-        "AE Pension EE",
-    }
-    other_deductions = {k: v for k, v in items.items() if k not in known}
+    rsu_vest = sum((payments.get(label, Decimal("0")) for label in VARIANT_A_RSU_LABELS),
+                   start=Decimal("0"))
+    rsu_offset = deductions.get("RSU Net Gain", Decimal("0"))
 
-    pay_date, period_start, period_end = _parse_dates(text)
+    income_tax = deductions.get("Tax", Decimal("0"))
+    nic = deductions.get("National Insurance", Decimal("0"))
+    student_loan = deductions.get("Student Loans", deductions.get("Student Loan", Decimal("0")))
+
+    other_deductions = {k: v for k, v in deductions.items() if k not in VARIANT_A_DEDUCTIONS_KNOWN}
+
+    bonus = payments.get("Perform Bonus", payments.get("Bonus", Decimal("0")))
+
+    taxable_pay_s = _find_match(text, TAXABLE_PAY_A_RE)
+    taxable_pay = _to_decimal(taxable_pay_s) if taxable_pay_s else None
+
+    pay_date = _parse_date(text)
 
     return ExtractedPayslip(
         pay_date=pay_date,
-        pay_period_start=period_start,
-        pay_period_end=period_end,
-        employer=EMPLOYER,
+        pay_period_start=None,
+        pay_period_end=None,
+        employer=employer,
         currency="GBP",
         gross_pay=gross_pay,
         income_tax=income_tax,
         national_insurance=nic,
-        pension_employee=pension_employee,
+        pension_employee=Decimal("0"),
         pension_employer=Decimal("0"),
         student_loan=student_loan,
-        rsu_vest=Decimal("0"),
-        rsu_offset=Decimal("0"),
-        salary=salary,
+        rsu_vest=rsu_vest,
+        rsu_offset=rsu_offset,
+        salary=payments.get("Salary", Decimal("0")),
         bonus=bonus,
-        pension_sacrifice=Decimal("0"),
+        pension_sacrifice=pension_sacrifice,
         taxable_pay=taxable_pay,
         ytd_tax_paid=None,
         ytd_taxable_pay=None,
@@ -322,37 +404,70 @@ def _find_variant_a_header(lines: list[str]) -> int:
     raise ParserError("variant A header not found")
 
 
-def _collect_a_rows(lines: list[str], header_idx: int) -> dict[str, Decimal]:
-    items: dict[str, Decimal] = {}
+def _collect_a_blocks(
+    lines: list[str],
+    header_idx: int,
+) -> tuple[dict[str, Decimal], dict[str, Decimal]]:
+    """Split variant A rows into Payments vs Deductions by the two `Total` anchors.
+
+    Layout: header → payments rows → `Total <gross>` → deductions rows →
+    `Total <deductions>` → `Net Pay <net>`. We collect rows into whichever
+    block we're currently in.
+    """
+    payments: dict[str, Decimal] = {}
+    deductions: dict[str, Decimal] = {}
+    block = payments
+    total_count = 0
     for i in range(header_idx + 1, len(lines)):
-        line = lines[i].rstrip()
-        if not line.strip() or line.lstrip().startswith("-"):
+        raw = lines[i].rstrip()
+        if not raw.strip():
             continue
-        if "Gross Pay" in line or "Net Pay" in line:
+        stripped = raw.strip()
+        if stripped.startswith("Total ") or stripped.startswith("Total\t"):
+            total_count += 1
+            if total_count == 1:
+                block = deductions
+                continue
+            if total_count == 2:
+                break
+        if "Net Pay" in raw:
             break
-        amounts = list(AMOUNT_RE.finditer(line))
-        if not amounts:
+        matches = list(AMOUNT_RE.finditer(raw))
+        if not matches:
             continue
-        label = line[:amounts[0].start()].strip()
-        if label:
-            items[label] = _to_decimal(amounts[0].group())
-    return items
+        label = raw[:matches[0].start()].strip()
+        if not label:
+            continue
+        # "This Period" value is the first amount; "This Year" is the second.
+        # If only one amount is present, it's a YTD-only row (e.g. Relocation
+        # Bonus which doesn't apply this period) — skip it for the period totals.
+        if len(matches) < 2:
+            continue
+        amount = _to_decimal(matches[0].group())
+        block[label] = amount
+    return payments, deductions
 
 
-def _parse_a_gross_net(lines: list[str]) -> tuple[Decimal, Decimal]:
-    gross_pay: Decimal | None = None
-    net_pay: Decimal | None = None
-    for line in lines:
-        if "Gross Pay" in line and gross_pay is None:
-            nums = AMOUNT_RE.findall(line)
+def _parse_a_gross(
+    lines: list[str],
+    header_idx: int,
+    payments: dict[str, Decimal],
+) -> Decimal:
+    """Pull the first `Total <amount>` after the header — that's gross pay."""
+    for i in range(header_idx + 1, len(lines)):
+        stripped = lines[i].strip()
+        if stripped.startswith("Total "):
+            nums = AMOUNT_RE.findall(stripped)
             if nums:
-                gross_pay = _to_decimal(nums[0])
-        if "Net Pay" in line and net_pay is None:
-            nums = AMOUNT_RE.findall(line)
-            if nums:
-                net_pay = _to_decimal(nums[0])
-    if gross_pay is None:
-        raise ParserError("Gross Pay not found")
-    if net_pay is None:
-        raise ParserError("Net Pay not found")
-    return gross_pay, net_pay
+                return _to_decimal(nums[0])
+    # Fallback: sum payments values if the Total line is missing.
+    if payments:
+        return sum(payments.values(), start=Decimal("0"))
+    raise ParserError("Total (gross pay) row not found in variant A")
+
+
+def _parse_a_net(text: str) -> Decimal:
+    m = NET_PAY_A_RE.search(text)
+    if not m:
+        raise ParserError("Net Pay line not found in variant A")
+    return _to_decimal(m.group(1))
--- a/tests/fixtures/meta_uk_2019_07.txt
+++ /dev/null
@@ -1,21 +0,0 @@
-Facebook UK Limited Payslip
-
-Employee: Viktor Barzin NI Number: AA123456A
-Employee No: 254680 Tax Code: 1185L
-Pay Date: 31/07/2019 Pay Period: 4
-Period Start: 01/07/2019 Period End: 31/07/2019
-
-
-Description This Period This Year
----------------------------------------------------------------------
-Salary 7,083.33 28,333.32
-Taxable Pay 6,583.33 26,333.32
-Tax 1,480.00 5,920.00
-National Insurance 564.73 2,258.92
-AE Pension EE 500.00 2,000.00
-Student Loans 120.00 480.00
-
----------------------------------------------------------------------
-
-Gross Pay: 7,083.33
-Net Pay: 4,418.60
--- /dev/null
+++ b/tests/fixtures/meta_uk_2021_08_variant_a.txt
@@ -0,0 +1,44 @@
+254680A Mr Viktor Barzin Facebook UK Ltd
+
+NI No : SZ762223D NI Letter : A Tax Code : 0T Pay By : BACS Date : 31 Aug 2021 Period : 5
+Description Rate Units This Period This Year
+
+Salary 5,096.65 25,483.25
+AE Pension (152.90) (764.50)
+Laundry Expense 40.00 200.00
+Relocation Bonus 8,184.20
+RSU Gain Taxable 10,239.30 18,757.93
+RSU Gain Nicable 10,239.30 18,757.93
+Transportation Allowance 73.10
+RSU Net Cash UK 175.91 207.29
+Private Dental Insurance 15.61 78.05
+Private Medical Insurance 84.50 422.50
+EE Discount BIK 12.00
+
+
+
+
+Total 25,738.37
+
+Tax 5,500.87 17,836.73
+National Insurance 627.72 2,655.22
+RSU Net Gain 15,666.13 28,699.63
+Private Dental Insurance 15.61 78.05
+Private Medical Insurance 84.50 422.50
+EE Discount BIK 12.00
+Student Loans 1,165.00 3,649.00
+
+
+
+
+Total 23,059.83
+Tax District : Pay As You Earn
+
+Tax Reference : 846/BA09294 Net Pay 2,678.54
+
+Taxable Pay : This Period £15323.16 : To Date £52446.53
+Employers NIC This Period : 1,999.08
+Employers NIC To Date : 6,660.03
+Employers Pension This Period : 458.70
+Employers Pension To Date :2,293.50
+
--- /dev/null
+++ b/tests/fixtures/meta_uk_2022_11_variant_c.txt
@@ -0,0 +1,60 @@
+Page : 1
+
+
+
+Employee Number: 254680
+
+Facebook UK Limited
+
+
+
+
+PRIVATE & CONFIDENTIAL
+Viktor Barzin
+Flat 37
+Spenlow Apartments
+London
+N1 7GH
+
+
+
+
+Company Name : Facebook UK Limited Tax Ref : 846/BA09294 Nat Ins No : SZ762223D
+Pay Date : 30.11.2022 Tax Code : 0T Nat Ins Cat: A
+Pay Method : BACS Transfer Tax Basis : 0 Cost Centre: 4220
+Pay Period : 08/2022 Tax Period : 08
+
+
+Payments Units Rate Amount Deductions Amount Year To Date Amount
+
+Salary 8,983.33 Tax paid 5,800.07 Salary 8,983.33
+AE Pension EE -539.00 Employee NIC 612.65 RSU Gain Taxabl 7,531.31
+RSU Gain Taxabl 7,531.31 Student Loan 1,233.00 RSU Gain Nicabl 7,531.31
+RSU Gain Nicabl 7,531.31 RSU Net Gain 11,522.91 RSU Net Cash 129.38
+RSU Net Cash 129.38 Student Loan 7,271.00
+RSU Net Gain -11,522.91
+
+
+
+
+ER Pension This Period
+AE Pension ER 808.50
+
+
+
+
+Total Payment: 23,636.33 Total Deduction : 19,168.63 Net Pay: 4,467.70
+
+
+This Period Amount Year To Date Amount Gross Benefits Payments
+----------------------------------
+Total Gross: 23,636.33 Total Gross: 131,034.64 Dent Ins TaxB 17.83
+Non-Tax Ded: 0.00 Non-Tax ded: 0.00 Medi Ins TaxB 76.67
+Taxable Pay: 16,070.14 Taxable Pay: 99,784.08
+Tax Paid: 5,800.07 Tax Paid: 34,886.93
+EEs NI : 612.65 EEs NI: 5,361.55
+ERs NI : 2,100.04 ERs NI: 13,800.83
+EEs Pension: -539.00 EEs Pension: -3,596.28
+EEs AVC: 0.00 EEs AVC 0.00
+ERs Pension: 5,504.38
+
|
|
@@ -1,24 +0,0 @@
|
|||
Facebook UK Limited Payslip
|
||||
|
||||
Employee: Viktor Barzin NI Number: AA123456A Pay Date: 27/03/2024
|
||||
Employee No: 254680 Tax Code: 1257L Pay Period: 12
|
||||
Department: Engineering Period Start: 01/03/2024
|
||||
Period End: 31/03/2024
|
||||
|
||||
|
||||
Payments Units Rate Amount Deductions Amount Year to Date Amount
|
||||
Salary 9,500.00 Tax paid 800.00 Salary 114,000.00
|
||||
Perform Bonus 0.00 Employee NIC 280.00 Transportation 820.50
|
||||
AE Pension EE -6,200.00 Student Loans 90.00
|
||||
|
||||
|
||||
--------- ---------
|
||||
Total Payment: 3,300.00 Total Deduction : 1,170.00 Net Pay: 2,130.00
|
||||
|
||||
|
||||
This Period Amount Year To Date Amount
|
||||
Total Gross: 3,300.00 Total Gross: 210,000.00
|
||||
Taxable Pay: 3,300.00 Taxable Pay: 185,000.00
|
||||
Tax Paid: 800.00 Tax Paid: 42,000.00
|
||||
EEs NI: 280.00 EEs NI: 9,100.00
|
||||
EEs Pension: -6,200.00 EEs Pension: -52,000.00
|
||||
|
|
@@ -13,28 +13,24 @@ def _load(name: str) -> str:
|
|||
return (FIXTURES / name).read_text(encoding="utf-8")
|
||||
|
||||
|
||||
def test_parses_variant_b_standard_month() -> None:
|
||||
"""Feb 2026 — variant B, RSU vesting, no bonus, salary-sacrifice pension."""
|
||||
def test_parses_variant_b_modern() -> None:
|
||||
"""Feb 2026 — variant B (post-2024), RSU vest, salary-sacrifice pension."""
|
||||
result = parse_meta_uk(_load("meta_uk_2026_02.txt"))
|
||||
|
||||
assert result.pay_date == date(2026, 2, 27)
|
||||
assert result.pay_period_start == date(2026, 2, 1)
|
||||
assert result.pay_period_end == date(2026, 2, 27)
|
||||
assert result.employer == "Facebook UK Limited"
|
||||
assert result.currency == "GBP"
|
||||
|
||||
assert result.salary == Decimal("10003.33")
|
||||
assert result.bonus == Decimal("0")
|
||||
assert result.pension_sacrifice == Decimal("600.20")
|
||||
# rsu_vest = RSU Tax Offset + RSU Excs Refund
|
||||
assert result.rsu_vest == Decimal("30479.76")
|
||||
assert result.rsu_offset == Decimal("0")
|
||||
assert result.rsu_vest == Decimal("30479.76") # RSU Tax Offset + RSU Excs Refund
|
||||
assert result.rsu_offset == Decimal("0") # modern Meta template omits offset
|
||||
|
||||
assert result.gross_pay == Decimal("39882.89")
|
||||
assert result.income_tax == Decimal("31311.90")
|
||||
assert result.national_insurance == Decimal("1602.89")
|
||||
assert result.pension_employee == Decimal("0")
|
||||
assert result.student_loan == Decimal("0")
|
||||
assert result.net_pay == Decimal("6968.10")
|
||||
|
||||
assert result.taxable_pay == Decimal("72096.92")
|
||||
|
|
@@ -43,8 +39,8 @@ def test_parses_variant_b_standard_month() -> None:
|
|||
assert result.ytd_gross == Decimal("232630.34")
|
||||
|
||||
|
||||
def test_parses_variant_b_with_bonus_and_rsu() -> None:
|
||||
"""March 2025 — variant B, bonus month, RSU vesting, multiple other deductions."""
|
||||
def test_parses_variant_b_with_bonus() -> None:
|
||||
"""March 2025 — variant B, bonus + RSU + multiple other deductions."""
|
||||
result = parse_meta_uk(_load("meta_uk_2025_03.txt"))
|
||||
|
||||
assert result.pay_date == date(2025, 3, 27)
|
||||
|
|
@@ -54,62 +50,74 @@ def test_parses_variant_b_with_bonus_and_rsu() -> None:
|
|||
assert result.rsu_vest == Decimal("20000.00")
|
||||
|
||||
assert result.gross_pay == Decimal("53720.00")
|
||||
assert result.income_tax == Decimal("45210.44")
|
||||
assert result.national_insurance == Decimal("2750.12")
|
||||
assert result.student_loan == Decimal("850.00")
|
||||
assert result.net_pay == Decimal("4753.69")
|
||||
|
||||
# Private Medical comes from the Deductions column. Cycle To Work is a
|
||||
# negative Payments line — already subtracted from Total Payment, so it
|
||||
# does NOT belong in other_deductions (that would double-count).
|
||||
assert "Private Medical" in result.other_deductions
|
||||
assert result.other_deductions["Private Medical"] == Decimal("155.75")
|
||||
assert "Cycle To Work" not in result.other_deductions
|
||||
|
||||
|
||||
def test_parses_variant_b_bonus_sacrificed() -> None:
|
||||
"""March 2024 — variant B, full bonus sacrificed into pension, bonus line = 0."""
|
||||
result = parse_meta_uk(_load("meta_uk_2024_03_bonus_sacrificed.txt"))
|
||||
def test_parses_variant_c_2022_11() -> None:
|
||||
"""Nov 2022 — mid-era template. Real pdftotext from doc_id=53.
|
||||
|
||||
assert result.pay_date == date(2024, 3, 27)
|
||||
assert result.salary == Decimal("9500.00")
|
||||
# Bonus line present but zero — parser should surface this so the dashboard
|
||||
# can highlight the "bonus sacrificed" dip.
|
||||
assert result.bonus == Decimal("0")
|
||||
# Big pension sacrifice dwarfs the salary — this is the signal we care about.
|
||||
assert result.pension_sacrifice == Decimal("6200.00")
|
||||
assert result.rsu_vest == Decimal("0")
|
||||
|
||||
assert result.gross_pay == Decimal("3300.00")
|
||||
assert result.net_pay == Decimal("2130.00")
|
||||
|
||||
|
||||
def test_parses_variant_a_pre_2022() -> None:
|
||||
"""July 2019 — variant A, pre-RSU, single-column layout.
|
||||
|
||||
Variant A lists AE Pension EE as a positive deduction (pre-sacrifice gross),
|
||||
so it maps to `pension_employee` for the standard validation formula to hold.
|
||||
Variant B lists it as a negative payment (post-sacrifice gross) and maps to
|
||||
`pension_sacrifice` instead. Both represent money going into the pension.
|
||||
Side-by-side Payments | Deductions | Year To Date (capital "To"), dot-
|
||||
separated date, RSU labels use `RSU Gain Taxabl` / `Nicabl` (abbreviated)
|
||||
and a matching `RSU Net Gain` offset on the deductions side.
|
||||
"""
|
||||
result = parse_meta_uk(_load("meta_uk_2019_07.txt"))
|
||||
result = parse_meta_uk(_load("meta_uk_2022_11_variant_c.txt"))
|
||||
|
||||
assert result.pay_date == date(2019, 7, 31)
|
||||
assert result.pay_date == date(2022, 11, 30)
|
||||
assert result.employer == "Facebook UK Limited"
|
||||
assert result.salary == Decimal("7083.33")
|
||||
|
||||
assert result.salary == Decimal("8983.33")
|
||||
assert result.bonus == Decimal("0")
|
||||
assert result.rsu_vest == Decimal("0")
|
||||
assert result.pension_sacrifice == Decimal("0")
|
||||
assert result.pension_employee == Decimal("500.00")
|
||||
assert result.pension_sacrifice == Decimal("539.00")
|
||||
# rsu_vest = RSU Gain Taxabl + RSU Gain Nicabl + RSU Net Cash
|
||||
assert result.rsu_vest == Decimal("15192.00")
|
||||
# rsu_offset = RSU Net Gain (the matching deduction)
|
||||
assert result.rsu_offset == Decimal("11522.91")
|
||||
|
||||
assert result.gross_pay == Decimal("7083.33")
|
||||
assert result.income_tax == Decimal("1480.00")
|
||||
assert result.national_insurance == Decimal("564.73")
|
||||
assert result.student_loan == Decimal("120.00")
|
||||
assert result.net_pay == Decimal("4418.60")
|
||||
assert result.gross_pay == Decimal("23636.33")
|
||||
assert result.income_tax == Decimal("5800.07")
|
||||
assert result.national_insurance == Decimal("612.65")
|
||||
assert result.student_loan == Decimal("1233.00")
|
||||
assert result.net_pay == Decimal("4467.70")
|
||||
|
||||
# Variant A carries a "Taxable Pay" line inline
|
||||
assert result.taxable_pay == Decimal("6583.33")
|
||||
assert result.taxable_pay == Decimal("16070.14")
|
||||
assert result.ytd_tax_paid == Decimal("34886.93")
|
||||
assert result.ytd_taxable_pay == Decimal("99784.08")
|
||||
assert result.ytd_gross == Decimal("131034.64")
|
||||
|
||||
|
||||
def test_parses_variant_a_2021_08() -> None:
|
||||
"""Aug 2021 — variant A. Real pdftotext from doc_id=43.
|
||||
|
||||
Single-column Description | This Period | This Year layout. Parenthesized
|
||||
negatives `(152.90)`, Facebook UK Ltd (not Limited), date `Date : 31 Aug
|
||||
2021`. BIK items (Dental/Medical) appear as both earnings and deductions.
|
||||
"""
|
||||
result = parse_meta_uk(_load("meta_uk_2021_08_variant_a.txt"))
|
||||
|
||||
assert result.pay_date == date(2021, 8, 31)
|
||||
assert result.employer == "Facebook UK Ltd"
|
||||
|
||||
assert result.salary == Decimal("5096.65")
|
||||
assert result.bonus == Decimal("0")
|
||||
assert result.pension_sacrifice == Decimal("152.90")
|
||||
# rsu_vest = RSU Gain Taxable + RSU Gain Nicable + RSU Net Cash UK
|
||||
assert result.rsu_vest == Decimal("20654.51")
|
||||
assert result.rsu_offset == Decimal("15666.13")
|
||||
|
||||
assert result.gross_pay == Decimal("25738.37")
|
||||
assert result.income_tax == Decimal("5500.87")
|
||||
assert result.national_insurance == Decimal("627.72")
|
||||
assert result.student_loan == Decimal("1165.00")
|
||||
assert result.net_pay == Decimal("2678.54")
|
||||
|
||||
# BIK offsets on the deductions side
|
||||
assert result.other_deductions.get("Private Dental Insurance") == Decimal("15.61")
|
||||
assert result.other_deductions.get("Private Medical Insurance") == Decimal("84.50")
|
||||
|
||||
# Variant A surfaces Taxable Pay via a trailing line `Taxable Pay : This
|
||||
# Period £XXXX.XX : To Date £YYYY.YY`.
|
||||
assert result.taxable_pay == Decimal("15323.16")
|
||||
|
||||
|
||||
def test_raises_on_non_meta_payslip() -> None:
|
||||
|
|
@@ -122,17 +130,11 @@ def test_raises_on_empty_text() -> None:
|
|||
parse_meta_uk("")
|
||||
|
||||
|
||||
def test_raises_when_pay_date_missing() -> None:
|
||||
broken = "Facebook UK Limited\nPayslip\nSalary 1000.00\nNet Pay: 800.00\n"
|
||||
with pytest.raises(ParserError):
|
||||
parse_meta_uk(broken)
|
||||
|
||||
|
||||
@pytest.mark.parametrize("fixture_name", [
|
||||
"meta_uk_2026_02.txt",
|
||||
"meta_uk_2025_03.txt",
|
||||
"meta_uk_2024_03_bonus_sacrificed.txt",
|
||||
"meta_uk_2019_07.txt",
|
||||
"meta_uk_2022_11_variant_c.txt",
|
||||
"meta_uk_2021_08_variant_a.txt",
|
||||
])
|
||||
def test_all_fixtures_validate_totals(fixture_name: str) -> None:
|
||||
"""Every fixture must satisfy gross - deductions ≈ net within 2p."""
|
||||
|
|
@@ -140,7 +142,7 @@ def test_all_fixtures_validate_totals(fixture_name: str) -> None:
|
|||
|
||||
result = parse_meta_uk(_load(fixture_name))
|
||||
assert validate_totals(result), (
|
||||
f"{fixture_name}: gross={result.gross_pay} "
|
||||
f"tax={result.income_tax} nic={result.national_insurance} "
|
||||
f"student={result.student_loan} other={result.other_deductions} "
|
||||
f"net={result.net_pay}")
|
||||
f"{fixture_name}: gross={result.gross_pay} tax={result.income_tax} "
|
||||
f"nic={result.national_insurance} student={result.student_loan} "
|
||||
f"pension_employee={result.pension_employee} rsu_offset={result.rsu_offset} "
|
||||
f"other={result.other_deductions} net={result.net_pay}")
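Note on the totals check: the assertion message above lists the fields that appear to feed the "gross - deductions ≈ net within 2p" rule. The sketch below is a minimal reconstruction of that arithmetic, not the project's `validate_totals`. The `SlipTotals` dataclass and `totals_balance` function are hypothetical names introduced here, and the exact set of deductions is an assumption inferred from the message. The figures come from the variant C (Nov 2022) and variant A (Aug 2021) fixtures in this diff.

```python
from dataclasses import dataclass, field
from decimal import Decimal

ZERO = Decimal("0")
TOLERANCE = Decimal("0.02")  # "within 2p", per the test docstring


@dataclass
class SlipTotals:
    """Hypothetical stand-in for the parser's result object (assumption)."""
    gross_pay: Decimal
    net_pay: Decimal
    income_tax: Decimal = ZERO
    national_insurance: Decimal = ZERO
    student_loan: Decimal = ZERO
    pension_employee: Decimal = ZERO
    rsu_offset: Decimal = ZERO
    other_deductions: dict[str, Decimal] = field(default_factory=dict)


def totals_balance(slip: SlipTotals) -> bool:
    """Return True when gross minus all deductions matches net within 2p.

    pension_sacrifice is deliberately absent: in variants B/C it is a
    negative Payments line and is already reflected in gross_pay.
    """
    deductions = (
        slip.income_tax
        + slip.national_insurance
        + slip.student_loan
        + slip.pension_employee
        + slip.rsu_offset
        + sum(slip.other_deductions.values(), ZERO)
    )
    return abs(slip.gross_pay - deductions - slip.net_pay) <= TOLERANCE


# Variant C, Nov 2022: the RSU Net Gain offset sits on the deductions side.
variant_c = SlipTotals(
    gross_pay=Decimal("23636.33"),
    net_pay=Decimal("4467.70"),
    income_tax=Decimal("5800.07"),
    national_insurance=Decimal("612.65"),
    student_loan=Decimal("1233.00"),
    rsu_offset=Decimal("11522.91"),
)

# Variant A, Aug 2021: the deduction-side BIK copies must be counted.
variant_a = SlipTotals(
    gross_pay=Decimal("25738.37"),
    net_pay=Decimal("2678.54"),
    income_tax=Decimal("5500.87"),
    national_insurance=Decimal("627.72"),
    student_loan=Decimal("1165.00"),
    rsu_offset=Decimal("15666.13"),
    other_deductions={
        "Private Dental Insurance": Decimal("15.61"),
        "Private Medical Insurance": Decimal("84.50"),
    },
)
```

Under these assumptions both fixtures balance exactly, and dropping the BIK entries from `other_deductions` in the variant A case leaves a 100.11 gap, which is why the deduction-side copies have to land there.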
|
||||
|
|
|
|||