2026-05-07 17:06:11 +00:00
# hmrc-sync
Pulls annual PAYE/NI figures from **HMRC Individual Tax API v1.1** to
reconcile against the monthly payslip data captured by `payslip-ingest/`.
## Phase 1 — sandbox OAuth smoke test (shipped)
Scripts live at the repo root next to this README:
- `oauth_dance.py` — interactive browser OAuth flow against
`test-api.service.hmrc.gov.uk`, captures the callback on
`localhost:8080/oauth/callback`, exchanges for tokens, hits
`/individual-income/sa/{utr}/annual-summary/{tax_year}`.
- `headless_auth.py` — same flow but driven by Chromium via Playwright.
Useful for CI smoke tests.
See the inline module docstrings for usage.
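
As a rough orientation for readers who have not opened the scripts, the authorization-code flow described above can be sketched with the standard library alone. This is not the contents of `oauth_dance.py`: the function names are hypothetical, the scope string is left as a parameter because the README does not state it, and the `/oauth/authorize` and `/oauth/token` paths are assumed from HMRC's published OAuth endpoints.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

SANDBOX_BASE = "https://test-api.service.hmrc.gov.uk"
REDIRECT_URI = "http://localhost:8080/oauth/callback"


def build_authorize_url(client_id: str, scope: str, state: str) -> str:
    """Return the URL the browser is sent to for user consent."""
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "scope": scope,  # assumption: caller supplies the scope for the API
        "state": state,
        "redirect_uri": REDIRECT_URI,
    })
    return f"{SANDBOX_BASE}/oauth/authorize?{query}"


def extract_code(callback_url: str, expected_state: str) -> str:
    """Pull the authorization code from the callback URL, checking state."""
    params = parse_qs(urlparse(callback_url).query)
    if params.get("state", [""])[0] != expected_state:
        raise ValueError("state mismatch in OAuth callback")
    return params["code"][0]


def token_request_form(client_id: str, client_secret: str, code: str) -> dict:
    """Form fields to POST to f"{SANDBOX_BASE}/oauth/token"."""
    return {
        "grant_type": "authorization_code",
        "client_id": client_id,
        "client_secret": client_secret,
        "redirect_uri": REDIRECT_URI,
        "code": code,
    }


if __name__ == "__main__":
    state = secrets.token_urlsafe(16)  # random per-attempt CSRF token
    print(build_authorize_url("my-client-id", "example-scope", state))
```

The `state` check is the only part that guards the local callback listener against a stray or forged redirect, so it is worth keeping even in a smoke test.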
## Phase 2 — production service (scaffolded, awaiting HMRC approval)
Directory layout matches `payslip-ingest/`:
```
hmrc-sync/
├── hmrc_sync/
│   ├── __init__.py
│   ├── __main__.py        # click CLI: serve / sync / migrate
│   ├── app.py             # FastAPI (authorize, callback, sync, healthz)
│   ├── client.py          # HmrcClient: wraps Individual Tax API v1.1
│   ├── db.py              # SQLAlchemy models (tax_year_snapshot, fetch_log)
│   ├── fraud_headers.py   # build Gov-Client-/Gov-Vendor- headers
│   └── oauth.py           # Vault-backed refresh_token storage
├── alembic/
│   ├── env.py
│   └── versions/0001_initial.py
├── tests/
│   └── test_fraud_headers.py   # CI-gated shape tests + sandbox validator smoke
├── Dockerfile
├── alembic.ini
└── pyproject.toml
```
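
To give a feel for what `fraud_headers.py` is responsible for, here is a minimal sketch of building a few `Gov-Client-`/`Gov-Vendor-` headers. It is not the module's actual code: the function names are hypothetical, the `WEB_APP_VIA_SERVER` connection method is an assumption about how this service is classified, and only a subset of headers is shown. The full required set should be taken from HMRC's fraud prevention specification.

```python
from datetime import timedelta
from urllib.parse import quote


def format_utc_offset(offset: timedelta) -> str:
    """Render an offset as UTC+hh:mm / UTC-hh:mm for Gov-Client-Timezone."""
    total_minutes = int(offset.total_seconds() // 60)
    sign = "+" if total_minutes >= 0 else "-"
    hours, minutes = divmod(abs(total_minutes), 60)
    return f"UTC{sign}{hours:02d}:{minutes:02d}"


def build_fraud_headers(
    device_id: str,
    product: str,
    version: str,
    utc_offset: timedelta = timedelta(0),
) -> dict:
    """Return a partial set of fraud prevention headers (illustrative only)."""
    product_enc = quote(product, safe="")  # values are percent-encoded
    return {
        # Assumption: the service counts as a web app calling via a server.
        "Gov-Client-Connection-Method": "WEB_APP_VIA_SERVER",
        "Gov-Client-Device-ID": device_id,
        "Gov-Client-Timezone": format_utc_offset(utc_offset),
        "Gov-Vendor-Product-Name": product_enc,
        "Gov-Vendor-Version": f"{product_enc}={quote(version, safe='')}",
    }
```

A shape test in the spirit of `tests/test_fraud_headers.py` would then assert on header names and value formats without contacting HMRC at all.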
### Critical path to prod
1. **HMRC Dev Hub** (user action, ~10 min):
   - Subscribe to *Individual Tax API v1.1*.
   - Add the prod redirect URI: `https://hmrc-oauth.viktorbarzin.me/callback`.
   - Submit the Production Access application (2 questionnaires); frame it
     as "single-user PAYE reconciliation, not redistributed".
   - Review takes ~10 working days.
2. **File an HMRC SDST support ticket** up front asking (a) whether MTD ITSA
   signup is required for Individual Tax API prod access, and (b) whether a
   PAYE-only individual can voluntarily enroll without self-employment
   income. Proceed with the app submission in parallel.
3. **Fraud-header validator sweep** (local, blocking):
   ```
   HMRC_VALIDATOR=1 pytest tests/test_fraud_headers.py
   ```
   Must be green before prod deploy.
4. **After HMRC approval arrives**:
   - Seed Vault keys: `hmrc_prod_client_id`, `hmrc_prod_client_secret`,
     `hmrc_sync_webhook_token`, `hmrc_device_id` at `secret/viktor/`.
   - Create the `infra/stacks/hmrc-sync/` Terraform stack (clone from
     `infra/stacks/payslip-ingest/`): Deployment, Service, Ingress via
     `ingress_factory` (protected=false for the HMRC callback), ESO for
     Vault→K8s Secret, Grafana datasource ConfigMap, and a CronJob at 06:00
     UTC daily running `python -m hmrc_sync sync --tax-year current`.
   - Deploy the stack.
   - Visit `https://hmrc-oauth.viktorbarzin.me/authorize` once in a
     browser to seed the refresh_token. The CronJob takes over thereafter.
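
The CronJob passes `--tax-year current`, so the CLI has to turn "current" into a concrete tax year. A minimal sketch of that resolution follows, assuming the `{tax_year}` path parameter takes the `YYYY-YY` form (for example `2025-26`). The function names are hypothetical and not taken from `__main__.py`. The UK tax year runs from 6 April to 5 April.

```python
from datetime import date
from typing import Optional


def tax_year_for(day: date) -> str:
    """Return the UK tax year containing `day`, formatted as YYYY-YY."""
    # The tax year starts on 6 April; earlier dates belong to the prior year.
    start_year = day.year if (day.month, day.day) >= (4, 6) else day.year - 1
    return f"{start_year}-{(start_year + 1) % 100:02d}"


def resolve_tax_year(arg: str, today: Optional[date] = None) -> str:
    """Map the CLI value 'current' to a tax year; pass anything else through."""
    if arg == "current":
        return tax_year_for(today or date.today())
    return arg
```

A daily run just after 6 April would then start fetching the new tax year, which may return little or no data until HMRC has figures for it.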
### Dashboard Panel 10
`infra/stacks/monitoring/modules/monitoring/dashboards/uk-payslip.json`
already carries Panel 10 ("HMRC Tax Year Reconciliation — Individual
Tax API"). It queries `hmrc_sync.tax_year_snapshot` which doesn't
exist yet on the monitoring DB — the panel renders empty until
hmrc-sync is deployed and the Alembic migration runs.
### Risks / mitigations
- **MTD pilot gate blocks API** — SDST ticket resolves; fallback is
payslip-ingest P60 reconciliation (already shipped).
- **Prod approval denied on "personal use"** — reframe + appeal; else
permanent P60-only reconciliation.
- **Fraud-header audit fails** — validator API gates deploy.
- **Refresh token expires (18 months)** — alert on `expires_in` < 30
days; manual re-auth via `/authorize`.
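
One way the refresh-token alert could be computed is sketched below. In OAuth 2.0 the `expires_in` field of a token response describes the access token's lifetime, so this sketch assumes the service records when the user last authorized (the hypothetical `granted_at`) and derives the refresh token's remaining life from that. The 548-day figure is an approximation of 18 months, not a value taken from HMRC.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumption: 18 months approximated as 548 days.
REFRESH_TOKEN_LIFETIME = timedelta(days=548)
ALERT_WINDOW = timedelta(days=30)


def days_until_expiry(granted_at: datetime, now: datetime) -> int:
    """Whole days left before the refresh token is expected to expire."""
    return (granted_at + REFRESH_TOKEN_LIFETIME - now).days


def needs_reauth(granted_at: datetime, now: Optional[datetime] = None) -> bool:
    """True once fewer than 30 days remain on the refresh token."""
    now = now or datetime.now(timezone.utc)
    return granted_at + REFRESH_TOKEN_LIFETIME - now < ALERT_WINDOW
```

Exposing `days_until_expiry` as a metric would let the existing Grafana setup raise the alert, with `/authorize` as the manual fix.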
Tracked as beads `code-74j`.