- **Sources** (`job_hunter/sources/`): `ats` (Greenhouse/Lever/Ashby JSON APIs, ~35 companies in `config/companies.yaml`), `hn` (Algolia), `levels_fyi` (comp medians), `linkedin_guest` (opt-in), `changedetection` (`/webhook/cdio` for non-ATS careers pages in `config/cdio_watches.yaml`).
- **Tables**: `companies`, `roles`, `comp_points`, `levels`, `fx_rates` (upsert-in-place, "current state"); `comp_snapshots`, `roles_snapshots` (append-only, one row per source-row per `snapshot_date` — the dated series). Snapshots are written as a side-effect of every upsert during a refresh.
- **The ATS fetch is resilient**: a board returning a permanent 4xx (404/410/403) is skipped with a warning; 5xx/network errors retry once then skip. One dead board cannot abort the whole run (regression fixed 2026-06-02 — Elastic's 404 had been taking down every refresh). Boards are fetched concurrently (bounded semaphore, default 8 in-flight).
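
The skip/retry policy in the last bullet can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in `job_hunter/sources/`: `BoardError`, `fetch_one`, `fetch_all`, and the injected `fetch` coroutine are hypothetical names, and the warning text is illustrative. Statuses the bullet does not mention (for example 429) are treated here like transient errors, which is also an assumption.

```python
import asyncio

# Statuses the text above lists as permanent: skip immediately, no retry.
PERMANENT_4XX = {403, 404, 410}


class BoardError(Exception):
    """Hypothetical error type; status is None for network-level failures."""

    def __init__(self, status=None):
        super().__init__(f"status={status}")
        self.status = status


async def fetch_one(slug, fetch, sem, warnings):
    """Fetch one board. Returns its roles, or None if the board was skipped."""
    async with sem:  # bounds how many boards are in flight at once
        for attempt in range(2):  # first try plus exactly one retry
            try:
                return await fetch(slug)
            except BoardError as exc:
                if exc.status in PERMANENT_4XX:
                    warnings.append(f"skip {slug}: HTTP {exc.status}")
                    return None
                if attempt == 1:
                    warnings.append(f"skip {slug}: failed after one retry")
                    return None
    return None


async def fetch_all(slugs, fetch, max_in_flight=8):
    """Fetch every board concurrently; a failing board never aborts the run."""
    sem = asyncio.Semaphore(max_in_flight)
    warnings = []
    results = await asyncio.gather(
        *(fetch_one(slug, fetch, sem, warnings) for slug in slugs)
    )
    roles = {slug: r for slug, r in zip(slugs, results) if r is not None}
    return roles, warnings
```

Because every failure is converted into a warning inside `fetch_one`, `asyncio.gather` never sees an exception, so one dead board cannot take down the others.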
---
## OPS
### Is it healthy?
```bash
# CronJob exists + last schedule/success
kubectl -n job-hunter get cronjob job-hunter-refresh
# Most recent run's pods + logs
kubectl -n job-hunter get jobs -l app=job-hunter --sort-by=.metadata.creationTimestamp
kubectl -n job-hunter logs -l job-name="$(kubectl -n job-hunter get jobs -l app=job-hunter --sort-by=.metadata.creationTimestamp -o jsonpath='{.items[-1:].metadata.name}')"
# Deployment (serves the CLI / webhook) is up
kubectl -n job-hunter get deploy job-hunter
```

Data freshness: the newest `snapshot_date` in the snapshot tables should advance weekly.

The DB password rotates every 7 days (Vault static role `pg-job-hunter`);
Reloader restarts the Deployment when the ESO-synced secret changes. The
Grafana datasource password is mirrored via a second ExternalSecret in the
`monitoring` namespace.
### Common failures
| Symptom | Cause | Fix |
|---|---|---|
| Refresh job `Error`, log shows `ats: skipping company=X — HTTP 404` | A board slug was renamed/removed | Expected — the run continues. Remove the dead slug from `companies.yaml` if permanent. |
| Refresh aborts with a traceback before any company | Pre-2026-06-02 image (no skip-on-404) | Confirm Keel rolled the new image: `kubectl -n job-hunter get deploy job-hunter -o jsonpath='{..image}'`. |
| `snapshot` / refresh fails: `relation "job_hunter.comp_snapshots" does not exist` | Migration 0004 not applied | The CronJob and Deployment normally run `migrate` on start. To apply it manually, run `kubectl -n job-hunter exec deploy/job-hunter -- python -m job_hunter migrate`. |
| `/webhook/cdio` returns 401 | `webhook_bearer_token` mismatch between Vault and the CDIO notification URL | Re-run `cdio-seed` after rotating the token; it rebuilds the `jsons://...?+Authorization=` URL. |
| Non-GBP comp looks wrong / NULL | `fx_rates` gap for the role's `posted_at` date | `kubectl -n job-hunter exec deploy/job-hunter -- python -m job_hunter backfill-fx --days 30` |
| Job OOMKilled | levels.fyi HTML parse spike across many companies | Bump the CronJob container memory limit in `cronjob.tf` (currently 1Gi). |
---
## ANALYST
### The periodic "market leaders in comp" report
This is the headline command — current leaders by p50 total comp, week-over-week
movers, new entrants, open-role counts, and sample-size caveats:
### Trend queries (Grafana or psql against the snapshot tables)
The dated series lives in `comp_snapshots` / `roles_snapshots`. Examples (run in
Grafana's "Job Hunter" datasource, or `psql` as the `job_hunter` role):
```sql
-- Comp trend: median total comp per company over time (London)
SELECT s.snapshot_date, c.display_name,
percentile_cont(0.5) WITHIN GROUP (ORDER BY COALESCE(s.total_gbp, s.base_gbp)) AS p50_gbp
FROM job_hunter.comp_snapshots s
JOIN job_hunter.companies c ON c.id = s.company_id
WHERE s.location_bucket = 'london'
GROUP BY s.snapshot_date, c.display_name
ORDER BY s.snapshot_date, p50_gbp DESC;
-- Hiring-volume trend: open London roles per company per snapshot
SELECT s.snapshot_date, c.display_name, COUNT(*) AS open_roles
FROM job_hunter.roles_snapshots s
JOIN job_hunter.companies c ON c.id = s.company_id
WHERE s.primary_location = 'london'
GROUP BY s.snapshot_date, c.display_name
ORDER BY s.snapshot_date, open_roles DESC;
-- Two-snapshot diff: p50 change for one company between two dates
SELECT c.display_name, s.snapshot_date,
percentile_cont(0.5) WITHIN GROUP (ORDER BY COALESCE(s.total_gbp, s.base_gbp)) AS p50
FROM job_hunter.comp_snapshots s
JOIN job_hunter.companies c ON c.id = s.company_id
WHERE c.slug = 'janestreet' AND s.snapshot_date IN ('2026-06-02', '2026-08-30')
GROUP BY c.display_name, s.snapshot_date;
```
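
For readers less familiar with `percentile_cont`, the p50 expression used in the queries above behaves like the following Python sketch. It assumes each row is a `(total_gbp, base_gbp)` pair with `None` standing in for SQL NULL; `p50_gbp` is a hypothetical helper name, not part of the `job_hunter` package.

```python
from statistics import median


def p50_gbp(rows):
    """Mirror percentile_cont(0.5) over COALESCE(total_gbp, base_gbp).

    rows: iterable of (total_gbp, base_gbp) pairs; None stands in for NULL.
    Rows where both values are None are ignored, as the aggregate ignores NULLs.
    """
    values = []
    for total_gbp, base_gbp in rows:
        value = total_gbp if total_gbp is not None else base_gbp
        if value is not None:
            values.append(value)
    # With an even count, percentile_cont(0.5) interpolates halfway between
    # the two middle values, which is what statistics.median does as well.
    return median(values) if values else None
```

Note that a row with only `base_gbp` still contributes to the median, so a company whose rows mostly lack total comp will show a p50 closer to base than to total.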
### Interpreting the numbers — caveats
- **Sample size**: `analyze` flags companies with `n < 3` as `low_confidence`. A single self-reported datapoint is anecdote, not a band — chase the p50 only where n is healthy.
- **levels.fyi bias**: `comp_points` rows are self-reported medians; they skew toward people who report (often higher earners) and lag the market by a quarter or two.
- **HFT/quant**: base comp is the disclosed figure; bonus (often the larger half) is variable and usually absent from postings. Treat HFT base as a floor, not total.
- **Currency**: all figures are GBP-normalised via ECB rates looked up by `posted_at` (7-day fallback). An FX gap shows as NULL comp, not a wrong number.
- **Movers need history**: a delta is only as good as the two snapshot dates behind it; early deltas (less than a full `trend_weeks` of data) compare against the earliest available snapshot and are noted as such.
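
Three of the caveats above (sample size, currency, movers) can be made concrete with a small sketch. Everything here is an assumption for illustration: the function names, the `rates` dictionary keyed by `(currency, date)`, the reading of a rate as GBP per unit of foreign currency, and the choice to walk backwards in time for the 7-day fallback. The real `job_hunter` logic may differ.

```python
from datetime import timedelta
from statistics import median


def lookup_fx_rate(rates, currency, posted_at, max_fallback_days=7):
    """Return the rate for posted_at, trying up to 7 earlier days; None on a gap."""
    for offset in range(max_fallback_days + 1):
        rate = rates.get((currency, posted_at - timedelta(days=offset)))
        if rate is not None:
            return rate
    return None


def to_gbp(amount, currency, posted_at, rates):
    """Convert to GBP; an FX gap yields None (NULL) rather than a wrong number."""
    if currency == "GBP":
        return amount
    rate = lookup_fx_rate(rates, currency, posted_at)
    return None if rate is None else amount * rate


def summarise(values_gbp, min_n=3):
    """Median plus the sample-size flag: n below min_n is low confidence."""
    known = [v for v in values_gbp if v is not None]
    return {
        "n": len(known),
        "p50": median(known) if known else None,
        "low_confidence": len(known) < min_n,
    }


def baseline_snapshot(snapshot_dates, latest, trend_weeks):
    """Pick the comparison snapshot for a delta.

    Returns (baseline_date, is_partial). is_partial is True when less than a
    full trend_weeks of history exists, so the earliest snapshot is used.
    """
    target = latest - timedelta(weeks=trend_weeks)
    old_enough = [d for d in snapshot_dates if d <= target]
    if old_enough:
        return max(old_enough), False
    return min(snapshot_dates), True
```

For example, with a single USD rate recorded on 2026-06-01, a role posted on 2026-06-05 would convert using that rate, while a role posted on 2026-06-09 falls outside the 7-day window and comes back as `None`.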