Commit graph

142 commits

Author SHA1 Message Date
Viktor Barzin
1cb2bb30f7 monitoring(wealth): show pre-2024 historical data on timeseries
Bug: timeseries panels were empty before 2024-04-10. Cause was the
complete_dates CTE filtering to "every active account has a row for
this date" -- which excluded every day before the most-recently-added
account first appeared. The 6th account (Trading212 Invest GIA) only
started 2024-04-10, so 4 years of legitimate historical data
(2020-06-07 onwards, when the user genuinely had fewer accounts) got
hidden.

New pattern across panels 5/6/7/8/9/12/13: replace complete_dates with
max_complete cutoff. Compute the most-recent date where all current
accounts have a row, then include every historical date up to and
including that day. Partial-today is still excluded automatically.
Historical days with fewer accounts now show as their actual smaller
sums -- which is the correct historical net worth at the time.

Verified via PG: new pattern returns 2,159 distinct days from
2020-06-07 to 2026-05-05 (vs the previous 391 from 2024-04-10).

Per-account first-seen dates:
  InvestEngine ISA       - 2020-06-07
  Schwab US workplace    - 2020-11-17
  InvestEngine GIA       - 2022-03-17
  Fidelity UK Pension    - 2022-05-16
  Trading212 ISA         - 2024-04-08
  Trading212 Invest GIA  - 2024-04-10  (was the bottleneck)
2026-05-05 18:43:26 +00:00
Viktor Barzin
6715cdc51f monitoring(wealth): re-add milestone annotations (now that PG creds rotated)
Re-applies the milestone annotation commit reverted in 0ef36aec. The
earlier "nothing loads / syntax error" was a red herring: Vault had
rotated the wealthfolio_sync DB password 7 days prior, the K8s Secret
picked it up automatically (pg-sync sidecar still working), but the
Grafana datasource ConfigMap is baked at TF-apply time so Grafana was
sending the old password. Every panel + the new annotation alike
failed with: pq password authentication failed for user wealthfolio_sync.

Fix today: refresh the datasource ConfigMap and roll Grafana.

  scripts/tg apply -target=kubernetes_config_map.grafana_wealth_datasource
  kubectl -n monitoring rollout restart deploy/grafana

Annotation source verified live via /api/ds/query: SQL returns 5
milestone rows correctly. Dashboard charts now show vertical dashed
lines at GBP100k 2021-11-01, GBP250k 2023-07-18, GBP500k 2024-09-19,
GBP750k 2025-08-26, GBP1M 2026-04-18.

KNOWN FOLLOW-UP: Vault rotates pg-wealthfolio-sync every 7 days
(static role). Todays failure will recur unless the Grafana
datasource auto-refreshes. Options:
  1. Annotate Grafana deploy with stakater/reloader so it restarts
     when wealthfolio-sync-db-creds Secret changes.
  2. Switch datasource provisioning to read password from an env var
     sourced from the Secret instead of baking into the ConfigMap.
     Combined with reloader, picks up rotation cleanly.
2026-05-02 20:27:21 +00:00
Viktor Barzin
0ef36aec36 Revert "monitoring(wealth): milestone annotations on every timeseries chart"
This reverts commit 5a00b9c096.
2026-05-02 20:20:18 +00:00
Viktor Barzin
5a00b9c096 monitoring(wealth): milestone annotations on every timeseries chart
Inspired by the user's "Journey to £1M" reference — adds vertical
dashed lines on every timeseries panel at the date net worth first
crossed each round threshold (£100k, £250k, £500k, £750k, £1M).

Implementation: a dashboard-level annotation source ("Milestones",
purple) backed by a PG query that finds the MIN(valuation_date) where
SUM(total_value) >= each threshold. The query returns (time, text)
pairs, e.g. "2026-04-18 → £1M 🎉". Annotations attach to all
timeseries panels automatically; auto-extends as future thresholds
are crossed.

Verified against current data:
  £100k → 2021-11-01    £250k → 2023-07-18    £500k → 2024-09-19
  £750k → 2025-08-26    £1M    → 2026-04-18 🎉

Future work (per user request): add a "Journey" stat-card row at the
top mirroring the reference (date achieved + months from previous).
2026-05-02 08:42:21 +00:00
Viktor Barzin
664a85ef1e Revert "monitoring(wealth): show daily points + lighter fill on timeseries"
This reverts commit 5472720c75.
2026-05-01 16:24:18 +00:00
Viktor Barzin
5472720c75 monitoring(wealth): show daily points + lighter fill on timeseries
Make daily movements visible on the line charts. The y-axis still spans
~£700k–£1M so an £8k daily move is ~1% of vertical range and easy to
miss when only the line is drawn.

Changes per panel:
  * 5 (Net worth):                  showPoints never→always, pointSize 4→5, fillOpacity 20→10
  * 6 (Net contrib vs market):      showPoints never→always, pointSize 4→5
  * 7 (Growth over time):           showPoints never→always, pointSize 4→5, fillOpacity 50→25
  * 8 (Per-account stacked):        showPoints never→always (kept stacking fill at 70)
  * 9 (Cash vs invested stacked):   showPoints never→always (kept stacking fill at 70)

Each daily value now renders as a visible dot, so even if the line
appears flat at this scale, the per-day points trace the wiggle. Lighter
fill on the unstacked panels lets the line + points dominate visually.

Caveat: the fundamental "£8k on a £1M base" visibility issue is best
solved with a dedicated "Daily change" delta panel — happy to add one
on next pass if this isn't enough.
2026-05-01 16:23:25 +00:00
Viktor Barzin
2722260ce9 monitoring(wealth): unbreak timeseries SQL — over-escaped time alias
Fix: panels 5–9 had `AS \"time\"` (literal backslash-quote sequence
embedded in the SQL string). PostgreSQL parsed that as a syntax error
at the leading backslash:

  ERROR:  syntax error at or near "\"
  LINE 1: ...complete_dates)) SELECT valuation_date::timestamp AS \"time\"

Root cause: the patch script for the skew-resilient queries (commit
628f5a0d) used a Python f-string with `\\\"time\\\"`, which produces
a literal backslash-quote in the Python string. When that string
was JSON-encoded the backslash was preserved verbatim instead of
collapsed to plain `"time"`.

Replaces all five occurrences with the correct `AS "time"` form.
Verified the corrected query against PG returns 7 daily net-worth
rows for 04-25..05-01 as expected.
2026-05-01 16:19:07 +00:00
Viktor Barzin
d67416d4ca monitoring(wealth): tighten default time range, bump decimals for granularity
Two adjustments to make daily movements visible:

1. Default time range: now-5y → now-180d. The timeseries charts (Net
   worth, Net contribution vs market value, Growth, Per-account
   stacked, Cash vs invested) auto-fit their y-axis to the data range
   in view. Over 5 years, daily £1k–£10k moves are ~1% of axis range
   and visually invisible against the cumulative trend. Over 6
   months, the same daily moves dominate. Yearly bar charts (12, 13)
   are unaffected — they aggregate by calendar year and don't filter
   on $__timeFilter.

2. Decimals → 2 on every currency panel (1, 2, 3, 5–9, 13, 15, 16)
   and every percent panel (4, 14). Stat panels now show pennies on
   currency and 0.01% on rates; chart y-axis ticks are likewise more
   precise. Honest caveat: pennies on a £1M number don't make the
   absolute readout easier — to see "today changed by £8,358" cleanly
   we'd want a dedicated delta panel; pending user direction.

Widen the time picker manually to recover the 5-year view; default
just zooms into the last 6 months.
2026-05-01 16:15:39 +00:00
Viktor Barzin
628f5a0d26 monitoring(wealth): skew-resilient queries, no more partial-day dips
Bug witnessed 2026-05-01: dashboard "Net worth (current)" showed £88k
instead of £1.03M because at 02:00 UTC an external trigger refreshed
ONE account (Trading212 ISA), creating its 05-01 daily_account_valuation
row. The 5 other accounts still had their last row at 04-30. The panel
SQL `WHERE valuation_date = (SELECT MAX(valuation_date))` then summed
only the single account that had a 05-01 row.

Two new SQL patterns adopted across all 15 affected panels:

  1. Stat / barchart "current snapshot" panels (1, 2, 3, 4, 11, 14, 15,
     16): latest-per-account stitching —
       WITH latest AS (SELECT DISTINCT ON (d.account_id) ...
                       FROM daily_account_valuation d
                       JOIN accounts a ON a.id = d.account_id
                       ORDER BY d.account_id, d.valuation_date DESC)
     gives a coherent "now" snapshot regardless of refresh skew, and
     the inner join filters out orphan/deleted accounts (one such was
     adding a stale £33k from 04-17). 12-month panels add a parallel
     `ago` CTE picking each account's row closest to (d_now - 12mo).

  2. Time-series / yearly panels (5, 6, 7, 8, 9, 12, 13): complete-days-
     only filter —
       WITH active_accounts AS (SELECT COUNT(*) FROM accounts),
            complete_dates AS (SELECT valuation_date
                               FROM daily_account_valuation d
                               JOIN accounts a ON a.id=d.account_id
                               GROUP BY valuation_date
                               HAVING COUNT(*) >= active.n)
     so a partial today never renders as a chart dip. The day rejoins
     the chart automatically once the daily 16:00 UTC sync writes rows
     for every account.

Verified end-to-end against live PG: new queries produce £1,033,734
(matches the 6 active accounts' true latest sum) where the old query
gave £88k.
2026-05-01 16:08:18 +00:00
Viktor Barzin
31b9e5d4a9 monitoring(wealth): add 12mo contrib + 12mo gain to top row
Top row goes from 5 → 7 stat panels (widths 4+4+4+3+3+3+3=24):

- Net worth, Net contribution, Growth shrink from w=5 to w=4.
- ROI % shrinks from w=5 to w=3 (now sits at x=12).
- 12mo return slides from x=20/w=4 to x=15/w=3.
- New: 12mo contrib (id=15, currency, blue) at x=18 — net contributions
  added in the trailing 12 months.
- New: 12mo gain (id=16, currency, red/green) at x=21 — pure market gain
  in £ over the trailing 12 months (12mo Δnet-worth − 12mo contribs).

Live values verified against PG: contrib_12mo=£245k, gain_12mo=£172k,
sum = £417k = nw_now − nw_ago, return = 23.51%.
2026-04-27 06:32:53 +00:00
Viktor Barzin
215717c90f monitoring(dashboards): tables at the bottom convention
wealth: move Activity log table from y=45 to y=77; the three barcharts
(Yearly return, Annual change, Per-account ROI) shift up by 14 to fill
the gap.

uk-payslip: move Sankey "where the money went" from y=80 to y=48 (right
above the table block); the three tables (Data integrity, All payslips,
YTD reconciliation) shift down by 14 so all four tables (4, 5, 6, 9) sit
contiguously at the bottom.

fire-planner and job-hunter still have intentional side-by-side
table/chart pairings; left untouched pending user direction on whether
to break them.
2026-04-26 18:30:52 +00:00
Viktor Barzin
bb28485ce0 monitoring(wealth): move 12mo return to top bar, shrink to w=4
Trailing 12-month investment return % was a full-width stat at y=59.
Now sits inline with Net worth / Contribution / Growth / ROI as the
fifth headline number — top-row stats reflowed from w=6 (×4) to w=5
(×4) + w=4 (×1). Title shortened to "12mo return" so it fits.
Panels below the old row shifted up by 4 rows to close the gap.
2026-04-26 18:19:24 +00:00
Viktor Barzin
a24cd7ceb7 monitoring(uk-payslip): yearly receipt aligns with P60 (RSU gross)
Switch the RSU stack from "after band-aware tax" to gross. Receipt
total is now pre-sacrifice gross compensation; bar − pension stack
≈ ytd_gross reported on the final March payslip / P60.

Verified alignment for 2025/26: bar−pension = £266,752 vs P60
ytd_gross = £268,127 — gap of £1,375 ≈ "other taxable" (benefits,
overtime). Remaining year-level gaps are upstream parser/ingest
issues, not dashboard logic:
  - 2024/25 +£27k: March 2025 payslip parsed bonus=£26,969 but never
    propagated it into gross_pay/income_tax. Receipt is more
    accurate than ytd_gross here.
  - 2023/24 −£36k: Feb 2024 payslip row appears to be missing from
    the table; ytd_gross has it, sum(gross_pay) doesn't.
  - 2022/23 −£10k: variant A→B transition residual.

SQL simplified — band-aware CTE chain dropped (no longer needed for
this panel since RSU is shown gross).
2026-04-26 10:24:06 +00:00
Viktor Barzin
222013806d monitoring(uk-payslip): split salary into cash + pension on yearly receipt
The salary field on the payslip is pre-pension-sacrifice, so the
"Salary (gross)" stack already silently included the salary-sacrifice
pension contribution. Split it out so pension is explicitly visible:
- Salary (cash, post-sacrifice) = salary - pension_sacrifice
- Pension (salary sacrifice, untaxed) = pension_sacrifice
- Bonus
- RSU vest (after band-aware tax)

Bar total unchanged (just relabels what was already there). Pension
is now visibly counted as income — consistent with "untaxed but real"
framing.

Caveat documented in panel description: receipt total ≠ P60 gross
because P60 reports pre-RSU-tax gross. Receipt shows RSU net of tax
per earlier intent. To exactly match P60, swap rsu_after_tax →
rsu_vest gross.
2026-04-26 09:18:32 +00:00
Viktor Barzin
21ac619fac monitoring(uk-payslip): promote yearly receipt + YTD gross YoY to row 4
Move both barchart/timeseries panels into row 4 (y=29, side-by-side
w=12 each, h=10) so the per-tax-year overviews appear right after
the income-tax-and-pension YTD row. Shift panels 13, 4, 5, 6, 8, 9
down by 10 to accommodate.

Final ordering: rows 1–3 = monthly + YTD timeseries (panels 1/7/2/3/11/12),
row 4 = yearly receipt + YTD gross YoY (16/17), then the wider
deduction/integrity/table panels below.
2026-04-25 23:58:15 +00:00
Viktor Barzin
53f555dc61 monitoring(uk-payslip): drop 3 panels referencing undeployed data
Removed:
- Panel 10 "HMRC Tax Year Reconciliation — Individual Tax API"
  → references hmrc_sync.tax_year_snapshot schema. The hmrc-sync
    service / DB has not been deployed, so the panel always errored
    with "relation does not exist".
- Panel 14 "Meta payroll: bank deposit vs payslip net pay"
  → references payslip_ingest.external_meta_deposits, which is
    created by alembic migration 0007. The deployed payslip-ingest
    image is at 0005, so the table doesn't exist.
- Panel 15 "RSU vest reconciliation — payslip vs Schwab"
  → references payslip_ingest.rsu_vest_events, created by migration
    0008. Same image-staleness story.

Verified all 14 remaining panels return without error via Grafana
/api/ds/query. SQL for the removed panels is preserved in git history;
re-add when the data sources are actually deployed.
2026-04-25 23:56:03 +00:00
Viktor Barzin
b2a25775aa monitoring(uk-payslip): simplify yearly receipt to earned-and-kept view
Replace the 7-stack "where total comp went" decomposition with a 3-stack
"what I actually earned" view: salary (gross), bonus (gross), and RSU
vest after band-aware tax (PAYE+NI withheld via sell-to-cover). Skips
income tax / NI / student loan / pension / RSU offset.

Bar height = real income kept across all components. RSU is net of tax
because it's withheld at source and never hits the bank account; salary
and bonus are gross because they're paid in full and taxes are deducted
elsewhere. This is the income-side view where tax is implicit, not the
deduction waterfall.

Per-year RSU after tax: 2020/21 £18k · 2021/22 £39k · 2022/23 £50k ·
2023/24 £26k · 2024/25 £71k · 2025/26 £73k.
2026-04-25 23:42:20 +00:00
Viktor Barzin
a17304f735 monitoring(uk-payslip): fix empty YTD gross YoY chart
Two bugs:
1. Synthetic dates projected onto 1970/71 fell outside the dashboard's
   default time range (now-10y → now), so Grafana filtered out every
   point. Switched to a sliding 12-month window
   (CURRENT_DATE - INTERVAL '12 months') as the projection base, plus
   a per-panel timeFrom: "13M" override so the panel always shows the
   last 13 months regardless of the dashboard's time picker.
2. ORDER BY tax_year, pay_date violated Grafana's long→wide conversion
   requirement (data must be ascending by time). Wrapped in a CTE and
   re-ordered by the synthetic time column. Pivoted result is now a
   single wide frame with 7 series (2019/20…2025/26).
2026-04-25 23:36:16 +00:00
Viktor Barzin
ac18c49a7b monitoring(wealth): fix x-axis label formatting on yearly bars
The default fieldConfig unit (percent on Yearly investment return %,
currencyGBP on Annual change decomposition) was being applied to the
"year" string column too — so x-axis labels rendered as "2024%" and
"£2,024" respectively. Add field overrides on the "year" column to
force unit=string. The earlier "tax_year" panels weren't affected
because "2024/25" doesn't parse as a number; "2024" did.
2026-04-25 23:31:03 +00:00
Viktor Barzin
77bed10a51 monitoring: investment-only returns + YoY YTD gross line chart
Wealth dashboard:
- "Yearly growth %" → "Yearly investment return %": switched to
  modified-Dietz formula `market_gain / (nw_start + 0.5 × contributions)`
  so contributions don't inflate the return. New money in is excluded —
  this is portfolio performance, not net-worth change.
- "Trailing 12-month growth %" → "Trailing 12-month investment return %":
  same formula, applied to the trailing 12mo window.

Pre-fix vs post-fix:
  2020: 155.0% → 5.12%   (large contributions on small base)
  2021: 344.7% → 26.45%
  2022: 26.9%  → -25.65% (the actual 2022 bear market)
  2023: 123.2% → 41.60%
  2024: 87.4%  → 25.70%
  2025: 46.8%  → 8.43%
  2026: 16.7%  → 3.28%   (YTD)

UK Payslip dashboard:
- Replaced the per-tax-year stacked bar with a year-over-year line chart:
  one line per tax year, X = month-of-tax-year (April→March, projected
  onto a 1970/71 fiscal calendar so years overlay), Y = cumulative YTD
  gross. Five+ lines visible at a glance for trend comparison.
2026-04-25 23:25:42 +00:00
Viktor Barzin
55d1da41f6 monitoring: more growth detail in Wealth + gross composition in UK Payslip
Wealth (4 new panels at the bottom):
- Trailing 12-month growth % (stat) — % change in net worth over last 12mo.
- Yearly growth % (bar per calendar year) — first→last valuation each year.
- Annual change decomposition (stacked bar) — splits each year's NW change
  into "net contributions" (new money in) and "market gain" (everything
  else: appreciation, dividends, FX). Answers "did I grow because I saved
  or because the market did the work?".
- Per-account ROI % (horizontal bar) — (value − contribution) / contribution
  × 100, latest snapshot. Excludes accounts with zero/negative net
  contribution (Schwab — distorts ratio after RSU sells).

UK Payslip (1 new panel below the yearly receipt):
- Gross composition by tax year (stacked bar) — salary / bonus / RSU vest /
  other components per tax year. Bar height = gross pay. Trends in salary
  growth, bonus levels, and RSU vest sizing at a glance.

All queries spot-checked via Grafana /api/ds/query.
2026-04-25 23:21:42 +00:00
Viktor Barzin
d48e222054 monitoring: lock Finance (Personal) folder to admin + fix cash classification
Folder ACL:
- Move uk-payslip + wealth dashboards to a new "Finance (Personal)"
  folder; job-hunter + fire-planner stay in "Finance" (open).
- New null_resource calls Grafana's folder permissions API after the
  dashboard sidecar materialises the folder, setting an admin-only
  ACL ({Admin: 4}). Default Viewer/Editor inheritance is overridden,
  so anonymous-Viewer (auth.anonymous=true) is denied. Server-admin
  always retains access.
- Verified: anonymous → 403 on uk-payslip + wealth, 200 on
  control dashboards (node-exporter); admin → 200 on all.

Wealth cash fix:
- Wealthfolio dumps WORKPLACE_PENSION wrappers entirely into
  cash_balance because it doesn't track underlying fund holdings.
  Reclassify pension cash as invested in the "Cash vs invested"
  panel so the cash series reflects actual uninvested broker cash
  (~£16k T212 ISA + Schwab) instead of phantom £154k.

  Pre-fix:  cash=£153,789 / invested=£870,282 / total=£1,024,071
  Post-fix: cash=£16,064  / invested=£1,008,008 / total=£1,024,071
2026-04-25 23:11:26 +00:00
Viktor Barzin
f0ce7b0363 fire-planner: add stack, Vault DB role, dashboard, DB
New stacks/fire-planner/ mirrors payslip-ingest layout:
- ExternalSecret pulling RECOMPUTE_BEARER_TOKEN from Vault secret/fire-planner
- DB ExternalSecret templating DB_CONNECTION_STRING via static role pg-fire-planner
- FastAPI Deployment (serve), CronJob (recompute-all monthly on 2nd at 09:00 UTC,
  scheduled after wealthfolio-sync's 1st at 08:00), ClusterIP Service
- Grafana datasource ConfigMap "FirePlanner" — `database` inside jsonData
  (cc56ba29 fix; otherwise Grafana 11.2+ hits "you do not have default database")

Plus:
- vault/main.tf: pg-fire-planner static role (7d rotation), allowed_roles
- dbaas/modules/dbaas/main.tf: null_resource creates fire_planner DB+role
- monitoring/dashboards/fire-planner.json: 9-panel Finance-folder dashboard
  (NW timeseries, MC fan chart, success heatmap, lifetime tax bars,
  years-to-ruin table, optimal leave-UK stat, ending wealth stat,
  UK success-by-strategy bars, sequence-risk correlation table)
- monitoring/modules/monitoring/grafana.tf: register "fire-planner.json" in Finance folder

Apply order:
  1. vault stack — creates the static role
  2. dbaas stack — creates the database & role
  3. external-secrets stack picks up vault-database refs (no change needed)
  4. fire-planner stack — first apply with -target=kubernetes_manifest.db_external_secret
     before full apply, per the plan-time-data-source pattern
  5. monitoring stack — picks up the new dashboard ConfigMap

[ci skip]

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 17:27:19 +00:00
Viktor Barzin
bf4c7618d8 wealth: SQLite→PG ETL sidecar + new Grafana dashboard
Mirrors Wealthfolio's daily_account_valuation / accounts / activities
from SQLite into a new PG database (wealthfolio_sync) every hour, so
Grafana can chart net worth, contributions, and growth over time.

Components:
- dbaas: null_resource creates wealthfolio_sync DB + role on the CNPG
  cluster (dynamic primary lookup so it survives failover).
- vault: pg-wealthfolio-sync static role rotates the password every 7d.
- wealthfolio: ExternalSecret pulls the rotated password into the WF
  namespace; new pg-sync sidecar (alpine + sqlite + postgresql-client +
  busybox crond) does sqlite3 .backup → TSV dump → truncate-and-reload
  psql, hourly at :07. Plus a grafana-wealth-datasource ConfigMap in
  the monitoring namespace (uid: wealth-pg).
- monitoring: new Wealth dashboard (wealth.json, 10 panels) — current
  net worth / contribution / growth / ROI% stats, then time-series
  for net worth, contribution-vs-market, growth area, per-account
  stacked area, cash-vs-invested, and a 100-row activity log.

Initial sync: 6 accounts, 10,798 daily valuations, 518 activities.
Verified PG totals match SQLite latest snapshot exactly.
2026-04-25 17:07:33 +00:00
Viktor Barzin
4f5f1ff8c2 monitoring(uk-payslip): add yearly receipt stacked barchart panel
New panel 16 (barchart, h=11, y=179): one stacked bar per tax year showing
total comp split into net pay (bank deposit), cash income tax, RSU tax
(band-aware marginal: PAYE+NI), cash NI, student loan, pension salary-
sacrifice, and RSU offset (Variant A only).

X-axis = tax_year (categorical), y-axis = currencyGBP. Bar height ≈
gross_pay + pension_sacrifice (small over-attribution in Variant A years
where the band-aware model exceeds recorded payslip PAYE).
2026-04-25 16:26:57 +00:00
Viktor Barzin
b3c29eda12 monitoring(uk-payslip): model UK income-tax bands + PA-taper for RSU marginal
Replaces the flat 47% (45 PAYE + 2 NI) RSU marginal across panels 3, 7, 8, 11,
and 12 with an exact piecewise band-aware computation. Each row computes
ani_prior/ani_pre/ani_post over the tax-year YTD (chronological model — the
RSU is taxed at the band its YTD ANI position occupies at the vest date,
mirroring PAYE withholding behaviour).

Bands (2024/25+, applied to all years):
  IT:  0% / 20% / 40% / 60% (PA-taper) / 45%   at 12,570 / 50,270 / 100k / 125,140
  NI:  0% / 8% / 2%                            at 12,570 / 50,270

PA-taper modelled as 60% effective IT marginal in £100k–£125,140
(40% on the £1 + 40% on the £0.50 of lost PA = 60%).

Spot-checked per tax-year totals via psql; numbers diverge from the flat
47% baseline most for years where vests cross PA-taper or basic-rate bands
(2020/21 ~35%, 2024/25 ~41%, 2025/26 ~43%).
2026-04-25 16:14:49 +00:00
Viktor Barzin
0d5f53f337 monitoring(uk-payslip): replace misleading take-home rates in Panel 3
Drop the two misleading series in "Effective rate & take-home % (YTD
cumulative)" — both used SUM(gross_pay) as denominator while only
counting cash deductions/net in the numerator, which understated
take-home by 25-30 pp because RSU shares are absent from the cash
deposit but present in gross. Replaced with three semantically clean
angles:

- ytd_paye_rate_pct: SUM(income_tax) / SUM(taxable_pay) — HMRC audit
  rate (~41-42% in additional-rate band), kept as before.
- ytd_cash_take_home_pct: SUM(net_pay) / SUM(gross_pay - rsu_vest) —
  what fraction of cash earnings hits the bank (~62-65%).
- ytd_total_keep_pct: (SUM(net_pay) + 0.53 × SUM(rsu_vest)) /
  SUM(gross_pay) — true "what I actually keep" including post-tax RSU
  shares (47% marginal applied to vest value), ~55-60%.

Added field overrides for clear color-coding (red/green/blue).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 15:45:47 +00:00
Viktor Barzin
8f0d13282c monitoring(uk-payslip): drop cash PAYE/NI from "Tax & pension — monthly"
Same reasoning as panel 2: cash-side income_tax and NI are inherently
bumpy in vest months due to UK cumulative PAYE catching up on YTD,
and the flat-47% strip can't fix it. Panel now shows only the
explicit RSU vest tax (orange, 47% × rsu_vest), student loan, and
pensions. The smooth view of total cash deductions stays available on
panel 12 (YTD cumulative).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 15:43:32 +00:00
Viktor Barzin
2230cb6cf4 monitoring(uk-payslip): drop tax/NI from "Monthly cash flow (RSU stripped)" panel
Vest months still bumped 4-5x in this panel after the flat-47% strip
because UK cumulative PAYE genuinely catches up YTD tax in vest
months, on top of the marginal RSU portion — no arithmetic split can
make that line flat without distorting the data. The cash-flow
question this panel answers (what hits the bank, RSU aside) is
already covered cleanly by cash_gross + net_pay; the tax detail lives
on Panel 11 where the RSU split is now linear.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 15:30:46 +00:00
Viktor Barzin
cb3ffa6d8d monitoring(uk-payslip): smooth quarterly RSU tax bumps via flat 47% marginal
Replace the implicit pro-rata RSU/cash split with an explicit flat
47% marginal (45% PAYE + 2% NI) for the RSU vest tax stack. The orange
slice now scales linearly with rsu_vest instead of wobbling around the
month's effective PAYE rate; cash PAYE/NI slices have those amounts
subtracted out so the stack still totals to actual deductions.

Affects panel 7 (monthly), panel 12 (YTD cumulative), panel 7
(YTD uses), and the Sankey panel. Verified on 35 months of live data:
sum invariant holds exactly (cash + rsu_marginal + cash_ni ==
income_tax + national_insurance), no negatives in cash slices.

Out of scope (left raw): effective-rate %, data-integrity, payslip
table, P60/HMRC reconciliation — those are audit views that use
unmodified income_tax / cash_income_tax columns.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 15:13:29 +00:00
Viktor Barzin
d231615ebb [monitoring] Fix fuse voltage alerts — divide raw deciVolt reading by 10
The tuya-bridge exporter reports `fuse_main_voltage` and
`fuse_garage_voltage` as raw uint16 from the Tuya protocol, which
encodes voltage in deciVolts (e.g. 2352 = 235.2V). The 200/260V
thresholds were comparing against the raw integer, so both
FuseMainVoltageAbnormal and FuseGarageVoltageAbnormal fired
continuously during normal mains conditions.

Dividing in the expression also makes `{{ $value }}V` render the
correct human-readable value in the alert summary.

Root fix would be in tuya-bridge `_decode_value()` where
`name.startswith("voltage")` returns `int.from_bytes(...)` without the
/10 scaling that `decode_voltage_threshold` applies. Leaving that
alone to avoid breaking the automatic_transfer_switch scrape which
uses a different code path (`parse_voltage_string`).
2026-04-24 11:12:56 +00:00
Viktor Barzin
a5e4db9af8 [monitoring] Tuya Cloud root-cause alert + cascade suppression
New alert TuyaCloudDown fires when any *_tuya_cloud_up gauge == 0
(i.e., the Tuya Cloud API rejects scrape calls — the symptom during
last night's iot.tuya.com trial expiry, code=28841002). 5m for-duration
beats the 15m window of the seven downstream *MetricsMissing alerts, so
the new Alertmanager inhibit rule suppresses the per-device noise and
only TuyaCloudDown pages.

Also flips helm_release.prometheus.force_update from true to false:
force_update was tripping on the pushgateway PVC added in rev 188
(commit e51c104) — Helm's --force path tried to reset spec.volumeName
on a bound PVC. Disabled here; re-enable temporarily when a
StatefulSet volumeClaimTemplate change actually needs --force.

Bundled with pre-existing working-tree additions for Fuse/Thermostat
threshold alerts and expanded PowerOutage inhibit regex (landed in the
same Helm revision 190).

Verified: rule loaded, value=7 (all 7 tuya-bridge devices report
cloud_up=0 right now), TuyaCloudDown moved pending→firing after 5m,
3 *MetricsMissing alerts currently suppressed in Alertmanager with
inhibitedBy=1 (thermostat alerts still pending their 15m window, will
be suppressed on transition).
2026-04-23 09:59:48 +00:00
Viktor Barzin
344fce3692 [monitoring][poison-fountain] pushgateway persistence + cronjob uid-0
Two independent root-cause fixes surfaced by the 2026-04-22 cluster
health check:

1. Pushgateway lost all in-memory metrics when node3 kubelet hiccuped
   at 11:42 UTC, hiding backup_last_success_timestamp{job="offsite-
   backup-sync"} until the next 06:01 UTC push — a ~18h false-negative
   window. Enable persistence on a 2Gi proxmox-lvm-encrypted PVC with
   --persistence.interval=1m. Chart note: values key is
   `prometheus-pushgateway:` (subchart alias), not `pushgateway:`.

2. poison-fountain-fetcher CronJob runs curlimages/curl as UID 100
   but the NFS mount /srv/nfs/poison-fountain is root:root 755 and
   the main Deployment runs as root, so mkdir /data/cache fails
   every 6h. Set run_as_user=0 on the CronJob container (no_root_squash
   is set on the export).

Closes the backup_offsite_sync FAIL on the next 06:01 UTC offsite
sync; closes the recurring poison-fountain evicted-pod noise on the
next 00:00 UTC cron tick.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 18:32:29 +00:00
Viktor Barzin
fdced7577b [monitoring] HomeAssistantCriticalSensorUnavailable alert 2026-04-22 14:52:23 +00:00
Viktor Barzin
a4eafafe49 [monitoring] Add GPUNodeUnschedulable alert — fires when GPU node is cordoned
After k8s-node1 was silently cordoned and broke Frigate camera streams,
existing alerts (NvidiaExporterDown, PodUnschedulable) didn't catch the
root cause proactively. This alert fires within 5m of the GPU node being
cordoned, before any pod restart attempts to schedule and fails.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-22 14:05:12 +00:00
Viktor Barzin
134d6b9a82 vault runbook + raft/HA stuck-leader alerts
Post-2026-04-22 Step 5 deliverables:
- docs/runbooks/vault-raft-leader-deadlock.md — safe pod-restart
  sequence that avoids zombie containerd-shim + kernel NFS
  corruption, qm reset no-op gotcha, boot-order gotcha.
- prometheus_chart_values.tpl — VaultRaftLeaderStuck +
  VaultHAStatusUnavailable. Silent until vault telemetry
  scraping lands (tracked as beads code-vkpn).

Epic for moving vault off NFS tracked as beads code-gy7h.
2026-04-22 12:44:46 +00:00
Viktor Barzin
d39770b30d monitoring: tighten LVMSnapshotStale to 30h for daily-cadence detection
Threshold was 48h + 30m for: a job that runs daily. We don't need
to wait 2.5 days to detect a broken timer — bring it down to 30h
+ 30m (just over a day of cadence + minor drift/retry grace). Also
add a description pointing to the restore runbook so the alert
text surfaces the fix path directly.

Threshold change: 172800s → 108000s. Docs in backup-dr.md synced.

Re-triggers default.yml apply now that ci/Dockerfile is rebuilt
with vault CLI — this is the first commit touching a stack that
will actually succeed since the e80b2f02 regression.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-22 08:54:37 +00:00
Viktor Barzin
9b4970da61 monitoring: alert hygiene — disambiguate, rename, tune, fix inhibits
- HighPowerUsage: add subsystem:gpu (line 724) + subsystem:r730 (line 775)
  labels so the two same-named alerts are distinguishable in routing.
- HeadscaleDown (deployment-replicas flavor, line 1414) → rename to
  HeadscaleReplicasMismatch. Line 2039 keeps HeadscaleDown as the real
  up-metric critical check. NodeDown inhibit rule updated to suppress
  the renamed alert too.
- EmailRoundtripStale (line 1816): for 10m → 20m. Survives one missed
  20-min probe cycle before firing, cuts flapping (12 short-burst fires
  over last 24h).

ATSOverload tuning skipped: 24h fire-count is 0, it's continuously
firing not flapping — already-known sustained 83% ATS load, tuning
would not change behavior.

8 backup *NeverSucceeded rules audited: all 7 using
kube_cronjob_status_last_successful_time target real K8s CronJobs with
active metrics (not Pushgateway-sourced). PrometheusBackupNeverRun
already uses absent() correctly. No fixes needed.
2026-04-21 22:29:15 +00:00
Viktor Barzin
9041f52b05 monitoring: TechnitiumZoneCountMismatch — compare replicas only, exclude primary
Primary has only the Primary-type zones it owns (10). Replicas have those
+ built-in zones (localhost, in-addr.arpa reverse, etc.), so their count
(14) can never match primary. Alert expr compared max-min across all
instances, making it chronically firing.

Fix: instance!="primary" filter. The real signal this alert wants is
"did one replica drift from the others" — replica-to-replica comparison
captures that; primary was never comparable.
2026-04-19 22:15:55 +00:00
Viktor Barzin
e092f159b3 monitoring: drop MAM Mouse-class + qBittorrent-unsatisfied alerts
Both alerts fired as expected noise while the MAM account is in new-member
Mouse class — tracker refuses announces and the 72h seed-gate can't be met
until ratio recovers. Keeping the rest of the MAM rules (cookie expiry,
ratio, farming/janitor stalls, qbt disconnect) which still signal real
pipeline failures.

Firing count drops from 7 → 3 in healthcheck.
2026-04-19 21:24:46 +00:00
Viktor Barzin
68a10905e0 [monitoring] uk-payslip Panel 13: stacked bars + sum-in-legend
"Monthly cash flow — tax impact (RSU excluded)" was already stacking
group A in normal mode but rendered as 70%-opacity filled lines — the
overlap made the total-per-month figure visually inaccessible.

Switch drawStyle to bars (100% fill, 0-width lineWidth, no per-point
markers) so each month reads as a single stacked bar whose top edge is
the total cash-side deduction. Add "sum" to legend.calcs so the
tax-year totals per series show in the legend table alongside last and
max.

Panel 11 (Tax & pension — monthly, RSU-inclusive) retains the line/
area style so the two panels remain visually distinct.
2026-04-19 20:31:53 +00:00
Viktor Barzin
3f6dfb10aa [monitoring] job-hunter: panels 6-9 for comp_points tables + trends
Append the structured-comp dashboard surface to the job-hunter
dashboard:

Panel 6 — Per-company salary by level (p50 base, GBP table).
Panel 7 — Total-comp heatmap per (company, level), p50 GBP.
Panel 8 — Comp-point volume by source (daily time-series).
Panel 9 — Base-salary trend (p50) over time for the top 5 companies.

Adds templating: $location (multi, default london), $level (single,
default senior), $company (multi, default all) — populated from
comp_points + levels metadata so the selection reflects what was
actually ingested.

Closes: code-5ph
2026-04-19 18:50:48 +00:00
Viktor Barzin
a8280e77b6 [broker-sync] unsuspend IMAP + Panel 15 RSU vest reconciliation (Phase D)
Activates the Schwab/InvestEngine IMAP ingest CronJob that's been
scaffolded-but-suspended since Phase 2 of broker-sync, now that the
Schwab parser can detect vest-confirmation emails. Runs nightly 02:30 UK.

Current behaviour once deployed:
  - Trade confirmations (Schwab sell-to-cover, InvestEngine orders) →
    Activity rows posted to Wealthfolio. Unchanged.
  - Release Confirmations (Schwab RSU vests) → parser returns gross-vest
    BUY + sell-to-cover SELL Activities (to Wealthfolio) and a VestEvent
    object (NOT YET persisted — Postgres sink + DB grant pending; see
    follow-up under code-860). Vest detection uses a subject/body
    heuristic that will need tightening against a real email fixture.

Panel 15 of the UK payslip dashboard added: per-vest-month join of
payslip.rsu_vest vs rsu_vest_events (gross_value_gbp, tax_withheld_gbp)
with delta columns. Tax-delta-percent coloured green/orange/red at
0/2%/5% thresholds. Table is empty until broker-sync starts persisting
VestEvents — harmless until then.

Before applying:
  - Verify IMAP creds in Vault (secret/broker-sync: imap_host,
    imap_user, imap_password, imap_directory) are still valid.
  - Empty vest-event table is expected; delta columns show NULL until
    the postgres sink lands.

Part of: code-860
2026-04-19 18:29:01 +00:00
Viktor Barzin
1c0e1bcdde [payslip-ingest] ActualBudget payroll sync CronJob + Panel 14 (Phase C)
Wires the daily ActualBudget deposit sync from the payslip-ingest app into
K8s as a CronJob, and adds dashboard Panel 14 to overlay bank deposits
against payslip net_pay.

CronJob: actualbudget-payroll-sync in payslip-ingest namespace, runs
02:00 UTC. Calls `python -m payslip_ingest sync-meta-deposits`, which
hits budget-http-api-viktor in the actualbudget namespace and upserts
matching Meta payroll deposits into payslip_ingest.external_meta_deposits.

ExternalSecret extended with three new Vault keys:
  - ACTUALBUDGET_API_KEY (same as actualbudget-http-api-viktor's env API_KEY)
  - ACTUALBUDGET_ENCRYPTION_PASSWORD (Viktor's budget password)
  - ACTUALBUDGET_BUDGET_SYNC_ID (Viktor's sync_id)

These must be seeded at secret/payslip-ingest in Vault before the
CronJob will run — it'll CrashLoop on missing env vars otherwise. First
run can be triggered on demand via `kubectl -n payslip-ingest create
job --from=cronjob/actualbudget-payroll-sync initial-sync`.

Panel 14 plots monthly SUM(external_meta_deposits.amount) vs
SUM(payslip.net_pay), plus a delta bar series — |delta| > £50 flags
likely parser drift on net_pay.

Part of: code-860
2026-04-19 18:21:20 +00:00
Viktor Barzin
fca3dd4976 [monitoring] uk-payslip: Panel 2 uses COALESCE cash_income_tax; Panel 4 flags NULL
Phase A of RSU tax spike fix. Two changes:

1. Panel 2 "Monthly cash flow (RSU stripped)" plotted raw income_tax despite
   the title. Switch to COALESCE(cash_income_tax, income_tax) so the chart
   is honest once the Phase B back-fill populates cash_income_tax on
   variant-A slips. For slips where cash_income_tax is already populated
   (variant B, 2024+) the spike is removed immediately.

2. Panel 4 "Data integrity" now surfaces rows where cash_income_tax is NULL
   on vest months (rsu_vest > 0). New status value NULL_CASH_TAX (orange)
   highlights the back-fill remaining population — expected to drop to 0
   after Phase B lands.

Part of: code-860
2026-04-19 18:04:05 +00:00
Viktor Barzin
f4d3fdb2e3 [monitoring] uk-payslip: drop RSU-vest annotations
Vertical orange markers at every vest month added more visual noise
than signal. Panel 13 (cash-only) already conveys the "no spike on
vest months" story without needing markers across panels 1/2/3/7/11/12.
2026-04-19 17:32:49 +00:00
Viktor Barzin
a641dc744f [monitoring] uk-payslip: RSU vest annotations + cash-only tax panel
Panel 11 stacks RSU-attributed income tax on top of cash PAYE, which
is mathematically correct but emotionally misleading since RSU tax is
withheld at source via sell-to-cover and never hits the bank. Adopts
the two-view convention: Panel 11 keeps the full PAYE picture; new
Panel 13 shows cash-only deductions. Dashboard-level "RSU vests"
annotation paints orange markers on every vest month across all
timeseries panels, with tooltips like "RSU vest: £31232 gross /
£15257 tax withheld".

Shifts Panels 4/5/6/8/9/10 down by 9 rows to make room for Panel 13
at y=29.
2026-04-19 17:24:35 +00:00
Viktor Barzin
e7ce545da2 [job-hunter] Add infra stack + Grafana dashboard + n8n digest workflow
New service stack at stacks/job-hunter/ mirroring the payslip-ingest
pattern: per-service CNPG database + role (via dbaas null_resource),
Vault static role pg-job-hunter (7d rotation), ExternalSecrets for app
secrets and DB creds, Deployment with alembic-migrate init container,
ClusterIP Service, Grafana datasource ConfigMap.

Grafana dashboard job-hunter.json in Finance folder: new roles per
day, source breakdown, top companies, GBP salary distribution, recent
roles table (sorted by parse confidence then salary).

n8n weekly-digest workflow calls POST /digest/generate with bearer
auth every Monday 07:00 London; digest_runs table provides
idempotency.

Refs: code-snp

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 17:09:29 +00:00
Viktor Barzin
7cb44d7264 [registry] Stop recurring orphan OCI-index incidents — detection + prevention + recovery
Second identical registry incident on 2026-04-19 (first 2026-04-13): the
infra-ci:latest image index resolved to child manifests whose blobs had been
garbage-collected out from under the index. Pipelines P366→P376 all exited
126 "image can't be pulled". Hot fix (a05d63e / 6371e75 / c113be4) restored
green CI but left the underlying bug unaddressed.

Root cause: cleanup-tags.sh rmtrees tag dirs on the registry VM daily at
02:00, registry:2's GC (Sunday 03:25) walks OCI index children imperfectly
(distribution/distribution#3324 class). Nothing verified pushes end-to-end;
nothing probed the registry for fetchability; nothing caught orphan indexes.

Phase 1 — Detection:
 - .woodpecker/build-ci-image.yml: after build-and-push, a verify-integrity
   step walks the just-pushed manifest (index + children + config + every
   layer blob) via HEAD and fails the pipeline on any non-200. Catches
   broken pushes at the source.
 - stacks/monitoring: new registry-integrity-probe CronJob (every 15m) and
   three alerts — RegistryManifestIntegrityFailure,
   RegistryIntegrityProbeStale, RegistryCatalogInaccessible — closing the
   "registry serves 404 for a tag that exists" gap that masked the incident
   for 2+ hours.
 - docs/post-mortems/2026-04-19-registry-orphan-index.md: root cause,
   timeline, monitoring gaps, permanent fix.

Phase 2 — Prevention:
 - modules/docker-registry/docker-compose.yml: pin registry:2 → registry:2.8.3
   across all six registry services. Removes the floating-tag footgun.
 - modules/docker-registry/fix-broken-blobs.sh: new scan walks every
   _manifests/revisions/sha256/<digest> that is an image index and logs a
   loud WARNING when a referenced child blob is missing. Does NOT auto-
   delete — deleting a published image is a conscious decision. Layer-link
   scan preserved.

Phase 3 — Recovery:
 - build-ci-image.yml: accept `manual` event so Woodpecker API/UI rebuilds
   don't need a cosmetic Dockerfile edit (matches convention from
   pve-nfs-exports-sync.yml).
 - docs/runbooks/registry-rebuild-image.md: exact command sequence for
   diagnosing + rebuilding after an orphan-index incident, plus a fallback
   for building directly on the registry VM if Woodpecker itself is down.
 - docs/runbooks/registry-vm.md + .claude/reference/service-catalog.md:
   cross-references to the new runbook.

Out of scope (verified healthy or intentionally deferred):
 - Pull-through DockerHub/GHCR mirrors (74.5% hit rate, no 404s).
 - Registry HA/replication (single-VM SPOF is a known architectural
   choice; Synology offsite covers RPO < 1 day).
 - Diun exclude for registry:2 — not applicable; Diun only watches
   k8s (DIUN_PROVIDERS_KUBERNETES=true), not the VM's docker-compose.

Verified locally:
 - fix-broken-blobs.sh --dry-run on a synthetic registry directory correctly
   flags both orphan layer links and orphan OCI-index children.
 - terraform fmt + validate on stacks/monitoring: success (only unrelated
   deprecation warnings).
 - python3 yaml.safe_load on .woodpecker/build-ci-image.yml and
   modules/docker-registry/docker-compose.yml: both parse clean.

Closes: code-4b8

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 17:08:28 +00:00
Viktor Barzin
df2c53db8d [infra] TrueNAS decommission — remove active references from Terraform + configs
TrueNAS VM 9000 at 10.0.10.15 was operationally decommissioned 2026-04-13.
The subagent-driven doc sweep in 5a0b24f5 covered the prose. This commit
removes the remaining in-code references:

- reverse-proxy: drop truenas Traefik ingress + Cloudflare record
  (truenas.viktorbarzin.me was 502-ing since the VM stopped), drop
  truenas_homepage_token variable.
- config.tfvars: drop deprecated `truenas IN A 10.0.10.15`, `iscsi CNAME
  truenas`, and the commented-out `iscsi`/`zabbix` A records.
- dashy/conf.yml: remove Truenas dashboard entry (&ref_28).
- monitoring/loki.yaml: change storageClass from the decommissioned
  `iscsi-truenas` to `proxmox-lvm` so a future re-enable has a valid SC
  (Loki is currently disabled).
- actualbudget/main.tf + freedify/main.tf: update new-deployment
  docstrings to cite Proxmox host NFS instead of TrueNAS.
- nfs-csi: add an explanatory comment to the `nfs-truenas` StorageClass
  noting the name is historical — 48 bound PVs reference it, SC names
  are immutable on PVs, rename not worth the churn.

Also cleaned out-of-band:
- Technitium DNS: deleted `truenas.viktorbarzin.lan` A and
  `iscsi.viktorbarzin.lan` CNAME records.
- Vault: `secret/viktor` → removed `truenas_api_key` and
  `truenas_ssh_private_key`; `secret/platform.homepage_credentials.reverse_proxy.truenas_token` removed.
- Terraform-applied: `scripts/tg apply -target=module.reverse-proxy.module.truenas`
  destroyed the 3 K8s/Cloudflare resources cleanly.

Deferred:
- VM 9000 is still stopped on PVE. Deletion (destructive) awaits explicit
  user go-ahead.
- `nfs-truenas` StorageClass name retained (see nfs-csi comment above).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 16:57:05 +00:00