nginx's not_modified_filter evaluated If-Match headers forwarded by
Traefik's forwardAuth, returning 412 and breaking CalDAV VTODO updates
from macOS/iOS Reminders. Switch to OpenResty and clear conditional
headers with Lua before proxy processing.
Cloudflare cannot proxy raw TCP/1688 (KMS protocol). Switch
kms.viktorbarzin.me from CF-proxied CNAME to direct A/AAAA so
clients can reach the vlmcsd LoadBalancer (10.0.20.200) via the
existing pfSense WAN port-forward for 1688.
Verified end-to-end: vlmcs against 176.12.22.76:1688 completes
the KMS V4 handshake for Office Professional Plus 2019.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adopts the always-latest convention used by job-hunter, payslip-ingest,
and fire-planner: image SHA lives in stacks/priority-pass/terragrunt.hcl
inputs, default in main.tf var. The priority-pass GHA build workflow
auto-commits new SHAs to this file on every successful push.
- Add `variable "image_tag"` (default = current value 7c01448d).
- Both containers now use `local.{frontend,backend}_image` interpolation.
- Replace symlinked terragrunt.hcl with a real file so the stack-local
inputs block can override image_tag (mirrors payslip-ingest exactly).
State note: priority-pass TF state is currently empty (Tier 1 PG migration
skipped this stack). A subsequent `terragrunt import` is required to
adopt the live deployment + namespace + ingress before running apply.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two coordinated fixes for the same root cause: Postfix's smtpd_upstream_proxy_protocol
listener fatals on every HAProxy health probe with `smtpd_peer_hostaddr_to_sockaddr:
... Servname not supported for ai_socktype` — the respawned daemons get
throttled by the postfix master, and real client connections that land
mid-respawn time out. We saw this as a ~50% timeout rate on public 587
from inside the cluster.
Layer 1 (book-search) — stacks/ebooks/main.tf:
SMTP_HOST mail.viktorbarzin.me → mailserver.mailserver.svc.cluster.local
Internal services should use ClusterIP, not hairpin through pfSense+HAProxy.
12/12 OK in <28ms vs ~6/12 timeouts on the public path.
Layer 2 (pfSense HAProxy) — stacks/mailserver + scripts/pfsense-haproxy-bootstrap.php:
Add 3 non-PROXY healthcheck NodePorts to mailserver-proxy svc:
30145 → pod 25 (stock postscreen)
30146 → pod 465 (stock smtps)
30147 → pod 587 (stock submission)
HAProxy uses `port <healthcheck-nodeport>` (per-server in advanced field) to
redirect L4 health probes to those ports while real client traffic keeps
going to 30125-30128 with PROXY v2.
Result: 0 fatals/min (was 96), 30/30 probes OK on 587, e2e roundtrip 20.4s.
HAProxy check interval (`inter`) dropped 120000ms → 5000ms now that the
log-spam concern is gone.
`option smtpchk EHLO` was tried first but flapped against postscreen (multi-line
greet + DNSBL silence + anti-pre-greet detection trip HAProxy's parser → L7RSP).
Plain TCP accept-on-port check is sufficient for both submission and postscreen.
Updated docs/runbooks/mailserver-pfsense-haproxy.md to reflect the new healthcheck
path and mark the "Known warts" entry as resolved.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Bug: timeseries panels were empty before 2024-04-10. Cause was the
complete_dates CTE filtering to "every active account has a row for
this date" -- which excluded every day before the most-recently-added
account first appeared. The 6th account (Trading212 Invest GIA) only
started 2024-04-10, so 4 years of legitimate historical data
(2020-06-07 onwards, when the user genuinely had fewer accounts) got
hidden.
New pattern across panels 5/6/7/8/9/12/13: replace complete_dates with
max_complete cutoff. Compute the most-recent date where all current
accounts have a row, then include every historical date up to and
including that day. Partial-today is still excluded automatically.
Historical days with fewer accounts now show as their actual smaller
sums -- which is the correct historical net worth at the time.
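A minimal sketch of the cutoff pattern, using the table names from the
skew-fix commit further down (the deployed panel SQL also applies
$__timeFilter and per-panel aggregation):
  WITH complete_dates AS (
    SELECT d.valuation_date
    FROM daily_account_valuation d
    JOIN accounts a ON a.id = d.account_id
    GROUP BY d.valuation_date
    HAVING COUNT(*) >= (SELECT COUNT(*) FROM accounts)
  ),
  max_complete AS (
    SELECT MAX(valuation_date) AS cutoff FROM complete_dates
  )
  SELECT d.valuation_date AS "time", SUM(d.total_value) AS net_worth
  FROM daily_account_valuation d
  WHERE d.valuation_date <= (SELECT cutoff FROM max_complete)
  GROUP BY d.valuation_date
  ORDER BY d.valuation_date;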
Verified via PG: new pattern returns 2,159 distinct days from
2020-06-07 to 2026-05-05 (vs the previous 391 from 2024-04-10).
Per-account first-seen dates:
InvestEngine ISA - 2020-06-07
Schwab US workplace - 2020-11-17
InvestEngine GIA - 2022-03-17
Fidelity UK Pension - 2022-05-16
Trading212 ISA - 2024-04-08
Trading212 Invest GIA - 2024-04-10 (was the bottleneck)
Pipeline 1846 apply step errored at ~5s post-start, leaving the
e2e probe shell-quoting fix unapplied. K8s state was patched
manually as a temporary unblock; this empty commit retriggers CI
to land the source fix and resolve drift.
[ci]
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The 2026-05-02 change that added the Brevo defensive-unblock step
to the email-roundtrip-monitor cron contained an apostrophe in a
Python comment ("wasn't"). The whole script is wrapped in shell
single quotes (python3 -c '...'), so the apostrophe terminated
the shell string. Python only parsed up to the apostrophe and
raised IndentationError on the now-bodyless try: block; everything
after was handed to /bin/sh which complained about "try::" and
unmatched parens. Result: every probe run since 2026-05-02 00:41 UTC
crashed before it could push, and the "Email Roundtrip E2E" Uptime
Kuma push monitor went DOWN with "No heartbeat in the time window".
Fix: rewrite the comment without an apostrophe and add a banner
warning so the next person editing this heredoc does not regress.
Validated: shell parses (bash -n), Python compiles (py_compile)
with the wrapping single quotes intact.
Re-applies the milestone annotation commit reverted in 0ef36aec. The
earlier "nothing loads / syntax error" was a red herring: Vault had
rotated the wealthfolio_sync DB password 7 days prior, the K8s Secret
picked it up automatically (pg-sync sidecar still working), but the
Grafana datasource ConfigMap is baked at TF-apply time so Grafana was
sending the old password. Every panel + the new annotation alike
failed with: pq password authentication failed for user wealthfolio_sync.
Fix today: refresh the datasource ConfigMap and roll Grafana.
scripts/tg apply -target=kubernetes_config_map.grafana_wealth_datasource
kubectl -n monitoring rollout restart deploy/grafana
Annotation source verified live via /api/ds/query: SQL returns 5
milestone rows correctly. Dashboard charts now show vertical dashed
lines at GBP100k 2021-11-01, GBP250k 2023-07-18, GBP500k 2024-09-19,
GBP750k 2025-08-26, GBP1M 2026-04-18.
KNOWN FOLLOW-UP: Vault rotates pg-wealthfolio-sync every 7 days
(static role). Today's failure will recur unless the Grafana
datasource auto-refreshes. Options:
1. Annotate Grafana deploy with stakater/reloader so it restarts
when wealthfolio-sync-db-creds Secret changes.
2. Switch datasource provisioning to read password from an env var
sourced from the Secret instead of baking into the ConfigMap.
Combined with reloader, picks up rotation cleanly.
Inspired by the user's "Journey to £1M" reference — adds vertical
dashed lines on every timeseries panel at the date net worth first
crossed each round threshold (£100k, £250k, £500k, £750k, £1M).
Implementation: a dashboard-level annotation source ("Milestones",
purple) backed by a PG query that finds the MIN(valuation_date) where
SUM(total_value) >= each threshold. The query returns (time, text)
pairs, e.g. "2026-04-18 → £1M 🎉". Annotations attach to all
timeseries panels automatically; auto-extends as future thresholds
are crossed.
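A sketch of that query, with the threshold list written out (table and
column names per the wealth PG mirror; the exact deployed SQL may differ):
  WITH daily AS (
    SELECT valuation_date, SUM(total_value) AS nw
    FROM daily_account_valuation
    GROUP BY valuation_date
  )
  SELECT MIN(d.valuation_date)::timestamp AS "time",
         t.label AS text
  FROM daily d
  JOIN (VALUES (100000,  '£100k'),
               (250000,  '£250k'),
               (500000,  '£500k'),
               (750000,  '£750k'),
               (1000000, '£1M 🎉')) AS t(threshold, label)
    ON d.nw >= t.threshold
  GROUP BY t.threshold, t.label
  ORDER BY t.threshold;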
Verified against current data:
£100k → 2021-11-01 £250k → 2023-07-18 £500k → 2024-09-19
£750k → 2025-08-26 £1M → 2026-04-18 🎉
Future work (per user request): add a "Journey" stat-card row at the
top mirroring the reference (date achieved + months from previous).
Now that priority-pass has its own GitHub repo (ViktorBarzin/priority-pass)
with a working GHA build pipeline that pushes to DockerHub, switch the TF
deployment pin from registry.viktorbarzin.me to docker.io/viktorbarzin so
future automated rollouts (once the repo is registered with Woodpecker)
land on the matching image source.
[ci skip]
Three fixes for boarding passes uploaded as iPhone screenshots (input
includes phone status bar, partial Tesco card below, etc.):
1. Detect the card region first and crop to it. All proportional
coordinates (Step 8 text replacement, Step 9 logo removal) are now
card-relative instead of full-image-relative — they were landing
in the wrong region on tall screenshots, putting "Priority" text
inside the QR area and leaving a yellow icon box at the bottom.
2. Step 8 now picks the LONGEST contiguous dark-row run inside a wider
y-band, instead of using the dark-row [first, last] span. This
distinguishes the QUEUE value text from the QUEUE label above it
(both are dark blue in the original) so the erase rectangle no
longer eats into the labels.
3. QR container padding bumped 8% → 12% so QR/container ratio matches
the ~74-80% golden look.
Verified end-to-end against three real samples saved by the previous
build's training-data feature, plus the original non-priority.jpeg
fixture: outputs now match priority.jpeg layout.
[ci skip]
- Adopt UserLoginStage (default-authentication-login) into Terraform
and pin session_duration=weeks=4 so users stay logged in across
browser restarts. There is no Brand.session_duration in 2026.2.x;
UserLoginStage is the only correct lever.
- Cap anonymous Django sessions at 2h via
AUTHENTIK_SESSIONS__UNAUTHENTICATED_AGE on server + worker pods
(default is days=1). Bots, healthcheckers, and partial flows now
get reaped within 2h instead of accumulating for a day.
Implementation note: the env var is injected via server.env /
worker.env rather than authentik.sessions.unauthenticated_age,
because authentik.existingSecret.secretName is set, which makes the
chart skip rendering its own AUTHENTIK_* Secret. authentik.* values
are therefore inert in this stack -- this is documented in
.claude/reference/authentik-state.md so future edits use the right
surface.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Backend changes:
- transformers.py: QR container now sized to actual qr_bbox + 8% padding
(was fixed at 45% of card width). When QR was wider than 45% of card,
the leftover-pixel branch color-remapped QR pixels outside the
container, breaking the scan. New container always encloses qr_mask.
- main.py: persist input + output + json metadata under
$UPLOAD_DIR/<airline>/<ts-uuid>-{input.<ext>,output.png,*.json} for
future training. Failure to save is logged, never breaks the API.
Infra:
- New PVC priority-pass-uploads (1Gi proxmox-lvm-encrypted, 10Gi
autoresize cap) — encrypted because boarding passes contain PII.
- Deployment strategy → Recreate (RWO requirement).
- Volume + volumeMount + UPLOAD_DIR env on backend container.
Applied via kubectl (TF state for this stack is empty — see prior
commit). New pod priority-pass-77956b64fb rolled out, PVC bound,
test transform succeeded, sample written to /data/uploads/ryanair/.
[ci skip]
Frontend bug fix: photo input forced camera on mobile via
capture="environment". Added separate "Choose from Gallery" / "Take
Photo" buttons so users can pick from their photo library.
Backend image unchanged; pin synced from stale v8 to live SHA
ae1420a0 (the v8 tag was never pushed to the registry).
Image built locally and pushed to registry.viktorbarzin.me. The
priority-pass project directory isn't under git, so deployment was
applied via kubectl set image (matches existing pattern — TF state
for this stack is empty).
[ci skip]
Make daily movements visible on the line charts. The y-axis still spans
~£700k–£1M so an £8k daily move is ~1% of vertical range and easy to
miss when only the line is drawn.
Changes per panel:
* 5 (Net worth): showPoints never→always, pointSize 4→5, fillOpacity 20→10
* 6 (Net contrib vs market): showPoints never→always, pointSize 4→5
* 7 (Growth over time): showPoints never→always, pointSize 4→5, fillOpacity 50→25
* 8 (Per-account stacked): showPoints never→always (kept stacking fill at 70)
* 9 (Cash vs invested stacked): showPoints never→always (kept stacking fill at 70)
Each daily value now renders as a visible dot, so even if the line
appears flat at this scale, the per-day points trace the wiggle. Lighter
fill on the unstacked panels lets the line + points dominate visually.
Caveat: the fundamental "£8k on a £1M base" visibility issue is best
solved with a dedicated "Daily change" delta panel — happy to add one
on next pass if this isn't enough.
Fix: panels 5–9 had `AS \"time\"` (literal backslash-quote sequence
embedded in the SQL string). PostgreSQL parsed that as a syntax error
at the leading backslash:
ERROR: syntax error at or near "\"
LINE 1: ...complete_dates)) SELECT valuation_date::timestamp AS \"time\"
Root cause: the patch script for the skew-resilient queries (commit
628f5a0d) used a Python f-string with `\\\"time\\\"`, which produces
a literal backslash-quote in the Python string. When that string
was JSON-encoded the backslash was preserved verbatim instead of
collapsed to plain `"time"`.
Replaces all five occurrences with the correct `AS "time"` form.
Verified the corrected query against PG returns 7 daily net-worth
rows for 04-25..05-01 as expected.
Two adjustments to make daily movements visible:
1. Default time range: now-5y → now-180d. The timeseries charts (Net
worth, Net contribution vs market value, Growth, Per-account
stacked, Cash vs invested) auto-fit their y-axis to the data range
in view. Over 5 years, daily £1k–£10k moves are ~1% of axis range
and visually invisible against the cumulative trend. Over 6
months, the same daily moves dominate. Yearly bar charts (12, 13)
are unaffected — they aggregate by calendar year and don't filter
on $__timeFilter.
2. Decimals → 2 on every currency panel (1, 2, 3, 5–9, 13, 15, 16)
and every percent panel (4, 14). Stat panels now show pennies on
currency and 0.01% on rates; chart y-axis ticks are likewise more
precise. Honest caveat: pennies on a £1M number don't make the
absolute readout easier — to see "today changed by £8,358" cleanly
we'd want a dedicated delta panel; pending user direction.
Widen the time picker manually to recover the 5-year view; the new default
just zooms into the last 6 months.
Bug witnessed 2026-05-01: dashboard "Net worth (current)" showed £88k
instead of £1.03M because at 02:00 UTC an external trigger refreshed
ONE account (Trading212 ISA), creating its 05-01 daily_account_valuation
row. The 5 other accounts still had their last row at 04-30. The panel
SQL `WHERE valuation_date = (SELECT MAX(valuation_date))` then summed
only the single account that had a 05-01 row.
Two new SQL patterns adopted across all 15 affected panels:
1. Stat / barchart "current snapshot" panels (1, 2, 3, 4, 11, 14, 15,
16): latest-per-account stitching —
WITH latest AS (SELECT DISTINCT ON (d.account_id) ...
FROM daily_account_valuation d
JOIN accounts a ON a.id = d.account_id
ORDER BY d.account_id, d.valuation_date DESC)
gives a coherent "now" snapshot regardless of refresh skew (fuller
sketch after this list), and the inner join filters out orphan/deleted
accounts (one such was adding a stale £33k from 04-17). 12-month panels
add a parallel `ago` CTE picking each account's row closest to (d_now - 12mo).
2. Time-series / yearly panels (5, 6, 7, 8, 9, 12, 13): complete-days-
only filter —
WITH active_accounts AS (SELECT COUNT(*) AS n FROM accounts),
complete_dates AS (SELECT valuation_date
FROM daily_account_valuation d
JOIN accounts a ON a.id = d.account_id
GROUP BY valuation_date
HAVING COUNT(*) >= (SELECT n FROM active_accounts))
so a partial today never renders as a chart dip. The day rejoins
the chart automatically once the daily 16:00 UTC sync writes rows
for every account.
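For reference, a fuller sketch of pattern 1; the selected columns are
illustrative, not the exact panel SQL:
  WITH latest AS (
    SELECT DISTINCT ON (d.account_id)
           d.account_id, d.valuation_date, d.total_value
    FROM daily_account_valuation d
    JOIN accounts a ON a.id = d.account_id
    ORDER BY d.account_id, d.valuation_date DESC
  )
  SELECT SUM(total_value) AS net_worth_now FROM latest;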
Verified end-to-end against live PG: new queries produce £1,033,734
(matches the 6 active accounts' true latest sum) where the old query
gave £88k.
Restart-only didn't refresh the wealth Grafana dashboard — verified
empirically: a fresh `daily_account_valuation` row only lands when a
PortfolioJob runs with ValuationRecalcMode != None, and Wealthfolio's
internal schedulers don't trigger that path:
- 6h quotes scheduler refreshes the `quotes` table only.
- 4h broker scheduler short-circuits on missing `sync_refresh_token`.
The right knob is `POST /api/v1/market-data/sync`. Replaced the
rollout-restart CronJob (+ its SA/Role/RoleBinding) with a curl-based
CronJob that logs in (`POST /api/v1/auth/login`) then POSTs to
`/api/v1/market-data/sync` with the session cookie. Backfills missing
days via IncrementalFromLast in one call.
Schedule 16:00 UTC (= 17:00 BST):
* After UK market close (16:30 BST = 15:30 UTC), EOD UK prices settled.
* US market open ~2.5h, intra-day US quotes fresh.
* pg-sync next :07 tick mirrors → Grafana refresh ≤5m → fresh data
by ~17:12 BST, comfortably before the 18:00 BST target.
Plaintext password lives in Vault `secret/wealthfolio.web_password`,
flows via the existing `dataFrom.extract` ExternalSecret — no extra
ESO wiring needed. Verified end-to-end: API call backfilled 04-26
through 04-29, pg-sync mirrored, PG now shows rows up to today.
Top row goes from 5 → 7 stat panels (widths 4+4+4+3+3+3+3=24):
- Net worth, Net contribution, Growth shrink from w=5 to w=4.
- ROI % shrinks from w=5 to w=3 (now sits at x=12).
- 12mo return slides from x=20/w=4 to x=15/w=3.
- New: 12mo contrib (id=15, currency, blue) at x=18 — net contributions
added in the trailing 12 months.
- New: 12mo gain (id=16, currency, red/green) at x=21 — pure market gain
in £ over the trailing 12 months (12mo Δnet-worth − 12mo contribs).
Live values verified against PG: contrib_12mo=£245k, gain_12mo=£172k,
sum = £417k = nw_now − nw_ago, return = 23.51%.
Relax the phpIPAM cronjob from a 5-minute to an hourly schedule, removing
the recurring disk-write spikes on PVE sdc. The cronjob was the heaviest
single contributor in our hourly fan-out investigation (11.2 MB/s burst
when it fired). Kea DDNS still handles real-time DNS auto-registration;
phpIPAM inventory just lags by up to 1h, which we don't need fresher.
Docs (dns.md, networking.md, .claude/CLAUDE.md) updated to match.
Eliminates the OOM-on-face-detection-burst class of incidents (2026-04-26).
VPA upper for immich-server is 2.98Gi steady-state; the prior 4Gi limit was
1.34x upper and still got SIGKILL'd when face-detection bursts pushed
transient RSS past 4Gi. 8Gi gives 2.7x VPA upper headroom.
The kyverno tier-2-gpu default quota is 12Gi requests.memory which can't fit
8Gi (server) + 3.5Gi (ML) + 3Gi (PG) + backup CronJobs simultaneously. Opts
the namespace into the kyverno custom-quota exclude rule and overrides with
20Gi (~4.5Gi headroom) — same pattern as woodpecker/nvidia.
Root cause of 502/503/decode errors clustered at 19:20 BST 2026-04-26: immich-server
hit its 3500Mi memory limit during a face-detection burst and was OOMKilled (Exit Code
137). VPA upperBound is 3050Mi but real-world bursts crossed it; with the single pod
running both API and microservices workers, the OOM took the API down for ~30s of
restart, surfacing as PlatformException image decode + 502 on uploads + 503 on
ActivityService to the iOS app.
Bump immich-server requests=limits to 4096Mi (per CLAUDE.md "upperBound x 1.3 for
volatile workloads" rule, with headroom over the OOM mark). Quota math: 9680Mi used -
2000Mi old req + 4096Mi new req = 11776Mi, fits the tier-2-gpu 12Gi cap.
Switch both immich-server and immich-machine-learning to Recreate strategy: the
namespace tier-2-gpu quota is too tight for RollingUpdate to keep an old + new pod up
during apply (transient 13776Mi > 12Gi cap, see "ResourceQuota blocks rolling updates"
in CLAUDE.md). With single replicas and Recreate, future memory tweaks no longer
require manual scale-to-0 dance.
Verified: new pod has limits.memory=4Gi, quota usage stable at 11776Mi/12Gi, immich
API serving normally.
Note: a pending node_selector drift on immich-machine-learning (gpu=true ->
nvidia.com/gpu.present=true) also reconciled in this apply; the canonical
NVIDIA operator label was already on the GPU node, so no scheduling impact.
The default workflow truncated the failed-stack output at `tail -5`,
which only captured the trailing source-line indicator (`│ 45: resource
…`) and dropped the actual `Error: …` line above it. Bump to `tail -50`
so the real error is visible without re-running locally to reproduce.
Also fix the pre-warm step's FIRST_STACK detection — `head -1 file1
file2 | head -1` returns the file header (`==> .platform_apply <==`),
not the first stack name, so the cd then fails with "no such file or
directory". Use `cat | head -1` instead.
Pure logging-and-pre-warm change; no stacks touched, so this commit is
a no-op for the apply step.
wealth: move Activity log table from y=45 to y=77; the three barcharts
(Yearly return, Annual change, Per-account ROI) shift up by 14 to fill
the gap.
uk-payslip: move Sankey "where the money went" from y=80 to y=48 (right
above the table block); the three tables (Data integrity, All payslips,
YTD reconciliation) shift down by 14 so all four tables (4, 5, 6, 9) sit
contiguously at the bottom.
fire-planner and job-hunter still have intentional side-by-side
table/chart pairings; left untouched pending user direction on whether
to break them.
Trailing 12-month investment return % was a full-width stat at y=59.
Now sits inline with Net worth / Contribution / Growth / ROI as the
fifth headline number — top-row stats reflowed from w=6 (×4) to w=5
(×4) + w=4 (×1). Title shortened to "12mo return" so it fits.
Panels below the old row shifted up by 4 rows to close the gap.
iOS NSURLSession held a dead TCP/TLS socket past Traefik's 180s idle close,
then errored with NSURLErrorDomain -1005 on the next thumbnail. Bumping
Traefik's idle timeout to 600s pushes the bug to "app idle for >10 min" --
much rarer in normal use. Verified with /home/wizard/.claude/immich-scroll-sim.py
keepalive probe: 200s idle, mean reuse latency +1.8ms over warmup (was ~50ms
TLS handshake penalty before). Synthesis: ~/.claude/immich-debug/synthesis.md.
The weekly backup mounts the same RWO PVC (proxmox-lvm-encrypted) as the
main nextcloud deployment. Single-node attach — the backup pod can never
mount the volume if it lands on a different node, and was stuck in
ContainerCreating for 6+ hours when cron fired today.
Add pod_affinity (required, hostname topology) so the backup co-locates
with the nextcloud app pod. Discovered via cluster-health probe; manual
verify run scheduled on k8s-node3 next to nextcloud's pod and completed
the rsync in seconds.
Switch the RSU stack from "after band-aware tax" to gross. Receipt
total is now pre-sacrifice gross compensation; bar − pension stack
≈ ytd_gross reported on the final March payslip / P60.
Verified alignment for 2025/26: bar−pension = £266,752 vs P60
ytd_gross = £268,127 — gap of £1,375 ≈ "other taxable" (benefits,
overtime). Remaining year-level gaps are upstream parser/ingest
issues, not dashboard logic:
- 2024/25 +£27k: March 2025 payslip parsed bonus=£26,969 but never
propagated it into gross_pay/income_tax. Receipt is more
accurate than ytd_gross here.
- 2023/24 −£36k: Feb 2024 payslip row appears to be missing from
the table; ytd_gross has it, sum(gross_pay) doesn't.
- 2022/23 −£10k: variant A→B transition residual.
SQL simplified — band-aware CTE chain dropped (no longer needed for
this panel since RSU is shown gross).
Mobile timeline scrubs prefetch ~100 thumbs in <1s, which exhausted the
immich-rate-limit (avg=500, burst=5000) and produced a cascade of HTTP
429s. CrowdSec's local http-429-abuse scenario then fired captcha:1 on
the source IP (alert #291, 2026-04-25 — owner's Hyperoptic IPv6).
Two changes:
- crowdsec: add a second whitelist doc (viktor/immich-asset-paths-whitelist)
filtering events by Immich asset paths so they never feed leaky buckets.
Auth endpoints intentionally excluded — brute-force protection unchanged.
- traefik: raise immich-rate-limit avg=500->1000, burst=5000->20000 so
legitimate mobile scrubs don't produce 429s in the first place.
The salary field on the payslip is pre-pension-sacrifice, so the
"Salary (gross)" stack already silently included the salary-sacrifice
pension contribution. Split it out so pension is explicitly visible:
- Salary (cash, post-sacrifice) = salary - pension_sacrifice
- Pension (salary sacrifice, untaxed) = pension_sacrifice
- Bonus
- RSU vest (after band-aware tax)
Bar total unchanged (just relabels what was already there). Pension
is now visibly counted as income — consistent with "untaxed but real"
framing.
Caveat documented in panel description: receipt total ≠ P60 gross
because P60 reports pre-RSU-tax gross. Receipt shows RSU net of tax
per earlier intent. To exactly match P60, swap rsu_after_tax →
rsu_vest gross.
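A sketch of the relabelled stacks, with payslip column names assumed (the
deployed band-aware RSU calculation is more involved than a single column):
  SELECT tax_year,
         SUM(salary - pension_sacrifice) AS salary_cash,
         SUM(pension_sacrifice)          AS pension_sacrifice,
         SUM(bonus)                      AS bonus,
         SUM(rsu_after_tax)              AS rsu_vest_after_tax
  FROM payslips
  GROUP BY tax_year
  ORDER BY tax_year;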
Move both barchart/timeseries panels into row 4 (y=29, side-by-side
w=12 each, h=10) so the per-tax-year overviews appear right after
the income-tax-and-pension YTD row. Shift panels 13, 4, 5, 6, 8, 9
down by 10 to accommodate.
Final ordering: rows 1–3 = monthly + YTD timeseries (panels 1/7/2/3/11/12),
row 4 = yearly receipt + YTD gross YoY (16/17), then the wider
deduction/integrity/table panels below.
Removed:
- Panel 10 "HMRC Tax Year Reconciliation — Individual Tax API"
→ references hmrc_sync.tax_year_snapshot schema. The hmrc-sync
service / DB has not been deployed, so the panel always errored
with "relation does not exist".
- Panel 14 "Meta payroll: bank deposit vs payslip net pay"
→ references payslip_ingest.external_meta_deposits, which is
created by alembic migration 0007. The deployed payslip-ingest
image is at 0005, so the table doesn't exist.
- Panel 15 "RSU vest reconciliation — payslip vs Schwab"
→ references payslip_ingest.rsu_vest_events, created by migration
0008. Same image-staleness story.
Verified all 14 remaining panels return without error via Grafana
/api/ds/query. SQL for the removed panels is preserved in git history;
re-add when the data sources are actually deployed.
Replace the 7-stack "where total comp went" decomposition with a 3-stack
"what I actually earned" view: salary (gross), bonus (gross), and RSU
vest after band-aware tax (PAYE+NI withheld via sell-to-cover). Skips
income tax / NI / student loan / pension / RSU offset.
Bar height = real income kept across all components. RSU is net of tax
because it's withheld at source and never hits the bank account; salary
and bonus are gross because they're paid in full and taxes are deducted
elsewhere. This is the income-side view where tax is implicit, not the
deduction waterfall.
Per-year RSU after tax: 2020/21 £18k · 2021/22 £39k · 2022/23 £50k ·
2023/24 £26k · 2024/25 £71k · 2025/26 £73k.
Two bugs:
1. Synthetic dates projected onto 1970/71 fell outside the dashboard's
default time range (now-10y → now), so Grafana filtered out every
point. Switched to a sliding 12-month window
(CURRENT_DATE - INTERVAL '12 months') as the projection base (see the
sketch after this list), plus a per-panel timeFrom: "13M" override so
the panel always shows the last 13 months regardless of the dashboard's
time picker.
2. ORDER BY tax_year, pay_date violated Grafana's long→wide conversion
requirement (data must be ascending by time). Wrapped in a CTE and
re-ordered by the synthetic time column. Pivoted result is now a
single wide frame with 7 series (2019/20…2025/26).
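A sketch of the projection from (1); the payslips table and column names
are assumptions:
  WITH proj AS (
    SELECT tax_year,
           -- April maps to month offset 0, March to 11, so every tax year
           -- overlays on a window anchored 12 months back from today
           date_trunc('month', CURRENT_DATE) - INTERVAL '12 months'
             + make_interval(months =>
                 ((EXTRACT(MONTH FROM pay_date)::int + 8) % 12)) AS t,
           ytd_gross
    FROM payslips
  )
  SELECT t AS "time", tax_year, ytd_gross
  FROM proj
  ORDER BY t;  -- ascending by the synthetic time column, per bug (2)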
The default fieldConfig unit (percent on Yearly investment return %,
currencyGBP on Annual change decomposition) was being applied to the
"year" string column too — so x-axis labels rendered as "2024%" and
"£2,024" respectively. Add field overrides on the "year" column to
force unit=string. The earlier "tax_year" panels weren't affected
because "2024/25" doesn't parse as a number; "2024" did.
Wealth dashboard:
- "Yearly growth %" → "Yearly investment return %": switched to
modified-Dietz formula `market_gain / (nw_start + 0.5 × contributions)`
so contributions don't inflate the return. New money in is excluded —
this is portfolio performance, not net-worth change.
- "Trailing 12-month growth %" → "Trailing 12-month investment return %":
same formula, applied to the trailing 12mo window.
Pre-fix vs post-fix:
2020: 155.0% → 5.12% (large contributions on small base)
2021: 344.7% → 26.45%
2022: 26.9% → -25.65% (the actual 2022 bear market)
2023: 123.2% → 41.60%
2024: 87.4% → 25.70%
2025: 46.8% → 8.43%
2026: 16.7% → 3.28% (YTD)
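The sketch, assuming a yearly rollup (nw_start, nw_end, contributions) has
already been derived from daily_account_valuation (the rollup name is
hypothetical):
  SELECT yr,
         100 * (nw_end - nw_start - contributions)  -- market gain only
             / (nw_start + 0.5 * contributions)     -- Dietz denominator
           AS investment_return_pct
  FROM yearly_rollup  -- hypothetical; real SQL builds this from daily rows
  ORDER BY yr;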
UK Payslip dashboard:
- Replaced the per-tax-year stacked bar with a year-over-year line chart:
one line per tax year, X = month-of-tax-year (April→March, projected
onto a 1970/71 fiscal calendar so years overlay), Y = cumulative YTD
gross. Five+ lines visible at a glance for trend comparison.
Wealth (4 new panels at the bottom):
- Trailing 12-month growth % (stat) — % change in net worth over last 12mo.
- Yearly growth % (bar per calendar year) — first→last valuation each year.
- Annual change decomposition (stacked bar) — splits each year's NW change
into "net contributions" (new money in) and "market gain" (everything
else: appreciation, dividends, FX). Answers "did I grow because I saved
or because the market did the work?".
- Per-account ROI % (horizontal bar) — (value − contribution) / contribution
× 100, latest snapshot (sketch below). Excludes accounts with
zero/negative net contribution (Schwab — distorts ratio after RSU sells).
UK Payslip (1 new panel below the yearly receipt):
- Gross composition by tax year (stacked bar) — salary / bonus / RSU vest /
other components per tax year. Bar height = gross pay. Trends in salary
growth, bonus levels, and RSU vest sizing at a glance.
All queries spot-checked via Grafana /api/ds/query.
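A sketch of the ROI panel; a net_contribution column on the valuation rows
is an assumption (the live query may derive it from activity history):
  WITH latest AS (
    SELECT DISTINCT ON (account_id)
           account_id, total_value, net_contribution
    FROM daily_account_valuation
    ORDER BY account_id, valuation_date DESC
  )
  SELECT a.name,
         100 * (l.total_value - l.net_contribution) / l.net_contribution
           AS roi_pct
  FROM latest l
  JOIN accounts a ON a.id = l.account_id
  WHERE l.net_contribution > 0   -- drop zero/negative net contribution
  ORDER BY roi_pct DESC;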
Folder ACL:
- Move uk-payslip + wealth dashboards to a new "Finance (Personal)"
folder; job-hunter + fire-planner stay in "Finance" (open).
- New null_resource calls Grafana's folder permissions API after the
dashboard sidecar materialises the folder, setting an admin-only
ACL ({Admin: 4}). Default Viewer/Editor inheritance is overridden,
so anonymous-Viewer (auth.anonymous=true) is denied. Server-admin
always retains access.
- Verified: anonymous → 403 on uk-payslip + wealth, 200 on
control dashboards (node-exporter); admin → 200 on all.
Wealth cash fix:
- Wealthfolio dumps WORKPLACE_PENSION wrappers entirely into
cash_balance because it doesn't track underlying fund holdings.
Reclassify pension cash as invested in the "Cash vs invested"
panel so the cash series reflects actual uninvested broker cash
(~£16k T212 ISA + Schwab) instead of phantom £154k.
Pre-fix: cash=£153,789 / invested=£870,282 / total=£1,024,071
Post-fix: cash=£16,064 / invested=£1,008,008 / total=£1,024,071
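The reclassification amounts to the following (account_type and
cash_balance names assumed from the mirror schema):
  SELECT d.valuation_date AS "time",
         SUM(CASE WHEN a.account_type = 'WORKPLACE_PENSION'
                  THEN 0 ELSE d.cash_balance END)  AS cash,
         SUM(d.total_value
           - CASE WHEN a.account_type = 'WORKPLACE_PENSION'
                  THEN 0 ELSE d.cash_balance END)  AS invested
  FROM daily_account_valuation d
  JOIN accounts a ON a.id = d.account_id
  GROUP BY d.valuation_date
  ORDER BY d.valuation_date;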
Deleted the 6 NFS PVs orphaned by the Phase 2 rolling and removed
their /srv/nfs/<dir> subtrees on the PVE host (~1.5 GB; vault-2 audit
log was 1.4 GB on its own). Cluster-wide Released-PV sweep on the
proxmox-lvm/encrypted side stays out of scope.
Wealthfolio's daily_account_valuation includes a row with
account_id='TOTAL' that pre-aggregates the per-account values for that
day. Mirroring it into PG verbatim caused every SUM(total_value) in
the Wealth dashboard to double-count (showing ~£2M against actual
~£1M). Drop the synthetic row at the dump step so the PG mirror only
holds real-account rows.
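In effect, the mirror's dump query now excludes the synthetic row (the
sync script itself is not shown in this commit):
  SELECT *
  FROM daily_account_valuation
  WHERE account_id <> 'TOTAL';  -- skip Wealthfolio's pre-aggregated row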
Initial sync after fix: 8,649 DAV rows (was 10,798), net worth resolves
to £1,024,071 — matches the per-account latest snapshot.
New stacks/fire-planner/ mirrors payslip-ingest layout:
- ExternalSecret pulling RECOMPUTE_BEARER_TOKEN from Vault secret/fire-planner
- DB ExternalSecret templating DB_CONNECTION_STRING via static role pg-fire-planner
- FastAPI Deployment (serve), CronJob (recompute-all monthly on 2nd at 09:00 UTC,
scheduled after wealthfolio-sync's 1st at 08:00), ClusterIP Service
- Grafana datasource ConfigMap "FirePlanner" — `database` inside jsonData
(cc56ba29 fix; otherwise Grafana 11.2+ hits "you do not have default database")
Plus:
- vault/main.tf: pg-fire-planner static role (7d rotation), allowed_roles
- dbaas/modules/dbaas/main.tf: null_resource creates fire_planner DB+role
- monitoring/dashboards/fire-planner.json: 9-panel Finance-folder dashboard
(NW timeseries, MC fan chart, success heatmap, lifetime tax bars,
years-to-ruin table, optimal leave-UK stat, ending wealth stat,
UK success-by-strategy bars, sequence-risk correlation table)
- monitoring/modules/monitoring/grafana.tf: register "fire-planner.json" in Finance folder
Apply order:
1. vault stack — creates the static role
2. dbaas stack — creates the database & role
3. external-secrets stack picks up vault-database refs (no change needed)
4. fire-planner stack — first apply with -target=kubernetes_manifest.db_external_secret
before full apply, per the plan-time-data-source pattern
5. monitoring stack — picks up the new dashboard ConfigMap
[ci skip]
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
All 3 vault voters now on proxmox-lvm-encrypted (vault-0 16:18, vault-1
+ vault-2 today). The NFS fsync incompatibility identified in the
2026-04-22 raft-leader-deadlock post-mortem is no longer reachable —
raft consensus log + audit log live on LUKS2 block storage with real
fsync semantics.
Cluster-wide consumers of the inline kubernetes_storage_class.nfs_proxmox
dropped to zero after the rolling, so the resource is removed from
infra/stacks/vault/main.tf. Released NFS PVs (6) remain in the cluster
and will be reclaimed in Phase 3 cleanup.
Lesson learned (recorded in plan): pvc-protection finalizer races the
StatefulSet controller — pod recreates on the OLD PVCs unless the
finalizer is patched out before pod delete. Force-finalize technique
applied to vault-1 + vault-2 successfully.
Closes: code-gy7h