infra

Author	SHA1	Message	Date
Viktor Barzin	dfbf6faf3d	priority-pass: backend f4246691 (QR fit fix + persist uploads), add encrypted PVC Backend changes: - transformers.py: QR container now sized to actual qr_bbox + 8% padding (was fixed at 45% of card width). When QR was wider than 45% of card, the leftover-pixel branch color-remapped QR pixels outside the container, breaking the scan. New container always encloses qr_mask. - main.py: persist input + output + json metadata under $UPLOAD_DIR/<airline>/<ts-uuid>-{input.<ext>,output.png,*.json} for future training. Failure to save is logged, never breaks the API. Infra: - New PVC priority-pass-uploads (1Gi proxmox-lvm-encrypted, 10Gi autoresize cap) — encrypted because boarding passes contain PII. - Deployment strategy → Recreate (RWO requirement). - Volume + volumeMount + UPLOAD_DIR env on backend container. Applied via kubectl (TF state for this stack is empty — see prior commit). New pod priority-pass-77956b64fb rolled out, PVC bound, test transform succeeded, sample written to /data/uploads/ryanair/. [ci skip]	2026-05-01 18:50:51 +00:00
Viktor Barzin	ce7a584801	priority-pass: frontend ea9176f8 (gallery upload), sync backend pin to live Frontend bug fix: photo input forced camera on mobile via capture="environment". Added separate "Choose from Gallery" / "Take Photo" buttons so users can pick from their photo library. Backend image unchanged; pin synced from stale v8 to live SHA ae1420a0 (the v8 tag was never pushed to the registry). Image built locally and pushed to registry.viktorbarzin.me. The priority-pass project directory isn't under git, so deployment was applied via kubectl set image (matches existing pattern — TF state for this stack is empty). [ci skip]	2026-05-01 18:38:30 +00:00
Viktor Barzin	664a85ef1e	Revert "monitoring(wealth): show daily points + lighter fill on timeseries" This reverts commit `5472720c75`.	2026-05-01 16:24:18 +00:00
Viktor Barzin	5472720c75	monitoring(wealth): show daily points + lighter fill on timeseries Make daily movements visible on the line charts. The y-axis still spans ~£700k–£1M so an £8k daily move is ~1% of vertical range and easy to miss when only the line is drawn. Changes per panel: * 5 (Net worth): showPoints never→always, pointSize 4→5, fillOpacity 20→10 * 6 (Net contrib vs market): showPoints never→always, pointSize 4→5 * 7 (Growth over time): showPoints never→always, pointSize 4→5, fillOpacity 50→25 * 8 (Per-account stacked): showPoints never→always (kept stacking fill at 70) * 9 (Cash vs invested stacked): showPoints never→always (kept stacking fill at 70) Each daily value now renders as a visible dot, so even if the line appears flat at this scale, the per-day points trace the wiggle. Lighter fill on the unstacked panels lets the line + points dominate visually. Caveat: the fundamental "£8k on a £1M base" visibility issue is best solved with a dedicated "Daily change" delta panel — happy to add one on next pass if this isn't enough.	2026-05-01 16:23:25 +00:00
Viktor Barzin	2722260ce9	monitoring(wealth): unbreak timeseries SQL — over-escaped time alias Fix: panels 5–9 had `AS \"time\"` (literal backslash-quote sequence embedded in the SQL string). PostgreSQL parsed that as a syntax error at the leading backslash: ERROR: syntax error at or near "\" LINE 1: ...complete_dates)) SELECT valuation_date::timestamp AS \"time\" Root cause: the patch script for the skew-resilient queries (commit `628f5a0d`) used a Python f-string with `\\\"time\\\"`, which produces a literal backslash-quote in the Python string. When that string was JSON-encoded the backslash was preserved verbatim instead of collapsed to plain `"time"`. Replaces all five occurrences with the correct `AS "time"` form. Verified the corrected query against PG returns 7 daily net-worth rows for 04-25..05-01 as expected.	2026-05-01 16:19:07 +00:00
Viktor Barzin	d67416d4ca	monitoring(wealth): tighten default time range, bump decimals for granularity Two adjustments to make daily movements visible: 1. Default time range: now-5y → now-180d. The timeseries charts (Net worth, Net contribution vs market value, Growth, Per-account stacked, Cash vs invested) auto-fit their y-axis to the data range in view. Over 5 years, daily £1k–£10k moves are ~1% of axis range and visually invisible against the cumulative trend. Over 6 months, the same daily moves dominate. Yearly bar charts (12, 13) are unaffected — they aggregate by calendar year and don't filter on $__timeFilter. 2. Decimals → 2 on every currency panel (1, 2, 3, 5–9, 13, 15, 16) and every percent panel (4, 14). Stat panels now show pennies on currency and 0.01% on rates; chart y-axis ticks are likewise more precise. Honest caveat: pennies on a £1M number don't make the absolute readout easier — to see "today changed by £8,358" cleanly we'd want a dedicated delta panel; pending user direction. Widen the time picker manually to recover the 5-year view; default just zooms into the last 6 months.	2026-05-01 16:15:39 +00:00
Viktor Barzin	628f5a0d26	monitoring(wealth): skew-resilient queries, no more partial-day dips Bug witnessed 2026-05-01: dashboard "Net worth (current)" showed £88k instead of £1.03M because at 02:00 UTC an external trigger refreshed ONE account (Trading212 ISA), creating its 05-01 daily_account_valuation row. The 5 other accounts still had their last row at 04-30. The panel SQL `WHERE valuation_date = (SELECT MAX(valuation_date))` then summed only the single account that had a 05-01 row. Two new SQL patterns adopted across all 15 affected panels: 1. Stat / barchart "current snapshot" panels (1, 2, 3, 4, 11, 14, 15, 16): latest-per-account stitching: WITH latest AS (SELECT DISTINCT ON (d.account_id) ... FROM daily_account_valuation d JOIN accounts a ON a.id = d.account_id ORDER BY d.account_id, d.valuation_date DESC) gives a coherent "now" snapshot regardless of refresh skew, and the inner join filters out orphan/deleted accounts (one such was adding a stale £33k from 04-17). 12-month panels add a parallel `ago` CTE picking each account's row closest to (d_now - 12mo). 2. Time-series / yearly panels (5, 6, 7, 8, 9, 12, 13): complete-days-only filter: WITH active_accounts AS (SELECT COUNT(*) FROM accounts), complete_dates AS (SELECT valuation_date FROM daily_account_valuation d JOIN accounts a ON a.id=d.account_id GROUP BY valuation_date HAVING COUNT(*) >= active.n) so a partial today never renders as a chart dip. The day rejoins the chart automatically once the daily 16:00 UTC sync writes rows for every account. Verified end-to-end against live PG: new queries produce £1,033,734 (matches the 6 active accounts' true latest sum) where the old query gave £88k.	2026-05-01 16:08:18 +00:00
Viktor Barzin	1d3ae01aac	wealthfolio(daily-sync): API call CronJob, replaces rollout-restart Restart-only didn't refresh the wealth Grafana dashboard — verified empirically: a fresh `daily_account_valuation` row only lands when a PortfolioJob runs with ValuationRecalcMode != None, and Wealthfolio's internal schedulers don't trigger that path: - 6h quotes scheduler refreshes the `quotes` table only. - 4h broker scheduler short-circuits on missing `sync_refresh_token`. The right knob is `POST /api/v1/market-data/sync`. Replaced the rollout-restart CronJob (+ its SA/Role/RoleBinding) with a curl-based CronJob that logs in (`POST /api/v1/auth/login`) then POSTs to `/api/v1/market-data/sync` with the session cookie. Backfills missing days via IncrementalFromLast in one call. Schedule 16:00 UTC (= 17:00 BST): * After UK market close (15:30 UTC BST), EOD UK prices settled. * US market open ~2.5h, intra-day US quotes fresh. * pg-sync next :07 tick mirrors → Grafana refresh ≤5m → fresh data by ~17:12 BST, comfortably before the 18:00 BST target. Plaintext password lives in Vault `secret/wealthfolio.web_password`, flows via the existing `dataFrom.extract` ExternalSecret — no extra ESO wiring needed. Verified end-to-end: API call backfilled 04-26 through 04-29, pg-sync mirrored, PG now shows rows up to today.	2026-04-29 21:21:24 +00:00
Viktor Barzin	31b9e5d4a9	monitoring(wealth): add 12mo contrib + 12mo gain to top row Top row goes from 5 → 7 stat panels (widths 4+4+4+3+3+3+3=24): - Net worth, Net contribution, Growth shrink from w=5 to w=4. - ROI % shrinks from w=5 to w=3 (now sits at x=12). - 12mo return slides from x=20/w=4 to x=15/w=3. - New: 12mo contrib (id=15, currency, blue) at x=18 — net contributions added in the trailing 12 months. - New: 12mo gain (id=16, currency, red/green) at x=21 — pure market gain in £ over the trailing 12 months (12mo Δnet-worth − 12mo contribs). Live values verified against PG: contrib_12mo=£245k, gain_12mo=£172k, sum = £417k = nw_now − nw_ago, return = 23.51%.	2026-04-27 06:32:53 +00:00
Viktor Barzin	cd96fb64a8	phpipam-pfsense-import: every 5min → hourly Reduces 5-min disk-write spikes on PVE sdc. The cronjob was the heaviest single contributor in our hourly fan-out investigation (11.2 MB/s burst when it fired). Kea DDNS still handles real-time DNS auto-registration; phpIPAM inventory just lags by up to 1h, and we don't need it any fresher. Docs (dns.md, networking.md, .claude/CLAUDE.md) updated to match.	2026-04-26 22:48:43 +00:00
Viktor Barzin	6ad5292128	immich: bump server to 8Gi + override tier-2-gpu quota to 20Gi Eliminates the OOM-on-face-detection-burst class of incidents (2026-04-26). VPA upper for immich-server is 2.98Gi steady-state; the prior 4Gi limit was 1.34x upper and still got SIGKILL'd when face-detection bursts pushed transient RSS past 4Gi. 8Gi gives 2.7x VPA upper headroom. The kyverno tier-2-gpu default quota is 12Gi requests.memory which can't fit 8Gi (server) + 3.5Gi (ML) + 3Gi (PG) + backup CronJobs simultaneously. Opts the namespace into the kyverno custom-quota exclude rule and overrides with 20Gi (~4.5Gi headroom) — same pattern as woodpecker/nvidia.	2026-04-26 20:02:28 +00:00
Viktor Barzin	d093aed7f6	immich(server,ml): bump server to 4Gi + Recreate strategy on tight quota Root cause of 502/503/decode errors clustered at 19:20 BST 2026-04-26: immich-server hit its 3500Mi memory limit during a face-detection burst and was OOMKilled (Exit Code 137). VPA upperBound is 3050Mi but real-world bursts crossed it; with the single pod running both API and microservices workers, the OOM took the API down for ~30s of restart, surfacing as PlatformException image decode + 502 on uploads + 503 on ActivityService to the iOS app. Bump immich-server requests=limits to 4096Mi (per CLAUDE.md "upperBound x 1.3 for volatile workloads" rule, with headroom over the OOM mark). Quota math: 9680Mi used - 2000Mi old req + 4096Mi new req = 11776Mi, fits the tier-2-gpu 12Gi cap. Switch both immich-server and immich-machine-learning to Recreate strategy: the namespace tier-2-gpu quota is too tight for RollingUpdate to keep an old + new pod up during apply (transient 13776Mi > 12Gi cap, see "ResourceQuota blocks rolling updates" in CLAUDE.md). With single replicas and Recreate, future memory tweaks no longer require manual scale-to-0 dance. Verified: new pod has limits.memory=4Gi, quota usage stable at 11776Mi/12Gi, immich API serving normally. Note: a pending node_selector drift on immich-machine-learning (gpu=true -> nvidia.com/gpu.present=true) also reconciled in this apply; the canonical NVIDIA operator label is already on the GPU node, so there is no scheduling impact.	2026-04-26 19:11:50 +00:00
Viktor Barzin	07bc0098e3	ci(woodpecker): show full terraform error on stack apply failure The default workflow truncated the failed-stack output at `tail -5`, which only captured the trailing source-line indicator (`│ 45: resource …`) and dropped the actual `Error: …` line above it. Bump to `tail -50` so the real error is visible without re-running locally to reproduce. Also fix the pre-warm step's FIRST_STACK detection: `head -1 file1 file2 | head -1` returns the file header (`==> .platform_apply <==`), not the first stack name, so the cd then fails with "no such file or directory". Use `cat | head -1` instead. Pure logging-and-pre-warm change; no stacks touched, so this commit is a no-op for the apply step.	2026-04-26 18:39:46 +00:00
Viktor Barzin	215717c90f	monitoring(dashboards): tables at the bottom convention wealth: move Activity log table from y=45 to y=77; the three barcharts (Yearly return, Annual change, Per-account ROI) shift up by 14 to fill the gap. uk-payslip: move Sankey "where the money went" from y=80 to y=48 (right above the table block); the three tables (Data integrity, All payslips, YTD reconciliation) shift down by 14 so all four tables (4, 5, 6, 9) sit contiguously at the bottom. fire-planner and job-hunter still have intentional side-by-side table/chart pairings; left untouched pending user direction on whether to break them.	2026-04-26 18:30:52 +00:00
Viktor Barzin	bb28485ce0	monitoring(wealth): move 12mo return to top bar, shrink to w=4 Trailing 12-month investment return % was a full-width stat at y=59. Now sits inline with Net worth / Contribution / Growth / ROI as the fifth headline number — top-row stats reflowed from w=6 (×4) to w=5 (×4) + w=4 (×1). Title shortened to "12mo return" so it fits. Panels below the old row shifted up by 4 rows to close the gap.	2026-04-26 18:19:24 +00:00
Viktor Barzin	532285e48c	traefik: raise websecure idleTimeout 180s -> 600s for iOS Immich -1005 iOS NSURLSession held a dead TCP/TLS socket past Traefik's 180s idle close, then errored with NSURLErrorDomain -1005 on the next thumbnail. Bumping the timeout to 600s pushes the bug to "app idle for >10 min" -- much rarer in normal use. Verified with /home/wizard/.claude/immich-scroll-sim.py keepalive probe: 200s idle, mean reuse latency +1.8ms over warmup (was ~50ms TLS handshake penalty before). Synthesis: ~/.claude/immich-debug/synthesis.md.	2026-04-26 12:32:05 +00:00
Viktor Barzin	3489621a45	nextcloud(backup): pin backup pod to nextcloud's node via podAffinity The weekly backup mounts the same RWO PVC (proxmox-lvm-encrypted) as the main nextcloud deployment. Single-node attach — the backup pod can never mount the volume if it lands on a different node, and was stuck in ContainerCreating for 6+ hours when cron fired today. Add pod_affinity (required, hostname topology) so the backup co-locates with the nextcloud app pod. Discovered via cluster-health probe; manual verify run scheduled on k8s-node3 next to nextcloud's pod and completed the rsync in seconds.	2026-04-26 11:03:20 +00:00
Viktor Barzin	a24cd7ceb7	monitoring(uk-payslip): yearly receipt aligns with P60 (RSU gross) Switch the RSU stack from "after band-aware tax" to gross. Receipt total is now pre-sacrifice gross compensation; bar − pension stack ≈ ytd_gross reported on the final March payslip / P60. Verified alignment for 2025/26: bar−pension = £266,752 vs P60 ytd_gross = £268,127 — gap of £1,375 ≈ "other taxable" (benefits, overtime). Remaining year-level gaps are upstream parser/ingest issues, not dashboard logic: - 2024/25 +£27k: March 2025 payslip parsed bonus=£26,969 but never propagated it into gross_pay/income_tax. Receipt is more accurate than ytd_gross here. - 2023/24 −£36k: Feb 2024 payslip row appears to be missing from the table; ytd_gross has it, sum(gross_pay) doesn't. - 2022/23 −£10k: variant A→B transition residual. SQL simplified — band-aware CTE chain dropped (no longer needed for this panel since RSU is shown gross).	2026-04-26 10:24:06 +00:00
Viktor Barzin	d0152e1f38	crowdsec/traefik: stop captchaing legit Immich mobile bursts Mobile timeline scrubs prefetch ~100 thumbs in <1s, which exhausted the immich-rate-limit (avg=500, burst=5000) and produced a cascade of HTTP 429s. CrowdSec's local http-429-abuse scenario then fired captcha:1 on the source IP (alert #291, 2026-04-25 — owner's Hyperoptic IPv6). Two changes: - crowdsec: add a second whitelist doc (viktor/immich-asset-paths-whitelist) filtering events by Immich asset paths so they never feed leaky buckets. Auth endpoints intentionally excluded — brute-force protection unchanged. - traefik: raise immich-rate-limit avg=500->1000, burst=5000->20000 so legitimate mobile scrubs don't produce 429s in the first place.	2026-04-26 09:27:16 +00:00
Viktor Barzin	222013806d	monitoring(uk-payslip): split salary into cash + pension on yearly receipt The salary field on the payslip is pre-pension-sacrifice, so the "Salary (gross)" stack already silently included the salary-sacrifice pension contribution. Split it out so pension is explicitly visible: - Salary (cash, post-sacrifice) = salary - pension_sacrifice - Pension (salary sacrifice, untaxed) = pension_sacrifice - Bonus - RSU vest (after band-aware tax) Bar total unchanged (just relabels what was already there). Pension is now visibly counted as income — consistent with "untaxed but real" framing. Caveat documented in panel description: receipt total ≠ P60 gross because P60 reports pre-RSU-tax gross. Receipt shows RSU net of tax per earlier intent. To exactly match P60, swap rsu_after_tax → rsu_vest gross.	2026-04-26 09:18:32 +00:00
root	423aac0908	Woodpecker CI Update TLS Certificates Commit	2026-04-26 00:03:26 +00:00
Viktor Barzin	21ac619fac	monitoring(uk-payslip): promote yearly receipt + YTD gross YoY to row 4 Move both barchart/timeseries panels into row 4 (y=29, side-by-side w=12 each, h=10) so the per-tax-year overviews appear right after the income-tax-and-pension YTD row. Shift panels 13, 4, 5, 6, 8, 9 down by 10 to accommodate. Final ordering: rows 1–3 = monthly + YTD timeseries (panels 1/7/2/3/11/12), row 4 = yearly receipt + YTD gross YoY (16/17), then the wider deduction/integrity/table panels below.	2026-04-25 23:58:15 +00:00
Viktor Barzin	53f555dc61	monitoring(uk-payslip): drop 3 panels referencing undeployed data Removed: - Panel 10 "HMRC Tax Year Reconciliation — Individual Tax API" → references hmrc_sync.tax_year_snapshot schema. The hmrc-sync service / DB has not been deployed, so the panel always errored with "relation does not exist". - Panel 14 "Meta payroll: bank deposit vs payslip net pay" → references payslip_ingest.external_meta_deposits, which is created by alembic migration 0007. The deployed payslip-ingest image is at 0005, so the table doesn't exist. - Panel 15 "RSU vest reconciliation — payslip vs Schwab" → references payslip_ingest.rsu_vest_events, created by migration 0008. Same image-staleness story. Verified all 14 remaining panels return without error via Grafana /api/ds/query. SQL for the removed panels is preserved in git history; re-add when the data sources are actually deployed.	2026-04-25 23:56:03 +00:00
Viktor Barzin	b2a25775aa	monitoring(uk-payslip): simplify yearly receipt to earned-and-kept view Replace the 7-stack "where total comp went" decomposition with a 3-stack "what I actually earned" view: salary (gross), bonus (gross), and RSU vest after band-aware tax (PAYE+NI withheld via sell-to-cover). Skips income tax / NI / student loan / pension / RSU offset. Bar height = real income kept across all components. RSU is net of tax because it's withheld at source and never hits the bank account; salary and bonus are gross because they're paid in full and taxes are deducted elsewhere. This is the income-side view where tax is implicit, not the deduction waterfall. Per-year RSU after tax: 2020/21 £18k · 2021/22 £39k · 2022/23 £50k · 2023/24 £26k · 2024/25 £71k · 2025/26 £73k.	2026-04-25 23:42:20 +00:00
Viktor Barzin	a17304f735	monitoring(uk-payslip): fix empty YTD gross YoY chart Two bugs: 1. Synthetic dates projected onto 1970/71 fell outside the dashboard's default time range (now-10y → now), so Grafana filtered out every point. Switched to a sliding 12-month window (CURRENT_DATE - INTERVAL '12 months') as the projection base, plus a per-panel timeFrom: "13M" override so the panel always shows the last 13 months regardless of the dashboard's time picker. 2. ORDER BY tax_year, pay_date violated Grafana's long→wide conversion requirement (data must be ascending by time). Wrapped in a CTE and re-ordered by the synthetic time column. Pivoted result is now a single wide frame with 7 series (2019/20…2025/26).	2026-04-25 23:36:16 +00:00
Viktor Barzin	ac18c49a7b	monitoring(wealth): fix x-axis label formatting on yearly bars The default fieldConfig unit (percent on Yearly investment return %, currencyGBP on Annual change decomposition) was being applied to the "year" string column too — so x-axis labels rendered as "2024%" and "£2,024" respectively. Add field overrides on the "year" column to force unit=string. The earlier "tax_year" panels weren't affected because "2024/25" doesn't parse as a number; "2024" did.	2026-04-25 23:31:03 +00:00
Viktor Barzin	77bed10a51	monitoring: investment-only returns + YoY YTD gross line chart Wealth dashboard: - "Yearly growth %" → "Yearly investment return %": switched to modified-Dietz formula `market_gain / (nw_start + 0.5 × contributions)` so contributions don't inflate the return. New money in is excluded — this is portfolio performance, not net-worth change. - "Trailing 12-month growth %" → "Trailing 12-month investment return %": same formula, applied to the trailing 12mo window. Pre-fix vs post-fix: 2020: 155.0% → 5.12% (large contributions on small base) 2021: 344.7% → 26.45% 2022: 26.9% → -25.65% (the actual 2022 bear market) 2023: 123.2% → 41.60% 2024: 87.4% → 25.70% 2025: 46.8% → 8.43% 2026: 16.7% → 3.28% (YTD) UK Payslip dashboard: - Replaced the per-tax-year stacked bar with a year-over-year line chart: one line per tax year, X = month-of-tax-year (April→March, projected onto a 1970/71 fiscal calendar so years overlay), Y = cumulative YTD gross. Five+ lines visible at a glance for trend comparison.	2026-04-25 23:25:42 +00:00
Viktor Barzin	55d1da41f6	monitoring: more growth detail in Wealth + gross composition in UK Payslip Wealth (4 new panels at the bottom): - Trailing 12-month growth % (stat) — % change in net worth over last 12mo. - Yearly growth % (bar per calendar year) — first→last valuation each year. - Annual change decomposition (stacked bar) — splits each year's NW change into "net contributions" (new money in) and "market gain" (everything else: appreciation, dividends, FX). Answers "did I grow because I saved or because the market did the work?". - Per-account ROI % (horizontal bar) — (value − contribution) / contribution × 100, latest snapshot. Excludes accounts with zero/negative net contribution (Schwab — distorts ratio after RSU sells). UK Payslip (1 new panel below the yearly receipt): - Gross composition by tax year (stacked bar) — salary / bonus / RSU vest / other components per tax year. Bar height = gross pay. Trends in salary growth, bonus levels, and RSU vest sizing at a glance. All queries spot-checked via Grafana /api/ds/query.	2026-04-25 23:21:42 +00:00
Viktor Barzin	d48e222054	monitoring: lock Finance (Personal) folder to admin + fix cash classification Folder ACL: - Move uk-payslip + wealth dashboards to a new "Finance (Personal)" folder; job-hunter + fire-planner stay in "Finance" (open). - New null_resource calls Grafana's folder permissions API after the dashboard sidecar materialises the folder, setting an admin-only ACL ({Admin: 4}). Default Viewer/Editor inheritance is overridden, so anonymous-Viewer (auth.anonymous=true) is denied. Server-admin always retains access. - Verified: anonymous → 403 on uk-payslip + wealth, 200 on control dashboards (node-exporter); admin → 200 on all. Wealth cash fix: - Wealthfolio dumps WORKPLACE_PENSION wrappers entirely into cash_balance because it doesn't track underlying fund holdings. Reclassify pension cash as invested in the "Cash vs invested" panel so the cash series reflects actual uninvested broker cash (~£16k T212 ISA + Schwab) instead of phantom £154k. Pre-fix: cash=£153,789 / invested=£870,282 / total=£1,024,071 Post-fix: cash=£16,064 / invested=£1,008,008 / total=£1,024,071	2026-04-25 23:11:26 +00:00
Viktor Barzin	51bf38815c	vault: record Phase 3 vault Released-PV cleanup Deleted the 6 NFS PVs orphaned by the Phase 2 rolling migration and removed their /srv/nfs/<dir> subtrees on the PVE host (~1.5 GB; vault-2 audit log was 1.4 GB on its own). Cluster-wide Released-PV sweep on the proxmox-lvm/encrypted side stays out of scope.	2026-04-25 23:08:45 +00:00
Viktor Barzin	498400173c	wealthfolio-sync: skip the synthetic TOTAL row in ETL Wealthfolio's daily_account_valuation includes a row with account_id='TOTAL' that pre-aggregates the per-account values for that day. Mirroring it into PG verbatim caused every SUM(total_value) in the Wealth dashboard to double-count (showing ~£2M against actual ~£1M). Drop the synthetic row at the dump step so the PG mirror only holds real-account rows. Initial sync after fix: 8,649 DAV rows (was 10,798), net worth resolves to £1,024,071 — matches the per-account latest snapshot.	2026-04-25 22:59:24 +00:00
Viktor Barzin	f0ce7b0363	fire-planner: add stack, Vault DB role, dashboard, DB New stacks/fire-planner/ mirrors payslip-ingest layout: - ExternalSecret pulling RECOMPUTE_BEARER_TOKEN from Vault secret/fire-planner - DB ExternalSecret templating DB_CONNECTION_STRING via static role pg-fire-planner - FastAPI Deployment (serve), CronJob (recompute-all monthly on 2nd at 09:00 UTC, scheduled after wealthfolio-sync's 1st at 08:00), ClusterIP Service - Grafana datasource ConfigMap "FirePlanner" — `database` inside jsonData (`cc56ba29` fix; otherwise Grafana 11.2+ hits "you do not have default database") Plus: - vault/main.tf: pg-fire-planner static role (7d rotation), allowed_roles - dbaas/modules/dbaas/main.tf: null_resource creates fire_planner DB+role - monitoring/dashboards/fire-planner.json: 9-panel Finance-folder dashboard (NW timeseries, MC fan chart, success heatmap, lifetime tax bars, years-to-ruin table, optimal leave-UK stat, ending wealth stat, UK success-by-strategy bars, sequence-risk correlation table) - monitoring/modules/monitoring/grafana.tf: register "fire-planner.json" in Finance folder Apply order: 1. vault stack — creates the static role 2. dbaas stack — creates the database & role 3. external-secrets stack picks up vault-database refs (no change needed) 4. fire-planner stack — first apply with -target=kubernetes_manifest.db_external_secret before full apply, per the plan-time-data-source pattern 5. monitoring stack — picks up the new dashboard ConfigMap [ci skip] Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-25 17:27:19 +00:00
Viktor Barzin	484b4c7190	vault: complete Phase 2 NFS-hostile migration; remove nfs-proxmox SC All 3 vault voters now on proxmox-lvm-encrypted (vault-0 16:18, vault-1 + vault-2 today). The NFS fsync incompatibility identified in the 2026-04-22 raft-leader-deadlock post-mortem is no longer reachable: raft consensus log + audit log live on LUKS2 block storage with real fsync semantics. Cluster-wide consumers of the inline kubernetes_storage_class.nfs_proxmox dropped to zero after the rolling migration, so the resource is removed from infra/stacks/vault/main.tf. Released NFS PVs (6) remain in the cluster and will be reclaimed in Phase 3 cleanup. Lesson learned (recorded in plan): pvc-protection finalizer races the StatefulSet controller; the pod recreates on the OLD PVCs unless the finalizer is patched out before pod delete. Force-finalize technique applied to vault-1 + vault-2 successfully. Closes: code-gy7h	2026-04-25 17:10:00 +00:00
Viktor Barzin	df2fa0a31d	state(vault): update encrypted state	2026-04-25 17:09:35 +00:00
Viktor Barzin	bf4c7618d8	wealth: SQLite→PG ETL sidecar + new Grafana dashboard Mirrors Wealthfolio's daily_account_valuation / accounts / activities from SQLite into a new PG database (wealthfolio_sync) every hour, so Grafana can chart net worth, contributions, and growth over time. Components: - dbaas: null_resource creates wealthfolio_sync DB + role on the CNPG cluster (dynamic primary lookup so it survives failover). - vault: pg-wealthfolio-sync static role rotates the password every 7d. - wealthfolio: ExternalSecret pulls the rotated password into the WF namespace; new pg-sync sidecar (alpine + sqlite + postgresql-client + busybox crond) does sqlite3 .backup → TSV dump → truncate-and-reload psql, hourly at :07. Plus a grafana-wealth-datasource ConfigMap in the monitoring namespace (uid: wealth-pg). - monitoring: new Wealth dashboard (wealth.json, 10 panels) — current net worth / contribution / growth / ROI% stats, then time-series for net worth, contribution-vs-market, growth area, per-account stacked area, cash-vs-invested, and a 100-row activity log. Initial sync: 6 accounts, 10,798 daily valuations, 518 activities. Verified PG totals match SQLite latest snapshot exactly.	2026-04-25 17:07:33 +00:00
Viktor Barzin	7dd580972a	state(vault): update encrypted state	2026-04-25 16:57:42 +00:00
Viktor Barzin	ac8d2f548b	paperless-ngx: migrate to proxmox-lvm-encrypted Document scans (receipts, contracts, IDs) are unambiguously sensitive PII. Storage decision rule defaults sensitive data to `proxmox-lvm-encrypted`, but paperless-ngx had been left on plain `proxmox-lvm` by an abandoned migration attempt that left a dormant, non-Terraform-managed encrypted PVC sitting unbound for 11 days. Cleaned up the orphan, added the encrypted PVC properly via Terraform, rsynced data with deployment scaled to 0, swapped claim_name. Plain `proxmox-lvm` PVC retained for a 7-day soak before removal. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-25 16:48:53 +00:00
Viktor Barzin	4f5f1ff8c2	monitoring(uk-payslip): add yearly receipt stacked barchart panel New panel 16 (barchart, h=11, y=179): one stacked bar per tax year showing total comp split into net pay (bank deposit), cash income tax, RSU tax (band-aware marginal: PAYE+NI), cash NI, student loan, pension salary-sacrifice, and RSU offset (Variant A only). X-axis = tax_year (categorical), y-axis = currencyGBP. Bar height ≈ gross_pay + pension_sacrifice (small over-attribution in Variant A years where the band-aware model exceeds recorded payslip PAYE).	2026-04-25 16:26:57 +00:00
Viktor Barzin	288efa89b3	vault: migrate vault-0 storage to proxmox-lvm-encrypted Phase 2 of the NFS-hostile migration: data + audit storageClass on the vault helm release switches from nfs-proxmox to proxmox-lvm-encrypted, then per-pod rolling swap (24h soak between). vault-0 swap done. vault-1 + vault-2 still on NFS — the rolling part is what makes this safe (raft quorum maintained by 2 healthy pods while one is replaced). Also restores chart-default pod securityContext fields. The previous `statefulSet.securityContext.pod = {fsGroupChangePolicy = "..."}` block REPLACED (not merged) the chart's defaults — fsGroup, runAsGroup, runAsUser, runAsNonRoot were all silently dropped. NFS exports were permissive enough to mask the missing fsGroup; ext4 LV volume root is root:root and the vault user (UID 100) couldn't open vault.db, CrashLoopBackOff. Fix: provide all five fields explicitly, survives future chart bumps. vault-1 and vault-2 retained their correct securityContext from when their pod specs were written to etcd, before the partial customization landed — the bug only surfaces when a pod is recreated. Pre-flight raft snapshot saved at /tmp/vault-pre-migration-*.snap (recovery anchor). Refs: code-gy7h Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-25 16:19:49 +00:00
Viktor Barzin	08b13858dd	state(vault): update encrypted state	2026-04-25 16:16:35 +00:00
Viktor Barzin	b3c29eda12	monitoring(uk-payslip): model UK income-tax bands + PA-taper for RSU marginal Replaces the flat 47% (45 PAYE + 2 NI) RSU marginal across panels 3, 7, 8, 11, and 12 with an exact piecewise band-aware computation. Each row computes ani_prior/ani_pre/ani_post over the tax-year YTD (chronological model — the RSU is taxed at the band its YTD ANI position occupies at the vest date, mirroring PAYE withholding behaviour). Bands (2024/25+, applied to all years): IT: 0% / 20% / 40% / 60% (PA-taper) / 45% at 12,570 / 50,270 / 100k / 125,140 NI: 0% / 8% / 2% at 12,570 / 50,270 PA-taper modelled as 60% effective IT marginal in £100k–£125,140 (40% on the £1 + 40% on the £0.50 of lost PA = 60%). Spot-checked per tax-year totals via psql; numbers diverge from the flat 47% baseline most for years where vests cross PA-taper or basic-rate bands (2020/21 ~35%, 2024/25 ~41%, 2025/26 ~43%).	2026-04-25 16:14:49 +00:00
Viktor Barzin	3f85cee1ef	state(vault): update encrypted state	2026-04-25 16:08:38 +00:00
Viktor Barzin	43e4f3f68e	immich: migrate PostgreSQL off NFS to proxmox-lvm-encrypted Live PG data moves to a 10Gi LUKS-encrypted RWO PVC. WAL fsync per commit on NFS contributed to the 2026-04-22 NFS writeback storm (2h43m recovery, 3 of 4 nodes hard-reset). Backups remain on NFS (append-only, NFS-tolerant). The init container that writes postgresql.override.conf is now gated on PG_VERSION presence: on a fresh PVC the file would otherwise make initdb refuse the non-empty PGDATA. First boot skips the override and runs initdb cleanly; the second boot (after a forced restart) writes the override so vchord/vectors/pg_prewarm load before the dump restore. Idempotent on already-initialised PVCs. Migration executed: pg_dumpall (1.9GB) → restore on encrypted PVC → REINDEX clip_index/face_index → 111,843 assets verified, external HTTP 200, all 10 extensions present (only change: vector minor version 0.8.0→0.8.1). LV created on PVE host, picked up by lvm-pvc-snapshot. See docs/plans/2026-04-25-nfs-hostile-migration-{design,plan}.md. Phase 2 (Vault Raft) follows under code-gy7h. Closes: code-ahr7 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-25 15:47:30 +00:00
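The PG_VERSION gate can be modelled in a few lines of Python. The real init container is not shown in the commit, so `maybe_write_override` and the override contents (a `shared_preload_libraries` line naming the three libraries) are assumptions for illustration.

```python
import tempfile
from pathlib import Path

# Hypothetical override contents; the real file's settings are not in the commit.
OVERRIDE = "shared_preload_libraries = 'vchord,vectors,pg_prewarm'\n"


def maybe_write_override(pgdata: Path, text: str = OVERRIDE) -> bool:
    """Write postgresql.override.conf only once PGDATA has been initialised.

    initdb refuses a non-empty data directory, so on a fresh PVC the
    override is skipped and PGDATA is left empty for initdb.
    """
    if not (pgdata / "PG_VERSION").exists():
        return False
    (pgdata / "postgresql.override.conf").write_text(text)
    return True


with tempfile.TemporaryDirectory() as tmp:
    pgdata = Path(tmp)
    print(maybe_write_override(pgdata), any(pgdata.iterdir()))  # False False
    # Simulate what initdb leaves behind (version number is illustrative).
    (pgdata / "PG_VERSION").write_text("16\n")
    print(maybe_write_override(pgdata), maybe_write_override(pgdata))  # True True
```

Rewriting the same file on every boot is what makes the step idempotent once the data directory exists.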
Viktor Barzin	0d5f53f337	monitoring(uk-payslip): replace misleading take-home rates in Panel 3 Drop the two misleading series in "Effective rate & take-home % (YTD cumulative)": both used SUM(gross_pay) as the denominator while counting only cash deductions/net in the numerator, which understated take-home by 25-30 pp because RSU shares are absent from the cash deposit but present in gross. The panel now shows three semantically clean series: - ytd_paye_rate_pct: SUM(income_tax) / SUM(taxable_pay), the HMRC audit rate (~41-42% in the additional-rate band), kept as before. - ytd_cash_take_home_pct: SUM(net_pay) / SUM(gross_pay - rsu_vest), the fraction of cash earnings that hits the bank (~62-65%). - ytd_total_keep_pct: (SUM(net_pay) + 0.53 × SUM(rsu_vest)) / SUM(gross_pay), the true "what I actually keep" including post-tax RSU shares (47% marginal applied to vest value, so 53% retained), ~55-60%. Added field overrides for clear color-coding (red/green/blue). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-25 15:45:47 +00:00
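The three series reduce to ratios of YTD column sums. A hypothetical Python sketch: `YtdTotals` and `panel3_series` are invented names, the field names follow the payslip columns in the commit, and the example numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class YtdTotals:
    """Year-to-date sums of payslip columns (£)."""
    gross_pay: float  # includes rsu_vest
    rsu_vest: float
    net_pay: float
    income_tax: float
    taxable_pay: float


RSU_KEEP = 0.53  # 1 - flat 47% marginal, as used by this panel


def panel3_series(t: YtdTotals) -> dict[str, float]:
    return {
        "ytd_paye_rate_pct": 100 * t.income_tax / t.taxable_pay,
        "ytd_cash_take_home_pct": 100 * t.net_pay / (t.gross_pay - t.rsu_vest),
        "ytd_total_keep_pct": 100 * (t.net_pay + RSU_KEEP * t.rsu_vest) / t.gross_pay,
    }


# Illustrative numbers only, not real payslip data.
example = YtdTotals(gross_pay=20_000, rsu_vest=10_000, net_pay=6_300,
                    income_tax=8_200, taxable_pay=20_000)
print({k: round(v, 1) for k, v in panel3_series(example).items()})
# {'ytd_paye_rate_pct': 41.0, 'ytd_cash_take_home_pct': 63.0, 'ytd_total_keep_pct': 58.0}
```

The cash ratio excludes RSU from the denominator, and the total ratio adds the retained RSU value back to the numerator, so neither mixes a cash numerator with an RSU-inclusive denominator.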
Viktor Barzin	8f0d13282c	monitoring(uk-payslip): drop cash PAYE/NI from "Tax & pension — monthly" Same reasoning as panel 2: cash-side income_tax and NI are inherently bumpy in vest months because UK cumulative PAYE catches up on YTD tax, and the flat-47% strip can't fix that. The panel now shows only the explicit RSU vest tax (orange, 47% × rsu_vest), student loan, and pensions. The smooth view of total cash deductions remains available on panel 12 (YTD cumulative). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-25 15:43:32 +00:00
Viktor Barzin	2230cb6cf4	monitoring(uk-payslip): drop tax/NI from "Monthly cash flow (RSU stripped)" panel Vest months still bumped 4-5x in this panel after the flat-47% strip because UK cumulative PAYE genuinely catches up YTD tax in vest months, on top of the marginal RSU portion — no arithmetic split can make that line flat without distorting the data. The cash-flow question this panel answers (what hits the bank, RSU aside) is already covered cleanly by cash_gross + net_pay; the tax detail lives on Panel 11 where the RSU split is now linear. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-25 15:30:46 +00:00
Viktor Barzin	cb3ffa6d8d	monitoring(uk-payslip): smooth quarterly RSU tax bumps via flat 47% marginal Replace the implicit pro-rata RSU/cash split with an explicit flat 47% marginal (45% PAYE + 2% NI) for the RSU vest tax stack. The orange slice now scales linearly with rsu_vest instead of wobbling around the month's effective PAYE rate; cash PAYE/NI slices have those amounts subtracted out so the stack still totals to actual deductions. Affects panel 7 (monthly), panel 12 (YTD cumulative), panel 7 (YTD uses), and the Sankey panel. Verified on 35 months of live data: sum invariant holds exactly (cash + rsu_marginal + cash_ni == income_tax + national_insurance), no negatives in cash slices. Out of scope (left raw): effective-rate %, data-integrity, payslip table, P60/HMRC reconciliation — those are audit views that use unmodified income_tax / cash_income_tax columns. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-25 15:13:29 +00:00
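The split and its sum invariant can be sketched in Python with `Decimal` money values. `split_deductions` is a hypothetical name and the month's figures are illustrative; the output keys follow the column names used in the commit.

```python
from decimal import Decimal

RSU_IT_RATE = Decimal("0.45")  # PAYE part of the flat marginal
RSU_NI_RATE = Decimal("0.02")  # NI part of the flat marginal


def split_deductions(income_tax: Decimal, national_insurance: Decimal,
                     rsu_vest: Decimal) -> dict[str, Decimal]:
    """Carve a flat 47% RSU slice out of actual PAYE + NI for one month."""
    rsu_it = RSU_IT_RATE * rsu_vest
    rsu_ni = RSU_NI_RATE * rsu_vest
    return {
        "cash_income_tax": income_tax - rsu_it,
        "cash_ni": national_insurance - rsu_ni,
        "rsu_marginal": rsu_it + rsu_ni,  # scales linearly with rsu_vest
    }


# Illustrative month, not real data.
month = split_deductions(Decimal("12000.00"), Decimal("700.00"), Decimal("20000.00"))
print(month)
```

`Decimal` keeps the invariant `cash_income_tax + rsu_marginal + cash_ni == income_tax + national_insurance` exact, mirroring the exact-equality check in the commit; with binary floats, `0.45 * v + 0.02 * v` need not equal `0.47 * v` bit-for-bit. The non-negativity check on the cash slices is the other half of the verification.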
Viktor Barzin	4315ed5c2a	[backup] Fix lvm-pvc-snapshot Pushgateway push (stdout pollution in cmd_prune_count) cmd_prune_count's `log " Pruned: ..."` wrote to stdout, which the caller captures via `pruned=$(cmd_prune_count)`. From 2026-04-16 onward (7d retention kicked in), pruned snapshots polluted the captured value with multi-line log text, breaking the Prometheus exposition format on the metric push (`lvm_snapshot_pruned_total ${pruned}` → 400 from Pushgateway). Snapshots themselves were always fine; only the metric push silently failed for ~9 nights, eventually triggering LVMSnapshotNeverRun (alert has 48h `for:`). Fix: redirect the inner log call to stderr so cmd_prune_count's stdout contains only the count. Also adopts `infra/scripts/lvm-pvc-snapshot.sh` as the source-of-truth (was edited only on the PVE host) and updates backup-dr.md to point at the .sh and document the scp deploy. Deploy: scp infra/scripts/lvm-pvc-snapshot.sh root@192.168.1.127:/usr/local/bin/lvm-pvc-snapshot Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-25 14:30:58 +00:00
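The failure mode is easy to reproduce outside bash. This Python model is not the actual script: the function names and snapshot names are hypothetical, `capture_stdout` stands in for `pruned=$(cmd_prune_count)` (stdout only, trailing newlines trimmed), and `is_valid` is a deliberately simplified check of the Prometheus text format that ignores labels, timestamps and comments.

```python
import contextlib
import io
import re
import sys

# Simplified check for one "name value" sample line (no labels/timestamps).
SAMPLE_LINE = re.compile(r"^[a-zA-Z_:][a-zA-Z0-9_:]* -?[0-9]+(?:\.[0-9]+)?$")


def prune_count_buggy(pruned: list[str]) -> None:
    for snap in pruned:
        print(f"  Pruned: {snap}")  # log line leaks into captured stdout
    print(len(pruned))


def prune_count_fixed(pruned: list[str]) -> None:
    for snap in pruned:
        print(f"  Pruned: {snap}", file=sys.stderr)  # logs go to stderr
    print(len(pruned))


def capture_stdout(fn, *args) -> str:
    """Rough analogue of shell `$(...)`: stdout only, trailing newlines cut."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        fn(*args)
    return buf.getvalue().rstrip("\n")


def exposition(value: str) -> str:
    return f"lvm_snapshot_pruned_total {value}\n"


def is_valid(body: str) -> bool:
    return all(SAMPLE_LINE.match(line) for line in body.splitlines() if line)


snaps = ["snap-a", "snap-b"]  # hypothetical snapshot names
print(is_valid(exposition(capture_stdout(prune_count_buggy, snaps))))  # False
print(is_valid(exposition(capture_stdout(prune_count_fixed, snaps))))  # True
```

With no snapshots to prune, the buggy version still emits a clean `0`, which is why the push only started failing once retention began deleting snapshots.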
Viktor Barzin	d231615ebb	[monitoring] Fix fuse voltage alerts — divide raw deciVolt reading by 10 The tuya-bridge exporter reports `fuse_main_voltage` and `fuse_garage_voltage` as raw uint16 from the Tuya protocol, which encodes voltage in deciVolts (e.g. 2352 = 235.2V). The 200/260V thresholds were compared against the raw integer, so both FuseMainVoltageAbnormal and FuseGarageVoltageAbnormal fired continuously under normal mains conditions. Dividing in the expression also makes `{{ $value }}V` render the correct human-readable value in the alert summary. The root fix would be in tuya-bridge `_decode_value()`, where `name.startswith("voltage")` returns `int.from_bytes(...)` without the /10 scaling that `decode_voltage_threshold` applies. That is left alone for now to avoid breaking the automatic_transfer_switch scrape, which uses a different code path (`parse_voltage_string`).	2026-04-24 11:12:56 +00:00
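A Python sketch of the unit mismatch and its fix. `decode_decivolts` is a hypothetical helper, not tuya-bridge's `_decode_value()`, and the big-endian byte order is an assumption; the alert-side fix in this commit is the equivalent of applying the division by 10 inside the rule expression.

```python
RAW_VOLTAGE_SCALE = 10  # Tuya raw value is in deciVolts (2352 -> 235.2 V)
LOW_V, HIGH_V = 200, 260  # alert thresholds from the commit


def decode_decivolts(raw: bytes) -> float:
    """Hypothetical decoder: big-endian uint16 deciVolts -> volts."""
    return int.from_bytes(raw, "big") / RAW_VOLTAGE_SCALE


def voltage_abnormal(volts: float) -> bool:
    return volts < LOW_V or volts > HIGH_V


raw = (2352).to_bytes(2, "big")
print(voltage_abnormal(int.from_bytes(raw, "big")))  # True: raw 2352 > 260
print(decode_decivolts(raw), voltage_abnormal(decode_decivolts(raw)))  # 235.2 False
```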
Viktor Barzin	a5e4db9af8	[monitoring] Tuya Cloud root-cause alert + cascade suppression New alert TuyaCloudDown fires when any _tuya_cloud_up gauge == 0 (i.e., the Tuya Cloud API rejects scrape calls; this was the symptom during last night's iot.tuya.com trial expiry, code=28841002). Its 5m `for:` duration is shorter than the 15m window of the seven downstream MetricsMissing alerts, so TuyaCloudDown is already firing by the time they would transition, and the new Alertmanager inhibit rule suppresses the per-device noise so that only TuyaCloudDown pages. Also flips helm_release.prometheus.force_update from true to false: force_update was tripping on the pushgateway PVC added in rev 188 (commit e51c104), because Helm's --force path tried to reset spec.volumeName on a bound PVC. Disabled here; re-enable temporarily when a StatefulSet volumeClaimTemplate change actually needs --force. Bundled with pre-existing working-tree additions for Fuse/Thermostat threshold alerts and an expanded PowerOutage inhibit regex (landed in the same Helm revision 190). Verified: rule loaded, value=7 (all 7 tuya-bridge devices report cloud_up=0 right now), TuyaCloudDown moved pending→firing after 5m, and 3 *MetricsMissing alerts are currently suppressed in Alertmanager with inhibitedBy=1 (the thermostat alerts are still pending their 15m window and will be suppressed on transition).	2026-04-23 09:59:48 +00:00
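A simplified Python model of how an inhibit rule suppresses the per-device alerts once TuyaCloudDown is firing. The commit does not show the rule body, so the matcher values, the empty `equal` list and the device label are assumptions. Only the documented semantics are modelled: fully anchored regex matchers, and `equal` labels that must match between source and target, where a label missing from both counts as equal.

```python
import re

Alert = dict[str, str]


def matches(alert: Alert, matchers: dict[str, str]) -> bool:
    """Regex matchers, fully anchored as in Alertmanager."""
    return all(re.fullmatch(pattern, alert.get(name, ""))
               for name, pattern in matchers.items())


def is_inhibited(target: Alert, firing: list[Alert], source_matchers: dict[str, str],
                 target_matchers: dict[str, str], equal: list[str]) -> bool:
    if not matches(target, target_matchers):
        return False
    return any(
        matches(src, source_matchers)
        and all(src.get(lbl) == target.get(lbl) for lbl in equal)
        for src in firing
    )


# Hypothetical rule and alerts.
SOURCE = {"alertname": "TuyaCloudDown"}
TARGET = {"alertname": ".*MetricsMissing"}
EQUAL: list[str] = []

firing = [{"alertname": "TuyaCloudDown", "severity": "critical"}]
target = {"alertname": "ThermostatMetricsMissing", "device": "hallway"}
print(is_inhibited(target, firing, SOURCE, TARGET, EQUAL))  # True
```

Because inhibition is evaluated against currently firing sources, the shorter `for:` on the root-cause alert is what guarantees the source is already active when the 15m targets transition.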

3049 commits