infra

Author	SHA1	Message	Date
Viktor Barzin	f10784ddb6	infra: document auth = "app|none" tier on every legacy ingress Sweep through the 30+ stacks that predated the auth = "app" tier and were tagged auth = "none" without a comment explaining why they weren't behind Authentik. Each is now self-documenting at the call site, so the tg-level anti-exposure guard passes and future readers don't have to reverse-engineer the intent. Flipped 6 stacks from "none" to "app"; their backends have their own user auth and the new tier records that more accurately: - navidrome (Subsonic user/password) - ntfy (deny-all default + user.db tokens) - nextcloud (WebDAV/CalDAV/CardDAV app passwords) - vaultwarden (Bitwarden-compatible token auth) - headscale (OIDC + preauth keys for Tailscale nodes) - paperless-ngx (app-layer login + API tokens) Kept "none" with a comment on the rest; they're genuinely public, webhook receivers, native-protocol endpoints, OAuth callbacks, or Anubis-fronted: authentik (×2 + guest outpost), beads-server (dolt), claude-memory (bearer-token MCP), dawarich, ebooks/book-search-api, fire-planner /api, forgejo (git/OCI native clients), frigate (HA integration), immich/frame, insta2spotify /api, instagram-poster (meta fetcher), k8s-portal, matrix (native bearer), monitoring×2 (HA REST scrapes), n8n (webhooks), nvidia, onlyoffice (JWT), owntracks (HTTP Basic), postiz, privatebin (client-side enc), rybbit (analytics tracker), send (E2E file drop), tuya-bridge (API key), vault (own auth + CLI), webhook_handler, woodpecker (forgejo webhooks + OAuth), xray (×3 VPN transports). real-estate-crawler/main.tf:400 already had its comment from a prior edit, so it is not touched here. No live state changes: auth = "app" produces the same middleware chain as auth = "none" (verified earlier this session). This commit is purely documentation + intent-tagging.	2026-05-22 14:16:44 +00:00
Viktor Barzin	20774f794d	dbaas+monitoring: bump PG max_connections to 200, add scrape + alerts Cluster grew past the 100-conn default — steady-state idle was 90/100, leaving zero headroom for terragrunt applies or transient surges. The ceiling was being discovered by Terraform crashing (pq: "remaining connection slots are reserved for roles with the SUPERUSER attribute"), not by alerting, because we had no PG scrape config at all. dbaas (Tier 0): * max_connections: 100 → 200 * shared_buffers: 512MB → 1GB (Postgres recommends ~25% of pod memory) * effective_cache_size: 1536MB → 2560MB (scaled with pod memory) * pod memory: 2Gi → 3Gi (rough rule of thumb: enough for shared_buffers + ~16MB work_mem * concurrent sorts + OS cache + overhead) * Triggers bump on null_resource.pg_cluster forces CNPG to re-apply, which rolls the cluster (standby first, then primary failover). monitoring: * New scrape job 'cnpg' on dbaas namespace pods labeled cnpg.io/podRole=instance, port name=metrics (9187). Relabels add cnpg_cluster + cnpg_role labels for alert grouping. * PGConnectionsHigh (warning, >85% for 10m) — heads-up before exhaustion. * PGConnectionsCritical (critical, >95% for 3m) — last call before refusing connections. Verified: cnpg targets up, sum(cnpg_backends_total)=84, max_connections metric=200, alert ratio 0.42 → both alerts inactive.	2026-05-22 14:16:44 +00:00
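The two PG alerts above reduce to a ratio check against max_connections. A minimal Python sketch of that arithmetic, assuming the ratio is simply backends / max_connections; the function names are illustrative and are not taken from the rule file, and the "for 10m" / "for 3m" hold durations are not modelled:

```python
# Thresholds come from the commit message; names here are hypothetical.
WARNING_RATIO = 0.85   # PGConnectionsHigh (must hold for 10m in the real rule)
CRITICAL_RATIO = 0.95  # PGConnectionsCritical (must hold for 3m in the real rule)


def connection_ratio(backends: int, max_connections: int) -> float:
    """Fraction of the connection ceiling currently in use."""
    if max_connections <= 0:
        raise ValueError("max_connections must be positive")
    return backends / max_connections


def breached_alerts(backends: int, max_connections: int) -> list[str]:
    """Names of the alerts whose threshold is exceeded at this instant."""
    ratio = connection_ratio(backends, max_connections)
    alerts = []
    if ratio > WARNING_RATIO:
        alerts.append("PGConnectionsHigh")
    if ratio > CRITICAL_RATIO:
        alerts.append("PGConnectionsCritical")
    return alerts


# Figures quoted in the commit: 84 backends against the new ceiling of 200.
print(connection_ratio(84, 200))  # 0.42
print(breached_alerts(84, 200))   # []
# The old steady state (90 of 100) would have crossed the warning line.
print(breached_alerts(90, 100))   # ['PGConnectionsHigh']
```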
Viktor Barzin	665b6b2934	actualbudget+monitoring: per-account bank-sync metrics, drop noisy alert The bank-sync CronJob was posting to /accounts/banksync which fans out to ALL accounts in a single call. With PSD2/GoCardless's 4-successful-pulls per-account per-24h quota, a single rate-limited account would 500 the whole call, and `bank_sync_success` would flip to 0 even though the data itself was still flowing through manual UI syncs. Result: BankSyncFailing fired routinely whenever the user had been active in the UI that day — a structural false positive. Fix: * CronJob: enumerate accounts via GET /accounts, POST per-account /accounts/{id}/banksync, emit bank_sync_account_success and bank_sync_account_last_success_timestamp labelled by account name. Roll up bank_sync_success = 1 iff any account succeeded. * Alerts: drop BankSyncFailing (noise generator). Keep BankSyncStale at 48h (global drought). Add BankSyncAccountStale at 72h (catches single-account auth expiry — the real signal we wanted). Verified: manual run on bank-sync-viktor pushes 6 per-account success + timestamp series; roll-up bank_sync_success=1; no firing alerts.	2026-05-22 14:16:44 +00:00
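The roll-up and per-account staleness rules described above can be sketched in Python. This is only an illustration of the logic stated in the message (roll-up is 1 iff any account succeeded; an account is stale after 72h); the account names and helper names are made up, and the real CronJob is not reproduced here:

```python
ACCOUNT_STALE_SECONDS = 72 * 3600  # BankSyncAccountStale window from the commit


def rollup_success(per_account: dict[str, bool]) -> int:
    """bank_sync_success: 1 if at least one account synced, else 0."""
    return 1 if any(per_account.values()) else 0


def stale_accounts(last_success: dict[str, float], now: float) -> list[str]:
    """Accounts whose last successful sync is older than the 72h window."""
    return sorted(
        name
        for name, timestamp in last_success.items()
        if now - timestamp > ACCOUNT_STALE_SECONDS
    )


# Hypothetical run: one rate-limited account no longer fails the whole sync.
results = {"current": True, "savings": False}
print(rollup_success(results))  # 1

now = 1_000_000.0
last_ok = {"current": now - 3600, "savings": now - 80 * 3600}
print(stale_accounts(last_ok, now))  # ['savings']
```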
Viktor Barzin	dd2b7de291	fix: HA Sofia REST sensors + PVC drift safety Two real issues found while triaging HomeAssistantCriticalSensorUnavailable alerts and the prometheus + technitium PVC Terminating-but-in-use state from the earlier session. 1. idrac-redfish-exporter + snmp-exporter ingresses: auth=required → auth=none. HA Sofia REST sensors scrape these endpoints programmatically; with Authentik forward-auth in front, every request got a 302 to authentik.viktorbarzin.me and the REST sensors parsed the HTML login page instead of metrics — leaving the R730, UPS, and ~20 other sensors permanently unavailable. The allow_local_access_only IP allowlist (192.168.0.0/16 + 10.0.0.0/8) already gates external access, so authentik on top was breaking machine-to-machine traffic for no security gain. 2. prometheus_server_pvc + technitium primary_config_encrypted: add lifecycle.ignore_changes = [spec[0].resources[0].requests]. The autoresizer expands these PVCs; PVCs can't shrink. Without the ignore, every TF apply tried to revert the live size back to the TF spec value, hit K8s's shrink-forbidden rule, and force-replaced the PVC. Because the pod still mounted it, the PVC went into Terminating-but-protected limbo — fine until a pod restart would have orphaned the volume. Root cause of the 2026-05-10 PVC Terminating incident. Bonus: prometheus_server_pvc threshold was the inverted "90%" (the same bug the bulk `fecfa211` sweep fixed elsewhere; my regex only matched "80%" so this one slipped through). Now "10%". Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-22 14:16:43 +00:00
Viktor Barzin	e75bcaf394	k8s-version-upgrade: automated kubeadm/kubelet/kubectl upgrade pipeline Adds a weekly detection CronJob (Sun 12:00 UTC) that probes apt-cache madison on master for new patches + HEAD pkgs.k8s.io for next-minor availability, then POSTs to claude-agent-service to dispatch the k8s-version-upgrade agent. The agent (.claude/agents/k8s-version-upgrade.md) orchestrates: pre-flight (5 nodes Ready + halt-on-alert + 24h-quiet + plan target match) -> etcd snapshot save -> optional master containerd skew fix -> apt repo URL rewrite (minor bumps only) -> drain/upgrade/uncordon master via ssh < update_k8s.sh -> sequential workers k8s-node4 -> 3 -> 2 -> 1 with 10-min soak each -> post-flight verification Two new Upgrade Gates alerts catch failure modes: - K8sVersionSkew (kubelet/apiserver gitVersion mismatch >30m) - EtcdPreUpgradeSnapshotMissing (in_flight without snapshot_taken >10m) update_k8s.sh refactored to take --role / --release args; the agent shells it into each node via SSH pipe. update_node.sh annotated as OS-major path. Operator-facing docs: docs/runbooks/k8s-version-upgrade.md and a new section in docs/architecture/automated-upgrades.md. Secrets: secret/k8s-upgrade/{ssh_key,ssh_key_pub,slack_webhook} (ed25519 keypair distributed to all 5 nodes via authorized_keys; slack_webhook reuses kured webhook URL on initial deploy).	2026-05-22 14:16:42 +00:00
Viktor Barzin	ff5538a667	ingress_factory: replace `protected` bool with `auth` enum + audit pass across 100 stacks Phase 3+4 of default-deny ingress plan. Replaces the `protected = bool` (default false → unprotected) variable in `modules/kubernetes/ingress_factory` with `auth = string` enum (default "required" → fail-closed). Touches every ingress_factory caller so the audit decision is recorded explicitly in code. ingress_factory (Phase 3): - `auth = "required"`: standard Authentik forward-auth (the legacy `protected = true` semantic). - `auth = "public"`: forward-auth via the new `authentik-forward-auth-public` middleware → dedicated public outpost → guest auto-bind. Logged-in users keep their real identity. - `auth = "none"`: no Authentik middleware. For Anubis-fronted content, native client APIs (Git, /v2/, WebDAV), webhook receivers, the Authentik outpost itself. - `effective_anti_ai` default flips ON only when `auth = "none"` (auth-gated ingresses don't need anti-AI noise; the auth flow already discourages bots). 
Audit pass (Phase 4) across 96 ingress_factory call sites: - 49 explicit `protected = true` → `auth = "required"` - 8 explicit `protected = false` → `auth = "none"` (5) or `auth = "public"` (3) - 64 previously-default (no protected line) → `auth = "required"` ADDED, then reviewed individually: * 9 Anubis-fronted (blog, www, kms, travel, f1, cyberchef, jsoncrack, homepage, wrongmove UI, privatebin) → `auth = "none"` * 22 native-client / programmatic surfaces (Forgejo Git+/v2/, webhook handler, claude-memory MCP, Nextcloud WebDAV, Matrix, Vault CLI/OIDC, xray VPN, ntfy, woodpecker webhooks, n8n triggers, ntfy push, dawarich location ingestion, immich frame kiosk, headscale CP, send anonymous drops, rybbit beacon, vaultwarden API, Authentik UI itself + outposts) → `auth = "none"` * Remaining ~33 → `auth = "required"` confirmed (admin tools, internal UIs, services without app-level auth) - Smoke-test promotions to `auth = "public"`: fire-planner public UI, k8s-portal API, insta2spotify callback. Three call sites in wrapper modules (`stacks/freedify/factory/`, `stacks/reverse-proxy/modules/reverse_proxy/`) keep their internal `protected` bool — they translate to `auth` internally, out of scope for this rename. Behavior change: previously-default ingresses now fail closed (require Authentik login) unless explicitly flipped to `auth = "none"` or `auth = "public"`. This is the audit goal — no more accidentally-unprotected surfaces. Sites that were intentionally public (Anubis content, native APIs, webhooks) are now explicitly recorded as `auth = "none"`. Drive-by: `modules/create-vm/main.tf` picked up cosmetic alignment via `terraform fmt -recursive` during the audit. Behavior-neutral. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-22 14:16:42 +00:00
Viktor Barzin	4103ea2ba0	monitoring(prometheus): keep all 4 kubelet_volume_stats_inodes metrics pvc-autoresizer's GetMetrics() returns volume stats for a PVC only if all four kubelet_volume_stats metrics (available_bytes, capacity_bytes, inodes_free, inodes) are retrieved. The keep-list in the kubernetes-nodes scrape job had available_bytes and capacity_bytes (post `9d5da4d8`) but was missing the two inode metrics, so the autoresizer's reconcile logged "failed to get volume stats" for every PVC and never resized anything. Add kubelet_volume_stats_inodes and kubelet_volume_stats_inodes_free to the regex. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-22 14:16:42 +00:00
Viktor Barzin	278ef5f19b	monitoring(grafana): swap python3 for jq in folder-ACL local-exec CI image (ci/Dockerfile) is alpine + jq, no python3. The grafana_admin_only_folder_acl null_resource was parsing /api/folders with a python3 oneliner, which crashed every CI apply with "python3: command not found" and made every monitoring stack apply fail in CI (worked locally because the dev VM has python3). jq is already in the CI image and produces the same output.	2026-05-22 14:16:41 +00:00
Viktor Barzin	5c0ea96a91	infra: re-enable unattended-upgrades with kured prometheus-gating Reverses the March 2026 outage mitigation that disabled unattended- upgrades cluster-wide. Now re-enables it on the k8s template VM with: - Allowed-Origins limited to security/updates pockets - Package-Blacklist for k8s/containerd/runc/calico-node (apt-mark hold on the cluster-critical components) - Automatic-Reboot disabled — kured drives the actual reboots - Compatible with the existing kured + sentinel-gate flow kured side: - rebootDelay 30s, concurrency 1 - Sentinel cool-down stretched 30m → 24h (aligns with the 24h soak window from the post-mortem) - prometheusUrl + alertFilterRegexp wired so any firing non-ignored alert halts the rollout. Ignore-list excludes self-referential alerts (Watchdog/RebootRequired/KuredNodeWasNotDrained/ InfoInhibitor) that would otherwise deadlock kured. Prometheus side (already partly landed in `6c4e0966` — the "Upgrade Gates" rule group): - Refine `KubeQuotaAlmostFull` to include the resourcequota label in both the on-clause and the summary, so multi-quota namespaces (authentik, beads-server, frigate) report the quota name correctly. grafana.tf: terraform fmt whitespace only. Together with the post-mortem 2026-03-22 (memory id=390) the loop is closed: unattended-upgrades runs again, kernel-class updates can land, but only when cluster health is green and the reboot window is open.	2026-05-22 14:16:41 +00:00
Viktor Barzin	fe75fad467	monitoring: protect grafana ingress with authentik + disable anonymous - add traefik-authentik-forward-auth to grafana ingress middleware list - disable auth.anonymous (was Viewer-by-default for the public) - enable auth.proxy with X-authentik-username so Authentik users get signed in seamlessly (no double-login UX) Prometheus and Alertmanager already had forward-auth — no change.	2026-05-22 14:16:41 +00:00
Viktor Barzin	6c294d4bb0	authentik: zero-endpoints alert + upgrade-validation checklist Add `AuthentikForwardAuthFallbackActive` Prometheus alert: fires on sustained 401/s spike on the websecure entrypoint (>5/s for 5m), which is the symptom of the auth-proxy Emergency-Access fallback firing — in turn caused by zero ready endpoints on the outpost service. Why this rule and not `kube_endpoint_address_available == 0`: kube-state-metrics endpoint metrics exist as series names but never have current values in this Prometheus pipeline (something is dropping them silently). Detecting the failure at the edge via Traefik is more reliable than instrumenting the broken middle. Also fix the pre-existing `AuthentikOutpostForwardAuth400Spike` regex — the service label is `authentik-ak-outpost-...`, not `authentik-authentik-outpost-...`, so the alert never matched any series and never could have fired. Verified in Prometheus before/after the fix. Add an "Upgrade Validation Checklist" section to `.claude/reference/authentik-state.md` with the seven-step smoke test to run after Authentik chart bumps, provider bumps, or outpost pod recreation. Covers the brittle surfaces (Service selector, JSON patches, postgres backend wiring, access_token_validity TTL, edge auth flow, plan-to-zero).	2026-05-22 14:16:41 +00:00
Viktor Barzin	a89d4a7d2a	anubis: pull f1 off Anubis (XHR-vs-challenge collision) + add latency alerts f1.viktorbarzin.me is a SPA whose JS fetches /schedule, /embed, /embed-asset, … on the same path tree. With Anubis fronting `/`, those XHRs land on the challenge HTML even when the cookie should be valid, breaking the page with `Unexpected token '<', "<!doctype " ... is not valid JSON`. Removed Anubis from f1 — would need a path carve-out (the way wrongmove does for /api) to re-enable. Added a top-of-block comment so future me remembers why. Plus four new Prometheus alerts in `Slow Ingress Latency` group (stacks/monitoring/.../prometheus_chart_values.tpl): - IngressTTFBHigh (warn, 10m, avg latency >1s) - IngressTTFBCritical (crit, 5m, avg latency >3s) - IngressErrorRate5xxHigh (crit, 5m, 5xx >5%) - AnubisChallengeStoreErrors (crit, 5m, any 5xx on anubis services via Traefik — proxies for the in-pod challenge-store error since Anubis itself only exposes Go-runtime metrics) Notes from the alert author: avg-not-p95 because the existing Prometheus scrape config drops traefik bucket series; once those are restored, swap to histogram_quantile(0.95). TraefikDown inhibit rule extended to suppress these four during a Traefik outage.	2026-05-10 11:12:40 +00:00
Viktor Barzin	8c619278d3	grafana: env-var datasources + reloader so Vault rotations stop breaking dashboards Wealth, Payslips, and Job-Hunter Grafana datasources all baked the rotating PG password into their ConfigMap at TF-apply time, so every 7-day Vault static-role rotation silently broke the panels until a manual `terragrunt apply`. Same family as the recurring grafana-mysql backend bug — Grafana caches creds at startup and never picks up the new ESO-synced password without a restart. Fix: - Each source stack now creates an ExternalSecret in `monitoring` exposing the rotating password as `<NAME>_PG_PASSWORD` env-var. - Grafana mounts those via `envFromSecrets` (optional=true so a missing source stack doesn't block boot) and the datasource ConfigMaps reference `$__env{<NAME>_PG_PASSWORD}` instead of a literal password. - `reloader.stakater.com/auto: "true"` on the Grafana pod restarts it whenever any of the four DB-cred Secrets is updated. Tested end-to-end: forced `vault write -force database/rotate-role/ pg-wealthfolio-sync` → ESO synced (~30s) → reloader fired → Grafana booted with new env in ~50s total → all three /api/datasources /uid/*/health endpoints return "Database Connection OK". Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-10 11:12:39 +00:00
Viktor Barzin	8c09543391	fix: restore pvc-autoresizer by allow-listing kubelet_volume_stats_available_bytes The Prometheus scrape config for the kubernetes-nodes job kept capacity_bytes + used_bytes but dropped available_bytes. pvc-autoresizer computes utilization from available/capacity, so without that metric it was silent for every PVC in the cluster — including mailserver, which filled to 89% (1.7G/2.0G) and started rejecting all inbound mail with '452 4.3.1 Insufficient system storage' (15+ hours, all real senders: Brevo, Gmail, Facebook). Also bumps the floors of mailserver (2Gi -> 5Gi, limit 10Gi) and forgejo (15Gi -> 30Gi) PVCs to recover from the immediate outage, and adds ignore_changes on requests.storage so future autoresizer expansions don't cause TF drift.	2026-05-10 11:12:37 +00:00
Viktor Barzin	e110b40a4a	monitoring(wealth): monthly contrib-vs-mkt as line chart, not bars User asked for two lines instead of side-by-side bars at monthly granularity. Converts panel 25 from barchart to timeseries: * type: barchart -> timeseries * format: table -> time_series, SELECT month::timestamp AS time * drawStyle line, lineWidth 2, fillOpacity 0, showPoints auto * Same blue (contributions) / green (market gain) colour overrides Where the green line rises above the blue line is the visual cue that the market out-earned new contributions for that month -- the trend the user wants to track. Diff is small (15 ins / 28 del) because the bar-chart-only fields (barRadius, barWidth, groupWidth, stacking, xField, xTickLabelRotation) are dropped.	2026-05-07 23:29:35 +00:00
Viktor Barzin	84fd752747	monitoring(wealth): monthly contributions vs market gain bar chart Goal stated by user: see when monthly market gain starts to exceed monthly contributions, i.e. the inflection point where the market is out-earning savings rather than the other way around. New panel id=25 between the annual decomposition (13) and per-account ROI (14): bar chart with two side-by-side bars per month -- contributions (blue) and market gain (green). Same calculation as panel 13 but month-grain instead of year-grain. Months where the green bar dwarfs the blue one are visible at a glance. SQL: same endpoints CTE pattern as panel 13, with date_trunc('month', valuation_date) as the grouping key. Uses max_complete cutoff so partial-today doesn't skew the latest month. Layout: panels at y >= 75 shifted down by 11 (chart height). New chart at y=75; panel 14 (per-account ROI) -> y=86; panel 10 (activity log) -> y=96. Spot check (recent months from PG): 2025-07: contrib +£5,601 market +£42,295 <- big market month 2025-09: contrib +£1,501 market +£24,206 2026-02: contrib +£35,501 market +£41,382 2026-03: contrib +£5,501 market -£38,483 <- correction 2026-04: contrib +£73,267 market +£21,448	2026-05-07 23:29:34 +00:00
Viktor Barzin	4ec40ea804	[forgejo] Phases 3+4+5: cutover, decommission, docs sweep End of forgejo-registry-consolidation. After Phase 0/1 already landed (Forgejo ready, dual-push CI, integrity probe, retention CronJob, images migrated via forgejo-migrate-orphan-images.sh), this commit flips everything off registry.viktorbarzin.me onto Forgejo and removes the legacy infrastructure. Phase 3 — image= flips: * infra/stacks/{payslip-ingest,job-hunter,claude-agent-service, fire-planner,freedify/factory,chrome-service,beads-server}/main.tf — image= now points to forgejo.viktorbarzin.me/viktor/<name>. * infra/stacks/claude-memory/main.tf — also moved off DockerHub (viktorbarzin/claude-memory-mcp:17 → forgejo.viktorbarzin.me/viktor/...). * infra/.woodpecker/{default,drift-detection}.yml — infra-ci pulled from Forgejo. build-ci-image.yml dual-pushes still until next build cycle confirms Forgejo as canonical. * /home/wizard/code/CLAUDE.md — claude-memory-mcp install URL updated. Phase 4 — decommission registry-private: * registry-credentials Secret: dropped registry.viktorbarzin.me / registry.viktorbarzin.me:5050 / 10.0.20.10:5050 auths entries. Forgejo entry is the only one left. * infra/stacks/infra/main.tf cloud-init: dropped containerd hosts.toml entries for registry.viktorbarzin.me + 10.0.20.10:5050. (Existing nodes already had the file removed manually by `setup-forgejo-containerd-mirror.sh` rollout — the cloud-init template only fires on new VM provision.) * infra/modules/docker-registry/docker-compose.yml: registry-private service block removed; nginx 5050 port mapping dropped. Pull- through caches for upstream registries (5000/5010/5020/5030/5040) stay on the VM permanently. * infra/modules/docker-registry/nginx_registry.conf: upstream `private` block + port 5050 server block removed. * infra/stacks/monitoring/modules/monitoring/main.tf: registry_ integrity_probe + registry_probe_credentials resources stripped. forgejo_integrity_probe is the only manifest probe now. 
Phase 5 — final docs sweep: * infra/docs/runbooks/registry-vm.md — VM scope reduced to pull- through caches; forgejo-registry-breakglass.md cross-ref added. * infra/docs/architecture/ci-cd.md — registry component table + diagram now reflect Forgejo. Pre-migration root-cause sentence preserved as historical context with a pointer to the design doc. * infra/docs/architecture/monitoring.md — Registry Integrity Probe row updated to point at the Forgejo probe. * infra/.claude/CLAUDE.md — Private registry section rewritten end- to-end (auth, retention, integrity, where the bake came from). * prometheus_chart_values.tpl — RegistryManifestIntegrityFailure alert annotation simplified now that only one registry is in scope. Operational follow-up (cannot be done from a TF apply): 1. ssh root@10.0.20.10 — edit /opt/registry/docker-compose.yml to match the new template AND `docker compose up -d --remove-orphans` to actually stop the registry-private container. Memory id=1078 confirms cloud-init won't redeploy on TF apply alone. 2. After 1 week of no incidents, `rm -rf /opt/registry/data/private/` on the VM (~2.6GB freed). 3. Open the dual-push step in build-ci-image.yml and drop registry.viktorbarzin.me:5050 from the `repo:` list — at that point the post-push integrity check at line 33-107 also needs to be repointed at Forgejo or removed (the per-build verify is redundant with the every-15min Forgejo probe). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-07 23:29:34 +00:00
Viktor Barzin	70ea1cf6fd	[forgejo] Tolerate missing Vault keys during Phase 0 bootstrap Wrap the three new Vault key reads in try(...) so the first apply succeeds even when forgejo_pull_token / forgejo_cleanup_token / secret/ci/global haven't been populated yet. Without this, CI auto-apply blocks on the very push that introduces the references — chicken-and-egg with the runbook order (which is: apply Forgejo bumps, then create users + PATs, then apply the rest). Empty tokens are intentionally visible-broken (auth fails, probe reports auth failure, cleanup CronJob errors) — that's the signal to run the bootstrap runbook. Subsequent apply picks up the real values. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-07 23:29:33 +00:00
Viktor Barzin	f793a5f50b	[forgejo] Phase 0 of registry consolidation: prepare Forgejo OCI registry Stage 1 of moving private images off the registry:2 container at registry.viktorbarzin.me:5050 (which has hit distribution#3324 corruption 3x in 3 weeks) onto Forgejo's built-in OCI registry. No cutover risk: pods still pull from the existing registry until Phase 3. What changes: * Forgejo deployment: memory 384Mi→1Gi, PVC 5Gi→15Gi (cap 50Gi). Explicit FORGEJO__packages__ENABLED + CHUNKED_UPLOAD_PATH (defensive, v11 default-on). * ingress_factory: max_body_size variable was declared but never wired in after the nginx→Traefik migration. Now creates a per-ingress Buffering middleware when set; default null = no limit (preserves existing behavior). Forgejo ingress sets max_body_size=5g to allow multi-GB layer pushes. * Cluster-wide registry-credentials Secret: 4th auths entry for forgejo.viktorbarzin.me, populated from Vault secret/viktor/ forgejo_pull_token (cluster-puller PAT, read:package). Existing Kyverno ClusterPolicy syncs cluster-wide, so no policy edits. * Containerd hosts.toml redirect: forgejo.viktorbarzin.me → in-cluster Traefik LB 10.0.20.200 (avoids hairpin NAT for in-cluster pulls). Cloud-init for new VMs + scripts/setup-forgejo-containerd-mirror.sh for existing nodes. * Forgejo retention CronJob (0 4 * * *): keeps newest 10 versions per package + always :latest. First 7 days dry-run (DRY_RUN=true); flip the local in cleanup.tf after log review. * Forgejo integrity probe CronJob (*/15): same algorithm as the existing registry-integrity-probe. Existing Prometheus alerts (RegistryManifestIntegrityFailure et al) made instance-aware so they cover both registries during the bake. Docs: design+plan in docs/plans/, setup runbook in docs/runbooks/. 
Operational note — the apply order is non-trivial because the new Vault keys (forgejo_pull_token, forgejo_cleanup_token, secret/ci/global/forgejo_*) must exist BEFORE terragrunt apply in the kyverno + monitoring + forgejo stacks. The setup runbook documents the bootstrap sequence. Phase 1 (per-project dual-push pipelines) follows in subsequent commits. Bake clock starts when the last project goes dual-push. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-07 23:29:33 +00:00
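The retention rule for the Forgejo cleanup CronJob (newest 10 versions per package, `:latest` always kept, dry-run first) can be illustrated in Python. A rough sketch under stated assumptions: versions are modelled as (tag, created_at) pairs, `latest` is treated as exempt rather than counted toward the 10, and `versions_to_delete` is a hypothetical helper, not the script in the repo.

```python
def versions_to_delete(
    versions: list[tuple[str, int]], keep: int = 10, dry_run: bool = True
) -> list[str]:
    """Return tags that fall outside the newest `keep`, never including 'latest'.

    `versions` holds (tag, created_at) pairs for ONE package. With dry_run=True
    the candidates are only reported, mirroring the DRY_RUN=true bake period.
    """
    candidates = [v for v in versions if v[0] != "latest"]  # latest is exempt
    newest_first = sorted(candidates, key=lambda v: v[1], reverse=True)
    doomed = [tag for tag, _ in newest_first[keep:]]
    if dry_run:
        for tag in doomed:
            print(f"[dry-run] would delete {tag}")
    return doomed


# Twelve numbered versions plus an old :latest tag.
package = [(f"v{i}", i) for i in range(1, 13)] + [("latest", 0)]
print(versions_to_delete(package))  # ['v2', 'v1']
```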
Viktor Barzin	41655096c7	openclaw: realtime usage dashboard via Prometheus exporter sidecar Stdlib-only Python exporter ($1) reads ~/.openclaw/agents//sessions/.jsonl (assistant messages with usage) plus auth-profiles.json (OAuth expiry, Plus-tier label) and exposes Prometheus text format on :9099/metrics. Container is python:3.12-slim; pod template gets prometheus.io/scrape annotations so the existing kubernetes-pods job picks it up — no ServiceMonitor needed. Metrics exported: openclaw_codex_messages_total{provider,model,session_kind} counter openclaw_codex_input/output/cache_read/cache_write_tokens_total openclaw_codex_message_errors_total{reason} openclaw_codex_active_sessions{kind} gauge openclaw_codex_oauth_expiry_seconds{provider,account,plan} gauge openclaw_codex_last_run_timestamp gauge Grafana dashboard "OpenClaw — Codex Usage" (Applications folder, 30s refresh): messages/5h vs Plus rate-card, % of 1,200 floor, tokens/5h, cache hit %, OAuth expiry days, active sessions, last-turn age, errors, plus per-model timeseries + bar gauge + error table. Plus rate-card thresholds in the gauge are conservative (1,200/5h floor; real cap is dynamic 1,200–7,000). Re-baseline if throttling shows up below 80%.	2026-05-07 23:29:32 +00:00
Viktor Barzin	f006b48566	monitoring(wealth): delta panels to 2x4 grid (rows = type, cols = window) Better visual grouping: instead of 8 paired panels in a single row at w=3 (cramped, hard to scan), arrange as a 2x4 grid at w=6. Top row ("all" — wealth change incl new money), bottom row ("mkt" — pure market gain). Columns are timeframes 1d / 7d / 30d / 90d. Reading vertically: same window, two interpretations side by side. Reading horizontally: same metric across timeframes. Layout shift: delta row goes from y=4 (4 wide) to y=4..11 (8 high). All chart/log panels with y >= 8 shift down by another 4 rows (net-worth chart 8->12, activity log 81->85, etc.).	2026-05-07 23:29:31 +00:00
Viktor Barzin	0f107aeacb	monitoring(wealth): pair every delta panel with market-only twin User feedback: net-worth delta panels (1d/7d/30d/90d) confused because +£174k over 90d looked too big against the £271k cumulative unrealised gain. Decomposition showed the 90d delta was £114k of new money in (contributions) + £60k of actual market gain. So now the delta row shows BOTH: Δ Nd (all) — net-worth change incl new money (the original number) Δ Nd (mkt) — pure market gain, contributions stripped out Pattern for "(mkt)" panels: same now_snap / past_snap CTEs but selecting both total_value and net_contribution, then computing (nw_delta - contrib_delta) = market_gain over window. Layout: 8 panels at w=3 each on the y=4 row, paired by window (all next to mkt for each timeframe), so you can see "wealth change vs investment performance" at a glance. Verified live (90d): all=+£174,612, mkt=+£60,343, contrib=+£114,268.	2026-05-07 23:29:31 +00:00
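The "(mkt)" calculation is a subtraction of two deltas. A small Python sketch with made-up snapshot values, assuming each snapshot carries the summed total_value and net_contribution; the `Snapshot` type and `market_gain` name are illustrative and the panel itself does this in SQL:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    total_value: float       # summed net worth at the snapshot date
    net_contribution: float  # cumulative money paid in up to that date


def market_gain(now_snap: Snapshot, past_snap: Snapshot) -> float:
    """Wealth change over the window with new contributions stripped out."""
    nw_delta = now_snap.total_value - past_snap.total_value
    contrib_delta = now_snap.net_contribution - past_snap.net_contribution
    return nw_delta - contrib_delta


# Hypothetical window: net worth rose by 500, of which 300 was new money.
past = Snapshot(total_value=1000.0, net_contribution=800.0)
now = Snapshot(total_value=1500.0, net_contribution=1100.0)
print(market_gain(now, past))  # 200.0
```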
Viktor Barzin	87069ae5c3	monitoring(wealth): add delta row (1d / 7d / 30d / 90d net-worth changes) New row at y=4 with 4 stat panels showing net-worth change over the trailing windows. Each uses the latest-per-account stitching pattern (skew-resilient against partial-day syncs) and computes: delta = SUM(latest per account) - SUM(latest per account at or before max_complete - N) Where max_complete is the most recent date all accounts have a row. For each window: 1d, 7d, 30d, 90d. Verified live values: +£8,575 / +£22,696 / +£144,633 / +£174,612. All panels at y >= 4 shifted down by 4 rows to make room (Net worth chart 4->8, Per-account stacked 24->28, Activity log 77->81, etc.). Note: this commit also reformats the dashboard JSON from compact- object form to indented form (json.dump indent=2 side effect from the Python patch script). No semantic changes outside the new panels and y-shifts.	2026-05-07 23:29:31 +00:00
Viktor Barzin	1cb2bb30f7	monitoring(wealth): show pre-2024 historical data on timeseries Bug: timeseries panels were empty before 2024-04-10. Cause was the complete_dates CTE filtering to "every active account has a row for this date" -- which excluded every day before the most-recently-added account first appeared. The 6th account (Trading212 Invest GIA) only started 2024-04-10, so 4 years of legitimate historical data (2020-06-07 onwards, when the user genuinely had fewer accounts) got hidden. New pattern across panels 5/6/7/8/9/12/13: replace complete_dates with max_complete cutoff. Compute the most-recent date where all current accounts have a row, then include every historical date up to and including that day. Partial-today is still excluded automatically. Historical days with fewer accounts now show as their actual smaller sums -- which is the correct historical net worth at the time. Verified via PG: new pattern returns 2,159 distinct days from 2020-06-07 to 2026-05-05 (vs the previous 391 from 2024-04-10). Per-account first-seen dates: InvestEngine ISA - 2020-06-07 Schwab US workplace - 2020-11-17 InvestEngine GIA - 2022-03-17 Fidelity UK Pension - 2022-05-16 Trading212 ISA - 2024-04-08 Trading212 Invest GIA - 2024-04-10 (was the bottleneck)	2026-05-05 18:43:26 +00:00
Viktor Barzin	6715cdc51f	monitoring(wealth): re-add milestone annotations (now that PG creds rotated) Re-applies the milestone annotation commit reverted in `0ef36aec`. The earlier "nothing loads / syntax error" was a red herring: Vault had rotated the wealthfolio_sync DB password 7 days prior, the K8s Secret picked it up automatically (pg-sync sidecar still working), but the Grafana datasource ConfigMap is baked at TF-apply time so Grafana was sending the old password. Every panel + the new annotation alike failed with: pq password authentication failed for user wealthfolio_sync. Fix today: refresh the datasource ConfigMap and roll Grafana. scripts/tg apply -target=kubernetes_config_map.grafana_wealth_datasource kubectl -n monitoring rollout restart deploy/grafana Annotation source verified live via /api/ds/query: SQL returns 5 milestone rows correctly. Dashboard charts now show vertical dashed lines at GBP100k 2021-11-01, GBP250k 2023-07-18, GBP500k 2024-09-19, GBP750k 2025-08-26, GBP1M 2026-04-18. KNOWN FOLLOW-UP: Vault rotates pg-wealthfolio-sync every 7 days (static role). Today's failure will recur unless the Grafana datasource auto-refreshes. Options: 1. Annotate Grafana deploy with stakater/reloader so it restarts when wealthfolio-sync-db-creds Secret changes. 2. Switch datasource provisioning to read password from an env var sourced from the Secret instead of baking into the ConfigMap. Combined with reloader, picks up rotation cleanly.	2026-05-02 20:27:21 +00:00
Viktor Barzin	0ef36aec36	Revert "monitoring(wealth): milestone annotations on every timeseries chart" This reverts commit `5a00b9c096`.	2026-05-02 20:20:18 +00:00
Viktor Barzin	5a00b9c096	monitoring(wealth): milestone annotations on every timeseries chart Inspired by the user's "Journey to £1M" reference — adds vertical dashed lines on every timeseries panel at the date net worth first crossed each round threshold (£100k, £250k, £500k, £750k, £1M). Implementation: a dashboard-level annotation source ("Milestones", purple) backed by a PG query that finds the MIN(valuation_date) where SUM(total_value) >= each threshold. The query returns (time, text) pairs, e.g. "2026-04-18 → £1M 🎉". Annotations attach to all timeseries panels automatically; auto-extends as future thresholds are crossed. Verified against current data: £100k → 2021-11-01 £250k → 2023-07-18 £500k → 2024-09-19 £750k → 2025-08-26 £1M → 2026-04-18 🎉 Future work (per user request): add a "Journey" stat-card row at the top mirroring the reference (date achieved + months from previous).	2026-05-02 08:42:21 +00:00
Viktor Barzin	664a85ef1e	Revert "monitoring(wealth): show daily points + lighter fill on timeseries" This reverts commit `5472720c75`.	2026-05-01 16:24:18 +00:00
Viktor Barzin	5472720c75	monitoring(wealth): show daily points + lighter fill on timeseries Make daily movements visible on the line charts. The y-axis still spans ~£700k–£1M so an £8k daily move is ~1% of vertical range and easy to miss when only the line is drawn. Changes per panel: * 5 (Net worth): showPoints never→always, pointSize 4→5, fillOpacity 20→10 * 6 (Net contrib vs market): showPoints never→always, pointSize 4→5 * 7 (Growth over time): showPoints never→always, pointSize 4→5, fillOpacity 50→25 * 8 (Per-account stacked): showPoints never→always (kept stacking fill at 70) * 9 (Cash vs invested stacked): showPoints never→always (kept stacking fill at 70) Each daily value now renders as a visible dot, so even if the line appears flat at this scale, the per-day points trace the wiggle. Lighter fill on the unstacked panels lets the line + points dominate visually. Caveat: the fundamental "£8k on a £1M base" visibility issue is best solved with a dedicated "Daily change" delta panel — happy to add one on next pass if this isn't enough.	2026-05-01 16:23:25 +00:00
Viktor Barzin	2722260ce9	monitoring(wealth): unbreak timeseries SQL — over-escaped time alias Fix: panels 5–9 had `AS \"time\"` (literal backslash-quote sequence embedded in the SQL string). PostgreSQL parsed that as a syntax error at the leading backslash: ERROR: syntax error at or near "\" LINE 1: ...complete_dates)) SELECT valuation_date::timestamp AS \"time\" Root cause: the patch script for the skew-resilient queries (commit `628f5a0d`) used a Python f-string with `\\\"time\\\"`, which produces a literal backslash-quote in the Python string. When that string was JSON-encoded the backslash was preserved verbatim instead of collapsed to plain `"time"`. Replaces all five occurrences with the correct `AS "time"` form. Verified the corrected query against PG returns 7 daily net-worth rows for 04-25..05-01 as expected.	2026-05-01 16:19:07 +00:00
Viktor Barzin	d67416d4ca	monitoring(wealth): tighten default time range, bump decimals for granularity Two adjustments to make daily movements visible: 1. Default time range: now-5y → now-180d. The timeseries charts (Net worth, Net contribution vs market value, Growth, Per-account stacked, Cash vs invested) auto-fit their y-axis to the data range in view. Over 5 years, daily £1k–£10k moves are ~1% of axis range and visually invisible against the cumulative trend. Over 6 months, the same daily moves dominate. Yearly bar charts (12, 13) are unaffected — they aggregate by calendar year and don't filter on $__timeFilter. 2. Decimals → 2 on every currency panel (1, 2, 3, 5–9, 13, 15, 16) and every percent panel (4, 14). Stat panels now show pennies on currency and 0.01% on rates; chart y-axis ticks are likewise more precise. Honest caveat: pennies on a £1M number don't make the absolute readout easier — to see "today changed by £8,358" cleanly we'd want a dedicated delta panel; pending user direction. Widen the time picker manually to recover the 5-year view; default just zooms into the last 6 months.	2026-05-01 16:15:39 +00:00
Viktor Barzin	628f5a0d26	monitoring(wealth): skew-resilient queries, no more partial-day dips Bug witnessed 2026-05-01: dashboard "Net worth (current)" showed £88k instead of £1.03M because at 02:00 UTC an external trigger refreshed ONE account (Trading212 ISA), creating its 05-01 daily_account_valuation row. The 5 other accounts still had their last row at 04-30. The panel SQL `WHERE valuation_date = (SELECT MAX(valuation_date))` then summed only the single account that had a 05-01 row. Two new SQL patterns adopted across all 15 affected panels: 1. Stat / barchart "current snapshot" panels (1, 2, 3, 4, 11, 14, 15, 16): latest-per-account stitching — WITH latest AS (SELECT DISTINCT ON (d.account_id) ... FROM daily_account_valuation d JOIN accounts a ON a.id = d.account_id ORDER BY d.account_id, d.valuation_date DESC) gives a coherent "now" snapshot regardless of refresh skew, and the inner join filters out orphan/deleted accounts (one such was adding a stale £33k from 04-17). 12-month panels add a parallel `ago` CTE picking each account's row closest to (d_now - 12mo). 2. Time-series / yearly panels (5, 6, 7, 8, 9, 12, 13): complete-days-only filter — WITH active_accounts AS (SELECT COUNT(*) FROM accounts), complete_dates AS (SELECT valuation_date FROM daily_account_valuation d JOIN accounts a ON a.id=d.account_id GROUP BY valuation_date HAVING COUNT(*) >= active.n) so a partial today never renders as a chart dip. The day rejoins the chart automatically once the daily 16:00 UTC sync writes rows for every account. Verified end-to-end against live PG: new queries produce £1,033,734 (matches the 6 active accounts' true latest sum) where the old query gave £88k.	2026-05-01 16:08:18 +00:00
Viktor Barzin	31b9e5d4a9	monitoring(wealth): add 12mo contrib + 12mo gain to top row Top row goes from 5 → 7 stat panels (widths 4+4+4+3+3+3+3=24): - Net worth, Net contribution, Growth shrink from w=5 to w=4. - ROI % shrinks from w=5 to w=3 (now sits at x=12). - 12mo return slides from x=20/w=4 to x=15/w=3. - New: 12mo contrib (id=15, currency, blue) at x=18 — net contributions added in the trailing 12 months. - New: 12mo gain (id=16, currency, red/green) at x=21 — pure market gain in £ over the trailing 12 months (12mo Δnet-worth − 12mo contribs). Live values verified against PG: contrib_12mo=£245k, gain_12mo=£172k, sum = £417k = nw_now − nw_ago, return = 23.51%.	2026-04-27 06:32:53 +00:00
Viktor Barzin	215717c90f	monitoring(dashboards): tables at the bottom convention wealth: move Activity log table from y=45 to y=77; the three barcharts (Yearly return, Annual change, Per-account ROI) shift up by 14 to fill the gap. uk-payslip: move Sankey "where the money went" from y=80 to y=48 (right above the table block); the three tables (Data integrity, All payslips, YTD reconciliation) shift down by 14 so all four tables (4, 5, 6, 9) sit contiguously at the bottom. fire-planner and job-hunter still have intentional side-by-side table/chart pairings; left untouched pending user direction on whether to break them.	2026-04-26 18:30:52 +00:00
Viktor Barzin	bb28485ce0	monitoring(wealth): move 12mo return to top bar, shrink to w=4 Trailing 12-month investment return % was a full-width stat at y=59. Now sits inline with Net worth / Contribution / Growth / ROI as the fifth headline number — top-row stats reflowed from w=6 (×4) to w=5 (×4) + w=4 (×1). Title shortened to "12mo return" so it fits. Panels below the old row shifted up by 4 rows to close the gap.	2026-04-26 18:19:24 +00:00
Viktor Barzin	a24cd7ceb7	monitoring(uk-payslip): yearly receipt aligns with P60 (RSU gross) Switch the RSU stack from "after band-aware tax" to gross. Receipt total is now pre-sacrifice gross compensation; bar − pension stack ≈ ytd_gross reported on the final March payslip / P60. Verified alignment for 2025/26: bar−pension = £266,752 vs P60 ytd_gross = £268,127 — gap of £1,375 ≈ "other taxable" (benefits, overtime). Remaining year-level gaps are upstream parser/ingest issues, not dashboard logic: - 2024/25 +£27k: March 2025 payslip parsed bonus=£26,969 but never propagated it into gross_pay/income_tax. Receipt is more accurate than ytd_gross here. - 2023/24 −£36k: Feb 2024 payslip row appears to be missing from the table; ytd_gross has it, sum(gross_pay) doesn't. - 2022/23 −£10k: variant A→B transition residual. SQL simplified — band-aware CTE chain dropped (no longer needed for this panel since RSU is shown gross).	2026-04-26 10:24:06 +00:00
Viktor Barzin	222013806d	monitoring(uk-payslip): split salary into cash + pension on yearly receipt The salary field on the payslip is pre-pension-sacrifice, so the "Salary (gross)" stack already silently included the salary-sacrifice pension contribution. Split it out so pension is explicitly visible: - Salary (cash, post-sacrifice) = salary - pension_sacrifice - Pension (salary sacrifice, untaxed) = pension_sacrifice - Bonus - RSU vest (after band-aware tax) Bar total unchanged (just relabels what was already there). Pension is now visibly counted as income — consistent with "untaxed but real" framing. Caveat documented in panel description: receipt total ≠ P60 gross because P60 reports pre-RSU-tax gross. Receipt shows RSU net of tax per earlier intent. To exactly match P60, swap rsu_after_tax → rsu_vest gross.	2026-04-26 09:18:32 +00:00
Viktor Barzin	21ac619fac	monitoring(uk-payslip): promote yearly receipt + YTD gross YoY to row 4 Move both barchart/timeseries panels into row 4 (y=29, side-by-side w=12 each, h=10) so the per-tax-year overviews appear right after the income-tax-and-pension YTD row. Shift panels 13, 4, 5, 6, 8, 9 down by 10 to accommodate. Final ordering: rows 1–3 = monthly + YTD timeseries (panels 1/7/2/3/11/12), row 4 = yearly receipt + YTD gross YoY (16/17), then the wider deduction/integrity/table panels below.	2026-04-25 23:58:15 +00:00
Viktor Barzin	53f555dc61	monitoring(uk-payslip): drop 3 panels referencing undeployed data Removed: - Panel 10 "HMRC Tax Year Reconciliation — Individual Tax API" → references hmrc_sync.tax_year_snapshot schema. The hmrc-sync service / DB has not been deployed, so the panel always errored with "relation does not exist". - Panel 14 "Meta payroll: bank deposit vs payslip net pay" → references payslip_ingest.external_meta_deposits, which is created by alembic migration 0007. The deployed payslip-ingest image is at 0005, so the table doesn't exist. - Panel 15 "RSU vest reconciliation — payslip vs Schwab" → references payslip_ingest.rsu_vest_events, created by migration 0008. Same image-staleness story. Verified all 14 remaining panels return without error via Grafana /api/ds/query. SQL for the removed panels is preserved in git history; re-add when the data sources are actually deployed.	2026-04-25 23:56:03 +00:00
Viktor Barzin	b2a25775aa	monitoring(uk-payslip): simplify yearly receipt to earned-and-kept view Replace the 7-stack "where total comp went" decomposition with a 3-stack "what I actually earned" view: salary (gross), bonus (gross), and RSU vest after band-aware tax (PAYE+NI withheld via sell-to-cover). Skips income tax / NI / student loan / pension / RSU offset. Bar height = real income kept across all components. RSU is net of tax because it's withheld at source and never hits the bank account; salary and bonus are gross because they're paid in full and taxes are deducted elsewhere. This is the income-side view where tax is implicit, not the deduction waterfall. Per-year RSU after tax: 2020/21 £18k · 2021/22 £39k · 2022/23 £50k · 2023/24 £26k · 2024/25 £71k · 2025/26 £73k.	2026-04-25 23:42:20 +00:00
Viktor Barzin	a17304f735	monitoring(uk-payslip): fix empty YTD gross YoY chart Two bugs: 1. Synthetic dates projected onto 1970/71 fell outside the dashboard's default time range (now-10y → now), so Grafana filtered out every point. Switched to a sliding 12-month window (CURRENT_DATE - INTERVAL '12 months') as the projection base, plus a per-panel timeFrom: "13M" override so the panel always shows the last 13 months regardless of the dashboard's time picker. 2. ORDER BY tax_year, pay_date violated Grafana's long→wide conversion requirement (data must be ascending by time). Wrapped in a CTE and re-ordered by the synthetic time column. Pivoted result is now a single wide frame with 7 series (2019/20…2025/26).	2026-04-25 23:36:16 +00:00
Viktor Barzin	ac18c49a7b	monitoring(wealth): fix x-axis label formatting on yearly bars The default fieldConfig unit (percent on Yearly investment return %, currencyGBP on Annual change decomposition) was being applied to the "year" string column too — so x-axis labels rendered as "2024%" and "£2,024" respectively. Add field overrides on the "year" column to force unit=string. The earlier "tax_year" panels weren't affected because "2024/25" doesn't parse as a number; "2024" did.	2026-04-25 23:31:03 +00:00
Viktor Barzin	77bed10a51	monitoring: investment-only returns + YoY YTD gross line chart Wealth dashboard: - "Yearly growth %" → "Yearly investment return %": switched to modified-Dietz formula `market_gain / (nw_start + 0.5 × contributions)` so contributions don't inflate the return. New money in is excluded — this is portfolio performance, not net-worth change. - "Trailing 12-month growth %" → "Trailing 12-month investment return %": same formula, applied to the trailing 12mo window. Pre-fix vs post-fix: 2020: 155.0% → 5.12% (large contributions on small base) 2021: 344.7% → 26.45% 2022: 26.9% → -25.65% (the actual 2022 bear market) 2023: 123.2% → 41.60% 2024: 87.4% → 25.70% 2025: 46.8% → 8.43% 2026: 16.7% → 3.28% (YTD) UK Payslip dashboard: - Replaced the per-tax-year stacked bar with a year-over-year line chart: one line per tax year, X = month-of-tax-year (April→March, projected onto a 1970/71 fiscal calendar so years overlay), Y = cumulative YTD gross. Five+ lines visible at a glance for trend comparison.	2026-04-25 23:25:42 +00:00
Viktor Barzin	55d1da41f6	monitoring: more growth detail in Wealth + gross composition in UK Payslip Wealth (4 new panels at the bottom): - Trailing 12-month growth % (stat) — % change in net worth over last 12mo. - Yearly growth % (bar per calendar year) — first→last valuation each year. - Annual change decomposition (stacked bar) — splits each year's NW change into "net contributions" (new money in) and "market gain" (everything else: appreciation, dividends, FX). Answers "did I grow because I saved or because the market did the work?". - Per-account ROI % (horizontal bar) — (value − contribution) / contribution × 100, latest snapshot. Excludes accounts with zero/negative net contribution (Schwab — distorts ratio after RSU sells). UK Payslip (1 new panel below the yearly receipt): - Gross composition by tax year (stacked bar) — salary / bonus / RSU vest / other components per tax year. Bar height = gross pay. Trends in salary growth, bonus levels, and RSU vest sizing at a glance. All queries spot-checked via Grafana /api/ds/query.	2026-04-25 23:21:42 +00:00
Viktor Barzin	d48e222054	monitoring: lock Finance (Personal) folder to admin + fix cash classification Folder ACL: - Move uk-payslip + wealth dashboards to a new "Finance (Personal)" folder; job-hunter + fire-planner stay in "Finance" (open). - New null_resource calls Grafana's folder permissions API after the dashboard sidecar materialises the folder, setting an admin-only ACL ({Admin: 4}). Default Viewer/Editor inheritance is overridden, so anonymous-Viewer (auth.anonymous=true) is denied. Server-admin always retains access. - Verified: anonymous → 403 on uk-payslip + wealth, 200 on control dashboards (node-exporter); admin → 200 on all. Wealth cash fix: - Wealthfolio dumps WORKPLACE_PENSION wrappers entirely into cash_balance because it doesn't track underlying fund holdings. Reclassify pension cash as invested in the "Cash vs invested" panel so the cash series reflects actual uninvested broker cash (~£16k T212 ISA + Schwab) instead of phantom £154k. Pre-fix: cash=£153,789 / invested=£870,282 / total=£1,024,071 Post-fix: cash=£16,064 / invested=£1,008,008 / total=£1,024,071	2026-04-25 23:11:26 +00:00
Viktor Barzin	f0ce7b0363	fire-planner: add stack, Vault DB role, dashboard, DB New stacks/fire-planner/ mirrors payslip-ingest layout: - ExternalSecret pulling RECOMPUTE_BEARER_TOKEN from Vault secret/fire-planner - DB ExternalSecret templating DB_CONNECTION_STRING via static role pg-fire-planner - FastAPI Deployment (serve), CronJob (recompute-all monthly on 2nd at 09:00 UTC, scheduled after wealthfolio-sync's 1st at 08:00), ClusterIP Service - Grafana datasource ConfigMap "FirePlanner" — `database` inside jsonData (`cc56ba29` fix; otherwise Grafana 11.2+ hits "you do not have default database") Plus: - vault/main.tf: pg-fire-planner static role (7d rotation), allowed_roles - dbaas/modules/dbaas/main.tf: null_resource creates fire_planner DB+role - monitoring/dashboards/fire-planner.json: 9-panel Finance-folder dashboard (NW timeseries, MC fan chart, success heatmap, lifetime tax bars, years-to-ruin table, optimal leave-UK stat, ending wealth stat, UK success-by-strategy bars, sequence-risk correlation table) - monitoring/modules/monitoring/grafana.tf: register "fire-planner.json" in Finance folder Apply order: 1. vault stack — creates the static role 2. dbaas stack — creates the database & role 3. external-secrets stack picks up vault-database refs (no change needed) 4. fire-planner stack — first apply with -target=kubernetes_manifest.db_external_secret before full apply, per the plan-time-data-source pattern 5. monitoring stack — picks up the new dashboard ConfigMap [ci skip] Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-25 17:27:19 +00:00
Viktor Barzin	bf4c7618d8	wealth: SQLite→PG ETL sidecar + new Grafana dashboard Mirrors Wealthfolio's daily_account_valuation / accounts / activities from SQLite into a new PG database (wealthfolio_sync) every hour, so Grafana can chart net worth, contributions, and growth over time. Components: - dbaas: null_resource creates wealthfolio_sync DB + role on the CNPG cluster (dynamic primary lookup so it survives failover). - vault: pg-wealthfolio-sync static role rotates the password every 7d. - wealthfolio: ExternalSecret pulls the rotated password into the WF namespace; new pg-sync sidecar (alpine + sqlite + postgresql-client + busybox crond) does sqlite3 .backup → TSV dump → truncate-and-reload psql, hourly at :07. Plus a grafana-wealth-datasource ConfigMap in the monitoring namespace (uid: wealth-pg). - monitoring: new Wealth dashboard (wealth.json, 10 panels) — current net worth / contribution / growth / ROI% stats, then time-series for net worth, contribution-vs-market, growth area, per-account stacked area, cash-vs-invested, and a 100-row activity log. Initial sync: 6 accounts, 10,798 daily valuations, 518 activities. Verified PG totals match SQLite latest snapshot exactly.	2026-04-25 17:07:33 +00:00
Viktor Barzin	4f5f1ff8c2	monitoring(uk-payslip): add yearly receipt stacked barchart panel New panel 16 (barchart, h=11, y=179): one stacked bar per tax year showing total comp split into net pay (bank deposit), cash income tax, RSU tax (band-aware marginal: PAYE+NI), cash NI, student loan, pension salary-sacrifice, and RSU offset (Variant A only). X-axis = tax_year (categorical), y-axis = currencyGBP. Bar height ≈ gross_pay + pension_sacrifice (small over-attribution in Variant A years where the band-aware model exceeds recorded payslip PAYE).	2026-04-25 16:26:57 +00:00
Viktor Barzin	b3c29eda12	monitoring(uk-payslip): model UK income-tax bands + PA-taper for RSU marginal Replaces the flat 47% (45 PAYE + 2 NI) RSU marginal across panels 3, 7, 8, 11, and 12 with an exact piecewise band-aware computation. Each row computes ani_prior/ani_pre/ani_post over the tax-year YTD (chronological model — the RSU is taxed at the band its YTD ANI position occupies at the vest date, mirroring PAYE withholding behaviour). Bands (2024/25+, applied to all years): IT: 0% / 20% / 40% / 60% (PA-taper) / 45% at 12,570 / 50,270 / 100k / 125,140 NI: 0% / 8% / 2% at 12,570 / 50,270 PA-taper modelled as 60% effective IT marginal in £100k–£125,140 (40% on the £1 + 40% on the £0.50 of lost PA = 60%). Spot-checked per tax-year totals via psql; numbers diverge from the flat 47% baseline most for years where vests cross PA-taper or basic-rate bands (2020/21 ~35%, 2024/25 ~41%, 2025/26 ~43%).	2026-04-25 16:14:49 +00:00
Viktor Barzin	0d5f53f337	monitoring(uk-payslip): replace misleading take-home rates in Panel 3 Drop the two misleading series in "Effective rate & take-home % (YTD cumulative)" — both used SUM(gross_pay) as denominator while only counting cash deductions/net in the numerator, which understated take-home by 25-30 pp because RSU shares are absent from the cash deposit but present in gross. Replaced with three semantically clean angles: - ytd_paye_rate_pct: SUM(income_tax) / SUM(taxable_pay) — HMRC audit rate (~41-42% in additional-rate band), kept as before. - ytd_cash_take_home_pct: SUM(net_pay) / SUM(gross_pay - rsu_vest) — what fraction of cash earnings hits the bank (~62-65%). - ytd_total_keep_pct: (SUM(net_pay) + 0.53 × SUM(rsu_vest)) / SUM(gross_pay) — true "what I actually keep" including post-tax RSU shares (47% marginal applied to vest value), ~55-60%. Added field overrides for clear color-coding (red/green/blue). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-25 15:45:47 +00:00
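
The band-aware RSU marginal described in commit b3c29eda12 can be sketched outside SQL as a piecewise integral over the year-to-date position. This is a minimal illustration, not the dashboard's actual query: the band edges and rates are copied from that commit message, while the function names `tax_between` and `rsu_marginal` and the sample vest are hypothetical.

```python
# Band edges and marginal rates as listed in commit b3c29eda12 (2024/25+).
# Each tuple is (lower edge of band, marginal rate above that edge).
IT_BANDS = [(0, 0.00), (12_570, 0.20), (50_270, 0.40), (100_000, 0.60), (125_140, 0.45)]
NI_BANDS = [(0, 0.00), (12_570, 0.08), (50_270, 0.02)]


def tax_between(bands, lower, upper):
    """Tax due on the income slice [lower, upper) under piecewise marginal rates."""
    total = 0.0
    for i, (start, rate) in enumerate(bands):
        # The last band is open-ended.
        end = bands[i + 1][0] if i + 1 < len(bands) else float("inf")
        overlap = min(upper, end) - max(lower, start)
        if overlap > 0:
            total += overlap * rate
    return total


def rsu_marginal(ani_pre, vest):
    """Hypothetical helper: (income tax, NI) on a vest landing at YTD position ani_pre."""
    ani_post = ani_pre + vest
    return (
        tax_between(IT_BANDS, ani_pre, ani_post),
        tax_between(NI_BANDS, ani_pre, ani_post),
    )


if __name__ == "__main__":
    # Assumed example: a £10,000 vest arriving when YTD ANI is already £120,000.
    it, ni = rsu_marginal(ani_pre=120_000, vest=10_000)
    print(f"IT £{it:,.2f}, NI £{ni:,.2f}, effective {100 * (it + ni) / 10_000:.2f}%")
```

In the assumed example the vest straddles the PA-taper edge: £5,140 is charged at 60% and the remaining £4,860 at 45%, so the effective rate differs from the flat 47% that the commit replaced.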

166 commits
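
The difference between the `complete_dates` filter (commit 628f5a0d26) and the `max_complete` cutoff that replaced it (commit 1cb2bb30f7) can be illustrated in a few lines. This is a sketch under assumed data: the rows, account names, and function names below are hypothetical, and the real panels implement the logic in PostgreSQL rather than Python.

```python
from datetime import date

# Hypothetical rows: (valuation_date, account, total_value).
ROWS = [
    (date(2020, 6, 7), "A", 100.0),   # history from before account B existed
    (date(2024, 4, 10), "A", 500.0),
    (date(2024, 4, 10), "B", 50.0),
    (date(2024, 4, 11), "A", 510.0),  # partial day: B has not synced yet
]
ACCOUNTS = {"A", "B"}


def daily_values(rows):
    """Group rows into {day: {account: value}}."""
    by_day = {}
    for day, account, value in rows:
        by_day.setdefault(day, {})[account] = value
    return by_day


def complete_dates(rows, accounts):
    """Old filter: keep only days on which every current account has a row."""
    return {
        day: sum(values.values())
        for day, values in daily_values(rows).items()
        if set(values) >= accounts
    }


def up_to_max_complete(rows, accounts):
    """New filter: find the latest complete day, then keep every day up to it."""
    by_day = daily_values(rows)
    complete = [day for day, values in by_day.items() if set(values) >= accounts]
    if not complete:
        return {}
    cutoff = max(complete)
    return {day: sum(values.values()) for day, values in by_day.items() if day <= cutoff}


if __name__ == "__main__":
    print(sorted(complete_dates(ROWS, ACCOUNTS)))      # only 2024-04-10 survives
    print(sorted(up_to_max_complete(ROWS, ACCOUNTS)))  # 2020-06-07 and 2024-04-10
```

Both filters drop the partial 2024-04-11 row, but only the cutoff version keeps the 2020 day, reported as the smaller sum that was true at the time.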