infra/stacks/broker-sync/main.tf

802 lines
26 KiB
Terraform

Add broker-sync Terraform stack (#7) * [f1-stream] Remove committed cluster-admin kubeconfig ## Context A kubeconfig granting cluster-admin access was accidentally committed into the f1-stream stack's application bundle in c7c7047f (2026-02-22). It contained the cluster CA certificate plus the kubernetes-admin client certificate and its RSA private key. Both remotes (github.com, forgejo) are public, so the credential has been reachable for ~2 months. Grep across the repo confirms no .tf / .hcl / .sh / .yaml file references this path; the file is a stray local artifact, likely swept in during a bulk `git add`. ## This change - git rm stacks/f1-stream/files/.config ## What is NOT in this change - Cluster-admin cert rotation on the control plane. The leaked client cert must be invalidated separately via `kubeadm certs renew admin.conf` or CA regeneration. Tracked in the broader secrets-remediation plan. - Git-history rewrite. The file is still reachable in every commit since c7c7047f. A `git filter-repo --path ... --invert-paths` pass against a fresh mirror is planned and will be force-pushed to both remotes. ## Test plan ### Automated No tests needed for a file removal. Sanity: $ grep -rn 'f1-stream/files/\.config' --include='*.tf' --include='*.hcl' \ --include='*.yaml' --include='*.yml' --include='*.sh' (no output) ### Manual Verification 1. `git show HEAD --stat` shows exactly one path deleted: stacks/f1-stream/files/.config | 19 ------------------- 2. `test ! -e stacks/f1-stream/files/.config` returns true. 3. A copy of the leaked file is at /tmp/leaked.conf for post-rotation verification (confirming `kubectl --kubeconfig /tmp/leaked.conf get ns` fails with 401/403 once the admin cert is renewed). 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * [frigate] Remove orphan config.yaml with leaked RTSP passwords ## Context A Frigate configuration file was added to modules/kubernetes/frigate/ in bcad200a (2026-04-15, ~2 days ago) as part of a bulk `chore: add untracked stacks, scripts, and agent configs` commit. The file contains 14 inline rtsp://admin:<password>@<host>:554/... URLs, leaking two distinct RTSP passwords for the cameras at 192.168.1.10 (LAN-only) and valchedrym.ddns.net (confirmed reachable from public internet on port 554). Both remotes are public, so the creds have been exposed for ~2 days. Grep across the repo confirms nothing references this config.yaml — the active stacks/frigate/main.tf stack reads its configuration from a persistent volume claim named `frigate-config-encrypted`, not from this file. The file is therefore an orphan from the bulk add, with no production function. ## This change - git rm modules/kubernetes/frigate/config.yaml ## What is NOT in this change - Camera password rotation. The user does not own the cameras; rotation must be coordinated out-of-band with the camera operators. The DDNS camera (valchedrym.ddns.net:554) is internet-reachable, so the leaked password is high-priority to rotate from the device side. - Git-history rewrite. The file plus its leaked strings remain in all commits from bcad200a forward. Scheduled to be purged via `git filter-repo --path modules/kubernetes/frigate/config.yaml --invert-paths --replace-text <list>` in the broader remediation pass. - Future Frigate config provisioning. If the stack is re-platformed to source config from Git rather than the PVC, the replacement should go through ExternalSecret + env-var interpolation, not an inline YAML. ## Test plan ### Automated $ grep -rn 'frigate/config\.yaml' --include='*.tf' --include='*.hcl' \ --include='*.yaml' --include='*.yml' --include='*.sh' (no output — confirms orphan status) ### Manual Verification 1. 
`git show HEAD --stat` shows exactly one deletion: modules/kubernetes/frigate/config.yaml | 229 --------------------------------- 2. `test ! -e modules/kubernetes/frigate/config.yaml` returns true. 3. `kubectl -n frigate get pvc frigate-config-encrypted` still shows the PVC bound (unaffected by this change). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * [setup-tls-secret] Delete deprecated renew.sh with hardcoded Technitium token ## Context modules/kubernetes/setup_tls_secret/renew.sh is a 2.5-year-old expect(1) script for manual Let's Encrypt wildcard-cert renewal via Technitium DNS TXT-record challenges. It hardcodes a 64-char Technitium API token on line 7 (as an expect variable) and line 27 (inside a certbot-cleanup heredoc). Both remotes are public, so the token has been exposed for ~2.5 years. The script is not invoked by the module's Terraform (main.tf only creates a kubernetes.io/tls Secret from PEM files); it is a standalone run-it-yourself tool. grep across the repo confirms nothing references `renew.sh` — neither the 20+ stacks that consume the `setup_tls_secret` module, nor any CI pipeline, nor any shell wrapper. A replacement script `renew2.sh` (4 weeks old) lives alongside it. It sources the Technitium token from `$TECHNITIUM_API_KEY` env var and also supports Cloudflare DNS-01 challenges via `$CLOUDFLARE_TOKEN`. It is the current renewal path. ## This change - git rm modules/kubernetes/setup_tls_secret/renew.sh ## What is NOT in this change - Technitium token rotation. The leaked token still works against `technitium-web.technitium.svc.cluster.local:5380` until revoked in the Technitium admin UI. Rotation is a prerequisite for the upcoming git-history scrub, which will remove the token from every commit via `git filter-repo --replace-text`. - renew2.sh is retained as-is (already env-var-sourced; clean). - The setup_tls_secret module's main.tf is not touched; 20+ consuming stacks keep working. 
## Test plan ### Automated $ grep -rn 'renew\.sh' --include='*.tf' --include='*.hcl' \ --include='*.yaml' --include='*.yml' --include='*.sh' (no output — confirms no consumer) $ git grep -n 'e28818f309a9ce7f72f0fcc867a365cf5d57b214751b75e2ef3ea74943ef23be' (no output in HEAD after this commit) ### Manual Verification 1. `git show HEAD --stat` shows exactly one deletion: modules/kubernetes/setup_tls_secret/renew.sh | 136 --------- 2. `test ! -e modules/kubernetes/setup_tls_secret/renew.sh` returns true. 3. `renew2.sh` still exists and is executable: ls -la modules/kubernetes/setup_tls_secret/renew2.sh 4. Next cert-renewal run uses renew2.sh with env-var-sourced token; no behavioral regression because renew.sh was never part of the automated flow. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * [monitoring] Delete orphan server-power-cycle/main.sh with iDRAC default creds ## Context stacks/monitoring/modules/monitoring/server-power-cycle/main.sh is an old shell implementation of a power-cycle watchdog that polled the Dell iDRAC on 192.168.1.4 for PSU voltage. It hardcoded the Dell iDRAC default credentials (root:calvin) in 5 `curl -u root:calvin` calls. Both remotes are public, so those credentials — and the implicit statement that 'this host has not rotated the default BMC password' — have been exposed. The current implementation is main.py in the same directory. It reads iDRAC credentials from the environment variables `idrac_user` and `idrac_password` (see module's iDRAC_USER_ENV_VAR / iDRAC_PASSWORD_ENV_VAR constants), which are populated from Vault via ExternalSecret at runtime. main.sh is not referenced by any Terraform, ConfigMap, or deploy script — grep confirms no `file()` / `templatefile()` / `filebase64()` call loads it, and no hand-rolled shell wrapper invokes it. ## This change - git rm stacks/monitoring/modules/monitoring/server-power-cycle/main.sh main.py is retained unchanged. 
## What is NOT in this change - iDRAC password rotation on 192.168.1.4. The BMC should be moved off the vendor default `calvin` regardless; rotation is tracked in the broader remediation plan and in the iDRAC web UI. - A separate finding in stacks/monitoring/modules/monitoring/idrac.tf (the redfish-exporter ConfigMap has `default: username: root, password: calvin` as a fallback for iDRAC hosts not explicitly listed) is NOT addressed here — filed as its own task so the fix (drop the default block vs. source from env) can be considered in isolation. - Git-history scrub of main.sh is pending the broader filter-repo pass. ## Test plan ### Automated $ grep -rn 'server-power-cycle/main\.sh\|main\.sh' \ --include='*.tf' --include='*.hcl' --include='*.yaml' \ --include='*.yml' --include='*.sh' (no consumer references) ### Manual Verification 1. `git show HEAD --stat` shows only the one deletion. 2. `test ! -e stacks/monitoring/modules/monitoring/server-power-cycle/main.sh` 3. `kubectl -n monitoring get deploy idrac-redfish-exporter` still shows the exporter running — unrelated to this file. 4. main.py continues to run its watchdog loop without regression, because it was never coupled to main.sh. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * [tls] Move 3 outlier stacks from per-stack PEMs to root-wildcard symlink ## Context foolery, terminal, and claude-memory each had their own `stacks/<x>/secrets/` directory with a plaintext EC-256 private key (privkey.pem, 241 B) and matching TLS certificate (fullchain.pem, 2868 B) for *.viktorbarzin.me. The 92 other stacks under stacks/ symlink `secrets/` → `../../secrets`, which resolves to the repo-root /secrets/ directory covered by the `secrets/** filter=git-crypt` .gitattributes rule — i.e., every other stack consumes the same git-crypt-encrypted root wildcard cert. 
The 3 outliers shipped their keys in plaintext because `.gitattributes` secrets/** rule matches only repo-root /secrets/, not stacks/*/secrets/. Both remotes are public, so the 6 plaintext PEM files have been exposed for 1–6 weeks (commits 5a988133 2026-03-11, a6f71fc6 2026-03-18, 9820f2ce 2026-04-10). Verified: - Root wildcard cert subject = CN viktorbarzin.me, SAN *.viktorbarzin.me + viktorbarzin.me — covers the 3 subdomains. - Root privkey + fullchain are a valid key pair (pubkey SHA256 match). - All 3 outlier certs have the same subject/SAN as root; different distinct cert material but equivalent coverage. ## This change - Delete plaintext PEMs in all 3 outlier stacks (6 files total). - Replace each stacks/<x>/secrets directory with a symlink to ../../secrets, matching the fleet pattern. - Add `stacks/**/secrets/** filter=git-crypt diff=git-crypt` to .gitattributes as a regression guard — any future real file placed under stacks/<x>/secrets/ gets git-crypt-encrypted automatically. setup_tls_secret module (modules/kubernetes/setup_tls_secret/main.tf) is unchanged. It still reads `file("${path.root}/secrets/fullchain.pem")`, which via the symlink resolves to the root wildcard. ## What is NOT in this change - Revocation of the 3 leaked per-stack certs. Backed up the leaked PEMs to /tmp/leaked-certs/ for `certbot revoke --reason keycompromise` once the user's LE account is authenticated. Revocation must happen before or alongside the history-rewrite force-push to both remotes. - Git-history scrub. The leaked PEM blobs are still reachable in every commit from 2026-03-11 forward. Scheduled for removal via `git filter-repo --path stacks/<x>/secrets/privkey.pem --invert-paths` (and fullchain.pem for each stack) in the broader remediation pass. - cert-manager introduction. The fleet does not use cert-manager today; this commit matches the existing symlink-to-wildcard pattern rather than introducing a new component. 
## Test plan ### Automated $ readlink stacks/foolery/secrets ../../secrets (likewise for terminal, claude-memory) $ for s in foolery terminal claude-memory; do openssl x509 -in stacks/$s/secrets/fullchain.pem -noout -subject done subject=CN = viktorbarzin.me (x3 — all resolve via symlink to root wildcard) $ git check-attr filter -- stacks/foolery/secrets/fullchain.pem stacks/foolery/secrets/fullchain.pem: filter: git-crypt (now matched by the new rule, though for the symlink target the repo-root rule already applied) ### Manual Verification 1. `terragrunt plan` in stacks/foolery, stacks/terminal, stacks/claude-memory shows only the K8s TLS secret being re-created with the root-wildcard material. No ingress changes. 2. `terragrunt apply` for each stack → `kubectl -n <ns> get secret <name>-tls -o yaml` → tls.crt decodes to CN viktorbarzin.me with the root serial (different from the pre-change per-stack serials). 3. `curl -v https://foolery.viktorbarzin.me/` (and likewise terminal, claude-memory) → cert chain presents the new serial, handshake OK. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Add broker-sync Terraform stack (pending apply) Context ------- Part of the broker-sync rollout — see the plan at ~/.claude/plans/let-s-work-on-linking-temporal-valiant.md and the companion repo at ViktorBarzin/broker-sync. This change ----------- New stack `stacks/broker-sync/`: - `broker-sync` namespace, aux tier. - ExternalSecret pulling `secret/broker-sync` via vault-kv ClusterSecretStore. - `broker-sync-data-encrypted` PVC (1Gi, proxmox-lvm-encrypted, auto-resizer) — holds the sync SQLite db, FX cache, Wealthfolio cookie, CSV archive, watermarks. - Five CronJobs (all under `viktorbarzin/broker-sync:<tag>`, public DockerHub image; no pull secret): * `broker-sync-version` — daily 01:00 liveness probe (`broker-sync version`), used to smoke-test each new image. * `broker-sync-trading212` — daily 02:00 `broker-sync trading212 --mode steady`. 
* `broker-sync-imap` — daily 02:30, SUSPENDED (Phase 2). * `broker-sync-csv` — daily 03:00, SUSPENDED (Phase 3). * `broker-sync-fx-reconcile` — 7th of month 05:05, SUSPENDED (Phase 1 tail). - `broker-sync-backup` — daily 04:15, snapshots /data into NFS `/srv/nfs/broker-sync-backup/` with 30-day retention, matches the convention in infra/.claude/CLAUDE.md §3-2-1. NOT in this commit: - Old `wealthfolio-sync` CronJob retirement in stacks/wealthfolio/main.tf — happens in the same commit that first applies this stack, per the plan's "clean cutover" decision. - Vault seed. `secret/broker-sync` must be populated before apply; required keys documented in the ExternalSecret comment block. Test plan --------- ## Automated - `terraform fmt` — clean (ran before commit). - `terraform validate` needs `terragrunt init` first; deferred to apply time. ## Manual Verification 1. Seed Vault `secret/broker-sync/*` (see comment block on the ExternalSecret in main.tf). 2. `cd stacks/broker-sync && scripts/tg apply`. 3. `kubectl -n broker-sync get cronjob` — expect 6 CJs, 3 suspended. 4. `kubectl -n broker-sync create job smoke --from=cronjob/broker-sync-version`. 5. `kubectl -n broker-sync logs -l job-name=smoke` — expect `broker-sync 0.1.0`. * fix(beads-server): disable Authentik + CrowdSec on Workbench Authentik forward-auth returns 400 for dolt-workbench (no Authentik application configured for this domain). CrowdSec bouncer also intermittently returns 400. Both disabled — Workbench is accessible via Cloudflare tunnel only. TODO: Create Authentik application for dolt-workbench.viktorbarzin.me Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 21:17:45 +01:00
variable "nfs_server" { type = string }
variable "image_tag" {
  type        = string
  default     = "latest"
  description = "broker-sync image tag. Use 8-char git SHA in CI; :latest only for local trials."
}
resource "kubernetes_namespace" "broker_sync" {
  metadata {
    name = "broker-sync"
    labels = {
      "istio-injection" = "disabled"
      tier              = local.tiers.aux
    }
  }
[infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip] ## Context Wave 3B-continued: the Goldilocks VPA dashboard (stacks/vpa) runs a Kyverno ClusterPolicy `goldilocks-vpa-auto-mode` that mutates every namespace with `metadata.labels["goldilocks.fairwinds.com/vpa-update-mode"] = "off"`. This is intentional — Terraform owns container resource limits, and Goldilocks should only provide recommendations, never auto-update. The label is how Goldilocks decides per-namespace whether to run its VPA in `off` mode. Effect on Terraform: every `kubernetes_namespace` resource shows the label as pending-removal (`-> null`) on every `scripts/tg plan`. Dawarich survey 2026-04-18 confirmed the drift. Cluster-side count: 88 namespaces carry the label (`kubectl get ns -o json | jq ... | wc -l`). Every TF-managed namespace is affected. This commit brings the intentional admission drift under the same `# KYVERNO_LIFECYCLE_V1` discoverability marker introduced in c9d221d5 for the ndots dns_config pattern. The marker now stands generically for any Kyverno admission-webhook drift suppression; the inline comment records which specific policy stamps which specific field so future grep audits show why each suppression exists. ## This change 107 `.tf` files touched — every stack's `resource "kubernetes_namespace"` resource gets: ```hcl lifecycle { # KYVERNO_LIFECYCLE_V1: goldilocks-vpa-auto-mode ClusterPolicy stamps this label on every namespace ignore_changes = [metadata[0].labels["goldilocks.fairwinds.com/vpa-update-mode"]] } ``` Injection was done with a brace-depth-tracking Python pass (`/tmp/add_goldilocks_ignore.py`): match `^resource "kubernetes_namespace" ` → track `{` / `}` until the outermost closing brace → insert the lifecycle block before the closing brace. The script is idempotent (skips any file that already mentions `goldilocks.fairwinds.com/vpa-update-mode`) so re-running is safe. 
Vault stack picked up 2 namespaces in the same file (k8s-users produces one, plus a second explicit ns) — confirmed via file diff (+8 lines). ## What is NOT in this change - `stacks/trading-bot/main.tf` — entire file is `/* … */` commented out (paused 2026-04-06 per user decision). Reverted after the script ran. - `stacks/_template/main.tf.example` — per-stack skeleton, intentionally minimal. User keeps it that way. Not touched by the script (file has no real `resource "kubernetes_namespace"` — only a placeholder comment). - `.terraform/` copies (e.g. `stacks/metallb/.terraform/modules/...`) — gitignored, won't commit; the live path was edited. - `terraform fmt` cleanup of adjacent pre-existing alignment issues in authentik, freedify, hermes-agent, nvidia, vault, meshcentral. Reverted to keep the commit scoped to the Goldilocks sweep. Those files will need a separate fmt-only commit or will be cleaned up on next real apply to that stack. ## Verification Dawarich (one of the hundred-plus touched stacks) showed the pattern before and after: ``` $ cd stacks/dawarich && ../../scripts/tg plan Before: Plan: 0 to add, 2 to change, 0 to destroy. # kubernetes_namespace.dawarich will be updated in-place (goldilocks.fairwinds.com/vpa-update-mode -> null) # module.tls_secret.kubernetes_secret.tls_secret will be updated in-place (Kyverno generate.* labels — fixed in 8d94688d) After: No changes. Your infrastructure matches the configuration. ``` Injection count check: ``` $ rg -c 'KYVERNO_LIFECYCLE_V1: goldilocks-vpa-auto-mode' stacks/ | awk -F: '{s+=$2} END {print s}' 108 ``` ## Reproduce locally 1. `git pull` 2. Pick any stack: `cd stacks/<name> && ../../scripts/tg plan` 3. Expect: no drift on the namespace's goldilocks.fairwinds.com/vpa-update-mode label. Closes: code-dwx Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 21:15:27 +00:00
  lifecycle {
    # KYVERNO_LIFECYCLE_V1: goldilocks-vpa-auto-mode ClusterPolicy stamps this label on every namespace
    ignore_changes = [metadata[0].labels["goldilocks.fairwinds.com/vpa-update-mode"]]
  }
}
# Secrets for all providers. Seeded in Vault at `secret/broker-sync`:
# wf_base_url — e.g. https://wealthfolio.viktorbarzin.me
# wf_username — Wealthfolio login username
# wf_password — Wealthfolio login password (cleartext; server stores Argon2id)
# trading212_api_keys — JSON array of {account_id, account_type, api_key, name, currency}
# imap_host, imap_user, imap_password, imap_directory — for InvestEngine + Schwab email ingest
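# Illustrative shape for trading212_api_keys. The field names come from the
# list above; every value is a placeholder assumption, not real account data:
#   [{"account_id": "<id>", "account_type": "<type>", "api_key": "<key>",
#     "name": "<label>", "currency": "<currency>"}]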
resource "kubernetes_manifest" "external_secret" {
  manifest = {
    apiVersion = "external-secrets.io/v1beta1"
    kind       = "ExternalSecret"
    metadata = {
      name      = "broker-sync-secrets"
      namespace = kubernetes_namespace.broker_sync.metadata[0].name
    }
    spec = {
      refreshInterval = "15m"
      secretStoreRef = {
        name = "vault-kv"
        kind = "ClusterSecretStore"
      }
      target = {
        name = "broker-sync-secrets"
      }
      dataFrom = [{
        extract = {
          key = "broker-sync"
        }
      }]
    }
  }
  depends_on = [kubernetes_namespace.broker_sync]
}
# Canonical data dir — SQLite watermarks, FX cache, CSV drop/archive, Wealthfolio session cache.
# Encrypted because we're storing brokerage tokens, session cookies, and transaction history.
resource "kubernetes_persistent_volume_claim" "data_encrypted" {
  wait_until_bound = false
  metadata {
    name      = "broker-sync-data-encrypted"
    namespace = kubernetes_namespace.broker_sync.metadata[0].name
    annotations = {
      "resize.topolvm.io/threshold"     = "10%"
      "resize.topolvm.io/increase"      = "100%"
      "resize.topolvm.io/storage_limit" = "5Gi"
    }
  }
  spec {
    access_modes       = ["ReadWriteOnce"]
    storage_class_name = "proxmox-lvm-encrypted"
    resources {
      requests = { storage = "1Gi" }
    }
  }
  lifecycle {
    # The autoresizer expands requests.storage up to storage_limit and
    # PVCs can't shrink. Without this, every TF apply tries to revert
    # to the spec value, K8s rejects the shrink, and the PVC ends up
    # in Terminating-but-in-use limbo.
    ignore_changes = [spec[0].resources[0].requests]
  }
}
locals {
broker_sync_image = "viktorbarzin/broker-sync:${var.image_tag}"
# Shared env block for every CronJob: auth into Wealthfolio + data path.
common_env = [
{ name = "BROKER_SYNC_DATA_DIR", value = "/data", from = null },
{ name = "WF_SESSION_PATH", value = "/data/wealthfolio_session.json", from = null },
{ name = "WF_BASE_URL", value = null, from = "wf_base_url" },
{ name = "WF_USERNAME", value = null, from = "wf_username" },
{ name = "WF_PASSWORD", value = null, from = "wf_password" },
]
}
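# Sketch only (an assumption, not taken from this file): one way a container
# block could consume local.common_env is with two dynamic "env" blocks,
# treating `from` as a key in the Secret synced by the ExternalSecret.
# The Secret name "broker-sync-secrets" is hypothetical.
#
#   dynamic "env" {
#     for_each = [for e in local.common_env : e if e.from == null]
#     content {
#       name  = env.value.name
#       value = env.value.value
#     }
#   }
#   dynamic "env" {
#     for_each = [for e in local.common_env : e if e.from != null]
#     content {
#       name = env.value.name
#       value_from {
#         secret_key_ref {
#           name = "broker-sync-secrets" # hypothetical Secret name
#           key  = env.value.from
#         }
#       }
#     }
#   }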
# Phase 0 liveness: proves the image + namespace + PVC + ESO wiring end-to-end.
# Runs daily at 01:00 and is not suspended. To trigger it on demand:
#   kubectl -n broker-sync create job smoke --from=cronjob/broker-sync-version
resource "kubernetes_cron_job_v1" "version_probe" {
metadata {
name = "broker-sync-version"
namespace = kubernetes_namespace.broker_sync.metadata[0].name
labels = { app = "broker-sync", component = "version-probe" }
}
spec {
schedule = "0 1 * * *"
concurrency_policy = "Forbid"
successful_jobs_history_limit = 1
failed_jobs_history_limit = 3
job_template {
metadata {}
spec {
backoff_limit = 1
ttl_seconds_after_finished = 86400
template {
metadata {
labels = { app = "broker-sync", component = "version-probe" }
}
spec {
restart_policy = "OnFailure"
container {
name = "broker-sync"
image = local.broker_sync_image
command = ["broker-sync", "version"]
resources {
requests = { cpu = "10m", memory = "32Mi" }
limits = { memory = "128Mi" }
}
}
}
}
}
}
}
[infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] ## Context Wave 3A (commit c9d221d5) added the `# KYVERNO_LIFECYCLE_V1` marker to the 27 pre-existing `ignore_changes = [...dns_config]` sites so they could be grepped and audited. It did NOT address pod-owning resources that were simply missing the suppression entirely. Post-Wave-3A sampling (2026-04-18) found that navidrome, f1-stream, frigate, servarr, monitoring, crowdsec, and many other stacks showed perpetual `dns_config` drift every plan because their `kubernetes_deployment` / `kubernetes_stateful_set` / `kubernetes_cron_job_v1` resources had no `lifecycle {}` block at all. Root cause (same as Wave 3A): Kyverno's admission webhook stamps `dns_config { option { name = "ndots"; value = "2" } }` on every pod's `spec.template.spec.dns_config` to prevent NxDomain search-domain flooding (see `k8s-ndots-search-domain-nxdomain-flood` skill). Without `ignore_changes` on every Terraform-managed pod-owner, Terraform repeatedly tries to strip the injected field. ## This change Extends the Wave 3A convention by sweeping EVERY `kubernetes_deployment`, `kubernetes_stateful_set`, `kubernetes_daemon_set`, `kubernetes_cron_job_v1`, `kubernetes_job_v1` (+ their `_v1` variants) in the repo and ensuring each carries the right `ignore_changes` path: - **kubernetes_deployment / stateful_set / daemon_set / job_v1**: `spec[0].template[0].spec[0].dns_config` - **kubernetes_cron_job_v1**: `spec[0].job_template[0].spec[0].template[0].spec[0].dns_config` (extra `job_template[0]` nesting — the CronJob's PodTemplateSpec is one level deeper) Each injection / extension is tagged `# KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2` inline so the suppression is discoverable via `rg 'KYVERNO_LIFECYCLE_V1' stacks/`. Two insertion paths are handled by a Python pass (`/tmp/add_dns_config_ignore.py`): 1. 
**No existing `lifecycle {}`**: inject a brand-new block just before the resource's closing `}`. 108 new blocks on 93 files. 2. **Existing `lifecycle {}` (usually for `DRIFT_WORKAROUND: CI owns image tag` from Wave 4, commit a62b43d1)**: extend its `ignore_changes` list with the dns_config path. Handles both inline (`= [x]`) and multiline (`= [\n x,\n]`) forms; ensures the last pre-existing list item carries a trailing comma so the extended list is valid HCL. 34 extensions. The script skips anything already mentioning `dns_config` inside an `ignore_changes`, so re-running is a no-op. ## Scale - 142 total lifecycle injections/extensions - 93 `.tf` files touched - 108 brand-new `lifecycle {}` blocks + 34 extensions of existing ones - Every Tier 0 and Tier 1 stack with a pod-owning resource is covered - Together with Wave 3A's 27 pre-existing markers → **169 greppable `KYVERNO_LIFECYCLE_V1` dns_config sites across the repo** ## What is NOT in this change - `stacks/trading-bot/main.tf` — entirely commented-out block (`/* … */`). Python script touched the file, reverted manually. - `_template/main.tf.example` skeleton — kept minimal on purpose; any future stack created from it should either inherit the Wave 3A one-line form or add its own on first `kubernetes_deployment`. - `terraform fmt` fixes to pre-existing alignment issues in meshcentral, nvidia/modules/nvidia, vault — unrelated to this commit. Left for a separate fmt-only pass. - Non-pod resources (`kubernetes_service`, `kubernetes_secret`, `kubernetes_manifest`, etc.) — they don't own pods so they don't get Kyverno dns_config mutation. ## Verification Random sample post-commit: ``` $ cd stacks/navidrome && ../../scripts/tg plan → No changes. $ cd stacks/f1-stream && ../../scripts/tg plan → No changes. $ cd stacks/frigate && ../../scripts/tg plan → No changes. $ rg -c 'KYVERNO_LIFECYCLE_V1' stacks/ --include='*.tf' --include='*.tf.example' \ | awk -F: '{s+=$2} END {print s}' 169 ``` ## Reproduce locally 1. 
`git pull` 2. `rg 'KYVERNO_LIFECYCLE_V1' stacks/ | wc -l` → 169+ 3. `cd stacks/navidrome && ../../scripts/tg plan` → expect 0 drift on the deployment's dns_config field. Refs: code-seq (Wave 3B dns_config class closed; kubernetes_manifest annotation class handled separately in 8d94688d for tls_secret) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 21:19:48 +00:00
lifecycle {
# KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config]
}
* `broker-sync-imap` — daily 02:30, SUSPENDED (Phase 2). * `broker-sync-csv` — daily 03:00, SUSPENDED (Phase 3). * `broker-sync-fx-reconcile` — 7th of month 05:05, SUSPENDED (Phase 1 tail). - `broker-sync-backup` — daily 04:15, snapshots /data into NFS `/srv/nfs/broker-sync-backup/` with 30-day retention, matches the convention in infra/.claude/CLAUDE.md §3-2-1. NOT in this commit: - Old `wealthfolio-sync` CronJob retirement in stacks/wealthfolio/main.tf — happens in the same commit that first applies this stack, per the plan's "clean cutover" decision. - Vault seed. `secret/broker-sync` must be populated before apply; required keys documented in the ExternalSecret comment block. Test plan --------- ## Automated - `terraform fmt` — clean (ran before commit). - `terraform validate` needs `terragrunt init` first; deferred to apply time. ## Manual Verification 1. Seed Vault `secret/broker-sync/*` (see comment block on the ExternalSecret in main.tf). 2. `cd stacks/broker-sync && scripts/tg apply`. 3. `kubectl -n broker-sync get cronjob` — expect 6 CJs, 3 suspended. 4. `kubectl -n broker-sync create job smoke --from=cronjob/broker-sync-version`. 5. `kubectl -n broker-sync logs -l job-name=smoke` — expect `broker-sync 0.1.0`. * fix(beads-server): disable Authentik + CrowdSec on Workbench Authentik forward-auth returns 400 for dolt-workbench (no Authentik application configured for this domain). CrowdSec bouncer also intermittently returns 400. Both disabled — Workbench is accessible via Cloudflare tunnel only. TODO: Create Authentik application for dolt-workbench.viktorbarzin.me Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 21:17:45 +01:00
}
# Trading212 steady-state daily sync. Phase 1 deliverable.
resource "kubernetes_cron_job_v1" "trading212" {
  metadata {
    name      = "broker-sync-trading212"
    namespace = kubernetes_namespace.broker_sync.metadata[0].name
    labels    = { app = "broker-sync", component = "trading212" }
  }
  spec {
    schedule                      = "0 2 * * *" # 02:00 UK
    concurrency_policy            = "Forbid"
    starting_deadline_seconds     = 300
    successful_jobs_history_limit = 3
    failed_jobs_history_limit     = 5
    job_template {
      metadata {}
      spec {
        backoff_limit              = 2
        ttl_seconds_after_finished = 86400
        template {
          metadata {
            labels = { app = "broker-sync", component = "trading212" }
          }
          spec {
            restart_policy = "OnFailure"
            container {
              name    = "broker-sync"
              image   = local.broker_sync_image
              command = ["broker-sync", "trading212", "--mode", "steady"]
              env {
                name  = "BROKER_SYNC_DATA_DIR"
                value = "/data"
              }
              env {
                name  = "WF_SESSION_PATH"
                value = "/data/wealthfolio_session.json"
              }
              env {
                name = "WF_BASE_URL"
                value_from {
                  secret_key_ref {
                    name = "broker-sync-secrets"
                    key  = "wf_base_url"
                  }
                }
              }
              env {
                name = "WF_USERNAME"
                value_from {
                  secret_key_ref {
                    name = "broker-sync-secrets"
                    key  = "wf_username"
                  }
                }
              }
              env {
                name = "WF_PASSWORD"
                value_from {
                  secret_key_ref {
                    name = "broker-sync-secrets"
                    key  = "wf_password"
                  }
                }
              }
              env {
                name = "T212_API_KEYS_JSON"
                value_from {
                  secret_key_ref {
                    name = "broker-sync-secrets"
                    key  = "trading212_api_keys"
                  }
                }
              }
              volume_mount {
                name       = "data"
                mount_path = "/data"
              }
              resources {
                requests = { cpu = "20m", memory = "128Mi" }
                limits   = { memory = "256Mi" }
              }
            }
            volume {
              name = "data"
              persistent_volume_claim {
                claim_name = kubernetes_persistent_volume_claim.data_encrypted.metadata[0].name
              }
            }
          }
        }
      }
    }
  }
  lifecycle {
    # KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
    ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config]
  }
}
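
# Sketch only, not applied configuration. The four secret-backed env blocks in
# the trading212 container all share one value_from/secret_key_ref shape, so a
# dynamic block is one possible way to collapse them. The local name
# `trading212_secret_env` is hypothetical; the Secret name and keys are the
# ones already used in the resource above.
#
# locals {
#   # env var name => key inside the broker-sync-secrets Secret
#   trading212_secret_env = {
#     WF_BASE_URL        = "wf_base_url"
#     WF_USERNAME        = "wf_username"
#     WF_PASSWORD        = "wf_password"
#     T212_API_KEYS_JSON = "trading212_api_keys"
#   }
# }
#
# # Inside the container block, in place of the four literal env blocks:
# dynamic "env" {
#   for_each = local.trading212_secret_env
#   content {
#     name = env.key
#     value_from {
#       secret_key_ref {
#         name = "broker-sync-secrets"
#         key  = env.value
#       }
#     }
#   }
# }
#
# Note: for_each over a map iterates in key order, so the env entries would be
# emitted in a different order than the literal blocks and the first plan after
# such a change would likely show a one-time diff.
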
# IMAP ingest — InvestEngine + Schwab email parsers, one combined pod.
# Phase 2 deliverable. Defined ahead of implementation so the rollout is
# one `tf apply` once the image supports the CLI subcommand.
resource "kubernetes_cron_job_v1" "imap" {
  metadata {
    name      = "broker-sync-imap"
    namespace = kubernetes_namespace.broker_sync.metadata[0].name
    labels    = { app = "broker-sync", component = "imap" }
  }
  spec {
    schedule                      = "30 2 * * *" # 02:30 UK, 30min after T212
    concurrency_policy            = "Forbid"
    successful_jobs_history_limit = 3
    failed_jobs_history_limit     = 5
    # Unsuspended 2026-04-19 for RSU vest ground-truth ingestion — the parser
    # now detects Schwab Release Confirmations and scaffolds VestEvents; the
    # postgres sink that persists them into payslip_ingest.rsu_vest_events is
    # pending a real-email fixture and cross-service DB grant (see
    # follow-up beads task filed under the RSU tax spike fix epic).
    suspend = false
The 3 outliers shipped their keys in plaintext because `.gitattributes` secrets/** rule matches only repo-root /secrets/, not stacks/*/secrets/. Both remotes are public, so the 6 plaintext PEM files have been exposed for 1–6 weeks (commits 5a988133 2026-03-11, a6f71fc6 2026-03-18, 9820f2ce 2026-04-10). Verified: - Root wildcard cert subject = CN viktorbarzin.me, SAN *.viktorbarzin.me + viktorbarzin.me — covers the 3 subdomains. - Root privkey + fullchain are a valid key pair (pubkey SHA256 match). - All 3 outlier certs have the same subject/SAN as root; different distinct cert material but equivalent coverage. ## This change - Delete plaintext PEMs in all 3 outlier stacks (6 files total). - Replace each stacks/<x>/secrets directory with a symlink to ../../secrets, matching the fleet pattern. - Add `stacks/**/secrets/** filter=git-crypt diff=git-crypt` to .gitattributes as a regression guard — any future real file placed under stacks/<x>/secrets/ gets git-crypt-encrypted automatically. setup_tls_secret module (modules/kubernetes/setup_tls_secret/main.tf) is unchanged. It still reads `file("${path.root}/secrets/fullchain.pem")`, which via the symlink resolves to the root wildcard. ## What is NOT in this change - Revocation of the 3 leaked per-stack certs. Backed up the leaked PEMs to /tmp/leaked-certs/ for `certbot revoke --reason keycompromise` once the user's LE account is authenticated. Revocation must happen before or alongside the history-rewrite force-push to both remotes. - Git-history scrub. The leaked PEM blobs are still reachable in every commit from 2026-03-11 forward. Scheduled for removal via `git filter-repo --path stacks/<x>/secrets/privkey.pem --invert-paths` (and fullchain.pem for each stack) in the broader remediation pass. - cert-manager introduction. The fleet does not use cert-manager today; this commit matches the existing symlink-to-wildcard pattern rather than introducing a new component. 
## Test plan ### Automated $ readlink stacks/foolery/secrets ../../secrets (likewise for terminal, claude-memory) $ for s in foolery terminal claude-memory; do openssl x509 -in stacks/$s/secrets/fullchain.pem -noout -subject done subject=CN = viktorbarzin.me (x3 — all resolve via symlink to root wildcard) $ git check-attr filter -- stacks/foolery/secrets/fullchain.pem stacks/foolery/secrets/fullchain.pem: filter: git-crypt (now matched by the new rule, though for the symlink target the repo-root rule already applied) ### Manual Verification 1. `terragrunt plan` in stacks/foolery, stacks/terminal, stacks/claude-memory shows only the K8s TLS secret being re-created with the root-wildcard material. No ingress changes. 2. `terragrunt apply` for each stack → `kubectl -n <ns> get secret <name>-tls -o yaml` → tls.crt decodes to CN viktorbarzin.me with the root serial (different from the pre-change per-stack serials). 3. `curl -v https://foolery.viktorbarzin.me/` (and likewise terminal, claude-memory) → cert chain presents the new serial, handshake OK. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Add broker-sync Terraform stack (pending apply) Context ------- Part of the broker-sync rollout — see the plan at ~/.claude/plans/let-s-work-on-linking-temporal-valiant.md and the companion repo at ViktorBarzin/broker-sync. This change ----------- New stack `stacks/broker-sync/`: - `broker-sync` namespace, aux tier. - ExternalSecret pulling `secret/broker-sync` via vault-kv ClusterSecretStore. - `broker-sync-data-encrypted` PVC (1Gi, proxmox-lvm-encrypted, auto-resizer) — holds the sync SQLite db, FX cache, Wealthfolio cookie, CSV archive, watermarks. - Five CronJobs (all under `viktorbarzin/broker-sync:<tag>`, public DockerHub image; no pull secret): * `broker-sync-version` — daily 01:00 liveness probe (`broker-sync version`), used to smoke-test each new image. * `broker-sync-trading212` — daily 02:00 `broker-sync trading212 --mode steady`. 
* `broker-sync-imap` — daily 02:30, SUSPENDED (Phase 2). * `broker-sync-csv` — daily 03:00, SUSPENDED (Phase 3). * `broker-sync-fx-reconcile` — 7th of month 05:05, SUSPENDED (Phase 1 tail). - `broker-sync-backup` — daily 04:15, snapshots /data into NFS `/srv/nfs/broker-sync-backup/` with 30-day retention, matches the convention in infra/.claude/CLAUDE.md §3-2-1. NOT in this commit: - Old `wealthfolio-sync` CronJob retirement in stacks/wealthfolio/main.tf — happens in the same commit that first applies this stack, per the plan's "clean cutover" decision. - Vault seed. `secret/broker-sync` must be populated before apply; required keys documented in the ExternalSecret comment block. Test plan --------- ## Automated - `terraform fmt` — clean (ran before commit). - `terraform validate` needs `terragrunt init` first; deferred to apply time. ## Manual Verification 1. Seed Vault `secret/broker-sync/*` (see comment block on the ExternalSecret in main.tf). 2. `cd stacks/broker-sync && scripts/tg apply`. 3. `kubectl -n broker-sync get cronjob` — expect 6 CJs, 3 suspended. 4. `kubectl -n broker-sync create job smoke --from=cronjob/broker-sync-version`. 5. `kubectl -n broker-sync logs -l job-name=smoke` — expect `broker-sync 0.1.0`. * fix(beads-server): disable Authentik + CrowdSec on Workbench Authentik forward-auth returns 400 for dolt-workbench (no Authentik application configured for this domain). CrowdSec bouncer also intermittently returns 400. Both disabled — Workbench is accessible via Cloudflare tunnel only. TODO: Create Authentik application for dolt-workbench.viktorbarzin.me Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 21:17:45 +01:00
    job_template {
      metadata {}
      spec {
        backoff_limit              = 2
        ttl_seconds_after_finished = 86400
        template {
          metadata {
            labels = { app = "broker-sync", component = "imap" }
          }
          spec {
            restart_policy = "OnFailure"
            container {
              name    = "broker-sync"
              image   = local.broker_sync_image
              command = ["broker-sync", "imap"]
              env {
                name  = "BROKER_SYNC_DATA_DIR"
                value = "/data"
              }
              env {
                name  = "WF_SESSION_PATH"
                value = "/data/wealthfolio_session.json"
              }
              env {
                name = "WF_BASE_URL"
                value_from {
                  secret_key_ref {
                    name = "broker-sync-secrets"
                    key  = "wf_base_url"
                  }
                }
              }
              env {
                name = "WF_USERNAME"
                value_from {
                  secret_key_ref {
                    name = "broker-sync-secrets"
                    key  = "wf_username"
                  }
                }
              }
              env {
                name = "WF_PASSWORD"
                value_from {
                  secret_key_ref {
                    name = "broker-sync-secrets"
                    key  = "wf_password"
                  }
                }
              }
              env {
                name = "IMAP_HOST"
                value_from {
                  secret_key_ref {
                    name = "broker-sync-secrets"
                    key  = "imap_host"
                  }
                }
              }
              env {
                name = "IMAP_USER"
                value_from {
                  secret_key_ref {
                    name = "broker-sync-secrets"
                    key  = "imap_user"
                  }
                }
              }
              env {
                name = "IMAP_PASSWORD"
                value_from {
                  secret_key_ref {
                    name = "broker-sync-secrets"
                    key  = "imap_password"
                  }
                }
              }
              env {
                name = "IMAP_DIRECTORY"
                value_from {
                  secret_key_ref {
                    name = "broker-sync-secrets"
                    key  = "imap_directory"
                  }
                }
              }
              volume_mount {
                name       = "data"
                mount_path = "/data"
              }
              resources {
                requests = { cpu = "10m", memory = "64Mi" }
                limits   = { memory = "256Mi" }
              }
            }
            volume {
              name = "data"
              persistent_volume_claim {
                claim_name = kubernetes_persistent_volume_claim.data_encrypted.metadata[0].name
              }
            }
          }
        }
      }
    }
  }
[infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip] ## Context Wave 3A (commit c9d221d5) added the `# KYVERNO_LIFECYCLE_V1` marker to the 27 pre-existing `ignore_changes = [...dns_config]` sites so they could be grepped and audited. It did NOT address pod-owning resources that were simply missing the suppression entirely. Post-Wave-3A sampling (2026-04-18) found that navidrome, f1-stream, frigate, servarr, monitoring, crowdsec, and many other stacks showed perpetual `dns_config` drift every plan because their `kubernetes_deployment` / `kubernetes_stateful_set` / `kubernetes_cron_job_v1` resources had no `lifecycle {}` block at all. Root cause (same as Wave 3A): Kyverno's admission webhook stamps `dns_config { option { name = "ndots"; value = "2" } }` on every pod's `spec.template.spec.dns_config` to prevent NxDomain search-domain flooding (see `k8s-ndots-search-domain-nxdomain-flood` skill). Without `ignore_changes` on every Terraform-managed pod-owner, Terraform repeatedly tries to strip the injected field. ## This change Extends the Wave 3A convention by sweeping EVERY `kubernetes_deployment`, `kubernetes_stateful_set`, `kubernetes_daemon_set`, `kubernetes_cron_job_v1`, `kubernetes_job_v1` (+ their `_v1` variants) in the repo and ensuring each carries the right `ignore_changes` path: - **kubernetes_deployment / stateful_set / daemon_set / job_v1**: `spec[0].template[0].spec[0].dns_config` - **kubernetes_cron_job_v1**: `spec[0].job_template[0].spec[0].template[0].spec[0].dns_config` (extra `job_template[0]` nesting — the CronJob's PodTemplateSpec is one level deeper) Each injection / extension is tagged `# KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2` inline so the suppression is discoverable via `rg 'KYVERNO_LIFECYCLE_V1' stacks/`. Two insertion paths are handled by a Python pass (`/tmp/add_dns_config_ignore.py`): 1. 
**No existing `lifecycle {}`**: inject a brand-new block just before the resource's closing `}`. 108 new blocks on 93 files. 2. **Existing `lifecycle {}` (usually for `DRIFT_WORKAROUND: CI owns image tag` from Wave 4, commit a62b43d1)**: extend its `ignore_changes` list with the dns_config path. Handles both inline (`= [x]`) and multiline (`= [\n x,\n]`) forms; ensures the last pre-existing list item carries a trailing comma so the extended list is valid HCL. 34 extensions. The script skips anything already mentioning `dns_config` inside an `ignore_changes`, so re-running is a no-op. ## Scale - 142 total lifecycle injections/extensions - 93 `.tf` files touched - 108 brand-new `lifecycle {}` blocks + 34 extensions of existing ones - Every Tier 0 and Tier 1 stack with a pod-owning resource is covered - Together with Wave 3A's 27 pre-existing markers → **169 greppable `KYVERNO_LIFECYCLE_V1` dns_config sites across the repo** ## What is NOT in this change - `stacks/trading-bot/main.tf` — entirely commented-out block (`/* … */`). Python script touched the file, reverted manually. - `_template/main.tf.example` skeleton — kept minimal on purpose; any future stack created from it should either inherit the Wave 3A one-line form or add its own on first `kubernetes_deployment`. - `terraform fmt` fixes to pre-existing alignment issues in meshcentral, nvidia/modules/nvidia, vault — unrelated to this commit. Left for a separate fmt-only pass. - Non-pod resources (`kubernetes_service`, `kubernetes_secret`, `kubernetes_manifest`, etc.) — they don't own pods so they don't get Kyverno dns_config mutation. ## Verification Random sample post-commit: ``` $ cd stacks/navidrome && ../../scripts/tg plan → No changes. $ cd stacks/f1-stream && ../../scripts/tg plan → No changes. $ cd stacks/frigate && ../../scripts/tg plan → No changes. $ rg -c 'KYVERNO_LIFECYCLE_V1' stacks/ --include='*.tf' --include='*.tf.example' \ | awk -F: '{s+=$2} END {print s}' 169 ``` ## Reproduce locally 1. 
`git pull` 2. `rg 'KYVERNO_LIFECYCLE_V1' stacks/ | wc -l` → 169+ 3. `cd stacks/navidrome && ../../scripts/tg plan` → expect 0 drift on the deployment's dns_config field. Refs: code-seq (Wave 3B dns_config class closed; kubernetes_manifest annotation class handled separately in 8d94688d for tls_secret) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 21:19:48 +00:00
  lifecycle {
    # KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
    ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config]
  }
}
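
# Sketch only, not part of the applied stack: the secret-backed `env` blocks
# repeated in the CronJobs of this file could be generated with a
# `dynamic "env"` block instead of being written out one by one. The local
# name `imap_secret_env` is hypothetical and does not exist in this stack;
# the secret name and keys are the ones already used above. Kept commented
# out so that it changes nothing at plan time.
#
# locals {
#   # Map of container env var name => key inside the broker-sync-secrets Secret.
#   imap_secret_env = {
#     WF_BASE_URL    = "wf_base_url"
#     WF_USERNAME    = "wf_username"
#     WF_PASSWORD    = "wf_password"
#     IMAP_HOST      = "imap_host"
#     IMAP_USER      = "imap_user"
#     IMAP_PASSWORD  = "imap_password"
#     IMAP_DIRECTORY = "imap_directory"
#   }
# }
#
# Inside the `container { ... }` block, in place of the seven literal blocks:
#
#   dynamic "env" {
#     for_each = local.imap_secret_env
#     content {
#       name = env.key
#       value_from {
#         secret_key_ref {
#           name = "broker-sync-secrets"
#           key  = env.value
#         }
#       }
#     }
#   }
#
# Design note: Terraform iterates a map in lexical key order, so the env vars
# would be emitted sorted by name rather than in the order written above.
# Switching to this form would therefore likely show a one-time reordering
# diff in the first plan.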
# CSV drop-folder processor — Scottish Widows, Fidelity quarterly, Freetrade, etc.
# Phase 3 deliverable. Suspended until CLI subcommand lands.
resource "kubernetes_cron_job_v1" "csv_drop" {
  metadata {
    name      = "broker-sync-csv"
    namespace = kubernetes_namespace.broker_sync.metadata[0].name
    labels    = { app = "broker-sync", component = "csv" }
  }
  spec {
    schedule                      = "0 3 * * *" # 03:00 daily, intended as UK time; timezone is unset here, so the kube-controller-manager's zone applies
    concurrency_policy            = "Forbid"
    successful_jobs_history_limit = 3
    failed_jobs_history_limit     = 5
    suspend                       = true
    job_template {
      metadata {}
      spec {
        backoff_limit              = 1
        ttl_seconds_after_finished = 86400
        template {
          metadata {
            labels = { app = "broker-sync", component = "csv" }
          }
          spec {
            restart_policy = "OnFailure"
            container {
              name    = "broker-sync"
              image   = local.broker_sync_image
              command = ["broker-sync", "csv-drop"]
              env {
                name  = "BROKER_SYNC_DATA_DIR"
                value = "/data"
              }
              env {
                name  = "WF_SESSION_PATH"
                value = "/data/wealthfolio_session.json"
              }
              env {
                name = "WF_BASE_URL"
                value_from {
                  secret_key_ref {
                    name = "broker-sync-secrets"
                    key  = "wf_base_url"
                  }
                }
              }
              env {
                name = "WF_USERNAME"
                value_from {
                  secret_key_ref {
                    name = "broker-sync-secrets"
                    key  = "wf_username"
                  }
                }
              }
              env {
                name = "WF_PASSWORD"
                value_from {
                  secret_key_ref {
                    name = "broker-sync-secrets"
                    key  = "wf_password"
                  }
                }
              }
              volume_mount {
                name       = "data"
                mount_path = "/data"
              }
              resources {
                requests = { cpu = "10m", memory = "64Mi" }
                limits   = { memory = "128Mi" }
              }
            }
            volume {
              name = "data"
              persistent_volume_claim {
                claim_name = kubernetes_persistent_volume_claim.data_encrypted.metadata[0].name
              }
            }
          }
        }
      }
    }
  }
  lifecycle {
    # KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
    ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config]
  }
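  # Illustrative sketch only, reconstructed from the description in the commit that
  # introduced this lifecycle block and not read back from the cluster: the block the
  # Kyverno webhook is described as adding to the pod spec, which the ignore_changes
  # path above suppresses. It is kept commented out on purpose, because Terraform is
  # not meant to manage this field.
  #
  # dns_config {
  #   option {
  #     name  = "ndots"
  #     value = "2"
  #   }
  # }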
}
# Monthly HMRC FX reconciliation: rewrites last month's activities with the official
# HMRC rates once they are published. Phase 1 tail / Phase 2 deliverable.
resource "kubernetes_cron_job_v1" "fx_reconcile" {
  metadata {
    name      = "broker-sync-fx-reconcile"
    namespace = kubernetes_namespace.broker_sync.metadata[0].name
    labels    = { app = "broker-sync", component = "fx-reconcile" }
  }
  spec {
    schedule                      = "5 5 7 * *" # 05:05 on the 7th; controller time zone (usually UTC), since no time zone is set in this spec
    concurrency_policy            = "Forbid"
    successful_jobs_history_limit = 3
    failed_jobs_history_limit     = 5
    suspend                       = true
    job_template {
      metadata {}
      spec {
        backoff_limit              = 1
        ttl_seconds_after_finished = 86400
        template {
          metadata {
            labels = { app = "broker-sync", component = "fx-reconcile" }
          }
          spec {
            restart_policy = "OnFailure"
            container {
              name    = "broker-sync"
              image   = local.broker_sync_image
              command = ["broker-sync", "fx-reconcile"]
              env {
                name  = "BROKER_SYNC_DATA_DIR"
                value = "/data"
              }
              env {
                name  = "WF_SESSION_PATH"
                value = "/data/wealthfolio_session.json"
              }
              env {
                name = "WF_BASE_URL"
                value_from {
                  secret_key_ref {
                    name = "broker-sync-secrets"
                    key  = "wf_base_url"
                  }
                }
              }
              env {
                name = "WF_USERNAME"
                value_from {
                  secret_key_ref {
                    name = "broker-sync-secrets"
                    key  = "wf_username"
                  }
                }
              }
              env {
                name = "WF_PASSWORD"
                value_from {
                  secret_key_ref {
                    name = "broker-sync-secrets"
                    key  = "wf_password"
                  }
                }
              }
              volume_mount {
                name       = "data"
                mount_path = "/data"
              }
              resources {
                requests = { cpu = "10m", memory = "64Mi" }
                limits   = { memory = "128Mi" }
              }
            }
            volume {
              name = "data"
              persistent_volume_claim {
                claim_name = kubernetes_persistent_volume_claim.data_encrypted.metadata[0].name
              }
            }
          }
        }
      }
    }
  }
  lifecycle {
    # KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
    ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config]
  }
## Test plan ### Automated $ readlink stacks/foolery/secrets ../../secrets (likewise for terminal, claude-memory) $ for s in foolery terminal claude-memory; do openssl x509 -in stacks/$s/secrets/fullchain.pem -noout -subject done subject=CN = viktorbarzin.me (x3 — all resolve via symlink to root wildcard) $ git check-attr filter -- stacks/foolery/secrets/fullchain.pem stacks/foolery/secrets/fullchain.pem: filter: git-crypt (now matched by the new rule, though for the symlink target the repo-root rule already applied) ### Manual Verification 1. `terragrunt plan` in stacks/foolery, stacks/terminal, stacks/claude-memory shows only the K8s TLS secret being re-created with the root-wildcard material. No ingress changes. 2. `terragrunt apply` for each stack → `kubectl -n <ns> get secret <name>-tls -o yaml` → tls.crt decodes to CN viktorbarzin.me with the root serial (different from the pre-change per-stack serials). 3. `curl -v https://foolery.viktorbarzin.me/` (and likewise terminal, claude-memory) → cert chain presents the new serial, handshake OK. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Add broker-sync Terraform stack (pending apply) Context ------- Part of the broker-sync rollout — see the plan at ~/.claude/plans/let-s-work-on-linking-temporal-valiant.md and the companion repo at ViktorBarzin/broker-sync. This change ----------- New stack `stacks/broker-sync/`: - `broker-sync` namespace, aux tier. - ExternalSecret pulling `secret/broker-sync` via vault-kv ClusterSecretStore. - `broker-sync-data-encrypted` PVC (1Gi, proxmox-lvm-encrypted, auto-resizer) — holds the sync SQLite db, FX cache, Wealthfolio cookie, CSV archive, watermarks. - Five CronJobs (all under `viktorbarzin/broker-sync:<tag>`, public DockerHub image; no pull secret): * `broker-sync-version` — daily 01:00 liveness probe (`broker-sync version`), used to smoke-test each new image. * `broker-sync-trading212` — daily 02:00 `broker-sync trading212 --mode steady`. 
* `broker-sync-imap` — daily 02:30, SUSPENDED (Phase 2). * `broker-sync-csv` — daily 03:00, SUSPENDED (Phase 3). * `broker-sync-fx-reconcile` — 7th of month 05:05, SUSPENDED (Phase 1 tail). - `broker-sync-backup` — daily 04:15, snapshots /data into NFS `/srv/nfs/broker-sync-backup/` with 30-day retention, matches the convention in infra/.claude/CLAUDE.md §3-2-1. NOT in this commit: - Old `wealthfolio-sync` CronJob retirement in stacks/wealthfolio/main.tf — happens in the same commit that first applies this stack, per the plan's "clean cutover" decision. - Vault seed. `secret/broker-sync` must be populated before apply; required keys documented in the ExternalSecret comment block. Test plan --------- ## Automated - `terraform fmt` — clean (ran before commit). - `terraform validate` needs `terragrunt init` first; deferred to apply time. ## Manual Verification 1. Seed Vault `secret/broker-sync/*` (see comment block on the ExternalSecret in main.tf). 2. `cd stacks/broker-sync && scripts/tg apply`. 3. `kubectl -n broker-sync get cronjob` — expect 6 CJs, 3 suspended. 4. `kubectl -n broker-sync create job smoke --from=cronjob/broker-sync-version`. 5. `kubectl -n broker-sync logs -l job-name=smoke` — expect `broker-sync 0.1.0`. * fix(beads-server): disable Authentik + CrowdSec on Workbench Authentik forward-auth returns 400 for dolt-workbench (no Authentik application configured for this domain). CrowdSec bouncer also intermittently returns 400. Both disabled — Workbench is accessible via Cloudflare tunnel only. TODO: Create Authentik application for dolt-workbench.viktorbarzin.me Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 21:17:45 +01:00
}
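The `[tls]` entry in the commit message above replaces each outlier `stacks/<x>/secrets` directory with a symlink to `../../secrets`. A minimal sketch of that swap, run against a throwaway temp tree rather than the real repo. The helper name `link_stack_secrets` and the placeholder file contents are assumptions for illustration; only the stack names and the link target come from the message.

```shell
#!/bin/sh
# Sketch: swap a per-stack secrets/ directory for a symlink to the repo-root one.
# Everything happens inside a temp dir; nothing here touches a real checkout.
set -eu

link_stack_secrets() {
  # $1 = repo root, $2 = stack name (hypothetical helper)
  stack_dir="$1/stacks/$2"
  rm -rf "$stack_dir/secrets"               # drop the per-stack directory and its PEMs
  ln -s ../../secrets "$stack_dir/secrets"  # relative link, resolves to <root>/secrets
}

repo=$(mktemp -d)
mkdir -p "$repo/secrets"
echo "root-wildcard" > "$repo/secrets/fullchain.pem"

for s in foolery terminal claude-memory; do
  mkdir -p "$repo/stacks/$s/secrets"
  echo "per-stack" > "$repo/stacks/$s/secrets/fullchain.pem"
  link_stack_secrets "$repo" "$s"
  printf '%s -> %s\n' "$s" "$(readlink "$repo/stacks/$s/secrets")"
done
```

Because the link is relative, it keeps resolving correctly wherever the repository is cloned, which is presumably why the fleet pattern uses `../../secrets` rather than an absolute path.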
# Backup: snapshot sync.db / fx.db / csv-archive into NFS daily, keep 30 days.
# Convention from infra/.claude/CLAUDE.md: every proxmox-lvm app needs a backup
# CronJob writing to /mnt/main/<app>-backup/ on the PVE host (served over NFS).
resource "kubernetes_cron_job_v1" "backup" {
  metadata {
    name      = "broker-sync-backup"
    namespace = kubernetes_namespace.broker_sync.metadata[0].name
    labels    = { app = "broker-sync", component = "backup" }
  }
  spec {
    schedule                      = "15 4 * * *" # 04:15 UK, after all syncs
    concurrency_policy            = "Forbid"
    successful_jobs_history_limit = 3
    failed_jobs_history_limit     = 5
    job_template {
      metadata {}
      spec {
        backoff_limit              = 1
        ttl_seconds_after_finished = 86400
        template {
          metadata {
            labels = { app = "broker-sync", component = "backup" }
          }
          spec {
            restart_policy = "OnFailure"
            container {
              name  = "backup"
              image = "alpine:3.20"
              command = ["/bin/sh", "-c", <<-EOT
                set -eu
                TIMESTAMP=$(date +%Y-%m-%dT%H-%M-%S)
                BACKUP_DIR="/backup/$TIMESTAMP"
                mkdir -p "$BACKUP_DIR"
                # A database that has not been created yet is skipped; any other
                # copy failure aborts the job (set -e) instead of being hidden.
                for db in sync.db fx.db; do
                  if [ -f "/data/$db" ]; then
                    cp -a "/data/$db" "$BACKUP_DIR/"
                  fi
                done
                if [ -d /data/csv-archive ]; then
                  cp -a /data/csv-archive "$BACKUP_DIR/"
                fi
                # Retention: keep last 30 days.
                find /backup -mindepth 1 -maxdepth 1 -type d -mtime +30 -exec rm -rf {} +
                echo "Backup complete: $BACKUP_DIR"
              EOT
              ]
              volume_mount {
                name       = "data"
                mount_path = "/data"
                read_only  = true
              }
              volume_mount {
                name       = "backup"
                mount_path = "/backup"
              }
              resources {
                requests = { cpu = "5m", memory = "16Mi" }
                limits   = { memory = "64Mi" }
              }
            }
            volume {
              name = "data"
              persistent_volume_claim {
                claim_name = kubernetes_persistent_volume_claim.data_encrypted.metadata[0].name
              }
            }
            volume {
              name = "backup"
              nfs {
                server = var.nfs_server
                path   = "/srv/nfs/broker-sync-backup"
              }
            }
          }
        }
      }
    }
  }
[infra] Sweep dns_config ignore_changes across all pod-owning resources [ci skip]

## Context

Wave 3A (commit c9d221d5) added the `# KYVERNO_LIFECYCLE_V1` marker to the 27 pre-existing `ignore_changes = [...dns_config]` sites so they could be grepped and audited. It did NOT address pod-owning resources that were simply missing the suppression entirely. Post-Wave-3A sampling (2026-04-18) found that navidrome, f1-stream, frigate, servarr, monitoring, crowdsec, and many other stacks showed perpetual `dns_config` drift every plan because their `kubernetes_deployment` / `kubernetes_stateful_set` / `kubernetes_cron_job_v1` resources had no `lifecycle {}` block at all.

Root cause (same as Wave 3A): Kyverno's admission webhook stamps `dns_config { option { name = "ndots"; value = "2" } }` on every pod's `spec.template.spec.dns_config` to prevent NxDomain search-domain flooding (see `k8s-ndots-search-domain-nxdomain-flood` skill). Without `ignore_changes` on every Terraform-managed pod-owner, Terraform repeatedly tries to strip the injected field.

## This change

Extends the Wave 3A convention by sweeping EVERY `kubernetes_deployment`, `kubernetes_stateful_set`, `kubernetes_daemon_set`, `kubernetes_cron_job_v1`, `kubernetes_job_v1` (+ their `_v1` variants) in the repo and ensuring each carries the right `ignore_changes` path:

- **kubernetes_deployment / stateful_set / daemon_set / job_v1**: `spec[0].template[0].spec[0].dns_config`
- **kubernetes_cron_job_v1**: `spec[0].job_template[0].spec[0].template[0].spec[0].dns_config` (extra `job_template[0]` nesting; the CronJob's PodTemplateSpec is one level deeper)

Each injection / extension is tagged `# KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2` inline so the suppression is discoverable via `rg 'KYVERNO_LIFECYCLE_V1' stacks/`.

Two insertion paths are handled by a Python pass (`/tmp/add_dns_config_ignore.py`):

1. **No existing `lifecycle {}`**: inject a brand-new block just before the resource's closing `}`. 108 new blocks on 93 files.
2. **Existing `lifecycle {}` (usually for `DRIFT_WORKAROUND: CI owns image tag` from Wave 4, commit a62b43d1)**: extend its `ignore_changes` list with the dns_config path. Handles both inline (`= [x]`) and multiline (`= [\n x,\n]`) forms; ensures the last pre-existing list item carries a trailing comma so the extended list is valid HCL. 34 extensions.

The script skips anything already mentioning `dns_config` inside an `ignore_changes`, so re-running is a no-op.

## Scale

- 142 total lifecycle injections/extensions
- 93 `.tf` files touched
- 108 brand-new `lifecycle {}` blocks + 34 extensions of existing ones
- Every Tier 0 and Tier 1 stack with a pod-owning resource is covered
- Together with Wave 3A's 27 pre-existing markers → **169 greppable `KYVERNO_LIFECYCLE_V1` dns_config sites across the repo**

## What is NOT in this change

- `stacks/trading-bot/main.tf`: entirely commented-out block (`/* … */`). Python script touched the file, reverted manually.
- `_template/main.tf.example` skeleton: kept minimal on purpose; any future stack created from it should either inherit the Wave 3A one-line form or add its own on first `kubernetes_deployment`.
- `terraform fmt` fixes to pre-existing alignment issues in meshcentral, nvidia/modules/nvidia, vault: unrelated to this commit. Left for a separate fmt-only pass.
- Non-pod resources (`kubernetes_service`, `kubernetes_secret`, `kubernetes_manifest`, etc.): they don't own pods so they don't get Kyverno dns_config mutation.

## Verification

Random sample post-commit:

```
$ cd stacks/navidrome && ../../scripts/tg plan → No changes.
$ cd stacks/f1-stream && ../../scripts/tg plan → No changes.
$ cd stacks/frigate && ../../scripts/tg plan → No changes.
$ rg -c 'KYVERNO_LIFECYCLE_V1' stacks/ -g '*.tf' -g '*.tf.example' \
    | awk -F: '{s+=$2} END {print s}'
169
```

## Reproduce locally

1. `git pull`
2. `rg 'KYVERNO_LIFECYCLE_V1' stacks/ | wc -l` → 169+
3. `cd stacks/navidrome && ../../scripts/tg plan` → expect 0 drift on the deployment's dns_config field.

Refs: code-seq (Wave 3B dns_config class closed; kubernetes_manifest annotation class handled separately in 8d94688d for tls_secret)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 21:19:48 +00:00
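The marker convention described in the commit message makes every suppression countable. A minimal sketch of such a count, using `grep -r --include` (assumed to be GNU grep) instead of `rg`, and run against a small throwaway tree. The helper name `count_markers` and the sample files are hypothetical and are not part of the repo.

```shell
#!/bin/sh
# Sketch: sum KYVERNO_LIFECYCLE_V1 marker lines across *.tf and *.tf.example files.
set -eu

count_markers() {
  # $1 = directory to scan (hypothetical helper). Prints the total match count.
  # grep -c prints "file:count" per file and exits 1 when nothing matches,
  # so the exit status is tolerated under `set -e`.
  { grep -r --include='*.tf' --include='*.tf.example' -c 'KYVERNO_LIFECYCLE_V1' "$1" || true; } \
    | awk -F: '{ s += $NF } END { print s + 0 }'
}

tree=$(mktemp -d)
mkdir -p "$tree/stacks/one" "$tree/stacks/two"
printf '%s\n' '# KYVERNO_LIFECYCLE_V1: a' '# KYVERNO_LIFECYCLE_V1: b' > "$tree/stacks/one/main.tf"
printf '%s\n' 'lifecycle {' '# KYVERNO_LIFECYCLE_V1: c' '}' > "$tree/stacks/two/main.tf"
printf '%s\n' 'KYVERNO_LIFECYCLE_V1 mentioned in notes' > "$tree/stacks/two/notes.txt"

total=$(count_markers "$tree/stacks")
echo "markers: $total"
```

The `.txt` file is skipped by the `--include` filters, so only markers inside Terraform files contribute to the total.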
  lifecycle {
    # KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
    ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config]
  }
}
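The retention step in the backup script above relies on `find -mtime +30`, which matches entries whose age in whole 24-hour periods is greater than 30. A minimal sketch that exercises the same expression against a temp directory. It assumes GNU `touch -d` for back-dating; the helper name `prune_old_backups` and the directory names are hypothetical.

```shell
#!/bin/sh
# Sketch: prune top-level backup directories older than N days.
set -eu

prune_old_backups() {
  # $1 = backup root, $2 = retention in days (hypothetical helper)
  find "$1" -mindepth 1 -maxdepth 1 -type d -mtime +"$2" -exec rm -rf {} +
}

root=$(mktemp -d)
mkdir "$root/old" "$root/recent"
touch -d '40 days ago' "$root/old"   # back-date the directory mtime (GNU touch)

prune_old_backups "$root" 30
ls "$root"
```

Only the back-dated directory is removed. `-mindepth 1` keeps the backup root itself out of the match set, and `-maxdepth 1` stops `find` from descending into snapshots it is about to delete.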
broker-sync: add Fidelity PlanViewer CronJob (suspended)

## Context

Viktor's UK workplace pension is at Fidelity PlanViewer. The broker-sync provider + CLI landed in the broker-sync repo (commits 804e6a8 + 7c9be54); this commit adds the infra bits so the monthly sync runs in-cluster like the other broker-sync jobs.

One successful manual backfill on 2026-04-18 pulled 51 contributions + valuation into a new WF WORKPLACE_PENSION account; Net Worth moved from £865k → £1,003k. This commit productionises that flow.

## This change

- New kubernetes_cron_job_v1.fidelity in stacks/broker-sync/main.tf:
  - Schedule: 05:00 UK on the 20th of each month (after mid-month payroll settles; finance data shows credits on the 13th-18th).
  - Suspended by default; unsuspend once broker-sync image is rebuilt with Chromium baked in (Dockerfile change shipped separately in the broker-sync repo).
  - Init container materialises the storage_state JSON (projected from the broker-sync-secrets K8s Secret, synced from Vault by ESO) to the encrypted PVC at /data/fidelity_storage_state.json. Chromium then loads it.
  - Container: broker-sync fidelity-ingest with WF + FIDELITY_* env vars. Memory request 512Mi, limit 1280Mi (Chromium is hungry).
  - Lifecycle ignore_changes on dns_config per the KYVERNO_LIFECYCLE_V1 convention documented in AGENTS.md.

## What is NOT in this change

- The Vault keys fidelity_storage_state + fidelity_plan_id: already staged via `vault kv patch` on 2026-04-18.
- Dockerfile Chromium install: in broker-sync repo (commit 7c9be54).
- Prometheus BrokerSyncFidelityFailed alert: deferred until the CronJob has run successfully for a month and we have a baseline. Existing broker-sync CronJobs also don't have per-job alerts yet; filing as a follow-up.

## Verification

### Automated

terraform fmt ran clean. `terragrunt plan` would show a single new kubernetes_cron_job_v1 (suspended, so no pods scheduled).

### Manual (after apply + image rebuild)

1. Build + push broker-sync:<sha> with Chromium.
2. `scripts/tg apply stacks/broker-sync` (updates image_tag + adds fidelity CronJob).
3. Unsuspend: `kubectl -n broker-sync patch cronjob broker-sync-fidelity \
   -p '{"spec":{"suspend":false}}'` OR flip the tf flag.
4. Trigger a test run: `kubectl -n broker-sync create job \
   fidelity-test --from=cronjob/broker-sync-fidelity`.
5. Expect logs: `fidelity-ingest: fetched=N new=N imported=N failed=0`.
6. On FidelitySessionError: run `broker-sync fidelity-seed` locally + `vault kv patch secret/broker-sync fidelity_storage_state=@...`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 18:51:20 +00:00
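The commit message describes an init container that writes the storage_state JSON from the projected Secret onto the data volume with owner-only permissions. A rough sketch of what such a step could look like follows. It is an illustration under assumptions, not the stack's actual init container: the helper name `materialise_state`, the source path, and the sample JSON are hypothetical, while the destination file name and uid 10001 are taken from the surrounding text.

```shell
#!/bin/sh
# Sketch: copy a secret-backed JSON file to a data volume with mode 600.
set -eu

materialise_state() {
  # $1 = source file (e.g. a projected Secret key), $2 = destination, $3 = owner uid
  src=$1
  dst=$2
  owner=${3:-}
  # install sets the mode while copying, so the file is never briefly world-readable.
  install -m 600 "$src" "$dst"
  # chown needs root; skip it otherwise so the sketch also runs unprivileged.
  if [ -n "$owner" ] && [ "$(id -u)" -eq 0 ]; then
    chown "$owner" "$dst"
  fi
}

work=$(mktemp -d)
printf '{"cookies": []}\n' > "$work/secret.json"
materialise_state "$work/secret.json" "$work/fidelity_storage_state.json" 10001
stat -c '%a' "$work/fidelity_storage_state.json"   # GNU stat: prints the octal mode
```

Using `install -m 600` rather than `cp` followed by `chmod` is a design choice of this sketch; it avoids a window in which the session cookies sit on disk with default permissions.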
# -----------------------------------------------------------------------------
# Fidelity UK PlanViewer — monthly pension contribution sync
#
# Architecture notes:
# - The CLI (`broker-sync fidelity-ingest`) loads storage_state.json, boots
#   headless Chromium, scrapes the transaction history + valuation JSON, and
#   posts DEPOSIT activities to Wealthfolio. See
#   broker-sync/docs/providers/fidelity-planviewer.md for the seed workflow.
# - Storage_state is staged to Vault (`secret/broker-sync` →
#   `fidelity_storage_state`). ESO projects all broker-sync keys into the
#   shared `broker-sync-secrets` K8s Secret; an init container writes the
#   JSON blob to the PVC so the main container can load it.
# - Image needs Chromium baked in: add the `fidelity-capable: "true"` label
#   so the Dockerfile/CI treats this CronJob's pod spec as the Playwright
#   variant. Until the Playwright image ships, keep `suspend = true`.
# - Schedule: 05:00 UK on the 20th of each month, well after Viktor's
#   mid-month payroll contribution has settled (finance history shows
#   credits landing 13th-18th).
resource "kubernetes_cron_job_v1" "fidelity" {
  metadata {
    name      = "broker-sync-fidelity"
    namespace = kubernetes_namespace.broker_sync.metadata[0].name
    labels    = { app = "broker-sync", component = "fidelity" }
  }
  spec {
    schedule                      = "0 5 20 * *"
    concurrency_policy            = "Forbid"
    successful_jobs_history_limit = 3
    failed_jobs_history_limit     = 5
    # Suspended until the broker-sync image ships with Playwright + Chromium.
    suspend = true
    job_template {
      metadata {}
      spec {
        backoff_limit              = 1
        ttl_seconds_after_finished = 86400
        template {
          metadata {
            labels = { app = "broker-sync", component = "fidelity" }
          }
          spec {
            restart_policy = "OnFailure"
            # Materialise the JSON storage_state from the projected Secret
            # onto the PVC where Playwright expects to read it. Init container
            # runs as root; the main broker-sync container runs as uid 10001,
            # so we chown+chmod 600 to grant read access to the broker user.
            init_container {
              name    = "stage-storage-state"
              image   = "busybox:1.36"
              command = ["/bin/sh", "-c", <<-EOT
                set -eu
                mkdir -p /data
                cp /secrets/fidelity_storage_state /data/fidelity_storage_state.json
                chown 10001:10001 /data/fidelity_storage_state.json
                chmod 600 /data/fidelity_storage_state.json
              EOT
              ]
              volume_mount {
                name       = "secrets"
                mount_path = "/secrets"
                read_only  = true
              }
              volume_mount {
                name       = "data"
                mount_path = "/data"
              }
              resources {
                requests = { cpu = "5m", memory = "8Mi" }
                limits   = { memory = "32Mi" }
              }
            }
            container {
              name    = "broker-sync"
              image   = local.broker_sync_image
              command = ["broker-sync", "fidelity-ingest"]
              env {
                name  = "BROKER_SYNC_DATA_DIR"
                value = "/data"
              }
              env {
                name  = "WF_SESSION_PATH"
                value = "/data/wealthfolio_session.json"
              }
              env {
                name  = "FIDELITY_STORAGE_STATE_PATH"
                value = "/data/fidelity_storage_state.json"
              }
              env {
                name = "FIDELITY_PLAN_ID"
                value_from {
                  secret_key_ref {
                    name = "broker-sync-secrets"
                    key  = "fidelity_plan_id"
                  }
                }
              }
              env {
                name = "WF_BASE_URL"
                value_from {
                  secret_key_ref {
                    name = "broker-sync-secrets"
                    key  = "wf_base_url"
                  }
                }
              }
              env {
                name = "WF_USERNAME"
                value_from {
                  secret_key_ref {
                    name = "broker-sync-secrets"
                    key  = "wf_username"
                  }
                }
              }
              env {
                name = "WF_PASSWORD"
                value_from {
                  secret_key_ref {
                    name = "broker-sync-secrets"
                    key  = "wf_password"
                  }
                }
              }
              volume_mount {
                name       = "data"
                mount_path = "/data"
              }
              resources {
                # Chromium is hungry: the headless shell plus page rendering
                # sits comfortably under 1Gi but can spike to roughly 1.2Gi
                # during full-page screenshots, hence the 1280Mi limit.
                requests = { cpu = "50m", memory = "512Mi" }
                limits   = { memory = "1280Mi" }
              }
            }
            volume {
              name = "secrets"
              secret {
                secret_name = "broker-sync-secrets"
                items {
                  key  = "fidelity_storage_state"
                  path = "fidelity_storage_state"
                }
              }
            }
            volume {
              name = "data"
              persistent_volume_claim {
                claim_name = kubernetes_persistent_volume_claim.data_encrypted.metadata[0].name
              }
            }
          }
        }
      }
    }
  }
  lifecycle {
    ignore_changes = [spec[0].job_template[0].spec[0].template[0].spec[0].dns_config] # KYVERNO_LIFECYCLE_V1
  }
}
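
# Sketch only (an assumption, not part of the stack as committed): the
# resource above hard-codes `suspend = true`, so unsuspending through Terraform
# means editing that line. One possible way to expose it as a flag is a boolean
# input like the one below. The variable name `fidelity_suspended` is
# hypothetical; to use it, replace `suspend = true` in
# kubernetes_cron_job_v1.fidelity with `suspend = var.fidelity_suspended`.
# Left unreferenced, this declaration does not change the plan.
variable "fidelity_suspended" {
  description = "Keep the broker-sync-fidelity CronJob suspended until the broker-sync image ships with Playwright + Chromium."
  type        = bool
  default     = true
}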