Infra side of ADR-0014: an mTLS gRPC consumer of Calico Goldmane's Flows API
that records the namespace-pair edge-set in CNPG and posts a daily new-edge
digest to #security. Adds the goldmane-edge-aggregator stack, the
pg-goldmane-edges Vault rotation role (Tier-0 vault state updated here), and the
namespace in the ghcr-credentials allowlist.
Cert: REUSES the operator-minted, Tigera-CA-signed whisker-backend client cert
(Goldmane verifies only the CA chain, not identity) instead of minting from the
Tigera CA private key. This avoids putting the CA key in TF state AND the
hashicorp/tls provider, which is incompatible with this repo's global
generate-providers/lockfile pattern (it broke every stack's lockfile).
Verified live: aggregator streaming flows, 174 edges in Postgres across 50x54
namespaces, db+slack ExternalSecrets synced, digest dry-run formats correctly,
private image pulls via the Kyverno-synced ghcr-credentials.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
6d224861 came from a --no-checkout worktree whose empty index made the
commit drop every file except two. This restores 05b50d2b's full tree and
correctly adds stacks/stem95su/gdrive-sync.tf + the service-catalog stem95su
entry. Forward-only (parent=6d224861, no force-push); [ci skip] since the
live infra was never applied from the broken commit.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
CronJob stem95su-gdrive-sync (*/10) mounts the content PVC RW and
rclone-syncs the read-only Drive folder "claude" (stem claude/files) onto
it (rclone/rclone:1.74.3, scope=drive.readonly, empty-source guard +
--max-delete 25). ESO ExternalSecret stem95su-rclone <- Vault
secret/stem95su. Requires the GCP OAuth app published to Production or the
refresh token expires ~weekly.
Lands the gdrive-sync stack on master (it had landed on a feature branch
by accident on the shared devvm checkout).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
WHAT LANDED:
- terragrunt.hcl (root): added telmate/proxmox to k8s_providers
required_providers. Other stacks just don't instantiate a provider
block — harmless. Replaces the same-name override trick the infra
stack used to do, which stopped working under Terragrunt v0.77
("Detected generate blocks with the same name").
- stacks/infra/terragrunt.hcl: new generate "proxmox_provider" block
writes proxmox_provider.tf with the provider config; credentials
read from Vault secret/viktor at plan/apply time (no env vars).
- modules/create-vm: new mbps_rd / mbps_wr number variables (default 0
= uncapped), wired into scsi0/scsi1 disk{} blocks as
mbps_r_concurrent / mbps_wr_concurrent. lifecycle.ignore_changes
extended to scsi6..scsi29 (K8s nodes have many CSI-managed slots),
plus scsihw and qemu_os (vary per-VM; non-trivial live changes).
- stacks/infra/main.tf: docker-registry-vm gains mbps_rd=40,
mbps_wr=40 in HCL — already applied live via qm set on 2026-05-26.
WHAT FAILED AND WAS ROLLED BACK:
- Attempted import of 7 VMs (102 devvm, 103 home-assistant, 200
k8s-master, 201 k8s-node1, 202 k8s-node2, 203 k8s-node3, 204
k8s-node4) via import {} blocks. The telmate/proxmox v3.0.2-rc07
provider mangled proxmox-csi PVC slots on apply for vmid 202 and
203: every scsi slot got rewritten from `vm-9999-pvc-<uuid>` to
the boot disk `vm-<vmid>-disk-0`. Restored both .conf files from
the 2026-05-24 nightly PVE config backup at /mnt/backup/pve-config/
etc-pve/nodes/pve/qemu-server/{202,203}.conf — no reboots, no data
loss, K8s CSI reconciled PVC attachments within minutes. Removed
the 7 imports from state via `terraform state rm` and re-encrypted.
Tracked in beads code-xzbl: blocked on bpg/proxmox provider
migration (telmate has the same dynamic-disk defect that bit us on
iSCSI back in 2026-04-02; see memory id=539).
LIVE CAPS STILL IN PLACE (qm set, 2026-05-26 ~03:13 UTC):
102 devvm 60/60 103 home-assistant 40/40 200 k8s-master 100/60
201 k8s-node1 150/120 202 k8s-node2 150/120 203 k8s-node3 150/120
204 k8s-node4 150/120 220 docker-registry 40/40
(pfSense 101 BSD + Windows10 300 intentionally out of scope.)
PRE-EXISTING DRIFT EXPOSED (NOT NEW):
- HCL declares k8s-master (200) and k8s-node2 (202) but neither was
ever imported into TF state — confirmed against the SOPS-encrypted
state in git (lineage e1cc5bb5, serial 42, last touched 2026-04-06).
This commit leaves both declarations in place but does NOT import
them; that's part of the code-xzbl follow-up.
Closes: code-s9xr