stem95su: scheduled Drive->site sync CronJob (every 10m)

CronJob stem95su-gdrive-sync (*/10) mounts the content PVC RW and
rclone-syncs the read-only Drive folder "claude" (stem claude/files) onto
it (rclone/rclone:1.74.3, scope=drive.readonly, empty-source guard +
--max-delete 25). ESO ExternalSecret stem95su-rclone <- Vault
secret/stem95su. Requires the GCP OAuth app to be published to Production;
otherwise the refresh token expires ~weekly.

Lands the gdrive-sync stack on master (it had landed on a feature branch
by accident on the shared devvm checkout).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Viktor Barzin 2026-06-09 08:42:26 +00:00
parent 05b50d2b96
commit 6d224861c4
1168 changed files with 120 additions and 358547 deletions

@@ -1,29 +0,0 @@
# Node Configuration Drift Quick Wins — Design
**Date**: 2026-02-22
**Status**: Approved
**Context**: From Talos Linux evaluation — these close 95% of the drift gap without changing the OS
## Quick Win 1: Add GPU Label to Terraform
**File**: `stacks/platform/modules/nvidia/main.tf`
Extend the existing `null_resource.gpu_node_taint` to also apply the `gpu=true` label. Rename to `gpu_node_config`. Both commands are idempotent (`--overwrite` for taint, label is a no-op if already set).
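
A minimal sketch of the two idempotent commands, written in Python purely for illustration (the real resource is a Terraform `null_resource`). The function names, the node name, and the taint `nvidia.com/gpu=true:NoSchedule` are hypothetical; only the `gpu=true` label comes from this design.

```python
import subprocess


def gpu_node_commands(node, taint="nvidia.com/gpu=true:NoSchedule", label="gpu=true"):
    """Build the kubectl commands for one GPU node.

    The taint value is a hypothetical placeholder; the label is the one
    named in this design.
    """
    return [
        # --overwrite makes re-applying the taint safe on every run.
        ["kubectl", "taint", "nodes", node, taint, "--overwrite"],
        # Re-applying a label with an unchanged value is a no-op. Without
        # --overwrite, kubectl refuses to change an existing, different value.
        ["kubectl", "label", "nodes", node, label],
    ]


def apply_gpu_node_config(node, run=subprocess.run):
    """Run both commands in order, stopping at the first failure."""
    for cmd in gpu_node_commands(node):
        run(cmd, check=True)
```

Passing `run` as a parameter lets the sequence be exercised without a cluster, for example by collecting the commands in a list instead of executing them.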
## Quick Win 2: Improve API Server OIDC/Audit Idempotency
**Files**: `stacks/platform/modules/rbac/apiserver-oidc.tf`, `audit-policy.tf`
The current grep-before-sed checks prevent duplicate entries but do not handle value changes. Improve the OIDC check to compare the actual issuer URL value, not just the flag name. The audit policy file is always re-uploaded (good); the manifest edit is skipped if already configured (acceptable).
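
The value-aware check can be sketched as below. This is a Python illustration of the idea, not the grep/sed logic in `apiserver-oidc.tf`. The helper name `ensure_flag` is hypothetical, and it assumes a kubeadm-style manifest where flags appear as `- --flag=value` list items under a `- kube-apiserver` command line.

```python
def ensure_flag(lines, flag, value):
    """Return (new_lines, changed) with `- --flag=value` present exactly once.

    Compares the full flag value, so a changed issuer URL is rewritten
    rather than skipped because the flag name already exists.
    """
    prefix = f"- --{flag}="
    wanted = f"- --{flag}={value}"
    out, found, changed = [], False, False

    for line in lines:
        indent = line[: len(line) - len(line.lstrip())]
        if line.strip().startswith(prefix):
            if found:
                # Drop duplicate entries left behind by earlier edits.
                changed = True
                continue
            found = True
            fixed = indent + wanted
            changed = changed or fixed != line
            out.append(fixed)
        else:
            out.append(line)

    if found:
        return out, changed

    # Flag missing: add it right after the command name, same indentation.
    for i, line in enumerate(out):
        if line.strip() == "- kube-apiserver":
            indent = line[: len(line) - len(line.lstrip())]
            out.insert(i + 1, indent + wanted)
            return out, True

    raise ValueError("no '- kube-apiserver' command line found")
```

Returning a `changed` flag lets the caller skip the manifest write, and the resulting API server restart, when the value already matches.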
## Quick Win 3: Enable Node-Exporter via Prometheus Helm Chart
**File**: `stacks/platform/modules/monitoring/prometheus_chart_values.tpl`
Uncomment `prometheus-node-exporter: enabled: true`. Delete `playbooks/deploy_node_exporter.yaml` (unused, superseded by DaemonSet).
## Quick Win 4: Document Node Rebuild Procedure
**File**: `.claude/CLAUDE.md`
Add a "Node Rebuild Procedure" section documenting the full sequence: VM creation from template → cloud-init → kubeadm join → verify mirrors/labels/taints.