Stage 2 now reuses the existing default/backup-etcd CronJob (NFS-backed PV pointing at 192.168.1.127:/srv/nfs/etcd-backup) instead of trying to ssh into master and run etcdctl against a non-existent /mnt/main mount. The agent triggers a one-shot Job from cronjob/backup-etcd, waits up to 10 min, then parses the backup-manage container log for the "Backup done" line and its byte count.

Test 2 (dry-run) surfaced 5 real cluster blockers; the agent loop works end-to-end at the planning level.

Expanded the claude-agent ServiceAccount's privileges via a sibling ClusterRole (claude-agent-upgrade-ops):
- patch namespaces/k8s-upgrade (in-flight annotation)
- create batch/jobs (trigger etcd snapshot Job)
- patch nodes (cordon/uncordon)
- create pods/eviction (drain)
- delete pods (drain fallback)
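
The Stage 2 backup flow (trigger a one-shot Job from the CronJob, wait, parse the log) could look roughly like the sketch below. It is not the agent's actual code: the function names, the job name passed in, and especially the log format matched by `BACKUP_DONE_RE` (something like `Backup done: 20480 bytes`) are assumptions for illustration. Only the namespace, CronJob name, container name, and 10-minute timeout come from the description above.

```python
import re
import subprocess
from typing import List, Optional

NAMESPACE = "default"      # from default/backup-etcd
CRONJOB = "backup-etcd"
CONTAINER = "backup-manage"
TIMEOUT_SECONDS = 600      # "waits up to 10 min"

# ASSUMPTION: the log line looks like "Backup done: 20480 bytes".
# Adjust the pattern to whatever the backup-manage container really prints.
BACKUP_DONE_RE = re.compile(r"Backup done\D*(\d+)")


def build_commands(job_name: str) -> List[List[str]]:
    """Return the kubectl invocations for trigger, wait, and log fetch."""
    return [
        ["kubectl", "-n", NAMESPACE, "create", "job", job_name,
         f"--from=cronjob/{CRONJOB}"],
        ["kubectl", "-n", NAMESPACE, "wait", "--for=condition=complete",
         f"job/{job_name}", f"--timeout={TIMEOUT_SECONDS}s"],
        ["kubectl", "-n", NAMESPACE, "logs", f"job/{job_name}",
         "-c", CONTAINER],
    ]


def parse_backup_size(log_text: str) -> Optional[int]:
    """Return the byte count from the last "Backup done" line, or None."""
    size = None
    for line in log_text.splitlines():
        match = BACKUP_DONE_RE.search(line)
        if match:
            size = int(match.group(1))
    return size


def run_backup(job_name: str) -> int:
    """Trigger the Job, wait for completion, and return the snapshot size."""
    create, wait, logs = build_commands(job_name)
    subprocess.run(create, check=True)
    subprocess.run(wait, check=True)  # raises if the wait times out
    result = subprocess.run(logs, check=True, capture_output=True, text=True)
    size = parse_backup_size(result.stdout)
    if not size:
        raise RuntimeError("no 'Backup done' line with a non-zero byte count")
    return size
```

Keeping command construction and log parsing separate from `run_backup` means both can be checked without a cluster, which fits a dry-run mode. Note that `kubectl wait --for=condition=complete` only returns early on success; a failed Job would sit until the timeout unless the failure condition is checked separately.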
| Name |
|---|
| agents |
| commands |
| reference |
| scripts |
| skills |
| calendar-query.py |
| CLAUDE.md |
| home-assistant-sofia.py |
| home-assistant.py |
| internet-mode-used_DO_NOT_REMOVE_MANUALLY_SECURITY_RISK |
| pfsense.py |
| settings.json |