Stage 2 now reuses the existing default/backup-etcd CronJob (NFS-backed PV pointing at 192.168.1.127:/srv/nfs/etcd-backup) instead of trying to SSH into the master and run etcdctl against a non-existent /mnt/main mount. The agent triggers a one-shot Job from cronjob/backup-etcd, waits up to 10 minutes for it to finish, then parses the backup-manage container log for the "Backup done" line and its byte count.

Test 2 (dry-run) surfaced 5 real cluster blockers; the agent loop works end-to-end at the planning level.

Expanded the claude-agent ServiceAccount's privileges via a sibling ClusterRole (claude-agent-upgrade-ops):

- patch namespaces/k8s-upgrade (in-flight annotation)
- create batch/jobs (trigger the etcd snapshot Job)
- patch nodes (cordon/uncordon)
- create pods/eviction (drain)
- delete pods (drain fallback)
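The Stage 2 flow (one-shot Job from the CronJob, bounded wait, log scrape) can be sketched in Python by shelling out to kubectl. This is a minimal sketch, not the agent's actual code: the Job naming scheme and the exact "Backup done" log format (assumed here to look like `Backup done: 52428800 bytes`) are assumptions, and the kubectl runner is injectable so the logic can be exercised without a cluster.

```python
import re
import subprocess
import time
from typing import Callable, List, Optional, Tuple

# ASSUMPTION: backup-manage prints a line such as "Backup done: 52428800 bytes".
# The real log format may differ; adjust the pattern to match it.
BACKUP_DONE_RE = re.compile(r"Backup done[^\d\n]*?(\d+)[ \t]*bytes")

Runner = Callable[[List[str]], str]


def kubectl(args: List[str]) -> str:
    """Run kubectl with the given arguments; return stdout, raise on non-zero exit."""
    result = subprocess.run(["kubectl", *args], check=True,
                            capture_output=True, text=True)
    return result.stdout


def parse_backup_size(log: str) -> Optional[int]:
    """Return the byte count from the last "Backup done" line, or None if absent."""
    matches = BACKUP_DONE_RE.findall(log)
    return int(matches[-1]) if matches else None


def run_etcd_backup(run: Runner = kubectl, namespace: str = "default",
                    cronjob: str = "backup-etcd", container: str = "backup-manage",
                    timeout_s: int = 600) -> Tuple[str, int]:
    # Hypothetical naming scheme for the one-shot Job.
    job = f"{cronjob}-manual-{int(time.time())}"
    # 1. Create a one-shot Job from the CronJob's job template.
    run(["-n", namespace, "create", "job", job, f"--from=cronjob/{cronjob}"])
    # 2. Block until the Job is Complete; kubectl exits non-zero on timeout,
    #    which the default runner turns into CalledProcessError.
    run(["-n", namespace, "wait", "--for=condition=complete",
         f"--timeout={timeout_s}s", f"job/{job}"])
    # 3. Fetch the backup container's log and extract the byte count.
    log = run(["-n", namespace, "logs", f"job/{job}", "-c", container])
    size = parse_backup_size(log)
    if size is None:
        raise RuntimeError(f"no 'Backup done' line in logs of job/{job}")
    return job, size
```

`kubectl wait --for=condition=complete` does not return early if the Job fails, so in this sketch a failed backup surfaces as a timeout error after 10 minutes.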
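The permission list above maps onto a ClusterRole plus a ClusterRoleBinding roughly like the following, built here as Python dicts and emitted as JSON (which `kubectl apply -f -` accepts). The `resourceNames` restriction on the namespace rule, the binding name, and the ServiceAccount's namespace (`default`) are assumptions; read verbs such as get/list on pods and nodes are assumed to come from the agent's existing role.

```python
import json

ROLE_NAME = "claude-agent-upgrade-ops"
# ASSUMPTION: namespace of the claude-agent ServiceAccount (not stated in the commit).
AGENT_NAMESPACE = "default"

cluster_role = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "ClusterRole",
    "metadata": {"name": ROLE_NAME},
    "rules": [
        # In-flight annotation; resourceNames limits patching to k8s-upgrade (assumed).
        {"apiGroups": [""], "resources": ["namespaces"],
         "resourceNames": ["k8s-upgrade"], "verbs": ["patch"]},
        # Trigger the one-shot etcd snapshot Job.
        {"apiGroups": ["batch"], "resources": ["jobs"], "verbs": ["create"]},
        # Cordon/uncordon by patching the Node object.
        {"apiGroups": [""], "resources": ["nodes"], "verbs": ["patch"]},
        # Drain through the Eviction subresource (core API group).
        {"apiGroups": [""], "resources": ["pods/eviction"], "verbs": ["create"]},
        # Drain fallback: plain pod deletion.
        {"apiGroups": [""], "resources": ["pods"], "verbs": ["delete"]},
    ],
}

cluster_role_binding = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "ClusterRoleBinding",
    "metadata": {"name": ROLE_NAME},  # hypothetical binding name
    "roleRef": {"apiGroup": "rbac.authorization.k8s.io",
                "kind": "ClusterRole", "name": ROLE_NAME},
    "subjects": [{"kind": "ServiceAccount", "name": "claude-agent",
                  "namespace": AGENT_NAMESPACE}],
}

# A v1 List lets both objects be applied in one kubectl call.
manifest = {"apiVersion": "v1", "kind": "List",
            "items": [cluster_role, cluster_role_binding]}

if __name__ == "__main__":
    print(json.dumps(manifest, indent=2))
```

For example, with the script saved under a hypothetical name: `python rbac.py | kubectl apply -f -`.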
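To make the purpose of each new verb concrete, the sketch below returns the raw Kubernetes API request (method, path, body) that each operation needs. It is illustrative only: the annotation key is hypothetical, the PATCH bodies assume a JSON merge patch (`Content-Type: application/merge-patch+json`), and `policy/v1` Eviction requires Kubernetes 1.22 or later.

```python
from typing import Any, Dict, Optional, Tuple

# (HTTP method, API path, JSON body or None)
Request = Tuple[str, str, Optional[Dict[str, Any]]]

# HYPOTHETICAL annotation key; the commit does not name the real one.
IN_FLIGHT_KEY = "example.com/upgrade-in-flight"


def mark_in_flight(value: str) -> Request:
    """patch namespaces/k8s-upgrade: set the in-flight annotation."""
    return ("PATCH", "/api/v1/namespaces/k8s-upgrade",
            {"metadata": {"annotations": {IN_FLIGHT_KEY: value}}})


def cordon(node: str, unschedulable: bool = True) -> Request:
    """patch nodes: cordon (True) or uncordon (False) via spec.unschedulable."""
    return ("PATCH", f"/api/v1/nodes/{node}",
            {"spec": {"unschedulable": unschedulable}})


def evict(namespace: str, pod: str) -> Request:
    """create pods/eviction: evict a pod, subject to PodDisruptionBudgets."""
    body = {"apiVersion": "policy/v1", "kind": "Eviction",
            "metadata": {"name": pod, "namespace": namespace}}
    return ("POST", f"/api/v1/namespaces/{namespace}/pods/{pod}/eviction", body)


def delete_pod(namespace: str, pod: str) -> Request:
    """delete pods: drain fallback; a direct DELETE skips PodDisruptionBudget checks."""
    return ("DELETE", f"/api/v1/namespaces/{namespace}/pods/{pod}", None)
```

`kubectl cordon`/`uncordon` toggle the same `spec.unschedulable` field, and `kubectl drain` uses the Eviction subresource by default (plain DELETE with `--disable-eviction`, which skips PodDisruptionBudget checks).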

| File |
|---|
| issue-responder.md |
| k8s-version-upgrade.md |
| payslip-extractor.md |
| post-mortem.md |
| postmortem-todo-resolver.md |
| service-upgrade.md |
| sev-historian.md |
| sev-report-writer.md |
| sev-triage.md |