infra/.claude/agents
Viktor Barzin 988bfde45c k8s-version-upgrade: trigger etcd snapshot via existing backup-etcd Job; broaden agent RBAC
Stage 2 now reuses the existing default/backup-etcd CronJob (NFS-backed
PV pointing at 192.168.1.127:/srv/nfs/etcd-backup) instead of trying to
ssh into master and run etcdctl against a non-existent /mnt/main mount.
The agent triggers a one-shot Job from cronjob/backup-etcd, waits up to
10 min for it to complete, then parses the backup-manage container log
for the "Backup done" line and its byte count.
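
A minimal sketch of that trigger/wait/parse flow, for illustration only.
The kubectl subcommands are standard, but the Job naming scheme, the
helper names, and above all the exact shape of the "Backup done" log
line are assumptions; the regex below just takes the first integer that
follows the marker on the same line.

```python
import re
import subprocess
import time

CRONJOB = "backup-etcd"
NAMESPACE = "default"
CONTAINER = "backup-manage"
TIMEOUT_SECONDS = 600  # "waits up to 10 min"

# ASSUMPTION: the log line contains "Backup done" followed by a byte count
# somewhere later on the same line. The real log format is not shown here.
BACKUP_DONE = re.compile(r"Backup done\D*(\d+)")


def snapshot_commands(job_name):
    """Return the kubectl invocations for one snapshot run (hypothetical helper)."""
    return [
        # Create a one-shot Job from the existing CronJob template.
        ["kubectl", "-n", NAMESPACE, "create", "job", job_name,
         f"--from=cronjob/{CRONJOB}"],
        # Block until the Job reports condition=complete or the timeout expires.
        ["kubectl", "-n", NAMESPACE, "wait", "--for=condition=complete",
         f"job/{job_name}", f"--timeout={TIMEOUT_SECONDS}s"],
        # Fetch the log of the container that writes the backup.
        ["kubectl", "-n", NAMESPACE, "logs", f"job/{job_name}", "-c", CONTAINER],
    ]


def parse_backup_log(log_text):
    """Return the byte count from the first "Backup done" line, or None."""
    for line in log_text.splitlines():
        match = BACKUP_DONE.search(line)
        if match:
            return int(match.group(1))
    return None


def run_snapshot():
    """Run the three commands in order and return the reported backup size."""
    job_name = f"{CRONJOB}-manual-{int(time.time())}"  # hypothetical naming scheme
    output = ""
    for cmd in snapshot_commands(job_name):
        result = subprocess.run(cmd, check=True, capture_output=True, text=True)
        output = result.stdout  # after the loop this holds the container log
    size = parse_backup_log(output)
    if size is None:
        raise RuntimeError(f"no 'Backup done' line found in log of job/{job_name}")
    return size
```

Keeping the command list separate from the subprocess calls makes the
flow checkable without a cluster; only run_snapshot() touches kubectl.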

Test 2 (dry-run) surfaced 5 real cluster blockers; the agent loop works
end-to-end at the planning level.

Expanded the claude-agent ServiceAccount's privileges via a sibling
ClusterRole (claude-agent-upgrade-ops):
  - patch namespaces/k8s-upgrade (in-flight annotation)
  - create batch/jobs (trigger etcd snapshot Job)
  - patch nodes (cordon/uncordon)
  - create pods/eviction (drain)
  - delete pods (drain fallback)
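
The rules listed above could be expressed roughly as the manifest below,
built here as a Python dict and dumped as JSON (which kubectl apply also
accepts). This is a sketch, not the committed manifest: scoping the
namespace rule with resourceNames is an assumption read off
"namespaces/k8s-upgrade", and the binding to the claude-agent
ServiceAccount is omitted.

```python
import json


def upgrade_ops_cluster_role():
    """Build a ClusterRole manifest mirroring the five listed permissions."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRole",
        "metadata": {"name": "claude-agent-upgrade-ops"},
        "rules": [
            # patch namespaces/k8s-upgrade (in-flight annotation)
            # ASSUMPTION: restricted to that one namespace via resourceNames.
            {"apiGroups": [""], "resources": ["namespaces"],
             "resourceNames": ["k8s-upgrade"], "verbs": ["patch"]},
            # create batch/jobs (trigger etcd snapshot Job)
            {"apiGroups": ["batch"], "resources": ["jobs"], "verbs": ["create"]},
            # patch nodes (cordon/uncordon)
            {"apiGroups": [""], "resources": ["nodes"], "verbs": ["patch"]},
            # create pods/eviction (drain)
            {"apiGroups": [""], "resources": ["pods/eviction"], "verbs": ["create"]},
            # delete pods (drain fallback)
            {"apiGroups": [""], "resources": ["pods"], "verbs": ["delete"]},
        ],
    }


if __name__ == "__main__":
    # Emit JSON suitable for piping into `kubectl apply -f -`.
    print(json.dumps(upgrade_ops_cluster_role(), indent=2))
```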
2026-05-10 19:16:12 +00:00
issue-responder.md Add agent task tracking documentation 2026-04-15 17:11:26 +00:00
k8s-version-upgrade.md k8s-version-upgrade: trigger etcd snapshot via existing backup-etcd Job; broaden agent RBAC 2026-05-10 19:16:12 +00:00
payslip-extractor.md [payslip-ingest] Update extractor agent + dashboard for v2 regex parser 2026-04-19 10:54:33 +00:00
post-mortem.md feat: add incident management system with user reporting 2026-04-14 20:00:31 +00:00
postmortem-todo-resolver.md feat: post-mortem automation pipeline 2026-04-14 15:34:42 +00:00
service-upgrade.md [service-upgrade] Drop vault-CLI assumptions + check default workflow only 2026-04-19 13:15:06 +00:00
sev-historian.md feat: add incident management system with user reporting 2026-04-14 20:00:31 +00:00
sev-report-writer.md feat: add incident management system with user reporting 2026-04-14 20:00:31 +00:00
sev-triage.md feat: add incident management system with user reporting 2026-04-14 20:00:31 +00:00