infra/.claude/skills/extend-vm-storage.md
Viktor Barzin 08ea489fe0 [ci skip] Add extend-vm-storage script and skills
- Script to automate K8s node VM disk expansion (drain, shutdown, resize, boot, expand FS, uncordon)
- Skill docs for the workflow and troubleshooting pitfalls (growpart, macOS grep -P, drain timeouts)
- Successfully tested on k8s-node2, k8s-node3, k8s-node4 (64G → 128G)
2026-02-13 22:08:46 +00:00


Extend VM Storage Skill

Purpose: Extend disk storage on a Kubernetes node VM (Proxmox-hosted).

When to use: User wants to increase disk space on a k8s node VM, or a node is running low on disk.

Workflow

1. Identify the Node

Ask the user which node needs more storage and how much to add.

Valid nodes: k8s-master, k8s-node1, k8s-node2, k8s-node3, k8s-node4

2. Run the Script

./scripts/extend_vm_storage.sh <node-name> <size-increment>

Example:

./scripts/extend_vm_storage.sh k8s-node2 +64G
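The argument format above can be sanity-checked before anything touches the cluster. The following is a minimal sketch, not the code in extend_vm_storage.sh: the helper names `validate_node` and `validate_size` are hypothetical, and the accepted unit suffixes (G or T) are an assumption.

```shell
# Hypothetical helpers for illustration; not taken from extend_vm_storage.sh.
VALID_NODES="k8s-master k8s-node1 k8s-node2 k8s-node3 k8s-node4"

# Succeeds only if the argument is one of the known node names.
validate_node() {
  local candidate="$1" name
  for name in $VALID_NODES; do
    if [ "$name" = "$candidate" ]; then
      return 0
    fi
  done
  return 1
}

# Succeeds only for "+<positive integer><unit>", e.g. +64G.
# Assumption: only G and T are accepted as units.
# grep -E is used because grep -P is not available on macOS.
validate_size() {
  printf '%s\n' "$1" | grep -Eq '^\+[1-9][0-9]*[GT]$'
}
```

A caller could run both checks on its two arguments and exit with a usage message if either fails.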

3. What the Script Does

  1. Validates inputs (node name and size format)
  2. Resolves node IP via kubectl
  3. Prompts for confirmation
  4. Drains the node (evicts pods)
  5. Shuts down the VM in Proxmox
  6. Resizes the disk (scsi0) by the given increment
  7. Starts the VM and waits for SSH
  8. Expands the filesystem inside the guest (auto-detects LVM vs direct partition)
  9. Uncordons the node
  10. Shows verification output (df -h and node status)
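The steps above can be sketched as a dry run that prints the commands instead of executing them. This is an illustration under stated assumptions, not the script itself: the drain flags, the guest device /dev/sda, the partition numbers, an ext4 root filesystem, and the function names `root_layout`, `expand_cmds`, and `print_plan` are all hypothetical.

```shell
# Dry-run sketch: prints commands only, nothing is executed.
PROXMOX_HOST="root@192.168.1.127"
VM_USER="wizard"
DISK="scsi0"

# Classify the root filesystem source as printed by `findmnt -no SOURCE /`.
# Assumption: an LVM root shows up under /dev/mapper/.
root_layout() {
  case "$1" in
    /dev/mapper/*) echo lvm ;;
    *)             echo partition ;;
  esac
}

# Print the guest-side expansion commands for a given root source.
# Assumptions: disk is /dev/sda, LVM PV on partition 3, plain root on partition 1 (ext4).
expand_cmds() {
  case "$(root_layout "$1")" in
    lvm)
      echo "sudo growpart /dev/sda 3"
      echo "sudo pvresize /dev/sda3"
      echo "sudo lvextend -r -l +100%FREE $1"
      ;;
    partition)
      echo "sudo growpart /dev/sda 1"
      echo "sudo resize2fs $1"
      ;;
  esac
}

# print_plan NODE VMID NODE_IP INCREMENT ROOT_SOURCE
print_plan() {
  local node="$1" vmid="$2" ip="$3" incr="$4" root_src="$5"
  echo "kubectl drain $node --ignore-daemonsets --delete-emptydir-data"
  echo "ssh $PROXMOX_HOST qm shutdown $vmid --timeout 300"
  echo "ssh $PROXMOX_HOST qm resize $vmid $DISK $incr"
  echo "ssh $PROXMOX_HOST qm start $vmid"
  expand_cmds "$root_src" | sed "s|^|ssh $VM_USER@$ip |"
  echo "kubectl uncordon $node"
}
```

Printing the plan first makes it easy to review the VM ID and increment before confirming the real run.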

4. Update Terraform (if needed)

If Terraform manages the VM's disk size, update the VM definition in main.tf or modules/create-vm/ to the new total so that a future terraform apply does not revert the change. To check whether the disk size is managed by Terraform:

grep -A5 "disk" main.tf | grep -i size

If managed, update the size value to match the new total.
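The script takes an increment, while the Terraform value is the new total, so the two need to be added. A small sketch of that arithmetic, assuming both values are whole gigabytes with a G suffix (the helper name `new_total` is hypothetical):

```shell
# new_total CURRENT INCREMENT, e.g. new_total 64G +64G
# Assumption: both arguments are whole numbers with a "G" suffix.
new_total() {
  local current="${1%G}"      # "64G"  -> "64"
  local increment="${2#+}"    # "+64G" -> "64G"
  increment="${increment%G}"  # "64G"  -> "64"
  echo "$((current + increment))G"
}

new_total 64G +64G   # prints 128G
```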

5. Verification

After the script completes, verify:

kubectl --kubeconfig "$(pwd)/config" get nodes
ssh wizard@<node-ip> "df -h /"
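The node check can also be scripted. A node that is still cordoned shows a STATUS of Ready,SchedulingDisabled, so a plain match on "Ready" is not enough. The sketch below reads the output of kubectl get nodes on stdin; the helper name `node_ready` is hypothetical.

```shell
# Succeeds only if the named node is listed with STATUS exactly "Ready".
# A cordoned node ("Ready,SchedulingDisabled") or a missing node fails.
node_ready() {
  awk -v node="$1" '
    $1 == node { found = 1; ok = ($2 == "Ready") }
    END { exit !(found && ok) }
  '
}
```

Usage, assuming the same kubeconfig path as above: `kubectl --kubeconfig "$(pwd)/config" get nodes | node_ready k8s-node2 && echo "schedulable"`.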

Recovery

If the script fails mid-way:

  1. Check VM status: ssh root@192.168.1.127 "qm status <vmid>"
  2. Start VM if stopped: ssh root@192.168.1.127 "qm start <vmid>"
  3. Uncordon node: kubectl --kubeconfig "$(pwd)/config" uncordon <node-name>
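The recovery steps above can be sketched as a helper that turns the qm status output into the commands still needed. It only prints them; it does not run anything. The function name `recovery_plan` is hypothetical, and the sketch assumes qm status reports a line such as "status: stopped" or "status: running".

```shell
# recovery_plan NODE VMID STATUS_LINE
# STATUS_LINE is the output of: ssh root@192.168.1.127 "qm status <vmid>"
recovery_plan() {
  local node="$1" vmid="$2" status_line="$3"
  case "$status_line" in
    *stopped*)
      # VM is down: it must be started before the node can rejoin.
      echo "ssh root@192.168.1.127 qm start $vmid"
      ;;
  esac
  # The node stays cordoned after a failed run, so uncordon is always listed.
  echo "kubectl --kubeconfig ./config uncordon $node"
}
```

The printed uncordon command assumes it is run from the directory that holds the kubeconfig file named config.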

Constants

Setting            Value
Proxmox host       root@192.168.1.127
VM SSH user        wizard
Disk name          scsi0
Shutdown timeout   300s
SSH wait timeout   300s
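The two timeouts imply a poll-until-deadline loop. A generic sketch of such a loop follows; it is not the script's actual implementation, and the name `wait_for` and the polling interval are assumptions.

```shell
# wait_for TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Retries COMMAND until it succeeds or roughly TIMEOUT_SECONDS have passed.
wait_for() {
  local limit="$1" interval="$2" elapsed=0
  shift 2
  until "$@"; do
    if [ "$elapsed" -ge "$limit" ]; then
      return 1   # gave up: the command never succeeded within the limit
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  return 0
}
```

For the SSH wait this could be called as `wait_for 300 5 ssh -o BatchMode=yes -o ConnectTimeout=5 wizard@<node-ip> true`. The time spent inside each attempt is not counted, so the real wait can exceed the nominal limit.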

Questions to Ask User

  1. Which node needs more storage?
  2. How much storage to add? (e.g., +64G)