- Script to automate K8s node VM disk expansion (drain, shutdown, resize, boot, expand FS, uncordon)
- Skill docs for the workflow and troubleshooting pitfalls (growpart, macOS grep -P, drain timeouts)
- Successfully tested on k8s-node2, k8s-node3, and k8s-node4 (64G → 128G)
Extend VM Storage Skill
Purpose: Extend disk storage on a Kubernetes node VM (Proxmox-hosted).
When to use: User wants to increase disk space on a k8s node VM, or a node is running low on disk.
Workflow
1. Identify the Node
Ask the user which node needs more storage and how much to add.
Valid nodes: k8s-master, k8s-node1, k8s-node2, k8s-node3, k8s-node4
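The node-name check can be read as an exact match against this list. Below is a minimal bash sketch that assumes the list is hard-coded. `VALID_NODES` and `is_valid_node` are hypothetical names, not taken from `extend_vm_storage.sh`.

```bash
#!/usr/bin/env bash
# Sketch: accept only the node names listed in this skill.
# VALID_NODES and is_valid_node are hypothetical, not from the real script.
VALID_NODES=(k8s-master k8s-node1 k8s-node2 k8s-node3 k8s-node4)

is_valid_node() {
  local candidate="$1" node
  for node in "${VALID_NODES[@]}"; do
    # Quoted right-hand side means a literal, exact string match,
    # so "k8s-node" or "k8s-node22" are rejected.
    if [[ "$candidate" == "$node" ]]; then
      return 0
    fi
  done
  return 1
}

# Usage: stop on a typo before touching the cluster.
# is_valid_node "$1" || { echo "unknown node: $1" >&2; exit 1; }
```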
2. Run the Script
```bash
./scripts/extend_vm_storage.sh <node-name> <size-increment>
```
Example:
```bash
./scripts/extend_vm_storage.sh k8s-node2 +64G
```
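Here is a sketch of the size-format check. It assumes the increment must be a `+` followed by whole digits and one of the unit letters that `qm resize` accepts (K, M, G, T). It uses bash's built-in `=~` operator instead of `grep -P`, because the BSD `grep` shipped with macOS has no `-P` option. `is_valid_increment` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: validate the size increment before calling qm resize.
# Assumed format: leading "+", whole digits, one unit letter, e.g. +64G.
# bash's =~ uses POSIX ERE, so this behaves the same on Linux and macOS.
is_valid_increment() {
  local size="$1"
  local re='^[+][0-9]+[KMGT]$'
  [[ "$size" =~ $re ]]
}

# Usage:
# is_valid_increment "$2" || { echo "bad size, expected e.g. +64G" >&2; exit 1; }
```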
3. What the Script Does
- Validates inputs (node name and size format)
- Resolves node IP via kubectl
- Prompts for confirmation
- Drains the node (evicts pods)
- Shuts down the VM in Proxmox
- Resizes the disk (`scsi0`) by the given increment
- Starts the VM and waits for SSH
- Expands the filesystem inside the guest (auto-detects LVM vs. direct partition)
- Uncordons the node
- Shows verification output (`df -h` and node status)
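The sequence above can be previewed as a dry run that only prints commands. This is a sketch under several assumptions that do not come from `extend_vm_storage.sh`: the VMID and node IP are passed in by hand, the `kubectl drain` flags are illustrative choices, a direct root partition is ext4 with `sd*` naming, and for LVM the caller supplies the partition that backs the physical volume. `growpart` comes from the `cloud-guest-utils` package on Debian/Ubuntu. `print_plan` and `fs_expand_cmds` are hypothetical names.

```bash
#!/usr/bin/env bash
# Dry-run sketch: prints commands, executes nothing.
# Assumed, not from the real script: drain flags, ext4 on sd* partitions,
# and the LVM PV partition being supplied by the caller.
PVE_HOST="root@192.168.1.127"
SSH_USER="wizard"
DISK="scsi0"
TIMEOUT=300

print_plan() {
  local node="$1" vmid="$2" ip="$3" size="$4"
  # Single quotes keep $(pwd) literal in the printed text, for copy-paste.
  local kube='kubectl --kubeconfig $(pwd)/config'
  echo "$kube drain $node --ignore-daemonsets --delete-emptydir-data --timeout=${TIMEOUT}s"
  echo "ssh $PVE_HOST \"qm shutdown $vmid --timeout $TIMEOUT\""
  echo "ssh $PVE_HOST \"qm resize $vmid $DISK $size\""
  echo "ssh $PVE_HOST \"qm start $vmid\""
  echo "# wait up to ${TIMEOUT}s for: ssh $SSH_USER@$ip true"
  echo "$kube uncordon $node"
}

# Print in-guest commands that grow the root filesystem.
#   $1: root source, e.g. output of `findmnt -n -o SOURCE /`
#   $2: (LVM only) partition backing the physical volume, e.g. /dev/sda3
fs_expand_cmds() {
  local src="$1" pv="${2:-}"
  if [[ "$src" == /dev/mapper/* ]]; then
    [ -n "$pv" ] || { echo "LVM root needs the PV partition" >&2; return 1; }
    # LVM: grow the partition, then the PV, then the LV plus its filesystem.
    echo "growpart ${pv%%[0-9]*} ${pv##*[!0-9]}"
    echo "pvresize $pv"
    echo "lvextend -r -l +100%FREE $src"
  else
    # Direct partition: split /dev/sda1 into disk /dev/sda and number 1.
    echo "growpart ${src%%[0-9]*} ${src##*[!0-9]}"
    echo "resize2fs $src"
  fi
}

# Usage (VMID 102 and the IP are placeholders):
# print_plan k8s-node2 102 <node-ip> +64G
# fs_expand_cmds /dev/sda1
```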
4. Update Terraform (if needed)
If you want Terraform to reflect the new disk size, update the VM definition in `main.tf` or `modules/create-vm/` so that a future `terraform apply` doesn't try to revert the change. First check whether the VM disk size is managed by Terraform:
```bash
grep -rn -A5 "disk" main.tf modules/create-vm/ | grep -i size
```
If it is managed, set the size value to the new total disk size (for example, 128G after adding +64G to a 64G disk), not the increment.
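Terraform needs the new total, not the increment. Here is a small bash sketch of that arithmetic. It assumes both values are whole gigabytes written as `64G` and `+64G`. `new_total` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: compute the new total disk size to write into Terraform.
# Assumes whole gigabytes only: current "64G", increment "+64G".
new_total() {
  local current="$1" increment="$2"
  local cur_re='^([0-9]+)G$' inc_re='^[+]([0-9]+)G$'
  local cur inc
  [[ "$current" =~ $cur_re ]] || return 1
  cur="${BASH_REMATCH[1]}"
  [[ "$increment" =~ $inc_re ]] || return 1
  inc="${BASH_REMATCH[1]}"
  # 10# forces base 10 so values like "08" are not parsed as octal.
  echo "$(( 10#$cur + 10#$inc ))G"
}

# Usage: new_total 64G +64G   # prints 128G
```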
5. Verification
After the script completes, verify:
```bash
kubectl --kubeconfig $(pwd)/config get nodes
ssh wizard@<node-ip> "df -h /"
```
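The verification output can also be checked mechanically. This sketch assumes the default column layout of `kubectl get nodes` (STATUS is the second column) and of `df -h` (Size is the second column). A node that is still cordoned reports `Ready,SchedulingDisabled`, so an exact match on `Ready` also confirms that the uncordon step ran. Both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Reads `kubectl get nodes` output on stdin. Succeeds only if the node's
# STATUS is exactly "Ready" (a cordoned node shows Ready,SchedulingDisabled).
node_is_ready() {
  local node="$1"
  [ "$(awk -v n="$node" '$1 == n { print $2 }')" = "Ready" ]
}

# Reads `df -h /` output on stdin and prints the Size of the data row.
root_size() {
  awk 'NR == 2 { print $2 }'
}

# Usage:
# kubectl --kubeconfig $(pwd)/config get nodes | node_is_ready k8s-node2 && echo ready
# ssh wizard@<node-ip> "df -h /" | root_size
```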
Recovery
If the script fails midway:
- Check the VM status (run `qm list` on the Proxmox host to find the `<vmid>`): `ssh root@192.168.1.127 "qm status <vmid>"`
- Start the VM if it is stopped: `ssh root@192.168.1.127 "qm start <vmid>"`
- Uncordon the node: `kubectl --kubeconfig $(pwd)/config uncordon <node-name>`
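The recovery steps reduce to two actions: start the VM only if `qm status` reports it stopped, then uncordon the node. This sketch prints the commands to run and assumes `qm status` output of the form `status: stopped`. `recovery_steps` is hypothetical, and the VMID in the usage line is a placeholder.

```bash
#!/usr/bin/env bash
# Sketch: print recovery commands based on captured `qm status` output.
PVE_HOST="root@192.168.1.127"

recovery_steps() {
  local vmid="$1" node="$2" qm_status="$3"
  # Strip the "status: " prefix, leaving e.g. "running" or "stopped".
  local state="${qm_status#status: }"
  if [ "$state" = "stopped" ]; then
    echo "ssh $PVE_HOST \"qm start $vmid\""
  fi
  # Uncordoning an already schedulable node is harmless, so always print it.
  echo "kubectl --kubeconfig \$(pwd)/config uncordon $node"
}

# Usage (102 is a placeholder VMID):
# recovery_steps 102 k8s-node2 "$(ssh root@192.168.1.127 'qm status 102')"
```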
Constants
| Setting | Value |
|---|---|
| Proxmox host | root@192.168.1.127 |
| VM SSH user | wizard |
| Disk name | scsi0 |
| Shutdown timeout | 300s |
| SSH wait timeout | 300s |
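Both 300s timeouts amount to polling until a condition holds or time runs out. Here is a generic bash sketch that uses the shell's `SECONDS` counter. `wait_until` is a hypothetical helper. The `ssh` options in the usage line are standard OpenSSH options, not ones confirmed from the script.

```bash
#!/usr/bin/env bash
# Poll a command until it succeeds or the timeout (in seconds) elapses.
#   $1: timeout, $2: interval between attempts, rest: command to run
# Returns 0 on success, 1 on timeout.
wait_until() {
  local timeout="$1" interval="$2"
  shift 2
  local deadline=$(( SECONDS + timeout ))
  until "$@"; do
    if (( SECONDS >= deadline )); then
      return 1
    fi
    sleep "$interval"
  done
  return 0
}

# Usage: wait for SSH on the resized node (SSH wait timeout = 300s).
# wait_until 300 5 ssh -o BatchMode=yes -o ConnectTimeout=5 wizard@<node-ip> true
```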
Questions to Ask User
- Which node needs more storage?
- How much storage to add? (e.g., +64G)