From 08ea489fe0f6474a1b2a4298066945208a417991 Mon Sep 17 00:00:00 2001 From: Viktor Barzin Date: Fri, 13 Feb 2026 22:08:46 +0000 Subject: [PATCH] [ci skip] Add extend-vm-storage script and skills MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Script to automate K8s node VM disk expansion (drain, shutdown, resize, boot, expand FS, uncordon) - Skill docs for the workflow and troubleshooting pitfalls (growpart, macOS grep -P, drain timeouts) - Successfully tested on k8s-node2, k8s-node3, k8s-node4 (64G → 128G) --- .claude/CLAUDE.md | 6 + .claude/skills/extend-vm-storage.md | 77 ++++ .../SKILL.md | 136 +++++++ scripts/extend_vm_storage.sh | 372 ++++++++++++++++++ 4 files changed, 591 insertions(+) create mode 100644 .claude/skills/extend-vm-storage.md create mode 100644 .claude/skills/proxmox-vm-disk-expansion-pitfalls/SKILL.md create mode 100755 scripts/extend_vm_storage.sh diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md index 20cb1253..b54995c8 100755 --- a/.claude/CLAUDE.md +++ b/.claude/CLAUDE.md @@ -435,6 +435,12 @@ Skills are specialized workflows for common tasks. Located in `.claude/skills/`. - **When to use**: User provides GitHub URL or wants to deploy a new service - **Example**: "Deploy [GitHub repo] to the cluster" +**extend-vm-storage** (`.claude/skills/extend-vm-storage.md`) +- Extend disk storage on K8s node VMs (Proxmox-hosted) +- Automates: drain → shutdown → resize → boot → expand filesystem → uncordon +- **When to use**: A k8s node needs more disk space +- **Example**: "Extend storage on k8s-node2 by 64G" + --- ## Service-Specific Notes diff --git a/.claude/skills/extend-vm-storage.md b/.claude/skills/extend-vm-storage.md new file mode 100644 index 00000000..a994acbb --- /dev/null +++ b/.claude/skills/extend-vm-storage.md @@ -0,0 +1,77 @@ +# Extend VM Storage Skill + +**Purpose**: Extend disk storage on a Kubernetes node VM (Proxmox-hosted). 
**When to use**: User wants to increase disk space on a k8s node VM, or a node is running low on disk.

## Workflow

### 1. Identify the Node

Ask the user which node needs more storage and how much to add.

Valid nodes: `k8s-master`, `k8s-node1`, `k8s-node2`, `k8s-node3`, `k8s-node4`

### 2. Run the Script

```bash
./scripts/extend_vm_storage.sh <node-name> <size-increment>
```

**Example**:
```bash
./scripts/extend_vm_storage.sh k8s-node2 +64G
```

### 3. What the Script Does

1. Validates inputs (node name and size format)
2. Resolves the node IP via kubectl
3. Prompts for confirmation
4. Drains the node (evicts pods)
5. Shuts down the VM in Proxmox
6. Resizes the disk (`scsi0`) by the given increment
7. Starts the VM and waits for SSH
8. Expands the filesystem inside the guest (auto-detects LVM vs direct partition)
9. Uncordons the node
10. Shows verification output (`df -h` and node status)

### 4. Update Terraform (if needed)

If you want Terraform to reflect the new disk size, update the VM definition in `main.tf` or `modules/create-vm/` so that a future `terraform apply` doesn't revert the change. Check whether the VM disk size is managed by Terraform:

```bash
grep -A5 "disk" main.tf | grep -i size
```

If it is managed, update the size value to match the new total.

### 5. Verification

After the script completes, verify:
```bash
kubectl --kubeconfig $(pwd)/config get nodes
ssh wizard@<node-ip> "df -h /"
```

## Recovery

If the script fails mid-way:
1. Check VM status: `ssh root@192.168.1.127 "qm status <vmid>"`
2. Start VM if stopped: `ssh root@192.168.1.127 "qm start <vmid>"`
3. Uncordon node: `kubectl --kubeconfig $(pwd)/config uncordon <node-name>`

## Constants

| Setting | Value |
|---------|-------|
| Proxmox host | `root@192.168.1.127` |
| VM SSH user | `wizard` |
| Disk name | `scsi0` |
| Shutdown timeout | 300s |
| SSH wait timeout | 300s |

## Questions to Ask User

1. Which node needs more storage?
2. How much storage to add?
(e.g., +64G) diff --git a/.claude/skills/proxmox-vm-disk-expansion-pitfalls/SKILL.md b/.claude/skills/proxmox-vm-disk-expansion-pitfalls/SKILL.md new file mode 100644 index 00000000..89f89f22 --- /dev/null +++ b/.claude/skills/proxmox-vm-disk-expansion-pitfalls/SKILL.md @@ -0,0 +1,136 @@ +--- +name: proxmox-vm-disk-expansion-pitfalls +description: | + Troubleshoot common failures when expanding Proxmox VM disks on Ubuntu 24.04 + cloud-init images and draining Kubernetes nodes. Use when: (1) growpart fails + with "command not found" on Ubuntu cloud-init VMs, (2) grep -P fails on macOS + with "invalid option -- P", (3) kubectl drain times out with pods stuck + terminating, (4) filesystem shows old size after qm resize. Covers + cloud-guest-utils installation, macOS-portable regex parsing, drain timeout + tuning, and recovery from partial failures. +author: Claude Code +version: 1.0.0 +date: 2026-02-13 +--- + +# Proxmox VM Disk Expansion Pitfalls + +## Problem + +Expanding disk storage on Proxmox-hosted Ubuntu 24.04 cloud-init VMs (used as +Kubernetes nodes) fails at multiple points due to missing tools, cross-platform +incompatibilities, and Kubernetes drain timeouts. + +## Context / Trigger Conditions + +- Running disk expansion scripts from macOS against Proxmox + Ubuntu VMs +- Ubuntu 24.04 cloud-init images (the default k8s node template) +- Kubernetes nodes with many pods or stateful workloads +- Using `scripts/extend_vm_storage.sh` or similar automation + +## Issues and Solutions + +### 1. `growpart: command not found` on Ubuntu 24.04 + +**Symptom**: After `qm resize`, SSH into VM, run `growpart /dev/sda 1` — fails +with "command not found". `resize2fs` then reports "Nothing to do!" because the +partition table hasn't been updated. + +**Root cause**: Ubuntu 24.04 cloud-init images don't include `cloud-guest-utils` +by default. The `growpart` tool (which updates the partition table to use new +disk space) is in this package. 
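The presence check implied by this root cause is easy to script around. The helper below is an illustrative sketch (the `ensure_tool` name and its messages are not from the actual script); it only reports what it would install:

```bash
# Illustrative guard: report whether a tool is on PATH and, if not, which
# package would provide it. Echoes instead of installing anything.
ensure_tool() {
    local tool="$1" pkg="$2"
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: present"
    else
        echo "$tool: missing (install $pkg)"
    fi
}

ensure_tool sed sed                      # sed ships with every base system
ensure_tool growpart cloud-guest-utils   # absent on stock Ubuntu 24.04 cloud images
```

On an unpatched cloud-init node the second call reports `growpart` as missing, matching the root cause above.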
**Fix**:
```bash
sudo apt-get update -qq && sudo apt-get install -y -qq cloud-guest-utils
sudo growpart /dev/sda 1
sudo resize2fs /dev/sda1
```

**Prevention**: Check for `growpart` before attempting partition expansion:
```bash
if ! command -v growpart &>/dev/null; then
    sudo apt-get update -qq && sudo apt-get install -y -qq cloud-guest-utils
fi
```

### 2. `grep -P` (PCRE) not available on macOS

**Symptom**: A script running on macOS fails with `grep: invalid option -- P`.

**Root cause**: macOS ships BSD grep, which doesn't support `-P` (Perl-compatible
regex). GNU grep (from Homebrew) does, but scripts shouldn't assume it's installed.

**Fix**: Replace `grep -oP 'pattern\Kcapture'` with portable `sed`:
```bash
# BAD (GNU grep only):
CURRENT_SIZE=$(echo "$LINE" | grep -oP 'size=\K[0-9]+G')

# GOOD (portable):
CURRENT_SIZE=$(echo "$LINE" | sed -n 's/.*size=\([0-9]*G\).*/\1/p')
```

**General rule**: In scripts that run on macOS, avoid `grep -P`, `sed -i ''`
vs `sed -i` differences, and `date` flag differences. Use `sed` with basic
regex or bash's built-in `[[ =~ ]]` for pattern matching.

### 3. `kubectl drain` timeout with stuck pods

**Symptom**: `kubectl drain --timeout=120s` fails with "context deadline exceeded"
for multiple pods. Pods are evicted but don't terminate in time.

**Root cause**: Some pods (stateful services like ClickHouse, Paperless-ngx,
OnlyOffice) need more time to shut down gracefully. 120s isn't enough when many
pods are draining simultaneously.

**Fix**: Use the `--force` flag and a longer timeout, or retry:
```bash
# First attempt with standard timeout
kubectl drain <node> --ignore-daemonsets --delete-emptydir-data --timeout=120s

# If it fails, force with longer timeout (pods already evicting)
kubectl drain <node> --ignore-daemonsets --delete-emptydir-data --timeout=300s --force
```

**Note**: After a failed drain, the node is already cordoned.
A second drain attempt only needs to wait for already-evicting pods to finish.

### 4. Recovery from partial failure

If the script fails mid-way (after drain but before uncordon):

```bash
# Check VM status
ssh root@192.168.1.127 "qm status <vmid>"

# Start VM if stopped
ssh root@192.168.1.127 "qm start <vmid>"

# Uncordon node
kubectl --kubeconfig $(pwd)/config uncordon <node-name>
```

## Verification

After successful expansion:
```bash
# On the VM
df -h /
# Should show the new size (128G disk → ~126G usable for ext4)

# On the cluster
kubectl get node <node-name>
# Should show Ready status
```

## Notes

- The k8s node VMs use a direct partition layout (`/dev/sda1`), not LVM, despite
  the script handling both paths
- `growpart` returns exit code 1 for "NOCHANGE" (partition already at max) —
  this is not an error
- Proxmox `qm resize` uses `scsi0` as the disk identifier for these VMs
- SSH host keys may change if VMs are recreated or the network changes — use
  `-o StrictHostKeyChecking=no` in automated scripts

See also: `extend-vm-storage.md` (the operational skill for running the script)

diff --git a/scripts/extend_vm_storage.sh b/scripts/extend_vm_storage.sh
new file mode 100755
index 00000000..4cc9b609
--- /dev/null
+++ b/scripts/extend_vm_storage.sh
@@ -0,0 +1,372 @@
#!/usr/bin/env bash

# Extend disk storage on a Kubernetes node VM.
# Drains the node, shuts down the VM, resizes the disk in Proxmox,
# boots the VM, expands the filesystem, and uncordons the node.
+# +# Usage: ./scripts/extend_vm_storage.sh +# Example: ./scripts/extend_vm_storage.sh k8s-node2 +64G + +# --- Constants --- +PROXMOX_HOST="root@192.168.1.127" +VM_SSH_USER="wizard" +KUBECTL="kubectl --kubeconfig $(pwd)/config" +SHUTDOWN_TIMEOUT=300 +SSH_WAIT_TIMEOUT=300 +POLL_INTERVAL=5 + +# --- Colors --- +RED='\033[0;31m' +GREEN='\033[0;32m' +YELLOW='\033[0;33m' +BLUE='\033[0;34m' +NC='\033[0m' + +info() { echo -e "${BLUE}[INFO]${NC} $*"; } +ok() { echo -e "${GREEN}[OK]${NC} $*"; } +warn() { echo -e "${YELLOW}[WARN]${NC} $*"; } +error() { echo -e "${RED}[ERROR]${NC} $*"; } + +# --- Node-to-VMID mapping --- +declare -A NODE_VMID=( + [k8s-master]=200 + [k8s-node1]=201 + [k8s-node2]=202 + [k8s-node3]=203 + [k8s-node4]=204 +) + +# --- Cleanup trap --- +DRAINED_NODE="" +cleanup() { + if [[ -n "$DRAINED_NODE" ]]; then + echo "" + error "Script exited unexpectedly!" + warn "The node '$DRAINED_NODE' may still be cordoned/drained." + warn "Recovery steps:" + warn " 1. Check VM status: ssh $PROXMOX_HOST 'qm status ${NODE_VMID[$DRAINED_NODE]}'" + warn " 2. Start VM if stopped: ssh $PROXMOX_HOST 'qm start ${NODE_VMID[$DRAINED_NODE]}'" + warn " 3. Uncordon node: $KUBECTL uncordon $DRAINED_NODE" + fi +} +trap cleanup EXIT + +# --- Input validation --- +usage() { + echo "Usage: $0 " + echo "" + echo "Arguments:" + echo " node-name One of: ${!NODE_VMID[*]}" + echo " size-increment Disk size increase, e.g. +64G, +128G" + echo "" + echo "Example:" + echo " $0 k8s-node2 +64G" + exit 1 +} + +if [[ $# -ne 2 ]]; then + usage +fi + +NODE_NAME="$1" +SIZE_INCREMENT="$2" + +if [[ -z "${NODE_VMID[$NODE_NAME]+x}" ]]; then + error "Unknown node: '$NODE_NAME'" + echo "Valid nodes: ${!NODE_VMID[*]}" + exit 1 +fi + +if [[ ! "$SIZE_INCREMENT" =~ ^\+[0-9]+G$ ]]; then + error "Invalid size increment: '$SIZE_INCREMENT'" + echo "Must match pattern +G, e.g. +64G" + exit 1 +fi + +VMID="${NODE_VMID[$NODE_NAME]}" + +# --- Resolve node IP via kubectl --- +info "Resolving IP for node '$NODE_NAME'..." 
+NODE_IP=$($KUBECTL get node "$NODE_NAME" -o jsonpath='{.status.addresses[?(@.type=="InternalIP")].address}' 2>/dev/null) +if [[ -z "$NODE_IP" ]]; then + error "Could not resolve IP for node '$NODE_NAME'. Is the cluster reachable?" + exit 1 +fi +ok "Node IP: $NODE_IP" + +# --- Query current disk size --- +info "Querying current disk size for VM $VMID..." +SCSI0_LINE=$(ssh "$PROXMOX_HOST" "qm config $VMID" 2>/dev/null | grep '^scsi0:') +if [[ -z "$SCSI0_LINE" ]]; then + error "Could not read scsi0 config for VM $VMID." + exit 1 +fi +# Extract size value, e.g. "size=64G" from the config line +CURRENT_SIZE=$(echo "$SCSI0_LINE" | sed -n 's/.*size=\([0-9]*G\).*/\1/p') +if [[ -z "$CURRENT_SIZE" ]]; then + error "Could not parse current disk size from: $SCSI0_LINE" + exit 1 +fi +CURRENT_SIZE_NUM=${CURRENT_SIZE%G} +INCREMENT_NUM=${SIZE_INCREMENT//[+G]/} +NEW_SIZE_NUM=$((CURRENT_SIZE_NUM + INCREMENT_NUM)) +ok "Current disk size: ${CURRENT_SIZE_NUM}G → New size: ${NEW_SIZE_NUM}G (${SIZE_INCREMENT})" + +if [[ $NEW_SIZE_NUM -le $CURRENT_SIZE_NUM ]]; then + error "New size (${NEW_SIZE_NUM}G) must be greater than current size (${CURRENT_SIZE_NUM}G)." + exit 1 +fi + +# --- Confirmation --- +echo "" +echo "=========================================" +echo " Extend VM Storage" +echo "=========================================" +echo " Node: $NODE_NAME" +echo " VMID: $VMID" +echo " Node IP: $NODE_IP" +echo " Current: ${CURRENT_SIZE_NUM}G" +echo " Increment: $SIZE_INCREMENT" +echo " New size: ${NEW_SIZE_NUM}G" +echo " Proxmox: $PROXMOX_HOST" +echo "=========================================" +echo "" +echo "This will:" +echo " 1. Drain the node (evict pods)" +echo " 2. Shut down the VM" +echo " 3. Resize disk (scsi0) from ${CURRENT_SIZE_NUM}G to ${NEW_SIZE_NUM}G" +echo " 4. Start the VM" +echo " 5. Expand the filesystem inside the guest" +echo " 6. Uncordon the node" +echo "" +read -rp "Proceed? [y/N] " confirm +if [[ ! "$confirm" =~ ^[yY]$ ]]; then + echo "Aborted." 
+ exit 0 +fi + +# --- Step 1: Drain node --- +info "Step 1/7: Draining node '$NODE_NAME'..." +DRAINED_NODE="$NODE_NAME" +if ! $KUBECTL drain "$NODE_NAME" --ignore-daemonsets --delete-emptydir-data --timeout=120s; then + error "Failed to drain node '$NODE_NAME'." + exit 1 +fi +ok "Node drained." + +# --- Step 2: Shutdown VM --- +info "Step 2/7: Shutting down VM $VMID..." +if ! ssh "$PROXMOX_HOST" "qm shutdown $VMID"; then + error "Failed to send shutdown command to VM $VMID." + exit 1 +fi + +info "Waiting for VM to stop (timeout: ${SHUTDOWN_TIMEOUT}s)..." +elapsed=0 +while true; do + status=$(ssh "$PROXMOX_HOST" "qm status $VMID" 2>/dev/null) + if [[ "$status" == *"stopped"* ]]; then + break + fi + if [[ $elapsed -ge $SHUTDOWN_TIMEOUT ]]; then + error "VM $VMID did not stop within ${SHUTDOWN_TIMEOUT}s. Current status: $status" + exit 1 + fi + sleep "$POLL_INTERVAL" + elapsed=$((elapsed + POLL_INTERVAL)) +done +ok "VM stopped." + +# --- Step 3: Resize disk --- +info "Step 3/7: Resizing disk scsi0 by $SIZE_INCREMENT..." +if ! ssh "$PROXMOX_HOST" "qm resize $VMID scsi0 $SIZE_INCREMENT"; then + error "Failed to resize disk on VM $VMID." + exit 1 +fi +ok "Disk resized." + +# --- Step 4: Start VM --- +info "Step 4/7: Starting VM $VMID..." +if ! ssh "$PROXMOX_HOST" "qm start $VMID"; then + error "Failed to start VM $VMID." + exit 1 +fi + +info "Waiting for SSH to become available at $NODE_IP (timeout: ${SSH_WAIT_TIMEOUT}s)..." +elapsed=0 +while true; do + if ssh -o ConnectTimeout=5 -o StrictHostKeyChecking=no "$VM_SSH_USER@$NODE_IP" "true" 2>/dev/null; then + break + fi + if [[ $elapsed -ge $SSH_WAIT_TIMEOUT ]]; then + error "SSH not reachable on $NODE_IP within ${SSH_WAIT_TIMEOUT}s." + exit 1 + fi + sleep "$POLL_INTERVAL" + elapsed=$((elapsed + POLL_INTERVAL)) +done +ok "VM is up and SSH is reachable." + +info "Waiting 10s for system stabilization..." +sleep 10 + +# --- Step 5: Expand filesystem --- +info "Step 5/7: Expanding filesystem inside the guest..." 
+ssh -o StrictHostKeyChecking=no "$VM_SSH_USER@$NODE_IP" 'bash -s' <<'REMOTE_SCRIPT' +set -o pipefail + +RED='\033[0;31m' +GREEN='\033[0;32m' +YELLOW='\033[0;33m' +BLUE='\033[0;34m' +NC='\033[0m' + +info() { echo -e "${BLUE}[INFO]${NC} $*"; } +ok() { echo -e "${GREEN}[OK]${NC} $*"; } +warn() { echo -e "${YELLOW}[WARN]${NC} $*"; } +error() { echo -e "${RED}[ERROR]${NC} $*"; } + +ROOT_DEV=$(findmnt -n -o SOURCE /) +ROOT_FSTYPE=$(findmnt -n -o FSTYPE /) +info "Root device: $ROOT_DEV" +info "Root filesystem: $ROOT_FSTYPE" + +# Ensure growpart is available +if ! command -v growpart &>/dev/null; then + info "Installing growpart (cloud-guest-utils)..." + sudo apt-get update -qq && sudo apt-get install -y -qq cloud-guest-utils +fi + +resize_fs() { + local dev="$1" + local fstype="$2" + if [[ "$fstype" == "ext4" || "$fstype" == "ext3" || "$fstype" == "ext2" ]]; then + info "Running resize2fs on $dev..." + if ! sudo resize2fs "$dev"; then + error "resize2fs failed on $dev" + return 1 + fi + elif [[ "$fstype" == "xfs" ]]; then + info "Running xfs_growfs on /..." + if ! sudo xfs_growfs /; then + error "xfs_growfs failed" + return 1 + fi + else + error "Unsupported filesystem type: $fstype" + return 1 + fi + return 0 +} + +# Check if root is on LVM (device-mapper) +if [[ "$ROOT_DEV" == /dev/mapper/* || "$ROOT_DEV" == /dev/dm-* ]]; then + info "LVM layout detected." + + # Find the PV device + PV_DEV=$(sudo pvs --noheadings -o pv_name | head -1 | tr -d ' ') + if [[ -z "$PV_DEV" ]]; then + error "Could not determine PV device." 
+ exit 1 + fi + info "PV device: $PV_DEV" + + # Parse disk and partition number (handles /dev/sdaX and /dev/nvmeXnXpX) + if [[ "$PV_DEV" =~ ^(/dev/nvme[0-9]+n[0-9]+)p([0-9]+)$ ]]; then + DISK="${BASH_REMATCH[1]}" + PARTNUM="${BASH_REMATCH[2]}" + elif [[ "$PV_DEV" =~ ^(/dev/[a-z]+)([0-9]+)$ ]]; then + DISK="${BASH_REMATCH[1]}" + PARTNUM="${BASH_REMATCH[2]}" + else + error "Could not parse disk/partition from PV: $PV_DEV" + exit 1 + fi + info "Disk: $DISK, Partition: $PARTNUM" + + # Grow partition + info "Growing partition $DISK partition $PARTNUM..." + sudo growpart "$DISK" "$PARTNUM" || echo "(growpart: partition may already be at max size)" + + # Resize PV + info "Resizing PV $PV_DEV..." + if ! sudo pvresize "$PV_DEV"; then + error "pvresize failed on $PV_DEV" + exit 1 + fi + + # Resolve LV path if using /dev/dm-* + if [[ "$ROOT_DEV" == /dev/dm-* ]]; then + LV_PATH=$(sudo lvs --noheadings -o lv_path | head -1 | tr -d ' ') + else + LV_PATH="$ROOT_DEV" + fi + info "LV path: $LV_PATH" + + # Extend LV + info "Extending LV $LV_PATH to use all free space..." + if ! sudo lvextend -l +100%FREE "$LV_PATH"; then + warn "lvextend reported no change (LV may already use all space)." + fi + + # Resize filesystem + resize_fs "$LV_PATH" "$ROOT_FSTYPE" + if [[ $? -ne 0 ]]; then + exit 1 + fi +else + info "Direct partition layout detected." + + # Parse disk and partition number + if [[ "$ROOT_DEV" =~ ^(/dev/nvme[0-9]+n[0-9]+)p([0-9]+)$ ]]; then + DISK="${BASH_REMATCH[1]}" + PARTNUM="${BASH_REMATCH[2]}" + elif [[ "$ROOT_DEV" =~ ^(/dev/[a-z]+)([0-9]+)$ ]]; then + DISK="${BASH_REMATCH[1]}" + PARTNUM="${BASH_REMATCH[2]}" + else + error "Could not parse disk/partition from: $ROOT_DEV" + exit 1 + fi + info "Disk: $DISK, Partition: $PARTNUM" + + # Grow partition + info "Growing partition $DISK partition $PARTNUM..." + sudo growpart "$DISK" "$PARTNUM" || echo "(growpart: partition may already be at max size)" + + # Resize filesystem + resize_fs "$ROOT_DEV" "$ROOT_FSTYPE" + if [[ $? 
-ne 0 ]]; then + exit 1 + fi +fi + +ok "Filesystem expansion complete." +df -h / +REMOTE_SCRIPT + +if [[ $? -ne 0 ]]; then + error "Filesystem expansion failed on the guest." + exit 1 +fi +ok "Filesystem expanded." + +# --- Step 6: Uncordon node --- +info "Step 6/7: Uncordoning node '$NODE_NAME'..." +if ! $KUBECTL uncordon "$NODE_NAME"; then + error "Failed to uncordon node '$NODE_NAME'." + exit 1 +fi +DRAINED_NODE="" +ok "Node uncordoned." + +# --- Step 7: Verify --- +info "Step 7/7: Verification" +echo "" +info "Disk usage on $NODE_NAME:" +ssh -o StrictHostKeyChecking=no "$VM_SSH_USER@$NODE_IP" "df -h /" +echo "" +info "Node status:" +$KUBECTL get node "$NODE_NAME" +echo "" +ok "Storage extension complete for $NODE_NAME."
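The portable size parsing and arithmetic the script relies on can be sanity-checked in isolation. A self-contained sketch, mirroring the script's logic (the sample `scsi0` config line below is illustrative, not taken from a real VM):

```bash
# Parse "size=<N>G" from a qm config scsi0 line with portable sed (no grep -P),
# then compute the new total for a "+<N>G" increment.
SCSI0_LINE='scsi0: local-lvm:vm-202-disk-0,iothread=1,size=64G'
SIZE_INCREMENT='+64G'

CURRENT_SIZE=$(echo "$SCSI0_LINE" | sed -n 's/.*size=\([0-9]*G\).*/\1/p')
CURRENT_SIZE_NUM=${CURRENT_SIZE%G}          # strip trailing G -> 64
INCREMENT_NUM=${SIZE_INCREMENT//[+G]/}      # strip + and G   -> 64
NEW_SIZE_NUM=$((CURRENT_SIZE_NUM + INCREMENT_NUM))

echo "${CURRENT_SIZE_NUM}G -> ${NEW_SIZE_NUM}G"   # prints: 64G -> 128G
```

Because the parsing uses basic-regex `sed`, the same snippet behaves identically under BSD and GNU userlands, which is the point of pitfall 2 above.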