infra/modules/create-vm/main.tf
Viktor Barzin 445feb118f infra: per-VM I/O caps + terragrunt v0.77 plumbing + state recovery
WHAT LANDED:
- terragrunt.hcl (root): added telmate/proxmox to the k8s_providers
  required_providers. Stacks that don't use it never instantiate a
  provider block, so this is harmless. Replaces the same-name
  override trick the infra stack relied on, which stopped working
  under Terragrunt v0.77 ("Detected generate blocks with the same
  name").
- stacks/infra/terragrunt.hcl: new generate "proxmox_provider" block
  writes proxmox_provider.tf with the provider config; credentials
  read from Vault secret/viktor at plan/apply time (no env vars).
- modules/create-vm: new mbps_rd / mbps_wr number variables (MB/s;
  default 0 = uncapped), wired into the scsi0/scsi1 disk{} blocks as
  mbps_r_concurrent / mbps_wr_concurrent. lifecycle.ignore_changes is
  extended to scsi6..scsi29 (K8s nodes have many CSI-managed slots),
  plus scsihw and qemu_os (both vary per VM, and changing them live
  is non-trivial).
- stacks/infra/main.tf: docker-registry-vm gains mbps_rd=40,
  mbps_wr=40 in HCL; these caps were already applied live via qm set
  on 2026-05-26.
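A minimal sketch of the call side, assuming the docker-registry-vm module block in stacks/infra/main.tf points at this module. Only the module label, vmid 220, the docker-registry name and the 40/40 caps come from this commit; the source path and bridge name are placeholders, and the real call likely sets further arguments (sizing, networking) that are omitted here. Caps are in MB/s, the unit Proxmox uses for qm set mbps_rd / mbps_wr.

```hcl
# Sketch: a caller opting into the new per-disk I/O caps.
# Hypothetical values: source path and bridge. Other arguments omitted.
module "docker-registry-vm" {
  source = "../../modules/create-vm" # assumed relative path

  vmid    = 220
  vm_name = "docker-registry"
  bridge  = "vmbr0" # hypothetical bridge name

  # Passed through to mbps_r_concurrent / mbps_wr_concurrent on the OS
  # disk (scsi0 unless disk_slot says otherwise). Omit both to keep the
  # default of 0, i.e. uncapped.
  mbps_rd = 40
  mbps_wr = 40
}
```

Because both variables default to 0, existing callers that don't set them keep their disks uncapped.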

WHAT FAILED AND WAS ROLLED BACK:
- Attempted import of 7 VMs (102 devvm, 103 home-assistant, 200
  k8s-master, 201 k8s-node1, 202 k8s-node2, 203 k8s-node3, 204
  k8s-node4) via import {} blocks. On apply, the telmate/proxmox
  v3.0.2-rc07 provider mangled the proxmox-csi PVC slots of vmid 202
  and 203: every scsi slot was rewritten from `vm-9999-pvc-<uuid>` to
  the boot disk `vm-<vmid>-disk-0`. Restored both .conf files from
  the 2026-05-24 nightly PVE config backup at
  /mnt/backup/pve-config/etc-pve/nodes/pve/qemu-server/{202,203}.conf
  (no reboots, no data loss; K8s CSI reconciled the PVC attachments
  within minutes). Removed the 7 imports from state via
  `terraform state rm` and re-encrypted the state with SOPS.
  Tracked in beads code-xzbl: blocked on the bpg/proxmox provider
  migration (telmate has the same dynamic-disk defect that bit us on
  iSCSI on 2026-04-02; see memory id=539).

LIVE CAPS STILL IN PLACE (qm set, 2026-05-26 ~03:13 UTC; read/write MB/s):
  102  devvm             60/60
  103  home-assistant    40/40
  200  k8s-master        100/60
  201  k8s-node1         150/120
  202  k8s-node2         150/120
  203  k8s-node3         150/120
  204  k8s-node4         150/120
  220  docker-registry   40/40
  Intentionally out of scope: pfSense 101 (BSD) and Windows10 300.
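If the code-xzbl follow-up eventually brings these VMs under Terraform, the live caps listed above could be mirrored in HCL so they stop living only in qm set. The sketch below is one hypothetical layout, assuming a locals map named vm_io_caps keyed by vmid and module labels named after each VM; neither exists in the repo today, and the source path and bridge are placeholders.

```hcl
# Hypothetical: the live qm set caps (MB/s) as data, keyed by vmid.
# pfSense (101) and Windows10 (300) stay out of scope, as above.
locals {
  vm_io_caps = {
    "102" = { rd = 60, wr = 60 }   # devvm
    "103" = { rd = 40, wr = 40 }   # home-assistant
    "200" = { rd = 100, wr = 60 }  # k8s-master
    "201" = { rd = 150, wr = 120 } # k8s-node1
    "202" = { rd = 150, wr = 120 } # k8s-node2
    "203" = { rd = 150, wr = 120 } # k8s-node3
    "204" = { rd = 150, wr = 120 } # k8s-node4
    "220" = { rd = 40, wr = 40 }   # docker-registry
  }
}

# Example consumer (label, source path and bridge are placeholders;
# sizing and other arguments omitted).
module "k8s-node1" {
  source  = "../../modules/create-vm"
  vmid    = 201
  vm_name = "k8s-node1"
  bridge  = "vmbr0"
  mbps_rd = local.vm_io_caps["201"].rd
  mbps_wr = local.vm_io_caps["201"].wr
}
```

Terraform map keys are always strings, so the vmids are quoted; keeping every cap in one map means the numbers are edited in a single place.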

PRE-EXISTING DRIFT EXPOSED (NOT NEW):
- HCL declares k8s-master (200) and k8s-node2 (202), but neither was
  ever imported into TF state. Confirmed against the SOPS-encrypted
  state in git (lineage e1cc5bb5, serial 42, last touched 2026-04-06).
  This commit leaves both declarations in place but does NOT import
  them; that is part of the code-xzbl follow-up.

Closes: code-s9xr
2026-05-26 06:46:47 +00:00


# ---------------------------------------------------------------------------
# Variables — Required
# ---------------------------------------------------------------------------
variable "vm_name" { type = string }
variable "vmid" {
  type    = number
  default = 0
}
variable "cisnippet_name" {
  type    = string
  default = ""
}
variable "bridge" { type = string }
# ---------------------------------------------------------------------------
# Variables — VM sizing
# ---------------------------------------------------------------------------
variable "vm_cpus" {
  type    = number
  default = 4
}
variable "cpu_sockets" {
  type    = number
  default = 1
}
variable "vm_mem_mb" {
  type    = number
  default = 8192
}
variable "vm_disk_size" {
  type    = string
  default = "64G"
}
variable "balloon" {
  type    = number
  default = 0 # 0 = disabled (recommended for k8s nodes)
}
# ---------------------------------------------------------------------------
# Variables — VM identity & networking
# ---------------------------------------------------------------------------
variable "vm_mac_address" {
  type    = string
  default = null
}
variable "vlan_tag" {
  type    = string
  default = null
}
variable "ipconfig0" {
  type    = string
  default = "ip=dhcp,ip6=dhcp"
}
# ---------------------------------------------------------------------------
# Variables — Boot & hardware
# ---------------------------------------------------------------------------
variable "template_name" {
  type    = string
  default = "" # empty = no clone (for importing existing VMs)
}
variable "scsihw" {
  type    = string
  default = "virtio-scsi-pci"
}
variable "boot" {
  type    = string
  default = "order=scsi0"
}
variable "boot_disk" {
  type    = string
  default = "" # e.g. "scsi0"; only set if boot = "c" (legacy)
}
variable "disk_slot" {
  type    = string
  default = "scsi0" # which SCSI slot the OS disk is on
}
variable "agent" {
  type    = number
  default = 1
}
variable "qemu_os" {
  type    = string
  default = "l26"
}
variable "numa" {
  type    = bool
  default = false
}
variable "machine" {
  type    = string
  default = "" # empty = provider default. Use "q35" for GPU passthrough
}
# ---------------------------------------------------------------------------
# Variables — Startup/shutdown ordering
# ---------------------------------------------------------------------------
variable "startup_order" {
  type    = number
  default = -1
}
variable "startup_delay" {
  type    = number
  default = -1
}
variable "shutdown_timeout" {
  type    = number
  default = -1
}
# ---------------------------------------------------------------------------
# Variables — Cloud-Init (optional — disable for non-cloud-init VMs)
# ---------------------------------------------------------------------------
variable "use_cloud_init" {
  type    = bool
  default = true
}
variable "ssh_keys" {
  type    = string
  default = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIDHLhYDfyx237eJgOGVoJRECpUS95+7rEBS9vacsIxtx devvm"
}
# ---------------------------------------------------------------------------
# Variables — GPU / PCI passthrough
# ---------------------------------------------------------------------------
variable "hostpci0" {
  type    = string
  default = "" # e.g., "0000:06:00.0" for Tesla T4 passthrough
}
# ---------------------------------------------------------------------------
# Variables — Disk I/O throttling (MB/s; 0 = uncapped)
# ---------------------------------------------------------------------------
# Caps any single VM's share of the underlying disk so a runaway workload
# (e.g. the 2026-05-23/26 alloy IO storm — memory id=2726) cannot wedge the
# whole Proxmox host's sdc thin pool. Values inferred from PVE RRD p99/max
# observed in /nodes/pve/qemu/<vmid>/rrddata.
variable "mbps_rd" {
  type    = number
  default = 0
}
variable "mbps_wr" {
  type    = number
  default = 0
}
# ---------------------------------------------------------------------------
# Resource
# ---------------------------------------------------------------------------
resource "proxmox_vm_qemu" "cloudinit-vm" {
  vmid             = var.vmid
  name             = var.vm_name
  target_node      = "pve"
  agent            = var.agent
  memory           = var.vm_mem_mb
  balloon          = var.balloon
  boot             = var.boot
  bootdisk         = var.boot_disk != "" ? var.boot_disk : null
  clone            = var.template_name != "" ? var.template_name : null
  full_clone       = var.template_name != "" ? true : false
  scsihw           = var.scsihw
  vm_state         = "running"
  automatic_reboot = false # never let Terraform reboot VMs; use the /reboot-server skill instead
  os_type          = var.use_cloud_init ? "cloud-init" : null
  machine          = var.machine != "" ? var.machine : null
  # Cloud-Init configuration (only when use_cloud_init = true)
  cicustom           = var.use_cloud_init && var.cisnippet_name != "" ? "vendor=local:snippets/${var.cisnippet_name}" : null
  ciupgrade          = var.use_cloud_init ? true : null
  nameserver         = var.use_cloud_init ? "1.1.1.1 8.8.8.8" : null
  ipconfig0          = var.use_cloud_init ? var.ipconfig0 : null
  skip_ipv6          = var.use_cloud_init ? true : null
  ciuser             = var.use_cloud_init ? "root" : null
  cipassword         = var.use_cloud_init ? "root" : null
  sshkeys            = var.use_cloud_init ? var.ssh_keys : null
  searchdomain       = var.use_cloud_init ? "viktorbarzin.lan" : null
  start_at_node_boot = true
  qemu_os            = var.qemu_os
  cpu {
    cores   = var.vm_cpus
    sockets = var.cpu_sockets
    type    = "host"
  }
  startup_shutdown {
    order            = var.startup_order
    shutdown_timeout = var.shutdown_timeout
    startup_delay    = var.startup_delay
  }
  serial {
    id = 0
  }
  disks {
    scsi {
      dynamic "scsi0" {
        for_each = var.disk_slot == "scsi0" ? [1] : []
        content {
          disk {
            storage            = "local-lvm"
            size               = var.vm_disk_size
            discard            = true # Enable TRIM passthrough to the LVM thin pool; reduces CoW overhead
            mbps_r_concurrent  = var.mbps_rd
            mbps_wr_concurrent = var.mbps_wr
          }
        }
      }
      dynamic "scsi1" {
        for_each = var.disk_slot == "scsi1" ? [1] : []
        content {
          disk {
            storage            = "local-lvm"
            size               = var.vm_disk_size
            discard            = true
            mbps_r_concurrent  = var.mbps_rd
            mbps_wr_concurrent = var.mbps_wr
          }
        }
      }
    }
    dynamic "ide" {
      for_each = var.use_cloud_init ? [1] : []
      content {
        ide1 {
          cloudinit {
            storage = "local-lvm"
          }
        }
      }
    }
  }
  network {
    id      = 0
    bridge  = var.bridge
    model   = "virtio"
    macaddr = var.vm_mac_address
    tag     = var.vlan_tag
  }
  # Safety: ignore dynamically-attached PVC disks (managed by proxmox-csi)
  # and cloud-init changes that drift after initial provisioning
  lifecycle {
    prevent_destroy = true
    ignore_changes  = [
      # proxmox-csi dynamically attaches/detaches PVC disks. K8s workers
      # have up to ~30 slots in use simultaneously (k8s-node1: scsi1-29 +
      # unused0-29). The k8s-master only uses scsi0 (boot) so most of
      # these are no-ops for that VM but harmless.
      # Note: with disk_slot = "scsi1" the OS disk itself lives in scsi1,
      # so later changes to it (size, mbps_rd/mbps_wr) are ignored as well.
      disks[0].scsi[0].scsi1,
      disks[0].scsi[0].scsi2,
      disks[0].scsi[0].scsi3,
      disks[0].scsi[0].scsi4,
      disks[0].scsi[0].scsi5,
      disks[0].scsi[0].scsi6,
      disks[0].scsi[0].scsi7,
      disks[0].scsi[0].scsi8,
      disks[0].scsi[0].scsi9,
      disks[0].scsi[0].scsi10,
      disks[0].scsi[0].scsi11,
      disks[0].scsi[0].scsi12,
      disks[0].scsi[0].scsi13,
      disks[0].scsi[0].scsi14,
      disks[0].scsi[0].scsi15,
      disks[0].scsi[0].scsi16,
      disks[0].scsi[0].scsi17,
      disks[0].scsi[0].scsi18,
      disks[0].scsi[0].scsi19,
      disks[0].scsi[0].scsi20,
      disks[0].scsi[0].scsi21,
      disks[0].scsi[0].scsi22,
      disks[0].scsi[0].scsi23,
      disks[0].scsi[0].scsi24,
      disks[0].scsi[0].scsi25,
      disks[0].scsi[0].scsi26,
      disks[0].scsi[0].scsi27,
      disks[0].scsi[0].scsi28,
      disks[0].scsi[0].scsi29,
      # cloud-init config may drift after first boot
      cicustom,
      ciupgrade,
      ciuser,
      cipassword,
      sshkeys,
      # SMBIOS UUID and vmgenid are auto-generated
      smbios,
      # Tags and description may be edited in the Proxmox UI
      tags,
      desc,
      # Provider defaults that differ from imported state
      define_connection_info,
      full_clone,
      # scsihw varies per VM (virtio-scsi-pci / virtio-scsi-single / lsi)
      # and changing it on a running VM is risky, so leave whatever is live.
      scsihw,
      # qemu_os is a hint to qemu about the guest OS; some live VMs have
      # "other" (unset originally) and the module's "l26" default would
      # otherwise force an unnecessary write on apply.
      qemu_os,
    ]
  }
}