[docs] Add NFS prerequisite runbook for nfs_volume module [ci skip]
## Context

`modules/kubernetes/nfs_volume` creates the K8s PV but NOT the underlying directory on the Proxmox NFS host (`192.168.1.127:/srv/nfs/<subdir>`). The first time a new consumer is added, the mount fails with `mount.nfs: … No such file or directory` and the pod hangs in `ContainerCreating`. This bit us twice during the Wave 1/2 rollout — once for the mailserver backup (code-z26) and again for the Roundcube backup (code-1f6). Both times the fix was `ssh root@192.168.1.127 'mkdir -p /srv/nfs/<subdir>'`.

Rather than automate the SSH dependency into the module (which would break hermeticity and fail for operators without host SSH), this runbook documents the manual bootstrap step and the rationale. Addresses bd code-yo4.

## This change

New file: `docs/runbooks/nfs-prerequisites.md`. Lists known consumers, gives the copy-paste SSH command, and explains why auto-creation was rejected (two options, neither worth the churn).

## What is NOT in this change

- Any automation of the bootstrap — runbook only
- Migration to `nfs-subdir-external-provisioner` — explicitly out of scope

## Test Plan

### Automated

```
$ cat docs/runbooks/nfs-prerequisites.md | head -5
# NFS Prerequisites for `modules/kubernetes/nfs_volume`

The `nfs_volume` Terraform module creates a `PersistentVolume` pointing at a
path on the Proxmox NFS server (`192.168.1.127`). It does **not** create the
underlying directory on the server.
```

### Manual Verification

Before the next stack adds a new `nfs_volume` consumer, read the runbook and run the `ssh root@192.168.1.127 'mkdir -p ...'` step. First pod reaches Ready within a minute of the PV creation.

Closes: code-yo4

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parent 28009a0e85
commit c6784f87b5
1 changed file with 66 additions and 0 deletions
docs/runbooks/nfs-prerequisites.md (new file, +66)
# NFS Prerequisites for `modules/kubernetes/nfs_volume`

The `nfs_volume` Terraform module creates a `PersistentVolume` pointing at a
path on the Proxmox NFS server (`192.168.1.127`). It does **not** create the
underlying directory on the server.

If the path does not exist, the first pod that tries to mount the resulting
PVC gets stuck in `ContainerCreating` with the kubelet event:
```
MountVolume.SetUp failed for volume "<name>" : mount failed: exit status 32
mount.nfs: mounting 192.168.1.127:/srv/nfs/<path> failed, reason given by
server: No such file or directory
```
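When triaging a stuck pod, the event text can be matched mechanically. A minimal sketch (the `is_missing_export` helper is ours, not part of any tooling in this repo; feed it the message from `kubectl describe pod` output or the events list):

```sh
# Return 0 when a kubelet event message matches the missing-export NFS
# failure above; anything else returns 1.
is_missing_export() {
  case "$1" in
    *mount.nfs*'No such file or directory'*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example message, matching the event shown above:
msg='mount.nfs: mounting 192.168.1.127:/srv/nfs/immich-backup failed, reason given by server: No such file or directory'
is_missing_export "$msg" && echo 'missing export dir: run the bootstrap step'
# prints "missing export dir: run the bootstrap step"
```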
## Bootstrap before first apply

Before adding a new `nfs_volume` consumer (backup CronJob, data PV, etc.),
create the export root on the PVE host:

```sh
# Replace <app> with the backup stack name, e.g. mailserver-backup,
# roundcube-backup, immich-backup, etc.
ssh root@192.168.1.127 'mkdir -p /srv/nfs/<app> && chmod 755 /srv/nfs/<app>'

# Confirm exports are live (no change to /etc/exports needed — `/srv/nfs`
# is already exported via the root entry in pve-nfs-exports).
ssh root@192.168.1.127 exportfs -v | grep '/srv/nfs\b'
```

`/srv/nfs` is exported with the root entry. Subdirectories inherit the
export automatically; they just have to exist on disk.
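The mkdir/chmod line above can also be templated per consumer instead of hand-edited. A sketch, with a hypothetical `nfs_bootstrap` helper that only prints the command to run:

```sh
# Print the bootstrap command for one consumer; copy-paste (or pipe to sh)
# to actually run it against the PVE host.
nfs_bootstrap() {
  app="$1"
  printf "ssh root@192.168.1.127 'mkdir -p /srv/nfs/%s && chmod 755 /srv/nfs/%s'\n" "$app" "$app"
}

nfs_bootstrap roundcube-backup
# -> ssh root@192.168.1.127 'mkdir -p /srv/nfs/roundcube-backup && chmod 755 /srv/nfs/roundcube-backup'
```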
## Known consumers

| Consumer             | NFS path                      | Owning stack           |
|----------------------|-------------------------------|------------------------|
| `mailserver-backup`  | `/srv/nfs/mailserver-backup`  | `stacks/mailserver/`   |
| `roundcube-backup`   | `/srv/nfs/roundcube-backup`   | `stacks/mailserver/`   |
| `mysql-backup`       | `/srv/nfs/mysql-backup`       | `stacks/dbaas/`        |
| `postgresql-backup`  | `/srv/nfs/postgresql-backup`  | `stacks/dbaas/`        |
| `vaultwarden-backup` | `/srv/nfs/vaultwarden-backup` | `stacks/vaultwarden/`  |

Use `grep -rn 'nfs_volume' infra/stacks/` to find all active consumers.
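The whole table can be audited in one pass by generating the expected paths and checking them on the host. A sketch, assuming the same root SSH access as the bootstrap step (the loop itself only prints paths, so it is safe to run anywhere):

```sh
# Expected export directories, one per known consumer. To verify on the PVE
# host, pipe the output into:
#   ssh root@192.168.1.127 'while read -r d; do test -d "$d" || echo "MISSING $d"; done'
for app in mailserver-backup roundcube-backup mysql-backup postgresql-backup vaultwarden-backup; do
  echo "/srv/nfs/$app"
done
```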
## Why not auto-create?

Two options were considered for automating this:

1. `null_resource` + `local-exec` SSH `mkdir` in the `nfs_volume` module —
   works but adds an SSH dependency to every Terraform run, makes the
   module non-hermetic, and fails if the operator does not have SSH to
   the PVE host.
2. `nfs-subdir-external-provisioner` — handles subdirs automatically but
   changes the PV/PVC shape and would require migrating all existing
   consumers.

Neither is worth the churn for a one-time operation per new backup stack.
Document + checklist is the current call; re-evaluate if we start adding
one NFS consumer per week.
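For reference, option 1 would have looked roughly like the fragment below inside the module (illustrative only; `var.nfs_server` and `var.path` are hypothetical names, not the module's actual interface):

```hcl
# Rejected sketch: ties every `terraform apply` to SSH reachability of the
# PVE host and to the operator's key being authorized as root there.
resource "null_resource" "export_dir" {
  triggers = {
    path = var.path
  }

  provisioner "local-exec" {
    command = "ssh root@${var.nfs_server} 'mkdir -p ${var.path}'"
  }
}
```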
## Related tasks

- `code-yo4` — this runbook
- `code-z26` — mailserver backup CronJob (first-time setup hit this)
- `code-1f6` — Roundcube backup CronJob (also hit this)