Complete rewrite of the user-facing documentation:
- How to report outages and request features
- Mermaid flow diagrams for both incident and feature request paths
- SLA expectations (automated vs human response times)
- Self-service checks before reporting
- Severity level definitions
- Status page explanation
- Full technical architecture section with component inventory
- Safety guardrails, labels, and commit conventions
[ci skip]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add a cleanup-failed-pods policy that runs hourly (at :15) and deletes
all pods in the Failed phase cluster-wide. This prevents stale evicted
and failed CronJob pods from accumulating and creating healthcheck noise.
Also add a ClusterRole and ClusterRoleBinding that grant the Kyverno
cleanup controller permission to delete Pods (not granted by default).
[ci skip]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The DevVM may have unstaged changes from active sessions. Run git stash
before pull to avoid 'cannot pull with rebase: unstaged changes' errors,
then git stash pop afterwards to restore the working state.
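
A minimal sketch of the stash/pull/pop sequence described above. The
function name sync_repo and its repo path argument are hypothetical and
not taken from the actual pipeline; only standard git subcommands are used.

```shell
#!/usr/bin/env bash
set -euo pipefail

# sync_repo (hypothetical helper): pull with rebase in the given checkout,
# stashing tracked local changes first and restoring them afterwards.
sync_repo() {
  local repo_dir="$1"
  local stashed=0

  # git diff --quiet exits non-zero when tracked files have changes
  # (unstaged for the first call, staged for the --cached call).
  if ! git -C "$repo_dir" diff --quiet || ! git -C "$repo_dir" diff --cached --quiet; then
    git -C "$repo_dir" stash push -q -m "pre-pull stash"
    stashed=1
  fi

  git -C "$repo_dir" pull -q --rebase

  # Only pop when this run created a stash entry.
  if [ "$stashed" -eq 1 ]; then
    git -C "$repo_dir" stash pop -q
  fi
}
```

git pull --rebase --autostash is a built-in alternative that performs the
same stash and restore steps in one command.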
[ci skip]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Woodpecker injects manual pipeline variables as direct env vars
(e.g., $ISSUE_NUMBER), not as CI_PIPELINE_VARIABLE_* prefixed vars.
The provision-user pipeline already uses this pattern correctly.
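
As an illustration of reading the variable the way described above, a
step command could look roughly like this. The helper name
issue_number_or_fail is hypothetical; only ISSUE_NUMBER comes from the
pipeline.

```shell
#!/usr/bin/env bash
set -euo pipefail

# issue_number_or_fail (hypothetical helper): print ISSUE_NUMBER as read
# directly from the environment, or fail when it is unset or empty.
issue_number_or_fail() {
  if [ -z "${ISSUE_NUMBER:-}" ]; then
    echo "ISSUE_NUMBER is not set" >&2
    return 1
  fi
  printf '%s\n' "$ISSUE_NUMBER"
}
```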
[ci skip]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
build-ci-image.yml had event:[push,manual], which caused it to run
on every manual pipeline trigger. Its registry_user/registry_password
secrets are not enabled for the manual event, so every manual pipeline
errored. Removed manual from its event list since it only needs push.
Reverted the evaluate conditions: Woodpecker resolves secrets before
evaluating conditions, so evaluate can't prevent missing-secret errors.
[ci skip]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When GHA triggers a manual pipeline for issue automation, ALL pipelines
with event:manual fire. Added evaluate conditions:
- issue-automation.yml: only runs when ISSUE_NUMBER is set
- provision-user.yml: only runs when ISSUE_NUMBER is NOT set
- build-ci-image.yml: only runs when ISSUE_NUMBER is NOT set
This prevents build-ci-image from failing on missing registry_password
secret when issue automation triggers.
[ci skip]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- daily-backup: treat rsync exit 23 (partial transfer) as OK for LUKS
noload mounts; files with in-flight writes have corrupt metadata because
journal replay is skipped, but core data is intact
- daily-backup: clean up stale LUKS dm mappings from previous crashed
runs before attempting to open
- daily-backup: capture the rsync exit code safely under set -e (|| pattern)
- kyverno: bump tier-4-aux requests.memory 2Gi→3Gi (servarr was at 83%)
- actualbudget: patched custom quota 5Gi→6Gi (was at 82%)
Verified: backup now completes with status=0 (96 PVCs OK, 0 failed)
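
A minimal sketch of the exit-code handling from the daily-backup items
above, assuming a wrapper function. The name run_tolerating_partial is
hypothetical and the real script may be structured differently.

```shell
#!/usr/bin/env bash
set -euo pipefail

# run_tolerating_partial (hypothetical helper): run a command and treat
# exit code 23 (rsync: partial transfer due to error) as success.
run_tolerating_partial() {
  local rc=0
  # The "|| rc=$?" form records the exit code without tripping set -e.
  "$@" || rc=$?
  case "$rc" in
    0)  return 0 ;;
    23) echo "warning: partial transfer (exit 23), treated as OK" >&2
        return 0 ;;
    *)  return "$rc" ;;
  esac
}
```

Example call with placeholder paths:
run_tolerating_partial rsync -a /mnt/src/ /mnt/dst/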
[ci skip]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>