Viktor Barzin
b1b408ff0e
fix: use full path to claude CLI for non-interactive SSH
2026-04-14 17:44:50 +00:00
Viktor Barzin
7674cf8c5c
docs: final E2E pipeline test
2026-04-14 17:43:38 +00:00
Viktor Barzin
f2e7367401
fix: use sh instead of bash in pipeline (Alpine compat)
2026-04-14 17:29:14 +00:00
Viktor Barzin
91b97709b7
docs: trigger postmortem pipeline with TODO
2026-04-14 17:27:45 +00:00
Viktor Barzin
c742fa3dfb
fix: scan all post-mortems for TODOs (no git diff needed)
2026-04-14 17:14:22 +00:00
Viktor Barzin
f336e5ed53
docs: E2E test postmortem pipeline with deep clone
2026-04-14 17:12:46 +00:00
Viktor Barzin
0b2f5a4729
fix: use depth 5 clone for postmortem pipeline (need HEAD~1)
2026-04-14 17:12:41 +00:00
Viktor Barzin
59367cc588
fix: handle Woodpecker shallow clone in postmortem pipeline
2026-04-14 17:12:02 +00:00
Viktor Barzin
60c04e51b7
2026-04-14 17:10:45 +00:00
Viktor Barzin
933c562aa9
docs: trigger postmortem pipeline E2E test
2026-04-14 16:49:07 +00:00
Viktor Barzin
ce7a4e6e76
fix: Woodpecker v3 secrets→environment migration
2026-04-14 16:47:17 +00:00
Viktor Barzin
8540f48a28
fix: move pipeline logic to shell script (avoid YAML quoting issues)
2026-04-14 16:46:42 +00:00
Viktor Barzin
df95f52d08
docs: test postmortem with TODO for pipeline E2E
2026-04-14 16:45:44 +00:00
Viktor Barzin
7f5115f9fe
fix: Woodpecker pipeline YAML quoting + trigger test [ci skip]
2026-04-14 16:45:27 +00:00
Viktor Barzin
b3cc5fcc32
test: trigger postmortem pipeline webhook
2026-04-14 16:44:11 +00:00
Viktor Barzin
777450cb19
docs: test post-mortem for pipeline E2E validation
2026-04-14 15:55:32 +00:00
Viktor Barzin
8ad674e7b1
fix: postmortem pipeline uses Vault for SSH key (not Woodpecker secrets)
Pipeline authenticates to Vault via K8s SA JWT, fetches devvm_ssh_key
from secret/ci/infra, SSHes to DevVM to run Claude Code headlessly.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 15:55:12 +00:00
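The Vault flow in 8ad674e7b1 can be sketched with the standard library. This is a minimal illustration, not the pipeline's actual script. It assumes three things: Vault's Kubernetes auth method is mounted at the default `auth/kubernetes` path, `secret/` is a KV v2 mount (so the read path gains a `data/` segment), and the Vault role name is supplied by the caller. The role name and Vault address used below are hypothetical.

```python
import json
import urllib.request

# Standard in-pod location of the service account JWT mounted by Kubernetes.
SA_TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"


def build_login_request(vault_addr: str, role: str, jwt: str) -> urllib.request.Request:
    """Build the request that exchanges the SA JWT for a Vault token.

    Assumes the Kubernetes auth method is mounted at the default path.
    """
    body = json.dumps({"role": role, "jwt": jwt}).encode()
    return urllib.request.Request(
        f"{vault_addr}/v1/auth/kubernetes/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_read_request(vault_addr: str, token: str,
                       path: str = "secret/data/ci/infra") -> urllib.request.Request:
    """Build a GET for secret/ci/infra; 'data/' assumes a KV v2 mount."""
    return urllib.request.Request(
        f"{vault_addr}/v1/{path}",
        headers={"X-Vault-Token": token},
        method="GET",
    )


def client_token(login_response: dict) -> str:
    # The login response carries the Vault token under auth.client_token.
    return login_response["auth"]["client_token"]


def ssh_key(read_response: dict, field: str = "devvm_ssh_key") -> str:
    # KV v2 nests the secret payload under data.data.
    return read_response["data"]["data"][field]


def fetch_devvm_key(vault_addr: str, role: str) -> str:
    """Read the SA JWT, log in, and return the key. Needs an in-cluster pod."""
    with open(SA_TOKEN_PATH) as f:
        jwt = f.read().strip()
    with urllib.request.urlopen(build_login_request(vault_addr, role, jwt)) as resp:
        token = client_token(json.load(resp))
    with urllib.request.urlopen(build_read_request(vault_addr, token)) as resp:
        return ssh_key(json.load(resp))
```

The request builders and response parsers are kept separate from the network calls so that each can be checked without a live Vault.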
Viktor Barzin
a703c6e84f
docs: update post-mortem follow-up implementation [PM-2026-04-14] [ci skip]
Added Uptime Kuma TCP monitor for PVE NFS (192.168.1.127:2049), ID 328,
Tier 1 (30s/3 retries). Investigation TODO flagged for human review.
Co-Authored-By: postmortem-todo-resolver <noreply@anthropic.com>
2026-04-14 15:48:11 +00:00
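A TCP monitor such as the one added for 192.168.1.127:2049 reduces to a connect check that is repeated before the target is declared down. The sketch below assumes "30s/3 retries" means three consecutive failed probes spaced 30 s apart. Uptime Kuma's own pending/retry accounting may differ, and the function names are hypothetical.

```python
import socket
import time


def tcp_check(host: str, port: int, timeout: float = 5.0) -> bool:
    """One probe: the target is up if a TCP connection opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def is_down(host: str, port: int, attempts: int = 3,
            retry_interval: float = 30.0, timeout: float = 5.0) -> bool:
    """Report down only after `attempts` consecutive failed probes."""
    for attempt in range(attempts):
        if tcp_check(host, port, timeout):
            return False
        if attempt < attempts - 1:
            time.sleep(retry_interval)
    return True


# Example (assumed semantics): is_down("192.168.1.127", 2049)
```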
Viktor Barzin
8badb8181a
feat: post-mortem automation pipeline
E2E workflow for incident post-mortems:
1. /post-mortem skill generates structured post-mortem markdown
2. Woodpecker pipeline triggers on docs/post-mortems/*.md changes
3. parse-postmortem-todos.sh extracts safe TODOs (Alert/Config/Monitor)
4. postmortem-todo-resolver agent implements TODOs headlessly
5. Agent updates post-mortem with Follow-up Implementation table
Components:
- .claude/skills/post-mortem/ — writer skill + template
- .claude/agents/postmortem-todo-resolver.md — headless agent
- .woodpecker/postmortem-todos.yml — CI pipeline
- scripts/parse-postmortem-todos.sh — TODO extractor
- cluster-health skill — auto-suggest post-mortem after recovery
Safety: only auto-implements Alert/Config/Monitor types.
Architecture/Migration/Investigation items are skipped.
[ci skip]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 15:34:42 +00:00
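The safety filter in step 3 (auto-implement only Alert/Config/Monitor, and skip Architecture/Migration/Investigation) can be sketched as follows. This is not parse-postmortem-todos.sh itself. The TODO syntax `- [ ] [Type] description` is a hypothetical stand-in for whatever the post-mortem template actually uses. The directory scan mirrors the later "scan all post-mortems for TODOs (no git diff needed)" change.

```python
import re
from pathlib import Path

# Types considered safe for headless implementation; everything else is skipped.
SAFE_TYPES = {"Alert", "Config", "Monitor"}
# Hypothetical TODO syntax: "- [ ] [Type] description" (unchecked box = still open).
TODO_RE = re.compile(r"^\s*- \[ \] \[(\w+)\]\s+(.+?)\s*$")


def parse_todos(markdown: str) -> list[tuple[str, str]]:
    """Return (type, description) for open TODOs of a safe type."""
    todos = []
    for line in markdown.splitlines():
        m = TODO_RE.match(line)
        if m and m.group(1) in SAFE_TYPES:
            todos.append((m.group(1), m.group(2)))
    return todos


def scan_postmortems(root: str = "docs/post-mortems") -> dict[str, list[tuple[str, str]]]:
    """Scan every post-mortem and keep only files with at least one safe TODO."""
    results = {}
    for path in sorted(Path(root).glob("*.md")):
        todos = parse_todos(path.read_text())
        if todos:
            results[path.name] = todos
    return results
```

Scanning every file instead of diffing against HEAD~1 avoids any dependence on clone depth. The cost is that already-resolved TODOs must be marked done (here, checked) so that they are not picked up again.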
Viktor Barzin
e832581caf
docs: update Apr 14 post-mortem with Phase 2 findings
Key additions:
- NFSv3 broke after NFS restart (kernel lockd bug on PVE 6.14)
- All 52 PVs migrated to NFSv4, NFSv3 disabled on PVE
- DNS zone sync gap: secondary/tertiary had no custom zones
- Converted one-time setup Job to recurring zone-sync CronJob
- MySQL, Redis, Vault collateral damage and fixes
- 3 new lessons learned (zone replication, NFS client state, operator rollout)
[ci skip]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 12:26:11 +00:00
Viktor Barzin
803cb5fd26
fix: convert Technitium zone sync from one-time Job to CronJob
Secondary/tertiary DNS instances had no custom zones — only the
primary had viktorbarzin.lan and viktorbarzin.me. The old setup Job
ran once at deployment and never synced new zones.
New CronJob runs every 30 minutes:
- Gets all zones from primary
- Enables zone transfer on primary
- Creates missing zones as Secondary type on replicas
- Resyncs existing zones via AXFR
Fixes .lan resolution failures (2/3 queries returned NXDOMAIN).
[ci skip]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 12:18:19 +00:00
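Each CronJob run in 803cb5fd26 comes down to diffing the primary's zone list against each replica's zone list. The sketch below models only that planning step. It makes no Technitium API calls, and the names are hypothetical. It assumes zones present only on a replica are left alone, because the commit does not mention deletion.

```python
from dataclasses import dataclass


@dataclass
class SyncPlan:
    create_as_secondary: list[str]  # zones the replica is missing entirely
    resync: list[str]               # zones already present: refresh via AXFR


def plan_zone_sync(primary_zones: set[str], replica_zones: set[str]) -> SyncPlan:
    """Plan one sync run for a single replica.

    Replica-only zones are ignored (not deleted).
    """
    missing = sorted(primary_zones - replica_zones)
    existing = sorted(primary_zones & replica_zones)
    return SyncPlan(create_as_secondary=missing, resync=existing)


def plan_all(primary_zones: set[str],
             replicas: dict[str, set[str]]) -> dict[str, SyncPlan]:
    """Plan every replica, for example both the secondary and tertiary instances."""
    return {name: plan_zone_sync(primary_zones, zones)
            for name, zones in replicas.items()}
```

Because the diff is recomputed from live state on every run, the job is idempotent. A zone added to the primary simply shows up as "missing" on the next run, which is exactly the gap the one-time Job left open.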
Viktor Barzin
c0a33b5157
state(technitium): update encrypted state
2026-04-14 12:17:29 +00:00
Viktor Barzin
5ff26dd8ef
state(technitium): update encrypted state
2026-04-14 12:13:27 +00:00
Viktor Barzin
30cdeefb1c
chore: sync terraform state after nfsvers=4 convergence
Applied all 20 NFS stacks to converge PV mount_options (nfsvers=4).
State files encrypted and committed.
[ci skip]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 11:20:18 +00:00
Viktor Barzin
bb2731256b
state(immich): update encrypted state
2026-04-14 11:19:06 +00:00
Viktor Barzin
99e2bc1bef
state(immich): update encrypted state
2026-04-14 11:19:00 +00:00
Viktor Barzin
ac6ec06afe
state(ollama): update encrypted state
2026-04-14 11:18:57 +00:00
Viktor Barzin
7a60108e97
state(ollama): update encrypted state
2026-04-14 11:18:20 +00:00
Viktor Barzin
a39b90bbcc
state(ollama): update encrypted state
2026-04-14 11:18:10 +00:00
Viktor Barzin
a25739a572
state(poison-fountain): update encrypted state
2026-04-14 11:13:32 +00:00
Viktor Barzin
d9ddf102ec
state(plotting-book): update encrypted state
2026-04-14 11:13:02 +00:00
Viktor Barzin
6d209fffad
state(meshcentral): update encrypted state
2026-04-14 11:11:59 +00:00
Viktor Barzin
d0805ed2a8
state(infra-maintenance): update encrypted state
2026-04-14 11:11:09 +00:00
Viktor Barzin
28264e69c6
state(headscale): update encrypted state
2026-04-14 11:11:05 +00:00
Viktor Barzin
1738c3437c
state(frigate): update encrypted state
2026-04-14 11:09:30 +00:00
Viktor Barzin
fe42993446
state(ebook2audiobook): update encrypted state
2026-04-14 11:08:37 +00:00
Viktor Barzin
23140cf780
state(real-estate-crawler): update encrypted state
2026-04-14 11:08:24 +00:00
Viktor Barzin
d24e4aac0b
state(osm_routing): update encrypted state
2026-04-14 11:08:09 +00:00
Viktor Barzin
94b7097789
state(openclaw): update encrypted state
2026-04-14 11:08:05 +00:00
Viktor Barzin
25f4682dc0
state(nextcloud): update encrypted state
2026-04-14 11:06:41 +00:00
Viktor Barzin
aac81e0a1f
state(vault): update encrypted state
2026-04-14 11:06:27 +00:00
Viktor Barzin
047f695129
state(ytdlp): update encrypted state
2026-04-14 11:06:11 +00:00
Viktor Barzin
20e86e96a3
state(servarr): update encrypted state
2026-04-14 11:05:54 +00:00
Viktor Barzin
0d6b6cbd95
state(navidrome): update encrypted state
2026-04-14 11:05:10 +00:00
Viktor Barzin
9ea3b33a55
state(ebooks): update encrypted state
2026-04-14 10:54:47 +00:00
Viktor Barzin
ea18116da9
fix: NFS outage recovery — migrate to NFSv4, add alerting
NFS server restart broke NFSv3 (lockd kernel bug on PVE 6.14).
All 52 NFS PVs patched to nfsvers=4, NFSv3 disabled on PVE.
Changes:
- nfs_volume module: add nfsvers=4 mount option
- nfs-csi StorageClass: add nfsvers=4 mount option
- dbaas: MySQL serverInstances 3→1, mysql-native-password=ON
- monitoring: add NFSCSINodeDown and NFSMountFailures alerts
[ci skip]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 10:28:27 +00:00
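Converging mount options to `nfsvers=4`, as done for the 52 PVs, the nfs_volume module and the nfs-csi StorageClass, amounts to dropping any existing version option and pinning v4. This is a hedged sketch that operates on plain Kubernetes-style PV dicts (`spec.mountOptions`). The helper names are hypothetical, and the real change was made in Terraform.

```python
def ensure_nfsv4(mount_options: list[str]) -> list[str]:
    """Remove any nfsvers=/vers= option, then pin nfsvers=4. Idempotent."""
    version_keys = ("nfsvers", "vers")
    kept = [o for o in mount_options if o.split("=", 1)[0] not in version_keys]
    return kept + ["nfsvers=4"]


def patch_pv(pv: dict) -> dict:
    """Apply ensure_nfsv4 to a PV manifest's spec.mountOptions in place."""
    spec = pv.setdefault("spec", {})
    spec["mountOptions"] = ensure_nfsv4(spec.get("mountOptions", []))
    return pv
```

Stripping `vers=` as well as `nfsvers=` matters because both spellings select the protocol version, and a leftover `vers=3` would conflict with the pinned v4 option.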
Viktor Barzin
92900b5e08
state(dbaas): update encrypted state
2026-04-14 10:27:04 +00:00
Viktor Barzin
b4b6fd5946
state(nfs-csi): update encrypted state
2026-04-14 09:32:41 +00:00
Viktor Barzin
30e5150ecd
state(status-page): update encrypted state
2026-04-14 09:31:50 +00:00
Viktor Barzin
ac3a6a96dd
state(hermes-agent): update encrypted state
2026-04-14 09:04:35 +00:00