Commit graph

16 commits

Author SHA1 Message Date
Viktor Barzin
6e1586342c
[ci skip] expand k8s worker nodes to 256G, update inventory and extend script
- k8s-node2: 128G → 256G (160GB free)
- k8s-node3: 128G → 256G (135GB free)
- k8s-node4: 128G → 256G (127GB free)
- k8s-node1: already 256G (51GB free)
- extend_vm_storage.sh: increase drain timeout to 300s, add --force flag
- Remove Vaultwarden from SQLite migration plan (too risky)
2026-02-28 16:00:16 +00:00
Viktor Barzin
6cc1da4bd6
[ci skip] revise storage reliability design based on research agent findings
Key changes from v1:
- Drop 3-instance replication → 2-instance CNPG, single Redis/MySQL
- Remove Headscale from PG migration (project discourages it)
- Remove MeshCentral from PG migration (NeDB, not SQLite)
- Replace Redis Sentinel with single redis:7 on local disk (modules unused)
- Add RAM overcommit warning and mitigation
- Add explicit single-host limitation acknowledgment
- Add per-component rollback plans
- Fix backup strategy (CNPG can't archive WAL to NFS natively)
- Reorder migration: low-risk services first, authentik last
- Add research gate before each service migration
2026-02-28 14:38:01 +00:00
Viktor Barzin
6381bcee40
[ci skip] add storage reliability design: DB replication + SQLite consolidation
2026-02-28 14:24:42 +00:00
Viktor Barzin
cf67e02135
[ci skip] Reduce node config drift: GPU label, OIDC idempotency, node-exporter, rebuild docs
- Add gpu=true label to Terraform (nvidia null_resource alongside taint)
- Improve API server OIDC config to detect value changes, not just flag presence
- Add policy_hash trigger to audit-policy so rule changes auto-reapply
- Enable prometheus-node-exporter sub-chart, delete unused Ansible playbook
- Document full node rebuild procedure in CLAUDE.md
- Save Talos Linux migration evaluation for future reference
2026-02-22 22:59:38 +00:00
Viktor Barzin
50daa14a1a
[ci skip] Add anti-AI scraping implementation plan
2026-02-22 19:41:39 +00:00
Viktor Barzin
45c8dfd890
[ci skip] Add anti-AI scraping system design doc
2026-02-22 19:37:29 +00:00
Viktor Barzin
534e63c9b8
[ci skip] Remove legacy files and orphaned modules
Delete 20 orphaned module directories and 3 stray files from
modules/kubernetes/ that are no longer referenced by any stack.
Remove 7 root-level legacy files including the empty tfstate,
27MB terraform zip, commented-out main.tf, and migration notes.
Clean up commented-out dockerhub_secret and oauth-proxy references
in blog, travel_blog, and city-guesser stacks. Remove stale
frigate config.yaml entry from .gitignore. Remove ephemeral
docs/plans/ directory.
2026-02-22 15:23:27 +00:00
Viktor Barzin
a7f909b159
[ci skip] Add Terragrunt migration implementation plan
2026-02-22 00:51:00 +00:00
Viktor Barzin
86648f684f
[ci skip] Add Terragrunt migration design document
2026-02-22 00:46:57 +00:00
Viktor Barzin
4700743560
[ci skip] Add OpenClaw cluster health agent implementation plan
2026-02-21 23:48:36 +00:00
Viktor Barzin
4f02ddfeda
[ci skip] Add OpenClaw cluster management agent design doc
2026-02-21 23:45:30 +00:00
Viktor Barzin
734c173f78
[ci skip] Add multi-user Kubernetes access implementation plan
2026-02-17 20:49:14 +00:00
Viktor Barzin
d9913cde2a
[ci skip] Add multi-user Kubernetes access design document
2026-02-17 20:44:23 +00:00
Viktor Barzin
f013c0a139
[ci skip] Fix code review findings: correct Alertmanager URL, add atomic to Loki, remove dead minio NFS export, update design doc
2026-02-13 23:08:44 +00:00
Viktor Barzin
ecffe93c22
[ci skip] Add centralized log collection implementation plan
2026-02-13 21:54:55 +00:00
Viktor Barzin
3d64fc9f2c
[ci skip] Add centralized log collection design doc
2026-02-13 21:53:04 +00:00