infra

Viktor Barzin 5768216d0e anubis: HA with shared valkey/redis store + replicas=2

Anubis pre-2026-05-16 ran at replicas=1 because in-flight PoW challenge state lived in process memory: a challenge issued by pod A wouldn't be verifiable by pod B (HTTP 500 "store: key not found"). The PDB at `minAvailable=1` made this worse: with replicas=1 the eviction API can NEVER satisfy the constraint, so every drain on a node hosting an Anubis pod looped forever. This is what stalled the manual K8s upgrade on 2026-05-11 (had to delete pods directly to bypass eviction) and was about to block kured on Monday 2026-05-18 once the kured sentinel fix landed.

Anubis upstream has first-class support for a Valkey/Redis-protocol shared store (documented as the "Kubernetes worker pool" pattern). Wire it up:

- modules/kubernetes/anubis_instance: add `shared_store_url` variable. When set, appends a `store: { backend: valkey, parameters: { url } }` block to the rendered policy YAML and defaults replicas to 2 (capped at 2). PDB switched from `minAvailable=1` to `maxUnavailable=1` so drains can take down one pod at a time. topologySpreadConstraint tightened to `DoNotSchedule` so the two replicas land on different nodes, meaning a single node loss never takes a whole Anubis instance down.
- All 8 call sites (cyberchef, jsoncrack, kms, homepage, blog, travel_blog, real-estate-crawler, f1-stream) opted in. Each picks a unique Redis DB index (5–12) on `redis-master.redis:6379`. Cluster Redis already runs HA via Sentinel + haproxy, no new infra needed.

Verified: every Anubis Deployment now 2/2 Ready with pods on different nodes; PDBs allow 1 disruption; Redis DBs 5,7,8,10 already populated by live traffic post-apply; Palo Alto Networks scanner hit blog right after apply and the challenge log shows the new state path.

Drain on any worker now succeeds without a `predrain_unstick` workaround: the eviction API is satisfied because at most one pod is unavailable at a time, and the other replica keeps serving. Monday's kured reboot wave should roll through cleanly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-22 14:16:47 +00:00
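The PDB reasoning in the commit message can be checked with a little arithmetic. The following is a minimal sketch, not code from this repo: the `PDB` class and `disruptions_allowed` helper are hypothetical names, and the model is simplified to integer values only (no percentages). It assumes the usual rule that allowed disruptions equal healthy pods minus the number that must stay healthy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PDB:
    """Hypothetical, simplified PodDisruptionBudget: integer values only."""
    min_available: Optional[int] = None
    max_unavailable: Optional[int] = None


def disruptions_allowed(pdb: PDB, replicas: int, healthy: int) -> int:
    """Return how many voluntary evictions the budget would permit right now."""
    if (pdb.min_available is None) == (pdb.max_unavailable is None):
        raise ValueError("set exactly one of min_available / max_unavailable")
    if pdb.min_available is not None:
        # minAvailable: this many pods must remain healthy after eviction.
        desired_healthy = pdb.min_available
    else:
        # maxUnavailable: everything except this many must remain healthy.
        desired_healthy = max(replicas - pdb.max_unavailable, 0)
    return max(healthy - desired_healthy, 0)


# Before: replicas=1 with minAvailable=1, so no eviction is ever allowed.
old = disruptions_allowed(PDB(min_available=1), replicas=1, healthy=1)
# After: replicas=2 with maxUnavailable=1, so one pod may be evicted at a time.
new = disruptions_allowed(PDB(max_unavailable=1), replicas=2, healthy=2)
print(old, new)  # 0 1
```

Under this model the old configuration yields zero allowed disruptions, which matches the drain loop described above, while the new one yields one, matching the "PDBs allow 1 disruption" verification.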
..
create-template-vm	infra: re-enable unattended-upgrades with kured prometheus-gating	2026-05-22 14:16:41 +00:00
create-vm	ingress_factory: replace `protected` bool with `auth` enum + audit pass across 100 stacks	2026-05-22 14:16:42 +00:00
docker-registry	[forgejo] Phase 4 final decommission: drop registry-private container + port 5050	2026-05-07 23:29:34 +00:00
kubernetes	anubis: HA with shared valkey/redis store + replicas=2	2026-05-22 14:16:47 +00:00