infra

Author	SHA1	Message	Date
Viktor Barzin	ee2c6517ba	fix(meshcentral): remove unused NFS modules after Wave 2 storage migration MeshCentral was migrated from NFS to proxmox-lvm storage (Wave 2). The old NFS modules for data and files are no longer used by the deployment, leaving behind orphaned PVCs (meshcentral-data, meshcentral-files). The backups volume remains on NFS per the backup strategy pattern. Changes: - Removed module.nfs_data and module.nfs_files from Terraform config - Active volumes now: meshcentral-data-proxmox, meshcentral-files-proxmox (proxmox-lvm) - Backups volume: meshcentral-backups (NFS) - unchanged Pod status: healthy, running on proxmox-lvm volumes.	2026-04-06 13:13:16 +03:00
Viktor Barzin	7b94abd54e	state(meshcentral): update encrypted state	2026-04-06 13:13:09 +03:00
Viktor Barzin	0162b4f130	priority-pass: update backend to v6 (remove edge artifact scratches)	2026-04-06 13:09:45 +03:00
Viktor Barzin	c38ae944fc	priority-pass: update backend to v5 (QR container sizing fix)	2026-04-06 13:06:18 +03:00
Viktor Barzin	9492874c43	fix: restore technitium MySQL query logging with Vault auto-rotation [ci skip] Query logs stopped syncing on 2026-03-16 due to password mismatch after MySQL cluster rebuild and Technitium app config reset. - Add Vault static role mysql-technitium (7-day rotation) - Add ExternalSecret for technitium-db-creds in technitium namespace - Add password-sync CronJob (6h) to push rotated password to Technitium API - Update Grafana datasource to use ESO-managed password - Remove stale technitium_db_password variable (replaced by ESO) - Update databases.md and restore-mysql.md runbook	2026-04-06 13:00:49 +03:00
Viktor Barzin	1d7244e47a	priority-pass: update backend to v4 (QR container clipping fix)	2026-04-06 13:00:33 +03:00
Viktor Barzin	0c44e11146	priority-pass: update backend to v3 (QR container layout fix)	2026-04-06 12:57:08 +03:00
Viktor Barzin	75b18717a1	priority-pass: update backend to v2 (QR code preservation fix)	2026-04-06 12:53:45 +03:00
Viktor Barzin	ef6f57e82c	priority-pass: update frontend image to v5 (clipboard paste support)	2026-04-06 12:44:19 +03:00
Viktor Barzin	3676cdbeeb	state(technitium): update encrypted state	2026-04-06 12:40:55 +03:00
Viktor Barzin	cd2d00703c	state(vault): update encrypted state	2026-04-06 12:40:54 +03:00
root	a038b2a2c4	Woodpecker CI deploy commit [CI SKIP]	2026-04-06 09:28:10 +00:00
Viktor Barzin	5b43e57efa	actualbudget: use internal ClusterIP for http-api server URL The http-api sidecar was connecting to the public URL (https://budget-.viktorbarzin.me) which goes through Traefik/Authentik. When pods got rescheduled to different nodes, this caused ETIMEDOUT errors. Changed to internal service URL (http://budget-.actualbudget.svc.cluster.local) which is fast and reliable regardless of pod placement.	2026-04-06 12:22:57 +03:00
Viktor Barzin	07bad79489	state(status-page): update encrypted state	2026-04-06 12:20:54 +03:00
Viktor Barzin	5fef9945de	state(actualbudget): update encrypted state	2026-04-06 12:20:28 +03:00
Viktor Barzin	4594472e69	state(status-page): update encrypted state	2026-04-06 12:20:01 +03:00
Viktor Barzin	aac02f0467	meshcentral: restore DB from backup; technitium: remove orphaned PVC - meshcentral: fix homepage annotations formatting (no functional change, serversscheme was tested but not needed since MeshCentral serves HTTP) - meshcentral: restored user DB from Dec 2024 backup (1428B → 45KB) - technitium: remove unused technitium-config-proxmox PVC (WaitForFirstConsumer, never mounted — primary uses NFS, replicas have their own proxmox PVCs)	2026-04-06 12:17:08 +03:00
Viktor Barzin	f675b1492f	state(meshcentral): update encrypted state	2026-04-06 12:10:41 +03:00
Viktor Barzin	bbc9ea3444	state(status-page): update encrypted state	2026-04-06 12:09:50 +03:00
Viktor Barzin	0de2fef9c9	misc: actualbudget, authentik, headscale, rybbit, terminal, dbaas updates - actualbudget: adjust resource config - authentik: add configuration - headscale: minor fix - rybbit: add resources - terminal: add terminal stack config - platform/dbaas: add config - infra: update lock file	2026-04-06 11:58:00 +03:00
Viktor Barzin	c2f9ca0d13	modules: improve create-vm with additional config options and cloud-init updates	2026-04-06 11:57:55 +03:00
Viktor Barzin	f9e85964ce	traefik: add middleware and platform traefik config updates	2026-04-06 11:57:52 +03:00
Viktor Barzin	dca06b8a00	freedify: increase memory limits and add new features	2026-04-06 11:57:47 +03:00
Viktor Barzin	61dc7a6862	nextcloud: refactor chart values and main.tf configuration	2026-04-06 11:57:44 +03:00
Viktor Barzin	fe342a974b	monitoring + proxmox-csi: LVM snapshot RBAC, pushgateway NodePort, backup dashboard - proxmox-csi: add RBAC for PVE host snapshot restore script - monitoring: expose Pushgateway via NodePort for PVE LVM snapshot metrics - monitoring: add backup health Grafana dashboard	2026-04-06 11:57:41 +03:00
Viktor Barzin	72d832fee7	add HA Sofia checks (26-29) to cluster healthcheck and backup-dr docs - Healthcheck: add entity availability, integration health, automation status, and system resources checks for Home Assistant Sofia - Docs: add backup-dr architecture documentation	2026-04-06 11:57:36 +03:00
Viktor Barzin	b0178cf6d2	technitium: add tertiary DNS replica and fix CoreDNS forward order - Add tertiary DNS deployment with zone-transfer replication for externalTrafficPolicy=Local coverage across more nodes - Reorder CoreDNS default forwarders: pfSense (10.0.20.1) first, then public DNS fallbacks (8.8.8.8, 1.1.1.1)	2026-04-06 11:57:31 +03:00
Viktor Barzin	f80e1fa868	cluster health fixes: NFS CSI, Immich ML, dbaas, Redis, DNS, trading-bot removal - NFS CSI: fix liveness-probe port conflict (29652 → 29653) - Immich ML: add gpu-workload priority class to enable preemption on node1 - dbaas: right-size MySQL memory limits (sidecar 6Gi→350Mi, main 4Gi→3Gi) - Redis: add redis-master service via HAProxy for master-only routing, update config.tfvars redis_host to use it - CoreDNS: forward .viktorbarzin.lan to Technitium ClusterIP (10.96.0.53) instead of stale LoadBalancer IP (10.0.20.200) - Trading bot: comment out all resources (no longer needed) - Vault: remove trading-bot PostgreSQL database role	2026-04-06 11:54:45 +03:00
Viktor Barzin	0115320d72	state(status-page): update encrypted state	2026-04-06 11:48:40 +03:00
Viktor Barzin	d0ed3cc3ce	state(status-page): update encrypted state	2026-04-06 11:31:45 +03:00
Viktor Barzin	8e6034c34d	state(status-page): update encrypted state	2026-04-06 11:29:48 +03:00
Viktor Barzin	9479a562f1	state(status-page): update encrypted state	2026-04-06 11:28:40 +03:00
Viktor Barzin	e0dcfd7694	state(status-page): update encrypted state	2026-04-06 11:27:16 +03:00
Viktor Barzin	9f91a3db88	state: update encrypted terraform state	2026-04-06 11:26:45 +03:00
Viktor Barzin	a486bbd66c	state(immich): update encrypted state	2026-04-06 10:50:34 +03:00
Viktor Barzin	c988c5b43e	state(nfs-csi): update encrypted state	2026-04-06 10:40:50 +03:00
Viktor Barzin	58e698c647	state(immich): update encrypted state	2026-04-06 10:38:43 +03:00
Viktor Barzin	9aea54674e	state(trading-bot): update encrypted state	2026-04-06 10:29:54 +03:00
Viktor Barzin	faa6868f79	remove claude-memory PDB (blocks drains with single replica) Single replica + minAvailable=1 makes drains impossible. claude-memory is non-critical and recovers quickly. [ci skip]	2026-04-06 00:47:40 +03:00
Viktor Barzin	70c870a2ed	state: update encrypted terraform state	2026-04-06 00:37:58 +03:00
Viktor Barzin	dcda285d9b	state(authentik): update encrypted state	2026-04-06 00:35:33 +03:00
Viktor Barzin	88307e3e5f	state(headscale): update encrypted state	2026-04-06 00:33:54 +03:00
Viktor Barzin	c8be07c403	resilience improvements: MySQL anti-affinity comment, descheduler 5min, prometheus termination 60s - MySQL InnoDB: keep required anti-affinity but document why (2/3 members OK during node loss) - Descheduler: increase frequency from hourly to every 5 min for faster rebalancing - Prometheus: set terminationGracePeriodSeconds=60 to prevent drain timeout [ci skip]	2026-04-06 00:25:49 +03:00
Viktor Barzin	3eb15149e1	state(platform): update encrypted state	2026-04-06 00:25:21 +03:00
Viktor Barzin	2ead11d36b	state(descheduler): update encrypted state	2026-04-06 00:25:14 +03:00
Viktor Barzin	0f2ef356d6	fix: remove ISCSICSIControllerDown alert (democratic-csi decommissioned) iSCSI CSI (democratic-csi) was replaced by proxmox-csi in April 2026. Controller is intentionally scaled to 0. Remove the stale alert and update CSIDriverCrashLoop to monitor proxmox-csi instead of iscsi-csi.	2026-04-05 23:53:18 +03:00
Viktor Barzin	ae0585048a	fix: bump tier-1-cluster LimitRange max to 8Gi for MySQL 6Gi limit Kyverno's tier-1-cluster LimitRange had max=4Gi which blocked mysql-cluster-2 from starting after we bumped MySQL to 6Gi limit. Also added custom LimitRange in dbaas stack (for when Terraform manages it directly).	2026-04-05 23:31:23 +03:00
Viktor Barzin	aa62e789e5	state(kyverno): update encrypted state	2026-04-05 23:28:46 +03:00
Viktor Barzin	772f59d589	fix: add Vault-managed DB credentials for Matrix Synapse - Create dedicated 'matrix' PostgreSQL user (was using 'postgres' superuser) - Add Vault DB static role pg-matrix with 24h rotation - Add ExternalSecret matrix-db-creds syncing password from Vault - Add inject-db-password init container that patches homeserver.yaml with current Vault password on every pod start - Update dependency annotation to pg-cluster-rw.dbaas - Also updated Vault DB config to use pg-cluster-rw (was legacy postgresql.dbaas)	2026-04-05 23:18:16 +03:00
Viktor Barzin	e064778c2c	fix: resolve tandoor, matrix, navidrome crash loops - Tandoor: pin image to vabene1111/recipes:1.5.27 (latest tag pull failing with EOF from pull-through cache corruption) - Matrix: update homeserver.yaml to use pg-cluster-rw.dbaas instead of legacy postgresql.dbaas service, update CNPG postgres password - Navidrome: deleted corrupted SQLite DB (malformed disk image from proxmox-lvm migration), navidrome recreates fresh DB on startup	2026-04-05 23:12:49 +03:00
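
Note on commit faa6868f79 (claude-memory PDB removal): the reasoning in that message, "single replica + minAvailable=1 makes drains impossible", follows from how a PodDisruptionBudget counts allowed voluntary disruptions (healthy pods minus the required minimum). The sketch below only illustrates that arithmetic. The function names are hypothetical and it does not call the Kubernetes API.

```python
def allowed_disruptions(healthy_pods: int, min_available: int) -> int:
    """Voluntary evictions a PodDisruptionBudget would currently permit.

    Illustrative model only: healthy pods minus the required minimum,
    floored at zero.
    """
    return max(healthy_pods - min_available, 0)


def drain_can_evict(healthy_pods: int, min_available: int) -> bool:
    """A node drain can evict a pod only if at least one disruption is allowed."""
    return allowed_disruptions(healthy_pods, min_available) >= 1


# Single replica with minAvailable=1: no eviction is ever allowed,
# so a drain of the node hosting the pod cannot complete.
single_replica = drain_can_evict(healthy_pods=1, min_available=1)

# With two healthy replicas the same budget leaves room for one eviction.
two_replicas = drain_can_evict(healthy_pods=2, min_available=1)

print(single_replica, two_replicas)
```

Under this model the options are to remove the PDB (as the commit does), run more than one replica, or relax the budget. The commit chose removal because claude-memory is described as non-critical and quick to recover.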

2302 commits