infra

OpenClaw 28cc7aea1f fix(monitoring): Expand Loki PVC from 15GB to 50GB to resolve storage exhaustion	2026-03-17 16:51:02 +00:00

ISSUE RESOLVED:
- Root cause: Loki's 15GB iSCSI PVC was completely full
- Symptom: 'no space left on device' errors during TSDB operations
- Impact: Loki service completely down, logging unavailable
- Side effects: Contributed to node2 containerd corruption incident

SOLUTION APPLIED:
- Expanded PVC storage: 15Gi → 50Gi via direct kubectl patch
- Triggered pod restart to complete filesystem resize
- Verified successful expansion and service recovery

CURRENT STATUS:
✅ PVC: 50Gi capacity (iscsi-truenas storage class)
✅ Loki StatefulSet: 1/1 ready
✅ Loki Pod: 2/2 containers running
✅ Service: Successfully processing log streams
✅ No storage errors in recent logs

TERRAFORM ALIGNED:
- Updated loki.yaml persistence.size to match actual PVC
- Infrastructure code now reflects deployed state

[ci skip] - Emergency fix applied locally first due to service outage
authentik	resource quota review: fix OOM risks, close quota gaps, add HA protections	2026-03-08 18:17:46 +00:00
cloudflared	resource quota review: fix OOM risks, close quota gaps, add HA protections	2026-03-08 18:17:46 +00:00
cnpg	[ci skip] install CloudNativePG operator as platform module	2026-02-28 17:22:53 +00:00
crowdsec	resource quota review: fix OOM risks, close quota gaps, add HA protections	2026-03-08 18:17:46 +00:00
dbaas	fix OOM kills: tune MySQL memory, reduce Nextcloud workers, increase Uptime Kuma limit	2026-03-12 07:26:08 +00:00
headscale	[ci skip] fix widget issues: ports, Immich v2 API, Nextcloud trusted domains	2026-03-07 20:39:56 +00:00
infra-maintenance	[ci skip] iSCSI migration, healthcheck fixes, health probes, etcd backup	2026-03-06 19:54:21 +00:00
iscsi-csi	[ci skip] iSCSI migration, healthcheck fixes, health probes, etcd backup	2026-03-06 19:54:21 +00:00
k8s-portal	deploy Sealed Secrets controller for encrypted secret management	2026-03-08 19:49:48 +00:00
kyverno	fix cluster health: pin actualbudget, spread MySQL, scale grampsweb, fix GPU toleration	2026-03-11 11:43:34 +00:00
mailserver	[ci skip] add Homepage gethomepage.dev annotations to all services	2026-03-07 20:39:54 +00:00
metallb	[ci skip] Move Terraform modules into stack directories	2026-02-22 14:38:14 +00:00
metrics-server	[ci skip] Move Terraform modules into stack directories	2026-02-22 14:38:14 +00:00
monitoring	fix(monitoring): Expand Loki PVC from 15GB to 50GB to resolve storage exhaustion	2026-03-17 16:51:02 +00:00
nfs-csi	[ci skip] add NFS CSI driver + nfs_volume shared module	2026-03-01 23:38:58 +00:00
nvidia	fix nvidia quota: use custom quota (32 CPU) instead of Kyverno-generated (16 CPU)	2026-03-12 07:04:34 +00:00
rbac	Woodpecker CI: use built-in clone, fix CoreDNS DNS resolution [CI SKIP]	2026-02-23 00:08:42 +00:00
redis	[ci skip] migrate Redis, Prometheus, Loki storage to iSCSI	2026-03-06 20:50:55 +00:00
reverse_proxy	[ci skip] fix pfSense widget: wan interface is vtnet0 not vmx0	2026-03-07 20:39:56 +00:00
sealed-secrets	deploy Sealed Secrets controller for encrypted secret management	2026-03-08 19:49:48 +00:00
technitium	[ci skip] add Homepage gethomepage.dev annotations to all services	2026-03-07 20:39:54 +00:00
traefik	[ci skip] add Homepage gethomepage.dev annotations to all services	2026-03-07 20:39:54 +00:00
uptime-kuma	fix OOM kills: tune MySQL memory, reduce Nextcloud workers, increase Uptime Kuma limit	2026-03-12 07:26:08 +00:00
vaultwarden	[ci skip] add Homepage gethomepage.dev annotations to all services	2026-03-07 20:39:54 +00:00
vpa	[ci skip] fix Homepage icons for Tandoor, Listenarr, Networking Toolbox, Goldilocks	2026-03-07 21:29:51 +00:00
wireguard	[ci skip] right-size all pod resources based on VPA + live metrics audit	2026-03-01 19:18:50 +00:00
xray	[ci skip] phase 5+6: update CI pipelines for SOPS, add sensitive=true to secret vars	2026-03-07 14:30:36 +00:00