- MaxRequestWorkers 25→50: too few workers caused ALL workers to block
on SQLite locks, making liveness probes fail even faster (131 restarts
vs 50 before). 50 is a compromise — enough workers for probes.
- Excluded nextcloud from HighServiceErrorRate alert (chronic SQLite issue)
- MySQL migration attempted but hit: GR error 3100 (fixed with GIPK),
emoji in calendar/filecache (stripped), SQLite corruption (pre-existing
from crash-looping). Migration rolled back, Nextcloud restored to SQLite.
Critical fix: StorageClass mountOptions only apply during dynamic
provisioning. Our static PVs (created by Terraform) were missing
mount_options, so all NFS mounts defaulted to hard,timeo=600 —
the exact stale mount behavior we were trying to eliminate.
Adds mount_options directly to the nfs_volume module PV spec and
to the monitoring PVs (prometheus, loki, alertmanager).
Requires re-applying all stacks to propagate to existing PVs.
Move all 88 service modules (66 individual + 22 platform) from
modules/kubernetes/<service>/ into their corresponding stack directories:
- Service stacks: stacks/<service>/module/
- Platform stack: stacks/platform/modules/<service>/
This collocates module source code with its Terragrunt definition.
Only shared utility modules remain in modules/kubernetes/:
ingress_factory, setup_tls_secret, dockerhub_secret, oauth-proxy.
All cross-references to shared modules updated to use correct
relative paths. Verified with terragrunt run --all -- plan:
0 adds, 0 destroys across all 68 stacks.
2026-02-22 14:38:14 +00:00
Renamed from modules/kubernetes/monitoring/loki.tf (Browse further)