Commit graph

5 commits

Author SHA1 Message Date
Viktor Barzin
e6420c7b36 [ci skip] Move Terraform modules into stack directories
Move all 88 service modules (66 individual + 22 platform) from
modules/kubernetes/<service>/ into their corresponding stack directories:

- Service stacks: stacks/<service>/module/
- Platform stack: stacks/platform/modules/<service>/

This collocates module source code with its Terragrunt definition.
Only shared utility modules remain in modules/kubernetes/:
ingress_factory, setup_tls_secret, dockerhub_secret, oauth-proxy.

All cross-references to shared modules updated to use correct
relative paths. Verified with terragrunt run --all -- plan:
0 adds, 0 destroys across all 68 stacks.
2026-02-22 14:38:14 +00:00
Viktor Barzin
1275697f2b Add GPU node taint tolerations and enhance GPU memory exporter
Add nvidia.com/gpu toleration to all GPU workloads (frigate, ollama)
to support NoSchedule taint on GPU nodes. Update nvidia operator
helm values with daemonset tolerations. Enhance GPU pod memory
exporter with Kubernetes API integration to resolve container IDs
to pod names/namespaces, adding RBAC resources for API access.
2026-02-06 20:19:26 +00:00
Viktor Barzin
8af9e6b5bd set the time slicing config in the nvidia chart values [ci skip]
2025-12-28 08:35:44 +00:00
Viktor Barzin
308ce0019d downgrade nvidia driver to work with 12.8 cuda [ci skip]
2025-12-14 19:09:20 +00:00
Viktor Barzin
58240d640b add nvidia deployment [ci skip]
2025-12-14 09:50:26 +00:00