Viktor Barzin df44601a36 Monitoring overhaul: reduce noise, add coverage gaps, auto-load dashboards
Noise reduction (8 alerts tuned):
- PoisonFountainDown: 2m→5m, critical→warning (fail-open service)
- NodeExporterDown: 2m→5m (flaps during node restarts)
- PowerOutage: add for:1m (debounce transient voltage dips)
- New Tailscale client: add for:5m (debounce headscale reauths)
- NoNodeLoadData: use absent() instead of OR vector(0)==0
- NodeHighCPUUsage: 30%→60% (normal for 70+ services)
- HighMemoryUsage GPU: 12GB/5m→14GB/15m (T4=16GB, model loading)
- PrometheusStorageFull: 50GiB→150GiB (TSDB cap is 180GB)

Alert regrouping:
- Move MailServerDown, HackmdDown, PrivatebinDown → new "Application Health"
- Move New Tailscale client → "Infrastructure Health"

New alerts (14):
- Networking: Cloudflared (2), MetalLB (2), Technitium DNS
- Storage: NFS CSI, iSCSI CSI controllers
- Critical Services: PgBouncer, CNPG operator, MySQL operator
- Infra Health: CrowdSec, Kyverno, Sealed Secrets, Woodpecker

Inhibit rules:
- Consolidate 3 NodeDown rules into 1 comprehensive rule
- Extend NFS rule to suppress NFS-dependent services
- Add PowerOutage → downstream suppression

Dashboard loading:
- Add for_each ConfigMap in grafana.tf to auto-load all 18 dashboards
- Remove duplicate caretta dashboard ConfigMap from caretta.tf
2026-03-18 08:03:59 +00:00
.claude authentik: auto-assign invitation group via expression policy [ci skip] 2026-03-18 08:03:58 +00:00
.git-crypt Add 1 git-crypt collaborator [ci skip] 2025-10-24 18:00:00 +00:00
.planning [ci skip] add auto-generated tiers.tf, planning docs, and helm chart cache 2026-03-06 23:55:57 +00:00
.woodpecker [ci skip] update AGENTS.md + CLAUDE.md with SOPS workflow, add k8s-portal CI pipeline 2026-03-07 15:37:19 +00:00
cli update @ record as well 2024-12-02 21:51:05 +00:00
diagram [ci skip] Sunset Drone CI: remove all artifacts, DNS, configs, and references 2026-02-23 19:38:55 +00:00
docs/plans [ci skip] k8s portal: fix setup script + add onboarding hub (5 new pages) 2026-03-07 15:06:26 +00:00
modules [ci skip] iSCSI migration, healthcheck fixes, health probes, etcd backup 2026-03-06 19:54:21 +00:00
playbooks [ci skip] Reduce node config drift: GPU label, OIDC idempotency, node-exporter, rebuild docs 2026-02-22 22:59:38 +00:00
scripts [ci skip] add Forgejo task pipeline for OpenClaw AI agent 2026-03-07 21:11:07 +00:00
secrets [ci skip] remove atuin: destroy stack, DNS, NFS export, PostgreSQL credentials 2026-03-06 20:11:14 +00:00
stacks Monitoring overhaul: reduce noise, add coverage gaps, auto-load dashboards 2026-03-18 08:03:59 +00:00
.gitattributes add git-crypt terraform 2021-02-14 18:17:40 +00:00
.gitignore Archive terraform.tfvars — secrets now in SOPS 2026-03-11 21:16:11 +00:00
.sops.yaml [ci skip] phase 1: SOPS tooling setup (.sops.yaml, scripts/tg, .gitignore) 2026-03-07 13:57:42 +00:00
AGENTS.md [ci skip] add sealed secrets convention: fileset + kubernetes_manifest pattern 2026-03-08 20:03:50 +00:00
config.tfvars Add terminal stack - reverse proxy to ttyd behind authentik 2026-03-10 23:46:01 +00:00
LICENSE.txt Drone CI Update TLS Certificates Commit 2025-10-12 00:13:18 +00:00
MEMORY.md Update MEMORY.md timestamp 2026-03-07 16:43:15 +00:00
README.md [ci skip] Sunset Drone CI: remove all artifacts, DNS, configs, and references 2026-02-23 19:38:55 +00:00
secrets.sops.json [ci skip] fix Navidrome credentials: admin user is wizard not admin 2026-03-07 20:39:56 +00:00
setup-monitoring.sh fix(monitoring): Add setup script for automated health check environment 2026-03-18 08:03:58 +00:00
terragrunt.hcl [ci skip] phase 3: switch terragrunt to load config.tfvars + SOPS secrets 2026-03-07 14:16:28 +00:00
tiers.tf [ci skip] Phase 1: PostgreSQL migrated to CNPG on local disk 2026-02-28 19:08:06 +00:00

This repo contains my infra-as-code sources.

My infrastructure is built with Terraform and Kubernetes; CI/CD runs on Woodpecker CI.

Read more by visiting my website: https://viktorbarzin.me

git-crypt setup

To decrypt the secrets, you need to set up git-crypt.

  1. Install git-crypt.
  2. Set up your GPG keys on the machine.
  3. Run git-crypt unlock from the repository root.

This decrypts the secrets in your working tree. From then on, git-crypt re-encrypts them automatically whenever you commit.
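
The unlock steps could be wrapped in a small helper like the one below. This is a minimal sketch, not part of the repo: the function name `unlock_secrets` is hypothetical, and it assumes your GPG public key has already been added to the repository as a git-crypt collaborator (otherwise `git-crypt unlock` will fail).

```shell
#!/bin/sh
# Hypothetical helper (not in this repo): check prerequisites, then unlock.
# Assumption: your GPG key was already added as a git-crypt collaborator.

unlock_secrets() {
    # Fail early with a clear message if a required tool is not on PATH.
    for tool in git gpg git-crypt; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            return 1
        fi
    done

    # Decrypt the encrypted files using a GPG key from your keyring.
    git-crypt unlock
}

# Usage, from the repository root:
#   unlock_secrets
```

Checking for the tools first turns a confusing "command not found" midway through into a single, explicit error before anything is touched.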