Commit graph

27 commits

Author SHA1 Message Date
Viktor Barzin
b1d152be1f [infra] Auto-create Cloudflare DNS records from ingress_factory
## Context

Deploying new services required manually adding hostnames to
cloudflare_proxied_names/cloudflare_non_proxied_names in config.tfvars —
a separate file from the service stack. This was frequently forgotten,
leaving services unreachable externally.

## This change:

- Add `dns_type` parameter to `ingress_factory` and `reverse_proxy/factory`
  modules. Setting `dns_type = "proxied"` or `"non-proxied"` auto-creates
  the Cloudflare DNS record (CNAME to tunnel or A/AAAA to public IP).
- Simplify cloudflared tunnel from 100 per-hostname rules to wildcard
  `*.viktorbarzin.me → Traefik`. Traefik still handles host-based routing.
- Add global Cloudflare provider via terragrunt.hcl (separate
  cloudflare_provider.tf with Vault-sourced API key).
- Migrate 118 hostnames from centralized config.tfvars to per-service
  dns_type. 17 hostnames remain centrally managed (Helm ingresses,
  special cases).
- Update docs, AGENTS.md, CLAUDE.md, dns.md runbook.

```
BEFORE                          AFTER
config.tfvars (manual list)     stacks/<svc>/main.tf
        |                         module "ingress" {
        v                           dns_type = "proxied"
stacks/cloudflared/               }
  for_each = list                     |
  cloudflare_record               auto-creates
  tunnel per-hostname             cloudflare_record + annotation
```
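
The AFTER column of the diagram can be modeled as a mapping from `dns_type` to the records that get created. This is a minimal Python sketch of that mapping, not the module's actual Terraform. The function name `records_for`, the tunnel target, and both IP addresses are placeholders invented for illustration, and treating an unset `dns_type` as "create nothing" is an assumption.

```python
from typing import Dict, List, Optional

# Placeholder values for illustration only; none are taken from the repository.
TUNNEL_TARGET = "example-tunnel-id.cfargotunnel.com"
PUBLIC_IPV4 = "192.0.2.10"
PUBLIC_IPV6 = "2001:db8::10"


def records_for(hostname: str, dns_type: Optional[str]) -> List[Dict[str, object]]:
    """Return the DNS records implied by a service's dns_type."""
    if dns_type is None:
        # Assumption: hostnames without dns_type stay centrally managed.
        return []
    if dns_type == "proxied":
        # Proxied hostnames get a single CNAME pointing at the tunnel.
        return [{"name": hostname, "type": "CNAME",
                 "content": TUNNEL_TARGET, "proxied": True}]
    if dns_type == "non-proxied":
        # Non-proxied hostnames resolve directly to the public IPs.
        return [
            {"name": hostname, "type": "A",
             "content": PUBLIC_IPV4, "proxied": False},
            {"name": hostname, "type": "AAAA",
             "content": PUBLIC_IPV6, "proxied": False},
        ]
    raise ValueError(f"unsupported dns_type: {dns_type!r}")


if __name__ == "__main__":
    for record in records_for("dolt-workbench", "proxied"):
        print(record)
```

Keeping the mapping next to the ingress definition is what removes the separate manual step: the hostname and its DNS record are declared in the same place.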

## What is NOT in this change:

- Uptime Kuma monitor migration (still reads from config.tfvars)
- 17 remaining centrally-managed hostnames (Helm, special cases)
- Removal of allow_overwrite (keep until migration confirmed stable)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 13:45:04 +00:00
Viktor Barzin
27d7c91608 feat(beads-server): add Dolt Workbench web UI
Deploy dolthub/dolt-workbench alongside the Dolt server in beads-server
namespace. Provides SQL console, spreadsheet editor, and commit graph
visualization for the centralized beads task database.

- Workbench at dolt-workbench.viktorbarzin.me (Cloudflare-proxied)
- Connects to Dolt server via in-cluster service DNS
- Added to cloudflare_proxied_names for external access

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 21:32:45 +00:00
Viktor Barzin
38d51ab0af deprecate TrueNAS: migrate Immich NFS to Proxmox, remove all 10.0.10.15 references [ci skip]
- Migrate Immich (8 NFS PVs, 1.1TB) from TrueNAS to Proxmox host NFS
- Update config.tfvars nfs_server to 192.168.1.127 (Proxmox)
- Update nfs-csi StorageClass share to /srv/nfs
- Update scripts (weekly-backup, cluster-healthcheck) to Proxmox IP
- Delete obsolete TrueNAS scripts (nfs_exports.sh, truenas-status.sh)
- Rewrite nfs-health.sh for Proxmox NFS monitoring
- Update Freedify nfs_music_server default to Proxmox
- Mark CloudSync monitor CronJob as deprecated
- Update Prometheus alert summaries
- Update all architecture docs, AGENTS.md, and reference docs
- Zero PVs remain on TrueNAS — VM ready for decommission

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 14:42:07 +00:00
Viktor Barzin
1c300a14cf mailserver: overhaul inbound delivery, monitoring, CrowdSec, and migrate to Brevo relay
Inbound:
- Direct MX to mail.viktorbarzin.me (ForwardEmail relay attempted and abandoned)
- Dedicated MetalLB IP 10.0.20.202 with ETP: Local for CrowdSec real-IP detection
- Removed Cloudflare Email Routing (can't store-and-forward)
- Fixed dual SPF violation, hardened to -all
- Added MTA-STS, TLSRPT, imported Rspamd DKIM into Terraform
- Removed dead BIND zones from config.tfvars (199 lines)
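
The dual SPF fix in the list above can be checked mechanically: a domain should publish exactly one SPF record, and a hardened policy ends in `-all`. The sketch below is a hypothetical Python helper (`spf_problems` is not part of the repository) that works on TXT strings passed in by the caller; the sample records in the usage lines are made up.

```python
from typing import List


def is_spf(record: str) -> bool:
    """True if a TXT record is an SPF record (version tag v=spf1)."""
    text = record.strip().lower()
    return text == "v=spf1" or text.startswith("v=spf1 ")


def spf_problems(txt_records: List[str]) -> List[str]:
    """Return the SPF problems found in one domain's TXT records."""
    spf = [r for r in txt_records if is_spf(r)]
    problems: List[str] = []
    if not spf:
        problems.append("no SPF record")
    if len(spf) > 1:
        # More than one SPF record makes receivers return a permanent error.
        problems.append("multiple SPF records")
    for record in spf:
        # The final term decides what happens to senders that matched nothing.
        if record.split()[-1].lower() != "-all":
            problems.append("policy is not -all")
    return problems


if __name__ == "__main__":
    print(spf_problems(["v=spf1 mx -all"]))
    print(spf_problems(["v=spf1 mx -all", "v=spf1 include:relay.example.net ~all"]))
```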

Outbound:
- Migrated from Mailgun (100/day) to Brevo (300/day free)
- Added Brevo DKIM CNAMEs and verification TXT

Monitoring:
- Probe frequency: 30m → 20m, alert thresholds adjusted to 60m
- Enabled Dovecot exporter scraping (port 9166)
- Added external SMTP monitor on public IP

Documentation:
- New docs/architecture/mailserver.md with full architecture
- New docs/architecture/mailserver-visual.html visualization
- Updated monitoring.md, CLAUDE.md, historical plan docs
2026-04-12 22:24:38 +01:00
Viktor Barzin
82b0f6c4cb truenas deprecation: migrate all non-immich storage to proxmox NFS
- Migrate 7 backup CronJobs to Proxmox host NFS (192.168.1.127)
  (etcd, mysql, postgresql, nextcloud, redis, vaultwarden, plotting-book)
- Migrate headscale backup, ebook2audiobook, osm_routing to Proxmox NFS
- Migrate servarr (lidarr, readarr, soulseek) NFS refs to Proxmox
- Remove 79 orphaned TrueNAS NFS module declarations from 49 stacks
- Delete stacks/platform/modules/ (27 dead module copies, 65MB)
- Update nfs-truenas StorageClass to point to Proxmox (192.168.1.127)
- Remove iscsi DNS record from config.tfvars
- Fix woodpecker persistence config and alertmanager PV

Only Immich (8 PVCs, ~1.4TB) remains on TrueNAS.
2026-04-12 14:35:39 +01:00
Viktor Barzin
4d3d3316ab feat(phpipam): deploy phpIPAM for live IP address management
Lightweight IPAM with auto-discovery scanning every 15min via fping.
Replaces disabled NetBox (OOM'd). Uses existing MySQL InnoDB cluster
with Vault-rotated credentials. Cloudflare DNS + Authentik auth.

[ci skip]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 14:19:25 +00:00
Viktor Barzin
9820f2ced0 add foolery stack: agent orchestration UI on devvm [ci skip]
Service+Endpoints pattern proxying to 10.0.10.10:3210 (Foolery).
Protected by Authentik forward-auth. DNS via Cloudflare tunnel.
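
The Service+Endpoints pattern mentioned above pairs a Service that has no selector with a hand-written Endpoints object of the same name, so cluster traffic is forwarded to an address outside the cluster. The Python sketch below only builds the two manifests as dictionaries. The function name `external_backend`, the `foolery` namespace, and reusing port 3210 as the Service port are assumptions; the stack itself defines these objects in Terraform.

```python
import json
from typing import Dict, Tuple


def external_backend(name: str, namespace: str, ip: str, port: int) -> Tuple[Dict, Dict]:
    """Build a selector-less Service and the Endpoints object that backs it."""
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        # No selector: Kubernetes will not generate endpoints for this Service.
        "spec": {"ports": [{"port": port, "targetPort": port, "protocol": "TCP"}]},
    }
    endpoints = {
        "apiVersion": "v1",
        "kind": "Endpoints",
        # Same name and namespace as the Service, which is what links the two.
        "metadata": {"name": name, "namespace": namespace},
        "subsets": [{
            "addresses": [{"ip": ip}],
            "ports": [{"port": port, "protocol": "TCP"}],
        }],
    }
    return service, endpoints


if __name__ == "__main__":
    svc, eps = external_backend("foolery", "foolery", "10.0.10.10", 3210)
    print(json.dumps([svc, eps], indent=2))
```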
2026-04-10 00:21:59 +01:00
Viktor Barzin
64585e329c fix: update Technitium DNS IP from 10.0.20.200 to 10.0.20.201
Technitium DNS was moved to its own dedicated MetalLB LoadBalancer IP
(10.0.20.201) but several references still pointed to the old shared IP
(10.0.20.200, now used by traefik/coturn/etc). This caused DNS resolution
failures for *.viktorbarzin.lan from pfSense and LAN clients.

- Update CoreDNS Corefile forward in both technitium and platform modules
- Update MetalLB annotation and remove stale allow-shared-ip
- Update zone NS records and apex A record in config.tfvars
- Update legacy BIND forwarder reference

Also fixed on pfSense (not in repo):
- Removed NAT rule redirecting UDP 53 to wrong IP (10.0.20.200)
- Added dnsmasq listen on WAN (192.168.1.2) for LAN clients
- Added domain-specific forwarding (viktorbarzin.lan -> 10.0.20.201)
- Created aliases (technitium_dns, k8s_shared_lb) for all NAT rules

[ci skip]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 09:53:56 +00:00
Viktor Barzin
6abc0b9742 security(monitoring): remove public SNMP exporter ingress
snmp-exporter-external.viktorbarzin.me exposed UPS metrics to the
public internet with no authentication. Removed the external ingress
and Cloudflare DNS record. ha-sofia now accesses the SNMP exporter
via the existing .lan ingress (allow_local_access_only=true) using
direct IP 10.0.20.200 with Host header.
2026-04-06 15:23:56 +03:00
Viktor Barzin
7f141faa8c Fix: Expose SNMP exporter externally to ha-sofia via Cloudflare tunnel
- Add snmp-exporter-ingress-external module for external HTTPS access to snmp-exporter
- Register snmp-exporter-external.viktorbarzin.me in Cloudflare DNS (proxied via tunnel)
- Update ha-sofia REST integration to use external HTTPS endpoint
- Fix ingress backend service routing to use existing snmp-exporter service
- All UPS sensors on ha-sofia now report values (voltage, battery %, load, etc.)
2026-04-06 15:14:19 +03:00
Viktor Barzin
f80e1fa868 cluster health fixes: NFS CSI, Immich ML, dbaas, Redis, DNS, trading-bot removal
- NFS CSI: fix liveness-probe port conflict (29652 → 29653)
- Immich ML: add gpu-workload priority class to enable preemption on node1
- dbaas: right-size MySQL memory limits (sidecar 6Gi→350Mi, main 4Gi→3Gi)
- Redis: add redis-master service via HAProxy for master-only routing,
  update config.tfvars redis_host to use it
- CoreDNS: forward .viktorbarzin.lan to Technitium ClusterIP (10.96.0.53)
  instead of stale LoadBalancer IP (10.0.20.200)
- Trading bot: comment out all resources (no longer needed)
- Vault: remove trading-bot PostgreSQL database role
2026-04-06 11:54:45 +03:00
Viktor Barzin
95e49134ae cleanup: remove old audiobook-search, superseded by book-search
- Delete servarr/audiobook-search TF module (moved to ebooks/book-search)
- Remove audiobook-search from cloudflare_proxied_names
- Remove commented-out module reference in servarr/main.tf
- Clean up "renamed from" comment in ebooks/main.tf
- K8s resources (deploy/svc/ingress) deleted from servarr namespace
- Cloudflare DNS record already absent
- Import book-search and insta2spotify DNS records into cloudflared state
2026-03-25 23:16:01 +02:00
Viktor Barzin
946ea9e1f3 fix ebooks stack: prefix PV names, add book-search DNS, add secrets symlink [ci skip]
2026-03-25 15:14:08 +02:00
Viktor Barzin
6dda15afa0 add insta2spotify stack: namespace, ESO, NFS, 2-container deploy, split ingress
- Namespace insta2spotify (tier 4-aux)
- ExternalSecret from Vault secret/insta2spotify
- NFS volume at /mnt/main/insta2spotify for SQLite + Spotify cache
- Frontend (128Mi) + backend (512Mi req / 2Gi limit) in one pod
- Split ingress: protected (Authentik) for frontend, unprotected for /api/*
- DNS via Cloudflare (proxied)
2026-03-25 13:03:35 +02:00
Viktor Barzin
da00a63e5a add claude-memory to cloudflare proxied DNS records
The MCP server was unreachable because the DNS record was missing.
2026-03-25 01:07:35 +02:00
Viktor Barzin
4ca7af8818 add audiobook-search service to servarr stack
- New audiobook-search deployment + service + ingress (Authentik-protected)
- qBittorrent: add NFS mount for /audiobooks (shared with Audiobookshelf)
- Cloudflare DNS: add audiobook-search.viktorbarzin.me
- Env vars: QBITTORRENT_URL/PASS, AUDIOBOOKSHELF_URL/TOKEN from ESO
2026-03-24 01:21:49 +02:00
Viktor Barzin
644562454c add IPv6 connectivity via Hurricane Electric 6in4 tunnel
- Add public_ipv6 variable and AAAA records for all 34 non-proxied services
- Fix stale DNS records (85.130.108.6 → 176.12.22.76, old IPv6 → HE tunnel)
- Update SPF record with current IPv4/IPv6 addresses
- Add AAAA update support to Technitium DNS updater CLI
- Pin mailserver MetalLB IP to 10.0.20.201 for stable pfSense NAT
- pfSense: HE_IPv6 interface, strict firewall (80,443,25,465,587,993 + ICMPv6),
  socat IPv6→IPv4 proxy, removed dangerous "Allow all DEBUG" rules
2026-03-23 02:22:00 +02:00
Viktor Barzin
0674d6e538 deploy priority-pass app to cluster via private registry
- SvelteKit frontend + FastAPI backend in single pod with sidecar pattern
- Images pushed to 10.0.20.10:5050 private registry (v4/v1)
- SvelteKit server route proxies /api/transform to backend on 127.0.0.1:8000
- Exposed at priority-pass.viktorbarzin.me (Cloudflare-proxied, no auth)
- Uses imagePullSecrets for authenticated registry pulls
2026-03-23 00:55:41 +02:00
Viktor Barzin
36171bcda4 add htpasswd auth to private docker registry + expose at registry.viktorbarzin.me
- Add auth.htpasswd section to config-private.yml
- Mount htpasswd file in registry-private container, fix healthcheck for 401
- Rename registry UI from registry.viktorbarzin.me → docker.viktorbarzin.me
- Add Docker CLI ingress at registry.viktorbarzin.me (HTTPS backend, no rate-limit, unlimited body)
- Add docker to cloudflare_proxied_names (registry stays non-proxied)
- Add Kyverno ClusterPolicy to sync registry-credentials secret to all namespaces
- Update infra provisioning to install apache2-utils and generate htpasswd from Vault
2026-03-22 22:10:10 +02:00
Viktor Barzin
21bb3036af state(dbaas): update encrypted state
2026-03-19 20:23:59 +00:00
Viktor Barzin
708eb69742 fix: update postgresql_host to pg-cluster-rw (old service had no endpoints)
The legacy `postgresql.dbaas` service had no endpoints after CNPG migration,
causing Woodpecker and other stacks to fail DB connections. Changed to
`pg-cluster-rw.dbaas` which points to the CNPG primary.
2026-03-16 07:07:22 +00:00
Viktor Barzin
3aba29e7a3 remove SOPS pipeline, deploy ESO + Vault DB/K8s engines
Vault is now the sole source of truth for secrets. SOPS pipeline
removed entirely — auth via `vault login -method=oidc`.

Part A: SOPS removal
- vault/main.tf: delete 990 lines (93 vars + 43 KV write resources),
  add self-read data source for OIDC creds from secret/vault
- terragrunt.hcl: remove SOPS var loading, vault_root_token, check_secrets hook
- scripts/tg: remove SOPS decryption, keep -auto-approve logic
- .woodpecker/default.yml: replace SOPS with Vault K8s auth via curl
- Delete secrets.sops.json, .sops.yaml

Part B: External Secrets Operator
- New stack stacks/external-secrets/ with Helm chart + 2 ClusterSecretStores
  (vault-kv for KV v2, vault-database for DB engine)

Part C: Database secrets engine (in vault/main.tf)
- MySQL + PostgreSQL connections with static role rotation (24h)
- 6 MySQL roles (speedtest, wrongmove, codimd, nextcloud, shlink, grafana)
- 6 PostgreSQL roles (trading, health, linkwarden, affine, woodpecker, claude_memory)

Part D: Kubernetes secrets engine (in vault/main.tf)
- RBAC for Vault SA to manage K8s tokens
- Roles: dashboard-admin, ci-deployer, openclaw, local-admin
- New scripts/vault-kubeconfig helper for dynamic kubeconfig

K8s auth method with scoped policies for CI, ESO, OpenClaw, Woodpecker sync.
2026-03-15 16:37:38 +00:00
Viktor Barzin
6f562b5da6 add vaultwarden daily backup CronJob to NFS
SQLite backup via Online Backup API + copy of RSA keys,
attachments, sends, and config. 30-day retention with rotation.
Pod affinity ensures co-scheduling with vaultwarden for RWO PVC access.
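
The backup and rotation steps described above can be sketched in Python, whose `sqlite3.Connection.backup` method wraps SQLite's Online Backup API. This is an illustration only, not the CronJob's actual script. The function name `backup_sqlite`, the `db-<timestamp>.sqlite3` naming scheme, and pruning by file modification time are assumptions, and the copy of keys, attachments, sends, and config is left out.

```python
import sqlite3
import time
from pathlib import Path


def backup_sqlite(src_db: Path, backup_dir: Path, retention_days: int = 30) -> Path:
    """Copy src_db into backup_dir via the Online Backup API, then prune old copies."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / f"db-{time.strftime('%Y%m%d-%H%M%S')}.sqlite3"

    source = sqlite3.connect(str(src_db))
    dest = sqlite3.connect(str(target))
    try:
        # Connection.backup produces a consistent copy even while another
        # process keeps writing to the source database.
        source.backup(dest)
    finally:
        dest.close()
        source.close()

    # Rotation: remove earlier backups older than the retention window.
    cutoff = time.time() - retention_days * 86400
    for old in backup_dir.glob("db-*.sqlite3"):
        if old != target and old.stat().st_mtime < cutoff:
            old.unlink()
    return target
```

Copying through the backup API rather than copying the file directly avoids capturing a half-written database when the application is mid-transaction.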
2026-03-15 00:03:59 +00:00
Viktor Barzin
2c296d4d7c add novelapp deployment [ci skip]
Deploy NovelApp (web novel reading tracker) to k8s cluster.
- Namespace: novelapp, tier: aux
- iSCSI PVC for SQLite persistence
- Ingress at novelapp.viktorbarzin.me
- Browser scraping disabled
2026-03-14 18:51:14 +00:00
Viktor Barzin
5a9881337d Add terminal stack - reverse proxy to ttyd behind authentik
Exposes ttyd at 10.0.10.10:7681 via terminal.viktorbarzin.me with
Cloudflare DNS and Authentik forward-auth protection.
2026-03-10 23:46:01 +00:00
Viktor Barzin
6f8b48a73c [ci skip] k8s portal: fix setup script + add onboarding hub (5 new pages)
Bug fixes:
- CA cert now populated in ConfigMap (was empty → TLS failures)
- Remove useless heredoc quote escaping in setup script
- Fix homepage: VPN callout, correct verification command (get namespaces)
- Fix false-positive sensitive=true on ingress_path, tls_secret_name,
  truenas_host, ollama_host, client_certificate_secret_name

New pages (direct Svelte, no mdsvex dependency):
- /onboarding: step-by-step guide (VPN, kubectl, git, first PR)
- /architecture: cluster topology, storage, networking, tiers
- /services: catalog of 70+ services with URLs
- /contributing: PR workflow, what you can/can't change, NEVER list
- /troubleshooting: common issues and fixes

Navigation bar added to layout. All pages use consistent docs styling.

Requires Docker image rebuild: cd stacks/platform/modules/k8s-portal/files
&& docker build -t viktorbarzin/k8s-portal:latest .
&& docker push viktorbarzin/k8s-portal:latest
2026-03-07 15:06:26 +00:00
Viktor Barzin
0d8e3484be [ci skip] phase 2: split terraform.tfvars into config.tfvars + secrets.sops.json
config.tfvars (29 vars, plaintext): hostnames, IPs, DNS records, IDs
secrets.sops.json (140 vars, SOPS-encrypted): passwords, tokens, keys, maps

Both files coexist with terraform.tfvars — no functional change yet.
Complex types preserved: maps (mailserver_accounts, k8s_users, homepage_credentials),
lists (xray_reality_clients), heredocs as \n-escaped JSON strings (SSH keys,
WireGuard conf, headscale config).
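
The `\n`-escaped encoding described above is what a standard JSON serializer produces for multi-line strings, and it round-trips without loss. A minimal Python sketch follows; the helper name `encode_secrets`, the key names, and all values are made-up examples rather than variables from the repository.

```python
import json
from typing import Any, Dict


def encode_secrets(values: Dict[str, Any]) -> str:
    """Serialize a variable map to JSON; newlines inside strings become \\n escapes."""
    return json.dumps(values, sort_keys=True)


if __name__ == "__main__":
    example = {
        # A multi-line value that would be a heredoc in a .tfvars file.
        "example_ssh_key": "-----BEGIN EXAMPLE KEY-----\nabc123\n-----END EXAMPLE KEY-----\n",
        # Maps and lists keep their structure.
        "example_accounts": {"alice": "placeholder"},
        "example_clients": ["client-a", "client-b"],
    }
    encoded = encode_secrets(example)
    print(encoded)
    # Decoding restores the original multi-line value and nested types.
    print(json.loads(encoded) == example)
```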
2026-03-07 14:04:40 +00:00