1705 commits

Author SHA1 Message Date
Viktor Barzin
449937e22e
Sync realestate-crawler Grafana dashboard with per-endpoint latency panels 2026-02-23 21:31:01 +00:00
Viktor Barzin
8985cd60cc
[ci skip] mailserver: fix Rspamd DKIM signing key path
Mount DKIM private key at Rspamd-expected path
(/tmp/docker-mailserver/rspamd/dkim/viktorbarzin.me/mail.private)
and add dkim_signing.conf override for domain/selector config.
Rspamd does not auto-detect keys from the OpenDKIM path.
2026-02-23 21:01:29 +00:00
Viktor Barzin
04db99fde2
docs: map existing codebase 2026-02-23 20:54:27 +00:00
Viktor Barzin
59c862fe18
[ci skip] dns: remove stale SendGrid CNAME records
Remove em7107, s1._domainkey, and s2._domainkey SendGrid CNAME
records from the bind zone. These are remnants from a previous
relay setup that is no longer in use.
2026-02-23 20:31:07 +00:00
Viktor Barzin
e95ef07b04
[ci skip] mailserver: tighten DMARC policy to quarantine
Move DMARC enforcement from p=none (monitoring only) to p=quarantine
so spoofed emails from viktorbarzin.me are quarantined by recipients.
2026-02-23 20:30:30 +00:00
Viktor Barzin
ce03bc25a9
[ci skip] mailserver: add Postfix rate limiting
Add connection and message rate limits to protect against brute-force
attacks on SMTP/IMAP ports. 10 connections and 30 messages per minute
per client IP.
2026-02-23 20:29:45 +00:00
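The limits in this commit (10 connections and 30 messages per minute per client IP) amount to a per-client counter over a one-minute window. The following is only a minimal Python sketch of that idea, not the Postfix implementation or its configuration; the class name `PerClientRateLimiter` and its `allow` method are hypothetical.

```python
from collections import defaultdict, deque


class PerClientRateLimiter:
    """Hypothetical sliding-window limiter: at most `limit` events
    per `window` seconds for each client IP."""

    def __init__(self, limit: int, window: float = 60.0) -> None:
        self.limit = limit
        self.window = window
        # client IP -> timestamps of the events accepted so far
        self._events = defaultdict(deque)

    def allow(self, client_ip: str, now: float) -> bool:
        events = self._events[client_ip]
        # Drop timestamps that have fallen out of the window.
        while events and now - events[0] >= self.window:
            events.popleft()
        if len(events) >= self.limit:
            return False
        events.append(now)
        return True


# The two limits named in the commit message, one limiter each.
connection_limiter = PerClientRateLimiter(limit=10)
message_limiter = PerClientRateLimiter(limit=30)
```

Each client IP is tracked separately, so one noisy client cannot use up the allowance of another.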
Viktor Barzin
74948a8af3
[ci skip] roundcubemail: pin to 1.6-apache, disable debug logging
Pin Roundcubemail to stable 1.6-apache tag instead of :latest to
prevent unexpected breakage. Disable SMTP debug and reduce debug
level from 6 to 1 for production use.
2026-02-23 20:29:39 +00:00
Viktor Barzin
b7ccae69bc
[ci skip] monitoring: enable mailserver-down Prometheus alert
Uncomment the mailserver availability alert so we get paged if
the mail server pod has no available replicas for 5 minutes.
2026-02-23 20:29:33 +00:00
Viktor Barzin
75f5cb2001
[ci skip] mailserver: enable Rspamd, disable OpenDKIM
Enable Rspamd for spam filtering and DKIM signing, replacing
OpenDKIM. Rspamd reads existing DKIM keys from the same mount path.
2026-02-23 20:29:32 +00:00
Viktor Barzin
6ca4a1a081
Sync realestate-crawler dashboard with navigation & usage metrics panels 2026-02-23 20:28:55 +00:00
Viktor Barzin
c6a79e89c7
[ci skip] Upgrade Woodpecker CI v3.5.1 → v3.13.0, fix helm healthcheck for v4 2026-02-23 20:14:30 +00:00
Viktor Barzin
0eababf212
[ci skip] Sunset Drone CI: remove all artifacts, DNS, configs, and references
Drone CI has been fully replaced by Woodpecker CI at ci.viktorbarzin.me.
Destroys K8s resources (12), removes DNS records, NFS exports, Uptime Kuma
monitor, dashboard entry, and all code/doc references across 18 files.
2026-02-23 19:38:55 +00:00
Viktor Barzin
b45688646d
Woodpecker CI: use built-in clone, fix CoreDNS DNS resolution [CI SKIP]
- Switch from custom clone override to woodpeckerci/plugin-git built-in clone
  (handles auth automatically via netrc from GitHub OAuth token)
- Add 8.8.8.8 and 1.1.1.1 as CoreDNS upstream resolvers alongside pfSense
  (fixes intermittent DNS timeouts causing clone failures)
- Fix missing comma after heredoc in audit-policy.tf (syntax error)
2026-02-23 00:08:42 +00:00
Viktor Barzin
d870a63130
[ci skip] Reduce healthcheck frequency to 8h, fix apiserver audit duplication bug
Change cluster-healthcheck CronJob from every 30min to every 8h.
Replace fragile sed-based audit config in apiserver manifest with
idempotent Python script that deduplicates by name/mountPath,
preventing the duplicate volume entries that crashed the API server.
2026-02-22 23:18:30 +00:00
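The deduplication described in this commit can be sketched as follows. This is not the script from the repository; it assumes the manifest has already been parsed into lists of dictionaries, and the function names and the `audit-policy` volume used in the usage note are hypothetical.

```python
def dedupe(entries, key):
    """Keep the first entry for each value of `key`, preserving order."""
    seen = set()
    unique = []
    for entry in entries:
        value = entry.get(key)
        if value in seen:
            continue
        seen.add(value)
        unique.append(entry)
    return unique


def ensure_entry(entries, new_entry, key):
    """Return a list that contains `new_entry` exactly once.

    Any existing entries with the same `key` value are dropped first,
    so running this repeatedly yields the same result (idempotent).
    """
    kept = [e for e in dedupe(entries, key) if e.get(key) != new_entry[key]]
    return kept + [new_entry]
```

Under these assumptions, volumes would be keyed by `"name"` and volume mounts by `"mountPath"`, for example `ensure_entry(volumes, {"name": "audit-policy"}, "name")`. Unlike appending text with sed, a second run leaves the list unchanged.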
Viktor Barzin
860077a126
[ci skip] Remove ResourceQuota limits from nvidia and realestate-crawler namespaces
Add resource-governance/custom-quota=true label to both namespaces so
Kyverno skips auto-generating ResourceQuotas that were causing CPU pressure.
2026-02-22 23:14:53 +00:00
Viktor Barzin
cf67e02135
[ci skip] Reduce node config drift: GPU label, OIDC idempotency, node-exporter, rebuild docs
- Add gpu=true label to Terraform (nvidia null_resource alongside taint)
- Improve API server OIDC config to detect value changes, not just flag presence
- Add policy_hash trigger to audit-policy so rule changes auto-reapply
- Enable prometheus-node-exporter sub-chart, delete unused Ansible playbook
- Document full node rebuild procedure in CLAUDE.md
- Save Talos Linux migration evaluation for future reference
2026-02-22 22:59:38 +00:00
Viktor Barzin
ff66adbe9e
[ci skip] Refactor knowledge: CLAUDE.md 881→190 lines, extract reference data
CLAUDE.md changes:
- Extract service catalog + Cloudflare domains → .claude/reference/service-catalog.md
- Extract Proxmox VMs, hardware, network → .claude/reference/proxmox-inventory.md
- Extract GitHub/Drone API patterns → .claude/reference/github-drone-api.md
- Extract Authentik state snapshot → .claude/reference/authentik-state.md
- Remove Init Container pattern (duplicates setup-project skill)
- Remove Poison Fountain service notes (duplicates Anti-AI section)
- Consolidate Authentik section (link to skills + reference)
- Remove resource limit tables (kept tier definitions inline)

Skill merges (37→32):
- helm-release-force-rerender + helm-stuck-release-recovery → helm-release-troubleshooting
- containerd-multi-registry-pull-through-cache + k8s-docker-registry-cache-bypass → k8s-container-image-caching
- (traefik merges in previous commits)
2026-02-22 22:11:31 +00:00
Viktor Barzin
512b7d08a5
[ci skip] Merge 3 Traefik skills into traefik-helm-configuration
Consolidated traefik-http3-quic, traefik-udp-cross-namespace, and
traefik-plugin-download-failure-404 into a single skill with sections
for HTTP/3 (QUIC), UDP cross-namespace routing, and plugin download
failure troubleshooting.
2026-02-22 22:09:26 +00:00
Viktor Barzin
072642d779
[ci skip] Merge 2 rewrite-body skills into traefik-rewrite-body-troubleshooting 2026-02-22 22:09:03 +00:00
Viktor Barzin
88960ba3a4
[ci skip] Rebuild docker-registry with nginx serialization on all ports
Replace individual `docker run` commands with Docker Compose stack managed
by systemd. Nginx now fronts all 5 registry ports (5000/5010/5020/5030/5040)
with proxy_cache_lock to serialize concurrent blob pulls and prevent
corrupt partial responses. Adds QEMU guest agent for remote management.
2026-02-22 21:45:53 +00:00
Viktor Barzin
9488af2397
[ci skip] Add rewrite-body Accept header skill, update NFS skill
New skill: traefik-rewrite-body-accept-header — rewrite-body plugin
silently skips injection when request Accept header doesn't contain
text/html (curl default Accept: */* doesn't match).

Updated: k8s-nfs-mount-troubleshooting v1.1.0 — added variant for
non-root container UID permission denied on NFS writes.
2026-02-22 21:41:07 +00:00
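The Accept header behavior noted in this commit can be illustrated with a small check. This is a sketch of the described behavior only, not the plugin's actual code; the function name `accepts_html` is hypothetical, and matching on parsed media types rather than a raw substring is an assumption.

```python
def accepts_html(accept_header: str) -> bool:
    """Return True only when a media range in the header is literally text/html.

    A wildcard such as */* does not count, which mirrors the behavior
    described in the commit message.
    """
    for part in accept_header.split(","):
        # Strip parameters such as ";q=0.9" before comparing.
        media_type = part.split(";", 1)[0].strip().lower()
        if media_type == "text/html":
            return True
    return False
```

Under this check, a request sent with curl's default `Accept: */*` is skipped, while passing `-H 'Accept: text/html'` makes the request eligible for injection.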
Viktor Barzin
27bbfdc050
[ci skip] update claude knowledge: add anti-AI scraping & poison-fountain docs 2026-02-22 21:36:40 +00:00
Viktor Barzin
32a25e5779
[ci skip] Update .gitignore: exclude terragrunt-generated files
Add backend.tf, providers.tf, .terraform.lock.hcl, config,
and node_modules to gitignore (all generated or sensitive).
2026-02-22 21:30:45 +00:00
Viktor Barzin
f1a27ed2f9
[ci skip] Add Woodpecker CI stack (WIP) and claude agents
- Add stacks/woodpecker/ with Helm-based deployment config
- Add .woodpecker/ CI pipeline configs (default, build-cli, renew-tls)
- Add NFS export entry for woodpecker
- Add .claude/agents/ definitions
2026-02-22 21:30:25 +00:00
Viktor Barzin
bf90abe7c9
[ci skip] Fix poison fetcher: use HTTP/1.1 for upstream (HTTP/2 hangs)
The Poison Fountain upstream (rnsaffn.com/poison2/) doesn't respond
properly over HTTP/2. Force HTTP/1.1 for reliable content fetching.
Also fixed NFS directory permissions for non-root curl container.
2026-02-22 20:42:53 +00:00
Viktor Barzin
b6169b881e
[ci skip] Add poison-fountain Terraform stack (deployment, service, ingress, CronJob) 2026-02-22 19:50:57 +00:00
Viktor Barzin
1ce8c8096e
[ci skip] Add anti_ai_scraping option to ingress_factory (default: true) 2026-02-22 19:50:07 +00:00
Viktor Barzin
a92fbb8ca5
[ci skip] Add anti-AI scraping Traefik middlewares (ForwardAuth, headers, trap links) 2026-02-22 19:49:32 +00:00
Viktor Barzin
178884714f
[ci skip] Add NFS export and DNS record for poison-fountain 2026-02-22 19:47:46 +00:00
Viktor Barzin
b7e7003e7a
[ci skip] Add poison fountain Python service and fetcher script 2026-02-22 19:46:43 +00:00
Viktor Barzin
50daa14a1a
[ci skip] Add anti-AI scraping implementation plan 2026-02-22 19:41:39 +00:00
Viktor Barzin
45c8dfd890
[ci skip] Add anti-AI scraping system design doc 2026-02-22 19:37:29 +00:00
Viktor Barzin
e051d45160
Apply only platform stack in CI (matches old pipeline scope) 2026-02-22 18:59:02 +00:00
Viktor Barzin
550a682548
Use --queue-ignore-errors for CI (infra stack needs Proxmox SSH) 2026-02-22 18:29:27 +00:00
Viktor Barzin
eace95a1a0
Skip infra stack in CI, remove DRONE_IMAGE_CLONE setting 2026-02-22 18:21:10 +00:00
Viktor Barzin
7cc17b1e4e
Add clone retry logic for intermittent DNS failures 2026-02-22 18:10:31 +00:00
Viktor Barzin
b3d55eab90
Retry CI - test DNS resolution 2026-02-22 18:07:28 +00:00
Viktor Barzin
bdf46cef4d
Use manual clone with alpine instead of drone/git (pull-through cache issue) 2026-02-22 18:05:53 +00:00
Viktor Barzin
d4042ea9c5
Test CI with drone/git:linux-amd64 clone image 2026-02-22 18:02:28 +00:00
Viktor Barzin
81d2a9d708
Test CI pipeline with fixed clone image 2026-02-22 17:54:52 +00:00
Viktor Barzin
c54a416c67
Trigger CI build to test updated Drone pipeline 2026-02-22 17:50:47 +00:00
Viktor Barzin
ea77b91c06
Update Drone CI pipeline for Terragrunt stack architecture
Default pipeline now uses terragrunt run --all to apply all stacks
instead of the broken terraform apply -target=module.kubernetes_cluster.
TLS renewal pipeline stripped of unnecessary Terraform download/init
since renew2.sh is pure shell (certbot + Cloudflare DNS).
2026-02-22 17:47:06 +00:00
Viktor Barzin
91fe79de19
[ci skip] Fix Drone clone image: use alpine/git via DRONE_IMAGE_CLONE
The drone/git:latest image was failing to pull through the registry
cache (corrupted blobs, unexpected EOF). Set DRONE_IMAGE_CLONE on the
Kubernetes runner to use alpine/git:latest globally for all pipelines.
2026-02-22 17:35:04 +00:00
Viktor Barzin
a9f96e2e53
[ci skip] Increase authentik ResourceQuota limits
Authentik is a critical auth service that was at 83% CPU/memory
quota utilization. Double all limits to prevent throttling.
2026-02-22 17:28:41 +00:00
Viktor Barzin
534e63c9b8
[ci skip] Remove legacy files and orphaned modules
Delete 20 orphaned module directories and 3 stray files from
modules/kubernetes/ that are no longer referenced by any stack.
Remove 7 root-level legacy files including the empty tfstate,
27MB terraform zip, commented-out main.tf, and migration notes.
Clean up commented-out dockerhub_secret and oauth-proxy references
in blog, travel_blog, and city-guesser stacks. Remove stale
frigate config.yaml entry from .gitignore. Remove ephemeral
docs/plans/ directory.
2026-02-22 15:23:27 +00:00
Viktor Barzin
b692eb0c34
[ci skip] Flatten module wrappers into stack roots
Remove the module "xxx" { source = "./module" } indirection layer
from all 66 service stacks. Resources are now defined directly in
each stack's main.tf instead of through a wrapper module.

- Merge module/main.tf contents into stack main.tf
- Apply variable replacements (var.tier -> local.tiers.X, renamed vars)
- Fix shared module paths (one fewer ../ at each level)
- Move extra files/dirs (factory/, chart_values, subdirs) to stack root
- Update state files to strip module.<name>. prefix
- Update CLAUDE.md to reflect flat structure

Verified: terragrunt plan shows 0 add, 0 destroy across all stacks.
2026-02-22 15:13:55 +00:00
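The state update listed in this commit comes down to renaming resource addresses. A minimal sketch of that renaming, assuming addresses are handled as plain strings; the function names and the `blog` addresses in the usage note are hypothetical, and how the repository actually rewrote its state files is not stated in the commit message.

```python
def strip_module_prefix(address: str, module_name: str) -> str:
    """Remove a leading "module.<name>." from a resource address.

    Addresses that do not start with that prefix are returned unchanged.
    """
    prefix = f"module.{module_name}."
    if address.startswith(prefix):
        return address[len(prefix):]
    return address


def plan_moves(addresses, module_name):
    """Return (old, new) pairs for every address that would change."""
    moves = []
    for old in addresses:
        new = strip_module_prefix(old, module_name)
        if new != old:
            moves.append((old, new))
    return moves
```

For example, `strip_module_prefix("module.blog.kubernetes_deployment.blog", "blog")` returns `"kubernetes_deployment.blog"`. Each resulting pair could in principle be passed to `terraform state mv`, after which a plan showing no adds and no destroys would confirm that only the addresses changed.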
Viktor Barzin
e4367854f4
[ci skip] Update CLAUDE.md for module colocation
Reflect new directory structure where service modules live inside
their stack directories (stacks/<service>/module/) instead of
modules/kubernetes/<service>/. Update file paths, the instructions for
adding a service, and the stack structure documentation.
2026-02-22 14:39:22 +00:00
Viktor Barzin
e225e81ebf
[ci skip] Move Terraform modules into stack directories
Move all 88 service modules (66 individual + 22 platform) from
modules/kubernetes/<service>/ into their corresponding stack directories:

- Service stacks: stacks/<service>/module/
- Platform stack: stacks/platform/modules/<service>/

This collocates module source code with its Terragrunt definition.
Only shared utility modules remain in modules/kubernetes/:
ingress_factory, setup_tls_secret, dockerhub_secret, oauth-proxy.

All cross-references to shared modules updated to use correct
relative paths. Verified with terragrunt run --all -- plan:
0 adds, 0 destroys across all 68 stacks.
2026-02-22 14:38:14 +00:00
Viktor Barzin
73cb696f12
[ci skip] Update CLAUDE.md for Terragrunt migration 2026-02-22 14:12:37 +00:00
Viktor Barzin
ae2bd9a9d8
[ci skip] Fix variable type mismatches in owntracks, ollama, tandoor stacks
- owntracks_credentials: string -> map(string)
- ollama_api_credentials: string -> map(string)
- tandoor_email_password: add default="" (not in tfvars)
2026-02-22 14:07:33 +00:00