infra

Author	SHA1	Message	Date
Viktor Barzin	71d4801cca	[ci skip] audiblez-web: switch from digest to tag for CI-driven deploys Woodpecker CI pipeline now pushes tagged images and patches the deployment with the build number tag. Using :latest as the Terraform baseline so CI can override with specific build tags.	2026-02-28 14:17:19 +00:00
Viktor Barzin	0274cc0722	[ci skip] technitium: add primary-secondary DNS HA with AXFR zone replication Secondary instance on a separate node replicates all zones from primary via zone transfer. LoadBalancer routes DNS queries to both pods. PDB ensures at least 1 DNS pod survives voluntary disruptions. Setup job automates zone transfer enablement and secondary zone creation via Technitium REST API.	2026-02-28 14:14:20 +00:00
Viktor Barzin	3ebf4557f5	[ci skip] update claude knowledge: never restart NFS, NFS export dir prereq	2026-02-28 12:20:36 +00:00
Viktor Barzin	69c4c0c76e	[ci skip] VPA: reduce LimitRange defaults, add overcommit check, protect tier-0 - Reduce Kyverno LimitRange default limits ~4x across all tiers to fix 800-900% memory overcommitment on worker nodes - Add cluster health check #25: per-node resource overcommitment showing requests and limits vs allocatable capacity - Add Kyverno policy for Goldilocks VPA mode by tier: tier-0 namespaces get VPA Off mode (recommend only, no evictions) to prevent downtime on critical infra (traefik, cloudflared, authentik, technitium, etc.) - Non-tier-0 namespaces get VPA Auto mode for active right-sizing	2026-02-26 23:15:43 +00:00
Viktor Barzin	250f805c32	[ci skip] Deploy VPA + Goldilocks for dynamic resource right-sizing Add Vertical Pod Autoscaler (recommender, updater, admission-controller) and Goldilocks dashboard to monitor resource recommendations across all namespaces. Dashboard at goldilocks.viktorbarzin.me behind Authentik.	2026-02-25 21:54:01 +00:00
Viktor Barzin	f1d90ff840	[ci skip] poison-fountain: fix single point of failure causing transient service outages - Scale to 2 replicas with RollingUpdate (maxUnavailable=0) - Add topology spread constraint to place pods on different nodes - Switch from single-threaded to ThreadingMixIn HTTP server so tarpit slow-drip requests no longer block /auth and /healthz endpoints	2026-02-25 21:05:14 +00:00
Viktor Barzin	071a1a1d93	[ci skip] rybbit: increase clickhouse memory limit to fix OOMKilled crash loop	2026-02-25 20:53:08 +00:00
Viktor Barzin	7bc975aa16	[ci skip] kyverno: scale to 2 replicas, eliminate API calls from policies - Scale admission controller to 2 replicas with topology spread across nodes - Rewrite inject-priority-class-from-tier: use namespaceSelector instead of API call per pod admission (eliminates Kyverno→API server round-trip) - Rewrite sync-tier-label-from-namespace: same namespaceSelector approach - Extract governance_tiers local to DRY up tier definitions	2026-02-24 23:09:56 +00:00
Viktor Barzin	dcb465a7e5	[ci skip] Fix Woodpecker GitHub forge: add explicit GITHUB_URL to prevent Forgejo URL bleed When both WOODPECKER_GITHUB and WOODPECKER_FORGEJO are enabled without an explicit WOODPECKER_GITHUB_URL, the GitHub forge inherits the Forgejo URL causing all GitHub API calls to hit forgejo.viktorbarzin.me with GitHub OAuth credentials, resulting in 401 Unauthorized on repo add and cron jobs. Also adds Forgejo forge variables to Terraform.	2026-02-24 23:02:33 +00:00
Viktor Barzin	e7e4faa57a	[ci skip] kyverno: fix crash loop — failurePolicy Ignore, increase memory, pin chart Admission controller was restarting every ~5min due to API server timeouts causing leader election loss. failurePolicy:Fail meant the webhook blocked all pod creation cluster-wide when Kyverno was unavailable.	2026-02-24 23:00:45 +00:00
Viktor Barzin	c35bef2fd8	[ci skip] fix cluster health: GPU tolerations, actualbudget nfs_server, AuthentikDown alert - Add missing nvidia.com/gpu toleration to ollama and yt-highlights deployments - Add node_selector gpu=true to ollama deployment - Pass nfs_server variable through to actualbudget factory modules - Fix AuthentikDown alert to match actual deployment name (goauthentik-server)	2026-02-24 22:55:58 +00:00
Viktor Barzin	4fab38da1f	[ci skip] wrongmove dashboard: add per-path latency table, fix layout, sort top offenders Add "Per-Path Latency Breakdown" table with p50/p95/p99 and request rate per endpoint. Fix bar gauge position to sit next to timeseries. Add sort transformation to "Top Offenders (Avg Duration)" panel.	2026-02-24 22:31:41 +00:00
Viktor Barzin	87a8ea6938	[ci skip] f1-stream: update project state - all 8 phases complete	2026-02-24 00:28:54 +00:00
Viktor Barzin	a3f66c88fd	[ci skip] f1-stream: use v5.0.0 tag to bypass stale pull-through cache	2026-02-24 00:28:12 +00:00
Viktor Barzin	540fffdf3f	f1-stream: fix frontend routing with catch-all handler for SvelteKit SPA	2026-02-24 00:18:28 +00:00
Viktor Barzin	c9b187ed65	f1-stream: fix SvelteKit routing - add trailingSlash for static adapter	2026-02-24 00:11:44 +00:00
Viktor Barzin	d48551609d	f1-stream: add pydantic dependency and trigger CI build	2026-02-24 00:00:02 +00:00
Viktor Barzin	9fd788b158	[ci skip] f1-stream: add CDN token refresh, SvelteKit frontend, multi-stream layout (Phases 6-8) - Phase 6: CDN token lifecycle with 3-strategy URL matching and periodic refresh - Phase 7: SvelteKit 2/Svelte 5 frontend with schedule calendar and hls.js player - Phase 8: Multi-stream layout supporting up to 4 simultaneous HLS streams - Update Dockerfile to multi-stage build (Node.js frontend + Python backend) - Switch deployment to :latest tag with Always pull policy for CI-driven deploys - Update Woodpecker CI to use explicit latest tag	2026-02-23 23:59:35 +00:00
Viktor Barzin	6867036087	[ci skip] f1-stream: add stream health checker and HLS proxy (Phases 4-5) Phase 4 - Stream Health and Fallback: - StreamHealthChecker with partial GET validation of m3u8 content - Bitrate extraction from BANDWIDTH tags - Response time measurement for quality ranking - Fallback ordering: live first, fastest response time first - GET /streams now only returns health-verified streams Phase 5 - HLS Proxy Core: - GET /proxy?url= - m3u8 playlist fetch with full URI rewriting - GET /relay?url= - chunked segment relay (never buffers full segment) - m3u8 rewriter handles master, variant, and segment URIs - Base64url encoding for URL parameters - CORS middleware for browser playback - Range header forwarding for seeking support	2026-02-23 23:41:16 +00:00
Viktor Barzin	a9a4ac37a2	[ci skip] trim CLAUDE.md: remove discoverable info, deduplicate	2026-02-23 23:10:13 +00:00
Viktor Barzin	d15337e838	[ci skip] f1-stream: add extractor framework with demo streams (Phase 3) - BaseExtractor ABC with health_check method - ExtractorRegistry with concurrent fan-out extraction - ExtractionService with in-memory cache and background polling - DemoExtractor with 3 public HLS test streams - Adaptive polling: 5min during live sessions, 30min otherwise - GET /streams, GET /extractors, POST /extract endpoints	2026-02-23 23:02:56 +00:00
Viktor Barzin	461e355a5d	[ci skip] f1-stream: add gitignore for __pycache__, remove committed .pyc	2026-02-23 22:55:38 +00:00
Viktor Barzin	becf56a013	[ci skip] f1-stream: add F1 schedule subsystem (Phase 2) - Fetch 2026 F1 race calendar from jolpica API with all sessions (FP1-3, Qualifying, Sprint, Race) and UTC timestamps - Persist schedule to NFS as JSON, load on startup if fresh - APScheduler daily refresh at 03:00 UTC - GET /schedule endpoint with live/upcoming/past session status - POST /schedule/refresh for manual refresh trigger	2026-02-23 22:55:13 +00:00
Viktor Barzin	f3bcd95242	[ci skip] f1-stream: replace Go service with Python/FastAPI skeleton Replaces the existing Go-based f1-stream service with a new Python/FastAPI backend as the foundation for the rebuilt F1 streaming aggregation service. - New FastAPI backend with health and root endpoints - Python 3.13 slim Dockerfile (replaces Go multi-stage build) - Updated Terraform deployment (port 8000, reduced resources) - Buildx-based redeploy.sh with --platform linux/amd64 - Added Woodpecker CI pipeline for automated builds - Removed all old Go source, node_modules, static assets	2026-02-23 22:47:06 +00:00
Viktor Barzin	c5a4c6e97b	[ci skip] fix plotting-book: add SESSION_SECRET env var Session secret stored in encrypted terraform.tfvars, referenced via variable to avoid committing secrets in plain text.	2026-02-23 22:44:04 +00:00
Viktor Barzin	0a1d53b6dd	[ci skip] platform: add ndots=2 dns_config to all deployment pod specs Prevents Terraform from reverting the Kyverno inject-ndots mutation on every apply. 23 pod specs across 19 platform module files.	2026-02-23 22:43:05 +00:00
Viktor Barzin	a0df23f565	[ci skip] monitoring: increase resource quota limits Bump limits.cpu 80→120 and limits.memory 160Gi→240Gi to provide headroom. Previous values were at 87% and 92% utilization.	2026-02-23 22:42:30 +00:00
Viktor Barzin	83cc053742	[ci skip] fix redis OOMKilled: increase memory limits to 2Gi Redis was CrashLoopBackOff due to OOMKilled - 512Mi limit was insufficient for 650MB RDB dataset plus redis-stack modules.	2026-02-23 22:37:56 +00:00
Viktor Barzin	834d86e0f8	[ci skip] add trading-bot Terraform stack	2026-02-23 22:29:59 +00:00
Viktor Barzin	d57185e262	docs(01-infrastructure-and-deployment): create phase plan	2026-02-23 22:28:54 +00:00
Viktor Barzin	909c28cf4b	docs: create roadmap (8 phases)	2026-02-23 22:22:53 +00:00
Viktor Barzin	8fe7c4967a	docs: define v1 requirements	2026-02-23 22:13:53 +00:00
Viktor Barzin	c61c1744de	[ci skip] update claude knowledge: infrastructure hardening changes - NFS volumes now use var.nfs_server (not hardcoded IP) - Shared infra variables documented (redis_host, postgresql_host, etc.) - Tiers locals now generated by terragrunt.hcl, not duplicated in stacks - Traefik security hardening documented (API, headers, rate limiting) - Kyverno pod security policies documented (audit mode) - Prometheus alert groups updated (Critical Services, PVPredictedFull) - Loki retention updated to 30d, Alloy memory to 512Mi/1Gi - Grampsweb now protected by Authentik - MeshCentral registration disabled	2026-02-23 22:08:46 +00:00
Viktor Barzin	36fd424107	docs: complete project research	2026-02-23 22:06:23 +00:00
Viktor Barzin	89a6e08245	[ci skip] Infrastructure hardening: security, monitoring, reliability, maintainability Phase 1 - Critical Security: - Netbox: move hardcoded DB/superuser passwords to variables - MeshCentral: disable public registration, add Authentik auth - Traefik: disable insecure API dashboard (api.insecure=false) - Traefik: configure forwarded headers with Cloudflare trusted IPs Phase 2 - Security Hardening: - Add security headers middleware (HSTS, X-Frame-Options, nosniff, etc.) - Add Kyverno pod security policies in audit mode (privileged, host namespaces, SYS_ADMIN, trusted registries) - Tighten rate limiting (avg=10, burst=50) - Add Authentik protection to grampsweb Phase 3 - Monitoring & Alerting: - Add critical service alerts (PostgreSQL, MySQL, Redis, Headscale, Authentik, Loki) - Increase Loki retention from 7 to 30 days (720h) - Add predictive PV filling alert (predict_linear) - Re-enable Hackmd and Privatebin down alerts Phase 4 - Reliability: - Add resource requests/limits to Redis, DBaaS, Technitium, Headscale, Vaultwarden, Uptime Kuma - Increase Alloy DaemonSet memory to 512Mi/1Gi Phase 6 - Maintainability: - Extract duplicated tiers locals to terragrunt.hcl generate block (removed from 67 stacks) - Replace hardcoded NFS IP 10.0.10.15 with var.nfs_server (114 instances across 63 files) - Replace hardcoded Redis/PostgreSQL/MySQL/Ollama/mail host references with variables across ~35 stacks - Migrate xray raw ingress resources to ingress_factory modules	2026-02-23 22:05:28 +00:00
Viktor Barzin	1b4737c90c	Reorder realestate-crawler Grafana dashboard sections Move API Performance and Per-Endpoint Latency to the top. Move Scraping Overview, Scraping Activity, and Throttling & Errors to the bottom. Keeps the most operationally relevant panels visible first.	2026-02-23 22:03:27 +00:00
Viktor Barzin	e44e861ec2	chore: add project config	2026-02-23 21:55:25 +00:00
Viktor Barzin	3f3c5ffaa1	docs: initialize project	2026-02-23 21:53:51 +00:00
Viktor Barzin	5fdd9d7f04	Sync realestate-crawler Grafana dashboard with per-endpoint latency panels	2026-02-23 21:31:01 +00:00
Viktor Barzin	15157b50a2	[ci skip] mailserver: fix Rspamd DKIM signing key path Mount DKIM private key at Rspamd-expected path (/tmp/docker-mailserver/rspamd/dkim/viktorbarzin.me/mail.private) and add dkim_signing.conf override for domain/selector config. Rspamd does not auto-detect keys from the OpenDKIM path.	2026-02-23 21:01:29 +00:00
Viktor Barzin	c8e9c41afc	docs: map existing codebase	2026-02-23 20:54:27 +00:00
Viktor Barzin	275eb5aec8	[ci skip] mailserver: tighten DMARC policy to quarantine Move DMARC enforcement from p=none (monitoring only) to p=quarantine so spoofed emails from viktorbarzin.me are quarantined by recipients.	2026-02-23 20:30:30 +00:00
Viktor Barzin	00e1682ec8	[ci skip] mailserver: add Postfix rate limiting Add connection and message rate limits to protect against brute-force attacks on SMTP/IMAP ports. 10 connections and 30 messages per minute per client IP.	2026-02-23 20:29:45 +00:00
Viktor Barzin	ed6d505433	[ci skip] roundcubemail: pin to 1.6-apache, disable debug logging Pin Roundcubemail to stable 1.6-apache tag instead of :latest to prevent unexpected breakage. Disable SMTP debug and reduce debug level from 6 to 1 for production use.	2026-02-23 20:29:39 +00:00
Viktor Barzin	b0aaa7b813	[ci skip] monitoring: enable mailserver-down Prometheus alert Uncomment the mailserver availability alert so we get paged if the mail server pod has no available replicas for 5 minutes.	2026-02-23 20:29:33 +00:00
Viktor Barzin	491f9f4d49	[ci skip] mailserver: enable Rspamd, disable OpenDKIM Enable Rspamd for spam filtering and DKIM signing, replacing OpenDKIM. Rspamd reads existing DKIM keys from the same mount path.	2026-02-23 20:29:32 +00:00
Viktor Barzin	65ca327ed0	Sync realestate-crawler dashboard with navigation & usage metrics panels	2026-02-23 20:28:55 +00:00
Viktor Barzin	d041459ef2	[ci skip] Upgrade Woodpecker CI v3.5.1 → v3.13.0, fix helm healthcheck for v4	2026-02-23 20:14:30 +00:00
Viktor Barzin	c8de2c4803	[ci skip] Sunset Drone CI: remove all artifacts, DNS, configs, and references Drone CI has been fully replaced by Woodpecker CI at ci.viktorbarzin.me. Destroys K8s resources (12), removes DNS records, NFS exports, Uptime Kuma monitor, dashboard entry, and all code/doc references across 18 files.	2026-02-23 19:38:55 +00:00
Viktor Barzin	ebecaaee5c	Woodpecker CI: use built-in clone, fix CoreDNS DNS resolution [CI SKIP] - Switch from custom clone override to woodpeckerci/plugin-git built-in clone (handles auth automatically via netrc from GitHub OAuth token) - Add 8.8.8.8 and 1.1.1.1 as CoreDNS upstream resolvers alongside pfSense (fixes intermittent DNS timeouts causing clone failures) - Fix missing comma after heredoc in audit-policy.tf (syntax error)	2026-02-23 00:08:42 +00:00

1402 commits
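
Commit 6867036087 describes the f1-stream fallback ordering as "live first, fastest response time first". The sketch below shows one way such an ordering could be expressed in Python. It is not taken from the repository: the `StreamHealth` record, its field names, the `fallback_order` function, and the example URLs are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamHealth:
    """Hypothetical record of one health-checked stream (not the project's real type)."""
    url: str
    is_live: bool
    response_time_ms: float


def fallback_order(streams: list[StreamHealth]) -> list[StreamHealth]:
    """Sort live streams ahead of non-live ones, then by ascending response time."""
    # `not s.is_live` is False for live streams, and False sorts before True,
    # so live streams come first; ties are broken by the faster response time.
    return sorted(streams, key=lambda s: (not s.is_live, s.response_time_ms))


if __name__ == "__main__":
    # Placeholder URLs; real candidates would come from the health checker.
    candidates = [
        StreamHealth("https://example.invalid/a.m3u8", is_live=False, response_time_ms=40.0),
        StreamHealth("https://example.invalid/b.m3u8", is_live=True, response_time_ms=310.0),
        StreamHealth("https://example.invalid/c.m3u8", is_live=True, response_time_ms=120.0),
    ]
    for stream in fallback_order(candidates):
        print(stream.url, stream.is_live, stream.response_time_ms)
```

A tuple sort key keeps both criteria in one pass, so a slow live stream still ranks above a fast stream that is not live, which matches the wording of the commit message. How the actual service ranks streams may differ.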