infra

Author	SHA1	Message	Date
Viktor Barzin	e81e836d3a	[setup-tls-secret] Delete deprecated renew.sh with hardcoded Technitium token ## Context modules/kubernetes/setup_tls_secret/renew.sh is a 2.5-year-old expect(1) script for manual Let's Encrypt wildcard-cert renewal via Technitium DNS TXT-record challenges. It hardcodes a 64-char Technitium API token on line 7 (as an expect variable) and line 27 (inside a certbot-cleanup heredoc). Both remotes are public, so the token has been exposed for ~2.5 years. The script is not invoked by the module's Terraform (main.tf only creates a kubernetes.io/tls Secret from PEM files); it is a standalone run-it-yourself tool. grep across the repo confirms nothing references `renew.sh` — neither the 20+ stacks that consume the `setup_tls_secret` module, nor any CI pipeline, nor any shell wrapper. A replacement script `renew2.sh` (4 weeks old) lives alongside it. It sources the Technitium token from `$TECHNITIUM_API_KEY` env var and also supports Cloudflare DNS-01 challenges via `$CLOUDFLARE_TOKEN`. It is the current renewal path. ## This change - git rm modules/kubernetes/setup_tls_secret/renew.sh ## What is NOT in this change - Technitium token rotation. The leaked token still works against `technitium-web.technitium.svc.cluster.local:5380` until revoked in the Technitium admin UI. Rotation is a prerequisite for the upcoming git-history scrub, which will remove the token from every commit via `git filter-repo --replace-text`. - renew2.sh is retained as-is (already env-var-sourced; clean). - The setup_tls_secret module's main.tf is not touched; 20+ consuming stacks keep working. ## Test plan ### Automated $ grep -rn 'renew\.sh' --include='*.tf' --include='*.hcl' \ --include='*.yaml' --include='*.yml' --include='*.sh' (no output — confirms no consumer) $ git grep -n 'e28818f309a9ce7f72f0fcc867a365cf5d57b214751b75e2ef3ea74943ef23be' (no output in HEAD after this commit) ### Manual Verification 1. 
`git show HEAD --stat` shows exactly one deletion: modules/kubernetes/setup_tls_secret/renew.sh | 136 --------- 2. `test ! -e modules/kubernetes/setup_tls_secret/renew.sh` returns true. 3. `renew2.sh` still exists and is executable: ls -la modules/kubernetes/setup_tls_secret/renew2.sh 4. Next cert-renewal run uses renew2.sh with env-var-sourced token; no behavioral regression because renew.sh was never part of the automated flow. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 19:41:08 +00:00
Viktor Barzin	d3be9b50af	[frigate] Remove orphan config.yaml with leaked RTSP passwords ## Context A Frigate configuration file was added to modules/kubernetes/frigate/ in `bcad200a` (2026-04-15, ~2 days ago) as part of a bulk `chore: add untracked stacks, scripts, and agent configs` commit. The file contains 14 inline rtsp://admin:<password>@<host>:554/... URLs, leaking two distinct RTSP passwords for the cameras at 192.168.1.10 (LAN-only) and valchedrym.ddns.net (confirmed reachable from public internet on port 554). Both remotes are public, so the creds have been exposed for ~2 days. Grep across the repo confirms nothing references this config.yaml — the active stacks/frigate/main.tf stack reads its configuration from a persistent volume claim named `frigate-config-encrypted`, not from this file. The file is therefore an orphan from the bulk add, with no production function. ## This change - git rm modules/kubernetes/frigate/config.yaml ## What is NOT in this change - Camera password rotation. The user does not own the cameras; rotation must be coordinated out-of-band with the camera operators. The DDNS camera (valchedrym.ddns.net:554) is internet-reachable, so the leaked password is high-priority to rotate from the device side. - Git-history rewrite. The file plus its leaked strings remain in all commits from `bcad200a` forward. Scheduled to be purged via `git filter-repo --path modules/kubernetes/frigate/config.yaml --invert-paths --replace-text <list>` in the broader remediation pass. - Future Frigate config provisioning. If the stack is re-platformed to source config from Git rather than the PVC, the replacement should go through ExternalSecret + env-var interpolation, not an inline YAML. ## Test plan ### Automated $ grep -rn 'frigate/config\.yaml' --include='*.tf' --include='*.hcl' \ --include='*.yaml' --include='*.yml' --include='*.sh' (no output — confirms orphan status) ### Manual Verification 1. 
`git show HEAD --stat` shows exactly one deletion: modules/kubernetes/frigate/config.yaml | 229 --------------------------------- 2. `test ! -e modules/kubernetes/frigate/config.yaml` returns true. 3. `kubectl -n frigate get pvc frigate-config-encrypted` still shows the PVC bound (unaffected by this change). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 19:39:35 +00:00
Viktor Barzin	b034c868db	[traefik] Remove broken rewrite-body plugin and all rybbit/anti-AI injection The rewrite-body Traefik plugin (both packruler/rewrite-body v1.2.0 and the-ccsn/traefik-plugin-rewritebody v0.1.3) silently fails on Traefik v3.6.12 due to Yaegi interpreter issues with ResponseWriter wrapping. Both plugins load without errors but never inject content. Removed: - rewrite-body plugin download (init container) and registration - strip-accept-encoding middleware (only existed for rewrite-body bug) - anti-ai-trap-links middleware (used rewrite-body for injection) - rybbit_site_id variable from ingress_factory and reverse_proxy factory - rybbit_site_id from 25 service stacks (39 instances) - Per-service rybbit-analytics middleware CRD resources Kept: - compress middleware (entrypoint-level, working correctly) - ai-bot-block middleware (ForwardAuth to bot-block-proxy) - anti-ai-headers middleware (X-Robots-Tag: noai, noimageai) - All CrowdSec, Authentik, rate-limit middleware unchanged Next: Cloudflare Workers with HTMLRewriter for edge-side injection. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 12:41:17 +00:00
Viktor Barzin	66d2d9916b	[infra] Per-ingress external-monitor annotation + actualbudget plan-time fix [ci skip] ## Context Two operational gaps surfaced during a healthcheck sweep today: 1. External monitoring coverage: Only ~13 hostnames (via `cloudflare_proxied_names` in `config.tfvars`) had `[External]` monitors in Uptime Kuma. Any service deployed via `ingress_factory` with `dns_type = "proxied"` auto-created its DNS record but was NOT registered for external probing — so outages like Immich going down externally were invisible until a user complained. 99 of ~125 public ingresses had no external monitor. 2. actualbudget stack unplannable: `count = var.budget_encryption_password != null ? 1 : 0` in `factory/main.tf:152` failed with "Invalid count argument" because the value flows from a `data.kubernetes_secret` whose contents are `(known after apply)` at plan time. Blocked CI applies and drift reconciliation. ## This change ### Per-ingress external-monitor annotation (ingress_factory + reverse_proxy/factory) - New variables `external_monitor` (bool, nullable) + `external_monitor_name` (string, nullable). Default is "follow dns_type" — enabled for any public DNS record (`dns_type != "none"`, covers both proxied and non-proxied so Immich and other direct-A records are also monitored). - Emits two annotations on the Ingress: - `uptime.viktorbarzin.me/external-monitor = "true"` - `uptime.viktorbarzin.me/external-monitor-name = "<label>"` (optional override) ### external-monitor-sync CronJob (uptime-kuma stack) - Discovers targets from live Ingress objects via the K8s API first (filter by annotation), falls back to the legacy `external-monitor-targets` ConfigMap on any API error (zero rollout risk). - New `ServiceAccount` + cluster-wide `ClusterRole`/`ClusterRoleBinding` giving `list`/`get` on `networking.k8s.io/ingresses`. 
- `API_SERVER` now uses the `KUBERNETES_SERVICE_HOST` env var (always injected by K8s) instead of `kubernetes.default.svc` — the search-domain expansion failed in the CronJob pod's DNS config. Verified working: CronJob now logs `Loaded N external monitor targets (source=k8s-api)`. ### actualbudget count-on-unknown refactor - Replaced `count = var.budget_encryption_password != null ? 1 : 0` with two explicit plan-time booleans: `enable_http_api` and `enable_bank_sync`. Values are known at plan; no `-target` workaround needed. - Callers (`stacks/actualbudget/main.tf`) pass `true` explicitly. Runtime behaviour is unchanged — the secret is still consumed via env var. - Also aligned the factory with live state (the 3 budget-* PVCs had been migrated `proxmox-lvm` → `proxmox-lvm-encrypted` outside Terraform): PVC resource renamed `data_proxmox` → `data_encrypted`, storage class updated, orphaned `nfs_data` module removed. State was rm'd + re-imported with matching UIDs, so no data was moved. ## Rollout status (already partially applied in this session) - `stacks/uptime-kuma` applied — SA + RBAC + CronJob changes live; FQDN fix verified - `stacks/actualbudget` applied — budget-{viktor,anca,emo} all 200 OK externally - `stacks/mailserver` + 21 other ingress_factory consumers applied — annotations live - CronJob `external-monitor-sync` latest run: `source=k8s-api`, 26 monitors active (was 13 on the central list) ## Deferred (separate work) - 4 stacks show pre-existing DESTRUCTIVE drift in plan (metallb namespace, claude-memory, rbac, redis) — NOT triggered by this commit but will be by CI's global-file cascade. `[ci skip]` here so those don't auto-apply; they will be fixed manually before the next CI push. - Cleanup of `cloudflare_proxied_names` list once Helm-managed ingresses (authentik, grafana, vault, forgejo) are annotated — separate PR. 
## Test plan ### Automated ``` $ kubectl -n uptime-kuma logs $(kubectl -n uptime-kuma get pods -l job-name -o name | tail -1) Loaded 26 external monitor targets (source=k8s-api) Sync complete: 7 created, 0 deleted, 17 unchanged $ curl -sk -o /dev/null -w "%{http_code}\n" -H "Accept: text/html" \ https://dawarich.viktorbarzin.me/ https://nextcloud.viktorbarzin.me/ \ https://budget-viktor.viktorbarzin.me/ 200 302 200 $ kubectl -n actualbudget get deploy,pvc -l app=budget-viktor deployment.apps/budget-viktor 1/1 1 1 Ready persistentvolumeclaim/budget-viktor-data-encrypted Bound 10Gi RWO proxmox-lvm-encrypted ``` ### Manual Verification 1. Confirm the annotation is present on an ingress_factory ingress: ``` kubectl -n dawarich get ingress dawarich -o \ jsonpath='{.metadata.annotations.uptime\.viktorbarzin\.me/external-monitor}' # Expected: "true" ``` 2. Confirm the new `[External] <name>` monitor appears in Uptime Kuma within 10 min (CronJob interval). For Immich specifically, it will appear after the immich stack is re-applied. 3. Verify actualbudget plan is clean: ``` cd stacks/actualbudget && scripts/tg plan --non-interactive # Expected: no "Invalid count argument" errors ``` Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 10:34:32 +00:00
Viktor Barzin	f8facf44dd	[infra] Fix rewrite-body plugin + cleanup TrueNAS + version bumps ## Context The rewrite-body Traefik plugin (packruler/rewrite-body v1.2.0) silently broke on Traefik v3.6.12 — every service using rybbit analytics or anti-AI injection returned HTTP 200 with "Error 404: Not Found" body. Root cause: middleware specs referenced plugin name `rewrite-body` but Traefik registered it as `traefik-plugin-rewritebody`. Migrated to maintained fork `the-ccsn/traefik-plugin-rewritebody` v0.1.3 which uses the correct plugin name. Also added `lastModified = true` and `methods = ["GET"]` to anti-AI middleware to avoid rewriting non-HTML responses. ## This change - Replace packruler/rewrite-body v1.2.0 with the-ccsn/traefik-plugin-rewritebody v0.1.3 - Fix plugin name in all 3 middleware locations (ingress_factory, reverse-proxy factory, traefik anti-AI) - Remove deprecated TrueNAS cloud sync monitor (VM decommissioned 2026-04-13) - Remove CloudSyncStale/CloudSyncFailing/CloudSyncNeverRun alerts - Fix PrometheusBackupNeverRun alert (for: 48h → 32d to match monthly sidecar schedule) - Bump versions: rybbit v1.0.21→v1.1.0, wealthfolio v1.1.0→v3.2, networking-toolbox 1.1.1→1.6.0, cyberchef v10.24.0→v9.55.0 - MySQL standalone storage_limit 30Gi → 50Gi - beads-server: fix Dolt workbench type casing, remove Authentik on GraphQL endpoint Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 05:51:52 +00:00
Viktor Barzin	b1d152be1f	[infra] Auto-create Cloudflare DNS records from ingress_factory ## Context Deploying new services required manually adding hostnames to cloudflare_proxied_names/cloudflare_non_proxied_names in config.tfvars — a separate file from the service stack. This was frequently forgotten, leaving services unreachable externally. ## This change: - Add `dns_type` parameter to `ingress_factory` and `reverse_proxy/factory` modules. Setting `dns_type = "proxied"` or `"non-proxied"` auto-creates the Cloudflare DNS record (CNAME to tunnel or A/AAAA to public IP). - Simplify cloudflared tunnel from 100 per-hostname rules to wildcard `*.viktorbarzin.me → Traefik`. Traefik still handles host-based routing. - Add global Cloudflare provider via terragrunt.hcl (separate cloudflare_provider.tf with Vault-sourced API key). - Migrate 118 hostnames from centralized config.tfvars to per-service dns_type. 17 hostnames remain centrally managed (Helm ingresses, special cases). - Update docs, AGENTS.md, CLAUDE.md, dns.md runbook. ``` BEFORE AFTER config.tfvars (manual list) stacks/<svc>/main.tf | module "ingress" { v dns_type = "proxied" stacks/cloudflared/ } for_each = list | cloudflare_record auto-creates tunnel per-hostname cloudflare_record + annotation ``` ## What is NOT in this change: - Uptime Kuma monitor migration (still reads from config.tfvars) - 17 remaining centrally-managed hostnames (Helm, special cases) - Removal of allow_overwrite (keep until migration confirmed stable) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-16 13:45:04 +00:00
Viktor Barzin	bcad200a23	chore: add untracked stacks, scripts, and agent configs - New stacks: beads-server, hermes-agent - Terragrunt tiers.tf for infra, phpipam, status-page - Secrets symlinks for vault, phpipam, hermes-agent - Scripts: cluster_manager, image_pull, containerd pullthrough setup - Frigate config, audiblez-web app source, n8n workflows dir - Claude agent: service-upgrade, reference: upgrade-config.json - Removed: claudeception skill, excalidraw empty submodule, temp listings [ci skip] Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-15 09:33:06 +00:00
Viktor Barzin	ea18116da9	fix: NFS outage recovery — migrate to NFSv4, add alerting NFS server restart broke NFSv3 (lockd kernel bug on PVE 6.14). All 52 NFS PVs patched to nfsvers=4, NFSv3 disabled on PVE. Changes: - nfs_volume module: add nfsvers=4 mount option - nfs-csi StorageClass: add nfsvers=4 mount option - dbaas: MySQL serverInstances 3→1, mysql-native-password=ON - monitoring: add NFSCSINodeDown and NFSMountFailures alerts [ci skip] Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 10:28:27 +00:00
Viktor Barzin	878b556179	state(monitoring): update encrypted state	2026-03-29 01:04:11 +02:00
Viktor Barzin	8c6f238697	add default Homepage annotations to ingress_factory for auto-discovery - ingress_factory now injects gethomepage.dev/* annotations on all ingresses (name, group, href, icon) with namespace-to-group mapping - Stacks with explicit annotations override defaults via merge order - New homepage_enabled var allows opt-out for internal-only ingresses - Homepage search widget switched to in-page quicklaunch (Ctrl+K / tap) - Added hideErrors and quicklaunch settings for clean service directory - Result: 116/134 ingresses now discoverable (up from ~30)	2026-03-25 11:00:38 +02:00
Viktor Barzin	2dcb4b7fa4	fix(renew-tls): clean stale _acme-challenge TXT records before certbot 21+ stale TXT records accumulated from previous runs, causing certbot DNS-01 challenge to fail. Now deletes all _acme-challenge records from Cloudflare before certbot creates fresh ones.	2026-03-23 22:32:27 +02:00
Viktor Barzin	250a058627	feat(traefik): add custom error pages with tarampampam/error-pages Deploy error-pages service to show themed error pages instead of raw Traefik 502/503/504 responses. Adds catch-all IngressRoute (priority 1) for 404 on unknown hosts. Only 5xx intercepted to avoid breaking JSON APIs.	2026-03-19 23:14:27 +00:00
Viktor Barzin	1b78e44ab6	[ci skip] fix: add mount_options to nfs_volume PV spec StorageClass mountOptions only apply during dynamic provisioning. Static PVs (created by Terraform) need mount_options set explicitly. Without this, all CSI NFS mounts default to hard,timeo=600 — the exact problem we were trying to fix.	2026-03-02 20:22:47 +00:00
Viktor Barzin	c702fd2565	[ci skip] add NFS CSI driver + nfs_volume shared module - Deploy csi-driver-nfs Helm chart as platform module (nfs-csi) - Create nfs-truenas StorageClass with soft,timeo=30,retrans=3 mount options - Add shared nfs_volume module for PV/PVC boilerplate (modules/kubernetes/nfs_volume/)	2026-03-01 23:38:58 +00:00
Viktor Barzin	7ff3c61bd7	[ci skip] add retry middleware (2 attempts, 100ms) to default ingress chain	2026-03-01 14:35:53 +00:00
Viktor Barzin	006f95337e	[ci skip] Add anti_ai_scraping option to ingress_factory (default: true)	2026-02-22 19:50:07 +00:00
Viktor Barzin	116c4d9c30	[ci skip] Remove legacy files and orphaned modules Delete 20 orphaned module directories and 3 stray files from modules/kubernetes/ that are no longer referenced by any stack. Remove 7 root-level legacy files including the empty tfstate, 27MB terraform zip, commented-out main.tf, and migration notes. Clean up commented-out dockerhub_secret and oauth-proxy references in blog, travel_blog, and city-guesser stacks. Remove stale frigate config.yaml entry from .gitignore. Remove ephemeral docs/plans/ directory.	2026-02-22 15:23:27 +00:00
Viktor Barzin	e6420c7b36	[ci skip] Move Terraform modules into stack directories Move all 88 service modules (66 individual + 22 platform) from modules/kubernetes/<service>/ into their corresponding stack directories: - Service stacks: stacks/<service>/module/ - Platform stack: stacks/platform/modules/<service>/ This collocates module source code with its Terragrunt definition. Only shared utility modules remain in modules/kubernetes/: ingress_factory, setup_tls_secret, dockerhub_secret, oauth-proxy. All cross-references to shared modules updated to use correct relative paths. Verified with terragrunt run --all -- plan: 0 adds, 0 destroys across all 68 stacks.	2026-02-22 14:38:14 +00:00
Viktor Barzin	945a5f35b0	[ci skip] Fix path.root references for git-crypt key in openclaw and drone Modules used filebase64("${path.root}/.git/git-crypt/keys/default") which breaks with Terragrunt since path.root is now stacks/<service>/ instead of repo root. Changed to accept git_crypt_key_base64 variable and resolve the path in the stack wrapper.	2026-02-22 14:01:02 +00:00
Viktor Barzin	71bfdc8e89	[ci skip] Phase 3: Remove migrated service modules from monolith All 66 service modules removed from modules/kubernetes/main.tf (now just a migration notice). The kubernetes_cluster module block removed from root main.tf. All services now managed via stacks/<service>/.	2026-02-22 13:58:07 +00:00
Viktor Barzin	39ce2000cf	[ci skip] Remove 22 platform services from modules/kubernetes/main.tf Migrated to stacks/platform/: metallb, dbaas, redis, traefik, technitium, headscale, authentik, rbac, k8s-portal, crowdsec, monitoring, vaultwarden, reverse-proxy, metrics-server, nvidia, kyverno, uptime-kuma, wireguard, xray, mailserver, cloudflared, infra-maintenance. Also removed null_resource.core_services and all depends_on references to it from the remaining ~66 service modules.	2026-02-22 13:40:45 +00:00
Viktor Barzin	db659b1f7a	[ci skip] Fix dashy OOMKilled and healthcheck DNS false-failure - Add explicit resource limits to dashy (2Gi memory) to prevent OOMKilled during webpack build on startup - Rewrite DNS healthcheck to test from inside the Technitium pod via kubectl exec, since MetalLB virtual IPs aren't reachable from outside the L2 network - Deleted orphaned kured/tls-secret (expired Oct 2025, module disabled, not mounted by kured DaemonSet)	2026-02-22 12:46:12 +00:00
Viktor Barzin	f05bf109c5	[ci skip] Increase Drone CI resource quota to handle concurrent builds Each build pod has 8-10 containers inheriting 1 CPU / 2Gi limits from LimitRange defaults. With 4+ concurrent builds the old quota (48 CPU / 96Gi / 30 pods) was exhausted, blocking new builds. Increase to 64 CPU / 128Gi / 60 pods to safely support 5-6 concurrent builds.	2026-02-22 12:28:42 +00:00
Viktor Barzin	0ff2aaec60	[ci skip] Add native HLS playback for VIPLeague/DaddyLive streams (v1.3.1) - Add HLS proxy (hlsproxy) for rewriting m3u8 playlists and proxying segments with correct Referer/Origin headers (uses ?domain= param) - Add playerconfig service for detecting stream types (VIPLeague, DaddyLive, HLS) and extracting auth params from ksohls pages - Add VIPLeague URL resolution: extract slug from URL path, match against DaddyLive 24/7 channel index with token-based scoring - Replace Clappr with direct HLS.js player for better compatibility - Add CryptoJS CDN for DaddyLive auth module support - Disable CrowdSec on f1-stream ingress to prevent false positives - Bump image to v1.3.1	2026-02-22 01:30:06 +00:00
Viktor Barzin	e59928187b	[ci skip] Set CronJob backoffLimit=0 to prevent duplicate Slack alerts	2026-02-22 00:59:34 +00:00
Viktor Barzin	cd0c030a55	[ci skip] Fix CronJob kubectl image tag to :latest	2026-02-22 00:38:33 +00:00
Viktor Barzin	f79e84c693	[ci skip] Add cluster health check CronJob to OpenClaw module	2026-02-22 00:08:51 +00:00
Viktor Barzin	b925f9caf7	[ci skip] Add Slack webhook env var to OpenClaw deployment	2026-02-21 23:57:34 +00:00
Viktor Barzin	846eb3bd24	[ci skip] Add custom resource quota for authentik namespace Authentik runs ~10 pods (3 server + 3 worker + 3 pgbouncer + outpost) which exceeds the default tier-1-cluster quota limits. Add custom-quota label to opt out of Kyverno-generated quotas and define a Terraform-managed ResourceQuota with limits appropriate for authentik's workload.	2026-02-21 23:44:05 +00:00
Viktor Barzin	d345841ef2	[ci skip] Add tier labels to all namespace resources for Kyverno resource governance Added `tier = var.tier` to kubernetes_namespace labels in ~73 service modules. This enables Kyverno to generate LimitRange defaults, ResourceQuotas, and PriorityClass injection for all namespaces. Previously only 11 namespaces had tier labels; now all 80 active namespaces are labeled. All pods restarted in rolling waves to pick up the new policies.	2026-02-21 23:38:05 +00:00
Viktor Barzin	517f5d6a6c	[ci skip] Increase tier-based resource quotas to prevent quota exhaustion Tier 2-gpu: 32→48 CPU limits, 64→96Gi mem limits, 30→40 pods Tier 3-edge: 2→4 req CPU, 8→16 CPU limits, 16→32Gi mem limits, 20→30 pods Tier 4-aux: 1→2 req CPU, 4→8 CPU limits, 8→16Gi mem limits, 15→20 pods Fixes realestate-crawler (100% quota), nvidia (89.7%), resume/website (75%), and actualbudget (75%) quota exhaustion causing pod creation failures.	2026-02-21 23:26:00 +00:00
Viktor Barzin	ce31571a9f	[ci skip] Fix JS shim rw() routing non-proxy paths through proxy prefix When upstream JS constructs URLs via location.origin + '/path', the rw() function stripped the origin but returned bare '/path' which hit our server's HTML index. Now correctly prefixes with /proxy/{b64origin} so XHR/fetch requests for scripts reach the upstream via proxy. Bump image to v1.2.7	2026-02-21 23:16:09 +00:00
Viktor Barzin	8562ed1b8f	[ci skip] Fix video playback and comprehensive anti-debug neutralization Video: - Add allow="autoplay; encrypted-media; fullscreen" to iframe for media playback Anti-debug: - Strip ad/popup scripts (acscdn, popunder) and context menu blockers from HTML - Strip debugger statements from inline HTML scripts and proxied JS responses - Intercept setTimeout (not just setInterval) for debugger-based detection - Override eval() and Function() constructor to strip debugger statements - Bump image to v1.2.6	2026-02-21 23:12:11 +00:00
Viktor Barzin	642e774b62	[ci skip] Fix Kyverno priority injection to remove default priority/preemptionPolicy The priority injection policy was setting priorityClassName on pods but Kubernetes had already defaulted priority=0 and preemptionPolicy=PreemptLowerPriority on those pods, causing admission controller to reject the mismatch. Switch from patchStrategicMerge to patchesJson6902 to explicitly remove the priority and preemptionPolicy fields before setting priorityClassName.	2026-02-21 23:11:35 +00:00
Viktor Barzin	fc0e1c3c6e	[ci skip] Fix narrow iframe content and strip anti-debug scripts in proxy - Remove flex centering from browser-viewer-content; use absolute positioning for iframe to fill the entire container - Strip disable-devtool and devtools-detect script tags from proxied HTML - Add JS shim hooks to neutralize setInterval-based debugger traps and block loading of anti-debug scripts via setAttribute - Bump image to v1.2.5	2026-02-21 21:32:39 +00:00
Viktor Barzin	0c2c48802f	[ci skip] Sandbox proxy iframe to prevent frame-busting Add sandbox attribute to prevent proxied pages from navigating top.location or replacing the parent page body. Allows scripts, same-origin, forms, popups, and presentation but blocks top-navigation.	2026-02-21 21:25:51 +00:00
Viktor Barzin	7a444b43fa	[ci skip] Add reverse proxy mode to f1-stream Replace CPU-intensive headless Chrome + WebRTC pipeline with a lightweight Go reverse proxy that strips anti-framing headers (X-Frame-Options, CSP) and embeds streaming sites in iframes. - New internal/proxy package with URL rewriting for HTML/CSS - JS shim injection to intercept fetch/XHR/WebSocket/createElement - Referer reconstruction for correct cross-origin auth (HLS streams) - Inline iframe viewer preserving site navigation (not fullscreen overlay)	2026-02-21 21:23:21 +00:00
Viktor Barzin	2446fec1f6	[ci skip] Fix whiteboard priority class mismatch and OnlyOffice OOMKill - Add priority_class_name to nextcloud whiteboard deployment to match Kyverno-injected tier-3-edge priority class - Add explicit resource limits (4Gi memory) for OnlyOffice document server to prevent OOMKill during font generation	2026-02-21 21:22:03 +00:00
Viktor Barzin	26ba9ea371	[ci skip] Fix Prometheus storage alert and Grafana quota exhaustion - Enable size-based TSDB retention (45GB) to clean up old blocks (including 2021-era blocks with failed compaction) - Increase monitoring namespace quota from 64/128Gi to 80/160Gi CPU/memory limits to allow Grafana rolling updates	2026-02-21 21:04:08 +00:00
Viktor Barzin	dcce738641	[ci skip] Bump inotify max_user_instances from 512 to 8192 Fixes "failed to create fsnotify watcher: too many open files" in Drone CI builds where vitest exhausts the default inotify instance limit.	2026-02-21 20:21:04 +00:00
Viktor Barzin	de9c0869ba	[ci skip] Fix CrowdSec pods failing due to priority class mismatch Kyverno injects priorityClassName tier-1-cluster on pods in the crowdsec namespace, but pods had no explicit priorityClassName set, defaulting priority to 0. Admission controller rejected the mismatch (0 vs 800000). Set priorityClassName on LAPI, agent (Helm values) and crowdsec-web (Terraform deployment).	2026-02-21 19:18:15 +00:00
Viktor Barzin	a9e5320427	[ci skip] Disable grampsweb service and remove family DNS record	2026-02-21 18:55:54 +00:00
Viktor Barzin	de1a43a3c7	[ci skip] Add coturn TURN/STUN server for WebRTC relay - Deploy coturn on k8s with MetalLB shared IP (10.0.20.200) - Normal pod networking (no hostNetwork), runs on any node - 100 relay ports (49152-49252), port 3478 for STUN/TURN signaling - Shared secret auth for time-limited TURN credentials - For F1 streaming WebRTC NAT traversal	2026-02-21 18:08:01 +00:00
Viktor Barzin	5fe288a4e4	[ci skip] Real estate crawler: 2 replicas for UI/API, rolling update for celery - UI and API: 1 → 2 replicas for zero-downtime during restarts/crashes - Celery worker: Recreate → RollingUpdate strategy - Celery beat: unchanged (Recreate, singleton scheduler) - Move f1 from Cloudflare proxied to non-proxied DNS	2026-02-21 17:32:45 +00:00
Viktor Barzin	2298459496	[ci skip] Use versioned image tag for f1-stream to bypass stale cache Pull-through cache on registry VM served stale arm64-only manifest for :latest tag. Switch to v1.0.0 tag so cache fetches the fresh amd64 image.	2026-02-21 16:07:58 +00:00
Viktor Barzin	2fe7fa547c	[ci skip] Configure f1-stream: WebAuthn, NFS storage, headless browser - Set WEBAUTHN_RPID/ORIGIN for f1.viktorbarzin.me domain - Add NFS volume at /mnt/main/f1-stream for persistent session/stream data - Enable headless browser extraction (HEADLESS_EXTRACT_ENABLED=true) - Reduce replicas to 1 (file-based sessions don't work across replicas)	2026-02-21 15:57:25 +00:00
Viktor Barzin	a5e0b19a3a	[ci skip] Fix f1-stream port mismatch: container listens on 8080, not 80	2026-02-21 15:42:47 +00:00
Viktor Barzin	8756bcfb9a	[ci skip] Increase Drone CI namespace resource quota Double CPU and memory limits to give CI pipelines more headroom.	2026-02-21 14:49:16 +00:00
Viktor Barzin	144e9b3e39	[ci skip] Add Kyverno policy to inject ndots:2 on all pods Reduces the NXDOMAIN query flood caused by the Kubernetes default ndots:5 search-domain expansion. 78% of DNS queries were wasted NXDOMAIN lookups.	2026-02-20 00:21:03 +00:00
Viktor Barzin	5df615c31d	[ci skip] Add Modal GLM-5 model to OpenClaw, fix streaming and download reliability - Add modal provider (GLM-5-FP8) as primary model with non-streaming mode (GLM-5 uses non-standard reasoning_content field incompatible with streaming) - Add curl --retry flags to init container downloads for reliability - Fallback chain: GLM-5 → Gemini 2.5 Flash → Llama 3.3 70B	2026-02-19 23:17:08 +00:00
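
The annotation-driven discovery described in commit 66d2d9916b (read live Ingress objects, keep those annotated `uptime.viktorbarzin.me/external-monitor = "true"`, and fall back to the legacy ConfigMap list on any API error) can be sketched roughly as below. This is only an illustration, not the CronJob's actual script: the function names, the shape of a target (`name`/`url`), the use of the hostname as the default label, and the `"configmap"` source string are all assumptions. Only the two annotation keys, the `[External] <name>` label format, and the `k8s-api` source value come from the commit message.

```python
from typing import Callable, Iterable

# Annotation keys as named in the commit message.
ANNOTATION = "uptime.viktorbarzin.me/external-monitor"
NAME_ANNOTATION = "uptime.viktorbarzin.me/external-monitor-name"


def targets_from_ingresses(ingresses: Iterable[dict]) -> list[dict]:
    """Return one monitor target per host of every opted-in Ingress."""
    targets = []
    for ingress in ingresses:
        annotations = ingress.get("metadata", {}).get("annotations") or {}
        if annotations.get(ANNOTATION) != "true":
            continue  # Ingress did not opt in to external monitoring.
        for rule in ingress.get("spec", {}).get("rules") or []:
            host = rule.get("host")
            if not host:
                continue
            # Assumption: use the hostname when no name override is set.
            label = annotations.get(NAME_ANNOTATION) or host
            targets.append({"name": f"[External] {label}", "url": f"https://{host}/"})
    return targets


def load_targets(
    list_ingresses: Callable[[], Iterable[dict]],
    legacy_targets: Iterable[dict],
) -> tuple[list[dict], str]:
    """Prefer live Ingress objects; fall back to the legacy list on any error."""
    try:
        return targets_from_ingresses(list_ingresses()), "k8s-api"
    except Exception:
        # Hypothetical source label for the ConfigMap fallback path.
        return list(legacy_targets), "configmap"
```

Passing the Ingress lister in as a callable keeps the filtering logic testable without a cluster; a real job would supply a function that calls the Kubernetes API with the ServiceAccount token.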

1056 commits