Add a sandbox attribute to prevent proxied pages from setting
top.location or replacing the parent page body. Allows scripts,
same-origin, forms, popups, and presentation, but blocks
top-navigation.
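A sketch of the viewer-side markup; the src is illustrative, and allow-top-navigation is deliberately omitted:

```html
<!-- without allow-top-navigation the embedded page cannot redirect
     the parent window or replace its body -->
<iframe src="https://proxy.example/stream"
        sandbox="allow-scripts allow-same-origin allow-forms allow-popups allow-presentation">
</iframe>
```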
Replace CPU-intensive headless Chrome + WebRTC pipeline with a
lightweight Go reverse proxy that strips anti-framing headers
(X-Frame-Options, CSP) and embeds streaming sites in iframes.
- New internal/proxy package with URL rewriting for HTML/CSS
- JS shim injection to intercept fetch/XHR/WebSocket/createElement
- Referer reconstruction for correct cross-origin auth (HLS streams)
- Inline iframe viewer preserving site navigation (not fullscreen overlay)
- Add priority_class_name to nextcloud whiteboard deployment to match
Kyverno-injected tier-3-edge priority class
- Add explicit resource limits (4Gi memory) for OnlyOffice document
server to prevent OOMKill during font generation
- Enable size-based TSDB retention (45GB) to clean up old blocks
(including 2021-era blocks with failed compaction)
- Increase monitoring namespace quota limits from 64 CPU / 128Gi memory
  to 80 CPU / 160Gi to allow Grafana rolling updates
Kyverno injects priorityClassName tier-1-cluster on pods in the crowdsec
namespace, but pods had no explicit priorityClassName set, defaulting
priority to 0. Admission controller rejected the mismatch (0 vs 800000).
Set priorityClassName on LAPI, agent (Helm values) and crowdsec-web
(Terraform deployment).
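The fix amounts to an explicit priorityClassName on each pod spec matching what Kyverno injects, e.g.:

```yaml
spec:
  # must equal the Kyverno-injected value, otherwise the admission
  # controller rejects the mutation as a priority mismatch (0 vs 800000)
  priorityClassName: tier-1-cluster
```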
- Deploy coturn on k8s with MetalLB shared IP (10.0.20.200)
- Normal pod networking (no hostNetwork), runs on any node
- 100 relay ports (49152-49252), port 3478 for STUN/TURN signaling
- Shared secret auth for time-limited TURN credentials
- For F1 streaming WebRTC NAT traversal
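A sketch of the Service side using MetalLB's shared-IP annotation; the Service name and the single-port listing are illustrative:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: coturn-udp
  annotations:
    # Services carrying the same sharing key may bind the same external IP
    metallb.universe.tf/allow-shared-ip: coturn
spec:
  type: LoadBalancer
  loadBalancerIP: 10.0.20.200
  ports:
    - name: stun-turn
      port: 3478
      protocol: UDP
```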
- UI and API: 1 → 2 replicas for zero-downtime during restarts/crashes
- Celery worker: Recreate → RollingUpdate strategy
- Celery beat: unchanged (Recreate, singleton scheduler)
- Move f1 from Cloudflare proxied to non-proxied DNS
- Set WEBAUTHN_RPID/ORIGIN for f1.viktorbarzin.me domain
- Add NFS volume at /mnt/main/f1-stream for persistent session/stream data
- Enable headless browser extraction (HEADLESS_EXTRACT_ENABLED=true)
- Reduce replicas to 1 (file-based sessions don't work across replicas)
Fixes HA mobile app 403 when embedding Grafana dashboards - the webview
blocks third-party cookies needed by Authentik forward auth. Grafana
already has anonymous Viewer access enabled, so forward auth is not
needed. Also adds grafana_admin_password variable and explicit resource
limits to prevent ResourceQuota issues during rolling updates.
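For reference, anonymous Viewer access in Grafana corresponds to ini settings like these (or the matching GF_AUTH_ANONYMOUS_* env vars):

```ini
[auth.anonymous]
enabled = true
org_role = Viewer
```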
- Add skill_secrets variable to moltbot module with HA tokens and
Uptime Kuma password as container env vars
- Install Python packages (requests, caldav, icalendar, uptime-kuma-api)
in init container with PYTHONPATH for main container access
- Update all skills to use python3 directly instead of the
  ~/.venvs/claude venv path, which doesn't exist in the container
- Remove hardcoded Uptime Kuma password from skill, use env var
- Add RBAC module (modules/kubernetes/rbac/) with admin, power-user,
and namespace-owner roles, API server OIDC flags, and audit logging
- Add self-service portal (modules/kubernetes/k8s-portal/) SvelteKit app
with kubeconfig download and setup instructions
- Configure Alloy to collect audit logs from kube-apiserver
- Add Grafana dashboard for Kubernetes audit log visualization
- Configure Authentik OIDC provider with groups scope mapping
- Wire up k8s_users and ssh_private_key variables through module chain
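The API server flags are the standard upstream OIDC ones; the issuer and client values below are illustrative:

```
--oidc-issuer-url=https://authentik.example/application/o/kubernetes/
--oidc-client-id=kubernetes
--oidc-username-claim=email
--oidc-groups-claim=groups
```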
- Convert setup-project and extend-vm-storage from standalone .md
to directory-based SKILL.md format with YAML frontmatter
- Add symlink in moltbot init container to expose Claude skills
at ~/.openclaw/skills/ for auto-discovery by OpenClaw
- Update CLAUDE.md skill path references
Add MySQL datasource and 15-panel dashboard for DNS analytics:
queries over time, response codes, top domains/clients, response
times, blocked/NxDomain domains. Enable Grafana dashboard sidecar
for auto-provisioning dashboards from ConfigMaps.
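With the sidecar enabled, a dashboard ships as a labeled ConfigMap; grafana_dashboard is the chart's default discovery label, while the ConfigMap name and JSON body here are illustrative:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: dns-analytics-dashboard
  labels:
    grafana_dashboard: "1"   # the sidecar watches for this label
data:
  dns-analytics.json: |
    {"title": "DNS Analytics", "panels": []}
```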
Single template regex in the viktorbarzin.lan block catches ALL search
domain expansion junk (*.com.viktorbarzin.lan, *.cluster.local.viktorbarzin.lan,
etc.) instead of needing separate server blocks per pattern. Legitimate
single-label queries (idrac.viktorbarzin.lan) fall through to Technitium.
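A sketch of that Corefile block, relying on the CoreDNS template plugin's match/fallthrough semantics; the forward target is illustrative:

```
viktorbarzin.lan {
    template IN ANY viktorbarzin.lan {
        # two or more labels under the zone => search-domain expansion junk
        match "^.+\..+\.viktorbarzin\.lan\.$"
        rcode NXDOMAIN
        # single-label names fall through to the forward below
        fallthrough
    }
    forward . 10.0.20.201   # Technitium LoadBalancer IP (illustrative)
}
```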
DNS queries were going through Traefik's IngressRouteUDP, replacing
real client IPs with Traefik pod IPs (10.10.169.150) in Technitium logs.
Changed Technitium DNS service from NodePort to LoadBalancer with
externalTrafficPolicy: Local, removed dns-udp entrypoint and
IngressRouteUDP from Traefik, and updated CoreDNS to forward .lan
queries to Technitium's LoadBalancer IP directly.
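The relevant Service change, sketched:

```yaml
spec:
  type: LoadBalancer          # was NodePort behind IngressRouteUDP
  # Local avoids the extra SNAT hop, so Technitium logs real client IPs;
  # traffic is only answered by nodes running a Technitium pod
  externalTrafficPolicy: Local
```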
- Alloy: bump memory request/limit from 64Mi/128Mi to 256Mi/768Mi — pods were
  OOMKilled at 128Mi, steady-state usage is ~400-450Mi per node
- iDRAC Redfish Exporter: add explicit priority_class_name to resolve
conflict between Kyverno priority injection and default priority: 0
Add kubernetes_config_map for CoreDNS to the technitium module, with a
template block for cluster.local.viktorbarzin.lan that returns NXDOMAIN
immediately. This prevents ndots:5 search domain expansion from flooding
Technitium with ~66k/day junk queries (e.g.
redis.redis.svc.cluster.local.viktorbarzin.lan).
Also enabled saveCache on Technitium so the DNS cache persists across
pod restarts.
Four layers of noisy-neighbor protection using existing tier system:
- PriorityClasses (tier-0-core through tier-4-aux)
- LimitRange defaults auto-generated per namespace tier
- ResourceQuotas auto-generated per namespace tier
- PriorityClassName injection on pods via Kyverno mutate
Custom quota overrides for monitoring and crowdsec namespaces
which exceed the default tier quotas.
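The injection can be sketched as a Kyverno mutate rule; the `+( )` anchor adds the field only when the pod doesn't already set it (policy name, tier value, and match scope are illustrative):

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: inject-priority-class
spec:
  rules:
    - name: tier-priority
      match:
        any:
          - resources:
              kinds: [Pod]
      mutate:
        patchStrategicMerge:
          spec:
            # "+()" = add only if absent, so explicit values win
            +(priorityClassName): tier-3-edge
```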
Add new Kubernetes service for OpenClaw gateway connected to in-cluster
Ollama, with kubectl/terraform/git access for infrastructure management.
Protected behind Authentik SSO.
Fires after 5m if the haos Prometheus scrape target is unreachable.
Covers the HTTP API endpoint which shares the same process as the
WebSocket API used by the mobile app.
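As a Prometheus rule this is roughly the following; the alert and job names are illustrative:

```yaml
groups:
  - name: home-assistant
    rules:
      - alert: HomeAssistantUnreachable
        expr: up{job="haos"} == 0
        for: 5m
        annotations:
          summary: "haos scrape target unreachable for 5m"
```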
- meshcentral: rename port from "https" to "http" — MeshCentral serves
plain HTTP when REVERSE_PROXY=true, but Traefik inferred HTTPS from the
port name, causing 100% 5xx errors
- osm-routing/otp: scale to 0 — TfL GTFS data expired, OTP crash-loops
trying to build graph with no valid transit trips
- wireguard: add prometheus.io/port=9586 annotation — without it,
Prometheus tried scraping all container ports (51820 UDP, 80)
- travel-blog: remove stale prometheus.io annotations and dead port 9113
— nginx-exporter sidecar was commented out but annotations remained
- dawarich: remove prometheus.io annotations — exporter env vars are
commented out so nothing listens on port 9394
- monitoring: raise CPU temp threshold 60°C→75°C (E5-2699 v4 Tcase is
79°C), lower registry cache threshold 50%→25%, add minimum traffic
floor (>0.1 req/s) to 4xx/5xx rate alerts to prevent false positives
on low-traffic services
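The traffic floor gates the error-ratio expression on absolute request rate, roughly as below; the metric name and 5% ratio threshold are illustrative:

```
sum(rate(traefik_service_requests_total{code=~"5.."}[5m])) by (service)
  / sum(rate(traefik_service_requests_total[5m])) by (service) > 0.05
and
sum(rate(traefik_service_requests_total[5m])) by (service) > 0.1
```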
- Switch acquisition from ingress-nginx to traefik namespace/pods
- Change collection from crowdsecurity/nginx to crowdsecurity/traefik
- Add Slack notification plugin for ban/captcha decisions
- Wire alertmanager_slack_api_url through to CrowdSec module
- Add color coding (red/green) to Slack alerts, show alertname in title
- Use summary annotation in Slack text (description was always empty)
- Format all alert summaries consistently: value with units and threshold
- Fix ratio expressions (CPU/memory) to display as percentages
- Fix "failiure" typo, capitalize Tailscale
The packruler/rewrite-body plugin (used for rybbit analytics injection)
fails to decompress gzip responses with "flate: corrupt input before
offset 5", corrupting the response body. This broke HA Companion app's
external_auth flow and WebSocket connections on ha-sofia.
Fix: add a strip-accept-encoding middleware that removes Accept-Encoding
from requests when rybbit is active, forcing backends to send uncompressed
responses that the plugin can safely process.
Also add extra_middlewares variable to reverse_proxy factory for
extensibility.
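In Traefik, deleting a request header is done by setting it to an empty value, so the middleware is roughly:

```yaml
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: strip-accept-encoding
spec:
  headers:
    customRequestHeaders:
      # an empty value removes the header, so backends reply
      # uncompressed and rewrite-body can rewrite safely
      Accept-Encoding: ""
```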