Security & L7 Protection
Overview
The homelab implements defense-in-depth security at the application layer (L7) using CrowdSec for threat intelligence and IP reputation, Kyverno for policy enforcement and resource governance, and a 3-layer anti-AI scraping defense (reduced from 5 in April 2026 after removing the rewrite-body plugin). All security components operate in graceful degradation mode (fail-open) to prevent cascading failures. Security policies are deployed in audit mode first, then selectively enforced after validation.
Architecture Diagram
graph LR
Internet[Internet]
CF[Cloudflare WAF]
Tunnel[Cloudflared Tunnel]
CrowdSec[CrowdSec Bouncer<br/>Traefik Plugin]
AntiAI[Anti-AI Check<br/>poison-fountain]
ForwardAuth[Authentik ForwardAuth]
RateLimit[Rate Limit Middleware]
Retry[Retry Middleware<br/>2 attempts, 100ms]
Backend[Backend Service]
LAPI[CrowdSec LAPI<br/>3 replicas]
Agent[CrowdSec Agent]
Internet -->|1| CF
CF -->|2| Tunnel
Tunnel -->|3| CrowdSec
CrowdSec -.->|Query| LAPI
Agent -.->|Report| LAPI
CrowdSec -->|4. Pass/Block| AntiAI
AntiAI -->|5. Human/Bot| ForwardAuth
ForwardAuth -->|6. Authenticated| RateLimit
RateLimit -->|7. Under Limit| Retry
Retry -->|8. Success/Retry| Backend
style CrowdSec fill:#f9f,stroke:#333
style AntiAI fill:#ff9,stroke:#333
style ForwardAuth fill:#9f9,stroke:#333
style RateLimit fill:#99f,stroke:#333
Components
| Component | Version | Location | Purpose |
|---|---|---|---|
| CrowdSec LAPI | Pinned | stacks/crowdsec/ | Local API, threat intelligence aggregation (3 replicas) |
| CrowdSec Agent | Pinned | stacks/crowdsec/ | Log parser, scenario detection |
| CrowdSec Traefik Bouncer | Plugin | Traefik config | Plugin-based IP reputation check |
| Kyverno | Pinned chart | stacks/kyverno/ | Policy engine for K8s admission control |
| poison-fountain | Latest | stacks/poison-fountain/ | Anti-AI bot detection and tarpit service |
| cert-manager/certbot | - | stacks/cert-manager/ | TLS certificate management |
| Traefik | Latest | stacks/platform/ | Ingress controller with HTTP/3 (QUIC) |
How It Works
Request Security Layers
Every incoming request passes through 6 security layers, in this order, followed by a retry middleware:
- Cloudflare WAF - DDoS protection, bot detection, firewall rules (external)
- Cloudflared Tunnel - Zero Trust tunnel that hides the origin IP
- CrowdSec Bouncer - IP reputation check against LAPI (fail-open on error)
- Anti-AI Scraping - 3-layer bot defense (optional per service, updated 2026-04-17)
- Authentik ForwardAuth - Authentication check (only if protected = true)
- Rate Limiting - Per-source IP rate limits (returns 429 on breach)
- Retry Middleware - Auto-retry on transient errors (2 attempts, 100ms delay)
CrowdSec Threat Intelligence
CrowdSec operates in a hub-and-agent model:
LAPI (Local API):
- 3 replicas for high availability
- Aggregates threat intelligence from agent + community
- Maintains ban list (IP reputation database)
- Version pinned to prevent breaking changes
Agent:
- Parses Traefik access logs
- Detects attack scenarios (SQL injection, directory traversal, brute force)
- Reports malicious IPs to LAPI
- Shares threat intel with CrowdSec community (anonymized)
Traefik Bouncer Plugin:
- Integrated as Traefik middleware
- Queries LAPI for IP reputation on each request
- Fail-open mode: If LAPI unreachable, allows traffic (graceful degradation)
- Blocks IPs on ban list, allows others
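The fail-open rule above boils down to one decision per request. The following is a minimal Python sketch, not the plugin's code. It assumes a hypothetical query_lapi callable that returns the decision type for an IP (for example "ban") or None, and that raises when the LAPI is unreachable. The real plugin also caches decisions, which is omitted here.

```python
from typing import Callable, Optional


class LapiUnavailable(Exception):
    """Hypothetical error raised when the CrowdSec LAPI cannot be reached."""


def bouncer_decision(ip: str, query_lapi: Callable[[str], Optional[str]]) -> str:
    """Return "allow" or "block" for a client IP.

    query_lapi is a hypothetical lookup returning the decision type recorded
    for the IP (for example "ban") or None when there is no decision.
    """
    try:
        decision = query_lapi(ip)
    except (LapiUnavailable, OSError):
        # Fail-open: if the LAPI is unreachable, let the request through.
        return "allow"
    return "block" if decision == "ban" else "allow"


# Usage: an in-memory ban list stands in for the LAPI.
banned = {"203.0.113.7"}


def fake_lapi(ip: str) -> Optional[str]:
    return "ban" if ip in banned else None


def lapi_down(ip: str) -> Optional[str]:
    raise LapiUnavailable("connection refused")
```

With lapi_down in place of fake_lapi, even a banned IP is allowed through. This is the availability-over-security trade-off described under "Why Fail-Open Mode?".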
Metabase (disabled by default):
- Dashboard for CrowdSec analytics
- CPU-intensive, only enable when investigating incidents
Kyverno Policy Engine
Kyverno enforces cluster-wide policies via admission webhooks. All policies use failurePolicy=Ignore to prevent blocking cluster operations.
5-Tier Resource Governance
Namespaces are labeled with a tier (tier: 0 through tier: 4). Kyverno auto-generates:
- LimitRange - Per-container CPU/memory limits
- ResourceQuota - Namespace-wide resource caps
| Tier | CPU Limit/Container | Memory Limit/Container | Namespace CPU Quota | Namespace Memory Quota |
|---|---|---|---|---|
| 0 | 100m | 128Mi | 500m | 512Mi |
| 1 | 250m | 256Mi | 1000m | 1Gi |
| 2 | 500m | 512Mi | 2000m | 2Gi |
| 3 | 1000m | 1Gi | 4000m | 4Gi |
| 4 | 2000m | 2Gi | 8000m | 8Gi |
This prevents resource exhaustion and enforces governance without manual quota management.
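The tier table maps directly to two Kubernetes objects per namespace. The Python sketch below builds them as plain manifests. The object names (tier-N-limits, tier-N-quota) are hypothetical. It also assumes the per-container values go into the LimitRange default limits and the namespace values into limits.cpu/limits.memory quotas; the actual Kyverno generate rules may use max, requests, or different names.

```python
from typing import Dict, List, Tuple

# tier -> (container CPU, container memory, namespace CPU, namespace memory)
TIERS: Dict[int, Tuple[str, str, str, str]] = {
    0: ("100m", "128Mi", "500m", "512Mi"),
    1: ("250m", "256Mi", "1000m", "1Gi"),
    2: ("500m", "512Mi", "2000m", "2Gi"),
    3: ("1000m", "1Gi", "4000m", "4Gi"),
    4: ("2000m", "2Gi", "8000m", "8Gi"),
}


def governance_objects(namespace: str, tier: int) -> List[dict]:
    """Return the LimitRange and ResourceQuota manifests for a tiered namespace."""
    cpu, mem, ns_cpu, ns_mem = TIERS[tier]
    limit_range = {
        "apiVersion": "v1",
        "kind": "LimitRange",
        "metadata": {"name": f"tier-{tier}-limits", "namespace": namespace},
        # Assumed: per-container values become default limits.
        "spec": {"limits": [{"type": "Container", "default": {"cpu": cpu, "memory": mem}}]},
    }
    quota = {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": f"tier-{tier}-quota", "namespace": namespace},
        # Assumed: namespace-wide caps apply to the sum of container limits.
        "spec": {"hard": {"limits.cpu": ns_cpu, "limits.memory": ns_mem}},
    }
    return [limit_range, quota]
```

Namespace labels are strings, so a caller would pass int(labels["tier"]).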
Security Policies (ALL in Audit Mode)
Why audit mode? To roll policies out gradually without breaking existing workloads. Policies only record violations; each one is switched to enforce after its violations are cleaned up.
| Policy | Purpose | Enforcement |
|---|---|---|
| deny-privileged-containers | Block privileged pods | Audit |
| deny-host-namespaces | Block hostNetwork/hostPID/hostIPC | Audit |
| restrict-sys-admin | Block CAP_SYS_ADMIN | Audit |
| require-trusted-registries | Only allow approved image registries | Audit |
Operational Policies
| Policy | Purpose | Mode |
|---|---|---|
| inject-priority-class-from-tier | Set pod priorityClass based on namespace tier | Enforce (CREATE only) |
| inject-ndots | Set DNS ndots:2 (Kubernetes default is 5) so external names skip the search-domain lookups | Enforce |
| sync-tier-label | Propagate tier label to child resources | Enforce |
| goldilocks-vpa-auto-mode | Disable VPA globally (VPA off) | Enforce |
Anti-AI Scraping (3 Active Layers) (Updated 2026-04-17)
Enabled by default via ingress_factory module. Disable per-service with anti_ai_scraping = false.
Active middleware chain: ai-bot-block (ForwardAuth) + anti-ai-headers (X-Robots-Tag). The strip-accept-encoding and anti-ai-trap-links middlewares were removed in April 2026 due to Traefik v3.6.12 Yaegi plugin incompatibility with the rewrite-body plugin.
Layer 1: Bot Blocking (ForwardAuth)
- Middleware calls the poison-fountain service before the backend
- Analyzes User-Agent, request patterns, timing
- Blocks known AI scrapers (GPTBot, CCBot, etc.)
- Fail-open: If poison-fountain down, allows traffic
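The User-Agent part of Layer 1 works as a ForwardAuth endpoint. Traefik's ForwardAuth forwards the original request when the auth service answers 2xx, and otherwise returns the auth service's response to the client. The Python sketch below is not poison-fountain's code. It assumes a plain case-insensitive substring match against an illustrative token list (only GPTBot and CCBot come from this page), and it ignores the request-pattern and timing signals the real service also uses.

```python
from typing import Optional

# Illustrative crawler tokens; the real block list lives in poison-fountain.
AI_BOT_TOKENS = ("gptbot", "ccbot", "claudebot", "bytespider", "perplexitybot")


def forward_auth_status(user_agent: Optional[str]) -> int:
    """Status a ForwardAuth endpoint would return: 200 lets the request through."""
    ua = (user_agent or "").lower()
    if any(token in ua for token in AI_BOT_TOKENS):
        return 403  # non-2xx: Traefik returns this response instead of the backend's
    return 200
```

A substring match only catches bots that identify themselves honestly, which is why the tarpit layer exists for the rest.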
Layer 2: X-Robots-Tag Header
- HTTP response header: X-Robots-Tag: noai, noindex, nofollow
- Instructs compliant bots to skip content
- Lightweight, no performance impact
Layer 3: Trap Links (REMOVED)
Removed April 2026. The rewrite-body Traefik plugin used to inject hidden trap links broke on Traefik v3.6.12 due to Yaegi runtime bugs. The companion strip-accept-encoding middleware was also removed.
Layer 3 (formerly 4): Tarpit / Poison Content
- The poison-fountain service still exists as a standalone service at poison.viktorbarzin.me
- Serves AI bots extremely slowly (~100 bytes/sec tarpit)
- CronJob every 6 hours generates fake content
- Trap links are no longer injected into real pages, but bots that discover poison.viktorbarzin.me directly still get tarpitted and poisoned
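A tarpit throttles how fast the response body is written, not whether it is written. The Python sketch below is an assumed design, not poison-fountain's code. It drips the body in fixed-size chunks and pauses between them so throughput stays near the ~100 bytes/sec quoted above. The chunk_size parameter is hypothetical, and sleep is injectable so the timing can be checked without waiting.

```python
import time
from typing import Callable, Iterator


def tarpit(body: bytes, rate_bps: int = 100, chunk_size: int = 10,
           sleep: Callable[[float], None] = time.sleep) -> Iterator[bytes]:
    """Yield body in small chunks, pausing so throughput stays near rate_bps."""
    delay = chunk_size / rate_bps  # seconds per chunk: 0.1 s at the defaults
    for start in range(0, len(body), chunk_size):
        sleep(delay)
        yield body[start:start + chunk_size]
```

At 100 bytes/sec, a 1 KB page takes about 10 seconds, so a crawler with a fixed connection pool spends most of its time waiting.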
Implementation: See stacks/poison-fountain/ and stacks/platform/modules/traefik/middleware.tf
TLS & HTTP/3
Traefik handles TLS termination:
- HTTP/3 (QUIC) enabled for performance
- Automatic HTTP → HTTPS redirect
- cert-manager/certbot manages certificate lifecycle
- Let's Encrypt integration for automatic renewal
Rate Limiting
Per-source IP limits:
- Default: 100 requests/minute
- Returns 429 Too Many Requests (not 503)
- Higher limits for upload-heavy services:
- Immich: 500 req/min (photo uploads)
- Nextcloud: 300 req/min (file sync)
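Traefik's RateLimit middleware is documented as a token bucket (average rate plus burst). The Python sketch below shows a per-IP token bucket with the limits above. It assumes the burst equals the per-minute limit, which may not match the actual middleware.tf settings. The clock is injectable so the refill behavior can be tested.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Bucket:
    tokens: float
    updated: float


class PerIpRateLimiter:
    def __init__(self, per_minute: int = 100, burst: int = 100,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = per_minute / 60.0  # tokens refilled per second
        self.burst = burst             # assumed: bucket capacity equals the limit
        self.clock = clock
        self.buckets: Dict[str, Bucket] = {}

    def check(self, ip: str) -> int:
        """Return 200 if the request may pass, 429 if this IP is over its limit."""
        now = self.clock()
        bucket = self.buckets.get(ip)
        if bucket is None:
            bucket = self.buckets[ip] = Bucket(tokens=float(self.burst), updated=now)
        # Refill in proportion to elapsed time, capped at the burst size.
        bucket.tokens = min(self.burst, bucket.tokens + (now - bucket.updated) * self.rate)
        bucket.updated = now
        if bucket.tokens >= 1:
            bucket.tokens -= 1
            return 200
        return 429
```

The Immich and Nextcloud overrides correspond to PerIpRateLimiter(per_minute=500) and PerIpRateLimiter(per_minute=300).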
Retry Middleware:
- 2 attempts max
- 100ms delay between retries
- Applied after rate limiting
- Handles transient backend errors
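Traefik's Retry middleware documents that it retries only when the backend does not reply. Once the server answers, its response is returned whatever the status. The Python sketch below mirrors that rule with the settings above. It assumes "2 attempts" means two tries in total and uses a fixed 100ms pause, and it models "no reply" as ConnectionError.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retry(call: Callable[[], T], attempts: int = 2, delay_s: float = 0.1,
               sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `call` up to `attempts` times in total, sleeping `delay_s` between tries.

    Only ConnectionError (the backend never answered) triggers a retry; any
    value the backend returns, including a 5xx status, is passed straight back.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except ConnectionError:
            if attempt == attempts:
                raise  # out of attempts: surface the last failure
            sleep(delay_s)
    raise RuntimeError("unreachable")
```

A connection that drops after the backend has already processed a request still counts as "no reply", so the request is replayed. This produces the duplicate requests described under "Traefik Retry Loop".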
Fallback Proxies
Authentik Fallback:
- If Authentik down, falls back to basicAuth
- Prevents total service outage during IdP maintenance
- Temporary credentials stored in Vault
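The Authentik fallback amounts to trying the IdP first and checking HTTP basic credentials only when the IdP is unreachable. The Python sketch below expresses that as application logic; in the cluster it is implemented with Traefik middleware. The authentik_check callable is hypothetical (assumed to raise ConnectionError when Authentik is down). Loading the fallback credentials from secret/authentik/fallback-creds in Vault is out of scope.

```python
import base64
import hmac
from typing import Callable, Optional


def basic_auth_ok(header: Optional[str], user: str, password: str) -> bool:
    """Validate an "Authorization: Basic ..." header against one credential pair."""
    if not header or not header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(header[len("Basic "):], validate=True).decode("utf-8")
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        return False
    got_user, sep, got_pw = decoded.partition(":")
    if not sep:
        return False
    # Constant-time comparisons so response timing does not leak the secret.
    user_ok = hmac.compare_digest(got_user.encode(), user.encode())
    pw_ok = hmac.compare_digest(got_pw.encode(), password.encode())
    return user_ok and pw_ok


def authorize(headers: dict, authentik_check: Callable[[dict], bool],
              fallback_user: str, fallback_password: str) -> bool:
    try:
        return authentik_check(headers)
    except ConnectionError:
        # Authentik unreachable: fall back to basic auth with emergency credentials.
        return basic_auth_ok(headers.get("Authorization"), fallback_user, fallback_password)
```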
Poison-Fountain Fallback:
- If anti-AI service down, allows all traffic
- Fail-open prevents blocking legitimate users
- Monitors for service health, auto-recovers
Configuration
Key Config Files
| Path | Purpose |
|---|---|
| stacks/crowdsec/ | CrowdSec LAPI, agent, bouncer config |
| stacks/kyverno/ | Kyverno deployment + policies |
| stacks/poison-fountain/ | Anti-AI service + CronJob |
| stacks/platform/modules/traefik/middleware.tf | Security middleware definitions |
| stacks/platform/modules/ingress_factory/ | Per-service security toggles |
Vault Paths
- CrowdSec API key: secret/crowdsec/api-key - LAPI authentication
- BasicAuth fallback: secret/authentik/fallback-creds - Emergency auth
- TLS certificates: secret/tls/ - Certificate private keys
Terraform Stacks
- stacks/crowdsec/ - CrowdSec infrastructure
- stacks/kyverno/ - Policy engine
- stacks/poison-fountain/ - Anti-AI defense
- stacks/platform/ - Traefik + middleware
Per-Service Security Config
module "myapp_ingress" {
  source = "./modules/ingress_factory"
  name   = "myapp"
  host   = "myapp.viktorbarzin.me"

  # Security toggles
  protected        = true  # Enable Authentik ForwardAuth
  anti_ai_scraping = false # Disable anti-AI (e.g., for a public API)
  rate_limit       = 200   # Custom rate limit (req/min)
}
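The toggles above decide which middlewares end up on a service's router. The Python sketch below shows how ingress_factory might assemble the chain, in the order of the architecture diagram. It is not the module's Terraform. ai-bot-block and anti-ai-headers are named on this page. The names crowdsec, authentik-forwardauth, rate-limit-N, and retry are hypothetical, and protected defaulting to false is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IngressToggles:
    protected: bool = False         # assumed default
    anti_ai_scraping: bool = True   # enabled by default per this page
    rate_limit: int = 100           # req/min, the documented default


def middleware_chain(t: IngressToggles) -> List[str]:
    """Ordered middleware names for one service (hypothetical names except the anti-AI ones)."""
    chain = ["crowdsec"]
    if t.anti_ai_scraping:
        chain += ["ai-bot-block", "anti-ai-headers"]
    if t.protected:
        chain.append("authentik-forwardauth")
    chain.append(f"rate-limit-{t.rate_limit}")
    chain.append("retry")
    return chain
```

For the myapp example, middleware_chain(IngressToggles(protected=True, anti_ai_scraping=False, rate_limit=200)) drops both anti-AI middlewares and keeps authentication, rate limiting, and retry.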
Kyverno Policy Example
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: inject-ndots
spec:
  failurePolicy: Ignore   # never block pod admission if the webhook errors
  background: false       # mutate at admission time only
  rules:
    - name: inject-ndots
      match:
        any:
          - resources:
              kinds:
                - Pod
      mutate:
        patchStrategicMerge:
          spec:
            dnsConfig:
              options:
                - name: ndots
                  value: "2"
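Why ndots:2 helps: a glibc-style resolver tries a name as-is first only if it contains at least ndots dots. Otherwise it walks the pod's search domains first, and each miss costs a DNS round trip. The Python sketch below models that ordering. The namespace myns and its search list are assumed examples of what Kubernetes typically injects.

```python
from typing import List


def resolution_order(name: str, search: List[str], ndots: int) -> List[str]:
    """Order in which a glibc-style stub resolver tries candidate names."""
    if name.endswith("."):
        return [name]  # already fully qualified: the search list is skipped
    expanded = [f"{name}.{domain}" for domain in search]
    if name.count(".") >= ndots:
        return [name] + expanded  # enough dots: try the name as-is first
    return expanded + [name]      # too few dots: walk the search list first


# Search list a pod in namespace "myns" typically gets in Kubernetes.
SEARCH = ["myns.svc.cluster.local", "svc.cluster.local", "cluster.local"]
```

With the Kubernetes default of ndots:5, api.github.com goes through three doomed search-domain lookups before the real one. With ndots:2 it is tried first. Short in-cluster names such as postgres.db still go through the search list first.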
Decisions & Rationale
Why CrowdSec over ModSecurity?
- Community threat intelligence: Shared ban lists, crowdsourced attack detection
- Easier management: YAML scenarios vs complex ModSecurity rules
- Better performance: Lightweight Go agent vs resource-heavy Apache module
- Active development: More frequent updates, responsive community
Why Audit-Only Security Policies?
- Gradual rollout: Identify violations without breaking existing workloads
- Risk reduction: Prevents policy bugs from blocking critical deployments
- Better observability: Collect violation metrics before enforcing
- Selective enforcement: Move to enforce mode per-policy after validation
Why Multi-Layer Anti-AI Defense? (Updated 2026-04-17)
- Defense in depth: Each layer catches different bot types
- Compliant bots: Layer 2 (X-Robots-Tag) handles respectful crawlers
- Persistent bots: Tarpit makes scraping uneconomical
- Poison content: Degrades training data for bots that reach poison-fountain
- Layer 3 (trap links via rewrite-body) was removed due to Traefik v3 plugin incompatibility
Why Fail-Open Mode?
- Availability over security: Homelab prioritizes uptime
- Graceful degradation: Single component failure doesn't cascade
- Manual intervention: Security incidents are rare, can handle manually
- Layer redundancy: If one layer fails, others still protect
Why Pin CrowdSec/Kyverno Versions?
- Breaking changes: Both projects had breaking config changes in past
- Controlled upgrades: Test in staging before upgrading production
- Stability: Prevents auto-upgrade during outages
- Rollback: Easy to revert if upgrade causes issues
Why HTTP/3 (QUIC)?
- Performance: Lower latency, better mobile performance
- Connection migration: Survives IP changes (mobile networks)
- 0-RTT: Faster TLS handshake for repeat visitors
- Future-proof: Industry moving to HTTP/3
Troubleshooting
CrowdSec Blocking Legitimate IP
Problem: Legitimate user IP on ban list.
Fix:
- Check LAPI decisions: kubectl exec -it crowdsec-lapi-0 -- cscli decisions list
- Remove ban: kubectl exec -it crowdsec-lapi-0 -- cscli decisions delete --ip <IP>
- Whitelist if needed: add to stacks/crowdsec/whitelist.yaml
Kyverno Policy Blocking Deployment
Problem: Pod creation fails with policy violation.
Fix:
- Check policy reports: kubectl get policyreport -A
- Verify failurePolicy=Ignore is set (policies should never block)
- If a policy is blocking, temporarily disable it: kubectl annotate clusterpolicy <policy> kyverno.io/exclude=true
- Investigate the root cause, then fix the workload or update the policy
Anti-AI Service Down, Traffic Blocked
Problem: poison-fountain service unhealthy, all traffic blocked.
Fix:
- Verify fail-open config: check stacks/platform/modules/traefik/middleware.tf for failurePolicy: allow
- Restart service: kubectl rollout restart deployment/poison-fountain -n poison-fountain
- Temporarily disable: set anti_ai_scraping = false in ingress_factory for affected services
Rate Limit Too Aggressive
Problem: Legitimate users getting 429 errors.
Fix:
- Check Traefik logs for rate limit hits: kubectl logs -n traefik -l app=traefik | grep 429
- Increase the limit in ingress_factory: rate_limit = 300
- Apply: terraform apply
HTTP/3 Not Working
Problem: Browser shows HTTP/2, not HTTP/3.
Fix:
- Verify Traefik HTTP/3 is enabled: kubectl get cm traefik-config -o yaml | grep http3
- Check UDP port 443 is reachable: nc -u <public-ip> 443 (UDP is connectionless, so nc exiting without an error does not prove the port is open)
- Browser support: use Chrome/Firefox dev tools and check the Protocol column
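Browsers usually learn that a site speaks HTTP/3 from an Alt-Svc header on an earlier HTTP/1.1 or HTTP/2 response. The first request to a site therefore normally shows HTTP/2 even when HTTP/3 works. The Python sketch below checks whether h3 is advertised. It assumes you have already fetched the response headers (for example with curl -sI) and uses a simplified parser that does not handle quoted commas.

```python
from typing import List


def h3_endpoints(alt_svc: str) -> List[str]:
    """Return the alt-authorities advertised for HTTP/3 ("h3") in an Alt-Svc header value."""
    endpoints = []
    for entry in alt_svc.split(","):
        # Each entry looks like: h3=":443"; ma=2592000
        alternative = entry.split(";", 1)[0].strip()
        protocol, sep, authority = alternative.partition("=")
        if sep and protocol.strip() == "h3":
            endpoints.append(authority.strip().strip('"'))
    return endpoints
```

An empty result means nothing is advertising HTTP/3, so the browser will stay on HTTP/2 regardless of the UDP port.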
TLS Certificate Expired
Problem: Browser shows certificate expired.
Fix:
- Check cert-manager: kubectl get certificate -A
- Force renewal: kubectl delete secret <tls-secret> -n <namespace>
- cert-manager will auto-renew within 5 minutes
- If renewal fails, check Let's Encrypt rate limits
Traefik Retry Loop
Problem: Backend logs show duplicate requests.
Fix:
- Check retry middleware config: should be 2 attempts max
- Verify the backend isn't returning transient errors: check for 5xx responses
- Disable retry for a specific service: remove the retry middleware from ingress_factory
Poison Content Not Serving (Updated 2026-04-17)
Problem: Bots not receiving poisoned content on poison.viktorbarzin.me.
Note: Poison content is no longer injected into real pages (rewrite-body removed). It is only served directly via the poison.viktorbarzin.me subdomain.
Fix:
- Verify the CronJob is running: kubectl get cronjob -n poison-fountain
- Check logs: kubectl logs -n poison-fountain -l app=poison-fountain
- Manually trigger: kubectl create job --from=cronjob/poison-content manual-poison -n poison-fountain
Related
- Authentication & Authorization - Authentik, OIDC, ForwardAuth
- Networking - Ingress, DNS, load balancing
- Monitoring - Prometheus, Grafana, alerting
- CrowdSec Runbook - CrowdSec operations
- Kyverno Policy Management - Policy authoring and troubleshooting