docs: rewrite CrowdSec enforcement architecture (firewall-bouncer + CF WAF; Yaegi plugin removed)
The Traefik Yaegi CrowdSec bouncer plugin was dead on Traefik 3.7.5 (handler never invoked) and has been removed. Document the replacement: in-kernel nftables drop via cs-firewall-bouncer on direct hosts, and a Cloudflare IP-List + zone WAF block rule (fed by a LAPI->CF-list sync CronJob) on proxied hosts. Both add zero per-request latency and fail open.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
parent 4df741f6de
commit ceae4d5f06
3 changed files with 176 additions and 111 deletions
|
|
@@ -202,7 +202,7 @@ the workflow's built-in `GITHUB_TOKEN` (`packages: write`).
|
||||||
- **Critical path services scaled to 3**: Traefik, Authentik, CrowdSec LAPI, PgBouncer, Cloudflared.
|
- **Critical path services scaled to 3**: Traefik, Authentik, CrowdSec LAPI, PgBouncer, Cloudflared.
|
||||||
- **PDBs**: minAvailable=2 on Traefik and Authentik.
|
- **PDBs**: minAvailable=2 on Traefik and Authentik.
|
||||||
- **Fallback proxies**: basicAuth when Authentik is down, fail-open when poison-fountain is down.
|
- **Fallback proxies**: basicAuth when Authentik is down, fail-open when poison-fountain is down.
|
||||||
- **CrowdSec bouncer**: graceful degradation mode (fail-open on error).
|
- **CrowdSec enforcement is out-of-band** (no Traefik plugin/middleware — the dead Yaegi `crowdsec-bouncer-traefik-plugin` was removed on Traefik 3.7.5): banned IPs are dropped **in-kernel via nftables** by the `cs-firewall-bouncer` DaemonSet on **direct** hosts (drops in BOTH the `input` and `forward` hooks — Traefik is ETP=Local so client traffic is DNAT'd to the pod via `forward`; pulls ALL decisions incl. the ~31k CAPI blocklist), and **blocked at the Cloudflare edge** for **proxied** hosts (one `crowdsec_ban` Rules List + a zone WAF block rule, fed by the `crowdsec-cf-sync` CronJob in `rybbit` ns every 2 min — excludes CAPI). Zero per-request latency; **fails open** (LAPI down → no new bans, existing drops persist, legit traffic never blocked). Whitelist covers RFC1918 + tailnet + internal CIDRs. Full as-built: `docs/architecture/security.md`.
|
||||||
- **Rate limiting**: Return 429 (not 503). Per-service tuning via dedicated middleware + `skip_default_rate_limit` (default 10/s burst 50): Immich 1000/20000, ActualBudget 50/300 (app boot = ~70 parallel revalidations).
|
- **Rate limiting**: Return 429 (not 503). Per-service tuning via dedicated middleware + `skip_default_rate_limit` (default 10/s burst 50): Immich 1000/20000, ActualBudget 50/300 (app boot = ~70 parallel revalidations).
|
||||||
- **Retry middleware**: 2 attempts, 100ms — in default ingress chain.
|
- **Retry middleware**: 2 attempts, 100ms — in default ingress chain.
|
||||||
- **Entrypoint transport timeouts** (`websecure` `respondingTimeouts`): `writeTimeout=0` (unlimited download duration), `readTimeout=3600s` (uploads ≤1h), `idleTimeout=600s`. These are **HARD total-duration caps**, not nginx-style per-read idle timeouts — a finite `writeTimeout` truncates *any* large download at that wall-clock mark (a prior `writeTimeout=60s` silently cut Immich videos at 60s). **Do NOT re-tighten `writeTimeout`**; keep `readTimeout` finite (slow-loris backstop) but ≥ longest expected upload. Full rationale: `docs/architecture/networking.md` → "Entrypoint Transport Timeouts".
|
- **Entrypoint transport timeouts** (`websecure` `respondingTimeouts`): `writeTimeout=0` (unlimited download duration), `readTimeout=3600s` (uploads ≤1h), `idleTimeout=600s`. These are **HARD total-duration caps**, not nginx-style per-read idle timeouts — a finite `writeTimeout` truncates *any* large download at that wall-clock mark (a prior `writeTimeout=60s` silently cut Immich videos at 60s). **Do NOT re-tighten `writeTimeout`**; keep `readTimeout` finite (slow-loris backstop) but ≥ longest expected upload. Full rationale: `docs/architecture/networking.md` → "Entrypoint Transport Timeouts".
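
The hard-cap behaviour of `writeTimeout` described above can be illustrated with a little arithmetic. This is only a sketch under stated assumptions: throughput is treated as constant, the 1 GB size and 2.5 MB/s rate are made-up illustrative numbers, and `max_download_bytes` / `is_truncated` are hypothetical helpers, not part of Traefik.

```python
from typing import Optional


def max_download_bytes(write_timeout_s: float, throughput_bytes_per_s: float) -> Optional[float]:
    """Largest response body that fits under a total-duration write cap.

    Returns None when the timeout is 0, which the text above treats as
    "unlimited download duration".
    """
    if write_timeout_s == 0:
        return None
    return write_timeout_s * throughput_bytes_per_s


def is_truncated(size_bytes: float, write_timeout_s: float, throughput_bytes_per_s: float) -> bool:
    """True if a response of this size would be cut off at the wall-clock cap."""
    cap = max_download_bytes(write_timeout_s, throughput_bytes_per_s)
    return cap is not None and size_bytes > cap


# Illustrative numbers only (assumptions, not measurements from this cluster).
video_bytes = 1_000_000_000   # hypothetical 1 GB video
rate = 2_500_000              # hypothetical steady 2.5 MB/s stream

print(max_download_bytes(60, rate))         # bytes delivered before a 60s cap fires
print(is_truncated(video_bytes, 60, rate))  # finite writeTimeout: cut off
print(is_truncated(video_bytes, 0, rate))   # writeTimeout=0: never cut off
```

The point of the sketch is that the cap scales with elapsed time, not with idle time: a healthy, continuously flowing download is still cut once the wall clock passes the timeout.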
|
||||||
|
|
@@ -216,7 +216,7 @@ the workflow's built-in `GITHUB_TOKEN` (`packages: write`).
|
||||||
|---------|--------------------------|
|
|---------|--------------------------|
|
||||||
| Nextcloud | MaxRequestWorkers=150, needs 8Gi limit (Apache transient memory spikes, see commit eb94144), very generous startup probe |
|
| Nextcloud | MaxRequestWorkers=150, needs 8Gi limit (Apache transient memory spikes, see commit eb94144), very generous startup probe |
|
||||||
| Immich | ML on SSD (CUDA), disable ModSecurity (breaks streaming), frequent upgrades. **`immich-machine-learning` MUST run with `MACHINE_LEARNING_MODEL_TTL > 0`** (set to `600` in `stacks/immich/main.tf`, env on the `immich-machine-learning` deployment). At `0`, no model ever unloads and onnxruntime's CUDA arena (OCR's dynamic input shapes inflate it to ~10 GB) is held forever on the **time-sliced T4 it shares with llama-swap/frigate/immich-server** — which has no VRAM isolation, so immich-ml starved llama-swap (qwen3-8b) and silently broke recruiter-responder triage for ~5 h on 2026-06-02 (post-mortem `docs/post-mortems/2026-06-02-immich-ml-ttl-gpu-oom-recruiter.md`). TTL>0 lets idle models (OCR, face — AND CLIP) free VRAM. The TTL is a single GLOBAL knob (no per-model pin), so CLIP would also unload after 600s idle; the `clip-keepalive` CronJob (`*/5 * * * *`, same stack) pings the CLIP textual encoder so smart-search stays warm without pinning the ad-hoc models. **Smart search has a SECOND warmth layer in Postgres** (don't conflate it with the ML model): the ~665MB vchord `clip_index` must stay resident in PG `shared_buffers`, else an ANN probe that lands on an evicted list pays a ~1.8s cold storage read vs ~4ms warm. The `postStart` hook prewarms it ONCE at pod start and `pg_prewarm.autoprewarm` only re-warms at *startup*, so the index decays out of cache over days under job buffer-pressure (observed ~33% resident after 9d uptime → slow context search, easily misattributed to the ML model). The `clip-index-prewarm` CronJob (`*/5`, same stack) re-runs `pg_prewarm('clip_index')` to pin it hot; `immich-search-probe` (`*/5`) measures live latency + residency → Pushgateway gauges (`immich_smart_search_db_seconds`, `immich_clip_index_cached_pct`) → alerts `ImmichSmartSearchSlow`/`ImmichClipIndexColdCache`/`ImmichSearchProbeStale` + cluster-health check #46 (`check_immich_search`). immich PG role is a superuser so the CronJobs can run `pg_prewarm`/`pg_buffercache`. 
**Video transcoding is GPU-accelerated**: `immich-server` is pinned to GPU node1 (nodeSelector `nvidia.com/gpu.present` + NoSchedule toleration + `gpu-workload` priority) with a time-sliced `nvidia.com/gpu=1` slice — the stock immich-server image's ffmpeg already ships h264/hevc_nvenc + NVDEC. Activated via `ffmpeg.accel=nvenc` + `accelDecode=true` in the **DB** system-config (`system_metadata` table, key `system-config`, JSONB — NOT Terraform; app config is DB-managed here like oauth/smtp). Direct DB edits need a pod **recreate** to reload (config is cached at boot; only API-driven changes broadcast a reload). **Streaming bitrate is capped** to keep 4K playback smooth on the contended HDD and over remote uplinks: `ffmpeg.maxBitrate=20000k` + `preset=medium` + `transcode=bitrate` (set 2026-06-01 — was uncapped `maxBitrate=0` + `ultrafast` + `targetResolution=original`, which produced 77–264 Mbps 4K transcodes that stuttered for every client, local and remote, since even a single stream needs ~10–13.5 MB/s off the shared `sdc` spindle). 4K resolution is preserved (`targetResolution=original`); originals are NEVER modified — only the `encoded-video/` streaming copy. To re-apply transcode settings to EXISTING videos (config changes only affect new/missing ones): delete the offenders' `asset_file` rows `WHERE type='encoded_video'` (derived/regenerable — never touches originals) then run videoConversion `force=false` (admin Jobs API → "Missing"); it regenerates them to the deterministic `<assetId>.mp4` path at concurrency 1 (gentle on sdc). See `docs/runbooks/immich-transcode-bitrate.md`. If Immich is ever reinstalled fresh (not restored), re-set these keys (accel, accelDecode, **maxBitrate=20000k, preset=medium, transcode=bitrate**). Thumbnails/previews live on SSD NFS (sdb) — do NOT move to block storage (HDD sdc = slower + the contended IO domain). 
**Background-job concurrency is capped to protect sdc** (DB-managed system-config, `system_metadata` key `system-config`, JSONB `job.*.concurrency`; re-set on fresh install): `thumbnailGeneration=2`, `metadataExtraction=2`, `library=2` — these jobs read ORIGINALS off the HDD library. Left uncapped (were 8/4/4) a library-wide job (e.g. Duplicate Detection on 2026-06-01) fans the ML/thumbnail backfill out into a read storm that saturates sdc and starves etcd → apiserver down. `sidecar`/`smartSearch`/`faceDetection` stay at Immich defaults (small `.xmp` / SSD previews). Apply via Job Settings UI or the `system-config` API; **direct DB edits need an `immich-server` pod recreate to reload** (config cached at boot). See `docs/post-mortems/2026-05-25-immich-anca-elements-io-storm.md`. |
|
| Immich | ML on SSD (CUDA), disable ModSecurity (breaks streaming), frequent upgrades. **`immich-machine-learning` MUST run with `MACHINE_LEARNING_MODEL_TTL > 0`** (set to `600` in `stacks/immich/main.tf`, env on the `immich-machine-learning` deployment). At `0`, no model ever unloads and onnxruntime's CUDA arena (OCR's dynamic input shapes inflate it to ~10 GB) is held forever on the **time-sliced T4 it shares with llama-swap/frigate/immich-server** — which has no VRAM isolation, so immich-ml starved llama-swap (qwen3-8b) and silently broke recruiter-responder triage for ~5 h on 2026-06-02 (post-mortem `docs/post-mortems/2026-06-02-immich-ml-ttl-gpu-oom-recruiter.md`). TTL>0 lets idle models (OCR, face — AND CLIP) free VRAM. The TTL is a single GLOBAL knob (no per-model pin), so CLIP would also unload after 600s idle; the `clip-keepalive` CronJob (`*/5 * * * *`, same stack) pings the CLIP textual encoder so smart-search stays warm without pinning the ad-hoc models. **Smart search has a SECOND warmth layer in Postgres** (don't conflate it with the ML model): the ~665MB vchord `clip_index` must stay resident in PG `shared_buffers`, else an ANN probe that lands on an evicted list pays a ~1.8s cold storage read vs ~4ms warm. The `postStart` hook prewarms it ONCE at pod start and `pg_prewarm.autoprewarm` only re-warms at *startup*, so the index decays out of cache over days under job buffer-pressure (observed ~33% resident after 9d uptime → slow context search, easily misattributed to the ML model). The `clip-index-prewarm` CronJob (`*/5`, same stack) re-runs `pg_prewarm('clip_index')` to pin it hot; `immich-search-probe` (`*/5`) measures live latency + residency → Pushgateway gauges (`immich_smart_search_db_seconds`, `immich_clip_index_cached_pct`) → alerts `ImmichSmartSearchSlow`/`ImmichClipIndexColdCache`/`ImmichSearchProbeStale` + cluster-health check #46 (`check_immich_search`). immich PG role is a superuser so the CronJobs can run `pg_prewarm`/`pg_buffercache`. 
**Video transcoding is GPU-accelerated**: `immich-server` is pinned to GPU node1 (nodeSelector `nvidia.com/gpu.present` + NoSchedule toleration + `gpu-workload` priority) with a time-sliced `nvidia.com/gpu=1` slice — the stock immich-server image's ffmpeg already ships h264/hevc_nvenc + NVDEC. Activated via `ffmpeg.accel=nvenc` + `accelDecode=true` in the **DB** system-config (`system_metadata` table, key `system-config`, JSONB — NOT Terraform; app config is DB-managed here like oauth/smtp). Direct DB edits need a pod **recreate** to reload (config is cached at boot; only API-driven changes broadcast a reload). **Streaming bitrate is capped** to keep 4K playback smooth on the contended HDD and over remote uplinks: `ffmpeg.maxBitrate=20000k` + `preset=medium` + `transcode=bitrate` (set 2026-06-01 — was uncapped `maxBitrate=0` + `ultrafast` + `targetResolution=original`, which produced 77–264 Mbps 4K transcodes that stuttered for every client, local and remote, since even a single stream needs ~10–13.5 MB/s off the shared `sdc` spindle). 4K resolution is preserved (`targetResolution=original`); originals are NEVER modified — only the `encoded-video/` streaming copy. To re-apply transcode settings to EXISTING videos (config changes only affect new/missing ones): delete the offenders' `asset_file` rows `WHERE type='encoded_video'` (derived/regenerable — never touches originals) then run videoConversion `force=false` (admin Jobs API → "Missing"); it regenerates them to the deterministic `<assetId>.mp4` path at concurrency 1 (gentle on sdc). See `docs/runbooks/immich-transcode-bitrate.md`. If Immich is ever reinstalled fresh (not restored), re-set these keys (accel, accelDecode, **maxBitrate=20000k, preset=medium, transcode=bitrate**). Thumbnails/previews live on SSD NFS (sdb) — do NOT move to block storage (HDD sdc = slower + the contended IO domain). 
**Background-job concurrency is capped to protect sdc** (DB-managed system-config, `system_metadata` key `system-config`, JSONB `job.*.concurrency`; re-set on fresh install): `thumbnailGeneration=2`, `metadataExtraction=2`, `library=2` — these jobs read ORIGINALS off the HDD library. Left uncapped (were 8/4/4) a library-wide job (e.g. Duplicate Detection on 2026-06-01) fans the ML/thumbnail backfill out into a read storm that saturates sdc and starves etcd → apiserver down. `sidecar`/`smartSearch`/`faceDetection` stay at Immich defaults (small `.xmp` / SSD previews). Apply via Job Settings UI or the `system-config` API; **direct DB edits need an `immich-server` pod recreate to reload** (config cached at boot). See `docs/post-mortems/2026-05-25-immich-anca-elements-io-storm.md`. |
|
||||||
| CrowdSec | Pin version, disable Metabase when not needed (CPU hog), LAPI scaled to 3, **DB on PostgreSQL** (migrated from MySQL), flush config: max_items=10000/max_age=7d/agents_autodelete=30d, DECISION_DURATION=168h in blocklist CronJob |
|
| CrowdSec | Pin version, disable Metabase when not needed (CPU hog), LAPI scaled to 3, **DB on PostgreSQL** (migrated from MySQL), flush config: max_items=10000/max_age=7d/agents_autodelete=30d, DECISION_DURATION=168h in blocklist CronJob. **Enforcement is out-of-band, NOT a Traefik plugin** (the Yaegi `crowdsec-bouncer-traefik-plugin` was dead on Traefik 3.7.5 and removed): `cs-firewall-bouncer` DaemonSet drops in-kernel via nftables on direct hosts (bouncer key `firewall`, v0.0.34 binary fetched at runtime, hostNetwork+NET_ADMIN, `stacks/crowdsec/modules/crowdsec/firewall_bouncer.tf`); `crowdsec-cf-sync` CronJob blocks at the CF edge for proxied hosts (bouncer key `kvsync`, `stacks/rybbit/crowdsec_edge.tf`). Both fail open. See `docs/architecture/security.md` |
|
||||||
| Frigate | GPU stall detection in liveness probe (inference speed check), high CPU |
|
| Frigate | GPU stall detection in liveness probe (inference speed check), high CPU |
|
||||||
| Authentik | 3 server replicas + 2-replica embedded outpost (PG-backed sessions), PgBouncer in front of PostgreSQL, strip auth headers before forwarding. **`authentik.*` Helm values are INERT** (existingSecret skips chart env rendering) — tune via `server.env`/`worker.env` in `modules/authentik/values.yaml`. Single-screen login (password embedded in identification stage); all first-party OIDC apps use implicit consent (2026-06-10). `/static` ingress carve-out serves assets with immutable Cache-Control. |
|
| Authentik | 3 server replicas + 2-replica embedded outpost (PG-backed sessions), PgBouncer in front of PostgreSQL, strip auth headers before forwarding. **`authentik.*` Helm values are INERT** (existingSecret skips chart env rendering) — tune via `server.env`/`worker.env` in `modules/authentik/values.yaml`. Single-screen login (password embedded in identification stage); all first-party OIDC apps use implicit consent (2026-06-10). `/static` ingress carve-out serves assets with immutable Cache-Control. |
|
||||||
| Kyverno | failurePolicy=Ignore to prevent blocking cluster, pin chart version |
|
| Kyverno | failurePolicy=Ignore to prevent blocking cluster, pin chart version |
|
||||||
|
|
|
||||||
|
|
@@ -4,7 +4,7 @@ Last updated: 2026-04-19 (WS E — Kea DHCP pushes dual DNS per subnet; Kea DDNS
|
||||||
|
|
||||||
## Overview
|
## Overview
|
||||||
|
|
||||||
The homelab network is built on a dual-VLAN architecture with pfSense providing gateway services, Technitium for internal DNS, and Cloudflare for external DNS. Traefik serves as the Kubernetes ingress controller with a comprehensive middleware chain including CrowdSec bot protection, Authentik forward-auth, and rate limiting. All HTTP traffic flows through Cloudflared tunnels, avoiding the need for port forwarding or exposing public IPs.
|
The homelab network is built on a dual-VLAN architecture with pfSense providing gateway services, Technitium for internal DNS, and Cloudflare for external DNS. Traefik serves as the Kubernetes ingress controller with a middleware chain of anti-AI bot-blocking, Authentik forward-auth, rate limiting, and retry. CrowdSec IP-reputation enforcement is **out-of-band** (not a Traefik hop): banned IPs are dropped in-kernel via nftables on direct hosts and blocked at the Cloudflare edge on proxied hosts (see `docs/architecture/security.md`). All HTTP traffic flows through Cloudflared tunnels, avoiding the need for port forwarding or exposing public IPs.
|
||||||
|
|
||||||
## Architecture Diagram
|
## Architecture Diagram
|
||||||
|
|
||||||
|
|
@@ -16,12 +16,14 @@ graph TB
|
||||||
Traefik[Traefik Ingress<br/>3 replicas + PDB]
|
Traefik[Traefik Ingress<br/>3 replicas + PDB]
|
||||||
|
|
||||||
subgraph "Middleware Chain"
|
subgraph "Middleware Chain"
|
||||||
CS[CrowdSec Bouncer<br/>fail-open]
|
AntiAI[Anti-AI bot-block<br/>fail-open]
|
||||||
Auth[Authentik Forward-Auth<br/>3 replicas + PDB]
|
Auth[Authentik Forward-Auth<br/>3 replicas + PDB]
|
||||||
RL[Rate Limiter<br/>429 response]
|
RL[Rate Limiter<br/>429 response]
|
||||||
Retry[Retry<br/>2 attempts, 100ms]
|
Retry[Retry<br/>2 attempts, 100ms]
|
||||||
end
|
end
|
||||||
|
|
||||||
|
CSdrop[CrowdSec drop<br/>nftables / CF edge<br/>out-of-band, pre-Traefik]
|
||||||
|
|
||||||
subgraph "Proxmox Host (eno1)"
|
subgraph "Proxmox Host (eno1)"
|
||||||
vmbr0[vmbr0 Bridge<br/>192.168.1.127/24]
|
vmbr0[vmbr0 Bridge<br/>192.168.1.127/24]
|
||||||
vmbr1[vmbr1 Internal<br/>VLAN-aware]
|
vmbr1[vmbr1 Internal<br/>VLAN-aware]
|
||||||
|
|
@@ -53,8 +55,9 @@ graph TB
|
||||||
Internet -->|DNS query| CF
|
Internet -->|DNS query| CF
|
||||||
CF -->|CNAME to tunnel| CFD
|
CF -->|CNAME to tunnel| CFD
|
||||||
CFD --> Traefik
|
CFD --> Traefik
|
||||||
Traefik --> CS
|
CSdrop -.->|banned IPs dropped before Traefik| Traefik
|
||||||
CS --> Auth
|
Traefik --> AntiAI
|
||||||
|
AntiAI --> Auth
|
||||||
Auth --> RL
|
Auth --> RL
|
||||||
RL --> Retry
|
RL --> Retry
|
||||||
Retry --> Service
|
Retry --> Service
|
||||||
|
|
@@ -82,7 +85,7 @@ graph TB
|
||||||
| Cloudflare DNS | SaaS | External | ~50 public domains under viktorbarzin.me |
|
| Cloudflare DNS | SaaS | External | ~50 public domains under viktorbarzin.me |
|
||||||
| Cloudflared | Container | K8s (3 replicas) | Tunnel ingress, replaces port forwarding |
|
| Cloudflared | Container | K8s (3 replicas) | Tunnel ingress, replaces port forwarding |
|
||||||
| Traefik | Helm chart | K8s (3 replicas + PDB) | Ingress controller, HTTP/3 enabled |
|
| Traefik | Helm chart | K8s (3 replicas + PDB) | Ingress controller, HTTP/3 enabled |
|
||||||
| CrowdSec | Helm chart | K8s (LAPI: 3 replicas) | Bot protection, fail-open bouncer |
|
| CrowdSec | Helm chart | K8s (LAPI: 3 replicas) | IP reputation. Out-of-band enforcement: `cs-firewall-bouncer` DaemonSet (in-kernel nftables drop, direct hosts) + Cloudflare edge WAF rule (proxied hosts). Fail-open |
|
||||||
| Authentik | Helm chart | K8s (3 replicas + PDB) | SSO, forward-auth middleware |
|
| Authentik | Helm chart | K8s (3 replicas + PDB) | SSO, forward-auth middleware |
|
||||||
| MetalLB | v0.15.3 Helm chart | K8s | LoadBalancer IPs (10.0.20.200-10.0.20.220), all services on 10.0.20.200 |
|
| MetalLB | v0.15.3 Helm chart | K8s | LoadBalancer IPs (10.0.20.200-10.0.20.220), all services on 10.0.20.200 |
|
||||||
| Registry Cache | Container | 10.0.20.10 | Pull-through for docker.io:5000, ghcr.io:5010 |
|
| Registry Cache | Container | 10.0.20.10 | Pull-through for docker.io:5000, ghcr.io:5010 |
|
||||||
|
|
@@ -208,24 +211,31 @@ VMs tag traffic on vmbr1 to isolate workloads. pfSense bridges VLAN 20 to the up
|
||||||
|
|
||||||
### Ingress Flow
|
### Ingress Flow
|
||||||
|
|
||||||
|
CrowdSec is **not** a step in this chain — banned IPs are dropped before the
|
||||||
|
request ever reaches Traefik (Cloudflare edge WAF rule on proxied hosts; host
|
||||||
|
nftables on direct hosts). The flow below is for a request that survives that
|
||||||
|
out-of-band gate.
|
||||||
|
|
||||||
```mermaid
|
```mermaid
|
||||||
sequenceDiagram
|
sequenceDiagram
|
||||||
participant Client
|
participant Client
|
||||||
participant Cloudflare
|
participant CFedge as Cloudflare (edge WAF: crowdsec_ban block)
|
||||||
participant Cloudflared
|
participant Cloudflared
|
||||||
participant Traefik
|
participant Traefik
|
||||||
participant CrowdSec
|
participant AntiAI
|
||||||
participant Authentik
|
participant Authentik
|
||||||
participant RateLimit
|
participant RateLimit
|
||||||
participant Retry
|
participant Retry
|
||||||
participant Service
|
participant Service
|
||||||
participant Pod
|
participant Pod
|
||||||
|
|
||||||
Client->>Cloudflare: HTTPS request to blog.viktorbarzin.me
|
Client->>CFedge: HTTPS request to blog.viktorbarzin.me
|
||||||
Cloudflare->>Cloudflared: Forward via tunnel (QUIC)
|
Note over CFedge: banned IP → blocked here (proxied hosts)
|
||||||
|
CFedge->>Cloudflared: Forward via tunnel (QUIC)
|
||||||
Cloudflared->>Traefik: HTTP to LoadBalancer IP
|
Cloudflared->>Traefik: HTTP to LoadBalancer IP
|
||||||
Traefik->>CrowdSec: Apply bouncer middleware
|
Note over Traefik: on direct hosts, banned IPs already dropped in-kernel (nftables forward hook)
|
||||||
CrowdSec->>Authentik: If allowed, check auth (protected=true)
|
Traefik->>AntiAI: anti-AI bot-block (fail-open)
|
||||||
|
AntiAI->>Authentik: If allowed, check auth (protected=true)
|
||||||
Authentik->>RateLimit: If authenticated, check rate limit
|
Authentik->>RateLimit: If authenticated, check rate limit
|
||||||
RateLimit->>Retry: If within limit, continue
|
RateLimit->>Retry: If within limit, continue
|
||||||
Retry->>Service: Forward to Service
|
Retry->>Service: Forward to Service
|
||||||
|
|
@@ -234,24 +244,27 @@ sequenceDiagram
|
||||||
Service-->>Retry: Response
|
Service-->>Retry: Response
|
||||||
Retry-->>RateLimit: Response
|
Retry-->>RateLimit: Response
|
||||||
RateLimit-->>Authentik: Response (strip auth headers)
|
RateLimit-->>Authentik: Response (strip auth headers)
|
||||||
Authentik-->>CrowdSec: Response
|
Authentik-->>AntiAI: Response
|
||||||
CrowdSec-->>Traefik: Response
|
AntiAI-->>Traefik: Response
|
||||||
Traefik-->>Cloudflared: Response
|
Traefik-->>Cloudflared: Response
|
||||||
Cloudflared-->>Cloudflare: Response via tunnel
|
Cloudflared-->>CFedge: Response via tunnel
|
||||||
Cloudflare-->>Client: HTTPS response
|
CFedge-->>Client: HTTPS response
|
||||||
```
|
```
|
||||||
|
|
||||||
### Middleware Chain
|
### Middleware Chain
|
||||||
|
|
||||||
Every ingress created by the `ingress_factory` module follows this chain:
|
CrowdSec IP-reputation enforcement is **not** in this chain — it is out-of-band
|
||||||
|
(host nftables on direct hosts; the Cloudflare edge WAF `crowdsec_ban` rule on
|
||||||
|
proxied hosts), so banned IPs never reach the chain and there is no per-request
|
||||||
|
CrowdSec hop. Every ingress created by the `ingress_factory` module follows this
|
||||||
|
Traefik chain:
|
||||||
|
|
||||||
1. **CrowdSec Bouncer**: Checks IP against threat database. **Fail-open** mode — if LAPI is unreachable, traffic passes through to prevent outages.
|
1. **Anti-AI bot-block** (`ai-bot-block` ForwardAuth, on by default via `ingress_factory`): blocks/tarpits known AI crawlers. **Fail-open** (currently a no-op `return 200` — poison-fountain scaled to 0; see `docs/architecture/security.md`).
|
||||||
2. **Authentik Forward-Auth** (if `protected = true`): SSO authentication via OIDC. Non-authenticated users are redirected to login. Auth headers are stripped before forwarding to backend.
|
2. **Authentik Forward-Auth** (if `protected = true`): SSO authentication via OIDC. Non-authenticated users are redirected to login. Auth headers are stripped before forwarding to backend.
|
||||||
3. **Rate Limiting**: Per-IP throttling. Returns **429 Too Many Requests** (not 503) when limit exceeded. Default is `rate-limit` (average 10 req/s, burst 50). Services whose clients legitimately burst harder get a dedicated middleware via `skip_default_rate_limit = true` + `extra_middlewares`: Immich (`immich-rate-limit`, 1000/20000, photo uploads) and ActualBudget (`actualbudget-rate-limit`, 50/300 — the Actual web app boots with ~70 parallel asset/migration revalidations; the default burst 429'd the tail and stalled every page load).
|
3. **Rate Limiting**: Per-IP throttling. Returns **429 Too Many Requests** (not 503) when limit exceeded. Default is `rate-limit` (average 10 req/s, burst 50). Services whose clients legitimately burst harder get a dedicated middleware via `skip_default_rate_limit = true` + `extra_middlewares`: Immich (`immich-rate-limit`, 1000/20000, photo uploads) and ActualBudget (`actualbudget-rate-limit`, 50/300 — the Actual web app boots with ~70 parallel asset/migration revalidations; the default burst 429'd the tail and stalled every page load).
|
||||||
4. **Retry**: 2 attempts with 100ms delay on transient failures (5xx errors, connection errors).
|
4. **Retry**: 2 attempts with 100ms delay on transient failures (5xx errors, connection errors).
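
The rate-limit numbers in step 3 can be sanity-checked with a simplified token-bucket model. This is a sketch only: it assumes the limiter behaves like a plain token bucket that starts full, and that all ~70 boot-time requests arrive at the same instant. `TokenBucket` and `count_rejected` are hypothetical names, and Traefik's real limiter may differ in detail.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    average: float       # tokens refilled per second
    burst: int           # bucket capacity
    tokens: float = 0.0  # current fill level, set in __post_init__
    last: float = 0.0    # timestamp of the previous request

    def __post_init__(self) -> None:
        self.tokens = float(self.burst)  # assumption: the bucket starts full

    def allow(self, now: float) -> bool:
        """Refill for the elapsed time, then spend one token if available."""
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.average)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # the caller would answer 429 here


def count_rejected(average: float, burst: int, parallel_requests: int) -> int:
    """Number of requests rejected when they all arrive at t=0."""
    bucket = TokenBucket(average=average, burst=burst)
    return sum(not bucket.allow(now=0.0) for _ in range(parallel_requests))


print(count_rejected(10, 50, 70))   # default `rate-limit` (10/s, burst 50)
print(count_rejected(50, 300, 70))  # `actualbudget-rate-limit` (50/300)
```

Under these assumptions the default bucket rejects the tail of a 70-request burst while the 50/300 bucket absorbs all of it, which matches the reasoning given for the dedicated ActualBudget middleware.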
|
||||||
|
|
||||||
Additional middleware:
|
Additional middleware:
|
||||||
- **Anti-AI**: On by default via `ingress_factory`. Blocks common AI crawler user-agents.
|
|
||||||
- **HTTP/3 (QUIC)**: Enabled globally on Traefik.
|
- **HTTP/3 (QUIC)**: Enabled globally on Traefik.
|
||||||
|
|
||||||
### Entrypoint Transport Timeouts
|
### Entrypoint Transport Timeouts
|
||||||
|
|
@@ -348,7 +361,7 @@ Containerd on all K8s nodes uses `hosts.toml` to redirect pulls to the local cac
|
||||||
| pfSense | `stacks/pfsense/` | VM + cloud-init config |
|
| pfSense | `stacks/pfsense/` | VM + cloud-init config |
|
||||||
| Technitium | `stacks/technitium/` | Deployment, Service, PVC |
|
| Technitium | `stacks/technitium/` | Deployment, Service, PVC |
|
||||||
| Traefik | `stacks/platform/` (sub-module) | Helm release, IngressRoute CRDs |
|
| Traefik | `stacks/platform/` (sub-module) | Helm release, IngressRoute CRDs |
|
||||||
| CrowdSec | `stacks/platform/` (sub-module) | Helm release, LAPI + bouncer |
|
| CrowdSec | `stacks/crowdsec/` (+ edge in `stacks/rybbit/`) | Helm release, LAPI + agent; `cs-firewall-bouncer` DaemonSet (nftables, direct hosts) + Cloudflare edge sync (proxied hosts) |
|
||||||
| Authentik | `stacks/authentik/` | Helm release, ingress, OIDC configs |
|
| Authentik | `stacks/authentik/` | Helm release, ingress, OIDC configs |
|
||||||
| MetalLB | `stacks/platform/` (sub-module) | Helm release, IPAddressPool |
|
| MetalLB | `stacks/platform/` (sub-module) | Helm release, IPAddressPool |
|
||||||
| Cloudflared | `stacks/cloudflared/` | Deployment (3 replicas), tunnel config; runs `--no-autoupdate` (in-place self-updates exited the pods and severed all tunnel WebSockets, 2026-06-09/10) |
|
| Cloudflared | `stacks/cloudflared/` | Deployment (3 replicas), tunnel config; runs `--no-autoupdate` (in-place self-updates exited the pods and severed all tunnel WebSockets, 2026-06-09/10) |
|
||||||
|
|
@@ -436,13 +449,30 @@ Containerd on all K8s nodes uses `hosts.toml` to redirect pulls to the local cac
|
||||||
|
|
||||||
**Decision**: Technitium handles internal `.lan` domains with near-zero latency. Cloudflare handles public domains with global DNS. K8s nodes use Technitium as primary, which forwards non-.lan queries to Cloudflare.
|
**Decision**: Technitium handles internal `.lan` domains with near-zero latency. Cloudflare handles public domains with global DNS. K8s nodes use Technitium as primary, which forwards non-.lan queries to Cloudflare.
|
||||||
|
|
||||||
### Why Fail-Open on CrowdSec Bouncer?
|
### Why CrowdSec Enforcement Is Out-of-Band (and Fails Open)
|
||||||
|
|
||||||
**Alternatives considered**:
|
CrowdSec used to enforce inline as a Traefik middleware (the
|
||||||
1. **Fail-closed**: Maximum security, but LAPI downtime blocks all traffic.
|
`crowdsec-bouncer-traefik-plugin`). On Traefik 3.7.5 the Yaegi plugin handler was
|
||||||
2. **Redundant LAPI**: Already scaled to 3 replicas, but resource pressure can still cause outages.
|
never invoked, so it enforced nothing; the plugin was removed and enforcement
|
||||||
|
moved off the request path entirely (full history in
|
||||||
|
`docs/architecture/security.md`). It now runs on two surfaces:
|
||||||
|
|
||||||
**Decision**: Availability > strict bot blocking. CrowdSec LAPI is scaled to 3 replicas for resilience, but during cluster-wide resource exhaustion (e.g., memory pressure), bouncer falls back to allowing traffic. This prevents a complete service outage due to a security add-on.
|
- **Direct hosts** → `cs-firewall-bouncer` DaemonSet drops banned IPs in the host
|
||||||
|
nftables, in **both the `input` and `forward` hooks**. The `forward` hook is
|
||||||
|
the load-bearing one: with Traefik on a dedicated LB IP at
|
||||||
|
`externalTrafficPolicy=Local`, client packets are DNAT'd to the Traefik **pod**
|
||||||
|
and transit the node's `forward` chain (not `input`) — which is exactly why the
|
||||||
|
ingress must preserve the **real client IP** end-to-end (ETP=Local + PROXY-v2
|
||||||
|
for IPv6; see the Traefik LB IP and IPv6 ingress notes above). Without the real
|
||||||
|
client IP the firewall-bouncer (and the CF edge rule) would have nothing to
|
||||||
|
match on.
|
||||||
|
- **Proxied hosts** → a Cloudflare edge WAF rule (`ip.src in $crowdsec_ban`) fed
|
||||||
|
by the `crowdsec-cf-sync` CronJob.
|
||||||
|
|
||||||
|
Both **fail open**: if LAPI is unreachable, the firewall-bouncer simply stops
|
||||||
|
receiving new decisions (existing drops persist) and the CF sync skips a run —
|
||||||
|
neither ever blocks legitimate traffic. Availability > strict bot blocking, and
|
||||||
|
out-of-band enforcement adds **zero per-request latency** (no Traefik hop).
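
The fail-open behaviour of the edge sync can be sketched as a small planning function. This is not the actual `crowdsec-cf-sync` implementation: `desired_bans`, `plan_sync`, and `sync_once` are hypothetical names, the decision fields (`value`, `type`, `scope`, `origin`) are assumed to follow the usual CrowdSec decision shape, and the Cloudflare list is modelled as a plain set of IP strings.

```python
from typing import Callable, Iterable, Optional, Set, Tuple


def desired_bans(decisions: Iterable[dict]) -> Set[str]:
    """IP bans to mirror to the edge list, skipping CAPI-origin decisions."""
    return {
        d["value"]
        for d in decisions
        if d.get("type") == "ban"
        and d.get("scope", "").lower() == "ip"
        and d.get("origin") != "CAPI"
    }


def plan_sync(current: Set[str], desired: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (to_add, to_remove) so the list converges on the desired set."""
    return desired - current, current - desired


def sync_once(
    fetch_decisions: Callable[[], Iterable[dict]],
    current: Set[str],
) -> Optional[Tuple[Set[str], Set[str]]]:
    """Plan one sync run; return None (skip the run) if LAPI is unreachable."""
    try:
        decisions = list(fetch_decisions())
    except Exception:
        # Fail open: leave the existing list untouched and add no new bans.
        return None
    return plan_sync(current, desired_bans(decisions))


# Illustrative data only (documentation IP ranges).
sample = [
    {"value": "203.0.113.7", "type": "ban", "scope": "Ip", "origin": "crowdsec"},
    {"value": "198.51.100.9", "type": "ban", "scope": "Ip", "origin": "CAPI"},
]


def lapi_down() -> Iterable[dict]:
    raise ConnectionError("LAPI unreachable")


print(sync_once(lambda: sample, {"192.0.2.1"}))
print(sync_once(lapi_down, {"192.0.2.1"}))
```

Skipping the run on error, rather than treating an unreachable LAPI as an empty decision set, is what keeps existing blocks in place during an outage while never adding a block that LAPI did not ask for.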
|
||||||
|
|
||||||
### Why HTTP/3 (QUIC)?
|
### Why HTTP/3 (QUIC)?
|
||||||
|
|
||||||
|
|
@@ -473,9 +503,10 @@ Containerd on all K8s nodes uses `hosts.toml` to redirect pulls to the local cac
|
||||||
|
|
||||||
**Symptoms**: All ingress routes return 503, Traefik dashboard shows no backends available.
|
**Symptoms**: All ingress routes return 503, Traefik dashboard shows no backends available.
|
||||||
|
|
||||||
**Diagnosis**: Middleware chain is blocking traffic. Check:
|
**Diagnosis**: Middleware chain is blocking traffic. (CrowdSec is **not** in the
|
||||||
1. Authentik status: `kubectl get pod -n authentik`
|
chain — a CrowdSec/LAPI outage cannot cause 503s; it only stops new bans.) Check:
|
||||||
2. CrowdSec LAPI status: `kubectl get pod -n crowdsec`
|
1. Authentik status: `kubectl get pod -n authentik` (ForwardAuth fails closed if the auth server is unreachable)
|
||||||
|
2. `bot-block-proxy` status: `kubectl get pod -n traefik -l app=bot-block-proxy` (anti-AI ForwardAuth target — also fails closed if down)
|
||||||
3. Traefik logs: `kubectl logs -n kube-system deploy/traefik`
|
3. Traefik logs: `kubectl logs -n kube-system deploy/traefik`
|
||||||
|
|
||||||
**Fix**: If Authentik is down and ingress uses forward-auth, pods won't pass health checks. Scale Authentik to 3 replicas or temporarily disable forward-auth middleware.
|
**Fix**: If Authentik is down and ingress uses forward-auth, pods won't pass health checks. Scale Authentik to 3 replicas or temporarily disable forward-auth middleware.
|
||||||
|
|
|
||||||
|
|
@ -2,40 +2,50 @@
|
||||||
|
|
||||||
## Overview
|
## Overview
|
||||||
|
|
||||||
The homelab implements defense-in-depth security at the application layer (L7) using CrowdSec for threat intelligence and IP reputation, Kyverno for policy enforcement and resource governance, and a 3-layer anti-AI scraping defense (reduced from 5 in April 2026 after removing the rewrite-body plugin). All security components operate in graceful degradation mode (fail-open) to prevent cascading failures. Security policies are deployed in audit mode first, then selectively enforced after validation.
|
The homelab implements defense-in-depth security using CrowdSec for threat intelligence and IP reputation, Kyverno for policy enforcement and resource governance, and a 3-layer anti-AI scraping defense (reduced from 5 in April 2026 after removing the rewrite-body plugin). CrowdSec enforcement is **out-of-band** (not a per-request Traefik hop — see the CrowdSec section): banned IPs are dropped in-kernel via nftables on direct hosts, and blocked at the Cloudflare edge on proxied hosts, so enforcement adds **zero per-request latency**. All security components fail open (a CrowdSec outage stops new bans but never blocks legitimate traffic). Security policies are deployed in audit mode first, then selectively enforced after validation.
|
||||||
|
|
||||||
## Architecture Diagram
|
## Architecture Diagram
|
||||||
|
|
||||||
|
CrowdSec enforcement is out-of-band (NOT an inline Traefik middleware hop). The
|
||||||
|
Traefik request chain is anti-AI → Authentik ForwardAuth → rate-limit → retry;
|
||||||
|
CrowdSec drops banned IPs either on the node *before* that chain (direct hosts)
|
||||||
|
or at the Cloudflare edge, outside the cluster entirely (proxied hosts).
|
||||||
|
|
||||||
```mermaid
|
```mermaid
|
||||||
graph LR
|
graph TB
|
||||||
Internet[Internet]
|
Internet[Internet]
|
||||||
CF[Cloudflare WAF]
|
|
||||||
|
subgraph "Proxied hosts (orange-cloud)"
|
||||||
|
CFedge[Cloudflare edge<br/>WAF rule: ip.src in $crowdsec_ban → block]
|
||||||
|
end
|
||||||
|
subgraph "Direct hosts (grey-cloud / internal)"
|
||||||
|
NFT[Host nftables<br/>table crowdsec/crowdsec6<br/>drop in input + forward]
|
||||||
|
end
|
||||||
|
|
||||||
Tunnel[Cloudflared Tunnel]
|
Tunnel[Cloudflared Tunnel]
|
||||||
CrowdSec[CrowdSec Bouncer<br/>Traefik Plugin]
|
Traefik[Traefik<br/>anti-AI → Authentik → rate-limit → retry]
|
||||||
AntiAI[Anti-AI Check<br/>poison-fountain]
|
|
||||||
ForwardAuth[Authentik ForwardAuth]
|
|
||||||
RateLimit[Rate Limit Middleware]
|
|
||||||
Retry[Retry Middleware<br/>2 attempts, 100ms]
|
|
||||||
Backend[Backend Service]
|
Backend[Backend Service]
|
||||||
|
|
||||||
LAPI[CrowdSec LAPI<br/>3 replicas]
|
LAPI[CrowdSec LAPI<br/>3 replicas]
|
||||||
Agent[CrowdSec Agent]
|
Agent[CrowdSec Agent<br/>parses Traefik logs]
|
||||||
|
FWB[cs-firewall-bouncer<br/>DaemonSet, every node]
|
||||||
|
CFsync[crowdsec-cf-sync<br/>CronJob, every 2 min]
|
||||||
|
|
||||||
Internet -->|1| CF
|
Internet -->|proxied| CFedge
|
||||||
CF -->|2| Tunnel
|
Internet -->|direct| NFT
|
||||||
Tunnel -->|3| CrowdSec
|
CFedge -->|allowed| Tunnel
|
||||||
CrowdSec -.->|Query| LAPI
|
Tunnel --> Traefik
|
||||||
Agent -.->|Report| LAPI
|
NFT -->|allowed| Traefik
|
||||||
CrowdSec -->|4. Pass/Block| AntiAI
|
Traefik --> Backend
|
||||||
AntiAI -->|5. Human/Bot| ForwardAuth
|
|
||||||
ForwardAuth -->|6. Authenticated| RateLimit
|
|
||||||
RateLimit -->|7. Under Limit| Retry
|
|
||||||
Retry -->|8. Success/Retry| Backend
|
|
||||||
|
|
||||||
style CrowdSec fill:#f9f,stroke:#333
|
Agent -.->|report| LAPI
|
||||||
style AntiAI fill:#ff9,stroke:#333
|
LAPI -.->|all decisions incl. CAPI| FWB
|
||||||
style ForwardAuth fill:#9f9,stroke:#333
|
FWB -.->|program drop rules| NFT
|
||||||
style RateLimit fill:#99f,stroke:#333
|
LAPI -.->|ban/captcha decisions, CAPI excluded| CFsync
|
||||||
|
CFsync -.->|push IP list| CFedge
|
||||||
|
|
||||||
|
style CFedge fill:#f9f,stroke:#333
|
||||||
|
style NFT fill:#f9f,stroke:#333
|
||||||
```
|
```
|
||||||
|
|
||||||
## Components
|
## Components
|
||||||
|
|
@ -44,7 +54,8 @@ graph LR
|
||||||
|-----------|---------|----------|---------|
|
|-----------|---------|----------|---------|
|
||||||
| CrowdSec LAPI | Pinned | `stacks/crowdsec/` | Local API, threat intelligence aggregation (3 replicas) |
|
| CrowdSec LAPI | Pinned | `stacks/crowdsec/` | Local API, threat intelligence aggregation (3 replicas) |
|
||||||
| CrowdSec Agent | Pinned | `stacks/crowdsec/` | Log parser, scenario detection |
|
| CrowdSec Agent | Pinned | `stacks/crowdsec/` | Log parser, scenario detection |
|
||||||
| CrowdSec Traefik Bouncer | Plugin | Traefik config | Plugin-based IP reputation check |
|
| cs-firewall-bouncer | v0.0.34 | `stacks/crowdsec/modules/crowdsec/firewall_bouncer.tf` | In-kernel nftables drop on every node (DIRECT hosts). Bouncer key `firewall` |
|
||||||
|
| crowdsec-cf-sync | — | `stacks/rybbit/crowdsec_edge.tf` | LAPI→Cloudflare-IP-List sync CronJob (PROXIED hosts). Bouncer key `kvsync` |
|
||||||
| Kyverno | Pinned chart | `stacks/kyverno/` | Policy engine for K8s admission control |
|
| Kyverno | Pinned chart | `stacks/kyverno/` | Policy engine for K8s admission control |
|
||||||
| poison-fountain | Latest | `stacks/poison-fountain/` | Anti-AI bot detection and tarpit service |
|
| poison-fountain | Latest | `stacks/poison-fountain/` | Anti-AI bot detection and tarpit service |
|
||||||
| cert-manager/certbot | - | `stacks/cert-manager/` | TLS certificate management |
|
| cert-manager/certbot | - | `stacks/cert-manager/` | TLS certificate management |
|
||||||
|
|
@ -54,11 +65,15 @@ graph LR
|
||||||
|
|
||||||
### Request Security Layers
|
### Request Security Layers
|
||||||
|
|
||||||
Every incoming request passes through 6 security layers:
|
CrowdSec IP-reputation enforcement happens **before** a request reaches the
|
||||||
|
Traefik chain (banned IPs are dropped in-kernel on direct hosts, or blocked at
|
||||||
|
the Cloudflare edge on proxied hosts — see CrowdSec Threat Intelligence below).
|
||||||
|
A request that survives that out-of-band gate then passes through the Traefik
|
||||||
|
middleware chain:
|
||||||
|
|
||||||
1. **Cloudflare WAF** - DDoS protection, bot detection, firewall rules (external)
|
1. **Cloudflare WAF / edge** - DDoS protection, bot detection, firewall rules incl. the CrowdSec `crowdsec_ban` block rule (proxied hosts only)
|
||||||
2. **Cloudflared Tunnel** - Zero Trust tunnel, hides origin IP
|
2. **Cloudflared Tunnel** - Zero Trust tunnel, hides origin IP (proxied hosts)
|
||||||
3. **CrowdSec Bouncer** - IP reputation check against LAPI (fail-open on error)
|
3. **CrowdSec out-of-band drop** - nftables on direct hosts; *not* a Traefik hop (zero per-request latency)
|
||||||
4. **Anti-AI Scraping** - 3-layer bot defense (optional per service, updated 2026-04-17)
|
4. **Anti-AI Scraping** - 3-layer bot defense (optional per service, updated 2026-04-17)
|
||||||
5. **Authentik ForwardAuth** - Authentication check (if `protected = true`)
|
5. **Authentik ForwardAuth** - Authentication check (if `protected = true`)
|
||||||
6. **Rate Limiting** - Per-source IP rate limits (returns 429 on breach)
|
6. **Rate Limiting** - Per-source IP rate limits (returns 429 on breach)
|
||||||
|
|
@ -80,58 +95,71 @@ CrowdSec operates in a hub-and-agent model:
|
||||||
- Reports malicious IPs to LAPI
|
- Reports malicious IPs to LAPI
|
||||||
- Shares threat intel with CrowdSec community (anonymized)
|
- Shares threat intel with CrowdSec community (anonymized)
|
||||||
|
|
||||||
**Traefik Bouncer Plugin** (`crowdsec-bouncer-traefik-plugin`, `stacks/traefik/modules/traefik/middleware.tf`):
|
Enforcement is split across **two out-of-band surfaces**, neither of which adds
|
||||||
- Integrated as Traefik middleware (in the default ingress chain)
|
any per-request latency. (See "Why the Traefik bouncer plugin was removed" below
|
||||||
- Queries LAPI for IP reputation on each request
|
for the supersession history — there is no longer an inline Traefik bouncer.)
|
||||||
- **Registered with LAPI** via `BOUNCER_KEY_traefik` env on the LAPI container
|
|
||||||
(`stacks/crowdsec/modules/crowdsec/values.yaml`), seeded from the same Vault key
|
|
||||||
the middleware presents (`ingress_crowdsec_api_key`). **Before 2026-06-19 the
|
|
||||||
bouncer was never registered → LAPI returned 403 → the plugin failed open and
|
|
||||||
enforced nothing (no bans, no captcha).** The seed re-registers automatically on
|
|
||||||
every LAPI start, so a DB wipe (e.g. the MySQL→PostgreSQL migration that lost the
|
|
||||||
original registration) can't silently disable enforcement again.
|
|
||||||
- **Fail-open mode**: If LAPI unreachable, allows traffic (graceful degradation)
|
|
||||||
- **Only sees non-proxied (direct) apps' real client IPs** (ETP=Local). Proxied
|
|
||||||
apps arrive from cloudflared's pod IP (in `clientTrustedIPs`) and are bypassed —
|
|
||||||
extending enforcement to proxied apps needs `forwardedHeadersTrustedIPs` (future).
|
|
||||||
- Honours two LAPI remediation types (profiles in `stacks/crowdsec/modules/crowdsec/values.yaml`):
|
|
||||||
- **`ban`** → HTTP 403 (serious attacks: CVE exploits, scanners, brute force)
|
|
||||||
- **`captcha`** → **Cloudflare Turnstile challenge** so the flagged user can
|
|
||||||
self-unblock (lower-severity abuse: `http-429-abuse`, `http-403-abuse`,
|
|
||||||
`http-crawl-non_statics`, `http-sensitive-files`). The plugin is configured
|
|
||||||
with `captchaProvider=turnstile` + the widget keys; the `captcha.html`
|
|
||||||
template is mounted into the Traefik pod at `/captcha`. The widget is
|
|
||||||
Terraform-managed in `stacks/traefik/main.tf`
|
|
||||||
(`cloudflare_turnstile_widget.crowdsec_captcha`, scoped to `viktorbarzin.me`
|
|
||||||
so it covers every subdomain). **Before 2026-06-19 no captcha provider was
|
|
||||||
configured, so `captcha` decisions silently degraded to a 403 ban** — users
|
|
||||||
had no way to self-unblock; wiring Turnstile fixed that.
|
|
||||||
|
|
||||||
**Cloudflare Edge Enforcement for proxied hosts** (`stacks/rybbit/crowdsec_edge.tf` + `lapi_kv_sync.py`):
|
**Surface 1 — DIRECT (non-Cloudflare-proxied) hosts → in-kernel nftables drop**
|
||||||
- Proxied (orange-cloud) hosts terminate at the Cloudflare edge, so the in-cluster
|
(`cs-firewall-bouncer` DaemonSet, `stacks/crowdsec/modules/crowdsec/firewall_bouncer.tf`):
|
||||||
bouncer above never decides on them. Edge enforcement instead syncs LAPI
|
- Runs on **every node** (no nodeSelector). Programs the HOST nftables — `table ip
|
||||||
decisions into **one Cloudflare account IP List (`crowdsec_ban`)** + a single
|
crowdsec` / `table ip6 crowdsec6` — with drop rules in **both the `input` AND
|
||||||
**zone-scoped WAF custom rule** blocking `(ip.src in $crowdsec_ban)` across every
|
the `forward` hooks**. The `forward` hook is required because Traefik is a
|
||||||
proxied host. CronJob `crowdsec-cf-sync` (rybbit ns, every 2 min) reconciles it.
|
LoadBalancer with `externalTrafficPolicy=Local`: client traffic is DNAT'd to the
|
||||||
- **BAN-ONLY (2026-06-20):** only `type=ban` decisions sync to the edge. `captcha`
|
Traefik **pod** and transits the node's `forward` hook (not `input`) with the
|
||||||
decisions are deliberately NOT pushed — the CF account allows only ONE Rules List
|
real client IP preserved. Chains use `policy accept` (only set members drop —
|
||||||
with a single block action, so folding captcha in would hard-block a soft
|
it can never blackhole normal traffic).
|
||||||
challenge on every proxied host. (Before 2026-06-20 captcha was downgraded to a
|
- Pulls **all** decisions from LAPI, **including the CAPI community blocklist
|
||||||
hard block at the edge.)
|
(~31k IPs)**. Packets from banned IPs are dropped **in-kernel before reaching
|
||||||
- **Auth carve-out (2026-06-20):** the WAF rule excludes `authentik.viktorbarzin.me`
|
Traefik** → zero per-request hops, no Traefik involvement at all.
|
||||||
+ `public-auth.viktorbarzin.me` (`… and not (http.host in {…})`), and the
|
- **Packaging**: cs-firewall-bouncer publishes no container image, so the
|
||||||
Authentik UI ingress sets `exclude_crowdsec = true` for the in-cluster bouncer. A
|
**v0.0.34** static binary is fetched at runtime by an initContainer onto a
|
||||||
CrowdSec hit must never wall a user out of the login / WebAuthn flow they
|
`debian:bookworm-slim` runtime container. Needs `hostNetwork` +
|
||||||
authenticate through; auth keeps `traefik-rate-limit` for brute-force protection.
|
`NET_ADMIN`/`NET_RAW` to talk netlink directly. Registered bouncer key:
|
||||||
- **⚠️ Currently NON-FUNCTIONAL (known issue, pre-existing since the 2026-06-20
|
**`firewall`**.
|
||||||
rollout):** `crowdsec-cf-sync` fails every run — `cf_list_items()` pagination
|
- **Fail-open**: if LAPI is unreachable it just stops receiving new decisions
|
||||||
gets CF `HTTP 400 code 10027 "invalid or expired cursor"`, so the list never
|
(existing drop rules persist); it never blocks legitimate traffic.
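
The `policy accept` and fail-open behaviour described for Surface 1 can be modelled in a few lines. This is a minimal sketch and not the bouncer's actual code: `next_drop_set` and `verdict` are hypothetical names, and the real cs-firewall-bouncer programs nftables sets over netlink rather than holding a Python set.

```python
from typing import Callable, Iterable, Set


def next_drop_set(current: Set[str], fetch: Callable[[], Iterable[str]]) -> Set[str]:
    """Return the drop set to enforce after one reconcile attempt.

    `fetch` stands in for a LAPI pull (hypothetical). If it fails with a
    network error, the existing set is kept unchanged: no new bans arrive,
    but existing drops persist.
    """
    try:
        return set(fetch())
    except OSError:  # ConnectionError and timeouts are OSError subclasses
        return set(current)


def verdict(src_ip: str, drop_set: Set[str]) -> str:
    """Model of a chain with `policy accept`: only set members are dropped."""
    return "drop" if src_ip in drop_set else "accept"
```

Because the default verdict is `accept`, an empty or stale set can only under-block; it cannot blackhole traffic that was never banned.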
|
||||||
populates (`num_items=0`) and the edge rule blocks nothing. LAPI also returns
|
|
||||||
~31k ban IPs, likely exceeding CF IP-List capacity even once pagination is fixed.
|
**Surface 2 — PROXIED (Cloudflare orange-cloud) hosts → Cloudflare edge block**
|
||||||
**Edge enforcement for proxied hosts is therefore inert pending a fix** (the
|
(`stacks/rybbit/crowdsec_edge.tf` + `lapi_kv_sync.py`):
|
||||||
in-cluster bouncer still protects direct apps; the auth carve-out is correct
|
- Proxied hosts terminate at the Cloudflare edge, so a host-level nftables drop
|
||||||
regardless). Fix needs: (1) correct CF cursor pagination, (2) a capacity strategy
|
would never see them. Enforcement is instead a single Cloudflare Rules List
|
||||||
for the ban set.
|
**`crowdsec_ban`** + a zone-scoped WAF custom rule `(ip.src in $crowdsec_ban)`
|
||||||
|
→ **block** action, which covers every proxied host in the zone.
|
||||||
|
- Fed by the **`crowdsec-cf-sync` CronJob** (namespace `rybbit`, every 2 min,
|
||||||
|
pure-stdlib Python in a ConfigMap). It pulls local **ban/captcha ip-scoped**
|
||||||
|
decisions and pushes them into the CF list, but **EXCLUDES the ~31k CAPI
|
||||||
|
community blocklist** — that set is far too large for a CF Rules List (the CF
|
||||||
|
account hard-limits to **one** list), and CAPI is already covered in-kernel on
|
||||||
|
direct hosts and by Cloudflare's own managed protections on proxied hosts.
|
||||||
|
Registered bouncer key: **`kvsync`**.
|
||||||
|
- **Block-only**: the single-list limit precludes a separate
|
||||||
|
captcha/managed-challenge list, so both ban and captcha decisions are enforced
|
||||||
|
as a plain block at the edge.
|
||||||
|
- **Auth carve-out:** the WAF rule excludes `authentik.viktorbarzin.me` +
|
||||||
|
`public-auth.viktorbarzin.me` (`… and not (http.host in {…})`). A CrowdSec hit
|
||||||
|
must never wall a user out of the login / WebAuthn flow they authenticate
|
||||||
|
through; auth keeps `traefik-rate-limit` for brute-force protection.
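
The selection and carve-out rules for Surface 2 can be sketched as below. This is an illustration under assumptions, not the contents of `lapi_kv_sync.py`, which is not shown here: the decision field names (`origin`, `type`, `scope`, `value`) and the helper names `edge_list_ips` and `edge_blocks` are assumed for the example.

```python
from typing import Dict, Iterable, List, Set

# Remediation types that sync to the edge list, per the text above.
EDGE_TYPES = {"ban", "captcha"}


def edge_list_ips(decisions: Iterable[Dict[str, str]]) -> List[str]:
    """Pick the IPs that belong in the Cloudflare list.

    Keeps local ban/captcha decisions scoped to a single IP and skips
    anything that originates from the CAPI community blocklist.
    """
    ips = set()
    for d in decisions:
        if d.get("origin", "").upper() == "CAPI":
            continue  # community blocklist is excluded from the edge list
        if d.get("type", "").lower() not in EDGE_TYPES:
            continue
        if d.get("scope", "").lower() != "ip":
            continue  # ranges and other scopes are not synced in this sketch
        ips.add(d["value"])
    return sorted(ips)


def edge_blocks(src_ip: str, host: str, banned: Set[str], exempt_hosts: Set[str]) -> bool:
    """Model of the WAF rule: block listed IPs except on exempt (auth) hosts."""
    return src_ip in banned and host not in exempt_hosts
```

Returning a sorted, de-duplicated list keeps successive sync runs comparable, so a run with no changes produces the same payload.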
|
||||||
|
|
||||||
|
**Whitelist** (`stacks/crowdsec/whitelist.yaml`): a CrowdSec whitelist covers
|
||||||
|
RFC1918 + the tailnet + internal CIDRs (plus one specific external IP), so
|
||||||
|
internal users are never enforced. Internal access uses split-horizon DNS
|
||||||
|
straight to Traefik, and direct internal clients are RFC1918 — both whitelisted.
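
As a rough illustration of the whitelist check, the sketch below tests an address against the three RFC1918 ranges using Python's standard `ipaddress` module. The tailnet and internal CIDRs in `whitelist.yaml` are not shown here, so they are left as an `extra` parameter; `is_whitelisted` is a hypothetical helper, not part of CrowdSec.

```python
import ipaddress
from typing import Iterable

# The three private IPv4 ranges defined by RFC 1918.
RFC1918 = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_whitelisted(ip: str, extra: Iterable[str] = ()) -> bool:
    """Return True if `ip` is in RFC1918 or any caller-supplied CIDR.

    `extra` stands in for the tailnet / internal CIDRs, whose actual
    values are not given in this document.
    """
    addr = ipaddress.ip_address(ip)
    networks = RFC1918 + [ipaddress.ip_network(c) for c in extra]
    return any(addr in net for net in networks)
```

For example, `is_whitelisted("192.168.1.10")` is true, while a public address such as `8.8.8.8` is only exempt if a matching CIDR is passed via `extra`.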
|
||||||
|
|
||||||
|
#### Why the Traefik bouncer plugin was removed
|
||||||
|
|
||||||
|
Enforcement used to run as an inline Traefik middleware — the
|
||||||
|
`crowdsec-bouncer-traefik-plugin` (a Go plugin run by Traefik's Yaegi interpreter), which queried LAPI on every
|
||||||
|
request and could serve a Cloudflare Turnstile captcha for soft remediations.
|
||||||
|
On **Traefik 3.7.5 the Yaegi handler was never invoked**, so the bouncer was
|
||||||
|
registered but enforced **nothing** despite appearing healthy. Rather than chase
|
||||||
|
the Yaegi runtime, the whole plugin path was **removed** (2026-06): the plugin
|
||||||
|
static config + initContainer download, the `crowdsec` Middleware CRD, the
|
||||||
|
`captcha.html` template + its ConfigMap and volume mount, and the Cloudflare
|
||||||
|
Turnstile widget (`cloudflare_turnstile_widget.crowdsec_captcha`). It was
|
||||||
|
replaced by the two out-of-band surfaces above, which add zero per-request
|
||||||
|
latency and fail open. (The earlier `crowdsec-cf-sync` cursor-pagination /
|
||||||
|
IP-List-capacity issues are also moot now that CAPI is excluded from the edge
|
||||||
|
list and dropped in-kernel instead.)
|
||||||
|
|
||||||
**Metabase** (disabled by default):
|
**Metabase** (disabled by default):
|
||||||
- Dashboard for CrowdSec analytics
|
- Dashboard for CrowdSec analytics
|
||||||
|
|
@ -377,10 +405,12 @@ Beads: `code-8ywc` W1.6 + W1.7. **Status: planned.**
|
||||||
|
|
||||||
| Path | Purpose |
|
| Path | Purpose |
|
||||||
|------|---------|
|
|------|---------|
|
||||||
| `stacks/crowdsec/` | CrowdSec LAPI, agent, bouncer config |
|
| `stacks/crowdsec/` | CrowdSec LAPI, agent config + `whitelist.yaml` |
|
||||||
|
| `stacks/crowdsec/modules/crowdsec/firewall_bouncer.tf` | cs-firewall-bouncer DaemonSet (in-kernel nftables drop, direct hosts) |
|
||||||
|
| `stacks/rybbit/crowdsec_edge.tf` + `lapi_kv_sync.py` | Cloudflare IP-List + WAF block rule + LAPI→CF sync CronJob (proxied hosts) |
|
||||||
| `stacks/kyverno/` | Kyverno deployment + policies |
|
| `stacks/kyverno/` | Kyverno deployment + policies |
|
||||||
| `stacks/poison-fountain/` | Anti-AI service + CronJob |
|
| `stacks/poison-fountain/` | Anti-AI service + CronJob |
|
||||||
| `stacks/platform/modules/traefik/middleware.tf` | Security middleware definitions |
|
| `stacks/traefik/modules/traefik/middleware.tf` | Security middleware definitions (no longer includes a CrowdSec bouncer) |
|
||||||
| `stacks/platform/modules/ingress_factory/` | Per-service security toggles |
|
| `stacks/platform/modules/ingress_factory/` | Per-service security toggles |
|
||||||
|
|
||||||
### Vault Paths
|
### Vault Paths
|
||||||
|
|
@ -490,7 +520,11 @@ spec:
|
||||||
**Fix**:
|
**Fix**:
|
||||||
1. Check LAPI decisions: `kubectl exec -it crowdsec-lapi-0 -- cscli decisions list`
|
1. Check LAPI decisions: `kubectl exec -it crowdsec-lapi-0 -- cscli decisions list`
|
||||||
2. Remove ban: `kubectl exec -it crowdsec-lapi-0 -- cscli decisions delete --ip <IP>`
|
2. Remove ban: `kubectl exec -it crowdsec-lapi-0 -- cscli decisions delete --ip <IP>`
|
||||||
3. Whitelist if needed: Add to `stacks/crowdsec/whitelist.yaml`
|
— the in-kernel drop clears as soon as `cs-firewall-bouncer` reconciles (direct
|
||||||
|
hosts); for proxied hosts the `crowdsec-cf-sync` CronJob removes it from the
|
||||||
|
`crowdsec_ban` CF list within ~2 min.
|
||||||
|
3. Whitelist if needed: Add to `stacks/crowdsec/whitelist.yaml` (RFC1918 + tailnet
|
||||||
|
+ internal CIDRs are already whitelisted, so internal clients are never banned).
|
||||||
|
|
||||||
### Kyverno Policy Blocking Deployment
|
### Kyverno Policy Blocking Deployment
|
||||||
|
|
||||||
|
|
|
||||||