From 65b0f30d5ebe37a687c863ad8cefd52ef491d2fe Mon Sep 17 00:00:00 2001
From: Viktor Barzin <vbarzin@gmail.com>
Date: Fri, 17 Apr 2026 21:43:13 +0000
Subject: [PATCH] [docs] Update anti-AI and rybbit docs after rewrite-body
 removal
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Anti-AI: 5-layer → 3 active layers (bot-block, X-Robots-Tag, tarpit)
- Layer 3 (trap links via rewrite-body) removed — Yaegi v3 incompatible
- Rybbit analytics now injected via Cloudflare Worker (HTMLRewriter)
- strip-accept-encoding middleware removed from all references

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---
 .claude/reference/patterns.md                 |  9 ++--
 docs/architecture/security.md                 | 48 ++++++++-----------
 .../2026-02-22-anti-ai-scraping-design.md     | 21 ++++----
 3 files changed, 36 insertions(+), 42 deletions(-)
diff --git a/.claude/reference/patterns.md b/.claude/reference/patterns.md
index 6035b764..4a563e3c 100644
--- a/.claude/reference/patterns.md
+++ b/.claude/reference/patterns.md
@@ -26,11 +26,12 @@ module "nfs_data" {
 ## ~~iSCSI Storage~~ (REMOVED — replaced by proxmox-lvm)
 > iSCSI via democratic-csi and TrueNAS has been fully removed (2026-04). All database storage now uses `StorageClass: proxmox-lvm` (Proxmox CSI, LVM-thin hotplug). TrueNAS has been decommissioned.
 
-## Anti-AI Scraping (5-Layer Defense)
+## Anti-AI Scraping (3 Active Layers) (Updated 2026-04-17)
 Default `anti_ai_scraping = true` in ingress_factory. Disable per-service: `anti_ai_scraping = false`.
-1. Bot blocking (ForwardAuth → poison-fountain) 2. X-Robots-Tag noai 3. Trap links before `</body>`
-4. Tarpit (~100 bytes/sec) 5. Poison content (CronJob every 6h, `--http1.1` required)
-Key files: `stacks/poison-fountain/`, `stacks/platform/modules/traefik/middleware.tf`
+1. Bot blocking (ForwardAuth → poison-fountain) 2. X-Robots-Tag noai 3. Tarpit/poison content (standalone at poison.viktorbarzin.me)
+Trap links (formerly layer 3) removed April 2026 — rewrite-body plugin broken on Traefik v3.6.12 (Yaegi bugs). `strip-accept-encoding` and `anti-ai-trap-links` middlewares deleted.
+Rybbit analytics injection now via Cloudflare Worker (`stacks/rybbit/worker/`, HTMLRewriter, wildcard route `*.viktorbarzin.me/*`, 28 site ID mappings).
+Key files: `stacks/poison-fountain/`, `stacks/rybbit/worker/`, `stacks/platform/modules/traefik/middleware.tf`
 
 ## Terragrunt Architecture
 - Root `terragrunt.hcl`: DRY providers, backend, variable loading, `generate "tiers"` block
diff --git a/docs/architecture/security.md b/docs/architecture/security.md
index 7796a88c..3d68bedc 100644
--- a/docs/architecture/security.md
+++ b/docs/architecture/security.md
@@ -2,7 +2,7 @@
 
 ## Overview
 
-The homelab implements defense-in-depth security at the application layer (L7) using CrowdSec for threat intelligence and IP reputation, Kyverno for policy enforcement and resource governance, and a 5-layer anti-AI scraping defense. All security components operate in graceful degradation mode (fail-open) to prevent cascading failures. Security policies are deployed in audit mode first, then selectively enforced after validation.
+The homelab implements defense-in-depth security at the application layer (L7) using CrowdSec for threat intelligence and IP reputation, Kyverno for policy enforcement and resource governance, and a 3-layer anti-AI scraping defense (reduced from 5 in April 2026: trap links were dropped along with the rewrite-body plugin, and tarpit and poison content are now counted as a single layer). All security components operate in graceful degradation mode (fail-open) to prevent cascading failures. Security policies are deployed in audit mode first, then selectively enforced after validation.
 
 ## Architecture Diagram
 
@@ -59,7 +59,7 @@ Every incoming request passes through 6 security layers:
 1. **Cloudflare WAF** - DDoS protection, bot detection, firewall rules (external)
 2. **Cloudflared Tunnel** - Zero Trust tunnel, hides origin IP
 3. **CrowdSec Bouncer** - IP reputation check against LAPI (fail-open on error)
-4. **Anti-AI Scraping** - 5-layer bot defense (optional per service)
+4. **Anti-AI Scraping** - 3-layer bot defense (optional per service, updated 2026-04-17)
 5. **Authentik ForwardAuth** - Authentication check (if `protected = true`)
 6. **Rate Limiting** - Per-source IP rate limits (returns 429 on breach)
 7. **Retry Middleware** - Auto-retry on transient errors (2 attempts, 100ms delay)
@@ -131,10 +131,12 @@ This prevents resource exhaustion and enforces governance without manual quota m
 | `sync-tier-label` | Propagate tier label to child resources | Enforce |
 | `goldilocks-vpa-auto-mode` | Disable VPA globally (VPA off) | Enforce |
 
-### Anti-AI Scraping (5-Layer Defense)
+### Anti-AI Scraping (3 Active Layers) (Updated 2026-04-17)
 
 Enabled by default via `ingress_factory` module. Disable per-service with `anti_ai_scraping = false`.
 
+Active middleware chain: `ai-bot-block` (ForwardAuth) and `anti-ai-headers` (X-Robots-Tag). The `strip-accept-encoding` and `anti-ai-trap-links` middlewares were removed in April 2026 because the rewrite-body plugin they relied on is incompatible with the Yaegi plugin interpreter in Traefik v3.6.12.
+
 #### Layer 1: Bot Blocking (ForwardAuth)
 
 - Middleware calls `poison-fountain` service before backend
@@ -148,25 +150,16 @@ Enabled by default via `ingress_factory` module. Disable per-service with `anti_
 - Instructs compliant bots to skip content
 - Lightweight, no performance impact
 
-#### Layer 3: Trap Links
+#### ~~Layer 3: Trap Links~~ (REMOVED)
 
-- JavaScript injects invisible links before `</body>`
-- Links point to honeypot endpoints
-- Legitimate browsers don't click, bots follow
-- Triggered bots get added to ban list
+Removed April 2026. The rewrite-body Traefik plugin, which injected the hidden trap links, broke on Traefik v3.6.12 due to bugs in the Yaegi plugin runtime. The companion `strip-accept-encoding` middleware was also removed.
 
-#### Layer 4: Tarpit
-
-- Serves AI bots extremely slowly (~100 bytes/sec)
-- Wastes bot resources, makes scraping uneconomical
-- Humans see normal speed (only applies to detected bots)
-
-#### Layer 5: Poison Content
+#### Layer 3 (formerly Layers 4 and 5): Tarpit / Poison Content
 
+- `poison-fountain` still runs as a standalone service at `poison.viktorbarzin.me`
+- Serves AI bots extremely slowly (~100 bytes/sec tarpit)
 - CronJob every 6 hours generates fake content
-- Injects misleading/nonsense data into pages shown to bots
-- Degrades AI training data quality
-- **Requires `--http1.1` flag** to work with current HTTP/2 setup
+- Trap links are no longer injected into real pages, but bots that discover `poison.viktorbarzin.me` directly still get tarpitted and poisoned
 
 **Implementation**: See `stacks/poison-fountain/` and `stacks/platform/modules/traefik/middleware.tf`
 
@@ -286,13 +279,13 @@ spec:
 - **Better observability**: Collect violation metrics before enforcing
 - **Selective enforcement**: Move to enforce mode per-policy after validation
 
-### Why 5-Layer Anti-AI Defense?
+### Why Multi-Layer Anti-AI Defense? (Updated 2026-04-17)
 
 - **Defense in depth**: Each layer catches different bot types
 - **Compliant bots**: Layer 2 (X-Robots-Tag) handles respectful crawlers
-- **Dumb bots**: Layer 3 (trap links) catches simple scrapers
-- **Persistent bots**: Layer 4 (tarpit) makes scraping uneconomical
-- **Sophisticated bots**: Layer 5 (poison content) degrades training data
+- **Persistent bots**: Layer 3 (tarpit) makes scraping uneconomical
+- **Sophisticated bots**: Layer 3 (poison content) degrades training data for bots that reach poison-fountain
+- The former Layer 3 (trap links via rewrite-body), which caught simple scrapers, was removed due to a Traefik v3.6.12 plugin incompatibility
 
 ### Why Fail-Open Mode?
 
@@ -382,15 +375,16 @@ spec:
 2. Verify backend isn't returning transient errors: Check for 5xx responses
 3. Disable retry for specific service: Remove retry middleware from `ingress_factory`
 
-### Poison Content Not Injecting
+### Poison Content Not Serving (Updated 2026-04-17)
 
-**Problem**: Bots not receiving poisoned content.
+**Problem**: Bots not receiving poisoned content on `poison.viktorbarzin.me`.
+
+**Note**: Poison content is no longer injected into real pages (rewrite-body removed). It is only served directly via the `poison.viktorbarzin.me` subdomain.
 
 **Fix**:
 1. Verify CronJob running: `kubectl get cronjob -n poison-fountain`
-2. Check logs: `kubectl logs -n poison-fountain -l app=poison-content-injector`
-3. Ensure `--http1.1` flag set (required for HTTP/2 backends)
-4. Manually trigger: `kubectl create job --from=cronjob/poison-content manual-poison`
+2. Check logs: `kubectl logs -n poison-fountain -l app=poison-fountain`
+3. Manually trigger: `kubectl create job --from=cronjob/poison-content manual-poison`
 
 ## Related
 
diff --git a/docs/plans/2026-02-22-anti-ai-scraping-design.md b/docs/plans/2026-02-22-anti-ai-scraping-design.md
index fa9fd330..b1072981 100644
--- a/docs/plans/2026-02-22-anti-ai-scraping-design.md
+++ b/docs/plans/2026-02-22-anti-ai-scraping-design.md
@@ -1,5 +1,7 @@
 # Anti-AI Scraping System Design
 
+> **Status (Updated 2026-04-17):** Partially superseded. Layer 3 (trap links via rewrite-body plugin) removed due to Traefik v3.6.12 Yaegi plugin incompatibility. The `strip-accept-encoding` and `anti-ai-trap-links` middlewares have been deleted. Rybbit analytics injection moved from Traefik rewrite-body to a Cloudflare Worker (`stacks/rybbit/worker/`). Active layers (original numbering): 1 (bot-block), 2 (headers), 4 (tarpit), 5 (poison content).
+
 ## Problem
 
 AI scrapers crawl public web services to harvest training data. We want to:
@@ -9,7 +11,7 @@ AI scrapers crawl public web services to harvest training data. We want to:
 
 ## Architecture
 
-Five defense layers applied to all public services via Traefik:
+Four active defense layers applied to all public services via Traefik (Layer 3 removed April 2026):
 
 ```
 Internet -> Cloudflare -> Traefik
@@ -18,7 +20,7 @@ Internet -> Cloudflare -> Traefik
                            |
                            +-- Layer 2: Headers -> X-Robots-Tag: noai, noimageai
                            |
-                           +-- Layer 3: Rewrite-body -> inject hidden trap links into HTML
+                           +-- [REMOVED] Layer 3: Rewrite-body trap links (April 2026 — Yaegi bugs in Traefik v3.6.12)
                            |
                            +-- Layer 4: Poison service -> serve cached Poison Fountain data
                            |
@@ -68,13 +70,10 @@ All defined in `stacks/platform/modules/traefik/middleware.tf`:
 - Sets `X-Robots-Tag: noai, noimageai` on all responses
 - Added to all public services via ingress_factory
 
-**`anti-ai-trap-links` (rewrite-body plugin)**:
-- Regex: `</body>` -> injects hidden div with trap links + `</body>`
-- Links point to `https://poison.viktorbarzin.me/article/<slug>`
-- CSS: invisible to humans (`position:absolute;left:-9999px;height:0;overflow:hidden;aria-hidden=true`)
-- Only processes `text/html` responses
-- Requires strip-accept-encoding companion middleware (already exists)
-- Applied globally via ingress_factory
+**`anti-ai-trap-links` (rewrite-body plugin)** — REMOVED (Updated 2026-04-17):
+- Removed due to Traefik v3.6.12 Yaegi runtime bugs making the rewrite-body plugin unreliable
+- The companion `strip-accept-encoding` middleware was also removed (only existed for rewrite-body)
+- Trap link injection is no longer active; poison-fountain still serves tarpit content standalone
 
 ### 4. Trap subdomain: poison.viktorbarzin.me
 
@@ -88,7 +87,7 @@ All defined in `stacks/platform/modules/traefik/middleware.tf`:
 
 New variables:
 - `anti_ai_scraping` (bool, default: true) - enable all anti-AI layers
-- When true, adds to middleware chain: `ai-bot-block`, `anti-ai-headers`, `strip-accept-encoding`, `anti-ai-trap-links`
+- When true, adds to middleware chain: `ai-bot-block`, `anti-ai-headers`
 - Services can opt out with `anti_ai_scraping = false`
 
 ## Human User Protection
@@ -97,7 +96,7 @@ New variables:
 |---------|-----------|
 | Hidden links visible | CSS `position:absolute;left:-9999px;height:0;overflow:hidden` + `aria-hidden="true"` |
 | False positive blocking | Only blocks specific AI bot User-Agent strings; no browser matches these |
-| Performance overhead | ForwardAuth is a string match (<1ms). Rewrite-body already proven with Rybbit. |
+| Performance overhead | ForwardAuth is a string match (<1ms). Rybbit injected via Cloudflare Worker (not Traefik). |
 | Poison content leakage | Only served on poison.viktorbarzin.me, not linked from any navigation |
 | Slow responses | Tarpit only applies to poison.viktorbarzin.me, not to real services |