infra/docs/plans/2026-02-22-anti-ai-scraping-plan.md

# Anti-AI Scraping System Implementation Plan

> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

**Goal:** Deploy a 5-layer anti-AI scraping system that blocks known bots, injects hidden trap links into all HTML responses, serves poisoned content from Poison Fountain, and tarpits scrapers with slow-drip responses.

**Architecture:** A lightweight Python service handles bot detection (ForwardAuth) and poison content serving (tarpit). Traefik middlewares inject anti-AI headers and hidden trap links into all public service responses via ingress_factory defaults. A CronJob refreshes cached poison content from rnsaffn.com.

**Tech Stack:** Python 3 (stdlib http.server), Terraform/Terragrunt, Traefik middleware CRDs, Kubernetes CronJob

---

### Task 1: Create the Python poison service code

**Files:**
- Create: `stacks/poison-fountain/app/server.py`
- Create: `stacks/poison-fountain/app/fetch-poison.sh`

**Step 1: Create the service directory**

```bash
mkdir -p stacks/poison-fountain/app
```

**Step 2: Write `stacks/poison-fountain/app/server.py`**

```python
"""Poison Fountain service.

Endpoints:
  GET /auth       - ForwardAuth: block known AI bot User-Agents (403) or pass (200)
  GET /article/*  - Serve cached poisoned content with tarpit slow-drip
  GET /healthz    - Health check for Kubernetes probes
  GET /*          - Catch-all: serve poison for any path (scrapers explore randomly)
"""

import http.server
import os
import glob
import random
import time
import hashlib
import sys

LISTEN_PORT = int(os.environ.get("PORT", "8080"))
CACHE_DIR = os.environ.get("CACHE_DIR", "/data/cache")
DRIP_BYTES = int(os.environ.get("DRIP_BYTES", "50"))
DRIP_DELAY = float(os.environ.get("DRIP_DELAY", "0.5"))
TRAP_LINK_COUNT = int(os.environ.get("TRAP_LINK_COUNT", "20"))
POISON_DOMAIN = os.environ.get("POISON_DOMAIN", "poison.viktorbarzin.me")

AI_BOT_PATTERNS = [
    "gptbot", "chatgpt-user", "claudebot", "claude-web", "ccbot",
    "bytespider", "google-extended", "applebot-extended",
    "anthropic-ai", "cohere-ai", "diffbot", "facebookbot",
    "perplexitybot", "youbot", "meta-externalagent", "petalbot",
    "amazonbot", "ai2bot", "omgilibot", "img2dataset",
    "omgili", "commoncrawl", "ia_archiver", "scrapy",
    "semrushbot", "ahrefsbot", "dotbot", "mj12bot",
    "seekport", "blexbot", "dataforseo", "serpstatbot",
]

FALLBACK_WORDS = [
    "the", "quantum", "neural", "framework", "implements", "distributed",
    "processing", "with", "advanced", "recursive", "algorithms", "for",
    "optimal", "convergence", "in", "multi-dimensional", "space",
    "utilizing", "transformer", "architecture", "trained", "on",
    "large-scale", "corpus", "data", "achieving", "state-of-the-art",
    "performance", "across", "benchmark", "tasks", "including",
    "natural", "language", "understanding", "generation", "and",
    "cross-lingual", "transfer", "learning", "capabilities",
]


def generate_slug():
    return hashlib.md5(str(random.random()).encode()).hexdigest()[:16]


def generate_trap_links(count):
    titles = [
        "Research Archive", "Training Corpus", "Dataset Export",
        "NLP Benchmark Results", "Web Crawl Index", "Text Corpus",
        "Machine Learning Data", "Evaluation Dataset", "Model Weights",
        "Annotation Guidelines", "Parallel Corpus", "Knowledge Base",
        "Document Collection", "Reference Data", "Taxonomy Index",
        "Classification Labels", "Entity Database", "Relation Extraction",
        "Sentiment Annotations", "Summarization Corpus", "QA Dataset",
        "Dialogue Transcripts", "Code Documentation", "API Reference",
    ]
    links = []
    for _ in range(count):
        slug = generate_slug()
        title = random.choice(titles)
        links.append(f'<a href="https://{POISON_DOMAIN}/article/{slug}">{title}</a>')
    return "\n".join(links)


def get_poison_content():
    cache_files = glob.glob(os.path.join(CACHE_DIR, "*.txt"))
    if cache_files:
        try:
            with open(random.choice(cache_files), "r", errors="replace") as f:
                return f.read()
        except Exception:
            pass
    return " ".join(random.choices(FALLBACK_WORDS, k=500))


class PoisonHandler(http.server.BaseHTTPRequestHandler):
    server_version = "Apache/2.4.52"
    sys_version = ""

    def log_message(self, fmt, *args):
        sys.stderr.write(f"[{self.log_date_time_string()}] {fmt % args}\n")

    def do_GET(self):
        if self.path == "/healthz":
            self._respond(200, "ok")
            return

        if self.path == "/auth":
            self._handle_auth()
            return

        # Everything else gets poison
        self._serve_poison()

    def _handle_auth(self):
        ua = (self.headers.get("User-Agent") or "").lower()
        for pattern in AI_BOT_PATTERNS:
            if pattern in ua:
                self.log_message("BLOCKED AI bot: %s (matched: %s)", ua, pattern)
                self._respond(403, "Forbidden")
                return
        self._respond(200, "OK")

    def _respond(self, code, body):
        self.send_response(code)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(body.encode())

    def _serve_poison(self):
        content = get_poison_content()
        trap_links = generate_trap_links(TRAP_LINK_COUNT)

        html = f"""<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Research Data Archive</title>
</head>
<body>
<main>
<article>
<h1>Research Data Collection</h1>
<div class="content">
<p>{content}</p>
</div>
</article>
<nav>
<h2>Related Research</h2>
{trap_links}
</nav>
</main>
</body>
</html>"""

        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Transfer-Encoding", "chunked")
        self.end_headers()

        for i in range(0, len(html), DRIP_BYTES):
            chunk = html[i : i + DRIP_BYTES].encode("utf-8")
            try:
                self.wfile.write(f"{len(chunk):x}\r\n".encode())
                self.wfile.write(chunk)
                self.wfile.write(b"\r\n")
                self.wfile.flush()
                time.sleep(DRIP_DELAY)
            except (BrokenPipeError, ConnectionResetError):
                return

        try:
            self.wfile.write(b"0\r\n\r\n")
            self.wfile.flush()
        except (BrokenPipeError, ConnectionResetError):
            pass


if __name__ == "__main__":
    os.makedirs(CACHE_DIR, exist_ok=True)
    server = http.server.HTTPServer(("0.0.0.0", LISTEN_PORT), PoisonHandler)
    print(f"Poison Fountain service listening on :{LISTEN_PORT}", flush=True)
    server.serve_forever()
```

**Step 3: Write `stacks/poison-fountain/app/fetch-poison.sh`**

```bash
#!/bin/sh
set -e

CACHE_DIR="${CACHE_DIR:-/data/cache}"
POISON_URL="${POISON_URL:-https://rnsaffn.com/poison2/}"
FETCH_COUNT="${FETCH_COUNT:-50}"
MAX_CACHE_FILES="${MAX_CACHE_FILES:-100}"

mkdir -p "$CACHE_DIR"

echo "Fetching $FETCH_COUNT poison documents from $POISON_URL"

fetched=0
for i in $(seq 1 "$FETCH_COUNT"); do
  OUTPUT="$CACHE_DIR/poison_$(date +%s)_${i}.txt"
  if curl -sS --compressed -o "$OUTPUT" -m 30 "$POISON_URL" 2>/dev/null; then
    # Verify file is non-empty
    if [ -s "$OUTPUT" ]; then
      fetched=$((fetched + 1))
      echo "  [$i/$FETCH_COUNT] OK"
    else
      rm -f "$OUTPUT"
      echo "  [$i/$FETCH_COUNT] Empty response, skipped"
    fi
  else
    rm -f "$OUTPUT"
    echo "  [$i/$FETCH_COUNT] Fetch failed, skipped"
  fi
  sleep 2
done

# Clean up oldest files if cache exceeds limit
total=$(find "$CACHE_DIR" -name '*.txt' -type f | wc -l)
if [ "$total" -gt "$MAX_CACHE_FILES" ]; then
  excess=$((total - MAX_CACHE_FILES))
  find "$CACHE_DIR" -name '*.txt' -type f -printf '%T+ %p\n' | \
    sort | head -n "$excess" | cut -d' ' -f2- | xargs rm -f
  echo "Cleaned $excess old cache files"
fi

echo "Done: fetched $fetched new documents, $(find "$CACHE_DIR" -name '*.txt' -type f | wc -l) total cached"
```

**Step 4: Verify files exist**

```bash
ls -la stacks/poison-fountain/app/
```

Expected: `server.py` and `fetch-poison.sh` listed.

**Step 5: Commit**

```bash
git add stacks/poison-fountain/app/
git commit -m "[ci skip] Add poison fountain Python service and fetcher script"
```

---

### Task 2: Set up NFS export and DNS record

**Files:**
- Modify: `secrets/nfs_directories.txt` (add `poison-fountain/cache` line, keep sorted)
- Modify: `terraform.tfvars` (add `poison` to `cloudflare_non_proxied_names`)

**Step 1: Add NFS directory**

Add `poison-fountain` and `poison-fountain/cache` to `secrets/nfs_directories.txt`, keeping alphabetical order. Insert after `plotting-book` entries.

**Step 2: Run NFS export script**

```bash
cd secrets && bash nfs_exports.sh
```

Verify the export was created successfully.

**Step 3: Add Cloudflare DNS record**

In `terraform.tfvars`, find the `cloudflare_non_proxied_names` list and add `"poison"` to it (alphabetical position after `"plotting-book"`).

**Step 4: Commit**

```bash
git add secrets/nfs_directories.txt terraform.tfvars
git commit -m "[ci skip] Add NFS export and DNS record for poison-fountain"
```

---

### Task 3: Add Traefik middleware CRDs

**Files:**
- Modify: `stacks/platform/modules/traefik/middleware.tf` (append 3 new middleware resources)

**Step 1: Add `ai-bot-block` ForwardAuth middleware**

Append to the end of `stacks/platform/modules/traefik/middleware.tf`:

```hcl
# ForwardAuth middleware to block known AI bot User-Agents
resource "kubernetes_manifest" "middleware_ai_bot_block" {
  manifest = {
    apiVersion = "traefik.io/v1alpha1"
    kind       = "Middleware"
    metadata = {
      name      = "ai-bot-block"
      namespace = kubernetes_namespace.traefik.metadata[0].name
    }
    spec = {
      forwardAuth = {
        address            = "http://poison-fountain.poison-fountain.svc.cluster.local:8080/auth"
        trustForwardHeader = true
      }
    }
  }

  depends_on = [helm_release.traefik]
}
```

**Step 2: Add `anti-ai-headers` middleware**

Append to the end of `stacks/platform/modules/traefik/middleware.tf`:

```hcl
# X-Robots-Tag header to discourage compliant AI crawlers
resource "kubernetes_manifest" "middleware_anti_ai_headers" {
  manifest = {
    apiVersion = "traefik.io/v1alpha1"
    kind       = "Middleware"
    metadata = {
      name      = "anti-ai-headers"
      namespace = kubernetes_namespace.traefik.metadata[0].name
    }
    spec = {
      headers = {
        customResponseHeaders = {
          "X-Robots-Tag" = "noai, noimageai"
        }
      }
    }
  }

  depends_on = [helm_release.traefik]
}
```

**Step 3: Add `anti-ai-trap-links` rewrite-body middleware**

Append to the end of `stacks/platform/modules/traefik/middleware.tf`:

```hcl
# Inject hidden trap links before </body> to catch AI scrapers
# Links are CSS-hidden and aria-hidden so humans never see them
resource "kubernetes_manifest" "middleware_anti_ai_trap_links" {
  manifest = {
    apiVersion = "traefik.io/v1alpha1"
    kind       = "Middleware"
    metadata = {
      name      = "anti-ai-trap-links"
      namespace = kubernetes_namespace.traefik.metadata[0].name
    }
    spec = {
      plugin = {
        rewrite-body = {
          rewrites = [{
            regex       = "</body>"
            replacement = "<div style=\"position:absolute;left:-9999px;height:0;overflow:hidden\" aria-hidden=\"true\"><a href=\"https://poison.viktorbarzin.me/article/training-data-2024-research-corpus\">Research Archive</a><a href=\"https://poison.viktorbarzin.me/article/dataset-export-machine-learning-v3\">Dataset Export</a><a href=\"https://poison.viktorbarzin.me/article/nlp-benchmark-evaluation-results\">Benchmark Results</a><a href=\"https://poison.viktorbarzin.me/article/web-crawl-index-2024-archive\">Web Index</a><a href=\"https://poison.viktorbarzin.me/article/text-corpus-english-dump\">Text Corpus</a></div></body>"
          }]
          monitoring = {
            types = ["text/html"]
          }
        }
      }
    }
  }

  depends_on = [helm_release.traefik]
}
```

**Step 4: Verify syntax**

```bash
cd stacks/platform && terraform fmt -check modules/traefik/middleware.tf || terraform fmt modules/traefik/middleware.tf
```

**Step 5: Commit**

```bash
git add stacks/platform/modules/traefik/middleware.tf
git commit -m "[ci skip] Add anti-AI scraping Traefik middlewares (ForwardAuth, headers, trap links)"
```

---

### Task 4: Update ingress_factory to apply anti-AI middlewares by default

**Files:**
- Modify: `modules/kubernetes/ingress_factory/main.tf` (add variable + middleware references)

**Step 1: Add `anti_ai_scraping` variable**

In `modules/kubernetes/ingress_factory/main.tf`, add after the `skip_default_rate_limit` variable (around line 73):

```hcl
variable "anti_ai_scraping" {
  type    = bool
  default = true
}
```

**Step 2: Add middlewares to the chain**

In the `kubernetes_ingress_v1` resource's `router.middlewares` annotation (around line 108-117), add 3 new lines for anti-AI middlewares. The updated `concat` list should include:

```hcl
var.anti_ai_scraping ? "traefik-ai-bot-block@kubernetescrd" : null,
var.anti_ai_scraping ? "traefik-anti-ai-headers@kubernetescrd" : null,
var.anti_ai_scraping ? "traefik-strip-accept-encoding@kubernetescrd" : null,
var.anti_ai_scraping ? "traefik-anti-ai-trap-links@kubernetescrd" : null,
```

Insert these after the existing `crowdsec` line (line 111) and before the `protected` line (line 112). The full `concat` array becomes:

```hcl
"traefik.ingress.kubernetes.io/router.middlewares" = join(",", compact(concat([
  var.skip_default_rate_limit ? null : "traefik-rate-limit@kubernetescrd",
  var.custom_content_security_policy == null ? "traefik-csp-headers@kubernetescrd" : null,
  var.exclude_crowdsec ? null : "traefik-crowdsec@kubernetescrd",
  var.anti_ai_scraping ? "traefik-ai-bot-block@kubernetescrd" : null,
  var.anti_ai_scraping ? "traefik-anti-ai-headers@kubernetescrd" : null,
  var.anti_ai_scraping ? "traefik-strip-accept-encoding@kubernetescrd" : null,
  var.anti_ai_scraping ? "traefik-anti-ai-trap-links@kubernetescrd" : null,
  var.protected ? "traefik-authentik-forward-auth@kubernetescrd" : null,
  var.allow_local_access_only ? "traefik-local-only@kubernetescrd" : null,
  var.rybbit_site_id != null ? "traefik-strip-accept-encoding@kubernetescrd" : null,
  var.rybbit_site_id != null ? "${var.namespace}-rybbit-analytics-${var.name}@kubernetescrd" : null,
  var.custom_content_security_policy != null ? "${var.namespace}-custom-csp-${var.name}@kubernetescrd" : null,
], var.extra_middlewares)))
```

**Step 3: Format**

```bash
terraform fmt modules/kubernetes/ingress_factory/main.tf
```

**Step 4: Commit**

```bash
git add modules/kubernetes/ingress_factory/main.tf
git commit -m "[ci skip] Add anti_ai_scraping option to ingress_factory (default: true)"
```

---

### Task 5: Create the poison-fountain Terraform stack

**Files:**
- Create: `stacks/poison-fountain/terragrunt.hcl`
- Create: `stacks/poison-fountain/main.tf`
- Create: `stacks/poison-fountain/secrets` (symlink)

**Step 1: Create terragrunt.hcl**

Write `stacks/poison-fountain/terragrunt.hcl`:

```hcl
include "root" {
  path = find_in_parent_folders()
}

dependency "platform" {
  config_path  = "../platform"
  skip_outputs = true
}
```

**Step 2: Create secrets symlink**

```bash
ln -s ../../secrets stacks/poison-fountain/secrets
```

**Step 3: Write `stacks/poison-fountain/main.tf`**

```hcl
variable "tls_secret_name" { type = string }

locals {
  tiers = {
    core    = "0-core"
    cluster = "1-cluster"
    gpu     = "2-gpu"
    edge    = "3-edge"
    aux     = "4-aux"
  }
}

resource "kubernetes_namespace" "poison_fountain" {
  metadata {
    name = "poison-fountain"
    labels = {
      "istio-injection" = "disabled"
      tier              = local.tiers.aux
    }
  }
}

module "tls_secret" {
  source          = "../../modules/kubernetes/setup_tls_secret"
  namespace       = kubernetes_namespace.poison_fountain.metadata[0].name
  tls_secret_name = var.tls_secret_name
}

# ConfigMap for the Python service code
resource "kubernetes_config_map" "poison_fountain_code" {
  metadata {
    name      = "poison-fountain-code"
    namespace = kubernetes_namespace.poison_fountain.metadata[0].name
  }

  data = {
    "server.py" = file("${path.module}/app/server.py")
  }
}

# ConfigMap for the fetcher script
resource "kubernetes_config_map" "poison_fountain_fetcher" {
  metadata {
    name      = "poison-fountain-fetcher"
    namespace = kubernetes_namespace.poison_fountain.metadata[0].name
  }

  data = {
    "fetch-poison.sh" = file("${path.module}/app/fetch-poison.sh")
  }
}

# Main service deployment
resource "kubernetes_deployment" "poison_fountain" {
  metadata {
    name      = "poison-fountain"
    namespace = kubernetes_namespace.poison_fountain.metadata[0].name
    labels = {
      app  = "poison-fountain"
      tier = local.tiers.aux
    }
  }

  spec {
    replicas = 1
    strategy {
      type = "Recreate"
    }
    selector {
      match_labels = {
        app = "poison-fountain"
      }
    }
    template {
      metadata {
        labels = {
          app = "poison-fountain"
        }
      }
      spec {
        container {
          name  = "poison-fountain"
          image = "python:3.12-slim"
          command = ["python", "/app/server.py"]

          port {
            container_port = 8080
          }

          env {
            name  = "CACHE_DIR"
            value = "/data/cache"
          }
          env {
            name  = "DRIP_BYTES"
            value = "50"
          }
          env {
            name  = "DRIP_DELAY"
            value = "0.5"
          }
          env {
            name  = "POISON_DOMAIN"
            value = "poison.viktorbarzin.me"
          }

          volume_mount {
            name       = "code"
            mount_path = "/app"
            read_only  = true
          }
          volume_mount {
            name       = "data"
            mount_path = "/data"
          }

          liveness_probe {
            http_get {
              path = "/healthz"
              port = 8080
            }
            initial_delay_seconds = 5
            period_seconds        = 30
          }
          readiness_probe {
            http_get {
              path = "/healthz"
              port = 8080
            }
            initial_delay_seconds = 3
            period_seconds        = 10
          }

          resources {
            requests = {
              cpu    = "10m"
              memory = "32Mi"
            }
            limits = {
              cpu    = "100m"
              memory = "128Mi"
            }
          }
        }

        volume {
          name = "code"
          config_map {
            name = kubernetes_config_map.poison_fountain_code.metadata[0].name
          }
        }
        volume {
          name = "data"
          nfs {
            server = "10.0.10.15"
            path   = "/mnt/main/poison-fountain"
          }
        }
      }
    }
  }
}

# Internal service (for ForwardAuth from Traefik)
resource "kubernetes_service" "poison_fountain" {
  metadata {
    name      = "poison-fountain"
    namespace = kubernetes_namespace.poison_fountain.metadata[0].name
    labels = {
      app = "poison-fountain"
    }
  }

  spec {
    selector = {
      app = "poison-fountain"
    }
    port {
      name        = "http"
      port        = 8080
      target_port = 8080
    }
  }
}

# Public ingress for the poison trap subdomain
# Deliberately NO rate limiting, NO CrowdSec, NO anti-AI (we WANT scrapers here)
module "ingress" {
  source                = "../../modules/kubernetes/ingress_factory"
  namespace             = kubernetes_namespace.poison_fountain.metadata[0].name
  name                  = "poison-fountain"
  host                  = "poison"
  port                  = 8080
  tls_secret_name       = var.tls_secret_name
  skip_default_rate_limit = true
  exclude_crowdsec      = true
  anti_ai_scraping      = false
}

# CronJob to fetch and cache poisoned content from Poison Fountain
resource "kubernetes_cron_job_v1" "poison_fetcher" {
  metadata {
    name      = "poison-fountain-fetcher"
    namespace = kubernetes_namespace.poison_fountain.metadata[0].name
  }

  spec {
    schedule                      = "0 */6 * * *"
    successful_jobs_history_limit = 1
    failed_jobs_history_limit     = 1
    concurrency_policy            = "Forbid"

    job_template {
      metadata {
        name = "poison-fountain-fetcher"
      }
      spec {
        template {
          metadata {
            name = "poison-fountain-fetcher"
          }
          spec {
            container {
              name    = "fetcher"
              image   = "curlimages/curl:latest"
              command = ["sh", "/scripts/fetch-poison.sh"]

              env {
                name  = "CACHE_DIR"
                value = "/data/cache"
              }
              env {
                name  = "POISON_URL"
                value = "https://rnsaffn.com/poison2/"
              }
              env {
                name  = "FETCH_COUNT"
                value = "50"
              }

              volume_mount {
                name       = "scripts"
                mount_path = "/scripts"
                read_only  = true
              }
              volume_mount {
                name       = "data"
                mount_path = "/data"
              }
            }

            volume {
              name = "scripts"
              config_map {
                name         = kubernetes_config_map.poison_fountain_fetcher.metadata[0].name
                default_mode = "0755"
              }
            }
            volume {
              name = "data"
              nfs {
                server = "10.0.10.15"
                path   = "/mnt/main/poison-fountain"
              }
            }

            restart_policy = "Never"
          }
        }
      }
    }
  }
}
```

**Step 4: Format and validate**

```bash
terraform fmt stacks/poison-fountain/main.tf
cd stacks/poison-fountain && terragrunt validate --non-interactive
```

**Step 5: Commit**

```bash
git add stacks/poison-fountain/
git commit -m "[ci skip] Add poison-fountain Terraform stack (deployment, service, ingress, CronJob)"
```

---

### Task 6: Deploy the platform stack (Traefik middlewares + DNS)

**Step 1: Plan**

```bash
cd stacks/platform && terragrunt plan --non-interactive 2>&1 | tail -40
```

Expected: New resources for the 3 middleware CRDs + Cloudflare DNS record for `poison`. Changes to existing ingress resources (new middleware annotations).

Review the plan output carefully. The key additions should be:
- `kubernetes_manifest.middleware_ai_bot_block`
- `kubernetes_manifest.middleware_anti_ai_headers`
- `kubernetes_manifest.middleware_anti_ai_trap_links`
- Cloudflare DNS record for `poison`
- Modified ingress annotations on all services in the platform stack

**Step 2: Apply**

```bash
cd stacks/platform && terragrunt apply --non-interactive 2>&1 | tail -40
```

**Step 3: Verify middlewares exist**

```bash
kubectl --kubeconfig $(pwd)/config get middlewares.traefik.io -n traefik | grep -E "ai-bot-block|anti-ai"
```

Expected: 3 middleware resources listed.

---

### Task 7: Deploy the poison-fountain stack

**Step 1: Plan**

```bash
cd stacks/poison-fountain && terragrunt plan --non-interactive 2>&1 | tail -30
```

Expected: New namespace, configmaps, deployment, service, ingress, CronJob.

**Step 2: Apply**

```bash
cd stacks/poison-fountain && terragrunt apply --non-interactive 2>&1 | tail -30
```

**Step 3: Monitor pod startup**

Spawn a background agent to watch the pod come up:

```bash
kubectl --kubeconfig $(pwd)/config get pods -n poison-fountain -w
```

Expected: Pod reaches `Running` state with `1/1` ready.

**Step 4: Trigger the first poison cache fetch**

```bash
kubectl --kubeconfig $(pwd)/config create job --from=cronjob/poison-fountain-fetcher poison-fetch-initial -n poison-fountain
```

Watch the job complete:

```bash
kubectl --kubeconfig $(pwd)/config logs -n poison-fountain -l job-name=poison-fetch-initial -f
```

Expected: Fetched N poison documents.

---

### Task 8: Verify the full system

**Step 1: Verify ForwardAuth blocks AI bots**

```bash
curl -s -o /dev/null -w "%{http_code}" -H "User-Agent: GPTBot/1.0" https://echo.viktorbarzin.me/
```

Expected: `403`

**Step 2: Verify legitimate users pass through**

```bash
curl -s -o /dev/null -w "%{http_code}" -H "User-Agent: Mozilla/5.0" https://echo.viktorbarzin.me/
```

Expected: `200`

**Step 3: Verify X-Robots-Tag header**

```bash
curl -sI https://echo.viktorbarzin.me/ 2>/dev/null | grep -i x-robots-tag
```

Expected: `X-Robots-Tag: noai, noimageai`

**Step 4: Verify hidden trap links in HTML**

```bash
curl -s https://echo.viktorbarzin.me/ | grep -o "poison.viktorbarzin.me"
```

Expected: Multiple matches (trap links injected before `</body>`).

**Step 5: Verify poison service serves content with tarpit**

```bash
timeout 10 curl -s -H "User-Agent: Mozilla/5.0" https://poison.viktorbarzin.me/article/test 2>/dev/null | head -5
```

Expected: HTML content starting to arrive slowly (only a few lines in 10 seconds due to tarpit).

**Step 6: Run cluster health check**

```bash
bash scripts/cluster_healthcheck.sh --quiet
```

Expected: No new WARN/FAIL related to poison-fountain.

**Step 7: Commit all applied state**

```bash
git add -A && git status
```

Review for any uncommitted changes, commit if needed.