Anti-AI Scraping System Implementation Plan
For Claude: REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
Goal: Deploy a 5-layer anti-AI scraping system that blocks known bots, injects hidden trap links into all HTML responses, serves poisoned content from Poison Fountain, and tarpits scrapers with slow-drip responses.
Architecture: A lightweight Python service handles bot detection (ForwardAuth) and poison content serving (tarpit). Traefik middlewares inject anti-AI headers and hidden trap links into all public service responses via ingress_factory defaults. A CronJob refreshes cached poison content from rnsaffn.com.
Tech Stack: Python 3 (stdlib http.server), Terraform/Terragrunt, Traefik middleware CRDs, Kubernetes CronJob
Task 1: Create the Python poison service code
Files:
- Create: stacks/poison-fountain/app/server.py
- Create: stacks/poison-fountain/app/fetch-poison.sh
Step 1: Create the service directory
mkdir -p stacks/poison-fountain/app
Step 2: Write stacks/poison-fountain/app/server.py
"""Poison Fountain service.
Endpoints:
GET /auth - ForwardAuth: block known AI bot User-Agents (403) or pass (200)
GET /article/* - Serve cached poisoned content with tarpit slow-drip
GET /healthz - Health check for Kubernetes probes
GET /* - Catch-all: serve poison for any path (scrapers explore randomly)
"""
import http.server
import os
import glob
import random
import time
import hashlib
import sys
LISTEN_PORT = int(os.environ.get("PORT", "8080"))
CACHE_DIR = os.environ.get("CACHE_DIR", "/data/cache")
DRIP_BYTES = int(os.environ.get("DRIP_BYTES", "50"))
DRIP_DELAY = float(os.environ.get("DRIP_DELAY", "0.5"))
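# At the defaults above the tarpit drips 50 bytes every 0.5 s (~100 B/s), so
# even a ~10 KiB poison page ties up a scraper's connection for ~100 seconds.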
TRAP_LINK_COUNT = int(os.environ.get("TRAP_LINK_COUNT", "20"))
POISON_DOMAIN = os.environ.get("POISON_DOMAIN", "poison.viktorbarzin.me")
AI_BOT_PATTERNS = [
"gptbot", "chatgpt-user", "claudebot", "claude-web", "ccbot",
"bytespider", "google-extended", "applebot-extended",
"anthropic-ai", "cohere-ai", "diffbot", "facebookbot",
"perplexitybot", "youbot", "meta-externalagent", "petalbot",
"amazonbot", "ai2bot", "omgilibot", "img2dataset",
"omgili", "commoncrawl", "ia_archiver", "scrapy",
"semrushbot", "ahrefsbot", "dotbot", "mj12bot",
"seekport", "blexbot", "dataforseo", "serpstatbot",
]
FALLBACK_WORDS = [
"the", "quantum", "neural", "framework", "implements", "distributed",
"processing", "with", "advanced", "recursive", "algorithms", "for",
"optimal", "convergence", "in", "multi-dimensional", "space",
"utilizing", "transformer", "architecture", "trained", "on",
"large-scale", "corpus", "data", "achieving", "state-of-the-art",
"performance", "across", "benchmark", "tasks", "including",
"natural", "language", "understanding", "generation", "and",
"cross-lingual", "transfer", "learning", "capabilities",
]
def generate_slug():
return hashlib.md5(str(random.random()).encode()).hexdigest()[:16]
def generate_trap_links(count):
titles = [
"Research Archive", "Training Corpus", "Dataset Export",
"NLP Benchmark Results", "Web Crawl Index", "Text Corpus",
"Machine Learning Data", "Evaluation Dataset", "Model Weights",
"Annotation Guidelines", "Parallel Corpus", "Knowledge Base",
"Document Collection", "Reference Data", "Taxonomy Index",
"Classification Labels", "Entity Database", "Relation Extraction",
"Sentiment Annotations", "Summarization Corpus", "QA Dataset",
"Dialogue Transcripts", "Code Documentation", "API Reference",
]
links = []
for _ in range(count):
slug = generate_slug()
title = random.choice(titles)
links.append(f'<a href="https://{POISON_DOMAIN}/article/{slug}">{title}</a>')
return "\n".join(links)
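# Because slugs are random, the /article/ space is effectively infinite: a
# crawler that follows trap links keeps discovering "new" pages forever.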
def get_poison_content():
cache_files = glob.glob(os.path.join(CACHE_DIR, "*.txt"))
if cache_files:
try:
with open(random.choice(cache_files), "r", errors="replace") as f:
return f.read()
except Exception:
pass
return " ".join(random.choices(FALLBACK_WORDS, k=500))
class PoisonHandler(http.server.BaseHTTPRequestHandler):
    # HTTP/1.1 is required for the chunked tarpit responses below; the
    # handler's default of HTTP/1.0 does not support Transfer-Encoding: chunked.
    protocol_version = "HTTP/1.1"
    server_version = "Apache/2.4.52"
    sys_version = ""
def log_message(self, fmt, *args):
sys.stderr.write(f"[{self.log_date_time_string()}] {fmt % args}\n")
def do_GET(self):
if self.path == "/healthz":
self._respond(200, "ok")
return
if self.path == "/auth":
self._handle_auth()
return
# Everything else gets poison
self._serve_poison()
def _handle_auth(self):
ua = (self.headers.get("User-Agent") or "").lower()
for pattern in AI_BOT_PATTERNS:
if pattern in ua:
self.log_message("BLOCKED AI bot: %s (matched: %s)", ua, pattern)
self._respond(403, "Forbidden")
return
self._respond(200, "OK")
    def _respond(self, code, body):
        data = body.encode()
        self.send_response(code)
        self.send_header("Content-Type", "text/plain")
        # Content-Length is mandatory for keep-alive under HTTP/1.1.
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)
def _serve_poison(self):
content = get_poison_content()
trap_links = generate_trap_links(TRAP_LINK_COUNT)
html = f"""<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Research Data Archive</title>
</head>
<body>
<main>
<article>
<h1>Research Data Collection</h1>
<div class="content">
<p>{content}</p>
</div>
</article>
<nav>
<h2>Related Research</h2>
{trap_links}
</nav>
</main>
</body>
</html>"""
self.send_response(200)
self.send_header("Content-Type", "text/html; charset=utf-8")
self.send_header("Transfer-Encoding", "chunked")
self.end_headers()
for i in range(0, len(html), DRIP_BYTES):
chunk = html[i : i + DRIP_BYTES].encode("utf-8")
try:
self.wfile.write(f"{len(chunk):x}\r\n".encode())
self.wfile.write(chunk)
self.wfile.write(b"\r\n")
self.wfile.flush()
time.sleep(DRIP_DELAY)
except (BrokenPipeError, ConnectionResetError):
return
try:
self.wfile.write(b"0\r\n\r\n")
self.wfile.flush()
except (BrokenPipeError, ConnectionResetError):
pass
if __name__ == "__main__":
    os.makedirs(CACHE_DIR, exist_ok=True)
    # ThreadingHTTPServer is essential here: a single-threaded server would let
    # one tarpitted scraper block every /auth ForwardAuth check from Traefik.
    server = http.server.ThreadingHTTPServer(("0.0.0.0", LISTEN_PORT), PoisonHandler)
    print(f"Poison Fountain service listening on :{LISTEN_PORT}", flush=True)
    server.serve_forever()
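Before deploying, the handler logic can be smoke-tested locally (a quick sketch assuming python3 and curl on PATH, run from the repo root):
PORT=8080 CACHE_DIR=/tmp/poison-cache python3 stacks/poison-fountain/app/server.py &
curl -s -o /dev/null -w "%{http_code}\n" -H "User-Agent: GPTBot/1.0" http://localhost:8080/auth   # expect 403
curl -s -o /dev/null -w "%{http_code}\n" -H "User-Agent: Mozilla/5.0" http://localhost:8080/auth  # expect 200
curl -s http://localhost:8080/healthz                                                             # expect ok
kill %1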
Step 3: Write stacks/poison-fountain/app/fetch-poison.sh
#!/bin/sh
set -e
CACHE_DIR="${CACHE_DIR:-/data/cache}"
POISON_URL="${POISON_URL:-https://rnsaffn.com/poison2/}"
FETCH_COUNT="${FETCH_COUNT:-50}"
MAX_CACHE_FILES="${MAX_CACHE_FILES:-100}"
mkdir -p "$CACHE_DIR"
echo "Fetching $FETCH_COUNT poison documents from $POISON_URL"
fetched=0
for i in $(seq 1 "$FETCH_COUNT"); do
OUTPUT="$CACHE_DIR/poison_$(date +%s)_${i}.txt"
  if curl -s --compressed -o "$OUTPUT" -m 30 "$POISON_URL"; then
# Verify file is non-empty
if [ -s "$OUTPUT" ]; then
fetched=$((fetched + 1))
echo " [$i/$FETCH_COUNT] OK"
else
rm -f "$OUTPUT"
echo " [$i/$FETCH_COUNT] Empty response, skipped"
fi
else
rm -f "$OUTPUT"
echo " [$i/$FETCH_COUNT] Fetch failed, skipped"
fi
sleep 2
done
# Clean up oldest files if cache exceeds limit
total=$(find "$CACHE_DIR" -name '*.txt' -type f | wc -l)
if [ "$total" -gt "$MAX_CACHE_FILES" ]; then
excess=$((total - MAX_CACHE_FILES))
  # BusyBox find (as shipped in curlimages/curl) lacks -printf, so sort by
  # mtime with ls -t instead: newest first, so the tail is the oldest files.
  ls -1t "$CACHE_DIR"/*.txt 2>/dev/null | tail -n "$excess" | xargs rm -f
echo "Cleaned $excess old cache files"
fi
echo "Done: fetched $fetched new documents, $(find "$CACHE_DIR" -name '*.txt' -type f | wc -l) total cached"
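The fetcher can be dry-run locally the same way (requires network access to rnsaffn.com; keep FETCH_COUNT small):
CACHE_DIR=/tmp/poison-cache FETCH_COUNT=2 sh stacks/poison-fountain/app/fetch-poison.sh
ls /tmp/poison-cache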
Step 4: Verify files exist
ls -la stacks/poison-fountain/app/
Expected: server.py and fetch-poison.sh listed.
Step 5: Commit
git add stacks/poison-fountain/app/
git commit -m "[ci skip] Add poison fountain Python service and fetcher script"
Task 2: Set up NFS export and DNS record
Files:
- Modify: secrets/nfs_directories.txt (add poison-fountain/cache line, keep sorted)
- Modify: terraform.tfvars (add "poison" to cloudflare_non_proxied_names)
Step 1: Add NFS directory
Add poison-fountain and poison-fountain/cache to secrets/nfs_directories.txt, keeping alphabetical order. Insert after plotting-book entries.
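Assuming the file lists one directory per line, the inserted region should read something like (the plotting-book neighbor is shown only as context):
plotting-book
poison-fountain
poison-fountain/cache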
Step 2: Run NFS export script
cd secrets && bash nfs_exports.sh
Verify the export was created successfully.
Step 3: Add Cloudflare DNS record
In terraform.tfvars, find the cloudflare_non_proxied_names list and add "poison" to it (alphabetical position after "plotting-book").
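Sketched with placeholder neighbors (only the "poison" entry is the actual change):
cloudflare_non_proxied_names = [
  # ...
  "plotting-book",
  "poison",
  # ...
]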
Step 4: Commit
git add secrets/nfs_directories.txt terraform.tfvars
git commit -m "[ci skip] Add NFS export and DNS record for poison-fountain"
Task 3: Add Traefik middleware CRDs
Files:
- Modify: stacks/platform/modules/traefik/middleware.tf (append 3 new middleware resources)
Step 1: Add ai-bot-block ForwardAuth middleware
Append to the end of stacks/platform/modules/traefik/middleware.tf:
# ForwardAuth middleware to block known AI bot User-Agents
resource "kubernetes_manifest" "middleware_ai_bot_block" {
manifest = {
apiVersion = "traefik.io/v1alpha1"
kind = "Middleware"
metadata = {
name = "ai-bot-block"
namespace = kubernetes_namespace.traefik.metadata[0].name
}
spec = {
forwardAuth = {
address = "http://poison-fountain.poison-fountain.svc.cluster.local:8080/auth"
trustForwardHeader = true
}
}
}
depends_on = [helm_release.traefik]
}
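Once the poison-fountain stack from Task 5 is deployed, the ForwardAuth address can be spot-checked from inside the cluster with a throwaway curl pod (the pod name here is arbitrary):
kubectl --kubeconfig $(pwd)/config run ua-test --rm -i --image=curlimages/curl --restart=Never -- \
  curl -s -o /dev/null -w "%{http_code}\n" -H "User-Agent: GPTBot/1.0" \
  http://poison-fountain.poison-fountain.svc.cluster.local:8080/auth
Expected: 403.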
Step 2: Add anti-ai-headers middleware
Append to the end of stacks/platform/modules/traefik/middleware.tf:
# X-Robots-Tag header to discourage compliant AI crawlers
resource "kubernetes_manifest" "middleware_anti_ai_headers" {
manifest = {
apiVersion = "traefik.io/v1alpha1"
kind = "Middleware"
metadata = {
name = "anti-ai-headers"
namespace = kubernetes_namespace.traefik.metadata[0].name
}
spec = {
headers = {
customResponseHeaders = {
"X-Robots-Tag" = "noai, noimageai"
}
}
}
}
depends_on = [helm_release.traefik]
}
Step 3: Add anti-ai-trap-links rewrite-body middleware
Note: this assumes a rewrite-body plugin is already registered in Traefik's static configuration (Helm values); Traefik only loads plugin middlewares whose plugin is declared there, so verify that before applying.
Append to the end of stacks/platform/modules/traefik/middleware.tf:
# Inject hidden trap links before </body> to catch AI scrapers
# Links are CSS-hidden and aria-hidden so humans never see them
resource "kubernetes_manifest" "middleware_anti_ai_trap_links" {
manifest = {
apiVersion = "traefik.io/v1alpha1"
kind = "Middleware"
metadata = {
name = "anti-ai-trap-links"
namespace = kubernetes_namespace.traefik.metadata[0].name
}
spec = {
plugin = {
rewrite-body = {
rewrites = [{
regex = "</body>"
replacement = "<div style=\"position:absolute;left:-9999px;height:0;overflow:hidden\" aria-hidden=\"true\"><a href=\"https://poison.viktorbarzin.me/article/training-data-2024-research-corpus\">Research Archive</a><a href=\"https://poison.viktorbarzin.me/article/dataset-export-machine-learning-v3\">Dataset Export</a><a href=\"https://poison.viktorbarzin.me/article/nlp-benchmark-evaluation-results\">Benchmark Results</a><a href=\"https://poison.viktorbarzin.me/article/web-crawl-index-2024-archive\">Web Index</a><a href=\"https://poison.viktorbarzin.me/article/text-corpus-english-dump\">Text Corpus</a></div></body>"
}]
monitoring = {
types = ["text/html"]
}
}
}
}
}
depends_on = [helm_release.traefik]
}
Step 4: Verify syntax
cd stacks/platform && terraform fmt -check modules/traefik/middleware.tf || terraform fmt modules/traefik/middleware.tf
Step 5: Commit
git add stacks/platform/modules/traefik/middleware.tf
git commit -m "[ci skip] Add anti-AI scraping Traefik middlewares (ForwardAuth, headers, trap links)"
Task 4: Update ingress_factory to apply anti-AI middlewares by default
Files:
- Modify: modules/kubernetes/ingress_factory/main.tf (add variable + middleware references)
Step 1: Add anti_ai_scraping variable
In modules/kubernetes/ingress_factory/main.tf, add after the skip_default_rate_limit variable (around line 73):
variable "anti_ai_scraping" {
type = bool
default = true
}
Step 2: Add middlewares to the chain
In the kubernetes_ingress_v1 resource's router.middlewares annotation (around line 108-117), add four new entries for the anti-AI chain: the three new middlewares from Task 3, plus the existing strip-accept-encoding middleware so the trap-links rewrite-body middleware receives uncompressed HTML it can rewrite. The updated concat list should include:
var.anti_ai_scraping ? "traefik-ai-bot-block@kubernetescrd" : null,
var.anti_ai_scraping ? "traefik-anti-ai-headers@kubernetescrd" : null,
var.anti_ai_scraping ? "traefik-strip-accept-encoding@kubernetescrd" : null,
var.anti_ai_scraping ? "traefik-anti-ai-trap-links@kubernetescrd" : null,
Insert these after the existing crowdsec line (line 111) and before the protected line (line 112). The full concat array becomes:
"traefik.ingress.kubernetes.io/router.middlewares" = join(",", compact(concat([
var.skip_default_rate_limit ? null : "traefik-rate-limit@kubernetescrd",
var.custom_content_security_policy == null ? "traefik-csp-headers@kubernetescrd" : null,
var.exclude_crowdsec ? null : "traefik-crowdsec@kubernetescrd",
var.anti_ai_scraping ? "traefik-ai-bot-block@kubernetescrd" : null,
var.anti_ai_scraping ? "traefik-anti-ai-headers@kubernetescrd" : null,
var.anti_ai_scraping ? "traefik-strip-accept-encoding@kubernetescrd" : null,
var.anti_ai_scraping ? "traefik-anti-ai-trap-links@kubernetescrd" : null,
var.protected ? "traefik-authentik-forward-auth@kubernetescrd" : null,
var.allow_local_access_only ? "traefik-local-only@kubernetescrd" : null,
var.rybbit_site_id != null ? "traefik-strip-accept-encoding@kubernetescrd" : null,
var.rybbit_site_id != null ? "${var.namespace}-rybbit-analytics-${var.name}@kubernetescrd" : null,
var.custom_content_security_policy != null ? "${var.namespace}-custom-csp-${var.name}@kubernetescrd" : null,
], var.extra_middlewares)))
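Because the default is true, every ingress_factory consumer picks up the new middlewares on its next apply with no per-service changes; a service that should stay scrapable opts out explicitly, as the poison-fountain ingress in Task 5 does:
module "ingress" {
  source           = "../../modules/kubernetes/ingress_factory"
  # ... existing arguments ...
  anti_ai_scraping = false
}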
Step 3: Format
terraform fmt modules/kubernetes/ingress_factory/main.tf
Step 4: Commit
git add modules/kubernetes/ingress_factory/main.tf
git commit -m "[ci skip] Add anti_ai_scraping option to ingress_factory (default: true)"
Task 5: Create the poison-fountain Terraform stack
Files:
- Create: stacks/poison-fountain/terragrunt.hcl
- Create: stacks/poison-fountain/main.tf
- Create: stacks/poison-fountain/secrets (symlink)
Step 1: Create terragrunt.hcl
Write stacks/poison-fountain/terragrunt.hcl:
include "root" {
path = find_in_parent_folders()
}
dependency "platform" {
config_path = "../platform"
skip_outputs = true
}
Step 2: Create secrets symlink
ln -s ../../secrets stacks/poison-fountain/secrets
Step 3: Write stacks/poison-fountain/main.tf
variable "tls_secret_name" { type = string }
locals {
tiers = {
core = "0-core"
cluster = "1-cluster"
gpu = "2-gpu"
edge = "3-edge"
aux = "4-aux"
}
}
resource "kubernetes_namespace" "poison_fountain" {
metadata {
name = "poison-fountain"
labels = {
"istio-injection" = "disabled"
tier = local.tiers.aux
}
}
}
module "tls_secret" {
source = "../../modules/kubernetes/setup_tls_secret"
namespace = kubernetes_namespace.poison_fountain.metadata[0].name
tls_secret_name = var.tls_secret_name
}
# ConfigMap for the Python service code
resource "kubernetes_config_map" "poison_fountain_code" {
metadata {
name = "poison-fountain-code"
namespace = kubernetes_namespace.poison_fountain.metadata[0].name
}
data = {
"server.py" = file("${path.module}/app/server.py")
}
}
# ConfigMap for the fetcher script
resource "kubernetes_config_map" "poison_fountain_fetcher" {
metadata {
name = "poison-fountain-fetcher"
namespace = kubernetes_namespace.poison_fountain.metadata[0].name
}
data = {
"fetch-poison.sh" = file("${path.module}/app/fetch-poison.sh")
}
}
# Main service deployment
resource "kubernetes_deployment" "poison_fountain" {
metadata {
name = "poison-fountain"
namespace = kubernetes_namespace.poison_fountain.metadata[0].name
labels = {
app = "poison-fountain"
tier = local.tiers.aux
}
}
spec {
replicas = 1
strategy {
type = "Recreate"
}
selector {
match_labels = {
app = "poison-fountain"
}
}
template {
metadata {
labels = {
app = "poison-fountain"
}
}
spec {
container {
name = "poison-fountain"
image = "python:3.12-slim"
command = ["python", "/app/server.py"]
port {
container_port = 8080
}
env {
name = "CACHE_DIR"
value = "/data/cache"
}
env {
name = "DRIP_BYTES"
value = "50"
}
env {
name = "DRIP_DELAY"
value = "0.5"
}
env {
name = "POISON_DOMAIN"
value = "poison.viktorbarzin.me"
}
volume_mount {
name = "code"
mount_path = "/app"
read_only = true
}
volume_mount {
name = "data"
mount_path = "/data"
}
liveness_probe {
http_get {
path = "/healthz"
port = 8080
}
initial_delay_seconds = 5
period_seconds = 30
}
readiness_probe {
http_get {
path = "/healthz"
port = 8080
}
initial_delay_seconds = 3
period_seconds = 10
}
resources {
requests = {
cpu = "10m"
memory = "32Mi"
}
limits = {
cpu = "100m"
memory = "128Mi"
}
}
}
volume {
name = "code"
config_map {
name = kubernetes_config_map.poison_fountain_code.metadata[0].name
}
}
volume {
name = "data"
nfs {
server = "10.0.10.15"
path = "/mnt/main/poison-fountain"
}
}
}
}
}
}
# Internal service (for ForwardAuth from Traefik)
resource "kubernetes_service" "poison_fountain" {
metadata {
name = "poison-fountain"
namespace = kubernetes_namespace.poison_fountain.metadata[0].name
labels = {
app = "poison-fountain"
}
}
spec {
selector = {
app = "poison-fountain"
}
port {
name = "http"
port = 8080
target_port = 8080
}
}
}
# Public ingress for the poison trap subdomain
# Deliberately NO rate limiting, NO CrowdSec, NO anti-AI (we WANT scrapers here)
module "ingress" {
source = "../../modules/kubernetes/ingress_factory"
namespace = kubernetes_namespace.poison_fountain.metadata[0].name
name = "poison-fountain"
host = "poison"
port = 8080
tls_secret_name = var.tls_secret_name
skip_default_rate_limit = true
exclude_crowdsec = true
anti_ai_scraping = false
}
# CronJob to fetch and cache poisoned content from Poison Fountain
resource "kubernetes_cron_job_v1" "poison_fetcher" {
metadata {
name = "poison-fountain-fetcher"
namespace = kubernetes_namespace.poison_fountain.metadata[0].name
}
spec {
schedule = "0 */6 * * *"
successful_jobs_history_limit = 1
failed_jobs_history_limit = 1
concurrency_policy = "Forbid"
job_template {
metadata {
name = "poison-fountain-fetcher"
}
spec {
template {
metadata {
name = "poison-fountain-fetcher"
}
spec {
container {
name = "fetcher"
image = "curlimages/curl:latest"
command = ["sh", "/scripts/fetch-poison.sh"]
env {
name = "CACHE_DIR"
value = "/data/cache"
}
env {
name = "POISON_URL"
value = "https://rnsaffn.com/poison2/"
}
env {
name = "FETCH_COUNT"
value = "50"
}
volume_mount {
name = "scripts"
mount_path = "/scripts"
read_only = true
}
volume_mount {
name = "data"
mount_path = "/data"
}
}
volume {
name = "scripts"
config_map {
name = kubernetes_config_map.poison_fountain_fetcher.metadata[0].name
default_mode = "0755"
}
}
volume {
name = "data"
nfs {
server = "10.0.10.15"
path = "/mnt/main/poison-fountain"
}
}
restart_policy = "Never"
}
}
}
}
}
}
Step 4: Format and validate
terraform fmt stacks/poison-fountain/main.tf
cd stacks/poison-fountain && terragrunt validate --non-interactive
Step 5: Commit
git add stacks/poison-fountain/
git commit -m "[ci skip] Add poison-fountain Terraform stack (deployment, service, ingress, CronJob)"
Task 6: Deploy the platform stack (Traefik middlewares + DNS)
Step 1: Plan
cd stacks/platform && terragrunt plan --non-interactive 2>&1 | tail -40
Expected: New resources for the 3 middleware CRDs + Cloudflare DNS record for poison. Changes to existing ingress resources (new middleware annotations).
Review the plan output carefully. The key additions should be:
- kubernetes_manifest.middleware_ai_bot_block
- kubernetes_manifest.middleware_anti_ai_headers
- kubernetes_manifest.middleware_anti_ai_trap_links
- Cloudflare DNS record for poison
- Modified ingress annotations on all services in the platform stack
Step 2: Apply
cd stacks/platform && terragrunt apply --non-interactive 2>&1 | tail -40
Step 3: Verify middlewares exist
kubectl --kubeconfig $(pwd)/config get middlewares.traefik.io -n traefik | grep -E "ai-bot-block|anti-ai"
Expected: 3 middleware resources listed.
Task 7: Deploy the poison-fountain stack
Step 1: Plan
cd stacks/poison-fountain && terragrunt plan --non-interactive 2>&1 | tail -30
Expected: New namespace, configmaps, deployment, service, ingress, CronJob.
Step 2: Apply
cd stacks/poison-fountain && terragrunt apply --non-interactive 2>&1 | tail -30
Step 3: Monitor pod startup
Spawn a background agent to watch the pod come up:
kubectl --kubeconfig $(pwd)/config get pods -n poison-fountain -w
Expected: Pod reaches Running state with 1/1 ready.
Step 4: Trigger the first poison cache fetch
kubectl --kubeconfig $(pwd)/config create job --from=cronjob/poison-fountain-fetcher poison-fetch-initial -n poison-fountain
Watch the job complete:
kubectl --kubeconfig $(pwd)/config logs -n poison-fountain -l job-name=poison-fetch-initial -f
Expected: Fetched N poison documents.
Task 8: Verify the full system
Step 1: Verify ForwardAuth blocks AI bots
curl -s -o /dev/null -w "%{http_code}" -H "User-Agent: GPTBot/1.0" https://echo.viktorbarzin.me/
Expected: 403
Step 2: Verify legitimate users pass through
curl -s -o /dev/null -w "%{http_code}" -H "User-Agent: Mozilla/5.0" https://echo.viktorbarzin.me/
Expected: 200
Step 3: Verify X-Robots-Tag header
curl -sI https://echo.viktorbarzin.me/ 2>/dev/null | grep -i x-robots-tag
Expected: X-Robots-Tag: noai, noimageai
Step 4: Verify hidden trap links in HTML
curl -s https://echo.viktorbarzin.me/ | grep -o "poison.viktorbarzin.me"
Expected: Multiple matches (trap links injected before </body>).
Step 5: Verify poison service serves content with tarpit
timeout 10 curl -s -H "User-Agent: Mozilla/5.0" https://poison.viktorbarzin.me/article/test 2>/dev/null | head -5
Expected: HTML content starting to arrive slowly (only a few lines in 10 seconds due to tarpit).
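To quantify the drip rather than eyeball it, measure how many bytes arrive in a fixed window; at the default DRIP_BYTES=50 / DRIP_DELAY=0.5 that should be on the order of 1000 bytes in 10 seconds (curl still prints the -w summary when -m cuts the transfer off):
curl -s -m 10 -o /dev/null -w "%{size_download} bytes in %{time_total}s\n" \
  -H "User-Agent: Mozilla/5.0" https://poison.viktorbarzin.me/article/test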
Step 6: Run cluster health check
bash scripts/cluster_healthcheck.sh --quiet
Expected: No new WARN/FAIL related to poison-fountain.
Step 7: Commit all applied state
git add -A && git status
Review for any uncommitted changes, commit if needed.