diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md
index 68fcfd38..3275cd4b 100755
--- a/.claude/CLAUDE.md
+++ b/.claude/CLAUDE.md
@@ -29,6 +29,7 @@ Violations cause state drift, which causes future applies to break or silently r
 - **New services need CI/CD** and **monitoring** (Prometheus/Uptime Kuma)
 - **New service**: Use `setup-project` skill for full workflow
 - **Ingress**: `ingress_factory` module. Auth: `protected = true`. Anti-AI: on by default. **DNS**: `dns_type = "proxied"` (Cloudflare CDN) or `"non-proxied"` (direct A/AAAA). DNS records are auto-created — no need to edit `config.tfvars`.
+- **Anubis PoW challenge** (`modules/kubernetes/anubis_instance/`): per-site reverse proxy that issues a 30-day JWT cookie after a tiny PoW solve. Use for **public, content-bearing sites without app-level auth** (blog, docs, wikis, static landing pages). Pattern: declare `module "anubis" { source = "../../modules/kubernetes/anubis_instance"; name = "X"; namespace = ...; target_url = "http://..svc.cluster.local" }`, then in `ingress_factory` set `service_name = module.anubis.service_name`, `port = module.anubis.service_port`, `anti_ai_scraping = false`. Shared ed25519 key in Vault `secret/viktor` -> `anubis_ed25519_key`; cookie scoped to `viktorbarzin.me` so one solve covers all Anubis-fronted subdomains. **DO NOT put Anubis in front of Git/API/WebDAV/CLI endpoints** — clients without JS can't solve PoW. Active on: blog, kms, travel-blog. See `.claude/reference/patterns.md` "Anti-AI Scraping" for full layering.
 - **Docker images**: Always build for `linux/amd64`. Use 8-char git SHA tags — `:latest` causes stale pull-through cache.
 - **Private registry**: `forgejo.viktorbarzin.me/viktor/` (Forgejo packages, OAuth-style PAT auth). Use `image: forgejo.viktorbarzin.me/viktor/:` + `imagePullSecrets: [{name: registry-credentials}]`. Kyverno auto-syncs the Secret to all namespaces.
Containerd `hosts.toml` on every node redirects to in-cluster Traefik LB `10.0.20.200` to avoid hairpin NAT. Push-side: viktor PAT in Vault `secret/ci/global/forgejo_push_token` (Forgejo container packages are scoped per-user; only the package owner can push, ci-pusher cannot write to viktor/*). Pull-side: cluster-puller PAT in Vault `secret/viktor/forgejo_pull_token`. Retention CronJob (`forgejo-cleanup` in `forgejo` ns, daily 04:00) keeps newest 10 versions + always `:latest`; integrity probed every 15min by `forgejo-integrity-probe` in `monitoring` ns (catalog walk + manifest HEAD on every blob). See `docs/plans/2026-05-07-forgejo-registry-consolidation-{design,plan}.md` for the migration history. Pull-through caches for upstream registries (DockerHub, GHCR, Quay, k8s.gcr, Kyverno) stay on the registry VM at `10.0.20.10` ports 5000/5010/5020/5030/5040 — the old port-5050 R/W private registry was decommissioned 2026-05-07.
 - **LinuxServer.io containers**: `DOCKER_MODS` runs apt-get on every start — bake slow mods into a custom image (`RUN /docker-mods || true` then `ENV DOCKER_MODS=`). Set `NO_CHOWN=true` to skip recursive chown that hangs on NFS mounts.
diff --git a/.claude/reference/patterns.md b/.claude/reference/patterns.md
index 4a563e3c..56ec6750 100644
--- a/.claude/reference/patterns.md
+++ b/.claude/reference/patterns.md
@@ -26,12 +26,16 @@ module "nfs_data" {
 ## ~~iSCSI Storage~~ (REMOVED — replaced by proxmox-lvm)
 > iSCSI via democratic-csi and TrueNAS has been fully removed (2026-04). All database storage now uses `StorageClass: proxmox-lvm` (Proxmox CSI, LVM-thin hotplug). TrueNAS has been decommissioned.
 
-## Anti-AI Scraping (3 Active Layers) (Updated 2026-04-17)
+## Anti-AI Scraping (4 Active Layers) (Updated 2026-05-10)
 Default `anti_ai_scraping = true` in ingress_factory. Disable per-service: `anti_ai_scraping = false`.
-1. Bot blocking (ForwardAuth → poison-fountain) 2. X-Robots-Tag noai 3.
Tarpit/poison content (standalone at poison.viktorbarzin.me)
-Trap links (formerly layer 3) removed April 2026 — rewrite-body plugin broken on Traefik v3.6.12 (Yaegi bugs). `strip-accept-encoding` and `anti-ai-trap-links` middlewares deleted.
+1. **Anubis PoW challenge** (per-site reverse proxy) — `modules/kubernetes/anubis_instance/`. Latest: `ghcr.io/techarohq/anubis:v1.25.0`. Difficulty 2 (~250 ms desktop / ~700 ms mobile), 30-day JWT cookie scoped to `viktorbarzin.me` so a single solve covers every Anubis-fronted subdomain. Active on: `viktorbarzin.me`, `kms.viktorbarzin.me`, `travel.viktorbarzin.me`. Add to a stack: `module "anubis" { source = "../../modules/kubernetes/anubis_instance"; name = "X"; namespace = ...; target_url = "http://..svc.cluster.local" }`, then point ingress_factory at `module.anubis.service_name` + `port = module.anubis.service_port` and set `anti_ai_scraping = false`. Shared ed25519 signing key in Vault `secret/viktor` -> `anubis_ed25519_key`. **Avoid putting Anubis in front of CLI/API/Git endpoints (Forgejo, APIs, WebDAV)** — clients without JS can't solve PoW.
+2. **Bot blocking forwardAuth** (ForwardAuth → bot-block-proxy → poison-fountain) — global default for non-Anubis sites. `bot-block-proxy` (OpenResty in `traefik` ns) is fail-open with 100 ms connect / 200 ms read timeouts so a downed poison-fountain costs ≤200 ms per request. Source: `stacks/traefik/modules/traefik/main.tf`.
+3. **X-Robots-Tag noai** — set by `traefik-anti-ai-headers` middleware. Anubis additionally serves a comprehensive `/robots.txt` (`SERVE_ROBOTS_TXT=true`) to well-behaved bots.
+4. **Tarpit/poison content** (standalone at poison.viktorbarzin.me, `stacks/poison-fountain/`). Currently scaled to `replicas = 0` — fail-open path means no live traffic, no penalty.
+
+Trap links (formerly a layer) removed April 2026 — rewrite-body plugin broken on Traefik v3.6.12 (Yaegi bugs). `strip-accept-encoding` and `anti-ai-trap-links` middlewares deleted.
 Rybbit analytics injection now via Cloudflare Worker (`stacks/rybbit/worker/`, HTMLRewriter, wildcard route `*.viktorbarzin.me/*`, 28 site ID mappings).
-Key files: `stacks/poison-fountain/`, `stacks/rybbit/worker/`, `stacks/platform/modules/traefik/middleware.tf`
+Key files: `modules/kubernetes/anubis_instance/`, `stacks/poison-fountain/`, `stacks/rybbit/worker/`, `stacks/traefik/modules/traefik/main.tf`
 
 ## Terragrunt Architecture
 - Root `terragrunt.hcl`: DRY providers, backend, variable loading, `generate "tiers"` block
diff --git a/modules/kubernetes/anubis_instance/main.tf b/modules/kubernetes/anubis_instance/main.tf
new file mode 100644
index 00000000..991da06a
--- /dev/null
+++ b/modules/kubernetes/anubis_instance/main.tf
@@ -0,0 +1,346 @@
+terraform {
+  required_providers {
+    kubernetes = {
+      source = "hashicorp/kubernetes"
+    }
+  }
+}
+
+# Per-site Anubis reverse proxy.
+# Sits between Traefik and the real backend. On first visit, serves a
+# proof-of-work challenge; on success, drops a long-lived JWT cookie and
+# proxies the request through to `target_url`.
+#
+# Sharing a single ed25519 signing key across instances + COOKIE_DOMAIN at
+# the registrable domain means a token solved on one viktorbarzin.me subdomain
+# is honoured by every other Anubis-fronted site.
+
+variable "name" {
+  type = string
+  description = "Short logical name (e.g. \"blog\"). Used to derive Service / Deployment / Secret names as anubis-."
+}
+
+variable "namespace" {
+  type = string
+  description = "Namespace to deploy into — typically the same as the protected backend service."
+}
+
+variable "target_url" {
+  type = string
+  description = "Backend URL Anubis forwards passing requests to (e.g. http://blog.website.svc.cluster.local)."
+}
+
+variable "cookie_domain" {
+  type = string
+  default = "viktorbarzin.me"
+  description = "Cookie domain — set to the registrable domain so a single PoW solve covers every Anubis-fronted subdomain."
+}
+
+variable "difficulty" {
+  type = number
+  default = 2
+  description = "PoW difficulty (leading-zero hex chars). 2 = ~250ms desktop / ~700ms mobile. Bump for stronger filtering."
+}
+
+variable "cookie_expiration_hours" {
+  type = number
+  default = 720 # 30 days
+  description = "Lifetime of the issued JWT cookie in hours."
+}
+
+variable "image_tag" {
+  type = string
+  default = "v1.25.0"
+  description = "ghcr.io/techarohq/anubis tag — pin to a release, never :latest."
+}
+
+variable "replicas" {
+  type = number
+  default = 2
+  description = "Replica count. 2 + matching ed25519 key = HA without sticky sessions."
+}
+
+variable "memory" {
+  type = string
+  default = "128Mi"
+  description = "requests==limits memory. Anubis docs suggest 128Mi handles many concurrent clients."
+}
+
+variable "cpu_request" {
+  type = string
+  default = "20m"
+  description = "CPU request. PoW verification is server-cheap (just hash check)."
+}
+
+locals {
+  full_name = "anubis-${var.name}"
+  labels = {
+    "app" = local.full_name
+    "app.kubernetes.io/name" = "anubis"
+    "app.kubernetes.io/instance" = local.full_name
+    "app.kubernetes.io/component" = "ai-bot-challenge"
+    "app.kubernetes.io/managed-by" = "terraform"
+  }
+}
+
+# ED25519 signing key — pulled from Vault `secret/viktor` -> field
+# `anubis_ed25519_key`. Same key across every instance so JWTs are
+# cross-validatable, enabling cross-subdomain SSO.
+resource "kubernetes_manifest" "ed25519_secret" {
+  manifest = {
+    apiVersion = "external-secrets.io/v1beta1"
+    kind = "ExternalSecret"
+    metadata = {
+      name = "${local.full_name}-key"
+      namespace = var.namespace
+    }
+    spec = {
+      refreshInterval = "1h"
+      secretStoreRef = {
+        name = "vault-kv"
+        kind = "ClusterSecretStore"
+      }
+      target = {
+        name = "${local.full_name}-key"
+        creationPolicy = "Owner"
+      }
+      data = [{
+        secretKey = "key"
+        remoteRef = {
+          key = "viktor"
+          property = "anubis_ed25519_key"
+        }
+      }]
+    }
+  }
+}
+
+resource "kubernetes_deployment" "anubis" {
+  metadata {
+    name = local.full_name
+    namespace = var.namespace
+    labels = local.labels
+  }
+
+  spec {
+    replicas = var.replicas
+
+    selector {
+      match_labels = { app = local.full_name }
+    }
+
+    strategy {
+      type = "RollingUpdate"
+      rolling_update {
+        max_surge = 1
+        max_unavailable = 0
+      }
+    }
+
+    template {
+      metadata {
+        labels = local.labels
+      }
+
+      spec {
+        # Spread replicas across nodes to survive a single node failure.
+        topology_spread_constraint {
+          max_skew = 1
+          topology_key = "kubernetes.io/hostname"
+          when_unsatisfiable = "ScheduleAnyway"
+          label_selector {
+            match_labels = { app = local.full_name }
+          }
+        }
+
+        container {
+          name = "anubis"
+          image = "ghcr.io/techarohq/anubis:${var.image_tag}"
+
+          port {
+            name = "http"
+            container_port = 8923
+          }
+          port {
+            name = "metrics"
+            container_port = 9090
+          }
+
+          env {
+            name = "BIND"
+            value = ":8923"
+          }
+          env {
+            name = "METRICS_BIND"
+            value = ":9090"
+          }
+          env {
+            name = "TARGET"
+            value = var.target_url
+          }
+          env {
+            name = "DIFFICULTY"
+            value = tostring(var.difficulty)
+          }
+          env {
+            name = "COOKIE_EXPIRATION_TIME"
+            value = "${var.cookie_expiration_hours}h"
+          }
+          # Cross-subdomain SSO: cookie scoped to the registrable domain so
+          # a JWT solved on any Anubis-fronted subdomain is honoured on every
+          # other one. (COOKIE_DOMAIN and COOKIE_DYNAMIC_DOMAIN are mutually
+          # exclusive — picking the explicit form.)
+          env {
+            name = "COOKIE_DOMAIN"
+            value = var.cookie_domain
+          }
+          env {
+            name = "COOKIE_SECURE"
+            value = "true"
+          }
+          env {
+            name = "COOKIE_SAME_SITE"
+            value = "Lax"
+          }
+          # Built-in robots.txt that disallows known AI scrapers — well-behaved
+          # bots get blocked here without ever paying the PoW cost.
+          env {
+            name = "SERVE_ROBOTS_TXT"
+            value = "true"
+          }
+          # Drop cluster-internal IPs from XFF so Anubis sees the real client.
+          env {
+            name = "XFF_STRIP_PRIVATE"
+            value = "true"
+          }
+          env {
+            name = "SLOG_LEVEL"
+            value = "INFO"
+          }
+          env {
+            name = "ED25519_PRIVATE_KEY_HEX_FILE"
+            # Mounted from the ESO-managed Secret below.
+            value = "/keys/key"
+          }
+
+          volume_mount {
+            name = "ed25519-key"
+            mount_path = "/keys"
+            read_only = true
+          }
+
+          resources {
+            requests = {
+              cpu = var.cpu_request
+              memory = var.memory
+            }
+            limits = {
+              memory = var.memory
+            }
+          }
+
+          # Liveness + readiness on the metrics endpoint (zero auth, always 200).
+          liveness_probe {
+            http_get {
+              path = "/metrics"
+              port = "metrics"
+            }
+            initial_delay_seconds = 10
+            period_seconds = 30
+            failure_threshold = 3
+          }
+          readiness_probe {
+            http_get {
+              path = "/metrics"
+              port = "metrics"
+            }
+            initial_delay_seconds = 2
+            period_seconds = 5
+            failure_threshold = 2
+          }
+
+          security_context {
+            run_as_non_root = true
+            run_as_user = 1000
+            run_as_group = 1000
+            allow_privilege_escalation = false
+            read_only_root_filesystem = true
+            capabilities {
+              drop = ["ALL"]
+            }
+          }
+        }
+
+        volume {
+          name = "ed25519-key"
+          secret {
+            secret_name = "${local.full_name}-key"
+            items {
+              key = "key"
+              path = "key"
+            }
+          }
+        }
+      }
+    }
+  }
+
+  lifecycle {
+    # KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
+    ignore_changes = [spec[0].template[0].spec[0].dns_config]
+  }
+
+  depends_on = [kubernetes_manifest.ed25519_secret]
+}
+
+resource "kubernetes_service" "anubis" {
+  metadata {
+    name = local.full_name
+    namespace = var.namespace
+    labels = local.labels
+    annotations = {
+      "prometheus.io/scrape" = "true"
+      "prometheus.io/path" = "/metrics"
+      "prometheus.io/port" = "9090"
+    }
+  }
+
+  spec {
+    selector = { app = local.full_name }
+    port {
+      name = "http"
+      port = 8080
+      target_port = 8923
+      protocol = "TCP"
+    }
+    port {
+      name = "metrics"
+      port = 9090
+      target_port = 9090
+      protocol = "TCP"
+    }
+  }
+}
+
+resource "kubernetes_pod_disruption_budget_v1" "anubis" {
+  metadata {
+    name = local.full_name
+    namespace = var.namespace
+  }
+  spec {
+    min_available = "1"
+    selector {
+      match_labels = { app = local.full_name }
+    }
+  }
+}
+
+output "service_name" {
+  value = kubernetes_service.anubis.metadata[0].name
+  description = "ClusterIP service name. Pass this to ingress_factory's `service_name` so Traefik routes through Anubis."
+}
+
+output "service_port" {
+  value = 8080
+  description = "Service port. Anubis listens on 8923 inside; the Service exposes 8080."
+}
diff --git a/stacks/blog/main.tf b/stacks/blog/main.tf
index bf5e2699..44bfc78f 100644
--- a/stacks/blog/main.tf
+++ b/stacks/blog/main.tf
@@ -112,14 +112,26 @@ resource "kubernetes_service" "blog" {
   }
 }
 
+# Anubis reverse proxy in front of the blog. First-time visitors solve a
+# tiny PoW (~250ms desktop), get a 30-day cookie, and pass through. Replaces
+# the global ai-bot-block forwardAuth for this site.
+module "anubis" {
+  source = "../../modules/kubernetes/anubis_instance"
+  name = "blog"
+  namespace = kubernetes_namespace.website.metadata[0].name
+  target_url = "http://${kubernetes_service.blog.metadata[0].name}.${kubernetes_namespace.website.metadata[0].name}.svc.cluster.local"
+}
+
 module "ingress" {
-  source = "../../modules/kubernetes/ingress_factory"
-  namespace = kubernetes_namespace.website.metadata[0].name
-  name = "blog"
-  service_name = "blog"
-  full_host = "viktorbarzin.me"
-  dns_type = "proxied"
-  tls_secret_name = var.tls_secret_name
+  source = "../../modules/kubernetes/ingress_factory"
+  namespace = kubernetes_namespace.website.metadata[0].name
+  name = "blog"
+  service_name = module.anubis.service_name
+  port = module.anubis.service_port
+  full_host = "viktorbarzin.me"
+  dns_type = "proxied"
+  tls_secret_name = var.tls_secret_name
+  anti_ai_scraping = false # Anubis is the gatekeeper now — drop the redundant ai-bot-block forwardAuth.
   extra_annotations = {
     "gethomepage.dev/enabled" = "true"
     "gethomepage.dev/name" = "Blog"
@@ -131,10 +143,12 @@ module "ingress" {
 }
 
 module "ingress-www" {
-  source = "../../modules/kubernetes/ingress_factory"
-  namespace = kubernetes_namespace.website.metadata[0].name
-  name = "blog-www"
-  service_name = "blog"
-  full_host = "www.viktorbarzin.me"
-  tls_secret_name = var.tls_secret_name
+  source = "../../modules/kubernetes/ingress_factory"
+  namespace = kubernetes_namespace.website.metadata[0].name
+  name = "blog-www"
+  service_name = module.anubis.service_name
+  port = module.anubis.service_port
+  full_host = "www.viktorbarzin.me"
+  tls_secret_name = var.tls_secret_name
+  anti_ai_scraping = false
 }
diff --git a/stacks/kms/main.tf b/stacks/kms/main.tf
index 9f8c2094..088bf441 100644
--- a/stacks/kms/main.tf
+++ b/stacks/kms/main.tf
@@ -103,12 +103,22 @@ resource "kubernetes_service" "kms-web-page" {
   }
 }
 
+module "anubis" {
+  source = "../../modules/kubernetes/anubis_instance"
+  name = "kms"
+  namespace =
kubernetes_namespace.kms.metadata[0].name
+  target_url = "http://${kubernetes_service.kms-web-page.metadata[0].name}.${kubernetes_namespace.kms.metadata[0].name}.svc.cluster.local"
+}
+
 module "ingress" {
-  source = "../../modules/kubernetes/ingress_factory"
-  dns_type = "non-proxied"
-  namespace = kubernetes_namespace.kms.metadata[0].name
-  name = "kms"
-  tls_secret_name = var.tls_secret_name
+  source = "../../modules/kubernetes/ingress_factory"
+  dns_type = "non-proxied"
+  namespace = kubernetes_namespace.kms.metadata[0].name
+  name = "kms"
+  service_name = module.anubis.service_name
+  port = module.anubis.service_port
+  tls_secret_name = var.tls_secret_name
+  anti_ai_scraping = false
   extra_annotations = {
     "gethomepage.dev/enabled" = "true"
     "gethomepage.dev/name" = "KMS"
diff --git a/stacks/traefik/modules/traefik/main.tf b/stacks/traefik/modules/traefik/main.tf
index 8be8b10a..4edea94b 100644
--- a/stacks/traefik/modules/traefik/main.tf
+++ b/stacks/traefik/modules/traefik/main.tf
@@ -314,9 +314,13 @@ resource "kubernetes_config_map" "bot_block_proxy_config" {
           ngx.req.clear_header("If-Unmodified-Since")
         }
         proxy_pass http://poison_fountain;
-        proxy_connect_timeout 3s;
-        proxy_read_timeout 5s;
-        proxy_send_timeout 5s;
+        # Tight timeouts: poison-fountain may be scaled to 0 (graveyard
+        # endpoints) — failing open in <200ms keeps the 68-ingress chain
+        # responsive instead of paying 3s per request. Healthy upstream
+        # responds in <50ms anyway.
+        proxy_connect_timeout 100ms;
+        proxy_read_timeout 200ms;
+        proxy_send_timeout 200ms;
         proxy_intercept_errors on;
         error_page 502 503 504 =200 /fallback-allow;
         proxy_set_header Host $host;
diff --git a/stacks/travel_blog/main.tf b/stacks/travel_blog/main.tf
index 26c9ae67..3b53b0ee 100644
--- a/stacks/travel_blog/main.tf
+++ b/stacks/travel_blog/main.tf
@@ -102,12 +102,21 @@ resource "kubernetes_service" "travel-blog" {
   }
 }
 
+module "anubis" {
+  source = "../../modules/kubernetes/anubis_instance"
+  name = "travel"
+  namespace = kubernetes_namespace.travel-blog.metadata[0].name
+  target_url = "http://${kubernetes_service.travel-blog.metadata[0].name}.${kubernetes_namespace.travel-blog.metadata[0].name}.svc.cluster.local"
+}
+
 module "ingress" {
-  source = "../../modules/kubernetes/ingress_factory"
-  namespace = kubernetes_namespace.travel-blog.metadata[0].name
-  name = "travel"
-  tls_secret_name = var.tls_secret_name
-  service_name = "travel-blog"
+  source = "../../modules/kubernetes/ingress_factory"
+  namespace = kubernetes_namespace.travel-blog.metadata[0].name
+  name = "travel"
+  tls_secret_name = var.tls_secret_name
+  service_name = module.anubis.service_name
+  port = module.anubis.service_port
+  anti_ai_scraping = false
   extra_annotations = {
     "gethomepage.dev/enabled" = "true"
     "gethomepage.dev/name" = "Travel Blog"