infra/scripts/tg: enforce ingress_factory auth-comment convention

Every `tg plan/apply/destroy/refresh` now runs `scripts/check-ingress-auth-comments.py` against the current stack before invoking terragrunt. The check fails closed if any `auth = "app"` or `auth = "none"` line in the stack's .tf files lacks an immediately-preceding `# auth = "<tier>": ...` comment documenting what gates the app (for "app") or why the endpoint is intentionally public (for "none").

Why tg-level (not git pre-commit): tg is the universal entry point for all infra changes. CI runs it, headless agents run it, humans run it. A pre-commit hook only catches the human path. Wiring the check into tg means the anti-exposure guard fires regardless of who or what is invoking terragrunt.

Stack-scoped: only the stack being acted on is checked, so each stack documents itself the next time tg runs against it. The 30+ existing `auth = "none"` stacks that predate this guard are not blocked from operating today; they'll need the comment added the next time someone runs `tg plan` on them. At that point the gate forces a conscious "yes, this is intentional" moment before any state change can land.

Skipped on every other verb (init, fmt, validate, output, etc.); only plan/apply/destroy/refresh are gated.
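To make the matching rule concrete, here is a minimal Python sketch that reuses the three regexes from `scripts/check-ingress-auth-comments.py` but scans an in-memory string instead of walking a stack directory. `undocumented_auth_lines` and the module names are hypothetical, and the HCL is trimmed down to the `auth` argument. It covers the three cases that matter: a matching tier comment passes, a comment naming the wrong tier does not count, and a blank line between the comment and `auth` breaks the block.

```python
import re

# The three patterns used by scripts/check-ingress-auth-comments.py.
AUTH_LINE = re.compile(r'^\s*auth\s*=\s*"(app|none)"\s*$')
COMMENT_LINE = re.compile(r'^\s*#')
COMMENT_TIER = re.compile(r'auth\s*=\s*"(app|none)"')


def undocumented_auth_lines(text):
    """Return 1-based line numbers of `auth = "app"|"none"` lines whose
    contiguous comment block above does not mention the same tier."""
    lines = text.splitlines()
    bad = []
    for i, line in enumerate(lines):
        m = AUTH_LINE.match(line)
        if not m:
            continue
        tier = m.group(1)
        documented = False
        j = i - 1
        # Walk upwards while lines are comments; a blank or code line ends the block.
        while j >= 0 and COMMENT_LINE.match(lines[j]):
            cm = COMMENT_TIER.search(lines[j])
            if cm and cm.group(1) == tier:
                documented = True
                break
            j -= 1
        if not documented:
            bad.append(i + 1)
    return bad


# Hypothetical, trimmed HCL covering pass, wrong-tier and blank-line cases.
STACK = '''\
module "immich_ingress" {
  # auth = "app": Immich's own login page gates every route
  auth = "app"
}

module "webhook_ingress" {
  # auth = "app": wrong tier named, so this does not count
  auth = "none"
}

module "caldav_ingress" {
  # auth = "none": native CalDAV clients cannot do the Authentik flow

  auth = "none"
}
'''

print(undocumented_auth_lines(STACK))  # -> [8, 14]
```

Any line in the contiguous comment block may carry the tier tag, so a longer explanation can span several comment lines as long as no blank or code line separates it from `auth`.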
2026-05-11 19:18:27 +00:00 · 2026-05-11 19:18:27 +00:00 · 9e5a5fb0c7
commit 9e5a5fb0c7
parent e826e36658
3 changed files with 149 additions and 1 deletion
--- a/.claude/CLAUDE.md
+++ b/.claude/CLAUDE.md
@@ -33,7 +33,7 @@ Violations cause state drift, which causes future applies to break or silently r
  - `auth = "app"` — the backend handles its own user authentication (NextAuth, Django, OAuth, bearer-token API, etc.); Authentik would only break it. No middleware attached; the app's own login is the gate. Examples: immich, linkwarden, tandoor, freshrss, affine, actualbudget, audiobookshelf, novelapp. **Functionally identical to `"none"`** — the distinct name exists to record intent at the call site.
  - `auth = "public"` — Authentik anonymous binding via the dedicated `public` outpost (routes via `traefik-authentik-forward-auth-public` → `ak-outpost-public.authentik.svc:9000`). Strangers auto-bound to `guest`; logged-in users keep their identity in `X-authentik-username`. **Only works for top-level browser navigation** — CORS preflight rejects XHR/fetch and automation can't replay the cookie dance. Audit trail, not a gate.
  - `auth = "none"` — no Authentik, no own-auth claim. Use for Anubis-fronted content (Anubis is the gate), native-client APIs (Git, `/v2/`, WebDAV/CalDAV, CardDAV), webhook receivers, OAuth callbacks, and Authentik outposts themselves.
-  - **Anti-exposure rule** (the reason `"app"` exists): only pick `"app"` or `"none"` AFTER you've verified the app has its own user auth (`"app"`) OR the endpoint is intentionally public (`"none"`). Default is `"required"` so accidental omission fails closed. **Convention**: when using `"app"` or `"none"`, add a comment line above the `auth = "..."` line stating what gates the app or why it's public.
+  - **Anti-exposure rule** (the reason `"app"` exists): only pick `"app"` or `"none"` AFTER you've verified the app has its own user auth (`"app"`) OR the endpoint is intentionally public (`"none"`). Default is `"required"` so accidental omission fails closed. **Convention**: when using `"app"` or `"none"`, add a comment line above the `auth = "..."` line stating what gates the app or why it's public. **Enforced by `scripts/tg`**: every `tg plan/apply/destroy/refresh` runs `scripts/check-ingress-auth-comments.py` against the current stack and aborts if any `auth = "app|none"` line lacks the preceding `# auth = "<tier>": ...` comment. Stack-scoped: other stacks' violations never block the current one, and a legacy stack is only blocked the next time tg plans or applies it.
  - **Anti-AI**: on by default when `auth = "none"` or `auth = "app"` (no Authentik to discourage bots); redundant on `"required"` and `"public"`.
  - **DNS**: `dns_type = "proxied"` (Cloudflare CDN) or `"non-proxied"` (direct A/AAAA). DNS records are auto-created — no need to edit `config.tfvars`. Smoke-test target: `echo.viktorbarzin.me` (auth=public, header-reflecting backend).
 - **Anubis PoW challenge** (`modules/kubernetes/anubis_instance/`): per-site reverse proxy that issues a 30-day JWT cookie after a tiny PoW solve. Use for **public, content-bearing sites without app-level auth** (blog, docs, wikis, static landing pages). Pattern: declare `module "anubis" { source = "../../modules/kubernetes/anubis_instance"; name = "X"; namespace = ...; target_url = "http://<backend>.<ns>.svc.cluster.local" }`, then in `ingress_factory` set `service_name = module.anubis.service_name`, `port = module.anubis.service_port`, `anti_ai_scraping = false`. Shared ed25519 key in Vault `secret/viktor` -> `anubis_ed25519_key`; cookie scoped to `viktorbarzin.me` so one solve covers all Anubis-fronted subdomains. **DO NOT put Anubis in front of Git/API/WebDAV/CLI endpoints** — clients without JS can't solve PoW. **Replicas default to 1** because Anubis stores in-flight challenges in process memory; a challenge issued by pod A and solved against pod B errors with `store: key not found` (HTTP 500). Bumping replicas requires wiring a shared Redis store (TODO). For path-level carve-outs (e.g. wrongmove has `/` behind Anubis but `/api` direct), declare a second `ingress_factory` with `ingress_path = ["/api"]` pointing at the bare backend service. Active on: blog, www, kms, travel, f1, cc, json, pb (privatebin), home (homepage), wrongmove (UI only). See `.claude/reference/patterns.md` "Anti-AI Scraping" for full layering.
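The four `auth` tiers described above reduce to a small lookup table. The following is a hypothetical Python model (`AuthTier`, `TIERS` and `resolve_tier` are illustrative names, not part of `ingress_factory`). Two details are assumptions: the `"required"` row's gate is inferred from it being the Authentik default, and "redundant" anti-AI on `"required"`/`"public"` is read as off by default.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AuthTier:
    gate: str            # what stops an anonymous visitor
    needs_comment: bool  # must carry a `# auth = "<tier>": ...` comment
    anti_ai_on: bool     # anti-AI default (False where CLAUDE.md says "redundant"; assumed)


TIERS = {
    # Default tier; "Authentik login" as its gate is an assumption.
    "required": AuthTier("Authentik login", False, False),
    "app": AuthTier("the backend's own login", True, True),
    "public": AuthTier("nothing (Authentik audit trail only)", False, False),
    "none": AuthTier("nothing, unless Anubis fronts it", True, True),
}


def resolve_tier(auth: Optional[str] = None) -> AuthTier:
    """Look up a tier; omitting `auth` falls back to "required" (fails closed)."""
    name = "required" if auth is None else auth
    if name not in TIERS:
        raise ValueError(f"unknown auth tier: {name!r}")
    return TIERS[name]


# Only the two tiers without any Authentik involvement need the comment.
print(sorted(t for t, spec in TIERS.items() if spec.needs_comment))  # -> ['app', 'none']
```

Keeping `needs_comment` and `anti_ai_on` in lockstep for `"app"` and `"none"` reflects the doc's point that the two are functionally identical; the distinct names only record intent.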
--- a/scripts/check-ingress-auth-comments.py
+++ b/scripts/check-ingress-auth-comments.py
@@ -0,0 +1,124 @@
+#!/usr/bin/env python3
+"""Enforce the inline-comment convention for ingress_factory auth tiers.
+
+Every `auth = "app"` or `auth = "none"` line under a stack must have an
+immediately-preceding comment block containing `# auth = "<tier>":`
+that documents what gates the app (for "app") or why the endpoint is
+intentionally public (for "none").
+
+This is the static guard for the anti-exposure rule documented in
+`infra/.claude/CLAUDE.md` "Auth" section. It's invoked by `scripts/tg`
+before every plan/apply/destroy/refresh, so it fires regardless of who
+or what is running terragrunt — local laptop, CI, headless agent.
+
+Stack-scoped by design: only checks the .tf files under the stack
+being acted on. Other stacks' historical violations don't block work
+on the current stack; each stack must document itself the next time
+tg plans or applies it.
+
+Usage:
+  check-ingress-auth-comments.py <stack-path>     # scan one stack
+  check-ingress-auth-comments.py --all            # scan every stack
+"""
+
+import argparse
+import os
+import re
+import sys
+
+AUTH_LINE = re.compile(r'^\s*auth\s*=\s*"(app|none)"\s*$')
+COMMENT_LINE = re.compile(r'^\s*#')
+COMMENT_TIER = re.compile(r'auth\s*=\s*"(app|none)"')
+
+
+def scan_dir(path):
+    violations = []
+    for root, _, files in os.walk(path):
+        for f in files:
+            if not f.endswith('.tf'):
+                continue
+            full = os.path.join(root, f)
+            try:
+                with open(full) as fh:
+                    lines = fh.readlines()
+            except OSError:
+                continue
+            for i, line in enumerate(lines):
+                m = AUTH_LINE.match(line)
+                if not m:
+                    continue
+                tier = m.group(1)
+                # Walk backwards through contiguous comment lines.
+                # Pass if ANY of them documents the matching tier.
+                ok = False
+                j = i - 1
+                while j >= 0 and COMMENT_LINE.match(lines[j]):
+                    cm = COMMENT_TIER.search(lines[j])
+                    if cm and cm.group(1) == tier:
+                        ok = True
+                        break
+                    j -= 1
+                if not ok:
+                    violations.append((full, i + 1, tier))
+    return violations
+
+
+def main():
+    ap = argparse.ArgumentParser(description=__doc__.splitlines()[0])
+    g = ap.add_mutually_exclusive_group(required=True)
+    g.add_argument('path', nargs='?', help='Stack directory to scan')
+    g.add_argument('--all', action='store_true', help='Scan every stack under stacks/')
+    args = ap.parse_args()
+
+    if args.all:
+        scan_paths = ['stacks']
+    else:
+        if not os.path.isdir(args.path):
+            print(f"ERROR: {args.path} is not a directory", file=sys.stderr)
+            sys.exit(2)
+        scan_paths = [args.path]
+
+    violations = []
+    for p in scan_paths:
+        violations.extend(scan_dir(p))
+
+    if not violations:
+        return
+
+    print(
+        "\n"
+        "==============================================================\n"
+        "ingress_factory auth-comment convention violated\n"
+        "==============================================================\n"
+        "\n"
+        "Every `auth = \"app\"` or `auth = \"none\"` line must have a\n"
+        "preceding comment line documenting what gates the app (for\n"
+        "\"app\") or why the endpoint is intentionally public (for\n"
+        "\"none\"). This guard prevents accidentally exposing private\n"
+        "services. See infra/.claude/CLAUDE.md Auth section.\n"
+        "\n"
+        "Add a comment line directly above the auth line:\n"
+        "\n"
+        "  # auth = \"app\":  <what gates the app, e.g. NextAuth + OAuth>\n"
+        "  auth = \"app\"\n"
+        "\n"
+        "or:\n"
+        "\n"
+        "  # auth = \"none\": <why public, e.g. webhook receiver, CalDAV>\n"
+        "  auth = \"none\"\n"
+        "\n"
+        "Violations:",
+        file=sys.stderr,
+    )
+    for path, line_no, tier in violations:
+        print(
+            f"  {path}:{line_no}: auth = \"{tier}\" missing preceding "
+            f"`# auth = \"{tier}\":` comment",
+            file=sys.stderr,
+        )
+    print(file=sys.stderr)
+    sys.exit(1)
+
+
+if __name__ == '__main__':
+    main()
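Running the script with `--all` (from the repo root, since it walks the relative `stacks` directory) lists every legacy violation at once. Below is a hypothetical helper that tallies that stderr report per stack, to show which pre-existing stacks will be blocked on their next `tg plan`. It relies only on the `  <path>:<line>: auth = "<tier>" missing preceding ...` line format printed by the script; `violations_per_stack` is an illustrative name and the sample report's line numbers are made up.

```python
import re
from collections import Counter
from pathlib import PurePosixPath

# One violation line as printed by check-ingress-auth-comments.py.
VIOLATION = re.compile(r'^\s+(.+?):(\d+): auth = "(app|none)" missing preceding')


def violations_per_stack(report):
    """Count violations per stack directory in a `--all` stderr report."""
    counts = Counter()
    for line in report.splitlines():
        m = VIOLATION.match(line)
        if not m:
            continue  # banner, instructions and blank lines
        parts = PurePosixPath(m.group(1)).parts
        # `--all` walks "stacks", so paths look like stacks/<stack>/<file>.tf.
        stack = parts[1] if len(parts) > 2 and parts[0] == "stacks" else m.group(1)
        counts[stack] += 1
    return counts


# Made-up sample report; real line numbers will differ.
REPORT = '''
ingress_factory auth-comment convention violated
Add a comment line directly above the auth line:
  # auth = "app":  <what gates the app, e.g. NextAuth + OAuth>
  auth = "app"
Violations:
  stacks/freshrss/main.tf:42: auth = "app" missing preceding `# auth = "app":` comment
  stacks/freshrss/main.tf:77: auth = "none" missing preceding `# auth = "none":` comment
  stacks/blog/main.tf:12: auth = "none" missing preceding `# auth = "none":` comment
'''

print(violations_per_stack(REPORT).most_common())  # -> [('freshrss', 2), ('blog', 1)]
```

In practice the report would come from `scripts/check-ingress-auth-comments.py --all 2> report.txt`, which exits 1 whenever violations exist.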
--- a/scripts/tg
+++ b/scripts/tg
@@ -102,6 +102,30 @@ for arg in "$@"; do
  esac
 done

+# Detect if this is a plan/apply/destroy/refresh, the verbs that plan or
+# change infra state. The cheap pre-flight check below scans only the current
+# stack's .tf files for the ingress_factory auth-comment convention. Other
+# tg verbs (init, fmt, validate, output, ...) skip the check.
+is_tf_op=false
+for arg in "$@"; do
+  case "$arg" in
+    plan|apply|destroy|refresh) is_tf_op=true ;;
+  esac
+done
+
+# Anti-exposure guard: every `auth = "app"` or `auth = "none"` in this stack
+# must have a preceding `# auth = "<tier>":` comment documenting what gates
+# the app or why the endpoint is intentionally public. See:
+# - infra/modules/kubernetes/ingress_factory/main.tf (variable description)
+# - infra/.claude/CLAUDE.md "Auth" section
+# Stack-scoped: other stacks' violations never block this one; a legacy
+# stack is blocked only when tg next plans/applies it, forcing the comment.
+if $is_tf_op && [ -n "$STACK_NAME" ]; then
+  if ! "$REPO_ROOT/scripts/check-ingress-auth-comments.py" "$REPO_ROOT/stacks/$STACK_NAME"; then
+    exit 1
+  fi
+fi
+
 # Acquire lock for mutating operations (Tier 0 only — Tier 1 uses pg_advisory_lock)
 if $is_mutating && [ -n "$STACK_NAME" ] && is_tier0 "$STACK_NAME"; then
  if command -v vault &>/dev/null && [ -n "${VAULT_TOKEN:-}" ]; then
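For testing the gating decision in isolation, here is a Python sketch that mirrors the bash added to `scripts/tg`. `GATED_VERBS`, `preflight` and `check_cmd` are hypothetical names, and the real wrapper remains the bash above. Like the bash loop, it gates on *any* argument equal to a gated verb, so `tg run-all plan` is checked too. In bash, a missing or non-executable checker returns 127/126, which `if ! ...` turns into `exit 1`. The sketch reproduces that fail-closed behaviour by treating `OSError` as a failure.

```python
import subprocess
import sys

# Verbs that trigger the check in scripts/tg; every other verb skips it.
GATED_VERBS = {"plan", "apply", "destroy", "refresh"}


def is_tf_op(argv):
    """Mirror of the bash loop: true if ANY argument is a gated verb."""
    return any(arg in GATED_VERBS for arg in argv)


def preflight(argv, stack_name, check_cmd):
    """Return the status tg would act on: 0 to continue, 1 to abort."""
    if not (is_tf_op(argv) and stack_name):
        return 0  # init, fmt, validate, output, ... are not gated
    try:
        result = subprocess.run(check_cmd)
    except OSError:
        # Missing or non-executable checker: fail closed, as bash does.
        return 1
    return 0 if result.returncode == 0 else 1


if __name__ == "__main__":
    # A stand-in checker that always reports a violation.
    failing = [sys.executable, "-c", "raise SystemExit(1)"]
    print(preflight(["plan"], "blog", failing))  # -> 1
```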