From 9e5a5fb0c7c7cfd73b51af05fcf1853568f24e6d Mon Sep 17 00:00:00 2001
From: Viktor Barzin
Date: Mon, 11 May 2026 19:18:27 +0000
Subject: [PATCH] infra/scripts/tg: enforce ingress_factory auth-comment convention
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Every `tg plan/apply/destroy/refresh` now runs
`scripts/check-ingress-auth-comments.py` against the current stack
before invoking terragrunt. The check fails closed if any
`auth = "app"` or `auth = "none"` line in the stack's .tf files lacks
an immediately-preceding `# auth = "<tier>": ...` comment documenting
what gates the app (for "app") or why the endpoint is intentionally
public (for "none").

Why tg-level (not git pre-commit): tg is the universal entry point for
all infra changes. CI runs it, headless agents run it, humans run it.
A pre-commit hook only catches the human path. Wiring the check into
tg means the anti-exposure guard fires regardless of who or what is
invoking terragrunt.

Stack-scoped: only the stack being acted on is scanned. The 30+
existing `auth = "none"` stacks that predate this guard keep running
as they are today; the comment has to be added the next time someone
runs `tg plan` (or apply/destroy/refresh) on one of them, at which
point the gate forces a conscious "yes, this is intentional" moment
before any state change can land.

Skipped on every other verb (init, fmt, validate, output, etc.), since
those do not plan or mutate infra.
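
For reviewers, a minimal sketch of the rule the checker enforces, run
against an in-memory snippet instead of a stack directory. The three
regexes are copied from the script in this patch; the helper name
`find_undocumented_auth` and the sample HCL are hypothetical and exist
only for illustration.

```python
import re

# Patterns copied from scripts/check-ingress-auth-comments.py in this patch.
AUTH_LINE = re.compile(r'^\s*auth\s*=\s*"(app|none)"\s*$')
COMMENT_LINE = re.compile(r'^\s*#')
COMMENT_TIER = re.compile(r'auth\s*=\s*"(app|none)"')


def find_undocumented_auth(lines):
    """Return (line_number, tier) for each auth line lacking a matching comment.

    Hypothetical helper: the real script walks .tf files on disk.
    """
    violations = []
    for i, line in enumerate(lines):
        m = AUTH_LINE.match(line)
        if not m:
            continue
        tier = m.group(1)
        documented = False
        j = i - 1
        # Walk up through the contiguous comment block directly above the
        # auth line; any comment naming the same tier satisfies the rule.
        while j >= 0 and COMMENT_LINE.match(lines[j]):
            cm = COMMENT_TIER.search(lines[j])
            if cm and cm.group(1) == tier:
                documented = True
                break
            j -= 1
        if not documented:
            violations.append((i + 1, tier))
    return violations


# Made-up stack content: one documented line, one bare line, and one
# whose comment names a different tier than the auth line below it.
SAMPLE = '''\
module "ingress_ok" {
  # auth = "app": the backend enforces its own login
  auth = "app"
}
module "ingress_bad" {
  auth = "none"
}
module "ingress_mismatch" {
  # auth = "app": comment names the wrong tier
  auth = "none"
}
'''.splitlines()

print(find_undocumented_auth(SAMPLE))  # prints [(6, 'none'), (10, 'none')]
```

The third module shows that a comment only counts when it names the
same tier as the `auth` line beneath it, so switching a documented
"app" ingress to "none" trips the guard again.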
---
 .claude/CLAUDE.md                      |   2 +-
 scripts/check-ingress-auth-comments.py | 124 +++++++++++++++++++++++++
 scripts/tg                             |  24 +++++
 3 files changed, 149 insertions(+), 1 deletion(-)
 create mode 100755 scripts/check-ingress-auth-comments.py

diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md
index a4541789..7e520085 100755
--- a/.claude/CLAUDE.md
+++ b/.claude/CLAUDE.md
@@ -33,7 +33,7 @@ Violations cause state drift, which causes future applies to break or silently r
   - `auth = "app"` — the backend handles its own user authentication (NextAuth, Django, OAuth, bearer-token API, etc.); Authentik would only break it. No middleware attached; the app's own login is the gate. Examples: immich, linkwarden, tandoor, freshrss, affine, actualbudget, audiobookshelf, novelapp. **Functionally identical to `"none"`** — the distinct name exists to record intent at the call site.
   - `auth = "public"` — Authentik anonymous binding via the dedicated `public` outpost (routes via `traefik-authentik-forward-auth-public` → `ak-outpost-public.authentik.svc:9000`). Strangers auto-bound to `guest`; logged-in users keep their identity in `X-authentik-username`. **Only works for top-level browser navigation** — CORS preflight rejects XHR/fetch and automation can't replay the cookie dance. Audit trail, not a gate.
   - `auth = "none"` — no Authentik, no own-auth claim. Use for Anubis-fronted content (Anubis is the gate), native-client APIs (Git, `/v2/`, WebDAV/CalDAV, CardDAV), webhook receivers, OAuth callbacks, and Authentik outposts themselves.
-  - **Anti-exposure rule** (the reason `"app"` exists): only pick `"app"` or `"none"` AFTER you've verified the app has its own user auth (`"app"`) OR the endpoint is intentionally public (`"none"`). Default is `"required"` so accidental omission fails closed. **Convention**: when using `"app"` or `"none"`, add a comment line above the `auth = "..."` line stating what gates the app or why it's public.
+  - **Anti-exposure rule** (the reason `"app"` exists): only pick `"app"` or `"none"` AFTER you've verified the app has its own user auth (`"app"`) OR the endpoint is intentionally public (`"none"`). Default is `"required"` so accidental omission fails closed. **Convention**: when using `"app"` or `"none"`, add a comment line above the `auth = "..."` line stating what gates the app or why it's public. **Enforced by `scripts/tg`**: every `tg plan/apply/destroy/refresh` runs `scripts/check-ingress-auth-comments.py` against the current stack and aborts if any `auth = "app"` or `auth = "none"` line lacks the preceding `# auth = "<tier>": ...` comment. Stack-scoped: other stacks aren't blocked until one of these verbs is next run against them.
   - **Anti-AI**: on by default when `auth = "none"` or `auth = "app"` (no Authentik to discourage bots); redundant on `"required"` and `"public"`.
   - **DNS**: `dns_type = "proxied"` (Cloudflare CDN) or `"non-proxied"` (direct A/AAAA). DNS records are auto-created — no need to edit `config.tfvars`. Smoke-test target: `echo.viktorbarzin.me` (auth=public, header-reflecting backend).
   - **Anubis PoW challenge** (`modules/kubernetes/anubis_instance/`): per-site reverse proxy that issues a 30-day JWT cookie after a tiny PoW solve. Use for **public, content-bearing sites without app-level auth** (blog, docs, wikis, static landing pages). Pattern: declare `module "anubis" { source = "../../modules/kubernetes/anubis_instance"; name = "X"; namespace = ...; target_url = "http://..svc.cluster.local" }`, then in `ingress_factory` set `service_name = module.anubis.service_name`, `port = module.anubis.service_port`, `anti_ai_scraping = false`. Shared ed25519 key in Vault `secret/viktor` -> `anubis_ed25519_key`; cookie scoped to `viktorbarzin.me` so one solve covers all Anubis-fronted subdomains. **DO NOT put Anubis in front of Git/API/WebDAV/CLI endpoints** — clients without JS can't solve PoW. **Replicas default to 1** because Anubis stores in-flight challenges in process memory; a challenge issued by pod A and solved against pod B errors with `store: key not found` (HTTP 500). Bumping replicas requires wiring a shared Redis store (TODO). For path-level carve-outs (e.g. wrongmove has `/` behind Anubis but `/api` direct), declare a second `ingress_factory` with `ingress_path = ["/api"]` pointing at the bare backend service. Active on: blog, www, kms, travel, f1, cc, json, pb (privatebin), home (homepage), wrongmove (UI only). See `.claude/reference/patterns.md` "Anti-AI Scraping" for full layering.
diff --git a/scripts/check-ingress-auth-comments.py b/scripts/check-ingress-auth-comments.py
new file mode 100755
index 00000000..263d6aa0
--- /dev/null
+++ b/scripts/check-ingress-auth-comments.py
@@ -0,0 +1,124 @@
+#!/usr/bin/env python3
+"""Enforce the inline-comment convention for ingress_factory auth tiers.
+
+Every `auth = "app"` or `auth = "none"` line under a stack must have an
+immediately-preceding comment block containing `# auth = "<tier>":`
+that documents what gates the app (for "app") or why the endpoint is
+intentionally public (for "none").
+
+This is the static guard for the anti-exposure rule documented in
+`infra/.claude/CLAUDE.md` "Auth" section. It's invoked by `scripts/tg`
+before every plan/apply/destroy/refresh, so it fires regardless of who
+or what is running terragrunt: local laptop, CI, headless agent.
+
+Stack-scoped by design: only checks the .tf files under the stack
+being acted on. Other stacks' historical violations don't block work
+on the current stack; each stack documents itself the next time it's
+edited.
+
+Usage:
+    check-ingress-auth-comments.py <path>    # scan one stack
+    check-ingress-auth-comments.py --all     # scan every stack
+"""
+
+import argparse
+import os
+import re
+import sys
+
+AUTH_LINE = re.compile(r'^\s*auth\s*=\s*"(app|none)"\s*$')
+COMMENT_LINE = re.compile(r'^\s*#')
+COMMENT_TIER = re.compile(r'auth\s*=\s*"(app|none)"')
+
+
+def scan_dir(path):
+    violations = []
+    for root, _, files in os.walk(path):
+        for f in files:
+            if not f.endswith('.tf'):
+                continue
+            full = os.path.join(root, f)
+            try:
+                with open(full) as fh:
+                    lines = fh.readlines()
+            except OSError:
+                continue
+            for i, line in enumerate(lines):
+                m = AUTH_LINE.match(line)
+                if not m:
+                    continue
+                tier = m.group(1)
+                # Walk backwards through contiguous comment lines.
+                # Pass if ANY of them documents the matching tier.
+                ok = False
+                j = i - 1
+                while j >= 0 and COMMENT_LINE.match(lines[j]):
+                    cm = COMMENT_TIER.search(lines[j])
+                    if cm and cm.group(1) == tier:
+                        ok = True
+                        break
+                    j -= 1
+                if not ok:
+                    violations.append((full, i + 1, tier))
+    return violations
+
+
+def main():
+    ap = argparse.ArgumentParser(description=__doc__.splitlines()[0])
+    g = ap.add_mutually_exclusive_group(required=True)
+    g.add_argument('path', nargs='?', help='Stack directory to scan')
+    g.add_argument('--all', action='store_true', help='Scan every stack under stacks/')
+    args = ap.parse_args()
+
+    if args.all:
+        scan_paths = ['stacks']
+    else:
+        if not os.path.isdir(args.path):
+            print(f"ERROR: {args.path} is not a directory", file=sys.stderr)
+            sys.exit(2)
+        scan_paths = [args.path]
+
+    violations = []
+    for p in scan_paths:
+        violations.extend(scan_dir(p))
+
+    if not violations:
+        return
+
+    print(
+        "\n"
+        "==============================================================\n"
+        "ingress_factory auth-comment convention violated\n"
+        "==============================================================\n"
+        "\n"
+        "Every `auth = \"app\"` or `auth = \"none\"` line must have a\n"
+        "preceding comment line documenting what gates the app (for\n"
+        "\"app\") or why the endpoint is intentionally public (for\n"
+        "\"none\"). This guard prevents accidentally exposing private\n"
+        "services. See infra/.claude/CLAUDE.md Auth section.\n"
+        "\n"
+        "Add a comment line directly above the auth line:\n"
+        "\n"
+        "    # auth = \"app\": \n"
+        "    auth = \"app\"\n"
+        "\n"
+        "or:\n"
+        "\n"
+        "    # auth = \"none\": \n"
+        "    auth = \"none\"\n"
+        "\n"
+        "Violations:",
+        file=sys.stderr,
+    )
+    for path, line_no, tier in violations:
+        print(
+            f"  {path}:{line_no}: auth = \"{tier}\" missing preceding "
+            f"`# auth = \"{tier}\":` comment",
+            file=sys.stderr,
+        )
+    print(file=sys.stderr)
+    sys.exit(1)
+
+
+if __name__ == '__main__':
+    main()
diff --git a/scripts/tg b/scripts/tg
index 15cea845..b9e9f0da 100755
--- a/scripts/tg
+++ b/scripts/tg
@@ -102,6 +102,30 @@ for arg in "$@"; do
   esac
 done
 
+# Detect if this is a plan/apply/destroy/refresh, i.e. a verb that plans or
+# changes infra state. The cheap pre-flight check below scans only the current
+# stack's .tf files for the ingress_factory auth-comment convention. Other
+# tg verbs (init, fmt, validate) skip the check.
+is_tf_op=false
+for arg in "$@"; do
+  case "$arg" in
+    plan|apply|destroy|refresh) is_tf_op=true ;;
+  esac
+done
+
+# Anti-exposure guard: every `auth = "app"` or `auth = "none"` in this stack
+# must have a preceding `# auth = "<tier>":` comment documenting what gates
+# the app or why the endpoint is intentionally public. See:
+#   - infra/modules/kubernetes/ingress_factory/main.tf (variable description)
+#   - infra/.claude/CLAUDE.md "Auth" section
+# Stack-scoped: only this stack is scanned. Other stacks are not blocked
+# until one of these verbs is next run against them.
+if $is_tf_op && [ -n "$STACK_NAME" ]; then
+  if ! "$REPO_ROOT/scripts/check-ingress-auth-comments.py" "$REPO_ROOT/stacks/$STACK_NAME"; then
+    exit 1
+  fi
+fi
+
 # Acquire lock for mutating operations (Tier 0 only — Tier 1 uses pg_advisory_lock)
 if $is_mutating && [ -n "$STACK_NAME" ] && is_tier0 "$STACK_NAME"; then
   if command -v vault &>/dev/null && [ -n "${VAULT_TOKEN:-}" ]; then