infra/scripts/tg: enforce ingress_factory auth-comment convention

Every `tg plan/apply/destroy/refresh` now runs
`scripts/check-ingress-auth-comments.py` against the current stack
before invoking terragrunt. The check fails closed if any
`auth = "app"` or `auth = "none"` line in the stack's .tf files lacks
an immediately-preceding `# auth = "<tier>": ...` comment documenting
what gates the app (for "app") or why the endpoint is intentionally
public (for "none").
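
To illustrate the rule, here is a minimal Python sketch that applies the checker's three patterns to an in-memory snippet instead of files on disk. The `find_violations` helper and the `SAMPLE` Terraform text are hypothetical and exist only to show which lines pass and which fail.

```python
import re

# Same patterns as scripts/check-ingress-auth-comments.py.
AUTH_LINE = re.compile(r'^\s*auth\s*=\s*"(app|none)"\s*$')
COMMENT_LINE = re.compile(r'^\s*#')
COMMENT_TIER = re.compile(r'auth\s*=\s*"(app|none)"')


def find_violations(text):
    """Return (line_number, tier) for each undocumented auth line.

    Hypothetical helper: the real script walks .tf files on disk.
    """
    lines = text.splitlines()
    violations = []
    for i, line in enumerate(lines):
        m = AUTH_LINE.match(line)
        if not m:
            continue
        tier = m.group(1)
        # Walk up through the contiguous comment block above the auth line.
        documented = False
        j = i - 1
        while j >= 0 and COMMENT_LINE.match(lines[j]):
            cm = COMMENT_TIER.search(lines[j])
            if cm and cm.group(1) == tier:
                documented = True
                break
            j -= 1
        if not documented:
            violations.append((i + 1, tier))
    return violations


# Hypothetical stack content: the first block is documented, the second
# has a comment that does not name the tier, so it is flagged.
SAMPLE = '''\
module "ingress_a" {
  # auth = "app": NextAuth login gates the UI
  auth = "app"
}
module "ingress_b" {
  # public webhook receiver
  auth = "none"
}
'''

print(find_violations(SAMPLE))  # → [(7, 'none')]
```

A free-form comment is not enough: the comment block must repeat the tier (`# auth = "none": ...`) so that changing the tier later also forces the justification to be revisited.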

Why tg-level (not git pre-commit): tg is the universal entry point
for all infra changes. CI runs it, headless agents run it, humans
run it. A pre-commit hook only catches the human path. Wiring the
check into tg means the anti-exposure guard fires regardless of who
or what is invoking terragrunt.

Stack-scoped: the check only scans the stack being acted on. The 30+
existing `auth = "none"` stacks that predate this guard keep running
as-is; they'll need the comment added the next time someone runs
`tg plan` (or apply/destroy/refresh) on them, at which point the gate
forces a conscious "yes, this is intentional" moment before any
state change can land.

Skipped on: init, fmt, validate, output, etc. — anything that doesn't
read or write infra state.
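
The verb gating can be sketched in Python as follows. This is an illustration only: the real logic is the bash loop in `scripts/tg`, and the names `STATE_VERBS` and `needs_auth_check` are hypothetical.

```python
# Verbs that trigger the pre-flight check, per the commit message.
STATE_VERBS = {"plan", "apply", "destroy", "refresh"}


def needs_auth_check(argv):
    """Return True if any argument is one of the state-touching verbs."""
    return any(arg in STATE_VERBS for arg in argv)


print(needs_auth_check(["apply", "-auto-approve"]))
print(needs_auth_check(["init"]))
```

As in the bash version, the match is on any argument rather than only the first one, so flags placed before the verb do not bypass the check.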
Viktor Barzin 2026-05-11 19:18:27 +00:00
parent e826e36658
commit 9e5a5fb0c7
3 changed files with 149 additions and 1 deletions

@@ -33,7 +33,7 @@ Violations cause state drift, which causes future applies to break or silently r
- `auth = "app"` — the backend handles its own user authentication (NextAuth, Django, OAuth, bearer-token API, etc.); Authentik would only break it. No middleware attached; the app's own login is the gate. Examples: immich, linkwarden, tandoor, freshrss, affine, actualbudget, audiobookshelf, novelapp. **Functionally identical to `"none"`** — the distinct name exists to record intent at the call site.
- `auth = "public"` — Authentik anonymous binding via the dedicated `public` outpost (routes via `traefik-authentik-forward-auth-public` -> `ak-outpost-public.authentik.svc:9000`). Strangers auto-bound to `guest`; logged-in users keep their identity in `X-authentik-username`. **Only works for top-level browser navigation** — CORS preflight rejects XHR/fetch and automation can't replay the cookie dance. Audit trail, not a gate.
- `auth = "none"` — no Authentik, no own-auth claim. Use for Anubis-fronted content (Anubis is the gate), native-client APIs (Git, `/v2/`, WebDAV/CalDAV, CardDAV), webhook receivers, OAuth callbacks, and Authentik outposts themselves.
- **Anti-exposure rule** (the reason `"app"` exists): only pick `"app"` or `"none"` AFTER you've verified the app has its own user auth (`"app"`) OR the endpoint is intentionally public (`"none"`). Default is `"required"` so accidental omission fails closed. **Convention**: when using `"app"` or `"none"`, add a comment line above the `auth = "..."` line stating what gates the app or why it's public.
- **Anti-exposure rule** (the reason `"app"` exists): only pick `"app"` or `"none"` AFTER you've verified the app has its own user auth (`"app"`) OR the endpoint is intentionally public (`"none"`). Default is `"required"` so accidental omission fails closed. **Convention**: when using `"app"` or `"none"`, add a comment line above the `auth = "..."` line stating what gates the app or why it's public. **Enforced by `scripts/tg`**: every `tg plan/apply/destroy/refresh` runs `scripts/check-ingress-auth-comments.py` against the current stack and aborts if any `auth = "app|none"` line lacks the preceding `# auth = "<tier>": ...` comment. Stack-scoped — untouched stacks aren't blocked until they're next edited.
- **Anti-AI**: on by default when `auth = "none"` or `auth = "app"` (no Authentik to discourage bots); redundant on `"required"` and `"public"`.
- **DNS**: `dns_type = "proxied"` (Cloudflare CDN) or `"non-proxied"` (direct A/AAAA). DNS records are auto-created — no need to edit `config.tfvars`. Smoke-test target: `echo.viktorbarzin.me` (auth=public, header-reflecting backend).
- **Anubis PoW challenge** (`modules/kubernetes/anubis_instance/`): per-site reverse proxy that issues a 30-day JWT cookie after a tiny PoW solve. Use for **public, content-bearing sites without app-level auth** (blog, docs, wikis, static landing pages). Pattern: declare `module "anubis" { source = "../../modules/kubernetes/anubis_instance"; name = "X"; namespace = ...; target_url = "http://<backend>.<ns>.svc.cluster.local" }`, then in `ingress_factory` set `service_name = module.anubis.service_name`, `port = module.anubis.service_port`, `anti_ai_scraping = false`. Shared ed25519 key in Vault `secret/viktor` -> `anubis_ed25519_key`; cookie scoped to `viktorbarzin.me` so one solve covers all Anubis-fronted subdomains. **DO NOT put Anubis in front of Git/API/WebDAV/CLI endpoints** — clients without JS can't solve PoW. **Replicas default to 1** because Anubis stores in-flight challenges in process memory; a challenge issued by pod A and solved against pod B errors with `store: key not found` (HTTP 500). Bumping replicas requires wiring a shared Redis store (TODO). For path-level carve-outs (e.g. wrongmove has `/` behind Anubis but `/api` direct), declare a second `ingress_factory` with `ingress_path = ["/api"]` pointing at the bare backend service. Active on: blog, www, kms, travel, f1, cc, json, pb (privatebin), home (homepage), wrongmove (UI only). See `.claude/reference/patterns.md` "Anti-AI Scraping" for full layering.

@@ -0,0 +1,124 @@
#!/usr/bin/env python3
"""Enforce the inline-comment convention for ingress_factory auth tiers.

Every `auth = "app"` or `auth = "none"` line under a stack must have an
immediately-preceding comment block containing `# auth = "<tier>":`
that documents what gates the app (for "app") or why the endpoint is
intentionally public (for "none").

This is the static guard for the anti-exposure rule documented in
`infra/.claude/CLAUDE.md` "Auth" section. It's invoked by `scripts/tg`
before every plan/apply/destroy/refresh, so it fires regardless of who
or what is running terragrunt: local laptop, CI, headless agent.

Stack-scoped by design: only checks the .tf files under the stack
being acted on. Other stacks' historical violations don't block work
on the current stack; each stack documents itself the next time it's
edited.

Usage:
    check-ingress-auth-comments.py <stack-path>   # scan one stack
    check-ingress-auth-comments.py --all          # scan every stack
"""

import argparse
import os
import re
import sys

AUTH_LINE = re.compile(r'^\s*auth\s*=\s*"(app|none)"\s*$')
COMMENT_LINE = re.compile(r'^\s*#')
COMMENT_TIER = re.compile(r'auth\s*=\s*"(app|none)"')


def scan_dir(path):
    violations = []
    for root, _, files in os.walk(path):
        for f in files:
            if not f.endswith('.tf'):
                continue
            full = os.path.join(root, f)
            try:
                with open(full) as fh:
                    lines = fh.readlines()
            except OSError:
                continue
            for i, line in enumerate(lines):
                m = AUTH_LINE.match(line)
                if not m:
                    continue
                tier = m.group(1)
                # Walk backwards through contiguous comment lines.
                # Pass if ANY of them documents the matching tier.
                ok = False
                j = i - 1
                while j >= 0 and COMMENT_LINE.match(lines[j]):
                    cm = COMMENT_TIER.search(lines[j])
                    if cm and cm.group(1) == tier:
                        ok = True
                        break
                    j -= 1
                if not ok:
                    violations.append((full, i + 1, tier))
    return violations


def main():
    ap = argparse.ArgumentParser(description=__doc__.splitlines()[0])
    g = ap.add_mutually_exclusive_group(required=True)
    g.add_argument('path', nargs='?', help='Stack directory to scan')
    g.add_argument('--all', action='store_true', help='Scan every stack under stacks/')
    args = ap.parse_args()

    if args.all:
        scan_paths = ['stacks']
    else:
        if not os.path.isdir(args.path):
            print(f"ERROR: {args.path} is not a directory", file=sys.stderr)
            sys.exit(2)
        scan_paths = [args.path]

    violations = []
    for p in scan_paths:
        violations.extend(scan_dir(p))

    if not violations:
        return

    print(
        "\n"
        "==============================================================\n"
        "ingress_factory auth-comment convention violated\n"
        "==============================================================\n"
        "\n"
        "Every `auth = \"app\"` or `auth = \"none\"` line must have a\n"
        "preceding comment line documenting what gates the app (for\n"
        "\"app\") or why the endpoint is intentionally public (for\n"
        "\"none\"). This guard prevents accidentally exposing private\n"
        "services. See infra/.claude/CLAUDE.md Auth section.\n"
        "\n"
        "Add a comment line directly above the auth line:\n"
        "\n"
        "  # auth = \"app\": <what gates the app, e.g. NextAuth + OAuth>\n"
        "  auth = \"app\"\n"
        "\n"
        "or:\n"
        "\n"
        "  # auth = \"none\": <why public, e.g. webhook receiver, CalDAV>\n"
        "  auth = \"none\"\n"
        "\n"
        "Violations:",
        file=sys.stderr,
    )
    for path, line_no, tier in violations:
        print(
            f"  {path}:{line_no}: auth = \"{tier}\" missing preceding "
            f"`# auth = \"{tier}\":` comment",
            file=sys.stderr,
        )
    print(file=sys.stderr)
    sys.exit(1)


if __name__ == '__main__':
    main()

@@ -102,6 +102,30 @@ for arg in "$@"; do
  esac
done

# Detect if this is a plan/apply/destroy/refresh — anything that reads or
# writes infra state. Cheap pre-flight check below scans only the current
# stack's .tf files for the ingress_factory auth-comment convention. Other
# tg verbs (init, fmt, validate) skip the check.
is_tf_op=false
for arg in "$@"; do
  case "$arg" in
    plan|apply|destroy|refresh) is_tf_op=true ;;
  esac
done

# Anti-exposure guard: every `auth = "app"` or `auth = "none"` in this stack
# must have a preceding `# auth = "<tier>":` comment documenting what gates
# the app or why the endpoint is intentionally public. See:
# - infra/modules/kubernetes/ingress_factory/main.tf (variable description)
# - infra/.claude/CLAUDE.md "Auth" section
# Stack-scoped: untouched stacks aren't blocked from future applies until
# they're actually edited, at which point the convention applies.
if $is_tf_op && [ -n "$STACK_NAME" ]; then
  if ! "$REPO_ROOT/scripts/check-ingress-auth-comments.py" "$REPO_ROOT/stacks/$STACK_NAME"; then
    exit 1
  fi
fi

# Acquire lock for mutating operations (Tier 0 only — Tier 1 uses pg_advisory_lock)
if $is_mutating && [ -n "$STACK_NAME" ] && is_tier0 "$STACK_NAME"; then
  if command -v vault &>/dev/null && [ -n "${VAULT_TOKEN:-}" ]; then