2026-04-16 13:45:04 +00:00
terraform {
required_providers {
cloudflare = {
source = " cloudflare/cloudflare "
version = " ~> 4 "
}
kubernetes = {
source = " hashicorp/kubernetes "
}
}
}
2025-01-14 20:12:26 +00:00
2025-01-14 22:53:04 +00:00
variable " name " { type = string }
variable " service_name " {
type = string
default = null # defaults to name
}
variable " host " {
type = string
default = null
}
2025-01-14 20:12:26 +00:00
variable " namespace " { type = string }
variable " external_name " {
type = string
default = null
}
variable " port " {
default = " 80 "
}
variable " tls_secret_name " { }
variable " backend_protocol " {
default = " HTTP "
}
ingress_factory: replace `protected` bool with `auth` enum + audit pass across 100 stacks
Phase 3+4 of default-deny ingress plan. Replaces the `protected = bool` (default
false → unprotected) variable in `modules/kubernetes/ingress_factory` with
`auth = string` enum (default "required" → fail-closed). Touches every
ingress_factory caller so the audit decision is recorded explicitly in code.
ingress_factory (Phase 3):
- `auth = "required"`: standard Authentik forward-auth (the legacy
`protected = true` semantic).
- `auth = "public"`: forward-auth via the new `authentik-forward-auth-public`
middleware → dedicated public outpost → guest auto-bind. Logged-in users
keep their real identity.
- `auth = "none"`: no Authentik middleware. For Anubis-fronted content, native
client APIs (Git, /v2/, WebDAV), webhook receivers, the Authentik outpost
itself.
- `effective_anti_ai` default flips ON only when `auth = "none"` (auth-gated
ingresses don't need anti-AI noise; the auth flow already discourages bots).
Audit pass (Phase 4) across 96 ingress_factory call sites:
- 49 explicit `protected = true` → `auth = "required"`
- 8 explicit `protected = false` → `auth = "none"` (5) or `auth = "public"` (3)
- 64 previously-default (no protected line) → `auth = "required"` ADDED, then
reviewed individually:
* 9 Anubis-fronted (blog, www, kms, travel, f1, cyberchef, jsoncrack,
homepage, wrongmove UI, privatebin) → `auth = "none"`
* 22 native-client / programmatic surfaces (Forgejo Git+/v2/, webhook
handler, claude-memory MCP, Nextcloud WebDAV, Matrix, Vault CLI/OIDC,
xray VPN, ntfy, woodpecker webhooks, n8n triggers, ntfy push, dawarich
location ingestion, immich frame kiosk, headscale CP, send anonymous
drops, rybbit beacon, vaultwarden API, Authentik UI itself + outposts) →
`auth = "none"`
* Remaining ~33 → `auth = "required"` confirmed (admin tools, internal
UIs, services without app-level auth)
- Smoke-test promotions to `auth = "public"`: fire-planner public UI,
k8s-portal API, insta2spotify callback.
Three call sites in wrapper modules (`stacks/freedify/factory/`,
`stacks/reverse-proxy/modules/reverse_proxy/`) keep their internal `protected`
bool — they translate to `auth` internally, out of scope for this rename.
Behavior change: previously-default ingresses now fail closed (require
Authentik login) unless explicitly flipped to `auth = "none"` or
`auth = "public"`. This is the audit goal — no more accidentally-unprotected
surfaces. Sites that were intentionally public (Anubis content, native APIs,
webhooks) are now explicitly recorded as `auth = "none"`.
Drive-by: `modules/create-vm/main.tf` picked up cosmetic alignment via
`terraform fmt -recursive` during the audit. Behavior-neutral.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 18:53:49 +00:00
variable " auth " {
type = string
default = " required "
description = < < - EOT
infra/ingress_factory: add auth = "app" mode for self-authed backends
Adds a fourth auth tier alongside required/public/none. "app" is
functionally identical to "none" — no Authentik middleware attached —
but the distinct name records intent at the call site: this backend
has its own user login (NextAuth, Django, OAuth, bearer-token API,
etc.) and Authentik would only break it.
Why the new tier: with only required/none, every "the app has its
own auth so drop Authentik" decision looked identical at the call
site to "this is an OAuth callback / webhook receiver / native-client
API". Future readers couldn't tell whether a stack was intentionally
unauthenticated or relying on backend auth. Now they can.
Migrates the 8 stacks flipped earlier this session (novelapp, immich,
linkwarden, tandoor, freshrss, affine, actualbudget, ebooks/audiobookshelf)
from "none" to "app". Confirmed no-op: `tg plan` on novelapp showed
"No changes" — same middleware chain, same live state.
The variable description and the .claude/CLAUDE.md Auth section now
spell out the anti-exposure rule: only pick "app" or "none" AFTER
verifying the app has its own user auth ("app") or the endpoint is
intentionally public ("none"). Default stays "required" so accidental
omission fails closed.
[ci skip]
2026-05-11 18:59:11 +00:00
Auth posture for this ingress . Pick by asking " what gates the app? " :
* " required " ( default , fail - closed ) : Authentik forward - auth gates every
request . Pick this when the backend has NO built - in user auth and
Authentik is the only thing standing between strangers and the app .
Examples : prowlarr , qbittorrent , netbox , phpipam , k8s - dashboard , any
admin UI shipped without its own login .
* " app " : the backend handles its own user authentication ( NextAuth ,
Django sessions , OAuth , bearer - token API , etc . ) and Authentik would
only get in the way . No Authentik middleware is attached ; the app ' s
own login is the gate . Examples : immich , linkwarden , tandoor ,
freshrss , affine , actualbudget , audiobookshelf , novelapp .
* * Functionally identical to " none " * * — the distinct name exists to
record intent at the call site so future readers don ' t have to guess .
* " public " : Authentik anonymous binding via the ` public ` outpost .
Strangers are auto - bound to the ` guest ` Authentik user ; logged - in
users keep their identity in X - authentik - username . Only works for
top - level browser navigation — CORS preflight rejects XHR / fetch and
automation can ' t replay the cookie dance . Audit trail , not a gate .
* " none " : no Authentik middleware , no own - auth claim — explicitly
public or unauthenticated - by - design . Use for : Anubis - fronted content
sites ( where Anubis is the gate ) , native - client APIs that auth
themselves ( Git , / v2 / , WebDAV / CalDAV , CardDAV ) , webhook receivers ,
OAuth callbacks , and Authentik outposts themselves .
* * Anti - exposure rule * * ( the reason " app " exists as a distinct mode ) :
only pick " app " or " none " AFTER you have verified the app has its own
user auth ( for " app " ) OR the endpoint is intentionally public ( for
" none " ) . Picking either of these on a naked admin UI exposes it to the
internet . The default is " required " specifically so accidental omission
fails closed .
* * Convention * * : when using " app " or " none " , add a comment line above
the ` auth = " ... " ` line stating what gates the app or why it ' s public .
Future - you reads the call site , not the module description .
ingress_factory: replace `protected` bool with `auth` enum + audit pass across 100 stacks
Phase 3+4 of default-deny ingress plan. Replaces the `protected = bool` (default
false → unprotected) variable in `modules/kubernetes/ingress_factory` with
`auth = string` enum (default "required" → fail-closed). Touches every
ingress_factory caller so the audit decision is recorded explicitly in code.
ingress_factory (Phase 3):
- `auth = "required"`: standard Authentik forward-auth (the legacy
`protected = true` semantic).
- `auth = "public"`: forward-auth via the new `authentik-forward-auth-public`
middleware → dedicated public outpost → guest auto-bind. Logged-in users
keep their real identity.
- `auth = "none"`: no Authentik middleware. For Anubis-fronted content, native
client APIs (Git, /v2/, WebDAV), webhook receivers, the Authentik outpost
itself.
- `effective_anti_ai` default flips ON only when `auth = "none"` (auth-gated
ingresses don't need anti-AI noise; the auth flow already discourages bots).
Audit pass (Phase 4) across 96 ingress_factory call sites:
- 49 explicit `protected = true` → `auth = "required"`
- 8 explicit `protected = false` → `auth = "none"` (5) or `auth = "public"` (3)
- 64 previously-default (no protected line) → `auth = "required"` ADDED, then
reviewed individually:
* 9 Anubis-fronted (blog, www, kms, travel, f1, cyberchef, jsoncrack,
homepage, wrongmove UI, privatebin) → `auth = "none"`
* 22 native-client / programmatic surfaces (Forgejo Git+/v2/, webhook
handler, claude-memory MCP, Nextcloud WebDAV, Matrix, Vault CLI/OIDC,
xray VPN, ntfy, woodpecker webhooks, n8n triggers, ntfy push, dawarich
location ingestion, immich frame kiosk, headscale CP, send anonymous
drops, rybbit beacon, vaultwarden API, Authentik UI itself + outposts) →
`auth = "none"`
* Remaining ~33 → `auth = "required"` confirmed (admin tools, internal
UIs, services without app-level auth)
- Smoke-test promotions to `auth = "public"`: fire-planner public UI,
k8s-portal API, insta2spotify callback.
Three call sites in wrapper modules (`stacks/freedify/factory/`,
`stacks/reverse-proxy/modules/reverse_proxy/`) keep their internal `protected`
bool — they translate to `auth` internally, out of scope for this rename.
Behavior change: previously-default ingresses now fail closed (require
Authentik login) unless explicitly flipped to `auth = "none"` or
`auth = "public"`. This is the audit goal — no more accidentally-unprotected
surfaces. Sites that were intentionally public (Anubis content, native APIs,
webhooks) are now explicitly recorded as `auth = "none"`.
Drive-by: `modules/create-vm/main.tf` picked up cosmetic alignment via
`terraform fmt -recursive` during the audit. Behavior-neutral.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 18:53:49 +00:00
EOT
validation {
infra/ingress_factory: add auth = "app" mode for self-authed backends
Adds a fourth auth tier alongside required/public/none. "app" is
functionally identical to "none" — no Authentik middleware attached —
but the distinct name records intent at the call site: this backend
has its own user login (NextAuth, Django, OAuth, bearer-token API,
etc.) and Authentik would only break it.
Why the new tier: with only required/none, every "the app has its
own auth so drop Authentik" decision looked identical at the call
site to "this is an OAuth callback / webhook receiver / native-client
API". Future readers couldn't tell whether a stack was intentionally
unauthenticated or relying on backend auth. Now they can.
Migrates the 8 stacks flipped earlier this session (novelapp, immich,
linkwarden, tandoor, freshrss, affine, actualbudget, ebooks/audiobookshelf)
from "none" to "app". Confirmed no-op: `tg plan` on novelapp showed
"No changes" — same middleware chain, same live state.
The variable description and the .claude/CLAUDE.md Auth section now
spell out the anti-exposure rule: only pick "app" or "none" AFTER
verifying the app has its own user auth ("app") or the endpoint is
intentionally public ("none"). Default stays "required" so accidental
omission fails closed.
[ci skip]
2026-05-11 18:59:11 +00:00
condition = contains ( [ " required " , " app " , " public " , " none " ] , var . auth )
error_message = " auth must be one of: required, app, public, none. "
ingress_factory: replace `protected` bool with `auth` enum + audit pass across 100 stacks
Phase 3+4 of default-deny ingress plan. Replaces the `protected = bool` (default
false → unprotected) variable in `modules/kubernetes/ingress_factory` with
`auth = string` enum (default "required" → fail-closed). Touches every
ingress_factory caller so the audit decision is recorded explicitly in code.
ingress_factory (Phase 3):
- `auth = "required"`: standard Authentik forward-auth (the legacy
`protected = true` semantic).
- `auth = "public"`: forward-auth via the new `authentik-forward-auth-public`
middleware → dedicated public outpost → guest auto-bind. Logged-in users
keep their real identity.
- `auth = "none"`: no Authentik middleware. For Anubis-fronted content, native
client APIs (Git, /v2/, WebDAV), webhook receivers, the Authentik outpost
itself.
- `effective_anti_ai` default flips ON only when `auth = "none"` (auth-gated
ingresses don't need anti-AI noise; the auth flow already discourages bots).
Audit pass (Phase 4) across 96 ingress_factory call sites:
- 49 explicit `protected = true` → `auth = "required"`
- 8 explicit `protected = false` → `auth = "none"` (5) or `auth = "public"` (3)
- 64 previously-default (no protected line) → `auth = "required"` ADDED, then
reviewed individually:
* 9 Anubis-fronted (blog, www, kms, travel, f1, cyberchef, jsoncrack,
homepage, wrongmove UI, privatebin) → `auth = "none"`
* 22 native-client / programmatic surfaces (Forgejo Git+/v2/, webhook
handler, claude-memory MCP, Nextcloud WebDAV, Matrix, Vault CLI/OIDC,
xray VPN, ntfy, woodpecker webhooks, n8n triggers, ntfy push, dawarich
location ingestion, immich frame kiosk, headscale CP, send anonymous
drops, rybbit beacon, vaultwarden API, Authentik UI itself + outposts) →
`auth = "none"`
* Remaining ~33 → `auth = "required"` confirmed (admin tools, internal
UIs, services without app-level auth)
- Smoke-test promotions to `auth = "public"`: fire-planner public UI,
k8s-portal API, insta2spotify callback.
Three call sites in wrapper modules (`stacks/freedify/factory/`,
`stacks/reverse-proxy/modules/reverse_proxy/`) keep their internal `protected`
bool — they translate to `auth` internally, out of scope for this rename.
Behavior change: previously-default ingresses now fail closed (require
Authentik login) unless explicitly flipped to `auth = "none"` or
`auth = "public"`. This is the audit goal — no more accidentally-unprotected
surfaces. Sites that were intentionally public (Anubis content, native APIs,
webhooks) are now explicitly recorded as `auth = "none"`.
Drive-by: `modules/create-vm/main.tf` picked up cosmetic alignment via
`terraform fmt -recursive` during the audit. Behavior-neutral.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 18:53:49 +00:00
}
2025-01-14 20:12:26 +00:00
}
variable " ingress_path " {
type = list ( string )
default = [ " / " ]
}
variable " max_body_size " {
2026-05-07 15:51:34 +00:00
type = string
default = null
description = " Maximum request body size, e.g. '5g'. null = no limit (Traefik default). When set, a per-ingress Buffering middleware is created and attached. "
2025-01-14 20:12:26 +00:00
}
variable " extra_annotations " {
default = { }
}
2025-12-14 09:47:52 +00:00
variable " ssl_redirect " {
default = true
type = bool
}
variable " allow_local_access_only " {
default = false
type = bool
}
variable " root_domain " {
default = " viktorbarzin.me "
type = string
}
2026-02-07 13:24:58 +00:00
variable " custom_content_security_policy " {
2025-12-18 10:45:03 +00:00
type = string
2026-02-07 13:24:58 +00:00
default = null
}
variable " exclude_crowdsec " {
type = bool
default = false
2025-12-18 10:45:03 +00:00
}
2026-02-10 21:11:46 +00:00
variable " full_host " {
type = string
default = null
}
variable " extra_middlewares " {
type = list ( string )
default = [ ]
}
variable " skip_default_rate_limit " {
type = bool
default = false
}
2026-02-22 19:50:07 +00:00
variable " anti_ai_scraping " {
type = bool
2026-03-29 01:04:11 +02:00
default = null # null = auto (enabled when not protected, disabled when protected)
2026-02-22 19:50:07 +00:00
}
2026-02-10 21:11:46 +00:00
2026-04-16 13:45:04 +00:00
variable " dns_type " {
type = string
default = " none "
description = " Cloudflare DNS: 'proxied' (CNAME to tunnel), 'non-proxied' (A/AAAA to public IP), or 'none' "
validation {
condition = contains ( [ " proxied " , " non-proxied " , " none " ] , var . dns_type )
error_message = " dns_type must be 'proxied', 'non-proxied', or 'none'. "
}
}
[infra] Per-ingress external-monitor annotation + actualbudget plan-time fix [ci skip]
## Context
Two operational gaps surfaced during a healthcheck sweep today:
1. **External monitoring coverage**: Only ~13 hostnames (via `cloudflare_proxied_names`
in `config.tfvars`) had `[External]` monitors in Uptime Kuma. Any service deployed via
`ingress_factory` with `dns_type = "proxied"` auto-created its DNS record but was NOT
registered for external probing — so outages like Immich going down externally were
invisible until a user complained. 99 of ~125 public ingresses had no external
monitor.
2. **actualbudget stack unplannable**: `count = var.budget_encryption_password != null
? 1 : 0` in `factory/main.tf:152` failed with "Invalid count argument" because the
value flows from a `data.kubernetes_secret` whose contents are `(known after apply)`
at plan time. Blocked CI applies and drift reconciliation.
## This change
### Per-ingress external-monitor annotation (ingress_factory + reverse_proxy/factory)
- New variables `external_monitor` (bool, nullable) + `external_monitor_name` (string,
nullable). Default is "follow dns_type" — enabled for any public DNS record
(`dns_type != "none"`, covers both proxied and non-proxied so Immich and other
direct-A records are also monitored).
- Emits two annotations on the Ingress:
- `uptime.viktorbarzin.me/external-monitor = "true"`
- `uptime.viktorbarzin.me/external-monitor-name = "<label>"` (optional override)
### external-monitor-sync CronJob (uptime-kuma stack)
- Discovers targets from live Ingress objects via the K8s API first (filter by
annotation), falls back to the legacy `external-monitor-targets` ConfigMap on any
API error (zero rollout risk).
- New `ServiceAccount` + cluster-wide `ClusterRole`/`ClusterRoleBinding` giving
`list`/`get` on `networking.k8s.io/ingresses`.
- `API_SERVER` now uses the `KUBERNETES_SERVICE_HOST` env var (always injected by K8s)
instead of `kubernetes.default.svc` — the search-domain expansion failed in the
CronJob pod's DNS config. Verified working: CronJob now logs
`Loaded N external monitor targets (source=k8s-api)`.
### actualbudget count-on-unknown refactor
- Replaced `count = var.budget_encryption_password != null ? 1 : 0` with two explicit
plan-time booleans: `enable_http_api` and `enable_bank_sync`. Values are known at
plan; no `-target` workaround needed.
- Callers (`stacks/actualbudget/main.tf`) pass `true` explicitly. Runtime behaviour is
unchanged — the secret is still consumed via env var.
- Also aligned the factory with live state (the 3 budget-* PVCs had been migrated
`proxmox-lvm` → `proxmox-lvm-encrypted` outside Terraform): PVC resource renamed
`data_proxmox` → `data_encrypted`, storage class updated, orphaned `nfs_data` module
removed. State was rm'd + re-imported with matching UIDs, so no data was moved.
## Rollout status (already partially applied in this session)
- `stacks/uptime-kuma` applied — SA + RBAC + CronJob changes live; FQDN fix verified
- `stacks/actualbudget` applied — budget-{viktor,anca,emo} all 200 OK externally
- `stacks/mailserver` + 21 other ingress_factory consumers applied — annotations live
- CronJob `external-monitor-sync` latest run: `source=k8s-api`, 26 monitors active
(was 13 on the central list)
## Deferred (separate work)
- 4 stacks show pre-existing DESTRUCTIVE drift in plan (metallb namespace, claude-memory,
rbac, redis) — NOT triggered by this commit but will be by CI's global-file cascade.
`[ci skip]` here so those don't auto-apply; they will be fixed manually before the
next CI push.
- Cleanup of `cloudflare_proxied_names` list once Helm-managed ingresses (authentik,
grafana, vault, forgejo) are annotated — separate PR.
## Test plan
### Automated
\`\`\`
\$ kubectl -n uptime-kuma logs \$(kubectl -n uptime-kuma get pods -l job-name -o name | tail -1)
Loaded 26 external monitor targets (source=k8s-api)
Sync complete: 7 created, 0 deleted, 17 unchanged
\$ curl -sk -o /dev/null -w "%{http_code}\n" -H "Accept: text/html" \\
https://dawarich.viktorbarzin.me/ https://nextcloud.viktorbarzin.me/ \\
https://budget-viktor.viktorbarzin.me/
200 302 200
\$ kubectl -n actualbudget get deploy,pvc -l app=budget-viktor
deployment.apps/budget-viktor 1/1 1 1 Ready
persistentvolumeclaim/budget-viktor-data-encrypted Bound 10Gi RWO proxmox-lvm-encrypted
\`\`\`
### Manual Verification
1. Confirm the annotation is present on an ingress_factory ingress:
\`\`\`
kubectl -n dawarich get ingress dawarich -o \\
jsonpath='{.metadata.annotations.uptime\.viktorbarzin\.me/external-monitor}'
# Expected: "true"
\`\`\`
2. Confirm the new `[External] <name>` monitor appears in Uptime Kuma within 10 min
(CronJob interval). For Immich specifically, it will appear after the immich stack
is re-applied.
3. Verify actualbudget plan is clean:
\`\`\`
cd stacks/actualbudget && scripts/tg plan --non-interactive
# Expected: no "Invalid count argument" errors
\`\`\`
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 10:34:32 +00:00
# Uptime Kuma external monitor: when true, annotate the ingress so the
# external-monitor-sync CronJob creates a `[External] <name>` monitor pointing
# at https://<host>. Null means "follow dns_type" — enabled when proxied.
variable " external_monitor " {
type = bool
default = null
description = " Enable Uptime Kuma external monitor. null = auto (enabled when dns_type == 'proxied'). "
}
variable " external_monitor_name " {
type = string
default = null
description = " Override the monitor label. Defaults to the ingress hostname label (e.g. 'dawarich' for dawarich.viktorbarzin.me). "
}
2026-04-16 13:45:04 +00:00
# Cloudflare config defaults — override via variables if these change.
# Source of truth: config.tfvars (cloudflare_zone_id, cloudflare_tunnel_id, public_ip, public_ipv6)
variable " cloudflare_zone_id " {
type = string
default = " fd2c5dd4efe8fe38958944e74d0ced6d "
}
variable " cloudflare_tunnel_id " {
type = string
default = " 75182cd7-bb91-4310-b961-5d8967da8b41 "
}
variable " public_ip " {
type = string
default = " 176.12.22.76 "
}
variable " public_ipv6 " {
type = string
default = " 2001:470:6e:43d::2 "
}
2026-03-25 11:00:38 +02:00
variable " homepage_group " {
type = string
default = null # auto-detect from namespace
}
variable " homepage_enabled " {
type = bool
default = true
}
2026-02-10 21:11:46 +00:00
locals {
ingress_factory: replace `protected` bool with `auth` enum + audit pass across 100 stacks
Phase 3+4 of default-deny ingress plan. Replaces the `protected = bool` (default
false → unprotected) variable in `modules/kubernetes/ingress_factory` with
`auth = string` enum (default "required" → fail-closed). Touches every
ingress_factory caller so the audit decision is recorded explicitly in code.
ingress_factory (Phase 3):
- `auth = "required"`: standard Authentik forward-auth (the legacy
`protected = true` semantic).
- `auth = "public"`: forward-auth via the new `authentik-forward-auth-public`
middleware → dedicated public outpost → guest auto-bind. Logged-in users
keep their real identity.
- `auth = "none"`: no Authentik middleware. For Anubis-fronted content, native
client APIs (Git, /v2/, WebDAV), webhook receivers, the Authentik outpost
itself.
- `effective_anti_ai` default flips ON only when `auth = "none"` (auth-gated
ingresses don't need anti-AI noise; the auth flow already discourages bots).
Audit pass (Phase 4) across 96 ingress_factory call sites:
- 49 explicit `protected = true` → `auth = "required"`
- 8 explicit `protected = false` → `auth = "none"` (5) or `auth = "public"` (3)
- 64 previously-default (no protected line) → `auth = "required"` ADDED, then
reviewed individually:
* 9 Anubis-fronted (blog, www, kms, travel, f1, cyberchef, jsoncrack,
homepage, wrongmove UI, privatebin) → `auth = "none"`
* 22 native-client / programmatic surfaces (Forgejo Git+/v2/, webhook
handler, claude-memory MCP, Nextcloud WebDAV, Matrix, Vault CLI/OIDC,
xray VPN, ntfy, woodpecker webhooks, n8n triggers, ntfy push, dawarich
location ingestion, immich frame kiosk, headscale CP, send anonymous
drops, rybbit beacon, vaultwarden API, Authentik UI itself + outposts) →
`auth = "none"`
* Remaining ~33 → `auth = "required"` confirmed (admin tools, internal
UIs, services without app-level auth)
- Smoke-test promotions to `auth = "public"`: fire-planner public UI,
k8s-portal API, insta2spotify callback.
Three call sites in wrapper modules (`stacks/freedify/factory/`,
`stacks/reverse-proxy/modules/reverse_proxy/`) keep their internal `protected`
bool — they translate to `auth` internally, out of scope for this rename.
Behavior change: previously-default ingresses now fail closed (require
Authentik login) unless explicitly flipped to `auth = "none"` or
`auth = "public"`. This is the audit goal — no more accidentally-unprotected
surfaces. Sites that were intentionally public (Anubis content, native APIs,
webhooks) are now explicitly recorded as `auth = "none"`.
Drive-by: `modules/create-vm/main.tf` picked up cosmetic alignment via
`terraform fmt -recursive` during the audit. Behavior-neutral.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 18:53:49 +00:00
effective_host = var . full_host ! = null ? var . full_host : " ${ var . host ! = null ? var . host : var . name } . ${ var . root_domain } "
infra/ingress_factory: add auth = "app" mode for self-authed backends
Adds a fourth auth tier alongside required/public/none. "app" is
functionally identical to "none" — no Authentik middleware attached —
but the distinct name records intent at the call site: this backend
has its own user login (NextAuth, Django, OAuth, bearer-token API,
etc.) and Authentik would only break it.
Why the new tier: with only required/none, every "the app has its
own auth so drop Authentik" decision looked identical at the call
site to "this is an OAuth callback / webhook receiver / native-client
API". Future readers couldn't tell whether a stack was intentionally
unauthenticated or relying on backend auth. Now they can.
Migrates the 8 stacks flipped earlier this session (novelapp, immich,
linkwarden, tandoor, freshrss, affine, actualbudget, ebooks/audiobookshelf)
from "none" to "app". Confirmed no-op: `tg plan` on novelapp showed
"No changes" — same middleware chain, same live state.
The variable description and the .claude/CLAUDE.md Auth section now
spell out the anti-exposure rule: only pick "app" or "none" AFTER
verifying the app has its own user auth ("app") or the endpoint is
intentionally public ("none"). Default stays "required" so accidental
omission fails closed.
[ci skip]
2026-05-11 18:59:11 +00:00
# Anti-AI default: ON when no Authentik auth fronts the ingress (auth =
# "none" or auth = "app" — either the app gates users itself or the site
# is intentionally public). When Authentik gates the request
# (required/public), the auth flow already discourages bots.
effective_anti_ai = var . anti_ai_scraping ! = null ? var . anti_ai_scraping : ( var . auth == " none " | | var . auth == " app " )
ingress_factory: replace `protected` bool with `auth` enum + audit pass across 100 stacks
Phase 3+4 of default-deny ingress plan. Replaces the `protected = bool` (default
false → unprotected) variable in `modules/kubernetes/ingress_factory` with
`auth = string` enum (default "required" → fail-closed). Touches every
ingress_factory caller so the audit decision is recorded explicitly in code.
ingress_factory (Phase 3):
- `auth = "required"`: standard Authentik forward-auth (the legacy
`protected = true` semantic).
- `auth = "public"`: forward-auth via the new `authentik-forward-auth-public`
middleware → dedicated public outpost → guest auto-bind. Logged-in users
keep their real identity.
- `auth = "none"`: no Authentik middleware. For Anubis-fronted content, native
client APIs (Git, /v2/, WebDAV), webhook receivers, the Authentik outpost
itself.
- `effective_anti_ai` default flips ON only when `auth = "none"` (auth-gated
ingresses don't need anti-AI noise; the auth flow already discourages bots).
Audit pass (Phase 4) across 96 ingress_factory call sites:
- 49 explicit `protected = true` → `auth = "required"`
- 8 explicit `protected = false` → `auth = "none"` (5) or `auth = "public"` (3)
- 64 previously-default (no protected line) → `auth = "required"` ADDED, then
reviewed individually:
* 9 Anubis-fronted (blog, www, kms, travel, f1, cyberchef, jsoncrack,
homepage, wrongmove UI, privatebin) → `auth = "none"`
* 22 native-client / programmatic surfaces (Forgejo Git+/v2/, webhook
handler, claude-memory MCP, Nextcloud WebDAV, Matrix, Vault CLI/OIDC,
xray VPN, ntfy, woodpecker webhooks, n8n triggers, ntfy push, dawarich
location ingestion, immich frame kiosk, headscale CP, send anonymous
drops, rybbit beacon, vaultwarden API, Authentik UI itself + outposts) →
`auth = "none"`
* Remaining ~33 → `auth = "required"` confirmed (admin tools, internal
UIs, services without app-level auth)
- Smoke-test promotions to `auth = "public"`: fire-planner public UI,
k8s-portal API, insta2spotify callback.
Three call sites in wrapper modules (`stacks/freedify/factory/`,
`stacks/reverse-proxy/modules/reverse_proxy/`) keep their internal `protected`
bool — they translate to `auth` internally, out of scope for this rename.
Behavior change: previously-default ingresses now fail closed (require
Authentik login) unless explicitly flipped to `auth = "none"` or
`auth = "public"`. This is the audit goal — no more accidentally-unprotected
surfaces. Sites that were intentionally public (Anubis content, native APIs,
webhooks) are now explicitly recorded as `auth = "none"`.
Drive-by: `modules/create-vm/main.tf` picked up cosmetic alignment via
`terraform fmt -recursive` during the audit. Behavior-neutral.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 18:53:49 +00:00
infra/ingress_factory: add auth = "app" mode for self-authed backends
Adds a fourth auth tier alongside required/public/none. "app" is
functionally identical to "none" — no Authentik middleware attached —
but the distinct name records intent at the call site: this backend
has its own user login (NextAuth, Django, OAuth, bearer-token API,
etc.) and Authentik would only break it.
Why the new tier: with only required/none, every "the app has its
own auth so drop Authentik" decision looked identical at the call
site to "this is an OAuth callback / webhook receiver / native-client
API". Future readers couldn't tell whether a stack was intentionally
unauthenticated or relying on backend auth. Now they can.
Migrates the 8 stacks flipped earlier this session (novelapp, immich,
linkwarden, tandoor, freshrss, affine, actualbudget, ebooks/audiobookshelf)
from "none" to "app". Confirmed no-op: `tg plan` on novelapp showed
"No changes" — same middleware chain, same live state.
The variable description and the .claude/CLAUDE.md Auth section now
spell out the anti-exposure rule: only pick "app" or "none" AFTER
verifying the app has its own user auth ("app") or the endpoint is
intentionally public ("none"). Default stays "required" so accidental
omission fails closed.
[ci skip]
2026-05-11 18:59:11 +00:00
# Auth middleware selection. "app" and "none" both attach no Authentik
# middleware — "app" signals "the backend has its own user auth", "none"
# signals "intentionally public / native-client API / webhook". The
# distinction lives at the call site for human readers; the runtime
# effect is identical.
ingress_factory: replace `protected` bool with `auth` enum + audit pass across 100 stacks
Phase 3+4 of default-deny ingress plan. Replaces the `protected = bool` (default
false → unprotected) variable in `modules/kubernetes/ingress_factory` with
`auth = string` enum (default "required" → fail-closed). Touches every
ingress_factory caller so the audit decision is recorded explicitly in code.
ingress_factory (Phase 3):
- `auth = "required"`: standard Authentik forward-auth (the legacy
`protected = true` semantic).
- `auth = "public"`: forward-auth via the new `authentik-forward-auth-public`
middleware → dedicated public outpost → guest auto-bind. Logged-in users
keep their real identity.
- `auth = "none"`: no Authentik middleware. For Anubis-fronted content, native
client APIs (Git, /v2/, WebDAV), webhook receivers, the Authentik outpost
itself.
- `effective_anti_ai` default flips ON only when `auth = "none"` (auth-gated
ingresses don't need anti-AI noise; the auth flow already discourages bots).
Audit pass (Phase 4) across 96 ingress_factory call sites:
- 49 explicit `protected = true` → `auth = "required"`
- 8 explicit `protected = false` → `auth = "none"` (5) or `auth = "public"` (3)
- 64 previously-default (no protected line) → `auth = "required"` ADDED, then
reviewed individually:
* 9 Anubis-fronted (blog, www, kms, travel, f1, cyberchef, jsoncrack,
homepage, wrongmove UI, privatebin) → `auth = "none"`
* 22 native-client / programmatic surfaces (Forgejo Git+/v2/, webhook
handler, claude-memory MCP, Nextcloud WebDAV, Matrix, Vault CLI/OIDC,
xray VPN, ntfy, woodpecker webhooks, n8n triggers, ntfy push, dawarich
location ingestion, immich frame kiosk, headscale CP, send anonymous
drops, rybbit beacon, vaultwarden API, Authentik UI itself + outposts) →
`auth = "none"`
* Remaining ~33 → `auth = "required"` confirmed (admin tools, internal
UIs, services without app-level auth)
- Smoke-test promotions to `auth = "public"`: fire-planner public UI,
k8s-portal API, insta2spotify callback.
Three call sites in wrapper modules (`stacks/freedify/factory/`,
`stacks/reverse-proxy/modules/reverse_proxy/`) keep their internal `protected`
bool — they translate to `auth` internally, out of scope for this rename.
Behavior change: previously-default ingresses now fail closed (require
Authentik login) unless explicitly flipped to `auth = "none"` or
`auth = "public"`. This is the audit goal — no more accidentally-unprotected
surfaces. Sites that were intentionally public (Anubis content, native APIs,
webhooks) are now explicitly recorded as `auth = "none"`.
Drive-by: `modules/create-vm/main.tf` picked up cosmetic alignment via
`terraform fmt -recursive` during the audit. Behavior-neutral.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 18:53:49 +00:00
auth_middleware = (
var . auth == " required " ? " traefik-authentik-forward-auth@kubernetescrd " :
var . auth == " public " ? " traefik-authentik-forward-auth-public@kubernetescrd " :
null
)
2026-03-25 11:00:38 +02:00
[infra] Per-ingress external-monitor annotation + actualbudget plan-time fix [ci skip]
## Context
Two operational gaps surfaced during a healthcheck sweep today:
1. **External monitoring coverage**: Only ~13 hostnames (via `cloudflare_proxied_names`
in `config.tfvars`) had `[External]` monitors in Uptime Kuma. Any service deployed via
`ingress_factory` with `dns_type = "proxied"` auto-created its DNS record but was NOT
registered for external probing — so outages like Immich going down externally were
invisible until a user complained. 99 of ~125 public ingresses had no external
monitor.
2. **actualbudget stack unplannable**: `count = var.budget_encryption_password != null
? 1 : 0` in `factory/main.tf:152` failed with "Invalid count argument" because the
value flows from a `data.kubernetes_secret` whose contents are `(known after apply)`
at plan time. Blocked CI applies and drift reconciliation.
## This change
### Per-ingress external-monitor annotation (ingress_factory + reverse_proxy/factory)
- New variables `external_monitor` (bool, nullable) + `external_monitor_name` (string,
nullable). Default is "follow dns_type" — enabled for any public DNS record
(`dns_type != "none"`, covers both proxied and non-proxied so Immich and other
direct-A records are also monitored).
- Emits two annotations on the Ingress:
- `uptime.viktorbarzin.me/external-monitor = "true"`
- `uptime.viktorbarzin.me/external-monitor-name = "<label>"` (optional override)
### external-monitor-sync CronJob (uptime-kuma stack)
- Discovers targets from live Ingress objects via the K8s API first (filter by
annotation), falls back to the legacy `external-monitor-targets` ConfigMap on any
API error (zero rollout risk).
- New `ServiceAccount` + cluster-wide `ClusterRole`/`ClusterRoleBinding` giving
`list`/`get` on `networking.k8s.io/ingresses`.
- `API_SERVER` now uses the `KUBERNETES_SERVICE_HOST` env var (always injected by K8s)
instead of `kubernetes.default.svc` — the search-domain expansion failed in the
CronJob pod's DNS config. Verified working: CronJob now logs
`Loaded N external monitor targets (source=k8s-api)`.
### actualbudget count-on-unknown refactor
- Replaced `count = var.budget_encryption_password != null ? 1 : 0` with two explicit
plan-time booleans: `enable_http_api` and `enable_bank_sync`. Values are known at
plan; no `-target` workaround needed.
- Callers (`stacks/actualbudget/main.tf`) pass `true` explicitly. Runtime behaviour is
unchanged — the secret is still consumed via env var.
- Also aligned the factory with live state (the 3 budget-* PVCs had been migrated
`proxmox-lvm` → `proxmox-lvm-encrypted` outside Terraform): PVC resource renamed
`data_proxmox` → `data_encrypted`, storage class updated, orphaned `nfs_data` module
removed. State was rm'd + re-imported with matching UIDs, so no data was moved.
## Rollout status (already partially applied in this session)
- `stacks/uptime-kuma` applied — SA + RBAC + CronJob changes live; FQDN fix verified
- `stacks/actualbudget` applied — budget-{viktor,anca,emo} all 200 OK externally
- `stacks/mailserver` + 21 other ingress_factory consumers applied — annotations live
- CronJob `external-monitor-sync` latest run: `source=k8s-api`, 26 monitors active
(was 13 on the central list)
## Deferred (separate work)
- 4 stacks show pre-existing DESTRUCTIVE drift in plan (metallb namespace, claude-memory,
rbac, redis) — NOT triggered by this commit but will be by CI's global-file cascade.
`[ci skip]` here so those don't auto-apply; they will be fixed manually before the
next CI push.
- Cleanup of `cloudflare_proxied_names` list once Helm-managed ingresses (authentik,
grafana, vault, forgejo) are annotated — separate PR.
## Test plan
### Automated
\`\`\`
\$ kubectl -n uptime-kuma logs \$(kubectl -n uptime-kuma get pods -l job-name -o name | tail -1)
Loaded 26 external monitor targets (source=k8s-api)
Sync complete: 7 created, 0 deleted, 17 unchanged
\$ curl -sk -o /dev/null -w "%{http_code}\n" -H "Accept: text/html" \\
https://dawarich.viktorbarzin.me/ https://nextcloud.viktorbarzin.me/ \\
https://budget-viktor.viktorbarzin.me/
200 302 200
\$ kubectl -n actualbudget get deploy,pvc -l app=budget-viktor
deployment.apps/budget-viktor 1/1 1 1 Ready
persistentvolumeclaim/budget-viktor-data-encrypted Bound 10Gi RWO proxmox-lvm-encrypted
\`\`\`
### Manual Verification
1. Confirm the annotation is present on an ingress_factory ingress:
\`\`\`
kubectl -n dawarich get ingress dawarich -o \\
jsonpath='{.metadata.annotations.uptime\.viktorbarzin\.me/external-monitor}'
# Expected: "true"
\`\`\`
2. Confirm the new `[External] <name>` monitor appears in Uptime Kuma within 10 min
(CronJob interval). For Immich specifically, it will appear after the immich stack
is re-applied.
3. Verify actualbudget plan is clean:
\`\`\`
cd stacks/actualbudget && scripts/tg plan --non-interactive
# Expected: no "Invalid count argument" errors
\`\`\`
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 10:34:32 +00:00
# External monitor enabled by default when the ingress has a public DNS
# record (either CF-proxied or direct A/AAAA). Explicit bool overrides.
effective_external_monitor = var . external_monitor ! = null ? var . external_monitor : ( var . dns_type ! = " none " )
2026-04-19 15:18:27 +00:00
# Emit the annotation when effective is true (positive signal), or when the
# caller explicitly set external_monitor=false (opt-out). When the caller
# leaves it null AND dns_type="none", emit nothing — the sync script's
# default opt-in (any *.viktorbarzin.me ingress) keeps monitoring services
# that are publicly reachable via routes we don't manage here (e.g.
# helm-provisioned ingresses, services behind cloudflared tunnel with DNS
# set elsewhere).
[infra] Per-ingress external-monitor annotation + actualbudget plan-time fix [ci skip]
## Context
Two operational gaps surfaced during a healthcheck sweep today:
1. **External monitoring coverage**: Only ~13 hostnames (via `cloudflare_proxied_names`
in `config.tfvars`) had `[External]` monitors in Uptime Kuma. Any service deployed via
`ingress_factory` with `dns_type = "proxied"` auto-created its DNS record but was NOT
registered for external probing — so outages like Immich going down externally were
invisible until a user complained. 99 of ~125 public ingresses had no external
monitor.
2. **actualbudget stack unplannable**: `count = var.budget_encryption_password != null
? 1 : 0` in `factory/main.tf:152` failed with "Invalid count argument" because the
value flows from a `data.kubernetes_secret` whose contents are `(known after apply)`
at plan time. Blocked CI applies and drift reconciliation.
## This change
### Per-ingress external-monitor annotation (ingress_factory + reverse_proxy/factory)
- New variables `external_monitor` (bool, nullable) + `external_monitor_name` (string,
nullable). Default is "follow dns_type" — enabled for any public DNS record
(`dns_type != "none"`, covers both proxied and non-proxied so Immich and other
direct-A records are also monitored).
- Emits two annotations on the Ingress:
- `uptime.viktorbarzin.me/external-monitor = "true"`
- `uptime.viktorbarzin.me/external-monitor-name = "<label>"` (optional override)
### external-monitor-sync CronJob (uptime-kuma stack)
- Discovers targets from live Ingress objects via the K8s API first (filter by
annotation), falls back to the legacy `external-monitor-targets` ConfigMap on any
API error (zero rollout risk).
- New `ServiceAccount` + cluster-wide `ClusterRole`/`ClusterRoleBinding` giving
`list`/`get` on `networking.k8s.io/ingresses`.
- `API_SERVER` now uses the `KUBERNETES_SERVICE_HOST` env var (always injected by K8s)
instead of `kubernetes.default.svc` — the search-domain expansion failed in the
CronJob pod's DNS config. Verified working: CronJob now logs
`Loaded N external monitor targets (source=k8s-api)`.
### actualbudget count-on-unknown refactor
- Replaced `count = var.budget_encryption_password != null ? 1 : 0` with two explicit
plan-time booleans: `enable_http_api` and `enable_bank_sync`. Values are known at
plan; no `-target` workaround needed.
- Callers (`stacks/actualbudget/main.tf`) pass `true` explicitly. Runtime behaviour is
unchanged — the secret is still consumed via env var.
- Also aligned the factory with live state (the 3 budget-* PVCs had been migrated
`proxmox-lvm` → `proxmox-lvm-encrypted` outside Terraform): PVC resource renamed
`data_proxmox` → `data_encrypted`, storage class updated, orphaned `nfs_data` module
removed. State was rm'd + re-imported with matching UIDs, so no data was moved.
## Rollout status (already partially applied in this session)
- `stacks/uptime-kuma` applied — SA + RBAC + CronJob changes live; FQDN fix verified
- `stacks/actualbudget` applied — budget-{viktor,anca,emo} all 200 OK externally
- `stacks/mailserver` + 21 other ingress_factory consumers applied — annotations live
- CronJob `external-monitor-sync` latest run: `source=k8s-api`, 26 monitors active
(was 13 on the central list)
## Deferred (separate work)
- 4 stacks show pre-existing DESTRUCTIVE drift in plan (metallb namespace, claude-memory,
rbac, redis) — NOT triggered by this commit but will be by CI's global-file cascade.
`[ci skip]` here so those don't auto-apply; they will be fixed manually before the
next CI push.
- Cleanup of `cloudflare_proxied_names` list once Helm-managed ingresses (authentik,
grafana, vault, forgejo) are annotated — separate PR.
## Test plan
### Automated
\`\`\`
\$ kubectl -n uptime-kuma logs \$(kubectl -n uptime-kuma get pods -l job-name -o name | tail -1)
Loaded 26 external monitor targets (source=k8s-api)
Sync complete: 7 created, 0 deleted, 17 unchanged
\$ curl -sk -o /dev/null -w "%{http_code}\n" -H "Accept: text/html" \\
https://dawarich.viktorbarzin.me/ https://nextcloud.viktorbarzin.me/ \\
https://budget-viktor.viktorbarzin.me/
200 302 200
\$ kubectl -n actualbudget get deploy,pvc -l app=budget-viktor
deployment.apps/budget-viktor 1/1 1 1 Ready
persistentvolumeclaim/budget-viktor-data-encrypted Bound 10Gi RWO proxmox-lvm-encrypted
\`\`\`
### Manual Verification
1. Confirm the annotation is present on an ingress_factory ingress:
\`\`\`
kubectl -n dawarich get ingress dawarich -o \\
jsonpath='{.metadata.annotations.uptime\.viktorbarzin\.me/external-monitor}'
# Expected: "true"
\`\`\`
2. Confirm the new `[External] <name>` monitor appears in Uptime Kuma within 10 min
(CronJob interval). For Immich specifically, it will appear after the immich stack
is re-applied.
3. Verify actualbudget plan is clean:
\`\`\`
cd stacks/actualbudget && scripts/tg plan --non-interactive
# Expected: no "Invalid count argument" errors
\`\`\`
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 10:34:32 +00:00
external_monitor_annotations = local . effective_external_monitor ? merge (
{ " uptime.viktorbarzin.me/external-monitor " = " true " } ,
var . external_monitor_name ! = null ? { " uptime.viktorbarzin.me/external-monitor-name " = var . external_monitor_name } : { } ,
2026-04-19 15:18:27 +00:00
) : ( var . external_monitor == false ?
{ " uptime.viktorbarzin.me/external-monitor " = " false " } : { }
)
[infra] Per-ingress external-monitor annotation + actualbudget plan-time fix [ci skip]
## Context
Two operational gaps surfaced during a healthcheck sweep today:
1. **External monitoring coverage**: Only ~13 hostnames (via `cloudflare_proxied_names`
in `config.tfvars`) had `[External]` monitors in Uptime Kuma. Any service deployed via
`ingress_factory` with `dns_type = "proxied"` auto-created its DNS record but was NOT
registered for external probing — so outages like Immich going down externally were
invisible until a user complained. 99 of ~125 public ingresses had no external
monitor.
2. **actualbudget stack unplannable**: `count = var.budget_encryption_password != null
? 1 : 0` in `factory/main.tf:152` failed with "Invalid count argument" because the
value flows from a `data.kubernetes_secret` whose contents are `(known after apply)`
at plan time. Blocked CI applies and drift reconciliation.
## This change
### Per-ingress external-monitor annotation (ingress_factory + reverse_proxy/factory)
- New variables `external_monitor` (bool, nullable) + `external_monitor_name` (string,
nullable). Default is "follow dns_type" — enabled for any public DNS record
(`dns_type != "none"`, covers both proxied and non-proxied so Immich and other
direct-A records are also monitored).
- Emits two annotations on the Ingress:
- `uptime.viktorbarzin.me/external-monitor = "true"`
- `uptime.viktorbarzin.me/external-monitor-name = "<label>"` (optional override)
### external-monitor-sync CronJob (uptime-kuma stack)
- Discovers targets from live Ingress objects via the K8s API first (filter by
annotation), falls back to the legacy `external-monitor-targets` ConfigMap on any
API error (zero rollout risk).
- New `ServiceAccount` + cluster-wide `ClusterRole`/`ClusterRoleBinding` giving
`list`/`get` on `networking.k8s.io/ingresses`.
- `API_SERVER` now uses the `KUBERNETES_SERVICE_HOST` env var (always injected by K8s)
instead of `kubernetes.default.svc` — the search-domain expansion failed in the
CronJob pod's DNS config. Verified working: CronJob now logs
`Loaded N external monitor targets (source=k8s-api)`.
### actualbudget count-on-unknown refactor
- Replaced `count = var.budget_encryption_password != null ? 1 : 0` with two explicit
plan-time booleans: `enable_http_api` and `enable_bank_sync`. Values are known at
plan; no `-target` workaround needed.
- Callers (`stacks/actualbudget/main.tf`) pass `true` explicitly. Runtime behaviour is
unchanged — the secret is still consumed via env var.
- Also aligned the factory with live state (the 3 budget-* PVCs had been migrated
`proxmox-lvm` → `proxmox-lvm-encrypted` outside Terraform): PVC resource renamed
`data_proxmox` → `data_encrypted`, storage class updated, orphaned `nfs_data` module
removed. State was rm'd + re-imported with matching UIDs, so no data was moved.
## Rollout status (already partially applied in this session)
- `stacks/uptime-kuma` applied — SA + RBAC + CronJob changes live; FQDN fix verified
- `stacks/actualbudget` applied — budget-{viktor,anca,emo} all 200 OK externally
- `stacks/mailserver` + 21 other ingress_factory consumers applied — annotations live
- CronJob `external-monitor-sync` latest run: `source=k8s-api`, 26 monitors active
(was 13 on the central list)
## Deferred (separate work)
- 4 stacks show pre-existing DESTRUCTIVE drift in plan (metallb namespace, claude-memory,
rbac, redis) — NOT triggered by this commit but will be by CI's global-file cascade.
`[ci skip]` here so those don't auto-apply; they will be fixed manually before the
next CI push.
- Cleanup of `cloudflare_proxied_names` list once Helm-managed ingresses (authentik,
grafana, vault, forgejo) are annotated — separate PR.
## Test plan
### Automated
\`\`\`
\$ kubectl -n uptime-kuma logs \$(kubectl -n uptime-kuma get pods -l job-name -o name | tail -1)
Loaded 26 external monitor targets (source=k8s-api)
Sync complete: 7 created, 0 deleted, 17 unchanged
\$ curl -sk -o /dev/null -w "%{http_code}\n" -H "Accept: text/html" \\
https://dawarich.viktorbarzin.me/ https://nextcloud.viktorbarzin.me/ \\
https://budget-viktor.viktorbarzin.me/
200 302 200
\$ kubectl -n actualbudget get deploy,pvc -l app=budget-viktor
deployment.apps/budget-viktor 1/1 1 1 Ready
persistentvolumeclaim/budget-viktor-data-encrypted Bound 10Gi RWO proxmox-lvm-encrypted
\`\`\`
### Manual Verification
1. Confirm the annotation is present on an ingress_factory ingress:
\`\`\`
kubectl -n dawarich get ingress dawarich -o \\
jsonpath='{.metadata.annotations.uptime\.viktorbarzin\.me/external-monitor}'
# Expected: "true"
\`\`\`
2. Confirm the new `[External] <name>` monitor appears in Uptime Kuma within 10 min
(CronJob interval). For Immich specifically, it will appear after the immich stack
is re-applied.
3. Verify actualbudget plan is clean:
\`\`\`
cd stacks/actualbudget && scripts/tg plan --non-interactive
# Expected: no "Invalid count argument" errors
\`\`\`
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 10:34:32 +00:00
2026-03-25 11:00:38 +02:00
ns_to_group = {
[infra] Per-ingress external-monitor annotation + actualbudget plan-time fix [ci skip]
## Context
Two operational gaps surfaced during a healthcheck sweep today:
1. **External monitoring coverage**: Only ~13 hostnames (via `cloudflare_proxied_names`
in `config.tfvars`) had `[External]` monitors in Uptime Kuma. Any service deployed via
`ingress_factory` with `dns_type = "proxied"` auto-created its DNS record but was NOT
registered for external probing — so outages like Immich going down externally were
invisible until a user complained. 99 of ~125 public ingresses had no external
monitor.
2. **actualbudget stack unplannable**: `count = var.budget_encryption_password != null
? 1 : 0` in `factory/main.tf:152` failed with "Invalid count argument" because the
value flows from a `data.kubernetes_secret` whose contents are `(known after apply)`
at plan time. Blocked CI applies and drift reconciliation.
## This change
### Per-ingress external-monitor annotation (ingress_factory + reverse_proxy/factory)
- New variables `external_monitor` (bool, nullable) + `external_monitor_name` (string,
nullable). Default is "follow dns_type" — enabled for any public DNS record
(`dns_type != "none"`, covers both proxied and non-proxied so Immich and other
direct-A records are also monitored).
- Emits two annotations on the Ingress:
- `uptime.viktorbarzin.me/external-monitor = "true"`
- `uptime.viktorbarzin.me/external-monitor-name = "<label>"` (optional override)
### external-monitor-sync CronJob (uptime-kuma stack)
- Discovers targets from live Ingress objects via the K8s API first (filter by
annotation), falls back to the legacy `external-monitor-targets` ConfigMap on any
API error (zero rollout risk).
- New `ServiceAccount` + cluster-wide `ClusterRole`/`ClusterRoleBinding` giving
`list`/`get` on `networking.k8s.io/ingresses`.
- `API_SERVER` now uses the `KUBERNETES_SERVICE_HOST` env var (always injected by K8s)
instead of `kubernetes.default.svc` — the search-domain expansion failed in the
CronJob pod's DNS config. Verified working: CronJob now logs
`Loaded N external monitor targets (source=k8s-api)`.
### actualbudget count-on-unknown refactor
- Replaced `count = var.budget_encryption_password != null ? 1 : 0` with two explicit
plan-time booleans: `enable_http_api` and `enable_bank_sync`. Values are known at
plan; no `-target` workaround needed.
- Callers (`stacks/actualbudget/main.tf`) pass `true` explicitly. Runtime behaviour is
unchanged — the secret is still consumed via env var.
- Also aligned the factory with live state (the 3 budget-* PVCs had been migrated
`proxmox-lvm` → `proxmox-lvm-encrypted` outside Terraform): PVC resource renamed
`data_proxmox` → `data_encrypted`, storage class updated, orphaned `nfs_data` module
removed. State was rm'd + re-imported with matching UIDs, so no data was moved.
## Rollout status (already partially applied in this session)
- `stacks/uptime-kuma` applied — SA + RBAC + CronJob changes live; FQDN fix verified
- `stacks/actualbudget` applied — budget-{viktor,anca,emo} all 200 OK externally
- `stacks/mailserver` + 21 other ingress_factory consumers applied — annotations live
- CronJob `external-monitor-sync` latest run: `source=k8s-api`, 26 monitors active
(was 13 on the central list)
## Deferred (separate work)
- 4 stacks show pre-existing DESTRUCTIVE drift in plan (metallb namespace, claude-memory,
rbac, redis) — NOT triggered by this commit but will be by CI's global-file cascade.
`[ci skip]` here so those don't auto-apply; they will be fixed manually before the
next CI push.
- Cleanup of `cloudflare_proxied_names` list once Helm-managed ingresses (authentik,
grafana, vault, forgejo) are annotated — separate PR.
## Test plan
### Automated
\`\`\`
\$ kubectl -n uptime-kuma logs \$(kubectl -n uptime-kuma get pods -l job-name -o name | tail -1)
Loaded 26 external monitor targets (source=k8s-api)
Sync complete: 7 created, 0 deleted, 17 unchanged
\$ curl -sk -o /dev/null -w "%{http_code}\n" -H "Accept: text/html" \\
https://dawarich.viktorbarzin.me/ https://nextcloud.viktorbarzin.me/ \\
https://budget-viktor.viktorbarzin.me/
200 302 200
\$ kubectl -n actualbudget get deploy,pvc -l app=budget-viktor
deployment.apps/budget-viktor 1/1 1 1 Ready
persistentvolumeclaim/budget-viktor-data-encrypted Bound 10Gi RWO proxmox-lvm-encrypted
\`\`\`
### Manual Verification
1. Confirm the annotation is present on an ingress_factory ingress:
\`\`\`
kubectl -n dawarich get ingress dawarich -o \\
jsonpath='{.metadata.annotations.uptime\.viktorbarzin\.me/external-monitor}'
# Expected: "true"
\`\`\`
2. Confirm the new `[External] <name>` monitor appears in Uptime Kuma within 10 min
(CronJob interval). For Immich specifically, it will appear after the immich stack
is re-applied.
3. Verify actualbudget plan is clean:
\`\`\`
cd stacks/actualbudget && scripts/tg plan --non-interactive
# Expected: no "Invalid count argument" errors
\`\`\`
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 10:34:32 +00:00
monitoring = " Infrastructure "
prometheus = " Infrastructure "
technitium = " Infrastructure "
traefik = " Infrastructure "
metallb - system = " Infrastructure "
kyverno = " Infrastructure "
authentik = " Identity & Security "
crowdsec = " Identity & Security "
woodpecker = " Development & CI "
forgejo = " Development & CI "
immich = " Media & Entertainment "
frigate = " Smart Home "
home - assistant = " Smart Home "
ollama = " AI & Data "
dbaas = " Infrastructure "
servarr = " Media & Entertainment "
navidrome = " Media & Entertainment "
nextcloud = " Productivity "
n8n = " Automation "
changedetection = " Automation "
finance = " Finance & Personal "
homepage = " Core Platform "
reverse - proxy = " Smart Home "
mailserver = " Infrastructure "
2026-03-25 11:00:38 +02:00
}
homepage_group = coalesce (
var . homepage_group ,
lookup ( local . ns_to_group , var . namespace , " Other " )
)
2026-04-16 13:45:04 +00:00
dns_name = local . effective_host == var . root_domain ? " @ " : replace ( local . effective_host , " . ${ var . root_domain } " , " " )
2026-03-25 11:00:38 +02:00
homepage_defaults = var . homepage_enabled ? {
" gethomepage.dev/enabled " = " true "
" gethomepage.dev/name " = replace ( replace ( var . name , " - " , " " ) , " _ " , " " )
" gethomepage.dev/group " = local . homepage_group
" gethomepage.dev/href " = " https:// ${ local . effective_host } "
" gethomepage.dev/icon " = " ${ replace ( var . name , " - " , " " ) } .png "
} : { }
2026-05-07 15:51:34 +00:00
# Parse "5g"/"50m"/"1024k"/"42" into bytes. Traefik's Buffering middleware
# takes maxRequestBodyBytes as an integer. Empty unit = bytes.
body_size_match = var . max_body_size == null ? null : regex ( " ^([0-9]+)([kmgKMG]?) $ " , var . max_body_size )
body_size_unit_multiplier = var . max_body_size == null ? 0 : (
lower ( local . body_size_match [ 1 ] ) = = " g " ? 1073741824 :
lower ( local . body_size_match [ 1 ] ) = = " m " ? 1048576 :
lower ( local . body_size_match [ 1 ] ) = = " k " ? 1024 :
1
)
max_body_size_bytes = var . max_body_size == null ? 0 : tonumber ( local . body_size_match [ 0 ] ) * local . body_size_unit_multiplier
2026-02-10 21:11:46 +00:00
}
2025-01-14 20:12:26 +00:00
resource " kubernetes_service " " proxied-service " {
count = var . external_name == null ? 0 : 1
metadata {
name = var . name
namespace = var . namespace
labels = {
" app " = var . name
}
}
spec {
type = var . external_name ! = null ? " ExternalName " : " ClusterIP "
external_name = var . name
port {
name = " ${ var . name } -web "
port = var . port
protocol = " TCP "
target_port = var . port
}
}
}
resource " kubernetes_ingress_v1 " " proxied-ingress " {
metadata {
name = var . name
namespace = var . namespace
annotations = merge ( {
2026-02-10 21:11:46 +00:00
" traefik.ingress.kubernetes.io/router.middlewares " = join ( " , " , compact ( concat ( [
2026-03-01 14:35:53 +00:00
" traefik-retry@kubernetescrd " ,
2026-03-19 23:14:27 +00:00
" traefik-error-pages@kubernetescrd " ,
2026-02-10 21:11:46 +00:00
var . skip_default_rate_limit ? null : " traefik-rate-limit@kubernetescrd " ,
2026-02-07 13:24:58 +00:00
var . custom_content_security_policy == null ? " traefik-csp-headers@kubernetescrd " : null ,
var . exclude_crowdsec ? null : " traefik-crowdsec@kubernetescrd " ,
2026-03-29 01:04:11 +02:00
local . effective_anti_ai ? " traefik-ai-bot-block@kubernetescrd " : null ,
local . effective_anti_ai ? " traefik-anti-ai-headers@kubernetescrd " : null ,
ingress_factory: replace `protected` bool with `auth` enum + audit pass across 100 stacks
Phase 3+4 of default-deny ingress plan. Replaces the `protected = bool` (default
false → unprotected) variable in `modules/kubernetes/ingress_factory` with
`auth = string` enum (default "required" → fail-closed). Touches every
ingress_factory caller so the audit decision is recorded explicitly in code.
ingress_factory (Phase 3):
- `auth = "required"`: standard Authentik forward-auth (the legacy
`protected = true` semantic).
- `auth = "public"`: forward-auth via the new `authentik-forward-auth-public`
middleware → dedicated public outpost → guest auto-bind. Logged-in users
keep their real identity.
- `auth = "none"`: no Authentik middleware. For Anubis-fronted content, native
client APIs (Git, /v2/, WebDAV), webhook receivers, the Authentik outpost
itself.
- `effective_anti_ai` default flips ON only when `auth = "none"` (auth-gated
ingresses don't need anti-AI noise; the auth flow already discourages bots).
Audit pass (Phase 4) across 96 ingress_factory call sites:
- 49 explicit `protected = true` → `auth = "required"`
- 8 explicit `protected = false` → `auth = "none"` (5) or `auth = "public"` (3)
- 64 previously-default (no protected line) → `auth = "required"` ADDED, then
reviewed individually:
* 9 Anubis-fronted (blog, www, kms, travel, f1, cyberchef, jsoncrack,
homepage, wrongmove UI, privatebin) → `auth = "none"`
* 22 native-client / programmatic surfaces (Forgejo Git+/v2/, webhook
handler, claude-memory MCP, Nextcloud WebDAV, Matrix, Vault CLI/OIDC,
xray VPN, ntfy, woodpecker webhooks, n8n triggers, ntfy push, dawarich
location ingestion, immich frame kiosk, headscale CP, send anonymous
drops, rybbit beacon, vaultwarden API, Authentik UI itself + outposts) →
`auth = "none"`
* Remaining ~33 → `auth = "required"` confirmed (admin tools, internal
UIs, services without app-level auth)
- Smoke-test promotions to `auth = "public"`: fire-planner public UI,
k8s-portal API, insta2spotify callback.
Three call sites in wrapper modules (`stacks/freedify/factory/`,
`stacks/reverse-proxy/modules/reverse_proxy/`) keep their internal `protected`
bool — they translate to `auth` internally, out of scope for this rename.
Behavior change: previously-default ingresses now fail closed (require
Authentik login) unless explicitly flipped to `auth = "none"` or
`auth = "public"`. This is the audit goal — no more accidentally-unprotected
surfaces. Sites that were intentionally public (Anubis content, native APIs,
webhooks) are now explicitly recorded as `auth = "none"`.
Drive-by: `modules/create-vm/main.tf` picked up cosmetic alignment via
`terraform fmt -recursive` during the audit. Behavior-neutral.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-10 18:53:49 +00:00
local . auth_middleware ,
2026-02-07 13:24:58 +00:00
var . allow_local_access_only ? " traefik-local-only@kubernetescrd " : null ,
var . custom_content_security_policy ! = null ? " ${ var . namespace } -custom-csp- ${ var . name } @kubernetescrd " : null ,
2026-05-07 15:51:34 +00:00
var . max_body_size ! = null ? " ${ var . namespace } -buffering- ${ var . name } @kubernetescrd " : null ,
2026-02-10 21:11:46 +00:00
] , var . extra_middlewares ) ) )
2026-02-07 13:24:58 +00:00
" traefik.ingress.kubernetes.io/router.entrypoints " = " websecure "
[infra] Per-ingress external-monitor annotation + actualbudget plan-time fix [ci skip]
## Context
Two operational gaps surfaced during a healthcheck sweep today:
1. **External monitoring coverage**: Only ~13 hostnames (via `cloudflare_proxied_names`
in `config.tfvars`) had `[External]` monitors in Uptime Kuma. Any service deployed via
`ingress_factory` with `dns_type = "proxied"` auto-created its DNS record but was NOT
registered for external probing — so outages like Immich going down externally were
invisible until a user complained. 99 of ~125 public ingresses had no external
monitor.
2. **actualbudget stack unplannable**: `count = var.budget_encryption_password != null
? 1 : 0` in `factory/main.tf:152` failed with "Invalid count argument" because the
value flows from a `data.kubernetes_secret` whose contents are `(known after apply)`
at plan time. Blocked CI applies and drift reconciliation.
## This change
### Per-ingress external-monitor annotation (ingress_factory + reverse_proxy/factory)
- New variables `external_monitor` (bool, nullable) + `external_monitor_name` (string,
nullable). Default is "follow dns_type" — enabled for any public DNS record
(`dns_type != "none"`, covers both proxied and non-proxied so Immich and other
direct-A records are also monitored).
- Emits two annotations on the Ingress:
- `uptime.viktorbarzin.me/external-monitor = "true"`
- `uptime.viktorbarzin.me/external-monitor-name = "<label>"` (optional override)
### external-monitor-sync CronJob (uptime-kuma stack)
- Discovers targets from live Ingress objects via the K8s API first (filter by
annotation), falls back to the legacy `external-monitor-targets` ConfigMap on any
API error (zero rollout risk).
- New `ServiceAccount` + cluster-wide `ClusterRole`/`ClusterRoleBinding` giving
`list`/`get` on `networking.k8s.io/ingresses`.
- `API_SERVER` now uses the `KUBERNETES_SERVICE_HOST` env var (always injected by K8s)
instead of `kubernetes.default.svc` — the search-domain expansion failed in the
CronJob pod's DNS config. Verified working: CronJob now logs
`Loaded N external monitor targets (source=k8s-api)`.
### actualbudget count-on-unknown refactor
- Replaced `count = var.budget_encryption_password != null ? 1 : 0` with two explicit
plan-time booleans: `enable_http_api` and `enable_bank_sync`. Values are known at
plan; no `-target` workaround needed.
- Callers (`stacks/actualbudget/main.tf`) pass `true` explicitly. Runtime behaviour is
unchanged — the secret is still consumed via env var.
- Also aligned the factory with live state (the 3 budget-* PVCs had been migrated
`proxmox-lvm` → `proxmox-lvm-encrypted` outside Terraform): PVC resource renamed
`data_proxmox` → `data_encrypted`, storage class updated, orphaned `nfs_data` module
removed. State was rm'd + re-imported with matching UIDs, so no data was moved.
## Rollout status (already partially applied in this session)
- `stacks/uptime-kuma` applied — SA + RBAC + CronJob changes live; FQDN fix verified
- `stacks/actualbudget` applied — budget-{viktor,anca,emo} all 200 OK externally
- `stacks/mailserver` + 21 other ingress_factory consumers applied — annotations live
- CronJob `external-monitor-sync` latest run: `source=k8s-api`, 26 monitors active
(was 13 on the central list)
## Deferred (separate work)
- 4 stacks show pre-existing DESTRUCTIVE drift in plan (metallb namespace, claude-memory,
rbac, redis) — NOT triggered by this commit but will be by CI's global-file cascade.
`[ci skip]` here so those don't auto-apply; they will be fixed manually before the
next CI push.
- Cleanup of `cloudflare_proxied_names` list once Helm-managed ingresses (authentik,
grafana, vault, forgejo) are annotated — separate PR.
## Test plan
### Automated
\`\`\`
\$ kubectl -n uptime-kuma logs \$(kubectl -n uptime-kuma get pods -l job-name -o name | tail -1)
Loaded 26 external monitor targets (source=k8s-api)
Sync complete: 7 created, 0 deleted, 17 unchanged
\$ curl -sk -o /dev/null -w "%{http_code}\n" -H "Accept: text/html" \\
https://dawarich.viktorbarzin.me/ https://nextcloud.viktorbarzin.me/ \\
https://budget-viktor.viktorbarzin.me/
200 302 200
\$ kubectl -n actualbudget get deploy,pvc -l app=budget-viktor
deployment.apps/budget-viktor 1/1 1 1 Ready
persistentvolumeclaim/budget-viktor-data-encrypted Bound 10Gi RWO proxmox-lvm-encrypted
\`\`\`
### Manual Verification
1. Confirm the annotation is present on an ingress_factory ingress:
\`\`\`
kubectl -n dawarich get ingress dawarich -o \\
jsonpath='{.metadata.annotations.uptime\.viktorbarzin\.me/external-monitor}'
# Expected: "true"
\`\`\`
2. Confirm the new `[External] <name>` monitor appears in Uptime Kuma within 10 min
(CronJob interval). For Immich specifically, it will appear after the immich stack
is re-applied.
3. Verify actualbudget plan is clean:
\`\`\`
cd stacks/actualbudget && scripts/tg plan --non-interactive
# Expected: no "Invalid count argument" errors
\`\`\`
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 10:34:32 +00:00
} , local . homepage_defaults , var . extra_annotations ,
var . dns_type ! = " none " ? { " cloudflare.viktorbarzin.me/dns-type " = var . dns_type } : { } ,
local . external_monitor_annotations ,
2026-04-16 13:45:04 +00:00
)
2025-01-14 20:12:26 +00:00
}
2025-12-18 08:53:16 +00:00
2026-02-07 13:24:58 +00:00
spec {
ingress_class_name = " traefik "
tls {
2026-02-10 21:11:46 +00:00
hosts = [ local . effective_host ]
2026-02-07 13:24:58 +00:00
secret_name = var . tls_secret_name
}
rule {
2026-02-10 21:11:46 +00:00
host = local . effective_host
2026-02-07 13:24:58 +00:00
http {
dynamic " path " {
for_each = var . ingress_path
content {
path = path . value
backend {
service {
name = var . service_name ! = null ? var . service_name : var . name
port {
number = var . port
}
2025-01-14 20:12:26 +00:00
}
}
}
}
}
}
}
}
2026-02-07 13:24:58 +00:00
# Custom CSP headers middleware - created per service when custom_content_security_policy is set
resource " kubernetes_manifest " " custom_csp " {
count = var . custom_content_security_policy ! = null ? 1 : 0
manifest = {
apiVersion = " traefik.io/v1alpha1 "
kind = " Middleware "
metadata = {
name = " custom-csp- ${ var . name } "
namespace = var . namespace
}
spec = {
headers = {
contentSecurityPolicy = var . custom_content_security_policy
}
}
}
}
2026-04-16 13:45:04 +00:00
2026-05-07 15:51:34 +00:00
# Buffering middleware - created per service when max_body_size is set.
# Traefik default is unlimited; setting maxRequestBodyBytes enforces a limit
# (e.g. Forgejo container pushes can ship multi-GB layer blobs).
resource " kubernetes_manifest " " buffering " {
count = var . max_body_size ! = null ? 1 : 0
manifest = {
apiVersion = " traefik.io/v1alpha1 "
kind = " Middleware "
metadata = {
name = " buffering- ${ var . name } "
namespace = var . namespace
}
spec = {
buffering = {
maxRequestBodyBytes = local . max_body_size_bytes
}
}
}
}
2026-04-16 13:45:04 +00:00
# Cloudflare DNS records — created automatically when dns_type is set.
# Proxied: CNAME to Cloudflare tunnel. Non-proxied: A + AAAA to public IP.
resource " cloudflare_record " " proxied " {
count = var . dns_type == " proxied " ? 1 : 0
name = local . dns_name
content = " ${ var . cloudflare_tunnel_id } .cfargotunnel.com "
proxied = true
ttl = 1
type = " CNAME "
zone_id = var . cloudflare_zone_id
allow_overwrite = true
}
resource " cloudflare_record " " non_proxied_a " {
count = var . dns_type == " non-proxied " ? 1 : 0
name = local . dns_name
content = var . public_ip
proxied = false
ttl = 1
type = " A "
zone_id = var . cloudflare_zone_id
allow_overwrite = true
}
resource " cloudflare_record " " non_proxied_aaaa " {
count = var . dns_type == " non-proxied " ? 1 : 0
name = local . dns_name
content = var . public_ipv6
proxied = false
ttl = 1
type = " AAAA "
zone_id = var . cloudflare_zone_id
allow_overwrite = true
}