infra/CONTEXT.md
Viktor Barzin 7e5e0e7080 docs: add CONTEXT.md domain glossary [ci skip]
Adds the per-repo domain glossary that engineering skills
(diagnose, tdd, improve-codebase-architecture, grill-with-docs)
read before working in this repo. Terms only — no implementation
detail. Six clusters (code organization, cluster, networking,
storage, secrets, CI/CD), 22 terms, plus relationships, an example
dialogue, and five flagged ambiguities.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 11:48:19 +00:00

# Infra
Terragrunt-managed homelab declaring a 5-node Kubernetes cluster on a single Proxmox host. Vault is the secrets source of truth; everything else flows from this repo via `scripts/tg apply`.
## Language
### Code organization
**Service**:
The deployed app as a domain concept — one logical thing that runs in the cluster (e.g. immich, technitium, freshrss). Defined by exactly one **Stack**.
_Avoid_: bare "app" without the Service definition; "deployment" (collides with K8s `Deployment`).
**Stack**:
The HCL directory under `stacks/<name>/` that defines a Service, applied independently with `scripts/tg apply`. A Stack is the unit of Terraform organisation; a Service is the running thing. They are 1:1 but not synonyms.
_Avoid_: using "Stack" when you mean the running Service.
**Module**:
A reusable HCL primitive under `modules/`, consumed by Stacks via `source =`.
_Avoid_: "library", "package".
**Factory module**:
A Module that hides convention (defaults, drift handling, secret wiring) behind a small input surface. Canonical examples: `ingress_factory`, `nfs_volume`, `k8s_app`, `helm_app`, `postgres_app`.
_Avoid_: "wrapper".
**State tier**:
Terraform state-backend partition. **Tier 0** = bootstrap Stacks (`infra`, `platform`, `cnpg`, `vault`, `dbaas`, `external-secrets`) on local SOPS-encrypted state. **Tier 1** = every other Stack, on PostgreSQL-backed (PG) state.
_Avoid_: "phase", "bootstrap stack" — say Tier 0 explicitly.
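
The Tier 0 / Tier 1 split above can be sketched as a simple membership test. The list of Tier 0 Stacks is taken from the definition; the local names and the example Stack are hypothetical, and this is only an illustration of the rule, not how the repo selects its backend.

```hcl
locals {
  # Tier 0 Stacks as listed in the State tier definition.
  tier0_stacks = ["infra", "platform", "cnpg", "vault", "dbaas", "external-secrets"]

  # Hypothetical Stack under inspection.
  stack = "immich"

  # Anything not in the Tier 0 list is Tier 1.
  state_tier = contains(local.tier0_stacks, local.stack) ? 0 : 1
}

output "state_tier" {
  value = local.state_tier
}
```

With `stack = "immich"` the output is `1`; setting it to `vault` yields `0`.
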
### Cluster
**Node**:
A K8s cluster VM: the control-plane VM `k8s-master` or one of the worker VMs `k8s-node1..4`. Default reading of the bare word "node" in this repo.
_Avoid_: "k8s node" (redundant), "host" (ambiguous).
**PVE node** / **PVE host**:
The single physical Dell R730 running Proxmox; sole hypervisor and sole NFS server. There is exactly one.
_Avoid_: "server", "hypervisor", "Proxmox" alone when you mean the host.
**Namespace tier**:
A namespace-prefix partition (`0-core-*`, `1-cluster-*`, `2-gpu-*`, `3-edge-*`, `4-aux-*`) driving PriorityClass, default resources, and ResourceQuota — generated by **Kyverno policy** from the namespace name. Orthogonal to **State tier**.
_Avoid_: "Service tier" (the partition is on the namespace, not the Service); collapsing Namespace tier with State tier — they are different axes.
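
As a rough illustration of the prefix convention, the Namespace tier can be read off a namespace name with a regular expression. In the cluster this derivation is done by a Kyverno policy, not by Terraform; the namespace name and local names below are hypothetical, and the pattern assumes the five prefixes listed above are exhaustive.

```hcl
locals {
  # Hypothetical namespace following the prefix convention.
  namespace = "3-edge-blog"

  # Two unnamed capture groups: the tier index and the tier label.
  tier_parts = regex("^([0-4])-(core|cluster|gpu|edge|aux)-", local.namespace)
}

output "namespace_tier" {
  # Rejoin the captures into the tier prefix, e.g. "3-edge".
  value = "${local.tier_parts[0]}-${local.tier_parts[1]}"
}
```

A name that does not match the pattern makes `regex` fail at evaluation time, which is a reasonable stand-in for "this namespace has no tier".
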
**Kyverno policy**:
The convention engine of the cluster: a ClusterPolicy or Policy resource that mutates, generates, or validates on admission. Owns Namespace tier limits/quotas, `dns_config` injection on every pod-owning workload, Forgejo pull-credential sync across namespaces, and TLS-secret replication. When the repo says "this happens automatically", a Kyverno policy is usually the actor.
_Avoid_: bare "policy" (overloaded with Vault, RBAC, NetworkPolicy).
**Critical-path Service**:
One of {Traefik, Authentik, CrowdSec LAPI, PgBouncer, Cloudflared} — replicas ≥3, PDB enforced, monitored independently.
_Avoid_: "core service" (collides with the `0-core-*` Namespace tier name).
**Namespace-owner**:
A non-admin identity declared in the `k8s_users` field (a JSON map) at Vault path `secret/platform`. Owns one or more namespaces and one or more public subdomains.
_Avoid_: bare "user", "tenant".
### Networking
**Public domain**:
`viktorbarzin.me`, served through Cloudflare. DNS records are either **proxied** (Cloudflare CDN/WAF in front) or **non-proxied** (direct A/AAAA reachable via Cloudflared Tunnel).
_Avoid_: "external", "outside".
**Internal domain**:
`viktorbarzin.lan`, served by Technitium DNS. Resolves only inside the homelab network.
_Avoid_: bare "lan", "private", "intranet".
**Ingress auth tier**:
The `auth = "..."` parameter on `ingress_factory`, one of `required` (Authentik forward-auth gates every request), `app` (the backend owns its login), `public` (anonymous Authentik binding for audit only), or `none` (Anubis-fronted content, or native-client API).
_Avoid_: "auth mode" — the canonical key is `auth`.
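
A hedged sketch of how a Stack might pick an Ingress auth tier. Only the Module name `ingress_factory` and the `auth` key with its four values come from this glossary; the relative `source` path and the `name`, `namespace`, and `host` inputs are hypothetical and may not match the Module's real input surface.

```hcl
# stacks/myservice/main.tf (hypothetical Stack and file name)
module "ingress" {
  # Assumed relative path to the Factory module under modules/.
  source = "../../modules/ingress_factory"

  # Hypothetical inputs, for illustration only.
  name      = "myservice"
  namespace = "4-aux-myservice"
  host      = "myservice.viktorbarzin.me"

  # Gate: the backend's own JWT login, so Authentik is deliberately not layered.
  auth = "app"
}
```

This is a fragment of a Stack rather than a standalone configuration; it only resolves inside a repo that actually provides the Module at that path.
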
**Authentik outpost**:
A standalone Authentik deployment that terminates the proxy/auth flow for a specific binding model. The repo runs two distinct ones: the default outpost (used by `auth = "required"`) and the `public` outpost (anonymous binding, used by `auth = "public"`).
_Avoid_: conflating outpost with Authentik core; "Authentik instance".
**Cloudflared Tunnel**:
The channel by which non-proxied **public domain** traffic reaches the cluster, terminating at Traefik. Backs every `dns_type = "non-proxied"` record and is the fallback path for the wildcard `*.viktorbarzin.me`.
_Avoid_: "the tunnel" without "Cloudflared" (could mean Headscale).
**Ingress chain**:
The opinionated stack of Traefik middlewares that `ingress_factory` layers onto every Ingress. Slots, in order: forward-auth (per **Ingress auth tier**) → anti-AI scraping (default-on when no Authentik is in the path) → CrowdSec bouncer (fail-open) → retry (2× / 100ms) → rate-limit (429, not 503). Adding or removing a middleware is a Stack-level choice, but the chain order is convention.
_Avoid_: "middleware list", "Traefik chain". The Anubis PoW gate is upstream of this chain, not inside it.
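
The slot order can be sketched as a list built from the Ingress auth tier. This is an illustration under assumptions, not the `ingress_factory` implementation: the middleware names are hypothetical, and it assumes Authentik is in the path only for `required` and `public`.

```hcl
locals {
  # Ingress auth tier for a hypothetical Ingress: required, app, public, or none.
  auth = "app"

  # Assumption: only the two outpost-backed tiers put Authentik in the path.
  authentik_in_path = contains(["required", "public"], local.auth)

  # Slots in convention order. Middleware names are hypothetical.
  chain = concat(
    local.authentik_in_path ? ["forward-auth"] : [],
    local.authentik_in_path ? [] : ["anti-ai-scraping"],
    ["crowdsec-bouncer", "retry", "rate-limit"]
  )
}

output "ingress_chain" {
  value = local.chain
}
```

With `auth = "app"` the chain starts at the anti-AI slot; with `auth = "required"` it starts at forward-auth and the anti-AI slot is omitted by default.
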
### Storage
**proxmox-lvm-encrypted**:
Default StorageClass for any workload holding sensitive data (databases, auth, password managers, email, financial data). LUKS2 over a Proxmox LVM-thin LV.
_Avoid_: bare "encrypted PVC" — name the StorageClass.
**proxmox-lvm**:
Block StorageClass for non-sensitive workloads (caches, monitoring data, indexes, app state without secrets).
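
For illustration, a PVC that names the StorageClass explicitly might look like the sketch below, using the Terraform Kubernetes provider. The claim name, namespace, and size are hypothetical; swap the StorageClass for `proxmox-lvm` when the data is non-sensitive.

```hcl
resource "kubernetes_persistent_volume_claim" "data" {
  metadata {
    name      = "myservice-data"   # hypothetical
    namespace = "4-aux-myservice"  # hypothetical
  }

  spec {
    # Block volumes attach to one Node at a time.
    access_modes       = ["ReadWriteOnce"]
    storage_class_name = "proxmox-lvm-encrypted"

    resources {
      requests = {
        storage = "10Gi" # hypothetical size
      }
    }
  }
}
```
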
**NFS volume**:
RWX file storage for shared media libraries, large datasets, or anything that needs to be inspected from outside K8s. Provisioned via the `nfs_volume` Module.
_Avoid_: "shared storage" (ambiguous).
**nfs-truenas StorageClass**:
A historical SC name retained only because StorageClass strings are immutable on bound PVs. The underlying server is the **PVE host**, not TrueNAS; TrueNAS is decommissioned.
_Avoid_: assuming this means TrueNAS.
**3-2-1 backup**:
The named backup posture, defined by where each copy of the data lives: **Copy 1** = live on the PVE thin pool (sdc), **Copy 2** = sda backup disk (`/mnt/backup`), **Copy 3** = offsite Synology NAS. Copies are made by per-PVC file-level rsync from LVM thin snapshots; databases additionally dump to NFS for per-DB restore.
_Avoid_: bare "backup" without saying which copy you mean (a service is "backed up" only once it's on Copy 2; Copy 3 is the disaster floor).
### Secrets
**Vault path**:
Convention: `secret/<service>` for Service-owned secrets, `secret/viktor` for personal/global, `secret/platform` for cluster-wide maps (`k8s_users`, `homepage_credentials`).
_Avoid_: conflating Vault path (e.g. `secret/viktor`) with Vault field (e.g. `forgejo_pull_token`).
**ExternalSecret** / **ESO**:
A K8s manifest that materialises a Vault value as a K8s Secret. Two ClusterSecretStores exist: `vault-kv` (KV engine) and `vault-database` (rotating DB creds).
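
A minimal sketch of an ExternalSecret declared from a Stack through the Kubernetes provider's `kubernetes_manifest` resource. The store name `vault-kv` comes from this glossary; the Secret name, namespace, key names, and refresh interval are hypothetical. The `apiVersion` depends on the installed ESO release, and how `remoteRef.key` maps to `secret/<service>` depends on how the ClusterSecretStore is configured, so treat both as assumptions.

```hcl
resource "kubernetes_manifest" "app_external_secret" {
  manifest = {
    apiVersion = "external-secrets.io/v1beta1" # assumption: may be v1 on newer ESO
    kind       = "ExternalSecret"
    metadata = {
      name      = "myservice-secrets" # hypothetical
      namespace = "4-aux-myservice"   # hypothetical
    }
    spec = {
      refreshInterval = "1h" # hypothetical
      secretStoreRef = {
        kind = "ClusterSecretStore"
        name = "vault-kv"
      }
      target = {
        # Name of the K8s Secret that ESO creates.
        name = "myservice-secrets"
      }
      data = [
        {
          secretKey = "jwt_signing_key" # key in the K8s Secret (hypothetical)
          remoteRef = {
            key      = "myservice"       # assumed to resolve to secret/myservice
            property = "jwt_signing_key" # Vault field (hypothetical)
          }
        }
      ]
    }
  }
}
```
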
**Plan-time secret**:
A secret value read in Terraform via `data "kubernetes_secret"` (i.e. via the ESO-created K8s Secret) at plan time, with no Vault provider call. Distinct from a **vault data source** read (`data "vault_kv_secret_v2"`), which still goes through the Vault provider. A few Stacks remain hybrid (plan-time for env vars, vault data source for module inputs).
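
The two read paths can be contrasted side by side. Both data sources are named in the definition above; the Secret name, namespace, field name, and the `secret` mount name are assumptions for illustration.

```hcl
# Plan-time secret: reads the K8s Secret that ESO created. No Vault provider call.
data "kubernetes_secret" "app" {
  metadata {
    name      = "myservice-secrets" # hypothetical
    namespace = "4-aux-myservice"   # hypothetical
  }
}

# Vault data source read: goes through the Vault provider at plan time.
data "vault_kv_secret_v2" "app" {
  mount = "secret"    # assumed KV v2 mount behind secret/<service>
  name  = "myservice" # hypothetical Service name
}

locals {
  # Same logical value, reached by two different paths.
  jwt_key_plan_time = data.kubernetes_secret.app.data["jwt_signing_key"]
  jwt_key_from_vault = data.vault_kv_secret_v2.app.data["jwt_signing_key"]
}
```

A hybrid Stack would contain both kinds of block; a fully migrated one would keep only the `kubernetes_secret` read.
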
**Sealed Secret**:
A user-managed secret committed to a Stack directory as `sealed-*.yaml`. Distinct from ExternalSecret — Sealed Secrets carry their own bytes, ExternalSecrets reference Vault.
### CI/CD
**GHA build + Woodpecker deploy**:
The split where Docker images are built and pushed by GitHub Actions and Woodpecker only runs `kubectl set image` on a deploy-only pipeline. Repos that can't fit GHA limits stay on Woodpecker for build too.
_Avoid_: bare "Woodpecker pipeline" — say "build" or "deploy".
**Anubis**:
A PoW reverse-proxy issuing a 30-day JWT cookie, used in front of public content-bearing sites without app-level auth (blog, wiki, landing pages). Never in front of Git, WebDAV, CalDAV, or API endpoints (clients can't solve PoW).
## Relationships
- A **Service** is defined by exactly one **Stack**, which consumes zero or more **Modules** and resolves to one or more K8s workloads.
- A **Namespace-owner** owns one or more namespaces and one or more public subdomains.
- A **Service** owns its **Vault path** at `secret/<service>`, surfaces values through **ExternalSecrets**, and reads them in Terraform as **Plan-time secrets**.
- An Ingress picks exactly one **Ingress auth tier**; the choice determines how unauthenticated requests are gated before they reach the backend.
- A **proxmox-lvm-encrypted** PVC binds to one Node at a time (RWO) and requires a Service-level backup CronJob; an **NFS volume** is RWX and is backed up at the host level via rsync.
- **State tier** and **Namespace tier** are orthogonal — a Tier 0 Stack can deploy a Service into any Namespace tier and vice versa.
## Example dialogue
> **Dev:** "I'm adding a new **Service** — FastAPI backend with its own JWT login. Do I need Authentik?"
> **Domain expert:** "If the FastAPI login is the gate, set `auth = "app"` on the ingress. That records that you _chose_ not to layer Authentik. Leave a one-line comment above the `auth` line stating what gates the Service, or `scripts/tg` will refuse the apply."
> **Dev:** "And storage?"
> **Domain expert:** "Does it hold user data? If yes, `proxmox-lvm-encrypted` — that's the default for anything sensitive. Add a backup CronJob writing to `/mnt/main/<service>-backup/`. If the data is just caches, plain `proxmox-lvm` is fine."
> **Dev:** "What about a Secret with the JWT signing key?"
> **Domain expert:** "Put the key in `secret/<service>` in Vault, then declare an **ExternalSecret** to materialise it as a K8s Secret. Read it at plan time with `data "kubernetes_secret"` — that keeps Vault out of the plan path."
## Flagged ambiguities
- **"tier"** is overloaded — *Namespace tier* (`0-core`..`4-aux`, scheduling priority) is distinct from *State tier* (Tier 0 / Tier 1, Terraform backend partition). Always qualify which axis.
- **"node"** can mean a K8s Node (default) or a PVE node. For Proxmox-level statements, say **PVE node** explicitly.
- **"service"** spans two distinct concepts: the deployed app (capitalised **Service**, this repo's domain noun) and the K8s `Service` object (in backticks or qualified "K8s Service"). Lowercase "service" in prose is fine when context disambiguates; flag it when it doesn't.
- **"secret"** spans Vault entries, K8s Secret objects, **ExternalSecrets**, and **Sealed Secrets**. Always specify which.
- **"proxied"** / **"non-proxied"** refer to Cloudflare's CDN posture for a DNS record, _not_ Anubis or forward-auth layering.