diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md index bb1ce653..8d281743 100755 --- a/.claude/CLAUDE.md +++ b/.claude/CLAUDE.md @@ -29,6 +29,7 @@ Violations cause state drift, which causes future applies to break or silently r - **New services need CI/CD** and **monitoring** (Prometheus/Uptime Kuma) - **New service**: Use `setup-project` skill for full workflow - **Ingress**: `ingress_factory` module. Auth: `protected = true`. Anti-AI: on by default. **DNS**: `dns_type = "proxied"` (Cloudflare CDN) or `"non-proxied"` (direct A/AAAA). DNS records are auto-created — no need to edit `config.tfvars`. +- **Anubis PoW challenge** (`modules/kubernetes/anubis_instance/`): per-site reverse proxy that issues a 30-day JWT cookie after a tiny PoW solve. Use for **public, content-bearing sites without app-level auth** (blog, docs, wikis, static landing pages). Pattern: declare `module "anubis" { source = "../../modules/kubernetes/anubis_instance"; name = "X"; namespace = ...; target_url = "http://..svc.cluster.local" }`, then in `ingress_factory` set `service_name = module.anubis.service_name`, `port = module.anubis.service_port`, `anti_ai_scraping = false`. Shared ed25519 key in Vault `secret/viktor` -> `anubis_ed25519_key`; cookie scoped to `viktorbarzin.me` so one solve covers all Anubis-fronted subdomains. **DO NOT put Anubis in front of Git/API/WebDAV/CLI endpoints** — clients without JS can't solve PoW. **Replicas default to 1** because Anubis stores in-flight challenges in process memory; a challenge issued by pod A and solved against pod B errors with `store: key not found` (HTTP 500). Bumping replicas requires wiring a shared Redis store (TODO). For path-level carve-outs (e.g. wrongmove has `/` behind Anubis but `/api` direct), declare a second `ingress_factory` with `ingress_path = ["/api"]` pointing at the bare backend service. Active on: blog, www, kms, travel, f1, cc, json, pb (privatebin), home (homepage), wrongmove (UI only). 
See `.claude/reference/patterns.md` "Anti-AI Scraping" for full layering. - **Docker images**: Always build for `linux/amd64`. Use 8-char git SHA tags — `:latest` causes stale pull-through cache. - **Private registry**: `forgejo.viktorbarzin.me/viktor/` (Forgejo packages, OAuth-style PAT auth). Use `image: forgejo.viktorbarzin.me/viktor/:` + `imagePullSecrets: [{name: registry-credentials}]`. Kyverno auto-syncs the Secret to all namespaces. Containerd `hosts.toml` on every node redirects to in-cluster Traefik LB `10.0.20.200` to avoid hairpin NAT. Push-side: viktor PAT in Vault `secret/ci/global/forgejo_push_token` (Forgejo container packages are scoped per-user; only the package owner can push, ci-pusher cannot write to viktor/*). Pull-side: cluster-puller PAT in Vault `secret/viktor/forgejo_pull_token`. Retention CronJob (`forgejo-cleanup` in `forgejo` ns, daily 04:00) keeps newest 10 versions + always `:latest`; integrity probed every 15min by `forgejo-integrity-probe` in `monitoring` ns (catalog walk + manifest HEAD on every blob). See `docs/plans/2026-05-07-forgejo-registry-consolidation-{design,plan}.md` for the migration history. Pull-through caches for upstream registries (DockerHub, GHCR, Quay, k8s.gcr, Kyverno) stay on the registry VM at `10.0.20.10` ports 5000/5010/5020/5030/5040 — the old port-5050 R/W private registry was decommissioned 2026-05-07. - **LinuxServer.io containers**: `DOCKER_MODS` runs apt-get on every start — bake slow mods into a custom image (`RUN /docker-mods || true` then `ENV DOCKER_MODS=`). Set `NO_CHOWN=true` to skip recursive chown that hangs on NFS mounts. @@ -188,11 +189,20 @@ resource "kubernetes_persistent_volume_claim" "data_proxmox" { requests = { storage = "1Gi" } } } + lifecycle { + # pvc-autoresizer expands this PVC up to storage_limit; ignore drift on + # requests.storage so the next TF apply doesn't try to shrink it back + # (K8s rejects shrinks → apply fails). 
To bump the floor manually: + # temporarily remove this block, apply the new size, re-add the block, + # apply again. + ignore_changes = [spec[0].resources[0].requests] + } } ``` - `wait_until_bound = false` is **required** (WaitForFirstConsumer binding) - Deployment strategy **must be Recreate** (RWO volumes) - Autoresizer annotations are **required** on all proxmox-lvm PVCs +- `lifecycle.ignore_changes` on `requests` is **required** to coexist with the autoresizer - Every proxmox-lvm app **MUST** add a backup CronJob writing to NFS `/mnt/main/-backup/` **proxmox-lvm-encrypted PVC template** (Terraform) — use for all sensitive data: @@ -215,9 +225,13 @@ resource "kubernetes_persistent_volume_claim" "data_encrypted" { requests = { storage = "1Gi" } } } + lifecycle { + # See data_proxmox above — required for autoresizer coexistence. + ignore_changes = [spec[0].resources[0].requests] + } } ``` -- Same rules as `proxmox-lvm` (wait_until_bound, Recreate strategy, autoresizer, backup CronJob) +- Same rules as `proxmox-lvm` (wait_until_bound, Recreate strategy, autoresizer, backup CronJob, `lifecycle.ignore_changes`) - Uses LUKS2 encryption with Argon2id key derivation via Proxmox CSI plugin - Encryption passphrase stored in Vault KV (`secret/viktor/proxmox_csi_encryption_passphrase`), synced to K8s Secret `proxmox-csi-encryption` in `kube-system` via ExternalSecret - Backup key at `/root/.luks-backup-key` on PVE host (chmod 600) diff --git a/.claude/reference/patterns.md b/.claude/reference/patterns.md index 4a563e3c..56ec6750 100644 --- a/.claude/reference/patterns.md +++ b/.claude/reference/patterns.md @@ -26,12 +26,16 @@ module "nfs_data" { ## ~~iSCSI Storage~~ (REMOVED — replaced by proxmox-lvm) > iSCSI via democratic-csi and TrueNAS has been fully removed (2026-04). All database storage now uses `StorageClass: proxmox-lvm` (Proxmox CSI, LVM-thin hotplug). TrueNAS has been decommissioned. 
-## Anti-AI Scraping (3 Active Layers) (Updated 2026-04-17) +## Anti-AI Scraping (4 Active Layers) (Updated 2026-05-10) Default `anti_ai_scraping = true` in ingress_factory. Disable per-service: `anti_ai_scraping = false`. -1. Bot blocking (ForwardAuth → poison-fountain) 2. X-Robots-Tag noai 3. Tarpit/poison content (standalone at poison.viktorbarzin.me) -Trap links (formerly layer 3) removed April 2026 — rewrite-body plugin broken on Traefik v3.6.12 (Yaegi bugs). `strip-accept-encoding` and `anti-ai-trap-links` middlewares deleted. +1. **Anubis PoW challenge** (per-site reverse proxy) — `modules/kubernetes/anubis_instance/`. Latest: `ghcr.io/techarohq/anubis:v1.25.0`. Difficulty 2 (~250 ms desktop / ~700 ms mobile), 30-day JWT cookie scoped to `viktorbarzin.me` so a single solve covers every Anubis-fronted subdomain. Active on: `viktorbarzin.me`, `kms.viktorbarzin.me`, `travel.viktorbarzin.me`. Add to a stack: `module "anubis" { source = "../../modules/kubernetes/anubis_instance"; name = "X"; namespace = ...; target_url = "http://..svc.cluster.local" }`, then point ingress_factory at `module.anubis.service_name` + `port = module.anubis.service_port` and set `anti_ai_scraping = false`. Shared ed25519 signing key in Vault `secret/viktor` -> `anubis_ed25519_key`. **Avoid putting Anubis in front of CLI/API/Git endpoints (Forgejo, APIs, WebDAV)** — clients without JS can't solve PoW. +2. **Bot blocking forwardAuth** (ForwardAuth → bot-block-proxy → poison-fountain) — global default for non-Anubis sites. `bot-block-proxy` (OpenResty in `traefik` ns) is fail-open with 100 ms connect / 200 ms read timeouts so a downed poison-fountain costs ≤200 ms per request. Source: `stacks/traefik/modules/traefik/main.tf`. +3. **X-Robots-Tag noai** — set by `traefik-anti-ai-headers` middleware. Anubis additionally serves a comprehensive `/robots.txt` (`SERVE_ROBOTS_TXT=true`) to well-behaved bots. +4. 
**Tarpit/poison content** (standalone at poison.viktorbarzin.me, `stacks/poison-fountain/`). Currently scaled to `replicas = 0` — fail-open path means no live traffic, no penalty. + +Trap links (formerly a layer) removed April 2026 — rewrite-body plugin broken on Traefik v3.6.12 (Yaegi bugs). `strip-accept-encoding` and `anti-ai-trap-links` middlewares deleted. Rybbit analytics injection now via Cloudflare Worker (`stacks/rybbit/worker/`, HTMLRewriter, wildcard route `*.viktorbarzin.me/*`, 28 site ID mappings). -Key files: `stacks/poison-fountain/`, `stacks/rybbit/worker/`, `stacks/platform/modules/traefik/middleware.tf` +Key files: `modules/kubernetes/anubis_instance/`, `stacks/poison-fountain/`, `stacks/rybbit/worker/`, `stacks/traefik/modules/traefik/main.tf` ## Terragrunt Architecture - Root `terragrunt.hcl`: DRY providers, backend, variable loading, `generate "tiers"` block diff --git a/.woodpecker/build-cli.yml b/.woodpecker/build-cli.yml index 4da90a43..cf95da7e 100644 --- a/.woodpecker/build-cli.yml +++ b/.woodpecker/build-cli.yml @@ -15,22 +15,23 @@ steps: username: "viktorbarzin" password: from_secret: dockerhub-pat + # Phase 4 of forgejo-registry-consolidation 2026-05-07 — + # registry.viktorbarzin.me:5050 decommissioned. Push to DockerHub + # (the public-facing infra image) AND Forgejo (the cluster pull + # source). Same image, two locations. repo: - viktorbarzin/infra - - registry.viktorbarzin.me:5050/infra + - forgejo.viktorbarzin.me/viktor/infra logins: - registry: https://index.docker.io/v1/ username: viktorbarzin password: from_secret: dockerhub-pat - # Private registry on :5050 requires htpasswd auth since 2026-03-22. - # Without this, buildx pushes the second repo but blob HEAD comes - # back 401 → pipeline fails → CI false-negative (see bd code-12b). 
- - registry: registry.viktorbarzin.me:5050 + - registry: forgejo.viktorbarzin.me username: - from_secret: registry_user + from_secret: forgejo_user password: - from_secret: registry_password + from_secret: forgejo_push_token dockerfile: cli/Dockerfile context: cli auto_tag: true diff --git a/.woodpecker/default.yml b/.woodpecker/default.yml index 05a579ea..5661bccd 100644 --- a/.woodpecker/default.yml +++ b/.woodpecker/default.yml @@ -73,6 +73,38 @@ steps: # the env var is unset. umask 077; printf '%s' "$VAULT_TOKEN" > "$HOME/.vault-token" + # ── Generate kubeconfig from projected SA token ── + # terragrunt.hcl injects `-var kube_config_path=/config` for every + # terraform invocation, so we need a kubeconfig file at that path. The + # `default` SA in the woodpecker namespace is cluster-admin (via the + # `woodpecker-default` ClusterRoleBinding), so the projected token is + # sufficient to apply any stack. Using `tokenFile` (not an inline token) + # so the provider re-reads it if kubelet rotates the projected token + # mid-pipeline. 
+ - | + cat > config <<'EOF' + apiVersion: v1 + kind: Config + clusters: + - name: kubernetes + cluster: + server: https://10.0.20.100:6443 + certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt + contexts: + - name: ci + context: + cluster: kubernetes + user: ci + current-context: ci + users: + - name: ci + user: + tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token + EOF + chmod 600 config + # Sanity check: kubeconfig works + kubectl --kubeconfig=config get ns kube-system -o name >/dev/null + # ── Detect changed stacks ── - | PLATFORM_STACKS="dbaas authentik crowdsec monitoring nvidia mailserver cloudflared kyverno metallb redis traefik technitium headscale rbac k8s-portal vaultwarden reverse-proxy metrics-server vpa nfs-csi iscsi-csi cnpg sealed-secrets uptime-kuma wireguard xray infra-maintenance platform vault reloader descheduler external-secrets" diff --git a/.woodpecker/drift-detection.yml b/.woodpecker/drift-detection.yml index 438c408c..38cc60b9 100644 --- a/.woodpecker/drift-detection.yml +++ b/.woodpecker/drift-detection.yml @@ -41,6 +41,34 @@ steps: export VAULT_TOKEN=$(curl -s -X POST "$VAULT_ADDR/v1/auth/kubernetes/login" \ -d "{\"role\":\"ci\",\"jwt\":\"$SA_TOKEN\"}" | jq -r .auth.client_token) + # ── Generate kubeconfig from projected SA token ── + # See default.yml for rationale. terragrunt.hcl injects + # `-var kube_config_path=/config` for every terraform invocation, + # so we need a kubeconfig file at that path. The woodpecker default SA + # is cluster-admin, so the projected token is sufficient. 
+ - | + cat > config <<'EOF' + apiVersion: v1 + kind: Config + clusters: + - name: kubernetes + cluster: + server: https://10.0.20.100:6443 + certificate-authority: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt + contexts: + - name: ci + context: + cluster: kubernetes + user: ci + current-context: ci + users: + - name: ci + user: + tokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token + EOF + chmod 600 config + kubectl --kubeconfig=config get ns kube-system -o name >/dev/null + # ── Run terraform plan on all stacks ── # Emits two timestamps per drifted stack so the Pushgateway/Prometheus # side can compute drift-age-hours via `time() - drift_stack_first_seen`. diff --git a/docs/architecture/backup-dr.md b/docs/architecture/backup-dr.md index b307ec6c..55201417 100644 --- a/docs/architecture/backup-dr.md +++ b/docs/architecture/backup-dr.md @@ -267,7 +267,7 @@ Native LVM thin snapshots provide crash-consistent point-in-time recovery for 62 **Snapshot Pruning**: Deletes LVM snapshots older than 7 days (safety net for snapshots that outlive `lvm-pvc-snapshot` timer). -**Monitoring**: Pushes `backup_weekly_last_success_timestamp` to Pushgateway. Alerts: `WeeklyBackupStale` (>8d), `WeeklyBackupFailing`. +**Monitoring**: Pushes `daily_backup_last_run_timestamp`, `daily_backup_last_status`, and `daily_backup_bytes_synced` to Pushgateway (job `daily-backup`). Alerts: `WeeklyBackupStale` (>9d on `daily_backup_last_run_timestamp`), `WeeklyBackupFailing` (`daily_backup_last_status != 0`). The metric is pushed both on clean exit AND from a `trap TERM INT` handler — a 2026-04-30 → 2026-05-09 silent-failure incident traced to systemd SIGTERMing the script before it reached its final push, leaving the alert blind. 
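The signal-safe push described in the Monitoring note above can be sketched in shell as follows. This is a minimal sketch under stated assumptions, not the actual `daily-backup` script: `PUSHGATEWAY_URL`, `render_metrics`, `push_metrics`, `on_signal`, and `run_backup_steps` are hypothetical names, and only the metric names and the `daily-backup` job label come from this document. It assumes the Pushgateway accepts a text-format POST at `/metrics/job/<job>`.

```shell
#!/usr/bin/env bash
# Sketch only. PUSHGATEWAY_URL and all function names are assumptions.
PUSHGATEWAY_URL="${PUSHGATEWAY_URL:-http://pushgateway.example:9091}"
JOB="daily-backup"

# Render the payload in Prometheus text format.
# $1 = status (0 means success), $2 = Unix timestamp.
render_metrics() {
  printf 'daily_backup_last_run_timestamp %s\n' "$2"
  printf 'daily_backup_last_status %s\n' "$1"
}

# POST the payload. A failed push must never mask the backup's own result.
push_metrics() {
  render_metrics "$1" "$(date +%s)" |
    curl -fsS --max-time 10 --data-binary @- \
      "${PUSHGATEWAY_URL}/metrics/job/${JOB}" || true
}

# Signal path: record a failure before exiting so the alert is not left blind.
on_signal() {
  push_metrics 1
  exit "$1"
}

main() {
  trap 'on_signal 143' TERM   # 128 + SIGTERM(15), e.g. a systemd stop or timeout
  trap 'on_signal 130' INT    # 128 + SIGINT(2)
  run_backup_steps            # hypothetical placeholder for the real backup work
  push_metrics $?             # clean-exit path: push the steps' exit status
}
```

A real script would end with `main "$@"`. The sketch leaves out `daily_backup_bytes_synced`, which would be one more `printf` line in `render_metrics`.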
### Layer 2b: Application-Level Backups @@ -686,9 +686,11 @@ module "nfs_backup" { **Metrics sources**: - Backup CronJobs: Push `backup_last_success_timestamp` to Pushgateway on completion -- LVM snapshot script: Pushes `lvm_snapshot_last_success_timestamp`, `lvm_snapshot_count`, `lvm_thin_pool_free_percent` -- Daily backup script: Pushes `backup_weekly_last_success_timestamp`, `backup_disk_usage_percent` -- Offsite sync script: Pushes `offsite_backup_sync_last_success_timestamp` +- LVM snapshot script: Pushes `lvm_snapshot_last_run_timestamp`, `lvm_snapshot_last_status`, `lvm_snapshot_created_total`, `lvm_snapshot_failed_total`, `lvm_snapshot_pruned_total`, `lvm_snapshot_thinpool_free_pct` (job `lvm-pvc-snapshot`) +- Daily backup script: Pushes `daily_backup_last_run_timestamp`, `daily_backup_last_status`, `daily_backup_bytes_synced` (job `daily-backup`). Disk-fullness alert (`BackupDiskFull`) does NOT use a script-pushed metric; it derives from node-exporter `node_filesystem_avail_bytes{job="proxmox-host", mountpoint="/mnt/backup"}`. +- pfSense backup (step 3 of `daily-backup`): Pushes `backup_last_run_timestamp`, `backup_last_status`, and `backup_last_success_timestamp` (only on success) under job `pfsense-backup`. Pushed in BOTH success and failure paths so `PfsenseBackupStale` doesn't go silent when SSH-to-pfsense breaks. 
+- Offsite sync script: Pushes `backup_last_success_timestamp`, `offsite_sync_last_status` (job `offsite-backup-sync`) +- Prometheus backup (sidecar in prometheus-server pod, monthly 1st-Sunday 04:00 UTC): Pushes `prometheus_backup_last_success_timestamp` (job `prometheus-backup`) - ~~CloudSync monitor~~: Removed (TrueNAS decommissioned) - Vaultwarden integrity: Pushes `vaultwarden_sqlite_integrity_ok` hourly @@ -728,6 +730,8 @@ the 2026-04-22 backup_offsite_sync FAIL (node3 kubelet hiccup at | NovelApp | ✓ | ✓ | — | ✓ | proxmox-lvm | | Headscale | ✓ | ✓ | — | ✓ | proxmox-lvm | | Uptime Kuma | ✓ | ✓ | — | ✓ | proxmox-lvm | +| **Other apps not enumerated above** | ✓¹ | ✓¹ | varies | ✓ | proxmox-lvm / proxmox-lvm-encrypted | +| **Postiz** (bundled bitnami PG on local-path) | — | — | ✓ daily pg_dump → NFS | ✓ | local-path + NFS | | **Media (NFS)** | | Immich (~800GB) | — | — | — | ✓ | NFS | | Audiobookshelf | — | — | — | ✓ | NFS | @@ -739,7 +743,13 @@ the 2026-04-22 backup_offsite_sync FAIL (node3 kubelet hiccup at - — = Not needed (other layers cover it, or data is regenerable/disposable) - excluded = Too large/regenerable, not worth offsite bandwidth -**Note**: All 65 proxmox-lvm PVCs get LVM snapshots (except dbaas+monitoring = 3 PVCs) + file-level backup (except dbaas+monitoring). NFS-backed media syncs directly to Synology `nfs/` and `nfs-ssd/` via inotify change tracking. +**Note**: All proxmox-lvm and proxmox-lvm-encrypted PVCs get LVM snapshots (except `dbaas` and `monitoring` namespaces, excluded for write-amplification reasons) + file-level backup. NFS-backed media syncs directly to Synology `nfs/` and `nfs-ssd/` via inotify change tracking. + +¹ **"Other apps not enumerated above"** — the table only enumerates services worth calling out. 
The default backup posture for any service using `proxmox-lvm` or `proxmox-lvm-encrypted` (outside `dbaas`/`monitoring`) is **automatic** Layer 1 (LVM thin snapshots, 7d retention) + Layer 2 (file backup, 4 weekly versions on sda) + Layer 3 (offsite to Synology). Auto-discovery is by LV name pattern (`vm-*-pvc-*`), so adding a new service to the cluster gets it covered without any explicit registration. Run `ssh root@192.168.1.127 lvs --noheadings -o lv_name pve | grep '^vm-.*-pvc-' | grep -v _snap_ | wc -l` to see the live count. + +**Known gaps** — services with PVCs not on the proxmox-lvm path lose Layer 1+2: +- **Postiz** PG and Redis (bundled bitnami chart) live on `local-path` (K8s node OS disk). PG covered by the postiz-postgres-backup CronJob (daily pg_dump → `/srv/nfs/postiz-backup/`, Layer 3 via offsite sync). Redis is regenerable cache — not backed up. +- **Prometheus, Alertmanager, Pushgateway** — `monitoring` namespace excluded by policy; loss is acceptable (metrics regenerable, silences ephemeral, Pushgateway has on-disk persistence for 24h gap tolerance). 
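The LV-name auto-discovery described in the footnote above can be illustrated with a small filter. This is a sketch, not the backup script's actual code: `count_pvc_lvs` is a hypothetical helper and the sample LV names in the usage comment are made up. It trims surrounding whitespace first, on the assumption that `lvs --noheadings` pads its output, then applies the same `vm-*-pvc-*` match and `_snap_` exclusion as the one-liner above.

```shell
# Count PVC-backed LVs from a list of LV names on stdin, excluding snapshots.
count_pvc_lvs() {
  sed 's/^[[:space:]]*//; s/[[:space:]]*$//' |   # strip lvs column padding
    grep '^vm-.*-pvc-' |                         # auto-discovery name pattern
    grep -vc '_snap_'                            # drop snapshot LVs, print count
}

# Usage (sample names are hypothetical):
#   printf '  vm-101-pvc-abc\n  vm-101-pvc-abc_snap_1\n  data\n' | count_pvc_lvs
#   ssh root@192.168.1.127 lvs --noheadings -o lv_name pve | count_pvc_lvs
```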
## Recovery Procedures diff --git a/docs/architecture/networking.md b/docs/architecture/networking.md index e7959589..68834017 100644 --- a/docs/architecture/networking.md +++ b/docs/architecture/networking.md @@ -261,7 +261,7 @@ MetalLB v0.15.3 allocates IPs from the range 10.0.20.200-10.0.20.220 in **Layer | traefik | traefik | 10.0.20.200 (shared) | 80, 443, 443/UDP (HTTP/3), 10200, 10300, 11434/TCP | | coturn | coturn | 10.0.20.200 (shared) | 3478/UDP (STUN/TURN), 49152-49252/UDP (relay) | | headscale | headscale | 10.0.20.200 (shared) | 41641/UDP, 3479/UDP | -| windows-kms | kms | 10.0.20.200 (shared) | 1688/TCP | +| windows-kms¹ | kms | 10.0.20.200 (shared) | 1688/TCP | | qbittorrent | servarr | 10.0.20.200 (shared) | 50000/TCP+UDP | | shadowsocks | shadowsocks | 10.0.20.200 (shared) | 8388/TCP+UDP | | torrserver-bt | tor-proxy | 10.0.20.200 (shared) | 5665/TCP | @@ -272,6 +272,8 @@ MetalLB v0.15.3 allocates IPs from the range 10.0.20.200-10.0.20.220 in **Layer pfSense aliases reference these IPs: `k8s_shared_lb` (10.0.20.200), `technitium_dns` (10.0.20.201). NAT rules use aliases for maintainability. +¹ **windows-kms is publicly WAN-exposed.** pfSense forwards WAN TCP/1688 → `k8s_shared_lb:1688` so any internet host can activate. The matching filter rule applies a per-source rate limit (`max-src-conn 50`, `max-src-conn-rate 10/60`) with `overload <virusprot>` flush — offenders are auto-added to pfSense's stock `virusprot` pf table for follow-on blocks. Operations (rate-limit tuning, log locations, revocation) are documented in `docs/runbooks/kms-public-exposure.md`. 
+ Critical services are scaled to **3 replicas**: - Traefik (PDB: minAvailable=2) - Authentik (PDB: minAvailable=2) diff --git a/docs/architecture/storage.md b/docs/architecture/storage.md index df1e89f9..de0b2111 100644 --- a/docs/architecture/storage.md +++ b/docs/architecture/storage.md @@ -1,6 +1,6 @@ # Storage Architecture -Last updated: 2026-04-15 +Last updated: 2026-05-09 ## Overview @@ -13,7 +13,7 @@ The cluster uses two storage backends: **Proxmox CSI** for database block storag All services storing sensitive data were migrated to `proxmox-lvm-encrypted` on 2026-04-15. This eliminates the previous double-CoW (ZFS + LVM-thin) path and ensures data-at-rest encryption. **NFS storage (Proxmox host)**: ~100 NFS shares for media libraries (Immich, audiobookshelf, servarr, navidrome), backup targets (`*-backup/` directories), and app data are served directly from the Proxmox host at `192.168.1.127`. Two NFS export roots exist: -- **HDD NFS**: `/srv/nfs` on ext4 LV `pve/nfs-data` (2TB) — bulk media and backup targets +- **HDD NFS**: `/srv/nfs` on ext4 LV `pve/nfs-data` (3TB) — bulk media and backup targets - **SSD NFS**: `/srv/nfs-ssd` on ext4 LV `ssd/nfs-ssd-data` (100GB) — high-performance data (Immich ML) Both `StorageClass: nfs-truenas` and `StorageClass: nfs-proxmox` point to the Proxmox host and are functionally identical. The `nfs-truenas` name is historical — it was retained because StorageClass names are immutable on bound PVs (48 PVs reference it) and renaming would force mass PV churn across the cluster. @@ -31,7 +31,7 @@ graph TB subgraph Proxmox["Proxmox Host (192.168.1.127)"] sdc["sdc: 10.7TB RAID1 HDD
VG pve, LV data (thin pool)
~67 proxmox-lvm PVCs
~28 proxmox-lvm-encrypted PVCs"] sda["sda: 1.1TB RAID1 SAS
VG backup, LV data (ext4)
/mnt/backup"] - NFS_HDD["LV pve/nfs-data (2TB ext4)
/srv/nfs
~100 NFS shares
Media + backup targets"] + NFS_HDD["LV pve/nfs-data (3TB ext4)
/srv/nfs
~100 NFS shares
Media + backup targets"] NFS_SSD["LV ssd/nfs-ssd-data (100GB ext4)
/srv/nfs-ssd
High-performance data
(Immich ML)"] NFS_Exports["NFS Exports
managed by /etc/exports"] NFS_HDD --> NFS_Exports @@ -74,7 +74,7 @@ graph TB | **Proxmox CSI plugin** | Helm chart | Namespace: proxmox-csi | Block storage via LVM-thin hotplug | | **StorageClass `proxmox-lvm`** | RWO, WaitForFirstConsumer | Cluster-wide | Non-sensitive stateful apps | | **StorageClass `proxmox-lvm-encrypted`** | RWO, WaitForFirstConsumer, LUKS2 | Cluster-wide | **All sensitive data** (databases, auth, email, passwords, git) | -| Proxmox NFS (HDD) | LV `pve/nfs-data`, 2TB ext4 | 192.168.1.127:/srv/nfs | Bulk NFS data for all services | +| Proxmox NFS (HDD) | LV `pve/nfs-data`, 3TB ext4 | 192.168.1.127:/srv/nfs | Bulk NFS data for all services | | Proxmox NFS (SSD) | LV `ssd/nfs-ssd-data`, 100GB ext4 | 192.168.1.127:/srv/nfs-ssd | High-performance data (Immich ML) | | nfs-csi | Helm chart | Namespace: nfs-csi | NFS CSI driver | | StorageClass `nfs-proxmox` | RWX, soft mount | Cluster-wide | NFS storage, points to Proxmox host | diff --git a/docs/post-mortems/2026-05-09-io-pressure-stale-nfs.md b/docs/post-mortems/2026-05-09-io-pressure-stale-nfs.md new file mode 100644 index 00000000..de0b5719 --- /dev/null +++ b/docs/post-mortems/2026-05-09-io-pressure-stale-nfs.md @@ -0,0 +1,56 @@ +# Post-Mortem: IO Pressure Stalls from Stale NFS Client to Decommissioned TrueNAS + +| Field | Value | +|-------|-------| +| **Date** | 2026-05-09 (issue first observable in journal at 2026-05-08 00:00:04) | +| **Duration** | Intermittent IO PSI stalls and kubectl TLS handshake timeouts during the session; PVE host loadavg ~15 sustained. No user-visible outage. | +| **Severity** | SEV3 (degraded host I/O, no service down) | +| **Affected Components** | PVE host (192.168.1.127), `node_exporter` (PID 1479, D-state), kernel NFS kthread `[10.0.10.15-manager]`, k8s-node3 (downstream IO PSI). | +| **Status** | Resolved structurally. Stale connection source removed; recurring trigger eliminated. Wedged kthread persists in kernel queue — clears on next PVE reboot. 
| + +## Summary + +The PVE host's NFS client was retaining a wedged connection to `10.0.10.15` — the IP of the TrueNAS VM that was operationally decommissioned 2026-04-13 (storage migrated to `192.168.1.127:/srv/nfs`). The connection was created by `/usr/local/bin/weekly-backup`, a legacy script left over from before the NFS migration that had never been removed. Its kernel kthread `[10.0.10.15-manager]` parked itself in `rpc_wait_bit_killable` and stayed there. Any process that touched `/proc/mountstats` — including `node_exporter` — got dragged into D-state alongside it, which in turn fed back into IO pressure metrics. cluster-health surfaced this as `k8s-node3 full avg10=23%` and PVE loadavg sustained at ~15. + +## Impact + +- **User-facing**: None directly. Intermittent kubectl TLS handshake timeouts during the session, attributable to the elevated PVE loadavg. +- **Blast radius**: Single PVE host. node_exporter (PID 1479) wedged in D-state with the kthread. k8s-node3 downstream IO PSI peaked at `full avg10=23%`. +- **Data loss**: None. +- **Observability gap**: No alert fired for "stale NFS connection to decommissioned host". The IO PSI watchdog caught the symptom, not the cause. + +## Root Cause + +`/usr/local/bin/weekly-backup` was an artifact of the pre-2026-04-13 backup pipeline (when TrueNAS at `10.0.10.15` was the NFS server). After the TrueNAS decommission and migration to host NFS at `192.168.1.127`, the script was never deleted. It executed at least once recently (manually, or via a cron entry that has since been pruned), opening an NFS RPC session to `10.0.10.15`. With no peer answering, the kernel's RPC retry timer parked the manager kthread in `rpc_wait_bit_killable`. The kthread holds a lock that any reader of `/proc/mountstats` must take — `node_exporter` reads that file every scrape interval, so its scrape goroutine wedged in D-state too. + +## Resolution + +1. 
`lvextend -L +1T /dev/pve/nfs-data` + `resize2fs` — `/srv/nfs` 2 TiB → 3 TiB (90% → 60% used). Unrelated to the IO issue but bundled because `/srv/nfs` was at 90% and the user picked "grow LV" over "diet Immich". Thinpool (sdc) had ~4.6 TiB free. +2. `rm /usr/local/bin/weekly-backup` — eliminates the trigger. Backup pipeline is now `daily-backup.service` + `offsite-sync-backup.service` + per-app CronJobs (mysql/postgres/vault/etc.); `weekly-backup` was fully redundant. +3. `systemctl restart node_exporter` — replaces the wedged process. New PID 183319 healthy, `:9100/metrics` responsive. +4. `mysql-standalone` memory bump 2 Gi → 4 Gi limit, 1.5 Gi → 3 Gi request (commit forthcoming). Coincident May 8 18:05 OOM, not caused by this incident — `innodb_buffer_pool_size=1Gi` plus connection buffers and InnoDB internals didn't fit in 2 Gi. + +## Open / Out-of-Scope + +- **Wedged kthread `[10.0.10.15-manager]` (PID 3796184)** persists in the kernel queue. The kernel will eventually reap it once the RPC retry timer gives up, or it clears at next PVE reboot. With the script gone, no new ops queue against it. **Plan**: if PVE host PSI does not fully clear within 24 h, fold a PVE reboot into the next maintenance window. Not done in this change. +- **Transient OOMs unrelated to this incident**: + - `mysql-standalone-0` May 8 18:05 (anon-rss 2 GB at 2 Gi limit) — addressed by the limit bump above. + - postgres helpers May 9 12:37 — anon-rss <8 MB, pods no longer exist, no recurrence. No action. + - python pod May 9 13:36 (anon-rss 518 MB on k8s-node2) — pod no longer exists, no recurrence. No action. +- **Pre-existing TF drift**: `null_resource.pg_job_hunter_db` in `stacks/dbaas/modules/dbaas/main.tf` execs against `pg-cluster-1`, but the current CNPG primary is `pg-cluster-2`. Unrelated to this incident; surfaced during the targeted MySQL apply. 
Fix is a separate ticket — should resolve the primary dynamically (e.g., via the `cnpg.io/instanceRole=primary` selector) instead of hardcoding pod ordinal. + +## Action Items + +- [x] Delete `/usr/local/bin/weekly-backup` on PVE host. +- [x] Restart `node_exporter.service` on PVE host. +- [x] Grow `pve/nfs-data` LV to 3 TiB; online `resize2fs`. +- [x] Bump `mysql-standalone` memory request/limit to 3 Gi / 4 Gi. +- [x] Update `docs/architecture/storage.md` to record the new LV size. +- [ ] Reboot PVE host at next maintenance window if `[10.0.10.15-manager]` kthread does not clear within 24 h. +- [ ] (Separate ticket) Fix `null_resource.pg_*_db` resources to target the actual CNPG primary instead of hardcoding `pg-cluster-1`. + +## Related + +- TrueNAS decommission: memory `id=674` (2026-04-13). +- Prior LV grow on `pve/nfs-data` (2 TiB out-of-band): memory `id=691` (2026-04-12). +- Architecture: `docs/architecture/storage.md`, `docs/architecture/backup-dr.md`. diff --git a/docs/runbooks/kms-public-exposure.md b/docs/runbooks/kms-public-exposure.md new file mode 100644 index 00000000..9a6a4a6f --- /dev/null +++ b/docs/runbooks/kms-public-exposure.md @@ -0,0 +1,115 @@ +# Runbook: KMS public exposure (kms.viktorbarzin.me:1688) + +`kms.viktorbarzin.me:1688/TCP` is intentionally open to the internet so any +visitor can activate Volume License Microsoft products. The webpage at +`https://kms.viktorbarzin.me/` documents how to use it. + +This runbook covers operations on the public exposure: where to find logs, +how to tune the rate limit, how to revoke if abused. + +## Architecture + +- **K8s service**: `windows-kms` in namespace `kms`, MetalLB shared LB IP + `10.0.20.200:1688`. ETP=Cluster, so client IPs in vlmcsd logs are SNAT'd + k8s node IPs (not real-world client IPs). Trade-off accepted — + preserving real client IPs would require a dedicated MetalLB IP with + ETP=Local or a PROXY-protocol bounce; vlmcsd doesn't speak PROXY-v2. 
+- **pfSense WAN forward**: `WAN TCP/1688 → k8s_shared_lb:1688` + (alias = `10.0.20.200`). Description: `KMS public — kms.viktorbarzin.me`. +- **Filter rule** on the WAN interface, TCP/1688, with state-table + per-source caps: + - `max-src-conn 50` — concurrent connections per source IP + - `max-src-conn-rate 10/60` — 10 new connections per 60 seconds per + source + - `overload <virusprot>` flush — sources that exceed either cap get added + to pfSense's stock `virusprot` pf table and have their existing states + flushed. (`virusprot` is the only table pfSense's filter generator + targets for `overload`; see `/etc/inc/filter.inc`. Don't try to point + it at a custom table — the schema doesn't expose that knob.) + +## Where the logs are + +### vlmcsd (kms namespace, k8s) + +```bash +# Live tail +kubectl logs -n kms -l app=kms-service -c windows-kms --tail=50 -f + +# All activations in the running pod +kubectl logs -n kms -l app=kms-service -c windows-kms | grep "Incoming KMS request" +``` + +Source IPs in this log are the SNAT'd node IPs because the LB Service uses +ETP=Cluster on a shared MetalLB IP. Don't expect real WAN client IPs here. + +### Slack notifier (kms namespace, k8s) + +```bash +kubectl logs -n kms -l app=kms-service -c slack-notifier --tail=50 -f +``` + +Posts to `#alerts`, dedup window 1h per (source-IP, product). Activations +also increment the Prometheus counter `kms_activations_total{product,status}` +exposed on the same pod at `:9101/metrics` (scraped by the cluster-wide +`kubernetes-pods` job; query via Prometheus or Grafana directly). 
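For a quick check without Prometheus, the activation counter can also be summed straight from the pod's metrics endpoint. A minimal sketch, assuming the endpoint serves the standard Prometheus text format without trailing timestamps; `sum_kms_activations` is a hypothetical helper, and the port-forward in the usage comment is one assumed way to reach `:9101`, not a documented procedure.

```shell
# Sum every kms_activations_total sample read from stdin.
# The sample value is taken as the last whitespace-separated field of the line.
sum_kms_activations() {
  awk '/^kms_activations_total[{ ]/ { total += $NF } END { print total + 0 }'
}

# Usage (assumed access path):
#   pod="$(kubectl get pod -n kms -l app=kms-service -o name | head -n 1)"
#   kubectl port-forward -n kms "$pod" 9101:9101 &
#   curl -fsS http://127.0.0.1:9101/metrics | sum_kms_activations
```

Summing across all label sets gives total activations since the pod started; a per-`product` or per-`status` breakdown is easier to get from PromQL.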
+ +### pfSense — virusprot table and filter hits + +```bash +# SSH to 10.0.20.1 as root +pfctl -t virusprot -T show # who's currently in the virusprot table +pfctl -t virusprot -T expire 86400 # boot anyone added more than 24h ago +pfctl -t virusprot -T flush # nuke the entire table + +# Filter rule hit counts (find the KMS public rule, look at Evaluations / States) +pfctl -sr -v | grep -A 4 1688 + +# State table — current TCP/1688 connections, per source +pfctl -ss | grep ':1688 ' +``` + +## Tightening or loosening the rate limit + +The filter rule is configured via the pfSense web UI +(`Firewall → Rules → WAN`, look for the `KMS public — kms.viktorbarzin.me` +rule) under **Advanced Options → "Maximum new connections per source per +seconds"** and **"Maximum state entries per source"**. + +- **Default**: `max-src-conn 50`, `max-src-conn-rate 10/60` +- To **tighten** (suspected abuse): drop to `max-src-conn 10`, + `max-src-conn-rate 3/60`. Flush state and existing virusprot afterwards + (`pfctl -k 0.0.0.0/0 -K 0.0.0.0/0` is overkill — just save+apply the + rule, pfSense reloads pf and existing virusprot stay blocked). +- To **loosen** (legitimate users blocked): bump to + `max-src-conn-rate 30/60`. The `virusprot` table flush still applies on + overload; reduce its lifetime via + `Firewall → Advanced → State Timeouts` if entries linger. + +The `overload` table entry survives pf reloads. Running +`pfctl -t virusprot -T flush` after a tuning change clears the slate. + +## Revoking the public exposure + +If the activation surface needs to come down (abuse, legal, audit): + +1. **pfSense web UI** → `Firewall → NAT → Port Forward` → find + `WAN TCP/1688 → k8s_shared_lb` → **delete** (or disable). Apply. +2. **pfSense web UI** → `Firewall → Rules → WAN` → find + `KMS public — kms.viktorbarzin.me` → **delete** (or disable). Apply. +3. Verify externally: from a phone tether, `nc -zw3 kms.viktorbarzin.me 1688` + should now fail. 
+ +The k8s service stays reachable on the LAN +(`10.0.20.200:1688` and the internal `kms.viktorbarzin.lan` ingress for +the webpage) — only the WAN port-forward is removed. + +To put it back, recreate the NAT rule (target alias `k8s_shared_lb`, +port `1688`) and the filter rule with the same per-source caps. + +## Related + +- Stack: `stacks/kms/` (Terraform; deployment, MetalLB Service, ingress, + ExternalSecret for the Slack webhook) +- Webpage source: `kms-website/` repo (Hugo + nginx, deployed via Drone CI) +- Networking architecture footnote: + `docs/architecture/networking.md` § "MetalLB & Load Balancing" diff --git a/docs/runbooks/woodpecker-onboard-forgejo-repo.md b/docs/runbooks/woodpecker-onboard-forgejo-repo.md index 0a4de682..9cc64826 100644 --- a/docs/runbooks/woodpecker-onboard-forgejo-repo.md +++ b/docs/runbooks/woodpecker-onboard-forgejo-repo.md @@ -2,72 +2,85 @@ Last updated: 2026-05-07 -When you create a new repo on `forgejo.viktorbarzin.me`, Woodpecker -does NOT auto-discover it via the cluster's existing OAuth session. -The `forgejo` user inside Woodpecker (Forgejo-OAuth'd) needs to: +## Programmatic (preferred) -1. Open `https://ci.viktorbarzin.me/` in a browser. -2. Log in via Forgejo OAuth (the "Sign in with Forgejo" button). -3. Click "Add Repository" — your new repo should appear. -4. Click the toggle to activate it. Woodpecker will: - - Add a webhook on the Forgejo repo (push, PR, release events). - - Register the repo's `forge_remote_id` in its DB so subsequent - hooks deserialize correctly. -5. Push a commit (or hit "Run pipeline" in Woodpecker UI) — first - build fires. +```bash +infra/scripts/woodpecker-register-forgejo-repo.sh viktor/ +``` -## Why API-only doesn't work +The script: +1. Pulls the `viktor` (Forgejo-OAuth'd) user's `hash` from the + Woodpecker PG `users` table. +2. 
Mints a session JWT (HS256, signed with that hash) — Woodpecker + per-user session JWTs have payload + `{"type":"user","user-id":""}` and the signing key is the + user's `hash` column. (Confirmed against a known-good admin + token: same payload shape, signature reproducible from the user's + stored hash via `openssl dgst -sha256 -hmac "$HASH"`.) +3. Looks up the Forgejo repo id and POSTs to + `https://ci.viktorbarzin.me/api/repos?forge_remote_id=` as + that user. Woodpecker server creates the per-repo webhook + + per-repo signing key on the Forgejo side automatically (uses + the user's stored Forgejo OAuth `access_token` to do so — that's + why this only works with viktor's user, not the GitHub admin's). -The webhook URL contains a JWT signed with a per-server key that's -stored in the DB and only accessible at OAuth-flow time. POST'ing -`/api/repos` as the admin (`ViktorBarzin` GitHub user) returns 500 -because the lookup queries forge-side OAuth state for THAT user, -which doesn't exist for the Forgejo `viktor` user. We confirmed: +Pre-requisites: +- `vault login -method=oidc` with read access to + `database/static-creds/pg-woodpecker`. +- `kubectl` cluster access (the script spawns a 5-min psql pod in + the `woodpecker` namespace to query the DB). +- A Forgejo PAT in `secret/viktor/forgejo_admin_token` (or pass + `FORGEJO_TOKEN=…` env), used to look up the repo's numeric ID. +- The `viktor` Woodpecker user must already exist (i.e., they've + logged in via Forgejo OAuth at least once on the Web UI). + If user_id=2 / forge_id=2 doesn't exist in `users`, the OAuth + bootstrap is unavoidable — but it only needs to happen once for + the lifetime of the Woodpecker DB. -- Direct `POST /api/repos?forge_remote_id=N` → HTTP 500 server-side. -- Generating a JWT with the agent secret → "token is unverifiable" - on hook delivery (the signing key is repo-specific, not the - global agent secret). 
+## Why the GitHub admin token can't do this -There's no admin endpoint that side-steps the OAuth flow. +The earlier 500 from `POST /api/repos?forge_remote_id=N` was +because my admin session token authenticates as `ViktorBarzin` +(GitHub user, forge_id=1). Woodpecker tries to call Forgejo as +that user (using their stored Forgejo OAuth token) — which doesn't +exist for the GitHub user, hence the lookup error. There's no way +around this without acting as the Forgejo user. -## Bootstrap when UI access isn't available +## Why the previous "JWT for the webhook" approach didn't work -If you absolutely need to bootstrap a new image without UI access -(e.g., during an outage), the workaround is: +I tried generating a webhook JWT signed with `WOODPECKER_AGENT_SECRET` +(the global agent secret) and registering it directly on Forgejo. +That fails because the webhook JWT verification path runs through a +DB-backed `keyfunc` — Woodpecker stores a per-repo signing key when +the repo is activated, and rejects any JWT signed with a different +key. POST /api/repos is what creates that per-repo key. -1. Build locally: - ```bash - docker build -t forgejo.viktorbarzin.me/viktor/: /path/to/source - docker push forgejo.viktorbarzin.me/viktor/: - ``` -2. Or pull from another already-built source and retag: - ```bash - docker pull viktorbarzin/: # DockerHub - docker tag viktorbarzin/: forgejo.viktorbarzin.me/viktor/: - docker push forgejo.viktorbarzin.me/viktor/: - ``` -3. Flip the cluster `image=` reference and restart deployments. +## After registration -Document the bootstrap in the relevant stack so future maintainers -know the image was put there by hand. After Woodpecker UI onboarding, -the next pipeline run replaces the bootstrap image with a CI-built one. +Pipelines fire automatically on push. 
The `WOODPECKER_FORGE_TIMEOUT` +default of 3s was too tight for our cluster (Forgejo response time +spikes to 1-2s under load) — bumped to 30s in +`infra/stacks/woodpecker/values.yaml` 2026-05-07. Without that bump, +config-loader hits the deadline and every pipeline errors with +`could not load config from forge: context deadline exceeded`. -## Repos onboarded in flight 2026-05-07 +## When the v3.13 → v3.14 server upgrade matters -These were created during the forgejo-registry-consolidation but the -UI step above hasn't been done yet — their `.woodpecker.yml` / -`.woodpecker/build.yml` exists on Forgejo but no pipeline fires: +`v3.14.0` doesn't fix this on its own — the timeout default is the +same. Set `WOODPECKER_FORGE_TIMEOUT` regardless of version. The +v3.14 upgrade was useful for unrelated forge-API changes (smarter +config-loader, fewer redundant calls per trigger). -- `viktor/broker-sync` — image bootstrapped via DockerHub (see - `infra/stacks/wealthfolio/main.tf` comment). -- `viktor/fire-planner` — image bootstrapped via local docker build. -- `viktor/hmrc-sync` -- `viktor/freedify` -- `viktor/claude-agent-service` -- `viktor/beadboard` — image bootstrapped via local docker build. -- `viktor/claude-memory-mcp` +## Troubleshooting -Walk through each in the Woodpecker UI to enable. Pipelines for -already-onboarded repos (payslip-ingest, job-hunter, infra) fired -correctly after the v3.13 → v3.14 upgrade. +- Pipeline status `error` with `could not load config from forge`: + bump `WOODPECKER_FORGE_TIMEOUT`. 30s is plenty. +- Pipeline status `error` with `secret "registry-password" not found`: + the repo's `.woodpecker.yml` still references registry-private + credentials. Drop the `registry.viktorbarzin.me` block — Forgejo + is the only registry now. +- Pipeline status `failure` with `"/vault": not found` (or any + other COPY of a binary): the gitignored binary wasn't pushed to + Forgejo. 
Switch the Dockerfile to `curl … && unzip` from the + HashiCorp/upstream release URL. See `claude-agent-service/Dockerfile` + commit bab6dd2 for the pattern. diff --git a/modules/kubernetes/anubis_instance/main.tf b/modules/kubernetes/anubis_instance/main.tf new file mode 100644 index 00000000..55129bbf --- /dev/null +++ b/modules/kubernetes/anubis_instance/main.tf @@ -0,0 +1,406 @@ +terraform { + required_providers { + kubernetes = { + source = "hashicorp/kubernetes" + } + } +} + +# Per-site Anubis reverse proxy. +# Sits between Traefik and the real backend. On first visit, serves a +# proof-of-work challenge; on success, drops a long-lived JWT cookie and +# proxies the request through to `target_url`. +# +# Sharing a single ed25519 signing key across instances + COOKIE_DOMAIN at +# the registrable domain means a token solved on one viktorbarzin.me subdomain +# is honoured by every other Anubis-fronted site. + +variable "name" { + type = string + description = "Short logical name (e.g. \"blog\"). Used to derive Service / Deployment / Secret names as anubis-." +} + +variable "namespace" { + type = string + description = "Namespace to deploy into — typically the same as the protected backend service." +} + +variable "target_url" { + type = string + description = "Backend URL Anubis forwards passing requests to (e.g. http://blog.website.svc.cluster.local)." +} + +variable "cookie_domain" { + type = string + default = "viktorbarzin.me" + description = "Cookie domain — set to the registrable domain so a single PoW solve covers every Anubis-fronted subdomain." +} + +variable "difficulty" { + type = number + default = 2 + description = "PoW difficulty (leading-zero hex chars). 2 = ~250ms desktop / ~700ms mobile. Bump for stronger filtering." +} + +variable "cookie_expiration_hours" { + type = number + default = 720 # 30 days + description = "Lifetime of the issued JWT cookie in hours." 
+} + +variable "image_tag" { + type = string + default = "v1.25.0" + description = "ghcr.io/techarohq/anubis tag — pin to a release, never :latest." +} + +variable "replicas" { + type = number + default = 1 + description = "Replica count. Default 1 because Anubis stores in-flight challenges in process memory — with N>1 a challenge issued by pod A and solved against pod B fails with `store: key not found` (HTTP 500). For HA, configure a shared store (Redis) and bump this. Per-pod 128Mi @ idle is cheap, single-pod restart is sub-second, so 1 is fine for content sites." +} + +variable "memory" { + type = string + default = "128Mi" + description = "requests==limits memory. Anubis docs suggest 128Mi handles many concurrent clients." +} + +variable "policy_yaml" { + type = string + default = null + description = "Override the strict default bot-policy YAML. Leave null to use the catch-all CHALLENGE policy." +} + +variable "cpu_request" { + type = string + default = "20m" + description = "CPU request. PoW verification is server-cheap (just hash check)." +} + +locals { + full_name = "anubis-${var.name}" + labels = { + "app" = local.full_name + "app.kubernetes.io/name" = "anubis" + "app.kubernetes.io/instance" = local.full_name + "app.kubernetes.io/component" = "ai-bot-challenge" + "app.kubernetes.io/managed-by" = "terraform" + } + + # Strict bot policy. Default Anubis policy only WEIGHs Mozilla|Opera UAs + # and lets unmatched UAs (curl, wget, Python-requests, scrapy, headless + # CLI scrapers) fall through to ALLOW. We import the same upstream + # snippets and append a catch-all CHALLENGE so anyone without JS+PoW + # capability is filtered. + default_policy_yaml = <<-EOT + bots: + # Hard-deny known-bad bots first. + - import: (data)/bots/_deny-pathological.yaml + - import: (data)/bots/aggressive-brazilian-scrapers.yaml + # Hard-deny declared AI/LLM crawlers (ClaudeBot, GPTBot, Bytespider, …). 
+ - import: (data)/meta/ai-block-aggressive.yaml + # Whitelist legitimate search-engine crawlers (Googlebot, Bingbot, …). + - import: (data)/crawlers/_allow-good.yaml + # Challenge Firefox AI previews specifically. + - import: (data)/clients/x-firefox-ai.yaml + # Allow /.well-known, /robots.txt, /favicon.*, /sitemap.xml — keeps + # the internet working for benign crawlers and discovery clients. + - import: (data)/common/keep-internet-working.yaml + # Catch-all: every remaining request must solve the challenge. This + # closes the "unmatched UA falls through to ALLOW" gap that lets + # curl/wget/Python-requests scrape non-CDN-fronted hosts. + - name: catchall-challenge + path_regex: .* + action: CHALLENGE + EOT +} + +# Bot policy ConfigMap. Mounted into the pod and referenced by POLICY_FNAME. +resource "kubernetes_config_map" "policy" { + metadata { + name = "${local.full_name}-policy" + namespace = var.namespace + labels = local.labels + } + data = { + "botPolicies.yaml" = coalesce(var.policy_yaml, local.default_policy_yaml) + } +} + +# ED25519 signing key — pulled from Vault `secret/viktor` -> field +# `anubis_ed25519_key`. Same key across every instance so JWTs are +# cross-validatable, enabling cross-subdomain SSO. 
+resource "kubernetes_manifest" "ed25519_secret" { + manifest = { + apiVersion = "external-secrets.io/v1beta1" + kind = "ExternalSecret" + metadata = { + name = "${local.full_name}-key" + namespace = var.namespace + } + spec = { + refreshInterval = "1h" + secretStoreRef = { + name = "vault-kv" + kind = "ClusterSecretStore" + } + target = { + name = "${local.full_name}-key" + creationPolicy = "Owner" + } + data = [{ + secretKey = "key" + remoteRef = { + key = "viktor" + property = "anubis_ed25519_key" + } + }] + } + } +} + +resource "kubernetes_deployment" "anubis" { + metadata { + name = local.full_name + namespace = var.namespace + labels = local.labels + } + + spec { + replicas = var.replicas + + selector { + match_labels = { app = local.full_name } + } + + strategy { + type = "RollingUpdate" + rolling_update { + max_surge = 1 + max_unavailable = 0 + } + } + + template { + metadata { + labels = local.labels + } + + spec { + # Spread replicas across nodes to survive a single node failure. + topology_spread_constraint { + max_skew = 1 + topology_key = "kubernetes.io/hostname" + when_unsatisfiable = "ScheduleAnyway" + label_selector { + match_labels = { app = local.full_name } + } + } + + container { + name = "anubis" + image = "ghcr.io/techarohq/anubis:${var.image_tag}" + + port { + name = "http" + container_port = 8923 + } + port { + name = "metrics" + container_port = 9090 + } + + env { + name = "BIND" + value = ":8923" + } + env { + name = "METRICS_BIND" + value = ":9090" + } + env { + name = "TARGET" + value = var.target_url + } + env { + name = "DIFFICULTY" + value = tostring(var.difficulty) + } + env { + name = "COOKIE_EXPIRATION_TIME" + value = "${var.cookie_expiration_hours}h" + } + # Cross-subdomain SSO: cookie scoped to the registrable domain so + # a JWT solved on any Anubis-fronted subdomain is honoured on every + # other one. (COOKIE_DOMAIN and COOKIE_DYNAMIC_DOMAIN are mutually + # exclusive — picking the explicit form.) 
+ env { + name = "COOKIE_DOMAIN" + value = var.cookie_domain + } + env { + name = "COOKIE_SECURE" + value = "true" + } + env { + name = "COOKIE_SAME_SITE" + value = "Lax" + } + # Built-in robots.txt that disallows known AI scrapers — well-behaved + # bots get blocked here without ever paying the PoW cost. + env { + name = "SERVE_ROBOTS_TXT" + value = "true" + } + # Drop cluster-internal IPs from XFF so Anubis sees the real client. + env { + name = "XFF_STRIP_PRIVATE" + value = "true" + } + env { + name = "SLOG_LEVEL" + value = "INFO" + } + env { + name = "ED25519_PRIVATE_KEY_HEX_FILE" + # Mounted from the ESO-managed Secret below. + value = "/keys/key" + } + env { + name = "POLICY_FNAME" + value = "/config/botPolicies.yaml" + } + + volume_mount { + name = "ed25519-key" + mount_path = "/keys" + read_only = true + } + volume_mount { + name = "policy" + mount_path = "/config" + read_only = true + } + + resources { + requests = { + cpu = var.cpu_request + memory = var.memory + } + limits = { + memory = var.memory + } + } + + # Liveness + readiness on the metrics endpoint (zero auth, always 200). 
+ liveness_probe { + http_get { + path = "/metrics" + port = "metrics" + } + initial_delay_seconds = 10 + period_seconds = 30 + failure_threshold = 3 + } + readiness_probe { + http_get { + path = "/metrics" + port = "metrics" + } + initial_delay_seconds = 2 + period_seconds = 5 + failure_threshold = 2 + } + + security_context { + run_as_non_root = true + run_as_user = 1000 + run_as_group = 1000 + allow_privilege_escalation = false + read_only_root_filesystem = true + capabilities { + drop = ["ALL"] + } + } + } + + volume { + name = "ed25519-key" + secret { + secret_name = "${local.full_name}-key" + items { + key = "key" + path = "key" + } + } + } + volume { + name = "policy" + config_map { + name = kubernetes_config_map.policy.metadata[0].name + } + } + } + } + } + + lifecycle { + # KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2 + ignore_changes = [spec[0].template[0].spec[0].dns_config] + } + + depends_on = [kubernetes_manifest.ed25519_secret] +} + +resource "kubernetes_service" "anubis" { + metadata { + name = local.full_name + namespace = var.namespace + labels = local.labels + annotations = { + "prometheus.io/scrape" = "true" + "prometheus.io/path" = "/metrics" + "prometheus.io/port" = "9090" + } + } + + spec { + selector = { app = local.full_name } + port { + name = "http" + port = 8080 + target_port = 8923 + protocol = "TCP" + } + port { + name = "metrics" + port = 9090 + target_port = 9090 + protocol = "TCP" + } + } +} + +resource "kubernetes_pod_disruption_budget_v1" "anubis" { + metadata { + name = local.full_name + namespace = var.namespace + } + spec { + min_available = "1" + selector { + match_labels = { app = local.full_name } + } + } +} + +output "service_name" { + value = kubernetes_service.anubis.metadata[0].name + description = "ClusterIP service name. Pass this to ingress_factory's `service_name` so Traefik routes through Anubis." +} + +output "service_port" { + value = 8080 + description = "Service port. 
Anubis listens on 8923 inside; the Service exposes 8080." +} diff --git a/scripts/daily-backup.service b/scripts/daily-backup.service index a2bf2d85..752c79dd 100644 --- a/scripts/daily-backup.service +++ b/scripts/daily-backup.service @@ -8,4 +8,7 @@ ExecStart=/usr/local/bin/daily-backup StandardOutput=journal StandardError=journal SyslogIdentifier=daily-backup -TimeoutStartSec=3600 +# 4h budget — the snapshot mount + LUKS decrypt + rsync + sqlite scan loop +# scales with the number of PVCs (118 today). Hit the 1h ceiling around week +# 18 of 2026 and silently SIGTERM'd for 10 days. Bumped to 4h with margin. +TimeoutStartSec=14400 diff --git a/scripts/daily-backup.sh b/scripts/daily-backup.sh index a9d776a7..1d5b289a 100644 --- a/scripts/daily-backup.sh +++ b/scripts/daily-backup.sh @@ -21,15 +21,48 @@ warn() { log "WARN: $*" >&2; } die() { log "FATAL: $*" >&2; push_metrics 1 0; exit 1; } # --- Locking --- +# Track whether we got SIGTERM/SIGINT so cleanup can push a non-success metric. +# Without this, a systemd timeout-kill leaves WeeklyBackupFailing alerts blind: +# the script never reaches the success push at the end and the metric goes stale +# silently. (Root cause of 2026-04-30 → 2026-05-09 silent-failure run.) +KILLED="" + cleanup() { - umount "${PVC_MOUNT}" 2>/dev/null || true + # Recursively unmount /tmp/pvc-mount: previous SIGTERM'd runs left snapshot + # mounts stacked here, which made every subsequent run start with an + # already-occupied mountpoint and time out before reaching its own umount. + while mountpoint -q "${PVC_MOUNT}" 2>/dev/null; do + umount "${PVC_MOUNT}" 2>/dev/null || umount -l "${PVC_MOUNT}" 2>/dev/null || break + done + # Close any LUKS mappers we opened (or that were left over from a prior crash). 
+ for m in /dev/mapper/pvc-snap-*; do + [ -e "$m" ] || continue + cryptsetup close "$(basename "$m")" 2>/dev/null || true + done rm -f "${LOCKFILE}" + if [ -n "${KILLED}" ]; then + # status=2 = aborted (matches lvm-pvc-snapshot's convention) + push_metrics 2 "${TOTAL_BYTES:-0}" + fi } trap cleanup EXIT +trap 'KILLED=1; exit 143' TERM INT + if ! ( set -o noclobber; echo $$ > "${LOCKFILE}" ) 2>/dev/null; then die "Another instance is running (PID $(cat "${LOCKFILE}" 2>/dev/null || echo unknown))" fi +# Belt-and-braces: if a previous run was SIGTERM'd before its trap completed, +# /tmp/pvc-mount may have stacked mounts and stale LUKS mappers. The lock above +# guarantees we're alone, so it's safe to clean these up now. +while mountpoint -q "${PVC_MOUNT}" 2>/dev/null; do + umount "${PVC_MOUNT}" 2>/dev/null || umount -l "${PVC_MOUNT}" 2>/dev/null || break +done +for m in /dev/mapper/pvc-snap-*; do + [ -e "$m" ] || continue + cryptsetup close "$(basename "$m")" 2>/dev/null || true +done + # --- Metrics --- push_metrics() { local status="${1:-0}" bytes="${2:-0}" @@ -243,6 +276,7 @@ fi log "--- Step 3: pfsense backup ---" PFSENSE_DEST="${BACKUP_ROOT}/pfsense" DATE=$(date +%Y%m%d) +PFSENSE_STATUS=0 mkdir -p "${PFSENSE_DEST}" if timeout 10 ssh -o BatchMode=yes -o ConnectTimeout=5 root@10.0.20.1 true 2>/dev/null; then @@ -253,6 +287,7 @@ if timeout 10 ssh -o BatchMode=yes -o ConnectTimeout=5 root@10.0.20.1 true 2>/de else warn "Failed to copy pfsense config.xml" STATUS=1 + PFSENSE_STATUS=1 fi # Full filesystem tar @@ -264,21 +299,28 @@ if timeout 10 ssh -o BatchMode=yes -o ConnectTimeout=5 root@10.0.20.1 true 2>/de else warn "Failed to tar pfsense filesystem" STATUS=1 + PFSENSE_STATUS=1 fi # Retention: keep 4 weekly copies ls -t "${PFSENSE_DEST}"/config-*.xml 2>/dev/null | tail -n +5 | xargs rm -f 2>/dev/null || true ls -t "${PFSENSE_DEST}"/pfsense-full-*.tar.gz 2>/dev/null | tail -n +5 | xargs rm -f 2>/dev/null || true - - # Push pfsense-specific metric - echo 
"backup_last_success_timestamp $(date +%s)" | \ - curl -s --connect-timeout 5 --max-time 10 --data-binary @- \ - "${PUSHGATEWAY}/metrics/job/pfsense-backup" 2>/dev/null || true else warn "Cannot SSH to pfsense (10.0.20.1) — skipping" STATUS=1 + PFSENSE_STATUS=1 fi +# Push pfsense-backup metrics in BOTH success and failure paths so +# PfsenseBackupStale + PfsenseBackupFailing alerts can fire instead of going +# silent when ssh-to-pfsense is broken. +{ + echo "backup_last_run_timestamp $(date +%s)" + echo "backup_last_status ${PFSENSE_STATUS}" + [ "${PFSENSE_STATUS}" -eq 0 ] && echo "backup_last_success_timestamp $(date +%s)" +} | curl -s --connect-timeout 5 --max-time 10 --data-binary @- \ + "${PUSHGATEWAY}/metrics/job/pfsense-backup" 2>/dev/null || true + # ============================================================ # STEP 4: PVE host config backup # ============================================================ diff --git a/scripts/woodpecker-register-forgejo-repo.sh b/scripts/woodpecker-register-forgejo-repo.sh new file mode 100755 index 00000000..ee85f9b1 --- /dev/null +++ b/scripts/woodpecker-register-forgejo-repo.sh @@ -0,0 +1,121 @@ +#!/usr/bin/env bash +# Programmatically register a Forgejo repo in Woodpecker without needing the +# Web UI's OAuth flow. +# +# Earlier we believed only the OAuth login could create a working webhook +# because the webhook URL contains a JWT signed with a server-side key. +# That's true for the JWT, BUT the webhook is created server-side when the +# repo is activated through POST /api/repos — Woodpecker handles the JWT +# generation internally. We just need to call that endpoint as the right +# user (the one whose forge OAuth token can read the repo). +# +# The Woodpecker admin token (mine, ViktorBarzin@github) is a session JWT +# of the form `{"type":"user","user-id":"1"}` signed with the user's +# `hash` column (per-user, stored in the `users` table). 
Forge-API calls +# made on behalf of that user use the user's stored OAuth `access_token` +# from the same row. My GitHub admin can't read Forgejo repos, so the +# admin token can't activate Forgejo repos. +# +# The fix: mint a session JWT for the Forgejo `viktor` user (user_id=2) +# using `viktor`'s `hash`. Then POST /api/repos as viktor — viktor's +# stored Forgejo OAuth token has the access needed. +# +# Usage: +# ./woodpecker-register-forgejo-repo.sh [ ...] +# Example: +# ./woodpecker-register-forgejo-repo.sh viktor/broker-sync viktor/freedify +# +# Requires: +# - vault CLI logged in (oidc or token), with read access to +# secret/database/static-creds/pg-woodpecker AND a Forgejo PAT in +# secret/viktor/forgejo_admin_token (or pass FORGEJO_TOKEN env var) +# - kubectl with cluster access (for the temporary psql pod) +# - openssl + +set -euo pipefail + +NS=${NS:-woodpecker} +WP_URL=${WP_URL:-https://ci.viktorbarzin.me} +FORGEJO_URL=${FORGEJO_URL:-https://forgejo.viktorbarzin.me} +FORGEJO_USER_LOGIN=${FORGEJO_USER_LOGIN:-viktor} + +if [ "$#" -lt 1 ]; then + echo "usage: $0 [ ...]" >&2 + exit 1 +fi + +# Pull viktor's `hash` from the woodpecker DB (used to sign the session JWT) +# and OAuth access_token (sanity check it exists). 
+WP_DB_USER=$(vault read -format=json database/static-creds/pg-woodpecker | jq -r .data.username) +WP_DB_PASS=$(vault read -format=json database/static-creds/pg-woodpecker | jq -r .data.password) + +PG_POD=tmp-wp-register-$$ +cat </dev/null +apiVersion: v1 +kind: Pod +metadata: { name: $PG_POD, namespace: $NS } +spec: + restartPolicy: Never + containers: + - name: psql + image: postgres:15 + env: [{name: PGPASSWORD, value: "$WP_DB_PASS"}] + command: ["sleep", "300"] +EOF +trap "kubectl delete pod -n $NS $PG_POD --wait=false >/dev/null 2>&1 || true" EXIT +for _ in $(seq 1 30); do + PHASE=$(kubectl get pod -n $NS $PG_POD -o jsonpath='{.status.phase}' 2>/dev/null || true) + [ "$PHASE" = "Running" ] && break + sleep 1 +done + +VIKTOR_HASH=$(kubectl exec -n $NS $PG_POD -- psql -h pg-cluster-rw.dbaas -U "$WP_DB_USER" -d woodpecker -tA -c \ + "SELECT hash FROM users WHERE login='$FORGEJO_USER_LOGIN' AND forge_id=2" | tr -d '[:space:]') + +if [ -z "$VIKTOR_HASH" ]; then + echo "ERROR: no woodpecker user found for forge_id=2 login=$FORGEJO_USER_LOGIN" >&2 + echo " (have they ever logged in via Forgejo OAuth?)" >&2 + exit 1 +fi + +# Mint a session JWT (HS256) for that user. +b64() { openssl base64 -A | tr '+/' '-_' | tr -d '='; } +HEADER=$(printf '%s' '{"alg":"HS256","typ":"JWT"}' | b64) +PAYLOAD=$(printf '{"type":"user","user-id":"%s"}' \ + "$(kubectl exec -n $NS $PG_POD -- psql -h pg-cluster-rw.dbaas -U "$WP_DB_USER" -d woodpecker -tA -c \ + "SELECT id FROM users WHERE login='$FORGEJO_USER_LOGIN' AND forge_id=2" | tr -d '[:space:]')" | b64) +SIG=$(printf '%s.%s' "$HEADER" "$PAYLOAD" | openssl dgst -sha256 -hmac "$VIKTOR_HASH" -binary | b64) +TOKEN="$HEADER.$PAYLOAD.$SIG" + +# Sanity check: am I really logged in as viktor? 
+ME=$(curl -sf "$WP_URL/api/user" -H "Authorization: Bearer $TOKEN" | jq -r '.login') +if [ "$ME" != "$FORGEJO_USER_LOGIN" ]; then + echo "ERROR: minted token authenticates as '$ME', not '$FORGEJO_USER_LOGIN'" >&2 + exit 1 +fi +echo "Authenticated as: $ME" + +# Activate each repo via POST /api/repos?forge_remote_id=N +# Forgejo repo ID is fetched via the Forgejo API. +FORGEJO_AUTH="${FORGEJO_TOKEN:-$(vault kv get -field=forgejo_admin_token secret/viktor 2>/dev/null || true)}" +if [ -z "$FORGEJO_AUTH" ]; then + echo "ERROR: set FORGEJO_TOKEN env or seed secret/viktor/forgejo_admin_token in vault" >&2 + exit 1 +fi + +for repo in "$@"; do + FRID=$(curl -sf "$FORGEJO_URL/api/v1/repos/$repo" -H "Authorization: token $FORGEJO_AUTH" | jq -r .id 2>/dev/null || true) + if [ -z "$FRID" ] || [ "$FRID" = "null" ]; then + echo " $repo: ERROR resolving Forgejo repo id" >&2 + continue + fi + HTTP=$(curl -s -X POST "$WP_URL/api/repos?forge_remote_id=$FRID" \ + -H "Authorization: Bearer $TOKEN" \ + -o /tmp/wp-add-$FRID.json -w "%{http_code}") + case "$HTTP" in + 200) echo " $repo: activated (id=$(jq -r .id /tmp/wp-add-$FRID.json))" ;; + 409) echo " $repo: already active" ;; + *) echo " $repo: HTTP $HTTP — $(cat /tmp/wp-add-$FRID.json)" ;; + esac + rm -f /tmp/wp-add-$FRID.json +done diff --git a/secrets/fullchain.pem b/secrets/fullchain.pem index 435a3239..b0dcf470 100644 Binary files a/secrets/fullchain.pem and b/secrets/fullchain.pem differ diff --git a/secrets/privkey.pem b/secrets/privkey.pem index b6cf256c..444b348e 100644 Binary files a/secrets/privkey.pem and b/secrets/privkey.pem differ diff --git a/stacks/actualbudget/.terraform.lock.hcl b/stacks/actualbudget/.terraform.lock.hcl index e8910be1..7959dc66 100644 --- a/stacks/actualbudget/.terraform.lock.hcl +++ b/stacks/actualbudget/.terraform.lock.hcl @@ -24,6 +24,14 @@ provider "registry.terraform.io/cloudflare/cloudflare" { ] } +provider "registry.terraform.io/goauthentik/authentik" { + version = "2024.12.1" + constraints 
= "~> 2024.10" + hashes = [ + "h1:roBMd+gi+TGgikH/bMzEI8JfvJiMAQWt+8FmokCrQIs=", + ] +} + provider "registry.terraform.io/hashicorp/helm" { version = "3.1.1" hashes = [ diff --git a/stacks/actualbudget/factory/main.tf b/stacks/actualbudget/factory/main.tf index 820f3117..af80e962 100644 --- a/stacks/actualbudget/factory/main.tf +++ b/stacks/actualbudget/factory/main.tf @@ -33,6 +33,10 @@ variable "homepage_annotations" { type = map(string) default = {} } +variable "storage_size" { + type = string + default = "1Gi" +} resource "kubernetes_persistent_volume_claim" "data_encrypted" { wait_until_bound = false @@ -50,7 +54,7 @@ resource "kubernetes_persistent_volume_claim" "data_encrypted" { storage_class_name = "proxmox-lvm-encrypted" resources { requests = { - storage = "1Gi" + storage = var.storage_size } } } @@ -261,7 +265,7 @@ resource "kubernetes_cron_job_v1" "bank-sync" { metadata {} spec { backoff_limit = 1 - ttl_seconds_after_finished = 300 + ttl_seconds_after_finished = 86400 template { metadata {} spec { @@ -287,23 +291,28 @@ resource "kubernetes_cron_job_v1" "bank-sync" { LAST_SUCCESS=$END else SUCCESS=0 - LAST_SUCCESS=0 echo "Bank sync failed with HTTP $HTTP_CODE:" cat /tmp/response.txt echo "" fi - cat < ExtractorRegistry: # JW Player file URL. The site embeds the m3u8 in HTML so curl-based # parsing is enough — no browser needed. registry.register(DD12Extractor()) + # HmembedsExtractor offline-decodes hmembeds.one JWT m3u8 URLs + # (base64+XOR with hardcoded key per page; reverse-engineered + # 2026-05-07). Verifier filters dead origins. + registry.register(HmembedsExtractor()) # StremioAddonExtractor calls Stremio addon HTTP APIs (TvVoo, StremVerse) # which already index Sky F1 / DAZN F1 / Vavoo IPTV channels. No # Stremio client needed — just /stream//.json calls. 
diff --git a/stacks/f1-stream/files/backend/extractors/hmembeds.py b/stacks/f1-stream/files/backend/extractors/hmembeds.py new file mode 100644 index 00000000..b12b861d --- /dev/null +++ b/stacks/f1-stream/files/backend/extractors/hmembeds.py @@ -0,0 +1,131 @@ +"""hmembeds.one decoder + extractor. + +Reverse-engineered 2026-05-07 (4-agent parallel session). The hmembeds +embed page contains an inline `