Merge forgejo/master: reconcile diverged lineages [ci skip]
Local checkout carried the 2026-06-10 DNS/registry architecture series (pfSense forward-zone, CoreDNS viktorbarzin.me:53 carve-out, nodes stock) + vzdump/nfs-mirror/workstation-rebuild commits that never reached the canonical remote, while forgejo master received the emo-access series via isolated worktrees. Viktor asked to merge. Conflict resolutions (newest iteration wins in each file): - stacks/forgejo/cleanup.tf: LOCAL — dry_run=true (2026-06-10 revert after live retention orphaned OCI indexes; remote had 06-09 enable) - .claude/CLAUDE.md, docs/architecture/backup-dr.md: LOCAL — final registry/DNS architecture + implemented vzdump alerts - scripts/workstation/setup-devvm.sh: LOCAL — pinned-version, reproducible-rebuild refactor (kubelogin pin, restructured staging) - scripts/workstation/managed-settings.json: FORGEJO — the allow-then-audit claudeMd (matches /etc deployment byte-for-byte) - scripts/t3-provision-users.sh: FORGEJO comment; refresh_locked_clone intact [ci skip]: all stack changes in the local lineage were applied live this morning — CI would re-walk 100+ stacks via the modules/ fallback for zero state change. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
This commit is contained in:
commit
8cfd0e5e5c
10 changed files with 160 additions and 5 deletions
|
|
@ -197,6 +197,34 @@ steps:
|
|||
- Keeps Woodpecker global secrets in sync with Vault
|
||||
- Runs in `woodpecker` namespace
|
||||
|
||||
## Infra repo CI (Woodpecker repo 82 — Forgejo forge)
|
||||
|
||||
The infra repo itself runs on Woodpecker via the **Forgejo** forge (repo id 82,
|
||||
registered 2026-06-08; the GitHub-side repo id 1 also remains registered).
|
||||
Pushes to `master` fire `.woodpecker/default.yml` (changed-stacks terragrunt
|
||||
apply) plus the `notify-nonadmin-push` Slack audit step (allow-then-audit
|
||||
contribution model — see `multi-tenancy.md`). Operational facts (2026-06-10):
|
||||
|
||||
- **Webhook URL is the IN-CLUSTER service**: `http://woodpecker-server.woodpecker.svc.cluster.local/api/hook?...`
|
||||
(PATCHed via the Forgejo API). The Woodpecker-generated default
|
||||
(`https://ci.viktorbarzin.me/...`) resolves to the non-proxied public A
|
||||
record from pods → NAT hairpin → intermittent `context deadline exceeded`,
|
||||
silently dropping push events (found when a push produced no pipeline).
|
||||
If Woodpecker ever "repairs" the repo it will rewrite the hook back to
|
||||
`ci.viktorbarzin.me` — re-apply the in-cluster URL (or pin `ci.viktorbarzin.me`
|
||||
in the CoreDNS pod carve-out alongside forgejo).
|
||||
- **Repo-scoped secrets must exist on BOTH repos**: pipelines reference
|
||||
repo-level secrets (`registry_ssh_key`, `pve_ssh_key`, `CLOUDFLARE_TOKEN`,
|
||||
…). Repo 82 was registered without them and every all-workflow compile
|
||||
errored with `secret "registry_ssh_key" not found`. Fixed by cloning repo-1
|
||||
rows to repo 82 in the Woodpecker DB (`insert into secrets … select … where
|
||||
repo_id=1`). When registering a new forge repo for infra, clone the secret
|
||||
set too.
|
||||
- **Empty commits defeat path filters**: a commit with no changed files makes
|
||||
Woodpecker include ALL workflow files (path conditions can't exclude), so
|
||||
every repo secret must resolve. Normal commits with real files only compile
|
||||
the matching workflows.
|
||||
|
||||
## Decisions & Rationale
|
||||
|
||||
### Why GitHub Actions + Woodpecker?
|
||||
|
|
|
|||
|
|
@ -543,9 +543,17 @@ Separate from the in-cluster namespace-owner model above, the **devvm** (`10.0.1
|
|||
|
||||
**Config inheritance (live):** wizard authors the base (his chezmoi-versioned `~/.claude`). Two native layers carry it to every user — the enforced org `claudeMd` in `/etc/claude-code/managed-settings.json` (top precedence, all sessions) and per-user `~/.claude/{skills,rules,…}` **symlinks** to the base (seeded via `/etc/skel`; edits propagate live). Secrets stay per-user at mode 600, never symlinked.
|
||||
|
||||
**Infra access:** non-admins get their own **writable, git-crypt-LOCKED** clone of the (public) infra repo at `~/code` — code/docs plaintext, secret files (`*.tfvars`, `secrets/**`) stay ciphertext. Changes are ungated (push ≠ apply); the real boundary is apply-time (`scripts/tg apply` needs an admin Vault token + cluster RBAC).
|
||||
**Infra access:** non-admins get their own **writable, git-crypt-LOCKED** clone of the (public) infra repo at `~/code` — code/docs plaintext, secret files (`*.tfvars`, `secrets/**`) stay ciphertext. The provisioner clones anonymously from the public GitHub mirror; **contribute access is wired per-user on top** (see below). The apply boundary still holds (`scripts/tg apply` needs an admin Vault token + cluster RBAC), but **pushing `master` is NOT inert** — the Forgejo→Woodpecker webhook fires `.woodpecker/default.yml` (`event: push, branch: master`, `require_approval: forks` only), which terragrunt-applies changed stacks. `master` is **branch-protected on Forgejo** (force-push disabled for everyone — history is append-only; push + merge whitelists = `viktor` + explicitly granted users, deploy keys allowed). **Allow-then-audit (Viktor, 2026-06-10):** `ebarzin` (emo) is on the whitelist and pushes straight to `master` — no PR gate. The tracking burden moves to: (a) **commit messages that record what + why** (the agent instructions in AGENTS.md and the managed claudeMd require the body to paraphrase the user's request), (b) the **`notify-nonadmin-push` Slack audit step** in `.woodpecker/default.yml` — every master push by a non-admin author is posted to Slack (admin pushes are not), and (c) non-admins **never use `[ci skip]`** so every change fires the pipeline (and thus the audit feed). Users NOT on the whitelist fall back to `<user>/<topic>` branches + PRs. **Clones stay fresh automatically** (2026-06-10): the hourly `t3-provision-users` reconcile runs `refresh_locked_clone` (fetch all remotes + fast-forward `master`, ONLY when on master with a clean tree and an upstream — dirty trees and local commits are left alone with a WARN), and `start-claude.sh` does the same freshen at session launch (15s-capped so an offline remote never stalls the session).
|
||||
|
||||
**Status (2026-06-08):** built + verified on the live host — capacity (8 GiB swap), config inheritance, roster-driven provisioner, per-user locked clone, **per-user OIDC kubeconfig + the `oidc-power-user-readonly` ClusterRole + emo's `k8s_users` entry (applied + impersonation-verified), and the Authentik `T3 Users` edge gate (applied + verified)**. **Remaining (held / future):** the emo cutover to his own locked clone (Phase 5), the offboarding apply-side (Phase 7), per-user MCP/auth injection, and roster-reconciled `T3 Users` membership. See `../runbooks/offboard-user.md` for deprovisioning.
|
||||
**Contribute access (per non-admin, manual — the anca/tripit PAT precedent):**
|
||||
1. Add their Forgejo user as a **write** collaborator on `viktor/infra` (`PUT /api/v1/repos/viktor/infra/collaborators/<login>`).
|
||||
2. Mint a PAT — the admin REST endpoint 404s here, use the in-pod CLI: `kubectl -n forgejo exec deploy/forgejo -- su -s /bin/sh git -c "forgejo admin user generate-access-token --username <login> --token-name devvm-infra-git --scopes 'write:repository'"`.
|
||||
3. Install it in their `~/.git-credentials` (`https://<login>:<token>@forgejo.viktorbarzin.me`, mode 600) + `git config --global credential.helper store`, set `user.name`/`user.email`.
|
||||
4. In their clone: `git remote add forgejo https://forgejo.viktorbarzin.me/viktor/infra.git` and `git branch --set-upstream-to=forgejo/master master` (origin stays the anonymous GitHub mirror).
|
||||
5. (Optional — Viktor's call per user) Grant direct master push: add their login to the `master` branch-protection push + merge whitelists (`PATCH /api/v1/repos/viktor/infra/branch_protections/master`). Done for `ebarzin` 2026-06-10.
|
||||
6. Verify: branch push succeeds; a `master` push succeeds for whitelisted users and is rejected with `Not allowed to push to protected branch` otherwise.
|
||||
|
||||
**Status (2026-06-10):** built + verified on the live host — capacity (8 GiB swap), config inheritance, roster-driven provisioner, per-user locked clone, per-user OIDC kubeconfig + the `oidc-power-user-readonly` ClusterRole + emo's `k8s_users` entry (applied + impersonation-verified), the Authentik `T3 Users` edge gate, **the emo Phase-5 cutover (own clone + launcher repoint + `code-shared` removal, completed 2026-06-10) and emo's contribute access (`ebarzin` write collaborator + PAT + protected `master`)**. Per the live `/etc/skel` design, non-admin `~/.claude/{rules,skills}` symlinks into the admin base are **kept** (they ARE the shared-base delivery mechanism — the plan's step to remove them is obsolete). **Remaining (held / future):** the offboarding apply-side (Phase 7), per-user MCP/auth injection, and roster-reconciled `T3 Users` membership. See `../runbooks/offboard-user.md` for deprovisioning.
|
||||
|
||||
## Related
|
||||
|
||||
|
|
|
|||
|
|
@ -166,6 +166,8 @@ Design principle: **every bit of devvm setup is an idempotent git script** — n
|
|||
- **ADR-0002 — devvm Linux users, not K8s ephemeral pods.** Re-platforming is overkill at this scale; config-push is easier on one host.
|
||||
- **ADR-0003 — Config inheritance via native machine-wide layers + per-user override.** Rejected: periodic sync, OverlayFS (no live lowerdir edits), Nix (rebuild not live).
|
||||
- **ADR-0004 — Infra access via per-user writable git-crypt-locked clones (changes ungated).** Each non-admin gets their own writable, keyless (locked) clone — read + edit + push freely, no PR gate. Safe because infra apply is manual + admin-only (push ≠ apply, id=4355) and the clone can't decrypt secrets. Rejected: the shared read-only mirror (gated changes) and the shared unlocked tree (secret leak + commit entanglement). Trade: repo-local CLAUDE.md updates via pull, not live (global config inheritance stays live via §4).
|
||||
- **AMENDED 2026-06-10 — the "push ≠ apply" premise was WRONG.** The Forgejo→Woodpecker webhook on `viktor/infra` fires `.woodpecker/default.yml` on `push` to `master` (`require_approval: forks` only), which terragrunt-applies changed stacks — so an ungated master push IS a deploy. Enforcement added instead of dropping the ADR: Forgejo **branch protection on `master`** (push + merge whitelists = `viktor`, deploy keys allowed). Non-admins keep free branch pushes + PRs; only admin merges land on master. "No PR gate" is thereby reversed for non-admins; the rest of the ADR (per-user locked clones) stands. As-built: `../architecture/multi-tenancy.md` → "Contribute access".
|
||||
- **AMENDED AGAIN 2026-06-10 (later) — allow-then-audit.** Viktor granted emo (`ebarzin`) direct master push ("he's allowed to make any change; what matters is tracking what changed and why"). The PR gate is dropped FOR WHITELISTED USERS; tracking is enforced instead: agent-written commit messages must carry the user's plain-language intent (the WHY), a `notify-nonadmin-push` Slack step in `.woodpecker/default.yml` surfaces every non-admin master push, `[ci skip]` is forbidden for non-admins, and force-push stays disabled (append-only history). Accepted consequence: emo's pushes auto-apply changed stacks via CI. Branch protection + the PR fallback remain for non-whitelisted users.
|
||||
- **ADR-0005 — Power-user = cluster-wide read-only (no Secrets), via a NEW dedicated ClusterRole.** Re-widens cross-tenant READ for the trusted power-user tier only — but via a NEW `oidc-power-user-readonly` ClusterRole (get/list/watch, NO `secrets`), NOT the existing `oidc-power-user` (which grants read+write+Secrets and is unbound). Bound to the user's OIDC identity (kubelogin) — the apiserver accepts Authentik OIDC for the `kubernetes` audience; the dashboard's SA-token pattern is for the dashboard UI only.
|
||||
- **ADR-0006 — The roster is the single source of truth for the FULL lifecycle.** `roster.yaml` drives onboard *and* offboard; `/etc/ttyd-user-map`, `dispatch.json`, and Authentik `T3 Users` membership are *derived* from it, and tier is *validated* against `k8s_users` (fail-loud on mismatch). Rejected: hand-maintaining the four membership lists in parallel (guaranteed drift). Offboarding is first-class + staged (reversible cut → cluster revoke → gated `userdel`), not an afterthought.
|
||||
- **ADR-0007 — Add swap + a capacity budget to the devvm before onboarding active users.** A shared 24 GB / **0-swap** host OOM-kills live sessions under multi-user load (wizard alone runs ~20). Swap + a max-concurrent ceiling are prerequisites, not follow-ups.
|
||||
|
|
|
|||
|
|
@ -171,6 +171,8 @@ users:
|
|||
|
||||
### Task 5.1: Cut emo over to his own writable locked clone (opt-in, reversible)
|
||||
|
||||
> **DONE 2026-06-10** (staged across 06-08 → 06-10), with two deviations: (1) step 4(c) **skipped deliberately** — the live `/etc/skel` shared base delivers `~/.claude/{rules,skills}` AS symlinks into the admin base, so emo's existing symlinks match the as-built design and were kept; (2) push access was **added** (not in this plan): `ebarzin` = write collaborator on Forgejo `viktor/infra` + PAT in `~/.git-credentials` + `forgejo` remote, with `master` branch-protected (see ADR-0004 amendment — push to master auto-applies via Woodpecker, so it is whitelist-gated to `viktor`). Verified: branch push OK, master push rejected, `code-shared` removed, admin tree unreadable as emo.
|
||||
|
||||
**Files:** none (host state; an explicit one-time action — NOT the routine reconcile)
|
||||
|
||||
- [ ] **Step 1: Prereqs.** Confirm emo inherits config (Phase 1) + has his scoped kubeconfig (Phase 2). (Phase 3 deliberately SKIPPED emo — his clone is created *here*.)
|
||||
|
|
|
|||
|
|
@ -29,7 +29,26 @@ gated `userdel_archive`, which is **never** auto-applied).
|
|||
sudo systemctl disable --now t3-serve@<os_user>.service
|
||||
sudo passwd -l <os_user>
|
||||
```
|
||||
4. **Verify:** they can no longer reach `t3.viktorbarzin.me` (302 → Authentik, then
|
||||
4. **Revoke git + group access** *(manual)*:
|
||||
```bash
|
||||
# legacy secret-bearing group, if they were ever in it
|
||||
sudo gpasswd -d <os_user> code-shared
|
||||
# drop write access to the infra repo
|
||||
curl -X DELETE -H "Authorization: token <admin_pat>" \
|
||||
https://forgejo.viktorbarzin.me/api/v1/repos/viktor/infra/collaborators/<forgejo_login>
|
||||
# if they were whitelisted for direct master push, remove them from the
|
||||
# branch-protection whitelists (PATCH with the remaining usernames)
|
||||
curl -X PATCH -H "Authorization: token <admin_pat>" -H 'Content-Type: application/json' \
|
||||
https://forgejo.viktorbarzin.me/api/v1/repos/viktor/infra/branch_protections/master \
|
||||
-d '{"push_whitelist_usernames":["viktor"],"merge_whitelist_usernames":["viktor"]}'
|
||||
# revoke their devvm git PAT (token name: devvm-infra-git; admin PAT may
|
||||
# manage other users' tokens — verified 2026-06-10; the CLI has no delete)
|
||||
curl -X DELETE -H "Authorization: token <admin_pat>" \
|
||||
https://forgejo.viktorbarzin.me/api/v1/users/<forgejo_login>/tokens/devvm-infra-git
|
||||
```
|
||||
Note: their already-running sessions keep dropped groups until cycled — restart
|
||||
`t3-serve@<os_user>` to enforce immediately.
|
||||
5. **Verify:** they can no longer reach `t3.viktorbarzin.me` (302 → Authentik, then
|
||||
denied once removed from the `T3 Users` group — Part C) and cannot log in. Nothing
|
||||
is deleted; re-adding the roster entry + reconcile fully restores them.
|
||||
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue