Commit graph

34 commits

Author SHA1 Message Date
Viktor Barzin
410c893647 fix(provision): security hardening from code review
- Add input validation: username regex + email format check in pipeline
- Quote variables in .provision-env to prevent shell injection
- Remove dead source command (each Woodpecker command is separate shell)
- Use jq to build JSON payloads (prevents injection via group names)
- Clean up git-crypt key on failure (use ; instead of &&)
- Add Kyverno ndots lifecycle ignore to webhook-handler deployment
2026-03-18 21:25:03 +00:00
Viktor Barzin
82403a933c fix(provision): remove TF apply from pipeline, notify for manual apply
Vault stack can't be applied in CI (git-crypt TLS certs + sensitive
for_each on k8s_users). Pipeline now automates Vault KV update +
Authentik group creation, then notifies admin to apply stacks manually.
This matches the existing pattern — vault is not in default.yml either.
2026-03-18 00:23:06 +00:00
Viktor Barzin
d76b4b698f fix(provision): targeted vault apply + git-crypt in terragrunt step
- Two-pass vault apply: first target new user resources, then full apply
- Add git-crypt unlock to terragrunt step (TLS certs needed at plan time)
2026-03-18 00:19:16 +00:00
Viktor Barzin
6fad484126 fix(provision): reduce memory limit to 4Gi (LimitRange max) 2026-03-18 00:15:26 +00:00
Viktor Barzin
de6a5caecc fix(provision): merge terragrunt-apply into single shell block for env persistence 2026-03-18 00:11:14 +00:00
Viktor Barzin
7a24ff6702 fix(provision): use $USERNAME/$EMAIL directly — Woodpecker 3.x env vars
Woodpecker 3.x exposes pipeline variables with their original key names
(USERNAME, EMAIL), not CI_PIPELINE_VARIABLE_ prefix.
2026-03-18 00:04:51 +00:00
Viktor Barzin
52dc657af5 debug(provision): dump env vars to find correct variable names 2026-03-18 00:00:33 +00:00
Viktor Barzin
0a05343d86 fix(provision): use $VAR instead of ${VAR} to avoid Woodpecker interpolation
Woodpecker performs compile-time substitution on ${...} patterns,
replacing pipeline variables with empty strings. Using $VAR without
braces lets the shell evaluate them at runtime.
2026-03-17 23:58:46 +00:00
Viktor Barzin
fd130971aa feat(provision): automated user provisioning via Authentik webhook
- Expand CI Vault policy: write secret/data/platform + Transit SOPS keys
- Add Woodpecker provision-user.yml pipeline (manual event, API-triggered)
- Add env vars to webhook-handler deployment for Woodpecker/Authentik integration
- Update add-user skill with automated flow documentation
- Update Woodpecker repo ID list in CLAUDE.md
2026-03-17 23:56:30 +00:00
Viktor Barzin
73511b1230 extract remaining 19 modules from platform, complete stack split [ci skip]
Phase 3: all 27 platform modules now run as independent stacks.
Platform reduced to empty shell (outputs only) for backward compat
with 72 app stacks that declare dependency "platform".
Fixed technitium cross-module dashboard reference by copying file.
Woodpecker pipeline applies all 27+1 stacks in parallel via loop.
All applied with zero destroys.
2026-03-17 21:42:16 +00:00
Viktor Barzin
ae36dc253b extract monitoring, nvidia, mailserver, cloudflared, kyverno from platform [ci skip]
Phase 2 of platform stack split. 5 more modules extracted into
independent stacks. All applied successfully with zero destroys.
Cloudflared now reads k8s_users from Vault directly to compute
user_domains. Woodpecker pipeline runs all 8 extracted stacks
in parallel. Memory bumped to 6Gi for 9 concurrent TF processes.
Platform reduced from 27 to 19 modules.
2026-03-17 21:34:11 +00:00
Viktor Barzin
3c804aedf8 extract dbaas, authentik, crowdsec from platform into independent stacks [ci skip]
Phase 1 of platform stack split for parallel CI applies.
All 3 modules were fully independent (no cross-module refs).
State migrated via terraform state mv. All 3 stacks applied
with zero changes (dbaas had pre-existing ResourceQuota drift).
Woodpecker pipeline updated to run extracted stacks in parallel.
2026-03-17 18:11:53 +00:00
Viktor Barzin
b6d619e5df fix: increase terragrunt-apply step memory to 2Gi
LimitRange defaults containers to 192Mi which is insufficient for
terragrunt apply on the platform stack (48 vault refs, many modules).
Set explicit 1Gi request / 2Gi limit via backend_options.
2026-03-15 22:59:34 +00:00
Viktor Barzin
0c1239030d fix: CI pipeline - disable corrupted cache, add pull before push
- build-cli.yml: comment out cache_from/cache_to to avoid BuildKit
  "short read" errors from corrupted registry cache
- default.yml: add git pull --rebase before push in cleanup-and-push
  to handle remote having newer commits
2026-03-15 22:51:08 +00:00
Viktor Barzin
50620e6047 add generic multi-user cluster onboarding system
Data-driven user onboarding: add a JSON entry to Vault KV k8s_users,
apply vault + platform + woodpecker stacks, and everything is auto-generated.

Vault stack: namespace creation, per-user Vault policies with secret isolation
via identity entities/aliases, K8s deployer roles, CI policy update.

Platform stack: domains field in k8s_users type, TLS secrets per user namespace,
user domains merged into Cloudflare DNS, user-roles ConfigMap mounted in portal.

Woodpecker stack: admin list auto-generated from k8s_users, WOODPECKER_OPEN=true.

K8s-portal: dual-track onboarding (general/namespace-owner), namespace-owner
dashboard with Vault/kubectl commands, setup script adds Vault+Terraform+Terragrunt,
contributing page with CI pipeline template, versioned image tags in CI pipeline.

New: stacks/_template/ with copyable stack template for namespace-owners.
2026-03-15 22:23:36 +00:00
Viktor Barzin
3aba29e7a3 remove SOPS pipeline, deploy ESO + Vault DB/K8s engines
Vault is now the sole source of truth for secrets. SOPS pipeline
removed entirely — auth via `vault login -method=oidc`.

Part A: SOPS removal
- vault/main.tf: delete 990 lines (93 vars + 43 KV write resources),
  add self-read data source for OIDC creds from secret/vault
- terragrunt.hcl: remove SOPS var loading, vault_root_token, check_secrets hook
- scripts/tg: remove SOPS decryption, keep -auto-approve logic
- .woodpecker/default.yml: replace SOPS with Vault K8s auth via curl
- Delete secrets.sops.json, .sops.yaml

Part B: External Secrets Operator
- New stack stacks/external-secrets/ with Helm chart + 2 ClusterSecretStores
  (vault-kv for KV v2, vault-database for DB engine)

Part C: Database secrets engine (in vault/main.tf)
- MySQL + PostgreSQL connections with static role rotation (24h)
- 6 MySQL roles (speedtest, wrongmove, codimd, nextcloud, shlink, grafana)
- 6 PostgreSQL roles (trading, health, linkwarden, affine, woodpecker, claude_memory)

Part D: Kubernetes secrets engine (in vault/main.tf)
- RBAC for Vault SA to manage K8s tokens
- Roles: dashboard-admin, ci-deployer, openclaw, local-admin
- New scripts/vault-kubeconfig helper for dynamic kubeconfig

K8s auth method with scoped policies for CI, ESO, OpenClaw, Woodpecker sync.
2026-03-15 16:37:38 +00:00
Viktor Barzin
9f2ac0fd1a [ci skip] update AGENTS.md + CLAUDE.md with SOPS workflow, add k8s-portal CI pipeline
AGENTS.md: added SOPS secrets management section, scripts/tg usage,
contributor onboarding steps, pull-through cache bypass notes.

CLAUDE.md: added SOPS workflow note, linux/amd64 build reminder,
versioned tag guidance for pull-through cache.

CI: new .woodpecker/k8s-portal.yml pipeline — auto-builds and deploys
the k8s portal when files under stacks/platform/modules/k8s-portal/files/
change on master push. Uses buildx for linux/amd64.
2026-03-07 15:37:19 +00:00
Viktor Barzin
1f2c1ca361 [ci skip] phase 5+6: update CI pipelines for SOPS, add sensitive=true to secret vars
Phase 5 — CI pipelines:
- default.yml: add SOPS decrypt in prepare step, change git add . to
  specific paths (stacks/ state/ .woodpecker/), cleanup on success+failure
- renew-tls.yml: change git add . to git add secrets/ state/

Phase 6 — sensitive=true:
- Add sensitive = true to 256 variable declarations across 149 stack files
- Prevents secret values from appearing in terraform plan output
- Does NOT modify shared modules (ingress_factory, nfs_volume) to avoid
  breaking module interface contracts

Note: CI pipeline SOPS decryption requires sops_age_key Woodpecker secret
to be created before the pipeline will work with SOPS. Until then, the old
terraform.tfvars path continues to function.
2026-03-07 14:30:36 +00:00
Viktor Barzin
b1a3685f50 remove f1-stream CI pipeline (moved to separate repo)
The f1-stream project now lives at github.com/ViktorBarzin/f1-stream
with its own Woodpecker CI pipeline.
2026-03-01 14:43:06 +00:00
Viktor Barzin
691c74aa30 fix: use inline cache to avoid cache_from comma splitting bug 2026-02-28 19:56:00 +00:00
Viktor Barzin
96c0353c13 [ci skip] add TLS to private registry, switch to registry.viktorbarzin.me 2026-02-28 19:40:38 +00:00
Viktor Barzin
140f48c6ee fix: remove private registry from docker builds, push only to DockerHub
The BuildKit builder cannot push to the insecure HTTP registry at
registry.viktorbarzin.lan:5050 because buildkit_config is not being
applied by the plugin. Simplified to DockerHub-only push for now.
Private registry caching and push can be re-added once buildkit_config
issue is resolved.
2026-02-28 19:15:54 +00:00
Viktor Barzin
a1ba218cd2 [ci skip] Phase 1: PostgreSQL migrated to CNPG on local disk
Major milestone - shared PostgreSQL moved from NFS to CloudNativePG:
- CNPG cluster (pg-cluster) running in dbaas namespace on local-path storage
- PostGIS image (ghcr.io/cloudnative-pg/postgis:16) for dawarich compatibility
- All 20 databases and 19 roles restored from pg_dumpall backup
- postgresql.dbaas Service patched to point at CNPG primary
- Old PG deployment scaled to 0 (NFS data intact for rollback)
- All 12+ dependent services verified running:
  authentik, n8n, dawarich, tandoor, linkwarden, netbox, woodpecker,
  rybbit, affine, health, resume, trading-bot, atuin
- Authentik PgBouncer working through the switched endpoint

TODO: codify CNPG cluster in Terraform, add 2nd replica, update backup CronJob
2026-02-28 19:08:06 +00:00
Viktor Barzin
b3eaf76684 fix: use cache_images instead of cache_from to avoid comma splitting
The plugin-docker-buildx (Codeberg version) changed CacheFrom from
string to StringSlice, which causes urfave/cli to split on commas.
The cache_images setting properly handles registry refs by generating
both --cache-from and --cache-to flags automatically.
2026-02-28 18:53:08 +00:00
Viktor Barzin
87fc11121d fix: use plain string for cache_from/cache_to and fix caretta helm_release
- cache_from/cache_to must be plain strings, not YAML lists — the
  plugin-docker-buildx treats them as single string values and the
  Woodpecker settings layer was splitting comma-separated list items
  into separate --cache-from flags (type=registry and ref=... separately)
- caretta.tf: replace deprecated set{} blocks with values=[yamlencode()]
  to fix Terraform plan error with newer Helm provider
2026-02-28 18:47:20 +00:00
Viktor Barzin
0ebf850893 fix: use YAML list for cache_from/cache_to to prevent comma splitting 2026-02-28 18:40:55 +00:00
Viktor Barzin
be47592e08 fix: remove deprecated secrets field from slack step 2026-02-28 18:32:10 +00:00
Viktor Barzin
4b6ade7b08 fix: replace removed woodpeckerci/plugin-slack with curl-based webhook 2026-02-28 18:25:23 +00:00
Viktor Barzin
c7bb324f64 [ci skip] add BuildKit layer caching and dual-push to f1-stream pipeline 2026-02-28 17:56:56 +00:00
Viktor Barzin
4a0fc18c60 [ci skip] add BuildKit layer caching and dual-push to build-cli pipeline 2026-02-28 17:56:52 +00:00
Viktor Barzin
9fd788b158 [ci skip] f1-stream: add CDN token refresh, SvelteKit frontend, multi-stream layout (Phases 6-8)
- Phase 6: CDN token lifecycle with 3-strategy URL matching and periodic refresh
- Phase 7: SvelteKit 2/Svelte 5 frontend with schedule calendar and hls.js player
- Phase 8: Multi-stream layout supporting up to 4 simultaneous HLS streams
- Update Dockerfile to multi-stage build (Node.js frontend + Python backend)
- Switch deployment to :latest tag with Always pull policy for CI-driven deploys
- Update Woodpecker CI to use explicit latest tag
2026-02-23 23:59:35 +00:00
Viktor Barzin
f3bcd95242 [ci skip] f1-stream: replace Go service with Python/FastAPI skeleton
Replaces the existing Go-based f1-stream service with a new Python/FastAPI
backend as the foundation for the rebuilt F1 streaming aggregation service.

- New FastAPI backend with health and root endpoints
- Python 3.13 slim Dockerfile (replaces Go multi-stage build)
- Updated Terraform deployment (port 8000, reduced resources)
- Buildx-based redeploy.sh with --platform linux/amd64
- Added Woodpecker CI pipeline for automated builds
- Removed all old Go source, node_modules, static assets
2026-02-23 22:47:06 +00:00
Viktor Barzin
ebecaaee5c Woodpecker CI: use built-in clone, fix CoreDNS DNS resolution [CI SKIP]
- Switch from custom clone override to woodpeckerci/plugin-git built-in clone
  (handles auth automatically via netrc from GitHub OAuth token)
- Add 8.8.8.8 and 1.1.1.1 as CoreDNS upstream resolvers alongside pfSense
  (fixes intermittent DNS timeouts causing clone failures)
- Fix missing comma after heredoc in audit-policy.tf (syntax error)
2026-02-23 00:08:42 +00:00
Viktor Barzin
cbf041bcc9 [ci skip] Add Woodpecker CI stack (WIP) and claude agents
- Add stacks/woodpecker/ with Helm-based deployment config
- Add .woodpecker/ CI pipeline configs (default, build-cli, renew-tls)
- Add NFS export entry for woodpecker
- Add .claude/agents/ definitions
2026-02-22 21:30:25 +00:00