## Context
Wave 7 of the state-drift consolidation plan. The drift-detection pipeline
(`.woodpecker/drift-detection.yml`) already ran terragrunt plan on every
stack daily and Slack-posted a summary, but its output was ephemeral —
nothing persisted in Prometheus, so there was no historical view of which
stacks drift, when, or for how long. Following the convergence work in
waves 1–6 (168 KYVERNO_LIFECYCLE_V1 markers, 4 stacks adopted, Phase 4
mysql cleanup), the baseline is clean enough that *new* drift should
stand out. That only works if we have observability.
## This change
### `.woodpecker/drift-detection.yml`
Enhances the existing cron pipeline to push a batched set of metrics to
the in-cluster Pushgateway (`prometheus-prometheus-pushgateway.monitoring:9091`)
after each run:
| Metric | Kind | Purpose |
|---|---|---|
| `drift_stack_state{stack}` | gauge, 0/1/2 | 0=clean, 1=drift, 2=error |
| `drift_stack_first_seen{stack}` | gauge (unix seconds) | Preserved across runs for drift-age tracking |
| `drift_stack_age_hours{stack}` | gauge (hours) | Computed from `first_seen` |
| `drift_stack_count` | gauge (count) | Total drifted stacks this run |
| `drift_error_count` | gauge (count) | Total plan-errored stacks |
| `drift_clean_count` | gauge (count) | Total clean stacks |
| `drift_detection_last_run_timestamp` | gauge (unix seconds) | Pipeline heartbeat |
First-seen preservation: on each drift hit, the pipeline queries
Pushgateway for the existing `drift_stack_first_seen{stack=<stack>}`
value. If present and non-zero, reuse it; otherwise stamp with `NOW`.
That means age-hours grows monotonically until the stack goes clean, at
which point the run pushes state=0 and omits the first_seen sample,
resetting it.
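A minimal sketch of that preservation logic, assuming a shell step with curl and standard tools in the CI image; `$stack` and the surrounding plumbing are illustrative:
```
# Hypothetical sketch; $stack is set by the surrounding loop
PGW=http://prometheus-prometheus-pushgateway.monitoring:9091
NOW=$(date +%s)
# Fetch the currently exposed first_seen sample for this stack, if any
first_seen=$(curl -s "$PGW/metrics" \
  | grep '^drift_stack_first_seen{' | grep "stack=\"$stack\"" \
  | awk '{printf "%d", $2; exit}')
if [ -z "$first_seen" ] || [ "$first_seen" -eq 0 ]; then
  first_seen=$NOW   # first time this stack has shown drift
fi
age_hours=$(( (NOW - first_seen) / 3600 ))
```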
Atomic batched push: all metrics for a run are POST'd in a single
HTTP request. Pushgateway doesn't support atomic multi-metric updates
natively, but batching at the pipeline layer prevents half-updated
state: if the curl is interrupted mid-run, nothing is written, the
whole run fails, and `DriftDetectionStale` alerts on the missed
heartbeat.
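A sketch of the push, using Pushgateway's standard `/metrics/job/<job>` push endpoint; the job name and sample values below are illustrative:
```
# One POST carries every sample for the run, so a failed push writes nothing
cat <<EOF | curl -s --fail --data-binary @- "$PGW/metrics/job/drift-detection"
drift_stack_state{stack="example"} 1
drift_stack_first_seen{stack="example"} $first_seen
drift_stack_age_hours{stack="example"} $age_hours
drift_stack_count 1
drift_error_count 0
drift_clean_count 0
drift_detection_last_run_timestamp $NOW
EOF
```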
### `stacks/monitoring/.../prometheus_chart_values.tpl`
New `Infrastructure Drift` alert group with three rules:
- **DriftDetectionStale** (warning, 30m): fires if
`drift_detection_last_run_timestamp` is older than 26h. Gives a 2h
grace window on top of the 24h cron so transient Pushgateway or
cluster unavailability doesn't false-alarm. Guards against the
pipeline silently failing or the cron not firing.
- **DriftUnaddressed** (warning, 1h): fires if any stack has
`drift_stack_age_hours > 72` — three days of unacknowledged drift.
Three days is long enough to absorb weekends + typical review cycles
but short enough to force follow-up before drift compounds.
- **DriftStacksMany** (warning, 30m): fires if `drift_stack_count > 10`
in a single run. Sudden wide drift usually signals systemic causes
(new admission webhook, provider version bump, cluster-wide CRD
upgrade) rather than individual configuration errors, and the alert
body nudges toward that diagnosis.
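The three expressions can be spot-checked against the live Prometheus HTTP API; a sketch, assuming the thresholds above translate directly into PromQL:
```
PROM=https://prometheus.viktorbarzin.lan
# DriftDetectionStale: heartbeat older than 26h
curl -sk "$PROM/api/v1/query" \
  --data-urlencode 'query=time() - drift_detection_last_run_timestamp > 26 * 3600'
# DriftUnaddressed: any stack drifted for more than 72h
curl -sk "$PROM/api/v1/query" --data-urlencode 'query=drift_stack_age_hours > 72'
# DriftStacksMany: more than 10 drifted stacks in one run
curl -sk "$PROM/api/v1/query" --data-urlencode 'query=drift_stack_count > 10'
```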
Applied to `stacks/monitoring` this session — 1 helm_release changed,
no other drift surfaced.
## What is NOT in this change
- The Wave 7 **GitHub issue auto-filer** — the full plan included
filing a `drift-detected` issue per drifted stack. Deferred because
it requires wiring the `file-issue` skill's convention + a gh token
exposed to Woodpecker, both of which need separate setup. The Slack
alert covers the same need at lower fidelity in the meantime.
- The Wave 7 **PG drift_history table** — would provide the richest
historical view but adds a new DB schema dependency for a CI
pipeline. Pushgateway + Prometheus handle the 72h window we care
about; PG history is nice-to-have for quarterly reviews.
- Auto-apply marker (`# DRIFT_AUTO_APPLY_OK`) — premature until the
baseline has been stable for a few cycles.
Follow-ups tracked: file dedicated beads items for GH-issue filer + PG
drift_history.
## Verification
```
$ cd stacks/monitoring && ../../scripts/tg apply --non-interactive
Apply complete! Resources: 0 added, 1 changed, 0 destroyed.
# After the next run of the "drift-detection" cron (configured in the Woodpecker UI):
$ curl -s http://prometheus-prometheus-pushgateway.monitoring:9091/metrics \
| grep -c '^drift_'
# expect a positive number
```
## Reproduce locally
1. `git pull`
2. Check Prometheus rules: `curl -sk https://prometheus.viktorbarzin.lan/api/v1/rules | jq '.data.groups[] | select(.name == "Infrastructure Drift")'`
3. Manually trigger the Woodpecker cron and watch Pushgateway populate.
Refs: Wave 7 umbrella (code-hl1)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
## Context
The claude-agent-service K8s pod (deployed 2026-04-15) provides an HTTP API
for running Claude headless agents. Three workflows still SSH'd to the DevVM
(10.0.10.10) to invoke `claude -p`. This eliminates that dependency.
## This change
Pipeline migrations (SSH → HTTP POST to claude-agent-service):
- `.woodpecker/issue-automation.yml` — Vault auth fetches an API token instead
  of the SSH key; curl POST /execute + poll /jobs/{id} replaces the SSH
  invocation (sketch after this list)
- `scripts/postmortem-pipeline.sh` — same pattern; uses jq for safe JSON
construction of TODO payloads
- `.woodpecker/postmortem-todos.yml` — drop openssh-client from apk install
- `stacks/n8n/workflows/diun-upgrade.json` — SSH node replaced with HTTP
Request node; API token via $env.CLAUDE_AGENT_API_TOKEN (added to Vault
secret/n8n)
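A sketch of the shared invocation pattern; the service URL, payload fields, and poll cadence are assumptions, only the /execute and /jobs/{id} routes come from the pipelines themselves:
```
# Submit the prompt to claude-agent-service, then poll the job until it settles
job=$(curl -s -X POST "http://claude-agent-service/execute" \
  -H "Authorization: Bearer $CLAUDE_AGENT_API_TOKEN" \
  -H 'Content-Type: application/json' \
  -d "$(jq -n --arg p "$PROMPT" '{prompt: $p}')" | jq -r '.id')
while :; do
  status=$(curl -s -H "Authorization: Bearer $CLAUDE_AGENT_API_TOKEN" \
    "http://claude-agent-service/jobs/$job" | jq -r '.status')
  [ "$status" = "running" ] && { sleep 10; continue; }
  break
done
```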
Documentation updates:
- `docs/architecture/incident-response.md` — Mermaid diagram: DevVM → K8s
- `docs/architecture/automated-upgrades.md` — pipeline diagram + n8n action
- `AGENTS.md` — pipeline description updated
## What is NOT in this change
- DevVM decommissioning (still hosts terminal/foolery services)
- Removal of SSH key secrets from Vault (kept for rollback)
- n8n workflow import (must be done manually in n8n UI)
[ci skip]
Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>
CI now uses scripts/tg instead of raw terragrunt apply, acquiring the
same per-stack Vault KV lock that user sessions use. This prevents CI
from overwriting in-flight user applies.
Changes:
- Switch from xargs -P 4 (parallel) to a serial while-read loop
- CI skips stacks locked by users instead of racing them (sketch below)
- Git rebase failures now exit 1 instead of silently continuing
- Updated header comments to reflect new locking behavior
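A sketch of the serial loop, assuming `scripts/tg` exits non-zero when a stack's Vault KV lock is held; the stack-list plumbing is illustrative:
```
# One stack at a time; a user-held lock makes scripts/tg bail, and we move on
echo "$CHANGED_STACKS" | while read -r stack; do
  [ -n "$stack" ] || continue
  if ! (cd "stacks/$stack" && ../../scripts/tg apply --non-interactive); then
    echo "skipping $stack (locked by a user session or failed)" >&2
  fi
done
```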
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
DevVM may have unstaged changes from active sessions. Stash them before
pulling to avoid 'cannot pull with rebase: unstaged changes' errors,
then pop the stash afterwards to restore the working state.
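The resulting sequence, as a three-step sketch:
```
git stash         # park unstaged changes from any active session
git pull --rebase
git stash pop     # restore the working state
```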
[ci skip]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Woodpecker injects manual pipeline variables as direct env vars
(e.g., $ISSUE_NUMBER), not as CI_PIPELINE_VARIABLE_* prefixed vars.
The provision-user pipeline already uses this pattern correctly.
[ci skip]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
build-ci-image.yml had event:[push,manual], which caused it to run
on every manual pipeline trigger. Its registry_user/registry_password
secrets aren't allowed for the manual event, so every manual pipeline
errored. Removed manual from its event list since the image only needs
to build on push.
Reverted evaluate conditions (Woodpecker evaluates secrets before
conditions, so evaluate can't prevent missing-secret errors).
[ci skip]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When GHA triggers a manual pipeline for issue automation, ALL pipelines
with event:manual fire. Added evaluate conditions:
- issue-automation.yml: only runs when ISSUE_NUMBER is set
- provision-user.yml: only runs when ISSUE_NUMBER is NOT set
- build-ci-image.yml: only runs when ISSUE_NUMBER is NOT set
This prevents build-ci-image from failing on missing registry_password
secret when issue automation triggers.
[ci skip]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Both issue-automation and postmortem pipelines were cd'ing into
~/code/infra before running Claude, missing the root CLAUDE.md
with beads config and project-wide instructions. Now cd to ~/code
and use relative agent paths from there.
[ci skip]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Documents the centralized Beads/Dolt task tracking system used by all
Claude Code sessions. Covers architecture, session lifecycle, settings
hierarchy, known issues, and E2E test verification.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add WOODPECKER_BACKEND_K8S_PULL_SECRET_NAMES to agent env so step
pods can pull from private registry (registry.viktorbarzin.me:5050)
- Add fallback in default.yml when HEAD~1 is unavailable (shallow
  clone with depth=1): fetch more history, or apply all platform
  stacks as a safe default (sketch after this list)
- Root cause: pipeline #243 failed because infra-ci:latest image
couldn't be pulled (no imagePullSecrets on step pods)
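A sketch of the fallback's shape; variable names and the deepen depth are assumptions:
```
# Determine changed stacks; fall back when the shallow clone lacks HEAD~1
if git rev-parse -q --verify HEAD~1 >/dev/null; then
  changed=$(git diff --name-only HEAD~1 HEAD)
else
  git fetch --deepen=50 origin || true
  changed=$(git diff --name-only HEAD~1 HEAD 2>/dev/null \
    || echo "APPLY_ALL_PLATFORM_STACKS")
fi
```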
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- New custom CI Docker image (ci/Dockerfile) with TF 1.5.7, TG 0.99.4,
git-crypt, sops, kubectl pre-installed. Pushed to private registry.
Eliminates 17 apk add calls + binary downloads per pipeline run.
- Unified CI pipeline: merge default.yml + app-stacks.yml into one.
Changed-stacks-only detection (git diff, with global-file fallback).
Concurrency limit (xargs -P 4). Step consolidation (2 steps vs 4).
Shallow clone (depth=2). Provider cache (TF_PLUGIN_CACHE_DIR).
- Per-stack Vault advisory locks in scripts/tg. 30min TTL with stale
  lock detection. Blocks concurrent applies to the same stack (sketch below).
- TF_PLUGIN_CACHE_DIR enabled by default in scripts/tg for local dev.
- Daily drift detection pipeline (.woodpecker/drift-detection.yml).
Runs terraform plan on all stacks, Slack alert on drift.
- CI image build pipeline (.woodpecker/build-ci-image.yml).
Expected speedup: ~5-10 min per pipeline run → ~2-4 min.
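A sketch of the lock's shape in `scripts/tg`; the KV path and field names are assumptions, only the 30-minute TTL and stale-lock behaviour come from the change:
```
# Hypothetical advisory lock: refuse to apply while a fresh lock exists
now=$(date +%s)
expires=$(vault kv get -field=expires "secret/locks/$STACK" 2>/dev/null || echo 0)
if [ "$expires" -gt "$now" ]; then
  echo "$STACK locked until epoch $expires; aborting" >&2
  exit 1
fi
# Lock absent or stale: take it for 30 minutes
vault kv put "secret/locks/$STACK" owner="${USER:-ci}" expires=$((now + 1800))
```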
[ci skip]
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pipeline authenticates to Vault via K8s SA JWT, fetches devvm_ssh_key
from secret/ci/infra, SSHes to DevVM to run Claude Code headlessly.
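The auth step, sketched against Vault's standard Kubernetes login endpoint; the role name and the assumption of a KV v2 mount are mine, the path and key come from the pipeline:
```
# Exchange the pod's service-account JWT for a Vault token, then read the key
JWT=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
VAULT_TOKEN=$(curl -s -X POST "$VAULT_ADDR/v1/auth/kubernetes/login" \
  -d "{\"jwt\": \"$JWT\", \"role\": \"ci\"}" | jq -r '.auth.client_token')
curl -s -H "X-Vault-Token: $VAULT_TOKEN" \
  "$VAULT_ADDR/v1/secret/data/ci/infra" | jq -r '.data.data.devvm_ssh_key'
```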
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Kyverno generate+synchronize only manages secrets it created itself,
so existing Terraform-managed secrets in ~70 namespaces weren't updated.
The renewal pipeline now loops through all namespaces and kubectl-applies
the new cert.
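A sketch of that loop; the manifest file name is illustrative:
```
# Re-apply the renewed cert into every namespace, Terraform-managed ones included
for ns in $(kubectl get namespaces -o jsonpath='{.items[*].metadata.name}'); do
  kubectl -n "$ns" apply -f renewed-tls-secret.yaml
done
```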
bitnami/kubectl runs as non-root UID 1001 and cannot read the
git-crypt-decrypted secrets owned by root. Switched to alpine (which
runs as root) with kubectl downloaded directly.
Kyverno ClusterPolicy clones tls-secret from kyverno namespace to all
namespaces with synchronize=true. Renewal pipeline now updates the source
secret via kubectl, verifies cert validity, and sends Slack notification.
- Add input validation: username regex + email format check in pipeline
- Quote variables in .provision-env to prevent shell injection
- Remove dead source command (each Woodpecker command is separate shell)
- Use jq to build JSON payloads (prevents injection via group names; sketch after this list)
- Clean up git-crypt key on failure (use ; instead of &&)
- Add Kyverno ndots lifecycle ignore to webhook-handler deployment
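The jq pattern from the list above, sketched with illustrative field names:
```
# jq handles quoting, so hostile characters in a group name can't break the JSON
payload=$(jq -n --arg user "$USERNAME" --arg group "$GROUP_NAME" \
  '{username: $user, group: $group}')
```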
Vault stack can't be applied in CI (git-crypt TLS certs + sensitive
for_each on k8s_users). Pipeline now automates Vault KV update +
Authentik group creation, then notifies admin to apply stacks manually.
This matches the existing pattern — vault is not in default.yml either.
Woodpecker performs compile-time substitution on ${...} patterns,
replacing pipeline variables with empty strings. Using $VAR without
braces lets the shell evaluate them at runtime.
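The difference in one sketch:
```
echo "issue ${ISSUE_NUMBER}"  # substituted (to "") by Woodpecker at compile time
echo "issue $ISSUE_NUMBER"    # untouched by Woodpecker; the shell expands it at runtime
```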
Phase 3: all 27 platform modules now run as independent stacks.
Platform reduced to empty shell (outputs only) for backward compat
with 72 app stacks that declare dependency "platform".
Fixed technitium cross-module dashboard reference by copying file.
Woodpecker pipeline applies all 27+1 stacks in parallel via loop.
All applied with zero destroys.
Phase 2 of platform stack split. 5 more modules extracted into
independent stacks. All applied successfully with zero destroys.
Cloudflared now reads k8s_users from Vault directly to compute
user_domains. Woodpecker pipeline runs all 8 extracted stacks
in parallel. Memory bumped to 6Gi for 9 concurrent TF processes.
Platform reduced from 27 to 19 modules.
Phase 1 of platform stack split for parallel CI applies.
All 3 modules were fully independent (no cross-module refs).
State migrated via terraform state mv. All 3 stacks applied
with zero changes (dbaas had pre-existing ResourceQuota drift).
Woodpecker pipeline updated to run extracted stacks in parallel.
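A sketch of one module's migration, assuming file-based local states under the repo; the module and stack names are illustrative:
```
# Move the module's resources out of the platform state into the new stack's state
cd stacks/platform
terraform state mv \
  -state-out=../technitium/terraform.tfstate \
  'module.technitium' 'module.technitium'
```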
LimitRange defaults containers to 192Mi, which is insufficient for
terragrunt apply on the platform stack (48 vault refs, many modules).
Set explicit 1Gi request / 2Gi limit via backend_options.
- build-cli.yml: comment out cache_from/cache_to to avoid BuildKit
"short read" errors from corrupted registry cache
- default.yml: add git pull --rebase before push in cleanup-and-push
to handle remote having newer commits
Data-driven user onboarding: add a JSON entry to Vault KV k8s_users,
apply vault + platform + woodpecker stacks, and everything is auto-generated.
Vault stack: namespace creation, per-user Vault policies with secret isolation
via identity entities/aliases, K8s deployer roles, CI policy update.
Platform stack: domains field in k8s_users type, TLS secrets per user namespace,
user domains merged into Cloudflare DNS, user-roles ConfigMap mounted in portal.
Woodpecker stack: admin list auto-generated from k8s_users, WOODPECKER_OPEN=true.
K8s-portal: dual-track onboarding (general/namespace-owner), namespace-owner
dashboard with Vault/kubectl commands, setup script adds Vault+Terraform+Terragrunt,
contributing page with CI pipeline template, versioned image tags in CI pipeline.
New: stacks/_template/ with copyable stack template for namespace-owners.
Vault is now the sole source of truth for secrets. SOPS pipeline
removed entirely — auth via `vault login -method=oidc`.
Part A: SOPS removal
- vault/main.tf: delete 990 lines (93 vars + 43 KV write resources),
add self-read data source for OIDC creds from secret/vault
- terragrunt.hcl: remove SOPS var loading, vault_root_token, check_secrets hook
- scripts/tg: remove SOPS decryption, keep -auto-approve logic
- .woodpecker/default.yml: replace SOPS with Vault K8s auth via curl
- Delete secrets.sops.json, .sops.yaml
Part B: External Secrets Operator
- New stack stacks/external-secrets/ with Helm chart + 2 ClusterSecretStores
(vault-kv for KV v2, vault-database for DB engine)
Part C: Database secrets engine (in vault/main.tf)
- MySQL + PostgreSQL connections with static role rotation (24h)
- 6 MySQL roles (speedtest, wrongmove, codimd, nextcloud, shlink, grafana)
- 6 PostgreSQL roles (trading, health, linkwarden, affine, woodpecker, claude_memory)
Part D: Kubernetes secrets engine (in vault/main.tf)
- RBAC for Vault SA to manage K8s tokens
- Roles: dashboard-admin, ci-deployer, openclaw, local-admin
- New scripts/vault-kubeconfig helper for dynamic kubeconfig
K8s auth method with scoped policies for CI, ESO, OpenClaw, Woodpecker sync.
Phase 5 — CI pipelines:
- default.yml: add SOPS decrypt in prepare step, change git add . to
specific paths (stacks/ state/ .woodpecker/), cleanup on success+failure
- renew-tls.yml: change git add . to git add secrets/ state/
Phase 6 — sensitive=true:
- Add sensitive = true to 256 variable declarations across 149 stack files
- Prevents secret values from appearing in terraform plan output
- Does NOT modify shared modules (ingress_factory, nfs_volume) to avoid
breaking module interface contracts
Note: CI pipeline SOPS decryption requires sops_age_key Woodpecker secret
to be created before the pipeline will work with SOPS. Until then, the old
terraform.tfvars path continues to function.
The BuildKit builder cannot push to the insecure HTTP registry at
registry.viktorbarzin.lan:5050 because buildkit_config is not being
applied by the plugin. Simplified to DockerHub-only push for now.
Private registry caching and push can be re-added once buildkit_config
issue is resolved.
Major milestone - shared PostgreSQL moved from NFS to CloudNativePG:
- CNPG cluster (pg-cluster) running in dbaas namespace on local-path storage
- PostGIS image (ghcr.io/cloudnative-pg/postgis:16) for dawarich compatibility
- All 20 databases and 19 roles restored from pg_dumpall backup
- postgresql.dbaas Service patched to point at CNPG primary (sketch after this list)
- Old PG deployment scaled to 0 (NFS data intact for rollback)
- All 12+ dependent services verified running:
authentik, n8n, dawarich, tandoor, linkwarden, netbox, woodpecker,
rybbit, affine, health, resume, trading-bot, atuin
- Authentik PgBouncer working through the switched endpoint
TODO: codify CNPG cluster in Terraform, add 2nd replica, update backup CronJob
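The Service re-point can be sketched as a selector patch, assuming CNPG's standard instance labels; the actual patch may differ:
```
# Point the legacy postgresql Service at the CNPG primary pod
kubectl -n dbaas patch service postgresql --type merge -p \
  '{"spec":{"selector":{"cnpg.io/cluster":"pg-cluster","cnpg.io/instanceRole":"primary"}}}'
```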
The plugin-docker-buildx (Codeberg version) changed CacheFrom from
string to StringSlice, which causes urfave/cli to split on commas.
The cache_images setting properly handles registry refs by generating
both --cache-from and --cache-to flags automatically.
- cache_from/cache_to must be plain strings, not YAML lists: the
  plugin-docker-buildx treats them as single string values, and the
  Woodpecker settings layer was splitting comma-separated list items
  into separate --cache-from flags (type=registry and ref=... each
  became its own flag)
- caretta.tf: replace deprecated set{} blocks with values=[yamlencode()]
to fix Terraform plan error with newer Helm provider