Commit graph

2766 commits

Author SHA1 Message Date
Viktor Barzin
4c8e5bea0b [traefik] Add global compress middleware to fix response compression
The rewrite-body plugin (rybbit analytics, anti-AI trap links) requires
strip-accept-encoding to work, which killed HTTP compression for 50+
services. This adds Traefik's built-in compress middleware at the
websecure entrypoint level to re-compress responses to clients after
rewrite-body has modified them.

Uses includedContentTypes whitelist (not excludedContentTypes) so only
text-based types are compressed. SSE, WebSocket, gRPC, and binary
downloads are unaffected.

Measured improvement on ha-sofia:
- app.js: 540KB → 167KB (3.2x)
- core.js: 52KB → 19KB (2.7x)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 22:18:51 +00:00
Viktor Barzin
e80b2f026f [infra] Migrate Terraform state from local SOPS to PostgreSQL backend
Two-tier state architecture:
- Tier 0 (infra, platform, cnpg, vault, dbaas, external-secrets): local
  state with SOPS encryption in git — unchanged, required for bootstrap.
- Tier 1 (105 app stacks): PostgreSQL backend on CNPG cluster at
  10.0.20.200:5432/terraform_state with native pg_advisory_lock.

Motivation: multi-operator friction (every workstation needed SOPS + age +
git-crypt), bootstrap complexity for new operators, and headless agents/CI
needing the full encryption toolchain just to read state.

Changes:
- terragrunt.hcl: conditional backend (local vs pg) based on tier0 list
- scripts/tg: tier detection, auto-fetch PG creds from Vault for Tier 1,
  skip SOPS and Vault KV locking for Tier 1 stacks
- scripts/state-sync: tier-aware encrypt/decrypt (skips Tier 1)
- scripts/migrate-state-to-pg: one-shot migration script (idempotent)
- stacks/vault/main.tf: pg-terraform-state static role + K8s auth role
  for claude-agent namespace
- stacks/dbaas: terraform_state DB creation + MetalLB LoadBalancer
  service on shared IP 10.0.20.200
- Deleted 107 .tfstate.enc files for migrated Tier 1 stacks
- Cleaned up per-stack tiers.tf (now generated by root terragrunt.hcl)

[ci skip]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 19:33:12 +00:00
Viktor Barzin
f538115c43 [dbaas] Migrate MySQL from InnoDB Cluster to standalone StatefulSet
## Context
Disk write analysis showed MySQL InnoDB Cluster writing ~95 GB/day for only
~35 MB of actual data due to Group Replication overhead (binlog, relay log,
GR apply log). The operator enforces GR even with serverInstances=1.

Bitnami Helm charts were deprecated by Broadcom in Aug 2025 — no free
container images available. Using official mysql:8.4 image instead.

## This change:
- Replace helm_release.mysql_cluster service selector with raw
  kubernetes_stateful_set_v1 using official mysql:8.4 image
- ConfigMap mysql-standalone-cnf: skip-log-bin, innodb_flush_log_at_trx_commit=2,
  innodb_doublewrite=ON (re-enabled for standalone safety)
- Service selector switched to standalone pod labels
- Technitium: disable SQLite query logging (18 GB/day write amplification),
  keep PostgreSQL-only logging (90-day retention)
- Grafana datasource and dashboards migrated from MySQL to PostgreSQL
- Dashboard SQL queries fixed for PG integer division (::float cast)
- Updated CLAUDE.md service-specific notes

## What is NOT in this change:
- InnoDB Cluster + operator removal (Phase 4, 7+ days from now)
- Stale Vault role cleanup (Phase 4)
- Old PVC deletion (Phase 4)

Expected write reduction: ~113 GB/day (MySQL 95 + Technitium 18)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 19:01:06 +00:00
Viktor Barzin
ef30f27ac9 state(dbaas): update encrypted state 2026-04-16 18:56:59 +00:00
Viktor Barzin
b6fc1e63a6 state(dbaas): import postgresql-lb service 2026-04-16 18:55:40 +00:00
Viktor Barzin
14fa2b9762 state(vault): update encrypted state 2026-04-16 18:43:06 +00:00
Viktor Barzin
1a42f750f8 state(dbaas): update encrypted state 2026-04-16 18:41:34 +00:00
Viktor Barzin
0a43b5c2ac state(dbaas): update encrypted state 2026-04-16 18:31:33 +00:00
Viktor Barzin
cd513a2226 state(dbaas): update encrypted state 2026-04-16 18:24:31 +00:00
Viktor Barzin
0368601eff state(dbaas): update encrypted state 2026-04-16 18:24:20 +00:00
Viktor Barzin
a237ac97e0 docs(upgrades): add bulk upgrade results from first production run
12 services upgraded in 30 min: audiobookshelf, owntracks, open-webui,
immich, coturn, shlink, phpipam, onlyoffice, paperless-ngx, linkwarden,
synapse, dawarich. Documents auto-rollback behavior, resource awareness
(paperless memory bump), bulk upgrade procedure, and rate limit reset.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 17:34:27 +00:00
Viktor Barzin
39b5ed04a7 upgrade: dawarich 0.37.1 -> 1.6.1 (fix entrypoint + add production env)
## Context
Version 1.3.0+ changed the recommended command from `bin/dev` (development)
to `bin/rails server -p 3000 -b ::` (production). Also requires RAILS_ENV=production,
SECRET_KEY_BASE, and RAILS_LOG_TO_STDOUT env vars.

## This change
- Command: `bin/dev` → `bin/rails server -p 3000 -b ::`
- Add RAILS_ENV=production
- Add SECRET_KEY_BASE (stored in Vault secret/dawarich, synced via ESO)
- Add RAILS_LOG_TO_STDOUT=true

## What happened
1. Initial upgrade applied version 1.6.1 — DB migrations ran but pod
   CrashLooped due to wrong entrypoint (bin/dev exits in production mode)
2. Rollback to 0.37.1 failed because 1.6.1 migrations already ran
   (ActiveRecord::UnknownPrimaryKey on rails_pulse_routes)
3. Rolled forward with corrected entrypoint + env vars
4. Service now stable: 20/20 health checks passed over 5 minutes

Co-Authored-By: Service Upgrade Agent <noreply@viktorbarzin.me>
2026-04-16 17:25:29 +00:00
Viktor Barzin
8bd2ace00d state(technitium): update encrypted state 2026-04-16 17:21:06 +00:00
Viktor Barzin
7680d4e009 state(dawarich): update encrypted state 2026-04-16 17:19:29 +00:00
Viktor Barzin
59e99f2a3a upgrade: paperless-ngx increase memory 1Gi -> 2Gi
v2.20.14 OOMKills at 1Gi during search index rebuild on upgrade.
Bumped to 2Gi request=limit to handle startup index operations.

Co-Authored-By: Service Upgrade Agent <noreply@viktorbarzin.me>
2026-04-16 17:16:23 +00:00
Viktor Barzin
611c67b92c state(paperless-ngx): update encrypted state 2026-04-16 17:10:57 +00:00
Viktor Barzin
5f1b14ad53 upgrade: dawarich re-apply 1.6.1 (forward-fix after failed rollback)
DB migrations from 1.6.1 already ran, making 0.37.1 incompatible
(ActiveRecord::UnknownPrimaryKey on rails_pulse_routes table).
Rolling forward is the correct path.

Co-Authored-By: Service Upgrade Agent <noreply@viktorbarzin.me>
2026-04-16 17:08:20 +00:00
Viktor Barzin
e9275534b6 state(dawarich): update encrypted state 2026-04-16 17:08:15 +00:00
Viktor Barzin
1f589a403c state(dawarich): update encrypted state 2026-04-16 17:04:44 +00:00
Viktor Barzin
f5883be981 Revert "upgrade: dawarich 0.37.1 -> 1.6.1"
This reverts commit ec8b4dbaac.
2026-04-16 17:04:06 +00:00
Viktor Barzin
178fc4b398 state(matrix): update encrypted state 2026-04-16 17:01:28 +00:00
Viktor Barzin
449f1af9d6 state(immich): update encrypted state 2026-04-16 17:00:59 +00:00
Viktor Barzin
0ec48d942f state(paperless-ngx): update encrypted state 2026-04-16 17:00:58 +00:00
Viktor Barzin
88c47efa1d state(url): update encrypted state 2026-04-16 16:54:31 +00:00
Viktor Barzin
7b69641357 upgrade: linkwarden v2.9.1 -> v2.14.0
Changelog summary: 23 intermediate releases spanning text highlighting, drag & drop
organization, tag management page, compact sidebar, mobile app (iOS/Android),
SingleFile upload support, performance refactors, and NextJS CVE patches.
Risk: SAFE
Breaking changes: none
DB backup: yes (job: pre-upgrade-linkwarden-1776357253, 254M dump confirmed)
Config changes applied: none
Flagged for manual review: none

Co-Authored-By: Service Upgrade Agent <noreply@viktorbarzin.me>
2026-04-16 16:54:30 +00:00
Viktor Barzin
e1aa59ce53 upgrade: paperless-ngx 2.16.4 -> 2.20.14
Changelog summary: 4 minor versions of bug fixes, security patches (7+ CVEs),
and features (nested tags, PDF editor, advanced workflow filters, Trixie base).

Risk: CAUTION
Breaking changes:
- v2.17.0: Scheduled workflow offset sign corrected (restores pre-2.16 behavior)
- v2.20.7: Filename template rendering restricted to safe document context
DB backup: yes (job: pre-upgrade-paperless-ngx-1776357314, 254MiB)
Config changes applied: none
Flagged for manual review:
- Check scheduled workflow offsets if any were modified during 2.16.x
- Verify storage path templates still render correctly after v2.20.7 restriction

Co-Authored-By: Service Upgrade Agent <noreply@viktorbarzin.me>
2026-04-16 16:53:42 +00:00
Viktor Barzin
237126eb3a upgrade: onlyoffice 8.2.3 -> 9.3.1
Changelog summary: Major version bump spanning 13 releases. v9.0.0 adds PDF editor
API, macro recording, Service Worker caching. v9.2.1 fixes critical security vulns
(XSS, memory manipulation leading to RCE in XLS conversion). v9.3.0 adds GIF animations,
multiple pages view, signature settings, hyperlinks on images/shapes.
Risk: CAUTION (major version bump 8->9)
Breaking changes: none affecting Docker+MySQL deployment. PostgreSQL schema change
in v9.0.0 (irrelevant — we use MySQL). API endpoint deprecations (ConvertService.ashx,
GET requests to converter/command) — not removals. Config parameter renames
(leftMenu->layout.leftMenu etc.) are editor JS API, not server config.
DB backup: yes (job: pre-upgrade-onlyoffice-1776357277, MySQL full dump)
Config changes applied: none required
Flagged for manual review: none

Co-Authored-By: Service Upgrade Agent <noreply@viktorbarzin.me>
2026-04-16 16:53:24 +00:00
Viktor Barzin
8637d82817 upgrade: phpipam v1.7.0 -> v1.7.4
Changelog summary: Bugfixes (PHP8 compat, UI performance, jQuery errors)
and security fixes (XSS reflected, CSRF cookie, RCE via ping_path,
DB credential exposure via mysqldump).

Risk: SAFE
Breaking changes: none
DB backup: yes (job: pre-upgrade-phpipam-1776357227, mysql, dbaas ns)
Config changes applied: none
Flagged for manual review: none

Co-Authored-By: Service Upgrade Agent <noreply@viktorbarzin.me>
2026-04-16 16:53:21 +00:00
Viktor Barzin
bf257414b3 state(dawarich): update encrypted state 2026-04-16 16:53:19 +00:00
Viktor Barzin
6727894573 upgrade: url (shlink) 4.3.4 -> 5.0.2
Changelog summary: Major version bump. v5.0.0 removes QR code generation,
REDIRECT_APPEND_EXTRA_PATH env var, and trusted proxy auto-detection.
Various CLI option removals. v4.4-4.6 added REDIRECT_EXTRA_PATH_MODE,
DB_USE_ENCRYPTION, TRUSTED_PROXIES, CORS controls, FrankenPHP support.

Risk: CAUTION (major version bump 4→5)
Breaking changes: QR codes removed, REDIRECT_APPEND_EXTRA_PATH removed,
  trusted proxy auto-detection removed, CLI option renames
DB backup: yes (job: pre-upgrade-url-1776357271, completed)
Config changes applied: none (no affected env vars in current config)
Flagged for manual review: TRUSTED_PROXIES env var may be needed
  (Shlink behind Cloudflare + Traefik = 2 proxies, auto-detection removed in 5.0.0)

Co-Authored-By: Service Upgrade Agent <noreply@viktorbarzin.me>
2026-04-16 16:49:42 +00:00
Viktor Barzin
727b3c4570 state(coturn): update encrypted state 2026-04-16 16:48:48 +00:00
Viktor Barzin
1171b390c5 state(owntracks): update encrypted state 2026-04-16 16:48:40 +00:00
Viktor Barzin
ec8b4dbaac upgrade: dawarich 0.37.1 -> 1.6.1
Changelog summary: 19 intermediate releases. 1.0.0 is cosmetic (same as 0.37.3).
Key changes: per-user timezone (1.3.0), motion_data column with background migration (1.3.0),
GPS noise filtering (1.5.0), family page with map (1.4.0), redesigned archival system (1.4.0).
Risk: CAUTION (major version boundary + breaking keyword in 1.3.3)
Breaking changes: 1.3.3 API change (distance field integer→object, affects API consumers only)
DB backup: yes (job: pre-upgrade-dawarich-1776357303, postgresql, completed)
Config changes applied: none (existing TIME_ZONE=Europe/London is compatible)
Flagged for manual review: none

Co-Authored-By: Service Upgrade Agent <noreply@viktorbarzin.me>
2026-04-16 16:48:34 +00:00
Viktor Barzin
a0ea11a4b4 state(coturn): update encrypted state 2026-04-16 16:48:12 +00:00
Viktor Barzin
5d610baed8 state(ollama): update encrypted state 2026-04-16 16:44:34 +00:00
Viktor Barzin
287d5eb28d upgrade: coturn 4.6.3-r1 -> 4.10.0-r1
Changelog summary: Security fixes (CVE-2025-69217, CVE-2026-27624,
CVE-2026-40613), performance improvements (recvmmsg, lock-free atomics),
memory safety fixes, and DDoS handling improvements.

Risk: CAUTION (4.7.0 has breaking changes for deprecated config options)
Breaking changes: 4.7.0 removed keep-address-family,
  response-origin-only-with-rfc5780, inverted no-stun-backward-compatibility.
  None of these are in our config — no impact.
DB backup: no (not DB-backed)
Config changes applied: none (no-tlsv1, no-tlsv1_1, no-cli now unnecessary
  but still accepted — no removal needed)
Flagged for manual review: none

Co-Authored-By: Service Upgrade Agent <noreply@viktorbarzin.me>
2026-04-16 16:34:59 +00:00
Viktor Barzin
cce513349a upgrade: immich v2.7.4 -> v2.7.5
Changelog summary: Bug fix for version check rate limiting and deduplication,
translation updates. Patch-only release with no breaking changes.
Risk: SAFE
Breaking changes: none
DB backup: yes (job: pre-upgrade-immich-1776357229, 1.9G, immich namespace)
Config changes applied: none
Flagged for manual review: none

Co-Authored-By: Service Upgrade Agent <noreply@viktorbarzin.me>
2026-04-16 16:34:57 +00:00
Viktor Barzin
3afdc9a6cb upgrade: ollama (open-webui) v0.7.2 -> v0.8.12
Changelog summary: 13 intermediate releases. v0.8.0 introduces analytics dashboard,
Skills support, Open Responses protocol, and a long-running DB migration on
chat_message table. v0.8.1-v0.8.12 add model editing shortcuts, OIDC logout
endpoint, terminal integration, notebook execution, and numerous bug fixes.

Risk: CAUTION
Breaking changes: v0.8.0 long-running chat_message table migration + schema changes,
  v0.8.1 additional schema changes. SQLite auto-migrates on startup.
DB backup: skipped (SQLite on proxmox-lvm PVC, LVM snapshots available for rollback)
Config changes applied: none
Flagged for manual review: none — all changes are additive features/fixes

Co-Authored-By: Service Upgrade Agent <noreply@viktorbarzin.me>
2026-04-16 16:34:48 +00:00
Viktor Barzin
1ea48c93e5 upgrade: owntracks 0.9.9 -> 1.0.1
Changelog summary:
- 1.0.0: POI inline image support, deprecate google maps in vmap.html, packaging fixes
- 1.0.1: ocat JSON array output fix, revgeo error messages, OpenBSD support, storage dir env fix

Risk: CAUTION (major version 0→1, but changes are benign — no schema/config/API breaking changes)
Breaking changes: none (deprecate keyword hit on vmap.html google maps — cosmetic only)
DB backup: skipped (not DB-backed)
Config changes applied: none required
Flagged for manual review: none

Co-Authored-By: Service Upgrade Agent <noreply@viktorbarzin.me>
2026-04-16 16:34:29 +00:00
Viktor Barzin
216d4240c9 [infra] Add Cloudflare provider to all stack lock files and generated providers
Terragrunt now generates cloudflare_provider.tf (Vault-sourced API key)
and includes cloudflare in required_providers. These are the generated
files from running `terragrunt init -upgrade` across all stacks.

[ci skip]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 16:31:36 +00:00
Viktor Barzin
541bee7176 state(ebooks): update encrypted state 2026-04-16 16:05:27 +00:00
Viktor Barzin
cf93f123f1 upgrade: audiobookshelf 2.32.1 -> 2.33.1
Changelog summary: Security fixes (IDOR vulnerabilities in sessions/progress/bookmarks),
DB index + query parallelization for discover performance, crash fixes, HTML sanitization
on playlist/collection/podcast endpoints, API key enabled/disabled fix.
Risk: SAFE
Breaking changes: none
DB backup: no (not DB-backed)
Config changes applied: none
Flagged for manual review: none

Co-Authored-By: Service Upgrade Agent <noreply@viktorbarzin.me>
2026-04-16 16:00:26 +00:00
root
af090c818b Woodpecker CI deploy [CI SKIP] 2026-04-16 13:46:08 +00:00
Viktor Barzin
b1d152be1f [infra] Auto-create Cloudflare DNS records from ingress_factory
## Context

Deploying new services required manually adding hostnames to
cloudflare_proxied_names/cloudflare_non_proxied_names in config.tfvars —
a separate file from the service stack. This was frequently forgotten,
leaving services unreachable externally.

## This change:

- Add `dns_type` parameter to `ingress_factory` and `reverse_proxy/factory`
  modules. Setting `dns_type = "proxied"` or `"non-proxied"` auto-creates
  the Cloudflare DNS record (CNAME to tunnel or A/AAAA to public IP).
- Simplify cloudflared tunnel from 100 per-hostname rules to wildcard
  `*.viktorbarzin.me → Traefik`. Traefik still handles host-based routing.
- Add global Cloudflare provider via terragrunt.hcl (separate
  cloudflare_provider.tf with Vault-sourced API key).
- Migrate 118 hostnames from centralized config.tfvars to per-service
  dns_type. 17 hostnames remain centrally managed (Helm ingresses,
  special cases).
- Update docs, AGENTS.md, CLAUDE.md, dns.md runbook.

```
BEFORE                          AFTER
config.tfvars (manual list)     stacks/<svc>/main.tf
        |                         module "ingress" {
        v                           dns_type = "proxied"
stacks/cloudflared/               }
  for_each = list                     |
  cloudflare_record               auto-creates
  tunnel per-hostname             cloudflare_record + annotation
```

## What is NOT in this change:

- Uptime Kuma monitor migration (still reads from config.tfvars)
- 17 remaining centrally-managed hostnames (Helm, special cases)
- Removal of allow_overwrite (keep until migration confirmed stable)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 13:45:04 +00:00
Viktor Barzin
95d2a6abf8 state(wealthfolio): update encrypted state 2026-04-16 11:30:59 +00:00
Viktor Barzin
e8874dd37a state(cloudflared): update encrypted state 2026-04-16 10:59:30 +00:00
Viktor Barzin
997fd4f85b state(linkwarden): update encrypted state 2026-04-16 10:35:35 +00:00
Viktor Barzin
2ae31148cb state(ytdlp): update encrypted state 2026-04-16 10:33:55 +00:00
Viktor Barzin
43b0316978 state(xray): update encrypted state 2026-04-16 10:33:39 +00:00
Viktor Barzin
f0e7de8e57 state(woodpecker): update encrypted state 2026-04-16 10:33:27 +00:00