Commit graph

24 commits

Author SHA1 Message Date
github-actions[bot]
b45c45e419 priority-pass: bump image_tag to 88f18e53 [ci skip]
Auto-committed by ViktorBarzin/priority-pass GHA on push to main.
Source: 88f18e532b
2026-05-05 21:13:14 +00:00
Viktor Barzin
fb454e16d5 priority-pass: parameterise image_tag via var pattern (matches job-hunter)
Adopts the always-latest convention used by job-hunter, payslip-ingest,
and fire-planner: image SHA lives in stacks/priority-pass/terragrunt.hcl
inputs, default in main.tf var. The priority-pass GHA build workflow
auto-commits new SHAs to this file on every successful push.

- Add `variable "image_tag"` (default = current value 7c01448d).
- Both containers now use `local.{frontend,backend}_image` interpolation.
- Replace symlinked terragrunt.hcl with a real file so the stack-local
  inputs block can override image_tag (mirrors payslip-ingest exactly).

State note: priority-pass TF state is currently empty (Tier 1 PG migration
skipped this stack). A subsequent `terragrunt import` is required to
adopt the live deployment + namespace + ingress before running apply.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-05 21:03:46 +00:00
Viktor Barzin
c4c5057edc priority-pass: pin to backend 7c01448d (transplant QR into golden-position container)
[ci skip]
2026-05-05 19:14:11 +00:00
Viktor Barzin
86385f5842 priority-pass: pin to DockerHub viktorbarzin/* (GHA-built, sha 50a432ad)
Now that priority-pass has its own GitHub repo (ViktorBarzin/priority-pass)
with a working GHA build pipeline that pushes to DockerHub, switch the TF
deployment pin from registry.viktorbarzin.me to docker.io/viktorbarzin so
future automated rollouts (once the repo is registered with Woodpecker)
land on the matching image source.

[ci skip]
2026-05-01 19:27:33 +00:00
Viktor Barzin
d76b5dbc4b priority-pass: backend c2b4ac50 — crop to card before transforming
Three fixes for boarding passes uploaded as iPhone screenshots (input
includes phone status bar, partial Tesco card below, etc.):

1. Detect the card region first and crop to it. All proportional
   coordinates (Step 8 text replacement, Step 9 logo removal) are now
   card-relative instead of full-image-relative — they were landing
   in the wrong region on tall screenshots, putting "Priority" text
   inside the QR area and leaving a yellow icon box at the bottom.

2. Step 8 now picks the LONGEST contiguous dark-row run inside a wider
   y-band, instead of using the dark-row [first, last] span. This
   distinguishes the QUEUE value text from the QUEUE label above it
   (both are dark blue in the original) so the erase rectangle no
   longer eats into the labels.

3. QR container padding bumped 8% → 12% so QR/container ratio matches
   the ~74-80% golden look.

Verified end-to-end against three real samples saved by the previous
build's training-data feature, plus the original non-priority.jpeg
fixture: outputs now match priority.jpeg layout.

[ci skip]
2026-05-01 19:06:02 +00:00
Viktor Barzin
dfbf6faf3d priority-pass: backend f4246691 (QR fit fix + persist uploads), add encrypted PVC
Backend changes:
- transformers.py: QR container now sized to actual qr_bbox + 8% padding
  (was fixed at 45% of card width). When QR was wider than 45% of card,
  the leftover-pixel branch color-remapped QR pixels outside the
  container, breaking the scan. New container always encloses qr_mask.
- main.py: persist input + output + json metadata under
  $UPLOAD_DIR/<airline>/<ts-uuid>-{input.<ext>,output.png,*.json} for
  future training. Failure to save is logged, never breaks the API.

Infra:
- New PVC priority-pass-uploads (1Gi proxmox-lvm-encrypted, 10Gi
  autoresize cap) — encrypted because boarding passes contain PII.
- Deployment strategy → Recreate (RWO requirement).
- Volume + volumeMount + UPLOAD_DIR env on backend container.

Applied via kubectl (TF state for this stack is empty — see prior
commit). New pod priority-pass-77956b64fb rolled out, PVC bound,
test transform succeeded, sample written to /data/uploads/ryanair/.

[ci skip]
2026-05-01 18:50:51 +00:00
Viktor Barzin
ce7a584801 priority-pass: frontend ea9176f8 (gallery upload), sync backend pin to live
Frontend bug fix: photo input forced camera on mobile via
capture="environment". Added separate "Choose from Gallery" / "Take
Photo" buttons so users can pick from their photo library.

Backend image unchanged; pin synced from stale v8 to live SHA
ae1420a0 (the v8 tag was never pushed to the registry).

Image built locally and pushed to registry.viktorbarzin.me. The
priority-pass project directory isn't under git, so deployment was
applied via kubectl set image (matches existing pattern — TF state
for this stack is empty).

[ci skip]
2026-05-01 18:38:30 +00:00
Viktor Barzin
8b43692af0 [infra] Suppress Goldilocks vpa-update-mode label drift on all namespaces [ci skip]
## Context

Wave 3B-continued: the Goldilocks VPA dashboard (stacks/vpa) runs a Kyverno
ClusterPolicy `goldilocks-vpa-auto-mode` that mutates every namespace with
`metadata.labels["goldilocks.fairwinds.com/vpa-update-mode"] = "off"`. This
is intentional — Terraform owns container resource limits, and Goldilocks
should only provide recommendations, never auto-update. The label is how
Goldilocks decides per-namespace whether to run its VPA in `off` mode.

Effect on Terraform: every `kubernetes_namespace` resource shows the label
as pending-removal (`-> null`) on every `scripts/tg plan`. Dawarich survey
2026-04-18 confirmed the drift. Cluster-side count: 88 namespaces carry the
label (`kubectl get ns -o json | jq ... | wc -l`). Every TF-managed namespace
is affected.

This commit brings the intentional admission drift under the same
`# KYVERNO_LIFECYCLE_V1` discoverability marker introduced in c9d221d5 for
the ndots dns_config pattern. The marker now stands generically for any
Kyverno admission-webhook drift suppression; the inline comment records
which specific policy stamps which specific field so future grep audits
show why each suppression exists.

## This change

107 `.tf` files touched — every stack's `resource "kubernetes_namespace"`
resource gets:

```hcl
lifecycle {
  # KYVERNO_LIFECYCLE_V1: goldilocks-vpa-auto-mode ClusterPolicy stamps this label on every namespace
  ignore_changes = [metadata[0].labels["goldilocks.fairwinds.com/vpa-update-mode"]]
}
```

Injection was done with a brace-depth-tracking Python pass (`/tmp/add_goldilocks_ignore.py`):
match `^resource "kubernetes_namespace" ` → track `{` / `}` until the
outermost closing brace → insert the lifecycle block before the closing
brace. The script is idempotent (skips any file that already mentions
`goldilocks.fairwinds.com/vpa-update-mode`) so re-running is safe.

Vault stack picked up 2 namespaces in the same file (k8s-users produces
one, plus a second explicit ns) — confirmed via file diff (+8 lines).

## What is NOT in this change

- `stacks/trading-bot/main.tf` — entire file is `/* … */` commented out
  (paused 2026-04-06 per user decision). Reverted after the script ran.
- `stacks/_template/main.tf.example` — per-stack skeleton, intentionally
  minimal. User keeps it that way. Not touched by the script (file
  has no real `resource "kubernetes_namespace"` — only a placeholder
  comment).
- `.terraform/` copies (e.g. `stacks/metallb/.terraform/modules/...`) —
  gitignored, won't commit; the live path was edited.
- `terraform fmt` cleanup of adjacent pre-existing alignment issues in
  authentik, freedify, hermes-agent, nvidia, vault, meshcentral. Reverted
  to keep the commit scoped to the Goldilocks sweep. Those files will
  need a separate fmt-only commit or will be cleaned up on next real
  apply to that stack.

## Verification

Dawarich (one of the hundred-plus touched stacks) showed the pattern
before and after:

```
$ cd stacks/dawarich && ../../scripts/tg plan

Before:
  Plan: 0 to add, 2 to change, 0 to destroy.
   # kubernetes_namespace.dawarich will be updated in-place
     (goldilocks.fairwinds.com/vpa-update-mode -> null)
   # module.tls_secret.kubernetes_secret.tls_secret will be updated in-place
     (Kyverno generate.* labels — fixed in 8d94688d)

After:
  No changes. Your infrastructure matches the configuration.
```

Injection count check:
```
$ rg -c 'KYVERNO_LIFECYCLE_V1: goldilocks-vpa-auto-mode' stacks/ | awk -F: '{s+=$2} END {print s}'
108
```

## Reproduce locally
1. `git pull`
2. Pick any stack: `cd stacks/<name> && ../../scripts/tg plan`
3. Expect: no drift on the namespace's goldilocks.fairwinds.com/vpa-update-mode label.

Closes: code-dwx

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 21:15:27 +00:00
Viktor Barzin
c9d221d578 [infra] Establish KYVERNO_LIFECYCLE_V1 drift-suppression convention [ci skip]
## Context

Phase 1 of the state-drift consolidation audit (plan Wave 3) identified that
the entire repo leans on a repeated `lifecycle { ignore_changes = [...dns_config] }`
snippet to suppress Kyverno's admission-webhook dns_config mutation (the ndots=2
override that prevents NxDomain search-domain flooding). 27 occurrences across
19 stacks. Without this suppression, every pod-owning resource shows perpetual
TF plan drift.

The original plan proposed a shared `modules/kubernetes/kyverno_lifecycle/`
module emitting the ignore-paths list as an output that stacks would consume in
their `ignore_changes` blocks. That approach is architecturally impossible:
Terraform's `ignore_changes` meta-argument accepts only static attribute paths
— it rejects module outputs, locals, variables, and any expression (the HCL
spec evaluates `lifecycle` before the regular expression graph). So a DRY
module cannot exist. The canonical pattern IS the repeated snippet.

What the snippet was missing was a *discoverability tag* so that (a) new
resources can be validated for compliance, (b) the existing 27 sites can be
grep'd in a single command, and (c) future maintainers understand the
convention rather than each reinventing it.

## This change

- Introduces `# KYVERNO_LIFECYCLE_V1` as the canonical marker comment.
  Attached inline on every `spec[0].template[0].spec[0].dns_config` line
  (or `spec[0].job_template[0].spec[0]...` for CronJobs) across all 27
  existing suppression sites.
- Documents the convention with rationale and copy-paste snippets in
  `AGENTS.md` → new "Kyverno Drift Suppression" section.
- Expands the existing `.claude/CLAUDE.md` Kyverno ndots note to reference
  the marker and explain why the module approach is blocked.
- Updates `_template/main.tf.example` so every new stack starts compliant.

## What is NOT in this change

- The `kubernetes_manifest` Kyverno annotation drift (beads `code-seq`)
  — that is Phase B with a sibling `# KYVERNO_MANIFEST_V1` marker.
- Behavioral changes — every `ignore_changes` list is byte-identical
  save for the inline comment.
- The fallback module the original plan anticipated — skipped because
  Terraform rejects expressions in `ignore_changes`.
- `terraform fmt` cleanup on adjacent unrelated blocks in three files
  (claude-agent-service, freedify/factory, hermes-agent). Reverted to
  keep this commit scoped to the convention rollout.

## Before / after

Before (cannot distinguish accidental-forgotten from intentional-convention):
```hcl
lifecycle {
  ignore_changes = [spec[0].template[0].spec[0].dns_config]
}
```

After (greppable, self-documenting, discoverable by tooling):
```hcl
lifecycle {
  ignore_changes = [spec[0].template[0].spec[0].dns_config] # KYVERNO_LIFECYCLE_V1
}
```

## Test Plan

### Automated
```
$ rg -c 'KYVERNO_LIFECYCLE_V1' stacks/ --include='*.tf' --include='*.tf.example' \
    | awk -F: '{s+=$2} END {print s}'
27

$ git diff --stat | grep -E '\.(tf|tf\.example|md)$' | wc -l
21

# All code-file diffs are 1 insertion + 1 deletion per marker site,
# except beads-server (3), ebooks (4), immich (3), uptime-kuma (2).
$ git diff --stat stacks/ | tail -1
20 files changed, 45 insertions(+), 28 deletions(-)
```

### Manual Verification

No apply required — HCL comments only. Zero effect on any stack's plan output.
Future audits: `rg 'KYVERNO_LIFECYCLE_V1' stacks/ | wc -l` must grow as new
pod-owning resources are added.

## Reproduce locally
1. `cd infra && git pull`
2. `rg 'KYVERNO_LIFECYCLE_V1' stacks/` → expect 27 hits in 19 files
3. Grep any new `kubernetes_deployment` for the marker; absence = missing
   suppression.

Closes: code-28m

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 14:15:51 +00:00
Viktor Barzin
e80b2f026f [infra] Migrate Terraform state from local SOPS to PostgreSQL backend
Two-tier state architecture:
- Tier 0 (infra, platform, cnpg, vault, dbaas, external-secrets): local
  state with SOPS encryption in git — unchanged, required for bootstrap.
- Tier 1 (105 app stacks): PostgreSQL backend on CNPG cluster at
  10.0.20.200:5432/terraform_state with native pg_advisory_lock.

Motivation: multi-operator friction (every workstation needed SOPS + age +
git-crypt), bootstrap complexity for new operators, and headless agents/CI
needing the full encryption toolchain just to read state.

Changes:
- terragrunt.hcl: conditional backend (local vs pg) based on tier0 list
- scripts/tg: tier detection, auto-fetch PG creds from Vault for Tier 1,
  skip SOPS and Vault KV locking for Tier 1 stacks
- scripts/state-sync: tier-aware encrypt/decrypt (skips Tier 1)
- scripts/migrate-state-to-pg: one-shot migration script (idempotent)
- stacks/vault/main.tf: pg-terraform-state static role + K8s auth role
  for claude-agent namespace
- stacks/dbaas: terraform_state DB creation + MetalLB LoadBalancer
  service on shared IP 10.0.20.200
- Deleted 107 .tfstate.enc files for migrated Tier 1 stacks
- Cleaned up per-stack tiers.tf (now generated by root terragrunt.hcl)

[ci skip]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 19:33:12 +00:00
root
af090c818b Woodpecker CI deploy [CI SKIP] 2026-04-16 13:46:08 +00:00
Viktor Barzin
b1d152be1f [infra] Auto-create Cloudflare DNS records from ingress_factory
## Context

Deploying new services required manually adding hostnames to
cloudflare_proxied_names/cloudflare_non_proxied_names in config.tfvars —
a separate file from the service stack. This was frequently forgotten,
leaving services unreachable externally.

## This change:

- Add `dns_type` parameter to `ingress_factory` and `reverse_proxy/factory`
  modules. Setting `dns_type = "proxied"` or `"non-proxied"` auto-creates
  the Cloudflare DNS record (CNAME to tunnel or A/AAAA to public IP).
- Simplify cloudflared tunnel from 100 per-hostname rules to wildcard
  `*.viktorbarzin.me → Traefik`. Traefik still handles host-based routing.
- Add global Cloudflare provider via terragrunt.hcl (separate
  cloudflare_provider.tf with Vault-sourced API key).
- Migrate 118 hostnames from centralized config.tfvars to per-service
  dns_type. 17 hostnames remain centrally managed (Helm ingresses,
  special cases).
- Update docs, AGENTS.md, CLAUDE.md, dns.md runbook.

```
BEFORE                          AFTER
config.tfvars (manual list)     stacks/<svc>/main.tf
        |                         module "ingress" {
        v                           dns_type = "proxied"
stacks/cloudflared/               }
  for_each = list                     |
  cloudflare_record               auto-creates
  tunnel per-hostname             cloudflare_record + annotation
```

## What is NOT in this change:

- Uptime Kuma monitor migration (still reads from config.tfvars)
- 17 remaining centrally-managed hostnames (Helm, special cases)
- Removal of allow_overwrite (keep until migration confirmed stable)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 13:45:04 +00:00
Viktor Barzin
6ee5b70a36 priority-pass: update backend to v8 (expanded QR container margins) 2026-04-06 13:22:27 +03:00
Viktor Barzin
feeed5ac35 priority-pass: update backend to v7 (square QR container) 2026-04-06 13:16:35 +03:00
Viktor Barzin
0162b4f130 priority-pass: update backend to v6 (remove edge artifact scratches) 2026-04-06 13:09:45 +03:00
Viktor Barzin
c38ae944fc priority-pass: update backend to v5 (QR container sizing fix) 2026-04-06 13:06:18 +03:00
Viktor Barzin
1d7244e47a priority-pass: update backend to v4 (QR container clipping fix) 2026-04-06 13:00:33 +03:00
Viktor Barzin
0c44e11146 priority-pass: update backend to v3 (QR container layout fix) 2026-04-06 12:57:08 +03:00
Viktor Barzin
75b18717a1 priority-pass: update backend to v2 (QR code preservation fix) 2026-04-06 12:53:45 +03:00
Viktor Barzin
ef6f57e82c priority-pass: update frontend image to v5 (clipboard paste support) 2026-04-06 12:44:19 +03:00
Viktor Barzin
3676cdbeeb state(technitium): update encrypted state 2026-04-06 12:40:55 +03:00
Viktor Barzin
1f4e8cb278 use registry.viktorbarzin.me hostname for private images + protect ingress
- Switch priority-pass images from 10.0.20.10:5050 to registry.viktorbarzin.me
- Add containerd hosts.toml for registry.viktorbarzin.me on all nodes + template
  (redirects to 10.0.20.10:5050 LAN direct, avoids Traefik round-trip)
- Enable Authentik protection on priority-pass ingress
2026-03-23 01:02:27 +02:00
Viktor Barzin
e9919d8fc9 fix priority-pass: bump backend memory to 512Mi (OOM with OpenCV) 2026-03-23 00:58:39 +02:00
Viktor Barzin
0674d6e538 deploy priority-pass app to cluster via private registry
- SvelteKit frontend + FastAPI backend in single pod with sidecar pattern
- Images pushed to 10.0.20.10:5050 private registry (v4/v1)
- SvelteKit server route proxies /api/transform to backend on 127.0.0.1:8000
- Exposed at priority-pass.viktorbarzin.me (Cloudflare-proxied, no auth)
- Uses imagePullSecrets for authenticated registry pulls
2026-03-23 00:55:41 +02:00