infra/docs/plans/2026-06-04-f1-stream-extraction-design.md


# f1-stream extraction + productionization — design (2026-06-04)
## Problem
`f1-stream` (FastAPI backend serving a SvelteKit SPA; ~15 pluggable stream
extractors + a Playwright/chrome-service playback verifier) lived **inside**
the infra monorepo at `infra/stacks/f1-stream/files/`. It had:
- no standalone repo — source coupled to the Terraform stack;
- **no real CI** — only a manual `redeploy.sh` doing a local `docker buildx`
push to DockerHub (`viktorbarzin/f1-stream`) + `kubectl rollout restart`;
- no README, no tests, a loose unpinned `requirements.txt`, no semver tags;
- a stale CI claim in docs ("migrated to GHA, Woodpecker repo id 10") that did
not match reality (no GHA workflow ever existed for it).
## Goal
Extract the app into its own Forgejo repo `viktor/f1-stream` and productionize
it, mirroring the established owned-app pattern (`tuya_bridge`, `job-hunter`,
`tripit`, `travel-agent`).
## Decisions (with rationale)
- **Registry → Forgejo private** (`forgejo.viktorbarzin.me/viktor/f1-stream`),
matching the fleet standard. Needs the `registry-credentials` pull secret
(Kyverno-synced to every namespace) on the deployment.
- **Packaging → Poetry + ruff + mypy** (replaces the loose pip
`requirements.txt`). Python **package stays `backend`** — imports are
`from backend.x` and the entrypoint is `uvicorn backend.main:app`; renaming
would churn every module + the Dockerfile + the staticfiles path. Python
**3.13 kept** (the live image already runs it; tripit's 3.12 pin is for
zxing-cpp/pymupdf, which f1-stream lacks).
- **Tests → pragmatic pure-logic only**. The extractors + verifier are
network/browser-bound; full coverage is brittle. Unit-test the deterministic
core: `m3u8_rewriter` (incl. the EXT-X tag rewriters), the `proxy` HLS
parsers, `schedule` parsing/status, the extractor `registry`. 63 tests.
- **CI → single `.woodpecker.yml`**: `lint-and-test` (ruff + mypy + pytest on
`python:3.13-slim`) → `build-and-push` (buildx → Forgejo, tags `latest` +
`${CI_COMMIT_SHA:0:8}`) → `deploy` (`kubectl set image` + `rollout status`).
**Keel stays enrolled** as a redundant safety net. This combines the
`tuya_bridge` "build drives the rollout" model with a `travel-agent`-style
test gate.
  - A Slack-notify step was prototyped but **dropped**: the
    `environment: { from_secret }` form is rejected by this Woodpecker
    version's pipeline-struct decoder (`yaml: did not find expected key`), and
    the canonical owned-app refs (`tuya_bridge`, `job-hunter`) have no Slack
    step. Deploy success is confirmed by `rollout status`.
- **Versioning → first git tag `v2.0.1`** (continuity with the existing image
lineage; a fresh `v0.1.0` on a production 2.x app would mislead
monitoring/homepage). Deviates deliberately from the `v0.1.0` precedent of
tripit/travel-agent.
- **Runtime stays root** (matching the prior working image) to avoid a
non-root regression on the `/data` NFS write path and the Playwright browser
cache. Non-root is a possible future hardening.
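
To make the "pragmatic pure-logic only" testing decision concrete, here is a minimal sketch of the kind of test it implies. The `rewrite_playlist` function, its signature, and the `/proxy?url=` route are hypothetical stand-ins, not the actual `m3u8_rewriter` API. The sketch only shows that playlist rewriting is a deterministic string transform that can be tested without a network or a browser.

```python
import re
from urllib.parse import quote, urljoin

# Hypothetical stand-in for the real m3u8_rewriter; the function name,
# signature, and proxy route are assumptions made for illustration only.
_URI_ATTR = re.compile(r'URI="([^"]+)"')


def rewrite_playlist(text: str, base_url: str, proxy_prefix: str) -> str:
    """Route every segment URI and every EXT-X tag URI="..." through a proxy."""

    def proxied(uri: str) -> str:
        # Resolve relative URIs against the playlist URL, then percent-encode.
        return proxy_prefix + quote(urljoin(base_url, uri), safe="")

    out = []
    for line in text.splitlines():
        if line.startswith("#"):
            # Tags such as EXT-X-KEY or EXT-X-MAP may carry a URI attribute.
            line = _URI_ATTR.sub(lambda m: f'URI="{proxied(m.group(1))}"', line)
        elif line.strip():
            # Any non-blank, non-tag line is a segment or variant URI.
            line = proxied(line.strip())
        out.append(line)
    return "\n".join(out)


def test_rewrites_segments_and_key_uri() -> None:
    playlist = (
        "#EXTM3U\n"
        '#EXT-X-KEY:METHOD=AES-128,URI="key.bin"\n'
        "#EXTINF:4.0,\n"
        "seg0.ts\n"
    )
    result = rewrite_playlist(
        playlist, "https://cdn.example/live/index.m3u8", "/proxy?url="
    )
    lines = result.splitlines()
    assert lines[0] == "#EXTM3U"  # tags without a URI pass through unchanged
    assert lines[1] == (
        '#EXT-X-KEY:METHOD=AES-128,'
        'URI="/proxy?url=https%3A%2F%2Fcdn.example%2Flive%2Fkey.bin"'
    )
    assert lines[3] == "/proxy?url=https%3A%2F%2Fcdn.example%2Flive%2Fseg0.ts"
```

A test of this shape runs under plain `pytest` in the `lint-and-test` step and needs no fixtures beyond an inline playlist string.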
## Terraform delta (the only infra change)
`infra/stacks/f1-stream/main.tf`:
- image `viktorbarzin/f1-stream:latest` (DockerHub) →
`forgejo.viktorbarzin.me/viktor/f1-stream:${var.image_tag}` (new
`var.image_tag`, default `latest`);
- add `image_pull_secrets { name = "registry-credentials" }` to the pod spec;
- delete `files/` (source now lives in the standalone repo) and `redeploy.sh`.

The image field is in the deployment's `ignore_changes` (KEEL_IGNORE_IMAGE), so
the live tag is managed by CI/Keel, not Terraform. Everything else is
unchanged: namespace, ExternalSecrets (`f1-stream-secrets`,
`chrome-service-client-secrets`), NFS data volume, Anubis PoW policy,
`ingress_factory`, homepage + x402 annotations, and the Discord +
chrome-service env.
## Blast radius
- The `f1-stream` K8s service is the only consumer; no other stack references
`viktorbarzin/f1-stream` or the `files/` dir (verified: no `path.module` /
`archive_file` / `null_resource` references the dir).
- Adding `imagePullSecrets` triggers one Recreate rollout that pulls the
*current* (still-DockerHub, public) image — safe; CI then switches it to the
Forgejo image.
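
The cross-stack reference check could be reproduced with a small script along the following lines. This is a sketch under assumptions: the layout (one directory per stack under a common root, with `*.tf` files inside) and the exact search strings are illustrative, and the original verification may well have been a plain `grep`.

```python
from pathlib import Path

# Search strings taken from the check described above; treating them as
# plain substrings (not regexes) is an assumption for this sketch.
PATTERNS = ("viktorbarzin/f1-stream", "f1-stream/files")


def find_references(
    stacks_root: Path, own_stack: str = "f1-stream"
) -> list[tuple[Path, int, str]]:
    """Return (file, line number, line) for every match outside own_stack."""
    hits = []
    for tf_file in sorted(stacks_root.rglob("*.tf")):
        if own_stack in tf_file.relative_to(stacks_root).parts:
            continue  # the stack's own files are expected to mention itself
        lines = tf_file.read_text().splitlines()
        for lineno, line in enumerate(lines, start=1):
            if any(pattern in line for pattern in PATTERNS):
                hits.append((tf_file, lineno, line.strip()))
    return hits
```

An empty result from `find_references(Path("infra/stacks"))` would match the claim that no other stack depends on the DockerHub image or on the `files/` directory.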