Completes the ADR-0018 cutover. The stack is emptied to a tombstone so CI destroys nginx, the NFS content volume, the ingress, the per-site gdrive-sync CronJob and the namespace; serving + sync are owned by stacks/valia-sites since the cutover commits. Catalog + runbook updated to the migrated state (incl. the one-time 42.9→21.4MB video compression Viktor approved). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
98 lines
4.8 KiB
Markdown
98 lines
4.8 KiB
Markdown
# Valia sites — add / update / retire
|
||
|
||
Off-infra static sites authored by Valia (ADR-0018, CONTEXT.md "Valia site").
|
||
Serving: Cloudflare Pages. Freshness: the `valia-sites-sync` CronJob
|
||
(`valia-sites` ns) mirrors each Content folder every 10 minutes and deploys
|
||
only when the folder's manifest hash changed. Registry: `local.sites` in
|
||
`stacks/valia-sites/main.tf` — one entry per site drives everything (Pages
|
||
project, custom domain, public CNAME, internal split-horizon CNAME, sync).
|
||
|
||
Current sites: `bridge` (ОбУ „Отец Паисий“ — "мост"), `stem95su` (95. СУ STEM
|
||
board).
|
||
|
||
## Add a site
|
||
|
||
1. Valia shares the Drive folder with **vbarzin@gmail.com** (viewer is enough —
|
||
the pipeline is strictly read-only towards Drive).
|
||
2. Get the folder id from its URL (`drive.google.com/drive/folders/<ID>`).
|
||
3. Pick the **English** subdomain name (Viktor's call — CONTEXT.md naming rule).
|
||
4. Add one entry to `local.sites` in `stacks/valia-sites/main.tf`:
|
||
|
||
```hcl
|
||
<name> = {
|
||
folder_id = "<ID>"
|
||
src_path = "" # or "sub/folder" if servable files live deeper
|
||
entry_file = "index.html" # or whatever her main HTML file is called
|
||
manage_dns = true
|
||
}
|
||
```
|
||
|
||
5. Commit + push; CI applies. Within ~10 min the sync deploys content and the
|
||
site serves at `https://<name>.viktorbarzin.me` (custom-domain TLS takes
|
||
~5–10 min extra on first attach — CF returns 522 for the hostname until
|
||
then). Internal LAN/VLAN/pod resolution appears when the hourly
|
||
`technitium-ingress-dns-sync` next runs — trigger it early with:
|
||
`kubectl create job --from=cronjob/technitium-ingress-dns-sync valia-dns-now -n technitium`
|
||
|
||
## Content rules (what Valia's folder must look like)
|
||
|
||
- The **entry file** must exist — the sync stages a copy as `index.html` at
|
||
deploy time, so `/` works; the original filename keeps working too (deep
|
||
links survive). If the folder is empty or the entry file is missing, the
|
||
sync **skips the site and leaves it as-is** (never wipes a live site).
|
||
- Google-native files (Docs/Sheets) are **ignored** (`--drive-skip-gdocs`) —
|
||
only real files (`.html`, images, …) deploy. Gemini's HTML exports are fine.
|
||
- Per-file limit 25 MB (Cloudflare Pages), 20k files max — far beyond a
|
||
1-page site.
|
||
|
||
## Update a site
|
||
|
||
Nothing to do: Valia edits the folder, the site follows within ~10 minutes.
|
||
Force it early: `kubectl create job --from=cronjob/valia-sites-sync sync-now -n valia-sites`
|
||
|
||
## Rename / retire a site
|
||
|
||
Rename = retire + add (Pages projects can't be renamed). Retire:
|
||
|
||
1. Delete the entry from `local.sites`; commit + push. TF destroys the public
|
||
CNAME + custom domain + Pages project; the internal record is removed by
|
||
the next `technitium-ingress-dns-sync` run (its deletion pass drops any
|
||
internal `*.pages.dev` CNAME that left the `valia-sites-dns` ConfigMap —
|
||
scoped so it can never touch non-Pages records).
|
||
2. That's all — no manual DNS cleanup (the pre-ADR-0018 add-only gotcha is
|
||
fixed by the deletion pass).
|
||
|
||
## Failure modes / debugging
|
||
|
||
- **Visibility is failed-Job-only by choice** (ADR-0018): no alerts, no
|
||
notifications. Check: `kubectl get jobs -n valia-sites | tail`, logs of the
|
||
last `valia-sites-sync-*` pod.
|
||
- **Drive auth broken** (`FATAL … Drive list failed`): the shared
|
||
`secret/valia-sites.rclone_conf` token died. The GCP OAuth app
|
||
(`home-lab-1700868541205`) must stay published to "Production" or refresh
|
||
tokens expire weekly (same constraint as the old stem95su conf, which this
|
||
one was copied from). Re-mint and `vault kv patch secret/valia-sites
|
||
rclone_conf=@…`.
|
||
- **Wrangler auth broken**: `secret/valia-sites.cloudflare_pages_token` is a
|
||
SCOPED token (Pages Read+Write on the account, id
|
||
`355d2c9d11579bdad1e9498dafca30d5`) — re-mint via
|
||
`POST /user/tokens` with the Global API Key (`secret/platform`), patch
|
||
Vault. Do NOT put the Global API Key in the pod.
|
||
- **Site serves stale content**: check the state CM
|
||
(`kubectl get cm valia-sites-state -n valia-sites -o yaml`) — deleting a
|
||
site's key forces a redeploy on the next run.
|
||
- **`GUARD … skipping`** in logs: Valia's folder is empty or renamed the
|
||
entry file — the site deliberately kept its last content. Fix the folder or
|
||
update `entry_file`.
|
||
|
||
## History
|
||
|
||
- stem95su served in-cluster (nginx + NFS + its own rclone CronJob) until
|
||
2026-07-03, when it was cut over to this pattern and the old stack retired
|
||
(ADR-0018). The blocking 42.9 MB `stem_video.mp4` was compressed to 21.4 MB
|
||
(same 1080p, ~2.5 Mbps H.264) and replaced in Valia's folder with Viktor's
|
||
explicit one-time OK. `secret/stem95su` is superseded by
|
||
`secret/valia-sites`; `/srv/nfs/stem-site` on the PVE host is a harmless
|
||
leftover.
|
||
- bridge started as a hand-deployed wrangler experiment (2026-07-03, memory
|
||
id 7085) and was adopted into the stack the same day.
|