4740 commits

Author SHA1 Message Date
Viktor Barzin
1a63fee4e4 cloudflare: drop 6 dead legacy DNS names (zone at Free-plan 200-record cap)
Some checks failed
ci/woodpecker/push/default Pipeline failed
authelia, immich-powertools, loki, mcaptcha, nfty, whiteboard removed from
cloudflare_proxied_names — all verified dead (no HTTP response, no cluster
route; authelia superseded by Authentik, nfty was a typo of ntfy, whiteboard
was excalidraw's old name). The cap blocked the new drone-logbook stack's
dronelog record (Cloudflare error 81045). Records already destroyed via
targeted local apply; Viktor approved the removal. Zone now at 195/200.
2026-07-04 09:31:32 +00:00
Viktor Barzin
7e49bf394d Merge remote-tracking branch 'forgejo/master' into wizard/drone-logbook
Some checks failed
ci/woodpecker/push/default Pipeline failed
2026-07-04 08:44:04 +00:00
Viktor Barzin
c52cdd1f68 Merge branch 'master' of https://forgejo.viktorbarzin.me/viktor/infra
Some checks failed
ci/woodpecker/push/default Pipeline was canceled
2026-07-04 08:43:55 +00:00
Viktor Barzin
c868ef3332 nfs_directories: add drone-logbook sync-logs + backup dirs
Drop folder for the new drone-logbook stack's auto-import (SYNC_LOGS_PATH)
and its daily backup target. Both created on 192.168.1.127 (root:www-data,
2777 — root-squash-writable like vaultwarden-backup).
2026-07-04 08:43:38 +00:00
Viktor Barzin
50778d47d3 drone-logbook: new stack — self-hosted Open DroneLog at dronelog.viktorbarzin.me
Viktor asked to self-host the DJI flight-log analyzer for his DJI Mini 4 Pro
(his fork ViktorBarzin/drone-logbook -> upstream arpanghosh8453/open-dronelog).
Upstream ghcr image with Keel auto-upgrade, DuckDB data on an encrypted
proxmox-lvm PVC (GPS traces = sensitive), NFS /sync-logs drop folder imported
every 8h, daily backup CronJob to /srv/nfs/drone-logbook-backup (vaultwarden
pattern), Authentik-gated ingress, PROFILE_CREATION_PASS from Vault via ESO.
Design + plan in docs/plans/; service-catalog updated.
2026-07-04 08:42:53 +00:00
Viktor Barzin
d9717a53bf vault-token-renew runbook: document the self-heal behavior
All checks were successful
ci/woodpecker/push/default Pipeline was successful
Drift guard section rewritten: admin-capable clobbers now self-heal at the
nightly run (HEALED log line); weak clobbers keep the loud DRIFT failure;
manual re-mint is only the weak-clobber recovery now.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 20:20:44 +00:00
Viktor Barzin
4a7b6db806 vault-token-renew: self-heal the periodic token on admin-capable clobber
Viktor asked for 'vault login -method=oidc' to work seamlessly: the OIDC
login the docs prescribe kept clobbering ~/.vault-token with a 7-day token,
and detect-only DRIFT failures went unnoticed for weeks (weekly-expiry
loop, twice in June). On drift the renewer now re-mints the periodic token
with the clobbering token's own authority (Vault's 403 is the judge — no
policy guessing), sanity-checks it, replaces the file atomically, and
revokes stale token-devvm-wizard leftovers. Weak/read-only clobbers still
fail loudly on purpose. Design: docs/plans/2026-07-03-vault-token-self-heal-design.md

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 20:20:00 +00:00
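Commit 4a7b6db806 says the renewer "replaces the file atomically". The usual way to do that is to write a temporary file in the same directory and rename it over the target. The sketch below shows that pattern in Python; the function name and the temp-file prefix are hypothetical, and the real renewer may well be a shell script that does this differently.

```python
import os
import tempfile


def replace_token_file_atomically(path: str, token: str) -> None:
    """Write `token` to `path` so readers see either the old or the new content.

    Hypothetical helper: the temp file lives in the same directory as the
    target so that os.replace() is a same-filesystem rename.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # mkstemp creates the file with mode 0600, which suits a token file.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".vault-token.")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(token)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes are on disk first
        os.replace(tmp_path, path)  # atomic rename over the old file
    except BaseException:
        # Do not leave a half-written temp file behind on failure.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

Because the rename is the only step that touches the target, a crash before it leaves the old token file unchanged.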
Viktor Barzin
8631709ca2 vault-token-renew: pure helpers for the self-heal revoke filter
vtr_accessor parses the accessor from lookup JSON; vtr_is_stale_periodic
decides which old token-devvm-wizard tokens a heal may revoke (never the
just-minted one, never foreign tokens, nothing when the keeper is unknown).
TDD red-green for the heal branch that lands next.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 20:19:09 +00:00
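The two pure helpers named in commit 8631709ca2 can be pictured roughly as follows. This is a Python sketch of the rules the message states (never the just-minted token, never foreign tokens, nothing when the keeper is unknown), not the repository's code. The argument names are assumptions, and it assumes the lookup JSON has the `data.accessor` and `data.display_name` shape printed by `vault token lookup -format=json`.

```python
import json
from typing import Optional

# Display name taken from the commit message.
PERIODIC_DISPLAY_NAME = "token-devvm-wizard"


def vtr_accessor(lookup_json: str) -> Optional[str]:
    """Return data.accessor from token-lookup JSON, or None if it is absent."""
    try:
        doc = json.loads(lookup_json)
    except json.JSONDecodeError:
        return None
    if not isinstance(doc, dict):
        return None
    data = doc.get("data")
    if not isinstance(data, dict):
        return None
    accessor = data.get("accessor")
    return accessor if isinstance(accessor, str) and accessor else None


def vtr_is_stale_periodic(
    candidate_accessor: Optional[str],
    candidate_display_name: Optional[str],
    keeper_accessor: Optional[str],
) -> bool:
    """Decide whether a heal may revoke the candidate token."""
    if not keeper_accessor:
        return False  # keeper unknown: revoke nothing
    if not candidate_accessor:
        return False  # cannot identify the candidate
    if candidate_display_name != PERIODIC_DISPLAY_NAME:
        return False  # foreign token: never touch it
    return candidate_accessor != keeper_accessor  # never the just-minted one
```

Keeping both functions free of I/O is what makes the "pure-function tests first" order in the plan possible.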
Viktor Barzin
029b65ff93 state(vault): update encrypted state
2026-07-03 20:14:54 +00:00
Viktor Barzin
c48ce73c80 state(vault): update encrypted state
2026-07-03 20:14:35 +00:00
Viktor Barzin
b03a295397 state(vault): update encrypted state
2026-07-03 20:14:18 +00:00
Viktor Barzin
a07a603b80 docs/plans: vault-token self-heal implementation plan
Task-by-task TDD plan for the approved self-heal design: pure-function
tests first, then the heal branch, runbook update, deploy + live clobber
simulation, landing and memory updates.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 20:09:36 +00:00
Viktor Barzin
e2bfb20c84 docs/plans: vault-token self-heal design (devvm renewer)
Viktor asked to make 'vault login -method=oidc' work seamlessly on devvm:
today any OIDC login clobbers the permanent periodic token in
~/.vault-token, the drift guard only logs the drift, and his access
effectively expires weekly. Approved design: the nightly renewer re-mints
the periodic token from any admin-capable clobber (weak clobbers keep
failing loudly) and revokes stale periodic tokens after each heal.
Implementation follows on this branch.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 20:02:53 +00:00
Viktor Barzin
6698018ab6 service-catalog: add tasks row + tasks to the proxied-domains list
Some checks failed
ci/woodpecker/push/default Pipeline failed
Docs-with-change convention: the new tasks stack (Reminders-style PWA over
Nextcloud CalDAV) gets its catalog entry — what it is, its CNPG db + Vault
static role, the auth=required/X-authentik-username trust model with the
SEC-1 NetworkPolicy, and the ADR-0002 CI/CD path — and tasks joins the
Cloudflare proxied hostname list.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 19:53:42 +00:00
Viktor Barzin
02640df620 stacks/tasks: new stack for the tasks PWA (Authentik-gated, CNPG-backed)
Deploys the Reminders-style tasks app at tasks.viktorbarzin.me: namespace,
ExternalSecrets (fernet_key from secret/tasks; TASKS_DB_DSN composed from
the pg-tasks static-creds password the tripit way), single-replica
Deployment of ghcr.io/viktorbarzin/tasks:latest (image ignore_changes per
the fleet set-image pattern; Reloader restarts it on the 7-day DB password
rotation; /healthz probes on 8000; Europe/Sofia local tz; DEV_USER
deliberately absent — security invariant), Service on 8000, and an
ingress_factory host with auth=required + dns_type=proxied since Authentik
forward-auth is the app's only gate. NetworkPolicy tasks-ingress (SEC-1)
limits pod ingress to the traefik namespace plus monitoring on 8000 for
/metrics, so the trusted X-authentik-username header cannot be spoofed by
other pods.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 19:53:27 +00:00
Viktor Barzin
e0db1054e7 dbaas+vault: provision tasks CNPG database, role and rotating password
The new tasks PWA (Reminders-style front-end over Nextcloud CalDAV, per
tasks/docs/2026-07-03-tasks-pwa-design.md) needs its own Postgres database
for Connected Accounts and sync state. Follows the tripit/job_hunter
pattern exactly: idempotent null_resource creates role+db on the CNPG
primary with a placeholder password, and the Vault database engine static
role pg-tasks (added to the postgresql connection allowed_roles) rotates
the real password every 7 days, consumed by the tasks stack via a
vault-database ExternalSecret.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 19:53:13 +00:00
Viktor Barzin
9dcd3b0d5d Merge remote-tracking branch 'forgejo/master' into wizard/stem95su-cutover
All checks were successful
ci/woodpecker/push/default Pipeline was successful
2026-07-03 15:27:04 +00:00
Viktor Barzin
5367d4a055 paperless-mail-ingest: rules process inline attachments (Apple Mail lesson)
All checks were successful
ci/woodpecker/push/default Pipeline was successful
Viktor's first real forward carried the invoice PDF with
Content-Disposition: inline (Apple Mail does this for real documents),
and the attachments-only rules consumed nothing — recorded
PROCESSED_WO_CONSUMPTION, which also blocks reprocessing. Flipped all 5
rules to attachment_type=2 (process inline) via the API and documented
the trade-off + the ProcessedMail unblock step.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 15:25:44 +00:00
Viktor Barzin
21c6e7112e stem95su: retire the in-cluster serving stack — now a Valia site on Pages
Completes the ADR-0018 cutover. The stack is emptied to a tombstone so
CI destroys nginx, the NFS content volume, the ingress, the per-site
gdrive-sync CronJob and the namespace; serving + sync are owned by
stacks/valia-sites since the cutover commits. Catalog + runbook updated
to the migrated state (incl. the one-time 42.9→21.4MB video compression
Viktor approved).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 15:22:32 +00:00
Viktor Barzin
974c9976e3 valia-sites: take over stem95su DNS (manage_dns=true) — cutover half 2
All checks were successful
ci/woodpecker/push/default Pipeline was successful
Creates the public proxied CNAME stem95su -> stem95su.pages.dev and
adds the internal split-horizon entry via the valia-sites-dns
ConfigMap (the sync's update pass repoints the existing internal
record). Completes the ADR-0018 cutover; the old in-cluster serving
stack is retired in a follow-up.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 15:21:18 +00:00
Viktor Barzin
5c8e9daabd stem95su: release the public CNAME (dns_type=none) for the Pages cutover
All checks were successful
ci/woodpecker/push/default Pipeline was successful
First half of the ADR-0018 stem95su cutover: the tunnel-target CNAME is
destroyed so stacks/valia-sites can create the Pages-target record for
the same name (Cloudflare allows one CNAME per name; the follow-up
commit flips manage_dns=true there). stem_video.mp4 was compressed to
21.4MB with Viktor's explicit OK, clearing the 25MB Pages cap; content
is already deployed on the stem95su Pages project. Brief public
NXDOMAIN window between the two applies is accepted.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 15:21:18 +00:00
Viktor Barzin
c1ee6863b3 mailserver docs: troubleshooting entry for the postsrsd 100%-CPU spin
All checks were successful
ci/woodpecker/push/default Pipeline was successful
Hit during the docs@ rollout: after a pod restart postsrsd came up
spinning without binding its TCP ports, so postfix cleanup tempfailed
every message with 451 queue file write error. Document the signature
and the supervisorctl-restart / pod-recreate fix.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 14:39:13 +00:00
Viktor Barzin
4ee4d1927d mailserver: guard alias filter against short lines with a lazy ternary
Some checks failed
ci/woodpecker/push/default Pipeline was canceled
CI pipeline 469 failed with 'Invalid index' on the postfix_virtual alias
filter: terraform only short-circuits &&/|| from v1.6, and the older
terraform in the infra-ci image still evaluated split(" ", line)[1] for
the blank and comment lines that have been in extra/aliases.txt since the
plans@ block. The devvm's newer terraform short-circuits, which is why the
local apply of the same commit passed. A conditional expression is lazy on
every terraform version, so move the length guard into a ternary.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 14:38:30 +00:00
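The failure in commit 4ee4d1927d comes down to eager versus lazy evaluation. The Python sketch below imitates it: `eager_and` stands in for a logical operator that evaluates both operands, and the conditional expression stands in for the ternary. All names are hypothetical, and the real filter is a Terraform expression over extra/aliases.txt, not Python.

```python
from typing import List, Optional


def eager_and(left: bool, right: bool) -> bool:
    """Stand-in for an operator whose operands are both evaluated up front."""
    return left and right


def has_target_eager(line: str) -> bool:
    fields = line.split(" ")
    # fields[1] is evaluated as an argument even when the length check is
    # False, so a blank line raises IndexError (the "Invalid index" analogue).
    return eager_and(len(fields) >= 2, fields[1] != "")


def target_lazy(line: str) -> Optional[str]:
    fields = line.split(" ")
    # A conditional expression only evaluates the branch it selects.
    return fields[1] if len(fields) >= 2 else None


def alias_targets(lines: List[str]) -> List[str]:
    """Collect the second field of every line that has one."""
    return [t for t in (target_lazy(line) for line in lines) if t]
```

Moving the length check into the condition of the conditional expression means the indexing is never reached for short lines, whichever evaluation strategy the logical operators use.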
Viktor Barzin
68b9858eff paperless-mail-ingest runbook: manual mail_fetcher must drop to the paperless user
All checks were successful
ci/woodpecker/push/default Pipeline was successful
A root-run kubectl exec mail_fetcher downloads attachments root-owned into
the scratch dir and the celery consumer (uid 1000) fails with
PermissionError — found during the build E2E. Document s6-setuidgid usage
and the recovery step.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 14:26:12 +00:00
Viktor Barzin
77fcb08e8e mailserver: add docs@ paperless ingest mailbox (sieve sender allowlist)
Some checks failed
ci/woodpecker/push/default Pipeline failed
Viktor asked to forward arbitrary emails with PDF attachments into
paperless-ngx, with the forwarding sender mapping 1:1 to the paperless
account that owns the document. paperless-ngx's built-in IMAP consumer
already does the sender->owner mapping, so the infra half is a dedicated
real mailbox docs@viktorbarzin.me: an explicit self-alias (the @domain
catch-all would otherwise divert it into the TripIt-swept spam@ mailbox,
whose sweeper LLM-parses and auto-replies to mail from linked senders)
plus a per-user Dovecot sieve that discards non-family senders at
delivery (chosen behaviour for unmatched senders: ignore and delete;
also keeps spam out of the guessable address). The mailbox credential
was added to Vault secret/platform.mailserver_accounts. Paperless-side
mail account + 5 per-sender rules are DB state, configured via the API
per the new runbook docs/runbooks/paperless-mail-ingest.md.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 14:06:19 +00:00
Viktor Barzin
f5187806f9 ADR-0017: replace ASCII trunk diagram with excalidraw VLAN-tagging diagram
All checks were successful
ci/woodpecker/push/default Pipeline was successful
Viktor wants the traffic-flow view as a colored excalidraw instead of
the ASCII block (which was the only thing rendering after the earlier
VLAN-tagging SVG commit failed to push — a locally-masked non-fast-
forward this session, not a merge clobber). Ships both the editable
.excalidraw scene and a hand-drawn-style SVG export embedded in the
Traffic-on-the-trunk section: two lanes showing where the 802.1Q tag
is added, carried (only P5<->vmbr0) and stripped, L2 membership drops
vs L3 firewall verdicts.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 13:21:59 +00:00
Viktor Barzin
316cdb7441 docs: valia-sites runbook + dns.md CM mechanism + service-catalog entries
All checks were successful
ci/woodpecker/push/default Pipeline was successful
Runbook covers add/update/retire (one map entry; internal DNS now
cleans up after itself), content rules for Valia's folders, and the
failure modes incl. both token re-mint paths. dns.md superset-rule
paragraph now describes the declarative ConfigMap reconcile instead of
hand-added static CNAMEs. Catalog: new valia-sites row; stem95su row
notes its Pages cutover is parked on the 42.9MB stem_video.mp4
exceeding the 25MB Pages per-file cap.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 12:46:24 +00:00
Viktor Barzin
4a3c8287c3 Merge remote-tracking branch 'forgejo/master' into wizard/valia-sites
All checks were successful
ci/woodpecker/push/default Pipeline was successful
2026-07-03 12:43:28 +00:00
Viktor Barzin
e0991853e4 valia-sites: 25MB Pages-limit guard; cloudflared: drop removed{} (CI TF <1.7)
Two fixes from the first live runs. (1) The sync job now skips a whole
site when any file exceeds Cloudflare Pages' 25MB per-file cap, leaving
current serving untouched — stem95su's stem_board.html references a
42.9MB stem_video.mp4, which made every run fail; the guard turns that
into a loud skip so bridge keeps syncing. (2) The CI terraform is older
than 1.7 and rejects removed{} blocks anywhere (pipelines 461/464), so
the bridge record handoff was completed with a one-time manual
'tg state rm module.cloudflared.cloudflare_record.bridge_pages' from
the main checkout; the block is deleted and the module comment records
the manual step.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 12:43:13 +00:00
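The size guard from commit e0991853e4 can be sketched as a pre-deploy check that skips a whole site if any file is over the cap. This is a Python illustration only. The function names are hypothetical, and treating the cap as 25 MiB (rather than 25 * 10^6 bytes) is an assumption.

```python
import os
from typing import List, Tuple

# Assumption: the Pages per-file cap is interpreted as 25 MiB.
PAGES_MAX_FILE_BYTES = 25 * 1024 * 1024


def oversized_files(
    site_dir: str, limit: int = PAGES_MAX_FILE_BYTES
) -> List[Tuple[str, int]]:
    """Return (relative path, size) for every file larger than `limit`."""
    offenders = []
    for root, _dirs, files in os.walk(site_dir):
        for name in files:
            full = os.path.join(root, name)
            size = os.path.getsize(full)
            if size > limit:
                offenders.append((os.path.relpath(full, site_dir), size))
    return sorted(offenders)


def should_deploy(site_dir: str, limit: int = PAGES_MAX_FILE_BYTES) -> bool:
    """Skip the whole site (leave current serving untouched) on any offender."""
    offenders = oversized_files(site_dir, limit)
    for rel, size in offenders:
        print(f"SKIP {site_dir}: {rel} is {size} bytes, over the {limit}-byte cap")
    return not offenders
```

Returning a boolean per site, instead of raising, is what lets one oversized site be skipped loudly while the other sites in the same run keep syncing.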
Viktor Barzin
348f64d34d ADR-0017: add physical-cabling diagram (wires only)
All checks were successful
ci/woodpecker/push/default Pipeline was successful
Viktor asked for one diagram showing just the physical connections
between nodes, separate from the logical/VLAN topology: ISP->AX6000,
the in-wall apartment->garage run into P1, 4G router (cellular OOB),
UPS mgmt, the PoE cat6 to the camera, the LAN1 cable to eno1, dark
eno2 fallback + free eno3/4, iDRAC on shared-LOM, and the note that
everything else on the R730 is virtual. Referenced from the ADR next
to the logical SVG.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 12:40:29 +00:00
Viktor Barzin
126cf4c88e Merge origin/master into wizard/cctv-adr-trunk
All checks were successful
ci/woodpecker/push/default Pipeline was successful
2026-07-03 12:32:00 +00:00
Viktor Barzin
695e020111 cloudflared: move bridge removed{} to stack root — removed blocks are root-module-only
Some checks failed
ci/woodpecker/push/default Pipeline failed
Pipeline 461 failed terraform init: the removed{} handoff block sat in
the stack-local module, but Terraform only allows removed blocks in the
root module. Same intent, correct position (from =
module.cloudflared.cloudflare_record.bridge_pages, destroy=false).
Without this the stale state entry would make the next cloudflared
apply destroy the record valia-sites now owns.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 12:31:53 +00:00
Viktor Barzin
5d16a18cf4 ADR-0017: document trunk traffic semantics + ASCII topology
While reviewing the single-switch design Viktor asked whether both the
home LAN and the camera VLAN 'go via pfSense which forwards upstream' -
a natural misreading a future reader would repeat. Added a section
spelling out the vmbr0 fork: untagged home LAN is L2-bridged past
pfSense (gateway stays the AX6000, rack outage does not affect it, OOB
via 4G survives), while tagged-30 can only land on the dCCTV interface,
making a pfSense bypass impossible by construction. Includes a compact
ASCII topology for terminal readers alongside the SVG.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 12:31:48 +00:00
Viktor Barzin
8b80b4cc41 valia-sites: registry stack for Valia's Pages sites + declarative internal DNS (ADR-0018)
Some checks failed
Build valia-sites-sync / build (push) Waiting to run
ci/woodpecker/push/default Pipeline failed
Valia keeps asking Viktor to host 1-page sites from her Drive folders;
this makes it one map entry. New stacks/valia-sites: per site a CF Pages
project + custom domain + proxied CNAME (bridge adopted via import{}),
a ConfigMap feed (valia-sites-dns) the technitium ingress-dns-sync
script now reconciles internal CNAMEs from (add/update/REMOVE — fixes
the add-only stale-record gotcha), and one shared 10-min CronJob that
mirrors each Content folder (rclone, drive.readonly, stem95su's guards)
and wrangler-deploys ONLY on manifest change (free-tier deploy cap).
Scoped CF Pages token + shared rclone conf in secret/valia-sites; the
Global API Key never enters a pod. cloudflared forgets bridge's record
via removed{} (no destroy). stem95su is in the map dns-parked
(manage_dns=false) until its cutover commit.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 12:28:06 +00:00
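"Deploys ONLY on manifest change" in commit 8b80b4cc41 suggests comparing a digest of the mirrored folder against the one from the last deploy. The Python sketch below shows one way to do that. The manifest format and every name in it are assumptions, since the commit message does not describe how the real job builds its manifest.

```python
import hashlib
import os
from typing import Dict, Optional, Tuple


def build_manifest(site_dir: str) -> Dict[str, str]:
    """Map each file's relative path to the SHA-256 of its content."""
    manifest = {}
    for root, _dirs, files in os.walk(site_dir):
        for name in files:
            full = os.path.join(root, name)
            rel = os.path.relpath(full, site_dir).replace(os.sep, "/")
            digest = hashlib.sha256()
            with open(full, "rb") as fh:
                for chunk in iter(lambda: fh.read(65536), b""):
                    digest.update(chunk)
            manifest[rel] = digest.hexdigest()
    return manifest


def manifest_digest(manifest: Dict[str, str]) -> str:
    """Hash the sorted manifest so the result is independent of walk order."""
    digest = hashlib.sha256()
    for rel in sorted(manifest):
        digest.update(f"{rel}\0{manifest[rel]}\n".encode("utf-8"))
    return digest.hexdigest()


def needs_deploy(site_dir: str, previous: Optional[str]) -> Tuple[bool, str]:
    """Return (deploy?, current digest); the caller stores the digest."""
    current = manifest_digest(build_manifest(site_dir))
    return current != previous, current
```

A run that finds the digest unchanged does nothing, which is how a 10-minute schedule can stay under a free-tier deploy cap.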
Viktor Barzin
5c42155b81 docs: Valia-sites domain language + ADR-0018 (off-infra Pages, in-cluster sync)
Grill session with Viktor: his mother Valia will keep asking for 1-page
site hosting, so the pattern is being made repeatable. Decisions: all
Valia sites serve off-infra on Cloudflare Pages (survive homelab
outages); one shared in-cluster CronJob mirrors her Drive folders every
10 min and redeploys on change; English subdomain names picked by
Viktor; failed-Job-only visibility; stem95su migrates onto the pattern.
CONTEXT.md gains Valia site / Content folder / Entry file; full
rationale and rejected options in ADR-0018.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 12:17:45 +00:00
Viktor Barzin
e1bd111562 rename CF Pages site most.viktorbarzin.me -> bridge.viktorbarzin.me
All checks were successful
ci/woodpecker/push/default Pipeline was successful
Viktor asked to rename the 'мост' school static site to 'bridge'.
New Cloudflare Pages project 'bridge' (bridge-cv2.pages.dev) already
deployed and the custom domain attached; this renames the public CNAME
(TF resource most_pages -> bridge_pages, destroy+create swaps the
record) and the internal split-horizon static CNAME in the
ingress-dns-sync CronJob. The old 'most' Pages project and the stale
internal 'most' record are removed out-of-band after this applies.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 10:52:30 +00:00
Viktor Barzin
7dd80b6c7c technitium: mirror most.viktorbarzin.me into the internal zone (CF Pages site)
All checks were successful
ci/woodpecker/push/default Pipeline was successful
The internal split-horizon zone is authoritative for viktorbarzin.me,
so the new Cloudflare Pages site (most.viktorbarzin.me, added for
Viktor's 'мост' school static site) NXDOMAINed for every internal
client — LAN, VLANs and pods — while resolving fine externally.
Per the superset rule, add it as a static CNAME (-> most-6if.pages.dev)
in the ingress-dns-sync CronJob next to the mail-auth records, and
document the off-infra-site case in dns.md.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 10:10:46 +00:00
Viktor Barzin
217a54be9d cloudflared: add most.viktorbarzin.me CNAME for Cloudflare Pages site
All checks were successful
ci/woodpecker/push/default Pipeline was successful
Viktor asked to host a static HTML site (the 'мост' school project,
ОбУ „Отец Паисий", pulled from his Google Drive) on Cloudflare Pages
with a custom domain, as a try-out of Pages hosting. The site content
is deployed off-infra via wrangler to the Pages project 'most'
(most-6if.pages.dev); this CNAME points most.viktorbarzin.me at it.
The custom domain is already attached to the Pages project and is
waiting on this DNS record to validate.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 10:06:33 +00:00
Viktor Barzin
be80ef23bb ADR-0017 rev 3: single switch — PE replaces the SG105E, CCTV rides a VLAN-30 trunk on the LAN1 cable
Viktor prefers not running two switches, so the TL-SG105PE takes over
all rack duties (apartment uplink, 4G, UPS, camera PoE) and the CCTV
segment moves onto a managed tagged trunk over the existing LAN1 cable:
pfSense net3 re-pointed from vmbr2 to vmbr0 tag=30 (applied live; same
MAC so vtnet3/dCCTV survived untouched). This is safe where the original
802.1Q rejection was not, because the managed switch is the only device
on eno1 and polices VLAN-30 membership. eno2/vmbr2 kept dormant as the
documented fallback. Old SG105E retires to cold spare; PE inherits
192.168.1.6. Glossary Segment term updated (all three segments are now
bridge-tags feeding untagged pfSense vNICs).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 09:15:52 +00:00
Viktor Barzin
4082934bc1 Merge origin/master into wizard/cctv-two-switch
All checks were successful
ci/woodpecker/push/default Pipeline was successful
2026-07-03 08:37:34 +00:00
Viktor Barzin
e11bd6e893 ADR-0017 rev 2: two switches — the PE is a dedicated CCTV island, no VLAN table anywhere
Viktor asked to verify free ports on the garage switch (192.168.1.6)
before finalizing. Logging into it showed it is NOT the TL-SG105PE from
the plan but a pre-existing non-PoE TL-SG105E with 4 of 5 ports in use
(apartment uplink, R730 LAN1, 4G router, UPS) - the single-shared-switch
port-VLAN design written earlier today was based on conflating the two
devices. Corrected: the new TL-SG105PE carries ONLY camera + eno2
uplink (mgmt 10.0.30.6 inside the segment), the old switch is untouched,
and no VLAN config exists anywhere. ADR, topology SVG and networking.md
updated to match.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 08:37:15 +00:00
Viktor Barzin
08fb65827c tripit: set PLACE_PHOTO_PROVIDER=wikipedia — real place preview photos
All checks were successful
ci/woodpecker/push/default Pipeline was successful
Viktor asked for place photos on the tripit Trip board. The app-side
work (add-time photo fetch, board place cards) shipped in tripit
v0.106.0, but prod never set PLACE_PHOTO_PROVIDER, so the fake provider
would store placeholder PNGs for every hand-added place. Same class of
fake-default gap as PLACE_RESOLVER_MODE (set explicitly for the same
reason); the ADR-0035 rollout had left both the env flip and its
backfill cron undone.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 21:57:21 +00:00
Viktor Barzin
b761701994 ADR-0017: add network topology diagram (SVG) next to the decision
All checks were successful
ci/woodpecker/push/default Pipeline was successful
Viktor asked for a reviewable network visualization committed alongside
the CCTV-segment ADR. Hand-drawn SVG (renders on Forgejo, validated
palette): physical path camera -> TL-SG105PE port-VLANs -> eno2/vmbr2 ->
pfSense dCCTV, the firewall flows (Frigate RTSP, ha-sofia ISAPI/RTSP,
NTP-only egress, default deny), and the dashed camera-day steps (patch
cable, cat6 run, AX6000 static route).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 20:25:28 +00:00
Viktor Barzin
248e186dce CCTV segment (dCCTV 10.0.30.0/24) on a dedicated pfSense leg for the garage camera
All checks were successful
ci/woodpecker/push/default Pipeline was successful
Viktor and emo are adding the first owned camera at the Sofia site (HiLook
IPC-T241H-C watching the garage / server rack). Viktor asked to finalize
emo's plan; the grilling session resolved emo's five open decisions and
replaced the doc's 802.1Q-trunk idea with the site idiom: a dedicated
physical leg (R730 eno2 -> vmbr2 -> pfSense net3 = dCCTV 10.0.30.1/24),
port-based VLAN split on the shared TL-SG105PE, camera default-deny with
NTP-only egress, Frigate + ha-sofia as the only consumers.

The PVE bridge, pfSense interface, Kea subnet and firewall rules were
applied live this session (hand-managed hosts, backed up). This commit
records the decision (ADR-0017), the glossary terms (Segment / CCTV
segment), the as-built architecture doc, and bumps Frigate's ADR-0016
VRAM budget 2000 -> 2300 MiB for the upcoming NVDEC stream.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 20:01:45 +00:00
3a5194c9d4 Merge pull request 'immich(frame-emo): show photos from the last 365 days (was 730)' (#18) from emo/frame-emo-1year into master
All checks were successful
ci/woodpecker/push/default Pipeline was successful
Reviewed-on: #18
2026-07-02 19:05:31 +00:00
9e253d409a immich(frame-emo): show photos from the last 365 days (was 730)
Emil asked his Sofia Portal Mini photo-frame to show only the past
year of photos rolling from today, instead of the last two years.
Changes ImagesFromDays 730 -> 365 in the frame-emo Settings.yml.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-07-02 19:05:31 +00:00
Viktor Barzin
4c532dbf97 devvm containment: drop the MemoryHigh throttle band, straight to MemoryMax OOM
All checks were successful
ci/woodpecker/push/postmortem-todos Pipeline was successful
ci/woodpecker/push/default Pipeline was successful
t3.viktorbarzin.me went down 2026-07-02 15:42-16:35 UTC: an agent-spawned
12.3G ugrep plateaued inside t3-serve@wizard's MemoryHigh(12G)..MemoryMax(16G)
band. With MemorySwapMax=0 its anon pages were unreclaimable, so the kernel
throttled every task in the cgroup indefinitely (memory.pressure full ~80%,
oom_kill never fired) - the t3 event loop starved, the accept queue rotted,
and the terminal was dead until the hog was SIGKILLed by hand.

The 2026-06-22 design assumed 'throttle to a crawl, then OOM locally'; a hog
that stabilises between high and max never OOMs, so the throttle band is a
livelock zone, not a safety layer. Viktor asked to close that gap: MemoryHigh
is now explicitly infinity on all three work cgroup definitions (t3-serve@
unit, user-<uid>.slice drop-in, docker.slice) so a runaway is cgroup-OOM-
killed at MemoryMax immediately - OOMPolicy=continue already keeps the t3
server alive when a child dies. MemoryMax/MemorySwapMax=0/earlyoom unchanged.
Applied live to the devvm the same day (daemon-reload + runtime set-property
on running cgroups, no session restarts). Post-mortem addendum + runbook
updated in the same commit.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 16:59:38 +00:00
Viktor Barzin
684ca4527c docs(CLAUDE.md): T4 now has a VRAM budget + watchdog (ADR-0016, dry-run); note llama-swap budget miscalibration
All checks were successful
ci/woodpecker/push/default Pipeline was successful
Session wrap-up doc sync: the Immich note still claimed the shared T4 had no
VRAM isolation. Record the gpumem budget/watchdog shipped earlier today, that
the watchdog is observe-only, and that budgets need a retune (llama-swap's
real 16k-ctx resident is ~7GB, not 4.35) before arming.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-07-02 15:20:06 +00:00
Viktor Barzin
21afae85c9 dawarich: dedicated 100/1000 Traefik rate limit (default 10/50 429'd page loads)
All checks were successful
ci/woodpecker/push/default Pipeline was successful
Viktor saw dawarich throwing 429s through Traefik and asked to loosen
the burst for it. The access log confirms the burst pattern: one page
load fires the whole fingerprinted-asset tail (SVG store badges,
favicons, webmanifest) from a single client IP and trips the default
10 req/s / burst 50 limiter (repro: 80 parallel GETs -> 28x 429).
Same remedy as ha-sofia, ActualBudget, noVNC, tripit, health and
authentik: dedicated dawarich-rate-limit middleware (average 100 /
burst 1000) + skip_default_rate_limit on the dawarich ingress. Also
updates the networking.md middleware enumerations (adding the
previously undocumented tripit/health limiters alongside dawarich).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-07-02 15:03:08 +00:00
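The limiter arithmetic in commit 21afae85c9 can be checked with a plain token-bucket model. This is an assumption about how average and burst interact, written as a Python sketch, and not Traefik's actual implementation. In the model, 80 requests arriving at the same instant against average 10 / burst 50 give 30 rejections. The observed 28 is in the same range, as expected when a real burst is spread over a short time and a few tokens refill.

```python
from typing import Iterable


class TokenBucket:
    """Simple token bucket: `rate` tokens per second, capacity `burst`."""

    def __init__(self, rate: float, burst: int) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # start full
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Refill for the time elapsed since the previous request.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # would be answered with 429


def count_rejected(rate: float, burst: int, arrivals: Iterable[float]) -> int:
    """Count requests rejected for the given arrival times (seconds)."""
    bucket = TokenBucket(rate, burst)
    return sum(1 for t in sorted(arrivals) if not bucket.allow(t))


if __name__ == "__main__":
    page_load = [0.0] * 80  # 80 parallel GETs from one client IP
    print("default 10/50:", count_rejected(10, 50, page_load))
    print("dedicated 100/1000:", count_rejected(100, 1000, page_load))
```

With the dedicated 100 / 1000 settings the same burst fits inside the bucket, so no request in the model is rejected.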
Viktor Barzin
91d0213d1a Merge remote-tracking branch 'forgejo/master' into wizard/excalidraw-export-rename
Some checks failed
ci/woodpecker/push/default Pipeline was successful
Build excalidraw-library / build (push) Has been cancelled
2026-07-02 14:29:34 +00:00