Adds REDDIT_CLIENT_ID / REDDIT_CLIENT_SECRET to the f1-stream deployment,
sourced from the f1-stream-secrets Secret with optional=true so the pod still
starts before the credentials exist. This activates the replays feature (app
repo ADR-0002) once reddit_client_id / reddit_client_secret are added to the
Vault "f1-stream" key (auto-synced via the ExternalSecret's dataFrom.extract)
and the pod is restarted. Dormant/no-op until then.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Post-rollout discovery during wrap-up: the London Portal Plus takes its DHCP
lease on the GUEST network (Portal-75AE8F9C2A8A = 192.168.9.198), not the
main LAN, so the
allowlist shipped in 8bac9914 would have 403'd it once it woke. Verified the
forwarded path end-to-end on the Flint 2 (read-only): VPN_PREROUTING_HOOK
hooks BOTH br-lan and br-guest into ROUTE_POLICY -> TUNNEL10_ROUTE_POLICY,
which marks all dst_net10 (10/8) traffic onto the WG tunnel — so the Portal
reaches 10.0.20.203 with source 192.168.9.198 once on-screen. (Side finding,
router-originated only: the firewall.user LOCAL_POLICY dst_net10 injection
from vpn.md has rotted — admin curls from the router itself don't tunnel;
clients unaffected. Not fixed here — live-device change, needs Viktor's OK.)
Middleware already applied live via targeted tg apply (20:11 UTC).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Viktor asked to remove the truncation from memory outputs after multiple
agent sessions were misled by it: the 240-rune pretty preview cut memories
mid-sentence, made sessions wrongly conclude no full-content read-back
existed, and made a blind 'update --content' from the preview destroy the
stored tail. The server never truncated — this was purely client-side
display.
recall/list now print each memory's full content (still one line per
memory, newlines flattened). The per-turn recall hook pipes CLI output
through verbatim, so injected memories are complete too. printMemories is
split into a pure renderMemories for testability; truncatePreview and its
UTF-8-boundary test are deleted (nothing is sliced anymore, so the
invalid-UTF-8 class is gone by construction). v0.11.0 -> v0.12.0.
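
A minimal sketch of the pure-render idea, in Python for illustration only.
The CLI's real code and output format are not shown in this commit, so
render_memories and the id/content fields below are hypothetical names:

```python
def render_memories(memories):
    """Return one output line per memory: full content, newlines flattened."""
    lines = []
    for mem in memories:
        # Replace line breaks with spaces so each memory stays on one line.
        # Nothing is sliced, so no character can be cut mid-sequence.
        flat = mem["content"].replace("\r\n", " ").replace("\n", " ")
        lines.append(f"{mem['id']}: {flat}")
    return lines
```

Because the function only returns strings, a test can assert on the exact
lines without capturing stdout.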
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Viktor asked to tighten who can see the immich-frame deployments: make
them not public while keeping the two Meta Portals working as frames.
The Portal app bakes the URL into the APK, so the same hostnames must
keep loading from the home networks with zero device or router changes.
- New shared Traefik middleware home-lans-only (Sofia/London/Valchedrym
LANs + 10/8 + internal v6) — separate from local-only so the remote
LANs don't inherit access to admin surfaces.
- New ingress_factory dns_type="internal": publicly-resolvable A record
carrying the internal Traefik LB IP (10.0.20.203). Outsiders resolve
but can't route; WG spokes policy-route 10/8 down the tunnel. Never
combine the allowlist with proxied DNS (cloudflared pod IPs are in
10/8 and would bypass it).
- Both frame ingresses: dns_type internal + allowlist attached +
external_monitor=false (drop the doomed [External] monitors).
- rybbit worker: highlights-immich route/site removed (off Cloudflare).
- Docs: CLAUDE.md/AGENTS.md ingress tiers, networking.md DNS categories,
design doc docs/plans/2026-07-04-immich-frame-lan-only-design.md.
Pre-verified: London router DNS returns RFC1918 answers unfiltered;
Technitium already CNAMEs both hosts to the LB; no public wildcard.
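
Why the allowlist must never sit behind proxied DNS, sketched with Python's
ipaddress module. Only 10.0.0.0/8 and 10.0.20.203 come from this commit; the
192.168.1.0/24 LAN range and the 10.10.1.7 pod address are assumptions made
for illustration, and is_allowed is a hypothetical helper:

```python
import ipaddress

# Illustrative allowlist: 10/8 is from the commit, the /24 is an assumed LAN.
ALLOWED = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.1.0/24"),
]

def is_allowed(source_ip):
    """True when source_ip falls inside any allowlisted network."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in ALLOWED)
```

A request arriving through a proxy pod presents the pod's 10/8 source
address, so is_allowed("10.10.1.7") is true whoever the original client was,
while a direct outside source such as 203.0.113.5 is rejected.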
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Rollernet's free tier failed the validation gates before any DNS change
(200 msgs / 10 MB per rolling week, then 48h of SMTP 5xx bounces —
worse than no backup MX; free accounts being discontinued). Viktor
chose to stay free, so the backup MX becomes a Postfix store-and-forward
relay on an Oracle Always-Free VM (mx2.viktorbarzin.me, MX pref 20),
draining via port 2526 through the existing pfSense HAProxy frontend
since Oracle blocks egress 25.
Two independent adversarial reviews then fixed the design: primary-side
drain enablement moved to the layers that actually reject (unknown-
client-hostname, spoof protection, anvil limits, rspamd reject tier ->
external_relay + action cap, never backscatter), monitoring moved off
the nonexistent cluster->tailnet path to allowlisted public-IP scrapes,
bounce lifetime cut to 1d (the VM can never deliver DSNs), OCI OS-level
iptables + reserved-IP + mandatory PAYG requirements added, and 4xx-only
postscreen hygiene replaces the blanket no-filtering stance.
ADR-0019 and the design doc renamed accordingly (rollernet -> oracle).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Viktor asked for the Rollernet account credentials to live in
Vaultwarden (the personal password manager) rather than HashiCorp
Vault. Item 'Rollernet (backup MX)' created; doc updated to match.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
macOS Safari's Add to Dock (and iOS/Android home-screen installs) fetch
the app icon and web manifest without any session cookies, so the
Authentik forward-auth 302 on tasks.viktorbarzin.me made Safari fall
back to a letter monogram instead of the real icon. Viktor asked for an
ingress carve-out so exactly these five static PWA assets are publicly
fetchable: /apple-touch-icon.png, /favicon.png, /pwa-192x192.png,
/pwa-512x512.png, /manifest.webmanifest.
A second ingress_factory instance (auth=none, dns_type=none, same host)
routes only those paths straight to the tasks service; the SPA shell and
/api stay behind Authentik exactly as before. The new carve-out is also
registered in the Authentik walling-off probe so a future regression
(anything 302-ing these paths to Authentik again) alarms, and the
service catalog entry records the exception.
stacks/tasks/imports.tf adopts the live tasks resources into Terraform
state first: the stack's first-ever apply (pipeline 477, 2026-07-03)
died mid-apply after creating the resources but before the pg state
write, leaving tasks.states empty — without the import blocks this (and
every future) tasks apply would create-fail with 'already exists'. Same
pattern as the monitoring alert-digest adoption.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Viktor wants inbound email to survive homelab outages without loss;
delayed delivery is acceptable and the budget is zero, which rules out
the previously doc-flagged Dynu option. Design adopts Roller Network's
free Secondary MX (3-week store-and-forward queue, no forced filtering,
catch-all-compatible) with our-side postscreen/rspamd whitelisting,
five validation gates before any DNS change, and a live failover test.
Also records the dangling-MTA-STS finding (TXT published, policy host
absent) as a follow-up. Implementation starts only after Viktor reviews
these docs; account will use rollernet@viktorbarzin.me.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
authelia, immich-powertools, loki, mcaptcha, nfty, whiteboard removed from
cloudflare_proxied_names — all verified dead (no HTTP response, no cluster
route; authelia superseded by Authentik, nfty was a typo of ntfy, whiteboard
was excalidraw's old name). The cap blocked the new drone-logbook stack's
dronelog record (Cloudflare error 81045). Records already destroyed via
targeted local apply; Viktor approved the removal. Zone now at 195/200.
Drop folder for the new drone-logbook stack's auto-import (SYNC_LOGS_PATH)
and its daily backup target. Both created on 192.168.1.127 (root:www-data,
2777 — root-squash-writable like vaultwarden-backup).
Viktor asked to self-host the DJI flight-log analyzer for his DJI Mini 4 Pro
(his fork ViktorBarzin/drone-logbook -> upstream arpanghosh8453/open-dronelog).
Upstream ghcr image with Keel auto-upgrade, DuckDB data on an encrypted
proxmox-lvm PVC (GPS traces = sensitive), NFS /sync-logs drop folder imported
every 8h, daily backup CronJob to /srv/nfs/drone-logbook-backup (vaultwarden
pattern), Authentik-gated ingress, PROFILE_CREATION_PASS from Vault via ESO.
Design + plan in docs/plans/; service-catalog updated.
Drift guard section rewritten: admin-capable clobbers now self-heal at the
nightly run (HEALED log line); weak clobbers keep the loud DRIFT failure;
manual re-mint is only the weak-clobber recovery now.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Viktor asked for 'vault login -method=oidc' to work seamlessly: the OIDC
login the docs prescribe kept clobbering ~/.vault-token with a 7-day token,
and detect-only DRIFT failures went unnoticed for weeks (weekly-expiry
loop, twice in June). On drift the renewer now re-mints the periodic token
with the clobbering token's own authority (Vault's 403 is the judge — no
policy guessing), sanity-checks it, replaces the file atomically, and
revokes stale token-devvm-wizard leftovers. Weak/read-only clobbers still
fail loudly on purpose. Design: docs/plans/2026-07-03-vault-token-self-heal-design.md
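
The atomic-replace step can be sketched as write-to-temp-then-rename. This is
a Python illustration under stated assumptions, not the renewer's actual code;
replace_token_atomically is a hypothetical name:

```python
import os
import tempfile

def replace_token_atomically(path, token):
    """Write token to a temp file beside path, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory means same filesystem, which the rename needs.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".vault-token.")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(token)
            handle.flush()
            os.fsync(handle.fileno())
        os.chmod(tmp, 0o600)
        # Readers see either the old file or the new one, never a partial write.
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

If the write fails partway, the temp file is removed and the existing token
file is left untouched.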
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
vtr_accessor parses the accessor from lookup JSON; vtr_is_stale_periodic
decides which old token-devvm-wizard tokens a heal may revoke (never the
just-minted one, never foreign tokens, nothing when the keeper is unknown).
TDD red-green for the heal branch that lands next.
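
The three revocation rules above can be sketched as one pure function. The
token shape (accessor and display_name keys) and the function name are
assumptions for illustration; the real helpers are vtr_accessor and
vtr_is_stale_periodic:

```python
def stale_periodic_accessors(tokens, keeper_accessor):
    """Return accessors a heal may revoke, given the just-minted keeper."""
    if not keeper_accessor:
        # Keeper unknown: revoke nothing rather than risk the live token.
        return []
    return [
        tok["accessor"]
        for tok in tokens
        # Never foreign tokens, never the just-minted one.
        if tok.get("display_name") == "token-devvm-wizard"
        and tok["accessor"] != keeper_accessor
    ]
```

Keeping the decision free of Vault calls is what lets the red-green tests run
without a server.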
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Task-by-task TDD plan for the approved self-heal design: pure-function
tests first, then the heal branch, runbook update, deploy + live clobber
simulation, landing and memory updates.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Viktor asked to make 'vault login -method=oidc' work seamlessly on devvm:
today any OIDC login clobbers the permanent periodic token in
~/.vault-token, the drift guard only logs the drift, and his access
effectively expires weekly. Approved design: the nightly renewer re-mints
the periodic token from any admin-capable clobber (weak clobbers keep
failing loudly) and revokes stale periodic tokens after each heal.
Implementation follows on this branch.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Docs-with-change convention: the new tasks stack (Reminders-style PWA over
Nextcloud CalDAV) gets its catalog entry — what it is, its CNPG db + Vault
static role, the auth=required/X-authentik-username trust model with the
SEC-1 NetworkPolicy, and the ADR-0002 CI/CD path — and tasks joins the
Cloudflare proxied hostname list.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Deploys the Reminders-style tasks app at tasks.viktorbarzin.me: namespace,
ExternalSecrets (fernet_key from secret/tasks; TASKS_DB_DSN composed from
the pg-tasks static-creds password the tripit way), single-replica
Deployment of ghcr.io/viktorbarzin/tasks:latest (image ignore_changes per
the fleet set-image pattern; Reloader restarts it on the 7-day DB password
rotation; /healthz probes on 8000; Europe/Sofia local tz; DEV_USER
deliberately absent — security invariant), Service on 8000, and an
ingress_factory host with auth=required + dns_type=proxied since Authentik
forward-auth is the app's only gate. NetworkPolicy tasks-ingress (SEC-1)
limits pod ingress to the traefik namespace plus monitoring on 8000 for
/metrics, so the trusted X-authentik-username header cannot be spoofed by
other pods.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The new tasks PWA (Reminders-style front-end over Nextcloud CalDAV, per
tasks/docs/2026-07-03-tasks-pwa-design.md) needs its own Postgres database
for Connected Accounts and sync state. Follows the tripit/job_hunter
pattern exactly: idempotent null_resource creates role+db on the CNPG
primary with a placeholder password, and the Vault database engine static
role pg-tasks (added to the postgresql connection allowed_roles) rotates
the real password every 7 days, consumed by the tasks stack via a
vault-database ExternalSecret.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Viktor's first real forward carried the invoice PDF with
Content-Disposition: inline (Apple Mail does this for real documents),
and the attachments-only rules consumed nothing — recorded
PROCESSED_WO_CONSUMPTION, which also blocks reprocessing. Flipped all 5
rules to attachment_type=2 (process inline) via the API and documented
the trade-off + the ProcessedMail unblock step.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Completes the ADR-0018 cutover. The stack is emptied to a tombstone so
CI destroys nginx, the NFS content volume, the ingress, the per-site
gdrive-sync CronJob and the namespace; serving + sync are owned by
stacks/valia-sites since the cutover commits. Catalog + runbook updated
to the migrated state (incl. the one-time 42.9→21.4MB video compression
Viktor approved).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Creates the public proxied CNAME stem95su -> stem95su.pages.dev and
adds the internal split-horizon entry via the valia-sites-dns
ConfigMap (the sync's update pass repoints the existing internal
record). Completes the ADR-0018 cutover; the old in-cluster serving
stack is retired in a follow-up.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
First half of the ADR-0018 stem95su cutover: the tunnel-target CNAME is
destroyed so stacks/valia-sites can create the Pages-target record for
the same name (Cloudflare allows one CNAME per name; the follow-up
commit flips manage_dns=true there). stem_video.mp4 was compressed to
21.4MB with Viktor's explicit OK, clearing the 25MB Pages cap; content
is already deployed on the stem95su Pages project. Brief public
NXDOMAIN window between the two applies is accepted.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Hit during the docs@ rollout: after a pod restart postsrsd came up
spinning without binding its TCP ports, so postfix cleanup tempfailed
every message with 451 queue file write error. Document the signature
and the supervisorctl-restart / pod-recreate fix.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
CI pipeline 469 failed with 'Invalid index' on the postfix_virtual alias
filter: terraform only short-circuits &&/|| from v1.12, and the older
terraform in the infra-ci image still evaluated split(" ", line)[1] for
the blank and comment lines that have been in extra/aliases.txt since the
plans@ block. The devvm's newer terraform short-circuits, which is why the
local apply of the same commit passed. A conditional expression is lazy on
every terraform version, so move the length guard into a ternary.
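
The guard shape, sketched in Python rather than HCL. The real filter's exact
rules are not reproduced in this message, so alias_target and its
comment-line check are hypothetical:

```python
def alias_target(line):
    """Return the second space-separated field, or None when there is none."""
    parts = line.split(" ")
    # Conditional expression: parts[1] is evaluated only when the guard
    # holds, like a Terraform `cond ? a : b` that is lazy on every version.
    return parts[1] if len(parts) > 1 and not line.startswith("#") else None
```

Blank lines and comment lines both yield None instead of an index error.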
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Running mail_fetcher via kubectl exec as root leaves the downloaded
attachments root-owned in the scratch dir, so the celery consumer (uid 1000)
fails with PermissionError (found during the build E2E). Document s6-setuidgid usage
and the recovery step.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Viktor asked to forward arbitrary emails with PDF attachments into
paperless-ngx, with the forwarding sender mapping 1:1 to the paperless
account that owns the document. paperless-ngx's built-in IMAP consumer
already does the sender->owner mapping, so the infra half is a dedicated
real mailbox docs@viktorbarzin.me: an explicit self-alias (the @domain
catch-all would otherwise divert it into the TripIt-swept spam@ mailbox,
whose sweeper LLM-parses and auto-replies to mail from linked senders)
plus a per-user Dovecot sieve that discards non-family senders at
delivery (chosen behaviour for unmatched senders: ignore and delete;
also keeps spam out of the guessable address). The mailbox credential
was added to Vault secret/platform.mailserver_accounts. Paperless-side
mail account + 5 per-sender rules are DB state, configured via the API
per the new runbook docs/runbooks/paperless-mail-ingest.md.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Viktor wants the traffic-flow view as a colored excalidraw instead of
the ASCII block (which was the only thing rendering after the earlier
VLAN-tagging SVG commit failed to push — a locally-masked non-fast-
forward this session, not a merge clobber). Ships both the editable
.excalidraw scene and a hand-drawn-style SVG export embedded in the
Traffic-on-the-trunk section: two lanes showing where the 802.1Q tag
is added, carried (only P5<->vmbr0) and stripped, L2 membership drops
vs L3 firewall verdicts.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Runbook covers add/update/retire (one map entry; internal DNS now
cleans up after itself), content rules for Valia's folders, and the
failure modes incl. both token re-mint paths. dns.md superset-rule
paragraph now describes the declarative ConfigMap reconcile instead of
hand-added static CNAMEs. Catalog: new valia-sites row; stem95su row
notes its Pages cutover is parked on the 42.9MB stem_video.mp4
exceeding the 25MB Pages per-file cap.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Two fixes from the first live runs. (1) The sync job now skips a whole
site when any file exceeds Cloudflare Pages' 25MB per-file cap, leaving
current serving untouched — stem95su's stem_board.html references a
42.9MB stem_video.mp4, which made every run fail; the guard turns that
into a loud skip so bridge keeps syncing. (2) The CI terraform is older
than 1.7 and rejects removed{} blocks anywhere (pipelines 461/464), so
the bridge record handoff was completed with a one-time manual
'tg state rm module.cloudflared.cloudflare_record.bridge_pages' from
the main checkout; the block is deleted and the module comment records
the manual step.
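
The whole-site size guard from (1) can be sketched as follows. This is a
Python illustration, not the sync job's code; the function names are
hypothetical and reading 25MB as 25 * 1024 * 1024 bytes is an assumption:

```python
import os

# Assumed byte value for the 25MB per-file cap.
PAGES_MAX_FILE_BYTES = 25 * 1024 * 1024

def oversized_files(site_dir, limit=PAGES_MAX_FILE_BYTES):
    """Return sorted relative paths of files larger than limit."""
    too_big = []
    for root, _dirs, files in os.walk(site_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.getsize(path) > limit:
                too_big.append(os.path.relpath(path, site_dir))
    return sorted(too_big)

def should_skip_site(site_dir, limit=PAGES_MAX_FILE_BYTES):
    """Skip the whole site when any single file is over the cap."""
    return bool(oversized_files(site_dir, limit))
```

Returning the offending paths lets the job name the file in its loud skip
message while other sites keep syncing.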
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Viktor asked for one diagram showing just the physical connections
between nodes, separate from the logical/VLAN topology: ISP->AX6000,
the in-wall apartment->garage run into P1, 4G router (cellular OOB),
UPS mgmt, the PoE cat6 to the camera, the LAN1 cable to eno1, dark
eno2 fallback + free eno3/4, iDRAC on shared-LOM, and the note that
everything else on the R730 is virtual. Referenced from the ADR next
to the logical SVG.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Pipeline 461 failed terraform init: the removed{} handoff block sat in
the stack-local module, but Terraform only allows removed blocks in the
root module. Moved it there with the same intent (from =
module.cloudflared.cloudflare_record.bridge_pages, destroy=false).
Without this the stale state entry would make the next cloudflared
apply destroy the record valia-sites now owns.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
While reviewing the single-switch design Viktor asked whether both the
home LAN and the camera VLAN 'go via pfSense which forwards upstream' -
a natural misreading a future reader would repeat. Added a section
spelling out the vmbr0 fork: untagged home LAN is L2-bridged past
pfSense (gateway stays the AX6000, rack outage does not affect it, OOB
via 4G survives), while tagged-30 can only land on the dCCTV interface,
making a pfSense bypass impossible by construction. Includes a compact
ASCII topology for terminal readers alongside the SVG.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Valia keeps asking Viktor to host 1-page sites from her Drive folders;
this makes it one map entry. New stacks/valia-sites: per site a CF Pages
project + custom domain + proxied CNAME (bridge adopted via import{}),
a ConfigMap feed (valia-sites-dns) from which the technitium ingress-dns-sync
script now reconciles internal CNAMEs (add/update/REMOVE, which fixes
the add-only stale-record gotcha), and one shared 10-min CronJob that
mirrors each Content folder (rclone, drive.readonly, stem95su's guards)
and wrangler-deploys ONLY on manifest change (free-tier deploy cap).
Scoped CF Pages token + shared rclone conf in secret/valia-sites; the
Global API Key never enters a pod. cloudflared forgets bridge's record
via removed{} (no destroy). stem95su is in the map dns-parked
(manage_dns=false) until its cutover commit.
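
Deploy-only-on-change can be sketched as a content manifest compared against
the previous run. The CronJob's real manifest format is not described here,
so the hashing scheme and both function names are assumptions:

```python
import hashlib
import os

def content_manifest(site_dir):
    """Return one hex digest covering every file's relative path and bytes."""
    digest = hashlib.sha256()
    for root, dirs, files in os.walk(site_dir):
        dirs.sort()  # fix traversal order so the digest is stable
        for name in sorted(files):
            path = os.path.join(root, name)
            rel = os.path.relpath(path, site_dir)
            digest.update(rel.encode("utf-8") + b"\0")
            with open(path, "rb") as handle:
                digest.update(hashlib.sha256(handle.read()).digest())
    return digest.hexdigest()

def needs_deploy(site_dir, last_manifest):
    """True when the mirrored folder differs from the last deployed state."""
    return content_manifest(site_dir) != last_manifest
```

An unchanged folder produces the same digest on every 10-minute run, so no
deploy is spent against the free-tier cap.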
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Grill session with Viktor: his mother Valia will keep asking for 1-page
site hosting, so the pattern is being made repeatable. Decisions: all
Valia sites serve off-infra on Cloudflare Pages (survive homelab
outages); one shared in-cluster CronJob mirrors her Drive folders every
10 min and redeploys on change; English subdomain names picked by
Viktor; failed-Job-only visibility; stem95su migrates onto the pattern.
CONTEXT.md gains Valia site / Content folder / Entry file; full
rationale and rejected options in ADR-0018.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Viktor asked to rename the 'мост' school static site to 'bridge'.
New Cloudflare Pages project 'bridge' (bridge-cv2.pages.dev) already
deployed and the custom domain attached; this renames the public CNAME
(TF resource most_pages -> bridge_pages, destroy+create swaps the
record) and the internal split-horizon static CNAME in the
ingress-dns-sync CronJob. The old 'most' Pages project and the stale
internal 'most' record are removed out-of-band after this applies.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The internal split-horizon zone is authoritative for viktorbarzin.me,
so the new Cloudflare Pages site (most.viktorbarzin.me, added for
Viktor's 'мост' school static site) NXDOMAINed for every internal
client — LAN, VLANs and pods — while resolving fine externally.
Per the superset rule, add it as a static CNAME (-> most-6if.pages.dev)
in the ingress-dns-sync CronJob next to the mail-auth records, and
document the off-infra-site case in dns.md.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Viktor asked to host a static HTML site (the 'мост' school project,
ОбУ „Отец Паисий", pulled from his Google Drive) on Cloudflare Pages
with a custom domain, as a try-out of Pages hosting. The site content
is deployed off-infra via wrangler to the Pages project 'most'
(most-6if.pages.dev); this CNAME points most.viktorbarzin.me at it.
The custom domain is already attached to the Pages project and is
waiting on this DNS record to validate.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Viktor prefers not running two switches, so the TL-SG105PE takes over
all rack duties (apartment uplink, 4G, UPS, camera PoE) and the CCTV
segment moves onto a managed tagged trunk over the existing LAN1 cable:
pfSense net3 re-pointed from vmbr2 to vmbr0 tag=30 (applied live; same
MAC so vtnet3/dCCTV survived untouched). This is safe where the original
802.1Q rejection was not, because the managed switch is the only device
on eno1 and polices VLAN-30 membership. eno2/vmbr2 kept dormant as the
documented fallback. Old SG105E retires to cold spare; PE inherits
192.168.1.6. Glossary Segment term updated (all three segments are now
bridge-tags feeding untagged pfSense vNICs).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Viktor asked to verify free ports on the garage switch (192.168.1.6)
before finalizing. Logging into it showed it is NOT the TL-SG105PE from
the plan but a pre-existing non-PoE TL-SG105E with 4 of 5 ports in use
(apartment uplink, R730 LAN1, 4G router, UPS) - the single-shared-switch
port-VLAN design written earlier today was based on conflating the two
devices. Corrected: the new TL-SG105PE carries ONLY camera + eno2
uplink (mgmt 10.0.30.6 inside the segment), the old switch is untouched,
and no VLAN config exists anywhere. ADR, topology SVG and networking.md
updated to match.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>