CWA NETWORK_SHARE_MODE=true skips the post-import chown, leaving imported
files owned by root. book-search now mounts the library so it can
periodically fix permissions on recently imported books.
Adds stacks-config volume mount to book-search pod so it can delete
Stacks history entries and force re-downloads when a book was consumed
by CWA but failed to import.
The init container was cloning the dotfiles repo via git on every pod
start, causing 200+ small NFS writes that were amplified by ZFS.
Dotfiles already exist on NFS from a previous clone, so there is no need
to re-clone on every restart. To update dotfiles, run git pull manually.
Also cleaned up stale Uptime Kuma files (1.6GB old SQLite DB + 289MB
error log left over from migration to MariaDB).
- API_KEY env var from calibre-secrets for /api/download-url auth
- SHORTCUT_ICLOUD_URL env var for /shortcut redirect
- Separate ingress for /api/download-url and /shortcut (bypasses Authentik)
HA frontend loads 30-50 JS bundles on page load, exhausting the
rate-limit burst. iOS Companion app reconnections also trigger bursts.
Traefik logs showed 172 rate-limited (429) requests, which caused
intermittent connectivity failures for the ha-sofia iOS app.
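
The burst exhaustion described above can be illustrated with a minimal token-bucket sketch. The `burst` and `average` values used here are hypothetical, not the actual Traefik middleware settings, and the model is a simplification of how a rate limiter behaves. It only shows why a page load of 30-50 near-simultaneous requests produces 429s when the burst allowance is smaller than the number of bundles.

```python
def simulate_burst(requests: int, burst: int, average: float, spacing_s: float) -> int:
    """Return how many of `requests` would be rejected (HTTP 429).

    Models a token bucket that holds at most `burst` tokens and refills
    at `average` tokens per second, with requests arriving `spacing_s`
    seconds apart. All parameter values are illustrative assumptions.
    """
    tokens = float(burst)  # the bucket starts full
    rejected = 0
    for i in range(requests):
        if i > 0:
            # Refill for the time elapsed since the previous request.
            tokens = min(float(burst), tokens + average * spacing_s)
        if tokens >= 1.0:
            tokens -= 1.0  # request allowed
        else:
            rejected += 1  # request would receive a 429
    return rejected
```

For example, `simulate_burst(40, 20, 10, 0.0)` models 40 bundles requested at once against a hypothetical burst of 20: the second half of the requests is rejected, while a burst of 50 lets all of them through.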
- Remove viktorbarzin.me from split DNS (same IPs as public DNS,
was adding unnecessary tunnel overhead for every DNS query)
- Narrow reverse DNS split scope from 10.0.0.0/8 → 10.0.20.0/24
and 10.0.10.0/24 only; 192.168.0.0/16 → 192.168.1.0/24 only
- Add extra_records for key internal services (technitium, k8s-master)
for instant MagicDNS resolution without tunnel roundtrip
- Replace full Tailscale DERP map (29 regions) with curated set:
home + 8 European + 5 global fallback DERPs (14 total)
- Add custom derp.yaml to ConfigMap, sourced from Vault
Port 80 DERP dropped — Traefik's global HTTP→HTTPS redirect
prevents non-TLS DERP upgrades on the web entrypoint.
- linkwarden: add Reloader match annotation to DB secret so pods
auto-restart on Vault credential rotation (was causing 100% 5xx)
- authentik: increase memory limits (server 1Gi→1.5Gi, worker 896Mi→1Gi)
to prevent OOM kills
- prometheus: drop 113k high-cardinality series to reduce HDD write rate
from ~8.8 to ~6.0 MB/s (31% reduction):
  - drop all traefik/apiserver/etcd histogram bucket metrics
  - drop goflow2_flow_process_nf_templates_total (9.3k series)
  - drop container_tasks_state and container_memory_failures_total
- rewrite HighServiceLatency alert to use avg latency (_sum/_count)
- update cluster_health dashboard to match
- raise KubeletRuntimeOperationsLatency threshold from 30s to 60s
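
Once the histogram bucket series are dropped, quantiles can no longer be computed, but the `_sum` and `_count` series still give an average. A minimal sketch of that arithmetic, assuming two samples of each counter taken at the start and end of a window (the function name and arguments are hypothetical, not taken from the alert rule):

```python
def average_latency(sum_start, sum_end, count_start, count_end):
    """Average request latency over a window, from histogram _sum/_count.

    `sum_*` are cumulative seconds spent serving requests and `count_*`
    are cumulative request counts, sampled at the window boundaries.
    Returns None when no requests completed in the window.
    """
    delta_count = count_end - count_start
    if delta_count <= 0:
        return None  # avoid division by zero when there is no traffic
    return (sum_end - sum_start) / delta_count
```

Unlike a p99, this average hides tail latency, so a threshold tuned for quantiles would likely need revisiting.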
- Add SQLite backup CronJob (every 6h to NFS for cloud sync pickup)
- Move headscale-ui secrets (COOKIE_SECRET, ROOT_API_KEY) from hardcoded
values to Vault-managed secrets
- Add DERP IPv6 address (2001:470:6e:43d::2) for IPv6-capable clients
- Clean up stale test nodes, duplicate users, rename "localhost" nodes
Also updated headscale_config in Vault to include DERP ipv6 field
and headscale_ui_cookie_secret/headscale_ui_api_key secrets.
CrowdSec, rate limiting, anti-AI, and error pages middlewares were
interfering with the Upgrade: DERP protocol handshake. Also updated
Headscale ACL in Vault to allow tailnet DNS traffic to Technitium
(10.0.20.200:53).
NFS PVs report usage of the entire NFS server filesystem (e.g.,
navidrome-music shows the 5.3 TiB Synology volume at 97%), not
PVC-specific usage. Filter out PVs with >1TiB capacity (these are always
NFS mounts; iSCSI PVCs are 10-50Gi).
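
The capacity heuristic above can be sketched as a simple filter. This is an illustration only: the record layout (`name`, `capacity_bytes`) is an assumption, and the actual filter presumably lives in a monitoring query rather than in Python.

```python
TIB = 1024 ** 4  # bytes in one TiB


def drop_nfs_sized_volumes(volumes, max_capacity_bytes=TIB):
    """Keep only volumes whose reported capacity is at most 1 TiB.

    Volumes above the cutoff are assumed to be NFS mounts reporting the
    whole server filesystem, so their usage figures are not meaningful
    per PVC. `volumes` is a list of dicts with hypothetical keys.
    """
    return [v for v in volumes if v["capacity_bytes"] <= max_capacity_bytes]
```

The cutoff works only as long as no block-backed PVC grows past 1 TiB, which holds for the 10-50Gi iSCSI volumes mentioned above.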
Stagger token periods across roles (7d/8d/9d/10d) to prevent
bulk lease revocation storms that caused transient 504s.
Periodic tokens auto-renew indefinitely, eliminating mass expiry.
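
A rough sketch of why staggering helps, under the simplifying assumption that all tokens are issued at the same moment and each one hits a boundary exactly once per period. The role names are hypothetical and this is not how the Vault roles are actually defined.

```python
from datetime import timedelta
from functools import reduce
from math import gcd


def staggered_periods(roles, base_days=7, step_days=1):
    """Give each role its own period: base, base+step, base+2*step, ..."""
    return {
        role: timedelta(days=base_days + i * step_days)
        for i, role in enumerate(roles)
    }


def days_until_all_align(periods):
    """Days until every role's period boundary lands on the same day.

    This is the least common multiple of the periods, assuming all
    tokens started together (a simplification).
    """
    days = [p.days for p in periods.values()]
    return reduce(lambda a, b: a * b // gcd(a, b), days)
```

With four roles at 7/8/9/10 days the boundaries coincide only every 2520 days, whereas four roles sharing a 7-day period would line up every week.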
- Remove ClusterMemoryRequestsHigh, ContainerNearOOM, NodeLowFreeMemory,
NodeMemoryPressureTrending — all fire regularly due to intentional
memory overcommit and are not actionable
- Keep ContainerOOMKilled (actionable — container actually died)
- Raise HighServiceLatency p99 threshold from 10s to 30s to ignore
transient spikes
Both services (calibre and audiobookshelf) migrated to the unified ebooks
namespace. Remove:
- Old stack directories and Terraform state
- calibre references from monitoring namespace lists
- calibre/audiobookshelf from operational scripts
- Delete servarr/audiobook-search TF module (moved to ebooks/book-search)
- Remove audiobook-search from cloudflare_proxied_names
- Remove commented-out module reference in servarr/main.tf
- Clean up "renamed from" comment in ebooks/main.tf
- K8s resources (deploy/svc/ingress) deleted from servarr namespace
- Cloudflare DNS record already absent
- Import book-search and insta2spotify DNS records into cloudflared state
- New ebooks namespace with CWA, Stacks, Audiobookshelf, book-search
- book-search (renamed from audiobook-search) with CWA ingest volume
- Comment out audiobook_search module from servarr
- All NFS volumes and secrets consolidated
- Namespace insta2spotify (tier 4-aux)
- ExternalSecret from Vault secret/insta2spotify
- NFS volume at /mnt/main/insta2spotify for SQLite + Spotify cache
- Frontend (128Mi) + backend (512Mi req / 2Gi limit) in one pod
- Split ingress: protected (Authentik) for frontend, unprotected for /api/*
- DNS via Cloudflare (proxied)
- ingress_factory now injects gethomepage.dev/* annotations on all ingresses
(name, group, href, icon) with namespace-to-group mapping
- Stacks with explicit annotations override defaults via merge order
- New homepage_enabled var allows opt-out for internal-only ingresses
- Homepage search widget switched to in-page quicklaunch (Ctrl+K / tap)
- Added hideErrors and quicklaunch settings for clean service directory
- Result: 116/134 ingresses now discoverable (up from ~30)
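
The merge order described above (generated defaults first, explicit annotations last) can be sketched as follows. This is a Python stand-in for what is presumably a Terraform `merge()` in ingress_factory; the function signature, the fallback group name, and the way href and icon are derived are all assumptions made for illustration.

```python
DEFAULT_GROUP = "Other"  # hypothetical fallback group


def homepage_annotations(name, namespace, host, group_by_namespace,
                         explicit=None, homepage_enabled=True):
    """Build ingress annotations with Homepage defaults.

    Defaults are generated from the ingress name, namespace, and host;
    anything in `explicit` wins because it is merged last. When
    `homepage_enabled` is False only the explicit annotations are kept.
    """
    explicit = dict(explicit or {})
    if not homepage_enabled:
        return explicit  # opt-out for internal-only ingresses
    defaults = {
        "gethomepage.dev/name": name,
        "gethomepage.dev/group": group_by_namespace.get(namespace, DEFAULT_GROUP),
        "gethomepage.dev/href": f"https://{host}",  # assumed derivation
        "gethomepage.dev/icon": f"{name}.png",      # assumed derivation
    }
    return {**defaults, **explicit}  # later keys override earlier ones
```

Putting the explicit map last means a stack never has to repeat the defaults it is happy with; it only states what differs.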
- Replace custom ViktorBarzin/metallb module with official Helm chart
- Migrate from ConfigMap-based config to CRD (IPAddressPool + L2Advertisement)
- Update Traefik LB annotations from metallb.universe.tf to metallb.io format
- Technitium DNS keeps stable IP 10.0.20.204 via MetalLB auto-assignment
- Headscale split DNS already configured to use 10.0.20.204
- Expose STUN port 3479/UDP on container and LoadBalancer service
- Upgrade headscale from 0.23.0 to 0.28.0
- Vault config updated (auto DERP region with ipv4 field); ISP router
port forward for UDP 3479 added
Home DERP now shows ~3ms latency and is selected as nearest relay.
A Kyverno ClusterPolicy clones tls-secret from the kyverno namespace to
all namespaces with synchronize=true. The renewal pipeline now updates the
source secret via kubectl, verifies cert validity, and sends a Slack
notification.
The upstream ghcr.io/mrlhansen/idrac_exporter:2.4.1 is missing
NewPowerSupplyInputVoltage in RefreshPowerOld, so the R730 iDRAC
never emits idrac_power_supply_input_voltage. Switch to the patched
viktorbarzin/idrac-redfish-exporter:2.4.1-voltage-fix image.
/proc/self/io inside $(awk ...) resolves to the awk subprocess, not the
parent bash shell. Use $$ (the bash PID) in the path instead to read the
correct process IO counters.
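
The same pitfall applies whenever a helper process opens /proc/self: the path resolves to whichever process performs the open. A minimal sketch of the safer pattern, building the path from an explicit PID and parsing the `key: value` lines of the io file (both function names are hypothetical):

```python
import os


def io_counters_path(pid=None):
    """Path to the IO counters of a specific process.

    Using an explicit PID (defaulting to the caller's own) avoids
    /proc/self, which would point at any helper that opens the file.
    """
    return f"/proc/{os.getpid() if pid is None else pid}/io"


def parse_proc_io(text):
    """Parse the contents of a /proc/<pid>/io file into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a "key: value" shape
            counters[key.strip()] = int(value.strip())
    return counters
```

On Linux, `parse_proc_io(open(io_counters_path()).read())` would return the calling process's own counters, such as `read_bytes` and `write_bytes`.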