infra


Viktor Barzin

fd0f4a0365

fix: restore tree dropped by 6d224861; land stem95su gdrive-sync (10m) [ci skip]

6d224861 came from a --no-checkout worktree whose empty index made the
commit drop every file except two. This restores 05b50d2b's full tree and
correctly adds stacks/stem95su/gdrive-sync.tf + the service-catalog stem95su
entry. Forward-only (parent=6d224861, no force-push); [ci skip] since the
live infra was never applied from the broken commit.
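
A minimal sketch of how a commit like the broken one could be caught before it lands, assuming a hook that runs inside the worktree. The helper names and the 50% threshold are hypothetical and not part of this repo; only the two git invocations (`git ls-tree -r --name-only HEAD` and `git diff --cached --name-only --diff-filter=D`) are standard commands.

```python
import subprocess
from typing import List


def git_lines(args: List[str]) -> List[str]:
    """Run a git subcommand and return its non-empty output lines."""
    out = subprocess.run(
        ["git", *args], check=True, capture_output=True, text=True
    ).stdout
    return [line for line in out.splitlines() if line]


def looks_like_dropped_tree(
    parent_files: int, staged_deletions: int, max_ratio: float = 0.5
) -> bool:
    """Flag a staged commit that deletes more than max_ratio of the parent tree.

    The threshold is an illustrative assumption, not a value from this repo.
    """
    if parent_files == 0:
        return False  # nothing to lose in an empty parent tree
    return staged_deletions / parent_files > max_ratio


def check_staged_commit() -> bool:
    """Compare the files in HEAD with the deletions staged in the index."""
    parent = git_lines(["ls-tree", "-r", "--name-only", "HEAD"])
    deleted = git_lines(["diff", "--cached", "--name-only", "--diff-filter=D"])
    return looks_like_dropped_tree(len(parent), len(deleted))
```

With an empty index, every file in HEAD shows up as a staged deletion, so `check_staged_commit()` would return True and a hook could refuse the commit.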

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-09 08:45:33 +00:00

Viktor Barzin

6d224861c4

stem95su: scheduled Drive->site sync CronJob (every 10m)

CronJob stem95su-gdrive-sync (*/10) mounts the content PVC RW and
rclone-syncs the read-only Drive folder "claude" (stem claude/files) onto
it (rclone/rclone:1.74.3, scope=drive.readonly, empty-source guard +
--max-delete 25). ESO ExternalSecret stem95su-rclone <- Vault
secret/stem95su. Requires the GCP OAuth app published to Production or the
refresh token expires ~weekly.
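
The empty-source guard and delete cap described above can be sketched roughly as follows. This is an illustration only, not the contents of gdrive-sync.tf: the remote name `gdrive:claude`, the mount path `/content`, and the function names are assumptions. `rclone lsf`, `rclone sync`, and `--max-delete` are existing rclone commands and flags.

```python
import subprocess
from typing import Callable, List

MAX_DELETE = 25  # from the commit message: --max-delete 25


def count_source_files(listing: str) -> int:
    """Count non-blank lines in the output of `rclone lsf`."""
    return sum(1 for line in listing.splitlines() if line.strip())


def build_sync_command(
    source: str, dest: str, max_delete: int = MAX_DELETE
) -> List[str]:
    """Build the rclone sync invocation with the delete cap applied."""
    return ["rclone", "sync", source, dest, "--max-delete", str(max_delete)]


def sync_if_source_nonempty(
    source: str, dest: str, run: Callable = subprocess.run
) -> bool:
    """Sync only when the source lists at least one file.

    An empty listing is treated as a failure of the source (for example an
    expired token or an unshared folder), not as an instruction to delete
    everything on the destination.
    """
    listing = run(
        ["rclone", "lsf", "--files-only", "-R", source],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    if count_source_files(listing) == 0:
        return False  # guard tripped: leave the destination untouched
    run(build_sync_command(source, dest), check=True)
    return True
```

A call such as `sync_if_source_nonempty("gdrive:claude", "/content")` would then skip the sync entirely when the Drive folder appears empty, and otherwise let rclone abort if more than 25 deletions are pending.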

Lands the gdrive-sync stack on master (it had landed on a feature branch
by accident on the shared devvm checkout).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-09 08:42:26 +00:00

Viktor Barzin

0f6321ce86

[dns] NodeLocal DNSCache — deploy DaemonSet to all nodes (WS C)

Adds per-node DNS cache that transparently intercepts pod queries on
10.96.0.10 (kube-dns ClusterIP) AND 169.254.20.10 (link-local) via
hostNetwork + NET_ADMIN iptables NOTRACK rules. Pods keep using their
existing /etc/resolv.conf (nameserver 10.96.0.10) unchanged — no kubelet
rollout needed for transparent mode.

Layout mirrors existing stacks (technitium, descheduler, kured):
  stacks/nodelocal-dns/
    main.tf                                 # module wiring + IP params
    modules/nodelocal-dns/main.tf           # SA, Services, ConfigMap, DS

Key decisions:
  - Image: registry.k8s.io/dns/k8s-dns-node-cache:1.23.1
  - Co-listens on 169.254.20.10 + 10.96.0.10 (transparent interception)
  - Upstream path: kube-dns-upstream (new headless svc) → CoreDNS pods
    (separate ClusterIP avoids cache looping back through itself)
  - viktorbarzin.lan zone forwards directly to Technitium ClusterIP
    (10.96.0.53), bypassing CoreDNS for internal names
  - priorityClassName: system-node-critical
  - tolerations: operator=Exists (runs on master + all tainted nodes)
  - No CPU limit (cluster-wide policy); mem requests=32Mi, limit=128Mi
  - Kyverno dns_config drift suppressed on the DaemonSet
  - Kubelet clusterDNS NOT changed: transparent mode is sufficient;
    rolling 5 nodes just to switch to 169.254.20.10 adds no benefit and
    would expand the blast radius for no reason.

Verified:
  - DaemonSet 5/5 Ready across k8s-master + 4 workers
  - dig @169.254.20.10 idrac.viktorbarzin.lan -> 192.168.1.4
  - dig @169.254.20.10 github.com -> 140.82.121.3
  - Deleted all 3 CoreDNS pods; cached queries still resolved via
    NodeLocal DNSCache (resilience confirmed)
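
The dig checks above could be repeated from a script. The following is a minimal sketch using only the Python standard library, assuming plain UDP on port 53 and a response that fits in 512 bytes; the function names are hypothetical, and it reports only the answer count rather than parsing the returned addresses.

```python
import socket
import struct


def build_a_query(name: str, query_id: int = 0x1234) -> bytes:
    """Encode a minimal DNS query for an A record (RFC 1035 wire format)."""
    # Header: ID, flags (recursion desired), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def answer_count(response: bytes) -> int:
    """Read ANCOUNT (bytes 6-7 of the DNS header) from a raw response."""
    return struct.unpack("!H", response[6:8])[0]


def query_answers(server: str, name: str, timeout: float = 2.0) -> int:
    """Send one A query to `server` over UDP and return the answer count."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_a_query(name), (server, 53))
        response, _ = sock.recvfrom(512)
    return answer_count(response)
```

Run from a pod or node, `query_answers("169.254.20.10", "github.com")` returning a value greater than zero would correspond to the successful dig results listed above.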

Docs: architecture/dns.md — adds NodeLocal DNSCache to Components table,
graph diagram, stacks table; rewrites pod DNS resolution paths to show
the cache layer; adds troubleshooting entry.

Closes: code-2k6

2026-04-19 15:46:41 +00:00

3 commits