From bd60c3d5e07e0546b658ce3d58ae2321176e6181 Mon Sep 17 00:00:00 2001
From: Viktor Barzin
Date: Wed, 10 Jun 2026 22:55:20 +0000
Subject: [PATCH] pve-host/dns: register loki.viktorbarzin.lan CNAME, drop the /etc/hosts pin
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Follow-up to the pve-host Loki shipper (aac807fb). The host reached Loki
via an /etc/hosts pin of the Traefik LB IP — Viktor flagged that as the
wrong solution (no hardcoding; the DNS infra should handle it).

Registered loki.viktorbarzin.lan in Technitium as a CNAME ->
ingress.viktorbarzin.lan (the anchor whose A record auto-tracks the live
Traefik LB IP, so it's renumber-proof), via the Technitium API +
zone-sync to all 3 instances.

Removed the /etc/hosts pin from the PVE host; promtail now resolves the
name purely via DNS (verified still shipping to Loki).
insecure_skip_verify stays — the internal .lan cert isn't publicly
trusted.

Docs (monitoring.md) + the pve-promtail.yaml header updated to drop the
pin references.

The DNS record is API-managed (the viktorbarzin.lan zone convention),
not in this repo; auto-managing .lan CNAMEs in
technitium-ingress-dns-sync remains a noted follow-up.

Co-Authored-By: Claude Opus 4.8
---
 docs/architecture/monitoring.md | 25 ++++++++++++-----------
 scripts/pve-promtail.yaml       |  6 +++---
 2 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/docs/architecture/monitoring.md b/docs/architecture/monitoring.md
index b7d0619d..a5dec0af 100644
--- a/docs/architecture/monitoring.md
+++ b/docs/architecture/monitoring.md
@@ -119,15 +119,18 @@ no `level` stream label.
 cluster error/warn line counts (5-min window) → `sensor.cluster_log_errors_5m` /
 `sensor.cluster_log_warnings_5m`, for a compact trend card on the Барзини status
 view plus a Grafana-link button.
 Those sensors reach Loki via the Traefik LB IP
-`10.0.20.203` + a `Host: loki.viktorbarzin.lan` header (`verify_ssl: false`)
-because `loki.viktorbarzin.lan` has **no Technitium record yet** (the
-`technitium-ingress-dns-sync` CronJob only creates `.me` CNAMEs + pins
-`ingress.viktorbarzin.lan`). The **PVE host** promtail (see "External host: pve"
-below) reaches Loki the same way, via an `/etc/hosts` pin
-`10.0.20.203 loki.viktorbarzin.lan`. **Follow-up (now 3 consumers — this sensor,
-rpi-sofia, the PVE host):** register `loki.viktorbarzin.lan` in Technitium as a
-CNAME → `ingress.viktorbarzin.lan` (auto-tracks Traefik LB renumbers) so all
-three resolve it by name instead of pinning the LB IP.
+`10.0.20.203` + a `Host: loki.viktorbarzin.lan` header (`verify_ssl: false`).
+**Update 2026-06-10:** `loki.viktorbarzin.lan` is now **registered in Technitium**
+as a CNAME → `ingress.viktorbarzin.lan` (the anchor whose A record auto-tracks the
+live Traefik LB IP), added via the Technitium API and AXFR-replicated to all 3
+instances — so it resolves by name LAN-wide. The **PVE host** promtail (see
+"External host: pve" below) uses the name directly, with **no `/etc/hosts` pin**.
+This HA sensor and the rpi-sofia promtail still pin the LB IP in their own configs
+and can drop to the name on next touch (`verify_ssl: false` / `insecure_skip_verify`
+stays — the internal `.lan` cert isn't publicly trusted). Per-host `.lan` CNAMEs
+are still added manually via the API; auto-managing them in
+`technitium-ingress-dns-sync` (today `.me`-only + the `ingress.viktorbarzin.lan`
+anchor) remains a follow-up.
 
 ### External host: rpi-sofia (Sofia Raspberry Pi)
@@ -155,7 +158,7 @@ Query examples (Grafana → Loki): `{job="rpi-sofia-journal"}`, `{job="rpi-sofia
 
 **Why now:** emo's Claude agent was granted **root SSH** to the host (a dedicated shared-root key `emo-pve-agent@devvm`, fingerprint `SHA256:Wd+m0EABlm4RDDykDh85PIYSqe0Al8Hr9AZ+7Ksy4HQ`, reachable as `ssh pve` from the devvm) so he can manage the host (e.g. the R730 fan daemon) via his agent. To keep an audit trail, **snoopy** (enabled via `/etc/ld.so.preload` → `libsnoopy.so`; config `scripts/pve-snoopy.ini`) logs every `execve()` to journald under identifier `snoopy`, and promtail ships it to Loki.
 
-**Logs** — `promtail` v3.5.1 (amd64) at `/usr/local/bin/promtail`, config `scripts/pve-promtail.yaml`, unit `scripts/pve-promtail.service`. Ships `/var/log/journal` to `https://loki.viktorbarzin.lan/loki/api/v1/push` (`insecure_skip_verify`; LB-IP reached via the `/etc/hosts` pin noted above). Relabels: `unit`, `level`, `identifier`; sshd lines (`identifier=~"sshd.*"`) are re-jobbed to `sshd-pve` so the S1 rule matches. Streams:
+**Logs** — `promtail` v3.5.1 (amd64) at `/usr/local/bin/promtail`, config `scripts/pve-promtail.yaml`, unit `scripts/pve-promtail.service`. Ships `/var/log/journal` to `https://loki.viktorbarzin.lan/loki/api/v1/push` (`insecure_skip_verify` — the internal `.lan` cert isn't publicly trusted; the name resolves via the Technitium CNAME above, no `/etc/hosts` pin). Relabels: `unit`, `level`, `identifier`; sshd lines (`identifier=~"sshd.*"`) are re-jobbed to `sshd-pve` so the S1 rule matches. Streams:
 - `{job="pve-journal", host="pve"}` — full host journal (kernel, pvestatd, fan-control, NFS, etc.).
 - `{job="pve-journal", identifier="snoopy"}` — **command audit** (every execve: `uid login tty sid cwd cmdline`).
 - `{job="sshd-pve"}` — sshd auth; an `Accepted publickey ... SHA256:` line ties a session to a key (e.g. emo's fp above). Feeds S1.
@@ -164,7 +167,7 @@ Query examples (Grafana → Loki): `{job="rpi-sofia-journal"}`, `{job="rpi-sofia
 
 Query examples (Grafana → Loki): `{host="pve"}`, `{job="pve-journal", identifier="snoopy"}` (command audit), `{job="sshd-pve"} |= "Accepted publickey"`.
 
-> Hand-managed (not Terraform), like the rpi-sofia and fan-control pieces: the promtail binary/config/unit, the snoopy enable (`/etc/ld.so.preload`), and the `/etc/hosts` Loki pin all live on the host. Source-of-truth files: `scripts/pve-promtail.{yaml,service}` + `scripts/pve-snoopy.ini`; deploy steps are in the `pve-promtail.yaml` header.
+> Hand-managed (not Terraform), like the rpi-sofia and fan-control pieces: the promtail binary/config/unit and the snoopy enable (`/etc/ld.so.preload`) live on the host (Loki resolves via the Technitium CNAME — no `/etc/hosts` pin). Source-of-truth files: `scripts/pve-promtail.{yaml,service}` + `scripts/pve-snoopy.ini`; deploy steps are in the `pve-promtail.yaml` header.
 
 ### Dell R730 iDRAC: SNMP-primary + Redfish remnant (migrated 2026-06-05)
 
diff --git a/scripts/pve-promtail.yaml b/scripts/pve-promtail.yaml
index 92e311e1..b923e98c 100644
--- a/scripts/pve-promtail.yaml
+++ b/scripts/pve-promtail.yaml
@@ -8,9 +8,9 @@
 # scp scripts/pve-promtail.service root@192.168.1.127:/etc/systemd/system/promtail.service
 # ssh root@192.168.1.127 'mkdir -p /var/lib/promtail && systemctl daemon-reload && systemctl enable --now promtail'
 # # Binary: grafana/loki v3.5.1 promtail-linux-amd64 -> /usr/local/bin/promtail (chmod 0755).
-# # Loki reach: /etc/hosts pin "10.0.20.203 loki.viktorbarzin.lan" (Traefik LB, ETP-Local).
-# # FOLLOW-UP: replace the pin with a Technitium CNAME loki.viktorbarzin.lan -> ingress.viktorbarzin.lan
-# # so it auto-tracks Traefik LB renumbers (also fixes the rpi-sofia pin — see docs/architecture/monitoring.md).
+# # Loki reach: loki.viktorbarzin.lan resolves via a Technitium CNAME -> ingress.viktorbarzin.lan
+# # (registered 2026-06-10 via the Technitium API; auto-tracks the live Traefik LB IP, AXFR'd to all
+# # 3 instances). NO /etc/hosts pin. insecure_skip_verify stays — the internal .lan cert isn't trusted.
 # # Streams produced:
 # {job="pve-journal"} — full host journal (filter identifier="snoopy" for the command audit)