pve-host/dns: register loki.viktorbarzin.lan CNAME, drop the /etc/hosts pin
Follow-up to the pve-host Loki shipper (aac807fb). The host reached Loki via an
/etc/hosts pin of the Traefik LB IP — Viktor flagged that as the wrong solution
(no hardcoding; the DNS infra should handle it). Registered loki.viktorbarzin.lan
in Technitium as a CNAME -> ingress.viktorbarzin.lan (the anchor whose A record
auto-tracks the live Traefik LB IP, so it's renumber-proof), via the Technitium
API + zone-sync to all 3 instances. Removed the /etc/hosts pin from the PVE host;
promtail now resolves the name purely via DNS (verified still shipping to Loki).
insecure_skip_verify stays — the internal .lan cert isn't publicly trusted.
Docs (monitoring.md) + the pve-promtail.yaml header updated to drop the pin
references. The DNS record is API-managed (the viktorbarzin.lan zone convention),
not in this repo; auto-managing .lan CNAMEs in technitium-ingress-dns-sync
remains a noted follow-up.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
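
The record registration described in the message can be sketched as follows. This is a minimal illustration, not the call that was actually run: the `/api/zones/records/add` path and the `token`/`zone`/`domain`/`type`/`cname`/`ttl` parameter names are assumptions based on Technitium's HTTP API, and the base URL, token, and TTL are hypothetical placeholders. `same_addresses` is an optional post-check that the new name and the anchor resolve to the same IPs.

```python
import socket
from urllib.parse import urlencode


def cname_add_url(base_url: str, token: str, zone: str,
                  name: str, target: str, ttl: int = 3600) -> str:
    """Build an 'add record' request URL for a CNAME.

    Endpoint path and parameter names are assumed, not taken from the commit.
    """
    query = urlencode({
        "token": token,    # API token (hypothetical placeholder)
        "zone": zone,      # authoritative zone holding the record
        "domain": name,    # record owner name
        "type": "CNAME",
        "cname": target,   # alias target, here the ingress anchor
        "ttl": ttl,
    })
    return f"{base_url.rstrip('/')}/api/zones/records/add?{query}"


def same_addresses(name: str, anchor: str) -> bool:
    """Return True if both names currently resolve to the same IPv4 set."""
    def lookup(host: str) -> set:
        infos = socket.getaddrinfo(host, 443, socket.AF_INET, socket.SOCK_STREAM)
        return {info[4][0] for info in infos}
    return lookup(name) == lookup(anchor)
```

Building the URL separately from sending it keeps the request inspectable before it touches the DNS server; `same_addresses("loki.viktorbarzin.lan", "ingress.viktorbarzin.lan")` only makes sense from a host that uses the Technitium resolvers.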
parent 93ba67c84a
commit bd60c3d5e0
2 changed files with 17 additions and 14 deletions
--- a/docs/architecture/monitoring.md
+++ b/docs/architecture/monitoring.md
@@ -119,15 +119,18 @@ no `level` stream label.
 cluster error/warn line counts (5-min window) → `sensor.cluster_log_errors_5m` /
 `sensor.cluster_log_warnings_5m`, for a compact trend card on the Барзини status
 view plus a Grafana-link button. Those sensors reach Loki via the Traefik LB IP
-`10.0.20.203` + a `Host: loki.viktorbarzin.lan` header (`verify_ssl: false`)
-because `loki.viktorbarzin.lan` has **no Technitium record yet** (the
-`technitium-ingress-dns-sync` CronJob only creates `.me` CNAMEs + pins
-`ingress.viktorbarzin.lan`). The **PVE host** promtail (see "External host: pve"
-below) reaches Loki the same way, via an `/etc/hosts` pin
-`10.0.20.203 loki.viktorbarzin.lan`. **Follow-up (now 3 consumers — this sensor,
-rpi-sofia, the PVE host):** register `loki.viktorbarzin.lan` in Technitium as a
-CNAME → `ingress.viktorbarzin.lan` (auto-tracks Traefik LB renumbers) so all
-three resolve it by name instead of pinning the LB IP.
+`10.0.20.203` + a `Host: loki.viktorbarzin.lan` header (`verify_ssl: false`).
+**Update 2026-06-10:** `loki.viktorbarzin.lan` is now **registered in Technitium**
+as a CNAME → `ingress.viktorbarzin.lan` (the anchor whose A record auto-tracks the
+live Traefik LB IP), added via the Technitium API and AXFR-replicated to all 3
+instances — so it resolves by name LAN-wide. The **PVE host** promtail (see
+"External host: pve" below) uses the name directly, with **no `/etc/hosts` pin**.
+This HA sensor and the rpi-sofia promtail still pin the LB IP in their own configs
+and can drop to the name on next touch (`verify_ssl: false` / `insecure_skip_verify`
+stays — the internal `.lan` cert isn't publicly trusted). Per-host `.lan` CNAMEs
+are still added manually via the API; auto-managing them in
+`technitium-ingress-dns-sync` (today `.me`-only + the `ingress.viktorbarzin.lan`
+anchor) remains a follow-up.
 
 ### External host: rpi-sofia (Sofia Raspberry Pi)
 
@@ -155,7 +158,7 @@ Query examples (Grafana → Loki): `{job="rpi-sofia-journal"}`, `{job="rpi-sofia
 
 **Why now:** emo's Claude agent was granted **root SSH** to the host (a dedicated shared-root key `emo-pve-agent@devvm`, fingerprint `SHA256:Wd+m0EABlm4RDDykDh85PIYSqe0Al8Hr9AZ+7Ksy4HQ`, reachable as `ssh pve` from the devvm) so he can manage the host (e.g. the R730 fan daemon) via his agent. To keep an audit trail, **snoopy** (enabled via `/etc/ld.so.preload` → `libsnoopy.so`; config `scripts/pve-snoopy.ini`) logs every `execve()` to journald under identifier `snoopy`, and promtail ships it to Loki.
 
-**Logs** — `promtail` v3.5.1 (amd64) at `/usr/local/bin/promtail`, config `scripts/pve-promtail.yaml`, unit `scripts/pve-promtail.service`. Ships `/var/log/journal` to `https://loki.viktorbarzin.lan/loki/api/v1/push` (`insecure_skip_verify`; LB-IP reached via the `/etc/hosts` pin noted above). Relabels: `unit`, `level`, `identifier`; sshd lines (`identifier=~"sshd.*"`) are re-jobbed to `sshd-pve` so the S1 rule matches. Streams:
+**Logs** — `promtail` v3.5.1 (amd64) at `/usr/local/bin/promtail`, config `scripts/pve-promtail.yaml`, unit `scripts/pve-promtail.service`. Ships `/var/log/journal` to `https://loki.viktorbarzin.lan/loki/api/v1/push` (`insecure_skip_verify` — the internal `.lan` cert isn't publicly trusted; the name resolves via the Technitium CNAME above, no `/etc/hosts` pin). Relabels: `unit`, `level`, `identifier`; sshd lines (`identifier=~"sshd.*"`) are re-jobbed to `sshd-pve` so the S1 rule matches. Streams:
 - `{job="pve-journal", host="pve"}` — full host journal (kernel, pvestatd, fan-control, NFS, etc.).
 - `{job="pve-journal", identifier="snoopy"}` — **command audit** (every execve: `uid login tty sid cwd cmdline`).
 - `{job="sshd-pve"}` — sshd auth; an `Accepted publickey ... SHA256:<fp>` line ties a session to a key (e.g. emo's fp above). Feeds S1.
@@ -164,7 +167,7 @@ Query examples (Grafana → Loki): `{job="rpi-sofia-journal"}`, `{job="rpi-sofia
 
 Query examples (Grafana → Loki): `{host="pve"}`, `{job="pve-journal", identifier="snoopy"}` (command audit), `{job="sshd-pve"} |= "Accepted publickey"`.
 
-> Hand-managed (not Terraform), like the rpi-sofia and fan-control pieces: the promtail binary/config/unit, the snoopy enable (`/etc/ld.so.preload`), and the `/etc/hosts` Loki pin all live on the host. Source-of-truth files: `scripts/pve-promtail.{yaml,service}` + `scripts/pve-snoopy.ini`; deploy steps are in the `pve-promtail.yaml` header.
+> Hand-managed (not Terraform), like the rpi-sofia and fan-control pieces: the promtail binary/config/unit and the snoopy enable (`/etc/ld.so.preload`) live on the host (Loki resolves via the Technitium CNAME — no `/etc/hosts` pin). Source-of-truth files: `scripts/pve-promtail.{yaml,service}` + `scripts/pve-snoopy.ini`; deploy steps are in the `pve-promtail.yaml` header.
 
 ### Dell R730 iDRAC: SNMP-primary + Redfish remnant (migrated 2026-06-05)
 
--- a/scripts/pve-promtail.yaml
+++ b/scripts/pve-promtail.yaml
@@ -8,9 +8,9 @@
 # scp scripts/pve-promtail.service root@192.168.1.127:/etc/systemd/system/promtail.service
 # ssh root@192.168.1.127 'mkdir -p /var/lib/promtail && systemctl daemon-reload && systemctl enable --now promtail'
 # # Binary: grafana/loki v3.5.1 promtail-linux-amd64 -> /usr/local/bin/promtail (chmod 0755).
-# # Loki reach: /etc/hosts pin "10.0.20.203 loki.viktorbarzin.lan" (Traefik LB, ETP-Local).
-# # FOLLOW-UP: replace the pin with a Technitium CNAME loki.viktorbarzin.lan -> ingress.viktorbarzin.lan
-# # so it auto-tracks Traefik LB renumbers (also fixes the rpi-sofia pin — see docs/architecture/monitoring.md).
+# # Loki reach: loki.viktorbarzin.lan resolves via a Technitium CNAME -> ingress.viktorbarzin.lan
+# # (registered 2026-06-10 via the Technitium API; auto-tracks the live Traefik LB IP, AXFR'd to all
+# # 3 instances). NO /etc/hosts pin. insecure_skip_verify stays — the internal .lan cert isn't trusted.
 #
 # Streams produced:
 # {job="pve-journal"} — full host journal (filter identifier="snoopy" for the command audit)
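
A quick way to confirm that the name-based push endpoint works without the `/etc/hosts` pin is to send one test line by hand. The sketch below is an assumption-laden illustration rather than part of this commit: it uses Loki's JSON push format, assumes no authentication or tenant header is required, disables certificate verification to mirror `insecure_skip_verify`, and uses a hypothetical `smoke-test` job label so the test line stays out of the real streams.

```python
import json
import ssl
import time
import urllib.request
from typing import Optional

# Push URL as given in the promtail description above.
PUSH_URL = "https://loki.viktorbarzin.lan/loki/api/v1/push"


def build_push_body(labels: dict, line: str, ts_ns: Optional[int] = None) -> bytes:
    """Encode a single log line in Loki's JSON push format."""
    if ts_ns is None:
        ts_ns = time.time_ns()
    # Each value is a [timestamp-in-nanoseconds-as-string, log line] pair.
    payload = {"streams": [{"stream": labels, "values": [[str(ts_ns), line]]}]}
    return json.dumps(payload).encode("utf-8")


def push_line(labels: dict, line: str, url: str = PUSH_URL, timeout: float = 5.0) -> int:
    """POST one line to Loki and return the HTTP status code."""
    # Skip certificate checks, as the internal .lan cert is not publicly trusted.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    req = urllib.request.Request(
        url,
        data=build_push_body(labels, line),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout, context=ctx) as resp:
        return resp.status
```

Running `push_line({"job": "smoke-test", "host": "pve"}, "dns check")` from the PVE host exercises DNS resolution, the Traefik route, and Loki ingestion in one step; a 2xx status suggests the CNAME path is working end to end.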