infra/scripts/pve-promtail.yaml
Viktor Barzin bd60c3d5e0
pve-host/dns: register loki.viktorbarzin.lan CNAME, drop the /etc/hosts pin
Follow-up to the pve-host Loki shipper (aac807fb). The host reached Loki via an
/etc/hosts pin of the Traefik LB IP — Viktor flagged that as the wrong solution
(no hardcoding; the DNS infra should handle it). Registered loki.viktorbarzin.lan
in Technitium as a CNAME -> ingress.viktorbarzin.lan (the anchor whose A record
auto-tracks the live Traefik LB IP, so it's renumber-proof), via the Technitium
API + zone-sync to all 3 instances. Removed the /etc/hosts pin from the PVE host;
promtail now resolves the name purely via DNS (verified still shipping to Loki).
insecure_skip_verify stays — the internal .lan cert isn't publicly trusted.

Docs (monitoring.md) + the pve-promtail.yaml header updated to drop the pin
references. The DNS record is API-managed (the viktorbarzin.lan zone convention),
not in this repo; auto-managing .lan CNAMEs in technitium-ingress-dns-sync
remains a noted follow-up.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-10 22:55:20 +00:00
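
The renumber-proof claim in the commit message can be illustrated with a small sketch. It is only a toy model of CNAME following, not how Technitium resolves names: the `records` table, the `resolve` helper, and both IP addresses (taken from the 192.0.2.0/24 documentation range) are hypothetical. Only the two hostnames come from the commit message.

```python
def resolve(name, records, max_hops=8):
    """Follow CNAME records in a record table until an A record is reached."""
    for _ in range(max_hops):
        record_type, value = records[name]
        if record_type == "A":
            return value
        name = value  # CNAME: continue with the target name
    raise RuntimeError("CNAME chain longer than max_hops")


# Hypothetical zone contents; the IPs are placeholders, not the real LB address.
records = {
    "loki.viktorbarzin.lan": ("CNAME", "ingress.viktorbarzin.lan"),
    "ingress.viktorbarzin.lan": ("A", "192.0.2.10"),
}

before = resolve("loki.viktorbarzin.lan", records)

# Simulate the Traefik LB being renumbered: only the anchor's A record changes.
records["ingress.viktorbarzin.lan"] = ("A", "192.0.2.20")
after = resolve("loki.viktorbarzin.lan", records)

print(before, "->", after)
```

The `loki` entry is never edited, yet it follows the anchor to the new address. An /etc/hosts pin, by contrast, would keep pointing at the old IP until someone changed it by hand.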

53 lines
2.3 KiB
YAML

# Promtail config for the PVE host (192.168.1.127) — ships the systemd journal to cluster Loki.
#
# NOT Terraform-managed (the PVE host is the hypervisor, outside k8s). Deployed by hand,
# same pattern as scripts/fan-control.* and the rpi-sofia promtail. This file is source-of-truth.
#
# Deploy:
#   scp scripts/pve-promtail.yaml root@192.168.1.127:/etc/promtail/config.yml
#   scp scripts/pve-promtail.service root@192.168.1.127:/etc/systemd/system/promtail.service
#   ssh root@192.168.1.127 'mkdir -p /var/lib/promtail && systemctl daemon-reload && systemctl enable --now promtail'
#   # Binary: grafana/loki v3.5.1 promtail-linux-amd64 -> /usr/local/bin/promtail (chmod 0755).
#   # Loki reach: loki.viktorbarzin.lan resolves via a Technitium CNAME -> ingress.viktorbarzin.lan
#   # (registered 2026-06-10 via the Technitium API; auto-tracks the live Traefik LB IP, AXFR'd to all
#   # 3 instances). NO /etc/hosts pin. insecure_skip_verify stays: the internal .lan cert isn't trusted.
#
# Streams produced:
#   {job="pve-journal"}: host journal, except sshd lines, which are relabeled to job=sshd-pve
#   {job="sshd-pve"}: sshd auth lines; feeds the Loki S1 security rule (docs/architecture/security.md)
#   {job="pve-journal", identifier="snoopy"}: snoopy command audit (every execve on the host; see scripts/pve-snoopy.ini)
server:
  http_listen_port: 9080
  grpc_listen_port: 0
  log_level: warn
positions:
  filename: /var/lib/promtail/positions.yaml
clients:
  - url: https://loki.viktorbarzin.lan/loki/api/v1/push
    tls_config:
      insecure_skip_verify: true
scrape_configs:
  - job_name: journal
    journal:
      max_age: 12h
      json: false
      path: /var/log/journal
      labels:
        host: pve
        job: pve-journal
    relabel_configs:
      - source_labels: ['__journal__systemd_unit']
        target_label: unit
      - source_labels: ['__journal_priority_keyword']
        target_label: level
      - source_labels: ['__journal_syslog_identifier']
        target_label: identifier
      # sshd auth lines (identifier sshd / sshd-session) -> job=sshd-pve so the Loki S1
      # security rule ({job="sshd-pve"}) matches. snoopy command lines stay job=pve-journal.
      - source_labels: ['__journal_syslog_identifier']
        regex: 'sshd.*'
        target_label: job
        replacement: 'sshd-pve'
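
To check which stream a given journal entry lands in, the four relabel rules above can be approximated in a few lines of Python. This is a rough sketch and not Promtail's implementation. It assumes the Prometheus-style defaults for the `replace` action: a default regex of `(.*)`, a default replacement of the first capture group, fully anchored regexes, and removal of a target label whose new value is empty. The `relabel` function and the `RULES` table are hypothetical names introduced for illustration.

```python
import re

# Static labels from the `journal.labels` block of the config.
STATIC_LABELS = {"host": "pve", "job": "pve-journal"}

# (source label, regex, target label, replacement), one tuple per relabel rule.
# The first three rules rely on the assumed defaults: regex (.*), replacement $1.
RULES = [
    ("__journal__systemd_unit", r"(.*)", "unit", r"\1"),
    ("__journal_priority_keyword", r"(.*)", "level", r"\1"),
    ("__journal_syslog_identifier", r"(.*)", "identifier", r"\1"),
    ("__journal_syslog_identifier", r"sshd.*", "job", "sshd-pve"),
]


def relabel(journal_fields):
    """Return the final label set for one journal entry (approximation)."""
    labels = dict(STATIC_LABELS)
    labels.update(journal_fields)
    for source, regex, target, replacement in RULES:
        # fullmatch models the anchoring: 'sshd.*' must match the whole value.
        match = re.fullmatch(regex, labels.get(source, ""))
        if match is None:
            continue  # rule does not apply, labels stay unchanged
        value = match.expand(replacement)
        if value:
            labels[target] = value
        else:
            labels.pop(target, None)  # an empty value removes the label
    # Labels starting with "__" are internal and are dropped at the end.
    return {k: v for k, v in labels.items() if not k.startswith("__")}


print(relabel({"__journal_syslog_identifier": "sshd-session"})["job"])
print(relabel({"__journal_syslog_identifier": "snoopy"})["job"])
```

Under these assumptions, `sshd` and `sshd-session` entries move to `job="sshd-pve"`, while `snoopy` and every other identifier stay in `job="pve-journal"`. Because the regex is anchored, an identifier such as `mysshd` would not be rewritten.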