infra/stacks/monitoring/modules/monitoring/tempo.yaml
Viktor Barzin 7513468a2d
feat(monitoring): Tempo + OTel Collector for tripit tracing (ADR-0032 Phase 2)
Stand up the cluster's first trace store + OTLP ingress so tripit's OpenTelemetry
spans (Phase 1, already live in prod) export and correlate with logs:
- Grafana Tempo (single-binary, filesystem on proxmox-lvm 20Gi, 30d)
- OTel Collector (contrib; otlp -> redaction deny-list backstop -> batch -> tempo)
- Grafana: a Tempo datasource + an ADDITIVE trace_id->Tempo derivedField on the
  Loki datasource (no uid change, so existing dashboards are unaffected)
- tripit deployment: LOG_FORMAT=json + OTEL_EXPORTER_OTLP_ENDPOINT -> the Collector

Additive (new helm releases; Loki/Prometheus/Grafana untouched). Offline
'terraform validate' clean; full plan+apply runs in CI (locked git-crypt blocks a
local plan as non-admin).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-22 06:31:11 +00:00

# Grafana Tempo — single-binary trace store for the TripIt observability stack
# (tripit ADR-0032). Mirrors Loki: filesystem storage on a proxmox-lvm PVC,
# SingleBinary, ingests OTLP from the OTel Collector. Additive — independent of
# Loki/Prometheus/Grafana.
tempo:
  retention: 720h # 30d, matching Loki
  storage:
    trace:
      backend: local
      local:
        path: /var/tempo/traces
      wal:
        path: /var/tempo/wal
  # OTLP ingest (from the OTel Collector). gRPC 4317 / HTTP 4318.
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317
        http:
          endpoint: 0.0.0.0:4318
  # Tempo query/HTTP API — the Grafana datasource URL targets this (3100).
  server:
    http_listen_port: 3100

persistence:
  enabled: true
  size: 20Gi
  storageClassName: proxmox-lvm

resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    memory: 1Gi
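
# Illustrative sketch only (not part of this release's values): how a Grafana
# datasource provisioning entry could target the HTTP API port configured above.
# ASSUMPTIONS: the Service name "tempo", the namespace "monitoring", and the
# uid "tempo" are hypothetical; substitute whatever the helm release creates.
#
#   apiVersion: 1
#   datasources:
#     - name: Tempo
#       type: tempo
#       uid: tempo            # hypothetical uid
#       access: proxy
#       url: http://tempo.monitoring.svc.cluster.local:3100   # hypothetical host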