[ci skip] add network traffic visualization design doc
parent c36d953573
commit 887075189a
1 changed file with 91 additions and 0 deletions

docs/plans/2026-02-28-network-visualization-design.md (Normal file, 91 additions)
@@ -0,0 +1,91 @@
# Network Traffic Visualization Design

**Date**: 2026-02-28
**Goal**: Real-time visualization of all network traffic, both pod-to-pod (K8s) and full network (up to 192.168.1.1), using Grafana as the single pane of glass.

## Architecture

```
192.168.1.1 (ISP router)
  └── 10.0.20.1 (pfSense + softflowd) ──NetFlow UDP──► GoFlow2 (K8s)
        ├── Proxmox (192.168.1.127)                       │
        │     └── K8s nodes (10.0.20.100-104)             ▼
        │           └── Pods ◄──eBPF──► Caretta      Prometheus
        ├── TrueNAS (10.0.10.15)                          │
        └── Other devices                                 ▼
                                                       Grafana
                                                 (Node Graph panels)
```

Two complementary data paths:
1. **Caretta** (eBPF DaemonSet) → tracks pod-to-pod TCP connections → Prometheus metrics → Grafana Node Graph
2. **GoFlow2** (NetFlow collector) ← pfSense softflowd → Prometheus metrics → Grafana dashboards

## Component 1: Caretta

- **Stack**: `stacks/caretta/`
- **Namespace**: `caretta`
- **Deployment**: Helm release from `https://helm.groundcover.com/`, chart `caretta`
- **Config**:
  - Disable bundled Grafana (`grafana.enabled: false`)
  - Disable bundled VictoriaMetrics (`victoria-metrics-single.enabled: false`)
  - DaemonSet runs eBPF agent on each node
  - Exposes Prometheus metrics on port 7117
- **Key metric**: `caretta_links_observed{client_name, client_namespace, server_name, server_namespace, server_port}`
- **Grafana**: ConfigMap dashboard with Node Graph panel, label `grafana_dashboard: "1"`
- **Resources**: ~100Mi RAM, ~50m CPU per node

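The two "disable" settings in the Config list above could be passed to the Helm release as a values fragment along these lines. This is a minimal sketch that uses only the two keys named in the list; how the values reach the release (a values file, or `set` blocks in the Terraform stack) is left open here.

```yaml
# Sketch of Helm values for the caretta chart.
# Only the two keys named in the Config list are set; everything else
# stays at the chart defaults.
grafana:
  enabled: false          # reuse the existing Grafana instead of the bundled one
victoria-metrics-single:
  enabled: false          # Prometheus scrapes Caretta directly on port 7117
```

With both subcharts disabled, the release should contain only the Caretta agent DaemonSet and its supporting resources.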
## Component 2: GoFlow2

- **Stack**: `stacks/goflow2/`
- **Namespace**: `goflow2`
- **Deployment**: Raw Terraform (Deployment + Service); GoFlow2 is a single binary, so no Helm chart is needed
- **Image**: `netsampler/goflow2`
- **Ports**:
  - UDP 2055: NetFlow v9 receiver (from pfSense)
  - TCP 8080: Prometheus metrics endpoint
- **Service**: NodePort for UDP 2055 so pfSense (10.0.20.1) can reach it on any node IP
- **Key metrics**: `flow_bytes`, `flow_packets` with labels for src/dst IP, port, protocol
- **Grafana**: ConfigMap dashboard showing network flows (top talkers, protocol breakdown, inter-VLAN traffic)
- **Resources**: ~100Mi RAM, ~50m CPU (single pod, not DaemonSet)

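For illustration, the Service described above might look like the following if written as a plain Kubernetes manifest rather than Terraform. This is a sketch under assumptions: the `app: goflow2` selector label and the `nodePort` value 32055 are hypothetical, and the Service name `goflow2` is inferred from the scrape target `goflow2.goflow2.svc.cluster.local:8080`.

```yaml
# Sketch of the GoFlow2 Service as a plain manifest (the stack itself uses Terraform).
apiVersion: v1
kind: Service
metadata:
  name: goflow2            # inferred from the Prometheus scrape target
  namespace: goflow2
spec:
  type: NodePort
  selector:
    app: goflow2           # assumed pod label, not taken from the design
  ports:
    - name: netflow
      protocol: UDP
      port: 2055
      targetPort: 2055
      nodePort: 32055      # hypothetical; any free port in the 30000-32767 range
    - name: metrics
      protocol: TCP
      port: 8080
      targetPort: 8080     # scraped in-cluster via the Service DNS name
```

In a single NodePort Service the metrics port is also assigned a node port. If that exposure is unwanted, the metrics port could be moved to a separate ClusterIP Service.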
## Component 3: pfSense softflowd

- **Host**: 10.0.20.1 (SSH as admin)
- **Package**: `softflowd` (install via pfSense package manager)
- **Config**:
  - Monitor LAN interface(s)
  - Export NetFlow v9 to `<k8s-node-ip>:<goflow2-nodeport>` (UDP)
  - Tracking level: full (track individual connections)
- **Note**: This is a manual SSH step because pfSense is not Terraform-managed

## Component 4: Prometheus Integration

Two new scrape targets in `stacks/platform/modules/monitoring/prometheus_chart_values.tpl` (`extraScrapeConfigs`):

```yaml
- job_name: 'caretta'
  static_configs:
    - targets: ["caretta.caretta.svc.cluster.local:7117"]

- job_name: 'goflow2'
  static_configs:
    - targets: ["goflow2.goflow2.svc.cluster.local:8080"]
```

Requires re-applying the platform stack.

## Deployment Order

1. Apply `stacks/caretta/`: deploys the eBPF DaemonSet
2. Apply `stacks/goflow2/`: deploys the NetFlow collector
3. Re-apply `stacks/platform/`: adds the Prometheus scrape targets
4. SSH to pfSense: install softflowd, configure NetFlow export to the GoFlow2 NodePort
5. Verify in Grafana: confirm both dashboards show data

## Grafana Dashboards

Two dashboards, both auto-loaded via sidecar (ConfigMap label `grafana_dashboard: "1"`):

1. **K8s Pod Topology** (Caretta): Node Graph panel showing pods as nodes, TCP connections as edges, byte counts as edge weights
2. **Network Flows** (GoFlow2): Top talkers, protocol breakdown, inter-VLAN traffic, external destinations

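As a rough illustration of the sidecar mechanism, a dashboard ConfigMap could take the shape below. The ConfigMap name, the `monitoring` namespace, the JSON file name, and the stub dashboard body are all hypothetical placeholders; only the `grafana_dashboard: "1"` label and the dashboard title come from this design.

```yaml
# Sketch of a sidecar-loaded dashboard ConfigMap.
apiVersion: v1
kind: ConfigMap
metadata:
  name: caretta-pod-topology-dashboard   # hypothetical name
  namespace: monitoring                  # assumed; must be a namespace the sidecar watches
  labels:
    grafana_dashboard: "1"               # label the Grafana sidecar selects on
data:
  caretta-pod-topology.json: |
    {
      "title": "K8s Pod Topology",
      "panels": [
        { "type": "nodeGraph", "title": "Pod connections" }
      ]
    }
```

The real dashboard JSON would replace the stub, with Node Graph queries built on `caretta_links_observed`.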