[ci skip] add network visualization implementation plan

This commit is contained in:
Viktor Barzin 2026-02-28 18:19:36 +00:00
parent 887075189a
commit 052662540b

View file

@ -0,0 +1,445 @@
# Network Traffic Visualization Implementation Plan
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
**Goal:** Deploy Caretta (pod-to-pod eBPF topology) and GoFlow2 + pfSense softflowd (full network NetFlow) with Grafana dashboards for real-time network visualization.
**Architecture:** Two data paths feed into existing Prometheus+Grafana: (1) Caretta eBPF DaemonSet tracks pod TCP connections, (2) pfSense exports NetFlow to GoFlow2 collector pod. Both expose Prometheus metrics scraped by existing Prometheus, visualized in Grafana Node Graph panels.
**Tech Stack:** Terraform/Terragrunt, Helm (Caretta), raw K8s resources (GoFlow2), pfSense SSH (softflowd), Prometheus, Grafana
**Design doc:** `docs/plans/2026-02-28-network-visualization-design.md`
---
### Task 1: Create Caretta Terraform stack
**Files:**
- Create: `stacks/caretta/terragrunt.hcl`
- Create: `stacks/caretta/main.tf`
**Step 1: Create the terragrunt.hcl**
```hcl
# stacks/caretta/terragrunt.hcl
include "root" {
path = find_in_parent_folders()
}
dependency "platform" {
config_path = "../platform"
skip_outputs = true
}
```
**Step 2: Create main.tf with Helm release**
```hcl
variable "tls_secret_name" { type = string }
resource "kubernetes_namespace" "caretta" {
metadata {
name = "caretta"
labels = {
tier = local.tiers.cluster
}
}
}
resource "helm_release" "caretta" {
namespace = kubernetes_namespace.caretta.metadata[0].name
name = "caretta"
repository = "https://helm.groundcover.com/"
chart = "caretta"
version = "0.0.16"
set {
name = "victoria-metrics-single.enabled"
value = "false"
}
set {
name = "grafana.enabled"
value = "false"
}
}
```
**Step 3: Create secrets symlink**
Run: `cd stacks/caretta && ln -s ../../secrets secrets`
**Step 4: Apply**
Run: `cd stacks/caretta && terragrunt apply --non-interactive`
**Step 5: Verify DaemonSet is running**
Run: `kubectl --kubeconfig $(pwd)/config get daemonset -n caretta`
Expected: Caretta DaemonSet with 5 pods (one per node)
**Step 6: Commit**
```bash
git add stacks/caretta/
git commit -m "[ci skip] deploy caretta eBPF pod topology visualization"
```
---
### Task 2: Add Caretta Grafana dashboard
**Files:**
- Modify: `stacks/caretta/main.tf`
**Step 1: Download dashboard JSON**
Run: `curl -sL https://raw.githubusercontent.com/groundcover-com/caretta/master/chart/dashboard.json > stacks/caretta/dashboard.json`
**Step 2: Add ConfigMap to main.tf**
Append to `stacks/caretta/main.tf`:
```hcl
resource "kubernetes_config_map" "caretta_dashboard" {
metadata {
name = "caretta-grafana-dashboard"
namespace = kubernetes_namespace.caretta.metadata[0].name
labels = {
grafana_dashboard = "1"
}
}
data = {
"caretta-dashboard.json" = file("${path.module}/dashboard.json")
}
}
```
**Step 3: Apply**
Run: `cd stacks/caretta && terragrunt apply --non-interactive`
**Step 4: Verify dashboard appears in Grafana**
Open `https://grafana.viktorbarzin.me` → Dashboards → search "Caretta"
Expected: Dashboard visible with Node Graph panel (may be empty until Prometheus scrape is configured)
**Step 5: Commit**
```bash
git add stacks/caretta/
git commit -m "[ci skip] add caretta grafana dashboard via sidecar configmap"
```
---
### Task 3: Create GoFlow2 Terraform stack
**Files:**
- Create: `stacks/goflow2/terragrunt.hcl`
- Create: `stacks/goflow2/main.tf`
**Step 1: Create the terragrunt.hcl**
```hcl
# stacks/goflow2/terragrunt.hcl
include "root" {
path = find_in_parent_folders()
}
dependency "platform" {
config_path = "../platform"
skip_outputs = true
}
```
**Step 2: Create main.tf with Deployment + Services**
```hcl
variable "tls_secret_name" { type = string }
resource "kubernetes_namespace" "goflow2" {
metadata {
name = "goflow2"
labels = {
tier = local.tiers.cluster
}
}
}
resource "kubernetes_deployment" "goflow2" {
metadata {
name = "goflow2"
namespace = kubernetes_namespace.goflow2.metadata[0].name
}
spec {
replicas = 1
selector {
match_labels = {
app = "goflow2"
}
}
template {
metadata {
labels = {
app = "goflow2"
}
}
spec {
container {
name = "goflow2"
image = "netsampler/goflow2:v2.2.1"
args = ["-listen", "netflow://:2055", "-transport", "stdout", "-format", "json"]
port {
name = "netflow"
container_port = 2055
protocol = "UDP"
}
port {
name = "metrics"
container_port = 8080
protocol = "TCP"
}
resources {
requests = {
cpu = "50m"
memory = "64Mi"
}
limits = {
cpu = "200m"
memory = "256Mi"
}
}
}
}
}
}
}
resource "kubernetes_service" "goflow2_metrics" {
metadata {
name = "goflow2"
namespace = kubernetes_namespace.goflow2.metadata[0].name
}
spec {
selector = {
app = "goflow2"
}
port {
name = "metrics"
port = 8080
target_port = 8080
protocol = "TCP"
}
}
}
resource "kubernetes_service" "goflow2_netflow" {
metadata {
name = "goflow2-netflow"
namespace = kubernetes_namespace.goflow2.metadata[0].name
}
spec {
type = "NodePort"
selector = {
app = "goflow2"
}
port {
name = "netflow"
port = 2055
target_port = 2055
protocol = "UDP"
node_port = 32055
}
}
}
```
**Step 3: Create secrets symlink**
Run: `cd stacks/goflow2 && ln -s ../../secrets secrets`
**Step 4: Apply**
Run: `cd stacks/goflow2 && terragrunt apply --non-interactive`
**Step 5: Verify pod is running**
Run: `kubectl --kubeconfig $(pwd)/config get pods -n goflow2`
Expected: 1 goflow2 pod running
**Step 6: Verify NodePort is accessible**
Run: `kubectl --kubeconfig $(pwd)/config get svc -n goflow2 goflow2-netflow`
Expected: NodePort 32055/UDP
**Step 7: Commit**
```bash
git add stacks/goflow2/
git commit -m "[ci skip] deploy goflow2 netflow collector for network visualization"
```
---
### Task 4: Add Prometheus scrape targets for Caretta and GoFlow2
**Files:**
- Modify: `stacks/platform/modules/monitoring/prometheus_chart_values.tpl` (append to extraScrapeConfigs)
**Step 1: Append scrape jobs**
Add at the end of `extraScrapeConfigs` (before the final blank line at line 882):
```yaml
- job_name: 'caretta'
static_configs:
- targets:
- "caretta-caretta.caretta.svc.cluster.local:7117"
metrics_path: '/metrics'
- job_name: 'goflow2'
static_configs:
- targets:
- "goflow2.goflow2.svc.cluster.local:8080"
metrics_path: '/metrics'
```
**Step 2: Apply platform stack**
Run: `cd stacks/platform && terragrunt apply --non-interactive`
**Step 3: Verify Prometheus targets**
Open `https://grafana.viktorbarzin.me` → Explore → Prometheus → query `up{job="caretta"}` and `up{job="goflow2"}`
Expected: Both return `1`
**Step 4: Verify Caretta metrics flowing**
Query: `caretta_links_observed`
Expected: Multiple time series with client_name/server_name labels showing pod connections
**Step 5: Commit**
```bash
git add stacks/platform/modules/monitoring/prometheus_chart_values.tpl
git commit -m "[ci skip] add caretta and goflow2 prometheus scrape targets"
```
---
### Task 5: Install and configure softflowd on pfSense
**Files:** None (SSH to pfSense)
**Step 1: SSH to pfSense and install softflowd**
Run: `ssh admin@10.0.20.1 "pkg install -y softflowd"`
If `softflowd` is available via pfSense package manager instead:
Run: `ssh admin@10.0.20.1 "pfSsh.php playback installpkg softflowd"`
**Step 2: Determine LAN interface name**
Run: `ssh admin@10.0.20.1 "ifconfig -l"`
Expected: Identify the LAN interface (likely `vtnet1` or `igb1`)
**Step 3: Configure softflowd**
Pick any K8s node IP (e.g., 10.0.20.100) with NodePort 32055:
Run:
```bash
ssh admin@10.0.20.1 "softflowd -i <LAN_INTERFACE> -n 10.0.20.100:32055 -v 9 -t maxlife=300"
```
Flags:
- `-i <interface>`: Monitor this interface
- `-n 10.0.20.100:32055`: Send NetFlow v9 to GoFlow2 NodePort
- `-v 9`: NetFlow version 9
- `-t maxlife=300`: Max flow lifetime 5 minutes
**Step 4: Verify flows are arriving at GoFlow2**
Run: `kubectl --kubeconfig $(pwd)/config logs -n goflow2 -l app=goflow2 --tail=20`
Expected: JSON flow records appearing in stdout
**Step 5: Make softflowd persistent**
Ensure softflowd starts on boot. On pfSense/FreeBSD:
Run: `ssh admin@10.0.20.1 'echo "softflowd_enable=\"YES\"" >> /etc/rc.conf && echo "softflowd_flags=\"-i <LAN_INTERFACE> -n 10.0.20.100:32055 -v 9\"" >> /etc/rc.conf'`
---
### Task 6: Add GoFlow2 Grafana dashboard
**Files:**
- Create: `stacks/goflow2/dashboard.json`
- Modify: `stacks/goflow2/main.tf`
**Step 1: Create a GoFlow2 dashboard JSON**
Create `stacks/goflow2/dashboard.json` — a Grafana dashboard with panels for:
- Top talkers by bytes (bar chart, query: `topk(10, sum by (src_addr, dst_addr) (rate(flow_bytes[5m])))`)
- Protocol breakdown (pie chart, query: `sum by (proto) (rate(flow_bytes[5m]))`)
- Flows over time (time series, query: `sum(rate(flow_packets[5m]))`)
Note: Exact metric names will depend on GoFlow2's Prometheus output — verify after Task 5 by querying `{job="goflow2"}` in Prometheus. Adjust dashboard queries to match actual metric names.
**Step 2: Add ConfigMap to main.tf**
Append to `stacks/goflow2/main.tf`:
```hcl
resource "kubernetes_config_map" "goflow2_dashboard" {
metadata {
name = "goflow2-grafana-dashboard"
namespace = kubernetes_namespace.goflow2.metadata[0].name
labels = {
grafana_dashboard = "1"
}
}
data = {
"goflow2-dashboard.json" = file("${path.module}/dashboard.json")
}
}
```
**Step 3: Apply**
Run: `cd stacks/goflow2 && terragrunt apply --non-interactive`
**Step 4: Verify in Grafana**
Open `https://grafana.viktorbarzin.me` → Dashboards → search "GoFlow2"
Expected: Dashboard with network flow data from pfSense
**Step 5: Commit**
```bash
git add stacks/goflow2/
git commit -m "[ci skip] add goflow2 grafana dashboard for network flow visualization"
```
---
### Task 7: End-to-end verification
**Step 1: Verify Caretta topology**
Open Grafana → Caretta Dashboard → Service Map panel
Expected: Node graph showing pods connected by edges with byte counts
**Step 2: Verify GoFlow2 flows**
Open Grafana → GoFlow2 Dashboard
Expected: Network flow data showing traffic between pfSense segments
**Step 3: Generate test traffic and confirm it appears**
Run: `kubectl --kubeconfig $(pwd)/config exec -n default deploy/some-pod -- curl -s https://example.com > /dev/null`
Expected: New edge appears in Caretta for the pod, new flow in GoFlow2 for the external connection
**Step 4: Push all changes**
Run: `git push origin master`