diff --git a/docs/plans/2026-02-28-network-visualization-plan.md b/docs/plans/2026-02-28-network-visualization-plan.md
new file mode 100644
index 00000000..1858c478
--- /dev/null
+++ b/docs/plans/2026-02-28-network-visualization-plan.md
@@ -0,0 +1,445 @@
+# Network Traffic Visualization Implementation Plan
+
+> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
+
+**Goal:** Deploy Caretta (pod-to-pod eBPF topology) and GoFlow2 + pfSense softflowd (full network NetFlow) with Grafana dashboards for real-time network visualization.
+
+**Architecture:** Two data paths feed into the existing Prometheus+Grafana stack: (1) the Caretta eBPF DaemonSet tracks pod TCP connections, (2) pfSense exports NetFlow to a GoFlow2 collector pod. Both expose Prometheus metrics that the existing Prometheus scrapes and Grafana visualizes in Node Graph panels.
+
+**Tech Stack:** Terraform/Terragrunt, Helm (Caretta), raw K8s resources (GoFlow2), pfSense SSH (softflowd), Prometheus, Grafana
+
+**Design doc:** `docs/plans/2026-02-28-network-visualization-design.md`
+
+---
+
+### Task 1: Create Caretta Terraform stack
+
+**Files:**
+- Create: `stacks/caretta/terragrunt.hcl`
+- Create: `stacks/caretta/main.tf`
+
+**Step 1: Create the terragrunt.hcl**
+
+```hcl
+# stacks/caretta/terragrunt.hcl
+include "root" {
+  path = find_in_parent_folders()
+}
+
+dependency "platform" {
+  config_path  = "../platform"
+  skip_outputs = true
+}
+```
+
+**Step 2: Create main.tf with Helm release**
+
+```hcl
+variable "tls_secret_name" { type = string }
+
+resource "kubernetes_namespace" "caretta" {
+  metadata {
+    name = "caretta"
+    labels = {
+      tier = local.tiers.cluster
+    }
+  }
+}
+
+resource "helm_release" "caretta" {
+  namespace  = kubernetes_namespace.caretta.metadata[0].name
+  name       = "caretta"
+  repository = "https://helm.groundcover.com/"
+  chart      = "caretta"
+  version    = "0.0.16"
+
+  set {
+    name  = "victoria-metrics-single.enabled"
+    value = "false"
+  }
+  set {
+    name  = "grafana.enabled"
+    value = "false"
+  }
+}
+```
+
+**Step 3: Create secrets symlink**
+
+Run: `cd stacks/caretta && ln -s ../../secrets secrets`
+
+**Step 4: Apply**
+
+Run: `cd stacks/caretta && terragrunt apply --non-interactive`
+
+**Step 5: Verify DaemonSet is running**
+
+Run: `kubectl --kubeconfig $(pwd)/config get daemonset -n caretta`
+Expected: Caretta DaemonSet with 5 pods (one per node)
+
+**Step 6: Commit**
+
+```bash
+git add stacks/caretta/
+git commit -m "[ci skip] deploy caretta eBPF pod topology visualization"
+```
+
+---
+
+### Task 2: Add Caretta Grafana dashboard
+
+**Files:**
+- Modify: `stacks/caretta/main.tf`
+
+**Step 1: Download dashboard JSON**
+
+Run: `curl -sL https://raw.githubusercontent.com/groundcover-com/caretta/master/chart/dashboard.json > stacks/caretta/dashboard.json`
+
+**Step 2: Add ConfigMap to main.tf**
+
+Append to `stacks/caretta/main.tf`:
+
+```hcl
+resource "kubernetes_config_map" "caretta_dashboard" {
+  metadata {
+    name      = "caretta-grafana-dashboard"
+    namespace = kubernetes_namespace.caretta.metadata[0].name
+    labels = {
+      grafana_dashboard = "1"
+    }
+  }
+  data = {
+    "caretta-dashboard.json" = file("${path.module}/dashboard.json")
+  }
+}
+```
+
+**Step 3: Apply**
+
+Run: `cd stacks/caretta && terragrunt apply --non-interactive`
+
+**Step 4: Verify dashboard appears in Grafana**
+
+Open `https://grafana.viktorbarzin.me` → Dashboards → search "Caretta"
+Expected: Dashboard visible with Node Graph panel (may be empty until the Prometheus scrape is configured in Task 4)
+
+**Step 5: Commit**
+
+```bash
+git add stacks/caretta/
+git commit -m "[ci skip] add caretta grafana dashboard via sidecar configmap"
+```
+
+---
+
+### Task 3: Create GoFlow2 Terraform stack
+
+**Files:**
+- Create: `stacks/goflow2/terragrunt.hcl`
+- Create: `stacks/goflow2/main.tf`
+
+**Step 1: Create the terragrunt.hcl**
+
+```hcl
+# stacks/goflow2/terragrunt.hcl
+include "root" {
+  path = find_in_parent_folders()
+}
+
+dependency "platform" {
+  config_path  = "../platform"
+  skip_outputs = true
+}
+```
+
+**Step 2: Create main.tf with Deployment + Services**
+
+```hcl
+variable "tls_secret_name" { type = string }
+
+resource "kubernetes_namespace" "goflow2" {
+  metadata {
+    name = "goflow2"
+    labels = {
+      tier = local.tiers.cluster
+    }
+  }
+}
+
+resource "kubernetes_deployment" "goflow2" {
+  metadata {
+    name      = "goflow2"
+    namespace = kubernetes_namespace.goflow2.metadata[0].name
+  }
+  spec {
+    replicas = 1
+    selector {
+      match_labels = {
+        app = "goflow2"
+      }
+    }
+    template {
+      metadata {
+        labels = {
+          app = "goflow2"
+        }
+      }
+      spec {
+        container {
+          name  = "goflow2"
+          image = "netsampler/goflow2:v2.2.1"
+          args  = ["-listen", "netflow://:2055", "-transport", "file", "-format", "json"] # "file" transport with no -transport.file path writes to stdout
+
+          port {
+            name           = "netflow"
+            container_port = 2055
+            protocol       = "UDP"
+          }
+          port {
+            name           = "metrics"
+            container_port = 8080
+            protocol       = "TCP"
+          }
+
+          resources {
+            requests = {
+              cpu    = "50m"
+              memory = "64Mi"
+            }
+            limits = {
+              cpu    = "200m"
+              memory = "256Mi"
+            }
+          }
+        }
+      }
+    }
+  }
+}
+
+resource "kubernetes_service" "goflow2_metrics" {
+  metadata {
+    name      = "goflow2"
+    namespace = kubernetes_namespace.goflow2.metadata[0].name
+  }
+  spec {
+    selector = {
+      app = "goflow2"
+    }
+    port {
+      name        = "metrics"
+      port        = 8080
+      target_port = 8080
+      protocol    = "TCP"
+    }
+  }
+}
+
+resource "kubernetes_service" "goflow2_netflow" {
+  metadata {
+    name      = "goflow2-netflow"
+    namespace = kubernetes_namespace.goflow2.metadata[0].name
+  }
+  spec {
+    type = "NodePort"
+    selector = {
+      app = "goflow2"
+    }
+    port {
+      name        = "netflow"
+      port        = 2055
+      target_port = 2055
+      protocol    = "UDP"
+      node_port   = 32055
+    }
+  }
+}
+```
+
+**Step 3: Create secrets symlink**
+
+Run: `cd stacks/goflow2 && ln -s ../../secrets secrets`
+
+**Step 4: Apply**
+
+Run: `cd stacks/goflow2 && terragrunt apply --non-interactive`
+
+**Step 5: Verify pod is running**
+
+Run: `kubectl --kubeconfig $(pwd)/config get pods -n goflow2`
+Expected: 1 goflow2 pod running
+
+**Step 6: Verify NodePort is accessible**
+
+Run: `kubectl --kubeconfig $(pwd)/config get svc -n goflow2 goflow2-netflow`
+Expected: NodePort 32055/UDP
+
+**Step 7: Commit**
+
+```bash
+git add stacks/goflow2/
+git commit -m "[ci skip] deploy goflow2 netflow collector for network visualization"
+```
+
+---
+
+### Task 4: Add Prometheus scrape targets for Caretta and GoFlow2
+
+**Files:**
+- Modify: `stacks/platform/modules/monitoring/prometheus_chart_values.tpl` (append to extraScrapeConfigs)
+
+**Step 1: Append scrape jobs**
+
+Add at the end of `extraScrapeConfigs` (before the final blank line at line 882):
+
+```yaml
+  - job_name: 'caretta'
+    static_configs:
+      - targets:
+          - "caretta-caretta.caretta.svc.cluster.local:7117"
+    metrics_path: '/metrics'
+  - job_name: 'goflow2'
+    static_configs:
+      - targets:
+          - "goflow2.goflow2.svc.cluster.local:8080"
+    metrics_path: '/metrics'
+```
+
+**Step 2: Apply platform stack**
+
+Run: `cd stacks/platform && terragrunt apply --non-interactive`
+
+**Step 3: Verify Prometheus targets**
+
+Open `https://grafana.viktorbarzin.me` → Explore → Prometheus → query `up{job="caretta"}` and `up{job="goflow2"}`
+Expected: Both return `1`
+
+**Step 4: Verify Caretta metrics flowing**
+
+Query: `caretta_links_observed`
+Expected: Multiple time series with client_name/server_name labels showing pod connections
+
+**Step 5: Commit**
+
+```bash
+git add stacks/platform/modules/monitoring/prometheus_chart_values.tpl
+git commit -m "[ci skip] add caretta and goflow2 prometheus scrape targets"
+```
+
+---
+
+### Task 5: Install and configure softflowd on pfSense
+
+**Files:** None (SSH to pfSense)
+
+**Step 1: SSH to pfSense and install softflowd**
+
+Run: `ssh admin@10.0.20.1 "pkg install -y softflowd"`
+
+If `softflowd` is provided through the pfSense package manager instead:
+Run: `ssh admin@10.0.20.1 "pfSsh.php playback installpkg softflowd"`
+
+**Step 2: Determine LAN interface name**
+
+Run: `ssh admin@10.0.20.1 "ifconfig -l"`
+Expected: Identify the LAN interface (likely `vtnet1` or `igb1`)
+
+**Step 3: Configure softflowd**
+
+Pick any K8s node IP (e.g., 10.0.20.100) and target NodePort 32055. Replace `<lan_interface>` with the interface name found in Step 2:
+
+Run:
+```bash
+ssh admin@10.0.20.1 "softflowd -i <lan_interface> -n 10.0.20.100:32055 -v 9 -t maxlife=300"
+```
+
+Flags:
+- `-i <lan_interface>`: Monitor this interface
+- `-n 10.0.20.100:32055`: Send NetFlow v9 to GoFlow2 NodePort
+- `-v 9`: NetFlow version 9
+- `-t maxlife=300`: Max flow lifetime 5 minutes
+
+**Step 4: Verify flows are arriving at GoFlow2**
+
+Run: `kubectl --kubeconfig $(pwd)/config logs -n goflow2 -l app=goflow2 --tail=20`
+Expected: JSON flow records appearing in stdout
+
+**Step 5: Make softflowd persistent**
+
+Ensure softflowd starts on boot. On pfSense/FreeBSD:
+Run: `ssh admin@10.0.20.1 'echo "softflowd_enable=\"YES\"" >> /etc/rc.conf && echo "softflowd_flags=\"-i <lan_interface> -n 10.0.20.100:32055 -v 9\"" >> /etc/rc.conf'`
+
+---
+
+### Task 6: Add GoFlow2 Grafana dashboard
+
+**Files:**
+- Create: `stacks/goflow2/dashboard.json`
+- Modify: `stacks/goflow2/main.tf`
+
+**Step 1: Create a GoFlow2 dashboard JSON**
+
+Create `stacks/goflow2/dashboard.json`, a Grafana dashboard with panels for:
+- Top talkers by bytes (bar chart, query: `topk(10, sum by (src_addr, dst_addr) (rate(flow_bytes[5m])))`)
+- Protocol breakdown (pie chart, query: `sum by (proto) (rate(flow_bytes[5m]))`)
+- Flows over time (time series, query: `sum(rate(flow_packets[5m]))`)
+
+Note: Exact metric names depend on GoFlow2's Prometheus output. Verify them after Task 5 by querying `{job="goflow2"}` in Prometheus, then adjust the dashboard queries to match the actual metric names.
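
One way to do the metric-name check from the note above without opening Grafana is to read the collector's `/metrics` output directly. The sketch below is a hypothetical helper, not part of the plan: `list_metric_names` is an illustrative name, and it assumes the endpoint serves the standard Prometheus text exposition format on the port 8080 declared in Task 3.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read Prometheus text-format metrics on stdin and
# print each distinct metric name once, sorted.
list_metric_names() {
  # Drop comment lines (# HELP / # TYPE) and blank lines, then cut each
  # sample line at the first "{" or whitespace so only the name remains.
  grep -v -e '^#' -e '^[[:space:]]*$' \
    | sed -E 's/[{[:space:]].*$//' \
    | sort -u
}

# Example usage (assumes the goflow2 Service from Task 3 and a local port-forward):
#   kubectl --kubeconfig $(pwd)/config port-forward -n goflow2 svc/goflow2 8080:8080 &
#   curl -s http://localhost:8080/metrics | list_metric_names
```

Comparing this list against the `flow_bytes` and `flow_packets` names used in the panel queries shows quickly whether the dashboard needs adjusting before it is loaded.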
+
+**Step 2: Add ConfigMap to main.tf**
+
+Append to `stacks/goflow2/main.tf`:
+
+```hcl
+resource "kubernetes_config_map" "goflow2_dashboard" {
+  metadata {
+    name      = "goflow2-grafana-dashboard"
+    namespace = kubernetes_namespace.goflow2.metadata[0].name
+    labels = {
+      grafana_dashboard = "1"
+    }
+  }
+  data = {
+    "goflow2-dashboard.json" = file("${path.module}/dashboard.json")
+  }
+}
+```
+
+**Step 3: Apply**
+
+Run: `cd stacks/goflow2 && terragrunt apply --non-interactive`
+
+**Step 4: Verify in Grafana**
+
+Open `https://grafana.viktorbarzin.me` → Dashboards → search "GoFlow2"
+Expected: Dashboard with network flow data from pfSense
+
+**Step 5: Commit**
+
+```bash
+git add stacks/goflow2/
+git commit -m "[ci skip] add goflow2 grafana dashboard for network flow visualization"
+```
+
+---
+
+### Task 7: End-to-end verification
+
+**Step 1: Verify Caretta topology**
+
+Open Grafana → Caretta Dashboard → Service Map panel
+Expected: Node graph showing pods connected by edges with byte counts
+
+**Step 2: Verify GoFlow2 flows**
+
+Open Grafana → GoFlow2 Dashboard
+Expected: Network flow data showing traffic between pfSense segments
+
+**Step 3: Generate test traffic and confirm it appears**
+
+Run: `kubectl --kubeconfig $(pwd)/config exec -n default deploy/some-pod -- curl -s https://example.com > /dev/null`
+Expected: New edge appears in Caretta for the pod, new flow in GoFlow2 for the external connection
+
+**Step 4: Push all changes**
+
+Run: `git push origin master`