First analysis pass over Calico GNP wave1-egress-observe-tier34 data captured
in Loki since 2026-05-19. Pulled ~10000 flow log lines covering 36 source
namespaces (of 82 selected by tier 3+4). Analysis script outputs preserved
on the dev host at /tmp/{analyze_flows2,build_allowlist}.py.
## Findings
**Universal baseline (every observed ns):**
- DNS to kube-system/kube-dns UDP/53
- Often mysql.dbaas TCP/3306 or pg.dbaas TCP/5432
- Often redis.redis TCP/6379
**Rollout tiering by egress fan-out:**
- Tier A (recruiter-responder only): 2 destinations, ideal pilot
- Tier B (29 namespaces): ≤3 external IPs, ≤5 internal — batch rollout
- Tier C (4 namespaces: f1-stream/openclaw/woodpecker/status-page):
needs per-IP investigation
- Tier D (servarr): 130+ external IPs (BitTorrent P2P) — keep Log+Allow
permanently or move to dedicated egress proxy
## Caveats blocking immediate enforce
- Observation horizon too short: ~6h dense data, ~24h total. Need ≥7 days
to catch weekly CronJobs, Vault token rotations, Keel pulls.
- External IPs are dynamic (Cloudflare/AWS rotate). Static IP allowlists
will break — need DNS-based selectors or CIDR ranges.
- Some intra-namespace traffic bypasses the Calico filter chain.
## Recommended next steps
1. Continue observation through 2026-05-29 (full week). Compare destination
set day-over-day; if stable, allowlist is ready.
2. First enforce: recruiter-responder (allowlist = kube-dns + telegram CIDR
+ vault/ESO service IPs).
3. Tier B phased rollout at 3-5 ns/day after pilot proves out.
Full analysis: docs/architecture/wave1-egress-observation-2026-05-22.md
Tracked under beads code-8ywc.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
||
|---|---|---|
| .. | ||
| architecture | ||
| benchmarks | ||
| plans | ||
| post-mortems | ||
| runbooks | ||
| known-issues.md | ||
| README.md | ||
Infrastructure Documentation
This repository contains the configuration and documentation for a homelab Kubernetes cluster running on Proxmox. The infrastructure hosts 70+ services managed declaratively with Terraform and Terragrunt.
Quick Reference
Network Ranges
- Physical Network:
192.168.1.0/24- Physical devices and host network - Management VLAN 10:
10.0.10.0/24- Infrastructure VMs and management - Kubernetes VLAN 20:
10.0.20.0/24- Kubernetes cluster network
Key URLs
- Public:
viktorbarzin.me - Internal:
viktorbarzin.lan
Architecture Documentation
| Document | Description |
|---|---|
| Overview | Infrastructure overview, hardware specs, VM inventory, and service catalog |
| Networking | Network topology, VLANs, routing, and firewall rules |
| VPN | Headscale mesh VPN and Cloudflare Tunnel configuration |
| Storage | Proxmox host NFS, Proxmox CSI (LVM-thin + LUKS2), and persistent volume management |
| Authentication | Authentik SSO, OIDC flows, and service integration |
| Security | CrowdSec IPS, Kyverno policies, and security controls |
| Monitoring | Prometheus, Grafana, Loki, and observability stack |
| Secrets Management | HashiCorp Vault integration and secret rotation |
| CI/CD | Woodpecker CI pipeline and deployment automation |
| Backup & DR | Backup strategy, disaster recovery, and restore procedures |
| Compute | Proxmox VMs, GPU passthrough, K8s resource management, and VPA |
| Databases | PostgreSQL, MySQL, Redis, and database operators |
| Multi-tenancy | Namespace isolation, tier system, and resource quotas |
Operations
- Runbooks - Step-by-step operational procedures
- Plans - Infrastructure change plans and rollout strategies
Getting Started
- Review the Overview for a high-level understanding
- Read the Networking doc to understand connectivity
- Check Compute for resource management patterns
- Explore individual architecture docs based on your area of interest