kms: dedicate MetalLB IP 10.0.20.202 + filter probe noise

Two coupled fixes for the hourly Slack noise + missing client IPs:

1. Move windows-kms off the shared 10.0.20.200 to a dedicated MetalLB IP,
   10.0.20.202, with externalTrafficPolicy=Local so that vlmcsd sees real
   WAN client IPs (pfSense WAN forwards are DNAT-only, and ETP=Local skips
   the kube-proxy SNAT). This is the same pattern mailserver used
   pre-2026-04-19. Staying on 10.0.20.200 is not possible: all 10
   services there are ETP=Cluster, and MetalLB requires a consistent ETP
   per shared IP.

2. The Slack notifier now suppresses posts for bare TCP open/close
   pairs (no Application/Activation block). These come from Uptime
   Kuma's port monitor and the new kubelet readiness/liveness probes.
   Probes are counted in a new metric
   kms_connection_probes_total{source}, where source classifies the
   client IP as internal_pod / cluster_node / external. Real
   activations are unaffected.
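
The suppression logic in item 2 can be sketched roughly as follows. This
is a minimal illustration, not the notifier's actual code: classify_source
and is_probe are named in this commit, but their signatures, the two CIDR
constants, the probe_counts counter, and the record_session helper are all
assumptions made for the example.

```python
import ipaddress
from collections import Counter

# Assumed ranges for illustration only; the real CIDRs are not in this commit.
POD_CIDR = ipaddress.ip_network("10.244.0.0/16")  # hypothetical pod network
NODE_CIDR = ipaddress.ip_network("10.0.20.0/24")  # hypothetical node subnet

# Stand-in for the kms_connection_probes_total{source} metric.
probe_counts = Counter()


def classify_source(ip):
    """Map a client IP to one of the three `source` label values."""
    addr = ipaddress.ip_address(ip)
    if addr in POD_CIDR:
        return "internal_pod"
    if addr in NODE_CIDR:
        return "cluster_node"
    return "external"


def is_probe(session_lines):
    """True when a session has no Application/Activation detail,
    i.e. it is a bare TCP open/close pair."""
    return not any(
        "Application" in line or "Activation" in line
        for line in session_lines
    )


def record_session(ip, session_lines):
    """Hypothetical helper: count probes, return True if Slack should be notified."""
    if is_probe(session_lines):
        probe_counts[classify_source(ip)] += 1
        return False
    return True
```

Counting the suppressed sessions instead of dropping them keeps the probe
traffic observable without sending it to Slack.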

Pod readiness: added TCP readiness/liveness probes on 1688 so that Pod
Ready reflects the listener actually being up. This is required for
ETP=Local, so that MetalLB only advertises 10.0.20.202 from a node
where vlmcsd is serving.
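
For reference, a kubelet tcp_socket probe amounts to a plain TCP connect
that is closed again straight away, which is why it shows up in the
vlmcsd log as a bare open/close pair. A rough Python equivalent, useful
for checking the listener by hand (the function name and defaults are
illustrative, not part of this change):

```python
import socket


def tcp_probe(host, port=1688, timeout=1.0):
    """Return True if a TCP connection to host:port can be established.

    Like a tcp_socket probe, this sends no payload: it connects and
    immediately closes the socket.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False
```

Calling tcp_probe("10.0.20.202") from a LAN host would exercise the same
path as an external client, assuming the host can reach the MetalLB IP.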

pfSense side (applied separately, not codified):
- New alias k8s_kms_lb = 10.0.20.202 (KMS-only)
- WAN:1688 NAT + filter rule retargeted from k8s_shared_lb to k8s_kms_lb
- All other forwards on k8s_shared_lb (WireGuard, HTTPS, shadowsocks,
  smtps, etc.) untouched

Runbook updated. Tests added for classify_source / is_probe / process_line.
Viktor Barzin 2026-05-10 13:02:58 +00:00
parent 295cfd776c
commit 9d4e2fc4a0
5 changed files with 317 additions and 56 deletions


@@ -228,6 +228,23 @@ resource "kubernetes_deployment" "windows_kms" {
 port {
   container_port = 1688
 }
+# Gate Pod Ready on the listener actually being up. Required for
+# ETP=Local: MetalLB only advertises 10.0.20.202 from a node where
+# the backing pod is Ready, so without this the pod is "Ready"
+# before vlmcsd has bound 1688 and ARP can briefly point at a node
+# that drops connections during pod start.
+readiness_probe {
+  tcp_socket { port = 1688 }
+  initial_delay_seconds = 1
+  period_seconds = 5
+  failure_threshold = 3
+}
+liveness_probe {
+  tcp_socket { port = 1688 }
+  initial_delay_seconds = 5
+  period_seconds = 30
+  failure_threshold = 3
+}
 volume_mount {
   name = "vlmcsd-log"
   mount_path = "/var/log/vlmcsd"
@@ -300,14 +317,17 @@ resource "kubernetes_service" "windows_kms" {
 app = "kms-service"
 }
 annotations = {
-  "metallb.io/loadBalancerIPs" = "10.0.20.200"
-  "metallb.io/allow-shared-ip" = "shared"
+  # Dedicated MetalLB IP (not shared) so ETP=Local can preserve real
+  # client IPs in the vlmcsd log. Sharing 10.0.20.200 isn't an option:
+  # all 10 services there are ETP=Cluster and MetalLB requires a single
+  # ETP per shared IP.
+  "metallb.io/loadBalancerIPs" = "10.0.20.202"
 }
 }
 spec {
   type = "LoadBalancer"
-  external_traffic_policy = "Cluster"
+  external_traffic_policy = "Local"
   selector = {
     app = "kms-service"
   }