kms: deploy slack-notifier sidecar with Prometheus metrics + document public exposure

Slack notifier now also exposes /metrics on :9101 with stdlib HTTP — counts
activations and dedup-skips by product, gauges last-activation timestamp.
Pod template gets the standard prometheus.io/scrape annotations so the
cluster-wide kubernetes-pods job picks it up via pod IP. Memory request
bumped to 48Mi to cover counter dicts + HTTPServer.

Plus docs: networking.md footnotes the windows-kms row noting public WAN
exposure with the rate-limited (max-src-conn 50, max-src-conn-rate 10/60,
overload <virusprot> flush) pfSense filter rule, and a new runbook covers
log locations, rate-limit tuning, and how to revoke the WAN forward.

The matching pfSense rule was tightened in place (TCP-only + rate limits)
via SSH; pfSense isn't Terraform-managed.
Viktor Barzin 2026-05-09 22:12:46 +00:00
parent 82dc0f9687
commit 572d6cd8e0
4 changed files with 456 additions and 3 deletions


@@ -261,7 +261,7 @@ MetalLB v0.15.3 allocates IPs from the range 10.0.20.200-10.0.20.220 in **Layer
| traefik | traefik | 10.0.20.200 (shared) | 80, 443, 443/UDP (HTTP/3), 10200, 10300, 11434/TCP |
| coturn | coturn | 10.0.20.200 (shared) | 3478/UDP (STUN/TURN), 49152-49252/UDP (relay) |
| headscale | headscale | 10.0.20.200 (shared) | 41641/UDP, 3479/UDP |
| windows-kms | kms | 10.0.20.200 (shared) | 1688/TCP |
| windows-kms¹ | kms | 10.0.20.200 (shared) | 1688/TCP |
| qbittorrent | servarr | 10.0.20.200 (shared) | 50000/TCP+UDP |
| shadowsocks | shadowsocks | 10.0.20.200 (shared) | 8388/TCP+UDP |
| torrserver-bt | tor-proxy | 10.0.20.200 (shared) | 5665/TCP |
@@ -272,6 +272,8 @@ MetalLB v0.15.3 allocates IPs from the range 10.0.20.200-10.0.20.220 in **Layer
pfSense aliases reference these IPs: `k8s_shared_lb` (10.0.20.200), `technitium_dns` (10.0.20.201). NAT rules use aliases for maintainability.
¹ **windows-kms is publicly WAN-exposed.** pfSense forwards WAN TCP/1688 → `k8s_shared_lb:1688` so any internet host can activate. The matching filter rule applies a per-source rate limit (`max-src-conn 50`, `max-src-conn-rate 10/60`) with `overload <virusprot>` flush — offenders are auto-added to pfSense's stock `virusprot` pf table for follow-on blocks. Operations (rate-limit tuning, log locations, revocation) are documented in `docs/runbooks/kms-public-exposure.md`.
Critical services are scaled to **3 replicas**:
- Traefik (PDB: minAvailable=2)
- Authentik (PDB: minAvailable=2)


@@ -0,0 +1,115 @@
# Runbook: KMS public exposure (kms.viktorbarzin.me:1688)
`kms.viktorbarzin.me:1688/TCP` is intentionally open to the internet so any
visitor can activate volume-licensed Microsoft products. The webpage at
`https://kms.viktorbarzin.me/` documents how to use it.

This runbook covers operations on the public exposure: where to find logs,
how to tune the rate limit, and how to revoke the exposure if it is abused.
## Architecture
- **K8s service**: `windows-kms` in namespace `kms`, MetalLB shared LB IP
`10.0.20.200:1688`. ETP=Cluster, so client IPs in vlmcsd logs are SNAT'd
k8s node IPs (not real-world client IPs). Trade-off accepted —
preserving real client IPs would require a dedicated MetalLB IP with
ETP=Local or a PROXY-protocol bounce; vlmcsd doesn't speak PROXY-v2.
- **pfSense WAN forward**: `WAN TCP/1688 → k8s_shared_lb:1688`
(alias = `10.0.20.200`). Description: `KMS public — kms.viktorbarzin.me`.
- **Filter rule** on the WAN interface, TCP/1688, with state-table
  per-source caps:
  - `max-src-conn 50`: concurrent connections per source IP
  - `max-src-conn-rate 10/60`: 10 new connections per 60 seconds per
    source
  - `overload <virusprot>` flush: sources that exceed either cap get added
    to pfSense's stock `virusprot` pf table and have their existing states
    flushed. (`virusprot` is the only table pfSense's filter generator
    targets for `overload`; see `/etc/inc/filter.inc`. Don't try to point
    it at a custom table; the schema doesn't expose that knob.)
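
To build intuition for how the `10/60` cap and the overload table interact, here is a rough Python model of per-source rate limiting. It only illustrates the semantics described above and is not pf's implementation: the class `SourceRateLimiter` and its methods are hypothetical, the exact kernel accounting may differ, and the model ignores the concurrent-connection cap (`max-src-conn`).

```python
from collections import defaultdict, deque


class SourceRateLimiter:
    """Toy model of `max-src-conn-rate N/T` plus an overload table.

    Hypothetical helper for illustration only; pf does this in the kernel.
    """

    def __init__(self, max_new: int = 10, window_s: float = 60.0) -> None:
        self.max_new = max_new
        self.window_s = window_s
        self.recent = defaultdict(deque)  # source IP -> timestamps of new connections
        self.overload = set()             # stands in for the virusprot table

    def allow(self, src: str, now: float) -> bool:
        """Return True if a new connection from `src` at time `now` is accepted."""
        if src in self.overload:
            return False
        stamps = self.recent[src]
        # Forget connection attempts that have aged out of the window.
        while stamps and now - stamps[0] >= self.window_s:
            stamps.popleft()
        if len(stamps) >= self.max_new:
            # Exceeding the cap moves the source into the overload table.
            self.overload.add(src)
            stamps.clear()
            return False
        stamps.append(now)
        return True


limiter = SourceRateLimiter(max_new=10, window_s=60.0)
# Twelve attempts, one per second, from a documentation-range address.
results = [limiter.allow("203.0.113.7", float(t)) for t in range(12)]
print(results.count(True), "accepted; blocked:", "203.0.113.7" in limiter.overload)
```

In this model the first ten attempts pass, the eleventh trips the cap, and every later attempt is rejected until the source is removed from the overload set, which mirrors why a manual `virusprot` flush is sometimes needed after tuning.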
## Where the logs are
### vlmcsd (kms namespace, k8s)
```bash
# Live tail
kubectl logs -n kms -l app=kms-service -c windows-kms --tail=50 -f
# All activations in the running pod
kubectl logs -n kms -l app=kms-service -c windows-kms | grep "Incoming KMS request"
```
Source IPs in this log are the SNAT'd node IPs because the LB Service uses
ETP=Cluster on a shared MetalLB IP. Don't expect real WAN client IPs here.
### Slack notifier (kms namespace, k8s)
```bash
kubectl logs -n kms -l app=kms-service -c slack-notifier --tail=50 -f
```
Posts to `#alerts`, dedup window 1h per (source-IP, product). Activations
also increment the Prometheus counter `kms_activations_total{product,status}`
exposed on the same pod at `:9101/metrics` (scraped by the cluster-wide
`kubernetes-pods` job; query via Prometheus or Grafana directly).
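
For a quick look without Grafana, the per-product totals can be derived from the exposition text itself. The sketch below parses an inline sample in the format the notifier emits; the product names, status strings and counts in `SAMPLE` are invented, and the helper name `activations_by_product` is an assumption for illustration. A real check would fetch `:9101/metrics` from the pod instead of using a string literal.

```python
import re
from collections import Counter

# Hypothetical sample of the notifier's /metrics output (all values invented).
SAMPLE = '''\
# HELP kms_activations_total KMS activation events that resulted in a Slack post.
# TYPE kms_activations_total counter
kms_activations_total{product="Windows 10 Professional",status="Unlicensed"} 3
kms_activations_total{product="Windows 10 Professional",status="unknown"} 1
kms_activations_total{product="Office Professional Plus 2019",status="unknown"} 2
kms_slack_notifier_up 1
'''

# Simplification: label values containing escaped quotes are not handled.
LINE_RE = re.compile(r'^kms_activations_total\{product="([^"]*)",status="([^"]*)"\} (\d+)$')


def activations_by_product(text: str) -> Counter:
    """Sum kms_activations_total across the status label, per product."""
    totals = Counter()
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            product, _status, count = match.groups()
            totals[product] += int(count)
    return totals


totals = activations_by_product(SAMPLE)
for product, count in sorted(totals.items()):
    print(f"{product}: {count}")
```

In Prometheus the same aggregation would normally be a one-line query such as `sum by (product) (kms_activations_total)`.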
### pfSense — virusprot table and filter hits
```bash
# SSH to 10.0.20.1 as root
pfctl -t virusprot -T show # who's currently in the virusprot table
pfctl -t virusprot -T expire 86400 # boot anyone added more than 24h ago
pfctl -t virusprot -T flush # nuke the entire table
# Filter rule hit counts (find the KMS public rule, look at Evaluations / States)
pfctl -sr -v | grep -A 4 1688
# State table — current TCP/1688 connections, per source
pfctl -ss | grep ':1688 '
```
## Tightening or loosening the rate limit
The filter rule is configured via the pfSense web UI
(`Firewall → Rules → WAN`, look for the `KMS public — kms.viktorbarzin.me`
rule) under **Advanced Options → "Maximum new connections per source per
seconds"** and **"Maximum state entries per source"**.
- **Default**: `max-src-conn 50`, `max-src-conn-rate 10/60`
- To **tighten** (suspected abuse): drop to `max-src-conn 10`,
  `max-src-conn-rate 3/60`, then save and apply the rule. pfSense reloads
  pf on apply and sources already in `virusprot` stay blocked, so killing
  every state with `pfctl -k 0.0.0.0/0 -K 0.0.0.0/0` is overkill.
- To **loosen** (legitimate users blocked): bump to
  `max-src-conn-rate 30/60`. Sources that exceed the new cap are still
  added to `virusprot` and have their states flushed; reduce the entry
  lifetime via `Firewall → Advanced → State Timeouts` if entries linger.

Entries in the `virusprot` overload table survive pf reloads. Running
`pfctl -t virusprot -T flush` after a tuning change clears the slate.
## Revoking the public exposure
If the activation surface needs to come down (abuse, legal, audit):
1. **pfSense web UI** → `Firewall → NAT → Port Forward` → find
   `WAN TCP/1688 → k8s_shared_lb` → **delete** (or disable). Apply.
2. **pfSense web UI** → `Firewall → Rules → WAN` → find
   `KMS public — kms.viktorbarzin.me` → **delete** (or disable). Apply.
3. Verify externally: from a phone tether, `nc -zw3 kms.viktorbarzin.me 1688`
   should now fail.

The k8s service stays reachable on the LAN
(`10.0.20.200:1688` and the internal `kms.viktorbarzin.lan` ingress for
the webpage); only the WAN port-forward is removed.

To put it back, recreate the NAT rule (target alias `k8s_shared_lb`,
port `1688`) and the filter rule with the same per-source caps.
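
The external check in step 3 can also be scripted where `nc` is not available. This is a minimal Python sketch of a TCP reachability probe, roughly equivalent to `nc -zw3`; the function name `tcp_port_open` is an assumption for illustration, and the probe only says something about the WAN forward when it runs from outside the LAN.

```python
import socket


def tcp_port_open(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP handshake to host:port completes within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False
```

After revocation, `tcp_port_open("kms.viktorbarzin.me", 1688)` called from an external network should return `False`; from the LAN, `tcp_port_open("10.0.20.200", 1688)` should still return `True`.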
## Related
- Stack: `stacks/kms/` (Terraform; deployment, MetalLB Service, ingress,
ExternalSecret for the Slack webhook)
- Webpage source: `kms-website/` repo (Hugo + nginx, deployed via Drone CI)
- Networking architecture footnote:
`docs/architecture/networking.md` § "MetalLB & Load Balancing"


@@ -0,0 +1,222 @@
#!/usr/bin/env python3
"""
Tail vlmcsd verbose log; post a Slack message per activation, and expose
Prometheus metrics on /metrics for activation counts.

vlmcsd verbose output emits a multi-line block per request:

    <ts>: IPv4 connection accepted: <ip>:<port>.
    <ts>: <<< Incoming KMS request
    <ts>: Application ID : <uuid> (<name>)
    <ts>: Activation ID (Product): <uuid> (<product>)
    <ts>: Workstation name : <hostname>
    ...
    <ts>: IPv4 connection closed: <ip>:<port>.

We accumulate per-connection state and emit on close. Dedupes by
(client_ip, product) within DEDUP_WINDOW_SECONDS to avoid spam from
Windows' default 7-day re-activation cycle hitting us repeatedly.

Prometheus metrics (text format, no client_ip label cardinality):

    kms_activations_total{product, status}          counter
    kms_activations_dedup_skipped_total{product}    counter
    kms_last_activation_timestamp_seconds           gauge
    kms_slack_notifier_up                           gauge (heartbeat)
"""
import json
import os
import re
import sys
import threading
import time
import urllib.error
import urllib.request
from collections import OrderedDict
from http.server import BaseHTTPRequestHandler, HTTPServer
LOG_PATH = os.environ.get("VLMCSD_LOG", "/var/log/vlmcsd/vlmcsd.log")
WEBHOOK = os.environ["SLACK_WEBHOOK_URL"]
CHANNEL = os.environ.get("SLACK_CHANNEL", "#alerts")
DEDUP_WINDOW = int(os.environ.get("DEDUP_WINDOW_SECONDS", "3600"))
DEDUP_MAX = 4096
METRICS_PORT = int(os.environ.get("METRICS_PORT", "9101"))
OPEN_RE = re.compile(r":\s*IPv[46] connection accepted:\s*([0-9a-f.:\[\]]+):\d+")
CLOSE_RE = re.compile(r":\s*IPv[46] connection closed:\s*([0-9a-f.:\[\]]+):\d+")
APP_RE = re.compile(r":\s*Application ID\s*:\s*[0-9a-f-]+\s*\(([^)]+)\)")
PROD_RE = re.compile(r":\s*Activation ID \(Product\)\s*:\s*[0-9a-f-]+\s*\(([^)]+)\)")
HOST_RE = re.compile(r":\s*Workstation name\s*:\s*(.+?)\s*$")
STATUS_RE = re.compile(r":\s*Licensing status\s*:\s*\d+\s*\((.+?)\)\s*$")
_metrics_lock = threading.Lock()
_activations: dict = {}
_dedup_skipped: dict = {}
_last_activation_ts: float = 0.0
def _esc(value: str) -> str:
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def record_activation(product: str, status: str) -> None:
    global _last_activation_ts
    with _metrics_lock:
        key = (product, status)
        _activations[key] = _activations.get(key, 0) + 1
        _last_activation_ts = time.time()


def record_dedup_skip(product: str) -> None:
    with _metrics_lock:
        _dedup_skipped[product] = _dedup_skipped.get(product, 0) + 1


def render_metrics() -> bytes:
    out = []
    with _metrics_lock:
        activations = dict(_activations)
        dedup_skipped = dict(_dedup_skipped)
        last_ts = _last_activation_ts
    out.append("# HELP kms_activations_total KMS activation events that resulted in a Slack post.")
    out.append("# TYPE kms_activations_total counter")
    for (product, status), count in sorted(activations.items()):
        out.append(
            f'kms_activations_total{{product="{_esc(product)}",status="{_esc(status)}"}} {count}'
        )
    out.append("# HELP kms_activations_dedup_skipped_total KMS activation events suppressed by dedup window.")
    out.append("# TYPE kms_activations_dedup_skipped_total counter")
    for product, count in sorted(dedup_skipped.items()):
        out.append(f'kms_activations_dedup_skipped_total{{product="{_esc(product)}"}} {count}')
    out.append("# HELP kms_last_activation_timestamp_seconds Unix ts of the last non-deduped activation.")
    out.append("# TYPE kms_last_activation_timestamp_seconds gauge")
    out.append(f"kms_last_activation_timestamp_seconds {last_ts}")
    out.append("# HELP kms_slack_notifier_up 1 while the notifier process is running.")
    out.append("# TYPE kms_slack_notifier_up gauge")
    out.append("kms_slack_notifier_up 1")
    return ("\n".join(out) + "\n").encode("utf-8")
class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz":
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(b"ok\n")
            return
        if self.path != "/metrics":
            self.send_response(404)
            self.end_headers()
            return
        body = render_metrics()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args, **kwargs):
        pass


def start_metrics_server() -> None:
    server = HTTPServer(("0.0.0.0", METRICS_PORT), MetricsHandler)
    print(f"[slack-notifier] metrics on :{METRICS_PORT}/metrics", flush=True)
    server.serve_forever()
def slack_post(text: str) -> None:
    payload = json.dumps(
        {"channel": CHANNEL, "text": text, "username": "kms.viktorbarzin.me", "icon_emoji": ":computer:"}
    ).encode("utf-8")
    req = urllib.request.Request(WEBHOOK, data=payload, headers={"Content-Type": "application/json"})
    try:
        urllib.request.urlopen(req, timeout=10).read()
    except (urllib.error.URLError, OSError) as exc:
        # URLError covers HTTP and DNS failures; OSError also catches a socket
        # timeout raised while reading, so a failed post cannot kill the tail loop.
        print(f"[slack] post failed: {exc}", file=sys.stderr)
class DedupCache(OrderedDict):
    def should_send(self, key: str) -> bool:
        now = time.time()
        while self and (now - next(iter(self.values()))) > DEDUP_WINDOW:
            self.popitem(last=False)
        if key in self and (now - self[key]) < DEDUP_WINDOW:
            return False
        if len(self) >= DEDUP_MAX:
            self.popitem(last=False)
        self[key] = now
        self.move_to_end(key)
        return True
def follow(path: str):
    while not os.path.exists(path):
        time.sleep(1)
    fh = open(path, "r", encoding="utf-8", errors="replace")
    fh.seek(0, 2)
    inode = os.fstat(fh.fileno()).st_ino
    while True:
        line = fh.readline()
        if line:
            yield line.rstrip("\n")
            continue
        time.sleep(0.5)
        try:
            new_inode = os.stat(path).st_ino
            if new_inode != inode:
                fh.close()
                fh = open(path, "r", encoding="utf-8", errors="replace")
                inode = new_inode
        except FileNotFoundError:
            time.sleep(1)
def main() -> None:
    threading.Thread(target=start_metrics_server, daemon=True).start()
    dedup = DedupCache()
    print(f"[slack-notifier] tailing {LOG_PATH}, posting to {CHANNEL} via Slack", flush=True)
    state: dict = {}
    for line in follow(LOG_PATH):
        if (m := OPEN_RE.search(line)):
            state = {"ip": m.group(1)}
            continue
        if not state:
            continue
        if (m := APP_RE.search(line)):
            state["app"] = m.group(1)
        elif (m := PROD_RE.search(line)):
            state["product"] = m.group(1)
        elif (m := HOST_RE.search(line)):
            state["host"] = m.group(1)
        elif (m := STATUS_RE.search(line)):
            state["status"] = m.group(1)
        elif CLOSE_RE.search(line):
            ip = state.get("ip", "?")
            product = state.get("product", state.get("app", "unknown"))
            host = state.get("host", "?")
            status = state.get("status", "unknown")
            key = f"{ip}|{product}"
            if dedup.should_send(key):
                text = (
                    f":computer: KMS activation\n"
                    f"• *Client*: `{ip}`\n"
                    f"• *Workstation*: `{host}`\n"
                    f"• *Product*: `{product}`\n"
                    f"• *Status before*: {status}"
                )
                slack_post(text)
                record_activation(product, status)
                print(f"[slack-notifier] sent: ip={ip} product={product!r} host={host!r}", flush=True)
            else:
                record_dedup_skip(product)
                print(f"[slack-notifier] dedup-skip: ip={ip} product={product!r}", flush=True)
            state = {}


if __name__ == "__main__":
    main()


@@ -119,6 +119,46 @@ module "ingress" {
}
}
resource "kubernetes_config_map" "kms_slack_notifier" {
  metadata {
    name      = "kms-slack-notifier"
    namespace = kubernetes_namespace.kms.metadata[0].name
  }

  data = {
    "notifier.py" = file("${path.module}/files/slack-notifier.py")
  }
}

resource "kubernetes_manifest" "kms_slack_external_secret" {
  manifest = {
    apiVersion = "external-secrets.io/v1beta1"
    kind       = "ExternalSecret"
    metadata = {
      name      = "kms-slack-webhook"
      namespace = kubernetes_namespace.kms.metadata[0].name
    }
    spec = {
      refreshInterval = "1h"
      secretStoreRef = {
        name = "vault-kv"
        kind = "ClusterSecretStore"
      }
      target = {
        name           = "kms-slack-webhook"
        creationPolicy = "Owner"
      }
      data = [{
        secretKey = "url"
        remoteRef = {
          key      = "kms"
          property = "slack_webhook_url"
        }
      }]
    }
  }
  depends_on = [kubernetes_namespace.kms]
}

resource "kubernetes_deployment" "windows_kms" {
metadata {
name = "kms"
@@ -140,11 +180,31 @@ resource "kubernetes_deployment" "windows_kms" {
labels = {
app = "kms-service"
}
annotations = {
# Reload pods when the notifier script changes
"checksum/notifier" = sha1(file("${path.module}/files/slack-notifier.py"))
# Prometheus scrape: the kubernetes-pods job picks this up via pod IP
"prometheus.io/scrape" = "true"
"prometheus.io/port" = "9101"
"prometheus.io/path" = "/metrics"
}
}
spec {
volume {
name = "vlmcsd-log"
empty_dir {}
}
volume {
name = "slack-notifier-script"
config_map {
name = kubernetes_config_map.kms_slack_notifier.metadata[0].name
}
}
container {
image = "kebe/vlmcsd:latest"
name = "windows-kms"
image = "kebe/vlmcsd:latest"
name = "windows-kms"
command = ["/usr/bin/vlmcsd"]
args = ["-D", "-v", "-l", "/var/log/vlmcsd/vlmcsd.log"]
resources {
limits = {
memory = "64Mi"
@@ -157,6 +217,59 @@ resource "kubernetes_deployment" "windows_kms" {
port {
container_port = 1688
}
volume_mount {
name = "vlmcsd-log"
mount_path = "/var/log/vlmcsd"
}
}
container {
  image   = "python:3.12-alpine"
  name    = "slack-notifier"
  command = ["python3", "-u", "/scripts/notifier.py"]
  env {
    name  = "VLMCSD_LOG"
    value = "/var/log/vlmcsd/vlmcsd.log"
  }
  env {
    name  = "SLACK_CHANNEL"
    value = "#alerts"
  }
  env {
    name  = "DEDUP_WINDOW_SECONDS"
    value = "3600"
  }
  env {
    name = "SLACK_WEBHOOK_URL"
    value_from {
      secret_key_ref {
        name = "kms-slack-webhook"
        key  = "url"
      }
    }
  }
  port {
    container_port = 9101
    name           = "metrics"
  }
  resources {
    limits = {
      memory = "64Mi"
    }
    requests = {
      cpu    = "5m"
      memory = "48Mi"
    }
  }
  volume_mount {
    name       = "vlmcsd-log"
    mount_path = "/var/log/vlmcsd"
    read_only  = true
  }
  volume_mount {
    name       = "slack-notifier-script"
    mount_path = "/scripts"
    read_only  = true
  }
}
}
}
@@ -165,6 +278,7 @@ resource "kubernetes_deployment" "windows_kms" {
# KYVERNO_LIFECYCLE_V1: Kyverno admission webhook mutates dns_config with ndots=2
ignore_changes = [spec[0].template[0].spec[0].dns_config]
}
depends_on = [kubernetes_manifest.kms_slack_external_secret]
}
resource "kubernetes_service" "windows_kms" {