Runbook: KMS public exposure (vlmcs.viktorbarzin.me:1688)
vlmcs.viktorbarzin.me:1688/TCP is intentionally open to the internet so any
visitor can activate Volume License Microsoft products. The webpage at
https://kms.viktorbarzin.me/ documents how to use it.
Two hostnames, on purpose (do not merge them):
- kms.viktorbarzin.me: the website (Traefik). Serves the docs and the /scripts/*.ps1 activators. Internally resolves to the Traefik LB (10.0.20.203), which has no :1688 listener.
- vlmcs.viktorbarzin.me: the KMS endpoint (vlmcsd). A-only (no AAAA; the IPv6 tunnel doesn't forward 1688). Resolves to 10.0.20.202 on the LAN (Technitium split-horizon, set via API; cloudflare_record.vlmcs in stacks/kms owns the public A) and to 176.12.22.76 on the internet (Cloudflare → pfSense WAN NAT :1688). Every slmgr/ospp command on the page points here.
Pointing a client at kms.viktorbarzin.me:1688 fails from the LAN with "KMS
server cannot be reached" — that name is the website, not the KMS server.
This runbook covers operations on the public exposure: where to find logs, how to tune the rate limit, how to revoke if abused.
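As a quick sanity check of the two-hostname split described above, the sketch below maps a resolved address to the role this runbook assigns it. The function name classify_kms_ip and its labels are hypothetical; the addresses are the ones listed above.

```shell
#!/bin/sh
# Hypothetical helper: map an IPv4 address to the role this runbook gives it.
# The addresses come from the hostname notes above; the labels are made up here.
classify_kms_ip() {
  case "$1" in
    10.0.20.202)  echo "kms-lan" ;;   # vlmcs.viktorbarzin.me on the LAN (MetalLB)
    176.12.22.76) echo "kms-wan" ;;   # vlmcs.viktorbarzin.me from the internet
    10.0.20.203)  echo "website" ;;   # Traefik LB, no :1688 listener
    *)            echo "unknown" ;;
  esac
}

classify_kms_ip 10.0.20.202
classify_kms_ip 10.0.20.203
```

On a LAN client, something like classify_kms_ip "$(dig +short vlmcs.viktorbarzin.me A | head -n1)" should print kms-lan. Running the same check against kms.viktorbarzin.me should print website, which is why that name cannot serve port 1688.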
Architecture
- K8s service: windows-kms in namespace kms, MetalLB dedicated LB IP 10.0.20.202:1688. ETP=Local, so vlmcsd sees real WAN client IPs in its log (pfSense WAN forwards do DNAT-only, no SNAT; ETP=Local skips the kube-proxy SNAT too). Same pattern mailserver used pre-2026-04-19. Sharing 10.0.20.200 isn't an option: all 10 services there are ETP=Cluster and MetalLB requires a single ETP per shared IP.
- Native DNS auto-discovery for LAN clients: any Windows client with DNS suffix viktorbarzin.lan activates with zero config. Windows queries the _vlmcs._tcp.viktorbarzin.lan SRV record by default, the SRV target resolves to vlmcs.viktorbarzin.lan → 10.0.20.202, and slmgr /ato succeeds. Records:
  - _vlmcs._tcp.viktorbarzin.lan SRV 0 0 1688 vlmcs.viktorbarzin.lan
  - vlmcs.viktorbarzin.lan A 10.0.20.202
  - kms.viktorbarzin.lan A 10.0.20.200 (Traefik, for the user-facing website at https://kms.viktorbarzin.lan/; not the KMS server)

  Manual override (e.g., for clients without the suffix or for clients on the public internet): slmgr /skms vlmcs.viktorbarzin.me:1688 (works LAN + WAN) or slmgr /skms 10.0.20.202:1688 (LAN, direct). Do not use kms.viktorbarzin.me:1688; that name is the website (Traefik), not the KMS server. To revert a manually-overridden client back to auto-discovery: slmgr /ckms.
- Pod fluidity: deployment has replicas=1 (notifier dedup state is per-pod) with no node affinity. TCP readiness/liveness probes on 1688 gate Pod Ready on the listener actually being up, so MetalLB only advertises 10.0.20.202 from a node where vlmcsd is serving.
- pfSense WAN forward: WAN TCP/1688 → k8s_kms_lb:1688 (alias = 10.0.20.202, dedicated to KMS). Description: KMS public — kms.viktorbarzin.me. Other forwards using k8s_shared_lb (WireGuard, HTTPS, shadowsocks, smtps, etc.) are unaffected.
- Filter rule on the WAN interface, TCP/1688 destination <k8s_kms_lb>, with state-table per-source caps:
  - max-src-conn 50: concurrent connections per source IP
  - max-src-conn-rate 10/60: 10 new connections per 60 seconds per source
  - overload <virusprot> flush: sources that exceed either cap get added to pfSense's stock virusprot pf table and have their existing states flushed. (virusprot is the only table pfSense's filter generator targets for overload; see /etc/inc/filter.inc. Don't try to point it at a custom table; the schema doesn't expose that knob.)
- Probe filter in slack-notifier: a bare TCP open/close (no Application/Activation block from vlmcsd) is treated as a probe. Uptime Kuma's port-type monitor on windows-kms.kms.svc:1688 and the kubelet readiness/liveness probes both hit this path. Probes increment kms_connection_probes_total{source} (source ∈ internal_pod, cluster_node, external) and log to stdout, but never post to Slack. Real activations still post.
- Website /scripts + /keys.json carve-out: the website is Anubis-fronted (PoW challenge). /scripts/* and /keys.json are carved out to the bare nginx backend (module.ingress_scripts in stacks/kms, ingress_path) because PowerShell iwr | iex / ConvertFrom-Json are non-JS clients that can't solve the PoW; without the carve-out they'd download the Anubis challenge HTML and choke. Everything else stays behind Anubis. Verify: curl -A curl https://kms.viktorbarzin.me/scripts/setup-kms.ps1 and .../keys.json both return real content (not "Making sure you're not a bot!").
- Auto-key selection: the scripts no longer require the user to pick a GVLK. /keys.json is data/products.yaml rendered to JSON (Hugo KEYS output format). When no Volume License key is installed, setup-kms.ps1 / kms-bootstrap.ps1 detect the edition (Windows via registry EditionID, + CurrentBuildNumber for LTSC/Server, which share an EditionID across releases; Office via the Click-to-Run ProductReleaseIds), fetch /keys.json, and slmgr /ipk / ospp /inpkey the matching key before activating. Only fires when not already licensed (never clobbers a working retail key). Azure-Edition server SKUs are intentionally unmapped (they collide with Datacenter and KMS may fail there).
- Edition switch (kms-bootstrap.ps1, consent-gated): when the installed product can't KMS-activate (Windows Home/retail; no VL Office), the bootstrap shows the consequences and asks before changing anything (default No).
  - Windows → changepk.exe /ProductKey <target GVLK> (default Pro; $env:KMS_EDITION overrides): in-place edition UPGRADE, needs a reboot then re-run, one-way (no in-place downgrade).
  - Office → slim ODT setup.exe /configure to a VL product (default ProPlus2024Volume; $env:KMS_OFFICE_PRODUCT overrides): ~3 GB download, closes Office.

  Non-interactive runs only proceed with an explicit env override. setup-kms.ps1 stays minimal and points non-VL editions at the bootstrap. NOTE: the changepk/ODT execution paths are unverified on real hardware (no Home/retail test box; the Pro test VM can't be switched reversibly); they are syntax-checked + activation regression-tested only.
- Self-hosted ODT bootstrapper: the Office reinstall path fetches the Office Deployment Tool from https://kms.viktorbarzin.me/scripts/odt-setup.exe (a committed copy in kms-website/static/scripts/), NOT from Microsoft: download.microsoft.com's ODT URL is build-numbered and rotates every release (the old hardcoded one 404'd). $env:KMS_ODT_URL overrides. The bootstrapper self-updates the Office payload, so refresh the committed copy only occasionally.
- Client telemetry → Loki: the scripts POST a small ANONYMOUS diagnostics event per run to https://kms.viktorbarzin.me/diag (action, outcome, error + exit codes, EditionID/build/locale, detected Office products, script version; NO hostname/user/keys). Fire-and-forget (3s, swallowed), so it never affects activation. $env:KMS_NO_TELEMETRY=1 opts out; $env:KMS_DIAG_URL overrides. Collector: standalone kms-diag Deployment (stacks/kms, python stdlib HTTP on :9102) reachable via the /diag ingress carve-out (bypasses Anubis like /scripts); it prints KMSDIAG <json> to stdout → Loki. Query in Grafana: {namespace="kms",pod=~"kms-diag.*"} |= "KMSDIAG". Disclosed in the site FAQ.
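The carve-out verification from the /scripts + /keys.json bullet above can be scripted. This is a minimal sketch: is_anubis_challenge is a hypothetical helper, the marker string is the one quoted in that bullet, and the sample bodies are placeholders rather than real responses.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 when the body on stdin looks like the Anubis
# challenge page rather than real content. The marker string is the one quoted
# in the carve-out bullet above.
is_anubis_challenge() {
  grep -q "Making sure you're not a bot!"
}

# Placeholder bodies stand in for real HTTP responses.
printf '%s\n' '{"placeholder": true}' | is_anubis_challenge || echo "real content"
printf '%s\n' "<title>Making sure you're not a bot!</title>" | is_anubis_challenge && echo "challenge page"
```

Against the live site this would look like curl -fsS -A curl https://kms.viktorbarzin.me/keys.json | is_anubis_challenge && echo "carve-out broken". A match means the path has fallen back behind Anubis and the PowerShell clients will fail.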
Where the logs are
vlmcsd (kms namespace, k8s)
# Live tail
kubectl logs -n kms -l app=kms-service -c windows-kms --tail=50 -f
# All activations in the running pod
kubectl logs -n kms -l app=kms-service -c windows-kms | grep "Incoming KMS request"
Source IPs from the WAN are real client IPs (pfSense DNAT-only + ETP=Local
preserve them through the chain). LAN clients hitting the LB IP directly
appear as their own IP. Pod-source probes (Uptime Kuma) appear as a Calico
pod IP in 10.10.0.0/16. Kubelet readiness/liveness probes appear as the
hosting node IP in 10.0.20.0/24.
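When reading the vlmcsd log, the ranges in the paragraph above can be applied mechanically. The helper below is a sketch: the name classify_source is hypothetical, and reusing the notifier's label names for these ranges is an assumption, not a copy of its code.

```shell
#!/bin/sh
# Hypothetical helper: label a source IP using the ranges described above.
# Assumption: the label names mirror kms_connection_probes_total{source}.
classify_source() {
  case "$1" in
    10.10.*)   echo "internal_pod" ;;   # Calico pod network, 10.10.0.0/16
    10.0.20.*) echo "cluster_node" ;;   # node/LAN network, 10.0.20.0/24
    *)         echo "external" ;;       # everything else, e.g. WAN clients
  esac
}

for ip in 10.10.3.7 10.0.20.101 203.0.113.9; do
  printf '%s %s\n' "$ip" "$(classify_source "$ip")"
done
```

Note that a LAN client in 10.0.20.0/24 hitting the LB IP directly would also be labelled cluster_node by this sketch, so treat that label as "node or LAN" when triaging.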
Slack notifier (kms namespace, k8s)
kubectl logs -n kms -l app=kms-service -c slack-notifier --tail=50 -f
Posts to #alerts, dedup window 1h per (source-IP, product). Activations
also increment the Prometheus counter kms_activations_total{product,status}
exposed on the same pod at :9101/metrics (scraped by the cluster-wide
kubernetes-pods job; query via Prometheus or Grafana directly).
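To reason about which activations would actually reach Slack, the 1h dedup window per (source-IP, product) can be modelled as below. This is only a sketch: dedup_posts and its input format are hypothetical, and it assumes the window is measured from the last event that was posted, which the notifier may or may not do.

```shell
#!/bin/sh
# Hypothetical model of a dedup window keyed on (source IP, product).
# Input: lines of "epoch_seconds source_ip product", oldest first.
# Output: only the lines that would produce a Slack post.
# Assumption: the window is measured from the last event that was posted.
dedup_posts() {
  awk -v window="${1:-3600}" '
    {
      key = $2 SUBSEP $3
      if (!(key in last) || $1 - last[key] >= window) {
        print
        last[key] = $1
      }
    }'
}

printf '%s\n' \
  "0 203.0.113.9 Windows" \
  "600 203.0.113.9 Windows" \
  "900 203.0.113.9 Office" \
  "3600 203.0.113.9 Windows" | dedup_posts
```

In the sample, the event at 600 is suppressed because it repeats the same source and product within the hour, while the Office event and the Windows event at 3600 pass. Product names containing spaces would need a different field separator.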
Probe-only TCP connections (open+close, no KMS RPC) are silently filtered
out of Slack and counted in kms_connection_probes_total{source}. Useful
queries:
# Probe rate by source
rate(kms_connection_probes_total[5m])
# Probes from the public WAN (a non-zero rate here means real port-scans
# are reaching us, not just internal monitoring)
rate(kms_connection_probes_total{source="external"}[5m])
pfSense — virusprot table and filter hits
# SSH to 10.0.20.1 as root
pfctl -t virusprot -T show # who's currently in the virusprot table
pfctl -t virusprot -T expire 86400 # boot anyone added more than 24h ago
pfctl -t virusprot -T flush # nuke the entire table
# Filter rule hit counts (find the KMS public rule, look at Evaluations / States)
pfctl -sr -v | grep -A 4 1688
# State table — current TCP/1688 connections, per source
pfctl -ss | grep ':1688 '
Tightening or loosening the rate limit
The filter rule is configured via the pfSense web UI
(Firewall → Rules → WAN, look for the KMS public — kms.viktorbarzin.me
rule) under Advanced Options → "Maximum new connections per source per
seconds" and "Maximum state entries per source".
- Default: max-src-conn 50, max-src-conn-rate 10/60
- To tighten (suspected abuse): drop to max-src-conn 10, max-src-conn-rate 3/60. Flush state and existing virusprot afterwards (pfctl -k 0.0.0.0/0 -K 0.0.0.0/0 is overkill; just save+apply the rule, pfSense reloads pf and existing virusprot entries stay blocked).
- To loosen (legitimate users blocked): bump to max-src-conn-rate 30/60. The virusprot table flush still applies on overload; reduce its lifetime via Firewall → Advanced → State Timeouts if entries linger.
The overload table entry survives pf reloads. Running
pfctl -t virusprot -T flush after a tuning change clears the slate.
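For intuition about how a candidate max-src-conn-rate value would treat a given traffic pattern, the sketch below uses a simplified sliding-window model. The function first_overload is hypothetical, and pf's real rate accounting may differ from a strict sliding window, so treat the result as an approximation.

```shell
#!/bin/sh
# Simplified sliding-window model of a per-source connection-rate cap.
# Input: one epoch-second timestamp per line for a single source, ascending.
# Output: the timestamp of the first connection that pushes the count inside
# the window above the limit (nothing if the cap is never exceeded).
# Assumption: pf's real accounting may differ; this is only for intuition.
first_overload() {
  awk -v limit="$1" -v window="$2" '
    {
      t[NR] = $1
      # Drop connections that have aged out of the window.
      while (t[NR] - t[head + 1] >= window) head++
      if (NR - head > limit) { print $1; exit }
    }'
}

# 11 connections one second apart against the default 10/60 cap.
seq 0 10 | first_overload 10 60
# The same burst against the tightened 3/60 cap.
seq 0 10 | first_overload 3 60
```

Feeding it timestamps taken from a legitimate client's activation retries gives a rough idea of whether a tighter cap would push that client into virusprot.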
Revoking the public exposure
If the activation surface needs to come down (abuse, legal, audit):
- pfSense web UI → Firewall → NAT → Port Forward → find WAN TCP/1688 → k8s_kms_lb → delete (or disable). Apply.
- pfSense web UI → Firewall → Rules → WAN → find KMS public — kms.viktorbarzin.me → delete (or disable). Apply.
- Verify externally: from a phone tether, nc -zw3 vlmcs.viktorbarzin.me 1688 should now fail. (Check the vlmcs name, not kms: the website name never serves 1688, so it would "fail" even while the exposure is still up.)
The k8s service stays reachable on the LAN
(10.0.20.202:1688 directly, and the website at kms.viktorbarzin.lan
via Traefik on 10.0.20.200:443) — only the WAN port-forward is removed.
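The before/after state described above (WAN closed, LAN still open) can be checked with a small wrapper around the same nc invocation. The name port_state is hypothetical, and the sketch assumes nc is installed; a missing nc is reported as closed, so confirm the tool exists before trusting the result.

```shell
#!/bin/sh
# Hypothetical helper: report whether a TCP port accepts connections.
# Uses the runbook's nc flags (-z: connect only, -w 3: 3 second timeout).
# Assumption: nc is installed; if it is missing this prints "closed" too.
port_state() {
  if nc -z -w 3 "$1" "$2" >/dev/null 2>&1; then
    echo "open"
  else
    echo "closed"
  fi
}

# Nothing normally listens on TCP port 1 of the loopback interface.
port_state 127.0.0.1 1
```

After revoking, port_state vlmcs.viktorbarzin.me 1688 run from outside the network should print closed, while port_state 10.0.20.202 1688 run from the LAN should still print open.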
To put it back, recreate the NAT rule (target alias k8s_kms_lb,
port 1688) and the filter rule with the same per-source caps. The alias
itself is independent of any forward and persists across delete/restore.
Related
- Stack: stacks/kms/ (Terraform; deployment, MetalLB Service, ingress, ExternalSecret for the Slack webhook)
- Webpage source: kms-website/ repo (Hugo + nginx; Woodpecker builds + pushes to forgejo, then kubectl set image deployment/kms-web-page)
- Networking architecture footnote: docs/architecture/networking.md § "MetalLB & Load Balancing"