Four-agent parallel investigation finally pinned down what's happening
with the hmembeds.one streams. The TL;DR is unexpected: there is no
fingerprint check, no decoder failure, no broken JS — the obfuscated
decoder is trivial to reproduce, but the upstream origin is dead.
Findings (saved at /tmp/jwre/{findings.md, blob-analysis.md,
fingerprint-gap.md, trace-summary.md}):
1. **The "ZpQw9XkLmN8c3vR3" blob is decoy.** It's an Adcash adblock-
bypass config — not the stream URL. The actual stream URL is in a
different inline `<script>` block of the embed HTML.
2. **The real decoder is base64 + XOR with a hardcoded key**, the key
appears literally in the HTML (e.g. `var k="bux7ver6mow4trh1"`).
No browser-derived inputs. We can run it in Python in 50µs.
3. **The decoded URL is JWT-bound to /24 of the requestor's IP**. JWT
payload: `{stream, ip:"176.12.22.0/24", session_id, exp}`. From our
cluster (egress 176.12.22.76) the JWT IP-binding is satisfied.
4. **The origin still returns 404 (GET) / 403 (HEAD).** Tested both
curated embeds (Sky F1 888520f3..., DAZN F1 fc3a5463...) — same
404. Origin landing page (`/`) returns 200, so the host is up;
the `/sec/<JWT>/<embed_id>.m3u8` endpoint specifically refuses.
5. **No fingerprint surface trips this.** Runtime trace via
chrome-service hooks confirmed: decoder reads navigator.userAgent
(heavy), screen dimensions, and a single WebGL getParameter call.
No canvas, audio, fonts, fetch-to-fingerprint-API. JW Player setup
is given a valid file URL — the playlist stays empty because JW
can't fetch the manifest from the (dead) origin.
Verdict: **the legacy curated hmembeds embeds (`888520f3...` Sky F1,
`fc3a5463...` DAZN F1) are upstream-dead.** No browser-side fix is
possible. The community uses these IDs as "24/7 channels" but they're
in a perpetually-offline state right now.
This commit ships the offline decoder anyway, registered as a new
extractor. Two reasons:
- If those origins come back online, no code change needed.
- Future curated hmembeds IDs (added by hand or discovered via
subreddit posts) will resolve through the same path.
Files added: `extractors/hmembeds.py` (~120 lines incl. the decoder
and a `decode_embed(html) -> str | None` helper that's reusable).
Registered in `__init__.py`. The existing CuratedExtractor stays
disabled; this replaces its mechanism with one that can absorb new
embed IDs without code changes.
Bonus from the agent work:
- Confirmed our stealth.js is sufficient — the runtime trace showed
the decoder reads only the surfaces we already cover.
- Identified ~10 fingerprint surfaces we don't spoof (platform,
userAgentData, hardwareConcurrency, deviceMemory, timezone,
AudioContext, ICE candidates) but proved they're not what's
blocking us, so no change needed for now.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>