[ci skip] f1-stream: replace Go service with Python/FastAPI skeleton

Replaces the existing Go-based f1-stream service with a new Python/FastAPI
backend as the foundation for the rebuilt F1 streaming aggregation service.

- New FastAPI backend with health and root endpoints
- Python 3.13 slim Dockerfile (replaces Go multi-stage build)
- Updated Terraform deployment (port 8000, reduced resources)
- Buildx-based redeploy.sh with --platform linux/amd64
- Added Woodpecker CI pipeline for automated builds
- Removed all old Go source, node_modules, and static assets
Viktor Barzin 2026-02-23 22:47:06 +00:00
parent c5a4c6e97b
commit f3bcd95242
80 changed files with 81 additions and 13879 deletions

.woodpecker/f1-stream.yml (new file)

@@ -0,0 +1,30 @@
when:
  event: push
  path: "stacks/f1-stream/files/**"

clone:
  git:
    image: woodpeckerci/plugin-git
    settings:
      attempts: 5
      backoff: 10s

steps:
  - name: build-image
    image: woodpeckerci/plugin-docker-buildx
    settings:
      username: "viktorbarzin"
      password:
        from_secret: dockerhub-pat
      repo: viktorbarzin/f1-stream
      dockerfile: stacks/f1-stream/files/Dockerfile
      context: stacks/f1-stream/files
      platforms: linux/amd64
      provenance: false
      auto_tag: true
  - name: deploy
    image: bitnami/kubectl
    commands:
      - kubectl -n f1-stream rollout restart deployment f1-stream
      - kubectl -n f1-stream rollout status deployment f1-stream --timeout=120s

@@ -1,78 +0,0 @@
# F1 Stream
## What This Is
A self-hosted web app that aggregates live Formula 1 streaming links from Reddit and user submissions, presenting them in a clean UI with embedded iframes. It scrapes r/motorsportsstreams2, allows users to submit their own stream URLs, and provides admin controls for content moderation. Built in Go with vanilla JS frontend, deployed on Kubernetes.
## Core Value
Users can find working F1 streams quickly — the app automatically discovers, validates, and surfaces healthy streams while removing dead ones.
## Requirements
### Validated
<!-- Shipped and confirmed valuable. Inferred from existing codebase. -->
- ✓ Reddit scraper polls r/motorsportsstreams2 for F1-related posts — existing
- ✓ URL extraction from post bodies and comment trees — existing
- ✓ F1 keyword filtering with negative keyword exclusion — existing
- ✓ Domain filtering (reddit, imgur, youtube, twitter excluded) — existing
- ✓ Deduplication via normalized URLs — existing
- ✓ User stream submission (anonymous + authenticated) — existing
- ✓ WebAuthn passwordless authentication — existing
- ✓ Admin approval workflow for user-submitted streams — existing
- ✓ HTTP proxy with rate limiting, private IP blocking, CSP stripping — existing
- ✓ Static frontend with iframe-based stream viewing — existing
- ✓ Default seed streams on first run — existing
- ✓ Stale link cleanup (24h) — existing
- ✓ Client-side health sort (reorder by reachability) — existing
### Active
<!-- Current scope. Building toward these. -->
- [ ] Scraper validates extracted URLs look like actual streams (video/player content), not random links
- [ ] Server-side health checker runs every 5 minutes against all known streams
- [ ] Health check: HTTP reachability check first, then proxy-fetch to detect video/player markers
- [ ] Configurable health check timeout
- [ ] Streams marked unhealthy after 5 consecutive check failures get hidden from public page
- [ ] Unhealthy streams retried on each check cycle — restored if they recover
- [ ] Scraped streams that pass health checks auto-published to main streams page
- [ ] Dead streams dynamically removed from the page without manual intervention
- [ ] Health status persisted (failure count, last check time, healthy/unhealthy state)
### Out of Scope
- Database migration (SQLite/PostgreSQL) — file-based storage is fine for this scope
- Multiple subreddit sources — stick with r/motorsportsstreams2 for now
- Real-time WebSocket push of stream status — polling is sufficient
- Mobile app — web-only
- OAuth/social login — WebAuthn is sufficient
## Context
- The app runs on a personal Kubernetes cluster, deployed via Terraform
- Single-user / small-group usage — performance at scale is not a concern
- The existing client-side `sortStreamsByHealth` does a basic `no-cors` fetch but can't inspect content; server-side checks via the proxy can do deeper validation
- Reddit's public JSON API requires no auth but rate-limits aggressively; the scraper already handles 429s with backoff
- Stream sites frequently go down, change URLs, or get taken down — health checking is essential for a good UX
## Constraints
- **Tech stack**: Go backend, vanilla JS frontend — no new frameworks or dependencies unless strictly necessary
- **Storage**: File-based JSON — no database
- **Deployment**: Docker container on Kubernetes, single replica
- **Reddit API**: Public JSON endpoints, must respect rate limits (1 req/sec delay already in place)
## Key Decisions
| Decision | Rationale | Outcome |
|----------|-----------|---------|
| Server-side health checks over client-side only | Client can't inspect response content (CORS); server proxy can detect video markers | — Pending |
| 5 consecutive failures before hiding | Avoids flapping — streams that are temporarily down aren't immediately removed | — Pending |
| Auto-publish scraped streams that pass health | Reduces manual admin work; the health check is the quality gate | — Pending |
| Health check every 5 minutes | Balances freshness vs. load — streams don't change status that frequently | — Pending |
---
*Last updated: 2026-02-17 after initialization*

@@ -1,115 +0,0 @@
# Requirements: F1 Stream
**Defined:** 2026-02-17
**Core Value:** Users can find working F1 streams quickly — the app automatically discovers, validates, and surfaces healthy streams while removing dead ones.
## v1 Requirements
Requirements for initial release. Each maps to roadmap phases.
### Scraper Validation
- [ ] **SCRP-01**: Scraper filters Reddit posts by F1 keywords before extracting URLs (existing behavior, preserve)
- [ ] **SCRP-02**: Scraper validates each extracted URL by proxy-fetching it and checking for video/player content markers (video tags, HLS/DASH manifests, common player libraries)
- [ ] **SCRP-03**: URLs that don't look like streams (no video markers detected) are discarded before saving
- [ ] **SCRP-04**: Validation has a configurable timeout (default 10s) to avoid blocking on slow sites
### Health Checking
- [ ] **HLTH-01**: Background health checker service runs every 5 minutes against all known streams (scraped + user-submitted)
- [ ] **HLTH-02**: Health check performs HTTP reachability check first (does the URL respond with 2xx?)
- [ ] **HLTH-03**: If HTTP check passes, health checker proxy-fetches the page and checks for video/player content markers
- [ ] **HLTH-04**: Health check has a configurable timeout per check (default 10s)
- [ ] **HLTH-05**: Each stream tracks consecutive failure count, last check time, and healthy/unhealthy status in persisted state
- [ ] **HLTH-06**: Stream marked unhealthy after 5 consecutive health check failures
- [ ] **HLTH-07**: Unhealthy streams hidden from public streams page (`GET /api/streams/public`)
- [ ] **HLTH-08**: Unhealthy streams continue to be checked — restored to healthy if they recover (failure count resets)
- [ ] **HLTH-09**: Health check interval configurable via `HEALTH_CHECK_INTERVAL` env var (default 5m)
### Auto-publish Pipeline
- [ ] **AUTO-01**: Scraped streams that pass both scraper validation and initial health check are auto-published to the main streams page
- [ ] **AUTO-02**: Dead streams (unhealthy after 5 failures) are dynamically removed from the public page without admin intervention
- [ ] **AUTO-03**: Auto-published streams are distinguishable from user-submitted streams in the data model (source field)
### Secure Embedding
- [ ] **EMBED-01**: Proxy fetches stream page and attempts to extract direct video source URL (HLS .m3u8, DASH .mpd, direct MP4/WebM, or embedded video player source)
- [ ] **EMBED-02**: When direct video source is found, render it in a minimal HTML5 video player on the app's own page (no third-party page loaded)
- [ ] **EMBED-03**: When direct extraction fails, fall back to rendering the full proxied page in a shadow DOM sandbox
- [ ] **EMBED-04**: Shadow DOM sandbox blocks `window.open`, `window.top` navigation, popup creation, and `alert`/`confirm`/`prompt`
- [ ] **EMBED-05**: Shadow DOM sandbox prevents access to parent page cookies and localStorage
- [ ] **EMBED-06**: Proxy strips known ad/tracker scripts and domains from proxied content before serving
- [ ] **EMBED-07**: Proxy rewrites relative URLs in proxied content to route through the proxy (so sub-resources load correctly)
- [ ] **EMBED-08**: All proxied content served with strict CSP headers scoped to the sandbox context
## v2 Requirements
Deferred to future release. Tracked but not in current roadmap.
### Enhanced Sources
- **SRC-01**: Support scraping from additional subreddits or Discord channels
- **SRC-02**: User-reported stream quality ratings
### UI Enhancements
- **UI-01**: Real-time WebSocket push of stream health status changes
- **UI-02**: Stream quality indicator (resolution, bitrate if detectable)
- **UI-03**: Stream viewer count or popularity metric
### Security Hardening
- **SEC-01**: Ad-blocker filter list integration (uBlock Origin lists)
- **SEC-02**: JavaScript AST analysis for malicious patterns before allowing execution
## Out of Scope
| Feature | Reason |
|---------|--------|
| Database migration (SQLite/PostgreSQL) | File-based storage is sufficient for current scale |
| Multiple replica deployment | Single-user/small-group usage, single replica is fine |
| Mobile app | Web-only, responsive design sufficient |
| OAuth/social login | WebAuthn already works |
| Full browser automation (Puppeteer/Playwright) | Too heavy for stream validation; HTTP-based checks are sufficient |
| Video transcoding/re-streaming | Out of scope — we link to or proxy existing streams |
## Traceability
Which phases cover which requirements. Updated during roadmap creation.
| Requirement | Phase | Status |
|-------------|-------|--------|
| SCRP-01 | Phase 1 | Pending |
| SCRP-02 | Phase 1 | Pending |
| SCRP-03 | Phase 1 | Pending |
| SCRP-04 | Phase 1 | Pending |
| HLTH-01 | Phase 2 | Pending |
| HLTH-02 | Phase 2 | Pending |
| HLTH-03 | Phase 2 | Pending |
| HLTH-04 | Phase 2 | Pending |
| HLTH-05 | Phase 2 | Pending |
| HLTH-06 | Phase 2 | Pending |
| HLTH-07 | Phase 2 | Pending |
| HLTH-08 | Phase 2 | Pending |
| HLTH-09 | Phase 2 | Pending |
| AUTO-01 | Phase 3 | Pending |
| AUTO-02 | Phase 3 | Pending |
| AUTO-03 | Phase 3 | Pending |
| EMBED-01 | Phase 4 | Pending |
| EMBED-02 | Phase 4 | Pending |
| EMBED-03 | Phase 5 | Pending |
| EMBED-04 | Phase 5 | Pending |
| EMBED-05 | Phase 5 | Pending |
| EMBED-06 | Phase 5 | Pending |
| EMBED-07 | Phase 5 | Pending |
| EMBED-08 | Phase 5 | Pending |
**Coverage:**
- v1 requirements: 24 total
- Mapped to phases: 24
- Unmapped: 0
---
*Requirements defined: 2026-02-17*
*Last updated: 2026-02-17 after roadmap creation*

@@ -1,106 +0,0 @@
# Roadmap: F1 Stream
## Overview
This roadmap delivers server-side stream quality assurance and secure viewing. First, the scraper learns to validate that extracted URLs actually contain video content. Then a background health checker continuously monitors all streams. These combine into an auto-publish pipeline that surfaces good streams and hides dead ones without admin intervention. Finally, secure embedding replaces raw iframes with native video playback where possible and a hardened sandbox fallback for everything else.
## Phases
**Phase Numbering:**
- Integer phases (1, 2, 3): Planned milestone work
- Decimal phases (2.1, 2.2): Urgent insertions (marked with INSERTED)
Decimal phases appear between their surrounding integers in numeric order.
- [ ] **Phase 1: Scraper Validation** - Scraper validates extracted URLs contain video/player content before saving
- [ ] **Phase 2: Health Check Infrastructure** - Background service continuously monitors stream health and persists status
- [ ] **Phase 3: Auto-publish Pipeline** - Healthy scraped streams auto-publish; dead streams auto-hide
- [ ] **Phase 4: Video Extraction and Native Playback** - Extract direct video sources and play them in a native HTML5 player
- [ ] **Phase 5: Sandbox and Proxy Hardening** - Fallback rendering in a sandboxed shadow DOM with ad stripping and strict CSP
## Phase Details
### Phase 1: Scraper Validation
**Goal**: Scraped URLs are verified to contain actual video/player content before being stored, eliminating junk links at the source
**Depends on**: Nothing (extends existing scraper)
**Requirements**: SCRP-01, SCRP-02, SCRP-03, SCRP-04
**Success Criteria** (what must be TRUE):
1. Scraper still discovers F1-related posts from Reddit using keyword filtering (existing behavior preserved)
2. Each extracted URL is proxy-fetched and inspected for video/player content markers (video tags, HLS/DASH manifests, player libraries)
3. URLs without video content markers are discarded and do not appear in scraped.json
4. Validation respects a configurable timeout so slow sites do not block the scrape cycle
**Plans**: 1 plan
Plans:
- [ ] 01-01-PLAN.md — Add URL validation with video marker detection to scraper pipeline
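A minimal sketch of the marker detection this phase calls for, using the string-matching approach the project decided on; the marker list and function name below are illustrative assumptions, not the shipped code:

```go
package scraper

import "strings"

// videoMarkers is an illustrative list: video tags, HLS/DASH manifest
// extensions, and common player libraries.
var videoMarkers = []string{
	"<video", ".m3u8", ".mpd",
	"hls.js", "jwplayer", "videojs",
}

// looksLikeStream reports whether a fetched page body contains any
// video/player marker; URLs that fail this check are discarded.
func looksLikeStream(body []byte) bool {
	page := strings.ToLower(string(body))
	for _, marker := range videoMarkers {
		if strings.Contains(page, marker) {
			return true
		}
	}
	return false
}
```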
### Phase 2: Health Check Infrastructure
**Goal**: All known streams are continuously monitored for health, with status persisted and unhealthy streams hidden from users
**Depends on**: Phase 1 (reuses content validation logic from scraper validation)
**Requirements**: HLTH-01, HLTH-02, HLTH-03, HLTH-04, HLTH-05, HLTH-06, HLTH-07, HLTH-08, HLTH-09
**Success Criteria** (what must be TRUE):
1. A background service checks every known stream (scraped and user-submitted) on a regular interval that defaults to 5 minutes and is configurable via environment variable
2. Each check performs HTTP reachability first, then proxy-fetches the page to verify video/player content markers
3. Each stream's health state (consecutive failure count, last check time, healthy/unhealthy flag) is persisted across restarts
4. A stream is hidden from the public streams page after 5 consecutive check failures, and restored if it later passes a check
5. Health check timeout per stream is configurable
**Plans**: 2 plans
Plans:
- [ ] 02-01-PLAN.md — HealthState model, store persistence, export HasVideoContent, create HealthChecker service
- [ ] 02-02-PLAN.md — Wire health checker in main.go, filter unhealthy streams from public API
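A hedged sketch of one check cycle: `HealthState` and the `HealthChecker` service are named in the plans, but the fields and logic below are assumptions:

```go
package health

import (
	"context"
	"time"
)

// HealthState is persisted per stream across restarts (field names assumed).
type HealthState struct {
	Failures  int       `json:"failures"`
	LastCheck time.Time `json:"last_check"`
	Healthy   bool      `json:"healthy"`
}

// HealthChecker runs every interval (default 5m). A single hasVideoContent
// call covers both reachability and content-marker checks, per the
// project's decision log.
type HealthChecker struct {
	timeout         time.Duration
	hasVideoContent func(ctx context.Context, url string) bool
}

func (c *HealthChecker) checkOnce(url string, st *HealthState) {
	st.LastCheck = time.Now()
	ctx, cancel := context.WithTimeout(context.Background(), c.timeout)
	defer cancel()
	if c.hasVideoContent(ctx, url) {
		st.Failures = 0
		st.Healthy = true // one passing check restores an unhealthy stream
		return
	}
	st.Failures++
	if st.Failures >= 5 {
		st.Healthy = false // hidden from GET /api/streams/public
	}
}
```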
### Phase 3: Auto-publish Pipeline
**Goal**: Scraped streams that pass validation and health checks appear on the public page automatically; dead streams disappear without admin action
**Depends on**: Phase 1, Phase 2
**Requirements**: AUTO-01, AUTO-02, AUTO-03
**Success Criteria** (what must be TRUE):
1. A scraped stream that passes scraper validation and its first health check is visible on the public streams page without any admin approval
2. A stream marked unhealthy (5 consecutive failures) is no longer visible on the public page, with no admin intervention required
3. Auto-published streams are distinguishable from user-submitted streams in the data model (source field tracks origin)
**Plans**: 1 plan
Plans:
- [ ] 03-01-PLAN.md — Add Source field to Stream model, create PublishScrapedStream, wire scraper auto-publish
### Phase 4: Video Extraction and Native Playback
**Goal**: When a stream URL contains an extractable video source, users watch it in a clean native HTML5 player instead of loading the third-party page
**Depends on**: Nothing (independent of phases 1-3; can be built in parallel but ordered here for delivery focus)
**Requirements**: EMBED-01, EMBED-02
**Success Criteria** (what must be TRUE):
1. The proxy can extract direct video source URLs (HLS .m3u8, DASH .mpd, direct MP4/WebM, or embedded player source attributes) from a stream page
2. When a direct video source is found, the user sees a minimal HTML5 video player on the app's own page playing the stream without loading the original third-party page
**Plans**: 2 plans
Plans:
- [ ] 04-01-PLAN.md — Backend video source extractor package and API endpoint
- [ ] 04-02-PLAN.md — Frontend native HTML5 video player with HLS.js and iframe fallback
### Phase 5: Sandbox and Proxy Hardening
**Goal**: When direct video extraction fails, the proxied page is rendered safely in a sandbox that blocks popups, ads, and access to the parent page
**Depends on**: Phase 4 (this is the fallback path when extraction fails)
**Requirements**: EMBED-03, EMBED-04, EMBED-05, EMBED-06, EMBED-07, EMBED-08
**Success Criteria** (what must be TRUE):
1. When direct video extraction fails, the full proxied page renders inside a shadow DOM sandbox on the app's page
2. The sandbox blocks window.open, top-frame navigation, popup creation, and alert/confirm/prompt dialogs
3. The sandbox prevents the proxied content from accessing parent page cookies and localStorage
4. Known ad/tracker scripts and domains are stripped from proxied content before serving, and relative URLs are rewritten to route through the proxy
5. All proxied content is served with strict CSP headers scoped to the sandbox context
**Plans**: 2 plans
Plans:
- [ ] 05-01-PLAN.md — Backend proxy hardening: HTML sanitizer with ad/tracker stripping, URL rewriting, and CSP headers
- [ ] 05-02-PLAN.md — Frontend shadow DOM sandbox replacing iframe fallback with API overrides
## Progress
**Execution Order:**
Phases execute in numeric order: 1 -> 2 -> 3 -> 4 -> 5
| Phase | Plans Complete | Status | Completed |
|-------|----------------|--------|-----------|
| 1. Scraper Validation | 0/1 | Planned | - |
| 2. Health Check Infrastructure | 0/2 | Planned | - |
| 3. Auto-publish Pipeline | 0/1 | Planned | - |
| 4. Video Extraction and Native Playback | 0/2 | Planned | - |
| 5. Sandbox and Proxy Hardening | 0/2 | Not started | - |

@@ -1,93 +0,0 @@
# Project State
## Project Reference
See: .planning/PROJECT.md (updated 2026-02-17)
**Core value:** Users can find working F1 streams quickly -- the app automatically discovers, validates, and surfaces healthy streams while removing dead ones.
**Current focus:** Phase 5: Sandbox Proxy Hardening (complete)
## Current Position
Phase: 5 of 5 (Sandbox Proxy Hardening)
Plan: 2 of 2 in current phase (complete)
Status: ALL PHASES COMPLETE -- project finished
Last activity: 2026-02-17 -- Completed 05-02 frontend shadow DOM sandbox
Progress: [██████████] 100%
## Performance Metrics
**Velocity:**
- Total plans completed: 8
- Average duration: 2.1min
- Total execution time: 0.28 hours
**By Phase:**
| Phase | Plans | Total | Avg/Plan |
|-------|-------|-------|----------|
| 01-scraper-validation | 1 | 3min | 3min |
| 02-health-check-infrastructure | 2 | 4min | 2min |
| 03-auto-publish-pipeline | 1 | 2min | 2min |
| 04-video-extraction-native-playback | 2 | 5min | 2.5min |
| 05-sandbox-proxy-hardening | 2 | 3min | 1.5min |
**Recent Trend:**
- Last 5 plans: 2min, 3min, 2min, 2min, 1min
- Trend: Stable
*Updated after each plan completion*
## Accumulated Context
### Decisions
Decisions are logged in PROJECT.md Key Decisions table.
Recent decisions affecting current work:
- Server-side health checks chosen over client-only (client can't inspect CORS responses)
- 5 consecutive failures threshold to avoid flapping
- Auto-publish for scraped streams that pass health check (health check is the quality gate)
- 5-minute health check interval (freshness vs load balance)
- String matching over DOM parsing for video detection (DOM reserved for Phase 4)
- 2MB body limit for HTML inspection to prevent memory issues
- 3 redirect limit to avoid infinite redirect chains on stream sites
- HealthMap reads file without lock to avoid deadlock from cross-lock scenarios
- Single HasVideoContent call covers both reachability and content checks
- Orphaned health state entries pruned each cycle to prevent unbounded file growth
- URLs not in health map assumed healthy to prevent new streams disappearing before first check
- HealthMap called within streamsMu/scrapedMu read locks safely via lock-free file read
- Source field uses string values (user/system/scraped) for readability over int enum
- PublishScrapedStream deduplicates by exact URL match; normalized matching stays in scraper layer
- Auto-publish iterates all validated links each cycle; deduplication makes repeat calls no-ops
- DOM parsing with golang.org/x/net/html for structured video source extraction (Phase 4)
- Dual extraction strategy: DOM walking + regex script parsing for maximum video URL coverage
- Priority ordering HLS > DASH > MP4 > WebM for frontend source selection
- 5-minute cache on extract endpoint to reduce upstream load
- Empty sources array (not error) when no video found to distinguish from fetch failures
- HLS.js loaded from jsDelivr CDN to avoid bundling complexity
- Extraction runs async after card render -- progressive enhancement with shadow DOM sandbox fallback
- DASH sources fall back to shadow DOM sandbox (dash.js too heavy for current scope)
- Silent console.log on extraction failure -- no user-facing errors for extraction issues
- 50+ ad/tracker domains in blocklist with parent-domain walk-up matching
- Inline scripts kept for video players; blocked scripts removed by domain
- CSP allows img/media/connect broadly since video sources come from arbitrary origins
- Non-HTML sub-resources proxied as-is with CSP headers
- Closed shadow DOM mode prevents external JS from accessing shadow root
- Script element created via createElement for execution in shadow DOM (innerHTML scripts don't execute)
- Direct link fallback when sandbox proxy fetch fails rather than broken state
### Pending Todos
None yet.
### Blockers/Concerns
None yet.
## Session Continuity
Last session: 2026-02-17
Stopped at: Completed 05-02-PLAN.md (frontend shadow DOM sandbox) -- ALL PHASES COMPLETE
Resume file: None

@@ -1,191 +0,0 @@
# Architecture
**Analysis Date:** 2026-02-17
## Pattern Overview
**Overall:** Layered monolithic service with clear separation between HTTP API layer, business logic, and persistent storage layer.
**Key Characteristics:**
- Single Go binary serving both API and static frontend
- File-based JSON persistence (no database)
- Modular internal packages for distinct concerns
- WebAuthn-based passwordless authentication
- Background scraper for content aggregation
- Rate-limited proxy service
## Layers
**HTTP Handler Layer:**
- Purpose: Accept and route HTTP requests, apply middleware, respond to clients
- Location: `internal/server/`
- Contains: Route registration, handler functions, middleware chains
- Depends on: Auth, Store, Proxy, Scraper packages
- Used by: HTTP clients (browser, mobile)
**Authentication & Authorization Layer:**
- Purpose: Manage user registration, login, sessions, and permission checks
- Location: `internal/auth/`
- Contains: WebAuthn ceremony implementations, session management, context helpers
- Depends on: Store, go-webauthn library
- Used by: Server middleware and handlers
**Business Logic Layer:**
- Purpose: Core domain operations (stream management, scraping, proxying)
- Location: `internal/scraper/`, `internal/proxy/`
- Contains: Scraper service (Reddit polling), Proxy service (content fetching with rate limiting)
- Depends on: Store for persistence
- Used by: Server, main entry point for orchestration
**Data Model Layer:**
- Purpose: Define domain types and interfaces
- Location: `internal/models/models.go`
- Contains: `User`, `Stream`, `ScrapedLink`, `Session` types
- Depends on: External WebAuthn library for credential types
- Used by: All layers
**Persistence Layer:**
- Purpose: Provide file-based storage abstraction
- Location: `internal/store/`
- Contains: JSON read/write helpers, file-based storage per entity type (streams, users, sessions, scraped links)
- Depends on: Models, filesystem
- Used by: All business logic layers
## Data Flow
**Stream Submission Flow:**
1. Client submits stream URL and title via `POST /api/streams`
2. Server handler validates URL format and length
3. Authentication is optional: streams from authenticated users are marked as published, while anonymous submissions are marked as unpublished pending admin approval (see Admin Approval Flow below)
4. Stream stored via `Store.AddStream()` which reads current `streams.json`, appends new stream, writes atomically
5. Response returned with stream metadata
**Authentication Flow (WebAuthn):**
1. User initiates registration with `POST /api/auth/register/begin` sending username
2. Server validates username format, checks uniqueness, creates temporary user
3. Server generates WebAuthn registration options via go-webauthn library
4. Server stores session data in memory with 5-minute expiry
5. Client performs attestation ceremony, sends credential via `POST /api/auth/register/finish?username=...`
6. Server retrieves in-memory session, validates with go-webauthn
7. Credential appended to user in `users.json`
8. Session token created in `sessions.json`, set as HttpOnly cookie
**Scraper Flow:**
1. Scraper runs on timer (default 15 minutes) or on manual trigger
2. Calls `scrapeReddit()` to poll r/motorsportsstreams2 new posts
3. Extracts URLs using regex, filters by F1-related keywords
4. Merges with existing `scraped.json`, deduplicating by normalized URL
5. Writes updated list atomically
6. Stale entries cleaned up, active ones returned via `GET /api/scraped`
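Step 4's dedup key comes from URL normalization; a sketch under typical rules (`normalizeURL` exists in the scraper, but this body is an assumption):

```go
package scraper

import (
	"net/url"
	"strings"
)

// normalizeURL lowercases scheme and host, drops fragments and trailing
// slashes, and returns the result as the deduplication key.
func normalizeURL(raw string) string {
	u, err := url.Parse(strings.TrimSpace(raw))
	if err != nil {
		return raw // unparsable URLs dedup on their raw form
	}
	u.Scheme = strings.ToLower(u.Scheme)
	u.Host = strings.ToLower(u.Host)
	u.Fragment = ""
	return strings.TrimSuffix(u.String(), "/")
}
```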
**Proxy Flow:**
1. Client requests `GET /proxy?url=https://...`
2. Server validates URL scheme (must be HTTPS), length, and target is not private IP
3. Applies rate limiting via token bucket per client IP
4. Fetches URL with timeout, limits response body to 5MB
5. Injects `<base>` tag into HTML response for relative URL resolution
6. Strips X-Frame-Options and CSP headers to allow iframe embedding
7. Returns modified content
**Admin Approval Flow:**
1. Anonymous streams created with `Published: false`
2. Admin views all streams via `GET /api/admin/streams`
3. Admin toggles publication status via `PUT /api/streams/{id}/publish`
4. Published streams visible in `GET /api/streams/public`
**State Management:**
- **User Sessions:** In-memory WebAuthn ceremony sessions (5-minute TTL), persistent sessions in `sessions.json` with configurable TTL
- **Streams:** Fully loaded into memory from `streams.json` on each read/write, entire file rewritten atomically
- **Scraped Links:** Similar full-file pattern, deduplicated during scrape merge
- **Users:** Fully loaded per query, updated atomically per write
- **Cleanup:** Hourly cleanup of expired sessions via background goroutine
## Key Abstractions
**Store Interface (implicit):**
- Purpose: Encapsulate all file-based persistence operations
- Examples: `store.AddStream()`, `store.GetUserByName()`, `store.CreateSession()`
- Pattern: Each entity type has dedicated file; reads are lock-protected; writes are atomic (temp-file-then-rename)
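A minimal sketch of that write pattern, assuming a `writeJSON` helper like the one the concerns audit references:

```go
package store

import (
	"encoding/json"
	"os"
	"path/filepath"
)

// writeJSON writes v atomically: marshal, write to a temp file in the
// same directory, then rename over the target so concurrent readers
// never observe a partially written file.
func writeJSON(path string, v any) error {
	data, err := json.MarshalIndent(v, "", "  ")
	if err != nil {
		return err
	}
	tmp, err := os.CreateTemp(filepath.Dir(path), ".tmp-*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // no-op once the rename succeeds
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}
```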
**Auth Middleware Chain:**
- Purpose: Extract and validate user from session cookie, inject into request context
- Examples: `AuthMiddleware()`, `RequireAuth()`, `RequireAdmin()`
- Pattern: Composable handler functions that wrap next handler
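A sketch of the wrapping pattern, using the context-key convention noted in the conventions doc; the handler body is illustrative:

```go
package server

import "net/http"

type contextKey string

const userKey contextKey = "user"

// RequireAuth wraps the next handler and rejects requests whose context
// carries no authenticated user.
func RequireAuth(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Context().Value(userKey) == nil {
			http.Error(w, `{"error":"authentication required"}`, http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}
```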
**Scraper Service:**
- Purpose: Periodically fetch and aggregate content from external sources
- Examples: Background goroutine running on interval, triggered scrape
- Pattern: Mutex-protected scrape operations to prevent concurrent executions
**Proxy Handler:**
- Purpose: Fetch external content safely with rate limiting and framing bypass
- Examples: URL validation, private IP blocking, rate limiting per IP, HTML base tag injection
- Pattern: Implements `http.Handler` interface, maintains per-IP token bucket state
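A sketch of a per-IP token bucket consistent with the limits documented for the proxy (30 requests/minute with a burst of 5); the names and internals are assumptions:

```go
package proxy

import (
	"math"
	"sync"
	"time"
)

type bucket struct {
	tokens float64
	last   time.Time
}

type limiter struct {
	mu      sync.Mutex
	buckets map[string]*bucket
}

// allow refills the caller's bucket based on elapsed time, caps it at
// the burst size, and spends one token per request.
func (l *limiter) allow(ip string) bool {
	const ratePerSec, burst = 30.0 / 60.0, 5.0
	l.mu.Lock()
	defer l.mu.Unlock()
	b, ok := l.buckets[ip]
	if !ok {
		b = &bucket{tokens: burst, last: time.Now()}
		l.buckets[ip] = b
	}
	now := time.Now()
	b.tokens = math.Min(burst, b.tokens+now.Sub(b.last).Seconds()*ratePerSec)
	b.last = now
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}
```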
## Entry Points
**HTTP Server (`main.go`):**
- Location: `main.go`
- Triggers: Process start
- Responsibilities: Initialize all services, configure routes, handle graceful shutdown on SIGTERM/SIGINT
**Handler Routes (`internal/server/server.go`):**
- Location: `internal/server/server.go:registerRoutes()`
- Pattern: All routes defined in single function, middleware applied uniformly
- Public endpoints: Health, public streams, public scraped links
- Authenticated endpoints: Personal streams, submit stream, delete stream
- Admin endpoints: All streams, toggle publish, trigger scrape
**Background Services:**
- Scraper: Started in goroutine at startup via `scraper.Run(ctx)`
- Session cleanup: Goroutine with hourly ticker
- Proxy rate-limit cleanup: Goroutine with 10-minute ticker
## Error Handling
**Strategy:** Error strings returned in JSON responses with appropriate HTTP status codes. Panics caught and logged by recovery middleware.
**Patterns:**
- Validation errors: `400 Bad Request`
- Authentication failures: `401 Unauthorized`
- Permission denied: `403 Forbidden`
- Resource not found: `404 Not Found`
- Duplicate entries: `409 Conflict`
- Server errors: `500 Internal Server Error`
- Rate limit exceeded: `429 Too Many Requests`
Errors include descriptive messages: `{"error":"username must be 3-30 chars, alphanumeric or underscore"}`
## Cross-Cutting Concerns
**Logging:** stdlib log package
- Request logging: Method, path, remote address via `LoggingMiddleware`
- Scraper logging: Intervals, timing, link counts
- Proxy logging: Fetch errors
- All goes to stdout
**Validation:**
- Username: 3-30 chars, alphanumeric + underscore
- URLs: Must be HTTP(S), max 2048 chars; the proxy accepts HTTPS only
- HTML escaping on stream titles to prevent injection
**Authentication:**
- WebAuthn for registration/login (passwordless)
- Session tokens as HttpOnly, Secure, SameSite=Strict cookies
- Configurable session TTL (default 720 hours)
- First registered user becomes admin unless ADMIN_USERNAME env var set
**CORS/Origin Check:**
- Origin header validated on mutation requests (POST, PUT, DELETE)
- Allowed origins configurable via WEBAUTHN_ORIGIN env var (comma-separated)
- CSRF protection via origin validation
---
*Architecture analysis: 2026-02-17*

@@ -1,232 +0,0 @@
# Codebase Concerns
**Analysis Date:** 2026-02-17
## Tech Debt
**File-based JSON storage as primary data persistence:**
- Issue: All data (users, streams, sessions, scraped links) are stored as JSON files on disk with file-level locking. This is a fundamental scalability constraint.
- Files: `internal/store/store.go`, `internal/store/streams.go`, `internal/store/sessions.go`, `internal/store/users.go`, `internal/store/scraped.go`
- Impact:
- Non-atomic multi-file operations (e.g., DeleteStream reads all streams, filters, writes back). Race conditions possible if two deletes happen simultaneously.
- Entire file loaded into memory for any operation, even reads. With thousands of streams/sessions, this becomes slow and memory-inefficient.
- Sessions file grows unbounded until manual cleanup (CleanExpiredSessions runs hourly). Could cause memory/disk pressure.
- No transaction support, no rollback capability on failure.
- Fix approach: Migrate to a proper database (SQLite for simplicity, PostgreSQL for production). Keep JSON file for backup/export purposes only.
**In-memory WebAuthn ceremony session storage with no cleanup guarantee:**
- Issue: Registration and login ceremony session data stored in `Auth.regSessions` and `Auth.loginSessions` maps. Cleanup relies on goroutines that may not execute if server crashes.
- Files: `internal/auth/auth.go` (lines 27-29, 107-117, 230-239)
- Impact:
- Memory leak on server restarts: orphaned sessions never cleaned up.
- No recovery mechanism if goroutine misses cleanup window.
- Session hijacking if an attacker can predict/guess the cleanup timing.
- Fix approach: Either move ceremony sessions to persistent store or use a time.AfterFunc with guaranteed cleanup (still risky). Better: use signed JWTs for ceremony state instead of server-side storage.
**Scraper loads entire scraped links list into memory on every scrape:**
- Issue: `Scraper.scrape()` loads all existing links, filters and deduplicates them, then rewrites entire file.
- Files: `internal/scraper/scraper.go` (lines 46-92)
- Impact: With thousands of links, each 15-minute scrape cycle causes a large memory spike and full file rewrite. Inefficient deduplication logic (O(n) map lookups on every new link).
- Fix approach: With database migration, use INSERT OR IGNORE / upsert patterns. For now, batch process links in chunks and use database indexes for deduplication.
**No input validation on URL lengths beyond basic checks:**
- Issue: URL length limited to 2048 chars in two places (`internal/server/server.go` line 153, `internal/proxy/proxy.go` line 72), but no validation of URL structure beyond "starts with http/https" and HTTPS-only in proxy.
- Files: `internal/server/server.go` (lines 146-160), `internal/proxy/proxy.go` (lines 54-80)
- Impact: Malformed URLs could bypass checks and cause unexpected behavior in downstream systems. User submission streams could contain typos/malware links.
- Fix approach: Use a proper URL parsing library with validation. Whitelist domains for stream submissions. Consider regex validation for known stream site patterns.
**Hardcoded default streams in main.go:**
- Issue: Default stream URLs are hardcoded and point to external streaming sites that may become unavailable, redirect, or change terms of service.
- Files: `main.go` (lines 100-123)
- Impact: If any of these URLs break, users get broken default content. Sites could shut down or get legal takedown notices. Application appears to endorse/support these sites.
- Fix approach: Move to configuration file. Make seeding optional. Add stream validation/health checks before serving. Consider removing entirely if this is a liability concern.
**Proxy strips CSP headers without replacement:**
- Issue: `internal/proxy/proxy.go` deliberately strips `X-Frame-Options` and CSP headers (line 123) to allow iframe-based proxying. No security headers added back.
- Files: `internal/proxy/proxy.go` (lines 121-125)
- Impact: Proxied content loses all origin security protections. Could allow downstream attacks to run XSS, clickjacking, etc. in the proxy context. Injected `<base>` tag doesn't prevent all attacks.
- Fix approach: Add back a strict CSP policy scoped to the proxy origin. Implement iframe sandbox attributes. Add additional security headers (X-Content-Type-Options: nosniff, etc.).
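A sketch of that fix approach; the policy values below are illustrative, not a vetted policy:

```go
package proxy

import "net/http"

// setProxyHeaders re-applies restrictive security headers after the
// upstream CSP and X-Frame-Options have been dropped.
func setProxyHeaders(w http.ResponseWriter) {
	w.Header().Set("Content-Security-Policy",
		"default-src 'self'; frame-ancestors 'self'")
	w.Header().Set("X-Content-Type-Options", "nosniff")
}
```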
## Security Considerations
**Authentication ceremony session fixation vulnerability:**
- Risk: Username used as session key for WebAuthn ceremonies (`Auth.BeginRegistration`, `Auth.BeginLogin`). Attacker could start ceremony for victim's account, then victim continues from attacker's session state.
- Files: `internal/auth/auth.go` (lines 107-108, 230-231)
- Current mitigation: None. Ceremony session stored in-memory and deleted after 5 minutes, but no CSRF token or state validation.
- Recommendations: Use cryptographically random state tokens for ceremony sessions instead of username. Store state in secure HTTP-only cookies or database. Validate state on finish.
**Rate limiting per-IP but no account lockout for failed authentication:**
- Risk: Brute force attacks on specific usernames are possible. Attacker can try many passwords (using different IPs) against a single account without consequence.
- Files: `internal/proxy/proxy.go` implements rate limiting (per-IP token bucket), but no equivalent exists for auth endpoints (`internal/auth/auth.go`).
- Current mitigation: WebAuthn makes guessing harder (passkeys), but early attack surface (BeginLogin endpoint) has no protection. Leaked user list could enable targeted attacks.
- Recommendations: Add per-username failure tracking. Lock account after N failed attempts. Add exponential backoff. Require captcha after threshold.
**CORS Origin validation incomplete:**
- Risk: `OriginCheck` middleware in `internal/server/middleware.go` (lines 71-93) only checks on non-GET requests. GET requests can still trigger state-changing operations (e.g., visiting a crafted link that proxies through the app).
- Files: `internal/server/middleware.go` (line 74)
- Current mitigation: Proxy request uses query param, but no SameSite cookie attribute on proxy endpoint (only on session cookie).
- Recommendations: Require Origin header on all mutation requests. Consider using POST for scrape trigger. Add X-CSRF-Token validation.
**Admin user initialization has race condition:**
- Risk: First user to register becomes admin if `ADMIN_USERNAME` not set. Two concurrent registration requests could both see 0 users and both become admin.
- Files: `internal/auth/auth.go` (lines 83-91)
- Current mitigation: Relies on file-level locking in store operations, but store operations are done after the check (line 121), not atomic.
- Recommendations: Move first-user-is-admin logic into CreateUser transaction, or seed admin during initialization phase before accepting requests.
**Session token stored in http-only cookie but not marked Secure in non-HTTPS:**
- Risk: Cookie marked `Secure: r.TLS != nil` (lines 187, 300). In development or non-HTTPS deployments, session token sent over plaintext HTTP.
- Files: `internal/auth/auth.go` (lines 187, 300)
- Current mitigation: None for non-HTTPS. Relies on deployment to enforce HTTPS.
- Recommendations: Always set Secure=true. Force HTTPS in production via HSTS header. Log warning if TLS is nil.
**Proxy does not validate Content-Type before injecting `<base>` tag:**
- Risk: Non-HTML responses (PDFs, images, binaries) could be corrupted by injecting `<base>` tag. Base64 encoded binary data could break.
- Files: `internal/proxy/proxy.go` (lines 104-119)
- Current mitigation: 5MB body size limit, but no content-type validation.
- Recommendations: Check Content-Type header before modification. Skip injection for non-HTML types. Use proper HTML parsing (e.g., golang.org/x/net/html) instead of string manipulation.
## Performance Bottlenecks
**Scraper Reddit parsing with inefficient comment recursion:**
- Problem: `walkComments` in `internal/scraper/reddit.go` (lines 245-260) recursively walks comment trees using JSON unmarshaling in each recursion level. Could cause O(n^2) behavior on deep comment threads.
- Files: `internal/scraper/reddit.go` (lines 245-260, 132-142)
- Cause: Each comment reply is unmarshaled separately. For a thread with 1000 nested replies, this could create 1000 unmarshaling operations.
- Improvement path: Pre-flatten comment tree or use iterative traversal instead of recursion. Cache unmarshaled comments during initial fetch.
**O(n) lookups on every store operation:**
- Problem: All store methods (GetUserByName, GetUserByID, FindStream by ID) iterate through entire in-memory list.
- Files: `internal/store/users.go` (lines 21-49), `internal/store/streams.go` (lines 12-52)
- Cause: File-based storage forces full-file loads. Even with caching, no indexing.
- Improvement path: With database migration, use indexed lookups. For now, maintain in-process cache with invalidation on updates.
**Rate limiter token bucket not garbage collected properly:**
- Problem: Buckets for old IPs are deleted every 10 minutes (bucketCleanup), but inactive users' buckets accumulate until cleanup cycle.
- Files: `internal/proxy/proxy.go` (lines 170-181)
- Cause: Cleanup is reactive, not triggered on write. High-traffic scenarios could have thousands of stale buckets in memory.
- Improvement path: Use sync.Map for lock-free reads. Implement heap-based cleanup timer per bucket instead of global interval.
**Entire streams/sessions list rewritten on every add/delete:**
- Problem: Adding one stream requires reading all streams, appending, and rewriting entire file. Deleting a session does the same.
- Files: `internal/store/streams.go` (lines 54-78, 80-103), `internal/store/sessions.go` (lines 22-44, 61-81)
- Cause: Atomic write pattern (writeJSON uses temp-file-then-rename), but forces full serialization.
- Improvement path: Migrate to database with transaction support. Implement write-ahead logging if staying with files.
## Fragile Areas
**Proxy string-based HTML manipulation is fragile:**
- Files: `internal/proxy/proxy.go` (lines 107-119)
- Why fragile: Uses string.Index to find `<head>` and `<html>` tags with string.ToLower comparisons. Cases like `<HEAD>` would be missed. Malformed HTML (missing closing tags, nested structures) could place `<base>` tag in wrong location.
- Safe modification: Use golang.org/x/net/html parser. Insert `<base>` into head node properly. Handle edge cases (no head, multiple heads, xhtml).
- Test coverage: No tests for proxy HTML injection logic. Edge cases untested.
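A sketch of the parser-based approach suggested above, using `golang.org/x/net/html`; it handles the common case but is untested against the edge cases listed:

```go
package proxy

import "golang.org/x/net/html"

// injectBase inserts <base href=...> as the first child of the first
// <head> element found, instead of string-searching for the tag.
func injectBase(n *html.Node, href string) bool {
	if n.Type == html.ElementNode && n.Data == "head" {
		n.InsertBefore(&html.Node{
			Type: html.ElementNode,
			Data: "base",
			Attr: []html.Attribute{{Key: "href", Val: href}},
		}, n.FirstChild) // appends if head has no children
		return true
	}
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		if injectBase(c, href) {
			return true
		}
	}
	return false
}
```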
**Auth ceremony cleanup relies on goroutines:**
- Files: `internal/auth/auth.go` (lines 112-117, 234-239)
- Why fragile: If goroutine is blocked or delayed, cleanup doesn't happen. No guarantee cleanup runs at correct time. Server crash loses all in-flight ceremonies.
- Safe modification: Use context deadlines instead of sleep timers. Implement cleanup on FinishRegistration/FinishLogin regardless of goroutine. Store ceremonies in database with TTL.
- Test coverage: No tests for ceremony timeout behavior. Hard to test goroutine cleanup timing.
**DeleteStream and related operations use string.Contains for error classification:**
- Files: `internal/server/server.go` (lines 196-203)
- Why fragile: Error messages must contain specific strings ("not authorized", "not found") for proper HTTP status mapping. Changing error text breaks error handling.
- Safe modification: Use error types (custom errors or error wrapping with errors.Is/As). Map error types to status codes centrally.
- Test coverage: No tests for error status code mapping.
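A sketch of the sentinel-error approach suggested above (names illustrative):

```go
package server

import (
	"errors"
	"net/http"
)

var (
	ErrNotFound      = errors.New("not found")
	ErrNotAuthorized = errors.New("not authorized")
)

// statusFor maps error types to status codes in one place, so handler
// logic no longer depends on error message text.
func statusFor(err error) int {
	switch {
	case errors.Is(err, ErrNotFound):
		return http.StatusNotFound
	case errors.Is(err, ErrNotAuthorized):
		return http.StatusForbidden
	default:
		return http.StatusInternalServerError
	}
}
```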
**Scraper is single-threaded with mutex but TriggerScrape starts new goroutine:**
- Files: `internal/scraper/scraper.go` (lines 42-44, 46-92)
- Why fragile: Calling TriggerScrape while scrape() is running (locked) will queue a second scrape. If scrapes take >15 minutes, queue grows. No bounds on concurrent scrapes.
- Safe modification: Use atomic flag to prevent concurrent scrapes. Queue only one pending scrape. Timeout long-running scrapes.
- Test coverage: No tests for concurrent scrape behavior or queue limits.
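A sketch of the atomic-flag guard suggested above, assuming the existing `Scraper`/`scrape()` shape:

```go
package scraper

import "sync/atomic"

type Scraper struct {
	running atomic.Bool
	// existing fields elided
}

// TriggerScrape drops the request when a scrape is already in flight,
// bounding concurrency at one instead of queueing behind the mutex.
func (s *Scraper) TriggerScrape() {
	if !s.running.CompareAndSwap(false, true) {
		return
	}
	go func() {
		defer s.running.Store(false)
		s.scrape()
	}()
}

func (s *Scraper) scrape() { /* existing scrape body */ }
```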
**Admin check depends on user count atomicity:**
- Files: `internal/auth/auth.go` (lines 83-91)
- Why fragile: Check user count, then create user are separate operations. Two concurrent registrations both see count=0, both get admin. Later operation fails due to username uniqueness check, but by then both claimed to be admin.
- Safe modification: Move atomicity into CreateUser. Use database transaction.
- Test coverage: No concurrency tests for admin initialization.
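A sketch of the suggested fix with a stand-in in-memory store; the real code would persist users.json inside the same critical section:

```go
package store

import "sync"

type User struct {
	Username string
	IsAdmin  bool
}

type Store struct {
	usersMu sync.Mutex
	users   []User // stands in for the users.json load/save cycle
}

// CreateUser decides admin status inside the same critical section that
// records the user, so two concurrent registrations cannot both win.
func (s *Store) CreateUser(u User) error {
	s.usersMu.Lock()
	defer s.usersMu.Unlock()
	u.IsAdmin = len(s.users) == 0 // first user becomes admin, atomically
	s.users = append(s.users, u)
	return nil // real code: writeJSON(usersPath, s.users)
}
```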
## Scaling Limits
**All data files live on single filesystem:**
- Current capacity: Depends on disk size. Assuming 1GB available, JSON files with generous spacing could hold ~100k streams, users, or sessions before performance degrades.
- Limit: At 10k active users with 5 sessions each (50k sessions), sessions.json alone is >50MB uncompressed. Each read loads entire file.
- Scaling path: Migrate to database. Use SQLite for single-node, PostgreSQL for distributed. Implement sharding for sessions by user_id.
**In-memory rate limit buckets per IP:**
- Current capacity: ~100k unique IPs can be tracked before memory pressure (each bucket ~48 bytes).
- Limit: Behind a proxy/load balancer, all traffic appears from proxy IP, making per-IP limiting useless. Map grows indefinitely per proxy.
- Scaling path: Move rate limiting to reverse proxy/load balancer layer (nginx, Envoy). Or, extract real IP from X-Forwarded-For more carefully (currently does this, but assumes trust).
**Scraper single-threaded, only hits one subreddit:**
- Current capacity: 25 posts per run * 15-min interval = 100 posts/hour. Each post processes comments once. Total throughput ~1000-5000 URLs/hour depending on post depth.
- Limit: If stream demand increases or multiple subreddits need scraping, single scraper becomes bottleneck. No parallelism.
- Scaling path: Implement scraper pool. Scrape multiple subreddits in parallel. Move scraper to separate service. Implement distributed job queue.
**WebAuthn session storage grows unbounded until server restart:**
- Current capacity: Each ceremony session is ~1KB. 1000 concurrent registrations = 1MB. 100k in-flight = 100MB.
- Limit: Memory exhaustion if registrations are started but not finished (or attacker starts many ceremonies).
- Scaling path: Use database for ceremony sessions. Implement hard timeout (e.g., 5 min) enforced by scheduled cleanup task. Set max concurrent ceremonies.
## Dependencies at Risk
**go-webauthn/webauthn v0.15.0:**
- Risk: Security library. May have vulnerabilities. Check for updates regularly.
- Impact: Passkey authentication could be compromised if library has bugs.
- Migration plan: Keep updated. Monitor GitHub releases. Test updates before deploying.
**Hardcoded subreddit URL (reddit.com API):**
- Risk: Reddit API could change, add authentication requirements, or shut down /r/motorsportsstreams2 community.
- Impact: Scraper stops working entirely. No fallback stream sources.
- Migration plan: Implement abstraction for stream sources. Support multiple scraper backends (Reddit, Discord, Twitter, etc.). Add health checks for scraper endpoints.
## Test Coverage Gaps
**No tests for HTTP error handling:**
- What's not tested: Error status code mapping, error response formatting, error logging.
- Files: `internal/server/server.go` (all handlers), `internal/auth/auth.go` (all endpoints)
- Risk: Error responses could be inconsistent or leaky (exposing internal details). Status codes could be wrong.
- Priority: High
**No tests for concurrent store operations:**
- What's not tested: Race conditions in add/delete/update. Concurrent reads while write in progress.
- Files: `internal/store/streams.go`, `internal/store/sessions.go`, `internal/store/users.go`
- Risk: Data corruption or loss under load. Auth bypass if race condition allows duplicate users.
- Priority: High
**No tests for WebAuthn ceremony timeouts:**
- What's not tested: Behavior when ceremony session expires. Cleanup of orphaned sessions.
- Files: `internal/auth/auth.go` (BeginRegistration, FinishRegistration, BeginLogin, FinishLogin)
- Risk: Session fixation, orphaned memory, unexpected behavior on retry.
- Priority: Medium
**No tests for proxy HTML injection:**
- What's not tested: Edge cases (malformed HTML, no head tag, nested structures). Security implications (XSS prevention, CSP).
- Files: `internal/proxy/proxy.go` (ServeHTTP)
- Risk: Injected tags could be placed incorrectly. Proxied content could break. Security headers could be ineffective.
- Priority: Medium
**No tests for rate limiter token bucket algorithm:**
- What's not tested: Burst capacity behavior, refill rate, edge cases (high request volume, time skew).
- Files: `internal/proxy/proxy.go` (allowRequest, cleanBuckets)
- Risk: Rate limiting could be too strict or too lenient. Cleanup could fail to run.
- Priority: Medium
**No tests for admin initialization logic:**
- What's not tested: First user gets admin flag. Edge cases with concurrent registrations. Behavior when ADMIN_USERNAME is set.
- Files: `internal/auth/auth.go` (BeginRegistration, lines 83-91)
- Risk: Non-admin user gets admin flag (privilege escalation). Two admins created unexpectedly.
- Priority: High
**No integration tests for full auth flow:**
- What's not tested: Complete registration + login + logout cycle. Error recovery. Session expiration.
- Files: All of `internal/auth/auth.go` and `internal/server/server.go` auth endpoints.
- Risk: Subtle bugs in ceremony sequencing. Auth logic could break without being detected.
- Priority: High
**No tests for scraper Reddit parsing:**
- What's not tested: Comment tree recursion. URL extraction. F1 keyword matching. Deduplication logic.
- Files: `internal/scraper/reddit.go`
- Risk: Scraper could miss streams, extract bad URLs, or fail on unexpected Reddit response format.
- Priority: Medium
---
*Concerns audit: 2026-02-17*

@@ -1,159 +0,0 @@
# Coding Conventions
**Analysis Date:** 2026-02-17
## Naming Patterns
**Files:**
- Go packages: lowercase, single word when possible (e.g., `auth`, `store`, `proxy`)
- Go files: lowercase with descriptive names (e.g., `server.go`, `middleware.go`, `reddit.go`)
- JSON files: snake_case (e.g., `users.json`, `sessions.json`, `scraped_links.json`)
- JavaScript files: camelCase (e.g., `app.js`, `auth.js`, `streams.js`)
**Functions and Methods:**
- Go: PascalCase for exported functions (e.g., `New`, `BeginRegistration`, `ServeHTTP`)
- Go: camelCase for unexported functions (e.g., `randomID`, `isF1Post`, `normalizeURL`)
- JavaScript: camelCase for all functions (e.g., `showToast`, `switchTab`, `doRegister`)
**Variables and Fields:**
- Go: camelCase for local variables (e.g., `streams`, `userID`, `sessionTTL`)
- Go: PascalCase for exported struct fields (e.g., `ID`, `Username`, `IsAdmin`)
- Go: prefixed mutex pattern: `resourceMu` for mutex protecting resource (e.g., `streamsMu`, `usersMu`, `sessionsMu`)
- JavaScript: camelCase for all variables (e.g., `currentUser`, `beginResp`, `container`)
**Types and Constants:**
- Go: PascalCase for exported types (e.g., `Server`, `Auth`, `Store`, `User`)
- Go: camelCase for unexported types (e.g., `contextKey`, `bucket`, `redditListing`)
- Go: camelCase for unexported constants (e.g., `maxBodySize`, `rateLimit`, `bucketCleanup`)
**Interfaces:**
- Go context keys use private types with exported constants (e.g., `type contextKey string; const userKey contextKey = "user"`)
## Code Style
**Formatting:**
- Language: Go (no automated formatter config detected, using standard gofmt conventions)
- Import organization: Standard library → local packages (separated by blank line)
- File layout: Package declaration → Imports → Constants/Variables → Types → Functions
**Linting:**
- No eslint or `.golangci.yml` configuration found
- Go code follows idiomatic Go conventions: error checking, defer cleanup, interface composition
## Import Organization
**Go Order:**
1. Standard library imports (context, encoding/json, fmt, log, etc.)
2. Blank line
3. Local f1-stream packages (internal/auth, internal/models, etc.)
4. Blank line
5. External third-party packages (github.com/...)
**Example from `internal/auth/auth.go`:**
```go
import (
	"crypto/rand"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"regexp"
	"sync"
	"time"

	"f1-stream/internal/models"
	"f1-stream/internal/store"

	"github.com/go-webauthn/webauthn/webauthn"
)
```
**JavaScript:**
- No explicit import organization (vanilla JavaScript, no modules)
- HTML file loads scripts in order: utils → app → auth/streams
## Error Handling
**Patterns:**
- Go: Explicit error return as second value (e.g., `err := operation(); if err != nil { return err }`)
- Go: Wrapping errors with context: `fmt.Errorf("operation failed: %w", err)`
- Go: String matching on error messages for classification (see `internal/server/server.go` line 196-205)
- Go: Logging errors with `log.Printf()` for non-critical failures, `log.Fatalf()` for startup errors
- Go: HTTP errors returned via `http.Error(w, message, statusCode)` for API endpoints
- JavaScript: Try-catch blocks for async operations, error fields in UI (e.g., `errEl.textContent = err.error || 'Operation failed'`)
**HTTP Error Responses:**
- Standard JSON format: `{"error":"description"}`
- Success responses vary by endpoint (JSON arrays, `{"ok":true}`, encoded objects via `json.NewEncoder`)
## Logging
**Framework:** `log` package (standard library)
**Patterns:**
- Informational: `log.Printf("message with %v context", value)`
- Errors: `log.Printf("operation failed: %v", err)`
- Startup: `log.Fatalf("critical: %v", err)` for initialization failures
- Component prefixes: `log.Printf("scraper: action description")`
**Example from `internal/scraper/scraper.go`:**
```go
log.Printf("scraper: starting scrape")
log.Printf("scraper: error after %v: %v", time.Since(start).Round(time.Millisecond), err)
log.Printf("scraper: done in %v, added %d new links (total: %d)", time.Since(start).Round(time.Millisecond), added, len(existing))
```
## Comments
**When to Comment:**
- Explain WHY, not WHAT (code shows what, comments explain reasoning)
- Used for non-obvious logic or security concerns
- Example from `internal/proxy/proxy.go` line 123: `// Explicitly do NOT copy X-Frame-Options or CSP`
- Example from `internal/auth/auth.go` line 120: `// Store user temporarily - will be committed on finish`
**Patterns:**
- Short inline comments before complex sections
- Package-level comments before exported types explaining purpose
- Security/business logic gets explained
## Function Design
**Size:** Functions keep complexity low, typically 20-50 lines; larger operations split across helpers
**Parameters:**
- Receiver methods use pointer receivers: `func (s *Store) GetSession(token string) ...`
- Constructor pattern returns initialized type and error: `func New(...) (*Type, error)`
- HTTP handlers follow signature: `func(w http.ResponseWriter, r *http.Request)`
**Return Values:**
- Errors always returned as last value: `(result, error)`
- Multiple return values when needed: `(*Type, error)` or `([]Type, error)`
- HTTP handlers write directly to ResponseWriter, return via `http.Error()` or direct writes
- Query methods return nil for "not found" rather than error (see `internal/store/users.go` line 33)
## Module Design
**Exports:**
- Exported names start with capital letter (e.g., `New`, `User`, `Server`)
- Unexported helpers start with lowercase
- Types exported when they're part of public API
- Helper functions (e.g., `isF1Post`, `normalizeURL`) kept unexported
**Barrel Files:** Not used; single concerns per file
**Package Organization:**
- `internal/auth/`: Authentication and WebAuthn implementation
- `internal/store/`: Data persistence (users.go, streams.go, sessions.go, scraped.go, store.go)
- `internal/server/`: HTTP routing and middleware
- `internal/scraper/`: Reddit scraping logic
- `internal/proxy/`: HTTP proxy with rate limiting
- `internal/models/`: Type definitions only
**Struct Composition:**
- `Server` struct holds dependencies injected at construction (line 15-21 in `internal/server/server.go`)
- Methods extend functionality through receiver pattern
- No inheritance, composition via embedded types used sparingly
---
*Convention analysis: 2026-02-17*

@@ -1,121 +0,0 @@
# External Integrations
**Analysis Date:** 2026-02-17
## APIs & External Services
**Reddit API:**
- Service: Reddit public JSON API (no authentication required)
- What it's used for: Scraping F1 stream links from r/motorsportsstreams2 subreddit
- Fetches 25 most recent posts: `https://www.reddit.com/r/motorsportsstreams2/new.json?limit=25`
- Extracts URLs from post titles and comment bodies
- Filters by F1-related keywords
- Runs on configurable interval (default 15 minutes)
- Implementation: `internal/scraper/reddit.go`
- Authentication: None - public API endpoint
**HTTP Proxy Service:**
- Service: Internal HTTP proxy for accessing external streams
- What it's used for: Fetching and proxying external stream pages while enforcing security policies
- Rate limiting: 30 requests/minute per IP with 5-request burst capacity
- URL validation: Only HTTPS URLs allowed, 2048-character limit
- Private IP blocking: Blocks requests to loopback, private, and link-local addresses
- Content transformation: Injects `<base>` tag for relative URL resolution
- Strips X-Frame-Options and CSP headers to allow iframe embedding
- Implementation: `internal/proxy/proxy.go`
- Endpoint: `GET /proxy?url=[url]`
## Data Storage
**File Storage:**
- Type: Local filesystem (JSON files)
- Location: Configurable via `DATA_DIR` environment variable (default: `/data`)
- Persistence mechanism:
- Atomic writes using temp-file-then-rename pattern
- No database server required
- Files stored in flat structure:
- `streams.json` - User-submitted and scraped stream links
- `users.json` - User accounts with WebAuthn credentials
- `sessions.json` - Active user sessions
- `scraped.json` - Reddit-scraped links
- Client: Go `encoding/json` standard library with sync.RWMutex for thread-safe access
**Caching:**
- Type: None - file-based storage only
- Session cleanup: Automatic garbage collection every 1 hour
## Authentication & Identity
**Auth Provider:**
- Type: Custom WebAuthn/FIDO2 implementation
- Library: github.com/go-webauthn/webauthn v0.15.0
- Implementation details:
- Passwordless authentication using WebAuthn standard
  - Registration ceremony: `POST /api/auth/register/begin` -> `POST /api/auth/register/finish`
  - Login ceremony: `POST /api/auth/login/begin` -> `POST /api/auth/login/finish`
- Session tokens stored in HTTP-only, SameSite-strict cookies
- In-memory ceremony data storage with 5-minute expiration
- Manual admin assignment via `ADMIN_USERNAME` env var
- First user automatically becomes admin if no `ADMIN_USERNAME` set
- Files: `internal/auth/auth.go`, `internal/auth/context.go`
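Wiring this up with go-webauthn looks roughly like the sketch below; the config values mirror the `WEBAUTHN_*` env vars listed later, and the function name is illustrative:

```go
package auth

import "github.com/go-webauthn/webauthn/webauthn"

// newRelyingParty builds the WebAuthn handle; the Begin/FinishRegistration
// and Begin/FinishLogin ceremonies are then driven off the returned value.
func newRelyingParty(rpID, origin, displayName string) (*webauthn.WebAuthn, error) {
	return webauthn.New(&webauthn.Config{
		RPDisplayName: displayName,
		RPID:          rpID,
		RPOrigins:     []string{origin},
	})
}
```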
## Monitoring & Observability
**Error Tracking:**
- Type: None - no external error tracking service
- Implementation: Standard Go logging with `log` package
**Logs:**
- Format: Standard Go log output (stdout)
- Level: Info and error messages
- No centralized logging, no external integration
## CI/CD & Deployment
**Hosting:**
- Platform: Kubernetes (Terraform module at `infra/modules/kubernetes/f1-stream/`)
- Deployment method: Container image
**CI Pipeline:**
- Type: Not detected in this codebase
- Build method: Dockerfile multi-stage build
- Builder: golang:1.23-alpine with `go mod download`
- Runtime: alpine:3.20 with minimal dependencies
## Environment Configuration
**Required env vars (with defaults):**
- `LISTEN_ADDR` - Server listen address (default: `:8080`)
- `DATA_DIR` - Data storage directory (default: `/data`)
- `SCRAPE_INTERVAL` - Reddit scraper frequency (default: 15m)
- `SESSION_TTL` - Session expiration (default: 720h)
- `PROXY_TIMEOUT` - Proxy request timeout (default: 10s)
- `WEBAUTHN_RPID` - Relying party ID (default: `localhost`)
- `WEBAUTHN_ORIGIN` - Origin URL list, comma-separated (default: `http://localhost:8080`)
- `WEBAUTHN_DISPLAY_NAME` - UI display name (default: `F1 Stream`)
- `ADMIN_USERNAME` - Optional: pre-set admin username (no default)
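The defaults above are applied by small env helpers in `main.go`; a sketch of the duration variant (the actual helper may differ slightly):

```go
package main

import (
	"os"
	"time"
)

// envDuration returns the parsed duration from the named env var,
// falling back to def when the var is unset or unparseable.
func envDuration(key string, def time.Duration) time.Duration {
	v := os.Getenv(key)
	if v == "" {
		return def
	}
	d, err := time.ParseDuration(v)
	if err != nil {
		return def
	}
	return d
}
```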
**Secrets location:**
- No secrets required - uses WebAuthn credentials stored locally
- CORS origin validation via `WEBAUTHN_ORIGIN` env var
## Webhooks & Callbacks
**Incoming:**
- None detected
**Outgoing:**
- None detected
## Stream Link Sources
**Default Stream URLs (hardcoded in main.go):**
1. `https://wearechecking.live/streams-pages/motorsports` - WeAreChecking Motorsports
2. `https://vipleague.im/formula-1-schedule-streaming-links` - VIPLeague F1
3. `https://www.vipbox.lc/` - VIPBox
4. `https://f1box.me/` - F1Box
5. `https://1stream.vip/formula-1-streams/` - 1Stream F1
---
*Integration audit: 2026-02-17*

@ -1,109 +0,0 @@
# Technology Stack
**Analysis Date:** 2026-02-17
## Languages
**Primary:**
- Go 1.24.1 - Backend application and main server logic
**Secondary:**
- HTML/CSS/JavaScript - Frontend UI
## Runtime
**Environment:**
- Go runtime (compiled binary)
**Container Runtime:**
- Docker/Alpine Linux (3.20) - Production deployment target
- Multi-stage Dockerfile with golang:1.23-alpine builder
**Package Manager:**
- Go modules (go.mod/go.sum)
## Frameworks
**Core:**
- Standard Go `net/http` - HTTP server and routing
- Native http.ServeMux for route handling (Go 1.22+ pattern routing)
- Native http.FileServer for static file serving
- Native http.Handler interface for middleware
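With Go 1.22 pattern routing, the HTTP method and path live in the route string itself; a sketch (handler names are illustrative, not the actual ones in `server.go`):

```go
func (s *Server) routes() http.Handler {
	mux := http.NewServeMux()
	mux.Handle("GET /", http.FileServer(http.Dir("static")))
	mux.HandleFunc("GET /api/streams", s.handleListStreams)
	mux.HandleFunc("POST /api/streams", s.handleAddStream)
	mux.HandleFunc("DELETE /api/streams/{id}", s.handleDeleteStream) // {id} read via r.PathValue("id")
	return mux
}
```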
**Authentication:**
- github.com/go-webauthn/webauthn v0.15.0 - WebAuthn/FIDO2 authentication
- Handles registration and login ceremonies
- Supports multiple credential types
**Frontend:**
- HTML5 - Markup
- CSS - Styling (Pico CSS framework for minimal styling)
- Vanilla JavaScript - Client-side interactivity (no framework detected)
## Key Dependencies
**Critical:**
- github.com/go-webauthn/webauthn v0.15.0 - Passwordless authentication via WebAuthn
- Includes transitive dependencies:
- github.com/go-webauthn/x v0.1.26 - WebAuthn extension support
- github.com/golang-jwt/jwt/v5 v5.3.0 - JWT token handling
- github.com/google/go-tpm v0.9.6 - TPM support for credentials
- github.com/fxamacker/cbor/v2 v2.9.0 - CBOR encoding/decoding
- github.com/go-viper/mapstructure/v2 v2.4.0 - Configuration mapping
- github.com/google/uuid v1.6.0 - UUID generation
- golang.org/x/crypto v0.43.0 - Cryptographic primitives
- golang.org/x/sys v0.37.0 - System-level primitives
**Infrastructure:**
- None detected (no external databases, queues, or third-party services in go.mod)
- File-based storage only
## Configuration
**Environment Variables:**
- `LISTEN_ADDR` - Server listen address (default: `:8080`)
- `DATA_DIR` - Data storage directory (default: `/data`)
- `SCRAPE_INTERVAL` - Reddit scraper interval (default: 15 minutes)
- `ADMIN_USERNAME` - Admin account username (optional)
- `SESSION_TTL` - Session expiration time (default: 720 hours)
- `PROXY_TIMEOUT` - HTTP proxy request timeout (default: 10 seconds)
- `WEBAUTHN_RPID` - WebAuthn relying party ID (default: `localhost`)
- `WEBAUTHN_ORIGIN` - WebAuthn origin URL (default: `http://localhost:8080`)
- `WEBAUTHN_DISPLAY_NAME` - WebAuthn display name (default: `F1 Stream`)
**Build:**
- `Dockerfile` - Multi-stage Docker build
- Builder stage: golang:1.23-alpine with CGO_ENABLED=0
- Runtime stage: alpine:3.20 with ca-certificates
- Exposes port 8080
## Platform Requirements
**Development:**
- Go 1.24.1 or compatible
- Unix-like shell (bash/zsh) for build scripts
- Optional: Docker for containerized development
**Production:**
- Kubernetes cluster (Terraform module structure suggests K8s deployment)
- Persistent volume for `/data` directory
- Port 8080 exposed for HTTP traffic
- ca-certificates for HTTPS proxying
## Storage
**Data Persistence:**
- File-based JSON storage in `DATA_DIR`
- Files: `streams.json`, `users.json`, `sessions.json`, `scraped.json`
- Atomic writes using temp-file-then-rename pattern (`writeJSON` function in `internal/store/store.go`)
## External Data Sources
**Reddit API:**
- URL: `https://www.reddit.com/r/motorsportsstreams2/new.json?limit=25`
- No authentication required (public subreddit)
- Used for scraping F1 stream links
---
*Stack analysis: 2026-02-17*

@ -1,202 +0,0 @@
# Codebase Structure
**Analysis Date:** 2026-02-17
## Directory Layout
```
f1-stream/
├── main.go          # Entry point, service initialization, signal handling
├── go.mod           # Go module definition
├── go.sum           # Dependency lock file
├── Dockerfile       # Container image definition
├── redeploy.sh      # Kubernetes redeployment script
├── index.html       # HTML template served at root
├── internal/        # Private Go packages
│   ├── auth/        # WebAuthn authentication and session management
│   ├── models/      # Domain data types
│   ├── server/      # HTTP handlers, routes, middleware
│   ├── store/       # File-based persistence layer
│   ├── scraper/     # Reddit content scraper
│   └── proxy/       # HTTP proxy with rate limiting
├── static/          # Frontend assets served to clients
│   ├── index.html   # Main SPA template
│   ├── css/         # Stylesheets
│   └── js/          # Client-side JavaScript modules
└── .planning/       # Planning/documentation directory
    └── codebase/    # Architecture analysis documents
```
## Directory Purposes
**Root Level:**
- Purpose: Service configuration and entry point
- Contains: Go module, main executable, Docker configuration, shell scripts
- Key files: `main.go` (service bootstrap), `go.mod` (dependencies)
**`internal/`:**
- Purpose: Private packages (not importable by external code)
- Contains: All business logic, separated by concern
- Key pattern: Each subdirectory is a distinct Go package with clear responsibility
**`internal/auth/`:**
- Purpose: User authentication, session management, context helpers
- Contains: WebAuthn ceremony handlers, session token management, user-in-context utilities
- Key files:
- `auth.go`: Registration/login handlers, ceremony session storage, credential validation
- `context.go`: Request context helpers for passing user data between middleware and handlers
**`internal/models/`:**
- Purpose: Domain model definitions
- Contains: User, Stream, ScrapedLink, Session type definitions
- Key files: `models.go` (all types, includes WebAuthn interface implementations)
**`internal/server/`:**
- Purpose: HTTP API and routing layer
- Contains: Handler functions, route registration, middleware implementations
- Key files:
- `server.go`: Server struct, route registration, API handlers (streams, admin, public endpoints)
- `middleware.go`: LoggingMiddleware, RecoveryMiddleware, AuthMiddleware, RequireAuth, RequireAdmin, OriginCheck
**`internal/store/`:**
- Purpose: Persistent storage abstraction over file system
- Contains: JSON file operations, per-entity storage methods, atomic write patterns
- Key files:
- `store.go`: Store struct, directory initialization, JSON helper functions (readJSON, writeJSON)
- `streams.go`: Stream CRUD operations, publish toggle, seeding
- `users.go`: User lookup, credential updates, admin count
- `sessions.go`: Session creation, validation, expiry cleanup
- `scraped.go`: Scraped link persistence, active link filtering
**`internal/scraper/`:**
- Purpose: Background content aggregation
- Contains: Interval-based scraper, Reddit-specific scraper logic
- Key files:
- `scraper.go`: Scraper service, interval-based run loop, manual trigger mechanism, deduplication logic
- `reddit.go`: Reddit API polling, F1 keyword filtering, URL extraction (not included in sample reads but referenced)
**`internal/proxy/`:**
- Purpose: HTTP content fetching with security controls and rate limiting
- Contains: Rate limiter, private IP validation, response modification
- Key files: `proxy.go` (implements http.Handler, rate limiting, content fetching, base tag injection)
**`static/`:**
- Purpose: Frontend assets served to browser
- Contains: HTML template and client-side code
- Key files:
- `index.html`: SPA HTML template (includes script tags loading js/)
- `js/app.js`: Toast notifications, dialog system, tab switching, initialization
- `js/auth.js`: Registration/login UI, WebAuthn client ceremony
- `js/streams.js`: Stream display, filtering, admin operations
- `js/utils.js`: Shared utilities (HTML escaping)
- `css/`: Stylesheets for app UI
## Key File Locations
**Entry Points:**
- `main.go`: Service initialization, dependency injection, signal handling, goroutine startup
**Configuration:**
- Environment variables read in `main.go` (LISTEN_ADDR, DATA_DIR, SCRAPE_INTERVAL, etc.)
- WebAuthn config passed to `auth.New()`
- `.env` files not tracked (see .gitignore)
**Core Logic:**
- Request routing: `internal/server/server.go:registerRoutes()`
- Auth logic: `internal/auth/auth.go`
- Data storage: `internal/store/store.go` and per-entity files
- Scraping: `internal/scraper/scraper.go`
- Proxying: `internal/proxy/proxy.go`
**Testing:**
- No test files present in codebase (see TESTING.md concerns section)
## Naming Conventions
**Files:**
- Go source files: lowercase with underscores (e.g., `auth.go`, `middleware.go`)
- JavaScript files: lowercase with hyphens or underscores (e.g., `app.js`, `auth.js`)
- JSON data files: lowercase (e.g., `streams.json`, `users.json`, `sessions.json`)
**Directories:**
- Go packages: lowercase, single word preferred (e.g., `auth`, `store`, `models`)
- Frontend assets: plural nouns (e.g., `static`, `css`, `js`)
**Functions:**
- Go: CamelCase (exported), camelCase (unexported)
- JavaScript: camelCase throughout (e.g., `loadPublicStreams()`, `showToast()`)
**Types:**
- Go structs: CamelCase (e.g., `User`, `Stream`, `Store`, `Auth`)
- Methods: CamelCase (e.g., `BeginLogin()`, `AddStream()`)
**Variables:**
- Go: camelCase (e.g., `listenAddr`, `dataDir`, `adminUsername`)
- JavaScript: camelCase (e.g., `container`, `userID`, `sessionToken`)
## Where to Add New Code
**New Feature (e.g., new stream filter):**
- Primary code: Add handler in `internal/server/server.go`, register route in `registerRoutes()`
- Store operations: Add method to appropriate file in `internal/store/` (likely `streams.go`)
- Frontend: Add UI in `static/` and API call in `static/js/streams.js` or new module
- Models: Extend types in `internal/models/models.go` if new fields needed
**New Authentication Method:**
- Core implementation: New file in `internal/auth/` (e.g., `oauth.go`)
- Handlers: Add methods following WebAuthn pattern (BeginXxx, FinishXxx)
- Routes: Register in `registerRoutes()`
- Frontend: Add form/button in `static/js/auth.js`
**New Background Service (e.g., content validator):**
- Implementation: New file in `internal/` or new package `internal/validator/`
- Integration: Initialize in `main()` alongside `scraper.New()`
- Lifecycle: Use context pattern from scraper's `Run(ctx)` method (see the sketch below)
- Storage: Use existing `Store` instance
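A minimal sketch of that lifecycle pattern (the `Validator` type and `checkAll` method are hypothetical placeholders):

```go
package validator

import (
	"context"
	"time"
)

type Validator struct {
	interval time.Duration
}

// Run performs one pass per tick until the context is cancelled,
// mirroring the scraper's Run(ctx) loop.
func (v *Validator) Run(ctx context.Context) {
	ticker := time.NewTicker(v.interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			v.checkAll()
		}
	}
}

func (v *Validator) checkAll() { /* validation work goes here */ }
```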
**Utilities/Helpers:**
- Shared by Go packages: Add to package where most useful, or create new `internal/util/` package
- Shared by frontend: Add to `static/js/utils.js` or create new module
- Shared helpers pattern: functions not tied to a single package, used across multiple packages
## Special Directories
**`internal/`:**
- Purpose: Enforce package privacy (cannot be imported by external code)
- Generated: No
- Committed: Yes
**`static/`:**
- Purpose: Served directly to clients via `http.FileServer`
- Generated: No (hand-written frontend)
- Committed: Yes
**`.planning/codebase/`:**
- Purpose: Architecture documentation for development guidance
- Generated: No (manually created by mapping process)
- Committed: Yes
**Data Directory (runtime):**
- Purpose: Persistent JSON files (streams.json, users.json, sessions.json, scraped.json)
- Location: Specified by DATA_DIR env var (default `/data`)
- Generated: Yes (created on first run)
- Committed: No (varies per deployment environment)
## Import Patterns
**Go Package Imports:**
- Standard library first: `import ("context" "fmt" "log")`
- Internal packages second: `import ("f1-stream/internal/auth" "f1-stream/internal/store")`
- External third-party last: `import ("github.com/go-webauthn/webauthn/webauthn")`
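Laid out as a sketch, the grouping convention reads:

```go
import (
	// Standard library
	"context"
	"fmt"
	"log"

	// Internal packages
	"f1-stream/internal/auth"
	"f1-stream/internal/store"

	// External third-party
	"github.com/go-webauthn/webauthn/webauthn"
)
```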
**Cross-Package Dependencies:**
- Server depends on: Auth, Store, Proxy, Scraper, Models
- Auth depends on: Store, Models
- Scraper depends on: Store, Models
- Proxy depends on: none (standalone service)
- Store depends on: Models
- Models depends on: external WebAuthn library only
---
*Structure analysis: 2026-02-17*

@ -1,256 +0,0 @@
# Testing Patterns
**Analysis Date:** 2026-02-17
## Test Framework
**Status:** No testing infrastructure present
**Runner:** Not detected
**Assertion Library:** Not applicable
**Run Commands:** Not applicable
## Test File Organization
**Current State:** Zero test files found
After scanning the codebase:
- No `*_test.go` files in `internal/` packages
- No `*.test.js` or `*.spec.js` files in static assets
- No test configuration files (jest.config.js, vitest.config.ts, etc.)
- No test runners in go.mod dependencies
## Test Coverage
**Requirements:** Not enforced; no test infrastructure
**Current Coverage:** 0% - no tests exist
## Test Types Present in Codebase
### Unit Test Candidates (Not Currently Tested)
**`internal/models/models.go`:**
- User and Stream model struct definitions
- WebAuthn interface implementations (lines 18-21)
**`internal/auth/auth.go`:**
- Username validation via regex `usernameRe` (line 19)
- Registration/login ceremony steps
- Session creation and token generation
- Admin user detection logic (lines 83-91)
**`internal/store/*.go` (all files):**
- JSON read/write operations with file locking
- User lookup and stream operations
- Session creation, validation, and cleanup
- Scraped link filtering and deduplication
**`internal/scraper/reddit.go`:**
- F1 post detection: `isF1Post()` function (lines 272-285)
- URL normalization: `normalizeURL()` function (lines 262-270)
- URL extraction: `extractURLs()` function (lines 210-243)
- Comment walking: `walkComments()` function (lines 245-260)
- Keyword matching logic (lines 29-45)
- Retry logic with backoff (lines 183-208)
**`internal/proxy/proxy.go`:**
- Rate limiting with token bucket algorithm (lines 145-168)
- Private host detection: `isPrivateHost()` function (lines 128-143)
- Client IP extraction: `clientAddr()` function (lines 184-191)
- Bucket cleanup mechanism (lines 170-182)
**`internal/server/middleware.go`:**
- Auth middleware context injection
- Authorization checks (RequireAuth, RequireAdmin)
- Origin validation for CSRF protection
- Panic recovery middleware
### Integration Test Candidates (Not Currently Tested)
**Authentication Flow:**
- Begin registration → finish registration → session creation
- Begin login → finish login → session creation
- Session validation and expiration
- WebAuthn ceremony with mock credentials
**Stream Management:**
- Add stream → save to JSON → retrieve
- Delete stream with authorization checks
- Toggle publish status
- Filter streams by visibility/ownership
**Scraping Pipeline:**
- Fetch Reddit listing
- Extract F1 posts
- Walk comments recursively
- Deduplicate URLs
- Merge with existing links
### E2E Test Candidates (Not Currently Tested)
**HTTP Endpoints:**
- Full registration flow (POST /api/auth/register/begin, /api/auth/register/finish)
- Full login flow (POST /api/auth/login/begin, /api/auth/login/finish)
- Stream CRUD operations
- Public stream viewing
- Scrape triggering and result retrieval
## Critical Untested Paths
**High Risk - Security:**
- Authentication middleware context injection (`internal/server/middleware.go` lines 33-43)
- Admin authorization checks (line 62)
- CSRF origin validation (lines 78-88)
- Private address filtering in proxy (lines 77-80 in `internal/proxy/proxy.go`)
- Rate limiting enforcement (lines 62-65 in `internal/proxy/proxy.go`)
**High Risk - Data Integrity:**
- Concurrent access to store files via mutex protection (no verification that race conditions are prevented)
- JSON read/write atomicity with temp files (lines 41-52 in `internal/store/store.go`)
- Session expiration cleanup (lines 83-98 in `internal/store/sessions.go`)
- Stream deduplication during scraping (lines 65-85 in `internal/scraper/scraper.go`)
**Medium Risk - Business Logic:**
- F1 post detection with negative keywords (lines 272-285 in `internal/scraper/reddit.go`)
- URL normalization for deduplication (lines 262-270)
- Retry logic with rate limit backoff (lines 183-208)
## What Needs Testing
### Unit Test Suggestions
```go
// Example: Test username validation
func TestUsernameValidation(t *testing.T) {
	tests := []struct {
		username string
		valid    bool
	}{
		{"valid123", true},
		{"valid_name", true},
		{"ab", false},           // too short
		{"invalid-char", false}, // invalid character
		{"", false},             // empty
	}
	for _, tt := range tests {
		if got := usernameRe.MatchString(tt.username); got != tt.valid {
			t.Errorf("usernameRe.MatchString(%q) = %v, want %v", tt.username, got, tt.valid)
		}
	}
}

// Example: Test F1 post detection
func TestIsF1Post(t *testing.T) {
	tests := []struct {
		title    string
		expected bool
	}{
		{"F1 GP Race - Monaco", true},
		{"Formula 1 Practice", true},
		{"Help with F1 key binding", false}, // negative keyword
		{"Random post about cars", false},
	}
	for _, tt := range tests {
		if got := isF1Post(tt.title); got != tt.expected {
			t.Errorf("isF1Post(%q) = %v, want %v", tt.title, got, tt.expected)
		}
	}
}

// Example: Test URL normalization
func TestNormalizeURL(t *testing.T) {
	// Check that different URL formats normalize to same string
	// Check case-insensitivity and trailing slash handling
}

// Example: Test rate limiting
func TestRateLimiting(t *testing.T) {
	p := New(10 * time.Second)
	ip := "192.168.1.1"
	// First burst allowed
	for i := 0; i < 5; i++ {
		if !p.allowRequest(ip) {
			t.Fatalf("burst request %d should be allowed", i+1)
		}
	}
	// Burst exhausted
	if p.allowRequest(ip) {
		t.Fatal("request beyond burst capacity should be denied")
	}
	// Wait and verify replenishment
	time.Sleep(10 * time.Second)
	if !p.allowRequest(ip) {
		t.Fatal("bucket should replenish after waiting")
	}
}
```
### Integration Test Suggestions
```go
// Example: Test store operations with concurrency
func TestConcurrentStreamOperations(t *testing.T) {
	st, _ := store.New(t.TempDir())
	// Concurrent adds from multiple goroutines
	// Verify no data corruption
	// Verify final count is correct
}

// Example: Test scraper deduplication
func TestScraperDeduplication(t *testing.T) {
	// Create scraper with test store
	// Mock Reddit response with duplicate URLs
	// Verify only unique URLs are stored
	// Verify normalization works (http vs https, trailing slashes)
}

// Example: Test auth middleware
func TestAuthMiddleware(t *testing.T) {
	st, _ := store.New(t.TempDir())
	a, _ := auth.New(st, ...)
	// Create test token
	// Make request with session cookie
	// Verify user injected into context
}
```
## Recommended Testing Strategy
1. **Phase 1 - Unit Tests (Highest Priority):**
- Validation functions (username regex, F1 keywords)
- String utilities (URL normalization, truncate)
- Rate limiting algorithm
- Private host detection
2. **Phase 2 - Integration Tests:**
- Store operations with concurrency (verify mutex protection)
- Scraper pipeline (Reddit fetch → parse → deduplicate → save)
- Auth ceremony flow with mock WebAuthn
- Stream CRUD with permission checks
3. **Phase 3 - E2E Tests:**
- Full HTTP request flows
- Middleware chain validation
- Session management across endpoints
## Testing Patterns to Establish
**Once framework chosen (Go: testing or testify):**
- Use `t.TempDir()` for store tests to avoid file conflicts
- Mock HTTP responses for scraper tests
- Use `net/http/httptest` for handler testing (see the example after this list)
- Mock WebAuthn responses for auth tests
- Table-driven tests for validation logic
- Parallel test execution with `-race` flag for concurrency detection
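As one concrete instance of the `net/http/httptest` pattern above, a sketch (the handler here is a stand-in, not a real route):

```go
package server

import (
	"net/http"
	"net/http/httptest"
	"testing"
)

func TestHealthHandler(t *testing.T) {
	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	req := httptest.NewRequest(http.MethodGet, "/healthz", nil)
	rec := httptest.NewRecorder()
	handler.ServeHTTP(rec, req)
	if rec.Code != http.StatusOK {
		t.Fatalf("got status %d, want %d", rec.Code, http.StatusOK)
	}
}
```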
**Coverage gaps to close:**
- All error paths in store operations
- Session expiration edge cases
- Concurrent access scenarios
- HTTP header validation
- CORS/origin validation
---
*Testing analysis: 2026-02-17*

@ -1,12 +0,0 @@
{
  "mode": "yolo",
  "depth": "standard",
  "parallelization": true,
  "commit_docs": true,
  "model_profile": "quality",
  "workflow": {
    "research": true,
    "plan_check": true,
    "verifier": true
  }
}

@ -1,237 +0,0 @@
---
phase: 01-scraper-validation
plan: 01
type: execute
wave: 1
depends_on: []
files_modified:
  - internal/scraper/validate.go
  - internal/scraper/validate_test.go
  - internal/scraper/scraper.go
  - main.go
autonomous: true
requirements:
  - SCRP-01
  - SCRP-02
  - SCRP-03
  - SCRP-04
must_haves:
  truths:
    - "Scraper still discovers F1-related posts using keyword filtering (existing isF1Post behavior unchanged)"
    - "Each newly scraped URL is fetched and inspected for video/player content markers before being saved to scraped_links.json"
    - "URLs without video content markers are discarded and do not appear in scraped_links.json"
    - "Validation uses a configurable timeout (SCRAPER_VALIDATE_TIMEOUT env var, default 10s) that prevents slow sites from blocking the scrape cycle"
  artifacts:
    - path: "internal/scraper/validate.go"
      provides: "URL validation logic with video marker detection"
      contains: "func validateLinks"
    - path: "internal/scraper/validate_test.go"
      provides: "Unit tests for marker detection and content type checks"
      contains: "func TestContainsVideoMarkers"
    - path: "internal/scraper/scraper.go"
      provides: "Updated Scraper struct with validateTimeout field and validation call in scrape()"
      contains: "validateTimeout"
    - path: "main.go"
      provides: "SCRAPER_VALIDATE_TIMEOUT env var configuration"
      contains: "SCRAPER_VALIDATE_TIMEOUT"
  key_links:
    - from: "internal/scraper/scraper.go"
      to: "internal/scraper/validate.go"
      via: "validateLinks call in scrape() between URL extraction and merge"
      pattern: "validateLinks\\(links"
    - from: "main.go"
      to: "internal/scraper/scraper.go"
      via: "scraper.New() call with validateTimeout parameter"
      pattern: "scraper\\.New\\(st.*validateTimeout"
---
<objective>
Add URL validation to the scraper pipeline so that each extracted URL is proxy-fetched and inspected for video/player content markers before being saved. URLs without video markers are discarded at the source.
Purpose: Eliminate junk links (blog posts, news articles, social media) from scraped results so users only see actual stream pages.
Output: Working validation step integrated into scraper pipeline, with unit tests and configurable timeout.
</objective>
<execution_context>
@/Users/viktorbarzin/.claude/get-shit-done/workflows/execute-plan.md
@/Users/viktorbarzin/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/01-scraper-validation/01-RESEARCH.md
@internal/scraper/scraper.go
@internal/scraper/reddit.go
@internal/models/models.go
@main.go
</context>
<tasks>
<task type="auto">
<name>Task 1: Create validate.go with video marker detection</name>
<files>internal/scraper/validate.go</files>
<action>
Create `internal/scraper/validate.go` in package `scraper` with the following:
1. Define `videoMarkers` string slice (case-insensitive markers checked against lowercased HTML body):
- HTML5: `<video`
- HLS: `.m3u8`, `application/x-mpegurl`, `application/vnd.apple.mpegurl`
- DASH: `.mpd`, `application/dash+xml`
- Player libraries: `hls.js`, `hls.min.js`, `dash.js`, `dash.all.min.js`, `video.js`, `video.min.js`, `videojs`, `jwplayer`, `clappr`, `flowplayer`, `plyr`, `shaka-player`, `mediaelement`, `fluidplayer`
2. Define `videoContentTypes` string slice for direct video Content-Type detection:
- `video/`, `application/x-mpegurl`, `application/vnd.apple.mpegurl`, `application/dash+xml`
3. Define `validateBodyLimit = 2 * 1024 * 1024` (2MB).
4. Implement `validateLinks(links []models.ScrapedLink, timeout time.Duration) []models.ScrapedLink`:
- Create `http.Client` with the given timeout and a `CheckRedirect` function that stops after 3 redirects (return `http.ErrUseLastResponse` when `len(via) >= 3`).
- Iterate over links. For each, call `hasVideoContent(client, link.URL)`. If true, keep the link. If false, log: `scraper: discarded %s (no video markers)` using `truncate(link.URL, 60)`.
- Return the filtered slice.
5. Implement `hasVideoContent(client *http.Client, rawURL string) bool`:
- Create GET request with `User-Agent` set to the existing `userAgent` constant from `reddit.go`.
- Execute request. On error, log and return false.
- Check status code: if < 200 or >= 400, return false.
- Check Content-Type header (lowercased) against `videoContentTypes` -- if match, return true (it's a direct video file).
- If Content-Type is not `text/html` or `application/xhtml`, return false (not a video file, not an HTML page to inspect).
- Read body with `io.LimitReader` (2MB limit).
- Return `containsVideoMarkers(strings.ToLower(string(body)))`.
6. Implement `containsVideoMarkers(loweredBody string) bool`:
- Iterate over `videoMarkers`, return true on first `strings.Contains` match.
7. Implement `isDirectVideoContentType(ct string) bool`:
- Lowercase ct, iterate `videoContentTypes`, return true on first `strings.Contains` match.
Imports needed: `io`, `log`, `net/http`, `strings`, `time`, and `f1-stream/internal/models`.
Do NOT use `golang.org/x/net/html` (reserved for Phase 4). This is detection, not extraction -- string matching is sufficient.
</action>
<verify>
Run `cd /Users/viktorbarzin/code/infra/modules/kubernetes/f1-stream/files && go build ./...` -- must compile without errors.
</verify>
<done>
`validate.go` exists with `validateLinks`, `hasVideoContent`, `containsVideoMarkers`, `isDirectVideoContentType` functions. The file compiles as part of the `scraper` package.
</done>
</task>
<task type="auto">
<name>Task 2: Wire validation into scraper pipeline and add config</name>
<files>internal/scraper/scraper.go, main.go</files>
<action>
**In `internal/scraper/scraper.go`:**
1. Add `validateTimeout time.Duration` field to the `Scraper` struct.
2. Update `New()` signature to accept `validateTimeout` parameter:
```go
func New(s *store.Store, interval time.Duration, validateTimeout time.Duration) *Scraper {
	return &Scraper{store: s, interval: interval, validateTimeout: validateTimeout}
}
```
3. In `scrape()` method, add validation step between the `scrapeReddit()` return and the merge-with-existing logic. Insert AFTER the line `log.Printf("scraper: reddit scrape completed in %v, got %d links", ...)` and BEFORE the line `existing, err := s.store.LoadScrapedLinks()`:
```go
// Validate links - only keep those with video content markers
if len(links) > 0 {
	validated := validateLinks(links, s.validateTimeout)
	log.Printf("scraper: validated %d/%d links as streams", len(validated), len(links))
	links = validated
}
```
This preserves SCRP-01: existing keyword filtering in `scrapeReddit()` via `isF1Post()` runs first, then validation filters the results.
**In `main.go`:**
1. Add `validateTimeout` env var read after the existing `scrapeInterval` line:
```go
validateTimeout := envDuration("SCRAPER_VALIDATE_TIMEOUT", 10*time.Second)
```
2. Update the `scraper.New()` call to pass the new parameter:
```go
sc := scraper.New(st, scrapeInterval, validateTimeout)
```
Both changes are minimal and follow the existing configuration pattern used for `SCRAPE_INTERVAL`, `PROXY_TIMEOUT`, etc.
</action>
<verify>
Run `cd /Users/viktorbarzin/code/infra/modules/kubernetes/f1-stream/files && go build ./...` -- must compile without errors. Then run `go vet ./...` -- no issues.
</verify>
<done>
`Scraper` struct has `validateTimeout` field. `New()` accepts 3 parameters. `scrape()` calls `validateLinks` between extraction and merge. `main.go` reads `SCRAPER_VALIDATE_TIMEOUT` env var (default 10s) and passes it to `scraper.New()`.
</done>
</task>
<task type="auto">
<name>Task 3: Add unit tests for validation functions</name>
<files>internal/scraper/validate_test.go</files>
<action>
Create `internal/scraper/validate_test.go` in package `scraper` with the following test functions:
**`TestContainsVideoMarkers`** - table-driven test covering:
- Positive cases (should return true):
- `<video>` tag: `<div><video src="stream.mp4"></video></div>`
- HLS manifest ref: `var url = "https://cdn.example.com/live.m3u8";`
- DASH manifest ref: `<source src="stream.mpd" type="application/dash+xml">`
- HLS.js library: `<script src="/js/hls.min.js"></script>`
- Video.js library: `<script src="https://cdn.example.com/video.js"></script>`
- JW Player: `<div id="jwplayer-container"></div><script>jwplayer("jwplayer-container")</script>`
- Clappr: `<script src="clappr.min.js"></script>`
- Flowplayer: `<script>flowplayer("#player")</script>`
- Negative cases (should return false):
- Plain HTML: `<html><body><p>Hello world</p></body></html>`
- Reddit link page: `<html><body><a href="https://example.com">Click here</a></body></html>`
- Blog post: `<html><body><article>F1 race results and analysis...</article></body></html>`
- Empty string: ``
Each test calls `containsVideoMarkers(body)` (input is already lowercased in real usage, but test with lowercase content to match real behavior) and asserts the result.
**`TestIsDirectVideoContentType`** - table-driven test covering:
- Positive: `video/mp4`, `video/webm`, `application/x-mpegurl`, `application/vnd.apple.mpegurl`, `application/dash+xml`, `video/mp4; charset=utf-8` (with params)
- Negative: `text/html`, `application/json`, `image/png`, `text/plain`, empty string
Each test calls `isDirectVideoContentType(ct)` and asserts the result.
Use standard `testing` package only. Follow existing test conventions in the codebase (table-driven tests with `t.Run`).
</action>
<verify>
Run `cd /Users/viktorbarzin/code/infra/modules/kubernetes/f1-stream/files && go test ./internal/scraper/ -v -run "TestContainsVideoMarkers|TestIsDirectVideoContentType"` -- all tests pass.
</verify>
<done>
`validate_test.go` exists with `TestContainsVideoMarkers` (8+ positive, 4+ negative cases) and `TestIsDirectVideoContentType` (6+ positive, 5+ negative cases). All tests pass.
</done>
</task>
</tasks>
<verification>
1. `go build ./...` compiles without errors
2. `go vet ./...` reports no issues
3. `go test ./internal/scraper/ -v` -- all tests pass
4. Verify `validate.go` contains the `videoMarkers` slice with at least 15 markers
5. Verify `scraper.go:scrape()` calls `validateLinks` between `scrapeReddit()` return and `LoadScrapedLinks()`
6. Verify `main.go` reads `SCRAPER_VALIDATE_TIMEOUT` with default `10*time.Second`
7. Verify the existing `isF1Post` keyword filtering in `scrapeReddit()` is untouched (SCRP-01)
</verification>
<success_criteria>
- The scraper pipeline compiles and all tests pass
- `validateLinks` is called in `scrape()` after URL extraction but before merge, filtering out URLs without video markers
- The validation timeout is configurable via `SCRAPER_VALIDATE_TIMEOUT` env var (default 10s)
- Existing F1 keyword filtering behavior is preserved unchanged
- No new external dependencies are introduced (stdlib only)
</success_criteria>
<output>
After completion, create `.planning/phases/01-scraper-validation/01-01-SUMMARY.md`
</output>

@ -1,107 +0,0 @@
---
phase: 01-scraper-validation
plan: 01
subsystem: scraper
tags: [go, http, video-detection, content-validation, streaming]
# Dependency graph
requires: []
provides:
- "URL validation pipeline with video marker detection (validateLinks)"
- "Configurable validation timeout via SCRAPER_VALIDATE_TIMEOUT env var"
- "Video content type and HTML marker detection functions"
affects: [02-health-checks, 04-link-extraction]
# Tech tracking
tech-stack:
  added: []
  patterns:
    - "Pipeline filter pattern: scrapeReddit -> validateLinks -> merge"
    - "String-match video detection (no DOM parsing) for Phase 1 speed"
    - "2MB body limit for HTML inspection to prevent memory issues"
key-files:
  created:
    - internal/scraper/validate.go
    - internal/scraper/validate_test.go
  modified:
    - internal/scraper/scraper.go
    - main.go
key-decisions:
- "String matching over DOM parsing for video detection (DOM reserved for Phase 4)"
- "2MB body limit to prevent memory issues on large pages"
- "3 redirect limit to avoid infinite redirect chains"
patterns-established:
- "Pipeline filter: validate scraped links before merge into store"
- "Env var config pattern: envDuration for timeout configuration"
requirements-completed: [SCRP-01, SCRP-02, SCRP-03, SCRP-04]
# Metrics
duration: 3min
completed: 2026-02-17
---
# Phase 1 Plan 1: Scraper Validation Summary
**URL validation pipeline with 18 video/player markers filtering scraped links before store merge, configurable via SCRAPER_VALIDATE_TIMEOUT**
## Performance
- **Duration:** 3 min
- **Started:** 2026-02-17T20:49:16Z
- **Completed:** 2026-02-17T20:51:54Z
- **Tasks:** 3
- **Files modified:** 4
## Accomplishments
- Created validate.go with 18 video/player markers covering HTML5, HLS, DASH, and 10+ player libraries
- Wired validateLinks into scrape() pipeline between URL extraction and store merge
- Added SCRAPER_VALIDATE_TIMEOUT env var (default 10s) following existing config patterns
- Added 25 unit tests (10 positive + 4 negative marker tests, 6 positive + 5 negative content type tests)
## Task Commits
Each task was committed atomically:
1. **Task 1: Create validate.go with video marker detection** - `adeb478` (feat)
2. **Task 2: Wire validation into scraper pipeline and add config** - `22d29db` (feat)
3. **Task 3: Add unit tests for validation functions** - `6c5cc02` (test)
## Files Created/Modified
- `internal/scraper/validate.go` - URL validation with video marker detection (validateLinks, hasVideoContent, containsVideoMarkers, isDirectVideoContentType)
- `internal/scraper/validate_test.go` - Table-driven unit tests for marker detection and content type checks (25 cases)
- `internal/scraper/scraper.go` - Added validateTimeout field and validateLinks call in scrape()
- `main.go` - Added SCRAPER_VALIDATE_TIMEOUT env var read (default 10s)
## Decisions Made
- Used string matching (not DOM parsing) for video detection -- DOM parsing reserved for Phase 4 link extraction
- Set 2MB body read limit to prevent memory issues on large streaming pages
- Limited redirects to 3 to avoid infinite redirect chains on sketchy stream sites
- Validation runs sequentially (not concurrent) to avoid overwhelming target sites
## Deviations from Plan
None - plan executed exactly as written.
## Issues Encountered
None.
## User Setup Required
None - no external service configuration required.
## Next Phase Readiness
- Validation pipeline is integrated and tested, ready for health check layer (Phase 2)
- The validateLinks function provides the filtering foundation that health checks will build upon
- No blockers or concerns
## Self-Check: PASSED
All 5 files verified present. All 3 task commits verified in git log.
---
*Phase: 01-scraper-validation*
*Completed: 2026-02-17*

@ -1,599 +0,0 @@
# Phase 1: Scraper Validation - Research
**Researched:** 2026-02-17
**Domain:** HTTP content fetching and HTML video/player content detection in Go
**Confidence:** HIGH
## Summary
Phase 1 adds a validation step to the existing Reddit scraper pipeline. Currently, the scraper extracts ALL URLs from F1-related Reddit posts and saves them to `scraped_links.json` without verifying whether they point to actual stream pages. The validation step will proxy-fetch each extracted URL (reusing the existing proxy's HTTP client pattern) and inspect the HTML response for video/player content markers before saving.
The implementation is straightforward because the codebase already has all the infrastructure needed: HTTP fetching with timeouts (used in both `internal/scraper/reddit.go` and `internal/proxy/proxy.go`), URL validation, and the scraper pipeline with deduplication. The new code is a validation function inserted between URL extraction and saving, operating on the same `[]models.ScrapedLink` type.
**Primary recommendation:** Add a `validateStreamURL` function in a new file `internal/scraper/validate.go` that uses string-based content matching (not full HTML parsing) to detect video markers, with `golang.org/x/net/html` reserved for Phase 4 (video extraction). Keep it simple: fetch the page, lowercase the body, check for known patterns. This avoids adding a dependency for Phase 1 while Phase 2 will reuse the same validation logic for health checks.
<phase_requirements>
## Phase Requirements
| ID | Description | Research Support |
|----|-------------|-----------------|
| SCRP-01 | Scraper filters Reddit posts by F1 keywords before extracting URLs (existing behavior, preserve) | Existing `isF1Post()` function in `reddit.go` lines 272-285 handles this. No changes needed -- just ensure the validation step is added AFTER URL extraction, not replacing the keyword filter. |
| SCRP-02 | Scraper validates each extracted URL by proxy-fetching it and checking for video/player content markers | New `validateStreamURL()` function fetches URL with configurable timeout, reads response body, checks for video content markers (see "Video Content Markers" section below for complete list). Reuse existing HTTP client pattern from `reddit.go:88`. |
| SCRP-03 | URLs that don't look like streams (no video markers detected) are discarded before saving | Filter applied in `scraper.go:scrape()` between URL extraction (line 57) and merge/save (line 60). Only URLs passing validation are included in the `links` slice passed to the merge step. |
| SCRP-04 | Validation has a configurable timeout (default 10s) to avoid blocking on slow sites | Add `SCRAPER_VALIDATE_TIMEOUT` environment variable read in `main.go`, passed to `scraper.New()`. Use `context.WithTimeout` on per-URL fetch to enforce deadline. Default 10 seconds. |
</phase_requirements>
## Standard Stack
### Core
| Library | Version | Purpose | Why Standard |
|---------|---------|---------|--------------|
| `net/http` | stdlib | HTTP client for fetching URLs | Already used throughout codebase (`reddit.go`, `proxy.go`). No external dependency needed. |
| `strings` | stdlib | Case-insensitive string matching for content markers | Already used extensively. `strings.Contains` on lowercased body is the simplest approach for marker detection. |
| `regexp` | stdlib | Pattern matching for HLS/DASH URLs in page source | Already used in `reddit.go` for URL extraction. Needed for matching `.m3u8` and `.mpd` URL patterns in HTML content. |
| `context` | stdlib | Timeout enforcement per URL validation | Already used in scraper (`scraper.go:Run`). `context.WithTimeout` provides per-request deadline. |
| `io` | stdlib | `io.LimitReader` for response body size limiting | Already used in `proxy.go` and `reddit.go` for body size limits. |
### Supporting
| Library | Version | Purpose | When to Use |
|---------|---------|---------|-------------|
| `golang.org/x/net/html` | latest | Full HTML DOM parsing | NOT needed for Phase 1. Reserve for Phase 4 (video source extraction). String matching is sufficient for detection. |
| `sync` | stdlib | WaitGroup for parallel validation | If parallel validation is desired. But sequential is simpler and respects rate limits of target sites. |
### Alternatives Considered
| Instead of | Could Use | Tradeoff |
|------------|-----------|----------|
| String matching on body | `golang.org/x/net/html` DOM parsing | DOM parsing is more accurate but adds a dependency and complexity. For Phase 1 (detection, not extraction), string matching is sufficient. Phase 4 needs DOM parsing for actual source extraction. |
| Sequential URL validation | `sync.WaitGroup` parallel validation | Parallel is faster but risks triggering rate limits on target sites and complicates error handling. Sequential with timeout is simpler and predictable. |
| Custom HTTP client | Reuse proxy's `*http.Client` | The proxy client has redirect limits and timeout already configured. But the scraper should have its own client with validation-specific timeout. Keep them independent. |
**Installation:**
```bash
# No new dependencies needed for Phase 1. All stdlib.
# golang.org/x/net/html deferred to Phase 4.
```
## Architecture Patterns
### Where Validation Fits in the Pipeline
```
Current flow:
scrapeReddit() -> []models.ScrapedLink -> merge with existing -> save
New flow:
scrapeReddit() -> []models.ScrapedLink -> validateLinks() -> []models.ScrapedLink -> merge with existing -> save
```
The validation step is a filter function that takes a slice of scraped links and returns only those that pass validation. This keeps the existing pipeline intact and makes the validation step independently testable.
### Recommended File Structure
```
internal/scraper/
  scraper.go    # Orchestrator (existing, add validateTimeout field + call validateLinks)
  reddit.go     # Reddit API scraping (existing, no changes)
  validate.go   # NEW: validateStreamURL(), validateLinks(), content marker definitions
```
### Pattern 1: Validation as a Filter Function
**What:** A pure filter function that takes `[]models.ScrapedLink` and returns the subset that pass validation.
**When to use:** When adding a validation/filter step to an existing pipeline.
**Example:**
```go
// internal/scraper/validate.go

// validateLinks filters links to only those with video content markers.
// Each URL is fetched with the given timeout and inspected for markers.
func validateLinks(links []models.ScrapedLink, timeout time.Duration) []models.ScrapedLink {
	client := &http.Client{Timeout: timeout}
	var valid []models.ScrapedLink
	for _, link := range links {
		if hasVideoContent(client, link.URL) {
			valid = append(valid, link)
		} else {
			log.Printf("scraper: discarded %s (no video markers)", truncate(link.URL, 60))
		}
	}
	return valid
}

// hasVideoContent fetches a URL and checks for video/player content markers.
func hasVideoContent(client *http.Client, rawURL string) bool {
	req, err := http.NewRequest("GET", rawURL, nil)
	if err != nil {
		return false
	}
	req.Header.Set("User-Agent", userAgent) // reuse existing constant
	resp, err := client.Do(req)
	if err != nil {
		log.Printf("scraper: validate fetch error for %s: %v", truncate(rawURL, 60), err)
		return false
	}
	defer resp.Body.Close()
	// Only inspect HTML responses
	ct := resp.Header.Get("Content-Type")
	if !strings.Contains(ct, "text/html") && !strings.Contains(ct, "application/xhtml") {
		// Could be a direct video file (.m3u8, .mpd, .mp4) which is valid
		if isDirectVideoContentType(ct) {
			return true
		}
		return false
	}
	body, err := io.ReadAll(io.LimitReader(resp.Body, 2*1024*1024)) // 2MB limit for validation
	if err != nil {
		return false
	}
	return containsVideoMarkers(strings.ToLower(string(body)))
}
```
### Pattern 2: Configuration Via Struct Field
**What:** Pass validation timeout through the existing `Scraper` struct, configured from `main.go` env vars.
**When to use:** Following existing codebase pattern where all config flows through `main.go` -> constructor.
**Example:**
```go
// internal/scraper/scraper.go
type Scraper struct {
	store           *store.Store
	interval        time.Duration
	validateTimeout time.Duration // NEW
	mu              sync.Mutex
}

func New(s *store.Store, interval time.Duration, validateTimeout time.Duration) *Scraper {
	return &Scraper{store: s, interval: interval, validateTimeout: validateTimeout}
}

// In main.go:
validateTimeout := envDuration("SCRAPER_VALIDATE_TIMEOUT", 10*time.Second)
sc := scraper.New(st, scrapeInterval, validateTimeout)
```
### Pattern 3: Integration Point in scrape()
**What:** Call validateLinks between scrapeReddit return and merge step.
**When to use:** Minimal change to existing scrape flow.
**Example:**
```go
// In scraper.go:scrape() - between lines 57 and 60
links, err := scrapeReddit()
if err != nil {
	// ... existing error handling
}
log.Printf("scraper: reddit scrape completed in %v, got %d links", time.Since(start).Round(time.Millisecond), len(links))

// NEW: validate links before merging
if len(links) > 0 {
	validated := validateLinks(links, s.validateTimeout)
	log.Printf("scraper: validated %d/%d links as streams", len(validated), len(links))
	links = validated
}

// Continue with existing merge logic...
```
### Anti-Patterns to Avoid
- **Fetching URLs inside the Reddit API loop:** Validation should happen after all URLs are collected from Reddit, not interleaved with Reddit API calls. This keeps the Reddit API calls fast and avoids mixing rate-limit concerns.
- **Using the proxy's HTTP handler for internal validation:** The proxy (`internal/proxy/proxy.go`) is designed as an HTTP handler for client-facing requests with IP-based rate limiting. The scraper should use its own HTTP client without rate limiting since it is a trusted internal caller.
- **Modifying the ScrapedLink model to track validation state:** For Phase 1, validation is a binary filter (pass or discard). Adding validation metadata to the model is premature and adds complexity to the store layer. If needed in Phase 2 for health checking, it can be added then.
- **Full HTML DOM parsing for detection:** Using `golang.org/x/net/html` to parse the full DOM tree just to detect presence of video tags is overkill. String matching on lowercased HTML body is sufficient for detection. DOM parsing is needed in Phase 4 for actual source URL extraction.
## Don't Hand-Roll
| Problem | Don't Build | Use Instead | Why |
|---------|-------------|-------------|-----|
| HTTP fetching with timeout | Custom TCP client | `net/http.Client` with `Timeout` field | stdlib handles redirects, TLS, timeouts, connection pooling |
| HTML content inspection | Full DOM parser | `strings.Contains` on lowercased body | Detection (yes/no) does not need structural parsing; string matching is faster and simpler |
| URL scheme validation | Manual string prefix check | `net/url.Parse` + scheme check | Already used in codebase; handles edge cases |
| Concurrent timeout enforcement | Manual goroutine + channel | `context.WithTimeout` + `http.NewRequestWithContext` | stdlib integration; cancels in-flight requests properly |
**Key insight:** Phase 1 is a detection problem (does this page look like a stream?), not an extraction problem (what is the stream URL?). Detection can be done with string matching. Extraction (Phase 4) needs DOM parsing.
## Video Content Markers
### HIGH confidence markers (any one of these strongly indicates a stream page)
**HTML Tags:**
- `<video` - HTML5 video element
- `<source` with `type="application/x-mpegurl"` or `type="application/dash+xml"` - HLS/DASH sources
- `<iframe` with src containing known player domains
**HLS/DASH Manifest References:**
- `.m3u8` - HLS manifest file extension
- `.mpd` - DASH manifest file extension
- `application/x-mpegurl` - HLS MIME type
- `application/vnd.apple.mpegurl` - Alternative HLS MIME type
- `application/dash+xml` - DASH MIME type
**Player Library References:**
- `hls.js` or `hls.min.js` - HLS.js player library
- `dash.js` or `dash.all.min.js` - DASH.js player library
- `video.js` or `video.min.js` or `videojs` - Video.js player
- `jwplayer` - JW Player
- `clappr` - Clappr player
- `flowplayer` - Flowplayer
- `plyr` - Plyr player
- `shaka-player` or `shaka` - Google Shaka Player
- `mediaelement` - MediaElement.js
- `fluidplayer` - Fluid Player
**Direct Video File Extensions in URLs:**
- `.mp4` - MPEG-4 video
- `.webm` - WebM video
- `.ts` (in context of `.m3u8` references) - MPEG-TS segments
### MEDIUM confidence markers (suggestive but not conclusive)
- `player` (as class, id, or variable name in context of video)
- `stream` (in context of video-related markup)
- `embed` (in context of video players)
### Implementation Strategy
Use a tiered approach:
1. First check for HIGH confidence markers (any single match = valid)
2. Do NOT use MEDIUM confidence markers alone (too many false positives)
3. Direct video content types in HTTP response (`video/mp4`, `application/x-mpegurl`, etc.) are valid without HTML inspection
```go
// Content markers to check (case-insensitive, checked against lowercased body)
var videoMarkers = []string{
	// HTML5 video element
	"<video",
	// HLS markers
	".m3u8",
	"application/x-mpegurl",
	"application/vnd.apple.mpegurl",
	// DASH markers
	".mpd",
	"application/dash+xml",
	// Player libraries
	"hls.js", "hls.min.js",
	"dash.js", "dash.all.min.js",
	"video.js", "video.min.js", "videojs",
	"jwplayer",
	"clappr",
	"flowplayer",
	"plyr",
	"shaka-player",
	"mediaelement",
	"fluidplayer",
}

// Direct video content types (check Content-Type header)
var videoContentTypes = []string{
	"video/",
	"application/x-mpegurl",
	"application/vnd.apple.mpegurl",
	"application/dash+xml",
	"application/mpegurl",
}

func containsVideoMarkers(loweredBody string) bool {
	for _, marker := range videoMarkers {
		if strings.Contains(loweredBody, marker) {
			return true
		}
	}
	return false
}

func isDirectVideoContentType(ct string) bool {
	ct = strings.ToLower(ct)
	for _, vct := range videoContentTypes {
		if strings.Contains(ct, vct) {
			return true
		}
	}
	return false
}
```
## Common Pitfalls
### Pitfall 1: Blocking the Scrape Cycle on Slow/Unresponsive URLs
**What goes wrong:** If every URL in a 50-URL scrape cycle times out at 10s, the validation step takes 500 seconds (8+ minutes). With a 15-minute scrape interval, validation could overlap with the next cycle.
**Why it happens:** Sequential validation with per-URL timeout does not have a total budget for the validation step.
**How to avoid:** The per-URL timeout (SCRP-04) handles individual slowness. Additionally, consider logging total validation time. The mutex in `scraper.go:scrape()` already prevents concurrent scrapes, so overlap is safe (next scrape just waits). With typical scrape volumes (5-20 new URLs per cycle), even worst case (20 * 10s = 200s) is well within the 15-minute interval.
**Warning signs:** Scrape logs showing validation taking longer than half the scrape interval.
### Pitfall 2: False Negatives from JavaScript-Rendered Pages
**What goes wrong:** Many streaming sites load their video player via JavaScript. An HTTP fetch gets the initial HTML which may not contain `<video>` tags or player references -- those are injected by JS after page load.
**Why it happens:** HTTP fetching returns raw HTML; no JavaScript execution.
**How to avoid:** This is an accepted limitation per the requirements doc ("Full browser automation (Puppeteer/Playwright)" is Out of Scope). The marker list includes JavaScript library references (e.g., `hls.js`, `video.js`) which ARE present in the raw HTML even before execution. Most streaming sites include their player library in `<script>` tags in the initial HTML. The marker list is designed to catch these references.
**Warning signs:** Known good stream URLs being discarded. Monitor discard rate in logs.
### Pitfall 3: HTTP vs HTTPS URL Handling
**What goes wrong:** The existing proxy only supports HTTPS URLs (line 68 in `proxy.go`), but scraped URLs may be HTTP. If the validator only accepts HTTPS, valid HTTP stream URLs get discarded.
**Why it happens:** The proxy has a stricter security requirement (client-facing) than the scraper (internal validation).
**How to avoid:** The scraper validator should accept both HTTP and HTTPS URLs for fetching. The proxy's HTTPS restriction is appropriate for its purpose but should not be inherited by the validator.
**Warning signs:** All HTTP URLs being discarded.
### Pitfall 4: Redirect Chains Leading to Non-Stream Content
**What goes wrong:** A URL redirects through several pages (ads, link shorteners) before reaching the actual stream page. The HTTP client follows redirects (Go default: up to 10), but the intermediate pages may not have video markers.
**Why it happens:** Streaming links from Reddit often go through shorteners or ad-redirect chains.
**How to avoid:** Go's `http.Client` follows redirects by default (up to 10). The validation checks the FINAL response after redirects. Set a reasonable redirect limit (3, matching the proxy's limit) to avoid infinite chains. The timeout applies to the entire chain, so slow redirect chains will time out.
**Warning signs:** Timeout errors on URLs that resolve fine in a browser.
### Pitfall 5: Large Response Bodies Causing Memory Pressure
**What goes wrong:** A streaming site returns a huge HTML page (or a direct video file), and the validator reads the entire body into memory.
**Why it happens:** No body size limit on validation fetches.
**How to avoid:** Use `io.LimitReader` (already a pattern in the codebase). 2MB is sufficient for detecting markers in HTML. For direct video content types, the Content-Type header check is sufficient -- no need to read the body.
**Warning signs:** Memory spikes during scrape cycles.
### Pitfall 6: Changing the `scraper.New()` Signature Breaks Existing Call Site
**What goes wrong:** Adding `validateTimeout` parameter to `New()` changes its signature, breaking the call in `main.go`.
**Why it happens:** Go does not have optional parameters.
**How to avoid:** Update both `New()` in `scraper.go` and the call site in `main.go` simultaneously. This is a simple, predictable change. Alternative: use options pattern, but that is overkill for adding one field.
**Warning signs:** Compilation error -- easily caught.
## Code Examples
### Complete validateLinks Implementation
```go
// internal/scraper/validate.go
package scraper

import (
	"io"
	"log"
	"net/http"
	"strings"
	"time"

	"f1-stream/internal/models"
)

// videoMarkers are case-insensitive strings that indicate video/player content.
// Checked against lowercased HTML body.
var videoMarkers = []string{
	"<video",
	".m3u8",
	"application/x-mpegurl",
	"application/vnd.apple.mpegurl",
	".mpd",
	"application/dash+xml",
	"hls.js", "hls.min.js",
	"dash.js", "dash.all.min.js",
	"video.js", "video.min.js", "videojs",
	"jwplayer",
	"clappr",
	"flowplayer",
	"plyr",
	"shaka-player",
	"mediaelement",
	"fluidplayer",
}

// videoContentTypes are Content-Type prefixes that indicate direct video content.
var videoContentTypes = []string{
	"video/",
	"application/x-mpegurl",
	"application/vnd.apple.mpegurl",
	"application/dash+xml",
}

const validateBodyLimit = 2 * 1024 * 1024 // 2MB

// validateLinks filters links to only those whose URLs contain video content markers.
func validateLinks(links []models.ScrapedLink, timeout time.Duration) []models.ScrapedLink {
	client := &http.Client{
		Timeout: timeout,
		CheckRedirect: func(req *http.Request, via []*http.Request) error {
			if len(via) >= 3 {
				return http.ErrUseLastResponse
			}
			return nil
		},
	}
	var valid []models.ScrapedLink
	for _, link := range links {
		if hasVideoContent(client, link.URL) {
			valid = append(valid, link)
			log.Printf("scraper: validated %s (video markers found)", truncate(link.URL, 60))
		} else {
			log.Printf("scraper: discarded %s (no video markers)", truncate(link.URL, 60))
		}
	}
	return valid
}

func hasVideoContent(client *http.Client, rawURL string) bool {
	req, err := http.NewRequest("GET", rawURL, nil)
	if err != nil {
		return false
	}
	req.Header.Set("User-Agent", userAgent)
	resp, err := client.Do(req)
	if err != nil {
		log.Printf("scraper: validate fetch error for %s: %v", truncate(rawURL, 60), err)
		return false
	}
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode >= 400 {
		return false
	}
	ct := strings.ToLower(resp.Header.Get("Content-Type"))
	// Check if response is a direct video content type
	for _, vct := range videoContentTypes {
		if strings.Contains(ct, vct) {
			return true
		}
	}
	// Only inspect HTML responses for markers
	if !strings.Contains(ct, "text/html") && !strings.Contains(ct, "application/xhtml") {
		return false
	}
	body, err := io.ReadAll(io.LimitReader(resp.Body, validateBodyLimit))
	if err != nil {
		return false
	}
	return containsVideoMarkers(strings.ToLower(string(body)))
}

func containsVideoMarkers(loweredBody string) bool {
	for _, marker := range videoMarkers {
		if strings.Contains(loweredBody, marker) {
			return true
		}
	}
	return false
}
```
### Integration in scraper.go
```go
// scraper.go - modified scrape() method
func (s *Scraper) scrape() {
	s.mu.Lock()
	defer s.mu.Unlock()
	start := time.Now()
	log.Println("scraper: starting scrape")
	links, err := scrapeReddit()
	if err != nil {
		log.Printf("scraper: error after %v: %v", time.Since(start).Round(time.Millisecond), err)
		return
	}
	log.Printf("scraper: reddit scrape completed in %v, got %d links", time.Since(start).Round(time.Millisecond), len(links))

	// Validate links - only keep those with video content markers
	if len(links) > 0 {
		validated := validateLinks(links, s.validateTimeout)
		log.Printf("scraper: validated %d/%d links as streams", len(validated), len(links))
		links = validated
	}

	// Rest of existing merge logic unchanged...
}
```
### Configuration in main.go
```go
// main.go additions
validateTimeout := envDuration("SCRAPER_VALIDATE_TIMEOUT", 10*time.Second)
sc := scraper.New(st, scrapeInterval, validateTimeout)
```
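The `envDuration()` helper already exists in `main.go` and is not shown in this document; its shape is presumably close to the following sketch (exact logging and fallback behavior may differ):
```go
// Hypothetical shape of the existing envDuration helper in main.go.
// Assumes the usual imports: os, log, time.
func envDuration(key string, def time.Duration) time.Duration {
	v := os.Getenv(key)
	if v == "" {
		return def
	}
	d, err := time.ParseDuration(v)
	if err != nil {
		log.Printf("config: invalid %s=%q, using default %v", key, v, def)
		return def
	}
	return d
}
```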
### Unit Test for containsVideoMarkers
```go
// internal/scraper/validate_test.go
package scraper
import "testing"
func TestContainsVideoMarkers(t *testing.T) {
	// containsVideoMarkers expects a pre-lowered body; the marker substrings
	// in these fixtures are already lowercase, so no ToLower call is needed.
	tests := []struct {
name string
body string
expected bool
}{
{"video tag", `<div><video src="stream.mp4"></video></div>`, true},
{"hls manifest", `var url = "https://cdn.example.com/live.m3u8";`, true},
{"dash manifest", `<source src="stream.mpd" type="application/dash+xml">`, true},
{"hls.js library", `<script src="/js/hls.min.js"></script>`, true},
{"video.js library", `<script src="https://cdn.example.com/video.js"></script>`, true},
{"jwplayer", `<div id="jwplayer-container"></div><script>jwplayer("jwplayer-container")</script>`, true},
{"no markers", `<html><body><p>Hello world</p></body></html>`, false},
{"reddit link page", `<html><body><a href="https://example.com">Click here</a></body></html>`, false},
{"blog post", `<html><body><article>F1 race results...</article></body></html>`, false},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
result := containsVideoMarkers(tt.body)
if result != tt.expected {
t.Errorf("containsVideoMarkers(%q) = %v, want %v", truncate(tt.body, 40), result, tt.expected)
}
})
}
}
```
## State of the Art
| Old Approach | Current Approach | When Changed | Impact |
|--------------|------------------|--------------|--------|
| Save all URLs from F1 posts | Will validate each URL before saving | Phase 1 (now) | Eliminates junk links at the source |
| No content inspection | String-based marker detection | Phase 1 (now) | Simple, fast, no external dependencies |
| Static marker list | Static marker list (sufficient for now) | - | May need updating as new players emerge; easily extensible |
**Why not use browser automation:**
The REQUIREMENTS.md explicitly marks "Full browser automation (Puppeteer/Playwright)" as Out of Scope. HTTP-based checks with string matching catch the majority of stream pages because player library `<script>` tags are present in raw HTML even before JavaScript execution.
## Open Questions
1. **How many URLs per scrape cycle will need validation?**
- What we know: The subreddit listing fetches 25 posts, filters by F1 keywords. Typical F1-related posts might yield 5-20 unique URLs per cycle after deduplication and domain filtering.
- What's unclear: Real-world distribution. Could be 2 URLs or 50.
- Recommendation: Log validation counts in production. The sequential approach with a 10s timeout per URL handles up to ~90 URLs within the 15-minute scrape interval in the worst case (90 × 10s = 900s = 15min).
2. **Should already-validated URLs be re-validated on subsequent scrape cycles?**
- What we know: The current merge step deduplicates by URL -- existing URLs are not re-processed. Validation runs only on newly discovered URLs.
- What's unclear: Whether a URL that failed validation last cycle should be retried.
- Recommendation: No retry needed for Phase 1. Failed URLs are simply not saved. If the URL appears again in a future scrape, it will be a "new" URL (not yet in scraped.json) and get validated again naturally.
3. **Will the marker list need frequent updates?**
- What we know: Major player libraries (HLS.js, Video.js, JW Player) are well-established and their names are stable.
- What's unclear: Whether niche streaming sites use custom players with no recognizable markers.
- Recommendation: Start with the current list. Monitor discard rate in logs. Add markers if known-good sites are being discarded. The list is a simple string slice, trivially extensible.
## Sources
### Primary (HIGH confidence)
- Codebase analysis: `internal/scraper/scraper.go`, `internal/scraper/reddit.go`, `internal/proxy/proxy.go` - existing patterns for HTTP fetching, URL processing, pipeline structure
- Codebase analysis: `internal/models/models.go` - ScrapedLink type definition
- Codebase analysis: `main.go` - env var pattern, dependency initialization
- Go stdlib docs: `net/http`, `strings`, `io`, `context` packages
- `golang.org/x/net/html` package API (verified via pkg.go.dev) - confirmed `html.Parse`, `Node.Descendants()`, `atom.Video` available. Reserved for Phase 4.
### Secondary (MEDIUM confidence)
- `.planning/REQUIREMENTS.md` - phase requirements, out-of-scope decisions
- `.planning/ROADMAP.md` - phase dependencies (Phase 2 reuses validation logic)
- `.planning/codebase/ARCHITECTURE.md` - data flow, cross-cutting concerns
### Tertiary (LOW confidence)
- Video player library names (hls.js, video.js, jwplayer, etc.) - based on widely known ecosystem knowledge. The specific set of markers may need tuning based on real-world stream sites.
## Metadata
**Confidence breakdown:**
- Standard stack: HIGH - uses only stdlib; all patterns already exist in codebase
- Architecture: HIGH - minimal change to existing pipeline; clear integration point
- Pitfalls: HIGH - pitfalls are well-understood; mitigations use existing codebase patterns
- Video markers: MEDIUM - marker list covers major players but may miss niche sites; easily extensible
**Research date:** 2026-02-17
**Valid until:** 2026-03-17 (stable domain; marker list may need periodic updates)

@@ -1,200 +0,0 @@
---
phase: 01-scraper-validation
verified: 2026-02-17T20:58:00Z
status: passed
score: 4/4 must-haves verified
re_verification: false
---
# Phase 1: Scraper Validation Verification Report
**Phase Goal:** Scraped URLs are verified to contain actual video/player content before being stored, eliminating junk links at the source
**Verified:** 2026-02-17T20:58:00Z
**Status:** PASSED
**Re-verification:** No — initial verification
## Goal Achievement
### Observable Truths
All 4 success criteria from ROADMAP.md verified:
| # | Truth | Status | Evidence |
|---|-------|--------|----------|
| 1 | Scraper still discovers F1-related posts using keyword filtering (existing behavior preserved) | ✓ VERIFIED | `reddit.go:105` calls `isF1Post(post.Title)` before processing posts. Keywords defined in `f1Keywords` slice (lines 29-39). Keyword filtering runs BEFORE validation step. |
| 2 | Each extracted URL is proxy-fetched and inspected for video/player content markers | ✓ VERIFIED | `scraper.go:62` calls `validateLinks(links, s.validateTimeout)` after URL extraction. `validate.go:56-76` implements fetch + marker inspection with 20 video markers (HTML5, HLS, DASH, 10+ player libraries). |
| 3 | URLs without video content markers are discarded and do not appear in scraped.json | ✓ VERIFIED | `validate.go:71-73` logs discarded URLs and excludes them from return slice. Only URLs passing `hasVideoContent()` are kept in `kept` slice. |
| 4 | Validation respects a configurable timeout so slow sites do not block the scrape cycle | ✓ VERIFIED | `main.go:26` reads `SCRAPER_VALIDATE_TIMEOUT` env var (default 10s). `validate.go:58` creates HTTP client with timeout. Passed through `scraper.New()` at `main.go:56`. |
**Score:** 4/4 truths verified
### Required Artifacts
All 4 artifacts from PLAN must_haves verified at all 3 levels (exists, substantive, wired):
| Artifact | Expected | Exists | Substantive | Wired | Status |
|----------|----------|--------|-------------|-------|--------|
| `internal/scraper/validate.go` | URL validation logic with video marker detection | ✓ | ✓ 142 lines, 20 video markers, 4 functions | ✓ | ✓ VERIFIED |
| `internal/scraper/validate_test.go` | Unit tests for marker detection and content type checks | ✓ | ✓ 124 lines, 25 table-driven cases covering positive/negative scenarios | ✓ | ✓ VERIFIED |
| `internal/scraper/scraper.go` | validateTimeout field and validation call in scrape() | ✓ | ✓ validateTimeout field on line 16, validateLinks call on line 62 | ✓ | ✓ VERIFIED |
| `main.go` | SCRAPER_VALIDATE_TIMEOUT env var configuration | ✓ | ✓ Line 26 reads env var, line 56 passes to scraper.New() | ✓ | ✓ VERIFIED |
**Artifact Details:**
**validate.go (142 lines):**
- Contains `validateLinks` function (lines 56-76)
- Contains `hasVideoContent` function (lines 81-119)
- Contains `containsVideoMarkers` function (lines 123-130)
- Contains `isDirectVideoContentType` function (lines 134-142)
- Defines 20 video markers (lines 15-40): HTML5 `<video`, HLS (.m3u8, application/x-mpegurl), DASH (.mpd, application/dash+xml), 15 player libraries (hls.js, video.js, jwplayer, clappr, flowplayer, plyr, shaka-player, mediaelement, fluidplayer, etc.)
- Defines 4 video content types (lines 44-49)
- Sets 2MB body limit (line 52)
- 3-redirect limit (line 60)
**validate_test.go (124 lines):**
- `TestContainsVideoMarkers`: 10 positive cases (video tag, HLS manifest, DASH manifest, HLS.js, Video.js, JW Player, Clappr, Flowplayer, Plyr, Shaka Player) + 4 negative cases (plain HTML, Reddit link page, blog post, empty string)
- `TestIsDirectVideoContentType`: 6 positive cases (video/mp4, video/webm, HLS content types, DASH, video with params) + 5 negative cases (text/html, application/json, image/png, text/plain, empty)
- Total: 25 table-driven cases (one assertion point each) across the 2 test functions
**scraper.go:**
- Line 16: `validateTimeout time.Duration` field added to Scraper struct
- Line 20-21: `New()` function updated to accept validateTimeout parameter
- Lines 60-65: Validation step inserted between URL extraction and merge logic
**main.go:**
- Line 26: `validateTimeout := envDuration("SCRAPER_VALIDATE_TIMEOUT", 10*time.Second)`
- Line 56: `sc := scraper.New(st, scrapeInterval, validateTimeout)`
### Key Link Verification
All 2 key links from PLAN must_haves verified:
| From | To | Via | Status | Details |
|------|----|----|--------|---------|
| `internal/scraper/scraper.go` | `internal/scraper/validate.go` | validateLinks call in scrape() | ✓ WIRED | `scraper.go:62` calls `validateLinks(links, s.validateTimeout)` between URL extraction (`scrapeReddit()` return at line 58) and store merge (`LoadScrapedLinks()` at line 68) |
| `main.go` | `internal/scraper/scraper.go` | scraper.New() with validateTimeout | ✓ WIRED | `main.go:56` calls `scraper.New(st, scrapeInterval, validateTimeout)` where validateTimeout is read from env on line 26 |
**Wiring Verification:**
**Link 1: scraper.go → validate.go**
- Pattern match: `validateLinks\(links` found at `scraper.go:62`
- Context: Call occurs after `scrapeReddit()` (line 53) and before `LoadScrapedLinks()` (line 68)
- Data flow: `links` variable from `scrapeReddit()` → filtered by `validateLinks()` → assigned back to `links` → merged with existing
**Link 2: main.go → scraper.go**
- Pattern match: `scraper\.New\(st.*validateTimeout` found at `main.go:56`
- Context: `validateTimeout` variable read from env on line 26, passed as 3rd parameter to `scraper.New()`
- Parameter flow: `envDuration("SCRAPER_VALIDATE_TIMEOUT", 10*time.Second)``validateTimeout` variable → `scraper.New()` parameter → `Scraper.validateTimeout` field → `validateLinks()` call
### Requirements Coverage
All 4 requirements from PLAN frontmatter verified against REQUIREMENTS.md:
| Requirement | Description | Status | Evidence |
|-------------|-------------|--------|----------|
| **SCRP-01** | Scraper filters Reddit posts by F1 keywords before extracting URLs (existing behavior, preserve) | ✓ SATISFIED | `reddit.go:105` calls `isF1Post(post.Title)` before processing. Keywords defined in `f1Keywords` (lines 29-39). This runs BEFORE validation, preserving existing behavior. |
| **SCRP-02** | Scraper validates each extracted URL by proxy-fetching it and checking for video/player content markers | ✓ SATISFIED | `validate.go:56-76` implements `validateLinks()` which calls `hasVideoContent()` for each URL. `hasVideoContent()` (lines 81-119) performs HTTP GET and checks for video markers. |
| **SCRP-03** | URLs that don't look like streams are discarded before saving | ✓ SATISFIED | `validate.go:71-73` logs and excludes URLs where `hasVideoContent()` returns false. Only kept URLs are returned and merged into store. |
| **SCRP-04** | Validation has a configurable timeout (default 10s) to avoid blocking on slow sites | ✓ SATISFIED | `main.go:26` reads `SCRAPER_VALIDATE_TIMEOUT` with default 10s. `validate.go:58` creates HTTP client with timeout. 3-redirect limit also prevents timeout from slow redirect chains. |
**No orphaned requirements:** All 4 requirements mapped to Phase 1 in REQUIREMENTS.md are accounted for in the PLAN and satisfied by the implementation.
### Anti-Patterns Found
No anti-patterns detected:
| Category | Checked | Found |
|----------|---------|-------|
| TODO/FIXME/PLACEHOLDER comments | ✓ | 0 |
| Placeholder strings | ✓ | 0 |
| Empty implementations (return null/empty) | ✓ | 0 |
| Console-only implementations | ✓ | 0 |
**Files checked:**
- `internal/scraper/validate.go` (142 lines)
- `internal/scraper/validate_test.go` (124 lines)
- `internal/scraper/scraper.go` (modified section lines 16, 20-21, 60-65)
- `main.go` (modified lines 26, 56)
All modified code is production-ready with proper error handling, logging, and no stub patterns.
### Human Verification Required
None. All success criteria are programmatically verifiable through code inspection:
- Keyword filtering behavior: Verified by checking `isF1Post()` call placement
- URL validation with HTTP fetch: Verified by code inspection of `validateLinks()` and `hasVideoContent()`
- Discard behavior: Verified by inspecting return logic in `validateLinks()`
- Timeout configuration: Verified by tracing env var read → parameter passing → HTTP client creation
**Note:** While the *functionality* of video marker detection would ideally be tested against real stream URLs, the *implementation* of the requirement (that validation logic exists, is wired correctly, and has appropriate markers) is fully verified.
## Verification Summary
### All Must-Haves VERIFIED
**Truths (4/4):**
1. ✓ F1 keyword filtering preserved (SCRP-01)
2. ✓ URLs proxy-fetched and inspected for video markers (SCRP-02)
3. ✓ Non-stream URLs discarded (SCRP-03)
4. ✓ Configurable timeout prevents blocking (SCRP-04)
**Artifacts (4/4):**
1. ✓ `validate.go` exists, substantive (142 lines, 20 markers, 4 functions), wired
2. ✓ `validate_test.go` exists, substantive (124 lines, 25 table-driven cases), wired
3. ✓ `scraper.go` exists, substantive (validateTimeout field + call), wired
4. ✓ `main.go` exists, substantive (env var read + pass to New()), wired
**Key Links (2/2):**
1. ✓ scraper.go → validate.go via validateLinks call
2. ✓ main.go → scraper.go via scraper.New() with validateTimeout
**Requirements (4/4):**
1. ✓ SCRP-01 satisfied
2. ✓ SCRP-02 satisfied
3. ✓ SCRP-03 satisfied
4. ✓ SCRP-04 satisfied
**Anti-Patterns:** None found
**Human Verification:** Not required
### Implementation Quality
**Strengths:**
- 20 comprehensive video markers covering HTML5, HLS, DASH, and 15 player libraries
- Proper error handling throughout (HTTP errors, read errors, invalid URLs)
- Conservative resource limits (2MB body read limit, 3 redirect limit)
- Comprehensive unit test coverage (25 table-driven cases across 2 test functions)
- Clean integration preserving existing F1 keyword filtering (SCRP-01)
- Follows existing codebase patterns (envDuration config, logging style, truncate utility)
**Commits:**
All 3 tasks committed atomically:
1. `adeb478` - Create validate.go with video marker detection
2. `22d29db` - Wire validation into scraper pipeline and add config
3. `6c5cc02` - Add unit tests for validation functions
**No deviations from plan:** Implementation matches PLAN tasks exactly.
## Conclusion
**Phase 1 goal ACHIEVED.**
All success criteria from ROADMAP.md are satisfied:
1. ✓ Keyword filtering preserved
2. ✓ URLs validated with video marker detection
3. ✓ Non-stream URLs discarded
4. ✓ Configurable timeout prevents blocking
All artifacts exist, are substantive, and are wired correctly. All key links verified. All requirements satisfied. No anti-patterns found. No gaps requiring remediation.
**Ready to proceed to Phase 2.**
---
*Verified: 2026-02-17T20:58:00Z*
*Verifier: Claude (gsd-verifier)*

@@ -1,225 +0,0 @@
---
phase: 02-health-check-infrastructure
plan: 01
type: execute
wave: 1
depends_on: []
files_modified:
- internal/models/models.go
- internal/store/store.go
- internal/store/health.go
- internal/scraper/validate.go
- internal/healthcheck/healthcheck.go
autonomous: true
requirements: [HLTH-01, HLTH-02, HLTH-03, HLTH-05, HLTH-06, HLTH-08]
must_haves:
truths:
- "HealthState model exists with URL, ConsecutiveFailures, LastCheckTime, and Healthy fields"
- "Health states can be loaded from and saved to health_state.json via store methods"
- "hasVideoContent is exported as HasVideoContent and still works for the scraper"
- "HealthChecker service checks all known URLs sequentially with a two-step check (reachability + content)"
- "A stream is marked unhealthy after 5 consecutive failures"
- "A previously unhealthy stream recovers when a check passes (failure count resets to 0, Healthy set to true)"
- "Orphaned health state entries are pruned during each check cycle"
artifacts:
- path: "internal/models/models.go"
provides: "HealthState struct"
contains: "HealthState"
- path: "internal/store/health.go"
provides: "LoadHealthStates, SaveHealthStates, HealthMap methods"
exports: ["LoadHealthStates", "SaveHealthStates", "HealthMap"]
- path: "internal/store/store.go"
provides: "healthMu field on Store struct"
contains: "healthMu"
- path: "internal/scraper/validate.go"
provides: "Exported HasVideoContent function"
exports: ["HasVideoContent"]
- path: "internal/healthcheck/healthcheck.go"
provides: "HealthChecker service with Run, checkAll, checkOne, collectURLs"
exports: ["New", "HealthChecker"]
key_links:
- from: "internal/healthcheck/healthcheck.go"
to: "internal/scraper/validate.go"
via: "scraper.HasVideoContent(client, url)"
pattern: "scraper\\.HasVideoContent"
- from: "internal/healthcheck/healthcheck.go"
to: "internal/store/health.go"
via: "store.LoadHealthStates() and store.SaveHealthStates()"
pattern: "store\\.(Load|Save)HealthStates"
- from: "internal/healthcheck/healthcheck.go"
to: "internal/store/streams.go"
via: "store.LoadStreams() to collect URLs"
pattern: "store\\.LoadStreams"
- from: "internal/healthcheck/healthcheck.go"
to: "internal/store/scraped.go"
via: "store.LoadScrapedLinks() to collect URLs"
pattern: "store\\.LoadScrapedLinks"
---
<objective>
Create the health check foundation: HealthState model, store persistence methods, and the HealthChecker background service that monitors all known stream URLs.
Purpose: Enable continuous health monitoring of all streams with persisted state, reusing Phase 1's video content detection.
Output: Working HealthChecker service with model, persistence, and check logic ready for wiring in main.go.
</objective>
<execution_context>
@/Users/viktorbarzin/.claude/get-shit-done/workflows/execute-plan.md
@/Users/viktorbarzin/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/02-health-check-infrastructure/02-RESEARCH.md
@.planning/phases/01-scraper-validation/01-01-SUMMARY.md
@internal/models/models.go
@internal/store/store.go
@internal/store/streams.go
@internal/store/scraped.go
@internal/scraper/validate.go
@internal/scraper/scraper.go
</context>
<tasks>
<task type="auto">
<name>Task 1: Add HealthState model and store persistence layer</name>
<files>
internal/models/models.go
internal/store/store.go
internal/store/health.go
</files>
<action>
1. In `internal/models/models.go`, add the HealthState struct after the Session struct:
```go
type HealthState struct {
URL string `json:"url"`
ConsecutiveFailures int `json:"consecutive_failures"`
LastCheckTime time.Time `json:"last_check_time"`
Healthy bool `json:"healthy"`
}
```
The `time` import already exists in models.go.
2. In `internal/store/store.go`, add `healthMu sync.RWMutex` field to the Store struct (after `sessionsMu`). No other changes to store.go.
3. Create `internal/store/health.go` with three methods:
- `LoadHealthStates() ([]models.HealthState, error)` -- acquires `healthMu.RLock`, reads `health_state.json` via `readJSON`, returns slice.
- `SaveHealthStates(states []models.HealthState) error` -- acquires `healthMu.Lock`, writes via `writeJSON`.
- `HealthMap() map[string]bool` -- reads `health_state.json` directly via `readJSON` WITHOUT acquiring `healthMu` (to avoid deadlock when called from PublicStreams/GetActiveScrapedLinks which hold other locks). Returns map of URL -> Healthy. Ignores read errors (returns empty map). URLs not in the map are implicitly healthy.
Follow the exact patterns in `internal/store/streams.go` and `internal/store/scraped.go` for lock acquisition and `readJSON`/`writeJSON` usage.
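A minimal sketch of the lock-free `HealthMap` described in item 3, assuming the store's existing `readJSON` and `filePath` helpers:
```go
// HealthMap returns URL -> Healthy. It deliberately skips healthMu (see the
// deadlock note above) and ignores read errors: an empty map means every
// URL is assumed healthy.
func (s *Store) HealthMap() map[string]bool {
	var states []models.HealthState
	_ = readJSON(s.filePath("health_state.json"), &states)
	m := make(map[string]bool, len(states))
	for _, st := range states {
		m[st.URL] = st.Healthy
	}
	return m
}
```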
</action>
<verify>
Run `cd /Users/viktorbarzin/code/infra/modules/kubernetes/f1-stream/files && go build ./...` -- must compile without errors.
Verify the new HealthState type is recognized: `go vet ./internal/models/... ./internal/store/...`
</verify>
<done>
HealthState model defined in models.go. Store has healthMu field. health.go has LoadHealthStates, SaveHealthStates, and HealthMap methods. Code compiles.
</done>
</task>
<task type="auto">
<name>Task 2: Export HasVideoContent and create HealthChecker service</name>
<files>
internal/scraper/validate.go
internal/healthcheck/healthcheck.go
</files>
<action>
1. In `internal/scraper/validate.go`:
- Rename `hasVideoContent` to `HasVideoContent` (capitalize first letter only).
- Update the call site in `validateLinks()` in the same file: `hasVideoContent(client, link.URL)` becomes `HasVideoContent(client, link.URL)`.
- Do NOT change the function signature or behavior. Only capitalize the name.
2. Create `internal/healthcheck/healthcheck.go` with the HealthChecker service. Follow the research code examples closely:
Package: `healthcheck`
Imports: `context`, `log`, `net/http`, `sync`, `time`, `f1-stream/internal/models`, `f1-stream/internal/scraper`, `f1-stream/internal/store`
Constants: `const unhealthyThreshold = 5`
Struct:
```go
type HealthChecker struct {
store *store.Store
interval time.Duration
timeout time.Duration
client *http.Client
mu sync.Mutex
}
```
Constructor `New(s *store.Store, interval, timeout time.Duration) *HealthChecker`:
- Creates `http.Client` with `Timeout: timeout` and `CheckRedirect` limiting to 3 redirects (same as scraper pattern).
- Sets `User-Agent` header in requests (see checkOne).
`Run(ctx context.Context)`:
- Log startup message with interval and timeout values.
- Call `checkAll()` immediately.
- Create `time.NewTicker(interval)`.
- Select loop on `ctx.Done()` (log shutdown, return) and `ticker.C` (call `checkAll()`).
`checkAll()`:
- Acquire `hc.mu.Lock()` to prevent concurrent check runs.
- Record start time.
- Call `collectURLs()` to get all known stream URLs.
- Log URL count.
- Load existing health states via `hc.store.LoadHealthStates()`. Build `map[string]*models.HealthState` keyed by URL.
- For each URL, call `scraper.HasVideoContent(hc.client, url)`.
- NOTE: HasVideoContent already does GET, checks status code (reachability), and checks content markers. This covers HLTH-02 and HLTH-03 in a single request per the research design decision.
- On success: set `ConsecutiveFailures = 0`, `Healthy = true`. If was previously unhealthy, log recovery.
- On failure: increment `ConsecutiveFailures`. If `>= unhealthyThreshold` and currently `Healthy`, set `Healthy = false`, log the event.
- Set `LastCheckTime = now` for all checked URLs.
- Prune orphaned entries: only keep states whose URL is in the current URL set (prevents unbounded file growth).
- Save via `hc.store.SaveHealthStates(finalStates)`.
- Log summary: duration, checked count, healthy count, recovered count, newly unhealthy count.
`collectURLs() []string`:
- Call `hc.store.LoadStreams()` -- iterate all streams, add URLs to a `map[string]bool` for deduplication.
- Call `hc.store.LoadScrapedLinks()` -- iterate all scraped links, add URLs.
- Log errors from load calls but continue (partial check is better than no check).
- Return deduplicated URL slice.
Helper `truncate(s string, n int) string` -- same pattern as scraper's truncate for log messages.
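Putting the `checkAll()` steps above together, a minimal sketch (assumes the struct, constant, and imports defined earlier in this task; log wording and counter names are illustrative):
```go
func (hc *HealthChecker) checkAll() {
	hc.mu.Lock() // prevent concurrent check runs
	defer hc.mu.Unlock()

	start := time.Now()
	urls := hc.collectURLs()
	log.Printf("healthcheck: checking %d URLs", len(urls))

	existing, err := hc.store.LoadHealthStates()
	if err != nil {
		log.Printf("healthcheck: load states error: %v", err)
	}
	byURL := make(map[string]*models.HealthState, len(existing))
	for i := range existing {
		byURL[existing[i].URL] = &existing[i]
	}

	now := time.Now()
	var healthy, recovered, newlyUnhealthy int
	final := make([]models.HealthState, 0, len(urls))
	for _, u := range urls {
		st, ok := byURL[u]
		if !ok {
			st = &models.HealthState{URL: u, Healthy: true}
		}
		if scraper.HasVideoContent(hc.client, u) {
			if !st.Healthy {
				recovered++
				log.Printf("healthcheck: recovered %s", truncate(u, 60))
			}
			st.ConsecutiveFailures = 0
			st.Healthy = true
		} else {
			st.ConsecutiveFailures++
			if st.ConsecutiveFailures >= unhealthyThreshold && st.Healthy {
				st.Healthy = false
				newlyUnhealthy++
				log.Printf("healthcheck: %s unhealthy after %d failures", truncate(u, 60), st.ConsecutiveFailures)
			}
		}
		st.LastCheckTime = now
		if st.Healthy {
			healthy++
		}
		// Appending only current URLs prunes orphaned entries.
		final = append(final, *st)
	}

	if err := hc.store.SaveHealthStates(final); err != nil {
		log.Printf("healthcheck: save states error: %v", err)
	}
	log.Printf("healthcheck: done in %v: checked=%d healthy=%d recovered=%d newly-unhealthy=%d",
		time.Since(start).Round(time.Millisecond), len(urls), healthy, recovered, newlyUnhealthy)
}
```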
</action>
<verify>
Run `cd /Users/viktorbarzin/code/infra/modules/kubernetes/f1-stream/files && go build ./...` -- must compile without errors.
Run `go vet ./internal/scraper/... ./internal/healthcheck/...` -- no issues.
Verify HasVideoContent export doesn't break scraper: `go build ./internal/scraper/...`
</verify>
<done>
HasVideoContent exported in validate.go. HealthChecker service created with Run, checkAll, checkOne, collectURLs. Compiles cleanly. Scraper still builds.
</done>
</task>
</tasks>
<verification>
```bash
cd /Users/viktorbarzin/code/infra/modules/kubernetes/f1-stream/files
# Full build
go build ./...
# Vet all modified packages
go vet ./internal/models/... ./internal/store/... ./internal/scraper/... ./internal/healthcheck/...
# Run existing tests to verify no regressions
go test ./internal/scraper/... -v
```
</verification>
<success_criteria>
- HealthState model exists with all 4 required fields (URL, ConsecutiveFailures, LastCheckTime, Healthy)
- Store has healthMu and 3 health methods (LoadHealthStates, SaveHealthStates, HealthMap)
- HasVideoContent is exported and scraper still builds
- HealthChecker has Run/checkAll/collectURLs implementing the full check cycle
- All existing tests pass
- `go build ./...` succeeds
</success_criteria>
<output>
After completion, create `.planning/phases/02-health-check-infrastructure/02-01-SUMMARY.md`
</output>

@@ -1,106 +0,0 @@
---
phase: 02-health-check-infrastructure
plan: 01
subsystem: healthcheck
tags: [health-monitoring, http-client, video-detection, json-persistence]
# Dependency graph
requires:
- phase: 01-scraper-validation
provides: "HasVideoContent video detection, Store persistence patterns"
provides:
- "HealthState model for tracking stream health"
- "Store methods: LoadHealthStates, SaveHealthStates, HealthMap"
- "HealthChecker service with Run loop, failure counting, recovery detection"
- "Exported HasVideoContent for cross-package use"
affects: [02-health-check-infrastructure, 03-api-integration]
# Tech tracking
tech-stack:
added: []
patterns: ["health check loop with configurable interval/timeout", "consecutive failure threshold for flap prevention", "orphan pruning to prevent unbounded state growth"]
key-files:
created:
- "internal/store/health.go"
- "internal/healthcheck/healthcheck.go"
modified:
- "internal/models/models.go"
- "internal/store/store.go"
- "internal/scraper/validate.go"
key-decisions:
- "HealthMap reads file without lock to avoid deadlock from cross-lock scenarios"
- "Single HasVideoContent call per URL covers both reachability and content checks"
- "Orphaned health state entries pruned each cycle to prevent unbounded file growth"
patterns-established:
- "Health state persistence: JSON file with RWMutex protection matching store patterns"
- "Background service: constructor + Run(ctx) with ticker loop matching scraper pattern"
requirements-completed: [HLTH-01, HLTH-02, HLTH-03, HLTH-05, HLTH-06, HLTH-08]
# Metrics
duration: 2min
completed: 2026-02-17
---
# Phase 02 Plan 01: Health Check Infrastructure Summary
**HealthState model with JSON persistence, exported HasVideoContent, and HealthChecker background service with 5-failure threshold and recovery detection**
## Performance
- **Duration:** 2 min
- **Started:** 2026-02-17T21:17:03Z
- **Completed:** 2026-02-17T21:19:32Z
- **Tasks:** 2
- **Files modified:** 5
## Accomplishments
- HealthState model with URL, ConsecutiveFailures, LastCheckTime, Healthy fields
- Store persistence layer with LoadHealthStates, SaveHealthStates, and lock-free HealthMap
- Exported HasVideoContent from scraper package for cross-package reuse
- HealthChecker service with configurable interval/timeout, failure counting (threshold=5), recovery detection, and orphan pruning
## Task Commits
Each task was committed atomically:
1. **Task 1: Add HealthState model and store persistence layer** - `c53b557` (feat)
2. **Task 2: Export HasVideoContent and create HealthChecker service** - `e719efe` (feat)
## Files Created/Modified
- `internal/models/models.go` - Added HealthState struct with 4 fields
- `internal/store/store.go` - Added healthMu sync.RWMutex field to Store struct
- `internal/store/health.go` - LoadHealthStates, SaveHealthStates, HealthMap methods
- `internal/scraper/validate.go` - Exported hasVideoContent as HasVideoContent
- `internal/healthcheck/healthcheck.go` - HealthChecker service with Run, checkAll, collectURLs
## Decisions Made
- HealthMap reads health_state.json without acquiring healthMu to avoid deadlock when called from methods holding other locks (streamsMu, scrapedMu)
- Single HasVideoContent call per URL covers both reachability (HTTP status check) and content validation (video marker detection), matching the research design decision
- Orphaned health state entries are pruned each cycle to prevent unbounded JSON file growth
## Deviations from Plan
None - plan executed exactly as written.
## Issues Encountered
None
## User Setup Required
None - no external service configuration required.
## Next Phase Readiness
- HealthChecker service is ready to be wired into main.go (plan 02-02)
- HealthMap is ready for use by API handlers to filter unhealthy streams
- All existing scraper tests pass with the HasVideoContent export
---
*Phase: 02-health-check-infrastructure*
*Completed: 2026-02-17*
## Self-Check: PASSED
All 6 files verified present. Both task commits (c53b557, e719efe) verified in git log.

@@ -1,197 +0,0 @@
---
phase: 02-health-check-infrastructure
plan: 02
type: execute
wave: 2
depends_on: ["02-01"]
files_modified:
- main.go
- internal/store/streams.go
- internal/store/scraped.go
autonomous: true
requirements: [HLTH-01, HLTH-04, HLTH-07, HLTH-09]
must_haves:
truths:
- "Health checker starts as a background goroutine in main.go alongside the scraper"
- "Health check interval is configurable via HEALTH_CHECK_INTERVAL env var (default 5m)"
- "Health check timeout is configurable via HEALTH_CHECK_TIMEOUT env var (default 10s)"
- "PublicStreams() filters out streams whose URL is marked unhealthy in health_state.json"
- "GetActiveScrapedLinks() filters out scraped links whose URL is marked unhealthy in health_state.json"
- "Streams with no health state entry (new/unchecked) are assumed healthy and still appear"
artifacts:
- path: "main.go"
provides: "Health checker initialization and goroutine startup"
contains: "healthcheck.New"
- path: "internal/store/streams.go"
provides: "Health-filtered PublicStreams method"
contains: "HealthMap"
- path: "internal/store/scraped.go"
provides: "Health-filtered GetActiveScrapedLinks method"
contains: "HealthMap"
key_links:
- from: "main.go"
to: "internal/healthcheck/healthcheck.go"
via: "healthcheck.New(st, healthInterval, healthTimeout) and go hc.Run(ctx)"
pattern: "healthcheck\\.New"
- from: "internal/store/streams.go"
to: "internal/store/health.go"
via: "s.HealthMap() for filtering in PublicStreams"
pattern: "s\\.HealthMap\\(\\)"
- from: "internal/store/scraped.go"
to: "internal/store/health.go"
via: "s.HealthMap() for filtering in GetActiveScrapedLinks"
pattern: "s\\.HealthMap\\(\\)"
---
<objective>
Wire the HealthChecker into the application lifecycle and add health-based filtering to the public API endpoints.
Purpose: Complete the health check infrastructure by starting the service on boot and hiding unhealthy streams from users.
Output: Fully integrated health checking with env var configuration and public API filtering.
</objective>
<execution_context>
@/Users/viktorbarzin/.claude/get-shit-done/workflows/execute-plan.md
@/Users/viktorbarzin/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/02-health-check-infrastructure/02-RESEARCH.md
@.planning/phases/02-health-check-infrastructure/02-01-SUMMARY.md
@main.go
@internal/store/streams.go
@internal/store/scraped.go
@internal/store/health.go
@internal/healthcheck/healthcheck.go
</context>
<tasks>
<task type="auto">
<name>Task 1: Wire health checker in main.go with env var configuration</name>
<files>main.go</files>
<action>
1. Add import for `"f1-stream/internal/healthcheck"` to the import block in main.go (in the local package group, after the scraper import).
2. After the scraper initialization (`sc := scraper.New(...)`) and before the server initialization (`srv := server.New(...)`), add:
```go
healthInterval := envDuration("HEALTH_CHECK_INTERVAL", 5*time.Minute)
healthTimeout := envDuration("HEALTH_CHECK_TIMEOUT", 10*time.Second)
hc := healthcheck.New(st, healthInterval, healthTimeout)
```
This uses the existing `envDuration()` helper already in main.go, following the exact same pattern as `scrapeInterval` and `validateTimeout`.
3. After the scraper goroutine start (`go sc.Run(ctx)`), add:
```go
go hc.Run(ctx)
```
This starts the health checker in its own goroutine, receiving the same cancellable context for graceful shutdown.
Note: The `hc` variable is only used for Run -- it does not need to be passed to the server. The health checker communicates exclusively through the store (reading/writing health_state.json). The server reads health state via `store.HealthMap()` in the filtering methods.
</action>
<verify>
Run `cd /Users/viktorbarzin/code/infra/modules/kubernetes/f1-stream/files && go build ./...` -- must compile.
Verify env var defaults: grep for HEALTH_CHECK_INTERVAL and HEALTH_CHECK_TIMEOUT in main.go.
</verify>
<done>
Health checker initialized with configurable interval (HEALTH_CHECK_INTERVAL, default 5m) and timeout (HEALTH_CHECK_TIMEOUT, default 10s). Starts as background goroutine with graceful shutdown via context.
</done>
</task>
<task type="auto">
<name>Task 2: Filter unhealthy streams from PublicStreams and GetActiveScrapedLinks</name>
<files>
internal/store/streams.go
internal/store/scraped.go
</files>
<action>
1. Modify `PublicStreams()` in `internal/store/streams.go`:
- After loading and iterating streams, add health filtering.
- Call `s.HealthMap()` to get the URL -> healthy map.
- In the filter loop, change from only checking `st.Published` to also checking health:
```go
healthMap := s.HealthMap()
var pub []models.Stream
for _, st := range streams {
if !st.Published {
continue
}
// Filter unhealthy streams. URLs not in healthMap are assumed healthy (new/unchecked).
if healthy, exists := healthMap[st.URL]; exists && !healthy {
continue
}
pub = append(pub, st)
}
```
- CRITICAL: `HealthMap()` reads health_state.json directly without acquiring healthMu, so calling it while holding streamsMu.RLock() is safe (no deadlock risk). This was a deliberate design decision from the research.
- CRITICAL: URLs NOT in the health map must be treated as healthy. The check `exists && !healthy` ensures this -- if `exists` is false, the stream is kept. This prevents newly submitted or scraped streams from disappearing before their first health check.
2. Modify `GetActiveScrapedLinks()` in `internal/store/scraped.go`:
- After the existing staleness filter, add health filtering.
- Call `s.HealthMap()` to get the URL -> healthy map.
- Add health check to the filter loop:
```go
healthMap := s.HealthMap()
var active []models.ScrapedLink
for _, l := range links {
l.Stale = now.Sub(l.ScrapedAt) > 24*time.Hour
if l.Stale {
continue
}
// Filter unhealthy scraped links. URLs not in healthMap are assumed healthy.
if healthy, exists := healthMap[l.URL]; exists && !healthy {
continue
}
active = append(active, l)
}
```
- Same deadlock-safe and assume-healthy-by-default pattern as PublicStreams.
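If a regression guard for this rule is wanted, a hypothetical test could inline the filter logic (test name and fixtures invented for illustration):
```go
// Hypothetical test of the assume-healthy-by-default rule.
func TestHealthFilterAssumesHealthyByDefault(t *testing.T) {
	healthMap := map[string]bool{
		"https://bad.example/stream":  false,
		"https://good.example/stream": true,
	}
	urls := []string{
		"https://bad.example/stream",  // unhealthy: dropped
		"https://good.example/stream", // healthy: kept
		"https://new.example/stream",  // unchecked: kept
	}
	var kept []string
	for _, u := range urls {
		if healthy, exists := healthMap[u]; exists && !healthy {
			continue
		}
		kept = append(kept, u)
	}
	if len(kept) != 2 {
		t.Fatalf("kept %d URLs, want 2: %v", len(kept), kept)
	}
}
```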
</action>
<verify>
Run `cd /Users/viktorbarzin/code/infra/modules/kubernetes/f1-stream/files && go build ./...` -- must compile.
Run `go vet ./internal/store/...` -- no issues.
Run `go test ./... -v 2>&1 | tail -20` -- all existing tests pass.
</verify>
<done>
PublicStreams() filters out unhealthy streams. GetActiveScrapedLinks() filters out unhealthy scraped links. URLs without health state entries are assumed healthy. No deadlock risk from lock ordering.
</done>
</task>
</tasks>
<verification>
```bash
cd /Users/viktorbarzin/code/infra/modules/kubernetes/f1-stream/files
# Full build
go build ./...
# Vet everything
go vet ./...
# Run all tests
go test ./... -v
# Verify the health checker is wired
grep -n "healthcheck" main.go
# Verify health filtering is in place
grep -n "HealthMap" internal/store/streams.go internal/store/scraped.go
```
</verification>
<success_criteria>
- main.go initializes HealthChecker with HEALTH_CHECK_INTERVAL (default 5m) and HEALTH_CHECK_TIMEOUT (default 10s)
- Health checker runs as a goroutine with the shared context
- PublicStreams() excludes URLs marked unhealthy in health_state.json
- GetActiveScrapedLinks() excludes URLs marked unhealthy in health_state.json
- New/unchecked URLs (no health state entry) are treated as healthy
- All existing tests pass
- `go build ./...` succeeds
</success_criteria>
<output>
After completion, create `.planning/phases/02-health-check-infrastructure/02-02-SUMMARY.md`
</output>

@@ -1,101 +0,0 @@
---
phase: 02-health-check-infrastructure
plan: 02
subsystem: healthcheck
tags: [health-monitoring, api-filtering, lifecycle-management, env-configuration]
# Dependency graph
requires:
- phase: 02-health-check-infrastructure
plan: 01
provides: "HealthChecker service, HealthMap method, HealthState model"
provides:
- "Health checker wired into application lifecycle with env var configuration"
- "PublicStreams filters unhealthy streams from user-facing API"
- "GetActiveScrapedLinks filters unhealthy scraped links from user-facing API"
affects: [03-api-integration]
# Tech tracking
tech-stack:
added: []
patterns: ["env var configuration for health check interval/timeout", "health-based filtering with assume-healthy-by-default for unchecked URLs"]
key-files:
created: []
modified:
- "main.go"
- "internal/store/streams.go"
- "internal/store/scraped.go"
key-decisions:
- "URLs not in health map assumed healthy to prevent new streams disappearing before first check"
- "HealthMap called within streamsMu/scrapedMu read locks safely via lock-free file read"
patterns-established:
- "Health filtering pattern: load healthMap, skip entries where exists && !healthy"
- "Env var configuration: envDuration helper with sensible defaults for operational tuning"
requirements-completed: [HLTH-01, HLTH-04, HLTH-07, HLTH-09]
# Metrics
duration: 2min
completed: 2026-02-17
---
# Phase 02 Plan 02: Health Check Integration Summary
**Health checker wired into main.go with configurable interval/timeout, and PublicStreams/GetActiveScrapedLinks filtering out unhealthy URLs**
## Performance
- **Duration:** 2 min
- **Started:** 2026-02-17T21:21:38Z
- **Completed:** 2026-02-17T21:23:15Z
- **Tasks:** 2
- **Files modified:** 3
## Accomplishments
- Health checker initialized in main.go with HEALTH_CHECK_INTERVAL (default 5m) and HEALTH_CHECK_TIMEOUT (default 10s)
- Health checker runs as background goroutine with graceful shutdown via shared context
- PublicStreams() filters out streams marked unhealthy in health_state.json
- GetActiveScrapedLinks() filters out scraped links marked unhealthy in health_state.json
- Unchecked URLs (no health state entry) are assumed healthy and still appear to users
## Task Commits
Each task was committed atomically:
1. **Task 1: Wire health checker in main.go with env var configuration** - `8ad68d5` (feat)
2. **Task 2: Filter unhealthy streams from PublicStreams and GetActiveScrapedLinks** - `535c56d` (feat)
## Files Created/Modified
- `main.go` - Added healthcheck import, env var config, HealthChecker init, background goroutine start
- `internal/store/streams.go` - PublicStreams now calls HealthMap and filters unhealthy URLs
- `internal/store/scraped.go` - GetActiveScrapedLinks now calls HealthMap and filters unhealthy URLs
## Decisions Made
- URLs not present in the health map are treated as healthy (assume-healthy-by-default) to prevent newly submitted or scraped streams from disappearing before their first health check cycle
- HealthMap() is called within streamsMu.RLock()/scrapedMu.RLock() scopes safely because it reads health_state.json directly without acquiring healthMu (lock-free design from plan 02-01)
## Deviations from Plan
None - plan executed exactly as written.
## Issues Encountered
None
## User Setup Required
None - no external service configuration required. HEALTH_CHECK_INTERVAL and HEALTH_CHECK_TIMEOUT env vars are optional with sensible defaults.
## Next Phase Readiness
- Health check infrastructure is fully operational: checker runs on boot, unhealthy streams hidden from users
- Phase 02 complete -- all health check requirements fulfilled
- Ready for Phase 03 API integration work
---
*Phase: 02-health-check-infrastructure*
*Completed: 2026-02-17*
## Self-Check: PASSED
All 5 key files verified present. Both task commits (8ad68d5, 535c56d) verified in git log.

@@ -1,801 +0,0 @@
# Phase 2: Health Check Infrastructure - Research
**Researched:** 2026-02-17
**Domain:** Background health monitoring service with persisted state in Go
**Confidence:** HIGH
## Summary
Phase 2 adds a background health checker service that continuously monitors all known streams (both user-submitted from `streams.json` and scraped from `scraped_links.json`) on a configurable interval. The service performs a two-step check per stream: first an HTTP HEAD/GET for reachability (2xx status), then a full proxy-fetch to verify video/player content markers using the `hasVideoContent()` function already implemented in Phase 1's `internal/scraper/validate.go`. Health state (consecutive failure count, last check time, healthy/unhealthy flag) is persisted to a new `health_state.json` file using the existing store pattern. Streams with 5+ consecutive failures are hidden from the public API.
The implementation closely mirrors the existing scraper service pattern: a struct with a `Run(ctx)` method started as a goroutine in `main.go`, using `time.Ticker` for interval-based execution. The health checker needs access to the store (to enumerate streams and read/write health state) and to the validation logic from `internal/scraper/validate.go`. The main architectural decision is where to place the health checker -- a new `internal/healthcheck/` package is the cleanest approach, importing the scraper's validation functions.
The public streams endpoint (`GET /api/streams/public`) and the scraped links endpoint (`GET /api/scraped`) need modification to filter out unhealthy streams. This requires the store to cross-reference health state when returning public data.
**Primary recommendation:** Create a new `internal/healthcheck/` package containing the `HealthChecker` service. Add a `HealthState` model to `internal/models/models.go`. Add store methods for health state persistence in a new `internal/store/health.go` file. Modify `PublicStreams()` and `GetActiveScrapedLinks()` to exclude unhealthy entries. Wire everything in `main.go` following the scraper initialization pattern exactly.
<phase_requirements>
## Phase Requirements
| ID | Description | Research Support |
|----|-------------|-----------------|
| HLTH-01 | Background health checker service runs every 5 minutes against all known streams (scraped + user-submitted) | New `internal/healthcheck/` package with `HealthChecker` struct. `Run(ctx)` method with `time.Ticker` (same pattern as `scraper.Run()`). Enumerates all streams via `store.LoadStreams()` and `store.LoadScrapedLinks()`. Default interval 5 minutes. |
| HLTH-02 | Health check performs HTTP reachability check first (does the URL respond with 2xx?) | First step: HTTP HEAD request (or GET with body discard) to the URL. Check `resp.StatusCode >= 200 && resp.StatusCode < 300`. If not reachable, mark as failed without doing content check. Uses same `http.Client` with configurable timeout. |
| HLTH-03 | If HTTP check passes, health checker proxy-fetches the page and checks for video/player content markers | Second step: reuse `hasVideoContent()` from `internal/scraper/validate.go`. This function already performs GET, checks Content-Type, reads body with 2MB limit, and runs `containsVideoMarkers()`. The function must be exported or the health checker must import the scraper package. |
| HLTH-04 | Health check has a configurable timeout per check (default 10s) | `HEALTH_CHECK_TIMEOUT` env var read in `main.go`, passed to `healthcheck.New()`. Used as `http.Client.Timeout`. Default 10 seconds. Follows `envDuration()` pattern already in `main.go`. |
| HLTH-05 | Each stream tracks consecutive failure count, last check time, and healthy/unhealthy status in persisted state | New `HealthState` model with fields: `URL string`, `ConsecutiveFailures int`, `LastCheckTime time.Time`, `Healthy bool`. Stored in `health_state.json` via new store methods. Keyed by URL (not ID) since both streams and scraped links have URLs but different ID schemes. |
| HLTH-06 | Stream marked unhealthy after 5 consecutive health check failures | In health checker's check loop: increment `ConsecutiveFailures` on failure. When count reaches 5, set `Healthy = false`. Constant `unhealthyThreshold = 5` defined in healthcheck package. |
| HLTH-07 | Unhealthy streams hidden from public streams page (`GET /api/streams/public`) | Modify `store.PublicStreams()` to cross-reference health state. Also modify `store.GetActiveScrapedLinks()` similarly. Both methods already filter (by `Published` and by `Stale`) -- add health filter. |
| HLTH-08 | Unhealthy streams continue to be checked -- restored to healthy if they recover (failure count resets) | Health checker always checks ALL streams regardless of health status. On successful check: set `Healthy = true`, reset `ConsecutiveFailures = 0`. This is the default behavior since the checker iterates all known URLs. |
| HLTH-09 | Health check interval configurable via `HEALTH_CHECK_INTERVAL` env var (default 5m) | `HEALTH_CHECK_INTERVAL` env var read in `main.go` via `envDuration()`, passed to `healthcheck.New()`. Used as `time.Ticker` interval. Default `5 * time.Minute`. |
</phase_requirements>
## Standard Stack
### Core
| Library | Version | Purpose | Why Standard |
|---------|---------|---------|--------------|
| `net/http` | stdlib | HTTP client for reachability checks and content fetching | Already used throughout codebase. `http.Client` with `Timeout` field handles per-check timeout (HLTH-04). |
| `time` | stdlib | `time.Ticker` for interval-based health check loop | Already used in scraper's `Run()` method and session cleanup. Proven pattern in this codebase. |
| `context` | stdlib | Graceful shutdown of health checker goroutine | Already used in `main.go` with `signal.NotifyContext()`. Health checker's `Run(ctx)` listens for `ctx.Done()`. |
| `sync` | stdlib | `sync.RWMutex` for health state file access | Already used in every store file (`streamsMu`, `usersMu`, `scrapedMu`, `sessionsMu`). Add `healthMu`. |
| `encoding/json` | stdlib | JSON serialization of health state to file | Already used by `readJSON`/`writeJSON` in `internal/store/store.go`. |
### Supporting
| Library | Version | Purpose | When to Use |
|---------|---------|---------|-------------|
| `log` | stdlib | Structured logging with component prefix | Use `log.Printf("healthcheck: ...")` following scraper's convention `log.Printf("scraper: ...")`. |
| `strings` | stdlib | URL normalization for health state key lookup | Already used in scraper for `normalizeURL()`. |
### Alternatives Considered
| Instead of | Could Use | Tradeoff |
|------------|-----------|----------|
| File-based health state (`health_state.json`) | In-memory map only | In-memory is simpler but violates HLTH-05 requirement for persistence across restarts. File-based follows existing store pattern. |
| New `internal/healthcheck/` package | Health check methods on existing `Scraper` struct | Separate package is cleaner: different concern (monitoring vs discovery), different lifecycle, different interval. Avoids coupling health check config/state to scraper. |
| HEAD request for reachability | GET request only | HEAD is faster (no body transfer) but some servers don't support it or return different status codes. Fallback to GET if HEAD fails or returns 405. Alternatively, just use GET for both steps since `hasVideoContent()` already does GET. |
| URL as health state key | Stream/ScrapedLink ID as key | URL is better because: (1) streams and scraped links have different ID formats, (2) same URL may appear in both, (3) health is a property of the URL not the record, (4) deduplication is simpler. |
**Installation:**
```bash
# No new dependencies needed. All stdlib.
```
## Architecture Patterns
### Recommended Project Structure
```
internal/
healthcheck/
healthcheck.go # HealthChecker struct, Run(), checkAll(), checkOne()
models/
models.go # Add HealthState struct (existing file)
store/
health.go # NEW: LoadHealthStates(), SaveHealthStates(), IsURLHealthy()
store.go # Add healthMu field (existing file)
streams.go # Modify PublicStreams() to filter unhealthy (existing file)
scraped.go # Modify GetActiveScrapedLinks() to filter unhealthy (existing file)
scraper/
validate.go # Existing - export HasVideoContent() for health checker use
main.go # Add healthcheck initialization and goroutine (existing file)
```
### Pattern 1: Background Service with Ticker (proven pattern in codebase)
**What:** A service struct with `Run(ctx context.Context)` that executes on a configurable interval using `time.Ticker`, stopping cleanly when the context is cancelled.
**When to use:** Background periodic tasks that need graceful shutdown.
**Example:**
```go
// internal/healthcheck/healthcheck.go
package healthcheck
import (
"context"
"log"
"net/http"
"time"
"f1-stream/internal/store"
)
type HealthChecker struct {
store *store.Store
interval time.Duration
timeout time.Duration
client *http.Client
}
func New(s *store.Store, interval, timeout time.Duration) *HealthChecker {
return &HealthChecker{
store: s,
interval: interval,
timeout: timeout,
client: &http.Client{
Timeout: timeout,
CheckRedirect: func(req *http.Request, via []*http.Request) error {
if len(via) >= 3 {
return http.ErrUseLastResponse
}
return nil
},
},
}
}
func (hc *HealthChecker) Run(ctx context.Context) {
log.Printf("healthcheck: starting with interval %v, timeout %v", hc.interval, hc.timeout)
// Run immediately on start
hc.checkAll()
ticker := time.NewTicker(hc.interval)
defer ticker.Stop()
for {
select {
case <-ctx.Done():
log.Println("healthcheck: shutting down")
return
case <-ticker.C:
hc.checkAll()
}
}
}
```
### Pattern 2: Health State Persistence (follows store pattern)
**What:** A new JSON file in the data directory storing per-URL health state, with the same mutex-protected read/write pattern as other store entities.
**When to use:** Persisting health state across restarts (HLTH-05).
**Example:**
```go
// internal/models/models.go - add to existing file
type HealthState struct {
URL string `json:"url"`
ConsecutiveFailures int `json:"consecutive_failures"`
LastCheckTime time.Time `json:"last_check_time"`
Healthy bool `json:"healthy"`
}
// internal/store/health.go
package store
import "f1-stream/internal/models"
func (s *Store) LoadHealthStates() ([]models.HealthState, error) {
s.healthMu.RLock()
defer s.healthMu.RUnlock()
var states []models.HealthState
if err := readJSON(s.filePath("health_state.json"), &states); err != nil {
return nil, err
}
return states, nil
}
func (s *Store) SaveHealthStates(states []models.HealthState) error {
s.healthMu.Lock()
defer s.healthMu.Unlock()
return writeJSON(s.filePath("health_state.json"), states)
}
// IsURLHealthy checks if a URL is considered healthy.
// Returns true if no health state exists (new URLs are assumed healthy).
func (s *Store) IsURLHealthy(url string) (bool, error) {
states, err := s.LoadHealthStates()
if err != nil {
return true, err // assume healthy on error
}
for _, st := range states {
if st.URL == url {
return st.Healthy, nil
}
}
return true, nil // no state = assumed healthy
}
```
### Pattern 3: Two-Step Health Check (HTTP reachability + content validation)
**What:** Each stream URL is checked in two steps: (1) HTTP reachability (does it respond with 2xx?), then (2) content validation (does the response contain video markers?). Step 2 only runs if step 1 passes.
**When to use:** HLTH-02 and HLTH-03 require this two-step approach.
**Example:**
```go
// internal/healthcheck/healthcheck.go
const unhealthyThreshold = 5
func (hc *HealthChecker) checkOne(url string) bool {
// Step 1: HTTP reachability check (HLTH-02)
req, err := http.NewRequest("GET", url, nil)
if err != nil {
log.Printf("healthcheck: request error for %s: %v", truncate(url, 60), err)
return false
}
req.Header.Set("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36")
resp, err := hc.client.Do(req)
if err != nil {
log.Printf("healthcheck: fetch error for %s: %v", truncate(url, 60), err)
return false
}
defer resp.Body.Close()
if resp.StatusCode < 200 || resp.StatusCode >= 300 {
log.Printf("healthcheck: %s returned status %d", truncate(url, 60), resp.StatusCode)
return false
}
// Step 2: Content validation (HLTH-03)
// Reuse scraper's hasVideoContent logic
// (either call exported function or inline equivalent logic)
return scraper.HasVideoContent(hc.client, url)
}
```
### Pattern 4: Exporting Validation Functions for Cross-Package Use
**What:** The `hasVideoContent()` function in `internal/scraper/validate.go` is currently unexported (lowercase). Phase 2 needs it in `internal/healthcheck/`. Export it by capitalizing.
**When to use:** When a function in one package needs to be called from another.
**Example:**
```go
// internal/scraper/validate.go - rename for export
// HasVideoContent performs a GET request and returns true if the response
// contains video/player content markers.
func HasVideoContent(client *http.Client, rawURL string) bool {
// ... existing implementation unchanged
}
// Update call site in same file:
func validateLinks(links []models.ScrapedLink, timeout time.Duration) []models.ScrapedLink {
// ...
if HasVideoContent(client, link.URL) { // was hasVideoContent
// ...
}
```
### Pattern 5: Filtering Unhealthy Streams in Public API
**What:** Modify `PublicStreams()` and `GetActiveScrapedLinks()` to cross-reference health state and exclude unhealthy URLs.
**When to use:** HLTH-07 requires hiding unhealthy streams from the public page.
**Example:**
```go
// internal/store/streams.go - modified PublicStreams()
func (s *Store) PublicStreams() ([]models.Stream, error) {
s.streamsMu.RLock()
defer s.streamsMu.RUnlock()
var streams []models.Stream
if err := readJSON(s.filePath("streams.json"), &streams); err != nil {
return nil, err
}
// Load health states to filter unhealthy streams
healthStates := s.loadHealthMap()
var pub []models.Stream
for _, st := range streams {
if !st.Published {
continue
}
// URLs not in the map are assumed healthy (new/unchecked).
if healthy, ok := healthStates[st.URL]; ok && !healthy {
continue
}
pub = append(pub, st)
}
return pub, nil
}
// loadHealthMap returns a map of URL -> healthy status.
// URLs not in the map are considered healthy (new/unchecked).
func (s *Store) loadHealthMap() map[string]bool {
// Note: must not acquire healthMu here if caller already holds another lock
// Use separate read to avoid deadlock
var states []models.HealthState
readJSON(s.filePath("health_state.json"), &states) // ignore error, assume healthy
result := make(map[string]bool)
for _, st := range states {
result[st.URL] = st.Healthy
}
return result
}
```
### Pattern 6: Collecting All URLs to Check
**What:** The health checker needs to enumerate ALL known stream URLs from both `streams.json` and `scraped_links.json`.
**When to use:** HLTH-01 requires checking all known streams.
**Example:**
```go
func (hc *HealthChecker) collectURLs() []string {
seen := make(map[string]bool)
var urls []string
// User-submitted streams
streams, err := hc.store.LoadStreams()
if err != nil {
log.Printf("healthcheck: failed to load streams: %v", err)
}
for _, s := range streams {
if !seen[s.URL] {
seen[s.URL] = true
urls = append(urls, s.URL)
}
}
// Scraped links
links, err := hc.store.LoadScrapedLinks()
if err != nil {
log.Printf("healthcheck: failed to load scraped links: %v", err)
}
for _, l := range links {
if !seen[l.URL] {
seen[l.URL] = true
urls = append(urls, l.URL)
}
}
return urls
}
```
### Anti-Patterns to Avoid
- **Holding store mutexes during HTTP requests:** Never lock `streamsMu` or `scrapedMu` while performing health checks. Load the URLs first (release lock), then check them, then update health state (acquire `healthMu`). Long-held locks block the API.
- **Modifying streams/scraped links directly:** The health checker should NOT modify `streams.json` or `scraped_links.json` to mark health status. Health state is a separate concern stored in `health_state.json`. The public API filters at query time.
- **Checking URLs in parallel without rate limiting:** Parallel checks would be faster but could overwhelm target servers and the network. Sequential checking is simpler and follows the scraper's established pattern. The 5-minute interval provides sufficient freshness.
- **Using stream/scraped link IDs as health state keys:** IDs are different between streams and scraped links, and the same URL could appear in both. URL is the natural key for health state.
- **Making the health checker depend on the server package:** The health checker should depend only on `store` and `scraper` (for validation). It should not import `server`. Keep the dependency tree clean.
## Don't Hand-Roll
| Problem | Don't Build | Use Instead | Why |
|---------|-------------|-------------|-----|
| Video content detection | Custom detection in healthcheck | `scraper.HasVideoContent()` (export existing) | Already implemented and tested in Phase 1. Avoids duplicating marker list and detection logic. |
| HTTP client with redirect limits | Custom redirect handler | `http.Client` with `CheckRedirect` callback | stdlib handles TLS, timeouts, connection pooling. Same pattern used in scraper and proxy. |
| Periodic execution | Custom sleep loop | `time.Ticker` | Handles drift, more accurate intervals, idiomatic Go. Already used in scraper and session cleanup. |
| Graceful shutdown | OS signal handling | `context.Context` from `signal.NotifyContext` | Already set up in `main.go`. Just pass ctx to `Run()`. |
| Atomic file writes | Direct `os.WriteFile` | Existing `writeJSON()` (temp-file-then-rename) | Already implemented in `store.go`. Prevents corruption on crash. |
| URL deduplication | Custom loop | `map[string]bool` with normalized URL key | Same pattern used in scraper's `scrape()` method. |
**Key insight:** Phase 2 is primarily an orchestration problem -- it combines existing primitives (HTTP fetching, video detection, file persistence, background service) into a new service. Almost every component has a working example in the codebase.
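For reference, the temp-file-then-rename shape the `writeJSON()` row above refers to. This is a plausible sketch of the existing helper, not the actual code — the real implementation lives in `store.go` and may differ in details such as file mode or marshal indentation:

```go
// writeJSON persists v atomically: marshal, write to a temp file in the
// same directory, then rename over the target. Readers never observe a
// partially written file because rename is atomic on POSIX filesystems.
func writeJSON(path string, v any) error {
	data, err := json.MarshalIndent(v, "", "  ")
	if err != nil {
		return err
	}
	tmp := path + ".tmp"
	if err := os.WriteFile(tmp, data, 0o644); err != nil {
		return err
	}
	return os.Rename(tmp, path)
}
```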
## Common Pitfalls
### Pitfall 1: Deadlock from Nested Mutex Acquisition
**What goes wrong:** `PublicStreams()` holds `streamsMu.RLock()` and then calls `loadHealthMap()` which tries to acquire `healthMu.RLock()`. If another goroutine holds `healthMu.Lock()` and is waiting for `streamsMu.RLock()`, deadlock occurs.
**Why it happens:** The health state filter in `PublicStreams()` needs to read `health_state.json` while holding the streams lock.
**How to avoid:** The `loadHealthMap()` helper should read the health file directly via `readJSON()` WITHOUT acquiring `healthMu`. This is safe because: (1) `readJSON` reads the file atomically (single `os.ReadFile` call), (2) `writeJSON` uses atomic rename, so the file is always in a consistent state, (3) a brief inconsistency (reading slightly stale health data) is acceptable for a status query. Alternatively, load health states BEFORE acquiring the streams lock.
**Warning signs:** Server hangs on `GET /api/streams/public` under health check load.
### Pitfall 2: Health Check Takes Longer Than the Interval
**What goes wrong:** With 100 streams and a 10-second timeout per stream, worst case is 1000 seconds (~17 minutes). The 5-minute interval would trigger another check before the first completes.
**Why it happens:** Sequential checking of many streams with generous timeouts.
**How to avoid:** Use a mutex to prevent concurrent health check runs (same as scraper's `s.mu.Lock()` pattern). Log total check duration. If consistently exceeding interval, consider reducing timeout or adding limited parallelism (e.g., 5 concurrent checks with a semaphore). With realistic stream counts (10-50), even worst case (50 * 10s = 500s = 8.3 min) is manageable, and most checks will respond much faster than 10s.
**Warning signs:** Log messages showing overlapping check cycles or checks taking longer than `interval/2`.
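If bounded parallelism ever becomes necessary, a buffered channel used as a semaphore is the idiomatic Go shape. A minimal sketch, assuming it lives in the healthcheck package — the `checkAllBounded` name and the result map are illustrative, not planned code:

```go
// checkAllBounded runs content checks with at most 5 in flight.
// Hypothetical variant of checkAll; not part of the current plan.
func (hc *HealthChecker) checkAllBounded(urls []string) map[string]bool {
	sem := make(chan struct{}, 5) // semaphore: capacity bounds concurrency
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		results = make(map[string]bool, len(urls))
	)
	for _, url := range urls {
		wg.Add(1)
		sem <- struct{}{} // acquire a slot; blocks while 5 checks are running
		go func(u string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			passed := scraper.HasVideoContent(hc.client, u)
			mu.Lock()
			results[u] = passed
			mu.Unlock()
		}(url)
	}
	wg.Wait()
	return results
}
```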
### Pitfall 3: Health State File Growing Unbounded
**What goes wrong:** URLs that were once checked but are no longer in streams or scraped links remain in `health_state.json` forever.
**Why it happens:** No cleanup of orphaned health state entries.
**How to avoid:** During `checkAll()`, the health checker already collects all current URLs. After updating health state, prune any entries whose URLs are not in the current set. This naturally keeps the file in sync with actual streams.
**Warning signs:** `health_state.json` growing much larger than `streams.json` + `scraped_links.json`.
### Pitfall 4: Scraper's hasVideoContent Does a Full GET (Redundant with Reachability Check)
**What goes wrong:** The two-step check (HLTH-02: reachability, HLTH-03: content) results in TWO GET requests to the same URL -- once for reachability, once for content validation.
**Why it happens:** `hasVideoContent()` performs its own GET request internally.
**How to avoid:** Option A: Combine both steps into a single GET request within `checkOne()` -- check status code (reachability) and then inspect body (content), all from one response. This is more efficient. Option B: Use a lightweight HEAD request for reachability, then call `hasVideoContent()` for the full check. Option A is preferred since it halves the number of requests per stream.
**Warning signs:** Double the expected number of outbound HTTP requests per health check cycle.
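A sketch of what Option A could look like as a replacement for the two-step `checkOne` shown earlier. It assumes the scraper's unexported `containsVideoMarkers()` were exported with a `ContainsVideoMarkers(string) bool` signature — that export and signature are assumptions, since today only the combined check exists:

```go
// checkOne performs a single GET and distinguishes reachability failures
// from content-validation failures in the logs. Requires "io" in imports.
func (hc *HealthChecker) checkOne(url string) bool {
	resp, err := hc.client.Get(url)
	if err != nil {
		log.Printf("healthcheck: %s unreachable: %v", truncate(url, 60), err)
		return false
	}
	defer resp.Body.Close()

	if resp.StatusCode < 200 || resp.StatusCode >= 300 {
		log.Printf("healthcheck: %s returned status %d", truncate(url, 60), resp.StatusCode)
		return false
	}

	body, err := io.ReadAll(io.LimitReader(resp.Body, 2<<20)) // 2MB cap, as in validate.go
	if err != nil {
		log.Printf("healthcheck: %s body read failed: %v", truncate(url, 60), err)
		return false
	}
	if !scraper.ContainsVideoMarkers(string(body)) { // assumed export
		log.Printf("healthcheck: %s reachable but no video markers", truncate(url, 60))
		return false
	}
	return true
}
```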
### Pitfall 5: New Streams Assumed Unhealthy Until First Check
**What goes wrong:** A user submits a stream, and it immediately disappears from the public page because it has no health state entry, and the filter logic treats "no entry" as unhealthy.
**Why it happens:** Incorrect default assumption in health filtering.
**How to avoid:** "No health state entry" MUST mean "assumed healthy." This is the correct default because: (1) user-submitted streams should be visible immediately, (2) scraped streams have already passed Phase 1 validation, (3) the health checker will evaluate the stream within at most 5 minutes. The `loadHealthMap()` helper should return `true` for URLs not in the map.
**Warning signs:** Newly submitted or scraped streams not appearing on the public page until after the first health check.
### Pitfall 6: Modifying Exported Function Signature Breaks Scraper
**What goes wrong:** Exporting `hasVideoContent` as `HasVideoContent` is fine (just capitalize), but if the function signature changes (e.g., adding parameters), the internal call site in `validateLinks()` also needs updating.
**Why it happens:** Two call sites for the same function.
**How to avoid:** Only change capitalization. Do not change the function signature. If the health checker needs different behavior, create a wrapper in the healthcheck package rather than modifying the shared function.
**Warning signs:** Compilation error in `validate.go` after export change.
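A sketch of such a wrapper, here adding per-check timing as a hypothetical example of healthcheck-specific behavior (the name is illustrative):

```go
// hasVideoContentWithLogging wraps the shared validator without changing
// its signature; any healthcheck-only behavior belongs here, not in scraper.
func (hc *HealthChecker) hasVideoContentWithLogging(url string) bool {
	start := time.Now()
	ok := scraper.HasVideoContent(hc.client, url)
	log.Printf("healthcheck: content check for %s took %v (ok=%v)",
		truncate(url, 60), time.Since(start).Round(time.Millisecond), ok)
	return ok
}
```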
## Code Examples
### Complete HealthChecker Implementation
```go
// internal/healthcheck/healthcheck.go
package healthcheck

import (
	"context"
	"log"
	"net/http"
	"sync"
	"time"

	"f1-stream/internal/models"
	"f1-stream/internal/scraper"
	"f1-stream/internal/store"
)

const unhealthyThreshold = 5

type HealthChecker struct {
	store    *store.Store
	interval time.Duration
	timeout  time.Duration
	client   *http.Client
	mu       sync.Mutex
}

func New(s *store.Store, interval, timeout time.Duration) *HealthChecker {
	return &HealthChecker{
		store:    s,
		interval: interval,
		timeout:  timeout,
		client: &http.Client{
			Timeout: timeout,
			CheckRedirect: func(req *http.Request, via []*http.Request) error {
				if len(via) >= 3 {
					return http.ErrUseLastResponse
				}
				return nil
			},
		},
	}
}

func (hc *HealthChecker) Run(ctx context.Context) {
	log.Printf("healthcheck: starting with interval %v, timeout %v", hc.interval, hc.timeout)
	hc.checkAll()

	ticker := time.NewTicker(hc.interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			log.Println("healthcheck: shutting down")
			return
		case <-ticker.C:
			hc.checkAll()
		}
	}
}
```
### checkAll with State Update
```go
func (hc *HealthChecker) checkAll() {
	hc.mu.Lock()
	defer hc.mu.Unlock()

	start := time.Now()
	urls := hc.collectURLs()
	log.Printf("healthcheck: checking %d URLs", len(urls))

	// Load existing health states into a map
	existingStates, err := hc.store.LoadHealthStates()
	if err != nil {
		log.Printf("healthcheck: failed to load health states: %v", err)
		existingStates = nil
	}
	stateMap := make(map[string]*models.HealthState)
	for i := range existingStates {
		stateMap[existingStates[i].URL] = &existingStates[i]
	}

	// Check each URL and update state
	now := time.Now()
	checked := 0
	healthy := 0
	recovered := 0
	newlyUnhealthy := 0
	for _, url := range urls {
		passed := scraper.HasVideoContent(hc.client, url)
		checked++

		state, exists := stateMap[url]
		if !exists {
			state = &models.HealthState{
				URL:     url,
				Healthy: true,
			}
			stateMap[url] = state
		}
		state.LastCheckTime = now

		if passed {
			if !state.Healthy {
				recovered++
				log.Printf("healthcheck: %s recovered", truncate(url, 60))
			}
			state.ConsecutiveFailures = 0
			state.Healthy = true
			healthy++
		} else {
			state.ConsecutiveFailures++
			if state.ConsecutiveFailures >= unhealthyThreshold && state.Healthy {
				state.Healthy = false
				newlyUnhealthy++
				log.Printf("healthcheck: %s marked unhealthy after %d failures",
					truncate(url, 60), state.ConsecutiveFailures)
			}
		}
	}

	// Build final state slice (only URLs that are still in streams/scraped)
	currentURLs := make(map[string]bool)
	for _, u := range urls {
		currentURLs[u] = true
	}
	var finalStates []models.HealthState
	for url, state := range stateMap {
		if currentURLs[url] {
			finalStates = append(finalStates, *state)
		}
	}

	if err := hc.store.SaveHealthStates(finalStates); err != nil {
		log.Printf("healthcheck: failed to save health states: %v", err)
	}

	log.Printf("healthcheck: done in %v, checked %d, healthy %d, recovered %d, newly unhealthy %d",
		time.Since(start).Round(time.Millisecond), checked, healthy, recovered, newlyUnhealthy)
}
```
### HealthState Model Addition
```go
// Add to internal/models/models.go
type HealthState struct {
	URL                 string    `json:"url"`
	ConsecutiveFailures int       `json:"consecutive_failures"`
	LastCheckTime       time.Time `json:"last_check_time"`
	Healthy             bool      `json:"healthy"`
}
```
### Store Health Methods
```go
// internal/store/health.go
package store

import "f1-stream/internal/models"

func (s *Store) LoadHealthStates() ([]models.HealthState, error) {
	s.healthMu.RLock()
	defer s.healthMu.RUnlock()
	var states []models.HealthState
	if err := readJSON(s.filePath("health_state.json"), &states); err != nil {
		return nil, err
	}
	return states, nil
}

func (s *Store) SaveHealthStates(states []models.HealthState) error {
	s.healthMu.Lock()
	defer s.healthMu.Unlock()
	return writeJSON(s.filePath("health_state.json"), states)
}

// HealthMap returns a map of URL -> healthy boolean.
// URLs not present in the map are considered healthy.
// This method reads the file directly without holding healthMu,
// suitable for use inside other lock-holding methods.
func (s *Store) HealthMap() map[string]bool {
	var states []models.HealthState
	_ = readJSON(s.filePath("health_state.json"), &states)
	m := make(map[string]bool)
	for _, st := range states {
		m[st.URL] = st.Healthy
	}
	return m
}
```
### Store Struct Update
```go
// internal/store/store.go - add healthMu field
type Store struct {
	dir        string
	streamsMu  sync.RWMutex
	usersMu    sync.RWMutex
	scrapedMu  sync.RWMutex
	sessionsMu sync.RWMutex
	healthMu   sync.RWMutex // NEW
}
```
### main.go Integration
```go
// In main.go, after scraper initialization.
// (Add "f1-stream/internal/healthcheck" to the import block at the top.)
healthInterval := envDuration("HEALTH_CHECK_INTERVAL", 5*time.Minute)
healthTimeout := envDuration("HEALTH_CHECK_TIMEOUT", 10*time.Second)
hc := healthcheck.New(st, healthInterval, healthTimeout)

// Start health checker in background (after scraper start)
go hc.Run(ctx)
```
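`envDuration` already exists in `main.go`; for reference, a plausible shape of that helper. This sketch is an assumption — check the actual implementation before relying on its exact behavior:

```go
// envDuration reads a duration from the environment, falling back to def
// when the variable is unset or unparsable.
func envDuration(key string, def time.Duration) time.Duration {
	v := os.Getenv(key)
	if v == "" {
		return def
	}
	d, err := time.ParseDuration(v)
	if err != nil {
		log.Printf("invalid %s=%q, using default %v", key, v, def)
		return def
	}
	return d
}
```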
### Modified PublicStreams with Health Filter
```go
// internal/store/streams.go - modified
func (s *Store) PublicStreams() ([]models.Stream, error) {
	s.streamsMu.RLock()
	defer s.streamsMu.RUnlock()

	var streams []models.Stream
	if err := readJSON(s.filePath("streams.json"), &streams); err != nil {
		return nil, err
	}

	healthMap := s.HealthMap()

	var pub []models.Stream
	for _, st := range streams {
		if !st.Published {
			continue
		}
		// Check health status. If URL not in health map, assume healthy.
		if healthy, exists := healthMap[st.URL]; exists && !healthy {
			continue
		}
		pub = append(pub, st)
	}
	return pub, nil
}
```
### Unit Test for Health State Updates
```go
// internal/healthcheck/healthcheck_test.go
package healthcheck

import (
	"testing"

	"f1-stream/internal/models"
)

func TestUnhealthyThreshold(t *testing.T) {
	state := &models.HealthState{URL: "http://example.com", Healthy: true}

	// Simulate 4 failures - should remain healthy
	for i := 0; i < 4; i++ {
		state.ConsecutiveFailures++
	}
	if state.ConsecutiveFailures >= unhealthyThreshold {
		t.Error("should not be unhealthy after 4 failures")
	}

	// 5th failure - should become unhealthy
	state.ConsecutiveFailures++
	if state.ConsecutiveFailures < unhealthyThreshold {
		t.Error("should be unhealthy after 5 failures")
	}
}

func TestRecovery(t *testing.T) {
	state := &models.HealthState{
		URL:                 "http://example.com",
		Healthy:             false,
		ConsecutiveFailures: 7,
	}

	// Simulate successful check
	state.ConsecutiveFailures = 0
	state.Healthy = true

	if !state.Healthy {
		t.Error("should be healthy after recovery")
	}
	if state.ConsecutiveFailures != 0 {
		t.Error("failure count should be reset")
	}
}
```
## State of the Art
| Old Approach | Current Approach | When Changed | Impact |
|--------------|------------------|--------------|--------|
| No health monitoring | Phase 2 adds continuous monitoring | Phase 2 (now) | Dead streams automatically hidden from users |
| All published streams visible | Unhealthy streams filtered from public API | Phase 2 (now) | Better user experience - no broken streams shown |
| Validation only at scrape time | Continuous re-validation via health checks | Phase 2 (now) | Streams that go down after scraping are caught |
| One-shot validation (scraper) | Persistent state with failure tracking | Phase 2 (now) | Nuanced health model with recovery support |
## Design Decisions
### Using a Single GET Request Instead of HEAD + GET
The requirements specify two steps: (1) HTTP reachability check (HLTH-02), (2) content validation (HLTH-03). A literal implementation would do a HEAD request for reachability, then a GET for content. However, since `HasVideoContent()` already performs a GET and checks the status code, doing both steps in a single call is more efficient (one HTTP request instead of two). The `HasVideoContent()` function already returns `false` for non-2xx status codes (lines 96-98 of `validate.go`), effectively combining the reachability check with content validation.
The health checker should still log the distinction: if the GET fails or returns non-2xx, log it as a reachability failure. If the GET succeeds but no video markers are found, log it as a content validation failure. This provides diagnostic value without the cost of an extra HTTP request.
### URL as Health State Key
Using the URL (not stream ID or scraped link ID) as the key for health state has several advantages:
1. A URL may appear in both `streams.json` and `scraped_links.json` -- one health check covers both
2. IDs are different types between streams (random hex) and scraped links (random hex but different generation)
3. URL normalization can deduplicate variations (trailing slashes, case)
4. Health is intrinsically a property of the URL, not the database record
### Separate Package vs. Adding to Scraper
The health checker could be added to the scraper package since it reuses `HasVideoContent()`. However, a separate `internal/healthcheck/` package is better because:
1. Different concern: discovery (scraper) vs. monitoring (health checker)
2. Different lifecycle: scraper runs after Reddit fetch; health checker runs independently
3. Different interval: scraper every 15 min, health checker every 5 min
4. Cleaner dependency graph: healthcheck imports scraper, not vice versa
5. Follows the codebase convention of one concern per package
## Open Questions
1. **Should the combined reachability + content check be a single GET, or separate HEAD + GET?**
- What we know: `HasVideoContent()` already does GET and checks status code. A single GET is more efficient.
- What's unclear: Whether some servers behave differently for HEAD vs GET (some CDNs return 200 for HEAD but serve different content for GET).
- Recommendation: Use single GET via `HasVideoContent()`. It already handles both reachability (status code check) and content validation (body inspection). Log failures with diagnostic detail (was it a connection error, non-2xx, or missing markers?). This halves the HTTP request count.
2. **Should health checks run in parallel or sequentially?**
- What we know: Sequential checking is simpler and follows the scraper's pattern. With 50 streams at 10s timeout each, worst case is ~8 minutes.
- What's unclear: Real-world stream count. Could be 10 or 200.
- Recommendation: Start sequential. Log total check duration. If it consistently exceeds 60% of the interval, add bounded parallelism (e.g., `semaphore` pattern with 5 workers). This matches the scraper's approach of starting simple.
3. **Lock ordering for HealthMap called inside PublicStreams**
- What we know: `PublicStreams()` holds `streamsMu.RLock()` and needs health data. `HealthMap()` reads `health_state.json`.
- What's unclear: Whether `readJSON` inside `HealthMap()` needs `healthMu` protection.
- Recommendation: `HealthMap()` should read without acquiring `healthMu` because `readJSON()` does a single `os.ReadFile()` call and `writeJSON()` uses atomic rename. The file is always in a consistent state. Brief staleness (reading milliseconds-old data) is acceptable. This avoids all deadlock risk.
## Sources
### Primary (HIGH confidence)
- Codebase analysis: `internal/scraper/scraper.go` - proven background service pattern with `Run(ctx)`, `time.Ticker`, mutex protection
- Codebase analysis: `internal/scraper/validate.go` - existing `hasVideoContent()`, `containsVideoMarkers()`, `isDirectVideoContentType()` implementations to reuse
- Codebase analysis: `internal/store/store.go` - `readJSON()`/`writeJSON()` atomic persistence pattern, `sync.RWMutex` per entity
- Codebase analysis: `internal/store/streams.go` - `PublicStreams()` filtering pattern to extend
- Codebase analysis: `internal/store/scraped.go` - `GetActiveScrapedLinks()` filtering pattern to extend
- Codebase analysis: `internal/models/models.go` - model definition pattern to follow for `HealthState`
- Codebase analysis: `main.go` - `envDuration()`, service initialization, goroutine startup, context passing
- Go stdlib docs: `time.Ticker`, `sync.RWMutex`, `net/http.Client`, `context.Context`
### Secondary (MEDIUM confidence)
- `.planning/REQUIREMENTS.md` - HLTH-01 through HLTH-09 requirement definitions
- `.planning/ROADMAP.md` - Phase 2 success criteria and dependencies on Phase 1
- `.planning/phases/01-scraper-validation/01-RESEARCH.md` - Phase 1 research confirming validation function design
- `.planning/codebase/ARCHITECTURE.md` - data flow patterns, cross-cutting concerns
- `.planning/codebase/CONVENTIONS.md` - naming, import ordering, error handling conventions
- `.planning/codebase/STRUCTURE.md` - package organization, where to add new code
### Tertiary (LOW confidence)
- None. All findings are based on direct codebase analysis and requirements documents.
## Metadata
**Confidence breakdown:**
- Standard stack: HIGH - uses only stdlib; all patterns have working examples in the codebase
- Architecture: HIGH - new package follows established scraper pattern exactly; integration points well-defined
- Pitfalls: HIGH - pitfalls identified from concrete codebase analysis (lock ordering, request doubling, unbounded state)
- Health state model: HIGH - simple struct following existing model pattern; persistence follows existing store pattern
**Research date:** 2026-02-17
**Valid until:** 2026-03-17 (stable domain; implementation uses only stdlib and existing codebase patterns)

View file

@ -1,123 +0,0 @@
---
phase: 02-health-check-infrastructure
verified: 2026-02-17T21:30:00Z
status: passed
score: 13/13 must-haves verified
re_verification: false
---
# Phase 02: Health Check Infrastructure Verification Report
**Phase Goal:** All known streams are continuously monitored for health, with status persisted and unhealthy streams hidden from users
**Verified:** 2026-02-17T21:30:00Z
**Status:** passed
**Re-verification:** No — initial verification
## Goal Achievement
### Observable Truths
| # | Truth | Status | Evidence |
|---|-------|--------|----------|
| 1 | HealthState model exists with URL, ConsecutiveFailures, LastCheckTime, and Healthy fields | ✓ VERIFIED | models.go lines 47-52 defines struct with all 4 required fields |
| 2 | Health states can be loaded from and saved to health_state.json via store methods | ✓ VERIFIED | health.go implements LoadHealthStates (lines 7-14), SaveHealthStates (lines 17-21) |
| 3 | HasVideoContent is exported and still works for the scraper | ✓ VERIFIED | validate.go line 81 exports HasVideoContent, line 69 uses it in validateLinks |
| 4 | HealthChecker service checks all known URLs sequentially with a two-step check (reachability + content) | ✓ VERIFIED | healthcheck.go line 92 calls HasVideoContent which does HTTP status check (lines 96-98) + content markers (lines 103-118) |
| 5 | A stream is marked unhealthy after 5 consecutive failures | ✓ VERIFIED | healthcheck.go line 103 checks ConsecutiveFailures >= unhealthyThreshold (5) to set Healthy=false |
| 6 | A previously unhealthy stream recovers when a check passes (failure count resets to 0, Healthy set to true) | ✓ VERIFIED | healthcheck.go lines 94-100 reset ConsecutiveFailures to 0 and set Healthy=true on success, logging recovery |
| 7 | Orphaned health state entries are pruned during each check cycle | ✓ VERIFIED | healthcheck.go lines 113-126 build urlSet and filter finalStates to only keep current URLs |
| 8 | Health checker starts as a background goroutine in main.go alongside the scraper | ✓ VERIFIED | main.go line 72 starts `go hc.Run(ctx)` after scraper startup |
| 9 | Health check interval is configurable via HEALTH_CHECK_INTERVAL env var (default 5m) | ✓ VERIFIED | main.go line 60 uses envDuration("HEALTH_CHECK_INTERVAL", 5*time.Minute) |
| 10 | Health check timeout is configurable via HEALTH_CHECK_TIMEOUT env var (default 10s) | ✓ VERIFIED | main.go line 61 uses envDuration("HEALTH_CHECK_TIMEOUT", 10*time.Second) |
| 11 | PublicStreams() filters out streams whose URL is marked unhealthy in health_state.json | ✓ VERIFIED | streams.go lines 29-38 call HealthMap() and skip entries where exists && !healthy |
| 12 | GetActiveScrapedLinks() filters out scraped links whose URL is marked unhealthy in health_state.json | ✓ VERIFIED | scraped.go lines 32-43 call HealthMap() and skip entries where exists && !healthy |
| 13 | Streams with no health state entry (new/unchecked) are assumed healthy and still appear | ✓ VERIFIED | streams.go line 36 and scraped.go line 41 use pattern `exists && !healthy` which preserves URLs not in map |
**Score:** 13/13 truths verified
### Required Artifacts
| Artifact | Expected | Status | Details |
|----------|----------|--------|---------|
| `internal/models/models.go` | HealthState struct | ✓ VERIFIED | Lines 47-52: struct with 4 required fields (URL, ConsecutiveFailures, LastCheckTime, Healthy) |
| `internal/store/health.go` | LoadHealthStates, SaveHealthStates, HealthMap methods | ✓ VERIFIED | 37 lines, 3 methods exported and implemented following store patterns |
| `internal/store/store.go` | healthMu field on Store struct | ✓ VERIFIED | Line 16: healthMu sync.RWMutex field added |
| `internal/scraper/validate.go` | Exported HasVideoContent function | ✓ VERIFIED | Line 81: HasVideoContent exported (capitalized), line 69 uses it |
| `internal/healthcheck/healthcheck.go` | HealthChecker service with Run, checkAll, collectURLs | ✓ VERIFIED | 169 lines, 5 functions: New, Run, checkAll, collectURLs, truncate |
| `main.go` | Health checker initialization and goroutine startup | ✓ VERIFIED | Lines 60-62: init with env vars, line 72: go hc.Run(ctx) |
| `internal/store/streams.go` | Health-filtered PublicStreams method | ✓ VERIFIED | Lines 29-38: HealthMap() call with filtering logic |
| `internal/store/scraped.go` | Health-filtered GetActiveScrapedLinks method | ✓ VERIFIED | Lines 32-43: HealthMap() call with filtering logic |
### Key Link Verification
| From | To | Via | Status | Details |
|------|-----|-----|--------|---------|
| healthcheck.go | validate.go | scraper.HasVideoContent | ✓ WIRED | Line 92: `scraper.HasVideoContent(hc.client, url)` |
| healthcheck.go | health.go | LoadHealthStates, SaveHealthStates | ✓ WIRED | Lines 68, 128: load existing states, save final states |
| healthcheck.go | streams.go | LoadStreams | ✓ WIRED | Line 139: `hc.store.LoadStreams()` in collectURLs |
| healthcheck.go | scraped.go | LoadScrapedLinks | ✓ WIRED | Line 148: `hc.store.LoadScrapedLinks()` in collectURLs |
| main.go | healthcheck.go | healthcheck.New, go hc.Run(ctx) | ✓ WIRED | Line 62: initialization, line 72: goroutine start |
| streams.go | health.go | HealthMap() | ✓ WIRED | Line 29: `s.HealthMap()` called in PublicStreams |
| scraped.go | health.go | HealthMap() | ✓ WIRED | Line 32: `s.HealthMap()` called in GetActiveScrapedLinks |
### Requirements Coverage
| Requirement | Source Plan | Description | Status | Evidence |
|-------------|------------|-------------|--------|----------|
| HLTH-01 | 02-01, 02-02 | Background health checker runs every 5 minutes (configurable) | ✓ SATISFIED | main.go line 60 sets configurable interval, healthcheck.go line 46 uses ticker, collectURLs gathers all streams+scraped |
| HLTH-02 | 02-01 | HTTP reachability check (2xx status) | ✓ SATISFIED | validate.go lines 96-98 check StatusCode before processing |
| HLTH-03 | 02-01 | Proxy-fetch page and check video/player markers | ✓ SATISFIED | validate.go lines 103-118 inspect HTML for videoMarkers, healthcheck.go line 92 uses HasVideoContent |
| HLTH-04 | 02-02 | Configurable timeout per check (default 10s) | ✓ SATISFIED | main.go line 61 HEALTH_CHECK_TIMEOUT env var, healthcheck.go line 31 sets client.Timeout |
| HLTH-05 | 02-01 | Track consecutive failures, last check time, healthy flag in persisted state | ✓ SATISFIED | models.go HealthState struct has all fields, health.go persists via JSON, healthcheck.go lines 99-109 update fields |
| HLTH-06 | 02-01 | Stream marked unhealthy after 5 consecutive failures | ✓ SATISFIED | healthcheck.go line 15 sets unhealthyThreshold=5, line 103 checks threshold |
| HLTH-07 | 02-02 | Unhealthy streams hidden from public API | ✓ SATISFIED | streams.go lines 36-37 filter PublicStreams, scraped.go lines 41-42 filter GetActiveScrapedLinks |
| HLTH-08 | 02-01 | Unhealthy streams continue to be checked and can recover | ✓ SATISFIED | healthcheck.go checkAll checks ALL URLs (lines 82-110), recovery detected on lines 95-97 with logging |
| HLTH-09 | 02-02 | Health check interval configurable via HEALTH_CHECK_INTERVAL env var (default 5m) | ✓ SATISFIED | main.go line 60: `envDuration("HEALTH_CHECK_INTERVAL", 5*time.Minute)` |
**All 9 requirements satisfied.**
### Anti-Patterns Found
| File | Line | Pattern | Severity | Impact |
|------|------|---------|----------|--------|
| - | - | - | - | No anti-patterns detected |
**Analysis:**
- No TODO/FIXME/PLACEHOLDER comments found in health check code
- No stub patterns (empty returns, console.log only implementations)
- All functions have substantive implementations
- Commits c53b557, e719efe, 8ad68d5, 535c56d verified in git log
- 169 lines in healthcheck.go, 37 lines in health.go — full implementations
### Human Verification Required
None. All verifiable through code inspection and requirements mapping.
The health check infrastructure is entirely backend logic:
- Background service with ticker loop (standard Go pattern, verified via code)
- JSON file persistence (verified via code paths)
- HTTP client calls (verified via HasVideoContent implementation)
- Filtering logic (verified via HealthMap usage in PublicStreams/GetActiveScrapedLinks)
No UI components, no real-time behavior requiring browser testing, no external service integration.
---
**Phase Goal Assessment:**
The phase goal "All known streams are continuously monitored for health, with status persisted and unhealthy streams hidden from users" is **ACHIEVED**:
1. **Continuous monitoring:** HealthChecker runs in background goroutine with configurable 5-minute interval, checking all streams and scraped links
2. **Health tracking:** HealthState model persists URL, consecutive failures, last check time, and healthy flag to health_state.json
3. **Two-step validation:** HasVideoContent checks HTTP reachability (2xx status) then video content markers
4. **Failure threshold:** 5 consecutive failures trigger unhealthy status
5. **Recovery:** Previously unhealthy streams can recover when checks pass
6. **User filtering:** PublicStreams() and GetActiveScrapedLinks() hide unhealthy URLs from public API
7. **Assume-healthy-by-default:** New streams appear immediately, hidden only after 5 check failures
All 9 HLTH requirements satisfied. All 13 must-have truths verified. All artifacts exist, are substantive, and wired correctly. No anti-patterns found.
---
_Verified: 2026-02-17T21:30:00Z_
_Verifier: Claude (gsd-verifier)_

View file

@ -1,186 +0,0 @@
---
phase: 03-auto-publish-pipeline
plan: 01
type: execute
wave: 1
depends_on: []
files_modified:
- internal/models/models.go
- internal/store/streams.go
- main.go
- internal/scraper/scraper.go
autonomous: true
requirements:
- AUTO-01
- AUTO-02
- AUTO-03
must_haves:
  truths:
    - "Scraped streams that pass scraper validation appear on the public streams page without admin action"
    - "Dead streams (unhealthy after 5 failures) are hidden from the public page automatically"
    - "Auto-published streams have source='scraped' distinguishing them from user-submitted streams"
    - "Duplicate scraped URLs are not re-added as new Stream entries"
    - "Existing user-submitted and system streams are not broken by the Source field addition"
  artifacts:
    - path: "internal/models/models.go"
      provides: "Stream model with Source field"
      contains: "Source"
    - path: "internal/store/streams.go"
      provides: "PublishScrapedStream method for auto-publishing"
      contains: "PublishScrapedStream"
    - path: "internal/scraper/scraper.go"
      provides: "Auto-publish call after validation"
      contains: "PublishScrapedStream"
    - path: "main.go"
      provides: "Default streams with Source field set"
      contains: "Source"
  key_links:
    - from: "internal/scraper/scraper.go"
      to: "internal/store/streams.go"
      via: "s.store.PublishScrapedStream call in scrape()"
      pattern: "store\\.PublishScrapedStream"
    - from: "internal/store/streams.go"
      to: "internal/store/health.go"
      via: "PublicStreams calls HealthMap to filter unhealthy"
      pattern: "HealthMap"
---
<objective>
Add a Source field to the Stream model and wire the scraper to auto-publish validated links as Stream entries so they appear on the public streams page. Dead streams are already hidden by existing health filtering in PublicStreams().
Purpose: Complete the auto-publish pipeline so scraped streams flow from discovery through validation to the public page without manual intervention, while remaining distinguishable from user-submitted streams.
Output: Stream model with Source field, PublishScrapedStream store method, scraper auto-publish wiring.
</objective>
<execution_context>
@/Users/viktorbarzin/.claude/get-shit-done/workflows/execute-plan.md
@/Users/viktorbarzin/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@internal/models/models.go
@internal/store/streams.go
@internal/store/scraped.go
@internal/store/health.go
@internal/scraper/scraper.go
@internal/healthcheck/healthcheck.go
@main.go
</context>
<tasks>
<task type="auto">
<name>Task 1: Add Source field to Stream model and create PublishScrapedStream store method</name>
<files>internal/models/models.go, internal/store/streams.go, main.go, internal/server/server.go</files>
<action>
1. In `internal/models/models.go`, add a `Source` field to the `Stream` struct:
```go
Source string `json:"source"`
```
Add it after the `Published` field. Valid values: "user", "system", "scraped".
2. In `internal/store/streams.go`, update `AddStream` to accept a `source` parameter:
- Change signature to `AddStream(url, title, submittedBy string, published bool, source string) (models.Stream, error)`
- Set `Source: source` in the Stream struct literal
3. In `internal/store/streams.go`, add a new `PublishScrapedStream` method:
```go
func (s *Store) PublishScrapedStream(url, title string) error {
	s.streamsMu.Lock()
	defer s.streamsMu.Unlock()

	var streams []models.Stream
	if err := readJSON(s.filePath("streams.json"), &streams); err != nil {
		return err
	}

	// Deduplicate: skip if URL already exists in streams
	for _, st := range streams {
		if st.URL == url {
			return nil
		}
	}

	id, err := randomID()
	if err != nil {
		return err
	}
	streams = append(streams, models.Stream{
		ID:          id,
		URL:         url,
		Title:       title,
		SubmittedBy: "scraper",
		Published:   true,
		Source:      "scraped",
		CreatedAt:   time.Now(),
	})
	return writeJSON(s.filePath("streams.json"), streams)
}
```
4. In `main.go`, update `defaultStreams()` to set `Source: "system"` on each default stream.
5. In `internal/server/server.go`, update the `handleSubmitStream` call to `AddStream` to pass `source: "user"` as the new parameter. The call is currently `s.store.AddStream(req.URL, req.Title, submittedBy, published)` — add `"user"` as the 5th argument.
</action>
<verify>
Run `go build ./...` from the project root. It must compile with zero errors. Verify the new Source field appears in models.go and the PublishScrapedStream method exists in streams.go.
</verify>
<done>
Stream model has Source field (json:"source"). AddStream accepts source parameter. PublishScrapedStream method exists with URL deduplication. Default streams have Source="system". User-submitted streams have Source="user". Project compiles cleanly.
</done>
</task>
<task type="auto">
<name>Task 2: Wire scraper to auto-publish validated links as streams</name>
<files>internal/scraper/scraper.go</files>
<action>
In `internal/scraper/scraper.go`, in the `scrape()` method, after the validated links are saved to scraped_links.json (after the `s.store.SaveScrapedLinks(existing)` call), add auto-publishing logic:
```go
// Auto-publish newly validated links as streams
for _, l := range links {
	if err := s.store.PublishScrapedStream(l.URL, l.Title); err != nil {
		log.Printf("scraper: failed to auto-publish %s: %v", truncateURL(l.URL, 80), err)
	}
}
```
Add a helper `truncateURL` if it doesn't exist, or inline the truncation. The scraper file already has access to `s.store` and the `links` slice contains only validated links (they passed `validateLinks`).
Note: The scraper's `validateLinks` already calls `HasVideoContent` which is the same check the health checker uses. This means a link that passes scraper validation has effectively passed its "initial health check" per AUTO-01. The health checker will then continue monitoring it. If it fails 5 checks, `PublicStreams()` already hides it (AUTO-02 is already implemented).
The auto-publish happens for ALL validated links in each scrape cycle, but `PublishScrapedStream` deduplicates by URL, so existing streams are skipped (no-op). Only genuinely new URLs get added as Stream entries.
</action>
<verify>
Run `go build ./...` from the project root. Verify the scraper imports compile. Read scraper.go to confirm the auto-publish loop exists after SaveScrapedLinks. Trace the flow: scrapeReddit -> validateLinks -> SaveScrapedLinks -> PublishScrapedStream for each validated link.
</verify>
<done>
Scraper auto-publishes validated links as Stream entries after each scrape cycle. Deduplication prevents duplicates. The full pipeline works: scrape -> validate -> save scraped -> auto-publish as Stream -> health checker monitors -> PublicStreams filters unhealthy. No admin intervention needed.
</done>
</task>
</tasks>
<verification>
1. `go build ./...` compiles without errors
2. Stream model has `Source string` field with json tag
3. PublishScrapedStream method exists in store/streams.go with URL deduplication
4. Scraper calls PublishScrapedStream for each validated link after saving
5. Default streams have Source="system", user-submitted have Source="user", scraped have Source="scraped"
6. PublicStreams() still filters by Published flag and HealthMap (existing behavior preserved for AUTO-02)
</verification>
<success_criteria>
- Scraped streams that pass validateLinks are automatically added to streams.json with source="scraped" and published=true
- They appear in PublicStreams() output (visible on public page)
- If health checker marks them unhealthy (5 failures), PublicStreams() hides them (existing behavior)
- No duplicate Stream entries for the same URL
- All existing streams and scraped links continue to work
- Source field distinguishes scraped from user and system streams
</success_criteria>
<output>
After completion, create `.planning/phases/03-auto-publish-pipeline/03-01-SUMMARY.md`
</output>

View file

@ -1,105 +0,0 @@
---
phase: 03-auto-publish-pipeline
plan: 01
subsystem: api
tags: [scraper, auto-publish, stream-model, deduplication]
# Dependency graph
requires:
  - phase: 01-scraper-validation
    provides: "Scraper with validateLinks and HasVideoContent"
  - phase: 02-health-check-infrastructure
    provides: "Health checker, HealthMap, PublicStreams filtering"
provides:
  - "Stream.Source field distinguishing user/system/scraped streams"
  - "PublishScrapedStream store method with URL deduplication"
  - "Auto-publish wiring in scraper after validation"
affects: [04-ui-polish, 05-production-hardening]
# Tech tracking
tech-stack:
  added: []
  patterns: [auto-publish-pipeline, source-tagging, url-deduplication]
key-files:
  created: []
  modified:
    - internal/models/models.go
    - internal/store/streams.go
    - internal/scraper/scraper.go
    - internal/server/server.go
    - main.go
key-decisions:
- "Source field uses string values (user/system/scraped) rather than int enum for readability"
- "PublishScrapedStream deduplicates by exact URL match (normalized URL matching left to scraper layer)"
- "Auto-publish iterates all validated links each cycle; deduplication makes repeat calls no-ops"
patterns-established:
- "Source tagging: all stream creation paths set Source field for provenance tracking"
- "Auto-publish pattern: scraper validates then publishes via store method, no manual step"
requirements-completed: [AUTO-01, AUTO-02, AUTO-03]
# Metrics
duration: 2min
completed: 2026-02-17
---
# Phase 03 Plan 01: Auto-Publish Pipeline Summary
**Source-tagged Stream model with scraper auto-publish wiring and URL deduplication for zero-touch stream discovery**
## Performance
- **Duration:** 2 min
- **Started:** 2026-02-17T21:31:46Z
- **Completed:** 2026-02-17T21:33:52Z
- **Tasks:** 2
- **Files modified:** 5
## Accomplishments
- Added Source field to Stream model distinguishing user/system/scraped provenance
- Created PublishScrapedStream store method with URL deduplication to prevent duplicates
- Wired scraper to auto-publish validated links as Stream entries after each scrape cycle
- Updated all stream creation paths (default seeds, user submissions) with correct Source values
## Task Commits
Each task was committed atomically:
1. **Task 1: Add Source field to Stream model and create PublishScrapedStream store method** - `8869dc5` (feat)
2. **Task 2: Wire scraper to auto-publish validated links as streams** - `5b60f17` (feat)
## Files Created/Modified
- `internal/models/models.go` - Added Source field to Stream struct
- `internal/store/streams.go` - Added source param to AddStream, new PublishScrapedStream method with dedup
- `internal/scraper/scraper.go` - Auto-publish loop after SaveScrapedLinks
- `internal/server/server.go` - Pass "user" source to AddStream in handleSubmitStream
- `main.go` - Set Source="system" on default streams
## Decisions Made
- Source field uses string values (user/system/scraped) rather than int enum for JSON readability
- PublishScrapedStream deduplicates by exact URL match; normalized URL matching stays in scraper layer
- Auto-publish iterates all validated links each cycle; deduplication makes repeat calls no-ops
## Deviations from Plan
None - plan executed exactly as written.
## Issues Encountered
None
## User Setup Required
None - no external service configuration required.
## Next Phase Readiness
- Full auto-publish pipeline operational: scrape -> validate -> save scraped links -> auto-publish as Stream -> health checker monitors -> PublicStreams filters unhealthy
- Source field enables UI to distinguish stream origins in Phase 04 (UI polish)
- Ready for Phase 04 (UI polish) and Phase 05 (production hardening)
---
*Phase: 03-auto-publish-pipeline*
*Completed: 2026-02-17*
## Self-Check: PASSED

View file

@ -1,115 +0,0 @@
---
phase: 03-auto-publish-pipeline
verified: 2026-02-17T21:40:00Z
status: passed
score: 5/5 must-haves verified
re_verification: false
---
# Phase 03: Auto-Publish Pipeline Verification Report
**Phase Goal:** Scraped streams that pass validation and health checks appear on the public page automatically; dead streams disappear without admin action
**Verified:** 2026-02-17T21:40:00Z
**Status:** passed
**Re-verification:** No — initial verification
## Goal Achievement
### Observable Truths
| # | Truth | Status | Evidence |
|---|-------|--------|----------|
| 1 | Scraped streams that pass scraper validation appear on the public streams page without admin action | ✓ VERIFIED | PublishScrapedStream called in scraper.go:102 after validation. Sets Published=true, Source="scraped". PublicStreams() in streams.go:22 includes published streams. |
| 2 | Dead streams (unhealthy after 5 failures) are hidden from the public page automatically | ✓ VERIFIED | PublicStreams() at streams.go:29 calls HealthMap(). Lines 36-38 filter unhealthy streams. Health checker (Phase 02) marks streams unhealthy after 5 failures. |
| 3 | Auto-published streams have source='scraped' distinguishing them from user-submitted streams | ✓ VERIFIED | PublishScrapedStream sets Source="scraped" at streams.go:110. User submissions set Source="user" at server.go:173. Default streams set Source="system" at main.go:127. |
| 4 | Duplicate scraped URLs are not re-added as new Stream entries | ✓ VERIFIED | PublishScrapedStream deduplicates by URL at streams.go:95-98. Returns nil (no-op) if URL exists. |
| 5 | Existing user-submitted and system streams are not broken by the Source field addition | ✓ VERIFIED | AddStream updated with source parameter at streams.go:60. All call sites updated: server.go:173 ("user"), main.go:127 ("system"). Source field added to models.go:29. |
**Score:** 5/5 truths verified
### Required Artifacts
| Artifact | Expected | Status | Details |
|----------|----------|--------|---------|
| `internal/models/models.go` | Stream model with Source field | ✓ VERIFIED | Line 29: `Source string \`json:"source"\`` exists in Stream struct. Substantive: contains expected field with json tag. Wired: Used in streams.go, server.go, main.go. |
| `internal/store/streams.go` | PublishScrapedStream method for auto-publishing | ✓ VERIFIED | Lines 87-114: PublishScrapedStream method exists with URL deduplication (lines 95-98), stream creation with Source="scraped" (line 110), Published=true (line 109). Substantive: 28 lines of implementation. Wired: Called from scraper.go:102. |
| `internal/scraper/scraper.go` | Auto-publish call after validation | ✓ VERIFIED | Lines 100-109: Auto-publish loop iterates validated links, calls s.store.PublishScrapedStream(l.URL, l.Title). Substantive: 10 lines with error handling. Wired: PublishScrapedStream imported from store package. |
| `main.go` | Default streams with Source field set | ✓ VERIFIED | Line 127: `Source: "system"` in defaultStreams() function. Substantive: Source field populated on all default streams. Wired: Passed to st.SeedStreams() at main.go:42. |
### Key Link Verification
| From | To | Via | Status | Details |
|------|----|----|--------|---------|
| `internal/scraper/scraper.go` | `internal/store/streams.go` | s.store.PublishScrapedStream call in scrape() | ✓ WIRED | scraper.go:102 calls `s.store.PublishScrapedStream(l.URL, l.Title)`. Pattern `store\.PublishScrapedStream` found. Scraper has store dependency at scraper.go:14. Auto-publish happens after SaveScrapedLinks at line 95. |
| `internal/store/streams.go` | `internal/store/health.go` | PublicStreams calls HealthMap to filter unhealthy | ✓ WIRED | streams.go:29 calls `healthMap := s.HealthMap()`. health.go:27 defines HealthMap() method. Lines 36-38 in streams.go filter streams where healthMap marks URL as unhealthy. |
### Requirements Coverage
| Requirement | Source Plan | Description | Status | Evidence |
|-------------|-------------|-------------|--------|----------|
| AUTO-01 | 03-01-PLAN.md | Scraped streams that pass both scraper validation and initial health check are auto-published to the main streams page | ✓ SATISFIED | scraper.go:62 validateLinks uses HasVideoContent (same check as health checker). Lines 100-109 auto-publish validated links via PublishScrapedStream with Published=true. PublicStreams() includes published streams. |
| AUTO-02 | 03-01-PLAN.md | Dead streams (unhealthy after 5 failures) are dynamically removed from the public page without admin intervention | ✓ SATISFIED | PublicStreams() at streams.go:29-40 filters unhealthy streams using HealthMap. Health checker (Phase 02) marks streams unhealthy after 5 consecutive failures. No admin action needed. |
| AUTO-03 | 03-01-PLAN.md | Auto-published streams are distinguishable from user-submitted streams in the data model (source field) | ✓ SATISFIED | Stream model has Source field (models.go:29). PublishScrapedStream sets Source="scraped" (streams.go:110). User submissions set Source="user" (server.go:173). System defaults set Source="system" (main.go:127). |
**Orphaned requirements:** None. All requirements mapped to Phase 03 in REQUIREMENTS.md are covered by 03-01-PLAN.md.
### Anti-Patterns Found
**None.** Modified files scanned for anti-patterns:
| File | Scan Results |
|------|--------------|
| `internal/models/models.go` | No TODO/FIXME/placeholder comments. No empty implementations. |
| `internal/store/streams.go` | No TODO/FIXME/placeholder comments. PublishScrapedStream fully implemented with deduplication. |
| `internal/scraper/scraper.go` | No TODO/FIXME/placeholder comments. Auto-publish loop complete with error logging. |
| `internal/server/server.go` | No TODO/FIXME/placeholder comments. AddStream call updated with "user" source. |
| `main.go` | No TODO/FIXME/placeholder comments. Default streams set Source="system". |
**Commits verified:**
- `8869dc5`: feat(03-01): add Source field to Stream model and PublishScrapedStream store method
- `5b60f17`: feat(03-01): wire scraper to auto-publish validated links as streams
Both commits exist in git history.
### Human Verification Required
**None required.** All truths are verifiable programmatically through code inspection:
1. Auto-publish wiring: Grep confirms PublishScrapedStream called after validation
2. Dead stream filtering: PublicStreams() code shows HealthMap filtering
3. Source field distinction: Three distinct source values set in code
4. Deduplication: URL matching logic present in PublishScrapedStream
5. Backward compatibility: All AddStream call sites updated with source parameter
The auto-publish pipeline is a backend feature with no UI-specific behavior requiring human testing.
## Verification Summary
**All 5 observable truths verified.** The auto-publish pipeline is fully operational:
1. **Scraper validates** → links pass HasVideoContent check (validateLinks at scraper.go:62)
2. **Scraper auto-publishes** → PublishScrapedStream called for each validated link (scraper.go:100-109)
3. **Streams are created** → Source="scraped", Published=true, with URL deduplication (streams.go:87-114)
4. **Public API serves** → PublicStreams() returns published streams (streams.go:22-42)
5. **Health filtering** → HealthMap removes unhealthy streams from public view (streams.go:29, 36-38)
6. **Dead streams disappear** → No admin action needed; health checker marks unhealthy after 5 failures (Phase 02)
**Source field provenance:**
- `"scraped"`: Auto-published streams from scraper
- `"user"`: User-submitted streams via API
- `"system"`: Default seed streams from main.go
**Deduplication:** PublishScrapedStream checks existing streams by URL before adding new entries.
**Requirements:** All 3 phase requirements (AUTO-01, AUTO-02, AUTO-03) satisfied.
**Next phase readiness:** Phase 03 complete. Ready for Phase 04 (UI polish) and Phase 05 (production hardening).
---
_Verified: 2026-02-17T21:40:00Z_
_Verifier: Claude (gsd-verifier)_

View file

@ -1,183 +0,0 @@
---
phase: 04-video-extraction-native-playback
plan: 01
type: execute
wave: 1
depends_on: []
files_modified:
- internal/extractor/extractor.go
- internal/server/server.go
- go.mod
- go.sum
autonomous: true
requirements:
- EMBED-01
must_haves:
  truths:
    - "GET /api/streams/{id}/extract returns a JSON response with extracted video source URL and type when the stream page contains a video source"
    - "Extractor can find HLS .m3u8 URLs from <video>/<source> src attributes and script tag contents"
    - "Extractor can find DASH .mpd URLs from <video>/<source> src attributes and script tag contents"
    - "Extractor can find direct MP4/WebM URLs from <video>/<source> src attributes"
    - "Extractor can find video URLs from jwplayer, video.js, and hls.js setup calls in script tags"
    - "GET /api/streams/{id}/extract returns empty result (not error) when no video source is found"
  artifacts:
    - path: "internal/extractor/extractor.go"
      provides: "Video source URL extraction from HTML pages"
      contains: "func Extract"
    - path: "internal/server/server.go"
      provides: "API endpoint for video extraction"
      contains: "/api/streams/{id}/extract"
  key_links:
    - from: "internal/server/server.go"
      to: "internal/extractor/extractor.go"
      via: "handler calls extractor.Extract"
      pattern: "extractor\\.Extract"
    - from: "internal/server/server.go"
      to: "internal/store"
      via: "handler looks up stream by ID"
      pattern: "store.*stream.*id"
---
<objective>
Create a backend video source extractor that fetches a stream page, parses the HTML with golang.org/x/net/html, and extracts direct video source URLs (HLS, DASH, MP4/WebM). Expose this via a new API endpoint.
Purpose: Enable the frontend (Plan 02) to play streams natively in an HTML5 video player instead of loading the entire third-party page in an iframe.
Output: New `internal/extractor/` package and `GET /api/streams/{id}/extract` endpoint.
</objective>
<execution_context>
@/Users/viktorbarzin/.claude/get-shit-done/workflows/execute-plan.md
@/Users/viktorbarzin/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@internal/proxy/proxy.go
@internal/scraper/validate.go
@internal/server/server.go
@internal/models/models.go
@go.mod
</context>
<tasks>
<task type="auto">
<name>Task 1: Create video source extractor package</name>
<files>internal/extractor/extractor.go, go.mod, go.sum</files>
<action>
1. Add `golang.org/x/net` dependency: run `go get golang.org/x/net/html` from the project root.
2. Create `internal/extractor/extractor.go` with the following:
**Types:**
```go
type VideoSource struct {
	URL  string `json:"url"`
	Type string `json:"type"` // "hls", "dash", "mp4", "webm", "unknown"
}
```
**Main function:** `func Extract(client *http.Client, rawURL string) ([]VideoSource, error)`
- Fetch the page with a browser-like User-Agent (reuse the same UA string from proxy.go: `Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36...`)
- If the response Content-Type is a direct video type (video/mp4, video/webm, application/x-mpegurl, application/dash+xml, etc.), return it immediately as a VideoSource without parsing HTML. Use the same `videoContentTypes` logic from scraper/validate.go but inline it here.
- If the Content-Type is HTML, read the body (limit to 2MB like validate.go) and parse with `golang.org/x/net/html`.
- Extraction strategies (apply all, deduplicate results):
a. **DOM parsing** (`golang.org/x/net/html`): Walk the HTML node tree. For `<video>` elements, check `src` attribute. For `<source>` elements inside or outside `<video>`, check `src` attribute. For `<iframe>` elements, check `src` for .m3u8/.mpd URLs.
b. **Script tag regex extraction**: For each `<script>` element's text content, apply these regex patterns to find URLs:
- `.m3u8` URLs: `(?:"|')(https?://[^"'\s]+\.m3u8[^"'\s]*)`
- `.mpd` URLs: `(?:"|')(https?://[^"'\s]+\.mpd[^"'\s]*)`
- `.mp4` URLs: `(?:"|')(https?://[^"'\s]+\.mp4[^"'\s]*)`
- `.webm` URLs: `(?:"|')(https?://[^"'\s]+\.webm[^"'\s]*)`
- JWPlayer source: `file\s*:\s*["']([^"']+)` (captures the URL inside)
- video.js/hls.js src: `src\s*[:=]\s*["']([^"']+\.m3u8[^"']*)` and similar for .mpd
c. **Type classification**: Determine type from URL extension:
- `.m3u8` or Content-Type contains mpegurl -> "hls"
- `.mpd` or Content-Type contains dash -> "dash"
- `.mp4` -> "mp4"
- `.webm` -> "webm"
- Otherwise -> "unknown"
d. **Deduplication**: Return unique URLs only (use a map to track seen URLs).
e. **Priority ordering**: Return HLS sources first, then DASH, then direct video (mp4/webm). This helps the frontend pick the best option.
- If no video sources found, return empty slice (not nil), nil error.
- If fetch fails, return nil, error.
**Helper functions** (unexported):
- `classifyURL(u string) string` — returns "hls", "dash", "mp4", "webm", or "unknown"
- `extractFromDOM(doc *html.Node) []VideoSource` — walks DOM tree
- `extractFromScripts(doc *html.Node) []VideoSource` — extracts from script text content
- `isVideoURL(u string) bool` — returns true if URL looks like a video source (.m3u8, .mpd, .mp4, .webm)
Follow project conventions: log.Printf with "extractor:" prefix for logging. Limit body read to 2MB. Use 3-redirect limit matching proxy pattern.
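A minimal sketch of the `classifyURL` helper described above — illustrative only; the final implementation may also consult the response Content-Type as described in strategy (c):

```go
// classifyURL maps a video URL to its playback type by extension.
func classifyURL(u string) string {
	lower := strings.ToLower(u)
	// Strip the query string so "?token=..." does not hide the extension.
	if i := strings.IndexByte(lower, '?'); i >= 0 {
		lower = lower[:i]
	}
	switch {
	case strings.Contains(lower, ".m3u8"):
		return "hls"
	case strings.Contains(lower, ".mpd"):
		return "dash"
	case strings.HasSuffix(lower, ".mp4"):
		return "mp4"
	case strings.HasSuffix(lower, ".webm"):
		return "webm"
	default:
		return "unknown"
	}
}
```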
</action>
<verify>
Run `cd /Users/viktorbarzin/code/infra/modules/kubernetes/f1-stream/files && go build ./internal/extractor/` to confirm the package compiles. Then run `go vet ./internal/extractor/` to check for issues.
</verify>
<done>
`internal/extractor/extractor.go` exists, compiles, and exports `Extract(client, url) ([]VideoSource, error)` function. `golang.org/x/net` is in go.mod.
</done>
</task>
<task type="auto">
<name>Task 2: Add extract API endpoint to server</name>
<files>internal/server/server.go</files>
<action>
1. Add import for `"f1-stream/internal/extractor"` to server.go.
2. In `registerRoutes()`, add a new public endpoint (no auth required, but with standard middleware):
```go
s.mux.Handle("GET /api/streams/{id}/extract", wrapAll(s.handleExtractVideo))
```
Place it near the existing `GET /api/streams/public` route.
3. Create handler method `func (s *Server) handleExtractVideo(w http.ResponseWriter, r *http.Request)`:
- Get stream ID from `r.PathValue("id")`
- Look up the stream from store. Use `s.store.LoadStreams()` to find the stream by ID (there's no GetStreamByID method, so iterate). If not found, return 404 `{"error":"stream not found"}`.
- Only extract from published streams (if `!stream.Published`, return 404).
- Create an HTTP client with 15-second timeout and 3-redirect limit (matching proxy pattern).
- Call `extractor.Extract(client, stream.URL)`.
- If error, return 502 `{"error":"failed to extract video source"}` and log the error.
- Return JSON response:
```json
{"sources": [...], "stream_id": "xxx"}
```
Where `sources` is the `[]extractor.VideoSource` array. If empty, return `{"sources": [], "stream_id": "xxx"}`.
4. The handler should set appropriate cache headers: `Cache-Control: public, max-age=300` (5-minute cache since extraction is expensive). This prevents hammering the upstream site when multiple users view the same stream.
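A sketch of the handler shape these steps describe — treat the error bodies and the inline `writeErr` closure as illustrative choices, not the final implementation:

```go
// handleExtractVideo looks up a published stream by ID and returns any
// extracted video sources as JSON.
func (s *Server) handleExtractVideo(w http.ResponseWriter, r *http.Request) {
	writeErr := func(status int, msg string) {
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(status)
		fmt.Fprintf(w, `{"error":%q}`, msg)
	}

	id := r.PathValue("id")
	streams, err := s.store.LoadStreams()
	if err != nil {
		writeErr(http.StatusInternalServerError, "internal error")
		return
	}
	var target *models.Stream
	for i := range streams {
		if streams[i].ID == id {
			target = &streams[i]
			break
		}
	}
	// Unknown and unpublished streams both read as "not found".
	if target == nil || !target.Published {
		writeErr(http.StatusNotFound, "stream not found")
		return
	}

	// 15-second timeout, 3-redirect limit, matching the proxy pattern.
	client := &http.Client{
		Timeout: 15 * time.Second,
		CheckRedirect: func(req *http.Request, via []*http.Request) error {
			if len(via) >= 3 {
				return http.ErrUseLastResponse
			}
			return nil
		},
	}

	sources, err := extractor.Extract(client, target.URL)
	if err != nil {
		log.Printf("server: extract failed for stream %s: %v", id, err)
		writeErr(http.StatusBadGateway, "failed to extract video source")
		return
	}
	if sources == nil {
		sources = []extractor.VideoSource{} // empty array, not null, in JSON
	}

	w.Header().Set("Cache-Control", "public, max-age=300")
	w.Header().Set("Content-Type", "application/json")
	_ = json.NewEncoder(w).Encode(map[string]any{
		"sources":   sources,
		"stream_id": id,
	})
}
```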
</action>
<verify>
Run `cd /Users/viktorbarzin/code/infra/modules/kubernetes/f1-stream/files && go build ./...` to confirm full project compiles. Run `go vet ./...` to check for issues.
</verify>
<done>
`GET /api/streams/{id}/extract` endpoint exists in server.go, calls extractor.Extract, returns JSON with sources array, compiles successfully.
</done>
</task>
</tasks>
<verification>
1. `go build ./...` passes without errors
2. `go vet ./...` passes without issues
3. `internal/extractor/extractor.go` exists and exports `Extract` and `VideoSource`
4. `internal/server/server.go` has route `GET /api/streams/{id}/extract` registered
5. `golang.org/x/net` appears in go.mod
</verification>
<success_criteria>
- The extractor package can parse HTML and find video source URLs from DOM elements and script contents
- The API endpoint accepts a stream ID, fetches the page, runs extraction, and returns results
- Empty extraction returns empty sources array (not error)
- Full project compiles cleanly
</success_criteria>
<output>
After completion, create `.planning/phases/04-video-extraction-native-playback/04-01-SUMMARY.md`
</output>

View file

@ -1,114 +0,0 @@
---
phase: 04-video-extraction-native-playback
plan: 01
subsystem: api
tags: [html-parsing, video-extraction, hls, dash, golang-x-net]
# Dependency graph
requires:
- phase: 02-health-check-infrastructure
provides: HTTP client patterns (timeout, redirect limit, user-agent)
- phase: 03-auto-publish-pipeline
provides: Store with stream lookup (LoadStreams)
provides:
- Video source extractor package (internal/extractor)
- GET /api/streams/{id}/extract endpoint
- VideoSource struct with URL and type classification
affects: [04-02, frontend-video-player, native-playback]
# Tech tracking
tech-stack:
added: [golang.org/x/net/html]
patterns: [DOM tree walking, regex script extraction, content-type detection]
key-files:
created: [internal/extractor/extractor.go]
modified: [internal/server/server.go, go.mod, go.sum]
key-decisions:
- "DOM parsing with golang.org/x/net/html for structured element extraction"
- "Regex patterns for script tag video URL extraction (HLS, DASH, JWPlayer, video.js, hls.js)"
- "Priority ordering: HLS > DASH > MP4 > WebM for frontend source selection"
- "5-minute cache (Cache-Control: public, max-age=300) to reduce upstream load"
- "Empty sources array (not error) when no video found, to distinguish from fetch failures"
patterns-established:
- "Extractor pattern: multiple strategies (DOM + regex) with deduplication and priority sorting"
- "Direct content-type bypass: skip HTML parsing when response is already a video type"
requirements-completed: [EMBED-01]
# Metrics
duration: 3min
completed: 2026-02-17
---
# Phase 04 Plan 01: Video Source Extractor Summary
**HTML video source extractor with DOM parsing and script regex extraction, exposed via GET /api/streams/{id}/extract endpoint with 5-minute caching**
## Performance
- **Duration:** 3 min
- **Started:** 2026-02-17T21:42:49Z
- **Completed:** 2026-02-17T21:46:03Z
- **Tasks:** 2
- **Files modified:** 3
## Accomplishments
- Created video source extractor package that finds HLS, DASH, MP4, WebM URLs from HTML pages
- DOM parsing extracts URLs from `<video>`, `<source>`, and `<iframe>` elements
- Regex extraction finds video URLs from script tags including JWPlayer, video.js, and hls.js patterns
- API endpoint returns extracted sources with type classification and priority ordering
- Direct video content-type detection bypasses HTML parsing for efficiency
## Task Commits
Each task was committed atomically:
1. **Task 1: Create video source extractor package** - `74410e2` (feat)
2. **Task 2: Add extract API endpoint to server** - `bc9614e` (feat)
**Plan metadata:** (pending final docs commit)
## Files Created/Modified
- `internal/extractor/extractor.go` - Video source extraction from HTML pages with DOM and regex strategies
- `internal/server/server.go` - Added /api/streams/{id}/extract endpoint with cache headers
- `go.mod` - Added golang.org/x/net dependency
- `go.sum` - Updated dependency checksums
## Decisions Made
- Used golang.org/x/net/html for DOM parsing (consistent with plan, deferred from Phase 2 per project decisions)
- Implemented dual extraction strategy: structured DOM walking + regex script parsing for maximum coverage
- Priority ordering (HLS > DASH > MP4 > WebM) helps frontend pick best playback option automatically
- 5-minute Cache-Control header prevents hammering upstream sites when multiple users view same stream
- Return empty sources array (not error) when no video found -- caller can distinguish "no video" from "fetch failed"
- 15-second timeout and 3-redirect limit matching existing proxy/scraper patterns
## Deviations from Plan
None - plan executed exactly as written.
## Issues Encountered
- Go module cache permission error on default GOPATH; resolved by using temporary GOMODCACHE/GOPATH for build commands. This is a sandbox environment constraint, not a code issue.
## User Setup Required
None - no external service configuration required.
## Next Phase Readiness
- Extractor package ready for Plan 02 (frontend native video player)
- API endpoint returns structured JSON that frontend can consume to initialize HTML5 video playback
- HLS/DASH sources will need hls.js/dash.js on the frontend for browser playback
---
*Phase: 04-video-extraction-native-playback*
*Completed: 2026-02-17*
## Self-Check: PASSED
- FOUND: internal/extractor/extractor.go
- FOUND: internal/server/server.go
- FOUND: 04-01-SUMMARY.md
- FOUND: commit 74410e2
- FOUND: commit bc9614e

View file

@ -1,218 +0,0 @@
---
phase: 04-video-extraction-native-playback
plan: 02
type: execute
wave: 2
depends_on:
- 04-01
files_modified:
- static/index.html
- static/js/streams.js
autonomous: true
requirements:
- EMBED-02
must_haves:
truths:
- "When a stream has an extractable HLS source, the user sees a native video player on the app page instead of an iframe loading the third-party site"
- "When a stream has an extractable MP4/WebM source, the user sees a native HTML5 video element playing the stream"
- "When extraction fails or returns no sources, the user sees the existing iframe fallback (no regression)"
- "HLS streams play using hls.js library when the browser does not support HLS natively"
- "The video player has standard controls (play, pause, volume, fullscreen)"
artifacts:
- path: "static/index.html"
provides: "HLS.js library script tag"
contains: "hls.js"
- path: "static/js/streams.js"
provides: "Updated streamCard with native video player support"
contains: "extractVideo"
key_links:
- from: "static/js/streams.js"
to: "/api/streams/{id}/extract"
via: "fetch call to extract endpoint"
pattern: "api/streams.*extract"
- from: "static/js/streams.js"
to: "Hls"
via: "HLS.js library for .m3u8 playback"
pattern: "new Hls|Hls\\.isSupported"
---
<objective>
Update the frontend to attempt video extraction for each stream card and render a native HTML5 video player when a direct video source is found, falling back to the existing iframe approach when extraction fails.
Purpose: Users watch streams in a clean native player on the app's own page without loading the third-party page, reducing exposure to ads, popups, and framing issues.
Output: Updated `streams.js` with extraction-first rendering, HLS.js integration via CDN in `index.html`.
</objective>
<execution_context>
@/Users/viktorbarzin/.claude/get-shit-done/workflows/execute-plan.md
@/Users/viktorbarzin/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/04-video-extraction-native-playback/04-01-SUMMARY.md
@static/js/streams.js
@static/index.html
</context>
<tasks>
<task type="auto">
<name>Task 1: Add HLS.js and update streamCard for native video playback</name>
<files>static/index.html, static/js/streams.js</files>
<action>
**1. Add HLS.js to index.html:**
Add HLS.js from CDN before the existing script tags (before `utils.js`):
```html
<script src="https://cdn.jsdelivr.net/npm/hls.js@1/dist/hls.min.js"></script>
```
**2. Update `streamCard()` in streams.js:**
The current `streamCard()` function renders an iframe immediately. Change the approach:
a. The card initially renders with a loading state placeholder instead of an iframe:
```html
<div class="stream-card" data-stream-id="${stream.id}">
<div class="iframe-wrap" id="player-wrap-${stream.id}">
<div class="loading-overlay" id="loader-${stream.id}">
<div class="spinner"></div>
</div>
</div>
<div class="card-bar">
<span class="title">${escapeHtml(stream.title)}</span>
<div class="card-actions">
${externalBtn}
${deleteBtn}
</div>
</div>
</div>
```
b. After cards are rendered in the DOM (after `grid.innerHTML = ...`), call a new function `tryExtractVideos(streams)` that attempts extraction for each stream.
**3. Create `async function tryExtractVideos(streams)`:**
For each stream, call `tryExtractVideo(stream)` concurrently using `Promise.allSettled`.
**4. Create `async function tryExtractVideo(stream)`:**
- Fetch `GET /api/streams/${stream.id}/extract`
- If response is OK and `sources` array is non-empty:
- Pick the best source: prefer "hls" type, then "dash", then "mp4"/"webm"
- Call `renderNativePlayer(stream.id, source)` to replace the loading placeholder
- If response fails or sources is empty:
- Call `renderIframeFallback(stream.id, stream.url)` to show the existing iframe
**5. Create `function renderNativePlayer(streamId, source)`:**
Get the wrapper element `player-wrap-${streamId}`.
For HLS sources (source.type === "hls"):
```javascript
const video = document.createElement('video');
video.controls = true;
video.autoplay = false;
video.style.width = '100%';
video.style.height = '100%';
video.setAttribute('playsinline', '');
if (Hls.isSupported()) {
const hls = new Hls();
hls.loadSource(source.url);
hls.attachMedia(video);
} else if (video.canPlayType('application/vnd.apple.mpegurl')) {
// Native HLS support (Safari)
video.src = source.url;
}
```
For MP4/WebM sources:
```javascript
const video = document.createElement('video');
video.controls = true;
video.autoplay = false;
video.style.width = '100%';
video.style.height = '100%';
video.setAttribute('playsinline', '');
video.src = source.url;
```
For DASH sources (skip for now, just fall back to iframe — DASH requires dash.js which is heavier):
- Call `renderIframeFallback(streamId, stream.url)` instead.
Replace the loading overlay content with the video element. Remove the spinner. Add a "loaded" class to the loading overlay.
**6. Create `function renderIframeFallback(streamId, streamURL)`:**
Get the wrapper element and replace its innerHTML with the existing iframe markup:
```javascript
const wrap = document.getElementById(`player-wrap-${streamId}`);
if (!wrap) return;
wrap.innerHTML = `
<div class="loading-overlay" id="loader-${streamId}">
<div class="spinner"></div>
</div>
<iframe
id="iframe-${streamId}"
src="${escapeHtml(streamURL)}"
sandbox="allow-scripts allow-forms allow-presentation"
loading="lazy"
allowfullscreen
onload="document.getElementById('loader-${streamId}').classList.add('loaded')"
></iframe>
`;
```
**7. Update `loadPublicStreams()` and `loadMyStreams()`:**
After `grid.innerHTML = streams.map(...)`, add a `tryExtractVideos(streams)` call. The existing `sortStreamsByHealth` call should remain after it.
**Important:** Do NOT block the initial render. The cards show immediately with loading spinners. Extraction runs asynchronously and replaces the spinner with either a native player or an iframe as results come in.
**Error handling:** If the fetch to `/api/streams/{id}/extract` throws (network error, timeout), silently fall back to iframe. Log to console but do not show user-facing errors for extraction failures — the iframe fallback is the safety net.
</action>
<verify>
1. Open `static/index.html` in a text editor and confirm the HLS.js script tag is present before other JS scripts.
2. Open `static/js/streams.js` and confirm:
- `streamCard()` no longer renders an iframe directly
- `tryExtractVideos()` and `tryExtractVideo()` functions exist
- `renderNativePlayer()` and `renderIframeFallback()` functions exist
- `loadPublicStreams()` calls `tryExtractVideos()`
3. Run `cd /Users/viktorbarzin/code/infra/modules/kubernetes/f1-stream/files && go build ./...` to confirm the Go project still compiles (no Go changes, but verify no accidental edits).
</verify>
<done>
- HLS.js loaded via CDN in index.html
- Stream cards attempt extraction first, render native HTML5 video player for HLS/MP4/WebM sources
- Iframe fallback works when extraction fails or returns no sources
- No visual regression for streams without extractable video sources
</done>
</task>
</tasks>
<verification>
1. `static/index.html` includes HLS.js CDN script tag
2. `static/js/streams.js` has extraction-first card rendering logic
3. `streamCard()` renders placeholder, not iframe, as initial state
4. `tryExtractVideo()` calls `/api/streams/{id}/extract` and handles success/failure
5. `renderNativePlayer()` creates HTML5 `<video>` element with HLS.js for .m3u8 sources
6. `renderIframeFallback()` creates standard iframe for non-extractable streams
7. Full project compiles: `go build ./...`
</verification>
<success_criteria>
- Streams with extractable video sources render in a native HTML5 video player with controls
- HLS streams play via hls.js (or native HLS on Safari)
- Streams without extractable sources fall back to iframe rendering (existing behavior preserved)
- No user-facing errors when extraction fails — silent fallback to iframe
- Video player has play, pause, volume, and fullscreen controls
</success_criteria>
<output>
After completion, create `.planning/phases/04-video-extraction-native-playback/04-02-SUMMARY.md`
</output>

View file

@ -1,107 +0,0 @@
---
phase: 04-video-extraction-native-playback
plan: 02
subsystem: ui
tags: [hls-js, html5-video, native-playback, video-extraction, frontend]
# Dependency graph
requires:
- phase: 04-video-extraction-native-playback
provides: Video source extractor API endpoint (GET /api/streams/{id}/extract)
provides:
- Extraction-first stream card rendering with native HTML5 video player
- HLS.js integration for .m3u8 playback in non-Safari browsers
- Iframe fallback for streams without extractable video sources
affects: [05-polish-monitoring, frontend, streaming-experience]
# Tech tracking
tech-stack:
added: [hls.js@1 (CDN)]
patterns: [extraction-first rendering, progressive enhancement with fallback, priority-based source selection]
key-files:
created: []
modified: [static/index.html, static/js/streams.js]
key-decisions:
- "HLS.js loaded from CDN (jsdelivr) to avoid bundling complexity"
- "Extraction runs async after card render -- loading spinner shows immediately, player replaces it"
- "DASH sources fall back to iframe (dash.js too heavy for current scope)"
- "pickBestSource priority: HLS > DASH > MP4 > WebM matches backend ordering"
- "Silent console.log on extraction failure -- no user-facing errors for extraction issues"
patterns-established:
- "Progressive enhancement pattern: render placeholder, attempt extraction, upgrade to native or fall back to iframe"
- "Promise.allSettled for concurrent extraction across all stream cards"
requirements-completed: [EMBED-02]
# Metrics
duration: 2min
completed: 2026-02-17
---
# Phase 04 Plan 02: Frontend Native Video Playback Summary
**Extraction-first stream card rendering with HLS.js integration and iframe fallback for native HTML5 video playback**
## Performance
- **Duration:** 2 min
- **Started:** 2026-02-17T21:48:20Z
- **Completed:** 2026-02-17T21:50:03Z
- **Tasks:** 1
- **Files modified:** 2
## Accomplishments
- Stream cards now attempt video extraction before falling back to iframe
- Native HTML5 video player renders for HLS, MP4, and WebM sources with standard controls
- HLS.js handles .m3u8 streams in non-Safari browsers; Safari uses native HLS support
- Iframe fallback preserves existing behavior for streams without extractable sources
- Loading spinners provide visual feedback during async extraction
## Task Commits
Each task was committed atomically:
1. **Task 1: Add HLS.js and update streamCard for native video playback** - `2a40af9` (feat)
**Plan metadata:** (pending final docs commit)
## Files Created/Modified
- `static/index.html` - Added HLS.js CDN script tag before other JS scripts
- `static/js/streams.js` - Extraction-first rendering: streamCard renders placeholder, tryExtractVideos/tryExtractVideo call extract API, renderNativePlayer creates HTML5 video element, renderIframeFallback preserves existing iframe approach
## Decisions Made
- HLS.js loaded from jsDelivr CDN rather than self-hosted -- avoids build tooling while keeping the library current
- DASH sources intentionally fall back to iframe -- dash.js is heavier and DASH is lower priority than HLS
- Extraction errors logged to console only -- user sees iframe fallback seamlessly, no error UI needed
- pickBestSource uses same priority ordering (HLS > DASH > MP4 > WebM) established in backend extractor
## Deviations from Plan
None - plan executed exactly as written.
## Issues Encountered
- Go module cache sandbox permission error during build verification; resolved with temporary GOPATH (same workaround as 04-01, environment constraint only)
## User Setup Required
None - no external service configuration required.
## Next Phase Readiness
- Phase 04 (Video Extraction & Native Playback) is now complete
- Streams with extractable video sources play in native HTML5 player
- Streams without extractable sources continue to work via iframe fallback
- Ready for Phase 05 (Polish & Monitoring) if planned
---
*Phase: 04-video-extraction-native-playback*
*Completed: 2026-02-17*
## Self-Check: PASSED
- FOUND: static/index.html
- FOUND: static/js/streams.js
- FOUND: 04-02-SUMMARY.md
- FOUND: commit 2a40af9

View file

@ -1,108 +0,0 @@
---
phase: 04-video-extraction-native-playback
verified: 2026-02-17T22:00:00Z
status: passed
score: 11/11 must-haves verified
re_verification: false
---
# Phase 4: Video Extraction and Native Playback Verification Report
**Phase Goal:** When a stream URL contains an extractable video source, users watch it in a clean native HTML5 player instead of loading the third-party page
**Verified:** 2026-02-17T22:00:00Z
**Status:** passed
**Re-verification:** No — initial verification
## Goal Achievement
### Observable Truths
| # | Truth | Status | Evidence |
|---|-------|--------|----------|
| 1 | GET /api/streams/{id}/extract returns a JSON response with extracted video source URL and type when the stream page contains a video source | ✓ VERIFIED | Endpoint exists in server.go:83, handler at server.go:242-295 returns JSON with sources array and stream_id |
| 2 | Extractor can find HLS .m3u8 URLs from video/source src attributes and script tag contents | ✓ VERIFIED | DOM extraction at extractor.go:166-192, regex patterns at extractor.go:196,201, matches .m3u8 in video/source/iframe src |
| 3 | Extractor can find DASH .mpd URLs from video/source src attributes and script tag contents | ✓ VERIFIED | DOM extraction at extractor.go:166-192, regex patterns at extractor.go:197,202, matches .mpd in video/source/iframe src |
| 4 | Extractor can find direct MP4/WebM URLs from video/source src attributes | ✓ VERIFIED | DOM extraction at extractor.go:166-192, regex patterns at extractor.go:198-199, matches .mp4/.webm in video/source/iframe src |
| 5 | Extractor can find video URLs from jwplayer, video.js, and hls.js setup calls in script tags | ✓ VERIFIED | Regex patterns at extractor.go:200-202 for JWPlayer (file:), hls.js/video.js (src:=), script extraction at extractor.go:206-223 |
| 6 | GET /api/streams/{id}/extract returns empty result (not error) when no video source is found | ✓ VERIFIED | Returns []VideoSource{} at extractor.go:66,306 with nil error when no sources found |
| 7 | When a stream has an extractable HLS source, the user sees a native video player on the app page instead of an iframe loading the third-party site | ✓ VERIFIED | tryExtractVideo at streams.js:172-189 fetches extract endpoint, renderNativePlayer at streams.js:200-228 creates HTML5 video element |
| 8 | When a stream has an extractable MP4/WebM source, the user sees a native HTML5 video element playing the stream | ✓ VERIFIED | renderNativePlayer at streams.js:200-228 handles mp4/webm by setting video.src directly (line 223) |
| 9 | When extraction fails or returns no sources, the user sees the existing iframe fallback (no regression) | ✓ VERIFIED | tryExtractVideo calls renderIframeFallback at streams.js:188 on error/empty sources, renderIframeFallback at streams.js:230-246 creates iframe element |
| 10 | HLS streams play using hls.js library when the browser does not support HLS natively | ✓ VERIFIED | HLS.js loaded from CDN at index.html:162, renderNativePlayer checks Hls.isSupported() at streams.js:212, creates new Hls() at streams.js:213-215, Safari native fallback at streams.js:216-217 |
| 11 | The video player has standard controls (play, pause, volume, fullscreen) | ✓ VERIFIED | video.controls = true at streams.js:205, HTML5 video element provides standard browser controls including play, pause, volume, fullscreen |
**Score:** 11/11 truths verified
### Required Artifacts
| Artifact | Expected | Status | Details |
|----------|----------|--------|---------|
| `internal/extractor/extractor.go` | Video source URL extraction from HTML pages | ✓ VERIFIED | 326 lines, exports Extract function and VideoSource type, implements DOM parsing with golang.org/x/net/html and regex extraction from script tags |
| `internal/server/server.go` | API endpoint for video extraction | ✓ VERIFIED | Route registered at line 83, handler at lines 242-295, calls extractor.Extract, returns JSON with sources array and Cache-Control headers |
| `static/index.html` | HLS.js library script tag | ✓ VERIFIED | HLS.js CDN script tag at line 162 before other JS scripts |
| `static/js/streams.js` | Updated streamCard with native video player support | ✓ VERIFIED | 398 lines total, includes tryExtractVideos (168-170), tryExtractVideo (172-189), renderNativePlayer (200-228), renderIframeFallback (230-246), pickBestSource (191-198) |
### Key Link Verification
| From | To | Via | Status | Details |
|------|----|----|--------|---------|
| internal/server/server.go | internal/extractor/extractor.go | handler calls extractor.Extract | ✓ WIRED | Import at server.go:13, call at server.go:282 with client and streamURL parameters |
| internal/server/server.go | internal/store | handler looks up stream by ID | ✓ WIRED | Import at server.go:16, LoadStreams() call at server.go:246, iterates to find stream by ID at lines 255-264 |
| static/js/streams.js | /api/streams/{id}/extract | fetch call to extract endpoint | ✓ WIRED | fetch call at streams.js:174 with template literal for stream ID, response parsed as JSON at line 176 |
| static/js/streams.js | Hls | HLS.js library for .m3u8 playback | ✓ WIRED | Hls.isSupported() check at streams.js:212, new Hls() instantiation at streams.js:213, loadSource/attachMedia at streams.js:214-215 |
### Requirements Coverage
| Requirement | Source Plan | Description | Status | Evidence |
|-------------|------------|-------------|--------|----------|
| EMBED-01 | 04-01-PLAN.md | Proxy fetches stream page and attempts to extract direct video source URL (HLS .m3u8, DASH .mpd, direct MP4/WebM, or embedded video player source) | ✓ SATISFIED | Extractor package at internal/extractor/extractor.go implements all extraction types: HLS (lines 107,117,147,196,201,229), DASH (lines 109,120,148,197,202,235), MP4 (lines 111,149,198), WebM (lines 113,150,199), JWPlayer/video.js/hls.js patterns (lines 200-202). Server-side extraction via handleExtractVideo at server.go:242-295 |
| EMBED-02 | 04-02-PLAN.md | When direct video source is found, render it in a minimal HTML5 video player on the app's own page (no third-party page loaded) | ✓ SATISFIED | renderNativePlayer at streams.js:200-228 creates HTML5 video element with controls=true (line 205), integrates HLS.js for .m3u8 sources (lines 212-215), sets src directly for mp4/webm (line 223), replaces loading placeholder in player-wrap div preventing third-party page load |
### Anti-Patterns Found
None detected.
Scanned files:
- `internal/extractor/extractor.go`: No TODO/FIXME/placeholder comments, no empty implementations, substantive extraction logic
- `internal/server/server.go`: Handler implementation complete, proper error handling, cache headers set
- `static/js/streams.js`: No TODO/FIXME/placeholder comments, complete extraction-first rendering logic
- `static/index.html`: HLS.js script tag present, no issues
### Human Verification Required
None. All verification completed programmatically.
### Phase Summary
Phase 4 successfully delivers video extraction and native playback capability. The backend extractor can identify HLS, DASH, MP4, and WebM sources from both DOM elements and script tags. The frontend attempts extraction first and upgrades to a native HTML5 video player when sources are found, falling back to iframe rendering when extraction fails or returns no results.
**Key accomplishments:**
- Server-side video source extraction with multiple strategies (DOM parsing + regex script extraction)
- Native HTML5 video player with HLS.js integration for .m3u8 streams
- Progressive enhancement pattern: render placeholder, attempt extraction, upgrade or fallback
- No breaking changes to existing iframe fallback behavior
- 5-minute cache on extraction endpoint to reduce upstream load
- Priority-based source selection (HLS > DASH > MP4 > WebM)
**Technical quality:**
- All artifacts exist, substantive, and properly wired
- No anti-patterns detected
- Consistent error handling with silent fallback to iframe
- User-Agent and timeout patterns match existing proxy/scraper conventions
- Proper use of golang.org/x/net/html for DOM parsing
- HLS.js loaded from CDN with browser capability detection
**Requirements fulfilled:**
- EMBED-01: Video source extraction from stream pages ✓
- EMBED-02: Native HTML5 video player rendering ✓
Phase goal achieved. Users can now watch streams with extractable video sources in a clean native player without loading third-party pages.
---
_Verified: 2026-02-17T22:00:00Z_
_Verifier: Claude (gsd-verifier)_

View file

@ -1,147 +0,0 @@
---
phase: 05-sandbox-proxy-hardening
plan: 01
type: execute
wave: 1
depends_on: []
files_modified: [internal/proxy/sanitize.go, internal/proxy/blocklist.go, internal/proxy/proxy.go, internal/server/server.go]
autonomous: true
requirements: [EMBED-06, EMBED-07, EMBED-08]
must_haves:
truths:
- "Proxied HTML content has known ad/tracker scripts and domains stripped before serving"
- "Relative URLs in proxied content are rewritten to route through the proxy"
- "All proxied content is served with strict CSP headers scoped to the sandbox context"
artifacts:
- path: "internal/proxy/sanitize.go"
provides: "HTML sanitizer that strips ads, rewrites URLs, and adds CSP"
contains: "func Sanitize"
- path: "internal/proxy/blocklist.go"
provides: "Ad/tracker domain blocklist for script filtering"
contains: "blockedDomains"
- path: "internal/proxy/proxy.go"
provides: "New /proxy/sandbox endpoint serving sanitized content"
contains: "ServeSandbox"
- path: "internal/server/server.go"
provides: "Route registration for sandbox proxy endpoint"
contains: "proxy/sandbox"
key_links:
- from: "internal/server/server.go"
to: "internal/proxy/proxy.go"
via: "route registration for /proxy/sandbox"
pattern: "proxy/sandbox.*ServeSandbox"
- from: "internal/proxy/proxy.go"
to: "internal/proxy/sanitize.go"
via: "ServeSandbox calls Sanitize"
pattern: "Sanitize\\("
---
<objective>
Backend proxy hardening: sanitize proxied HTML content by stripping ad/tracker scripts, rewriting relative URLs through the proxy, and serving with strict CSP headers.
Purpose: When the frontend shadow DOM sandbox fetches proxied page content, the backend must deliver clean HTML free of ad scripts with working sub-resources and strict content security policy.
Output: A new `/proxy/sandbox` endpoint that serves sanitized HTML for shadow DOM injection.
</objective>
<execution_context>
@/Users/viktorbarzin/.claude/get-shit-done/workflows/execute-plan.md
@/Users/viktorbarzin/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@internal/proxy/proxy.go
@internal/server/server.go
@internal/extractor/extractor.go
</context>
<tasks>
<task type="auto">
<name>Task 1: Create HTML sanitizer with ad/tracker stripping and URL rewriting</name>
<files>internal/proxy/sanitize.go, internal/proxy/blocklist.go</files>
<action>
Create `internal/proxy/blocklist.go`:
- Define a `var blockedDomains = map[string]bool{...}` containing well-known ad/tracker domains. Include at least 30-40 entries covering: doubleclick.net, googlesyndication.com, googleadservices.com, google-analytics.com, facebook.net, connect.facebook.net, adservice.google.com, pagead2.googlesyndication.com, cdn.taboola.com, cdn.outbrain.com, ads.yahoo.com, amazon-adsystem.com, adsrvr.org, criteo.com, quantserve.com, scorecardresearch.com, serving-sys.com, rubiconproject.com, pubmatic.com, moatads.com, chartbeat.com, newrelic.com, nr-data.net, hotjar.com, mixpanel.com, segment.com, amplitude.com, popads.net, popcash.net, propellerads.com, adsterra.com, exoclick.com, juicyads.com, trafficjunky.com, etc.
- Define `func IsBlockedDomain(host string) bool` that checks the hostname and its parent domains against the blocklist. For example, if host is "ads.example.com" and "example.com" is blocked, return true. Walk up the domain labels.
Create `internal/proxy/sanitize.go`:
- Import `golang.org/x/net/html` (already in go.mod).
- Define `func Sanitize(doc *html.Node, baseURL *url.URL, proxyPrefix string) string` that:
1. Walks the DOM tree and removes `<script>` elements whose `src` attribute hostname matches `IsBlockedDomain`. Remove the entire node. Inline scripts (no src) are kept for now.
2. Removes `<link>` elements with `rel="stylesheet"` or `rel="preload"` whose `href` hostname matches `IsBlockedDomain`.
3. Removes `<img>`, `<iframe>`, `<object>`, `<embed>` elements whose `src`/`data` attribute hostname matches `IsBlockedDomain`.
4. Rewrites relative URLs on `src`, `href`, `action`, `poster`, `data` attributes: resolve against `baseURL`, then prefix with `proxyPrefix + "?url=" + url.QueryEscape(resolved)`. Skip absolute `http://` and `https://` URLs; for protocol-relative `//` URLs, resolve them with the base scheme first and then rewrite them like other relative URLs.
5. Returns the rendered HTML as a string using `html.Render`.
- Define helper `func removeNode(n *html.Node)` that detaches a node from its parent.
- Define helper `func resolveURL(raw string, base *url.URL) string` that resolves a relative URL against the base.
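A compact sketch of the two helpers most easily gotten wrong, the parent-domain walk-up and relative-URL resolution; the blocklist here is truncated to two entries for illustration:
```go
package proxy

import (
	"net/url"
	"strings"
)

// blockedDomains is truncated here for illustration; the real file
// carries the full 30+ entry list from the step above.
var blockedDomains = map[string]bool{
	"doubleclick.net":       true,
	"googlesyndication.com": true,
	// ...
}

// IsBlockedDomain reports whether host or any parent domain is blocked,
// so "ads.example.com" matches a blocklist entry for "example.com".
func IsBlockedDomain(host string) bool {
	host = strings.ToLower(strings.TrimSuffix(host, "."))
	labels := strings.Split(host, ".")
	// Walk up: check the host itself, then each parent domain,
	// stopping before the bare TLD.
	for i := 0; i < len(labels)-1; i++ {
		if blockedDomains[strings.Join(labels[i:], ".")] {
			return true
		}
	}
	return false
}

// resolveURL resolves a possibly relative raw URL against base;
// protocol-relative "//host/path" inputs pick up the base scheme.
func resolveURL(raw string, base *url.URL) string {
	ref, err := url.Parse(raw)
	if err != nil {
		return raw
	}
	return base.ResolveReference(ref).String()
}
```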
</action>
<verify>
`cd /Users/viktorbarzin/code/infra/modules/kubernetes/f1-stream/files && go build ./internal/proxy/...` compiles without errors. Verify that `Sanitize` function signature exists and `IsBlockedDomain` handles parent domain lookups.
</verify>
<done>
sanitize.go exports Sanitize function; blocklist.go exports IsBlockedDomain with 30+ ad/tracker domains; relative URLs are rewritten through proxy; blocked domain scripts/links/iframes are removed from the DOM tree.
</done>
</task>
<task type="auto">
<name>Task 2: Add ServeSandbox endpoint with CSP headers and wire route</name>
<files>internal/proxy/proxy.go, internal/server/server.go</files>
<action>
In `internal/proxy/proxy.go`:
- Add method `func (p *Proxy) ServeSandbox(w http.ResponseWriter, r *http.Request)`:
1. Same URL validation, rate limiting, and private-host checks as existing `ServeHTTP`.
2. Fetch the remote URL with same client and user-agent.
3. Read body (limited to `maxBodySize`).
4. Parse content type -- if not HTML, proxy as-is (for sub-resources like CSS, images, fonts). For non-HTML: set the original Content-Type, copy the body, add CSP header, and return.
5. For HTML content: parse with `html.Parse`, call `Sanitize(doc, parsedURL, "/proxy/sandbox")`, render result.
6. Set strict CSP headers on the response:
```
Content-Security-Policy: default-src 'self'; script-src 'unsafe-inline'; style-src 'self' 'unsafe-inline'; img-src * data: blob:; media-src * data: blob:; connect-src *; frame-src 'none'; object-src 'none'; base-uri 'none';
```
This allows inline scripts (needed for players), blocks frames and objects, and restricts base-uri. img-src, media-src, and connect-src are allowed broadly since video sources can come from anywhere.
7. Set `Content-Type: text/html; charset=utf-8`.
8. Do NOT inject `<base>` tag (unlike the regular proxy) -- URL rewriting handles relative URLs.
9. Do NOT copy X-Frame-Options or upstream CSP headers.
In `internal/server/server.go`:
- In `registerRoutes`, add a new route for the sandbox proxy:
```go
s.mux.HandleFunc("GET /proxy/sandbox", s.proxy.ServeSandbox)
```
Place it after the existing `GET /proxy` route.
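For reference, a sketch of the HTML vs. non-HTML branch at the core of `ServeSandbox`; the surrounding fetch and validation plumbing (`resp`, `body`, `parsedURL`, `maxBodySize`) is assumed to exist per the steps above:
```go
// Inside ServeSandbox, after fetching resp and reading body
// (capped at maxBodySize):
const sandboxCSP = "default-src 'self'; script-src 'unsafe-inline'; " +
	"style-src 'self' 'unsafe-inline'; img-src * data: blob:; " +
	"media-src * data: blob:; connect-src *; frame-src 'none'; " +
	"object-src 'none'; base-uri 'none';"

w.Header().Set("Content-Security-Policy", sandboxCSP)

ct := resp.Header.Get("Content-Type")
if !strings.Contains(strings.ToLower(ct), "text/html") {
	// Sub-resources (CSS, images, fonts) pass through with CSP only.
	w.Header().Set("Content-Type", ct)
	w.Write(body)
	return
}

doc, err := html.Parse(bytes.NewReader(body))
if err != nil {
	http.Error(w, "bad upstream html", http.StatusBadGateway)
	return
}
w.Header().Set("Content-Type", "text/html; charset=utf-8")
io.WriteString(w, Sanitize(doc, parsedURL, "/proxy/sandbox"))
```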
</action>
<verify>
`cd /Users/viktorbarzin/code/infra/modules/kubernetes/f1-stream/files && go build ./...` compiles. Verify the route is registered and `ServeSandbox` sets Content-Security-Policy header.
</verify>
<done>
New `/proxy/sandbox` endpoint serves sanitized HTML with strict CSP headers. Non-HTML sub-resources proxied with CSP. Route registered in server.go. Ad scripts stripped, relative URLs rewritten, CSP enforced.
</done>
</task>
</tasks>
<verification>
1. `go build ./...` passes with no errors
2. `grep -r "ServeSandbox" internal/` shows the method and route registration
3. `grep -r "Content-Security-Policy" internal/proxy/` shows CSP header being set
4. `grep -r "IsBlockedDomain" internal/proxy/` shows blocklist integration in sanitizer
5. `grep -c "true" internal/proxy/blocklist.go` shows 30+ blocked domains
</verification>
<success_criteria>
- `/proxy/sandbox?url=...` endpoint exists and compiles
- HTML content is sanitized: ad scripts removed, relative URLs rewritten through proxy
- Strict CSP headers set on all proxied content
- Non-HTML resources proxied through with CSP headers
- Blocklist contains 30+ known ad/tracker domains
</success_criteria>
<output>
After completion, create `.planning/phases/05-sandbox-proxy-hardening/05-01-SUMMARY.md`
</output>

View file

@ -1,108 +0,0 @@
---
phase: 05-sandbox-proxy-hardening
plan: 01
subsystem: proxy
tags: [html-sanitizer, csp, ad-blocker, url-rewrite, golang-x-net-html]
# Dependency graph
requires:
- phase: 04-video-extraction-native-playback
provides: "Proxy infrastructure and html parsing with golang.org/x/net/html"
provides:
- "HTML sanitizer that strips ad/tracker scripts from proxied content"
- "Ad/tracker domain blocklist with 50+ entries and parent-domain lookup"
- "/proxy/sandbox endpoint serving sanitized HTML with strict CSP"
- "Relative URL rewriting through proxy for sub-resources"
affects: [05-02-PLAN, frontend-sandbox]
# Tech tracking
tech-stack:
added: []
patterns: [DOM-walking sanitizer, domain blocklist with parent lookup, CSP header injection]
key-files:
created:
- internal/proxy/sanitize.go
- internal/proxy/blocklist.go
modified:
- internal/proxy/proxy.go
- internal/server/server.go
key-decisions:
- "50+ ad/tracker domains in blocklist with parent-domain walk-up matching"
- "Inline scripts kept (needed for video players), blocked scripts removed by domain"
- "CSP allows img/media/connect broadly since video sources come from arbitrary origins"
- "Non-HTML sub-resources proxied as-is with CSP headers"
patterns-established:
- "DOM-walking sanitizer pattern: collect nodes to remove, then detach (avoid mutation during walk)"
- "Blocklist with parent-domain lookup: check host then walk up domain labels"
requirements-completed: [EMBED-06, EMBED-07, EMBED-08]
# Metrics
duration: 2min
completed: 2026-02-17
---
# Phase 5 Plan 1: Sandbox Proxy Hardening Summary
**Backend proxy sanitizer stripping 50+ ad/tracker domains, rewriting relative URLs through /proxy/sandbox, and enforcing strict CSP headers**
## Performance
- **Duration:** 2 min
- **Started:** 2026-02-17T22:00:20Z
- **Completed:** 2026-02-17T22:02:30Z
- **Tasks:** 2
- **Files modified:** 4
## Accomplishments
- HTML sanitizer that walks DOM tree removing scripts, links, images, iframes, objects, and embeds from blocked ad/tracker domains
- Domain blocklist with 50+ entries and parent-domain walk-up matching (e.g., ads.doubleclick.net matches doubleclick.net)
- New `/proxy/sandbox` endpoint serving sanitized HTML with strict CSP headers
- Relative and protocol-relative URL rewriting through the proxy for sub-resources
- Non-HTML content (CSS, images, fonts) proxied as-is with CSP headers applied
## Task Commits
Each task was committed atomically:
1. **Task 1: Create HTML sanitizer with ad/tracker stripping and URL rewriting** - `c0d545e` (feat)
2. **Task 2: Add ServeSandbox endpoint with CSP headers and wire route** - `322ff4d` (feat)
## Files Created/Modified
- `internal/proxy/blocklist.go` - Ad/tracker domain blocklist with 50+ domains and IsBlockedDomain with parent-domain lookup
- `internal/proxy/sanitize.go` - HTML sanitizer: strips blocked elements, rewrites relative URLs through proxy
- `internal/proxy/proxy.go` - ServeSandbox endpoint with CSP headers, HTML parsing, and sanitization
- `internal/server/server.go` - Route registration for GET /proxy/sandbox
## Decisions Made
- Kept inline scripts (no src attribute) because video players need them; only strip scripts with blocked-domain src
- CSP policy allows `script-src 'unsafe-inline'` for player compatibility while blocking frames and objects
- Images, media, and connect-src allowed broadly (`*`) since video sources come from arbitrary CDN origins
- Non-HTML resources get CSP headers but no body transformation (CSS/images/fonts pass through)
- Parent-domain walk-up in blocklist: blocking "example.com" also blocks "ads.example.com" and deeper subdomains
## Deviations from Plan
None - plan executed exactly as written.
## Issues Encountered
None
## User Setup Required
None - no external service configuration required.
## Next Phase Readiness
- Sandbox proxy endpoint ready for frontend shadow DOM integration (05-02)
- `/proxy/sandbox?url=...` serves clean HTML suitable for shadow DOM injection
- CSP headers prevent iframe embedding and object injection from sanitized content
---
*Phase: 05-sandbox-proxy-hardening*
*Completed: 2026-02-17*
## Self-Check: PASSED
All files verified present. All commits verified in history.

View file

@ -1,240 +0,0 @@
---
phase: 05-sandbox-proxy-hardening
plan: 02
type: execute
wave: 2
depends_on: [05-01]
files_modified: [static/js/streams.js]
autonomous: true
requirements: [EMBED-03, EMBED-04, EMBED-05]
must_haves:
truths:
- "When video extraction fails, the proxied page renders inside a shadow DOM sandbox instead of a raw iframe"
- "The sandbox blocks window.open, window.top navigation, popup creation, and alert/confirm/prompt"
- "The sandbox prevents proxied content from accessing parent page cookies and localStorage"
artifacts:
- path: "static/js/streams.js"
provides: "Shadow DOM sandbox fallback replacing iframe fallback"
contains: "attachShadow"
key_links:
- from: "static/js/streams.js"
to: "/proxy/sandbox"
via: "fetch call to get sanitized HTML for shadow DOM injection"
pattern: "fetch.*proxy/sandbox"
- from: "static/js/streams.js"
to: "static/js/streams.js"
via: "tryExtractVideo falls back to renderSandboxFallback"
pattern: "renderSandboxFallback"
---
<objective>
Frontend shadow DOM sandbox: replace the iframe fallback with a shadow DOM sandbox that fetches sanitized proxied content and injects it with dangerous API overrides blocked.
Purpose: When direct video extraction fails, users see the proxied page in a secure sandbox that prevents popups, navigation hijacking, cookie theft, and dialog spam -- while still rendering the page content.
Output: Updated streams.js with shadow DOM sandbox fallback replacing the iframe fallback.
</objective>
<execution_context>
@/Users/viktorbarzin/.claude/get-shit-done/workflows/execute-plan.md
@/Users/viktorbarzin/.claude/get-shit-done/templates/summary.md
</execution_context>
<context>
@.planning/PROJECT.md
@.planning/ROADMAP.md
@.planning/STATE.md
@.planning/phases/05-sandbox-proxy-hardening/05-01-SUMMARY.md
@static/js/streams.js
@static/index.html
</context>
<tasks>
<task type="auto">
<name>Task 1: Replace iframe fallback with shadow DOM sandbox</name>
<files>static/js/streams.js</files>
<action>
Replace the `renderIframeFallback` function with a new `renderSandboxFallback` function. Update `tryExtractVideo` to call `renderSandboxFallback` instead of `renderIframeFallback`.
**renderSandboxFallback(streamId, streamURL):**
1. Get the player wrap element: `document.getElementById('player-wrap-' + streamId)`.
2. Create a container div for the sandbox: `const container = document.createElement('div')`. Style it: `width: 100%; height: 100%; overflow: auto; position: relative;`.
3. Attach a shadow DOM: `const shadow = container.attachShadow({ mode: 'closed' })`. Using 'closed' mode prevents external JS from accessing the shadow root.
4. Fetch sanitized content from the backend: `fetch('/proxy/sandbox?url=' + encodeURIComponent(streamURL))`. If the fetch fails, fall back to a simple link to open the stream directly (same pattern as the current proxy error fallback).
5. Get the response text (sanitized HTML).
6. Create a sandboxing script that will be prepended to the shadow DOM content. This script overrides dangerous browser APIs before the proxied content runs (note: a shadow root does not get its own JavaScript global, so these overrides patch the page's shared `window` and `document`):
```javascript
const sandboxScript = `
<script>
(function() {
// Block window.open
window.open = function() { return null; };
// Block popups
window.showModalDialog = function() {};
// Block alert/confirm/prompt
window.alert = function() {};
window.confirm = function() { return false; };
window.prompt = function() { return null; };
// Block top-frame navigation
try {
Object.defineProperty(window, 'top', { get: function() { return window; }, configurable: false });
} catch(e) {}
try {
Object.defineProperty(window, 'parent', { get: function() { return window; }, configurable: false });
} catch(e) {}
// Block location changes to parent
try {
Object.defineProperty(window, 'location', {
get: function() { return undefined; },
set: function() {},
configurable: false
});
} catch(e) {}
// Block cookie access
try {
Object.defineProperty(document, 'cookie', {
get: function() { return ''; },
set: function() {},
configurable: false
});
} catch(e) {}
// Block localStorage/sessionStorage access
try {
Object.defineProperty(window, 'localStorage', { get: function() { return null; }, configurable: false });
Object.defineProperty(window, 'sessionStorage', { get: function() { return null; }, configurable: false });
} catch(e) {}
})();
<\/script>
`;
```
7. Inject the content into the shadow DOM: `shadow.innerHTML = sandboxScript + html`. Note: shadow DOM naturally provides CSS isolation. The closed mode prevents `container.shadowRoot` from being accessible.
8. Add a style element to the shadow DOM for basic layout:
```javascript
const style = document.createElement('style');
style.textContent = ':host { display: block; width: 100%; height: 100%; overflow: auto; } * { max-width: 100%; }';
shadow.prepend(style);
```
9. Clear the player wrap and append the container:
```javascript
wrap.innerHTML = '';
wrap.appendChild(container);
```
**Update tryExtractVideo:**
- Change the last line from `renderIframeFallback(stream.id, stream.url)` to `renderSandboxFallback(stream.id, stream.url)`.
**Keep renderIframeFallback as a private fallback:**
- Rename the existing `renderIframeFallback` to `_renderDirectLink` and simplify it to just show a "Click to open" link in case the sandbox fetch also fails. This is the ultimate fallback.
- In `renderSandboxFallback`, if the fetch to `/proxy/sandbox` fails, call `_renderDirectLink(streamId, streamURL)` which renders a simple styled link/button to open the stream in a new tab.
**Important notes:**
- `<script>` tags injected via `innerHTML` never execute (the HTML parser marks them as already-started), inside a shadow DOM or anywhere else, so the string-based sandbox script from step 6 would be inert. The working approach:
- The sandbox script overrides should be injected as a `<script>` element created via `document.createElement('script')` and appended to the shadow root BEFORE injecting the HTML content.
- Then create a div inside the shadow root and set its innerHTML to the sanitized HTML (which already has ad scripts stripped by the backend).
- Since the backend strips known ad/tracker scripts (EMBED-06), the inline scripts that remain are player scripts needed for video playback. (Note that a CSP header on a fetched response does not govern markup injected into the host page; only the host document's own CSP applies to those scripts.)
- The shadow DOM approach primarily provides: CSS isolation, DOM isolation (closed mode), and the API overrides from the injected script.
**Revised approach for script execution:**
```javascript
async function renderSandboxFallback(streamId, streamURL) {
const wrap = document.getElementById('player-wrap-' + streamId);
if (!wrap) return;
try {
const resp = await fetch('/proxy/sandbox?url=' + encodeURIComponent(streamURL));
if (!resp.ok) throw new Error('proxy fetch failed');
const html = await resp.text();
const container = document.createElement('div');
container.style.cssText = 'width:100%;height:100%;overflow:auto;position:relative;';
const shadow = container.attachShadow({ mode: 'closed' });
// Base styles for shadow DOM content
const style = document.createElement('style');
style.textContent = ':host{display:block;width:100%;height:100%;overflow:auto}img,video,iframe,embed,object{max-width:100%}';
shadow.appendChild(style);
// Sandbox overrides script (executes because created via createElement)
const script = document.createElement('script');
script.textContent = [
'(function(){',
'window.open=function(){return null};',
'window.alert=function(){};',
'window.confirm=function(){return false};',
'window.prompt=function(){return null};',
'try{Object.defineProperty(window,"top",{get:function(){return window},configurable:false})}catch(e){}',
'try{Object.defineProperty(window,"parent",{get:function(){return window},configurable:false})}catch(e){}',
'try{Object.defineProperty(document,"cookie",{get:function(){return""},set:function(){},configurable:false})}catch(e){}',
'try{Object.defineProperty(window,"localStorage",{get:function(){return null},configurable:false})}catch(e){}',
'try{Object.defineProperty(window,"sessionStorage",{get:function(){return null},configurable:false})}catch(e){}',
'})();'
].join('');
shadow.appendChild(script);
// Content div with sanitized HTML
const content = document.createElement('div');
content.innerHTML = html;
shadow.appendChild(content);
wrap.innerHTML = '';
wrap.appendChild(container);
} catch (e) {
console.log('Sandbox fallback failed for stream', streamId, e);
_renderDirectLink(streamId, streamURL);
}
}
function _renderDirectLink(streamId, streamURL) {
const wrap = document.getElementById('player-wrap-' + streamId);
if (!wrap) return;
wrap.innerHTML = '<div style="display:flex;align-items:center;justify-content:center;height:100%;flex-direction:column;gap:8px;padding:16px;text-align:center"><p style="margin:0;color:#888">Could not load preview</p><a href="' + escapeHtml(streamURL) + '" target="_blank" rel="noopener" style="color:#e10600;text-decoration:underline">Open stream directly</a></div>';
}
```
This is the complete implementation. The key insight is that a `<script>` element created via `document.createElement('script')` and attached with `appendChild` does execute inside a shadow root, unlike scripts injected via `innerHTML`.
</action>
<verify>
Open `static/js/streams.js` and verify:
1. `renderSandboxFallback` function exists and uses `attachShadow({ mode: 'closed' })`
2. `tryExtractVideo` calls `renderSandboxFallback` (not `renderIframeFallback`)
3. Sandbox overrides script blocks window.open, alert, confirm, prompt, top, parent, cookie, localStorage, sessionStorage
4. Fetch targets `/proxy/sandbox?url=...`
5. `_renderDirectLink` exists as ultimate fallback
</verify>
<done>
iframe fallback replaced with shadow DOM sandbox; sandbox blocks window.open, top-frame navigation, popups, alert/confirm/prompt; shadow DOM prevents access to parent cookies and localStorage; proxied content fetched from /proxy/sandbox endpoint with sanitized HTML.
</done>
</task>
</tasks>
<verification>
1. `grep "attachShadow" static/js/streams.js` returns a match
2. `grep "renderSandboxFallback" static/js/streams.js` shows function definition and usage in tryExtractVideo
3. `grep "window.open" static/js/streams.js` shows the override in sandbox script
4. `grep "proxy/sandbox" static/js/streams.js` shows fetch call to sanitized proxy endpoint
5. `grep "renderIframeFallback" static/js/streams.js` returns NO matches (replaced)
6. `grep "_renderDirectLink" static/js/streams.js` shows ultimate fallback function
</verification>
<success_criteria>
- Shadow DOM sandbox renders proxied content when video extraction fails
- Closed shadow DOM mode prevents external access to shadow root
- window.open, alert, confirm, prompt all overridden to no-ops
- window.top and window.parent return self (prevents navigation hijacking)
- document.cookie returns empty string, localStorage/sessionStorage return null
- Sanitized HTML fetched from /proxy/sandbox endpoint
- Direct link fallback if sandbox fetch fails
</success_criteria>
<output>
After completion, create `.planning/phases/05-sandbox-proxy-hardening/05-02-SUMMARY.md`
</output>

View file

@ -1,102 +0,0 @@
---
phase: 05-sandbox-proxy-hardening
plan: 02
subsystem: frontend
tags: [shadow-dom, sandbox, security, javascript, xss-prevention]
# Dependency graph
requires:
- phase: 05-sandbox-proxy-hardening
plan: 01
provides: "/proxy/sandbox endpoint serving sanitized HTML with CSP headers"
- phase: 04-video-extraction-native-playback
provides: "tryExtractVideo and renderIframeFallback in streams.js"
provides:
- "Shadow DOM sandbox fallback replacing iframe fallback for proxied content"
- "API override script blocking window.open, alert, confirm, prompt, top/parent navigation"
- "Cookie and storage isolation preventing proxied content from accessing parent page data"
- "_renderDirectLink ultimate fallback when sandbox fetch fails"
affects: []
# Tech tracking
tech-stack:
added: []
patterns: [closed shadow DOM for content isolation, createElement script injection for sandbox overrides]
key-files:
created: []
modified:
- static/js/streams.js
key-decisions:
- "Closed shadow DOM mode prevents external JS from accessing shadow root"
- "Script element created via createElement (not innerHTML) to ensure execution in shadow DOM"
- "Direct link fallback when sandbox proxy fetch fails rather than broken state"
patterns-established:
- "Shadow DOM sandbox pattern: closed mode + script overrides + sanitized HTML injection"
- "Graduated fallback: native player > shadow DOM sandbox > direct link"
requirements-completed: [EMBED-03, EMBED-04, EMBED-05]
# Metrics
duration: 1min
completed: 2026-02-17
---
# Phase 5 Plan 2: Frontend Shadow DOM Sandbox Summary
**Closed shadow DOM sandbox replacing iframe fallback with API overrides blocking popups, navigation hijacking, cookie theft, and dialog spam**
## Performance
- **Duration:** 1 min
- **Started:** 2026-02-17T22:05:22Z
- **Completed:** 2026-02-17T22:06:28Z
- **Tasks:** 1
- **Files modified:** 1
## Accomplishments
- Replaced renderIframeFallback with renderSandboxFallback using closed shadow DOM for CSS and DOM isolation
- Sandbox script overrides window.open, alert, confirm, prompt to no-ops within proxied content
- window.top and window.parent overridden to return self, preventing navigation hijacking
- document.cookie, localStorage, sessionStorage blocked from proxied content access
- Sanitized HTML fetched from /proxy/sandbox endpoint and injected into shadow DOM
- _renderDirectLink added as ultimate fallback showing a simple "Open stream directly" link
## Task Commits
Each task was committed atomically:
1. **Task 1: Replace iframe fallback with shadow DOM sandbox** - `89c85fe` (feat)
## Files Created/Modified
- `static/js/streams.js` - Replaced renderIframeFallback with renderSandboxFallback (closed shadow DOM, API overrides, /proxy/sandbox fetch), added _renderDirectLink ultimate fallback
## Decisions Made
- Used closed shadow DOM mode to prevent external JavaScript from accessing the shadow root via container.shadowRoot
- Created sandbox override script via document.createElement('script') rather than innerHTML because scripts in innerHTML do not execute in shadow DOM
- Kept the graduated fallback chain: native video player > shadow DOM sandbox > direct link (instead of erroring)
## Deviations from Plan
None - plan executed exactly as written.
## Issues Encountered
None
## User Setup Required
None - no external service configuration required.
## Next Phase Readiness
- All 5 phases complete
- Full pipeline operational: scraping > health checks > auto-publish > video extraction > sandbox proxy hardening
- Frontend provides secure shadow DOM sandbox when video extraction fails
---
*Phase: 05-sandbox-proxy-hardening*
*Completed: 2026-02-17*
## Self-Check: PASSED
All files verified present. All commits verified in history.

View file

@ -1,170 +0,0 @@
---
phase: 05-sandbox-proxy-hardening
verified: 2026-02-17T23:15:00Z
status: passed
score: 8/8 must-haves verified
re_verification: false
---
# Phase 5: Sandbox and Proxy Hardening Verification Report
**Phase Goal:** When direct video extraction fails, the proxied page is rendered safely in a sandbox that blocks popups, ads, and access to the parent page
**Verified:** 2026-02-17T23:15:00Z
**Status:** PASSED
**Re-verification:** No — initial verification
## Goal Achievement
### Observable Truths
| # | Truth | Status | Evidence |
|---|-------|--------|----------|
| 1 | When direct video extraction fails, the full proxied page renders inside a shadow DOM sandbox on the app's page | ✓ VERIFIED | streams.js line 188: tryExtractVideo calls renderSandboxFallback; line 241: attachShadow({ mode: 'closed' }) |
| 2 | The sandbox blocks window.open, top-frame navigation, popup creation, and alert/confirm/prompt dialogs | ✓ VERIFIED | streams.js lines 252-257: overrides for window.open, alert, confirm, prompt, top, parent all present |
| 3 | The sandbox prevents proxied content from accessing parent page cookies and localStorage | ✓ VERIFIED | streams.js lines 258-260: document.cookie, localStorage, sessionStorage all blocked via Object.defineProperty |
| 4 | Known ad/tracker scripts and domains are stripped from proxied content before serving | ✓ VERIFIED | sanitize.go lines 26-51: removes script/link/img/iframe/object/embed from blocked domains; blocklist.go: 49+ blocked domains |
| 5 | Relative URLs in proxied content are rewritten to route through the proxy | ✓ VERIFIED | sanitize.go lines 114,119: relative URLs rewritten with proxyPrefix + QueryEscape |
| 6 | All proxied content is served with strict CSP headers scoped to the sandbox context | ✓ VERIFIED | proxy.go lines 195,207,216: sandboxCSP header set on all responses from ServeSandbox |
**Score:** 6/6 truths verified (100%)
### Required Artifacts
#### Plan 05-01 Artifacts
| Artifact | Expected | Status | Details |
|----------|----------|--------|---------|
| `internal/proxy/sanitize.go` | HTML sanitizer that strips ads, rewrites URLs, and adds CSP | ✓ VERIFIED | Exists, 149 lines; exports Sanitize function; contains rewriteAttrs, hostBlocked, resolveURL helpers |
| `internal/proxy/blocklist.go` | Ad/tracker domain blocklist for script filtering | ✓ VERIFIED | Exists, 98 lines; 49 blocked domains in map; IsBlockedDomain with parent-domain walk-up |
| `internal/proxy/proxy.go` | New /proxy/sandbox endpoint serving sanitized content | ✓ VERIFIED | Exists; ServeSandbox method at line 140; calls Sanitize at line 213; sets CSP headers |
| `internal/server/server.go` | Route registration for sandbox proxy endpoint | ✓ VERIFIED | Line 61: route registered "GET /proxy/sandbox" -> s.proxy.ServeSandbox |
#### Plan 05-02 Artifacts
| Artifact | Expected | Status | Details |
|----------|----------|--------|---------|
| `static/js/streams.js` | Shadow DOM sandbox fallback replacing iframe fallback | ✓ VERIFIED | Exists; renderSandboxFallback at line 230; attachShadow({ mode: 'closed' }) at line 241; sandbox script injection lines 249-263 |
### Key Link Verification
| From | To | Via | Status | Details |
|------|-----|-----|--------|---------|
| `internal/server/server.go` | `internal/proxy/proxy.go` | route registration for /proxy/sandbox | ✓ WIRED | server.go:61 registers "GET /proxy/sandbox" to s.proxy.ServeSandbox |
| `internal/proxy/proxy.go` | `internal/proxy/sanitize.go` | ServeSandbox calls Sanitize | ✓ WIRED | proxy.go:213 calls Sanitize(doc, parsed, "/proxy/sandbox") |
| `static/js/streams.js` | `/proxy/sandbox` | fetch call to get sanitized HTML for shadow DOM injection | ✓ WIRED | streams.js:235 fetches '/proxy/sandbox?url=' + encodeURIComponent(streamURL) |
| `static/js/streams.js` | `static/js/streams.js` | tryExtractVideo falls back to renderSandboxFallback | ✓ WIRED | streams.js:188 calls renderSandboxFallback when extraction fails |
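On the frontend side, the verified wiring reduces to a single fetch against the sandbox endpoint; a minimal sketch with a hypothetical helper name:

```typescript
// Fetch sanitized HTML for shadow DOM injection. The endpoint path is the
// real one; the helper name and error handling are illustrative.
async function loadSandboxHTML(streamURL: string): Promise<string> {
  const res = await fetch("/proxy/sandbox?url=" + encodeURIComponent(streamURL));
  if (!res.ok) throw new Error(`sandbox proxy returned ${res.status}`);
  return res.text();
}
```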
### Requirements Coverage
| Requirement | Source Plan | Description | Status | Evidence |
|-------------|-------------|-------------|--------|----------|
| EMBED-03 | 05-02 | When direct extraction fails, fall back to rendering the full proxied page in a shadow DOM sandbox | ✓ SATISFIED | tryExtractVideo calls renderSandboxFallback; shadow DOM created with mode: 'closed' |
| EMBED-04 | 05-02 | Shadow DOM sandbox blocks window.open, window.top navigation, popup creation, and alert/confirm/prompt | ✓ SATISFIED | Sandbox script overrides all 7 dangerous APIs (window.open, alert, confirm, prompt, top, parent, location) |
| EMBED-05 | 05-02 | Shadow DOM sandbox prevents access to parent page cookies and localStorage | ✓ SATISFIED | document.cookie returns '', localStorage/sessionStorage return null via Object.defineProperty |
| EMBED-06 | 05-01 | Proxy strips known ad/tracker scripts and domains from proxied content before serving | ✓ SATISFIED | Sanitizer removes script/link/img/iframe/object/embed from the 49 blocked domains |
| EMBED-07 | 05-01 | Proxy rewrites relative URLs in proxied content to route through the proxy | ✓ SATISFIED | rewriteAttrs rewrites src/href/action/poster/data attributes through /proxy/sandbox |
| EMBED-08 | 05-01 | All proxied content served with strict CSP headers scoped to the sandbox context | ✓ SATISFIED | sandboxCSP constant applied to all ServeSandbox responses (HTML and non-HTML) |
**Coverage:** 6/6 requirements satisfied (100%)
**Orphaned requirements:** None — all requirements from REQUIREMENTS.md phase 5 mapping are claimed by plans
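To make the EMBED-07 rewrite concrete: each URL-bearing attribute is resolved against the proxied page's URL, and the absolute result is routed back through the sandbox endpoint. A TypeScript mirror of the Go logic, illustrative only:

```typescript
// new URL() resolves relative ("img/logo.png") and protocol-relative
// ("//cdn.example.com/style.css") values against the page URL.
function rewriteThroughProxy(attrValue: string, pageURL: string): string {
  const absolute = new URL(attrValue, pageURL).toString();
  return "/proxy/sandbox?url=" + encodeURIComponent(absolute);
}

// rewriteThroughProxy("img/logo.png", "https://example.com/watch/")
//   -> "/proxy/sandbox?url=https%3A%2F%2Fexample.com%2Fwatch%2Fimg%2Flogo.png"
```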
### Anti-Patterns Found
| File | Line | Pattern | Severity | Impact |
|------|------|---------|----------|--------|
| - | - | None detected | - | - |
**Scanned files:**
- internal/proxy/sanitize.go — clean, no placeholders or stubs
- internal/proxy/blocklist.go — clean, 49 real domains
- internal/proxy/proxy.go — clean, full ServeSandbox implementation
- static/js/streams.js — clean, complete shadow DOM sandbox with all overrides
**Checks performed:**
- No TODO/FIXME/PLACEHOLDER comments found
- No empty return statements (return null, return {}, return [])
- No console.log-only implementations
- All functions have substantive implementations
### Human Verification Required
#### 1. Shadow DOM Isolation Effectiveness
**Test:** Open a stream that fails video extraction. Inspect browser DevTools Console while the sandbox content loads.
**Expected:**
- No popup windows appear
- No alert/confirm/prompt dialogs appear
- Parent page cookies remain accessible to parent JavaScript (verify document.cookie in parent console)
- Shadow DOM content cannot access parent cookies (blocked by override)
**Why human:** Requires real browser testing to verify runtime isolation behavior. Cannot be verified by static analysis.
#### 2. Ad/Tracker Blocking Effectiveness
**Test:** Load a stream page through /proxy/sandbox that is known to have ads (use a popular streaming site). Inspect Network tab.
**Expected:**
- Requests to doubleclick.net, googlesyndication.com, etc. are NOT made
- Ad scripts stripped from DOM before rendering
- Page still functional (video player scripts kept)
**Why human:** Requires visual inspection and network monitoring to verify ads are actually blocked without breaking video playback.
#### 3. Relative URL Rewriting
**Test:** Load a stream page through /proxy/sandbox that uses relative URLs for images, CSS, or sub-resources.
**Expected:**
- All relative URLs (src="img/logo.png") are rewritten to route through /proxy/sandbox?url=...
- Sub-resources load correctly (no broken images or missing CSS)
- Protocol-relative URLs (//cdn.example.com/style.css) are rewritten
**Why human:** Requires testing with real stream pages that use various URL patterns. Static analysis confirms the rewrite logic exists but cannot verify it handles all edge cases.
#### 4. CSP Header Enforcement
**Test:** Open browser DevTools Console while a sandboxed stream loads. Check Network tab response headers.
**Expected:**
- All /proxy/sandbox responses have Content-Security-Policy header
- Header value matches sandboxCSP constant (default-src 'self'; script-src 'unsafe-inline'; ...)
- CSP violations (if any) appear in console but don't break video playback
**Why human:** CSP enforcement is browser-side. Need to verify headers are present and effective without breaking player functionality.
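A scripted spot-check can complement the manual inspection. The sketch below assumes a deployment reachable on localhost:8080; the base URL is an assumption, not part of the verified code:

```typescript
// Fetch one sandboxed page and assert the CSP header is present.
async function assertSandboxCSP(streamURL: string): Promise<void> {
  const endpoint =
    "http://localhost:8080/proxy/sandbox?url=" + encodeURIComponent(streamURL);
  const res = await fetch(endpoint);
  const csp = res.headers.get("content-security-policy");
  if (!csp) throw new Error("Content-Security-Policy header missing");
  console.log("CSP present:", csp);
}
```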
#### 5. Graduated Fallback Chain
**Test:** Exercise three types of streams:
1. Stream with extractable HLS/MP4 source
2. Stream where extraction fails but proxy works
3. Stream URL that returns 404/500
**Expected:**
1. Native HTML5 video player renders
2. Shadow DOM sandbox renders the proxied page
3. "Open stream directly" link appears
**Why human:** Requires testing multiple failure scenarios to verify the fallback chain works correctly at each level.
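The chain under test has this shape (illustrative names; the two attempt callbacks stand in for the real extraction and sandbox code):

```typescript
type Attempt = (container: HTMLElement, url: string) => Promise<boolean>;

async function renderStream(
  container: HTMLElement,
  url: string,
  tryExtractVideo: Attempt, // 1. direct HLS/MP4 extraction into a native <video>
  trySandbox: Attempt,      // 2. sanitized page inside the shadow DOM sandbox
): Promise<void> {
  if (await tryExtractVideo(container, url)) return;
  if (await trySandbox(container, url)) return;
  // 3. Last resort: a plain outbound link.
  const link = document.createElement("a");
  link.href = url;
  link.target = "_blank";
  link.rel = "noopener";
  link.textContent = "Open stream directly";
  container.replaceChildren(link);
}
```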
### Gaps Summary
**No gaps found.** All must-haves verified, all requirements satisfied, no anti-patterns detected.
**Implementation quality:**
- ✓ Backend: Complete sanitizer with 49 blocked domains, URL rewriting, and CSP enforcement
- ✓ Frontend: Closed shadow DOM overriding window.open, alert, confirm, prompt, top, parent, and location, with document.cookie, localStorage, and sessionStorage blocked
- ✓ Wiring: All key links verified (route registration, Sanitize call, fetch to /proxy/sandbox)
- ✓ Fallback: Graduated fallback chain (native player > sandbox > direct link)
**Ready to proceed:** Yes. Phase 5 goal achieved. All 6 EMBED requirements (EMBED-03 through EMBED-08) satisfied.
---
_Verified: 2026-02-17T23:15:00Z_
_Verifier: Claude (gsd-verifier)_


@ -1,21 +1,12 @@
-FROM golang:1.24-alpine AS builder
-WORKDIR /app
-COPY go.mod go.sum ./
-RUN go mod download
-COPY . .
-RUN CGO_ENABLED=0 GOOS=linux go build -o /f1-stream .
+FROM python:3.13-slim-bookworm
-FROM alpine:3.20
-RUN apk add --no-cache \
-    ca-certificates \
-    chromium nss freetype harfbuzz ttf-freefont \
-    mesa-dri-gallium mesa-gl \
-    dbus \
-    xvfb-run xorg-server \
-    pulseaudio pulseaudio-utils \
-    ffmpeg
-ENV CHROME_PATH=/usr/bin/chromium-browser
-COPY --from=builder /f1-stream /f1-stream
-COPY static/ /static/
-EXPOSE 8080
-ENTRYPOINT ["/f1-stream"]
+WORKDIR /app
+COPY backend/requirements.txt .
+RUN pip install --no-cache-dir -r requirements.txt
+COPY backend/ ./backend/
+EXPOSE 8000
+CMD ["uvicorn", "backend.main:app", "--host", "0.0.0.0", "--port", "8000"]


@ -0,0 +1,19 @@
from fastapi import FastAPI

app = FastAPI(title="F1 Streams")


@app.get("/health")
async def health():
    return {"status": "ok"}


@app.get("/")
async def root():
    return {"service": "f1-streams", "version": "2.0.1"}


if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=8000)


@ -0,0 +1,2 @@
fastapi==0.115.0
uvicorn[standard]


@ -1,45 +0,0 @@
module f1-stream
go 1.24.1
require (
github.com/chromedp/cdproto v0.0.0-20250724212937-08a3db8b4327
github.com/chromedp/chromedp v0.14.2
github.com/go-webauthn/webauthn v0.15.0
github.com/gobwas/ws v1.4.0
github.com/pion/webrtc/v4 v4.2.9
)
require (
github.com/chromedp/sysutil v1.1.0 // indirect
github.com/fxamacker/cbor/v2 v2.9.0 // indirect
github.com/go-json-experiment/json v0.0.0-20250725192818-e39067aee2d2 // indirect
github.com/go-viper/mapstructure/v2 v2.4.0 // indirect
github.com/go-webauthn/x v0.1.26 // indirect
github.com/gobwas/httphead v0.1.0 // indirect
github.com/gobwas/pool v0.2.1 // indirect
github.com/golang-jwt/jwt/v5 v5.3.0 // indirect
github.com/google/go-tpm v0.9.6 // indirect
github.com/google/uuid v1.6.0 // indirect
github.com/pion/datachannel v1.6.0 // indirect
github.com/pion/dtls/v3 v3.1.2 // indirect
github.com/pion/ice/v4 v4.2.1 // indirect
github.com/pion/interceptor v0.1.44 // indirect
github.com/pion/logging v0.2.4 // indirect
github.com/pion/mdns/v2 v2.1.0 // indirect
github.com/pion/randutil v0.1.0 // indirect
github.com/pion/rtcp v1.2.16 // indirect
github.com/pion/rtp v1.10.1 // indirect
github.com/pion/sctp v1.9.2 // indirect
github.com/pion/sdp/v3 v3.0.18 // indirect
github.com/pion/srtp/v3 v3.0.10 // indirect
github.com/pion/stun/v3 v3.1.1 // indirect
github.com/pion/transport/v4 v4.0.1 // indirect
github.com/pion/turn/v4 v4.1.4 // indirect
github.com/wlynxg/anet v0.0.5 // indirect
github.com/x448/float16 v0.8.4 // indirect
golang.org/x/crypto v0.48.0 // indirect
golang.org/x/net v0.50.0 // indirect
golang.org/x/sys v0.41.0 // indirect
golang.org/x/time v0.10.0 // indirect
)


@ -1,89 +0,0 @@
github.com/chromedp/cdproto v0.0.0-20250724212937-08a3db8b4327 h1:UQ4AU+BGti3Sy/aLU8KVseYKNALcX9UXY6DfpwQ6J8E=
github.com/chromedp/cdproto v0.0.0-20250724212937-08a3db8b4327/go.mod h1:NItd7aLkcfOA/dcMXvl8p1u+lQqioRMq/SqDp71Pb/k=
github.com/chromedp/chromedp v0.14.2 h1:r3b/WtwM50RsBZHMUm9fsNhhzRStTHrKdr2zmwbZSzM=
github.com/chromedp/chromedp v0.14.2/go.mod h1:rHzAv60xDE7VNy/MYtTUrYreSc0ujt2O1/C3bzctYBo=
github.com/chromedp/sysutil v1.1.0 h1:PUFNv5EcprjqXZD9nJb9b/c9ibAbxiYo4exNWZyipwM=
github.com/chromedp/sysutil v1.1.0/go.mod h1:WiThHUdltqCNKGc4gaU50XgYjwjYIhKWoHGPTUfWTJ8=
github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c=
github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/fxamacker/cbor/v2 v2.9.0 h1:NpKPmjDBgUfBms6tr6JZkTHtfFGcMKsw3eGcmD/sapM=
github.com/fxamacker/cbor/v2 v2.9.0/go.mod h1:vM4b+DJCtHn+zz7h3FFp/hDAI9WNWCsZj23V5ytsSxQ=
github.com/go-json-experiment/json v0.0.0-20250725192818-e39067aee2d2 h1:iizUGZ9pEquQS5jTGkh4AqeeHCMbfbjeb0zMt0aEFzs=
github.com/go-json-experiment/json v0.0.0-20250725192818-e39067aee2d2/go.mod h1:TiCD2a1pcmjd7YnhGH0f/zKNcCD06B029pHhzV23c2M=
github.com/go-viper/mapstructure/v2 v2.4.0 h1:EBsztssimR/CONLSZZ04E8qAkxNYq4Qp9LvH92wZUgs=
github.com/go-viper/mapstructure/v2 v2.4.0/go.mod h1:oJDH3BJKyqBA2TXFhDsKDGDTlndYOZ6rGS0BRZIxGhM=
github.com/go-webauthn/webauthn v0.15.0 h1:LR1vPv62E0/6+sTenX35QrCmpMCzLeVAcnXeH4MrbJY=
github.com/go-webauthn/webauthn v0.15.0/go.mod h1:hcAOhVChPRG7oqG7Xj6XKN1mb+8eXTGP/B7zBLzkX5A=
github.com/go-webauthn/x v0.1.26 h1:eNzreFKnwNLDFoywGh9FA8YOMebBWTUNlNSdolQRebs=
github.com/go-webauthn/x v0.1.26/go.mod h1:jmf/phPV6oIsF6hmdVre+ovHkxjDOmNH0t6fekWUxvg=
github.com/gobwas/httphead v0.1.0 h1:exrUm0f4YX0L7EBwZHuCF4GDp8aJfVeBrlLQrs6NqWU=
github.com/gobwas/httphead v0.1.0/go.mod h1:O/RXo79gxV8G+RqlR/otEwx4Q36zl9rqC5u12GKvMCM=
github.com/gobwas/pool v0.2.1 h1:xfeeEhW7pwmX8nuLVlqbzVc7udMDrwetjEv+TZIz1og=
github.com/gobwas/pool v0.2.1/go.mod h1:q8bcK0KcYlCgd9e7WYLm9LpyS+YeLd8JVDW6WezmKEw=
github.com/gobwas/ws v1.4.0 h1:CTaoG1tojrh4ucGPcoJFiAQUAsEWekEWvLy7GsVNqGs=
github.com/gobwas/ws v1.4.0/go.mod h1:G3gNqMNtPppf5XUz7O4shetPpcZ1VJ7zt18dlUeakrc=
github.com/golang-jwt/jwt/v5 v5.3.0 h1:pv4AsKCKKZuqlgs5sUmn4x8UlGa0kEVt/puTpKx9vvo=
github.com/golang-jwt/jwt/v5 v5.3.0/go.mod h1:fxCRLWMO43lRc8nhHWY6LGqRcf+1gQWArsqaEUEa5bE=
github.com/google/go-tpm v0.9.6 h1:Ku42PT4LmjDu1H5C5ISWLlpI1mj+Zq7sPGKoRw2XROA=
github.com/google/go-tpm v0.9.6/go.mod h1:h9jEsEECg7gtLis0upRBQU+GhYVH6jMjrFxI8u6bVUY=
github.com/google/uuid v1.6.0 h1:NIvaJDMOsjHA8n1jAhLSgzrAzy1Hgr+hNrb57e+94F0=
github.com/google/uuid v1.6.0/go.mod h1:TIyPZe4MgqvfeYDBFedMoGGpEw/LqOeaOT+nhxU+yHo=
github.com/ledongthuc/pdf v0.0.0-20220302134840-0c2507a12d80 h1:6Yzfa6GP0rIo/kULo2bwGEkFvCePZ3qHDDTC3/J9Swo=
github.com/ledongthuc/pdf v0.0.0-20220302134840-0c2507a12d80/go.mod h1:imJHygn/1yfhB7XSJJKlFZKl/J+dCPAknuiaGOshXAs=
github.com/orisano/pixelmatch v0.0.0-20220722002657-fb0b55479cde h1:x0TT0RDC7UhAVbbWWBzr41ElhJx5tXPWkIHA2HWPRuw=
github.com/orisano/pixelmatch v0.0.0-20220722002657-fb0b55479cde/go.mod h1:nZgzbfBr3hhjoZnS66nKrHmduYNpc34ny7RK4z5/HM0=
github.com/pion/datachannel v1.6.0 h1:XecBlj+cvsxhAMZWFfFcPyUaDZtd7IJvrXqlXD/53i0=
github.com/pion/datachannel v1.6.0/go.mod h1:ur+wzYF8mWdC+Mkis5Thosk+u/VOL287apDNEbFpsIk=
github.com/pion/dtls/v3 v3.1.2 h1:gqEdOUXLtCGW+afsBLO0LtDD8GnuBBjEy6HRtyofZTc=
github.com/pion/dtls/v3 v3.1.2/go.mod h1:Hw/igcX4pdY69z1Hgv5x7wJFrUkdgHwAn/Q/uo7YHRo=
github.com/pion/ice/v4 v4.2.1 h1:XPRYXaLiFq3LFDG7a7bMrmr3mFr27G/gtXN3v/TVfxY=
github.com/pion/ice/v4 v4.2.1/go.mod h1:2quLV1S5v1tAx3VvAJaH//KGitRXvo4RKlX6D3tnN+c=
github.com/pion/interceptor v0.1.44 h1:sNlZwM8dWXU9JQAkJh8xrarC0Etn8Oolcniukmuy0/I=
github.com/pion/interceptor v0.1.44/go.mod h1:4atVlBkcgXuUP+ykQF0qOCGU2j7pQzX2ofvPRFsY5RY=
github.com/pion/logging v0.2.4 h1:tTew+7cmQ+Mc1pTBLKH2puKsOvhm32dROumOZ655zB8=
github.com/pion/logging v0.2.4/go.mod h1:DffhXTKYdNZU+KtJ5pyQDjvOAh/GsNSyv1lbkFbe3so=
github.com/pion/mdns/v2 v2.1.0 h1:3IJ9+Xio6tWYjhN6WwuY142P/1jA0D5ERaIqawg/fOY=
github.com/pion/mdns/v2 v2.1.0/go.mod h1:pcez23GdynwcfRU1977qKU0mDxSeucttSHbCSfFOd9A=
github.com/pion/randutil v0.1.0 h1:CFG1UdESneORglEsnimhUjf33Rwjubwj6xfiOXBa3mA=
github.com/pion/randutil v0.1.0/go.mod h1:XcJrSMMbbMRhASFVOlj/5hQial/Y8oH/HVo7TBZq+j8=
github.com/pion/rtcp v1.2.16 h1:fk1B1dNW4hsI78XUCljZJlC4kZOPk67mNRuQ0fcEkSo=
github.com/pion/rtcp v1.2.16/go.mod h1:/as7VKfYbs5NIb4h6muQ35kQF/J0ZVNz2Z3xKoCBYOo=
github.com/pion/rtp v1.10.1 h1:xP1prZcCTUuhO2c83XtxyOHJteISg6o8iPsE2acaMtA=
github.com/pion/rtp v1.10.1/go.mod h1:rF5nS1GqbR7H/TCpKwylzeq6yDM+MM6k+On5EgeThEM=
github.com/pion/sctp v1.9.2 h1:HxsOzEV9pWoeggv7T5kewVkstFNcGvhMPx0GvUOUQXo=
github.com/pion/sctp v1.9.2/go.mod h1:OTOlsQ5EDQ6mQ0z4MUGXt2CgQmKyafBEXhUVqLRB6G8=
github.com/pion/sdp/v3 v3.0.18 h1:l0bAXazKHpepazVdp+tPYnrsy9dfh7ZbT8DxesH5ZnI=
github.com/pion/sdp/v3 v3.0.18/go.mod h1:ZREGo6A9ZygQ9XkqAj5xYCQtQpif0i6Pa81HOiAdqQ8=
github.com/pion/srtp/v3 v3.0.10 h1:tFirkpBb3XccP5VEXLi50GqXhv5SKPxqrdlhDCJlZrQ=
github.com/pion/srtp/v3 v3.0.10/go.mod h1:3mOTIB0cq9qlbn59V4ozvv9ClW/BSEbRp4cY0VtaR7M=
github.com/pion/stun/v3 v3.1.1 h1:CkQxveJ4xGQjulGSROXbXq94TAWu8gIX2dT+ePhUkqw=
github.com/pion/stun/v3 v3.1.1/go.mod h1:qC1DfmcCTQjl9PBaMa5wSn3x9IPmKxSdcCsxBcDBndM=
github.com/pion/transport/v3 v3.1.1 h1:Tr684+fnnKlhPceU+ICdrw6KKkTms+5qHMgw6bIkYOM=
github.com/pion/transport/v3 v3.1.1/go.mod h1:+c2eewC5WJQHiAA46fkMMzoYZSuGzA/7E2FPrOYHctQ=
github.com/pion/transport/v4 v4.0.1 h1:sdROELU6BZ63Ab7FrOLn13M6YdJLY20wldXW2Cu2k8o=
github.com/pion/transport/v4 v4.0.1/go.mod h1:nEuEA4AD5lPdcIegQDpVLgNoDGreqM/YqmEx3ovP4jM=
github.com/pion/turn/v4 v4.1.4 h1:EU11yMXKIsK43FhcUnjLlrhE4nboHZq+TXBIi3QpcxQ=
github.com/pion/turn/v4 v4.1.4/go.mod h1:ES1DXVFKnOhuDkqn9hn5VJlSWmZPaRJLyBXoOeO/BmQ=
github.com/pion/webrtc/v4 v4.2.9 h1:DZIh1HAhPIL3RvwEDFsmL5hfPSLEpxsQk9/Jir2vkJE=
github.com/pion/webrtc/v4 v4.2.9/go.mod h1:9EmLZve0H76eTzf8v2FmchZ6tcBXtDgpfTEu+drW6SY=
github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
github.com/stretchr/testify v1.11.1 h1:7s2iGBzp5EwR7/aIZr8ao5+dra3wiQyKjjFuvgVKu7U=
github.com/stretchr/testify v1.11.1/go.mod h1:wZwfW3scLgRK+23gO65QZefKpKQRnfz6sD981Nm4B6U=
github.com/wlynxg/anet v0.0.5 h1:J3VJGi1gvo0JwZ/P1/Yc/8p63SoW98B5dHkYDmpgvvU=
github.com/wlynxg/anet v0.0.5/go.mod h1:eay5PRQr7fIVAMbTbchTnO9gG65Hg/uYGdc7mguHxoA=
github.com/x448/float16 v0.8.4 h1:qLwI1I70+NjRFUR3zs1JPUCgaCXSh3SW62uAKT1mSBM=
github.com/x448/float16 v0.8.4/go.mod h1:14CWIYCyZA/cWjXOioeEpHeN/83MdbZDRQHoFcYsOfg=
go.uber.org/mock v0.6.0 h1:hyF9dfmbgIX5EfOdasqLsWD6xqpNZlXblLB/Dbnwv3Y=
go.uber.org/mock v0.6.0/go.mod h1:KiVJ4BqZJaMj4svdfmHM0AUx4NJYO8ZNpPnZn1Z+BBU=
golang.org/x/crypto v0.48.0 h1:/VRzVqiRSggnhY7gNRxPauEQ5Drw9haKdM0jqfcCFts=
golang.org/x/crypto v0.48.0/go.mod h1:r0kV5h3qnFPlQnBSrULhlsRfryS2pmewsg+XfMgkVos=
golang.org/x/net v0.50.0 h1:ucWh9eiCGyDR3vtzso0WMQinm2Dnt8cFMuQa9K33J60=
golang.org/x/net v0.50.0/go.mod h1:UgoSli3F/pBgdJBHCTc+tp3gmrU4XswgGRgtnwWTfyM=
golang.org/x/sys v0.6.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.41.0 h1:Ivj+2Cp/ylzLiEU89QhWblYnOE9zerudt9Ftecq2C6k=
golang.org/x/sys v0.41.0/go.mod h1:OgkHotnGiDImocRcuBABYBEXf8A9a87e/uXjp9XT3ks=
golang.org/x/time v0.10.0 h1:3usCWA8tQn0L8+hFJQNgzpWbd89begxN66o1Ojdn5L4=
golang.org/x/time v0.10.0/go.mod h1:3BpzKBy/shNhVucY/MWOyx10tF3SFh9QdLuxbVysPQM=
gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=


@ -1,293 +0,0 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>F1 Streams</title>
</head>
<body>
<h3>Use the players below to watch ad-free</h3>
<h3>If on Android, I highly recommend <a href="https://sportzxtv.com/">SportzX</a>!</h3>
<!-- <h3>
If none work go to
<a href="https://f1livegp.me/f1/live.html">https://f1livegp.me/f1/live.html</a>
<a href="http://mx.freestreams-live1.com/f1-live-stream1/">http://mx.freestreams-live1.com/f1-live-stream1/</a> or
<a href="http://freestreams-live1.com/f1-live-streams/">http://freestreams-live1.com/f1-live-streams/</a>
</h3> -->
<!-- <h3>
If you don't see <span id="numstreams"></span> streams try accessing the
page via http*(try incognito if browser keeps opening https):
<a href="http://f1.viktorbarzin.me">http://f1.viktorbarzin.me</a>
</h3>
<h5>
*Some of the stream sources are http and browsers disallow loading mixed
content e.g you loaded the page over https but you are trying to connect
to http stream source which is insecure hence blocked
</h5>
<h5>
When possible, use https as otherwise streams could be monitored, altered
and blocked by upstream ISPs, firewalls etc.
</h5> -->
<div id="root">
<h2><a href="https://wac.rip/streams-pages/motorsports">https://wac.rip/streams-pages/motorsports</a></h2>
<iframe
id="s0-iframe"
width="1000"
height="600"
src="https://wearechecking.live/streams-pages/motorsports"
frameborder="0"
scrolling="yes"
gesture="media"
allow="encrypted-media"
allowfullscreen=""
sandbox="allow-top-navigation=false allow-scripts allow-same-origin allow-forms allow-modals allow-presentation"
></iframe>
<h2><a href="https://vipleague.im/formula-1-schedule-streaming-links">https://vipleague.im/formula-1-schedule-streaming-links</a></h2>
<iframe
id="s1-iframe"
width="1000"
height="600"
src="https://vipleague.im/formula-1-schedule-streaming-links"
frameborder="0"
scrolling="yes"
gesture="media"
allow="encrypted-media"
allowfullscreen=""
sandbox="allow-top-navigation=false allow-scripts allow-same-origin allow-forms allow-modals allow-presentation"
></iframe>
<h2><a href="https://www.vipbox.lc/">https://www.vipbox.lc/</a></h2>
<iframe
id="s2-iframe"
width="1000"
height="600"
src="https://www.vipbox.lc/"
frameborder="0"
scrolling="yes"
gesture="media"
allow="encrypted-media"
allowfullscreen=""
sandbox="allow-top-navigation=false allow-scripts allow-same-origin allow-forms allow-modals allow-presentation"
></iframe>
<h2><a href="https://f1box.me/">https://f1box.me/</a></h2>
<iframe
id="s3-iframe"
width="1000"
height="600"
src="https://f1box.me/"
frameborder="0"
scrolling="yes"
gesture="media"
allow="encrypted-media"
allowfullscreen=""
sandbox="allow-top-navigation=false allow-scripts allow-same-origin allow-forms allow-modals allow-presentation"
></iframe>
<h2><a href="https://1stream.vip/formula-1-streams/">https://1stream.vip/formula-1-streams/</a></h2>
<iframe
id="s4-iframe"
width="1000"
height="600"
src="https://1stream.vip/formula-1-streams/"
frameborder="0"
scrolling="yes"
gesture="media"
allow="encrypted-media"
allowfullscreen=""
sandbox="allow-top-navigation=false allow-scripts allow-same-origin allow-forms allow-modals allow-presentation"
></iframe>
<div id="root">
<h1>
F1 Race replays:
<a href="">https://f1fullraces.com/</a>
</h1>
<h1>
<a href="https://aceztrims.pages.dev/f1/">https://aceztrims.pages.dev/f1/</a>
</h1>
<h1>
<a href="https://freestreams-live.mp/f1-live-stream10/">https://freestreams-live.mp/f1-live-stream10/</a>
</h1>
<h1>
<a href="https://fmhy.net/videopiracyguide#live-sports"
>https://fmhy.net/videopiracyguide#live-sports</a
>
</h1>
<h1>
<a href="https://fmhy.net/videopiracyguide#live-sports">https://fmhy.net/videopiracyguide#live-sports</a>
</h1>
<h1><a href="https://thetvapp.to/"> https://thetvapp.to </a></h1>
<h1>
<a href="http://www.freeintertv.com/">http://www.freeintertv.com/</a>
</h1>
<h1>
<a href="https://www.bg-gledai.video/nacionalni"
>https://www.bg-gledai.video/nacionalni</a
>
</h1>
<!-- <iframe id="s1" onclick='document.getElementById("s1").src="http://mx.freestreams-live1.com/f1-live-stream1/";'
style="border-width: 10mm;" src="http://mx.freestreams-live1.com/f1-live-stream1/" class="embed-responsive-item"
frameborder="1" height="580" width="40%" allowfullscreen="" scrolling="no" allowtransparency=""
sandbox="allow-forms allow-scripts allow-same-origin allow-top-navigation"></iframe> -->
<!-- <iframe id="s1" onclick='document.getElementById("s2").src="https://f1livegp.me/f1/live.html";' -->
<!-- <iframe id="s1" onclick='document.getElementById("s2").src="https://f1livegp.me/f1/live3.html";'
class="embed-responsive-item" frameborder="1" style="border-width: 10mm;" height="580" width="40%"
allowfullscreen="" scrolling="yes" allowtransparency="" src="https://f1livegp.me/f1/live3.html"
sandbox="allow-forms allow-scripts allow-same-origin"
></iframe> -->
<!-- <iframe id="s2" onclick='document.getElementById("s2").src="http://mx.freestreams-live1.com/skysportsf1-stream/";'
class="embed-responsive-item" frameborder="1" style="border-width: 10mm;" height="580" width="40%"
allowfullscreen="" scrolling="yes" allowtransparency="" src=""
sandbox="allow-forms allow-scripts allow-same-origin allow-top-navigation"></iframe> -->
<!-- <iframe id="s3"
onclick='document.getElementById("s3").src="http://fomny.com/Video/United-kindom/Sky-sport/stream/Sky-sport-F1.php";'
class="embed-responsive-item" frameborder="1" style="border-width: 10mm;" height="580" width="40%"
allowfullscreen="" scrolling="yes" allowtransparency="" src=""
sandbox="allow-forms allow-scripts allow-same-origin allow-top-navigation"></iframe> -->
<!-- <iframe id="s4" onclick='document.getElementById("s4").src="https://cricfree.pw/sky-sports-f1-live-stream";'
class="embed-responsive-item" frameborder="1" style="border-width: 10mm;" height="580" width="40%"
allowfullscreen="" scrolling="yes" allowtransparency="" src=""
sandbox="allow-forms allow-scripts allow-same-origin allow-top-navigation"></iframe> -->
<!-- <iframe id="s6" onclick='document.getElementById("s6").src="https://en.viprow.me/sky-sports-f1-online-stream";'
class="embed-responsive-item" frameborder="1" style="border-width: 10mm;" height="580" width="40%"
allowfullscreen="" scrolling="yes" allowtransparency="" src=""
sandbox="allow-forms allow-scripts allow-same-origin "></iframe> -->
<!-- ESPN -->
<!-- <iframe src="http://freestreams-live1.com/usa/espn.php" marginwidth="0" marginheight="0" scrolling="no" width="40%"
height="580" frameborder="0" allowfullscreen="allowfullscreen" sandbox="allow-scripts allow-same-origin"></iframe>
<iframe src="http://freestreams-live1.com/usa/espn2.php" marginwidth="0" marginheight="0" scrolling="no" width="40%"
height="580" frameborder="0" allowfullscreen="allowfullscreen" sandbox="allow-scripts allow-same-origin"></iframe> -->
</div>
<!-- The ones below don't work well :/ -->
<!-- Stream 3 -->
<!-- Domain protected -->
<!-- <iframe
src="https://sportscart.xyz/ch/scplayer-60.php"
width="40%"
height="580"
frameborder="0"
marginwidth="0"
marginheight="0"
scrolling="no"
allowfullscreen="allowfullscreen"
sandbox="allow-scripts allow-same-origin"
></iframe> -->
<!-- Stream 5 -->
<!-- Ads popup not closing :/ -->
<!-- <iframe src="http://channelstream.club/stream/uk_skysport_f1.php" width="100%" height="580" frameborder="0"
marginwidth="0" marginheight="0" scrolling="no" allowfullscreen="allowfullscreen"
sandbox="allow-scripts allow-same-origin allow-forms"></iframe> -->
</body>
<!-- <script>
document.getElementById("numstreams").textContent = document.getElementById(
"root"
).childElementCount;
</script> -->
<script>
// Grab the stream container and the first stream iframe for the guards below
var root = document.getElementById("root");
var myIframe = document.getElementById("s0-iframe");
// if (window.self !== window.top) {
// // The code is running inside an iframe
// root.style.backgroundColor = 'lightblue';
// root.innerHTML = '<iframe id="s1-iframe" width="1000" height="600" src="https://wikisport.click/strm/f1.php" frameborder="0" scrolling="no" gesture="media" allow="encrypted-media" allowfullscreen="" ></>';
// } else {
// // The code is running in the parent window
// document.body.style.backgroundColor = 'lightgreen';
// document.body.innerHTML = '<iframe width="1000" height="600" src="https://f1.viktorbarzin.me" sandbox="allow-forms allow-scripts allow-same-origin allow-top-navigation" />';
// }
// Add a 'load' event listener to the iframe
myIframe.addEventListener("load", function () {
// Set the iframe's 'contentWindow.location' property to the current URL
myIframe.contentWindow.location = myIframe.contentWindow.location.href;
});
// Add a 'beforeunload' event listener to the window to prevent redirection
myIframe.addEventListener("beforeunload", function (event) {
// If the event was triggered by a frame...
console.log("before unload");
if (event.target !== window) {
// Prevent the default action of the event
event.preventDefault();
}
});
// Add event listener to handle messages from the iframe
window.addEventListener(
"message",
function (event) {
// Check if the message is a redirect request
if (event.data.redirectTo) {
// Reject the redirect request by logging an error message
console.error(
"Iframe attempted to redirect to:",
event.data.redirectTo
);
}
},
false
);
window.onbeforeunload = function (event) {
// Check if an iframe attempted to redirect the parent
if (window.location.href != "about:blank") {
// Block the navigation attempt
event.returnValue = "Are you sure you want to leave this page?";
}
};
window.location = new Proxy(window.location, {
set: function (target, property, value, receiver) {
console.log("location edite");
// Check if the caller is the child iframe
if (window.frames.indexOf(receiver) != -1) {
// Block any attempts to modify the location object
console.error(
"Blocked attempt to modify parent window location:",
target,
property,
value
);
return false;
} else {
// Allow other modifications to the location object
return Reflect.set(target, property, value, receiver);
}
},
});
window.history = new Proxy(window.history, {
set: function (target, property, value, receiver) {
console.log("history edite");
// Check if the caller is the child iframe
if (window.frames.indexOf(receiver) != -1) {
// Block any attempts to modify the history object
console.error(
"Blocked attempt to modify parent window history:",
target,
property,
value
);
return false;
} else {
// Allow other modifications to the history object
return Reflect.set(target, property, value, receiver);
}
},
});
</script>
</html>


@ -1,359 +0,0 @@
package auth
import (
"crypto/rand"
"encoding/json"
"fmt"
"log"
"net/http"
"regexp"
"sync"
"time"
"f1-stream/internal/models"
"f1-stream/internal/store"
"github.com/go-webauthn/webauthn/webauthn"
)
var usernameRe = regexp.MustCompile(`^[a-zA-Z0-9_]{3,30}$`)
type Auth struct {
store *store.Store
webauthn *webauthn.WebAuthn
adminUsername string
sessionTTL time.Duration
// In-memory storage for WebAuthn ceremony session data (short-lived)
regSessions map[string]*webauthn.SessionData
loginSessions map[string]*webauthn.SessionData
mu sync.Mutex
}
func New(s *store.Store, rpDisplayName, rpID string, rpOrigins []string, adminUsername string, sessionTTL time.Duration) (*Auth, error) {
wconfig := &webauthn.Config{
RPDisplayName: rpDisplayName,
RPID: rpID,
RPOrigins: rpOrigins,
}
w, err := webauthn.New(wconfig)
if err != nil {
return nil, fmt.Errorf("webauthn init: %w", err)
}
return &Auth{
store: s,
webauthn: w,
adminUsername: adminUsername,
sessionTTL: sessionTTL,
regSessions: make(map[string]*webauthn.SessionData),
loginSessions: make(map[string]*webauthn.SessionData),
}, nil
}
// BeginRegistration starts the WebAuthn registration ceremony.
func (a *Auth) BeginRegistration(w http.ResponseWriter, r *http.Request) {
var req struct {
Username string `json:"username"`
}
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
http.Error(w, `{"error":"invalid request"}`, http.StatusBadRequest)
return
}
if !usernameRe.MatchString(req.Username) {
http.Error(w, `{"error":"username must be 3-30 chars, alphanumeric or underscore"}`, http.StatusBadRequest)
return
}
existing, err := a.store.GetUserByName(req.Username)
if err != nil {
http.Error(w, `{"error":"internal error"}`, http.StatusInternalServerError)
return
}
if existing != nil {
http.Error(w, `{"error":"username already taken"}`, http.StatusConflict)
return
}
id, err := randomID()
if err != nil {
http.Error(w, `{"error":"internal error"}`, http.StatusInternalServerError)
return
}
isAdmin := false
if a.adminUsername != "" && req.Username == a.adminUsername {
isAdmin = true
} else if a.adminUsername == "" {
count, err := a.store.UserCount()
if err == nil && count == 0 {
isAdmin = true
}
}
user := &models.User{
ID: id,
Username: req.Username,
IsAdmin: isAdmin,
CreatedAt: time.Now(),
}
options, session, err := a.webauthn.BeginRegistration(user)
if err != nil {
log.Printf("BeginRegistration error: %v", err)
http.Error(w, `{"error":"failed to begin registration"}`, http.StatusInternalServerError)
return
}
a.mu.Lock()
a.regSessions[req.Username] = session
a.mu.Unlock()
// Clean up session after 5 minutes
go func() {
time.Sleep(5 * time.Minute)
a.mu.Lock()
delete(a.regSessions, req.Username)
a.mu.Unlock()
}()
// Store user temporarily - will be committed on finish
// We create the user now so FinishRegistration can look it up
if err := a.store.CreateUser(*user); err != nil {
http.Error(w, `{"error":"failed to create user"}`, http.StatusInternalServerError)
return
}
w.Header().Set("Content-Type", "application/json")
json.NewEncoder(w).Encode(options)
}
// FinishRegistration completes the WebAuthn registration ceremony.
func (a *Auth) FinishRegistration(w http.ResponseWriter, r *http.Request) {
var req struct {
Username string `json:"username"`
}
// Username is passed as query param since body is the attestation response
username := r.URL.Query().Get("username")
if username == "" {
// Try to decode from a wrapper
http.Error(w, `{"error":"username required"}`, http.StatusBadRequest)
return
}
req.Username = username
a.mu.Lock()
session, ok := a.regSessions[req.Username]
if ok {
delete(a.regSessions, req.Username)
}
a.mu.Unlock()
if !ok {
http.Error(w, `{"error":"no registration in progress"}`, http.StatusBadRequest)
return
}
user, err := a.store.GetUserByName(req.Username)
if err != nil || user == nil {
http.Error(w, `{"error":"user not found"}`, http.StatusBadRequest)
return
}
credential, err := a.webauthn.FinishRegistration(user, *session, r)
if err != nil {
log.Printf("FinishRegistration error: %v", err)
http.Error(w, `{"error":"registration failed"}`, http.StatusBadRequest)
return
}
user.Credentials = append(user.Credentials, *credential)
if err := a.store.UpdateUserCredentials(user.ID, user.Credentials); err != nil {
http.Error(w, `{"error":"failed to save credential"}`, http.StatusInternalServerError)
return
}
// Create session
token, err := a.store.CreateSession(user.ID, a.sessionTTL)
if err != nil {
http.Error(w, `{"error":"failed to create session"}`, http.StatusInternalServerError)
return
}
http.SetCookie(w, &http.Cookie{
Name: "session",
Value: token,
Path: "/",
HttpOnly: true,
SameSite: http.SameSiteStrictMode,
Secure: r.TLS != nil,
MaxAge: int(a.sessionTTL.Seconds()),
})
w.Header().Set("Content-Type", "application/json")
json.NewEncoder(w).Encode(map[string]interface{}{
"id": user.ID,
"username": user.Username,
"is_admin": user.IsAdmin,
})
}
// BeginLogin starts the WebAuthn login ceremony.
func (a *Auth) BeginLogin(w http.ResponseWriter, r *http.Request) {
var req struct {
Username string `json:"username"`
}
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
http.Error(w, `{"error":"invalid request"}`, http.StatusBadRequest)
return
}
user, err := a.store.GetUserByName(req.Username)
if err != nil {
http.Error(w, `{"error":"internal error"}`, http.StatusInternalServerError)
return
}
if user == nil {
http.Error(w, `{"error":"user not found"}`, http.StatusNotFound)
return
}
if len(user.Credentials) == 0 {
http.Error(w, `{"error":"no credentials registered"}`, http.StatusBadRequest)
return
}
options, session, err := a.webauthn.BeginLogin(user)
if err != nil {
log.Printf("BeginLogin error: %v", err)
http.Error(w, `{"error":"failed to begin login"}`, http.StatusInternalServerError)
return
}
a.mu.Lock()
a.loginSessions[req.Username] = session
a.mu.Unlock()
go func() {
time.Sleep(5 * time.Minute)
a.mu.Lock()
delete(a.loginSessions, req.Username)
a.mu.Unlock()
}()
w.Header().Set("Content-Type", "application/json")
json.NewEncoder(w).Encode(options)
}
// FinishLogin completes the WebAuthn login ceremony.
func (a *Auth) FinishLogin(w http.ResponseWriter, r *http.Request) {
username := r.URL.Query().Get("username")
if username == "" {
http.Error(w, `{"error":"username required"}`, http.StatusBadRequest)
return
}
a.mu.Lock()
session, ok := a.loginSessions[username]
if ok {
delete(a.loginSessions, username)
}
a.mu.Unlock()
if !ok {
http.Error(w, `{"error":"no login in progress"}`, http.StatusBadRequest)
return
}
user, err := a.store.GetUserByName(username)
if err != nil || user == nil {
http.Error(w, `{"error":"user not found"}`, http.StatusBadRequest)
return
}
credential, err := a.webauthn.FinishLogin(user, *session, r)
if err != nil {
log.Printf("FinishLogin error: %v", err)
http.Error(w, `{"error":"login failed"}`, http.StatusUnauthorized)
return
}
// Update credential sign count
for i, c := range user.Credentials {
if string(c.ID) == string(credential.ID) {
user.Credentials[i].Authenticator.SignCount = credential.Authenticator.SignCount
break
}
}
if err := a.store.UpdateUserCredentials(user.ID, user.Credentials); err != nil {
log.Printf("Failed to update credential sign count: %v", err)
}
token, err := a.store.CreateSession(user.ID, a.sessionTTL)
if err != nil {
http.Error(w, `{"error":"failed to create session"}`, http.StatusInternalServerError)
return
}
http.SetCookie(w, &http.Cookie{
Name: "session",
Value: token,
Path: "/",
HttpOnly: true,
SameSite: http.SameSiteStrictMode,
Secure: r.TLS != nil,
MaxAge: int(a.sessionTTL.Seconds()),
})
w.Header().Set("Content-Type", "application/json")
json.NewEncoder(w).Encode(map[string]interface{}{
"id": user.ID,
"username": user.Username,
"is_admin": user.IsAdmin,
})
}
// Logout clears the session.
func (a *Auth) Logout(w http.ResponseWriter, r *http.Request) {
cookie, err := r.Cookie("session")
if err == nil {
a.store.DeleteSession(cookie.Value)
}
http.SetCookie(w, &http.Cookie{
Name: "session",
Value: "",
Path: "/",
HttpOnly: true,
MaxAge: -1,
})
w.Header().Set("Content-Type", "application/json")
w.Write([]byte(`{"ok":true}`))
}
// Me returns the current user info.
func (a *Auth) Me(w http.ResponseWriter, r *http.Request) {
user := UserFromContext(r.Context())
if user == nil {
http.Error(w, `{"error":"not authenticated"}`, http.StatusUnauthorized)
return
}
w.Header().Set("Content-Type", "application/json")
json.NewEncoder(w).Encode(map[string]interface{}{
"id": user.ID,
"username": user.Username,
"is_admin": user.IsAdmin,
})
}
// GetSessionUser returns the user for a session token.
func (a *Auth) GetSessionUser(token string) (*models.User, error) {
sess, err := a.store.GetSession(token)
if err != nil || sess == nil {
return nil, err
}
return a.store.GetUserByID(sess.UserID)
}
func randomID() (string, error) {
b := make([]byte, 16)
if _, err := rand.Read(b); err != nil {
return "", err
}
return fmt.Sprintf("%x", b), nil
}


@ -1,20 +0,0 @@
package auth
import (
"context"
"f1-stream/internal/models"
)
type contextKey string
const userKey contextKey = "user"
func ContextWithUser(ctx context.Context, user *models.User) context.Context {
return context.WithValue(ctx, userKey, user)
}
func UserFromContext(ctx context.Context) *models.User {
user, _ := ctx.Value(userKey).(*models.User)
return user
}


@ -1,38 +0,0 @@
package extractor
import (
"log"
"os/exec"
)
const maxConcurrentSessions = 10
var sessionSem chan struct{}
// Init starts dbus, PulseAudio, and prepares the session semaphore.
func Init() {
// Start dbus (Chrome needs it for accessibility/service queries)
if err := exec.Command("mkdir", "-p", "/var/run/dbus").Run(); err == nil {
if err := exec.Command("dbus-daemon", "--system", "--nofork").Start(); err != nil {
log.Printf("extractor: warning: failed to start dbus: %v", err)
}
}
if err := exec.Command("pulseaudio", "--start", "--exit-idle-time=-1").Run(); err != nil {
log.Printf("extractor: warning: failed to start PulseAudio: %v", err)
}
// Create a null-sink as the default audio target for all sessions
exec.Command("pactl", "load-module", "module-null-sink",
"sink_name=virtual_sink",
"sink_properties=device.description=VirtualSink").Run()
exec.Command("pactl", "set-default-sink", "virtual_sink").Run()
sessionSem = make(chan struct{}, maxConcurrentSessions)
log.Println("extractor: initialized")
}
// Stop kills PulseAudio.
func Stop() {
exec.Command("pulseaudio", "--kill").Run()
log.Println("extractor: stopped")
}


@ -1,167 +0,0 @@
package extractor
import (
"fmt"
"log"
"os"
"os/exec"
"sync/atomic"
"time"
)
var displayCounter int64 = 99
func nextDisplay() int {
return int(atomic.AddInt64(&displayCounter, 1))
}
// Capture manages an Xvfb display and separate ffmpeg pipelines for video and audio.
// Audio capture is best-effort — if PulseAudio is unavailable, video still works.
type Capture struct {
display int
xvfbCmd *exec.Cmd
videoCmd *exec.Cmd
audioCmd *exec.Cmd
videoR *os.File // IVF pipe reader (VP8 frames)
audioR *os.File // OGG pipe reader (Opus frames)
}
// NewCapture starts Xvfb on the given display and two ffmpeg processes:
// one for video (x11grab → VP8/IVF) and one for audio (pulse → Opus/OGG).
// Audio is best-effort — if it fails to start, video still works and audioR
// is set to a pipe that will return EOF immediately.
func NewCapture(display, width, height int) (*Capture, error) {
c := &Capture{display: display}
// Start Xvfb
screen := fmt.Sprintf("%dx%dx24", width, height)
c.xvfbCmd = exec.Command("Xvfb", fmt.Sprintf(":%d", display),
"-screen", "0", screen, "-ac", "-nolisten", "tcp")
if err := c.xvfbCmd.Start(); err != nil {
return nil, fmt.Errorf("capture: failed to start Xvfb: %w", err)
}
// Wait for Xvfb to be ready (X11 socket must exist)
ready := false
for i := 0; i < 50; i++ {
socketPath := fmt.Sprintf("/tmp/.X11-unix/X%d", display)
if _, err := os.Stat(socketPath); err == nil {
ready = true
break
}
time.Sleep(100 * time.Millisecond)
}
if !ready {
c.xvfbCmd.Process.Kill()
c.xvfbCmd.Wait()
return nil, fmt.Errorf("capture: Xvfb did not start in time for display :%d", display)
}
// --- Video pipeline (required) ---
videoR, videoW, err := os.Pipe()
if err != nil {
c.cleanup()
return nil, fmt.Errorf("capture: video pipe: %w", err)
}
c.videoCmd = exec.Command("ffmpeg",
"-loglevel", "warning",
"-f", "x11grab", "-framerate", "30",
"-video_size", fmt.Sprintf("%dx%d", width, height),
"-i", fmt.Sprintf(":%d", display),
"-c:v", "libvpx",
"-quality", "realtime", "-cpu-used", "8",
"-deadline", "realtime", "-b:v", "2M", "-g", "30",
"-f", "ivf", "pipe:3",
)
c.videoCmd.ExtraFiles = []*os.File{videoW}
c.videoCmd.Stdout = os.Stderr
c.videoCmd.Stderr = os.Stderr
if err := c.videoCmd.Start(); err != nil {
videoR.Close()
videoW.Close()
c.cleanup()
return nil, fmt.Errorf("capture: failed to start video ffmpeg: %w", err)
}
videoW.Close()
c.videoR = videoR
go func() {
if err := c.videoCmd.Wait(); err != nil {
log.Printf("capture: video ffmpeg exited on display :%d: %v", display, err)
}
}()
// --- Audio pipeline (best-effort) ---
audioR, audioW, err := os.Pipe()
if err != nil {
log.Printf("capture: audio pipe failed on display :%d: %v (continuing without audio)", display, err)
// Provide a closed pipe so StreamAudio gets EOF immediately
r, w, _ := os.Pipe()
w.Close()
c.audioR = r
log.Printf("capture: started display :%d (%dx%d) (video only)", display, width, height)
return c, nil
}
c.audioCmd = exec.Command("ffmpeg",
"-loglevel", "warning",
"-f", "pulse", "-i", "virtual_sink.monitor",
"-c:a", "libopus",
"-b:a", "128k", "-application", "lowdelay",
"-f", "ogg", "pipe:3",
)
c.audioCmd.ExtraFiles = []*os.File{audioW}
c.audioCmd.Stdout = os.Stderr
c.audioCmd.Stderr = os.Stderr
if err := c.audioCmd.Start(); err != nil {
log.Printf("capture: audio ffmpeg failed to start on display :%d: %v (continuing without audio)", display, err)
audioR.Close()
audioW.Close()
// Provide a closed pipe so StreamAudio gets EOF immediately
r, w, _ := os.Pipe()
w.Close()
c.audioR = r
c.audioCmd = nil
log.Printf("capture: started display :%d (%dx%d) (video only)", display, width, height)
return c, nil
}
audioW.Close()
c.audioR = audioR
go func() {
if err := c.audioCmd.Wait(); err != nil {
log.Printf("capture: audio ffmpeg exited on display :%d: %v", display, err)
}
}()
log.Printf("capture: started display :%d (%dx%d) (video + audio)", display, width, height)
return c, nil
}
func (c *Capture) cleanup() {
if c.xvfbCmd != nil && c.xvfbCmd.Process != nil {
c.xvfbCmd.Process.Kill()
c.xvfbCmd.Wait()
}
}
// Close stops ffmpeg processes, Xvfb, and releases pipe resources.
func (c *Capture) Close() {
if c.videoCmd != nil && c.videoCmd.Process != nil {
c.videoCmd.Process.Kill()
}
if c.audioCmd != nil && c.audioCmd.Process != nil {
c.audioCmd.Process.Kill()
}
if c.videoR != nil {
c.videoR.Close()
}
if c.audioR != nil {
c.audioR.Close()
}
c.cleanup()
log.Printf("capture: stopped display :%d", c.display)
}


@ -1,383 +0,0 @@
package extractor
import (
"context"
"encoding/json"
"fmt"
"log"
"net"
"net/http"
"os"
"os/exec"
"sync"
"time"
"github.com/chromedp/cdproto/fetch"
"github.com/chromedp/cdproto/input"
"github.com/chromedp/cdproto/network"
"github.com/chromedp/cdproto/page"
"github.com/chromedp/chromedp"
"github.com/gobwas/ws"
"github.com/gobwas/ws/wsutil"
"github.com/pion/webrtc/v4"
)
const (
sessionTimeout = 5 * time.Minute
defaultViewportW = 1280
defaultViewportH = 720
turnCredentialTTL = 24 * time.Hour
)
var (
turnURL string
turnSharedSecret string
turnInternalURL string
)
// SetTURNConfig sets the TURN server URL, shared secret, and optional internal URL.
// The internal URL is used by pion (server-side) to avoid hairpin NAT issues.
// The public URL is sent to the browser client.
func SetTURNConfig(url, secret, internalURL string) {
turnURL = url
turnSharedSecret = secret
turnInternalURL = internalURL
if turnInternalURL == "" {
turnInternalURL = "turn:coturn.coturn.svc.cluster.local:3478"
}
log.Printf("extractor: TURN configured: public=%s internal=%s", url, turnInternalURL)
}
var adDomains = []string{
"doubleclick.net", "googlesyndication.com", "googleadservices.com",
"google-analytics.com", "adnxs.com", "criteo.com", "outbrain.com",
"taboola.com", "amazon-adsystem.com", "popads.net", "popcash.net",
"juicyads.com", "exoclick.com", "trafficjunky.com", "propellerads.com",
"adsterra.com", "hilltopads.net", "revcontent.com", "mgid.com",
}
type inputMsg struct {
Type string `json:"type"`
X float64 `json:"x"`
Y float64 `json:"y"`
Button int `json:"button"`
DeltaX float64 `json:"deltaX"`
DeltaY float64 `json:"deltaY"`
Key string `json:"key"`
Code string `json:"code"`
Mods int `json:"modifiers"`
Width int `json:"width"`
Height int `json:"height"`
SDP string `json:"sdp"`
Candidate *webrtc.ICECandidateInit `json:"candidate"`
}
// HandleBrowserSession upgrades to WebSocket and runs a remote browser session
// with WebRTC video/audio streaming and CDP input relay.
func HandleBrowserSession(w http.ResponseWriter, r *http.Request, pageURL string) {
// Check session capacity
select {
case sessionSem <- struct{}{}:
defer func() { <-sessionSem }()
default:
http.Error(w, `{"error":"too many active browser sessions"}`, http.StatusServiceUnavailable)
return
}
conn, _, _, err := ws.UpgradeHTTP(r, w)
if err != nil {
log.Printf("extractor: session: ws upgrade failed: %v", err)
return
}
defer conn.Close()
ctx, cancel := context.WithCancel(r.Context())
defer cancel()
// Allocate display and start capture pipeline
display := nextDisplay()
viewW, viewH := defaultViewportW, defaultViewportH
cap, err := NewCapture(display, viewW, viewH)
if err != nil {
sendWSError(conn, "failed to start capture: "+err.Error())
log.Printf("extractor: session: capture error: %v", err)
return
}
defer cap.Close()
// Start Chrome on the virtual display
opts := append(chromedp.DefaultExecAllocatorOptions[:],
chromedp.Flag("headless", false),
chromedp.Flag("no-sandbox", true),
chromedp.Flag("disable-gpu", true),
chromedp.Flag("disable-software-rasterizer", true),
chromedp.Flag("disable-dev-shm-usage", true),
chromedp.Flag("disable-extensions", true),
chromedp.Flag("disable-background-networking", true),
chromedp.ModifyCmdFunc(func(cmd *exec.Cmd) {
cmd.Env = append(os.Environ(), fmt.Sprintf("DISPLAY=:%d", display))
}),
chromedp.Flag("autoplay-policy", "no-user-gesture-required"),
chromedp.Flag("window-size", fmt.Sprintf("%d,%d", viewW, viewH)),
chromedp.WSURLReadTimeout(30 * time.Second),
)
allocCtx, allocCancel := chromedp.NewExecAllocator(ctx, opts...)
defer allocCancel()
tabCtx, tabCancel := chromedp.NewContext(allocCtx)
defer tabCancel()
var wsMu sync.Mutex
// Build ICE servers for pion (server-side) — uses internal TURN URL to avoid hairpin NAT
iceServers := []webrtc.ICEServer{
{URLs: []string{"stun:stun.l.google.com:19302"}},
}
var turnCreds *TURNCredentials
if turnURL != "" && turnSharedSecret != "" {
// Server-side: use internal k8s DNS for TURN to bypass NAT
internalCreds := GenerateTURNCredentials(turnInternalURL, turnSharedSecret, turnCredentialTTL)
turnCreds = &internalCreds
iceServers = append(iceServers, webrtc.ICEServer{
URLs: internalCreds.URLs,
Username: internalCreds.Username,
Credential: internalCreds.Credential,
CredentialType: webrtc.ICECredentialTypePassword,
})
}
// Build ad-blocking fetch patterns
adPatterns := make([]*fetch.RequestPattern, 0, len(adDomains))
for _, domain := range adDomains {
adPatterns = append(adPatterns, &fetch.RequestPattern{
URLPattern: fmt.Sprintf("*://*.%s/*", domain),
})
}
// Set up event listeners before navigation
chromedp.ListenTarget(tabCtx, func(ev interface{}) {
switch e := ev.(type) {
case *fetch.EventRequestPaused:
go chromedp.Run(tabCtx, fetch.FailRequest(e.RequestID, network.ErrorReasonBlockedByClient))
case *page.EventFrameNavigated:
if e.Frame.ParentID == "" {
go sendURLUpdate(tabCtx, conn, &wsMu, e.Frame.URL)
}
case *page.EventNavigatedWithinDocument:
go sendURLUpdate(tabCtx, conn, &wsMu, e.URL)
}
})
// Enable fetch interception (ad blocking) and navigate
if err := chromedp.Run(tabCtx,
fetch.Enable().WithPatterns(adPatterns),
chromedp.Navigate(pageURL),
chromedp.WaitReady("body"),
); err != nil {
sendWSError(conn, "navigation failed")
log.Printf("extractor: session: navigate error for %s: %v", pageURL, err)
return
}
// Create WebRTC media stream
mediaStream, err := NewMediaStream(iceServers, func(c *webrtc.ICECandidate) {
data, _ := json.Marshal(map[string]interface{}{
"type": "ice",
"candidate": c.ToJSON(),
})
wsMu.Lock()
wsutil.WriteServerMessage(conn, ws.OpText, data)
wsMu.Unlock()
}, cancel)
if err != nil {
sendWSError(conn, "WebRTC setup failed")
log.Printf("extractor: session: webrtc error: %v", err)
return
}
defer mediaStream.Close()
// Create and send SDP offer
sdp, err := mediaStream.Offer()
if err != nil {
sendWSError(conn, "WebRTC offer failed")
log.Printf("extractor: session: offer error: %v", err)
return
}
// Send ICE config to client — uses PUBLIC TURN URL (for browser to reach from internet)
clientICE := []map[string]interface{}{
{"urls": []string{"stun:stun.l.google.com:19302"}},
}
if turnCreds != nil {
// Client-side: use public IP for TURN (browser connects from internet)
publicCreds := GenerateTURNCredentials(turnURL, turnSharedSecret, turnCredentialTTL)
clientICE = append(clientICE, map[string]interface{}{
"urls": publicCreds.URLs,
"username": publicCreds.Username,
"credential": publicCreds.Credential,
})
}
iceMsg, _ := json.Marshal(map[string]interface{}{
"type": "iceServers",
"iceServers": clientICE,
})
wsMu.Lock()
wsutil.WriteServerMessage(conn, ws.OpText, iceMsg)
wsMu.Unlock()
offerMsg, _ := json.Marshal(map[string]interface{}{
"type": "offer",
"sdp": sdp,
})
wsMu.Lock()
wsutil.WriteServerMessage(conn, ws.OpText, offerMsg)
wsMu.Unlock()
// Send ready message with viewport dimensions
readyMsg, _ := json.Marshal(map[string]interface{}{
"type": "ready",
"width": viewW,
"height": viewH,
})
wsMu.Lock()
wsutil.WriteServerMessage(conn, ws.OpText, readyMsg)
wsMu.Unlock()
// Start streaming video and audio from capture pipes
go mediaStream.StreamVideo(cap.videoR, ctx)
go mediaStream.StreamAudio(cap.audioR, ctx)
log.Printf("extractor: session: started for %s (display :%d)", pageURL, display)
// Inactivity timer — cancels session after no client input
inactivity := time.NewTimer(sessionTimeout)
defer inactivity.Stop()
go func() {
select {
case <-inactivity.C:
log.Printf("extractor: session: inactivity timeout for %s", pageURL)
cancel()
case <-ctx.Done():
}
}()
// Read loop — process signaling and input messages
for {
msgs, err := wsutil.ReadClientMessage(conn, nil)
if err != nil {
break
}
for _, m := range msgs {
if m.OpCode != ws.OpText {
continue
}
// Reset inactivity timer
if !inactivity.Stop() {
select {
case <-inactivity.C:
default:
}
}
inactivity.Reset(sessionTimeout)
var msg inputMsg
if err := json.Unmarshal(m.Payload, &msg); err != nil {
continue
}
switch msg.Type {
case "answer":
if err := mediaStream.SetAnswer(msg.SDP); err != nil {
log.Printf("extractor: session: set answer error: %v", err)
}
case "ice":
if msg.Candidate != nil {
if err := mediaStream.AddICECandidate(*msg.Candidate); err != nil {
log.Printf("extractor: session: add ICE error: %v", err)
}
}
case "back":
chromedp.Run(tabCtx, chromedp.NavigateBack())
case "forward":
chromedp.Run(tabCtx, chromedp.NavigateForward())
default:
handleInput(tabCtx, &msg)
}
}
}
log.Printf("extractor: session: ended for %s", pageURL)
}
func handleInput(ctx context.Context, msg *inputMsg) {
switch msg.Type {
case "mousemove":
chromedp.Run(ctx,
input.DispatchMouseEvent(input.MouseMoved, msg.X, msg.Y))
case "mousedown":
chromedp.Run(ctx,
input.DispatchMouseEvent(input.MousePressed, msg.X, msg.Y).
WithButton(mapButton(msg.Button)).WithClickCount(1))
case "mouseup":
chromedp.Run(ctx,
input.DispatchMouseEvent(input.MouseReleased, msg.X, msg.Y).
WithButton(mapButton(msg.Button)))
case "scroll":
chromedp.Run(ctx,
input.DispatchMouseEvent(input.MouseWheel, msg.X, msg.Y).
WithDeltaX(msg.DeltaX).WithDeltaY(msg.DeltaY))
case "keydown":
chromedp.Run(ctx,
input.DispatchKeyEvent(input.KeyDown).
WithKey(msg.Key).WithCode(msg.Code).
WithModifiers(input.Modifier(msg.Mods)))
case "keyup":
chromedp.Run(ctx,
input.DispatchKeyEvent(input.KeyUp).
WithKey(msg.Key).WithCode(msg.Code).
WithModifiers(input.Modifier(msg.Mods)))
}
}
func mapButton(jsButton int) input.MouseButton {
switch jsButton {
case 1:
return input.Middle
case 2:
return input.Right
default:
return input.Left
}
}
func sendURLUpdate(tabCtx context.Context, conn net.Conn, mu *sync.Mutex, currentURL string) {
var canBack, canForward bool
var entries []*page.NavigationEntry
var currentIndex int64
if err := chromedp.Run(tabCtx, chromedp.ActionFunc(func(ctx context.Context) error {
var err error
currentIndex, entries, err = page.GetNavigationHistory().Do(ctx)
return err
})); err == nil {
canBack = currentIndex > 0
canForward = int(currentIndex) < len(entries)-1
}
data, _ := json.Marshal(map[string]interface{}{
"type": "url",
"url": currentURL,
"canBack": canBack,
"canForward": canForward,
})
mu.Lock()
wsutil.WriteServerMessage(conn, ws.OpText, data)
mu.Unlock()
}
func sendWSError(conn net.Conn, msg string) {
data, _ := json.Marshal(map[string]string{"type": "error", "message": msg})
wsutil.WriteServerMessage(conn, ws.OpText, data)
}


@ -1,248 +0,0 @@
package extractor
import (
"context"
"crypto/hmac"
"crypto/sha1"
"encoding/base64"
"fmt"
"io"
"log"
"time"
"github.com/pion/webrtc/v4"
"github.com/pion/webrtc/v4/pkg/media"
"github.com/pion/webrtc/v4/pkg/media/ivfreader"
"github.com/pion/webrtc/v4/pkg/media/oggreader"
)
// TURNCredentials holds ephemeral TURN credentials generated from a shared secret.
type TURNCredentials struct {
URLs []string `json:"urls"`
Username string `json:"username"`
Credential string `json:"credential"`
}
// GenerateTURNCredentials creates time-limited TURN credentials using the
// shared secret (TURN REST API / coturn --use-auth-secret).
func GenerateTURNCredentials(turnURL, sharedSecret string, ttl time.Duration) TURNCredentials {
expiry := time.Now().Add(ttl).Unix()
username := fmt.Sprintf("%d", expiry)
mac := hmac.New(sha1.New, []byte(sharedSecret))
mac.Write([]byte(username))
credential := base64.StdEncoding.EncodeToString(mac.Sum(nil))
return TURNCredentials{
URLs: []string{turnURL},
Username: username,
Credential: credential,
}
}
// MediaStream wraps a pion WebRTC PeerConnection with VP8 video and Opus audio tracks.
type MediaStream struct {
pc *webrtc.PeerConnection
videoTrack *webrtc.TrackLocalStaticSample
audioTrack *webrtc.TrackLocalStaticSample
}
// NewMediaStream creates a PeerConnection with VP8 + Opus tracks and an ICE callback.
// The cancel function is called when ICE fails to trigger session cleanup.
func NewMediaStream(iceServers []webrtc.ICEServer, onICE func(*webrtc.ICECandidate), cancel context.CancelFunc) (*MediaStream, error) {
config := webrtc.Configuration{
ICEServers: iceServers,
}
pc, err := webrtc.NewPeerConnection(config)
if err != nil {
return nil, err
}
videoTrack, err := webrtc.NewTrackLocalStaticSample(
webrtc.RTPCodecCapability{MimeType: webrtc.MimeTypeVP8},
"video", "stream",
)
if err != nil {
pc.Close()
return nil, err
}
audioTrack, err := webrtc.NewTrackLocalStaticSample(
webrtc.RTPCodecCapability{MimeType: webrtc.MimeTypeOpus},
"audio", "stream",
)
if err != nil {
pc.Close()
return nil, err
}
if _, err = pc.AddTrack(videoTrack); err != nil {
pc.Close()
return nil, err
}
if _, err = pc.AddTrack(audioTrack); err != nil {
pc.Close()
return nil, err
}
pc.OnICEConnectionStateChange(func(state webrtc.ICEConnectionState) {
log.Printf("webrtc: ICE connection state: %s", state.String())
if state == webrtc.ICEConnectionStateFailed {
log.Printf("webrtc: ICE failed, cancelling session")
cancel()
return
}
if state == webrtc.ICEConnectionStateConnected {
// Log selected candidate pair
if stats := pc.GetStats(); stats != nil {
for _, s := range stats {
if cp, ok := s.(webrtc.ICECandidatePairStats); ok && cp.Nominated {
log.Printf("webrtc: selected candidate pair: local=%s remote=%s",
cp.LocalCandidateID, cp.RemoteCandidateID)
}
}
}
// Start periodic stats logging
go func() {
ticker := time.NewTicker(10 * time.Second)
defer ticker.Stop()
for range ticker.C {
if pc.ICEConnectionState() != webrtc.ICEConnectionStateConnected &&
pc.ICEConnectionState() != webrtc.ICEConnectionStateCompleted {
return
}
stats := pc.GetStats()
for _, s := range stats {
if out, ok := s.(webrtc.OutboundRTPStreamStats); ok {
log.Printf("webrtc: outbound-rtp kind=%s bytes=%d packets=%d",
out.Kind, out.BytesSent, out.PacketsSent)
}
}
}
}()
}
})
pc.OnConnectionStateChange(func(state webrtc.PeerConnectionState) {
log.Printf("webrtc: peer connection state: %s", state.String())
})
pc.OnICECandidate(func(c *webrtc.ICECandidate) {
if c != nil {
log.Printf("webrtc: gathered ICE candidate: type=%s addr=%s:%d",
c.Typ.String(), c.Address, c.Port)
if onICE != nil {
onICE(c)
}
}
})
return &MediaStream{
pc: pc,
videoTrack: videoTrack,
audioTrack: audioTrack,
}, nil
}
// Offer creates an SDP offer, sets it as local description, and returns the SDP string.
func (m *MediaStream) Offer() (string, error) {
offer, err := m.pc.CreateOffer(nil)
if err != nil {
return "", err
}
if err := m.pc.SetLocalDescription(offer); err != nil {
return "", err
}
return offer.SDP, nil
}
// SetAnswer sets the remote SDP answer.
func (m *MediaStream) SetAnswer(sdp string) error {
return m.pc.SetRemoteDescription(webrtc.SessionDescription{
Type: webrtc.SDPTypeAnswer,
SDP: sdp,
})
}
// AddICECandidate adds a remote ICE candidate.
func (m *MediaStream) AddICECandidate(init webrtc.ICECandidateInit) error {
return m.pc.AddICECandidate(init)
}
// StreamVideo reads VP8 frames from an IVF stream and writes them to the video track.
// Blocks until the reader returns an error or the context is cancelled.
func (m *MediaStream) StreamVideo(r io.Reader, ctx context.Context) {
ivf, _, err := ivfreader.NewWith(r)
if err != nil {
log.Printf("webrtc: ivf reader error: %v", err)
return
}
duration := time.Second / 30
for {
select {
case <-ctx.Done():
return
default:
}
frame, _, err := ivf.ParseNextFrame()
if err != nil {
if err != io.EOF {
log.Printf("webrtc: video frame error: %v", err)
}
return
}
if err := m.videoTrack.WriteSample(media.Sample{
Data: frame,
Duration: duration,
}); err != nil {
log.Printf("webrtc: video write error: %v", err)
return
}
}
}
// StreamAudio reads Opus pages from an OGG stream and writes them to the audio track.
// Blocks until the reader returns an error or the context is cancelled.
func (m *MediaStream) StreamAudio(ctx context.Context, r io.Reader) {
ogg, _, err := oggreader.NewWith(r)
if err != nil {
log.Printf("webrtc: ogg reader error: %v", err)
return
}
for {
select {
case <-ctx.Done():
return
default:
}
page, _, err := ogg.ParseNextPage()
if err != nil {
if err != io.EOF {
log.Printf("webrtc: audio page error: %v", err)
}
return
}
if err := m.audioTrack.WriteSample(media.Sample{
Data: page,
Duration: 20 * time.Millisecond,
}); err != nil {
log.Printf("webrtc: audio write error: %v", err)
return
}
}
}
// Close closes the underlying PeerConnection.
func (m *MediaStream) Close() {
if m.pc != nil {
m.pc.Close()
}
}
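// Illustrative sketch (not part of the original file): pumping both tracks
// until a reader ends or the session context is cancelled. The readers are
// assumed to yield IVF/VP8 and OGG/Opus respectively.
func exampleStreamMedia(ctx context.Context, ms *MediaStream, video, audio io.Reader) {
go ms.StreamVideo(ctx, video)
ms.StreamAudio(ctx, audio) // blocks until EOF, a write error, or cancellation
ms.Close()
}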

View file

@ -1,188 +0,0 @@
package healthcheck
import (
"context"
"log"
"net/http"
"sync"
"time"
"f1-stream/internal/models"
"f1-stream/internal/store"
)
const unhealthyThreshold = 5
const userAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
// isReachable sends a GET request and returns true if the server responds with
// an HTTP 2xx or 3xx status code.
func isReachable(client *http.Client, rawURL string) bool {
req, err := http.NewRequest("GET", rawURL, nil)
if err != nil {
return false
}
req.Header.Set("User-Agent", userAgent)
resp, err := client.Do(req)
if err != nil {
return false
}
defer resp.Body.Close()
return resp.StatusCode >= 200 && resp.StatusCode < 400
}
type HealthChecker struct {
store *store.Store
interval time.Duration
timeout time.Duration
client *http.Client
mu sync.Mutex
}
func New(s *store.Store, interval, timeout time.Duration) *HealthChecker {
return &HealthChecker{
store: s,
interval: interval,
timeout: timeout,
client: &http.Client{
Timeout: timeout,
CheckRedirect: func(req *http.Request, via []*http.Request) error {
if len(via) >= 3 {
return http.ErrUseLastResponse
}
return nil
},
},
}
}
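// Illustrative sketch (not part of the original file): wiring the checker
// into a service; the interval and per-request timeout shown are assumed
// values.
func exampleRun(ctx context.Context, st *store.Store) {
hc := New(st, 5*time.Minute, 10*time.Second)
go hc.Run(ctx)
}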
func (hc *HealthChecker) Run(ctx context.Context) {
log.Printf("healthcheck: starting with interval=%v timeout=%v", hc.interval, hc.timeout)
hc.checkAll()
ticker := time.NewTicker(hc.interval)
defer ticker.Stop()
for {
select {
case <-ctx.Done():
log.Println("healthcheck: shutting down")
return
case <-ticker.C:
hc.checkAll()
}
}
}
func (hc *HealthChecker) checkAll() {
hc.mu.Lock()
defer hc.mu.Unlock()
start := time.Now()
urls := hc.collectURLs()
log.Printf("healthcheck: checking %d URLs", len(urls))
existing, err := hc.store.LoadHealthStates()
if err != nil {
log.Printf("healthcheck: failed to load health states: %v", err)
existing = nil
}
stateMap := make(map[string]*models.HealthState, len(existing))
for i := range existing {
stateMap[existing[i].URL] = &existing[i]
}
now := time.Now()
var recovered, newlyUnhealthy int
for _, url := range urls {
st, exists := stateMap[url]
if !exists {
st = &models.HealthState{
URL: url,
Healthy: true,
}
stateMap[url] = st
}
ok := isReachable(hc.client, url)
if ok {
if !st.Healthy {
log.Printf("healthcheck: recovered %s", truncate(url, 80))
recovered++
}
st.ConsecutiveFailures = 0
st.Healthy = true
} else {
st.ConsecutiveFailures++
if st.ConsecutiveFailures >= unhealthyThreshold && st.Healthy {
st.Healthy = false
log.Printf("healthcheck: marking unhealthy after %d failures: %s", st.ConsecutiveFailures, truncate(url, 80))
newlyUnhealthy++
}
}
st.LastCheckTime = now
}
// Prune orphaned entries: only keep states whose URL is in the current set
urlSet := make(map[string]bool, len(urls))
for _, u := range urls {
urlSet[u] = true
}
var finalStates []models.HealthState
healthyCount := 0
for _, st := range stateMap {
if urlSet[st.URL] {
finalStates = append(finalStates, *st)
if st.Healthy {
healthyCount++
}
}
}
if err := hc.store.SaveHealthStates(finalStates); err != nil {
log.Printf("healthcheck: failed to save health states: %v", err)
}
log.Printf("healthcheck: done in %v, checked=%d healthy=%d recovered=%d newly_unhealthy=%d",
time.Since(start).Round(time.Millisecond), len(urls), healthyCount, recovered, newlyUnhealthy)
}
func (hc *HealthChecker) collectURLs() []string {
seen := make(map[string]bool)
streams, err := hc.store.LoadStreams()
if err != nil {
log.Printf("healthcheck: failed to load streams: %v", err)
} else {
for _, s := range streams {
seen[s.URL] = true
}
}
scraped, err := hc.store.LoadScrapedLinks()
if err != nil {
log.Printf("healthcheck: failed to load scraped links: %v", err)
} else {
for _, l := range scraped {
seen[l.URL] = true
}
}
urls := make([]string, 0, len(seen))
for u := range seen {
urls = append(urls, u)
}
return urls
}
func truncate(s string, maxLen int) string {
if len(s) <= maxLen {
return s
}
return s[:maxLen] + "..."
}

View file

@ -1,209 +0,0 @@
package hlsproxy
import (
"bufio"
"encoding/base64"
"io"
"log"
"net/http"
"net/url"
"strings"
)
// NewHandler returns an http.Handler for /hls/{base64url_encoded_full_url}.
// It proxies HLS playlists and segments, rewriting m3u8 URLs to route
// through the proxy and forwarding X-Hls-Forward-* headers upstream.
func NewHandler() http.Handler {
client := &http.Client{
Timeout: 30 * time.Second,
CheckRedirect: func(req *http.Request, via []*http.Request) error {
if len(via) >= 5 {
return http.ErrUseLastResponse
}
return nil
},
}
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
if r.Method == http.MethodOptions {
setCORS(w)
w.WriteHeader(http.StatusNoContent)
return
}
// Parse: /hls/{base64url_encoded_full_url}
trimmed := strings.TrimPrefix(r.URL.Path, "/hls/")
if trimmed == "" || trimmed == r.URL.Path {
http.Error(w, "bad hls proxy URL", http.StatusBadRequest)
return
}
// Decode the full upstream URL from base64url
upstreamURL, err := base64.RawURLEncoding.DecodeString(trimmed)
if err != nil {
http.Error(w, "invalid base64url", http.StatusBadRequest)
return
}
target := string(upstreamURL)
parsed, err := url.Parse(target)
if err != nil || (parsed.Scheme != "http" && parsed.Scheme != "https") {
http.Error(w, "invalid upstream URL", http.StatusBadRequest)
return
}
log.Printf("hlsproxy: %s -> %s", r.URL.Path, target)
upReq, err := http.NewRequestWithContext(r.Context(), http.MethodGet, target, nil)
if err != nil {
http.Error(w, "failed to create request", http.StatusInternalServerError)
return
}
// Set Referer and Origin. If the URL has a ?domain= param (CDN segments),
// use that domain as the origin so the CDN accepts the request.
refererOrigin := parsed.Scheme + "://" + parsed.Host
if domainParam := parsed.Query().Get("domain"); domainParam != "" {
refererOrigin = "https://" + domainParam
}
upReq.Header.Set("Referer", refererOrigin+"/")
upReq.Header.Set("Origin", refererOrigin)
upReq.Header.Set("User-Agent", r.Header.Get("User-Agent"))
// Forward X-Hls-Forward-* headers (strip prefix)
for key, vals := range r.Header {
if strings.HasPrefix(key, "X-Hls-Forward-") {
realKey := strings.TrimPrefix(key, "X-Hls-Forward-")
for _, v := range vals {
upReq.Header.Set(realKey, v)
}
}
}
resp, err := client.Do(upReq)
if err != nil {
log.Printf("hlsproxy: upstream fetch failed: %v", err)
http.Error(w, "upstream fetch failed", http.StatusBadGateway)
return
}
defer resp.Body.Close()
log.Printf("hlsproxy: %s <- %d (%s)", truncPath(r.URL.Path, 60), resp.StatusCode, resp.Header.Get("Content-Type"))
setCORS(w)
ct := strings.ToLower(resp.Header.Get("Content-Type"))
isM3U8 := strings.Contains(ct, "mpegurl") ||
strings.HasSuffix(strings.ToLower(parsed.Path), ".m3u8")
if isM3U8 {
w.Header().Set("Content-Type", "application/vnd.apple.mpegurl")
w.WriteHeader(resp.StatusCode)
rewriteM3U8(w, resp.Body, target)
return
}
// Stream segment or other content directly
for key, vals := range resp.Header {
lk := strings.ToLower(key)
if lk == "content-type" || lk == "content-length" || lk == "cache-control" || lk == "accept-ranges" {
for _, v := range vals {
w.Header().Add(key, v)
}
}
}
w.WriteHeader(resp.StatusCode)
io.Copy(w, resp.Body)
})
}
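// Illustrative sketch (not part of the original file): mounting the proxy.
// The handler parses the encoded upstream URL from the request path itself,
// so it is registered on the /hls/ prefix.
func exampleMount() *http.ServeMux {
mux := http.NewServeMux()
mux.Handle("/hls/", NewHandler())
return mux
}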
// rewriteM3U8 reads an m3u8 playlist from r, rewrites segment/playlist URLs
// to route through /hls/{b64}, and writes the result to w.
func rewriteM3U8(w io.Writer, r io.Reader, playlistURL string) {
base, err := url.Parse(playlistURL)
if err != nil {
io.Copy(w, r)
return
}
scanner := bufio.NewScanner(r)
scanner.Buffer(make([]byte, 0, 64*1024), 1024*1024)
for scanner.Scan() {
line := scanner.Text()
if strings.HasPrefix(line, "#") {
// Rewrite URI="..." in directives like #EXT-X-KEY, #EXT-X-MAP
rewritten := rewriteURIAttribute(line, base)
w.Write([]byte(rewritten))
w.Write([]byte("\n"))
continue
}
// Non-comment, non-empty lines are URLs
trimmed := strings.TrimSpace(line)
if trimmed == "" {
w.Write([]byte("\n"))
continue
}
resolved := resolveURL(base, trimmed)
encoded := encodeHLSURL(resolved)
w.Write([]byte(encoded))
w.Write([]byte("\n"))
}
}
// rewriteURIAttribute rewrites URI="..." attributes in HLS directives.
func rewriteURIAttribute(line string, base *url.URL) string {
// Look for URI="..." (case insensitive)
uriIdx := strings.Index(strings.ToUpper(line), "URI=\"")
if uriIdx == -1 {
return line
}
// Find the actual position (preserving original case)
prefix := line[:uriIdx+5] // everything up to and including URI="
rest := line[uriIdx+5:]
endQuote := strings.Index(rest, "\"")
if endQuote == -1 {
return line
}
uri := rest[:endQuote]
suffix := rest[endQuote:] // closing quote and anything after
resolved := resolveURL(base, uri)
encoded := encodeHLSURL(resolved)
return prefix + encoded + suffix
}
// resolveURL resolves a potentially relative URL against a base URL.
func resolveURL(base *url.URL, ref string) string {
refURL, err := url.Parse(ref)
if err != nil {
return ref
}
return base.ResolveReference(refURL).String()
}
// encodeHLSURL encodes a full URL into /hls/{base64url} format.
func encodeHLSURL(fullURL string) string {
encoded := base64.RawURLEncoding.EncodeToString([]byte(fullURL))
return "/hls/" + encoded
}
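// Illustrative sketch (not part of the original file): the round trip a
// rewritten playlist line makes. encodeHLSURL yields the proxy path and the
// handler decodes it back to the upstream URL on the next request.
func exampleRoundTrip() {
p := encodeHLSURL("https://cdn.example.com/live/seg1.ts") // "/hls/aHR0cHM..."
raw, _ := base64.RawURLEncoding.DecodeString(strings.TrimPrefix(p, "/hls/"))
log.Printf("hlsproxy: upstream = %s", raw) // https://cdn.example.com/live/seg1.ts
}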
func setCORS(w http.ResponseWriter) {
w.Header().Set("Access-Control-Allow-Origin", "*")
w.Header().Set("Access-Control-Allow-Methods", "GET, OPTIONS")
w.Header().Set("Access-Control-Allow-Headers", "*")
}
func truncPath(s string, n int) string {
if len(s) <= n {
return s
}
return s[:n] + "..."
}

View file

@ -1,53 +0,0 @@
package models
import (
"time"
"github.com/go-webauthn/webauthn/webauthn"
)
type User struct {
ID string `json:"id"`
Username string `json:"username"`
IsAdmin bool `json:"is_admin"`
Credentials []webauthn.Credential `json:"credentials"`
CreatedAt time.Time `json:"created_at"`
}
// WebAuthn interface implementation
func (u *User) WebAuthnID() []byte { return []byte(u.ID) }
func (u *User) WebAuthnName() string { return u.Username }
func (u *User) WebAuthnDisplayName() string { return u.Username }
func (u *User) WebAuthnCredentials() []webauthn.Credential { return u.Credentials }
type Stream struct {
ID string `json:"id"`
URL string `json:"url"`
Title string `json:"title"`
SubmittedBy string `json:"submitted_by"`
Published bool `json:"published"`
Source string `json:"source"`
CreatedAt time.Time `json:"created_at"`
}
type ScrapedLink struct {
ID string `json:"id"`
URL string `json:"url"`
Title string `json:"title"`
Source string `json:"source"`
ScrapedAt time.Time `json:"scraped_at"`
Stale bool `json:"stale"`
}
type Session struct {
Token string `json:"token"`
UserID string `json:"user_id"`
ExpiresAt time.Time `json:"expires_at"`
}
type HealthState struct {
URL string `json:"url"`
ConsecutiveFailures int `json:"consecutive_failures"`
LastCheckTime time.Time `json:"last_check_time"`
Healthy bool `json:"healthy"`
}
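// Illustrative note (not part of the original file): per the struct tags
// above, a persisted HealthState entry serialises as
// {"url":"https://example.com/stream","consecutive_failures":2,
// "last_check_time":"2026-02-23T22:00:00Z","healthy":true}.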

View file

@ -1,512 +0,0 @@
package playerconfig
import (
"context"
"encoding/base64"
"encoding/json"
"fmt"
"io"
"log"
"net/http"
"regexp"
"strings"
"sync"
"time"
)
// PlayerConfig is returned by the /api/streams/{id}/player-config endpoint.
type PlayerConfig struct {
Type string `json:"type"`
HLSURL string `json:"hls_url,omitempty"`
AuthToken string `json:"auth_token,omitempty"`
ChannelKey string `json:"channel_key,omitempty"`
ChannelSalt string `json:"channel_salt,omitempty"`
Timestamp string `json:"timestamp,omitempty"`
AuthModURL string `json:"auth_mod_url,omitempty"`
ServerKey string `json:"server_key,omitempty"`
Error string `json:"error,omitempty"`
}
type cacheEntry struct {
config *PlayerConfig
expiresAt time.Time
}
// Service handles stream type detection and DaddyLive config extraction.
type Service struct {
client *http.Client
mu sync.RWMutex
cache map[string]*cacheEntry
}
// New creates a new playerconfig Service.
func New() *Service {
return &Service{
client: &http.Client{
Timeout: 15 * time.Second,
},
cache: make(map[string]*cacheEntry),
}
}
// DetectStreamType returns "hls", "daddylive", "vipleague", or "proxy" based on the URL.
func DetectStreamType(rawURL string) string {
lower := strings.ToLower(rawURL)
if strings.HasSuffix(strings.SplitN(lower, "?", 2)[0], ".m3u8") {
return "hls"
}
daddyPatterns := []string{"dlhd.link", "dlhd.sx", "dlhd.dad", "daddylive", "ksohls.ru"}
for _, p := range daddyPatterns {
if strings.Contains(lower, p) {
return "daddylive"
}
}
vipPatterns := []string{"vipleague.io", "vipleague.im", "vipleague.cc", "casthill.net"}
for _, p := range vipPatterns {
if strings.Contains(lower, p) {
return "vipleague"
}
}
return "proxy"
}
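// Illustrative sketch (not part of the original file): detection outcomes for
// representative URLs; the hosts below come from the pattern lists above.
//
//	DetectStreamType("https://cdn.example.com/live.m3u8?token=1") // "hls"
//	DetectStreamType("https://dlhd.link/stream/stream-61.php")    // "daddylive"
//	DetectStreamType("https://vipleague.io/f-1/sky-sports-f1")    // "vipleague"
//	DetectStreamType("https://other.example.com/watch")           // "proxy"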
// GetConfig returns a PlayerConfig for the given stream URL.
func (s *Service) GetConfig(ctx context.Context, rawURL string) *PlayerConfig {
streamType := DetectStreamType(rawURL)
switch streamType {
case "hls":
encoded := base64.RawURLEncoding.EncodeToString([]byte(rawURL))
return &PlayerConfig{
Type: "hls",
HLSURL: "/hls/" + encoded,
}
case "daddylive":
return s.getDaddyLiveConfig(ctx, rawURL)
case "vipleague":
return s.getVIPLeagueConfig(ctx, rawURL)
default:
return &PlayerConfig{Type: "proxy"}
}
}
// Channel ID extraction patterns
var channelIDPatterns = []*regexp.Regexp{
regexp.MustCompile(`stream-(\d+)\.php`),
regexp.MustCompile(`[?&]id=(\d+)`),
regexp.MustCompile(`/(\d+)\.php`),
}
// Page content extraction patterns
var (
iframeRe = regexp.MustCompile(`<iframe[^>]*src=["'](https?://[^"']*ksohls\.ru[^"']*)["']`)
authTokenRe = regexp.MustCompile(`authToken\s*[:=]\s*['"]([^'"]+)['"]`)
channelKeyRe = regexp.MustCompile(`channelKey\s*[:=]\s*['"]([^'"]+)['"]`)
channelSaltRe = regexp.MustCompile(`channelSalt\s*[:=]\s*['"]([^'"]+)['"]`)
timestampRe = regexp.MustCompile(`timestamp\s*[:=]\s*['"]?(\d+)['"]?`)
authModRe = regexp.MustCompile(`<script[^>]*src=["'](https?://[^"']*aiaged\.fun[^"']*obfuscated[^"']*)["']`)
)
func (s *Service) getDaddyLiveConfig(ctx context.Context, rawURL string) *PlayerConfig {
// Check cache
s.mu.RLock()
if entry, ok := s.cache[rawURL]; ok && time.Now().Before(entry.expiresAt) {
s.mu.RUnlock()
return entry.config
}
s.mu.RUnlock()
config := s.fetchDaddyLiveConfig(ctx, rawURL)
// Cache the result (even errors, to avoid hammering)
s.mu.Lock()
s.cache[rawURL] = &cacheEntry{
config: config,
expiresAt: time.Now().Add(1 * time.Hour),
}
s.mu.Unlock()
return config
}
func (s *Service) fetchDaddyLiveConfig(ctx context.Context, rawURL string) *PlayerConfig {
// Step 1: Extract channel ID from URL
channelID := ""
for _, re := range channelIDPatterns {
if m := re.FindStringSubmatch(rawURL); len(m) > 1 {
channelID = m[1]
break
}
}
if channelID == "" {
return &PlayerConfig{Type: "proxy", Error: "could not extract channel ID"}
}
log.Printf("playerconfig: DaddyLive channel=%s from %s", channelID, rawURL)
return s.fetchDaddyLiveConfigByID(ctx, channelID)
}
func (s *Service) fetchDaddyLiveConfigByID(ctx context.Context, channelID string) *PlayerConfig {
// Step 2: Fetch the cast page to find the ksohls iframe
castURL := fmt.Sprintf("https://dlhd.link/cast/stream-%s.php", channelID)
castBody, err := s.fetchPage(ctx, castURL, "https://dlhd.link/")
if err != nil {
log.Printf("playerconfig: failed to fetch cast page: %v", err)
return &PlayerConfig{Type: "proxy", Error: "failed to fetch cast page"}
}
// Step 3: Extract ksohls iframe URL
iframeMatch := iframeRe.FindStringSubmatch(castBody)
if iframeMatch == nil {
log.Printf("playerconfig: no ksohls iframe found in cast page")
return &PlayerConfig{Type: "proxy", Error: "no ksohls iframe found"}
}
ksohURL := iframeMatch[1]
// Step 4: Fetch the ksohls page
referer := fmt.Sprintf("https://dlhd.link/stream/stream-%s.php", channelID)
ksohBody, err := s.fetchPage(ctx, ksohURL, referer)
if err != nil {
log.Printf("playerconfig: failed to fetch ksohls page: %v", err)
return &PlayerConfig{Type: "proxy", Error: "failed to fetch ksohls page"}
}
// Step 5: Extract auth params from ksohls page
config := &PlayerConfig{Type: "daddylive"}
if m := authTokenRe.FindStringSubmatch(ksohBody); len(m) > 1 {
config.AuthToken = m[1]
}
if m := channelKeyRe.FindStringSubmatch(ksohBody); len(m) > 1 {
config.ChannelKey = m[1]
}
if m := channelSaltRe.FindStringSubmatch(ksohBody); len(m) > 1 {
config.ChannelSalt = m[1]
}
if m := timestampRe.FindStringSubmatch(ksohBody); len(m) > 1 {
config.Timestamp = m[1]
}
if m := authModRe.FindStringSubmatch(ksohBody); len(m) > 1 {
config.AuthModURL = m[1]
}
if config.ChannelKey == "" {
log.Printf("playerconfig: no channelKey found in ksohls page")
return &PlayerConfig{Type: "proxy", Error: "no channelKey found"}
}
// Step 6: Server lookup
lookupURL := fmt.Sprintf("https://chevy.soyspace.cyou/server_lookup?channel_id=%s", config.ChannelKey)
lookupBody, err := s.fetchPage(ctx, lookupURL, "")
if err != nil {
log.Printf("playerconfig: server lookup failed: %v", err)
return &PlayerConfig{Type: "proxy", Error: "server lookup failed"}
}
var lookupResp struct {
ServerKey string `json:"server_key"`
}
if err := json.Unmarshal([]byte(lookupBody), &lookupResp); err != nil || lookupResp.ServerKey == "" {
log.Printf("playerconfig: failed to parse server lookup: %v body=%s", err, lookupBody)
return &PlayerConfig{Type: "proxy", Error: "server lookup parse failed"}
}
config.ServerKey = lookupResp.ServerKey
// Step 7: Build m3u8 URL
m3u8URL := fmt.Sprintf("https://chevy.soyspace.cyou/proxy/%s/%s/mono.m3u8",
config.ServerKey, config.ChannelKey)
encoded := base64.RawURLEncoding.EncodeToString([]byte(m3u8URL))
config.HLSURL = "/hls/" + encoded
log.Printf("playerconfig: DaddyLive config ready channel=%s server=%s", config.ChannelKey, config.ServerKey)
return config
}
// VIPLeague/casthill resolution
var zmidRe = regexp.MustCompile(`(?:const|var|let)\s+zmid\s*=\s*["']([^"']+)["']`)
var casthillVRe = regexp.MustCompile(`[?&]v=([^&]+)`)
func (s *Service) getVIPLeagueConfig(ctx context.Context, rawURL string) *PlayerConfig {
// Check cache using normalized URL
s.mu.RLock()
if entry, ok := s.cache[rawURL]; ok && time.Now().Before(entry.expiresAt) {
s.mu.RUnlock()
return entry.config
}
s.mu.RUnlock()
config := s.fetchVIPLeagueConfig(ctx, rawURL)
s.mu.Lock()
s.cache[rawURL] = &cacheEntry{
config: config,
expiresAt: time.Now().Add(1 * time.Hour),
}
s.mu.Unlock()
return config
}
func (s *Service) fetchVIPLeagueConfig(ctx context.Context, rawURL string) *PlayerConfig {
lower := strings.ToLower(rawURL)
var zmid string
if strings.Contains(lower, "casthill.net") {
// Extract zmid from casthill URL query param ?v=...
if m := casthillVRe.FindStringSubmatch(rawURL); len(m) > 1 {
zmid = m[1]
}
}
if zmid == "" {
// Try to fetch VIPLeague page and extract zmid from JavaScript
body, err := s.fetchPage(ctx, rawURL, "")
if err != nil {
log.Printf("playerconfig: failed to fetch VIPLeague page: %v, trying URL-based extraction", err)
} else {
if m := zmidRe.FindStringSubmatch(body); len(m) > 1 {
zmid = m[1]
}
}
}
if zmid == "" {
// Fallback: extract slug from URL path and use it directly for channel matching
// e.g. /f-1/sky-sports-f1-streaming → "sky sports f1"
zmid = extractSlugFromURL(rawURL)
if zmid != "" {
log.Printf("playerconfig: extracted slug %q from URL path", zmid)
}
}
if zmid == "" {
log.Printf("playerconfig: no zmid found for VIPLeague URL %s", rawURL)
return &PlayerConfig{Type: "proxy", Error: "no zmid found in VIPLeague page"}
}
log.Printf("playerconfig: VIPLeague zmid=%q from %s", zmid, rawURL)
channelID, err := s.resolveChannelID(ctx, zmid)
if err != nil {
log.Printf("playerconfig: failed to resolve zmid %q: %v", zmid, err)
return &PlayerConfig{Type: "proxy", Error: fmt.Sprintf("failed to resolve zmid: %v", err)}
}
log.Printf("playerconfig: resolved zmid=%q to DaddyLive channel=%s", zmid, channelID)
return s.fetchDaddyLiveConfigByID(ctx, channelID)
}
// extractSlugFromURL extracts a channel-matching slug from a VIPLeague URL path.
// e.g. "https://vipleague.io/f-1/sky-sports-f1-streaming" → "sky sports f1"
// Strips common suffixes like "-streaming", "-live-stream", "-live", etc.
func extractSlugFromURL(rawURL string) string {
// Get the last path segment
path := rawURL
if idx := strings.Index(path, "?"); idx != -1 {
path = path[:idx]
}
path = strings.TrimRight(path, "/")
lastSlash := strings.LastIndex(path, "/")
if lastSlash == -1 {
return ""
}
slug := path[lastSlash+1:]
// Strip common suffixes
for _, suffix := range []string{"-streaming", "-live-stream", "-stream", "-live", "-online", "-free"} {
slug = strings.TrimSuffix(slug, suffix)
}
// Replace hyphens with spaces for matching against channel names
slug = strings.ReplaceAll(slug, "-", " ")
slug = strings.TrimSpace(slug)
if slug == "" || len(slug) < 3 {
return ""
}
return slug
}
var channelLinkRe = regexp.MustCompile(`<a[^>]*href=["'][^"']*watch\.php\?id=(\d+)["'][^>]*data-title=["']([^"']+)["']`)
var channelLinkRe2 = regexp.MustCompile(`<a[^>]*data-title=["']([^"']+)["'][^>]*href=["'][^"']*watch\.php\?id=(\d+)["']`)
func (s *Service) resolveChannelID(ctx context.Context, zmid string) (string, error) {
channels, err := s.getChannelIndex(ctx)
if err != nil {
return "", err
}
zmidLower := strings.ToLower(zmid)
// Build tokens: if zmid contains spaces, split on spaces; otherwise use tokenizer
var tokens []string
if strings.Contains(zmidLower, " ") {
for _, word := range strings.Fields(zmidLower) {
if len(word) >= 2 {
tokens = append(tokens, word)
}
}
} else {
tokens = tokenize(zmidLower)
}
bestID := ""
bestScore := 0
bestNameLen := 0
for id, name := range channels {
score := 0
for _, tok := range tokens {
if strings.Contains(name, tok) {
score++
}
}
// Tiebreaker: prefer shorter names (more specific match) and
// English/UK channels which tend to have shorter names
if score > bestScore || (score == bestScore && score > 0 && len(name) < bestNameLen) {
bestScore = score
bestID = id
bestNameLen = len(name)
}
}
if bestID == "" || bestScore == 0 {
return "", fmt.Errorf("no channel matched zmid %q (tried %d channels)", zmid, len(channels))
}
log.Printf("playerconfig: zmid=%q matched channel %s (%s) with score %d/%d",
zmid, bestID, channels[bestID], bestScore, len(tokens))
return bestID, nil
}
func (s *Service) getChannelIndex(ctx context.Context) (map[string]string, error) {
const cacheKey = "__channel_index__"
s.mu.RLock()
if entry, ok := s.cache[cacheKey]; ok && time.Now().Before(entry.expiresAt) {
// Decode from the Error field (ab)used as storage.
var idx map[string]string
if err := json.Unmarshal([]byte(entry.config.Error), &idx); err == nil {
s.mu.RUnlock()
return idx, nil
}
// Fall through and re-fetch if the cached payload fails to decode; the
// read lock is released exactly once, below.
}
s.mu.RUnlock()
body, err := s.fetchPage(ctx, "https://dlhd.link/24-7-channels.php", "https://dlhd.link/")
if err != nil {
return nil, fmt.Errorf("failed to fetch channel index: %w", err)
}
channels := make(map[string]string)
// Try both attribute orderings
for _, m := range channelLinkRe.FindAllStringSubmatch(body, -1) {
channels[m[1]] = strings.ToLower(strings.TrimSpace(m[2]))
}
for _, m := range channelLinkRe2.FindAllStringSubmatch(body, -1) {
channels[m[2]] = strings.ToLower(strings.TrimSpace(m[1]))
}
if len(channels) == 0 {
return nil, fmt.Errorf("no channels found in 24/7 page (%d bytes)", len(body))
}
log.Printf("playerconfig: loaded %d channels from DaddyLive 24/7 page", len(channels))
// Cache as JSON in a fake PlayerConfig entry
encoded, _ := json.Marshal(channels)
s.mu.Lock()
s.cache[cacheKey] = &cacheEntry{
config: &PlayerConfig{Error: string(encoded)},
expiresAt: time.Now().Add(6 * time.Hour),
}
s.mu.Unlock()
return channels, nil
}
// tokenize splits a zmid slug into meaningful tokens.
// e.g. "skyf1" -> ["sky", "f1"], "daznf1" -> ["dazn", "f1"]
func tokenize(zmid string) []string {
// Common known prefixes/suffixes in sports streaming slugs
knownTokens := []string{
"sky", "sports", "f1", "dazn", "espn", "fox", "bein", "bt",
"star", "nbc", "cbs", "tnt", "abc", "tsn", "supersport",
"canal", "rtl", "viaplay", "premier", "main", "event",
"arena", "action", "cricket", "football", "tennis", "golf",
"racing", "news", "extra", "max", "hd", "uhd",
}
var tokens []string
remaining := zmid
for len(remaining) > 0 {
matched := false
for _, tok := range knownTokens {
if strings.HasPrefix(remaining, tok) {
tokens = append(tokens, tok)
remaining = remaining[len(tok):]
matched = true
break
}
}
if !matched {
// Try numeric suffix (like channel numbers)
i := 0
for i < len(remaining) && remaining[i] >= '0' && remaining[i] <= '9' {
i++
}
if i > 0 {
tokens = append(tokens, remaining[:i])
remaining = remaining[i:]
} else {
// Skip single character and try again
remaining = remaining[1:]
}
}
}
// If tokenization produced nothing useful, use the whole zmid as a single token
if len(tokens) == 0 {
tokens = []string{zmid}
}
return tokens
}
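// Illustrative sketch (not part of the original file): how tokenize splits
// common slugs against the known-token list above.
//
//	tokenize("skyf1")  // ["sky", "f1"]
//	tokenize("daznf1") // ["dazn", "f1"]
//	tokenize("espn2")  // ["espn", "2"]
//	tokenize("qqq")    // ["qqq"] (fallback: whole slug as one token)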
func (s *Service) fetchPage(ctx context.Context, pageURL, referer string) (string, error) {
req, err := http.NewRequestWithContext(ctx, http.MethodGet, pageURL, nil)
if err != nil {
return "", err
}
req.Header.Set("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36")
if referer != "" {
req.Header.Set("Referer", referer)
}
resp, err := s.client.Do(req)
if err != nil {
return "", err
}
defer resp.Body.Close()
if resp.StatusCode != http.StatusOK {
return "", fmt.Errorf("status %d from %s", resp.StatusCode, pageURL)
}
body, err := io.ReadAll(io.LimitReader(resp.Body, 2*1024*1024)) // 2MB max
if err != nil {
return "", err
}
return string(body), nil
}

View file

@ -1,473 +0,0 @@
package proxy
import (
"encoding/base64"
"fmt"
"io"
"log"
"net/http"
"net/url"
"regexp"
"strings"
)
// hopHeaders are headers that should not be forwarded by proxies.
var hopHeaders = map[string]bool{
"Connection": true,
"Keep-Alive": true,
"Proxy-Authenticate": true,
"Proxy-Authorization": true,
"Te": true,
"Trailers": true,
"Transfer-Encoding": true,
"Upgrade": true,
}
// antiFrameHeaders are headers we strip to allow iframe embedding.
var antiFrameHeaders = []string{
"X-Frame-Options",
"Content-Security-Policy",
"Content-Security-Policy-Report-Only",
"X-Content-Type-Options",
}
// forwardHeaders are request headers we copy from the client to the upstream.
// NOTE: Accept-Encoding is intentionally omitted so Go's Transport handles
// compression transparently (adds gzip, auto-decompresses response body).
// This ensures we can do text replacements on HTML/CSS bodies.
var forwardHeaders = []string{
"User-Agent",
"Accept",
"Accept-Language",
"Cookie",
"Referer",
"Range",
"If-None-Match",
"If-Modified-Since",
"Cache-Control",
}
// jsShimTemplate is injected into HTML responses to intercept JS-initiated requests.
// It patches fetch, XMLHttpRequest, WebSocket, and EventSource to route through the proxy.
const jsShimTemplate = `<script data-proxy-shim="1">(function(){
var P='/proxy/%s';
var O='%s';
var H=location.origin;
function b64(s){return btoa(s).replace(/\+/g,'-').replace(/\//g,'_').replace(/=+$/g,'');}
function rw(u){
if(!u||typeof u!=='string')return u;
if(u.startsWith('/proxy/'))return u;
if(u.startsWith(H+'/proxy/'))return u.slice(H.length);
if(u.startsWith(H+'/')){var hp=u.slice(H.length);return P+hp;}
if(u.startsWith(H))return P+'/';
if(u.startsWith('/'))return P+u;
if(u.startsWith(O))return P+u.slice(O.length);
try{var p=new URL(u);if(p.protocol==='http:'||p.protocol==='https:'){return'/proxy/'+b64(p.origin)+p.pathname+p.search+p.hash;}}catch(e){}
return u;
}
var _f=window.fetch;
window.fetch=function(i,o){
if(typeof i==='string')i=rw(i);
else if(i&&i.url)i=new Request(rw(i.url),i);
return _f.call(this,i,o);
};
var _xo=XMLHttpRequest.prototype.open;
XMLHttpRequest.prototype.open=function(m,u){var a=[].slice.call(arguments);a[1]=rw(u);return _xo.apply(this,a);};
var _ws=window.WebSocket;
window.WebSocket=function(u,p){return new _ws(rw(u),p);};
window.WebSocket.prototype=_ws.prototype;
window.WebSocket.CONNECTING=_ws.CONNECTING;
window.WebSocket.OPEN=_ws.OPEN;
window.WebSocket.CLOSING=_ws.CLOSING;
window.WebSocket.CLOSED=_ws.CLOSED;
if(window.EventSource){var _es=window.EventSource;window.EventSource=function(u,o){return new _es(rw(u),o);};window.EventSource.prototype=_es.prototype;}
var _ce=document.createElement.bind(document);
document.createElement=function(t){
var el=_ce(t);
var tag=t.toLowerCase();
if(tag==='script'||tag==='img'||tag==='link'||tag==='source'||tag==='video'||tag==='audio'){
var _ss=Object.getOwnPropertyDescriptor(HTMLElement.prototype,'src')||Object.getOwnPropertyDescriptor(el.__proto__,'src');
if(_ss&&_ss.set){Object.defineProperty(el,'src',{get:function(){return _ss.get?_ss.get.call(this):'';},set:function(v){_ss.set.call(this,rw(v));},configurable:true});}
}
return el;
};
/* Neutralize anti-debug: override setInterval/setTimeout to skip debugger-based detection */
var _si=window.setInterval;
window.setInterval=function(fn,ms){
if(typeof fn==='function'){var s=fn.toString();if(s.indexOf('debugger')!==-1||s.indexOf('devtool')!==-1)return 0;}
if(typeof fn==='string'&&(fn.indexOf('debugger')!==-1||fn.indexOf('devtool')!==-1))return 0;
return _si.apply(this,arguments);
};
var _st=window.setTimeout;
window.setTimeout=function(fn,ms){
if(typeof fn==='function'){var s=fn.toString();if(s.indexOf('debugger')!==-1||s.indexOf('devtool')!==-1)return 0;}
if(typeof fn==='string'&&(fn.indexOf('debugger')!==-1||fn.indexOf('devtool')!==-1))return 0;
return _st.apply(this,arguments);
};
/* Override eval and Function to strip debugger statements */
var _eval=window.eval;
window.eval=function(s){if(typeof s==='string')s=s.replace(/\bdebugger\b\s*;?/g,'');return _eval.call(this,s);};
var _Fn=Function;
window.Function=function(){var a=[].slice.call(arguments);if(a.length>0){var last=a.length-1;if(typeof a[last]==='string')a[last]=a[last].replace(/\bdebugger\b\s*;?/g,'');}return _Fn.apply(this,a);};
window.Function.prototype=_Fn.prototype;
/* Block loading of known anti-debug scripts */
var _ael=HTMLScriptElement.prototype.setAttribute;
HTMLScriptElement.prototype.setAttribute=function(n,v){
if(n==='src'&&typeof v==='string'&&(v.indexOf('disable-devtool')!==-1||v.indexOf('devtools-detect')!==-1)){return;}
return _ael.apply(this,arguments);
};
})();</script>`
// NewHandler returns an http.Handler that serves the reverse proxy at /proxy/.
// URL structure: /proxy/{base64_origin}/{path...}
func NewHandler() http.Handler {
client := &http.Client{
Timeout: 30 * time.Second,
CheckRedirect: func(req *http.Request, via []*http.Request) error {
return http.ErrUseLastResponse // don't follow redirects
},
}
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
// Parse: /proxy/{base64_origin}/{path...}
trimmed := strings.TrimPrefix(r.URL.Path, "/proxy/")
if trimmed == "" || trimmed == r.URL.Path {
http.Error(w, "bad proxy URL", http.StatusBadRequest)
return
}
// Split into base64 segment and remaining path
slashIdx := strings.Index(trimmed, "/")
var b64Origin, pathAndQuery string
if slashIdx == -1 {
b64Origin = trimmed
pathAndQuery = "/"
} else {
b64Origin = trimmed[:slashIdx]
pathAndQuery = trimmed[slashIdx:]
}
originBytes, err := base64.RawURLEncoding.DecodeString(b64Origin)
if err != nil {
// Try standard encoding with padding
originBytes, err = base64.StdEncoding.DecodeString(b64Origin)
if err != nil {
http.Error(w, "invalid base64 origin", http.StatusBadRequest)
return
}
}
origin := string(originBytes)
// Validate origin is a valid URL
originURL, err := url.Parse(origin)
if err != nil || (originURL.Scheme != "http" && originURL.Scheme != "https") {
http.Error(w, "invalid origin URL", http.StatusBadRequest)
return
}
// Build upstream URL
targetURL := origin + pathAndQuery
if r.URL.RawQuery != "" {
targetURL += "?" + r.URL.RawQuery
}
log.Printf("proxy: %s %s -> %s", r.Method, r.URL.Path, targetURL)
// Create upstream request
upReq, err := http.NewRequestWithContext(r.Context(), r.Method, targetURL, r.Body)
if err != nil {
http.Error(w, "failed to create request", http.StatusInternalServerError)
return
}
// Copy selected headers
for _, h := range forwardHeaders {
if v := r.Header.Get(h); v != "" {
upReq.Header.Set(h, v)
}
}
// Reconstruct the original Referer from the client's proxy-rewritten Referer.
// The client sends e.g. "https://f1.viktorbarzin.me/proxy/{b64origin}/path"
// and we need to decode that back to "https://original.com/path".
upReq.Header.Set("Referer", decodeProxyReferer(r.Header.Get("Referer"), origin))
// Fetch upstream
resp, err := client.Do(upReq)
if err != nil {
log.Printf("proxy: upstream fetch failed: %v", err)
http.Error(w, "upstream fetch failed", http.StatusBadGateway)
return
}
defer resp.Body.Close()
log.Printf("proxy: %s %s <- %d (%s)", r.Method, r.URL.Path, resp.StatusCode, resp.Header.Get("Content-Type"))
// Handle redirects: rewrite Location header through proxy
if resp.StatusCode >= 300 && resp.StatusCode < 400 {
loc := resp.Header.Get("Location")
if loc != "" {
rewritten := rewriteRedirect(loc, origin, b64Origin)
w.Header().Set("Location", rewritten)
log.Printf("proxy: redirect %s -> %s", loc, rewritten)
}
w.WriteHeader(resp.StatusCode)
return
}
// Copy response headers, stripping anti-frame, hop-by-hop, and encoding headers.
// Content-Encoding is stripped because Go's Transport already decompressed the body.
// Content-Length is stripped because we may rewrite the body (changing its length).
for key, vals := range resp.Header {
if hopHeaders[key] {
continue
}
if strings.EqualFold(key, "Content-Encoding") || strings.EqualFold(key, "Content-Length") {
continue
}
skip := false
for _, ah := range antiFrameHeaders {
if strings.EqualFold(key, ah) {
skip = true
break
}
}
if skip {
continue
}
for _, v := range vals {
w.Header().Add(key, v)
}
}
// Add permissive CORS headers so cross-origin XHR/fetch from the iframe works
w.Header().Set("Access-Control-Allow-Origin", "*")
w.Header().Set("Access-Control-Allow-Methods", "GET, POST, PUT, DELETE, OPTIONS")
w.Header().Set("Access-Control-Allow-Headers", "*")
w.WriteHeader(resp.StatusCode)
// For HTML responses, rewrite URLs and inject JS shim
ct := resp.Header.Get("Content-Type")
if strings.Contains(ct, "text/html") {
body, err := io.ReadAll(resp.Body)
if err != nil {
log.Printf("proxy: failed to read HTML body: %v", err)
return
}
rewritten := rewriteHTML(string(body), origin, b64Origin)
w.Write([]byte(rewritten))
return
}
// For CSS responses, rewrite url() references
if strings.Contains(ct, "text/css") {
body, err := io.ReadAll(resp.Body)
if err != nil {
log.Printf("proxy: failed to read CSS body: %v", err)
return
}
rewritten := rewriteCSS(string(body), origin, b64Origin)
w.Write([]byte(rewritten))
return
}
// For JavaScript responses, strip debugger statements
if strings.Contains(ct, "javascript") || strings.Contains(ct, "ecmascript") {
body, err := io.ReadAll(resp.Body)
if err != nil {
log.Printf("proxy: failed to read JS body: %v", err)
return
}
cleaned := debuggerStmtRe.ReplaceAllString(string(body), "/* */")
w.Write([]byte(cleaned))
return
}
// Stream other responses directly
io.Copy(w, resp.Body)
})
}
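// Illustrative sketch (not part of the original file): computing the entry
// URL for a proxied page. A page at https://example.com/watch is addressed
// as /proxy/{base64url(origin)}/watch; rewriteHTML plus the injected shim
// keep subsequent asset and XHR requests on the same proxy prefix.
func exampleProxyPath(target *url.URL) string {
b64 := base64.RawURLEncoding.EncodeToString([]byte(target.Scheme + "://" + target.Host))
return "/proxy/" + b64 + target.RequestURI()
}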
// rewriteRedirect rewrites a Location header value to route through the proxy.
func rewriteRedirect(loc, origin, b64Origin string) string {
// Absolute URL on the same origin
if strings.HasPrefix(loc, origin) {
path := strings.TrimPrefix(loc, origin)
return "/proxy/" + b64Origin + path
}
// Absolute URL on a different origin — proxy it too
parsed, err := url.Parse(loc)
if err != nil {
return loc
}
if parsed.IsAbs() {
newOrigin := parsed.Scheme + "://" + parsed.Host
newB64 := base64.RawURLEncoding.EncodeToString([]byte(newOrigin))
return "/proxy/" + newB64 + parsed.RequestURI()
}
// Relative URL — it will resolve naturally
return loc
}
// Precompiled regexes for root-relative URL rewriting in HTML attributes.
// Matches src="/...", href="/...", action="/...", poster="/..." but NOT "//..." (protocol-relative).
var rootRelativeAttrRe = regexp.MustCompile(`((?:src|href|action|poster|data)\s*=\s*["'])/([^/"'][^"']*)`)
// Matches url("/...") or url('/...') or url(/...) in inline styles — but NOT url("//...")
var rootRelativeCSSRe = regexp.MustCompile(`(url\(\s*["']?)/([^/"')][^"')]*)(["']?\s*\))`)
// disableDevtoolRe matches <script> tags that load disable-devtool or similar anti-debug libraries.
var disableDevtoolRe = regexp.MustCompile(`(?i)<script[^>]*(?:disable-devtool|devtools-detect)[^>]*>(?:</script>)?`)
// adScriptRe matches <script> tags that load common ad/popup libraries.
var adScriptRe = regexp.MustCompile(`(?i)<script[^>]*(?:acscdn\.com|popunder|popads|juicyads)[^>]*>\s*(?:</script>)?`)
// adInlineRe matches inline <script> blocks that call ad popup functions.
var adInlineRe = regexp.MustCompile(`(?i)<script[^>]*>\s*(?:aclib\.run|popunder|pop_)\w*\([^)]*\);\s*</script>`)
// contextMenuBlockRe matches inline scripts that block right-click and dev tools shortcuts.
var contextMenuBlockRe = regexp.MustCompile(`(?i)<script[^>]*>\s*document\.addEventListener\(\s*'contextmenu'[\s\S]{0,500}?</script>`)
// debuggerStmtRe matches debugger statements in JavaScript.
var debuggerStmtRe = regexp.MustCompile(`\bdebugger\b\s*;?`)
// rewriteHTML replaces URLs and injects the JS shim to intercept runtime requests.
func rewriteHTML(body, origin, b64Origin string) string {
proxyPrefix := "/proxy/" + b64Origin
// 1. Rewrite absolute URLs matching the target origin
escaped := regexp.QuoteMeta(origin)
absRe := regexp.MustCompile(escaped + `(/[^"'\s>)]*)?`)
body = absRe.ReplaceAllStringFunc(body, func(match string) string {
path := strings.TrimPrefix(match, origin)
if path == "" {
path = "/"
}
return proxyPrefix + path
})
// 2. Rewrite root-relative URLs in HTML attributes (src="/...", href="/...", etc.)
// Skip URLs already rewritten by step 1 (starting with /proxy/)
body = rootRelativeAttrRe.ReplaceAllStringFunc(body, func(match string) string {
m := rootRelativeAttrRe.FindStringSubmatch(match)
if len(m) < 3 {
return match
}
// m[2] is the path after the leading "/", skip if already proxied
if strings.HasPrefix(m[2], "proxy/") {
return match
}
return m[1] + proxyPrefix + "/" + m[2]
})
// 3. Rewrite root-relative URLs in inline CSS url() references
// Skip URLs already rewritten by step 1 (starting with /proxy/)
body = rootRelativeCSSRe.ReplaceAllStringFunc(body, func(match string) string {
m := rootRelativeCSSRe.FindStringSubmatch(match)
if len(m) < 4 {
return match
}
if strings.HasPrefix(m[2], "proxy/") {
return match
}
return m[1] + proxyPrefix + "/" + m[2] + m[3]
})
// 4. Strip anti-debugging scripts (disable-devtool, devtools-detect)
body = disableDevtoolRe.ReplaceAllString(body, "")
// 4b. Strip ad/popup scripts and context menu blockers
body = adScriptRe.ReplaceAllString(body, "")
body = adInlineRe.ReplaceAllString(body, "")
body = contextMenuBlockRe.ReplaceAllString(body, "")
// 4c. Strip debugger statements from inline scripts
body = debuggerStmtRe.ReplaceAllString(body, "/* */")
// 5. Inject JS shim right after <head> to intercept fetch/XHR/WebSocket
shim := fmt.Sprintf(jsShimTemplate, b64Origin, origin)
headIdx := strings.Index(strings.ToLower(body), "<head>")
if headIdx != -1 {
insertPos := headIdx + len("<head>")
body = body[:insertPos] + shim + body[insertPos:]
} else {
// No <head> tag — prepend to body
body = shim + body
}
return body
}
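// Illustrative note (not part of the original file): with origin
// https://example.com (base64url "aHR0cHM6Ly9leGFtcGxlLmNvbQ"), step 2 turns
//
//	<img src="/logo.png">
//
// into
//
//	<img src="/proxy/aHR0cHM6Ly9leGFtcGxlLmNvbQ/logo.png">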
// rewriteCSS replaces root-relative url() references in CSS to route through the proxy.
func rewriteCSS(body, origin, b64Origin string) string {
proxyPrefix := "/proxy/" + b64Origin
// Rewrite absolute URLs matching origin
escaped := regexp.QuoteMeta(origin)
absRe := regexp.MustCompile(escaped + `(/[^"'\s)]*)?`)
body = absRe.ReplaceAllStringFunc(body, func(match string) string {
path := strings.TrimPrefix(match, origin)
if path == "" {
path = "/"
}
return proxyPrefix + path
})
// Rewrite root-relative url() references, skip already-proxied
body = rootRelativeCSSRe.ReplaceAllStringFunc(body, func(match string) string {
m := rootRelativeCSSRe.FindStringSubmatch(match)
if len(m) < 4 {
return match
}
if strings.HasPrefix(m[2], "proxy/") {
return match
}
return m[1] + proxyPrefix + "/" + m[2] + m[3]
})
return body
}
// decodeProxyReferer takes the client's Referer (which points to a proxy URL)
// and decodes it back to the original upstream URL. This is critical for
// cross-origin requests where the upstream checks the Referer (e.g. HLS servers).
// Falls back to origin+"/" if decoding fails.
func decodeProxyReferer(clientReferer, fallbackOrigin string) string {
if clientReferer == "" {
return fallbackOrigin + "/"
}
// Find /proxy/ in the Referer URL path
idx := strings.Index(clientReferer, "/proxy/")
if idx == -1 {
return fallbackOrigin + "/"
}
// Extract everything after /proxy/
rest := clientReferer[idx+len("/proxy/"):]
if rest == "" {
return fallbackOrigin + "/"
}
// Split into base64 segment and remaining path
slashIdx := strings.Index(rest, "/")
var b64Seg, pathPart string
if slashIdx == -1 {
b64Seg = rest
pathPart = "/"
} else {
b64Seg = rest[:slashIdx]
pathPart = rest[slashIdx:]
}
// Decode the base64 origin
originBytes, err := base64.RawURLEncoding.DecodeString(b64Seg)
if err != nil {
originBytes, err = base64.StdEncoding.DecodeString(b64Seg)
if err != nil {
return fallbackOrigin + "/"
}
}
return string(originBytes) + pathPart
}
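// Illustrative note (not part of the original file): a client Referer of
// https://host/proxy/aHR0cHM6Ly9leGFtcGxlLmNvbQ/player decodes back to
// https://example.com/player, which is what upstream Referer checks expect.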

View file

@ -1,327 +0,0 @@
package scraper
import (
"crypto/rand"
"encoding/json"
"fmt"
"io"
"log"
"math"
"net/http"
"net/url"
"regexp"
"strings"
"time"
"f1-stream/internal/models"
)
const (
subredditURL = "https://www.reddit.com/r/motorsportsstreams2/new.json?limit=25"
userAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
requestDelay = 1 * time.Second
)
var (
urlRe = regexp.MustCompile(`https?://[^\s\)\]\>"]+`)
// Keywords in post title that indicate F1 content (matched case-insensitively)
f1Keywords = []string{
"f1",
"formula 1",
"formula one",
"formula1",
"grand prix",
"gp qualifying",
"gp race",
"gp sprint",
"gp practice",
}
f1NegativeKeywords = []string{
"f1 key",
"function 1",
"help f1",
}
// URLs to filter out (not stream sources)
filteredDomains = map[string]bool{
"reddit.com": true,
"www.reddit.com": true,
"imgur.com": true,
"i.imgur.com": true,
"redd.it": true,
"i.redd.it": true,
"v.redd.it": true,
"youtu.be": true,
"youtube.com": true,
"twitter.com": true,
"x.com": true,
}
)
type redditListing struct {
Data struct {
Children []struct {
Data struct {
Title string `json:"title"`
SelfText string `json:"selftext"`
Permalink string `json:"permalink"`
CreatedUTC float64 `json:"created_utc"`
} `json:"data"`
} `json:"children"`
} `json:"data"`
}
type redditComments []struct {
Data struct {
Children []struct {
Data struct {
Body string `json:"body"`
Replies json.RawMessage `json:"replies"`
} `json:"data"`
} `json:"children"`
} `json:"data"`
}
func scrapeReddit() ([]models.ScrapedLink, error) {
client := &http.Client{Timeout: 15 * time.Second}
var allLinks []models.ScrapedLink
seen := make(map[string]bool)
log.Printf("scraper: fetching listing from %s", subredditURL)
listing, err := fetchJSON[redditListing](client, subredditURL)
if err != nil {
return nil, fmt.Errorf("fetch listing: %w", err)
}
totalPosts := len(listing.Data.Children)
matchedPosts := 0
log.Printf("scraper: got %d posts from listing", totalPosts)
for _, child := range listing.Data.Children {
post := child.Data
if !isF1Post(post.Title) {
log.Printf("scraper: skipped post: %s", truncate(post.Title, 60))
continue
}
matchedPosts++
log.Printf("scraper: matched post: %s", truncate(post.Title, 60))
selftextLinks := extractURLs(post.SelfText, post.Title)
log.Printf("scraper: extracted %d URLs from selftext of %q", len(selftextLinks), truncate(post.Title, 40))
for _, link := range selftextLinks {
norm := normalizeURL(link.URL)
if !seen[norm] {
seen[norm] = true
allLinks = append(allLinks, link)
}
}
time.Sleep(requestDelay)
commentsURL := fmt.Sprintf("https://www.reddit.com%s.json", post.Permalink)
comments, err := fetchJSONWithRetry[redditComments](client, commentsURL, 3)
if err != nil {
log.Printf("scraper: failed to fetch comments for %s: %v", post.Permalink, err)
continue
}
commentURLCount := 0
walkComments(*comments, func(body string) {
links := extractURLs(body, post.Title)
commentURLCount += len(links)
for _, link := range links {
norm := normalizeURL(link.URL)
if !seen[norm] {
seen[norm] = true
allLinks = append(allLinks, link)
}
}
})
log.Printf("scraper: extracted %d URLs from comments of %q", commentURLCount, truncate(post.Title, 40))
time.Sleep(requestDelay)
}
log.Printf("scraper: summary — matched %d/%d posts, extracted %d unique URLs", matchedPosts, totalPosts, len(allLinks))
return allLinks, nil
}
func fetchJSON[T any](client *http.Client, rawURL string) (*T, error) {
req, err := http.NewRequest("GET", rawURL, nil)
if err != nil {
return nil, err
}
req.Header.Set("User-Agent", userAgent)
resp, err := client.Do(req)
if err != nil {
return nil, err
}
defer resp.Body.Close()
log.Printf("scraper: GET %s -> %d", truncate(rawURL, 80), resp.StatusCode)
if resp.StatusCode != 200 {
return nil, fmt.Errorf("status %d", resp.StatusCode)
}
body, err := io.ReadAll(io.LimitReader(resp.Body, 5*1024*1024))
if err != nil {
return nil, err
}
var result T
if err := json.Unmarshal(body, &result); err != nil {
return nil, err
}
return &result, nil
}
func fetchJSONWithRetry[T any](client *http.Client, rawURL string, maxRetries int) (*T, error) {
var lastErr error
for attempt := 0; attempt <= maxRetries; attempt++ {
result, err := fetchJSON[T](client, rawURL)
if err == nil {
return result, nil
}
lastErr = err
errMsg := err.Error()
if strings.Contains(errMsg, "status 429") {
log.Printf("scraper: rate limited on %s, backing off 30s", truncate(rawURL, 60))
time.Sleep(30 * time.Second)
continue
}
if strings.Contains(errMsg, "status 502") || strings.Contains(errMsg, "status 503") {
backoff := time.Duration(math.Pow(2, float64(attempt))) * time.Second
log.Printf("scraper: server error on %s, retry %d/%d in %v", truncate(rawURL, 60), attempt+1, maxRetries, backoff)
time.Sleep(backoff)
continue
}
return nil, err
}
return nil, fmt.Errorf("after %d retries: %w", maxRetries, lastErr)
}
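// Illustrative note (not part of the original file): with maxRetries=3 the
// loop makes up to four attempts; 502/503 failures back off 2^attempt seconds
// (1s, 2s, 4s, 8s), while a 429 always waits a flat 30s before retrying.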
// deobfuscateText normalises obfuscated URLs commonly posted on Reddit to
// evade auto-moderation. Examples:
// - "pitsport . xyz/watch/f1" → "https://pitsport.xyz/watch/f1"
// - "dlhd dot link" → "https://dlhd.link"
func deobfuscateText(text string) string {
// Common TLDs used in streaming links.
tlds := `(?:com|net|org|xyz|link|info|live|tv|me|cc|to|io|co|stream|site|fun|top|club|watch|racing)`
// 1. Replace " dot " (case-insensitive) between word-like parts that
// look like domain components: "dlhd dot link" → "dlhd.link"
dotWord := regexp.MustCompile(`(?i)(\b\w[\w-]*)\s+dot\s+(` + tlds + `\b)`)
text = dotWord.ReplaceAllString(text, "${1}.${2}")
// 2. Collapse spaces around dots in domain-like strings:
// "pitsport . xyz" → "pitsport.xyz"
spaceDot := regexp.MustCompile(`(\b\w[\w-]*)\s*\.\s*(` + tlds + `\b)`)
text = spaceDot.ReplaceAllString(text, "${1}.${2}")
// 3. Prepend https:// to bare domain-like strings that the URL regex
// would otherwise miss (no scheme present).
bareDomain := regexp.MustCompile(`(?:^|[\s(>\[])(\w[\w-]*\.` + tlds + `(?:/[^\s)\]<"]*)?)`)
text = bareDomain.ReplaceAllStringFunc(text, func(m string) string {
// Preserve the leading whitespace/punctuation character.
trimmed := strings.TrimLeft(m, " \t\n(>[")
prefix := m[:len(m)-len(trimmed)]
if strings.HasPrefix(trimmed, "http://") || strings.HasPrefix(trimmed, "https://") {
return m
}
return prefix + "https://" + trimmed
})
return text
}
func extractURLs(text, postTitle string) []models.ScrapedLink {
text = deobfuscateText(text)
matches := urlRe.FindAllString(text, -1)
var links []models.ScrapedLink
filtered := 0
for _, u := range matches {
u = strings.TrimRight(u, ".,;:!?)")
parsed, err := url.Parse(u)
if err != nil {
continue
}
if filteredDomains[parsed.Hostname()] {
filtered++
continue
}
id := make([]byte, 16)
if _, err := rand.Read(id); err != nil {
continue
}
links = append(links, models.ScrapedLink{
ID: fmt.Sprintf("%x", id),
URL: u,
Title: postTitle,
Source: "r/motorsportsstreams2",
ScrapedAt: time.Now(),
})
}
if filtered > 0 {
log.Printf("scraper: filtered %d URLs from known domains in %q", filtered, truncate(postTitle, 40))
}
return links
}
func walkComments(comments redditComments, fn func(string)) {
for _, listing := range comments {
for _, child := range listing.Data.Children {
if child.Data.Body != "" {
fn(child.Data.Body)
}
// Recurse into replies
if len(child.Data.Replies) > 0 && child.Data.Replies[0] == '{' {
var nested redditComments
if err := json.Unmarshal([]byte("["+string(child.Data.Replies)+"]"), &nested); err == nil {
walkComments(nested, fn)
}
}
}
}
}
func normalizeURL(u string) string {
parsed, err := url.Parse(u)
if err != nil {
return strings.ToLower(u)
}
parsed.Host = strings.ToLower(parsed.Host)
path := strings.TrimRight(parsed.Path, "/")
return fmt.Sprintf("%s://%s%s", parsed.Scheme, parsed.Host, path)
}
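// Illustrative note (not part of the original file): normalizeURL lowercases
// the host, drops the query and fragment, and trims trailing slashes, so the
// following all dedupe to the same key:
//
//	normalizeURL("https://Example.com/Stream/")    // "https://example.com/Stream"
//	normalizeURL("https://example.com/Stream?x=1") // "https://example.com/Stream"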
func isF1Post(title string) bool {
lower := strings.ToLower(title)
for _, neg := range f1NegativeKeywords {
if strings.Contains(lower, neg) {
return false
}
}
for _, kw := range f1Keywords {
if strings.Contains(lower, kw) {
return true
}
}
return false
}
func truncate(s string, maxLen int) string {
if len(s) <= maxLen {
return s
}
return s[:maxLen] + "..."
}

View file

@ -1,105 +0,0 @@
package scraper
import (
"context"
"log"
"sync"
"time"
"f1-stream/internal/models"
"f1-stream/internal/store"
)
type Scraper struct {
store *store.Store
interval time.Duration
validateTimeout time.Duration
mu sync.Mutex
}
func New(s *store.Store, interval time.Duration, validateTimeout time.Duration) *Scraper {
return &Scraper{store: s, interval: interval, validateTimeout: validateTimeout}
}
func (s *Scraper) Run(ctx context.Context) {
log.Printf("scraper: starting with interval %v", s.interval)
// Run immediately on start
s.scrape()
ticker := time.NewTicker(s.interval)
defer ticker.Stop()
for {
select {
case <-ctx.Done():
log.Println("scraper: shutting down")
return
case <-ticker.C:
s.scrape()
}
}
}
func (s *Scraper) TriggerScrape() {
go s.scrape()
}
func (s *Scraper) scrape() {
s.mu.Lock()
defer s.mu.Unlock()
start := time.Now()
log.Println("scraper: starting scrape")
links, err := scrapeReddit()
if err != nil {
log.Printf("scraper: error after %v: %v", time.Since(start).Round(time.Millisecond), err)
return
}
log.Printf("scraper: reddit scrape completed in %v, got %d links", time.Since(start).Round(time.Millisecond), len(links))
// Merge with existing links, filtering out non-F1 entries
existing, err := s.store.LoadScrapedLinks()
if err != nil {
log.Printf("scraper: failed to load existing links: %v", err)
existing = nil
}
seen := make(map[string]bool)
var filtered []models.ScrapedLink
for _, l := range existing {
if !isF1Post(l.Title) {
continue
}
norm := normalizeURL(l.URL)
seen[norm] = true
filtered = append(filtered, l)
}
existing = filtered
added := 0
for _, l := range links {
norm := normalizeURL(l.URL)
if !seen[norm] {
existing = append(existing, l)
seen[norm] = true
added++
}
}
if err := s.store.SaveScrapedLinks(existing); err != nil {
log.Printf("scraper: failed to save: %v", err)
return
}
// Auto-publish newly validated links as streams
for _, l := range links {
if err := s.store.PublishScrapedStream(l.URL, l.Title); err != nil {
u := l.URL
if len(u) > 80 {
u = u[:80] + "..."
}
log.Printf("scraper: failed to auto-publish %s: %v", u, err)
}
}
log.Printf("scraper: done in %v, added %d new links (total: %d)", time.Since(start).Round(time.Millisecond), added, len(existing))
}
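// Illustrative sketch (not part of the original file): starting the scraper
// from a service entry point; the interval and validation timeout shown are
// assumed values, not taken from the original configuration.
func exampleStart(ctx context.Context, st *store.Store) {
s := New(st, 30*time.Minute, 10*time.Second)
go s.Run(ctx)
}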

View file

@ -1,142 +0,0 @@
package scraper
import (
"io"
"log"
"net/http"
"strings"
"time"
"f1-stream/internal/models"
)
// videoMarkers are substrings checked (case-insensitively) against the HTML
// body to detect the presence of a video player or streaming manifest.
var videoMarkers = []string{
// HTML5 video element
"<video",
// HLS manifests
".m3u8",
"application/x-mpegurl",
"application/vnd.apple.mpegurl",
// DASH manifests
".mpd",
"application/dash+xml",
// Player libraries
"hls.js",
"hls.min.js",
"dash.js",
"dash.all.min.js",
"video.js",
"video.min.js",
"videojs",
"jwplayer",
"clappr",
"flowplayer",
"plyr",
"shaka-player",
"mediaelement",
"fluidplayer",
}
// videoContentTypes are Content-Type prefixes/substrings that indicate a
// direct video response (no HTML inspection needed).
var videoContentTypes = []string{
"video/",
"application/x-mpegurl",
"application/vnd.apple.mpegurl",
"application/dash+xml",
}
// validateBodyLimit caps how much HTML we read when looking for markers.
const validateBodyLimit = 2 * 1024 * 1024 // 2 MB
// validateLinks fetches each link and keeps only those whose response
// contains video/player content markers.
func validateLinks(links []models.ScrapedLink, timeout time.Duration) []models.ScrapedLink {
client := &http.Client{
Timeout: timeout,
CheckRedirect: func(req *http.Request, via []*http.Request) error {
if len(via) >= 3 {
return http.ErrUseLastResponse
}
return nil
},
}
var kept []models.ScrapedLink
for _, link := range links {
if HasVideoContent(client, link.URL) {
kept = append(kept, link)
} else {
log.Printf("scraper: discarded %s (no video markers)", truncate(link.URL, 60))
}
}
return kept
}
// HasVideoContent performs a GET request for rawURL and returns true if the
// response is a direct video file (by Content-Type) or an HTML page that
// contains at least one video marker substring.
func HasVideoContent(client *http.Client, rawURL string) bool {
req, err := http.NewRequest("GET", rawURL, nil)
if err != nil {
log.Printf("scraper: validate request error for %s: %v", truncate(rawURL, 60), err)
return false
}
req.Header.Set("User-Agent", userAgent)
resp, err := client.Do(req)
if err != nil {
log.Printf("scraper: validate fetch error for %s: %v", truncate(rawURL, 60), err)
return false
}
defer resp.Body.Close()
if resp.StatusCode < 200 || resp.StatusCode >= 400 {
return false
}
ct := strings.ToLower(resp.Header.Get("Content-Type"))
// Direct video content type — no need to inspect body.
if isDirectVideoContentType(ct) {
return true
}
// Only inspect HTML pages for markers.
if !strings.Contains(ct, "text/html") && !strings.Contains(ct, "application/xhtml") {
return false
}
body, err := io.ReadAll(io.LimitReader(resp.Body, validateBodyLimit))
if err != nil {
log.Printf("scraper: validate read error for %s: %v", truncate(rawURL, 60), err)
return false
}
return containsVideoMarkers(strings.ToLower(string(body)))
}
// containsVideoMarkers returns true if loweredBody contains any known video
// player or streaming marker substring.
func containsVideoMarkers(loweredBody string) bool {
for _, marker := range videoMarkers {
if strings.Contains(loweredBody, marker) {
return true
}
}
return false
}
// isDirectVideoContentType returns true if ct (already lowercased) matches a
// known video content type.
func isDirectVideoContentType(ct string) bool {
ct = strings.ToLower(ct)
for _, vct := range videoContentTypes {
if strings.Contains(ct, vct) {
return true
}
}
return false
}

View file

@ -1,124 +0,0 @@
package scraper
import "testing"
func TestContainsVideoMarkers(t *testing.T) {
tests := []struct {
name string
body string
want bool
}{
// Positive cases
{
name: "video tag",
body: `<div><video src="stream.mp4"></video></div>`,
want: true,
},
{
name: "HLS manifest reference",
body: `var url = "https://cdn.example.com/live.m3u8";`,
want: true,
},
{
name: "DASH manifest reference",
body: `<source src="stream.mpd" type="application/dash+xml">`,
want: true,
},
{
name: "HLS.js library",
body: `<script src="/js/hls.min.js"></script>`,
want: true,
},
{
name: "Video.js library",
body: `<script src="https://cdn.example.com/video.js"></script>`,
want: true,
},
{
name: "JW Player",
body: `<div id="jwplayer-container"></div><script>jwplayer("jwplayer-container")</script>`,
want: true,
},
{
name: "Clappr player",
body: `<script src="clappr.min.js"></script>`,
want: true,
},
{
name: "Flowplayer",
body: `<script>flowplayer("#player")</script>`,
want: true,
},
{
name: "Plyr player",
body: `<link rel="stylesheet" href="plyr.css"><script src="plyr.js"></script>`,
want: true,
},
{
name: "Shaka Player",
body: `<script src="shaka-player.compiled.js"></script>`,
want: true,
},
// Negative cases
{
name: "plain HTML",
body: `<html><body><p>Hello world</p></body></html>`,
want: false,
},
{
name: "reddit link page",
body: `<html><body><a href="https://example.com">Click here</a></body></html>`,
want: false,
},
{
name: "blog post",
body: `<html><body><article>F1 race results and analysis...</article></body></html>`,
want: false,
},
{
name: "empty string",
body: "",
want: false,
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
got := containsVideoMarkers(tt.body)
if got != tt.want {
t.Errorf("containsVideoMarkers(%q) = %v, want %v", truncate(tt.body, 60), got, tt.want)
}
})
}
}
func TestIsDirectVideoContentType(t *testing.T) {
tests := []struct {
name string
ct string
want bool
}{
// Positive cases
{name: "video/mp4", ct: "video/mp4", want: true},
{name: "video/webm", ct: "video/webm", want: true},
{name: "HLS content type", ct: "application/x-mpegurl", want: true},
{name: "Apple HLS content type", ct: "application/vnd.apple.mpegurl", want: true},
{name: "DASH content type", ct: "application/dash+xml", want: true},
{name: "video with params", ct: "video/mp4; charset=utf-8", want: true},
// Negative cases
{name: "text/html", ct: "text/html", want: false},
{name: "application/json", ct: "application/json", want: false},
{name: "image/png", ct: "image/png", want: false},
{name: "text/plain", ct: "text/plain", want: false},
{name: "empty string", ct: "", want: false},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
got := isDirectVideoContentType(tt.ct)
if got != tt.want {
t.Errorf("isDirectVideoContentType(%q) = %v, want %v", tt.ct, got, tt.want)
}
})
}
}

View file

@ -1,93 +0,0 @@
package server
import (
"log"
"net/http"
"strings"
"f1-stream/internal/auth"
)
func LoggingMiddleware(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
log.Printf("%s %s %s", r.Method, r.URL.Path, r.RemoteAddr)
next.ServeHTTP(w, r)
})
}
func RecoveryMiddleware(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
defer func() {
if err := recover(); err != nil {
log.Printf("panic: %v", err)
http.Error(w, "internal server error", http.StatusInternalServerError)
}
}()
next.ServeHTTP(w, r)
})
}
// AuthMiddleware injects user into context if session cookie is present.
func AuthMiddleware(a *auth.Auth) func(http.Handler) http.Handler {
return func(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
cookie, err := r.Cookie("session")
if err == nil && cookie.Value != "" {
user, err := a.GetSessionUser(cookie.Value)
if err == nil && user != nil {
r = r.WithContext(auth.ContextWithUser(r.Context(), user))
}
}
next.ServeHTTP(w, r)
})
}
}
// RequireAuth rejects unauthenticated requests.
func RequireAuth(next http.HandlerFunc) http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
user := auth.UserFromContext(r.Context())
if user == nil {
http.Error(w, `{"error":"authentication required"}`, http.StatusUnauthorized)
return
}
next(w, r)
}
}
// RequireAdmin rejects non-admin requests.
func RequireAdmin(next http.HandlerFunc) http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
user := auth.UserFromContext(r.Context())
if user == nil || !user.IsAdmin {
http.Error(w, `{"error":"admin access required"}`, http.StatusForbidden)
return
}
next(w, r)
}
}
// OriginCheck validates Origin header on mutation requests (CSRF protection).
func OriginCheck(allowedOrigins []string) func(http.Handler) http.Handler {
return func(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
if r.Method != "GET" && r.Method != "HEAD" && r.Method != "OPTIONS" {
origin := r.Header.Get("Origin")
if origin != "" {
allowed := false
for _, o := range allowedOrigins {
if strings.EqualFold(origin, o) {
allowed = true
break
}
}
if !allowed {
http.Error(w, `{"error":"origin not allowed"}`, http.StatusForbidden)
return
}
}
}
next.ServeHTTP(w, r)
})
}
}
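
Note: these four wrappers compose in the order used by the wrapAll helper in the server file below. A condensed sketch of that chain (outermost first, so RecoveryMiddleware also catches panics raised inside the logging, origin, and auth layers):

// Sketch mirroring the server's wrapAll helper; h is any final handler.
func wrap(a *auth.Auth, origins []string, h http.HandlerFunc) http.Handler {
	return RecoveryMiddleware(LoggingMiddleware(OriginCheck(origins)(AuthMiddleware(a)(h))))
}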

View file

@ -1,338 +0,0 @@
package server
import (
"encoding/json"
"html"
"log"
"net/http"
"strings"
"f1-stream/internal/auth"
"f1-stream/internal/extractor"
"f1-stream/internal/hlsproxy"
"f1-stream/internal/playerconfig"
"f1-stream/internal/proxy"
"f1-stream/internal/scraper"
"f1-stream/internal/store"
)
type Server struct {
store *store.Store
auth *auth.Auth
scraper *scraper.Scraper
playerConfig *playerconfig.Service
mux *http.ServeMux
headlessEnabled bool
}
func New(s *store.Store, a *auth.Auth, sc *scraper.Scraper, pc *playerconfig.Service, origins []string, headlessEnabled bool) *Server {
srv := &Server{
store: s,
auth: a,
scraper: sc,
playerConfig: pc,
mux: http.NewServeMux(),
headlessEnabled: headlessEnabled,
}
srv.registerRoutes(origins)
return srv
}
func (s *Server) Handler() http.Handler {
return s.mux
}
func (s *Server) registerRoutes(origins []string) {
// Apply middleware chain
authMw := AuthMiddleware(s.auth)
originMw := OriginCheck(origins)
// Static files
fs := http.FileServer(http.Dir("static"))
s.mux.Handle("GET /static/", http.StripPrefix("/static/", fs))
s.mux.HandleFunc("GET /", func(w http.ResponseWriter, r *http.Request) {
if r.URL.Path != "/" {
http.NotFound(w, r)
return
}
http.ServeFile(w, r, "static/index.html")
})
// Health
s.mux.HandleFunc("GET /api/health", func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "application/json")
w.Write([]byte(`{"status":"ok"}`))
})
// Reverse proxy for iframe embedding (strips anti-framing headers)
proxyHandler := proxy.NewHandler()
s.mux.Handle("GET /proxy/", proxyHandler)
s.mux.Handle("POST /proxy/", proxyHandler)
s.mux.Handle("HEAD /proxy/", proxyHandler)
s.mux.Handle("OPTIONS /proxy/", proxyHandler)
// HLS proxy for native video playback
hlsHandler := hlsproxy.NewHandler()
s.mux.Handle("GET /hls/", hlsHandler)
s.mux.Handle("OPTIONS /hls/", hlsHandler)
// Public API - wrap with middleware
wrapAll := func(h http.HandlerFunc) http.Handler {
return RecoveryMiddleware(LoggingMiddleware(originMw(authMw(h))))
}
// Auth endpoints
s.mux.Handle("POST /api/auth/register/begin", wrapAll(s.auth.BeginRegistration))
s.mux.Handle("POST /api/auth/register/finish", wrapAll(s.auth.FinishRegistration))
s.mux.Handle("POST /api/auth/login/begin", wrapAll(s.auth.BeginLogin))
s.mux.Handle("POST /api/auth/login/finish", wrapAll(s.auth.FinishLogin))
s.mux.Handle("POST /api/auth/logout", wrapAll(s.auth.Logout))
s.mux.Handle("GET /api/auth/me", wrapAll(s.auth.Me))
// Public streams
s.mux.Handle("GET /api/streams/public", wrapAll(s.handlePublicStreams))
s.mux.Handle("GET /api/streams/{id}/browse", wrapAll(s.handleBrowseStream))
s.mux.Handle("GET /api/streams/{id}/player-config", wrapAll(s.handlePlayerConfig))
// Scraped links
s.mux.Handle("GET /api/scraped", wrapAll(s.handleScrapedLinks))
s.mux.Handle("POST /api/scraped/refresh", wrapAll(s.handleTriggerScrape))
s.mux.Handle("POST /api/scraped/{id}/import", wrapAll(s.handleImportScraped))
// Authenticated endpoints
s.mux.Handle("GET /api/streams/mine", wrapAll(RequireAuth(s.handleMyStreams)))
s.mux.Handle("POST /api/streams", wrapAll(s.handleSubmitStream))
s.mux.Handle("DELETE /api/streams/{id}", wrapAll(s.handleDeleteStream))
// Admin endpoints
s.mux.Handle("PUT /api/streams/{id}/publish", wrapAll(RequireAdmin(s.handleTogglePublish)))
s.mux.Handle("GET /api/admin/streams", wrapAll(RequireAdmin(s.handleAllStreams)))
s.mux.Handle("POST /api/admin/scrape", wrapAll(RequireAdmin(s.handleTriggerScrape)))
}
func (s *Server) handlePublicStreams(w http.ResponseWriter, r *http.Request) {
streams, err := s.store.PublicStreams()
if err != nil {
http.Error(w, `{"error":"internal error"}`, http.StatusInternalServerError)
return
}
w.Header().Set("Content-Type", "application/json")
json.NewEncoder(w).Encode(streams)
}
func (s *Server) handleScrapedLinks(w http.ResponseWriter, r *http.Request) {
links, err := s.store.GetActiveScrapedLinks()
if err != nil {
http.Error(w, `{"error":"internal error"}`, http.StatusInternalServerError)
return
}
w.Header().Set("Content-Type", "application/json")
if links == nil {
w.Write([]byte("[]"))
return
}
json.NewEncoder(w).Encode(links)
}
func (s *Server) handleImportScraped(w http.ResponseWriter, r *http.Request) {
id := r.PathValue("id")
link, err := s.store.GetScrapedLinkByID(id)
if err != nil {
http.Error(w, `{"error":"not found"}`, http.StatusNotFound)
return
}
if err := s.store.PublishScrapedStream(link.URL, link.Title); err != nil {
http.Error(w, `{"error":"internal error"}`, http.StatusInternalServerError)
return
}
w.Header().Set("Content-Type", "application/json")
w.Write([]byte(`{"ok":true}`))
}
func (s *Server) handleMyStreams(w http.ResponseWriter, r *http.Request) {
user := auth.UserFromContext(r.Context())
streams, err := s.store.UserStreams(user.ID)
if err != nil {
http.Error(w, `{"error":"internal error"}`, http.StatusInternalServerError)
return
}
w.Header().Set("Content-Type", "application/json")
if streams == nil {
w.Write([]byte("[]"))
return
}
json.NewEncoder(w).Encode(streams)
}
func (s *Server) handleSubmitStream(w http.ResponseWriter, r *http.Request) {
user := auth.UserFromContext(r.Context())
var req struct {
URL string `json:"url"`
Title string `json:"title"`
}
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
http.Error(w, `{"error":"invalid request"}`, http.StatusBadRequest)
return
}
req.URL = strings.TrimSpace(req.URL)
req.Title = strings.TrimSpace(req.Title)
if req.URL == "" {
http.Error(w, `{"error":"url required"}`, http.StatusBadRequest)
return
}
if len(req.URL) > 2048 {
http.Error(w, `{"error":"url too long"}`, http.StatusBadRequest)
return
}
if !strings.HasPrefix(req.URL, "https://") && !strings.HasPrefix(req.URL, "http://") {
http.Error(w, `{"error":"url must start with http:// or https://"}`, http.StatusBadRequest)
return
}
if req.Title == "" {
req.Title = req.URL
}
req.Title = html.EscapeString(req.Title)
submittedBy := "anonymous"
published := true
if user != nil {
submittedBy = user.ID
published = false
}
stream, err := s.store.AddStream(req.URL, req.Title, submittedBy, published, "user")
if err != nil {
http.Error(w, `{"error":"internal error"}`, http.StatusInternalServerError)
return
}
w.Header().Set("Content-Type", "application/json")
w.WriteHeader(http.StatusCreated)
json.NewEncoder(w).Encode(stream)
}
func (s *Server) handleDeleteStream(w http.ResponseWriter, r *http.Request) {
user := auth.UserFromContext(r.Context())
id := r.PathValue("id")
var userID string
var isAdmin bool
if user != nil {
userID = user.ID
isAdmin = user.IsAdmin
}
if err := s.store.DeleteStream(id, userID, isAdmin); err != nil {
if strings.Contains(err.Error(), "not authorized") {
http.Error(w, `{"error":"not authorized"}`, http.StatusForbidden)
return
}
if strings.Contains(err.Error(), "not found") {
http.Error(w, `{"error":"not found"}`, http.StatusNotFound)
return
}
http.Error(w, `{"error":"internal error"}`, http.StatusInternalServerError)
return
}
w.Header().Set("Content-Type", "application/json")
w.Write([]byte(`{"ok":true}`))
}
func (s *Server) handleTogglePublish(w http.ResponseWriter, r *http.Request) {
id := r.PathValue("id")
if err := s.store.TogglePublish(id); err != nil {
http.Error(w, `{"error":"stream not found"}`, http.StatusNotFound)
return
}
w.Header().Set("Content-Type", "application/json")
w.Write([]byte(`{"ok":true}`))
}
func (s *Server) handleAllStreams(w http.ResponseWriter, r *http.Request) {
streams, err := s.store.LoadStreams()
if err != nil {
http.Error(w, `{"error":"internal error"}`, http.StatusInternalServerError)
return
}
w.Header().Set("Content-Type", "application/json")
json.NewEncoder(w).Encode(streams)
}
func (s *Server) handleTriggerScrape(w http.ResponseWriter, r *http.Request) {
s.scraper.TriggerScrape()
w.Header().Set("Content-Type", "application/json")
w.Write([]byte(`{"ok":true,"message":"scrape triggered"}`))
}
func (s *Server) handleBrowseStream(w http.ResponseWriter, r *http.Request) {
if !s.headlessEnabled {
http.Error(w, `{"error":"browser sessions not available"}`, http.StatusNotFound)
return
}
id := r.PathValue("id")
streams, err := s.store.LoadStreams()
if err != nil {
log.Printf("server: browse: failed to load streams: %v", err)
http.Error(w, `{"error":"internal error"}`, http.StatusInternalServerError)
return
}
var streamURL string
var found bool
for _, st := range streams {
if st.ID == id {
if !st.Published {
http.Error(w, `{"error":"stream not found"}`, http.StatusNotFound)
return
}
streamURL = st.URL
found = true
break
}
}
if !found {
http.Error(w, `{"error":"stream not found"}`, http.StatusNotFound)
return
}
extractor.HandleBrowserSession(w, r, streamURL)
}
func (s *Server) handlePlayerConfig(w http.ResponseWriter, r *http.Request) {
id := r.PathValue("id")
streams, err := s.store.LoadStreams()
if err != nil {
log.Printf("server: player-config: failed to load streams: %v", err)
w.Header().Set("Content-Type", "application/json")
json.NewEncoder(w).Encode(playerconfig.PlayerConfig{Type: "proxy"})
return
}
var streamURL string
var found bool
for _, st := range streams {
if st.ID == id {
if !st.Published {
w.Header().Set("Content-Type", "application/json")
json.NewEncoder(w).Encode(playerconfig.PlayerConfig{Type: "proxy"})
return
}
streamURL = st.URL
found = true
break
}
}
if !found {
w.Header().Set("Content-Type", "application/json")
json.NewEncoder(w).Encode(playerconfig.PlayerConfig{Type: "proxy"})
return
}
config := s.playerConfig.GetConfig(r.Context(), streamURL)
w.Header().Set("Content-Type", "application/json")
json.NewEncoder(w).Encode(config)
}
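
Note: the mux patterns above ("GET /api/health", "/api/streams/{id}/browse") and r.PathValue rely on the method- and wildcard-aware ServeMux routing added to the standard library in Go 1.22. A minimal standalone illustration:

package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	mux := http.NewServeMux()
	// {id} is a path wildcard; the "GET " prefix restricts the method.
	mux.HandleFunc("GET /api/streams/{id}", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, `{"id":%q}`, r.PathValue("id"))
	})
	log.Fatal(http.ListenAndServe(":8080", mux))
}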

View file

@ -1,37 +0,0 @@
package store
import (
"f1-stream/internal/models"
)
func (s *Store) LoadHealthStates() ([]models.HealthState, error) {
s.healthMu.RLock()
defer s.healthMu.RUnlock()
var states []models.HealthState
if err := readJSON(s.filePath("health_state.json"), &states); err != nil {
return nil, err
}
return states, nil
}
func (s *Store) SaveHealthStates(states []models.HealthState) error {
s.healthMu.Lock()
defer s.healthMu.Unlock()
return writeJSON(s.filePath("health_state.json"), states)
}
// HealthMap returns a map of URL -> Healthy status. It reads the health state
// file directly without acquiring healthMu to avoid deadlock when called from
// methods that already hold other locks (e.g., PublicStreams, GetActiveScrapedLinks).
// URLs not present in the map are implicitly healthy.
func (s *Store) HealthMap() map[string]bool {
var states []models.HealthState
if err := readJSON(s.filePath("health_state.json"), &states); err != nil {
return make(map[string]bool)
}
m := make(map[string]bool, len(states))
for _, st := range states {
m[st.URL] = st.Healthy
}
return m
}
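
Note: models.HealthState is defined outside this hunk. From the fields read above (st.URL, st.Healthy) its shape is approximately the following; the JSON tag names and any extra bookkeeping fields are assumptions:

// Assumed shape; the real definition lives in f1-stream/internal/models.
type HealthState struct {
	URL     string `json:"url"`
	Healthy bool   `json:"healthy"`
	// Hypothetical: a consecutive-failure counter would fit the health-check
	// design, but it is not visible in this diff.
	Failures int `json:"failures,omitempty"`
}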

View file

@ -1,63 +0,0 @@
package store
import (
"fmt"
"time"
"f1-stream/internal/models"
)
func (s *Store) LoadScrapedLinks() ([]models.ScrapedLink, error) {
s.scrapedMu.RLock()
defer s.scrapedMu.RUnlock()
var links []models.ScrapedLink
if err := readJSON(s.filePath("scraped_links.json"), &links); err != nil {
return nil, err
}
return links, nil
}
func (s *Store) SaveScrapedLinks(links []models.ScrapedLink) error {
s.scrapedMu.Lock()
defer s.scrapedMu.Unlock()
return writeJSON(s.filePath("scraped_links.json"), links)
}
func (s *Store) GetScrapedLinkByID(id string) (models.ScrapedLink, error) {
s.scrapedMu.RLock()
defer s.scrapedMu.RUnlock()
var links []models.ScrapedLink
if err := readJSON(s.filePath("scraped_links.json"), &links); err != nil {
return models.ScrapedLink{}, err
}
for _, l := range links {
if l.ID == id {
return l, nil
}
}
return models.ScrapedLink{}, fmt.Errorf("not found")
}
func (s *Store) GetActiveScrapedLinks() ([]models.ScrapedLink, error) {
s.scrapedMu.RLock()
defer s.scrapedMu.RUnlock()
var links []models.ScrapedLink
if err := readJSON(s.filePath("scraped_links.json"), &links); err != nil {
return nil, err
}
healthMap := s.HealthMap()
now := time.Now()
var active []models.ScrapedLink
for _, l := range links {
l.Stale = now.Sub(l.ScrapedAt) > 7*24*time.Hour
if l.Stale {
continue
}
// Filter unhealthy scraped links. URLs not in healthMap are assumed healthy.
if healthy, exists := healthMap[l.URL]; exists && !healthy {
continue
}
active = append(active, l)
}
return active, nil
}
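
Note: models.ScrapedLink is likewise outside this hunk. The fields used here (ScrapedAt, Stale) and the JSON names read by the frontend (l.id, l.url, l.title, l.source) imply roughly this shape; the scraped_at tag name is an assumption:

// Assumed shape; the real definition lives in f1-stream/internal/models.
type ScrapedLink struct {
	ID        string    `json:"id"`
	URL       string    `json:"url"`
	Title     string    `json:"title"`
	Source    string    `json:"source"`
	ScrapedAt time.Time `json:"scraped_at"`
	Stale     bool      `json:"stale"`
}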

View file

@ -1,98 +0,0 @@
package store
import (
"crypto/rand"
"encoding/hex"
"fmt"
"time"
"f1-stream/internal/models"
)
func (s *Store) LoadSessions() ([]models.Session, error) {
s.sessionsMu.RLock()
defer s.sessionsMu.RUnlock()
var sessions []models.Session
if err := readJSON(s.filePath("sessions.json"), &sessions); err != nil {
return nil, err
}
return sessions, nil
}
func (s *Store) CreateSession(userID string, ttl time.Duration) (string, error) {
s.sessionsMu.Lock()
defer s.sessionsMu.Unlock()
var sessions []models.Session
if err := readJSON(s.filePath("sessions.json"), &sessions); err != nil {
return "", err
}
b := make([]byte, 32)
if _, err := rand.Read(b); err != nil {
return "", err
}
token := hex.EncodeToString(b)
sess := models.Session{
Token: token,
UserID: userID,
ExpiresAt: time.Now().Add(ttl),
}
sessions = append(sessions, sess)
if err := writeJSON(s.filePath("sessions.json"), sessions); err != nil {
return "", err
}
return token, nil
}
func (s *Store) GetSession(token string) (*models.Session, error) {
s.sessionsMu.RLock()
defer s.sessionsMu.RUnlock()
var sessions []models.Session
if err := readJSON(s.filePath("sessions.json"), &sessions); err != nil {
return nil, err
}
for _, sess := range sessions {
if sess.Token == token && time.Now().Before(sess.ExpiresAt) {
return &sess, nil
}
}
return nil, nil
}
func (s *Store) DeleteSession(token string) error {
s.sessionsMu.Lock()
defer s.sessionsMu.Unlock()
var sessions []models.Session
if err := readJSON(s.filePath("sessions.json"), &sessions); err != nil {
return err
}
var updated []models.Session
found := false
for _, sess := range sessions {
if sess.Token == token {
found = true
continue
}
updated = append(updated, sess)
}
if !found {
return fmt.Errorf("session not found")
}
return writeJSON(s.filePath("sessions.json"), updated)
}
func (s *Store) CleanExpiredSessions() error {
s.sessionsMu.Lock()
defer s.sessionsMu.Unlock()
var sessions []models.Session
if err := readJSON(s.filePath("sessions.json"), &sessions); err != nil {
return err
}
now := time.Now()
var valid []models.Session
for _, sess := range sessions {
if now.Before(sess.ExpiresAt) {
valid = append(valid, sess)
}
}
return writeJSON(s.filePath("sessions.json"), valid)
}
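
Note: session tokens are 64-character hex strings carrying 32 bytes of crypto/rand entropy. Expiry is enforced at read time in GetSession, while the hourly CleanExpiredSessions pass (wired up in main.go below) compacts expired entries out of sessions.json.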

View file

@ -1,53 +0,0 @@
package store
import (
"encoding/json"
"os"
"path/filepath"
"sync"
)
type Store struct {
dir string
streamsMu sync.RWMutex
usersMu sync.RWMutex
scrapedMu sync.RWMutex
sessionsMu sync.RWMutex
healthMu sync.RWMutex
}
func New(dir string) (*Store, error) {
if err := os.MkdirAll(dir, 0755); err != nil {
return nil, err
}
return &Store{dir: dir}, nil
}
func (s *Store) filePath(name string) string {
return filepath.Join(s.dir, name)
}
// readJSON reads a JSON file into the target. Returns nil if file doesn't exist.
func readJSON(path string, target interface{}) error {
data, err := os.ReadFile(path)
if err != nil {
if os.IsNotExist(err) {
return nil
}
return err
}
return json.Unmarshal(data, target)
}
// writeJSON atomically writes target as JSON to path using temp-file-then-rename.
func writeJSON(path string, data interface{}) error {
b, err := json.MarshalIndent(data, "", " ")
if err != nil {
return err
}
tmp := path + ".tmp"
if err := os.WriteFile(tmp, b, 0644); err != nil {
return err
}
return os.Rename(tmp, path)
}
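
Note on the temp-file-then-rename pattern: os.Rename is atomic when source and destination are on the same filesystem, which holds here because the .tmp file is created next to the target. Combined with the whole-file read in readJSON, readers observe either the old or the new JSON document, never a partial write.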

View file

@ -1,176 +0,0 @@
package store
import (
"crypto/rand"
"encoding/hex"
"fmt"
"time"
"f1-stream/internal/models"
)
func (s *Store) LoadStreams() ([]models.Stream, error) {
s.streamsMu.RLock()
defer s.streamsMu.RUnlock()
var streams []models.Stream
if err := readJSON(s.filePath("streams.json"), &streams); err != nil {
return nil, err
}
return streams, nil
}
func (s *Store) PublicStreams() ([]models.Stream, error) {
s.streamsMu.RLock()
defer s.streamsMu.RUnlock()
var streams []models.Stream
if err := readJSON(s.filePath("streams.json"), &streams); err != nil {
return nil, err
}
healthMap := s.HealthMap()
var pub []models.Stream
for _, st := range streams {
if !st.Published {
continue
}
// Filter unhealthy streams. URLs not in healthMap are assumed healthy (new/unchecked).
if healthy, exists := healthMap[st.URL]; exists && !healthy {
continue
}
pub = append(pub, st)
}
return pub, nil
}
func (s *Store) UserStreams(userID string) ([]models.Stream, error) {
s.streamsMu.RLock()
defer s.streamsMu.RUnlock()
var streams []models.Stream
if err := readJSON(s.filePath("streams.json"), &streams); err != nil {
return nil, err
}
var result []models.Stream
for _, st := range streams {
if st.SubmittedBy == userID {
result = append(result, st)
}
}
return result, nil
}
func (s *Store) AddStream(url, title, submittedBy string, published bool, source string) (models.Stream, error) {
s.streamsMu.Lock()
defer s.streamsMu.Unlock()
var streams []models.Stream
if err := readJSON(s.filePath("streams.json"), &streams); err != nil {
return models.Stream{}, err
}
id, err := randomID()
if err != nil {
return models.Stream{}, err
}
st := models.Stream{
ID: id,
URL: url,
Title: title,
SubmittedBy: submittedBy,
Published: published,
Source: source,
CreatedAt: time.Now(),
}
streams = append(streams, st)
if err := writeJSON(s.filePath("streams.json"), streams); err != nil {
return models.Stream{}, err
}
return st, nil
}
func (s *Store) PublishScrapedStream(url, title string) error {
s.streamsMu.Lock()
defer s.streamsMu.Unlock()
var streams []models.Stream
if err := readJSON(s.filePath("streams.json"), &streams); err != nil {
return err
}
// Deduplicate: skip if URL already exists in streams
for _, st := range streams {
if st.URL == url {
return nil
}
}
id, err := randomID()
if err != nil {
return err
}
streams = append(streams, models.Stream{
ID: id,
URL: url,
Title: title,
SubmittedBy: "scraper",
Published: true,
Source: "scraped",
CreatedAt: time.Now(),
})
return writeJSON(s.filePath("streams.json"), streams)
}
func (s *Store) DeleteStream(id, userID string, isAdmin bool) error {
s.streamsMu.Lock()
defer s.streamsMu.Unlock()
var streams []models.Stream
if err := readJSON(s.filePath("streams.json"), &streams); err != nil {
return err
}
var updated []models.Stream
found := false
for _, st := range streams {
if st.ID == id {
// Note: an anonymous caller (empty userID) bypasses the ownership check
// entirely; authenticated non-admins may only delete their own streams.
if userID != "" && !isAdmin && st.SubmittedBy != userID {
return fmt.Errorf("not authorized")
}
found = true
continue
}
updated = append(updated, st)
}
if !found {
return fmt.Errorf("stream not found")
}
return writeJSON(s.filePath("streams.json"), updated)
}
func (s *Store) TogglePublish(id string) error {
s.streamsMu.Lock()
defer s.streamsMu.Unlock()
var streams []models.Stream
if err := readJSON(s.filePath("streams.json"), &streams); err != nil {
return err
}
for i, st := range streams {
if st.ID == id {
streams[i].Published = !st.Published
return writeJSON(s.filePath("streams.json"), streams)
}
}
return fmt.Errorf("stream not found")
}
func (s *Store) SeedStreams(defaults []models.Stream) error {
s.streamsMu.Lock()
defer s.streamsMu.Unlock()
var existing []models.Stream
if err := readJSON(s.filePath("streams.json"), &existing); err != nil {
return err
}
if len(existing) > 0 {
return nil
}
return writeJSON(s.filePath("streams.json"), defaults)
}
func randomID() (string, error) {
b := make([]byte, 16)
if _, err := rand.Read(b); err != nil {
return "", err
}
return hex.EncodeToString(b), nil
}
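
Note: models.Stream is defined outside this hunk. The struct literal above plus the JSON names used by the frontend (s.id, s.url, s.title, s.published, s.submitted_by) suggest approximately the following; the source and created_at tag names are assumptions:

// Assumed shape; the real definition lives in f1-stream/internal/models.
type Stream struct {
	ID          string    `json:"id"`
	URL         string    `json:"url"`
	Title       string    `json:"title"`
	SubmittedBy string    `json:"submitted_by"`
	Published   bool      `json:"published"`
	Source      string    `json:"source"`
	CreatedAt   time.Time `json:"created_at"`
}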

View file

@ -1,91 +0,0 @@
package store
import (
"fmt"
"f1-stream/internal/models"
"github.com/go-webauthn/webauthn/webauthn"
)
func (s *Store) LoadUsers() ([]models.User, error) {
s.usersMu.RLock()
defer s.usersMu.RUnlock()
var users []models.User
if err := readJSON(s.filePath("users.json"), &users); err != nil {
return nil, err
}
return users, nil
}
func (s *Store) GetUserByName(username string) (*models.User, error) {
s.usersMu.RLock()
defer s.usersMu.RUnlock()
var users []models.User
if err := readJSON(s.filePath("users.json"), &users); err != nil {
return nil, err
}
for _, u := range users {
if u.Username == username {
return &u, nil
}
}
return nil, nil
}
func (s *Store) GetUserByID(id string) (*models.User, error) {
s.usersMu.RLock()
defer s.usersMu.RUnlock()
var users []models.User
if err := readJSON(s.filePath("users.json"), &users); err != nil {
return nil, err
}
for _, u := range users {
if u.ID == id {
return &u, nil
}
}
return nil, nil
}
func (s *Store) CreateUser(user models.User) error {
s.usersMu.Lock()
defer s.usersMu.Unlock()
var users []models.User
if err := readJSON(s.filePath("users.json"), &users); err != nil {
return err
}
for _, u := range users {
if u.Username == user.Username {
return fmt.Errorf("username already exists")
}
}
users = append(users, user)
return writeJSON(s.filePath("users.json"), users)
}
func (s *Store) UpdateUserCredentials(userID string, creds []webauthn.Credential) error {
s.usersMu.Lock()
defer s.usersMu.Unlock()
var users []models.User
if err := readJSON(s.filePath("users.json"), &users); err != nil {
return err
}
for i, u := range users {
if u.ID == userID {
users[i].Credentials = creds
return writeJSON(s.filePath("users.json"), users)
}
}
return fmt.Errorf("user not found")
}
func (s *Store) UserCount() (int, error) {
s.usersMu.RLock()
defer s.usersMu.RUnlock()
var users []models.User
if err := readJSON(s.filePath("users.json"), &users); err != nil {
return 0, err
}
return len(users), nil
}

View file

@ -1,163 +0,0 @@
package main
import (
"context"
"fmt"
"log"
"net/http"
"os"
"os/signal"
"strings"
"syscall"
"time"
"f1-stream/internal/auth"
"f1-stream/internal/extractor"
"f1-stream/internal/healthcheck"
"f1-stream/internal/models"
"f1-stream/internal/playerconfig"
"f1-stream/internal/scraper"
"f1-stream/internal/server"
"f1-stream/internal/store"
)
func main() {
listenAddr := envOr("LISTEN_ADDR", ":8080")
dataDir := envOr("DATA_DIR", "/data")
scrapeInterval := envDuration("SCRAPE_INTERVAL", 15*time.Minute)
validateTimeout := envDuration("SCRAPER_VALIDATE_TIMEOUT", 10*time.Second)
adminUsername := os.Getenv("ADMIN_USERNAME")
sessionTTL := envDuration("SESSION_TTL", 720*time.Hour)
headlessEnabled := os.Getenv("HEADLESS_EXTRACT_ENABLED") == "true"
rpID := envOr("WEBAUTHN_RPID", "localhost")
rpOrigin := envOr("WEBAUTHN_ORIGIN", "http://localhost:8080")
rpDisplayName := envOr("WEBAUTHN_DISPLAY_NAME", "F1 Stream")
// Initialize store
st, err := store.New(dataDir)
if err != nil {
log.Fatalf("failed to init store: %v", err)
}
// Seed default streams
if err := st.SeedStreams(defaultStreams()); err != nil {
log.Printf("warning: failed to seed streams: %v", err)
}
// Initialize auth
origins := strings.Split(rpOrigin, ",")
a, err := auth.New(st, rpDisplayName, rpID, origins, adminUsername, sessionTTL)
if err != nil {
log.Fatalf("failed to init auth: %v", err)
}
// Initialize scraper
sc := scraper.New(st, scrapeInterval, validateTimeout)
// Initialize health checker
healthInterval := envDuration("HEALTH_CHECK_INTERVAL", 5*time.Minute)
healthTimeout := envDuration("HEALTH_CHECK_TIMEOUT", 10*time.Second)
hc := healthcheck.New(st, healthInterval, healthTimeout)
// Initialize server
pc := playerconfig.New()
srv := server.New(st, a, sc, pc, origins, headlessEnabled)
// Start scraper in background
ctx, cancel := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
defer cancel()
// Initialize headless browser if enabled
if headlessEnabled {
extractor.Init()
defer extractor.Stop()
// Configure TURN server if provided
if turnURL := os.Getenv("TURN_URL"); turnURL != "" {
turnSecret := os.Getenv("TURN_SHARED_SECRET")
turnInternalURL := os.Getenv("TURN_INTERNAL_URL")
extractor.SetTURNConfig(turnURL, turnSecret, turnInternalURL)
}
log.Println("headless video extraction enabled")
}
go sc.Run(ctx)
go hc.Run(ctx)
// Clean expired sessions periodically
go func() {
sessionTicker := time.NewTicker(1 * time.Hour)
defer sessionTicker.Stop()
for {
select {
case <-ctx.Done():
return
case <-sessionTicker.C:
st.CleanExpiredSessions()
}
}
}()
httpSrv := &http.Server{
Addr: listenAddr,
Handler: srv.Handler(),
}
go func() {
<-ctx.Done()
log.Println("shutting down server...")
shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), 5*time.Second)
defer shutdownCancel()
httpSrv.Shutdown(shutdownCtx)
}()
log.Printf("starting server on %s", listenAddr)
if err := httpSrv.ListenAndServe(); err != http.ErrServerClosed {
log.Fatalf("server error: %v", err)
}
log.Println("server stopped")
}
func defaultStreams() []models.Stream {
now := time.Now()
streams := []struct {
url, title string
}{
{"https://wearechecking.live/streams-pages/motorsports", "WeAreChecking - Motorsports"},
{"https://vipleague.im/formula-1-schedule-streaming-links", "VIPLeague - F1"},
{"https://www.vipbox.lc/", "VIPBox"},
{"https://f1box.me/", "F1Box"},
{"https://1stream.vip/formula-1-streams/", "1Stream - F1"},
}
var result []models.Stream
for i, s := range streams {
result = append(result, models.Stream{
ID: fmt.Sprintf("default-%d", i),
URL: s.url,
Title: s.title,
SubmittedBy: "system",
Published: true,
Source: "system",
CreatedAt: now,
})
}
return result
}
func envOr(key, fallback string) string {
if v := os.Getenv(key); v != "" {
return v
}
return fallback
}
func envDuration(key string, fallback time.Duration) time.Duration {
if v := os.Getenv(key); v != "" {
d, err := time.ParseDuration(v)
if err != nil {
log.Printf("warning: invalid %s=%q, using default %v", key, v, fallback)
return fallback
}
return d
}
return fallback
}
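
Note: every interval, timeout, and TTL variable goes through envDuration, so values use Go's time.ParseDuration syntax (for example SCRAPE_INTERVAL=15m, HEALTH_CHECK_TIMEOUT=10s, SESSION_TTL=720h); anything unparsable logs a warning and falls back to the default.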

View file

@ -1,6 +0,0 @@
{
"name": "files",
"lockfileVersion": 3,
"requires": true,
"packages": {}
}

View file

@ -1,6 +0,0 @@
{
"name": "files",
"lockfileVersion": 3,
"requires": true,
"packages": {}
}

View file

@ -1 +0,0 @@
{}

View file

@ -1,6 +1,7 @@
 #!/usr/bin/env bash
 set -e
-docker build -t viktorbarzin/f1-stream .
-docker push viktorbarzin/f1-stream
+docker buildx build --platform linux/amd64 --provenance=false \
+  -t viktorbarzin/f1-stream:v2.0.1 -t viktorbarzin/f1-stream:latest \
+  --push .
 kubectl -n f1-stream rollout restart deployment f1-stream
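
Note: --provenance=false makes buildx push a plain single-platform image manifest instead of an OCI index with attached provenance attestations, which keeps the tag consumable by older registries and tooling; --platform linux/amd64 matters when building on an arm64 workstation for an amd64 cluster.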

File diff suppressed because it is too large

File diff suppressed because one or more lines are too long

View file

@ -1,205 +0,0 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>F1 Streams</title>
<meta name="description" content="Live F1 streaming links aggregated from Reddit and user submissions">
<meta property="og:title" content="F1 Streams">
<meta property="og:description" content="Live F1 streaming links aggregated from Reddit and user submissions">
<meta property="og:type" content="website">
<link rel="icon" href="data:image/svg+xml,<svg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 100 100'><rect fill='%23e10600' rx='12' width='100' height='100'/><text x='50' y='72' font-size='60' font-weight='900' text-anchor='middle' fill='white' font-family='sans-serif'>F1</text></svg>">
<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=Inter:wght@400;500;600&family=Titillium+Web:wght@400;600;700;900&display=swap" rel="stylesheet">
<link rel="stylesheet" href="/static/css/pico.min.css">
<link rel="stylesheet" href="/static/css/custom.css">
</head>
<body>
<header>
<div class="header-left">
<span class="f1-logo">F1</span>
<div>
<h1 class="brand-title">Streams</h1>
<div class="brand-subtitle">Live Racing Hub</div>
</div>
<span class="live-indicator" id="live-badge" hidden>
<span class="live-dot"></span>
LIVE
</span>
</div>
<div class="auth-section" id="auth-section">
<button id="login-btn" onclick="showAuthDialog()">Login / Register</button>
</div>
</header>
<div class="racing-stripe"></div>
<nav class="tabs" id="tabs">
<button class="hamburger" id="hamburger" onclick="toggleMobileNav()" aria-label="Toggle navigation">
<svg width="22" height="22" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2.5" stroke-linecap="round"><line x1="3" y1="6" x2="21" y2="6"/><line x1="3" y1="12" x2="21" y2="12"/><line x1="3" y1="18" x2="21" y2="18"/></svg>
</button>
<button class="tab-btn active" data-tab="streams" onclick="switchTab('streams')">
<svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><polygon points="5 3 19 12 5 21 5 3"/></svg>
Streams
</button>
<button class="tab-btn" data-tab="reddit" onclick="switchTab('reddit')">
<svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><circle cx="12" cy="12" r="10"/><line x1="2" y1="12" x2="22" y2="12"/><ellipse cx="12" cy="12" rx="10" ry="4" transform="rotate(90 12 12)"/></svg>
Reddit Links
</button>
<button class="tab-btn hidden" data-tab="mine" onclick="switchTab('mine')" id="tab-mine">
<svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><path d="M20 21v-2a4 4 0 0 0-4-4H8a4 4 0 0 0-4 4v2"/><circle cx="12" cy="7" r="4"/></svg>
My Streams
</button>
<button class="tab-btn hidden" data-tab="admin" onclick="switchTab('admin')" id="tab-admin">
<svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><circle cx="12" cy="12" r="3"/><path d="M19.4 15a1.65 1.65 0 0 0 .33 1.82l.06.06a2 2 0 0 1-2.83 2.83l-.06-.06a1.65 1.65 0 0 0-1.82-.33 1.65 1.65 0 0 0-1 1.51V21a2 2 0 0 1-4 0v-.09A1.65 1.65 0 0 0 9 19.4a1.65 1.65 0 0 0-1.82.33l-.06.06a2 2 0 0 1-2.83-2.83l.06-.06A1.65 1.65 0 0 0 4.68 15a1.65 1.65 0 0 0-1.51-1H3a2 2 0 0 1 0-4h.09A1.65 1.65 0 0 0 4.6 9a1.65 1.65 0 0 0-.33-1.82l-.06-.06a2 2 0 0 1 2.83-2.83l.06.06A1.65 1.65 0 0 0 9 4.68a1.65 1.65 0 0 0 1-1.51V3a2 2 0 0 1 4 0v.09a1.65 1.65 0 0 0 1 1.51 1.65 1.65 0 0 0 1.82-.33l.06-.06a2 2 0 0 1 2.83 2.83l-.06.06A1.65 1.65 0 0 0 19.4 9a1.65 1.65 0 0 0 1.51 1H21a2 2 0 0 1 0 4h-.09a1.65 1.65 0 0 0-1.51 1z"/></svg>
Admin
</button>
</nav>
<main>
<section class="tab-content active" id="content-streams">
<div class="submit-form-card">
<div class="submit-form">
<input type="url" id="public-submit-url" placeholder="https://stream-url.com/..." required>
<input type="text" id="public-submit-title" placeholder="Stream title (optional)">
<button onclick="addPublicStream()">Add Stream</button>
</div>
</div>
<div class="stream-grid" id="stream-grid"></div>
<div class="empty-state" id="streams-empty" style="display:none">
<span class="empty-icon">&#127937;</span>
<div class="empty-title">No Streams Yet</div>
<p class="empty-desc">Add a stream URL above to get the race started.</p>
</div>
</section>
<section class="tab-content" id="content-reddit">
<div class="section-header">
<h3>Reddit Links</h3>
<button onclick="refreshRedditLinks()">
<svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><polyline points="23 4 23 10 17 10"/><path d="M20.49 15a9 9 0 1 1-2.12-9.36L23 10"/></svg>
Refresh
</button>
</div>
<ul class="link-list" id="reddit-list"></ul>
<div class="empty-state" id="reddit-empty" style="display:none">
<span class="empty-icon">&#128225;</span>
<div class="empty-title">No Links Found</div>
<p class="empty-desc">No Reddit links scraped yet. Check back closer to race time.</p>
</div>
</section>
<section class="tab-content" id="content-mine">
<div class="submit-form-card">
<div class="submit-form">
<input type="url" id="submit-url" placeholder="https://stream-url.com/..." required>
<input type="text" id="submit-title" placeholder="Stream title (optional)">
<button onclick="submitStream()">Submit Stream</button>
</div>
</div>
<div class="stream-grid" id="my-stream-grid"></div>
<div class="empty-state" id="mine-empty" style="display:none">
<span class="empty-icon">&#127918;</span>
<div class="empty-title">Your Pit Lane is Empty</div>
<p class="empty-desc">Submit a stream URL above to join the grid.</p>
</div>
</section>
<section class="tab-content" id="content-admin">
<div class="section-header">
<h3>All Streams</h3>
<button onclick="triggerScrape()">Trigger Scrape</button>
</div>
<div class="admin-stats" id="admin-stats"></div>
<div id="admin-stream-list"></div>
</section>
<!-- Browser Session Viewer (inline within main) -->
<section id="browser-viewer" class="browser-viewer hidden">
<div class="browser-viewer-bar">
<div class="browser-url-bar">
<svg class="browser-url-icon" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><rect x="2" y="3" width="20" height="14" rx="2" ry="2"/><line x1="8" y1="21" x2="16" y2="21"/><line x1="12" y1="17" x2="12" y2="21"/></svg>
<span class="browser-url-text" id="browser-url"></span>
</div>
<span class="browser-viewer-status"></span>
<a id="browser-open-original" href="#" target="_blank" rel="noopener" class="browser-open-btn" title="Open original in new tab">
<svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><path d="M18 13v6a2 2 0 0 1-2 2H5a2 2 0 0 1-2-2V8a2 2 0 0 1 2-2h6"/><polyline points="15 3 21 3 21 9"/><line x1="10" y1="14" x2="21" y2="3"/></svg>
</a>
<button class="browser-viewer-close" onclick="closeBrowserSession()" title="Close viewer">&times;</button>
</div>
<div class="browser-viewer-content">
<div class="loading-overlay" id="browser-viewer-loader">
<div class="spinner"></div>
</div>
</div>
</section>
</main>
<footer>
<p>Stream links are user-submitted and scraped from Reddit. No streams are hosted on this site.</p>
</footer>
<!-- Auth Dialog -->
<dialog id="auth-dialog">
<article>
<button class="dialog-close" onclick="document.getElementById('auth-dialog').close()" aria-label="Close">&times;</button>
<div class="dialog-logo"><span class="f1-logo">F1</span></div>
<h3 class="dialog-title">Welcome</h3>
<p class="dialog-subtitle">Sign in with your passkey to manage streams</p>
<div class="dialog-tabs">
<button class="dialog-tab-btn active" onclick="switchAuthTab('login', event)">Login</button>
<button class="dialog-tab-btn" onclick="switchAuthTab('register', event)">Register</button>
</div>
<div id="auth-login-form" class="auth-form-group">
<label for="login-username">Username</label>
<input type="text" id="login-username" placeholder="Username" autocomplete="username webauthn">
<div class="error-msg" id="login-error"></div>
<button onclick="doLogin()">
<svg width="16" height="16" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><rect x="3" y="11" width="18" height="11" rx="2" ry="2"/><path d="M7 11V7a5 5 0 0 1 10 0v4"/></svg>
Login with Passkey
</button>
</div>
<div id="auth-register-form" class="auth-form-group" style="display:none">
<label for="register-username">Username</label>
<input type="text" id="register-username" placeholder="Username (3-30 chars)" autocomplete="username">
<div class="error-msg" id="register-error"></div>
<button onclick="doRegister()">
<svg width="16" height="16" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><path d="M16 21v-2a4 4 0 0 0-4-4H5a4 4 0 0 0-4 4v2"/><circle cx="8.5" cy="7" r="4"/><line x1="20" y1="8" x2="20" y2="14"/><line x1="23" y1="11" x2="17" y2="11"/></svg>
Register with Passkey
</button>
</div>
<button onclick="document.getElementById('auth-dialog').close()" class="btn-secondary dialog-cancel">Cancel</button>
</article>
</dialog>
<!-- Toast Container -->
<div class="toast-container" id="toast-container"></div>
<!-- Reddit Viewer Overlay -->
<div id="reddit-viewer" class="reddit-viewer hidden">
<div class="reddit-viewer-bar">
<span class="reddit-viewer-title"></span>
<button class="reddit-viewer-close" onclick="closeRedditViewer()">&times;</button>
</div>
<div class="reddit-viewer-content">
<div class="loading-overlay" id="reddit-viewer-loader">
<div class="spinner"></div>
</div>
</div>
</div>
<!-- Browser Session Viewer (inline, inside main via JS) -->
<script src="https://cdn.jsdelivr.net/npm/crypto-js@4/crypto-js.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/hls.js@latest/dist/hls.min.js"></script>
<script src="/static/js/player.js"></script>
<script src="/static/js/utils.js"></script>
<script src="/static/js/auth.js"></script>
<script src="/static/js/streams.js"></script>
<script src="/static/js/app.js"></script>
</body>
</html>

View file

@ -1,121 +0,0 @@
// Toast notification system
const TOAST_ICONS = {
success: '\u2705',
error: '\u274C',
warning: '\u26A0\uFE0F',
info: '\u2139\uFE0F'
};
function showToast(message, type = 'info', duration = 4000) {
const container = document.getElementById('toast-container');
const toast = document.createElement('div');
toast.className = `toast ${type}`;
toast.innerHTML = `
<span class="toast-icon">${TOAST_ICONS[type] || TOAST_ICONS.info}</span>
<span class="toast-message">${escapeHtml(message)}</span>
<button class="toast-close" onclick="dismissToast(this.parentElement)">&times;</button>
`;
container.appendChild(toast);
if (duration > 0) {
setTimeout(() => dismissToast(toast), duration);
}
}
function dismissToast(toast) {
if (!toast || toast.classList.contains('toast-out')) return;
toast.classList.add('toast-out');
toast.addEventListener('animationend', () => toast.remove());
}
// Confirm dialog (replaces window.confirm)
function showConfirm(message) {
return new Promise((resolve) => {
const overlay = document.createElement('div');
overlay.className = 'confirm-overlay';
overlay.innerHTML = `
<div class="confirm-box">
<div class="confirm-msg">${escapeHtml(message)}</div>
<div class="confirm-actions">
<button class="btn-secondary" id="confirm-cancel">Cancel</button>
<button class="btn-primary" id="confirm-ok">Confirm</button>
</div>
</div>
`;
document.body.appendChild(overlay);
overlay.querySelector('#confirm-ok').addEventListener('click', () => {
overlay.remove();
resolve(true);
});
overlay.querySelector('#confirm-cancel').addEventListener('click', () => {
overlay.remove();
resolve(false);
});
overlay.addEventListener('click', (e) => {
if (e.target === overlay) {
overlay.remove();
resolve(false);
}
});
});
}
// Mobile nav hamburger toggle
function toggleMobileNav() {
const tabs = document.getElementById('tabs');
tabs.classList.toggle('open');
}
// Tab switching
function switchTab(tab) {
closeRedditViewer();
document.querySelectorAll('.tab-btn').forEach(b => {
b.classList.toggle('active', b.dataset.tab === tab);
});
document.querySelectorAll('.tab-content').forEach(c => {
c.classList.toggle('active', c.id === 'content-' + tab);
});
// Close mobile nav
document.getElementById('tabs').classList.remove('open');
// Load data for the tab
switch (tab) {
case 'streams':
loadPublicStreams();
break;
case 'reddit':
loadRedditLinks();
break;
case 'mine':
loadMyStreams();
break;
case 'admin':
loadAdminStreams();
break;
}
}
// Initialize
document.addEventListener('DOMContentLoaded', async () => {
checkAuth();
await loadPublicStreams();
const grid = document.getElementById('stream-grid');
const badge = document.getElementById('live-badge');
if (badge && grid && grid.children.length > 0) {
badge.hidden = false;
}
});
// Close Reddit viewer on Escape
document.addEventListener('keydown', (e) => {
if (e.key === 'Escape') {
const viewer = document.getElementById('reddit-viewer');
if (viewer && !viewer.classList.contains('hidden')) {
closeRedditViewer();
}
}
});

View file

@ -1,219 +0,0 @@
// WebAuthn helper: base64url encode/decode
function bufToBase64url(buf) {
const bytes = new Uint8Array(buf);
let str = '';
for (const b of bytes) str += String.fromCharCode(b);
return btoa(str).replace(/\+/g, '-').replace(/\//g, '_').replace(/=/g, '');
}
function base64urlToBuf(b64) {
const pad = b64.length % 4;
if (pad) b64 += '='.repeat(4 - pad);
const str = atob(b64.replace(/-/g, '+').replace(/_/g, '/'));
const buf = new Uint8Array(str.length);
for (let i = 0; i < str.length; i++) buf[i] = str.charCodeAt(i);
return buf.buffer;
}
let currentUser = null;
function showAuthDialog() {
document.getElementById('auth-dialog').showModal();
}
function switchAuthTab(tab, evt) {
const btns = document.querySelectorAll('.dialog-tab-btn');
btns.forEach(b => b.classList.remove('active'));
evt.target.classList.add('active');
document.getElementById('auth-login-form').style.display = tab === 'login' ? 'block' : 'none';
document.getElementById('auth-register-form').style.display = tab === 'register' ? 'block' : 'none';
document.getElementById('login-error').textContent = '';
document.getElementById('register-error').textContent = '';
}
async function doRegister() {
const username = document.getElementById('register-username').value.trim();
const errEl = document.getElementById('register-error');
errEl.textContent = '';
if (!username || username.length < 3) {
errEl.textContent = 'Username must be at least 3 characters';
return;
}
try {
// Step 1: Begin registration
const beginResp = await fetch('/api/auth/register/begin', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ username })
});
if (!beginResp.ok) {
const err = await beginResp.json();
errEl.textContent = err.error || 'Registration failed';
return;
}
const options = await beginResp.json();
// Convert base64url fields to ArrayBuffers
options.publicKey.challenge = base64urlToBuf(options.publicKey.challenge);
options.publicKey.user.id = base64urlToBuf(options.publicKey.user.id);
if (options.publicKey.excludeCredentials) {
options.publicKey.excludeCredentials = options.publicKey.excludeCredentials.map(c => ({
...c,
id: base64urlToBuf(c.id)
}));
}
// Step 2: Create credential via browser
const credential = await navigator.credentials.create(options);
// Step 3: Finish registration
const attestation = {
id: credential.id,
rawId: bufToBase64url(credential.rawId),
type: credential.type,
response: {
attestationObject: bufToBase64url(credential.response.attestationObject),
clientDataJSON: bufToBase64url(credential.response.clientDataJSON)
}
};
const finishResp = await fetch(`/api/auth/register/finish?username=${encodeURIComponent(username)}`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(attestation)
});
if (!finishResp.ok) {
const err = await finishResp.json();
errEl.textContent = err.error || 'Registration failed';
return;
}
const user = await finishResp.json();
setLoggedIn(user);
document.getElementById('auth-dialog').close();
} catch (e) {
console.error('Registration error:', e);
errEl.textContent = e.message || 'Registration failed';
}
}
async function doLogin() {
const username = document.getElementById('login-username').value.trim();
const errEl = document.getElementById('login-error');
errEl.textContent = '';
if (!username) {
errEl.textContent = 'Username required';
return;
}
try {
// Step 1: Begin login
const beginResp = await fetch('/api/auth/login/begin', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ username })
});
if (!beginResp.ok) {
const err = await beginResp.json();
errEl.textContent = err.error || 'Login failed';
return;
}
const options = await beginResp.json();
// Convert base64url fields
options.publicKey.challenge = base64urlToBuf(options.publicKey.challenge);
if (options.publicKey.allowCredentials) {
options.publicKey.allowCredentials = options.publicKey.allowCredentials.map(c => ({
...c,
id: base64urlToBuf(c.id)
}));
}
// Step 2: Get assertion via browser
const assertion = await navigator.credentials.get(options);
// Step 3: Finish login
const assertionData = {
id: assertion.id,
rawId: bufToBase64url(assertion.rawId),
type: assertion.type,
response: {
authenticatorData: bufToBase64url(assertion.response.authenticatorData),
clientDataJSON: bufToBase64url(assertion.response.clientDataJSON),
signature: bufToBase64url(assertion.response.signature),
userHandle: assertion.response.userHandle ? bufToBase64url(assertion.response.userHandle) : ''
}
};
const finishResp = await fetch(`/api/auth/login/finish?username=${encodeURIComponent(username)}`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(assertionData)
});
if (!finishResp.ok) {
const err = await finishResp.json();
errEl.textContent = err.error || 'Login failed';
return;
}
const user = await finishResp.json();
setLoggedIn(user);
document.getElementById('auth-dialog').close();
} catch (e) {
console.error('Login error:', e);
errEl.textContent = e.message || 'Login failed';
}
}
async function doLogout() {
await fetch('/api/auth/logout', { method: 'POST' });
setLoggedOut();
}
function setLoggedIn(user) {
currentUser = user;
const section = document.getElementById('auth-section');
section.innerHTML = `
<span>Hi, ${escapeHtml(user.username)}</span>
<button onclick="doLogout()">Logout</button>
`;
document.getElementById('tab-mine').classList.remove('hidden');
if (user.is_admin) {
document.getElementById('tab-admin').classList.remove('hidden');
}
}
function setLoggedOut() {
currentUser = null;
const section = document.getElementById('auth-section');
section.innerHTML = '<button id="login-btn" onclick="showAuthDialog()">Login / Register</button>';
document.getElementById('tab-mine').classList.add('hidden');
document.getElementById('tab-admin').classList.add('hidden');
// Switch to streams tab if on a protected tab
const activeTab = document.querySelector('.tab-btn.active');
if (activeTab && (activeTab.dataset.tab === 'mine' || activeTab.dataset.tab === 'admin')) {
switchTab('streams');
}
}
async function checkAuth() {
try {
const resp = await fetch('/api/auth/me');
if (resp.ok) {
const user = await resp.json();
setLoggedIn(user);
}
} catch (e) {
// Not logged in
}
}
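
Note: bufToBase64url and base64urlToBuf above hand-roll the base64url alphabet and padding. For reference, the server-side equivalent in Go's standard library is RawURLEncoding (unpadded base64url); this snippet only documents the wire format and is not code from this repo:

package main

import (
	"encoding/base64"
	"fmt"
)

func main() {
	raw := []byte{0xfb, 0xef, 0xff} // bytes that exercise the +/ vs -_ difference
	s := base64.RawURLEncoding.EncodeToString(raw)
	fmt.Println(s) // prints "--__": URL-safe alphabet, no '=' padding
	b, _ := base64.RawURLEncoding.DecodeString(s)
	fmt.Println(b) // round-trips to [251 239 255]
}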

View file

@ -1,216 +0,0 @@
// player.js — Native HLS player management using HLS.js directly
var _hlsInstance = null;
var _videoElement = null;
/**
* Fetch player config for a stream from the backend.
* Returns {type: "hls"|"daddylive"|"proxy", hls_url, auth_token, ...}
*/
async function getPlayerConfig(streamId) {
try {
const resp = await fetch('/api/streams/' + streamId + '/player-config');
if (!resp.ok) return { type: 'proxy' };
return await resp.json();
} catch (e) {
console.error('Failed to fetch player config:', e);
return { type: 'proxy' };
}
}
/**
* Decode a /hls/{b64} URL back to the original upstream URL.
*/
function decodeHLSURL(proxyURL) {
if (!proxyURL || typeof proxyURL !== 'string') return proxyURL;
var m = proxyURL.match(/\/hls\/([A-Za-z0-9_-]+)/);
if (!m) return proxyURL;
try {
// base64url decode
var b64 = m[1].replace(/-/g, '+').replace(/_/g, '/');
// pad
while (b64.length % 4 !== 0) b64 += '=';
return atob(b64);
} catch (e) {
return proxyURL;
}
}
/**
* Create an HLS.js player for a plain HLS stream.
*/
function createHLSPlayer(containerSelector, hlsURL) {
destroyNativePlayer();
_buildPlayer(containerSelector, hlsURL, {});
}
/**
* Create an HLS.js player for DaddyLive streams with auth module integration.
*/
function createDaddyLivePlayer(containerSelector, config) {
destroyNativePlayer();
if (config.auth_mod_url) {
_loadAuthModAndPlay(containerSelector, config);
} else {
_buildPlayer(containerSelector, config.hls_url, {});
}
}
function _loadAuthModAndPlay(containerSelector, config) {
var script = document.createElement('script');
script.src = config.auth_mod_url;
script.onload = function () {
_createDaddyLivePlayerWithAuth(containerSelector, config);
};
script.onerror = function () {
console.warn('Failed to load auth module, falling back to direct HLS');
_buildPlayer(containerSelector, config.hls_url, {});
};
document.head.appendChild(script);
}
function _createDaddyLivePlayerWithAuth(containerSelector, config) {
var hlsConfig = {};
// If EPlayerAuth is available, set up xhr wrapping
if (typeof EPlayerAuth !== 'undefined' && typeof EPlayerAuth.init === 'function') {
try {
EPlayerAuth.init({
authToken: config.auth_token,
channelKey: config.channel_key,
channelSalt: config.channel_salt,
timestamp: config.timestamp,
serverKey: config.server_key
});
if (typeof EPlayerAuth.getXhrSetup === 'function') {
var origSetup = EPlayerAuth.getXhrSetup();
hlsConfig.xhrSetup = function (xhr, url) {
// Decode the real upstream URL from our /hls/{b64} proxy path
var realURL = decodeHLSURL(url);
// Create interceptor to capture headers the auth module sets
var captured = {};
var fakeXHR = {
setRequestHeader: function (k, v) { captured[k] = v; }
};
try {
origSetup(fakeXHR, realURL);
} catch (e) {
console.warn('Auth xhrSetup error:', e);
}
// Re-set captured headers with forwarding prefix
for (var k in captured) {
if (captured.hasOwnProperty(k)) {
xhr.setRequestHeader('X-Hls-Forward-' + k, captured[k]);
}
}
};
}
} catch (e) {
console.warn('EPlayerAuth init failed:', e);
}
}
_buildPlayer(containerSelector, config.hls_url, hlsConfig);
}
/**
* Build an HLS.js player with a <video> element.
*/
function _buildPlayer(containerSelector, hlsURL, extraConfig) {
var container = document.querySelector(containerSelector);
if (!container) return;
// Create video element
var video = document.createElement('video');
video.controls = true;
video.autoplay = true;
video.style.width = '100%';
video.style.height = '100%';
video.style.backgroundColor = '#000';
container.appendChild(video);
_videoElement = video;
if (Hls.isSupported()) {
var config = {
enableWorker: true,
lowLatencyMode: false,
maxBufferLength: 30,
maxMaxBufferLength: 60
};
// Merge extra config (e.g. xhrSetup for auth)
for (var k in extraConfig) {
if (extraConfig.hasOwnProperty(k)) {
config[k] = extraConfig[k];
}
}
var hls = new Hls(config);
hls.loadSource(hlsURL);
hls.attachMedia(video);
hls.on(Hls.Events.MANIFEST_PARSED, function () {
video.play().catch(function(e) {
console.warn('Autoplay blocked:', e);
});
});
hls.on(Hls.Events.ERROR, function (event, data) {
console.error('HLS.js error:', data.type, data.details, data);
if (data.fatal) {
switch (data.type) {
case Hls.ErrorTypes.NETWORK_ERROR:
console.warn('HLS network error, attempting recovery...');
hls.startLoad();
break;
case Hls.ErrorTypes.MEDIA_ERROR:
console.warn('HLS media error, attempting recovery...');
hls.recoverMediaError();
break;
default:
console.error('HLS fatal error, cannot recover');
hls.destroy();
break;
}
}
});
_hlsInstance = hls;
} else if (video.canPlayType('application/vnd.apple.mpegurl')) {
// Safari native HLS
video.src = hlsURL;
video.addEventListener('loadedmetadata', function () {
video.play().catch(function(e) {
console.warn('Autoplay blocked:', e);
});
});
} else {
container.textContent = 'HLS playback is not supported in this browser.';
}
}
/**
* Destroy the current native player instance.
*/
function destroyNativePlayer() {
if (_hlsInstance) {
try {
_hlsInstance.destroy();
} catch (e) {
console.warn('Error destroying HLS instance:', e);
}
_hlsInstance = null;
}
if (_videoElement) {
try {
_videoElement.pause();
_videoElement.removeAttribute('src');
_videoElement.load();
_videoElement.remove();
} catch (e) {
console.warn('Error removing video element:', e);
}
_videoElement = null;
}
}
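
Note: decodeHLSURL above reverses a /hls/{base64url} proxy path. The encoding side presumably lives in the hlsproxy package, which this excerpt does not show; under that assumption the mapping would look like:

package hlsproxy // hypothetical placement; the real package is not shown

import "encoding/base64"

// hlsProxyPath sketches the assumed encoding side: the upstream URL is
// base64url-encoded (unpadded) into the path that decodeHLSURL reverses.
func hlsProxyPath(upstream string) string {
	return "/hls/" + base64.RawURLEncoding.EncodeToString([]byte(upstream))
}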

View file

@ -1,419 +0,0 @@
async function loadPublicStreams() {
const grid = document.getElementById('stream-grid');
const empty = document.getElementById('streams-empty');
try {
const resp = await fetch('/api/streams/public');
const streams = await resp.json();
if (!streams || streams.length === 0) {
grid.innerHTML = '';
empty.style.display = '';
return;
}
empty.style.display = 'none';
grid.innerHTML = streams.map(s => streamCard(s, !!currentUser)).join('');
} catch (e) {
console.error('Failed to load streams:', e);
grid.innerHTML = '';
empty.style.display = '';
}
}
async function loadMyStreams() {
const grid = document.getElementById('my-stream-grid');
const empty = document.getElementById('mine-empty');
try {
const resp = await fetch('/api/streams/mine');
const streams = await resp.json();
if (!streams || streams.length === 0) {
grid.innerHTML = '';
empty.style.display = '';
return;
}
empty.style.display = 'none';
grid.innerHTML = streams.map(s => streamCard(s, true)).join('');
} catch (e) {
console.error('Failed to load my streams:', e);
}
}
async function loadRedditLinks() {
const list = document.getElementById('reddit-list');
const empty = document.getElementById('reddit-empty');
try {
const [scrapedResp, streamsResp] = await Promise.all([
fetch('/api/scraped'),
fetch('/api/streams/public')
]);
const links = await scrapedResp.json();
const streams = await streamsResp.json();
const importedURLs = new Set((streams || []).map(s => s.url));
if (!links || links.length === 0) {
list.innerHTML = '';
empty.style.display = '';
return;
}
empty.style.display = 'none';
list.innerHTML = links.map(l => {
const imported = importedURLs.has(l.url);
const actionHtml = imported
? `<span class="badge badge-imported">Imported</span>`
: `<button class="btn-import" onclick="importRedditLink('${escapeHtml(l.id)}')">Import</button>`;
return `
<li>
<span class="link-source-badge">${escapeHtml(l.source)}</span>
<div class="link-title">
<a href="${escapeHtml(l.url)}" target="_blank" rel="noopener">${escapeHtml(l.title || l.url)}</a>
</div>
${actionHtml}
<a href="${escapeHtml(l.url)}" target="_blank" rel="noopener" class="link-open-icon-wrap" title="Open in new tab">
<svg class="link-open-icon" width="16" height="16" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><path d="M18 13v6a2 2 0 0 1-2 2H5a2 2 0 0 1-2-2V8a2 2 0 0 1 2-2h6"/><polyline points="15 3 21 3 21 9"/><line x1="10" y1="14" x2="21" y2="3"/></svg>
</a>
</li>
`;
}).join('');
    } catch (e) {
        console.error('Failed to load Reddit links:', e);
        list.innerHTML = '';
        empty.style.display = '';
    }
}
async function importRedditLink(id) {
try {
const resp = await fetch(`/api/scraped/${id}/import`, { method: 'POST' });
if (!resp.ok) {
const err = await resp.json();
showToast(err.error || 'Failed to import', 'error');
return;
}
showToast('Stream imported', 'success');
loadRedditLinks();
loadPublicStreams();
} catch (e) {
showToast('Failed to import stream', 'error');
}
}
async function loadAdminStreams() {
const container = document.getElementById('admin-stream-list');
const statsContainer = document.getElementById('admin-stats');
try {
const resp = await fetch('/api/admin/streams');
const streams = await resp.json();
if (!streams || streams.length === 0) {
statsContainer.innerHTML = '';
container.innerHTML = '<div class="empty-state"><span class="empty-icon">&#128203;</span><div class="empty-title">No Streams</div><p class="empty-desc">No streams have been submitted yet.</p></div>';
return;
}
const total = streams.length;
const published = streams.filter(s => s.published).length;
const drafts = total - published;
statsContainer.innerHTML = `
<div class="stat-card">
<div class="stat-number">${total}</div>
<div class="stat-label">Total</div>
</div>
<div class="stat-card">
<div class="stat-number">${published}</div>
<div class="stat-label">Published</div>
</div>
<div class="stat-card">
<div class="stat-number">${drafts}</div>
<div class="stat-label">Drafts</div>
</div>
`;
container.innerHTML = streams.map(s => `
<div class="admin-stream">
<div class="info">
<span class="status-dot ${s.published ? 'published' : 'draft'}"></span>
<div class="stream-details">
<div class="stream-title">
${escapeHtml(s.title)}
<span class="badge ${s.published ? 'badge-published' : 'badge-draft'}">
${s.published ? 'Published' : 'Draft'}
</span>
</div>
<div class="stream-url">${escapeHtml(s.url)}</div>
${s.submitted_by ? `<div class="stream-submitter">by ${escapeHtml(s.submitted_by)}</div>` : ''}
</div>
</div>
<div class="actions">
<button onclick="togglePublish('${s.id}')" class="${s.published ? 'btn-secondary-sm' : 'btn-primary-sm'}">
${s.published ? 'Unpublish' : 'Publish'}
</button>
<button onclick="deleteStream('${s.id}', true)" class="btn-danger-sm">Delete</button>
</div>
</div>
`).join('');
} catch (e) {
console.error('Failed to load admin streams:', e);
}
}
function streamCard(stream, canDelete) {
const deleteBtn = canDelete
? `<button onclick="event.stopPropagation(); deleteStream('${stream.id}', false)" class="icon-btn danger" title="Delete stream">
<svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><polyline points="3 6 5 6 21 6"/><path d="M19 6v14a2 2 0 0 1-2 2H7a2 2 0 0 1-2-2V6m3 0V4a2 2 0 0 1 2-2h4a2 2 0 0 1 2 2v2"/></svg>
</button>`
: '';
return `
<div class="stream-card" data-stream-id="${stream.id}"
onclick="openBrowserSession('${stream.id}', '${escapeAttr(stream.title)}', '${escapeAttr(stream.url)}')">
<div class="card-body">
<div class="card-icon">
<svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><rect x="2" y="3" width="20" height="14" rx="2" ry="2"/><line x1="8" y1="21" x2="16" y2="21"/><line x1="12" y1="17" x2="12" y2="21"/></svg>
</div>
<div class="card-title">${escapeHtml(stream.title)}</div>
<div class="card-url">${escapeHtml(stream.url)}</div>
</div>
<div class="card-bar">
<div class="card-actions">
<a href="${escapeHtml(stream.url)}" target="_blank" rel="noopener" onclick="event.stopPropagation()" class="icon-btn" title="Open original">
<svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><path d="M18 13v6a2 2 0 0 1-2 2H5a2 2 0 0 1-2-2V8a2 2 0 0 1 2-2h6"/><polyline points="15 3 21 3 21 9"/><line x1="10" y1="14" x2="21" y2="3"/></svg>
</a>
${deleteBtn}
</div>
</div>
</div>
`;
}
async function _submitStreamCommon(urlId, titleId, successMsg, reloadFn) {
const urlInput = document.getElementById(urlId);
const titleInput = document.getElementById(titleId);
const url = urlInput.value.trim();
const title = titleInput.value.trim();
if (!url) {
showToast('URL is required', 'warning');
return;
}
try {
new URL(url);
} catch {
showToast('Please enter a valid URL', 'warning');
return;
}
try {
const resp = await fetch('/api/streams', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ url, title })
});
if (!resp.ok) {
const err = await resp.json();
showToast(err.error || 'Failed to add stream', 'error');
return;
}
urlInput.value = '';
titleInput.value = '';
showToast(successMsg, 'success');
reloadFn();
} catch (e) {
showToast('Failed to add stream', 'error');
}
}
async function addPublicStream() {
await _submitStreamCommon('public-submit-url', 'public-submit-title', 'Stream added', loadPublicStreams);
}
async function submitStream() {
await _submitStreamCommon('submit-url', 'submit-title', 'Stream submitted for review', loadMyStreams);
}
async function deleteStream(id, isAdmin) {
const confirmed = await showConfirm('Delete this stream?');
if (!confirmed) return;
try {
const resp = await fetch(`/api/streams/${id}`, { method: 'DELETE' });
if (!resp.ok) {
const err = await resp.json();
showToast(err.error || 'Failed to delete', 'error');
return;
}
showToast('Stream deleted', 'success');
if (isAdmin) {
loadAdminStreams();
} else {
loadMyStreams();
}
loadPublicStreams();
} catch (e) {
showToast('Failed to delete stream', 'error');
}
}
async function togglePublish(id) {
try {
const resp = await fetch(`/api/streams/${id}/publish`, { method: 'PUT' });
if (!resp.ok) {
showToast('Failed to toggle publish', 'error');
return;
}
showToast('Stream updated', 'success');
loadAdminStreams();
loadPublicStreams();
} catch (e) {
showToast('Failed to toggle publish', 'error');
}
}
async function refreshRedditLinks() {
try {
const resp = await fetch('/api/scraped/refresh', { method: 'POST' });
if (!resp.ok) {
showToast('Failed to trigger refresh', 'error');
return;
}
showToast('Refreshing links from Reddit...', 'info');
let attempts = 0;
const maxAttempts = 15;
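        // No push channel here: poll the list every 2s for up to 30s
        // (15 attempts) while the server-side refresh completes.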
const poll = setInterval(async () => {
attempts++;
await loadRedditLinks();
if (attempts >= maxAttempts) {
clearInterval(poll);
}
}, 2000);
} catch (e) {
showToast('Failed to trigger refresh', 'error');
}
}
async function triggerScrape() {
try {
await fetch('/api/admin/scrape', { method: 'POST' });
showToast('Scrape triggered', 'success');
} catch (e) {
showToast('Failed to trigger scrape', 'error');
}
}
function closeRedditViewer() {
const viewer = document.getElementById('reddit-viewer');
if (!viewer) return;
viewer.classList.add('hidden');
const contentEl = viewer.querySelector('.reddit-viewer-content');
contentEl.querySelectorAll(':scope > :not(#reddit-viewer-loader)').forEach(el => el.remove());
}
// --- Browser Session Viewer (Iframe Proxy + Native Player) ---
async function openBrowserSession(streamId, streamTitle, streamURL) {
const viewer = document.getElementById('browser-viewer');
const statusEl = viewer.querySelector('.browser-viewer-status');
const contentEl = viewer.querySelector('.browser-viewer-content');
const loader = document.getElementById('browser-viewer-loader');
const urlText = document.getElementById('browser-url');
const openOriginal = document.getElementById('browser-open-original');
statusEl.textContent = 'Loading...';
statusEl.classList.remove('connected');
loader.classList.remove('hidden');
if (urlText) urlText.textContent = streamURL;
if (openOriginal) openOriginal.href = streamURL;
// Hide all tab content sections and show the viewer
document.querySelectorAll('.tab-content').forEach(s => s.classList.remove('active'));
viewer.classList.remove('hidden');
viewer.classList.add('active');
// Remove any existing iframe or player
contentEl.querySelectorAll('.browser-iframe').forEach(el => el.remove());
contentEl.querySelectorAll('#clappr-player').forEach(el => el.remove());
destroyNativePlayer();
// Fetch player config to determine stream type
const config = await getPlayerConfig(streamId);
if (config.type === 'hls' || config.type === 'daddylive') {
// Native player mode
const playerDiv = document.createElement('div');
playerDiv.id = 'clappr-player';
contentEl.appendChild(playerDiv);
loader.classList.add('hidden');
statusEl.textContent = 'Playing';
statusEl.classList.add('connected');
if (config.type === 'daddylive') {
createDaddyLivePlayer('#clappr-player', config);
} else {
createHLSPlayer('#clappr-player', config.hls_url);
}
return;
}
// Fallback: iframe proxy mode
let parsed;
try {
parsed = new URL(streamURL);
} catch (e) {
statusEl.textContent = 'Invalid URL';
loader.classList.add('hidden');
showToast('Invalid stream URL', 'error');
return;
}
const origin = parsed.origin;
const pathAndSearch = parsed.pathname + parsed.search + parsed.hash;
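    // Base64url-encode the origin (RFC 4648 §5: '+' -> '-', '/' -> '_',
    // padding stripped) so it fits in a single path segment; the proxy
    // decodes it to learn which upstream host to fetch.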
const b64Origin = btoa(origin).replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');
const proxyURL = '/proxy/' + b64Origin + pathAndSearch;
const iframe = document.createElement('iframe');
iframe.src = proxyURL;
iframe.className = 'browser-iframe';
iframe.setAttribute('sandbox', 'allow-scripts allow-same-origin allow-forms allow-popups allow-popups-to-escape-sandbox allow-presentation');
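    // Note: combining allow-scripts with allow-same-origin largely neutralizes
    // the sandbox for the embedded page; tolerable here because the content is
    // already funnelled through the server-side proxy.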
iframe.setAttribute('allow', 'autoplay; encrypted-media; fullscreen');
iframe.setAttribute('allowfullscreen', '');
iframe.onload = function() {
loader.classList.add('hidden');
statusEl.textContent = 'Connected';
statusEl.classList.add('connected');
};
contentEl.appendChild(iframe);
}
function closeBrowserSession() {
destroyNativePlayer();
const viewer = document.getElementById('browser-viewer');
viewer.classList.add('hidden');
viewer.classList.remove('active');
const contentEl = viewer.querySelector('.browser-viewer-content');
contentEl.querySelectorAll('.browser-iframe').forEach(el => el.remove());
contentEl.querySelectorAll('#clappr-player').forEach(el => el.remove());
const statusEl = viewer.querySelector('.browser-viewer-status');
statusEl.textContent = '';
statusEl.classList.remove('connected');
const urlText = document.getElementById('browser-url');
if (urlText) urlText.textContent = '';
// Restore the previously active tab
const activeTab = document.querySelector('.tab-btn.active');
if (activeTab) {
const tabName = activeTab.dataset.tab;
const content = document.getElementById('content-' + tabName);
if (content) content.classList.add('active');
}
}


@ -1,9 +0,0 @@
function escapeHtml(str) {
const div = document.createElement('div');
div.textContent = str;
return div.innerHTML;
}
function escapeAttr(str) {
return str.replace(/&/g, '&amp;').replace(/'/g, '&#39;').replace(/"/g, '&quot;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}
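// Note: the '&' replacement in escapeAttr must run first; running it after the
// others would re-escape the ampersands those entity substitutions introduce.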


@ -1,6 +1,4 @@
variable "tls_secret_name" { type = string }
variable "coturn_turn_secret" { type = string }
variable "public_ip" { type = string }
variable "nfs_server" { type = string }
@ -38,48 +36,20 @@ resource "kubernetes_deployment" "f1-stream" {
}
spec {
container {
image = "viktorbarzin/f1-stream:v1.3.1"
image = "viktorbarzin/f1-stream:v2.0.1"
name = "f1-stream"
resources {
limits = {
cpu = "1"
memory = "512Mi"
cpu = "500m"
memory = "256Mi"
}
requests = {
cpu = "50m"
memory = "128Mi"
memory = "64Mi"
}
}
port {
container_port = 8080
}
env {
name = "WEBAUTHN_RPID"
value = "f1.viktorbarzin.me"
}
env {
name = "WEBAUTHN_ORIGIN"
value = "https://f1.viktorbarzin.me"
}
env {
name = "WEBAUTHN_DISPLAY_NAME"
value = "F1 Stream"
}
env {
name = "HEADLESS_EXTRACT_ENABLED"
value = "true"
}
env {
name = "TURN_URL"
value = "turn:${var.public_ip}:3478"
}
env {
name = "TURN_SHARED_SECRET"
value = var.coturn_turn_secret
}
env {
name = "TURN_INTERNAL_URL"
value = "turn:coturn.coturn.svc.cluster.local:3478"
container_port = 8000
}
volume_mount {
name = "data"
@ -114,7 +84,7 @@ resource "kubernetes_service" "f1-stream" {
}
port {
port = "80"
target_port = "8080"
target_port = "8000"
}
}
}

stacks/f1-stream/tiers.tf Normal file

@ -0,0 +1,10 @@
# Generated by Terragrunt. Sig: nIlQXj57tbuaRZEa
locals {
tiers = {
core = "0-core"
cluster = "1-cluster"
gpu = "2-gpu"
edge = "3-edge"
aux = "4-aux"
}
}
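# Hypothetical usage sketch (the label key is an assumption, not defined in
# this stack): downstream modules can key scheduling or naming off these tiers,
# e.g. node_selector = { "tier" = local.tiers["edge"] }  # -> "3-edge"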