[ci skip] Flatten module wrappers into stack roots

Remove the module "xxx" { source = "./module" } indirection layer from all 66 service stacks. Resources are now defined directly in each stack's main.tf instead of through a wrapper module.

- Merge module/main.tf contents into stack main.tf
- Apply variable replacements (var.tier -> local.tiers.X, renamed vars)
- Fix shared module paths (one fewer ../ at each level)
- Move extra files/dirs (factory/, chart_values, subdirs) to stack root
- Update state files to strip module.<name>. prefix
- Update CLAUDE.md to reflect flat structure

Verified: terragrunt plan shows 0 add, 0 destroy across all stacks.
2026-02-22 15:13:55 +00:00 · 2026-02-22 15:13:55 +00:00 · c7c7047f1c
commit c7c7047f1c
parent b0499a7f31
245 changed files with 11733 additions and 12432 deletions
--- a/stacks/f1-stream/files/.planning/phases/01-scraper-validation/01-01-PLAN.md
+++ b/stacks/f1-stream/files/.planning/phases/01-scraper-validation/01-01-PLAN.md
@@ -0,0 +1,237 @@
+---
+phase: 01-scraper-validation
+plan: 01
+type: execute
+wave: 1
+depends_on: []
+files_modified:
+  - internal/scraper/validate.go
+  - internal/scraper/validate_test.go
+  - internal/scraper/scraper.go
+  - main.go
+autonomous: true
+requirements:
+  - SCRP-01
+  - SCRP-02
+  - SCRP-03
+  - SCRP-04
+
+must_haves:
+  truths:
+    - "Scraper still discovers F1-related posts using keyword filtering (existing isF1Post behavior unchanged)"
+    - "Each newly scraped URL is fetched and inspected for video/player content markers before being saved to scraped_links.json"
+    - "URLs without video content markers are discarded and do not appear in scraped_links.json"
+    - "Validation uses a configurable timeout (SCRAPER_VALIDATE_TIMEOUT env var, default 10s) that prevents slow sites from blocking the scrape cycle"
+  artifacts:
+    - path: "internal/scraper/validate.go"
+      provides: "URL validation logic with video marker detection"
+      contains: "func validateLinks"
+    - path: "internal/scraper/validate_test.go"
+      provides: "Unit tests for marker detection and content type checks"
+      contains: "func TestContainsVideoMarkers"
+    - path: "internal/scraper/scraper.go"
+      provides: "Updated Scraper struct with validateTimeout field and validation call in scrape()"
+      contains: "validateTimeout"
+    - path: "main.go"
+      provides: "SCRAPER_VALIDATE_TIMEOUT env var configuration"
+      contains: "SCRAPER_VALIDATE_TIMEOUT"
+  key_links:
+    - from: "internal/scraper/scraper.go"
+      to: "internal/scraper/validate.go"
+      via: "validateLinks call in scrape() between URL extraction and merge"
+      pattern: "validateLinks\\(links"
+    - from: "main.go"
+      to: "internal/scraper/scraper.go"
+      via: "scraper.New() call with validateTimeout parameter"
+      pattern: "scraper\\.New\\(st.*validateTimeout"
+---
+
+<objective>
+Add URL validation to the scraper pipeline so that each extracted URL is proxy-fetched and inspected for video/player content markers before being saved. URLs without video markers are discarded at the source.
+
+Purpose: Eliminate junk links (blog posts, news articles, social media) from scraped results so users only see actual stream pages.
+Output: Working validation step integrated into scraper pipeline, with unit tests and configurable timeout.
+</objective>
+
+<execution_context>
+@/Users/viktorbarzin/.claude/get-shit-done/workflows/execute-plan.md
+@/Users/viktorbarzin/.claude/get-shit-done/templates/summary.md
+</execution_context>
+
+<context>
+@.planning/PROJECT.md
+@.planning/ROADMAP.md
+@.planning/STATE.md
+@.planning/phases/01-scraper-validation/01-RESEARCH.md
+
+@internal/scraper/scraper.go
+@internal/scraper/reddit.go
+@internal/models/models.go
+@main.go
+</context>
+
+<tasks>
+
+<task type="auto">
+  <name>Task 1: Create validate.go with video marker detection</name>
+  <files>internal/scraper/validate.go</files>
+  <action>
+Create `internal/scraper/validate.go` in package `scraper` with the following:
+
+1. Define `videoMarkers` string slice (case-insensitive markers checked against lowercased HTML body):
+   - HTML5: `<video`
+   - HLS: `.m3u8`, `application/x-mpegurl`, `application/vnd.apple.mpegurl`
+   - DASH: `.mpd`, `application/dash+xml`
+   - Player libraries: `hls.js`, `hls.min.js`, `dash.js`, `dash.all.min.js`, `video.js`, `video.min.js`, `videojs`, `jwplayer`, `clappr`, `flowplayer`, `plyr`, `shaka-player`, `mediaelement`, `fluidplayer`
+
+2. Define `videoContentTypes` string slice for direct video Content-Type detection:
+   - `video/`, `application/x-mpegurl`, `application/vnd.apple.mpegurl`, `application/dash+xml`
+
+3. Define `validateBodyLimit = 2 * 1024 * 1024` (2MB).
+
+4. Implement `validateLinks(links []models.ScrapedLink, timeout time.Duration) []models.ScrapedLink`:
+   - Create `http.Client` with the given timeout and a `CheckRedirect` function that stops after 3 redirects (return `http.ErrUseLastResponse` when `len(via) >= 3`).
+   - Iterate over links. For each, call `hasVideoContent(client, link.URL)`. If true, keep the link. If false, log: `scraper: discarded %s (no video markers)` using `truncate(link.URL, 60)`.
+   - Return the filtered slice.
+
+5. Implement `hasVideoContent(client *http.Client, rawURL string) bool`:
+   - Create GET request with `User-Agent` set to the existing `userAgent` constant from `reddit.go`.
+   - Execute request. On error, log and return false.
+   - Check status code: if < 200 or >= 400, return false.
+   - Check Content-Type header (lowercased) against `videoContentTypes` -- if match, return true (it's a direct video file).
+   - If the Content-Type contains neither `text/html` nor `application/xhtml`, return false (neither a direct video file nor an HTML page to inspect).
+   - Read body with `io.LimitReader` (2MB limit).
+   - Return `containsVideoMarkers(strings.ToLower(string(body)))`.
+
+6. Implement `containsVideoMarkers(loweredBody string) bool`:
+   - Iterate over `videoMarkers`, return true on first `strings.Contains` match.
+
+7. Implement `isDirectVideoContentType(ct string) bool`:
+   - Lowercase ct, iterate `videoContentTypes`, return true on first `strings.Contains` match.
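
The status and Content-Type checks from step 5 can be isolated in a pure function, which makes the decision order testable without a network. This is a minimal sketch, not the planned implementation: `classifyResponse` and the `verdict` type are hypothetical names introduced here, and the plan itself keeps this logic inline in `hasVideoContent`. It uses `package main` only so that it stands alone.

```go
package main

import "strings"

// verdict is a hypothetical result type used only in this sketch.
type verdict int

const (
	reject      verdict = iota // error status or unsupported content type
	directVideo                // the Content-Type itself identifies video
	inspectHTML                // HTML page: the body must be scanned for markers
)

// videoContentTypes mirrors the list from step 2.
var videoContentTypes = []string{
	"video/",
	"application/x-mpegurl",
	"application/vnd.apple.mpegurl",
	"application/dash+xml",
}

// classifyResponse applies the step 5 checks in order: status code,
// direct video Content-Type, then HTML Content-Type.
func classifyResponse(status int, contentType string) verdict {
	if status < 200 || status >= 400 {
		return reject
	}
	ct := strings.ToLower(contentType)
	for _, vct := range videoContentTypes {
		if strings.Contains(ct, vct) {
			return directVideo
		}
	}
	if strings.Contains(ct, "text/html") || strings.Contains(ct, "application/xhtml") {
		return inspectHTML
	}
	return reject
}
```

Only the `inspectHTML` outcome requires reading the body, so the 2MB read is skipped for direct video files and for unsupported content types.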
+
+Imports needed: `io`, `log`, `net/http`, `strings`, `time`, and `f1-stream/internal/models`.
+
+Do NOT use `golang.org/x/net/html` (reserved for Phase 4). This is detection, not extraction -- string matching is sufficient.
+  </action>
+  <verify>
+Run `cd /Users/viktorbarzin/code/infra/modules/kubernetes/f1-stream/files && go build ./...` -- must compile without errors.
+  </verify>
+  <done>
+`validate.go` exists with `validateLinks`, `hasVideoContent`, `containsVideoMarkers`, `isDirectVideoContentType` functions. The file compiles as part of the `scraper` package.
+  </done>
+</task>
+
+<task type="auto">
+  <name>Task 2: Wire validation into scraper pipeline and add config</name>
+  <files>internal/scraper/scraper.go, main.go</files>
+  <action>
+**In `internal/scraper/scraper.go`:**
+
+1. Add `validateTimeout time.Duration` field to the `Scraper` struct.
+
+2. Update `New()` signature to accept `validateTimeout` parameter:
+   ```go
+   func New(s *store.Store, interval time.Duration, validateTimeout time.Duration) *Scraper {
+       return &Scraper{store: s, interval: interval, validateTimeout: validateTimeout}
+   }
+   ```
+
+3. In `scrape()` method, add validation step between the `scrapeReddit()` return and the merge-with-existing logic. Insert AFTER the line `log.Printf("scraper: reddit scrape completed in %v, got %d links", ...)` and BEFORE the line `existing, err := s.store.LoadScrapedLinks()`:
+
+   ```go
+   // Validate links - only keep those with video content markers
+   if len(links) > 0 {
+       validated := validateLinks(links, s.validateTimeout)
+       log.Printf("scraper: validated %d/%d links as streams", len(validated), len(links))
+       links = validated
+   }
+   ```
+
+   This preserves SCRP-01: existing keyword filtering in `scrapeReddit()` via `isF1Post()` runs first, then validation filters the results.
+
+**In `main.go`:**
+
+1. Add `validateTimeout` env var read after the existing `scrapeInterval` line:
+   ```go
+   validateTimeout := envDuration("SCRAPER_VALIDATE_TIMEOUT", 10*time.Second)
+   ```
+
+2. Update the `scraper.New()` call to pass the new parameter:
+   ```go
+   sc := scraper.New(st, scrapeInterval, validateTimeout)
+   ```
+
+Both changes are minimal and follow the existing configuration pattern used for `SCRAPE_INTERVAL`, `PROXY_TIMEOUT`, etc.
+  </action>
+  <verify>
+Run `cd /Users/viktorbarzin/code/infra/modules/kubernetes/f1-stream/files && go build ./...` -- must compile without errors. Then run `go vet ./...` -- no issues.
+  </verify>
+  <done>
+`Scraper` struct has `validateTimeout` field. `New()` accepts 3 parameters. `scrape()` calls `validateLinks` between extraction and merge. `main.go` reads `SCRAPER_VALIDATE_TIMEOUT` env var (default 10s) and passes it to `scraper.New()`.
+  </done>
+</task>
+
+<task type="auto">
+  <name>Task 3: Add unit tests for validation functions</name>
+  <files>internal/scraper/validate_test.go</files>
+  <action>
+Create `internal/scraper/validate_test.go` in package `scraper` with the following test functions:
+
+**`TestContainsVideoMarkers`** - table-driven test covering:
+- Positive cases (should return true):
+  - `<video>` tag: `<div><video src="stream.mp4"></video></div>`
+  - HLS manifest ref: `var url = "https://cdn.example.com/live.m3u8";`
+  - DASH manifest ref: `<source src="stream.mpd" type="application/dash+xml">`
+  - HLS.js library: `<script src="/js/hls.min.js"></script>`
+  - Video.js library: `<script src="https://cdn.example.com/video.js"></script>`
+  - JW Player: `<div id="jwplayer-container"></div><script>jwplayer("jwplayer-container")</script>`
+  - Clappr: `<script src="clappr.min.js"></script>`
+  - Flowplayer: `<script>flowplayer("#player")</script>`
+
+- Negative cases (should return false):
+  - Plain HTML: `<html><body><p>Hello world</p></body></html>`
+  - Reddit link page: `<html><body><a href="https://example.com">Click here</a></body></html>`
+  - Blog post: `<html><body><article>F1 race results and analysis...</article></body></html>`
+  - Empty string: ``
+
+Each test calls `containsVideoMarkers(body)` and asserts the result. The function expects input that is already lowercased (the caller lowercases the body in real usage), so write the test bodies in lowercase.
+
+**`TestIsDirectVideoContentType`** - table-driven test covering:
+- Positive: `video/mp4`, `video/webm`, `application/x-mpegurl`, `application/vnd.apple.mpegurl`, `application/dash+xml`, `video/mp4; charset=utf-8` (with params)
+- Negative: `text/html`, `application/json`, `image/png`, `text/plain`, empty string
+
+Each test calls `isDirectVideoContentType(ct)` and asserts the result.
+
+Use standard `testing` package only. Follow existing test conventions in the codebase (table-driven tests with `t.Run`).
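
The table-driven shape described above can be sketched as follows. To stay self-contained, the block includes a stand-in `containsVideoMarkers` trimmed to three markers and uses `package main`; the planned test would live in package `scraper` and exercise the full marker list. The case names are illustrative.

```go
package main

import (
	"strings"
	"testing"
)

// Stand-in for the function under test, trimmed to three markers so the
// sketch compiles on its own.
func containsVideoMarkers(loweredBody string) bool {
	for _, m := range []string{"<video", ".m3u8", "jwplayer"} {
		if strings.Contains(loweredBody, m) {
			return true
		}
	}
	return false
}

// markerCases is a shortened table with two positive and two negative cases.
var markerCases = []struct {
	name string
	body string
	want bool
}{
	{"video tag", `<div><video src="stream.mp4"></video></div>`, true},
	{"hls manifest", `var url = "https://cdn.example.com/live.m3u8";`, true},
	{"plain html", `<html><body><p>hello world</p></body></html>`, false},
	{"empty string", ``, false},
}

// TestContainsVideoMarkers runs each table row as a named subtest.
func TestContainsVideoMarkers(t *testing.T) {
	for _, tc := range markerCases {
		t.Run(tc.name, func(t *testing.T) {
			if got := containsVideoMarkers(tc.body); got != tc.want {
				t.Errorf("containsVideoMarkers(%q) = %v, want %v", tc.body, got, tc.want)
			}
		})
	}
}
```

Naming each row lets a single case be selected with `-run`, for example `-run "TestContainsVideoMarkers/hls_manifest"`.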
+  </action>
+  <verify>
+Run `cd /Users/viktorbarzin/code/infra/modules/kubernetes/f1-stream/files && go test ./internal/scraper/ -v -run "TestContainsVideoMarkers|TestIsDirectVideoContentType"` -- all tests pass.
+  </verify>
+  <done>
+`validate_test.go` exists with `TestContainsVideoMarkers` (8+ positive, 4+ negative cases) and `TestIsDirectVideoContentType` (6+ positive, 5+ negative cases). All tests pass.
+  </done>
+</task>
+
+</tasks>
+
+<verification>
+1. `go build ./...` compiles without errors
+2. `go vet ./...` reports no issues
+3. `go test ./internal/scraper/ -v` -- all tests pass
+4. Verify `validate.go` contains the `videoMarkers` slice with at least 15 markers
+5. Verify `scraper.go:scrape()` calls `validateLinks` between `scrapeReddit()` return and `LoadScrapedLinks()`
+6. Verify `main.go` reads `SCRAPER_VALIDATE_TIMEOUT` with default `10*time.Second`
+7. Verify the existing `isF1Post` keyword filtering in `scrapeReddit()` is untouched (SCRP-01)
+</verification>
+
+<success_criteria>
+- The scraper pipeline compiles and all tests pass
+- `validateLinks` is called in `scrape()` after URL extraction but before merge, filtering out URLs without video markers
+- The validation timeout is configurable via `SCRAPER_VALIDATE_TIMEOUT` env var (default 10s)
+- Existing F1 keyword filtering behavior is preserved unchanged
+- No new external dependencies are introduced (stdlib only)
+</success_criteria>
+
+<output>
+After completion, create `.planning/phases/01-scraper-validation/01-01-SUMMARY.md`
+</output>
--- a/stacks/f1-stream/files/.planning/phases/01-scraper-validation/01-01-SUMMARY.md
+++ b/stacks/f1-stream/files/.planning/phases/01-scraper-validation/01-01-SUMMARY.md
@@ -0,0 +1,107 @@
+---
+phase: 01-scraper-validation
+plan: 01
+subsystem: scraper
+tags: [go, http, video-detection, content-validation, streaming]
+
+# Dependency graph
+requires: []
+provides:
+  - "URL validation pipeline with video marker detection (validateLinks)"
+  - "Configurable validation timeout via SCRAPER_VALIDATE_TIMEOUT env var"
+  - "Video content type and HTML marker detection functions"
+affects: [02-health-checks, 04-link-extraction]
+
+# Tech tracking
+tech-stack:
+  added: []
+  patterns:
+    - "Pipeline filter pattern: scrapeReddit -> validateLinks -> merge"
+    - "String-match video detection (no DOM parsing) for Phase 1 speed"
+    - "2MB body limit for HTML inspection to prevent memory issues"
+
+key-files:
+  created:
+    - internal/scraper/validate.go
+    - internal/scraper/validate_test.go
+  modified:
+    - internal/scraper/scraper.go
+    - main.go
+
+key-decisions:
+  - "String matching over DOM parsing for video detection (DOM reserved for Phase 4)"
+  - "2MB body limit to prevent memory issues on large pages"
+  - "3 redirect limit to avoid infinite redirect chains"
+
+patterns-established:
+  - "Pipeline filter: validate scraped links before merge into store"
+  - "Env var config pattern: envDuration for timeout configuration"
+
+requirements-completed: [SCRP-01, SCRP-02, SCRP-03, SCRP-04]
+
+# Metrics
+duration: 3min
+completed: 2026-02-17
+---
+
+# Phase 1 Plan 1: Scraper Validation Summary
+
+**URL validation pipeline with 18 video/player markers filtering scraped links before store merge, configurable via SCRAPER_VALIDATE_TIMEOUT**
+
+## Performance
+
+- **Duration:** 3 min
+- **Started:** 2026-02-17T20:49:16Z
+- **Completed:** 2026-02-17T20:51:54Z
+- **Tasks:** 3
+- **Files modified:** 4
+
+## Accomplishments
+- Created validate.go with 18 video/player markers covering HTML5, HLS, DASH, and 10+ player libraries
+- Wired validateLinks into scrape() pipeline between URL extraction and store merge
+- Added SCRAPER_VALIDATE_TIMEOUT env var (default 10s) following existing config patterns
+- Added 25 unit test cases (10 positive + 4 negative marker cases, 6 positive + 5 negative content type cases)
+
+## Task Commits
+
+Each task was committed atomically:
+
+1. **Task 1: Create validate.go with video marker detection** - `adeb478` (feat)
+2. **Task 2: Wire validation into scraper pipeline and add config** - `22d29db` (feat)
+3. **Task 3: Add unit tests for validation functions** - `6c5cc02` (test)
+
+## Files Created/Modified
+- `internal/scraper/validate.go` - URL validation with video marker detection (validateLinks, hasVideoContent, containsVideoMarkers, isDirectVideoContentType)
+- `internal/scraper/validate_test.go` - Table-driven unit tests for marker detection and content type checks (25 cases)
+- `internal/scraper/scraper.go` - Added validateTimeout field and validateLinks call in scrape()
+- `main.go` - Added SCRAPER_VALIDATE_TIMEOUT env var read (default 10s)
+
+## Decisions Made
+- Used string matching (not DOM parsing) for video detection -- DOM parsing reserved for Phase 4 link extraction
+- Set 2MB body read limit to prevent memory issues on large streaming pages
+- Limited redirects to 3 to avoid infinite redirect chains on sketchy stream sites
+- Validation runs sequentially (not concurrent) to avoid overwhelming target sites
+
+## Deviations from Plan
+
+None - plan executed exactly as written.
+
+## Issues Encountered
+None.
+
+## User Setup Required
+
+None - no external service configuration required.
+
+## Next Phase Readiness
+- Validation pipeline is integrated and tested, ready for health check layer (Phase 2)
+- The validateLinks function provides the filtering foundation that health checks will build upon
+- No blockers or concerns
+
+## Self-Check: PASSED
+
+All 5 files verified present. All 3 task commits verified in git log.
+
+---
+*Phase: 01-scraper-validation*
+*Completed: 2026-02-17*
--- a/stacks/f1-stream/files/.planning/phases/01-scraper-validation/01-RESEARCH.md
+++ b/stacks/f1-stream/files/.planning/phases/01-scraper-validation/01-RESEARCH.md
@@ -0,0 +1,599 @@
+# Phase 1: Scraper Validation - Research
+
+**Researched:** 2026-02-17
+**Domain:** HTTP content fetching and HTML video/player content detection in Go
+**Confidence:** HIGH
+
+## Summary
+
+Phase 1 adds a validation step to the existing Reddit scraper pipeline. Currently, the scraper extracts ALL URLs from F1-related Reddit posts and saves them to `scraped_links.json` without verifying whether they point to actual stream pages. The validation step will proxy-fetch each extracted URL (reusing the existing proxy's HTTP client pattern) and inspect the HTML response for video/player content markers before saving.
+
+The implementation is straightforward because the codebase already has all the infrastructure needed: HTTP fetching with timeouts (used in both `internal/scraper/reddit.go` and `internal/proxy/proxy.go`), URL validation, and the scraper pipeline with deduplication. The new code is a validation function inserted between URL extraction and saving, operating on the same `[]models.ScrapedLink` type.
+
+**Primary recommendation:** Add a `validateStreamURL` function in a new file `internal/scraper/validate.go` that detects video markers with string-based content matching rather than full HTML parsing; `golang.org/x/net/html` stays reserved for Phase 4 (video extraction). Keep it simple: fetch the page, lowercase the body, check for known patterns. This keeps Phase 1 free of new dependencies, and Phase 2 can reuse the same validation logic for health checks.
+
+<phase_requirements>
+## Phase Requirements
+
+| ID | Description | Research Support |
+|----|-------------|-----------------|
+| SCRP-01 | Scraper filters Reddit posts by F1 keywords before extracting URLs (existing behavior, preserve) | Existing `isF1Post()` function in `reddit.go` lines 272-285 handles this. No changes needed -- just ensure the validation step is added AFTER URL extraction, not replacing the keyword filter. |
+| SCRP-02 | Scraper validates each extracted URL by proxy-fetching it and checking for video/player content markers | New `validateStreamURL()` function fetches URL with configurable timeout, reads response body, checks for video content markers (see "Video Content Markers" section below for complete list). Reuse existing HTTP client pattern from `reddit.go:88`. |
+| SCRP-03 | URLs that don't look like streams (no video markers detected) are discarded before saving | Filter applied in `scraper.go:scrape()` between URL extraction (line 57) and merge/save (line 60). Only URLs passing validation are included in the `links` slice passed to the merge step. |
+| SCRP-04 | Validation has a configurable timeout (default 10s) to avoid blocking on slow sites | Add `SCRAPER_VALIDATE_TIMEOUT` environment variable read in `main.go`, passed to `scraper.New()`. Use `context.WithTimeout` on per-URL fetch to enforce deadline. Default 10 seconds. |
+</phase_requirements>
+
+## Standard Stack
+
+### Core
+
+| Library | Version | Purpose | Why Standard |
+|---------|---------|---------|--------------|
+| `net/http` | stdlib | HTTP client for fetching URLs | Already used throughout codebase (`reddit.go`, `proxy.go`). No external dependency needed. |
+| `strings` | stdlib | Case-insensitive string matching for content markers | Already used extensively. `strings.Contains` on lowercased body is the simplest approach for marker detection. |
+| `regexp` | stdlib | Pattern matching for HLS/DASH URLs in page source | Already used in `reddit.go` for URL extraction. Needed for matching `.m3u8` and `.mpd` URL patterns in HTML content. |
+| `context` | stdlib | Timeout enforcement per URL validation | Already used in scraper (`scraper.go:Run`). `context.WithTimeout` provides per-request deadline. |
+| `io` | stdlib | `io.LimitReader` for response body size limiting | Already used in `proxy.go` and `reddit.go` for body size limits. |
+
+### Supporting
+
+| Library | Version | Purpose | When to Use |
+|---------|---------|---------|-------------|
+| `golang.org/x/net/html` | latest | Full HTML DOM parsing | NOT needed for Phase 1. Reserve for Phase 4 (video source extraction). String matching is sufficient for detection. |
+| `sync` | stdlib | WaitGroup for parallel validation | If parallel validation is desired. But sequential is simpler and respects rate limits of target sites. |
+
+### Alternatives Considered
+
+| Instead of | Could Use | Tradeoff |
+|------------|-----------|----------|
+| String matching on body | `golang.org/x/net/html` DOM parsing | DOM parsing is more accurate but adds a dependency and complexity. For Phase 1 (detection, not extraction), string matching is sufficient. Phase 4 needs DOM parsing for actual source extraction. |
+| Sequential URL validation | `sync.WaitGroup` parallel validation | Parallel is faster but risks triggering rate limits on target sites and complicates error handling. Sequential with timeout is simpler and predictable. |
+| Custom HTTP client | Reuse proxy's `*http.Client` | The proxy client has redirect limits and timeout already configured. But the scraper should have its own client with validation-specific timeout. Keep them independent. |
+
+**Installation:**
+```bash
+# No new dependencies needed for Phase 1. All stdlib.
+# golang.org/x/net/html deferred to Phase 4.
+```
+
+## Architecture Patterns
+
+### Where Validation Fits in the Pipeline
+
+```
+Current flow:
+  scrapeReddit() -> []models.ScrapedLink -> merge with existing -> save
+
+New flow:
+  scrapeReddit() -> []models.ScrapedLink -> validateLinks() -> []models.ScrapedLink -> merge with existing -> save
+```
+
+The validation step is a filter function that takes a slice of scraped links and returns only those that pass validation. This keeps the existing pipeline intact and makes the validation step independently testable.
+
+### Recommended File Structure
+
+```
+internal/scraper/
+  scraper.go       # Orchestrator (existing, add validateTimeout field + call validateLinks)
+  reddit.go        # Reddit API scraping (existing, no changes)
+  validate.go      # NEW: validateStreamURL(), validateLinks(), content marker definitions
+```
+
+### Pattern 1: Validation as a Filter Function
+
+**What:** A pure filter function that takes `[]models.ScrapedLink` and returns the subset that pass validation.
+**When to use:** When adding a validation/filter step to an existing pipeline.
+**Example:**
+
+```go
+// internal/scraper/validate.go
+
+// validateLinks filters links to only those with video content markers.
+// Each URL is fetched with the given timeout and inspected for markers.
+func validateLinks(links []models.ScrapedLink, timeout time.Duration) []models.ScrapedLink {
+    client := &http.Client{Timeout: timeout}
+    var valid []models.ScrapedLink
+    for _, link := range links {
+        if hasVideoContent(client, link.URL) {
+            valid = append(valid, link)
+        } else {
+            log.Printf("scraper: discarded %s (no video markers)", truncate(link.URL, 60))
+        }
+    }
+    return valid
+}
+
+// hasVideoContent fetches a URL and checks for video/player content markers.
+func hasVideoContent(client *http.Client, rawURL string) bool {
+    req, err := http.NewRequest("GET", rawURL, nil)
+    if err != nil {
+        return false
+    }
+    req.Header.Set("User-Agent", userAgent) // reuse existing constant
+
+    resp, err := client.Do(req)
+    if err != nil {
+        log.Printf("scraper: validate fetch error for %s: %v", truncate(rawURL, 60), err)
+        return false
+    }
+    defer resp.Body.Close()
+
+    // Skip error statuses, then route on the lowercased Content-Type
+    if resp.StatusCode < 200 || resp.StatusCode >= 400 {
+        return false
+    }
+    ct := strings.ToLower(resp.Header.Get("Content-Type"))
+    if !strings.Contains(ct, "text/html") && !strings.Contains(ct, "application/xhtml") {
+        // Not HTML: valid only if it is a direct video file (.m3u8, .mpd, .mp4)
+        return isDirectVideoContentType(ct)
+    }
+
+    body, err := io.ReadAll(io.LimitReader(resp.Body, 2*1024*1024)) // 2MB limit for validation
+    if err != nil {
+        return false
+    }
+
+    return containsVideoMarkers(strings.ToLower(string(body)))
+}
+```
+
+### Pattern 2: Configuration Via Struct Field
+
+**What:** Pass validation timeout through the existing `Scraper` struct, configured from `main.go` env vars.
+**When to use:** Following existing codebase pattern where all config flows through `main.go` -> constructor.
+**Example:**
+
+```go
+// internal/scraper/scraper.go
+type Scraper struct {
+    store           *store.Store
+    interval        time.Duration
+    validateTimeout time.Duration // NEW
+    mu              sync.Mutex
+}
+
+func New(s *store.Store, interval time.Duration, validateTimeout time.Duration) *Scraper {
+    return &Scraper{store: s, interval: interval, validateTimeout: validateTimeout}
+}
+
+// In main.go:
+validateTimeout := envDuration("SCRAPER_VALIDATE_TIMEOUT", 10*time.Second)
+sc := scraper.New(st, scrapeInterval, validateTimeout)
+```
+
+### Pattern 3: Integration Point in scrape()
+
+**What:** Call validateLinks between scrapeReddit return and merge step.
+**When to use:** Minimal change to existing scrape flow.
+**Example:**
+
+```go
+// In scraper.go:scrape() - between lines 57 and 60
+links, err := scrapeReddit()
+if err != nil {
+    // ... existing error handling
+}
+log.Printf("scraper: reddit scrape completed in %v, got %d links", time.Since(start).Round(time.Millisecond), len(links))
+
+// NEW: validate links before merging
+if len(links) > 0 {
+    validated := validateLinks(links, s.validateTimeout)
+    log.Printf("scraper: validated %d/%d links as streams", len(validated), len(links))
+    links = validated
+}
+
+// Continue with existing merge logic...
+```
+
+### Anti-Patterns to Avoid
+
+- **Fetching URLs inside the Reddit API loop:** Validation should happen after all URLs are collected from Reddit, not interleaved with Reddit API calls. This keeps the Reddit API calls fast and avoids mixing rate-limit concerns.
+- **Using the proxy's HTTP handler for internal validation:** The proxy (`internal/proxy/proxy.go`) is designed as an HTTP handler for client-facing requests with IP-based rate limiting. The scraper should use its own HTTP client without rate limiting since it is a trusted internal caller.
+- **Modifying the ScrapedLink model to track validation state:** For Phase 1, validation is a binary filter (pass or discard). Adding validation metadata to the model is premature and adds complexity to the store layer. If needed in Phase 2 for health checking, it can be added then.
+- **Full HTML DOM parsing for detection:** Using `golang.org/x/net/html` to parse the full DOM tree just to detect presence of video tags is overkill. String matching on lowercased HTML body is sufficient for detection. DOM parsing is needed in Phase 4 for actual source URL extraction.
+
+## Don't Hand-Roll
+
+| Problem | Don't Build | Use Instead | Why |
+|---------|-------------|-------------|-----|
+| HTTP fetching with timeout | Custom TCP client | `net/http.Client` with `Timeout` field | stdlib handles redirects, TLS, timeouts, connection pooling |
+| HTML content inspection | Full DOM parser | `strings.Contains` on lowercased body | Detection (yes/no) does not need structural parsing; string matching is faster and simpler |
+| URL scheme validation | Manual string prefix check | `net/url.Parse` + scheme check | Already used in codebase; handles edge cases |
+| Concurrent timeout enforcement | Manual goroutine + channel | `context.WithTimeout` + `http.NewRequestWithContext` | stdlib integration; cancels in-flight requests properly |
+
+**Key insight:** Phase 1 is a detection problem (does this page look like a stream?), not an extraction problem (what is the stream URL?). Detection can be done with string matching. Extraction (Phase 4) needs DOM parsing.
+
+## Video Content Markers
+
+### HIGH confidence markers (any one of these strongly indicates a stream page)
+
+**HTML Tags:**
+- `<video` - HTML5 video element
+- `<source` with `type="application/x-mpegurl"` or `type="application/dash+xml"` - HLS/DASH sources
+- `<iframe` with src containing known player domains
+
+**HLS/DASH Manifest References:**
+- `.m3u8` - HLS manifest file extension
+- `.mpd` - DASH manifest file extension
+- `application/x-mpegurl` - HLS MIME type
+- `application/vnd.apple.mpegurl` - Alternative HLS MIME type
+- `application/dash+xml` - DASH MIME type
+
+**Player Library References:**
+- `hls.js` or `hls.min.js` - HLS.js player library
+- `dash.js` or `dash.all.min.js` - DASH.js player library
+- `video.js` or `video.min.js` or `videojs` - Video.js player
+- `jwplayer` - JW Player
+- `clappr` - Clappr player
+- `flowplayer` - Flowplayer
+- `plyr` - Plyr player
+- `shaka-player` or `shaka` - Google Shaka Player
+- `mediaelement` - MediaElement.js
+- `fluidplayer` - Fluid Player
+
+**Direct Video File Extensions in URLs:**
+- `.mp4` - MPEG-4 video
+- `.webm` - WebM video
+- `.ts` (in context of `.m3u8` references) - MPEG-TS segments
+
+### MEDIUM confidence markers (suggestive but not conclusive)
+
+- `player` (as class, id, or variable name in context of video)
+- `stream` (in context of video-related markup)
+- `embed` (in context of video players)
+
+### Implementation Strategy
+
+Use a tiered approach:
+1. First check for HIGH confidence markers (any single match = valid)
+2. Do NOT use MEDIUM confidence markers alone (too many false positives)
+3. Direct video content types in HTTP response (`video/mp4`, `application/x-mpegurl`, etc.) are valid without HTML inspection
+
+```go
+// Content markers to check (case-insensitive, checked against lowercased body)
+var videoMarkers = []string{
+    // HTML5 video element
+    "<video",
+    // HLS markers
+    ".m3u8",
+    "application/x-mpegurl",
+    "application/vnd.apple.mpegurl",
+    // DASH markers
+    ".mpd",
+    "application/dash+xml",
+    // Player libraries
+    "hls.js", "hls.min.js",
+    "dash.js", "dash.all.min.js",
+    "video.js", "video.min.js", "videojs",
+    "jwplayer",
+    "clappr",
+    "flowplayer",
+    "plyr",
+    "shaka-player",
+    "mediaelement",
+    "fluidplayer",
+}
+
+// Direct video content types (check Content-Type header)
+var videoContentTypes = []string{
+    "video/",
+    "application/x-mpegurl",
+    "application/vnd.apple.mpegurl",
+    "application/dash+xml",
+    "application/mpegurl",
+}
+
+func containsVideoMarkers(loweredBody string) bool {
+    for _, marker := range videoMarkers {
+        if strings.Contains(loweredBody, marker) {
+            return true
+        }
+    }
+    return false
+}
+
+func isDirectVideoContentType(ct string) bool {
+    ct = strings.ToLower(ct)
+    for _, vct := range videoContentTypes {
+        if strings.Contains(ct, vct) {
+            return true
+        }
+    }
+    return false
+}
+```
+
+## Common Pitfalls
+
+### Pitfall 1: Blocking the Scrape Cycle on Slow/Unresponsive URLs
+
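+**What goes wrong:** Validation is sequential, so the worst case is the per-URL timeout multiplied by the number of URLs. If all 50 URLs in a scrape cycle time out at 10s each, the validation step takes 500 seconds (8+ minutes), which is more than half of a 15-minute scrape interval; batches of more than 90 such URLs would run past the start of the next cycle.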
+**Why it happens:** Sequential validation with per-URL timeout does not have a total budget for the validation step.
+**How to avoid:** The per-URL timeout (SCRP-04) handles individual slowness. Additionally, consider logging total validation time. The mutex in `scraper.go:scrape()` already prevents concurrent scrapes, so overlap is safe (next scrape just waits). With typical scrape volumes (5-20 new URLs per cycle), even worst case (20 * 10s = 200s) is well within the 15-minute interval.
+**Warning signs:** Scrape logs showing validation taking longer than half the scrape interval.
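+
+One way to add a total budget on top of the per-URL timeout is to stop validating once a deadline passes. The following is a minimal sketch under stated assumptions, not part of the planned implementation: `validateWithBudget`, its `check` callback, and the budget value are hypothetical names chosen for illustration. In the real pipeline `check` would presumably wrap `hasVideoContent`.
+
+```go
+package main
+
+import (
+    "fmt"
+    "time"
+)
+
+// validateWithBudget (hypothetical) runs check on each URL in order and stops
+// once the total budget is spent. URLs that were not reached are returned as
+// skipped instead of being validated.
+func validateWithBudget(urls []string, budget time.Duration, check func(string) bool) (valid, skipped []string) {
+    deadline := time.Now().Add(budget)
+    for i, u := range urls {
+        if time.Now().After(deadline) {
+            skipped = append(skipped, urls[i:]...)
+            break
+        }
+        if check(u) {
+            valid = append(valid, u)
+        }
+    }
+    return valid, skipped
+}
+
+func main() {
+    urls := []string{"https://a.example/stream", "https://b.example/blog"}
+    // Stand-in check for illustration only.
+    check := func(u string) bool { return u == "https://a.example/stream" }
+    valid, skipped := validateWithBudget(urls, time.Minute, check)
+    fmt.Println(len(valid), len(skipped)) // 1 0
+}
+```
+
+If the caller does not save the `skipped` URLs, they would be seen as new on a later cycle and validated then.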
+
+### Pitfall 2: False Negatives from JavaScript-Rendered Pages
+
+**What goes wrong:** Many streaming sites load their video player via JavaScript. An HTTP fetch gets the initial HTML which may not contain `<video>` tags or player references -- those are injected by JS after page load.
+**Why it happens:** HTTP fetching returns raw HTML; no JavaScript execution.
+**How to avoid:** This is an accepted limitation per the requirements doc ("Full browser automation (Puppeteer/Playwright)" is Out of Scope). The marker list includes JavaScript library references (e.g., `hls.js`, `video.js`) which ARE present in the raw HTML even before execution. Most streaming sites include their player library in `<script>` tags in the initial HTML. The marker list is designed to catch these references.
+**Warning signs:** Known good stream URLs being discarded. Monitor discard rate in logs.
+
+### Pitfall 3: HTTP vs HTTPS URL Handling
+
+**What goes wrong:** The existing proxy only supports HTTPS URLs (line 68 in `proxy.go`), but scraped URLs may be HTTP. If the validator only accepts HTTPS, valid HTTP stream URLs get discarded.
+**Why it happens:** The proxy has a stricter security requirement (client-facing) than the scraper (internal validation).
+**How to avoid:** The scraper validator should accept both HTTP and HTTPS URLs for fetching. The proxy's HTTPS restriction is appropriate for its purpose but should not be inherited by the validator.
+**Warning signs:** All HTTP URLs being discarded.
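+
+The scheme rule above can be sketched as a small pre-check. This is an illustration only: `isFetchableURL` is a hypothetical helper name, and the plan does not say the validator needs an explicit scheme check at all.
+
+```go
+package main
+
+import (
+    "fmt"
+    "net/url"
+)
+
+// isFetchableURL (hypothetical) reports whether the validator should attempt
+// to fetch rawURL. Unlike the proxy, it accepts both http and https.
+func isFetchableURL(rawURL string) bool {
+    u, err := url.Parse(rawURL)
+    if err != nil {
+        return false
+    }
+    return (u.Scheme == "http" || u.Scheme == "https") && u.Host != ""
+}
+
+func main() {
+    fmt.Println(isFetchableURL("http://example.com/live"))  // true
+    fmt.Println(isFetchableURL("https://example.com/live")) // true
+    fmt.Println(isFetchableURL("ftp://example.com/file"))   // false
+}
+```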
+
+### Pitfall 4: Redirect Chains Leading to Non-Stream Content
+
+**What goes wrong:** A URL redirects through several pages (ads, link shorteners) before reaching the actual stream page. The HTTP client follows redirects (Go default: up to 10), but the intermediate pages may not have video markers.
+**Why it happens:** Streaming links from Reddit often go through shorteners or ad-redirect chains.
+**How to avoid:** Go's `http.Client` follows redirects by default (up to 10), and the validation checks the final response after redirects. Set a reasonable redirect limit (3, matching the proxy's limit) to cut off long chains; note that returning `http.ErrUseLastResponse` from `CheckRedirect` hands back the last 3xx response rather than an error. `http.Client.Timeout` covers the entire chain, so slow redirect chains will time out.
+**Warning signs:** Timeout errors on URLs that resolve fine in a browser.
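+
+To see what the caller observes when the redirect limit is hit, the sketch below points a limited client at a local test server that redirects forever. `newLimitedClient` and `statusAfterRedirects` are hypothetical names for this illustration; the `CheckRedirect` logic mirrors the limit of 3 described above.
+
+```go
+package main
+
+import (
+    "fmt"
+    "net/http"
+    "net/http/httptest"
+)
+
+// newLimitedClient (hypothetical) stops following redirects after max hops.
+func newLimitedClient(max int) *http.Client {
+    return &http.Client{
+        CheckRedirect: func(req *http.Request, via []*http.Request) error {
+            if len(via) >= max {
+                return http.ErrUseLastResponse
+            }
+            return nil
+        },
+    }
+}
+
+// statusAfterRedirects fetches url and returns the status code the caller sees.
+func statusAfterRedirects(client *http.Client, url string) (int, error) {
+    resp, err := client.Get(url)
+    if err != nil {
+        return 0, err
+    }
+    defer resp.Body.Close()
+    return resp.StatusCode, nil
+}
+
+func main() {
+    // Local server that always redirects, standing in for an ad-redirect chain.
+    srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+        http.Redirect(w, r, "/next", http.StatusFound)
+    }))
+    defer srv.Close()
+
+    code, err := statusAfterRedirects(newLimitedClient(3), srv.URL)
+    fmt.Println(code, err)
+}
+```
+
+Because the limit yields a 3xx response with a nil error, a validator that wants to reject cut-off chains has to look at the status code, not only at the error.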
+
+### Pitfall 5: Large Response Bodies Causing Memory Pressure
+
+**What goes wrong:** A streaming site returns a huge HTML page (or a direct video file), and the validator reads the entire body into memory.
+**Why it happens:** No body size limit on validation fetches.
+**How to avoid:** Use `io.LimitReader` (already a pattern in the codebase). 2MB is sufficient for detecting markers in HTML. For direct video content types, the Content-Type header check is sufficient -- no need to read the body.
+**Warning signs:** Memory spikes during scrape cycles.
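+
+A minimal sketch of the capped read, using a tiny limit so the effect is visible. `readCapped` is a hypothetical helper name; the point is that anything past the limit is never read, so a marker that appears only beyond the cap would be missed.
+
+```go
+package main
+
+import (
+    "fmt"
+    "io"
+    "strings"
+)
+
+// readCapped (hypothetical) reads at most limit bytes from r.
+func readCapped(r io.Reader, limit int64) ([]byte, error) {
+    return io.ReadAll(io.LimitReader(r, limit))
+}
+
+func main() {
+    // Ten filler bytes followed by a marker that sits past the 10-byte cap.
+    body := strings.NewReader(strings.Repeat("x", 10) + "<video>")
+    data, err := readCapped(body, 10)
+    fmt.Println(len(data), err)                           // 10 <nil>
+    fmt.Println(strings.Contains(string(data), "<video")) // false
+}
+```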
+
+### Pitfall 6: Changing the `scraper.New()` Signature Breaks Existing Call Site
+
+**What goes wrong:** Adding `validateTimeout` parameter to `New()` changes its signature, breaking the call in `main.go`.
+**Why it happens:** Go does not have optional parameters.
+**How to avoid:** Update both `New()` in `scraper.go` and the call site in `main.go` simultaneously. This is a simple, predictable change. Alternative: use options pattern, but that is overkill for adding one field.
+**Warning signs:** Compilation error -- easily caught.
+
+## Code Examples
+
+### Complete validateLinks Implementation
+
+```go
+// internal/scraper/validate.go
+package scraper
+
+import (
+    "io"
+    "log"
+    "net/http"
+    "strings"
+    "time"
+
+    "f1-stream/internal/models"
+)
+
+// videoMarkers are case-insensitive strings that indicate video/player content.
+// Checked against lowercased HTML body.
+var videoMarkers = []string{
+    "<video",
+    ".m3u8",
+    "application/x-mpegurl",
+    "application/vnd.apple.mpegurl",
+    ".mpd",
+    "application/dash+xml",
+    "hls.js", "hls.min.js",
+    "dash.js", "dash.all.min.js",
+    "video.js", "video.min.js", "videojs",
+    "jwplayer",
+    "clappr",
+    "flowplayer",
+    "plyr",
+    "shaka-player",
+    "mediaelement",
+    "fluidplayer",
+}
+
+// videoContentTypes are substrings of the Content-Type header that indicate direct video content.
+var videoContentTypes = []string{
+    "video/",
+    "application/x-mpegurl",
+    "application/vnd.apple.mpegurl",
+    "application/dash+xml",
+}
+
+const validateBodyLimit = 2 * 1024 * 1024 // 2MB
+
+// validateLinks filters links to only those whose URLs contain video content markers.
+func validateLinks(links []models.ScrapedLink, timeout time.Duration) []models.ScrapedLink {
+    client := &http.Client{
+        Timeout: timeout,
+        CheckRedirect: func(req *http.Request, via []*http.Request) error {
+            if len(via) >= 3 {
+                return http.ErrUseLastResponse
+            }
+            return nil
+        },
+    }
+
+    var valid []models.ScrapedLink
+    for _, link := range links {
+        if hasVideoContent(client, link.URL) {
+            valid = append(valid, link)
+            log.Printf("scraper: validated %s (video markers found)", truncate(link.URL, 60))
+        } else {
+            log.Printf("scraper: discarded %s (no video markers)", truncate(link.URL, 60))
+        }
+    }
+    return valid
+}
+
+func hasVideoContent(client *http.Client, rawURL string) bool {
+    req, err := http.NewRequest("GET", rawURL, nil)
+    if err != nil {
+        return false
+    }
+    req.Header.Set("User-Agent", userAgent)
+
+    resp, err := client.Do(req)
+    if err != nil {
+        log.Printf("scraper: validate fetch error for %s: %v", truncate(rawURL, 60), err)
+        return false
+    }
+    defer resp.Body.Close()
+
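+    // A 3xx status here means the redirect limit was reached
+    // (ErrUseLastResponse), so only 2xx responses are inspected.
+    if resp.StatusCode < 200 || resp.StatusCode >= 300 {
+        return false
+    }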
+
+    ct := strings.ToLower(resp.Header.Get("Content-Type"))
+
+    // Check if response is a direct video content type
+    for _, vct := range videoContentTypes {
+        if strings.Contains(ct, vct) {
+            return true
+        }
+    }
+
+    // Only inspect HTML responses for markers
+    if !strings.Contains(ct, "text/html") && !strings.Contains(ct, "application/xhtml") {
+        return false
+    }
+
+    body, err := io.ReadAll(io.LimitReader(resp.Body, validateBodyLimit))
+    if err != nil {
+        return false
+    }
+
+    return containsVideoMarkers(strings.ToLower(string(body)))
+}
+
+func containsVideoMarkers(loweredBody string) bool {
+    for _, marker := range videoMarkers {
+        if strings.Contains(loweredBody, marker) {
+            return true
+        }
+    }
+    return false
+}
+```
+
+### Integration in scraper.go
+
+```go
+// scraper.go - modified scrape() method
+func (s *Scraper) scrape() {
+    s.mu.Lock()
+    defer s.mu.Unlock()
+
+    start := time.Now()
+    log.Println("scraper: starting scrape")
+    links, err := scrapeReddit()
+    if err != nil {
+        log.Printf("scraper: error after %v: %v", time.Since(start).Round(time.Millisecond), err)
+        return
+    }
+    log.Printf("scraper: reddit scrape completed in %v, got %d links", time.Since(start).Round(time.Millisecond), len(links))
+
+    // Validate links - only keep those with video content markers
+    if len(links) > 0 {
+        validated := validateLinks(links, s.validateTimeout)
+        log.Printf("scraper: validated %d/%d links as streams", len(validated), len(links))
+        links = validated
+    }
+
+    // Rest of existing merge logic unchanged...
+}
+```
+
+### Configuration in main.go
+
+```go
+// main.go additions
+validateTimeout := envDuration("SCRAPER_VALIDATE_TIMEOUT", 10*time.Second)
+sc := scraper.New(st, scrapeInterval, validateTimeout)
+```
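+
+`envDuration` is an existing helper in `main.go` whose body is not shown in this document. For readers unfamiliar with the pattern, here is a sketch of how such a helper might behave, assuming it parses values with `time.ParseDuration` and falls back to the default on empty or invalid input; the real implementation may differ.
+
+```go
+package main
+
+import (
+    "fmt"
+    "os"
+    "time"
+)
+
+// envDuration is a stand-in sketch, not the project's actual helper.
+func envDuration(key string, fallback time.Duration) time.Duration {
+    raw := os.Getenv(key)
+    if raw == "" {
+        return fallback
+    }
+    d, err := time.ParseDuration(raw)
+    if err != nil || d <= 0 {
+        return fallback
+    }
+    return d
+}
+
+func main() {
+    os.Setenv("SCRAPER_VALIDATE_TIMEOUT", "5s")
+    fmt.Println(envDuration("SCRAPER_VALIDATE_TIMEOUT", 10*time.Second)) // 5s
+    os.Unsetenv("SCRAPER_VALIDATE_TIMEOUT")
+    fmt.Println(envDuration("SCRAPER_VALIDATE_TIMEOUT", 10*time.Second)) // 10s
+}
+```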
+
+### Unit Test for containsVideoMarkers
+
+```go
+// internal/scraper/validate_test.go
+package scraper
+
+import "testing"
+
+func TestContainsVideoMarkers(t *testing.T) {
+    tests := []struct {
+        name     string
+        body     string
+        expected bool
+    }{
+        {"video tag", `<div><video src="stream.mp4"></video></div>`, true},
+        {"hls manifest", `var url = "https://cdn.example.com/live.m3u8";`, true},
+        {"dash manifest", `<source src="stream.mpd" type="application/dash+xml">`, true},
+        {"hls.js library", `<script src="/js/hls.min.js"></script>`, true},
+        {"video.js library", `<script src="https://cdn.example.com/video.js"></script>`, true},
+        {"jwplayer", `<div id="jwplayer-container"></div><script>jwplayer("jwplayer-container")</script>`, true},
+        {"no markers", `<html><body><p>Hello world</p></body></html>`, false},
+        {"reddit link page", `<html><body><a href="https://example.com">Click here</a></body></html>`, false},
+        {"blog post", `<html><body><article>F1 race results...</article></body></html>`, false},
+    }
+
+    for _, tt := range tests {
+        t.Run(tt.name, func(t *testing.T) {
+            result := containsVideoMarkers(tt.body)
+            if result != tt.expected {
+                t.Errorf("containsVideoMarkers(%q) = %v, want %v", truncate(tt.body, 40), result, tt.expected)
+            }
+        })
+    }
+}
+```
+
+## State of the Art
+
+| Old Approach | Current Approach | When Changed | Impact |
+|--------------|------------------|--------------|--------|
+| Save all URLs from F1 posts | Will validate each URL before saving | Phase 1 (now) | Eliminates junk links at the source |
+| No content inspection | String-based marker detection | Phase 1 (now) | Simple, fast, no external dependencies |
+| Static marker list | Static marker list (sufficient for now) | - | May need updating as new players emerge; easily extensible |
+
+**Why not use browser automation:**
+The REQUIREMENTS.md explicitly marks "Full browser automation (Puppeteer/Playwright)" as Out of Scope. HTTP-based checks with string matching catch the majority of stream pages because player library `<script>` tags are present in raw HTML even before JavaScript execution.
+
+## Open Questions
+
+1. **How many URLs per scrape cycle will need validation?**
+   - What we know: The subreddit listing fetches 25 posts, filters by F1 keywords. Typical F1-related posts might yield 5-20 unique URLs per cycle after deduplication and domain filtering.
+   - What's unclear: Real-world distribution. Could be 2 URLs or 50.
+   - Recommendation: Log validation counts in production. The sequential approach with 10s timeout per URL handles up to ~90 URLs within the 15-minute scrape interval (worst case).
+
+2. **Should already-validated URLs be re-validated on subsequent scrape cycles?**
+   - What we know: The current merge step deduplicates by URL -- existing URLs are not re-processed. Validation runs only on newly discovered URLs.
+   - What's unclear: Whether a URL that failed validation last cycle should be retried.
+   - Recommendation: No retry needed for Phase 1. Failed URLs are simply not saved. If the URL appears again in a future scrape, it will be a "new" URL (not yet in scraped.json) and get validated again naturally.
+
+3. **Will the marker list need frequent updates?**
+   - What we know: Major player libraries (HLS.js, Video.js, JW Player) are well-established and their names are stable.
+   - What's unclear: Whether niche streaming sites use custom players with no recognizable markers.
+   - Recommendation: Start with the current list. Monitor discard rate in logs. Add markers if known-good sites are being discarded. The list is a simple string slice, trivially extensible.
+
+## Sources
+
+### Primary (HIGH confidence)
+- Codebase analysis: `internal/scraper/scraper.go`, `internal/scraper/reddit.go`, `internal/proxy/proxy.go` - existing patterns for HTTP fetching, URL processing, pipeline structure
+- Codebase analysis: `internal/models/models.go` - ScrapedLink type definition
+- Codebase analysis: `main.go` - env var pattern, dependency initialization
+- Go stdlib docs: `net/http`, `strings`, `io`, `context` packages
+- `golang.org/x/net/html` package API (verified via pkg.go.dev) - confirmed `html.Parse`, `Node.Descendants()`, `atom.Video` available. Reserved for Phase 4.
+
+### Secondary (MEDIUM confidence)
+- `.planning/REQUIREMENTS.md` - phase requirements, out-of-scope decisions
+- `.planning/ROADMAP.md` - phase dependencies (Phase 2 reuses validation logic)
+- `.planning/codebase/ARCHITECTURE.md` - data flow, cross-cutting concerns
+
+### Tertiary (LOW confidence)
+- Video player library names (hls.js, video.js, jwplayer, etc.) - based on widely known ecosystem knowledge. The specific set of markers may need tuning based on real-world stream sites.
+
+## Metadata
+
+**Confidence breakdown:**
+- Standard stack: HIGH - uses only stdlib; all patterns already exist in codebase
+- Architecture: HIGH - minimal change to existing pipeline; clear integration point
+- Pitfalls: HIGH - pitfalls are well-understood; mitigations use existing codebase patterns
+- Video markers: MEDIUM - marker list covers major players but may miss niche sites; easily extensible
+
+**Research date:** 2026-02-17
+**Valid until:** 2026-03-17 (stable domain; marker list may need periodic updates)
--- a/stacks/f1-stream/files/.planning/phases/01-scraper-validation/01-VERIFICATION.md
+++ b/stacks/f1-stream/files/.planning/phases/01-scraper-validation/01-VERIFICATION.md
@ -0,0 +1,200 @@
+---
+phase: 01-scraper-validation
+verified: 2026-02-17T20:58:00Z
+status: passed
+score: 4/4 must-haves verified
+re_verification: false
+---
+
+# Phase 1: Scraper Validation Verification Report
+
+**Phase Goal:** Scraped URLs are verified to contain actual video/player content before being stored, eliminating junk links at the source
+
+**Verified:** 2026-02-17T20:58:00Z
+
+**Status:** PASSED
+
+**Re-verification:** No — initial verification
+
+## Goal Achievement
+
+### Observable Truths
+
+All 4 success criteria from ROADMAP.md verified:
+
+| # | Truth | Status | Evidence |
+|---|-------|--------|----------|
+| 1 | Scraper still discovers F1-related posts using keyword filtering (existing behavior preserved) | ✓ VERIFIED | `reddit.go:105` calls `isF1Post(post.Title)` before processing posts. Keywords defined in `f1Keywords` slice (lines 29-39). Keyword filtering runs BEFORE validation step. |
+| 2 | Each extracted URL is proxy-fetched and inspected for video/player content markers | ✓ VERIFIED | `scraper.go:62` calls `validateLinks(links, s.validateTimeout)` after URL extraction. `validate.go:56-76` implements fetch + marker inspection with 20 video markers (HTML5, HLS, DASH, 10+ player libraries). |
+| 3 | URLs without video content markers are discarded and do not appear in scraped.json | ✓ VERIFIED | `validate.go:71-73` logs discarded URLs and excludes them from return slice. Only URLs passing `hasVideoContent()` are kept in `kept` slice. |
+| 4 | Validation respects a configurable timeout so slow sites do not block the scrape cycle | ✓ VERIFIED | `main.go:26` reads `SCRAPER_VALIDATE_TIMEOUT` env var (default 10s). `validate.go:58` creates HTTP client with timeout. Passed through `scraper.New()` at `main.go:56`. |
+
+**Score:** 4/4 truths verified
+
+### Required Artifacts
+
+All 4 artifacts from PLAN must_haves verified at all 3 levels (exists, substantive, wired):
+
+| Artifact | Expected | Exists | Substantive | Wired | Status |
+|----------|----------|--------|-------------|-------|--------|
+| `internal/scraper/validate.go` | URL validation logic with video marker detection | ✓ | ✓ 142 lines, 20 video markers, 4 functions | ✓ | ✓ VERIFIED |
+| `internal/scraper/validate_test.go` | Unit tests for marker detection and content type checks | ✓ | ✓ 124 lines, 14 test cases covering positive/negative scenarios | ✓ | ✓ VERIFIED |
+| `internal/scraper/scraper.go` | validateTimeout field and validation call in scrape() | ✓ | ✓ validateTimeout field on line 16, validateLinks call on line 62 | ✓ | ✓ VERIFIED |
+| `main.go` | SCRAPER_VALIDATE_TIMEOUT env var configuration | ✓ | ✓ Line 26 reads env var, line 56 passes to scraper.New() | ✓ | ✓ VERIFIED |
+
+**Artifact Details:**
+
+**validate.go (142 lines):**
+- Contains `validateLinks` function (lines 56-76)
+- Contains `hasVideoContent` function (lines 81-119)
+- Contains `containsVideoMarkers` function (lines 123-130)
+- Contains `isDirectVideoContentType` function (lines 134-142)
+- Defines 20 video markers (lines 15-40): HTML5 `<video`, HLS (.m3u8, application/x-mpegurl, application/vnd.apple.mpegurl), DASH (.mpd, application/dash+xml), and 14 player-library markers across 10 libraries (hls.js, dash.js, video.js, jwplayer, clappr, flowplayer, plyr, shaka-player, mediaelement, fluidplayer)
+- Defines 4 video content types (lines 44-49)
+- Sets 2MB body limit (line 52)
+- 3 redirect limit (line 60)
+
+**validate_test.go (124 lines):**
+- `TestContainsVideoMarkers`: 10 positive cases (video tag, HLS manifest, DASH manifest, HLS.js, Video.js, JW Player, Clappr, Flowplayer, Plyr, Shaka Player) + 4 negative cases (plain HTML, Reddit link page, blog post, empty string)
+- `TestIsDirectVideoContentType`: 6 positive cases (video/mp4, video/webm, HLS content types, DASH, video with params) + 5 negative cases (text/html, application/json, image/png, text/plain, empty)
+- Total: 25 table-driven cases across the two test functions (14 in `TestContainsVideoMarkers`, 11 in `TestIsDirectVideoContentType`)
+
+**scraper.go:**
+- Line 16: `validateTimeout time.Duration` field added to Scraper struct
+- Lines 20-21: `New()` function updated to accept validateTimeout parameter
+- Lines 60-65: Validation step inserted between URL extraction and merge logic
+
+**main.go:**
+- Line 26: `validateTimeout := envDuration("SCRAPER_VALIDATE_TIMEOUT", 10*time.Second)`
+- Line 56: `sc := scraper.New(st, scrapeInterval, validateTimeout)`
+
+### Key Link Verification
+
+Both key links from PLAN must_haves verified:
+
+| From | To | Via | Status | Details |
+|------|----|----|--------|---------|
+| `internal/scraper/scraper.go` | `internal/scraper/validate.go` | validateLinks call in scrape() | ✓ WIRED | `scraper.go:62` calls `validateLinks(links, s.validateTimeout)` between URL extraction (`scrapeReddit()` return at line 58) and store merge (`LoadScrapedLinks()` at line 68) |
+| `main.go` | `internal/scraper/scraper.go` | scraper.New() with validateTimeout | ✓ WIRED | `main.go:56` calls `scraper.New(st, scrapeInterval, validateTimeout)` where validateTimeout is read from env on line 26 |
+
+**Wiring Verification:**
+
+**Link 1: scraper.go → validate.go**
+- Pattern match: `validateLinks\(links` found at `scraper.go:62`
+- Context: Call occurs after `scrapeReddit()` (line 53) and before `LoadScrapedLinks()` (line 68)
+- Data flow: `links` variable from `scrapeReddit()` → filtered by `validateLinks()` → assigned back to `links` → merged with existing
+
+**Link 2: main.go → scraper.go**
+- Pattern match: `scraper\.New\(st.*validateTimeout` found at `main.go:56`
+- Context: `validateTimeout` variable read from env on line 26, passed as 3rd parameter to `scraper.New()`
+- Parameter flow: `envDuration("SCRAPER_VALIDATE_TIMEOUT", 10*time.Second)` → `validateTimeout` variable → `scraper.New()` parameter → `Scraper.validateTimeout` field → `validateLinks()` call
+
+### Requirements Coverage
+
+All 4 requirements from PLAN frontmatter verified against REQUIREMENTS.md:
+
+| Requirement | Description | Status | Evidence |
+|-------------|-------------|--------|----------|
+| **SCRP-01** | Scraper filters Reddit posts by F1 keywords before extracting URLs (existing behavior, preserve) | ✓ SATISFIED | `reddit.go:105` calls `isF1Post(post.Title)` before processing. Keywords defined in `f1Keywords` (lines 29-39). This runs BEFORE validation, preserving existing behavior. |
+| **SCRP-02** | Scraper validates each extracted URL by proxy-fetching it and checking for video/player content markers | ✓ SATISFIED | `validate.go:56-76` implements `validateLinks()` which calls `hasVideoContent()` for each URL. `hasVideoContent()` (lines 81-119) performs HTTP GET and checks for video markers. |
+| **SCRP-03** | URLs that don't look like streams are discarded before saving | ✓ SATISFIED | `validate.go:71-73` logs and excludes URLs where `hasVideoContent()` returns false. Only kept URLs are returned and merged into store. |
+| **SCRP-04** | Validation has a configurable timeout (default 10s) to avoid blocking on slow sites | ✓ SATISFIED | `main.go:26` reads `SCRAPER_VALIDATE_TIMEOUT` with default 10s. `validate.go:58` creates the HTTP client with that timeout. The 3-redirect limit also bounds the time spent on long redirect chains. |
+
+**No orphaned requirements:** All 4 requirements mapped to Phase 1 in REQUIREMENTS.md are accounted for in the PLAN and satisfied by the implementation.
+
+### Anti-Patterns Found
+
+No anti-patterns detected:
+
+| Category | Checked | Found |
+|----------|---------|-------|
+| TODO/FIXME/PLACEHOLDER comments | ✓ | 0 |
+| Placeholder strings | ✓ | 0 |
+| Empty implementations (return null/empty) | ✓ | 0 |
+| Console-only implementations | ✓ | 0 |
+
+**Files checked:**
+- `internal/scraper/validate.go` (142 lines)
+- `internal/scraper/validate_test.go` (124 lines)
+- `internal/scraper/scraper.go` (modified section lines 16, 20-21, 60-65)
+- `main.go` (modified lines 26, 56)
+
+All modified code is production-ready with proper error handling, logging, and no stub patterns.
+
+### Human Verification Required
+
+None. All success criteria are programmatically verifiable through code inspection:
+
+- Keyword filtering behavior: Verified by checking `isF1Post()` call placement
+- URL validation with HTTP fetch: Verified by code inspection of `validateLinks()` and `hasVideoContent()`
+- Discard behavior: Verified by inspecting return logic in `validateLinks()`
+- Timeout configuration: Verified by tracing env var read → parameter passing → HTTP client creation
+
+**Note:** While the *functionality* of video marker detection would ideally be tested against real stream URLs, the *implementation* of the requirement (that validation logic exists, is wired correctly, and has appropriate markers) is fully verified.
+
+## Verification Summary
+
+### All Must-Haves VERIFIED
+
+**Truths (4/4):**
+1. ✓ F1 keyword filtering preserved (SCRP-01)
+2. ✓ URLs proxy-fetched and inspected for video markers (SCRP-02)
+3. ✓ Non-stream URLs discarded (SCRP-03)
+4. ✓ Configurable timeout prevents blocking (SCRP-04)
+
+**Artifacts (4/4):**
+1. ✓ `validate.go` exists, substantive (142 lines, 20 markers, 4 functions), wired
+2. ✓ `validate_test.go` exists, substantive (124 lines, 14 test cases), wired
+3. ✓ `scraper.go` exists, substantive (validateTimeout field + call), wired
+4. ✓ `main.go` exists, substantive (env var read + pass to New()), wired
+
+**Key Links (2/2):**
+1. ✓ scraper.go → validate.go via validateLinks call
+2. ✓ main.go → scraper.go via scraper.New() with validateTimeout
+
+**Requirements (4/4):**
+1. ✓ SCRP-01 satisfied
+2. ✓ SCRP-02 satisfied
+3. ✓ SCRP-03 satisfied
+4. ✓ SCRP-04 satisfied
+
+**Anti-Patterns:** None found
+
+**Human Verification:** Not required
+
+### Implementation Quality
+
+**Strengths:**
+- 20 video markers covering HTML5, HLS, DASH, and 10 player libraries
+- Proper error handling throughout (HTTP errors, read errors, invalid URLs)
+- Conservative resource limits (2MB body read limit, 3 redirect limit)
+- Comprehensive unit test coverage (14 test cases, 25 assertion points)
+- Clean integration preserving existing F1 keyword filtering (SCRP-01)
+- Follows existing codebase patterns (envDuration config, logging style, truncate utility)
+
+**Commits:**
+All 3 tasks committed atomically:
+1. `adeb478` - Create validate.go with video marker detection
+2. `22d29db` - Wire validation into scraper pipeline and add config
+3. `6c5cc02` - Add unit tests for validation functions
+
+**No deviations from plan:** Implementation matches PLAN tasks exactly.
+
+## Conclusion
+
+**Phase 1 goal ACHIEVED.**
+
+All success criteria from ROADMAP.md are satisfied:
+1. ✓ Keyword filtering preserved
+2. ✓ URLs validated with video marker detection
+3. ✓ Non-stream URLs discarded
+4. ✓ Configurable timeout prevents blocking
+
+All artifacts exist, are substantive, and are wired correctly. All key links verified. All requirements satisfied. No anti-patterns found. No gaps requiring remediation.
+
+**Ready to proceed to Phase 2.**
+
+---
+
+*Verified: 2026-02-17T20:58:00Z*
+*Verifier: Claude (gsd-verifier)*