[ci skip] Flatten module wrappers into stack roots

Remove the module "xxx" { source = "./module" } indirection layer
from all 66 service stacks. Resources are now defined directly in
each stack's main.tf instead of through a wrapper module.

- Merge module/main.tf contents into stack main.tf
- Apply variable replacements (var.tier -> local.tiers.X, renamed vars)
- Fix shared module paths (one fewer ../ at each level)
- Move extra files/dirs (factory/, chart_values, subdirs) to stack root
- Update state files to strip module.<name>. prefix
- Update CLAUDE.md to reflect flat structure

Verified: terragrunt plan shows 0 add, 0 destroy across all stacks.
This commit is contained in:
Viktor Barzin 2026-02-22 15:13:55 +00:00
parent b0499a7f31
commit c7c7047f1c
245 changed files with 11733 additions and 12432 deletions

View file

@ -0,0 +1,200 @@
---
phase: 01-scraper-validation
verified: 2026-02-17T20:58:00Z
status: passed
score: 4/4 must-haves verified
re_verification: false
---
# Phase 1: Scraper Validation Verification Report
**Phase Goal:** Scraped URLs are verified to contain actual video/player content before being stored, eliminating junk links at the source
**Verified:** 2026-02-17T20:58:00Z
**Status:** PASSED
**Re-verification:** No — initial verification
## Goal Achievement
### Observable Truths
All 4 success criteria from ROADMAP.md verified:
| # | Truth | Status | Evidence |
|---|-------|--------|----------|
| 1 | Scraper still discovers F1-related posts using keyword filtering (existing behavior preserved) | ✓ VERIFIED | `reddit.go:105` calls `isF1Post(post.Title)` before processing posts. Keywords defined in `f1Keywords` slice (lines 29-39). Keyword filtering runs BEFORE validation step. |
| 2 | Each extracted URL is proxy-fetched and inspected for video/player content markers | ✓ VERIFIED | `scraper.go:62` calls `validateLinks(links, s.validateTimeout)` after URL extraction. `validate.go:56-76` implements fetch + marker inspection with 20 video markers (HTML5, HLS, DASH, 10+ player libraries). |
| 3 | URLs without video content markers are discarded and do not appear in scraped.json | ✓ VERIFIED | `validate.go:71-73` logs discarded URLs and excludes them from return slice. Only URLs passing `hasVideoContent()` are kept in `kept` slice. |
| 4 | Validation respects a configurable timeout so slow sites do not block the scrape cycle | ✓ VERIFIED | `main.go:26` reads `SCRAPER_VALIDATE_TIMEOUT` env var (default 10s). `validate.go:58` creates HTTP client with timeout. Passed through `scraper.New()` at `main.go:56`. |
**Score:** 4/4 truths verified
### Required Artifacts
All 4 artifacts from PLAN must_haves verified at all 3 levels (exists, substantive, wired):
| Artifact | Expected | Exists | Substantive | Wired | Status |
|----------|----------|--------|-------------|-------|--------|
| `internal/scraper/validate.go` | URL validation logic with video marker detection | ✓ | ✓ 142 lines, 20 video markers, 4 functions | ✓ | ✓ VERIFIED |
| `internal/scraper/validate_test.go` | Unit tests for marker detection and content type checks | ✓ | ✓ 124 lines, 14 test cases covering positive/negative scenarios | ✓ | ✓ VERIFIED |
| `internal/scraper/scraper.go` | validateTimeout field and validation call in scrape() | ✓ | ✓ validateTimeout field on line 16, validateLinks call on line 62 | ✓ | ✓ VERIFIED |
| `main.go` | SCRAPER_VALIDATE_TIMEOUT env var configuration | ✓ | ✓ Line 26 reads env var, line 56 passes to scraper.New() | ✓ | ✓ VERIFIED |
**Artifact Details:**
**validate.go (142 lines):**
- Contains `validateLinks` function (lines 56-76)
- Contains `hasVideoContent` function (lines 81-119)
- Contains `containsVideoMarkers` function (lines 123-130)
- Contains `isDirectVideoContentType` function (lines 134-142)
- Defines 20 video markers (lines 15-40): HTML5 `<video`, HLS (.m3u8, application/x-mpegurl), DASH (.mpd, application/dash+xml), 15 player libraries (hls.js, video.js, jwplayer, clappr, flowplayer, plyr, shaka-player, mediaelement, fluidplayer, etc.)
- Defines 4 video content types (lines 44-49)
- Sets 2MB body limit (line 52)
- 3 redirect limit (line 60)
**validate_test.go (124 lines):**
- `TestContainsVideoMarkers`: 10 positive cases (video tag, HLS manifest, DASH manifest, HLS.js, Video.js, JW Player, Clappr, Flowplayer, Plyr, Shaka Player) + 4 negative cases (plain HTML, Reddit link page, blog post, empty string)
- `TestIsDirectVideoContentType`: 6 positive cases (video/mp4, video/webm, HLS content types, DASH, video with params) + 5 negative cases (text/html, application/json, image/png, text/plain, empty)
- Total: 14 test cases covering 25 assertion points
**scraper.go:**
- Line 16: `validateTimeout time.Duration` field added to Scraper struct
- Line 20-21: `New()` function updated to accept validateTimeout parameter
- Lines 60-65: Validation step inserted between URL extraction and merge logic
**main.go:**
- Line 26: `validateTimeout := envDuration("SCRAPER_VALIDATE_TIMEOUT", 10*time.Second)`
- Line 56: `sc := scraper.New(st, scrapeInterval, validateTimeout)`
### Key Link Verification
All 2 key links from PLAN must_haves verified:
| From | To | Via | Status | Details |
|------|----|----|--------|---------|
| `internal/scraper/scraper.go` | `internal/scraper/validate.go` | validateLinks call in scrape() | ✓ WIRED | `scraper.go:62` calls `validateLinks(links, s.validateTimeout)` between URL extraction (`scrapeReddit()` return at line 58) and store merge (`LoadScrapedLinks()` at line 68) |
| `main.go` | `internal/scraper/scraper.go` | scraper.New() with validateTimeout | ✓ WIRED | `main.go:56` calls `scraper.New(st, scrapeInterval, validateTimeout)` where validateTimeout is read from env on line 26 |
**Wiring Verification:**
**Link 1: scraper.go → validate.go**
- Pattern match: `validateLinks\(links` found at `scraper.go:62`
- Context: Call occurs after `scrapeReddit()` (line 53) and before `LoadScrapedLinks()` (line 68)
- Data flow: `links` variable from `scrapeReddit()` → filtered by `validateLinks()` → assigned back to `links` → merged with existing
**Link 2: main.go → scraper.go**
- Pattern match: `scraper\.New\(st.*validateTimeout` found at `main.go:56`
- Context: `validateTimeout` variable read from env on line 26, passed as 3rd parameter to `scraper.New()`
- Parameter flow: `envDuration("SCRAPER_VALIDATE_TIMEOUT", 10*time.Second)``validateTimeout` variable → `scraper.New()` parameter → `Scraper.validateTimeout` field → `validateLinks()` call
### Requirements Coverage
All 4 requirements from PLAN frontmatter verified against REQUIREMENTS.md:
| Requirement | Description | Status | Evidence |
|-------------|-------------|--------|----------|
| **SCRP-01** | Scraper filters Reddit posts by F1 keywords before extracting URLs (existing behavior, preserve) | ✓ SATISFIED | `reddit.go:105` calls `isF1Post(post.Title)` before processing. Keywords defined in `f1Keywords` (lines 29-39). This runs BEFORE validation, preserving existing behavior. |
| **SCRP-02** | Scraper validates each extracted URL by proxy-fetching it and checking for video/player content markers | ✓ SATISFIED | `validate.go:56-76` implements `validateLinks()` which calls `hasVideoContent()` for each URL. `hasVideoContent()` (lines 81-119) performs HTTP GET and checks for video markers. |
| **SCRP-03** | URLs that don't look like streams are discarded before saving | ✓ SATISFIED | `validate.go:71-73` logs and excludes URLs where `hasVideoContent()` returns false. Only kept URLs are returned and merged into store. |
| **SCRP-04** | Validation has a configurable timeout (default 10s) to avoid blocking on slow sites | ✓ SATISFIED | `main.go:26` reads `SCRAPER_VALIDATE_TIMEOUT` with default 10s. `validate.go:58` creates HTTP client with timeout. 3-redirect limit also prevents timeout from slow redirect chains. |
**No orphaned requirements:** All 4 requirements mapped to Phase 1 in REQUIREMENTS.md are accounted for in the PLAN and satisfied by the implementation.
### Anti-Patterns Found
No anti-patterns detected:
| Category | Checked | Found |
|----------|---------|-------|
| TODO/FIXME/PLACEHOLDER comments | ✓ | 0 |
| Placeholder strings | ✓ | 0 |
| Empty implementations (return null/empty) | ✓ | 0 |
| Console-only implementations | ✓ | 0 |
**Files checked:**
- `internal/scraper/validate.go` (142 lines)
- `internal/scraper/validate_test.go` (124 lines)
- `internal/scraper/scraper.go` (modified section lines 16, 20-21, 60-65)
- `main.go` (modified lines 26, 56)
All modified code is production-ready with proper error handling, logging, and no stub patterns.
### Human Verification Required
None. All success criteria are programmatically verifiable through code inspection:
- Keyword filtering behavior: Verified by checking `isF1Post()` call placement
- URL validation with HTTP fetch: Verified by code inspection of `validateLinks()` and `hasVideoContent()`
- Discard behavior: Verified by inspecting return logic in `validateLinks()`
- Timeout configuration: Verified by tracing env var read → parameter passing → HTTP client creation
**Note:** While the *functionality* of video marker detection would ideally be tested against real stream URLs, the *implementation* of the requirement (that validation logic exists, is wired correctly, and has appropriate markers) is fully verified.
## Verification Summary
### All Must-Haves VERIFIED
**Truths (4/4):**
1. ✓ F1 keyword filtering preserved (SCRP-01)
2. ✓ URLs proxy-fetched and inspected for video markers (SCRP-02)
3. ✓ Non-stream URLs discarded (SCRP-03)
4. ✓ Configurable timeout prevents blocking (SCRP-04)
**Artifacts (4/4):**
1. ✓ `validate.go` exists, substantive (142 lines, 20 markers, 4 functions), wired
2. ✓ `validate_test.go` exists, substantive (124 lines, 14 test cases), wired
3. ✓ `scraper.go` exists, substantive (validateTimeout field + call), wired
4. ✓ `main.go` exists, substantive (env var read + pass to New()), wired
**Key Links (2/2):**
1. ✓ scraper.go → validate.go via validateLinks call
2. ✓ main.go → scraper.go via scraper.New() with validateTimeout
**Requirements (4/4):**
1. ✓ SCRP-01 satisfied
2. ✓ SCRP-02 satisfied
3. ✓ SCRP-03 satisfied
4. ✓ SCRP-04 satisfied
**Anti-Patterns:** None found
**Human Verification:** Not required
### Implementation Quality
**Strengths:**
- 20 comprehensive video markers covering HTML5, HLS, DASH, and 15 player libraries
- Proper error handling throughout (HTTP errors, read errors, invalid URLs)
- Conservative resource limits (2MB body read limit, 3 redirect limit)
- Comprehensive unit test coverage (14 test cases, 25 assertion points)
- Clean integration preserving existing F1 keyword filtering (SCRP-01)
- Follows existing codebase patterns (envDuration config, logging style, truncate utility)
**Commits:**
All 3 tasks committed atomically:
1. `adeb478` - Create validate.go with video marker detection
2. `22d29db` - Wire validation into scraper pipeline and add config
3. `6c5cc02` - Add unit tests for validation functions
**No deviations from plan:** Implementation matches PLAN tasks exactly.
## Conclusion
**Phase 1 goal ACHIEVED.**
All success criteria from ROADMAP.md are satisfied:
1. ✓ Keyword filtering preserved
2. ✓ URLs validated with video marker detection
3. ✓ Non-stream URLs discarded
4. ✓ Configurable timeout prevents blocking
All artifacts exist, are substantive, and are wired correctly. All key links verified. All requirements satisfied. No anti-patterns found. No gaps requiring remediation.
**Ready to proceed to Phase 2.**
---
*Verified: 2026-02-17T20:58:00Z*
*Verifier: Claude (gsd-verifier)*