Remove the module "xxx" { source = "./module" } indirection layer
from all 66 service stacks. Resources are now defined directly in
each stack's main.tf instead of through a wrapper module.
- Merge module/main.tf contents into stack main.tf
- Apply variable replacements (var.tier -> local.tiers.X, renamed vars)
- Fix shared module paths (one fewer ../ at each level)
- Move extra files/dirs (factory/, chart_values, subdirs) to stack root
- Update state files to strip module.<name>. prefix
- Update CLAUDE.md to reflect flat structure
Verified: terragrunt plan shows 0 add, 0 destroy across all stacks.
| phase | plan | type | wave | depends_on | files_modified | autonomous | requirements | must_haves |
|---|---|---|---|---|---|---|---|---|
| 01-scraper-validation | 01 | execute | 1 | | | true | | |
Purpose: Eliminate junk links (blog posts, news articles, social media) from scraped results so users only see actual stream pages.
Output: Working validation step integrated into scraper pipeline, with unit tests and configurable timeout.
<execution_context> @/Users/viktorbarzin/.claude/get-shit-done/workflows/execute-plan.md @/Users/viktorbarzin/.claude/get-shit-done/templates/summary.md </execution_context>
@.planning/PROJECT.md @.planning/ROADMAP.md @.planning/STATE.md @.planning/phases/01-scraper-validation/01-RESEARCH.md @internal/scraper/scraper.go @internal/scraper/reddit.go @internal/models/models.go @main.go
Task 1: Create validate.go with video marker detection (`internal/scraper/validate.go`)

Create `internal/scraper/validate.go` in package `scraper` with the following:

- Define `videoMarkers` string slice (case-insensitive markers checked against lowercased HTML body):
  - HTML5: `<video`
  - HLS: `.m3u8`, `application/x-mpegurl`, `application/vnd.apple.mpegurl`
  - DASH: `.mpd`, `application/dash+xml`
  - Player libraries: `hls.js`, `hls.min.js`, `dash.js`, `dash.all.min.js`, `video.js`, `video.min.js`, `videojs`, `jwplayer`, `clappr`, `flowplayer`, `plyr`, `shaka-player`, `mediaelement`, `fluidplayer`
- Define `videoContentTypes` string slice for direct video Content-Type detection: `video/`, `application/x-mpegurl`, `application/vnd.apple.mpegurl`, `application/dash+xml`
- Define `validateBodyLimit = 2 * 1024 * 1024` (2MB).
- Implement `validateLinks(links []models.ScrapedLink, timeout time.Duration) []models.ScrapedLink`:
  - Create `http.Client` with the given timeout and a `CheckRedirect` function that stops after 3 redirects (return `http.ErrUseLastResponse` when `len(via) >= 3`).
  - Iterate over links. For each, call `hasVideoContent(client, link.URL)`. If true, keep the link. If false, log `scraper: discarded %s (no video markers)` using `truncate(link.URL, 60)`.
  - Return the filtered slice.
- Implement `hasVideoContent(client *http.Client, rawURL string) bool`:
  - Create GET request with `User-Agent` set to the existing `userAgent` constant from `reddit.go`.
  - Execute request. On error, log and return false.
  - Check status code: if < 200 or >= 400, return false.
  - Check Content-Type header (lowercased) against `videoContentTypes` -- if match, return true (it's a direct video file).
  - If Content-Type is not `text/html` or `application/xhtml`, return false (not a video file, not an HTML page to inspect).
  - Read body with `io.LimitReader` (2MB limit).
  - Return `containsVideoMarkers(strings.ToLower(string(body)))`.
- Implement `containsVideoMarkers(loweredBody string) bool`:
  - Iterate over `videoMarkers`, return true on first `strings.Contains` match.
- Implement `isDirectVideoContentType(ct string) bool`:
  - Lowercase `ct`, iterate `videoContentTypes`, return true on first `strings.Contains` match.

Imports needed: `io`, `log`, `net/http`, `strings`, `time`, and `f1-stream/internal/models`.
Do NOT use golang.org/x/net/html (reserved for Phase 4). This is detection, not extraction -- string matching is sufficient.
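The detection half of this task can be sketched as below. This is a minimal illustration under stated assumptions, not the final file: it is written as `package main` so it can be tried standalone (the plan places it in package `scraper`), the `userAgent` value is a placeholder for the constant that already exists in `reddit.go`, the log message wording is invented, and discarding a link when the body read fails is an assumption because the plan does not cover that case.

```go
package main

import (
	"io"
	"log"
	"net/http"
	"strings"
)

// userAgent stands in for the existing constant in reddit.go; this value is a placeholder.
const userAgent = "f1-stream-sketch/0.1"

const validateBodyLimit = 2 * 1024 * 1024 // 2MB cap on how much HTML is inspected

// videoMarkers are lowercase substrings that suggest an embedded player or stream manifest.
var videoMarkers = []string{
	"<video",
	".m3u8", "application/x-mpegurl", "application/vnd.apple.mpegurl",
	".mpd", "application/dash+xml",
	"hls.js", "hls.min.js", "dash.js", "dash.all.min.js", "video.js", "video.min.js",
	"videojs", "jwplayer", "clappr", "flowplayer", "plyr", "shaka-player",
	"mediaelement", "fluidplayer",
}

// videoContentTypes identify responses that are themselves video or a stream manifest.
var videoContentTypes = []string{
	"video/", "application/x-mpegurl", "application/vnd.apple.mpegurl", "application/dash+xml",
}

// containsVideoMarkers expects a body that the caller has already lowercased.
func containsVideoMarkers(loweredBody string) bool {
	for _, marker := range videoMarkers {
		if strings.Contains(loweredBody, marker) {
			return true
		}
	}
	return false
}

// isDirectVideoContentType lowercases ct itself, so casing and header parameters do not matter.
func isDirectVideoContentType(ct string) bool {
	ct = strings.ToLower(ct)
	for _, want := range videoContentTypes {
		if strings.Contains(ct, want) {
			return true
		}
	}
	return false
}

// hasVideoContent fetches rawURL and applies the checks in the order the plan lists them.
func hasVideoContent(client *http.Client, rawURL string) bool {
	req, err := http.NewRequest(http.MethodGet, rawURL, nil)
	if err != nil {
		log.Printf("scraper: cannot build request for %s: %v", rawURL, err)
		return false
	}
	req.Header.Set("User-Agent", userAgent)

	resp, err := client.Do(req)
	if err != nil {
		log.Printf("scraper: fetch failed for %s: %v", rawURL, err)
		return false
	}
	defer resp.Body.Close()

	if resp.StatusCode < 200 || resp.StatusCode >= 400 {
		return false
	}
	ct := strings.ToLower(resp.Header.Get("Content-Type"))
	if isDirectVideoContentType(ct) {
		return true // direct video file or manifest
	}
	if !strings.Contains(ct, "text/html") && !strings.Contains(ct, "application/xhtml") {
		return false // neither video nor an HTML page worth inspecting
	}
	// Assumption: a read error discards the link; the plan does not specify this case.
	body, err := io.ReadAll(io.LimitReader(resp.Body, validateBodyLimit))
	if err != nil {
		return false
	}
	return containsVideoMarkers(strings.ToLower(string(body)))
}
```

Because both helpers use plain substring matching, a Content-Type such as `video/mp4; charset=utf-8` matches on the `video/` prefix without any header parsing.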
Run `cd /Users/viktorbarzin/code/infra/modules/kubernetes/f1-stream/files && go build ./...` -- must compile without errors.
`validate.go` exists with `validateLinks`, `hasVideoContent`, `containsVideoMarkers`, `isDirectVideoContentType` functions. The file compiles as part of the `scraper` package.
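The filtering loop and the redirect cap in `validateLinks` might look like the sketch below. Several pieces are stand-ins so the example runs offline: `ScrapedLink` replaces `models.ScrapedLink` (only a `URL` field is assumed), `truncate` replaces the existing helper whose implementation is not shown here, `newValidateClient` is a hypothetical helper name, and `hasVideoContent` is a placeholder that only checks the URL suffix instead of fetching the page.

```go
package main

import (
	"log"
	"net/http"
	"strings"
	"time"
)

// ScrapedLink stands in for models.ScrapedLink; only the URL field is assumed.
type ScrapedLink struct {
	URL string
}

// truncate stands in for the existing helper the plan refers to.
func truncate(s string, n int) string {
	if len(s) <= n {
		return s
	}
	return s[:n]
}

// hasVideoContent is a placeholder: the real function fetches the page with client.
// Here it only checks the URL suffix so the sketch runs without network access.
func hasVideoContent(client *http.Client, rawURL string) bool {
	return strings.HasSuffix(strings.ToLower(rawURL), ".m3u8")
}

// newValidateClient (hypothetical name) builds the client described in the plan.
func newValidateClient(timeout time.Duration) *http.Client {
	return &http.Client{
		Timeout: timeout,
		CheckRedirect: func(req *http.Request, via []*http.Request) error {
			// via holds the requests made so far, oldest first, so this
			// stops once three requests have already been issued.
			if len(via) >= 3 {
				return http.ErrUseLastResponse // return the last response instead of an error
			}
			return nil
		},
	}
}

// validateLinks keeps only the links for which hasVideoContent reports true.
func validateLinks(links []ScrapedLink, timeout time.Duration) []ScrapedLink {
	client := newValidateClient(timeout)
	kept := make([]ScrapedLink, 0, len(links))
	for _, link := range links {
		if hasVideoContent(client, link.URL) {
			kept = append(kept, link)
			continue
		}
		log.Printf("scraper: discarded %s (no video markers)", truncate(link.URL, 60))
	}
	return kept
}
```

Returning `http.ErrUseLastResponse` makes the client hand back the most recent redirect response with its body unclosed rather than failing, so the status and Content-Type checks still run on it.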
In `internal/scraper/scraper.go`:

- Add `validateTimeout time.Duration` field to the `Scraper` struct.
- Update `New()` signature to accept a `validateTimeout` parameter: `func New(s *store.Store, interval time.Duration, validateTimeout time.Duration) *Scraper { return &Scraper{store: s, interval: interval, validateTimeout: validateTimeout} }`
- In the `scrape()` method, add a validation step between the `scrapeReddit()` return and the merge-with-existing logic, so that only links with video content markers are kept. Insert it AFTER the line `log.Printf("scraper: reddit scrape completed in %v, got %d links", ...)` and BEFORE the line `existing, err := s.store.LoadScrapedLinks()`: `if len(links) > 0 { validated := validateLinks(links, s.validateTimeout); log.Printf("scraper: validated %d/%d links as streams", len(validated), len(links)); links = validated }`

This preserves SCRP-01: existing keyword filtering in `scrapeReddit()` via `isF1Post()` runs first, then validation filters the results.
In `main.go`:

- Add `validateTimeout` env var read after the existing `scrapeInterval` line: `validateTimeout := envDuration("SCRAPER_VALIDATE_TIMEOUT", 10*time.Second)`
- Update the `scraper.New()` call to pass the new parameter: `sc := scraper.New(st, scrapeInterval, validateTimeout)`

Both changes are minimal and follow the existing configuration pattern used for `SCRAPE_INTERVAL`, `PROXY_TIMEOUT`, etc.
Run `cd /Users/viktorbarzin/code/infra/modules/kubernetes/f1-stream/files && go build ./...` -- must compile without errors. Then run `go vet ./...` -- no issues.
`Scraper` struct has `validateTimeout` field. `New()` accepts 3 parameters. `scrape()` calls `validateLinks` between extraction and merge. `main.go` reads `SCRAPER_VALIDATE_TIMEOUT` env var (default 10s) and passes it to `scraper.New()`.
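For orientation, a helper with the shape of `envDuration` could be written as in the sketch below. The real `envDuration` already exists in `main.go` and is not shown in this plan, so everything here is a guess at its behaviour: the `parseDurationOr` helper name is hypothetical, and falling back on unparsable or non-positive values is an assumption.

```go
package main

import (
	"log"
	"os"
	"time"
)

// parseDurationOr (hypothetical helper) returns fallback when raw is empty,
// is not a valid Go duration string, or is not positive.
func parseDurationOr(raw string, fallback time.Duration) time.Duration {
	if raw == "" {
		return fallback
	}
	d, err := time.ParseDuration(raw)
	if err != nil || d <= 0 {
		log.Printf("config: invalid duration %q, using %v", raw, fallback)
		return fallback
	}
	return d
}

// envDuration is a guess at the existing helper: read key from the
// environment and fall back to the default when it is unset or invalid.
func envDuration(key string, fallback time.Duration) time.Duration {
	return parseDurationOr(os.Getenv(key), fallback)
}
```

Under these assumptions, `SCRAPER_VALIDATE_TIMEOUT=5s` yields a five-second timeout, while a bare `10` is rejected by `time.ParseDuration` (it has no unit) and the 10s default applies.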
`TestContainsVideoMarkers` - table-driven test covering:

- Positive cases (should return true):
  - `<video>` tag: `<div><video src="stream.mp4"></video></div>`
  - HLS manifest ref: `var url = "https://cdn.example.com/live.m3u8";`
  - DASH manifest ref: `<source src="stream.mpd" type="application/dash+xml">`
  - HLS.js library: `<script src="/js/hls.min.js"></script>`
  - Video.js library: `<script src="https://cdn.example.com/video.js"></script>`
  - JW Player: `<div id="jwplayer-container"></div><script>jwplayer("jwplayer-container")</script>`
  - Clappr: `<script src="clappr.min.js"></script>`
  - Flowplayer: `<script>flowplayer("#player")</script>`
- Negative cases (should return false):
  - Plain HTML: `<html><body><p>Hello world</p></body></html>`
  - Reddit link page: `<html><body><a href="https://example.com">Click here</a></body></html>`
  - Blog post: `<html><body><article>F1 race results and analysis...</article></body></html>`
  - Empty string: `""`

Each test calls `containsVideoMarkers(body)` and asserts the result. In real usage the input is already lowercased, so use lowercase markers in the test bodies to match that behavior.
`TestIsDirectVideoContentType` - table-driven test covering:

- Positive: `video/mp4`, `video/webm`, `application/x-mpegurl`, `application/vnd.apple.mpegurl`, `application/dash+xml`, `video/mp4; charset=utf-8` (with params)
- Negative: `text/html`, `application/json`, `image/png`, `text/plain`, empty string

Each test calls `isDirectVideoContentType(ct)` and asserts the result.

Use standard `testing` package only. Follow existing test conventions in the codebase (table-driven tests with `t.Run`).
Run `cd /Users/viktorbarzin/code/infra/modules/kubernetes/f1-stream/files && go test ./internal/scraper/ -v -run "TestContainsVideoMarkers|TestIsDirectVideoContentType"` -- all tests pass.
`validate_test.go` exists with `TestContainsVideoMarkers` (8+ positive, 4+ negative cases) and `TestIsDirectVideoContentType` (6+ positive, 5+ negative cases). All tests pass.
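A table-driven test in the style requested could be sketched as follows, shown for `TestIsDirectVideoContentType`. The sketch is self-contained for illustration: it carries its own copy of `isDirectVideoContentType` and uses `package main`, whereas the planned `validate_test.go` would live in package `scraper` and exercise the function from `validate.go`. The case names and the `contentTypeCases` variable are illustrative choices.

```go
package main

import (
	"strings"
	"testing"
)

// Copy of the function under test so the sketch stands alone.
var videoContentTypes = []string{
	"video/", "application/x-mpegurl", "application/vnd.apple.mpegurl", "application/dash+xml",
}

func isDirectVideoContentType(ct string) bool {
	ct = strings.ToLower(ct)
	for _, want := range videoContentTypes {
		if strings.Contains(ct, want) {
			return true
		}
	}
	return false
}

// contentTypeCases mirrors the positive and negative cases listed in the plan.
var contentTypeCases = []struct {
	name string
	ct   string
	want bool
}{
	{"mp4", "video/mp4", true},
	{"webm", "video/webm", true},
	{"hls x-mpegurl", "application/x-mpegurl", true},
	{"hls apple", "application/vnd.apple.mpegurl", true},
	{"dash", "application/dash+xml", true},
	{"mp4 with params", "video/mp4; charset=utf-8", true},
	{"html", "text/html", false},
	{"json", "application/json", false},
	{"png", "image/png", false},
	{"plain text", "text/plain", false},
	{"empty", "", false},
}

func TestIsDirectVideoContentType(t *testing.T) {
	for _, tc := range contentTypeCases {
		t.Run(tc.name, func(t *testing.T) {
			if got := isDirectVideoContentType(tc.ct); got != tc.want {
				t.Errorf("isDirectVideoContentType(%q) = %v, want %v", tc.ct, got, tc.want)
			}
		})
	}
}
```

Keeping the cases in a named slice makes it easy to add a regression case later without touching the loop; `TestContainsVideoMarkers` can follow the same pattern with HTML bodies in place of Content-Type strings.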
<success_criteria>
- The scraper pipeline compiles and all tests pass
- `validateLinks` is called in `scrape()` after URL extraction but before merge, filtering out URLs without video markers
- The validation timeout is configurable via `SCRAPER_VALIDATE_TIMEOUT` env var (default 10s)
- Existing F1 keyword filtering behavior is preserved unchanged
- No new external dependencies are introduced (stdlib only)
</success_criteria>