feat(meet-kevin): caption extractor via yt-dlp
- Implement CaptionResult frozen dataclass for structured caption data
- Add parse_srt() to parse SubRip format with flexible timestamp handling
- Add extract_captions() async function using yt-dlp subprocess wrapper
- Prefer manual captions over auto-generated; clean up SRT files after parsing
- Add 16 tests covering edge cases (empty input, malformed SRT, timestamp variations, language extraction, manual vs auto selection)
- Type-safe implementation with full mypy --strict compliance
- Add sample.srt fixture with 3 segments mentioning NVDA for test reference
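The parsing step described in the message can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the `Segment` dataclass, its fields, and the exact tolerance rules (comma or dot before the milliseconds, 1 to 3 millisecond digits, CRLF line endings) are assumptions, and the real `CaptionResult` and `parse_srt()` signatures may differ.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumed "flexible" timestamp: HH:MM:SS,mmm or HH:MM:SS.mmm,
# with any number of hour digits and 1-3 millisecond digits.
_TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{1,3})")


@dataclass(frozen=True)
class Segment:
    """Hypothetical stand-in for one cue; not the commit's CaptionResult."""

    start: float
    end: float
    text: str


def _to_seconds(stamp: str) -> Optional[float]:
    """Convert one timestamp to seconds, or return None if it is malformed."""
    match = _TIMESTAMP.match(stamp.strip())
    if match is None:
        return None
    hours, minutes, seconds, millis = match.groups()
    # Treat the digits after the separator as a fraction: "5" means 500 ms.
    fraction = int(millis.ljust(3, "0")) / 1000
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + fraction


def parse_srt(content: str) -> List[Segment]:
    """Parse SubRip text into segments, skipping blocks that are malformed."""
    segments: List[Segment] = []
    normalized = content.replace("\r\n", "\n").strip()
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = [line.strip() for line in block.split("\n") if line.strip()]
        # The timing line is the first one containing the arrow; the numeric
        # index line before it is optional in this sketch.
        timing_index = next(
            (i for i, line in enumerate(lines) if "-->" in line), None
        )
        if timing_index is None:
            continue
        parts = lines[timing_index].split("-->")
        if len(parts) != 2:
            continue
        start = _to_seconds(parts[0])
        end = _to_seconds(parts[1])
        text = " ".join(lines[timing_index + 1:])
        if start is None or end is None or not text:
            continue
        segments.append(Segment(start=start, end=end, text=text))
    return segments
```

Returning an empty list for empty or fully malformed input, rather than raising, is one way to keep the caller simple; whether the commit's parser makes the same choice is not stated in the message.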
parent 8ce3ede09c
commit 145f7dbec5
3 changed files with 589 additions and 0 deletions
11
tests/fixtures/sample.srt
vendored
Normal file
@@ -0,0 +1,11 @@
+1
+00:00:01,000 --> 00:00:04,500
+Welcome back to Meet Kevin
+
+2
+00:00:04,500 --> 00:00:09,000
+Today we are talking about NVDA and AMD earnings
+
+3
+00:00:09,000 --> 00:00:14,250
+You will want to watch this until the end
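The shape of this fixture (11 lines, 3 cues, an NVDA mention in the second cue) can be checked without going through the parser at all. The sketch below inlines the fixture text as a string so that it is self-contained; the commit's tests presumably read tests/fixtures/sample.srt from disk instead, and `split_cues` is a hypothetical helper, not a function from this commit.

```python
from typing import List

# Fixture content copied from the diff above, inlined for illustration.
SAMPLE_SRT = """\
1
00:00:01,000 --> 00:00:04,500
Welcome back to Meet Kevin

2
00:00:04,500 --> 00:00:09,000
Today we are talking about NVDA and AMD earnings

3
00:00:09,000 --> 00:00:14,250
You will want to watch this until the end
"""


def split_cues(content: str) -> List[List[str]]:
    """Split SRT text into cues, each a list of [index, timing, text...] lines."""
    blocks = content.strip().split("\n\n")
    return [block.splitlines() for block in blocks if block.strip()]


cues = split_cues(SAMPLE_SRT)
# Each cue here has exactly three lines: index, timing, caption text.
mentions_nvda = [cue[0] for cue in cues if "NVDA" in " ".join(cue[2:])]
```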