Design for reviving the trading-bot K8s stack with a new Meet Kevin YouTube watcher pipeline. v1 scope: poll RSS every 3h, extract captions via yt-dlp, run transcript through Claude Sonnet 4.6 for structured per-ticker recommendations and market outlook, surface in a new ticker-centric UI under /meet-kevin/*. Bot integration (signal_generator) and auto-trading deferred to v2. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
20 KiB
Meet Kevin Revival — Design
Overview
Revive the trading bot from its current fully-disabled state and add a new pipeline that watches the Meet Kevin (Kevin Paffrath) YouTube channel, extracts captions from every new upload, runs the transcript through Claude Sonnet 4.6 to infer per-ticker recommendations and overall market outlook, and surfaces the results in a ticker-centric UI inside the existing dashboard. Bot integration (feeding the recommendations into signal_generator) is explicitly v2; auto-trading stays off in v1.
Goals
- Subscribe to the Meet Kevin channel and process each new upload end-to-end without human intervention.
- Present sentiment and recommendations keyed on the stock ticker (the user's stated primary need), with the video feed as a secondary surface.
- Revive the bot in K8s with the lowest reasonable footprint — keep the scaffolding warm, leave the resource-heavy scrapers and auto-trading off.
Scope
In scope (v1)
- New
services/meet_kevin_watcher/Python service insideviktor/trading. - 5 new Postgres tables via a single Alembic migration.
/api/meet-kevin/*routes added to the existingapi_gateway.- 5 new dashboard pages under
/meet-kevin/*. - Terraform revival of
infra/stacks/trading-bot/with 3 services held atreplicas=0(news-fetcher, sentiment-analyzer, trade-executor) and the new watcher container added to the workers Deployment. - 1 new pyproject extras group (
meet_kevin) pinning a currentyt-dlp.
Out of scope (v2 or later)
- Signal injection into
signal_generator(will be wired via a newkevin:signalsRedis stream + a "kevin" strategy weight). - Multi-channel support in the UI (schema is multi-channel-ready, surface is single-channel).
- Audio archival or local Whisper fallback (captions-only).
- Slack notification on high-conviction signals.
- Auto-reanalysis when the LLM prompt version changes (manual "Reprocess" only).
Architecture
┌────────────────────────── viktor/trading (revived K8s stack) ──────────────────────────────┐
│ │
│ NEW services/meet_kevin_watcher/ single async service, sequential stages │
│ ├── rss_poller.py every 3h, GET YouTube /feeds/videos.xml, dedupe │
│ ├── caption_extractor.py yt-dlp --write-auto-sub --skip-download │
│ ├── llm_analyzer.py Claude Sonnet 4.6, structured JSON output │
│ └── main.py orchestrates 3 stages, writes Postgres │
│ │
│ EXTEND │
│ services/api_gateway/ +1 router meet_kevin.py + 1 schemas module │
│ dashboard/ +1 page group /meet-kevin/* + components │
│ shared/models/ +kevin_channel, kevin_video, kevin_transcript, │
│ kevin_analysis, kevin_stock_mention │
│ shared/schemas/ +meet_kevin.py │
│ alembic/ +1 migration with 5 tables │
│ │
│ DISABLED (container removed from `trading-bot-workers` Pod): │
│ news_fetcher, sentiment_analyzer, trade_executor │
│ ACTIVE dashboard, api_gateway, (frontend Pod)│
│ signal_generator, learning_engine, market_data, (workers Pod) │
│ meet_kevin_watcher (NEW) (workers Pod) │
│ │
└────────────────────────────────────────────────────────────────────────────────────────────┘
Key architectural choices
- No Redis streams in v1. With ~3 videos/day the coordination cost of streams would dwarf the value. The watcher walks each video through all three stages and writes directly to Postgres. A
kevin:signalsstream gets added in v2 when wiring intosignal_generator. - Single long-running async service (matching the
news_fetcherpattern). Reuses the sameshared.config.BaseConfig,shared.db,shared.telemetrysetup as the other services. - K8s revival is a single Terraform edit. No new namespace, no new PVC, no new ingress — transcripts live in Postgres, the embedded YouTube iframe handles video display, no file storage needed.
Channel
- Channel: Meet Kevin (Kevin Paffrath)
- YouTube channel ID:
UCUvvj5lwue7PspotMDjk5UA - Feed:
https://www.youtube.com/feeds/videos.xml?channel_id=UCUvvj5lwue7PspotMDjk5UA - Verified 2026-05-21: feed parses (Atom), returns 15 most recent uploads, ~2–3 videos/day.
- Multi-channel support is schema-ready; only this one channel is wired in v1.
Data model
New Postgres tables (single Alembic migration). All id columns are bigserial; all timestamp columns are timestamptz NOT NULL DEFAULT now().
kevin_channels
id, youtube_channel_id (unique), title,
poll_enabled (bool, default true), poll_interval_seconds (int, default 10800),
daily_cost_cap_usd (numeric(8,2), default 5.00),
last_polled_at (nullable), created_at, updated_at
kevin_videos
id, channel_id (FK kevin_channels.id), youtube_video_id (unique, indexed),
title, description, published_at (indexed), duration_seconds (nullable),
thumbnail_url, status (enum {discovered, captioned, analyzed, failed, skipped}),
failure_reason (nullable), retry_count (int, default 0),
processed_at (nullable), created_at, updated_at
kevin_transcripts
id, video_id (FK kevin_videos.id, unique), source (enum {captions_manual, captions_auto, none}),
language (varchar(8)), raw_text (text), segments_json (jsonb), word_count, created_at
kevin_analyses -- one row per LLM run; supports re-runs
id, video_id (FK kevin_videos.id, indexed), model, prompt_version,
market_outlook_direction (enum {bullish, neutral, bearish, mixed}),
market_outlook_reasoning (text),
macro_themes_json (jsonb), -- array of strings
key_risks_json (jsonb), -- array of strings
summary (text), raw_response_json (jsonb),
prompt_tokens, completion_tokens, cost_usd (numeric(10,4)),
created_at
kevin_stock_mentions -- denormalized per-ticker
id, video_id (FK kevin_videos.id, indexed),
analysis_id (FK kevin_analyses.id),
symbol (varchar(16), indexed), action (enum {buy, sell, hold, watch, avoid}),
conviction (numeric(4,3)), -- 0.000 to 1.000
time_horizon (enum {intraday, days, weeks, months, long_term, unspecified}),
rationale_quote (text),
video_timestamp_seconds (int, nullable), -- deep-link target
created_at
Indexes: (symbol, created_at desc) on kevin_stock_mentions (ticker timeline queries), (published_at desc) on kevin_videos (video feed), (status) partial index on kevin_videos WHERE status IN ('discovered', 'captioned') (pipeline scan).
Pipeline
Three stages, run sequentially per video by a single async loop.
Stage 1 — RSS poll (every 3h)
- For each
kevin_channels.poll_enabled = truerow, GET/feeds/videos.xml?channel_id=<id>. - Parse the Atom feed with
feedparser. - For each entry,
INSERT … ON CONFLICT (youtube_video_id) DO NOTHINGintokevin_videoswithstatus = 'discovered'. - Update
kevin_channels.last_polled_at. - Failure: log warning, increment metric, retry on next 3h cycle. No DLQ — RSS replays the same 15 entries on every poll.
First-run backfill: the feed returns the 15 most recent uploads at once, so the first poll inserts up to 15 discovered rows. The pipeline processes them sequentially (~60s each, ~$1.50 in Claude calls total). One-off burst, contained by the daily cost cap.
Stage 2 — Caption extraction
For each video with status = 'discovered', in published_at ascending order:
yt-dlp --write-auto-sub --write-sub --sub-lang 'en.*' --skip-download --convert-subs srt -o '<workdir>/<vid>.%(ext)s' 'https://www.youtube.com/watch?v=<vid>'- If the SRT file exists, parse into segments
[{start, end, text}]and a flatraw_text. - Insert
kevin_transcriptsrow, advancekevin_videos.statustocaptioned. - Cleanup the workdir.
Failure modes:
- No captions available →
status = 'failed',failure_reason = 'no_captions'. Surfaces in the dashboard with a "Reprocess" button (no-op for now; meaningful when we add Whisper fallback in v2). - yt-dlp bot-detection / network → increment
retry_count, exp-backoff (1m, 5m, 15m, 1h, 6h), keepstatus = 'discovered'. After 5 retries →failed.
yt-dlp version: the system's apt-installed yt-dlp (2024.04.09) trips YouTube's bot-detection. The Dockerfile pip-installs a current PyPI release (pinned >=2026.04).
Stage 3 — LLM analysis
For each video with status = 'captioned':
- Check daily cost:
SELECT SUM(cost_usd) FROM kevin_analyses WHERE created_at >= date_trunc('day', now() at time zone 'UTC'). If>= daily_cost_cap_usd, log + skip (video remainscaptioned, resumes tomorrow). - Build the Claude Sonnet 4.6 request:
- System prompt (cached via Anthropic prompt caching — 2–3K tokens describing schema, examples, output format)
- User message:
published_at,title,description, plus the full transcript with segment timestamps preserved - Tool definition that constrains response to the structured JSON schema below
- Call the API. On HTTP error, exp-backoff retry up to 3x, then
failed. - Validate the tool-input JSON via Pydantic. On parse error,
failedwithraw_response_jsoncaptured for debugging. - Insert one
kevin_analysesrow + Nkevin_stock_mentionsrows in a single transaction. Advancekevin_videos.statustoanalyzed.
LLM output schema (Pydantic, also the tool-input schema)
class MeetKevinTickerMention(BaseModel):
symbol: str # uppercased, validated [A-Z]{1,6}
action: Literal["buy","sell","hold","watch","avoid"]
conviction: float # 0.0–1.0
time_horizon: Literal["intraday","days","weeks","months","long_term","unspecified"]
rationale_quote: str # short verbatim or paraphrased quote
video_timestamp_seconds: int | None # deep-link target
class MeetKevinAnalysis(BaseModel):
market_outlook_direction: Literal["bullish","neutral","bearish","mixed"]
market_outlook_reasoning: str
macro_themes: list[str]
key_risks: list[str]
summary: str # ~200 words
tickers: list[MeetKevinTickerMention]
Cost & rate
- ~3 videos/day × ~7–15K input tokens × Sonnet 4.6 → ~$0.07–0.10 per video
- Prompt cache: the 2–3K-token system prompt is reused across videos in a 5-min window → ~95% cache hit on the system portion
- Default daily cap $5/day, configurable per channel via
daily_cost_cap_usd - Expected monthly spend: $3–10
API surface
All routes added to services/api_gateway/routes/meet_kevin.py. Behind the existing JWT/passkey auth.
GET /api/meet-kevin/health
GET /api/meet-kevin/channels
PATCH /api/meet-kevin/channels/{id}
GET /api/meet-kevin/videos ?status=&from=&to=&q=&page=&limit=
GET /api/meet-kevin/videos/{id}
GET /api/meet-kevin/videos/{id}/transcript
POST /api/meet-kevin/videos/{id}/reprocess ?stage={captions|analysis}
GET /api/meet-kevin/stocks
GET /api/meet-kevin/stocks/{symbol}
GET /api/meet-kevin/stocks/{symbol}/timeline ?bucket=day|week
GET /api/meet-kevin/dashboard home aggregate
Response schemas live in shared/schemas/meet_kevin.py so both backend and frontend (via codegen or hand-written TS interfaces) consume the same shape.
Dashboard pages
New page group under dashboard/src/pages/meet-kevin/ with React Router routes. Sidebar gains a "Meet Kevin" section.
| Route | Purpose | Key components |
|---|---|---|
/meet-kevin |
Home — latest video card, top-conviction tickers this week, outlook trendline (14d), pipeline status pill | LatestVideoHero, TopConvictionTable, OutlookTrendChart |
/meet-kevin/videos |
Feed with filters (status, date range, ticker search) | VideoFeed, VideoCard, FilterBar |
/meet-kevin/videos/:id |
Detail — embedded YouTube iframe + tabs (Analysis / Transcript / Raw) | YouTubeEmbed, AnalysisPanel, TranscriptPanel, TickerCard |
/meet-kevin/stocks |
Primary surface — sortable table: symbol, mention count, last seen, latest action, avg conviction, 14d sparkline | StocksTable, Sparkline |
/meet-kevin/stocks/:symbol |
Drill-down — sentiment trendline + chronological mention list + (optional) market-data price overlay | TickerHeader, SentimentTrendChart, MentionList |
Stack reuse (all already in dashboard/package.json): TanStack Query, Tailwind 4, recharts, lightweight-charts (for the optional price overlay), React Router 7. No new top-level dependencies.
Empty states: home shows "Waiting for first poll" with a manual-trigger button. Stocks list shows "No analyses yet" until Claude returns first response.
Frontend stack note: ~/.claude/CLAUDE.md defaults new web apps to Svelte; this is extending the existing React 19 dashboard, so React stays. Documented for traceability.
K8s revival changes
Single edit to ~/code/infra/stacks/trading-bot/main.tf (file is currently inside a /* … */ block-comment, disabled 2026-04-06):
-
Uncomment the entire
/* … */block. -
In the
trading-bot-workersDeployment remove the 3 disabled containers from the Pod spec:news-fetcher,sentiment-analyzer,trade-executor. (Note: all 6 workers share one Pod, so "disable" means delete the container block — there is no per-containerreplicasknob.) -
Add a new
meet-kevin-watchercontainer to the same Pod spec:container { name = "meet-kevin-watcher" image = "viktorbarzin/trading-bot-service:latest" image_pull_policy = "Always" command = ["python", "-m", "services.meet_kevin_watcher.main"] dynamic "env" { for_each = local.common_env content { name = env.key, value = env.value } } env { name = "TRADING_OTEL_METRICS_PORT", value = "9097" } env_from { secret_ref { name = "trading-bot-secrets" } } env_from { secret_ref { name = "trading-bot-db-creds" } } resources { requests = { cpu = "10m", memory = "128Mi" } limits = { memory = "256Mi" } } } -
Final container order in the
trading-bot-workersPod (4 containers):[0]signal-generator[1]learning-engine[2]market-data[3]meet-kevin-watcher (NEW)
Update
lifecycle.ignore_changesaccordingly: 4 image-index lines (spec[0].template[0].spec[0].container[0..3].image) plus the existingdns_configline. -
Extend the
external_secretExternalSecret template with two new keys:TRADING_ANTHROPIC_API_KEY = "{{ .anthropic_api_key }}" TRADING_MEET_KEVIN_CHANNEL_ID = "{{ .meet_kevin_channel_id }}"and add the matching
dataentries. -
Extend
locals.common_env:TRADING_MEET_KEVIN_POLL_INTERVAL_SECONDS = "10800" # 3h TRADING_MEET_KEVIN_DAILY_COST_CAP_USD = "5" TRADING_MEET_KEVIN_LLM_MODEL = "claude-sonnet-4-6" TRADING_MEET_KEVIN_PROMPT_VERSION = "v1"
No Helm changes, no ingress changes, no new namespace, no new PVC, no new Service.
Vault secrets
Two new keys at secret/trading-bot (Vault KV v2):
anthropic_api_key = sk-ant-... # Claude API key (pay-per-use)
meet_kevin_channel_id = UCUvvj5lwue7PspotMDjk5UA
Populate via:
vault kv patch secret/trading-bot \
anthropic_api_key=sk-ant-... \
meet_kevin_channel_id=UCUvvj5lwue7PspotMDjk5UA
The existing ExternalSecret refresh interval (15m) picks them up.
Container & dependency changes
pyproject.toml — new optional dependency group:
meet_kevin = [
"yt-dlp>=2026.04", # avoid the apt-shipped 2024.04.09 (bot-detected by YouTube)
"feedparser>=6.0",
"anthropic>=0.40",
]
docker/Dockerfile.service — add EXTRAS=meet_kevin to the install matrix so the watcher container has its deps.
Observability
Reuses the existing OpenTelemetry → Prometheus pattern. New metrics (port 9097):
meet_kevin_videos_discovered_total{channel}
meet_kevin_captions_extracted_total{result="ok|no_captions|error"}
meet_kevin_llm_calls_total{result="ok|api_error|parse_error|cost_capped"}
meet_kevin_llm_cost_usd_total # counter
meet_kevin_llm_cost_usd_today # gauge, resets at UTC midnight
meet_kevin_pipeline_lag_seconds{stage} # published_at → stage transition
Existing pod-annotation scrape config auto-discovers the new port.
Failure modes (consolidated)
| Stage | Failure | Behavior | User-visible signal |
|---|---|---|---|
| RSS | network / 5xx | log + retry next 3h | /health flags last_poll_age > 6h |
| RSS | XML parse error | log + skip cycle | same |
| Caption | no captions | failed / no_captions |
"Reprocess" button (no-op v1) |
| Caption | yt-dlp / network | exp-backoff to 5x → failed |
per-row "Reprocess" |
| LLM | API error | exp-backoff 3x → failed (raw response captured) |
"Reprocess"; Raw JSON tab |
| LLM | malformed JSON | failed, raw response retained |
Raw JSON tab |
| LLM | daily cost cap hit | stop processing, leave captioned |
pipeline-status badge: "Cost cap reached, resumes 00:00 UTC" |
Testing strategy
Match the existing test-quality bar (1 behavior per test, no internal mocking of own code).
tests/services/meet_kevin_watcher/
test_rss_parser.py fixture: committed Meet Kevin RSS XML; parse + dedupe paths
test_caption_extractor.py mock yt-dlp subprocess; SRT parse, no_captions path
test_llm_analyzer.py mock anthropic.Anthropic.messages.create; prompt structure,
response validation, cost calc, daily cap logic
test_pipeline.py status transitions, retry, failure paths
(pytest-asyncio + in-memory SQLite where viable)
tests/integration/test_meet_kevin_e2e.py [@pytest.mark.integration]
full Postgres + Redis; mock the external HTTP layer only (RSS, yt-dlp, Anthropic);
assert discovered → captioned → analyzed flow + ticker mentions row count
tests/api_gateway/routes/test_meet_kevin.py
FastAPI TestClient per endpoint, auth bypass via existing dev-mode flag
dashboard/src/pages/meet-kevin/__tests__/ (manual QA only — repo has no FE test suite)
Target: ~30 unit tests + 1 integration test + 8 API tests.
Future work (v2 candidates)
- Signal injection into
signal_generator: newkevin:signalsRedis stream, new "kevin" strategy weight in the ensemble, opt-in approval queue in the UI. - Whisper API fallback when captions are missing — currently
no_captionsis terminal. - Multi-channel support in the UI (schema already supports it).
- Slack notification on new high-conviction mention (
conviction > 0.8). - Auto re-analysis on prompt version bump — currently manual via "Reprocess" button.
- Audio archival to NFS for offline reprocessing or training data.