# Trading Bot Design

## Overview

An automated stock trading bot that combines news sentiment analysis with technical strategies to make buy/sell decisions on US equities. The system continuously learns from its own trade outcomes to improve decision-making without human supervision.

## Constraints & Decisions

| Decision | Choice |
|----------|--------|
| Markets | US equities only |
| Initial mode | Paper trading (Alpaca sandbox) |
| Trading style | Adaptive (both swing and day trading) |
| News sources | Free only: RSS, Reddit, Twitter/X |
| Sentiment analysis | Tiered: FinBERT (local) + Ollama (local LLM for ambiguous cases) |
| Tech stack | Python (FastAPI) backend + React (TypeScript) frontend |
| Learning approach | Phase 1: strategy weight adjustment. Phase 2: deep RL |
| Deployment | Docker Compose (k8s later) |
| Database | PostgreSQL + TimescaleDB |
| Message broker | Redis Streams |
| Primary brokerage | Alpaca (with abstraction layer for future swaps) |

## Architecture

Event-driven microservices communicating via Redis Streams.

### Services

1. **News Fetcher** - Polls free news sources on a schedule:
   - RSS feeds (Yahoo Finance, Reuters, MarketWatch, SEC filings)
   - Reddit (r/wallstreetbets, r/stocks, r/investing via PRAW)
   - Twitter/X financial accounts
   - Publishes raw articles to Redis Stream `news:raw`

2. **Sentiment Analyzer** - Consumes `news:raw`, produces scored sentiment:
   - Tier 1: FinBERT model (runs locally) for fast scoring
   - Tier 2: Ollama (Mistral/Llama 3) for articles where FinBERT confidence < threshold
   - Extracts: ticker mentions, sentiment score (-1 to +1), confidence, key entities
   - Publishes to `news:scored`

3. **Signal Generator** - Combines sentiment with technical indicators:
   - Consumes `news:scored` and market data (from Alpaca WebSocket)
   - Applies strategy rules (momentum, mean reversion, news-driven)
   - Weighted ensemble combines signals (weights adjusted by learning engine)
   - Publishes trade signals to `signals:generated`

4. **Trade Executor** - Executes trades via brokerage API:
   - Consumes `signals:generated`
   - Risk management: position sizing, stop-losses, max portfolio exposure
   - Interfaces with Alpaca through an abstraction layer
   - Records all trades to PostgreSQL
   - Publishes execution results to `trades:executed`

5. **Learning Engine** - The continuous feedback loop:
   - Consumes `trades:executed`
   - Tracks trade outcomes over configurable time windows
   - Computes reward signals: realized P&L, risk-adjusted return (Sharpe), max drawdown
   - Adjusts strategy weights based on which strategies are profitable
   - Stores performance metrics in TimescaleDB

6. **API Gateway (FastAPI)** - Serves the dashboard:
   - Portfolio state, trade history, signal history
   - Performance metrics (ROI, Sharpe, win rate, drawdown)
   - Real-time WebSocket for live updates
   - Manual override endpoints (pause trading, force close position)

7. **Dashboard (React/TypeScript)** - The UI

### Data Flow

```
RSS/Reddit/X -> News Fetcher -> [news:raw] -> Sentiment Analyzer -> [news:scored]
                                                                          |
Alpaca WebSocket -> Signal Generator <-- [news:scored]
                          |
                  [signals:generated]
                          |
                  Trade Executor -> Alpaca API
                          |
                  [trades:executed]
                          |
                  Learning Engine -> adjusts Signal Generator weights
                          |
                  TimescaleDB (metrics) + PostgreSQL (trades, news)
                          |
                  API Gateway -> Dashboard
```

## Data Model

### PostgreSQL Tables

**Core trading:**
- `trades` - ticker, side, qty, price, timestamp, strategy_id, signal_id, status, pnl
- `positions` - ticker, qty, avg_entry, unrealized_pnl, stop_loss
- `signals` - ticker, direction, strength, strategy_sources, sentiment_score, acted_on
- `strategies` - name, description, current weight, active flag
- `strategy_weights_history` - strategy_id, old_weight, new_weight, timestamp

**News & sentiment:**
- `articles` - source, url, title, published_at, fetched_at, content_hash
- `article_sentiments` - article_id, ticker, score, confidence, model_used

**Learning:**
- `trade_outcomes` - trade_id, hold_duration, realized_pnl, roi_pct, was_profitable
- `learning_adjustments` - strategy_id, old_weight, new_weight, reason, reward_signal, timestamp

### TimescaleDB Hypertables

- `market_data` - OHLCV bars partitioned by time
- `portfolio_snapshots` - periodic portfolio value snapshots
- `strategy_metrics` - per-strategy performance over time

### Redis

- Streams: `news:raw`, `news:scored`, `signals:generated`, `trades:executed`
- Cache: current positions, latest prices, strategy weights

## Learning Loop (Phase 1)

Multi-armed bandit style strategy weight adjustment:

1. **Track**: Tag every trade with contributing strategies and their weights
2. **Evaluate**: After position close, compute realized P&L, ROI, risk-adjusted return
3. **Attribute**: Distribute credit/blame proportionally to signal strength
4. **Adjust**: Update weights via exponential moving average: `new_weight = (1 - lr) * old_weight + lr * reward_signal`
5. **Log**: Record every adjustment for auditability

**Guardrails:**
- Minimum 20 trades per strategy before adjustments
- Max 10% weight shift per adjustment cycle
- Weight floor of 0.05 (no strategy permanently silenced)
- Weights normalized to sum to 1.0
- Decay factor for recency bias

## Backtesting Engine

Runs as a standalone process (or Docker container triggered from dashboard):

- Replays historical market data from TimescaleDB
- Replays historical news sentiment from articles/sentiments tables
- Shares code with live system (same strategies, same signal generation)
- Uses simulated executor (not Alpaca)
- Configurable: date range, initial capital, commission model, slippage, strategy weights
- Output: trade log, equity curve, Sharpe, max drawdown, win rate, per-strategy attribution

## Dashboard Views

1. **Portfolio Overview** - Value, daily P&L, equity curve, open positions, key metrics
2. **Trade Log** - All trades with expandable detail (news context, strategies, ROI, profitable badge)
3. **Strategy Performance** - Per-strategy metrics, weight allocation, weight history, adjustments log
4. **News & Sentiment Feed** - Live processed articles, ticker filtering, correlation view
5. **Backtesting** - Config form, results dashboard, multi-run comparison

Real-time updates via WebSocket. Toast notifications for trade executions.

## Project Structure

```
trading-bot/
├── services/
│   ├── news-fetcher/
│   ├── sentiment-analyzer/
│   ├── signal-generator/
│   ├── trade-executor/
│   ├── learning-engine/
│   └── api-gateway/
├── dashboard/
├── shared/
│   ├── models/          # SQLAlchemy models
│   ├── schemas/         # Pydantic schemas
│   ├── broker/          # Brokerage abstraction layer
│   └── strategies/      # Strategy implementations
├── backtester/
├── docker/              # Dockerfiles per service
├── docker-compose.yml   # Full stack orchestration
├── tests/
├── alembic/             # Database migrations
└── docs/plans/
```

## Key Libraries

**Backend:** FastAPI, SQLAlchemy (async), alpaca-py, transformers (FinBERT), praw, redis-py, websockets, ollama-python, py-webauthn

**Frontend:** React 18, TypeScript, TanStack Query, Recharts or TradingView lightweight-charts, Tailwind CSS, @simplewebauthn/browser

**ML:** transformers (FinBERT), ollama, numpy, pandas

**Observability:** opentelemetry-sdk, opentelemetry-exporter-prometheus, opentelemetry-instrumentation-fastapi

**Testing:** pytest, pytest-asyncio, React Testing Library

## Brokerage Choice

Alpaca is the primary brokerage:

- Free API, commission-free US equity trading
- Built-in paper trading (sandbox environment)
- Official Python SDK (alpaca-py): market/limit/stop/trailing-stop orders
- REST + WebSocket + SSE endpoints
- Built-in NewsClient for financial news
- 200 requests/minute rate limit
- $0 minimum balance

An abstraction layer in `shared/broker/` allows adding other brokerages (Interactive Brokers, Tradier) later without changing strategy or execution logic.

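
One way the brokerage abstraction layer could look is sketched below. This is a minimal illustration, not the project's actual interface: the names `Broker`, `OrderRequest`, `Fill`, and `SimulatedBroker` are hypothetical, and no real alpaca-py calls are shown. The in-memory implementation stands in for the kind of simulated executor the backtester would use; an Alpaca-backed class would implement the same two methods.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class OrderRequest:
    """Hypothetical broker-agnostic order description."""
    ticker: str
    side: str  # "buy" or "sell"
    qty: float


@dataclass(frozen=True)
class Fill:
    """Hypothetical execution result returned by any broker."""
    ticker: str
    side: str
    qty: float
    price: float


class Broker(ABC):
    """Interface the Trade Executor would depend on (assumed shape)."""

    @abstractmethod
    def submit_order(self, order: OrderRequest) -> Fill:
        ...

    @abstractmethod
    def get_positions(self) -> Dict[str, float]:
        ...


class SimulatedBroker(Broker):
    """In-memory broker that fills every order at a fixed quoted price."""

    def __init__(self, prices: Dict[str, float]) -> None:
        self._prices = dict(prices)
        self._positions: Dict[str, float] = {}

    def submit_order(self, order: OrderRequest) -> Fill:
        if order.side not in ("buy", "sell"):
            raise ValueError(f"unknown side: {order.side!r}")
        price = self._prices[order.ticker]  # KeyError if ticker is unknown
        signed_qty = order.qty if order.side == "buy" else -order.qty
        current = self._positions.get(order.ticker, 0.0)
        self._positions[order.ticker] = current + signed_qty
        return Fill(order.ticker, order.side, order.qty, price)

    def get_positions(self) -> Dict[str, float]:
        # Only report tickers with a non-zero net quantity.
        return {t: q for t, q in self._positions.items() if q != 0}


# Usage: executor code only ever sees the Broker interface.
broker: Broker = SimulatedBroker({"AAPL": 190.0})
fill = broker.submit_order(OrderRequest("AAPL", "buy", 10))
```

Because strategy and execution code would depend only on `Broker`, swapping Alpaca for another brokerage would mean adding one more subclass rather than touching callers.
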
## Observability

All Python services are instrumented with OpenTelemetry:

- **Metrics**: `opentelemetry-sdk` + `opentelemetry-exporter-prometheus` expose a `/metrics` endpoint on each service for external Prometheus scraping
- **Traces**: Distributed tracing across services via OpenTelemetry (OTLP export to any collector)
- **Key metrics per service**:
  - News Fetcher: articles fetched/min, source error rates, fetch latency
  - Sentiment Analyzer: articles scored/min, FinBERT vs Ollama routing ratio, inference latency
  - Signal Generator: signals generated/min, per-strategy signal counts
  - Trade Executor: trades executed/min, order fill latency, rejection rate
  - Learning Engine: adjustment frequency, weight drift magnitude
  - API Gateway: request rate, response latency (p50/p95/p99), error rate
- **Business metrics** exposed via API Gateway `/metrics`:
  - Portfolio value, daily P&L, drawdown, Sharpe ratio
  - Per-strategy win rate and weight allocation

No Prometheus or Grafana deployed; the existing infrastructure handles scraping and visualization.

## Authentication & Security

Passkey-based authentication (WebAuthn/FIDO2):

- **Sign up**: User registers with username + passkey (biometric/security key). Server stores credential public key in PostgreSQL `users` / `user_credentials` tables
- **Sign in**: Challenge-response via WebAuthn API. No passwords stored or transmitted
- **Session management**: JWT issued after successful passkey authentication, short-lived access token + refresh token
- **API Gateway middleware**: All API endpoints require valid JWT (except `/auth/*` routes and `/metrics`)
- **CORS**: Restricted to dashboard origin
- **Secrets**: Alpaca API keys and signing keys stored as environment variables via `.env` files, never committed to git

### Additional Tables

- `users` - id, username, display_name, created_at
- `user_credentials` - id, user_id, credential_id, public_key, sign_count, created_at

## Docker Compose Services

- `news-fetcher` - Python service
- `sentiment-analyzer` - Python service (needs GPU access or CPU for FinBERT)
- `signal-generator` - Python service
- `trade-executor` - Python service
- `learning-engine` - Python service
- `api-gateway` - Python (FastAPI) service
- `dashboard` - Node/nginx serving React build
- `postgres` - PostgreSQL 16 + TimescaleDB extension
- `redis` - Redis 7 with Streams
- `ollama` - Ollama server for local LLM inference
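
Returning to the Phase 1 learning loop that the `learning-engine` service would run, the update rule and guardrails could be sketched roughly as follows. Several details are assumptions rather than decisions from this design: the function name `update_weights` is hypothetical, `reward_signal` is assumed to be pre-scaled to the same 0-to-1 range as the weights, the "10%" cap is read as an absolute change of 0.10, and the recency decay factor is left out.

```python
from typing import Dict

MIN_TRADES = 20      # guardrail: minimum trades before a strategy is adjusted
MAX_SHIFT = 0.10     # assumption: "10%" interpreted as an absolute shift of 0.10
WEIGHT_FLOOR = 0.05  # guardrail: no strategy is permanently silenced


def update_weights(
    weights: Dict[str, float],
    rewards: Dict[str, float],
    trade_counts: Dict[str, int],
    lr: float = 0.1,
) -> Dict[str, float]:
    """Apply one adjustment cycle and return normalized weights."""
    updated: Dict[str, float] = {}
    for name, old in weights.items():
        # Leave strategies alone until they have enough evidence.
        if name not in rewards or trade_counts.get(name, 0) < MIN_TRADES:
            updated[name] = old
            continue
        # Exponential moving average from the design:
        # new_weight = (1 - lr) * old_weight + lr * reward_signal
        target = (1 - lr) * old + lr * rewards[name]
        # Cap the per-cycle change, then enforce the floor.
        shift = max(-MAX_SHIFT, min(MAX_SHIFT, target - old))
        updated[name] = max(WEIGHT_FLOOR, old + shift)
    # Normalize so the weights sum to 1.0. Note: this rescaling can move a
    # weight slightly past the floor or the shift cap; a fuller version
    # would re-project after normalizing.
    total = sum(updated.values())
    return {name: w / total for name, w in updated.items()}


# Usage with made-up numbers: "news" has too few trades to be adjusted.
new_weights = update_weights(
    weights={"momentum": 0.4, "mean_reversion": 0.3, "news": 0.3},
    rewards={"momentum": 0.9, "mean_reversion": 0.1, "news": 0.5},
    trade_counts={"momentum": 25, "mean_reversion": 30, "news": 5},
)
```

Each call would correspond to one row per changed strategy in `learning_adjustments`, with `old_weight`, `new_weight`, and `reward_signal` taken from the inputs and outputs above.
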