docs: map existing codebase

Viktor Barzin 2026-02-23 20:04:05 +00:00
parent d36ae40df1
commit bc34c78072
No known key found for this signature in database
GPG key ID: 0EB088298288D958
7 changed files with 2003 additions and 0 deletions


@ -0,0 +1,246 @@
# Architecture
**Analysis Date:** 2026-02-23
## Pattern Overview
**Overall:** Event-driven microservices architecture using Redis Streams for inter-service communication.
**Key Characteristics:**
- 8 independent Python services communicating asynchronously via Redis Streams
- Layered abstraction for extensibility: broker abstraction (`BaseBroker`), strategy abstraction (`BaseStrategy`), source abstraction for news
- Each service consumes from one or more Redis Streams, processes data, persists to PostgreSQL, and publishes downstream results
- Shared libraries provide domain models, database layer, telemetry, and abstractions
- React TypeScript frontend connected via WebSocket and REST API
## Layers
**Service Layer (Event Processors):**
- Purpose: Independent microservices that transform data across the trading pipeline
- Location: `services/*/main.py` (entry points)
- Contains: RSS/Reddit fetchers, sentiment analysis, signal generation, trade execution, learning engine, market data subscriptions, API gateway
- Depends on: `shared/` libraries, external APIs (Alpaca, Reddit, Ollama)
- Used by: Each other (via Redis Streams), dashboard (via API gateway)
**Infrastructure/Abstraction Layer:**
- Purpose: Pluggable interfaces and database access
- Location: `shared/broker/`, `shared/strategies/`, `shared/db.py`
- Contains: `BaseBroker` (Alpaca implementation), `BaseStrategy` (Momentum/MeanReversion/NewsDriven), database engine setup
- Depends on: SQLAlchemy, alpaca-py, external ML models
- Used by: Trade executor, signal generator, backtester
**Data Layer:**
- Purpose: Persistent storage of trades, signals, articles, market data, user credentials
- Location: `shared/models/` (SQLAlchemy ORM models)
- Contains: 14 tables including `trades`, `signals`, `articles`, `article_sentiments`, `market_data`, `positions`, `users`, `strategies`, strategy weights history
- Depends on: PostgreSQL + TimescaleDB extension
- Used by: All services for read/write persistence
**Messaging Layer:**
- Purpose: Asynchronous event streaming between services
- Location: `shared/redis_streams.py`
- Contains: `StreamPublisher` (XADD), `StreamConsumer` (XREADGROUP with consumer groups)
- Depends on: Redis Streams
- Used by: All services for pub/sub communication
**Frontend Layer:**
- Purpose: User-facing dashboard for monitoring and control
- Location: `dashboard/src/`
- Contains: React pages, components, API client, WebAuthn auth
- Depends on: TanStack Query, TradingView charts, Recharts, React Router
- Used by: End users for real-time monitoring
**Utility Layer:**
- Purpose: Cross-cutting concerns
- Location: `shared/config.py`, `shared/telemetry.py`, `shared/schemas/`
- Contains: Configuration management (Pydantic BaseSettings), OpenTelemetry/Prometheus metrics setup, message schemas (Pydantic v2)
- Depends on: pydantic-settings, opentelemetry-sdk, prometheus-client
- Used by: All services
## Data Flow
**News Ingestion Pipeline:**
1. `services/news_fetcher/main.py` polls RSS feeds and Reddit on independent schedules
2. Uses `RSSSource` and `RedditSource` to fetch articles
3. Deduplicates via Redis SET (`news:seen_hashes`)
4. Publishes `RawArticle` messages to `news:raw` Redis Stream
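The dedup step can be sketched as follows. This is a minimal in-memory stand-in: the hashing scheme and the `is_new` helper name are assumptions, and the real fetcher uses the Redis SET `news:seen_hashes` (e.g. `SADD`, which returns 1 only for a previously unseen member) rather than a local set.

```python
import hashlib


def article_hash(url: str, title: str) -> str:
    # Stable dedup key; the exact fields hashed are an assumption here.
    return hashlib.sha256(f"{url}|{title}".encode()).hexdigest()


class Deduplicator:
    """In-memory stand-in for the Redis SET `news:seen_hashes`."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def is_new(self, url: str, title: str) -> bool:
        h = article_hash(url, title)
        if h in self._seen:
            return False
        self._seen.add(h)
        return True
```

With Redis, the same check collapses to a single `SADD` call whose return value indicates novelty.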
**Sentiment Analysis:**
1. `services/sentiment_analyzer/main.py` consumes from `news:raw`
2. Runs articles through `FinBERTAnalyzer` for sentiment scores
3. Falls back to `OllamaAnalyzer` if confidence < threshold (configured via `SentimentAnalyzerConfig.finbert_confidence_threshold`)
4. Extracts ticker mentions via regex in `ticker_extractor.py`
5. Persists `Article` + `ArticleSentiment` to database
6. Publishes `ScoredArticle` messages (one per ticker per article) to `news:scored` Redis Stream
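The FinBERT-to-Ollama fallback in step 3 amounts to a confidence gate. A minimal sketch, with the analyzers reduced to plain callables (the real services wrap them in `FinBERTAnalyzer` / `OllamaAnalyzer` classes, and the threshold comes from `SentimentAnalyzerConfig.finbert_confidence_threshold`):

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SentimentResult:
    score: float       # -1.0 (bearish) .. +1.0 (bullish)
    confidence: float  # 0.0 .. 1.0


def analyze(text: str,
            finbert: Callable[[str], SentimentResult],
            ollama: Callable[[str], SentimentResult],
            threshold: float = 0.7) -> SentimentResult:
    # Trust FinBERT unless its confidence falls below the configured
    # threshold; otherwise re-score with the Ollama fallback.
    primary = finbert(text)
    if primary.confidence >= threshold:
        return primary
    return ollama(text)
```

The default threshold of 0.7 here is illustrative, not the project's configured value.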
**Market Data Collection:**
1. `services/market_data/main.py` fetches OHLCV bars from Alpaca (historical + live)
2. Publishes bars to `market:bars` Redis Stream as ticker-specific snapshots
3. Consumed by signal generator for technical indicators
**Signal Generation:**
1. `services/signal_generator/main.py` consumes from both `news:scored` and `market:bars`
2. Maintains in-memory buffers: per-ticker sentiment context (deque of last 50 scores) and market data (SMA, RSI)
3. Runs `WeightedEnsemble` combining three strategies:
- `MomentumStrategy`: SMA cross-over based
- `MeanReversionStrategy`: RSI-based
- `NewsDrivenStrategy`: Sentiment-aware
4. Weights loaded from Redis cache (default equal weights: 0.333 each)
5. Only emits `TradeSignal` when combined strength > threshold (default 0.3)
6. Publishes to `signals:generated` Redis Stream, persists `Signal` model to database
**Trade Execution:**
1. `services/trade_executor/main.py` consumes from `signals:generated`
2. `RiskManager` performs pre-trade checks:
- Market hours validation (9:30 AM - 4:00 PM ET)
- Max daily loss limit
- Per-ticker position limits
- Cooldown period after exits
3. Calculates position size based on account equity and risk parameters
4. Submits order via `AlpacaBroker` (abstraction over alpaca-py)
5. Records `Trade` model with order details
6. Publishes `TradeExecution` to `trades:executed` Redis Stream
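The position-sizing step (3) can be sketched as an equity-capped whole-share calculation. The 5% cap and parameter names are assumptions for illustration; the real risk parameters live in the executor's config:

```python
import math


def position_size(equity: float, price: float,
                  max_position_pct: float = 0.05) -> int:
    # Cap each position at a fixed share of account equity and round
    # down to whole shares; refuse nonsensical inputs.
    if price <= 0 or equity <= 0:
        return 0
    return math.floor(equity * max_position_pct / price)
```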
**Learning & Weight Adjustment:**
1. `services/learning_engine/main.py` consumes from `trades:executed`
2. Identifies position closes by tracking opening trades in Redis hash (`positions:history`)
3. `TradeEvaluator` calculates realized P&L
4. `WeightAdjuster` uses multi-armed bandit approach:
- Attributes P&L credit to contributing strategies
- Adjusts weights (with guardrails: min trades threshold, max shift limit, weight floor)
- Persists adjustments to `LearningAdjustment` model for auditability
5. Saves updated weights to Redis cache
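The guarded update in step 4 can be sketched as below. The guardrail parameters (`learning_rate`, `weight_floor`, `max_shift`) are illustrative names, and this linear credit rule is a simplification of the actual multi-armed bandit scheme:

```python
def adjust_weights(weights: dict[str, float],
                   pnl_credit: dict[str, float],
                   learning_rate: float = 0.1,
                   weight_floor: float = 0.05,
                   max_shift: float = 0.1) -> dict[str, float]:
    # Shift each strategy's weight in proportion to its attributed P&L,
    # clamp the per-update shift, enforce a floor so no strategy is ever
    # fully disabled, then renormalize so weights sum to 1.
    adjusted = {}
    for name, w in weights.items():
        shift = learning_rate * pnl_credit.get(name, 0.0)
        shift = max(-max_shift, min(max_shift, shift))
        adjusted[name] = max(weight_floor, w + shift)
    total = sum(adjusted.values())
    return {name: w / total for name, w in adjusted.items()}
```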
**Portfolio Sync (Background Task):**
1. `services/api_gateway/tasks/portfolio_sync.py` runs as background coroutine
2. Periodically queries Alpaca for current positions, cash, equity
3. Stores `PortfolioSnapshot` in database for historical tracking
4. Dashboard polls portfolio endpoint for real-time data
**API Gateway & Dashboard:**
1. `services/api_gateway/main.py` FastAPI app with:
- REST endpoints (`/api/signals`, `/api/trades`, `/api/portfolio`, `/api/news`, `/api/strategies`, `/api/backtest`, `/api/controls`)
- WebSocket endpoint for real-time updates
- WebAuthn/JWT authentication
2. Dashboard (`dashboard/src/App.tsx`) consumed by authenticated users
3. Real-time market data, trades, signals streamed via WebSocket
**State Management:**
- **Redis Streams**: Messages flow unidirectionally; consumer groups ensure at-least-once processing
- **Redis Cache**: Strategy weights, position history, deduplication hashes
- **PostgreSQL**: Long-term storage of trades, signals, articles, market snapshots, users
- **In-Memory**: Services maintain per-ticker buffers (sentiment scores, market data) for current processing cycle; cleared/rebuilt as data arrives
- **JWT Sessions**: Issued at login, renewed via refresh token endpoint
## Key Abstractions
**BaseBroker:**
- Purpose: Swap brokerage providers without changing services
- Examples: `shared/broker/base.py` (ABC), `shared/broker/alpaca_broker.py` (implementation)
- Pattern: Abstract methods for `submit_order()`, `get_account()`, `get_positions()`, `cancel_order()`
- Trade executor only knows `BaseBroker` interface; `AlpacaBroker` is a concrete implementation
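The interface shape can be sketched as an async ABC. This is a trimmed illustration, not the actual `shared/broker/base.py` (which also declares `get_positions()` and `cancel_order()`, and whose signatures may differ); `PaperBroker` is a hypothetical stand-in:

```python
import asyncio
from abc import ABC, abstractmethod


class BaseBroker(ABC):
    """Sketch of the broker interface."""

    @abstractmethod
    async def submit_order(self, ticker: str, qty: int, side: str) -> str: ...

    @abstractmethod
    async def get_account(self) -> dict: ...


class PaperBroker(BaseBroker):
    """Hypothetical in-memory implementation for local testing."""

    async def submit_order(self, ticker: str, qty: int, side: str) -> str:
        return f"order-{ticker}-{side}-{qty}"

    async def get_account(self) -> dict:
        return {"equity": 100_000.0, "cash": 100_000.0}
```

Because the executor depends only on `BaseBroker`, swapping `AlpacaBroker` for a stand-in like this requires no service changes.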
**BaseStrategy:**
- Purpose: Pluggable trading strategy implementations
- Examples: `shared/strategies/base.py` (ABC), `momentum.py`, `mean_reversion.py`, `news_driven.py`
- Pattern: Each strategy implements `async evaluate(ticker, market, sentiment)` returning `TradeSignal | None`
- Signal generator instantiates all three at startup, runs via `WeightedEnsemble.evaluate()`
**StreamPublisher / StreamConsumer:**
- Purpose: Simplified Redis Streams wrapper with JSON serialization
- Examples: `shared/redis_streams.py`
- Pattern: `StreamPublisher.publish(dict)` → `XADD`; `StreamConsumer.consume()` → async iterator yielding `(msg_id, data)`
- Consumer groups ensure each message is delivered to a single consumer per group (at-least-once semantics)
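The wrapper contract (JSON in, `(msg_id, data)` out) can be shown with an in-memory stand-in; the real classes issue `XADD` and `XREADGROUP` against Redis, and method names here beyond `xadd`/`read` are illustrative:

```python
import json
from collections import deque


class FakeStream:
    """In-memory stand-in for a Redis Stream, demonstrating the
    publish/consume contract of StreamPublisher/StreamConsumer."""

    def __init__(self) -> None:
        self._entries: deque[tuple[str, str]] = deque()
        self._next_id = 0

    def xadd(self, data: dict) -> str:
        # Serialize to JSON and assign a monotonically increasing ID,
        # mimicking Redis's "<ms>-<seq>" message IDs.
        msg_id = f"{self._next_id}-0"
        self._next_id += 1
        self._entries.append((msg_id, json.dumps(data)))
        return msg_id

    def read(self):
        # Yield (msg_id, data) pairs, as the real consumer's async
        # iterator does after XREADGROUP.
        while self._entries:
            msg_id, raw = self._entries.popleft()
            yield msg_id, json.loads(raw)
```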
**Pydantic Schemas:**
- Purpose: Message validation and serialization
- Examples: `shared/schemas/news.py`, `shared/schemas/trading.py`, `shared/schemas/learning.py`
- Pattern: Each stream has a schema (e.g., `RawArticle`, `ScoredArticle`, `TradeSignal`, `TradeExecution`)
- Services validate on consume, serialize with `model_dump(mode="json")` before publish
## Entry Points
**News Fetcher:**
- Location: `services/news_fetcher/main.py`
- Triggers: Starts as standalone service, polling on timers
- Responsibilities: Fetch RSS/Reddit articles, deduplicate, publish to `news:raw`
**Sentiment Analyzer:**
- Location: `services/sentiment_analyzer/main.py`
- Triggers: Consumes messages from `news:raw` stream
- Responsibilities: Score sentiment (FinBERT + Ollama fallback), extract tickers, persist articles, publish to `news:scored`
**Market Data Service:**
- Location: `services/market_data/main.py`
- Triggers: Starts as standalone service, fetches on startup + live streaming
- Responsibilities: Fetch OHLCV bars from Alpaca, publish to `market:bars`
**Signal Generator:**
- Location: `services/signal_generator/main.py`
- Triggers: Consumes from both `news:scored` and `market:bars` streams
- Responsibilities: Combine sentiment + technical data, run strategy ensemble, publish to `signals:generated`
**Trade Executor:**
- Location: `services/trade_executor/main.py`
- Triggers: Consumes from `signals:generated` stream
- Responsibilities: Risk check, position sizing, order submission, trade recording, publish to `trades:executed`
**Learning Engine:**
- Location: `services/learning_engine/main.py`
- Triggers: Consumes from `trades:executed` stream
- Responsibilities: Evaluate closed positions, adjust strategy weights, audit adjustments
**API Gateway:**
- Location: `services/api_gateway/main.py`
- Triggers: HTTP server listening on configurable port (default 8000)
- Responsibilities: REST/WebSocket API, user authentication (WebAuthn), portfolio sync task, query database
**Backtester:**
- Location: `backtester/engine.py`
- Triggers: Called from API endpoint `/api/backtest` (POST)
- Responsibilities: Historical replay with `SimulatedBroker`, metrics calculation, equity curve generation
## Error Handling
**Strategy:** Graceful degradation and event loop resilience.
**Patterns:**
- **Service-level**: Catch broad `Exception` in polling loops (news fetcher), log and continue; metrics counters increment for monitoring
- **Fallback chains**: Sentiment analyzer tries FinBERT first, falls back to Ollama on low confidence
- **Risk checks**: Trade executor rejects signals that fail risk checks (logs reason, continues)
- **Database retries**: If `IntegrityError` on article insert (duplicate), continue without re-raising
- **Stream consumption**: Consumer groups handle delivery guarantees; manual `XACK` after processing ensures fault tolerance
- **Shutdown**: `asyncio.TaskGroup` and signal handlers ensure graceful termination; cleanup of DB/Redis connections in FastAPI lifespan
## Cross-Cutting Concerns
**Logging:**
- Each service uses Python `logging` module with service name
- All exceptions logged via `logger.exception()` for stack traces
- Key events logged at INFO level (article counts, trades executed, weight updates)
**Validation:**
- Pydantic v2 schemas validate all messages on consume
- Database models use SQLAlchemy constraints (NOT NULL, UNIQUE, FOREIGN KEY)
- Config classes extend `BaseSettings` with environment variable prefixes (`TRADING_*`)
**Authentication:**
- WebAuthn (passkey) via `webauthn` PyPI package (`services/api_gateway/auth/`)
- JWT tokens issued at login, verified on every protected endpoint via `get_current_user` dependency
- Credentials stored as `UserCredential` model (credential ID, public key, counter)
**Telemetry:**
- OpenTelemetry meters instantiated per service via `setup_telemetry(service_name, metrics_port)`
- Counters: `articles_fetched`, `finbert_count`, `ollama_count`, `rejections`, `trades_executed`
- Histograms: `fill_latency`, sentiment analysis duration, strategy evaluation time
- Prometheus scrapes `/metrics` endpoint (default port 9090) from each service
- No dedicated Prometheus/Grafana instances are deployed with this stack; metrics collection relies on existing monitoring infrastructure


@ -0,0 +1,385 @@
# Codebase Concerns
**Analysis Date:** 2026-02-23
## Tech Debt
**Incomplete API Portfolio Metrics:**
- Issue: Portfolio metrics endpoint returns hardcoded placeholder values instead of computed analytics
- Files: `services/api_gateway/routes/portfolio.py` (lines 90-92, 170, 172)
- Impact: Dashboard displays incorrect P&L, drawdown, and hold duration metrics; client decisions based on false data
- Fix approach: Implement cumulative P&L tracking from first portfolio snapshot, compute max drawdown from historical snapshots, calculate average hold duration from trade outcomes table, implement Redis trading pause flag integration
**Missing Portfolio Sync Credentials Handling:**
- Issue: Portfolio sync task silently disables when Alpaca credentials are missing, with no alerting mechanism
- Files: `services/api_gateway/tasks/portfolio_sync.py` (lines 125-129)
- Impact: Dashboard shows stale or zero portfolio data without user awareness; critical data source failure goes unnoticed
- Fix approach: Add explicit warning log at startup, implement health check endpoint that reports sync status, consider sending alerts to dashboard via WebSocket
**Broad Exception Handling:**
- Issue: Multiple services catch bare `Exception` without specific error classification
- Files: `services/sentiment_analyzer/main.py` (lines 150, 207, 226), `services/trade_executor/main.py` (lines 136, 203, 222), `services/signal_generator/main.py` (lines 87, 178, 194, 257), `services/learning_engine/main.py` (line 304), `services/market_data/main.py` (lines 108, 158, 253), `services/api_gateway/tasks/portfolio_sync.py` (line 152)
- Impact: Errors silently logged without retry logic, potential data loss, operational issues invisible to monitoring
- Fix approach: Implement specific exception handling (IntegrityError, APIError, ValidationError, etc.), add telemetry counters for error categories, implement exponential backoff for transient failures
**Implicit Database Persistence Fallback:**
- Issue: Multiple services (sentiment-analyzer, trade-executor) silently continue if DB initialization fails
- Files: `services/sentiment_analyzer/main.py` (lines 203-208), `services/trade_executor/main.py` (lines 199-204), `services/signal_generator/main.py` (initial DB setup)
- Impact: Services run without persistence layer — no audit trail, no dashboard data, potential duplicate processing if service restarts
- Fix approach: Fail fast on DB connection errors, or implement persistent message queue (file-based) as fallback to ensure no trades are lost
**Learning Engine Position State In Redis Only:**
- Issue: Opening trades stored in Redis `positions:history` hash with no persistence — loss of intermediate state if service crashes
- Files: `services/learning_engine/main.py` (lines 65-82, 121-138)
- Impact: Position close detection fails after a service restart; trades whose close is never detected produce unrealistic P&L
- Fix approach: Move position history to database with persistent transaction log, implement consistent read-modify-write with transactions
---
## Known Bugs
**Portfolio Sync N+1 Query Pattern:**
- Symptoms: One SELECT per position during portfolio sync; if 100 positions, 100 database queries per cycle
- Files: `services/api_gateway/tasks/portfolio_sync.py` (lines 74-92)
- Trigger: Any portfolio with multiple open positions
- Workaround: None; will degrade performance as position count grows
- Fix: Use `SELECT * WHERE ticker IN (...)` to fetch all positions in one query, then update in-memory dict
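The proposed batch fetch can be sketched against SQLite (table and column names assumed from context; the real fix would be the SQLAlchemy equivalent, e.g. `Position.ticker.in_(tickers)`, against PostgreSQL):

```python
import sqlite3


def fetch_positions_batch(conn: sqlite3.Connection,
                          tickers: list[str]) -> dict[str, float]:
    # One IN (...) query replaces the per-position SELECT loop:
    # N positions -> 1 round trip instead of N.
    if not tickers:
        return {}
    placeholders = ",".join("?" for _ in tickers)
    rows = conn.execute(
        f"SELECT ticker, qty FROM positions WHERE ticker IN ({placeholders})",
        tickers,
    ).fetchall()
    return {ticker: qty for ticker, qty in rows}
```

The resulting dict can then be diffed against the broker snapshot in memory, updating only changed rows.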
**Daily P&L Calculation Bug:**
- Symptoms: Portfolio endpoint returns same value for `daily_pnl_pct` and `total_pnl_pct`
- Files: `services/api_gateway/routes/portfolio.py` (lines 90-91)
- Trigger: Every portfolio request
- Root cause: Code reuses `daily_pnl_pct` for `total_pnl_pct` instead of computing cumulative metric
- Workaround: Manually inspect database portfolio_snapshots for correct metrics
- Fix: Compute cumulative P&L from first snapshot timestamp to now
**Position Unrealized P&L Division by Zero:**
- Symptoms: Potential division by zero when `p.qty == 0` (closed position)
- Files: `services/api_gateway/routes/portfolio.py` (lines 116-117, 120-121)
- Trigger: When position is exactly 0 shares (edge case in reconciliation)
- Workaround: Guards in place (`if p.qty`) but logic is error-prone
- Fix: Refactor to use `if p.qty and p.avg_entry` guard early, simplify arithmetic
**Signal Sentiment Context Missing Price Field:**
- Symptoms: Risk manager reads `sentiment_context["current_price"]` but signal may not include it
- Files: `services/trade_executor/risk_manager.py` (lines 118-120)
- Trigger: If signal is generated without market snapshot or price is omitted
- Workaround: Falls back to 0.0, position sizing becomes incorrect
- Fix: Ensure signal generator always includes current_price in sentiment_context, add assertion in executor
---
## Security Considerations
**JWT Secret Key Not Enforced in Development:**
- Risk: Default development deployments may use weak or missing JWT secret keys
- Files: `services/api_gateway/config.py` (lines 13-15), `services/api_gateway/auth/jwt.py` (lines 41, 96)
- Current mitigation: Config requires `jwt_secret_key` but does not validate length/entropy
- Recommendations:
- Add validation in ApiGatewayConfig that secret is >= 32 bytes
- Fail startup if secret is not set (don't use default)
- Add warning log if secret is weak (< 256 bits)
- Document minimum secret requirements in README
**Test Keys Insufficient for Production:**
- Risk: Test suites use hardcoded short keys that would be cryptographically weak in production
- Files: Multiple test files use test keys like `"test-secret-key"`
- Current mitigation: Only in test environment
- Recommendations:
- Use pytest fixture to generate strong keys per test
- Document that tests should not be deployed as-is
- Consider environment variable override in CI
**CORS Configuration Hardcoded for Localhost:**
- Risk: Default CORS allows only `http://localhost:5173`, but production deployment requires environment override
- Files: `services/api_gateway/config.py` (line 26)
- Current mitigation: CORS is at least restricted (not `*`), but easy to miss in production deploy
- Recommendations:
- Make CORS_ORIGINS a required config variable (no default)
- Add validation that origins use HTTPS in production
- Log warning if origin is localhost in production environment
**Passkey Registration Not Rate-Limited:**
- Risk: WebAuthn registration endpoint (if present) has no rate limiting, allowing brute-force attacks
- Files: Not fully reviewed; check `services/api_gateway/auth/routes.py`
- Current mitigation: Unknown
- Recommendations:
- Add rate limiting (redis-backed) on registration endpoints
- Implement CAPTCHA or email verification for new accounts
- Log failed registration attempts for monitoring
**No HTTPS Enforcement in API Gateway:**
- Risk: JWT tokens transmitted over plaintext HTTP in development config
- Files: `services/api_gateway/config.py` (line 31 uses `http://localhost:5173`)
- Current mitigation: Only affects localhost development
- Recommendations:
- Add `enforce_https` config flag; reject insecure origins in production
- Set `secure` flag on JWT cookies
- Use HSTS headers in FastAPI
---
## Performance Bottlenecks
**Per-Ticker Sentiment Deques Never Evicted:**
- Problem: Signal generator keeps up to 50 sentiment scores per ticker in memory; ticker entries are never removed from the running process
- Files: `services/signal_generator/main.py` (line 35, lines 107-111)
- Cause: Each deque is bounded at 50 entries, but the per-ticker dict grows with every new ticker and nothing is persisted
- Current: 50 scores × ~5000 tickers ≈ 250K floats held in memory during high volume
- Improvement path:
- Store sentiment history in database (timeseries table) instead of memory
- Query last N scores on-demand per signal generation
- Add TTL/expiration for old scores
**Market Data Manager Holds Full OHLCV History:**
- Problem: `MarketDataManager` buffers all bars received for each ticker in memory
- Files: `services/signal_generator/main.py` (referenced from main)
- Cause: No bounded retention; running service accumulates bars indefinitely
- Impact: Memory leak over weeks of operation
- Improvement path:
- Implement sliding window (keep last 100 bars only)
- Persist bars to database for strategy backtesting
- Use TimescaleDB hypertable compression for efficient historical queries
**Portfolio Sync Deletes and Re-inserts All Positions:**
- Problem: Every sync cycle deletes all local positions and re-inserts from broker
- Files: `services/api_gateway/tasks/portfolio_sync.py` (lines 94-101)
- Cause: Simplistic approach; doesn't diff changes
- Impact: A full delete-and-reinsert on the Position table every 60s, with possible foreign-key cascades onto related trades
- Improvement path:
- Implement diff-based upsert: only update changed tickers
- Delete positions not in broker snapshot
- Reduces churn, avoids potential foreign key cascade issues
**No Connection Pooling Tuning:**
- Problem: AsyncPG default pool size may be insufficient for 7 services + API load
- Files: `shared/db.py` (line 13-16)
- Cause: `create_async_engine` uses default pool_size
- Impact: Connection starvation under load, query timeouts
- Improvement path:
- Add `pool_size` and `max_overflow` config to BaseConfig
- Monitor pool utilization (add telemetry)
- Right-size pools per environment
**Alpaca API Calls Not Cached:**
- Problem: Every trade executor and portfolio sync makes synchronous Alpaca API calls for account/positions
- Files: `services/trade_executor/main.py` (line 74), `services/api_gateway/tasks/portfolio_sync.py` (line 64)
- Cause: No request deduplication or caching layer
- Impact: Slow order submission, API rate limit risks
- Improvement path:
- Cache account snapshot in Redis (TTL 5s)
- Batch position queries where possible
- Implement exponential backoff for API failures
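The proposed account-snapshot cache can be sketched as a TTL wrapper. This in-memory version only illustrates the contract; the real fix would store the snapshot in Redis with a ~5s expiry (e.g. `SETEX`) so all services share it (`get_or_fetch` and the injectable `clock` are illustrative):

```python
import time


class TTLCache:
    """Tiny TTL cache: serve a cached value while fresh, refetch after expiry."""

    def __init__(self, ttl: float = 5.0, clock=time.monotonic) -> None:
        self.ttl = ttl
        self._clock = clock  # injectable for testing
        self._store: dict[str, tuple[float, object]] = {}

    def get_or_fetch(self, key: str, fetch):
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]          # fresh: skip the API call
        value = fetch()            # stale or missing: hit the API
        self._store[key] = (now, value)
        return value
```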
---
## Fragile Areas
**Sentiment Analyzer Ollama Fallback Untested:**
- Files: `services/sentiment_analyzer/analyzers/ollama_analyzer.py`, `services/sentiment_analyzer/main.py` (lines 71-79)
- Why fragile: Fallback path (Ollama) rarely exercised if FinBERT always has high confidence
- Safe modification: Add integration test that mocks FinBERT confidence < threshold, verifies Ollama called
- Test coverage: No explicit test for fallback path
- Impact: If FinBERT fails completely, Ollama fallback could silently degrade or crash
**Signal Generator Ensemble Evaluation:**
- Files: `services/signal_generator/main.py` (line 149), `services/signal_generator/ensemble.py`
- Why fragile: Three strategies weighted by Redis weights; if weights become NaN or negative, ensemble breaks silently
- Safe modification: Add assertions that weights are valid (sum ≈ 1, each weight in [0, 1]); validate weights when loading from Redis
- Test coverage: No explicit test for weight anomalies (e.g., all weights corrupted)
- Impact: Invalid weights could produce signals with undefined strength
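A validation guard of the kind suggested above could look like this (a sketch; the fallback-to-equal-weights policy is one reasonable choice, not the project's current behavior):

```python
import math


def validate_weights(weights: dict[str, float],
                     tol: float = 1e-9) -> dict[str, float]:
    # Reject NaN/inf/negative weights and near-zero totals by falling
    # back to equal weights; otherwise renormalize drift so the sum is 1.
    if (not weights
            or any(not math.isfinite(w) or w < 0 for w in weights.values())
            or sum(weights.values()) <= tol):
        n = len(weights)
        return {k: 1.0 / n for k in weights} if n else {}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}
```

Running this on every load from Redis turns silent corruption into a deterministic, logged reset.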
**Risk Manager Market Hours Check Time Zone:**
- Files: `services/trade_executor/risk_manager.py` (lines 19-25, 59-61)
- Why fragile: Hard-coded ET timezone; DST transitions not handled explicitly
- Safe modification: Use ZoneInfo with explicit DST handling (Python 3.9+), add tests for DST boundaries
- Test coverage: No explicit DST test
- Impact: Orders placed 1 hour off during DST transition
**Learning Engine Position Matching Logic:**
- Files: `services/learning_engine/main.py` (lines 85-98, 121-138)
- Why fragile: String comparison of side enum (`opening_side == OrderSide.BUY.value`)
- Safe modification: Use enum comparison directly, add type hints, add unit tests for all side combinations
- Test coverage: Tests exist but may not cover edge cases (e.g., SELL -> BUY -> SELL)
- Impact: Position close detection fails, P&L not recorded, weights never adjusted
**WebAuthn Credential Validation:**
- Files: `services/api_gateway/auth/routes.py` (check WebAuthn implementation)
- Why fragile: WebAuthn implementation may not validate all required fields (origin, user verification, etc.)
- Safe modification: Review against OWASP WebAuthn checklist, add explicit assertion tests
- Test coverage: Limited test coverage for security properties
- Impact: Weak authentication allows account takeover
---
## Scaling Limits
**Redis Single Node No Persistence:**
- Current capacity: In-memory Redis instance; loses all data on restart
- Limit: All stream data (news, signals, trades) lost if Redis crashes
- Blocks: Reliable message delivery, audit trails, service recovery
- Scaling path:
- Enable AOF (append-only file) persistence
- Consider Redis Cluster for HA
- Alternatively, add database-backed message queue (e.g., TimescaleDB FIFO table)
**PostgreSQL Shared Memory Not Tuned:**
- Current capacity: Default shared_buffers, work_mem, etc.
- Limit: Query performance degrades with 7 concurrent services
- Scaling path:
- Tune PostgreSQL config: shared_buffers, work_mem, effective_cache_size per machine
- Add read replicas for reporting queries
- Partition portfolio snapshots by date (TimescaleDB hypertables)
**Dashboard WebSocket No Backpressure:**
- Current capacity: Single WebSocket connection broadcasts all events
- Limit: Slow clients blocked, queue unbounded memory growth
- Scaling path:
- Implement client subscription topics (only send relevant tickers)
- Add message batching (send updates every 100ms)
- Implement client-side connection pooling
**No Circuit Breaker for Alpaca API:**
- Current capacity: Alpaca API calls block entire trade executor on timeout
- Limit: One slow Alpaca response stalls all signal processing
- Scaling path:
- Implement circuit breaker pattern (trip open after N consecutive failures, probe again after a cooldown)
- Add request timeout with retry logic
- Queue unprocessable signals for retry after recovery
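A minimal counting circuit breaker for the proposed Alpaca wrapper might look like this (class and method names are illustrative; the `clock` parameter is injectable purely for testing):

```python
import time


class CircuitBreaker:
    """Trip open after `max_failures` consecutive errors; allow a
    half-open probe once `reset_after` seconds have elapsed."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock=time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at: float | None = None

    def allow(self) -> bool:
        if self._opened_at is None:
            return True
        if self._clock() - self._opened_at >= self.reset_after:
            # Half-open: permit one probe; a failure reopens immediately.
            self._opened_at = None
            self._failures = self.max_failures - 1
            return True
        return False

    def record(self, ok: bool) -> None:
        if ok:
            self._failures, self._opened_at = 0, None
        else:
            self._failures += 1
            if self._failures >= self.max_failures:
                self._opened_at = self._clock()
```

The executor would call `allow()` before each broker request and `record()` after, queuing rejected signals for retry.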
---
## Dependencies at Risk
**Ollama Local LLM Not Containerized with Stack:**
- Risk: Ollama runs separately; must be manually started for sentiment analyzer to work
- Files: `docker-compose.yml` (check if Ollama service present)
- Impact: Sentiment analyzer fails silently if Ollama not running; fallback depends on FinBERT
- Migration plan: Add Ollama to docker-compose.yml, or make Ollama truly optional with better error handling
**FinBERT Model Download Not Cached:**
- Risk: FinBERT downloads model from HuggingFace on first run (~600MB)
- Files: `services/sentiment_analyzer/analyzers/finbert.py`
- Impact: Long startup time, may timeout in resource-constrained environments
- Migration plan: Pre-download model into Docker image, add model cache invalidation strategy
**Alpaca SDK Direct HTTP Calls Not Mocked in Integration Tests:**
- Risk: Integration tests mock Alpaca SDK, so real behavior never tested
- Files: `tests/services/test_trade_executor.py` (check mocks), `tests/integration/`
- Impact: Order submission bugs discovered only in production
- Migration plan: Implement live integration tests against Alpaca paper trading, verify in CI/CD
**SQLAlchemy 2.0 Async Mode Requires Specific Dialect:**
- Risk: AsyncPG optional dependency; falls back to sync driver if missing
- Files: `shared/db.py` (line 13 uses `postgresql+asyncpg://`)
- Impact: If asyncpg not installed, services run in sync mode, defeating async benefits
- Migration plan: Add explicit check in `create_db()` to fail if asyncpg missing
---
## Missing Critical Features
**No Duplicate Trade Detection:**
- Problem: If signal generator processes same article twice (due to Redis consumer group lag), executor may place duplicate trades
- Blocks: Multi-service reliability, audit accuracy
- Scope: Implement idempotency key in TradeExecution, check signal_id before placing order
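The idempotency check can be sketched as below. The in-memory set is only for illustration; a production version would use an atomic Redis `SET key NX EX ttl` so the guard survives executor restarts and works across replicas (`first_delivery` is an illustrative name):

```python
class IdempotencyGuard:
    """Remember processed signal IDs so a redelivered message
    cannot place a second order."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def first_delivery(self, signal_id: str) -> bool:
        # Return True only the first time this signal ID is seen.
        if signal_id in self._seen:
            return False
        self._seen.add(signal_id)
        return True
```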
**No Circuit Break / Trading Pause Mechanism:**
- Problem: If models degrade, bot continues trading losses
- Blocks: Risk mitigation, manual emergency stop
- Current: `TRADING_PAUSED_KEY` in Redis (`services/api_gateway/routes/controls.py`), but not enforced in executor
- Scope: Have the executor check the trading pause flag before submitting any order
**No Backtester Trade Slippage Model:**
- Problem: Backtester assumes fills at exact signal price; unrealistic
- Blocks: Accurate backtest results, risk estimation
- Scope: Add configurable slippage model (fixed basis points + volume-based)
**No Strategy Parameter Sensitivity Analysis:**
- Problem: Can't understand which parameters drive P&L
- Blocks: Scientific strategy optimization, risk understanding
- Scope: Implement parameter sweep in backtester, record sensitivity metrics
**No Live Trade Reconciliation:**
- Problem: If Alpaca and local DB diverge, no auto-correction
- Blocks: Accurate position tracking, tax reporting
- Scope: Implement daily reconciliation job that compares broker vs. DB, alerts on mismatch
---
## Test Coverage Gaps
**Dashboard Components Not Unit Tested:**
- What's not tested: React components (portfolio display, trade table, charts, alerts)
- Files: `dashboard/src/components/`, `dashboard/src/pages/`
- Risk: Component bugs, broken UI layouts only caught in E2E (if at all)
- Priority: High — dashboard is user-facing; regressions impact usability
**API Routes Happy Path Only:**
- What's not tested: Edge cases (missing fields, invalid enum values, concurrent requests)
- Files: `tests/services/test_api_routes.py` (13 tests covering basic CRUD)
- Risk: API crashes on edge inputs, poor error messages
- Priority: Medium — most users use dashboard, not direct API
**Market Data Manager Not Unit Tested:**
- What's not tested: Deque overflow behavior, snapshot generation with missing bars
- Files: `services/signal_generator/market_data.py` (no test file)
- Risk: Ensemble generates signals with incomplete data
- Priority: High — directly impacts signal quality
**Learning Engine Weight Adjustment Edge Cases:**
- What's not tested: Weight convergence, NaN/inf propagation, simultaneous rewards for same strategy
- Files: `services/learning_engine/weight_adjuster.py` (test coverage unknown)
- Risk: Weights diverge to extremes, strategies disabled unintentionally
- Priority: High — learning is core feature
**WebAuthn Registration and Verification:**
- What's not tested: Invalid registration data, replay attacks, malformed challenge responses
- Files: `services/api_gateway/auth/routes.py` (WebAuthn handlers)
- Risk: Authentication bypass, account takeover
- Priority: Critical — security sensitive
**Redis Streams Consumer Group Edge Cases:**
- What's not tested: Consumer group recovery after crash, message redelivery, offset management
- Files: `shared/redis_streams.py` (5 tests total)
- Risk: Messages lost or duplicated after service restart
- Priority: High — data reliability depends on it
**Concurrent Signal and Trade Execution:**
- What's not tested: Race conditions when multiple signals processed simultaneously
- Files: Integration tests run single-threaded, so concurrency is never exercised
- Risk: Phantom positions, incorrect risk calculations
- Priority: High — real production has concurrent load
---
## Recommendations for Production Readiness
**Before going live, address at minimum:**
1. **Critical (blocks production):**
- Implement portfolio metrics computation (cumulative P&L, max drawdown)
- Fix portfolio sync query performance (N+1 issue)
- Validate JWT secret key entropy at startup
- Add rate limiting to auth endpoints
- Implement circuit breaker for Alpaca API calls
- Add duplicate trade detection (idempotency)
2. **High (should address soon):**
- Implement trade reconciliation job
- Move position state from Redis to database
- Add specific exception handling (not bare Exception)
- Implement backpressure in WebSocket broadcasting
- Test Ollama fallback path end-to-end
- Add comprehensive dashboard unit tests
3. **Medium (can defer):**
- Optimize portfolio sync with diff-based approach
- Persist market data and sentiment to TimescaleDB
- Tune PostgreSQL connection pool
- Add DST test coverage for market hours
- Implement parameter sensitivity analysis in backtester
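The DST gap called out above is easy to see with the standard library alone. The real market-hours logic lives in `services/trade_executor/risk_manager.py`; the sketch below (helper name is illustrative) only shows why a fixed UTC offset for the 9:30 ET open is wrong for half the year:

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

NY = ZoneInfo("America/New_York")

def market_open_utc_offset(day: datetime) -> timedelta:
    """UTC offset in effect at the 9:30 ET market open on the given day."""
    open_et = day.replace(hour=9, minute=30, tzinfo=NY)
    return open_et.utcoffset()

# January (EST, UTC-5) vs July (EDT, UTC-4): hard-coding "UTC-5"
# silently shifts the open by an hour during daylight saving time.
winter = market_open_utc_offset(datetime(2025, 1, 15))
summer = market_open_utc_offset(datetime(2025, 7, 15))
```

Test coverage should pin both offsets plus the transition days themselves.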
---
*Concerns audit: 2026-02-23*

# Coding Conventions
**Analysis Date:** 2026-02-23
## Naming Patterns
**Files (Python):**
- Modules: lowercase with underscores (e.g., `redis_streams.py`, `finbert_analyzer.py`)
- Packages: underscores, not hyphens (e.g., `news_fetcher`, `sentiment_analyzer`)
- Private names: prefix with underscore (e.g., the internal helper `_deduplicate_and_publish`); the same convention applies to private modules
**Files (TypeScript/React):**
- Components: PascalCase (e.g., `MetricsRow.tsx`, `ProtectedRoute.tsx`)
- Hooks: camelCase with `use` prefix (e.g., `useAuth.ts`, `useWebSocket.ts`)
- Utilities/APIs: camelCase (e.g., `client.ts`, `auth.ts`)
- Config files: camelCase with dots (e.g., `tsconfig.app.json`, `vite.config.ts`)
**Functions and Methods:**
- snake_case in Python (e.g., `_deduplicate_and_publish`, `ensure_group`)
- camelCase in TypeScript/React (e.g., `startRegistration`, `useCallback`)
- Async functions: declared with the `async` keyword; no special naming convention
- Private/internal functions: prefix with underscore in Python (e.g., `_make_config`, `_make_signal`)
**Variables:**
- snake_case in Python for all variables (e.g., `session_factory`, `confidence_threshold`)
- camelCase in TypeScript (e.g., `isAuthenticated`, `loading`, `token`)
- Constants: UPPER_CASE with underscores (e.g., `SEEN_HASHES_KEY = "news:seen_hashes"`, `RAW_STREAM = "test:news:raw"`)
- Loop variables and temporaries: follow standard conventions (e.g., `msg_id`, `score`, `ticker`)
**Types and Classes:**
- Classes: PascalCase (e.g., `StreamPublisher`, `TradeEvaluator`, `BaseStrategy`)
- Enums: PascalCase (e.g., `TradeSide`, `TradeStatus`, `SignalDirection`)
- Type hints: import referenced types explicitly and use built-in generics (e.g., `list[PositionInfo]`, `dict[str, str]`)
- Generic types: use the built-in `dict`/`list`/`tuple` forms without the `typing.` prefix (Python 3.9+ syntax)
**Interfaces (TypeScript):**
- PascalCase with `Props` suffix for component props (e.g., `MetricsRowProps`)
- PascalCase for return types (e.g., `UseAuthReturn`)
- No `I` prefix on interfaces; use descriptive names instead
## Code Style
**Formatting (Python):**
- Line length: 120 characters (configured in `pyproject.toml` under `[tool.ruff]`)
- Target Python version: 3.12 (configured under `[tool.ruff] target-version`)
- Formatter: Ruff handles both linting and formatting (no separate Black configuration)
**Formatting (TypeScript/React):**
- Target: ES2022 (configured in `tsconfig.app.json`)
- Module resolution: bundler
- Strict mode: enabled (`"strict": true`)
- JSX mode: react-jsx
**Linting (Python):**
- Tool: Ruff (`ruff>=0.3`)
- Key rules: follow Python 3.12 conventions (async-aware checks enabled)
- MyPy: enabled (`mypy>=1.8`)
- `warn_return_any = true`
- `warn_unused_configs = true`
**Linting (TypeScript):**
- ESLint with flat config (`eslint.config.js` in `dashboard/`)
- Extends:
- `@eslint/js` (recommended)
- `typescript-eslint` (recommended)
- `react-hooks` (react-hooks/exhaustive-deps)
- `react-refresh` (for Vite)
- Global ignores: `dist/`
- Files: `**/*.{ts,tsx}`
## Import Organization
**Python Import Order:**
1. `from __future__ import annotations` (always first if present)
2. Standard library: `import asyncio`, `import json`, `import logging`, etc.
3. Third-party packages: `from redis.asyncio import Redis`, `from pydantic import BaseModel`, etc.
4. Shared modules: `from shared.config import BaseConfig`, `from shared.redis_streams import StreamPublisher`
5. Service-specific modules: `from services.news_fetcher.config import NewsFetcherConfig`
6. Absolute imports preferred; relative imports acceptable within a service
**TypeScript Import Order:**
1. React and core libraries: `import { useState, useCallback } from 'react'`
2. Routing: `import { Routes, Route } from 'react-router-dom'`
3. Third-party UI/packages: `import { startRegistration } from '@simplewebauthn/browser'`
4. Internal utilities/API: `import { client } from '../api/client'`
5. Components: `import { MetricsRow } from '../components/MetricsRow'`
6. Hooks: `import { useAuth } from '../hooks/useAuth'`
**Path Aliases:**
- Python: No path aliases configured; use absolute imports from project root
- TypeScript: None configured in `tsconfig.app.json`; use relative paths with `../`
## Error Handling
**Python Error Handling:**
- Use `try/except` for external service calls (Redis, database, API calls)
- Log all exceptions at the point of handling: `logger.exception("Context: %s", variable)`
- Specific exception handling before generic: `except IntegrityError:` before `except Exception:`
- Re-raise specific errors or return default/None on fallback paths
- Example from `sentiment_analyzer/main.py`:
```python
try:
async with db_session_factory() as session:
# persist to DB
except IntegrityError:
# Handle duplicate
pass
except Exception:
logger.exception("Error processing article: %s", data.get("title", "<unknown>"))
```
- Async functions use `try/except` around `await` calls
- Use `asyncio.TimeoutError` for timeout handling
**TypeScript/React Error Handling:**
- Catch errors from API calls and set error state: `err?.response?.data?.detail || err?.message`
- Always provide fallback messages: `'Registration failed'` as default
- Re-throw after setting error state (`throw err`) so callers can still react to the failure
- Example from `useAuth.ts`:
```typescript
catch (err: any) {
const message = err?.response?.data?.detail || err?.message || 'Registration failed';
setError(message);
throw err;
}
```
## Logging
**Framework:** Python uses `logging` module; no structured logging library
**Patterns:**
- Create logger at module level: `logger = logging.getLogger(__name__)`
- Use appropriate levels:
- `logger.debug()` — Detailed debug info (e.g., "Published to stream:xxx")
- `logger.info()` — General informational (e.g., "Created consumer group…")
- `logger.warning()` — Warning conditions (e.g., "Order rejected by broker")
- `logger.exception()` — Log exceptions with stack trace (always use at exception point)
- Always include context in log messages using `%` formatting: `logger.info("Message with %s", variable)`
- Do NOT use f-strings in logging; `%`-style arguments are interpolated lazily, only if the record is actually emitted
- Log level configuration: `log_level` in `BaseConfig` (default: "INFO")
**Examples from codebase:**
```python
logger.debug("Published to %s: %s", self.stream, msg_id)
logger.info("Created consumer group %s on %s", self.group, self.stream)
logger.exception("Ollama analysis failed") # Includes stack trace
logger.warning("Order rejected by Alpaca: %s", exc)
```
## Comments
**When to Comment:**
- Complex business logic that isn't self-documenting (e.g., weight adjustment formulas, P&L calculations)
- Non-obvious design decisions (e.g., "SADD returns 1 if the member was added (i.e. not already present)")
- Caveats or workarounds (e.g., "BUSYGROUP means group already exists — expected on subsequent starts.")
- Public APIs and exported functions should have docstrings
**When NOT to Comment:**
- Self-explanatory code (variable names, clear function logic)
- Redundant comments that repeat the code
- Outdated comments (remove or update)
**Docstring/Comments:**
- Module docstrings: Present at top of file, describe purpose
```python
"""Thin wrappers around redis-py Streams for publish/consume with JSON serialization."""
```
- Class docstrings: Describe the class purpose and key behavior
```python
class StreamPublisher:
"""Publishes JSON-encoded messages to a Redis Stream."""
```
- Method docstrings: short imperative summaries; NumPy style when parameters need documenting
```python
async def ensure_group(self) -> None:
"""Create the consumer group if it does not already exist."""
```
- Parameters/Returns documented in docstrings for public APIs:
```python
"""Score a single article and publish one ScoredArticle per extracted ticker.
Parameters
----------
article:
The raw article consumed from the ``news:raw`` stream.
"""
```
**Inline Comments:**
- Prefix with `#` and space: `# This explains why…`
- Place above the code line (not at end of line unless very brief)
- Example: `# SADD returns 1 if the member was added (i.e. not already present)`
## Function Design
**Size Guidelines:**
- Functions should be focused and single-responsibility
- Async functions in services are typically 5-30 lines (after docstring)
- Helper functions (prefixed with `_`) are 3-20 lines
- Example: `_deduplicate_and_publish` is ~20 lines, handles one concern
**Parameters:**
- Use keyword arguments for clarity in calls: `await publish(data=article, index=stream)`
- Type hints are required for all parameters and returns
- Default parameters allowed (e.g., `batch_size: int = 10`)
- For configuration, pass config objects rather than many flags: `config: MyConfig` not `threshold=0.5, timeout=10`
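The config-object guideline can be sketched with a plain dataclass (field names below are illustrative; the real services define Pydantic config classes under `services/*/config.py`):

```python
from dataclasses import dataclass

@dataclass
class ExecutorConfig:
    # Hypothetical fields for illustration only.
    confidence_threshold: float = 0.5
    order_timeout_seconds: int = 10

def should_execute(confidence: float, config: ExecutorConfig) -> bool:
    # One config parameter instead of a growing list of keyword flags;
    # adding a new setting never changes this signature.
    return confidence >= config.confidence_threshold
```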
**Return Values:**
- Use union types for optional/multiple returns: `TradeSignal | None` (Python 3.10+ syntax)
- Return early for error cases (guard clauses):
```python
if not tickers:
logger.debug("No tickers found")
return
```
- Async functions annotate the resolved value directly (`async def x() -> PositionInfo:`); the coroutine wrapper is implicit, so `Awaitable` is not needed
## Module Design
**Exports:**
- Use `__all__` in modules that re-export from submodules
- Example from `shared/models/__init__.py`:
```python
__all__ = [
"Base",
"TimestampMixin",
# Trading
"Strategy",
"Signal",
# ...
]
```
- Imports are grouped by category (Trading, News, Learning, Auth, Timeseries) with comments
**Barrel Files:**
- `shared/models/__init__.py` acts as a barrel, importing all models so Alembic can discover them
- Comment each import group for clarity
- Do NOT use wildcard imports (`from x import *`)
**Config Classes:**
- All config classes extend `shared.config.BaseConfig` (Pydantic BaseSettings)
- Use `model_config = {"env_prefix": "TRADING_"}` for environment variable loading
- Example: `services/news_fetcher/config.py` extends `BaseConfig`
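The real config classes are pydantic-settings models; the stdlib stand-in below mimics only the `TRADING_` prefix lookup to show the contract (field name and default match the news fetcher's documented poll interval, the `from_env` helper is illustrative):

```python
import os
from dataclasses import dataclass

@dataclass
class NewsFetcherConfig:
    # Stand-in for the real pydantic-settings class.
    rss_poll_interval_seconds: int = 300

    @classmethod
    def from_env(cls) -> "NewsFetcherConfig":
        # pydantic-settings does this automatically via env_prefix="TRADING_".
        raw = os.environ.get("TRADING_RSS_POLL_INTERVAL_SECONDS")
        return cls(rss_poll_interval_seconds=int(raw)) if raw else cls()
```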
**Service Entry Points:**
- Located at `services/{service_name}/main.py`
- Contains `async def main()` or async generator functions
- Modules imported from shared libraries and service-specific modules
- Signal handling for graceful shutdown
## Special Conventions
**Redis Stream Constants:**
- Define at module level as uppercase strings
- Example: `NEWS_RAW_STREAM = "news:raw"`, `SEEN_HASHES_KEY = "news:seen_hashes"`
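Per the `shared/redis_streams.py` docstring, publishers JSON-encode messages; a minimal sketch of the framing (the `encode_entry` helper is illustrative, not the real implementation):

```python
import json

# Module-level stream name constants, per convention.
NEWS_RAW_STREAM = "news:raw"
SEEN_HASHES_KEY = "news:seen_hashes"

def encode_entry(payload: dict) -> dict[str, str]:
    # Redis Stream fields are flat string pairs, so the whole payload
    # travels as a single JSON-encoded field handed to XADD.
    return {"data": json.dumps(payload)}
```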
**Pydantic Models:**
- Use `model_dump(mode="json")` when serializing to JSON (v2 syntax)
- Set `model_config = {"from_attributes": True}` for ORM mapping
- `str`-based enums compare equal to their values (e.g., `TradeSide.BUY == "BUY"`)
**SQLAlchemy (2.0+):**
- Use async sessions: `async_sessionmaker` with async context managers
- Use `mapped_column` style (not `Column`)
- Type hints required on model fields
- Example: models in `shared/models/trading.py`
**Test Naming:**
- Test functions: `test_{component}_{scenario}` (e.g., `test_publisher_publishes_json`)
- Test classes: `Test{ComponentName}` (e.g., `TestEvaluateProfitableTrade`)
- Helper functions in tests: prefix with `_make_` (e.g., `_make_config`, `_make_signal`)
---
*Convention analysis: 2026-02-23*

# External Integrations
**Analysis Date:** 2026-02-23
## APIs & External Services
**Brokerage Trading:**
- Alpaca Markets - Paper and live trading
- SDK/Client: `alpaca-py` (21.0+)
- API Key: `TRADING_ALPACA_API_KEY` env var
- Secret: `TRADING_ALPACA_SECRET_KEY` env var
- Base URL: `TRADING_ALPACA_BASE_URL` (defaults to paper trading endpoint)
- Paper trading: `TRADING_PAPER_TRADING=true` flag in config
- Used by:
- `shared/broker/alpaca_broker.py` - Order execution, position tracking, account info via `TradingClient`
- `services/market_data/main.py` - Historical bars via `StockHistoricalDataClient` and live streaming
- `services/api_gateway/tasks/portfolio_sync.py` - Portfolio synchronization background task
- `services/trade_executor/main.py` - Order placement and status monitoring
**News & Content Sources:**
- Yahoo Finance RSS - Financial news feed
- Feed URL: `https://finance.yahoo.com/news/rssindex`
- Client: feedparser 6.0+
- Polling: `TRADING_RSS_POLL_INTERVAL_SECONDS` (default 300s)
- Configured in: `services/news_fetcher/config.py`
- Reuters RSS - Business news feed
- Feed URL: `https://feeds.reuters.com/reuters/businessNews`
- Client: feedparser 6.0+
- Dow Jones RSS - Market news feed
- Feed URL: `https://feeds.content.dowjones.io/public/rss/mw_topstories`
- Client: feedparser 6.0+
- Reddit - Wallstreetbets, stocks, investing communities
- SDK/Client: praw 7.7+ (blocking) and asyncpraw 7.7+ (async)
- Client ID: `TRADING_REDDIT_CLIENT_ID` env var
- Client Secret: `TRADING_REDDIT_CLIENT_SECRET` env var
- User Agent: `TRADING_REDDIT_USER_AGENT` (default: `trading-bot/0.1`)
- Subreddits: `["wallstreetbets", "stocks", "investing"]` (configurable via `TRADING_REDDIT_SUBREDDITS`)
- Min score filter: `TRADING_REDDIT_MIN_SCORE` (default 10)
- Polling: `TRADING_REDDIT_POLL_INTERVAL_SECONDS` (default 600s)
- Used by: `services/news_fetcher/main.py`
**Machine Learning Models:**
- FinBERT (Hugging Face) - Financial sentiment classification
- Model: `ProsusAI/finbert` (transformer)
- Confidence threshold: `TRADING_FINBERT_CONFIDENCE_THRESHOLD` (default 0.4)
- Library: transformers 4.38+
- Used by: `services/sentiment_analyzer/analyzers/finbert.py`
- Location: Loaded from Hugging Face model hub
- Ollama - Local LLM for fallback sentiment analysis
- Host: `TRADING_OLLAMA_HOST` (default `http://localhost:11434`)
- Model: `TRADING_OLLAMA_MODEL` (default `gemma3`)
- Used when: FinBERT confidence below threshold
- Library: ollama-python 0.1+
- Used by: `services/sentiment_analyzer/analyzers/ollama_analyzer.py`
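The FinBERT-to-Ollama handoff is a pure confidence gate; a sketch of the decision logic with hypothetical analyzer signatures (the real classes live under `services/sentiment_analyzer/analyzers/`):

```python
from typing import Callable

# Assumed interface for illustration: each analyzer returns (label, confidence).
Analyzer = Callable[[str], tuple[str, float]]

def score_with_fallback(
    text: str,
    finbert: Analyzer,
    ollama: Analyzer,
    threshold: float = 0.4,  # TRADING_FINBERT_CONFIDENCE_THRESHOLD default
) -> tuple[str, float]:
    # Try the fast local transformer first; only pay the LLM cost
    # when FinBERT is unsure.
    label, confidence = finbert(text)
    if confidence >= threshold:
        return label, confidence
    return ollama(text)
```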
## Data Storage
**Databases:**
- PostgreSQL 16 + TimescaleDB extension (Docker: `timescale/timescaledb:latest-pg16`)
- Connection: Async via asyncpg driver
- URL: `TRADING_DATABASE_URL` env var (default: `postgresql+asyncpg://trading:trading@localhost:5432/trading`)
- Username: `trading` (env: `POSTGRES_USER`)
- Password: `POSTGRES_PASSWORD` env var
- Database: `trading`
- ORM: SQLAlchemy 2.0+ with mapped_column style
- Migrations: Alembic (auto-run via `python -m alembic upgrade head`)
- Tables (14 models in `shared/models/`):
- Auth: `users`, `webauthn_credentials`
- News: `articles`, `article_sentiments`
- Trading: `strategies`, `trade_signals`, `executed_trades`, `positions`
- Learning: `strategy_weights`, `weight_history`
- Market data: OHLCV timeseries via TimescaleDB hypertables
- Used by: All services via `shared/db.py` async sessionmaker
**File Storage:**
- Local filesystem only (no cloud storage)
- Docker volumes for persistent data:
- `pgdata` - PostgreSQL persistent storage
- `redisdata` - Redis persistent storage
**Caching & Message Broker:**
- Redis 7-alpine (Docker image)
- Connection: `TRADING_REDIS_URL` env var (default: `redis://localhost:6379/0`)
- Default port: 6379
- Persistent volume: `redisdata`
- Features used:
- Streams (consumer groups) for inter-service messaging
- Key-value cache for rate limiting
- Stream topics (producer/consumer contracts):
- `news:raw` - Raw articles from news fetcher → consumed by sentiment analyzer
- `news:scored` - Scored articles from sentiment analyzer → consumed by signal generator
- `signals:generated` - Trade signals from signal generator → consumed by trade executor and learning engine
- `trades:executed` - Executed trades from trade executor → consumed by learning engine
- Client library: redis 5.0+ with async support
- Consumer group per service (e.g., `sentiment-analyzer-group` on `news:raw` stream)
## Authentication & Identity
**Auth Provider:**
- Custom WebAuthn/Passkey implementation (no third-party OAuth provider)
- Library: webauthn 2.0+ (Python backend)
- Frontend: @simplewebauthn/browser 13.2.2 (JavaScript/TypeScript)
- Flow: WebAuthn registration → credential storage → authentication challenge verification
- Credentials storage: `webauthn_credentials` table in PostgreSQL
- Implementation: `services/api_gateway/auth/routes.py` (register/login/verify endpoints)
**Session Management:**
- JWT tokens with HS256 (HMAC-SHA256)
- Library: PyJWT 2.8+ with crypto support
- Secret key: `TRADING_JWT_SECRET_KEY` env var (REQUIRED, must be 32+ bytes in production)
- Algorithm: `TRADING_JWT_ALGORITHM` (default: HS256)
- Access token expiry: `TRADING_ACCESS_TOKEN_EXPIRE_MINUTES` (default 15)
- Refresh token expiry: `TRADING_REFRESH_TOKEN_EXPIRE_DAYS` (default 7)
- Token verification: `services/api_gateway/auth/middleware.py` (Bearer token extraction and validation)
- Implementation: `services/api_gateway/auth/jwt.py`
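The gateway uses PyJWT's `jwt.encode()`/`jwt.decode()`; the stdlib sketch below only shows the HS256 mechanics (base64url `header.payload`, HMAC-SHA256 signature), which is also why the secret key's entropy matters:

```python
import base64
import hashlib
import hmac
import json

def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def sign_hs256(claims: dict, secret: bytes) -> str:
    # Illustrative re-derivation of what PyJWT does for HS256;
    # the service itself calls jwt.encode() in auth/jwt.py.
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps(claims).encode())
    signing_input = f"{header}.{payload}".encode()
    sig = hmac.new(secret, signing_input, hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url(sig)}"
```

A low-entropy secret makes the HMAC brute-forceable offline from any captured token.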
## Monitoring & Observability
**Error Tracking:**
- None detected - No Sentry or other APM integration
- Logging: Standard Python logging module with `TRADING_LOG_LEVEL` env var
**Logs:**
- Console logging to stdout/stderr
- Log level controlled by: `TRADING_LOG_LEVEL` env var (default: INFO)
- Each service logs to standard output (captured by Docker/container orchestration)
**Metrics & Telemetry:**
- OpenTelemetry 1.20+ - Metrics collection and export
- PrometheusMetricReader - Metrics exporter
- prometheus-client - HTTP `/metrics` endpoint
- Setup: `shared/telemetry.py` - `setup_telemetry()` called by each service
- Metrics port: `TRADING_OTEL_METRICS_PORT` env var (default 9090)
- Service name: `TRADING_OTEL_SERVICE_NAME` env var (default: `trading-bot`)
- Scrape endpoint: `http://<service>:9090/metrics` for each service
- Note: Prometheus/Grafana not deployed; metrics available for external monitoring
## CI/CD & Deployment
**Hosting:**
- Docker & Docker Compose (local development and self-hosted deployment)
- No managed hosting detected (e.g., AWS, Heroku, GCP)
**CI Pipeline:**
- None detected - No GitHub Actions, GitLab CI, or other CI/CD pipeline
**Docker Composition:**
- Services defined in `docker-compose.yml`:
- `postgres` - Database
- `redis` - Message broker
- `migrations` - Database schema initialization
- `news-fetcher` - Service
- `sentiment-analyzer` - Service
- `signal-generator` - Service
- `trade-executor` - Service
- `learning-engine` - Service
- `market-data` - Service
- `api-gateway` - FastAPI + WebSocket server (port 8000)
- `dashboard` - Nginx serving React build (port 3000)
## Environment Configuration
**Required env vars:**
- `TRADING_JWT_SECRET_KEY` - **REQUIRED** for API Gateway JWT signing (must be set, generate with `python -c "import secrets; print(secrets.token_hex(32))"`)
- `TRADING_ALPACA_API_KEY` - Alpaca account API key
- `TRADING_ALPACA_SECRET_KEY` - Alpaca account secret
- `TRADING_REDDIT_CLIENT_ID` - Reddit app client ID
- `TRADING_REDDIT_CLIENT_SECRET` - Reddit app secret
**Optional env vars with defaults:**
- `TRADING_DATABASE_URL` - Default: `postgresql+asyncpg://trading:trading@localhost:5432/trading`
- `TRADING_REDIS_URL` - Default: `redis://localhost:6379/0`
- `TRADING_LOG_LEVEL` - Default: `INFO`
- `TRADING_ALPACA_BASE_URL` - Default: Paper trading endpoint
- `TRADING_PAPER_TRADING` - Default: `true`
- `TRADING_OLLAMA_HOST` - Default: `http://localhost:11434`
- `TRADING_OLLAMA_MODEL` - Default: `gemma3`
- `TRADING_FINBERT_CONFIDENCE_THRESHOLD` - Default: `0.4`
- `TRADING_RSS_POLL_INTERVAL_SECONDS` - Default: `300`
- `TRADING_REDDIT_POLL_INTERVAL_SECONDS` - Default: `600`
- `TRADING_RP_ID` - Default: `localhost`
- `TRADING_RP_NAME` - Default: `Trading Bot`
- `TRADING_RP_ORIGIN` - Default: `http://localhost:5173`
- `TRADING_CORS_ORIGINS` - Default: `["http://localhost:5173"]`
- `POSTGRES_PASSWORD` - Default: `trading`
**Secrets location:**
- `.env` file (git-ignored, loads via docker-compose `env_file` directive)
- Environment variables passed at container runtime
- No secrets management service (Vault, Secrets Manager) integrated
## Webhooks & Callbacks
**Incoming:**
- None detected - No webhook endpoints for external services
**Outgoing:**
- None detected - No outbound webhooks to external systems
**Real-time Communication:**
- WebSocket support via `websockets` library for API Gateway
- Used for: real-time dashboard updates (route implementation TBD)
- Endpoint: `/ws` proxy configured in Vite dev server
## Service Dependencies (Internal Messaging)
**Message Flows:**
1. News Fetcher → (`news:raw` stream) → Sentiment Analyzer
2. Sentiment Analyzer → (`news:scored` stream) → Signal Generator
3. Signal Generator → (`signals:generated` stream) → Trade Executor + Learning Engine
4. Trade Executor → (`trades:executed` stream) → Learning Engine
5. Market Data → Alpaca API → Database (timeseries persisted)
6. API Gateway → Alpaca API → Portfolio sync background task
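The fan-out in flows 3 and 4 relies on consumer groups keeping independent cursors over the same stream. An in-memory stand-in for that contract (no acks or pending lists, so not real Redis semantics):

```python
from collections import defaultdict

# Streams are append-only lists; each (stream, group) pair tracks its
# own read position, mimicking consumer-group cursors.
streams: dict[str, list[dict]] = defaultdict(list)
cursors: dict[tuple[str, str], int] = defaultdict(int)

def publish(stream: str, message: dict) -> None:
    streams[stream].append(message)

def consume(stream: str, group: str) -> list[dict]:
    # Each group reads independently, so signals:generated fans out
    # to both the trade executor and the learning engine.
    start = cursors[(stream, group)]
    batch = streams[stream][start:]
    cursors[(stream, group)] = len(streams[stream])
    return batch
```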
---
*Integration audit: 2026-02-23*

# Technology Stack
**Analysis Date:** 2026-02-23
## Languages
**Primary:**
- Python 3.12 - Backend microservices, data processing, model serving
- TypeScript 5.9 - React frontend, type safety
- JavaScript - React components, utilities
**Secondary:**
- Bash - Docker entrypoints, scripts
## Runtime
**Environment:**
- Python 3.12 runtime (Docker: `python:3.12-slim`)
- Node.js 20 (Docker: `node:20-alpine`)
**Package Manager:**
- pip (Python) with `pyproject.toml` configuration
- npm (Node) with `package.json` and `package-lock.json`
- Lockfile: `package-lock.json` present for frontend
## Frameworks
**Core Backend:**
- FastAPI 0.110+ - HTTP server for API gateway at `services/api_gateway/main.py`
- SQLAlchemy 2.0+ with asyncio - ORM for PostgreSQL at `shared/db.py`
- Alembic 1.13+ - Database migrations in `alembic/` directory
- Uvicorn 0.27+ - ASGI server for FastAPI
**Frontend:**
- React 19.2 - UI framework
- Vite 7.3 - Build tool and dev server
- Tailwind CSS 4.2 - Styling framework
- React Router 7.13 - Client-side routing
**Data & State:**
- TanStack Query 5.90+ - Server state management for API calls
- Axios 1.13+ - HTTP client for backend communication
- Recharts 3.7 - Charts for portfolio analytics
- TradingView lightweight-charts 5.1 - Professional candlestick charts
**Testing:**
- pytest 8.0+ - Python test runner with `asyncio_mode = "auto"`
- pytest-asyncio 0.23+ - Async test support
- pytest-cov 4.1+ - Coverage reporting
- React Testing Library - Frontend component testing (installed but not yet exercised by any tests)
**Build/Dev:**
- Ruff 0.3+ - Python linter and formatter (line-length: 120)
- MyPy 1.8+ - Python type checker
- TypeScript Compiler (tsc) - TypeScript compilation
- ESLint 9.39+ - JavaScript/TypeScript linting with React hooks and refresh plugins
## Key Dependencies
**Critical Backend:**
- redis 5.0+ - Message broker via Redis Streams at `shared/redis_streams.py`
- asyncpg 0.29+ - PostgreSQL async driver (via SQLAlchemy)
- pydantic 2.0+ - Data validation and settings management at `shared/config.py`
- pydantic-settings 2.0+ - Environment-based configuration
**Machine Learning & NLP:**
- transformers 4.38+ - FinBERT model for sentiment analysis at `services/sentiment_analyzer/analyzers/finbert.py`
- torch 2.2+ - PyTorch runtime for transformers
- ollama 0.1+ - Local LLM inference client (fallback analyzer) at `services/sentiment_analyzer/analyzers/ollama_analyzer.py`
**News & Data Sources:**
- feedparser 6.0+ - RSS feed parsing at `services/news_fetcher/main.py`
- praw 7.7+ - Reddit API client (blocking)
- asyncpraw 7.7+ - Async Reddit API client for non-blocking Reddit fetches
- httpx 0.27+ - Async HTTP client for web requests
**Brokerage Integration:**
- alpaca-py 0.21+ - Alpaca trading API client at `shared/broker/alpaca_broker.py` and `services/market_data/main.py`
- Provides: Order management, position tracking, market data (bars), account info
- Used by: Trade executor, market data service, API gateway portfolio sync
**Authentication & Security:**
- webauthn 2.0+ - WebAuthn/passkey registration and verification at `services/api_gateway/auth/routes.py`
- PyJWT 2.8+ with crypto support - JWT token creation/verification at `services/api_gateway/auth/jwt.py`
**Observability:**
- opentelemetry-sdk 1.20+ - Metrics collection framework
- opentelemetry-exporter-prometheus 0.45b+ - Prometheus metrics exporter
- opentelemetry-api 1.20+ - Telemetry instrumentation
- prometheus-client - HTTP metrics server (imported in `shared/telemetry.py`)
**Data Processing:**
- numpy 1.26+ - Numeric operations for backtester
- pandas 2.2+ - Time series data manipulation
- pytz 2024.1+ - Timezone support for market data
**Utilities:**
- websockets 12.0+ - WebSocket support for real-time APIs (in api_gateway optional-dependencies)
## Configuration
**Environment:**
- Pydantic BaseSettings with `TRADING_` prefix for all environment variables
- Service-specific config classes extend `BaseConfig` at:
- `services/sentiment_analyzer/config.py` - FinBERT threshold, Ollama host
- `services/news_fetcher/config.py` - RSS feeds, Reddit credentials
- `services/trade_executor/config.py` - Risk limits, Alpaca credentials
- `services/api_gateway/config.py` - JWT settings, CORS, WebAuthn RP
- `services/market_data/config.py` - Watchlist, bar timeframe, poll intervals
**Configuration File:**
- `.env` - Runtime environment variables (not tracked, example: `.env.example`)
- `pyproject.toml` - Project metadata and optional dependency groups:
- `[api]` - FastAPI, WebAuthn, JWT
- `[news]` - feedparser, praw, asyncpraw, httpx
- `[sentiment]` - transformers, torch, ollama
- `[trading]` - alpaca-py, pytz
- `[backtester]` - numpy, pandas
- `[dev]` - pytest, coverage, linters
**Database:**
- PostgreSQL 16 + TimescaleDB extension (Docker: `timescale/timescaledb:latest-pg16`)
- Connection: Async with asyncpg driver
- Migrations: Alembic with `alembic upgrade head` (auto-run at startup)
**Cache/Message Broker:**
- Redis 7-alpine - In-memory data store
- Streams feature for inter-service messaging (consumer groups)
- Persistent volumes for both PostgreSQL and Redis
## Build Configuration
**Backend:**
- `pyproject.toml` - Single source of truth for Python dependencies
- Setuptools 70.0+ - Build backend
- Finds packages under: `shared*`, `services*`, `backtester*`, `scripts*`, `tests*`
**Frontend:**
- `dashboard/vite.config.ts` - Vite configuration with React plugin and Tailwind CSS
- `dashboard/tsconfig.json` - TypeScript configuration (references app and node configs)
- Build: `npm run build` produces static files served by Nginx
**Docker:**
- `docker/Dockerfile.service` - Multi-stage build for all Python services
- Stage 1: Builder installs dependencies via pip
- Stage 2: Runtime with slim Python image
- `docker/Dockerfile.dashboard` - Multi-stage Node+Nginx build
- Stage 1: Node 20 builds Vite app
- Stage 2: Nginx serves static files
- `docker-compose.yml` - Orchestrates: PostgreSQL, Redis, 7 microservices, dashboard
## Platform Requirements
**Development:**
- Python 3.12 with venv or virtualenv
- Node.js 20+ with npm
- Docker & Docker Compose (for integration tests and full-stack deployment)
- Optional: Ollama (local LLM, defaults to `http://localhost:11434`)
**Production:**
- Docker and Docker Compose
- PostgreSQL 16+ (external managed service or container)
- Redis 7+ (external managed service or container)
- Environment variables set via `.env` file or container environment
- Optional: External Ollama service for sentiment fallback
---
*Stack analysis: 2026-02-23*

# Codebase Structure
**Analysis Date:** 2026-02-23
## Directory Layout
```
trading-bot/
├── alembic/ # Database migration versions and configuration
│ ├── env.py # Alembic runtime environment
│ ├── script.py.mako # Migration template
│ └── versions/ # Individual migration files
├── backtester/ # Historical replay engine
│ ├── __init__.py
│ ├── config.py # Backtester configuration
│ ├── data_loader.py # Load bars from database
│ ├── engine.py # Main replay loop with SimulatedBroker
│ ├── metrics.py # Equity curve, Sharpe, drawdown calculations
│ └── simulated_broker.py # Paper trading for backtests
├── dashboard/ # React/TypeScript frontend
│ ├── public/ # Static assets (favicon, manifest)
│ ├── src/
│ │ ├── api/ # API client (auth.ts, client.ts)
│ │ ├── assets/ # Images, fonts
│ │ ├── components/ # Reusable UI (PositionsTable, EquityCurve, etc.)
│ │ ├── hooks/ # Custom React hooks (useAuth, useWebSocket, usePortfolio)
│ │ ├── pages/ # Page components (Login, Portfolio, TradeLog, etc.)
│ │ ├── App.tsx # Router and layout
│ │ ├── main.tsx # Entry point
│ │ └── index.css # Tailwind styles
│ ├── index.html # HTML template
│ ├── package.json # npm dependencies (Vite, React, TypeScript)
│ └── vite.config.ts # Vite build config
├── docker/ # Dockerfiles and compose configs
│ ├── Dockerfile.service # Python service base image
│ ├── Dockerfile.dashboard # Node.js build + Nginx serve
│ └── nginx.conf # Nginx reverse proxy config
├── docs/ # Documentation and planning
│ └── plans/ # Implementation plans from GSD phases
├── scripts/ # Utility scripts
│ ├── seed_strategies.py # Initialize default strategies in DB
│ ├── smoke_test.sh # Health check / integration test script
│ └── __init__.py
├── services/ # Microservices (Python packages with underscores)
│ ├── api_gateway/ # FastAPI HTTP/WebSocket server
│ │ ├── main.py # App factory, lifespan, router registration
│ │ ├── config.py # ApiGatewayConfig (CORS, JWT secret, etc.)
│ │ ├── ws.py # WebSocket endpoint
│ │ ├── auth/
│ │ │ ├── routes.py # Register/login/logout endpoints
│ │ │ ├── jwt.py # JWT token encoding/decoding
│ │ │ └── middleware.py # get_current_user dependency
│ │ ├── routes/ # REST endpoints organized by domain
│ │ │ ├── news.py # GET /api/news
│ │ │ ├── signals.py # GET /api/signals
│ │ │ ├── trades.py # GET /api/trades
│ │ │ ├── portfolio.py # GET /api/portfolio (live account)
│ │ │ ├── strategies.py # GET/PUT /api/strategies
│ │ │ ├── controls.py # POST /api/controls/* (pause/resume/etc)
│ │ │ └── backtest.py # POST /api/backtest
│ │ └── tasks/
│ │ └── portfolio_sync.py # Background task: sync account data
│ ├── news_fetcher/ # Polls RSS + Reddit
│ │ ├── main.py # Entry point: fetch + deduplicate + publish
│ │ ├── config.py # NewsFetcherConfig (feed URLs, poll intervals)
│ │ └── sources/
│ │ ├── rss.py # RSSSource.fetch() via feedparser
│ │ └── reddit.py # RedditSource.fetch() via asyncpraw
│ ├── sentiment_analyzer/ # Score sentiment + extract tickers
│ │ ├── main.py # Entry point: consume raw → score → publish
│ │ ├── config.py # SentimentAnalyzerConfig (thresholds, model paths)
│ │ ├── ticker_extractor.py # Regex-based company name → ticker
│ │ └── analyzers/
│ │ ├── finbert.py # FinBERTAnalyzer (transformers model)
│ │ └── ollama_analyzer.py # OllamaAnalyzer (fallback LLM)
│ ├── signal_generator/ # Combine sentiment + market → signals
│ │ ├── main.py # Entry point: consume scored articles + bars → ensemble
│ │ ├── config.py # SignalGeneratorConfig (ensemble threshold)
│ │ ├── ensemble.py # WeightedEnsemble orchestration
│ │ └── market_data.py # MarketDataManager: SMA/RSI calculation per ticker
│ ├── trade_executor/ # Risk checks + order submission
│ │ ├── main.py # Entry point: consume signals → risk check → submit
│ │ ├── config.py # TradeExecutorConfig (risk limits, position size formula)
│ │ └── risk_manager.py # RiskManager: position limits, market hours, cooldowns
│ ├── learning_engine/ # Evaluate trades + adjust weights
│ │ ├── main.py # Entry point: consume executions → evaluate → adjust weights
│ │ ├── config.py # LearningEngineConfig (adjustment parameters)
│ │ ├── evaluator.py # TradeEvaluator: match opens to closes, calc P&L
│ │ └── weight_adjuster.py # WeightAdjuster: multi-armed bandit logic
│ ├── market_data/ # Fetch + stream OHLCV bars
│ │ ├── main.py # Entry point: fetch from Alpaca → publish bars
│ │ ├── config.py # MarketDataConfig (watchlist, timeframe, limits)
│ │ └── __main__.py # Allows `python -m services.market_data`
│ └── __init__.py
├── shared/ # Shared Python libraries (imported by all services)
│ ├── __init__.py
│ ├── config.py # BaseConfig: common settings (DB, Redis, logging)
│ ├── db.py # create_db(): async engine + sessionmaker
│ ├── redis_streams.py # StreamPublisher, StreamConsumer
│ ├── telemetry.py # setup_telemetry(): OpenTelemetry + Prometheus
│ ├── broker/ # Brokerage abstraction layer
│ │ ├── base.py # BaseBroker ABC (abstract interface)
│ │ └── alpaca_broker.py # AlpacaBroker: alpaca-py implementation
│ ├── models/ # SQLAlchemy ORM models
│ │ ├── __init__.py # Imports all models so Alembic discovers them
│ │ ├── base.py # Base (declarative), TimestampMixin
│ │ ├── auth.py # User, UserCredential
│ │ ├── news.py # Article, ArticleSentiment
│ │ ├── trading.py # Signal, Trade, Position, Strategy, StrategyWeightHistory
│ │ ├── learning.py # TradeOutcome, LearningAdjustment
│ │ └── timeseries.py # MarketData, PortfolioSnapshot, StrategyMetric
│ ├── schemas/ # Pydantic v2 message schemas
│ │ ├── __init__.py # Imports all schemas
│ │ ├── base.py # BaseSchema, TimestampSchema
│ │ ├── news.py # RawArticle, ScoredArticle
│ │ ├── trading.py # TradeSignal, TradeExecution, OrderRequest, etc.
│ │ └── learning.py # TradeOutcomeSchema, WeightAdjustment
│ └── strategies/ # Trading strategy implementations
│ ├── __init__.py # Exports BaseStrategy, concrete implementations
│ ├── base.py # BaseStrategy ABC
│ ├── momentum.py # MomentumStrategy
│ ├── mean_reversion.py # MeanReversionStrategy
│ └── news_driven.py # NewsDrivenStrategy
├── tests/ # Test suites
│ ├── __init__.py
│ ├── conftest.py # Pytest fixtures (fake Redis, DB, configs)
│ ├── test_*.py # Unit tests: models, schemas, broker, strategies, backtester
│ ├── services/ # Service-specific tests
│ │ ├── test_news_fetcher.py
│ │ ├── test_sentiment_analyzer.py
│ │ ├── test_signal_generator.py
│ │ ├── test_trade_executor.py
│ │ ├── test_learning_engine.py
│ │ ├── test_api_auth.py
│ │ ├── test_api_routes.py
│ │ ├── test_market_data.py
│ │ └── test_portfolio_sync.py
│ └── integration/ # Integration tests (require docker)
│ ├── test_news_pipeline.py
│ └── test_trading_flow.py
├── .claude/ # Project-specific Claude knowledge
│ └── CLAUDE.md # Architecture notes, patterns, known issues
├── .env.example # Example environment variables
├── alembic.ini # Alembic configuration
├── docker-compose.yml # Service orchestration (8 services + infra)
├── pyproject.toml # Python package config, dependencies, test config
└── README.md # Project overview
```
## Directory Purposes
**`alembic/`:**
- Purpose: Database schema migrations using Alembic (SQLAlchemy migration tool)
- Contains: Migration scripts (timestamped Python files in `versions/`)
- Workflow: `alembic revision -m "message"` creates migration, `alembic upgrade head` applies to DB
**`backtester/`:**
- Purpose: Historical replay engine for strategy testing
- Contains: Engine loop, simulated broker, metrics calculation
- Used by: API endpoint `/api/backtest` (triggered by dashboard)
- Workflow: Load historical bars from DB, replay trades with `SimulatedBroker`, output equity curve + metrics
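The replay workflow can be sketched in miniature (all names here — `Bar`, `run_backtest` — are illustrative, not the actual `backtester/engine.py` API):

```python
from dataclasses import dataclass

@dataclass
class Bar:
    close: float  # OHLCV reduced to the close price for brevity

def run_backtest(bars, signal_fn, cash=10_000.0):
    """Replay bars oldest-first; buy/sell one share per signal, track equity."""
    position = 0
    equity_curve = []
    for bar in bars:
        side = signal_fn(bar)  # "buy", "sell", or None
        if side == "buy" and cash >= bar.close:
            cash -= bar.close
            position += 1
        elif side == "sell" and position > 0:
            cash += bar.close
            position -= 1
        # Mark the portfolio to market after each bar
        equity_curve.append(cash + position * bar.close)
    return equity_curve
```

The real engine additionally simulates fills via `SimulatedBroker` and computes summary metrics from the equity curve.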
**`dashboard/`:**
- Purpose: React TypeScript frontend for user interaction
- Contains: Pages (Login, Portfolio, TradeLog, News, etc.), components (charts, tables), API client, hooks
- Entry: `src/main.tsx` → `src/App.tsx`
- Build: `npm run build` outputs to `dist/`, served by Nginx in Docker
**`docker/`:**
- Purpose: Container images and deployment config
- Contains: `Dockerfile.service` (Python services base), `Dockerfile.dashboard` (Node build + Nginx)
- Used by: docker-compose.yml to build and run all services
**`scripts/`:**
- Purpose: Utility and setup scripts
- Key files:
- `seed_strategies.py`: Inserts default strategies into DB (Momentum, MeanReversion, NewsDriven)
- `smoke_test.sh`: Checks all service health endpoints
**`services/*/main.py`:**
- Purpose: Entry point for each microservice
- Pattern: Each service has its own `main.py` that can be run as `python -m services.SERVICE_NAME`
- Lifecycle: Setup config → connect to Redis/DB → start polling/consuming → handle shutdown gracefully
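That lifecycle can be sketched as a generic consume-until-shutdown loop (all names below are hypothetical; the real services wire in `StreamConsumer`, telemetry, and OS signal handlers):

```python
import asyncio

async def run_service(consume, handle, shutdown: asyncio.Event):
    """Generic service loop: pull messages and process until shutdown."""
    processed = 0
    while not shutdown.is_set():
        msg = await consume()
        if msg is None:          # nothing available; yield control
            await asyncio.sleep(0)
            continue
        await handle(msg)
        processed += 1
    return processed

async def demo():
    queue = asyncio.Queue()
    for i in range(3):
        queue.put_nowait(i)
    shutdown = asyncio.Event()

    async def consume():
        if queue.empty():
            shutdown.set()       # simulate SIGTERM once the backlog drains
            return None
        return await queue.get()

    seen = []
    return await run_service(consume, lambda m: _record(seen, m), shutdown), seen

async def _record(seen, m):
    seen.append(m)
```

The `asyncio.Event` stands in for the graceful-shutdown hook each `main.py` installs.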
**`services/*/config.py`:**
- Purpose: Service-specific configuration
- Pattern: Extends `BaseConfig`, adds service-specific settings
- Example: `NewsFetcherConfig` adds `rss_urls`, `reddit_subreddits`, `rss_poll_interval_seconds`
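A minimal sketch of the extension pattern, using plain dataclasses with environment-variable defaults (the actual `BaseConfig` may be built on a settings library; these class names are illustrative):

```python
import os
from dataclasses import dataclass, field

@dataclass
class BaseConfigSketch:
    """Common settings shared by every service."""
    database_url: str = field(
        default_factory=lambda: os.environ.get("DATABASE_URL", "sqlite:///dev.db")
    )
    redis_url: str = field(
        default_factory=lambda: os.environ.get("REDIS_URL", "redis://localhost:6379/0")
    )

@dataclass
class NewsFetcherConfigSketch(BaseConfigSketch):
    """Service-specific settings layered on top of the base."""
    rss_poll_interval_seconds: int = 300
    rss_urls: tuple = ()
```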
**`shared/models/`:**
- Purpose: SQLAlchemy ORM models (database schema)
- Key tables: `trades`, `signals`, `articles`, `article_sentiments`, `market_data`, `positions`, `users`, `strategies`, `strategy_weight_history`, `learning_adjustments`, `portfolio_snapshots`
- Pattern: Each model is imported in `__init__.py` so Alembic can auto-discover it for migrations
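The discovery trick can be illustrated without SQLAlchemy: a shared base that registers every subclass the moment its module is imported, much as declarative `Base.metadata` collects tables (names below are illustrative, not the real models):

```python
class BaseSketch:
    """Mimics declarative Base: subclasses self-register on import."""
    tables: dict = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        BaseSketch.tables[cls.__tablename__] = cls

class Trade(BaseSketch):
    __tablename__ = "trades"

class Article(BaseSketch):
    __tablename__ = "articles"

# Merely importing the defining modules populates the registry --
# which is exactly what Alembic autogenerate relies on.
```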
**`shared/schemas/`:**
- Purpose: Pydantic v2 message schemas for validation
- Used by: Services to validate messages on consume/publish via Redis Streams
- Naming: Schemas mirror the Redis Stream message types (RawArticle, ScoredArticle, TradeSignal, TradeExecution, etc.)
**`shared/strategies/`:**
- Purpose: Pluggable trading strategy implementations
- Pattern: All extend `BaseStrategy`, implement `async evaluate(ticker, market, sentiment) → TradeSignal | None`
- Used by: Signal generator via `WeightedEnsemble`
**`tests/`:**
- Purpose: Unit and integration tests
- Pattern: `test_*.py` files match modules in `shared/` and `services/`
- Marker: `@pytest.mark.integration` for tests requiring live services (redis, postgres)
- Run: `pytest tests/ -v -m "not integration"` for unit tests; integration tests additionally require the docker services to be running
## Key File Locations
**Entry Points:**
- `services/news_fetcher/main.py`: Polls RSS/Reddit, publishes to `news:raw`
- `services/sentiment_analyzer/main.py`: Consumes `news:raw`, publishes to `news:scored`
- `services/signal_generator/main.py`: Consumes `news:scored` + `market:bars`, publishes to `signals:generated`
- `services/trade_executor/main.py`: Consumes `signals:generated`, submits orders, publishes to `trades:executed`
- `services/learning_engine/main.py`: Consumes `trades:executed`, adjusts strategy weights
- `services/market_data/main.py`: Fetches OHLCV bars, publishes to `market:bars`
- `services/api_gateway/main.py`: HTTP server listening on port 8000 (configurable)
- `dashboard/src/main.tsx`: React app entry, connects to API gateway
**Configuration:**
- `.env`: Runtime environment variables (database URL, Redis URL, API keys, etc.)
- `pyproject.toml`: Python project metadata, dependencies (split into optional groups per service)
- `docker-compose.yml`: Service orchestration, port mappings, environment variables
- `alembic.ini`: Database migration tool configuration
**Core Logic:**
- `services/signal_generator/ensemble.py`: Weighted ensemble combining strategies
- `services/trade_executor/risk_manager.py`: Risk checks before order submission
- `services/learning_engine/weight_adjuster.py`: Multi-armed bandit weight adjustment
- `backtester/engine.py`: Historical replay loop
- `shared/broker/alpaca_broker.py`: Alpaca order submission and account queries
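The weight-adjustment idea behind `weight_adjuster.py` can be sketched as a reward-proportional update followed by renormalisation (illustrative only; the real parameters come from `LearningEngineConfig`):

```python
def adjust_weights(weights, strategy, reward, learning_rate=0.1, floor=0.05):
    """Shift weight toward strategies whose trades earned a positive reward,
    keep every weight above a floor, then renormalise to sum to 1."""
    adjusted = dict(weights)
    adjusted[strategy] = max(floor, adjusted[strategy] + learning_rate * reward)
    total = sum(adjusted.values())
    return {name: w / total for name, w in adjusted.items()}
```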
**Testing:**
- `tests/conftest.py`: Pytest fixtures for mock Redis, DB, service configs
- `tests/test_models.py`: ORM model tests
- `tests/test_schemas.py`: Pydantic schema validation tests
- `tests/services/`: Service-specific unit tests
- `tests/integration/`: End-to-end pipeline tests (require docker)
## Naming Conventions
**Files:**
- Python service directories use underscores: `news_fetcher`, `api_gateway`, `signal_generator` (the matching Docker Compose service names use hyphens in place of the underscores)
- Test files: `test_*.py` or `*_test.py` (pytest convention)
- Database migration files: `{timestamp}_{description}.py` (Alembic auto-generated)
- React components: PascalCase `.tsx` files (Login.tsx, Portfolio.tsx, PositionsTable.tsx)
- React hooks: `use*.ts` prefix (useAuth.ts, useWebSocket.ts, usePortfolio.ts)
- Config classes: `*Config` suffix extending `BaseConfig` (NewsFetcherConfig, ApiGatewayConfig)
**Directories:**
- Services under `services/`: lowercase with underscores (news_fetcher, api_gateway)
- Shared libraries under `shared/`: logical grouping (broker/, models/, schemas/, strategies/)
- Tests mirror source structure: `tests/services/test_*.py` for `services/*/` modules
- React pages: match route names (Portfolio.tsx for /portfolio, TradeLog.tsx for /trades)
## Where to Add New Code
**New Feature (End-to-End):**
- Service-specific code: `services/{SERVICE_NAME}/{module}.py`
- Shared models: `shared/models/{domain}.py` (if new entity type)
- Shared schemas: `shared/schemas/{domain}.py` (if new message type)
- Tests: `tests/services/test_{SERVICE_NAME}.py` and/or `tests/integration/`
**New Component/Module:**
- Service: Create `services/{service_name}/` directory with `__init__.py`, `main.py`, `config.py`
- Strategy: Extend `BaseStrategy` in `shared/strategies/{strategy_name}.py`
- Broker adapter: Extend `BaseBroker` in `shared/broker/{provider_name}_broker.py`
- API route: Add to `services/api_gateway/routes/{domain}.py`
- Dashboard page: Add `dashboard/src/pages/{PageName}.tsx` and route in `App.tsx`
**Utilities:**
- Shared helpers: `shared/{module}.py` (e.g., `shared/redis_streams.py`)
- Service-specific helpers: `services/{SERVICE_NAME}/{helper}.py`
- API utilities: `services/api_gateway/{utility}.py`
**Database:**
- New table: Add model in `shared/models/{domain}.py`
- Schema migration: Run `alembic revision -m "description"`, fill in migration file
- Persist data: Service calls `session.add(model_instance)` then `session.commit()`
**Tests:**
- Unit test: `tests/test_{module}.py` for shared, `tests/services/test_{service}.py` for services
- Integration test: `tests/integration/test_{flow}.py`
- Fixtures: Add to `tests/conftest.py` (database, Redis, mocked API responses)
## Special Directories
**`.planning/codebase/`:**
- Purpose: Generated codebase mapping documents (ARCHITECTURE.md, STRUCTURE.md, etc.)
- Generated by: `/gsd:map-codebase` command
- Committed: Yes, part of repo for CI/CD reference
**`.pytest_cache/`, `__pycache__/`:**
- Purpose: Pytest and Python runtime caches
- Generated: Yes (auto-created by pytest/Python)
- Committed: No (in .gitignore)
**`trading_bot.egg-info/`:**
- Purpose: Build metadata from `setuptools.build_meta`
- Generated: Yes (auto-created during install)
- Committed: No (in .gitignore)
**`dashboard/dist/`:**
- Purpose: Built React app (HTML, JS, CSS bundles)
- Generated: Yes (by `npm run build`)
- Committed: No (in .gitignore)

# Testing Patterns
**Analysis Date:** 2026-02-23
## Test Framework
**Runner:**
- pytest 8.0+ (`pytest>=8.0` in `pyproject.toml`)
- Config: `pyproject.toml` under `[tool.pytest.ini_options]`
- `asyncio_mode = "auto"` — Automatically handles async test discovery and execution
- `testpaths = ["tests"]` — Tests located in root-level `tests/` directory
- Test markers: `integration` marks tests requiring docker services (redis, postgres)
**Assertion Library:**
- Built-in pytest assertions (`assert`, `assert x == y`)
- `pytest.approx()` for floating-point comparisons (e.g., `assert outcome.realized_pnl == pytest.approx(100.0)`)
**Run Commands:**
```bash
python -m pytest tests/ -v # Run all tests
python -m pytest tests/ -v -m "not integration" # Run unit tests only
python -m pytest tests/ -v -m integration # Run integration tests only (requires docker)
python -m pytest tests/ --cov # Run with coverage report
python -m pytest tests/ -x # Stop on first failure
python -m pytest tests/ -k test_name # Run tests matching pattern
```
**Test Execution:**
- Async tests automatically discovered and run via `asyncio_mode="auto"`
- No `@pytest.mark.asyncio` decorator needed (though present in some tests for clarity)
- Integration tests require `docker-compose up -d` with Redis and PostgreSQL running
## Test File Organization
**Location:**
- Co-located tests pattern: Tests in `tests/` directory mirroring `services/` and `shared/` structure
- Structure:
```
tests/
├── test_redis_streams.py # Tests for shared/redis_streams.py
├── test_models.py # Tests for shared/models/
├── test_schemas.py # Tests for shared/schemas/
├── test_broker.py # Tests for shared/broker/
├── test_strategies.py # Tests for shared/strategies/
├── test_backtester.py # Tests for backtester/
├── services/
│ ├── test_news_fetcher.py
│ ├── test_sentiment_analyzer.py
│ ├── test_signal_generator.py
│ ├── test_trade_executor.py
│ ├── test_learning_engine.py
│ ├── test_api_auth.py
│ ├── test_api_routes.py
│ ├── test_market_data.py
│ └── test_portfolio_sync.py
└── integration/
├── test_news_pipeline.py
└── test_trading_flow.py
```
**Naming:**
- Test files: `test_{module}.py` (e.g., `test_redis_streams.py`)
- Test functions: `test_{component}_{scenario}` (e.g., `test_publisher_publishes_json`)
- Test classes: `Test{Scenario}` (e.g., `TestEvaluateProfitableTrade`)
- Helper functions: `_make_{object}` (e.g., `_make_config`, `_make_signal`, `_make_trade_id`)
## Test Structure
**Suite Organization:**
```python
# Module docstring describing test scope
"""Tests for the Redis Streams publish/consume helpers."""

# Imports (pytest first, then unittest.mock, then project imports)
import json
from unittest.mock import AsyncMock

import pytest
from redis.asyncio import Redis

from shared.redis_streams import StreamConsumer, StreamPublisher

REDIS_URL = "redis://localhost:6379/1"  # integration tests use DB 1

# Fixtures (if any)
@pytest.fixture
async def redis_client():
    """Provide a clean Redis connection and clean up after."""
    client = Redis.from_url(REDIS_URL)
    yield client
    await client.aclose()

# Test functions or classes
@pytest.mark.asyncio
async def test_publisher_publishes_json():
    """StreamPublisher should XADD a JSON-serialised payload."""
    redis = AsyncMock()
    # ... test implementation

class TestEvaluateProfitableTrade:
    """A long trade that gains in price should have positive PnL and ROI."""

    def test_evaluate_profitable_trade(self):
        ...  # test implementation
```
**Section Comments:**
- Use comment separators: `# ---------------------------------------------------------------------------`
- Group tests by concern: Enums, Fixtures, RSS tests, Reddit tests, Integration tests, etc.
- Example from `test_models.py`:
```python
# ---------------------------------------------------------------------------
# Enum tests
# ---------------------------------------------------------------------------
class TestEnums:
```
**Patterns:**
- **Setup pattern**: Create fixtures as pytest `@pytest.fixture` decorators
- Can be module-level (reused) or function-level (isolated)
- Async fixtures use `async def` with a `yield` for setup/teardown (`yield from` is not valid in async generators)
- Example:
```python
@pytest.fixture
async def redis_client():
client = Redis.from_url(REDIS_URL)
yield client
await client.aclose()
```
- **Teardown pattern**: Use `yield` in fixtures for cleanup
- Code after `yield` runs after the test completes
- Example from `test_news_pipeline.py`:
```python
@pytest.fixture
async def redis_client():
client = Redis.from_url(REDIS_URL)
await client.delete(RAW_STREAM, SCORED_STREAM) # Setup
yield client
await client.delete(RAW_STREAM, SCORED_STREAM) # Teardown
await client.aclose()
```
- **Assertion pattern**: Use pytest assertions directly
- For equality: `assert x == y`
- For calls: `redis.xadd.assert_called_once_with(...)`
- For floating point: `assert value == pytest.approx(expected)`
- Example from `test_redis_streams.py`:
```python
redis.xadd.assert_called_once_with(
"test:stream",
{"data": json.dumps({"ticker": "AAPL", "score": 0.8})},
)
assert msg_id == b"1-0"
```
## Mocking
**Framework:** `unittest.mock` (built-in)
**Patterns:**
- AsyncMock for async functions: `AsyncMock(return_value=...)`
- MagicMock for sync functions and objects: `MagicMock()`
- SimpleNamespace for lightweight objects: `SimpleNamespace(title=..., score=...)`
**Example from `test_redis_streams.py`:**
```python
redis = AsyncMock()
redis.xadd = AsyncMock(return_value=b"1-0")
pub = StreamPublisher(redis, "test:stream")
msg_id = await pub.publish({"ticker": "AAPL"})
redis.xadd.assert_called_once_with(...)
assert msg_id == b"1-0"
```
**Example from `test_news_fetcher.py` (multi-call behavior):**
```python
redis = AsyncMock()
payload = {"ticker": "AAPL", "score": 0.8}  # example message body
redis.xreadgroup = AsyncMock(
    side_effect=[
        [("test:stream", [(b"1-0", {b"data": json.dumps(payload).encode()})])],
        KeyboardInterrupt,  # Break loop on second call
    ]
)
```
**What to Mock:**
- External services: Redis, database (use AsyncMock with return values)
- API calls: HTTP requests, OpenTelemetry counters
- ML models: FinBERT and Ollama analysis (patch and return synthetic scores)
- Broker connections: Alpaca API (return fake order results)
- File I/O and network operations
**What NOT to Mock:**
- Core business logic (RiskManager, TradeEvaluator, WeightAdjuster)
- Data structures and schemas
- Internal function calls within a module
- Time-based operations in unit tests (use fixtures for time-dependent tests)
**Patching Example from `test_sentiment_analyzer.py`:**
```python
with patch("services.sentiment_analyzer.analyzers.finbert.FinBERTAnalyzer") as mock_finbert:
mock_instance = AsyncMock()
mock_instance.analyze = AsyncMock(return_value=(0.8, 0.95))
mock_finbert.return_value = mock_instance
# ... run test
```
## Fixtures and Factories
**Test Data Patterns:**
Helper functions to create test objects:
```python
def _make_config(**overrides) -> LearningEngineConfig:
"""Create a LearningEngineConfig with sensible defaults + overrides."""
defaults = dict(
learning_rate=0.1,
min_trades_before_adjustment=20,
max_weight_shift_pct=0.10,
)
defaults.update(overrides)
return LearningEngineConfig(**defaults)
def _make_signal(
ticker: str = "AAPL",
direction: SignalDirection = SignalDirection.LONG,
) -> TradeSignal:
return TradeSignal(
ticker=ticker,
direction=direction,
strength=0.8,
strategy_sources=["test"],
timestamp=datetime.now(timezone.utc),
)
```
**Pytest Fixtures:**
```python
@pytest.fixture
def sample_article() -> RawArticle:
"""Return a sample RawArticle mentioning AAPL."""
return RawArticle(
source="rss",
url="https://example.com/aapl-news",
title="Apple Inc AAPL reports record quarterly earnings",
content="...",
published_at=datetime.now(timezone.utc),
fetched_at=datetime.now(timezone.utc),
content_hash="test-hash-aapl-001",
)
@pytest.fixture()
def config() -> ApiGatewayConfig:
return ApiGatewayConfig(
jwt_secret_key="test-secret-for-routes",
database_url="sqlite+aiosqlite:///:memory:",
redis_url="redis://localhost:6379/0",
)
```
**Location:**
- Helper functions: Top of test file, marked with `_make_` prefix, after docstring and imports
- Pytest fixtures: After helpers, before test classes/functions, decorated with `@pytest.fixture`
- Shared fixtures: In separate test files if reused across multiple tests
- Integration test fixtures: `redis_client` (cleanup with delete and close), database fixtures
## Coverage
**Requirements:** No coverage threshold is enforced; 246 unit tests pass with zero failures (as of this analysis)
**View Coverage:**
```bash
python -m pytest tests/ --cov=shared --cov=services --cov-report=term-missing
python -m pytest tests/ --cov --cov-report=html # Generate HTML report
```
**Coverage Statistics (approximate):**
- `tests/test_redis_streams.py` — 5 tests (complete coverage of StreamPublisher/Consumer)
- `tests/test_models.py` — 21 tests (enums, relationships)
- `tests/test_schemas.py` — 49 tests (Pydantic schema validation)
- `tests/test_broker.py` — 18 tests (AlpacaBroker implementation)
- `tests/test_strategies.py` — 24 tests (RSI, EMA, Momentum strategies)
- `tests/test_backtester.py` — 13 tests (backtest simulation)
- `tests/services/test_news_fetcher.py` — 10 tests (RSS, Reddit, deduplication)
- `tests/services/test_sentiment_analyzer.py` — 19 tests (FinBERT, Ollama, tickers)
- `tests/services/test_signal_generator.py` — 17 tests (weighted ensemble)
- `tests/services/test_trade_executor.py` — 16 tests (RiskManager, order flow)
- `tests/services/test_learning_engine.py` — 28 tests (trade evaluation, weight adjustment)
- `tests/services/test_api_auth.py` — 13 tests (WebAuthn, JWT)
- `tests/services/test_api_routes.py` — 13 tests (endpoint responses)
- `tests/integration/` — 9 integration tests (news pipeline, trading flow)
## Test Types
**Unit Tests:**
- Scope: Single function, class, or module
- Strategy: Mock all external dependencies (Redis, DB, API calls, ML models)
- Location: `tests/test_*.py` and `tests/services/test_*.py`
- Execution: Runs in isolation without services
- Examples: `test_publisher_publishes_json`, `test_evaluate_profitable_trade`
**Integration Tests:**
- Scope: Multi-service interaction (e.g., news fetcher → sentiment analyzer pipeline)
- Strategy: Real Redis streams, real database, mocked ML models and external APIs
- Location: `tests/integration/test_*.py`
- Execution: Requires `docker-compose up -d` with Redis and PostgreSQL running
- Marker: `@pytest.mark.integration` (separate via `pytest -m integration`)
- Examples: `test_news_pipeline.py` (publishes to `news:raw`, reads from `news:scored`), `test_trading_flow.py`
**E2E Tests:**
- Not implemented; would require running full docker-compose stack with live Alpaca paper trading
- Could be added for smoke testing production deployments
## Common Patterns
**Async Testing:**
```python
@pytest.mark.asyncio
async def test_consumer_consume_yields_and_acks() -> None:
    """consume() should yield deserialised data and ACK each message."""
    payload = {"ticker": "AAPL", "score": 0.8}
    redis = AsyncMock()
    redis.xgroup_create = AsyncMock()
    redis.xreadgroup = AsyncMock(side_effect=[
        [("test:stream", [(b"1-0", {b"data": json.dumps(payload).encode()})])],
        KeyboardInterrupt,
    ])
    consumer = StreamConsumer(redis, "test:stream", "grp", "c1")
    results = []
    try:
        async for msg_id, data in consumer.consume():
            results.append((msg_id, data))
    except KeyboardInterrupt:
        pass
    assert len(results) == 1
    assert results[0] == (b"1-0", payload)
```
**Error Testing:**
```python
async def test_consumer_ensure_group_ignores_existing() -> None:
    """If the group already exists the exception should be swallowed."""
    redis = AsyncMock()
    redis.xgroup_create = AsyncMock(side_effect=Exception("BUSYGROUP"))
    consumer = StreamConsumer(redis, "test:stream", "my-group", "worker-1")
    # Should not raise; the test passes if no exception propagates
    await consumer.ensure_group()
```
**Parametrized Tests:**
- Not heavily used in current codebase
- Could be added for testing multiple input scenarios (e.g., different signal directions)
- Use `@pytest.mark.parametrize` if needed
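If parametrization were adopted, a direction-handling test might look like this (hypothetical example, not an existing test):

```python
import pytest

@pytest.mark.parametrize(
    ("direction", "expected_sign"),
    [("long", 1), ("short", -1)],
)
def test_direction_sign(direction, expected_sign):
    """Each signal direction maps to the expected position sign."""
    sign = 1 if direction == "long" else -1
    assert sign == expected_sign
```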
**Floating-Point Assertions:**
```python
assert outcome.realized_pnl == pytest.approx(100.0) # Allows small differences
assert outcome.roi_pct == pytest.approx(10.0, rel=1e-5) # With tolerance
```
**Class-Based Test Organization:**
```python
class TestEvaluateProfitableTrade:
"""A long trade that gains in price should have positive PnL and ROI."""
def test_evaluate_profitable_trade(self):
evaluator = TradeEvaluator()
outcome = evaluator.evaluate_trade(...)
assert outcome.realized_pnl == pytest.approx(100.0)
assert outcome.was_profitable is True
class TestEvaluateLosingTrade:
"""A long trade that drops should have negative PnL."""
def test_evaluate_losing_trade(self):
# ... different scenario
```
## Test Configuration
**pytest.ini_options (from pyproject.toml):**
```toml
[tool.pytest.ini_options]
asyncio_mode = "auto"
testpaths = ["tests"]
markers = ["integration: marks tests requiring docker services (redis, postgres)"]
```
**Environment:**
- Database URL: Tests use `sqlite+aiosqlite:///:memory:` for in-memory databases
- Redis: Tests use `redis://localhost:6379/1` (DB 1) for integration tests to avoid conflicts
- Async mode: "auto" handles all async test discovery automatically
---
*Testing analysis: 2026-02-23*