diff --git a/docs/plans/2026-02-22-trading-bot-design.md b/docs/plans/2026-02-22-trading-bot-design.md
index 271c7df..4ff0fbc 100644
--- a/docs/plans/2026-02-22-trading-bot-design.md
+++ b/docs/plans/2026-02-22-trading-bot-design.md
@@ -182,9 +182,10 @@ trading-bot/
 
 ## Key Libraries
 
-**Backend:** FastAPI, SQLAlchemy (async), alpaca-py, transformers (FinBERT), praw, redis-py, websockets, ollama-python
-**Frontend:** React 18, TypeScript, TanStack Query, Recharts or TradingView lightweight-charts, Tailwind CSS
+**Backend:** FastAPI, SQLAlchemy (async), alpaca-py, transformers (FinBERT), praw, redis-py, websockets, ollama-python, py-webauthn
+**Frontend:** React 18, TypeScript, TanStack Query, Recharts or TradingView lightweight-charts, Tailwind CSS, @simplewebauthn/browser
 **ML:** transformers (FinBERT), ollama, numpy, pandas
+**Observability:** opentelemetry-sdk, opentelemetry-exporter-prometheus, opentelemetry-instrumentation-fastapi
 **Testing:** pytest, pytest-asyncio, React Testing Library
 
 ## Brokerage Choice
@@ -200,6 +201,41 @@ Alpaca is the primary brokerage:
 
 An abstraction layer in `shared/broker/` allows adding other brokerages (Interactive Brokers, Tradier) later without changing strategy or execution logic.
 
+## Observability
+
+All Python services are instrumented with OpenTelemetry:
+
+- **Metrics**: `opentelemetry-sdk` + `opentelemetry-exporter-prometheus` expose a `/metrics` endpoint on each service for external Prometheus scraping
+- **Traces**: Distributed tracing across services via OpenTelemetry (OTLP export to any collector)
+- **Key metrics per service**:
+  - News Fetcher: articles fetched/min, source error rates, fetch latency
+  - Sentiment Analyzer: articles scored/min, FinBERT vs Ollama routing ratio, inference latency
+  - Signal Generator: signals generated/min, per-strategy signal counts
+  - Trade Executor: trades executed/min, order fill latency, rejection rate
+  - Learning Engine: adjustment frequency, weight drift magnitude
+  - API Gateway: request rate, response latency (p50/p95/p99), error rate
+- **Business metrics** exposed via API Gateway `/metrics`:
+  - Portfolio value, daily P&L, drawdown, Sharpe ratio
+  - Per-strategy win rate and weight allocation
+
+No Prometheus or Grafana deployed — the existing infrastructure handles scraping and visualization.
+
+## Authentication & Security
+
+Passkey-based authentication (WebAuthn/FIDO2):
+
+- **Sign up**: User registers with username + passkey (biometric/security key). Server stores credential public key in PostgreSQL `users` / `user_credentials` tables
+- **Sign in**: Challenge-response via WebAuthn API. No passwords stored or transmitted
+- **Session management**: JWT issued after successful passkey authentication, short-lived access token + refresh token
+- **API Gateway middleware**: All API endpoints require valid JWT (except `/auth/*` routes and `/metrics`)
+- **CORS**: Restricted to dashboard origin
+- **Secrets**: Alpaca API keys and signing keys stored as environment variables via `.env` files, never committed to git
+
+### Additional Tables
+
+- `users` - id, username, display_name, created_at
+- `user_credentials` - id, user_id, credential_id, public_key, sign_count, created_at
+
 ## Docker Compose Services
 
 - `news-fetcher` - Python service