Add observability and authentication sections to trading bot design

Add OpenTelemetry instrumentation plan with /metrics endpoints for
external Prometheus scraping, and passkey/WebAuthn authentication flow
with JWT sessions.

[ci skip]
Viktor Barzin 2026-02-22 14:41:38 +00:00
parent bd3ff169e3
commit ab0c932287
GPG key ID: 0EB088298288D958


@@ -182,9 +182,10 @@ trading-bot/
## Key Libraries
**Backend:** FastAPI, SQLAlchemy (async), alpaca-py, transformers (FinBERT), praw, redis-py, websockets, ollama-python
**Frontend:** React 18, TypeScript, TanStack Query, Recharts or TradingView lightweight-charts, Tailwind CSS
**Backend:** FastAPI, SQLAlchemy (async), alpaca-py, transformers (FinBERT), praw, redis-py, websockets, ollama-python, py-webauthn
**Frontend:** React 18, TypeScript, TanStack Query, Recharts or TradingView lightweight-charts, Tailwind CSS, @simplewebauthn/browser
**ML:** transformers (FinBERT), ollama, numpy, pandas
**Observability:** opentelemetry-sdk, opentelemetry-exporter-prometheus, opentelemetry-instrumentation-fastapi
**Testing:** pytest, pytest-asyncio, React Testing Library
## Brokerage Choice
@@ -200,6 +201,41 @@ Alpaca is the primary brokerage:
An abstraction layer in `shared/broker/` allows adding other brokerages (Interactive Brokers, Tradier) later without changing strategy or execution logic.
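A minimal sketch of what the `shared/broker/` abstraction could look like, assuming a small abstract interface that strategy and execution code depend on. The names `Broker`, `Order`, `InMemoryBroker`, and `flatten` are hypothetical and not taken from the codebase; a real Alpaca adapter would implement the same interface on top of alpaca-py.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from decimal import Decimal
from typing import Dict, Optional


@dataclass(frozen=True)
class Order:
    symbol: str
    qty: Decimal
    side: str  # "buy" or "sell"


class Broker(ABC):
    """Hypothetical interface; each brokerage would get its own subclass."""

    @abstractmethod
    def submit_order(self, order: Order) -> str:
        """Submit an order and return a broker-assigned order id."""

    @abstractmethod
    def get_position(self, symbol: str) -> Decimal:
        """Return the signed quantity currently held for `symbol`."""


class InMemoryBroker(Broker):
    """Stand-in implementation for tests; fills every order immediately."""

    def __init__(self) -> None:
        self._positions: Dict[str, Decimal] = {}
        self._next_id = 1

    def submit_order(self, order: Order) -> str:
        if order.side not in ("buy", "sell"):
            raise ValueError(f"unknown side: {order.side!r}")
        signed = order.qty if order.side == "buy" else -order.qty
        self._positions[order.symbol] = self.get_position(order.symbol) + signed
        order_id = f"mem-{self._next_id}"
        self._next_id += 1
        return order_id

    def get_position(self, symbol: str) -> Decimal:
        return self._positions.get(symbol, Decimal("0"))


def flatten(broker: Broker, symbol: str) -> Optional[str]:
    """Close any open position in `symbol` using only the Broker interface."""
    qty = broker.get_position(symbol)
    if qty == 0:
        return None
    side = "sell" if qty > 0 else "buy"
    return broker.submit_order(Order(symbol, abs(qty), side))
```

Because `flatten` only touches the abstract interface, swapping `InMemoryBroker` for another brokerage adapter would not require changing it, which is the property the abstraction layer is meant to provide.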
## Observability
All Python services are instrumented with OpenTelemetry:
- **Metrics**: `opentelemetry-sdk` + `opentelemetry-exporter-prometheus` expose a `/metrics` endpoint on each service for external Prometheus scraping
- **Traces**: Distributed tracing across services via OpenTelemetry (OTLP export to any collector)
- **Key metrics per service**:
  - News Fetcher: articles fetched/min, source error rates, fetch latency
  - Sentiment Analyzer: articles scored/min, FinBERT vs Ollama routing ratio, inference latency
  - Signal Generator: signals generated/min, per-strategy signal counts
  - Trade Executor: trades executed/min, order fill latency, rejection rate
  - Learning Engine: adjustment frequency, weight drift magnitude
  - API Gateway: request rate, response latency (p50/p95/p99), error rate
- **Business metrics** exposed via API Gateway `/metrics`:
  - Portfolio value, daily P&L, drawdown, Sharpe ratio
  - Per-strategy win rate and weight allocation
No Prometheus or Grafana instance is deployed as part of this stack; the existing infrastructure handles scraping and visualization.
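The business metrics listed above (drawdown, Sharpe ratio, win rate) could be derived from a daily equity curve and per-trade P&L roughly as follows. This is a sketch under stated assumptions: the function names are hypothetical, the Sharpe ratio is annualized over 252 trading days with a zero risk-free rate by default, and a trade counts as a win only when its P&L is strictly positive. The design does not specify any of these conventions.

```python
import math
from statistics import mean, stdev
from typing import List


def daily_returns(equity: List[float]) -> List[float]:
    """Simple returns between consecutive portfolio values."""
    return [curr / prev - 1.0 for prev, curr in zip(equity, equity[1:])]


def max_drawdown(equity: List[float]) -> float:
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def sharpe_ratio(
    returns: List[float],
    risk_free_daily: float = 0.0,   # assumption: zero risk-free rate
    periods_per_year: int = 252,    # assumption: daily data, 252 trading days
) -> float:
    """Annualized Sharpe ratio; returns 0.0 when it is undefined."""
    if len(returns) < 2:
        return 0.0
    excess = [r - risk_free_daily for r in returns]
    sd = stdev(excess)
    if sd == 0:
        return 0.0
    return mean(excess) / sd * math.sqrt(periods_per_year)


def win_rate(trade_pnls: List[float]) -> float:
    """Fraction of trades with strictly positive P&L."""
    if not trade_pnls:
        return 0.0
    return sum(1 for pnl in trade_pnls if pnl > 0) / len(trade_pnls)
```

The resulting values are point-in-time readings, so they would most naturally be exported as gauges rather than counters.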
## Authentication & Security
Passkey-based authentication (WebAuthn/FIDO2):
- **Sign up**: The user registers with a username and a passkey (biometric or security key). The server stores the credential's public key in the PostgreSQL `users` and `user_credentials` tables
- **Sign in**: Challenge-response via the WebAuthn API. No passwords are stored or transmitted
- **Session management**: After successful passkey authentication the server issues JWTs: a short-lived access token plus a refresh token
- **API Gateway middleware**: All API endpoints require a valid JWT (except `/auth/*` routes and `/metrics`)
- **CORS**: Restricted to dashboard origin
- **Secrets**: Alpaca API keys and signing keys stored as environment variables via `.env` files, never committed to git
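The session and middleware rules above can be illustrated with a standard-library sketch of HS256-signed JWTs. Everything here is an assumption made for illustration: the token lifetimes, the `kind` claim used to tell access tokens from refresh tokens, and the function names. The design does not name a signing algorithm or a JWT library, and a real service would more likely use an established JWT package than hand-rolled signing.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

ACCESS_TTL_SECONDS = 15 * 60          # assumption: the design only says "short-lived"
REFRESH_TTL_SECONDS = 7 * 24 * 3600   # assumption
PUBLIC_PATHS = ("/metrics",)
PUBLIC_PREFIXES = ("/auth/",)


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(secret: bytes, signing_input: str) -> bytes:
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def issue_token(secret: bytes, user_id: str, kind: str, ttl: int,
                now: Optional[float] = None) -> str:
    """Build a compact JWT; `kind` is "access" or "refresh" (hypothetical claim)."""
    issued = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": user_id, "kind": kind, "iat": issued, "exp": issued + ttl}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    return f"{signing_input}.{_b64url(_sign(secret, signing_input))}"


def verify_token(secret: bytes, token: str, expected_kind: str,
                 now: Optional[float] = None) -> Optional[dict]:
    """Return the payload if signature, kind, and expiry all check out."""
    current = time.time() if now is None else now
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        expected = _sign(secret, f"{header_b64}.{payload_b64}")
        if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
            return None
        header = json.loads(_b64url_decode(header_b64))
        payload = json.loads(_b64url_decode(payload_b64))
    except ValueError:  # covers bad segment count, base64, UTF-8, and JSON errors
        return None
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        return None
    if not isinstance(payload, dict) or payload.get("kind") != expected_kind:
        return None
    if payload.get("exp", 0) <= current:
        return None
    return payload


def requires_auth(path: str) -> bool:
    """Gateway rule: everything needs a JWT except /auth/* and /metrics."""
    return not (path in PUBLIC_PATHS or path.startswith(PUBLIC_PREFIXES))
```

Checking the `kind` claim keeps a refresh token from being accepted where an access token is expected, so only the short-lived token is sent on ordinary API calls.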
### Additional Tables
- `users` - id, username, display_name, created_at
- `user_credentials` - id, user_id, credential_id, public_key, sign_count, created_at
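As a rough illustration of how these two tables might be used, the sketch below models the rows as dataclasses and shows one plausible use of `sign_count`: the WebAuthn signature counter check that flags possibly cloned authenticators. The column types and the function name `accept_sign_count` are assumptions; the design lists only column names.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class User:
    id: int
    username: str
    display_name: str
    created_at: datetime


@dataclass
class UserCredential:
    id: int
    user_id: int            # references User.id
    credential_id: bytes    # assumed to be stored as raw bytes
    public_key: bytes       # assumed to be stored as raw bytes
    sign_count: int
    created_at: datetime


def accept_sign_count(credential: UserCredential, reported: int) -> bool:
    """Apply the WebAuthn signature counter rule after a verified assertion.

    Returns False when the counter did not advance, which may indicate a
    cloned authenticator. Authenticators that never increment the counter
    report 0 every time and are accepted.
    """
    if reported == 0 and credential.sign_count == 0:
        return True
    if reported > credential.sign_count:
        credential.sign_count = reported  # persist the new high-water mark
        return True
    return False
```

For example:

```python
cred = UserCredential(1, 1, b"cred-id", b"pub-key", 5, datetime.now(timezone.utc))
accept_sign_count(cred, 6)   # True, and cred.sign_count becomes 6
accept_sign_count(cred, 6)   # False: the counter did not advance
```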
## Docker Compose Services
- `news-fetcher` - Python service