# Codebase Concerns **Analysis Date:** 2026-02-23 ## Tech Debt **Incomplete API Portfolio Metrics:** - Issue: Portfolio metrics endpoint returns hardcoded placeholder values instead of computed analytics - Files: `services/api_gateway/routes/portfolio.py` (lines 90-92, 170, 172) - Impact: Dashboard displays incorrect P&L, drawdown, and hold duration metrics; client decisions based on false data - Fix approach: Implement cumulative P&L tracking from first portfolio snapshot, compute max drawdown from historical snapshots, calculate average hold duration from trade outcomes table, implement Redis trading pause flag integration **Missing Portfolio Sync Credentials Handling:** - Issue: Portfolio sync task silently disables when Alpaca credentials are missing, with no alerting mechanism - Files: `services/api_gateway/tasks/portfolio_sync.py` (lines 125-129) - Impact: Dashboard shows stale or zero portfolio data without user awareness; critical data source failure goes unnoticed - Fix approach: Add explicit warning log at startup, implement health check endpoint that reports sync status, consider sending alerts to dashboard via WebSocket **Broad Exception Handling:** - Issue: Multiple services catch bare `Exception` without specific error classification - Files: `services/sentiment_analyzer/main.py` (lines 150, 207, 226), `services/trade_executor/main.py` (lines 136, 203, 222), `services/signal_generator/main.py` (lines 87, 178, 194, 257), `services/learning_engine/main.py` (lines 304), `services/market_data/main.py` (lines 108, 158, 253), `services/api_gateway/tasks/portfolio_sync.py` (lines 152) - Impact: Errors silently logged without retry logic, potential data loss, operational issues invisible to monitoring - Fix approach: Implement specific exception handling (IntegrityError, APIError, ValidationError, etc.), add telemetry counters for error categories, implement exponential backoff for transient failures **Implicit Database Persistence Fallback:** - Issue: Multiple services (sentiment-analyzer, trade-executor) silently continue if DB initialization fails - Files: `services/sentiment_analyzer/main.py` (lines 203-208), `services/trade_executor/main.py` (lines 199-204), `services/signal_generator/main.py` (initial DB setup) - Impact: Services run without persistence layer — no audit trail, no dashboard data, potential duplicate processing if service restarts - Fix approach: Fail fast on DB connection errors, or implement persistent message queue (file-based) as fallback to ensure no trades are lost **Learning Engine Position State In Redis Only:** - Issue: Opening trades stored in Redis `positions:history` hash with no persistence — loss of intermediate state if service crashes - Files: `services/learning_engine/main.py` (lines 65-82, 121-138) - Impact: Position close detection fails after service restart; partial trades with no close detected = unrealistic P&L - Fix approach: Move position history to database with persistent transaction log, implement consistent read-modify-write with transactions --- ## Known Bugs **Portfolio Sync N+1 Query Pattern:** - Symptoms: One SELECT per position during portfolio sync; if 100 positions, 100 database queries per cycle - Files: `services/api_gateway/tasks/portfolio_sync.py` (lines 74-92) - Trigger: Any portfolio with multiple open positions - Workaround: None; will degrade performance as position count grows - Fix: Use `SELECT * WHERE ticker IN (...)` to fetch all positions in one query, then update in-memory dict **Daily P&L Calculation Bug:** - Symptoms: Portfolio endpoint returns same value for `daily_pnl_pct` and `total_pnl_pct` - Files: `services/api_gateway/routes/portfolio.py` (lines 90-91) - Trigger: Every portfolio request - Root cause: Code reuses `daily_pnl_pct` for `total_pnl_pct` instead of computing cumulative metric - Workaround: Manually inspect database portfolio_snapshots for correct metrics - Fix: Compute cumulative P&L from first snapshot timestamp to now **Position Unrealized P&L Division by Zero:** - Symptoms: Potential division by zero when `p.qty == 0` (closed position) - Files: `services/api_gateway/routes/portfolio.py` (lines 116-117, 120-121) - Trigger: When position is exactly 0 shares (edge case in reconciliation) - Workaround: Guards in place (`if p.qty`) but logic is error-prone - Fix: Refactor to use `if p.qty and p.avg_entry` guard early, simplify arithmetic **Signal Sentiment Context Missing Price Field:** - Symptoms: Risk manager reads `sentiment_context["current_price"]` but signal may not include it - Files: `services/trade_executor/risk_manager.py` (lines 118-120) - Trigger: If signal is generated without market snapshot or price is omitted - Workaround: Falls back to 0.0, position sizing becomes incorrect - Fix: Ensure signal generator always includes current_price in sentiment_context, add assertion in executor --- ## Security Considerations **JWT Secret Key Not Enforced in Development:** - Risk: Default development deployments may use weak or missing JWT secret keys - Files: `services/api_gateway/config.py` (lines 13-15), `services/api_gateway/auth/jwt.py` (lines 41, 96) - Current mitigation: Config requires `jwt_secret_key` but does not validate length/entropy - Recommendations: - Add validation in ApiGatewayConfig that secret is >= 32 bytes - Fail startup if secret is not set (don't use default) - Add warning log if secret is weak (< 256 bits) - Document minimum secret requirements in README **Test Keys Insufficient for Production:** - Risk: Test suites use hardcoded short keys that would be cryptographically weak in production - Files: Multiple test files use test keys like `"test-secret-key"` - Current mitigation: Only in test environment - Recommendations: - Use pytest fixture to generate strong keys per test - Document that tests should not be deployed as-is - Consider environment variable override in CI **CORS Configuration Hardcoded for Localhost:** - Risk: Default CORS allows only `http://localhost:5173`, but production deployment requires environment override - Files: `services/api_gateway/config.py` (line 26) - Current mitigation: CORS is at least restricted (not `*`), but easy to miss in production deploy - Recommendations: - Make CORS_ORIGINS a required config variable (no default) - Add validation that origins use HTTPS in production - Log warning if origin is localhost in production environment **Passkey Registration Not Rate-Limited:** - Risk: WebAuthn registration endpoint (if present) has no rate limiting, allowing brute-force attacks - Files: Not fully reviewed; check `services/api_gateway/auth/routes.py` - Current mitigation: Unknown - Recommendations: - Add rate limiting (redis-backed) on registration endpoints - Implement CAPTCHA or email verification for new accounts - Log failed registration attempts for monitoring **No HTTPS Enforcement in API Gateway:** - Risk: JWT tokens transmitted over plaintext HTTP in development config - Files: `services/api_gateway/config.py` (line 31 uses `http://localhost:5173`) - Current mitigation: Only affects localhost development - Recommendations: - Add `enforce_https` config flag; reject insecure origins in production - Set `secure` flag on JWT cookies - Use HSTS headers in FastAPI --- ## Performance Bottlenecks **Sentiment Scores Stored in Unbounded Deque:** - Problem: Signal generator stores up to 50 sentiment scores per ticker in memory (never evicted from running process) - Files: `services/signal_generator/main.py` (line 35, lines 107-111) - Cause: deques are bounded but not persisted; high-frequency tickers accumulate memory - Current: 50 * ~5000 tickers = ~250K floats in memory during high volume - Improvement path: - Store sentiment history in database (timeseries table) instead of memory - Query last N scores on-demand per signal generation - Add TTL/expiration for old scores **Market Data Manager Holds Full OHLCV History:** - Problem: `MarketDataManager` buffers all bars received for each ticker in memory - Files: `services/signal_generator/main.py` (referenced from main) - Cause: No bounded retention; running service accumulates bars indefinitely - Impact: Memory leak over weeks of operation - Improvement path: - Implement sliding window (keep last 100 bars only) - Persist bars to database for strategy backtesting - Use TimescaleDB hypertable compression for efficient historical queries **Portfolio Sync Deletes and Re-inserts All Positions:** - Problem: Every sync cycle deletes all local positions and re-inserts from broker - Files: `services/api_gateway/tasks/portfolio_sync.py` (lines 94-101) - Cause: Simplistic approach; doesn't diff changes - Impact: Every 60s delete cascade on Position table (triggers cascade on related trades?) - Improvement path: - Implement diff-based upsert: only update changed tickers - Delete positions not in broker snapshot - Reduces churn, avoids potential foreign key cascade issues **No Connection Pooling Tuning:** - Problem: AsyncPG default pool size may be insufficient for 7 services + API load - Files: `shared/db.py` (line 13-16) - Cause: `create_async_engine` uses default pool_size - Impact: Connection starvation under load, query timeouts - Improvement path: - Add `pool_size` and `max_overflow` config to BaseConfig - Monitor pool utilization (add telemetry) - Right-size pools per environment **Alpaca API Calls Not Cached:** - Problem: Every trade executor and portfolio sync makes synchronous Alpaca API calls for account/positions - Files: `services/trade_executor/main.py` (line 74), `services/api_gateway/tasks/portfolio_sync.py` (line 64) - Cause: No request deduplication or caching layer - Impact: Slow order submission, API rate limit risks - Improvement path: - Cache account snapshot in Redis (TTL 5s) - Batch position queries where possible - Implement exponential backoff for API failures --- ## Fragile Areas **Sentiment Analyzer Ollama Fallback Untested:** - Files: `services/sentiment_analyzer/analyzers/ollama_analyzer.py`, `services/sentiment_analyzer/main.py` (lines 71-79) - Why fragile: Fallback path (Ollama) rarely exercised if FinBERT always has high confidence - Safe modification: Add integration test that mocks FinBERT confidence < threshold, verifies Ollama called - Test coverage: No explicit test for fallback path - Impact: If FinBERT fails completely, Ollama fallback could silently degrade or crash **Signal Generator Ensemble Evaluation:** - Files: `services/signal_generator/main.py` (line 149), `services/signal_generator/ensemble.py` - Why fragile: Three strategies weighted by Redis weights; if weights become NaN or negative, ensemble breaks silently - Safe modification: Add assertions that weights are valid (sum~1, all in [0,1]), implement weight validation on load - Test coverage: No explicit test for weight anomalies (e.g., all weights corrupted) - Impact: Invalid weights could produce signals with undefined strength **Risk Manager Market Hours Check Time Zone:** - Files: `services/trade_executor/risk_manager.py` (lines 19-25, 59-61) - Why fragile: Hard-coded ET timezone; DST transitions not handled explicitly - Safe modification: Use ZoneInfo with explicit DST handling (Python 3.9+), add tests for DST boundaries - Test coverage: No explicit DST test - Impact: Orders placed 1 hour off during DST transition **Learning Engine Position Matching Logic:** - Files: `services/learning_engine/main.py` (lines 85-98, 121-138) - Why fragile: String comparison of side enum (`opening_side == OrderSide.BUY.value`) - Safe modification: Use enum comparison directly, add type hints, add unit tests for all side combinations - Test coverage: Tests exist but may not cover edge cases (e.g., SELL -> BUY -> SELL) - Impact: Position close detection fails, P&L not recorded, weights never adjusted **WebAuthn Credential Validation:** - Files: `services/api_gateway/auth/routes.py` (check WebAuthn implementation) - Why fragile: WebAuthn implementation may not validate all required fields (origin, user verification, etc.) - Safe modification: Review against OWASP WebAuthn checklist, add explicit assertion tests - Test coverage: Limited test coverage for security properties - Impact: Weak authentication allows account takeover --- ## Scaling Limits **Redis Single Node No Persistence:** - Current capacity: In-memory Redis instance; loses all data on restart - Limit: All stream data (news, signals, trades) lost if Redis crashes - Blocks: Reliable message delivery, audit trails, service recovery - Scaling path: - Enable AOF (append-only file) persistence - Consider Redis Cluster for HA - Alternatively, add database-backed message queue (e.g., TimescaleDB FIFO table) **PostgreSQL Shared Memory Not Tuned:** - Current capacity: Default shared_buffers, work_mem, etc. - Limit: Query performance degrades with 7 concurrent services - Scaling path: - Tune PostgreSQL config: shared_buffers, work_mem, effective_cache_size per machine - Add read replicas for reporting queries - Partition portfolio snapshots by date (TimescaleDB hypertables) **Dashboard WebSocket No Backpressure:** - Current capacity: Single WebSocket connection broadcasts all events - Limit: Slow clients blocked, queue unbounded memory growth - Scaling path: - Implement client subscription topics (only send relevant tickers) - Add message batching (send updates every 100ms) - Implement client-side connection pooling **No Circuit Breaker for Alpaca API:** - Current capacity: Alpaca API calls block entire trade executor on timeout - Limit: One slow Alpaca response stalls all signal processing - Scaling path: - Implement circuit breaker pattern (fail-open after N failures) - Add request timeout with retry logic - Queue unprocessable signals for retry after recovery --- ## Dependencies at Risk **Ollama Local LLM Not Containerized with Stack:** - Risk: Ollama runs separately; must be manually started for sentiment analyzer to work - Files: `docker-compose.yml` (check if Ollama service present) - Impact: Sentiment analyzer fails silently if Ollama not running; fallback depends on FinBERT - Migration plan: Add Ollama to docker-compose.yml, or make Ollama truly optional with better error handling **FinBERT Model Download Not Cached:** - Risk: FinBERT downloads model from HuggingFace on first run (~600MB) - Files: `services/sentiment_analyzer/analyzers/finbert.py` - Impact: Long startup time, may timeout in resource-constrained environments - Migration plan: Pre-download model into Docker image, add model cache invalidation strategy **Alpaca SDK Direct HTTP Calls Not Mocked in Integration Tests:** - Risk: Integration tests mock Alpaca SDK, so real behavior never tested - Files: `tests/services/test_trade_executor.py` (check mocks), `tests/integration/` - Impact: Order submission bugs discovered only in production - Migration plan: Implement live integration tests against Alpaca paper trading, verify in CI/CD **SQLAlchemy 2.0 Async Mode Requires Specific Dialect:** - Risk: AsyncPG optional dependency; falls back to sync driver if missing - Files: `shared/db.py` (line 13 uses `postgresql+asyncpg://`) - Impact: If asyncpg not installed, services run in sync mode, defeating async benefits - Migration plan: Add explicit check in `create_db()` to fail if asyncpg missing --- ## Missing Critical Features **No Duplicate Trade Detection:** - Problem: If signal generator processes same article twice (due to Redis consumer group lag), executor may place duplicate trades - Blocks: Multi-service reliability, audit accuracy - Scope: Implement idempotency key in TradeExecution, check signal_id before placing order **No Circuit Break / Trading Pause Mechanism:** - Problem: If models degrade, bot continues trading losses - Blocks: Risk mitigation, manual emergency stop - Current: `TRADING_PAUSED_KEY` in Redis (`services/api_gateway/routes/controls.py`), but not enforced in executor - Scope: Check executor consumes trading pause flag before submitting orders **No Backtester Trade Slippage Model:** - Problem: Backtester assumes fills at exact signal price; unrealistic - Blocks: Accurate backtest results, risk estimation - Scope: Add configurable slippage model (fixed basis points + volume-based) **No Strategy Parameter Sensitivity Analysis:** - Problem: Can't understand which parameters drive P&L - Blocks: Scientific strategy optimization, risk understanding - Scope: Implement parameter sweep in backtester, record sensitivity metrics **No Live Trade Reconciliation:** - Problem: If Alpaca and local DB diverge, no auto-correction - Blocks: Accurate position tracking, tax reporting - Scope: Implement daily reconciliation job that compares broker vs. DB, alerts on mismatch --- ## Test Coverage Gaps **Dashboard Components Not Unit Tested:** - What's not tested: React components (portfolio display, trade table, charts, alerts) - Files: `dashboard/src/components/`, `dashboard/src/pages/` - Risk: Component bugs, broken UI layouts only caught in E2E (if at all) - Priority: High — dashboard is user-facing; regressions impact usability **API Routes Happy Path Only:** - What's not tested: Edge cases (missing fields, invalid enum values, concurrent requests) - Files: `tests/services/test_api_routes.py` (13 tests covering basic CRUD) - Risk: API crashes on edge inputs, poor error messages - Priority: Medium — most users use dashboard, not direct API **Market Data Manager Not Unit Tested:** - What's not tested: Deque overflow behavior, snapshot generation with missing bars - Files: `services/signal_generator/market_data.py` (no test file) - Risk: Ensemble generates signals with incomplete data - Priority: High — directly impacts signal quality **Learning Engine Weight Adjustment Edge Cases:** - What's not tested: Weight convergence, NaN/inf propagation, simultaneous rewards for same strategy - Files: `services/learning_engine/weight_adjuster.py` (test coverage unknown) - Risk: Weights diverge to extremes, strategies disabled unintentionally - Priority: High — learning is core feature **WebAuthn Registration and Verification:** - What's not tested: Invalid registration data, replay attacks, malformed challenge responses - Files: `services/api_gateway/auth/routes.py` (WebAuthn handlers) - Risk: Authentication bypass, account takeover - Priority: Critical — security sensitive **Redis Streams Consumer Group Edge Cases:** - What's not tested: Consumer group recovery after crash, message redelivery, offset management - Files: `shared/redis_streams.py` (5 tests total) - Risk: Messages lost or duplicated after service restart - Priority: High — data reliability depends on it **Concurrent Signal and Trade Execution:** - What's not tested: Race conditions when multiple signals processed simultaneously - Files: Integration tests mock this with single thread - Risk: Phantom positions, incorrect risk calculations - Priority: High — real production has concurrent load --- ## Recommendations for Production Readiness **Before going live, address at minimum:** 1. **Critical (blocks production):** - Implement portfolio metrics computation (cumulative P&L, max drawdown) - Fix portfolio sync query performance (N+1 issue) - Validate JWT secret key entropy at startup - Add rate limiting to auth endpoints - Implement circuit breaker for Alpaca API calls - Add duplicate trade detection (idempotency) 2. **High (should address soon):** - Implement trade reconciliation job - Move position state from Redis to database - Add specific exception handling (not bare Exception) - Implement backpressure in WebSocket broadcasting - Test Ollama fallback path end-to-end - Add comprehensive dashboard unit tests 3. **Medium (can defer):** - Optimize portfolio sync with diff-based approach - Persist market data and sentiment to TimescaleDB - Tune PostgreSQL connection pool - Add DST test coverage for market hours - Implement parameter sensitivity analysis in backtester --- *Concerns audit: 2026-02-23*