Fix real-time task progress by closing WS on pubsub exit and keeping polling active

Three interconnected bugs prevented progress updates from reaching the frontend:

1. _forward_pubsub could exit silently while _handle_client_messages kept the WebSocket alive (responding to pings), so the client never detected the broken forwarding path. Replace asyncio.gather with asyncio.wait (FIRST_COMPLETED) so that when either coroutine exits, the other is cancelled and the WebSocket is closed.
2. Polling was stopped on WS connect, with no fallback if forwarding broke. Polling now always runs alongside the WebSocket as a safety net.
3. Redis publish failures in task_progress_publisher were logged at DEBUG and the broken client was reused forever. Log at WARNING and reset the client so the next call reconnects.
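
The first fix can be illustrated with a minimal sketch. This is not the code from this commit: the two coroutines below are hypothetical stand-ins for _forward_pubsub and _handle_client_messages, and the `events` list exists only to make the ordering visible. It shows the general pattern of waiting with FIRST_COMPLETED and then cancelling whatever is still pending, so one half of the connection cannot outlive the other.

```python
import asyncio

events = []  # illustration only: records what happened, in order


async def forward_pubsub():
    # Hypothetical stand-in: the pub/sub stream ends after a short time.
    await asyncio.sleep(0.01)
    events.append("forwarder exited")


async def handle_client_messages():
    # Hypothetical stand-in: would keep answering pings forever.
    try:
        while True:
            await asyncio.sleep(0.005)
    except asyncio.CancelledError:
        events.append("client handler cancelled")
        raise


async def serve_connection():
    tasks = [
        asyncio.create_task(forward_pubsub()),
        asyncio.create_task(handle_client_messages()),
    ]
    # Return as soon as either side finishes (or raises).
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    # asyncio.wait does not cancel anything itself; do it explicitly.
    for task in pending:
        task.cancel()
    # Wait for the cancellations to finish before tearing down.
    await asyncio.gather(*pending, return_exceptions=True)
    # A real handler would close the WebSocket at this point.
    events.append("websocket closed")
    return len(done), len(pending)
```

With asyncio.gather, the forwarder returning normally would not stop the client handler, which is the silent failure described in item 1.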
parent 8d52bdf99d
commit 791b5a9d55
3 changed files with 362 additions and 19 deletions
@@ -35,8 +35,9 @@ def publish_task_progress(task_id: str, state: str, meta: dict[str, Any]) -> Non
         state: Celery state string (e.g. 'PROGRESS', 'SUCCESS').
         meta: Metadata dict (progress, phase, logs, counters, etc.).
 
-    Failures are caught and logged at DEBUG level so they never break the
-    critical task execution path.
+    Failures are caught and logged at WARNING level so they never break the
+    critical task execution path. The Redis client is reset on failure so
+    subsequent calls can reconnect.
     """
     try:
         client = _get_redis_client()
@@ -47,4 +48,6 @@ def publish_task_progress(task_id: str, state: str, meta: dict[str, Any]) -> Non
         })
         client.publish(f"task_progress:{task_id}", payload)
     except Exception:
-        logger.debug("Failed to publish task progress for %s", task_id, exc_info=True)
+        logger.warning("Failed to publish task progress for %s", task_id, exc_info=True)
+        # Reset client so next call creates a fresh connection
+        _redis_client = None
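
The reset-on-failure pattern from the hunk above can be sketched in isolation. Everything here is an assumption for illustration: `FlakyClient`, `_get_client`, `publish_progress`, the `factory` parameter, and the payload shape are hypothetical and are not taken from task_progress_publisher. One detail worth checking in the real module: rebinding a module-level name inside a function only takes effect if the function declares it `global`, and the hunk does not show whether that declaration exists.

```python
import json
import logging

logger = logging.getLogger(__name__)

_client = None  # module-level cached client, created lazily


class FlakyClient:
    """Hypothetical stand-in for a Redis client; fails while `broken` is True."""

    def __init__(self, broken=False):
        self.broken = broken
        self.sent = []

    def publish(self, channel, payload):
        if self.broken:
            raise ConnectionError("connection lost")
        self.sent.append((channel, payload))


def _get_client(factory):
    # Create the client on first use and cache it for later calls.
    global _client
    if _client is None:
        _client = factory()
    return _client


def publish_progress(task_id, state, meta, factory):
    # Without this declaration, the assignment in the except branch would
    # bind a local variable and the cached client would never be reset.
    global _client
    try:
        client = _get_client(factory)
        # Payload shape is assumed for the sketch.
        payload = json.dumps({"task_id": task_id, "state": state, "meta": meta})
        client.publish(f"task_progress:{task_id}", payload)
        return True
    except Exception:
        logger.warning("Failed to publish task progress for %s", task_id, exc_info=True)
        # Drop the cached client so the next call builds a fresh one.
        _client = None
        return False
```

In this sketch a failed publish costs one lost update, and the next call reconnects instead of reusing the dead client.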