Each WhatsApp session is represented by a long-running workflow that starts an async activity which establishes and keeps a WebSocket connection to WhatsApp alive indefinitely.
We’re using throw new CompleteAsyncError() so that the activity never completes on its own, and we rely on periodic heartbeats to keep it alive. The workflow handles reconnections, QR updates, new messages received, and state changes via signals, and we use continueAsNew to avoid history limits.
However, we’re running into issues:
Activities sometimes get marked as “completed” unexpectedly, killing the socket.
Activities hit heartbeat timeouts.
The design feels fragile and expensive to scale (each session means an open workflow, an async activity, and actions).
Is this architecture recommended? What are the risks and best practices for managing long-lived WebSocket connections like this with Temporal?
Is there any way to get notifications from WhatsApp without keeping a connection per session? The fragility comes from the stateful connections, not from Temporal.
You are right: the fragility comes mainly from the fact that WhatsApp Web is fundamentally stateful and connection-oriented.
Unfortunately, there’s no official push mechanism from WhatsApp that allows us to receive events without maintaining a live connection per session. Each session (phone number) requires a dedicated WebSocket connection with WhatsApp to:
We’re leaning toward moving the socket outside Temporal (to a long-lived worker), and using Temporal only to orchestrate the lifecycle of each session. That seems more robust and scalable.
But if you or the team have seen other architectures or ideas to avoid keeping live WebSockets per session, we’d love to hear about them!
I think moving the WebSocket out of an activity is going to make managing the connections more complicated. For example, how do you know when to reconnect if the process that hosts the WebSocket is restarted?
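
To make that concern concrete, here is a rough TypeScript sketch of the liveness bookkeeping an external socket host would probably have to provide itself, which is roughly what an activity heartbeat timeout already does. Everything here (LeaseRegistry, renew, release, expired, the 30-second TTL) is a hypothetical name or value for illustration only; none of it is part of Temporal or of any WhatsApp library.

```typescript
// Hypothetical lease registry, not a Temporal API.
// The process that owns a WebSocket renews a lease for its session.
// If the lease is not renewed within ttlMs, the session is assumed to
// have lost its connection (for example because the host restarted).
class LeaseRegistry {
  private readonly ttlMs: number;
  private readonly lastSeen = new Map<string, number>();

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Called periodically by the socket host while the connection is open.
  renew(sessionId: string, nowMs: number): void {
    this.lastSeen.set(sessionId, nowMs);
  }

  // Called when a session is closed on purpose, so it is not reconnected.
  release(sessionId: string): void {
    this.lastSeen.delete(sessionId);
  }

  // Returns the sessions whose lease is older than ttlMs at time nowMs.
  expired(nowMs: number): string[] {
    const result: string[] = [];
    this.lastSeen.forEach((seenAt, sessionId) => {
      if (nowMs - seenAt > this.ttlMs) {
        result.push(sessionId);
      }
    });
    return result;
  }
}
```

Something would still have to call expired() on a schedule and trigger the reconnect, and the registry itself would have to survive restarts. With the socket inside an activity, the heartbeat timeout and the activity retry cover both of those.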