Signalling system - Human driven workflows

I might be totally off on this, but the current Signal* system in Temporal works in a fire-and-forget way.
I may also be misunderstanding when to use signals versus asynchronously completed activities.

I’ll lay out a generic scenario below:

  1. The user submits an email address in a form -> an OTP is generated and a workflow execution is started.
  2. The user is sent an SMS with the OTP.
    This can fail. If it fails internally, Temporal will retry. :white_check_mark:
    If the send succeeds but the user never receives the SMS (say the operator dropped it), we want the user to be able to request it again. :x:
  3. First signal: the user requests a new OTP. With help from Maxim I was able to figure this out. This signal can be fire-and-forget.
  4. The user receives the OTP and enters it.
  5. Second signal: the user submits the OTP. When we receive this signal, the next activity verifies the OTP. There’s no point in retrying if verification fails, because it’s a user error. But even if we detect the failure, we can’t get hold of the activity’s result and send it back to the user.
    We want to get the result of the activity and, if it failed, tell the user the OTP was wrong, then let them request a new OTP (first signal) and enter the OTP again (second signal).

This use case also won’t work with an asynchronously completed activity.

What you need is a synchronous update: similar to a query, but with the ability to mutate the workflow state.

We plan to add such a feature, but in the meantime there are a couple of workarounds to achieve what you want.

Signal then Query

The simplest approach is to signal the workflow and then query it periodically until the query indicates that the request has succeeded or failed. Temporal guarantees read-after-write consistency for a signal followed by a query.
The main drawback of this approach is the additional latency it introduces, as the query polling interval cannot be very low.

Use Local Activity to Notify

A slightly more complex approach is to use a local activity to unblock the caller.
Here is the sequence of steps to implement this:

  • The process that initiates the request needs to expose an endpoint for completing the request.
  • The synchronous request first sends a signal to the workflow. The signal arguments include a unique requestId as well as the host and port of the client endpoint. The request then creates a channel (or a CompletableFuture in the case of the Java SDK) and inserts it into a map using requestId as the key. The requesting thread then blocks on a receive from this channel (or on Future.get).
  • Upon receiving the signal, the workflow performs all the necessary activities and state transitions. To reply, the workflow invokes a "deliver result" local activity. This activity calls the complete-request endpoint at the given host and port, passing requestId as one of the arguments.
  • Because the host and port are specific to the process, the request handler is executed in the same process that sent the original signal. The handler looks up in the map the channel (or CompletableFuture) that the original requesting thread is blocked on and completes it with the request result, which it received as another activity argument.
  • The original thread is unblocked by receiving the request result and can continue its execution.

Just curious: is this (using a local activity to deliver the result) how out-of-the-box synchronous signals (with reply) will be implemented in a future version of Temporal?

It is not 100% designed yet. I believe that the most useful API is a synchronous long-poll request to the Temporal service. The open question is the reconnection behavior if the caller process crashes. Some of the options are:

  1. The caller has to make another request.
  2. The caller can reconnect by request ID. The problem with this approach is deciding when to remove the response, as it is not clear whether it was consumed. And keeping it around for the lifetime of a workflow is expensive.
  3. Updates cannot block the workflow code. In this case the response is delivered through a long-poll query (not implemented yet). The benefit over option 2 is that the response lifecycle is managed by the workflow implementor, but it complicates the life of the workflow developer.
  4. Updates are delivered asynchronously through some activity.

We’ll post the design proposal once we get to this.