Using Temporal in a synchronous API with a strongly consistent DB operation requirement, and dealing with failures

Hello,
Temporal seems awesome for processing async background jobs/workflows.
But we’re also evaluating Temporal for a synchronous REST endpoint use case.

To simplify the example, we have a REST endpoint that:

1. persists data in a database
    1.a. if step 1 is successful: we queue up events on a queue to be processed by external services
    1.b. if step 1 is unsuccessful: we halt processing, reject the request and throw relevant errors. The user can decide to retry transient issues.

We’re thinking of using Temporal to ensure resiliency (via retries) and consistency (eventual consistency) between the persist op and the queuing. However, we require the DB persist op to be strongly consistent (i.e., the persist op must complete before we return the response to the user, whereas queuing the event can happen “later/eventually”).

Without Temporal: the approach is to persist the data to the DB (in one transaction) and have a background job poll for new changes, queue the corresponding events, and mark those events as processed.
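The “without Temporal” approach is essentially a transactional outbox. Below is a minimal in-memory sketch in plain Java, assuming the business row and a pending-event row are written in the same transaction (modeled here with `synchronized` methods). `OutboxStore`, `persist`, and `drain` are hypothetical names for illustration, not part of any library.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical in-memory model of the "persist + poll" (transactional outbox) approach.
// synchronized stands in for a real DB transaction; a real system would insert the
// business row and the outbox row in one SQL transaction.
class OutboxStore {
    static final class OutboxEntry {
        final String event;
        boolean processed;
        OutboxEntry(String event) { this.event = event; }
    }

    private final Map<String, String> records = new HashMap<>();
    private final List<OutboxEntry> outbox = new ArrayList<>();

    // Write the business record and its pending event atomically.
    synchronized void persist(String id, String data) {
        records.put(id, data);
        outbox.add(new OutboxEntry("created:" + id));
    }

    // Called periodically by the background job: publish each unprocessed event and
    // mark it processed only if the publish call returned without throwing.
    synchronized int drain(Consumer<String> queue) {
        int published = 0;
        for (OutboxEntry e : outbox) {
            if (e.processed) continue;
            try {
                queue.accept(e.event);
                e.processed = true;
                published++;
            } catch (RuntimeException ex) {
                // Leave the entry unprocessed and move on; the next poll retries it.
            }
        }
        return published;
    }

    synchronized long pending() {
        return outbox.stream().filter(e -> !e.processed).count();
    }

    synchronized boolean has(String id) { return records.containsKey(id); }
}
```

Because an entry is marked processed only after the publish call returns, a crash between publishing and marking can re-publish the same event, so downstream consumers should tolerate duplicates.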

With Temporal, we considered two approaches:

Approach 1:
We considered keeping the database persist logic outside the Temporal workflow. We first ensure that the data is persisted to the DB; if the persist op succeeds, we then kick off a workflow to queue the event.
Drawback:
This seems to suffer from a similar problem to the initial case. Starting a workflow requires a network hop to the Temporal server, and that call can fail after the data has been persisted, in which case the event is never queued.
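The failure window described above can be reproduced in a few lines. In this plain-Java sketch, `WorkflowStarter` is a hypothetical stand-in for the Temporal client call that starts a workflow (not a real SDK type), and `Approach1Handler` is likewise hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for "start a workflow via the Temporal client" (not an SDK type).
interface WorkflowStarter {
    void start(String orderId);
}

// Hypothetical handler for Approach 1: persist first, then start the workflow.
class Approach1Handler {
    final Set<String> db = new HashSet<>();
    private final WorkflowStarter starter;

    Approach1Handler(WorkflowStarter starter) { this.starter = starter; }

    // Returns true only if both steps succeeded.
    boolean handle(String orderId) {
        db.add(orderId);              // step 1: data is committed to the DB
        try {
            starter.start(orderId);   // step 2: network hop to the Temporal server
            return true;
        } catch (RuntimeException e) {
            // The row is already committed, but no workflow exists to queue the event.
            return false;
        }
    }
}
```

When `start` throws, the handler is left with a committed row and no event, which is exactly the dual-write gap the outbox polling job was designed to close.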

Approach 2:
Introduce a workflow with two activities:
a. activity1 performs persisting data to db
b. activity2 performs the queuing.
We can use the synchronous/blocking start mechanism to run the workflow as part of handling the REST request.
Questions

  1. What’s the typical latency of a workflow execution?
  2. Is there a way to wait only until a specific activity (i.e., the DB persist activity) has succeeded within the REST request handling? We’re thinking of reading the persist activity’s status, but wouldn’t that also require a network hop that can fail, leaving us unable to tell whether the DB op succeeded by the time we send the REST response? In essence, we wouldn’t know whether to send a success or a failure response.

I feel like some of our challenges stem from requiring a strongly consistent DB persist op within a synchronous REST endpoint, while also handling potential poison messages (i.e., operations that will never succeed), all in real time.
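For the poison-message part, the usual lever is a bounded retry policy that separates transient errors from errors that will never succeed; Temporal activity retry policies expose a maximum number of attempts and non-retryable error types for this purpose. The plain-Java sketch below only illustrates the idea; `BoundedRetry` and `NonRetryableException` are hypothetical names, and backoff between attempts is omitted.

```java
import java.util.function.Supplier;

// Hypothetical marker for failures that retrying can never fix (e.g. invalid input).
class NonRetryableException extends RuntimeException {
    NonRetryableException(String msg) { super(msg); }
}

// Hypothetical helper: retry transient failures up to maxAttempts, fail fast on poison input.
final class BoundedRetry {
    static <T> T run(Supplier<T> op, int maxAttempts) {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.get();
            } catch (NonRetryableException e) {
                throw e; // poison message: give up immediately
            } catch (RuntimeException e) {
                last = e; // transient failure: try again (a real policy would back off here)
            }
        }
        throw new IllegalStateException("gave up after " + maxAttempts + " attempts", last);
    }
}
```

With a cap like this, a request that can never succeed surfaces as an error the endpoint can return, instead of being retried indefinitely while the user waits.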

I’m not sure whether there’s a recommended pattern for our seemingly simple use case.

We are currently working on a synchronous update feature that would directly support your use case. Until then, you can use the following workaround:

  • Execute the workflow synchronously
  • The workflow updates the DB and starts a child workflow in abandoned mode. See this post for details.
  • Complete the workflow
  • The child workflow performs the queueing.
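The control flow of this workaround can be modeled in plain Java; this is not Temporal SDK code. `AbandonedChildSketch` is hypothetical: an executor thread plays the child workflow, and a latch lets the parent wait only until the child has started. In the Temporal Java SDK, the child would instead be started with `ParentClosePolicy.PARENT_CLOSE_POLICY_ABANDON`, and the parent would wait for the child to start (for example, on the promise returned by `Workflow.getWorkflowExecution` for the child stub) before completing.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical plain-Java model of the workaround (not Temporal code).
class AbandonedChildSketch {
    final List<String> db = new CopyOnWriteArrayList<>();
    final List<String> queue = new CopyOnWriteArrayList<>();
    private final ExecutorService children = Executors.newSingleThreadExecutor();

    // Runs synchronously inside the request; returns once the DB is updated
    // and the child has started, without waiting for the child to finish.
    String execute(String orderId) {
        db.add(orderId);                                // update the DB
        CountDownLatch childStarted = new CountDownLatch(1);
        children.submit(() -> {                         // start the "abandoned" child
            childStarted.countDown();
            queue.add("created:" + orderId);            // the child performs the queueing
        });
        try {
            childStarted.await();                       // wait only for the child to start
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "accepted:" + orderId;                   // complete the parent; respond to caller
    }

    void shutdown() {
        children.shutdown();
        try {
            children.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The response depends only on the DB update and the child's start, while the queueing runs to completion on its own; in Temporal, the abandon policy is what keeps the child alive after the parent completes.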

@maxim, what’s the expected ETA for this?

We don’t give public ETAs, but this feature is under active implementation.

The amazing part is that this leads to a design pattern where Temporal can orchestrate the entire flow of an end-user-facing backend system end to end.

I’m talking about synchronous workflows, not just asynchronous ones. This single feature would lead to a revolution in backend system design.

APIs could then be driven entirely by Temporal.

Regarding this, I have a question: if I update the DB successfully but fail to start the child workflow, what should I do next? Once the DB update succeeds, the parent workflow must also succeed, so I can’t return an error when the child workflow fails to start.

The only reason for a child workflow to fail to start is a duplicate child workflow ID. If you don’t override its ID in a way that allows duplicates, it should never fail.

Hi @maxim, would you recommend this pattern for the following situation? It overlaps with the original question, so I’m asking here, but I’m happy to open a new thread; let me know.

Context: We want to synchronously orchestrate multiple calls to various upstream systems (including our database) as part of an API request.

We could execute the workflow method synchronously until completion, but we need to account for:

  • Partial failure cleanup: One approach is Saga-based compensation. This adds delay to the synchronous workflow, so perhaps the Saga compensation should run as an abandoned child workflow.

  • Timeouts: The synchronous workflow execution should have an upper bound of 10 seconds, and we would also need to run compensation logic when a timeout occurs. What is the recommended approach to implementing timeouts in this situation?

One approach could be:

  1. Set the Workflow Run Timeout to 10 seconds
  2. On activity failure, run saga.compensate() in an abandoned child workflow
  3. Catch the timeout exception in the caller of the workflow method and run a separate compensation workflow
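Step 2 relies on Saga-style compensation. Here is a minimal plain-Java sketch of the mechanism, assuming each successful step registers an undo action. `SimpleSaga` is a hypothetical class written for illustration; the Temporal Java SDK ships its own `io.temporal.workflow.Saga`, which this does not reproduce.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical minimal saga: register an undo action after each successful step,
// and on failure run the registered undo actions in reverse (LIFO) order.
final class SimpleSaga {
    private final Deque<Runnable> compensations = new ArrayDeque<>();

    void addCompensation(Runnable undo) { compensations.push(undo); }

    // Runs every registered compensation, most recent first, and keeps going
    // if one of them throws.
    void compensate() {
        while (!compensations.isEmpty()) {
            Runnable undo = compensations.pop();
            try {
                undo.run();
            } catch (RuntimeException e) {
                // A real implementation would record or retry the failed compensation.
            }
        }
    }
}
```

Usage: call `addCompensation` right after each upstream call succeeds, and call `compensate()` from the failure path; in the setup discussed here, that `compensate()` call is what would be handed off to the abandoned child so the synchronous response is not delayed.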

Another approach could be:

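// Race the business logic against a workflow timer; whichever finishes first unblocks
Promise<Void> business = Async.procedure(this::businessLogic);
Promise<Void> timeout = Async.procedure(this::timeoutLogic); // timeoutLogic does a Workflow.sleep
Promise.anyOf(business, timeout).get();
if (!business.isCompleted()) {
  // timed out: run compensation
}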
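The same race can be illustrated with plain JDK futures. This is only an analogy: inside a Temporal workflow the timer must be a workflow timer (`Workflow.sleep` or `Workflow.newTimer`), because JDK timers and thread pools are not deterministic. `RaceWithTimeout` is a hypothetical helper; it returns an empty `Optional` when the deadline wins, which is the point where compensation would be triggered.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical plain-JDK analogy of racing business logic against a timer.
// Not usable inside workflow code, which must rely on workflow timers instead.
final class RaceWithTimeout {
    static <T> Optional<T> runWithin(Supplier<T> businessLogic, long timeoutMillis) {
        CompletableFuture<T> business = CompletableFuture.supplyAsync(businessLogic);
        CompletableFuture<Void> timer = new CompletableFuture<Void>()
                .completeOnTimeout(null, timeoutMillis, TimeUnit.MILLISECONDS);
        // Wait for whichever finishes first; a business failure surfaces as CompletionException.
        CompletableFuture.anyOf(business, timer).join();
        if (business.isDone()) {
            return Optional.ofNullable(business.join()); // business logic won the race
        }
        return Optional.empty(); // deadline reached first: caller should compensate
    }
}
```

The losing business task is not cancelled here; it keeps running on the common pool until it finishes, so any cleanup has to be driven by the caller.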

The proper future solution will be UpdateWithStart, which would unblock the caller once part of the workflow has finished executing. The remainder of the workflow could then execute compensations in the background.

Until that is available, I think using abandoned children is the best approach.

Hi Maxim, approximately when will the synchronous update feature be available?