Building a sync API to do a distributed transaction

Hi folks, I am building a sync API that needs to write to multiple external services transactionally. I am trying to figure out whether Temporal is a good fit for this and, if so, the best way to use it.

My API needs to do the following synchronously:

  1. Write to the database
  2. Write to service A
  3. Respond to the client

One way we are thinking of solving this is by building a cleanup Temporal workflow, i.e. the workflow has a single step: wait for 5 minutes and, if no signal is received, clean up steps 1 & 2.

So the steps in the API would be:

  1. Start a cleanup workflow that goes into a wait state (the steps below are not part of the workflow)
  2. Write to the database
  3. Write to service A
  4. Signal the cleanup workflow to close out, since all write actions are complete
  5. Respond to the client

If the cleanup workflow does not receive the signal (step 4) within 5 minutes, it goes ahead and reverts any changes made to the database and service A.
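For concreteness, here is a minimal sketch of such a cleanup workflow using the Temporal Go SDK. The signal name, the parameter, and the RevertDatabaseWrite / RevertServiceAWrite compensation activities are assumptions for illustration, not our actual code:

```go
package cleanup

import (
	"context"
	"time"

	"go.temporal.io/sdk/workflow"
)

// Hypothetical compensation activities; real implementations would undo the writes.
func RevertDatabaseWrite(ctx context.Context, resourceID string) error { return nil }
func RevertServiceAWrite(ctx context.Context, resourceID string) error { return nil }

// CleanupWorkflow waits up to 5 minutes for a "writes-complete" signal.
// If the signal never arrives, it assumes the API call died midway and
// runs compensations for steps 1 & 2.
func CleanupWorkflow(ctx workflow.Context, resourceID string) error {
	signalCh := workflow.GetSignalChannel(ctx, "writes-complete")
	timerCtx, cancelTimer := workflow.WithCancel(ctx)
	timer := workflow.NewTimer(timerCtx, 5*time.Minute)

	completed := false
	selector := workflow.NewSelector(ctx)
	selector.AddReceive(signalCh, func(c workflow.ReceiveChannel, more bool) {
		c.Receive(ctx, nil)
		completed = true
		cancelTimer()
	})
	selector.AddFuture(timer, func(f workflow.Future) {})
	selector.Select(ctx)

	if completed {
		return nil // both writes finished; nothing to clean up
	}

	// Timed out: revert whatever the API may have written.
	ao := workflow.ActivityOptions{StartToCloseTimeout: time.Minute}
	ctx = workflow.WithActivityOptions(ctx, ao)
	if err := workflow.ExecuteActivity(ctx, RevertServiceAWrite, resourceID).Get(ctx, nil); err != nil {
		return err
	}
	return workflow.ExecuteActivity(ctx, RevertDatabaseWrite, resourceID).Get(ctx, nil)
}
```

The API handler would start this workflow in step 1 with client.ExecuteWorkflow, and in step 4 call client.SignalWorkflow with the same "writes-complete" signal name so the cleanup workflow exits without compensating.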

Would love to get your thoughts on whether there is a better way to do this distributed transaction using Temporal.

One of the alternatives we looked at is using the Saga pattern with a synchronous workflow start and local activities. But I am not completely sure how state is saved for local activities (i.e. what happens if the host crashes).

While your approach works, I would go with a Saga, which is simpler.

But I am not completely sure how state is saved for local activities (i.e. what happens if the host crashes).

If you execute both updates as parallel local activities and the host crashes, both of them will be retried on a different host.
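For illustration only, a sketch of what that could look like in the Go SDK; the activity names and compensation logic are placeholders:

```go
package saga

import (
	"context"
	"time"

	"go.temporal.io/sdk/workflow"
)

// Hypothetical activities; real implementations would perform/undo the writes.
func WriteToDatabase(ctx context.Context, resourceID string) error     { return nil }
func WriteToServiceA(ctx context.Context, resourceID string) error     { return nil }
func RevertDatabaseWrite(ctx context.Context, resourceID string) error { return nil }
func RevertServiceAWrite(ctx context.Context, resourceID string) error { return nil }

// CreateResourceWorkflow runs both writes as parallel local activities and
// compensates whichever one succeeded if the other ultimately fails.
// If the worker host crashes midway, the workflow task is retried on another
// worker and the local activities are re-executed there.
func CreateResourceWorkflow(ctx workflow.Context, resourceID string) error {
	lao := workflow.LocalActivityOptions{StartToCloseTimeout: 10 * time.Second}
	lctx := workflow.WithLocalActivityOptions(ctx, lao)

	// Start both writes in parallel.
	dbFuture := workflow.ExecuteLocalActivity(lctx, WriteToDatabase, resourceID)
	svcFuture := workflow.ExecuteLocalActivity(lctx, WriteToServiceA, resourceID)

	dbErr := dbFuture.Get(ctx, nil)
	svcErr := svcFuture.Get(ctx, nil)
	if dbErr == nil && svcErr == nil {
		return nil
	}

	// Saga-style compensation: undo the write that did succeed.
	if dbErr == nil {
		_ = workflow.ExecuteLocalActivity(lctx, RevertDatabaseWrite, resourceID).Get(ctx, nil)
	}
	if svcErr == nil {
		_ = workflow.ExecuteLocalActivity(lctx, RevertServiceAWrite, resourceID).Get(ctx, nil)
	}
	if dbErr != nil {
		return dbErr
	}
	return svcErr
}
```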

Hey Maxim!

Let me add a little more detail here and see if it changes anything. We want to do the following in a synchronous API:

  1. Write to the DB
  2. Write to service A
  3. Clean up if #1 or #2 fails (ideally doing this asynchronously)
  4. Continue workflow execution asynchronously (this could also be a call to kick off a different workflow)

If #1 or #2 fails, we would like to return a 500 to the caller immediately.
If #1 and #2 are successful, we would like to return a 200 to the caller. #4 should run asynchronously and shouldn’t affect the result of the API call.

Thoughts?

Thanks!

If you execute both updates as parallel local activities and the host crashes, both of them will be retried on a different host.

We wouldn’t want this to happen.

I see. Then your original approach with the cleanup workflow would be the simplest. What is the workflow start rate you are targeting?

What is the workflow start rate you are targeting?

The start rate should be low, < 1 per second. This API should only be called a few times per customer throughout their lifetime.

Then your original approach with the cleanup workflow would be the simplest.

This API is used to create a resource in our system. Unfortunately, it is not idempotent: it generates a new UUID each time it is called, and that is the ID we will use to start the workflow. The original approach I think works, but it will leak resources if the host dies or if the client times out.

We are considering adding a session_id to the create call that the caller can pass in. We could use that ID as the ID of the workflow we create. That would allow us to look up a workflow that an earlier call created. I think if we can do that, then Sagas will work.

We would still want to run #1 and #2 synchronously and #3 and #4 asynchronously. Is that possible?

We would still want to run #1 and #2 synchronously and #3 and #4 asynchronously. Is that possible?

We plan to add direct support for a “block start call until workflow unblocks it” feature. In the meantime, the simplest workaround is:

  1. Execute the workflow synchronously, waiting for it to complete
  2. The workflow executes #1 & #2
  3. The workflow starts a child workflow in disconnected mode
  4. The workflow completes, unblocking the original thread that started it
  5. The child workflow executes #3 & #4 in the background
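A sketch of steps 2–4 with the Go SDK, where "disconnected mode" maps to a child started with PARENT_CLOSE_POLICY_ABANDON. The activity and workflow names are assumptions:

```go
package syncapi

import (
	"context"
	"time"

	enumspb "go.temporal.io/api/enums/v1"
	"go.temporal.io/sdk/workflow"
)

// Hypothetical stubs for the writes and the follow-up workflow.
func WriteToDatabase(ctx context.Context, resourceID string) error         { return nil }
func WriteToServiceA(ctx context.Context, resourceID string) error         { return nil }
func PostProcessingWorkflow(ctx workflow.Context, resourceID string) error { return nil }

// CreateWorkflow does the synchronous part (#1 & #2) and hands #3 & #4 off to
// a detached child workflow before completing.
func CreateWorkflow(ctx workflow.Context, resourceID string) error {
	ao := workflow.ActivityOptions{StartToCloseTimeout: 30 * time.Second}
	ctx = workflow.WithActivityOptions(ctx, ao)

	// #1 & #2: the writes the caller is waiting on. A failure here fails the
	// workflow, so the blocked ExecuteWorkflow call can translate it into a 500.
	if err := workflow.ExecuteActivity(ctx, WriteToDatabase, resourceID).Get(ctx, nil); err != nil {
		return err
	}
	if err := workflow.ExecuteActivity(ctx, WriteToServiceA, resourceID).Get(ctx, nil); err != nil {
		return err
	}

	// #3 & #4 run in a child that outlives this workflow (ABANDON = "disconnected").
	childCtx := workflow.WithChildOptions(ctx, workflow.ChildWorkflowOptions{
		ParentClosePolicy: enumspb.PARENT_CLOSE_POLICY_ABANDON,
	})
	child := workflow.ExecuteChildWorkflow(childCtx, PostProcessingWorkflow, resourceID)

	// Wait only until the child has started, then complete so the caller gets a 200.
	return child.GetChildWorkflowExecution().Get(ctx, nil)
}
```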

We plan to add direct support for a “block start call until workflow unblocks it” feature.
I think this is what we need!

Another use case, this time an update call: we want to update the database and then do some asynchronous processing, kind of like we could do with a DynamoDB stream.

So the API would do this:
#1 API thread: kick off the workflow. The workflow blocks until signaled.
#2 API thread: write to the database, recording the run-id (not in the workflow).
#3 API thread: signal the workflow (success|fail).
#4 Workflow: check that the run-id matches the running workflow (to ensure the database write was successful, in case we missed a failure signal).
#5 Workflow: continue processing.
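A rough sketch of that flow in the Go SDK, where the signal name, task queue, and activities are assumptions rather than anything prescribed:

```go
package entity

import (
	"context"
	"time"

	"go.temporal.io/sdk/client"
	"go.temporal.io/sdk/workflow"
)

// Hypothetical helpers; real implementations would hit our database and processing.
func writeEntityUpdate(ctx context.Context, entityID, runID string) error  { return nil }
func ReadStoredRunID(ctx context.Context, entityID string) (string, error) { return "", nil }
func ProcessUpdate(ctx context.Context, entityID string) error             { return nil }

// API thread: #1 start the workflow, #2 write to the DB with the run-id, #3 signal.
func HandleUpdate(ctx context.Context, c client.Client, entityID string) error {
	we, err := c.ExecuteWorkflow(ctx, client.StartWorkflowOptions{
		ID:        "entity-" + entityID, // same ID for create/update keeps one workflow per entity
		TaskQueue: "entity-task-queue",
	}, UpdateWorkflow, entityID)
	if err != nil {
		return err
	}

	dbErr := writeEntityUpdate(ctx, entityID, we.GetRunID()) // #2

	// #3: signal success or failure.
	return c.SignalWorkflow(ctx, we.GetID(), we.GetRunID(), "db-write-result", dbErr == nil)
}

// Workflow: block until signaled (#1), verify the run-id (#4), then continue (#5).
func UpdateWorkflow(ctx workflow.Context, entityID string) error {
	var dbWriteOK bool
	workflow.GetSignalChannel(ctx, "db-write-result").Receive(ctx, &dbWriteOK)

	ao := workflow.ActivityOptions{StartToCloseTimeout: 10 * time.Second}
	ctx = workflow.WithActivityOptions(ctx, ao)

	// #4: re-read the run-id recorded in the DB in case a failure signal was missed.
	var storedRunID string
	if err := workflow.ExecuteActivity(ctx, ReadStoredRunID, entityID).Get(ctx, &storedRunID); err != nil {
		return err
	}
	if !dbWriteOK || storedRunID != workflow.GetInfo(ctx).WorkflowExecution.RunID {
		return nil // the DB write for this run never landed; stop here
	}

	return workflow.ExecuteActivity(ctx, ProcessUpdate, entityID).Get(ctx, nil) // #5
}
```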

Another requirement is that we want to ensure only one workflow is running per entity at a time. So our create and update APIs would kick off workflows with the same ID.
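Temporal gives you that guarantee through the workflow ID itself: starting a workflow whose ID is already running is rejected. A sketch building on the snippet above (the ID scheme, reuse policy, and error handling shown are just one possible choice):

```go
package entity

import (
	"context"
	"errors"

	enumspb "go.temporal.io/api/enums/v1"
	"go.temporal.io/api/serviceerror"
	"go.temporal.io/sdk/client"
)

// startEntityWorkflow starts the create/update workflow keyed by the entity ID.
// Temporal rejects the start if a workflow with that ID is still running.
func startEntityWorkflow(ctx context.Context, c client.Client, entityID string) (client.WorkflowRun, error) {
	we, err := c.ExecuteWorkflow(ctx, client.StartWorkflowOptions{
		ID:        "entity-" + entityID,
		TaskQueue: "entity-task-queue",
		// Allow a new run once the previous one has closed.
		WorkflowIDReusePolicy: enumspb.WORKFLOW_ID_REUSE_POLICY_ALLOW_DUPLICATE,
	}, UpdateWorkflow, entityID)

	var alreadyStarted *serviceerror.WorkflowExecutionAlreadyStarted
	if errors.As(err, &alreadyStarted) {
		// Another create/update workflow is still running for this entity.
		return nil, err
	}
	return we, err
}
```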

I think if we had your “block start call until workflow unblocks it” feature we could do:
#1 Start the workflow.
#2 An activity writes to the DB (run this as a local activity).
#3 Signal the workflow to unblock, which would cause the API call to return.
#4 Continue processing the workflow.

Do you have any thoughts / suggestions for us here?

Another option is to notify your frontends from workflows to unblock synchronous requests as described in this post.