Use Temporal to replace the transactional outbox pattern

I’m new to Temporal, and there is a use case I have in mind that I want to check with the community to see if it’s a fit.
I currently have application logic that writes events into Postgres, and a separate process that reads from a logical replication slot and pushes those events to an event bus. In short, it’s a CDC process / transactional outbox pattern. This setup gives us sub-500 ms (p95) latency per event, more than 10k events per second sent to the event bus, and at-least-once event delivery to the bus (avoiding the common pitfall of dual writes to the DB and the bus).
I’m wondering if I can use a workflow to simplify this setup while achieving the same or better non-functional characteristics?
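
For concreteness, the write side of that pattern looks roughly like this (a sketch, assuming node-postgres and hypothetical orders/outbox tables; the CDC reader that drains the slot is a separate process):

import { Pool } from 'pg';

const pool = new Pool(); // connection settings come from the environment

export async function updateWithOutbox(orderId: string, event: object): Promise<void> {
  const client = await pool.connect();
  try {
    // The business update and the outbox row commit atomically, so there is
    // no dual-write window; the CDC reader ships outbox rows to the bus.
    await client.query('BEGIN');
    await client.query('UPDATE orders SET status = $1 WHERE id = $2', ['shipped', orderId]);
    await client.query('INSERT INTO outbox (payload) VALUES ($1)', [JSON.stringify(event)]);
    await client.query('COMMIT');
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  } finally {
    client.release();
  }
}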

Hi Shine, welcome to the forum! I assume writing the event to Postgres happens in the same transaction as another DB update? Using workflows is a much simpler alternative to the transactional outbox; you just do:

import { proxyLocalActivities } from '@temporalio/workflow';

const { dbUpdateLocalActivity, publishEventLocalActivity } =
  proxyLocalActivities({ startToCloseTimeout: '5 seconds' });

export async function myWorkflow(info: unknown): Promise<void> {
  await dbUpdateLocalActivity(info);
  await publishEventLocalActivity(info);
}
  • Latency: Assuming Postgres and the event bus are nearby, it should normally complete within 500 ms.

  • At-least-once delivery: Yes

  • Throughput: 10k events/s is definitely possible on Temporal Cloud (we’ve tested up to a million state transitions per second), and I think you could reach it with a self-hosted Temporal Cluster if it’s tuned well, along with a large enough Cassandra cluster.

Some more efficient approaches would be to start a workflow per batch of events, or to use a long-running activity that heartbeats its progress so it can recover from a crash (see the sketch below). There are also ways to parallelize processing of the replication log by partitioning events on their ID.
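
For the long-running-activity variant, a minimal sketch (readSlotFrom and publishBatch are assumed helpers, not real APIs):

import { activityInfo, heartbeat } from '@temporalio/activity';

type Batch = { events: unknown[]; lastLsn: string };
// Assumed helpers, not part of any SDK:
declare function readSlotFrom(lsn?: string): AsyncIterable<Batch>;
declare function publishBatch(batch: Batch): Promise<void>;

export async function drainReplicationSlot(): Promise<void> {
  // On retry, resume from the position recorded by the last heartbeat.
  const resumeFrom: string | undefined = activityInfo().heartbeatDetails;
  for await (const batch of readSlotFrom(resumeFrom)) {
    await publishBatch(batch);
    // Record progress server-side so the next attempt can pick up here.
    heartbeat(batch.lastLsn);
  }
}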

Eager workflow start also helps reduce load and latency, but that’s currently only available with the latest server and the Java SDK.

It’s not quite that simple, is it? dbUpdateLocalActivity now needs to be idempotent: reporting its result to Temporal can fail, and Temporal will retry the activity even though the local database transaction has already succeeded. Moreover, the retried execution has to return without an error so that the workflow state correctly reflects the result, even though the operation itself cannot be executed twice (e.g. in the case of an insert).

Correct, it should be idempotent and not throw on subsequent executions.
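
For example, a sketch of an idempotent version of the insert activity, assuming node-postgres and an events table with a unique id column (names are hypothetical):

import { Pool } from 'pg';

const pool = new Pool();

export async function dbUpdateLocalActivity(info: { id: string; payload: object }): Promise<void> {
  // A retried execution hits the unique constraint and becomes a no-op,
  // so the activity returns success instead of throwing.
  await pool.query(
    'INSERT INTO events (id, payload) VALUES ($1, $2) ON CONFLICT (id) DO NOTHING',
    [info.id, JSON.stringify(info.payload)],
  );
}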

For the non-Temporal version to be correct, I imagine it needs the same property. If I fetch an event from Kafka, do a DB transaction with the state update plus the outbox message, and the process dies before the next step (committing the offset to mark the Kafka event as processed), then another process will pick the same event up from Kafka and attempt the same DB transaction, so that transaction has to be idempotent too.
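
Roughly, with kafkajs and manual offset commits (handleInTransaction stands in for that state-update-plus-outbox transaction):

import { Kafka } from 'kafkajs';

// Stands in for the state update + outbox insert transaction (assumed).
declare function handleInTransaction(message: unknown): Promise<void>;

const kafka = new Kafka({ brokers: ['localhost:9092'] });
const consumer = kafka.consumer({ groupId: 'outbox-writer' });

export async function run(): Promise<void> {
  await consumer.connect();
  await consumer.subscribe({ topics: ['commands'] });
  await consumer.run({
    autoCommit: false,
    eachMessage: async ({ topic, partition, message }) => {
      await handleInTransaction(message); // must be idempotent: may run twice
      // Commit only after the transaction succeeds; a crash before this line
      // means the message is redelivered and reprocessed.
      await consumer.commitOffsets([
        { topic, partition, offset: (Number(message.offset) + 1).toString() },
      ]);
    },
  });
}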

In our use case it’s a single atomic operation initiated by a client call to an API, so idempotency wasn’t necessary: no partial success was expected (though I suppose the response could time out even when the transaction succeeds).

We need to initiate an asynchronous child workflow if that operation succeeds. Our current plan is to make the initial operation a synchronous workflow with one activity that includes an idempotency check using an idempotency key. If it succeeds, the API starts a child workflow for the asynchronous part of the process and the user receives a success message; if it fails, the user receives a failure response. Ideally the user would get a success response whenever the database transaction succeeds, even if the workflow state in Temporal is not updated: we wouldn’t want both Temporal and the user to retry the operation.
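
A sketch of that shape (activity and workflow names are hypothetical; the Abandon parent close policy lets the child outlive the synchronous parent):

import { proxyLocalActivities, startChild, ParentClosePolicy } from '@temporalio/workflow';

const { applyDbTransaction } = proxyLocalActivities({ startToCloseTimeout: '5 seconds' });

export async function asyncFollowUpWorkflow(req: unknown): Promise<void> {
  // ... the asynchronous part of the process (assumed)
}

export async function apiRequestWorkflow(req: { idempotencyKey: string }): Promise<void> {
  // The activity performs the idempotency-key check and the DB transaction.
  await applyDbTransaction(req);
  // Fire off the async part and return, so the API can respond to the user.
  await startChild(asyncFollowUpWorkflow, {
    args: [req],
    parentClosePolicy: ParentClosePolicy.PARENT_CLOSE_POLICY_ABANDON,
  });
}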

Is this a case where the Workflow Id can serve as the idempotency key, with reuse policy Reject Duplicate or Allow Duplicate Failed Only? The Id would either be provided by the client or be deterministically generated from the input, so that if the client sends the request again, a duplicate Workflow won’t be created.
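
Something like this, sketched with the TypeScript client (the task queue name and key derivation are assumptions):

import { Client, Connection } from '@temporalio/client';
import { WorkflowIdReusePolicy } from '@temporalio/common';

export async function startOnce(requestKey: string, payload: unknown): Promise<void> {
  const connection = await Connection.connect(); // local server by default
  const client = new Client({ connection });
  await client.workflow.start('apiRequestWorkflow', {
    taskQueue: 'api-requests',
    // Deriving the Workflow Id from the client-supplied key makes a retried
    // request a duplicate start, which this reuse policy rejects.
    workflowId: `request-${requestKey}`,
    workflowIdReusePolicy: WorkflowIdReusePolicy.WORKFLOW_ID_REUSE_POLICY_REJECT_DUPLICATE,
    args: [payload],
  });
}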