Replayability and batch processing

Two of the major advantages of pub/sub systems are:

  • Batch processing: You can consume events/messages in batches and process them in batches. This lets you write batch-style queries, which can give large performance improvements.
  • Replayability: You can replay the events if a deploy turns out to be bad.

How do you implement these two use cases in Temporal?


Batch processing is a very broad term. There are dozens of systems that use Temporal for batch processing, but the exact pattern depends on the use case. What exactly is your use case? Are you running multiple workflows where some activity would benefit from batching its requests? Or something else?

Temporal supports a reset feature: the state of all workflows is automatically rolled back to a point before the bad build, and all the events are reapplied. Note that this is superior to what you can achieve with queues. While you can replay messages from a queue, that doesn't help if the bad build corrupted the state of a DB. With Temporal, the state is rolled back automatically as well.


Let me give a simple example of batch processing. We do payment processing and we want to settle transactions. The use case for a queue would be to identify the transactions that are ready for settlement and enqueue them. When they are dequeued by the settlement workflow, we don't want to settle one transaction at a time, as that would take too many queries to our DB. We may want to dequeue 100 ids and do something like:

UPDATE transactions SET state = 'settled' WHERE id IN (<list_of_100_ids>);

How do you propose we handle such cases?

You can use an asynchronous activity implementation. The activity adds the request to a local buffer and returns from the activity function without completing the activity. A separate thread then executes the batch update against the DB and completes the buffered activities once the update succeeds.
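To make the buffering idea concrete, here is a minimal Python sketch using only the standard library. It is not Temporal SDK code: `BatchBuffer`, `batch_update`, and the `complete` callback are hypothetical names chosen for illustration. In a real worker, `complete` would stand in for whatever asynchronous activity completion call your SDK provides, and `batch_update` would run the single UPDATE against the DB.

```python
import threading
from typing import Callable, List, Tuple


class BatchBuffer:
    """Collects pending requests and processes them with one batch call.

    Hypothetical helper for illustration; not part of any Temporal SDK.
    """

    def __init__(self, batch_update: Callable[[List[int]], None], max_size: int = 100):
        self._batch_update = batch_update  # assumed to issue one batched DB statement
        self._max_size = max_size
        self._lock = threading.Lock()
        self._pending: List[Tuple[int, Callable[[], None]]] = []

    def add(self, txn_id: int, complete: Callable[[], None]) -> None:
        """Called from the activity: buffer the request and return right away."""
        with self._lock:
            self._pending.append((txn_id, complete))
            full = len(self._pending) >= self._max_size
        if full:
            self.flush()

    def flush(self) -> None:
        """Called when the buffer fills up, or periodically by a background thread."""
        with self._lock:
            # Take ownership of the current batch so new requests can keep arriving.
            batch, self._pending = self._pending, []
        if not batch:
            return
        # One round trip for the whole batch instead of one per transaction.
        self._batch_update([txn_id for txn_id, _ in batch])
        for _, complete in batch:
            # Stand-in for completing the corresponding activity asynchronously.
            complete()
```

The sketch leaves out error handling: if `batch_update` raises, the buffered requests are dropped here, whereas a real implementation would need to fail the corresponding activities so that they get retried.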

Note that Temporal uses queues internally. An activity invocation goes through a task queue that an activity worker process listens on. So the benefits of queues, such as flow control, rate limiting, and batching, are not lost.

But because Temporal is a workflow engine, it adds a lot of features on top:

  • Unlimited activity execution time
  • Unlimited heartbeating
  • Unlimited exponential retries
  • Activity cancellation
  • Routing of activities to specific processes