Collecting results for bulk operations

I have a case where several separate workflows all need to do a similar activity (interacting with an external API). This API supports bulk requests, and I would like to collect the requests from several workflows and then send a single request to the API. What is the best way to do this? I’ve tried using a ChildWorkflow, but that does not seem quite right.

Here is a drawing that hopefully makes it more clear.

What is the request rate? The number of requests per batch and number of batches per second?

This is just an example of a use case that we need to support, where we are batching work together into groups. The bulk request might be to an API (happens quickly, seconds or less), or it might be waiting for human action (takes minutes or hours).

Ideally the collector would wait for N requests, or X amount of time before triggering the bulk request, and it would be able to collect a batch for another bulk request while waiting.

The design depends on the rate and batch size. So N/X doesn’t really help :).

Haha, fair enough. Can you help me understand why, or suggest a few options that I should consider?

Let’s say N = 10 and X = 30 minutes, and let’s assume it is human work, so the bulk request takes 1 hour to complete.

For such a low rate, the best design is to have a separate batching workflow. It receives requests from the other workflows in the form of signals, then, based on a timer or the batch size, executes an activity with the batch update. It then notifies all the original senders through signals that the batch has completed.
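To make the flush rule concrete, here is a minimal plain-Go sketch of the "N requests or X time, whichever comes first" logic. It is only a model and does not use the Temporal SDK: the channel stands in for the signal channel, `Done` stands in for the completion signal sent back to each sender, and `flush` stands in for the batch activity. The names `Request`, `Collect`, and `flush` are all made up for illustration.

```go
package main

import "time"

// Request is a hypothetical unit of work from one sender. Done stands in for
// the completion signal and should be buffered (capacity 1) so that the
// collector never blocks on a slow sender.
type Request struct {
	Payload string
	Done    chan string
}

// Collect buffers requests until maxSize are waiting or maxWait has passed
// since the first request of the current batch. It then hands the batch to
// flush and reports the result to every sender. It returns when in is closed.
func Collect(in <-chan Request, maxSize int, maxWait time.Duration, flush func([]string) string) {
	var batch []Request
	var deadline <-chan time.Time // nil while the batch is empty, so it never fires

	dispatch := func() {
		b := batch
		batch, deadline = nil, nil
		// Run the bulk call in the background so that the next batch can
		// start collecting while this one is still in flight.
		go func() {
			payloads := make([]string, len(b))
			for i, r := range b {
				payloads[i] = r.Payload
			}
			result := flush(payloads) // one bulk call for the whole batch
			for _, r := range b {
				r.Done <- result // notify each original sender
			}
		}()
	}

	for {
		select {
		case r, ok := <-in:
			if !ok {
				if len(batch) > 0 {
					dispatch()
				}
				return
			}
			if len(batch) == 0 {
				deadline = time.After(maxWait) // the timer starts with the first request
			}
			batch = append(batch, r)
			if len(batch) >= maxSize {
				dispatch()
			}
		case <-deadline:
			dispatch()
		}
	}
}
```

In a real batching workflow the same shape would presumably be built from the SDK’s signal channel, selector, and timer rather than goroutines and `time.After`, since workflow code has to stay deterministic.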

This design wouldn’t work with N = 100k, for example :).

Ok that makes sense.

What is the best way to trigger the batch workflow? ExecuteChildWorkflow?

It should always be alive. I would start it when your worker starts and give it a unique ID like “batch” to ensure that only one instance is running. It should also call continue-as-new periodically so that its history size is reset to zero.

That makes sense.

Let’s make it a bit more complicated. Let’s say I want to batch based on some criteria (such as a company ID) so that each batch is for a specific company ID. Would a batch workflow for each company ID be the best way to go?

Yes, absolutely. Use the company ID as the batch workflow ID in this case.

Ok next challenge. I’m working on implementing a POC of this, but I’ve run into an interesting problem.

I don’t necessarily know the company ID ahead of time. Since there is no API for workflow.SignalWithStart, I thought I might be able to trigger the batch workflow using workflow.ExecuteChildWorkflow, but this has two problems: 1) the batch workflow is now a child of whichever workflow got there first (not ideal), and 2) I’ve discovered that if I send a signal to the workflow before it has started (without specifying the runID), the signal appears to be lost.

I’m not sure if that 2nd problem is a bug or if it is expected behavior.

Here is a sample of the POC I have created, based on samples-go/expense: “POC for batching work together” (Pull Request #1 in jmoseley/samples-go on GitHub).

Is it possible to use SignalWithStart from within a workflow?

The workaround is to use SignalWithStart from an activity.
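For anyone wondering why signal-with-start avoids the lost-signal problem, here is a small in-memory model in plain Go. It is not the Temporal client: `Registry`, `BatchWorkflowID`, and the `"batch-"` prefix are all hypothetical, and the behaviour of the plain `Signal` method is just this model’s assumption. The point it tries to show is that starting the target if needed and delivering the signal happen in one atomic step, so it does not matter which sender gets there first.

```go
package main

import "sync"

// Registry is a hypothetical stand-in for the workflow service. It maps a
// workflow ID to the signals delivered to that workflow so far.
type Registry struct {
	mu      sync.Mutex
	inboxes map[string][]string
}

func NewRegistry() *Registry {
	return &Registry{inboxes: make(map[string][]string)}
}

// BatchWorkflowID derives one batch workflow ID per company (assumed scheme).
func BatchWorkflowID(companyID string) string {
	return "batch-" + companyID
}

// Signal delivers msg only if the target is already running. In this model
// it reports false and drops the message otherwise.
func (r *Registry) Signal(id, msg string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, running := r.inboxes[id]; !running {
		return false
	}
	r.inboxes[id] = append(r.inboxes[id], msg)
	return true
}

// SignalWithStart starts the target if it is not running and delivers msg,
// all under one lock, so there is no window in which the message can be lost.
func (r *Registry) SignalWithStart(id, msg string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.inboxes[id] = append(r.inboxes[id], msg) // appending to a missing key also "starts" it
}

// Started reports how many distinct workflows have been started.
func (r *Registry) Started() int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return len(r.inboxes)
}

// Inbox returns a copy of the signals delivered to the given workflow.
func (r *Registry) Inbox(id string) []string {
	r.mu.Lock()
	defer r.mu.Unlock()
	return append([]string(nil), r.inboxes[id]...)
}
```

With the real SDK, the equivalent call would be made on the client from inside an activity, as suggested above, using the company-derived ID as the workflow ID.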

Oh that makes sense, I’ll give that a try.

That worked great, thanks for the help.

For anyone coming here later, I updated that branch with the working implementation.

Small nit: we recommend implementing activities as methods on a struct. This lets you pass dependencies to the activities when the struct is initialized, before it is registered with a worker.

See greetings example.
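As a rough illustration of the struct-based style, here is a minimal sketch. The `BulkAPI` interface, the `Activities` struct, the `SendBatch` method, and `fakeAPI` are all hypothetical names, not taken from the POC branch. The idea is that the dependency is set once when the struct is built, and every activity method can then use it.

```go
package main

import (
	"context"
	"errors"
)

// BulkAPI is a hypothetical interface for the external bulk endpoint.
type BulkAPI interface {
	SendBulk(ctx context.Context, companyID string, items []string) (int, error)
}

// Activities groups the activity methods with the dependencies they share.
type Activities struct {
	API BulkAPI
}

// SendBatch is the activity body: it validates the batch and forwards it to
// the injected API client, returning how many items were accepted.
func (a *Activities) SendBatch(ctx context.Context, companyID string, items []string) (int, error) {
	if len(items) == 0 {
		return 0, errors.New("empty batch")
	}
	return a.API.SendBulk(ctx, companyID, items)
}

// fakeAPI is a test double that counts calls and accepts every item.
type fakeAPI struct {
	calls int
}

func (f *fakeAPI) SendBulk(_ context.Context, _ string, items []string) (int, error) {
	f.calls++
	return len(items), nil
}
```

With the Go SDK, a pointer to such a struct (for example `&Activities{API: realClient}`, where `realClient` is whatever client you construct) can be passed to the worker’s `RegisterActivity`, and a `fakeAPI` can be swapped in for unit tests.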