Parallel processing: should it be activities or workers?

Hey guys,

Sorry if my question is a bit silly; I only have about 3 months of experience with Temporal, so I am still new to it. That said, right now I have a table with 50k records, and each row needs some data processing that can take a while. So I decided to start a workflow on Temporal for each row to process it. But there seems to be some spill-over of data: data that was being used in workflow-ID-A is sometimes passed into workflow-ID-B.

So I am a bit lost as to what I am doing wrong. Should I have created one workflow (since this is a cron job) and just created an activity for each row?

Any help would be highly appreciated,
Thanks

Hi @Vitor_Goncalves ,

There are no silly questions.

It seems you could implement this by processing the rows in batches.

You can have a workflow with one activity that polls the first x rows (let's say 100), and then, for each row, creates an activity or a child workflow to process it (which one depends on whether processing a row involves one or several actions/activities).
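A minimal plain-Java sketch of one batch run as described above, assuming an in-memory list stands in for the 50k-row table. It deliberately leaves out the Temporal SDK: `Row`, `fetchRows`, `processRow` and `processBatch` are hypothetical names, and in a real workflow `fetchRows` and `processRow` would be activities called through an activity stub (or `processRow` a child workflow).

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of one batch run, with no Temporal SDK involved.
// fetchRows and processRow are hypothetical stand-ins for the
// "poll rows" activity and the per-row activity/child workflow.
class BatchSketch {

    // Hypothetical row type: each row carries only its own data.
    record Row(int id, String payload) {}

    // Stand-in for the activity that polls `limit` rows starting at `offset`.
    static List<Row> fetchRows(List<Row> table, int offset, int limit) {
        int from = Math.min(offset, table.size());
        int to = Math.min(offset + limit, table.size());
        return new ArrayList<>(table.subList(from, to));
    }

    // Stand-in for the per-row activity. It only reads its argument,
    // so each row is processed independently of the others.
    static String processRow(Row row) {
        return row.id() + ":" + row.payload().toUpperCase();
    }

    // One workflow run: fetch a batch, then process each row in turn.
    static List<String> processBatch(List<Row> table, int offset, int limit) {
        List<String> results = new ArrayList<>();
        for (Row row : fetchRows(table, offset, limit)) {
            results.add(processRow(row));
        }
        return results;
    }

    public static void main(String[] args) {
        List<Row> table = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            table.add(new Row(i, "row-" + i));
        }
        List<String> firstBatch = processBatch(table, 0, 100);
        System.out.println(firstBatch.size() + " rows, first = " + firstBatch.get(0));
    }
}
```

Running `main` prints `100 rows, first = 0:ROW-0`. The rows of a batch are handled one after another here to keep the sketch simple; starting the per-row activities in parallel would be the alternative.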

Regarding "But there seems to be some spill-over of data: data that was being used in workflow-ID-A is sometimes passed into workflow-ID-B":
Processing the rows sequentially will let you pass any data from the previous row to the next processor (activity/child workflow).

After processing the first batch, you can invoke continueAsNew, passing the offset to the next workflow run, which will use that offset in its first activity to poll the next x rows, and so on.
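A plain-Java sketch of how the offset could be carried from one run to the next, with no Temporal SDK involved: each call to `run` stands in for one workflow run, and returning the next offset stands in for calling continueAsNew with it. `pollRows`, `run`, `processAll` and `BATCH_SIZE` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

// Plain-Java sketch of the continueAsNew/offset pattern (no Temporal SDK).
class OffsetChainSketch {

    static final int BATCH_SIZE = 100;

    // Hypothetical stand-in for the first activity: poll up to BATCH_SIZE
    // row ids starting at `offset` from a table with `totalRows` rows.
    static List<Integer> pollRows(int totalRows, int offset) {
        int to = Math.min(offset + BATCH_SIZE, totalRows);
        return IntStream.range(Math.min(offset, to), to).boxed().toList();
    }

    // One workflow run. Returns the offset for the next run, or -1 when the
    // table is exhausted (where a real workflow would complete instead of
    // calling continueAsNew).
    static int run(int totalRows, int offset, List<Integer> processed) {
        List<Integer> rows = pollRows(totalRows, offset);
        processed.addAll(rows); // stand-in for the per-row activities/child workflows
        return rows.size() < BATCH_SIZE ? -1 : offset + rows.size();
    }

    // Chains runs until done and returns how many runs it took.
    static int processAll(int totalRows, List<Integer> processed) {
        int runs = 0;
        int offset = 0;
        while (offset >= 0) {
            offset = run(totalRows, offset, processed);
            runs++;
        }
        return runs;
    }

    public static void main(String[] args) {
        List<Integer> processed = new ArrayList<>();
        int runs = processAll(250, processed);
        System.out.println(processed.size() + " rows in " + runs + " runs");
    }
}
```

With 250 rows and a batch size of 100, `main` prints `250 rows in 3 runs`. Stopping when a batch comes back short keeps each run's history small, which is the reason for using continueAsNew instead of looping over all rows in a single run.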

We have an example in Java with different approaches: https://github.com/temporalio/samples-java/tree/main/core/src/main/java/io/temporal/samples/batch

Let me know if it helps,
Antonio

Hey @antonio.perez ,

Thank you very much for your help. I will change it as you suggested.

Thanks,
Vitor