Parallel execution of tasks and waiting for the result

This is my first time using Temporal. I would like to ask what best practices exist for my use case.

I’m writing an application that, when a new user is added, needs to contact a third-party service and fetch the IDs of all the user’s products. There could be thousands of them. Then, for each product, another query needs to be made to get its characteristics. These queries need to run in parallel.

I have worked out the structure; it turns out I will have FetchUserWorkflow, FetchProductsWorkflow, and FetchPropertiesWorkflow. Each workflow will have one activity with retrieve (to query the API) and store (to save to the database) methods. I mention this to make it clear that child workflows are what I need here, not just activities.
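
To make the structure clearer, here is roughly how I picture each activity with the PHP SDK (just a sketch; the interface name and signatures are placeholders):

use Temporal\Activity\ActivityInterface;
use Temporal\Activity\ActivityMethod;

#[ActivityInterface]
interface FetchProductsActivityInterface
{
    // Query the third-party API and return the product IDs for the user.
    #[ActivityMethod]
    public function retrieve(string $userId): array;

    // Save the fetched data to our database.
    #[ActivityMethod]
    public function store(array $products): void;
}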

FetchUserWorkflow will be the parent for the other two. Pseudocode:

FetchUserWorkflow:
    $productIds = await executeChildWorkflow(FetchProductsWorkflow)

    foreach ($productIds as $productId):
        $promises[] = executeChildWorkflow(FetchPropertiesWorkflow, $productId)

    $result = await Promise.all($promises)
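
In actual PHP SDK terms I imagine it roughly like this (only a sketch; the child workflow names come from my design above, and the arguments are placeholders):

use Temporal\Promise;
use Temporal\Workflow;
use Temporal\Workflow\WorkflowInterface;
use Temporal\Workflow\WorkflowMethod;

#[WorkflowInterface]
class FetchUserWorkflow
{
    #[WorkflowMethod]
    public function handle(string $userId)
    {
        // Child workflow that fetches and stores all product IDs of the user.
        $productIds = yield Workflow::executeChildWorkflow(
            'FetchProductsWorkflow',
            [$userId]
        );

        // Fan out one child workflow per product and wait for all of them.
        $promises = [];
        foreach ($productIds as $productId) {
            $promises[] = Workflow::executeChildWorkflow(
                'FetchPropertiesWorkflow',
                [$productId]
            );
        }

        return yield Promise::all($promises);
    }
}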

As far as I understand, this code will send tens of thousands of tasks to Temporal at once, but they will be executed according to the number of workers set in the configuration. Is this normal practice?

The issue is that a user may have many items: thousands, tens of thousands, maybe even hundreds of thousands. In any case, I don’t want to be limited by the resources of the process in which the workflow executes. And collecting the results of that many workflows into a single variable may turn out to be too resource-intensive; the real problem is their number, which I can’t control.

With that in mind, I would like to know what the correct approach is when I need to process thousands of tasks in parallel and wait for their completion in the parent workflow.

As far as I understand, this code will send tens of thousands of tasks to Temporal at once, but they will be executed according to the number of workers set in the configuration. Is this normal practice?

Yes, it is. Tasks will be queued until a free worker is available; however, you have to make sure that your ScheduleToClose timeout allows for that waiting time, or leave it unset.
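
For example (illustrative only; the activity interface name here is hypothetical), in the PHP SDK you could give the activity stub just a per-attempt StartToClose timeout and leave ScheduleToClose unset:

use Carbon\CarbonInterval;
use Temporal\Activity\ActivityOptions;
use Temporal\Workflow;

// Only a per-attempt timeout; no ScheduleToCloseTimeout, so a task can sit
// in the queue waiting for a free worker slot without being timed out.
$activity = Workflow::newActivityStub(
    FetchPropertiesActivityInterface::class,
    ActivityOptions::new()
        ->withStartToCloseTimeout(CarbonInterval::minutes(5))
);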

The issue is that a user may have many items: thousands, tens of thousands, maybe even hundreds of thousands. In any case, I don’t want to be limited by the resources of the process in which the workflow executes. And collecting the results of that many workflows into a single variable may turn out to be too resource-intensive; the real problem is their number, which I can’t control.

If possible, I would recommend operating not on individual IDs but on batches of IDs (if the external API allows that). In that case, you can store the list of IDs in an external DB and pass a reference to it.

The general idea is to avoid carrying massive amounts of data inside the workflow and to carry references to that data instead.
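
As a rough sketch of the idea (the activity method and the child workflow name here are made up): have an activity store the fetched IDs in your DB under a batch key, and pass only that key to the child workflow:

use Temporal\Workflow;

// ... inside a workflow method, where $activities is an activity stub ...

// Hypothetical activity: queries the external API, saves the product IDs
// to our DB, and returns only a small batch key (e.g. a UUID).
$batchId = yield $activities->storeProductIdsBatch($userId);

// The child workflow receives the reference, not the data itself; its own
// activities load the IDs from the DB when they need them.
yield Workflow::executeChildWorkflow('ProcessProductsBatchWorkflow', [$batchId]);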

Could you please explain in more detail what it means to use batch IDs? I’m using the PHP SDK, if that matters. I didn’t find any mention of batches in its documentation.

Check out these Java samples for implementing batch jobs (sorry, there are no PHP ones for this):

Got it, the same things can be implemented in PHP.

Still, there is one more question that I have not fully understood and that keeps bothering me. Can I ask it in this thread?

If I still use the standard approach (not batches) but break all the IDs into chunks of, say, 100 items, I end up with roughly this code:

$offset = 0;

while (true) {
  // Fetch the next chunk of 100 product IDs.
  $page_100_productIds = yield $activity->fetchWithLimitAndOffset(100, $offset);

  if (!$page_100_productIds) {
    break;
  }

  // Start the whole chunk in parallel (no yield inside the loop),
  // then wait for the chunk to complete before fetching the next page.
  $promises = [];
  foreach ($page_100_productIds as $productId) {
    $promises[] = $activity->process($productId);
  }

  yield Promise::all($promises);

  $offset += 100;
}

Do I understand correctly that if, for example, there are a million IDs and the workflow crashes somewhere near the end, around the 999-thousandth one, it will restart and be replayed from the beginning? That is, the history of all the chunks is stored somewhere, and from the very first chunk up to the moment of the crash the activity calls will go through the proxy, which will return the results of the old executions? So this loop will run just under 10,000 times to restore its state to the point of the crash, and then continue executing as before?

That is correct. Every activity that completed before the workflow restart will be replayed from the history, i.e. its recorded result is returned without executing the activity again.

Bear in mind the Temporal limits as well, in particular the limit on the event history size of a single workflow execution.