Parallel execution of tasks and waiting for the result

This is my first time using Temporal. I would like to ask what best practices exist for my use case.

I’m writing an application that, when a new user is added, needs to contact a third-party service and fetch the IDs of all the user’s products. There could be thousands of them. Then, for each product, another query needs to be made to get its characteristics. These queries need to run in parallel.

I have worked out the structure; it turns out I will have FetchUserWorkflow, FetchProductsWorkflow, and FetchPropertiesWorkflow. Each workflow will have one activity with retrieve (to query the API) and store (to save to the database) methods. I mention this to make it clear that child workflows are what I need here, not just activities.
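
To make the structure clearer, here is roughly how I picture each activity with the PHP SDK (just a sketch; the interface name and signatures are placeholders):

use Temporal\Activity\ActivityInterface;
use Temporal\Activity\ActivityMethod;

#[ActivityInterface]
interface FetchProductsActivityInterface
{
    // Query the third-party API and return the product IDs for the user.
    #[ActivityMethod]
    public function retrieve(string $userId): array;

    // Save the fetched data to our database.
    #[ActivityMethod]
    public function store(array $products): void;
}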

FetchUserWorkflow will be the parent for the other two. Pseudocode:

FetchUserWorkflow:
    $productIds = await executeChildWorkflow(FetchProductsWorkflow)

    foreach ($productIds as $productId):
        $promises[] = executeChildWorkflow(FetchPropertiesWorkflow, $productId)

    $result = await Promise.all($promises)
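
In actual PHP SDK terms I imagine it roughly like this (only a sketch; the child workflow names come from my design above, and the arguments are placeholders):

use Temporal\Promise;
use Temporal\Workflow;
use Temporal\Workflow\WorkflowInterface;
use Temporal\Workflow\WorkflowMethod;

#[WorkflowInterface]
class FetchUserWorkflow
{
    #[WorkflowMethod]
    public function handle(string $userId)
    {
        // Child workflow that fetches and stores all product IDs of the user.
        $productIds = yield Workflow::executeChildWorkflow(
            'FetchProductsWorkflow',
            [$userId]
        );

        // Fan out one child workflow per product and wait for all of them.
        $promises = [];
        foreach ($productIds as $productId) {
            $promises[] = Workflow::executeChildWorkflow(
                'FetchPropertiesWorkflow',
                [$productId]
            );
        }

        return yield Promise::all($promises);
    }
}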

As far as I understand, this code will send tens of thousands of tasks to Temporal at once, but they will be executed according to the number of workers set in the configuration. Is this normal practice?

The issue is that a user may have many items: thousands, tens of thousands, maybe even hundreds of thousands. In any case, I don’t want to be limited by the resources of the process in which the workflow executes. And collecting the results of that many workflows into a single variable may turn out to be too resource-intensive; the real problem is their number, which I can’t control.

With that in mind, I would like to know what the correct approach is when I need to process thousands of tasks in parallel and wait for their completion in the parent workflow.

As far as I understand, this code will send tens of thousands of tasks to Temporal at once, but they will be executed according to the number of workers set in the configuration. Is this normal practice?

Yes, it is. Tasks will be queued until a free worker is available; however, you have to make sure that your ScheduleToClose timeout allows for that waiting time, or leave it unset.
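
For example (illustrative only; the activity interface name here is hypothetical), in the PHP SDK you could give the activity stub just a per-attempt StartToClose timeout and leave ScheduleToClose unset:

use Carbon\CarbonInterval;
use Temporal\Activity\ActivityOptions;
use Temporal\Workflow;

// Only a per-attempt timeout; no ScheduleToCloseTimeout, so a task can sit
// in the queue waiting for a free worker slot without being timed out.
$activity = Workflow::newActivityStub(
    FetchPropertiesActivityInterface::class,
    ActivityOptions::new()
        ->withStartToCloseTimeout(CarbonInterval::minutes(5))
);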

The issue is that a user may have many items: thousands, tens of thousands, maybe even hundreds of thousands. In any case, I don’t want to be limited by the resources of the process in which the workflow executes. And collecting the results of that many workflows into a single variable may turn out to be too resource-intensive; the real problem is their number, which I can’t control.

If possible, I would recommend operating not on individual IDs but on batches of IDs (if the external API allows that). In that case, you can store the list of IDs in an external DB and pass a reference to it.

The general idea is to avoid carrying massive amounts of data inside the workflow and to carry references to that data instead.
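
As a rough sketch of the idea (the activity method and the child workflow name here are made up): have an activity store the fetched IDs in your DB under a batch key, and pass only that key to the child workflow:

use Temporal\Workflow;

// ... inside a workflow method, where $activities is an activity stub ...

// Hypothetical activity: queries the external API, saves the product IDs
// to our DB, and returns only a small batch key (e.g. a UUID).
$batchId = yield $activities->storeProductIdsBatch($userId);

// The child workflow receives the reference, not the data itself; its own
// activities load the IDs from the DB when they need them.
yield Workflow::executeChildWorkflow('ProcessProductsBatchWorkflow', [$batchId]);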

Could you please explain in more detail what it means to use batch IDs? I’m using the PHP SDK, if that matters. I didn’t find any mention of batches in its documentation.

Check out these Java samples for implementing batch jobs (sorry, there are no PHP ones for this):

Got it, the same things can be implemented in PHP.

Still, there is one more question that I have not fully understood and that keeps bothering me. Can I ask it in this thread?

If I still use the standard approach (not batches) but break all the IDs into chunks of, say, 100 items, I end up with roughly this code:

$offset = 0;

while (true) {
  // Fetch the next chunk of 100 product IDs.
  $page_100_productIds = yield $activity->fetchWithLimitAndOffset(100, $offset);

  if (!$page_100_productIds) {
    break;
  }

  // Start the whole chunk in parallel (no yield inside the loop),
  // then wait for the chunk to complete before fetching the next page.
  $promises = [];
  foreach ($page_100_productIds as $productId) {
    $promises[] = $activity->process($productId);
  }

  yield Promise::all($promises);

  $offset += 100;
}

Do I understand correctly that if, for example, there are a million IDs and the workflow crashes somewhere near the end, around the 999-thousandth one, it will restart and be replayed from the beginning? That is, the history of all the chunks is stored somewhere, and from the very first chunk up to the moment of the crash the activity calls will go through the proxy, which will return the results of the old executions? So this loop will run just under 10,000 times to restore its state to the point of the crash, and then continue executing as before?

That is correct. Every activity that completed before the workflow restart will be replayed from the history, i.e. its recorded result is returned without executing the activity again.

Bear in mind the Temporal limits as well, in particular the limit on the event history size of a single workflow execution.