Use case: Child workflows that run activities in parallel

Hi,
I’ve been recently trying to build a workflow to create users in bulk. Mainly because this process is long and it wasn’t very resilient. This process has a few steps, let’s call them step A,B & C.

So, for each step, I call a child workflow and run the activities in parallel (in blocks of 200 activities). Something like so:

[Inside Workflow]

    ctx = workflow.WithChildOptions(ctx, chOptions)
    // companyID int64 - fileID UUID - owner string
    future := workflow.ExecuteChildWorkflow(ctx, Step, companyID, fileID, owner)
    err := future.Get(ctx, nil)
    if err != nil {
	  return errors.Wrap(err, "failed to run child workflow")
    }

[Inside the child workflow]

   actOptions := workflow.ActivityOptions{
     StartToCloseTimeout: 10 * time.Minute,
   }

	ctx = workflow.WithActivityOptions(ctx, actOptions)
    // groupSize = 200
	userGroupInstructions := userInstructions.SplitInGroups(groupSize)
	for _, instructions := range userGroupInstructions {
		var futures []workflow.Future
		for _, processInstruction := range instructions {
			fut := workflow.ExecuteActivity(ctx, ActivityA, instructionID)
			futures = append(futures, fut)
		}

		for _, future := range futures {
			err := future.Get(ctx, nil)
			if err != nil {
				fmt.Printf("failed to execute activity: %v\n", err.Error())
			}
		}
	}

This is the plan I currently have implemented.
I apologize in advance if it’s silly, I’m fairly new to this and I’m trying to make it work in the best way possible.
That’s why I’m seeking your advice on this, especially because after the workflow has been running for a while, I start seeing the lookup failed for scheduledEventID to activityID (followed by a panic) error, and some activities are closed with the error WORKFLOW_TASK_FAILED_CAUSE_FORCE_CLOSE_COMMAND.

Any advice is well appreciated, thank you in advance to anyone who read through this.

Make sure that userGroupInstructions and instructions are not maps.

Thanks for the quick reply.
userGroupInstructions is a slice of slices and instructions is a slice
Sorry for not pointing that out.

It looks like nondeterministic execution of the workflow code. Make sure that you don’t do anything outlined as breaking determinism in the relevant section of the documentation.

1 Like

Thank you for sharing that.
I need to double-check those ground rules. I’ll get back to you soon.

@maxim Okay, I found many places in which I was doing the following:

  • Calling a different logger than the one offered by the SDK
  • Using the time library in some activities and DB calls

I adjusted that according to the ground rules described in the docs you shared and I was able to run the workflow to process a file with 1000 users without any issue showing up.
So thank you so much.

Now, before closing this… Do you have any suggestions for the plan described above?

  • Calling a different logger than the one offered by the SDK

Using a different logger produces duplicated log lines but is not expected to break workflow deterministic execution.

  • Using the time library in some activities and DB calls

The restrictions apply only to the workflow code. Activities can use any APIs including time ones without restrictions.

Do you have any suggestions for the plan described above?

The plan looks reasonable. One thing to watch is the size of an individual workflow. For example, it is not going to work if userGroupInstructions slice returns 100k elements.

1 Like

Understood. Thank you again for all the help