Cross contamination between batched work triggered using Async.function()

Adam_Young · October 2, 2024, 2:34pm

We have a workflow that is built to accept a large group of tasks, divide them into batches of a specific size and then process each batch in parallel. We are currently achieving this with something like:

// Workflow Method
override fun processTasks(tasks: List<Task>): List<Result> {
    val results = tasks
        .chunked(batchSize)
        .map { batch -> function(::processBatch, batch) }
        .flatMap { promise -> promise.get() }
    return results
}

private fun processBatch(tasks: List<Task>): List<Result> {
    // Translate tasks to job inputs
    val jobInputs = tasks.map { activities.getJobInput(task) }
    
    // Run job in another system
    val job = activities.triggerJob(jobInputs)

    // Wait for job completion
    val results = activities.waitForJobCompletion(job)

    return results
}

This works perfectly 99% of the time. However, sometimes we encounter a situation where there seems to be cross contamination between the batches. We have logging that indicates that sometimes the job inputs that are generated by one batch are sometimes passed to the external job trigger by another batch. And then when the job completes the workflow errors out because the expected tasks weren’t processed in the job that the batch was tracking but instead were processed in one of the other jobs that a different batch was tracking.

Important points:

These activity functions are well tested in other workflows and have never shown to have any issues except in this specific workflow where they are running in parallel.
There always seems to be a replay of the workflow during processing of the batches when we encounter the failure. Maybe related, maybe not. We know this because one of our loggers involved in the process isn’t using a workflow logger and is repeating some log messages it had printed previously.

My questions are:

Should we be calling anything but an activity method inside Async.function()? Are we abusing that API by calling a standard Kotlin function that in turn invokes multiple activities?
Would this be better off running each batch in a small child workflow?

maxim · October 2, 2024, 4:20pm

Are functions like chunked, map, and flatMap deterministic?

Adam_Young · October 2, 2024, 5:01pm

Well I’m glad you asked me that and made me think it through again. I originally thought they were and my above example is in fact deterministic. However, that example isn’t 100% accurate to our production code, the difference being that the tasks are passed to the workflow method as a set instead of a list. And the chunked function is not deterministic in that case since a set doesn’t guarantee iteration order.

So in the case that the chunked function may produce different batches on a replay, how does that interact with the Async.function() call. Doesn’t that preserve the parameters it was originally invoked with upon a replay?

maxim · October 2, 2024, 5:10pm

Async.function is deterministic.

awwx · October 3, 2024, 3:33am

No, Async.function is itself deterministic, but it doesn’t take a non-deterministic input and turn it into a deterministic output.

Adam_Young · October 3, 2024, 5:11pm

Yeah, I had a fundamental misunderstanding of what Async.function was doing. I thought it preserved inputs/outputs in the workflow history kind of like an activity would. Thank you both for your help, we reviewed our workflow and think it’s the fact we are supplying a Set as input to the workflow is causing our problem.

Topic		Replies	Views
Parallell processing should be activities or workers? Community Support	2	481	August 24, 2023
Producer/Consumer in a Workflow Community Support java-sdk	1	78	October 9, 2024
Organizing flow in UI Community Support child-workflow , workflow-options , typescript-sdk , web-ui , workflow-implementat	3	343	July 13, 2023
Async.function control flow in command queue pattern Community Support java-sdk , general-impl	9	86	April 29, 2025
Perform Batch API calls using workflows Community Support java-sdk	7	905	July 24, 2023

Cross contamination between batched work triggered using Async.function()

Related topics