Signals not received or delayed during load testing

Use case
We have a main workflow and a couple of child workflows. The child workflows signal data to the main workflow via an activity.

During our load testing, we observed that the child workflow’s activity responsible for sending signals was completing successfully, but the MainWorkflow wasn’t receiving these signals. This was intermittent: out of the 100 workflows we created, some got the signal and some didn’t.

Question

  1. What should be scaled to ensure signals are received without delay?

Problem
I believe we made a wrong design decision in using signals for communication between the child workflows and the main workflow. Say there are 5 child workflows. For our application to work properly, it is important that the current child workflow signals the main workflow, and that the main workflow receives the data, before the current child workflow completes and the next child workflow starts. If this doesn’t happen, the next child workflow starts without the data it requires and fails.

Question
Is there a way to ensure that the child workflow waits for the main workflow to receive the signal and successfully store the data?
Or
Is it better to change the design so that the main workflow queries the data from the child workflow after the child workflow completes and before the next one starts?

The child workflows signal data to the main workflow via an activity.

Why via activity and not from the workflow directly?

During our load testing, we observed that the child workflow’s activity responsible for sending signals was completing successfully, but the MainWorkflow wasn’t receiving these signals. This was intermittent: out of the 100 workflows we created, some got the signal and some didn’t.

activity responsible for sending signals was completing successfully. But the MainWorkflow wasn’t receiving these signals.

How do you define “MainWorkflow wasn’t receiving these signals”? If a signal was sent successfully, it is guaranteed to be written to the target workflow’s event history. Could you check the workflow histories for those “lost” signals?

I believe we made a wrong design decision in using signals for communication between the child workflows and the main workflow.

I don’t think it is a wrong decision. The weird part is that you use activities to send signals instead of having a workflow signal the other workflow directly.

For our application to work properly, it is important that the current child workflow signals the main workflow, and that the main workflow receives the data, before the current child workflow completes and the next child workflow starts.

Temporal is a fully consistent system. It is not possible for a child to send a signal to its parent and for the parent to receive the child’s completion before that signal. My guess is that the signal handling logic in your workflow is not correct.

Is there a way to ensure that the child workflow waits for the main workflow to receive the signal and successfully store the data?

Since you are using an activity to send the signal, the signal is guaranteed to be stored by the time that activity returns.

You can also send the signal directly from the workflow code, without the activity, and it is still guaranteed that the signal is delivered before the completion.

Is it better to change the design so that the main workflow queries the data from the child workflow after the child workflow completes and before the next one starts?

Query is certainly not a good fit here.

The flow involves getting the object from the main workflow via a query method, updating it, and then signaling it back to the main workflow. I was told in this question (What is the best way to access and modify object in Parent Workflow from Child Workflow - #4 by maxim) that the query call needs to come from an activity. Since the query and the signal are part of the same flow, we do both in the same activity.

Yes, you are correct. The issue was in the signal logic. My activity is a Spring component, and I had declared the main workflow stub as a class variable. So when multiple workflows were executing at the same time, the stub was being modified by all of them, and signals were being sent to the wrong main workflows.
In some cases, some main workflows were getting 5 signals while others weren’t getting any.
After fixing the bug, it’s working as expected.
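
For anyone hitting the same symptom, here is a minimal sketch of the bug and the fix in plain Java, with no Temporal or Spring dependency. Every name in it (MainWorkflowStub, BuggyActivity, FixedActivity, the workflow IDs) is hypothetical and only stands in for the real stub and activity. The two concurrent calls are replayed as one fixed interleaving so the result is deterministic; it is an illustration of the shared-field problem, not the actual code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SharedStubSketch {

    /** Hypothetical stand-in for a workflow stub: bound to one main workflow ID. */
    static final class MainWorkflowStub {
        private final String workflowId;
        private final Map<String, List<String>> inboxes;

        MainWorkflowStub(String workflowId, Map<String, List<String>> inboxes) {
            this.workflowId = workflowId;
            this.inboxes = inboxes;
        }

        /** Records the signal in the inbox of the workflow this stub is bound to. */
        void signal(String data) {
            inboxes.computeIfAbsent(workflowId, id -> new ArrayList<>()).add(data);
        }
    }

    /** Buggy shape: a singleton keeps the stub in a field shared by all calls. */
    static final class BuggyActivity {
        private MainWorkflowStub stub; // shared mutable state (the bug)

        void bind(String mainWorkflowId, Map<String, List<String>> inboxes) {
            stub = new MainWorkflowStub(mainWorkflowId, inboxes);
        }

        void send(String data) {
            stub.signal(data); // uses whichever stub was bound last
        }
    }

    /** Fixed shape: the stub is a local variable, created per call. */
    static final class FixedActivity {
        void send(String mainWorkflowId, String data, Map<String, List<String>> inboxes) {
            MainWorkflowStub stub = new MainWorkflowStub(mainWorkflowId, inboxes);
            stub.signal(data);
        }
    }

    /** Replays one unlucky interleaving of two concurrent calls. */
    static Map<String, List<String>> runBuggy() {
        Map<String, List<String>> inboxes = new LinkedHashMap<>();
        BuggyActivity activity = new BuggyActivity();
        activity.bind("main-1", inboxes);           // call A binds its target
        activity.bind("main-2", inboxes);           // call B overwrites the shared field
        activity.send("data-from-child-of-main-1"); // call A now signals main-2
        activity.send("data-from-child-of-main-2"); // call B signals main-2 as intended
        return inboxes;
    }

    /** Same two calls, but each one builds its own stub. */
    static Map<String, List<String>> runFixed() {
        Map<String, List<String>> inboxes = new LinkedHashMap<>();
        FixedActivity activity = new FixedActivity();
        activity.send("main-1", "data-from-child-of-main-1", inboxes);
        activity.send("main-2", "data-from-child-of-main-2", inboxes);
        return inboxes;
    }

    public static void main(String[] args) {
        System.out.println("buggy: " + runBuggy());
        System.out.println("fixed: " + runFixed());
    }
}
```

In the buggy run, main-2 ends up with both signals and main-1 with none, which matches what I saw under load. In the fixed run each main workflow gets exactly its own signal.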
Thanks
