signalWithStart weird behaviour

Hi folks,
Trying to implement the following pattern with the help of signalWithStart.
If a workflow with a given ID exists, signal will enqueue the next token to be processed for a given ID in a queue. Workflow method will pop it out in a loop and process. Workflow finishes when queue is empty.

If the workflow with a given ID has already finished, signalWithStart will start the workflow (again with the same ID), send the signal to enqueue the token and execute the workflow method.

I am trying to test this with back to back signalWithStart calls in my test with same workflowId. I am creating a different stub for each signalWithStart call however passing the same workflowOptions to each stub so the ID remains the same.

There seem to be following unexpected things happening

  1. The first signal i.e. the first token is enqueued twice. Tested it many times. Second token is enqueued only once.
  2. I am logging a message at the end of the workflow method. It seems like workflow method is getting executed twice instead of tokens getting enqueued and a single workflow processing the tokens back to back.
    But even if there are two executions, the runId seems to be the same across both runs?
    So I log in the signal method and log as the workflow method exits. It could be that 2 separate executions are running back to back. But are they expected to share the runId?
    Why is my first signal being delivered to the second execution i.e. how come the token I am enqueuing through first signalWithStart getting enqueued again? I checked the code many times and the second signalWithStart should not be enqueuing the first token. Is this some artifact of the test environment?

Please correct me if there are any gaps in my understanding :pray: Thank you!

cc @maxim Please guide :pray:

Do you see any signal side effect twice? Are you logging using the provided Workflow logger?

I guess that you are experiencing workflow replay and get confused.

Thanks Maxim. Yes I do see the signal side effect twice. The first token is indeed being processed twice because it was enqueued twice. Yes logging with the workflow logger.
I put a breakpoint in the signal method and see it hit 2 times for the first signal.

Is the runId expected to stay the same across runs?

If I wait in between two executions to allow the first one to complete, then I get
WorkflowExecutionAlreadyCompletedError for the second execution.

A gist of my workflow is as follows

@SignalMethod
fun enqueue(token: TokenType) {
   tokensToProcess.add(token)
   logger.info("Enqueued token" + token)
}

@WorkflowMethod
fun process(workflowParam: WorkflowParamType) {
      while (tokensToProcess.isNotEmpty()) {
           val token = tokensToProcess.removeFirst()
           // process token
      }
      logger.info("Completed")
}

workflow invocation

        workflowClient = TestWorkflowEnvironment.newInstance().newWorkflowClient()     

        val firstStub = workflowClient.newWorkflowStub(
            MyWorkflow::class.java,
            workflowOptions
        )

        val request1 = workflowClient.newSignalWithStartRequest().apply {
            add(
                firstStub::process,
                WorkflowParamType(workflowParam)
            )
            add(
                firstStub::enqueue,
                TokenType(token1)
            )
        }
        workflowClient.signalWithStart(request1)

        val secondStub = workflowClient.newWorkflowStub(
            MyWorkflow::class.java,
            workflowOptions
        )

        val request2 = workflowClient.newSignalWithStartRequest().apply {
            add(
                secondStub::process,
                WorkflowParamType(workflowParam)
            )
            add(
                secondStub::enqueue,
                TokenType(token2)
            )
        }
        workflowClient.signalWithStart(request2)

Workflow is already completed: COMPLETED
WorkflowExecutionAlreadyCompletedError(message:Workflow is already completed: COMPLETED)
	at app//com.uber.cadence.internal.testservice.TestWorkflowMutableStateImpl.checkCompleted(TestWorkflowMutableStateImpl.java:1246)
	at app//com.uber.cadence.internal.testservice.TestWorkflowMutableStateImpl.update(TestWorkflowMutableStateImpl.java:231)
	at app//com.uber.cadence.internal.testservice.TestWorkflowMutableStateImpl.update(TestWorkflowMutableStateImpl.java:212)
	at app//com.uber.cadence.internal.testservice.TestWorkflowMutableStateImpl.signal(TestWorkflowMutableStateImpl.java:1528)
	at app//com.uber.cadence.internal.testservice.TestWorkflowService.SignalWithStartWorkflowExecution(TestWorkflowService.java:546)

Do you see two signals in the history?

Let me check. This is a test environment so I dont have a web UI to check the history. Any quick way to verify this?

Looks like in test env, all runIds will be the same. Could that be causing issues here? Any way to mitigate it if it is the case?

For this one even setting

.setWorkflowIdReusePolicy(WorkflowIdReusePolicy.AllowDuplicate)

does not help :frowning:

Ok I see only 2 signal events in the history

HistoryEvent: eventType:WorkflowExecutionSignaled.

Is the first signal being replayed somehow?

Could this be happening @maxim ? I am sending another signal while the workflow method is still processing the previous token. Does cadence replay all received signals so far?

Ok doesn’t look like signalWithStart in test env supports AllowDuplicate ID reuse policy :frowning:
https://github.com/uber/cadence-java-client/blame/fdddfbc6b6b774228884da0cdc95ee0734bfd122/src/main/java/com/uber/cadence/internal/testservice/TestWorkflowService.java#L546

where start workflow test API is tolerant of it

https://github.com/uber/cadence-java-client/blame/fdddfbc6b6b774228884da0cdc95ee0734bfd122/src/main/java/com/uber/cadence/internal/testservice/TestWorkflowService.java#L244

Am I reading it correctly? Any way to bypass it?

If the signal is found in the history twice, then the workflow is expected to process it twice. So it is working as expected. You need to find out why you end up sending two signals to the workflow.

Ok doesn’t look like signalWithStart in test env supports AllowDuplicate ID reuse policy

The ID resue policy applies only no workflows are running. I believe in your case the workflow is running, so it is not relevant.

The ID resue policy applies only no workflows are running. I believe in your case the workflow is running, so it is not relevant.
I can see that the first workflow instance has completed. (I am waiting in the test for a few seconds to make sure). Then when I try to start a new instance of the workflow using signalWithStart it fails with “WorkflowExecutionAlreadyCompletedError”.

In the first case where I call signalWithStart with 2 tokens back to back, they are getting enqueued in the same workflow instance. No matter whether the workflow is already running or completed, I should never encounter “WorkflowExecutionAlreadyCompletedError”, right?

If the signal is found in the history twice, then the workflow is expected to process it twice.

Yes I am sending 2 signals intentionally to test serialized token processing. However I see that the first signal is processed twice, hence a total of 3 signal method invocations are happening (2 for the first token and 1 for the second token). Is the second signal hitting the workflow during the execution of the WF method causing the first one to be replayed?

Thanks @maxim :pray: Appreciate your help. Cadence is a phenomenal piece of engineering, however with a steep learning curve :smiling_face_with_tear:

Sorry if I mixed the two issues, both are happening in unit test environment.

  1. First issue where a 2 signalWithStarts are invoked and 3 seen to be processed (first one twice).

  2. Second issue is if I let the first instance of workflow finish and attempt to start a new instance using signalWithStart, it fails with WorkflowExecutionAlreadyCompletedError. I’m wondering if this is an artifact of the TestWorkflowEnvironment

  1. I don’t believe this is possible. How do you determine that the signal is processed twice? Stand-alone reproduction is needed to help you here.

  2. WorkflowIdReusePolicy does apply in this case. What is the value used in your case? I doubt this is an artifact of the test environment.

Thanks Maxim.

  1. I print a log in the signal method. I see the first token is being enqueued twice. First time when I do signalWithStart with first token, then when I do signalWithStart with token 2 I see both token 1 and token 2 getting enqueued. There is nothing in the signal method but add to the queue and print a log.
    Thankfully my workflow method is idempotent so duplicate tokens are ignored.

  2. Workflow id is a UUID. Definitely the same. The stacktrace shows that it finds the previous instance of the workflow in completed state and throws and exception. The ID reuse policy should allow for another instance of workflow to be started here, but doesn’t look test env handles it.

https://github.com/uber/cadence-java-client/blame/fdddfbc6b6b774228884da0cdc95ee0734bfd122/src/main/java/com/uber/cadence/internal/testservice/TestWorkflowService.java#L546