We set the workflow timeout to 45 seconds using setWorkflowExecutionTimeout(). The expectation was that all workflows would be closed within that time (some possibly with a TimedOut status).
After running a few rounds of stress tests, we can see that some workflows are still open (in Running status) after a few days.
I'm not sure under what circumstances this can happen. If we keep accumulating these open instances, how will that impact the Temporal system?
Here is what I see in the Temporal UI for one such workflow:
Note the empty WorkflowExecutionTimeout and WorkflowRunTimeout fields, so they defaulted to infinity. It looks like your 45-second timeout wasn't specified. Could you post the code that starts the workflow and specifies this timeout?
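For reference, here is a minimal sketch of where that timeout normally goes, assuming the Temporal Java SDK, a locally running service, and a hypothetical MyWorkflow interface and task queue name. The execution timeout is passed through WorkflowOptions when the stub is created; if that builder call is missing, the timeout stays unset and defaults to unlimited:

```java
import io.temporal.client.WorkflowClient;
import io.temporal.client.WorkflowOptions;
import io.temporal.serviceclient.WorkflowServiceStubs;
import io.temporal.workflow.WorkflowInterface;
import io.temporal.workflow.WorkflowMethod;

import java.time.Duration;

public class StarterSketch {

  // Hypothetical workflow interface, used only for this illustration.
  @WorkflowInterface
  public interface MyWorkflow {
    @WorkflowMethod
    void run();
  }

  public static void main(String[] args) {
    // Assumes a Temporal service reachable at the default local address.
    WorkflowServiceStubs service = WorkflowServiceStubs.newLocalServiceStubs();
    WorkflowClient client = WorkflowClient.newInstance(service);

    WorkflowOptions options =
        WorkflowOptions.newBuilder()
            .setTaskQueue("stress-test-queue") // hypothetical task queue name
            // The 45-second limit lives here, on the caller side. Without this
            // line, WorkflowExecutionTimeout stays unset (i.e. unlimited).
            .setWorkflowExecutionTimeout(Duration.ofSeconds(45))
            .build();

    MyWorkflow workflow = client.newWorkflowStub(MyWorkflow.class, options);
    WorkflowClient.start(workflow::run); // starts the execution asynchronously
  }
}
```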
Thanks, @maxim. I understand there is probably a problem with our code. I will share it with you shortly.
One question, though: can we set a default timeout in the workflow definition to protect against cases where the client does not specify anything? Something like @WorkflowMethod(executionStartToCloseTimeoutSeconds = 10) in Cadence?
Defining timeouts in the annotation can be misleading. For example, suppose the value is changed and a new worker is deployed, but the process that starts the workflow still uses the old version of the workflow interface: the old timeout will be used. Or, if the workflow is started through a non strongly typed (untyped) stub, the timeout will not be specified at all.
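A hedged sketch of that second case (hypothetical workflow type and task queue names): with an untyped stub, only the workflow type name string is used, so nothing declared on the workflow interface could be read at start time; the caller-side WorkflowOptions are the only place a timeout can come from, and here it is deliberately omitted:

```java
import io.temporal.client.WorkflowClient;
import io.temporal.client.WorkflowOptions;
import io.temporal.client.WorkflowStub;

public class UntypedStartSketch {

  // Starts a workflow by its type name only. The workflow interface and any
  // annotations on it are never consulted, so an annotation-level timeout
  // could not take effect on this code path.
  static void startWithoutTimeout(WorkflowClient client) {
    WorkflowOptions options =
        WorkflowOptions.newBuilder()
            .setTaskQueue("stress-test-queue") // hypothetical task queue name
            // No setWorkflowExecutionTimeout(...) here, so the execution
            // timeout defaults to unlimited.
            .build();

    WorkflowStub untyped = client.newUntypedWorkflowStub("MyWorkflow", options);
    untyped.start(); // workflow arguments, if any, would be passed here
  }
}
```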
We recommend not setting workflow timeouts at all. They don't have any business meaning; their only purpose is to avoid runaway workflows that consume system resources for a long time.