One of our workflows processes files and performs several operations on them. Following the example in the documentation (samples-java/FileProcessingWorkflowImpl.java at master · temporalio/samples-java · GitHub), we are using
host-specific taskQueues so that the activities that need those files are always executed on the same worker.
The issue we are facing is that when a worker dies (or is taken out of the load balancer), the remaining activities are not picked up by any other worker (which is the expected behavior), and the workflow eventually fails with a timeout because the activity is never picked up.
We would like a mechanism to recover from this automatically, as it is something that could happen fairly often. Any ideas? We were thinking about some kind of reset of the workflow when the timeout is reached, but we don’t know whether that is feasible.
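
To make the idea concrete, here is a minimal sketch in plain Java (no Temporal dependency) of the recovery we have in mind: when the host-specific step times out because nobody picks it up, the whole sequence restarts from the step that selects a host. All names here (`HostRetrySketch`, `processOnHost`, `ScheduleToStartTimeout`, the `host-a`/`host-b` queue names) are hypothetical stand-ins for illustration, not Temporal SDK APIs.

```java
import java.util.List;
import java.util.Set;

public class HostRetrySketch {

    /** Hypothetical stand-in for "the activity was never picked up from the host queue". */
    static class ScheduleToStartTimeout extends RuntimeException {
        ScheduleToStartTimeout(String queue) {
            super("no worker polled " + queue);
        }
    }

    /** Hypothetical stand-in for the activities that must run on the host holding the file. */
    static String processOnHost(String hostQueue, Set<String> deadHosts) {
        if (deadHosts.contains(hostQueue)) {
            throw new ScheduleToStartTimeout(hostQueue);
        }
        return "processed on " + hostQueue;
    }

    /**
     * Retries the WHOLE sequence (download + host-specific steps), so a retry
     * can land on a different host. `hosts` is the order in which the download
     * step is assumed to land on workers; `deadHosts` die right after download.
     */
    static String processFile(List<String> hosts, Set<String> deadHosts, int maxAttempts) {
        ScheduleToStartTimeout last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            // Step 1: "download" on the shared queue; the worker reports its own queue.
            String hostQueue = hosts.get(attempt % hosts.size());
            try {
                // Step 2: host-specific work; times out if that worker is gone.
                return processOnHost(hostQueue, deadHosts);
            } catch (ScheduleToStartTimeout e) {
                last = e; // start over from the download step
            }
        }
        if (last == null) {
            throw new IllegalArgumentException("maxAttempts must be positive");
        }
        throw last;
    }

    public static void main(String[] args) {
        System.out.println(processFile(List.of("host-a", "host-b"), Set.of("host-a"), 3));
    }
}
```

In the real workflow, we assume the loop could be replaced by wrapping the whole download/process/upload sequence in the Java SDK's `Workflow.retry`, combined with a short schedule-to-start timeout on the host-specific activities, but we have not verified that this is the recommended approach.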