How to prevent multiple workers from executing the same workflow with the same workflow ID

My initial thought was that with multiple workers, a task would only be processed by a single worker. Is that the case, or am I missing something?

I don’t understand your question. Are you asking about “executing the same workflow” or “executing the task”? What is the exact problem you are trying to solve on the business level?

Hi Shawel,

One of the things the Temporal platform guarantees is that a workflow-id will only ever be running on one worker at a time:

“It is not possible for a new Workflow Execution to spawn with the same Workflow Id as another Open Workflow Execution”.

So, to answer your question: there is nothing you need to do to “prevent” concurrent execution, if this is what you are worried about… it's just how Temporal works.

If you are worried about the potential re-use of the workflow-id, there is a policy setting on re-use (with some limitations related to retention periods, etc.). You can read more here

Hi @maxim and @Greg_Haskins

What I am observing is that when I submit a workflow, both workers execute the same code. Maybe I am doing something wrong?

Essentially, submitting the following function with 2 workers running will run the workflow on both workers.

import { WorkflowClient } from '@temporalio/client';
// getConnection and vectorRunnerWorkFlow are defined elsewhere in the project.

async function runVectorWorkflow(input: string, id: string) {
    const connection = await getConnection();
    const client = new WorkflowClient({
        connection,
        namespace: 'vector-runner',
    });

    // Start the workflow without waiting for it to complete.
    const handle = await client.start(vectorRunnerWorkFlow, {
        args: [input, id],
        taskQueue: 'vector-runner',
        workflowId: 'workflow-' + id,
    });

    console.log(`Started workflow ${handle.workflowId}`);

    // Uncomment to wait for the workflow result:
    // const result = await handle.result();

    return { workflowId: handle.workflowId };
}

thank you

Is this the TypeScript SDK? If so, perhaps use the workflow namespace API (see “Namespace: workflow” in the Temporal Node.js SDK API Reference) to print out the workflowId and runId to try to get a handle on what is happening.

Super high-level guess: Your workflow is getting scheduled on node 1, crashing, and then scheduled for retry on node 2.

Also potentially relevant here is “lastFailure”
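One way to act on this suggestion is to log the workflow ID, run ID, and attempt number next to a worker label, then compare the lines from both workers. The sketch below assumes an info object with the `workflowId`, `runId`, and `attempt` fields that the TypeScript SDK's `workflowInfo()` returns. The `workerLabel` field and the `describeSighting` and `diagnose` helpers are hypothetical names for illustration, not part of the SDK.

```typescript
// Subset of the fields returned by workflowInfo() in @temporalio/workflow.
interface ExecutionInfo {
  workflowId: string;
  runId: string;
  attempt: number;
}

// One observation of an execution on a worker (workerLabel is a hypothetical field).
interface Sighting extends ExecutionInfo {
  workerLabel: string;
}

// Format a log line that can be compared across workers.
function describeSighting(s: Sighting): string {
  return `[${s.workerLabel}] workflowId=${s.workflowId} runId=${s.runId} attempt=${s.attempt}`;
}

// Classify what a set of sightings for ONE workflow ID suggests.
function diagnose(sightings: Sighting[]): string {
  const runIds: string[] = [];
  const workers: string[] = [];
  let maxAttempt = 0;
  for (const s of sightings) {
    if (runIds.indexOf(s.runId) === -1) runIds.push(s.runId);
    if (workers.indexOf(s.workerLabel) === -1) workers.push(s.workerLabel);
    if (s.attempt > maxAttempt) maxAttempt = s.attempt;
  }
  // More than one run ID, or an attempt above 1, points at a retry or restart.
  if (runIds.length > 1 || maxAttempt > 1) return 'retried-or-restarted';
  // Same run seen on several workers: tasks of one execution moved around.
  if (workers.length > 1) return 'same-run-moved-between-workers';
  return 'single-worker';
}
```

If both workers log the same run ID, a single execution is being handed between workers; different run IDs or a growing attempt number would support the retry guess.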

It is OK for the workflow to run on multiple workers as a worker going down shouldn’t block a workflow. At the same time, Temporal guarantees the uniqueness of a workflow by workflow ID. So you cannot have more than one open workflow execution with the same ID.
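The difference between where tasks run and how many executions are open can be made concrete with a toy model. This is not Temporal's implementation or API: `ToyServer`, `start`, `close`, and `dispatch` are hypothetical names, and the round-robin dispatch is an assumption chosen only to show tasks of one run landing on different workers.

```typescript
// Toy model only: one open execution per workflow ID, tasks may go to any worker.
class ToyServer {
  private openRuns: { [workflowId: string]: string } = {};
  private runCounter = 0;
  private dispatchCounter = 0;

  // Open a new execution; rejected while another one with the same ID is open.
  start(workflowId: string): string {
    if (this.openRuns[workflowId] !== undefined) {
      throw new Error(`workflow ${workflowId} already has an open execution`);
    }
    this.runCounter += 1;
    const runId = `run-${this.runCounter}`;
    this.openRuns[workflowId] = runId;
    return runId;
  }

  // Close the execution. The toy then allows the ID to be used again;
  // in Temporal that depends on the configured reuse policy.
  close(workflowId: string): void {
    delete this.openRuns[workflowId];
  }

  // Hand the next task of the open execution to some worker (round-robin here).
  dispatch(workflowId: string, workers: string[]): { worker: string; runId: string } {
    const runId = this.openRuns[workflowId];
    if (runId === undefined) throw new Error(`no open execution for ${workflowId}`);
    if (workers.length === 0) throw new Error('no workers available');
    const worker = workers[this.dispatchCounter % workers.length];
    this.dispatchCounter += 1;
    return { worker, runId };
  }
}
```

In this model, two consecutive `dispatch` calls for the same workflow ID return different workers but the same run ID, while a second `start` with that ID throws until `close` is called.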

“essentially submitting the following function with 2 workers running will run the workflow on both workers.”

This is not a problem even if it is happening. What is the exact business problem you are trying to solve?

Thanks @maxim and @Greg_Haskins

The business case I am solving is managing a command-line process that is started by a workflow. The process is a server (transformers) with a unique port and a health check path. It should only be run once (i.e., multiple workers should not have the same process running). The health check is done in an activity, which then sends a heartbeat to the workflow. It runs on one worker and works fine, but somehow the same workflow fails on the second worker at the health check, because it cannot find the process on that port (the process lives on a different worker). It then retries and starts another process on the second worker.

What I am observing is:

  1. With 1 worker, everything works fine.
  2. With 1 worker running the workflow and a 2nd worker started afterwards, everything works fine.
  3. With 2 or more workers started before the workflow is started, all the workers eventually run the process.

But if the behavior is that 3) should not happen, then I will experiment and see what I am doing wrong…

Best

My understanding is that Temporal will guarantee that only one worker will be running a specific workflow definition on a per namespace/taskqueue/workflow-id basis, but it doesn’t guarantee that all executions (either from reuse or retry) will occur on the same worker. So, if the workflow is being retried, or the workflow-id is being re-used, it could very well bounce around within your worker cluster and this is normal.

I’m still fuzzy on your use case. Do you see anything in the event history or the lastFailure that indicates it's being retried? It's not clear to me what you are doing, but it sounds like you have workflow failures that are being retried.

OK, thanks @Greg_Haskins. I will experiment and get to the bottom of it. I have got good information.

How exactly does the workflow start the “command line process”? Workflows are not allowed to make any external API calls directly, only through activities. So in your case, I would start the process from an activity and monitor it through a health check from the same activity. The activity has to heartbeat to report that the process is still alive. If it fails, then the process should be declared dead, and the activity retry would restart it on a different worker.
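A minimal sketch of that single-activity approach, under stated assumptions: the dependencies are injected so the code runs without the Temporal SDK, and `SupervisorDeps`, `superviseProcess`, and all callback names are hypothetical. In a real activity written with the TypeScript SDK, `heartbeat` could wrap `Context.current().heartbeat()` from `@temporalio/activity`, and `startProcess` and `isHealthy` would spawn the server and call its health check path.

```typescript
// Everything the supervising activity needs, injected for testability.
interface SupervisorDeps {
  startProcess: () => Promise<void>;    // e.g. spawn the server on its port
  isHealthy: () => Promise<boolean>;    // e.g. request the health check path
  heartbeat: () => void;                // report that the process is still alive
  sleep: (ms: number) => Promise<void>; // wait between health checks
  shouldStop: () => boolean;            // e.g. shutdown was requested
}

// Start the process and monitor it from the SAME function, so the health
// check always runs on the host that owns the process.
async function superviseProcess(deps: SupervisorDeps, intervalMs: number): Promise<void> {
  await deps.startProcess();
  while (!deps.shouldStop()) {
    if (!(await deps.isHealthy())) {
      // Throwing fails the activity; a retry would start the process again,
      // possibly on a different worker.
      throw new Error('process failed its health check');
    }
    deps.heartbeat();
    await deps.sleep(intervalMs);
  }
}
```

Because starting and checking live in one activity, no other worker ever looks for a port it does not own. For missed heartbeats to be detected, the activity would also need a heartbeat timeout configured in its options.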

Hi @maxim

Yes, all executions run through a few activities. The last activity polls and sends a heartbeat to the workflow until the process is terminated using a signal.

Then just route all activity executions to the same host. There is no need to think about the locality of the workflow execution itself.
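One common way to do this is to give each worker host its own task queue in addition to the shared one: a first activity on the shared queue reports which host picked it up, and every later activity is scheduled on that host's queue. The sketch below models only that routing decision. `hostQueueName`, `planActivities`, the `getHostId` activity, and the queue naming scheme are hypothetical. In the TypeScript SDK, the queue for an activity can be chosen with the `taskQueue` option of `proxyActivities`.

```typescript
// An activity together with the task queue it should be scheduled on.
interface PlannedActivity {
  name: string;
  taskQueue: string;
}

// Queue polled only by the worker on `hostId` (naming scheme is an assumption).
function hostQueueName(sharedQueue: string, hostId: string): string {
  return `${sharedQueue}-host-${hostId}`;
}

// The first activity runs on the shared queue, so any worker may pick it up
// and report its host ID. All remaining activities are pinned to that host.
function planActivities(
  sharedQueue: string,
  hostId: string,
  activityNames: string[],
): PlannedActivity[] {
  const pinnedQueue = hostQueueName(sharedQueue, hostId);
  const plan: PlannedActivity[] = [{ name: 'getHostId', taskQueue: sharedQueue }];
  for (const name of activityNames) {
    plan.push({ name, taskQueue: pinnedQueue });
  }
  return plan;
}
```

For example, `planActivities('vector-runner', 'abc', ['startServer', 'pollHealth'])` puts `getHostId` on `vector-runner` and both remaining activities on `vector-runner-host-abc`, so the process and its health checks stay on one host no matter which worker runs the workflow itself.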