Java SDK: Random ALREADY_EXISTS error when sending 100 concurrent requests

Hi,

During load testing of my workflow I’m noticing random “ALREADY_EXISTS” errors when trying to start my workflow.

I have a REST endpoint which I call from a shell script 100x in parallel to simulate 100 users trying to start the process.

It all works fine most of the time, but there are a few times where I get an exception and it seems that my process doesn’t get run.

It looks like a race condition, because when I check the status of the workflow named in the exception, sometimes it says “Completed” and other times I don’t see that it ran at all.

Is there any other debugging I can do? Any metrics to look for?

Error:

Caused by: io.grpc.StatusRuntimeException: ALREADY_EXISTS: Workflow execution is already running. WorkflowId: LicensePurchaseWorkflow-17ee804a-717f-495b-9d3c-b4961b065f77, RunId: 4bd44a55-302c-4eea-8aa9-785dc2fd5115.

This is my REST endpoint, which I’m calling from my shell script. It attempts to start my workflow in Temporal.

public Response purchaseLicenses(@Valid @SpanAttribute(value = "http.payload") PurchaseLicenseRequest request) {
    log.infof("Initiating purchase license request with incoming request - %s", request);

    try {
        UUID requestId = UUID.randomUUID();

        workflow = observer.getClient().newWorkflowStub(
                PurchaseLicenseWorkflow.class, WorkflowOptions.newBuilder()
                        .setWorkflowId(workflowPrefix + "-" + requestId.toString())
                        .setTaskQueue(taskQueue).build()
        );

        // Create the context
        PurchaseLicenseContext ctx = PurchaseLicenseContext.builder()
                .transactionId(requestId)
                .customerEmail(request.getCustomerEmail())
                .expirationDate(request.getExpirationDate())
                .giftCards(request.getGiftCards())
                .serials(request.getSerials())
                .build();

        WorkflowClient.start(workflow::purchaseLicenses, ctx);
        return Response.accepted().build();
    } catch (Exception e) {
        e.printStackTrace();
        throw e;
    }
}

I think I found the issue.

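It turns out I was accidentally sharing the workflow variable as a field of my REST class, so it was getting overwritten under high incoming load. Presumably one request replaced the stub before another request had called start on it, so two requests ended up starting the same workflow ID (hence ALREADY_EXISTS) while the ID that got overwritten was never started.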

Moving the variable to the scope of my method seems to have fixed the issue. I can now submit high loads concurrently and haven’t received any errors.

// Create the workflow stub and start the workflow
PurchaseLicenseWorkflow workflow = observer.getClient().newWorkflowStub(
        PurchaseLicenseWorkflow.class, WorkflowOptions.newBuilder()
                .setWorkflowId(workflowPrefix + "-" + requestId.toString())
                .setTaskQueue(taskQueue).build()
);
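
For anyone who hits the same thing, here is a minimal sketch of the race without any Temporal dependency. Everything in it is hypothetical and only for illustration: the class name SharedStubRace, the countRejected helper, and an in-memory set that stands in for the server rejecting a workflow ID that was already started. A CyclicBarrier forces the bad interleaving (every request assigns the shared field before any request starts), and the field is volatile only to make the demo deterministic; my real field was a plain one and the overlap happened by chance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedStubRace {
    // Stands in for the workflow stub stored as a field on the REST class.
    // volatile only so every thread reads the last write (deterministic demo).
    private volatile String sharedWorkflowId;

    // Stands in for the server: an ID can be started only once.
    private final Set<String> started = ConcurrentHashMap.newKeySet();

    // Simulates `requests` concurrent calls and returns how many were rejected
    // because the ID they tried to start had already been started.
    static int countRejected(boolean useSharedField, int requests) throws Exception {
        SharedStubRace endpoint = new SharedStubRace();
        CyclicBarrier allStubsCreated = new CyclicBarrier(requests);
        AtomicInteger rejected = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(requests);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                futures.add(pool.submit(() -> {
                    // Each request generates its own unique ID, as in the endpoint.
                    String localWorkflowId = "wf-" + UUID.randomUUID();
                    if (useSharedField) {
                        endpoint.sharedWorkflowId = localWorkflowId; // overwritten by other requests
                    }
                    // Force the worst case: nobody starts until everyone has assigned.
                    allStubsCreated.await();
                    String idToStart = useSharedField ? endpoint.sharedWorkflowId : localWorkflowId;
                    if (!endpoint.started.add(idToStart)) {
                        rejected.incrementAndGet(); // the ALREADY_EXISTS case
                    }
                    return null;
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // propagate any failure from the worker threads
            }
        } finally {
            pool.shutdown();
        }
        return rejected.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("shared field, rejected: " + countRejected(true, 100));
        System.out.println("local variable, rejected: " + countRejected(false, 100));
    }
}
```

With the forced interleaving, the shared-field variant has every request but one try to start the same ID, while the local-variable variant rejects none. In the real endpoint the overlap depends on timing, which would explain why only a few of the 100 requests failed.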