Sometimes a worker is not able to find the running workflow by workflowId

This method is used to find the running workflow by workflow ID. The worker pods run in a Kubernetes cluster. Sometimes this function does not return the running workflow for the workflowId, but most of the time it works.

            ListOpenWorkflowExecutionsRequest listOpenWorkflowExecutionsRequest =
                    ListOpenWorkflowExecutionsRequest.newBuilder()
                            .setNamespace(this.client.getOptions().getNamespace())
                            .setExecutionFilter(WorkflowExecutionFilter.newBuilder().setWorkflowId(workflowId).build())
                            .build();

            try {
                return this.getWorkflowClient().getWorkflowServiceStubs().blockingStub()
                        .listOpenWorkflowExecutions(listOpenWorkflowExecutionsRequest);
            } catch (StatusRuntimeException e) {
                // Rethrow instead of silently swallowing the gRPC error, so failures are visible.
                throw e;
            }

Sometimes this function does not return the running workflow for the workflowId, but most of the time it works.

In the cases where it doesn’t, are you getting any errors?
All visibility “List” APIs are eventually consistent, meaning the information is “near real time” and can lag by 2+ seconds depending on your server load. So if you call ListOpenWorkflowExecutions right after the server starts an execution, it might not show up in the results right away.
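
If a caller still wants to wait for a visibility record to appear, one option is a bounded poll with a deadline. The following is only a minimal sketch under stated assumptions: `VisibilityPoller` and `pollUntilPresent` are hypothetical names, not part of the Temporal SDK, and the lookup is passed in as a plain `Supplier` so the helper does not depend on any particular API call.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical helper for illustration only; not part of the Temporal SDK.
final class VisibilityPoller {
    private VisibilityPoller() {}

    /**
     * Calls lookup until it yields a value or the timeout elapses, sleeping
     * for interval between attempts. At least one attempt is always made.
     * Returns Optional.empty() on timeout or if the thread is interrupted.
     */
    static <T> Optional<T> pollUntilPresent(
            Supplier<Optional<T>> lookup, Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            Optional<T> result = lookup.get();
            if (result.isPresent()) {
                return result;
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return Optional.empty();
            }
            try {
                // Sleep for the interval, but never past the deadline.
                long sleepMillis = Math.min(interval.toMillis(), remainingNanos / 1_000_000);
                Thread.sleep(Math.max(1, sleepMillis));
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up.
                Thread.currentThread().interrupt();
                return Optional.empty();
            }
        }
    }
}
```

The supplier could, for example, wrap the listOpenWorkflowExecutions call and return an empty `Optional` while the response contains no executions. The timeout and interval are up to the caller; nothing in this sketch guarantees that the record appears within any particular time.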

We are not getting any errors. If we retry, it sometimes returns the running workflow within 5 to 10 seconds. Sometimes it does not return it even after 20 seconds. All scenarios were tested on a running server. Could the issue be due to high DB query response times when the load is high? We use PostgreSQL.

You can use the server “task_latency_queue_bucket” metric to measure visibility task end-to-end latencies. Sample query:

histogram_quantile($percentile, sum(rate(task_latency_queue_bucket{operation=~"VisibilityTask.*"}[1m])) by (operation, le))

If you have Elasticsearch enabled, you can use the “visibility_persistence_latency_bucket” metric to track the latency of pushing data to ES:

histogram_quantile($percentile, sum(rate(visibility_persistence_latency_bucket{visibility_type="advanced_visibility"}[1m])) by (operation, le))

You should not rely on the List APIs for business decisions because, as mentioned, they are eventually consistent. You could use APIs like DescribeWorkflowExecution or GetWorkflowExecutionHistory, which hit your primary DB and should not have this delay, but calling them at a high rate could impact your server performance.

Is there a sample using the Java SDK for DescribeWorkflowExecution or GetWorkflowExecutionHistory?

I mean that we need to know how to build the DescribeWorkflowExecutionRequest object before using it in an API call.

@santosh1990

        DescribeWorkflowExecutionRequest req = DescribeWorkflowExecutionRequest.newBuilder()
                .setExecution(WorkflowExecution.newBuilder()
                        .setWorkflowId("<workflow_id>")
                        // optional; if not specified, it's either the currently running execution
                        // or the latest completed one
                        .setRunId("<run_id>")
                        .build())
                .setNamespace(client.getOptions().getNamespace())
                .build();

        // service is the WorkflowServiceStubs instance, e.g. client.getWorkflowServiceStubs()
        DescribeWorkflowExecutionResponse res = service.blockingStub().describeWorkflowExecution(req);
        // ...

You can see many of these APIs and util methods that might be helpful in our Java SDK workshop repo.

Many thanks