Alternatives: Verify Workflow status before calling Workflow.getResult

I have a flow where the client initiates a new workflow but doesn’t wait for the result inline. Instead, it does some other work and comes back later. For example:

WorkflowClient workflowClient = ...;
WorkflowOptions options = ...;
var workflowInstance =
        workflowClient.newWorkflowStub(MyWorkflow.class, options);
WorkflowExecution workflowExecution = WorkflowClient.start(workflowInstance::execute, request);

When that moment arrives, it needs to check the workflow status and, based on that, either get the result or continue doing other work (the idea is not to park the thread). I am using version 1.14.0 of temporal.io/sdk-java, and I came across this code snippet, which I am now using:

WorkflowClient workflowClient = ...;
WorkflowExecution workflowExecution = ...;
var stub = workflowClient.newUntypedWorkflowStub(workflowExecution.getWorkflowId(), Optional.of(workflowExecution.getRunId()), Optional.empty());
Result result = stub.getResult(1L, TimeUnit.SECONDS, Result.class);

The thing is that I see some DEADLINE_EXCEEDED and UNAVAILABLE errors when that 1 second is reached since the workflow execution is still in progress.

Looking at this response, I can:

  1. Use the query feature to retrieve a custom internal workflow state, and then call getResult based on that state.
  2. Continue as I am doing now and discard the DEADLINE_EXCEEDED and UNAVAILABLE errors, treating them as part of the business logic.
  3. Use the DescribeWorkflowExecution API in place of the custom query from point 1, and call getResult based on the workflow status.

Option 1 seems a bit unnecessary, since I can use the DescribeWorkflowExecution API to achieve the same thing without storing an extra value in Temporal.
Option 2 seems risky: I might end up treating unwanted errors as expected, and the metrics would become dirty.
Option 3 seems like the best approach. I will probably make 5 to 10 describe calls against an in-progress workflow execution before it actually finishes and I can get the result. And since this is a heavily used API, I might end up with more than 1000 requests per minute at peak.
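
Option 3 could be sketched roughly as follows. To keep the sketch self-contained, the Temporal calls are hidden behind a hypothetical WorkflowGateway interface, and the Status enum is a local stand-in rather than the SDK's own type. In a real client, status(...) would presumably wrap DescribeWorkflowExecution on the service stubs and result(...) would wrap WorkflowStub.getResult; those mappings are assumptions, noted in the comments.

```java
import java.util.Optional;

final class StatusGatedResult {
    // Local stand-in for the workflow status; NOT the SDK's enum.
    enum Status { RUNNING, COMPLETED, FAILED }

    // Hypothetical seam around the Temporal client (not part of the SDK).
    interface WorkflowGateway<R> {
        // Assumption: backed by something like
        // blockingStub().describeWorkflowExecution(request)
        //     .getWorkflowExecutionInfo().getStatus()
        Status status(String workflowId, String runId);

        // Assumption: backed by WorkflowStub.getResult(...), which is
        // expected to throw if the workflow did not complete successfully.
        R result(String workflowId, String runId);
    }

    // One non-blocking poll: empty while the workflow is still running,
    // otherwise whatever result(...) returns (or throws).
    // Assumes results are non-null, since Optional.of rejects null.
    static <R> Optional<R> pollOnce(WorkflowGateway<R> gateway,
                                    String workflowId, String runId) {
        if (gateway.status(workflowId, runId) == Status.RUNNING) {
            return Optional.empty();
        }
        return Optional.of(gateway.result(workflowId, runId));
    }
}
```

With this shape, each poll against a running workflow costs a single describe call, and getResult is only called once the execution has closed, so a timeout on getResult would no longer be part of the normal path.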

What is the overhead of these solutions?

Thanks!

I see some DEADLINE_EXCEEDED and UNAVAILABLE errors when that 1 second is reached

Can you show details on these errors? Could you also check frontend service logs when this happens?

Do you see any service errors in the metrics? Sample Grafana query:
sum(rate(service_error_with_type{service_type="frontend"}[5m])) by (error_type)
Latencies (by operation):
histogram_quantile(0.95, sum(rate(service_latency_bucket{service_type="frontend"}[5m])) by (operation, le))

Maybe another option for you could be to use WorkflowStub.getResultAsync? It returns a CompletableFuture, which completes when the workflow does, so you can come back to it at a later point as well.
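
A minimal sketch of checking such a future without blocking, using only the JDK's CompletableFuture. The helper name peek is hypothetical, and in the usage below the future is completed by hand as a stand-in for the one getResultAsync would return.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;

final class FuturePeek {
    // Returns the value if the future has already completed, or empty if
    // it is still pending. Never blocks: join() is only called once
    // isDone() is true. If the future completed exceptionally, join()
    // rethrows the failure wrapped in a CompletionException.
    // Assumes results are non-null, since Optional.of rejects null.
    static <R> Optional<R> peek(CompletableFuture<R> future) {
        if (!future.isDone()) {
            return Optional.empty();
        }
        return Optional.of(future.join());
    }
}
```

Usage would look like FuturePeek.peek(future).ifPresent(...). Note that this only helps if the same process keeps hold of the future between checks; if each check arrives as a separate request on a different instance, there is no future to come back to.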

Hard to check, I’m using the docker compose setup. I did see those DEADLINE_EXCEEDED and some UNAVAILABLE errors in the client metrics and logs, but I didn’t find a way to check the server logs.

I’m quite certain that those DEADLINE_EXCEEDED errors coincided with the timeout triggering; the others I can’t tell.

I did try WorkflowStub.getResultAsync, since I’m using spring-webflux for the application client code. Nevertheless, the issue persists, because I’m using WorkflowStub.getResultAsync as a way to check whether the workflow has finished. If it hasn’t, my app tells the REST customer to wait for about 1 second and fire another REST call. It is essentially a polling mechanism to check whether the workflow has finished and get the corresponding result.

I don’t know Temporal internals well enough to tell whether, for this purpose, it’s better to first call DescribeWorkflowExecution just to see if the workflow has completed and then get the result based on that, instead of waiting for the result for a really short period of time.