SDK-side retry handling

Hi Team,

Our Java SDK may encounter exceptions before execution starts on the Temporal server side. We need a comprehensive list of these exceptions so that we can decide, taking our business logic into account, which of them are eligible for SDK-side retries. Any additional information on this would be highly appreciated.

So far, we have observed three main exceptions:

  • WorkflowExecutionAlreadyStarted: should not be retried (based on our business logic)

  • WorkflowNotFoundException: should not be retried (also based on our business logic)

  • io.grpc.StatusRuntimeException: This exception is more generic and covers various scenarios, each with a different status code and message. We have observed the following categories so far:

    (1) DEADLINE_EXCEEDED: deadline exceeded after 9.999961238 seconds.

    (2) NOT_FOUND: Namespace “xxx” is not found.

    (3) INVALID_ARGUMENT: Namespace “xxx” has no mapping defined for the search attribute “xxx”.

    (4) RESOURCE_EXHAUSTED: namespace rate limit exceeded. → this situation is eligible for retry attempts.
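
To make the intent concrete, here is a rough sketch of the status-code-based classification we have in mind. The class name `RetryClassifierSketch` and the `Decision` enum are hypothetical helpers of ours, not part of the Temporal SDK or grpc-java. Only the RESOURCE_EXHAUSTED entry reflects a decision we have actually made; the NOT_FOUND and INVALID_ARGUMENT entries are assumptions, and DEADLINE_EXCEEDED is deliberately left undecided. The sketch works on the status code name as a string so that it runs without any dependencies; in real code the name would presumably come from `StatusRuntimeException.getStatus().getCode().name()`.

```java
import java.util.Map;

// Hypothetical helper for illustration only; not part of the Temporal SDK or grpc-java.
final class RetryClassifierSketch {

    enum Decision { RETRY, DO_NOT_RETRY, UNDECIDED }

    // Keys are gRPC status code names (the enum constant names of io.grpc.Status.Code).
    private static final Map<String, Decision> BY_STATUS_CODE = Map.of(
        "RESOURCE_EXHAUSTED", Decision.RETRY,        // observed: namespace rate limit exceeded
        "NOT_FOUND", Decision.DO_NOT_RETRY,          // assumption: a missing namespace will not fix itself
        "INVALID_ARGUMENT", Decision.DO_NOT_RETRY,   // assumption: the request itself is wrong
        "DEADLINE_EXCEEDED", Decision.UNDECIDED      // open question, not decided yet
    );

    // Returns the retry decision for a status code name; unknown codes stay UNDECIDED.
    static Decision classify(String statusCodeName) {
        return BY_STATUS_CODE.getOrDefault(statusCodeName, Decision.UNDECIDED);
    }

    public static void main(String[] args) {
        for (String code : new String[] {"RESOURCE_EXHAUSTED", "NOT_FOUND", "UNAVAILABLE"}) {
            System.out.println(code + " -> " + classify(code));
        }
    }
}
```

Anything not listed in the map falls back to UNDECIDED rather than being silently retried or dropped, which is why a comprehensive list of possible exceptions would help us fill in the table.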

Questions:

  • How does Temporal map internal exceptions to gRPC exceptions? Is it reasonable to use the status code of the gRPC exception (e.g. NOT_FOUND / RESOURCE_EXHAUSTED) to decide whether a call should be retried on the SDK side?
  • Is there a comprehensive list of exceptions that I can take a closer look at? Thanks!