Support for setting custom ActivityId in Java SDK

I understand that similar questions have been asked already, but I am creating this thread to explain my use case and get input.

Implementation:

  1. I have an activity that calls a remote service, and the remote service responds via Pubsub (it can take up to an hour to receive the message).
  2. The requirement is to retry the activity if the message is not received within one hour. Therefore, I will set the StartToClose timeout to 1h with multiple retries. We should also be able to retry the activity when we receive a retryable failure message on Pubsub.
  3. To leverage Temporal’s retry mechanism, we decided to use ActivityCompletionClient so that the 1h timeout covers the time we spend waiting for the Pubsub message.
  4. The remote service is idempotent, i.e. if the same ID is passed, it is expected to behave the same way.
  5. Pubsub will send us a success/fail message using the ID. Some failures are retryable.
  6. When the Pubsub message is received, we want to parse the ID (which is the combination workflowId_activityId) and use ActivityCompletionClient to mark the activity as complete or failed.
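
The ID handling in step 6 can be sketched in plain Java. This is a minimal sketch, assuming the activity ID never contains an underscore, so the last underscore separates the two parts. The `CompositeId` class and its methods are hypothetical names used for illustration and are not part of the Temporal SDK.

```java
import java.util.Objects;

/** Hypothetical helper: models an ID of the form workflowId_activityId. */
final class CompositeId {
    final String workflowId;
    final String activityId;

    CompositeId(String workflowId, String activityId) {
        this.workflowId = Objects.requireNonNull(workflowId);
        this.activityId = Objects.requireNonNull(activityId);
    }

    /** Builds the ID that would be sent to the remote service. */
    String encode() {
        return workflowId + "_" + activityId;
    }

    /**
     * Splits on the LAST underscore.
     * Assumption: the activity ID never contains '_', while the workflow ID may.
     */
    static CompositeId parse(String raw) {
        int sep = raw.lastIndexOf('_');
        if (sep <= 0 || sep == raw.length() - 1) {
            throw new IllegalArgumentException("Expected workflowId_activityId but got: " + raw);
        }
        return new CompositeId(raw.substring(0, sep), raw.substring(sep + 1));
    }
}
```

The parsed pair could then be handed to whatever component completes or fails the pending activity.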

Problem:
To preserve the idempotency of our remote service, we must call the service with the same ID on every attempt. The problem is that on Activity retries we get a different activityId every time, so we lose the idempotency of the remote service.

If there were an option to setActivityId in the Java SDK, we could make sure that the ID stays the same across Activity retries.
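
For illustration only, here is a sketch of a key that stays the same across retries without relying on the activity ID. It assumes the workflow ID is unique per business transaction and that the step name is a constant picked by the workflow author. `IdempotencyKey` and `forStep` are hypothetical names, not Temporal APIs.

```java
/** Hypothetical helper: builds an idempotency key that does not depend on the activity ID. */
final class IdempotencyKey {
    private IdempotencyKey() {}

    /**
     * Assumption: workflowId is unique per business transaction and stepName is a
     * constant chosen by the workflow author, so every retry yields the same key.
     */
    static String forStep(String workflowId, String stepName) {
        if (workflowId.isEmpty() || stepName.isEmpty()) {
            throw new IllegalArgumentException("workflowId and stepName must be non-empty");
        }
        return workflowId + "_" + stepName;
    }
}
```

Because both inputs are fixed for the lifetime of the workflow, a retried call produces the same key as the first attempt.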

What are my options to efficiently implement this?

PS: We looked at Workflow Signals, but we don’t want to take that path, because with signals the activity that calls the remote service has already completed. In case of errors, we would have to implement a manual retry of the activity or use Workflow.reset, which we want to avoid.

We don’t recommend using manual activity completion for such use cases. The problem is that if the activity task is lost before the remote service call succeeds, the retry will happen only after the 1-hour timeout, whereas you usually want to retry that call as fast as possible. The recommended approach is to use an activity to send the initial request and then signal the workflow to process the reply.

Thanks @maxim for your prompt response.

A few follow-up questions:

  1. If the long timeout is the reason for recommending against ActivityCompletionClient for this use case, we can design for a shorter timeout. Please let me know if there are any other reasons for your recommendation.
  2. In any case, we believe support for setActivityId in the Java SDK would resolve these sorts of async issues. Given its availability in go-sdk, are there any plans to add it in the near future?
  3. By “activity task could be lost”, do you mean that if the Worker crashes while waiting after doNotCompleteOnReturn, Temporal loses the activity and cannot mark it as complete or failed on another worker? Is that the correct understanding?
  4. With respect to workflow signals, if we need to process a failure message from the Pubsub channel, are there any best practices for retrying an activity manually? Could you please point me to any code samples? I saw some docs on Workflow.retry but wasn’t sure whether that is the only option or whether there are others we could try.

  1. You said the async operation can take up to one hour. Therefore, you cannot set a timeout shorter than that.
  2. We don’t want to support this to avoid precisely this type of incorrect usage.
  3. Correct.
  4. Workflow.retry is the way to go.

Thanks @maxim

We are leaning towards implementing Workflow signals with custom retries.

I tried Workflow.retry, and it worked.

We noticed that invoking the ActivityMethod without Workflow.retry also works. For example, the code below ran without any issues.

do {
   activity.CreateTransaction();

   // Workflow.retry(retryOptions, Optional.empty(), () -> activity.CreateTransaction());

   Workflow.await(() -> hasReceivedSignal);
   hasReceivedSignal = false; // reset so the next iteration waits for a fresh signal
   if (failedSignalResponse) {
        attempts++;
   } else {
        return transactionData;
   }
} while (attempts <= 3);

We have also confirmed that if a Worker crashes while waiting for a signal, another worker picks up the workflow and retains the attempt count. So, that’s good.

We would just like to understand: what are the advantages of using Workflow.retry?

Workflow.retry calls the lambda in a loop according to the retry options. You implemented the retry loop yourself using a while loop. This also works if you want to write the retry logic by hand and don’t need the retry options.
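
To make the comparison concrete, here is a simplified model in plain Java of "call the lambda in a loop according to retry options". It is not Temporal's implementation. `SimpleRetryOptions` and `SimpleRetry` are hypothetical names, and the wait between attempts is injected as a callback so that the sketch stays independent of any SDK.

```java
import java.util.function.LongConsumer;
import java.util.function.Supplier;

/** Hypothetical, simplified stand-in for retry options (not the Temporal RetryOptions class). */
final class SimpleRetryOptions {
    final int maximumAttempts;
    final long initialIntervalMillis;
    final double backoffCoefficient;

    SimpleRetryOptions(int maximumAttempts, long initialIntervalMillis, double backoffCoefficient) {
        if (maximumAttempts < 1) {
            throw new IllegalArgumentException("maximumAttempts must be >= 1");
        }
        this.maximumAttempts = maximumAttempts;
        this.initialIntervalMillis = initialIntervalMillis;
        this.backoffCoefficient = backoffCoefficient;
    }
}

final class SimpleRetry {
    private SimpleRetry() {}

    /**
     * Calls fn until it succeeds or maximumAttempts is reached, waiting between
     * attempts with exponential backoff. The sleeper is injected so the caller
     * decides how to wait (a test can simply record the requested delays).
     */
    static <T> T retry(SimpleRetryOptions options, LongConsumer sleeper, Supplier<T> fn) {
        long delay = options.initialIntervalMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return fn.get();
            } catch (RuntimeException e) {
                if (attempt >= options.maximumAttempts) {
                    throw e; // attempts exhausted: surface the last failure
                }
                sleeper.accept(delay);
                delay = (long) (delay * options.backoffCoefficient);
            }
        }
    }
}
```

In this model, a hand-written do/while loop with a fixed attempt limit corresponds to fixing `maximumAttempts` and skipping the backoff. The options-driven version mainly saves you from re-implementing intervals and limits each time.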