Long-running workflow with signal and retries

I have a workflow that sends an HTTP request via an activity and then waits for a second HTTP request, which acts as the response to the first.
I implement this with a signal: the body of the second HTTP request is passed to the workflow as the signal parameter.

The workflow waits n seconds for the response. If the response is not received in that time, the workflow should restart immediately. I implement this with a retryable failure: if the signal does not arrive in time, the code throws a retryable exception.

If the signal is received in time, the workflow should restart after m minutes. Keeping the 50k event history limit in mind, I use continueAsNew to start a new run after sleeping for m minutes.

The issue I am facing is that continueAsNew does not inherit the retry policy of the current workflow, so after the first successful run the new run is not retried when it fails because the response was not received.

I’ve also considered child workflows, but the parent workflow's event history limit will eventually be reached anyway, and given the overhead of invoking child workflows I would rather avoid that option unless there is no other way.

I could also call continueAsNew instead of throwing a retryable exception, but that would hurt status visibility, because both success and failure cases would show up as ContinuedAsNew. It would also be an overuse of continueAsNew, which I would like to avoid as it is not recommended.

Workflow failures are not recommended either, but since the use case spans multiple activities, I don’t think I can use activity retry policies, so I am relying on the workflow retry policy.

  • With the above points in mind, I would like to know the best approach for this use case.
  • I would also like confirmation on whether not inheriting the retry policy is a conscious design decision or a bug.

SDK version: 1.13.0

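import io.temporal.failure.ApplicationFailure;
import io.temporal.workflow.Workflow;
import java.time.Duration;

// Set by the signal handler below; null until the second HTTP request arrives.
private Object response;

public void workflowMethod() {
  activity.sendRequest(); // first outgoing HTTP request sent
  // Block until the signal sets response or n seconds elapse.
  Workflow.await(Duration.ofSeconds(n), () -> response != null);
  if (response == null) {
    // Fail this run so that the workflow retry policy starts a new attempt.
    throw ApplicationFailure.newFailure("Response not received in time", RuntimeException.class.getTypeName());
  }
  activity.onResponseReceived(response);
  Workflow.sleep(Duration.ofMinutes(m));
  Workflow.continueAsNew();
}

@Override
public void onResponseReceived(Object response) { // signal handler: body of the second incoming HTTP request
  log.info("response received.");
  this.response = response;
}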

Hi @Pushkar_Desai

continueAsNew should inherit the workflow options. I am going to check that it is working as expected.

Hi @Pushkar_Desai

Another possible implementation:

You could store the status of both the success and failure cases in an object, pass this object as a parameter to the new workflow run (with continueAsNew), and expose a query method so that a client can retrieve the status.

And instead of calling continueAsNew on every iteration, you could loop and call continueAsNew when your counter reaches, let’s say, 5k iterations, keeping in mind that each iteration involves more than one event (ActivityTaskScheduled, ActivityTaskStarted, TimerStarted…)
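To pick the iteration cap, it may help to work out how many iterations fit under the history limit. The sketch below is plain Java with no Temporal dependency. The 50k limit comes from the question; the events-per-iteration figure and the safety margin are assumptions that you would replace with numbers counted from a real run's history. The class and method names are hypothetical.

```java
// Hypothetical helper: estimates a safe loop bound before calling continueAsNew.
final class HistoryBudget {

    /**
     * @param historyLimit       maximum events allowed in one run (50_000 in the question)
     * @param eventsPerIteration events one loop iteration adds (assumed; count them in your own history)
     * @param safetyPercent      share of the limit you are willing to use, 1..100
     * @return number of whole iterations that fit in the budget
     */
    static int maxIterations(int historyLimit, int eventsPerIteration, int safetyPercent) {
        if (historyLimit <= 0 || eventsPerIteration <= 0) {
            throw new IllegalArgumentException("limit and events per iteration must be positive");
        }
        if (safetyPercent < 1 || safetyPercent > 100) {
            throw new IllegalArgumentException("safetyPercent must be between 1 and 100");
        }
        // Integer arithmetic avoids floating-point rounding surprises.
        long budget = (long) historyLimit * safetyPercent / 100;
        return (int) (budget / eventsPerIteration);
    }

    public static void main(String[] args) {
        // Assuming roughly 20 events per iteration and using 80% of the limit.
        System.out.println(maxIterations(50_000, 20, 80));
    }
}
```

Under those assumed numbers the cap comes out at 2000 iterations, so a counter of 5k would only be safe if an iteration produced about 10 events or fewer.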

Would something like that work for you?

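@Override
public void workflowMethod(RequestStatus status) {

   this.requestStatus = status;

   // Create the stub from the activity interface, not the implementation class.
   // Activity options are assumed to be configured elsewhere (e.g. on the worker).
   ActivityContinueAs activity = Workflow.newActivityStub(ActivityContinueAs.class);

   for (int i = 0; i < MAX_ITERATIONS; i++) {

       // Clear the previous iteration's response so the await below waits for a new signal.
       this.response = null;

       activity.sendRequest();
       Workflow.await(Duration.ofSeconds(n), () -> response != null);
       if (response == null) {
           // Timed out: record the failure and start the next iteration immediately.
           this.requestStatus.recordResponseNotReceived();

           continue;
           //throw ApplicationFailure.newFailure("Response not received in time", RuntimeException.class.getTypeName(), null);
       }

       this.requestStatus.recordResponseReceived();
       activity.onResponseReceived(response);
       Workflow.sleep(Duration.ofMinutes(m));
   }

   // Start a fresh run with an empty history, carrying the accumulated status forward.
   Workflow.continueAsNew(this.requestStatus);

}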
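The thread does not show RequestStatus, so here is a minimal sketch of what it might look like. Only the two record methods appear in the loop above; the field names, getters, and setters are assumptions. The no-arg constructor and accessors are there because the object has to be serialized when it is passed to continueAsNew or returned from a query method, and I am assuming a JSON-based data converter. In a real project this would likely be a public top-level class.

```java
// Hypothetical sketch of the status object passed between runs.
final class RequestStatus {

    private int received;     // iterations where the response signal arrived in time
    private int notReceived;  // iterations that timed out waiting for the signal

    // No-arg constructor so a JSON-based converter can recreate the object.
    public RequestStatus() {}

    public void recordResponseReceived() {
        received++;
    }

    public void recordResponseNotReceived() {
        notReceived++;
    }

    public int getReceived() {
        return received;
    }

    public void setReceived(int received) {
        this.received = received;
    }

    public int getNotReceived() {
        return notReceived;
    }

    public void setNotReceived(int notReceived) {
        this.notReceived = notReceived;
    }

    @Override
    public String toString() {
        return "RequestStatus{received=" + received + ", notReceived=" + notReceived + "}";
    }
}
```

The first run would be started with a new, empty RequestStatus, and a query method on the workflow could simply return this.requestStatus so a client can see the success and timeout counts without relying on the run's final status.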

Thanks @antonio.perez for the reply.
This solution will work. Passing the result to the next workflow run seems like a great approach.
But I would still like to know whether continueAsNew together with the retry policy can achieve this, since with the exception approach the failures are also visible in the Temporal UI.

On the other hand, having a for loop to keep track of the event count makes the code a little harder to read and mixes implementation details with business logic. I would like to avoid that if possible.

I am going ahead with this solution, but it would be great to get an update on why continueAsNew is not inheriting the caller workflow's retry policy.
On a side note, it would help if the Temporal SDK exposed the event count as a function. That would help optimise this solution further.

Hi @Pushkar_Desai

I have checked this with the team. There is an open issue for continueAsNew that I think is related to the problem you are facing: "Revisit continue-as-new that doesn't carry over some attributes of parent run if not set explicitly" (Issue #1201, temporalio/sdk-java on GitHub).

I agree with you. Right now the SDK does not provide the event count, but it is coming: "Provide WorkflowInfo#getHistoryLength()" (Issue #1300, temporalio/sdk-java on GitHub).


Thanks @antonio.perez,
I am going ahead with the solution shared earlier for now and will keep tabs on the open issues.
Cheers!