Looking for an equivalent of lastCompletionEndTimestamp

I would like to start a workflow only if it was not completed in the last N milliseconds. It looks like I can filter workflows by start time and end time with execution filters, but how can I get the last end timestamp of a single workflow?

WorkflowStub stb = client.newUntypedWorkflowStub("WORKFLOW_ID", Optional.empty(), Optional.of("LongRunningWorkflow"));

if (currentTime - stb.lastCompletionEndTimestamp > N ) {
   stb.start();
}

You can use the DescribeWorkflowExecution API. Its result contains the workflow_execution_info.close_time field.
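
Once the close time has been read from the DescribeWorkflowExecution result, the check from the question reduces to a plain timestamp comparison. The following is a minimal sketch, assuming the close time has already been converted to a java.time.Instant (or is null if the workflow never closed). The class and method names are hypothetical and are not part of the Temporal SDK.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper; not part of the Temporal SDK. */
final class CompletionGate {
  private CompletionGate() {}

  /**
   * Returns true when a new run should be started: either there is no
   * recorded close time, or the last run closed more than minInterval ago.
   */
  static boolean shouldStart(Instant lastCloseTime, Instant now, Duration minInterval) {
    if (lastCloseTime == null) {
      return true; // never completed, so there is nothing to throttle against
    }
    return Duration.between(lastCloseTime, now).compareTo(minInterval) > 0;
  }
}
```

With this shape, the caller would start the workflow only when shouldStart returns true, which mirrors the `currentTime - lastCompletionEndTimestamp > N` condition in the question.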

Another, more generic option is to not complete the workflow once its logic is done and sleep for some time before completing. Then use signal with start to either signal an open workflow or start a new one.

Hey, @maxim, I have a question regarding the solution

Another, more generic option is to not complete the workflow once its logic is done and sleep for some time before completing. Then use signal with start to either signal an open workflow or start a new one.

If we do this, how can we know from outside the workflow when the logic is actually done and the workflow is just sleeping?

Code would look something similar to

public class ChildWorkflowImpl {

  private Boolean done = false;

  // @WorkflowMethod
  public void createChild() {
    childActivity();
    childActivity2();
    childActivity3();
    done = true;
    Workflow.sleep(Duration.ofDays(1));
  }

  // @SignalMethod
  public void awaitDone() {
    Workflow.await(() -> done);
  }
}
public class ParentWorkflowImpl {
  
  @WorkflowMethod
  public void createParent() {
    // ... some setup I am not sure how to do
    Async.procedure(childWorkflow::createChild);
    childWorkflow.awaitDone();
    // parentActivity needs the child to be done for 
    // some resources to be created in an external service
    parentActivity();
  }

}

I am able to keep the child workflows running for minutes (later this will change to days), but I am not able to “connect” to them from new parent workflow runs.


If I do this setup for calling child workflow:

        ChildWorkflow workflow = Workflow.newChildWorkflowStub(ChildWorkflow.class, options);

        WorkflowServiceStubs workflowServiceStubs = WorkflowServiceStubs.newInstance();
        WorkflowClient workflowClient = WorkflowClient.newInstance(workflowServiceStubs);
        BatchRequest batchRequest = workflowClient.newSignalWithStartRequest();
        batchRequest.add(workflow::createChild);
        batchRequest.add(workflow::awaitDone);
        workflowClient.signalWithStart(batchRequest);

I successfully create the child workflows, but they never let the parent workflow know that they are ready.

Also… If I run again, the child workflows fail with START_CHILD_WORKFLOW_EXECUTION_FAILED_CAUSE_WORKFLOW_ALREADY_EXISTS, even though I expected it to attach to the running workflow.

Signals are fully asynchronous. So the signal sender gets confirmation immediately as soon as the signal is delivered. But it doesn’t wait for the signal processing. So your approach of blocking the signal handler is not going to work.

I’m confused as to why you don’t just invoke the child synchronously from the parent?

  @WorkflowMethod
  public void createParent() {
    // this blocks until the child is completed
    childWorkflow.createChild();
    parentActivity();
  }

Another, more generic option is to not complete the workflow once its logic is done and sleep for some time before completing. Then use signal with start to either signal an open workflow or start a new one.

I have the Workflow.sleep call inside the childWorkflow.createChild(). Should I do the sleep in some other place?

Even though the logic may take only minutes, the workflow itself will take over a day to complete, and I would like to know when the logic is done, not the whole workflow.

In this case, the child workflow can send a signal to the parent to notify about the business logic completion.

Could you explain what you are trying to achieve from the business point of view? Why, for example, is the child workflow used at all?

That is an interesting solution, I will test it out!

We need to do this because the real use case is a little more complex. The child workflow is to create tables in BigQuery, then the parent workflow could use any number of the child tables to generate a joined table from them. So, we do not know beforehand which child tables we will use at all and if they will be used by zero or many parent tables.

The specific issue is that the child tables can be used many times by parent tables, so we don’t want them to be recalculated again and again if the data is fresh enough. We want to reuse a calculated table for up to a day; if it is requested after that day, it should be recalculated.

Hope this is clear, let me know if I need to explain something better. And please let me know if this design for using Temporal makes sense to you.

In this case, the child workflow can send a signal to the parent to notify about the business logic completion.

I now think this will not work because we could have the following happen:

  1. ParentWorkflow1 (PW1) starts ChildWorkflow1 (CW1) and ChildWorkflow2 (CW2)
  2. PW2 starts CW3 and finds and attaches to CW1
  3. CW1 finishes business logic and notifies PW1
  4. CW2 finishes business logic and notifies PW1
  5. PW1 realizes all dependencies are ready, so it continues with the execution
  6. CW3 finishes business logic and notifies PW2

PW2 will never know that CW1 finished.

I will try to implement busy-waiting on a query method of the child workflow from a parent activity.

PS: If you think there is a better way to represent the issue in Temporal concepts, please let me know!

Could you explain what your use case is? As I don’t understand what you are trying to achieve, it is hard to give recommendations.

Sure thing. We have a set of queries based on external data to be run and stored in BigQuery; let’s call them base tables. These base tables are then used to calculate what we will call a product table by joining some of the base tables together.

Since the base tables are created based on external data, they can become “stale” if a lot of time has passed between the base table being created and then used to create a product table.

So the idea is that base tables are calculated as they are needed for product tables. That is, if product table 1 needs base tables A and B, we calculate A and B and, when they are done, use them to calculate product table 1. But if right afterwards we need to calculate product table 2, which also needs base tables A and B, we don’t want to recalculate them: only a few minutes have passed, so that would be a waste of time.

The idea then is that a base table once calculated, is considered “fresh” enough for a whole day, and after the day, we want to consider it stale and re calculate it again if it is ever needed.

Cool use case.

How many base and product tables are you targeting?

The simplest solution would be to have a single workflow that keeps statistics about all the tables. But it assumes a bounded number of tables. This workflow would start activities and child workflows to execute refreshes.
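
The per-table state such a single workflow would keep can be sketched in plain Java. This is only an illustration of the bookkeeping, not Temporal code: the class name, method names, and the idea of keying by a table ID string are all assumptions made for the example.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory model of the state a single coordinating workflow could keep. */
final class TableFreshnessRegistry {
  private final Map<String, Instant> lastRefreshed = new HashMap<>();
  private final Duration maxAge;

  TableFreshnessRegistry(Duration maxAge) {
    this.maxAge = maxAge;
  }

  /** Records that a table finished refreshing at the given time. */
  void markRefreshed(String tableId, Instant when) {
    lastRefreshed.put(tableId, when);
  }

  /** A table needs a refresh if it was never built or is older than maxAge. */
  boolean needsRefresh(String tableId, Instant now) {
    Instant last = lastRefreshed.get(tableId);
    return last == null || Duration.between(last, now).compareTo(maxAge) > 0;
  }
}
```

Inside a real workflow, the `now` value would have to come from the workflow's deterministic clock rather than the system clock, and the map grows with the number of tables, which is why the suggestion assumes that number is bounded.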

What we ended up doing was dropping the child/parent idea and treating each of the workflows separately, while still triggering the base workflow from the product workflow.

The problem with child/parent was that whenever a product workflow finished, the base workflow was terminated and not allowed to sleep for one day.

Warning: Java pseudo-code, trying to convey an idea only

We have a base workflow that has code like

class BaseWorkflow {

  private Boolean done = false;
  private List<String> workflowIds = new ArrayList<>();

  // @WorkflowMethod
  void calculateBaseTable(BaseTableConfig config) {
    activity.calculateBaseTable(config);
    done = true;
    for (String workflowId : workflowIds) {
      ProductWorkflow productWorkflow = generateProductWorkflow(workflowId);
      productWorkflow.notifyBaseDone(Workflow.getInfo().getWorkflowId());
    }
    Workflow.sleep(Duration.ofDays(1));
    activity.deleteStaleTable(config);
  }

  // @SignalMethod
  void notifyWhenDone(String workflowId) {
    if (done) {
      ProductWorkflow productWorkflow = generateProductWorkflow(workflowId);
      productWorkflow.notifyBaseDone(Workflow.getInfo().getWorkflowId());
    } else {
      workflowIds.add(workflowId);
    }
  }
}

and a product workflow that does this:

class ProductWorkflow {

  private List<String> waitingForWorkflowIds = new ArrayList<>();

  // @WorkflowMethod
  void calculateProductTable(List<BaseTableConfig> baseTables) {
    for (BaseTableConfig baseTable : baseTables) {
      String baseWorkflowId = baseTable.getId();
      // ... code to call base workflow, calculateBaseTable and notifyWhenDone
      client.signalWithStart(...);
      waitingForWorkflowIds.add(baseWorkflowId);
    }
   
    Workflow.await(() -> waitingForWorkflowIds.isEmpty());
    // continue using it
  }

  // @SignalMethod
  void notifyBaseDone(String workflowId) {
    waitingForWorkflowIds.remove(workflowId);
  }

}

So we keep the n-to-n relationship by having a list on each workflow.
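
The bookkeeping above can be checked in isolation with a small plain-Java model. This is a sketch under simplifying assumptions: signals are replaced by direct method calls, there is no Temporal dependency, and the class names BaseModel and ProductModel are hypothetical stand-ins for the two workflows.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for the product workflow's waiting list. */
final class ProductModel {
  final String id;
  private final Set<String> waitingFor = new HashSet<>();

  ProductModel(String id) { this.id = id; }

  void waitFor(String baseId) { waitingFor.add(baseId); }

  /** Mirrors the notifyBaseDone signal handler. */
  void notifyBaseDone(String baseId) { waitingFor.remove(baseId); }

  boolean ready() { return waitingFor.isEmpty(); }
}

/** Hypothetical stand-in for the base workflow's subscriber list. */
final class BaseModel {
  final String id;
  private boolean done = false;
  private final List<ProductModel> subscribers = new ArrayList<>();

  BaseModel(String id) { this.id = id; }

  /** Mirrors notifyWhenDone: answer immediately if done, otherwise remember the caller. */
  void notifyWhenDone(ProductModel product) {
    if (done) {
      product.notifyBaseDone(id);
    } else {
      subscribers.add(product);
    }
  }

  /** Mirrors the point right after the activity finishes: notify all pending subscribers. */
  void finish() {
    done = true;
    for (ProductModel product : subscribers) {
      product.notifyBaseDone(id);
    }
    subscribers.clear();
  }
}
```

Replaying the PW1/PW2 scenario from earlier in the thread against this model, a product that subscribes to an already running base still gets notified when that base finishes, and a product that subscribes after the base is done is notified immediately.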

Any comment is welcome, and if something is not clear in the code, let me know.

It is possible to start a child in the abandoned mode. But in your case of an n-to-n relationship it doesn’t make sense to use child workflows. So your current approach is fine.
