I would like to start a workflow only if it was not completed in the last N milliseconds. It looks like I can filter workflows by start time and end time with execution filters, but how can I get the last end timestamp of a single workflow?
WorkflowStub stb = client.newUntypedWorkflowStub("WORKFLOW_ID", Optional.empty(), Optional.of("LongRunningWorkflow"));
if (currentTime - stb.lastCompletionEndTimestamp > N ) {
stb.start();
}
You can use DescribeWorkflowExecution API. Its result contains workflow_execution_info.close_time field.
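A minimal sketch of the check itself, assuming you have already read `workflow_execution_info.close_time` from the DescribeWorkflowExecution result and converted it to an `Instant`. The `StartGuard` class and `shouldStart` method are hypothetical names used for illustration and are not part of the Temporal SDK. An empty `Optional` is assumed to mean that no closed run is on record.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Optional;

// Hypothetical helper for illustration only; not part of the Temporal SDK.
final class StartGuard {
    private StartGuard() {}

    /**
     * Decides whether a new run should be started.
     *
     * @param lastCloseTime close time of the previous run (assumed to come from
     *                      workflow_execution_info.close_time), empty if none is on record
     * @param now           the current time
     * @param minInterval   the "N milliseconds" from the question, as a Duration
     */
    static boolean shouldStart(Optional<Instant> lastCloseTime, Instant now, Duration minInterval) {
        // No closed run on record: nothing to compare against, so allow the start.
        if (!lastCloseTime.isPresent()) {
            return true;
        }
        // Start only if strictly more than minInterval has elapsed since the last close.
        Duration sinceClose = Duration.between(lastCloseTime.get(), now);
        return sinceClose.compareTo(minInterval) > 0;
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        // A run that closed one second ago, checked against a five second interval.
        System.out.println(shouldStart(Optional.of(now.minusSeconds(1)), now, Duration.ofSeconds(5)));
    }
}
```

Note that this check and the subsequent start are two separate steps, so two callers can both pass the check at the same time. The signal-with-start approach described next avoids that race.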
Another, more generic option is to not complete the workflow once its logic is done and sleep for some time before completing. Then use signal with start to either signal an open workflow or start a new one.
Hey, @maxim, I have a question regarding the solution
Another, more generic option is to not complete the workflow once its logic is done and sleep for some time before completing. Then use signal with start to either signal an open workflow or start a new one.
If we do this, how can we know outside of the workflow when the logic is actually done and the workflow is just sleeping.
Code would look something similar to
public class ChildWorkflowImpl {
private Boolean done = false;
// @WorkflowMethod
public void createChild() {
childActivity();
childActivity2();
childActivity3();
done = true;
Workflow.sleep(Duration.ofDays(1));
}
// @SignalMethod
public void awaitDone() {
Workflow.await(() -> done);
}
}
public class ParentWorkflowImpl {
@WorkflowMethod
public void createParent() {
// ... some setup I am not sure how to do
Async.procedure(childWorkflow::createChild);
childWorkflow.awaitDone();
// parentActivity needs the child to be done for
// some resources to be created in an external service
parentActivity();
}
}
I am able to keep the child workflows running for minutes (later will change to days), but I am not able to “connect” to them from new parent workflow runs.
Also… If I run again, the child workflows fail with START_CHILD_WORKFLOW_EXECUTION_FAILED_CAUSE_WORKFLOW_ALREADY_EXISTS, even though I expected it to attach to the running workflow.
Signals are fully asynchronous. So the signal sender gets confirmation immediately as soon as the signal is delivered. But it doesn’t wait for the signal processing. So your approach of blocking the signal handler is not going to work.
I’m confused why you don’t just invoke the child synchronously from the parent?
@WorkflowMethod
public void createParent() {
// this blocks until the child is completed
childWorkflow.createChild();
parentActivity();
}
Another, more generic option is to not complete the workflow once its logic is done and sleep for some time before completing. Then use signal with start to either signal an open workflow or start a new one.
I have the Workflow.sleep call inside the childWorkflow.createChild(). Should I do the sleep in some other place?
Even though the logic may take only minutes, the workflow itself will take over a day to complete, and you would like to know when the logic is done, not the whole workflow.
That is an interesting solution, I will test it out!
We need to do this because the real use case is a little more complex. The child workflow is to create tables in BigQuery, then the parent workflow could use any number of the child tables to generate a joined table from them. So, we do not know beforehand which child tables we will use at all and if they will be used by zero or many parent tables.
Specifically, since the child tables can be used many times by parent tables, we don’t want them to be recalculated again and again if the data is fresh enough. We want to reuse the calculated table for up to a day, and if it is requested after that day, it should be recalculated.
Hope this is clear, let me know if I need to explain something better. And please let me know if this design for using Temporal makes sense to you.
Sure thing. We have a set of queries based on external data that are run and stored in BigQuery; let’s call them base tables. These base tables are then used to calculate what we will call a product table by joining some of the base tables together.
Since the base tables are created based on external data, they can become “stale” if a lot of time has passed between the base table being created and then used to create a product table.
So the idea is that the base tables are calculated as they are needed for product tables. That is, if product table 1 needs base tables A and B, we calculate base tables A and B and, when they are done, use them to calculate product table 1. But if right afterwards we need to calculate product table 2, which needs base tables A and B again, we don’t want to recalculate them, since that would be a waste of time: only a few minutes have passed.
The idea then is that a base table, once calculated, is considered “fresh” enough for a whole day. After that day, we want to consider it stale and recalculate it if it is ever needed.
How many base and product tables are you targeting?
The simplest solution would be to have a single workflow that keeps statistics about all the tables. But it assumes a bounded number of tables. This workflow would start activities and child workflows to execute refreshes.
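To make the bookkeeping concrete, here is a rough sketch of the per-table state such a workflow might hold. It is plain Java rather than workflow code, and `TableFreshness`, `markRefreshed`, and `staleTables` are hypothetical names used only for illustration. Inside a real workflow, the current time would have to come from the workflow's deterministic clock (for example `Workflow.currentTimeMillis()`) instead of the system clock.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical bookkeeping class for illustration only; not part of the Temporal SDK.
final class TableFreshness {
    private final Duration maxAge;
    // Last successful refresh time per base table; assumes a bounded number of tables.
    private final Map<String, Instant> lastRefreshed = new HashMap<>();

    TableFreshness(Duration maxAge) {
        this.maxAge = maxAge;
    }

    // Record that a base table was (re)calculated at the given time.
    void markRefreshed(String table, Instant at) {
        lastRefreshed.put(table, at);
    }

    // A table is stale if it was never calculated or is older than maxAge.
    boolean isStale(String table, Instant now) {
        Instant last = lastRefreshed.get(table);
        return last == null || Duration.between(last, now).compareTo(maxAge) > 0;
    }

    // Of the base tables a product table needs, return those that must be refreshed first.
    List<String> staleTables(List<String> needed, Instant now) {
        List<String> stale = new ArrayList<>();
        for (String table : needed) {
            if (isStale(table, now)) {
                stale.add(table);
            }
        }
        return stale;
    }
}
```

With a one-day `maxAge`, a product table that needs A and B right after both were calculated gets an empty list back and can proceed immediately; the same request a day and a bit later gets both names back and would trigger the refreshes.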
What we ended up doing is that we removed the child/parent idea and treated each of the workflows separately, but triggered the base workflow from the product workflow.
The problem with child/parent was that whenever a product workflow finished, the base workflow was terminated and not allowed to sleep for 1 day.
Warning: java pseudo-code, trying to convey an idea only
It is possible to start a child in the abandoned mode. But in your case of an n-to-n relationship it doesn’t make sense to use child workflows. So your current approach is fine.