Is there a way to support callbacks when the async workflow is complete in Java?

Hi,

After the async workflow execution is complete, I want to post metrics to another service. The metrics are a combination of the workflow info (start time, close time, execution duration) and the workflow’s internal query state.

I’m trying to figure out what the best way to do this is. Is there a mechanism that supports it?

Option 1
One option that I can think of is to create an activity that gets executed at the end of the workflow. This activity can report the workflow’s internal query state. However, with this approach, it’s not clear to me how I would get the workflow execution start time, close time, and execution duration other than by executing the following call in the activity:

      DescribeWorkflowExecutionRequest request =
          DescribeWorkflowExecutionRequest.newBuilder()
              .setNamespace("default")
              .setExecution(WorkflowExecution.newBuilder().setWorkflowId(workflowId.toString()))
              .build();
      DescribeWorkflowExecutionResponse response = stub.describeWorkflowExecution(request);

Option 2
The other option is to call Workflow.getInfo().getRunStartedTimestampMillis(), which I think is the start time. I could then take the current time as the close time and subtract the two to get the execution duration.

However, with this approach I feel like the times could be a bit inaccurate since

  • It’s using the worker machine’s time rather than the Temporal server’s time.
  • There is a comment on getRunStartedTimestampMillis() which states that it can differ from the time execution actually started. Why would that be the case?
  /**
   * The time workflow run has started. Note that this time can be different from the time workflow
   * function started actual execution.
   */

Option 3
Is there a way to trigger a callback once the async workflow completes? In the worker service, I could then just trigger a call to get the data and send it to the external service. I’m guessing this is probably not recommended since you want to do everything within the context of a workflow.

I also realize that setting up Elasticsearch could probably solve this problem as well. Let’s say that I need a workaround until that is configured and set up properly.

Thanks,
Derek

I would go with an activity executed at the end of the workflow. Then I would use Workflow.currentTimeMillis() - Workflow.getInfo().getRunStartedTimestampMillis() to measure the workflow execution latency. Both APIs return workflow time, which comes from the service, so worker clock skew is not going to affect the result.
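
As a rough illustration of that suggestion, here is a minimal plain-Java sketch that runs without the SDK. The names WorkflowMetrics, MetricsActivity, and MetricsReportingSketch are hypothetical and not part of Temporal. In real workflow code the two timestamps would come from Workflow.getInfo().getRunStartedTimestampMillis() and Workflow.currentTimeMillis(), and the reporting call would go through an activity stub.

```java
/** Hypothetical payload handed to the final activity. */
final class WorkflowMetrics {
  final long startedMillis;
  final long closedMillis;
  final long durationMillis;
  final String queryState;

  WorkflowMetrics(long startedMillis, long closedMillis, String queryState) {
    this.startedMillis = startedMillis;
    this.closedMillis = closedMillis;
    // Both inputs are workflow time, so the difference is unaffected by worker clock skew.
    this.durationMillis = closedMillis - startedMillis;
    this.queryState = queryState;
  }
}

/** Hypothetical stand-in for an activity interface that posts metrics to the external service. */
interface MetricsActivity {
  void report(WorkflowMetrics metrics);
}

final class MetricsReportingSketch {
  /**
   * Assumption: inside a workflow, runStartedMillis would be
   * Workflow.getInfo().getRunStartedTimestampMillis() and nowMillis would be
   * Workflow.currentTimeMillis(), read just before the final activity is invoked.
   */
  static WorkflowMetrics reportAtEnd(
      long runStartedMillis, long nowMillis, String queryState, MetricsActivity activity) {
    WorkflowMetrics metrics = new WorkflowMetrics(runStartedMillis, nowMillis, queryState);
    activity.report(metrics); // the last step of the workflow
    return metrics;
  }
}
```

Note that the close time here is the workflow time just before the final activity runs, so the reported duration does not include the time spent in the reporting activity itself.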

  • There is a comment on getRunStartedTimestampMillis() which states that it can differ from the time execution actually started. Why would that be the case?

If all workflow workers are down, the actual workflow execution is not going to happen until they are back. That’s why Workflow.getInfo().getRunStartedTimestampMillis() can return a different time than Workflow.currentTimeMillis() at the beginning of the workflow method.

Thanks!

I also want to report the status of the workflow up to the point where I publish the metrics. I’m not sure if this is accurate, but I’m guessing that if it gets to this point then the workflow is pretty much complete, so the status here is completed.

However, I’m trying to figure out how I can tell whether any of the other workflow execution statuses were triggered:

    WORKFLOW_EXECUTION_STATUS_FAILED(3),
    WORKFLOW_EXECUTION_STATUS_CANCELED(4),
    WORKFLOW_EXECUTION_STATUS_TERMINATED(5),
    WORKFLOW_EXECUTION_STATUS_TIMED_OUT(7),

What I did was wrap the workflow code in a try/catch that catches any exception, with a finally block that executes the last activity. I was hoping that there would be some indication of the other statuses. For example, if a WorkflowException or a TemporalException is thrown, then that would imply WORKFLOW_EXECUTION_STATUS_FAILED. However, it’s not clear if something similar would be true for WORKFLOW_EXECUTION_STATUS_CANCELED, WORKFLOW_EXECUTION_STATUS_TERMINATED, and WORKFLOW_EXECUTION_STATUS_TIMED_OUT.

I’m guessing this may not work, and wondered if you could give me any insights?
Thanks,
Derek

You are correct. At this point, the application code is not notified in case of termination or timeout. By definition, both of them kill the workflow without giving it any chance to perform cleanup. In a normally operating system, neither of them is really used. The termination and timeout metrics are reported by the service, so you could use those instead.
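
To make the distinction concrete, here is a minimal plain-Java sketch of the try/catch/finally pattern described in the question. Everything in it is a stand-in: ObservedStatus, CancellationStandIn, and StatusReportingSketch are hypothetical names, and CancellationStandIn only plays the role that the SDK’s CanceledFailure would play in real workflow code. It shows which outcomes the workflow code itself can see; termination and timeout never reach it, so the finally block does not run for them.

```java
import java.util.function.Consumer;

/** Outcomes that workflow code can observe itself. TERMINATED and TIMED_OUT never reach it. */
enum ObservedStatus { COMPLETED, FAILED, CANCELED }

/** Hypothetical stand-in for the SDK's cancellation exception (CanceledFailure in the Java SDK). */
final class CancellationStandIn extends RuntimeException {}

final class StatusReportingSketch {
  /**
   * Runs the workflow body, derives a status from how it ended, and always
   * invokes the reporting step. Exceptions are rethrown so the run still
   * closes with its real outcome.
   */
  static void runAndReport(Runnable workflowBody, Consumer<ObservedStatus> reportActivity) {
    ObservedStatus status = ObservedStatus.COMPLETED;
    try {
      workflowBody.run();
    } catch (CancellationStandIn e) {
      status = ObservedStatus.CANCELED;
      throw e;
    } catch (RuntimeException e) {
      status = ObservedStatus.FAILED;
      throw e;
    } finally {
      // Assumption: in a real workflow this would be the final activity call.
      // After a cancellation it would need to run inside a detached scope
      // (Workflow.newDetachedCancellationScope) so it is not canceled as well.
      reportActivity.accept(status);
    }
  }
}
```

For the terminated and timed-out cases, the only sources of truth are outside the workflow, such as the service-side metrics mentioned above or a DescribeWorkflowExecution call made by an external process.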