So I have a workflow with an activity that, in some situations, fails on every attempt no matter how often it is retried. I want to give that activity a grace period to succeed; if it still hasn't succeeded by the end of that grace period, I want the workflow to continue anyway. This is a short activity, so I'm using ScheduleToCloseTimeout as the grace period. StartToCloseTimeout is set short because the activity will either complete (pass or fail) within a minute, or it got hung up and needs to be retried.
In the tests for my workflow I'm trying to create a test case that emulates retrying until the ScheduleToClose time runs out. I'm having a really hard time figuring out what kind of error to return from the mocked activity. I can't figure out how to use time skipping to get the error raised, so instead I'm guessing at how to construct an error that my workflow code will then catch. I keep getting it wrong: in real life I'm not catching the timeout correctly, and my workflow errors out.
Example workflow code:
```go
readyForRebootCtx := workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
	StartToCloseTimeout:    60 * time.Second,
	ScheduleToCloseTimeout: 15 * time.Minute,
})
if err := workflow.ExecuteActivity(readyForRebootCtx, a.LifecycleReadyForRebootSsh, destroyInstanceReq.Hostname).Get(ctx, &lifecycleReadyForRebootResp); err != nil {
	// A timeout error here should only be ScheduleToClose, since StartToClose timeouts are retried.
	// Any other error is logged and returned.
	if !temporal.IsTimeoutError(err) {
		// The logger takes key/value pairs after the message, so the error needs a key.
		logger.Error("failed to run lifecycle ready for reboot", "Error", err)
		return err
	}
	logger.Info("lifecycle ready for reboot reached ReadyToDestroyTimeout... moving on to instance destruction", "hostname", destroyInstanceReq.Hostname)
}
```
Test code:
```go
testSuite := &testsuite.WorkflowTestSuite{}
env := testSuite.NewTestWorkflowEnvironment()
var a *activity.Activities
env.OnActivity(a.LifecycleReadyForRebootSsh, mock.Anything, mock.Anything).Return(
	&static.LifecycleResp{
		StdOut:          "",
		StdErr:          "NOTOKAY",
		Rc:              1,
		Hostname:        hostname,
		LifecycleScript: static.LifecycleReadyForReboot,
	},
	// 3 == enumspb.TIMEOUT_TYPE_SCHEDULE_TO_CLOSE
	temporal.NewTimeoutError(3, fmt.Errorf("lifecycle ready for reboot failed")))
env.ExecuteWorkflow(DestroyInstanceWorkflow, workflowoptions.DestroyInstanceWorkflowOptions{Hostname: hostname, ScaleSetID: 1, ExplicitDestroy: false})
s.True(env.IsWorkflowCompleted())
s.NoError(env.GetWorkflowError())
```
The tests pass, but in real life what I’m seeing is ActivityTaskFailed.
So, is there a way to use TestWorkflowEnvironment to simulate the timeout so that it returns the error to the workflow like it would in real life, and my workflow code can catch the right thing?