Hello!
I have a workflow implementation in which we run long-running workflows that perform health checks. We heartbeat every 5 seconds, and the heartbeat timeout is 30 seconds. I have observed that many of my workflows are closed (failed) with ActivityTimeoutException (TimeoutType=HEARTBEAT) when a worker goes down (due to a new deployment).
"level":"ERROR","logger_name":"com.uber.cadence.internal.sync.POJOWorkflowImplementationFactory","thread_name":"workflow-root","message":"Workflow execution failure WorkflowID=ea27167e-ca2f-4177-9ca5-22aba7c233d1, RunID=8e495171-bddf-43d5-85f3-58d3d8be45a9, WorkflowType=HealthCheckWorkflow::monitorHealthStatus","stack_trace":"com.uber.cadence.workflow.ActivityTimeoutException: TimeoutType=HEARTBEAT, ActivityType="HealthCheckActivityV2::startPeriodicHealthCheck", ActivityID="null", EventID=8\n\tat java.base/java.lang.Thread.getStackTrace(Thread.java:1606)\n\tat com.uber.cadence.internal.sync.ActivityStubBase.execute(ActivityStubBase.java:46)\n\tat com.uber.cadence.internal.sync.ActivityStubImpl.execute(ActivityStubImpl.java:26)\n\tat com.uber.cadence.internal.sync.ActivityInvocationHandler.lambda$getActivityFunc$0(ActivityInvocationHandler.java:51)\n\tat com.uber.cadence.internal.sync.ActivityInvocationHandlerBase.invoke(ActivityInvocationHandlerBase.java:76)\n\tat com.sun.proxy.$Proxy23.startPeriodicHealthCheck(Unknown Source)\n\tat com.cloudera.cdp.healthCheck.workflow.HealthCheckWorkflowImpl.monitorHealthStatus(HealthCheckWorkflowImpl.java:101)\n\tat java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.base/java.lang.reflect.Method.invoke(Method.java:566)\n\tat com.uber.cadence.internal.sync.POJOWorkflowImplementationFactory$POJOWorkflowImplementation.execute(POJOWorkflowImplementationFactory.java:233)\n\tat com.uber.cadence.internal.sync.WorkflowRunnable.run(WorkflowRunnable.java:46)\n\tat com.uber.cadence.internal.sync.CancellationScopeImpl.run(CancellationScopeImpl.java:102)\n\tat com.uber.cadence.internal.sync.WorkflowThreadImpl$RunnableWrapper.run(WorkflowThreadImpl.java:85)\n\tat java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base/java.lang.Thread.run(Thread.java:834)\n"
I already have retries configured for my activity:
@ActivityMethod(scheduleToCloseTimeoutSeconds = 631139040, heartbeatTimeoutSeconds = 30)
@MethodRetry(initialIntervalSeconds = 1, backoffCoefficient = 1, maximumAttempts = Integer.MAX_VALUE, maximumIntervalSeconds = 15)
void startPeriodicHealthCheck(Request monitorHealthRequest);
Please let me know how to prevent workflows from being closed with this failure.