Temporal Workflow DestroyWorkflowThreadError

Hi, we are seeing DestroyWorkflowThreadError at the workflow level even when all activities in the workflow succeed. The error occurs regularly, and we would like to understand what causes it.

Here is the stack trace:
Failed to process DailyReachAggregationWorkflow, request: [campaign_id 7feb478f-2612-4b9d-8e67-6a69b50ba4d7, hours [2020-10-29T04:00:00+08:00], zone ASIA_SHANGHAI], exception: io.temporal.internal.sync.DestroyWorkflowThreadError
    at io.temporal.internal.sync.WorkflowThreadContext.lambda$destroy$597e554$1(WorkflowThreadContext.java:260)
    at io.temporal.internal.sync.WorkflowThreadContext.mayBeEvaluate(WorkflowThreadContext.java:103)
    at io.temporal.internal.sync.WorkflowThreadContext.yield(WorkflowThreadContext.java:80)
    at io.temporal.internal.sync.WorkflowThreadImpl.yield(WorkflowThreadImpl.java:402)
    at io.temporal.internal.sync.WorkflowThread.await(WorkflowThread.java:45)
    at io.temporal.internal.sync.CompletablePromiseImpl.getImpl(CompletablePromiseImpl.java:84)
    at io.temporal.internal.sync.CompletablePromiseImpl.get(CompletablePromiseImpl.java:74)
    at io.temporal.internal.sync.ActivityStubBase.execute(ActivityStubBase.java:44)
    at io.temporal.internal.sync.ActivityInvocationHandler.lambda$getActivityFunc$0(ActivityInvocationHandler.java:59)
    at io.temporal.internal.sync.ActivityInvocationHandlerBase.invoke(ActivityInvocationHandlerBase.java:65)
    at com.sun.proxy.$Proxy42.aggregateDaily(Unknown Source)
    at snapchat.reach.aggregator.temporal.DailyReachAggregationWorkflowImpl.run(DailyReachAggregationWorkflowImpl.java:49)
    at snapchat.reach.aggregator.temporal.DailyReachAggregationWorkflowImpl.run(DailyReachAggregationWorkflowImpl.java:17)
    at jdk.internal.reflect.GeneratedMethodAccessor52.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at io.temporal.internal.sync.POJOWorkflowImplementationFactory$POJOWorkflowImplementation$RootWorkflowInboundCallsInterceptor.execute(POJOWorkflowImplementationFactory.java:289)
    at io.temporal.internal.sync.POJOWorkflowImplementationFactory$POJOWorkflowImplementation.execute(POJOWorkflowImplementationFactory.java:253)
    at io.temporal.internal.sync.WorkflowExecuteRunnable.run(WorkflowExecuteRunnable.java:52)
    at io.temporal.internal.sync.SyncWorkflow.lambda$start$0(SyncWorkflow.java:119)
    at io.temporal.internal.sync.CancellationScopeImpl.run(CancellationScopeImpl.java:104)
    at io.temporal.internal.sync.WorkflowThreadImpl$RunnableWrapper.run(WorkflowThreadImpl.java:107)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)

For the workflow worker, we use the following settings; everything else is left at its default.
// Worker
static final int WORKER_COUNT = 1;
static final int ACTIVITY_POLL_THREAD_COUNT = 32;
static final int WORKFLOW_POLL_THREAD_COUNT = 32;

// Worker Factory
static final int WORKFLOW_HOST_LOCAL_DECISION_TIMEOUT_SECONDS = 120;
static final int WORKFLOW_HOST_LOCAL_POLL_THREAD_COUNT = 64;
static final boolean ENABLE_LOGGING_IN_REPLAY = true;

// Workflow Client
static final int CLIENT_RPC_TIMEOUT_MS = 15000;
static final int CLIENT_QUERY_RPC_TIMEOUT_MS = 15000;

We have 8 pods deployed, each with 1 worker and 32 poll threads. The load is around 20k requests per 15 minutes, most of which arrive in the first minute of each 15-minute window.

DestroyWorkflowThreadError is used to release the threads that a workflow holds back to the pool when the workflow is removed from a worker's cache.

Never catch Throwable or Error in the workflow code to ensure the correct execution of the SDK logic.
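To illustrate why the catch type matters, here is a minimal sketch in plain Java with no Temporal dependency. ThreadTeardownError is a hypothetical stand-in for an SDK-internal Error such as DestroyWorkflowThreadError, and the outer catch plays the role of the SDK code that expects to receive it; neither is the SDK's actual implementation.

```java
public class CatchScopeDemo {
    // Hypothetical stand-in for an SDK-internal Error; not a Temporal class.
    public static class ThreadTeardownError extends Error {}

    // Simulates a blocking call that the runtime unwinds by throwing an Error.
    static void blockingCall() {
        throw new ThreadTeardownError();
    }

    // Workflow-style code that catches Exception: an Error is not an
    // Exception, so it passes through to the outer handler.
    public static String catchException() {
        try {
            try {
                blockingCall();
            } catch (Exception e) {
                return "swallowed";
            }
        } catch (ThreadTeardownError e) { // stands in for the runtime
            return "propagated";
        }
        return "completed";
    }

    // Workflow-style code that catches Throwable: the Error is intercepted
    // and never reaches the outer handler.
    public static String catchThrowable() {
        try {
            try {
                blockingCall();
            } catch (Throwable t) {
                return "swallowed";
            }
        } catch (ThreadTeardownError e) { // stands in for the runtime
            return "propagated";
        }
        return "completed";
    }

    public static void main(String[] args) {
        System.out.println("catch (Exception): " + catchException()); // propagated
        System.out.println("catch (Throwable): " + catchThrowable()); // swallowed
    }
}
```

In the second case the teardown signal is lost inside the workflow code, which is the situation the advice above is meant to prevent.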


@maxim are you saying we should not do something like this in a workflow?

try {
    state = run;
    activity.run();
} catch (Exception e) {
    state = fail;
    throw e;
}

Then what's the recommended way to handle activity exceptions? Or does this only apply to Error/Throwable?

In this example, the workflow catches Exception.

No, it is OK to do this as you are catching an Exception, not Throwable or Error.
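As a rough, self-contained sketch of that pattern: the State enum, the Activity interface, and the DONE transition are hypothetical additions for illustration, and IllegalStateException stands in for an activity failure. Nothing here is a Temporal SDK type.

```java
public class StateTrackingDemo {
    public enum State { CREATED, RUNNING, FAILED, DONE }

    // Hypothetical stand-in for an activity stub.
    public interface Activity {
        void run();
    }

    private State state = State.CREATED;

    public State getState() {
        return state;
    }

    // Records the failure and rethrows. Only Exception is caught, so any
    // Error thrown inside activity.run() passes through untouched.
    public void run(Activity activity) {
        try {
            state = State.RUNNING;
            activity.run();
            state = State.DONE;
        } catch (Exception e) {
            state = State.FAILED;
            throw e; // precise rethrow: the try block throws no checked exceptions
        }
    }

    public static void main(String[] args) {
        StateTrackingDemo ok = new StateTrackingDemo();
        ok.run(() -> {});
        System.out.println(ok.getState()); // DONE

        StateTrackingDemo failing = new StateTrackingDemo();
        try {
            failing.run(() -> { throw new IllegalStateException("activity failed"); });
        } catch (IllegalStateException e) {
            System.out.println(failing.getState()); // FAILED
        }
    }
}
```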

Thanks a lot @maxim! This helps!