Hi, we are seeing DestroyWorkflowThreadError at the workflow level even when all activities under the workflow are successful. This error happens regularly, and we want to know the reason for it.
Here is the stack trace:
Failed to process DailyReachAggregationWorkflow, request: [campaign_id 7feb478f-2612-4b9d-8e67-6a69b50ba4d7, hours [2020-10-29T04:00:00+08:00], zone ASIA_SHANGHAI], exception: io.temporal.internal.sync.DestroyWorkflowThreadError
    at io.temporal.internal.sync.WorkflowThreadContext.lambda$destroy$597e554$1(WorkflowThreadContext.java:260)
    at io.temporal.internal.sync.WorkflowThreadContext.mayBeEvaluate(WorkflowThreadContext.java:103)
    at io.temporal.internal.sync.WorkflowThreadContext.yield(WorkflowThreadContext.java:80)
    at io.temporal.internal.sync.WorkflowThreadImpl.yield(WorkflowThreadImpl.java:402)
    at io.temporal.internal.sync.WorkflowThread.await(WorkflowThread.java:45)
    at io.temporal.internal.sync.CompletablePromiseImpl.getImpl(CompletablePromiseImpl.java:84)
    at io.temporal.internal.sync.CompletablePromiseImpl.get(CompletablePromiseImpl.java:74)
    at io.temporal.internal.sync.ActivityStubBase.execute(ActivityStubBase.java:44)
    at io.temporal.internal.sync.ActivityInvocationHandler.lambda$getActivityFunc$0(ActivityInvocationHandler.java:59)
    at io.temporal.internal.sync.ActivityInvocationHandlerBase.invoke(ActivityInvocationHandlerBase.java:65)
    at com.sun.proxy.$Proxy42.aggregateDaily(Unknown Source)
    at snapchat.reach.aggregator.temporal.DailyReachAggregationWorkflowImpl.run(DailyReachAggregationWorkflowImpl.java:49)
    at snapchat.reach.aggregator.temporal.DailyReachAggregationWorkflowImpl.run(DailyReachAggregationWorkflowImpl.java:17)
    at jdk.internal.reflect.GeneratedMethodAccessor52.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at io.temporal.internal.sync.POJOWorkflowImplementationFactory$POJOWorkflowImplementation$RootWorkflowInboundCallsInterceptor.execute(POJOWorkflowImplementationFactory.java:289)
    at io.temporal.internal.sync.POJOWorkflowImplementationFactory$POJOWorkflowImplementation.execute(POJOWorkflowImplementationFactory.java:253)
    at io.temporal.internal.sync.WorkflowExecuteRunnable.run(WorkflowExecuteRunnable.java:52)
    at io.temporal.internal.sync.SyncWorkflow.lambda$start$0(SyncWorkflow.java:119)
    at io.temporal.internal.sync.CancellationScopeImpl.run(CancellationScopeImpl.java:104)
    at io.temporal.internal.sync.WorkflowThreadImpl$RunnableWrapper.run(WorkflowThreadImpl.java:107)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
For the workflow worker, we use the following settings; all other settings are left at their defaults.
// Worker
static final int WORKER_COUNT = 1;
static final int ACTIVITY_POLL_THREAD_COUNT = 32;
static final int WORKFLOW_POLL_THREAD_COUNT = 32;
// Worker Factory
static final int WORKFLOW_HOST_LOCAL_DECISION_TIMEOUT_SECONDS = 120;
static final int WORKFLOW_HOST_LOCAL_POLL_THREAD_COUNT = 64;
static final boolean ENABLE_LOGGING_IN_REPLAY = true;
// Workflow Client
static final int CLIENT_RPC_TIMEOUT_MS = 15000;
static final int CLIENT_QUERY_RPC_TIMEOUT_MS = 15000;
We have 8 pods deployed, each with 1 worker and 32 poll threads. The load is around 20k requests per 15 minutes, with most of them arriving in the first minute of each 15-minute window.
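
To put that load in per-second terms, here is a rough back-of-the-envelope sketch. It rests on two simplifying assumptions that are not measurements: that all 20k requests of a window land in the first 60 seconds (an upper bound, since in reality it is only "most" of them), and that requests are spread evenly across the 8 pods. The class and method names are made up for illustration.

```java
public class BurstLoadEstimate {
    // Figures taken from the description above.
    static final int REQUESTS_PER_WINDOW = 20_000;
    static final int WINDOW_SECONDS = 15 * 60;
    static final int POD_COUNT = 8;

    // Assumption: the whole window's traffic arrives in the first minute.
    static final int BURST_SECONDS = 60;

    /** Average request rate over a period, in requests per second. */
    public static double ratePerSecond(int requests, int seconds) {
        return (double) requests / seconds;
    }

    /** Per-pod rate, assuming requests are spread evenly across pods. */
    public static double perPod(double totalRate, int pods) {
        return totalRate / pods;
    }

    public static void main(String[] args) {
        double average = ratePerSecond(REQUESTS_PER_WINDOW, WINDOW_SECONDS);
        double burst = ratePerSecond(REQUESTS_PER_WINDOW, BURST_SECONDS);

        System.out.printf("average over window: %.1f req/s%n", average);
        System.out.printf("burst (first minute): %.1f req/s%n", burst);
        System.out.printf("burst per pod: %.1f req/s%n", perPod(burst, POD_COUNT));
    }
}
```

Under these assumptions the average over the whole window is about 22 requests per second, but the burst is about 333 requests per second in total, or roughly 42 per second per pod.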