Timeouts on local host specific task queue

I am seeing timeouts happening regularly on tasks that are scheduled on the local host task queue (not the shared task queue).

This happens every time a task is scheduled to the local task queue. After the timeout, the task is scheduled onto the shared task queue and completes successfully. Any idea why this is happening? Is there anything I need to configure so that the workers pick up tasks from this local task queue?

The timeout type is StartToClose, which means it happened after the workflow task was picked up by a worker. It looks like the worker fails to process the task. Do you see any errors in the worker logs?
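
For reference, this is the per-workflow-task timeout that is set (or defaulted) on the client side when the workflow is started. A minimal sketch, assuming the Java SDK; the queue name and duration below are placeholders, not values from your setup:

import java.time.Duration;
import io.temporal.client.WorkflowOptions;

// The workflow task timeout is the StartToClose limit for a single workflow task.
// "SAMPLE_TASK_QUEUE" and the 10-second value are illustrative only.
WorkflowOptions options = WorkflowOptions.newBuilder()
        .setTaskQueue("SAMPLE_TASK_QUEUE")
        .setWorkflowTaskTimeout(Duration.ofSeconds(10))
        .build();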

I do see the following errors in the logs, any idea what could be causing them? I am running temporal-sdk 1.0.0.

[2020-10-26T19:04:22,152] ERROR [Host Local Workflow Poller: 5] io.temporal.internal.worker.Poller: Failure in thread Host Local Workflow Poller: 5 []
    java.lang.NullPointerException: null
    	at io.temporal.internal.worker.WorkflowWorker.apply(WorkflowWorker.java:272) ~[service-1.0.jar:?]
    	at io.temporal.internal.sync.SyncWorkflowWorker.apply(SyncWorkflowWorker.java:215) ~[service-1.0.jar:?]
    	at io.temporal.internal.sync.SyncWorkflowWorker.apply(SyncWorkflowWorker.java:51) ~[service-1.0.jar:?]
    	at io.temporal.internal.worker.PollWorkflowTaskDispatcher.process(PollWorkflowTaskDispatcher.java:75) ~[service-1.0.jar:?]
    	at io.temporal.internal.worker.PollWorkflowTaskDispatcher.process(PollWorkflowTaskDispatcher.java:40) ~[service-1.0.jar:?]
    	at io.temporal.internal.worker.Poller$PollExecutionTask.run(Poller.java:277) ~[service-1.0.jar:?]
    	at io.temporal.internal.worker.Poller$PollLoopTask.run(Poller.java:242) [service-1.0.jar:?]
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    	at java.lang.Thread.run(Thread.java:834) [?:?]

@maxim any idea what could be causing this?
Looking at the code where it throws the NPE, it seems like pollTaskExecutor is null in this case; not sure why.

This exception is certainly not expected. Do you see it reported constantly or only on startup? I also don’t think that it is related to the original problem of workflow task timeout.

I was not able to find any other error logs besides this. I do see the following request failure metric, which I am guessing is related.

@maxim any other suggestions on how to debug this?

After some more debugging, it looks like this was caused because I had separate workers for activity and workflow tasks both listening to the same task queue. I updated the workers to register for both workflow and activity tasks (with a single queue) and I don't see this error anymore.
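
For reference, a minimal sketch of the combined registration I ended up with (identifiers here are illustrative, not my exact code):

import io.temporal.client.WorkflowClient;
import io.temporal.serviceclient.WorkflowServiceStubs;
import io.temporal.worker.Worker;
import io.temporal.worker.WorkerFactory;

WorkflowServiceStubs service = WorkflowServiceStubs.newInstance();
WorkflowClient client = WorkflowClient.newInstance(service);
WorkerFactory factory = WorkerFactory.newInstance(client);

// Single worker on the queue, registering both workflow and activity implementations
Worker worker = factory.newWorker("FEED_TASK_QUEUE");
worker.registerWorkflowImplementationTypes(FeedWorkflowImpl.class);
worker.registerActivitiesImplementations(new FeedActivitiesImpl()); // FeedActivitiesImpl is a stand-in for my activity implementation

factory.start();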

@maxim If we have separate workers for activity and workflow tasks, should they not be sharing the same task queue?

Interesting. Let us look into this. It should work fine.

There are two completely independent task queue types for workflow and activity tasks. So it is OK to have them with the same name.

A Worker internally contains two task queue consumers, one for activities and another for workflows. If no workflows or activities are registered, the corresponding consumer is not created.
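
To illustrate, the same task queue name can be used both when starting workflows and when scheduling activities; the two still map to independent queues of the workflow and activity types. A rough sketch (queue name and timeout are placeholders):

import java.time.Duration;
import io.temporal.activity.ActivityOptions;
import io.temporal.client.WorkflowOptions;

// Client side: start the workflow on the queue.
WorkflowOptions workflowOptions = WorkflowOptions.newBuilder()
        .setTaskQueue("SAMPLE_TASK_QUEUE")
        .build();

// Workflow side: schedule activities on a queue with the same name.
// This is an activity task queue, independent of the workflow task queue above.
ActivityOptions activityOptions = ActivityOptions.newBuilder()
        .setTaskQueue("SAMPLE_TASK_QUEUE")
        .setStartToCloseTimeout(Duration.ofMinutes(5))
        .build();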

As you can see, after I made the change to combine the workers, the start_to_close timeouts went away. I am still seeing a few schedule_to_start timeouts; not sure if this is related to the sticky queue, still looking into them.

I wasn’t able to reproduce this issue locally. I split one of the existing workers that had been handling both activity and workflow tasks into two separate ones, one that handles activities only and the other workflows only, and I didn’t see any exceptions.
There might be something special in your setup; can you share with us the way you set up the workers?

@Vitaly Here is a sample of my code

private void setupFeedUploadWorkflow() {
    final WorkflowConfiguration config = // get config;
    logger.info("Setting up feed workflow workers with following config : {}", config);

    // register worker for the main workflow
    setupWorkflowWorker(FeedWorkflowImpl.class,
                        workflowTaskQueueName,
                        WorkerOptions.newBuilder()
                                .setMaxConcurrentWorkflowTaskExecutionSize(config.getWorkflowMaxConcurrentExecutionSize())
                                .setWorkflowPollThreadCount(config.getWorkflowPollThreadCount())
                                .build());

    // register worker for general feed activities
    setupActivityWorker(activities,
                        workflowTaskQueueName, // TODO consider moving activities to new task queue
                        WorkerOptions.newBuilder()
                                .setMaxConcurrentActivityExecutionSize(config.getActivityMaxConcurrentExecutionSize())
                                .setActivityPollThreadCount(config.getActivityPollThreadCount())
                                .build());
}

/**
 * Workflows are stateful. So you need a type to create instances.
 */
private void setupWorkflowWorker(final Class<?> workflow,
                                 final String taskQueueName,
                                 final WorkerOptions options) {
    final Worker worker = workerFactory.newWorker(taskQueueName, options);
    worker.registerWorkflowImplementationTypes(workflow);
}

/**
 * Activities are stateless and thread safe. So a shared instance is used.
 */
private void setupActivityWorker(final Object activity,
                                 final String taskQueueName,
                                 final WorkerOptions options) {
    final Worker worker = workerFactory.newWorker(taskQueueName, options);
    worker.registerActivitiesImplementations(activity);
}

Thanks for your example, I was able to reproduce it now.

The main difference between your code and what I’ve been doing is that you run both workers in the same Java process, creating them with the same factory, while I ran them in separate processes.

I think I have identified the source of the problem, and it is related to the way we set up the sticky task poller and register workflow workers with the dispatcher.

Here is my attempt to address the issue:

It should soon be available when we roll out the next SDK release.

@Vitaly Looks like there has not been another release after 1.0.3; could you let me know when a version with this fix will be released?

Correct, we plan to release 1.0.4 within a week or two.

Sounds good. In the interim, is there any way I can get around this? You mentioned you tried running the workers in separate processes, would you have an example?

Right now I am using the same task queue for both workflow and activity workers; I am planning on making some changes to have a little more control over my task queues.

If you use two different main functions and run two Java processes with one worker in each, it should work fine. You could also try using two different factories and creating each worker with a separate factory, but I haven’t tested whether that works with 1.0.3. Alternatively, if this is not for production use, you could try using 1.0.4-SNAPSHOT, which should have this bug fixed.
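
Something along these lines for the two-factory variant; this is only a sketch with illustrative names, and as mentioned I haven’t verified it against 1.0.3:

import io.temporal.client.WorkflowClient;
import io.temporal.serviceclient.WorkflowServiceStubs;
import io.temporal.worker.Worker;
import io.temporal.worker.WorkerFactory;

WorkflowServiceStubs service = WorkflowServiceStubs.newInstance();
WorkflowClient client = WorkflowClient.newInstance(service);

// Factory dedicated to the workflow worker
WorkerFactory workflowFactory = WorkerFactory.newInstance(client);
Worker workflowWorker = workflowFactory.newWorker("FEED_TASK_QUEUE");
workflowWorker.registerWorkflowImplementationTypes(FeedWorkflowImpl.class);

// A separate factory dedicated to the activity worker, polling the queue of the same name
WorkerFactory activityFactory = WorkerFactory.newInstance(client);
Worker activityWorker = activityFactory.newWorker("FEED_TASK_QUEUE");
activityWorker.registerActivitiesImplementations(new FeedActivitiesImpl()); // stand-in for the activity implementation

workflowFactory.start();
activityFactory.start();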