Worker Configuration

mike · February 9, 2021, 6:59am

We are deploying a worker that hosts an activity, with MaxConcurrentWorkflowTaskPollers set to 1. We do this because this specific activity is memory intensive and we do not want any other activity to be executed while one is already running for another workflow. The activity sends a heartbeat every few seconds and is cancelable.

Now the problem is that if the workflow gets cancelled, the activity gets properly cancelled, but the worker is never assigned to any of the other waiting workflows (which need the same activity). The weird part is that when we look at the Pollers page under “task-queues” in the UI, it shows ACTIVITY HANDLER checked and reports the activity worker as available, yet no new work is assigned to the worker whose activity just got cancelled. The situation stays the same until the Temporal server stops reporting the activity worker in the Pollers list entirely (after maybe 10-15 min), even though the worker process is alive and looks healthy.

Also, we see a similar situation if the activity takes a long time to complete (like an hour). In that case, the activity finishes, then the workflow completes, but no more work is assigned to that worker.

While the activity is running, the last event is ActivityTaskScheduled and the state is PENDING_ACTIVITY_STATE_STARTED. When the activity finishes, ActivityTaskStarted shows up in the UI along with the following events.

1> What could be the cause of this?

2> Is there any way for the worker to check its status with the Temporal server and re-register itself with the server or something?

We are running on the latest Master branch.

Wenquan_Xing · February 9, 2021, 5:54pm

^ This is expected behavior.
PENDING_ACTIVITY_STATE_STARTED means the activity is still pending (not yet completed) and its state is started, i.e. a worker has picked it up.

Wenquan_Xing · February 9, 2021, 5:55pm

Are there other workers polling the same queue? Are there any new activities to be executed by the worker?

Wenquan_Xing · February 9, 2021, 5:56pm

Please try not to use the master branch, ever.
The latest release version is 1.6.3.

mike · February 9, 2021, 8:46pm

No, this is the only worker polling this queue. (We want to make sure a single worker works as expected before scaling this up.)

Yes, there are a whole bunch of other workflows that need the same activity on this worker. The worker is supposed to be assigned to them after its activity is cancelled, but it is not.

maxim · February 9, 2021, 8:52pm

You mentioned that “the activity gets properly cancelled”. How do you ensure this? The only way for an activity to get properly canceled is to heartbeat and to rethrow, from the activity body, the ActivityCanceledException thrown by the heartbeat method.

Here is the relevant sample.

mike · February 9, 2021, 9:03pm

Our code base is Go. We followed this example:

github.com/temporalio/samples-go/blob/master/cancelactivity/activities.go

package cancelactivity

import (
	"context"
	"time"

	"go.temporal.io/sdk/activity"
)

type Activities struct {
}

func (a *Activities) ActivityToBeCanceled(ctx context.Context) (string, error) {
	logger := activity.GetLogger(ctx)
	logger.Info("activity started, to cancel workflow, use 'go run cancelactivity/cancel/main.go " +
		"-w <WorkflowID>' or CLI: 'tctl wf cancel -w <WorkflowID>' to cancel")
	for {
		select {
		case <-time.After(1 * time.Second):
			logger.Info("heartbeating...")

(This file preview has been truncated.)

We do call activity.RecordHeartbeat(ctx, …) from the activity, but we stop sending the heartbeat once the activity is cancelled.

Wenquan_Xing · February 9, 2021, 9:24pm

Would you mind using our public Slack channel so we can quickly debug this?

Vitaly · February 10, 2021, 3:41am

I think we should try to reproduce this issue with a minimal sample.
Slightly orthogonal but important question: if you want to limit the number of activities that you run, why don’t you use setMaxConcurrentActivityExecutionSize instead? Limiting the number of workflow task pollers doesn’t sound like the right approach.

maxim · February 10, 2021, 3:45am

+1 to @Vitaly's point. MaxConcurrentWorkflowTaskPollers does not limit the number of parallel workflows and activities running on a worker at all.

mike · February 10, 2021, 7:07pm

My apologies for pasting the wrong parameter into the question. We do indeed set MaxConcurrentActivityExecutionSize to 1 and do not touch MaxConcurrentWorkflowTaskPollers.
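
For reference, a worker configured this way might look roughly like the following in the Go SDK. This is a sketch under assumptions, not the poster's code: the task queue name "memory-intensive-queue" and the activity MemoryIntensiveActivity are hypothetical placeholders, and client.Dial assumes a recent SDK release (older releases used client.NewClient).

```go
package main

import (
	"context"
	"log"

	"go.temporal.io/sdk/client"
	"go.temporal.io/sdk/worker"
)

// MemoryIntensiveActivity is a hypothetical placeholder for the real activity.
func MemoryIntensiveActivity(ctx context.Context) error {
	return nil
}

func main() {
	// Connects to a Temporal server using default options (assumption:
	// a server is reachable at the default local address).
	c, err := client.Dial(client.Options{})
	if err != nil {
		log.Fatalln("unable to create Temporal client:", err)
	}
	defer c.Close()

	// Allow at most one activity execution at a time on this worker.
	// MaxConcurrentWorkflowTaskPollers is left at its default.
	w := worker.New(c, "memory-intensive-queue", worker.Options{
		MaxConcurrentActivityExecutionSize: 1,
	})
	w.RegisterActivity(MemoryIntensiveActivity)

	if err := w.Run(worker.InterruptCh()); err != nil {
		log.Fatalln("worker stopped with error:", err)
	}
}
```

With a limit of 1, the worker cannot start another activity until the running activity function has returned.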

maxim · February 11, 2021, 12:24am

Have you confirmed that the activity is handling heartbeating correctly?

mike · February 11, 2021, 7:31pm

Yes, I have been sending an incrementing counter using activity.RecordHeartbeat(ctx, myCounter), and the UI reports that the heartbeat counter is being received by the Temporal server (it shows the counter being incremented).

Vitaly · February 12, 2021, 8:01pm

I feel that there might be some misunderstanding here. Should we follow up on Slack/Zoom and try to debug your issue together? Please ping me directly on our company Slack and we can go from there.
