We are deploying a worker that hosts an activity, with MaxConcurrentWorkflowTaskPollers set to 1. We do this because this specific activity is memory intensive, and we do not want another instance of it to run while one is already running for a different workflow. The activity sends a heartbeat every few seconds and is cancelable.
The problem is that if the workflow gets cancelled, the activity is cancelled properly, but the worker is never assigned to any of the other waiting workflows that need the same activity. The weird part is that when we look at the Pollers page under “task-queues” in the UI, it shows ACTIVITY HANDLER checked and reports the activity worker as available, yet no new work is assigned to the worker whose activity was just cancelled. The situation stays the same until the Temporal server stops reporting the activity worker in the Pollers list altogether (after maybe 10-15 minutes), even though the worker process is alive and looks healthy.
We also see a similar situation if the activity takes a long time to complete (around an hour). In that case the activity finishes and the workflow completes, but no more work is assigned to that worker afterwards.
While the activity is running, the last event is ActivityTaskScheduled and the state is PENDING_ACTIVITY_STATE_STARTED. Only when the activity finishes does ActivityTaskStarted show up in the UI, together with the events that follow it.
1> What could be the cause of this?
2> Is there any way for the worker to check its status with the Temporal server and re-register itself, or something similar?
We are running on the latest master branch.