Proper termination of a worker deployed on Kubernetes?

ringods · July 7, 2020, 1:44pm

Hello,

I am deploying my (workflow & activity) workers on Kubernetes. What is the proper way to upgrade these workers to a new version? Given that Kubernetes wants to stop the old version, how can I align this with the worker process itself? My intent is this:

- The old worker can still finish its ongoing activity.
- The old worker must not poll for additional messages to process.
- The worker process ends gracefully and the pod is cleaned up.

Related to this, if my workflow uses session support, can I keep my old worker alive until all the required activities for that session have executed?

Ringo

maxim · July 7, 2020, 3:46pm

I believe both the Java and Go SDKs support a clean shutdown that waits for activities to complete without polling for new ones. Waiting for the completion of a whole session is not currently supported. I filed a feature request to get this added.

The Go SDK's worker.Run waits for the TERM signal, which Kubernetes sends to the process when it stops the pod, and then performs the clean shutdown.

In Java, the application's main method has to handle the signal itself and then use worker.shutdown and worker.awaitTermination to wait for the activities to complete.
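For illustration, here is a minimal sketch of that pattern in plain Java. It is not Temporal code: a java.util.concurrent.ExecutorService stands in for the worker, because it has the same two-step shape (shutdown stops intake, awaitTermination waits for in-flight work). The class name GracefulShutdownSketch and the helpers drain and simulatedActivity are made up for this example.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in: an ExecutorService plays the role of the worker.
public class GracefulShutdownSketch {

    // Stop accepting new tasks, then wait up to the timeout for running ones.
    // Returns true if all in-flight tasks finished before the timeout.
    static boolean drain(ExecutorService pool, long timeout, TimeUnit unit) {
        pool.shutdown(); // submitted tasks keep running; new ones are rejected
        try {
            return pool.awaitTermination(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // A fake "activity" that sleeps, then records that it completed.
    static Runnable simulatedActivity(long millis, AtomicInteger completed) {
        return () -> {
            try {
                Thread.sleep(millis);
                completed.incrementAndGet();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        AtomicInteger completed = new AtomicInteger();

        // The JVM runs shutdown hooks when it receives SIGTERM, which is the
        // signal the process gets when Kubernetes stops the pod.
        Runtime.getRuntime().addShutdownHook(
                new Thread(() -> drain(pool, 20, TimeUnit.SECONDS)));

        pool.submit(simulatedActivity(200, completed));

        // Called directly here so the demo ends without an external signal.
        boolean clean = drain(pool, 5, TimeUnit.SECONDS);
        System.out.println("clean=" + clean + " completed=" + completed.get());

        try {
            pool.submit(simulatedActivity(10, completed));
        } catch (RejectedExecutionException e) {
            System.out.println("new work rejected after shutdown");
        }
    }
}
```

Whatever timeout is used for the wait should be shorter than the pod's terminationGracePeriodSeconds (30 seconds by default), because Kubernetes kills the process once that grace period runs out.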

cg1972 · November 18, 2020, 7:21am

I’ve put together a simple worker example that registers a couple of Activities and a simple two-stage workflow that runs as a cron job. When I shut down the worker, I’ve noticed that it takes quite a long time for it to be removed from the list of “Pollers” in the webapp.

I’ve added a listener for when the Java main shuts down, and it calls the WorkerFactory.shutdown() method.

If I start the worker up again (to simulate what will happen in k8s), I see multiple Pollers in the list: the old one and the new one.

1. Does calling shutdown() on the WorkerFactory shut down the worker correctly, or do I need to call something else as well?
2. Is there a reason why the old worker takes so long to get removed? It can take 10-15 minutes for it to be removed from the list.
3. While the old worker remains in the list, can it have any adverse effects on the running of the workflows?

Part of the reason I’m asking these questions is that the cron job I’m testing is set to run every minute but takes a long time before it actually triggers. I saw something in another thread about WorkflowExecutionStarted.FirstWorkflowTaskBackoff. This explains why the cron job took so long to trigger, but is this a setting that can be changed so that the initial job triggers correctly, i.e. if the cron is set to every minute, it triggers after a minute?

maxim · November 18, 2020, 5:44pm

1. If you care about waiting for task completions, you can call WorkerFactory.awaitTermination.
2. The pollers list in the web app is for human troubleshooting. It keeps the information about a worker for 15 minutes, but that doesn’t mean the system treats this worker as available. To receive tasks, workers have to explicitly call the Temporal service API, so remembering a worker's identity doesn’t affect that behavior at all.
3. No, see 2.

The cron triggers when the cron expression tells it to trigger. So if the cron says that it should trigger at the beginning of the hour, it is going to wait for the beginning of the next hour. To verify this behavior, look at the WorkflowExecutionStarted.FirstWorkflowTaskBackoff value in your workflow execution history.
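As a rough illustration of where that backoff comes from, the sketch below computes the wait from a start time to the next minute or hour boundary in plain Java. It is a simplification under stated assumptions: it only covers "every minute" and "every hour" style schedules (a real cron expression needs a cron parser), it works in UTC, and the names CronBackoffSketch and backoffUntilNext are made up for this example rather than taken from the Temporal SDK.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.temporal.ChronoUnit;

// Hypothetical helper, not part of any SDK.
public class CronBackoffSketch {

    // Time from `now` until the next boundary of `unit` (the next full
    // minute or hour). A start exactly on a boundary waits one full unit.
    static Duration backoffUntilNext(Instant now, ChronoUnit unit) {
        Instant next = now.truncatedTo(unit).plus(1, unit);
        return Duration.between(now, next);
    }

    public static void main(String[] args) {
        // Example start time chosen for illustration only.
        Instant started = Instant.parse("2020-11-18T07:21:40Z");

        // Every-minute schedule: next fire at 07:22:00, so 20 seconds.
        System.out.println(backoffUntilNext(started, ChronoUnit.MINUTES));

        // Top-of-the-hour schedule: next fire at 08:00:00, so 38 min 20 s.
        System.out.println(backoffUntilNext(started, ChronoUnit.HOURS));
    }
}
```

So with an every-minute schedule the first run should never be more than a minute away; a much longer backoff in the history suggests the expression matches less often than intended.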

cg1972 · November 18, 2020, 11:12pm

Thanks Maxim for clarifying.
