Quickly retry activities on different worker after graceful shutdown

AhmedMozaly · October 13, 2022, 7:45pm

From what I understand so far, when a worker started with w.Run(worker.InterruptCh()) receives SIGTERM, it shuts down gracefully: it stops polling for new activities and waits a while for the in-progress activities to finish.

Once this waiting time is over, the worker is terminated, and in-progress activities that were not finished will time out based on the configured settings for

If the goal is to quickly retry the activity on a different worker when the original worker shuts down, HeartbeatTimeout can be used. But what if the activity is doing blocking IO that makes it hard to keep sending heartbeats? Is it recommended practice to use a goroutine in such activities to keep sending heartbeats?

Is there another way (using interceptors, for example) to mark these in-progress activities as failed during the graceful shutdown, so that they can be retried quickly on a different worker, instead of implementing heartbeat logic for every single activity?

maxim · October 13, 2022, 8:44pm

If the blocking IO is blocking for a very long time, then heartbeating from another goroutine is OK.
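A minimal sketch of heartbeating from a second goroutine while the main goroutine is stuck in a blocking call. It uses only the standard library so it runs on its own: `withHeartbeat`, `beat`, and `work` are hypothetical names, and in a real activity `beat` would typically wrap `activity.RecordHeartbeat(ctx)`, with the interval chosen to be comfortably shorter than the configured HeartbeatTimeout.

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"
	"time"
)

// withHeartbeat runs work while a background goroutine calls beat every
// interval. The goroutine stops when work returns or ctx is cancelled.
// Assumption: in a Temporal activity, beat would wrap activity.RecordHeartbeat(ctx).
func withHeartbeat(ctx context.Context, interval time.Duration, beat func(), work func() error) error {
	done := make(chan struct{})
	stopped := make(chan struct{})
	go func() {
		defer close(stopped)
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-ticker.C:
				beat()
			case <-done:
				return
			case <-ctx.Done():
				return
			}
		}
	}()

	err := work() // the blocking call stays on the calling goroutine
	close(done)
	<-stopped // ensure no heartbeat fires after we return
	return err
}

// demoHeartbeat simulates a 100ms blocking call with a 10ms heartbeat
// interval and reports how many heartbeats were sent.
func demoHeartbeat() (int64, error) {
	var beats int64
	err := withHeartbeat(context.Background(), 10*time.Millisecond,
		func() { atomic.AddInt64(&beats, 1) },
		func() error {
			time.Sleep(100 * time.Millisecond) // stand-in for blocking IO
			return nil
		})
	return atomic.LoadInt64(&beats), err
}

func main() {
	beats, err := demoHeartbeat()
	fmt.Println("heartbeats sent:", beats, "error:", err)
}
```

Note that the heartbeat goroutine only proves the process is alive, not that the blocking call is making progress, so this fits best when the blocking call has its own timeout.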

Also consider listening on WorkerStopChannel which is closed as soon as the worker shutdown is initiated.
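For activities that work in steps rather than in one long blocking call, checking the stop channel between steps could look roughly like the sketch below. It is standard library only: `processChunks` and `errWorkerStopping` are hypothetical names, and in a real activity the channel would come from `activity.GetWorkerStopChannel(ctx)`.

```go
package main

import (
	"errors"
	"fmt"
)

// errWorkerStopping is a hypothetical sentinel error returned when the
// worker has started shutting down.
var errWorkerStopping = errors.New("worker is shutting down")

// processChunks works through n chunks and checks stopCh before each one.
// Assumption: in a Temporal activity, stopCh would be the channel returned
// by activity.GetWorkerStopChannel(ctx).
func processChunks(n int, stopCh <-chan struct{}, process func(i int)) (int, error) {
	for i := 0; i < n; i++ {
		select {
		case <-stopCh:
			// Shutdown started: give up early so the activity can be retried.
			return i, errWorkerStopping
		default:
		}
		process(i)
	}
	return n, nil
}

// demoStop closes the stop channel while chunk 2 is being processed, so
// chunks 0, 1 and 2 complete and the loop exits before chunk 3.
func demoStop() (int, error) {
	stopCh := make(chan struct{})
	return processChunks(5, stopCh, func(i int) {
		if i == 2 {
			close(stopCh)
		}
	})
}

func main() {
	done, err := demoStop()
	fmt.Println("chunks completed:", done, "error:", err)
	// prints: chunks completed: 3 error: worker is shutting down
}
```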

AhmedMozaly · October 13, 2022, 8:59pm

Great!
Is there built-in support in the Go SDK to get the list of in-progress activities in the current worker and mark them as failed (to cause a retry on another worker)?

maxim · October 13, 2022, 11:37pm

You can write an activity interceptor that would subscribe to WorkerStopChannel and fail the activity once it is closed.

AhmedMozaly · October 14, 2022, 3:14pm

Thanks for the info
Will try it out!

Alex_Zhang · August 16, 2024, 8:25am

Hello @maxim, could you share some details for the Go SDK (some code would be even better) on how to write an activity interceptor that subscribes to the WorkerStopChannel? I suppose it would be synchronous, so how do I let normal activities pass through while still catching the abnormal cases?
In addition, how do I fail the activity? I only know that if the activity returns an error it is marked as failed, but how do I define the interceptor so that the activity returns an error once the worker exits?
I found a function called GetWorkerStopChannel and also an interface, ActivityOutboundInterceptor, that includes a function with the same name. Could you advise which approach would be better for failing the activity?
Best regards

maxim · August 16, 2024, 5:54pm

The activity interceptor has access to the activity context like any other activity code. So you can call activity.GetWorkerStopChannel.

Look at ActivityInboundInterceptor.ExecuteActivity. If this method returns an error, the activity fails (or is retried, depending on the error type).
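The core of such an interceptor can be sketched with the standard library alone: run the wrapped activity in a goroutine and return whichever comes first, its result or the stop channel closing. Normal activities pass through unchanged, and only the ones still running at shutdown get the error. Everything named here (`failOnWorkerStop`, `activityFunc`, `errWorkerStopping`) is hypothetical; in the Go SDK this logic would presumably live in an ExecuteActivity override, with `next` standing in for the call to the next interceptor and `stopCh` coming from `activity.GetWorkerStopChannel(ctx)`.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errWorkerStopping is a hypothetical error returned for activities that
// were still running when the worker began shutting down.
var errWorkerStopping = errors.New("worker stopping, retry on another worker")

// activityFunc is a hypothetical stand-in for calling the next interceptor's
// ExecuteActivity with the same context and input.
type activityFunc func(ctx context.Context) (interface{}, error)

// failOnWorkerStop returns next's result if it finishes first, or
// errWorkerStopping if stopCh is closed first.
func failOnWorkerStop(ctx context.Context, stopCh <-chan struct{}, next activityFunc) (interface{}, error) {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel() // signals the abandoned activity to stop, if it honours ctx

	type result struct {
		val interface{}
		err error
	}
	resCh := make(chan result, 1) // buffered so the goroutine can always send
	go func() {
		val, err := next(ctx)
		resCh <- result{val, err}
	}()

	select {
	case r := <-resCh:
		return r.val, r.err // normal path: pass the result through untouched
	case <-stopCh:
		return nil, errWorkerStopping
	}
}

// demoNormal: the stop channel is never closed, so the result passes through.
func demoNormal() (interface{}, error) {
	stopCh := make(chan struct{})
	return failOnWorkerStop(context.Background(), stopCh,
		func(ctx context.Context) (interface{}, error) { return "ok", nil })
}

// demoShutdown: the activity blocks until cancelled and the stop channel
// closes after 20ms, so the wrapper returns errWorkerStopping.
func demoShutdown() (interface{}, error) {
	stopCh := make(chan struct{})
	time.AfterFunc(20*time.Millisecond, func() { close(stopCh) })
	return failOnWorkerStop(context.Background(), stopCh,
		func(ctx context.Context) (interface{}, error) {
			<-ctx.Done() // stand-in for long blocking work
			return nil, ctx.Err()
		})
}

func main() {
	v, err := demoNormal()
	fmt.Println("normal:", v, err)
	v, err = demoShutdown()
	fmt.Println("shutdown:", v, err)
}
```

Two caveats with this approach: the activity code keeps running in its goroutine after the wrapper returns unless it honours context cancellation, and how soon the retry happens still depends on the activity's retry policy.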

jhoney12 · August 22, 2024, 7:50am

Implementing a heartbeat mechanism for blocking I/O tasks is indeed a practical solution, especially if you need to ensure quick retries of activities after a worker shutdown. Using a goroutine to maintain heartbeats can help, but leveraging activity interceptors might be even more efficient. By subscribing to the WorkerStopChannel, you can intercept the shutdown event and mark activities as failed, prompting immediate retries on different workers. This approach avoids modifying each activity individually and centralizes the failure handling, which could streamline your error management. For integrating with systems like TraceShipments, similar practices can be applied to ensure smooth data processing and timely retries in case of interruptions.
