From what I understand so far, when a worker started with w.Run(worker.InterruptCh()) receives SIGTERM, it shuts down gracefully: it stops polling for new activities and waits a while for the in-progress activities to finish.
Once this waiting time is over, the worker is terminated, and in-progress activities that did not finish will time out based on the configured settings for
If the goal is to quickly retry the activity on a different worker when the original worker shuts down,
HeartbeatTimeout can be used. But what if the activity does blocking I/O that makes it hard to keep sending heartbeats? Is it recommended practice to use a goroutine in such activities to keep sending heartbeats?
Is there another way (using interceptors, for example) to mark these in-progress activities as failed during the graceful shutdown, so that they can be retried quickly on a different worker,
instead of implementing heartbeat logic for every single activity?
Great!
Is there built-in support in the Go SDK to get the list of in-progress activities in the current worker and mark them as failed (to cause a retry on another worker)?
Hello @maxim, could you share some details for the Go SDK (some code would be even better) on how to write an activity interceptor that subscribes to the WorkerStopChannel? I suppose it would be synchronous, so how do I let normal activities pass through while the abnormal cases are still captured?
In addition, how do I fail the activity? I only know that an activity is marked as failed if it returns an error, but how do I define the interceptor so that the activity returns an error once the worker exits?
I found a function called GetWorkerStopChannel and also an interface, ActivityOutboundInterceptor, that includes a function with the same name. Could you advise which approach would be better for failing the activity?
Best regards
Implementing a heartbeat mechanism for blocking I/O tasks is indeed a practical solution, especially if you need to ensure quick retries of activities after a worker shutdown. Using a goroutine to maintain heartbeats can help, but leveraging activity interceptors might be even more efficient. By subscribing to the WorkerStopChannel, you can intercept the shutdown event and mark activities as failed, prompting immediate retries on different workers. This approach avoids modifying each activity individually and centralizes the failure handling, which could streamline your error management. For integrating with systems like TraceShipments, similar practices can be applied to ensure smooth data processing and timely retries in case of interruptions.