Hi,
I am running an activity that executes a SQL query; execution can take anywhere from a few seconds to several hours. For this reason I have set start_to_close_timeout
to a large value, while also setting a heartbeat_timeout.
I am using the auto_heartbeater so that, if the worker crashes or something else goes wrong, the heartbeat timeout fires and the activity is retried promptly instead of waiting out the full start_to_close_timeout.
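For reference, the general shape of this kind of auto-heartbeating wrapper can be sketched as below. This is a minimal, stdlib-only illustration under stated assumptions, not the SDK's or the sample's actual implementation: `run_with_heartbeat`, `fake_heartbeat`, `fake_query` and `interval_s` are hypothetical names, and `fake_heartbeat` only stands in for a real heartbeat call.

```python
import asyncio
from typing import Awaitable, Callable, Tuple, TypeVar

T = TypeVar("T")


async def run_with_heartbeat(
    work: Callable[[], Awaitable[T]],
    heartbeat: Callable[[], None],
    interval_s: float,
) -> T:
    """Run `work` while calling `heartbeat` every `interval_s` seconds."""

    async def beat_forever() -> None:
        # Background loop: sleep, then record one heartbeat, until cancelled.
        while True:
            await asyncio.sleep(interval_s)
            heartbeat()

    beat_task = asyncio.create_task(beat_forever())
    try:
        # The background task only gets to run while `work` is awaiting;
        # a blocking call inside `work` would starve it of the event loop.
        return await work()
    finally:
        # Stop heartbeating once the work finishes or fails.
        beat_task.cancel()
        await asyncio.wait([beat_task])


async def demo() -> Tuple[str, int]:
    beats = 0

    def fake_heartbeat() -> None:
        # Hypothetical stand-in for the real heartbeat call.
        nonlocal beats
        beats += 1

    async def fake_query() -> str:
        # Hypothetical stand-in for the long-running SQL query.
        await asyncio.sleep(0.25)
        return "rows"

    result = await run_with_heartbeat(fake_query, fake_heartbeat, interval_s=0.05)
    return result, beats
```

Calling `asyncio.run(demo())` returns the fake query result together with the number of heartbeats recorded while it ran. In this sketch the interval is passed in explicitly; a real wrapper would more likely derive it from the configured heartbeat_timeout.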
What I am observing is that, for a long-running query (say 10 minutes), heartbeats are produced at first, but halfway through I see the following error and the activity is retried:
2024-09-18T00:45:49.402378Z WARN temporal_sdk_core::worker::activities::activity_heartbeat_manager: Error when recording heartbeat: Status { code: Cancelled, message: "operation was canceled", source: Some(tonic::transport::Error(Transport, hyper::Error(Canceled, "connection closed"))) }
I have not been able to work out what causes this. Could someone please help me understand it better?
Thank you!