Long-running activity with auto_heartbeater failing

Hi,

I am running an activity that executes a SQL query; execution can take anywhere from a few seconds to several hours. For this reason I have set start_to_close_timeout to a large value and also set a heartbeat_timeout.

I am using the auto_heartbeater so that worker crashes or other issues are detected quickly: the heartbeat times out and the activity is retried, instead of waiting out the long start_to_close_timeout.

What I am observing is that, for a long-running query (say 10 minutes), heartbeats are sent at first, but halfway through I see this error and the activity is retried.

2024-09-18T00:45:49.402378Z  WARN temporal_sdk_core::worker::activities::activity_heartbeat_manager: Error when recording heartbeat: Status { code: Cancelled, message: "operation was canceled", source: Some(tonic::transport::Error(Transport, hyper::Error(Canceled, "connection closed"))) }

I am unable to figure out what could cause this. Could someone please help me understand this better?

Thank you!

Hi @maxim, any thoughts on how I can proceed with this, or where I should redirect it to find some help?

I don’t know Python. My guess is that something in your Python code blocks heartbeating.
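
To illustrate that guess in general terms, here is a minimal sketch that uses only asyncio and no Temporal SDK calls. All names (`heartbeat_loop`, `blocking_query`, `run_activity`, `compare`) are hypothetical stand-ins, and the "heartbeat" only records a timestamp. It assumes the heartbeater runs as a background task on the same event loop as the activity, so a synchronous call made directly inside the async function would stall it, while handing the call to a thread would not.

```python
import asyncio
import time


async def heartbeat_loop(interval: float, beats: list) -> None:
    """Record a timestamp every `interval` seconds (stand-in for a real heartbeat)."""
    while True:
        beats.append(time.monotonic())
        await asyncio.sleep(interval)


def blocking_query(seconds: float) -> str:
    """Hypothetical stand-in for a synchronous SQL call."""
    time.sleep(seconds)
    return "done"


async def run_activity(offload: bool, duration: float = 0.5, interval: float = 0.05) -> int:
    """Run the fake query next to the heartbeat loop and return the number of beats."""
    beats: list = []
    task = asyncio.create_task(heartbeat_loop(interval, beats))
    await asyncio.sleep(0)  # yield once so the heartbeat task can start
    try:
        if offload:
            # Runs in a worker thread; the event loop stays free to heartbeat.
            await asyncio.to_thread(blocking_query, duration)
        else:
            # Runs on the event loop thread; no heartbeat can fire until it returns.
            blocking_query(duration)
    finally:
        task.cancel()
        await asyncio.gather(task, return_exceptions=True)
    return len(beats)


def compare() -> tuple:
    """Return (beats while blocked, beats while offloaded)."""
    blocked = asyncio.run(run_activity(offload=False))
    offloaded = asyncio.run(run_activity(offload=True))
    return blocked, offloaded


if __name__ == "__main__":
    print(compare())
```

In this sketch the blocked run records far fewer beats than the offloaded one. Whether that is what happens in the activity above depends on how the query is actually executed, which the post does not show.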

Which SDK version? There was an issue recently fixed in 1.7.1 that looks similar.
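
If it helps, one way to check the installed version from Python is sketched below. It assumes the SDK is installed under the PyPI package name `temporalio`; `installed_version` is a hypothetical helper, not part of the SDK.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "temporalio") -> Optional[str]:
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_version())
```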

Hi @Chad_Retz - thanks for sharing reference to the issue.

I am currently on 1.7.0. I’ll review the issue and check whether I still see this with the new version.