Monitoring and logging heartbeat timeouts

Brianecker · July 17, 2020, 5:48pm

A couple questions related to monitoring and logging heartbeat timeouts:

Is there a default metric for heartbeat timeouts by activity? I didn’t see one but could have missed it.
If I wanted to log a message when an activity heartbeat timeout occurs, how could I do that? It seems like I could catch ActivityTimeoutException and check for TimeoutType=HEARTBEAT, but I can’t catch that from the workflow if I’m also using retries until the retries expire.

maxim · July 18, 2020, 5:19pm

> Is there a default metric for heartbeat timeouts by activity? I didn’t see one but could have missed it.

When an activity that has already timed out sends a heartbeat, the request fails and the Java SDK emits the “temporal_request_failure” metric with the following tags:

“Operation” == “RecordActivityTaskHeartbeat”
“StatusCode” == “NOT_FOUND”
“Namespace” == activity namespace
“ActivityType” == activity type
“WorkflowType” == type of the workflow that invoked the activity

Currently, the metric doesn’t distinguish why the activity is no longer valid. The same failure is reported if the workflow has closed or if the activity timed out due to some other timeout (like start to close).
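To make the tag filter above concrete, here is a minimal sketch in plain Java. It is not the SDK's API: the Sample class is a hypothetical stand-in for whatever your metrics backend hands you, and “ProcessFile” is a made-up activity type. It just shows selecting “temporal_request_failure” samples by the Operation and StatusCode tags and counting them per ActivityType.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class HeartbeatFailureFilter {

    // Hypothetical stand-in for one exported metric sample: a metric name plus its tags.
    static final class Sample {
        final String name;
        final Map<String, String> tags;

        Sample(String name, Map<String, String> tags) {
            this.name = name;
            this.tags = tags;
        }
    }

    // True when the sample is a failed heartbeat request that returned NOT_FOUND.
    static boolean isHeartbeatNotFound(Sample s) {
        return "temporal_request_failure".equals(s.name)
                && "RecordActivityTaskHeartbeat".equals(s.tags.get("Operation"))
                && "NOT_FOUND".equals(s.tags.get("StatusCode"));
    }

    // Counts matching samples per ActivityType tag ("unknown" if the tag is absent).
    static Map<String, Integer> countByActivityType(List<Sample> samples) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Sample s : samples) {
            if (isHeartbeatNotFound(s)) {
                counts.merge(s.tags.getOrDefault("ActivityType", "unknown"), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Sample> samples = List.of(
                // Failed heartbeat: matches the filter.
                new Sample("temporal_request_failure", Map.of(
                        "Operation", "RecordActivityTaskHeartbeat",
                        "StatusCode", "NOT_FOUND",
                        "ActivityType", "ProcessFile")),
                // Different operation: ignored by the filter.
                new Sample("temporal_request_failure", Map.of(
                        "Operation", "RespondActivityTaskCompleted",
                        "StatusCode", "NOT_FOUND",
                        "ActivityType", "ProcessFile")));
        System.out.println(countByActivityType(samples));
    }
}
```

Keep in mind the caveat above: a count built this way is an upper bound on heartbeat timeouts, since closed workflows and other activity timeouts produce the same NOT_FOUND failure.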

The server emits a “heartbeat_timeout” metric, but at this point it is tagged by namespace only.

I believe the activity heartbeating or completion failures that return “NOT_FOUND” are logged by the SDK.
