Client metrics to detect connectivity issues

idiezmaperez · March 29, 2022, 1:40pm

Hi team,

I was taking a look at the client metrics in this class but was unable to find more documentation explaining the main differences between long_request_failure and request_failure.

I want to use a metric to detect when our roles lose connectivity with the Temporal server (even if we do not have running workflows at that moment).

Could you provide more information on the metrics above, and any recommendation for creating a monitor that alerts us when there is no connectivity between client and server?

Thanks in advance.

tihomir · March 29, 2022, 7:19pm

what are the main differences between long_request_failure and request_failure

Client operations (SDK client APIs) that are async, for example:

WorkflowClient.start(...);
...
workflowStub.mySignalMethod(...);
... 
async activity/child workflow invocations

their failures would fall under the request_failure bucket.

On the other hand, things like workers long-polling their task queues for workflow tasks, or any sync client API calls such as:

typedStub.myWorkflowMethod(...); // waits for the workflow to complete
untypedStub.getResult(...); // potentially waits for completion
typedStub.myQueryMethod(); // waits for query to complete
or any sync child workflow/activity invocations

their failures would fall under the long_request_failure bucket.

Note that these are not business-level failures but failures of gRPC requests to the Temporal frontend service (that is, when io.grpc.Status is not “OK”; see here).
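
To make the bucketing rule above concrete, here is a minimal sketch in plain Java. It is only a model of the description in this post, not the SDK's actual code: the `StatusCode` and `CallKind` enums and the `RequestFailureBuckets` class are hypothetical names invented for illustration, and `StatusCode` is a stand-in for a few io.grpc.Status codes rather than the real gRPC type.

```java
import java.util.HashMap;
import java.util.Map;

class RequestFailureBuckets {
    // Hypothetical stand-in for a few io.grpc.Status codes; not the real gRPC type.
    enum StatusCode { OK, UNAVAILABLE, DEADLINE_EXCEEDED }

    // Hypothetical label for the two kinds of calls described above.
    enum CallKind { REGULAR, LONG }

    private final Map<String, Long> counters = new HashMap<>();

    // Increment a failure counter only when the call did not finish with OK.
    void record(CallKind kind, StatusCode status) {
        if (status == StatusCode.OK) {
            return;
        }
        String name = (kind == CallKind.LONG) ? "long_request_failure" : "request_failure";
        counters.merge(name, 1L, Long::sum);
    }

    // Current value of a counter, or 0 if it was never incremented.
    long count(String name) {
        return counters.getOrDefault(name, 0L);
    }

    public static void main(String[] args) {
        RequestFailureBuckets buckets = new RequestFailureBuckets();
        buckets.record(CallKind.REGULAR, StatusCode.UNAVAILABLE);    // e.g. a failed start call
        buckets.record(CallKind.LONG, StatusCode.DEADLINE_EXCEEDED); // e.g. a failed task queue poll
        buckets.record(CallKind.LONG, StatusCode.OK);                // successful calls are not counted
        System.out.println("request_failure=" + buckets.count("request_failure"));
        System.out.println("long_request_failure=" + buckets.count("long_request_failure"));
    }
}
```

The point of the sketch is that the outcome of the workflow or activity never enters the decision; only the status of the request itself does.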

tihomir · March 29, 2022, 9:52pm

any recommendation for creating a monitor that alerts us when there is no connectivity between client and server?

If you are asking about connection failures from client to server, you could alert on the temporal_request_failure and temporal_long_request_failure buckets, as well as the associated “_latency” buckets (SDK metrics docs here; note that SDK metrics are prefixed with “temporal_”).
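
As a rough illustration of such an alert, the sketch below computes a combined failure rate from two readings of the temporal_request_failure and temporal_long_request_failure counters and compares it with a threshold. Everything here is an assumption made for the example: the `ConnectivityAlert` and `Sample` names, the idea of taking two cumulative readings, and the threshold value. In practice this logic would normally live in your monitoring system's alert rules rather than in application code. Since workers long-poll their task queues, the long request counter should be the one that moves when no workflows are running.

```java
class ConnectivityAlert {
    // One reading of the two cumulative failure counters (hypothetical holder type).
    static final class Sample {
        final long epochSeconds;
        final long requestFailures;      // temporal_request_failure
        final long longRequestFailures;  // temporal_long_request_failure

        Sample(long epochSeconds, long requestFailures, long longRequestFailures) {
            this.epochSeconds = epochSeconds;
            this.requestFailures = requestFailures;
            this.longRequestFailures = longRequestFailures;
        }
    }

    // Combined failures per second between two readings.
    static double failuresPerSecond(Sample older, Sample newer) {
        long seconds = newer.epochSeconds - older.epochSeconds;
        if (seconds <= 0) {
            throw new IllegalArgumentException("samples must be in increasing time order");
        }
        long delta = (newer.requestFailures - older.requestFailures)
                + (newer.longRequestFailures - older.longRequestFailures);
        // A negative delta means a counter was reset (e.g. process restart); treat it as zero.
        return Math.max(0, delta) / (double) seconds;
    }

    // True when the combined failure rate exceeds the chosen threshold.
    static boolean shouldAlert(Sample older, Sample newer, double maxFailuresPerSecond) {
        return failuresPerSecond(older, newer) > maxFailuresPerSecond;
    }

    public static void main(String[] args) {
        Sample older = new Sample(0, 10, 5);
        Sample newer = new Sample(60, 40, 35);
        System.out.println("alert=" + shouldAlert(older, newer, 0.5));
    }
}
```

The threshold of 0.5 failures per second in `main` is arbitrary; a sensible value depends on how many workers and clients report into the same counters.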
