Is there a way to categorize the metrics available from Temporal server and client?

Ruchir · February 10, 2022, 6:40am

Hello Experts,

I am new to Temporal. In my POC, I’ve deployed a Temporal cluster on Kubernetes using the Helm charts from https://github.com/temporalio/helm-charts.git. I have also enabled scrape endpoints in my client code (both starter and worker, in Java).

Is there a way to distinguish metrics generated by the Temporal server from those generated by the client? Also, is there a list of all available metrics, with a brief description of what each one indicates?

Regards,
Ruchir

tihomir · February 10, 2022, 3:05pm

Is there a way to categorize between metrics generated from Temporal server and Client?

SDK metrics have by default the “temporal_” prefix for both service and worker metrics.

is there a list of all the available metrics along with a brief description to indicate what the metric indicates?

SDK metrics: Java, Go. Also see docs page on SDK metrics. Here are all server metrics, as well as a docs page.

Ruchir · February 11, 2022, 4:12am

SDK metrics have by default the “temporal_” prefix for both service and worker metrics.

@tihomir But in some cases the “temporal_” prefix is not present, e.g. workflow_success, workflow_failed, service_requests. That is why I asked for a list.

Will definitely go through the links you have provided. Thank you so much for your quick response!
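The prefix rule discussed above can be turned into a quick check against a scraped /metrics payload. The following is only a minimal sketch: it assumes the payload is in the Prometheus text format, that the SDK prefix is still the default “temporal_”, and that anything without the prefix belongs in a separate bucket (the thread does not confirm where those names come from). The function names and the prefixed names in the sample payload are hypothetical; the unprefixed names are the ones mentioned in this thread.

```python
SDK_PREFIX = "temporal_"  # default SDK prefix mentioned in this thread


def metric_names(exposition_text):
    """Return the distinct metric names in Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The name ends at the first "{" (labels) or whitespace (value).
        names.add(line.split("{", 1)[0].split()[0])
    return names


def categorize(names, sdk_prefix=SDK_PREFIX):
    """Split names into prefixed ("sdk") and unprefixed ("other") groups."""
    groups = {"sdk": [], "other": []}
    for name in sorted(names):
        key = "sdk" if name.startswith(sdk_prefix) else "other"
        groups[key].append(name)
    return groups


# Sample payload invented for illustration; the prefixed names are hypothetical.
SAMPLE = """\
# TYPE temporal_example_total counter
temporal_example_total{namespace="default"} 12
temporal_example_latency_count 4
workflow_success{namespace="default"} 7
service_requests{operation="StartWorkflowExecution"} 3
"""

if __name__ == "__main__":
    for group, names in categorize(metric_names(SAMPLE)).items():
        print(group, names)
```

A name-based split like this depends entirely on the prefix being left at its default. Since Prometheus attaches a `job` label to every series it scrapes, giving the server and the client code separate scrape jobs is another way to tell the two sources apart without relying on names.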

Ruchir · February 14, 2022, 10:49am

@tihomir Is there a one-line description for each of these metrics? The list is huge, and I need to understand which of them might be useful for us to monitor. Thanks.

tihomir · February 14, 2022, 3:10pm

That’s something that we should be adding to our docs soon.
To get a feel for which ones might be useful for you (including Grafana queries), see this SDK and server dashboard defs.
This dashboard is also set up out of the box in the background checks learning path demo.

dpincas · March 9, 2022, 7:59pm

Hi @tihomir,

Thanks for letting us know that this is coming to the docs soon. Several of my peers have been chatting about this and hunting through the code for what each metric means. I think having a simple page in the docs listing all metrics with their meanings (and units) would be WONDERFUL. Please take this as a gentle “bump” that many Temporal users would love this documentation! (We can find what we need in the code today, but it is always a hunt.)

Thanks,
Dan

tihomir · March 9, 2022, 8:22pm

@dpincas Totally agree. The team is looking into that and making sure the SDKs are aligned as well.
We are also looking into providing better out-of-the-box dashboards for SDK and server metrics.
Will make sure to post updates here.
