I deployed Prometheus and Grafana to monitor metrics for both the server and the SDK. However, after I imported dashboards/sdk-general.json from the temporalio/dashboards repository on GitHub (master branch) into Grafana, I noticed that some metrics the dashboard queries were missing, even though I expected my worker to emit them under the names listed in the JSON file.
The SDK version is v1.21.1.
I need the temporal_request_latency_bucket metric to populate the RPC Latencies panel, but I couldn't find it. Instead, my worker emits a metric called temporal_request_latency_seconds_bucket. Has the name changed?
Also, the P95 computed from temporal_request_latency_seconds_bucket seems very high, reaching up to 1 second. Is there a problem with this metric?
Here’s the corresponding Prometheus query for reference:
histogram_quantile(0.95, sum by (namespace, operation, le) (rate(temporal_samples_temporal_request_latency_seconds_bucket{namespace=~"$Namespace"}[5m])))