Missing workflow data from /metrics

Hello

Looking at dashboards/sdk.json (commit 6094dd666f386e76a3c03e0049f02521210b6883) in temporalio/dashboards on GitHub, I see that there should be metrics named temporal_workflow_completed, temporal_workflow_failed, etc.

I cannot find any of these in the output from /metrics. I only get the following metric names:

build_age
build_information
client_errors
client_latency_bucket
client_latency_count
client_latency_sum
client_redirection_errors
client_redirection_latency_bucket
client_redirection_latency_count
client_redirection_latency_sum
client_redirection_requests
client_requests
event_blob_size_bucket
event_blob_size_count
event_blob_size_sum
gomaxprocs
history_size_bucket
history_size_count
history_size_sum
invalid_task_queue_name
memory_allocated
memory_gc_pause_ms_bucket
memory_gc_pause_ms_count
memory_gc_pause_ms_sum
memory_heap
memory_heapidle
memory_heapinuse
memory_num_gc
memory_stack
namespace_cache_callbacks_latency_bucket
namespace_cache_callbacks_latency_count
namespace_cache_callbacks_latency_sum
namespace_cache_prepare_callbacks_latency_bucket
namespace_cache_prepare_callbacks_latency_count
namespace_cache_prepare_callbacks_latency_sum
num_goroutines
persistence_latency_bucket
persistence_latency_count
persistence_latency_sum
persistence_requests
restarts
service_authorization_latency_bucket
service_authorization_latency_count
service_authorization_latency_sum
service_errors_context_timeout
service_errors_entity_not_found
service_errors_execution_already_started
service_latency_bucket
service_latency_count
service_latency_sum
service_requests
version_check_failed
version_check_latency_bucket
version_check_latency_count
version_check_latency_sum
version_check_request_failed

Is there something I missed turning on? I’ve set the environment variable PROMETHEUS_ENDPOINT: "0.0.0.0:9090" on the frontend node.

I’m using Docker Swarm to run my services. Is there anything from helm-charts/values.yaml (master branch of temporalio/helm-charts on GitHub) that I should set as an environment variable to enable metrics for the workflows?


The temporal_workflow_completed and temporal_workflow_failed metrics are reported by the SDK, not by the server.

Thank you for your reply!

It turns out there were several things I did not understand when I wrote this question.

First of all - I was only picking up the metrics from the Temporal FRONTEND node. To get a complete set of metric data, one should also set the PROMETHEUS_ENDPOINT environment variable for the HISTORY, MATCHING and WORKER nodes.

Will ADMINTOOLS also give you metrics? I don’t know.

Secondly - I did not understand that you need to report “your own” metrics directly from your Temporal Client as implemented by the SDK you’re using. There’s the example in samples-go where we see how to set the MetricsScope when creating the Temporal Client.

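	// newPrometheusScope is the helper from the samples-go example, not an SDK function.
	c, err := client.NewClient(client.Options{
		MetricsScope: newPrometheusScope(prometheus.Configuration{
			ListenAddress: "0.0.0.0:9090",
			TimerType:     "histogram",
		}),
	})
	if err != nil {
		log.Fatalln("Unable to create client:", err)
	}
	defer c.Close()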

So - between setting the env correctly for all the Temporal server nodes and reporting from the Temporal Client when using the SDK - I’m now getting all the metrics. I think?

(I’m posting this for other people trying to get this working. Is this covered in the documentation anywhere yet?)


Admin tools only provides tooling for accessing Temporal, so I do not expect that pod to emit metrics.

I believe so.