Metrics not being emitted

Hello,

I'm trying to emit Temporal metrics from our Kubernetes deployment to Datadog, and I followed the same approach we used for Cadence by adding the following annotations:

ad.datadoghq.com/temporal-frontend.check_names: '["openmetrics"]'
ad.datadoghq.com/temporal-frontend.init_configs: '[{"skip_proxy": true}]'
ad.datadoghq.com/temporal-frontend.instances: '[{"prometheus_url": "http://%%host%%:9090/metrics", "namespace": "temporal", "metrics": ["workflow*", "temporal_requests*", "fail*", "complete*", "schedule*"]}]'
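
In case it helps, this is roughly how the annotations are applied in our Deployment (a sketch; the image, labels, and selector below are illustrative, only the annotation values and port 9090 are from our actual setup):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: temporal-frontend
spec:
  selector:
    matchLabels:
      app: temporal-frontend
  template:
    metadata:
      labels:
        app: temporal-frontend
      annotations:
        ad.datadoghq.com/temporal-frontend.check_names: '["openmetrics"]'
        ad.datadoghq.com/temporal-frontend.init_configs: '[{"skip_proxy": true}]'
        ad.datadoghq.com/temporal-frontend.instances: '[{"prometheus_url": "http://%%host%%:9090/metrics", "namespace": "temporal", "metrics": ["workflow*", "temporal_requests*", "fail*", "complete*", "schedule*"]}]'
    spec:
      containers:
        - name: temporal-frontend      # must match the container name used in the annotation prefix
          image: temporalio/server     # illustrative image/tag
          ports:
            - containerPort: 9090      # metrics port assumed from the prometheus_url above
```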

However, a lot of metrics that are available in Cadence appear to be missing in Temporal. For example, there is no temporal.workflow_failed corresponding to cadence.workflow_failed, even though I can see it defined in the Temporal code linked below.

Temporal Metrics: temporal/defs.go at master · temporalio/temporal · GitHub

Any ideas about this?

Thanks

I'm not sure about Datadog and how it handles things, but internally we use the Prometheus Operator on Kubernetes with the kubernetes-mixin. We've seen metrics get dropped when there are collisions between metric or label names, which we have to correct for at ingestion time or we lose data. For instance, there are Kubernetes namespaces and Temporal namespaces: if Kubernetes-based labels are applied to metrics emitted by Temporal, that collision needs to be handled. With Prometheus we do that via the scrape_config. Maybe something similar is going on in your Datadog setup?
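
As a rough illustration, this is the kind of scrape_config we use to keep the Kubernetes namespace label from clobbering Temporal's own namespace label (a sketch only; the job name, port, and target label name are assumptions for illustration, not your setup):

```yaml
scrape_configs:
  - job_name: temporal-frontend
    # Let the "namespace" label emitted by Temporal itself win on conflict
    # instead of being renamed/overwritten by the target's labels.
    honor_labels: true
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep the Kubernetes namespace under a non-conflicting label name.
      - source_labels: [__meta_kubernetes_namespace]
        target_label: kubernetes_namespace
      # Only scrape pods exposing the Temporal metrics port (9090 assumed).
      - source_labels: [__meta_kubernetes_pod_container_port_number]
        action: keep
        regex: "9090"
```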