I’m running into an issue setting up a Datadog dashboard for Temporal due to the way the metric names are formatted. For example, this is a metric currently produced by the frontend service:
It looks like tags are being appended to the metric name in dot-delimited notation. I’d expect namespace, operation, service, and type to be tags on this metric, with the metric name being “temporal.service_latency”.
The reason this metric naming scheme makes it difficult to create Datadog dashboards is that each individual tag combination above needs to be manually added as a separate series.
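To illustrate the problem (this is a standalone sketch, not Temporal code): when a reporter flattens tags into the name, every tag combination produces a brand-new metric name, whereas real tags leave a single metric that Datadog can group and filter.

```python
def flattened_name(name, tags):
    """Mimic a statsd-style reporter that appends tag values to the name."""
    return ".".join([name] + [tags[k] for k in sorted(tags)])

# Two operations against the same metric...
tags_a = {"namespace": "default", "operation": "StartWorkflowExecution"}
tags_b = {"namespace": "default", "operation": "SignalWorkflowExecution"}

# ...become two entirely distinct metric names, each needing its own series:
print(flattened_name("temporal.service_latency", tags_a))
print(flattened_name("temporal.service_latency", tags_b))
```

With proper tags, both samples would report under the single name `temporal.service_latency`.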
Our server config (for the matching server) is as follows:
My apologies if this has been covered elsewhere; I couldn’t find anything similar to this post. I’ve seen some discussion around using Prometheus, but it seems like using the statsd config should just work in this case. I’ve also seen some discussion around using a custom reporter, which may be what we want here?
The Tally statsd reporter does add “.” as the delimiter, see here. Are you sure you don’t want to produce Prometheus-style metrics (IIRC Datadog does have a Prometheus integration, could be wrong)? If not, the mentioned custom reporter should be OK IMO.
> Are you sure you don’t want to produce Prometheus-style metrics (IIRC Datadog does have a Prometheus integration, could be wrong)?
I can give the Datadog Prometheus integration a try. It would be surprising to me, however, if the behavior via the integration differed from the statsd backend.
> If not, the mentioned custom reporter should be OK IMO.
Do you know of, or have, any code examples where a custom metric reporter is used when creating a server? I’ve seen this post, which outlines the code for a custom metric reporting implementation, but I’m not clear on how to put it all together.
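For what it’s worth, the core job of such a reporter is to emit tags in DogStatsD’s extended datagram format (`name:value|type|#tag:val,...`) rather than appending them to the name. Here’s a hypothetical sketch of that idea (the class and method names are mine, not the Tally interface, which is Go):

```python
import socket

class DogStatsdReporter:
    """Hypothetical reporter that keeps tags as DogStatsD tags."""

    def __init__(self, host="127.0.0.1", port=8125):
        self._addr = (host, port)
        self._sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def format_timer(self, name, tags, millis):
        # DogStatsD extended format: tags ride along after '#', the name stays clean.
        tag_str = ",".join(f"{k}:{v}" for k, v in sorted(tags.items()))
        return f"{name}:{millis}|ms|#{tag_str}"

    def report_timer(self, name, tags, millis):
        self._sock.sendto(self.format_timer(name, tags, millis).encode(), self._addr)

r = DogStatsdReporter()
print(r.format_timer("temporal.service_latency",
                     {"operation": "AddSearchAttributes", "type": "history"}, 12.5))
# temporal.service_latency:12.5|ms|#operation:AddSearchAttributes,type:history
```

A real implementation would satisfy Tally’s `StatsReporter` interface in Go and be passed in when constructing the server’s metrics scope.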
Wanted to provide an update on the above issue. Switching to Prometheus metrics and using the DD integration resolved the original problem. We’re now getting metrics in DD without the tags baked into the metric name (e.g. we’re seeing metrics like “schedule_to_start_latency” with various tags like “namespace” and “taskqueue”).
- For our guidance to work, you’ll need to be using Kubernetes
- Make sure the Prometheus integration is enabled on the Datadog agent
- Ensure your Helm chart is set up properly to accept podAnnotations
- Follow this guide. It’s not specific to Temporal, but it gets you pretty close
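As a rough sketch of what the podAnnotations end up looking like (container name and metrics filter here are placeholders for your own deployment; the endpoint matches the `%%host%%:9090/metrics` we scrape):

```yaml
podAnnotations:
  ad.datadoghq.com/temporal-frontend.check_names: '["openmetrics"]'
  ad.datadoghq.com/temporal-frontend.init_configs: '[{}]'
  ad.datadoghq.com/temporal-frontend.instances: |
    [{
      "prometheus_url": "http://%%host%%:9090/metrics",
      "namespace": "temporal",
      "metrics": ["*"]
    }]
```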
For our deployments, we define values in Terraform and deploy via Helm release. During the Helm release step, it pulls our `values-<env>.yaml` files, takes the values, and templates them with our Helm chart before release. Our `values.yaml` file ended up looking like this for our Prometheus<>Datadog integration:
Datadog’s Prometheus integration will then scrape your metrics endpoint (`%%host%%:9090/metrics` in our case) at some frequency and push them to Datadog. We configured ours to use `temporal` as the namespace, as we already had `temporal-matching` as tags on our metrics.
A sample metric query looks like this:
We’re still in the process of building out dashboards and may share the dashboard JSON once we’re done!