I’m running into an issue setting up a Datadog dashboard for Temporal due to the way the metric names are formatted. For example, this is a metric currently produced by the frontend service:
It looks like tags are being appended to the metric name in dot-delimited notation. I’d expect namespace, operation, service, and type to be tags on this metric, with the metric name being “temporal.service_latency”.
The reason this metric naming scheme makes it difficult to create Datadog dashboards is that each individual tag combination above needs to be manually added as a separate series.
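To illustrate the problem (this is a standalone sketch, not Temporal code): when a reporter flattens tags into the name, every tag combination produces a brand-new metric name, whereas real tags leave a single metric that Datadog can group and filter.

```python
def flattened_name(name, tags):
    """Mimic a statsd-style reporter that appends tag values to the name."""
    return ".".join([name] + [tags[k] for k in sorted(tags)])

# Two operations against the same metric...
tags_a = {"namespace": "default", "operation": "StartWorkflowExecution"}
tags_b = {"namespace": "default", "operation": "SignalWorkflowExecution"}

# ...become two entirely distinct metric names, each needing its own series:
print(flattened_name("temporal.service_latency", tags_a))
print(flattened_name("temporal.service_latency", tags_b))
```

With proper tags, both samples would report under the single name `temporal.service_latency`.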
Our server config (for the matching server) is as follows:
My apologies if this has been covered elsewhere; I couldn’t find anything similar to this post. I’ve seen some discussion around using Prometheus, but it seems like using the statsd config should just work in this case. I’ve also seen some discussion around using a custom reporter, which may be what we want here?
The Tally statsd reporter does add “.” as the delimiter, see here. Are you sure you don’t want to produce Prometheus-style metrics (IIRC Datadog does have a Prometheus integration, could be wrong)? If not, the mentioned custom reporter should be OK IMO.
> Are you sure you don’t want to produce Prometheus-style metrics (IIRC Datadog does have a Prometheus integration, could be wrong)?
I can give the Datadog Prometheus integration a try. It would be surprising to me, however, if the behavior via the integration differed from the statsd backend.
> If not, the mentioned custom reporter should be OK IMO.
Do you know of, or have, any code examples where a custom metric reporter is used when creating a server? I’ve seen this post, which outlines the code for a custom metric reporting implementation, but I’m not clear on how to put it all together.
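For what it’s worth, the core job of such a reporter is to emit tags in DogStatsD’s extended datagram format (`name:value|type|#tag:val,...`) rather than appending them to the name. Here’s a hypothetical sketch of that idea (the class and method names are mine, not the Tally interface, which is Go):

```python
import socket

class DogStatsdReporter:
    """Hypothetical reporter that keeps tags as DogStatsD tags."""

    def __init__(self, host="127.0.0.1", port=8125):
        self._addr = (host, port)
        self._sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def format_timer(self, name, tags, millis):
        # DogStatsD extended format: tags ride along after '#', the name stays clean.
        tag_str = ",".join(f"{k}:{v}" for k, v in sorted(tags.items()))
        return f"{name}:{millis}|ms|#{tag_str}"

    def report_timer(self, name, tags, millis):
        self._sock.sendto(self.format_timer(name, tags, millis).encode(), self._addr)

r = DogStatsdReporter()
print(r.format_timer("temporal.service_latency",
                     {"operation": "AddSearchAttributes", "type": "history"}, 12.5))
# temporal.service_latency:12.5|ms|#operation:AddSearchAttributes,type:history
```

A real implementation would satisfy Tally’s `StatsReporter` interface in Go and be passed in when constructing the server’s metrics scope.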
Wanted to provide an update on the above issue. Switching to Prometheus metrics and using the DD integration resolved the original problem. We’re now getting metrics in DD without the tags baked into the metric name (e.g. we’re seeing metrics like “schedule_to_start_latency” with various tags like “namespace” and “taskqueue”).
- For our guidance to work, you’ll need to be using Kubernetes
- Make sure the Prometheus integration is enabled on the Datadog agent
- Ensure your Helm chart is set up properly to accept podAnnotations
- Follow this guide. It’s not specific to Temporal, but it gets you pretty close
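As a rough sketch of what the podAnnotations end up looking like (container name and metrics filter here are placeholders for your own deployment; the endpoint matches the `%%host%%:9090/metrics` we scrape):

```yaml
podAnnotations:
  ad.datadoghq.com/temporal-frontend.check_names: '["openmetrics"]'
  ad.datadoghq.com/temporal-frontend.init_configs: '[{}]'
  ad.datadoghq.com/temporal-frontend.instances: |
    [{
      "prometheus_url": "http://%%host%%:9090/metrics",
      "namespace": "temporal",
      "metrics": ["*"]
    }]
```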
For our deployments, we define values in Terraform and deploy via Helm release. During the Helm release step, it pulls our `values-<env>.yaml` files, takes the values, and templates them with our Helm chart before release. Our `values.yaml` file ended up looking like this for our Prometheus<>Datadog integration:
Datadog’s Prometheus integration will then scrape your metrics endpoint (`%%host%%:9090/metrics` in our case) at some frequency and push them to Datadog. We configured ours to use `temporal` as the namespace, as we already had `temporal-matching` as tags on our metrics.
A sample metric query looks like this:
We’re still in the process of building out dashboards and may share the dashboard JSON once we’re done!