What metrics does Temporal expose out of the box, and how do I consume them in Prometheus?

Hi, there is not much documentation around metrics, and the statsd setup, which was removed, also seemed very complex.

I want to understand:
a) what metrics does Temporal expose by default?
b) are the metrics namespace specific?
c) can I get queue/task list specific metrics?
d) how do I consume them in Prometheus?
e) if I am to develop custom metrics, what’s the best way? Should those be activities in workflows, or interceptors?

Hey @madhu,
Yes, this is an area where we lack public documentation at the moment, but it is high on the list of tasks we plan to address soon.

Temporal server reports a wide variety of metrics to help operators get visibility into the cluster and set up alerts. We use tally for reporting metrics from the application, and it supports multiple backends such as Prometheus, statsd, and M3db. We generally recommend running Temporal with the Prometheus backend, and we plan to provide dashboards using PromQL to the community very soon. Here is a dashboard repo which we started recently. We are iterating on it heavily and it is not ready for production use yet, but you can definitely use it as a reference to build your own dashboards.

All the metrics emitted by the server are listed in defs.go. So if you see something missing from the dashboards, you can use defs.go as a reference.

We have provided a development config which shows how to run the server using Prometheus as the backend. You can also check out our helm chart, which has a section on how to run Temporal with Prometheus as the metrics backend.
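As a rough sketch of what that looks like (not a copy of the shipped files): the first document below is the metrics part of a server config, the second is a matching job for a standalone Prometheus. The key names and the listen address are assumptions based on the development config of that era and may differ in your version, so check them against the config in the repo.

```yaml
# Temporal server config (sketch): expose metrics for Prometheus to scrape.
# Key names and the address are assumptions; verify against your version.
global:
  metrics:
    prometheus:
      timerType: "histogram"           # report timers as Prometheus histograms
      listenAddress: "127.0.0.1:8000"  # where the server serves its metrics endpoint
---
# prometheus.yml (sketch): scrape the address configured above.
scrape_configs:
  - job_name: temporal                 # hypothetical job name
    static_configs:
      - targets: ["127.0.0.1:8000"]
```

Once Prometheus is scraping that target, the metric names from defs.go should show up in the Prometheus UI and can be used to build dashboards.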

Thanks much @samar, I will check these links and get back. Really appreciate it.

Hi @samar,
In the context of the .29 helm charts, the config has:

  datasources:
    - name: TemporalMetrics
      type: prometheus
      url: http://{{ .Release.Name }}-prometheus-server
      access: proxy
      isDefault: true

What does “url: http://{{ .Release.Name }}-prometheus-server” stand for, and how is it resolved?

The reason I am asking is that I have installed the temporal helm charts (tag .29) in namespace X, and we have the prometheus operator and grafana in another namespace, Y.
I am not getting the datasources in the grafana UI.

Please let me know if I am missing something, or how I should proceed to use the existing prometheus operator.

regards
Sandeep

Basically, your question is how to configure “bring your own prometheus”.
I am not too sure, but each component/role has a prometheus section, right? Would you not be able to provide the prometheus endpoint there?

  frontend:
    # replicaCount: 1
    service:
      type: ClusterIP
      port: 7233
    metrics:
      annotations:
        enabled: true
      serviceMonitor: {}
      prometheus: {}  # HERE GOES YOUR STUFF??
      # enabled: false

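Hi @Sandeep_Paul - those configs that you are talking about are for connecting the grafana deployed by our helm chart to the prometheus deployed by our helm chart. The {{ .Release.Name }} part is filled in by Helm with the release name, so that URL points at the prometheus server service the chart creates alongside temporal, not at a prometheus in another namespace.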

If you want to bring your own prometheus and your own grafana, you can still make use of our dashboards, but you’ll use your existing prometheus as the datasource and configure it to scrape metrics in the namespace you have temporal deployed to.

Depending on how you have that set up, things to check include: prometheus is configured to scrape metrics in your temporal namespace, your RBAC settings allow prometheus to scrape metrics in your temporal namespace, and that any annotations you need on temporal deployments are set.
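For a prometheus that is configured through a plain scrape config (rather than through the operator), the first and third checks can be sketched roughly as below. This assumes the temporal pods carry the conventional prometheus.io/scrape and prometheus.io/port annotations and that temporal runs in a namespace called temporal; both are assumptions to adjust for your setup.

```yaml
# prometheus.yml (sketch): discover annotated pods in the temporal namespace.
scrape_configs:
  - job_name: temporal                  # hypothetical job name
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names:
            - temporal                  # assumed namespace where temporal is installed
    relabel_configs:
      # Keep only pods annotated with prometheus.io/scrape: "true".
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Scrape the port named in the prometheus.io/port annotation.
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
```

The RBAC check is separate: the service account prometheus runs under needs permission to list and watch pods in that namespace, otherwise discovery returns nothing.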

With prometheus operator, there’s a note in our default values file about setting serviceMonitor to enabled, which should create the required resources for you.

When using your own prom/grafana, install the helm chart with prometheus.enabled=false and grafana.enabled=false as well.
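Put together, a values override for the "bring your own prometheus operator and grafana" case might look something like the sketch below. The file name is hypothetical and the server.metrics.serviceMonitor path is an assumption based on the chart's default values file, so confirm it against the values.yaml of the chart version you are installing.

```yaml
# values-byo-monitoring.yaml (hypothetical file name)
# Do not deploy the chart's bundled prometheus and grafana.
prometheus:
  enabled: false
grafana:
  enabled: false

server:
  metrics:
    # Ask the chart to create ServiceMonitor resources for the
    # existing prometheus operator (key path assumed; see values.yaml).
    serviceMonitor:
      enabled: true
```

You would pass this file to helm install with -f. Depending on how your operator selects ServiceMonitors, you may also need matching labels or namespace selectors on the prometheus side before the targets appear.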
