Individual workflow metric

jorge.neto · March 29, 2022, 8:00pm

Hi experts,

We want to set up individual monitoring/alerting for a few business-critical Temporal workflows. What’s the best way to accomplish that?

We are already using the workflow_failed metric published by the server but we need a more fine-grained view into some of them.

Does it require a custom metric to be programmatically emitted by each workflow implementation, or is there another option?

Thanks in advance!

Cully · March 29, 2022, 8:12pm

Both the Temporal Cluster and a Temporal SDK emit metrics.

Temporal SDK Metrics are documented here: SDK metrics | Temporal Documentation
And some Cluster metrics are loosely documented here: Temporal Server self-hosted production deployment | Temporal Documentation

For SDK metrics, where they are emitted to is controlled by the handler specified in the Temporal Client options:
Go: How to set ClientOptions in Go | Temporal Documentation
TypeScript: Logging and Sinks in TypeScript SDK | Temporal Documentation

tihomir · March 29, 2022, 8:30pm

Just to add, regarding:

We are already using the workflow_failed metric published by the server but we need a more fine-grained view into some of them.

Yes, the server metric workflow_failed does not include the workflow type. You could instead use the SDK metric temporal_workflow_failed, which does include a workflow_type param that you can filter on in your Grafana queries to select the specific workflows you want to monitor.

jorge.neto · March 30, 2022, 1:27pm

Thanks @tihomir, that’s exactly what I was looking for.

The only issue remaining is it seems our client code is not emitting temporal_workflow_failed metric. We’re using Java SDK. Is there any config I might be missing?

Alternatively, can I use workflow_failed metric with taskqueue param?

tihomir · March 30, 2022, 4:53pm

The only issue remaining is it seems our client code is not emitting temporal_workflow_failed metric. We’re using Java SDK. Is there any config I might be missing?

Your workers should emit this metric whenever a workflow execution fails, provided you have configured SDK metrics for them.
In the Java SDK it is a counter, so you should see something like:
temporal_workflow_failed_total{ .... } 1.0
in your Prometheus metrics endpoint, for example.
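
One way to confirm that the counter is present for a specific workflow is to scan the text served by the metrics endpoint and sum the samples per workflow_type. The following is only a minimal sketch using the Java standard library: the class name, the sample label values (PaymentWorkflow, OrderWorkflow, payments, orders) and the exact label set are made up for illustration, and the parsing ignores escaped quotes in label values and optional sample timestamps.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WorkflowFailureCounts {
    // Matches one sample line such as: temporal_workflow_failed_total{a="b"} 1.0
    private static final Pattern SAMPLE =
        Pattern.compile("^temporal_workflow_failed_total\\{(.*)\\}\\s+(\\S+)$");
    // Extracts the workflow_type label value from the label set.
    private static final Pattern WORKFLOW_TYPE =
        Pattern.compile("(?:^|,)workflow_type=\"([^\"]*)\"");

    /** Sums temporal_workflow_failed_total samples per workflow_type label. */
    public static Map<String, Double> failuresByWorkflowType(String scrape) {
        Map<String, Double> totals = new LinkedHashMap<>();
        for (String line : scrape.split("\n")) {
            Matcher sample = SAMPLE.matcher(line.trim());
            if (!sample.matches()) {
                continue; // comment lines and other metrics are skipped
            }
            Matcher type = WORKFLOW_TYPE.matcher(sample.group(1));
            if (!type.find()) {
                continue; // sample without a workflow_type label
            }
            totals.merge(type.group(1), Double.parseDouble(sample.group(2)), Double::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Hypothetical scrape output; real label sets and values will differ.
        String scrape = String.join("\n",
            "# TYPE temporal_workflow_failed_total counter",
            "temporal_workflow_failed_total{namespace=\"default\",task_queue=\"payments\",workflow_type=\"PaymentWorkflow\"} 1.0",
            "temporal_workflow_failed_total{namespace=\"default\",task_queue=\"orders\",workflow_type=\"OrderWorkflow\"} 3.0",
            "temporal_workflow_completed_total{namespace=\"default\",workflow_type=\"OrderWorkflow\"} 7.0");
        System.out.println(failuresByWorkflowType(scrape));
    }
}
```

If a business-critical workflow type never shows up in the result even after a known failure, that suggests the workers running it are not reporting SDK metrics.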

Alternatively, can I use workflow_failed metric with taskqueue param?

Yes, it does have a task_queue param that you can use in your Grafana queries.
