We are already using the workflow_failed metric published by the server, but we need a more fine-grained view of failures for some specific workflows.
Right, the server's workflow_failed metric does not include the workflow type. You could instead use the SDK's temporal_workflow_failed metric, which includes a workflow_type label. You can filter on that label in your Grafana queries to target the specific workflows you want to monitor.
Thanks @tihomir, that’s exactly what I was looking for.
The only remaining issue is that our client code does not seem to be emitting the temporal_workflow_failed metric. We're using the Java SDK. Is there any configuration I might be missing?
Alternatively, can I use the server's workflow_failed metric filtered by task queue?
This metric is emitted by your workers when a workflow execution fails, provided you have configured SDK metrics for those workers.
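For reference, here is a minimal sketch of wiring SDK metrics into a Java worker. It assumes the temporal-sdk, tally-core and micrometer-registry-prometheus dependencies are on the classpath (the `io.micrometer.prometheus` package name applies to Micrometer versions before 1.13). The important step is passing a metrics `Scope` to `WorkflowServiceStubsOptions` for the stubs your `WorkerFactory` is built on; without it the SDK has nowhere to report metrics. The task queue name, port 8077 and the commented-out workflow registration are hypothetical placeholders.

```java
import com.sun.net.httpserver.HttpServer;
import com.uber.m3.tally.RootScopeBuilder;
import com.uber.m3.tally.Scope;
import com.uber.m3.tally.StatsReporter;
import io.micrometer.prometheus.PrometheusConfig;
import io.micrometer.prometheus.PrometheusMeterRegistry;
import io.temporal.client.WorkflowClient;
import io.temporal.common.reporter.MicrometerClientStatsReporter;
import io.temporal.serviceclient.WorkflowServiceStubs;
import io.temporal.serviceclient.WorkflowServiceStubsOptions;
import io.temporal.worker.Worker;
import io.temporal.worker.WorkerFactory;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class MetricsWorker {
  public static void main(String[] args) throws IOException {
    // Micrometer registry that renders metrics in the Prometheus text format.
    PrometheusMeterRegistry registry = new PrometheusMeterRegistry(PrometheusConfig.DEFAULT);

    // Bridge the SDK's tally metrics into Micrometer, flushing every 10 seconds.
    StatsReporter reporter = new MicrometerClientStatsReporter(registry);
    Scope scope = new RootScopeBuilder()
        .reporter(reporter)
        .reportEvery(com.uber.m3.util.Duration.ofSeconds(10));

    // Attach the scope to the service stubs; workers created from these stubs
    // report SDK metrics such as temporal_workflow_failed.
    WorkflowServiceStubs service = WorkflowServiceStubs.newServiceStubs(
        WorkflowServiceStubsOptions.newBuilder().setMetricsScope(scope).build());
    WorkflowClient client = WorkflowClient.newInstance(service);
    WorkerFactory factory = WorkerFactory.newInstance(client);
    Worker worker = factory.newWorker("my-task-queue"); // hypothetical task queue
    // worker.registerWorkflowImplementationTypes(MyWorkflowImpl.class); // hypothetical workflow
    factory.start();

    // Expose the registry on http://localhost:8077/metrics for Prometheus to scrape.
    HttpServer server = HttpServer.create(new InetSocketAddress(8077), 0);
    server.createContext("/metrics", exchange -> {
      byte[] body = registry.scrape().getBytes(StandardCharsets.UTF_8);
      exchange.sendResponseHeaders(200, body.length);
      try (OutputStream os = exchange.getResponseBody()) {
        os.write(body);
      }
    });
    server.start();
  }
}
```

If only your client process is instrumented this way, you will not see temporal_workflow_failed from it, because the counter is recorded by the worker that runs the failing workflow.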
With the Java SDK it's a counter, so you should see something like temporal_workflow_failed_total{ .... } 1.0
on your Prometheus metrics endpoint, for example.
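To check the endpoint output programmatically, here is a small stdlib-only sketch that scans Prometheus text-format output and sums a counter per label value, such as workflow_type (or task_queue). It assumes one sample per line in the form `name{labels} value` and label values without `}` characters. The sample input in `main` is made up for illustration; in practice you would pass in the body returned by your metrics endpoint.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FailedWorkflowCounts {
  /** Sums every sample of {@code metric}, grouped by the value of {@code label}. */
  public static Map<String, Double> sumByLabel(String scrape, String metric, String label) {
    // e.g. temporal_workflow_failed_total{task_queue="q",workflow_type="T",} 1.0
    Pattern sample = Pattern.compile("^" + Pattern.quote(metric) + "\\{([^}]*)\\}\\s+(\\S+)");
    Pattern labelValue = Pattern.compile("(?:^|,)\\s*" + Pattern.quote(label) + "=\"([^\"]*)\"");
    Map<String, Double> totals = new TreeMap<>();
    for (String line : scrape.split("\n")) {
      Matcher m = sample.matcher(line.trim());
      if (!m.find()) {
        continue; // skip # HELP / # TYPE comments and other metrics
      }
      Matcher lv = labelValue.matcher(m.group(1));
      String key = lv.find() ? lv.group(1) : "<none>";
      totals.merge(key, Double.parseDouble(m.group(2)), Double::sum);
    }
    return totals;
  }

  public static void main(String[] args) {
    // Illustrative input only; the workflow and task queue names are hypothetical.
    String scrape = String.join("\n",
        "# TYPE temporal_workflow_failed_total counter",
        "temporal_workflow_failed_total{task_queue=\"orders\",workflow_type=\"OrderWorkflow\",} 1.0",
        "temporal_workflow_failed_total{task_queue=\"orders-eu\",workflow_type=\"OrderWorkflow\",} 2.0",
        "temporal_workflow_failed_total{task_queue=\"billing\",workflow_type=\"BillingWorkflow\",} 1.0");
    System.out.println(sumByLabel(scrape, "temporal_workflow_failed_total", "workflow_type"));
  }
}
```

Running `main` prints `{BillingWorkflow=1.0, OrderWorkflow=3.0}`; passing `"task_queue"` as the label groups the same samples by task queue instead.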
Yes, the server's workflow_failed metric does have a task_queue label that you can use in your Grafana queries.