Hi,
My team recently adopted Temporal, and metrics have become crucial for debugging production workflows and activities. We're primarily working with the Python SDK.
The default out-of-the-box metrics have been very helpful. That said, I think adding custom labels (dimensions) to these metrics would give us more precise insight when troubleshooting.
For example, we are submitting thousands of `WorkflowA` instances, each with one of multiple `subtype` values. While `temporal_workflow_failed` is a useful metric for tracking failures, we would have better insight if we could add `subtype` as a custom label on the metric, such as `temporal_workflow_failed{..., subtype=...}`.
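To make the ask concrete, here is a rough sketch of the output we're hoping for, written in plain Python. It does not use any Temporal API: `render_failed_metric`, the `workflow_type` label value, and the subtype values are all made up for illustration, and the lines only mimic the Prometheus text format.

```python
from collections import Counter


def render_failed_metric(failed_subtypes, workflow_type="WorkflowA"):
    """Render per-subtype failure counts as Prometheus-style sample lines.

    Hypothetical helper, only meant to show the label shape we want.
    """
    counts = Counter(failed_subtypes)
    lines = []
    for subtype in sorted(counts):
        labels = f'workflow_type="{workflow_type}",subtype="{subtype}"'
        lines.append(f"temporal_workflow_failed{{{labels}}} {counts[subtype]}")
    return lines


# Made-up subtype values, one entry per failed WorkflowA run.
failed_subtypes = ["billing", "export", "billing"]
for line in render_failed_metric(failed_subtypes):
    print(line)
```

With a per-subtype series like this, we could alert on or graph failures for a single subtype instead of the workflow type as a whole.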
In our actual use case, there are additional levels of dimensions we’d like to track.
Does anyone have experience or advice on managing custom labels in Temporal metrics, or any alternate approaches for enhancing metrics in this way? Any guidance would be greatly appreciated!