Info about Prometheus metrics

I am trying to create a metric to detect non-determinism errors, but I see there are several candidates (workflow_task_execution_failed, workflow_failed, and workflow_task_queue_poll_failed). I am not sure which one I should use, and I can’t find any details about these metrics anywhere.

I created a workflow and then changed its history to put the workflow into a non-deterministic state. When I checked the metrics, “workflow_task_queue_poll_failed” had increased, but I couldn’t find “workflow_task_execution_failed” in the metrics at all. At the same time, I don’t see the “workflow_task_queue_poll_failed” metric in the documentation.

Are there some details about these metrics somewhere indicating the purpose of each one?

metric to detect non determinism errors

Use temporal_workflow_task_execution_failed and filter it by failure_reason, which you can set to NonDeterminismError.
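To illustrate the filtering, here is a minimal sketch in Python that sums the counter for one failure_reason from Prometheus text exposition output. The sample scrape text, the task_queue label, and the helper name sum_metric are hypothetical and only for illustration; the exact exported metric name may also differ in your setup (for example, some exporters append a suffix), so check what your worker actually emits.

```python
import re

# Hypothetical scrape output for illustration only. The task_queue label and
# all sample values are assumptions, not taken from a real worker.
SAMPLE = """\
# TYPE temporal_workflow_task_execution_failed counter
temporal_workflow_task_execution_failed{failure_reason="NonDeterminismError",task_queue="orders"} 3
temporal_workflow_task_execution_failed{failure_reason="WorkflowError",task_queue="orders"} 5
temporal_workflow_task_queue_poll_failed{task_queue="orders"} 2
"""

# Matches: metric_name{optional labels} value
LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
# Matches one key="value" pair inside the label braces.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def sum_metric(text, name, **wanted_labels):
    """Sum all samples of `name` whose labels match every wanted key/value."""
    total = 0.0
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None or match.group("name") != name:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        if all(labels.get(k) == v for k, v in wanted_labels.items()):
            total += float(match.group("value"))
    return total


non_determinism = sum_metric(
    SAMPLE,
    "temporal_workflow_task_execution_failed",
    failure_reason="NonDeterminismError",
)
print(non_determinism)
```

In Prometheus itself, the equivalent filter would be a label matcher on the same metric, along the lines of temporal_workflow_task_execution_failed{failure_reason="NonDeterminismError"}, again assuming that is the name your setup exports.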


Are there some details about these metrics somewhere indicating the purpose of each one?

The docs have a page here if it helps, but yes, as you said, it might not be up to date and may not currently show all SDK metrics and their properties. We will work on updating it.


Thank you for your response, but I can’t see this metric at all in the emitted metrics. Any idea why?
Do I need to be on a specific version of Temporal to start seeing it?

I just updated Temporal to V1.8.0 and the metric started showing up (I was on V1.0.0 :D)

What values are expected other than this one, and when can they happen?

What values are expected other than this one, and when can they happen?

The value would be either NonDeterminismError, or WorkflowError for any other intermittent workflow task failure.
