Info about prometheus metrics

Omar_Sawan · February 17, 2025, 1:58pm

I am trying to create a metric to detect non-determinism errors, but there are several candidate metrics (workflow_task_execution_failed, workflow_failed and workflow_task_queue_poll_failed). I am not sure which one I should use, and I can’t find any details about these metrics anywhere.

I created a workflow and then changed its history to put the workflow into a non-deterministic state. When I checked the metrics, “workflow_task_queue_poll_failed” had increased, while “workflow_task_execution_failed” did not appear in the metrics at all. At the same time, I don’t see the “workflow_task_queue_poll_failed” metric in the documentation.

Are there some details about these metrics somewhere indicating the purpose of each one?

tihomir · February 19, 2025, 3:52am

metric to detect non determinism errors

Use temporal_workflow_task_execution_failed broken down by its failure_reason label, and filter on failure_reason = NonDeterminismError.
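As a rough sketch of that suggestion, the filter can be written as a PromQL expression and sent to the Prometheus HTTP API instant-query endpoint (/api/v1/query). The exported metric name is an assumption: depending on the SDK and metrics reporter, the counter may appear as temporal_workflow_task_execution_failed or with a _total suffix. The Prometheus address and the 5m window are placeholders, and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: adjust to the name your SDK actually exports
# (some reporters append "_total" to counters).
METRIC = "temporal_workflow_task_execution_failed"


def nondeterminism_query(metric: str = METRIC, window: str = "5m") -> str:
    """PromQL: workflow task failures caused by non-determinism within the window."""
    return f'sum(increase({metric}{{failure_reason="NonDeterminismError"}}[{window}]))'


def query_url(prometheus_base: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    params = urllib.parse.urlencode({"query": promql})
    return f"{prometheus_base.rstrip('/')}/api/v1/query?{params}"


def run_query(prometheus_base: str, promql: str) -> dict:
    """Send the query to a reachable Prometheus server and return the decoded JSON."""
    with urllib.request.urlopen(query_url(prometheus_base, promql), timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hypothetical local Prometheus address; only prints the URL, no request is made.
    print(query_url("http://localhost:9090", nondeterminism_query()))
```

increase() is used because the metric is a counter. An alert could, for example, fire whenever this expression is greater than zero.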

tihomir · February 19, 2025, 3:53am

Are there some details about these metrics somewhere indicating the purpose of each one?

The docs have a page here, if it helps, but as you said, it may not currently be up to date with all SDK metrics and their properties. We will work on updating it.

Omar_Sawan · February 19, 2025, 6:27am

Thank you for your response, but I can’t see this metric at all in the emitted metrics. Any idea why?
Do I need to be on a specific version of Temporal to start seeing it?

Omar_Sawan · February 19, 2025, 10:44am

I just updated the Temporal version to V1.8.0 and the metric started showing up (I was on V1.0.0 :D)

Omar_Sawan · February 19, 2025, 10:46am

What other values are expected besides this one, and when can they occur?

tihomir · February 19, 2025, 1:51pm

What other values are expected besides this one, and when can they occur?

The value will be either NonDeterminismError, or WorkflowError for any other intermittent workflow task failure.
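To see how the two failure_reason values split in practice, the sketch below tallies the counter by reason from the Prometheus text exposition format (what a worker’s /metrics endpoint returns). It is a simplified stdlib-only parser, not a full implementation: it ignores timestamps and escaped quotes in label values. The function names, the workflow_type label and the sample numbers are hypothetical.

```python
import re
from collections import Counter

# Matches sample lines such as:
#   temporal_workflow_task_execution_failed{failure_reason="NonDeterminismError"} 3
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def failures_by_reason(exposition: str,
                       metric: str = "temporal_workflow_task_execution_failed") -> Counter:
    """Sum the counter's samples per failure_reason (accepts an optional _total suffix)."""
    totals: Counter = Counter()
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        m = SAMPLE_RE.match(line)
        if not m or m.group("name") not in (metric, metric + "_total"):
            continue  # some other metric
        labels = dict(LABEL_RE.findall(m.group("labels") or ""))
        totals[labels.get("failure_reason", "<none>")] += float(m.group("value"))
    return totals


def describe(reason: str) -> str:
    """Map a failure_reason value to the meaning given in this thread."""
    if reason == "NonDeterminismError":
        return "non-determinism detected during replay"
    if reason == "WorkflowError":
        return "any other intermittent workflow task failure"
    return "unexpected value"


# Made-up scrape output for illustration only.
SAMPLE = """\
# TYPE temporal_workflow_task_execution_failed counter
temporal_workflow_task_execution_failed{failure_reason="NonDeterminismError",workflow_type="OrderWorkflow"} 2
temporal_workflow_task_execution_failed{failure_reason="WorkflowError",workflow_type="OrderWorkflow"} 1
temporal_workflow_task_execution_failed{failure_reason="NonDeterminismError",workflow_type="PaymentWorkflow"} 3
some_other_metric 5
"""

if __name__ == "__main__":
    for reason, count in failures_by_reason(SAMPLE).items():
        print(reason, count, describe(reason))
```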
